Ensuring Correlated Partition Exclusion #2

Ensuring Correlated Partition Exclusion #2 December 20, 2009

Posted by mwidlake in performance, Uncategorized, VLDB.
Tags: partitions, performance, VLDB
trackback

A few days ago {oh dear, it is now two weeks!} I showed how you could better control Correlated Partition Exclusion by creating a lookup table that showed, for each partition, the minimum and maximum ID (the partition key) and the minimum and maximum CRE_DATETIME (what you want to limit your query on). Using this range table in the way I described, you did not have to take an educated guess as to what range of IDs you used to include the partitions for the date range you were interested in, you could check the table.

But this only worked reliably where the ID and CRE_DATETIME increase directly in proportion to each other. It was a special case.

What about the more normal case, where there is a correlation between the two columns but it is not a perfect correlation? This would happen, for example, if the CRE_DATETIME is entered from another system, or from paper records, where the order of the records is not enforced. So some records from today get put into the system before some from yesterday. Or if the correlation is even loser than this. eg you are looking at orders for new customers. You “know” there are going to be no orders for these customers from 5 years ago but you are not sure how far back in the orders table you should go to find what you want.

You can still use the lookup table method. The lookup table in effect becomes a meta-index – an index of the segments where you will find data but not actually the rows.

To demonstrate this, I created a table where the ID and CRE_DATETIME increase in order:

create a partitioned table test_p4
(id           number(10) not null
,cre_datetime date not null
,status        number(1) not null
,num_1         number(4) not null -- random 20
,num_2         number(4) -- random 500
,num_3         number(5) -- cycle smoothly
,num_4         number(5) -- Random 10000
,vc_1          varchar2(10)
,vc_2          varchar2(10)
,vc_pad        varchar2(2000))
tablespace users
partition by range (id)
(partition id_01k values less than (1000)
tablespace users
,partition id_02k values less than (2000)
tablespace users
...
,partition id_45k values less than (45000)
tablespace users
,partition id_max values less than (maxvalue)
tablespace users
)
--
-- local partitioned indexes on the table
 IND_NAME           TAB_NAME           PSN       COL_NAME
 ------------------ ------------------ --------- ------------
 TP4_CRE_DT         TEST_P4            1         CRE_DATETIME

 TP4_PK             TEST_P4            1         ID
--
-- populate the table with data
insert into test_p4(id,cre_datetime,status,num_1,num_2,num_3,num_4
                  ,vc_1,vc_2,vc_pad)
select rownum
,to_date('01-JUL-2009','DD-MON-YYYY')+(rownum/360)
,decode(mod(rownum,100),0,1
              ,0)
,trunc(dbms_random.value(1,20)) 
,trunc(dbms_random.value(1,50)) 
,mod(rownum,10)+1
,trunc(dbms_random.value(1,10000))
,dbms_random.string('U',5)
,lpad(chr(mod(rownum,6)+65),5,chr(mod(rownum,6)+65) )
,lpad('A',1000,'A')
from dba_objects
where rownum <43000

I then messed with the data, updating 10% of records and setting the CRE_DATETIME to plus or minus a random amount up to 2 days different, so data in partitions would overlap. I then created a range table in the same way as I did for the previous post.

I ended up with a range table like the below:

MIN_CRE_DATI      MAX_CRE_DATI          MIN_ID     MAX_ID
----------------- ----------------- ---------- ----------
29-JUN-2009 20:35 05-JUL-2009 14:24          1        999
02-JUL-2009 04:50 07-JUL-2009 12:56       1000       1999
05-JUL-2009 02:34 10-JUL-2009 08:31       2000       2999
08-JUL-2009 05:23 13-JUL-2009 03:32       3000       3999
11-JUL-2009 08:07 15-JUL-2009 18:41       4000       4999
14-JUL-2009 00:14 18-JUL-2009 18:27       5000       5999
16-JUL-2009 08:58 21-JUL-2009 14:18       6000       6999
19-JUL-2009 01:28 24-JUL-2009 11:21       7000       7999
22-JUL-2009 08:02 27-JUL-2009 07:01       8000       8999
24-JUL-2009 22:06 30-JUL-2009 05:37       9000       9999
28-JUL-2009 04:59 01-AUG-2009 10:57      10000      10999
...
24-SEP-2009 01:52 28-SEP-2009 18:36      31000      31999
26-SEP-2009 16:49 01-OCT-2009 01:26      32000      32999
29-SEP-2009 13:20 04-OCT-2009 13:43      33000      33999
02-OCT-2009 08:40 07-OCT-2009 10:11      34000      34999
05-OCT-2009 04:29 10-OCT-2009 04:09      35000      35999
08-OCT-2009 02:04 12-OCT-2009 17:34      36000      36999
10-OCT-2009 20:03 15-OCT-2009 08:39      37000      37999
13-OCT-2009 15:09 18-OCT-2009 12:01      38000      38999
16-OCT-2009 06:49 21-OCT-2009 03:53      39000      39999
18-OCT-2009 20:16 23-OCT-2009 21:01      40000      40999
21-OCT-2009 12:10 26-OCT-2009 07:13      41000      41999
25-OCT-2009 01:29 29-OCT-2009 20:56      42000      42999

You can see that the MIN_CRE_DATI-MAX_CRE_DATI from record to record (partition to partition) overlap but generally increase.

How do you find the start and end of the ID range to cover a date period you are interested in? I always have to sit down with a pen a paer to work this out. It is the classic “overlapping ranges” check, but my brain cannot hold on to it. So here goes. I want all records between the 1st of October and the 6th, this year.

You want to find the partition with the lowest ID range which has a record that falls into the date range you want. ie the Maximum CRE_DATETIME record is as late or later than the range you are interested in. If the maximum CRE_DATETIME is less than the date range you are interested in, no records will be in that partition.

Here is the code to find the lowest partition

select min(min_id)
from tp4_range 
where MAX_CRE_DATI>=to_date('01-OCT-2009','DD-MON-YYYY')
MIN(MIN_ID)
-----------
      32000

Execution Plan
----------------------------------------------------------
| Id  | Operation                    | Name           | Rows  | Bytes | Cost 
----------------------------------------------------------
|   0 | SELECT STATEMENT             |                |     1 |    12 |     2
|   1 |  SORT AGGREGATE              |                |     1 |    12 |         
|   2 |   TABLE ACCESS BY INDEX ROWID| TP4_RANGE      |    12 |   144 |     2
|*  3 |    INDEX RANGE SCAN          | TP4R_MACD_MIAD |    12 |       |     1
----------------------------------------------------------

To cover the whole of the partition of interest you need to look for records with and I greater than the minimum in that partition, thus the selection of the min(min_id)

Similarly, you only want partitions where the minimum CRE_DATETIME is before the end of the date range you want. If all records in the partition have a CRE_DATETIME beyond the range, you are not interested in it.

select max(max_id)
             from tp4_range where MIN_CRE_DATI<=to_date('06-OCT-2009','DD-MON-YYYY')
MAX(MAX_ID)
-----------
      35999

Execution Plan
----------------------------------------------------------
| Id  | Operation                    | Name           | Rows  | Bytes | Cost 
----------------------------------------------------------
|   0 | SELECT STATEMENT             |                |     1 |    13 |     2
|   1 |  SORT AGGREGATE              |                |     1 |    13 |         
|   2 |   TABLE ACCESS BY INDEX ROWID| TP4_RANGE      |    37 |   481 |     2
|*  3 |    INDEX RANGE SCAN          | TP4R_MICD_MIID |    37 |       |     1
----------------------------------------------------------

Then you put these sub-queries into the query to select the data you want. Below I show the “original” code which does not use the partition range table and then using the partition range table, to prove the same number of records come back and to see the plan and statistics change:

select count(*) from test_p4
where cre_datetime between to_date('01-OCT-2009','DD-MON-YYYY')
                   and     to_date('06-OCT-2009','DD-MON-YYYY')
and num_2 = 5

  COUNT(*)
----------
        48

Execution Plan
------------------------------------------------------
| Id  | Operation                           | Name       | R
ows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------
------------------------------------------------------
|   0 | SELECT STATEMENT                    |            |
   1 |    11 |   356   (1)| 00:00:02 |       |       |
|   1 |  SORT AGGREGATE                     |            |
   1 |    11 |            |          |       |       |
|   2 |   PARTITION RANGE ALL               |            |
  36 |   396 |   356   (1)| 00:00:02 |     1 |    46 |
|*  3 |    TABLE ACCESS BY LOCAL INDEX ROWID| TEST_P4    |
  36 |   396 |   356   (1)| 00:00:02 |     1 |    46 |
|*  4 |     INDEX RANGE SCAN                | TP4_CRE_DT |
1788 |       |    52   (0)| 00:00:01 |     1 |    46 |
------------------------------------------------------------
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        676  consistent gets
          0  physical reads

Note the 676 consistent gets and the Pstart/Pstop of 1-46

select count(*) from test_p4
where cre_datetime between to_date('01-OCT-2009','DD-MON-YYYY')
                   and     to_date('06-OCT-2009','DD-MON-YYYY')
and id> (select min(min_id)
             from tp4_range where MAX_CRE_DATI
                 >=to_date('01-OCT-2009','DD-MON-YYYY'))
and id < (select max(max_id)
             from tp4_range where MIN_CRE_DATI
                <=to_date('06-OCT-2009','DD-MON-YYYY'))
and num_2 = 5

  COUNT(*)
----------
        48

Execution Plan
-----------------------------------------------------------
| Id  | Operation                            | Name
  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------
-----------------------------------------------------------
|   0 | SELECT STATEMENT                     |
  |     1 |    16 |    66   (5)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE                      |
  |     1 |    16 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR           |
  |     1 |    16 |    62   (5)| 00:00:01 |   KEY |   KEY |
|*  3 |    TABLE ACCESS BY LOCAL INDEX ROWID | TEST_P4
  |     1 |    16 |    62   (5)| 00:00:01 |   KEY |   KEY |
|   4 |     BITMAP CONVERSION TO ROWIDS      |
  |       |       |            |          |       |       |
|   5 |      BITMAP AND                      |
  |       |       |            |          |       |       |
|   6 |       BITMAP CONVERSION FROM ROWIDS  |
  |       |       |            |          |       |       |
|   7 |        SORT ORDER BY                 |
  |       |       |            |          |       |       |
|*  8 |         INDEX RANGE SCAN             | TP4_PK
  |  1788 |       |     2   (0)| 00:00:01 |   KEY |   KEY |
|   9 |          SORT AGGREGATE              |
  |     1 |    12 |            |          |       |       |
|  10 |           TABLE ACCESS BY INDEX ROWID| TP4_RANGE
  |    12 |   144 |     2   (0)| 00:00:01 |       |       |
|* 11 |            INDEX RANGE SCAN          | TP4R_MACD_MIA
D |    12 |       |     1   (0)| 00:00:01 |       |       |
|  12 |          SORT AGGREGATE              |
  |     1 |    13 |            |          |       |       |
|  13 |           TABLE ACCESS BY INDEX ROWID| TP4_RANGE
  |    37 |   481 |     2   (0)| 00:00:01 |       |       |
|* 14 |            INDEX RANGE SCAN          | TP4R_MICD_MII
D |    37 |       |     1   (0)| 00:00:01 |       |       |
|  15 |       BITMAP CONVERSION FROM ROWIDS  |
  |       |       |            |          |       |       |
|  16 |        SORT ORDER BY                 |
  |       |       |            |          |       |       |
|* 17 |         INDEX RANGE SCAN             | TP4_CRE_DT
  |  1788 |       |    52   (0)| 00:00:01 |   KEY |   KEY |
------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        350  consistent gets
          0  physical reads

The plan is admittedly more complex and the SQL can be tricky to understand if you are not used to code that looks for overlap of ranges. But Consistent gets is down to 350 and the Pstart/Pstop values are KEY-KEY. The CBO cannot tell you WHAT the ranges will be when the code is parsed {I think it should check but in 10.2.0.3 at least, the CBO is not that smart at parse time}, but it knows there will be a start and stop.

I am using tiny partitions for my example and only 45 of them for the table. When the partitions are for millions of rows and there are a couple of thousand of them, excluding partitions in a manner you can rely on is a powerful tool. Which leads onto a final word of caution.

Nothing is enforcing that the range table is maintained.

You could do things with triggers or regular re-calculation of the ranges table but this will be something you need to consider if you use ranges tables to help partition exclusion. Flipping tablespaces to read-only can help the worry go away though… 🙂

As an example of the issue of needing to maintain the ranges table and also a demonstration that the ranges table does work correctly if maintainted, I’ll update a record well outside of the “expected window”, show that it does not appear in my range-check controlled code, then update the ranges table and try again.

update test_p4
set cre_datetime=to_date(’02-OCT-2009 11:15′,’DD-MON-YYYY HH24:MI’)
,NUM_2=5
where id=22100
/
3> select count(*) from test_p4
where cre_datetime between to_date(’01-OCT-2009′,’DD-MON-YYYY’)
and to_date(’06-OCT-2009′,’DD-MON-YYYY’)
and id> (select min(min_id)
from tp4_range where MAX_CRE_DATI>=to_date(’01-OCT-2009′,’DD-MON-YYYY’))
and id < (select max(max_id)
from tp4_range where MIN_CRE_DATI (select min(min_id)
from tp4_range where MAX_CRE_DATI>=to_date(’01-OCT-2009′,’DD-MON-YYYY’))
and id < (select max(max_id)
from tp4_range where MIN_CRE_DATI<=to_date('06-OCT-2009','DD-MON-YYYY'))
and num_2 = 5

COUNT(*)
———-
49
— success
—

I wonder if I will do the next posting on this topic in less than two weeks!

Comments»

1. Bernard Polarski - December 24, 2009: This is the best that could be done in the direction of partition pruning but it will not spare the visit to each local root block of each local index of ‘cr_time’. When you have 2 000 partitions, the simplest query will requests 4000 gets before even start working (2k of each subquery. The (select max(id)…where MIN_CRE_DATI …) will have to visit each root block of each local index (partitions) on CRE_TIME.

You mentioned this in the first post of this series about inefficient partitioning. Up to now I could not circumvent this issue. Creating CRE_TIME as a global with 2 Tera of data to rebuild/scan every time a partition is added/dropped, is simply a non-affordable-solution.

An enhancement on index partition for correlate columns would be to create a global header segment for secondary local indexes that will store the high and low value of each local index segment, so that query on unique values would lead to immediate massive pruning and maintenance of this header in the add/drop partitions would be light. I wonder if there is not such a function in the DBMS_GATHER for local indexes.

Reply
2. Blogroll Report – 18/12/2009-25/12/2009 « Coskan’s Approach to Oracle - January 6, 2010: […] 3-Transforming queries for ensuring Partition Pruning? Martin Widlake- Ensuring Correlated Partition Exclusion #2 […]

Reply

Martin Widlake's Yet Another Oracle Blog