Ensuring Correlated Partition Exclusion #2 December 20, 2009
Posted by mwidlake in performance, Uncategorized, VLDB.Tags: partitions, performance, VLDB
trackback
A few days ago {oh dear, it is now two weeks!} I showed how you could better control Correlated Partition Exclusion by creating a lookup table that showed, for each partition, the minimum and maximum ID (the partition key) and the minimum and maximum CRE_DATETIME (what you want to limit your query on). Using this range table in the way I described, you did not have to take an educated guess as to what range of IDs you used to include the partitions for the date range you were interested in, you could check the table.
But this only worked reliably where the ID and CRE_DATETIME increase directly in proportion to each other. It was a special case.
What about the more normal case, where there is a correlation between the two columns but it is not a perfect correlation? This would happen, for example, if the CRE_DATETIME is entered from another system, or from paper records, where the order of the records is not enforced. So some records from today get put into the system before some from yesterday. Or if the correlation is even loser than this. eg you are looking at orders for new customers. You “know” there are going to be no orders for these customers from 5 years ago but you are not sure how far back in the orders table you should go to find what you want.
You can still use the lookup table method. The lookup table in effect becomes a meta-index – an index of the segments where you will find data but not actually the rows.
To demonstrate this, I created a table where the ID and CRE_DATETIME increase in order:
create a partitioned table test_p4 (id number(10) not null ,cre_datetime date not null ,status number(1) not null ,num_1 number(4) not null -- random 20 ,num_2 number(4) -- random 500 ,num_3 number(5) -- cycle smoothly ,num_4 number(5) -- Random 10000 ,vc_1 varchar2(10) ,vc_2 varchar2(10) ,vc_pad varchar2(2000)) tablespace users partition by range (id) (partition id_01k values less than (1000) tablespace users ,partition id_02k values less than (2000) tablespace users ... ,partition id_45k values less than (45000) tablespace users ,partition id_max values less than (maxvalue) tablespace users ) -- -- local partitioned indexes on the table IND_NAME TAB_NAME PSN COL_NAME ------------------ ------------------ --------- ------------ TP4_CRE_DT TEST_P4 1 CRE_DATETIME TP4_PK TEST_P4 1 ID -- -- populate the table with data insert into test_p4(id,cre_datetime,status,num_1,num_2,num_3,num_4 ,vc_1,vc_2,vc_pad) select rownum ,to_date('01-JUL-2009','DD-MON-YYYY')+(rownum/360) ,decode(mod(rownum,100),0,1 ,0) ,trunc(dbms_random.value(1,20)) ,trunc(dbms_random.value(1,50)) ,mod(rownum,10)+1 ,trunc(dbms_random.value(1,10000)) ,dbms_random.string('U',5) ,lpad(chr(mod(rownum,6)+65),5,chr(mod(rownum,6)+65) ) ,lpad('A',1000,'A') from dba_objects where rownum <43000
I then messed with the data, updating 10% of records and setting the CRE_DATETIME to plus or minus a random amount up to 2 days different, so data in partitions would overlap. I then created a range table in the same way as I did for the previous post.
I ended up with a range table like the below:
MIN_CRE_DATI MAX_CRE_DATI MIN_ID MAX_ID ----------------- ----------------- ---------- ---------- 29-JUN-2009 20:35 05-JUL-2009 14:24 1 999 02-JUL-2009 04:50 07-JUL-2009 12:56 1000 1999 05-JUL-2009 02:34 10-JUL-2009 08:31 2000 2999 08-JUL-2009 05:23 13-JUL-2009 03:32 3000 3999 11-JUL-2009 08:07 15-JUL-2009 18:41 4000 4999 14-JUL-2009 00:14 18-JUL-2009 18:27 5000 5999 16-JUL-2009 08:58 21-JUL-2009 14:18 6000 6999 19-JUL-2009 01:28 24-JUL-2009 11:21 7000 7999 22-JUL-2009 08:02 27-JUL-2009 07:01 8000 8999 24-JUL-2009 22:06 30-JUL-2009 05:37 9000 9999 28-JUL-2009 04:59 01-AUG-2009 10:57 10000 10999 ... 24-SEP-2009 01:52 28-SEP-2009 18:36 31000 31999 26-SEP-2009 16:49 01-OCT-2009 01:26 32000 32999 29-SEP-2009 13:20 04-OCT-2009 13:43 33000 33999 02-OCT-2009 08:40 07-OCT-2009 10:11 34000 34999 05-OCT-2009 04:29 10-OCT-2009 04:09 35000 35999 08-OCT-2009 02:04 12-OCT-2009 17:34 36000 36999 10-OCT-2009 20:03 15-OCT-2009 08:39 37000 37999 13-OCT-2009 15:09 18-OCT-2009 12:01 38000 38999 16-OCT-2009 06:49 21-OCT-2009 03:53 39000 39999 18-OCT-2009 20:16 23-OCT-2009 21:01 40000 40999 21-OCT-2009 12:10 26-OCT-2009 07:13 41000 41999 25-OCT-2009 01:29 29-OCT-2009 20:56 42000 42999
You can see that the MIN_CRE_DATI-MAX_CRE_DATI from record to record (partition to partition) overlap but generally increase.
How do you find the start and end of the ID range to cover a date period you are interested in? I always have to sit down with a pen a paer to work this out. It is the classic “overlapping ranges” check, but my brain cannot hold on to it. So here goes. I want all records between the 1st of October and the 6th, this year.
You want to find the partition with the lowest ID range which has a record that falls into the date range you want. ie the Maximum CRE_DATETIME record is as late or later than the range you are interested in. If the maximum CRE_DATETIME is less than the date range you are interested in, no records will be in that partition.
Here is the code to find the lowest partition
select min(min_id) from tp4_range where MAX_CRE_DATI>=to_date('01-OCT-2009','DD-MON-YYYY') MIN(MIN_ID) ----------- 32000 Execution Plan ---------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost ---------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 12 | 2 | 1 | SORT AGGREGATE | | 1 | 12 | | 2 | TABLE ACCESS BY INDEX ROWID| TP4_RANGE | 12 | 144 | 2 |* 3 | INDEX RANGE SCAN | TP4R_MACD_MIAD | 12 | | 1 ----------------------------------------------------------
To cover the whole of the partition of interest you need to look for records with and I greater than the minimum in that partition, thus the selection of the min(min_id)
Similarly, you only want partitions where the minimum CRE_DATETIME is before the end of the date range you want. If all records in the partition have a CRE_DATETIME beyond the range, you are not interested in it.
select max(max_id) from tp4_range where MIN_CRE_DATI<=to_date('06-OCT-2009','DD-MON-YYYY') MAX(MAX_ID) ----------- 35999 Execution Plan ---------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost ---------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 13 | 2 | 1 | SORT AGGREGATE | | 1 | 13 | | 2 | TABLE ACCESS BY INDEX ROWID| TP4_RANGE | 37 | 481 | 2 |* 3 | INDEX RANGE SCAN | TP4R_MICD_MIID | 37 | | 1 ----------------------------------------------------------
Then you put these sub-queries into the query to select the data you want. Below I show the “original” code which does not use the partition range table and then using the partition range table, to prove the same number of records come back and to see the plan and statistics change:
select count(*) from test_p4 where cre_datetime between to_date('01-OCT-2009','DD-MON-YYYY') and to_date('06-OCT-2009','DD-MON-YYYY') and num_2 = 5 COUNT(*) ---------- 48 Execution Plan ------------------------------------------------------ | Id | Operation | Name | R ows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------ ------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 11 | 356 (1)| 00:00:02 | | | | 1 | SORT AGGREGATE | | 1 | 11 | | | | | | 2 | PARTITION RANGE ALL | | 36 | 396 | 356 (1)| 00:00:02 | 1 | 46 | |* 3 | TABLE ACCESS BY LOCAL INDEX ROWID| TEST_P4 | 36 | 396 | 356 (1)| 00:00:02 | 1 | 46 | |* 4 | INDEX RANGE SCAN | TP4_CRE_DT | 1788 | | 52 (0)| 00:00:01 | 1 | 46 | ------------------------------------------------------------ Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 676 consistent gets 0 physical reads
Note the 676 consistent gets and the Pstart/Pstop of 1-46
select count(*) from test_p4 where cre_datetime between to_date('01-OCT-2009','DD-MON-YYYY') and to_date('06-OCT-2009','DD-MON-YYYY') and id> (select min(min_id) from tp4_range where MAX_CRE_DATI >=to_date('01-OCT-2009','DD-MON-YYYY')) and id < (select max(max_id) from tp4_range where MIN_CRE_DATI <=to_date('06-OCT-2009','DD-MON-YYYY')) and num_2 = 5 COUNT(*) ---------- 48 Execution Plan ----------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------ ----------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 16 | 66 (5)| 00:00:01 | | | | 1 | SORT AGGREGATE | | 1 | 16 | | | | | | 2 | PARTITION RANGE ITERATOR | | 1 | 16 | 62 (5)| 00:00:01 | KEY | KEY | |* 3 | TABLE ACCESS BY LOCAL INDEX ROWID | TEST_P4 | 1 | 16 | 62 (5)| 00:00:01 | KEY | KEY | | 4 | BITMAP CONVERSION TO ROWIDS | | | | | | | | | 5 | BITMAP AND | | | | | | | | | 6 | BITMAP CONVERSION FROM ROWIDS | | | | | | | | | 7 | SORT ORDER BY | | | | | | | | |* 8 | INDEX RANGE SCAN | TP4_PK | 1788 | | 2 (0)| 00:00:01 | KEY | KEY | | 9 | SORT AGGREGATE | | 1 | 12 | | | | | | 10 | TABLE ACCESS BY INDEX ROWID| TP4_RANGE | 12 | 144 | 2 (0)| 00:00:01 | | | |* 11 | INDEX RANGE SCAN | TP4R_MACD_MIA D | 12 | | 1 (0)| 00:00:01 | | | | 12 | SORT AGGREGATE | | 1 | 13 | | | | | | 13 | TABLE ACCESS BY INDEX ROWID| TP4_RANGE | 37 | 481 | 2 (0)| 00:00:01 | | | |* 14 | INDEX RANGE SCAN | TP4R_MICD_MII D | 37 | | 1 (0)| 00:00:01 | | | | 15 | BITMAP CONVERSION FROM ROWIDS | | | | | | | | | 16 | SORT ORDER BY | | | | | | | | |* 17 | INDEX RANGE SCAN | TP4_CRE_DT | 1788 | | 52 (0)| 00:00:01 | KEY | KEY | ------------------------------------------------------------ Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 350 consistent gets 0 physical reads
The plan is admittedly more complex and the SQL can be tricky to understand if you are not used to code that looks for overlap of ranges. But Consistent gets is down to 350 and the Pstart/Pstop values are KEY-KEY. The CBO cannot tell you WHAT the ranges will be when the code is parsed {I think it should check but in 10.2.0.3 at least, the CBO is not that smart at parse time}, but it knows there will be a start and stop.
I am using tiny partitions for my example and only 45 of them for the table. When the partitions are for millions of rows and there are a couple of thousand of them, excluding partitions in a manner you can rely on is a powerful tool. Which leads onto a final word of caution.
Nothing is enforcing that the range table is maintained.
You could do things with triggers or regular re-calculation of the ranges table but this will be something you need to consider if you use ranges tables to help partition exclusion. Flipping tablespaces to read-only can help the worry go away though… 🙂
As an example of the issue of needing to maintain the ranges table and also a demonstration that the ranges table does work correctly if maintainted, I’ll update a record well outside of the “expected window”, show that it does not appear in my range-check controlled code, then update the ranges table and try again.
update test_p4
set cre_datetime=to_date(’02-OCT-2009 11:15′,’DD-MON-YYYY HH24:MI’)
,NUM_2=5
where id=22100
/
3> select count(*) from test_p4
where cre_datetime between to_date(’01-OCT-2009′,’DD-MON-YYYY’)
and to_date(’06-OCT-2009′,’DD-MON-YYYY’)
and id> (select min(min_id)
from tp4_range where MAX_CRE_DATI>=to_date(’01-OCT-2009′,’DD-MON-YYYY’))
and id < (select max(max_id)
from tp4_range where MIN_CRE_DATI (select min(min_id)
from tp4_range where MAX_CRE_DATI>=to_date(’01-OCT-2009′,’DD-MON-YYYY’))
and id < (select max(max_id)
from tp4_range where MIN_CRE_DATI<=to_date('06-OCT-2009','DD-MON-YYYY'))
and num_2 = 5
COUNT(*)
———-
49
— success
—
I wonder if I will do the next posting on this topic in less than two weeks!
This is the best that could be done in the direction of partition pruning but it will not spare the visit to each local root block of each local index of ‘cr_time’. When you have 2 000 partitions, the simplest query will requests 4000 gets before even start working (2k of each subquery. The (select max(id)…where MIN_CRE_DATI …) will have to visit each root block of each local index (partitions) on CRE_TIME.
You mentioned this in the first post of this series about inefficient partitioning. Up to now I could not circumvent this issue. Creating CRE_TIME as a global with 2 Tera of data to rebuild/scan every time a partition is added/dropped, is simply a non-affordable-solution.
An enhancement on index partition for correlate columns would be to create a global header segment for secondary local indexes that will store the high and low value of each local index segment, so that query on unique values would lead to immediate massive pruning and maintenance of this header in the add/drop partitions would be light. I wonder if there is not such a function in the DBMS_GATHER for local indexes.
[…] 3-Transforming queries for ensuring Partition Pruning? Martin Widlake- Ensuring Correlated Partition Exclusion #2 […]