jump to navigation

Ensuring Correlated Partition Exclusion December 7, 2009

Posted by mwidlake in performance, Uncategorized, VLDB.
Tags: , ,
9 comments

<Previous Post…Next Pos >
I’ve posted a couple of times recently about a tuning method with partition exclusion where you infer a relationship between the value you want to limit a query on and the partition key. It takes a while to explain the theory so I am going to give it a name, in the vain hope it catches on {forgive me if someone has already done this – in fact, just flame me with pointers if you already know a name for this}. At the very least, for my own postings, I can use this name from now on and link back to a single posting explaining it.

I’m going to call it Correlated Partition Exclusion. In essence, you have partitioned the table on a key, in my case the primary key ID, which is an ascending numeric probably sourced from a sequence.
You have a second column, in my case CRE_DATETIME, which increases in line with the PK. If you limit your query on the CRE_DATETIME partition exclusion is not possible as there is no guarantee which CRE_DATETIME values appear in which partition. But, as a human, you understand that if you create 10,000 records a day, if you want to look at the last week’s date you can use:

WHERE ID > MAX_ID-(7*10000)

to exclude partitions with an id more than 7 days worth ago.

So, you have your Correlated Partition Exclusion.

How can you be sure that going back 70,000 IDs is going to safely cover one week of data and how can you maximise your efficiency of including only partitions that cover the date range? {note, I am using a numeric ID as my partition range key and a datetime as my correlated column, this principle works just as well if you partition on datetime and want to correlate to a (generally) ascending numeric key}

Here is my test table :-

create table test_p3
(id number(10) not null
,cre_datetime date not null
,status number(1) not null
,num_1 number(4) not null -- random 20
,num_2 number(4) -- random 500
,num_3 number(5) -- cycle smoothly
,num_4 number(5) -- Random 10000
,vc_1 varchar2(10)
,vc_2 varchar2(10)
,vc_pad varchar2(2000))
tablespace users
partition by range (id)
(partition id_01k values less than (1000)
tablespace users
,partition id_02k values less than (2000)
tablespace users
,partition id_03k values less than (3000)
tablespace users
,partition id_04k values less than (4000)
tablespace users
...
,partition id_45k values less than (45000)
tablespace users
--
,partition id_max values less than (maxvalue)
tablespace users
)
/
--@ind_cols
IND_NAME TAB_NAME PSN COL_NAME
------------------ ------------------ --------- --------------------
TP3_PK TEST_P3 1 ID

TP_CRE_DT TEST_P3 1 CRE_DATETIME

If I want to look for all records between two dates I could use code like the below (based in that suggested by Bernard Polarski, any mistakes are mine).

with get_min_id as
(select max(id) min_id from test_p3
where cre_datetime >= TO_DATE('18-OCT_2009','DD-MON-YYYY') )
,get_max_id as
(select min(id) max_id from test_p3
where cre_datetime <= TO_DATE('20-OCT_2009','DD-MON-YYYY') )
select count(*)
from test_p3
,get_min_id
,get_max_id
where id >get_min_id.min_id
and id < get_max_id.max_id
and cre_datetime between TO_DATE('18-OCT_2009','DD-MON-YYYY')
and TO_DATE('20-OCT_2009','DD-MON-YYYY')
/
COUNT(*)
----------
721

1 row selected.

ie find the minimum ID for the start date and the maximum ID for the end date and query between them.

This works fine. Unfortunately, you get a plan like the below

-------------------------------------------------------
| Id | Operation | Name |
Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------
-------------------------------------------------------
| 0 | SELECT STATEMENT | |
1 | 39 | 887 (4)| 00:00:04 | | |
| 1 | SORT AGGREGATE | |
1 | 39 | | | | |
| 2 | NESTED LOOPS | |
2 | 78 | 887 (4)| 00:00:04 | | |
| 3 | NESTED LOOPS | |
1 | 26 | 717 (5)| 00:00:03 | | |
| 4 | VIEW | |
1 | 13 | 505 (5)| 00:00:02 | | |
|* 5 | FILTER | |
| | | | | |
| 6 | SORT AGGREGATE | |
1 | 26 | | | | |
|* 7 | VIEW | index$_join$_001 |
38393 | 974K| 505 (5)| 00:00:02 | | |
|* 8 | HASH JOIN | |
| | | | | |
| 9 | PARTITION RANGE ALL | |
38393 | 974K| 418 (6)| 00:00:02 | 1 | 46 |
|* 10 | INDEX RANGE SCAN | TP_CRE_DT |
38393 | 974K| 418 (6)| 00:00:02 | 1 | 46 |
| 11 | PARTITION RANGE ALL | |
38393 | 974K| 152 (4)| 00:00:01 | 1 | 46 |
| 12 | INDEX FAST FULL SCAN | TP3_PK |
38393 | 974K| 152 (4)| 00:00:01 | 1 | 46 |
| 13 | VIEW | |
1 | 13 | 212 (5)| 00:00:01 | | |
| 14 | SORT AGGREGATE | |
1 | 26 | | | | |
|* 15 | VIEW | index$_join$_002 |
2972 | 77272 | 212 (5)| 00:00:01 | | |
|* 16 | HASH JOIN | |
| | | | | |
| 17 | PARTITION RANGE ALL | |
2972 | 77272 | 59 (6)| 00:00:01 | 1 | 46 |
|* 18 | INDEX RANGE SCAN | TP_CRE_DT |
2972 | 77272 | 59 (6)| 00:00:01 | 1 | 46 |
| 19 | PARTITION RANGE ALL | |
2972 | 77272 | 152 (4)| 00:00:01 | 1 | 46 |
| 20 | INDEX FAST FULL SCAN | TP3_PK |
2972 | 77272 | 152 (4)| 00:00:01 | 1 | 46 |
| 21 | PARTITION RANGE ITERATOR | |
2 | 26 | 170 (1)| 00:00:01 | KEY | KEY |
|* 22 | TABLE ACCESS BY LOCAL INDEX ROWID| TEST_P3 |
2 | 26 | 170 (1)| 00:00:01 | KEY | KEY |
|* 23 | INDEX RANGE SCAN | TP_CRE_DT |
707 | | 48 (0)| 00:00:01 | KEY | KEY |
------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
892 consistent gets
0 physical reads

If you look at the plan {towards the end, and sorry about the naff layout, my blog is not wide enough for this sort of printout} there are two checks on the TP_CRE_DT indexes that scan all partitions of the index – Pstart/Pstop are 1-46. This is the CBO looking for the partitions where the stated CRE_DATETIME records occur in the whole table. Cost is 892 consistent gets, much of which is the checking of local index partitions that will hold no relevant entries.

Bernard also spotted that the code was considering all partitions and was not as efficient as it could be but did not know how to get around it except with a Global index, which brings it’s own issues.

One way is to just say “well, I know how many records per day I get so I will fake up the ID limits based on this”. The problem is, and this is why the CBO cannot make the decision for you, is that there is no guarantee, no fixed rule, saying that CRE_DATETIME will always increment with the ID. In fact, there is nothing stopping you altering a record which has an ID from yesterday having a CRE_DATETIME from 10 years ago {but forget I said that until tomorrow’s post}. Now, in my example the CRE_DATETIME is going to increment with ID. We will use this special case for now, I know it is flawed and will address that flaw {tomorrow}.

So to ensure yo do not miss data you end up making your ID window pretty large to endure you do not miss records. Say you want all records for the last day, you process 1500 records a day, so you consider a window covering all samples with an ID within the last 10,000. It will still be more efficient than scanning all partitions and should be safe. Fairly safe.

The way out of this is to have a ranges table. Something that tells you, for each partition, which is the maximum and minimum CRE_DATE and the IDs covered by that range. You can then use that to identify the partitions that cover the date range you are interested in.

Here is an example. I will create a simple table to hold the ranges:

create table tp3_range
(min_cre_dati date
,max_cre_dati date
,min_id number
,max_id number)

Now you have to populate it.
The below code will create the first record for the first partition.

insert into tp3_range
SELECT MIN(CRE_DATETIME)
,MAX(CRE_DATETIME)
,MIN(ID)
,MAX(ID) FROM TEST_P3 PARTITION (ID_01K)
/
Execution Plan
----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (
%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------
| 0 | INSERT STATEMENT | | 1 | 12 | 102
(1)| 00:00:01 | | |
| 1 | SORT AGGREGATE | | 1 | 12 |
| | | |
| 2 | PARTITION RANGE SINGLE| | 999 | 11988 | 102
(1)| 00:00:01 | 1 | 1 |
| 3 | TABLE ACCESS FULL | TEST_P3 | 999 | 11988 | 102
(1)| 00:00:01 | 1 | 1 |
------------------------------------------------------------------

Statistics
----------------------------------------------------------
1 recursive calls
3 db block gets
195 consistent gets
0 physical reads

As you can see from the plan and cost, this is not very efficient as it has to scan the whole partition. Maybe not a problem if you do this once, but there are indexes on both these columns, can’t this be done more efficiently? Yes, if you split up the code into four in-line selects (if you want more details about it being more performant to do MIN and MAX on their own than in one statement then see this post on the topic ):

insert into tp3_range
SELECT (SELECT MIN(CRE_DATETIME) FROM TEST_P3 PARTITION (ID_04K))
,(SELECT MAX(CRE_DATETIME) FROM TEST_P3 PARTITION (ID_04K))
,(SELECT MIN(id ) FROM TEST_P3 PARTITION (ID_04K))
,(SELECT MAX(ID ) FROM TEST_P3 PARTITION (ID_04K))
FROM DUAL

PLAN
---------------------------------------------------------
SORT AGGREGATE
PARTITION RANGE SINGLE cst:2 rws:1000
INDEX FULL SCAN (MIN/MAX) TP_CRE_DT cst:2 rws:1000
Statistics
----------------------------------------------------------
1 recursive calls
5 db block gets
9 consistent gets
0 physical reads

{You may wonder why the Explain Plan section above has a different look. This is because there seems to be a bug in 10.2.0.3 where the autotrace plan for the insert statement comes out wrong, as a FAST DUAL access, so I had to Explain the statement in a different way}

You would run one of the above statements against each partition. Probably, the “Best Practice” way would be to generate a SQL script from a query against DBA_TAB_PARTITIONS.

To populate my table I cheated – I just assumed all my partitions are of the same size (1000 records) and used:-
insert into tp3_range
select min(cre_datetime),max(cre_datetime),min(id),max(id)
from test_p3
— where id between 10000 and 20000
group by trunc(id/1000)
/

Note I left in a commented line. You could run the above against the whole table and then limit it to just new partitions as you add them. This is a tad risky though, you are relying on the partitioning being perfect and it would not scale to hundreds of very large partitions. You would be better off with the partition-by-partition methods above.

You end up with a table like the below:-

select * from tp3_range order by min_cre_dati


MIN_CRE_DATI MAX_CRE_DATI MIN_ID MAX_ID
----------------- ----------------- ---------- ----------
01-JUL-2009 00:04 03-JUL-2009 18:36 1 999
03-JUL-2009 18:40 06-JUL-2009 13:16 1000 1999
06-JUL-2009 13:20 09-JUL-2009 07:56 2000 2999
09-JUL-2009 08:00 12-JUL-2009 02:36 3000 3999
12-JUL-2009 02:40 14-JUL-2009 21:16 4000 4999
14-JUL-2009 21:20 17-JUL-2009 15:56 5000 5999
17-JUL-2009 16:00 20-JUL-2009 10:36 6000 6999
20-JUL-2009 10:40 23-JUL-2009 05:16 7000 7999
23-JUL-2009 05:20 25-JUL-2009 23:56 8000 8999
26-JUL-2009 00:00 28-JUL-2009 18:36 9000 9999
28-JUL-2009 18:40 31-JUL-2009 13:16 10000 10999
31-JUL-2009 13:20 03-AUG-2009 07:56 11000 11999
...
14-OCT-2009 13:20 17-OCT-2009 07:56 38000 38999
17-OCT-2009 08:00 20-OCT-2009 02:36 39000 39999
20-OCT-2009 02:40 22-OCT-2009 21:16 40000 40999
22-OCT-2009 21:20 25-OCT-2009 15:56 41000 41999
25-OCT-2009 16:00 28-OCT-2009 10:36 42000 42999

add the two below indexes:
create index tp3r_micd_miid on tp3_range(min_cre_dati,min_id);
create index tp3r_macd_miad on tp3_range(max_cre_dati,max_id);

And now you can find the upper and lower ID bounds for a date range with the following code:

select min_id from tp3_range
where min_cre_dati = (select max(min_cre_dati)
from tp3_range
where min_cre_dati <TO_DATE('18-OCT_2009','DD-MON-YYYY')
)
MIN_ID
----------
39000

Execution Plan
---------------------------------------------------------
| Id  | Operation                     | Name           | Rows  | Bytes | Cost 
---------------------------------------------------------
|   0 | SELECT STATEMENT              |                |     1 |    22 |     2
|*  1 |  INDEX RANGE SCAN             | TP3R_MICD_MIID |     1 |    22 |     1
|   2 |   SORT AGGREGATE              |                |     1 |     9 |        
|   3 |    FIRST ROW                  |                |    40 |   360 |     1
|*  4 |     INDEX RANGE SCAN (MIN/MAX)| TP3R_MICD_MIID |    40 |   360 |     1
---------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads

select max_id from tp3_range
where max_cre_dati = (select min(max_cre_dati)
from tp3_range
where max_cre_dati >to_date('20-OCT_2009','DD-MON-YYYY')
)

MAX_ID
----------
39999

Execution Plan
----------------------------------------------------------
| Id  | Operation                     | Name           | Rows  | Bytes | Cost 
----------------------------------------------------------
|   0 | SELECT STATEMENT              |                |     1 |    22 |     2
|*  1 |  INDEX RANGE SCAN             | TP3R_MACD_MIAD |     1 |    22 |     1
|   2 |   SORT AGGREGATE              |                |     1 |     9 |        
|   3 |    FIRST ROW                  |                |     4 |    36 |     1
|*  4 |     INDEX RANGE SCAN (MIN/MAX)| TP3R_MACD_MIAD |     4 |    36 |     1
----------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3 consistent gets
0 physical reads

Put it all together into one statement and let’s see how much it costs:

with get_min_id as
(select min_id from tp3_range
where min_cre_dati = (select max(min_cre_dati)
from tp3_range
where min_cre_dati <TO_DATE('18-OCT_2009','DD-MON-YYYY')
) )
,get_max_id as
(select max_id from tp3_range
where max_cre_dati = (select min(max_cre_dati)
from tp3_range
where max_cre_dati >to_date('20-OCT_2009','DD-MON-YYYY')
) )
select count(*)
from test_p3
,get_min_id
,get_max_id
where id >get_min_id.min_id
and id < get_max_id.max_id
and cre_datetime between TO_DATE('18-OCT_2009','DD-MON-YYYY')
and TO_DATE('20-OCT_2009','DD-MON-YYYY')

COUNT(*)
----------
721
1 row selected.

Execution Plan
----------------------------------------------------------
| Id | Operation | Name | Row
s | Bytes | Cost (%CPU)| Time | Pstart| Pstop |
------------------------------------------------------------------
----------------------------------------------------
| 0 | SELECT STATEMENT | |
1 | 57 | 67 (5)| 00:00:01 | | |
| 1 | SORT AGGREGATE | |
1 | 57 | | | | |
| 2 | TABLE ACCESS BY LOCAL INDEX ROWID | TEST_P3 |
2 | 26 | 65 (5)| 00:00:01 | | |
| 3 | NESTED LOOPS | |
2 | 114 | 65 (5)| 00:00:01 | | |
| 4 | MERGE JOIN CARTESIAN | |
1 | 44 | 2 (0)| 00:00:01 | | |
|* 5 | INDEX RANGE SCAN | TP3R_MICD_MIID |
1 | 22 | 1 (0)| 00:00:01 | | |
| 6 | SORT AGGREGATE | |
1 | 9 | | | | |
| 7 | FIRST ROW | |
40 | 360 | 1 (0)| 00:00:01 | | |
|* 8 | INDEX RANGE SCAN (MIN/MAX) | TP3R_MICD_MIID |
40 | 360 | 1 (0)| 00:00:01 | | |
| 9 | BUFFER SORT | |
1 | 22 | 1 (0)| 00:00:01 | | |
|* 10 | INDEX RANGE SCAN | TP3R_MACD_MIAD |
1 | 22 | 1 (0)| 00:00:01 | | |
| 11 | SORT AGGREGATE | |
1 | 9 | | | | |
| 12 | FIRST ROW | |
4 | 36 | 1 (0)| 00:00:01 | | |
|* 13 | INDEX RANGE SCAN (MIN/MAX) | TP3R_MACD_MIAD |
4 | 36 | 1 (0)| 00:00:01 | | |
| 14 | PARTITION RANGE ITERATOR | |
| | | | KEY | KEY |
| 15 | BITMAP CONVERSION TO ROWIDS | |
| | | | | |
| 16 | BITMAP AND | |
| | | | | |
| 17 | BITMAP CONVERSION FROM ROWIDS| |
| | | | | |
| 18 | SORT ORDER BY | |
| | | | | |
|* 19 | INDEX RANGE SCAN | TP3_PK | 7
07 | | 6 (0)| 00:00:01 | KEY | KEY |
| 20 | BITMAP CONVERSION FROM ROWIDS| |
| | | | | |
| 21 | SORT ORDER BY | |
| | | | | |
|* 22 | INDEX RANGE SCAN | TP_CRE_DT | 7
07 | | 48 (0)| 00:00:01 | KEY | KEY |
------------------------------------------------------------------


Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
133 consistent gets
0 physical reads

If you look way back up this post, you will see that the number of records selected by the above is the same as from the code indicated by Bernard (721 records) and for a lot less cost – 133 consistent gets compared to 892.

So I have hopefully explained the issue (having to visit all the index partitions to check the date range), shown how a ranges table can help, given some simple and a less-simple-but-more-efficient examples of code to populate the table and finally shown that using the range table can be more efficient.

And I arrogantly gave it a name – Correlated Partition Exclusion :-)

Some of you may remember tha this example has been a special case, as there is no overlap between the dates; the relationship between the CRE_DATETIME and ID is perfect. You are unlikely to have that in real life. Also, dome of you may also be tired of reading this post. So I will cover the general case in the NEXT post.

Follow

Get every new post delivered to your Inbox.

Join 171 other followers