jump to navigation

More on Assisting Partition Exclusion December 6, 2009

Posted by mwidlake in performance, VLDB.
Tags: , ,

A couple of weeks ago I posted about how you could help the oracle CBO carry out partition exclusion and thus aid performance. In essence, you have a table partitioned on ID (based on an ascending numeric) and you also have a date column (CRE_DATETIME, indexed using a locally partitioned index), which you often want to limit queries on.
The ID and the CRE_DATETIME both increase over time, so as a human you can infer that a date created range will match to an ID range that you can limit any queries on and get partition exclusion. The CBO does not understand this relationship and so can’t help. If you want to look at records for the last week and you know that you generate 10,000 records a day maximum, you can add a where clause of {and this is pseudocode}
I’m afraid you will have to look back at the referenced post if this does not make sense.

The issue I want to address is that how can you be sure that you are using a fake ID range to cover the date range?

A simple real-world example of this is when someone (a business objects user or helpdesk user seems to be the usual source) wants to look up the details of a fairly new customer on the datawarehouse. They don’t know the ID of the record, but they know the new account was created this week and some other filtering data. So they run a query saying:


This visits every partition in the table, even if there is a locally partitioned index on CRE_DATETIME. See this post for details of this problem. If you are really unlucky, and this seems to be the usual situation, there is no index on the CRE_DATETIME either, and a full scan of the billion-row table is initiated, tying up one CPU and giving that expansive {Sorry, type, expensive – though both are true) storage sub-system something to work on for a few hours.

However, in this situation, do this. Select max(ID) {the primary key} from the partitioned table, which will take less time than it does to type “select”. Then you say to the user:

 “OK, the max ID is currently 987,000,000. Add this to the WHERE clause:

AND ID > 980000000

If that does not find your record now change that additional clause to:

AND ID BETWEEN 970000000 AND 980000000

and just keep going down by 10 million each time.”

By doing this, the user will be limiting the query on the partition key, the ID, and partition exclusion will be possible and each query will come back quickly. The user will usually find the record they want in the first couple of attempts {or will be kept quiet and less demanding of the system until they get bored and come back to tell you “you are an idiot”, having scanned back over several 10’s of millions of records, but then you know they lied about the record being recent so you need to help them in some other way}.

This post is actually to remind you of where I had got on this thread, my being so unforgivably slow in keeping this thread running. Tomorrow (HONEST!) I will cover HOW you ensure your inferred ID range is covering the CRE_DATETIME range you are interested in.



1. Bernard Polarski - December 7, 2009

Tomorrow, promised, I will be there. I wonder whether in 11g there are some stuff you can define for correlated fields.

mwidlake - December 7, 2009

Just for you Bernard, I got out my sick bed and finished off that promised post!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: