jump to navigation

Assisting Partition Exclusion – Partitions for Performance November 23, 2009

Posted by mwidlake in performance, VLDB.
Tags: , ,
10 comments

<Previous PostNext Post>
Partitions are often used to help increase the performance of SQL select activity via the concept of partition exclusion – the CBO will be able to identify which of the partitions could hold the data of interest and ignore the others. This is their main benefit from a SQL Select performance perspective.

The problem is, the partitioning key is not always included in your SQL statements.

I discussed this potential negative impact of partitioning on SQL select performance in this post and this one also. I did not give any proof then, so the first half of this post provides that.

As an example, I have created a table with 45 partitions, partitioned on the column ID, an ascending numeric primary key (traditionally sourced from a sequence). There is an index on the ID column, locally partitioned. The table also has a column CRE_DATETIME, which is the date the record was created. This column is also indexed with a locally partitioned index.

NAME                           VALUE
------------------------------ -------------
compatible                     10.2.0.3.0
cpu_count                      8
db_block_size                  8192
db_file_multiblock_read_count  16
optimizer_mode                 ALL_ROWS
sga_target                     0
sort_area_size                 65536

create table test_p3
(id           number(10) not null
,cre_datetime date not null
,status        number(1) not null
,num_1         number(4) not null -- random 20
,num_2         number(4) -- random 500
,num_3         number(5) -- cycle smoothly
,num_4         number(5) -- Random 10000
,vc_1          varchar2(10)
,vc_2          varchar2(10)
,vc_pad        varchar2(2000))
tablespace users
partition by range (id)
(partition id_01k values less than (1000)
tablespace users
,partition id_02k values less than (2000)
tablespace users
,partition id_03k values less than (3000)
tablespace users
...
,partition id_45k values less than (45000)
tablespace users
--
,partition id_max values less than (maxvalue)
tablespace users
)
/
select table_name,partitioning_type p_type
,partition_count pt_count
from dba_part_tables
where owner=user and table_name = 'TEST_P3'
TABLE_NAME                     P_TYPE    PT_COUNT
------------------------------ ------- ----------
TEST_P3                        RANGE           46

>@ind_cols
IND_NAME           TAB_NAME           PSN       COL_NAME
------------------ ------------------ --------- ------------
TP3_PK             TEST_P3            1         ID
TP_CRE_DT          TEST_P3            1         CRE_DATETIME

{This is a very typical senario, as is the opposite situation where the table is partitioned on date but has a numeric column holding eg order number or customer ID.}

The below shows a select of a single record by ID.

select id,cre_datetime,num_1,vc_1
from test_p3 where id=37123
/
        ID CRE_DATETIME           NUM_1 VC_1
---------- ----------------- ---------- ----------
     37123 12-OCT-2009 02:52         19 RMZAN
1 row selected.

Execution Plan
----------------------------------------------------------
----------------------------------------------
| Id  | Operation                          | Name    | Rows  | 
Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------
----------------------------------------------
|   0 | SELECT STATEMENT                   |         |     1 |
  22 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE            |         |     1 |
  22 |     2   (0)| 00:00:01 |    38 |    38 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| TEST_P3 |     1 |
  22 |     2   (0)| 00:00:01 |    38 |    38 |
|*  3 |    INDEX UNIQUE SCAN               | TP3_PK  |     1 |
     |     1   (0)| 00:00:01 |    38 |    38 |
----------------------------------------------------------------
----------------------------------------------
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads

Note the PARTITION RANGE SINGLE and pstart/pstop are both 38. Consistent gets are only 3. The query is very efficient as the CBO identified that records matching the WHERE clause could only exist in the one partition. The relevant local index is examined for 2 consisted gets (the BLEVEL of the local index partitions is 1) and then one consistent get against the table.

Now I’ll scan for records for a range of IDs:

select id,cre_datetime,num_1,vc_1
from test_p3 where id between 30980 and 31019
/
        ID CRE_DATETIME           NUM_1 VC_1
---------- ----------------- ---------- ----------
     30980 25-SEP-2009 01:20         15 JQLYA
     30981 25-SEP-2009 01:24          5 WYWCR
     30982 25-SEP-2009 01:28         10 QZSHD
     30983 25-SEP-2009 01:32         18 KQBSU
     30984 25-SEP-2009 01:36          2 HRPMA
....
     31018 25-SEP-2009 03:52          3 YOPJO
     31019 25-SEP-2009 03:56         10 GNGKG

40 rows selected.

Execution Plan
----------------------------------------------------------
----------------------------------------------
| Id  | Operation                          | Name    | Rows  | 
Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------
----------------------------------------------
|   0 | SELECT STATEMENT                   |         |    40 |
 880 |     9   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR          |         |    40 |
 880 |     9   (0)| 00:00:01 |    31 |    32 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| TEST_P3 |    40 |
 880 |     9   (0)| 00:00:01 |    31 |    32 |
|*  3 |    INDEX RANGE SCAN                | TP3_PK  |     2 |
     |     2   (0)| 00:00:01 |    31 |    32 |
----------------------------------------------------------------

Statistics
---------------------------------------------
          0  recursive calls
          0  db block gets
         52  consistent gets
          0  physical reads

Again, you can see partition exclusion, this time as a PARTITION RANGE ITERATOR and Pstart/Pstop of 31 and 32. Consistent gets were higher, but I did fetch several records.

Now a query is issued for a record with a specific value for CRE_DATETIME.

select id,cre_datetime,num_1,vc_1
from test_p3 where cre_datetime = 
   to_date('06-OCT-2009 12:20','DD-MON-YYYY HH24:MI')
/
        ID CRE_DATETIME           NUM_1 VC_1
---------- ----------------- ---------- ----------
     35105 06-OCT-2009 12:20         11 JPQHO
1 row selected.

Execution Plan
----------------------------------------------------------
------------------------------------------------
| Id  | Operation                          | Name      | Rows  |
 Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------
------------------------------------------------
|   0 | SELECT STATEMENT                   |           |     1 |
    22 |    48   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ALL               |           |     1 |
    22 |    48   (0)| 00:00:01 |     1 |    46 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| TEST_P3   |     1 |
    22 |    48   (0)| 00:00:01 |     1 |    46 |
|*  3 |    INDEX RANGE SCAN                | TP_CRE_DT |     1 |
       |    47   (0)| 00:00:01 |     1 |    46 |
----------------------------------------------------------------

Statistics
---------------------------------------
          1  recursive calls
          0  db block gets
         90  consistent gets
          0  physical reads

Notice that partition exclusion does not occur, you see PARTITION RANGE ALL, Pstart/Pstop of 1 – 46. The CBO cannot identify which partitions may or may not hold records with that CRE_DATETIME. Consistent gets are now a lot higher, 90 (as compared to 3 for the situation where partition exclusion can occur}, as oracle has to prob each and every local index partition to identify if any records for the given value exist.

Of course, you probably would not look for a record with a specific datetime, but in the opposite senario of having partitioned on datetime, it would not be at all unusual to look for a record with a given ID and partition exclusion not being possible.

As I mentioned earlier, this need to scan all of the local indexes because no partition exclusion is possible is something I discussed a couple of weeks back.

Now I will select for the range of dates that my range scan in ID identified:

select id,cre_datetime,num_1,vc_1
from test_p3 
where cre_datetime between to_date('25-SEP-2009 01:20','DD-MON-YYYY HH24:MI')
                   and     to_date('25-SEP-2009 03:56','DD-MON-YYYY HH24:MI')

        ID CRE_DATETIME           NUM_1 VC_1
---------- ----------------- ---------- ----------
     30980 25-SEP-2009 01:20          3 BGAFO
     30981 25-SEP-2009 01:24         15 PXGJD
     30982 25-SEP-2009 01:28         12 PGQHJ
     30983 25-SEP-2009 01:32         17 TIBZG
     30984 25-SEP-2009 01:36         11 EQQQV
...
     31018 25-SEP-2009 03:52         18 FSNVI
     31019 25-SEP-2009 03:56         16 LJWNO
40 rows selected.

Execution Plan
----------------------------------------------------------
------------------------------------------------
| Id  | Operation                          | Name      | Rows  |
 Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------
------------------------------------------------
|   0 | SELECT STATEMENT                   |           |    42 |
   924 |    54   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ALL               |           |    42 |
   924 |    54   (0)| 00:00:01 |     1 |    46 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| TEST_P3   |    42 |
   924 |    54   (0)| 00:00:01 |     1 |    46 |
|*  3 |    INDEX RANGE SCAN                | TP_CRE_DT |    42 |
       |    47   (0)| 00:00:01 |     1 |    46 |
----------------------------------------------------------------
------------------------------------------------

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        136  consistent gets
          0  physical reads

That actually selects back the same records, but now a PARTITION RANGE ALL is invoked, Pstart/Pstop os 1-46 and the number of consistent gets goes up to 136.

I discussed the impact of partitions on range scans in this second post about performance issues with partitions.

Thankfully, you can often alleviate this problem of considering all partitions.

The below shows a scan for records for a range of dates.

select count(*)
from test_p3
where cre_datetime between to_date('18-OCT-2009','DD-MON-YYYY')
                   and     to_date('20-OCT-2009','DD-MON-YYYY')

  COUNT(*)
----------
       721
1 row selected.

Execution Plan
----------------------------------------------------------
----------------------------------
| Id  | Operation            | Name      | Rows  | Bytes | Cost
(%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------
----------------------------------
|   0 | SELECT STATEMENT     |           |     1 |     8 |    49
   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE      |           |     1 |     8 |
      |          |       |       |
|   2 |   PARTITION RANGE ALL|           |   731 |  5848 |    49
   (0)| 00:00:01 |     1 |    46 |
|*  3 |    INDEX RANGE SCAN  | TP_CRE_DT |   731 |  5848 |    49
   (0)| 00:00:01 |     1 |    46 |
----------------------------------------------------------------
----------------------------------

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         90  consistent gets
          0  physical reads

As you can see, no partitiion exclusion is possible, every partition is considered (PARTITION RANGE ALL, Pstart/Pstop of 1-46). 90 consistent gets are required.

The trick is to introduce a second WHERE clause, limiting the query by the partitioning key as well.
As a human, you can recognise that the ID and the CRE_DATETIME columns are going to increase pretty much in synchronisation. Records with a CRE_DATETIME a month ago are going to be found in partitions with a lower ID range than those from a week ago. Oracle does not know this business logic, so you have to tell it.

Let us say you never have more than 1,000 records created a day. so if you want “today’s” data, you need only consider IDs that are between the max(ID) and the max(ID) less 1,000. To be safe, you would increase this range substantially. The idea is to exclude most partitions without risking missing data:

select count(*)
from test_p3
where cre_datetime between to_date('18-OCT-2009','DD-MON-YYYY')
                   and     to_date('20-OCT-2009','DD-MON-YYYY')
and  id between 39000 and 41000

  COUNT(*)
----------
       721
1 row selected.

Execution Plan
----------------------------------------------------------
------------------------------------------------
| Id  | Operation                   | Name             | Rows  |
 Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------
------------------------------------------------
|   0 | SELECT STATEMENT            |                  |     1 |
    13 |    14   (8)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE             |                  |     1 |
    13 |            |          |       |       |
|*  2 |   VIEW                      | index$_join$_001 |    34 |
   442 |    14   (8)| 00:00:01 |       |       |
|*  3 |    HASH JOIN                |                  |       |
       |            |          |       |       |
|   4 |     PARTITION RANGE ITERATOR|                  |    34 |
   442 |     7  (15)| 00:00:01 |    40 |    42 |
|*  5 |      INDEX RANGE SCAN       | TP_CRE_DT        |    34 |
   442 |     7  (15)| 00:00:01 |    40 |    42 |
|   6 |     PARTITION RANGE ITERATOR|                  |    34 |
   442 |     9  (23)| 00:00:01 |    40 |    42 |
|*  7 |      INDEX RANGE SCAN       | TP3_PK           |    34 |
   442 |     9  (23)| 00:00:01 |    40 |    42 |
----------------------------------------------------------------
------------------------------------------------

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         18  consistent gets
          0  physical reads

In the above I have managed to exclude most of the partitions and still get the correct data, by adding the clause and id between 39000 and 41000 . There is something a little unusal {if you have not seen it before} with the plan being scans of two indexes and then hashed together and viewed, but both are PARTITION RANGE ITERATOR scans and Pstart/Pstop values are 40-42. Consistent gets are down to 18.

You do have to take great care with this method that the extra WHERE clause you add on the partitioning key will never exclude records you want to find. But so long as you do, it is a very powerful technique.

I’ll go into that a little more in the next posting {which hopefully will not be as delayed as this one!}

Follow

Get every new post delivered to your Inbox.

Join 161 other followers