jump to navigation

IOT part 3 – Significantly Reducing IO August 2, 2011

Posted by mwidlake in development, performance.
Tags: , , , ,
trackback

<..IOT1 – the basics
<….IOT2 – Examples and proofs
……>IOT4 – Boosting Buffer Cache Efficiency
……..>IOT5 – Primary Key issues
……….>IOT6a – Slowing Down Insert
…………>IOT6(B) – OLTP Inserts

In the previous two posts I covered the basics of Index Organized Tables (IOTs) and then created some test tables to demonstrate the benefit of IOTs that is most often covered – reducing the IO needed to get a single record by one IO, say from 5 to 4. {Whether this is a buffer get from memory or a disc IO depends on if the block is cached, of course}.

In this post I am going to show how IOTs can far more significantly reduce the IO when several related rows are required.

Below is one of my test tables, the one that is a normal heap table and has a primary key, CHHE_PK on PARE_ID and CRE_DATE:

mdw11> desc child_heap
 Name                                      Null?    Type
 ----------------------------------------- -------- --------------
 PARE_ID                                   NOT NULL NUMBER(10)
 CRE_DATE                                  NOT NULL DATE
 VC_1                                      NOT NULL VARCHAR2(100)
 DATE_1                                             DATE
 NUM_1                                              NUMBER(2)
 NUM_2                                              NUMBER(2)

--
mdw11> select count(*),count(distinct(pare_id)) from child_heap

  COUNT(*) COUNT(DISTINCT(PARE_ID))
---------- ------------------------
   1000000                     9999

As you can see, the table has 1 million records and 9,999 values for PARE_ID, there are approx 100 records per parent. The data was created to match a common situation – that of a bit of data coming in for each parent every day. See post 2 for details.

The result of this is that the data for any given parent is scattered through the table. As the data comes in for a given day, the data for the first parent is added to the end of the table, followed by all the data for all the other parents who have data that day. The next day this is repeated, so the child records for a given parent are interspersed with the child records for many other parents.

The below diagram demonstrate what will now happen if you issue a statement like
select *
from CHILD_HEAP
where PARE_ID=12

Oracle quickly works down the index to the leaf block containing the first key that matches the range. This takes, in my example, 4 block reads. Oracle now works through the index entries and, via the rowid, identifies the exact block to visit in the table for each key. For each key it has to visit a new block – because the data is scattered through the table. This is what the clustering_factor in the index statistics is measuring, how often contiguous rows in the index are for the same block. In our case, almost never.
In my diagram I do not number those table reads but in my simplistic diagram it would be 10 further reads.
If Oracle reaches the end of the leaf block before it reaches the end of the range of key values, oracle follows the pointer in the leaf block (not shown) to the next leaf block (whcih is another block read) and continues working through the keys until the range scan is completed.

In my simplified diagram I only have 6 entries per leaf block. In reality, and in my example tables, this is more like a few hundred. 247 in the case of CHHE_PK.

Now let’s consider my Index Organized Table, CHILD_IOT. It has exactly the same columns as CHILD_HEAP and the data was created in the same way. However, because it is an IOT, as the data came in it was inserted into the primary key index and is thus in an ordered state.

The below diagram demonstrate what will now happen if you issue a statement like
select *
from CHILD_IOT
where PARE_ID=12

Oracle works down the index to the leaf block where the range scan begins and now simply works along the leaf blocks. There is no need to go and visit the table as there is no table.

In my IOT diagram the leaf entries are longer and there are fewer in each leaf block, ie 5. So my scan has to visit 3 leaf blocks rather than 2. In reality the difference is more pronounced, in my example table there are actually 56 rows per leaf block, compared to the 247 in the index on the heap table. As such, my scan on the IOT will cover more leaf blocks but this is insignificant compared to the reduction in block visits caused by not having to go hunt down records scattered over the table. Even in the unlikely event of my IOT being deeper by 1 level (an extra layer of branch blocks) due to the reduces entries per leaf block, I would still be winning for range scans.

That is all nice theory and pictures. As ever, we need to back this up with some real tests. Firstly, I am using SQL*Plus and I need to set my arraysize large enough so that I do not introduce extra consistent gets through selecting small sets of rows between client and server. You will need to do the same to see similar results to me.
{I keep meaning to do a dedicated post on arraysize but H.Tonguç YIlmaz has a nice post already on it.}

set arraysize 200
set autotrace on

Now I will select all the records for PARE_ID=10, including a column not in the Primary Key, so that the table needs to be visited. I did this twice to remove the parsing overhead:

select pare_id,cre_date,vc_1
from child_heap
where pare_id =10
order by cre_date

   PARE_ID CRE_DATE  VC_1
---------- --------- -----------------------------------------------------------------------
        10 17-APR-11 LDOBKMLCYCSQYBDFIUISJWQAHNYSQOSUQJKIGCSEJHDPOFFLHHXYSMDSQNUB
        10 18-APR-11 LBGDNOYQFQMTMJQRAUWSRNBTHQSKBEUVLZSFWEGULOPDXQSVXOIC
        10 18-APR-11 LBGDNOYQFQMTMJQRAUWSRNBTHQSKBEUVLZSFWEGULOPDXQSVXOICOSFTSYNO
        10 19-APR-11 IBVTIGYBXJLMZQKRPJZEPXLMQLNOYNWLQOYVVGARNSITZWULVBYLEJKZNII
        10 19-APR-11 IBVTIGYBXJLMZQKRPJZEPXLMQLNOYNWLQOYVVGARNSITZWULVBYLEJ
        10 19-APR-11 IBVTIGYBXJLMZQKRPJZEPXLMQLNOYNWLQOYVVGARNSITZWULVBYLEJ
        10 20-APR-11 USIGVSPPIUUXEIRBMPFNBTTMDUJTVITHKQWZAKZOMJEDZCUPQAEFQQEYM
        10 20-APR-11 USIGVSPPIUUXEIRBMPFNBTTMDUJTVITHKQWZAKZOMJEDZCUPQAEF
...
        10 19-JUL-11 BNOYCIDTFJHPPOYPSVAVKJSYUNVPGPHLJXUOIKYKASKHYGZNVHVFFGPVAKN
        10 25-JUL-11 HDFGAQWTYZBSVYVXTFFRDIAKRYWFUPFCNDCETHUWHSQUITHHVUEJTJ

82 rows selected.


Execution Plan
------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |   100 |  6900 |   103   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| CHILD_HEAP |   100 |  6900 |   103   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | CHHE_PK    |   100 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         86  consistent gets
          0  physical reads

82 rows collected and 86 consistent gets. That will be 4 consistent gets to process the index blocks and 82 for the table blocks.

Now let’s repeat that on the IOT:

select pare_id,cre_date,vc_1
from child_IOT
where pare_id =10
order by cre_date
mdw11> /
any key>

   PARE_ID CRE_DATE  VC_1
---------- --------- ------------------------------------------------------------
        10 17-APR-11 QJHQXTQAYEUICPNDQTYMMFZPWJSIDLBKOXYTHLEHKTVWUPKQMWUUX
        10 18-APR-11 BUTUEWDCDQVPLTPPRFGBBEDOZYRPERPRROVUQPTSRZLHKVBSBUEAMZYAS
        10 18-APR-11 BUTUEWDCDQVPLTPPRFGBBEDOZYRPERPRROVUQPTSRZLHKVBSBUEAMZY
        10 19-APR-11 DEGNPALVLMIDYCYIQIIQJJVZFTNIMEULMAGDEWVTOAKBNHOPUQJE
        10 19-APR-11 DEGNPALVLMIDYCYIQIIQJJVZFTNIMEULMAGDEWVTOAKBNHOPUQJ
...
        10 24-JUL-11 TJGLOEITTVXQTQPHSKGVERSGJDREYSKKCDUFMQXQVXMHMMDWPLJNSNK
        10 24-JUL-11 TJGLOEITTVXQTQPHSKGVERSGJDREYSKKCDUFMQXQVXMHMMDWPLJNSNKCN
        10 25-JUL-11 BCLLVPYMWAAQOVLILXARQZXEGAQAARPURIFKFKHROUSFORRYYXQZUAJHDBL

108 rows selected.


Execution Plan
----------------------------------------------------------------------------
| Id  | Operation        | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT |         |   100 |  6900 |     4   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| CHIO_PK |   100 |  6900 |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads

We actually gathered more data, 108 rows compared to 82, all for 6 consistent gets compared to 86 consistent gets. That is a reduction to less than 10% of the original effort.

Now for a more extreme test. I am going to select a single row summary of data for 10 parents, flushing the cache between each run to show the impact when you have to do real IO to support those consistent gets. This is on a fairly old {4 years} laptop with a rather tired hard disc

alter system flush buffer_cache

System altered.

Elapsed: 00:00:00.18

--
--

select count(*),sum (num_1)
from child_heap
where pare_id between 50 and 60

  COUNT(*) SUM(NUM_1)
---------- ----------
      1155      12031

Elapsed: 00:00:06.39

Execution Plan
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |     1 |     7 |  1203   (0)| 00:00:18 |
|   1 |  SORT AGGREGATE              |            |     1 |     7 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| CHILD_HEAP |  1200 |  8400 |  1203   (0)| 00:00:18 |
|*  3 |    INDEX RANGE SCAN          | CHHE_PK    |  1200 |       |     7   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1157  consistent gets
       1112  physical reads

--
--

alter system flush buffer_cache

System altered.

Elapsed: 00:00:00.18

--
--

select count(*),sum (num_1)
from child_iot
where pare_id between 50 and 60

  COUNT(*) SUM(NUM_1)
---------- ----------
      1111      11528

Elapsed: 00:00:00.29

Execution Plan
-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |     7 |    24   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |         |     1 |     7 |            |          |
|*  2 |   INDEX RANGE SCAN| CHIO_PK |  1200 |  8400 |    24   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         25  consistent gets
         25  physical reads

The Heap took 6.39 seconds and required 1112 physical reads to support 1157 consistent gets.
The IOT took 0.29 seconds and required 25 physical reads to support 25 consistent gets.

I think we can all see that IOTs have the potential to greatly reduce physical and logical IO. Perhaps we should all be using IOTs more.

Final point. The Heap version took less physical reads than consistent gets as some blocks read into the block buffer cache held data required later in the query.

The impact of IOTs on the buffer cache will be the topic of my next post on IOTs. I think { hope:-) } that many of you will be very interested and impressed by what you could gain…

Comments»

1. Index Organized Tables – the Basics. « Martin Widlake's Yet Another Oracle Blog - August 2, 2011

[…] Examples and proofs..> Greatly reducing IO with IOTs….> […]

2. Bernard Polarski - August 3, 2011

Nice series of articles.

Any suggestion on how to carve the three characters that makes the word ‘IOT’ into DEV and Architect skull?
16 years into Oracle job and I may have not seen more than 5 IOT into production.

mwidlake - August 3, 2011

Thanks Bernard. I hope you like the next couple of bits too

Carving IOT into the skulls of architects huh? 🙂 Sounds painful.

You don’t see clusters either, unless you look at the data dictionary. If the designers of Oracle think they are a good way to hold the metadata crown jewels, what does that say?

All you can do is try and spot where physical placement of data is beneficial and argue for it and demonstrate it. Oh, and play with it yourself so you understand it well.

Dom Brooks - August 3, 2011

> You don’t see clusters either
Ooo – a series on hash clusters? Excellent. I’m looking forward to it, once you’ve finished on IOTs obviously. And the anticipation of the pictures is mouthwatering…

mwidlake - August 3, 2011

Maybe hash clusters. Maybe not. Depends how nice to me you are 🙂

3. Darryl Griffiths - August 3, 2011

Nice post(s).
Would you consider expanding the series to include an analysis on the time impact of performing DML on an IOT versus a non-IOT?
If your theory is correct, it should be quicker since that extra 1 (primary key) block of I/O will not need to be performed on the table, only the index.
Does this affect the amount of undo and redo?
Would you consider the I/O saving benefits to be logarithmic in scale when comparing the number of rows in the table?
You can probably see where I’m going with this. I guess I’m asking if this has additional benefits for the general performance of the DB. Especially if you can save undo and redo too.

mwidlake - August 3, 2011

Hi Darryl,

I’m goiing to look at the DML side of things in a later post – if I get that far {I’ve done this before, as in sketched out a series of posts – and not quite finished them, on SQL Audit}
Undo and redo is impacted as you only have to write the changes information for the index – but as a row takes up more space and thus there are block splits and the like…Plus if you used compression or overflow segments….

Martin

4. IOT Part 4 – Greatly Boosting Buffer Cache Efficiency « Martin Widlake's Yet Another Oracle Blog - August 8, 2011

[…] <..IOT1 – the basics <….IOT2 – Examples and proofs <……IOT3 – Significantly reducing IO […]

5. IOT 2 – First examples and proofs « Martin Widlake's Yet Another Oracle Blog - August 8, 2011

[…] <.. IOT Basics Great reductions in IO for IOTs..> […]

6. IOT Part 5 – Primary Key Drawback – and Workaround « Martin Widlake's Yet Another Oracle Blog - August 17, 2011

[…] <..IOT1 – the basics <….IOT2 – Examples and proofs <……IOT3 – Significantly reducing IO […]

7. IOT Part 6 – Inserts and Updates Slowed Down (part A) « Martin Widlake's Yet Another Oracle Blog - November 1, 2011

[…] – the basics <….IOT2 – Examples and proofs <……IOT3 – Significantly reducing IO <……..IOT4 – Boosting Buffer Cache efficiency <……….IOT5 […]

8. IOTs « Oracle Scratchpad - November 22, 2011

[…] Part 3: Greatly reduced I/O […]

9. Dax - February 3, 2012

Good Post(s)!

One question – Why in your first example there is no physical read at all in both type of tables?

Regards
Dax

mwidlake - February 13, 2012

Sorry for the long delay Dax, the day job distracted me.

Thanks for the positive feedback. The reason that there are no physical IOs is that I run each statement once, to get it parsed, and then run it a second time. By then the blocks are cached in the SGA.

One of the problems with both real-world testing and demonstration testing (like this) is that you either have to have the parse overhead in your tests (first run), have the data fully cached (second run) or spend a lot of time constructing tests that show a partially cached working data set that is close to reality (loads of runs). If you know what reality looks like for that situation.

For Proof of Concept work I aim for the last option, for other purposes I aim for the second option unless I am purposefully showing what happens when you have none of the relevant data cached.

It also takes quite a bit of effort to explain the whole caching and parsing thing, even to technicians. With management “way up there” it’s nigh on impossible. So I’ve got lazy now and on my blog at least I usually just mention the caching conundrum in passing.

10. Construyendo un Arbol B+ (B+Tree) « El bienestar de la mayoría, supera al bienestar de la minoría. O de uno solo. - August 20, 2012
11. Camille - May 13, 2014

Appreciating the time and energy you put into your website and in depth information you offer.
It’s nice to come across a blog every once in a while that isn’t the same
unwanted rehashed material. Fantastic read! I’ve
saved your site and I’m adding your RSS feeds to my Google account.


Leave a reply to Camille Cancel reply