jump to navigation

IOT Part 4 – Greatly Boosting Buffer Cache Efficiency August 8, 2011

Posted by mwidlake in development, performance.
Tags: , , ,

<..IOT1 – the basics
<….IOT2 – Examples and proofs
<……IOT3 – Significantly reducing IO
……..>IOT5 – Primary Key issues
……….>IOT6a – Slowing Down Insert
…………>IOT6(B) – OLTP Inserts

So far I have covered the basics of Index Organized Tables, created some examples and shown how IOTs can help reduce the cost of single row selects and then how they can greatly aid range scans. Follow the above links for details.

Now I’d like to show that the use of IOTs has the potential to make your block buffer cache (BBC) far more efficient. Going to disc is very,very slow compared to going to memory {NB solid state storage improves this situation but does not remove it}. The block buffer cache has always been critical to oracle SQL Select performance as it allows you to access data in memory rather than disc and in general the more block buffer cache you have the faster your system will be.
{I am of the opinion that the BBC is even more important now than ever. As hard discs get larger we are seeing fewer and fewer spindles per GB of storage and, in essence, disc storage is effectively getting slower – because more data is hosted on the same number of spindles and those spindles are not themselves getting faster – I digress, for more details see posts Big Discs are Bad and IOPs and Form Factors}

In the scenario I’ve covered in my previous posts on IOTs we have a system where child data is coming in for many parents every day for 100 days. With a heap table the data pours into the growing end of the table, usually a record or two per parent each day and no guarantee that if two records come in, they will be put into the same block.

So, when you select a child record for a parent you get the situation shown below:

For many systems, the Block Buffer Cache is holding a lot of data no queries asked for -collateral data

When oracle needs to collect a record from the table, it has to read the whole block. Oracle only reads in tablespace data in whole blocks. That record comes with many other records in it that you did not ask for or want. I refer to this as Collateral Data – innocent bystander data that has got pulled into the BBC just because it was in the same block as required data. The larger the block size, the more collateral data there is.

To get all the child records for the parent, you need to read all those individual table blocks holding one or two records of interest. For our 100 child records you will probably need to read in close to 100 table blocks. Your Block Buffer Cache is filling up with of blocks where only one row out of each block is “of interest”. If that is one row out of 80 in a block, you are effectively wasting 98.75% of the space that table takes up in the block buffer cache.

With an IOT the situation is very different. We have already seen in my previous post on reducing IO that for a range scan on the IOT, oracle does not need to go and collect records from blocks scattered throughout the table. It simply collects the IOT leaf blocks holding the relevant data. Not only does this require less IO, it also results in the fetched blocks mostly holding the required data. The percentage of collateral data is greatly reduced:

IOTs are a powerful tool in reducing collateral data and using the BBC more efficiently

Thus instead of 100 table blocks that mostly hold collateral data, you have 2 or 3 blocks holding mostly the data you are interested in. Your wastage, the collateral data, is about 33-50%. With my example tables from post IOT2, it is indeed an average of 2-3 IOT blocks holding all the data for a single parent and 100 heap table blocks holding the same data.

You can think of it another way.

With my IOT I use only 3% of the memory to cache a single set of records for a parent compared to that needed with a normal HEAP table.

Let’s extend that idea a little. Let’s say I have 100,000 customer and 5% of the customers are active.
Each customer has on average 500 * 200-byte activity records for 100K of data
Each 8K Heap Table block holds 40 records, a very inefficient IOT holds 20 records per block {I’ll go into the details of this in a later post, but I am being a little unfair on the IOT here}.
There is some grouping of records for the customer so that those 500 records are over 400 blocks.
The whole working set of 400 blocks * (100,000*0.05) customers *8K = 2,000,000K

Now replace the table with an IOT keyed on customer ID.
Those 400 records would be spread over (400/20)+2 blocks. The +2 is the start and end of the range.
That is 22 *(100,000*0.05) *8K = 110,000K

Both would also need the overhead of an index structure to be cached also, for the Heap table it is the Primary Key index, for the IOT it is the rest of the IOT structure.

So you would need 2GB or so of Block Buffer Cache dedicated to caching the working set held as a heap table and 110MB of Block Buffer Cache dedicated to caching the IOT equivalent.

How many of you spotted that the space needed to “cache the heap table working set” is actually twice the size of the table? Well, that is because with a 5% working set and 40 records per block, there is a fair chance that some of those Collateral Data records in each block are for the 5% working set. Over all, almost every block will hold data for two or three active users. This is one of the complications of working out how efficient you block buffer cache is likely to be and I’ll have to leave that to another post. In reality you would need to hold 99% of the heap table in memory to cache the whole working set, so still 1GB. The IOT is still far more efficient.

I’ll just finish by saying that on one project I worked on we would have needed several hundred GB of Block Buffer Cache to hold the working set of the main tables if held as heap tables. That volume of memory was simply not available. With IOTs this reduced to about 40GB. This was available. The majority of this working set was able to stay in their SGA and it meant that so long as the instance stayed up, that working set of data mostly stayed in cache. I was able to see on the live system that processing of the data for these active customers was mostly being supported by consistent gets and less than 1% disk reads.

So, by using IOTS I reduced consistent gets dramatically, made more efficient use of the block buffer cache and, as a result of that, reduced the number of physical reads needed to support the consistent gets.