jump to navigation

Cost of full table scans #2 June 30, 2009

Posted by mwidlake in performance, Uncategorized.
Tags: , ,

This is the second post in a series on Full Table Scans. Sorry to anyone waiting for part two, I got distracted {I suspect very few people were actually waiting :-)} I make up for it by making posting this veeeery long.

Post One – I got several comments, mostly pre-empting what I planned to say next {I am not complaining as such, I like to get feedback and Randolf Geist in particular was being very helpful}, but I’m going to say what I wanted to say anyway, especially as I was being purposefully misleading in my first post…

I’m attempting to show three things in this series of posts.

  1. How table scans are costed by the CBO under 10g and 11g, as it is quite different from before.
  2. That simple test cases of what you knew once can fail to work and leave you confused and questioning your knowledge.
  3. How levels of “truth” help you understand more as you dig deeper.
  4. As a fourth I might touch on how Oracle’s calculated cost is not actually always reflecting how it executes sql, but I need to work on that a little.

If you thought that db_file_mutliblock_read_count is the number of blocks that oracle reads in one go when scanning tables and indexes and that this improves performance, that will help you understand what is going on inside your Oracle database.
If you think that increasing the value will make full table scans look more efficient to Oracle and thus it will favour them, you are again right and it may well help you tune a system.
However, you may know more – you may know that Oracle modifies the value of the MBRC to make it more realistic. As the MBRC gets larger, things like the size of the segment, hitting the end of individual extents and the physical time taken to read data off disk and chuck it across the network makes the efficiency gains of these larger values less significant. Each layer of knowledge helps a little more. Then the technology underneath shifts and you may not realise this as the older knowledge still kind of helps.

I’m not trying to be smart or “I know better” here, I’ve posted before that none of us knows this stuff until we are told and I certainly did not know all of what is in these posts when I started preparing them 3 or 4 weeks ago. Some of what I plan to say I am still not sure of. But I’ll postpone the philosphy now for a later blog post, though as I have a rough version already written, here it is if you want.

Back to the technology and the costing of full table scans.

Here is my test script again.

It creates a couple of tables, the one of interest is TEST_BIG, holding 40,000 rows. TEST_BIG is 2304 blocks big but only 2267 blocks are occupied. As nothing has been deleted from the table, this is also the high water mark, ie the highest point in the table into which data has been inserted. Oracle keeps track of this high water mark. The table is in a locally managed, autoallocate tablespace with automatic segment space management. All the defaults.

This is my environment:-
—————————— ——————–
cpu_count 2
db_block_size 8192
db_file_multiblock_read_count 128
optimizer_mode ALL_ROWS
sga_target 0
sort_area_size 65536

Just to re-itterate, I have not gathered system statistics -the system is “out of the box”.

From Oracle 10 onwards, there are some default system statistics set. You can see these by looking at the sys.aux_stats$ table (click on “view plain” to see it properly laid out):-

 select sname,pname,pval1,pval2
 order by sname,pname

SNAME           PNAME                PVAL1 PVAL2
SYSSTATS_INFO   DSTART                     10-15-2007 11:32
SYSSTATS_INFO   DSTOP                      10-15-2007 11:32
SYSSTATS_INFO   FLAGS                    1
SYSSTATS_MAIN   IOSEEKTIM               10

There are only 3 values set {on both version 11g and 10g},

  • CPUSPEEDNW – an estimate of the speed of the CPU. This value is different on different servers. Maybe. Both my old V10 windows test boxes have a value of 484, even though one is a truely old laptop, yet the linux servers I have access to all have different values, ranging from 713 to 1778.
  • IOSEEKTIME – This is the time for the IO subsystem to locate the start of an item. In both v10 and 11, this is 10 milliseconds
  • IOTFRSPEED – This is the volume of data that can be read from the io subsystem in 1 millisecond – 4096 bytes or 4k, again in V10 and 11

These default figures for IOSEEKTIME and IOTFRSPEED are fairly conservative, especially as they have not changed since V10, but then seek time has not really improved (especially on single discs).

In my experience, and looking around on the net, it seems that many sites on V10 upwards have not gathered system stats {BTW see the end of this post for some clarification on system statistics}. This seems to be due to one of several factors:

  • Some sites do not have a DBA or one who knows about system statistics {I’m not criticising, if you have been labelled a DBA and thrown the whole of the Oracle stack to look after, there is a lot to get your head around}.
  • Some DBAs think the system stats are gathered by the automatic stats job that {sort of} runs every night .
  • Changing to using system statistics would change execution plans and thus need full regression testing, some re-tuning, which there is no time or money for.
  • Hey, it works, why change things {this can also be viewed as  “I am fundamentally scared of change”🙂 }.

I can understand all of the above, and as a result of them many of us benefit from understanding how the default system statistics work. I would suggest it is better to gather system statitics than not as you are giving the oracle CBO optimizer more accurate information about the actual speed of your cpu, your IO and the difference between single block reads and multi-block reads, but life is not perfect. {NB I’ve heard on the grapevine that you should not gather system stats if you run Oracle Apps 11i but I am no expert on Oracle Apps!}

So what does Oracle do with the “out of the box” information? It calculate values for SREADTIME {time to read a single block} and MREADTIM {the time taken to do a multi-block read. If you think about how discs work, to read something the head of the disc must locate where the data is and physically move to it – this is the seek time – and then read the data.
So for a single block (8k) the time taken is

IOSEEKTIME+(size of one block/IOTFRSPEED).
ie 10+(8192bytes/4096 bytes) = 12.

For a multi block read, then the disc head will seek once and then read db_file_multiblock_read_count number of blocks at once {just accept this for now, please}. In my case, 128*8k.

IOSEEKTIME+( (MBRC * Size of one block) / IOTFRSPEED)
ie 10+( (128*8192) / 4096)
= 10+(1048576/4096)
=10+(1024k/4k) =10+256 =236
I don’t know about you but I can’t divide 1048576 by 4096 just like that but 1024/4, yep easier.

The CBO can only cost something if it can treat all parts in the same units, so it converts those multiblock read counts to equivalent single block read counts by dividing the time for the multi block read by the time of the single read.
236/12 = 19.6667.
Oracle then rounds the figure to the next whole number, so 20.

{Incidentally, it does this with the CPU cost as well.
Oracle makes a guess at how much cpu effort an activity will take and then divides it by the CPU speed and the time for a single block read – I’m being a little simplistic here, but it will do for now}.

So, a single block read is cost 1
A multi block read of 128 is cost 20, and reads 128 times as much.

In previous versions of Oracle this was all different. In 8i, single block reads and multi block reads were costed the same. In 9i without cpu costing {no cpu costing being the default and I can’t remember seeing it changed on a live system I saw} a “fudge factor” was introduced where values for MBRC were reduced down to better match reality. And in my first post I misleadingly referred to a Fudge Factor and gave a table. Anyone who was used to this fudge factor may have noticed my numbers did not match, but then who of us remembers that the fudge factor for 8 as the multi block read count is 6.19? 🙂

{That table is still useful, it shows an effective ratio for the MBRC, IF you have gatherd no stats AND you have an 8k block size. OK, maybe it is not so useful…}

Anyway, as you can now see from the above formulae, Oracle on 10g and 11g now uses a fairly sensible piece of logic to decide on the relative costs of single and multi block reads. The CBO formula is basically saying you save the cost of multiple seek times when reading multiple blocks, but the time to transfer data is directly related to the volumeof data read. This is probably reasonable on a system with a simple IO setup (like single disks or very basic RAID) but will break down with high-end storage which has read-ahead and SAN caches and the like.

OK, I have to address one more issue before I demonstrate that the formula is correct. And it is something that I did not know and confused the hell out of me when I decided to do a “simple test” of what I thought I knew, and prompted me to go on endlessly about how doing a simple test to prove what you have learned can go wrong and send you crazy with confusion. And thanks again to Randolf Geist for pointing me in the right direction.

At the end of the last post I had shown that my multiblock read count was 128 but acted like it was 8.
ie I got a cost of 617 for scanning TEST_BIG. When I manually set my MBRC to 8 I also got a cost of 617. When I manually set a value for MBRC of 128, it gave a cost of 395. Huh?!

Coskan alluded to this bit, but I missed it – I was ignorant about what he was saying.

On a newly created Oralce 11.1 test database, created as part of a simple instal on windows vista, you have db_file_multiblock_read_count set to:-

show parameter db_file_multiblock_read
NAME Value
db_file_multiblock_read_count 128

But there is a hidden parameter:-
—————————- —– —–
_db_file_optimizer_read_count 8
multiblock read count for regular clients

{To see hidden parameters you need to be logged in as sys or use a trick like intermediate views – here is a script you can run as sys to see them. Warning, there are a LOT of them! I’ll do yet another post about hidden parameters sometime soon.}

How the hell are we supposed to know this???! Do Oracle do this on purposed to keep us awake at night worrying about this???.

On Oracle 10, this hidden initialization parameter defaults to 8 and MBRC defaults to 8 {usually – I just recreated a new test database and both are set to 16 “out of the box”}. If you change MBRC, the hidden parameter alters too, to the same value, so you do not notice it so much.
On 11, MBRC is set to 128 and the hidden parameter is set to 8 initially. Once you change MBRC, the hidden _db_file_optimizer_read_count is also changed to match.

So, as my first paragraph on the first posting said, you go on to a test database to check something and you get confused as the very first “simple” test does not give you the results you expected.

OK, does the formula work?
In the below table I list:

  • The MBRC that I set by altering my session
  • The cost as reported by Explain Plan
  • The calculated MREADTIM = IOSEEKTIM+ (MBRC*blocksize/IOTFRSPEED)
  • The calculated SREADTIM  = IOSEEKTIM+(blocksize/IOTFRSPEED)
  • MREATIME/SREADTIM { I’ll call it MT/ST}
  • Blocks in table/MBRC {I’ll call it blks/MBRC
  • (blks/MBRC)  * (MT/ST) to get a calculated cost in single block reads
MBRC Cost MRT SRT MT/ST blks/  Calc cost
2   1325  14    12  1.167   1133.5     1323
3   1010  16    12  1.333   755.667   1008
4    853   18    12  1.5       566.75     851
6    695   22    12  1.833   377.833   693
8    617   26    12  2.167   283.375   615
10  569   30    12  2.5       226.7       567
12  538   34    12  2.833   188.917   536
16  499   42    12  3.5       141.688   496
20  475   50    12  4.167   113.35     473
24  459   58    12  4.833   94.458     457
32  440   74    12  6.167   70.844     437
33  438   76    12  6.333   68.697     436
64  410   138  12  11.5    35.422     408
128 395  266  12  22.167  17.711   393

Yes 🙂
The calculated cost is always 2 less than the cost given by the CBO. I know that under Oracle 10 1 is added to the cost for reading the table header. I am presuming that an additional 1 has been added for some reason under 11.

That will do for now.

Oh, as promised, some comments on system statistics

Oracle System statistics are NOT statistics collected on the tables and indexes owned by users SYS or SYSTEM. These would be “Dictionary Statistics”. These can be collected via specific calls to the DBMS_STATS.GATHER_DICTIONARY_STATISTICS and {if you have not turned off the automatics gathering of object statistics} are gathered automatically during weekday nights and over the weekend by the automated job, which gathers table and index stats when it feels they are needed. I’ll be posting about that {probably at inordinate length} in a month or so.
System Statistics are statistics about how the underlying computer system works, specifically cpu speed and IO subsystem performance.
Even Oracle Support can get confused over this. You might like to check out this unfortunate experience on the topic. I have to say, I have much sympathy with Piet as I had very similar issues trying to discuss automated stats gathering with Oracle a couple of years ago. Having said that, I in the end got good support on the topic.