jump to navigation

cost of full table scans June 17, 2009

Posted by mwidlake in performance.
Tags: , ,
trackback

As many sites point out, the cost as stated in explain plan is {being a little simplistic} a calculation by the CBO of the number of IO operations it thinks will be needed to complete that step of the plan. {It includes the cost of any previous step or steps that feed into it but many examples are nice and simple and have only a couple of steps in them}. 
However, it can be confusing when you decide to look at some simple examples and don’t see what you expect, as you start with simple table scans.

My test script  can be found here .

I set up a simple pair of test tables

-- create a table with 20k rows, an ascending numeric PK
-- a unique VC
-- a non unique numeric
-- a sparsly populated numeric
-- and some padding fields.
create table test_big
as select rownum  id
,mod(rownum,10)  num_1to10
,trunc(dbms_random.value(1,1000)) num_1k_vals
,'A'||to_char(rownum+1000) id_2
,trunc(dbms_random.value(1,100)) num_100_vals
,mod(rownum,5)  num_4
,decode(mod(rownum,10),0,trunc(dbms_random.value(1,30))
        ,null) num_sparse
,lpad('A',50,'A') vc_1
,lpad('B',50,'B') vc_2
,lpad('C',250,'C') vc_3
from dba_objects
where rownum < 40001
/
create table test_small
as select rownum  id
,mod(rownum,10)  num_1to10
,trunc(dbms_random.value(1,1000)) num_1k_vals
,'A'||to_char(rownum+1000) id_2
,trunc(dbms_random.value(1,100)) num_100_vals
,mod(rownum,5)  num_4
,decode(mod(rownum,10),0,trunc(dbms_random.value(1,30))
        ,null) num_sparse
,lpad('A',50,'A') vc_1
,lpad('B',50,'B') vc_2
,lpad('C',250,'C') vc_3
from dba_objects
where rownum < 101
/
-- now add statistics
-- NB each exec must be on one line
exec dbms_stats.gather_table_stats
(ownname=>user,tabname=>'TEST_BIG'
,estimate_percent=>100,cascade=>true)
exec dbms_stats.gather_table_stats
(ownname=>user,tabname=>'TEST_SMALL'
,estimate_percent=>100,cascade=>true)
--
desc test_BIG
DESC TEST_SMALL
--
set autotrace on
set timi on
set pause off
select count(*) from test_big;
select count(*) from test_small;

I’m running in Oracle 11.1.0.6 on a windows vista laptop, multi block read count is 128, no system statistics have been collected.

It creates one table with 40,000 records and one with 100 records and gather stats on them. I’ll now select count(*) from both and look at the plan.

select count(*) from test_big;
  COUNT(*)
----------
     40000

Execution Plan
-----------------------------------------------------
| Id  | Operation          | Name     | Rows  | Cost (%CPU)|
-----------------------------------------------------
|   0 | SELECT STATEMENT   |          |     1 |   617   (1)|
|   1 |  SORT AGGREGATE    |          |     1 |            |
|   2 |   TABLE ACCESS FULL| TEST_BIG | 40000 |   617   (1)|
-----------------------------------------------------

select count(*) from test_small;

  COUNT(*)
----------
       100

Execution Plan
-----------------------------------------------------
| Id  | Operation          | Name       | Rows  | Cost (%CPU)|
-----------------------------------------------------
|   0 | SELECT STATEMENT   |            |     1 |     4   (0)|
|   1 |  SORT AGGREGATE    |            |     1 |            |
|   2 |   TABLE ACCESS FULL| TEST_SMALL |   100 |     4   (0)|
-----------------------------------------------------

The costs of full table scans are 617 for the large table and 4 for the small table. We have full stats on the tables and they are just simple tables, no indexes, we should be able to see where the numbers come from.

Any guesses? Well, it could be the number of blocks making up the objects:-

select table_name,num_rows,blocks,empty_blocks
from dba_tab_statistics
where owner=user
and table_name in ('TEST_SMALL','TEST_BIG');

TABLE_NAME   NUM_ROWS     BLOCKS EMPTY_BLOCKS
---------- ---------- ---------- ------------
TEST_BIG        40000       2267            0
TEST_SMALL        100          9            0

Err, no. 2267 blocks in TEST_BIG and 9 blocks in TEST_SMALL.
{btw the view DBA_TAB_STATISTICS holds a subset of the information on DBA_TABLES and I tend to use it if I just want stats info}.

It could be the number of blocks in the segment maybe?
run @seg_dets.sql .

Enter value for seg_name: test
Enter value for owner: t

SEG_NAME    SEG TS_NAME   BYTES_K  BLOCKS exts  INI_K NXT_K
------------------------------------------------------
TEST_BIG    TAB DATA_01      18,432     2,304   33      64
TEST_SMALL   TAB DATA_01         128        16    2      64

No, it isn’t that. 2304 is not very different from 2267, as you would expect as we simply created and populated the table.

Hands up anyone who is thinking about db_file_multiblock_read_count? MBRC is the number of blocks Oracle will try and select from a segment in one I/O operation if it is scanning it. Well, MBRC is set to 128 on this system {it’s “out of the box” and I question it being that high actually. On 10.2 on windows it is, I think, 8}.

That sounds incredibly sensible, we can check if it has scanned all those blocks in chunks as dictated by the MBRC. Well, divide 2304 {number of blocks used by the segment} by 617 {I/O cost} and you get…3.734. Nothing like the MBRC.

What’s going on?!!!


Well, oracle modifies the value of db_file_multiblock_read_count before using it in calculations. But it IS the source of the calculation. You can change MBRC at a session level to check:-

alter session set db_file_multiblock_read_count=4;
Session altered.

select count(*) from test_big;
  COUNT(*)
----------
     40000
Execution Plan
----------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Cost (%CPU)|
----------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     1 |   853   (1)|
|   1 |  SORT AGGREGATE    |          |     1 |            |
|   2 |   TABLE ACCESS FULL| TEST_BIG | 40000 |   853   (1)|
----------------------------------------------------------

alter session set db_file_multiblock_read_count=8;
Session altered.

select count(*) from test_big;
  COUNT(*)
----------
     40000
Execution Plan
----------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Cost (%CPU)|
----------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     1 |   617   (1)|
|   1 |  SORT AGGREGATE    |          |     1 |            |
|   2 |   TABLE ACCESS FULL| TEST_BIG | 40000 |   617   (1)|
----------------------------------------------------------

At an MBRC of 4 the CBO calculates a cost of 853 to scan the TEST_BIG table and at an MBRC of 8 the CBO calculates a cost of 617.

This script here will run through a set of values for db_file_multiblock_read_count and run the count(*) so you can see the values going down as MBRC goes up, but oracle discounts the effect of the parameter as it gets larger.

To save you the effort, these are the costs at each value of MBRC I checked and I calculate the modifed MBRC {based on 2267 blocks, 2267 is the best number to use as it is likely to be the high water mark or very close}.

MBRC          modified
Value  Cost    MBRC
2      1325    1.711
3      1010    2.245 -- not a Magic Number
4      853     2.658
6      695     3.262
8      617     3.674
10     569     3.984
12     538     4.214
16     499     4.543
20     475     4.773
24     459     4.939
32     440     5.152
33     438     5.176 -- not a Magic Number
64     410     5.529
128   395      5.729

What do I mean by the comment “it does not need to be a magic number”. Well, MBRC is normally set to 2, 4, 8, 16, 32,64… a factor of 2. All examples I can remember seeing used have been a Magic Number, so I thought I would check if Oracle does any rounding to such numbers or insists on even values. And it does not.
{quick comment – the table has changed since the first post, I stuck the segment size not the hwm into my spreadsheet}

A couple of final comments for today:

  • How in heck can you know about this fudge unless someone tells you? And yes, I know as someone told me.
  • The values I gained on 10.2 were very,very similar, consistently a cost 3 higher than on 11.1
  • Why does scanning 9 blocks have a cost of 4?
  • Why was I getting a cost of 617 with the default MBRC of 128 and yet when I set my session to have an MBRC of 128 the cost is calculated at 395?

I don’t know all the answers to the above yet.

 

About these ads

Comments»

1. coskan - June 18, 2009
2. John Brady - June 18, 2009

The dbfile_multiblock_read_count (DBFMBRC) value is adjusted by the Optimizer to factor in the different performance behaviour of multi-block reads compared to single block reads.

I have written up on this in a recent blog entry myself on Full Table Scan Costing. I try and describe how the Optimizer does the costing of scans you are on about. I am slowly doing other things such as Index Range Scan costing and Joins too in subsequent posts.

The Optimizer calculates the cost of a full table scan as:
(Blocks * MREADTIM) / (MBRC * SREADTIM)

And MREADTIM and SREADTIM are from the System Statistics. If these are not set, and you only have default No Workload System Statistics, then the Optimizer in turn calculates default values for these based on the assumed I/O transfer rate and disk seek time. I go through this in the blog post I mentioned.

John

John Brady - June 18, 2009

The URL of the blog post on Full Table Scans seems to have been modified somehow, and does not appear as a hyperlink when I view this page. So I posting the full URL below:

http://databaseperformance.blogspot.com/2009/05/oracle-optimizer-plan-costing-full.html

John

3. Randolf Geist - June 18, 2009

Martin,

the results you got on 11.1.0.6 are consistent with what you get with default NOWORKLOAD system statistics. These are active from 10.2 on by default even if you don’t have gathered system statistics explicitly.

I recommend reading my series on system statistics on my blog, but to give you an impression of an explanation for your 617 cost of the FTS:

Having default NOWORKLOAD system statistics in place (check SYS.AUX_STATS$), you have IOSEEKTIM = 10 (ms) and IOTFRSPEED = 4096 (bytes transfer rate per microsecond).

If you have left db_file_multiblock_read_count unset, then at runtime it uses a value of 128 (e.g. what you see from v$parameter) but uses 8 for calculation (if no MBRC is set in the system statistics, otherwise this takes precedence). Further I assume a default block size of 8KB.

This results in a derived MREADTIM and SREADTIM as following:

SREADTIM = IOSEEKTIM + DB_BLOCK_SIZE / IOTFRSPEED = 10 + 8192 / 4096 = 12 (ms)

MREADTIM = IOSEEKTIM + (DB_BLOCK_SIZE * MBRC) / IOTFRSPEED = 10 + (8192 * 8 ) / 4096 = 10 + 16 = 26 (ms)

Now the cost calculation for the FTS uses these derived values:

I/O cost = Blocks below HWM / MBRC * MREADTIM / SREADTIM

Which simply means number of multi-block I/O request times the MREADTIM which gives you the estimated time in ms for the FTS. Now since Oracle used to report costs in single read request units, this result is divided by the SREADTIM to arrive at the “classic” cost units.

Slotting in your number:

2267 blocks / 8 (MBRC used for calculation) * 26 / 12 = 614

By default one is added to the cost (every FTS needs to visit the segment header first), so we arrive at 615 and the remaining part is the CPU cost calculated for the FTS operation, which seems to be close enough for a reasonable explanation.

This can be verified by looking at the 10053 optimizer trace.

Lots of further details and answers to most to your questions can be found in my series here:

http://oracle-randolf.blogspot.com/2009/04/understanding-different-modes-of-system.html

Best Regards,
Randolf

4. mwidlake - June 18, 2009

Thank you gentlemen for your input. Heck, thank you just for noticing my blog :-)
Coskan, I apologise for the delay in your comment appearing, akismet wrongly dumped it into the spam box as it has so many links.

Randolf, John, thank you for your explanations. As you point out, the results I see are because I have no system statistics yet. That is why I italicised the line saying so and, yes, my block size is 8k. {My test scripts pull out my basic setup and I paste that into my techie posts, but I went and deleted it by accident during editing. It will reappear shortly.}

You guys have been so helpful you have kind of skewed my pitch :-).
Post two was going to say something about having no system stats and how common that is or isn’t and then go on to show the gathering of system stats and the formulae.

The question is, do I still do that or just post saying “look at these guy’s blogs”?

Oh, and the bit I had not found out yet is why on 11 the MBRC is set to 128 but acts like it is 8 for costing purposes. But I think Randolfs postings point the way there…

5. Randolf Geist - June 18, 2009

Martin,

slight correction: You have already system statistics, but of the flavor “DEFAULT NOWORKLOAD”.

Quoting from my blog series:
“The important point to understand here is that starting with Oracle 10g system statistics are enabled by default (using the default NOWORKLOAD system statistics) and you can only disable them by either downgrading your optimizer (using the OPTIMIZER_FEATURES_ENABLE parameter) or using undocumented parameters or hints (“_optimizer_cost_model” respectively the CPU_COSTING and NOCPU_COSTING hints).”

My blog series also answer the question regarding the different values used for costing and at runtime for the MBRC value. It’s not a 11 feature, but has already been introduced in 10.2.

Regards,
Randolf

mwidlake - June 18, 2009

Yes, you are right Randolf, I made a semantic mistake in my reply. As I said in the original post, I have not yet gathered statistics. But the default ones are there. And that is a situation I come across on site.

Actually, your comment on different values for costing and runtime is not the answer to my question (it’s a very good point though and one I was not even going to touch on), my question is “why does my session have a db_file_multiblock_read_count” of 128, the default setting for the instance if it has not been altered, yet act like it is 8 for costing purposes, as I see when I set my session MBRC to different values. See the end of my posting. Is it a bug or is something over-riding the default MBRC? If I set my session MBRC to 128 the cost is 395 not 617…

Which still leaves me pondering the question of if I should bother blogging more or just point to other blogs on this topic?

6. Randolf Geist - June 18, 2009

Martin,

I’m not sure if I get you right, but I think my previous answer is the answer to your question regarding the MBRC and the different costs encountered depending the default value or explicitly set.

If you set the “db_file_multiblock_read_count” explicitly then the internal parameters “_db_file_optimizer_read_count” (MBRC used for costing) and “_db_file_exec_read_count” (used at runtime) are set to exactly that value of “db_file_multiblock_read_count”.

If you leave “db_file_multiblock_read_count” unset, then “_db_file_optimizer_read_count” defaults to 8 and “_db_file_exec_read_count” is set to the largest multi-block I/O request supported by the platform, which is usually 1MB, or 128 blocks with 8KB block size.

Things are different though if you have WORKLOAD system statistics where MBRC is part of the values measured (or explicitly set). Then this MBRC is used for costing, and “_db_file_exec_read_count” used at runtime, either the default value as described or equal to an explicitly set “db_file_multiblock_read_count”.

Further details are covered in the blog series.

Regarding the costing of a small tables with (whatever flavor of) system statistics, see e.g.

http://forums.oracle.com/forums/thread.jspa?messageID=3280024&#3280024

where an example is shown. It follows the same principle like shown above but it’s interesting to note that the cost for small tables can be – depending on the “db_file_multiblock_read_count” (or MBRC) setting and the default block size – surprisingly high.

Regards,
Randolf

mwidlake - June 18, 2009

Hi Randolf,

Ahhhh, yes, now that IS the answer to my outstanding question and indeed is what I had seen in your blog thread and needed to go back and check on (so much to do, so little time). So I now understand that by setting a value of MBRC, it is causing the underscore parameters to be set also. Before I set MBRC they have default values that are not the same as MBRC but are used in the costings algorithm.

I do not think it is at all helpful by Oracle Corp that the visible MBRC appears to be being used {and used to be used} but in reality now it is hidden parameters. Thank you for clarifying that.

I also realise something else. My original post ends with “I do not know answers to all of the above yet”. It can be read in one of two ways. I either mean I do not know any of the answers or it can mean I know some but not all. Which is why I am getting so much help from you guys? English can be a very non-precise language.

Randolf Geist - June 18, 2009

Martin,

regarding the separation of the MBRC used to cost and at runtime: It’s probably just another attempt to improve the cost calculation. I think its main purpose is when you start to use WORKLOAD system statistics. If you leave the defaults in place Oracle will attempt to perform the largest possible multi-block read requests on your platform and the measured average MBRC (stored as MBRC in sys.aux_stats$) represents then the actual MBRC achieved under “your” representative workload.

So from then on Oracle uses this measured MBRC (and a measured SREADTIM & MREADTIM) for calculation and still tries to perform as few I/O requests as possible at runtime, which should – in theory – generate execution plans that “suit” your particular environment and at the same time maximize the I/O throughput for multi-block read requests at runtime.

Christian Antognini however has already observed that there might be environments where using that large I/O requests is actually slower than using smaller ones, so this might not always be the optimal approach.

Regards,
Randolf

7. Cost of full table scans #2 « Martin Widlake’s Yet Another Oracle Blog - June 30, 2009

[…] Post One – I got several comments, mostly pre-empting what I planned to say next {I am not complaining as such, I like to get feedback and Randolf Geist in particular was being very helpful}, but I’m going to say what I wanted to say anyway, especially as I was being purposefully misleading in my first post… […]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 166 other followers

%d bloggers like this: