
Counting the Cost #5 – accurate and fast July 9, 2009

Posted by mwidlake in performance.

I started off my blog by discussing quick ways of selecting count(*) from tables. I never completed the series, so I’m back to it.

This is the previous post on the topic.

It covers how ALL_TAB_MODIFICATIONS and the NUM_ROWS column in ALL_TABLES can be combined to get a very accurate estimate of count(*) for most tables in most situations.
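In outline, the trick from that post is this {my reconstruction of the idea, not the exact SQL from it}: take NUM_ROWS as at the last stats gather and correct it with the DML recorded since.

-- flush the in-memory DML monitoring info first {needs the privilege}
exec dbms_stats.flush_database_monitoring_info

select t.num_rows + nvl(m.inserts,0) - nvl(m.deletes,0) est_count
from   all_tables            t
,      all_tab_modifications m
where  t.owner      = m.table_owner(+)
and    t.table_name = m.table_name(+)
and    t.owner      = user
and    t.table_name = 'MY_TABLE' -- hypothetical table name
/
-- {for a partitioned table ALL_TAB_MODIFICATIONS also holds per-partition
--  rows, so you would restrict to the row with a null partition_name}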

If this does not work for you {e.g. no privilege to see the tables or flush the information}, or you absolutely must have the count as accurate as you can get right now, as a Manager is threatening you, here is another method using a count table. It can be expanded to hold more than just count(*) and is particularly useful for partitioned tables {which are often huge}.

Basically, every time you select count(*) on a table {or even e.g. select count(*) where status=0} you are probably re-scanning a lot of the table you scanned the last time you did count(*) on it. If you are not deleting records from that table, that scanning effort is effectively a waste of time – those records won’t have changed since the last select count(*). So, as a developer or DBA, you can decide to trust the last count(*), and then you need only count the more recent records.

If you have an ascending primary key {traditionally a number sourced from a sequence} you can store the max(PK) and count(*) in a simple table. You then count all the records with a PK greater than the stored one. I’ve used count tables several times in the past and two enhancements invariably crop up, so I am going to include them in the below example of how you can implement this.

You need the partitioning option to run the test yourself, so, sorry, you will need Enterprise Edition with the option licensed. If you downloaded Oracle to your PC/linux box to play on, you probably have it. {It never ceases to amaze me that Oracle Corp will only allow Partitions to be used in EE and that it is an extra cost option on top!}

You need to create tablespaces to hold the partitioned tables {or alter the test script to use existing tablespaces}. Feel free to take and modify this script to create the tablespaces.

This is my test script.

This is my basic setup:-

NAME                           VALUE
------------------------------ --------------------
compatible                     10.2.0.1.0
cpu_count                      1
db_block_size                  8192
db_file_multiblock_read_count  8
optimizer_mode                 ALL_ROWS
sga_target                     612368384
sort_area_size                 65536

My test table:-

create table test_p
 (id     number(8) not null
 ,status number(1) not null
 ,num_1  number(3) not null
 ,num_2  number(3)
 ,vc_pad varchar2(2000))
tablespace parts_curr
partition by range (id)
 (partition id_10k values less than (10000)
  tablespace parts_old
 ,partition id_20k values less than (20000)
  tablespace parts_old
 ,partition id_30k values less than (30000)
  tablespace parts_old
 ,partition id_40k values less than (40000)
  tablespace parts_old
 ,partition id_max values less than (maxvalue)
  tablespace parts_curr
 )
/
-- {41999 records created - see script}
alter table test_p
add constraint pt_pk primary key (id)
using index
local
 (partition id_10k tablespace parts_old
 ,partition id_20k tablespace parts_old
 ,partition id_30k tablespace parts_old
 ,partition id_40k tablespace parts_old
 ,partition id_max tablespace parts_curr)
/
-- gather stats
begin
  dbms_stats.gather_table_stats(ownname=>user
  ,tabname=>'TEST_P',estimate_percent=>100,granularity=>'ALL');
end;
/
-- create the count table
create table test_p_running_count
 (max_id    number(8) not null
 ,row_count number(8)
 ,constraint tprc_pk primary key(max_id)
 )
/

OK, let’s count the number of records in TEST_P and then do the same but insert the data collected into the count table:-

select count(*) from test_p;

  COUNT(*)
----------
     41999
Execution Plan
----------------------------------------------------------
--------------------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |     1 |    26   (4)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE        |       |     1 |            |          |       |       |
|   2 |   PARTITION RANGE ALL  |       | 41999 |    26   (4)| 00:00:01 |     1 |     5 |
|   3 |    INDEX FAST FULL SCAN| PT_PK | 41999 |    26   (4)| 00:00:01 |     1 |     5 |
--------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        111  consistent gets
          0  physical reads

insert into test_p_running_count (max_id,row_count)
select max(id),count(*) from test_p;

Execution Plan
----------------------------------------------------------
----------------------------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT       |       |     1 |     5 |    26   (4)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE        |       |     1 |     5 |            |          |       |       |
|   2 |   PARTITION RANGE ALL  |       | 41999 |   205K|    26   (4)| 00:00:01 |     1 |     5 |
|   3 |    INDEX FAST FULL SCAN| PT_PK | 41999 |   205K|    26   (4)| 00:00:01 |     1 |     5 |
----------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
         26  recursive calls
         21  db block gets
        115  consistent gets
          0  physical reads

So, it takes 111 consistent gets and an estimated cost of 26 to do a count(*) on this relatively small table, doing a fast full scan of the PK index. It takes only slightly more effort to create a record to hold this information.

{As an aside, it is curious that the total estimated cost of the insert is the same as the select, suggesting naughtily that the insert is free, but that’s not my topic today}.

I now create 100 records by running a little insert script.
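{The insert script is not shown here; a hypothetical stand-in that continues the ascending IDs from 42000 to 42099 would be:}

insert into test_p (id,status,num_1,num_2,vc_pad)
select 41999+level        -- continue the ascending PK
,      mod(level,2)       -- status
,      mod(level,100)     -- num_1
,      mod(level,100)     -- num_2
,      lpad('x',1000,'x') -- padding
from dual
connect by level <= 100
/
commit;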

Let’s select the number of records and the new max(ID) created since we last stored information in the count table:

select max(id),count(*) from test_p
where id>(select max(max_id) from test_p_running_count)
 /
   MAX(ID)   COUNT(*)
---------- ----------
     42099        100

Execution Plan
----------------------------------------------------------
-----------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |     1 |     5 |     4   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE               |         |     1 |     5 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR    |         |  2100 | 10500 |     2   (0)| 00:00:01 |   KEY |     5 |
|   3 |    INDEX RANGE SCAN           | PT_PK   |  2100 | 10500 |     2   (0)| 00:00:01 |   KEY |     5 |
|   4 |     SORT AGGREGATE            |         |     1 |    13 |            |          |       |       |
|   5 |      INDEX FULL SCAN (MIN/MAX)| TPRC_PK |     1 |    13 |     2   (0)| 00:00:01 |       |       |
-----------------------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads

The CBO scanned the index of the TEST_P_RUNNING_COUNT table using an index full scan (min/max), which means Oracle basically worked down the index to get a min or max value, which is highly efficient. This value was then used to do an index range scan of only one index partition (partition number 5).

Estimated cost was 4 and the execution took 3 consistent gets {NB this is for the second execution; it is 12 consistent gets and 4 recursive calls the first time, when the statement is parsed}.

But that only got the new max(id) and how many records have been created since you last recorded it, 100. What you really want is the count of records since you last recorded the count and max(id) PLUS the count as it was then, i.e. the full count now.
This is not straightforward in a single SQL statement, as you want the count, a group function, from one table and the single-row value from another. You can do this though. You basically write SQL statements to do the bits you want and then select from them. My example script shows how I build up to this, but the final statement I came up with {and you might well be able to come up with better options yourself} is:

select change.max_id new_max,orig.max_id orig_max
,change.rowcount chg_c,orig.rowcount orig_c
,change.rowcount+orig.rowcount tot_c
from
 (select max(id) max_id,count(*) rowcount,1 tabjoin
  from test_p
  where id>(select max(max_id) from test_p_running_count) ) change
,(select max(max_id) max_id,max(row_count) rowcount,1 tabjoin
  from test_p_running_count) orig
where change.tabjoin=orig.tabjoin
/

   NEW_MAX   ORIG_MAX      CHG_C     ORIG_C      TOT_C
---------- ---------- ---------- ---------- ----------
     42099      41999        100      41999      42099

Execution Plan
----------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                      |     1 |    58 |     7   (0)| 00:00:01 |       |       |
|   1 |  NESTED LOOPS                   |                      |     1 |    58 |     7   (0)| 00:00:01 |       |       |
|   2 |   VIEW                          |                      |     1 |    29 |     4   (0)| 00:00:01 |       |       |
|   3 |    SORT AGGREGATE               |                      |     1 |     5 |            |          |       |       |
|   4 |     PARTITION RANGE ITERATOR    |                      |  2100 | 10500 |     2   (0)| 00:00:01 |   KEY |     5 |
|*  5 |      INDEX RANGE SCAN           | PT_PK                |  2100 | 10500 |     2   (0)| 00:00:01 |   KEY |     5 |
|   6 |       SORT AGGREGATE            |                      |     1 |    13 |            |          |       |       |
|   7 |        INDEX FULL SCAN (MIN/MAX)| TPRC_PK              |     1 |    13 |     2   (0)| 00:00:01 |       |       |
|*  8 |   VIEW                          |                      |     1 |    29 |     3   (0)| 00:00:01 |       |       |
|   9 |    SORT AGGREGATE               |                      |     1 |    26 |            |          |       |       |
|  10 |     TABLE ACCESS FULL           | TEST_P_RUNNING_COUNT |     1 |    26 |     3   (0)| 00:00:01 |       |       |
------------------------------------------------------------------------------------------------------------------------

Statistics
---------------------------------------------------------
          0  recursive calls
          0  db block gets
         10  consistent gets
          0  physical reads

That’s an estimated cost of 7 by the CBO and 10 consistent gets {again, for the second execution}. Remember, scanning the whole partitioned table was costed at 26 and took 111 consistent gets.

You can convert that sql select statement into an insert and use it to update your count table:-

insert into test_p_running_count (max_id,row_count)
select change.max_id
--,orig.max_id,change.rowcount,orig.rowcount
,change.rowcount+orig.rowcount
from
 (select max(id) max_id,count(*) rowcount,1 tabjoin
  from test_p
  where id>(select max(max_id) from test_p_running_count) ) change
,(select max(max_id) max_id,max(row_count) rowcount,1 tabjoin
  from test_p_running_count) orig
where change.tabjoin=orig.tabjoin
/

That is one of the improvements usually asked for when you have a count table; people want to see what the counts were historically, seeing as you are now holding the data in a table {why is it so many people suddenly want to keep information once it is seen in a table?}. That is why the above does an insert and not an update, and why my code selects the max(max_id) and max(row_count). It is not perfect: if no records have been inserted into your massive table since the last max(id) and row count were recorded, you will get a duplicate primary key error. You could add a datetime column to get around this.

Also, my SQL is a little naughty in that I select the max(max_id) and max(row_count). There is nothing in the table design to enforce the unspoken rule that they both come from the “last record” of the table, but as a human you can see that they do. Again, the use of a datetime column can help with this.
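{A minimal sketch of that datetime enhancement – my illustration, table and constraint names hypothetical – would be:}

create table test_p_running_count_hist
 (snap_time  timestamp default systimestamp not null
 ,max_id     number(8) not null
 ,row_count  number(8) not null
 ,constraint tprch_pk primary key (snap_time)
 )
/
-- the latest row is now unambiguous: it is the one with max(snap_time),
-- and re-running the capture no longer hits a duplicate key on max_id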

The final bit for today. The above two SQL statements look a bit nasty, certainly a lot more complex than “select count(*) from table”. Also, they are not as efficient as they could be; selecting max(row_count) is not supported by an index and could slow down over time. I could tweak the SQL statement even further to work around this, but a simple piece of PL/SQL does the job better, and is what I usually end up implementing {within a package, along with a load of other count-table functions for several large tables}.

Here is a script to create a function that gets the current count(*) for the table.

--get_new_max
create or replace function get_tp_rowcount
return number
as
--
v_rtn       number;
v_extra_rc  number;
v_last_rc   number;
v_max_id    number;
v_last_max  number;
begin
  select max_id,row_count
  into v_last_max,v_last_rc
  from test_p_running_count
  where max_id=(select max(max_id) from test_p_running_count);
  select count(*),max(id)
  into v_extra_rc,v_max_id
  from test_p
  where id>v_last_max;
  -- debug output: last count, extra count, last max, new max
  dbms_output.put_line(to_char(v_last_rc)
                ||'~'||to_char(v_extra_rc)
                ||'~'||to_char(v_last_max)
                ||'~'||to_char(v_max_id));
  v_rtn :=v_last_rc+v_extra_rc;
  return v_rtn;
end;
/

And, as you can see below, it remains an efficient trick. I’ll leave it to you to create a procedure to update the count table and with the observation that you can then argue for several hours as to how often you update the count table… :-)
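{For what it is worth, a minimal sketch of such a procedure, reusing the function above – my own illustration, not part of a finished package – could be:}

create or replace procedure update_tp_rowcount
as
v_max_id    number;
v_row_count number;
begin
  -- current running total via the function, current high ID directly
  v_row_count := get_tp_rowcount;
  select max(id) into v_max_id from test_p;
  -- NB this still fails with a duplicate key if nothing new has been
  -- inserted, as discussed above
  insert into test_p_running_count (max_id,row_count)
  values (v_max_id,v_row_count);
end;
/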

select get_tp_rowcount from dual;

GET_TP_ROWCOUNT
---------------
          42299

1 row selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
| Id  | Operation        | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------
|   0 | SELECT STATEMENT |      |     1 |     2   (0)| 00:00:01 |
|   1 |  FAST DUAL       |      |     1 |     2   (0)| 00:00:01 |
-------------------------------------------------------------

Statistics
----------------------------------------------------------
          2  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads

--
--

select count(*) from test_p;

  COUNT(*)
----------
     42299

Execution Plan
----------------------------------------------------------
| Id | Operation | Name | Rows | Cost (%CPU)| Time | Pstart| Pstop |
----------------------------------------------------------
|   0 | SELECT STATEMENT       |       |     1 |    26   (4)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE        |       |     1 |            |          |       |       |
|   2 |   PARTITION RANGE ALL  |       | 41999 |    26   (4)| 00:00:01 |     1 |     5 |
|   3 |    INDEX FAST FULL SCAN| PT_PK | 41999 |    26   (4)| 00:00:01 |     1 |     5 |
----------------------------------------------------------
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        111  consistent gets
          0  physical reads

Cost of full table scans #2 June 30, 2009

Posted by mwidlake in performance, Uncategorized.

This is the second post in a series on Full Table Scans. Sorry to anyone waiting for part two, I got distracted {I suspect very few people were actually waiting :-)}. I make up for it by making this post veeeery long.

Post One – I got several comments, mostly pre-empting what I planned to say next {I am not complaining as such, I like to get feedback and Randolf Geist in particular was being very helpful}, but I’m going to say what I wanted to say anyway, especially as I was being purposefully misleading in my first post…

I’m attempting to show three things in this series of posts.

  1. How table scans are costed by the CBO under 10g and 11g, as it is quite different from before.
  2. That simple test cases of what you knew once can fail to work and leave you confused and questioning your knowledge.
  3. How levels of “truth” help you understand more as you dig deeper.
  4. As a fourth I might touch on how Oracle’s calculated cost does not always reflect how it actually executes SQL, but I need to work on that a little.

If you thought that db_file_multiblock_read_count is the number of blocks that Oracle reads in one go when scanning tables and indexes and that this improves performance, that will help you understand what is going on inside your Oracle database.
If you think that increasing the value will make full table scans look more efficient to Oracle and thus it will favour them, you are again right and it may well help you tune a system.
However, you may know more – you may know that Oracle modifies the value of the MBRC to make it more realistic. As the MBRC gets larger, things like the size of the segment, hitting the end of individual extents and the physical time taken to read data off disk and chuck it across the network make the efficiency gains of these larger values less significant. Each layer of knowledge helps a little more. Then the technology underneath shifts and you may not realise this, as the older knowledge still kind of helps.

I’m not trying to be smart or say “I know better” here; I’ve posted before that none of us knows this stuff until we are told, and I certainly did not know all of what is in these posts when I started preparing them 3 or 4 weeks ago. Some of what I plan to say I am still not sure of. But I’ll postpone the philosophy now for a later blog post, though as I have a rough version already written, here it is if you want.

Back to the technology and the costing of full table scans.

Here is my test script again.

It creates a couple of tables; the one of interest is TEST_BIG, holding 40,000 rows. TEST_BIG is 2304 blocks big but only 2267 blocks are occupied. As nothing has been deleted from the table, this is also the high water mark, i.e. the highest point in the table into which data has been inserted. Oracle keeps track of this high water mark. The table is in a locally managed, autoallocate tablespace with automatic segment space management. All the defaults.

This is my environment:-

NAME                           VALUE
------------------------------ --------------------
compatible                     11.1.0.0.0
cpu_count                      2
db_block_size                  8192
db_file_multiblock_read_count  128
optimizer_mode                 ALL_ROWS
sga_target                     0
sort_area_size                 65536

Just to reiterate, I have not gathered system statistics – the system is “out of the box”.

From Oracle 10 onwards, there are some default system statistics set. You can see these by looking at the sys.aux_stats$ table:-

select sname,pname,pval1,pval2
from sys.aux_stats$
order by sname,pname;

SNAME           PNAME                PVAL1 PVAL2
-----------------------------------------------------
SYSSTATS_INFO   DSTART                     10-15-2007 11:32
SYSSTATS_INFO   DSTOP                      10-15-2007 11:32
SYSSTATS_INFO   FLAGS                    1
SYSSTATS_INFO   STATUS                     COMPLETED
SYSSTATS_MAIN   CPUSPEED
SYSSTATS_MAIN   CPUSPEEDNW      1845.90945
SYSSTATS_MAIN   IOSEEKTIM               10
SYSSTATS_MAIN   IOTFRSPEED            4096
SYSSTATS_MAIN   MAXTHR
SYSSTATS_MAIN   MBRC
SYSSTATS_MAIN   MREADTIM
SYSSTATS_MAIN   SLAVETHR
SYSSTATS_MAIN   SREADTIM

There are only 3 values set {on both version 11g and 10g},

  • CPUSPEEDNW – an estimate of the speed of the CPU. This value is different on different servers. Maybe. Both my old V10 windows test boxes have a value of 484, even though one is a truly old laptop, yet the linux servers I have access to all have different values, ranging from 713 to 1778.
  • IOSEEKTIM – the time for the IO subsystem to locate the start of an item. In both V10 and 11, this is 10 milliseconds.
  • IOTFRSPEED – the volume of data that can be read from the IO subsystem in 1 millisecond – 4096 bytes or 4k, again in V10 and 11.

These default figures for IOSEEKTIM and IOTFRSPEED are fairly conservative, especially as they have not changed since V10, but then seek time has not really improved (especially on single discs).

In my experience, and looking around on the net, it seems that many sites on V10 upwards have not gathered system stats {BTW see the end of this post for some clarification on system statistics}. This seems to be due to one of several factors:

  • Some sites do not have a DBA or one who knows about system statistics {I’m not criticising, if you have been labelled a DBA and thrown the whole of the Oracle stack to look after, there is a lot to get your head around}.
  • Some DBAs think the system stats are gathered by the automatic stats job that {sort of} runs every night.
  • Changing to using system statistics would change execution plans and thus need full regression testing, some re-tuning, which there is no time or money for.
  • Hey, it works, why change things {this can also be viewed as  “I am fundamentally scared of change” :-) }.

I can understand all of the above, and as a result of them many of us benefit from understanding how the default system statistics work. I would suggest it is better to gather system statistics than not, as you are giving the CBO more accurate information about the actual speed of your CPU, your IO and the difference between single block reads and multi-block reads, but life is not perfect. {NB I’ve heard on the grapevine that you should not gather system stats if you run Oracle Apps 11i, but I am no expert on Oracle Apps!}
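{For reference, gathering system statistics is a couple of standard DBMS_STATS calls – this is the standard API, not something specific to this post:}

-- workload statistics: bracket a representative period of activity
exec dbms_stats.gather_system_stats('START')
-- ...let a representative workload run for a while...
exec dbms_stats.gather_system_stats('STOP')

-- or the simpler no-workload variant
exec dbms_stats.gather_system_stats('NOWORKLOAD')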

So what does Oracle do with the “out of the box” information? It calculates values for SREADTIM {the time to read a single block} and MREADTIM {the time taken to do a multi-block read}. If you think about how discs work, to read something the head of the disc must locate where the data is and physically move to it – this is the seek time – and then read the data.
So for a single block (8k) the time taken is

IOSEEKTIM + (size of one block / IOTFRSPEED)
i.e. 10 + (8192 bytes / 4096 bytes) = 12.

For a multi block read, the disc head will seek once and then read db_file_multiblock_read_count blocks at once {just accept this for now, please}. In my case, 128*8k.

IOSEEKTIM + ( (MBRC * size of one block) / IOTFRSPEED )
i.e. 10 + ( (128*8192) / 4096 )
= 10 + (1048576/4096)
= 10 + (1024k/4k) = 10 + 256 = 266

I don’t know about you but I can’t divide 1048576 by 4096 just like that, but 1024/4, yep, easier.

The CBO can only cost something if it can treat all parts in the same units, so it converts those multiblock read counts to equivalent single block read counts by dividing the time for the multi block read by the time for the single block read.
266/12 = 22.1667
{you will see this, rounded to 22.167, in the table below}.
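You can check this arithmetic straight in SQL {this is just the formulae above, nothing cleverer}:

select 10 + (8192/4096)                   sreadtim
,      10 + ((128*8192)/4096)             mreadtim
,      round( (10 + ((128*8192)/4096))
            / (10 + (8192/4096)) ,4)      ratio
from dual;

  SREADTIM   MREADTIM      RATIO
---------- ---------- ----------
        12        266    22.1667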

{Incidentally, it does this with the CPU cost as well.
Oracle makes a guess at how much cpu effort an activity will take and then divides it by the CPU speed and the time for a single block read – I’m being a little simplistic here, but it will do for now}.

So, a single block read is cost 1.
A multi block read of 128 blocks costs just over 22 single block reads, yet reads 128 times as much data.

In previous versions of Oracle this was all different. In 8i, single block reads and multi block reads were costed the same. In 9i without cpu costing {no cpu costing being the default, and I can’t remember ever seeing it changed on a live system} a “fudge factor” was introduced, where values for MBRC were reduced down to better match reality. And in my first post I misleadingly referred to a Fudge Factor and gave a table. Anyone who was used to this fudge factor may have noticed my numbers did not match, but then who of us remembers that the fudge factor for 8 as the multi block read count is 6.19? :-)

{That table is still useful; it shows an effective ratio for the MBRC, IF you have gathered no stats AND you have an 8k block size. OK, maybe it is not so useful…}

Anyway, as you can now see from the above formulae, Oracle on 10g and 11g now uses a fairly sensible piece of logic to decide on the relative costs of single and multi block reads. The CBO formula is basically saying you save the cost of multiple seek times when reading multiple blocks, but the time to transfer data is directly related to the volume of data read. This is probably reasonable on a system with a simple IO setup (like single disks or very basic RAID) but will break down with high-end storage, which has read-ahead and SAN caches and the like.

OK, I have to address one more issue before I demonstrate that the formula is correct. It is something that I did not know; it confused the hell out of me when I decided to do a “simple test” of what I thought I knew, and it prompted me to go on endlessly about how doing a simple test to prove what you have learned can go wrong and send you crazy with confusion. And thanks again to Randolf Geist for pointing me in the right direction.

At the end of the last post I had shown that my multiblock read count was 128 but acted like it was 8,
i.e. I got a cost of 617 for scanning TEST_BIG. When I manually set my MBRC to 8 I also got a cost of 617. When I manually set a value for MBRC of 128, it gave a cost of 395. Huh?!

Coskan alluded to this bit, but I missed it – I was ignorant about what he was saying.

On a newly created Oracle 11.1 test database, created as part of a simple install on windows vista, you have db_file_multiblock_read_count set to:-

show parameter db_file_multiblock_read

NAME                          VALUE
----------------------------- -----
db_file_multiblock_read_count 128

But there is a hidden parameter:-

NAME                          VALUE DESCRIPTION
----------------------------- ----- ------------------------------------------
_db_file_optimizer_read_count 8     multiblock read count for regular clients

{To see hidden parameters you need to be logged in as sys or use a trick like intermediate views – here is a script you can run as sys to see them. Warning, there are a LOT of them! I’ll do yet another post about hidden parameters sometime soon.}
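{The core of such a script is the well-known x$ksppi/x$ksppcv join – a sketch of the technique, not necessarily the linked script:}

-- run as sys: the x$ tables are not visible to ordinary users
select i.ksppinm   name
,      cv.ksppstvl value
,      i.ksppdesc  description
from   x$ksppi  i
,      x$ksppcv cv
where  i.indx = cv.indx
and    i.ksppinm = '_db_file_optimizer_read_count';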

How the hell are we supposed to know this?! Do Oracle do this on purpose to keep us awake at night worrying about it???
*sigh*.

On Oracle 10, this hidden initialization parameter defaults to 8 and MBRC defaults to 8 {usually – I just recreated a new test 10.2.0.1 database and both are set to 16 “out of the box”}. If you change MBRC, the hidden parameter alters too, to the same value, so you do not notice it so much.
On 11, MBRC is set to 128 and the hidden parameter is set to 8 initially. Once you change MBRC, the hidden _db_file_optimizer_read_count is also changed to match.

So, as my first paragraph on the first posting said, you go on to a test database to check something and you get confused as the very first “simple” test does not give you the results you expected.

OK, does the formula work?
In the below table I list:

  • The MBRC that I set by altering my session
  • The cost as reported by Explain Plan
  • The calculated MREADTIM = IOSEEKTIM + (MBRC*blocksize/IOTFRSPEED)
  • The calculated SREADTIM = IOSEEKTIM + (blocksize/IOTFRSPEED)
  • MREADTIM/SREADTIM {I’ll call it MT/ST}
  • Blocks in table/MBRC {I’ll call it blks/MBRC}
  • (blks/MBRC) * (MT/ST) to get a calculated cost in single block reads
MBRC  Cost  MRT  SRT  MT/ST   blks/MBRC  Calc cost
----  ----  ---  ---  ------  ---------  ---------
   2  1325   14   12   1.167   1133.5         1323
   3  1010   16   12   1.333    755.667       1008
   4   853   18   12   1.5      566.75         851
   6   695   22   12   1.833    377.833        693
   8   617   26   12   2.167    283.375        615
  10   569   30   12   2.5      226.7          567
  12   538   34   12   2.833    188.917        536
  16   499   42   12   3.5      141.688        496
  20   475   50   12   4.167    113.35         473
  24   459   58   12   4.833     94.458        457
  32   440   74   12   6.167     70.844        437
  33   438   76   12   6.333     68.697        436
  64   410  138   12  11.5       35.422        408
 128   395  266   12  22.167     17.711        393

Yes :-)
The calculated cost is always 2 less than the cost given by the CBO. I know that under Oracle 10, 1 is added to the cost for reading the table header. I am presuming that an additional 1 has been added for some reason under 11.
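To reproduce the MBRC=128 row of the table in SQL {2267 blocks, an 8k block size and the formulae from earlier}:

select round( (2267/128)
            * (10 + (128*8192/4096))
            / (10 + (8192/4096)) ) calc_cost
from dual;

 CALC_COST
----------
       393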

That will do for now.

Oh, as promised, some comments on system statistics

Oracle System statistics are NOT statistics collected on the tables and indexes owned by users SYS or SYSTEM. Those would be “Dictionary Statistics”. They can be collected via specific calls to DBMS_STATS.GATHER_DICTIONARY_STATS and {if you have not turned off the automatic gathering of object statistics} are gathered automatically during weekday nights and over the weekend by the automated job, which gathers table and index stats when it feels they are needed. I’ll be posting about that {probably at inordinate length} in a month or so.
System Statistics are statistics about how the underlying computer system works, specifically cpu speed and IO subsystem performance.
Even Oracle Support can get confused over this. You might like to check out this unfortunate experience on the topic. I have to say, I have much sympathy with Piet, as I had very similar issues trying to discuss automated stats gathering with Oracle a couple of years ago. Having said that, in the end I got good support on the topic.

cost of full table scans June 17, 2009

Posted by mwidlake in performance.

As many sites point out, the cost as stated in explain plan is {being a little simplistic} a calculation by the CBO of the number of IO operations it thinks will be needed to complete that step of the plan. {It includes the cost of any previous step or steps that feed into it but many examples are nice and simple and have only a couple of steps in them}. 
However, it can be confusing when you decide to look at some simple examples and don’t see what you expect, as you start with simple table scans.

My test script can be found here.

I set up a simple pair of test tables

-- create a table with 40k rows, an ascending numeric PK
-- a unique VC
-- a non unique numeric
-- a sparsely populated numeric
-- and some padding fields.
create table test_big
as select rownum  id
,mod(rownum,10)  num_1to10
,trunc(dbms_random.value(1,1000)) num_1k_vals
,'A'||to_char(rownum+1000) id_2
,trunc(dbms_random.value(1,100)) num_100_vals
,mod(rownum,5)  num_4
,decode(mod(rownum,10),0,trunc(dbms_random.value(1,30))
        ,null) num_sparse
,lpad('A',50,'A') vc_1
,lpad('B',50,'B') vc_2
,lpad('C',250,'C') vc_3
from dba_objects
where rownum < 40001
/
create table test_small
as select rownum  id
,mod(rownum,10)  num_1to10
,trunc(dbms_random.value(1,1000)) num_1k_vals
,'A'||to_char(rownum+1000) id_2
,trunc(dbms_random.value(1,100)) num_100_vals
,mod(rownum,5)  num_4
,decode(mod(rownum,10),0,trunc(dbms_random.value(1,30))
        ,null) num_sparse
,lpad('A',50,'A') vc_1
,lpad('B',50,'B') vc_2
,lpad('C',250,'C') vc_3
from dba_objects
where rownum < 101
/
-- now add statistics
-- NB each exec must be on one line
exec dbms_stats.gather_table_stats
(ownname=>user,tabname=>'TEST_BIG'
,estimate_percent=>100,cascade=>true)
exec dbms_stats.gather_table_stats
(ownname=>user,tabname=>'TEST_SMALL'
,estimate_percent=>100,cascade=>true)
--
desc test_BIG
DESC TEST_SMALL
--
set autotrace on
set timi on
set pause off
select count(*) from test_big;
select count(*) from test_small;

I’m running in Oracle 11.1.0.6 on a windows vista laptop, multi block read count is 128, no system statistics have been collected.

It creates one table with 40,000 records and one with 100 records and gathers stats on them. I’ll now select count(*) from both and look at the plans.

select count(*) from test_big;
  COUNT(*)
----------
     40000

Execution Plan
-----------------------------------------------------
| Id  | Operation          | Name     | Rows  | Cost (%CPU)|
-----------------------------------------------------
|   0 | SELECT STATEMENT   |          |     1 |   617   (1)|
|   1 |  SORT AGGREGATE    |          |     1 |            |
|   2 |   TABLE ACCESS FULL| TEST_BIG | 40000 |   617   (1)|
-----------------------------------------------------

select count(*) from test_small;

  COUNT(*)
----------
       100

Execution Plan
-----------------------------------------------------
| Id  | Operation          | Name       | Rows  | Cost (%CPU)|
-----------------------------------------------------
|   0 | SELECT STATEMENT   |            |     1 |     4   (0)|
|   1 |  SORT AGGREGATE    |            |     1 |            |
|   2 |   TABLE ACCESS FULL| TEST_SMALL |   100 |     4   (0)|
-----------------------------------------------------

The costs of full table scans are 617 for the large table and 4 for the small table. We have full stats on the tables and they are just simple tables, no indexes, we should be able to see where the numbers come from.

Any guesses? Well, it could be the number of blocks making up the objects:-

select table_name,num_rows,blocks,empty_blocks
from dba_tab_statistics
where owner=user
and table_name in ('TEST_SMALL','TEST_BIG');

TABLE_NAME   NUM_ROWS     BLOCKS EMPTY_BLOCKS
---------- ---------- ---------- ------------
TEST_BIG        40000       2267            0
TEST_SMALL        100          9            0

Err, no. 2267 blocks in TEST_BIG and 9 blocks in TEST_SMALL.
{btw the view DBA_TAB_STATISTICS holds a subset of the information on DBA_TABLES and I tend to use it if I just want stats info}.

It could be the number of blocks in the segment, maybe?
Run @seg_dets.sql.

Enter value for seg_name: test
Enter value for owner: t

SEG_NAME    SEG TS_NAME  BYTES_K  BLOCKS exts  INI_K NXT_K
----------- --- ------- -------- ------- ---- ------ -----
TEST_BIG    TAB DATA_01   18,432   2,304   33     64
TEST_SMALL  TAB DATA_01      128      16    2     64

No, it isn’t that. 2304 is not very different from 2267, as you would expect as we simply created and populated the table.
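{seg_dets.sql is my own script; a rough stand-in against DBA_SEGMENTS, with the column aliases guessed, would be:}

select segment_name             seg_name
,      substr(segment_type,1,3) seg
,      tablespace_name          ts_name
,      bytes/1024               bytes_k
,      blocks
,      extents                  exts
,      initial_extent/1024      ini_k
,      next_extent/1024         nxt_k
from   dba_segments
where  owner        like upper('&owner')||'%'
and    segment_name like upper('&seg_name')||'%';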

Hands up anyone who is thinking about db_file_multiblock_read_count? MBRC is the number of blocks Oracle will try and select from a segment in one I/O operation if it is scanning it. Well, MBRC is set to 128 on this system {it’s “out of the box” and I question it being that high actually. On 10.2 on windows it is, I think, 8}.

That sounds incredibly sensible; we can check if it has scanned all those blocks in chunks as dictated by the MBRC. Well, divide 2304 {the number of blocks used by the segment} by 617 {the I/O cost} and you get… 3.734. Nothing like the MBRC.

What’s going on?!!!


Well, Oracle modifies the value of db_file_multiblock_read_count before using it in calculations. But it IS the source of the calculation. You can change MBRC at a session level to check:-

alter session set db_file_multiblock_read_count=4;
Session altered.

select count(*) from test_big;
  COUNT(*)
----------
     40000
Execution Plan
----------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Cost (%CPU)|
----------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     1 |   853   (1)|
|   1 |  SORT AGGREGATE    |          |     1 |            |
|   2 |   TABLE ACCESS FULL| TEST_BIG | 40000 |   853   (1)|
----------------------------------------------------------

alter session set db_file_multiblock_read_count=8;
Session altered.

select count(*) from test_big;
  COUNT(*)
----------
     40000
Execution Plan
----------------------------------------------------------
| Id  | Operation          | Name     | Rows  | Cost (%CPU)|
----------------------------------------------------------
|   0 | SELECT STATEMENT   |          |     1 |   617   (1)|
|   1 |  SORT AGGREGATE    |          |     1 |            |
|   2 |   TABLE ACCESS FULL| TEST_BIG | 40000 |   617   (1)|
----------------------------------------------------------

At an MBRC of 4 the CBO calculates a cost of 853 to scan the TEST_BIG table and at an MBRC of 8 the CBO calculates a cost of 617.

This script here will run through a set of values for db_file_multiblock_read_count and run the count(*), so you can see the cost going down as MBRC goes up, while Oracle discounts the effect of the parameter as it gets larger.

To save you the effort, these are the costs at each value of MBRC I checked, and I calculate the modified MBRC {based on 2267 blocks; 2267 is the best number to use as it is likely to be the high water mark or very close}.

MBRC   Cost  Modified MBRC
----   ----  -------------
  2    1325  1.711
  3    1010  2.245 -- not a Magic Number
  4     853  2.658
  6     695  3.262
  8     617  3.674
 10     569  3.984
 12     538  4.214
 16     499  4.543
 20     475  4.773
 24     459  4.939
 32     440  5.152
 33     438  5.176 -- not a Magic Number
 64     410  5.529
128     395  5.729
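{For the record, the modified MBRC is simply the high water mark divided by the reported cost, e.g. for an MBRC of 8:}

select round(2267/617,3) modified_mbrc from dual;

MODIFIED_MBRC
-------------
        3.674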

What do I mean by the comment “not a Magic Number”? Well, MBRC is normally set to 2, 4, 8, 16, 32, 64… a power of 2. All examples I can remember seeing used a Magic Number, so I thought I would check if Oracle does any rounding to such numbers or insists on even values. And it does not.
{quick comment – the table has changed since the first post, I stuck the segment size not the hwm into my spreadsheet}

A few final comments for today:

  • How in heck can you know about this fudge unless someone tells you? And yes, I only know because someone told me.
  • The values I gained on 10.2 were very, very similar, consistently a cost 3 higher than on 11.1
  • Why does scanning 9 blocks have a cost of 4?
  • Why was I getting a cost of 617 with the default MBRC of 128 and yet when I set my session to have an MBRC of 128 the cost is calculated at 395?

I don’t know all the answers to the above yet.

 
