jump to navigation

Counting the Cost #2 May 19, 2009

Posted by mwidlake in performance.
Tags: , ,
trackback

Let’s continue where I left off. I had shown that select count(*), count(1), count(pk) were equivalent under Oracle 10 onwards (and I am pretty sure Oracle 9 and 11, but you would need to check).

So, looking at this simple select count(whatever) from a table, what else can we establish? Please look back at the prior posting to see the test table and indexes I created and basic environment info, but to summarise:-

select version,archiver,database_status from v$instance;

VERSION           ARCHIVE DATABASE_STATUS
----------------- ------- -----------------
10.2.0.1.0        STARTED ACTIVE

select substr(name,1,30) name,substr(value,1,20) value
from v$parameter
where name in ('db_block_size','compatible','cpu_count'
              ,'db_file_multiblock_read_count'
             ,'optimizer_mode','sga_target','sort_area_size')
order by name/

NAME                           VALUE
------------------------------ --------------------
compatible                     10.2.0.1.0
cpu_count                      1
db_block_size                  8192
db_file_multiblock_read_count  16
optimizer_mode                 ALL_ROWS
sga_target                     612368384
sort_area_size                 65536

create table test_1
as select rownum id
,mod(rownum,10) num_1
,’A’||to_char(rownum+1000) id_2
,trunc(dbms_random.value(1,100)) num_2
,mod(rownum,5) num_3
,decode(mod(rownum,10)
,0,trunc(dbms_random.value(1,30))
,null) num_4
,lpad(‘A’,50,’A’) vc_1
,lpad(‘B’,50,’B’) vc_2
,lpad(‘C’,250,’C’) vc_3
from dba_objects
where rownum < 20000; desc test_1 Name Null? Type ---------------------------- ------- -------------- ID NOT NULL NUMBER NUM_1 NUMBER ID_2 VARCHAR2(41) NUM_2 NUMBER NUM_3 NUMBER NUM_4 NUMBER VC_1 VARCHAR2(50) VC_2 VARCHAR2(50) VC_3 VARCHAR2(250) [/sourcecode]
So, count(*), count((pk_column) and count (1) had proved to be equivalent. This was the stable plan and stats from them (all of them):

Execution Plan
———————————————————-
Plan hash value: 454320086
———————————————————————-
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
———————————————————————-
| 0 | SELECT STATEMENT | | 1 | 11 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| T_PK | 19999 | 11 (0)| 00:00:01 |
———————————————————————-

Statistics
———————————————————-
1 recursive calls
0 db block gets
46 consistent gets
0 physical reads
0 redo size
414 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed

Let us see what happens when we count a column that is indexed uniquely but is not set as a not null column:

— count on a none PK unique column
— (but with no unique constraint or set to not-null
select count(id_2) from test_1;

COUNT(ID_2)
———–
19999

Execution Plan
———————————————————-
Plan hash value: 341132572
——————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
——————————————————————————
| 0 | SELECT STATEMENT | | 1 | 6 | 12 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
| 2 | INDEX FAST FULL SCAN| T_UQ | 19999 | 117K| 12 (0)| 00:00:01 |
——————————————————————————

Statistics
———————————————————-
1 recursive calls
0 db block gets
52 consistent gets
0 physical reads

The Execution plan has a new hash plan value and looks at a new index.
This is because we have told Oracle to count the new column. What we are actually asking is “give me a count of all rows with a value set for the column ID_2.
The column is not mandatory, ie set to not-null (check the tables description above), there could be many null values set for this column.
(Many null values do not break the UNIQUE constraint. Null, in Oracle terms, is a value that could be anything. Anything does not match anything. Thus you can have 0, 1 or many null values for a non-mandatory but unique column. A Primary key column, incidentally, MUST be mandatory.).
A count on the column will count all non-null values. So it uses the index on that column.
Let’s run it again, to get stats that do not include any parse overhead.:

select count(id_2) from test_1;

COUNT(ID_2)
———–
19999

Execution Plan
———————————————————-
Plan hash value: 341132572
——————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
——————————————————————————
| 0 | SELECT STATEMENT | | 1 | 6 | 12 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
| 2 | INDEX FAST FULL SCAN| T_UQ | 19999 | 117K| 12 (0)| 00:00:01 |
——————————————————————————

Statistics
———————————————————-
0 recursive calls
0 db block gets
52 consistent gets

52 consistent gets, which is more than for the scans of index T_PK. The cost is also one higher, 12 as opposed to 11. This shows that the Cost Based Optimiser is correctly identifying that count(ID_2) will take more effort than count(*), and this is borne out by the number of consistent gets.
Index T_UQ is a larger index (I did check). Oracle can’t use the smaller primary key index T_PK as this index does not hold information on column ID_2.

Now, let us select count of a column that is not unique but indexed. Column NUM_2. Remember, what we are actually asking is “give me a count of all rows with a value set for the column NUM_2.
What we see is:

— count on a non-unique indexed column
select count(num_2) from test_1;

COUNT(NUM_2)
————
19999

Execution Plan
———————————————————-
Plan hash value: 225025118
———————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
———————————————————————————
| 0 | SELECT STATEMENT | | 1 | 2 | 10 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 2 | | |
| 2 | INDEX FAST FULL SCAN| T_NONUQ | 19999 | 39998 | 10 (0)| 00:00:01 |
———————————————————————————

Statistics
———————————————————-
1 recursive calls
0 db block gets
44 consistent gets
0 physical reads

There is a new plan hash value and plan. Again, it needs to check for rows with the column specified set to a value, and there is an index to support this, T_NONUQ and it does a fast scan of it. Second run:

select count(num_2) from test_1;

COUNT(NUM_2)
————
19999

Execution Plan
———————————————————-
Plan hash value: 225025118
———————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
———————————————————————————
| 0 | SELECT STATEMENT | | 1 | 2 | 10 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 2 | | |
| 2 | INDEX FAST FULL SCAN| T_NONUQ | 19999 | 39998 | 10 (0)| 00:00:01 |
———————————————————————————

Statistics
———————————————————-
0 recursive calls
0 db block gets
44 consistent gets
0 physical reads

44 consistent gets and a cost of 10. Hang on, that is less effort than count(*), count(1) or count(id). It is a smaller index. Why did the CBO not use this index to get the count of records in the table? Because, we just asked Oracle to count the rows with a value set for the NUM_2 column. The column is not mandatory, Oracle cannot guarantee there are no null values, so a count of the entries in the index would not be guaranteed to be the number of records in the table.

If you know a little more about Oracle, you may wonder if Oracle could check the column stats to see how many null values there are and, if there are none, us that smaller index for the count(*). But null values may have been created since the stats were gathered. If the CBO tried to use this trick, it could easily give the wrong answer, and then how would you feel about that licence fee?

As a further aside, I once came across a claim that counting a column with a unique index would be faster than counting a column with a non-unique index. They could have been right, depending on the size of the indexes, but they were not doing what they thought they were. They were counting the number of records with a value set for that column.

For this reason I would shy away from using count(column) to count the records in a table. If the column is mandatory, you will be counting the number of records, but your code is saying “count how often that column has a value”. As a pedant I feel it’s just wrong. As an experienced DBA who has seen mandatory columns made non-mandatory in live systems, I feel it is also slightly dangerous. Having also seen indexes, even unique indexes, be dropped from live systems, the performance impact could be significant, as you will soon see.

OK, I have been careful so far to count only columns with indexes on them. Let us count a column than has no index on it.

— count on an unindexed column
select count(vc_1) from test_1;

COUNT(VC_1)
———–
19999

Execution Plan
———————————————————-
Plan hash value: 479375906
—————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————–
| 0 | SELECT STATEMENT | | 1 | 50 | 240 (1)| 00:00:03 |
| 1 | SORT AGGREGATE | | 1 | 50 | | |
| 2 | TABLE ACCESS FULL| TEST_1 | 19999 | 976K| 240 (1)| 00:00:03 |
—————————————————————————–

Statistics
———————————————————-
1 recursive calls
0 db block gets
1058 consistent gets
0 physical reads

There is a new plan hash value and plan. Now a full table scan is being used and the cost has shot up. The CBO has no index holding information on this column, so it has to scan the whole table. Let’s do the second run:

select count(vc_1) from test_1;

COUNT(VC_1)
———–
19999

Execution Plan
———————————————————-
Plan hash value: 479375906
—————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
—————————————————————————–
| 0 | SELECT STATEMENT | | 1 | 50 | 240 (1)| 00:00:03 |
| 1 | SORT AGGREGATE | | 1 | 50 | | |
| 2 | TABLE ACCESS FULL| TEST_1 | 19999 | 976K| 240 (1)| 00:00:03 |
—————————————————————————–

Statistics
———————————————————-
0 recursive calls
0 db block gets
1058 consistent gets
0 physical reads

A cost of 240 (the estimated effort) and buffer gets of 1058 (the actual effort). That is a lot more than scanning an index that is considerably smaller than the table. And now you can see why dropping an index that was on a column someone was counting (rightly or wrongly) can have an impact on performance.

Finally for now, let’s count a column that is sparsely populated, column NUM_4. Oh, I have only included the second run, so any parse overhead is gone:

— select on a sparse, indexed column
select count(num_4) from test_1;

COUNT(NUM_4)
————
1999

Execution Plan
———————————————————-
Plan hash value: 2410540583
———————————————————————————-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
———————————————————————————-
| 0 | SELECT STATEMENT | | 1 | 2 | 3 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 2 | | |
| 2 | INDEX FAST FULL SCAN| T_SPARSE | 19999 | 39998 | 3 (0)| 00:00:01 |
———————————————————————————-

Statistics
———————————————————-
0 recursive calls
0 db block gets
7 consistent gets
0 physical reads

A new plan hash value and a new plan. The CBO is using the index on the sparse column. The cost is 3 and the consistent gets only 7, a lot lower than we have seen before.. And the number of records is 1,999, not 19,999. We have counted the number of records with a value set for NUM_4. It needs so many fewer consistent gets as the index is a lot smaller.

This can be used as a performance trick, if you regularly want to count a small number of records in a large table, for example active orders in an orders table, which holds orders going back years and 99.9% of them are complete:

Have a column called ACTIVE, a CHAR(1) column.
Set it to a value, eg ‘Y’, when the record is created.
Set it to null when the order is complete.
Re-set it to ‘Y’ if the order ever becomes valid again.

A count on the column will use the very small index (helped by the fact the column is a CHAR(1)) to count the records. This little index will also allow the active records to be identified in other select queries also.

Don’t forget though, that indexes on the table has to be maintained by oracle and it does take up some space, so there is a cost to having it. Also, I have not mentioned what happens to that index over time… That might be for a later posting, but if you look at Richard Footes blog, you may well find something about such indexes growing over time.

My next post on this subject will consider if there is a way to count the records in a table even quicker than count(*).

Comments»

1. Counting the Cost « Martin Widlake’s Yet Another Oracle Blog - May 22, 2009

[…] the second posting for what happens when you select count (column_name) from a table and why I think it is a poor idea […]

2. Counting the Cost #4 – The speedy way « Martin Widlake’s Yet Another Oracle Blog - May 26, 2009

[…] etc are immaterial with modern version of Oracle – they all take the same effort by the CBO. This next post looks at how indexes are used when you count a single column in a table and why it is not always a […]


Leave a reply to Counting the Cost « Martin Widlake’s Yet Another Oracle Blog Cancel reply