jump to navigation

Counting the Cost #3 May 22, 2009

Posted by mwidlake in performance.
Tags: , ,
trackback

I’m sorry for the delay, I will now continue to look at the fastest way to count the rows in a table and what we can learn from it.

Today I will show you how you could potentially add a bitmap index to greatly speed up the count.

I’ve got a new test script that you can run that does most of the steps below. You can get it from here.

See this first posting for the details of setting up the table and showing that select count(*), count(1) and count(pk) all take the same amount of time and do the same work internally and it probably matters not one jot which you use.

See the second posting for what happens when you select count (column_name) from a table and why I think it is a poor idea to use it to count the number of rows in a table.

Previously I had shown that Oracle chooses to use primary key index to count the number of rows in a table – which has been the fastest way so far.

Oracle will actually use the smallest index it can to count the number of rows. Generally, the smallest index you can create is a bitmap index, as it is stored in such a compact manner. Just take my word on it.

OK, I suppose I better give some supporting evidence rather than just my word.
This is the table:

Name                             Null?    Type
 --------------------------------------- -------- 
 ID                             NOT NULL NUMBER
 NUM_1                                     NUMBER
 ID_2                                        VARCHAR2(41)
 NUM_2                                     NUMBER
 NUM_3                                     NUMBER
 NUM_4                                     NUMBER
 VC_1                                        VARCHAR2(50)
 VC_2                                        VARCHAR2(50)
 VC_3                                        VARCHAR2(250)

These are the sizes of the table and indexes on the table. The column used/leaf takes into account empty blocks in a table and just the leaf blocks in an index, which are the blocks scanned in a full scan:

                                            USED/ 
SEGMENT     TYPE     BLOCKS  LEAF  SIZE_K
--------- -------- ------- ----- ------
TEST_1       TABLE        1152 1079  9,216
T_PK           INDEX            48    41      384 Primary key on ID
T_UQ           INDEX            56    47      448 Unique idx on ID2
T_NONUQ     INDEX           48    39      384 nonunique on NUM
T_SPARSE     INDEX              8     4        64 on NUM4, 95% null

I am now going to create a simple bitmap index on NUM_3, a non-mandatory column but it does have a non-null value in every row.

Now to do something that can make counts faster, add a nice, compact bitmap index.

create bitmap index t_bm on test_1(num_3);
Index created.
Elapsed: 00:00:00.04

test102>analyze table test_1 compute statistics;
Table analyzed.
Elapsed: 00:00:00.29

I’ve gathered stats again so that the CBO knows about the index. Let’s start with a count of the column. We would expect that to use the new index.

— count on the indexed column
select count(num_3) from test_1;

COUNT(NUM_3)
————
19999

Execution Plan
———————————————————-
Plan hash value: 864032431
————————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
————————————————————————————–
| 0 | SELECT STATEMENT | | 1 | 2 | 5 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 2 | | |
| 2 | BITMAP CONVERSION TO ROWIDS | | 19999 | 39998 | 5 (0)| 00:00:01 |
| 3 | BITMAP INDEX FAST FULL SCAN| T_BM | | | | |
————————————————————————————–

Statistics
———————————————————-
1 recursive calls
0 db block gets
10 consistent gets
0 physical reads

As you can see, the CBO is using the new index and the cost has dropped down to 5 {please note, this is the estimated cost, but it is accurate in this case. I might do a blog entry on testing changes in very quick statements}. Consistent Gets is down to 10.

Let’s compare that to what was previously the fastest way to count the records, count(*) {or count(1) {or count(pk)}}, which had a cost of 11 and 46 consistent gets.

COUNT(*)
———-
19999

———————————————————————-
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
———————————————————————-
| 0 | SELECT STATEMENT | | 1 | 11 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| T_PK | 19999 | 11 (0)| 00:00:01 |
———————————————————————-

Statistics
———————————————————-
0 recursive calls
0 db block gets
46 consistent gets
0 physical reads

How big is the new index?

SEGMENT    TYPE      BLOCKS  LEAF  SIZE_K
---------- ------- ------- ----- --------
T_BM          INDEX           16         5       128
T_PK           INDEX           48       41       384 

I’ll run the count of NUM_3 a second time to get rid of any parse overhead.

select count(num_3) from test_1;

COUNT(NUM_3)
————
19999

Execution Plan
Plan hash value: 864032431
————————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
————————————————————————————–
| 0 | SELECT STATEMENT | | 1 | 2 | 5 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 2 | | |
| 2 | BITMAP CONVERSION TO ROWIDS | | 19999 | 39998 | 5 (0)| 00:00:01 |
| 3 | BITMAP INDEX FAST FULL SCAN| T_BM | | | | |
————————————————————————————–

Statistics
———————————————————-
0 recursive calls
0 db block gets
10 consistent gets

The bitmap index has 5 leaf blocks, a cost of 5 and there are 10 consistent gets.
The primary key has 41 leaf blocks, a cost of 11 and 46 consistent gets.

{Don’t worry too much about the relationship between blocks, costs and consistent gets, I’ll cover that in a later posting}

Bitmap indexes are unlike normal Oracle indexes in that they also index nulls. Thus a count (*) on the table can be answered by the CBO by using the smaller bitmap index. Here is the proof (second run only shown)

— traditional count(*)
select count(*) from test_1;

COUNT(*)
———-
19999

Execution Plan
———————————————————-
Plan hash value: 1066692852
——————————————————————————
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
——————————————————————————
| 0 | SELECT STATEMENT | | 1 | 5 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | BITMAP CONVERSION COUNT | | 19999 | 5 (0)| 00:00:01 |
| 3 | BITMAP INDEX FAST FULL SCAN| T_BM | | | |
——————————————————————————

Statistics
———————————————————-
0 recursive calls
0 db block gets
10 consistent gets

We see the same stats and cost, but look carefully and you will see that the Plan Hash Value is different and the plan is slightly different. The BITMAP CONVERSION TO ROWIDS has become a BITMAP CONVERSION COUNT. This is because, when we said count(NUM_3) we were asking for a count of NON-NULL values for column NUM_3, so the CBO had to handle that. Count(*) just wants to know how many rows there are.
{I am not sure exactly is being done by the conversion to rowids in this case as the CBO has no need to visit the table. It should be able to just ignore the bitmap for null values. Maybe it does so but this is how Explain Plan reports the activity. Maybe I should ask Jonathan Lewis :-)}

Here are just two quick proofs that select count (PK) and count (1) will both use this smaller index:

— count on pk column
select count(id) from test_1;

COUNT(ID)
———-
19999

Execution Plan
———————————————————-
Plan hash value: 1066692852
——————————————————————————
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
——————————————————————————
| 0 | SELECT STATEMENT | | 1 | 5 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | BITMAP CONVERSION COUNT | | 19999 | 5 (0)| 00:00:01 |
| 3 | BITMAP INDEX FAST FULL SCAN| T_BM | | | |
——————————————————————————

Statistics
———————————————————-
1 recursive calls
0 db block gets
10 consistent gets

— count on a static value
select count(1) from test_1;

COUNT(1)
———-
19999

Execution Plan
———————————————————-
Plan hash value: 1066692852
——————————————————————————
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
——————————————————————————
| 0 | SELECT STATEMENT | | 1 | 5 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | BITMAP CONVERSION COUNT | | 19999 | 5 (0)| 00:00:01 |
| 3 | BITMAP INDEX FAST FULL SCAN| T_BM | | | |
——————————————————————————

Statistics
———————————————————-
1 recursive calls
0 db block gets
10 consistent gets

Finally for this post (and I know, I do go on a bit) I will show that the CBO will use the bitmap to count the number of values in a column that it knows is mandatory – presumably someone in the Oracle Optimisation team recognised that such a count is counting the number of rows, so just count the number of rows the quickest way possible (which is to use the smallest index which is on a mandatory column or is a bitmap index, I suspect).

First, a count on a column that is NOT mandatory, which has a unique index on it. The CBO uses the supporting index:-

–count on a non-pk unique column
select count(id_2) from test_1;

COUNT(ID_2)
———–
19999

Execution Plan
———————————————————-
Plan hash value: 341132572
——————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
——————————————————————————
| 0 | SELECT STATEMENT | | 1 | 6 | 12 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
| 2 | INDEX FAST FULL SCAN| T_UQ | 19999 | 117K| 12 (0)| 00:00:01 |
——————————————————————————

Statistics
———————————————————-
0 recursive calls
0 db block gets
52 consistent gets
0 physical reads

Cost of 12, 52 consistent gets. Now let us make the column mandatory, analyze the table and select count (NUM_2) again {I show the second run only, as the first run has the parse overhead}

alter table test_1
modify num_2 not null;

Table altered.

test102>analyze table test_1 compute statistics;

Table analyzed.

select count(num_2) from test_1;

COUNT(NUM_2)
————
19999

Execution Plan
———————————————————-
Plan hash value: 1066692852
——————————————————————————
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
——————————————————————————
| 0 | SELECT STATEMENT | | 1 | 5 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | BITMAP CONVERSION COUNT | | 19999 | 5 (0)| 00:00:01 |
| 3 | BITMAP INDEX FAST FULL SCAN| T_BM | | | |
——————————————————————————

Statistics
———————————————————-
0 recursive calls
0 db block gets
10 consistent gets
0 physical reads

And there we go, a cost of 5, consistent gets of 10 and the same plan/plan hash value as count(*).
I still would not select count(column) as a general rule UNLESS I wanted to count non-null values of that column, but the CBO will step in and use the most efficient way it can to count the rows in a table, if it can identify that counting all the rows is what you really want to do.

I’ve managed to write a lot and learn a lot from a very, very simple statement!

My next post will be on even faster ways of {sort of} getting the number of rows in a table.

Comments»

1. PdV - May 23, 2009

Fascination stuff.
Just out of the blue, I’m thinking: The fastest way to keep count of something is to have an object (index?) that is guaranteed to “know” of all the rows.
Unless you are thinking of a running-total-trigger or an MVIEW with count(1) in it. If it is at all possible, that would be a bit of a stunt, just for keeping count.
But not sure if it would be efficient (on commit…).

2. Counting the Cost #4 – The speedy way « Martin Widlake’s Yet Another Oracle Blog - May 26, 2009

[…] are used when you count a single column in a table and why it is not always a good idea. The third post shows how Oracle will use the smallest index it can to select count(*) and in particular how a […]


Leave a reply to PdV Cancel reply