Counting the Cost #3 May 22, 2009
Posted by mwidlake in performance.Tags: explain plan, performance, SQL
trackback
I’m sorry for the delay, I will now continue to look at the fastest way to count the rows in a table and what we can learn from it.
Today I will show you how you could potentially add a bitmap index to greatly speed up the count.
I’ve got a new test script that you can run that does most of the steps below. You can get it from here.
See this first posting for the details of setting up the table and showing that select count(*), count(1) and count(pk) all take the same amount of time and do the same work internally and it probably matters not one jot which you use.
See the second posting for what happens when you select count (column_name) from a table and why I think it is a poor idea to use it to count the number of rows in a table.
Previously I had shown that Oracle chooses to use primary key index to count the number of rows in a table – which has been the fastest way so far.
Oracle will actually use the smallest index it can to count the number of rows. Generally, the smallest index you can create is a bitmap index, as it is stored in such a compact manner. Just take my word on it.
OK, I suppose I better give some supporting evidence rather than just my word.
This is the table:
Name Null? Type --------------------------------------- -------- ID NOT NULL NUMBER NUM_1 NUMBER ID_2 VARCHAR2(41) NUM_2 NUMBER NUM_3 NUMBER NUM_4 NUMBER VC_1 VARCHAR2(50) VC_2 VARCHAR2(50) VC_3 VARCHAR2(250)
These are the sizes of the table and indexes on the table. The column used/leaf takes into account empty blocks in a table and just the leaf blocks in an index, which are the blocks scanned in a full scan:
USED/ SEGMENT TYPE BLOCKS LEAF SIZE_K --------- -------- ------- ----- ------ TEST_1 TABLE 1152 1079 9,216 T_PK INDEX 48 41 384 Primary key on ID T_UQ INDEX 56 47 448 Unique idx on ID2 T_NONUQ INDEX 48 39 384 nonunique on NUM T_SPARSE INDEX 8 4 64 on NUM4, 95% null
I am now going to create a simple bitmap index on NUM_3, a non-mandatory column but it does have a non-null value in every row.
Now to do something that can make counts faster, add a nice, compact bitmap index.
create bitmap index t_bm on test_1(num_3);
Index created.
Elapsed: 00:00:00.04
test102>analyze table test_1 compute statistics;
Table analyzed.
Elapsed: 00:00:00.29
I’ve gathered stats again so that the CBO knows about the index. Let’s start with a count of the column. We would expect that to use the new index.
— count on the indexed column
select count(num_3) from test_1;
COUNT(NUM_3)
————
19999
Execution Plan
———————————————————-
Plan hash value: 864032431
————————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
————————————————————————————–
| 0 | SELECT STATEMENT | | 1 | 2 | 5 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 2 | | |
| 2 | BITMAP CONVERSION TO ROWIDS | | 19999 | 39998 | 5 (0)| 00:00:01 |
| 3 | BITMAP INDEX FAST FULL SCAN| T_BM | | | | |
————————————————————————————–
Statistics
———————————————————-
1 recursive calls
0 db block gets
10 consistent gets
0 physical reads
As you can see, the CBO is using the new index and the cost has dropped down to 5 {please note, this is the estimated cost, but it is accurate in this case. I might do a blog entry on testing changes in very quick statements}. Consistent Gets is down to 10.
Let’s compare that to what was previously the fastest way to count the records, count(*) {or count(1) {or count(pk)}}, which had a cost of 11 and 46 consistent gets.
COUNT(*)
———-
19999
———————————————————————-
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
———————————————————————-
| 0 | SELECT STATEMENT | | 1 | 11 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | INDEX FAST FULL SCAN| T_PK | 19999 | 11 (0)| 00:00:01 |
———————————————————————-
Statistics
———————————————————-
0 recursive calls
0 db block gets
46 consistent gets
0 physical reads
How big is the new index?
SEGMENT TYPE BLOCKS LEAF SIZE_K ---------- ------- ------- ----- -------- T_BM INDEX 16 5 128 T_PK INDEX 48 41 384
I’ll run the count of NUM_3 a second time to get rid of any parse overhead.
select count(num_3) from test_1;
COUNT(NUM_3)
————
19999
Execution Plan
Plan hash value: 864032431
————————————————————————————–
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
————————————————————————————–
| 0 | SELECT STATEMENT | | 1 | 2 | 5 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 2 | | |
| 2 | BITMAP CONVERSION TO ROWIDS | | 19999 | 39998 | 5 (0)| 00:00:01 |
| 3 | BITMAP INDEX FAST FULL SCAN| T_BM | | | | |
————————————————————————————–
Statistics
———————————————————-
0 recursive calls
0 db block gets
10 consistent gets
The bitmap index has 5 leaf blocks, a cost of 5 and there are 10 consistent gets.
The primary key has 41 leaf blocks, a cost of 11 and 46 consistent gets.
{Don’t worry too much about the relationship between blocks, costs and consistent gets, I’ll cover that in a later posting}
Bitmap indexes are unlike normal Oracle indexes in that they also index nulls. Thus a count (*) on the table can be answered by the CBO by using the smaller bitmap index. Here is the proof (second run only shown)
— traditional count(*)
select count(*) from test_1;
COUNT(*)
———-
19999
Execution Plan
———————————————————-
Plan hash value: 1066692852
——————————————————————————
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
——————————————————————————
| 0 | SELECT STATEMENT | | 1 | 5 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | BITMAP CONVERSION COUNT | | 19999 | 5 (0)| 00:00:01 |
| 3 | BITMAP INDEX FAST FULL SCAN| T_BM | | | |
——————————————————————————
Statistics
———————————————————-
0 recursive calls
0 db block gets
10 consistent gets
We see the same stats and cost, but look carefully and you will see that the Plan Hash Value is different and the plan is slightly different. The BITMAP CONVERSION TO ROWIDS has become a BITMAP CONVERSION COUNT. This is because, when we said count(NUM_3) we were asking for a count of NON-NULL values for column NUM_3, so the CBO had to handle that. Count(*) just wants to know how many rows there are.
{I am not sure exactly is being done by the conversion to rowids in this case as the CBO has no need to visit the table. It should be able to just ignore the bitmap for null values. Maybe it does so but this is how Explain Plan reports the activity. Maybe I should ask Jonathan Lewis :-)}
Here are just two quick proofs that select count (PK) and count (1) will both use this smaller index:
— count on pk column
select count(id) from test_1;
COUNT(ID)
———-
19999
Execution Plan
———————————————————-
Plan hash value: 1066692852
——————————————————————————
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
——————————————————————————
| 0 | SELECT STATEMENT | | 1 | 5 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | BITMAP CONVERSION COUNT | | 19999 | 5 (0)| 00:00:01 |
| 3 | BITMAP INDEX FAST FULL SCAN| T_BM | | | |
——————————————————————————
Statistics
———————————————————-
1 recursive calls
0 db block gets
10 consistent gets
— count on a static value
select count(1) from test_1;
COUNT(1)
———-
19999
Execution Plan
———————————————————-
Plan hash value: 1066692852
——————————————————————————
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
——————————————————————————
| 0 | SELECT STATEMENT | | 1 | 5 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | BITMAP CONVERSION COUNT | | 19999 | 5 (0)| 00:00:01 |
| 3 | BITMAP INDEX FAST FULL SCAN| T_BM | | | |
——————————————————————————
Statistics
———————————————————-
1 recursive calls
0 db block gets
10 consistent gets
Finally for this post (and I know, I do go on a bit) I will show that the CBO will use the bitmap to count the number of values in a column that it knows is mandatory – presumably someone in the Oracle Optimisation team recognised that such a count is counting the number of rows, so just count the number of rows the quickest way possible (which is to use the smallest index which is on a mandatory column or is a bitmap index, I suspect).
First, a count on a column that is NOT mandatory, which has a unique index on it. The CBO uses the supporting index:-
–count on a non-pk unique column
select count(id_2) from test_1;
COUNT(ID_2)
———–
19999
Execution Plan
———————————————————-
Plan hash value: 341132572
——————————————————————————
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
——————————————————————————
| 0 | SELECT STATEMENT | | 1 | 6 | 12 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 6 | | |
| 2 | INDEX FAST FULL SCAN| T_UQ | 19999 | 117K| 12 (0)| 00:00:01 |
——————————————————————————
Statistics
———————————————————-
0 recursive calls
0 db block gets
52 consistent gets
0 physical reads
Cost of 12, 52 consistent gets. Now let us make the column mandatory, analyze the table and select count (NUM_2) again {I show the second run only, as the first run has the parse overhead}
alter table test_1
modify num_2 not null;
Table altered.
test102>analyze table test_1 compute statistics;
Table analyzed.
select count(num_2) from test_1;
COUNT(NUM_2)
————
19999
Execution Plan
———————————————————-
Plan hash value: 1066692852
——————————————————————————
| Id | Operation | Name | Rows | Cost (%CPU)| Time |
——————————————————————————
| 0 | SELECT STATEMENT | | 1 | 5 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | | |
| 2 | BITMAP CONVERSION COUNT | | 19999 | 5 (0)| 00:00:01 |
| 3 | BITMAP INDEX FAST FULL SCAN| T_BM | | | |
——————————————————————————
Statistics
———————————————————-
0 recursive calls
0 db block gets
10 consistent gets
0 physical reads
And there we go, a cost of 5, consistent gets of 10 and the same plan/plan hash value as count(*).
I still would not select count(column) as a general rule UNLESS I wanted to count non-null values of that column, but the CBO will step in and use the most efficient way it can to count the rows in a table, if it can identify that counting all the rows is what you really want to do.
I’ve managed to write a lot and learn a lot from a very, very simple statement!
My next post will be on even faster ways of {sort of} getting the number of rows in a table.
Fascination stuff.
Just out of the blue, I’m thinking: The fastest way to keep count of something is to have an object (index?) that is guaranteed to “know” of all the rows.
Unless you are thinking of a running-total-trigger or an MVIEW with count(1) in it. If it is at all possible, that would be a bit of a stunt, just for keeping count.
But not sure if it would be efficient (on commit…).
[…] are used when you count a single column in a table and why it is not always a good idea. The third post shows how Oracle will use the smallest index it can to select count(*) and in particular how a […]