
Counting the Costs May 17, 2009

Posted by mwidlake in performance.

Anyone who has been around the Oracle DBA or Developer world for a while is sure to have come across people debating(*) the fastest way of selecting the number of rows from a table. Different suggestions have been that count(*) is not as fast as count(pk_column), count(rowid) or count(1).

(*) by debating I mean arguing, sometimes getting angry and, on one memorable occasion, pretty much asking each other to step outside to resolve this matter Man-to-Man (why do women not feel the need for this final step? Just more sensible I guess).

Bottom line, it makes no difference whether you select count(*), count(1) or count(pk) on Oracle 10 (and, I am sure, Oracle 9). However, that is so far an unsupported claim and you should not just believe me. Any web site or blog making a solid statement about how Oracle works needs to provide some proof, otherwise it is just opinion.

What I find more interesting is that some people seem happy to spend more time debating the question than doing a few tests to find out for sure. Also, proving the answer to such a simple question can actually lead to some more interesting insights.

I’ve followed the lead of my betters and created a script to run through all the tests below. If I can bend my very new blogging skills to it, the SQL*Plus script can be downloaded from this link.

Below follows my proof and some observations on the way.

First I had better show what version of Oracle I am using and the basic set-up as far as SQL performance goes (I’ve trimmed the output to remove the number of rows returned, timing etc. where it is not significant).

test102>-- basic setup info

test102>select version,archiver,database_status from v$instance; 

VERSION           ARCHIVE DATABASE_STATUS
----------------- ------- -----------------
10.2.0.1.0        STARTED ACTIVE

    test102>select substr(name,1,30) name,substr(value,1,20) value
      2  from v$parameter
      3  where name in ('db_block_size','compatible','cpu_count','db_file_multiblock_read_count'
      4              ,'optimizer_mode','sga_target','sort_area_size')
      5  order by name
      6  /

    NAME                           VALUE
    —————————— ——————–
    compatible                     10.2.0.1.0
    cpu_count                      1
    db_block_size                  8192
    db_file_multiblock_read_count  16
    optimizer_mode                 ALL_ROWS
    sga_target                     612368384
    sort_area_size                 65536
     

    Now I create a reasonably sized table with a numeric column ID to be the primary key, a column NUM_1 which is a small varying numeric, a unique column ID_2, NUM_2 holding random values between 0 and 99, another small number column NUM_3 and a final numeric column NUM_4 which is sparse: only 1 in 10 records holds a value. Finally there are three columns VC_1 to VC_3 which just hold strings to pad out the table a little.

    test102>drop table test_1;
    Table dropped.

     Elapsed: 00:00:00.59

    test102>--
    test102>-- create a table with 20k rows, an ascending numeric PK
    test102>-- a unique VC
    test102>-- a non-unique numeric
    test102>-- a sparsely populated numeric
    test102>-- and some padding fields.
    test102>create table test_1
      2  as select rownum  id
      3  ,mod(rownum,10)  num_1
      4  ,'A'||to_char(rownum+1000) id_2
      5  ,trunc(dbms_random.value(1,100)) num_2
      6  ,mod(rownum,5)  num_3
      7  ,decode(mod(rownum,10),0,trunc(dbms_random.value(1,30))
      8         ,null) num_4
      9  ,lpad('A',50,'A') vc_1
     10  ,lpad('B',50,'B') vc_2
     11  ,lpad('C',250,'C') vc_3
     12  from dba_objects
     13  where rownum < 20000
     14  /

    test102>desc test_1

     Name                                            Null?    Type
     ———————————————– ——– ——————————–
     ID                                              NOT NULL NUMBER
     NUM_1                                                    NUMBER
     ID_2                                                     VARCHAR2(41)
     NUM_2                                                    NUMBER
     NUM_3                                                    NUMBER
     NUM_4                                                    NUMBER
     VC_1                                                     VARCHAR2(50)
     VC_2                                                     VARCHAR2(50)
     VC_3                                                     VARCHAR2(250)
     
    Now I add a primary key (a supporting unique index is created automatically), a unique index, a non-unique index and an index on the sparse column. Finally, I use the old ANALYZE command to compute statistics on the table and indexes, so that the Oracle optimizer has information about the table and its indexes (there is a note on the modern DBMS_STATS alternative just after the transcript below).
     
    test102>-- now add a PK on the ID
    test102>alter table test_1
      2  add constraint t_pk primary key (id);
    Table altered.

    test102>-- add a unique index on the unique varchar
    test102>create unique index t_uq on test_1(id_2);
    Index created.

    test102>-- add a non-unique index on the non-unique numeric
    test102>create index t_nonuq on test_1(num_2);
    Index created.

    test102>-- and finally an index on the sparse column
    test102>create index t_sparse on test_1(num_4);
    Index created.

    test102>-- now add statistics
    test102>analyze table test_1 compute statistics;
    Table analyzed.
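
    (A side note that is not part of the original script: ANALYZE is the old way of gathering optimizer statistics and is deprecated for that purpose; on 10g the supported route is DBMS_STATS. Something along these lines would do the same job, though for this simple demo either is fine:)

        exec dbms_stats.gather_table_stats(ownname => user, tabname => 'TEST_1', cascade => true)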
     
    That is the setup. Now I turn on autotrace (which makes SQL*Plus show the execution plan and statistics for each statement executed) and timing, and I’ll do my first test, the traditional select count(*).

    test102>set autotrace on
    test102>set timi on
    test102>set pause off
    test102>-- Traditional count(*)
    test102>select count(*) from test_1;
     

      COUNT(*)
    ———-
         19999
    1 row selected. 

    Elapsed: 00:00:00.01

    Execution Plan
    ———————————————————-
    Plan hash value: 454320086

    ———————————————————————-
    | Id  | Operation             | Name | Rows  | Cost (%CPU)| Time     |
    ———————————————————————-
    |   0 | SELECT STATEMENT      |      |     1 |    11   (0)| 00:00:01 |
    |   1 |  SORT AGGREGATE       |      |     1 |            |          |
    |   2 |   INDEX FAST FULL SCAN| T_PK | 19999 |    11   (0)| 00:00:01 |
    ———————————————————————-

    Statistics
    ———————————————————-
             44  recursive calls
              0  db block gets
             53  consistent gets
              0  physical reads
              0  redo size
            413  bytes sent via SQL*Net to client
            381  bytes received via SQL*Net from client
              2  SQL*Net roundtrips to/from client
              0  sorts (memory)
              0  sorts (disk)
              1  rows processed
     
    What does this show? Well, there are 19,999 rows and the count took just 0.01 seconds (the Elapsed figure is shown down to hundredths of a second).

    The plan is simple: a fast full scan of the index T_PK, with a cost of 11, and the query came back in 0.01 seconds.

    The statistics show 44 recursive calls, 0 physical reads (reads from disc, as far as Oracle is concerned) and 53 consistent gets, which are reads from memory. I’ll ignore the rest for now.
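
    (An aside, and not part of the original script: the counters autotrace reports can also be read for your own session from V$MYSTAT joined to V$STATNAME, which is handy for cross-checking. A minimal sketch, assuming you have SELECT access to the V$ views:)

        select sn.name, ms.value
        from   v$mystat   ms
              ,v$statname sn
        where  ms.statistic# = sn.statistic#
        and    sn.name in ('consistent gets','db block gets','physical reads');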

    However, I have done one test on one statement. I was taught to be a scientist at college: one isolated test is not enough, as I do not know if the results are stable. I’ll do it again.

     
    test102>select count(*) from test_1;

      COUNT(*)
    ———-
         19999
    1 row selected.

    Elapsed: 00:00:00.00 

    Execution Plan
    ———————————————————-
    Plan hash value: 454320086
     
    ———————————————————————-
    | Id  | Operation             | Name | Rows  | Cost (%CPU)| Time     |
    ———————————————————————-
    |   0 | SELECT STATEMENT      |      |     1 |    11   (0)| 00:00:01 |
    |   1 |  SORT AGGREGATE       |      |     1 |            |          |
    |   2 |   INDEX FAST FULL SCAN| T_PK | 19999 |    11   (0)| 00:00:01 |
    ———————————————————————-

    Statistics
    ———————————————————-
              0  recursive calls
              0  db block gets
             46  consistent gets
              0  physical reads
              0  redo size
            413  bytes sent via SQL*Net to client
            381  bytes received via SQL*Net from client
              2  SQL*Net roundtrips to/from client
              0  sorts (memory)
              0  sorts (disk)
              1  rows processed
     
    Well, the result and the plan are the same, but now the elapsed time is too small to register, there are no recursive calls and the consistent gets drop from 53 to 46. The recursive calls and the extra consistent gets in the first statement are due to Oracle doing some parsing work on it: checking the syntax, looking up the objects in the data dictionary and coming up with an execution plan. (I have to be honest, this is not quite true; I had run the statement several times already, so some of that work had been done before and the parse work in this example is a little light.)
     
    The lesson here is to treat the first execution of a statement as a special case. The second and further executions give more representative statistics for the statement. Run select count(*) from test_1 half a dozen times yourself and you should see the stats stay stable.
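
    (Another aside rather than part of the test script: you can see how often a cached statement has been loaded and executed by looking in V$SQL. A rough sketch, assuming the statements are still in the shared pool:)

        select sql_text, loads, parse_calls, executions
        from   v$sql
        where  lower(sql_text) like 'select count(%) from test_1%';

    (LOADS should normally stay at 1, as the hard parse is only done once, while EXECUTIONS climbs with each run, unless the cursor ages out of the shared pool.)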

    Now, select count(id) from test_1. Column ID is the primary key.

    test102>-- now count on the Primary key
    test102>select count(id) from test_1;
     
     COUNT(ID)
    ———-
         19999
    1 row selected.

    Elapsed: 00:00:00.00

    Execution Plan
    ———————————————————-
    Plan hash value: 454320086

    ———————————————————————-
    | Id  | Operation             | Name | Rows  | Cost (%CPU)| Time     |
    ———————————————————————-
    |   0 | SELECT STATEMENT      |      |     1 |    11   (0)| 00:00:01 |
    |   1 |  SORT AGGREGATE       |      |     1 |            |          |
    |   2 |   INDEX FAST FULL SCAN| T_PK | 19999 |    11   (0)| 00:00:01 |
    ———————————————————————-
     
    Statistics
    ———————————————————-
              1  recursive calls
              0  db block gets
             46  consistent gets
              0  physical reads
              0  redo size
            414  bytes sent via SQL*Net to client
            381  bytes received via SQL*Net from client
              2  SQL*Net roundtrips to/from client
              0  sorts (memory)
              0  sorts (disk)
              1  rows processed
     
    That is spookily like the second run of count(*). Remembering my comments on the first run, let’s do a second run of the code.

    test102>select count(id) from test_1;
     COUNT(ID)
    ———-
         19999
    1 row selected.

    Elapsed: 00:00:00.00

    Execution Plan
    ———————————————————-
    Plan hash value: 454320086

    ———————————————————————-
    | Id  | Operation             | Name | Rows  | Cost (%CPU)| Time     |
    ———————————————————————-
    |   0 | SELECT STATEMENT      |      |     1 |    11   (0)| 00:00:01 |
    |   1 |  SORT AGGREGATE       |      |     1 |            |          |
    |   2 |   INDEX FAST FULL SCAN| T_PK | 19999 |    11   (0)| 00:00:01 |
    ———————————————————————-

    Statistics
    ———————————————————-
              0  recursive calls
              0  db block gets
             46  consistent gets
              0  physical reads
              0  redo size
            414  bytes sent via SQL*Net to client
            381  bytes received via SQL*Net from client
              2  SQL*Net roundtrips to/from client
              0  sorts (memory)
              0  sorts (disk)
              1  rows processed

    Yep, exact same plan and exact same stats as for count(*).

    Why is there a difference of only 1 recursive call between the two runs, when the first run of count(*) needed 44? The Oracle kernel has spotted that the new query can be satisfied with the same plan, so it just allocates that plan to the cursor (i.e. uses it for this statement). Note that the plan hash value is the same.
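
    (If you want to see that for yourself, here is a sketch rather than part of the original script: each variation is a separate cursor in V$SQL, but they should all report the same PLAN_HASH_VALUE. Matching SQL_TEXT exactly can be fiddly, as case and spacing matter, so adjust the text as needed:)

        select sql_text, sql_id, plan_hash_value
        from   v$sql
        where  sql_text in ('select count(*) from test_1'
                           ,'select count(id) from test_1'
                           ,'select count(1) from test_1');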

    Now to try count(1). For brevity I am going to trim the output of things that do not change. You have the script to run yourself so you can check I am not deleting anything significant. 

    test102>-- count on a static value
    test102>select count(1) from test_1;

      COUNT(1)
    ———-
         19999
     
    Execution Plan
    ———————————————————-
    Plan hash value: 454320086

    ———————————————————————-
    | Id  | Operation             | Name | Rows  | Cost (%CPU)| Time     |
    ———————————————————————-
    |   0 | SELECT STATEMENT      |      |     1 |    11   (0)| 00:00:01 |
    |   1 |  SORT AGGREGATE       |      |     1 |            |          |
    |   2 |   INDEX FAST FULL SCAN| T_PK | 19999 |    11   (0)| 00:00:01 |
    ———————————————————————-
     
    Statistics
    ———————————————————-
              1  recursive calls
              0  db block gets
             46  consistent gets
     
    Exactly the same as the first iteration of select count(id). The plan hash value is the same too. Second run?

    test102>select count(1) from test_1;

      COUNT(1)
    ———-
         19999
     

    ———————————————————————-
    | Id  | Operation             | Name | Rows  | Cost (%CPU)| Time     |
    ———————————————————————-
    |   0 | SELECT STATEMENT      |      |     1 |    11   (0)| 00:00:01 |
    |   1 |  SORT AGGREGATE       |      |     1 |            |          |
    |   2 |   INDEX FAST FULL SCAN| T_PK | 19999 |    11   (0)| 00:00:01 |
    ———————————————————————-

    Statistics
    ———————————————————-
              0  recursive calls
              0  db block gets
             46  consistent gets

    I feel that is pretty good supporting evidence that count(*), count(id) and count(1) are all being executed the same and are equally efficient. 
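
    (One more cross-check that is not in the script: DBMS_XPLAN.DISPLAY_CURSOR, available in 10g, shows the plan that was actually used for the last statement run in the session. Autotrace and serveroutput need to be off first, otherwise the "last statement" is one that SQL*Plus itself ran behind the scenes. A sketch:)

        set autotrace off
        set serveroutput off
        select count(1) from test_1;
        select * from table(dbms_xplan.display_cursor(null, null, 'BASIC'));
        -- I would expect the same INDEX FAST FULL SCAN of T_PK to be reported
        set autotrace on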

    As a final test for now, how about select count('any old text')?
     
    test102>-- count on a larger, text static value
    test102>select count('any old text') from test_1;

    COUNT('ANYOLDTEXT')
    ——————-
                  19999

    Execution Plan
    ———————————————————-
    Plan hash value: 454320086

    ———————————————————————-
    | Id  | Operation             | Name | Rows  | Cost (%CPU)| Time     |
    ———————————————————————-
    |   0 | SELECT STATEMENT      |      |     1 |    11   (0)| 00:00:01 |
    |   1 |  SORT AGGREGATE       |      |     1 |            |          |
    |   2 |   INDEX FAST FULL SCAN| T_PK | 19999 |    11   (0)| 00:00:01 |
    ———————————————————————-

    Statistics
    ———————————————————-
              1  recursive calls
              0  db block gets
             46  consistent gets

    test102>select count('any old text') from test_1;
     
    COUNT('ANYOLDTEXT')
    ——————-
                  19999

    Execution Plan
    ———————————————————-
    Plan hash value: 454320086

    ———————————————————————-
    | Id  | Operation             | Name | Rows  | Cost (%CPU)| Time     |
    ———————————————————————-
    |   0 | SELECT STATEMENT      |      |     1 |    11   (0)| 00:00:01 |
    |   1 |  SORT AGGREGATE       |      |     1 |            |          |
    |   2 |   INDEX FAST FULL SCAN| T_PK | 19999 |    11   (0)| 00:00:01 |
    ———————————————————————-

    Statistics
    ———————————————————-
              0  recursive calls
              0  db block gets
             46  consistent gets
     
    Tomorrow I will continue on the theme but that is enough for now.

    Comments»

    1. PdV - May 18, 2009

    Hello Martin!

    And a nice, simple, understandable and repeatable blogpost.

    At risk of debating the topic to death: My guess is that any Unique Index could be used to “count” the records. It would be interesting to see if a “smaller” UK index (it may need an explicit constraint) would be chosen over a larger (overloaded, inflated) PK index, assuming a table had those kind of indexes.

    As for displaying SQL and Code-text in blogs, in the blogspot area the tag works ok, but it still removes whitespace.
    Any idea of better ways to display code in "courier" font ?
    How do the pros do that?

    Anyways, Keep Writing!

    PdV

    2. mwidlake - May 18, 2009

    Thanks Piet.
    Your points are good ones and will be touched on in the next part on this topic. And thanks for the tags hint. I know other bloggers have trouble getting code fragments to look good in blogs.

    3. Counting the Cost « Martin Widlake’s Yet Another Oracle Blog - May 22, 2009

    […] this first posting for the details of setting up the table and showing that select count(*), count(1) and count(pk) […]

    4. Counting the Cost #4 – The speedy way « Martin Widlake’s Yet Another Oracle Blog - May 26, 2009

    […] far I have shown with this posting here that the myths and arguments about whether count(*), count(1), count(pk) etc are immaterial […]

    5. Sandro - September 29, 2009

    The fastest way is debated by JL: count(*) via bitmap indexes.

    6. mwidlake - September 29, 2009

    Thanks for that Sandro. The fastest way to count the rows in a table is to scan the smallest index that has an entry for every row, which bitmap indexes do. I cover this in the third post on the topic, but I was not very good at adding links between posts when I started blogging.
    If you check the fourth post on this topic, I show that there is potentially an even faster way to get the number of rows in a table 🙂
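
    As a quick sketch of what I mean (the bitmap index here is made up for illustration, it is not part of the test script, and bitmap indexes need Enterprise Edition and do not suit tables with heavy concurrent DML):

        create bitmap index t_bm_num1 on test_1(num_1);
        select count(*) from test_1;
        -- with autotrace on I would expect a plan along the lines of:
        --   SORT AGGREGATE
        --     BITMAP CONVERSION COUNT
        --       BITMAP INDEX FAST FULL SCAN | T_BM_NUM1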

    7. Sandro - September 29, 2009
    mwidlake - September 29, 2009

    Oh dear.
    Thanks for letting me know. I used the hotlinks service before getting my website up and running. Looks like I need to do some fixing up of old posts 😦 It might take me a while to get around to it, but if you want the scripts Sandro, send me an email

    8. Jigar Doshi - January 8, 2010

    Why not select num rows from any not null column index 🙂

    mwidlake - January 8, 2010

    If you look at posting two on this topic (there is a link on one of the above comments) then you will see that I suggest that 🙂
    Posting three goes on to bitmap indexes, which have an entry for every row and are often THE fastest way to count all rows in a table.

    I should add a word of warning about selecting count(column) on a not-null indexed column. You are NOT counting the rows in the table, you are counting the occurrences of that column being populated. It just so happens that if the column is mandatory, the two are the same. What if someone changes that column to nullable?

    Just select count(*) and be done 🙂
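
    To illustrate, using the sparse NUM_4 column from the test table above (only around one row in ten is populated, going by the DECODE in the setup):

        select count(*)     all_rows
             , count(num_4) populated_num_4
        from   test_1;
        -- count(*) reports every row; count(num_4) only counts rows where NUM_4 is not null

    The two figures only agree when the column is mandatory.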

