
Decoding high_value and low_value January 3, 2010

Posted by mwidlake in internals, performance.

There is an update to this post here.

The view DBA_TAB_COLUMNS holds the LOW_VALUE and HIGH_VALUE for columns. This information is potentially very useful to us, especially as, when the CBO looks at a WHERE clause on a column for a given value, the high and low values for that column are considered. If the value in the WHERE clause is outside the known range, the expected number of rows to be found is decreased in proportion to the distance outside the range.
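
As a quick illustration of that out-of-range behaviour, here is a small demo I knocked up (my own addition, with a made-up table name, so treat it as a sketch rather than part of the original script): compare the CBO's row estimate for a value inside the column's known range with one well outside it.

create table range_demo as
select rownum id, trunc(sysdate) - mod(rownum,100) created_date
from   dual connect by level <= 10000;

exec dbms_stats.gather_table_stats(user,'RANGE_DEMO')

explain plan for
select * from range_demo where created_date = trunc(sysdate) - 50;
select * from table(dbms_xplan.display);
-- in-range value: estimate is roughly num_rows/num_distinct, about 100 rows

explain plan for
select * from range_demo where created_date = trunc(sysdate) - 500;
select * from table(dbms_xplan.display);
-- value below the column's LOW_VALUE: the estimate is scaled down towards 1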

What is not so helpful is that Oracle stores, and displays, the information in an internal raw format. Which is utterly unhelpful to us of course. Why can’t they at least expose the information in a view in a human-readable format? Tsch.

So here is some code on how to decode low_value and high_value. I pulled most of this together a few years back but always struggled to decode the low and high values for dates, until I found this post by Gary Myers.

-- col_stats
-- Martin Widlake mdw 21/03/2003
-- MDW 11/12/09 enhanced to include more translations of low_value/high_value
-- pilfered from Gary Myers blog
col owner        form a6 word wrap
col table_name   form a15 word wrap
col column_name  form a22 word wrap
col data_type    form a12
col M            form a1
col num_vals     form 99999,999
col dnsty        form 0.9999
col num_nulls    form 99999,999
col low_v        form a18
col hi_v         form a18
col data_type    form a10
set lines 110
break on owner nodup on table_name nodup
spool col_stats.lst
select --owner
--      ,table_name
      column_name
      ,data_type
      ,decode (nullable,'N','Y','N')  M
      ,num_distinct num_vals
      ,num_nulls
      ,density dnsty
,decode(data_type
  ,'NUMBER'       ,to_char(utl_raw.cast_to_number(low_value))
  ,'VARCHAR2'     ,to_char(utl_raw.cast_to_varchar2(low_value))
  ,'NVARCHAR2'    ,to_char(utl_raw.cast_to_nvarchar2(low_value))
  ,'BINARY_DOUBLE',to_char(utl_raw.cast_to_binary_double(low_value))
  ,'BINARY_FLOAT' ,to_char(utl_raw.cast_to_binary_float(low_value))
  ,'DATE',to_char(1780+to_number(substr(low_value,1,2),'XX')
         +to_number(substr(low_value,3,2),'XX'))||'-'
       ||to_number(substr(low_value,5,2),'XX')||'-'
       ||to_number(substr(low_value,7,2),'XX')||' '
       ||(to_number(substr(low_value,9,2),'XX')-1)||':'
       ||(to_number(substr(low_value,11,2),'XX')-1)||':'
       ||(to_number(substr(low_value,13,2),'XX')-1)
,  low_value
       ) low_v
,decode(data_type
  ,'NUMBER'       ,to_char(utl_raw.cast_to_number(high_value))
  ,'VARCHAR2'     ,to_char(utl_raw.cast_to_varchar2(high_value))
  ,'NVARCHAR2'    ,to_char(utl_raw.cast_to_nvarchar2(high_value))
  ,'BINARY_DOUBLE',to_char(utl_raw.cast_to_binary_double(high_value))
  ,'BINARY_FLOAT' ,to_char(utl_raw.cast_to_binary_float(high_value))
  ,'DATE',to_char(1780+to_number(substr(high_value,1,2),'XX')
         +to_number(substr(high_value,3,2),'XX'))||'-'
       ||to_number(substr(high_value,5,2),'XX')||'-'
       ||to_number(substr(high_value,7,2),'XX')||' '
       ||(to_number(substr(high_value,9,2),'XX')-1)||':'
       ||(to_number(substr(high_value,11,2),'XX')-1)||':'
       ||(to_number(substr(high_value,13,2),'XX')-1)
,  high_value
       ) hi_v
from dba_tab_columns
where owner      like upper('&tab_own')
and   table_name like upper(nvl('&tab_name','WHOOPS')||'%')
ORDER BY owner,table_name,COLUMN_ID
/
clear colu
spool off
clear breaks

Most of the translation is done via the utl_raw.cast_to_datatype functions but no such function is provided for dates, which is where most of us come unstuck. Gary recognised that the data was being stored in hex format, with an offset for the year.
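
To make that concrete, here is a small worked example of my own {not from Gary's post}, decoding a DATE low_value by hand using the standard excess-100 century and year bytes; the raw value '786D0B0F0E101F' should come out as 15-NOV-2009 13:15:30.

select to_number('78','XX') - 100  cent  -- 20
      ,to_number('6D','XX') - 100  yy    -- 9, so year 2009
      ,to_number('0B','XX')        mth   -- 11
      ,to_number('0F','XX')        dy    -- 15
      ,to_number('0E','XX') - 1    hr    -- 13
      ,to_number('10','XX') - 1    mi    -- 15
      ,to_number('1F','XX') - 1    sec   -- 30
from dual;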

I hope the script helps some of you.
{There are also procedures under DBMS_STATS – DBMS_STATS.CONVERT_RAW_VALUE – that can be called to translate many datatypes}
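
A minimal sketch of that route (my addition; the table and column names are hypothetical), using the NUMBER overload of DBMS_STATS.CONVERT_RAW_VALUE:

set serveroutput on
declare
  v_low raw(32);
  v_num number;
begin
  select low_value into v_low
  from   dba_tab_columns
  where  owner       = user
  and    table_name  = 'MY_TABLE'
  and    column_name = 'MY_NUMBER_COL';
  dbms_stats.convert_raw_value(v_low, v_num);
  dbms_output.put_line('low_value = '||v_num);
end;
/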

If anyone knows how to decode timestamps, I’d love to know as it would save me trying to work it out/find it on the Net somewhere. I’ll add it to the post so that there is one place to find all translations.

Here is a quick output for a test table

@col_stats
Enter value for tab_own: dwperf
old  40: where owner      like upper('&tab_own')
new  40: where owner      like upper('dwperf')
Enter value for tab_name: ad_sql_exec_p
old  41: and   table_name like upper(nvl('&tab_name','WHOOPS')||'%')
new  41: and   table_name like upper(nvl('ad_sql_exec_p','WHOOPS')||'%')
any key> 

COLUMN_NAME            DATA_TYPE  M   NUM_VALS  NUM_NULLS   DNSTY LOW_V              HI_V
---------------------- ---------- - ---------- ---------- ------- ------------------ ---------------
INST_ID                NUMBER     Y          4          0  0.2500 1                  4
SQL_ID                 VARCHAR2   Y     87,104          0  0.0000 008d71chrks14      gzw309ttkx862
PLAN_HASH_VALUE        NUMBER     Y      1,884          0  0.0005 2646229            4294043874
SPC_ID                 NUMBER     N         83          0  0.0120 1076               2269
ADDRESS                RAW        N    118,198          0  0.0000 00000003E33123A0   0000001342FF3FF8
HASH_VALUE             NUMBER     N     87,104          0  0.0000 2758710            4294676643
CREATED_DATE           DATE       N        120          0  0.0083 2009-10-23 8:19:10 2009-12-10 9:19:13
LATEST_FIRST_LOAD_TIME DATE       N     11,791          0  0.0001 2009-9-14 11:55:59 2009-12-13 9:33:24
TOTAL_LOADS            NUMBER     N         98          0  0.0102 1                  55047
TOTAL_PARSE_CALLS      NUMBER     N         92          0  0.0109 0                  2972
TOTAL_EXECUTIONS       NUMBER     N        235          0  0.0043 0                  834624
TOTAL_ROWS_PROCESSED   NUMBER     N        809          0  0.0012 0                  26946123
TOTAL_FETCHES          NUMBER     N        313          0  0.0032 0                  834624
TOTAL_BUFFER_GETS      NUMBER     N      3,016          0  0.0003 0                  3355576809
TOTAL_DISK_READS       NUMBER     N        985          0  0.0010 0                  28189240
TOTAL_DIRECT_WRITES    NUMBER     N         98          0  0.0102 0                  751289
TOTAL_SORTS            NUMBER     N        106          0  0.0094 0                  5283
TOTAL_CPU_TIME         NUMBER     N     94,401          0  0.0000 1337               12183936207
TOTAL_ELAPSED_TIME     NUMBER     N    115,203          0  0.0000 1337               139692482086
TOTAL_OPTIMIZER_COST   NUMBER     N      1,467          0  0.0007 0                  369740902209315000
                                                                                     0000000

Using AWR data for OS information December 22, 2009

Posted by mwidlake in performance.

This is a demo of using AWR data to get some OS performance information.

This posting was actually a comment by (and is the work of) Bernard Polarski on a recent post about command_types I did a week or two back. I’ve lifted it into a separate posting as (a) I thought it might be useful to people and it would not get as much exposure as a comment as it would as a post and (b) I want to try and lay it out better for Bernard (WordPress likes to garble code in comments, which it did to Bernard’s efforts) and (c) I strongly believe in nicking other people’s code, if they are happy to put it “out there”. Why re-code what someone has already done and is happy to share :-)

I tried the code (I added the SYS. ownership on tables, so it will work even if you can’t log in as sys but have access to the objects) and it worked fine for me.

I should just add, use of AWR data is covered by an Oracle licence, the Diagnostics Pack. If you do not have a licence for it, Oracle deem the WRH$ views as not yours to view, even though they are on your box, in the database you paid for, containing data gathered by your CPUs by code that is damned tricky to turn off and is on by default. Sheesh.

Any mistakes or typos, blame me not Bernard. I took the liberty of tweaking some of the English, hope that is OK Bernard.

*************************************
Okay, a challenge: let me try to surprise you with something improbable. Some months ago, while investigating an unusual load on a Linux box, the unix sysadmin told us he could not go back more than 2 weeks with ‘sar’ history. Since Oracle also gathers system stats, I quickly enriched smenu with an Oracle sar.
The example below gives the sar stats for the last 10 snaps; it is stored in the shortcut sts and called with the param ‘-sar’.

(In Bernard’s original comment the words Superior_TO and Inferior_to stood in for the > and < signs, which the comment system ate; the signs appear as normal in the code below.)

/tmp: sts -sar 10

+ sqlplus -s '/ as sysdba'
+
set feed off verify off lines 190 pages 66
col inst  new_value inst noprint;
col id1      head 'Idle time(%)'    justify c
col usr1     head 'User time(%)'    justify c
col sys1     head 'Sys time(%)'     justify c
col io1      head 'Io Wait time(%)' justify c
col nice1    head 'Nice time (%)'   justify c
col snap_len head 'Interval| (Secs)' justify c
col num_cpus   new_value p_num_cpus  head 'Number of CPU';
col a1         new_value secs noprint;
col SNAP_BEGIN format a20 head 'Snap begin' justify c
col SNAP_END   format a20 head 'Snap end'   justify c

 select instance_number inst from v$instance;
 SELECT value num_cpus 
 FROM v$osstat WHERE stat_name = 'NUM_CPUS';
 prompt
 prompt Negative values correspond to Shutdown:
 prompt
select  snap_id, snap_len, round(id1 /snap_len *100,2) id1,
                       round(usr1 /snap_len *100,2) usr1,
                       round(sys1 /snap_len *100,2) sys1,
                       round(io1 /snap_len *100,2) io1,
                       round(nice1 /snap_len *100,2) nice1 
                     , snap_begin, snap_end
from (
     select  snap_id,  id1, usr1,sys1, io1, nice1
           , snap_begin, snap_end ,
             round( extract( day from diffs) *24*60*60+
                    extract( hour from diffs) *60*60+
                    extract( minute from diffs )* 60 +
                    extract( second from diffs )) snap_len 
             -- above is the exact length of the snapshot in seconds
     from ( select /*+ at this stage, each row shows the cumulative value.
                       r1    7500  8600
                       r2    7300  8300
                       r3    7200  8110
                    we use [max(row) - lag(row)] to have the difference 
                    between [row and row-1], to obtain differentials values:
                       r1    200   300
                       r2    100   190
                       r3    0       0
                    */
        a.snap_id,
        (max(id1)    - lag( max(id1))   over (order by a.snap_id))/100        id1 ,
        (max(usr1)   - lag( max(usr1))  over (order by a.snap_id))/100        usr1,
        ( max(sys1)  - lag( max(sys1))  over (order by a.snap_id))/100        sys1,
        ( max(io1)   - lag( max(io1))   over (order by a.snap_id))/100        io1,
        ( max(nice1) - lag( max(nice1)) over (order by a.snap_id))/100        nice1,
          -- for later display
          max(to_char(BEGIN_INTERVAL_TIME,' YYYY-MM-DD HH24:mi:ss'))          snap_begin,
          -- for later display
          max(to_char(END_INTERVAL_TIME,' YYYY-MM-DD HH24:mi:ss'))            snap_end,
          -- exact len of snap used for percentage calculation
        ( max(END_INTERVAL_TIME)-max(BEGIN_INTERVAL_TIME))                    diffs
        from ( /*+  perform a pivot table so that the 5 values selected appear 
                    on one line. The CASE distributes col a.value among 5 new
                    columns, but creates a row for each. We will use the group 
                    by (snap_id) to condense the 5 rows into one. If you don't 
                    see how this works, just remove the group by and max function, 
                    then re-add it and you will see the use of max(). 
                    Here is what you will see:
        Raw data :   1000     2222
                     1000     3333
                     1000     4444
                     1000     5555
                     1000     6666
        The SELECT CASE populates our inline view with this structure:
                       ID    IDLE    USER   SYS   IOWAIT   NICE
                     1000    2222
                     1000             3333
                     1000                   4444
                     1000                          5555
                     1000                                  6666
         the group by (1000) condenses the rows into one:
                       ID    IDLE    USER   SYS   IOWAIT  NICE
                      1000   2222    3333   4444   5555   6666
               */
               select a.snap_id,
                  case b.STAT_NAME
                       when 'IDLE_TIME' then a.value / &p_num_cpus
                   end  id1,
                  case b.STAT_NAME
                       when 'USER_TIME' then a.value / &p_num_cpus
                  end usr1 ,
                  case b.STAT_NAME
                       when 'SYS_TIME' then  a.value / &p_num_cpus
                  end sys1 ,
                  case b.STAT_NAME
                       when 'IOWAIT_TIME' then  a.value / &p_num_cpus
                  end io1,
                  case b.STAT_NAME
                       when 'NICE_TIME' then  a.value / &p_num_cpus
                  end nice1
                  from  sys.WRH$_OSSTAT a,  sys.WRH$_OSSTAT_NAME b
                       where
                            a.dbid      = b.dbid       and
                            a.STAT_ID   = b.stat_id    and
                            instance_number = &inst    and
                            b.stat_name in ('IDLE_TIME','USER_TIME','SYS_TIME','IOWAIT_TIME','NICE_TIME') and
                            a.snap_id > (( select max(snap_id) from  sys.WRH$_OSSTAT) - 10  -1 )
                   order by 1 desc
              ) a,  sys.wrm$_snapshot s
         where  a.snap_id = s.snap_id
         group by a.snap_id
         order by snap_id desc
        )
   )where rownum < (10+1);





Number of CPU
-------------
            2

Negative values correspond to Shutdown:


            Interval
   SNAP_ID   (Secs)   Idle time(%) User time(%) Sys time(%) Io Wait time(%) Nice time (%)      Snap begin            Snap end
---------- ---------- ------------ ------------ ----------- --------------- ------------- -------------------- --------------------
      2212       3603        87.07         9.43        2.84           16.19             0  2009-12-15 14:00:19  2009-12-15 15:00:21
      2211       3613        90.15         7.31        2.14           18.97             0  2009-12-15 13:00:06  2009-12-15 14:00:19
      2210       3593        92.58         5.91         1.4            7.91             0  2009-12-15 12:00:13  2009-12-15 13:00:06
      2209       3574        91.24         6.78        1.73            9.37             0  2009-12-15 11:00:38  2009-12-15 12:00:13
      2208       3620        90.04          7.8        1.96           11.81             0  2009-12-15 10:00:19  2009-12-15 11:00:38
      2207       3594         89.8         6.89        2.37           12.99             0  2009-12-15 09:00:25  2009-12-15 10:00:19
      2206       3611        91.75         5.81        1.87            8.29             0  2009-12-15 08:00:14  2009-12-15 09:00:25
      2205       3599        92.38         5.63        1.68            8.57             0  2009-12-15 07:00:15  2009-12-15 08:00:14
      2204       3605        91.72         6.51        1.58           10.79             0  2009-12-15 06:00:11  2009-12-15 07:00:15
      2203       3601        86.36        10.35        2.94           10.62             0  2009-12-15 05:00:10  2009-12-15 06:00:11
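
If you are licensed to look at the AWR data, the same cumulative-to-delta trick can be done against the documented DBA_HIST_OSSTAT view. Here is a minimal sketch of my own (not part of Bernard's script) for a single statistic rather than the full pivot:

select snap_id
      ,value - lag(value) over (order by snap_id)  busy_time_delta
from   dba_hist_osstat
where  stat_name       = 'BUSY_TIME'
and    instance_number = 1
order  by snap_id;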

COMMAND_TYPE Values December 10, 2009

Posted by mwidlake in internals.

Follow-up post on finding most of the COMMAND_TYPES in the data dictionary is here, and thanks to Christian Antognini, whose comment led me in that direction.

Spoiler – See end for getting a full list of COMMAND_TYPE values.

If you are in the habit of looking at V$SQLAREA and V$SQL to see what is going on in the database {for those of us who occasionally step out of the “GUI DBA Tool” environment} you will see a lot of stuff in there. I tend to look for the most demanding SQL so I am not interested in PL/SQL, I just want to see the SQL executed via it.

To do this, you can filter out the entries in V$SQL/V$SQLAREA that have “COMMAND_TYPE=47”.

This COMMAND_TYPE is useful. 3=SELECT, the most common thing you probably see and look at, 6=UPDATE, 7=DELETE, 50=EXPLAIN.
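
For example, the sort of thing I run is along these lines (a hedged sketch of my own, not a script from this post), skipping the PL/SQL block entries and looking at the most demanding statements:

select sql_id, command_type, executions, buffer_gets
      ,substr(sql_text,1,60) sql_txt
from   v$sqlarea
where  command_type != 47   -- 47 = PL/SQL block
order  by buffer_gets desc;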

If you google/bing/whatever COMMAND_TYPE you get many hits that list some of the commands, all looking very similar. But they are very partial lists.

So I decided to amalgamate these partial lists, verify which COMMAND_TYPE links to which commands in the databases I have access to, and publish this fuller, verified list. I quickly found some commands do not stay in the SGA after issuing, so I could not confirm them {“drop index” being one}.

I got this far:

-- mdw 10/12/09
-- check for cmd types not seen before, as lists on web are not complete
-- * = I have verified
col sql_txt form a60
select command_type,sql_id,substr(sql_text,1,60) sql_txt
from gv$sqlarea
where command_type not in (
 1 --  create table  *
,2 --  INSERT        *
,3 --  SELECT        *
,6 --  UPDATE        *
,7 --  DELETE        *
,9 --  create index  *
,11 -- ALTER INDEX   *
,26 -- LOCK table    *
,42 -- ALTER_SESSION (NOT ddl)
--two postings suggest 42 is alter session
,44 -- COMMIT
,45 -- rollback
,46 -- savepoint
,47 -- PL/SQL BLOCK' or begin/declare *
,48 -- set transaction   *
,50 -- explain           *
,62 -- analyze table     *
,90 -- set constraints   *
,170 -- call             *
,189 -- merge            *
)
and rownum < 20
/

Please feel free to add to it.

Or save yourself the bother and check out the list provided in the Oracle documentation. Not under the descriptions for V$SQL or V$SQLAREA, which would have been the obvious place Mr Larry Ellison thank you, but under V$SESSION. Further, the fact that in the V$SESSION view the column is called just COMMAND and not COMMAND_TYPE does not assist in locating this information (don’t google COMMAND and ORACLE, you get many millions of hits…). Just click the link below.

This is the full table of values and meanings.

(This is the listing for 10.2; 11.1 is very similar.)

But it does not include 189 – MERGE, so that is one up for practical testing over just reading the manual :-)

Back to the day job…

Wrong way to Query on Dates – the Persistent Offender November 15, 2009

Posted by mwidlake in performance.

Some of you, hopefully most of you, will find nothing new in this little posting. Except maybe the realisation that you are not the only one utterly frustrated by the ever-recurring nature of this. Actually, I should say “frustrated by these“. There are two things here I keep coming across, year after year after year… Actually, three things. {Maybe I am frustrated too easily, I need a holiday….}

{Maybe I should start a “weekend rant” as well as a “Friday Philosophy” thread…}

OK, you have a massive ORDERS table with a column ORDER_DATE on it, type DATE. This column has a normal B-tree index on it. You want to query for all ORDERS placed on a date, say 1st November 2009, so you issue a query with the following WHERE clause:

WHERE ORDER_DATE = '01-NOV-09'

But you would not, would you? DATE columns can (and often do) contain a time portion. And in this case it does, so you would track down whoever specified this table, give them a “talking to”, get the column renamed to ORDER_DATETIME and use something like:

WHERE TRUNC(ORDER_DATETIME)='01-NOV-09'

But you would not, would you? The function on ORDER_DATETIME will prevent the index being used (unless you added a function-based index, you clever so-and-so).
Now this is the second thing that I keep seeing and have to run through with people. Some people simply have not yet been told that functions on columns stop indexes from being used; others know this but do not quite appreciate that a TRUNC {or a SUBSTR on the first X characters} still stops indexes being used by Oracle, even though they know that “functions stop indexes being used”. It seems sensible to them, as humans, because with a printed index they would not think twice about doing a range check against it. But the CBO is not a human and is not so smart. So you explain this to them.
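
For completeness, here is what that function-based index option looks like (a sketch of my own, using the hypothetical ORDERS table from this post):

create index orders_odt_trunc_idx on orders (trunc(order_datetime));
-- with this index (and stats on it) in place, the CBO can use an index for
--   where trunc(order_datetime) = to_date('01-NOV-2009','DD-MON-YYYY')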

So maybe you get to the point where the following is the WHERE clause to use:

WHERE ORDER_DATETIME >= '01-NOV-09'
AND ORDER_DATETIME < '02-NOV-09'

That will work. It will use the index on ORDER_DATETIME, it will pass unit testing, it will get released and work fine.

Until someone runs this on a system with a different NLS_DATE_FORMAT.
Or in a session with a different NLS_DATE_FORMAT.
Or on an application server/middle tier where the NLS_DATE_FORMAT is different and it all falls over.
{and for NLS_DATE_FORMAT, keep in mind that NLS_LANG and NLS_TERRITORY can implicitly alter NLS_DATE_FORMAT, so anyone supporting database systems over national boundaries may well be nodding their heads sagely right now}.

It falls over because there is an implicit data conversion of the string ’01-NOV-09′ to a date. It uses your default date format, which is not the same on all databases, on all middle tiers or even all sessions (I always alter my session to show the time portion and I seem to be one of the first to hit these issues, maybe as a result). It is far less common, but I must come across this NLS issue once a year minimum and usually it has been a bit of a major issue (like feeding back utterly the wrong data to a regulatory body, Ouch).
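
A quick demo of that trap (my own sketch, again using the hypothetical ORDERS table):

alter session set nls_date_format = 'DD-MON-RR';
select count(*) from orders where order_datetime >= '01-NOV-09';
-- works, for now

alter session set nls_date_format = 'MM/DD/YYYY';
select count(*) from orders where order_datetime >= '01-NOV-09';
-- the same statement now errors (or, with some formats, silently converts
-- to entirely the wrong date)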

So, you need to explicitly state your character to date format conversion. And to all developers reading this, Yes I know, it is a pain to type those extra bits, but you get paid to do it! :-)
WHERE ORDER_DATETIME >= TO_DATE('01-NOV-2009','DD-MON-YYYY')
AND ORDER_DATETIME < TO_DATE('02-NOV-2009','DD-MON-YYYY')

That will work and keep working.

So:

  • Call columns that hold a DATE with a time portion blar_DATETIME
  • Don’t have functions on the column, move them over to the other side of the WHERE statement(s)
  • Explicitly specify your date conversions. Always. Every time.

I know, I know, most of you knew all of that. Those of you who didn’t, well probably no one had told you before but now you have been told. And that is the heart of it, these things are easy to explain, easy to understand, but somehow seem to be missed when people are getting going with Oracle. They just need telling.

And I’m only able to tell so many. So, the rest of you, will you please tell all inexperienced Oracle people about these three things as I don’t think I can face another 20 years of telling people these things :-)

{The serious final line is, there are lots and lots of interesting gotchas, what-ifs, quirks on things in Oracle, all of which are good to share, but perhaps the most important things to keep sharing are those things “we all know”, but those with less experience don’t.}

Describing tables you can’t DESC October 7, 2009

Posted by mwidlake in internals.

This is more an oddity than anything particularly useful. Sometimes you can’t use the sql*plus DESCRIBE {DESC} command on tables – but you might have an alternative.

I’m doing a lot of work for a client on a 10.2.0.3 database.  I have SELECT ANY DICTIONARY but not SELECT ANY TABLE privilege. This is because there is sensitive data in the database and it is duly protected {and this is certainly not the first time I have worked for a client with full dictionary access but not data access, it’s becoming normal}. I’m granted access to specific things as needs arise.

I knew I had to look at a table called AD_FAKE_STATS_COLUMNS.

select count(*) from mdw.AD_FAKE_STATS_COLUMNS
/
select count(*) from mdw.AD_FAKE_STATS_COLUMNS
                         *
ERROR at line 1:
ORA-00942: table or view does not exist

Have I got the table name wrong?

select table_name,owner
from dba_tables
where table_name = 'AD_FAKE_STATS_COLUMNS'

TABLE_NAME                     OWNER
------------------------------ -----------------
AD_FAKE_STATS_COLUMNS          MDW

OK, it is there, I don’t have access to it. Fine, I don’t want access to the data, I just want to see the structure of the table:

desc mdw.AD_FAKE_STATS_COLUMNS
ERROR:
ORA-04043: object mdw.AD_FAKE_STATS_COLUMNS does not exist

Oh. DESC in sql*plus does not work.

I can’t DESC a table I do not have access to carry out DML on. I’m going to have to go and ask someone to give me permission to see the table. How annoying.

Or do I?

@tab_desc
Enter value for tab_name: AD_FAKE_STATS_COLUMNS
old 21: where table_name like upper (nvl('&TAB_NAME','W')||'%')
new 21: where table_name like upper (nvl('AD_FAKE_STATS_COLUMNS','W')||'%')

TAB_OWN  TAB_NAME              COL_NAME              M COL_DEF
-----------------------------------------------------------------
MDW      AD_FAKE_STATS_COLUMNS TABLE_NAME            Y VARCHAR2(30)
                               COLUMN_NAME           Y VARCHAR2(30)
                               COPY_STATS_FROM       N VARCHAR2(61)
                               LOW_VALUE_SQL         N VARCHAR2(100)
                               HIGH_VALUE_SQL        N VARCHAR2(100)
                               DISTINCT_SQL          N VARCHAR2(100)
                               DAYS_HIST_NULL_AVGLEN N NUMBER(3,0)

:-)

I have access to the data dictionary. So I can see the structure of the table, which after all is what I wanted and is what the client is happy for me to have. {I’ve never much liked the DESC command in sql*plus, I replaced it with a little sql script against the data dictionary years ago}.

In case you want it, here is the script:

-- tab_desc.sql
-- Martin Widlake date? way back in the mists of time
-- my own replacement for desc.
-- 16/11/01 improved the data_type section
 SET PAUSE ON
 SET PAUSE 'Any Key...>'
 SET PAGES 24
col Tab_own form A10
col tab_name form a22 wrap
col col_name form a28 wrap
col col_def form A14
 break on tab_own skip 1 on tab_name skip 1
 spool tab_desc.lst
 select
 owner                               Tab_Own
,table_name             Tab_Name
,column_name            Col_Name
,decode(NULLABLE,'Y','N','Y') Mand
,data_type||decode(data_type
       ,'NUMBER','('
        ||decode(to_char(data_precision)
                ,null,'38'
                ,     to_char(data_precision)||
                      decode(data_scale,null,''
                                      ,      ','||data_scale)
                 )
                    ||')'
       ,'DATE',null
       ,'LONG',null
       ,'LONG RAW',null
       ,'('||Substr(DATA_LENGTH,1,5)||')'
       )  col_def
from dba_tab_columns
where table_name like upper (nvl('&TAB_NAME','WHOOPS')||'%')
order by 1,2,column_id,3,4
/
spool off
clear col
--

Decrypting Histogram Data #2 September 3, 2009

Posted by mwidlake in internals, performance.

Hint – if you have a column where most or all of the entries have the same 15 plus characters, don’t bother gathering histograms on it. They will not help you a lot. Or at all. eg you have a column that holds “CUSTOMER IDENTIFIER xxxxxxxxx”, “CUSTOMER IDENTIFIER yyyyyyyy”, etc. Of course, good design suggests that the “CUSTOMER IDENTIFIER” bit is probably redundant and can be got rid of, but we live in the real world and may not have the power or ability to enforce that change, or we might have 3 or 4 such “prefix” strings.

Further, histograms on numerics longer than 15 significant digits will also potentially not behave as you would expect.

I better justify my assertion.

In the previous post on decrypting histogram data I covered how Oracle turns a varchar2 value into a numeric value that is held in the ENDPOINT_VALUE of DBA_TAB_HISTOGRAMS and I also gave you a cunning (but slightly flawed) function to convert it back. I use it in this post so you might want to go look at the prior post. Sorry it is long, I can’t stop rambling.

First, I’ll create some test data {Oh, this is on 10.2.0.3 on Linux}. The below script creates a table HIST_TEST with columns NUM_1 through to NUM_7, which hold numbers padded out to greater lengths and then 0-9 as the last digit. Similarly, columns VC_2 to VC_8 are padded out and have a random character added. VC_1 is a random 5-character string.

create table hist_test
as select rownum  id
,mod(rownum,10)  num_1
,trunc(dbms_random.value(1,10)) num_2
,1000+trunc(dbms_random.value(1,10)) num_3
,1000000+trunc(dbms_random.value(1,10)) num_4
,1000000000+trunc(dbms_random.value(1,10)) num_5
,1000000000000+trunc(dbms_random.value(1,10)) num_6
,1000000000000000+trunc(dbms_random.value(1,10)) num_7
,lpad(chr(mod(rownum,6)+65),5,chr(mod(rownum,6)+65) )vc_1
,lpad('A',2,'A')||dbms_random.string('U',1) vc_2
,lpad('B',3,'B')||dbms_random.string('U',1) vc_3
,lpad('C',4,'C')||dbms_random.string('U',1) vc_4
,lpad('D',5,'D')||dbms_random.string('U',1) vc_5
,lpad('E',6,'E')||dbms_random.string('U',1) vc_6
,lpad('F',7,'F')||dbms_random.string('U',1) vc_7
,lpad('G',8,'G')||dbms_random.string('U',1) vc_8
--below skews the table to AB.
,lpad('H',16,'H')||decode(mod(rownum,3),0,'A'
                                       ,1,'B'
                                       ,   dbms_random.string('U',1)) vc_16
,lpad('I',40,'I')||dbms_random.string('U',1) vc_40
from dba_objects
where rownum < 1001
/
begin
  dbms_stats.gather_table_stats(ownname=>user,tabname=>'HIST_TEST'
  ,method_opt=>'FOR ALL COLUMNS SIZE 10');
END;
/

The below is a simple select against DBA_TAB_COLUMNS to see the information for the columns. {Oh, I say simple, but you have to use the functions utl_raw.cast_to_number and utl_raw.cast_to_varchar2 to turn the raw values held in the columns LOW_VALUE and HIGH_VALUE into something we humans can read. Why does Oracle Corp have to make life so difficult? *sigh*.}


select
 column_name
,num_distinct num_dist
,decode(data_type,'NUMBER',to_char(utl_raw.cast_to_number(low_value))
                 ,'VARCHAR2',to_char(utl_raw.cast_to_varchar2(low_value))
                 ,          low_value
       ) low_v
,decode(data_type,'NUMBER',to_char(utl_raw.cast_to_number(high_value))
                 ,'VARCHAR2',to_char(utl_raw.cast_to_varchar2(high_value))
                 ,          high_value
       ) hi_v
,num_nulls    n_nulls
,num_buckets  n_buck
,avg_col_len  avg_l
from dba_tab_columns
where table_name ='HIST_TEST'
and owner=USER
order by owner,column_id
/

COLUMN_NAME       NUM_DIST LOW_V              HI_V                    N_NULLS N_BUCK  AVG_L
---------------------------------------------------------------------
ID                   1,000 1                  1000                          0     10      4
NUM_1                   10 0                  9                             0     10      3
NUM_2                    9 1                  9                             0      9      3
NUM_3                    9 1001               1009                          0      9      4
NUM_4                    9 1000001            1000009                       0      9      6
NUM_5                    9 1000000001         1000000009                    0      9      7
NUM_6                    9 1000000000001      1000000000009                 0      9      9
NUM_7                    9 1000000000000001   1000000000000009              0      9     10
VC_1                     6 AAAAA              FFFFF                         0      6      6
VC_2                    26 AAA                AAZ                           0     10      4
VC_3                    26 BBBA               BBBZ                          0     10      5
VC_4                    26 CCCCA              CCCCZ                         0     10      6
VC_5                    26 DDDDDA             DDDDDZ                        0     10      7
VC_6                    26 EEEEEEA            EEEEEEZ                       0     10      8
VC_7                    26 FFFFFFFA           FFFFFFFZ                      0     10      9
VC_8                    26 GGGGGGGGA          GGGGGGGGZ                     0     10     10
VC_16                   26 HHHHHHHHHHHHHHHHA  HHHHHHHHHHHHHHHHZ             0     10     18
VC_40                    1 IIIIIIIIIIIIIIIIII IIIIIIIIIIIIIIIIII            0      1     42/

I now select the data out of the DBA_TAB_HISTOGRAMS table to see what is really being stored. For the sake of brevity {which means, “so I can finish this post tonight”} I’ll just show bits, but if you want, download the script right at the end and, if you created the functions from the prior post, you can play with this yourself.

First, here you can see some of the values actually stored in ENDPOINT_VALUE and what they translate into:-

TABLE_NAME   colname                        ENDPOINT_VALUE ROWCOUNT REAL_VAL
------------------------------------------------------------
Numbers
HIST_TEST    NUM_5                              1000000007      760 1000000007
HIST_TEST    NUM_5                              1000000008      870 1000000008
HIST_TEST    NUM_5                              1000000009    1,000 1000000009
HIST_TEST    NUM_6                           1000000000001      103 1000000000001
HIST_TEST    NUM_6                           1000000000002      219 1000000000002
HIST_TEST    NUM_6                           1000000000003      328 1000000000003
HIST_TEST    NUM_6                           1000000000004      427 1000000000004
Varchars
HIST_TEST    VC_2     338824386822815000000000000000000000        8 AAU
HIST_TEST    VC_2     338824624507302000000000000000000000        9 AAW
HIST_TEST    VC_2     338824782963627000000000000000000000       10 AAY
HIST_TEST    VC_3     344035480872391000000000000000000000        0 BBB@
HIST_TEST    VC_3     344035481491361000000000000000000000        1 BBBB
HIST_TEST    VC_3     344035482419816000000000000000000000        2 BBBE
HIST_TEST    VC_3     344035483348271000000000000000000000        3 BBBH
HIST_TEST    VC_3     344035483967241000000000000000000000        4 BBBJ

Note that for numerics the number itself is stored and I do not need to translate it.
For VARCHAR2 columns the value held is the 15-digit number padded with zeros.
Also note, for VC_2 the range covered seems to end at AAY not AAZ and column VC_3 starts at BBB@ not BBBA {I am creating values with the last character set to A-Z}. Also, bucket 8 for VC_2 ends in a control character.
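
My understanding of how that number is arrived at (this is my own sketch, not anything documented, so treat it with care) is that the leading bytes of the string are treated as base-256 digits and the result is then rounded to roughly 15 significant figures – which is why only the first 6 or 7 characters really count:

select sum(ascii(substr('AAU',level,1)) * power(256, 15-level)) approx_endpoint_value
from   dual
connect by level <= least(length('AAU'),15);
-- gives roughly 3.3882E+35, matching the ENDPOINT_VALUE shown for 'AAU' above
-- once Oracle's rounding is taken into account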

To reduce this I add a fudge to my function {again, see previous post}. It helps:

TABLE_NAME   colname ROWCOUNT REAL_VAL          MOD_REAL
---------------------------------------------------
HIST_TEST    VC_2           9 AAW               AAX
HIST_TEST    VC_2          10 AAY               AAZ
HIST_TEST    VC_3           0 BBB@              BBBA
HIST_TEST    VC_3           1 BBBB              BBBC
HIST_TEST    VC_3           2 BBBE              BBBF
HIST_TEST    VC_3           3 BBBH              BBBI
HIST_TEST    VC_3           4 BBBJ              BBBK
HIST_TEST    VC_3           5 BBBM              BBBN
HIST_TEST    VC_3           6 BBBO              BBBP
HIST_TEST    VC_3           7 BBBR              BBBS
HIST_TEST    VC_3           8 BBBT              BBBU
HIST_TEST    VC_3           9 BBBW              BBBX
HIST_TEST    VC_3          10 BBBY              BBBZ
HIST_TEST    VC_4           0 CCCC@             CCCCA
HIST_TEST    VC_4           1 CCCCB             CCCCC
HIST_TEST    VC_4           2 CCCCD             CCCCE
HIST_TEST    VC_4           3 CCCCH             CCCCH
HIST_TEST    VC_4           4 CCCCJ             CCCCJ

As you can see, I now get a better translation, but it still goes wrong sometimes (see last couple of rows). So, feel free to take my functions and use them, but be aware even the modified version is not perfect. If YOU perfect it, can I have a copy please :-)

The below shows that Numeric value histograms break when you exceed 15 digits:

colname                  END_VAL ROWCOUNT REAL_VAL
------- ------------------------ -------- --------
NUM_5                 1000000009    1,000 1000000009
NUM_6              1000000000001      103 1000000000001
NUM_6              1000000000002      219 1000000000002
NUM_6              1000000000003      328 1000000000003
NUM_6              1000000000004      427 1000000000004
NUM_6              1000000000005      542 1000000000005
NUM_6              1000000000006      651 1000000000006
NUM_6              1000000000007      771 1000000000007
NUM_6              1000000000008      881 1000000000008
NUM_6              1000000000009    1,000 1000000000009
NUM_7           1000000000000000      133 1000000000000000
NUM_7           1000000000000000      256 1000000000000000
NUM_7           1000000000000000      367 1000000000000000
NUM_7           1000000000000000      467 1000000000000000
NUM_7           1000000000000010      567 1000000000000010
NUM_7           1000000000000010      665 1000000000000010
NUM_7           1000000000000010      784 1000000000000010
NUM_7           1000000000000010      896 1000000000000010
NUM_7           1000000000000010    1,000 1000000000000010

This is the point at which storage of numbers for histograms breaks. You can see that NUM_6 is fine but NUM_7 is not. That is because NUM_6 is below 15 significant digits and NUM_7 is over 15 significant digits and effectively gets truncated.

Histograms on numeric values with more than 15 significant digits will not work as you expect, possibly not at all.

With VARCHAR2 values, histogram ENDPOINT_VALUEs break even sooner, at around 7 characters:

colname                               END_VAL ROWCOUNT MOD_REAL
------- ------------------------------------- -------- --------
VC_5     354460798876024000000000000000000000        5 DDDDDO
VC_5     354460798876038000000000000000000000        6 DDDDDR
VC_5     354460798876047000000000000000000000        7 DDDDDS
VC_5     354460798876061000000000000000000000        8 DDDDDV
VC_5     354460798876071000000000000000000000        9 DDDDDY
VC_5     354460798876080000000000000000000000       10 DDDDDZ
VC_6     359673457682977000000000000000000000        0 EEEEEF8
VC_6     359673457682977000000000000000000000        1 EEEEEF8
VC_6     359673457682977000000000000000000000        2 EEEEEF8
VC_6     359673457682977000000000000000000000        3 EEEEEF8
VC_6     359673457682977000000000000000000000        4 EEEEEF8
VC_6     359673457682977000000000000000000000        5 EEEEEF8
VC_6     359673457682977000000000000000000000        6 EEEEEF8
VC_6     359673457682977000000000000000000000        7 EEEEEF8
VC_6     359673457682978000000000000000000000        8 EEEEEFn
VC_6     359673457682978000000000000000000000        9 EEEEEFn
VC_6     359673457682978000000000000000000000       10 EEEEEFn
VC_7     364886116489977000000000000000000000        0 FFFFFGK
VC_7     364886116489977000000000000000000000        1 FFFFFGK
VC_7     364886116489977000000000000000000000        2 FFFFFGK
VC_7     364886116489977000000000000000000000        3 FFFFFGK
VC_7     364886116489977000000000000000000000        4 FFFFFGK
VC_7     364886116489977000000000000000000000        5 FFFFFGK
VC_7     364886116489977000000000000000000000        6 FFFFFGK
VC_7     364886116489977000000000000000000000        7 FFFFFGK
VC_7     364886116489977000000000000000000000        8 FFFFFGK
VC_7     364886116489977000000000000000000000        9 FFFFFGK
VC_7     364886116489977000000000000000000000       10 FFFFFGK

You can see that for column VC_5 the actual ENDPOINT_VALUE is varying at the 14th and 15th significant digit and my translated value changes. But for VC_6 the numeric ENDPOINT_VALUE is hardly changing. Each row translated to one of two values.
For VC_7 the ENDPOINT_VALUE is static. All histogram END_VALUES are the same.

This means histograms will not work properly for any VARCHAR2 columns which do not vary in the first 7 or so characters; any characters after about the 7th are effectively ignored.

Or does it? My logic is correct but tomorrow (or soon after) I’ll try some actual tests over theory… {So don’t ping me with corrections just yet}

I’ve not mentioned VC_16 and VC_40 have I? I will tomorrow. Maybe :-|.

Finally, here is the selecting script, as promised.

col column_name form a7 head colname
col rowcount form 99,999
col real_val form a17
col end_val form 999999999999999999999999999999999999
col end_string form a10
col endpoint_actual_value form a40
col mod_real form a17
select --owner
 dth.table_name
,dth.column_name
,dth.endpoint_value  end_val
,dth.endpoint_number rowcount
,decode(dtc.data_type,'NUMBER',to_char(dth.endpoint_value)
            ,'VARCHAR2',hist_numtochar(dth.endpoint_value+1)
                 ,          dth.endpoint_value
       ) real_val
,decode(dtc.data_type,'NUMBER',to_char(dth.endpoint_value)
            ,'VARCHAR2',hist_numtochar2(dth.endpoint_value+1)
                 ,          dth.endpoint_value
       ) mod_real
,endpoint_actual_value
from dba_tab_histograms dth
,dba_tab_columns dtc
where dth.table_name ='HIST_TEST'
and  dth.owner=USER
and dth.owner=dtc.owner
and dth.table_name=dtc.table_name
and dth.column_name=dtc.column_name
order by dth.table_name,dth.column_name,dth.endpoint_number
/

Counting the Cost #5 – accurate and fast July 9, 2009

Posted by mwidlake in performance.

I started off my blog by discussing quick ways of selecting count(*) from tables. I never completed the series, so I’m back to it.

This is the previous post on the topic.

It covers how ALL_TAB_MODIFICATIONS and the NUM_ROWS column in ALL_TABLES can be combined to get a very accurate estimate of count(*) for most tables in most situations.
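
As a reminder, a rough sketch of that approach (mine, not the script from the previous post, and assuming a simple non-partitioned table with a made-up name) is:

exec dbms_stats.flush_database_monitoring_info

select t.num_rows + nvl(m.inserts,0) - nvl(m.deletes,0)  approx_count
from   all_tables            t
      ,all_tab_modifications m
where  t.owner           = user
and    t.table_name      = 'MY_TABLE'
and    m.table_owner (+) = t.owner
and    m.table_name  (+) = t.table_name;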

If this does not work for you {e.g. no privilege to see the tables or flush the information}, or you absolutely must have the count as accurate as you can get it right now because a manager is threatening you, here is another method, using a count table. It can be expanded to hold more than just count(*) and is particularly useful for partitioned tables {which are often huge}.

Basically, every time you select count(*) on a table {or even e.g. select count(*) where status=0} you are probably scanning a lot of the table that you scanned the last time you did count(*) on it. If you are not deleting records from that table, that scanning effort is effectively a waste of time – it won’t have changed since the last count(*). So, as a developer or DBA, you can decide you can trust the last count(*), and you then need only count the more recent records.

If you have an ascending primary key {traditionally a number sourced from a sequence} you can store the max(PK) and count(*) in a simple table. You then count all the records with a PK greater than the stored one. I’ve used count tables several times in the past and two enhancements invariably crop up, so I am going to include them in the below example of how you can implement this.

You need the partitioning option to run the test yourself, so sorry you will need Enterprise Edition with the option. If you downloaded Oracle to your PC/linux box to play on, you probably have it. {it never ceases to amaze me that Oracle Corp will only allow Partitions to be used in EE edition and it is an extra cost option!}

You need to create tablespaces to hold the partition tables {or alter the test script to use existing tablespaces}. Feel free to take and modify this script  to create the tablespaces.

This is my test script .

This is my basic setup:-
NAME                           VALUE
------------------------------ --------------------
compatible                     10.2.0.1.0
cpu_count                      1
db_block_size                  8192
db_file_multiblock_read_count  8
optimizer_mode                 ALL_ROWS
sga_target                     612368384
sort_area_size                 65536

My test table:-

create table test_p
 (id    number(8) not null
 ,status number(1) not null
,num_1 number(3) not null
 ,num_2  number(3)
,vc_pad varchar2(2000))
tablespace parts_curr
 partition by range (id)
 (partition id_10k values less than (10000)
tablespace parts_old
 ,partition id_20k values less than (20000)
tablespace parts_old
 ,partition id_30k values less than (30000)
tablespace parts_old
 ,partition id_40k values less than (40000)
tablespace parts_old
,partition id_max values less than (maxvalue)
tablespace parts_curr
 )
/
-- {41999 records created - see script}
alter table test_p
add constraint pt_pk primary key (id)
using index
local(
partition id_10k tablespace parts_old
 ,partition id_20k tablespace parts_old
,partition id_30k tablespace parts_old
 ,partition id_40k tablespace parts_old
,partition id_max tablespace parts_curr)
/
-- gather stats
begin
dbms_stats.gather_table_stats(ownname=>user
  ,tabname=>'TEST_P',estimate_percent=>100,granularity=>'ALL');
  end;
/
-- create the count table
create table test_p_running_count
  (max_id	number(8) not null
  ,row_count number(8)
,constraint tprc_pk primary key(max_id)
  )
/

OK, let’s count the number of records in TEST_P and then do the same but insert the data collected into the count table:-

select count(*) from test_p;

  COUNT(*)
----------
     41999
Execution Plan
----------------------------------------------------------
---------------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |     1 |    26   (4)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE        |       |     1 |            |          |       |       |
|   2 |   PARTITION RANGE ALL  |       | 41999 |    26   (4)| 00:00:01 |     1 |     5 |
|   3 |    INDEX FAST FULL SCAN| PT_PK | 41999 |    26   (4)| 00:00:01 |     1 |     5 |
---------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        111  consistent gets
          0  physical reads

insert into test_p_running_count (max_id,row_count)
select max(id),count(*) from test_p;

Execution Plan
----------------------------------------------------------
------------------------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT       |       |     1 |     5 |    26   (4)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE        |       |     1 |     5 |            |          |       |       |
|   2 |   PARTITION RANGE ALL  |       | 41999 |   205K|    26   (4)| 00:00:01 |     1 |     5 |
|   3 |    INDEX FAST FULL SCAN| PT_PK | 41999 |   205K|    26   (4)| 00:00:01 |     1 |     5 |
------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
         26  recursive calls
         21  db block gets
        115  consistent gets
          0  physical reads

So, it takes 111 consistent gets and an estimated cost of 26 to do a count(*) on this relatively small table, doing a fast full scan of the PK index.  It takes only slightly more effort to create a record to hold this information.

{As an aside, it is curious that the total estimated cost of the insert is the same as the select, suggesting naughtily that the insert is free, but that’s not my topic today}.

I now create 100 records by running a little insert script.

Let’s select the number of records and the new max(ID) created since we last stored information in the count table:

select max(id),count(*) from test_p
where id>(select max(max_id) from test_p_running_count)
 /
   MAX(ID)   COUNT(*)
---------- ----------
     42099        100

Execution Plan
----------------------------------------------------------
--------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |     1 |     5 |     4   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE               |         |     1 |     5 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR    |         |  2100 | 10500 |     2   (0)| 00:00:01 |   KEY |     5 |
|   3 |    INDEX RANGE SCAN           | PT_PK   |  2100 | 10500 |     2   (0)| 00:00:01 |   KEY |     5 |
|   4 |     SORT AGGREGATE            |         |     1 |    13 |            |          |       |       |
|   5 |      INDEX FULL SCAN (MIN/MAX)| TPRC_PK |     1 |    13 |     2   (0)| 00:00:01 |       |       |
--------------------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads

The CBO scanned the index of the TEST_P_RUNNING_COUNT table using an index full scan (min/max), which means Oracle basically worked down the index to get a min or max value, which is highly efficient. This value was then used to do an index range scan of only one index partition (partition number 5).

Estimated cost was 4 and the execution took 3 consistent gets {NB this is for the second execution, it is 12 consistent gets and 4 recursive calls the first time, when the statement is parsed}.

But that only got the new max(id) and how many records have been created since you last recorded it, 100. What you really want is the count of records since you last recorded the count and max(id) PLUS the count as it was then, i.e. the full count now.
This is not straightforward in a single SQL statement as you want the count, a group function, from one table and a single-row value from another. You can do this though. You basically write SQL statements to do the bits you want and then select from them. My example script shows how I build up to this, but the final statement I came up with {and you might well be able to come up with better options yourself} is:

select change.max_id new_max,orig.max_id orig_max
,change.rowcount chg_c,orig.rowcount orig_c 
,change.rowcount+orig.rowcount tot_c
from
(select max(id) max_id,count(*) rowcount,1 tabjoin
  from test_p
  where id>(select max(max_id) from test_p_running_count) ) change
,(select max(max_id) max_id,max(row_count) rowcount,1 tabjoin
  from test_p_running_count)                                            orig
where change.tabjoin=orig.tabjoin
 /

   NEW_MAX   ORIG_MAX      CHG_C     ORIG_C      TOT_C
---------- ---------- ---------- ---------- ----------
     42099      41999        100      41999      42099

Execution Plan
----------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                      |     1 |    58 |     7   (0)| 00:00:01 |       |       |
|   1 |  NESTED LOOPS                   |                      |     1 |    58 |     7   (0)| 00:00:01 |       |       |
|   2 |   VIEW                          |                      |     1 |    29 |     4   (0)| 00:00:01 |       |       |
|   3 |    SORT AGGREGATE               |                      |     1 |     5 |            |          |       |       |
|   4 |     PARTITION RANGE ITERATOR    |                      |  2100 | 10500 |     2   (0)| 00:00:01 |   KEY |     5 |
|*  5 |      INDEX RANGE SCAN           | PT_PK                |  2100 | 10500 |     2   (0)| 00:00:01 |   KEY |     5 |
|   6 |       SORT AGGREGATE            |                      |     1 |    13 |            |          |       |       |
|   7 |        INDEX FULL SCAN (MIN/MAX)| TPRC_PK              |     1 |    13 |     2   (0)| 00:00:01 |       |       |
|*  8 |   VIEW                          |                      |     1 |    29 |     3   (0)| 00:00:01 |       |       |
|   9 |    SORT AGGREGATE               |                      |     1 |    26 |            |          |       |       |
|  10 |     TABLE ACCESS FULL           | TEST_P_RUNNING_COUNT |     1 |    26 |     3   (0)| 00:00:01 |       |       |
-------------------------------------------------------------------------------------------------------------------------

Statistics
---------------------------------------------------------
          0  recursive calls
          0  db block gets
         10  consistent gets
          0 physical reads

That’s an estimated cost of 7 by the CBO and 10 consistent gets {again, for the second iteration}. Remember, scanning the whole partitioned table was costed at 26 and took 111 consistent gets.

You can convert that sql select statement into an insert and use it to update your count table:-

insert into test_p_running_count (max_id,row_count)
 select change.max_id
--,orig.max_id,change.rowcount,orig.rowcount
 ,change.rowcount+orig.rowcount
 from
(select max(id) max_id,count(*) rowcount,1 tabjoin
   from test_p
   where id>(select max(max_id) from test_p_running_count) ) change
,(select max(max_id) max_id,max(row_count) rowcount,1 tabjoin
   from test_p_running_count) orig
where change.tabjoin=orig.tabjoin
 /

That is one of the improvements usually asked for when you have a count table: people want to see what the counts were historically, seeing as you are now holding the data in a table {why is it that so many people suddenly want to keep information once it is seen in a table?}. That is why the above does an insert and not an update, and why my code selects the max(max_id) and max(rowcount). It is not perfect; if no records have been inserted into your massive table since the max(id) and row count were last recorded, you will get a duplicate primary key error. You could add a datetime column to get around this.

Also, my SQL is a little naughty in that I select the max(id) and max(rowcount). There is nothing in the table design to enforce the unspoken rule that they both appear in the “last record” of the table, but as a human you can see that they do. Again, the use of a datetime can help with this.
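
A sketch of that datetime suggestion (my addition, not in the original scripts) would be to add a date column and include it in the primary key, so repeat snapshots no longer collide:

alter table test_p_running_count add (count_date date default sysdate not null);
alter table test_p_running_count drop constraint tprc_pk;
alter table test_p_running_count add constraint tprc_pk primary key (max_id, count_date);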

The final bit for today. The above two SQL statements look a bit nasty, certainly a lot more complex than “select count(*) from table”. Also, they are not as efficient as they could be; the select of max(row_count) is not supported by an index and could slow down over time. I could tweak the SQL statement even further to work around this, but a simple piece of PL/SQL does the job better, and is what I usually end up implementing {within a package, along with a load of other count-table functions for several large tables}.

Here is a script to create a function that gets the current count(*) for the table.

--get_new_max
create or replace function get_tp_rowcount
return number
as
--
v_rtn number;
v_extra_rc number;
v_last_rc   number;
v_max_id    number;
v_last_max  number;
begin
  select max_id,row_count
into v_last_max,v_last_rc
  from test_p_running_count
  where max_id=(select max(max_id) from test_p_running_count);
  select count(*),max(id)
into v_extra_rc,v_max_id
  from test_p
  where id>v_last_max;
  dbms_output.put_line(to_char(v_last_rc)
              ||'~'||to_char(v_extra_rc)
              ||'~'||to_char(v_last_max)
              ||'~'||to_char(v_max_id));
v_rtn :=v_last_rc+v_extra_rc;
return v_rtn;
end;
/

And, as you can see below, it remains an efficient trick. I’ll leave it to you to create a procedure to update the count table and with the observation that you can then argue for several hours as to how often you update the count table… :-)

select get_tp_rowcount from dual;

GET_TP_ROWCOUNT
---------------
          42299

1 row selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
| Id  | Operation        | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------
|   0 | SELECT STATEMENT |      |     1 |     2   (0)| 00:00:01 |
|   1 |  FAST DUAL       |      |     1 |     2   (0)| 00:00:01 |
-------------------------------------------------------------

Statistics
----------------------------------------------------------
          2  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads

--
--

select count(*) from test_p;

  COUNT(*)
----------
     42299

Execution Plan
----------------------------------------------------------
---------------------------------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |     1 |    26   (4)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE        |       |     1 |            |          |       |       |
|   2 |   PARTITION RANGE ALL  |       | 41999 |    26   (4)| 00:00:01 |     1 |     5 |
|   3 |    INDEX FAST FULL SCAN| PT_PK | 41999 |    26   (4)| 00:00:01 |     1 |     5 |
---------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        111  consistent gets
          0  physical reads
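For completeness, here is roughly what the procedure to update the count table might look like {a sketch only – it re-uses the column names from the function above, and it only inserts when new rows have actually arrived, to dodge the duplicate primary key issue mentioned earlier}:

create or replace procedure update_tp_rowcount
as
v_last_max  number;
v_last_rc   number;
v_extra_rc  number;
v_max_id    number;
begin
  -- the last recorded max(id) and running count
  select max_id, row_count
  into   v_last_max, v_last_rc
  from   test_p_running_count
  where  max_id = (select max(max_id) from test_p_running_count);
  -- rows added since then
  select count(*), max(id)
  into   v_extra_rc, v_max_id
  from   test_p
  where  id > v_last_max;
  -- only insert if something new has arrived, to avoid a duplicate primary key
  if v_extra_rc > 0 then
    insert into test_p_running_count (max_id, row_count)
    values (v_max_id, v_last_rc + v_extra_rc);
    commit;
  end if;
end;
/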

Peeking under the Data Dictionary Hood July 7, 2009

Posted by mwidlake in internals.
Tags: ,
9 comments

Have you ever wondered how some people seem to be able to find out how the Oracle database does things internally? How they work out where some of the interesting stuff is contained within the data dictionary and can pull it out?

Well, one of the simpler ways to peek under the data dictionary hood is to just look at the objects in the data dictionary itself. I’ve found some interesting things by looking at the data dictionary views. A lot of the dictionary objects are views, like DBA_TABLES and DBA_TAB_COLUMNS. I’ll look at DBA_TABLESPACES {as it’s a bit smaller!}.

You can get my script here – vw_txt.sql. As you will see, it is a terribly complex script… The only trick is to make sure you “set long 32767”, as the view text is held in a column of the ancient LONG datatype.
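I won’t paste the whole script in, but if you cannot get at the link, the idea is nothing more than a query against DBA_VIEWS – something along these lines {a sketch, not the exact script}:

-- vw_txt {minimal sketch - the real script is the one linked above}
set long 32767
set lines 100 pages 64 trims on
col owner      form a30 word wrap
col view_name  form a30 word wrap
spool vw_txt.lst
select owner, view_name, text_length, text
from   dba_views
where  view_name like upper(nvl('&vw_name','WHOOPS')||'%')
order by owner, view_name
/
spool off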

> @vw_txt
Enter value for vw_name: dba_tablespaces
old 6: where view_name like upper (nvl('&vw_name','WHOOPS')||'%')
new   6: where view_name like upper (nvl('dba_tablespaces','WHOOPS')||'%')
Any Key...>

OWNER                          VIEW_NAME                      TEXT_LENGTH
------------------------------ ------------------------------ -----------
TEXT
--------------------------------------------------------------------------------
SYS                            DBA_TABLESPACES                       1724
select ts.name, ts.blocksize, ts.blocksize * ts.dflinit,
          decode(bitand(ts.flags, 3), 1, to_number(NULL),
ts.blocksize * ts.dflincr),
ts.dflminext,
          decode(ts.contents$, 1, to_number(NULL), ts.dflmaxext),
decode(bitand(ts.flags, 4096), 4096, ts.affstrength, NULL),
decode(bitand(ts.flags, 3), 1, to_number(NULL), ts.dflextpct),
ts.blocksize * ts.dflminlen,
decode(ts.online$, 1, 'ONLINE', 2, 'OFFLINE',
                 4, 'READ ONLY', 'UNDEFINED'),
decode(ts.contents$, 0, (decode(bitand(ts.flags, 16), 16, 'UNDO',
                 'PERMANENT')), 1, 'TEMPORARY'),
decode(bitand(ts.dflogging, 1), 0, 'NOLOGGING', 1, 'LOGGING'),
decode(bitand(ts.dflogging, 2), 0, 'NO', 2, 'YES'),
          decode(ts.bitmapped, 0, 'DICTIONARY', 'LOCAL'),
decode(bitand(ts.flags, 3), 0, 'USER', 1, 'SYSTEM', 2, 'UNIFORM',
                 'UNDEFINED'),
          decode(ts.plugged, 0, 'NO', 'YES'),
decode(bitand(ts.flags,32), 32,'AUTO', 'MANUAL'),
decode(bitand(ts.flags,64), 64,'ENABLED', 'DISABLED'),
decode(bitand(ts.flags,16), 16, (decode(bitand(ts.flags, 512), 512,
                 'GUARANTEE', 'NOGUARANTEE')), 'NOT APPLY'),
decode(bitand(ts.flags,256), 256, 'YES', 'NO'),
decode(tsattr.storattr, 1, 'STORAGE', 'HOST'),
decode(bitand(ts.flags,16384), 16384, 'YES', 'NO'),
decode(bitand(ts.flags,64), 0, null,
decode(bitand(ts.flags,65536), 65536,'FOR ALL OPERATIONS',
                 'DIRECT LOAD ONLY'))
from sys.ts$ ts, sys.x$kcfistsa tsattr
where ts.online$ != 3
and bitand(flags,2048) != 2048
and ts.ts# = tsattr.tsid

1 row selected.

You see a lot of decodes of bitand functions in these internal views. What is more interesting is to see what they are decoded into, as that confirms what the possible options are.

Are you curious as to what all the possible object types in the database are? Whether your database happens to have examples of them or not? Peek inside the DBA_OBJECTS view {the following is part of that view}:-

OWNER                          VIEW_NAME                      TEXT_LENGTH
------------------------------ ------------------------------ -----------
TEXT
-------------------------------------------------------------------------------
SYS                            DBA_OBJECTS                           2992
select u.name, o.name, o.subname, o.obj#, o.dataobj#,
       decode(o.type#, 0, 'NEXT OBJECT', 1, 'INDEX', 2, 'TABLE', 3, 'CLUSTER',
                      4, 'VIEW', 5, 'SYNONYM', 6, 'SEQUENCE',
                      7, 'PROCEDURE', 8, 'FUNCTION', 9, 'PACKAGE',
                      11, 'PACKAGE BODY', 12, 'TRIGGER',
                      13, 'TYPE', 14, 'TYPE BODY',
                      19, 'TABLE PARTITION', 20, 'INDEX PARTITION', 21, 'LOB',
                      22, 'LIBRARY', 23, 'DIRECTORY', 24, 'QUEUE',
                      28, 'JAVA SOURCE', 29, 'JAVA CLASS', 30, 'JAVA RESOURCE',
                      32, 'INDEXTYPE', 33, 'OPERATOR',
                      34, 'TABLE SUBPARTITION', 35, 'INDEX SUBPARTITION',
                      40, 'LOB PARTITION', 41, 'LOB SUBPARTITION',
42, NVL((SELECT 'REWRITE EQUIVALENCE'
                               FROM sum$ s
                               WHERE s.obj#=o.obj#
and bitand(s.xpflags, 8388608) = 8388608),
                              'MATERIALIZED VIEW'),
                      43, 'DIMENSION',
                      44, 'CONTEXT', 46, 'RULE SET', 47, 'RESOURCE PLAN',
                      48, 'CONSUMER GROUP',
                      51, 'SUBSCRIPTION', 52, 'LOCATION',
                      55, 'XML SCHEMA', 56, 'JAVA DATA',
                      57, 'EDITION', 59, 'RULE',
                      60, 'CAPTURE', 61, 'APPLY',
                      62, 'EVALUATION CONTEXT',
                      66, 'JOB', 67, 'PROGRAM', 68, 'JOB CLASS', 69, 'WINDOW',
                      72, 'WINDOW GROUP', 74, 'SCHEDULE', 79, 'CHAIN',
                      81, 'FILE GROUP', 82, 'MINING MODEL', 87, 'ASSEMBLY',
                      90, 'CREDENTIAL', 92, 'CUBE DIMENSION', 93, 'CUBE',
                      94, 'MEASURE FOLDER', 95, 'CUBE BUILD PROCESS',
'UNDEFINED')

Scanning the view definitions for something you are interested in is a little more tricky as the text is, as I said, stored in a LONG and you can’t run SQL functions against a LONG. I use a bit of PL/SQL to get around the problem :-

find_vw_txt.sql.

-- find_vw_txt
-- martin widlake 13/8/08
-- can't search view text as it is in a long.
-- so let's try PL/SQL
set serveroutput on size unlimited
spool find_vw_txt.lst
set trims on  lines 120 pages 32
declare
v_name   varchar2(4000) :='&viewname';
v_search varchar2(100)  :='&search_txt';
cursor get_vw_txt is
select u.name owner, o.name name, v.textlength textlen, v.text text
from   sys.obj$  o
      ,sys.view$ v
      ,sys.user$ u
where o.obj#   = v.obj#
and   o.owner# = u.user#
and   o.name   like upper(v_name)||'%'
and   v.textlength <32000; -- can't think how I stop a numeric
                           -- error trying to pull a larger long
                           -- into a varchar. bloody longs
v_text  varchar2(32000);
v_where number;
begin
  v_name   := upper(v_name);
  v_search := upper(v_search);
  for vtr in get_vw_txt loop
    --dbms_output.put_line('looking at '||vtr.name);
    v_text  := vtr.text;
    v_text  := upper(v_text);
    v_where := instr(v_text,v_search);
    if v_where != 0 then
      dbms_output.put_line('view '||vtr.owner||'.'||vtr.name||':** '
                           ||substr(v_text,greatest(0,v_where),80)||' **'
                          );
    end if;
  end loop;
end;
/
spool off

It can be fun to dig into the internals this way. And useful. A few times I’ve been able to find out one or two things about how Oracle is storing information.

It might occur to you that there are lots of internal PL/SQL packages and you can look into them too:-

pkg_txt.sql will show you the contents of a package or stored function/procedure.
find_plsql_txt.sql will scan the stored PL/SQL for a text string.
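Those two scripts are not reproduced here, but the idea is simple enough: DBA_SOURCE holds stored PL/SQL one line per row in a VARCHAR2 column, so {as a rough sketch of what they do} plain SQL is all you need:

-- show the source of a package/function/procedure {sketch of the pkg_txt idea}
set pages 0 lines 120 trims on
spool pkg_txt.lst
select text
from   dba_source
where  name like upper(nvl('&plsql_name','WHOOPS')||'%')
order by owner, name, type, line
/
spool off

-- scan stored PL/SQL for a string {sketch of the find_plsql_txt idea - no LONG problem here}
select owner, name, type, line, text
from   dba_source
where  upper(text) like '%'||upper('&search_txt')||'%'
order by owner, name, type, line
/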

However, a lot of the internal PL/SQL is wrapped, ie converted into a form you can’t peek into easily {well, in my case, at all, but some people know how}.

One last word of warning. You can find what look like interesting undocumented features when you peek under the covers. In V10 I came across the REVERSE function:


select reverse('Martin was here')
from dual

REVERSE('MARTIN
---------------
ereh saw nitraM

Nice :-).

That could be useful, yes?

I’ve checked in the 11g documentation and it is still not documented. I googled “oracle reverse function” and several people have also found it. A couple ask why such a useful thing is not documented…

Well, I was “lucky” enough to find out why it is undocumented. I tried to use the REVERSE function on a number {this was on v10.1 I think}.

My session core-dumped.

I was on a production system when I tried this {Yes I KNOW!!! It was a momentary lapse}!

It seems to work OK on 10.2.0.4 and 11.1 but don’t use it. You have no idea if it will always work OK and there is no come-back to Oracle Corp if it blows up on you. My guess is they wrote the REVERSE function for internal use and did not test it for people using it “not as intended” {as I had done}.

So, my warning is, though it is fun to look under the covers and try things out, never, ever ever, ever do so on a production system. Or a test system used for proper testing. Or a development system those touchy developers are using. Or any system that you can’t afford to trash and recreate on a whim. Try it at home on your own PC, OK?

Cost of full table scans #2 June 30, 2009

Posted by mwidlake in performance, Uncategorized.
Tags: , ,
2 comments

This is the second post in a series on Full Table Scans. Sorry to anyone waiting for part two, I got distracted {I suspect very few people were actually waiting :-)}. I make up for it by making this posting veeeery long.

Post One – I got several comments, mostly pre-empting what I planned to say next {I am not complaining as such, I like to get feedback and Randolf Geist in particular was being very helpful}, but I’m going to say what I wanted to say anyway, especially as I was being purposefully misleading in my first post…

I’m attempting to show three things in this series of posts.

  1. How table scans are costed by the CBO under 10g and 11g, as it is quite different from before.
  2. That simple test cases of what you knew once can fail to work and leave you confused and questioning your knowledge.
  3. How levels of “truth” help you understand more as you dig deeper.
  4. As a fourth I might touch on how Oracle’s calculated cost is not actually always reflecting how it executes sql, but I need to work on that a little.

If you thought that db_file_multiblock_read_count is the number of blocks that Oracle reads in one go when scanning tables and indexes, and that this improves performance, that will help you understand what is going on inside your Oracle database.
If you think that increasing the value will make full table scans look more efficient to Oracle and thus it will favour them, you are again right and it may well help you tune a system.
However, you may know more – you may know that Oracle modifies the value of the MBRC to make it more realistic. As the MBRC gets larger, things like the size of the segment, hitting the end of individual extents and the physical time taken to read data off disk and chuck it across the network make the efficiency gains of these larger values less significant. Each layer of knowledge helps a little more. Then the technology underneath shifts and you may not realise this, as the older knowledge still kind of helps.

I’m not trying to be smart or say “I know better” here; I’ve posted before that none of us knows this stuff until we are told, and I certainly did not know all of what is in these posts when I started preparing them 3 or 4 weeks ago. Some of what I plan to say I am still not sure of. But I’ll postpone the philosophy for a later blog post, though as I have a rough version already written, here it is if you want.

Back to the technology and the costing of full table scans.

Here is my test script again.

It creates a couple of tables, the one of interest is TEST_BIG, holding 40,000 rows. TEST_BIG is 2304 blocks big but only 2267 blocks are occupied. As nothing has been deleted from the table, this is also the high water mark, ie the highest point in the table into which data has been inserted. Oracle keeps track of this high water mark. The table is in a locally managed, autoallocate tablespace with automatic segment space management. All the defaults.
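If you cannot get at the script and just want something of a similar size to play along with, a quick stand-in table like the one below will do {this is not the original script – the real TEST_BIG has more columns}:

-- not the original test script, just a quick stand-in table of roughly the right size
create table test_big
as
select rownum            id
      ,mod(rownum,1000)  num_1
      ,rpad('x',400,'x') vc_pad   -- padding to bulk the rows out to a couple of thousand blocks
from   dual
connect by level <= 40000;

exec dbms_stats.gather_table_stats(user,'TEST_BIG')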

This is my environment:-

NAME                           VALUE
------------------------------ --------------------
compatible                     11.1.0.0.0
cpu_count                      2
db_block_size                  8192
db_file_multiblock_read_count  128
optimizer_mode                 ALL_ROWS
sga_target                     0
sort_area_size                 65536

Just to re-iterate, I have not gathered system statistics – the system is “out of the box”.

From Oracle 10 onwards, there are some default system statistics set. You can see these by looking at the sys.aux_stats$ table (click on “view plain” to see it properly laid out):-

 select sname,pname,pval1,pval2
 from
 sys.aux_stats$
 order by sname,pname

SNAME           PNAME                PVAL1 PVAL2
-----------------------------------------------------
SYSSTATS_INFO   DSTART                     10-15-2007 11:32
SYSSTATS_INFO   DSTOP                      10-15-2007 11:32
SYSSTATS_INFO   FLAGS                    1
SYSSTATS_INFO   STATUS                     COMPLETED
SYSSTATS_MAIN   CPUSPEED
SYSSTATS_MAIN   CPUSPEEDNW      1845.90945
SYSSTATS_MAIN   IOSEEKTIM               10
SYSSTATS_MAIN   IOTFRSPEED            4096
SYSSTATS_MAIN   MAXTHR
SYSSTATS_MAIN   MBRC
SYSSTATS_MAIN   MREADTIM
SYSSTATS_MAIN   SLAVETHR
SYSSTATS_MAIN   SREADTIM

There are only 3 values set {on both 10g and 11g}:

  • CPUSPEEDNW – an estimate of the speed of the CPU. This value is different on different servers. Maybe. Both my old V10 Windows test boxes have a value of 484, even though one is a truly old laptop, yet the Linux servers I have access to all have different values, ranging from 713 to 1778.
  • IOSEEKTIM – This is the time for the IO subsystem to locate the start of an item. In both V10 and 11 this is 10 milliseconds.
  • IOTFRSPEED – This is the volume of data that can be read from the IO subsystem in 1 millisecond – 4096 bytes or 4K, again in V10 and 11.

These default figures for IOSEEKTIM and IOTFRSPEED are fairly conservative, especially as they have not changed since V10, but then seek time has not really improved (especially on single discs).

In my experience, and looking around on the net, it seems that many sites on V10 upwards have not gathered system stats {BTW see the end of this post for some clarification on system statistics}. This seems to be due to one of several factors:

  • Some sites do not have a DBA or one who knows about system statistics {I’m not criticising, if you have been labelled a DBA and thrown the whole of the Oracle stack to look after, there is a lot to get your head around}.
  • Some DBAs think the system stats are gathered by the automatic stats job that {sort of} runs every night.
  • Changing to using system statistics would change execution plans and thus need full regression testing, some re-tuning, which there is no time or money for.
  • Hey, it works, why change things {this can also be viewed as  “I am fundamentally scared of change” :-) }.

I can understand all of the above, and as a result of them many of us benefit from understanding how the default system statistics work. I would suggest it is better to gather system statistics than not, as you are giving the Oracle CBO more accurate information about the actual speed of your CPU, your IO and the difference between single block reads and multi-block reads, but life is not perfect. {NB I’ve heard on the grapevine that you should not gather system stats if you run Oracle Apps 11i, but I am no expert on Oracle Apps!}

So what does Oracle do with the “out of the box” information? It calculates values for SREADTIM {the time to read a single block} and MREADTIM {the time taken to do a multi-block read}. If you think about how discs work, to read something the head of the disc must locate where the data is and physically move to it – this is the seek time – and then read the data.
So for a single block (8k) the time taken is

SREADTIM = IOSEEKTIM + (size of one block / IOTFRSPEED)
ie 10 + (8192 bytes / 4096 bytes) = 12.

For a multi block read, the disc head will seek once and then read db_file_multiblock_read_count blocks in one go {just accept this for now, please}. In my case, 128*8k.

MREADTIM = IOSEEKTIM + ( (MBRC * size of one block) / IOTFRSPEED)
ie 10 + ( (128*8192) / 4096)
= 10 + (1048576/4096)
= 10 + (1024k/4k) = 10 + 256 = 266
I don’t know about you, but I can’t divide 1048576 by 4096 just like that; 1024/4, yep, easier.

The CBO can only cost something if it can treat all the parts in the same units, so it converts the multi-block read into an equivalent number of single block reads by dividing the time for the multi-block read by the time for a single block read.
266/12 = 22.1667.
So a 128-block multi-block read is costed as roughly 22 single block reads.

{Incidentally, it does this with the CPU cost as well.
Oracle makes a guess at how much cpu effort an activity will take and then divides it by the CPU speed and the time for a single block read – I’m being a little simplistic here, but it will do for now}.

So, a single block read has a cost of 1.
A multi block read of 128 blocks has a cost of around 22, yet reads 128 times as much data.
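You can sanity-check this arithmetic straight from SQL*Plus. The query below is not from the original post; it just hard-codes the default noworkload values and the 2267 occupied blocks of TEST_BIG:

-- check the noworkload arithmetic {defaults hard-coded: 8k blocks, IOSEEKTIM 10, IOTFRSPEED 4096, MBRC 128}
select 10 + (8192/4096)                                                     sreadtim
      ,10 + (128*8192)/4096                                                 mreadtim
      ,round( (10 + (128*8192)/4096) / (10 + (8192/4096)) ,4)               ratio
      ,round( (2267/128) * ((10 + (128*8192)/4096) / (10 + (8192/4096))) )  calc_fts_cost
from dual;

  SREADTIM   MREADTIM      RATIO CALC_FTS_COST
---------- ---------- ---------- -------------
        12        266    22.1667           393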

In previous versions of Oracle this was all different. In 8i, single block reads and multi block reads were costed the same. In 9i without cpu costing {no cpu costing being the default, and I can’t remember ever seeing it changed on a live system} a “fudge factor” was introduced where values for MBRC were reduced down to better match reality. And in my first post I misleadingly referred to a Fudge Factor and gave a table. Anyone who was used to this fudge factor may have noticed my numbers did not match, but then who of us remembers that the fudge factor for a multi block read count of 8 is 6.19? :-)

{That table is still useful, it shows an effective ratio for the MBRC, IF you have gathered no stats AND you have an 8k block size. OK, maybe it is not so useful…}

Anyway, as you can now see from the above formulae, Oracle on 10g and 11g now uses a fairly sensible piece of logic to decide on the relative costs of single and multi block reads. The CBO formula is basically saying you save the cost of multiple seek times when reading multiple blocks, but the time to transfer data is directly related to the volume of data read. This is probably reasonable on a system with a simple IO setup (like single disks or very basic RAID) but will break down with high-end storage which has read-ahead and SAN caches and the like.

OK, I have to address one more issue before I demonstrate that the formula is correct. It is something that I did not know; it confused the hell out of me when I decided to do a “simple test” of what I thought I knew, and it prompted me to go on endlessly about how doing a simple test to prove what you have learned can go wrong and send you crazy with confusion. And thanks again to Randolf Geist for pointing me in the right direction.

At the end of the last post I had shown that my multiblock read count was 128 but acted like it was 8.
ie I got a cost of 617 for scanning TEST_BIG. When I manually set my MBRC to 8 I also got a cost of 617. When I manually set a value for MBRC of 128, it gave a cost of 395. Huh?!

Coskan alluded to this bit, but I missed it – I was ignorant about what he was saying.

On a newly created Oracle 11.1 test database, created as part of a simple install on Windows Vista, you have db_file_multiblock_read_count set to:-

show parameter db_file_multiblock_read

NAME                                 VALUE
------------------------------------ -----
db_file_multiblock_read_count        128

But there is a hidden parameter:-

NAME                           VALUE
DESCRIPTION
------------------------------ -----
_db_file_optimizer_read_count  8
multiblock read count for regular clients

{To see hidden parameters you need to be logged in as sys or use a trick like intermediate views – here is a script you can run as sys to see them. Warning, there are a LOT of them! I’ll do yet another post about hidden parameters sometime soon.}
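The core of that script is the well-known join of x$ksppi {parameter names} to x$ksppcv {current values}; run as sys, it is something like this {a sketch, not the exact linked script}:

-- hidden (underscore) parameters - run as sys
col name  form a40 word wrap
col value form a30 word wrap
select p.ksppinm  name
      ,v.ksppstvl value
      ,p.ksppdesc description
from   x$ksppi  p
      ,x$ksppcv v
where  p.indx = v.indx
and    substr(p.ksppinm,1,1) = '_'
order by p.ksppinm
/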

How the hell are we supposed to know this???! Do Oracle do this on purpose to keep us awake at night worrying about it?
*sigh*.

On Oracle 10, this hidden initialization parameter defaults to 8 and MBRC defaults to 8 {usually – I just recreated a new test 10.2.0.1 database and both are set to 16 “out of the box”}. If you change MBRC, the hidden parameter alters too, to the same value, so you do not notice it so much.
On 11, MBRC is set to 128 and the hidden parameter is set to 8 initially. Once you change MBRC, the hidden _db_file_optimizer_read_count is also changed to match.

So, as my first paragraph on the first posting said, you go on to a test database to check something and you get confused as the very first “simple” test does not give you the results you expected.

OK, does the formula work?
In the below table I list:

  • The MBRC that I set by altering my session
  • The cost as reported by Explain Plan
  • The calculated MREADTIM = IOSEEKTIM+ (MBRC*blocksize/IOTFRSPEED)
  • The calculated SREADTIM  = IOSEEKTIM+(blocksize/IOTFRSPEED)
  • MREADTIM/SREADTIM {I’ll call it MT/ST}
  • Blocks in table/MBRC {I’ll call it blks/MBRC}
  • (blks/MBRC)  * (MT/ST) to get a calculated cost in single block reads
MBRC  Cost   MRT  SRT    MT/ST  blks/MBRC  Calc cost
----  ----   ---  ---   ------  ---------  ---------
   2  1325    14   12    1.167   1133.5         1323
   3  1010    16   12    1.333    755.667       1008
   4   853    18   12    1.5      566.75         851
   6   695    22   12    1.833    377.833        693
   8   617    26   12    2.167    283.375        615
  10   569    30   12    2.5      226.7          567
  12   538    34   12    2.833    188.917        536
  16   499    42   12    3.5      141.688        496
  20   475    50   12    4.167    113.35         473
  24   459    58   12    4.833     94.458        457
  32   440    74   12    6.167     70.844        437
  33   438    76   12    6.333     68.697        436
  64   410   138   12   11.5       35.422        408
 128   395   266   12   22.167     17.711        393

Yes :-)
The calculated cost is always 2 less than the cost given by the CBO. I know that under Oracle 10 1 is added to the cost for reading the table header. I am presuming that an additional 1 has been added for some reason under 11.

That will do for now.

Oh, as promised, some comments on system statistics

Oracle System statistics are NOT statistics collected on the tables and indexes owned by users SYS or SYSTEM; those would be “Dictionary Statistics”. Dictionary statistics can be collected via specific calls to DBMS_STATS.GATHER_DICTIONARY_STATS and {if you have not turned off the automatic gathering of object statistics} are gathered automatically during weekday nights and over the weekend by the automated job, which gathers table and index stats when it feels they are needed. I’ll be posting about that {probably at inordinate length} in a month or so.
System Statistics are statistics about how the underlying computer system works, specifically cpu speed and IO subsystem performance.
Even Oracle Support can get confused over this. You might like to check out this unfortunate experience on the topic. I have to say, I have much sympathy with Piet as I had very similar issues trying to discuss automated stats gathering with Oracle a couple of years ago. Having said that, in the end I got good support on the topic.
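To make the distinction concrete, the two sets of statistics are gathered with quite different DBMS_STATS calls {illustrative calls only – pick parameters to suit your site}:

-- dictionary statistics: object stats on the SYS/SYSTEM owned tables and indexes
exec dbms_stats.gather_dictionary_stats

-- system statistics: cpu speed and io timings, eg captured over a 30 minute workload window
exec dbms_stats.gather_system_stats(gathering_mode=>'INTERVAL', interval=>30)

-- or the noworkload flavour, which just measures basic io/cpu characteristics
exec dbms_stats.gather_system_stats(gathering_mode=>'NOWORKLOAD')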

Min and Max do not Mix May 27, 2009

Posted by mwidlake in performance.
Tags: , ,
10 comments

If you want to select the min and max value for an indexed column in a table, it may not perform as well as you like, despite some nice tricks Oracle can do with min and max. I’ll show you what I mean and a little trick, which might be the fastest way to get the min and max values in Oracle.

You can use this script here to replicate the below. It creates a test table of 99,999 rows. It was written on Oracle 10 and should work on 11 unchanged {post a comment if you try it and find anything is different}; you just need to remove the “purge” part of the drop command to get it to work on 9 {I think}.

This is the basic environment and setup.

NAME                           VALUE
------------------------------ --------------------
compatible                     10.2.0.1.0
cpu_count                      1
db_block_size                  8192
db_file_multiblock_read_count  16
optimizer_mode                 ALL_ROWS
sga_target                     612368384
sort_area_size                 65536

test102>desc test_1
 Name                             Null?    Type
 ------------------------------------------------------
 ID                               NOT NULL NUMBER
 NUM_1                                     NUMBER
 ID_2                                      VARCHAR2(41)
 NUM_2                                     NUMBER
 NUM_3                                     NUMBER
 NUM_4                                     NUMBER
 VC_1                                      VARCHAR2(50)
 VC_2                                      VARCHAR2(50)
 VC_3                                      VARCHAR2(250)

select count(*) from test_1;

  COUNT(*)
----------
     99999
Elapsed: 00:00:00.01

Execution Plan
-----------------------------------------------------
| Id  | Operation             | Name | Rows  | Cost (%CPU)|
-----------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 | 49   (5)|
|   1 |  SORT AGGREGATE       |      |     1 |         |
|   2 |   INDEX FAST FULL SCAN| T_PK | 99999 | 49   (5)|
-----------------------------------------------------

Statistics
-----------------------------------------------------
          0  recursive calls
          0  db block gets
        215  consistent gets
          0  physical reads

You can see that to count the 99,999 rows the CBO expects to fast full scan index T_PK for a cost of 49 {which equates roughly to the number of expected I/O operations} and the execution results in 215 buffer gets, which is the number of times a block was read from memory.

Let’s select the MAX(ID):

select max(id) from test_1;

   MAX(ID)
----------
     99999

Execution Plan
-----------------------------------------------------
| Id  | Operation                  | Name | Rows  | Bytes | Cost (%CPU)|
-----------------------------------------------------
|   0 | SELECT STATEMENT           |      |     1 |    4 |2(0)|
|   1 |  SORT AGGREGATE            |      |     1 |    4 |    |
|   2 |   INDEX FULL SCAN (MIN/MAX)| T_PK | 99999 |  390K|2(0)|
-----------------------------------------------------

Statistics
-----------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets

The plan shows that Oracle will use an INDEX FULL SCAN (MIN/MAX) to get the result. I find it a little misleading that it is called a “full scan” and it shows as processing 99,999 rows when in fact what Oracle does is nip down the ‘edge’ of the index to get the result. It reads the root block and one leaf block to get the result {Oracle does not need to visit the table, the data for column ID is in the index}.
If you have any doubts, look at the consistent gets – 2.
It’s fast and efficient.
It works with MIN too:-

select min(id) from test_1;

   MIN(ID)
----------
         1

Execution Plan
--------------------------------------------------
| Id  | Operation                  | Name | Rows  |Bytes| Cost (%CPU)|
--------------------------------------------------
|   0 | SELECT STATEMENT           |      |     1 |   4 |2 (0)|
|   1 |  SORT AGGREGATE            |      |     1 |   4 |     |
|   2 |   INDEX FULL SCAN (MIN/MAX)| T_PK | 99999 |390K|2 (0)|
---------------------------------------------------

Statistics
----------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets

It does not work for the other group functions as they need to gather all the data to, for example, calculate the average value for ID:

select avg(id) from test_1;

   AVG(ID)
----------
     50000

Execution Plan
----------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)|
--------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |     4 |    49   (5)|
|   1 |  SORT AGGREGATE       |      |     1 |     4 |            |
|   2 |   INDEX FAST FULL SCAN| T_PK | 99999 |  390K |    49   (5)|
--------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        215  consistent gets

Here Oracle returns to the index fast full scan and the same workload as count(*).

Now for the “gotcha” bit. What happens if you select MIN and MAX?

select max(id),min(id) from test_1;

   MAX(ID)    MIN(ID)
---------- ----------
     99999          1


Execution Plan
--------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes |Cost (%CPU)|
--------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |     4 |49   (5)|
|   1 |  SORT AGGREGATE       |      |     1 |     4 |        |
|   2 |   INDEX FAST FULL SCAN| T_PK | 99999 |   390K|49   (5)|
---------------------------------------------------


Statistics
----------------------------------------------------
          0  recursive calls
          0  db block gets
        215  consistent gets
          0  physical reads
          0  redo size
        472  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Damn! Oracle goes back to the index fast full scan and 215 consistent gets!

Way back when I was beta testing Oracle 10 I found this and reported it as a bug. Oracle Corp replied that it was not a bug, it was expected behaviour. It would scan the whole index even though, as a human, you could see a better way. The reason given, and I have some sympathy with this, is that spotting that only the min and max of an indexed column had been asked for, and that there is a more efficient way to satisfy it, is logic that would have to be added to the CBO.
Where we differed is that they thought this would be a rare event; I thought it would be a more common event and worth their effort.

Never mind, you can do it yourself {I won’t line this up like the previous examples, they need too much editing to fit!}:

-- now together
select max(id) from test_1
union
select min(id) from test_1;

MAX(ID)
----------
1
99999

Execution Plan
--------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)|
-------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 8 | | 601 (53)| 00:00:08 |
| 1 | SORT UNIQUE | | 2 | 8 | 4753K| 601 (53)| 00:00:08 |
| 2 | UNION-ALL | | | | | | |
| 3 | SORT AGGREGATE | | 1 | 4 | | 301 (5)| 00:00:04 |
| 4 | INDEX FULL SCAN (MIN/MAX)| T_PK | 99999 | 390K| | 2 (0)| 00:00:01 |
| 5 | SORT AGGREGATE | | 1 | 4 | | 301 (5)| 00:00:04 |
| 6 | INDEX FULL SCAN (MIN/MAX)| T_PK | 99999 | 390K| | 2 (0)| 00:00:01 |
-------------------------------------------------------------------------

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
4 consistent gets
0 physical reads
0 redo size
451 bytes sent via SQL*Net to client
381 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
2 rows processed

There are times when a human beats the computer. 4 consistent gets. I’ve played with this a little and on my test system the select min(id),max(id) version takes around 4ms. The union method takes 0-1ms.
I see a couple of oddities.

– Firstly, the plan has 3 sort operations, but the statistics show only 1 sort. I am not fully sure what is going on, it might be that the plan is a little duff, it might be that at execution time Oracle realises it really does have only 1 record to sort and so does not. {I’ll buy the person who tells me the cause of the remaining sort and how to get rid of it a pint at the next UKOUG – if you do so before I blog about it}

– Secondly, if you look at the plan, the CBO estimates a lot of effort for the sorts and thus the total cost of the statement is relatively high, 601. I need to look into that.

I’ve used this trick of splitting “select min,max” into two statements unioned together several times over the years and I find it very useful. Below I go into a few more oddities, but I know I ramble on so feel free to stop now.
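A variation on the same theme, which is not in my original scripts but does the same job, is to push each aggregate into its own scalar subquery, so you avoid the UNION and the sort altogether:

select (select min(id) from test_1) min_id
      ,(select max(id) from test_1) max_id
from dual;

Each scalar subquery gets its own INDEX FULL SCAN (MIN/MAX), so you still see just a handful of consistent gets, and both values come back on a single row.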

In previous postings I have looked at the fastest way to select the number of records in a table. Well, if your tables have a surrogate primary key consisting of an ascending number {generated from a sequence – a very common occurrence} and you know there are few or no skipped values, you can use the above trick to get the MIN(ID) and MAX(ID), and the difference between them is the number of rows. You can do it in one statement, as shown below:

-- diff between min and max in one statement 
select a.result-b.result id_number
from (select max(id)+1 result,1 eric from test_1) a
   ,(select min(id) result,1 eric from test_1) b
where a.eric=b.eric
/

ID_NUMBER
-----------------
            99999


Execution Plan
-------------------------------------------------------
| Id  | Operation                    | Name | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------
|   0 | SELECT STATEMENT             |      |     1 |    32 |     4   (0)|
|   1 |  NESTED LOOPS                |      |     1 |    32 |     4   (0)|
|   2 |   VIEW                       |      |     1 |    16 |     2   (0)|
|   3 |    SORT AGGREGATE            |      |     1 |     4 |            |
|   4 |     INDEX FULL SCAN (MIN/MAX)| T_PK | 99999 |   390K|     2   (0)|
|*  5 |   VIEW                       |      |     1 |    16 |     2   (0)|
|   6 |    SORT AGGREGATE            |      |     1 |     4 |            |
|   7 |     INDEX FULL SCAN (MIN/MAX)| T_PK | 99999 |   390K|     2   (0)|
------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
        422  bytes sent via SQL*Net to client
        381  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

With this version the CBO recognises that the statement cost is low (look, cost 4), the odd sorts are still there in the plan on a single row but the statement takes 4 consistent gets.

I used to think, wrongly, that Oracle would only use the INDEX FULL SCAN (MIN/MAX) for getting the absolute minimum and maximum values. I don’t know why I thought this but check out the next two examples:

select max(id) from test_1
where id>7500;

MAX(ID)
----------
99999

Execution Plan
----------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
----------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 2 (0)|
| 1 | SORT AGGREGATE | | 1 | 4 | |
| 2 | FIRST ROW | | 92500 | 361K| 2 (0)|
|* 3 | INDEX RANGE SCAN (MIN/MAX)| T_PK | 92500 | 361K| 2 (0)|
----------------------------------------------------------

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
2 consistent gets
0 physical reads

As you can see, Oracle knows it can get the Max (and min) value using the INDEX RANGE SCAN MIN/MAX efficiently, even for the second case where I am asking for the max value ABOVE a stated point.
I half expected Oracle to do a range scan from the value I gave it. But it has the sense to know that, logically, the maximum value below X can be found by looking for X and then taking the record before that (or where it would be if it does not exist). The maximum above X is simply the maximum. In retrospect, I was asking a daft question there :-)

I could play with this longer, but enough for now.
