
Update to Decoding High and Low values February 24, 2010

Posted by mwidlake in internals, performance.

After what seems an age, I finally got a little time to enhance my col_stats code with the extra info people provided as comments to
this original posting on it. I also now handle the translation of dates correctly, and timestamps too.

I had a real struggle with timestamps. I knew I could find the answer in Tom Kyte's "Expert Oracle Database Architecture" as one of the comments said so, but I tried to work it out myself as my copy of the book was at my Mother's {no, she is not learning Oracle Databases, I left it there by accident}. As the other elements of timestamps are held as numerical elements stored as hex (day number, month number and so on), I tried to interpret the fractional seconds in the same way. I was being too complex; it is just an 8-digit hex number: to_number(value,'XXXXXXXX').
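
As a quick sanity check of that last point, here is a minimal sketch you can run from SQL*Plus; the hex string is simply the last four bytes of the TS9 low_value that appears in the test output further down this post:

select to_number('3ADE68B1','XXXXXXXX') frac_secs from dual;

 FRAC_SECS
----------
 987654321

That is the .987654321 fractional-seconds element of the timestamp.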

Anyway, this is the new, improved, cleans-whiter-than-white script {I've increased the width of my blog to better show code like the below, but try clicking on "show source" if you want a plain text version}:

-- col_stats
-- Martin Widlake mdw 21/03/2003
-- MDW 11/12/09 enhanced to include more translations of low_value/high_value
-- pilfered from Gary Myers blog
-- MDW 20/02/10 added in the handling of timestamps.
col owner        form a6 word wrap
col table_name   form a15 word wrap
col column_name  form a22 word wrap
col data_type    form a12
col M            form a1
col num_vals     form 99999,999
col dnsty        form 0.9999
col num_nulls    form 99999,999
col low_v        form a30
col low_v2       form a18
col hi_v         form a30
col data_type    form a10
col low_value    form a25
col high_value   form a25
set lines 110
break on owner nodup on table_name nodup
spool col_stats.lst
select --owner
--      ,table_name
      column_name
      ,data_type
      ,decode (nullable,'N','Y','N')  M
      ,num_distinct num_vals
      ,num_nulls
      ,density dnsty
,decode(substr(data_type,1,9) -- as there are several timestamp types
  ,'NUMBER'       ,to_char(utl_raw.cast_to_number(low_value))
  ,'VARCHAR2'     ,to_char(utl_raw.cast_to_varchar2(low_value))
  ,'NVARCHAR2'    ,to_char(utl_raw.cast_to_nvarchar2(low_value))
  ,'BINARY_DO',to_char(utl_raw.cast_to_binary_double(low_value))
  ,'BINARY_FL' ,to_char(utl_raw.cast_to_binary_float(low_value))
  ,'DATE',rtrim(
               to_char(100*(to_number(substr(low_value,1,2),'XX')-100)
                      + (to_number(substr(low_value,3,2),'XX')-100),'fm0000')||'-'||
               to_char(to_number(substr(low_value,5,2),'XX'),'fm00')||'-'||
               to_char(to_number(substr(low_value,7,2),'XX'),'fm00')||' '||
               to_char(to_number(substr(low_value,9,2),'XX')-1,'fm00')||':'||
               to_char(to_number(substr(low_value,11,2),'XX')-1,'fm00')||':'||
               to_char(to_number(substr(low_value,13,2),'XX')-1,'fm00'))
  ,'TIMESTAMP',rtrim(
               to_char(100*(to_number(substr(low_value,1,2),'XX')-100)
                      + (to_number(substr(low_value,3,2),'XX')-100),'fm0000')||'-'||
               to_char(to_number(substr(low_value,5,2),'XX'),'fm00')||'-'||
               to_char(to_number(substr(low_value,7,2),'XX'),'fm00')||' '||
               to_char(to_number(substr(low_value,9,2),'XX')-1,'fm00')||':'||
               to_char(to_number(substr(low_value,11,2),'XX')-1,'fm00')||':'||
               to_char(to_number(substr(low_value,13,2),'XX')-1,'fm00')
              ||'.'||to_number(substr(low_value,15,8),'XXXXXXXX')  )
  ,  low_value
       ) low_v
,decode(substr(data_type,1,9) -- as there are several timestamp types
  ,'NUMBER'       ,to_char(utl_raw.cast_to_number(high_value))
  ,'VARCHAR2'     ,to_char(utl_raw.cast_to_varchar2(high_value))
  ,'NVARCHAR2'    ,to_char(utl_raw.cast_to_nvarchar2(high_value))
  ,'BINARY_DO',to_char(utl_raw.cast_to_binary_double(high_value))
  ,'BINARY_FL' ,to_char(utl_raw.cast_to_binary_float(high_value))
  ,'DATE',rtrim(
               to_char(100*(to_number(substr(high_value,1,2),'XX')-100)
                      + (to_number(substr(high_value,3,2),'XX')-100),'fm0000')||'-'||
               to_char(to_number(substr(high_value,5,2),'XX'),'fm00')||'-'||
               to_char(to_number(substr(high_value,7,2),'XX'),'fm00')||' '||
               to_char(to_number(substr(high_value,9,2),'XX')-1,'fm00')||':'||
               to_char(to_number(substr(high_value,11,2),'XX')-1,'fm00')||':'||
               to_char(to_number(substr(high_value,13,2),'XX')-1,'fm00'))
  ,'TIMESTAMP',rtrim(
               to_char(100*(to_number(substr(high_value,1,2),'XX')-100)
                      + (to_number(substr(high_value,3,2),'XX')-100),'fm0000')||'-'||
               to_char(to_number(substr(high_value,5,2),'XX'),'fm00')||'-'||
               to_char(to_number(substr(high_value,7,2),'XX'),'fm00')||' '||
               to_char(to_number(substr(high_value,9,2),'XX')-1,'fm00')||':'||
               to_char(to_number(substr(high_value,11,2),'XX')-1,'fm00')||':'||
               to_char(to_number(substr(high_value,13,2),'XX')-1,'fm00')
              ||'.'||to_char(to_number(substr(high_value,15,8),'XXXXXXXX')))
  ,  high_value
       ) hi_v
,low_value,high_value
from dba_tab_columns
where owner      like upper('&tab_own')
and   table_name like upper(nvl('&tab_name','WHOOPS')||'%')
ORDER BY owner,table_name,COLUMN_ID
/
clear colu
spool off
clear breaks

Yes, I know, it is a considerable chunk of code just to get the high and low values for columns, as identified by the last stats gather to be run on the table. {It is important to remember that these high and low values will only be as accurate as the stats gather that was run, and when. An "estimate_percent=> 100" run a minute ago will probably give accurate values; an "estimate_percent=>0.1" run last month will not, which is usually what I am checking for}. I could make the script a lot neater by writing a set of functions to call to get the translation, as was suggested as a comment on the original posting and is in fact something I have done in the past, but one is often not allowed to put new stored PL/SQL code onto production systems, so a script which does it is, for me, best.
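
{If I were allowed to, such a function would look something like the below. This is a minimal sketch only and the function name raw_date_to_char is just something I have picked for illustration; it simply wraps the same arithmetic as the DATE branch of the DECODE above:}

create or replace function raw_date_to_char (p_raw in raw) return varchar2 is
-- translate the 7-byte internal DATE format, as held in low_value/high_value,
-- into a readable string. Same arithmetic as the DATE branch of the DECODE above.
  v_hex varchar2(14) := rawtohex(p_raw);
begin
  return to_char( 100*(to_number(substr(v_hex, 1,2),'XX')-100)
                 +    (to_number(substr(v_hex, 3,2),'XX')-100),'fm0000')||'-'||
         to_char(to_number(substr(v_hex, 5,2),'XX')  ,'fm00')||'-'||
         to_char(to_number(substr(v_hex, 7,2),'XX')  ,'fm00')||' '||
         to_char(to_number(substr(v_hex, 9,2),'XX')-1,'fm00')||':'||
         to_char(to_number(substr(v_hex,11,2),'XX')-1,'fm00')||':'||
         to_char(to_number(substr(v_hex,13,2),'XX')-1,'fm00');
end raw_date_to_char;
/
-- the select would then just use e.g.   ,raw_date_to_char(low_value) low_v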

I should show some proof of it working, I guess. The below will create a table with a nice spread of column types and insert three records with a reasonable spread of data:

-- test_col_hilo
-- M Widlake 11/12/09
--
--testing script to show high and low values for various column types
--
set lines 90 pages 1000 trims on pause off
--set sqlpro 'test102>'
set sqlpro '>'
set echo on
set autotrace off
spool test_col_hilo.lst
-- basic setup info
col name form a30
col value form a20
select substr(name,1,30) name,substr(value,1,20) value
from v$parameter
where name in ('db_block_size','compatible','cpu_count','db_file_multiblock_read_count'
              ,'optimizer_mode','sga_target','sort_area_size')
order by name
/
drop table test_hilo purge;
--
--
COL TABLE_NAME FORM A15
COL TABLE_OWNER FORM A10
create table test_hilo
(id          number(8) not null
,numsmall    number(4)
,numlarge    number
,datesmall   date
,datelarge   date
,ts          timestamp
,ts0        timestamp(0)
,ts3	     timestamp(3)
,ts6	     timestamp(6)
,ts9	     timestamp(9)
,vcsmall     varchar2(10)
,vclarge     varchar2(100)
,vcodd       varchar2(20)
,nvcsmall    nvarchar2(10)
,nvclarge    nvarchar2(100)
,bfsmall     binary_float -- NB precision is only 6 or 7 digits
,bflarge     binary_float
,bdsmall     binary_double
,bdlarge     binary_double
)
tablespace users
/
-- insert low values
 insert into test_hilo
(id,numsmall,numlarge
,datesmall,datelarge
,ts,ts0,ts3
,ts6,ts9
,vcsmall,vclarge
,vcodd
,nvcsmall,nvclarge
,bfsmall,bflarge
,bdsmall,bdlarge
)
values
(1
,-10,-123456789.12345
,trunc(sysdate-1000),sysdate-500000
,systimestamp,systimestamp
,to_timestamp('01-FEB-2010 12:34:56.123','DD-MON-YYYY HH24:MI:SS.FF')
,to_timestamp('01-FEB-2010 12:34:56.1234567','DD-MON-YYYY HH24:MI:SS.FF')
,to_timestamp('01-FEB-2010 12:34:56.987654321','DD-MON-YYYY HH24:MI:SS.FF')
,'ABCDE','AABCABCDEABCDEFGABCDEFGHI'
,'Abc!"£$%^&*()deF'
,'ABCDE','AABCABCDEABCDEFGABCDEFGHI'
,-1.23,-12345.6
,12345,1234567890
)
/
-- insert mid values
 insert into test_hilo
(id,numsmall,numlarge
,datesmall,datelarge
,ts,ts0,ts3
,ts6,ts9
,vcsmall,vclarge
,vcodd
,nvcsmall,nvclarge
,bfsmall,bflarge
,bdsmall,bdlarge
)
values
(2
,15,1515151515151
,trunc(sysdate-10),sysdate-5000
,systimestamp,systimestamp
,to_timestamp('05-FEB-2010 12:34:56.123','DD-MON-YYYY HH24:MI:SS.FF')
,to_timestamp('05-FEB-2010 12:34:56.1234567','DD-MON-YYYY HH24:MI:SS.FF')
,to_timestamp('05-FEB-2010 12:34:56.987654321','DD-MON-YYYY HH24:MI:SS.FF')
,'CCCCC','CCBCABCDEABCDEFGABCDEFGHI'
,'CbaaBC'
,'EFGHI','ABCDEFGHIJKLMNOPQRSTUV'
,1.23,12345.6
,54321,5432112345
)
/
--insert high values
 insert into test_hilo
(id,numsmall,numlarge
,datesmall,datelarge
,ts,ts0,ts3
,ts6,ts9
,vcsmall,vclarge
,vcodd
,nvcsmall,nvclarge
,bfsmall,bflarge
,bdsmall,bdlarge
)
values
(3
,99,9898989898989
,trunc(sysdate-1),sysdate+1000
,systimestamp+10,systimestamp+10
,to_timestamp('20-FEB-2010 18:17:16.876','DD-MON-YYYY HH24:MI:SS.FF')
,to_timestamp('21-FEB-2010 17:16:15.555555','DD-MON-YYYY HH24:MI:SS.FF')
,to_timestamp('22-FEB-2010 16:15:14.123456789','DD-MON-YYYY HH24:MI:SS.FF')
,'ZYXWV','ZZZZZYZZXZZWZZVZZUZZTZZSZZRZZQZZP'
,'Z;#[]{}~@:'
,'VWXYZ','ZYXWVUTSRQPONMLKJIHGFEDCBA'
,9.87,98765.4
,987654321,987654321.1234567
)
/
commit;
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=> USER, TABNAME=>'TEST_HILO')
set autotrace off
set echo off
--
spool off
clear colu
--

And the below is the translation of the values. I used a cut-down version of my script, so it would fit on the blog better:

COLUMN_NAME  DATA_TYPE     LOW_V                         HI_V
------------ ------------- ----------------------------- -----------------------------
LOW_VALUE                 HIGH_VALUE
------------------------- -------------------------
ID           NUMBER        1                             3
C102                      C104
NUMSMALL     NUMBER        -10                           99
3E5B66                    C164
NUMLARGE     NUMBER        -123456789.12345              9898989898989
3A644E38220C59433366      C70A5A5A5A5A5A5A
DATESMALL    DATE          2007-05-31 00:00:00           2010-02-23 00:00:00
786B051F010101            786E0217010101
DATELARGE    DATE          0641-03-10 17:36:47           2012-11-20 17:36:47
6A8D030A122530            78700B14122530
TS           TIMESTAMP(6)  2010-02-24 17:36:47.255015000 2010-03-06 17:36:47.255015000
786E02181225300F333858    786E0306122530
TS0          TIMESTAMP(0)  2010-02-24 17:36:47.          2010-03-06 17:36:47.
786E0218122530            786E0306122530
TS3          TIMESTAMP(3)  2010-02-01 12:34:56.123000000 2010-02-20 18:17:16.123000000
786E02010D23390754D4C0    786E02141312113436B300
TS6          TIMESTAMP(6)  2010-02-01 12:34:56.123457000 2010-02-21 17:16:15.123457000
786E02010D2339075BCDE8    786E0215121110211D18B8
TS9          TIMESTAMP(9)  2010-02-01 12:34:56.987654321 2010-02-22 16:15:14.987654321
786E02010D23393ADE68B1    786E021611100F075BCD15
VCSMALL      VARCHAR2      ABCDE                         ZYXWV
4142434445                5A59585756
VCLARGE      VARCHAR2      AABCABCDEABCDEFGABCDEFGHI     ZZZZZYZZXZZWZZVZZUZZTZZSZZRZZ
                                                         QZZ
4141424341424344454142434 5A5A5A5A5A595A5A585A5A575
4454647414243444546474849 A5A565A5A555A5A545A5A535A
                          5A525A5A515A5A
VCODD        VARCHAR2      Abc!"£$%^&*()deF              Z;#[]{}~@:
4162632122C2A324255E262A2 5A3B235B5D7B7D7E403A
829646546
NVCSMALL     NVARCHAR2     ABCDE                         VWXYZ
00410042004300440045      0056005700580059005A
NVCLARGE     NVARCHAR2     AABCABCDEABCDEFG              ZYXWVUTSRQPONMLK
0041004100420043004100420 005A005900580057005600550
0430044004500410042004300 0540053005200510050004F00
44004500460047            4E004D004C004B
BFSMALL      BINARY_FLOAT  3.53999972E+000               -9.86999989E+000
40628F5B                  C11DEB85
BFLARGE      BINARY_FLOAT  3.64494306E-004               -9.87653984E+004
39BF1999                  C7C0E6B3
BDSMALL      BINARY_DOUBLE -1.2345E+004                  -9.87654321E+008
C0C81C8000000000          C1CD6F3458800000
BDLARGE      BINARY_DOUBLE -9.8765432112345672E+008      -5.432112345E+009
C1CD6F34588FCD6E          C1F43C774D900000

I could go on and look at handling intervals and time zones but I leave this to you as an exercise. In other words, I have no real need for those data types right now and can’t justify the time! I hope the above is enough to answer whatever questions you may have about the high and low values of your columns…

Missing SQL in AWR/ASH February 23, 2010

Posted by mwidlake in AWR, performance.

I’ve been working on some demonstrations of AWR/ASH within OEM for a client and I’ve come across a situation where you can be missing the SQL (and thus a lot of the useful information) for code run via DBMS_JOB or DBMS_SCHEDULER.

This is due to bugs 5928612.8 for DBMS_JOB and 5140631.8 for DBMS_SCHEDULER. If you can't access Metalink, then check out the end of this webpage. The SQL_ID/SQL_HASH_VALUE is not populated in V$SESSION for those jobs running under DBMS_JOB or DBMS_SCHEDULER. I am presuming that, because V$SESSION is not updated with these values, they are not available to V$ACTIVE_SESSION_HISTORY either.

I did check that the SQL_ID was null in V$ACTIVE_SESSION_HISTORY for the records that connected to the USER_ID I was using, except for the handful of entries for when I was testing my code manually before running it via DBMS_JOB. The SQL_ID was captured fine for these interactive sessions (and the code executed was identical).
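
{For anyone wanting to do a similar check, the query was roughly the below; a sketch only, with the account the jobs run under prompted for rather than hard-coded:}

select sql_id, count(*)
from   v$active_session_history
where  user_id = (select user_id from dba_users
                  where  username = upper('&job_user'))
group by sql_id
order by count(*) desc
/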

The bugs appear in 10.1.0.2 and both are fixed in 10.2.0.4 and 11; both have patches available. I am using 10.2.0.3.

As soon as I mentioned my woes about not being able to see the SQL for code I was running via DBMS_JOB, I knew of the existence of the bugs, as colleagues mentioned being aware of them; the patches have already been applied to a couple of our other systems. However, I could find no trace of them on Metalink or Google. These bugs are listed as against V$SESSION, not AWR or ASH, which is why I could not find them via searches for "missing sql awr" or "missing sql ash" etc. So hopefully this posting will allow the connection to be made.

I find it very vexing when you know there is a fix for an Oracle problem but you can't find it in Metalink (which seems even more true with the new, all-singing all-dancing all-fancy-graphics little help Metalink). Sometimes it is because the bug number is not being made publicly available and in other cases, such as this one, it is because no one has made the connection between the initial problem and other things it could impact. I guess it is asking a lot of Oracle Corp to do this for us and it would be unreasonable of us to hold it against them, but they are supposed to be a blooming data, information and knowledge expertise company! Tsch.

As an example of what I see in AWR with my workload, generated via DBMS_JOB {and I’m sorry it is so small, you should be able to click on it and get a full-size version}:

Notice that the top five jobs are via user MDW_2. I know that none of the top SQL on the left is from my workload (you will have to just take my word for it, but it is, in fact, all the "OEM/AWR" code :-) ). Also, the top code shown is heavy on green for CPU, but you can see that most of the workload in the Top Activity and the Top Sessions is blue, for User IO.

If I pick out the top session, 545, and click on it I get the below screen:

Note the entry for SQL_ID is blank and SQL Command is "UNKNOWN". I can see what it is doing (db file scattered reads and some CPU) but not what on or why. :-(

Command Line or GUI – Which is Best? February 18, 2010

Posted by mwidlake in performance.

At present I am suffering ever so slightly from “split personality disorder”* in respect of my liking for Command Line and GUI interfaces.

On the one hand, much to my colleagues' mild reproach, I use SQL*Plus and not PL/SQL Developer for my day-to-day work. Even worse, I got sick of using Notepad to hack around scripts {I am working in a windows client environment and you simply can't use MS Word with SQL files!} so I have retrograded recently and downloaded a windows-compliant 'vi' interface, and it is soooooo nice to be able to use powerful 'vi' commands on my files once more. ":.,$s/^ /,/". Ahhh, it is so much easier. I can do stuff in 3 seconds in 'vi' that would take me 10 minutes in Notepad in a large, complex file. That and, I'm sorry, but Notepad seems to be unable to manage a 100MB file, despite me having 2GB of real memory and a decent machine, but 'vi' has no problem with it at all.
Even more retrograde, I have direct telnet access to my linux servers and I am getting back to that as it makes me so much more productive. “ls -alrt ^d” for all directories anyone? “df -k .” to see how many data files I can add? Yep, it’s all arcane and means so little to many modern IT “Java/Struts/CDE” people but boy it is direct and fast. I might even dig out that book on SED and AWK.

On the other hand, I have finally (after many very painful discussions back and forth) got agreement that my site probably has access to AWR, ASH and all that good performance repository stuff. So I am hacking around with the OEM screens that focus on performance and snapshots and stuff. Now, I am traditionally not a big fan of GUI DBA tools. Partly it is because I am a bit old and stuck in my ways, and partly it is because GUIs are really just "menus of options". You are limited to what options are available in your DBA GUI tool and you have a harder time learning all the options available or what is really going on "under the covers".

But with AWR and all those graphs, links and utilities, you can drill down into real problems real time or in the past so effectively that, well, once they start using this tool properly they will not need me at all. It is a fantastic boon to performance management and problem resolution, as well as proactive performance management.

So there you are, I am with Doug Burns on this one, in that I have Learned to Love Pictures, at least when the Pictures are well thought out, well connected and simple enough to help make sense of a complex system {and Oh Boy, Oracle performance has become sooo Complex!!!!}.

So right now, I spend half my day in vi/linux/command line world and half of it in pretty picture/GUI world. I think what really makes me happy is to leave behind the half-way-house of text-like Windows World {Windows SQL*Plus, Notepad}.

Just to finish, you can’t mention AWR without someone raising the ugly issue of licence cost and how Evil Oracle Corp were to charge for it. Well, I think it has been established that the guys and gals who developed AWR/ASH did not expect it to become a cost option but it did. And I suspect that what kept it a cost option was the community’s OutRage at it being a cost option. Anything that popular, hey, a commercial company is going to charge for. I still reckon Oracle Corp ballsed up as making it free and helping people use it a bit would have made 90% of customers’ lives easier and would have translated into user happiness and a certain % of sales for training courses to learn more, but heck my day job is to make things work, not maintain sales percentages, so my opinion counts for nowt. *sigh*

(*apologies to real sufferers of Dissociative Identity Disorder, I am using the term in the loose, non-scientific, "common usage" sense of "not sure of my opinion" rather than having truly disparate personalities and memories.** And note, I certainly do not mean schizophrenia which, despite the on-going public-opinion misunderstanding, is rarely anything to do with multiple personality disorders or "split minds" AT ALL, and is more to do with a difficulty in determining between reality and hallucination.)

Richard Foote on the Impact of stats staying the same. February 16, 2010

Posted by mwidlake in performance.

I just wanted to highlight this very good posting by Richard Foote. He is talking about how SQL execution plans can change when "nothing else does". Not the table structures, not the code, not the initialisation parameters, not even the table and index stats.

But something does change, which is what day it is (or what hour it is, or what week it is). Time moves on and our data does too. If the statistics on the tables do NOT move on, then the CBO thinks that the range of data in the tables does not change. So the CBO thinks your queries are getting further and further "out of range" and expects to find less and less data in the tables to bring back. That will lead to plan changes.
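
{A purely hypothetical sketch of the sort of thing Richard demonstrates; the table and column names are made up for illustration only:}

-- stats were last gathered when the most recent TRADE_DATE in the table was the 1st
select count(*) from trades
where  trade_date = trunc(sysdate);
-- run on the 2nd, the predicate is just beyond the column's known high_value;
-- run on the 15th, it is a long way beyond it. The CBO scales its cardinality
-- estimate down as that gap grows and, at some point, the plan changes,
-- even though nothing in the database definition or the stats has been altered.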

If you have noticed my preoccupation with identifying the contents of histograms and high/low values in column stats, you may appreciate that this topic is one I have been finding is a large part of my day job.

Richard explains the point very well, as always, so go have a look.

Stats Need Stats to Gather Stats February 16, 2010

Posted by mwidlake in performance.

Did you know that you sometimes need good stats so that the stats-gathering package can gather stats in an efficient way? This is a recent, quite extreme example.

I’ve been forced to learn a lot about gathering Oracle stats using DBMS_STATS over the last 4 or 5 years. But no matter how much I learn about the “challengingly bizarre” way in which it works, it seems every week or two there is a new oddity. I plan a whole series on the topic “soon”.

This particular example is from a 10.2.0.3 system.

I am gathering partition-only table stats as we are using the ability of Oracle to roll up Partition stats to Global stats under certain specific conditions. One of the conditions is that you need stats for every partition. Plus, to get global column stats, each partition must have stats for each column. Some of our partitions lacked or had very bad stats.
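
{The sort of query I use to spot the partitions that need attention is the below; a sketch against DBA_TAB_PARTITIONS, with ERIC being the schema that appears in the examples later in this post:}

select table_name, partition_name, num_rows, last_analyzed
from   dba_tab_partitions
where  table_owner = 'ERIC'
and    (last_analyzed is null or num_rows is null)
order by table_name, partition_position
/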

So I quickly knocked up a script-generating script to create DBMS_STATS.GATHER_TABLE_STATS statements that collected, for those partitions:

  • ONLY partition stats.
  • NO cascade down to indexes
  • BLOCK sampling so it is fast {and poor, but there you go}
  • ESTIMATE_PERCENT of 2, which is quite low for block sampling
  • collect histograms as we decided the devil of having them was better than the devil of not having them.

The above is not ideal for getting "good stats", but it is quick and gets OK stats, which is what we need right now. An example statement is:

begin                                                                                              
-- part SD_INFO-DY07032004 rows 34554                                                              
dbms_stats.gather_table_stats(ownname => 'ERIC',tabname=> 'SD_INFO'
,partname=> 'DY07032004'
,granularity=> 'PARTITION', estimate_percent => 2,block_sample=> true ,cascade=> FALSE
,method_opt=> 'FOR ALL COLUMNS SIZE AUTO' ,no_invalidate=> false);
END;
/

Not a big partition (34554 rows, and not very wide rows), so it did not take long to gather:

PARTITION_NAME                 LAST_ANALYZED
------------------------------ --------------------
DY04012004                     12-FEB-2010 16:26:57
DY05012004                     12-FEB-2010 16:27:00
DY06012004                     12-FEB-2010 16:27:04
DY07012004                     12-FEB-2010 16:27:07
DY08012004                     12-FEB-2010 16:27:11 -- 4 seconds difference
DY09012004                     12-FEB-2010 16:27:15

I've collected statistics for a few thousand partitions over the last couple of days and the time taken is anything from just under half a second to 10 seconds per partition, the odd unusually large partition taking a minute or so. {I believe it takes half a second to gather stats on an empty partition, on our system at least, due to the time it takes for the internal housekeeping, including copying the old statistics information to the SYS.WRI$_OPSTAT_… tables to support restoring stats}. Sorry, I drift away from my main point.

This partition took a lot longer than 10 seconds:

begin
-- part W_ACTIVITY_FAILURE-DY02092008 rows 49425
dbms_stats.gather_table_stats(ownname => 'ERIC',tabname=> 'W_ACTIVITY_FAILURE'
,partname=> 'DY02092008'
,granularity=> 'PARTITION', estimate_percent => 2,block_sample=> true ,cascade=> FALSE
,method_opt=> 'FOR ALL COLUMNS SIZE AUTO' ,no_invalidate=> false);
END;
/

After 10 minutes it was still running. WHY? I quickly checked the number of rows in the partition and then the size of the partition segment, in case either was much larger than I expected. Neither was.
select count(*) from eric.W_ACTIVITY_FAILURE partition (DY07092008);

  COUNT(*)
----------
     42182

And from dba_segments:

     BYTES     BLOCKS
---------- ----------
   2621440        320

There is one advantage of a DBMS_STATS statement running for a long time: you can grab from the SGA the actual code being executed on its behalf. This is what I saw:

select substrb(dump(val,16,0,32),1,120) ep, cnt 
from (select /*+ no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact
use_weak_name_resl dynamic_sampling(0) no_monitoring */
"ACCOUNT_ID" val,count(*) cnt 
from "ERIC"."W_ACTIVITY_FAILURE" t 
where TBL$OR$IDX$PART$NUM("ERIC"."W_ACTIVITY_FAILURE",0,4,0,"ROWID") = :objn
and "ACCOUNT_ID" is not null 
group by "ACCOUNT_ID") 
order by val
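
{In case it is useful, this is roughly how I pull the in-flight SQL for a session out of the SGA; a sketch, prompting for the SID of the session running the DBMS_STATS call:}

select sq.sql_fulltext
from   v$session se
      ,v$sql     sq
where  se.sql_id           = sq.sql_id
and    se.sql_child_number = sq.child_number
and    se.sid              = &stats_session_sid
/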

DBMS_STATS is scanning the ACCOUNT_ID column on this table and it is taking a long time about it. The only index on the table is on ACCOUNT_ID. It then struck us.

The index is a global index:

INDEX_NAME                     PARTITIONED
------------------------------ -----------
API_XXXXXXXXXX_ACCOUNTS_IND    NO

And that global index actually lacked stats {another feature of DBMS_STATS and rolling up partition stats had led to that}.

INDEX_NAME      TYP UNQ   BL     L_BLKS  DIST_KEYS     CLUSTF     LB_KEY     DB_KEY LST_ANL
--------------- --- --- ---- ---------- ---------- ---------- ---------- ---------- --------
API_XXXXXXX_    NOR NON
ACCOUNTS_IND

By this time 3 partitions for this table had been processed by my code, taking around 15 minutes each. Incredibly slow.

I did a very quick 0.01% sample size DBMS_STATS.GATHER_INDEX_STATS on that index which took about 1 minute. As soon as the partition DBMS_STATS.GATHER_TABLE_STATS statement that was in flight finished, the following similar statements on that table’s partitions took under 3 seconds each. I’ll buy a pint for the first person to guess (within 10 minutes) WHEN I collected the global index stats {You can collect from any pub in central London if you give me a few day’s notice :-) }.
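
{For the record, the index gather was essentially the below; I am reconstructing the exact parameters from memory, the key bit being the 0.01 percent sample:}

begin
  dbms_stats.gather_index_stats(ownname          => 'ERIC'
                               ,indname          => 'API_XXXXXXXXXX_ACCOUNTS_IND'
                               ,estimate_percent => 0.01);
end;
/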

PARTITION_NAME   NUM_ROWS     BLOCKS LAST_ANALYZED
-------------- ---------- ---------- ---------------------
DY01092008          31461        452 16-FEB-2010 09:51:41
DY02092008          49425        686 16-FEB-2010 10:03:44
DY03092008          54472        719 16-FEB-2010 10:16:31
DY04092008          35762        491 16-FEB-2010 10:30:52
DY05092008          42182        587 16-FEB-2010 10:44:24
DY06092008          21186        312 16-FEB-2010 10:56:13
DY07092008          20898        313 16-FEB-2010 11:12:59
DY08092008         469500       1233 16-FEB-2010 11:13:02
DY09092008         480300        741 16-FEB-2010 11:13:04
DY10092008          15724        223 16-FEB-2010 11:31:01
DY11092008          55671        732 16-FEB-2010 11:31:06
DY12092008         820100       1779 16-FEB-2010 11:31:08
DY13092008          80215       1113 16-FEB-2010 11:31:16
DY14092008          10094        155 16-FEB-2010 11:31:18
DY15092008          10268        158 16-FEB-2010 11:31:20
DY16092008          24578        330 16-FEB-2010 11:31:22
DY17092008          21012        290 16-FEB-2010 11:31:24
DY18092008          22755        318 16-FEB-2010 11:31:27
DY19092008          21276        293 16-FEB-2010 11:31:29
DY20092008          20882        281 16-FEB-2010 11:31:31
DY21092008          24131        323 16-FEB-2010 11:31:34

I will investigate this oddity further if I get time this week, but the lesson to myself so far is:

Global indexes lacking stats can result in very long stats gathering times for partitions EVEN WHEN YOU DO NOT CASCADE DOWN TO INDEXES.

Turning on SQL Audit February 2, 2010

Posted by mwidlake in development, performance, Testing.

<Previous post…

I want to test out using the SQL AUDIT functionality.

The first thing you have to do (having read up on it a bit) is to enable it for the database. You do this by setting the initialization parameter AUDIT_TRAIL. On version 10.2 you have the option of writing the audit entries to:

  •  The DB, by setting it to DB or DB_EXTENDED {not DB,EXTENDED as the manual says, that is the old format}. This puts the entries into the table  SYS.AUD$
  • The operating system, by setting it to OS, XML or XML_EXTENDED. If you set it to XML or XML_EXTENDED then the data is written out in XML format. You also optionally set AUDIT_FILE_DEST to specify where you want the data to be written.

Writing the audit trail to the OS is potentially more secure, as you can stop those cunning and devious DBAs messing with the audit trail. {I’m not so sure that this really helps that much – if anyone knows of any DBAs caught out being naughty solely as a result of using the OS to store the SQL AUDIT records, it would be a fascinating comment}

I want to write to the DB as I want to be able to get at the audit data easily and I am not sure how I want to interrogate it. I’m faster with SQL than SED and AWK.

I also decided up front I wanted to use DB_EXTENDED so that the triggering SQL statement and all bind variables are caught, so I can see more about what is triggering the audit record. I am cautious of the impact of storing CLOBs though, which these two values are stored as. I've had performance issues moving lots of CLOBs around and I know from some old colleagues that Secure Files are a lot faster. If Secure Files are faster, that means CLOBs are slower :-). If the audit trail seems to add too much burden on my system, swapping back to just DB will be my first step.

Now for the bad news. You can’t just turn on AUDIT. That initialization parameter is not dynamic. You can’t even enable it for your session. It will need a restart of your database.
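
{So the change has to go into the spfile and wait for the bounce. On my test instance it was simply the below; if you are still on a pfile you edit the init.ora instead:}

alter system set audit_trail=DB_EXTENDED scope=spfile;
-- and then, at a suitable moment...
shutdown immediate
startup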

This tells me something. Oracle needs to do some setting up for SQL AUDIT when it starts the instance. Either start a new process, enable functionality in one of its regular processes or set up structures in memory to cope. Or a mixture thereof. I strongly suspect the need for memory structures {but this is only because, in reality, I have done some testing and I am writing this up afterwards}.

I should not really need to say this but DON’T go turning this on for a production system without extensive testing somewhere else first. There is not a lot “Out There” about the details of the performance impact of AUDIT but the general opinion is there is some; and that is reasonable given it is going to write database records for every action audited. Also, you have no idea yet of any knock-on effects. You know, things you did not expect that causes your database to lock or crash and you to get fired.

{Question, what happens if you alter the initialization file and restart only one node of a RAC database? I don’t know and so I should test that. My current test system is not RAC, but the final destination for this stuff is RAC}.

You probably also want to check that no one has gone and tried turning on SQL AUDIT on things already. You never know if someone else decided to have a play with this and issued a load of AUDIT statements only to find nothing happened – and left what they did in place “as nothing happened”. I already know of one example of this happening…

Here is a little script I knocked up to see what is currently set to be audited:

-- what_is_audited.sql
-- Martin Widlake 11/01/10
-- simple listing of what auditing is currently set
set pages 100
set pause on
spool what_is_audited.lst
select * from dba_priv_audit_opts
order by user_name,privilege
/
select * from sys.dba_stmt_audit_opts
order by user_name,audit_option
/
select * from DBA_OBJ_AUDIT_OPTS
order by owner,object_name
/
spool off
clear col
--
-- EOF
--

And some sample output. I’m not going to explain it in this post, but you can have a look though it.

DEV3> @what_is_audited
USER_NAME                      PROXY_NAME
------------------------------ ------------------------------
PRIVILEGE                                SUCCESS    FAILURE
---------------------------------------- ---------- ----------
MDW1
ALTER SYSTEM                             BY ACCESS  BY ACCESS
MDW1
AUDIT SYSTEM                             BY ACCESS  BY ACCESS
MDW1
CREATE SESSION                           BY ACCESS  BY ACCESS

ALTER SYSTEM                             BY ACCESS  BY ACCESS

AUDIT SYSTEM                             BY ACCESS  BY ACCESS

CREATE SESSION                           BY ACCESS  BY ACCESS

6 rows selected.

USER_NAME                      PROXY_NAME
------------------------------ ------------------------------
AUDIT_OPTION                             SUCCESS    FAILURE
---------------------------------------- ---------- ----------
MDW1
ALTER SYSTEM                             BY ACCESS  BY ACCESS
MDW1
CLUSTER                                  BY ACCESS  BY ACCESS
MDW1
CONTEXT                                  BY ACCESS  BY ACCESS
MDW1
CREATE SESSION                           BY ACCESS  BY ACCESS
MDW1
DATABASE LINK                            BY ACCESS  BY ACCESS
MDW1
DELETE TABLE                             BY ACCESS  BY ACCESS
MDW1
...
TYPE                                     BY ACCESS  BY ACCESS
MDW1
UPDATE TABLE                             BY ACCESS  BY ACCESS
MDW1
USER                                     BY ACCESS  BY ACCESS
MDW1
VIEW                                     BY ACCESS  BY ACCESS

ALTER SYSTEM                             BY ACCESS  BY ACCESS

CLUSTER                                  BY ACCESS  BY ACCESS

CONTEXT                                  BY ACCESS  BY ACCESS

CREATE SESSION                           BY ACCESS  BY ACCESS

DATABASE LINK                            BY ACCESS  BY ACCESS

DIMENSION                                BY ACCESS  BY ACCESS

DIRECTORY                                BY ACCESS  BY ACCESS

INDEX                                    BY ACCESS  BY ACCESS

MATERIALIZED VIEW                        BY ACCESS  BY ACCESS
...
USER                                     BY ACCESS  BY ACCESS

VIEW                                     BY ACCESS  BY ACCESS

56 rows selected.

OWNER                          OBJECT_NAME                    OBJECT_TYPE
------------------------------ ------------------------------ --------------
ALT     AUD     COM     DEL     GRA     IND     INS     LOC     REN     SEL
------- ------- ------- ------- ------- ------- ------- ------- ------- ----
UPD     REF EXE     CRE     REA     WRI     FBK
------- --- ------- ------- ------- ------- -------
MWPERF                         FORN_M_SEQ                     SEQUENCE
-/-     -/-     -/-     -/-     -/-     -/-     -/-     -/-     -/-     A/A
-/-     -/- -/-     -/-     -/-     -/-     -/-
MWPERF                         PERSON                         TABLE
A/A     A/A     A/A     A/A     A/A     A/A     A/A     A/A     A/A     A/A
A/A     -/- -/-     -/-     -/-     -/-     A/A
MWPERF                         ROAD_TYPE                      TABLE
-/-     -/-     -/-     A/A     -/-     -/-     A/A     -/-     -/-     A/A
A/A     -/- -/-     -/-     -/-     -/-     -/-

If you discover you have a lot of things set to be audited, ESPECIALLY if they are auditing select access, think about turning some or all of it off before you enable AUDITING by setting that initialization parameter.
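
{Turning things off is just the NOAUDIT statement. A couple of sketched examples, based on the output above:}

-- object-level audit options
noaudit select on mwperf.person;
-- statement audit options for one user
noaudit all by mdw1;
-- privilege audit options for one user
noaudit all privileges by mdw1;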

Once you have turned on the feature, you can start testing it…

Testing Methodology – How to Test #2 January 26, 2010

Posted by mwidlake in performance, Testing.

<previous post …

In my last post (see link above) I said something about how I like timing how long things take {usually SQL queries, but you can apply this to more complex situations}, as it is simple. But I highlighted that it is an approach that misses a lot of information and can also catch you out if you do not appreciate that the first run of any test script is likely to be slow (due to parsing and data caching overheads, which are reduced for subsequent iterations).

I want to continue on that theme now, of simple testing with just how long things take. And how to cope with variation.

Hands up all of you who have Oracle on a PC at home? Great, you can test what you want on a dedicated machine. No one else is running anything on your machine, no one else is using your storage, no one else is using your network (if you have a network at home) and no one is running the SQL statement from hell on your database.

BUT. You might have some download of the latest security patch set running in the background, your virus scanner may kick in (or you might just have Norton full-security-everything-running-on-your-cpu-all-the-time turned on). Your miles per gallon may vary, especially on Windows but also on Linux.

For many of us we are testing on a server at work, where developers are working, where the storage is shared, where the network is dealing with that lady behind you watching “Celebrity Strictly come roller-skate dancing impersonations” {well, she sat behind me for 4 months and she was in the Quality team, what does that tell you}. Bottom line, the performance of your server will vary, your testing needs to cope with that.

How do you cope with a varying workload on the system you are running tests on? Well, you run the new and old version of your test several times, in succession, interleaved. This is ideal for when you are in the very un-ideal situation of having to test on a Live system. Testing on "Live" is wrong, in the same way that spelling "initialisation" with a Z is wrong, but it is often forced upon us.

As an example, I want to check into the questionable suggestion someone made that selecting count(*) from dba_tables where owner=user is faster than selecting count(*) from user_tables.

So, I knock up a quick sql*plus script that runs the two alternatives in quick succession, interleaved.

--testa.sql
set timi on pause off
spool testa.lst
--
prompt run 1
select count(*) from user_tables;
select count(*) from dba_tables where owner=user;
--
prompt run 2
select count(*) from user_tables;
select count(*) from dba_tables where owner=user;
--
prompt run 3
select count(*) from user_tables;
select count(*) from dba_tables where owner=user;
--
prompt run 4
select count(*) from user_tables;
select count(*) from dba_tables where owner=user;
--
prompt run 5
select count(*) from user_tables;
select count(*) from dba_tables where owner=user;
--
spool off
-- eof

Just a couple of things to mention. I always spool out to a file called the same thing as the script, with .lst at the end (which is the default file extension for spool commands, but I like to state it), and I like to use "prompt" to record in the output what is going on. Here it is obvious; sometimes it is not. Oh, and I like to put "-- eof" at the end of my files, so I can spot cut-and-paste errors. I know, I am very "command line" centric. I love pictures too, Doug. Finally, I should not run the two versions with the same one first each time; I should swap them over after every two runs (not each run, as then you are running A,A,B,B. A more complex A,B,A,B,B,A,B,A is better). But the results look messy, so I didn't.

Here is the output (trimmed a little, for the sake of brevity):-

run 1

  COUNT(*)                                                                      
----------                                                                      
        35                                                                      
1 row selected.
Elapsed: 00:00:00.58

  COUNT(*)                                                                      
----------                                                                      
        35                                                                      
1 row selected.
Elapsed: 00:00:00.73

run 2

  COUNT(*)                                                                      
----------                                                                      
        35                                                                      
1 row selected.
Elapsed: 00:00:00.28

  COUNT(*)                                                                      
----------                                                                      
        35                                                                      
1 row selected.
Elapsed: 00:00:00.10

run 3

  COUNT(*)                                                                      
----------                                                                      
        35                                                                      
1 row selected.
Elapsed: 00:00:00.28

  COUNT(*)                                                                      
----------                                                                      
        35                                                                      
1 row selected.
Elapsed: 00:00:00.10

run 4

  COUNT(*)                                                                      
----------                                                                      
        35                                                                      
1 row selected.
Elapsed: 00:00:00.28

  COUNT(*)                                                                      
----------                                                                      
        35                                                                      
1 row selected.
Elapsed: 00:00:00.10

run 5

  COUNT(*)                                                                      
----------                                                                      
        35                                                                      
1 row selected.
Elapsed: 00:00:00.29

  COUNT(*)                                                                      
----------                                                                      
        35                                                                      
1 row selected.
Elapsed: 00:00:00.09

At this point I might pull the results into Excel, as it is so good at drawing pretty graphs, but I think we can interpret the results “by eye” in this case.

Run    User Tabs   DBA_Tabs
1        0.58          0.73
2        0.28          0.10
3        0.28          0.10
4        0.28          0.10
5        0.29          0.09

AVG      0.28          0.10

My average ignores the first run for each version of the code, as previously explained {parse overhead/warming of caches}.

You can see that version B is considerably faster than A. In this case, there is no overlap between the two result sets, by which I mean every run of version B is faster than every run of version A. Also, there is little variation in the performance of each version.

What if you got a result set like:-

Run    User Tabs   DBA_Tabs
1        0.77          0.68
2        0.35          0.25
3        0.42          0.34
4        0.48          0.38
5        0.48          0.30
6        0.41          0.29
7        0.40          0.28

AVG      0.42          0.31

If you look at the figures, both A and B slow down and then start to speed up. Something happened to slow down the system in the middle of the test and it was recovering but not back to “normal” by the end. If we had done 7 runs of A followed by 7 runs of B, B would have been hard hit by the temporary system slow-down {I opened a WORD document :-) }.

Also, there is some overlap in the results for A and B. The slowest result for B, 0.38, is slower than the fastest result for A, 0.35. When the spread of times for test A and test B overlap like this, you need to do a few more tests to ensure the results are consistent. I did 7 but I should really have done 9 or maybe 15.
In any case, B is consistently 25% faster than A. It is Faster.

How about this result set:

Run    User Tabs   DBA_Tabs
1        0.87          0.71
2        0.41          0.42
3        0.47          0.43
4        0.37          0.42
5        0.39          0.34
6        0.51          0.38
7        0.42          0.40

AVG      0.43          0.40

The variation in the result sets is higher and in fact if you only looked at results for paired runs 2, 3 and 4 you would come to the conclusion that A was very slightly faster than B. In fact, B is slightly faster than A.

Or is it? There are only 6 tests {remember, ignore run 1}, the variation within the sets is high (.37 to .51 in A, .34 to .42 in B) and the overall difference is low. You could run a statistical analysis on this, a Student's t-test I think, to see if the difference was significant. But unless I was looking at something utterly business critical at this point where the smallest fraction of improvement was important, I would not bother. I would be far better off looking for solution C or doing something else completely. If you ever find yourself doing scientific statistical analysis to decide which of two options is faster, it is probably time to go home and consider what else you could be doing to help your employer more…

Enough for tonight, and HOPEFULLY the wordpress pixies will not eat the end of this post (yesterday's effort was better than tonight's, I think…)

Testing Methodology – How To Test January 26, 2010

Posted by mwidlake in performance, Testing.

<Previous Post…

On those rare but pleasant occasions when I find myself running a training course on performance, something I always want to press home is “How to Test”.

I believe you can learn everything in every manual on Oracle technology and internals and still be very poor at performance tuning. Similarly, I think you can be lacking an awful lot of knowledge and still be good at performance tuning. It comes down to how you test. Not just how to test, but how you as an individual design and run your tests.

I joke with whoever I am talking to that what you need most of all to test is a watch and someone calling out "start" and "stop". i.e. you need to be able to time things. It is a bit of a throw-away statement, but actually most of us will do exactly this on a regular basis. We will be working on something, run it and it will come back in a few seconds. Then we will tweak it, run it again and mentally {or in my case, just very quietly under my breath} count. We even discuss it with people like that: "How long did it take that time?" "Ohh, a couple of seconds faster than last time".

I like tuning by timing very, very, very much.

Firstly, it is simple: if a SQL query runs faster, it runs faster. If you tune by looking at the explain plan and you see a full table scan being replaced with a nested loop and an index look-up, is it faster? It depends. If you tune by looking at the buffer gets and other statistics from the SGA (or wherever), and the "buffer gets" go down, is it faster? Again, it depends. If the disk reads went up, even by a relatively small amount, maybe not. If the memory usage went through the roof because of a sort that spills down to disc, well, quite probably not. Obviously, if all of buffer gets, disk reads and CPU usage went down, you can be pretty certain you are having a positive impact in speeding up the statement. But timing the statement from start to finish gives you a nice, simple answer on whether it runs faster.

Secondly, it is simple. You do not have to look at plans, at runtime statistics, at deltas of session statistics, at trace files. All those things are good but you need more knowledge and experience to use them and time to set up, collect and study them. This is not to say you do not get a lot more out of tuning if you understand all of the stuff about trace files, explain plans etc, I hasten to add – but you do not need that to get going with performance tuning.

Thirdly, it is simple. If you tell your manager's manager that you use 47% fewer buffer gets, 12% fewer disk reads but 9% more CPU, they will ask you what all that means. If you tell them that version A runs in 2 minutes and 9 seconds and version B in 34 seconds, they will understand. Better still, graph it and give them a PowerPoint…

Fourthly, it is simple. You can see that statement (a) is faster or slower than statement (b). Now you can look at the plan, at the statistics for the statement, at what wait events are occurring and start tying up book knowledge with real-world results.

Of course, timing with a watch is very crude. You have a very capable piece of computing power underlying that database of yours, so let’s get it to do the timing.

As a simple start, in SQL*Plus use “set timing on” (it can be abbreviated to “set timi on”). Most GUI SQL tools will either default to telling you the elapsed time or have an obvious option to switch on the functionality.

{Oh, I'll just mention that if you cannot get timing in SQL*Plus or wherever to work, you might want to check what the initialisation parameter "TIMED_STATISTICS" is set to. Some ancient memory is telling me that if it is set to FALSE, you may not be able to use TIMING in SQL*Plus, but another memory is telling me that stopped being true a while back, version 8 maybe. I tried setting TIMED_STATISTICS to false in my session on 10.2 but TIMING still worked, but then I left STATISTICS_LEVEL alone and that causes TIMED_STATISTICS to be set. It is so long ago that I worked on a system that had TIMED_STATISTICS set to false! Even a quick google did not throw up an immediate answer}.
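
{A quick way to check what you currently have is the below, a sketch:}

select name, value
from   v$parameter
where  name in ('timed_statistics','statistics_level')
/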

DB10.2> select count(*) from sn
  2  /
any key> 

  COUNT(*)
----------
       116

1 row selected.

DB10.2> set timi on
DB10.2> select count(*) from sn
  2  /
any key> 

  COUNT(*)
----------
       116

1 row selected.

Elapsed: 00:00:01.09

There you go, the count of SN records, all 116, took 1.09 seconds.

Yes, I had "set pause on" and the timing includes how long it takes me to spot that the query has finished and press a key. We all do it. However, sometimes the need for user input is still missed {something I see quite often is, for example, not pressing the button in PL/SQL Developer to get ALL the rows for the SQL statement, rather than just the first screenful}. A key thing is to try and remove from testing all and any waiting for user response, as we humans are slow and erratic.

So SET PAUSE OFF.

Now you have TIMING on and pause off. Time to run something and see how long it takes, then try and speed it up and run again:

DB10.2> select count(*) from person where to_char(dob,'YYYY') = 1988
  2  /

  COUNT(*)
----------
        29
Elapsed: 00:00:01.15

DB10.2> create index per_dob on person(dob);

Index created.

Elapsed: 00:00:01.31
DB10.2> select count(*) from person where to_char(dob,'YYYY') = 1988;

  COUNT(*)
----------
        29
Elapsed: 00:00:00.14

There you go, I added an index and the query went down from 1.15 seconds to 0.14, that is 8 times faster. Timing is timing.
Well, no, and this is something you need to be careful of if you are new to tuning.

The second iteration is nearly always faster.

Why? Well, the first time you run a piece of SQL it has to be parsed for a start, and that takes some time. More importantly, the first time you select the data, it is going to be on disk. Oracle has read it in and put it into memory. The second time you query the same data, it will be found in memory. That {nearly always} makes the next access to the data a lot faster. In this case the index was not being used anyway, as I had a function on the column in the WHERE clause and this stops the index from being used.
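
{As an aside, had I wanted the index to be usable here, the predicate needs to be on the bare column; a sketch of the kind of rewrite I mean, not something I ran for the timings above:}

select count(*) from person
where  dob >= to_date('01-JAN-1988','DD-MON-YYYY')
and    dob <  to_date('01-JAN-1989','DD-MON-YYYY');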

So having said I love testing by timing, you need to be cautious about one-off tests. Oh, and here below is proof that the index I created is making no real difference to the speed of the SQL query:

DB10.2> set autotrace on
DB10.2> select count(*) from person where to_char(dob,'YYYY') = 1988;

  COUNT(*)
----------
        29
Elapsed: 00:00:00.12

Execution Plan
-----------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost 
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |        |     1 |     9 |   335
|   1 |  SORT AGGREGATE    |        |     1 |     9 |      
|*  2 |   TABLE ACCESS FULL| PERSON |    36 |   324 |   335
-----------------------------------------------------------

Statistics
----------------------------------------------------------
         20  recursive calls
         19  db block gets
        727  consistent gets
          0  physical reads
       1992  redo size

DB10.2> drop index per_dob
  2  /

Index dropped.

Elapsed: 00:00:00.03
DB10.2> select count(*) from person where to_char(dob,'YYYY') = 1988
  2  /

  COUNT(*)
----------
        29
Elapsed: 00:00:00.15

Execution Plan
----------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost
----------------------------------------------------------
|   0 | SELECT STATEMENT   |        |     1 |     9 |  335
|   1 |  SORT AGGREGATE    |        |     1 |     9 |      
|*  2 |   TABLE ACCESS FULL| PERSON |    36 |   324 |  335
----------------------------------------------------------


Statistics
----------------------------------------------------------
        261  recursive calls
         19  db block gets
        841  consistent gets
          0  physical reads
       1992  redo size

We see 0.12 seconds go to 0.15 seconds, exactly the same explain plan for both statements, a small increase in consistent gets, and no physical reads by either statement (so the data is cached).
Why is the second statement a little slower, with a large increase in recursive calls? Because I dropped the index and the statement had to be reparsed. Now, if you are new to tuning you would almost certainly not have appreciated what the recursive calls and DB block gets were all about, and it could distract you from the question of "does it run faster?". It is certainly good to know all about that, but when you are starting off, you want to keep things simple and learn in stages.

What the above demonstrates, I hope, is that the second thing you run will have an advantage and could probably run faster even though, in reality, it is using more resource. And we tend to run old code first and new code second. So swap them over, give the old code the advantage of being run second.

Do not test things once, as you can easily be caught out. Test each version several times. And ignore the first run of each version. This is not perfect advice (code in production may well be being parsed and gathering data from disk), but unless you can allow for this in your testing, I think it is generally better, for simple testing, to run each version 6 times. Of the six runs, ignore the first run of each and average the results of the other 5. Whichever one runs faster on average is, well, fastest. IF the difference is significant.

Oh. Where is the SQL AUDIT in all this? Well, ponder on why I am generating REDO for a simple select…

Decoding high_value and low_value January 3, 2010

Posted by mwidlake in internals, performance.

There is an update to this post here.

The table DBA_TAB_COLUMNS holds the LOW_VALUE and HIGH_VALUE for columns. This information is potentially very useful to us, especially as when the CBO looks at a WHERE clause on a column for a given value, the High and Low values for that column are considered. If the value in the WHERE clause is outside the known range, the expected number of values to be found is decreased in proportion to the distance outside the range.

What is not so helpful is that Oracle stores, and displays, the information in an internal raw format. Which is utterly unhelpful to us of course. Why can’t they at least expose the information in a view in a human-readable format? Tsch.

So here is some code on how to decode low_value and high_value. I pulled most of this together a few years back but always struggled to decode the low and high values for dates, until I found this post by Gary Myers.

-- col_stats
-- Martin Widlake mdw 21/03/2003
-- MDW 11/12/09 enhanced to include more translations of low_value/high_value
-- pilfered from Gary Myers blog
col owner        form a6 word wrap
col table_name   form a15 word wrap
col column_name  form a22 word wrap
col data_type    form a12
col M            form a1
col num_vals     form 99999,999
col dnsty        form 0.9999
col num_nulls    form 99999,999
col low_v        form a18
col hi_v         form a18
col data_type    form a10
set lines 110
break on owner nodup on table_name nodup
spool col_stats.lst
select --owner
--      ,table_name
      column_name
      ,data_type
      ,decode (nullable,'N','Y','N')  M
      ,num_distinct num_vals
      ,num_nulls
      ,density dnsty
,decode(data_type
  ,'NUMBER'       ,to_char(utl_raw.cast_to_number(low_value))
  ,'VARCHAR2'     ,to_char(utl_raw.cast_to_varchar2(low_value))
  ,'NVARCHAR2'    ,to_char(utl_raw.cast_to_nvarchar2(low_value))
  ,'BINARY_DOUBLE',to_char(utl_raw.cast_to_binary_double(low_value))
  ,'BINARY_FLOAT' ,to_char(utl_raw.cast_to_binary_float(low_value))
  ,'DATE',to_char(1780+to_number(substr(low_value,1,2),'XX')
         +to_number(substr(low_value,3,2),'XX'))||'-'
       ||to_number(substr(low_value,5,2),'XX')||'-'
       ||to_number(substr(low_value,7,2),'XX')||' '
       ||(to_number(substr(low_value,9,2),'XX')-1)||':'
       ||(to_number(substr(low_value,11,2),'XX')-1)||':'
       ||(to_number(substr(low_value,13,2),'XX')-1)
,  low_value
       ) low_v
,decode(data_type
  ,'NUMBER'       ,to_char(utl_raw.cast_to_number(high_value))
  ,'VARCHAR2'     ,to_char(utl_raw.cast_to_varchar2(high_value))
  ,'NVARCHAR2'    ,to_char(utl_raw.cast_to_nvarchar2(high_value))
  ,'BINARY_DOUBLE',to_char(utl_raw.cast_to_binary_double(high_value))
  ,'BINARY_FLOAT' ,to_char(utl_raw.cast_to_binary_float(high_value))
  ,'DATE',to_char(1780+to_number(substr(high_value,1,2),'XX')
         +to_number(substr(high_value,3,2),'XX'))||'-'
       ||to_number(substr(high_value,5,2),'XX')||'-'
       ||to_number(substr(high_value,7,2),'XX')||' '
       ||(to_number(substr(high_value,9,2),'XX')-1)||':'
       ||(to_number(substr(high_value,11,2),'XX')-1)||':'
       ||(to_number(substr(high_value,13,2),'XX')-1)
,  high_value
       ) hi_v
from dba_tab_columns
where owner      like upper('&tab_own')
and   table_name like upper(nvl('&tab_name','WHOOPS')||'%')
ORDER BY owner,table_name,COLUMN_ID
/
clear colu
spool off
clear breaks

Most of the translation is done via the utl_raw.cast_to_datatype functions but no such function is provided for dates, which is where most of us come unstuck. Gary recognised that the data was being stored in hex format, with an offset for the year.
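
You can see that hex format for yourself by DUMPing a stored date column and comparing it with the LOW_VALUE/HIGH_VALUE raws. This is just a little demo sketch; the table name is made up:

-- create a tiny table and store a known date
create table mdw_date_demo (created date);
insert into mdw_date_demo values (to_date('2009-12-10 09:19:13','YYYY-MM-DD HH24:MI:SS'));
commit;

-- format 16 shows each stored byte in hex: century+100, year-of-century+100,
-- month, day, then hour, minute and second each stored +1
select dump(created,16) dmp from mdw_date_demo;

-- gather stats and the same bytes turn up in LOW_VALUE/HIGH_VALUE
exec dbms_stats.gather_table_stats(user,'MDW_DATE_DEMO')
select low_value, high_value
from   user_tab_columns
where  table_name = 'MDW_DATE_DEMO' and column_name = 'CREATED';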

I hope the script helps some of you.
{There is also an overloaded procedure under DBMS_STATS, DBMS_STATS.CONVERT_RAW_VALUE, that can be called to translate many datatypes}
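
Something along these lines, just as a sketch; the table and column names are placeholders for a real NUMBER column you have stats on, and I am only showing the NUMBER overload:

set serveroutput on
declare
  l_raw raw(1000);
  l_num number;
begin
  -- MY_TABLE / MY_NUM_COL are placeholders
  select low_value into l_raw
  from   user_tab_columns
  where  table_name  = 'MY_TABLE'
  and    column_name = 'MY_NUM_COL';

  dbms_stats.convert_raw_value(l_raw, l_num);
  dbms_output.put_line('low_value decodes to '||l_num);
end;
/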

If anyone knows how to decode timestamps, I’d love to know as it would save me trying to work it out or find it on the net somewhere. I’ll add it to the post so that there is one place to find all the translations.

Here is a quick output for a test table

@col_stats
Enter value for tab_own: dwperf
old  40: where owner      like upper('&tab_own')
new  40: where owner      like upper('dwperf')
Enter value for tab_name: ad_sql_exec_p
old  41: and   table_name like upper(nvl('&tab_name','WHOOPS')||'%')
new  41: and   table_name like upper(nvl('ad_sql_exec_p','WHOOPS')||'%')
any key> 

COLUMN_NAME            DATA_TYPE  M   NUM_VALS  NUM_NULLS   DNSTY LOW_V              HI_V
---------------------- ---------- - ---------- ---------- ------- ------------------ ---------------
INST_ID                NUMBER     Y          4          0  0.2500 1                  4
SQL_ID                 VARCHAR2   Y     87,104          0  0.0000 008d71chrks14      gzw309ttkx862
PLAN_HASH_VALUE        NUMBER     Y      1,884          0  0.0005 2646229            4294043874
SPC_ID                 NUMBER     N         83          0  0.0120 1076               2269
ADDRESS                RAW        N    118,198          0  0.0000 00000003E33123A0   0000001342FF3FF8
HASH_VALUE             NUMBER     N     87,104          0  0.0000 2758710            4294676643
CREATED_DATE           DATE       N        120          0  0.0083 2009-10-23 8:19:10 2009-12-10 9:19:13
LATEST_FIRST_LOAD_TIME DATE       N     11,791          0  0.0001 2009-9-14 11:55:59 2009-12-13 9:33:24
TOTAL_LOADS            NUMBER     N         98          0  0.0102 1                  55047
TOTAL_PARSE_CALLS      NUMBER     N         92          0  0.0109 0                  2972
TOTAL_EXECUTIONS       NUMBER     N        235          0  0.0043 0                  834624
TOTAL_ROWS_PROCESSED   NUMBER     N        809          0  0.0012 0                  26946123
TOTAL_FETCHES          NUMBER     N        313          0  0.0032 0                  834624
TOTAL_BUFFER_GETS      NUMBER     N      3,016          0  0.0003 0                  3355576809
TOTAL_DISK_READS       NUMBER     N        985          0  0.0010 0                  28189240
TOTAL_DIRECT_WRITES    NUMBER     N         98          0  0.0102 0                  751289
TOTAL_SORTS            NUMBER     N        106          0  0.0094 0                  5283
TOTAL_CPU_TIME         NUMBER     N     94,401          0  0.0000 1337               12183936207
TOTAL_ELAPSED_TIME     NUMBER     N    115,203          0  0.0000 1337               139692482086
TOTAL_OPTIMIZER_COST   NUMBER     N      1,467          0  0.0007 0                  369740902209315000
                                                                                     0000000

Using AWR data for OS information December 22, 2009

Posted by mwidlake in performance.
Tags: ,
5 comments

This is a demo of using AWR data to get some OS performance information.

This posting was actually a comment by (and is the work of) Bernard Polarski on a recent post about command_types I did a week or two back. I’ve lifted it into a separate posting as (a) I thought it might be useful to people and it would not get as much exposure as a comment as it would as a post, (b) I want to try and lay it out better for Bernard (WordPress likes to garble code in comments, which it did to Bernard’s efforts) and (c) I strongly believe in nicking other people’s code, if they are happy to put it “out there”. Why re-code what someone has already done and is happy to share? :-)

I tried the code (I added the SYS. ownership on tables, so it will work even if you can’t log in as sys but have access to the objects) and it worked fine for me.

I should just add, use of AWR data is covered by an Oracle licence, the Diagnostics Pack. If you do not have a licence for it, Oracle deem the WRH$ views as not yours to view, even though they are on your box, in the database you paid for, containing data gathered by your CPUs by code that is damned tricky to turn off and is on by default. Sheesh.

Any mistakes or typos, blame me not Bernard. I took the liberty of tweaking some of the English; I hope that is OK, Bernard.

*************************************
Okay, a challenge: let me try to surprise you with something improbable. Some months ago, while investigating an unusual load on a Linux box, the unix sysadmin told us he could not go back more than 2 weeks with ‘sar’ history. Since Oracle also gathers system stats, I quickly enriched smenu with an Oracle sar.
The example below gives the sar stats for the last 10 snaps; it is stored in the shortcut sts and called with the parameter ‘-sar’.

(Replace the words Superior_To and Inferior_To with the mathematical greater-than and less-than signs; the system eats them.)

/tmp: sts -sar 10

+ sqlplus -s '/ as sysdba'
+
set feed off verify off lines 190 pages 66
col inst  new_value inst noprint;
col id1      head 'Idle time(%)'    justify c
col usr1     head 'User time(%)'    justify c
col sys1     head 'Sys time(%)'     justify c
col io1      head 'Io Wait time(%)' justify c
col nice1    head 'Nice time (%)'   justify c
col snap_len head 'Interval| (Secs)' justify c
col num_cpus   new_value p_num_cpus  head 'Number of CPU';
col a1         new_value secs noprint;
col SNAP_BEGIN format a20 head 'Snap begin' justify c
col SNAP_END   format a20 head 'Snap end'   justify c

 select instance_number inst from v$instance;
 SELECT value num_cpus 
 FROM v$osstat WHERE stat_name = 'NUM_CPUS';
 prompt
 prompt Negative values correspond to Shutdown:
 prompt
select  snap_id, snap_len, round(id1 /snap_len *100,2) id1,
                       round(usr1 /snap_len *100,2) usr1,
                       round(sys1 /snap_len *100,2) sys1,
                       round(io1 /snap_len *100,2) io1,
                       round(nice1 /snap_len *100,2) nice1 
                     , snap_begin, snap_end
from (
     select  snap_id,  id1, usr1,sys1, io1, nice1
           , snap_begin, snap_end ,
             round( extract( day from diffs) *24*60*60+
                    extract( hour from diffs) *60*60+
                    extract( minute from diffs )* 60 +
                    extract( second from diffs )) snap_len 
             -- above is the exact length of the snapshot in seconds
     from ( select /*+ at this stage, each row shows the cumulative value.
                       r1    7500  8600
                       r2    7300  8300
                       r3    7200  8110
                    we use [max(row) - lag(row)] to get the difference 
                    between [row and row-1], to obtain differential values:
                       r1    200   300
                       r2    100   190
                       r3    0       0
                    */
        a.snap_id,
        (max(id1)    - lag( max(id1))   over (order by a.snap_id))/100        id1 ,
        (max(usr1)   - lag( max(usr1))  over (order by a.snap_id))/100        usr1,
        ( max(sys1)  - lag( max(sys1))  over (order by a.snap_id))/100        sys1,
        ( max(io1)   - lag( max(io1))   over (order by a.snap_id))/100        io1,
        ( max(nice1) - lag( max(nice1)) over (order by a.snap_id))/100        nice1,
          -- for later display
          max(to_char(BEGIN_INTERVAL_TIME,' YYYY-MM-DD HH24:mi:ss'))          snap_begin,
          -- for later display
          max(to_char(END_INTERVAL_TIME,' YYYY-MM-DD HH24:mi:ss'))            snap_end,
          -- exact len of snap used for percentage calculation
        ( max(END_INTERVAL_TIME)-max(BEGIN_INTERVAL_TIME))                    diffs
        from ( /*+  perform a pivot table so that the 5 values selected appear 
                     on one line. The CASE distributes col a.value among 5 new 
                    columns, but creates a row for each. We will use the group 
                    by (snap_id) to condense the 5 rows into one. If you don't 
                    see how this works, just remove the group by and max function, 
                    then re-add it and you will see the use of max(). 
                    Here is what you will see:
        Raw data :   1000     2222
                     1000     3333
                     1000     4444
                     1000     5555
                     1000     6666
        The SELECT CASE populates our inline view with this structure:
                       ID    IDLE    USER   SYS   IOWAIT   NICE
                     1000    2222
                     1000             3333
                     1000                   4444
                     1000                          5555
                     1000                                  6666
         the group by(1000) condenses the rows in one:
                       ID    IDLE    USER   SYS   IOWAIT  NICE
                      1000   2222    3333   4444   5555   6666
               */
               select a.snap_id,
                  case b.STAT_NAME
                       when 'IDLE_TIME' then a.value / &p_num_cpus
                   end  id1,
                  case b.STAT_NAME
                       when 'USER_TIME' then a.value / &p_num_cpus
                  end usr1 ,
                  case b.STAT_NAME
                       when 'SYS_TIME' then  a.value / &p_num_cpus
                  end sys1 ,
                  case b.STAT_NAME
                       when 'IOWAIT_TIME' then  a.value / &p_num_cpus
                  end io1,
                  case b.STAT_NAME
                       when 'NICE_TIME' then  a.value / &p_num_cpus
                  end nice1
                  from  sys.WRH$_OSSTAT a,  sys.WRH$_OSSTAT_NAME b
                       where
                            a.dbid      = b.dbid       and
                            a.STAT_ID   = b.stat_id    and
                            instance_number = &inst    and
                            b.stat_name in ('IDLE_TIME','USER_TIME','SYS_TIME','IOWAIT_TIME','NICE_TIME') and
                            a.snap_id > (( select max(snap_id) from  sys.WRH$_OSSTAT) - 10  -1 )
                   order by 1 desc
              ) a,  sys.wrm$_snapshot s
         where  a.snap_id = s.snap_id
         group by a.snap_id
         order by snap_id desc
        )
   )where rownum < (10+1);





Number of CPU
-------------
            2

Negative values correspond to Shutdown:


            Interval
   SNAP_ID   (Secs)   Idle time(%) User time(%) Sys time(%) Io Wait time(%) Nice time (%)      Snap begin            Snap end
---------- ---------- ------------ ------------ ----------- --------------- ------------- -------------------- --------------------
      2212       3603        87.07         9.43        2.84           16.19             0  2009-12-15 14:00:19  2009-12-15 15:00:21
      2211       3613        90.15         7.31        2.14           18.97             0  2009-12-15 13:00:06  2009-12-15 14:00:19
      2210       3593        92.58         5.91         1.4            7.91             0  2009-12-15 12:00:13  2009-12-15 13:00:06
      2209       3574        91.24         6.78        1.73            9.37             0  2009-12-15 11:00:38  2009-12-15 12:00:13
      2208       3620        90.04          7.8        1.96           11.81             0  2009-12-15 10:00:19  2009-12-15 11:00:38
      2207       3594         89.8         6.89        2.37           12.99             0  2009-12-15 09:00:25  2009-12-15 10:00:19
      2206       3611        91.75         5.81        1.87            8.29             0  2009-12-15 08:00:14  2009-12-15 09:00:25
      2205       3599        92.38         5.63        1.68            8.57             0  2009-12-15 07:00:15  2009-12-15 08:00:14
      2204       3605        91.72         6.51        1.58           10.79             0  2009-12-15 06:00:11  2009-12-15 07:00:15
      2203       3601        86.36        10.35        2.94           10.62             0  2009-12-15 05:00:10  2009-12-15 06:00:11
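
One small footnote from me: if you just want a quick look at the current cumulative figures rather than the snapshot history, the same statistics Bernard’s script works from can be read straight out of V$OSSTAT, something like:

-- current cumulative OS time counters; the script above divides these by 100
-- to turn them into seconds
select stat_name, value
from   v$osstat
where  stat_name in ('IDLE_TIME','USER_TIME','SYS_TIME','IOWAIT_TIME','NICE_TIME')
order  by stat_name;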