Update to Decoding High and Low values February 24, 2010
Posted by mwidlake in internals, performance. Tags: data dictionary, statistics
12 comments
After what seems an age, I finally got a little time to enhance my col_stats code with the extra info people provided as comments to
this original posting on it. I also now handle the translation of dates correctly, and timestamps as well.
I had a real struggle with timestamps. I knew I could find the answer in Tom Kyte’s “Expert Oracle Database Architecture”, as one of the comments said so, but I tried to work it out myself as my copy of the book was at my Mother’s {no, she is not learning Oracle databases, I left it there by accident}. As the other elements of timestamps are held as numerical elements stored as hex (the number for the day, the number for the month), I tried to interpret it like that. I was being too complex – the fractional seconds are just an 8-digit hex number: to_number(value,’XXXXXXXX’).
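As a quick aside, that really is all the fractional-seconds decode amounts to. A minimal sketch (the hex literal below is just a made-up example value, not taken from a real column):

select to_number('075BCD15','XXXXXXXX') frac from dual;

That returns 123456789, i.e. a fractional part of .123456789 seconds.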
Anyway, this is the new, improved, cleans-whiter-than-white script {I’ve increased the width of my blog to better show code like the below, but try clicking on “show source” if you want a plain text version}:
-- col_stats
-- Martin Widlake mdw 21/03/2003
-- MDW 11/12/09 enhanced to include more translations of low_value/high_value
-- pilfered from Gary Myers blog
-- MDW 20/02/10 added in the handling of timestamps.
col owner form a6 word wrap
col table_name form a15 word wrap
col column_name form a22 word wrap
col data_type form a12
col M form a1
col num_vals form 99999,999
col dnsty form 0.9999
col num_nulls form 99999,999
col low_v form a30
col low_v2 form a18
col hi_v form a30
col data_type form a10
col low_value form a25
col high_value form a25
set lines 110
break on owner nodup on table_name nodup
spool col_stats.lst
select --owner
--    ,table_name
       column_name
      ,data_type
      ,decode (nullable,'N','Y','N')  M
      ,num_distinct num_vals
      ,num_nulls
      ,density dnsty
      ,decode(substr(data_type,1,9)  -- as there are several timestamp types
        ,'NUMBER'    ,to_char(utl_raw.cast_to_number(low_value))
        ,'VARCHAR2'  ,to_char(utl_raw.cast_to_varchar2(low_value))
        ,'NVARCHAR2' ,to_char(utl_raw.cast_to_nvarchar2(low_value))
        ,'BINARY_DO' ,to_char(utl_raw.cast_to_binary_double(low_value))
        ,'BINARY_FL' ,to_char(utl_raw.cast_to_binary_float(low_value))
        ,'DATE',rtrim(
                 to_char(100*(to_number(substr(low_value,1,2),'XX')-100)
                        + (to_number(substr(low_value,3,2),'XX')-100),'fm0000')||'-'||
                 to_char(to_number(substr(low_value,5,2),'XX'),'fm00')||'-'||
                 to_char(to_number(substr(low_value,7,2),'XX'),'fm00')||' '||
                 to_char(to_number(substr(low_value,9,2),'XX')-1,'fm00')||':'||
                 to_char(to_number(substr(low_value,11,2),'XX')-1,'fm00')||':'||
                 to_char(to_number(substr(low_value,13,2),'XX')-1,'fm00'))
        ,'TIMESTAMP',rtrim(
                 to_char(100*(to_number(substr(low_value,1,2),'XX')-100)
                        + (to_number(substr(low_value,3,2),'XX')-100),'fm0000')||'-'||
                 to_char(to_number(substr(low_value,5,2),'XX'),'fm00')||'-'||
                 to_char(to_number(substr(low_value,7,2),'XX'),'fm00')||' '||
                 to_char(to_number(substr(low_value,9,2),'XX')-1,'fm00')||':'||
                 to_char(to_number(substr(low_value,11,2),'XX')-1,'fm00')||':'||
                 to_char(to_number(substr(low_value,13,2),'XX')-1,'fm00')
                 ||'.'||to_number(substr(low_value,15,8),'XXXXXXXX'))
             ) low_v
      ,decode(substr(data_type,1,9)  -- as there are several timestamp types
        ,'NUMBER'    ,to_char(utl_raw.cast_to_number(high_value))
        ,'VARCHAR2'  ,to_char(utl_raw.cast_to_varchar2(high_value))
        ,'NVARCHAR2' ,to_char(utl_raw.cast_to_nvarchar2(high_value))
        ,'BINARY_DO' ,to_char(utl_raw.cast_to_binary_double(high_value))
        ,'BINARY_FL' ,to_char(utl_raw.cast_to_binary_float(high_value))
        ,'DATE',rtrim(
                 to_char(100*(to_number(substr(high_value,1,2),'XX')-100)
                        + (to_number(substr(high_value,3,2),'XX')-100),'fm0000')||'-'||
                 to_char(to_number(substr(high_value,5,2),'XX'),'fm00')||'-'||
                 to_char(to_number(substr(high_value,7,2),'XX'),'fm00')||' '||
                 to_char(to_number(substr(high_value,9,2),'XX')-1,'fm00')||':'||
                 to_char(to_number(substr(high_value,11,2),'XX')-1,'fm00')||':'||
                 to_char(to_number(substr(high_value,13,2),'XX')-1,'fm00'))
        ,'TIMESTAMP',rtrim(
                 to_char(100*(to_number(substr(high_value,1,2),'XX')-100)
                        + (to_number(substr(high_value,3,2),'XX')-100),'fm0000')||'-'||
                 to_char(to_number(substr(high_value,5,2),'XX'),'fm00')||'-'||
                 to_char(to_number(substr(high_value,7,2),'XX'),'fm00')||' '||
                 to_char(to_number(substr(high_value,9,2),'XX')-1,'fm00')||':'||
                 to_char(to_number(substr(high_value,11,2),'XX')-1,'fm00')||':'||
                 to_char(to_number(substr(high_value,13,2),'XX')-1,'fm00')
                 ||'.'||to_char(to_number(substr(high_value,15,8),'XXXXXXXX')))
        ,  high_value
             ) hi_v
      ,low_value,high_value
from dba_tab_columns
where owner      like upper('&tab_own')
and   table_name like upper(nvl('&tab_name','WHOOPS')||'%')
ORDER BY owner,table_name,COLUMN_ID
/
clear colu
spool off
clear breaks
Yes, I know, it is a considerable chunk of code just to get the high and low values for columns, as identified by the last stats gather to be run on the table. {It is important to remember that these high and low values will only be as accurate as that stats gather – and only as of when it ran. An “estimate_percent=> 100” run a minute ago will probably give accurate values; an “estimate_percent=>0.1” run last month will not – which is usually what I am checking for}. I could make the script a lot neater by writing a set of functions to call to get the translation, as was suggested in a comment on the original posting and is in fact something I have done in the past, but one is often not allowed to put new stored PL/SQL code onto production systems, so a script which does it is, for me, best.
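{If you are allowed stored code, the function approach only takes a few lines per datatype. A rough sketch for the NUMBER case – the function name is just one I made up for illustration:

create or replace function raw_to_num (p_raw in raw) return number
deterministic is
begin
  -- column statistics hold NUMBER high/low values as RAW, so just cast back
  return utl_raw.cast_to_number(p_raw);
end raw_to_num;
/

and then the big DECODEs in the script collapse to calls like raw_to_num(low_value).}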
I should show some proof of it working, I guess. The below will create a table with a nice spread of column types and three records with a reasonable spread of data:
-- test_col_hilo
-- M Widlake 11/12/09
--
--testing script to show high and low values for various column types
--
set lines 90 pages 1000 trims on pause off
--set sqlpro 'test102>'
set sqlpro '>'
set echo on
set autotrace off
spool test_col_hilo.lst
-- basic setup info
col name form a30
col value form a20
select substr(name,1,30) name,substr(value,1,20) value
from v$parameter
where name in ('db_block_size','compatible','cpu_count','db_file_multiblock_read_count'
              ,'optimizer_mode','sga_target','sort_area_size')
order by name
/
drop table test_hilo purge;
--
--
COL TABLE_NAME FORM A15
COL TABLE_OWNER FORM A10
create table test_hilo
(id        number(8) not null
,numsmall  number(4)
,numlarge  number
,datesmall date
,datelarge date
,ts        timestamp
,ts0       timestamp(0)
,ts3       timestamp(3)
,ts6       timestamp(6)
,ts9       timestamp(9)
,vcsmall   varchar2(10)
,vclarge   varchar2(100)
,vcodd     varchar2(20)
,nvcsmall  nvarchar2(10)
,nvclarge  nvarchar2(100)
,bfsmall   binary_float -- NB precision is only 6 or 7 digits
,bflarge   binary_float
,bdsmall   binary_double
,bdlarge   binary_double
)
tablespace users
/
-- insert low values
insert into test_hilo
(id,numsmall,numlarge
,datesmall,datelarge
,ts,ts0,ts3
,ts6,ts9
,vcsmall,vclarge
,vcodd
,nvcsmall,nvclarge
,bfsmall,bflarge
,bdsmall,bdlarge
)
values
(1
,-10,-123456789.12345
,trunc(sysdate-1000),sysdate-500000
,systimestamp,systimestamp
,to_timestamp('01-FEB-2010 12:34:56.123','DD-MON-YYYY HH24:MI:SS.FF')
,to_timestamp('01-FEB-2010 12:34:56.1234567','DD-MON-YYYY HH24:MI:SS.FF')
,to_timestamp('01-FEB-2010 12:34:56.987654321','DD-MON-YYYY HH24:MI:SS.FF')
,'ABCDE','AABCABCDEABCDEFGABCDEFGHI'
,'Abc!"£$%^&*()deF'
,'ABCDE','AABCABCDEABCDEFGABCDEFGHI'
,-1.23,-12345.6
,12345,1234567890
)
/
-- insert mid values
insert into test_hilo
(id,numsmall,numlarge
,datesmall,datelarge
,ts,ts0,ts3
,ts6,ts9
,vcsmall,vclarge
,vcodd
,nvcsmall,nvclarge
,bfsmall,bflarge
,bdsmall,bdlarge
)
values
(2
,15,1515151515151
,trunc(sysdate-10),sysdate-5000
,systimestamp,systimestamp
,to_timestamp('05-FEB-2010 12:34:56.123','DD-MON-YYYY HH24:MI:SS.FF')
,to_timestamp('05-FEB-2010 12:34:56.1234567','DD-MON-YYYY HH24:MI:SS.FF')
,to_timestamp('05-FEB-2010 12:34:56.987654321','DD-MON-YYYY HH24:MI:SS.FF')
,'CCCCC','CCBCABCDEABCDEFGABCDEFGHI'
,'CbaaBC'
,'EFGHI','ABCDEFGHIJKLMNOPQRSTUV'
,1.23,12345.6
,54321,5432112345
)
/
--insert high values
insert into test_hilo
(id,numsmall,numlarge
,datesmall,datelarge
,ts,ts0,ts3
,ts6,ts9
,vcsmall,vclarge
,vcodd
,nvcsmall,nvclarge
,bfsmall,bflarge
,bdsmall,bdlarge
)
values
(3
,99,9898989898989
,trunc(sysdate-1),sysdate+1000
,systimestamp+10,systimestamp+10
,to_timestamp('20-FEB-2010 18:17:16.876','DD-MON-YYYY HH24:MI:SS.FF')
,to_timestamp('21-FEB-2010 17:16:15.555555','DD-MON-YYYY HH24:MI:SS.FF')
,to_timestamp('22-FEB-2010 16:15:14.123456789','DD-MON-YYYY HH24:MI:SS.FF')
,'ZYXWV','ZZZZZYZZXZZWZZVZZUZZTZZSZZRZZQZZP'
,'Z;#[]{}~@:'
,'VWXYZ','ZYXWVUTSRQPONMLKJIHGFEDCBA'
,9.87,98765.4
,987654321,987654321.1234567
)
/
commit;
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=> USER, TABNAME=>'TEST_HILO')
set autotrace off
set echo off
--
spool off
clear colu
--
And the below is the translation of the values – I used a cut-down version of my script, so it would fit on the blog better:
COLUMN_NAME  DATA_TYPE     LOW_V                         HI_V
------------ ------------- ----------------------------- -----------------------------
LOW_VALUE                 HIGH_VALUE
------------------------- -------------------------
ID           NUMBER        1                             3
C102                      C104
NUMSMALL     NUMBER        -10                           99
3E5B66                    C164
NUMLARGE     NUMBER        -123456789.12345              9898989898989
3A644E38220C59433366      C70A5A5A5A5A5A5A
DATESMALL    DATE          2007-05-31 00:00:00           2010-02-23 00:00:00
786B051F010101            786E0217010101
DATELARGE    DATE          0641-03-10 17:36:47           2012-11-20 17:36:47
6A8D030A122530            78700B14122530
TS           TIMESTAMP(6)  2010-02-24 17:36:47.255015000 2010-03-06 17:36:47.255015000
786E02181225300F333858    786E0306122530
TS0          TIMESTAMP(0)  2010-02-24 17:36:47.          2010-03-06 17:36:47.
786E0218122530            786E0306122530
TS3          TIMESTAMP(3)  2010-02-01 12:34:56.123000000 2010-02-20 18:17:16.123000000
786E02010D23390754D4C0    786E02141312113436B300
TS6          TIMESTAMP(6)  2010-02-01 12:34:56.123457000 2010-02-21 17:16:15.123457000
786E02010D2339075BCDE8    786E0215121110211D18B8
TS9          TIMESTAMP(9)  2010-02-01 12:34:56.987654321 2010-02-22 16:15:14.987654321
786E02010D23393ADE68B1    786E021611100F075BCD15
VCSMALL      VARCHAR2      ABCDE                         ZYXWV
4142434445                5A59585756
VCLARGE      VARCHAR2      AABCABCDEABCDEFGABCDEFGHI     ZZZZZYZZXZZWZZVZZUZZTZZSZZRZZ
                                                         QZZ
4141424341424344454142434 5A5A5A5A5A595A5A585A5A575
4454647414243444546474849 A5A565A5A555A5A545A5A535A
                          5A525A5A515A5A
VCODD        VARCHAR2      Abc!"£$%^&*()deF              Z;#[]{}~@:
4162632122C2A324255E262A2 5A3B235B5D7B7D7E403A
829646546
NVCSMALL     NVARCHAR2     ABCDE                         VWXYZ
00410042004300440045      0056005700580059005A
NVCLARGE     NVARCHAR2     AABCABCDEABCDEFG              ZYXWVUTSRQPONMLK
0041004100420043004100420 005A005900580057005600550
0430044004500410042004300 0540053005200510050004F00
44004500460047            4E004D004C004B
BFSMALL      BINARY_FLOAT  3.53999972E+000               -9.86999989E+000
40628F5B                  C11DEB85
BFLARGE      BINARY_FLOAT  3.64494306E-004               -9.87653984E+004
39BF1999                  C7C0E6B3
BDSMALL      BINARY_DOUBLE -1.2345E+004                  -9.87654321E+008
C0C81C8000000000          C1CD6F3458800000
BDLARGE      BINARY_DOUBLE -9.8765432112345672E+008      -5.432112345E+009
C1CD6F34588FCD6E          C1F43C774D900000
I could go on and look at handling intervals and time zones but I leave this to you as an exercise. In other words, I have no real need for those data types right now and can’t justify the time! I hope the above is enough to answer whatever questions you may have about the high and low values of your columns…
Missing SQL in AWR/ASH February 23, 2010
Posted by mwidlake in AWR, performance. Tags: AWR, bug, data dictionary, performance
3 comments
I’ve been working on some demonstrations of AWR/ASH within OEM for a client and I’ve come across a situation where you can be missing the SQL (and thus a lot of the useful information) for code run via DBMS_JOB or DBMS_SCHEDULER.
This is due to bugs 5928612.8 for DBMS_JOB and 5140631.8 for DBMS_SCHEDULER. If you can’t access Metalink, then check out the end of this webpage. The SQL_ID/SQL_HASH_VALUE is not populated in V$SESSION for jobs running under DBMS_JOB or DBMS_SCHEDULER. I am presuming that, because V$SESSION is not updated with these values, the information is also not available to V$ACTIVE_SESSION_HISTORY.
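A quick way to see the V$SESSION side of the problem is something like the below – a sketch, assuming you have a DBMS_JOB job actually running at the time you look:

select ses.sid, ses.username, jr.job, ses.sql_id, ses.sql_hash_value
from   v$session ses, dba_jobs_running jr
where  jr.sid = ses.sid;

On an affected system the SQL_ID and SQL_HASH_VALUE columns come back unset even though the job session is busy executing SQL.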
I did check that the SQL_ID was null in V$ACTIVE_SESSION_HISTORY for the records that connected to the USER_ID I was using, except for the handful of entries for when I was testing my code manually before running it via DBMS_JOB. The SQL_ID was captured fine for these interactive sessions (and the code executed was identical).
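For reference, that check was nothing more sophisticated than something along these lines (a sketch; MDW_2 is the user my workload runs as and the one-hour window is illustrative):

select sql_id, session_id, session_serial#, count(*) samples
from   v$active_session_history
where  user_id     = (select user_id from dba_users where username = 'MDW_2')
and    sample_time > sysdate - 1/24
group by sql_id, session_id, session_serial#
order by samples desc;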
The bugs appear in 10.1.0.2 and both are fixed in 10.2.0.4 and 11. Both have patches available.
I am using 10.2.0.3.
As soon as I mentioned my woes about not being able to see the SQL for code I was running via DBMS_JOB, I knew of the existence of the bugs: colleagues mentioned being aware of them, as the patches have been applied to a couple of the other systems. However, I could find no trace of them on Metalink or Google. These bugs are listed as being against V$SESSION, not AWR or ASH, which is why I could not find them via searches for “missing sql awr” or “missing sql ash” etc. So hopefully this posting will allow the connection to be made.
I find it very vexing when you know there is a fix for an Oracle problem but you can’t find it in Metalink (which seems even more true with the new, all-singing, all-dancing, all-fancy-graphics little help Metalink). Sometimes it is because the bug number is not being made publicly available and in other cases, such as this one, it is because no one has made the connection between the initial problem and other things it could impact. I guess it is asking a lot of Oracle Corp to do this for us and it would be unreasonable of us to hold it against them – but they are supposed to be a blooming data, information and knowledge expertise company! Tsch.
As an example of what I see in AWR with my workload, generated via DBMS_JOB {and I’m sorry it is so small, you should be able to click on it and get a full-size version}:
Notice that the top five jobs are via user MDW_2. I know that none of the top SQL on the left is from my workload (you will have to just take my word for it, but it is, in fact, all the “OEM/AWR code” 🙂 ). Also, the top code shown is heavy on green for CPU, but you can see most of the workload in the Top Activity and the Top Sessions is blue, for User IO.
If I pick out the top session, 545, and click on it I get the below screen:
Note the entry for SQL_ID is blank and SQL Command is “UNKNOWN”. I can see what it is doing (db file scattered reads and some CPU) but not what on or why. 😦
Friday Philosophy – Alternatives to Testing February 19, 2010
Posted by mwidlake in Friday Philosophy, Testing. Tags: system development, Testing
5 comments
We all know that we should test everything before we do it on a production system.
Or should we? Well yes, you should, but sometimes there is no suitable test system (or even any test system, which seems to be becoming a more common problem) or the testing regime takes weeks and you need a fix right now.
So given that there really is a proper business reason why doing the testing is expensive {in effort or time} or impossible, what can you do? I believe there is an alternative.
To be able to reduce or remove the need for proper testing you need to be absolutely sure the following three things are true, in order of importance:
- Be able to monitor the live system for negative impact of your changes.
- Have a proven and quick rollout.
- Be confident that any negative impact on your system is both low likelihood and low impact.
Note that, in my opinion, the least important {yet still very important} consideration is that the change be almost certain to work and won’t cause too much grief if something unexpected happens. However, this seems to be the main concern when people are considering a “Quick Fix” on a production system. “Is it likely to go wrong? Is it going to matter if it does not help?”. At this point all eyes tend to swivel to the DBA or the Lead Developer who has to go on what I call DBA Instinct. Given your experience to date and what you know about how the database works, give it your best guess.
The problem with DBA Instinct is that if you are not very experienced you are unlikely to get it right. If you are very experienced, most of that experience was on earlier versions of Oracle and you probably do little hands-on work now as you are a manager. So you are still unlikely to get it right 🙂
So, I argue that point 1 (spotting things going wrong) and 2 (Getting back to where you started) are most important.
Let us take the classic example of just adding an index as our untested change (as we have no system capable of generating a realistic workload and thus showing the impact across the board).
3) Most of us would agree that adding an index is easy, the impact is low and the chance of plans going terribly slow by using the new index unsuitably is low…
2) Rollback is relatively simple, you drop the index again. Question, can you just drop an index on a table if a SQL statement is in-flight and using that index? You know, a SQL statement that the CBO has decided is going to be useful and, despite your DBA Instinct that nothing is likely to go wrong, now takes 20 seconds to run rather than 0.2 seconds. And it is being run all the time on your OLTP system so now there is no time when there are not 2 or 3 queries in flight.
1) Verifying impact. This point is vital and can be tricky. You can just rely on staring at your monitoring tool and hoping any serious impact shows up so you can catch it and do something about it. Or people ring up and complain. That would not be so good; you have impacted the business.
I would pull out code from the SGA that is referencing the table to be indexed before you start (I’ll stick a bit of code at the bottom for doing this). If the list is not too large, you can do the same check after the code has gone in and compare changes to buffer gets and disk reads per execution.
A more targeted check is to pull out of v$SQL_PLAN anything that starts using the new index and check to make sure it is not now a problem.
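Something along these lines does that job (a sketch; MY_NEW_INDEX is a placeholder for whatever you have just created):

select distinct pln.sql_id, sq.executions, sq.buffer_gets, sq.disk_reads
from   v$sql_plan pln, v$sql sq
where  pln.object_owner = user
and    pln.object_name  = 'MY_NEW_INDEX'
and    sq.sql_id        = pln.sql_id
and    sq.child_number  = pln.child_number;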
A second example: I want to gather system statistics, as I am sure the system does not appreciate how blindingly fast my disks are for multi-block reads.
3) Impact? Well, I would suggest the potential impact is wide and across the board, else why do it?
2) Rollback? Delete the system statistics. What happens when you delete the system statistics? Are they all set to null? Are the defaults put back in place? Could I use DBMS_STATS.RESTORE_SYSTEM_STATS? What is the impact of any of them? {RESTORE appears to work reliably and, if you delete the system stats, you get the “out of the box” values again, which could be very different to what you had before your spur-of-the-moment gather}
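{For reference, the two candidate rollback calls being weighed there are roughly the following – a sketch, with an obviously illustrative as_of_timestamp:

exec dbms_stats.delete_system_stats
exec dbms_stats.restore_system_stats(as_of_timestamp => systimestamp - interval '1' day)
}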
1) How to assess negative impact? You are going to have to monitor the whole system and hope to pick up any code that starts running slowly before it impacts the business too much.
I would probably add the index but I would need to do at least some testing and proof of rollback before I gather system statistics. And I would only do so without proper testing if said proper testing was impossible due to a lack of test systems.
So, I do think you can implement changes without testing or with reduced testing, but it is not always the easy option.
I also think it is a valid (and often quicker, in terms of elapsed time) way of changing the production system without going through full unit/integration/regression/NFR testing.
Now you just need to persuade the Change Advisory Board that it is OK to do the change. Good luck 🙂
Oh, that code:-
-- chk_sga_txt.sql
-- Martin Widlake - look for sql in sga containing the provided txt
--
set pause on
set pages 36
col first_load_time form A20
spool chk_sga_txt
select first_load_time
      ,parse_calls     prse
      ,executions      excs
      ,buffer_gets     buffs
      ,disk_reads      discs
      ,rows_processed  rws
--    ,address         address
      ,hash_value      hash_value
      ,sql_id
--    ,plan_hash_value
      ,sql_text
from v$sqlarea
--where parsing_schema_id !='0'
where upper(sql_text) like upper('%'||nvl('&sql_txt','whoops')||'%')
and rownum <50
--order by first_load_time desc
order by (greatest(buffer_gets,1)/greatest(rows_processed,1)) desc
/
spool off
Command Line or GUI – Which is Best? February 18, 2010
Posted by mwidlake in performance. Tags: perception, performance, rant
15 comments
At present I am suffering ever so slightly from “split personality disorder”* in respect of my liking for Command Line and GUI interfaces.
On the one hand, much to my colleagues’ mild reproach, I use SQL*Plus and not PL/SQL Developer for my day-to-day work. Even worse, I got sick of using Notepad to hack around scripts {I am working in a Windows client environment and you simply can’t use MS Word with SQL files!} so I have retrograded recently and downloaded a Windows-compliant ‘vi’ interface, and it is soooooo nice to be able to use powerful ‘vi’ commands on my files once more. “:.,$s/^ /,/”. Ahhh, it is so much easier. I can do stuff in 3 seconds in ‘vi’ that would take me 10 minutes in Notepad in a large, complex file. That and, I’m sorry, but Notepad seems to be unable to manage a 100MB file, despite me having 2GB of real memory and a decent machine, whereas ‘vi’ has no problem with it at all.
Even more retrograde, I have direct telnet access to my linux servers and I am getting back to that as it makes me so much more productive. “ls -alrt ^d” for all directories anyone? “df -k .” to see how many data files I can add? Yep, it’s all arcane and means so little to many modern IT “Java/Struts/CDE” people but boy it is direct and fast. I might even dig out that book on SED and AWK.
On the other hand, I have finally (after much very painful discussion back and forth) got agreement that my site probably has access to AWR, ASH and all that good performance repository stuff. So I am hacking around with the OEM screens that focus on performance and snapshots and stuff. Now, I am traditionally not a big fan of GUI DBA tools. Partly it is because I am a bit old and stuck in my ways and partly it is because GUIs are really just “menus of options”. You are limited to the options available in your DBA GUI tool and you have a harder time learning everything that is available or what is really going on “under the covers”.
But with AWR and all those graphs, links and utilities, you can drill down into real problems real time or in the past so effectively that, well, once they start using this tool properly they will not need me at all. It is a fantastic boon to performance management and problem resolution, as well as proactive performance management.
So there you are, I am with Doug Burns on this one, in that I have Learned to Love Pictures, when the Pictures are well thought out, well connected and simple enough to help make sense of a complex system {and Oh Boy, Oracle performance has become sooo Complex!!!!}.
So right now, I spend half my day in vi/linux/command line world and half of it in pretty picture/GUI world. I think what really makes me happy is to leave behind the half-way-house of text-like Windows World {Windows SQL*Plus, Notepad}.
Just to finish, you can’t mention AWR without someone raising the ugly issue of licence cost and how Evil Oracle Corp were to charge for it. Well, I think it has been established that the guys and gals who developed AWR/ASH did not expect it to become a cost option, but it did. And I suspect that what kept it a cost option was the community’s OutRage at it being a cost option. Anything that popular, hey, a commercial company is going to charge for. I still reckon Oracle Corp ballsed up, as making it free and helping people use it a bit would have made 90% of customers’ lives easier and would have translated into user happiness and a certain % of sales for training courses to learn more, but heck, my day job is to make things work, not maintain sales percentages, so my opinion counts for nowt. *sigh*
(*apologies to real sufferers of Dissociative Identity Disorder, I am using the term in the loose, non-scientific, “common usage” sense of “not sure of my opinion” rather than having truly disparate personalities and memories.** And note, I certainly do not mean schizophrenia which, despite the ongoing public-opinion misunderstanding, is rarely anything to do with multiple personality disorders or “split minds” AT ALL, and is more to do with a difficulty in determining between reality and hallucination.)
Richard Foote on the Impact of stats staying the same. February 16, 2010
Posted by mwidlake in performance. Tags: Blogging, performance, statistics
add a comment
I just wanted to highlight this very good posting by Richard Foote. He is talking about how SQL execution plans can change when “nothing else does”. Not the table structures, not the code, not the initialisation parameters, not even the table and index stats.
But something does change, which is what day it is (or what hour it is, or what week it is). Time moves on and our data does too. If the statistics on the tables do NOT move on, then the CBO thinks that the range of data in the table does not change. So the CBO thinks your queries are getting further and further “out of range” and expects to find less and less data in the tables to bring back. That will lead to plan changes.
If you have noticed my preoccupation with identifying the contents of histograms and high/low values in column stats, you may appreciate that this topic has been a large part of my day job of late.
Richard explains the point very well, as always, so go have a look.
Stats Need Stats to Gather Stats February 16, 2010
Posted by mwidlake in performance. Tags: partitions, performance, statistics
10 comments
Did you know that you sometimes need good stats so that the stats-gathering package can gather stats in an efficient way? This is a recent, quite extreme example.
I’ve been forced to learn a lot about gathering Oracle stats using DBMS_STATS over the last 4 or 5 years. But no matter how much I learn about the “challengingly bizarre” way in which it works, it seems every week or two there is a new oddity. I plan a whole series on the topic “soon”.
This particular example is from a 10.2.0.3 system.
I am gathering partition-only table stats as we are using the ability of Oracle to roll up partition stats to global stats under certain specific conditions. One of the conditions is that you need stats for every partition. Plus, to get global column stats, each partition must have stats for each column. Some of our partitions lacked stats or had very bad ones.
So I quickly knocked up a script-generating script to create DBMS_STATS.GATHER_TABLE_STATS statements that collected, for those partitions:
- ONLY partition stats.
- NO cascade down to indexes
- BLOCK sampling so it is fast {and poor, but there you go}
- ESTIMATE_PERCENT of 2, which is quite low for block sampling
- collect histograms as we decided the devil of having them was better than the devil of not having them.
The above is not ideal for getting “good stats”, but it is quick and gets OK stats, which is what we need right now. An example statement is:
begin
-- part SD_INFO-DY07032004 rows 34554
dbms_stats.gather_table_stats(ownname => 'ERIC',tabname=> 'SD_INFO'
,partname=> 'DY07032004'
,granularity=> 'PARTITION', estimate_percent => 2,block_sample=> true
,cascade=> FALSE
,method_opt=> 'FOR ALL COLUMNS SIZE AUTO'
,no_invalidate=> false);
END;
/
Not a big partition (34,554 rows and not very wide rows), so it did not take long to gather:
PARTITION_NAME                 LAST_ANALYZED
------------------------------ --------------------
DY04012004                     12-FEB-2010 16:26:57
DY05012004                     12-FEB-2010 16:27:00
DY06012004                     12-FEB-2010 16:27:04
DY07012004                     12-FEB-2010 16:27:07
DY08012004                     12-FEB-2010 16:27:11   -- 4 seconds difference
DY09012004                     12-FEB-2010 16:27:15
I’ve collected statistics for a few thousand partitions over the last couple of days and the time taken is anything between just under half a second and 10 seconds per partition, the odd unusually large partition taking a minute or so. {I believe it takes half a second to gather stats on an empty partition, on our system at least, due to the time it takes for the internal housekeeping, including copying the old statistics information to the SYS.WRI$_OPSTAT_… tables to support restoring stats}. Sorry, I drift away from my main point.
This partition took a lot longer than 10 seconds:
begin
-- part W_ACTIVITY_FAILURE-DY02092008 rows 49425
dbms_stats.gather_table_stats(ownname => 'ERIC',tabname=> 'W_ACTIVITY_FAILURE'
,partname=> 'DY02092008'
,granularity=> 'PARTITION', estimate_percent => 2,block_sample=> true
,cascade=> FALSE
,method_opt=> 'FOR ALL COLUMNS SIZE AUTO'
,no_invalidate=> false);
END;
/
After 10 minutes it was still running. WHY? I quickly checked the number of rows in the partition and then the size of the partition segment, in case either was much larger than I expected. Neither was.
select count(*) from eric.W_ACTIVITY_FAILURE partition (DY07092008)

  COUNT(*)
----------
     42182

From dba_segments:

     BYTES     BLOCKS
---------- ----------
   2621440        320
There is one advantage of a DBMS_STATS statement running for a long time – you can grab from the SGA the actual code being executed for the DBMS_STATS statement. I saw this.
select substrb(dump(val,16,0,32),1,120) ep, cnt
from  (select /*+ no_parallel(t) no_parallel_index(t) dbms_stats
                  cursor_sharing_exact use_weak_name_resl dynamic_sampling(0)
                  no_monitoring */
              "ACCOUNT_ID" val, count(*) cnt
       from   "ERIC"."W_ACTIVITY_FAILURE" t
       where  TBL$OR$IDX$PART$NUM("ERIC"."W_ACTIVITY_FAILURE",0,4,0,"ROWID") = :objn
       and    "ACCOUNT_ID" is not null
       group by "ACCOUNT_ID")
order by val
DBMS_STATS is scanning the ACCOUNT_ID column on this table and it is taking a long time about it. The only index on the table is on ACCOUNT_ID. It then struck us.
The index is a global index.
INDEX_NAME                     PARTITIONED
------------------------------ -------------
API_XXXXXXXXXX_ACCOUNTS_IND    NO
And that global index actually lacked stats {another feature of DBMS_STATS and rolling up partition stats had led to that}.
INDEX_NAME      TYP UNQ   BL     L_BLKS   DIST_KEYS      CLUSTF     LB_KEY     DB_KEY  LST_ANL
--------------- --- --- ---- ---------- ----------- ----------- ---------- ---------- --------
API_XXXXXXX_    NOR NON
ACCOUNTS_IND
By this time 3 partitions for this table had been processed by my code, taking around 15 minutes each. Incredibly slow.
I did a very quick 0.01% sample size DBMS_STATS.GATHER_INDEX_STATS on that index, which took about 1 minute. As soon as the partition DBMS_STATS.GATHER_TABLE_STATS statement that was in flight finished, the following similar statements on that table’s partitions took under 3 seconds each. I’ll buy a pint for the first person to guess (within 10 minutes) WHEN I collected the global index stats {you can collect from any pub in central London if you give me a few days’ notice 🙂 }.
PARTITION_NAME   NUM_ROWS     BLOCKS LAST_ANALYZED
-------------- ---------- ---------- ---------------------
DY01092008          31461        452 16-FEB-2010 09:51:41
DY02092008          49425        686 16-FEB-2010 10:03:44
DY03092008          54472        719 16-FEB-2010 10:16:31
DY04092008          35762        491 16-FEB-2010 10:30:52
DY05092008          42182        587 16-FEB-2010 10:44:24
DY06092008          21186        312 16-FEB-2010 10:56:13
DY07092008          20898        313 16-FEB-2010 11:12:59
DY08092008         469500       1233 16-FEB-2010 11:13:02
DY09092008         480300        741 16-FEB-2010 11:13:04
DY10092008          15724        223 16-FEB-2010 11:31:01
DY11092008          55671        732 16-FEB-2010 11:31:06
DY12092008         820100       1779 16-FEB-2010 11:31:08
DY13092008          80215       1113 16-FEB-2010 11:31:16
DY14092008          10094        155 16-FEB-2010 11:31:18
DY15092008          10268        158 16-FEB-2010 11:31:20
DY16092008          24578        330 16-FEB-2010 11:31:22
DY17092008          21012        290 16-FEB-2010 11:31:24
DY18092008          22755        318 16-FEB-2010 11:31:27
DY19092008          21276        293 16-FEB-2010 11:31:29
DY20092008          20882        281 16-FEB-2010 11:31:31
DY21092008          24131        323 16-FEB-2010 11:31:34
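For completeness, the emergency index stats gather was nothing fancier than something like this (a sketch; the index name is obfuscated as above):

begin
  dbms_stats.gather_index_stats(ownname          => 'ERIC'
                               ,indname          => 'API_XXXXXXXXXX_ACCOUNTS_IND'
                               ,estimate_percent => 0.01);
end;
/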
I will investigate this oddity further if I get time this week, but the lesson to myself so far is:
Global indexes lacking stats can result in very long stats gathering times for partitions EVEN WHEN YOU DO NOT CASCADE DOWN TO INDEXES.
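If you want to check up front whether you are exposed to this, a query along these lines (just a sketch) lists the global indexes on a given table that have no statistics at all:

select owner, index_name, last_analyzed
from   dba_indexes
where  table_owner = 'ERIC'
and    table_name  = 'W_ACTIVITY_FAILURE'
and    partitioned = 'NO'
and    last_analyzed is null;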
Friday Philosophy – Are short Posts Better Than Long Ones? February 12, 2010
Posted by mwidlake in Friday Philosophy. Tags: Blogging, Humour
7 comments
Making Things Better Makes Things Worse February 11, 2010
Posted by mwidlake in development, Management, Perceptions. Tags: behaviour, perception, system development
12 comments
This could be a Friday Philosophy but I’ve got a couple of those lined up already. Anyway, I am suffering at work at the moment. I’m encountering a phenomenon that I have talked about with Dennis Adams a couple of times. It probably has a proper term, but basically it is the odd situation that when you make things better, you get more complaints. {Dennis, do you know the proper term?}
{Update. Dennis was good enough to link to this paper he wrote on customer feedback}
Anyway, let me explain. You have an application sitting on a database. The screens are slow, the reports take an age to come out, you might even have considerable system instability and unplanned outages. The users are not unhappy. They were unhappy last year. Now they are just cynical and they just expect the system to be slow, unresponsive, flaky. So they do not report any problems.
Then things change. The system gets some much-needed care and attention and now the slowest reports get tuned up, the screens come back faster and fewer spontaneous department-wide coffee breaks are caused by the system crashing. Everything gets better. But not for the help desk; now they start getting calls. “This report is too slow”. “Why can’t I jump straight from the customer screen to the pending orders screen?”. This happens because the users now realise that something can be done. There is a point in complaining as there is a chance their piece of misery could be made better. I certainly went through this many years ago when I inherited a system that crashed every week. No one mentioned it, they just went to tea and complained about it. The first time it crashed after I arrived, I could not believe that no one had called before I realised it had died. Once it had been up solidly for a couple of months, though, when it did crash boy did we hear about it!
Also, when you improve a system and things generally get better, occasionally something will improve and then fall back a bit. Hardly anyone says “thanks” for the initial improvement but they will say something if it improves and then drops back.
That is what is happening for my main client at the moment. The system was not that bad, but it needed some help. Why else would I be there? I’ve been beavering away with the rest of the team and we are making things better, so far mostly at an underlying “getting the overall system straight” level. A new chap has joined to help concentrate on performance and he is really making inroads into specific things, individual reports and processes that need a good sorting out.
So now that things are getting better and performance is generally improving, anything that is still slow is being brought up by the development and support teams. Also, we’ve made a few things slower (I’m sorry, it just happens like that) and they certainly get mentioned.
So, I’m busy. And I could get annoyed at people asking why X is slower when Y and Z are faster. But I don’t, because Dennis explained this counter-intuitive theory to me.
I know things are getting better as people are annoyed as opposed to apathetic 🙂
Format of Date Histograms February 6, 2010
Posted by mwidlake in internals. Tags: data dictionary, histograms, statistics
9 comments
Oracle annoyingly handles dates internally in several formats. Dates are stored in tables as seven bytes, each byte representing century, year-in-century, month, day, hour, minute and second, shifted by different amounts. For the high/low values seen in DBA_TAB_COLUMNS or DBA_TAB_COL_STATISTICS, the value is stored as a RAW, where a two-digit hex string represents the century, year-in-century, month etc as before – see this post on decoding high and low column values and check out the comments for corrections.
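If you want to see those seven bytes for yourself, dump() on a stored date value shows them (a sketch; MY_TABLE and MY_DATE_COL are placeholders for one of your own tables):

select my_date_col, dump(my_date_col) date_bytes
from   my_table
where  rownum = 1;

The dump comes back as Typ=12 Len=7 followed by the seven byte values described above.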
So what about histograms? You might want to know what is in the histogram for a date column in order to better understand the decisions made by the CBO. Below, I pull out the histogram for an example date column {I’ve trimmed the output a little to save space}:
select table_name
      ,column_name
      ,endpoint_value   end_val
      ,endpoint_number  rowcount
from   all_tab_histograms
where  table_name  = 'TEST_TABLE_X'
and    owner       = 'TEST_1'
and    column_name = 'PLACED_DT'
order by endpoint_number

TABLE_NAME      COLUMN_NAME         END_VAL        ROWCOUNT
--------------- ------------- ------------- ---------------
TEST_TABLE_X    PLACED_DT         2,452,258               0
TEST_TABLE_X    PLACED_DT         2,454,334               1
TEST_TABLE_X    PLACED_DT         2,454,647               2
TEST_TABLE_X    PLACED_DT         2,454,737               3
TEST_TABLE_X    PLACED_DT         2,454,820               4
TEST_TABLE_X    PLACED_DT         2,454,867               5
TEST_TABLE_X    PLACED_DT         2,454,929               6
TEST_TABLE_X    PLACED_DT         2,454,981               7
TEST_TABLE_X    PLACED_DT         2,455,006               8
TEST_TABLE_X    PLACED_DT         2,455,024               9
TEST_TABLE_X    PLACED_DT         2,455,039              10
TEST_TABLE_X    PLACED_DT         2,455,050              11
...
TEST_TABLE_X    PLACED_DT         2,455,205              42
TEST_TABLE_X    PLACED_DT         2,455,207              43
TEST_TABLE_X    PLACED_DT         2,455,209              44
TEST_TABLE_X    PLACED_DT         2,455,211              45
TEST_TABLE_X    PLACED_DT         2,455,213              46
TEST_TABLE_X    PLACED_DT         2,455,215              47
TEST_TABLE_X    PLACED_DT         2,455,216              48
TEST_TABLE_X    PLACED_DT         2,455,218              49
TEST_TABLE_X    PLACED_DT         2,455,220              50
TEST_TABLE_X    PLACED_DT         2,455,221              51
TEST_TABLE_X    PLACED_DT         2,455,222              52
TEST_TABLE_X    PLACED_DT         2,455,222              53
TEST_TABLE_X    PLACED_DT         2,455,222              54
TEST_TABLE_X    PLACED_DT         2,455,222              55
TEST_TABLE_X    PLACED_DT         2,455,223              56
TEST_TABLE_X    PLACED_DT         2,455,223              57
TEST_TABLE_X    PLACED_DT         2,455,223              58
TEST_TABLE_X    PLACED_DT         2,455,223              59
...
TEST_TABLE_X    PLACED_DT         2,455,223              69
TEST_TABLE_X    PLACED_DT         2,455,223              70
TEST_TABLE_X    PLACED_DT         2,455,223              71
TEST_TABLE_X    PLACED_DT         2,455,223              72
TEST_TABLE_X    PLACED_DT         2,455,223              73
TEST_TABLE_X    PLACED_DT         2,455,224              74
TEST_TABLE_X    PLACED_DT         2,455,226              75
Well, it looks like a seven-digit number, so it must be representing the date in a similar way to that described above, yes? No. The format is obviously something different {oh come ON Oracle, some internal standards would be nice once in a while! 🙂 }
I pulled out the minimum and maximum values from the DBA_TAB_COLUMNS table and translated them:-
COLUMN_NAME    NUM_DIST LOW_V               HI_V
------------ ---------- ------------------- -------------------
PLACED_DT       375,428 2001-12-13 17:31:38 2010-01-28 23:51:38
So the above low and high values will pretty much match the first and last values in the histogram, which are 2452258 and 2455226. Any guesses? Those two histogram values are very close to each other, only 2968 apart to cover around 9 years of values. 9 times 365…
The histogram is representing the dates in Julian format, ie number of days since 1st Jan 4712BC. As a quick proof of this:
select to_date('2452258','J'),to_date('2455226','J') from dual;

TO_DATE('2452258' TO_DATE('2455226'
----------------- -----------------
14-DEC-2001 00:00 29-JAN-2010 00:00
Well, what do you know. Very close to the 13th Dec 2001 and 28th Jan 2010
This of course makes sense, storing the date as an ascending numeric means it is simple to calculate the width of the range and how many values per day there are.
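For example, the width of the range covered by this histogram falls straight out of a simple date subtraction:

select to_date('2455226','J') - to_date('2452258','J') range_in_days from dual;

which gives the 2968 days mentioned above.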
Imagine how complex it would be to do this if the date was stored in the bizarre way we humans deal with it – an ascending numeric for the year, a number cycling through 12 values for the month and a varying cyclic number for the day of the month. And that is ignoring other calendars in common use.
However, I’m looking at this method of representing the date and something bothers me. There is no allowance for the time portion. Maybe column histograms can’t cope with time? I feel a couple of tests coming on…
ADDITIONAL.
If you have seen the comments, you will know that Jonathan commented to ask if I might have truncated/been selective with the data in creating my test table as he is sure there is a fractional section representing the time portion.
Well, I did say I needed to check further so maybe I would have spotted my error on my own… 🙂
This is the script I use to pull out histogram data (and yes, I trimmed the output from the script before copying it into this post, so it does not quite match the script):
-- chk_hist.sql
-- Martin Widlake 7/4/4
-- pull out the histograms on a table
set pause on pages 32
spool chk_hist.lst
col owner form A8
col table_name form A15
col column_name form a20
col rowcount form 99,999,999,999
col end_val form 99,999,999,999
select owner
      ,table_name
      ,column_name
      ,endpoint_value  end_val
      ,endpoint_number rowcount
from   all_tab_histograms
where  table_name  like upper(nvl('&tab_name','WHOOPS')||'%')
and    owner       like upper('&tab_own')||'%'
and    column_name like upper('&col_name')||'%'
order by table_name,column_name,endpoint_number
/
spool off
clear colu
Hmm wonder what happens if I change the “col end_val form 99,999,999,999” so that it would show fractions…
col end_val form 999,999,999.99999999
TABLE_NAME      COLUMN_NAM               END_VAL     ROWCOUNT
--------------- ---------- --------------------- ------------
TEST_TABLE_X    PLACED_DT     2,452,257.73030093            0
TEST_TABLE_X    PLACED_DT     2,454,333.76546296            1
TEST_TABLE_X    PLACED_DT     2,454,647.32561343            2
TEST_TABLE_X    PLACED_DT     2,454,737.25017361            3
TEST_TABLE_X    PLACED_DT     2,454,820.02204861            4
TEST_TABLE_X    PLACED_DT     2,454,866.98009259            5
TEST_TABLE_X    PLACED_DT     2,454,928.66848380            6
TEST_TABLE_X    PLACED_DT     2,454,980.94815972            7
TEST_TABLE_X    PLACED_DT     2,455,005.68413194            8
TEST_TABLE_X    PLACED_DT     2,455,023.67142361            9
TEST_TABLE_X    PLACED_DT     2,455,039.03236111           10
TEST_TABLE_X    PLACED_DT     2,455,050.39246528           11
Ahhhhh……
Thanks Jonathan 🙂
It is still a Julian date, but with the time as a fraction as he said.
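So a full endpoint value translates back, fraction and all, along these lines (a sketch using the first endpoint from the output above):

select to_char(to_date('2452257','J') + .73030093
              ,'YYYY-MM-DD HH24:MI:SS') low_v
from   dual;

which brings back 2001-12-13 17:31:38, the LOW_V we got from DBA_TAB_COLUMNS earlier.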
Friday Philosophy – Software being Good is not Good Enough February 5, 2010
Posted by mwidlake in Friday Philosophy, Uncategorized. Tags: behaviour, knowledge
4 comments
In a previous Friday Philosophy on In Case of Emergency I mention that something being simply a “Good Idea” with technology is not good enough. Even being a GREAT idea is not enough. It also has to be:
- Easy and simple to use. After all, using that funny stick thing behind your steering wheel in a car, to indicate which direction you are turning, seems to be too much of an effort for many people. If installing some bit of software or running a web page is more than a little bit of effort, most people will not bother.
- Quick. No one has patience anymore, or spare time. This will probably be the fourth year in a row I do not plant any vegetables in our garden, as I need to spend a day or two clearing and digging over the spot for said veg. You can’t beat home-grown veg. Similarly, I won’t use a web page that takes as long to load as it does to plant a carrot seed.
- Known about. There could be a really fantastic little program Out There that allows you to take a screen shot, add a title and comment and pastes it straight into a document for you, converting to half a dozen common formats on the fly. But I do not know about it. { ScreenHunter is pretty good, I have to say, and when I have shown it to people a lot of them like it}.
- Popular. This is not actually the same as “known about”. For a stand-alone application to be good for you, you just need to know where it exists. Like maybe a free building architecture package. Whether thousands of people use it is moot; so long as you can get your extension drawings done in it with ease, that makes it great. But for something that relies on the community, like a service to rate local eateries, unless lots of people use it and add ratings, well, who cares? There are dozens (if not hundreds) of such community “good ideas” started every day but unless enough people start to use them, they will fizzle out, as the vast majority do.
Point 4 is highly relevant to “In Case Of Emergency” as it is simple, quick and relatively known about. It just needs to be ubiquitous.
I became very aware of point 3 a few years ago and also of the ability for very clever people to be sometimes very stupid when it comes to dealing with their fellow humans.
I was working on a database holding vast quantities of DNA information. If you don’t know, DNA information is basically represented by huge long strings of A, C, T and G. So something like AACTCGTAGGTACGGGTAGGGGTAGAGTTTGAGATTGACTGAGAGGGGGAAAAATGTGTAGTGA…etc, etc, etc. These strings are hundreds, thousands, hundreds of thousands of letters long. And scientists like to search against these strings. Of which there are millions and millions. Not for exact matches mind, but kind-of-similar, fuzzy matches, where for example 95% of the ACTGs match but some do not. It’s called a BLAST match.
Anyway, suffice to say, it takes a lot of compute power to do this and a fair amount of time to run. There was a service in America which would allow you to submit a BLAST query and get the answer in 20 minutes or so {I have no idea how fast it is now}.
Some extremely clever chaps I had the pleasure of working with came up with a faster solution. Same search, under 5 seconds. Now that is GREAT. We put together the relevant hardware and software and started the service. Now I thought it went beyond Good or even Great. It was Amazing (and I mean it, I was amazed we could do a fuzzy search against a billion such strings in 2, 3 seconds using a dozen or so PC-type servers).
No one used it. This was because almost no one knew about it and there was already this slow service people were used to using. People who used the old service never really thought to look for a new one and the chances were they would not have found ours anyway.
I pushed for more to be made of this new, faster service, that it should be advertised to the community, that it should be “sold” to people (it was free to use, by “sold” I mean an attempt made to persuade the scientific community it was worth their while investigating). The response I was given?
“If the service is worth using, people will come and use it”.
No they won’t. And indeed they didn’t. It was, I felt, a stupid position to take by an incredibly intelligent person. How were people to know it existed? Were they just supposed to wake up one morning knowing a better solution was out there? The internet pixies would come along in the night and whisper about it in your ear? In the unlikely event of someone who would be interested in it just coming across it, were they then going to swap to using it? After all, no one else seemed to know about it and it was 2 orders of magnitude faster, suspiciously fast; how could it be any good?
The service got shut down as it was just humming in the corner consuming electricity. No one knew it existed, no one found it, no one came. I can’t but help wonder how much it could have helped the scientific community.
There must be thousands of other “failed” systems across the IT world that never took off just because the people who could use it never knew it existed. Depressing huh?