
Automatic Statistics Gathering Fails #3 July 29, 2009

Posted by mwidlake in internals, performance.

I’m stealing a few minutes from my lunchtime to finish off this thread on the automated statistics gathering failing.

There is another way to spot what the automated stats gathering choked on. It is not that reliable, but it does involve digging a little into Oracle’s internals, which is fun.

From 10.2 onwards, when you gather stats on a segment, either directly or via a schema, database or the automatic stats collection, Oracle automatically stores the stats that are replaced {I say 10.2 as I am not sure that segment-level gather_table/index_stats are automatically backed up like this in earlier versions. I have some old notes saying it seemed not, but I am not sure if this was 9i, 10.1 or me just getting confused at the time. If anyone knows or can check back, let me know 🙂 }

This means you can get the stats back using the dbms_stats.restore_xxxxxxxx_stats procedures, eg:

exec dbms_stats.restore_table_stats(ownname=>user,tabname=>'TEST_P',as_of_timestamp=>systimestamp-1)

This will restore the stats of my table TEST_P to what they were at this time yesterday. You do not need to create a stattab table and store the previous stats manually.
I’m not going to blog any more right now about this handy feature, as the “PL/SQL Packages and Types Reference” manual will tell you what you need, but I will comment that by default you can only go back 31 days.
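As a rough illustration of that 31-day limit, here is a small Python model of the check you would make before picking an as_of_timestamp. It assumes history is kept for exactly the retention period; in the database itself, DBMS_STATS.GET_STATS_HISTORY_RETENTION returns the retention in days and DBMS_STATS.GET_STATS_HISTORY_AVAILABILITY returns the oldest timestamp you can restore to. The helper name can_restore is made up for this sketch.

```python
from datetime import datetime, timedelta

# Default stats history retention mentioned above. It can be changed with
# DBMS_STATS.ALTER_STATS_HISTORY_RETENTION, so treat 31 as an assumption.
DEFAULT_RETENTION_DAYS = 31


def can_restore(as_of: datetime, now: datetime,
                retention_days: int = DEFAULT_RETENTION_DAYS) -> bool:
    """Hypothetical helper: True if as_of lies inside the retained history,
    ie between (now - retention_days) and now."""
    oldest_available = now - timedelta(days=retention_days)
    return oldest_available <= as_of <= now


now = datetime(2009, 7, 29, 12, 0)
print(can_restore(now - timedelta(days=1), now))   # True: like systimestamp-1
print(can_restore(now - timedelta(days=40), now))  # False: older than 31 days
```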

Oracle gives you a view to see the history of stats stored, DBA_TAB_STATS_HISTORY:-

desc dba_tab_stats_history
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                                              VARCHAR2(30)
 TABLE_NAME                                         VARCHAR2(30)
 PARTITION_NAME                                     VARCHAR2(30)
 SUBPARTITION_NAME                                  VARCHAR2(30)
 STATS_UPDATE_TIME                                  TIMESTAMP(6) WITH TIME ZONE

select owner,table_name, partition_name,stats_update_time
from dba_tab_stats_history
where table_name='TEST_P' and owner=user;

OWNER    TABLE_NAME PARTITION_NAME  STATS_UPDATE_TIME
-------- ---------- --------------- --------------------------------
WIDLAKEM TEST_P                     28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_10K          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_30K          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_MAX          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_40K          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_20K          28-JUL-09 +00:00
WIDLAKEM TEST_P                     28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_MAX          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_40K          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_30K          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_20K          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_10K          28-JUL-09 +00:00
WIDLAKEM TEST_P                     28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_MAX          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_40K          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_30K          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_20K          28-JUL-09 +00:00
WIDLAKEM TEST_P     ID_10K          28-JUL-09 +00:00

You can see that I probably gathered stats on my test table 3 times yesterday, and each time I gathered at a granularity of ALL, ie global and partition stats. Note also that all the partitions and the table get the same timestamp. I think this is because Oracle records the timestamp as when the stats for the set of segments were swapped into the data dictionary {note, not when the gather started}.
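The reasoning above (three sets of rows, each set sharing one save time, so three gathers at granularity ALL) can be written down as a small Python sketch. The listing only shows the date, so the times below are invented purely for illustration; the partition names come from the output. It also assumes, as the paragraph does, that every segment in one gather gets the same save time.

```python
from collections import defaultdict
from datetime import datetime

# Partition names from the DBA_TAB_STATS_HISTORY output above; None stands
# for the table-level (global) row, which has no PARTITION_NAME.
PARTS = [None, "ID_10K", "ID_20K", "ID_30K", "ID_40K", "ID_MAX"]

# Hypothetical save times: the real output only shows 28-JUL-09.
RUN_TIMES = [datetime(2009, 7, 28, 9, 15),
             datetime(2009, 7, 28, 11, 2),
             datetime(2009, 7, 28, 16, 40)]

rows = [(part, t) for t in RUN_TIMES for part in PARTS]


def gathers_by_savtime(rows):
    """Group history rows by save time. If every segment in a gather is
    stamped with the moment the new stats were swapped in, each group
    corresponds to one gather."""
    groups = defaultdict(list)
    for part, savtime in rows:
        groups[savtime].append(part)
    return dict(groups)


gathers = gathers_by_savtime(rows)
print(len(gathers))  # 3
for savtime, parts in sorted(gathers.items()):
    # Granularity ALL: one global row plus one row per partition.
    print(savtime, None in parts, len(parts))
```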

That’s quite useful. However, there is no similar view for indexes. That’s a shame, as you can recover index stats. And this view says nothing about what the stats actually were… Is there more to be found? Let’s go and see what that view is looking at.

Enter value for vw_name: dba_tab_stats_history
OWNER    VIEW_NAME                      TEXT_LENGTH
-------- ------------------------------ -----------
SYS      DBA_TAB_STATS_HISTORY                  876
select u.name, o.name, null, null, h.savtime
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 2 and o.owner# = u.user#
  union all
  -- partitions
  select u.name, o.name, o.subname, null, h.savtime
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_tab_history h
  where  h.obj# = o.obj# and o.type# = 19 and o.owner# = u.user#
  union all
  -- sub partitions
  select u.name, osp.name, ocp.subname, osp.subname, h.savtime
  from  sys.user$ u,  sys.obj$ osp, obj$ ocp,  sys.tabsubpart$ tsp,
        sys.wri$_optstat_tab_history h
  where h.obj# = osp.obj# and osp.type# = 34 and osp.obj# = tsp.obj# and
        tsp.pobj# = ocp.obj# and osp.owner# = u.user#
  union all
  -- fixed tables
  select 'SYS', t.kqftanam, null, null, h.savtime
  from  sys.x$kqfta t, sys.wri$_optstat_tab_history h
  where t.kqftaobj = h.obj#

Now that is interesting. Let’s go look at that table:

desc sys.wri$_optstat_tab_history
 Name                                      Null?    Type
 ----------------------------------------- -------- ---------------------------
 OBJ#                                      NOT NULL NUMBER
 SAVTIME                                            TIMESTAMP(6) WITH TIME ZONE
 FLAGS                                              NUMBER
 ROWCNT                                             NUMBER
 BLKCNT                                             NUMBER
 AVGRLN                                             NUMBER
 SAMPLESIZE                                         NUMBER
 ANALYZETIME                                        DATE
 CACHEDBLK                                          NUMBER
 CACHEHIT                                           NUMBER
 LOGICALREAD                                        NUMBER
 SPARE1                                             NUMBER
 SPARE2                                             NUMBER
 SPARE3                                             NUMBER
 SPARE4                                             VARCHAR2(1000)
 SPARE5                                             VARCHAR2(1000)
 SPARE6                                             TIMESTAMP(6) WITH TIME ZONE

Some of those columns look interesting. Unfortunately, CACHEDBLK onwards are empty, even in version 11 {but it shows that Oracle have built in the ability to use those yet-to-be-used columns about average cached blocks and cache hit ratios for segments, which you might have spotted in, eg, DBA_TAB_STATISTICS}.
Could there be an index version? Of course there is:-

desc sys.wri$_optstat_ind_history
 Name                                      Null?    Type
 ----------------------------------------- -------- ---------------------------
 OBJ#                                      NOT NULL NUMBER
 SAVTIME                                            TIMESTAMP(6) WITH TIME ZONE
 FLAGS                                              NUMBER
 ROWCNT                                             NUMBER
 BLEVEL                                             NUMBER
 LEAFCNT                                            NUMBER
 DISTKEY                                            NUMBER
 LBLKKEY                                            NUMBER
 DBLKKEY                                            NUMBER
 CLUFAC                                             NUMBER
 SAMPLESIZE                                         NUMBER
 ANALYZETIME                                        DATE
 GUESSQ                                             NUMBER
 CACHEDBLK                                          NUMBER
 CACHEHIT                                           NUMBER
 LOGICALREAD                                        NUMBER
 SPARE1                                             NUMBER
 SPARE2                                             NUMBER
 SPARE3                                             NUMBER
 SPARE4                                             VARCHAR2(1000)
 SPARE5                                             VARCHAR2(1000)
 SPARE6                                             TIMESTAMP(6) WITH TIME ZONE

I’ve written a couple of scripts to extract data from these underlying tables, including the extra data you can grab. Use these links to download the table script and the index script.
Here is the index script:-

-- show_ish_full
-- Martin Widlake 14/4/08
-- this gets the index stats history - there is no index equivalent of dba_tab_stats_history
-- that I can see in 10.2 (or 11)
col owner form a12
col ind_full_name form a40
col stat_upd_time form a15
col blevel form 99 head bl
select owner,ind_full_name,
to_char(stat_upd_time,'MMDD HH24:MI:SS') stat_upd_time
,numrows,numleafs,blevel,dist_kys,lf_p_ky, dbl_p_ky,clufac,samp_size,obj_hash
,greatest(nvl(samp_size,1),1)/greatest(nvl(numrows,1),1) samp_pct
 from (
select u.name owner, o.name ind_name,o.name ind_full_name
, h.savtime stat_upd_time
,h.rowcnt numrows,h.leafcnt numleafs, h.blevel blevel,h.distkey dist_kys
,h.lblkkey lf_p_ky,h.dblkkey dbl_p_ky,h.clufac clufac, h.samplesize samp_size,h.obj# obj_hash
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_ind_history h
  where  h.obj# = o.obj# and o.type# = 1 and o.owner# = u.user#
  union all
  -- partitions
  select u.name owner, o.name ind_name, o.name||'-'||o.subname ind_full_name
, h.savtime stat_upd_time
,h.rowcnt numrows,h.leafcnt numleafs, h.blevel blevel,h.distkey dist_kys
,h.lblkkey lf_p_ky,h.dblkkey dbl_p_ky,h.clufac clufac, h.samplesize samp_size,h.obj# obj_hash
  from   sys.user$ u, sys.obj$ o, sys.wri$_optstat_ind_history h
  where  h.obj# = o.obj# and o.type# = 20 and o.owner# = u.user#
  union all
  -- sub partitions
  select u.name owner, osp.name ind_name, osp.name||'-'||ocp.subname||'='|| osp.subname ind_full_name
, h.savtime stat_upd_time
,h.rowcnt numrows,h.leafcnt numleafs, h.blevel blevel,h.distkey dist_kys
,h.lblkkey lf_p_ky,h.dblkkey dbl_p_ky,h.clufac clufac, h.samplesize samp_size,h.obj# obj_hash
  from  sys.user$ u,  sys.obj$ osp, sys.obj$ ocp,  sys.indsubpart$ tsp,
        sys.wri$_optstat_ind_history h
  where h.obj# = osp.obj# and osp.type# = 35 and osp.obj# = tsp.obj# and
        tsp.pobj# = ocp.obj# and osp.owner# = u.user#
) where ind_name like upper(nvl('&indname','whoops')||'%')
and owner like upper('&indown')||'%'
order by owner,ind_name,stat_upd_time desc;
clear colu

I’ll leave you to play with these as you wish, especially as I ran out of time to finish this blog an hour ago and now owe my current employer some of my weekend in return, but I’ll finish off with a use for the scripts.

One thing to bear in mind: these tables hold the stats that were replaced, not the stats that were created. So they hold historical information.

If you read the earlier posts, you may remember that the automatic stats job can choke and fail on extremely large segments. Well, sometimes {and I have to admit, only sometimes} you can spot which segments it failed on. That is because, at the end of the window when the job gets stopped, it attempts to clean up after itself. It appears to check for stats that have changed during the run on segments it failed to process correctly, and resets them by copying the old stats back. So you sometimes see this:-

OWNER        TAB_FULL_NAME                            STAT_UPD_TIME           NUMROWS    NUMBLKS AVG_RL  SAMP_SIZE OBJ_HASH SAMP_PCT
------------ ---------------------------------------- -------------------- ---------- ---------- ------ ---------- -------- --------
TABOWNER1    HUGETABLE4                               0522 06:00:01        1473990266   27319245    128    5660375    54714     .004
TABOWNER1    HUGETABLE4                               0522 05:48:22        1327119100   24544873    128   13271191    54714     .010

In the above case, you can see that this massive table had stats set at 5:48am and then again at 6:00:01am. That second record is the automated stats job swapping the last known good stats back into place. That is the segment it choked on.
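Spotting that pattern by eye is easy for one table, but the same heuristic can be sketched in Python: look for an object with a save just after the window closes, preceded by another save during the window. This only mirrors the reasoning above; it is not Oracle logic. The year (the listing shows MMDD only) and the five-minute slack are assumptions.

```python
from datetime import datetime, time, timedelta

# (obj#, save time) pairs from the output above. The listing shows MMDD
# only, so the year 2009 is an assumption.
history = [
    (54714, datetime(2009, 5, 22, 6, 0, 1)),
    (54714, datetime(2009, 5, 22, 5, 48, 22)),
]

WINDOW_CLOSE = time(6, 0)            # default weeknight window ends at 06:00
WINDOW_LENGTH = timedelta(hours=8)   # 22:00 to 06:00
SLACK = timedelta(minutes=5)         # hypothetical tolerance after the close


def suspected_resets(history):
    """Return obj#s with a save just after the window closed that follows
    another save for the same object made during the window."""
    stamps_by_obj = {}
    for obj, savtime in history:
        stamps_by_obj.setdefault(obj, []).append(savtime)
    suspects = []
    for obj, stamps in sorted(stamps_by_obj.items()):
        stamps.sort()
        for earlier, later in zip(stamps, stamps[1:]):
            close = datetime.combine(later.date(), WINDOW_CLOSE)
            opened = close - WINDOW_LENGTH
            if close <= later <= close + SLACK and opened <= earlier < close:
                suspects.append(obj)
                break
    return suspects


print(suspected_resets(history))  # [54714]
```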

How do you stop the automatic stats job choking on this segment? You do a manual gather, of course. You will see that the job above attempted a sample of about 1% (SAMP_PCT is a fraction, so .010 is 1%) that almost worked at 05:48. The value it swapped back is .004, about 0.4%, which was the sample size of the manual gather I did a few days earlier to get over the last choke {I also tweaked the histograms I gathered to save time}. I had not yet finished converting the manual fix to an automated one when I recorded the above.
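The SAMP_PCT column in these scripts is easy to misread, so here is the same expression in Python applied to the two HUGETABLE4 rows. It is only a restatement of the SQL in the scripts; samp_pct is a made-up function name.

```python
def samp_pct(samp_size, num_rows):
    """Python version of the scripts' SAMP_PCT expression:
    greatest(nvl(samp_size,1),1) / greatest(nvl(numrows,1),1).
    The result is the fraction of rows sampled, not a percentage."""
    sample = max(1 if samp_size is None else samp_size, 1)
    rows = max(1 if num_rows is None else num_rows, 1)
    return sample / rows


# The two HUGETABLE4 rows above.
later = samp_pct(5660375, 1473990266)
earlier = samp_pct(13271191, 1327119100)
print(round(later, 3), f"{later:.2%}")      # 0.004 0.38%
print(round(earlier, 3), f"{earlier:.2%}")  # 0.01 1.00%
```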

Why do I think that the automatic job sets the timestamp in the WRI$_OPTSTAT_xxx_HISTORY tables when it swaps in the new stats and not when it starts? Because of the time of the entries after a large object has been analysed. Why do I think it checks for stats that have changed since the start of the run and replaces them, rather than keeping track of the information as it goes? Because, when I first came across this choking issue, I was verifying the problem after 10pm at night. When I realised there was an issue at around 1am, I started manually gathering stats. It took me until 4am; I checked they were in place and went to bed. Imagine my bad temper the next day when I found out that the automatic job had gone and re-wound most of the stats I had gathered, resetting them to what they had been at 10pm the previous night. It was a seriously bad temper.

Addition – I’ve had a comment that may or may not be spam, saying they do not understand “the last bit” (if you are a real person, molamola, I apologise). Reading it again, it is maybe not clear, so this is what happened and why I was angry 🙂

1) I had realised the auto stats job was failing overnight.
2) I logged on from home at just before 10pm and watched the system
3) At 10pm the auto stats job started and collected stats on some small tables and indexes.
4) Around 11 to 11:30pm I realised it was not progressing.
5) From 11:30pm to 3am, I created my own gather_table_stats and gather_index_stats statements and ran them.
6) at around 4am I went to bed, having checked stats were gathered and execution plans were good.
7) Got to work around 9am, a bit tired.
8) Found lots of performance issues due to old stats.
9) Shouted randomly at people and got angry, started re-gathering stats.
10) Worked out that at 6am, when the auto stats job finished, it reset all the tables/indexes IT had intended to gather stats on, but had failed to, back to the stats they had at 10pm the previous night when it started,
ie very old and bad stats.
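The behaviour worked out in step 10 can be captured in a toy Python model. This is only the inferred behaviour described above, not Oracle's code; the object names and the string values standing for "which stats are in place" are hypothetical.

```python
def end_of_window_cleanup(stats_now, stats_at_start, intended, gathered):
    """Toy model of the inferred behaviour: any object the job meant to
    gather but did not finish is reset to the stats it had when the job
    started, overwriting anything gathered manually in between."""
    result = dict(stats_now)
    for obj in intended - gathered:
        result[obj] = stats_at_start[obj]
    return result


# Hypothetical objects; the values stand for 'which stats are in place'.
at_10pm = {"SMALL_T": "old", "BIG_T1": "old", "BIG_T2": "old"}
# The job gathers SMALL_T, then stalls; the DBA gathers BIG_T1/BIG_T2 by hand.
at_4am = {"SMALL_T": "auto-new", "BIG_T1": "manual-new", "BIG_T2": "manual-new"}

at_6am = end_of_window_cleanup(at_4am, at_10pm,
                               intended={"SMALL_T", "BIG_T1", "BIG_T2"},
                               gathered={"SMALL_T"})
print(at_6am)  # {'SMALL_T': 'auto-new', 'BIG_T1': 'old', 'BIG_T2': 'old'}
```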


Oracle Exadata – does it work? July 26, 2009

Posted by mwidlake in development, VLDB.

Does Oracle Exadata work? 

That is a tricky question as, unlike most Oracle database features, you can’t download it and give it a test.

You can try out partitioning, bigfiles, Oracle Text, InterMedia {sorry, Multimedia}, all sorts of things by downloading the software. You can even try out RAC pretty cheaply, using either VMware or a couple of old machines and Linux, and many hundreds of Oracle techies have. The conclusion is that it works. The expert conclusion is “yes it works, but is it a good idea? It depends {my fees are reasonable}” :-).

I digress; this ability to download and play allows Oracle technophiles to get some grounding in these things, even if their employer is not currently looking to implement them {BTW how often do you look at something in your own private time that your company will not give you bandwidth for – only to have them so very interested once you have gained a bit of knowledge? Answers on a postcard please…}.

Exadata is another beast, as it is hardware. I think this is an issue.

I was lucky enough to get John Nangle to come and present on Exadata at the last UKOUG Management and Infrastructure meeting, having seen his talk at a prior meeting. John gave a very good presentation and interest was high. I have also seen Joel Goodman talk {another top presenter}, so I understand the theory. I have to say, it looks very interesting, especially in respect of what is, perhaps, my key area of personal expertise: VLDB, databases of tens of terabytes.

I don’t plan to expand here on the concepts or physical attributes of Exadata too much; it is enough to say that it appears to gain its advantage via two main aspects:-

  • Intelligence is sited at the “disc controller” level {which in this case is a cheap 4-cpu HP server, not really the disc controller}, which basically pre-filters the data coming off storage so only the data that is of interest is passed back to the database. This means that only blocks of interest are chucked across the network to the database.
  • The whole system is balanced. Oracle have looked at the CPU-to-IO requirements of data warehouses and decided what seems to be a good balance; they have implemented fast, low-latency IO via InfiniBand and made sure there are a lot of network pipes from the storage up through the stages to the database servers. That’s good.
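To make the first point concrete, here is a toy Python model comparing filtering at the database with filtering at the storage servers. It is not Oracle code and says nothing about how Exadata is actually implemented; the rows, the eight "storage servers" and the EU predicate are all invented. The point is only that the answer is the same while far fewer rows are shipped.

```python
# Hypothetical data: 8 storage servers, each holding 1,000 rows.
# Every tenth row matches the query's predicate.
CELLS = [[{"id": i, "region": "EU" if i % 10 == 0 else "US"}
          for i in range(c * 1000, (c + 1) * 1000)]
         for c in range(8)]


def wanted(row):
    """The query's (made-up) predicate."""
    return row["region"] == "EU"


def filter_at_database(cells):
    """Conventional path: every row crosses the network, then the
    database throws most of it away. Returns (result, rows_shipped)."""
    shipped = [row for cell in cells for row in cell]
    return [row for row in shipped if wanted(row)], len(shipped)


def filter_at_storage(cells):
    """Offloaded path: each storage server filters its own rows and
    ships only the matches. Returns (result, rows_shipped)."""
    shipped = [row for cell in cells for row in cell if wanted(row)]
    return shipped, len(shipped)


db_result, db_shipped = filter_at_database(CELLS)
st_result, st_shipped = filter_at_storage(CELLS)
print(db_result == st_result, db_shipped, st_shipped)  # True 8000 800
```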

The end result is that there is lots of fast, balanced IO from the storage layer to the database and only data that is “of interest” is passed up to the database.

It all sounds great in theory, and Oracle Corp bandy around figures of up to 100 times (not 100%, 100 times) speedup for data warehouse activity, with no need to re-design your implementation. At the last UKOUG M&I meeting there was also someone who had tried it in anger, and they said it was 70 times faster. Unless this was a clever plant by Oracle, that is an impressive, independently reported increase.

I am still very interested in the technology, but still sceptical. After all, RAC can be powerful, but in my experience it is highly unlikely that by dropping an existing system onto RAC you will get any performance (or high availability) increase. In fact, you are more likely to just make life very, very difficult for yourself. RAC works well when you design your system up-front with the intention of working on the same data on the same nodes. {Please note, this is NOT the oft-cited example of doing different work types on different nodes, ie data load on one node, OLTP on another and batch work on a third. If all three are working on the same working set, you could well be in trouble. You are better off having all load, OLTP and batch for one set of data on one node, OLTP-load-batch for another set of data on another node, etc, etc, etc. If your RAC system is not working well, this might be why}. Similarly, partitioning is an absolutely brilliant feature – IF you designed it up-front into your system. I managed to implement a database that has scaled to 100 TB with 95% of the database read-only {so greatly reducing the backup and recovery woes} as it was designed in from the start.

Where was I? Oh yes, I remain unconvinced about Exadata. It sounds great, and it sounds like it will work for data warehouse systems where full table scans are used to get all the data and the Oracle instance then filters most of the data out. Now the storage servers will do that for you. You can imagine how, instead of reading 500GB of table off disc, across the network and into Oracle memory and then filtering it, the eight disc servers will do the filtering and send a GB of data each up to the database. It has to be faster.
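The arithmetic behind that example, written out in Python using only the figures in the paragraph. Note that in this model the storage servers still read the full 500GB off disc; what shrinks is the data crossing the network and being filtered by the database.

```python
# Figures from the paragraph above.
table_gb = 500        # table read off disc
storage_servers = 8
gb_sent_each = 1      # filtered data each storage server sends up

gb_over_network = storage_servers * gb_sent_each
print(gb_over_network)              # 8
print(table_gb / gb_over_network)   # 62.5
```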


What if you have some OLTP activity and some of the data is in the SGA? That is what stops full table scans working at DB_FILE_MULTIBLOCK_READ_COUNT levels of efficiency.

What happens if some of the table is being updated by a load process at the same time?

What happens if you want some of the data hosted under ASM with full Exadata performance brilliance, but you have several tens of TB of less-active data you just want to store on cheap SATA RAID 5 discs as well? How does Exadata integrate then?

You can’t test any of this out. I did email and ask John about this inability to play with and discover stuff about a solution that is hardware and very expensive. And he was good enough to respond, but I think he missed the point of my question {I should ask again, he is a nice chap and will help if he can}. He just said that the DBA does not have to worry about the technology, it just works. There are no special considerations.

Well, there are. And I can’t play with it as I would need to buy a shed load of hardware to do so. I can’t do that, I have a wife and cat to feed.

So even though Exadata sounds great, it is too expensive for anyone but large, seriously interested companies to look into.

And I see that as a problem. Exadata experts will only come out of organisations that have invested in the technology, or out of Oracle itself. And I’m sorry, I’ve worked for Oracle, and as an employee you are always going to put the best face forward. So skills in this area are going to stay scarce unless it takes off, and I struggle to see how it will take off unless it is not just as good as Oracle says, but better than Netezza and Teradata by a large margin.

Does anyone have an Exadata system I can play on? I’d love to have a try on it.

Friday Philosophy – Simply Complex July 24, 2009

Posted by mwidlake in development, Management.

Piet de Visser is an ardent champion of simple solutions within the Oracle arena – and I completely agree with his outlook on this. {Almost}.

Simple solutions usually win long-term as they are easy to understand, easy to look after and easy to change. OK, you may not be getting the absolute best performance you can, you may not be doing the job as completely as you can, but if it is a simple solution then you probably implemented it easily and quickly. This probably also means it cost you not-a-lot in person time and brain power, so you can personally afford to blow it away and do it again. In fact, with all that saved time, money and brain power you can probably afford to create a couple more simple solutions to other problems too.

Perhaps the only thing you are losing out on is the kudos of having been smart enough to come up with something very cunning and complicated to impress everyone with. You’ll get over it; people don’t stay impressed for very long, especially when your mind-bendingly cunning and complicated solution melts into a heap {as a couple of mine have, ho-hum}.

Is your chosen solution simple? I give you a simple test – Explain it to someone.

  • If you can explain it to your colleagues, it is probably quite simple. If you can’t, either the solution is not so simple or your colleagues are.
  • If you can explain it to your boss then it is probably an excellent example of simplicity.
  • If you can explain it to your mum, you have blindingly clever simplicity and your work here is done.

Oh, you remembered that I said I almost agreed with Piet.

I can think of four reasons for not implementing a simple solution. I have listed them in ascending order of being a good reason (best last). And, unfortunately, also in descending order of likelihood (most likely first).

  • You were sold a complex solution as complex solutions earn the vendor more money.
  • You needed to impress someone with your incredible technical skills {this could be your peers, quite often it is your boss, but for most of us it is usually ourselves, let’s be honest 🙂 }
  • You really do need to do something complex for a very valid business reason, like 99.999% availability {eg for that system people can ring up as they have a cough but are convinced they are dying of “swine flu”}.
  • You are creating a generic solution.

What do I mean by the last point? I mean that your one solution has to work for a lot of different situations or usages. The system has to work for a large range of inputs or do a lot of things.

The whole Oracle database is {in my opinion} a generic solution. A vast and complex one to be sure, but it is intended to work for a little database on a desktop keeping track of the parts in your workshop, an integrated system in eg medical or scientific robots keeping track of thousands of samples, vast data stores of telephone calls so you can do your bills, huge on-line web-based shopping systems, a front end to image or video stores… the list is a big one. You might need a little Oracle database to hold the list in.

With version 10, Oracle Corp made a big thing about the database looking after itself. The database was a generic, self-managing, handle-anything solution and you would not need those pesky DBA types to keep the beast working for much longer.

That is why it is so complex and, no matter how much some of us complain {as I myself often do}, it has to be, and it is getting more complex with every version. I’ll take my current favourite topic, stats gathering, as an example.

Back with the rule based optimiser, you had a set of rules. 15-16 I think {I’ll let you lot check – google “rule based optimizer oracle -burleson”}. You learnt the rules, understood them and you were home and dry. Except that the RBO could not cope with data-specific oddities, or with the way different access methods suit different table sizes and index selectivity.

So Oracle added statistics and the cost based optimiser. To make use of the cost based logic, a load of mathematical calculations and considerations had to be added (and continue to be added), based on statistics you had to collect at the right time and at the right level, which many sites did not. People complained the CBO “just did not work”, which it did not if you didn’t collect the stats {and sometimes even when you had}, but it was doing a lot to cope with a wider range of systems automatically. Histogram stats now allowed skewed data to be coped with, in most situations.

So they introduced a job to do it for you, but it had to detect the right level and type of statistics to gather on all objects, be they tiny tables, massive tables, tables with skewed data, indexes, global indexes on partitioned tables… And yes, once again, it is another complexity you have to get to grips with if it does not fit your particular system demands.

I’m sure you can argue with me over the details, but I think I’m making a valid point that every time a system {be it Oracle or an O/S} is modified to cope automatically with more scenarios, it becomes a more complex system. You now need a DBA with several manuals welded to their brain to get the best out of this thing, not less DBA effort as was claimed at the release of 10 {did they make the same claims for 11? I was too busy keeping systems running to really notice at the time}.

Maybe the answer is to stop using generic systems like Oracle and swap them out for a mixture of spreadsheets, MySQL-type simplistic databases, Netezza-type systems for data warehouses, hand-cut ‘C’ applications for handling image stores, Java apps and flat files for web sites… Only you are going to have to learn how to use all those things to create all the systems you need to create.

You are only going to manage this if you create those dozens of systems as simple ones.

Fuggles was very simple. The lights were on but nobody was home. But that was fine as all she wanted was to sit on you and be scratched.


Automated Statistics Gathering Silently Fails #2 July 23, 2009

Posted by mwidlake in performance.


My first posting about the automated stats gathering job on 10/11g failing is here. I show how it can fail for really large segments and can go on failing night after night, the little so-and-so.

Today I am going to give you some pointers in spotting what the job choked on and how to deal with it {damn, ran out of time – see next post}.

How can you tell what it is failing on? Well, you can use OEM to look at the latest run of the automated stats gathering job and what was processed. It probably won’t tell you what it failed on; it did not in the version I tried (early 10.2) and no one has pinged me to say it now does. Look back at the previous posting for a screenshot if you want help finding the page in OEM.

However, you can check what was analyzed last night and compare it with the list of objects shown. Concentrate on tables or partitions you know to be big: the job is not going to choke on a 14-row EMP table, more like a half-billion-row ORDERS table.

select owner,table_name,last_analyzed,sample_size
from dba_tables
where last_analyzed between sysdate-.5 and sysdate-.4
order by table_name
/
OWNER          TABLE_NAME                     LAST_ANALYZED     SAMPLE_SIZE
-------------- ------------------------------ ----------------- -----------
ODS            XXXX_TIM_RF5                   16-JUL-2009 00:09          19
ODS            H_XXXXXXXXXXXXXXX_TYPES        16-JUL-2009 00:10      272838
ODS            H_TXXXXXXXXXXXXXXXNER          16-JUL-2009 00:10       85184
ODS            H_XXXXXXXXXXXXXXXCTIONS        16-JUL-2009 00:04      165999
ODS            STGXXXXXXXXXXXXXXXCTIONS       16-JUL-2009 00:51          10
ODS            TBL_XXXXXXXX                   16-JUL-2009 01:06     3499693
ODS            WRK_XXXXXX_FLAG                16-JUL-2009 01:09      173340

If a segment shows in OEM as being processed by the automated job last night but its LAST_ANALYZED date is prior to the run {so it is not in the list you just pulled out}, that is what the job choked on.
It is often quicker to make an educated guess and check those objects that you know are big and what their LAST_ANALYZED date is.
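The comparison described above boils down to a date check on each object the job claims to have processed. Here is a sketch in Python with a hypothetical run start and object names; in practice the lists would come from OEM and from DBA_TABLES.LAST_ANALYZED.

```python
from datetime import datetime


def choked_on(processed_by_job, last_analyzed, run_start):
    """Objects the job lists as processed (eg in OEM) whose LAST_ANALYZED
    is missing or earlier than the start of the run."""
    return sorted(obj for obj in processed_by_job
                  if last_analyzed.get(obj) is None
                  or last_analyzed[obj] < run_start)


# Hypothetical run and objects, for illustration only.
run_start = datetime(2009, 7, 15, 22, 0)
processed = ["EMP", "ORDERS", "TBL_LOOKUP"]
last_analyzed = {
    "EMP": datetime(2009, 7, 15, 22, 5),
    "ORDERS": datetime(2009, 6, 30, 1, 40),    # older than the run: suspect
    "TBL_LOOKUP": datetime(2009, 7, 16, 1, 6),
}
print(choked_on(processed, last_analyzed, run_start))  # ['ORDERS']
```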

If you have no access to OEM, you could try looking at objects that should have been analysed. These are tables/partitions where more than 10% of the rows have changed since the stats were last gathered, which you can find via DBA_TAB_MODIFICATIONS. I’ve already posted about this view {look towards the end of the post}; I am strangely attached to it, I think it is jolly useful.

The table looks like this

desc all_tab_modifications
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 TABLE_OWNER                                        VARCHAR2(30)
 TABLE_NAME                                         VARCHAR2(30)
 PARTITION_NAME                                     VARCHAR2(30)
 INSERTS                                            NUMBER
 UPDATES                                            NUMBER
 DELETES                                            NUMBER
 TRUNCATED                                          VARCHAR2(3)

You can interrogate it with something like the script below {which only looks at tables, not partitions}. You can get my script here.

-- tab_count
-- mdw 11/05/03
-- mdw 17/01/08 Modified to look at dba_tab_modifications
set pause on pages 24 pause 'Any Key>'
colu anlyzd_rows form 99999,999,999
colu tot_rows form 99999,999,999
colu tab_name form a30
colu chngs form 99,999,999
colu pct_c form 999.999
set lines 110
spool tab_count.lst
select dbta.owner||'.'||dbta.table_name tab_name
,dbta.num_rows anlyzd_rows
,to_char(dbta.last_analyzed,'yymmdd hh24:mi:ss') last_anlzd
-- estimated current rows: analysed rows plus inserts minus deletes
,nvl(dbta.num_rows,0)+nvl(dtm.inserts,0)
 -nvl(dtm.deletes,0) tot_rows
,nvl(dtm.inserts,0)+nvl(dtm.deletes,0)+nvl(dtm.updates,0) chngs
-- changes as a fraction of the analysed row count (0.1 = 10%)
,(nvl(dtm.inserts,0)+nvl(dtm.deletes,0)+nvl(dtm.updates,0))
     /greatest(nvl(dbta.num_rows,0),1)                          pct_c
,dtm.truncated trn
from dba_tables dbta
left outer join sys.dba_tab_modifications dtm
on dbta.owner = dtm.table_owner
and dbta.table_name = dtm.table_name
  and dtm.partition_name is null
where dbta.table_name like upper(nvl('&Tab_name','WHOOPS'))||'%'
and dbta.owner        not in ('SYS','SYSTEM');
spool off
clear colu

And it comes out with something like:-

TAB_NAME                    ANLYZD_ROWS LAST_ANLZD            TOT_ROWS       CHNGS    PCT_C TRN
------------------------ -------------- --------------- -------------- ----------- -------- ---
O.ADGHERHEHES                       384 060726 09:49:22            391           7     .018 NO
O.AUEHERHERHERATUS               15,578 060825 15:39:38         15,578       2,185     .140 NO
O.ADRTJRTJRENT                1,865,595 060823 07:32:53      1,892,301      26,706     .014 NO
I.APP_ARTHRTHRT_EXTR                  0 060726 22:10:21              0           0     .000 YES
I.APP_ARTJHRJHTJH_EXTR                0 060726 22:10:21              0           0     .000 YES
O.AD_RHRTHRTHRS                 143,435 061110 15:49:46        186,646      86,899     .606 NO
O.AD_RHRTHHORHUMNS              599,066 061110 15:49:57        885,304     286,238     .478 NO
O.AD_UNMTABRTHLES                    25 061110 15:49:41             31           7     .280 NO
O.AD_FTD_SRCORTHHOLD                 23 060726 09:49:25             23           1     .043 NO

Anything with a PCT_C {the fraction of rows changed, so 0.1 is 10%} of over 0.1 should have stats gathered next time the job runs, provided the table does not have its statistics locked {phew, remembered to mention that caveat}.
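The PCT_C calculation and the "more than 10% changed" rule can be written down in a few lines. Below is a minimal Python sketch. It assumes the 10g default staleness threshold of 10%. It treats a table with no stats, or one that has been truncated, as a candidate, and a table with locked stats as skipped. `pct_change` and `is_stale` are hypothetical helpers, not part of any Oracle API. The sample figures are the ANLYZD_ROWS and CHNGS values from the output above.

```python
def pct_change(num_rows, changes):
    """PCT_C as the tab_count script computes it: changes / greatest(nvl(num_rows,0),1)."""
    return changes / max(num_rows or 0, 1)


def is_stale(num_rows, changes, truncated=False, locked=False, threshold=0.10):
    """Rough model of whether the nightly job would pick a table up (assumed rules).

    num_rows  -- DBA_TABLES.NUM_ROWS at the last gather (None if never analysed)
    changes   -- inserts + updates + deletes from DBA_TAB_MODIFICATIONS
    threshold -- the assumed 10% staleness rule, as a fraction
    """
    if locked:
        return False  # locked statistics are left alone by the automatic job
    if num_rows is None or truncated:
        return True   # no stats at all, or truncated since the last gather
    return pct_change(num_rows, changes) > threshold


# (ANLYZD_ROWS, CHNGS) pairs taken from the sample output
for rows, chngs in [(384, 7), (15578, 2185), (143435, 86899), (23, 1)]:
    print(rows, chngs, round(pct_change(rows, chngs), 3), is_stale(rows, chngs))
```

Feeding it the DBA_TAB_MODIFICATIONS figures gives a rough list of tonight's likely candidates. It is only a model, because the job's own prioritisation is not documented.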

You could of course write something yourself that only selects objects with 10% or more change. It will give you a heads-up of what is likely to be analysed tonight, and of those objects whose dependent SQL could act differently come the morning 🙂

There is a final way to spot what got changed and it involves Oracle’s internal tables for storing stats, but I’ve run out of time. I’ll have to post that next time.

Automated statistics gathering silently fails July 20, 2009

Posted by mwidlake in performance.

Or “The worst Oracle Performance Feature.”

I posted a few days ago about what I feel is the best performance feature in Oracle and cited the Automated Stats Gathering job. A few people did not agree… 🙂 So, in the best traditions of politicians and lawyers the world over, I would now like to claim I was misunderstood and that is not what I said. I said the opposite…(* see later).

Of the many problems that can occur with the automated statistics job, the one I am going to pick up on today is that it can choke on large table or table partition segments {and does so silently at that}. What do I mean by this?

The automatic stats job runs every weekday night, by default between 22:00 hours and 06:00 hours. It appears to have some logic whereby it looks at the number of table segments (full tables or partitions within a table) that need to be analysed. It then calculates the amount of time it can spend on each segment. If it has 32 objects to analyse in the 8 hours available, it seems to decide it can afford to spend not 8/32 hours (a quarter of an hour) on any given segment but something more. It looks to be a couple of hours.

The job also tries to prioritise the most “in need” segments to be considered first. I am not sure what criteria it uses, but I think it picks those with the largest percentage change, though it is not quite as simple as that.

As a result of the above, this is what seems to happen. The stats job starts up and identifies a massive segment to analyse, say a 200GB table. It starts analysing it, allowing say 2 hours maximum, but after 2 hours it has not finished analysing the massive table, so it gives up.

The next night, it identifies the same massive segment to analyze (now 201GB) and maybe gives it an even higher priority. The job tries to analyse it IN EXACTLY THE SAME WAY as before. Oddly enough {this is sarcasm} it usually fails. The only reason it WILL succeed is if the night before the automated job managed to analyse other segments and fewer new ones need considering tonight, so it can spend more time on each segment. In my experience this is unlikely.

As a result, the object will probably fail to be analysed every night until the weekend, when the automated job gets to spend 2 whole days analysing segments and so can give each one more time. That is, unless you spot the problem and run a manual dbms_stats.gather_xxx call yourself, with a sample size low enough that it is likely to succeed.
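To make the failure loop concrete, here is a toy Python model. It is not how the job is implemented, since Oracle does not document its time allocation. It only encodes the behaviour described above:

- each run gives the big segment a fixed time cap;
- a run that times out keeps nothing;
- the next run tries in exactly the same way while the segment keeps growing.

The function name and every number in it are invented for illustration.

```python
def first_successful_run(hours_needed, growth_per_run, caps):
    """Return the index of the first run that finishes the segment, or None if none does.

    hours_needed   -- hours a full gather of the segment needs at the first run (assumed)
    growth_per_run -- extra hours needed after each run, as the table keeps growing (assumed)
    caps           -- hours each successive run is willing to spend on this segment (assumed)
    """
    for run, cap in enumerate(caps):
        if hours_needed <= cap:
            return run
        # Timed out: no partial result is kept, and the next attempt is the same
        # gather on a slightly bigger segment.
        hours_needed += growth_per_run
    return None


# Five weeknight runs allowing ~2 hours per segment, then a weekend run allowing 10.
week = [2, 2, 2, 2, 2, 10]
print(first_successful_run(3.0, 0.05, week))   # 5: fails every weeknight, succeeds at the weekend
print(first_successful_run(12.0, 0.05, week))  # None: too big even for the weekend run
```

The second call shows the case where the weekend cap is not enough either, so the segment never gets analysed.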

How can you tell this is happening, or has happened? Look at the history of schema/database-wide statistics gathering operations in DBA_OPTSTAT_OPERATIONS. If a job ran all night, it probably choked {in the below, I look at the table sys.WRI$_OPTSTAT_OPR, which is the underlying object}:

The highlighted record is an example of where the automated stats job ran all night. It did this because it choked on a large segment. There are 3 other examples of this in this screen shot. Once the job starts choking, it becomes a constant problem!


{You may ask why this is not failing the next night, like I claimed. It was because we were sorting out the failure the next morning, but had not yet properly fixed the problem}

So, for very large segments, the automated stats job can fail to analyse them.


No error appears in the alert log that I have seen, and neither OEM nor Grid Control seems to flag it up anywhere on the system summary or performance monitoring pages.

In OEM you can look at the latest stats run and see how it did. It lies. {Well, last time I checked it lied. I do not currently have access to OEM on a site suffering from the problem and, as you can imagine, clients do not want me to create massive tables just to fake this up for my blog 🙂}. OEM has a screen to show the activity of the latest automated stats gathering run but, even though it has the option to look for objects for which there is a timeout, it does not show them.

Even though there appears to be the option to show objects for which the stats gather timed out, at least in this version it does not.


This situation can get completely out of hand. In the output below, pulled out with this script from one site I have seen, the stats job was choking every night. And not only at night: it was now choking every weekend too.

-- Martin Widlake
-- show when and what type of multi-object stats gathering runs have occurred
set lines 100
col operation form a40 wrap head 'operation(on)'
col target form a1
spool show_auto_stat_runs.lst
select operation||decode(target,null,null,'-'||target) operation
      ,to_char(start_time,'YYMMDD HH24:MI:SS.FF4') start_time
      ,to_char(  end_time,'YYMMDD HH24:MI:SS.FF4') end_time
from dba_optstat_operations
order by start_time desc
/
clear colu
spool off

operation(on)                       START_TIME           END_TIME
----------------------------------- -------------------- --------------------
gather_database_stats(auto)         090518 06:09:43.3429 090520 00:00:52.7399
gather_database_stats(auto)         090517 22:00:01.0653 090518 06:01:02.1637
gather_database_stats(auto)         090516 22:00:00.5150 090517 06:00:39.3076
gather_database_stats(auto)         090515 22:00:00.4157 090516 06:01:10.6544
gather_database_stats(auto)         090514 22:00:01.6624 090515 06:01:10.3767
gather_database_stats(auto)         090513 22:00:01.0706 090514 06:01:07.7145
gather_database_stats(auto)         090511 06:01:44.2381 090513 00:01:23.3091
gather_database_stats(auto)         090510 22:00:00.3218 090511 06:00:53.7735
gather_database_stats(auto)         090509 22:00:02.4334 090510 06:01:11.2443
gather_database_stats(auto)         090508 22:00:03.2080 090509 06:01:22.6858
gather_database_stats(auto)         090507 22:00:01.4859 090508 06:01:19.1966
gather_database_stats(auto)         090506 22:00:02.1927 090507 06:03:55.9747
gather_database_stats(auto)         090504 06:00:47.3064 090506 00:00:29.0017
gather_database_stats(auto)         090503 22:00:01.0015 090504 06:00:45.7018
gather_schema_stats-BIG_SCHEMA      090503 10:41:16.2605 090503 12:32:33.2488
gather_database_stats(auto)         090502 22:00:00.9199 090503 06:00:47.9868
gather_database_stats(auto)         090501 22:00:02.3382 090502 06:01:24.6070
gather_database_stats(auto)         090430 22:00:01.4914 090501 06:01:31.4369
gather_database_stats(auto)         090429 22:00:01.1076 090430 06:01:14.7856
gather_database_stats(auto)         090427 06:01:01.0286 090429 00:01:16.2678
gather_database_stats(auto)         090426 22:00:00.3628 090427 06:00:52.0203

The automated stats job was virtually at a standstill, as every night and every weekend it tried to do the same things in the same way and failed. So, unlike a human who finds they choke if they try to swallow a pork pie in one go and so stops doing it, Oracle keeps on trying to swallow it whole.
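If you spool this output to a file, a few lines of Python can pick out the automatic runs that used up a whole window. This is a sketch. It assumes the exact layout produced by the script above: the operation, then start and end as YYMMDD HH24:MI:SS.FF4. It also assumes an arbitrary threshold of 8 hours, the length of a default weeknight window. `choked_runs` is a hypothetical helper.

```python
from datetime import datetime, timedelta

FMT = "%y%m%d %H:%M:%S.%f"  # matches the to_char mask 'YYMMDD HH24:MI:SS.FF4'


def choked_runs(lines, limit=timedelta(hours=8)):
    """Yield (operation, start, duration) for automatic runs lasting at least `limit`."""
    for line in lines:
        parts = line.split()
        # Data rows have exactly five fields: operation, start date/time, end date/time.
        if len(parts) != 5 or not parts[0].endswith("(auto)"):
            continue  # skip headings, separators and manual gathers
        start = datetime.strptime(parts[1] + " " + parts[2], FMT)
        end = datetime.strptime(parts[3] + " " + parts[4], FMT)
        if end - start >= limit:
            yield parts[0], start, end - start


sample = [
    "gather_database_stats(auto)         090518 06:09:43.3429 090520 00:00:52.7399",
    "gather_database_stats(auto)         090517 22:00:01.0653 090518 06:01:02.1637",
    "gather_schema_stats-BIG_SCHEMA      090503 10:41:16.2605 090503 12:32:33.2488",
]
for op, start, took in choked_runs(sample):
    print(op, start, took)
```

To check a whole listing, pass it the spool file instead, e.g. `choked_runs(open("show_auto_stat_runs.lst"))`.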


Worse still, it was burning CPU and IO doing a pointless set of partial table analyzes all night and all weekend.

Double Tsch!


(*) This claim that I never said what I said about the stats job is, of course, a big weaselly lie. I did say it was the best feature. I also stick to this, though I think I am clinging on by my fingertips. I think it is the best step forward for smaller, average, not-too-demanding databases, those which tend to have less DBA support. But not the big, complex, highly active ones. I did say that it has problems with them right at the beginning, honest! 🙂

Learning stuff you know you don’t know July 17, 2009

Posted by mwidlake in Uncategorized.

A Friday, off-topic post 🙂

OK, how many people remember the much maligned speech by Donald Rumsfeld about {and I paraphrase} “things we know, things we don’t know, things we know we don’t know and things we don’t know we don’t know”.

If you want, you can watch the dear chap himself saying it here. {I’m afraid this is a slightly maligning version, but not as bad as most I found}.

Leaving aside that it was Mr Rumsfeld who said it, which accounted for some of the endless scorn poured upon the speech, I think it was a very, very valid observation by his speech writers. I always think that people laughing at the “idiocy” of the speech have missed something. The idea is nothing new; Henry David Thoreau, for example, said

“To know that we know what we know, and that we do not know what we do not know, that is true knowledge.”

{an excellent quote, for which I thank Wikipedia and Google. I knew I did not know any good quotes, but I knew they existed; I just did not know them}.

I think it is very valid to realise you don’t know stuff in your chosen area of knowledge, to recognise that lack and to accept that there is thus more to learn. It also means that people who already know an awful lot in one area of knowledge probably have gaps you can help fill. So speak up when you think you know something, even to acknowledged experts. It gives you a really warm glow inside when you do teach someone you regard as an expert something new. And you know what? Most of those experts will be really appreciative of you even trying, let alone succeeding.

I could of course be saying all this to give my utterances about not knowing stuff a veneer of philosophical respectability, but I really do believe, and have done for many years now, that we should be able to say we do not know something and be respected for acknowledging that gap. I don’t take glory in my stupidity {for even my ego could not hold up under that amount of glory 🙂} but I like to feel that if I can be honest about my stupidity I can thus be honest about my knowledge, and be quietly happy with what I know and what I can do.

OK, so none of this is specific to Oracle, but it is no less valid for being applicable to all knowledge. I guess I’m trying to say that I think it is OK {and should be publicly acceptable} to admit not knowing stuff, and it is equally OK and good for all of us to try to teach others, even if we feel individually that we maybe do not have that much to add.

What prompted this philosophical outpouring? I found out today that I don’t even know how to peel a banana. And probably most of you don’t either. But monkeys do.

This is the video.

Go on, watch the video, the boss is in a meeting. It makes you think….

Oracle’s greatest performance feature July 14, 2009

Posted by mwidlake in performance.

From Oracle 10g onwards, you get what I feel is the greatest single step forward for general Oracle database performance since… Ohhhh, I give up. I think it is more significant than anything else I’ve seen, and I started with Oracle 6.

What is it? It’s the automated statistics gathering job. If you don’t know what this is, I’ll give you a brief summary in a few moments.
If you don’t agree with me, I’d love to know what you would nominate as the best single performance improvement since Oracle 6. {Or even before, for those of you even greyer and more world-weary than I}. This is, after all, just an opinion, and I’m open to changing my mind.

Now for that promised and brief {fairly brief} description of the automated stats gathering and why I love it {but please do not interpret this as an attempt to stop you telling me your alternatives, I really would like to know}.

By default, every work-day night, an Oracle database that is 10g or higher will spend the night, from 10pm, collecting table, index and column stats for you. This stats gathering job will keep going until it has collected all that it thinks it needs or the clock hits 6am, at which point it stops. At the weekend, your busy little databases will work even harder, from midnight on Saturday {ie 1 second after Friday ended} to the end of Sunday, gathering stats if need be. Again, the job will stop once it has gathered the stats it thinks your system needs.
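The default schedule just described can be written as a simple check. Below is a minimal Python sketch. It assumes the unaltered 10g windows: weeknights Monday to Friday from 22:00 for 8 hours, and the weekend from Saturday 00:00 for 2 days. `in_default_window` is a hypothetical name, and many sites change these windows.

```python
from datetime import datetime


def in_default_window(ts):
    """True if `ts` falls inside the assumed default 10g maintenance windows."""
    weekday = ts.weekday()          # Monday = 0 ... Sunday = 6
    if weekday >= 5:                # weekend window: Saturday 00:00 for 2 days
        return True
    if ts.hour >= 22:               # weeknight window opens at 22:00, Monday to Friday
        return True
    if ts.hour < 6 and weekday >= 1:
        return True                 # tail of the window that opened at 22:00 the night before
    return False                    # includes Monday 00:00-06:00: the weekend window ended at midnight


print(in_default_window(datetime(2009, 7, 14, 23, 0)))  # a Tuesday night: True
```

A Monday early-morning timestamp returns False. No weeknight window opens on Sunday, and the weekend window closes at midnight.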

The job gathers stats on tables/indexes in all schemas, including SYSTEM, SYS and those other odd internal ones like DBSNMP and WMSYS, as well as all your own schemas. It gathers stats only on tables that have changed by 10% or more since stats were last gathered, or that have been truncated or created since the last run. For each such table, this job will sample the proportion of the table that it thinks is needed to give reliable stats. For each table it also gathers stats on each index {I’ll skim over a slight issue in respect of sample size for indexes}. When a table has its stats gathered, Oracle will even make a stab at gathering the correct level of column statistics for each column, based on whether you ever use the column in joins or WHERE clauses {ie there is some sense in collecting detailed histogram stats only for those columns where it would help}.

Finally in my brief description, and something that not all DBAs and developers appreciate: this automatic job stores the stats as they were before it gathered new ones, so you can go back to previous stats if you so wish {as long as they are no more than a month old}.

This process, this nightly job, has flaws. Some of its decisions can be poor. It can gather stats you might not want it to, and it can mess up. BUT! It does run regularly and it does gather generally beneficial stats for all tables, indexes, partitions, sub-partitions and columns. All of them {mostly}.

Prior to this automatic job, many sites’ object stats were in very poor health.

  • Some sites did not gather table/index stats at all.
  • Many sites gathered stats only occasionally.
  • Some sites would gather stats on a few tables and none on most.

This last situation, stats on some tables and none on many others, condemned the cost-based optimiser to making decisions based on very poor information, and was/is very common under Oracle 9. I like to think of it as taking a map of the UK, removing all the roads and then trying to drive from London to Manchester. So long as you keep going generally north and west, you will get there. Eventually.

{I have a private theory that the number of hints in code is in proportion to the three situations above, the most being on systems with “stats on only a few tables”.}

This is why I think that job is the biggest step forward for performance. It ensures there is at least some information for every table and every index and that for most of them the data is correct to within 10%. Thus it allows the CBO to work with pretty accurate information and all that clever maths has a good chance of working.

Yes, if you know more about stats and performance and your system, you can do better than the automated job alone, but for the majority of sites it is a step forward. Especially for those sites that lack strong DBA/Developer expertise. Ie, the majority :-).

I know from talking to some Oracle support people that their lives have got quieter. I’m told that they get a lot fewer calls about SQL performance from Oracle 10 and 11 systems. Mind you, the usual “fix” of getting the customer to just gather some stats has gone out of the window, which is a shame as it fixed most issues. All in all, it has been a massive boon to them.

So, I think it is the greatest step forward.

I also hate it. I hate that automated stats job. It makes silly decisions, it breaks, it is poorly documented and it makes a dog’s dinner of very big, very active or very odd systems. In fact it is rubbish. But very, very, very beneficial rubbish. I love that pile of rubbish.

Counting the Cost #5 – accurate and fast July 9, 2009

Posted by mwidlake in performance.

I started off my blog by discussing quick ways of selecting count(*) from tables. I never completed the series, so I’m back to it.

This is the previous post on the topic.

It covers how using ALL_TAB_MODIFICATIONS and the NUM_ROWS column in ALL_TABLES can be combined to get a very accurate estimate of count(*) for most tables in most situations.

If this does not work for you {e.g. no privilege to see the tables or flush the information}, or you absolutely must have the count as accurately as you can right now, as a Manager is threatening you, here is another method, using a count table. It can be expanded to hold more than just count(*) and is particularly useful for partitioned tables {which are often huge}.

Basically, every time you select count(*) on a table {or even eg select count(*) where status=0} you are probably scanning a lot of the table you scanned the last time you did count(*) on it. If you are not deleting records from that table, that scanning effort is effectively a waste of time, as that part of the table won’t have changed since last time. So, as a developer or DBA, you can decide to trust the last count(*), and then you need only count the more recent records.

If you have an ascending primary key {traditionally a number sourced from a sequence} you can store the max(PK) and count(*) in a simple table. You then count all the records with a PK greater than the stored one. I’ve used count tables several times in the past and two enhancements invariably crop up, so I am going to include them in the below example of how you can implement this.
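The core of the idea is small enough to try outside Oracle. This Python sketch uses the standard library's `sqlite3` module as a stand-in. SQLite has no partitioning, so the sketch shows only the counting logic, not the partition pruning. The table names mirror the Oracle example below, and `make_demo`, `record_count` and `current_count` are hypothetical helpers. Like the approach itself, the sketch assumes ids only ever increase and rows are never deleted.

```python
import sqlite3


def make_demo(n_rows):
    """In-memory stand-ins for TEST_P (ids 1..n_rows) and TEST_P_RUNNING_COUNT."""
    con = sqlite3.connect(":memory:")
    con.execute("create table test_p (id integer primary key, status integer not null)")
    con.execute("create table test_p_running_count "
                "(max_id integer primary key, row_count integer not null)")
    con.executemany("insert into test_p (id, status) values (?, 0)",
                    [(i,) for i in range(1, n_rows + 1)])
    return con


def record_count(con):
    """The one expensive step: store the current max(id) and count(*)."""
    con.execute("insert into test_p_running_count (max_id, row_count) "
                "select max(id), count(*) from test_p")


def current_count(con):
    """Stored count plus a count of only the rows whose id is above the stored max_id."""
    last_max, last_rc = con.execute(
        "select max_id, row_count from test_p_running_count "
        "where max_id = (select max(max_id) from test_p_running_count)").fetchone()
    (extra,) = con.execute("select count(*) from test_p where id > ?",
                           (last_max,)).fetchone()
    return last_rc + extra


con = make_demo(41999)
record_count(con)
con.executemany("insert into test_p (id, status) values (?, 0)",
                [(i,) for i in range(42000, 42100)])
print(current_count(con))  # 42099
```

Only the second query touches `test_p`, and it only reads ids above the stored high-water mark. That is where the saving comes from.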

You need the partitioning option to run the test yourself, so, sorry, you will need Enterprise Edition with the option. If you downloaded Oracle to your PC/linux box to play on, you probably have it. {it never ceases to amaze me that Oracle Corp will only allow Partitions to be used in EE, and that even then it is an extra-cost option!}

You need to create tablespaces to hold the partition tables {or alter the test script to use existing tablespaces}. Feel free to take and modify this script to create the tablespaces.

This is my test script.

This is my basic setup:-
NAME                           VALUE
------------------------------ --------------------
cpu_count                      1
db_block_size                  8192
db_file_multiblock_read_count  8
optimizer_mode                 ALL_ROWS
sga_target                     612368384
sort_area_size                 65536

My test table:-

create table test_p
(id     number(8) not null
,status number(1) not null
,num_1  number(3) not null
,num_2  number(3)
,vc_pad varchar2(2000))
tablespace parts_curr
partition by range (id)
(partition id_10k values less than (10000)
  tablespace parts_old
,partition id_20k values less than (20000)
  tablespace parts_old
,partition id_30k values less than (30000)
  tablespace parts_old
,partition id_40k values less than (40000)
  tablespace parts_old
,partition id_max values less than (maxvalue)
  tablespace parts_curr)
/
-- {41999 records created - see script}
-- local PK index, one index partition per table partition
alter table test_p
add constraint pt_pk primary key (id)
using index local
(partition id_10k tablespace parts_old
,partition id_20k tablespace parts_old
,partition id_30k tablespace parts_old
,partition id_40k tablespace parts_old
,partition id_max tablespace parts_curr)
/
-- gather stats
-- create the count table
create table test_p_running_count
(max_id    number(8) not null
,row_count number(8)
,constraint tprc_pk primary key(max_id))
/

OK, let’s count the number of records in TEST_P and then do the same but insert the data collected into the count table:-

select count(*) from test_p;

Execution Plan
| Id  | Operation              | Name  | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |
|   0 | SELECT STATEMENT       |       |     1 |    26   (4)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE        |       |     1 |            |          |       |       |
|   2 |   PARTITION RANGE ALL  |       | 41999 |    26   (4)| 00:00:01 |     1 |     5 |
|   3 |    INDEX FAST FULL SCAN| PT_PK | 41999 |    26   (4)| 00:00:01 |     1 |     5 |

          0  recursive calls
          0  db block gets
        111  consistent gets
          0  physical reads

insert into test_p_running_count (max_id,row_count)
select max(id),count(*) from test_p;

Execution Plan
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
|   0 | INSERT STATEMENT       |       |     1 |     5 |    26   (4)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE        |       |     1 |     5 |            |          |       |       |
|   2 |   PARTITION RANGE ALL  |       | 41999 |   205K|    26   (4)| 00:00:01 |     1 |     5 |
|   3 |    INDEX FAST FULL SCAN| PT_PK | 41999 |   205K|    26   (4)| 00:00:01 |     1 |     5 |

         26  recursive calls
         21  db block gets
        115  consistent gets
          0  physical reads

So, it takes 111 consistent gets and an estimated cost of 26 to do a count(*) on this relatively small table, doing a fast full scan of the PK index. It takes only slightly more effort to create a record to hold this information.

{As an aside, it is curious that the total estimated cost of the insert is the same as the select, suggesting naughtily that the insert is free, but that’s not my topic today}.

I now create 100 records by running a little insert script.

Let’s select the number of records and the new max(ID) created since we last stored information in the count table:

select max(id),count(*) from test_p
where id>(select max(max_id) from test_p_running_count);
   MAX(ID)   COUNT(*)
---------- ----------
     42099        100

Execution Plan
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
|   0 | SELECT STATEMENT              |         |     1 |     5 |     4   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE               |         |     1 |     5 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR    |         |  2100 | 10500 |     2   (0)| 00:00:01 |   KEY |     5 |
|   3 |    INDEX RANGE SCAN           | PT_PK   |  2100 | 10500 |     2   (0)| 00:00:01 |   KEY |     5 |
|   4 |     SORT AGGREGATE            |         |     1 |    13 |            |          |       |       |
|   5 |      INDEX FULL SCAN (MIN/MAX)| TPRC_PK |     1 |    13 |     2   (0)| 00:00:01 |       |       |
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads

The CBO scanned the index of the TEST_P_RUNNING_COUNT table using an index full scan (min/max), which means Oracle basically worked down the index to get a min or max value, which is highly efficient. This value was then used to do an index range scan of only one index partition (partition number 5).

Estimated cost was 4 and the execution took 3 consistent gets {NB this is for the second execution; it is 12 consistent gets and 4 recursive calls the first time, when the statement is parsed}.

But that only got the new max(id) and how many records have been created since you last recorded it, 100. What you really want is the count of records created since you last recorded the count and max(id), PLUS the count as it was then, ie the full count now.
This is not straightforward in a single SQL statement, as you want the count, a group function, from one table and the single-row value from another. You can do this though. You basically write SQL statements to do the bits you want and then select from them. My example script shows how I build up to this, but the final statement I came up with {and you might well be able to come up with better options yourself} is:

select change.max_id new_max,orig.max_id orig_max
,change.rowcount chg_c,orig.rowcount orig_c
,change.rowcount+orig.rowcount tot_c
from (select max(id) max_id,count(*) rowcount,1 tabjoin
  from test_p
  where id>(select max(max_id) from test_p_running_count) ) change
,(select max(max_id) max_id,max(row_count) rowcount,1 tabjoin
  from test_p_running_count)                                            orig
where change.tabjoin=orig.tabjoin;

   NEW_MAX   ORIG_MAX      CHG_C     ORIG_C                         TOT_C
---------- ---------- ---------- ---------- -----------------------------
     42099      41999        100      41999                         42099

Execution Plan
| Id  | Operation                       | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
|   0 | SELECT STATEMENT                |                      |     1 |    58 |     7   (0)| 00:00:01 |       |       |
|   1 |  NESTED LOOPS                   |                      |     1 |    58 |     7   (0)| 00:00:01 |       |       |
|   2 |   VIEW                          |                      |     1 |    29 |     4   (0)| 00:00:01 |       |       |
|   3 |    SORT AGGREGATE               |                      |     1 |     5 |            |          |       |       |
|   4 |     PARTITION RANGE ITERATOR    |                      |  2100 | 10500 |     2   (0)| 00:00:01 |   KEY |     5 |
|*  5 |      INDEX RANGE SCAN           | PT_PK                |  2100 | 10500 |     2   (0)| 00:00:01 |   KEY |     5 |
|   6 |       SORT AGGREGATE            |                      |     1 |    13 |            |          |       |       |
|   7 |        INDEX FULL SCAN (MIN/MAX)| TPRC_PK              |     1 |    13 |     2   (0)| 00:00:01 |       |       |
|*  8 |   VIEW                          |                      |     1 |    29 |     3   (0)| 00:00:01 |       |       |
|   9 |    SORT AGGREGATE               |                      |     1 |    26 |            |          |       |       |
|  10 |     TABLE ACCESS FULL           | TEST_P_RUNNING_COUNT |     1 |    26 |     3   (0)| 00:00:01 |       |       |

          0  recursive calls
          0  db block gets
         10  consistent gets
          0  physical reads

That’s an estimated cost of 7 by the CBO and 10 consistent gets {again, for the second iteration}. Remember, scanning the whole partitioned table was costed at 26 and took 111 consistent gets.
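The single-statement trick is portable. Two inline views each return exactly one row, and joining them on a constant TABJOIN column pairs the new delta with the stored count. Here essentially the same statement runs against Python's `sqlite3` module as a stand-in for Oracle. The data is set up to match the example: 41,999 rows counted, then 100 more added.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table test_p (id integer primary key)")
con.execute("create table test_p_running_count "
            "(max_id integer primary key, row_count integer)")
con.executemany("insert into test_p values (?)", [(i,) for i in range(1, 42000)])
con.execute("insert into test_p_running_count select max(id), count(*) from test_p")
con.executemany("insert into test_p values (?)", [(i,) for i in range(42000, 42100)])

# Each inline view yields one row; the constant tabjoin column lets them be joined.
row = con.execute("""
    select change.max_id, orig.max_id, change.rowcount, orig.rowcount,
           change.rowcount + orig.rowcount
    from (select max(id) max_id, count(*) rowcount, 1 tabjoin
          from test_p
          where id > (select max(max_id) from test_p_running_count)) change
        ,(select max(max_id) max_id, max(row_count) rowcount, 1 tabjoin
          from test_p_running_count) orig
    where change.tabjoin = orig.tabjoin
""").fetchone()
print(row)  # (42099, 41999, 100, 41999, 42099)
```

In SQLite a comma join with no predicate would do the same job. The TABJOIN column just makes the intent explicit, as in the Oracle version.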

You can convert that SQL select statement into an insert and use it to update your count table:-

insert into test_p_running_count (max_id,row_count)
select change.max_id
      ,change.rowcount+orig.rowcount
from (select max(id) max_id,count(*) rowcount,1 tabjoin
       from test_p
       where id>(select max(max_id) from test_p_running_count) ) change
    ,(select max(max_id) max_id,max(row_count) rowcount,1 tabjoin
       from test_p_running_count) orig
where change.tabjoin=orig.tabjoin;

That is one of the improvements usually asked for when you have a count table: people want to see what the counts were historically, seeing as you are now holding the data in a table {why is it so many people suddenly want to keep information once it is seen in a table?}. That is why the above does an insert and not an update, and why my code selects the max(max_id) and max(row_count). It is not perfect. If no records have been inserted into your massive table since the max(id) and row count were last recorded, the insert will fail {max(id) comes back null, and MAX_ID is a NOT NULL primary key}. You could add a datetime column, and skip the insert when nothing is new, to get around this.

Also, my SQL is a little naughty in that I select the max(max_id) and max(row_count). There is nothing in the table design to enforce the unspoken rule that they both appear in the “last record” of the table, but as a human you can see that they do. Again, a datetime column can help with this.
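One way to follow the datetime-column idea is sketched below in Python, with `sqlite3` standing in for Oracle:

- each history row is keyed on a snapshot time;
- the latest snapshot is picked by that time, rather than by max(max_id) and max(row_count) separately;
- no row is written when nothing new has arrived.

The table `test_p_count_hist`, its `snapshot_time` column and the helpers `latest_snapshot` and `refresh_count` are all hypothetical names for illustration.

```python
import sqlite3
from datetime import datetime, timedelta

con = sqlite3.connect(":memory:")
con.execute("create table test_p (id integer primary key)")
# Hypothetical history table: one row per snapshot, keyed on when it was taken.
con.execute("create table test_p_count_hist ("
            " snapshot_time text primary key,"  # ISO-8601 text sorts in time order
            " max_id integer not null,"
            " row_count integer not null)")


def latest_snapshot(con):
    """(max_id, row_count) of the newest snapshot by time, or (0, 0) if there is none."""
    row = con.execute("select max_id, row_count from test_p_count_hist "
                      "order by snapshot_time desc limit 1").fetchone()
    return row if row else (0, 0)


def refresh_count(con, now):
    """Record a new snapshot if rows have arrived since the last one; return the total."""
    last_max, last_rc = latest_snapshot(con)
    new_max, extra = con.execute(
        "select max(id), count(*) from test_p where id > ?", (last_max,)).fetchone()
    if extra == 0:
        return last_rc  # nothing new: skip the insert rather than store a null max_id
    total = last_rc + extra
    con.execute("insert into test_p_count_hist values (?, ?, ?)",
                (now.isoformat(), new_max, total))
    return total


t0 = datetime(2009, 7, 9, 12, 0)
con.executemany("insert into test_p values (?)", [(i,) for i in range(1, 42000)])
print(refresh_count(con, t0))                       # 41999
print(refresh_count(con, t0 + timedelta(hours=1)))  # 41999, and no new history row
con.executemany("insert into test_p values (?)", [(i,) for i in range(42000, 42100)])
print(refresh_count(con, t0 + timedelta(hours=2)))  # 42099
```

How often to call the refresh is, as ever, the part you get to argue about.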

The final bit for today. The above two SQL statements look a bit nasty, certainly a lot more complex than “select count(*) from table”. Also, they are not as efficient as they could be: selecting max(row_count) is not supported by an index and could slow down over time. I could tweak the SQL statement even further to work around this, but a simple piece of PL/SQL does the job better, and is what I usually end up implementing {within a package, along with a load of other count table functions for several large tables}.

Here is a script to create a function that gets the current count(*) for the table.

create or replace function get_tp_rowcount
return number
is
  v_rtn      number;
  v_extra_rc number;
  v_last_rc  number;
  v_max_id   number;
  v_last_max number;
begin
  -- the most recent stored count and the max(id) it covered
  select max_id,row_count
  into   v_last_max,v_last_rc
  from   test_p_running_count
  where  max_id=(select max(max_id) from test_p_running_count);
  -- count only the rows added since then
  select count(*),max(id)
  into   v_extra_rc,v_max_id
  from   test_p
  where  id>v_last_max;
  v_rtn := v_last_rc+v_extra_rc;
  return v_rtn;
end get_tp_rowcount;
/

And, as you can see below, it remains an efficient trick. I’ll leave it to you to create a procedure to update the count table, with the observation that you can then argue for several hours as to how often you update the count table… 🙂

select get_tp_rowcount from dual;


1 row selected.

Elapsed: 00:00:00.01

Execution Plan
| Id  | Operation        | Name | Rows  | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT |      |     1 |     2   (0)| 00:00:01 |
|   1 |  FAST DUAL       |      |     1 |     2   (0)| 00:00:01 |

          2  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads


select count(*) from test_p;


Execution Plan
| Id  | Operation              | Name  | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |
|   0 | SELECT STATEMENT       |       |     1 |    26   (4)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE        |       |     1 |            |          |       |       |
|   2 |   PARTITION RANGE ALL  |       | 41999 |    26   (4)| 00:00:01 |     1 |     5 |
|   3 |    INDEX FAST FULL SCAN| PT_PK | 41999 |    26   (4)| 00:00:01 |     1 |     5 |
          0  recursive calls
          0  db block gets
        111  consistent gets
          0  physical reads

Peeking under the Data Dictionary Hood July 7, 2009

Posted by mwidlake in internals.

Have you ever wondered how some people seem to be able to find out how the Oracle database does things internally? How they work out where some of the interesting stuff is contained within the data dictionary and can pull it out?

Well, one of the simpler ways to peek under the data dictionary hood is to just look at the objects in the data dictionary itself. I’ve found some interesting things by looking at the data dictionary views. A lot of the dictionary objects are views, like DBA_TABLES and DBA_COLUMNS. I’ll look at DBA_TABLESPACES {as it’s a bit smaller!}

You can get my script here – vw_txt.sql. As you will see, it is a terribly complex script… The only trick is to make sure you “set long 32767”, as the view text is held in a column of the ancient LONG datatype.

> @vw_txt
Enter value for vw_name: dba_tablespaces
old 6: where view_name like upper (nvl('&vw_name','WHOOPS')||'%')
new   6: where view_name like upper (nvl('dba_tablespaces','WHOOPS')||'%')
Any Key...>

OWNER                          VIEW_NAME                      TEXT_LENGTH
------------------------------ ------------------------------ -----------
select ts.name, ts.blocksize, ts.blocksize * ts.dflinit,
          decode(bitand(ts.flags, 3), 1, to_number(NULL),
                 ts.blocksize * ts.dflincr),
          decode(ts.contents$, 1, to_number(NULL), ts.dflmaxext),
          decode(bitand(ts.flags, 4096), 4096, ts.affstrength, NULL),
          decode(bitand(ts.flags, 3), 1, to_number(NULL), ts.dflextpct),
          ts.blocksize * ts.dflminlen,
          decode(ts.online$, 1, 'ONLINE', 2, 'OFFLINE',
                 4, 'READ ONLY', 'UNDEFINED'),
          decode(ts.contents$, 0, (decode(bitand(ts.flags, 16), 16, 'UNDO',
                 'PERMANENT')), 1, 'TEMPORARY'),
          decode(bitand(ts.dflogging, 1), 0, 'NOLOGGING', 1, 'LOGGING'),
          decode(bitand(ts.dflogging, 2), 0, 'NO', 2, 'YES'),
          decode(ts.bitmapped, 0, 'DICTIONARY', 'LOCAL'),
          decode(bitand(ts.flags, 3), 0, 'USER', 1, 'SYSTEM', 2, 'UNIFORM',
                 'UNDEFINED'),
          decode(ts.plugged, 0, 'NO', 'YES'),
          decode(bitand(ts.flags,32), 32,'AUTO', 'MANUAL'),
          decode(bitand(ts.flags,64), 64,'ENABLED', 'DISABLED'),
          decode(bitand(ts.flags,16), 16, (decode(bitand(ts.flags, 512), 512,
                 'GUARANTEE', 'NOGUARANTEE')), 'NOT APPLY'),
          decode(bitand(ts.flags,256), 256, 'YES', 'NO'),
          decode(tsattr.storattr, 1, 'STORAGE', 'HOST'),
          decode(bitand(ts.flags,16384), 16384, 'YES', 'NO'),
          decode(bitand(ts.flags,64), 0, null,
              decode(bitand(ts.flags,65536), 65536,'FOR ALL OPERATIONS',
                 'DIRECT LOAD ONLY'))
from sys.ts$ ts, sys.x$kcfistsa tsattr
where ts.online$ != 3
and bitand(flags,2048) != 2048
and ts.ts# = tsattr.tsid

1 row selected.

You see a lot of DECODEs of BITAND functions in these internal views. What is more interesting is what they decode to, as that confirms the possible values for each option.
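To make the bitmask idea concrete, here is a toy Python model of three of the DBA_TABLESPACES columns above. Each column is just a test of one or two bits in sys.ts$.flags. The bit values are taken from the view text; the flag values in the example are made up, and the 'UNDEFINED' fallback for allocation type is an assumption for the case where both low bits are set. It is a sketch for reading the view, not anything Oracle runs.

```python
# Toy model of how DBA_TABLESPACES turns bits in sys.ts$.flags into text.
# Bit values are copied from the view text; the example flag values are made up.


def allocation_type(flags: int) -> str:
    # decode(bitand(ts.flags, 3), 0, 'USER', 1, 'SYSTEM', 2, 'UNIFORM', <default>)
    # The 'UNDEFINED' default (both bits set) is an assumption.
    return {0: "USER", 1: "SYSTEM", 2: "UNIFORM"}.get(flags & 3, "UNDEFINED")


def segment_space_management(flags: int) -> str:
    # decode(bitand(ts.flags, 32), 32, 'AUTO', 'MANUAL')
    return "AUTO" if (flags & 32) == 32 else "MANUAL"


def retention(flags: int) -> str:
    # decode(bitand(ts.flags, 16), 16,
    #        decode(bitand(ts.flags, 512), 512, 'GUARANTEE', 'NOGUARANTEE'),
    #        'NOT APPLY')
    if (flags & 16) != 16:  # per the view, bit 16 marks an UNDO tablespace
        return "NOT APPLY"
    return "GUARANTEE" if (flags & 512) == 512 else "NOGUARANTEE"


def describe(flags: int) -> dict:
    # Decode the same flags word into several columns, as the view does.
    return {
        "ALLOCATION_TYPE": allocation_type(flags),
        "SEGMENT_SPACE_MANAGEMENT": segment_space_management(flags),
        "RETENTION": retention(flags),
    }


if __name__ == "__main__":
    print(describe(2 | 32))        # uniform extents, ASSM, not an undo tablespace
    print(describe(1 | 16 | 512))  # system-allocated undo with retention guarantee
```

The point is that one integer column holds many independent yes/no options. So the list of BITAND masks in a view tells you which options exist, even ones your database has never used.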

Are you curious as to what all the possible object types in the database are? Whether your database happens to have examples of them or not? Peek inside the DBA_OBJECTS view {the following is part of that view}:-

OWNER                          VIEW_NAME                      TEXT_LENGTH
------------------------------ ------------------------------ -----------
select u.name, o.name, o.subname, o.obj#, o.dataobj#,
       decode(o.type#, 0, 'NEXT OBJECT', 1, 'INDEX', 2, 'TABLE', 3, 'CLUSTER',
                      4, 'VIEW', 5, 'SYNONYM', 6, 'SEQUENCE',
                      7, 'PROCEDURE', 8, 'FUNCTION', 9, 'PACKAGE',
                      11, 'PACKAGE BODY', 12, 'TRIGGER',
                      13, 'TYPE', 14, 'TYPE BODY',
                      19, 'TABLE PARTITION', 20, 'INDEX PARTITION', 21, 'LOB',
                      22, 'LIBRARY', 23, 'DIRECTORY', 24, 'QUEUE',
                      28, 'JAVA SOURCE', 29, 'JAVA CLASS', 30, 'JAVA RESOURCE',
                      32, 'INDEXTYPE', 33, 'OPERATOR',
                      34, 'TABLE SUBPARTITION', 35, 'INDEX SUBPARTITION',
                      40, 'LOB PARTITION', 41, 'LOB SUBPARTITION',
                      42, NVL((SELECT 'REWRITE EQUIVALENCE'
                               FROM sum$ s
                               WHERE s.obj#=o.obj#
                                 and bitand(s.xpflags, 8388608) = 8388608),
                              'MATERIALIZED VIEW'),
                      43, 'DIMENSION',
                      44, 'CONTEXT', 46, 'RULE SET', 47, 'RESOURCE PLAN',
                      48, 'CONSUMER GROUP',
                      51, 'SUBSCRIPTION', 52, 'LOCATION',
                      55, 'XML SCHEMA', 56, 'JAVA DATA',
                      57, 'EDITION', 59, 'RULE',
                      60, 'CAPTURE', 61, 'APPLY',
                      62, 'EVALUATION CONTEXT',
                      66, 'JOB', 67, 'PROGRAM', 68, 'JOB CLASS', 69, 'WINDOW',
                      72, 'WINDOW GROUP', 74, 'SCHEDULE', 79, 'CHAIN',
                      81, 'FILE GROUP', 82, 'MINING MODEL', 87, 'ASSEMBLY',
                      90, 'CREDENTIAL', 92, 'CUBE DIMENSION', 93, 'CUBE',
                      94, 'MEASURE FOLDER', 95, 'CUBE BUILD PROCESS',
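If the long DECODE lists are hard to read, it may help to remember that DECODE is just an ordered lookup. The Python function below mimics DECODE's rules: compare against each search value in turn, return the first match, use the trailing argument as a default if there is an odd one left over, else return NULL. Two NULLs count as equal. It is applied to a handful of the type# codes from the excerpt above. The "UNDEFINED" default in the last call is hypothetical and only illustrates the default argument.

```python
# Mimics the semantics of Oracle's DECODE so the DBA_OBJECTS excerpt reads as a lookup.
from typing import Any


def oracle_decode(expr: Any, *args: Any) -> Any:
    """DECODE(expr, search1, result1, search2, result2, ..., [default]).

    Compares expr with each search value in order and returns the first
    matching result. An odd number of trailing arguments means the last one
    is the default; otherwise no match gives NULL (None here). DECODE treats
    two NULLs as equal, which Python's None == None also does.
    """
    n_pairs = len(args) // 2
    for i in range(n_pairs):
        search, result = args[2 * i], args[2 * i + 1]
        if expr == search:
            return result
    return args[-1] if len(args) % 2 == 1 else None


# A few of the type# codes from the DBA_OBJECTS excerpt.
OBJECT_TYPES = (1, "INDEX", 2, "TABLE", 3, "CLUSTER", 4, "VIEW",
                19, "TABLE PARTITION", 66, "JOB")

if __name__ == "__main__":
    print(oracle_decode(2, *OBJECT_TYPES))
    print(oracle_decode(999, *OBJECT_TYPES))               # no match, no default: None
    print(oracle_decode(999, *OBJECT_TYPES, "UNDEFINED"))  # hypothetical default
```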

Scanning the view definitions for something you have become interested in is a little trickier, as the text is, as I said, stored in a LONG, and you can’t apply SQL functions such as UPPER or INSTR to a LONG. I use a bit of PL/SQL to get around the problem :-

find_vw_txt.sql

-- find_vw_txt
-- martin widlake 13/8/08
-- can't search view text as it is in a long.
-- so let's try PL/SQL
set serveroutput on size unlimited
spool find_vw_txt.lst
set trims on  lines 120 pages 32
declare
  v_name   varchar2(4000) :='&viewname';
  v_search varchar2(100)  :='&search_txt';
  cursor get_vw_txt is
  select u.name owner, o.name name, v.textlength textlen, v.text text
  from sys.obj$  o
     , sys.view$ v
     , sys.user$ u
  where o.obj#     = v.obj#
  and   o.owner#   = u.user#
  and   o.name     like upper(v_name)||'%'
  and   v.textlength <32000; -- can't think how I stop a numeric
                             -- error trying to pull a larger long
                             -- into a varchar. bloody longs
  v_text  varchar2(32000);
  v_where number;
begin
  v_name   := upper(v_name);
  v_search := upper(v_search);
  for vtr in get_vw_txt loop
    --dbms_output.put_line('looking at '||vtr.name);
    v_text  := vtr.text;
    v_text  := upper(v_text);
    v_where := instr(v_text,v_search);
    if v_where != 0 then
      -- print owner.view and up to 80 characters from the match onwards
      dbms_output.put_line('view '||vtr.owner||'.'||vtr.name||':** '
                           ||substr(v_text,greatest(0,v_where),80)||' **');
    end if;
  end loop;
end;
/
spool off
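If you want to play with the search logic without a database, this Python sketch follows the same steps as find_vw_txt.sql. It filters views by name prefix, upper-cases the text, finds the search string and prints 80 characters from the match onwards. The view texts in the example are shortened, made-up stand-ins. The sketch ignores SQL wildcards inside the name prefix and the 32000-character LONG limit, since Python strings have no such limit.

```python
from typing import Dict, List, Tuple


def find_in_views(views: Dict[Tuple[str, str], str], name_prefix: str,
                  search: str, width: int = 80) -> List[str]:
    """Mirror of find_vw_txt.sql: report views whose text contains `search`."""
    prefix = name_prefix.upper()
    needle = search.upper()
    if not needle:
        # In PL/SQL an empty search string is NULL, instr() returns NULL and
        # "NULL != 0" is not true, so nothing would be printed.
        return []
    hits = []
    for (owner, name), text in views.items():
        if not name.startswith(prefix):   # o.name like upper(v_name)||'%'
            continue
        upper_text = text.upper()         # v_text := upper(v_text)
        pos = upper_text.find(needle)     # instr(), but 0-based and -1 if absent
        if pos != -1:
            hits.append(f"view {owner}.{name}:** {upper_text[pos:pos + width]} **")
    return hits


if __name__ == "__main__":
    sample = {  # made-up, shortened view texts
        ("SYS", "DBA_TABLESPACES"): "select ts.name from sys.ts$ ts, sys.x$kcfistsa tsattr",
        ("SYS", "DBA_OBJECTS"): "select u.name, o.name from sys.obj$ o, sys.user$ u",
    }
    for line in find_in_views(sample, "dba_", "x$kcfistsa"):
        print(line)
```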

It can be fun to dig into the internals this way. And useful. A few times I’ve been able to find out one or two things about how Oracle stores information.

It might occur to you that there are lots of internal PL/SQL packages and you can look into them too:-

pkg_txt.sql will show you the contents of a package or stored function/procedure.
find_plsql_txt.sql will scan the stored PL/SQL for a text string.

However, a lot of the internal PL/SQL is wrapped, ie converted into a form you can’t peek into easily {well, in my case, at all, but some people know how}.

One last word of warning. You can find what look like interesting undocumented features when you peek under the covers. In V10 I came across the REVERSE function:

select reverse('Martin was here')
from dual

ereh saw nitraM

Nice :-).

That could be useful, yes?

I’ve checked in the 11g documentation and it is still not documented. I googled “oracle reverse function” and several people have also found it. A couple ask why such a useful thing is not documented…

Well, I was “lucky” enough to find out why it is undocumented. I tried to use the REVERSE function on a number { this was on v10.1 I think}.

My session core-dumped.

I was on a production system when I tried this {Yes I KNOW!!! It was a momentary lapse}!

It seems to work OK on and 11.1, but don’t use it. You have no idea if it will always work OK, and you have no come-back to Oracle Corp if it blows up on you. My guess is they wrote the REVERSE function for internal use and did not test it for people using it “not as intended” {as I had done}.

So, my warning is, though it is fun to look under the covers and try things out, never, ever, ever, ever do so on a production system. Or a test system used for proper testing. Or a development system those touchy developers are using. Or any system that you can’t afford to trash and recreate on a whim. Try it at home on your own PC, OK?

Stuck execution plans – purging a cursor July 3, 2009

Posted by mwidlake in performance.

I was at a UKOUG RAC/HA special interest group meeting last month. My favorite presentation of the day {out of what was a very good selection} was by Harald Van Breederode, an Oracle University DBA trainer from the Netherlands. Harald’s presentation was on Checkpoints, a feature of Oracle that most DBAs tend to know a little about but never look into extensively. If you are a member of the UKOUG you can get the slides from here {If you are not a member, and you are in the UK {{or even Europe}}, nag your boss, get a membership – the conferences and meetings are a great source of info}.

Anyway, that is not the topic of this Blog. I finally downloaded a copy of the slides today and I checked out Harald’s blog. I immediately learnt something, which is the topic of this blog.

In Oracle (and upwards I assume) you can now flush a specific cursor out of the library cache using dbms_shared_pool.purge. You need to create the package first; it is not installed by default:-


Package created.
Grant succeeded.
View created.
Package body created.

You also need to set an event to allow the purge to work. See metalink note 751876.1:
“The fix has gone into However, the fix is event protected. You need to set the event 5614566, to make use of purge. Unless the event is set, purge will have no effect.”

alter session set events '5614566 trace name context forever';

Now that you have the feature installed you can find the address and hash_value of a cursor you want to purge with eg:

select address, hash_value from v$sqlarea
where sql_text = 'select count(c2) from skew where c1 = :bind';

ADDRESS  HASH_VALUE
-------- ----------
27308318 2934790721

And purge it.

exec sys.dbms_shared_pool.purge('&address, &hash_value','c')
PL/SQL procedure successfully completed.

For more comprehensive details check out Harald's original post.

Why would you want to flush a specific cursor out of the library cache? Because it is performing poorly and you want the CBO to have another go at coming up with a plan. Before this nice sys.dbms_shared_pool.purge procedure, you had three other options to flush the cursor out.

  • You can flush the whole shared pool
    alter system flush shared_pool;
    You really want to avoid doing this on a live system, as it flushes all SQL, PL/SQL {functions, packages, procedures} and data dictionary information, all of which then has to be read back in when next required. It’s like unloading a whole delivery truck just to throw out one letter and then reloading the truck.
  • Alter one of the objects referenced by the SQL statement, with some DDL. Some suggest doing a grant or revoke on a table; I used to try re-setting a table comment to what it already was. I’m still not that comfortable with that option on a live system, as you are doing something to a live object.
  • Another option is to gather new stats on one of the objects referenced by the cursor, which is fine so long as there is a small object you can quickly run a dbms_stats.gather_xxxx_stats on, with no_invalidate set to FALSE so that the dependent cursors are invalidated straight away.

So I’ll look more into this package, as I think it is a cleaner way to do it. Mind you, there is that setting of an event on a live system…

How does a statement get a poor plan that is going to be fixed simply by asking the CBO to parse it again?

In Harald’s posting he also covers a main reason why you would want to flush a SQL cursor out of the shared pool. His worked example shows how a SQL statement with a bind variable can get, the first time it is parsed, an execution plan that suits that first bind value but NOT most other executions. eg the first value passed to the bind may be a very common one, matching a large percentage of the table, and so an index is not used; but most executions of the statement are with a bind value that matches very few records, for which the index lookup is the best plan. Once the SQL statement is parsed, that plan will be used until the statement disappears out of the shared pool.
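Here is a toy Python model of the behaviour Harald describes. The plan is chosen once, at hard parse, from the selectivity of the first bind value seen, and then reused for every later value until the cursor is purged. The row counts and the 5% index/full-scan cut-over are hypothetical numbers picked for illustration. Real optimizer costing is far more involved, and the model ignores the adaptive cursor sharing that arrived in 11g.

```python
class PeekingCursor:
    """Toy cursor: the plan is fixed by the bind value peeked at hard parse."""

    def __init__(self, row_counts: dict, total_rows: int, index_threshold: float = 0.05):
        self.row_counts = row_counts            # hypothetical rows matching each bind value
        self.total_rows = total_rows
        self.index_threshold = index_threshold  # hypothetical index/full-scan cut-over
        self.plan = None                        # no plan until the first (hard) parse

    def _choose_plan(self, bind) -> str:
        selectivity = self.row_counts.get(bind, 0) / self.total_rows
        return "FULL SCAN" if selectivity > self.index_threshold else "INDEX RANGE SCAN"

    def execute(self, bind) -> str:
        if self.plan is None:                   # hard parse: peek at this bind value
            self.plan = self._choose_plan(bind)
        return self.plan                        # later executions reuse it, whatever the bind

    def purge(self) -> None:
        self.plan = None                        # next execution peeks and optimises again


if __name__ == "__main__":
    cur = PeekingCursor({"COMMON": 900_000, "RARE": 10}, total_rows=1_000_000)
    print(cur.execute("COMMON"))  # first bind is common, so a full scan is chosen
    print(cur.execute("RARE"))    # still the full scan: the plan is stuck
    cur.purge()
    print(cur.execute("RARE"))    # re-parsed with a selective bind: index used
```

The purge only helps if the next execution after it arrives with a "typical" bind value. Otherwise the same poor plan can come straight back.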

Another cause of a poor plan is when the statistics for an object referenced by the SQL statement change. The CBO sometimes just chooses a plan that is not good. You may be thinking that, in this case, the optimizer will simply come to the same plan if asked again. Maybe not.

An issue I kept encountering at one site was very interesting. One of a small handful of simple SQL statements would go rogue overnight, usually at about 2 or 3am. Instead of a plan using a highly specific index and a couple of table joins, a very, very poor plan was used. It only ever occurred early in the morning, and only when the automated statistics gathering job had gathered stats on one of the tables involved. It took a while to spot this, as the SQL would usually go rogue a while after the stats on the relevant tables had been gathered. This is because SQL statements are not invalidated immediately when the automated job re-gathers stats on the underlying segments; they are invalidated “a little later”. On Oracle 10.2 it seems to be within an hour of the gather, but not always. {To be totally accurate, this delayed invalidation is due to the DBMS_STATS parameter NO_INVALIDATE defaulting to the value DBMS_STATS.AUTO_INVALIDATE, but it can be overridden if you wish}

What seemed to be happening, though I never 100% proved it {so I am guessing, do not take this as a fact}, is that one table would have new stats, and the range of values for a column would include recent data {let’s say values 1 to 10,000,000}. Another table still had the old information, and so its range of known values was different {1 to 9,200,000}. This discrepancy would cause the wrong plan to be chosen. {I have a wilder theory, which is that the indexes for a table had a different range of values for a column than the table stats had, but there are problems with my theory}.
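To show why mismatched low/high values could matter, here is a deliberately simplified Python estimate for a predicate like "col > value". It assumes a uniform spread of values between the column's low and high, no histograms, and a floor of one row. That is my simplification, not Oracle's actual formula; the real optimizer has its own, more nuanced handling of values outside the known range. It uses the ranges from the paragraph above and a hypothetical "recent" value of 9,500,000.

```python
def rows_greater_than(value: float, low: float, high: float, num_rows: int) -> int:
    """Simplified estimate for 'col > value' from column low/high stats.

    Assumes values are spread uniformly between low and high (high > low),
    ignores histograms and any out-of-range adjustment, and never returns
    fewer than 1 row.
    """
    selectivity = (high - value) / (high - low)
    selectivity = min(1.0, max(0.0, selectivity))  # clamp to [0, 1]
    return max(1, round(num_rows * selectivity))


if __name__ == "__main__":
    recent = 9_500_000  # hypothetical "recent" key value in the predicate
    # Table with fresh stats: knows values up to 10,000,000.
    print(rows_greater_than(recent, 1, 10_000_000, 10_000_000))
    # Table with stale stats: thinks nothing exists above 9,200,000.
    print(rows_greater_than(recent, 1, 9_200_000, 9_200_000))
```

On these numbers the fresh table is estimated at about 500,000 matching rows and the stale one at a single row. A difference like that is easily enough to flip the join order or join method.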

By the time I logged in to the system in the morning to fix the overnight performance problem, stats on all relevant tables had finished being gathered and prompting the code to re-parse was the solution.

That leads me to the last point {sorry, a very long and wordy post again}.

Sometimes you can’t purge the cursor. The execution plan is stuck. Why?

Let’s say you have a cursor that is executed 100 times a minute. Normally each execution runs in less than 50 milliseconds. All is fine. It has now gone rogue and it is taking 5 seconds to run, as the execution plan is poor. Each execution completes, but at any time there is always at least one session running the code, usually several.
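The arithmetic behind "always at least one session running" can be sketched in Python. By Little's law, the average number of executions in flight is the arrival rate times the time each one takes. If you also assume arrivals are random (Poisson), which is my assumption and not something measured, the number in flight is Poisson distributed. The chance that nothing is running at a given moment is then exp(-mean).

```python
import math


def mean_in_flight(executions_per_minute: float, seconds_per_execution: float) -> float:
    # Little's law: average number in flight = arrival rate x time each one takes.
    return executions_per_minute / 60.0 * seconds_per_execution


def chance_nothing_running(executions_per_minute: float, seconds_per_execution: float) -> float:
    # Assumes Poisson arrivals: the number in flight is then Poisson distributed
    # with the mean above, so P(no execution in flight) = exp(-mean).
    return math.exp(-mean_in_flight(executions_per_minute, seconds_per_execution))


if __name__ == "__main__":
    for label, secs in (("normal", 0.05), ("rogue", 5.0)):
        print(label,
              round(mean_in_flight(100, secs), 3),
              round(chance_nothing_running(100, secs), 5))
```

At 100 executions a minute, the normal 50ms case averages about 0.08 executions in flight and is idle most of the time. The rogue 5 second case averages about 8.3 in flight and is idle only roughly 1 moment in 4,000, which is why an exclusive lock or a purge so rarely gets in.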

A SQL cursor will not be flushed from the shared pool if it is in use. Thus this rogue cursor gets stuck in the SGA. You can’t alter any of the objects, as you can never get an exclusive lock on them. You flush the shared pool in desperation {even on your live system} and the cursor stays there, as even alter system flush shared_pool will not flush out an in-flight cursor. You could try locking the table, but as with the DDL, you are likely never to get that lock.

In this situation you have to identify the application running the code, stop it and wait for all executions to finish. That may not be an issue in my example of 5 seconds to complete, but I’ve had the same problem with code run every few minutes that was now taking over an hour, and we could not wait an hour for it to sort itself out. In that situation we also had to kill sessions.

Of course, stopping and starting the database will cure the problem but the business may not be too happy about the whole database being shut down. {This prompts a thought – I’ve never noticed this but I wonder if you get the same problem on RAC but only impacting one node?}

This new package may help with stuck execution plans in that you can just keep trying over and over again to flush the one cursor until you hit a moment when no current execution is running.
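The "keep trying" approach is a plain retry loop. Here it is sketched in Python. try_purge is a hypothetical callable standing in for "run dbms_shared_pool.purge, then check v$sqlarea to see whether the cursor has gone". The purge call itself does not tell you whether it worked, so some such check is needed. The attempt limit and pause are arbitrary illustration values.

```python
import time
from typing import Callable


def purge_with_retries(try_purge: Callable[[], bool], attempts: int = 50,
                       pause_seconds: float = 0.2) -> int:
    """Call try_purge until it reports success; return the attempt number.

    try_purge is hypothetical: it stands for running dbms_shared_pool.purge
    and then checking v$sqlarea to see whether the cursor has really gone.
    """
    for attempt in range(1, attempts + 1):
        if try_purge():
            return attempt
        time.sleep(pause_seconds)  # give in-flight executions a chance to finish
    raise RuntimeError(f"cursor still in the shared pool after {attempts} attempts")


if __name__ == "__main__":
    # Fake purge that only "works" on the third try, when no execution is in flight.
    outcomes = iter([False, False, True])
    print(purge_with_retries(lambda: next(outcomes), pause_seconds=0))
```

Short, frequent attempts make it more likely that one of them lands in a gap between executions. The limit stops the loop from spinning for ever against code that never goes idle.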