jump to navigation

Richard Foote on the Impact of stats staying the same. February 16, 2010

Posted by mwidlake in performance.
Tags: , ,
add a comment

I just wanted to highlight this very good posting by Richard Foote. He is talking about how SQL execution plans can change when “nothing else does”. Not the table structures, not the code, note the initialisation parameters, not even the table and index stats.

But something does change, which is what day it is (or what hour it is or what week it is). Time moves on and our data does too. If the statistics on the tables does NOT move on, then the CBO thinks that the range of data in the table does not change. So, the CBO thinks your queries are getting further and further “out of range” and so would expect to find less and less data in the tables to bring back. That will lead to plan changes.

If you have noticed my preoccupation with identifying the contents of histograms and high/low values in column stats, you may appreciate that this topic is one I have been finding is a large part of my day job.

Richard explains the point very well, as always, so go have a look.


Stats Need Stats to Gather Stats February 16, 2010

Posted by mwidlake in performance.
Tags: , ,

Did you know that you sometimes need good stats so that the stats-gathering package can gather stats in an efficient way? This is a recent, quite extreme example.

I’ve been forced to learn a lot about gathering Oracle stats using DBMS_STATS over the last 4 or 5 years. But no matter how much I learn about the “challengingly bizarre” way in which it works, it seems every week or two there is a new oddity. I plan a whole series on the topic “soon”.

This particular example is from a system.

I am gathering partition-only table stats as we are using the ability of Oracle to roll up Partition stats to Global stats under certain specific conditions. One of the conditions is that you need stats for every partition. Plus, to get global column stats, each partition must have stats for each column. Some of our partitions lacked or had very bad stats.

So I quickly knocked up a script-generating script to create DBMST_STATS.GATHER_TABLE_STATS statements that collected, for those partitions:

  • ONLY partition stats.
  • NO cascade down to indexes
  • BLOCK sampling so it is fast {and poor, but there you go}
  • ESTIMATE_PERCENT of 2, which is quite low for block sampling
  • collect histograms as we decided the devil of having them was better than the devil of not having them.

the above is not ideal to get “good stats”, but it is quick and gets OK stats which is what we need right now. An example statement is:

-- part SD_INFO-DY07032004 rows 34554                                                              
dbms_stats.gather_table_stats(ownname => 'ERIC',tabname=> 'SD_INFO'
,partname=> 'DY07032004'
,granularity=> 'PARTITION', estimate_percent => 2,block_sample=> true ,cascade=> FALSE
,method_opt=> 'FOR ALL COLUMNS SIZE AUTO' ,no_invalidate=> false);

Not a big partition (34554 rows and not very wide rows) and so did not take long to gather:

------------------------------ --------------------
DY04012004                     12-FEB-2010 16:26:57
DY05012004                     12-FEB-2010 16:27:00
DY06012004                     12-FEB-2010 16:27:04
DY07012004                     12-FEB-2010 16:27:07
DY08012004                     12-FEB-2010 16:27:11 -- 4 seconds difference
DY09012004                     12-FEB-2010 16:27:15

I’ve collected statistics for a few thousand partitions over the last couple of days and the time taken is anything between just under half a second to 10 seconds per partition, the odd unusually large partition taking a minute or so. {I believe it takes half a second to gather stats on an empty partition, on our system at least, due to the time it takes for the internal housekeeping, including copying the old statistics information to the SYS.WRI$_OPSTAT_… tables to support restoring stats}. Sorry, I drift away from my main point.

This partition took a lot longer than 10 seconds:

-- part W_ACTIVITY_FAILURE-DY02092008 rows 49425
dbms_stats.gather_table_stats(ownname => 'ERIC',tabname=> 'W_ACTIVITY_FAILURE'
,partname=> 'DY02092008'
,granularity=> 'PARTITION', estimate_percent => 2,block_sample=> true ,cascade=> FALSE
,method_opt=> 'FOR ALL COLUMNS SIZE AUTO' ,no_invalidate=> false);

After 10 minutes it was still running. WHY? I quickly checked the number of rows in the partition and then the size of the partition segment, incase either was much larger than I expected. Neither were.
select count(*) from eric.W_ACTIVITY_FAILURE partition (DY07092008)

From dba_segments.
———- ———-
2621440 320

There is one advantage of a DBMS_STATS statement running for a long time – you can grab from the SGA the actual code being executed for the DBMS_STATS statement. I saw this.

select substrb(dump(val,16,0,32),1,120) ep, cnt 
from (select /*+ no_parallel(t) no_parallel_index(t) dbms_stats cursor_sharing_exact
use_weak_name_resl dynamic_sampling(0) no_monitoring */
"ACCOUNT_ID" val,count(*) cnt 
and "ACCOUNT_ID" is not null 
group by "ACCOUNT_ID") 
order by val

DBMS_STATS is scanning the ACCOUNT_ID column on this table and it is taking a long time about it. The only index on the table is on ACCOUNT_ID. It then struck us.

The index is a global index.
—————————— ————-

And that global index actually lacked stats {another feature of DBMS_STATS and rolling up partition stats had led to that}.

————— ——— — —- ———- ———– ———– ———- ———- ——–

By this time 3 partitions for this table had been processed by my code, taking around 15 minutes each one. Incredibly slow.

I did a very quick 0.01% sample size DBMS_STATS.GATHER_INDEX_STATS on that index which took about 1 minute. As soon as the partition DBMS_STATS.GATHER_TABLE_STATS statement that was in flight finished, the following similar statements on that table’s partitions took under 3 seconds each. I’ll buy a pint for the first person to guess (within 10 minutes) WHEN I collected the global index stats {You can collect from any pub in central London if you give me a few day’s notice 🙂 }.

-------------- ---------- ---------- ---------------------
DY01092008          31461        452 16-FEB-2010 09:51:41
DY02092008          49425        686 16-FEB-2010 10:03:44
DY03092008          54472        719 16-FEB-2010 10:16:31
DY04092008          35762        491 16-FEB-2010 10:30:52
DY05092008          42182        587 16-FEB-2010 10:44:24
DY06092008          21186        312 16-FEB-2010 10:56:13
DY07092008          20898        313 16-FEB-2010 11:12:59
DY08092008         469500       1233 16-FEB-2010 11:13:02
DY09092008         480300        741 16-FEB-2010 11:13:04
DY10092008          15724        223 16-FEB-2010 11:31:01
DY11092008          55671        732 16-FEB-2010 11:31:06
DY12092008         820100       1779 16-FEB-2010 11:31:08
DY13092008          80215       1113 16-FEB-2010 11:31:16
DY14092008          10094        155 16-FEB-2010 11:31:18
DY15092008          10268        158 16-FEB-2010 11:31:20
DY16092008          24578        330 16-FEB-2010 11:31:22
DY17092008          21012        290 16-FEB-2010 11:31:24
DY18092008          22755        318 16-FEB-2010 11:31:27
DY19092008          21276        293 16-FEB-2010 11:31:29
DY20092008          20882        281 16-FEB-2010 11:31:31
DY21092008          24131        323 16-FEB-2010 11:31:34

I will investigate this oddity further if I get time this week, but the lesson to myself so far is:

Global indexes lacking stats can result in very long stats gathering times for partitions EVEN WHEN YOU DO NOT CASCADE DOWN TO INDEXES.