Memory Changes Everything July 12, 2010

Posted by mwidlake in Architecture, performance.

I’ve got this USB memory stick which I use to carry around my scripts, documents, presentations, Oracle manuals and enough music to keep me going for a few days. It is an 8GB Gizzmo Junior and it is tiny. By tiny I mean as wide as my little finger, the length of a matchstick and about the same thickness as said matchstick. So small that I did indeed lose the damn thing for 6 months before I realised it had got trapped behind a credit card in my wallet.

It cost me ten British pounds about 15 months ago (less than most 4GB USB sticks seem to cost now, but then it is nothing more than the memory chip and connectors wrapped in plastic) and it highlights how cheap solid-state “storage” is becoming.

Connected to this, I was looking at buying a new PC this week and this machine comes with 10 USB slots, if you include the ones on the supplied monitor and stubs on the motherboard.
10 USB slots, 8GB gizzmo memory sticks… That would be 80GB of cheap and fast storage. Now get a few USB hubs and bulk-buy a few dozen cheap USB2 sticks and you could soon have a solid-state database of a few hundred GB for a thousand pounds. Then of course you can have fun seeing where the pinch-points in the system are (USB2 has a maximum speed per port and going USB3 right now is going to break that 1 grand barrier. But give it a year…).

This really started me thinking about when memory-based storage would take over from spinning disk as the best option for enterprise-level storage, and my gut feeling is in about 5 years. I think it will be both technically possible and financially viable in much less than that, say as little as 2 years; the cost of solid-state storage per MB will still be higher than disk by then, but it will potentially be much faster. A few considerations going through my mind were:-

  • Disk is getting a lot slower in relation to acreage. By this I mean that, for a single disc drive, capacity is doubling about every 18 months but seek time has hardly reduced in a decade, and transfer rate (reading from the physical platters to the unit’s buffer) is again almost stationary, at about 120MB/s for a 10,000rpm disk and up towards 180MB/s for those very expensive and noisy 15,000rpm disks. Being a tad ridiculous to make the point, with modern 3TB disks you could build most Oracle databases on one disc. Let’s make it two in a RAID 10 configuration for redundancy. My point is, your 3TB database could well be running right now, for real, across say 5 physical disks with a total sustainable physical throughput of around 500MB a second.
  • Solid state storage seems to be halving in price in more like 8-10 months.
  • IO subsystems are made faster by using RAID so that several physical discs can contribute to get towards the 300MB/s or so speed of the interface – but solid state is already that fast.
  • IO subsystems are made faster by building big caches into them and pre-fetching data that “might” be requested next. Oh, that is kind of solid state storage already.
  • Solid state storage, at least the cheap stuff in your USB stick, has the problem that you can only write to each bit a thousand or so times before it starts to get unreliable. But physical disk has exactly the same issue.
  • There are new methods of solid-state memory storage coming along – “New Scientist” had a nice article on it a few months ago, and these versions will be even higher density and more long-term reliable.
  • Seek time on solid-state memory is virtually zero, so random IO is going to be particularly fast compared to spinning disk.

Solid-state memory needs less power, and thus less cooling, is silent, is potentially denser and is less vulnerable to temperature and humidity fluctuations. I can see it not needing to be kept in a specialist server room, with all that air con and the ear defenders you need when you go in the room.
Just somewhere with normal air con and a lock on the door should suffice.
We do not need solid-state storage to match the size of current disks, or even to be as cheap, for it to take over. As I have already pointed out, it is not acreage you need with physical disks but enough spindles and caches to make the storage fast enough in relation to the space. Further, we can afford to pay more for solid state if we do not need to keep it in such expensive clean-room-like environments.

I can see that in a couple of years, for a given computer system – say a mixed-workload order processing system – the storage needs will be met by maybe a dozen solid-state chunks of storage, perhaps themselves consisting of several small units of memory in some sort of RAID for resilience, all able to flood the IO channels into our processing server. The issue will then be getting the network and IO channels into the server to go fast enough. So don’t: stick all the storage directly into the server. You just got rid of half your SAN considerations.

I’m going to stop there. Partly because I have run out of time and partly because, in checking out what I am writing, I’ve just spotted someone did a better job of this before me. Over to James Morle, who did a fantastic post on this very topic back in May. Stupid me for not checking out his blog more often. James also mentions that often it is not total throughput you are interested in at all but IOPS. That virtually-zero latency of solid-state memory is going to be great for supporting very high IOPS.

DBA_TAB_MODIFICATIONS July 2, 2010

Posted by mwidlake in internals, performance, statistics.

I’ve mentioned DBA_TAB_MODIFICATIONS (and there is also of course ALL_TAB_MODIFICATIONS and USER_TAB_MODIFICATIONS) in a couple of posts, but mostly buried in long, rambling posts about other things. So here is an overview and demonstration of DBA_TAB_MODIFICATIONS. Skip to the end if you just want to see an example of how it works.

The below example is from 10.2.0.3; for V11 it is pretty much the same, as is V9, except where I say otherwise.

SYS.DBA_TAB_MODIFICATIONS is used internally by Oracle to track how many inserts, updates and deletes have been done to a table, table partition or table subpartition since the segment last had stats gathered on it with dbms_stats. It also records if the segment has been truncated since the last stats gather. Why since the last stats gather? Because Oracle uses it internally to detect how much a segment has changed, and if it is 10% or more Oracle will deem it stale for stats-gathering purposes (under 11 you can alter the stale percentage). Note I said SYS.DBA_TAB_MODIFICATIONS – on 10 at least the public synonym for it is missing, so unlike most other DBA_ views you need to state the owner (or create a synonym).

OWNER.TABLE_NAME     INSERTS    UPDATES    DELETES TIMESTAMP         TRU
------------------- ---------- ---------- ---------- ----------------- ---
XXXXXXX.YYYYYYYYYYY   22598264          0          1 01-MAR-2010 02:08 NO
XXXXXXX.YYYYYYYYYYY          5          0          0 19-SEP-2007 01:47 NO
XXXXXXX.YYYYYYYYYYY     888766          0          0 01-MAR-2010 02:08 NO
XXXXXXX.YYYYYYYYYYY       3191       1486          0 27-NOV-2009 05:11 NO
XXXXXXX.YYYYYYYYYYY      34742          0          0 08-MAR-2010 15:16 NO
XXXXXXX.YYYYYYYYYYY          0       7192          0 02-JUL-2010 05:00 NO
XXXXXXX.YYYYYYYYYYY          0          1          0 10-MAR-2010 15:16 NO
XXXXXXX.YYYYYYYYYYY          8          8          8 26-JAN-2010 08:05 NO
XXXXXXX.YYYYYYYYYYY    1533536          0          2 01-MAR-2010 02:08 NO
XXXXXXX.YYYYYYYYYYY        281          0          0 11-SEP-2009 03:00 NO 
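
As an aside, under 11 that 10% staleness threshold can be changed per table via the statistics preferences. A minimal sketch, assuming an 11g database and a made-up table name:

-- 11g only: deem MY_BIG_TABLE stale at 5% change rather than 10%
exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'MY_BIG_TABLE', pname=>'STALE_PERCENT', pvalue=>'5')

-- check the current setting
select dbms_stats.get_prefs('STALE_PERCENT', user, 'MY_BIG_TABLE') from dual;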

Under Oracle 9 you have to set a table to MONITORING before this information is gathered. Under 10 and 11 all tables/partitions/subpartitions are monitored and you can’t turn that monitoring off {you can try, but Oracle ignores you :-) }.
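
For reference, the V9 syntax to switch monitoring on (or off) was simply this – table name made up:

-- Oracle 9: turn modification monitoring on for a table
alter table my_table monitoring;
-- and off again (from 10 onwards this is effectively ignored)
alter table my_table nomonitoring;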

The information can be very useful for checking how volatile a segment is and whether it has changed a lot since the last time stats were gathered on it. You can also add the values held in DBA_TAB_MODIFICATIONS to the NUM_ROWS value held for the segment and get a very accurate estimate of the current number of rows. It is a lot, lot faster than actually counting them!
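
Something along these lines gives you that quick estimate. This is just a sketch: the owner and table name are placeholders, and it only looks at the table-level rows, so for partitioned tables you would want to consider the partition-level rows too.

select t.num_rows + nvl(m.inserts,0) - nvl(m.deletes,0) as est_rows
from   dba_tables t
       left outer join sys.dba_tab_modifications m
       on  m.table_owner = t.owner
       and m.table_name  = t.table_name
       and m.partition_name is null
where  t.owner      = 'XXXXXXX'
and    t.table_name = 'YYYYYYYYYYY';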

The information on inserts, updates and deletes is gathered for pretty much all DML against tables (see an up-coming post for an example of this not being true). Direct-load SQL*Loader and other direct-path activity can skip being recorded, but insert-append, using the /*+ append */ hint, is recorded correctly {I suspect this was not true for V9 and 10.1 but am no longer sure}. This information is initially held in memory and only later pushed into DBA_TAB_MODIFICATIONS, so you may not see the latest information. Under Oracle 9 this information is flushed down every 15 minutes I believe, under 10.1 it is every 3 hours, and from 10.2 onwards the information is only flushed down when stats are gathered against the segment OR you manually flush the information down to the database.

Flushing the latest information is achieved with DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO. It is relatively quick to run, normally taking less than a few seconds.

When statistics are gathered on a segment, any corresponding rows in DBA_TAB_MODIFICATIONS are deleted, not updated to zeros, and are recreated only when relevant inserts, updates, deletes or truncates occur on the segment.

The DBA_TAB_MODIFICATIONS view sits on top of sys.mon_mods_all$ as well as obj$, user$ and the usual suspects. sys.mon_mods_all$ does not contain any more information than the view exposes.

desc sys.dba_tab_modifications
 Name                                                  Null?    Type
 ----------------------------------------------------- -------- ------------
 TABLE_OWNER                                                    VARCHAR2(30)
 TABLE_NAME                                                     VARCHAR2(30)
 PARTITION_NAME                                                 VARCHAR2(30)
 SUBPARTITION_NAME                                              VARCHAR2(30)
 INSERTS                                                        NUMBER
 UPDATES                                                        NUMBER
 DELETES                                                        NUMBER
 TIMESTAMP                                                      DATE
 TRUNCATED                                                      VARCHAR2(3)
 DROP_SEGMENTS                                                  NUMBER

--View description
TEXT
-----------------------------------------------------------------------
SYS                            DBA_TAB_MODIFICATIONS                  9
select u.name, o.name, null, null,
       m.inserts, m.updates, m.deletes, m.timestamp,
       decode(bitand(m.flags,1),1,'YES','NO'),
       m.drop_segments
from sys.mon_mods_all$ m, sys.obj$ o, sys.tab$ t, sys.user$ u
where o.obj# = m.obj# and o.obj# = t.obj# and o.owner# = u.user#
union all
select u.name, o.name, o.subname, null,
       m.inserts, m.updates, m.deletes, m.timestamp,
       decode(bitand(m.flags,1),1,'YES','NO'),
       m.drop_segments
from sys.mon_mods_all$ m, sys.obj$ o, sys.user$ u
where o.owner# = u.user# and o.obj# = m.obj# and o.type#=19
union all
select u.name, o.name, o2.subname, o.subname,
       m.inserts, m.updates, m.deletes, m.timestamp,
       decode(bitand(m.flags,1),1,'YES','NO'),
       m.drop_segments
from sys.mon_mods_all$ m, sys.obj$ o, sys.tabsubpart$ tsp, sys.obj$ o2,
     sys.user$ u
where o.obj# = m.obj# and o.owner# = u.user# and
      o.obj# = tsp.obj# and o2.obj# = tsp.pobj#

-- underlying sys.$ table
desc sys.mon_mods_all$
 Name                                                  Null?    Type
 ----------------------------------------------------- -------- ------
 OBJ#                                                           NUMBER
 INSERTS                                                        NUMBER
 UPDATES                                                        NUMBER
 DELETES                                                        NUMBER
 TIMESTAMP                                                      DATE
 FLAGS                                                          NUMBER
 DROP_SEGMENTS                                                  NUMBER

Here is a demo of it in action:

TDB>-- clear down the test table.
TDB>-- if you do not have access to DBA_TAB_MODIFICATIONS change to ALL_TAB_MODIFICATIONS
TDB>drop table test_1 purge;

Table dropped.

TDB>create table test_1
  2  as select * from all_objects where rownum<1
  3  /

Table created.

TDB>select count(*) from test_1;

  COUNT(*)                                                                      
----------                                                                      
         0                                                                      

1 row selected.

TDB>-- WHAT IS IN DBA_TAB_MODIFICATIONS
TDB>select table_name,inserts,updates,deletes,truncated,timestamp
  2  from sys.dba_tab_modifications
  3  where table_owner=user and table_name= 'TEST_1'
  4  /

no rows selected

TDB>-- OK, let's flush down the information
TDB>--
TDB>exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO

PL/SQL procedure successfully completed.

TDB>-- Still nothing as no activity has occurred on the table.
TDB>select table_name,inserts,updates,deletes,truncated,timestamp
  2  from sys.dba_tab_modifications
  3  where table_owner=user and table_name= 'TEST_1'
  4  /

no rows selected

TDB>-- Now create some data
TDB>insert into test_1
  2  select * from dba_objects where rownum <= 100
  3  /

100 rows created.

TDB>commit;

Commit complete.

TDB>select count(*) from test_1;

  COUNT(*)                                                                      
----------                                                                      
       100                                                                      

1 row selected.

TDB>-- WHAT IS IN DBA_TAB_MODIFICATIONS
TDB>select table_name,inserts,updates,deletes,truncated,timestamp
  2  from sys.dba_tab_modifications
  3  where table_owner=user and table_name= 'TEST_1'
  4  /

no rows selected

TDB>-- OK, let's flush down the information
TDB>--
TDB>exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO

PL/SQL procedure successfully completed.

TDB>--
TDB>select table_name,inserts,updates,deletes,truncated,timestamp
  2  from sys.dba_tab_modifications
  3  where table_owner=user and table_name= 'TEST_1'
  4  /

TABLE_NAME      INSERTS    UPDATES    DELETES trnc TIMESTAMP                    
------------ ---------- ---------- ---------- ---- -----------------            
TEST_1              100          0          0 NO   02-JUL-2010 10:21            

1 row selected.

TDB>-- The information in DBA_TAB_MODIFICATIONS is used by Oracle to detect if a table
TDB>-- (or partition) in a table is stale - changed by 10%
TDB>-- Gathering statistics on an object DELETES the record from DBA_TAB_MODIFICATIONS
TDB>-- rather than setting all the values to zero.
TDB>--
TDB>exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'TEST_1')

PL/SQL procedure successfully completed.

TDB>--
TDB>select table_name,inserts,updates,deletes,truncated,timestamp
  2  from sys.dba_tab_modifications
  3  where table_owner=user and table_name= 'TEST_1'
  4  /

no rows selected

TDB>-- now do some activity again and flush it down to the dictionary
TDB>insert  into test_1
  2  select * from dba_objects where rownum <= 150
  3  /

150 rows created.

TDB>exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO

PL/SQL procedure successfully completed.

TDB>select table_name,inserts,updates,deletes,truncated,timestamp
  2  from sys.dba_tab_modifications
  3  where table_owner=user and table_name= 'TEST_1'
  4  /

TABLE_NAME      INSERTS    UPDATES    DELETES trnc TIMESTAMP                    
------------ ---------- ---------- ---------- ---- -----------------            
TEST_1              150          0          0 NO   02-JUL-2010 10:21            

1 row selected.

TDB>-- Direct inserts via insert-append are also captured (I think this might have changed)
TDB>insert /*+ append */ into test_1
  2  select * from dba_objects where rownum <= 170
  3  /

170 rows created.

TDB>exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO

PL/SQL procedure successfully completed.

TDB>select table_name,inserts,updates,deletes,truncated,timestamp
  2  from sys.dba_tab_modifications
  3  where table_owner=user and table_name= 'TEST_1'
  4  /

TABLE_NAME      INSERTS    UPDATES    DELETES trnc TIMESTAMP                    
------------ ---------- ---------- ---------- ---- -----------------            
TEST_1              320          0          0 NO   02-JUL-2010 10:21            

1 row selected.

TDB>-- now a mixed bag of inserts, updates and deletes
TDB>insert into test_1
  2  select * from dba_objects where rownum <= 75
  3  and owner not in ('SYS','SYSTEM')
  4  /

75 rows created.

TDB>--
TDB>update test_1 set created=sysdate
  2  where object_type !='TABLE'
  3  /

289 rows updated.

TDB>delete from test_1
  2  where object_type='SEQUENCE'
  3  /

10 rows deleted.

TDB>commit;

Commit complete.

TDB>--
TDB>exec DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO

PL/SQL procedure successfully completed.

TDB>select table_name,inserts,updates,deletes,truncated,timestamp
  2  from sys.dba_tab_modifications
  3  where table_owner=user and table_name= 'TEST_1'
  4  /

TABLE_NAME      INSERTS    UPDATES    DELETES trnc TIMESTAMP                    
------------ ---------- ---------- ---------- ---- -----------------            
TEST_1              395        289         10 NO   02-JUL-2010 10:21            

1 row selected.

If you want to see what else I said about DBA_TAB_MODIFICATIONS then the end of this {way too long} post shows how to use it to get a very fast count of rows in a table, and if you have issues with permissions when flushing database monitoring information, see this link about the permissions needed.

Question: as the data gets flushed from memory to a data dictionary table, it persists across the database being bounced. What happens to the in-memory data about table changes when there is a controlled shutdown, and when the database crashes?

Friday Philosophy – The power of cooperation June 27, 2010

Posted by mwidlake in Friday Philosophy, Perceptions, performance.

Being the person responsible for the performance of an Oracle-based system can be an oddly lonely working experience. It can also be a very gregarious position as you get to meet a lot of people and discuss lots of different aspects of many systems – all systems seem to have performance issues and so people come along to you, hoping you can identify a “work faster” database setting for them.

But you are often the only person who specialises in Oracle performance. You generally need to be in an organisation that is very large, or where performance is key to success, for there to be justification for dedicating a person to the topic. To have more than one person dedicated to performance your organisation has to have a very strong focus on getting the best performance out of its Oracle and related systems {or really, really atrocious performance issues :-) }. So usually there is no one else around, as experienced as yourself or more so, to discuss such things with or ask for a second opinion.

Which is why I am very lucky at the moment. I’m working in a team of Oracle performance people. There are 2.5 of us (one is a manager with other responsibilities, so he only counts as half). Being able to say “Hey, Dave, what do you think of the wait times on scattered reads?” or “How in heck do I force this nested subquery on a view to use a hash join?” and get some other ideas is very valuable.

What is also interesting is how opinions and preferred techniques on tuning can be different and just as valid. As an example, last week I was working on a poorly performing statement. I was at home and it was the evening, so I was not communicating with the rest of the team. I managed to get the code down from 40 minutes to just under 20 by using a combination of LEADING and USE_HASH hints. I sent the code back to the user, only to see that, within thirty seconds of mine, my colleague Graeme had also sent the user a response, again getting the code down to around 20 minutes. Graeme had pulled a chunk of the code into a subquery factoring “WITH” clause and added cardinality hints. Totally different changes.
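
To give a flavour of the two styles, here is a purely illustrative sketch – the tables and hint values below are made up and nothing like the real statement:

-- roughly my approach: dictate the join order and join method
select /*+ leading(ord) use_hash(cust) */
       cust.name, ord.total
from   orders ord, customers cust
where  cust.cust_id = ord.cust_id
and    ord.placed_date > sysdate - 1;

-- roughly Graeme's approach: factor a chunk of the query out into
-- a WITH clause and tell the CBO how many rows to expect from it
with recent_orders as (
  select /*+ cardinality(ord 500) */
         ord.cust_id, ord.total
  from   orders ord
  where  ord.placed_date > sysdate - 1
)
select cust.name, ro.total
from   recent_orders ro, customers cust
where  cust.cust_id = ro.cust_id;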

So Graeme and I then had a “philosophical” discussion about the different changes {“Mine is best” – “No! Mine is, yours is all bloated and complex” – “Your hint is less future-flexible!!!”}. Only joking; we actually discussed the changes and why we each chose what we did. Graeme commented that it showed that tuning was an art and not a science, and I countered that it was a science, as we had both identified where we felt the execution plan could be improved but used different techniques to get there. The thing is, Oracle is so complex and has so many options to influence the optimiser that you have the flexibility to choose different tools and techniques.

We had both identified the same main improvement but had each come up with different tweaks for later in the plan.

The end result was that we went with Graeme’s main plan {he is bigger than me} but we pulled in my tweak. That brought the execution time down to around 10 minutes, so about four times faster overall and twice as fast as either of us managed alone. That is one of the advantages of not working alone.

We also then discussed how we could get this code down to seconds, either by using materialized views or by changing the process that generated the report to do so incrementally and store the daily results. Until one of us realised we had reached the boundary of compulsive tuning disorder. The report running in 10 minutes was enough improvement to satisfy the business, and the report was only going to be run over the next couple of months, so spending a day or two re-working it further was going to be fun – but of no advantage to the business. We told each other to finish for the day. So another advantage of not working alone is that not only do you get more technical input, you also help prevent each other losing sight of the overall aim.

It really does help to have two people working on the same area.

{There is a sneaky way of getting beyond being a lone performance specialist. If you are in an organisation long enough you can usually find some other idiot who is silly enough to want to learn more about performance and you can train them up. It does not take long before they know enough to start coming up with things you never thought of. Oracle is, after all, full of many ways to do the same thing and you can’t know it all}.

I can’t Explain Why June 8, 2010

Posted by mwidlake in internals, performance.

Have you ever tried to use Explain Plan and it gives you an error like the below (this is on 10.2)?

DWDBT1> set autotrace on

  1  select count(*)
  2  from person pers
  3  ,person_name pena
  4  where pena.pers_id=pers.pers_id
  5* and pena.surname='SMITH'
DWDBT1> /

  COUNT(*)
----------
     23586
1 row selected.

Execution Plan
----------------------------------------------------------
ERROR:
ORA-00604: error occurred at recursive SQL level 1
ORA-00959: tablespace 'USER_TEMP' does not exist

SP2-0612: Error generating AUTOTRACE EXPLAIN report

I seem to run into this once a year or so and, as it has happened three times now, this time around my brain had at long last hung onto the details and I remembered the cause. As a clue, I run into this issue when I am using a new development database that was derived from a live one…

As you can see from the above error message, tablespace USER_TEMP does not exist. How can you be referencing a tablespace that does not exist? First thing, does the tablespace really not exist and thus the error is real?

DWDBT1> @ts_lst
Enter value for ts_name: u
old 12: where tablespace_name like upper('&ts_name'||'%')
new 12: where tablespace_name like upper('u'||'%')
Any Key>;

TS_NAME                                              INI_EXT_K NEXT_EXT_K MIN_EX
--------------------------------------------------- ---------- ---------- ------
    MAX_EX PCT       MIN_EXTLN ST
---------- --------- --------- ---------
UNDOTBS                                                     64                 1
2147483645                  64 ON
USERS                                                       40         40      1
2147483645 0                40 ON

2 rows selected.

As you can see, there is no tablespace USER_TEMP. So it must be something to do with PLAN_TABLE, the table that underlies EXPLAIN PLAN. So let’s check out that the table exists.

DWDBT1> desc plan_table
 Name                                                        Null?    Type
 ----------------------------------------------------------- -------- ---------------------
 STATEMENT_ID                                                         VARCHAR2(30)
 PLAN_ID                                                              NUMBER
 TIMESTAMP                                                            DATE
 REMARKS                                                              VARCHAR2(4000)
 OPERATION                                                            VARCHAR2(30)
 OPTIONS                                                              VARCHAR2(255)
 OBJECT_NODE                                                          VARCHAR2(128)
 OBJECT_OWNER                                                         VARCHAR2(30)
 OBJECT_NAME                                                          VARCHAR2(30)
 OBJECT_ALIAS                                                         VARCHAR2(65)
 OBJECT_INSTANCE                                                      NUMBER(38)
 OBJECT_TYPE                                                          VARCHAR2(30)
 OPTIMIZER                                                            VARCHAR2(255)
 SEARCH_COLUMNS                                                       NUMBER
...

Yes, the table exists. Or a view or something that looks like a table anyway. Let’s check further.

DWDBT1> @obj_lst
Enter value for obj_name: plan_table
old   8: where object_name like upper(nvl('&obj_name','WHOOPS')||'%')
new   8: where object_name like upper(nvl('plan_table','WHOOPS')||'%')
Enter value for sub_name: 
old   9: and   nvl(subobject_name,'WHOOPS') like upper(nvl('&sub_name','WHOOPS')||'%')
new   9: and   nvl(subobject_name,'WHOOPS') like upper(nvl('','WHOOPS')||'%')
Enter value for obj_type: %
old  10: and object_type like upper(nvl('&obj_type','TABLE')||'%')
new  10: and object_type like upper(nvl('%','TABLE')||'%')
Any Key>

OWNER      OBJECT_NAME          SUBOBJECT_NA OBJ_TYPE     CRE_DATE        LAST_DDL
---------- -------------------- ------------ ------------ --------------- ------------------------
PUBLIC     PLAN_TABLE                        SYNONYM      10-JUL-07       10-JUL-07 12:19:14
SYS        PLAN_TABLE$                       TABLE        10-JUL-07       10-JUL-07 12:19:14

If you are not aware, in V10 PLAN_TABLE was replaced with a global temporary table PLAN_TABLE$ plus a public synonym PLAN_TABLE referencing it, which is what you see above. If I quickly pull out a few details of the table, you can see that it is temporary and that it has no tablespace allocated to it.

 1 select table_name,status,temporary TEMP,tablespace_name
2 from dba_tables
3* where owner=user and table_name ='PLAN_TABLE$'
DWDBT1> /
TABLE_NAME                     STATUS   TEMP TABLESPACE_NAME
------------------------------ -------- ---- ------------------------------
PLAN_TABLE$                    VALID    Y

The temporary table segment goes into the user’s temporary tablespace (and we are getting close to the cause of the error now, honest). Here is a subset of user details:

USERNAME        USER_ID DFLT_TABSPACE
------------ ---------- ---------------------------------------------
TEMP_TABSPACE                                 CREATED
--------------------------------------------- -----------------
XXADMIN             413 WORKING_128K
USER_TEMP                                     07-DEC-2009 17:34
XXRED               134 DW_COMMON_MG
BATCH_TEMP                                    07-DEC-2009 17:29
DWABCD              414 USERS
USER_TEMP                                     07-DEC-2009 17:3

DWABCD is the user I log in as and it has the temp tablespace set to USER_TEMP, from the original error message. But if I now check for what temporary files are on the system then I see the below:

DWDBT1> select * from dba_temp_files
2 /
Any Key...>

FILE_NAME
----------------------------------------------------------------------------------------------------
FILE_ID    TABLESPACE_NAME                BYTES      BLOCKS     STATUS     RELATIVE_FNO AUT
---------- ------------------------------ ---------- ---------- --------- ------------ ---
MAXBYTES   MAXBLOCKS  INCREMENT_BY USER_BYTES USER_BLOCKS
---------- ---------- ------------ ---------- -----------
/u01/oradata/dwDBT1/temp01.tdbf
1          TEMP                            5368709120 655360    AVAILABLE 1             YES
3.4360E+10  4194302   1280          5367660544  655232
/u01/oradata/dwDBT1/batch_temp01.tdbf
2          BATCH_TEMP                      104857600  12800     AVAILABLE 1             YES
3.4360E+10  4194302   1280          103809024  12672

2 rows selected.

Only two and neither are called USER_TEMP.

So, the error is occurring when the call to Explain Plan is trying to generate a temporary segment in the non-existent temp tablespace. The fix is to simply set the TEMPORARY tablespace for the user to one that exists {or I guess you could create a new temporary tablespace of the correct name}:

DWDBT1> alter user dwabcd temporary tablespace temp;

User altered.

DWDBT1> select count(*) from person;
Any Key...>

COUNT(*)
----------
322798

1 row selected.

Elapsed: 00:00:03.96

Execution Plan
----------------------------------------------------------
Plan hash value: 1154882994

---------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Cost (%CPU)| Time     |
---------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |     1 |  2417   (1)| 00:01:06 |
|   1 |  SORT AGGREGATE    |        |     1 |            |          |
|   2 |   TABLE ACCESS FULL| PERSON |   328K|  2417   (1)| 00:01:06 |
---------------------------------------------------------------------

What is the cause of the problem? Well, Oracle will not let you set the temporary tablespace of a user to a non-existent temporary tablespace, as you can see by my attempt to set my user back to the invalid value:

DWDBT1> alter user dwabcd temporary tablespace USER_TEMP
2 /
alter user dwabcd temporary tablespace USER_TEMP
*
ERROR at line 1:
ORA-00959: tablespace 'USER_TEMP' does not exist

But it will import users from another system where the temporary tablespace exists and where users have it set as their default temporary tablespace. This is why I run into the problem every year or so: I find myself helping out with the creation of development databases from a live database, and this mismatch in temporary tablespaces seems to occur regularly.
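
A quick way to spot users caught out like this, ie users whose temporary tablespace no longer exists, is something like the below (run as a DBA):

select username, temporary_tablespace
from   dba_users
where  temporary_tablespace not in
       (select tablespace_name from dba_tablespaces);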

Dynamic Sampling Hint Ignored May 20, 2010

Posted by mwidlake in internals, performance.

This is a bit of an odd one. On V10.2.0.3 (I have not tried other versions yet) the dynamic sampling hint appears to be ignored if the stats for the table say there are zero rows in the table – even if there are in fact lots of rows.

I’ve not had a chance to dig into Metalink etc yet {I’m putting together this post before I yet again get overwhelmed and fail to put up something I think is of interest} but if this turns out to be a general bug impacting other versions, it could catch people out. This is because one tactic to handle queries where the data volume shifts all the time is to not gather stats and use the dynamic sampling hint. “Not gathering stats” could also be “gather stats when the table is empty and ignore it – the dynamic sampling hint will take care of everything”.

Here is my worked example. I use two tables, PERSON and PERSON_NAME, related on a column PERS_ID. {If you want the exact table definitions etc, mail me}.

--preserve my data
create table pena_backup as select * from person_name;

Table created.

--ensure I have stats
exec dbms_stats.gather_table_stats(ownname=>user,tabname =>'PERSON')
PL/SQL procedure successfully completed.

exec dbms_stats.gather_table_stats(ownname=>user,tabname =>'PERSON_NAME')
PL/SQL procedure successfully completed.

--Show I have data
select table_name,num_rows,last_analyzed from dba_tables
where owner=USER  AND TABLE_NAME IN ('PERSON','PERSON_NAME');

TABLE_NAME                       NUM_ROWS LAST_ANALYZED
------------------------------ ---------- -----------------
PERSON                              71913 20-MAY-2010 11:06
PERSON_NAME                        107567 20-MAY-2010 11:06

--Explain a simple select
explain plan for
select /* */ pers.surname, pers.first_forename
from    person pers, person_name pena
where pena.pers_id =pers.pers_id
and    pena.surname = 'BROWN'
/ 
Explained.

select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
Plan hash value: 2653984949

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |  1427 | 45664 |   637   (5)| 00:00:03 |
|*  1 |  HASH JOIN         |             |  1427 | 45664 |   637   (5)| 00:00:03 |
|*  2 |   TABLE ACCESS FULL| PERSON_NAME |  1427 | 17124 |   320   (5)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| PERSON      | 71913 |  1404K|   314   (4)| 00:00:02 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("PENA"."PERS_ID"="PERS"."PERS_ID")
   2 - filter("PENA"."SURNAME"='BROWN')

16 rows selected.

--Now explain with dynamic sampling
explain plan for
select /*+ dynamic_sampling (pena 6) */ pers.surname, pers.first_forename
from    person pers, person_name pena
where pena.pers_id =pers.pers_id
and    pena.surname = 'BROWN'
/
Explained.

select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
Plan hash value: 2653984949

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |  1427 | 45664 |   637   (5)| 00:00:03 |
|*  1 |  HASH JOIN         |             |  1427 | 45664 |   637   (5)| 00:00:03 |
|*  2 |   TABLE ACCESS FULL| PERSON_NAME |  1427 | 17124 |   320   (5)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| PERSON      | 71913 |  1404K|   314   (4)| 00:00:02 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("PENA"."PERS_ID"="PERS"."PERS_ID")
   2 - filter("PENA"."SURNAME"='BROWN')

Note
-----
   - dynamic sampling used for this statement

20 rows selected.

-- Note the appearance of the NOTE about dynamic sampling
-- If I did this again, I would add extra data to the table to get different stats
-- from the dynamic sampling


--truncate my person_name table and re-gather stats

truncate table person_name;
Table truncated.

exec dbms_stats.gather_table_stats(ownname=>user,tabname =>'PERSON_NAME')
PL/SQL procedure successfully completed.

TABLE_NAME                       NUM_ROWS LAST_ANALYZED
------------------------------ ---------- -----------------
PERSON                              71913 20-MAY-2010 11:06
PERSON_NAME                             0 20-MAY-2010 11:06


--now try and dynamic sample a known empty table
explain plan for
select /*+ dynamic_sampling (pena 6) */ pers.surname, pers.first_forename
from    person pers, person_name pena
where pena.pers_id =pers.pers_id
and    pena.surname = 'BROWN'; 
Explained.

select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
Plan hash value: 1295262714

---------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |     1 |    50 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID  | PERSON      |     1 |    20 |     2   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                |             |     1 |    50 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| PERSON_NAME |     1 |    30 |     1   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | PENA_SN     |     1 |       |     1   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN           | PERS_PEID   |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("PENA"."SURNAME"='BROWN')
   5 - access("PENA"."PERS_ID"="PERS"."PERS_ID")

18 rows selected.

-- I initially thought Oracle might be detecting that the 
-- table was empty - never had data inserted or had been truncated
-- and thus knew there was no point sampling...But...
 
-- put the data back
insert into person_name select * from pena_backup;

107567 rows created.

commit;
Commit complete.

--and explain again

explain plan for
select /*+ dynamic_sampling (pena 6) */ pers.surname, pers.first_forename
from    person pers, person_name pena
where pena.pers_id =pers.pers_id
and    pena.surname = 'BROWN';
Explained.

select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
Plan hash value: 1295262714

---------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |     1 |    50 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID  | PERSON      |     1 |    20 |     2   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                |             |     1 |    50 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| PERSON_NAME |     1 |    30 |     1   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | PENA_SN     |     1 |       |     1   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN           | PERS_PEID   |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("PENA"."SURNAME"='BROWN')
   5 - access("PENA"."PERS_ID"="PERS"."PERS_ID")

18 rows selected.

-- Note, nothing about dynamic sampling, the rows and bytes expected are the same as
-- the query with no dynamic sampling hint

-- now gather stats again so oracle knows there is some data

exec dbms_stats.gather_table_stats(ownname=>user,tabname =>'PERSON_NAME') 
PL/SQL procedure successfully completed.

select table_name,num_rows,last_analyzed from dba_tables
where owner=USER  AND TABLE_NAME IN ('PERSON','PERSON_NAME');

TABLE_NAME                       NUM_ROWS LAST_ANALYZED
------------------------------ ---------- -----------------
PERSON                              71913 20-MAY-2010 11:06
PERSON_NAME                        107567 20-MAY-2010 11:09

--and now try dynamic sampling again

explain plan for
select /*+ dynamic_sampling (pena 6) */ pers.surname, pers.first_forename
from    person pers, person_name pena
where pena.pers_id =pers.pers_id
and    pena.surname = 'BROWN';
Explained.

select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
Plan hash value: 2653984949

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |  1427 | 45664 |   637   (5)| 00:00:03 |
|*  1 |  HASH JOIN         |             |  1427 | 45664 |   637   (5)| 00:00:03 |
|*  2 |   TABLE ACCESS FULL| PERSON_NAME |  1427 | 17124 |   320   (5)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| PERSON      | 71913 |  1404K|   314   (4)| 00:00:02 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("PENA"."PERS_ID"="PERS"."PERS_ID")
   2 - filter("PENA"."SURNAME"='BROWN')

Note
-----
   - dynamic sampling used for this statement

20 rows selected.

What prompted me to find this is that I had to provide a DYNAMIC_SAMPLING hint to a development team and I wanted to make sure it was working correctly first (it was a MERGE statement and I was not sure where best to put it). As you get no warnings about syntax errors with hints, it is best to prove they are correct before handing them over :-)
I was doing this in a dev system and, try as I might, I could not get the hint to work. Because, as I now know, the dev system had no data in the driving table and stats had been gathered on it. I wasted 30 minutes of a critical fix-on-fail due to this oddity.

I have not tried to see if DYNAMIC SAMPLING stated at the instance level is also ignored if there are zero-row stats against the table.
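
If you do get bitten by this, the obvious workaround is to get rid of the zero-row statistics so that the table reverts to being dynamically sampled. I have not gone back and re-run the test case above after doing this, so treat it as a suggestion rather than a proven fix:

exec dbms_stats.delete_table_stats(ownname=>user,tabname =>'PERSON_NAME')

-- optionally lock the (now absent) stats so an overnight gather job
-- does not put a misleading zero-row set straight back
exec dbms_stats.lock_table_stats(ownname=>user,tabname =>'PERSON_NAME')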

INTERNAL_FUNCTION() Impact April 21, 2010

Posted by mwidlake in internals, performance.

You may occasionally see something like the following in the filter predicates of an Explain Plan:

INTERNAL_FUNCTION("COL_1")>:P0

What is INTERNAL_FUNCTION, what is the use of INTERNAL_FUNCTION() in Oracle explain plans and why am I putting things like ORACLE INTERNAL_FUNCTION partition exclusion and meaning of internal_function performance almost as random strings in this blog?

Well, I want to get the hit rate up for this post, as when I did a quick search on INTERNAL_FUNCTION the current top hit is this unfortunately misleading posting that tells you “It is not a significant overhead in SQL execution performance”. This internal function IS potentially a significant overhead in SQL execution performance. You can’t add comments to the above site {just book a course or order a book}, so I’m having to create my own page…

INTERNAL_FUNCTION is, as the above posting says, where Oracle does an implicit data conversion. In the case shown, as well as in my example in this blog and in most cases I have seen, the conversion is from timestamp to date.

Why is it an issue? Because it is a function on a column and, as such, it can have a disastrous impact on SQL execution performance.

This code below ( which has most select columns removed for simplicity and to protect the innocent) was performing very poorly, taking over 30 minutes to complete. This is all on 10.2.0.3 Enterprise edition of Oracle.

SELECT trans0_.ID as ID3_1_
...
, round(transfers0_.AMOUNT,2) as formula10_0_
FROM VW_PMTT trans0_
WHERE  (trans0_.MODIFIED_DT between :p0 AND :p1
        AND trans0_.PERS_ID = :p2)
and transfers0_.ACCOUNT_ID=:p3
ORDER BY transfers0_.MODIFIED_DT

This is the plan and the significant filter/access predicates:

Plan hash value: 3261981189
------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name                 | Rows  |Bytes| Cost   |Pstart|Pstop|
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                      |       |     | 67841  |      |     |
|   1 |  SORT ORDER BY                    |                      |     8 | 920 | 67841  |      |     |
|   2 |   VIEW                            | VW_PMTP              |     8 | 920 | 67840  |      |     |
|   3 |    UNION-ALL                      |                      |       |     |        |      |     |
|*  4 |     FILTER                        |                      |       |     |        |      |     |
|   5 |      NESTED LOOPS                 |                      |     4 | 280 | 28490  |      |     |
|*  6 |       TABLE ACCESS BY INDEX ROWID | W_TBLP               |     1 |  14 |     3  |      |     |
|*  7 |        INDEX UNIQUE SCAN          | PK_W_TBLP            |     1 |     |     2  |      |     |
|   8 |       PARTITION RANGE ALL         |                      |     4 | 224 | 28487  |    1 | 1580|
|*  9 |        TABLE ACCESS BY LOC INd RID| W_PAYMENT            |     4 | 224 | 28487  |    1 | 1580|
|* 10 |         INDEX RANGE SCAN          | IDX_W_P_TRANS        | 66936 |     |  3345  |    1 | 1580|
...

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - filter("PLY"."ACCOUNTID"=:P3)
   7 - access("PLY"."PERS_ID"=:P2)
   9 - filter(("PT"."TYPE"=1184769 AND INTERNAL_FUNCTION("PT"."MODIFIED")>=:P0 AND INTERNAL_FUNCTION ("PT"."MODIFIED")<:P1

As you can see, the view is translated into a simple two-table join {and unioned with a second two-table join, which I removed again for clarity, but if anyone wants the full plan and statement, email me} where:

  • one table is accessed on a unique index (via bind variable P2)
  • the rows returned are filtered on P3
  • this table is then joined to a partitioned table in a nested loop
  • the partitioned table is accessed via a local index range scan

At first glance the plan may look fine, but look at the Pstart and Pstop: 1 to 1580. That is every partition in the table.
The predicate information shows that INTERNAL_FUNCTION("PT"."MODIFIED") is being compared to P0 and P1.

The partition key on the table is:-

>@chk_patr_key
Name of Table : W_PAYMENT

TAB_NAME                       OBJECT_TYP PSN COLUMN_NAME
------------------------------ ---------- --- -----------
USER.W_PAYMENT                 TABLE       1 MODIFIED

ie the column that is being filtered on. So why no partition pruning?

Partition pruning is not occurring because of the INTERNAL_FUNCTION being applied to that column. The CBO is not able to understand how the result of a function will match to the partition values of the column. Not even its own, internally used function :-P

For the above, the “date” bind variables were defined as

p0 timestamp := to_timestamp('01-FEB-2010 00:00','DD-MON-YYYY HH24:MI');
p1 timestamp := to_timestamp('03-FEB-2010 00:00','DD-MON-YYYY HH24:MI');

I then altered the code such that the bind variables were defined as dates.

p0 date := to_date('01-FEB-2010','DD-MON-YYYY');
p1 date := to_date('03-FEB-2010','DD-MON-YYYY');

The plan and significant predicates were now:

Plan hash value: 346162108
------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name                 | Rows  |Bytes| Cost   |Pstart|Pstop|
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                      |       |     | 67841  |      |     |
|   1 |  SORT ORDER BY                    |                      |     4 | 460 | 67841  |      |     |
|   2 |   VIEW                            | VW_PMTP              |     4 | 460 | 67840  |      |     |
|   3 |    UNION-ALL                      |                      |       |     |        |      |     |
|*  4 |     FILTER                        |                      |       |     |        |      |     |
|   5 |      NESTED LOOPS                 |                      |     2 | 140 | 28490  |      |     |
|*  6 |       TABLE ACCESS BY INDEX ROWID | W_TBLP               |     1 |  14 |     3  |      |     |
|*  7 |        INDEX UNIQUE SCAN          | PK_W_TBLP            |     1 |     |     2  |      |     |
|   8 |       PARTITION RANGE ALL         |                      |     2 | 112 | 28487  |  KEY | KEY |
|*  9 |        TABLE ACCESS BY LOC INd RID| W_PAYMENT            |     2 | 112 | 28487  |  KEY | KEY |
|* 10 |         INDEX RANGE SCAN          | IDX_W_P_TRANS        | 66936 |     |  3345  |  KEY | KEY |
...

Predicate Information (identified by operation id):
---------------------------------------------------

   6 - filter("PLY"."ACCOUNTID"=:P3)
   7 - access("PLY"."PERS_ID"=:P2)
   9 - filter(("PT"."TYPE"=1184769 AND "PT"."MODIFIED">=:P0 AND "PT"."MODIFIED"<:P1

The expected rows and bytes drop, but the key difference in the plan is that Pstart and Pstop are now KEY KEY, ie the CBO will evaluate the bind values at run time and partition prune the access of the table.

In this situation, the original query took over 30 minutes to return. The new code took 18 seconds.

I tried one final test. I set the bind variables to be varchar2 so that implicit data conversion would occur:
p0 varchar2(20) := '01-FEB-2010 00:00';
p1 varchar2(20) := '03-FEB-2010 00:00';

With VARCHAR2 the CBO was able to do the implicit conversion without the INTERNAL_FUNCTION and the plan/filter predicates were exactly as for native dates.

It would seem this issue with INTERNAL_FUNCTION blights timestamp to date implicit data conversion but not some other implicit data conversions. Without testing further, I can’t be sure which.
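
If you want to poke at this without a full application schema, a cut-down sketch along the following lines should show the effect. The table and dates are made up and I have not run this exact script, so treat it as a starting point:

-- a small range-partitioned table with a DATE partition key
create table demo_pay
(modified date
,padding  varchar2(100))
partition by range (modified)
(partition p_jan values less than (to_date('01-FEB-2010','DD-MON-YYYY'))
,partition p_feb values less than (to_date('01-MAR-2010','DD-MON-YYYY'))
,partition p_max values less than (maxvalue));

-- compare the DATE column to a TIMESTAMP: expect to see
-- INTERNAL_FUNCTION("MODIFIED") in the filter predicates and
-- Pstart/Pstop covering every partition
explain plan for
select count(*) from demo_pay
where  modified >= to_timestamp('02-FEB-2010','DD-MON-YYYY');

select * from table(dbms_xplan.display);

-- compare it to a DATE: expect a plain predicate on MODIFIED
-- and partition pruning
explain plan for
select count(*) from demo_pay
where  modified >= to_date('02-FEB-2010','DD-MON-YYYY');

select * from table(dbms_xplan.display);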

This is a documented feature by Oracle. Unusually (and much to their credit) Oracle even document the potential negative impact of this particular construct with timestamps and dates. See this section in chapter 5 of the Oracle 10g data warehousing guide.

I had better balance the fact I linked to an inaccurate posting on the subject with a few good ones.
This is a nice one from McLaud on the OraFAQ site, where he came across it as a problem and, with help, shows how it is due to implicit conversion of a timestamp from Hibernate.
This blog on the issue arising with Java is a nice example of the fix (use a date to hold a date, not a timestamp).

Finally, this link is to an older but very, very valid rant by Tom Kyte about implicit data conversion and using the correct data type for your data. {Don’t go using numbers or varchars to hold dates, just don’t, and similarly you should avoid implicit data conversion}.

Performance Tipping Points April 13, 2010

Posted by mwidlake in performance, statistics.

This week I came across a nice example of a performance tipping point. This is where Everything is OK until you reach a point where it all quickly cascades to Not OK.

The below shows the timings for a regularly run “Alert”. Four times an hour we want to know if something quite important has happened on the system.

ID S START_DATE END_DATE DUR TIME_RAN START_PARAM END_PARAM
---- - ------------- ------------- --------- ------------- -------------------- -----------------
292 S 0410 03:06:00 0410 03:06:31 .000359 0410 03:06:00 20100410 02:51:00 20100410 03:06:00
292 S 0410 03:21:00 0410 03:21:35 .000405 0410 03:21:00 20100410 03:06:00 20100410 03:21:00
292 S 0410 03:36:00 0410 03:36:38 .000440 0410 03:36:00 20100410 03:21:00 20100410 03:36:00
292 S 0410 03:51:00 0410 03:51:33 .000382 0410 03:51:00 20100410 03:36:00 20100410 03:51:00
292 S 0410 04:06:00 0410 04:06:28 .000324 0410 04:06:00 20100410 03:51:00 20100410 04:06:00
292 S 0410 04:21:00 0410 04:21:44 .000509 0410 04:21:00 20100410 04:06:00 20100410 04:21:00
292 S 0410 04:36:00 0410 04:36:27 .000313 0410 04:36:00 20100410 04:21:00 20100410 04:36:00
292 S 0410 04:51:00 0410 04:51:34 .000394 0410 04:51:00 20100410 04:36:00 20100410 04:51:00
292 S 0410 05:06:00 0410 05:06:44 .000509 0410 05:06:00 20100410 04:51:00 20100410 05:06:00
292 S 0410 05:21:00 0410 05:21:43 .000498 0410 05:21:00 20100410 05:06:00 20100410 05:21:00
292 S 0410 05:36:00 0410 05:37:01 .000706 0410 05:36:00 20100410 05:21:00 20100410 05:36:00
292 S 0410 05:51:00 0410 05:52:21 .000938 0410 05:51:00 20100410 05:36:00 20100410 05:51:00
292 S 0410 06:06:00 0410 06:08:09 .001493 0410 06:06:00 20100410 05:51:00 20100410 06:06:00
292 S 0410 06:21:01 0410 06:24:00 .002072 0410 06:21:01 20100410 06:06:00 20100410 06:21:01
292 S 0410 06:36:00 0410 06:40:12 .002917 0410 06:36:00 20100410 06:21:01 20100410 06:36:00
292 S 0410 06:51:00 0410 06:56:54 .004097 0410 06:51:00 20100410 06:36:00 20100410 06:51:00
292 S 0410 07:06:00 0410 07:13:17 .005058 0410 07:06:00 20100410 06:51:00 20100410 07:06:00
292 S 0410 07:21:00 0410 07:29:42 .006042 0410 07:21:00 20100410 07:06:00 20100410 07:21:00
292 S 0410 07:36:00 0410 07:47:48 .008194 0410 07:36:00 20100410 07:21:00 20100410 07:36:00
292 S 0410 07:51:00 0410 08:08:07 .011887 0410 07:51:00 20100410 07:36:00 20100410 07:51:00
292 S 0410 08:08:07 0410 08:29:43 .015000 0410 08:08:07 20100410 07:51:00 20100410 08:08:07
292 S 0410 08:29:43 0410 08:50:10 .014201 0410 08:29:43 20100410 08:08:07 20100410 08:29:43
292 S 0410 08:50:10 0410 09:22:28 .022431 0410 08:50:10 20100410 08:29:43 20100410 08:50:10
292 S 0410 09:22:28 0410 10:27:11 .044942 0410 09:22:28 20100410 08:50:10 20100410 09:22:28
292 S 0410 10:27:11 0410 12:57:16 .104225 0410 10:27:11 20100410 09:22:28 20100410 10:27:11
292 F 0410 12:57:16 0410 14:50:26 .078588 0410 12:57:16 20100410 10:27:11 20100410 12:57:16
292 F 0410 14:50:26 0410 16:49:42 .082824 0410 14:50:26 20100410 10:27:11 20100410 14:50:26
292 F 0410 16:49:42 0410 19:06:10 .094769 0410 16:49:42 20100410 10:27:11 20100410 16:49:42

The first half-dozen records show program ID 292 running in half a minute or so. It varies a little, from just under 30 seconds to 45 seconds. Each run kicks off 15 minutes after the previous one and, if you check the START_PARAM and END_PARAM on the right of the listing, you can see that the report covers the previous 15 minutes, ie the period since the last run (including the running time of the last iteration).

Then, at 05:36 something happens. The execution takes a full minute. The next run takes 1 minute and 20 seconds. The next run takes over two minutes and each following execution takes a little longer and longer.

At 07:51, just over two hours later, something significant happens. The execution takes more than 15 minutes to run. This report is run every 15 minutes. I’m sure you can appreciate that this is a problem. What happens now depends on your architecture.

If you have a scheduler that simply kicks off the report every 15 minutes, at the next scheduled time (08:06 in my case) a new execution will start and you will have two versions of the report running for a couple of minutes. As time progresses this overlap increases. When the run time reaches 30 minutes you will start having three reports running at the same time, and soon four, five, six etc versions running at the same time. Depending on the number of CPUs and the IO bandwidth of your system, how long will it be until the machine is doing nothing but running this code?
If the code is doing something more than simply selecting data, the chance of the concurrent versions competing for locks or resources is high, which can mean the slow-down of the code escalates rapidly and, in a very, very short time, your system is on its knees.

In our case, we have some protection against this. The scheduler detects that the previous version has not finished and waits until it has done so before starting the next execution, so the next execution is delayed. In my example the 08:06 execution is delayed until 08:08, 2 minutes late.

We still have two possible situations. The report could simply continue to slow down at the previous rate and just be delayed longer and longer. In my example it has been slowing down at around 1 minute 20 seconds per run for the last 5 runs.

However, the next execution is further delayed, until 08:29 – which is 6 minutes beyond the 15-minute window from 08:08. This latest run of the code is around 4 and a half minutes slower than the previous execution, not 1 minute 20 seconds slower. We have passed a tipping point. If you remember, this report covers the period since the last execution. As the last execution took more than 15 minutes, the next run has to report on more than 15 minutes – 21 minutes in this case.

The slow-down now rapidly escalates due to not just whatever was slowing the report down initially but also the growing reporting period.

Just 5 runs and 5 hours later, at 12:57:16, the report fails. It now runs for so long that Oracle can no longer recreate the data as it looked at the start of the run and we get snapshot-too-old (ORA-01555) errors. In the listing you just see the second column go from S to F.

It is game over. Thankfully only one copy of the report is running at any time (a new execution is kicked off as soon as the previous one fails) and so, at worst, it is using only one process on the box, one CPU at most and the IO that a single thread can demand.

What was the root cause? Time, statistics and data volume.

  • Statistics, because the job that collects statistics had failed earlier.
  • Time, because at 05:50 or so the 15-minute run window had moved far enough beyond the range of date-times known to the column statistics for a partition that the CBO decided only a few rows would be found and swapped to a nested-loop execution plan.
  • Data volume, because data was arriving faster as the day progressed and the performance of the nested-loop plan was highly susceptible to increases in data volume.
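You can see how out of date the optimizer’s view of a date column is by checking when stats were last gathered and what it believes the highest value to be {a sketch – the owner, table and column names are invented}:

-- When were the partition's column stats last gathered, and what does the
-- optimizer believe the highest CREATED_DATETIME is? (HIGH_VALUE is stored RAW.)
SELECT partition_name, column_name, last_analyzed, high_value
FROM   dba_part_col_statistics
WHERE  owner       = 'MYAPP'
AND    table_name  = 'ORDER_EVENTS'
AND    column_name = 'CREATED_DATETIME';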

For an excellent post on why code suddenly changes its execution plan when nothing has happened but the passing of time, see this post and its comments on Richard Foote’s blog.

{And isn’t his blog so much skinnier than mine? :-) }

Friday Philosophy – I killed a presentation April 1, 2010

Posted by mwidlake in Friday Philosophy, performance, statistics.
Tags: , , ,
9 comments

Well, this week saw the latest Management and Infrastructure SIG. I won’t go into the SIG meeting itself just now, maybe later this weekend. If you want an opinion on it, you might like to see what Graham Oaks said.

Being Chair of the SIG I have to find presenters. Being also one of those rare people who enjoy presenting, I know I can always tap one person to present. Me. {If only I could sing, play an instrument, act, do impressions, put both my ankles behind my ears(*) or anything else of general interest, getting an audience would be so much easier}.

I decided to tackle a topic I have tried before and previously made a very poor show of: “General principles on gathering database statistics”.
It is not a title to strike joy into the soul, I know, but it really is important to getting good, reliable performance out of any modern Oracle database. Even other Oracle DBA types tend to find the topic worthy but boring, yet I seem to have been forced to know a lot about it and have become, sadly, very passionate about it.

So, I tried again. I mean, how long should it take to describe the issues with database statistics and the general principles of gathering them? I took my old presentation and stripped out all code, all SQL syntax, all listings of tables, and got it down to describing the process. Then I started adding the bits and pieces you need to know to get this half-right. 132 slides. Hmmmm.

I tried again and took it higher level. 48 slides. I can do 48 slides in something between 45 minutes and an hour. I was ready.

I had the last presentation slot of the day. I figure if I am going to foist myself on the audience, they should have the right to leave before I start, without missing any of the good stuff. I had 3/4 of the day’s attendees still with me when I started.

I think I killed my audience. Actually, I know I killed my audience. I lost a few as the slot progressed {I kidded myself they had to get back to the office/catch a train} and I made the fatal mistake of not shutting up when I had done 45 minutes. You see, there was all this important stuff I had not mentioned yet! Thankfully, I had a friend in the second row and when I saw him lose the will to live, I stopped. I’d been at it for 70 minutes. What we really needed was the Chair to shut me up, but I was the Chair.

Anyway, I came away from the talk with two realisations.

  • The topic of gathering database statistics is too large to cover in one presentation session.
  • It really is very boring, even when you are passionate about it.

I think I have no choice but to make this a blog thread (as I threatened to do about a month ago). But I have also promised myself to finish off SQL Audit before I start a new thread, and that was 3 months ago.

So I think I am going to have to put some time into this blogging lark. It’s OK though, I seem to go on about database statistics so much that even my cat is starting to avoid me. I’ll just stop watching Star Trek and drinking wine in the evenings and switch to preparing technical blogs. And drinking wine, of course.

(*) Up until 2007 I could do this. I get out more now. Despite the stats fixation.

Which Hints to Use March 18, 2010

Posted by mwidlake in performance.
Tags: , ,
2 comments

When tuning Oracle SQL code, which are the best Hints to use? Are there good and bad Hints? I believe so.

Firstly, I wish Oracle had called Hints something else. As many have commented, Hints are actually directives to the CBO. If a Hint is syntactically correct and can be applied, it will be applied. If you include an INDEX Hint to tell the CBO to use an existing index then, no matter how crazy it would be to do so, the CBO will use that index so long as doing so does not prevent the query from being logically possible.
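For example {the table, alias and index names are invented}:

-- The CBO will walk PER_SURNAME_IDX if it possibly can,
-- however poor a choice that index is for this predicate.
SELECT /*+ INDEX(p per_surname_idx) */ p.surname, p.date_of_birth
FROM   person p
WHERE  p.surname = 'WIDLAKE';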

That leads on to one of the reasons I do not actually like Hints {and these reasons lead to my definition of good and bad Hints}. So long as the Hint remains possible, it will be followed. It has become a vital part of the logic of the code, yet Hints are rarely reviewed.
Another reason, often mentioned, is that if new functionality is introduced to the Oracle Optimizer, Hints may prevent it being used {e.g. you might have hinted the use of an index, but now Oracle can convert two other, less selective B-tree indexes into bitmaps and merge them together to get a far more selective path – yet the Hint forces the use of the original index}.
The above is possible but rare compared to a far more common issue: you hinted a specific index, but if you now create another, more suitable index for that SQL statement, the new index will not be used – the hinted one still will be. Similarly, if you drop the index that the Hint references, the Hint becomes invalid and the CBO will choose a new access path. You are given no warning when you drop an index that Hints reference {and it would be very tricky for Oracle to do this for you in a reliable way}.

A final problem with Hints is that the access path may need to change as data volumes and relationships change. When there are very few rows in a driving table, a nested-loop access path may make sense. But if that driving table grows and has many more rows in it, then a hash join becomes more efficient. A Hint fixes the path in place.

I prefer Hints that give the CBO more information but allow it to still choose the path and to vary as data volumes change.

I like the DYNAMIC_SAMPLING Hint as it is generally just telling Oracle to look harder {more intelligently} at the statistical information, at the cost of spending a little longer on the parse. Most systems have OPTIMIZER_DYNAMIC_SAMPLING set at 1 or 2, so by default tables with no stats will have stats sampled at parse time. Hinting at level 3 or 4 instructs the CBO to verify the guesses it has made for predicate selectivity and to check for correlation between columns. It is probably the Hint I am most happy using.
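Something like this {the table and alias are invented}:

-- Sample the table at parse time, at level 4, to validate the optimizer's
-- guesses for these (possibly correlated) predicates.
SELECT /*+ DYNAMIC_SAMPLING(t 4) */ t.col_a, t.col_b
FROM   some_table t
WHERE  t.col_a = 'X'
AND    t.col_b = 'Y';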

In version 10.1 I encountered lots and lots of issues with the CBO trying to unnest and merge elements of the SQL into the main body of the query – and getting it wrong. Though they potentially stop the CBO from examining useful access paths, I do use NO_MERGE and NO_UNNEST quite often and I “like” them as they leave the rest of the decisions up to the optimizer. You are basically saying “leave that subquery alone and satisfy it in isolation”. I still encounter lots of such issues on 10.2, but there I also use UNNEST more often, to push a subquery into the body of the code.
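The Hint sits inside the subquery you want left alone {a sketch on the classic EMP/DEPT example tables}:

-- Satisfy the subquery in isolation rather than unnesting it into a join.
SELECT e.empno, e.ename
FROM   emp e
WHERE  e.deptno IN (SELECT /*+ NO_UNNEST */ d.deptno
                    FROM   dept d
                    WHERE  d.loc = 'LONDON');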

I am happier using a CARDINALITY Hint than USE_NL or USE_HASH, as the CARDINALITY Hint is self-documenting (it says exactly what correction {or lie} you are giving the optimizer). A USE_NL Hint is enforcing an access path without really saying why.
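{A sketch – the table names are invented}:

-- Tell the optimizer to assume the driving table returns about 100 rows,
-- then let it choose the join mechanism and order for itself.
SELECT /*+ CARDINALITY(drv 100) */ drv.id, big.payload
FROM   driving_table drv, big_table big
WHERE  big.id = drv.id;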

If specific Hints are going to be used, I like to be very specific. USE_HASH should state both tables that are to be hashed together {it might even be that modern versions of Oracle insist on the form USE_HASH (taba tabb) and not just USE_HASH (taba) – I never use the less specific form}.
{NB see the comment by Jonathan Lewis as to why I am utterly wrong on this one – USE_HASH basically says “use a hash join on the tables listed” but does not force a hash between the two stated tables}.
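So, bearing Jonathan’s caveat in mind, the two-table form looks like this {the table names are invented}:

-- Ask for a hash join to be used as each of the listed tables is joined
-- (it does not force ORD to be hashed directly to ITM - see the NB above).
SELECT /*+ USE_HASH(ord itm) */ ord.order_id, itm.item_id
FROM   orders ord, order_items itm
WHERE  itm.order_id = ord.order_id;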

I only Hint specific use of an index if I can’t fix the problem with better gathered stats. I don’t LIKE adding INDEX Hints, even though they are commonly used and easy to understand. For the reasons stated above, I do not like ordering the CBO to use one of the currently existing indexes.

I really do not like using the RULE Hint. In version 9 it was an uphill struggle to get people not to use it, as it so often fixed the immediate problem and, of course, Oracle used it so much for internal SQL (and still does in 10 and even in 11). How many current DBAs and Developers know what the Rule Based Optimizer rules are? {hands DOWN you over-40 lot}. Using the RULE Hint is bordering on homeopathy for databases. It seems to work, you have no idea why and, in reality, it may well be doing nothing, as you are probably using a feature of Oracle that is incompatible with the RBO.

I am very, very uncomfortable about some Hints. The BYPASS_UJVC Hint is one of them. It basically tells Oracle it can do a MERGE INTO statement without the unique constraint in place on the target table that allows the where clause to work reliably. You are telling the optimizer “just trust me” – i.e. you can lie horribly to the CBO.

All in all, I try and get the stats right rather than Hint. I’ll spend ten times as long trying to understand and fix (if I can) why the estimated costs and cardinalities in an Explain Plan are wrong rather than slapping in an INDEX Hint. I will use Hints if I can’t fix the plan via the stats, but I try and use the more generic Hints. I know from experience that fixing the stats {or at least understanding why I can’t} fixes more code than adding one Hint to one SQL statement.

A good rule of thumb is: if the cardinality is accurate for a step, the plan will be acceptable. {This is a rule of thumb, not a cast-iron truth!}
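An easy way to compare estimated and actual cardinalities per plan step is the GATHER_PLAN_STATISTICS Hint together with DBMS_XPLAN {the query itself is just a placeholder}:

-- Run the statement once, collecting row-source execution statistics...
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(*) FROM person;

-- ...then compare the E-Rows (estimated) and A-Rows (actual) columns.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));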

DBMS SIG March 10, 2010

Posted by mwidlake in internals, Meeting notes, performance.
Tags: , ,
5 comments

I went to the DBMS SIG today {DBMS Special Interest Group meeting of the UK Oracle User Group}. Don’t worry, I am not going to run through the set of presentations and make comments on them – although I like reading such entries by others on their blogs, I generally like them due to the style of writing as opposed to getting information out of them. But there is the odd tidbit that can be worth disseminating to as wide an audience as possible and I like to do my bit.

Having said I won’t go through all the talks… :-). No, I just want to comment that all the talks had merit at this meeting and, quite rightly, the meeting was full. This is nice to see, as last year SIG attendance fell across the board due to the “economic climate”. Very daft, in my opinion, as I see SIGs as free training, plus networking opportunities (sorry to use the “networking” word), plus different viewpoints – all of which are invaluable at any time and especially valuable when things are hard.

Go to SIGs (or whatever is available in your country) as it is always worth the day invested. Where else can you see half a dozen experts give their opinions, and corner them afterwards to ask them more?

Anyway, the tidbits. First, Jonathan Lewis demonstrated how he goes about solving issues with complex SQL statements. It was terribly fiendish and extremely clever and needs incredible in-depth knowledge of the Oracle RDBMS… He draws a pseudo Entity Relationship Diagram of the tables involved, adds in some figures on what filtering will achieve and what ratio of records will be involved in table joins, and asks the question “what will probably happen” a couple of dozen times. Yes, I lied, it does not need vast knowledge. It needs a clear, simple approach to solving the problem. And an ERD. I love ERDs and rue their general demise. I use exactly the same method myself to investigate complex SQL performance issues {I think I could be accused of trying to ride on shirt-tails here but, honestly, the step I take if an Explain Plan does not help me is to ERD the statement and look at the indexes and ratios between tables to see how I, as a human, would solve the query. Chatting to a few other old lags, it is a relatively universal approach among those of us who have used ERDs}. If you are a member of the UKOUG I strongly recommend downloading the slides to Jonathan’s talk. If you are not a member, maybe Jonathan will present it again at another venue, or you could get him to come along and do a course on tuning. {Jonathan, if you get a gig as a result of this, I want a pint of Marston’s Pedigree, OK?}

{And thanks to Sean Malloy for commenting to provide a link to a published version of Jonathan’s method – Jonathan did mention this, highlighting the fact that it is his first real foray into SQL*Server. However, the method is database agnostic. This is the article.}

Second tidbit. Adrian Dodman and Owen Ireland (who both look remarkably like Hollywood heartthrobs in their pictures, if rather different in the flesh – very decent chaps they are too) did an excellent talk on VLDB physical standbys, a topic that has particular resonance for me. They mentioned PARALLEL_EXECUTION_MESSAGE_SIZE. This defaults to 2K, on 10g at least, and it is a rubbish setting. No, let me not beat about the bush, it is an utterly rubbish setting. If you use parallel query, parallel recovery or parallel anything-at-all, check out this parameter and, due diligence allowing, increase it. Try 8K as opposed to 2K, or even 16K. The manual says the default of 2K/4K is fine. It ain’t. Increasing the value just takes some memory out of the shared pool and, these days, if you can’t afford a few extra KB out of your shared pool, you need to replace your server with something costing about twice as much as a top-end desktop PC. {Why am I so vigorous in my opinion on this parameter? Well, I had a situation a few months back of trying to migrate a database to a new geographic location for a client in Germany. We did a backup/recovery type operation to do this. Applying the redo logs was proving to be a performance issue, so Oracle Corp suggested parallel redo log application. It ran a LOT slower than single thread, about 300% slower. However, increasing PARALLEL_EXECUTION_MESSAGE_SIZE from 2K to 8K made the parallel log application about 400% faster than single thread – i.e. a dozen times faster than the parallel apply at the default setting. I know from presentations by Christian Antognini and discussions with others that it is a key parameter to getting parallel query to perform well too.}
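To check and change the parameter {it is a static parameter, so the change goes to the spfile and needs a restart – adjust the value to suit your own testing}:

-- What is it set to now?
SELECT value FROM v$parameter
WHERE  name = 'parallel_execution_message_size';

-- Raise it to 8K; takes effect at the next instance restart.
ALTER SYSTEM SET parallel_execution_message_size = 8192 SCOPE = SPFILE;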

Last tidbit. Don’t drop the OUTLN user. Yes, I know, why would you? Just don’t, OK? Especially on Oracle 11. If you do, for whatever reason, DO NOT SHUT DOWN THE DATABASE. Call Oracle Support and pray. Thanks go to Peter Mahaffey for that one. Yes he did. It all went terribly wrong for him.
