jump to navigation

Dynamic Sampling Hint Ignored May 20, 2010

Posted by mwidlake in internals, performance.
Tags: ,
27 comments

This is a bit of an odd one. On V10.2.0.3 (I have not tried other versions yet) then the dynamic sampling hint appears to be ignored if the stats for the table say there are zero rows in the table. Even if there are in fact lots of rows.

I’ve not had chance to dig into Metalink etc yet {I’m putting together this post before I yet again get overwhelmed and fail to put up something I think is of interest} but if this turns out to be a general bug impacting other versions, it could catch people out. This is because one tactic to handle queries where the data volumes shifts all the time is to not gather stats and use the dynamic sampling hint. “Not gathering stats” could also be “gather stats when the table is empty and ignore it – the dynamic sampling hint will take care of everything”.

Here is my worked example. I use two tables, PERSON and PERSON_NAME, related on a column PERS_ID. {If you want the exact table definitions etc, mail me}.

--preserve my data
create table pena_backup as select * from person_name;

Table created.

--ensure I have stats
exec dbms_stats.gather_table_stats(ownname=>user,tabname =>'PERSON')
PL/SQL procedure successfully completed.

exec dbms_stats.gather_table_stats(ownname=>user,tabname =>'PERSON_NAME')
PL/SQL procedure successfully completed.

--Show I have data
select table_name,num_rows,last_analyzed from dba_tables
where owner=USER  AND TABLE_NAME IN ('PERSON','PERSON_NAME');

TABLE_NAME                       NUM_ROWS LAST_ANALYZED
------------------------------ ---------- -----------------
PERSON                              71913 20-MAY-2010 11:06
PERSON_NAME                        107567 20-MAY-2010 11:06

--Explain a simple select
explain plan for
select /* */ pers.surname, pers.first_forename
from    person pers, person_name pena
where pena.pers_id =pers.pers_id
and    pena.surname = 'BROWN'
/ 
Explained.

select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
Plan hash value: 2653984949

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |  1427 | 45664 |   637   (5)| 00:00:03 |
|*  1 |  HASH JOIN         |             |  1427 | 45664 |   637   (5)| 00:00:03 |
|*  2 |   TABLE ACCESS FULL| PERSON_NAME |  1427 | 17124 |   320   (5)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| PERSON      | 71913 |  1404K|   314   (4)| 00:00:02 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("PENA"."PERS_ID"="PERS"."PERS_ID")
   2 - filter("PENA"."SURNAME"='BROWN')

16 rows selected.

--Now explain with dynamic sampling
explain plan for
select /*+ dynamic_sampling (pena 6) */ pers.surname, pers.first_forename
from    person pers, person_name pena
where pena.pers_id =pers.pers_id
and    pena.surname = 'BROWN'
/
Explained.

select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
Plan hash value: 2653984949

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |  1427 | 45664 |   637   (5)| 00:00:03 |
|*  1 |  HASH JOIN         |             |  1427 | 45664 |   637   (5)| 00:00:03 |
|*  2 |   TABLE ACCESS FULL| PERSON_NAME |  1427 | 17124 |   320   (5)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| PERSON      | 71913 |  1404K|   314   (4)| 00:00:02 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("PENA"."PERS_ID"="PERS"."PERS_ID")
   2 - filter("PENA"."SURNAME"='BROWN')

Note
-----
   - dynamic sampling used for this statement

20 rows selected.

-- Note the appearance of the NOTE about dynamic sampling
-- If I did this again, I would add extra data to the table to get different stats
-- from the dynamic sampling


--truncate my person_name table and re-gather stats

truncate table person_name;
Table truncated.

exec dbms_stats.gather_table_stats(ownname=>user,tabname =>'PERSON_NAME')
PL/SQL procedure successfully completed.

TABLE_NAME                       NUM_ROWS LAST_ANALYZED
------------------------------ ---------- -----------------
PERSON                              71913 20-MAY-2010 11:06
PERSON_NAME                             0 20-MAY-2010 11:06


--now try and dynamic sample a known empty table
explain plan for
select /*+ dynamic_sampling (pena 6) */ pers.surname, pers.first_forename
from    person pers, person_name pena
where pena.pers_id =pers.pers_id
and    pena.surname = 'BROWN'; 
Explained.

select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
Plan hash value: 1295262714

---------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |     1 |    50 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID  | PERSON      |     1 |    20 |     2   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                |             |     1 |    50 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| PERSON_NAME |     1 |    30 |     1   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | PENA_SN     |     1 |       |     1   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN           | PERS_PEID   |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("PENA"."SURNAME"='BROWN')
   5 - access("PENA"."PERS_ID"="PERS"."PERS_ID")

18 rows selected.

-- I initially thought Oracle might be detecting that the 
-- table was empty - never had data inserted or had been truncated
-- and thus knew there was no point sampling...But...
 
-- put the data back
insert into person_name select * from pena_backup;

107567 rows created.

commit;
Commit complete.

--and explain again

Explained.
explain plan for
select /*+ dynamic_sampling (pena 6) */ pers.surname, pers.first_forename
from    person pers, person_name pena
where pena.pers_id =pers.pers_id
and    pena.surname = 'BROWN';

select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
Plan hash value: 1295262714

---------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |     1 |    50 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID  | PERSON      |     1 |    20 |     2   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                |             |     1 |    50 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| PERSON_NAME |     1 |    30 |     1   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | PENA_SN     |     1 |       |     1   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN           | PERS_PEID   |     1 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("PENA"."SURNAME"='BROWN')
   5 - access("PENA"."PERS_ID"="PERS"."PERS_ID")

18 rows selected.

-- Note, nothing about dynamic sampling, the rows and bytes expected are the same as
-- the query with no dynamic sampling hint

-- now gather stats again so oracle knows there is some data

exec dbms_stats.gather_table_stats(ownname=>user,tabname =>'PERSON_NAME') 
PL/SQL procedure successfully completed.

select table_name,num_rows,last_analyzed from dba_tables
where owner=USER  AND TABLE_NAME IN ('PERSON','PERSON_NAME');

TABLE_NAME                       NUM_ROWS LAST_ANALYZED
------------------------------ ---------- -----------------
PERSON                              71913 20-MAY-2010 11:06
PERSON_NAME                        107567 20-MAY-2010 11:09

--and now try dynamic sampling again

explain plan for
select /*+ dynamic_sampling (pena 6) */ pers.surname, pers.first_forename
from    person pers, person_name pena
where pena.pers_id =pers.pers_id
and    pena.surname = 'BROWN';
Explained.

select * from table(dbms_xplan.display);
PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------
Plan hash value: 2653984949

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |  1427 | 45664 |   637   (5)| 00:00:03 |
|*  1 |  HASH JOIN         |             |  1427 | 45664 |   637   (5)| 00:00:03 |
|*  2 |   TABLE ACCESS FULL| PERSON_NAME |  1427 | 17124 |   320   (5)| 00:00:02 |
|   3 |   TABLE ACCESS FULL| PERSON      | 71913 |  1404K|   314   (4)| 00:00:02 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("PENA"."PERS_ID"="PERS"."PERS_ID")
   2 - filter("PENA"."SURNAME"='BROWN')

Note
-----
   - dynamic sampling used for this statement

20 rows selected.

What prompted me to find this is I had to provide a DYNAMIC SAMPLING hint to a development team and I wanted to make sure it was working correctly first (it was a MERGE statement and I was not sure where to best put it). As you get no warnings about syntax errors with hints, it is best to prove they are correct before handing them over :-)
I was doing this in a dev system and, try as I might I could not get the hint to work. Because, as I now know, the dev system had no data in the driving table and stats had been gathere on it. I wasted 30 mins in a critical fix-on-fail due to this oddity.

I have not tried to see if DYNAMIC SAMPLING stated at the instance level is also ignored if there are zero-row stats against the table.

Which Hints to Use March 18, 2010

Posted by mwidlake in performance.
Tags: , ,
2 comments

When tuning Oracle SQL code, which are the best Hints to use? Are there good and bad Hints? I believe so.

Firstly, I wish Oracle had called Hints something else. As many have commented, Hints are actually directives to the CBO. If the Hint is syntactically correct and can be applied, it will be applied. If you include a “USE_INDEX” Hint to tell the CBO to use an existing index then, no matter how crazy it would be to do so, the CBO will use the hint if doing so does not prevent the query from being logically possible.

That leads onto one of the reasons I do not actually like Hints {and these reasons lead to my definition of good and bad hints}. So long as the Hint remains possible, it will be followed. It has become a vital part of the logic of the code but Hints are rarely reviewed.
Another reason, often mentioned, is that if new functionality is introduced to the Oracle Optimizer, Hints may prevent it being used {eg you might have hinted the use of an index but now Oracle can convert two other, less selective B-tree indexes into bitmaps and merge them together to get a far more selective path – but the Hint forces the use of the original index}.
The above is possible but is rare compared to a far more common issue – You hinted a specific index be used, but if you now create another, more suitable index for that sql statement, the new index will not be used. The hinted one will still be used. Similarly If you drop the index that the Hint references, now the Hint is invalid and the CBO will chose a new access path. You are given no warning when you drop an index that hints reference {and it would be very tricky for Oracle to do this for you in a reliable way}.

A final problem with Hints is that the access path may need to change as data volumes and relationships change. When there are very few rows in a driving table, a nested loop access path may make sense. But if that driving table grows and has many more rows in it, then a hash join becomes more efficient. A Hint can fix this path.

I prefer Hints that give the CBO more information but allow it to still choose the path and to vary as data volumes change.

I like the DYNAMIC_SAMPLING Hint as it is generally just telling oracle to look harder {more intelligently} at the statistical information, at the cost of spending a little longer on the parse . Most systems have OPTIMIZER_DYNAMIC_SAMPLING set at 1 or 2 so by default tables with no stats will have stats gathered. Hinting at level 3 and 4 instructs the CBO to verify estimate guesses for predicates it has made and check correlation between rows. It is probably the Hint I am most happy using.

In version 10.1 I encountered lots and lots of issues with the CBO trying to unnest and merge elements of the SQL into the main body of the query. And getting it wrong. Though it is potentially stopping the CBO from examining useful access paths, I do use NO_MERGE and NO_UNNEST quite often and I “like” them as it leaves the rest of the decisions up to the optimizer. You are basically saying “leave that subquery alone and satisfy it in isolation”. I still encounter lots of such issues on 10.2, but I also use UNNEST more often, to push a subquery into the body of the code.

I am more happy using a CARDINALITY hint than USE_NL or USE_HASH as the CARDINALITY hint is self documenting (it says exactly what correction {or lie} you are giving the optimiser). A USE_NL Hint is enforcing an access path and not really saying why.

If specific hints are going to be used, I like to be very specific. USE_HASH should state both tables that are to be hashed together {It might even be that modern versions of Oracle insist on the form USE_HASH (taba tab) and not just USE_HASH (taba), as I never use the less specific Hint}.
{ NB see comment by Jonathan Lewis as to why I am utterly wrong on this one – USE_HASH basically says “Use a hash join on the tables listed” but does not force a hash between the two stated tables}.

I only Hint specific use of an index if I can’t fix the problem with better gathered stats. I don’t LIKE adding INDEX Hints, even though they are commonly used and easy to understand. For the reasons stated above, I do not like ordering the CBO to use one of the currently existing indexes.

I really do not like using the RULE hint. In Version 9 it was an uphill struggle to get people to not use it as it so often fixed the immediate problem and, of course, oracle used it so much for internal SQL (and still do in 10 and even in 11). How many current DBAs and Developers know what the Rule Based Optimizer rules are? {hands DOWN you over-40 lot}. Using the RULE hint is bordering on homeopathy for databases. It seems to work, you have no idea why and, in reality, it may well be doing nothing, as you are using a feature of Oracle that is incompatible wiht the RBO.

I am very, very uncomfortable about some hints. The bypass_ujvc hint is one of them. It basically tells Oracle it can do a MERGE INTO statement without having the unique constraint in place to support the where clause on the target table that allows it to work reliably. You are telling the optimizer “just trust me”. IE you can lie horribly to the CBO.

All in all, I try and get the stats right rather than hint. I’ll spend ten times as long trying to understand and fix (if I can) why the estimated costs and cardinalites in an Explain Plan are wrong than slapping in an INDEX Hint. I will use Hints if I can’t fix the plan via the stats, but I try and use the more generic Hints. I know from experience that fixing the stats {or at least understanding why I can’t} fixes more code than adding one hint to one SQL statement.

A good rule of thumb is, if the Cardinality is accurate for a step, the plan will be acceptable. {This is a Rule of thumb, not a cast-iron truth!}.

Follow

Get every new post delivered to your Inbox.

Join 163 other followers