
Which Hints to Use March 18, 2010

Posted by mwidlake in performance.
2 comments

When tuning Oracle SQL code, which are the best Hints to use? Are there good and bad Hints? I believe so.

Firstly, I wish Oracle had called Hints something else. As many have commented, Hints are actually directives to the CBO. If the Hint is syntactically correct and can be applied, it will be applied. If you include an INDEX Hint to tell the CBO to use an existing index then, no matter how crazy it would be to do so, the CBO will obey the Hint, so long as doing so does not prevent the query from being logically possible.

That leads on to one of the reasons I do not actually like Hints {and these reasons lead to my definition of good and bad Hints}. So long as the Hint remains possible, it will be followed. It has become a vital part of the logic of the code, but Hints are rarely reviewed.
Another reason, often mentioned, is that if new functionality is introduced to the Oracle Optimizer, Hints may prevent it being used {eg you might have hinted the use of an index but now Oracle can convert two other, less selective B-tree indexes into bitmaps and merge them together to get a far more selective path – but the Hint forces the use of the original index}.
The above is possible but is rare compared to a far more common issue – you hinted that a specific index be used, but if you now create another, more suitable index for that SQL statement, the new index will not be used; the hinted one still will be. Similarly, if you drop the index that the Hint references, the Hint becomes invalid and the CBO will choose a new access path. You are given no warning when you drop an index that Hints reference {and it would be very tricky for Oracle to do this for you in a reliable way}.

A final problem with Hints is that the access path may need to change as data volumes and relationships change. When there are very few rows in a driving table, a nested loop access path may make sense. But if that driving table grows and has many more rows in it, then a hash join becomes more efficient. A Hint can fix this path.

I prefer Hints that give the CBO more information but allow it to still choose the path and to vary as data volumes change.

I like the DYNAMIC_SAMPLING Hint as it is generally just telling Oracle to look harder {more intelligently} at the statistical information, at the cost of spending a little longer on the parse. Most systems have OPTIMIZER_DYNAMIC_SAMPLING set at 1 or 2, so by default tables with no stats will have stats gathered. Hinting at levels 3 and 4 instructs the CBO to verify the selectivity guesses it has made for predicates and to check for correlation between columns. It is probably the Hint I am most happy using.
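A sketch of the syntax {the table name and level here are illustrative, not from a real system}:

select /*+ dynamic_sampling(ord 4) */
       order_id, status, region
from   orders ord
where  status = 'NEW'
and    region = 'EU';

At level 4 the CBO samples the table at parse time to sanity-check its selectivity guesses, including the correlation between the STATUS and REGION predicates, rather than simply multiplying the two column selectivities together.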

In version 10.1 I encountered lots and lots of issues with the CBO trying to unnest and merge elements of the SQL into the main body of the query – and getting it wrong. Though it is potentially stopping the CBO from examining useful access paths, I do use NO_MERGE and NO_UNNEST quite often and I “like” them, as they leave the rest of the decisions up to the optimizer. You are basically saying “leave that subquery alone and satisfy it in isolation”. I still encounter lots of such issues on 10.2, but I also use UNNEST more often, to push a subquery into the body of the code.
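A sketch of how I tend to use it {tables made up for illustration} – the Hint goes inside the subquery it is protecting:

select ord.order_id
from   orders ord
where  ord.custid in
      (select /*+ no_unnest */ cust.custid
       from   customers cust
       where  cust.status = 'CLOSED');

The CBO now has to satisfy the subquery as written, rather than merging it into the outer query as a join.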

I am happier using a CARDINALITY Hint than USE_NL or USE_HASH, as the CARDINALITY Hint is self-documenting (it says exactly what correction {or lie} you are giving the optimiser). A USE_NL Hint is enforcing an access path and not really saying why.
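For illustration {the table and the figure of 100 are made up}:

select /*+ cardinality(stg 100) */
       stg.batch_id, count(*)
from   staging_data stg
group by stg.batch_id;

The Hint states the exact correction being fed to the CBO – “plan as if STG returns 100 rows” – and the optimizer is still free to pick the access paths and join methods itself.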

If specific Hints are going to be used, I like to be very specific. USE_HASH should state both tables that are to be hashed together {it might even be that modern versions of Oracle insist on the form USE_HASH (taba tabb) and not just USE_HASH (taba), as I never use the less specific Hint}.
{ NB see comment by Jonathan Lewis as to why I am utterly wrong on this one – USE_HASH basically says “Use a hash join on the tables listed” but does not force a hash between the two stated tables}.
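For completeness, the two-table form looks like this {tables made up; and, as per Jonathan’s comment, it means “use hash joins when joining the listed tables”, not “hash these two together”}:

select /*+ use_hash(ord cust) */
       ord.order_id, cust.name
from   orders ord, customers cust
where  ord.custid = cust.custid;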

I only Hint specific use of an index if I can’t fix the problem with better gathered stats. I don’t LIKE adding INDEX Hints, even though they are commonly used and easy to understand. For the reasons stated above, I do not like ordering the CBO to use one of the currently existing indexes.

I really do not like using the RULE Hint. In Version 9 it was an uphill struggle to get people not to use it, as it so often fixed the immediate problem and, of course, Oracle used it extensively for internal SQL (and still does in 10 and even in 11). How many current DBAs and Developers know what the Rule Based Optimizer rules are? {hands DOWN you over-40 lot}. Using the RULE Hint is bordering on homeopathy for databases. It seems to work, you have no idea why and, in reality, it may well be doing nothing, as you are using a feature of Oracle that is incompatible with the RBO.

I am very, very uncomfortable about some Hints. The BYPASS_UJVC Hint is one of them. It basically tells Oracle it can do a MERGE INTO statement without having the unique constraint in place to support the where clause on the target table that allows it to work reliably. You are telling the optimizer “just trust me”. I.e. you can lie horribly to the CBO.

All in all, I try and get the stats right rather than hint. I’ll spend ten times as long trying to understand and fix (if I can) why the estimated costs and cardinalities in an Explain Plan are wrong as I would slapping in an INDEX Hint. I will use Hints if I can’t fix the plan via the stats, but I try to use the more generic Hints. I know from experience that fixing the stats {or at least understanding why I can’t} fixes more code than adding one hint to one SQL statement.

A good rule of thumb is: if the cardinality estimate is accurate for each step, the plan will be acceptable. {This is a rule of thumb, not a cast-iron truth!}

Fun with Filters March 9, 2010

Posted by mwidlake in performance.
9 comments

{Please note – this post has been used to test changes to layout, thus the duplicated sections}

This post is about the importance of filter statements in execution plans and how they can indicate problems that are not always apparent.

I had a problem recently with some code that had gone rogue – it had been running in a few seconds, being executed every 10 minutes or so. Now it ran until either someone killed it off or it got “snapshot too old” errors. I pretty much knew it was prompted by stats being gathered on the tables in question as we had just gathered stats on the tables in that schema.

The code was something like this (it is not too important what exactly the code was). Oh, and this is on 10.2.0.3, enterprise edition with partitioning.

select accountid, max(recorded_date), count(*)
from W_LCG_OPENING_
where accountid||DATA_SRC_COUNTRY_ID in
     (
       SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
       minus
       SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_
      )
   and 1=1 -- this is pointless, it replaces a check of 'run'='run'
   group by accountid, DATA_SRC_COUNTRY_ID

Those tables are quite large, several million rows in each. The code is basically scanning all of the rows in the table as there are no indexes to support the query and it is written in a way that, well, I would not write it.

Digging around in some monitoring tools I use, I confirmed that the code had swapped plans.

{original layout, having hacked it to fit}

inst SQL_ID        Plan Hash   First Load   execs   
---- ------------- ----------- ------------ ------ 
  TOT_ROWS       TOT_BG      TOT_DR        TOT_CPU    CREATED_DT
---------- ------------ ----------- --------------  ------------
   3 1t7ma4xn3rw46  3183284037 100203 09:10     125
     7,854     6,457,885   1208,149    816,476,585  091020 10:27
   3 1t7ma4xn3rw46  3127554972 100206 09:13       2
         0  1260,936,642        980  94599,678,960  100205 09:19

{newly possible layout, with wider screen}

inst SQL_ID        Plan Hash   First Load   execs    TOT_ROWS       TOT_BG      TOT_DR        TOT_CPU    CREATED_DT

---- ------------- ----------- ------------ ------ ---------- ------------ ----------- --------------  ------------
   3 1t7ma4xn3rw46  3183284037 100203 09:10    125      7,854    6,457,885    1208,149    816,476,585  091020 10:27
   3 1t7ma4xn3rw46  3127554972 100206 09:13      2          0 1260,936,642         980  94599,678,960  100205 09:19

Version 1 comes back in 5 or 6 seconds. Version 2 does not effectively come back, which is why it records 0 rows. You can see that the Disk Gets are pretty low in version 2 (allowing that it was never left to finish) but Buffer Gets and CPU are both massively up. So much so, it exceeds where I format the numbers with commas (as, if they are THAT big, they need fixing anyway).

I looked at the plans and, though they were different, nothing jumped out at me. So I trimmed down the code and built it up section by section until I saw something significant change. This is my general mode of tuning code if nothing initially occurs to me.
As usual, I started with any scalar or inline code, in this case that SELECT…MINUS…SELECT in the where clause.

SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
Minus
SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_

This came back in 3.145 seconds, with all of 62 rows. That is fine.

Now I added back the use of the MINUS code as a WHERE clause to the outer query.

select accountid
from  W_LCG_OPENING_
where accountid||DATA_SRC_COUNTRY_ID in
     (
       SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
       minus
       SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_
      ) 

This came back in 5.22 seconds with 4468 records. Again, I can live with that, we do not need sub-second response, we need sub-minute response.

So I now added back in the group by accountid…

select accountid,count(*)
from W_LCG_OPENING_
where accountid||DATA_SRC_COUNTRY_ID in
     (
       SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
       minus
       SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_
      )
group by accountid

This does not come back before I have fetched and consumed a cup of tea. That is a significant change.

I was not expecting this, why would sorting under 5 thousand rows stress a database that can scan multi-million row tables in seconds? There is all that CPU being burned up and huge numbers of buffer gets, but not much in the way of disc IO, so it is probably doing something that can be done in memory a very, very large number of times.

What is the difference in plan? Good plan first, bad plan second.

{screen shots of the good and bad execution plans}

The view step has disappeared, which was an expensive step so should help not hinder.
The “SORT GROUP BY” has appeared, which it would do as we introduced the “GROUP BY” clause.
The hash join has been replaced with a filter. In a more complex plan you might miss this replacement, you would maybe notice the hash join disappearing but a filter, well, it is checking some simple “WHERE” clause isn’t it?

Well, the hash join was joining the results coming from the two steps feeding into it, the PARTITION RANGE ALL and the VIEW (which is the in-memory construct of the SELECT…MINUS…SELECT statement).

Now the Filter is filtering the results from the PARTITION RANGE ALL with the results from the MINUS. At this point I’d like to highlight that the predicted cardinality and bytes coming back for the steps within the MINUS have reduced by a factor of 100 between the good and bad plans. And I’ll also admit I hid some key detail in the screen shots: I am not showing the access and filter predicates.

{screen shot: the access predicate for the fast plan}

The above is a screen shot showing that in the fast version there are no filter predicates but there is an access predicate, for the access to the view (ie the select…minus…select). For easier reading, the full access predicate is below, rather than in the screen shot:

"$nso_col_1"=TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")

However, for the slow code, there are no access predicates but there are filter predicates. Again, the screen shot shows that there are predicates and I show the full text below. (Screen shots are from PL/SQL Developer, btw.)

{screen shot: the filter predicates for the slow plan}

2- EXISTS ( (SELECT /*+ */ TO_CHAR("ACCOUNTID")
   ||TO_CHAR("DATA_SRC_COUNTRY_ID") 
FROM "W_LCG_OPENING_" "W_LCG_OPENING_" 
WHERE TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B1)||TO_CHAR(:B2))
MINUS (SELECT /*+ */ TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")
FROM "W_LCG_CLIENT_" "W_LCG_CLIENT_" 
WHERE TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B3)||TO_CHAR(:B4)))	

8- TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B1)||TO_CHAR(:B2)	

10- TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B1)||TO_CHAR(:B2)	

Basically, the filter runs the MINUS query for every row from the driving query, passing in the relevant filter clause as an extra predicate to each of the individual queries of the SELECT…MINUS…SELECT.

Clever, but not of much help to us as the performance is awful.

I showed this to a colleague and their immediate response was “why does that not show up as a nested loop? How am I supposed to spot that the “filter” is doing so much?” There were a couple of robust Saxon words mixed in with that statement.

So, beware Filters replacing joins, and get in the habit of checking out the filter and access predicates.

If you use an old-style template for showing plans {like I do, a guilty sin of mine}, or a GUI where you have not selected that the filter and access predicates be shown, you may well not get them displayed. If you use autotrace in SQL*Plus, you will though:

db3_mw> select accountid,count(*)
  2  from W_LCG_OPENING_
  3  where accountid||DATA_SRC_COUNTRY_ID in
  4       (
  5         SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
  6         minus
  7         SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_
  8        )
  9  group by accountid
 10  /

Execution Plan
----------------------------------------------------------
Plan hash value: 397856216

--------------------------------------------------------------------------------
| Id  | Operation                | Name             | Rows  | Bytes | Cos
t (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                  |   911 |  7288 |
15M  (7)| 17:25:41 |       |       |
|   1 |  SORT GROUP BY           |                  |   911 |  7288 |
15M  (7)| 17:25:41 |       |       |
|*  2 |   FILTER                 |                  |       |       |
        |          |       |       |
|   3 |    PARTITION RANGE ALL   |                  |  3153K|    24M|  83
11   (4)| 00:00:34 |     1 |   840 |
|   4 |     TABLE ACCESS FULL    | W_LCG_OPENING_   |  3153K|    24M|  83
11   (4)| 00:00:34 |     1 |   840 |
|   5 |    MINUS                 |                  |       |       |
        |          |       |       |
|   6 |     SORT UNIQUE NOSORT   |                  | 31535 |   246K|  86
15   (7)| 00:00:35 |       |       |
|   7 |      PARTITION RANGE ALL |                  | 31535 |   246K|  86
11   (7)| 00:00:35 |     1 |   840 |
|*  8 |       TABLE ACCESS FULL  | W_LCG_OPENING_   | 31535 |   246K|  86
11   (7)| 00:00:35 |     1 |   840 |
|   9 |     SORT UNIQUE NOSORT   |                  |   132 |  1056 |
17  (18)| 00:00:01 |       |       |
|* 10 |      INDEX FAST FULL SCAN| W_LCG_CLIENT__PK |   132 |  1056 |
16  (13)| 00:00:01 |       |       |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( EXISTS ( (SELECT /*+ */ TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_
") FROM
              "W_LCG_OPENING_" "W_LCG_OPENING_" WHERE
              TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_")=TO_CHAR(:B1)|
|TO_CHAR(:B2))MINUS (SELECT /*+ */
              TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_") 
FROM "W_LCG_CLIENT_"
              "W_LCG_CLIENT_" WHERE TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_S
RC_")=TO_CHAR(:B3)||TO_CHAR(:B4) )))
   8 - filter(TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_")=TO_CHAR(:B1)|
|TO_CHAR(:B2))
  10 - filter(TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_")=TO_CHAR(:B1)|
|TO_CHAR(:B2)) 

{re-worked layout}

db3_mw> select accountid,count(*)
  from W_LCG_OPENING_
  where accountid||DATA_SRC_COUNTRY_ID in
       (   SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
           minus
          SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_   )
  group by accountid

Execution Plan
----------------------------------------------------------
Plan hash value: 397856216

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                  |   911 |  7288 |15M  (7)    |17:25:41  |       |       |
|   1 |  SORT GROUP BY           |                  |   911 |  7288 |15M  (7)    |17:25:41  |       |       |
|*  2 |   FILTER                 |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ALL   |                  |  3153K|    24M|  8311   (4)|00:00:34  |     1 |   840 |
|   4 |     TABLE ACCESS FULL    | W_LCG_OPENING_   |  3153K|    24M|  8311   (4)| 00:00:34 |     1 |   840 |
|   5 |    MINUS                 |                  |       |       |            |          |       |       |
|   6 |     SORT UNIQUE NOSORT   |                  | 31535 |   246K|  8615   (7)| 00:00:35 |       |       |
|   7 |      PARTITION RANGE ALL |                  | 31535 |   246K|  8611   (7)| 00:00:35 |     1 |   840 |
|*  8 |       TABLE ACCESS FULL  | W_LCG_OPENING_   | 31535 |   246K|  8611   (7)| 00:00:35 |     1 |   840 |
|   9 |     SORT UNIQUE NOSORT   |                  |   132 |  1056 |17  (18)    | 00:00:01 |       |       |
|* 10 |      INDEX FAST FULL SCAN| W_LCG_CLIENT__PK |   132 |  1056 |16  (13)    | 00:00:01 |       |       |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( EXISTS ( (SELECT /*+ */ TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID") FROM
              "W_LCG_OPENING_" "W_LCG_OPENING_" WHERE
              TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_")=TO_CHAR(:B1)||TO_CHAR(:B2))MINUS (SELECT /*+ */
              TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_") 
              FROM "W_LCG_CLIENT_"  "W_LCG_CLIENT_" 
              WHERE TO_CHAR ("ACCOUNTID")||TO_CHAR("DATA_SRC_")=TO_CHAR(:B3)||TO_CHAR(:B4)              )))
   8 - filter(TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B1)||TO_CHAR(:B2))
  10 - filter(TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_")=TO_CHAR(:B1)||TO_CHAR(:B2)) 

What did I do to fix the problem? Well, even though the code originally went bad due to stats being gathered, I could not force it back to a nice plan after an hour or two playing with gathering new stats so, in this case, I used an UNNEST hint.

select accountid,count(*)
from W_LCG_OPENING_
where accountid||DATA_SRC_COUNTRY_ID in
     (
       SELECT /*+ unnest */ accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
       minus
       SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_
      )
group by accountid

And it forced the plan back to the version using a HASH join.

{screen shot: the hinted plan}

I’m a little unhappy about that hint, but the code needed fixing and the hint did what I wanted it to {I think it was poor of me to only hint one of the two minus statements, and I do not like resorting to hints just because I can’t work out what is “wrong” with the stats – if I work out what is wrong and still need the hint, fair enough}. However, I had to be pragmatic and get the code fixed and working in Live, so it has gone in with a hint.

More on COMMAND_TYPE values. January 8, 2010

Posted by mwidlake in internals.
3 comments

This is just a quick update on finding the values for COMMAND_TYPE as held in V$SQL and V$SQLAREA. The original post is here. Quick summary – the table SYS.AUDIT_ACTIONS has an almost-complete list of COMMAND_TYPE values.
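As a quick sketch of putting that to use – translating COMMAND_TYPE on the fly {the rownum limit is just for illustration}:

select s.sql_id, s.command_type, a.name
from   v$sqlarea s, audit_actions a
where  s.command_type = a.action
and    rownum <= 20;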

In one of those little coincidences of fate, I have been looking at the use of SQL AUDIT for the last few days. I am not very knowledgeable about AUDIT or Oracle security in general, so I do what I normally do when I don’t know much about an area: I look on the web, scan the manual and then go poking around in the data dictionary. I came across two potentially interesting tables, SYS.AUDIT_ACTIONS and SYS.STMT_AUDIT_OPTION_MAP. At first glance they look to hold similar information, things that can be audited. This seems to be what SYS.STMT_AUDIT_OPTION_MAP is:-

select * from SYS.STMT_AUDIT_OPTION_MAP
order by option#
   OPTION# NAME                                       PROPERTY
---------- ---------------------------------------- ----------
         3 ALTER SYSTEM                                      0
         4 SYSTEM AUDIT                                      0
         5 CREATE SESSION                                    0
         6 ALTER SESSION                                     0
         7 RESTRICTED SESSION                                0
         8 TABLE                                             0
         9 CLUSTER                                           0
        10 CREATE TABLESPACE                                 0
        11 ALTER TABLESPACE                                  0
        12 MANAGE TABLESPACE                                 0
        13 DROP TABLESPACE                                   0
        14 TABLESPACE                                        0
        15 UNLIMITED TABLESPACE                              0
-- let's look at dba_audit_trail, which underlies many of the audit views.
@vw_txt
Enter value for vw_name: dba_audit_trail
Any Key...>

OWNER      VIEW_NAME                      TEXT_LENGTH
---------- ------------------------------ -----------
TEXT
------------------------------------------------------------------
SYS        DBA_AUDIT_TRAIL                       3424
select spare1           /* OS_USERNAME */,
       userid           /* USERNAME */,
       userhost         /* USERHOST */,
       terminal         /* TERMINAL */,
       cast (           /* TIMESTAMP */
           (from_tz(ntimestamp#,'00:00') at local) as date),
       obj$creator      /* OWNER */,
       obj$name         /* OBJECT_NAME */,
       aud.action#      /* ACTION */,
       act.name         /* ACTION_NAME */,
....
from sys.aud$ aud, system_privilege_map spm, system_privilege_map spx,
     STMT_AUDIT_OPTION_MAP aom, audit_actions act
where   aud.action#     = act.action    (+)
  and - aud.logoff$dead = spm.privilege (+)
  and   aud.logoff$dead = aom.option#   (+)
  and - aud.priv$used   = spx.privilege (+)

What about AUDIT_ACTIONS? Well, I looked at the table and thought “this looks very similar to the list of values for COMMAND_TYPE – I should check that out”.

I don’t have to. The very next day Christian Antognini posted a quick comment on the posting about COMMAND_TYPE telling me where one could find a list of these commands in the DB – Yep, AUDIT_ACTIONS.

Christian’s posting warned me that a couple of commands are missing (our old favorite from the original post, MERGE, being one, and CREATE DISKGROUP being the other he cites) but it includes several more that are not listed in the manual. If anyone is aware of COMMAND_TYPEs not included in AUDIT_ACTIONS then let me know and I can maintain a list here.

{Thanks to Chris for providing many more missing actions and two new to 11.2, included below}

New ones to those listed in the manual are:-
128 – FLASHBACK
129 – CREATE_SESSION
197 – PURGE USER_RECYCLEBIN
198 – PURGE DBA_RECYCLEBIN
199 – PURGE TABLESAPCE
200 – PURGE TABLE
201 – PURGE INDEX
202 – UNDROP OBJECT
204 – FLASHBACK DATABASE
205 – FLASHBACK TABLE
206 – CREATE RESTORE POINT
207 – DROP RESTORE POINT
208 – PROXY AUTHENTICATION ONLY
209 – DECLARE REWRITE EQUIVALENCE
210 – ALTER REWRITE EQUIVALENCE
211 – DROP REWRITE EQUIVALENCE
and new in the table for 11.2
225 – ALTER DATABASE LINK
305 – ALTER PUBLIC DATABASE LINK

Missing codes are:-
88 – ALTER VIEW
90 – SET CONSTRAINTS
187 – CREATE SPFILE
188 – CREATE PFILE
189 – MERGE
192 – ALTER SYNONYM
193 – ALTER DISKGROUP
194 – CREATE DISKGROUP
195 – DROP DISKGROUP
XXX – FLASHBACK

Here is the list from AUDIT_ACTIONS on 10.2.0.3 (an eyeball check with 11.1.0.1 indicates the list has not increased in that version at least, which surprised me):

select * from audit_actions order by action

    ACTION NAME
---------- ----------------------------
         0 UNKNOWN
         1 CREATE TABLE
         2 INSERT
         3 SELECT
         4 CREATE CLUSTER
         5 ALTER CLUSTER
         6 UPDATE
         7 DELETE
         8 DROP CLUSTER
         9 CREATE INDEX
        10 DROP INDEX
        11 ALTER INDEX
        12 DROP TABLE
        13 CREATE SEQUENCE
        14 ALTER SEQUENCE
        15 ALTER TABLE
        16 DROP SEQUENCE
        17 GRANT OBJECT
        18 REVOKE OBJECT
        19 CREATE SYNONYM
        20 DROP SYNONYM
        21 CREATE VIEW
        22 DROP VIEW
        23 VALIDATE INDEX
        24 CREATE PROCEDURE
        25 ALTER PROCEDURE
        26 LOCK
        27 NO-OP
        28 RENAME
        29 COMMENT
        30 AUDIT OBJECT
        31 NOAUDIT OBJECT
        32 CREATE DATABASE LINK
        33 DROP DATABASE LINK
        34 CREATE DATABASE
        35 ALTER DATABASE
        36 CREATE ROLLBACK SEG
        37 ALTER ROLLBACK SEG
        38 DROP ROLLBACK SEG
        39 CREATE TABLESPACE
        40 ALTER TABLESPACE
        41 DROP TABLESPACE
        42 ALTER SESSION
        43 ALTER USER
        44 COMMIT
        45 ROLLBACK
        46 SAVEPOINT
        47 PL/SQL EXECUTE
        48 SET TRANSACTION
        49 ALTER SYSTEM
        50 EXPLAIN
        51 CREATE USER
        52 CREATE ROLE
        53 DROP USER
        54 DROP ROLE
        55 SET ROLE
        56 CREATE SCHEMA
        57 CREATE CONTROL FILE
        59 CREATE TRIGGER
        60 ALTER TRIGGER
        61 DROP TRIGGER
        62 ANALYZE TABLE
        63 ANALYZE INDEX
        64 ANALYZE CLUSTER
        65 CREATE PROFILE
        66 DROP PROFILE
        67 ALTER PROFILE
        68 DROP PROCEDURE
        70 ALTER RESOURCE COST
        71 CREATE MATERIALIZED VIEW LOG
        72 ALTER MATERIALIZED VIEW LOG
        73 DROP MATERIALIZED VIEW LOG
        74 CREATE MATERIALIZED VIEW
        75 ALTER MATERIALIZED VIEW
        76 DROP MATERIALIZED VIEW
        77 CREATE TYPE
        78 DROP TYPE
        79 ALTER ROLE
        80 ALTER TYPE
        81 CREATE TYPE BODY
        82 ALTER TYPE BODY
        83 DROP TYPE BODY
        84 DROP LIBRARY
        85 TRUNCATE TABLE
        86 TRUNCATE CLUSTER
        91 CREATE FUNCTION
        92 ALTER FUNCTION
        93 DROP FUNCTION
        94 CREATE PACKAGE
        95 ALTER PACKAGE
        96 DROP PACKAGE
        97 CREATE PACKAGE BODY
        98 ALTER PACKAGE BODY
        99 DROP PACKAGE BODY
       100 LOGON
       101 LOGOFF
       102 LOGOFF BY CLEANUP
       103 SESSION REC
       104 SYSTEM AUDIT
       105 SYSTEM NOAUDIT
       106 AUDIT DEFAULT
       107 NOAUDIT DEFAULT
       108 SYSTEM GRANT
       109 SYSTEM REVOKE
       110 CREATE PUBLIC SYNONYM
       111 DROP PUBLIC SYNONYM
       112 CREATE PUBLIC DATABASE LINK
       113 DROP PUBLIC DATABASE LINK
       114 GRANT ROLE
       115 REVOKE ROLE
       116 EXECUTE PROCEDURE
       117 USER COMMENT
       118 ENABLE TRIGGER
       119 DISABLE TRIGGER
       120 ENABLE ALL TRIGGERS
       121 DISABLE ALL TRIGGERS
       122 NETWORK ERROR
       123 EXECUTE TYPE
       128 FLASHBACK
       129 CREATE SESSION
       157 CREATE DIRECTORY
       158 DROP DIRECTORY
       159 CREATE LIBRARY
       160 CREATE JAVA
       161 ALTER JAVA
       162 DROP JAVA
       163 CREATE OPERATOR
       164 CREATE INDEXTYPE
       165 DROP INDEXTYPE
       167 DROP OPERATOR
       168 ASSOCIATE STATISTICS
       169 DISASSOCIATE STATISTICS
       170 CALL METHOD
       171 CREATE SUMMARY
       172 ALTER SUMMARY
       173 DROP SUMMARY
       174 CREATE DIMENSION
       175 ALTER DIMENSION
       176 DROP DIMENSION
       177 CREATE CONTEXT
       178 DROP CONTEXT
       179 ALTER OUTLINE
       180 CREATE OUTLINE
       181 DROP OUTLINE
       182 UPDATE INDEXES
       183 ALTER OPERATOR
       197 PURGE USER_RECYCLEBIN
       198 PURGE DBA_RECYCLEBIN
       199 PURGE TABLESAPCE
       200 PURGE TABLE
       201 PURGE INDEX
       202 UNDROP OBJECT
       204 FLASHBACK DATABASE
       205 FLASHBACK TABLE
       206 CREATE RESTORE POINT
       207 DROP RESTORE POINT
       208 PROXY AUTHENTICATION ONLY
       209 DECLARE REWRITE EQUIVALENCE
       210 ALTER REWRITE EQUIVALENCE
       211 DROP REWRITE EQUIVALENCE

Decoding high_value and low_value January 3, 2010

Posted by mwidlake in internals, performance.
13 comments

There is an update to this post here.

The table DBA_TAB_COLUMNS holds the LOW_VALUE and HIGH_VALUE for columns. This information is potentially very useful to us, especially as, when the CBO looks at a WHERE clause on a column for a given value, the High and Low values for that column are considered. If the value in the WHERE clause is outside the known range, the expected number of rows to be found is decreased in proportion to the distance outside the range.

What is not so helpful is that Oracle stores, and displays, the information in an internal raw format. Which is utterly unhelpful to us of course. Why can’t they at least expose the information in a view in a human-readable format? Tsch.

So here is some code to decode low_value and high_value. I pulled most of this together a few years back but always struggled to decode the low and high values for dates, until I found this post by Gary Myers.

-- col_stats
-- Martin Widlake mdw 21/03/2003
-- MDW 11/12/09 enhanced to include more translations of low_value/high_value
-- pilfered from Gary Myers blog
col owner        form a6 word wrap
col table_name   form a15 word wrap
col column_name  form a22 word wrap
col data_type    form a12
col M            form a1
col num_vals     form 99999,999
col dnsty        form 0.9999
col num_nulls    form 99999,999
col low_v        form a18
col hi_v         form a18
col data_type    form a10
set lines 110
break on owner nodup on table_name nodup
spool col_stats.lst
select --owner
--      ,table_name
      column_name
      ,data_type
      ,decode (nullable,'N','Y','N')  M
      ,num_distinct num_vals
      ,num_nulls
      ,density dnsty
,decode(data_type
  ,'NUMBER'       ,to_char(utl_raw.cast_to_number(low_value))
  ,'VARCHAR2'     ,to_char(utl_raw.cast_to_varchar2(low_value))
  ,'NVARCHAR2'    ,to_char(utl_raw.cast_to_nvarchar2(low_value))
  ,'BINARY_DOUBLE',to_char(utl_raw.cast_to_binary_double(low_value))
  ,'BINARY_FLOAT' ,to_char(utl_raw.cast_to_binary_float(low_value))
  ,'DATE',to_char(1780+to_number(substr(low_value,1,2),'XX')
         +to_number(substr(low_value,3,2),'XX'))||'-'
       ||to_number(substr(low_value,5,2),'XX')||'-'
       ||to_number(substr(low_value,7,2),'XX')||' '
       ||(to_number(substr(low_value,9,2),'XX')-1)||':'
       ||(to_number(substr(low_value,11,2),'XX')-1)||':'
       ||(to_number(substr(low_value,13,2),'XX')-1)
,  low_value
       ) low_v
,decode(data_type
  ,'NUMBER'       ,to_char(utl_raw.cast_to_number(high_value))
  ,'VARCHAR2'     ,to_char(utl_raw.cast_to_varchar2(high_value))
  ,'NVARCHAR2'    ,to_char(utl_raw.cast_to_nvarchar2(high_value))
  ,'BINARY_DOUBLE',to_char(utl_raw.cast_to_binary_double(high_value))
  ,'BINARY_FLOAT' ,to_char(utl_raw.cast_to_binary_float(high_value))
  ,'DATE',to_char(1780+to_number(substr(high_value,1,2),'XX')
         +to_number(substr(high_value,3,2),'XX'))||'-'
       ||to_number(substr(high_value,5,2),'XX')||'-'
       ||to_number(substr(high_value,7,2),'XX')||' '
       ||(to_number(substr(high_value,9,2),'XX')-1)||':'
       ||(to_number(substr(high_value,11,2),'XX')-1)||':'
       ||(to_number(substr(high_value,13,2),'XX')-1)
,  high_value
       ) hi_v
from dba_tab_columns
where owner      like upper('&tab_own')
and   table_name like upper(nvl('&tab_name','WHOOPS')||'%')
ORDER BY owner,table_name,COLUMN_ID
/
clear colu
spool off
clear breaks

Most of the translation is done via the utl_raw.cast_to_datatype functions, but no such function is provided for dates, which is where most of us come unstuck. Gary recognised that the data is stored in a hex format, with an offset for the year.
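As a worked example {the raw value is a made-up one, for 22nd October 2009, 08:13:43}: for a LOW_VALUE of 786D0A16090E2C, the first two bytes give the year and the rest are month, day and (each offset by one) hour, minute and second:

select 1780 + to_number('78','XX') + to_number('6D','XX') yyyy
      ,to_number('0A','XX')     mm
      ,to_number('16','XX')     dd
      ,to_number('09','XX') - 1 hh24
      ,to_number('0E','XX') - 1 mi
      ,to_number('2C','XX') - 1 ss
from dual;
-- gives 2009  10  22  8  13  43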

I hope the script helps some of you.
{There are also procedures under DBMS_STATS – DBMS_STATS.CONVERT_RAW_VALUE and its overloads – that can be called to translate many datatypes}
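A minimal sketch of calling it for a NUMBER column {the table and column names are illustrative}:

set serveroutput on
declare
  l_raw raw(32);
  l_num number;
begin
  select low_value into l_raw
  from   user_tab_columns
  where  table_name  = 'HIST_TEST'
  and    column_name = 'ID';
  -- overloaded versions exist for NUMBER, VARCHAR2 and DATE targets
  dbms_stats.convert_raw_value(l_raw, l_num);
  dbms_output.put_line('low value: '||l_num);
end;
/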

If anyone knows how to decode timestamps, I’d love to know, as it would save me trying to work it out/find it on the Net somewhere. I’ll add it to the post so that there is one place to find all the translations.

Here is a quick output for a test table

@col_stats
Enter value for tab_own: dwperf
old  40: where owner      like upper('&tab_own')
new  40: where owner      like upper('dwperf')
Enter value for tab_name: ad_sql_exec_p
old  41: and   table_name like upper(nvl('&tab_name','WHOOPS')||'%')
new  41: and   table_name like upper(nvl('ad_sql_exec_p','WHOOPS')||'%')
any key> 

COLUMN_NAME            DATA_TYPE  M   NUM_VALS  NUM_NULLS   DNSTY LOW_V              HI_V
---------------------- ---------- - ---------- ---------- ------- ------------------ ---------------
INST_ID                NUMBER     Y          4          0  0.2500 1                  4
SQL_ID                 VARCHAR2   Y     87,104          0  0.0000 008d71chrks14      gzw309ttkx862
PLAN_HASH_VALUE        NUMBER     Y      1,884          0  0.0005 2646229            4294043874
SPC_ID                 NUMBER     N         83          0  0.0120 1076               2269
ADDRESS                RAW        N    118,198          0  0.0000 00000003E33123A0   0000001342FF3FF8
HASH_VALUE             NUMBER     N     87,104          0  0.0000 2758710            4294676643
CREATED_DATE           DATE       N        120          0  0.0083 2009-10-23 8:19:10 2009-12-10 9:19:13
LATEST_FIRST_LOAD_TIME DATE       N     11,791          0  0.0001 2009-9-14 11:55:59 2009-12-13 9:33:24
TOTAL_LOADS            NUMBER     N         98          0  0.0102 1                  55047
TOTAL_PARSE_CALLS      NUMBER     N         92          0  0.0109 0                  2972
TOTAL_EXECUTIONS       NUMBER     N        235          0  0.0043 0                  834624
TOTAL_ROWS_PROCESSED   NUMBER     N        809          0  0.0012 0                  26946123
TOTAL_FETCHES          NUMBER     N        313          0  0.0032 0                  834624
TOTAL_BUFFER_GETS      NUMBER     N      3,016          0  0.0003 0                  3355576809
TOTAL_DISK_READS       NUMBER     N        985          0  0.0010 0                  28189240
TOTAL_DIRECT_WRITES    NUMBER     N         98          0  0.0102 0                  751289
TOTAL_SORTS            NUMBER     N        106          0  0.0094 0                  5283
TOTAL_CPU_TIME         NUMBER     N     94,401          0  0.0000 1337               12183936207
TOTAL_ELAPSED_TIME     NUMBER     N    115,203          0  0.0000 1337               139692482086
TOTAL_OPTIMIZER_COST   NUMBER     N      1,467          0  0.0007 0                  369740902209315000
                                                                                     0000000

Using AWR data for OS information December 22, 2009

Posted by mwidlake in performance.
5 comments

This is a demo of using AWR data to get some OS performance information.

This posting was actually a comment by (and is the work of) Bernard Polarski on a recent post about command_types I did a week or two back. I’ve lifted it into a separate posting as (a) I thought it might be useful to people and it would not get as much exposure as a comment as it would as a post, (b) I want to try and lay it out better for Bernard (WordPress likes to garble code in comments, which it did to Bernard’s efforts) and (c) I strongly believe in nicking other people’s code, if they are happy to put it “out there”. Why re-code what someone has already done and is happy to share :-)

I tried the code (I added the SYS. ownership on tables, so it will work even if you can’t log in as sys but have access to the objects) and it worked fine for me.

I should just add, use of AWR data is covered by an Oracle licence, the Diagnostics Pack. If you do not have a licence for it, Oracle deems the WRH$ views as not yours to view, even though they are on your box, in the database you paid for, containing data gathered by your CPUs by code that is damned tricky to turn off and is on by default. Sheesh.

Any mistakes or typos, blame me, not Bernard. I took the liberty of tweaking some of the English; hope that is OK, Bernard.

*************************************
Okay, a challenge: let me try to surprise you with something improbable. Some months ago, while investigating an unusual load on a Linux box, the unix sysadmin told us he could not go back more than 2 weeks with ‘sar’ history. Since Oracle also gathers system stats, I quickly enriched smenu with an Oracle sar.
The example below gives the sar stats for the last 10 snaps; it is stored in the shortcut sts and called with the param ‘-sar’.

(replace the words Superior_TO and Inferior_to with the mathematical signs; the system eats them)

/tmp: sts -sar 10

+ sqlplus -s '/ as sysdba'
+
set feed off verify off lines 190 pages 66
col inst  new_value inst noprint;
col id1      head 'Idle time(%)'    justify c
col usr1     head 'User time(%)'    justify c
col sys1     head 'Sys time(%)'     justify c
col io1      head 'Io Wait time(%)' justify c
col nice1    head 'Nice time (%)'   justify c
col snap_len head 'Interval| (Secs)' justify c
col num_cpus   new_value p_num_cpus  head 'Number of CPU';
col a1         new_value secs noprint;
col SNAP_BEGIN format a20 head 'Snap begin' justify c
col SNAP_END   format a20 head 'Snap end'   justify c

 select instance_number inst from v$instance;
 SELECT value num_cpus 
 FROM v$osstat WHERE stat_name = 'NUM_CPUS';
 prompt
 prompt Negative values correspond to Shutdown:
 prompt
select  snap_id, snap_len, round(id1 /snap_len *100,2) id1,
                       round(usr1 /snap_len *100,2) usr1,
                       round(sys1 /snap_len *100,2) sys1,
                       round(io1 /snap_len *100,2) io1,
                       round(nice1 /snap_len *100,2) nice1 
                     , snap_begin, snap_end
from (
     select  snap_id,  id1, usr1,sys1, io1, nice1
           , snap_begin, snap_end ,
             round( extract( day from diffs) *24*60*60+
                    extract( hour from diffs) *60*60+
                    extract( minute from diffs )* 60 +
                    extract( second from diffs )) snap_len 
             -- above is the exact length of the snapshot in seconds
     from ( select /*+ at this stage, each row show the cumulative value.
                       r1    7500  8600
                       r2    7300  8300
                       r3    7200  8110
                    we use [max(row) - lag(row)] to have the difference 
                    between [row and row-1], to obtain differentials values:
                       r1    200   300
                       r2    100   190
                       r3    0       0
                    */
        a.snap_id,
        (max(id1)    - lag( max(id1))   over (order by a.snap_id))/100        id1 ,
        (max(usr1)   - lag( max(usr1))  over (order by a.snap_id))/100        usr1,
        ( max(sys1)  - lag( max(sys1))  over (order by a.snap_id))/100        sys1,
        ( max(io1)   - lag( max(io1))   over (order by a.snap_id))/100        io1,
        ( max(nice1) - lag( max(nice1)) over (order by a.snap_id))/100        nice1,
          -- for later display
          max(to_char(BEGIN_INTERVAL_TIME,' YYYY-MM-DD HH24:mi:ss'))          snap_begin,
          -- for later display
          max(to_char(END_INTERVAL_TIME,' YYYY-MM-DD HH24:mi:ss'))            snap_end,
          -- exact len of snap used for percentage calculation
        ( max(END_INTERVAL_TIME)-max(BEGIN_INTERVAL_TIME))                    diffs
        from ( /*+  perform a pivot table so that the 5 values selected appear 
                    on one line. The CASE distributes col a.value among 5 new
                    columns, but creates a row for each. We will use the group 
                    by (snap_id) to condense the 5 rows into one. If you don't 
                    see how this works, just remove the group by and max function, 
                    then re-add it and you will see the use of max(). 
                    Here is what you will see:
        Raw data :   1000     2222
                     1000     3333
                     1000     4444
                     1000     5555
                     1000     6666
        The SELECT CASE populates our inline view with the structure:
                       ID    IDLE    USER   SYS   IOWAIT   NICE
                     1000    2222
                     1000             3333
                     1000                   4444
                     1000                          5555
                     1000                                  6666
         the group by(1000) condenses the rows in one:
                       ID    IDLE    USER   SYS   IOWAIT  NICE
                      1000   2222    3333   4444   5555   6666
               */
               select a.snap_id,
                  case b.STAT_NAME
                       when 'IDLE_TIME' then a.value / &p_num_cpus
                   end  id1,
                  case b.STAT_NAME
                       when 'USER_TIME' then a.value / &p_num_cpus
                  end usr1 ,
                  case b.STAT_NAME
                       when 'SYS_TIME' then  a.value / &p_num_cpus
                  end sys1 ,
                  case b.STAT_NAME
                       when 'IOWAIT_TIME' then  a.value / &p_num_cpus
                  end io1,
                  case b.STAT_NAME
                       when 'NICE_TIME' then  a.value / &p_num_cpus
                  end nice1
                  from  sys.WRH$_OSSTAT a,  sys.WRH$_OSSTAT_NAME b
                       where
                            a.dbid      = b.dbid       and
                            a.STAT_ID   = b.stat_id    and
                            instance_number = &inst    and
                            b.stat_name in ('IDLE_TIME','USER_TIME','SYS_TIME','IOWAIT_TIME','NICE_TIME') and
                            a.snap_id > (( select max(snap_id) from  sys.WRH$_OSSTAT) - 10  -1 )
                   order by 1 desc
              ) a,  sys.wrm$_snapshot s
         where  a.snap_id = s.snap_id
         group by a.snap_id
         order by snap_id desc
        )
   )where rownum < (10+1);





Number of CPU
-------------
            2

Negative values correspond to Shutdown:


            Interval
   SNAP_ID   (Secs)   Idle time(%) User time(%) Sys time(%) Io Wait time(%) Nice time (%)      Snap begin            Snap end
---------- ---------- ------------ ------------ ----------- --------------- ------------- -------------------- --------------------
      2212       3603        87.07         9.43        2.84           16.19             0  2009-12-15 14:00:19  2009-12-15 15:00:21
      2211       3613        90.15         7.31        2.14           18.97             0  2009-12-15 13:00:06  2009-12-15 14:00:19
      2210       3593        92.58         5.91         1.4            7.91             0  2009-12-15 12:00:13  2009-12-15 13:00:06
      2209       3574        91.24         6.78        1.73            9.37             0  2009-12-15 11:00:38  2009-12-15 12:00:13
      2208       3620        90.04          7.8        1.96           11.81             0  2009-12-15 10:00:19  2009-12-15 11:00:38
      2207       3594         89.8         6.89        2.37           12.99             0  2009-12-15 09:00:25  2009-12-15 10:00:19
      2206       3611        91.75         5.81        1.87            8.29             0  2009-12-15 08:00:14  2009-12-15 09:00:25
      2205       3599        92.38         5.63        1.68            8.57             0  2009-12-15 07:00:15  2009-12-15 08:00:14
      2204       3605        91.72         6.51        1.58           10.79             0  2009-12-15 06:00:11  2009-12-15 07:00:15
      2203       3601        86.36        10.35        2.94           10.62             0  2009-12-15 05:00:10  2009-12-15 06:00:11

COMMAND_TYPE Values December 10, 2009

Posted by mwidlake in internals.
18 comments

Follow-up post on finding most of the COMMAND_TYPEs in the data dictionary is here, and thanks to Christian Antognini, whose comment led me in that direction.

Spoiler – See end for getting a full list of COMMAND_TYPE values.

If you are in the habit of looking at V$SQLAREA and V$SQL to see what is going on in the database {for those of us who occasionally step out of the “GUI DBA Tool” environment}, you will see a lot of stuff in there. I tend to look for the most demanding SQL, so I am not interested in PL/SQL, I just want to see the SQL executed via it.

To do this, you can ignore those entries in V$SQL/V$SQLAREA where COMMAND_TYPE is 47 – ie filter on “COMMAND_TYPE != 47”.

This COMMAND_TYPE is useful. 3=SELECT, the most common thing you probably see and look at, 6=UPDATE, 7=DELETE, 50=EXPLAIN.
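So a quick look at the heaviest non-PL/SQL statements might be {a sketch – order by whichever resource interests you}:

select command_type, sql_id, cpu_time
      ,substr(sql_text,1,60) sql_txt
from   v$sqlarea
where  command_type <> 47
order by cpu_time desc;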

If you google/bing/whatever COMMAND_TYPE you get many hits that list some of the commands, all looking very similar. But very partial lists.

So I decided to amalgamate these partial lists, verify which COMMAND_TYPE links to which command in the databases I have access to, and publish this fuller, verified list. I quickly found some commands do not stay in the SGA after issuing, so I could not confirm them {“drop index” being one}.

I got this far:

-- mdw 10/12/09
-- check for cmd types not seen before, as lists on web are not complete
-- * = I have verified
col sql_txt form a60
select command_type,sql_id,substr(sql_text,1,60) sql_txt
from gv$sqlarea
where command_type not in (
 1 --  create table  *
,2 --  INSERT        *
,3 --  SELECT        *
,6 --  UPDATE        *
,7 --  DELETE        *
,9 --  create index  *
,11 -- ALTER INDEX   *
,26 -- LOCK table    *
,42 -- ALTER_SESSION (NOT ddl)
--two postings suggest 42 is alter session
,44 -- COMMIT
,45 -- rollback
,46 -- savepoint
,47 -- PL/SQL BLOCK' or begin/declare *
,48 -- set transaction   *
,50 -- explain           *
,62 -- analyze table     *
,90 -- set constraints   *
,170 -- call             *
,189 -- merge            *
)
and rownum < 20
/

Please feel free to add to it.

Or save yourself the bother and check out the list provided in the Oracle documentation. Not under the descriptions for V$SQL or V$SQLAREA, which would have been the obvious place, Mr Larry Ellison, thank you, but under V$SESSION. Further, the fact that in V$SESSION the column is called just COMMAND and not COMMAND_TYPE does not assist in locating this information (don’t google COMMAND and ORACLE, you get many millions of hits…).

This is the full table of values and meanings.

(This is the listing for 10.2; 11.1 is very similar.)

But it does not include 189 – MERGE, so that is one up for practical testing over just reading the manual :-)

Back to the day job…

Wrong way to Query on Dates – the Persistent Offender November 15, 2009

Posted by mwidlake in performance.
13 comments

Some of you, hopefully most of you, will find nothing new in this little posting. Except maybe the realisation that you are not the only one utterly frustrated by the ever-recurring nature of this. Actually, I should say “frustrated by these“. There are two things here I keep coming across, year after year after year… Actually, three things. {Maybe I am frustrated too easily, I need a holiday….}

{Maybe I should start a “weekend rant” as well as a “Friday Philosophy” thread…}

OK, you have a massive ORDERS table with a column ORDER_DATE on it, type DATE. This column has a normal B-tree index on it. You want to query for all ORDERS placed on a date, say 1st November 2009, so you issue a query with the following WHERE clause:

WHERE ORDER_DATE = '01-NOV-09'

But you would not, would you? DATE columns can (and often do) contain a time portion. And in this case it does, so you would track down whoever specified this table, give them a “talking to”, get the column renamed to ORDER_DATETIME and use something like:

WHERE TRUNC(ORDER_DATETIME) = '01-NOV-09'

But you would not, would you? The function on ORDER_DATETIME will prevent the index being used (unless you added a function-based index, you clever so-and-so).
Now this is the second thing that I keep seeing and have to run through with people. Some people simply have not yet been told that functions on columns stop indexes from being used; others know this but do not quite understand that a TRUNC {or a SUBSTR on the first X characters} still stops indexes being used by Oracle, even though they know that “functions stop indexes being used” – it seems sensible to them, as humans, that if they had a printed index they would not think twice about doing a range check with the index. But the CBO is not a human and is not so smart. So you explain this to them.
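A sketch of the function-based index alternative, assuming the table above:

CREATE INDEX ORDERS_TRUNC_DT ON ORDERS (TRUNC(ORDER_DATETIME));

With that in place, WHERE TRUNC(ORDER_DATETIME) = TO_DATE('01-NOV-2009','DD-MON-YYYY') can use the index, as the indexed expression matches the predicate exactly.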

So maybe you get to the point where the following is the WHERE clause to use:

WHERE ORDER_DATETIME >= '01-NOV-09'
AND ORDER_DATETIME < '02-NOV-09'

That will work. It will use the index on ORDER_DATETIME, it will pass unit testing, it will get released and work fine.

Until someone runs this on a system with a different NLS_DATE_FORMAT.
Or in a session with a different NLS_DATE_FORMAT.
Or on an application server/middle tier where the NLS_DATE_FORMAT is different and it all falls over.
{and for NLS_DATE_FORMAT, keep in mind that NLS_LANG and NLS_TERRITORY can implicitly alter NLS_DATE_FORMAT, so anyone supporting database systems over national boundaries may well be nodding their heads sagely right now}.
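A quick sketch of the trap {session settings and the table are illustrative}:

alter session set nls_date_format = 'DD-MON-RR';
select count(*) from orders where order_datetime >= '01-NOV-09';
-- works here...

alter session set nls_date_format = 'YYYY-MM-DD';
select count(*) from orders where order_datetime >= '01-NOV-09';
-- ...but now fails with a date conversion error (eg ORA-01858)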

It falls over because there is an implicit data conversion of the string '01-NOV-09' to a date. It uses your default date format, which is not the same on all databases, on all middle tiers or even in all sessions (I always alter my session to show the time portion and I seem to be one of the first to hit these issues, maybe as a result). It is far less common, but I must come across this NLS issue once a year minimum and usually it has been a bit of a major issue (like feeding back utterly the wrong data to a regulatory body. Ouch).

So, you need to explicitly state your character-to-date format conversion. And to all developers reading this: yes, I know it is a pain to type those extra bits, but you get paid to do it! :-)
WHERE ORDER_DATETIME >= TO_DATE('01-NOV-2009','DD-MON-YYYY')
AND ORDER_DATETIME < TO_DATE('02-NOV-2009','DD-MON-YYYY')

That will work and keep working.

So:

  • Call columns that hold a DATE with a time portion blar_DATETIME
  • Don’t have functions on the column; move them over to the other side of the WHERE clause(s)
  • Explicitly specify your date conversions. Always. Every time.

I know, I know, most of you knew all of that. Those of you who didn’t – well, probably no one had told you before, but now you have been told. And that is the heart of it: these things are easy to explain, easy to understand, but somehow seem to be missed when people are getting going with Oracle. They just need telling.

And I’m only able to tell so many. So, the rest of you, will you please tell all inexperienced Oracle people about these three things as I don’t think I can face another 20 years of telling people these things :-)

{The serious final line is, there are lots and lots of interesting gotchas, what-ifs, quirks on things in Oracle, all of which are good to share, but perhaps the most important things to keep sharing are those things “we all know”, but those with less experience don’t.}

Describing tables you can’t DESC October 7, 2009

Posted by mwidlake in internals.
3 comments

This is more an oddity than anything particularly useful. Sometimes you can’t use the sql*plus DESCRIBE {DESC} command on tables – but you might have an alternative.

I’m doing a lot of work for a client on a 10.2.0.3 database.  I have SELECT ANY DICTIONARY but not SELECT ANY TABLE privilege. This is because there is sensitive data in the database and it is duly protected {and this is certainly not the first time I have worked for a client with full dictionary access but not data access, it’s becoming normal}. I’m granted access to specific things as needs arise.

I knew I had to look at a table called AD_FAKE_STATS_COLUMNS.

select count(*) from mdw.AD_FAKE_STATS_COLUMNS
/
select count(*) from mdw.AD_FAKE_STATS_COLUMNS
                         *
ERROR at line 1:
ORA-00942: table or view does not exist

Have I got the table name wrong?

select table_name,owner
from dba_tables
where table_name = 'AD_FAKE_STATS_COLUMNS'

TABLE_NAME                     OWNER
------------------------------ -----------------
AD_FAKE_STATS_COLUMNS          MDW

OK, it is there, I don’t have access to it. Fine, I don’t want access to the data, I just want to see the structure of the table:

desc mdw.AD_FAKE_STATS_COLUMNS
ERROR:
ORA-04043: object mdw.AD_FAKE_STATS_COLUMNS does not exist

Oh. DESC in sql*plus does not work.

I can’t DESC a table I do not have access to carry out DML on. I’m going to have to go and ask someone to give me permission to see the table. How annoying.

Or do I?

@tab_desc
Enter value for tab_name: AD_FAKE_STATS_COLUMNS
old 21: where table_name like upper (nvl('&TAB_NAME','W')||'%')
new 21: where table_name like upper (nvl('AD_FAKE_STATS_COLUMNS','W')||'%')

TAB_OWN  TAB_NAME              COL_NAME              M COL_DEF
-----------------------------------------------------------------
MDW      AD_FAKE_STATS_COLUMNS TABLE_NAME            Y VARCHAR2(30)
                               COLUMN_NAME           Y VARCHAR2(30)
                               COPY_STATS_FROM       N VARCHAR2(61)
                               LOW_VALUE_SQL         N VARCHAR2(100)
                               HIGH_VALUE_SQL        N VARCHAR2(100)
                               DISTINCT_SQL          N VARCHAR2(100)
                               DAYS_HIST_NULL_AVGLEN N NUMBER(3,0)

:-)

I have access to the data dictionary. So I can see the structure of the table, which after all is what I wanted and is what the client is happy for me to have.{I’ve never much liked the DESC command in sql*plus, I replaced it with a little sql script against the data dictionary years ago}.

In case you want it, here is the script:

-- tab_desc.sql
-- Martin Widlake date? way back in the mists of time
-- my own replacement for desc.
-- 16/11/01 improved the data_type section
 SET PAUSE ON
 SET PAUSE 'Any Key...>'
 SET PAGES 24
col Tab_own form A10
col tab_name form a22 wrap
col col_name form a28 wrap
col col_def form A14
 break on tab_own skip 1 on tab_name skip 1
 spool tab_desc.lst
 select
 owner                               Tab_Own
,table_name             Tab_Name
,column_name            Col_Name
,decode(NULLABLE,'Y','N','Y') Mand
,data_type||decode(data_type
       ,'NUMBER','('
        ||decode(to_char(data_precision)
                ,null,'38'
                ,     to_char(data_precision)||
                      decode(data_scale,null,''
                                      ,      ','||data_scale)
                 )
                    ||')'
       ,'DATE',null
       ,'LONG',null
       ,'LONG RAW',null
,'('||Substr(DATA_LENGTH,1,5)||')'
       )  col_def
from dba_tab_columns
where table_name like upper (nvl('&TAB_NAME','WHOOPS')||'%')
order by 1,2,column_id,3,4
/
spool off
clear col
--

Decrypting Histogram Data #2 September 3, 2009

Posted by mwidlake in internals, performance.
Tags: , , ,
7 comments

Hint – if you have a column where most or all of the entries share the same first 15 or more characters, don’t bother gathering histograms on it. They will not help you a lot. Or at all. eg you have a column that holds “CUSTOMER IDENTIFIER xxxxxxxxx”, “CUSTOMER IDENTIFIER yyyyyyyy”, etc. Of course, good design suggests that the “CUSTOMER IDENTIFIER” bit is probably redundant and could be removed, but we live in the real world and may not have the power or ability to enforce that change, or we might have 3 or 4 such “prefix” strings.

Further, histograms on numerics longer than 15 significant digits will also potentially not behave as you would expect.

I’d better justify my assertions.

In the previous post on decrypting histogram data I covered how Oracle turns a varchar2 value into a numeric value that is held in the ENDPOINT_VALUE of DBA_TAB_HISTOGRAMS and I also gave you a cunning (but slightly flawed) function to convert it back. I use it in this post so you might want to go look at the prior post. Sorry it is long, I can’t stop rambling.

First, I’ll create some test data {Oh, this is on 10.2.0.3 on Linux}. The below script creates a table HIST_TEST with columns NUM_1 through to NUM_7, which hold numbers of increasing magnitude with the digits 0-9 as the last value. Similarly, columns VC_2 to VC_8 hold strings padded out to increasing lengths, with a random character added at the end. VC_1 is a 5-character string of one repeated letter.

create table hist_test
as select rownum  id
,mod(rownum,10)  num_1
,trunc(dbms_random.value(1,10)) num_2
,1000+trunc(dbms_random.value(1,10)) num_3
,1000000+trunc(dbms_random.value(1,10)) num_4
,1000000000+trunc(dbms_random.value(1,10)) num_5
,1000000000000+trunc(dbms_random.value(1,10)) num_6
,1000000000000000+trunc(dbms_random.value(1,10)) num_7
,lpad(chr(mod(rownum,6)+65),5,chr(mod(rownum,6)+65) )vc_1
,lpad('A',2,'A')||dbms_random.string('U',1) vc_2
,lpad('B',3,'B')||dbms_random.string('U',1) vc_3
,lpad('C',4,'C')||dbms_random.string('U',1) vc_4
,lpad('D',5,'D')||dbms_random.string('U',1) vc_5
,lpad('E',6,'E')||dbms_random.string('U',1) vc_6
,lpad('F',7,'F')||dbms_random.string('U',1) vc_7
,lpad('G',8,'G')||dbms_random.string('U',1) vc_8
--below skews the table to AB.
,lpad('H',16,'H')||decode(mod(rownum,3),0,'A'
                                       ,1,'B'
                                       ,   dbms_random.string('U',1)) vc_16
,lpad('I',40,'I')||dbms_random.string('U',1) vc_40
from dba_objects
where rownum < 1001
/
begin
  dbms_stats.gather_table_stats(ownname=>user,tabname=>'HIST_TEST'
  ,method_opt=>'FOR ALL COLUMNS SIZE 10');
END;
/

The below is a simple select against DBA_TAB_COLUMNS to see the information for the columns {Oh, I say simple, but you have to use the functions utl_raw.cast_to_number and utl_raw.cast_to_varchar2 to turn the raw values held in the columns LOW_VALUE and HIGH_VALUE into something we humans can read. Why does Oracle Corp have to make life so difficult? *sigh*}.


select
 column_name
,num_distinct num_dist
,decode(data_type,'NUMBER',to_char(utl_raw.cast_to_number(low_value))
                 ,'VARCHAR2',to_char(utl_raw.cast_to_varchar2(low_value))
                 ,          low_value
       ) low_v
,decode(data_type,'NUMBER',to_char(utl_raw.cast_to_number(high_value))
                 ,'VARCHAR2',to_char(utl_raw.cast_to_varchar2(high_value))
                 ,          high_value
       ) hi_v
,num_nulls    n_nulls
,num_buckets  n_buck
,avg_col_len  avg_l
from dba_tab_columns
where table_name ='HIST_TEST'
and owner=USER
order by owner,column_id
/

COLUMN_NAME       NUM_DIST LOW_V              HI_V                    N_NULLS N_BUCK  AVG_L
---------------------------------------------------------------------
ID                   1,000 1                  1000                          0     10      4
NUM_1                   10 0                  9                             0     10      3
NUM_2                    9 1                  9                             0      9      3
NUM_3                    9 1001               1009                          0      9      4
NUM_4                    9 1000001            1000009                       0      9      6
NUM_5                    9 1000000001         1000000009                    0      9      7
NUM_6                    9 1000000000001      1000000000009                 0      9      9
NUM_7                    9 1000000000000001   1000000000000009              0      9     10
VC_1                     6 AAAAA              FFFFF                         0      6      6
VC_2                    26 AAA                AAZ                           0     10      4
VC_3                    26 BBBA               BBBZ                          0     10      5
VC_4                    26 CCCCA              CCCCZ                         0     10      6
VC_5                    26 DDDDDA             DDDDDZ                        0     10      7
VC_6                    26 EEEEEEA            EEEEEEZ                       0     10      8
VC_7                    26 FFFFFFFA           FFFFFFFZ                      0     10      9
VC_8                    26 GGGGGGGGA          GGGGGGGGZ                     0     10     10
VC_16                   26 HHHHHHHHHHHHHHHHA  HHHHHHHHHHHHHHHHZ             0     10     18
VC_40                    1 IIIIIIIIIIIIIIIIII IIIIIIIIIIIIIIIIII            0      1     42

I now select the data out of the DBA_TAB_HISTOGRAMS table to see what is really being stored. For the sake of brevity {which means, “so I can finish this post tonight”} I’ll just show bits, but if you want, download the script right at the end and, if you created the functions from the prior post, you can play with this yourself.

First, here you can see some of the values actually stored in ENDPOINT_VALUE and what they translate into:-

TABLE_NAME   colname                        ENDPOINT_VALUE ROWCOUNT REAL_VAL
------------------------------------------------------------
Numbers
HIST_TEST    NUM_5                              1000000007      760 1000000007
HIST_TEST    NUM_5                              1000000008      870 1000000008
HIST_TEST    NUM_5                              1000000009    1,000 1000000009
HIST_TEST    NUM_6                           1000000000001      103 1000000000001
HIST_TEST    NUM_6                           1000000000002      219 1000000000002
HIST_TEST    NUM_6                           1000000000003      328 1000000000003
HIST_TEST    NUM_6                           1000000000004      427 1000000000004
Varchars
HIST_TEST    VC_2     338824386822815000000000000000000000        8 AAU
HIST_TEST    VC_2     338824624507302000000000000000000000        9 AAW
HIST_TEST    VC_2     338824782963627000000000000000000000       10 AAY
HIST_TEST    VC_3     344035480872391000000000000000000000        0 BBB@
HIST_TEST    VC_3     344035481491361000000000000000000000        1 BBBB
HIST_TEST    VC_3     344035482419816000000000000000000000        2 BBBE
HIST_TEST    VC_3     344035483348271000000000000000000000        3 BBBH
HIST_TEST    VC_3     344035483967241000000000000000000000        4 BBBJ

Note that for numerics the number itself is stored and I do not need to translate it.
For VARCHAR2 columns the value held is a number derived from the string, accurate to roughly 15 significant digits and padded out with zeros to 36 digits.
Also note, for VC_2 the range covered seems to end at AAY not AAZ, and column VC_3 starts at BBB@ not BBBA {I am creating values with the last character set to A-Z}. Also, bucket 8 for VC_2 ends in a control character.
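If the conversion works the way I described in the previous post {each of the leading bytes is treated as a base-256 digit and the result is rounded to about 15 significant digits – my understanding, so treat with care}, you can reproduce a stored value by hand. A quick sanity check against the ‘AAU’ bucket above:

-- sum the byte values weighted by powers of 256, then round to
-- 15 significant digits (the stored values are 36 digits long,
-- so that is a round at the 10^21 level)
select round( ascii('A')*power(256,14)
            + ascii('A')*power(256,13)
            + ascii('U')*power(256,12)
            ,-21) aau_endpoint
from dual
/
-- gives 338824386822815000000000000000000000, matching the
-- ENDPOINT_VALUE stored for 'AAU'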

To reduce this error I add a fudge to my function {again, see the previous post}. It helps:

TABLE_NAME   colname ROWCOUNT REAL_VAL          MOD_REAL
---------------------------------------------------
HIST_TEST    VC_2           9 AAW               AAX
HIST_TEST    VC_2          10 AAY               AAZ
HIST_TEST    VC_3           0 BBB@              BBBA
HIST_TEST    VC_3           1 BBBB              BBBC
HIST_TEST    VC_3           2 BBBE              BBBF
HIST_TEST    VC_3           3 BBBH              BBBI
HIST_TEST    VC_3           4 BBBJ              BBBK
HIST_TEST    VC_3           5 BBBM              BBBN
HIST_TEST    VC_3           6 BBBO              BBBP
HIST_TEST    VC_3           7 BBBR              BBBS
HIST_TEST    VC_3           8 BBBT              BBBU
HIST_TEST    VC_3           9 BBBW              BBBX
HIST_TEST    VC_3          10 BBBY              BBBZ
HIST_TEST    VC_4           0 CCCC@             CCCCA
HIST_TEST    VC_4           1 CCCCB             CCCCC
HIST_TEST    VC_4           2 CCCCD             CCCCE
HIST_TEST    VC_4           3 CCCCH             CCCCH
HIST_TEST    VC_4           4 CCCCJ             CCCCJ

As you can see, I now get a better translation, but it still goes wrong sometimes (see last couple of rows). So, feel free to take my functions and use them, but be aware even the modified version is not perfect. If YOU perfect it, can I have a copy please :-)

The below shows that Numeric value histograms break when you exceed 15 digits:

colname                  END_VAL ROWCOUNT REAL_VAL
------- ------------------------ -------- --------
NUM_5                 1000000009    1,000 1000000009
NUM_6              1000000000001      103 1000000000001
NUM_6              1000000000002      219 1000000000002
NUM_6              1000000000003      328 1000000000003
NUM_6              1000000000004      427 1000000000004
NUM_6              1000000000005      542 1000000000005
NUM_6              1000000000006      651 1000000000006
NUM_6              1000000000007      771 1000000000007
NUM_6              1000000000008      881 1000000000008
NUM_6              1000000000009    1,000 1000000000009
NUM_7           1000000000000000      133 1000000000000000
NUM_7           1000000000000000      256 1000000000000000
NUM_7           1000000000000000      367 1000000000000000
NUM_7           1000000000000000      467 1000000000000000
NUM_7           1000000000000010      567 1000000000000010
NUM_7           1000000000000010      665 1000000000000010
NUM_7           1000000000000010      784 1000000000000010
NUM_7           1000000000000010      896 1000000000000010
NUM_7           1000000000000010    1,000 1000000000000010

This is the point at which storage of numbers for histograms breaks. You can see that NUM_6 is fine but NUM_7 is not. That is because NUM_6 is below 15 significant digits and NUM_7 is over 15 significant digits and effectively gets truncated.

Histograms on numeric values with more than 15 significant digits will not work as you expect, possibly not at all.
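A quick way to see the effect {this mimics what I believe is happening, a round to 15 significant digits, rather than anything Oracle documents}:

-- NUM_7 values are 16 digits long, so the 16th digit is lost
-- when rounded to 15 significant digits
select round(1000000000000007,-1) rounded_val from dual
/
-- gives 1000000000000010, which is exactly the stored
-- ENDPOINT_VALUE seen for the later NUM_7 buckets above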

With VARCHAR2 values, histogram END_VALUEs break even sooner, at around 7 characters:

colname                               END_VAL ROWCOUNT MOD_REAL
------- ------------------------------------- -------- --------
VC_5     354460798876024000000000000000000000        5 DDDDDO
VC_5     354460798876038000000000000000000000        6 DDDDDR
VC_5     354460798876047000000000000000000000        7 DDDDDS
VC_5     354460798876061000000000000000000000        8 DDDDDV
VC_5     354460798876071000000000000000000000        9 DDDDDY
VC_5     354460798876080000000000000000000000       10 DDDDDZ
VC_6     359673457682977000000000000000000000        0 EEEEEF8
VC_6     359673457682977000000000000000000000        1 EEEEEF8
VC_6     359673457682977000000000000000000000        2 EEEEEF8
VC_6     359673457682977000000000000000000000        3 EEEEEF8
VC_6     359673457682977000000000000000000000        4 EEEEEF8
VC_6     359673457682977000000000000000000000        5 EEEEEF8
VC_6     359673457682977000000000000000000000        6 EEEEEF8
VC_6     359673457682977000000000000000000000        7 EEEEEF8
VC_6     359673457682978000000000000000000000        8 EEEEEFn
VC_6     359673457682978000000000000000000000        9 EEEEEFn
VC_6     359673457682978000000000000000000000       10 EEEEEFn
VC_7     364886116489977000000000000000000000        0 FFFFFGK
VC_7     364886116489977000000000000000000000        1 FFFFFGK
VC_7     364886116489977000000000000000000000        2 FFFFFGK
VC_7     364886116489977000000000000000000000        3 FFFFFGK
VC_7     364886116489977000000000000000000000        4 FFFFFGK
VC_7     364886116489977000000000000000000000        5 FFFFFGK
VC_7     364886116489977000000000000000000000        6 FFFFFGK
VC_7     364886116489977000000000000000000000        7 FFFFFGK
VC_7     364886116489977000000000000000000000        8 FFFFFGK
VC_7     364886116489977000000000000000000000        9 FFFFFGK
VC_7     364886116489977000000000000000000000       10 FFFFFGK

You can see that for column VC_5 the actual ENDPOINT_VALUE is varying at the 14th and 15th significant digit and my translated value changes. But for VC_6 the numeric ENDPOINT_VALUE is hardly changing. Each row translates to one of only two values.
For VC_7 the ENDPOINT_VALUE is static. All histogram END_VALUES are the same.

This means histograms will not work properly for any VARCHAR2 column whose values do not vary within the first 7 or so characters; any characters after about the 7th are effectively ignored.
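You can see why it is around 7 characters from the weights involved. Each character position contributes its byte value times a power of 256, and for strings starting ‘A’-‘Z’ the rounding step works out at around 10 to the 21st. A sketch {based on my assumed conversion formula, so treat with care}:

-- weight of the 6th and 7th character positions in the conversion
select power(256,9) pos6_weight   -- ~4.7e21, just above the ~1e21 rounding step
      ,power(256,8) pos7_weight   -- ~1.8e19, well below it
from dual
/
-- so character 6 only just survives the rounding, and
-- characters 7 onwards are effectively lost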

Or does it? My logic is correct but tomorrow (or soon after) I’ll try some actual tests over theory… {So don’t ping me with corrections just yet}

I’ve not mentioned VC_16 and VC_40 have I? I will tomorrow. Maybe :-|.

Finally, here is the selecting script, as promised.

col column_name form a7 head colname
col rowcount form 99,999
col real_val form a17
col end_val form 999999999999999999999999999999999999
col end_string form a10
col endpoint_actual_value form a40
col mod_real form a17
select --owner
 dth.table_name
,dth.column_name
,dth.endpoint_value  end_val
,dth.endpoint_number rowcount
,decode(dtc.data_type,'NUMBER',to_char(dth.endpoint_value)
                     ,'VARCHAR2',hist_numtochar(dth.endpoint_value+1)
                     ,          dth.endpoint_value
       ) real_val
,decode(dtc.data_type,'NUMBER',to_char(dth.endpoint_value)
                     ,'VARCHAR2',hist_numtochar2(dth.endpoint_value+1)
                     ,          dth.endpoint_value
       ) mod_real
,endpoint_actual_value
from dba_tab_histograms dth
,dba_tab_columns dtc
where dth.table_name ='HIST_TEST'
and  dth.owner=USER
and dth.owner=dtc.owner
and dth.table_name=dtc.table_name
and dth.column_name=dtc.column_name
order by dth.table_name,dth.column_name,dth.endpoint_number
/

Counting the Cost #5 – accurate and fast July 9, 2009

Posted by mwidlake in performance.
Tags: , ,
1 comment so far

I started off my blog by discussing quick ways of selecting count(*) from tables. I never completed the series, so I’m back to it.

This is the previous post on the topic.

It covers how ALL_TAB_MODIFICATIONS and the NUM_ROWS column in ALL_TABLES can be combined to get a very accurate estimate of count(*) for most tables in most situations.
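For reference, the sort of query that post builds up to looks something like this {a rough sketch from memory, not the exact script from that post}:

-- estimated current rowcount: rows as at the last stats gather,
-- corrected by the changes recorded since then. You may need to
-- run dbms_stats.flush_database_monitoring_info first so that
-- recent changes are visible.
select t.num_rows + nvl(m.inserts,0) - nvl(m.deletes,0) est_count
from all_tables t
    ,all_tab_modifications m
where m.table_owner (+) = t.owner
and   m.table_name  (+) = t.table_name
and   m.partition_name (+) is null
and   t.owner      = user
and   t.table_name = 'TEST_P'
/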

If this does not work for you {e.g. no privilege to see the views or to flush the information}, or you absolutely must have the count as accurate as possible right now {say, a manager is threatening you}, here is another method, using a count table. It can be expanded to hold more than just count(*) and is particularly useful for partitioned tables {which are often huge}.

Basically, every time you select count(*) on a table {or even eg select count(*) … where status=0} you are probably re-scanning a lot of the table you scanned the last time you did count(*) on it. If you are not deleting records from that table, that scanning effort is effectively a waste of time – the older data won’t have changed since the last select count(*). So, as a developer or DBA, you can decide to trust the last count(*) and only count the more recent records.

If you have an ascending primary key {traditionally a number sourced from a sequence} you can store the max(PK) and count(*) in a simple table. You then count all the records with a PK greater than the stored one. I’ve used count tables several times in the past and two enhancements invariably crop up, so I am going to include them in the below example of how you can implement this.

You need the partitioning option to run the test yourself, so I’m sorry, but you will need Enterprise Edition with that option. If you downloaded Oracle to your PC/linux box to play on, you probably have it {it never ceases to amaze me that Oracle Corp will only allow partitions to be used in Enterprise Edition, and even then as an extra-cost option!}.

You need to create tablespaces to hold the table partitions {or alter the test script to use existing tablespaces}. Feel free to take and modify this script to create the tablespaces.
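If you just want something quick to stand in for that script, tablespaces like these will do {file names and sizes are my own arbitrary choices; adjust for your system}:

-- two small tablespaces for the old and current partitions;
-- paths and sizes are arbitrary examples
create tablespace parts_old
  datafile '/u01/oradata/tsttdb/parts_old_01.dbf' size 100m
  extent management local autoallocate;
create tablespace parts_curr
  datafile '/u01/oradata/tsttdb/parts_curr_01.dbf' size 100m
  extent management local autoallocate;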

This is my test script.

This is my basic setup:-

NAME                          VALUE
----------------------------- --------------------
compatible                    10.2.0.1.0
cpu_count                     1
db_block_size                 8192
db_file_multiblock_read_count 8
optimizer_mode                ALL_ROWS
sga_target                    612368384
sort_area_size                65536

My test table:-

create table test_p
(id     number(8) not null
,status number(1) not null
,num_1  number(3) not null
,num_2  number(3)
,vc_pad varchar2(2000))
tablespace parts_curr
partition by range (id)
(partition id_10k values less than (10000)
 tablespace parts_old
,partition id_20k values less than (20000)
 tablespace parts_old
,partition id_30k values less than (30000)
 tablespace parts_old
,partition id_40k values less than (40000)
 tablespace parts_old
,partition id_max values less than (maxvalue)
 tablespace parts_curr
)
/
-- {41999 records created - see script}
alter table test_p
add constraint pt_pk primary key (id)
using index
local(
 partition id_10k tablespace parts_old
,partition id_20k tablespace parts_old
,partition id_30k tablespace parts_old
,partition id_40k tablespace parts_old
,partition id_max tablespace parts_curr)
/
-- gather stats
begin
  dbms_stats.gather_table_stats(ownname=>user
  ,tabname=>'TEST_P',estimate_percent=>100,granularity=>'ALL');
end;
/
-- create the count table
create table test_p_running_count
  (max_id    number(8) not null
  ,row_count number(8)
  ,constraint tprc_pk primary key(max_id)
  )
/

OK, let’s count the number of records in TEST_P and then do the same but insert the data collected into the count table:-

select count(*) from test_p;

  COUNT(*)
----------
     41999
Execution Plan
----------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |     1 |    26   (4)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE        |       |     1 |            |          |       |       |
|   2 |   PARTITION RANGE ALL  |       | 41999 |    26   (4)| 00:00:01 |     1 |     5 |
|   3 |    INDEX FAST FULL SCAN| PT_PK | 41999 |    26   (4)| 00:00:01 |     1 |     5 |
----------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        111  consistent gets
          0  physical reads

insert into test_p_running_count (max_id,row_count)
select max(id),count(*) from test_p;

Execution Plan
----------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT       |       |     1 |     5 |    26   (4)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE        |       |     1 |     5 |            |          |       |       |
|   2 |   PARTITION RANGE ALL  |       | 41999 |   205K|    26   (4)| 00:00:01 |     1 |     5 |
|   3 |    INDEX FAST FULL SCAN| PT_PK | 41999 |   205K|    26   (4)| 00:00:01 |     1 |     5 |
------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
         26  recursive calls
         21  db block gets
        115  consistent gets
          0  physical reads

So, it takes 111 consistent gets and an estimated cost of 26 to do a count(*) on this relatively small table, doing a fast full scan of the PK index. It takes only slightly more effort to create a record to hold this information.

{As an aside, it is curious that the total estimated cost of the insert is the same as the select, suggesting naughtily that the insert is free, but that’s not my topic today}.

I now create 100 records by running a little insert script.
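{That insert script is not shown here; a stand-in that does the same job, given the columns of TEST_P, could be as simple as the below – my own quick version, not the original}:

-- add 100 rows on from the current max(id) of 41999
insert into test_p
select 41999+rownum
      ,mod(rownum,4)          -- status
      ,mod(rownum,99)+1       -- num_1
      ,mod(rownum,99)+1       -- num_2
      ,lpad('x',100,'x')      -- vc_pad
from dba_objects
where rownum < 101;
commit;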

Let’s select the number of records and the new max(ID) created since we last stored information in the count table:

select max(id),count(*) from test_p
where id>(select max(max_id) from test_p_running_count)
 /
   MAX(ID)   COUNT(*)
---------- ----------
     42099        100

Execution Plan
----------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |     1 |     5 |     4   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE               |         |     1 |     5 |            |          |       |       |
|   2 |   PARTITION RANGE ITERATOR    |         |  2100 | 10500 |     2   (0)| 00:00:01 |   KEY |     5 |
|   3 |    INDEX RANGE SCAN           | PT_PK   |  2100 | 10500 |     2   (0)| 00:00:01 |   KEY |     5 |
|   4 |     SORT AGGREGATE            |         |     1 |    13 |            |          |       |       |
|   5 |      INDEX FULL SCAN (MIN/MAX)| TPRC_PK |     1 |    13 |     2   (0)| 00:00:01 |       |       |
---------------------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads

The CBO scanned the index of the TEST_P_RUNNING_COUNT table using an index full scan (min/max), which means Oracle basically worked down the index to get a min or max value, which is highly efficient. This value was then used to do an index range scan of only one index partition (partition number 5).

Estimated cost was 4 and the execution took 3 consistent gets {NB this is for the second execution, it is 12 consistent gets and 4 recursive calls the first time, when the statement is parsed}.

But that only got the new max(id) and how many records have been created since you last recorded it – 100. What you really want is the count of records since you last recorded the count and max(id) PLUS the count as it was then, ie the full count now.
This is not straightforward in a single SQL statement, as you want the count, a group function, from one table and a single-row value from another. You can do it though. You basically write SQL statements to do the bits you want and then select from them. My example script shows how I build up to this, but the final statement I came up with {and you might well be able to come up with better options yourself} is:

select change.max_id new_max,orig.max_id orig_max
,change.rowcount chg_c,orig.rowcount orig_c 
,change.rowcount+orig.rowcount tot_c
from
(select max(id) max_id,count(*) rowcount,1 tabjoin
  from test_p
  where id>(select max(max_id) from test_p_running_count) ) change
,(select max(max_id) max_id,max(row_count) rowcount,1 tabjoin
  from test_p_running_count)                                            orig
where change.tabjoin=orig.tabjoin
 /

   NEW_MAX   ORIG_MAX      CHG_C     ORIG_C      TOT_C
---------- ---------- ---------- ---------- ----------
     42099      41999        100      41999      42099

Execution Plan
----------------------------------------------------------
| Id  | Operation                        | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |                      |     1 |    58 |     7   (0)| 00:00:01 |       |       |
|   1 |  NESTED LOOPS                    |                      |     1 |    58 |     7   (0)| 00:00:01 |       |       |
|   2 |   VIEW                           |                      |     1 |    29 |     4   (0)| 00:00:01 |       |       |
|   3 |    SORT AGGREGATE                |                      |     1 |     5 |            |          |       |       |
|   4 |     PARTITION RANGE ITERATOR     |                      |  2100 | 10500 |     2   (0)| 00:00:01 |   KEY |     5 |
|*  5 |      INDEX RANGE SCAN            | PT_PK                |  2100 | 10500 |     2   (0)| 00:00:01 |   KEY |     5 |
|   6 |       SORT AGGREGATE             |                      |     1 |    13 |            |          |       |       |
|   7 |        INDEX FULL SCAN (MIN/MAX) | TPRC_PK              |     1 |    13 |     2   (0)| 00:00:01 |       |       |
|*  8 |   VIEW                           |                      |     1 |    29 |     3   (0)| 00:00:01 |       |       |
|   9 |    SORT AGGREGATE                |                      |     1 |    26 |            |          |       |       |
|  10 |     TABLE ACCESS FULL            | TEST_P_RUNNING_COUNT |     1 |    26 |     3   (0)| 00:00:01 |       |       |
--------------------------------------------------------------------------------------------------------------------------

Statistics
---------------------------------------------------------
          0  recursive calls
          0  db block gets
         10  consistent gets
          0 physical reads

That’s an estimated cost of 7 by the CBO and 10 consistent gets {again, for the second iteration}. Remember, scanning the whole partitioned table was costed at 26 and took 111 consistent gets.

You can convert that sql select statement into an insert and use it to update your count table:-

insert into test_p_running_count (max_id,row_count)
 select change.max_id
--,orig.max_id,change.rowcount,orig.rowcount
 ,change.rowcount+orig.rowcount
 from
(select max(id) max_id,count(*) rowcount,1 tabjoin
   from test_p
   where id>(select max(max_id) from test_p_running_count) ) change
,(select max(max_id) max_id,max(row_count) rowcount,1 tabjoin
   from test_p_running_count) orig
where change.tabjoin=orig.tabjoin
 /

That is one of the improvements usually asked for when you have a count table: people want to see what the counts were historically, seeing as you are now holding the data in a table {why is it so many people suddenly want to keep information once it is seen in a table?}. That is why the above does an insert and not an update, and why my code selects the max(max_id) and max(row_count). It is not perfect: if no records have been inserted into your massive table since the last max(id) and row count were recorded, you will get a duplicate primary key error. You could add a datetime column to get around this.

Also, my SQL is a little naughty in that I select the max(max_id) and max(row_count). There is nothing in the table design to enforce the unspoken rule that they both come from the “last record” of the table, but as a human you can see that they do. Again, the use of a datetime column can help with this, as sketched below.
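If you want to go the datetime route, a variant of the count table like the below would cover both problems {a sketch only; naming and precision are up to you}:

create table test_p_running_count
  (max_id     number(8) not null
  ,row_count  number(8)
  ,created    date      default sysdate not null
  ,constraint tprc_pk primary key(max_id,created)
  )
/
-- the latest count is now simply the row with max(created), and
-- repeated counts with no new data no longer violate the PK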

The final bit for today. The above two sql statements look a bit nasty, certainly a lot more complex than “select count(*) from table”. Also, they are not as efficient as they could be; selecting max(row_count) is not supported by an index and could slow down over time. I could tweak the SQL statement even further to work around this, but a simple piece of PL/SQL does the job better, and is what I usually end up implementing {within a package, along with a load of other count-table functions for several large tables}.

Here is a script to create a function that gets the current count(*) for the table.

--get_new_max
create or replace function get_tp_rowcount
return number
as
--
v_rtn       number;
v_extra_rc  number;
v_last_rc   number;
v_max_id    number;
v_last_max  number;
begin
  select max_id,row_count
  into v_last_max,v_last_rc
  from test_p_running_count
  where max_id=(select max(max_id) from test_p_running_count);
  select count(*),max(id)
  into v_extra_rc,v_max_id
  from test_p
  where id>v_last_max;
  -- debug output: stored count, extra count, stored max(id), new max(id)
  dbms_output.put_line(to_char(v_last_rc)
                ||'~'||to_char(v_extra_rc)
                ||'~'||to_char(v_last_max)
                ||'~'||to_char(v_max_id));
  v_rtn :=v_last_rc+v_extra_rc;
  return v_rtn;
end;
/

And, as you can see below, it remains an efficient trick. I’ll leave it to you to create a procedure to update the count table {though there is a rough sketch at the end of this post}, with the observation that you can then argue for several hours as to how often you update the count table… :-)

select get_tp_rowcount from dual;

GET_TP_ROWCOUNT
---------------
          42299

1 row selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
| Id  | Operation        | Name | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------
|   0 | SELECT STATEMENT |      |     1 |     2   (0)| 00:00:01 |
|   1 |  FAST DUAL       |      |     1 |     2   (0)| 00:00:01 |
-------------------------------------------------------------

Statistics
----------------------------------------------------------
          2  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads

--
--

select count(*) from test_p;

  COUNT(*)
----------
     42299

Execution Plan
----------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |     1 |    26   (4)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE        |       |     1 |            |          |       |       |
|   2 |   PARTITION RANGE ALL  |       | 41999 |    26   (4)| 00:00:01 |     1 |     5 |
|   3 |    INDEX FAST FULL SCAN| PT_PK | 41999 |    26   (4)| 00:00:01 |     1 |     5 |
----------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        111  consistent gets
          0  physical reads
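And, as a starting point for that update procedure I left to you, here is a rough, untested sketch along the same lines as the function {the name and the no-new-rows guard are my own additions}:

create or replace procedure upd_tp_rowcount
as
  v_max_id   number;
  v_extra_rc number;
begin
  select max(id),count(*)
  into v_max_id,v_extra_rc
  from test_p
  where id>(select max(max_id) from test_p_running_count);
  -- only insert if there is something new, which avoids the
  -- duplicate primary key issue mentioned above
  if v_extra_rc > 0 then
    insert into test_p_running_count (max_id,row_count)
    select v_max_id,max(row_count)+v_extra_rc
    from test_p_running_count;
  end if;
end;
/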
