jump to navigation

IOT part 3 – Significantly Reducing IO August 2, 2011

Posted by mwidlake in development, performance.
Tags: , , , ,
16 comments

<..IOT1 – the basics
<….IOT2 – Examples and proofs
……>IOT4 – Boosting Buffer Cache Efficiency
……..>IOT5 – Primary Key issues
……….>IOT6a – Slowing Down Insert
…………>IOT6(B) – OLTP Inserts

In the previous two posts I covered the basics of Index Organized Tables (IOTs) and then created some test tables to demonstrate the benefit of IOTs that is most often covered – reducing the IO needed to get a single record by one IO, say from 5 to 4. {Whether this is a buffer get from memory or a disc IO depends on if the block is cached, of course}.

In this post I am going to show how IOTs can far more significantly reduce the IO when several related rows are required.

Below is one of my test tables, the one that is a normal heap table and has a primary key, CHHE_PK on PARE_ID and CRE_DATE:

mdw11> desc child_heap
 Name                                      Null?    Type
 ----------------------------------------- -------- --------------
 PARE_ID                                   NOT NULL NUMBER(10)
 CRE_DATE                                  NOT NULL DATE
 VC_1                                      NOT NULL VARCHAR2(100)
 DATE_1                                             DATE
 NUM_1                                              NUMBER(2)
 NUM_2                                              NUMBER(2)

--
mdw11> select count(*),count(distinct(pare_id)) from child_heap

  COUNT(*) COUNT(DISTINCT(PARE_ID))
---------- ------------------------
   1000000                     9999

As you can see, the table has 1 million records and 9,999 values for PARE_ID, there are approx 100 records per parent. The data was created to match a common situation – that of a bit of data coming in for each parent every day. See post 2 for details.

The result of this is that the data for any given parent is scattered through the table. As the data comes in for a given day, the data for the first parent is added to the end of the table, followed by all the data for all the other parents who have data that day. The next day this is repeated, so the child records for a given parent are interspersed with the child records for many other parents.

The below diagram demonstrate what will now happen if you issue a statement like
select *
from CHILD_HEAP
where PARE_ID=12

Oracle quickly works down the index to the leaf block containing the first key that matches the range. This takes, in my example, 4 block reads. Oracle now works through the index entries and, via the rowid, identifies the exact block to visit in the table for each key. For each key it has to visit a new block – because the data is scattered through the table. This is what the clustering_factor in the index statistics is measuring, how often contiguous rows in the index are for the same block. In our case, almost never.
In my diagram I do not number those table reads but in my simplistic diagram it would be 10 further reads.
If Oracle reaches the end of the leaf block before it reaches the end of the range of key values, oracle follows the pointer in the leaf block (not shown) to the next leaf block (whcih is another block read) and continues working through the keys until the range scan is completed.

In my simplified diagram I only have 6 entries per leaf block. In reality, and in my example tables, this is more like a few hundred. 247 in the case of CHHE_PK.

Now let’s consider my Index Organized Table, CHILD_IOT. It has exactly the same columns as CHILD_HEAP and the data was created in the same way. However, because it is an IOT, as the data came in it was inserted into the primary key index and is thus in an ordered state.

The below diagram demonstrate what will now happen if you issue a statement like
select *
from CHILD_IOT
where PARE_ID=12

Oracle works down the index to the leaf block where the range scan begins and now simply works along the leaf blocks. There is no need to go and visit the table as there is no table.

In my IOT diagram the leaf entries are longer and there are fewer in each leaf block, ie 5. So my scan has to visit 3 leaf blocks rather than 2. In reality the difference is more pronounced, in my example table there are actually 56 rows per leaf block, compared to the 247 in the index on the heap table. As such, my scan on the IOT will cover more leaf blocks but this is insignificant compared to the reduction in block visits caused by not having to go hunt down records scattered over the table. Even in the unlikely event of my IOT being deeper by 1 level (an extra layer of branch blocks) due to the reduces entries per leaf block, I would still be winning for range scans.

That is all nice theory and pictures. As ever, we need to back this up with some real tests. Firstly, I am using SQL*Plus and I need to set my arraysize large enough so that I do not introduce extra consistent gets through selecting small sets of rows between client and server. You will need to do the same to see similar results to me.
{I keep meaning to do a dedicated post on arraysize but H.Tonguç YIlmaz has a nice post already on it.}

set arraysize 200
set autotrace on

Now I will select all the records for PARE_ID=10, including a column not in the Primary Key, so that the table needs to be visited. I did this twice to remove the parsing overhead:

select pare_id,cre_date,vc_1
from child_heap
where pare_id =10
order by cre_date

   PARE_ID CRE_DATE  VC_1
---------- --------- -----------------------------------------------------------------------
        10 17-APR-11 LDOBKMLCYCSQYBDFIUISJWQAHNYSQOSUQJKIGCSEJHDPOFFLHHXYSMDSQNUB
        10 18-APR-11 LBGDNOYQFQMTMJQRAUWSRNBTHQSKBEUVLZSFWEGULOPDXQSVXOIC
        10 18-APR-11 LBGDNOYQFQMTMJQRAUWSRNBTHQSKBEUVLZSFWEGULOPDXQSVXOICOSFTSYNO
        10 19-APR-11 IBVTIGYBXJLMZQKRPJZEPXLMQLNOYNWLQOYVVGARNSITZWULVBYLEJKZNII
        10 19-APR-11 IBVTIGYBXJLMZQKRPJZEPXLMQLNOYNWLQOYVVGARNSITZWULVBYLEJ
        10 19-APR-11 IBVTIGYBXJLMZQKRPJZEPXLMQLNOYNWLQOYVVGARNSITZWULVBYLEJ
        10 20-APR-11 USIGVSPPIUUXEIRBMPFNBTTMDUJTVITHKQWZAKZOMJEDZCUPQAEFQQEYM
        10 20-APR-11 USIGVSPPIUUXEIRBMPFNBTTMDUJTVITHKQWZAKZOMJEDZCUPQAEF
...
        10 19-JUL-11 BNOYCIDTFJHPPOYPSVAVKJSYUNVPGPHLJXUOIKYKASKHYGZNVHVFFGPVAKN
        10 25-JUL-11 HDFGAQWTYZBSVYVXTFFRDIAKRYWFUPFCNDCETHUWHSQUITHHVUEJTJ

82 rows selected.


Execution Plan
------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |   100 |  6900 |   103   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| CHILD_HEAP |   100 |  6900 |   103   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | CHHE_PK    |   100 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         86  consistent gets
          0  physical reads

82 rows collected and 86 consistent gets. That will be 4 consistent gets to process the index blocks and 82 for the table blocks.

Now let’s repeat that on the IOT:

select pare_id,cre_date,vc_1
from child_IOT
where pare_id =10
order by cre_date
mdw11> /
any key>

   PARE_ID CRE_DATE  VC_1
---------- --------- ------------------------------------------------------------
        10 17-APR-11 QJHQXTQAYEUICPNDQTYMMFZPWJSIDLBKOXYTHLEHKTVWUPKQMWUUX
        10 18-APR-11 BUTUEWDCDQVPLTPPRFGBBEDOZYRPERPRROVUQPTSRZLHKVBSBUEAMZYAS
        10 18-APR-11 BUTUEWDCDQVPLTPPRFGBBEDOZYRPERPRROVUQPTSRZLHKVBSBUEAMZY
        10 19-APR-11 DEGNPALVLMIDYCYIQIIQJJVZFTNIMEULMAGDEWVTOAKBNHOPUQJE
        10 19-APR-11 DEGNPALVLMIDYCYIQIIQJJVZFTNIMEULMAGDEWVTOAKBNHOPUQJ
...
        10 24-JUL-11 TJGLOEITTVXQTQPHSKGVERSGJDREYSKKCDUFMQXQVXMHMMDWPLJNSNK
        10 24-JUL-11 TJGLOEITTVXQTQPHSKGVERSGJDREYSKKCDUFMQXQVXMHMMDWPLJNSNKCN
        10 25-JUL-11 BCLLVPYMWAAQOVLILXARQZXEGAQAARPURIFKFKHROUSFORRYYXQZUAJHDBL

108 rows selected.


Execution Plan
----------------------------------------------------------------------------
| Id  | Operation        | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT |         |   100 |  6900 |     4   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| CHIO_PK |   100 |  6900 |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          6  consistent gets
          0  physical reads

We actually gathered more data, 108 rows compared to 82, all for 6 consistent gets compared to 86 consistent gets. That is a reduction to less than 10% of the original effort.

Now for a more extreme test. I am going to select a single row summary of data for 10 parents, flushing the cache between each run to show the impact when you have to do real IO to support those consistent gets. This is on a fairly old {4 years} laptop with a rather tired hard disc

alter system flush buffer_cache

System altered.

Elapsed: 00:00:00.18

--
--

select count(*),sum (num_1)
from child_heap
where pare_id between 50 and 60

  COUNT(*) SUM(NUM_1)
---------- ----------
      1155      12031

Elapsed: 00:00:06.39

Execution Plan
-------------------------------------------------------------------------------------------
| Id  | Operation                    | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |            |     1 |     7 |  1203   (0)| 00:00:18 |
|   1 |  SORT AGGREGATE              |            |     1 |     7 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| CHILD_HEAP |  1200 |  8400 |  1203   (0)| 00:00:18 |
|*  3 |    INDEX RANGE SCAN          | CHHE_PK    |  1200 |       |     7   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1157  consistent gets
       1112  physical reads

--
--

alter system flush buffer_cache

System altered.

Elapsed: 00:00:00.18

--
--

select count(*),sum (num_1)
from child_iot
where pare_id between 50 and 60

  COUNT(*) SUM(NUM_1)
---------- ----------
      1111      11528

Elapsed: 00:00:00.29

Execution Plan
-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |     7 |    24   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |         |     1 |     7 |            |          |
|*  2 |   INDEX RANGE SCAN| CHIO_PK |  1200 |  8400 |    24   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         25  consistent gets
         25  physical reads

The Heap took 6.39 seconds and required 1112 physical reads to support 1157 consistent gets.
The IOT took 0.29 seconds and required 25 physical reads to support 25 consistent gets.

I think we can all see that IOTs have the potential to greatly reduce physical and logical IO. Perhaps we should all be using IOTs more.

Final point. The Heap version took less physical reads than consistent gets as some blocks read into the block buffer cache held data required later in the query.

The impact of IOTs on the buffer cache will be the topic of my next post on IOTs. I think { hope:-) } that many of you will be very interested and impressed by what you could gain…

IOT 2 – First examples and proofs July 26, 2011

Posted by mwidlake in development, performance.
Tags: , , , ,
8 comments

<.. IOT1 – Basics
..>IOT3 – Great reductions in IO for IOTs
….>IOT4 – Boosting Buffer Cache Efficiency
……>IOT5 – Primary Key issues
……….>IOT6a – Slowing Down Insert
…………>IOT6(B) – OLTP Inserts

In my first post on IOTs I ran through the basics of what they are. Here I am going to create some test tables and show you a few things.

I am going to create a simple PARENT table with 9,999 records and then two CHILD tables. CHILD_HEAP, a normal table, and CHILD_IOT, an Index Organized Table. They have the same columns and will hold very similar data.

All of this is on Oracle 11.1 but is exactly the same on 10.2. 8K block size, tablespaces are auto segment space managed.

Here are the creation statements:

--first create the parent table, keyed by ID.
-- The other columns are not significant, they just represent "information"
create table mdw.parent
(id       number(10)    not null 
,name     varchar2(100) not null
,date_1   date          not null
,num_1    number(2)
,num_2    number(2) 
,constraint pare_pk primary key(id) 
 using index tablespace index_01
)
tablespace data_01
/
--
--Now put my 9999 parents into the table.
insert into parent
select rownum
,dbms_random.string('U',mod(rownum,10)+50)
,sysdate-(mod(rownum,500)+1000)
,mod(rownum,99)+1
,trunc(dbms_random.value(0,100))
from dual connect by level < 10000
/
--
-- create the table to hold the children as a heap table
create table child_heap
(pare_id   number(10)    not null
,cre_date  date          not null
,vc_1      varchar2(100) not null
,date_1    date
,num_1     number(2)
,num_2     number(2)
,constraint chhe_pk primary key(pare_id,cre_date) 
 using index tablespace index_01
)
tablespace data_01
/
--
-- create the table to hold the children as an IOT table
create table child_iot
(pare_id   number(10)    not null
,cre_date  date          not null
,vc_1      varchar2(100) not null
,date_1    date
,num_1     number(2)
,num_2     number(2)
,constraint chio_pk primary key(pare_id,cre_date) 
-- using index tablespace index_01 -- CANNOT STATE for IOT. State in table definition
)
ORGANIZATION INDEX -- This is it. This makes the table an IOT
tablespace data_01
/

There are only two differences between the statements creating the CHILD_HEAP and the CHILD_IOT tables.

The main one is the inclusion of the line ORGANIZATION INDEX and is what instructs Oracle to create the table as an IOT. Note that it does not state the index and you cannot state the index. The IOT is created based on the Primary Key.
The other change is that you now cannot state the tablespace for the Primary Key index. I’ve not played with this at all but I don’t think you can state anything with the “using index” as the table storage clauses are used for the Primary Key index. I personally find this a little illogical as it is the index segment that is created, but I guess others would find it more natural that you still state this at the table level.

When I create IOTs on a real system, I put the IOT in a table tablespace {I still maintain table and index tablespaces, for reasons I won’t go into here}. I put it there as it holds the actual data. If I lose that Primary Key index I am losing real data, not duplicated data.

I then populated the two CHILD tables with data. The method of creating this test data is very important.

I am simulating a very common situation, where data is coming in for a set of Parents (think customers, accounts, scientific instruments, financial entities) and the data is coming in as a record or set of records each day. ie not where the parent and all of it’s child records are created at one time, like an order and it’s order lines. I am simulating where the child data is created a few records at a time, not all in one go.

The code is simple. it loops for one hundred days and for each day it creates 10,000 records for random parents. On each day any given parent will have none, one or several records. On average, each parent will end up with 100 records, but some will have more and some less. The key thing is that the data for any given parent is created a record at a time, with lots of records created for other parents before the next record for that given parent.

The two tables will have the same pattern of data but not identical data. {I could have seeded the random number generator to make the two data sets the same but this will do}. Below is the statement for one table, you just change the table name to populate each table. {BTW I like using the from dual connect by level <=x method of getting the number of rows desired – it is fast and is neat, once you have seen it once}.

declare
v_num number :=10000; -- number of people
v_str varchar2(60);
begin
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
for i in 1..100 loop --days to do
  v_str:=dbms_random.string('U',60);
  insert into CHILD_HEAP
    (pare_id,cre_date,vc_1,date_1,num_1,num_2)
  select
    trunc(dbms_random.value(1,v_num))
   ,sysdate-(100-i) + (rownum/(60*60*24) )
   ,substr(v_str,1,51+mod(rownum,10))
   ,sysdate-(100-i) + ((mod(rownum,30)+1)/3)
   ,mod(rownum,20)+1
   ,mod(rownum,99)+1
  from dual connect by level <=v_num;
end loop;
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
end;
/

I then gathered objects stats on the tables.
Let’s check the size of the tables:

select segment_name, segment_type,tablespace_name,blocks
from dba_segments where owner=USER and segment_name like 'CHILD%';

SEGMENT_NAME    SEGMENT_TYPE    TABLESPACE_NAME     BLOCKS
--------------- --------------- --------------- ----------
CHILD_HEAP      TABLE           DATA_01              12288

1 row selected.

ONE row? Where is the other table, where is CHILD_IOT? It does not exists.

Remember from my first post that I made the comment I would have prefered it if Index Organized Tables had been called something like ‘Table Containing Indexes’? The table data has been placed in the Primary Key index and the table segment does not even exist. If you start using IOTs this will catch you out periodically – it does me anyway and I’ve been using them on and off for years:-).

Let’s look at the size of the primary key indexes:

select segment_name, segment_type,tablespace_name,blocks
from dba_segments where owner=USER and segment_name like 'CH%PK'
and segment_name not like '%ORD%'

SEGMENT_NAME    SEGMENT_TYPE    TABLESPACE_NAME     BLOCKS
--------------- --------------- --------------- ----------
CHHE_PK         INDEX           INDEX_01              4224
CHIO_PK         INDEX           DATA_01              19456

2 rows selected.

Note that the Primary Key index for CHILD_HEAP, CHHE_PK, is there and is 4,224 blocks in size, and the CHILD_IOT Primary Key, CHIO_PK, is a lot larger at 19,456 blocks. In fact, not only is the CHIO_PK index larger than the CHILD_HEAP table, it is larger than the combined size of the CHILD_HEAP table and CHHE_PK index combines. So much for me saying last post that IOTs can save disk space? I’ll come back to that in a later post…

Here are some other stats from one of my scripts:

mdw11> @tab_sci_own
owner for Table: mdw
Name for Table: child_heap

OWNER    TABLE_NAME          NUM_ROWS      BLOCKS AVG_L GLS ULS LST_ANL      PRT  SAMP_SIZE
-------- -------------- ------------- ----------- ----- --- --- ------------ --- ----------
MDW      CHILD_HEAP          1000,000      12,137    83 YES NO  250711 22:01 NO     1000000


INDEX_NAME      TYP PRT UNQ BL     L_BLKS   DIST_KEYS       CLUSTF     LB_KEY     DB_KEY LST_ANL
--------------- --- --- --- -- ---------- ----------- ------------ ---------- ---------- ------------
CHHE_PK         NOR NO  UNI  2      4,034    1000,000      995,857          1          1 250711 22:02

INDEX_NAME                   TABLE_NAME       PSN COL_NAME
---------------------------- ---------------- --- ------------------------------------------------
CHHE_PK                      CHILD_HEAP       1   PARE_ID
CHHE_PK                      CHILD_HEAP       2   CRE_DATE

--
--
owner for Table: mdw
Name for Table: child_iot

OWNER    TABLE_NAME          NUM_ROWS      BLOCKS AVG_L GLS ULS LST_ANL      PRT  SAMP_SIZE
-------- -------------- ------------- ----------- ----- --- --- ------------ --- ----------
MDW      CHILD_IOT           1000,000                83 YES NO  250711 22:03 NO     1000000

INDEX_NAME      TYP PRT UNQ BL     L_BLKS   DIST_KEYS       CLUSTF     LB_KEY     DB_KEY LST_ANL
--------------- --- --- --- -- ---------- ----------- ------------ ---------- ---------- ------------
CHIO_PK         IOT NO  UNI  2     17,855     910,881            0          1          1 250711 22:03

INDEX_NAME                   TABLE_NAME       PSN COL_NAME
---------------------------- ---------------- --- ------------------------------------------------
CHIO_PK                      CHILD_IOT        1   PARE_ID
CHIO_PK                      CHILD_IOT        2   CRE_DATE

Note the lack of BLOCKS for the CHILD_IOT table and the CLUSTERING_FACTOR of 0 for the CHIO_PK.

The clustering factor is the number of times Oracle, when scanning the whole index in order, would have to swap to a different Table block to look up the table record for each index entry. If it is close to the number of blocks in the table, then the clustering factor is low and the order of records in the table matches the order of entries in the index. This would make index range scans that need to visit the table reasonably efficient.

If the clustering factor is close to the number of records in the table then it means there is no correlation between index order and table row order and such index ranges scans that have to visit the table would be inefficient. Again, this is significant and will be the major topic of the next post.

The depth of the index does not change, being 3 in each case (BL or blevel 2)

So, can we see evidence of the theoretical efficiency of looking up single records via the IOT that I mentioned in the fist post? Here we go {oh, usual disclaimer, I run the code twice and show the second run, to remove the parsing overhead}:

-- First the Heap table
select * from child_HEAP where PARE_ID=1234
AND cre_date=to_date('24-JUN-11 20:13:21','DD-MON-YY HH24:MI:SS')

   PARE_ID CRE_DATE  VC_1
---------- --------- ------------------------------------------------------
DATE_1         NUM_1      NUM_2
--------- ---------- ----------
      1234 24-JUN-11  LUTFHOCIJNYREYICQNORREAJOVBRIHFVLXNIGIVZDMFJCTGYFWC
25-JUN-11         11         16
1 row selected.

Execution Plan
------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |     1 |    83 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| CHILD_HEAP |     1 |    83 |     3   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | CHHE_PK    |     1 |       |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets


--and now the IOT table

select * from child_IOT where PARE_ID=1234
AND cre_date=to_date('24-JUN-11 21:23:41','DD-MON-YY HH24:MI:SS')

   PARE_ID CRE_DATE  VC_1
---------- --------- -------------------------------------------------------
DATE_1         NUM_1      NUM_2
--------- ---------- ----------
      1234 24-JUN-11
CSIGBHSXWNDDTCFRCNWYPRNLEQWPCRYTXQQZHACDEXHOBEYXLNYBHRUHJ
27-JUN-11          7         52
1 row selected.

Execution Plan
-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |    83 |     2   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| CHIO_PK |     1 |    83 |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets

{I had to look up the exact values of CRE_DATE of a couple of records to do the above queries}

To look up a single row with the heap table you can see that the explain plan was to carry out a unique scan on the primary key and then look up the row via the rowid and took 4 consistent gets. 3 to walk down the index and get the rowid, one to look up the row block.

For the IOT table the explain plan reveals that there was simply an index unique scan of the Primary Key, nothing more. All data for the row was there in the index entry rather than the rowid. Thus only 3 consistent gets were required.

For single row lookups on the Primary Key, IOTS are more efficient than traditional Heap tables with a Primary Key index. {Please, no one point out that if all the columns you need are in the index you also do not need to go to the table, that is a different topic}.

Quite a few people have shown this efficiency before but the next step is far, far more interesting and shows a much more significant impact of IOTs. That is the topic of the next post:-).

For now, I am going to finish off with what happens with range scans as I suggested they could slow down with an IOT.
Below, I select count(*) for just one of the parent values.

select count(*) from child_heap where pare_id = 2

  COUNT(*)
----------
        98

Execution Plan
-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |     4 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |         |     1 |     4 |            |          |
|*  2 |   INDEX RANGE SCAN| CHHE_PK |   100 |   400 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets

--
--

select count(*) from child_iot where pare_id = 2

  COUNT(*)
----------
        93

Execution Plan
-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |     4 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |         |     1 |     4 |            |          |
|*  2 |   INDEX RANGE SCAN| CHIO_PK |   100 |   400 |     4   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets

Both statements carry out a range scan on the Primary Key of the table. For the normal HEAP table this takes 3 consistent gets, which is no suprise as we have an 8k block size and only 100 rows for a given parent, they happen to fit into one block of the index. So Oracle works down the depth of the index and looks at one block.

For the IOT the scan works down the index but has to scan three blocks. Even though there are fewer entries, 93 compared to 98, they span three blocks and thus the total number of consistent gets is 5.

Admittedly I was a little lucky in my example above. Sometimes the entries for one parent will scan 2 blocks for the heap table’s Primary Key and occasionally the entries for the IOT will fit into 2 blocks. But if you look at the number of leaf blocks in the earlier stats (4,034 for the normal and 17,855 for the IOT, both for 10,000 entries) usually the 100 or so entries for single parent in the normal index will all fall into one block and the entries for the IOT will fall into between 2 and 3 blocks.

A select count(*) will full scan the smallest segment that can satisfy the query. Let’s try it:

mdw11> select count(*) from child_heap

  COUNT(*)
----------
   1000000

Execution Plan
-------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |     1 |   989   (1)| 00:00:15 |
|   1 |  SORT AGGREGATE       |         |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| CHHE_PK |  1000K|   989   (1)| 00:00:15 |
-------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          1  recursive calls
          2  db block gets
       4109  consistent gets
       4088  physical reads

mdw11> select count(*) from child_iot

  COUNT(*)
----------
   1000000

Execution Plan
-------------------------------------------------------------------------
| Id  | Operation             | Name    | Rows  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |     1 |  4359   (1)| 00:01:05 |
|   1 |  SORT AGGREGATE       |         |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| CHIO_PK |  1000K|  4359   (1)| 00:01:05 |
-------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      19298  consistent gets
      19246  physical reads

The number of consistent gets (and physical reads) are close to the number of leaf blocks in the two segments, though higher. This is because Oracle is scanning the whole index, leaf blocks and branch blocks. The scan is far more expensive for the IOT, simply as the index is so much larger. I’ve not shown timings but on my little laptop, the count(*) takes about 3 seconds on CHILD_HEAP and about 5 seconds on the CHILD_IOT.

That is enough for one post.

Index Organized Tables – the Basics. July 18, 2011

Posted by mwidlake in development, internals, performance.
Tags: , , , ,
34 comments

..>IOT2 – Examples and proofs
….>IOT3 – Greatly reducing IO with IOTs
……>IOT4 – Boosting Buffer Cache Efficiency
……..>IOT5 – Primary Key issues
……….>IOT6a – Slowing Down Insert
…………>IOT6(B) – OLTP Inserts

I think Index Organized Tables(IOTs) are a much under-used and yet very useful feature of Oracle. Over the next few postings I’m going to cover some aspect of Index Organised Tables, both good and not-so-good. I am going to cover some benefits of IOTs that I think many people are unaware of. In this first post I am just going to run through the basics of IOTs.

The idea behind an IOT is simple. You hold all the data for the table in the ordered structure of an index. Why would you want to do that? Let us consider a very common requirement, accessing a row in a “large” table via a known, unique key.

Traditionally you have a heap table holding the data you want to access and a standard index to support access to that table. See the first diagram below. The 4-layer triangle represents the index, with a root block, two levels of branch blocks and then the leaf blocks at the “bottom”. The blue rectangle represents the table with the squares being individual rows. Of course, in a large table there would be thousands or millions of “squares”, this is just a simple diagram to show the idea.

When you issue a SQL statement to select the row via the indexed column(s) then oracle will read the root block (1), find the relevent block in the first level of branch blocks (2), then the relevant block in the second level of branch blocks (3) and finally (as far as the index is concerned) the relevant Leaf Block for the unique key. The leaf block holds the indexed column(s) and also the rowid. The rowid is the fastest way to look up a record, it states the file, block and row offset for the row. This allows oracle to go straight to the block and get the row. That is read number (5).
The number of branch blocks {and thus the number of blocks that need to be read to find a row} will vary depending on how much data is indexed, the number and size of the columns in the index, how efficiently the space has been used in the blocks and one or two other factors. In my experience most indexes for tables with thousands or millions of rows have one, two or three levels of branch blocks.

The second diagram shows a representation of the Index Organized Table. The table has in effect disappeared as a distinct object and the information has been moved into the leaf blocks of the index {part of me feels Index Organized Tables should really be called Table Organized Indexes or Table Containing Indexes as that would better indicate what is physically done}:

So with the IOT oracle reads the root block (1), the two branch level blocks (2 and 3) and finally the leaf block (4). The leaf block does not hold the rowid but rather the rest of the columns for the table {this can be changed, a more advanced feature allows you to store some or all the extra columns in an overflow segment}. Thus to access the same data, Oracle has to read only 4 blocks, not 5. Using an IOT saves one block read per unique lookup.

This saving of block reads is probably the main feature that IOTs are known for, but there are others which I will cover in later posts. Two things I will mention now is that, firstly, the use of IOTs is potentially saving disc space. An index is in effect duplication of data held in the table. When you create an index no new information is created but space is used up holding some of the table information in a structure suitable for fast lookup. Secondly, the index and table have to be maintained whenever a change is made to the columns that are indexed. IOTs reduce this maintenance overhead as there is only one thing to maintain.

Now for some drawbacks.

  • The IOT has to be indexed on the primary key. There is no option to create an IOT based on other indexes. As such you have to either be accessing the table via the primary key to get the benefit – or you have to be a little cunning.
  • The index is going to be larger than it was and very often larger than the original table. This can slow down range scans or full scans of the index and a “full table scan” will now be a full index scan on this large object, so that can also negatively impact performance. However, if a range scan would then have resulted in access to the table to get extra columns, the IOT gives a similar benefit in reducing IO to that for single row lookups.
  • I just want to highlight that you now have no rowid for the rows.
  • Secondary indexes are supported but will potentially be less efficient due to this lack of rowid.

So, a brief summary is that Index Organised Tables effectively move the table data into the Primary Key index, reduce the number of block lookups needed to select one row, can save some disc space. But you can only organize the table via the Primary Key and it can make full or partial table scans and lookups via other indexes slower.

There are several more important benefits to IOTs {in my opinion} which I will come to over the next week or two.

Missing information in SQL*Plus May 19, 2011

Posted by mwidlake in development.
Tags:
13 comments

By preference I like to work against the databse using SQL*Plus. GUIs are fine and they are much better for developing code than using sql*plus and notepad (or vi or whatever ascii text editor floats your boat). However, for finding things out quickly and also storing what I find (via spool files) then sql*plus is much better. As screens have got larger, I have used a larger – and wider – sql*plus session to show more information at one time.
But I have been having an issue with SQL*Plus not showing some information. I lose information on the right hand side of the screen.

This is on the V10 client on Windows by the way, I’ve seen it on windows XP to Vista.

Look at the below. It is quite a wide output and I have had to increase the standard 80 character linesize from 80 up to 120:

{BTW if you are looking at this in a small browser winder, you might have to click on the screen shot to open it up in it’s own window, to see what I mean – it is quite wide, that is the point of the post – it should just fit in my over-wide blog layout :-)}

set linesize 120

AUDIT_PROGRAM            entry_dt ENTRY_TS     AUDIT_TEXT                                                    aud_val1   aud_val2
------------------------ -------- ------------ -----------------------------------------------------
MM_STATS_GATHER          19-05-11 04.46.39.599 Index EVE_CAT_USER_FK_I Partition EVE_I_TOP                          0        100
MM_STATS_GATHER          19-05-11 04.46.39.571 Index EVE_IDX Partition EVE_I_TOP                                    0        100
MM_STATS_GATHER          19-05-11 04.46.39.556 Index EVE_AGR_CNTRPY_FK_I Partition EVE_I_TOP                        0        1
MM_STATS_GATHER          19-05-11 04.46.39.519 Index EVE_CAT_BANK_FK_I Partition EVE_I_TOP                          0        100
MM_STATS_GATHER          19-05-11 04.46.38.846 Table EVENTS Partition EVE_D_TOP                                     0        100
MM_STATS_GATHER          19-05-11 04.46.32.497 Index EVE_CAT_USER_FK_I Partition EVE_I_2011_07                  4,962       
MM_STATS_GATHER          19-05-11 04.46.20.787 Index EVE_IDX Partition EVE_I_2011_07                          437,500         
MM_STATS_GATHER          19-05-11 04.46.09.939 Index EVE_AGR_CNTRPY_FK_I Partition EVE_I_2011_07               91,
MM_STATS_GATHER          19-05-11 04.45.58.613 Index EVE_CAT_BANK_FK_I Partition EVE_I_2011_07                441,2
MM_STATS_GATHER          19-05-11 04.45.44.534 Table EVENTS Partition EVE_D_2011_07                           927,950          
MM_STATS_GATHER          19-05-11 04.45.41.207 Index EVE_CAT_USER_FK_I Partition EVE_I_2011_04                  6,324       
MM_STATS_GATHER          19-05-11 04.45.22.638 Index EVE_IDX Partition EVE_I_2011_04                          771,840         
MM_STATS_GATHER          19-05-11 04.45.17.024 Index EVE_AGR_CNTRPY_FK_I Partition EVE_I_2011_04              168,
MM_STATS_GATHER          19-05-11 04.45.05.463 Index EVE_CAT_BANK_FK_I Partition EVE_I_2011_04                848,0
MM_STATS_GATHER          19-05-11 04.44.55.474 Table EVENTS Partition EVE_D_2011_04                         1,625,100          
MM_STATS_GATHER          19-05-11 04.44.47.100 Index EVE_CAT_USER_FK_I Partition EVE_I_2011_01                    461       
MM_STATS_GATHER          19-05-11 04.44.23.829 Index EVE_IDX Partition EVE_I_2011_01                          800,020         
MM_STATS_GATHER          19-05-11 04.44.03.673 Index EVE_AGR_CNTRPY_FK_I Partition EVE_I_2011_01              158,

Note that for line 8 onwards there is no value for aud_val2. But then, some of the numbers in aud_val1 also look odd. But I’m getting output for some of the lines…

I am in fact getting some of the lines truncated. It just seems to be a bug in the V10 sql*plus client and it might pass you by as the first few lines usually come out “full length”. This truncation only happens if you increase the linesize.

To fix it, you need tell the “windows” sql*plus window how big your output now is. Click on the options and then environment menu items at the top left of the screen:

Change the SQL*Plus environment

Now set the screen buffer width to a bit more than it was, say by one character (and whilst you are at it, why not increasre the buffer length to 2000, the memory required is peanuts on today’s machines). I actually tend to set it to 140 to anticipate some of my more verbose scripts.

Alter the screen buffer width to one or two characters more

And confirm that you are happy to “truncate the buffer” – whoever coded this did not bother to check if you alter the values up or down – you can ignore this message if you are simply increasing the buffers.

Accept the prompt about truncation

Now when I run my sql statement I see the full output {Again, click on the image to see it’s full length to fully see how the problem is now fixed}:

AUDIT_PROGRAM            entry_dt ENTRY_TS     AUDIT_TEXT                                                    aud_val1   aud_val2
------------------------ -------- ------------ ------------------------------------------------------- -------------- ----------
MM_STATS_GATHER          19-05-11 04.46.39.599 Index EVE_CAT_USER_FK_I Partition EVE_I_TOP                          0        100
MM_STATS_GATHER          19-05-11 04.46.39.571 Index EVE_IDX Partition EVE_I_TOP                                    0        100
MM_STATS_GATHER          19-05-11 04.46.39.556 Index EVE_AGR_CNTRPY_FK_I Partition EVE_I_TOP                        0        100
MM_STATS_GATHER          19-05-11 04.46.39.519 Index EVE_CAT_BANK_FK_I Partition EVE_I_TOP                          0        100
MM_STATS_GATHER          19-05-11 04.46.38.846 Table EVENTS Partition EVE_D_TOP                                     0        100
MM_STATS_GATHER          19-05-11 04.46.32.497 Index EVE_CAT_USER_FK_I Partition EVE_I_2011_07                  4,962        100
MM_STATS_GATHER          19-05-11 04.46.20.787 Index EVE_IDX Partition EVE_I_2011_07                          437,500          5
MM_STATS_GATHER          19-05-11 04.46.09.939 Index EVE_AGR_CNTRPY_FK_I Partition EVE_I_2011_07               91,395         20
MM_STATS_GATHER          19-05-11 04.45.58.613 Index EVE_CAT_BANK_FK_I Partition EVE_I_2011_07                441,240          5
MM_STATS_GATHER          19-05-11 04.45.44.534 Table EVENTS Partition EVE_D_2011_07                           927,950          2
MM_STATS_GATHER          19-05-11 04.45.41.207 Index EVE_CAT_USER_FK_I Partition EVE_I_2011_04                  6,324        100
MM_STATS_GATHER          19-05-11 04.45.22.638 Index EVE_IDX Partition EVE_I_2011_04                          771,840          5
MM_STATS_GATHER          19-05-11 04.45.17.024 Index EVE_AGR_CNTRPY_FK_I Partition EVE_I_2011_04              168,840          5
MM_STATS_GATHER          19-05-11 04.45.05.463 Index EVE_CAT_BANK_FK_I Partition EVE_I_2011_04                848,060          5
MM_STATS_GATHER          19-05-11 04.44.55.474 Table EVENTS Partition EVE_D_2011_04                         1,625,100          1
MM_STATS_GATHER          19-05-11 04.44.47.100 Index EVE_CAT_USER_FK_I Partition EVE_I_2011_01                    461        100
MM_STATS_GATHER          19-05-11 04.44.23.829 Index EVE_IDX Partition EVE_I_2011_01                          800,020          5
MM_STATS_GATHER          19-05-11 04.44.03.673 Index EVE_AGR_CNTRPY_FK_I Partition EVE_I_2011_01              158,600          5

Database Sizing – How much Disk do I need? (The Easy Way) November 11, 2010

Posted by mwidlake in Architecture, development, VLDB.
Tags: , , , ,
7 comments

How much Disk do I need for my new Oracle database? Answer:-

  • 8-10 times the volume of raw data for an OLTP system
  • 2-4 times the raw data volume for a Data Warehouse.
  • The bigger the database, the nearer you will be to the lower multiplication factors.

{Disclaimer. This is of course just my opinion, based on some experience. If you use the above figures for a real project and get the total disc space you need wrong, don’t blame me. If you do and it is right, then of course you now owe me a beer.}

Many of us have probably had to calculate the expected size a database before, but the actual database is only one component of all the things you need to run the Oracle component of your system. You need to size the other components too – Archived redo logs, backup staging area, dataload staging area, external files, the operating system, swap space, the oracle binaries {which generally gets bigger every year but shrink in comparison to the average size of an Oracle DB} etc…

In a similar way to my thoughts on how much database space you need for a person, I also used to check out the total disk space every database I created and those that I came across took up. {A friend emailed me after my earlier posting to ask if I had an obsession about size. I think the answer must be “yes”}.

First of all, you need to know how much “raw data” you have. By this I mean what will become the table data. Back in the early 90’s this could be the total size of the flat files the old system was using, even the size of the data as it was in spreadsheets. An Oracle export file of the system gives a pretty good idea of the raw data volume too. Lacking all these then you need to roughly size your raw data. Do a calculation of “number_of_rows*sum_of_columns” for your biggest 10 tables (I might blog more on this later). Don’t be tempted to overestimate, my multipliers allow for the padding.

Let us say you have done this and it is 60GB of raw data for an OLTP system. Let the storage guys know you will probably want about 500GB of space. They will then mentally put it down as “of no consequence” as if you have dedicated storage guys you probably have many terabytes of storage. {Oh, I should mention that I am not considering redundancy at all but space that is provided. The amount of actual spinning disk is down to the level and type of RAID you storage guys make you use. That is a whole other discussion}.

If you come up with 5TB of raw data for a DW system then you need around 12-15TB of disk storage.

If you come up with more than a Terabyte or so of raw data for an OLTP system or 10 to 20 Terabytes for a DW, when you give you figures to the storage guys/procurement people then they may well go pale and say something like “you have got to be kidding!”. This is part of why the multiplication factor for Data Warehouses and larger systems in general is less, as you are forced to be more careful about the space you allocate and how you use it.

The overhead of total disk space over Raw data reduces as the database gets bigger for a number of reasons:

  • The size of the Oracle binaries and the OS does not change as the database gets bigger.
  • The size of swap space does not increase in line wiht the database as, generally speaking, if you increase the database size from 100GB to 1TB you do not have the luxury of increasing the system memory of your server. It probably doubles.
  • Very large databases tend to have something making them big, like images or embedded documents, which are not indexed. Thus the ratio of table segments to index segments increases.
  • If you have a very large database you start removing indexes (often those that support constraints) to aid performance of data load and management, again improving the ratio of table segments to index segments.
  • Backups become partial or incremental to reduce the size and duration of the backup.
  • As mentioned before, the sheer size of system is such that you just take more care over cleaning up work areas, paring down the archived redo log areas (those files to compress well) and other areas.
  • If things get extreme or you have been doing this for donkeys years {note to none-UK people, this means many, many years} you start altering PCTFREE and checking over extent sizes.

My best ever ratio of database size to raw data was around 1.6 and it took an awful lot of effort and planning to get there. And an IT manager who made me very, very aware of how much the storage was costing him (it is not the disks, it’s all the other stuff).

I should also just mention that the amount of disk you need is only one consideration. If you want your database to perform well you need to consider the number of spindles. After all, you can create a very large database indeed using a single 2TB disc – but any actual IO will perform terribly.

SQL*Plus Line Insertion June 22, 2010

Posted by mwidlake in development.
Tags:
4 comments

I’m quite new to SQL*Plus, having only been using it for just shy of 19 years… And today I learnt something new about it. You can over-write lines anywhere in the buffer by stating the line number:

  1  select p.surname
  2        ,p.first_forename
  3        ,a.house_number
  4        ,a.post_code
  5  from person p
  6      ,adres a
  7* where p.addr_id=a.addr_id

-- Damn, miss-spelt address in line 6
TDB> 6   ,address a
TDB> l
  1  select p.surname
  2        ,p.first_forename
  3        ,a.house_number
  4        ,a.post_code
  5  from person p
  6    ,address a
  7* where p.addr_id=a.addr_id 

I use SQL*Plus a lot as (a) I am getting stuck in my ways (b) it is always available on all Oracle sites (c) I can use all those useful little scripts I’ve developed over the years. Using “edit” to fire up vi or notepad to edit the file is fine but for correcting a typo I usually type in the line number and c/mistake/correct. This other method I’ve just come across of replacing the line could be useful.

But I played around and found I could do a couple of other things. I can add lines at the end of the script {maybe not so useful as, when you do a list, you are at the end of the buffer so you just insert and type) – but I can also insert lines at the START of the script:

  1  select p.surname
  2        ,p.first_forename
  3        ,a.house_number
  4        ,a.post_code
  5  from person p
  6    ,address a
  7* where p.addr_id=a.addr_id

TDB> 8 and a.dob <sysdate-(18*365)
TDB> l
  1  select p.surname
  2        ,p.first_forename
  3        ,a.house_number
  4        ,a.post_code
  5  from person p
  6    ,address a
  7  where p.addr_id=a.addr_id
  8* and a.dob <sysdate-(18*365)

-- if you enter a line number a way beyond the end of the buffer, SQL*Plus
-- intelligently corrects it to the next valid line number
TDB> 12 order by 1,2
TDB> l
  1  select p.surname
  2        ,p.first_forename
  3        ,a.house_number
  4        ,a.post_code
  5  from person p
  6    ,address a
  7  where p.addr_id=a.addr_id
  8  and a.dob <sysdate-(18*365)
  9* order by 1,2

-- And it works from the other end of the file. Only it does not replace the
-- first valid line, it inserts the new line and moves all the others "down".
TDB> 0 select count(*) from (
TDB> l
  1  select count(*) from (
  2  select p.surname
  3        ,p.first_forename
  4        ,a.house_number
  5        ,a.post_code
  6  from person p
  7    ,address a
  8  where p.addr_id=a.addr_id
  9  and a.dob <sysdate-(18*365)
 10* order by 1,2
TDB> a  )
 10* order by 1,2 )
-- script finished...
TDB> /
and a.dob <sysdate-(18*365)
    *
ERROR at line 9:
ORA-00904: "A"."DOB": invalid identifier

-- Damn! another typo.
-- I think in this case I will just go to the line and <em>C</em>hange the character - it 
-- is less effort than typing the whole line again.
TDB> 9
  9* and a.dob <sysdate-(18*365)
TDB> c/a./p./
  9* and p.dob <sysdate-(18*365)
DWPDV1> l
  1  select count(*) from (
  2  select p.surname
  3        ,p.first_forename
  4        ,a.house_number
  5        ,a.postcode
  6  from person p
  7    ,address a
  8  where p.addr_id=a.addr_id
  9  and p.dob <sysdate-(18*365)
 10* order by 1,2 )
TDB> /
Any Key>

  COUNT(*)
----------
     31963

1 row selected.

Up until now, to insert a new line at the start of the file I either had to save and edit it or I altered the first line and re-inserted the original. I do quite often want to insert a first line as I like to use the “select count(*) from (original_query) to test the performance of a sql statement without having the actual data sent over SQL*Net to my session. It takes that network activity out of the equation.

The other thing I do occasionally is add an explain plan statement:

TDB> l
  1  select count(*) from (
  2  select p.surname
  3        ,p.first_forename
  4        ,a.house_number
  5        ,a.post_code
  6  from person p
  7    ,address a
  8  where p.addr_id=a.addr_id
  9  and p.dob <sysdate-(18*365)
 10* order by 1,2 )
TDB> -13 explain plan set statement_id='MDW' for
TDB> l
  1  explain plan set statement_id='MDW' for
  2  select count(*) from (
  3  select p.surname
  4        ,p.first_forename
  5        ,a.house_number
  6        ,a.post_code
  7  from person p
  8    ,address a
  9  where p.addr_id=a.addr_id
 10  and p.dob <sysdate-(18*365)
 11* order by 1,2 )
TDB> /

Explained.

TDB> 

Friday Philosophy – CABs {an expensive way to get nowhere?} March 11, 2010

Posted by mwidlake in biology, development, Friday Philosophy, Management.
Tags: , ,
4 comments

A few years ago, my wife and I went to New York for a holiday. We got a cab from the airport into Manhattan. It was an expensive way to see, at great length, some of the more uninteresting automobile transit routes through New York. We arrived at our hotel a great deal later than we anticipated. And with a lot of our paper dollars no longer in our possession.

I’ve also taken cabs through London, usually at the weekend to get back to Liverpool Street Station. The trip is generally quick, painless and not too expensive, no matter what bit of London is currently being dug up. Those black-cab drivers know their stuff.

Of course, the CABs I refer to in the title of this Friday Philosophy are not private cars for hire. In this context CAB is Change Advisory Board. A term that can make grown developers weep. If you do not know, the Change Advisory Board is a group of people who look at the changes that are planed for a computer system and decide if they are fit for release. My personal experience of them has been similar to my experience of the taxi variety, though sadly more of the New York than London experience.

You might expect me to now sink into a diatribe {ie extended rant} about how I hate CABs. Well, I don’t. CABs can be a part of a valuable and highly worthwhile process control mechanism. Just as proper QA is core to any mature software development process, so CABs are important in getting informed, talented stakeholders to review proposed changes. They check for overall system impact, clashes with other proposed changes that individual development streams may be unaware of, use their own {hopefully deep and wide} experience to consider the changes and to verify Due Diligence has been invoked {that last one is a bit of a minefield and where, I believe, many CABs fail}.

Sadly, though this is often the aim, the end result is too often a bunch of uninformed and technically naive politicos trying to wield power, using the CAB meeting as an extended game of management chess.

I’ve seen CABs trade changes. “I’ll let you have X if I can have Y and Z”. I’ve seen CABs turn down changes because the form had spelling mistakes in it. I’ve seen CABs object to a change that will save the company 5 million pounds a day because it lacked one signature.

That last one just stopped me in my tracks {I’m not exaggerating either, if anything I am underplaying the cost impact of that decision. I saw the figures and I wasted a couple of days of my life checking, 5 million pounds a day was the least I felt I could prove.} We are talking about enough money every day to pay the salary of everyone on the CAB for several years. And they blocked it because the DBA team had not signed off the change. No effort was made to address the lack of the signature in any way, the change was just refused.

The DBA Team had not signed off the change because the one and only DBA Team Leader who was allowed to sign off was on holiday for two weeks. They needed that holiday too, for other but I suspect linked reasons.

Now, I knew the DBA Team Lead and he was a good bloke, he knew his stuff and he was not paid 5 million pounds a day. His deputy was paid even less but was no less talented but she was not allowed to sign off the change as she was not the DBA Team Lead.

That was a CAB gone very wrong. The process of the CAB had been allowed to over-rule good business sense. It was also overruling general and technical sense, but that really is secondary to what keeps the business making a profit.

I’ve seen the opposite of course, technical teams that just apply whatever changes they feel are fit, with no oversight or CAB. To be honest, this less controlled process seem to mess up less often than a poor CAB process as the technicians know they are the ones who will spend the weekend fixing a mess if one occurs. But that mess up will occur eventually, if control is lacking, and the bigger and more complex the IT environment, the greater the chance of the mess up.

So, I feel CABs are good, no make that Great, if you have the right people on them and you have a sensible cascade of authority so one person being away does not block the system. That is quite a bit harder to put in place than a simple “Dave A, John, Andrea, Alex, Raj, Dave P, Mal, Malcolm and Sarah have final signoff” which most CABs effecively become.

But there is one last fault of CABs I want to highlight. They tend to treat all changes in the same way and all changes are not the same. Upgrading the underlying OS is not the same as adding a cardinality hint to one Business Objects report.

If your CAB or change process treat the two above examples the same, then your CAB or change process is broken. Now, in all IT “rules of thumb” there is an exception. In this case, I am truly struggling to think of one. My feeling is that if your change process treats an OS upgrade the same as adding a hint to a report, it is not fit for purpose.

Those are my main issue with CABs. They should be of significant business importance, but nearly always they are implemented with one process to deal with all situations and then get taken over by people with an “Office Politics” agenda as opposed to a “Getting the best job we can reasonably expect done” agenda.

I’m very passionate about this and I have a way I hope can throw this issue into context, an analogy.

Ask yourself this senario.
You go to your doctor with a niggly cough you have had for a week OR you go to your doctor because you almost passed out each day you got out of bed for the last three days.
If your doctor treated you the same for both sets of symptoms, would you be happy with that doctor?

Why are all IT changes handled by most CABs in exactly the same way?

(BTW if you ever almost collapse when you get out of the bed in the morning, do NOT go to work, go instead to your doctor and ask them for a full medical and if he/she does not take blood pressure readings and order a full blood chemisty test, go find a new doctor.)

Making Things Better Makes Things Worse February 11, 2010

Posted by mwidlake in development, Management, Perceptions.
Tags: , ,
12 comments

This could be a Friday Philosophy but I’ve got a couple of those lined up already. Anyway, I am suffering at work at the moment. I’m encountering a phenomenon that I have talked about with Dennis Adams a couple of times. It probably has a proper term, but basically it is the odd situation that when you make things better, you get more complaints. {Dennis, do you know the proper term?}

{Update. Dennis was good enough to link to this paper he wrote on customer feedback}

Anyway, Let me explain. You have an application sitting on a database. The screens are slow, the reports take an age to come out, your might even have considerable system instability and unplanned outages. The users are not unhappy. They were unhappy last year. Now they are just cynical and they just expect the system to be slow, unresponsive, flaky. So they do not report any problems.

Then things change. The system gets some much-needed care and attention and now the slowest reports get tuned up, the screens come back faster and less spontaneous department-wide coffee breaks are caused by the system crashing. Everything gets better. But not for the help desk, now they start getting calls. “This report is too slow”. “Why can’t I jump straight from the customer screen to the pending orders screen?”. This happens because the users now realise that something can be done. There is a point in complaining as there is a chance their piece of misery could be made better. I certainly went through this many years ago when I inherited a system that crashed every week. No one mentioned it, they just went to tea and complained about it. The first time it crashed after I arrived I could not believe that no one had called before I had realised it had died. Once it had been up solidly for a couple of months, then when it crashed boy did we hear about it!

Also, when you improve a system and things generally get better, occasionally something will improve and then fall back a bit. Hardly anyone says “thanks” for the initial improvement but they will say something if it improves and then drops back.

That is what is happening for my main client at the moment. The system was not that bad, but it needed some help. Why else would I be there? I’ve been beavering away with the rest of the team and we are making things better, so far mostly at an underlying “getting the overall system straight” level. A new chap has joined to help concentrate on performance and he is really making inroads into specific things, individual reports and processes that need a good sorting out.

So now that things are getting better and performance is generally improving, anything that is still slow is being brought up by the development and support teams. Also, we’ve made a few things slower (I’m sorry, it just happens like that) and they certainly get mentioned.

So, I’m busy. And I could get annoyed at people asking why X is slower when Y and Z are faster. But I don’t, because Dennis explained this counter intuitive theory to me.

I know things are getting better as people are annoyed as opposed to apathetic:-)

Turning on SQL Audit February 2, 2010

Posted by mwidlake in development, performance, Testing.
Tags: , , ,
6 comments

<Previous post…

I want to test out using the SQL AUDIT functionality.

The first thing you have to do (having read up on it a bit) is to enable it for the database. You do this by setting the initialization parameter AUDIT_TRAIL. On version 10.2 you have the options to write the audit entires to:

  •  The DB, by setting it to DB or DB_EXTENDED {not DB,EXTENDED as the manual says, that is the old format}. This puts the entries into the table  SYS.AUD$
  • The operating system, by setting it to OS, XML or XML_EXTENDED. If you set it to XML or XML_EXTENDED then the data is written out in XML format. You also optionally set AUDIT_FILE_DEST to where you want to data to be written to.

Writing the audit trail to the OS is potentially more secure, as you can stop those cunning and devious DBAs messing with the audit trail. {I’m not so sure that this really helps that much – if anyone knows of any DBAs caught out being naughty solely as a result of using the OS to store the SQL AUDIT records, it would be a fascinating comment}

I want to write to the DB as I want to be able to get at the audit data easily and I am not sure how I want to interrogate it. I’m faster with SQL than SED and AWK.

I also decided up front I wanted to use DB_EXTENDED so that the triggering SQL statement and all bind variables are caught, so I can see more about what it triggering the audit record. I am cautious of the impact of storing CLOBs though, which these two values are stored as. I’ve had performance issues moving lots of CLOBS around and I know from some old colleagues that Secure Files are a lot faster. If Secure Files are faster, that means CLOBs are slower:-). If the audit trail seems to add too much burden on my system, swapping back to just DB will be my first step.

Now for the bad news. You can’t just turn on AUDIT. That initialization parameter is not dynamic. You can’t even enable it for your session. It will need a restart of your database.

This tells me something. Oracle needs to do some setting up for SQL AUDIT when it starts the instance. Either start a new process, enable functionality in one of it’s regular processes or set up structures in memory to cope. Or a mixture thereof. I strongly suspect the need for memory structures {but this is only because, in reality, I have done some testing and I am writing this up afterwards}.

I should not really need to say this but DON’T go turning this on for a production system without extensive testing somewhere else first. There is not a lot “Out There” about the details of the performance impact of AUDIT but the general opinion is there is some; and that is reasonable given it is going to write database records for every action audited. Also, you have no idea yet of any knock-on effects. You know, things you did not expect that causes your database to lock or crash and you to get fired.

{Question, what happens if you alter the initialization file and restart only one node of a RAC database? I don’t know and so I should test that. My current test system is not RAC, but the final destination for this stuff is RAC}.

You probably also want to check that no one has gone and tried turning on SQL AUDIT on things already. You never know if someone else decided to have a play with this and issued a load of AUDIT statements only to find nothing happened – and left what they did in place “as nothing happened”. I already know of one example of this happening…

Here is a little script I knocked up to see what is currently set to be audited:

-- what_is_audited.sql
-- Martin Widlake 11/01/10
-- simple listing of what auditing is currently set
set pages 100
set pause on
spool what_is_audited.lst
select * from dba_priv_audit_opts
order by user_name,privilege
/
select * from sys.dba_stmt_audit_opts
order by user_name,audit_option
/
select * from DBA_OBJ_AUDIT_OPTS
order by owner,object_name
/
spool off
clear col
--
-- EOF
--

And some sample output. I’m not going to explain it in this post, but you can have a look though it.

DEV3&gt; @what_is_audited
USER_NAME                      PROXY_NAME
------------------------------ ------------------------------
PRIVILEGE                                SUCCESS    FAILURE
---------------------------------------- ---------- ----------
MDW1
ALTER SYSTEM                             BY ACCESS  BY ACCESS
MDW1
AUDIT SYSTEM                             BY ACCESS  BY ACCESS
MDW1
CREATE SESSION                           BY ACCESS  BY ACCESS

ALTER SYSTEM                             BY ACCESS  BY ACCESS

AUDIT SYSTEM                             BY ACCESS  BY ACCESS

CREATE SESSION                           BY ACCESS  BY ACCESS

6 rows selected.

USER_NAME                      PROXY_NAME
------------------------------ ------------------------------
AUDIT_OPTION                             SUCCESS    FAILURE
---------------------------------------- ---------- ----------
MDW1
ALTER SYSTEM                             BY ACCESS  BY ACCESS
MDW1
CLUSTER                                  BY ACCESS  BY ACCESS
MDW1
CONTEXT                                  BY ACCESS  BY ACCESS
MDW1
CREATE SESSION                           BY ACCESS  BY ACCESS
MDW1
DATABASE LINK                            BY ACCESS  BY ACCESS
MDW1
DELETE TABLE                             BY ACCESS  BY ACCESS
MDW1
...
TYPE                                     BY ACCESS  BY ACCESS
MDW1
UPDATE TABLE                             BY ACCESS  BY ACCESS
MDW1
USER                                     BY ACCESS  BY ACCESS
MDW1
VIEW                                     BY ACCESS  BY ACCESS

ALTER SYSTEM                             BY ACCESS  BY ACCESS

CLUSTER                                  BY ACCESS  BY ACCESS

CONTEXT                                  BY ACCESS  BY ACCESS

CREATE SESSION                           BY ACCESS  BY ACCESS

DATABASE LINK                            BY ACCESS  BY ACCESS

DIMENSION                                BY ACCESS  BY ACCESS

DIRECTORY                                BY ACCESS  BY ACCESS

INDEX                                    BY ACCESS  BY ACCESS

MATERIALIZED VIEW                        BY ACCESS  BY ACCESS
...
USER                                     BY ACCESS  BY ACCESS

VIEW                                     BY ACCESS  BY ACCESS

56 rows selected.

OWNER                          OBJECT_NAME                    OBJECT_TYPE
------------------------------ ------------------------------ --------------
ALT     AUD     COM     DEL     GRA     IND     INS     LOC     REN     SEL
------- ------- ------- ------- ------- ------- ------- ------- ------- ----
UPD     REF EXE     CRE     REA     WRI     FBK
------- --- ------- ------- ------- ------- -------
MWPERF                         FORN_M_SEQ                     SEQUENCE
-/-     -/-     -/-     -/-     -/-     -/-     -/-     -/-     -/-     A/A
-/-     -/- -/-     -/-     -/-     -/-     -/-
MWPERF                         PERSON                         TABLE
A/A     A/A     A/A     A/A     A/A     A/A     A/A     A/A     A/A     A/A
A/A     -/- -/-     -/-     -/-     -/-     A/A
MWPERF                         ROAD_TYPE                      TABLE
-/-     -/-     -/-     A/A     -/-     -/-     A/A     -/-     -/-     A/A
A/A     -/- -/-     -/-     -/-     -/-     -/-

If you discover you have a lot of things set to be audited, ESPECIALLY if they are auditing select access, think about turning some or all of it off before you enable AUDITING by setting that initialization parameter.

Once you have turned on the feature, you can start testing it…

Accessing Roles in stored PL/SQL October 22, 2009

Posted by mwidlake in development, internals.
Tags: , ,
7 comments

Whilst looking for the minimum privileges I needed to execute DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO {it is ANALYZE ANY, by the way} I discovered something about PL/SQL and roles that I did not know. Now, any right I had to claim expertise in PL/SQL expired at least 8 years ago but I asked some friends who ARE still professional PL/SQL experts and they did not know this either.

Privileges granted via Roles to a user are not available to stored PL/SQL created by that user, correct? This is widely known and understood. You have to grant priveleges directly to the user for them to be seen in the PL/SQL packages, functions etc.

Having found that I needed the ANALYZE ANY privilege as I mentioned above, I asked the DBA team to grant my user that privilege on Production and Staging. They did so – via a role. “it won’t work” I said “I run the code via a package, it won’t see the privilege” and proved it by running DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO via a quickly constructed demo. Except I only proved my ignorance, it worked. WHY?

If a package is created such that it is executed with invokers rights then roles are seen

This is my test script:

create or replace package test1 is
procedure run_flush;
end test1;
/
--
create or replace package test1 authid current_user is
procedure run_flush is
cursor get_ses_roles is
select role
from session_roles;
begin
  dbms_output.put_line('starting');
  for ses_roles_rec in get_ses_roles loop
    dbms_output.put_line(ses_roles_rec.role);
  end loop;
  dbms_output.put_line('flushing');
  dbms_stats.flush_database_monitoring_info;
  dbms_output.put_line('ending');
end;
begin
  null;
end;
/

I create this package as user MDW.

Now as a privileged user I create a role and grant analyze_any to the role.

MGR>create role mdw_role
Role created.
MGR>grant analyze any to mdw_role;
Grant succeeded.

I’ll just prove that user MDW cannot yet execute the monitoring procedure

MDW> exec dbms_stats.flush_database_monitoring_info
BEGIN dbms_stats.flush_database_monitoring_info; END;
*
ERROR at line 1:
ORA-20000: Insufficient privileges
ORA-06512: at "SYS.DBMS_STATS", line 2148
ORA-06512: at "SYS.DBMS_STATS", line 14135
ORA-06512: at line 1

Now I grant the role

MGR>grant mdw_role to mdw
Grant succeeded.

MDW has to log out and back in again to see the role correctly. Having done this I check for the role and then try to execute the test procedure:

MDW> select * from session_roles

ROLE
------------------------------
CONNECT
MDW_ROLE
2 rows selected.

MDW> exec test1.run_flush
starting
CONNECT
MDW_ROLE
flushing
ending

PL/SQL procedure successfully completed.

You can see that the package sees the roles and it executes the procedure successfully. So, stored PL/SQL can utilise privileges via roles if the packages is created with authid current_user, ie executors rights.

I better admit, as someone else might raise it, that this is not the best demonstration of this feature. I recreated the package with the first line set to:

create or replace package test1 is

ie the default of owners privileges. I now re-execute the call to the package:-

MDW> exec test1.run_flush
starting
flushing
ending

PL/SQL procedure successfully completed.

Note that the roles are no longer seen. However, the DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO still executed correctly.
Why?
Well, because if you look at the package specification of DBMS_STATS you see:-

create or replace package sys.dbms_stats authid current_user is

It makes sense. It is dangerous for system-owned packages to be executing as the owner, ie SYS, as execute access to the package would allow access to SYS privileges.

Which, of course, is why my little proof script executed the flush correctly and I looked a bit silly in front of the DBA team. Oh well.

Follow

Get every new post delivered to your Inbox.

Join 221 other followers