IOT Part 5 – Primary Key Drawback – and Workaround | Martin Widlake's Yet Another Oracle Blog

IOT Part 5 – Primary Key Drawback – and Workaround August 17, 2011

Posted by mwidlake in Architecture, development, performance, Uncategorized.
Tags: Architecture, design, index organized tables, IOT, performance
trackback

<..IOT1 – the basics
<….IOT2 – Examples and proofs
<……IOT3 – Significantly reducing IO
<……..IOT4 – Boosting Buffer Cache efficiency
……….>IOT6a – Slowing Down Insert
…………>IOT6(B) – OLTP Inserts

One of the drawbacks of IOTs is that they have to be organised by the primary key of the table. If your table does not have a primary key, it cannot be Index Organized.

I would argue that any table that holds persistent data (ie it is not transient data about to be loaded into the database proper or a temporary working set) should have a Primary Key. If I am working on a system and come across a table without a Primary Key I immediately challenge it. {There are occasional, valid reasons for a persistent table to lack a PK, but I confess I am struggling right now to come up with one – but I digress}. I’m a big fan of database-enforced referential integrity.

The problem is, if you you are making a table into an Index Organized Table so that the records are clustered to match how you process the data, it could well be that the primary key is not related to how you want to order the data. Let me give you an example. {Oh, and for brevity, I’ll put the SQL statements to create the examples at the end of this post}.

mdw11> desc ACCOUNT
 Name                                                  Null?    Type
 ----------------------------------------------------- -------- ----------------------
 ACCO_TYPE                                             NOT NULL NUMBER(2)  ---PKK
 ACCO_ID                                               NOT NULL NUMBER(10) ---PK
 NAME                                                  NOT NULL VARCHAR2(100)
 DATE_1                                                NOT NULL DATE
 NUM_1                                                          NUMBER(2)
 NUM_2                                                          NUMBER(2)

mdw11> desc TRANSACTION_HEAP
 Name                                                  Null?    Type
 ----------------------------------------------------- -------- ----------------------
 TRAN_TYPE                                             NOT NULL NUMBER(2)  ---PK
 TRAN_ID                                               NOT NULL NUMBER(10) ---PK
 ACCO_TYPE                                             NOT NULL NUMBER(2)
 ACCO_ID                                               NOT NULL NUMBER(10)
 CRE_DATE                                              NOT NULL DATE
 VC_1                                                  NOT NULL VARCHAR2(100)
 DATE_1                                                         DATE
 NUM_1                                                          NUMBER(2)
 NUM_2                                                          NUMBER(2)

This is a classic parent-child relationship, each account has a set of transactions. I’ve expanded on my prior example by:

changing the parent to be called ACCOUNT and giving it a two-part Primary Key, ACCO_TYPE and ACCO_ID.
Changing the child to be called TRANSACTION and given it a Primary Key of TRAN_TYPE and TRAN_ID.
In a real system I would create a foreign key from TRANSACTION.ACCO_TYPE,ACCO_ID to the ACCOUNT table primary key.

Note that the Primary Key on the TRANSACTION table is NOT based on the account columns. Maybe in theory the primary key on the transaction table would be the account columns and the cre_date – if the cre_date held a datetime AND two records could not be created on the same second. If we used a timestamp then you might be able to argue no record would be created in the same fraction of a second – except that often transactions get given a fixed time. Midnight springs to mind (consider when you would add the accrued interest on a savings account). So, a new surrogate Primary Key is intoduced, a transaction type and ID. TRAN_TYPE and TRAN_ID are the primary key of the TRANSACTION table.

I’d say that I see such two-part primary keys more often then single column primary keys these days. Possibly because so many databases receive information from other systems or even applications on the same database.

As before, I create 10,000 parent records (ACCOUNT) and 10,000 random child records (TRANSACTION_HEAP) each day for 100 days.

Also as before, I want to select information grouped by account. I want all the transactions for an account, not all transactions on a day or for a range of transaction IDs. Hopefully this is a scenario most of you will recognise.

Selecting a sum of one of the non-indexed columns and a count of records for a given account takes quite a bit of effort on the part of the HEAP table:

select sum(num_1), count(*) from transaction_heap th where acco_type=10 and acco_id=123

SUM(NUM_1)   COUNT(*)
---------- ----------
      1201        116
Elapsed: 00:00:02.68

Execution Plan
---------------------------------------------------------------------------------------
| Id  | Operation          | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                  |     1 |    10 |  3466   (1)| 00:00:52 |
|   1 |  SORT AGGREGATE    |                  |     1 |    10 |            |          |
|*  2 |   TABLE ACCESS FULL| TRANSACTION_HEAP |   100 |  1000 |  3466   (1)| 00:00:52 |
---------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      13929  consistent gets
      13921  physical reads

Of course, it has to do a full table scan as my Primary Key is on two columns that have nothing to do with the query. I can repeat this statement as often as I like, it takes the same number of physical reads and consistent gets as it is not caching the information.

I add an index on the ACCO_TYPE, ACCO_ID and CRE_DATE columns and re-run the query:

select sum(num_1),count(*) from transaction_heap th where acco_type=10 and acco_id=123

SUM(NUM_1)   COUNT(*)
---------- ----------
      1201        116
Elapsed: 00:00:00.01

Execution Plan
---------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                    |     1 |    10 |   103   (0)| 00:00:02 |
|   1 |  SORT AGGREGATE              |                    |     1 |    10 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| TRANSACTION_HEAP   |   100 |  1000 |   103   (0)| 00:00:02 |
|*  3 |    INDEX RANGE SCAN          | TRHE_ACCO_CRDA_IDX |   100 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        120  consistent gets
          0  physical reads

I ran it twice to get rid of the parse overhead, but the first time it did a load of physical reads to support those 120 consistent gets.

I could recreate the TRANSACTION_HEAP table as an IOT of course – but it will be organized by the TRAN_TYPE and TRAN_ID columns. That is useless to me. Even if I add a secondary index on the ACCO_TYPE, ACCO_ID and CRE_DATE columns it will at best be no better than the above HEAP table and, because the secondary index will hold rowid guesses and will sometimes have to use the primary key information to walk down the index, it will be worse. {I am not sure I have explained that bit yet about row guesses. Post 6?}

So, if you want the information organized in an order that is not helped by the Primary Key of the table, an IOT is useless to you. You cannot achieve that physical record grouping by the IOT method.

I am going to do something else though. I’m going to sort of change the rules to work around the issue.

As far as the physical implementation is concerned, a Primary Key is in effect just a unique index and two rules. The rules are that all the columns in the Primary Key must be mandatory and there can only be one PK on a table. I can have as many unique indexes as I like, so long as the key combinations lead to no duplicate rows. I can alter my Primary Key – it is not set in stone.

Before I go any further I am going to stress that I am about to abuse the concept of the Primary Key. I’d need to do a seperate blog to fully justify saying what a Primary Key is, but part of the concept is that no column must be derivable from other columns in the PK and it must be the minimum number of columns required to make the key unique.

We want to group the data by the account columns and the creation date. So let’s define a Primary Key that is ACCO_TYPE, ACCO_ID, CRE_DATE and whatever else we need to guarantee the key is unique. In our case that would be TRAN_TYPE and TRAN_ID – the current Primary Key! If I knew I would always want all records for the account, I could drop the CRE_DATE out of my fake Primary Key, but I know that the creation date is very often important. You may want activity for the last month, last quarter, a stated date or even an exact datetime. For all those cases, including the CRE_DATE column is highly beneficial.

So, I create TRANSACTION_IOT below and populate it with data.

desc transaction_iot
 Name                                                        Null?    Type
 ----------------------------------------------------------- -------- --------------
 TRAN_TYPE                                                   NOT NULL NUMBER(2)
 TRAN_ID                                                     NOT NULL NUMBER(10)
 ACCO_TYPE                                                   NOT NULL NUMBER(2)
 ACCO_ID                                                     NOT NULL NUMBER(10)
 CRE_DATE                                                    NOT NULL DATE
 VC_1                                                        NOT NULL VARCHAR2(100)
 DATE_1                                                               DATE
 NUM_1                                                                NUMBER(2)
 NUM_2                                                                NUMBER(2)

--
--

OWNER    TABLE_NAME          NUM_ROWS      BLOCKS AVG_L GLS ULS LST_ANL      PRT  SAMP_SIZE
-------- -------------- ------------- ----------- ----- --- --- ------------ --- ----------
MDW      TRANSACTION_IO      1000,000                94 YES NO  160811 23:05 NO     1000000
         T
INDEX_NAME      TYP PRT UNQ BL     L_BLKS   DIST_KEYS       CLUSTF     LB_KEY     DB_KEY LST_ANL
--------------- --- --- --- -- ---------- ----------- ------------ ---------- ---------- ------------
TRIO_PK         IOT NO  UNI  2     21,433    1058,381            0          1          1 160811 23:05
TRIO_TRAN_UQ    NOR NO  UNI  2      4,386    1000,000      999,405          1          1 160811 23:05

INDEX_NAME                   TABLE_NAME       PSN COL_NAME
---------------------------- ---------------- --- ------------------------------------------------
TRIO_PK                      TRANSACTION_IOT  1   ACCO_TYPE
TRIO_PK                      TRANSACTION_IOT  2   ACCO_ID
TRIO_PK                      TRANSACTION_IOT  3   CRE_DATE
TRIO_PK                      TRANSACTION_IOT  4   TRAN_TYPE
TRIO_PK                      TRANSACTION_IOT  5   TRAN_ID
TRIO_TRAN_UQ                 TRANSACTION_IOT  1   TRAN_TYPE
TRIO_TRAN_UQ                 TRANSACTION_IOT  2   TRAN_ID

Now let’s select our data from that IOT.

select sum(num_1),count(*) from transaction_IOT th where acco_type=10 and acco_id=123

SUM(NUM_1)   COUNT(*)
---------- ----------
      1030         97
Elapsed: 00:00:00.00

Execution Plan
-----------------------------------------------------------------------------
| Id  | Operation         | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |         |     1 |    10 |     5   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |         |     1 |    10 |            |          |
|*  2 |   INDEX RANGE SCAN| TRIO_PK |   100 |  1000 |     5   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads

5 consistent gets. It has walked down the IOT and scanned 3 blocks to collect that data. Our IOT based on an abused Primary Key does the job of supporting range scans efficiently, with the benefits to the Block Buffer Cache I refered to in IOT4

That “Primary Key” I created is NOT a real Primary key. It is not the minimum number of columns I need to uniquely identify a column. My Primary key is on ACCO_TYPE, ACCO_ID, CRE_DATE,TRAN_TYPE and TRAN_ID – the account, the datetime of the transaction and the transaction. What if I was to alter the datetime by a second? I could create a record with the same account, the same transaction_id as an existing record but a second into the future. That is just wrong. After all, the whole point of the TRAN_TYPE and TRAN_ID is to uniquely identify a record. If created the new record I stated above, there would be two records for the one TRAN_TYPE/TRAN_ID.

I protect against this ability to create incorrect records by creating a UNIQUE KEY against the table also, against columns TRAN_TYPE and TRAN_ID. This is unique index TRIO_TRAN_UQ as displayed in the information above. A Primary Key is usually the referenced parent of any referential integrity, ie foreign keys, between this table and any children. However, a Unique Key can also be the target of Referential Integrity. I cannot create a record in TRANSACTION_IOT with the same TRAN_TYPE/TRAN_ID as already exists due to this unique constraint:

insert into transaction_iot_p
values
(2,163 -- existing transaction type and id
,10,11111
,sysdate,'ASCAFWEWEHGWSHERJH',SYSDATE,7,7)
/

insert into transaction_iot_p
*
ERROR at line 1:
ORA-00001: unique constraint (MDW.TIP_TRAN_UQ) violated

Elapsed: 00:00:00.34

So, I have my IOT to support querying code and I have my Unique Constraint to police my original Primary Key and be used as the target for any Foreign Key requirements I might need. This is not a perfect solution – the design will look a little strange to anyone who looks at this database and the Unique Key is supported by a secondary index on an IOT which can have some issues. But it does work.

My “primary key” is no longer a true Primary Key. It is just a tool for allowing me to organise the data physically in a way that will support my application. That is what I meant about changing the rules.

I am willing to abuse a Primary Key in this way because of the performance benefits. It is a solution for a system where most of the query access is against a set of records which would be scatter-gunned across a table if you did not use some sort of physical grouping. If you are reading this and thinking “oh, I am not sure about you doing that to a Primary Key Martin” then you are probably OK to consider this solution. If you can’t see a problem with it then you are either very used to turning off referential integrity and understand the consequences – or you simply do not understand what RI does for your database. If you are in the latter camp, do not even consider doing this. If you are one of those people who works on data warehouse and for whom is it just part of the DW process to turn off RI as that is what you do for data warehouses – DON’T do this!

OK, I’m nearly at the end of this topic but I want to touch on partitioning. You can range partitition an Index Organized Table from 9i I think. It is certainly supported in Oracle 10 upwards. Partitioning is important in this technique because a unique index must contain the partition key if the index is to be locally partitioned – otherwise the index must be global, ie the one index object references all the partitions across the table.

Below is my table creation statement for the IOT organized by the account, creation date and transaction. The table is ranged partitioned by CRE_DATE, into months.

create table transaction_IOT_P
(tran_type number(2)     not null
,tran_id   number(10)    not null
,acco_type number(2)     not null
,acco_id   number(10)    not null
,cre_date  date          not null
,vc_1      varchar2(100) not null
,date_1    date
,num_1     number(2)
,num_2     number(2)
,constraint tip_pk primary key(ACCO_TYPE,ACCO_ID,CRE_DATE,TRAN_TYPE,TRAN_ID) 
--  using index tablespace index_01
,constraint tip_tran_uq unique (TRAN_TYPE,TRAN_ID)
  using index tablespace index_01
)
organization index
tablespace data_01
partition by range  (cre_date)
(partition rm20110601 values less than (to_date('01-06-2011','DD-MM-YYYY')) 
  tablespace data_01
,partition rm20110701 values less than (to_date('01-07-2011','DD-MM-YYYY'))
  tablespace data_01
,partition rm20110801 values less than (to_date('01-08-2011','DD-MM-YYYY'))
  tablespace data_01
,PARTITION RMTOP  VALUES LESS THAN (MAXVALUE)
  tablespace USERS
)
/

You can see the definition of my fake Primary Key and the fact that it does not have a tablespace defined for it – as the ‘organization index’ statement lower down causes the table to be an IOT and the segment will go into the “table” tablespace.
I then state my Unique Index to police the integrity of my table – TIP_TRAN_UQ
I then state the partition clause, ‘partition by range (cre_date)’ followed by my initial partition definitions. It’s as simple as that to partition an IOT.

What gets created? A set of four segments for the IOT, which are primary key index segments of course, not table segments:

@seg_dets
Enter value for seg_name: tip_pk
Enter value for owner: mdw

OWNER    SEG_NAME        SEG TS_NAME     BYTES_K    BLOCKS exts   INI_K   NXT_K
-------- --------------- --- -------- ---------- --------- ---- ------- -------
MDW      TIP_PK RM201106 IP  DATA_01      45,056     5,632   59      64    1024
         01
MDW      TIP_PK RM201107 IP  DATA_01      60,416     7,552   74      64    1024
         01
MDW      TIP_PK RM201108 IP  DATA_01      61,440     7,680   75      64    1024
         01
MDW      TIP_PK RMTOP    IP  USERS        34,816     4,352   49      64    1024

Note that the SEG (type) is “IP” – my script decodes the type into a short mnemonic and IP is Index Partition. You can see the tablespaces those segments are in and the size of the segments. What about that unique index I created?

@seg_dets
Enter value for seg_name: tip_tran_uq
Enter value for owner: mdw

OWNER    SEG_NAME        SEG TS_NAME     BYTES_K    BLOCKS exts   INI_K   NXT_K
-------- --------------- --- -------- ---------- --------- ---- ------- -------
MDW      TIP_TRAN_UQ     IND INDEX_01     35,840     4,480   50      64    1024

It is a single segment, a normal index. I cannot have it as a locally partitioned index as it is a unique index and lacks the partitioning key in it’s definition.

This could be a problem. The usual reason you partition a table is because it is too large to comfortably be held as a single segment {and also for the benefit of partition exclusion, but you don’t usually need that on small tables!}. This means that the global index to support that primary key is going to be large. Now, I made a “mistake” when I created my partitioned IOT – I did not create a partition for this month, some data has gone into the MAXVALUE partition (see the size of the segment above, 34K and 49 extents). If I split that last partition to create a new partition for this month and a new MAXVALUE partition, I will invalidate the global index and I will have to rebuild it. Very large indexes can take a long time and a heck of a lot of temporary space to gather and sort the data. That could be an ongoing maintenance nightmare.

In a recent implementation I did using IOTs I did not create a global unique index to replace the original foreign key. I create a non-unique, locally partitioned index to support some queries using those columns and the table had no children so no Foreign Keys were needed. But there was something else I needed to do as I had removed the referential integrity rules for that table. Remember I sad I am a fan of database enforced referential integrity? Now I “know” the application will not create data that will break the removed Primary Key rule, I “know” I documented what I had done. And I know that in 12 months time there will almost certainly be data that will have duplicate values for that Primary Key if it is not enforced somehow, because it always happends. I need to implement a little script to regularly check for duplicate TRAN_TYPE/TRAN_ID conmbinations being created. If you remove RI from a relational database, you should replace it in some way. Otherwise, you will pretty soon have a non-relational database.

That’s it for this topic. The below is my example script for creating most of the above, in case anyone wants it or wants to verify what I have said.

-- test_iot2.sql
-- create test tables to show how you can work around the PK issue and
-- partition an IOt - and the possible impact on my PK workaround.
spool test_iot2.lst
--
set feed on timi on pause off
--
drop table account purge;
drop table transaction_heap purge;
drop table transaction_iot purge;
drop table transaction_iot_p purge;
--
-- create 10,000 parent records
create table mdw.account
(ACCO_type  number(2)     not null 
,ACCO_id       number(10)    not null 
,name     varchar2(100) not null
,date_1   date          not null
,num_1    number(2)
,num_2    number(2) 
,constraint ACCO_pk primary key(ACCO_type,ACCO_id) 
 using index tablespace index_01
)
tablespace data_01
/
insert into account
select 10 
,rownum
,dbms_random.string('U',mod(rownum,10)+50)
,sysdate-(mod(rownum,500)+1000)
,mod(rownum,99)+1
,trunc(dbms_random.value(0,100))
from dual connect by level <= 5000
/
insert into account
select 15 
,rownum
,dbms_random.string('U',mod(rownum,10)+50)
,sysdate-(mod(rownum,500)+1000)
,mod(rownum,99)+1
,trunc(dbms_random.value(0,100))
from dual connect by level <= 5000
/
--
-- create the table to hold the children as a heap table
create table transaction_heap
(tran_type number(2)     not null
,tran_id   number(10)    not null
,ACCO_type number(2)     not null
,ACCO_id   number(10)    not null
,cre_date  date          not null
,vc_1      varchar2(100) not null
,date_1    date
,num_1     number(2)
,num_2     number(2)
,constraint trhe_pk primary key(tran_type,tran_id) 
 using index tablespace index_01
)
tablespace data_01
/
--
create index trhe_ACCO_crda_idx
on transaction_heap(ACCO_type,ACCO_id,cre_date)
tablespace index_01
/
-- populate the Heap table
-- 100 days, 10000 people
declare
v_num number :=10000; -- number of people
v_str varchar2(60);
begin
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
for i in 1..100 loop --days to do
  v_str:=dbms_random.string('U',60);
  insert into transaction_heap
    (tran_type,tran_id,ACCO_type,ACCO_id,cre_date,vc_1,date_1,num_1,num_2)
  select mod(rownum,3)+1 
   ,((i-1)*v_num)+rownum
   , 5+(trunc(dbms_random.value(1,3))*5)
   ,trunc(dbms_random.value(1,v_num/2))
   ,sysdate-(100-i) + (rownum/(60*60*24) )
   ,substr(v_str,1,51+mod(rownum,10))
   ,sysdate-(100-i) + ((mod(rownum,30)+1)/3)
   ,mod(rownum,20)+1
   ,mod(rownum,99)+1
  from dual connect by level <=v_num;
end loop;
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
end;
/
--
--
--
create table transaction_IOT
(tran_type number(2)     not null
,tran_id   number(10)    not null
,ACCO_type number(2)     not null
,ACCO_id   number(10)    not null
,cre_date  date          not null
,vc_1      varchar2(100) not null
,date_1    date
,num_1     number(2)
,num_2     number(2)
,constraint trio_pk primary key(ACCO_type,ACCO_id,cre_date,tran_type,tran_id) 
--  using index tablespace index_01
,constraint trio_tran_uq unique (tran_type,tran_id)
  using index tablespace index_01
)
organization index
tablespace data_01
/
--
-- populate the IOT table
-- 100 days, 10000 people
declare
v_num number :=10000; -- number of people
v_str varchar2(60);
begin
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
for i in 1..100 loop --days to do
  v_str:=dbms_random.string('U',60);
  insert into transaction_IOT
    (tran_type,tran_id,ACCO_type,ACCO_id,cre_date,vc_1,date_1,num_1,num_2)
  select mod(rownum,3)+1 
   ,((i-1)*v_num)+rownum
   , 5+(trunc(dbms_random.value(1,3))*5)
   ,trunc(dbms_random.value(1,v_num/2))
   ,sysdate-(100-i) + (rownum/(60*60*24) )
   ,substr(v_str,1,51+mod(rownum,10))
   ,sysdate-(100-i) + ((mod(rownum,30)+1)/3)
   ,mod(rownum,20)+1
   ,mod(rownum,99)+1
  from dual connect by level <=v_num;
end loop;
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
end;
/
create table transaction_IOT_P
(tran_type number(2)     not null
,tran_id   number(10)    not null
,ACCO_type number(2)     not null
,ACCO_id   number(10)    not null
,cre_date  date          not null
,vc_1      varchar2(100) not null
,date_1    date
,num_1     number(2)
,num_2     number(2)
,constraint tip_pk primary key(ACCO_type,ACCO_id,cre_date,tran_type,tran_id) 
--  using index tablespace index_01
,constraint tip_tran_uq unique (tran_type,tran_id)
  using index tablespace index_01
)
organization index
tablespace data_01
partition by range  (cre_date)
(partition rm20110601 values less than (to_date('01-06-2011','DD-MM-YYYY')) 
  tablespace data_01
,partition rm20110701 values less than (to_date('01-07-2011','DD-MM-YYYY'))
  tablespace data_01
,partition rm20110801 values less than (to_date('01-08-2011','DD-MM-YYYY'))
  tablespace data_01
,PARTITION RMTOP  VALUES LESS THAN (MAXVALUE)
  tablespace USERS
)
/
-- populate the IOT_P table
-- 100 days, 10000 people
declare
v_num number :=10000; -- number of people
v_str varchar2(60);
begin
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
for i in 1..100 loop --days to do
  v_str:=dbms_random.string('U',60);
  insert into transaction_IOT_P
    (tran_type,tran_id,ACCO_type,ACCO_id,cre_date,vc_1,date_1,num_1,num_2)
  select mod(rownum,3)+1 
   ,((i-1)*v_num)+rownum
   , 5+(trunc(dbms_random.value(1,3))*5)
   ,trunc(dbms_random.value(1,v_num/2))
   ,sysdate-(100-i) + (rownum/(60*60*24) )
   ,substr(v_str,1,51+mod(rownum,10))
   ,sysdate-(100-i) + ((mod(rownum,30)+1)/3)
   ,mod(rownum,20)+1
   ,mod(rownum,99)+1
  from dual connect by level <=v_num;
end loop;
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
end;
/
commit;
--
exec dbms_stats.gather_table_stats(ownname=>USER,tabname=>'ACCOUNT')
exec dbms_stats.gather_table_stats(ownname=>USER,tabname=>'TRANSACTION_HEAP')
exec dbms_stats.gather_table_stats(ownname=>USER,tabname=>'TRANSACTION_IOT')
exec dbms_stats.gather_table_stats(ownname=>USER,tabname=>'TRANSACTION_IOT_P')
--
select * from transaction_iot_p
where rownum < 10
/
insert into transaction_iot_p
values
(2,163 -- existing transaction type and id
,1,11111
,sysdate,'ASCAFWEWEHGWSHERJH',SYSDATE,7,7)
/
insert into transaction_iot_p
values
(3,163 -- new transaction type and id
,1,11111 -- but the whole of the rest of the record is the same.
,sysdate,'ASCAFWEWEHGWSHERJH',SYSDATE,7,7)
/
--
BEGIN
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
END;
/
--
spool off

Comments»

1. Index Organized Tables – the Basics. « Martin Widlake's Yet Another Oracle Blog - August 17, 2011: […] IOT Part 5 – Primary Key Drawback – and Workaround « Martin Widlake's Yet Ano… – August 17, […]

Reply
2. IOT 2 – First examples and proofs « Martin Widlake's Yet Another Oracle Blog - August 17, 2011: […] IOT Part 5 – Primary Key Drawback – and Workaround « Martin Widlake's Yet Ano… – August 17, […]

Reply
3. IOT part 3 – Significantly Reducing IO « Martin Widlake's Yet Another Oracle Blog - August 17, 2011: […] IOT Part 5 – Primary Key Drawback – and Workaround « Martin Widlake's Yet Ano… – August 17, […]

Reply
4. IOT Part 4 – Greatly Boosting Buffer Cache Efficiency « Martin Widlake's Yet Another Oracle Blog - August 17, 2011: […] <..IOT1 – the basics <….IOT2 – Examples and proofs <……IOT3 – Significantly reducing IO IOT5 – Primary Key issues..> […]

Reply
5. Dom Brooks - August 17, 2011: > I’d say that I see such two-part primary keys more often then single column primary keys these days
Oh really? And where do you see these? On an ERD? I shall suspend my disbelief…
I just checked a certain system – 1000 tables, 180 primary keys, 50 foreign keys. – I was surprised it was that high but it’s an old system. These things were still en vogue back then.

No wonder IOTs are so hard to find in the wild.

Constraints, referential integrity, design – these are so yesteryear, so unagile.
Joke, joke … sarcasm back in the box.

Reply
mwidlake - August 17, 2011: Oh dear Dom – you need a pint.
I know what you mean about RI disappearing though. At first it seemed to be just Data warehouses where they were removed – ostensibly because of performance but I suspected more because it was too much hassle to keep the data clean – but it is spreading.

I think I’ve maybe been lucky these last few years, I’ve seen more systems with RI intact than not and I do see more two-part Primary Keys than single column ones (that is not lucky, it just seems to be the way it is).

One thing I am sure of. Remove your RI and your database will become poisoned with bad data. I’ve yet to see a system where RI was removed (or never implemented) that did not have a load of broken data in there. Often no one even appreciated how bad it was…

Reply
Dom Brooks - August 18, 2011: Bad data indeed – it will find a way of sneaking in there, pesky blighter.

And persisted data is the lifeblood of all the systems I deal with (although I appreciate that that’s not the case for all systems).

I’ve had a couple of conversations about bad data that have emerged from SQL tuning activities, duplicated data or dataset volumes multiplied several times beyond expectations, such conversations often end with “well, that shouldn’t be there”.

It’s probably why there’s such a proliferation of SELECT DISTINCT these days – so often a bad indicator – and sorting or aggregation that shouldn’t be necessary.
6. jgarry - August 17, 2011: XXXXXXX@YYYY> select count(*), constraint_type from user_constraints group by constraint_type;

COUNT(*) C
———- –
1 U
22 P
16601 C

XXXXXXX@YYYY> select count(*) from user_tables;

COUNT(*)
———-
1369

Those P’s are tables that “extend the data dictionary.” All the normal oltp tables don’t have primary keys. Why? This is “Rapid Application Development” from the early ’90’s. The app decides which is the dominant domain (that’s a relational concept) by the spelling of the index name. The app enforces integrity, since the underlying databases don’t really have enough capability to do so (well, they didn’t in the early ’90s, and it’s still an issue if you are db-blind).

So, the valid reason is “the ERP has been developed and refined for years.” In fact, it does work well, aside from driving me crazy every time I want to use an Oracle feature dependent on primary keys (materialized views was but the first to make me notice), and yes, loads of broken data. Of course, most of the broken data actually comes from implementing poorly normalized business rules. But what do you expect from users who expect everything to work like Excel, and expect to be able to change the rules as perceived business conditions change? Successful software companies don’t sell databases, they sell applications.

Another rationalization is “full employment for Joel.” I’ve found both data and bugs that have been wrong for a decade and no one noticed. The stuff that comes out on payroll, checks or invoices tends to be noticed real fast.

Reply
mwidlake - August 17, 2011: God that’s depressing Joel!
In what way does “works quite well” mesh with your comments about inability to use performance centric features like Materialized Views and broken data?!?
The CBO optimizer can use information about unique/primary key indexes.

I just don’t buy that argument about removing RI is for performance. Yes, there are cases when it can not only help but is vital, but do you really need to speed up data load/modification so much that the quality of your data is of zip-all concern? As a wild guess I would say that in 90% of cases where RI is removed for “performance” you could get the performance your business needs with the RI there. The RI is really removed because it hides the issues you have with your data. One of the massive, and I mean MASSIVE problems with datawarehouses is that these business reports get created and businesses rely on them and no one ever seems to bother to check if the results are right {just as you say}. The broken data is one thing, the inaccurate {no, correct that, the WRONG} SQL churning out the results is another. The number of times I have tuned up some DW report and passed it back to the business to be checked for validity and they say “Oh, we don’t know – can’t you check?”. I hate checking as most of the time it reveals that the report has always been wrong. Then you get into arguments about “your code is wrong, it gives different data to the old report”. Yeah, I looked at the actual database and the actual requirements for the report and I wrote something to give you it. Your old report seems to me failing to take into account {usually somethign to do with valid date ranges or even missing blooming table joins}, so I figure that is wrong. Which idiot wrote It?!? Ahhh, I see.

How in heck do businesses expect to do the best they can when their ERP and MI reports are based on cr4p?

I’m going to drink some wine and go to bed. Bl00dy idiot companies and their bl00dy idiot attitudes to IT…and RI.

Reply
jgarry - August 22, 2011: They mesh in that the people who rate the performance (management) aren’t the same people who realize what they are missing (me). But I have to agree with them given their viewpoint. I agree more with your viewpoint. But they make millions of dollars more than either of us, so they must be doing something right.

In the end, the business is the focal point, and it’s their decision, good or bad. Like I said, the real important stuff (read:accounting) gets noticed fast (for sometimes strange values of fast). I too have had to justify why my code is right and what they are doing is wrong (even basic arithmetic, formulae in Excel and fricken’ percentages to sales VP’s), but it is still their decision. Actually, the very first conversion I did as a junior programmer 30 years ago, my reports were $50K/year (out of $30M) off from the existing reports. Turned out to be rounding errors in the old reports. It never gets better, only some details change.
7. Darryl Griffiths - August 19, 2011: Just the comments alone, make this the most “fun” and “laugh out load” post on IOTs yet – The Times.
Keep ’em coming.

Reply
8. Neil Chandler - October 6, 2011: Martin,

A great set of articles on IOT’s. I must read them sometime :0)

One drawback that (I think) you have failed to mention is that Exadata doesn’t like IOT’s. Yup, that holy grail of Oracle sales can’t do a Smartscan if the following conditions are met:

Smart Scans cannot be used on Index Organized Tables (IOTs).
Smart Scans cannot be used on clustered tables.
Smart Scans cannot be used on tables with ROWDEPENDENCIES enabled.

One more coffin for IOT’s, and database design, to get nailed into. Who needs design when you can smartscan all of these non-RI heaps directly into the PGA? Must write a blog about that…

Neil.

Reply
9. balajiintech - October 6, 2011: Hi Martin,

I’m i have an issue in IOT and Partition.

My table is partitioned by date range and table is IOT. this partition is for every quarter. and Oracle stats are updated daily.

Scenario: Q3 partition has 30 Million records and we created a new partition Q4 at end of september.
when stats job is completed, Oracle understands that Q3 has 30 Mil records and Q4 has 0 record. so every time the query refers to q3 it used Index based rule but then query refering to Q4 it does a table scan or bad query plan as it thinks it got no records.

but the problem occurs, when we load the huge volumn of data into q4 partition.
So on 1st of Oct, Stats job is run then the data loading happens. after it completes.. oracle still think that q4 has less volumn, however they have 3million record. as the bad plan is been used by the system. we have huge issue in query response time.

I refered to one of the Jonathan lewis blog and he mentioned about copy stats and its usage.. but one of oracle employee says that copy_stats wont help in Oracle 10g..

Need ur though on this..

Regards
Bala

Reply
mwidlake - October 6, 2011: Hi Bala,

This is one of the classic issues with partitions and statistics, I don’t think the use of IOTs is directly relevant to your issue (though I have noticed one or two oddities with IOTS and stats).

The cost based optimiser thinks that your Q4 partition is empty, as it was when you gathered stats, so it calculates that the quickest way to look at it is a full segment scan. However, you have loaded a considerable volume of data since the stats were gathered. Thus a full scan is going to be a very bad idea if you are only looking for a small number of records that should be accesible via the IOT index (or any other index).

What you need to do is let Oracle know about all this data in the partition before you start running code against it. You could gather stats on that partition directly after loading it (rather the wrong way around as you are doing it at present). A quick gather using a small sample size and block sampling would probably be enough to avoid the problem but you would need to carry out your own tests.
Another valid option would be to copy stats from a previous partition, as suggested by Jonathan (by using dbms_stats.get_XXXX_stats and set_XXXX_stats), or even just set the stats on the new partition with sensible values. You need to be mindful of altering the min and max values for column stats when you copy or set your stats though, so that eg date columns properly reflect what will be in this partition rather than what was in the partition you copy from. Your Oracle employee is probably thinking of the 11g new feature dbms_stats.copy_table_stats, which in effect does the get and set for you and attempts to handle the different column ranges. I’ve not had chance to play with that yet.

Another option would be to delete the stats for this partition, so that any code that would look at this partition would do a dynamic sampling when it is parsed.

Please keep in mind Bala that if you gather your own stats (and probably if you set your own/copy stats) you should state NO_INVALIDATE=FALSE in the
dbms_stats call as otherwise Oracle will age out any existing SQL cursors “sometime over the next few hours”. You want all new cursors that look at that partition to be immediately aged out and re-parsed so that they see the new stats, otherwise exisiting code could still go quite horribly wrong.

Oh, and you can’t use a dynamic sampling hint in the code to work around this. If the CBO sees that stats have been gathered on a segment and they indicate that the segment is empty, the CBO will ignore the hint and NOT do the dynamic sampling. You have to delete the stats for that segment.

Hope that helps

Reply
10. IOT Part 6 – Inserts and Updates Slowed Down (part A) « Martin Widlake's Yet Another Oracle Blog - November 1, 2011: […] <..IOT1 – the basics <….IOT2 – Examples and proofs <……IOT3 – Significantly reducing IO <……..IOT4 – Boosting Buffer Cache efficiency <……….IOT5 – Primary Key Drawback […]

Reply
11. IOTs « Oracle Scratchpad - November 22, 2011: […] Part 5: Primary Key issues […]

Reply
12. Brian Tkatch - January 18, 2012: This is awesome, thanx. Found via Richard Foote’s blog.

I love the attitude, the simple rule “if you want the information organized in an order that is not helped by the Primary Key of the table, an IOT is useless to you.”, the PK attitude, the UNIQUE CONSTRAINT workaround, and the warning “If you remove RI from a relational database, you should replace it in some way. Otherwise, you will pretty soon have a non-relational database.” I’m reading my own thoughts in your blog. 🙂

Typo: recevie

On to #6 now.

Reply
mwidlake - January 18, 2012: Thanks for the feedback Brain and I’m glad you like the content. Typo tracked down and fixed.

Reply