
Overloaded Indexes (for ODC Appreciation Day) October 10, 2017

Posted by mwidlake in database design, development, performance, SQL.

ODC Appreciation Day is an idea that Tim Hall (aka Oracle-Base) came up with to show our appreciation for the Oracle Technology Network (OTN)/Oracle Developer Community.

Fig 1 This is an efficient range scan

I want to show my support but rather than mention an Oracle “feature” I particularly like I’d like to mention a “trick” that I (and many others) often employ to help performance. That trick is Overloaded Indexes.

We can all appreciate that accessing data in a table via an index is a very good thing to do when we want just the few records for a given index key from the thousands or millions of rows in the table. As an example, we are looking up all the records for someone and we know their first name, last name and date of birth – a very common occurrence in things like hospital or billing systems. So our PERSON table has an index on those three columns. Oracle will read the root block of the index, read the one relevant block in each of the branch levels for the key, find all of the leaf block entries for that key – and then collect the relevant rows from the table. Fig 1 shows how we think of this working, i.e. most of the records we want to find will be close together in the table.

Actually, a lot of people who just “use” the Oracle database as a store of information sort-of think this is how an index always works. It efficiently identifies the rows that you want and that is the end of it. If the index is on the value(s) you are looking up rows for (say LAST_NAME, FIRST_NAME, DATE_OF_BIRTH) the index is ideal and that is as good as it gets.
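In sketch form, that "ideal" index and lookup look like the below (the table and column names here are illustrative, not from any real system):

```sql
-- A hypothetical PERSON table with an index on the lookup key
CREATE INDEX person_name_dob ON person
  (last_name, first_name, date_of_birth);

-- Oracle can range-scan person_name_dob for the key values,
-- then fetch each matching row from the table by its ROWID.
SELECT *
FROM   person
WHERE  last_name     = :p_last_name
AND    first_name    = :p_first_name
AND    date_of_birth = :p_dob;
```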

Fig 2 – a more usual, less efficient index range scan

But in reality, the index lookup is often far less efficient than this and is more like fig 2. Working down the index structure to get all of the required index entries is exactly the same, but the rows you want are scattered all over the table. Oracle has to fetch many table blocks to get your data, maybe as many blocks as there are records to be found. This is far from efficient.

So what can you do about this? You already have the "perfect" index on LAST_NAME, FIRST_NAME, DATE_OF_BIRTH, the values you are looking up. Maybe you could add another column to the index to avoid those situations where there are many people with the same name and date of birth. But you may not have that extra information, or it may simply not be possible to identify the rows any more accurately; you really do need all the rows scattered through that table for the given search key.

There are “architectural” things you can do such as create the table as an Index Organised Table (see my little set of blogs about them starting here). You can also use various methods to group the relevant rows together in the table. But all of those methods are Big Impact. You need to recreate the table or plan for this eventuality up-front when you design the system.
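To illustrate why the IOT route is Big Impact: index organisation has to be declared when the table is created, so an existing heap table has to be rebuilt. A minimal, purely illustrative sketch (names and columns are made up for this example):

```sql
-- Illustrative only: ORGANIZATION INDEX must be specified at creation
-- time, so converting an existing heap table means recreating it and
-- reloading the data.
CREATE TABLE person_iot
( last_name      VARCHAR2(30),
  first_name     VARCHAR2(30),
  date_of_birth  DATE,
  notes          VARCHAR2(200),
  CONSTRAINT person_iot_pk
    PRIMARY KEY (last_name, first_name, date_of_birth)
)
ORGANIZATION INDEX;
```

The rows are then physically stored in primary-key order, so all the records for one key are grouped together, but every query and piece of code touching the table is potentially affected.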

But there is a very specific, easy thing you can do to address this particular problem, for the SQL statement you need to speed up. You can add all the columns your query needs into the index. This is an Overloaded Index.

An Overloaded Index holds not only the table columns in the WHERE clause but all the columns needed from that table for the query.

Why does this work? Because when Oracle finds the range of index entries for the key (LAST_NAME, FIRST_NAME, DATE_OF_BIRTH), all the other columns the query needs are also held in those leaf entries, so there is no need to fetch the rows from the table at all. All those lookups to the table are avoided. Fig 3 at the end of this article demonstrates this.
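As a sketch of the pattern, using the same illustrative PERSON example as before (the extra columns are hypothetical):

```sql
-- Plain index: identifies the rows for the key, but each row still
-- has to be fetched from the table to get the other columns.
CREATE INDEX person_name_dob ON person
  (last_name, first_name, date_of_birth);

-- Overloaded version: the key columns come first, followed by every
-- other PERSON column the target query references, so the query can
-- be answered from the index leaf blocks alone.
CREATE INDEX person_name_dob_ol ON person
  (last_name, first_name, date_of_birth,
   phone_number, address_line_1);
```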

Rather than leave it abstract, I'll give you a real world example I had recently. A client had a SELECT statement something like the below, with the execution plan shown, and it was running too slowly for the business requirements:

SELECT (SUM(NVL(T.TRAN_VALUE_CR,0))-SUM(NVL(T.TRAN_VALUE_DB,0))) ,
        COUNT(*)
FROM    ACCOUNTS     A ,
        TRANSACTIONS T
WHERE   A.ACC_XYZ_IND      =:3
AND     A.ACC_ACCOUNT_NO   =:1             
AND     A.ACC_SUBACC_NO    =:2
AND     T.TRAN_XYZ_IND     =A.ACC_XYZ_IND
AND     T.TRAN_ACCOUNT_NO  =A.ACC_ACCOUNT_NO
AND     T.TRAN_SUBACC_NO   =A.ACC_SUBACC_NO
AND     T.TRAN_P_IND       =:4 
AND     T.TRAN_DLM_DATE    >=TO_DATE(:5,'YYYYMMDD')

------------------------------------------------------------
| Operation                                 | Name         | 
------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |
|   1 |  SORT AGGREGATE                     |              |
|   2 |   NESTED LOOPS                      |              |
|   3 |    NESTED LOOPS                     |              |
|*  4 |     INDEX RANGE SCAN                | ACC_PRIME    |
|*  5 |      INDEX RANGE SCAN               | TRAN2_3      |
|*  6 |    TABLE ACCESS BY LOCAL INDEX ROWID| TRANSACTIONS |
------------------------------------------------------------
Statistics
----------------------------------------------------------
       4740  consistent gets
       3317  physical reads

The index used on TRANSACTIONS is:

INDEX_NAME                   TABLE_NAME       PSN COL_NAME
---------------------------- ---------------- --- --------------
TRAN2_3                      TRANSACTIONS     1   TRAN_ACCOUNT_NO
TRAN2_3                      TRANSACTIONS     2   TRAN_SUBACC_NO
TRAN2_3                      TRANSACTIONS     3   TRAN_DLM_DATE

The index TRAN2_3 on the TRANSACTIONS table that you can see being used in the plan covered all the columns in the WHERE clause that actually helped identify the TRANSACTIONS records required: TRAN_ACCOUNT_NO, TRAN_SUBACC_NO and TRAN_DLM_DATE (TRAN_XYZ_IND and TRAN_P_IND were always the same values, so "pointless" to index).

I added a new index to the TRANSACTIONS table. I added a new index rather than change the existing one as we did not want to impact other code, and we wanted to be able to drop the new index if there were any unexpected problems. I added all the columns on the TRANSACTIONS table that were in the SELECT list or in the WHERE clause, even when they did not help better identify the rows needed. If there had been TRANSACTIONS columns in an ORDER BY or windowing clause, I would have added them too. So my index looked like this:

create index TRAN2_FQ on TRANSACTIONS
(TRAN_ACCOUNT_NO   ,TRAN_SUBACC_NO   ,TRAN_DLM_DATE
,TRAN_P_IND        ,TRAN_XYZ_IND
,TRAN_VALUE_CR     ,TRAN_VALUE_DB)

It is very, very important that the new index holds every column from the TRANSACTIONS table that the query needs. To prevent accessing the table, all the data the query needs from that table must be in the index.

Oracle could now satisfy the query using just the new index, as the below explain plan shows.

----------------------------------------------------
| Id  | Operation                      | Name      |
----------------------------------------------------
|   0 | SELECT STATEMENT               |           |
|   1 |  SORT AGGREGATE                |           |
|   2 |   NESTED LOOPS                 |           |
|*  3 |    INDEX RANGE SCAN            | ACC_PRIME |
|*  4 |     INDEX RANGE SCAN           | TRAN2_FQ  |
----------------------------------------------------
Statistics
----------------------------------------------------
         56  consistent gets
         52  physical reads

There is now no line in the plan for visiting the table TRANSACTIONS and we are using the new TRAN2_FQ index. The consistent gets and physical reads to satisfy the query have gone down from 4740 and 3317 respectively to 56 and 52, a reduction of roughly 99%. I think that is good enough.
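If you want to check this for yourself on your own queries, the tell-tale is the absence of any TABLE ACCESS line for the table in question. One standard way to see the plan Oracle actually used for the last statement in your session is DBMS_XPLAN.DISPLAY_CURSOR:

```sql
-- Run the query of interest first, then display the cursor's actual
-- plan; for a properly overloaded index there should be no
-- "TABLE ACCESS ... TRANSACTIONS" line in the output.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT => 'BASIC'));
```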

Fig 3 shows what is happening. The new index is effectively a “mini IOT” designed to support the given query.

Fig 3 – Tables? Who needs tables!

There are of course a few caveats. The new index needs to be maintained, which is an overhead on all INSERT/UPDATE/MERGE/DELETE activity on the table. The index will only remove the need to visit the table for queries that are very, very similar to the one it is designed for, i.e. ones that use the same columns from the TRANSACTIONS table, or a subset of them. If you alter the query, e.g. to select another column from the TRANSACTIONS table, you will need to revisit this overloaded index.

Finally, be careful of modifying existing indexes to overload them to support specific queries. If the index is there to support referential integrity you need to think carefully before touching it, and the modified index may be less efficient for other queries that used the original index, as adding columns to an index makes it "wider".
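On the point of being able to back out safely: if you are on 11g or later, one low-risk way to test whether an overloaded index can be dropped is to make it invisible first, so the optimizer ignores it while it is still being maintained:

```sql
-- Hide the index from the optimizer without dropping it (Oracle 11g+).
-- It is still maintained on DML, so reinstating it is instant.
ALTER INDEX tran2_fq INVISIBLE;

-- If nothing regresses after a suitable soak period, drop it for
-- real; if something does regress, simply make it visible again.
ALTER INDEX tran2_fq VISIBLE;
```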

BTW, if you think you recognise this from a recent Oracle Scene article then you would be right. As I was pushed for time, I re-used something I had written before 🙂

