
The Frustrated User’s perspective. November 28, 2009

Posted by mwidlake in Perceptions.
Tags: ,
1 comment so far

I got the below email from a friend this evening. Said friend does not work in IT. He works in a large organisation with a large and active IT department that might just be forgetting they provide a service as opposed to laying down the law…

****************************************************************
Hi Martin

For the last few weeks since {an edited out software virus disaster} we have been bombarded with unsolicited security policies from I.T. They pop up during the 10-15 minutes it takes to log on to our computers. You then have to download the policy and sign at the bottom to say whether you accept or decline the policy. When I scanned through the 10th policy I was struck by the fact that none of it applied to my area of responsibility except for one small part that had been covered in excruciating detail in one of their previous pathetic attempts at communicating what is expected of us. And all said missives use what looks like a variation of the English language. Having skipped the policy during a number of recent logons I was now being informed that it is “mandatory” to accept the policy or decline it giving a reason. I declined, giving the above observation on the lack of relevance to my role as a reason.

I have now been informed that it is not possible to issue only the relevant policies to individuals (and presumably, having identified this is not possible, they have not bothered trying in the first place?) and in any case there might come a time when I “might” be given a task where the latest I.T. policy applies and therefore I have to be aware of the existence of the policy. I think this latest one was something to do with purchasing software packages from suppliers – although this isn’t entirely clear. There is no way that I would be allowed to purchase software packages, which is a shame as there are off-the-shelf products that do what we require, whereas the in-house system foisted upon us simply does not provide any reliable or useful information whatsoever.

The following scenario occurs to me. I write a policy on controlling legionella – not unreasonable given that we have swimming pools, showers, air con etc. in our premises. I then send a copy to every employee requiring them to open it, expecting them to read it, understand it and accept it, “just in case” they get asked to go and run a sports centre. What response do you think I would get?

The risk of catching legionella is low, but people have died as a result; even so, we do not require everyone to sign a policy for this or any of the other more serious hazards they face at work. I am not aware of any software-purchasing-related deaths of late. For dangerous stuff, employees sign one policy when they join the organisation. If they have to deal with a hazard we make them aware by warning them about it and, if necessary, give them additional training, guidance and support so that they can manage the risk in accordance with the overall policy.

Perhaps we have got this wrong. Maybe we should require all computer users (just for example) to complete a workstation assessment online every day when they start work – and if they don’t, their computer should blow up in their face and a guillotine then drop from the ceiling removing their hands so they can’t sue for RSI or eyestrain.

That’ll teach them
************************************************************

I hope I have never been responsible for inflicting enough inconvenience on my users to make them as aggrieved and angry as my friend. Thing is, I now worry that I might have…

Friday Philosophy – Statistically Significant November 27, 2009

Posted by mwidlake in performance, VLDB.
Tags: ,
3 comments

There are very few generalist Oracle DBAs around, and very very very few expert generalist Oracle DBAs. Tom Kyte might count, but I’m sure if you pressed him he would admit to knowing nothing about {some key aspect of being a general DBA}. That is because to be an expert on something you have to spend a lot of time on it. Oh, and learn. {I’ve spent a lot of time around my wife but she still confuses me, I just don’t seem to be able to learn}. Oracle is simply too wide a topic to be able to spend a lot of time on all of it, even if by “Oracle” you mean just the core RDBMS technology, which I do. You have to pick an area.

Pete Finnigan specialises in security, Julian Dyke on RAC and Internals, Doug Burns on the relationship between the database and cuddly toys. Oh and ASH/AWR.

So I ask myself, what is my specialty?

Well, if I go on the last 5 working years of my life, it would probably be Oracle Database Statistics. Which is quite ironic given my woeful attempts with Statistics when I tried that maths ‘A’ level all those years back {for non-UK people, an ‘A’ level is what you do at age 17-18. It’s that point in maths when most of us are convinced logic is replaced by magic}. I’ll go further and say I specialise in Oracle Database Statistics on VLDBs. Maybe even more specific, Gathering and Maintaining Oracle Database Statistics on VLDBs.

Not a very sexy-sounding specialty, is it, even in the context of IT technical specialties. I am sure that if I were to tell a lady I wished to impress that I was “a specialist in gathering and maintaining Oracle database statistics on VLDBs” then I would soon be standing alone as she looked for an accountant to talk to {I refer back to my comment on my wife, I repeatedly try to impress her with this line and she never fails to walk away}. Heck, I could spend all my time at the UKOUG Conference next week and struggle to find someone who would respond positively to such a claim to greatness.

But the situation is that I have had to deal with the failures and idiosyncrasies of this area of Oracle for 4 major clients on a dozen or so systems, and have discussed it with half a dozen other people who have had challenging times with it. And even now it trips me up all the time. Because, frankly, some of it is not very well implemented (choking automated stats gathering job, anyone?), different parts work in different ways (if you state a statid, statown and stattab when you SET_TABLE_STATS, the values go into the stats table; if you state them for GATHER_TABLE_STATS, the gathered values go into the data dictionary and the OLD ones go into the stats table – yeah, I know, if you did not know that you are now going “huh?”) and some of it is wrapped up in a blanket of confusion and secrecy (what exactly DOES the automated stats job do, and when you say “GATHER AUTO”, what exactly will Oracle do automatically?).
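
To make that stats-table quirk concrete, here is a minimal sketch {the table and stats-table names are my inventions for illustration}:

exec dbms_stats.create_stat_table(user,'MY_STATTAB')

-- SET_TABLE_STATS with statown/stattab/statid: the values you supply
-- are written into MY_STATTAB, not the data dictionary
exec dbms_stats.set_table_stats(ownname=>user,tabname=>'MY_TABLE',-
  stattab=>'MY_STATTAB',statid=>'RUN1',statown=>user,numrows=>1000)

-- GATHER_TABLE_STATS with the same three parameters: the freshly
-- gathered values go into the data dictionary and the OLD dictionary
-- values are saved into MY_STATTAB
exec dbms_stats.gather_table_stats(ownname=>user,tabname=>'MY_TABLE',-
  stattab=>'MY_STATTAB',statid=>'RUN1',statown=>user)

{The saved values can be put back with DBMS_STATS.IMPORT_TABLE_STATS – for a gather, the stats table is a backup of what was overwritten, not a target.}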

Thankfully the secrecy side is reducing as Oracle (and others) say more about how some of these things are controlled or decided “automatically”, but then the world shifts under your feet with new versions of Oracle. Stats gathering under 11g is far more open to control than under 10g, but as my current client is not on 11g, I can’t spend too long looking at that.
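
As a flavour of that extra 11g control, a hedged sketch {the table name is invented and these per-table preference calls are 11g features}:

-- per-table preferences, honoured by subsequent gathers
exec dbms_stats.set_table_prefs(user,'MY_BIG_TABLE','STALE_PERCENT','25')
exec dbms_stats.set_table_prefs(user,'MY_BIG_TABLE','INCREMENTAL','TRUE')

-- check what a preference is currently set to
select dbms_stats.get_prefs('STALE_PERCENT',user,'MY_BIG_TABLE') from dual;

There is no per-table equivalent under 10g, where you are largely limited to gather-time parameters and the global DBMS_STATS.SET_PARAM settings.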

So currently I am an expert in 10g Gathering and Maintaining Oracle Statistics on VLDBs. Now if that is not pretty damned specialist then I don’t know what is.

I should blog technical details on all of this {sadly I know the answers to the things I mention above}, but I suspect people would go “Hmmm, interesting…..” in that kind of “who in heck needs to know that! I’m off to do block dumps of temporary segments” way. But if you think otherwise, let me know.

Besides, I am being very, very poor at getting on with the Partition stuff I want to do, even though I have half-written about 4 more posts on it. I need to stick to that for my technical blogs for now.

That and I kind of shy away from being an expert in such an unexciting area, I might get offered {oh dear lord not more} work in it.

Thankfully I also specialise in beta testing new features of Oracle for clients in the travel and media industries, where on-site work in exotic locations is, at times, required… If anyone has any opening in that field, drop me a line :-)

UKOUG Conference approaches. November 26, 2009

Posted by mwidlake in Meeting notes.
Tags:
9 comments

Well, it’s that time of year again. It’s dark by 4pm, the weather is miserable and I’m already sick to the hind teeth with Christmas, thanks to the never-ending drive by the commercial sector to bully me into buying tatt and rubbish “to make Christmas special”.

But there is another staple of the season of the end of November/start of December, which is to trek over to Birmingham for the annual UKOUG conference. Except of course that there are now several conferences run by the UK Oracle User Group, for different Oracle sectors and regions, but this is The Technical Conference. Back to three days and focusing on the database, application server/E-business suite and development. Just like the good-old-days before Larry bought up 300 companies and a big chunk of the software business services industry.

I’ve been lucky enough to go to a couple of Oracle Open Worlds (OOW) and mighty impressive the event is. But I prefer the UKOUG annual conference. It’s not the huge (and, I feel, overwhelming) size of OOW, but still pretty big, I think 10 streams of presentations, workshops and panels this year, and has the added bonus of being more Real. The presentations are to a large part independent from Oracle Corp and even the Oracle Corp presentations tend to be a little less Corporate and a little more Real. People say “how it is” at a User Group.

Plus there is more of a feeling of meeting friends and colleagues and like-minded practitioners of every-day living and learning with Oracle.

For those new to the scene, it’s far easier to get to grips with than OOW and yet has Big Names doing top technical presentations, spread over a range of levels from the introductory to the esoteric details of stuff most of us never need to know. Plus everything in-between.

Since 2003 I have presented every year except last year {when I decided to go play with elephants in Thailand instead}. I’m not presenting this year either and I would be lying if I said I was not upset. I’m utterly gutted. But then I saw the spread of talent and interesting topics when I helped score the abstracts back in May, and competition this year was fierce. I’ll be chairing a few sessions though and trying to meet up with people I know, and also people I don’t know but would like to.

If you see some small (5 foot sod all) chap with little glasses, short, brown hair and cream trousers, that is probably me. Come over and say “hi”. If you already know me and come over and say “hi” and I look like a rabbit caught in headlights, well you should know me by now and that I am utterly rubbish at recognising faces or remembering names, but that’s OK as I rarely bite and will be happy to apologise for my lack of social skills. If I’m anywhere near a bar, I’ll probably buy you a drink too, but then you have to buy me one. I’ll be under the table first though.

So, if you are also heading to Birmingham, see you there. If you are not, you are missing out on a fantastic Oracle event.

 

Assisting Partition Exclusion – Partitions for Performance November 23, 2009

Posted by mwidlake in performance, VLDB.
Tags: , ,
10 comments

Partitions are often used to help increase the performance of SQL select activity via the concept of partition exclusion – the CBO will be able to identify which of the partitions could hold the data of interest and ignore the others. This is their main benefit from a SQL Select performance perspective.

The problem is, the partitioning key is not always included in your SQL statements.

I discussed this potential negative impact of partitioning on SQL select performance in this post and this one also. I did not give any proof then, so the first half of this post provides that.

As an example, I have created a table with 45 partitions, partitioned on the column ID, an ascending numeric primary key (traditionally sourced from a sequence). There is an index on the ID column, locally partitioned. The table also has a column CRE_DATETIME, which is the date the record was created. This column is also indexed with a locally partitioned index.

NAME                           VALUE
------------------------------ -------------
compatible                     10.2.0.3.0
cpu_count                      8
db_block_size                  8192
db_file_multiblock_read_count  16
optimizer_mode                 ALL_ROWS
sga_target                     0
sort_area_size                 65536

create table test_p3
(id           number(10) not null
,cre_datetime date not null
,status        number(1) not null
,num_1         number(4) not null -- random 20
,num_2         number(4) -- random 500
,num_3         number(5) -- cycle smoothly
,num_4         number(5) -- Random 10000
,vc_1          varchar2(10)
,vc_2          varchar2(10)
,vc_pad        varchar2(2000))
tablespace users
partition by range (id)
(partition id_01k values less than (1000)
tablespace users
,partition id_02k values less than (2000)
tablespace users
,partition id_03k values less than (3000)
tablespace users
...
,partition id_45k values less than (45000)
tablespace users
--
,partition id_max values less than (maxvalue)
tablespace users
)
/
select table_name,partitioning_type p_type
,partition_count pt_count
from dba_part_tables
where owner=user and table_name = 'TEST_P3'
TABLE_NAME                     P_TYPE    PT_COUNT
------------------------------ ------- ----------
TEST_P3                        RANGE           46

>@ind_cols
IND_NAME           TAB_NAME           PSN       COL_NAME
------------------ ------------------ --------- ------------
TP3_PK             TEST_P3            1         ID
TP_CRE_DT          TEST_P3            1         CRE_DATETIME

{This is a very typical scenario, as is the opposite situation where the table is partitioned on date but has a numeric column holding e.g. order number or customer ID.}

The below shows a select of a single record by ID.

select id,cre_datetime,num_1,vc_1
from test_p3 where id=37123
/
        ID CRE_DATETIME           NUM_1 VC_1
---------- ----------------- ---------- ----------
     37123 12-OCT-2009 02:52         19 RMZAN
1 row selected.

Execution Plan
----------------------------------------------------------
----------------------------------------------
| Id  | Operation                          | Name    | Rows  | 
Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------
----------------------------------------------
|   0 | SELECT STATEMENT                   |         |     1 |
  22 |     2   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE SINGLE            |         |     1 |
  22 |     2   (0)| 00:00:01 |    38 |    38 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| TEST_P3 |     1 |
  22 |     2   (0)| 00:00:01 |    38 |    38 |
|*  3 |    INDEX UNIQUE SCAN               | TP3_PK  |     1 |
     |     1   (0)| 00:00:01 |    38 |    38 |
----------------------------------------------------------------
----------------------------------------------
Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads

Note the PARTITION RANGE SINGLE and Pstart/Pstop are both 38. Consistent gets are only 3. The query is very efficient as the CBO identified that records matching the WHERE clause could only exist in the one partition. The relevant local index is examined for 2 consistent gets (the BLEVEL of the local index partitions is 1) and then one consistent get against the table.

Now I’ll scan for records for a range of IDs:

select id,cre_datetime,num_1,vc_1
from test_p3 where id between 30980 and 31019
/
        ID CRE_DATETIME           NUM_1 VC_1
---------- ----------------- ---------- ----------
     30980 25-SEP-2009 01:20         15 JQLYA
     30981 25-SEP-2009 01:24          5 WYWCR
     30982 25-SEP-2009 01:28         10 QZSHD
     30983 25-SEP-2009 01:32         18 KQBSU
     30984 25-SEP-2009 01:36          2 HRPMA
....
     31018 25-SEP-2009 03:52          3 YOPJO
     31019 25-SEP-2009 03:56         10 GNGKG

40 rows selected.

Execution Plan
----------------------------------------------------------
----------------------------------------------
| Id  | Operation                          | Name    | Rows  | 
Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------
----------------------------------------------
|   0 | SELECT STATEMENT                   |         |    40 |
 880 |     9   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ITERATOR          |         |    40 |
 880 |     9   (0)| 00:00:01 |    31 |    32 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| TEST_P3 |    40 |
 880 |     9   (0)| 00:00:01 |    31 |    32 |
|*  3 |    INDEX RANGE SCAN                | TP3_PK  |     2 |
     |     2   (0)| 00:00:01 |    31 |    32 |
----------------------------------------------------------------

Statistics
---------------------------------------------
          0  recursive calls
          0  db block gets
         52  consistent gets
          0  physical reads

Again, you can see partition exclusion, this time as a PARTITION RANGE ITERATOR and Pstart/Pstop of 31 and 32. Consistent gets were higher, but I did fetch several records.

Now a query is issued for a record with a specific value for CRE_DATETIME.

select id,cre_datetime,num_1,vc_1
from test_p3 where cre_datetime = 
   to_date('06-OCT-2009 12:20','DD-MON-YYYY HH24:MI')
/
        ID CRE_DATETIME           NUM_1 VC_1
---------- ----------------- ---------- ----------
     35105 06-OCT-2009 12:20         11 JPQHO
1 row selected.

Execution Plan
----------------------------------------------------------
------------------------------------------------
| Id  | Operation                          | Name      | Rows  |
 Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------
------------------------------------------------
|   0 | SELECT STATEMENT                   |           |     1 |
    22 |    48   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ALL               |           |     1 |
    22 |    48   (0)| 00:00:01 |     1 |    46 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| TEST_P3   |     1 |
    22 |    48   (0)| 00:00:01 |     1 |    46 |
|*  3 |    INDEX RANGE SCAN                | TP_CRE_DT |     1 |
       |    47   (0)| 00:00:01 |     1 |    46 |
----------------------------------------------------------------

Statistics
---------------------------------------
          1  recursive calls
          0  db block gets
         90  consistent gets
          0  physical reads

Notice that partition exclusion does not occur: you see PARTITION RANGE ALL and Pstart/Pstop of 1 – 46. The CBO cannot identify which partitions may or may not hold records with that CRE_DATETIME. Consistent gets are now a lot higher, 90 (as compared to 3 for the situation where partition exclusion can occur), as Oracle has to probe each and every local index partition to identify if any records for the given value exist.

Of course, you probably would not look for a record with a specific datetime, but in the opposite scenario of having partitioned on datetime, it would not be at all unusual to look for a record with a given ID, with partition exclusion not being possible.

As I mentioned earlier, this need to scan all of the local indexes because no partition exclusion is possible is something I discussed a couple of weeks back.

Now I will select for the range of dates that my range scan in ID identified:

select id,cre_datetime,num_1,vc_1
from test_p3 
where cre_datetime between to_date('25-SEP-2009 01:20','DD-MON-YYYY HH24:MI')
                   and     to_date('25-SEP-2009 03:56','DD-MON-YYYY HH24:MI')

        ID CRE_DATETIME           NUM_1 VC_1
---------- ----------------- ---------- ----------
     30980 25-SEP-2009 01:20          3 BGAFO
     30981 25-SEP-2009 01:24         15 PXGJD
     30982 25-SEP-2009 01:28         12 PGQHJ
     30983 25-SEP-2009 01:32         17 TIBZG
     30984 25-SEP-2009 01:36         11 EQQQV
...
     31018 25-SEP-2009 03:52         18 FSNVI
     31019 25-SEP-2009 03:56         16 LJWNO
40 rows selected.

Execution Plan
----------------------------------------------------------
------------------------------------------------
| Id  | Operation                          | Name      | Rows  |
 Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------
------------------------------------------------
|   0 | SELECT STATEMENT                   |           |    42 |
   924 |    54   (0)| 00:00:01 |       |       |
|   1 |  PARTITION RANGE ALL               |           |    42 |
   924 |    54   (0)| 00:00:01 |     1 |    46 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| TEST_P3   |    42 |
   924 |    54   (0)| 00:00:01 |     1 |    46 |
|*  3 |    INDEX RANGE SCAN                | TP_CRE_DT |    42 |
       |    47   (0)| 00:00:01 |     1 |    46 |
----------------------------------------------------------------
------------------------------------------------

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        136  consistent gets
          0  physical reads

That actually selects back the same records, but now a PARTITION RANGE ALL is invoked, Pstart/Pstop of 1-46, and the number of consistent gets goes up to 136.

I discussed the impact of partitions on range scans in this second post about performance issues with partitions.

Thankfully, you can often alleviate this problem of considering all partitions.

The below shows a scan for records for a range of dates.

select count(*)
from test_p3
where cre_datetime between to_date('18-OCT-2009','DD-MON-YYYY')
                   and     to_date('20-OCT-2009','DD-MON-YYYY')

  COUNT(*)
----------
       721
1 row selected.

Execution Plan
----------------------------------------------------------
----------------------------------
| Id  | Operation            | Name      | Rows  | Bytes | Cost
(%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------
----------------------------------
|   0 | SELECT STATEMENT     |           |     1 |     8 |    49
   (0)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE      |           |     1 |     8 |
      |          |       |       |
|   2 |   PARTITION RANGE ALL|           |   731 |  5848 |    49
   (0)| 00:00:01 |     1 |    46 |
|*  3 |    INDEX RANGE SCAN  | TP_CRE_DT |   731 |  5848 |    49
   (0)| 00:00:01 |     1 |    46 |
----------------------------------------------------------------
----------------------------------

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         90  consistent gets
          0  physical reads

As you can see, no partition exclusion is possible; every partition is considered (PARTITION RANGE ALL, Pstart/Pstop of 1-46). 90 consistent gets are required.

The trick is to introduce a second WHERE clause, limiting the query by the partitioning key as well.
As a human, you can recognise that the ID and the CRE_DATETIME columns are going to increase pretty much in synchronisation. Records with a CRE_DATETIME a month ago are going to be found in partitions with a lower ID range than those from a week ago. Oracle does not know this business logic, so you have to tell it.

Let us say you never have more than 1,000 records created a day. So if you want “today’s” data, you need only consider IDs that are between the max(ID) and the max(ID) less 1,000. To be safe, you would increase this range substantially. The idea is to exclude most partitions without risking missing data:

select count(*)
from test_p3
where cre_datetime between to_date('18-OCT-2009','DD-MON-YYYY')
                   and     to_date('20-OCT-2009','DD-MON-YYYY')
and  id between 39000 and 41000

  COUNT(*)
----------
       721
1 row selected.

Execution Plan
----------------------------------------------------------
------------------------------------------------
| Id  | Operation                   | Name             | Rows  |
 Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------
------------------------------------------------
|   0 | SELECT STATEMENT            |                  |     1 |
    13 |    14   (8)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE             |                  |     1 |
    13 |            |          |       |       |
|*  2 |   VIEW                      | index$_join$_001 |    34 |
   442 |    14   (8)| 00:00:01 |       |       |
|*  3 |    HASH JOIN                |                  |       |
       |            |          |       |       |
|   4 |     PARTITION RANGE ITERATOR|                  |    34 |
   442 |     7  (15)| 00:00:01 |    40 |    42 |
|*  5 |      INDEX RANGE SCAN       | TP_CRE_DT        |    34 |
   442 |     7  (15)| 00:00:01 |    40 |    42 |
|   6 |     PARTITION RANGE ITERATOR|                  |    34 |
   442 |     9  (23)| 00:00:01 |    40 |    42 |
|*  7 |      INDEX RANGE SCAN       | TP3_PK           |    34 |
   442 |     9  (23)| 00:00:01 |    40 |    42 |
----------------------------------------------------------------
------------------------------------------------

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         18  consistent gets
          0  physical reads

In the above I have managed to exclude most of the partitions and still get the correct data, by adding the clause and id between 39000 and 41000. There is something a little unusual {if you have not seen it before} in the plan, with two indexes being scanned and the results hash-joined and presented as a view, but both are PARTITION RANGE ITERATOR scans and the Pstart/Pstop values are 40-42. Consistent gets are down to 18.

You do have to take great care with this method that the extra WHERE clause you add on the partitioning key can never exclude records you want to find. But so long as you ensure that, it is a very powerful technique.
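
One way to reduce that risk is to derive the ID range rather than hard-code it. Below is a sketch of the idea, assuming a small ID_DATE_MAP table {my invention, not part of the test schema} that the data load maintains with the min and max ID per day:

variable lo_id number
variable hi_id number
begin
  select min(min_id), max(max_id) into :lo_id, :hi_id
  from   id_date_map
  where  day between to_date('18-OCT-2009','DD-MON-YYYY')
             and     to_date('20-OCT-2009','DD-MON-YYYY');
end;
/
select count(*)
from   test_p3
where  cre_datetime between to_date('18-OCT-2009','DD-MON-YYYY')
                   and     to_date('20-OCT-2009','DD-MON-YYYY')
and    id between :lo_id and :hi_id
/

With bind variables the plan will show Pstart/Pstop of KEY rather than literal partition numbers, but the pruning still happens, at execution time rather than parse time.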

I’ll go into that a little more in the next posting {which hopefully will not be as delayed as this one!}

Depth of Indexes on VLDBs November 18, 2009

Posted by mwidlake in performance, Uncategorized, VLDB.
Tags: , ,
3 comments

In a couple of recent posts on partitions, on single row and range scan access I have started exploring the performance of partitions. I will continue on the main theme soon, honest.

I feel a little bad that I have not given concrete examples so far {creating very large objects for the sake of demonstrations and proof-of-concepts is a bit of an issue with VLDBs, due to the space and time requirements to do it}. So, here is just some real-world information on index depth, or BLEVEL of indexes on Very Large DataBases.

The BLEVEL is the number of levels in the index. Minus one. See this post for a full description, but in brief, if an index has a BLEVEL of 3, that means there is a root node, a level of branch nodes, a second level of branch nodes and then the leaf nodes. 4 levels. 4 IOs to check if the index holds the value you are looking for. And then one or more IOs against the table to get the actual records from the table, if required (one block if there is only one row, one or more blocks if there are several rows).

The below is from a 25TB data warehouse system. I am looking for the maximum index depth on the database:

select max(blevel) from dba_indexes

MAX(BLEVEL)
-----------
          4

OK, how many indexes are this deep?…

select owner,index_name from dba_indexes
where blevel=4

OWNER                          INDEX_NAME
------------------------------ ------------------------------
ERIC                           W_IIIIIIII_URLS_PK

1 row selected.

Oh. I have a pretty massive database and the deepest index I have is BLEVEL 4, and there is one of them. So the index height is 5, and access to that table for a given referral ID is going to take 5 I/Os to work down the index and one against the table.

What is more, the index is an IOT

INDEX_NAME                     INDEX_TYPE
------------------------------ ---------------------------
W_REFERRAL_URLS_PK             IOT - TOP

So, in fact, to get to the table record it is just the 5 I/Os, as the table record is held in the IOT.

So with a 25TB data warehouse, the deepest normal indexes are BLEVEL 3.

select index_name, num_rows, leaf_blocks
from dba_indexes
where blevel = 3
order by num_rows desc

INDEX_NAME                             NUM_ROWS  LEAF_BLOCKS
-------------------------------- -------------- ------------
W_WL_SH_API_XIIIIIIIIIII_ID       4,585,108,192   13,406,015
W_WL_SHIIIIIIIIIIIXXXX_ID         4,564,350,002   16,892,898
W_WL_SIIIIIIIIIIIIIIIIIIIIII_ID   4,422,775,820   17,189,536
W_WL_SHIIIIIIIIIIIIIIIIIIIIII_ID  4,182,087,545   12,243,438
W_WL_GX_RESIIIIIIIIIII_LOG_ID1    3,690,374,216   14,613,388
IDX_W_WL_LIIIIIIIIIIIESS_ID         351,507,905    1,958,109
WL_LOGIN_SUIIIIIIIIIIIIE_ID         348,004,861    1,550,180
IND_IIII_LEG_ID                     312,727,000      972,690
IND_TMLR_MARIIIIIIIIIII_ID          306,988,247    1,121,592
PK_TBL_IIIIIIIIIIIUNNER             305,566,240      939,775
PK_IIIIIIIIIIIGS                    284,208,000      863,000

I asked an old friend, Tony Webb, to check out the depth of indexes on the largest databases they have, which range from a few TB to several 10s of TB {one is 100TB plus, but a lot of the data is in LOBs, so the actual number of records is only billions, not 10s of billions :-) }. No index had a BLEVEL greater than 3.

Hi Martin,

I ran the first query on DAVE, DEE(our biggest db), DOZY and MICK. The max depth results were 3,2,3 and 3. Our largest database only had 2. In fact some of the other largish dbs also only have a max depth of 2.

That enough for your blog Martin?

Now, some of you may be smelling a Rat, and you would be right. At both sites, the VLDBs have been designed with partitioning as a key part of the Physical Implementation. All the largest objects are partitioned. What impact does partitioning have on BLEVEL?

The depth of the index segment is dependent on the number of keys per block.

Let us say you have 8k blocks {this is a questionably low block size for a data warehouse, but it is what some of these systems have} and your index is on a couple of columns, totalling 20 bytes per key on average, including rowid. Allowing for block overhead, that is approx 8,000/20 = 400 entries max….

Blow that, that’s wholly theory. I’ll look at my real 25TB database for the number of rows per leaf block…

select leaf_blocks, num_rows/leaf_blocks rows_per_leaf
from dba_ind_statistics
where num_rows > 100000000
order by leaf_blocks desc

 LEAF_BLOCKS ROWS_PER_LEAF
------------ -------------
  30,211,949    325.602273
  28,717,447    262.762331
  26,308,608    382.505959
  25,429,862    385.047045
  24,054,626     309.92864
  23,660,747    382.571434
  22,458,058    411.063205
  22,316,976    346.620625
  18,875,827    379.952198
  17,451,901    338.600909
  17,189,536     257.29466

From the above let us take a low figure of 300 average entries per block. For a BLEVEL of 0 – a one-block index – that would be 300 rows that are referenced. For a BLEVEL 1 index, 300 entries in the root block will point to 300 leaf blocks, so that will be 300*300 rows… 90,000 entries.

Go down another BLEVEL to 2 (a height of 3) and you have a root node referencing 300 branch nodes, each referencing 300 leaf nodes, each referencing 300 records. 300*300*300 = 27 million.

Another level (BLEVEL 3), another factor of 300 and that is 8.1 billion.

BLEVEL 0= 300
BLEVEL 1 = 90,000
BLEVEL 2 = 27,000,000
BLEVEL 3 = 8,100,000,000
BLEVEL 4 = 2,430,000,000,000 (2,430 billion entries).

You can see that with an 8k block size, and being very pessimistic about the number of entries per block (I used a low average to calculate entries per block, whereas in reality most indexes will fill the root and branch nodes to close to full before throwing a new level), you have to increase your data volume by almost 300 times to throw a new level of index.

On average, knowing nothing about the actual number of records in an index, you would have to reduce your number of indexed rows by 150 times (half of 300) to have a 50% chance of lowering the BLEVEL of a locally partitioned index by one. I.e. 150 or more partitions.

With a 32k block size, you would have to reduce your data volume by more like 620 times to drop the BLEVEL (not 600, as you have less wastage from block overhead with larger blocks).

To reduce it by two it would be 300*150 I think, with an 8k block size {can any statisticians, like Mr Lewis, help me out on this one?}. If I am right, that is 45,000 partitions. In that 25TB data warehouse, no table has more than 4,000 partitions.

That is why I say, when you partition a table and have locally partitioned indexes, you might reduce the index level by 1. Might.

However, none of these systems has more than 4,000 partitions per table. That might sound a lot, but it is only going to reduce an index BLEVEL by 1. Almost certainly it will, but if you have read the previous postings, that is not going to really help index lookups be that much more efficient :-)

I’ll just add a couple of last comments.

  • If you have an index BLEVEL of 4 or 5 or more and do not have a VLDB, you might want to look at why {hints: it could be an IOT, it could be an index on several concatenated VARCHAR2 columns, it could be that you are using a 2k block size, it could be that you regularly delete a large percentage of your data and then re-insert more data, it could be you have a one-table database, hehe}.
  • Point one does not mean you should rebuild said index. I almost never rebuild indexes. The need to do so is one of those Oracle myths. There are situations where index rebuilds are advantageous, but they are pretty damned rare. If you regularly rebuild all your indexes or you are thinking of doing so, can I suggest you book a “meeting” for an afternoon and sit down with a cup of tea and google/bing/yahoo or whatever and check out index rebuilding first? Ignore any sites you land on offering database health checks or training courses on cruise liners.

That’s all for now.

Wrong way to Query on Dates – the Persistent Offender November 15, 2009

Posted by mwidlake in performance.
Tags: ,
13 comments

Some of you, hopefully most of you, will find nothing new in this little posting. Except maybe the realisation that you are not the only one utterly frustrated by the ever-recurring nature of this. Actually, I should say “frustrated by these”. There are two things here I keep coming across, year after year after year… Actually, three things. {Maybe I am frustrated too easily, I need a holiday….}

{Maybe I should start a “weekend rant” as well as a “Friday Philosophy” thread…}

OK, you have a massive ORDERS table with a column ORDER_DATE on it, type DATE. This column has a normal B-tree index on it. You want to query for all ORDERS placed on a date, say 1st November 2009, so you issue a query with the following WHERE clause:

WHERE ORDER_DATE = '01-NOV-09'

But you would not, would you? DATE columns can (and often do) contain a time portion. And in this case it does, so you would track down whoever specified this table, give them a “talking to”, get the column renamed to ORDER_DATETIME and use something like:

WHERE TRUNC(ORDER_DATETIME) = '01-NOV-09'

But you would not, would you? The function on ORDER_DATETIME will prevent the index being used (unless you added a function-based index, you clever so-and-so).
Now this is the first and second thing that I keep seeing and have to run through with people. Some people simply have not yet been told that functions on columns stop indexes from being used; others know this but do not quite understand that a TRUNC {or SUBSTR on the first X characters} still stops indexes being used by Oracle, even though they know that “functions stop indexes being used” – it seems sensible to them, as humans, that if they had a printed index they would not think twice about doing a range check with the index. But the CBO is not a human and is not so smart. So you explain this to them.
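
{As an aside, if you really do need to query on TRUNC(ORDER_DATETIME), a function-based index will support it – a sketch, with an invented index name:}

create index ord_fbi_dt on orders (trunc(order_datetime))
/
select count(*) from orders
where trunc(order_datetime) = to_date('01-NOV-2009','DD-MON-YYYY')
/

{For a plain date-range query though, the range predicate below is usually the better fix.}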

So maybe you get to the point where the following is the WHERE clause to use:

WHERE ORDER_DATETIME >= '01-NOV-09'
AND ORDER_DATETIME < '02-NOV-09'

That will work. It will use the index on ORDER_DATETIME, it will pass unit testing, it will get released and work fine.

Until someone runs this on a system with a different NLS_DATE_FORMAT.
Or in a session with a different NLS_DATE_FORMAT.
Or on an application server/middle tier where the NLS_DATE_FORMAT is different and it all falls over.
{and for NLS_DATE_FORMAT, keep in mind that NLS_LANG and NLS_TERRITORY can implicitly alter NLS_DATE_FORMAT, so anyone supporting database systems over national boundaries may well be nodding their heads sagely right now}.

It falls over because there is an implicit data conversion of the string '01-NOV-09' to a date. It uses your default date format, which is not the same on all databases, on all middle tiers or even in all sessions (I always alter my session to show the time portion and I seem to be one of the first to hit these issues, maybe as a result). It is far less common, but I come across this NLS issue at least once a year and usually it has been a bit of a major issue (like feeding back utterly the wrong data to a regulatory body. Ouch).
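
You can see the failure mode for yourself with something like the below {sketched against the ORDERS table from the example – depending on the format mismatch you get an error or, worse, a silently wrong date}:

alter session set nls_date_format = 'DD-MON-RR';
select count(*) from orders
where order_datetime >= '01-NOV-09';  -- happens to work

alter session set nls_date_format = 'YYYY-MM-DD';
select count(*) from orders
where order_datetime >= '01-NOV-09';  -- ORA-01861: literal does not match format string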

So, you need to explicitly state your character-to-date format conversion. And to all developers reading this: Yes, I know, it is a pain to type those extra bits, but you get paid to do it! :-)
WHERE ORDER_DATETIME >= TO_DATE('01-NOV-2009','DD-MON-YYYY')
AND ORDER_DATETIME < TO_DATE('02-NOV-2009','DD-MON-YYYY')

That will work and keep working.

So:

  • Call columns that hold a DATE with a time portion blar_DATETIME
  • Don’t have functions on the column, move them over to the other side of the WHERE statement(s)
  • Explicitly specify your date conversions. Always. Every time.

I know, I know, most of you knew all of that. Those of you who didn’t, well probably no one had told you before but now you have been told. And that is the heart of it, these things are easy to explain, easy to understand, but somehow seem to be missed when people are getting going with Oracle. They just need telling.

And I’m only able to tell so many. So, the rest of you, will you please tell all inexperienced Oracle people about these three things, as I don’t think I can face another 20 years of telling people these things :-)

{The serious final line is, there are lots and lots of interesting gotchas, what-ifs, quirks on things in Oracle, all of which are good to share, but perhaps the most important things to keep sharing are those things “we all know”, but those with less experience don’t.}

Friday Philosophy – Memory (of the Human Kind) November 14, 2009

Posted by mwidlake in Perceptions, Uncategorized.
Tags:
7 comments

One of the Joys of having been around for a while (I’ve hit that point where I occasionally find myself working with people who were not born when I got my first job in IT) is that you start to re-learn things. Well, I do. {I have to point out that, this being a Friday Philosophy, you will learn nothing yourself, especially not about Oracle, from this post. If you want Oracle knowledge you might want to pop by Dion Cho, Tanel Poder or Richard Foote.}

Several times over the last couple of weeks I have learnt something “new” and realised a little while later that I already knew it, but had forgotten (BLEVEL is one, consistent gets being impacted by array fetch size is another). This is not something to be proud of – not knowing something you once did is far worse, and slightly worrying, than never having known it…

I like to think that it is all the new stuff I am learning, pushing previous knowledge out of my brain’s buffer cache {if it is not warmed by re-using it}, but I suspect that it might just be an uptime-related memory leak. A bit like having your database open for a few months and doing a lot of work that includes massive PL/SQL in-memory tables and hash joins (pickler fetches). No one seems sure of the exact mechanics, but after a while you have a hell of a lot less spare memory in your server than you started with :-)

Maybe the memory loss is closer to the “pickler query” analogy than I thought, you can preserve soft tissue for a long time in alcohol. I’ll have another glass of wine and think on that.

This Forgetting Stuff was actually a major factor in my starting a blog. I was in the process of putting a load of notes on Oracle Oddities and things I had learnt from years past onto a web site so I could get at them from wherever I was in the world, and a friend told me I was being stupid – I should put them on a blog. So I did. There are only two little issues with this.

  • I can’t remember what I have already put on my blog.
  • I’ve forgotten where I put the ZIP disk with the notes on.

So I was contemplating this drift of knowledge and two things struck me.

1) I reckon that the very best people, in any area of expertise, are blessed with excellent memories. There is a comedian in the UK called Stephen Fry, and he is renowned for being stunningly clever. I think he probably is stunningly clever, but he also remembers everything he has learnt (Rory McGrath is another UK comedian with a perfect memory, but somehow he lacks the charm of Stephen Fry, and he has a beard, so not so highly renowned).
2) My poor memory is not maybe such a bad thing. I don’t have to get so upset when a whole chunk of my Oracle knowledge becomes obsolete. I used to be really good at sizing objects and utilizing space, taking into account the 5-block minimum, getting the extents to fit nicely into the datafiles, little scripts to resize segments into a small but sensible number of extents to reduce wasted space, considering initrans, pctfree, pctused… Pretty much all pointless now :-)

BLEVEL and Height of Indexes November 13, 2009

Posted by mwidlake in internals, performance.
Tags: , ,
6 comments

{Update – this post describes BLEVEL, demonstrates its growth with the number of index entries and the impact on index-to-table lookups. For a pretty graphic of BLEVEL try looking at this page}

I got something wrong on a couple of postings recently, namely the relationship between BLEVEL and the number of blocks needed to read “down” an index – the true depth or HEIGHT of the index. {I used to know this but I forgot; heck, no one pinged me on the two posts in question, so I got away with it :-) – I’ve updated the postings already.}

BLEVEL is the number of branch levels (including the root node) in a B-Tree index. Height is the actual depth of the index. Height is BLEVEL plus one. So when you see BLEVEL of 3 against an index in DBA_INDEXES/DBA_IND_STATISTICS, that means the index has a root node, a first level of Branch blocks, then a second level of Branch blocks and finally the Leaf blocks (which hold the indexed values and rowids to the table entries).

Thus to scan the index for one unique entry, Oracle will need to read the root node to locate the correct branch node in branch level one, read that to find the correct branch node in branch level 2 and that will lead to the correct leaf block. That is four blocks to read. The leaf block contains the index entry and the rowid of the relevant data block, which allows oracle to go directly to that block, for the fifth block read.

{I’m having trouble finding a nice diagram of this {{ I hate the one in the Oracle manuals}}, not even on Mr Foote’s or Mr Lewis’s pages, so if you spot one before I do, let me know and I’ll update this page with a relevant link}.
{Update 18 months later – I finally drew a nice diagram of the index-rowid-table_row path.}

Some documentation on the Web mentions HEIGHT being held in the index stats table. This is SYS.INDEX_STATS, not the DBA_IND_STATISTICS table, and SYS.INDEX_STATS is only populated when you run the old “ANALYZE INDEX index_name VALIDATE STRUCTURE” command, so ignore that. You should not really be using the old ANALYZE command any more.
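
For completeness, this is where HEIGHT comes from {shown so you recognise it, not as a recommendation – VALIDATE STRUCTURE locks the table against DML while it runs}:

analyze index tb_pk validate structure;

select name, height, lf_rows, lf_blks
from index_stats;

Note that INDEX_STATS holds just the one row, for the index you last validated in your session.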

The below demonstrates the increasing BLEVEL and the number of consistent gets to select one record {it’s more complicated if you select more than one}

NAME                           VALUE
------------------------------ --------------------
compatible                     10.2.0.3.0
cpu_count                      8
db_block_size                  8192
db_file_multiblock_read_count  16
optimizer_mode                 ALL_ROWS
sga_target                     0
sort_area_size                 65536

create table test_bl
 (id    number(8) not null
 ,status number(1) not null
 ,num_1     number(3) not null -- random 20
 ,num_2     number(3) -- random 20
 ,num_3     number(5) -- cycle smoothly
 ,num_4     number(5) -- cycle smoothly
 ,vc_1      varchar2(10)
 ,vc_2      varchar2(10)
 ,vc_pad varchar2(2000))
 tablespace users
 /
Table created.

insert into test_bl(id,status,num_1,num_2,num_3,num_4
                   ,vc_1,vc_2,vc_pad)
select rownum,decode(mod(rownum,100),0,1
               ,0)
,trunc(dbms_random.value(1,20))
,trunc(dbms_random.value(1,30))
,mod(rownum,10)+1
,mod(rownum,100)+1
,dbms_random.string('U',10)
,lpad(chr(mod(rownum,6)+65),5,chr(mod(rownum,6)+65) )
,lpad('A',100,'A')
from dba_objects
where rownum < 500
/
499 rows created.

commit;
Commit complete.

-- now add a PK on the ID
alter table test_bl
add constraint tb_pk primary key (id)
using index
tablespace users
 /
Table altered.

begin
  dbms_stats.gather_table_stats(user,'TEST_BL');
END;
/
PL/SQL procedure successfully completed.

select index_name,blevel,leaf_blocks
from dba_indexes
where owner=user
and index_name like 'TB%'
/

INDEX_NAME                         BLEVEL LEAF_BLOCKS
------------------------------ ---------- -----------
TB_PK                                   0           1

So I’ve created the test table, put 499 records in it and added the index, via a primary key constraint. The index created has one leaf block in it and a BLEVEL of 0.

Now let’s select a record via it {and the reason I put 499 records in the table is so that Oracle decides to use the index and not a full table scan, which would be a likely choice by the CBO with a very small table}.

set autotrace on
select vc_1 from test_bl where id=54
 /

VC_1
----------
BRFVRHEMWP
1 row selected.

Execution Plan
----------------------------------------------------------
| Id| Operation                   | Name    | Rows  |Bytes| Cost|
----------------------------------------------------------
| 0 | SELECT STATEMENT            |         |    1 |  15 |  1  (0)|
| 1 |  TABLE ACCESS BY INDEX ROWID| TEST_BL |    1 |  15 |  1  (0)|
|*2 |   INDEX UNIQUE SCAN         | TB_PK   |    1 |     |  0  (0)|
----------------------------------------------------------

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads
          0  redo size
        339  bytes sent via SQL*Net to client
        338  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

select vc_1 from test_bl where id=54
 /

VC_1
----------
BRFVRHEMWP
1 row selected.

Execution Plan
----------------------------------------------------------
|Id | Operation                   | Name    | Rows |Bytes| Cost
----------------------------------------------------------
| 0 | SELECT STATEMENT            |         |    1 |  15 | 1   (0)|
| 1 |  TABLE ACCESS BY INDEX ROWID| TEST_BL |    1 |  15 | 1   (0)|
|*2 |   INDEX UNIQUE SCAN         | TB_PK   |    1 |     | 0   (0)|
-----------------------------------------------------------


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          2  consistent gets
          0  physical reads

I generally run test selects twice, to remove any parsing and recursive SQL overhead {I’ll remove these from the rest of this post}. So, 2 consistent gets. That would be one on the index and one on the table then.
Note the cost of the index unique scan – 0. See end.

Now I’ll add more data and grow the index.

test102>set autotrace off echo off
insert into test_bl(id,status,num_1,num_2,num_3,num_4
                  ,vc_1,vc_2,vc_pad)
select rownum+500,decode(mod(rownum,100),0,1
              ,0)
,trunc(dbms_random.value(1,20)) 
,trunc(dbms_random.value(1,30)) 
,mod(rownum,10)+1
,mod(rownum,100)+1
,dbms_random.string('U',5)
,lpad(chr(mod(rownum,6)+65),5,chr(mod(rownum,6)+65) )
,lpad('A',100,'A')
from dba_objects
where rownum < 5500
/
5499 rows created.

begin
  dbms_stats.gather_table_stats(user,'TEST_BL');
END;
/
PL/SQL procedure successfully completed.

select index_name,blevel,leaf_blocks
from dba_indexes
where owner=user
and index_name like 'TB%'
/

INDEX_NAME                         BLEVEL LEAF_BLOCKS
------------------------------ ---------- -----------
TB_PK                                   1          11

So now we have a BLEVEL of 1 and 11 leaf blocks. That will be a root node and below it the leaf blocks. Let’s try a select:

select vc_1 from test_bl where id=454
/
VC_1
----------
IQGSEOCCCH

Execution Plan
----------------------------------------------------------
|Id | Operation                   | Name    | Rows  |Bytes| Cost|
----------------------------------------------------------
| 0 | SELECT STATEMENT            |         |     1 |  11 | 2   (0)|
| 1 |  TABLE ACCESS BY INDEX ROWID| TEST_BL |     1 |  11 | 2   (0)|
|*2 |   INDEX UNIQUE SCAN         | TB_PK   |     1 |     | 1   (0)|
-----------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads

3 consistent gets, one for the root node, one for the relevant leaf block and one for the data block holding the record.

Now I’ll insert about 400,000 more records to cause the index to become one level deeper. (You might be interested to know that 300,000 records was not enough to cause a layer of branch nodes to be created, though as I am indexing an ascending numerical column, each index entry is not exactly huge. This does show that the BLEVEL does not scale with data volume – but it ‘does not scale’ in a very beneficial way. You need to massively increase the volume of data between increasing BLEVELs.)

I will then select my record:

99999 rows created.
99999 rows created.
99999 rows created.
99999 rows created.

begin
  dbms_stats.gather_table_stats(ownname=>user,tabname =>'TEST_BL'
                                ,estimate_percent=> 10);
END;
 /
PL/SQL procedure successfully completed.

INDEX_NAME                         BLEVEL LEAF_BLOCKS
------------------------------ ---------- -----------
TB_PK                                   2         761

set autotrace on
select vc_1 from test_bl where id=454
 /

VC_1
----------
IQGSEOCCCH
1 row selected.

Execution Plan
----------------------------------------------------------
|Id | Operation                   | Name    | Rows  |Bytes| Cost|
--------------------------------------------------------
| 0 | SELECT STATEMENT            |         |     1 |  12 | 3   (0)|
| 1 |  TABLE ACCESS BY INDEX ROWID| TEST_BL |     1 |  12 | 3   (0)|
|*2 |   INDEX UNIQUE SCAN         | TB_PK   |     1 |     | 2   (0)|
---------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads

The index BLEVEL has gone up to 2 {index height is 3} and now 4 consistent gets are needed to fetch the record.

You may have noticed that the estimated cost of the INDEX UNIQUE SCAN is the same as the BLEVEL, which is not really correct. After all, in the first example the cost was 0 and there has to be a read of the index leaf block! The costing makes more sense when it is part of the calculation for scanning an index and then visiting the table for all found records:

“basic index range scan cost = index blevel + ceil(index selectivity x leaf blocks) + ceil(table selectivity x clustering factor)”

In words this formula means “go down the index to the leaf nodes (this is the BLEVEL), scan the number of leaf nodes expected for this index value, then visit the number of table blocks this set of index entries would map to”.
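
As a worked example with invented numbers: say the BLEVEL is 2, the index has 10,000 leaf blocks, the clustering factor is 500,000 and the predicate selects 0.1% of the rows. Then:

cost = 2 + ceil(0.001 * 10,000) + ceil(0.001 * 500,000)
     = 2 + 10 + 500
     = 512

The visit-the-table element dominates, which is why the clustering factor matters far more than the BLEVEL for range scans.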

For more information on the formula, I’d plug part of that formula into google (or bing or whatever takes your fancy, search-engine-wise). The original is this page by Richard Foote but there are some good notes by others as well.

There are a lot of references on the web about the cost of accessing an index being the BLEVEL but remember, if it is a unique access, it is the BLEVEL plus one, and Oracle seems (in my little tests anyway) to underestimate the cost by 1. I think this reference to the BLEVEL and the costs might be leading people into mistaking the BLEVEL for the actual height of the index.

Partitions are Still Not for Performance – Sometimes November 5, 2009

Posted by mwidlake in performance.
Tags: ,
6 comments

This is a follow up to my first posting on why Partitions are not for Performance where I discuss how lookups against a partitioned table with a locally partitioned index and no partition pruning can lead to performance problems. Basically, the CBO ends up scanning all local indexes to find the record(s) you want, which with a hundred partitions will probably result in several hundred buffer gets instead of 4 or 5.

The first posting dealt with the simple situation where a single record is being selected but this lack of partition pruning also crops up with range scans of course.

{warning, this is a long post, but if you use partitions you need to understand this}.
{Caveat: it has been pointed out that the last posting, and by association this one, appears partition-negative. I am not partition-negative; partitions are great, but for reasons generally other than performance and, to improve performance, you need to understand how partitions fit in with the CBO}

Imagine you have an unpartitioned ORDERS table with 10 million rows, the primary key is the traditional ascending numeric, derived from a sequence.
About 5,000 orders a day are received; the table goes back around 10 years (in the past, fewer orders per day were received).
There is also an ORDER_DATETIME column, a DATE type column holding the date and time the order was placed. This column has a traditional index on it, which has a B-level of 3.

You want to query for all ORDERS placed on a date, say 1st November 2009, so you issue a query with the following WHERE clause:
WHERE ORDER_DATETIME >= TO_DATE('01-NOV-2009','DD-MON-YYYY')
AND ORDER_DATETIME < TO_DATE('02-NOV-2009','DD-MON-YYYY')

That will select around 5,000 out of 10 million rows and so a nested loop lookup on the index is the most suitable {just accept this for now, OK?}

The CBO will work down the levels of the index on ORDER_DATETIME to the first record of interest, taking 4 I/Os. It will then scan the index for all entries in that range, so maybe 10 leaf blocks, and then read the database blocks for each one – 5,000 I/Os against the table, but as the data will be clustered well, most reads will be to the same, say, 100 blocks, so they will be cached in the block buffer cache. {Under Oracle – to 11.1 anyway – all I/O is treated as physical, so the default optimizer will calculate the cost based on this, but in reality most I/O will be logical I/O, not physical I/O}.

So the real cost will be 4 I/Os to find the start of the index to scan, 10 I/Os to scan the index and 5,000 I/Os to get the table data – 5,014 logical I/Os, with between 0 and (4+10+100 = 114) physical I/Os to get the data into the cache, depending on what was cached already.

OK, remember those figures…

Now we partition the table by the ORDER_DATETIME column into 100 partitions and the index on ORDER_DATETIME is locally partitioned of course. Your same WHERE clause on ORDER_DATETIME will now resolve to a single index partition.

If you are lucky the local index partition will have a B-level of 2, 1 less than the standard non-partitioned index. So, with the same WHERE clause, the CBO will work down the local index partition to the first record of interest, for 3 I/Os. Then it will scan the index for the same 10 leaf blocks and find 5,000 records to check in the table partition, with the same 5,000 I/Os. You have saved….Yep, 1 logical IO. Physical IOs are between 0 and (3+10+100 =113). Let’s be kind and assume you saved a physical IO.

In this case, partitioning saved 1 out of 5,014 logical and 1 out of 114 physical I/Os.

Partitioning, in this case, is performance agnostic. It matters very, very little.

In reality, if your application is working on the same day of data over and over again, using the one partition, then that set of data will be cached and a nested-loop access to it will find the data in memory – but that would be true of the standard table and index too :-) So still no gain…

I've assumed a nested-loop access path is still used by the CBO when accessing your partition. This is where things can be very different. {And it is also where I should swap from theory to creating some test data to prove it, but I'll hang myself out first and then look like an idiot next week when I am back with my test system :-)}

Let's estimate that the unpartitioned table is 100,000 blocks (100 rows per block).
With 5,014 I/Os against a 10-million-row table/index, a nested loop lookup is likely to be chosen, as the CBO estimates the number of I/Os to be less than that for scanning the whole table (100,000 blocks divided by the actual multi-block read count {which is not db_file_multiblock_read_count, but I'm skipping over that detail}, let's say 12, for around 8,500 I/Os).

With the table split into 100 partitions, there are 1,000 blocks in each partition. The whole partition can be scanned with 1,000/real-multi-block-read-count {12} I/Os – so 80 or so.

So in reality your partitioning MAY be beneficial, as the CBO decides to forget the index and nested loop lookups and simply scans the whole partition into memory for maybe 80-100 I/Os using multi-block reads, then processes the data in memory, burning some CPU as well, to find the records you want.
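If you want to see which way the CBO has jumped on your own system, the execution plan will tell you. A quick sketch using the standard DBMS_XPLAN package, against my made-up partitioned table – look for a TABLE ACCESS FULL under a PARTITION RANGE SINGLE step, with matching Pstart/Pstop values, rather than an INDEX RANGE SCAN:

EXPLAIN PLAN FOR
SELECT *
FROM   orders_part
WHERE  order_datetime >= TO_DATE('01-NOV-2009','DD-MON-YYYY')
AND    order_datetime <  TO_DATE('02-NOV-2009','DD-MON-YYYY');

SELECT * FROM TABLE(dbms_xplan.display);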

So, instead of 5,013 I/Os the CBO decides to do 80-100 I/Os and some extra memory/CPU work, which is cheap given the power of modern CPUs.

So, partitions may help performance as the CBO swaps from nested loop lookups with an index to simply full scanning the partition.

You may start to wonder if that index on ORDER_DATETIME helps much… If it is there to support range scans returning a significant number of records, maybe not; but if it supports single-record or small-range lookups, the index still helps.

I don’t know about you, but my brain is full for the night. I’ll leave this for another day….

Buffer Cache Hit Ratio – my “guilty” Confession November 1, 2009

Posted by mwidlake in Perceptions, performance.
Tags: , ,
14 comments

My Friday Philosophy this week was on Rules of Thumb on buffer gets per row returned.

Piet de Visser responded with a nice posting of his own, confessing to using ratios to help tuning {We seem to be playing some sort of blog-comment tag team game at the moment}.

Well, I have a confession so “guilty” or “dirty” that I feel I cannot inflict it on someone else’s blog as a comment.

I use the Buffer Cache Hit Ratio.

And the Library Cache Hit Ratio and the other Ratios.

As has been blogged and forum'd extensively, using these ratios is bad and stupid, and anyone doing so does not know what they are doing, as the ratios do not help you solve performance problems. I mean, hell, you can download Connor McDonald's/Jonathan Lewis's script to set it to whatever you want, so it must be rubbish {go to the link and choose "tuning", then pick "Custom Hit Ratio" – it's a rather neat little script}.

The point I am trying to make is that the Buffer Cache Hit Ratio (BCHR) was once wrongly elevated to the status of a vital piece of key information, but the reaction against that silly situation has been to swing the other way, so that it is now viewed by many (I feel) as the worst sort of misleading rubbish. Again, a silly situation.

I think of the BCHR as similar to a heart rate. Is a heart rate of 120 good or bad? It's bad if it is an adult's resting heart rate, but pretty good if it is a kitten's resting heart rate. It's also probably pretty good if it is your heart rate as you walk briskly. Like the BCHR, it can be fudged. I can go for a run to get mine higher; I could drain a couple of pints of blood from my body and it would go up {I reserve the right not to prove that last one}. I can go to sleep and it will drop. Comparing my resting heart rate to yours (like comparing BCHRs between systems) is pretty pointless, as I am a different size, age and metabolism to you {probably}, but looking at mine over a year of dieting and exercising is very useful. If only I could keep up dieting and exercising for a year…

So what do I think the much-maligned Buffer Cache Hit Ratio gives me? It gives me the percentage of block accesses, across the whole database activity, that are satisfied from memory as opposed to disc. Or, put the other way around, 100 minus the BCHR is the percentage of occasions a block has to be fetched from the I/O subsystem. It does not tell me how many blocks are read from storage or memory, but you can get that information easily enough. As physical I/O is several orders of magnitude slower than memory access {ignoring I/O caches, I should add}, it gives me an immediate feel for where I can and can't look for things to improve.
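{For reference, the classic derivation from V$SYSSTAT is below. It is only one of several variants, the counters are cumulative since instance startup (so for a live view you really want the delta between two samples), and it ignores things like direct-path reads – all part of why the number is a heart rate, not a diagnosis:}

SELECT ROUND(100 * (1 - phy.value / (cur.value + con.value)), 2) AS bchr_pct
FROM   v$sysstat cur
,      v$sysstat con
,      v$sysstat phy
WHERE  cur.name = 'db block gets'
AND    con.name = 'consistent gets'
AND    phy.name = 'physical reads';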

If I am looking at a system that is overall very slow (e.g. high process wait queues under Linux/Unix, or the client has said the system is generally slow) and I see that the BCHR is low, say below 90%, this tells me I can probably get some performance increase by reducing physical access. I'll go and look for the statements with the highest physical I/O and the hottest tablespaces/objects in the DB.
If the BCHR is already up at the 99% level, I need to look at other things, such as tuning sorts, removing unnecessary activity in the database, and being very mindful of nested loop access where it may not be the best access method (very likely due to old stats on tables).

When I have got to know a system and what its BCHR generally sits at, a sudden change, especially a drop, means there is some unusual physical I/O going on. If the phones start ringing and someone is complaining "it's all slow", the BCHR is one of the first things to look at – especially as it is available from so many places.
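{If you have the Diagnostics Pack licence, something like the below will trend the BCHR across AWR snapshots so you can see what "normal" looks like for your system. A sketch only – it assumes a single instance and one database in the repository, and it ignores instance restarts, which would skew the deltas:}

SELECT sn.end_interval_time
,      ROUND(100 * (1 - (phy.value - LAG(phy.value) OVER (ORDER BY sn.snap_id))
                      / NULLIF(  (cur.value - LAG(cur.value) OVER (ORDER BY sn.snap_id))
                               + (con.value - LAG(con.value) OVER (ORDER BY sn.snap_id)), 0)
                   ), 2) AS bchr_pct
FROM   dba_hist_snapshot sn
,      dba_hist_sysstat  cur
,      dba_hist_sysstat  con
,      dba_hist_sysstat  phy
WHERE  cur.snap_id = sn.snap_id AND cur.stat_name = 'db block gets'
AND    con.snap_id = sn.snap_id AND con.stat_name = 'consistent gets'
AND    phy.snap_id = sn.snap_id AND phy.stat_name = 'physical reads'
ORDER  BY sn.snap_id;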

Another thing the BCHR gives me: if I am looking at a given SQL statement or part of an application, its specific BCHR can be compared to the system BCHR. This does not help me tune the statement itself, but if its specific BCHR is low then I know it has unusually high I/O demands compared to the rest of the system. Further, reducing that I/O might help the whole system, so I might want to keep an eye on overall system throughput. If I reduce the statement's execution time by 75% and the whole system I/O by 1%, the client is likely to be more happy, especially if that 1% equates to other programs running a little faster "for free".
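{A quick way of getting the statement-level figure is from V$SQL, which tracks buffer gets and disk reads per cursor. Another sketch, with a made-up threshold to filter out trivial statements:}

SELECT sql_id
,      buffer_gets
,      disk_reads
,      ROUND(100 * (1 - disk_reads / NULLIF(buffer_gets, 0)), 1) AS stmt_bchr_pct
FROM   v$sql
WHERE  buffer_gets > 10000   -- arbitrary cut-off, ignore trivial statements
ORDER  BY disk_reads DESC;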

So, I don't use the BCHR to tune individual statements, but I feel confident using it to track the general health of my database, so long as I am mindful of the impact of new functionality or upgrades. It's a rule of thumb. It's a database heart rate {and so are redo generation and half a dozen other things}.
