Headlong rush to Conference – Preparing the Presentations November 29, 2011

Posted by mwidlake in Meeting notes, UKOUG.

With only a few days to go before the UKOUG conference this year I’m preparing my presentations. I know pretty much what I want to say and, for the IOT talk at least, it is not as if I do not have enough material already prepared – some of which has been on the blog posts and some of which has not. (though it did strike me that I could just fire up the blog and walk through the thread, taking questions).

My big problem is not what to say – it is what not to say.

I’ve always had this problem when I want to impart knowledge: I have this desire to grab the audience by the throat, take what I know about the subject and somehow just cram the information into the heads of the people in front of me. All of it. I want them to know everything about it that I know: the core knowledge, the oddities, the gotchas, how it meshes with other topics. It’s ridiculous of course. If I’ve spent many hours (days, weeks, 20 years) acquiring experience, reading articles and learning, I can’t expect to pass all that on in a one-hour presentation – especially as I like to provide proof and examples for what I say. But I think the desire to do so is part of what makes me a good presenter and tutor. I bounce around in front of the audience, lobbing information at them and constantly trying to judge whether I need to back up and give more time to anything or whether I can plough on, skipping the basics. Hint: if you are in the audience and I’m going too fast or garbling my topic, I am always happy to be asked questions or told to reverse up a little. I’ve never been asked to speed up though :-)

It gets even worse. If I am putting myself up there to talk about a topic then I don’t want to be found wanting. I want to be able to handle any question and have a slide or example up my sleeve to demonstrate it. It’s exhausting and, again, pointless. At somewhere like the UKOUG there is bound to be someone who knows something I don’t know about any topic.

For me the trick is to pare it down, to keep reminding myself that if the audience leaves with more knowledge than they came in with, that is a win. If they actually enjoyed the experience I’m even happier. Maybe I should forget the topic and just take drinks and nibbles…

So, I’m currently doing what I always do, which is trying to force myself to remove stuff that is not directly relevant whilst still leaving a few little oddities and interesting items. Plus getting the 200 slides down to something more reasonable – like say 120 :-)

If I can get it down to one slide per minute (some of which I skip on the day as they are there for anyone downloading the talk) then I’m OK.

Of course, having done this, the day before the course I’ll do one last “final review” – and add a couple of dozen slides to just clarify a few points…

Lack of Index and Constraint Comments November 24, 2011

Posted by mwidlake in Architecture, database design, development.

Something I’ve just reminded myself of is that under Oracle you cannot add a comment on an index or a constraint. You can only add comments on tables, views, materialized views, columns of those object types and a couple of esoteric things like Operators, Editions and Indextypes.

Here is an example of adding comments to tables and columns:

set pause off feed off
drop table mdw purge;
create table mdw(id number,vc1 varchar2(10));
comment on table mdw is 'Martin Widlake''s simple test table';
comment on column mdw.id is 'simple numeric PK sourced from sequence mdw_seq';
comment on column mdw.vc1 is 'allow some random text up to 10 characters';
desc user_tab_comments

 Name                                                  Null?    Type
 ----------------------------------------------------- -------- ------------------------------------
 TABLE_NAME                                            NOT NULL VARCHAR2(30)
 TABLE_TYPE                                                     VARCHAR2(11)
 COMMENTS                                                       VARCHAR2(4000)

select * from dba_tab_comments where table_name='MDW'
OWNER                          TABLE_NAME                     TABLE_TYPE
------------------------------ ------------------------------ -----------
MDW                            MDW                            TABLE
Martin Widlake's simple test table

select * from dba_col_comments where table_name='MDW'
order by column_name
OWNER                          TABLE_NAME                     COLUMN_NAME
------------------------------ ------------------------------ --------------
MDW                            MDW                            ID
simple numeric PK sourced from sequence mdw_seq
MDW                            MDW                            VC1
allow some random text up to 10 characters
-- now to add a big comment so need to use the '-' line continuation character in sqlplus
comment on table mdw is 'this is my standard test table.-
 As you can see it is a simple table and has only two columns.-
 It will be populated with 42 rows as that is the solution to everything.'
select * from dba_tab_comments where table_name='MDW'
OWNER                          TABLE_NAME                     TABLE_TYPE
------------------------------ ------------------------------ -----------
MDW                            MDW                            TABLE
this is my standard test table.  As you can see it is a simple table and has only two columns.  It w
ill be populated with 42 rows as that is the solution to everything.

Adding comments on tables, views and columns seems to have dropped out of fashion over the years but I think it is still a very useful feature of Oracle and I still do add them (though I have been getting a little slack about it myself over the last 3 or 4 years, which I must stop).

Comments are great. You can put 4000 characters of information into the database about each table, view and column. This can be a brief description of the object, a full explanation of what a column is to hold or even a list of typical entries for a column or table.

But you can’t add a comment on indexes or constraints. Why would I want to? Well, constraints and indexes should only be there for a reason and the reason is not always obvious from either the names of the columns or the name of the constraint or index, especially where you have a naming standard that forces you to name indexes and constraints after the columns they reference.

When you design a database, do a schema diagram or an ERD, you label your relationships between entities/tables. It tells you exactly what the relationship is. You might create an index to support a specific method of access or particular business function. You might alter the index in a way not immediately obvious to the casual observer, such as to allow queries that use the index to avoid having to visit the table. All of those things will, of course, be fully documented in the maintained project documentation in the central repository, available and used by all…

If I were able to add comments to constraints and indexes within the database then they would be there. You move the system from one platform to the other, they are there. If for any wildly unlikely reason the central documentation lets you down, the information is always there in the database and easy to check. You may not be able to track down the original design documents but you have the database in front of you, so comments in that will persist and be very easy to find.

Lacking the ability to add comments on indexes and constraints, I have to put them at the table level, which I always feel is a kludge. I might actually raise an enhancement request for this, but as Oracle 12 is already nailed down, it will have to wait until Oracle 14. (A little bird told me Larry said there would be no Oracle 13…).
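
A minimal sketch of that table-level kludge. The table name mdw_demo, the index name mdw_demo_vc1_idx and the wording of the comment are all made up for this illustration; only the COMMENT ON TABLE statement and the USER_TAB_COMMENTS view are standard Oracle.

```sql
-- mdw_demo and mdw_demo_vc1_idx are hypothetical names used only for this sketch
create table mdw_demo(id number,vc1 varchar2(10));
create index mdw_demo_vc1_idx on mdw_demo(vc1);

-- there is no COMMENT ON INDEX, so the note about the index
-- is carried in the table-level comment instead
comment on table mdw_demo is 'Simple test table. INDEX MDW_DEMO_VC1_IDX: supports lookups by VC1 for the name-search screen.';

-- the note can then be read back alongside the table description
select comments from user_tab_comments where table_name='MDW_DEMO';
```

The drawback is the one described above: the note lives with the table, not the index, so it has to be kept in step by hand if the index is renamed or dropped.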

How deep to dig – Another Opinion and Another Good Blog November 22, 2011

Posted by mwidlake in Blogging, UKOUG.

I think I’ve posted before about how deep a good DBA should dig into solving issues, as opposed to fixing them as soon as possible and moving on to the next urgent task.

Well, a friend of mine, Neil Chandler, has just posted on this topic, giving his reasons why you don’t run a 10046 trace on production. Neil raises some good points about how difficult it can be to get permission to do something as intrusive as a 10046 trace on a production system as well as the fact that most problems can be solved way before you get down to the level of tracing. Especially if it is not your job to go around solving the problems that have stumped the in-house team, which is the lot of many people who are recognised as being very good with Oracle.

That leads me onto a slightly different topic. For every one of those technicians whose names you know there are at least a dozen who are just as good but not as visible. Some of us choose to make more “noise” {blogging, presenting, writing articles} so we become visible. Others also support the user community but in a less noticeable way, some even positively choosing to keep a lower profile. Neil is one of those. He’s a very good Oracle and SQL*Server DBA and also very knowledgeable about Unix OSs and SANs/storage {though he would maybe argue he is not – don’t believe him}. When he comes along to the pub his is an opinion worth listening to {once he’s finished demeaning me that is, and I’m sure he will give me a hard time about this article about him next time we meet} and when I’ve asked his opinion he has never failed to help. Neil also supports the UKOUG, he’s been deputy chair of one of the SIGs since its inception – but he refuses to be chair, has kept away from presenting and he keeps in the background.

So, I was very happy when Neil started blogging. It is a mixture of technical stuff and observations on the IT world, all written in a very comfortable style – look back at his previous post on timestamps not being impacted by FIXED_DATE. A nice piece of information to tuck into your bag of “things to be aware of”.

So, a belated welcome to the world of blogging Neil. I think it is a blog worth watching.

Friday Philosophy – OK, so I am on Twitter Now November 18, 2011

Posted by mwidlake in Friday Philosophy, Private Life, Twitter.

Not a very exciting Friday Philosophy this week I’m afraid, just a self-publicising announcement that I am now on Twitter. I’ve put the WordPress widget on the blog for a while (days or weeks, I don’t know); my Twitter name is MDWidlake. {I was a little surprised mwidlake had gone already but that says more about how rare I consider my surname to be than naivety, I hope}. It seems you can click on a part of the widget to follow me, which is a pretty safe thing to do as I am not very verbal as yet.

As I said, I’m not very active at the moment, I’m more following just a few friends and seeing what people use Twitter for. So far it mostly seems to be about:

  • Random stuff posted when bored
  • Complaining about work or, more specifically, tasks that are proving trickier than hoped
  • Drinking
  • Random stuff posted when bored
  • Articles that have caught someone’s eye
  • …or more often, about tweets about articles that have caught someone’s eye
  • Chatty stuff that only makes sense between social peers (and isn’t that one of the main points of something like Twitter?)
  • Random stuff posted when bored
  • Cuddly toys. I think that is a result of low sample size and that Doug Burns is away at a conference. I worry about his sanity sometimes.

Niall Litchfield, Neil Chandler and Doug Burns were right {thanks again for your advice, gents}, there is some nice stuff on there and I’ve already seen some articles and web pages I found interesting via it – but I have also failed to get on with proper work-like stuff I should have been doing as a result.

I also like the chatty extension to real social engagement that Twitter gives but I hold out on my final decision as to whether this makes up for the negative impact it seems to have on real, meeting-in-person socialising.

The interface to Twitter seems a bit, well, rubbish to me. I know, I’ve been on there for all of a week and I am probably missing the Bleedin’ Obvious but it seems the stuff I see in Timeline, the default view, is just a subset of what people I follow say. I suspect that it’s got something to do with whether the person the tweet is replying to is on my follow list. To understand half the social stuff you have to go clicking around on people’s full tweet history and follow the thread back. Surely there is an easier way than this, maybe some connect-by tree-walk SQL could be invoked…

I’ve already dropped one person off my “following” list. I only followed one celebrity and I decided I could live without the random musings of Simon Pegg. I can imagine people get addicted to following several dozen b to z level celebs, maybe it’s like constantly living in some sort of poor quality reality tv show {Personally I tend to avoid all reality TV, I prefer reality. Except that I am forced to watch that dancing thing on BBC by my wife. And like most men who make that sort of defence, I can’t quite explain away why I still watch it if she is away…}.

So, don’t expect too much in the way of interesting, witty, insightful or even existing tweets from me as yet, but if you want to follow, heck you can always drop me like a sack of manure any time you like :-).

Skipped Initial Sequence Value on Oracle 11.2 November 17, 2011

Posted by mwidlake in bug, Testing.

I’m seeing an oddity with newly created sequences skipping the initial value. ie I create the sequence and the first use of it returns the value 2, not 1.

{update, see the post comments and this post by Niall – this is a side effect of delayed segment creation – the first insert silently fails, but gets the sequence, and is repeated behind the scenes once the segment is there. It is fixed in, my db is at as I got sick of the download of the latest version failing on my poor broadband connection and got it off a mate – turns out it is not the latest version.
Thanks for the enlightenment Niall.}

This is on Oracle on Windows (XP, Vista and now Windows 7 64 bit). I am sure I did not see this on Oracle 10.1 and 10.2 on Linux, Tru64 and Windows.

I create a set of test tables for when I run courses or just want to test some things, a few tables holding people, names, addresses, phone calls and monthly call summaries. I do this with a bunch of pretty crude scripts that create the data in steps. Only on Oracle 11.2 it all stopped working. I tracked down the problem to the skipping of the first value for the first sequence I create and use. I’ve just confirmed this on a new server I have created.

I’ve boiled it down to the following test case. I ensure my test table and two sequences are not there, create them and then insert 10 records for men and 10 for women. For the women the records are numbered 1 to 10. For the men they are numbered 2 to 11!!!

My code:

-- seq_oddity.sql
-- BUG ON ??
-- though I create both sequences afresh, the first use of seq_m is getting 2, yes two!
-- cannot work out why, so fixed by setting the row with forn_id of 130 to 1.
set timi off
set serveroutput on
spool seq_oddity.lst
drop table test_fn purge;
drop sequence seq_m;
drop sequence seq_f;
create table test_fn
(forn_id number(5) not null
,forname varchar2(30) not null
,sex_ind char(1) not null)
tablespace users;
create sequence seq_m;
create sequence seq_f;
insert into test_fn values (seq_m.nextval,'ALAN','M');
exec dbms_output.put_line ('I have just created male name number '||seq_m.currval);
insert into test_fn values (seq_m.nextval,'BARRY','M');
insert into test_fn values (seq_m.nextval,'CHRIS','M');
insert into test_fn values (seq_m.nextval,'DAVID','M');
insert into test_fn values (seq_m.nextval,'EDWARD','M');
insert into test_fn values (seq_m.nextval,'JANG','M');
insert into test_fn values (seq_m.nextval,'GARY','M');
insert into test_fn values (seq_m.nextval,'HARRY','M');
insert into test_fn values (seq_m.nextval,'IAN','M');
insert into test_fn values (seq_m.nextval,'JAMES','M');
exec dbms_output.put_line ('I created 10 men and last value was '||seq_m.currval);
insert into test_fn values (seq_f.nextval,'ALISON','F');
exec dbms_output.put_line ('I have just created female name number '||seq_f.currval);
insert into test_fn values (seq_f.nextval,'BARBARA','F');
insert into test_fn values (seq_f.nextval,'CHERYL','F');
insert into test_fn values (seq_f.nextval,'DAWN','F');
insert into test_fn values (seq_f.nextval,'ELAINE','F');
insert into test_fn values (seq_f.nextval,'FRANCIS','F');
insert into test_fn values (seq_f.nextval,'GILLIAN','F');
insert into test_fn values (seq_f.nextval,'CHERRY','F');
insert into test_fn values (seq_f.nextval,'INGRID','F');
insert into test_fn values (seq_f.nextval,'JANET','F');
exec dbms_output.put_line ('I created 10 women and last value was '||seq_f.currval);
select sex_ind,min(forn_id),max(forn_id)
from test_fn
group by sex_ind
/
spool off

The results are:

mdw11> @seq_oddity
drop table test_fn purge
ERROR at line 1:
ORA-00942: table or view does not exist

drop sequence seq_m
ERROR at line 1:
ORA-02289: sequence does not exist

drop sequence seq_f
ERROR at line 1:
ORA-02289: sequence does not exist

Table created.

Sequence created.

Sequence created.

1 row created.

I have just created male name number 2

1 row created.
1 row created.
1 row created.

I created 10 men and last value was 11

1 row created.

I have just created female name number 1

1 row created.
1 row created.
1 row created.
I created 10 women and last value was 10

- ------------ ------------
M            2           11
F            1           10
2 rows selected.


I suppose I should trace this and check for known Oracle bugs but I wondered if anyone else had seen it.
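
The update at the top of this post puts the skipped value down to delayed segment creation. One way to test that explanation is to make sure the segment exists before the first insert. The following is a minimal sketch, assuming Oracle 11.2, where the SEGMENT CREATION clause and the deferred_segment_creation parameter are available. It reuses the table and sequence names from the test case above and has not been run as part of this post, so no output is shown.

```sql
-- Option 1: switch deferred segment creation off for the session
-- before any of the test tables are created
alter session set deferred_segment_creation = false;

-- Option 2: ask for the segment up front on the one table
drop table test_fn purge;
drop sequence seq_m;
create table test_fn
(forn_id number(5) not null
,forname varchar2(30) not null
,sex_ind char(1) not null)
segment creation immediate
tablespace users;
create sequence seq_m;
insert into test_fn values (seq_m.nextval,'ALAN','M');
-- if the explanation holds, this should now report 1 rather than 2
select min(forn_id) from test_fn;
```

Either option on its own should be enough; the session setting is the less intrusive choice when a whole set of creation scripts is affected.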

What Have I Let Myself in For! – UKOUG this year November 16, 2011

Posted by mwidlake in development, Meeting notes, UKOUG.

One of my favourite Oracle happenings of the year is fast approaching, the UK Oracle User Group technical conference {see/click on the link on the right margin}. I’ve blogged before (like here, last year) why I think it is so good.

I try and present at the conference each year and I go no matter if I am presenting or not.

However, this year I think I might have got myself into trouble. I put forward 3 talks, expecting one or possibly two to get through. One on Index Organized Tables, one on IT disasters and one as an introduction to database design – I’ve moaned about it being a dying art so I figured I should get off my backside and do something positive about it. Each talk is in a different stream.

Well, the IOT talk was accepted, the Disasters talk was rejected and the Database Design intro was put on the reserve list. I was happy with that. I did three talks the first year I presented and promised myself never to be that stupid again {I spent most of the conference in the Speaker’s lounge or my hotel putting the talks together and tweaking slides}.

What I was not expecting was for the OakTable to ask me to do the IOT talk on the OakTable Sunday. Yikes! {The OakTable Sunday is a great opportunity to see a set of presentations by people who really know their stuff in a smaller setting – You really want to get along to it if you can}. However I had two reasons not to do it:

  1. I would have to miss one of the other OakTable talks.
  2. That thing I said about people presenting who really know their stuff.

I was told that (1) was not a problem as the talks would be repeated in the main conference so I would have an opportunity to see the one I missed and (2) stop being so British and do it. {In fact, one friend on the OakTable told me off after the last conference for my criticism of my own presentation that year – “yes it was poor for you but normally you do a good job, so keep doing it”}. Of course I said yes.

Then it struck me, I was presenting twice now. Once on Sunday and repeating on Wednesday in hall 5 {I’ll probably not simply repeat the contents, at the OakTable Sunday I’ll assume a little more knowledge by the audience and dig a bit deeper technically, in the main conference I’ll cover off the basics more, for those utterly new to IOTs}. At least it was only one set of slides to prepare.

A few days later I get a mail from the UKOUG office. A gap had appeared in the Development stream, would I be willing to do my “Oracle Lego – an introduction to database design” talk – but beef it up a little? Yes, sure. What do you mean about beef it up? The dev stream guys wanted something that went into more detail, was more about some of the more challenging systems I’ve worked on. So we exchanged a few emails and it quickly became apparent that some wanted the intro talk I had originally proposed, to get people going with database design. Others felt there would be more audience for a more in-depth talk, so could I span both? I had to say no. I remember attending my Oracle database design course in 1993. It was 5 days long. If my memory serves there was also a second course a couple of weeks later that covered more advanced design for 3 days! I can talk fast but not 8 days fast. They were effectively asking for two quite different presentations, an intro and then a review of more challenging examples. “OK” they said, “do Oracle Lego – But if another gap comes up, could you do the intermediate talk?”. Err, OK… So I wrote a quick synopsis for “Oracle Meccano” {Meccano is a toy construction kit made up of miniature girders, plates, bolts and stuff you can make proper things out of. If you liked Lego you would love Meccano as you got older}.

Since then I have been slightly anxious about getting an email from the UKOUG about a gap in the development stream for the conference…

This week I have started preparing the presentations for real {which so far has resulted in me breaking my server, finding a load of notes on blogs I was going to write and then doing this post} so I contacted the ladies in charge of the agenda and asked if I was now off the hook for the Oracle Meccano talk? “Yes, no more gaps, it is not on the agenda”. Phew. “But could you put it together in case of last minute cancellations?”. *sigh*. OK.

So I will, but I’m not signing up to do any Session Chairing, which I was about to. If you see me at the conference and I look a little crazed, it’s because I got a mail from the UKOUG just before the event about a sudden gap…

At least there is no chance I will be asked to do the Disasters talk at short notice, I saw the scores it got by the paper reviewers :-).

IOT Part 6(B) – OLTP Inserts into an IOT November 10, 2011

Posted by mwidlake in development, performance, Testing.

<..IOT1 – the basics
<….IOT2 – Examples and proofs
<……IOT3 – Significantly reducing IO
<……..IOT4 – Boosting Buffer Cache efficiency
<……….IOT5 – Primary Key Drawback
<…………IOT6(A) – Bulk Insert slowed down

IOT 6 part A covered the sort of insert work used to create an IOT or push a large number of rows into it via an “insert into…select from” type format. ie Bulk Load.
In this, part B, I am going to test inserting a small number of rows (10,000) into a table already holding a million rows. Something more like an hourly load or the inserts that would occur over a working day – a more OLTP-like test.

I’m going to do the tests against four things. A “non-insert” test to ensure my data creation is not taking a lot of resource {the Null test}, then against my three versions of test table – a normal heap table, my IOT and my Partitioned IOT. I run each test on each segment type three times, the minimum number needed to help identify variation that could invalidate the test.

You will probably be surprised by the results….

All three tables have the same column structure, the heap has indexes on the primary key and a unique constraint, the IOT and partitioned IOT are organized on a modified version of the Primary key and have the second index for the unique key. Below is the creation statement for the partitioned IOT (the IOT is the same without the partition clause). You can get full creation details from IOT part 2:

create table transaction_IOT_P
(tran_type number(2)     not null
,tran_id   number(10)    not null
,ACCO_type number(2)     not null
,ACCO_id   number(10)    not null
,cre_date  date          not null
,vc_1      varchar2(1000) not null
,vc_2      varchar2(1000)
,date_1    date
,num_1     number(2)
,num_2     number(2)
,constraint tip_pk primary key(ACCO_type,ACCO_id,cre_date,tran_type,tran_id) 
--  using index tablespace index_01
,constraint tip_tran_uq unique (tran_type,tran_id)
  using index tablespace index_01
)
organization index
tablespace data_01
partition by range  (cre_date)
(partition rm20110901 values less than (to_date('01-09-2011','DD-MM-YYYY')) 
  tablespace data_01
,partition rm20111001 values less than (to_date('01-10-2011','DD-MM-YYYY'))
  tablespace data_01
,partition rm20111101 values less than (to_date('01-11-2011','DD-MM-YYYY'))
  tablespace data_01
-- ,partition ... (the line defining the final partition is missing from this listing)
  tablespace USERS
)
/

The three tests are:

  • Insert 10,000 records for random accounts, in time order for the same day. This would be like processing data for or during a “day”
  • Generating the 10,000 records in order of account, to better match the IOT structure, again for a single day. For any given account 0,1,2 or 3 records are created, to be a little more “real life”. This would be like loading a pre-processed set of data.
  • All 10,000 records are created for the one account, in datetime order, as an extreme example of inserting into an IOT data that is ordered to perfectly match the order of the IOT.

The code is written ‘badly’, ie a single record is prepared and inserted into the table at a time, within a loop of 10000 iterations. This is to avoid the bulk-processing that may benefit the heap table and also to better match the reality of an OLTP system. I do however only commit after all records are created for any given test, which is maybe a mistake but that’s what I did.

Now, before I go any further, does anyone want to guess by how much inserting into the heap table will be faster than inserting into the IOT (partitioned or otherwise)? Also, whether the partitioned IOT will be faster or slower than the IOT and what, if any, impact there will be by the ordering of data?

To allow you to think about that and not see the results by accident, I’ll stick in one of the insert harnesses here so you can see how I am doing it. Skip over if you just want the results

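execute s_snap.my_snap(' finished non-insert test1')
-- Transaction_heap random data test
declare
v_num         number :=10000; -- number of accounts
v_offset      number :=-4;
-- the initial values of the two filler strings are not shown in this listing;
-- the rpad() values are placeholders, long enough for the substr() calls below
v_str1        varchar2(60) :=rpad('A',60,'A');
v_str2        varchar2(60) :=rpad('B',60,'B');
v_tran_type   number(2);
v_tran_id     number(10);
v_acco_type   number(2);
v_acco_id     number(10);
v_cre_date    date;
v_vc_1        varchar2(1000);
v_vc_2        varchar2(1000);
v_date_1      date :=sysdate;
v_num_1       number(2) :=10;
v_num_2       number(2) :=15;
begin
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
for i in 1..10000 loop
    v_tran_type :=mod(i,3)+1; -- 1, 2 or 3
    v_tran_id   := 1000000+i;
    v_acco_type := 5+(trunc(dbms_random.value(1,3))*5); -- 10 or 15
    v_acco_id   :=trunc(dbms_random.value(1,v_num/2)); -- 1 to 4999
    v_cre_date  :=trunc(sysdate-v_offset)+(i/(60*60*24)); -- one second apart
    v_vc_1      :=substr(v_str1,1,51+mod(i,10));
    v_vc_2      :=substr(v_str2,1,46+mod(i,15));
    -- one row at a time, deliberately not a bulk insert
    insert into transaction_heap
      (tran_type,tran_id,ACCO_type,ACCO_id,cre_date,vc_1,vc_2,date_1,num_1,num_2)
    values
      (v_tran_type,v_tran_id,v_acco_type,v_acco_id,v_cre_date
      ,v_vc_1,v_vc_2,v_date_1,v_num_1,v_num_2);
end loop;
commit; -- a single commit, after all records for the test are created
dbms_output.put_line (to_char(SYSTIMESTAMP,'HH24:MI:SS.FF'));
end;
/
execute s_snap.my_snap(' finished th insert test1')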

Here are the results of three runs for all tests:

Elapsed time in seconds:

                random insert              ordered-by-account         perfect IOT order match
Segment type    Test 1    2    3    AVG    Test 1    2    3    AVG    Test 1    2    3    AVG
------------    ------ ---- ----  -----    ------ ---- ----  -----    ------ ---- ----  -----
No insert          2.1  0.1  0.1              0.1  0.1  0.1              0.1  0.0  0.1
Heap              52.3 20.5 11.4   28.1      33.4 13.8 10.7   19.3       9.1 13.0  9.8   10.6
IOT              137.1 83.7 26.4   82.4     110.8 63.6 39.0   71.2       6.4  6.7  4.0    5.8
Partn'd IOT       33.2  9.5  8.7   17.1      17.1 10.4 14.3   13.9       8.1  5.3  4.5    6.0

Allow me to summarise that.

First, the preparation of the data and cycling around the loop, the no-insert tests, are negligible. {The first no_insert test is taking 2 seconds due to the parsing and collecting dictionary information overhead of the first thing done}. We can ignore the overhead of the test harness.

Much more significantly:

  • Each iteration of each test, things get faster {with one odd exception}
  • For Random and semi-ordered inserts, inserting into the IOT takes roughly 3 to 4 times as long as inserting into the Heap – what we all probably expected.
  • Inserting into the IOT is Faster than the heap when the insert order matches the structure of the IOT in the perfect_IOT test. Not just the same but around 45% faster on average (5.8 seconds against 10.6).
  • The partitioned IOT is as Fast or Faster than the Heap table, never mind the IOT. Overall the partitioned IOT takes 50% to 75% of the time the Heap takes to be populated.
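
The ratios behind those bullet points can be checked with a little arithmetic on the AVG figures from the results table. The query below is only an illustration of that arithmetic; the column aliases are made up here and nothing in it touches the test tables.

```sql
-- average elapsed seconds taken from the AVG columns of the results table
select round(82.4/28.1,1)    as iot_vs_heap_random        -- IOT time as a multiple of heap time
      ,round(71.2/19.3,1)    as iot_vs_heap_ordered
      ,round(100*5.8/10.6)   as iot_pct_of_heap_perfect   -- IOT time as a % of heap time
      ,round(100*17.1/28.1)  as piot_pct_of_heap_random   -- partitioned IOT as a % of heap
      ,round(100*13.9/19.3)  as piot_pct_of_heap_ordered
      ,round(100*6.0/10.6)   as piot_pct_of_heap_perfect
from dual;
```

This returns 2.9 and 3.7 for the two multiples, and 55, 61, 72 and 57 for the four percentages.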

Hands up everyone expecting this? NB My hand would be down if I had not done similar tests before.

Why do we see these results?

I’ll just mention that the reason all the tests get quicker with each iteration is that the computer is doing the same thing over and over again and gets used to it, the relevant circuits in the chip are “oiling in”. Of course not, how silly – but it seems to be how non-IT people think of it :-) Basically more and more of each segment is getting cached and that reduces physical IO and speeds things up. You see this a lot in testing and you need to be constantly vigilant about it. You might think I should really run the tests until they are steady-state, but that is a very false situation. On a real system you don’t tend to get the same thing run over and over and over again on a system with nothing else running.

Much more interestingly, why is the partitioned IOT winning? Well, for the Heap table data is being poured into the leading edge of the table, which takes a little processing and some blocks to be created. However, there are also two indexes being maintained, the Primary Key and that Unique key. For each insert Oracle has to find the right spot in each index (root block ->branch block->leaf block->check for row) and insert the new record, occasionally splitting a full leaf block.

For the IOT and the Partitioned IOT there is no table to maintain. In effect there are just two indexes to maintain. For the normal IOT this benefit is outweighed by the cost of working with a less space efficient Primary Key index. As there is the whole of the row in each index entry there are fewer index entries possible per leaf block and so more blocks need to be read in from disc and more splits are needed. Thus the inserts into the IOT are slower.
But for the partitioned IOT not only is there no table to maintain but the table (really the Primary Key) is partitioned – on date. We are creating data for a single day at a time. All the relevant blocks will be in the one partition (and it’s the latest and smallest partition) and quickly all cached in memory and processed in memory. It’s the efficient use of the cache again. For both the IOT and the Heap, the primary key segment being processed is larger and the inserts are spread across the whole index.
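
One way to sanity-check the "smaller chunk fits in the cache" argument is to compare segment sizes. The query below is a sketch only: it assumes the segments are named after the constraints in the creation statement above (TIP_PK for the partitioned IOT itself and TIP_TRAN_UQ for its unique index), and it needs adapting for the heap table and the plain IOT, whose segment names are not given in this post.

```sql
-- size of each partition of the partitioned IOT (stored under its primary key name)
-- and of the supporting unique index
select segment_name
      ,partition_name
      ,segment_type
      ,round(bytes/1024/1024,1) as size_mb
from user_segments
where segment_name in ('TIP_PK','TIP_TRAN_UQ')
order by segment_name, partition_name;
```

If the reasoning holds, the partition receiving the day's inserts should be a small fraction of the total size of the primary key segment.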

Ordering data to match the index order (ordered_by_account tests) helps for all three segment types – heap, IOT and partitioned IOT. We have two indexes and the data is being ordered for both in effect, as one is on the columns we are ordering the data creation for and the other, the unique key, is generated by an ascending sequence. The gain is about 25-30% and is due to working through the indexes in order as opposed to skipping about. If the indexes were significantly larger than the buffer cache or on a highly active system where the turnover in the cache was very high (indicated by a low buffer cache hit ratio :-) ) then the benefit would be higher.
The perfect ordering test (perfect_iot_order_match) is a special case. Not only is the data being processed in order of the indexes but only one part of the index is being updated – for a single account. In this situation the maintaining of the index is so efficient that the time saved not processing the heap table means both the IOT and IOT_P win out over the heap. So in this special case, inserts into IOTs are faster than inserts into a heap table.

And finally a quick summary:

  • Inserting records into an IOT carries about a 200-400% overhead, depending on the number of indexes
  • Ordering the data to match the IOT gives a 25-30% advantage, but it similarly aids update of the primary key on the heap
  • Partitioning the IOT such that a smaller chunk of the index is processed can be highly beneficial. I don’t cover it specifically in the above but you can probably appreciate that partitioning on a heap can be beneficial to processing indexes on the heap also.
  • If you can organise to create your IOT such that data is created only in one or a few distinct points in the structure (you might have to tweak your data load too of course), insert performance might match or even beat that for a heap table with the same indexes
  • Using methods to maximise efficient use of your buffer cache and reduce IO is a very powerful performance tuning aid, not just for select

BTW I am conscious of the fact that the above may surprise many and you might wonder if my data or tests are flawed. I have stripped out most of the code from this post as it was just becoming massive and confusing, but contact me and I will happily supply all the example scripts and data.

IOT P6(a) Update November 8, 2011

Posted by mwidlake in Architecture, development, performance, Testing.

In my last post, IOT part 6, inserts and updates slowed down, I made the point that IOT insert performance on a relatively small Oracle system was very slow, much slower than on a larger system I had used for professional testing. A major contributing factor was that the insert was working on the whole of the IOT as data was created. The block buffer cache was not large enough to hold the whole working set (in this case the whole IOT) once it grew beyond a certain size. Once it no longer fitted in memory, Oracle had to push blocks out of the cache and then read them back in next time they were needed, resulting in escalating physical IO.

I’ve just done another test which backs up this claim. I altered my test database so that the block buffer cache was larger, 232MB compared to 100MB in my first tests. The full IOT is around 200MB.

Bottom line, the creation of the IOT was greatly sped up (by a factor of a little over 3) and the physical IO dropped significantly, by a factor of 20. As a result, the creation of the IOT was almost as fast as the partitioned IOT. It also shows that the true overhead on insert of using an IOT is more like a factor of 2 to 4 as opposed to 6 to 8.

You can see some of the details below. Just to help you understand them, it is worth noting that I had added one new, larger column to the test tables (to help future tests) so the final segments were a little larger (the IOT now being 210MB as opposed to 180MB in the first tests) and there was a little more block splitting.

                        Time in Seconds
Object type           Run with       Run with
                     100MB cache    232MB cache
------------------  ------------    -----------   
Normal Heap table          171.9          119.4   
IOT table                1,483.8          451.4     
Partitioned IOT            341.1          422.6 
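
For anyone who wants to check the ratios rather than take them on trust, here is a minimal Python sketch that works them out from the timing table above. The figures are copied from that table; the function and variable names are just illustrative.

```python
# Elapsed times in seconds, copied from the table above:
# (run with 100MB cache, run with 232MB cache)
timings = {
    "Normal Heap table": (171.9, 119.4),
    "IOT table": (1483.8, 451.4),
    "Partitioned IOT": (341.1, 422.6),
}


def overhead_vs_heap(timings, run):
    """Elapsed time of each object type as a multiple of the heap time."""
    heap = timings["Normal Heap table"][run]
    return {name: times[run] / heap for name, times in timings.items()}


def speed_up(timings):
    """How many times faster each load ran with the larger cache."""
    return {name: small / large for name, (small, large) in timings.items()}


small_cache = overhead_vs_heap(timings, 0)
large_cache = overhead_vs_heap(timings, 1)
gain = speed_up(timings)

for name in timings:
    print(f"{name:<18} x{small_cache[name]:.2f} of heap at 100MB, "
          f"x{large_cache[name]:.2f} of heap at 232MB, "
          f"speed-up x{gain[name]:.2f}")
```

A speed-up below 1 means the second run was slower. That is the case for the partitioned IOT, which went from 341.1 to 422.6 seconds, presumably down to the extra column and extra splitting in the second run.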

-- First reading 100MB cache
-- second reading 232MB cache 
STAT_NAME                             Heap           IOT       IOT P
-------------------------------- ---------- -----------  ----------
CPU used by this session            5,716         7,222       6,241
                                    5,498         5,967       6,207

DB time                            17,311       148,866      34,120
                                   11,991        45,459      42,320

branch node splits                     25            76          65
                                       25            82         107

leaf node 90-10 splits                752         1,463       1,466
                                      774         1,465       1,465

leaf node splits                    8,127        24,870      28,841
                                    8,162        30,175      40,678

session logical reads           6,065,365     6,422,071   6,430,281
                                6,150,371     6,544,295   6,709,679

physical read IO requests             123        81,458       3,068
                                       36         4,012       1,959

physical read bytes             2,097,152   668,491,776  25,133,056
                                1,400,832    34,037,760  16,048,128

user I/O wait time                    454       139,585      22,253
                                       39        34,510      19,293

The heap table creation was faster with more memory available. I’m not really sure why, the cpu effort was about the same as before and though there was some reduction in physical IO with the larger cache, I suspect it might be more to do with both the DB and the machine having been recently restarted.

All three tests are doing a little more “work” in the second run due to that extra column and thus slightly fewer rows fitting in each block (more branch node and leaf node splits), but this just highlights even more how much the IOT performance has improved, which correlates with a massive drop in physical IO for the IOT creation. If you check the session logical reads they are increased by a very small, consistent amount. Physical read IO requests have dropped significantly and, in the case of the IOT, plummeted.

I believe the 90:10 leaf node splits are consistent as that will be the maintaining of the secondary index on ACCO_TYPE and ACCO_ID, which are populated in order as the data is created (derived from rownum).

What this second test really shows is that the efficiency with which you are able to make use of the database cache is incredibly significant. Efficiently accessing data via good indexes or tricks like IOTs and hash tables is important but it really helps to also try and consider how data is going to be recycled within the cache or used, pushed out and then reused. A general principle for batch-type work seems to me to be that if you can process it in chunks that can sit in memory, rather than the whole working set, there are benefits to be gained. Of course, partitioning can really help with this.

{If anyone is wondering why, for the heap table, the number of physical IO requests has dropped by 70% but the actual number of bytes has dropped by only 30%, I’m going to point the finger to some multi-block read scan going on, either in recursive code or, more likely, my code that actually gathers those stats! That would also help explain the drop in user IO wait time for the heap run.}
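
The multi-block read suspicion can be sanity-checked from the figures in the table. The sketch below divides bytes read by the number of read requests and expresses the result in blocks. The 8KB block size is an assumption (the post does not state it), though every byte count in the table is an exact multiple of 8,192.

```python
BLOCK_SIZE = 8192  # ASSUMED database block size in bytes; not stated in the post

# (physical read IO requests, physical read bytes), copied from the table:
# first pair is the 100MB cache run, second pair the 232MB cache run
physical_reads = {
    "Heap":  ((123, 2_097_152), (36, 1_400_832)),
    "IOT":   ((81_458, 668_491_776), (4_012, 34_037_760)),
    "IOT P": ((3_068, 25_133_056), (1_959, 16_048_128)),
}


def percent_drop(before, after):
    """Percentage reduction from before to after."""
    return 100.0 * (before - after) / before


def blocks_per_request(requests, total_bytes, block_size=BLOCK_SIZE):
    """Average number of blocks fetched by each physical read request."""
    return total_bytes / requests / block_size


for name, ((req1, bytes1), (req2, bytes2)) in physical_reads.items():
    print(f"{name:<6} requests down {percent_drop(req1, req2):5.1f}%, "
          f"bytes down {percent_drop(bytes1, bytes2):5.1f}%, "
          f"blocks/request {blocks_per_request(req1, bytes1):.2f} -> "
          f"{blocks_per_request(req2, bytes2):.2f}")
```

Under that assumption the IOT and partitioned IOT loads come out at almost exactly one block per request, while the heap run averages two or more. That is at least consistent with some multi-block reads being mixed into the heap figures.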

Just for completeness, here is a quick check of my SGA components for the latest tests, just to show I am using the cache size I claim. All of this is on Oracle 11.1 enterprise edition, on a tired old Windows laptop. {NB new laptop arrived today – you have no idea how hard it has been to keep doing this blog and not play with the new toy!!!}. If anyone wants the test scripts in full, send me a quick email and I’ll provide them.

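-- sga_info.sql
-- Martin Widlake /08
-- summary of the SGA components and their sizes
set pages 32
set pause on
-- show bytes with thousands separators; the heading marks the g/m/k/b digit groups
col bytes form 999,999,999,999,999 head byts___g___m___k___b
spool sga_info.lst
select * 
from v$sgainfo
order by name
/
spool off
clear col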
NAME                             byts___g___m___k___b RES
-------------------------------- -------------------- ---
Buffer Cache Size                         243,269,632 Yes
Fixed SGA Size                              1,374,892 No
Free SGA Memory Available                           0
Granule Size                                4,194,304 No
Java Pool Size                              4,194,304 Yes
Large Pool Size                             4,194,304 Yes
Maximum SGA Size                          401,743,872 No
Redo Buffers                                6,103,040 No
Shared IO Pool Size                                 0 Yes
Shared Pool Size                          142,606,336 Yes
Startup overhead in Shared Pool            50,331,648 No
Streams Pool Size                                   0 Yes

IOT Part 6 – Inserts and Updates Slowed Down (part A) November 1, 2011

Posted by mwidlake in Architecture, performance, Testing.

<..IOT1 – the basics
<….IOT2 – Examples and proofs
<……IOT3 – Significantly reducing IO
<……..IOT4 – Boosting Buffer Cache efficiency
<……….IOT5 – Primary Key Drawback
…………>IOT6(B) – OLTP Inserts

A negative impact of using Index Organized Tables is that inserts are, and updates can be, significantly slowed down. This post covers the former and the reasons why – and the need to always run tests on a suitable system. (I’m ignoring deletes for now – many systems never actually delete data and I plan to cover IOTs and delete later)

Using an IOT can slow down insert by something like 100% to 1000%. If the insert of data to the table is only part of a load process, this might result in a much smaller overall impact on load, such as 25%. I’m going to highlight a few important contributing factors to this wide impact spread below.

If you think about it for a moment, you can appreciate there is a performance impact on data creation and modification with IOTs. When you create a new record in a normal table it gets inserted at the end of the table (or perhaps in a block marked as having space). There is no juggling of other data.
With an IOT, the correct point in the index has to be found and the row has to be inserted at the right point. This takes more “work”. The inserting of the new record may also lead to an index block being split and the extra work this entails. Similar extra work has to be carried out if you make updates to data that causes the record to move within the IOT.
Remember, though, that an IOT is almost certainly replacing an index on the heap table which, unless you are removing indexes before loading data and recreating them after, would have to be maintained when inserting into the Heap table. So some of the “overhead” of the IOT would still occur for the heap table in maintaining the Primary Key index. Comparing inserts or updates between a heap table with no indexes and an IOT is not a fair test.

For most database applications data is generally written once, modified occasionally and read many times – so the impact an IOT has on insert/update is often acceptable. However, to make that judgement call you need to know

  • what the update activity is on the data you are thinking of putting into an IOT
  • the magnitude of the impact on insert and update for your system
  • the ratio of read to write.

There is probably little point putting data into an IOT if you constantly update the primary key values (NB see IOT-5 as to why an IOT’s PK columns might not be parts of a true Primary Key) or populate previously empty columns or hardly ever read the data.

There is also no point in using an IOT if you cannot load the data fast enough to support the business need. I regularly encounter situations where people have tested the response of a system once populated but fail to test the performance of population.

Now to get down to the details. If you remember the previous posts in this thread (I know, it has been a while) then you will remember that I create three “tables” with the same columns. One is a normal heap table, one is an Index Organized Table and one is a partitioned Index Organized Table, partitioned into four monthly partitions. All tables have two indexes on them, the Primary Key index (which is the table in the case of the IOTs) and another, roughly similar index, pre-created on the table. I then populate the tables with one million records each.

These are the times, in seconds, to create 1 million records in the HEAP and IOT tables:

                  Time in Seconds
Object type         Run_Normal
------------------  ----------
Normal Heap table        171.9  
IOT table               1483.8

This is the average of three runs to ensure the times were consistent. I am using Oracle V11.1 on a machine with an Intel T7500 core 2 Duo 2.2GHz, 2GB memory and a standard 250GB 5000RPM disk. The SGA is 256MB and Oracle has allocated around 100MB-120MB to the buffer cache.

We can see that inserting the 1 million rows into the IOT takes 860% the time it does with a heap table. That is a significant impact on speed. We now know how large the impact is on Insert of using an IOT and presumably it’s all to do with juggling the index blocks. Or do we?

This proof-of-concept (POC) on my laptop {which you can also run on your own machine at home} did not match with a proof-of-concept I did for a client. That was done on V10.2.0.3 on AIX, on a machine with 2 dual-core CPUs with hyper-threading (so 8 virtual cores), 2GB SGA and approx 1.5GB buffer cache, with enterprise-level storage somewhere in the bowels of the server room. The results on that machine to create a similar number of records were:

                  Time in Seconds
Object type         Run_Normal
------------------  ----------
Normal Heap table        152.0  
IOT table                205.9

In this case the IOT inserts required 135% the time of the Heap table. This was consistent with other tests I did with a more complex indexing strategy in place, the IOT overhead was around 25-35%. I can’t go into too much more detail as the information belongs to the client but the data creation was more complex and so the actual inserts were only part of the process – this is how it normally is in real life. Even so, the difference in overhead between my local-machine POC and the client hardware POC is significant, which highlights the impact your platform can have on your testing.

So where does that leave us? What is the true usual overhead? Below are my more full results from the laptop POC.

                        Time in Seconds
Object type         Run_Normal    Run_quiet    Run_wrong_p
------------------  ----------    ---------    -----------
Normal Heap table        171.9        81.83         188.27  
IOT table               1483.8      1055.35        1442.82
Partitioned IOT          341.1       267.83         841.22 

Note that with the partitioned IOT the creation took 341 seconds, so the performance ratio to a heap table is only 198%, which is much better than the normal IOT. Hopefully you are wondering why!

I’m running this test on a Windows laptop and other things are going on. The timings for Run_Quiet are where I took steps to shut down all non-essential services and applications. This yielded a significant speed increase for all three object types but the biggest impact was on the already-fastest Heap table.

The final set of figures is for a “mistake”. I created the partitions wrong such that half the data went into one partition and the rest into another and a tiny fraction into a third, rather than being spread over 4 partitions evenly. You can see that the Heap and normal IOT times are very similar to the Run_Normal results (as you would expect as these tests are the same) but for the partitioned IOT the time taken is half way towards the IOT figure.
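
The percentages quoted for these runs can be reproduced with a few lines of Python. This is only a sketch over the figures in the table above; the helper names are illustrative.

```python
# Elapsed seconds per object type: (Run_Normal, Run_quiet, Run_wrong_p)
laptop = {
    "Normal Heap table": (171.9, 81.83, 188.27),
    "IOT table": (1483.8, 1055.35, 1442.82),
    "Partitioned IOT": (341.1, 267.83, 841.22),
}


def percent_of_heap(results, run):
    """Each object's elapsed time as a percentage of the heap time."""
    heap = results["Normal Heap table"][run]
    return {name: 100 * times[run] / heap for name, times in results.items()}


def percent_saved_when_quiet(results):
    """Percentage of Run_Normal elapsed time saved in Run_quiet."""
    return {name: 100 * (normal - quiet) / normal
            for name, (normal, quiet, _wrong) in results.items()}


normal_pct = percent_of_heap(laptop, 0)
saved = percent_saved_when_quiet(laptop)

for name in laptop:
    print(f"{name:<18} {normal_pct[name]:6.0f}% of heap time, "
          f"{saved[name]:4.1f}% saved when quiet")
```

The heap load saves roughly half its elapsed time in the quiet run, against about a fifth to a third for the two IOT loads. That is the sense in which the biggest impact was on the heap table.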

We need to dig into what is going on a little further to see where the effort is being spent, and it turns out to be very interesting. During my proof-of-concept on the laptop I grabbed the information from v$sesstat for the session before and after each object creation so I could get the figures just for the loads. I then compared the stats between each object population and show some of them below {IOT_P means Partitioned IOT}.

STAT_NAME                                   Heap           IOT        IOT P
------------------------------------ ---------- -------------  -----------
CPU used by this session                  5,716         7,222        6,241
DB time                                  17,311       148,866       34,120
Heap Segment Array Inserts               25,538            10           10

branch node splits                           25            76           65
leaf node 90-10 splits                      752         1,463        1,466
leaf node splits                          8,127        24,870       28,841

consistent gets                          57,655       129,717      150,835
cleanout - number of ktugct calls        32,437        75,201       88,701
enqueue requests                         10,936        28,550       33,265

file io wait time                     4,652,146 1,395,970,993  225,511,491
session logical reads                 6,065,365     6,422,071    6,430,281
physical read IO requests                   123        81,458        3,068
physical read bytes                   2,097,152   668,491,776   25,133,056
user I/O wait time                          454       139,585       22,253
hot buffers moved to head of LRU         13,077       198,214       48,915
free buffer requested                    64,887       179,653      117,316

The first section shows that all three used similar amounts of CPU, the IOT and partitioned IOT being a little higher. Much of the CPU consumed was probably in generating the fake data. The DB Time of course pretty much matches the elapsed time, as the DB was doing little else.
It is interesting to see that the Heap insert uses array inserts which of course are not available to the IOT and IOT_P as the data has to be inserted in order. {I think Oracle inserts the data into the heap table as an array and then updates the indexes for all the entries in the array – and I am only getting this array processing as I create the data as an array from an “insert into as select” type load. But don’t hold me to any of that}.

In all three cases there are two indexes being maintained but in the case of the IOT and IOT_P, the primary key index holds the whole row. This means there has to be more information per key, fewer keys per block and thus more blocks to hold the same data {and more branch blocks to reference them all}. So more block splits will be needed. The second section shows this increase in branch node and leaf node splits: roughly three times as many of each for the IOT as for the heap. This is probably the extra work you would expect for an IOT. Why are there more leaf block splits for the partitioned IOT? The same volume of data ends up taking up more blocks in the partitioned IOT – 200MB for the IOT_P in four partitions of 40-60MB as opposed to a single 170MB for the IOT. The larger overall size of the partitioned IOT is just due to a small overhead incurred by using partitions and also a touch of random fluctuation.
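
To put numbers on the extra splitting, the sketch below divides the IOT and IOT_P split counts from the second section of the table by the heap counts. The figures are copied from the table; nothing else is assumed.

```python
# (Heap, IOT, IOT_P) counts copied from the second section of the table
splits = {
    "branch node splits": (25, 76, 65),
    "leaf node 90-10 splits": (752, 1_463, 1_466),
    "leaf node splits": (8_127, 24_870, 28_841),
}


def ratio_to_heap(counts):
    """Return the (IOT / heap, IOT_P / heap) ratios for one statistic."""
    heap, iot, iot_p = counts
    return iot / heap, iot_p / heap


ratios = {name: ratio_to_heap(counts) for name, counts in splits.items()}

for name, (iot_ratio, iot_p_ratio) in ratios.items():
    print(f"{name:<24} IOT x{iot_ratio:.2f}   IOT_P x{iot_p_ratio:.2f}")
```

The 90-10 splits come out at only about twice the heap figure, while the ordinary leaf splits and the branch splits are around three times as many.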

So for the IOT and IOT_P there is about three times the index-specific work being done and a similar increase in related statistics such as enqueues, but not three times as it is not just index processing that contributes to these other statistics. However, the elapsed time is much more than three times as much. Also, the IOT_P is doing more index work than the IOT but its elapsed time is less. Why?

The fourth section shows why. Look at the file io wait times. This is the total time spent waiting on IO {in millionths of a second} and it is significantly elevated for the IOT and to a lesser degree for the IOT_P. Physical IO is generally responsible for the vast majority of time in any computer system where it has not been completely avoided.
Session logical reads are only slightly elevated, almost negligibly so, but the number of physical reads to support them increases from 123 for the Heap table insert to 81,458 for the IOT and 3,068 for the IOT_P. A clue as to why comes from the hot buffers moved to head of LRU and free buffer requested statistics. There is a lot more activity in moving blocks around in the buffer cache for the IOT and IOT_P.

Basically, for the IOT, all the blocks in the primary key segment are constantly being updated but eventually they won’t all fit in the block buffer cache – remember I said the IOT is eventually 170MB and the buffer cache on my laptop is about 100MB – so they are flushed down to disk and then have to be read back when altered again. This is less of a problem for the IOT_P as only one partition is being worked on at a time (the IOT_P is partitioned on date and the data is created day by day) and so more of it (pretty much all) will stay in memory between alterations. The largest partition only grows to 60MB and so can be worked on in memory.
For the heap, the table is simply appended to and only the indexes have to be constantly updated and they are small enough to stay in the block buffer cache as they are worked on.
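
The effect is easy to reproduce with a toy model. The Python sketch below is not how Oracle really manages its buffer cache. It assumes a plain least-recently-used cache, fixed-size segments and uniformly random block visits, and it uses 170, 50 and 100 "blocks" purely as stand-ins for the 170MB IOT, the partitions and the 100MB cache. All the names are illustrative.

```python
import random
from collections import OrderedDict


def physical_reads(block_accesses, cache_blocks):
    """Count cache misses for a sequence of block visits under plain LRU."""
    cache = OrderedDict()
    misses = 0
    for block in block_accesses:
        if block in cache:
            cache.move_to_end(block)       # recently used, so keep it
        else:
            misses += 1                    # not cached: a "physical read"
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used
    return misses


rng = random.Random(42)   # fixed seed so the run is repeatable
CACHE_BLOCKS = 100        # stand-in for the ~100MB buffer cache
ACCESSES = 40_000

# Single IOT: each insert can land anywhere in a 170-"block" segment.
whole_iot = [rng.randrange(170) for _ in range(ACCESSES)]

# Partitioned IOT: four 50-"block" partitions, loaded one after the other.
partitioned = [
    (part, rng.randrange(50))
    for part in range(4)
    for _ in range(ACCESSES // 4)
]

iot_reads = physical_reads(whole_iot, CACHE_BLOCKS)
iot_p_reads = physical_reads(partitioned, CACHE_BLOCKS)
print(f"single segment: {iot_reads:,} misses, partitioned: {iot_p_reads:,} misses")
```

In this model the partitioned load can never miss more than once per block (200 at most), because the partition being worked on always fits in the cache. The single 170-block segment keeps evicting blocks it will shortly need again.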

This is why when I got my partitioning “wrong” the load took so much longer. More physical IO was needed as the larger partition would not fit into the cache as it was worked on – A quick check shows that logical reads and in fact almost all statistics were very similar but 26,000 IO requests were made (compared to 81,458 for the IOT and 3,068 for the correct IOT_P).

Of course, I set my SGA size and thus the buffer cache to highlight the issue on my laptop and I have to say even I was surprised by the magnitude of the impact. On the enterprise-level system I did my client’s proof of concept on, the impact on insert was less because the buffer cache could hold the whole working set, I suspect the SAN had a considerable cache on it, there was ample CPU resource to cope with the added latching effort and the time taken to actually create the data inserted was a significant part of the workload, reducing the overall impact of the slowness caused by the IOT.

{Update: in the follow-up post, IOT P6(a) Update, I increase my block buffer cache and show that physical IO plummets and the IOT insert performance increases dramatically}.

This demonstrates that a POC, especially one for what will become a real system, has to be a realistic volume on realistic hardware.
For my client’s POC, I still did have to bear in mind the eventual size of the live working set and the probable size of the live block buffer cache and make some educated guesses.

It also explains why my “run_quiet” timings showed a greater benefit for the heap table than the IOT and IOT_P. A Windows machine has lots of pretty pointless things running that take up CPU and a bit of memory, not really IO so much. I reduced the CPU load and it benefits activity that is not IO, so it has more impact on the heap table load. Much of the time for the IOT and IOT_P is taken hammering the disk and that just takes time.

So, in summary:

  • Using an IOT increases the index block splitting and, in turn, enqueues and general workload. The increase is in proportion to the size of the IOT compared to the size of the replaced PK.
  • The performance degradation across the whole load process may well be less than 50% but the only way to really find out is to test
  • You may lose the array processing load that may benefit a heap table load if you do the load via an intermediate table.
  • With an IOT you may run into issues with physical IO if the segment (or part of the segment) you are loading into cannot fit into the buffer cache (This may be an important consideration for partitioning or ordering of the data loaded)
  • If you do a proof of concept, do it on a system that is as similar to the real one as you can
  • Just seeing the elapsed time difference between tests is sometimes not enough. You need to find out where that extra time is being spent

I’ve thrown an awful lot at you in this one post, so I think I will stop there. I’ve not added the script to create the test tables here, they are in IOT-5 {lacking only the grabbing of the v$sesstat information}.
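
The v$sesstat figures are cumulative for the session, so getting the numbers for one load means taking a snapshot before and after and subtracting. A minimal Python sketch of that subtraction step is below. It does not query the database, and the snapshot values are hypothetical, invented purely to show the mechanics.

```python
def stat_deltas(before, after):
    """Return after minus before for every statistic present in both snapshots."""
    return {name: after[name] - before[name] for name in before if name in after}


# HYPOTHETICAL snapshot values, not taken from the tests in this post.
before_load = {"session logical reads": 1_000, "physical read IO requests": 10}
after_load = {"session logical reads": 1_750, "physical read IO requests": 14}

deltas = stat_deltas(before_load, after_load)
for name, value in sorted(deltas.items()):
    print(f"{name:<32} {value:>10,}")
```

In practice each snapshot would come from v$sesstat joined to v$statname for the session doing the load, taken immediately before and after each object is populated.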

