jump to navigation

Northern Server Technology Day – Advert April 22, 2010

Posted by mwidlake in Meeting notes.

I’m a bit late doing this, but next week is the UKOUG Northern Server Technology Day.
As the title suggests, this is a day of technical presentations for Oracle users in the North of England. Of course, it is open to any UKOUG member (and anyone else, for a small fee), but the idea behind it is to provide a user group event for those who are not based in the South of England and so might struggle to the usual events. This year the day is being held in Leeds, which is where I was at college, so I am more than happy to return to my old stomping ground and support the day with a presentation on Database Statistics.

There are other, probably much better, presentations by David Kurtz, John Hallas, John Scott, Piet de Visser and David Phizacklea too. I’ve seen them all do their stuff before and it should be a cracking day. Check out the agenda via the link at the top of this post.

If you are an Oracle user who is based up in the North and have an interest in the meeting, I would urge you to make the extra effort to attend. It is a constant complaint that the UKOUG is biased towards having events in Reading, London and the Midlands and not a lot elsewhere, but to have events further afield there needs to be an active and interested audience to justify the event. The better the event is supported, the more likely it is to continue and prosper and potentially attract other UKOUG SIG meetings North.

Actually, on that note, I will also mention the Scotland 2010 conference meeting, a must for Oracle users in Scotland or those who would like to mix a technical conference with visiting Scotland {and I have to say, I do enjoy visiting Scotland}. Not sure yet if I can get there myself this year 😦


INTERNAL_FUNCTION() Impact April 21, 2010

Posted by mwidlake in internals, performance.
Tags: , ,

You may occasionally see something like the following in the filter predicates of an Explain Plan:


What is INTERNAL_FUNCTION, what is the use of INTERNAL_FUNCTION() in Oracle explain plans and why am I putting things like ORACLE INTERNAL_FUNCTION partition exclusion and meaning of internal_function performance almost as random strings in this blog?

Well, I want to get the hit rate up for this post as when I did a quick search on INTERNAL_FUNCTION the current top hit is This unfortunately misleading posting that tells you “It is not a significant overhead in SQL execution performance”. This internal function IS potentially a significant overhead in SQL Execution performance. You can’t add comments to the above site {just book a course or order a book}, so I’m having to create my own page…

INTERNAL_FUNCTION is, as the above posting says, where oracle does an implicit data conversion. In the shown case, as well as in my example in this blog and in most cases I have seen, from timestamp to date.

Why is it an issue? Because it is a function on a column and as such it can have a disasterous impact on sql execution performance.

This code below ( which has most select columns removed for simplicity and to protect the innocent) was performing very poorly, taking over 30 minutes to complete. This is all on Enterprise edition of Oracle.

SELECT trans0_.ID as ID3_1_
, round(transfers0_.AMOUNT,2) as formula10_0_
FROM VW_PMTT trans0_
WHERE  (trans0_.MODIFIED_DT between :p0 AND :p1
        AND trans0_.PERS_ID = :p2)
and transfers0_.ACCOUNT_ID=:p3

This is the plan and the significant filter/access predicates:

Plan hash value: 3261981189
| Id  | Operation                         | Name                 | Rows  |Bytes| Cost   |Pstart|Pstop|
|   0 | SELECT STATEMENT                  |                      |       |     | 67841  |      |     |
|   1 |  SORT ORDER BY                    |                      |     8 | 920 | 67841  |      |     |
|   2 |   VIEW                            | VW_PMTP              |     8 | 920 | 67840  |      |     |
|   3 |    UNION-ALL                      |                      |       |     |        |      |     |
|*  4 |     FILTER                        |                      |       |     |        |      |     |
|   5 |      NESTED LOOPS                 |                      |     4 | 280 | 28490  |      |     |
|*  6 |       TABLE ACCESS BY INDEX ROWID | W_TBLP               |     1 |  14 |     3  |      |
|*  7 |        INDEX UNIQUE SCAN          | PK_W_TBLP            |     1 |     |     2  |      |     |
|   8 |       PARTITION RANGE ALL         |                      |     4 | 224 | 28487  |    1 | 1580|
|*  9 |        TABLE ACCESS BY LOC INd RID| W_PAYMENT            |     4 | 224 | 28487       1 | 1580|
|* 10 |         INDEX RANGE SCAN          | IDX_W_P_TRANS        | 66936 |     |  3345  |    1 | 1580|

Predicate Information (identified by operation id):

   6 - filter("PLY"."ACCOUNTID"=:P3)
   7 - access("PLY"."PERS_ID"=:P2)

As you can see, the view is translated into a simple two-table join {and unioned with a second two-table join, which I removed again for clarity, but if anyone wants the full plan and statement, email me} where:

  • one table is accessed on a unique index (via bind variable P2).
  • Row filtered for P3
  • This table is then joined to a Partitioned table in a nested loop 
  • The table being accessed via a local index range scan.

At first glance, the plan may look fine, but Look at the pstart and pstop. 1 to 1580. That is every partition in the table.
The predicate information shows that the INTERANAL_FUNCTION(“PT”.”MODIFIED”) column is being compared to P0 and P1.

The partition key on the table is:-

Name of Table : W_PAYMENT

------------------------------ ---------- --- -----------
USER.W_PAYMENT                 TABLE       1 MODIFIED

ie the column that is being flitered by. Why no partition pruning?

Partition pruning is not occuring because of the “INTERNAL_FUNCTION” being applied to that column. The CBO is not able to understand how the result of a function will match to the partition values of the column. Not even it’s own, internally used function 😛

For the above, the “date” bind variables were defined as

p0 timestamp := to_timestamp(’01-FEB-2010 00:00′,’DD-MON-YYYY HH24:MI’);
p1 timestamp := to_timestamp(’03-FEB-2010 00:00′,’DD-MON-YYYY HH24:MI’);

I then altered the code such that the bind variables were defined as dates.

p0 date := to_date(’01-FEB-2010′,’DD-MON-YYYY’);
p1 date := to_date(’03-FEB-2010′,’DD-MON-YYYY’);

The plan and significant predicates were now:

Plan hash value: 346162108
| Id  | Operation                         | Name                 | Rows  |Bytes| Cost   |Pstart|Pstop|
|   0 | SELECT STATEMENT                  |                      |       |     | 67841  |      |     |
|   1 |  SORT ORDER BY                    |                      |     4 | 460 | 67841  |      |     |
|   2 |   VIEW                            | VW_PMTP              |     4 | 460 | 67840  |      |     |
|   3 |    UNION-ALL                      |                      |       |     |        |      |     |
|*  4 |     FILTER                        |                      |       |     |        |      |     |
|   5 |      NESTED LOOPS                 |                      |     2 | 140 | 28490  |      |     |
|*  6 |       TABLE ACCESS BY INDEX ROWID | W_TBLP               |     1 |  14 |     3  |      |     |
|*  7 |        INDEX UNIQUE SCAN          | PK_W_TBLP            |     1 |     |     2  |      |     |
|   8 |       PARTITION RANGE ALL         |                      |     2 | 112 | 28487  |  KEY | KEY |
|*  9 |        TABLE ACCESS BY LOC INd RID| W_PAYMENT            |     2 | 112 | 28487  |  KEY | KEY |
|* 10 |         INDEX RANGE SCAN          | IDX_W_P_TRANS        | 66936 |     |  3345  |  KEY | KEY |

Predicate Information (identified by operation id):

   6 - filter("PLY"."ACCOUNTID"=:P3)
   7 - access("PLY"."PERS_ID"=:P2)
   9 - filter(("PT"."TYPE"=1184769 AND "PT"."MODIFIED">=:P0 AND "PT"."MODIFIED"<:P1

The expected rows and bytes drop but the key difference in the plan is Pstart and Pstop are now KEY KEY. ie the CBO will evaluate at run time the values coming back from the Table access of W_TBLP and partition prune the access of the table.

In this situation, the original query took over 30 minutes to return. The new code took 18 seconds.

I tried one final test. I set the bind variables to be varchar2 so that implicit data conversion would occur:
p0 varchar2(20) := ’01-FEB-2010 00:00′;
p1 varchar2(20) := ’03-FEB-2010 00:00′;

With VARCHAR2 the CBO was able to do the implicit conversion without the INTERNAL_FUNCTION and the plan/filter predicates were exactly as for native dates.

It would seem this issue with INTERNAL_FUNCTION blights timestamp to date implicit data conversion but not some other implicit data conversions. Without testing further, I can’t be sure which.

This is a documented feature by Oracle. Unusually (and much to their credit) Oracle even document the potential negative impact of this particular construct with timestamps and dates. See this section in chapter 5 of the Oracle 10g data warehousing guide.

I better balance the fact I linked to an inaccurate posting on the subject with a few good ones.
This is a nice one from McLaud on the OraFAQ site where he has come across it as a problem and with help shows how it is due to implicit conversion of a timestamp from Hibernate.
This blog on the issue arising with Java is a nice example of the fix (use a date to hold a date, not a timestamp).

Finally, this link is to an older but very, very valid rant by Tom Kyte about implicit data conversion and using the correct data type for your data. {Don’t go using numbers or varchars to hold dates, just don’t, and similarly you should avoid implicit data conversion}.

Enrichment of the Code Monkey Environment April 15, 2010

Posted by mwidlake in Friday Philosophy.
Tags: ,
1 comment so far

In animal welfare there is a concept called “Environment Enrichment”. This is where you do things to make the life of the animal in captivity more enjoyable. Often it is centred on how you feed the animal – either hiding their food so they need to forage or presenting it in a way that better matches their natural behaviour . So, for example, you might take your gerbils out of their cage and then hide seeds and bits of fruit around the place {or get Ben Fogle dragging chunks of meat behind a Land Rover through the lion enclosure at Longleat Safari Park so the lions can chase after it}. Or you might buy some of that expensive, transparent, coloured plastic tubing that plugs together in little mazes – as Gerbils of course run around pink, blue and orange transparent tubes in the wild…

Where I am working at the moment, Environment Enrichment for the captives is achieved by relocating the coffee cups in the kitchen overnight. We all turn up in the morning and go up in little groups to play the “open the cupboards in random” game in order to track down the mugs. Never the same cupboard two days in a row – except occasionally just to keep us on our toes.

We also have several fruit bowls and someone occasionally fills them with a selection of fresh fruit. Of course, as we are Code Monkeys, the bananas are most popular. What most of my fellow inhabitants have not yet worked out is the occasional bonus cache in the pot that looks like a flower vase. It’s usually apples.

The best bit of the environment enrichment is at the end of the week though. 5pm Friday, with the gentle tick of cooling laptops and the gentle lowering drone of hard disks winding down, and the glow of the LCD sceens dimming, the oasis is opened. A bottle of beer or can of coke can be obtained and you all stand around and discuss anything but computers. Well, OK, maybe computers a bit, but they have to be non-work computers.

Relax with Good Security :-) April 14, 2010

Posted by mwidlake in Architecture.
Tags: , ,
1 comment so far

It’s nice to know that your computer system is very secure. It allows you to relax.

Is this because you feel confident that your data is protected and that your systems cannot be compromised by malicious actions?

Well maybe, but in my case, today (*), this is not why I can relax. Security is allowing me to relax as I’ve been locked out of my account at my client’s site and it is hard to tune SQL/review design documents if you can’t log on. It is a little bizarre that my account expired today as I know the date my account is set to self destruct – I filled in the relevant forms myself and simply had my boss sign them before I popped them over to the security team. Unfortunately, I can’t pop back over to them to jolly things along as I am now in a remote location (really, very, very remote!).

I just have to wait patiently for HR, Security and Management to complete the complex little Morris Dance they have to do to confirm I am a real person, need the access and have not recently been asked to leave {I hope I am not going to be asked to leave!}.

I’m not really complaining. Having worked for the NHS back in the late 80’s and early 90’s, I have shocking tales to tell, such as of taking home confidential patient clinical reports in order to play with “SQL*report writer” on my personal desktop, and similar stories about modern organisations in the private sector {well what do you know, person X owes HOW MUCH?}. So I am glad security is an order of magnitude better now. {No, honestly, it really is! No matter how bad you think things are where you are now, you should see the stuff I saw back then. These days, you at least have the ability to lock things down, if not the real-world reality.}

But, like all process-oriented things, it has to be designed to work well to be, well, effective. I got locked out of the same systems in a similar way just before Christmas. How many people are being paid a day’s wage to ring up the helpdesk and drink endless tea?

(*) I wrote this blog a while back. It might even have been before I had a blog. But a similar incident reminded me. Oh, and what was my reward for an enforced day being paid to drink tea and read a copy of “Oracle Scene”? Three days of working like a madman to catch up on that lost day. *sigh*. I did not get paid overtime.

Performance Tipping Points April 13, 2010

Posted by mwidlake in performance, statistics.
Tags: , ,
add a comment

This week I came across a nice example of a performance tipping point. This is where Everything is OK until you reach a point where it all quickly cascades to Not OK.

The below shows the timings for a regulalry run “Alert”. Four times an hour we want to know if something quite important has happened on the system.

---- - ------------- ------------- --------- ------------- -------------------- -----------------
292 S 0410 03:06:00 0410 03:06:31 .000359 0410 03:06:00 20100410 02:51:00 20100410 03:06:00
292 S 0410 03:21:00 0410 03:21:35 .000405 0410 03:21:00 20100410 03:06:00 20100410 03:21:00
292 S 0410 03:36:00 0410 03:36:38 .000440 0410 03:36:00 20100410 03:21:00 20100410 03:36:00
292 S 0410 03:51:00 0410 03:51:33 .000382 0410 03:51:00 20100410 03:36:00 20100410 03:51:00
292 S 0410 04:06:00 0410 04:06:28 .000324 0410 04:06:00 20100410 03:51:00 20100410 04:06:00
292 S 0410 04:21:00 0410 04:21:44 .000509 0410 04:21:00 20100410 04:06:00 20100410 04:21:00
292 S 0410 04:36:00 0410 04:36:27 .000313 0410 04:36:00 20100410 04:21:00 20100410 04:36:00
292 S 0410 04:51:00 0410 04:51:34 .000394 0410 04:51:00 20100410 04:36:00 20100410 04:51:00
292 S 0410 05:06:00 0410 05:06:44 .000509 0410 05:06:00 20100410 04:51:00 20100410 05:06:00
292 S 0410 05:21:00 0410 05:21:43 .000498 0410 05:21:00 20100410 05:06:00 20100410 05:21:00
292 S 0410 05:36:00 0410 05:37:01 .000706 0410 05:36:00 20100410 05:21:00 20100410 05:36:00
292 S 0410 05:51:00 0410 05:52:21 .000938 0410 05:51:00 20100410 05:36:00 20100410 05:51:00
292 S 0410 06:06:00 0410 06:08:09 .001493 0410 06:06:00 20100410 05:51:00 20100410 06:06:00
292 S 0410 06:21:01 0410 06:24:00 .002072 0410 06:21:01 20100410 06:06:00 20100410 06:21:01
292 S 0410 06:36:00 0410 06:40:12 .002917 0410 06:36:00 20100410 06:21:01 20100410 06:36:00
292 S 0410 06:51:00 0410 06:56:54 .004097 0410 06:51:00 20100410 06:36:00 20100410 06:51:00
292 S 0410 07:06:00 0410 07:13:17 .005058 0410 07:06:00 20100410 06:51:00 20100410 07:06:00
292 S 0410 07:21:00 0410 07:29:42 .006042 0410 07:21:00 20100410 07:06:00 20100410 07:21:00
292 S 0410 07:36:00 0410 07:47:48 .008194 0410 07:36:00 20100410 07:21:00 20100410 07:36:00
292 S 0410 07:51:00 0410 08:08:07 .011887 0410 07:51:00 20100410 07:36:00 20100410 07:51:00
292 S 0410 08:08:07 0410 08:29:43 .015000 0410 08:08:07 20100410 07:51:00 20100410 08:08:07
292 S 0410 08:29:43 0410 08:50:10 .014201 0410 08:29:43 20100410 08:08:07 20100410 08:29:43
292 S 0410 08:50:10 0410 09:22:28 .022431 0410 08:50:10 20100410 08:29:43 20100410 08:50:10
292 S 0410 09:22:28 0410 10:27:11 .044942 0410 09:22:28 20100410 08:50:10 20100410 09:22:28
292 S 0410 10:27:11 0410 12:57:16 .104225 0410 10:27:11 20100410 09:22:28 20100410 10:27:11
292 F 0410 12:57:16 0410 14:50:26 .078588 0410 12:57:16 20100410 10:27:11 20100410 12:57:16
292 F 0410 14:50:26 0410 16:49:42 .082824 0410 14:50:26 20100410 10:27:11 20100410 14:50:26
292 F 0410 16:49:42 0410 19:06:10 .094769 0410 16:49:42 20100410 10:27:11 20100410 16:49:42

The first half dozen records show the program ID 292 running in half aminute or so. It varies a little, from just under 30 seconds to 45 seconds. Each run kicks off 15 minutes after the previous and, if you check the START_PARAM and END_PARAM on the right of the listing, you can see that the report runs for the previous 15 minutes. ie the period since the last run ( including the running time of the last itteration).

Then, at 05:36 something happens. The execution takes a full minute. The next run takes 1 minute and 20 seconds. The next run takes over two minutes and each following execution takes a little longer and longer.

At 07:51, 1.5 hours later, something significant happens. The execution takes more than 15 minutes to run. This report is run every 15 minutes. I’m sure you can appreciate that this is a problem. What happens now depends on your architecture.

If you have a scheduler that simply kicks off the report every 15 minutes, at the next scheduled time (08:06 in my case) a new execution will start and you will have two version of the report running for a couple of minutes. As time progresses this overlap increases. When the run time reaches 30 minutes you will now start having 3 reports running at the same time. And soon you will get three, four, five etc version running at the same time. Depending on the number of CPUs and IO bandwidth of your system, how long will it be until it will be doing nothing but running this code?
If the code is doing somthing more than simply selecting data, the chance of the concurrent versions competing for locks or resources is high and can mean that the slow-down of the code escalates rapidly and in a very, very short time, your system is on it’s knees.

In our case, we have some protection against this. The scheduler detects that the previous version has not finished and it waits until it has done so before starting the next execution. so the next execution is delayed. In my example the 08:06 execution is delayed until 08:08, 2 minutes late.

We still have two possible situations. The report could simply continue to slow down at the previous rate and just be delayed longer and longer. In my example it has been slowing down at around 1 minute 20 seconds per run for the last 5 runs.

However, The next execution is further delayed until 08:29 – which is 6 minutes beyond the 15 minute window from 08:08. This latets run of the code is 5 minutes 30 seconds slower than the previous execution, not 1 minute 20 seconds slower. We still have a tipping point. If you remember, this report runs to cover the period since the last execution. As the last execution took more than 15 minutes, the next run has to report on more than 15 minutes. 21 minutes in this case.

The slow-down now rapidly escalates due to not just whatever was slowing the report down initially but also the growing reporting period.

Just 5 runs and 5 hours later, at 12:57:16, the report fails. It now runs for so long that it cannot recreate the data as it looked at the start of the run and we get snapshot-too-old errors. In the listing you just see the second column go from S to F.

It is game over. Thankfully only one copy of the report is running at any time (A version is kicked off as soon as the previous one fails) and so, worst case, is using only one process on the box, one CPU maximum and the IO that one thread can demand.

What was the root cause? Time and statistics and data volume.
Statistics because the job that collects statistics failed earlier.
Time because at 05:50 or so, the run window of 15 minutes was far enough out of the range of known date-times for a partition that the CBO decided only a few rows would be found and swapped to a nested-loop execution.
Data volume as data was coming in quicker as the day progressed and the nested loop plan performance was highly susceptible to increases in data volume.

For an excellent posting on why code suddenly changes it’s execution plan when nothing has happened but time passing, seethis excellent post and comments on the topic on Richard Foote’s blog

{And isn’t his blog so much skinnier than mine? 🙂 }

Welcome to the Wider Layout April 2, 2010

Posted by mwidlake in Blogging.

Noticed anything new about my blog? It seems to have been eating pies and grown wider, somewhat like it’s author.

I’d really appreciate any feedback on if this new width is acceptable or is too large for how you browse the internet. I know it will be too wide for netbook users (of which I am one) without scaling the screen to 75%, but I seriously needed more space.

I very much liked the look of my blog when I set it up and I’ve been very happy with the service WordPress supplies for free for my blog. Their technical support have even answered a couple of questions I have thrown their way over the last 10 months or so {I get better free support off WordPress than I have had from software I’ve paid for}. But the narrow window for the text of my blog has been annoying me since the second or third week after I started. As for code and Explain Plan layout, boy that has been a pain. The use of “sourcode” was an early and much appreciated find and this formatting tool has been improved several times, even in the last 12 months. In fact, that earlier post I just linked to does not fully make sense now as one failed attempt to use the “sourcode” layout now works.

But the width limitation of the style sheet I chose forces me to spend a long time trying to edit code and especially Explain Plan outputs such that it lays out in a readable format. It makes doing technical blogs even more time-consuming.

It all came to a head a month ago when I started using screen shots in my blog. They are rendered down to an illegible 460 pixels wide. This was a real blow as, along with the other threads I want to blog about, I want to do some stuff on AWR in OEM and that can only be done with screenshots.

As you can see, this width issue is no longer such a problem for me. WordPress let you alter the CSS of your style sheet for a fee. But WordPress being the nice chaps they are, you can try it out for free first, to ensure you can do what you want to do. So I got a friend who understands all this web/CSS/html stuff to help me and within a couple of hours we had it sorted. The final step was to give 15 dollars a year to WordPress {I see it as buying them a couple of pints} and I can now bring to you the Wider Widlake Ramblings.

If you are curious as to how this change in width aids layout, see this post I have edited to show before and after layouts.

Screenshots are still going to be an issue, I can’t go beyond 800 pixels for the text window as with the side-bar, window margins, scroll bar etc, the blog would become wider than 1200 pixels (and I decided that 1200 pixels was as far as I could go without it becoming a major problem for some people to read my blog). But code layout is now fine and I can manage screenshots if I am careful.

It pains me to say this, but bigger is sometimes better.

Friday Philosophy – I killed a presentation April 1, 2010

Posted by mwidlake in Friday Philosophy, performance, statistics.
Tags: , , ,

Well, this week saw the latest Management and Infrastructure SIG. I won’t go into the SIG meeting itself just now, maybe later this weekend. If you want an opinion on it, you might like to see what Graham Oaks said.

Being Chair of the SIG I have to find presenters. Being also one of those rare people who enjoys presenting, I know I can always tap one person to present. Me. {If only I could sing, play an instrument, act, do impression, put both my ankles behind my ears(*) or anything that was of any general interest, getting an audience would be so much easier}.

I decided to tackle a topic I have tried before and which I previously did a very poor show of: “General principles on gathering database statistics”.
It is not a title to strike joy into the soul, I know, but it is really important to getting good and reliable performance out of any modern Oracle database. Even other Oracle DBA types tend to find the topic worthy but boring, but I seem to have been forced to know a lot about it and I’ve become sadly very passionate about it.

So, I tried again. I mean, how long should it take to describe the issues with database statistics and the general principles to gathering them? I took my old presentation and stripped out all code, all SQL syntax, all listing of tables and got it down to describing the process. Then I started adding the bits and pieces you need to know to get this half-right. 132 slides. Hmmmm

I tried again and took it higher level. 48 slides. I can do 48 slides in something between 45 minutes and an hour. I was ready.

I had the last presentation slot of the day. I figure if I am going to foist myself on the audience, they should have the right to leave before I start, without missing any of the good stuff. I had 3/4 of the day’s attendees still with me when I started.

I think I killed my audience. Actually, I know I killed my audience. I lost a few as the slot progressed {I kidded myself they had to get back to the office/catch a train} and I made the fatal mistake of not shutting up when I had done 45 minutes. You see, there was all this important stuff I had not mentioned yet! Thankfully, I had a friend in the second row and when I saw him lose the will to live, I stopped. I’d been at it for 70 minutes. What we really needed was the Chair to shut me up but I was the chair.

Anyway, I came away from the talk with two realisations.

  • The topic of gathering database statistics is too large to cover in one presentation session.
  • It really is very boring, even when you are passionate about it.

I think I have no choice but to make this a blog thread (as I threatened to do so about 1 month ago). But I have also promised myself to finish off SQL Audit before I start a new thread and that was 3 months ago.

So I think I am going to have to put some time into this blogging lark. It’s OK though, I seem to go on about database statistics so much that even my cat is staring to avoid me. I’ll just stop watching Star Trek and drinking wine in the evenings and switch to preparing technical blogs. And drinking wine of course.

(*) Up until 2007 I could do this. I get out more now. Despite the stats fixation