
When to Fix & When to Find Out – Friday Philosophy July 15, 2011

Posted by mwidlake in AWR, Friday Philosophy, Testing.

{warning, this is one of my long, rambling Friday Philosophy blogs. Technical blogs are on the way – though there are some nice AWR screen shots in this one 🙂 }

As a DBA (or System Administrator or Network admin or any other role that involves fixing things that are broken before the shouting starts) there is often a tension between two contending “best practices”:

– getting the system working again as soon as possible.
– understanding exactly what the problem is.

Some experts point out {and I generally agree} that unless you actually find out exactly what the problem was, what you did to fix it and, via a test case, demonstrate why the fix worked, you are not really solving the problem – you are just making it go away. You (hopefully!) know what you changed or the action you took, so you have something that you can repeat which might fix it again next time. (NB this does not apply to the classic “turn it off, turn it back on again” – that is nearly always just aversion therapy).

But it might not fix it next time.

Or you might be going to way more trouble to fix it than is needed {I’m remembering how flushing the shared pool used to get touted as a way to resolve performance issues, but it is a pretty brutal thing to do to a production database}.

You might even be making other problems worse {like slowing down everything on the production database as the caches warm up again, having flushed out all that data in the SGA, or that good-old-standard of rebuilding indexes that simply do not benefit from the rebuild}.

There is another issue with “just fixing it by trying things” in that you are not really learning about the cause of the issue or about how the technology you are looking after works. A big part of what makes an “expert” an expert is the drive, desire and opportunity to take the time to work this out. It’s that last part I sometimes get grumpy about, the opportunity.
For many of us, we do not have the luxury of investigating the root cause, because we are under so much pressure to “get it fixed right now and worry about the detail later”. But we never get to the detail, as there is then the next “fix this problem right now”, and so it goes on. {Kellyn Pot’Vin does a nice post about what she calls the Superman Conundrum on this topic}.

I’ve had exactly this dilemma just a couple of months ago. Below are the details; it’s a bit long, so skip to the end of the post if you like…

I was creating test data and I decided to use parallel processing to speed it up. I created a month’s worth of data with PL/SQL and then decided to copy it with a simple “insert into …select” statement, updating dates and a couple of other incrementing columns as part of the insert, using parallel. The creation of the second month’s data took longer than the PL/SQL code for the first month took. What the!!!??? I pulled up the AWR information and I could see that the problem was (possibly) with inter-process communication between the parallel processes, as shown by the PX Deq Credit: send blkd wait event.
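The statement was along these lines (a sketch only – the table and column names here are invented for illustration, not the ones from the real system):

```sql
-- Hypothetical sketch of the style of copy statement; names and the
-- date arithmetic are made up, the real statement was more involved.
INSERT /*+ parallel(tgt, 4) */ INTO test_data tgt
SELECT /*+ parallel(src, 4) */
       src.id + 1000000                -- shift the incrementing key
     , ADD_MONTHS(src.created_date, 1) -- move the dates on a month
     , src.payload
FROM   test_data src
WHERE  src.created_date < DATE '2011-02-01';
```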

The below screenshot shows the overall instance workload; the green is CPU and the pink is “Other”. Only one SQL statement is responsible for all of this effort, via 5 sessions (four of which are parallel threads). You can see that the issue had been going on for over an hour {oh, and to a certain extent these pretty pictures are pointless – I am not looking for the exact solution now, but having loaded the pictures up to the blog site I am damn well going to pretty-up my blog with them}:

Drilling into that one session shows that the bulk of the waits by far is for PX Deq Credit: send blkd:

By selecting that wait event, I got the histogram of wait times since the system started {I love those little histograms of wait times}:

Note that these waits are for long periods of time, around 1/10 of a second on average and some are for up to 2 or 4 seconds.

The thing is, I had anticipated this problem and had already increased my PARALLEL_EXECUTION_MESSAGE_SIZE to 16K from the default of 2K, as I knew from experience that the default is way too small and has slowed down parallel execution for me before. So why was I seeing poorer performance than I anticipated? I’m not understanding stuff. So I needed to change one thing, see what impact it had, and repeat until I got to the bottom of it.
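For reference, that parameter change is of this form (the parameter is static in 10g, so it goes into the spfile and needs an instance restart to take effect):

```sql
-- Increase the PX message buffer from the default to the 16K mentioned above
ALTER SYSTEM SET parallel_execution_message_size = 16384 SCOPE = SPFILE;
-- then bounce the instance for it to take effect
```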

Except I could not – the next team in line was waiting for this data and they already had massive pressure to deliver. My job, what my employer was giving me money to do, was to fix this problem and move on. So, in this instance I just pragmatically got on with getting the task done as quickly as I could.

I did what we all do {don’t we?} under times of acute time pressure. I made a handful of changes, using the knowledge I already have and guessing a little, hoping that one of them would do the trick. This included reducing the degree of parallelism, adding the /*+ append */ hint (I simply forgot the first time around), pre-allocating the required space to the tablespace, muttering “pleaseopleaseoplease” under my breath….
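The changes were of this general shape (a sketch with invented file, table and hint values – I am not claiming this is the exact code):

```sql
-- 1) pre-extend the datafile so the insert does not wait on extension
--    (path and size are invented for illustration)
ALTER DATABASE DATAFILE '/u01/oradata/dev/test_data01.dbf' RESIZE 20G;

-- 2) direct-path insert (the forgotten append hint) at a lower degree
INSERT /*+ append parallel(tgt, 2) */ INTO test_data tgt
SELECT /*+ parallel(src, 2) */ *
FROM   test_data_month1 src;
COMMIT;  -- a direct-path insert must be committed before the table is queried again
```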

It worked:

The job ran in less than 20 minutes and used less resource during that time. Well, it waited less anyway.
The wait histograms show lots and lots of shorter duration waits:

The duplication took 20 minutes, where the previous attempt had been terminated after 2 hours when other factors forced it to be curtailed. Job done.

But the thing is, the problem was not fixed. I got the task done in a timescale that was required, I satisfied the need of the client, but I was and am not sure exactly why.

If I was a permanent employee I would consider pressing for being allowed to spend some time working this out, as my employer benefits from me extending my knowledge and skills. This is not always a successful strategy 🙂 {but it does indicate to me that my employer has a Problem and I need to address that}. In any case, as I was a consultant on this job, I was being paid to already know stuff. So it is now down to me, some personal time, the internet and people more knowledgeable than me who I can ask for help to sort this out.

And that was my main point. Often at work you have to get over the issue. Later on, you work out what the issue was. I wish I could get paid for the latter as well as the former. The real blow for me is that I no longer have access to that site and the information. My job was complete and, whether they have really shut down my access or not, I am not allowed to go back into the systems to dig around. I think I now know the issue, but I can’t prove it.


Friday Philosophy – Software being Good is not Good Enough February 5, 2010

Posted by mwidlake in Friday Philosophy, Uncategorized.

In a previous Friday Philosophy on In Case of Emergency I mention that something being simply a “Good Idea” with technology is not good enough. Even being a GREAT idea is not enough. It also has to be:

  1. Easy and simple to use. After all, using that funny stick thing behind your steering wheel in a car, to indicate which direction you are turning, seems to be too much of an effort for many people. If installing some bit of software or running a web page is more than a little bit of effort, most people will not bother.
  2. Quick. No one has patience anymore, or spare time. This will probably be the fourth year in a row I do not plant any vegetables in our garden, as I need to spend a day or two clearing and digging over the spot for said veg. You can’t beat home-grown veg. Similarly, I won’t use a web page that takes as long to load as it does to plant a carrot seed.
  3. Known about. There could be a really fantastic little program Out There that allows you to take a screen shot, add a title and comment and pastes it straight into a document for you, converting to half a dozen common formats on the fly. But I do not know about it. { ScreenHunter is pretty good, I have to say, and when I have shown it to people a lot of them like it}.
  4. Popular. This is not actually the same as “known about”. For a stand-alone application to be good for you, you just need to know where it exists. Like maybe a free building architecture package. Whether thousands of people use it is moot; so long as you can get your extension drawings done in it with ease, that makes it great. But something that relies on the community, like a service to rate local eateries – unless lots of people use it and add ratings, well, who cares. There are dozens (if not hundreds) of such community “good ideas” started every day but unless enough people start to use it, it will fizzle out, as the vast majority of them do.

Point 4 is highly relevant to “In Case Of Emergency” as it is simple, quick and relatively known about. It just needs to be ubiquitous.

I became very aware of point 3 a few years ago and also of the ability for very clever people to be sometimes very stupid when it comes to dealing with their fellow humans.

I was working on a database holding vast quantities of DNA information. If you don’t know, DNA information is basically represented by huge long strings of A, C, T and G. So something like AACTCGTAGGTACGGGTAGGGGTAGAGTTTGAGATTGACTGAGAGGGGGAAAAATGTGTAGTGA…etc, etc, etc. These strings are hundreds, thousands, hundreds of thousands of letters long. And scientists like to search against these strings. Of which there are millions and millions. Not for exact match mind, but kind-of-similar, fuzzy matches, where for example 95% of the ACTGs match but some do not. It’s called a BLAST match.

Anyway, suffice to say, it takes a lot of compute power to do this and a fair amount of time to run. There was a service in America which would allow you to submit a BLAST query and get the answer in 20 minutes or so {I have no idea how fast it is now}.

Some extremely clever chaps I had the pleasure of working with came up with a faster solution. Same search, under 5 seconds. Now that is GREAT. We put together the relevant hardware and software and started the service. I thought it went beyond Good or even Great. It was Amazing (and I mean it, I was amazed we could do a fuzzy search against a billion such strings in 2, 3 seconds using a dozen or so PC-type servers).

No one used it. This was because almost no one knew about it and there was already this slow service people were used to using. People who used the old service never really thought to look for a new one and the chances were they would not have found ours anyway.

I pushed for more to be made of this new, faster service, that it should be advertised to the community, that it should be “sold” to people (it was free to use, by “sold” I mean an attempt made to persuade the scientific community it was worth their while investigating). The response I was given?

“If the service is worth using, people will come and use it”.

No they won’t. And indeed they didn’t. It was, I felt, a stupid position to take by an incredibly intelligent person. How were people to know it existed? Were they just supposed to wake up one morning knowing a better solution was out there? The internet pixies would come along in the night and whisper about it in your ear? In the unlikely event of someone who would be interested in it just coming across it, were they then going to swap to using it? After all, no one else seemed to know about it and it was 2 orders of magnitude faster – suspiciously fast; how could it be any good?

The service got shut down as it was just humming in the corner consuming electricity. No one knew it existed, no one found it, no one came. I can’t help but wonder how much it could have helped the scientific community.

There must be thousands of other “failed” systems across the IT world that never took off just because the people who could use it never knew it existed. Depressing huh?

Testing Methodology – Getting Community Information January 22, 2010

Posted by mwidlake in Testing, Uncategorized.

Last post I tried to highlight that knowing specifically what you want to understand before you start testing is important, and that it is beneficial to start with the boring official documentation. This is so that you can filter what is Out There on the web, as well as get a good grounding in what is officially possible.

There is no doubt that you can get a lot more information on Oracle features from the community than you can from the manuals, and that it is often presented in a more easily understood and relevant way. But some of it is not right for your version or is just plain wrong. I can do no better than to refer to this recent posting by Jonathan Lewis on the topic. In this case, it is the persistence of old and now irrelevant information, perpetuated by the kind but flawed habit people have of pulling information together – but not testing it.

Treat everything on the net with caution if it has no proof. Even then be cautious 🙂 – I know I have “proved” a couple of things that turned out to be misunderstandings…

You can of course post to forums to ask for information and help, and people will often come to your aid. But you are often not really understanding the subject by doing that, you are just learning facts. It’s like learning the dates of the two World Wars and not understanding why the wars occurred. I would point you to this posting by Lisa Dobson from a couple of years back, which makes the point about what you stand to learn, and also says something about how to ask for help. If it helps you to do the “right thing”, think of it selfishly. You will get better and more specific help if you have already looked into your issue and can ask specific questions. Also, if you properly learn something rather than just get a couple of facts about it, you are less likely to look foolish when you repeat those facts in a context they are not valid in. And you will.

Sorry, side tracked a little into one of my pet annoyances. Back to topic.

I want to know about SQL AUDIT. So I google SQL AUDIT ORACLE. Lots of hits. Let’s start filtering. First off, I can forget anything by “experts-exchange” and the other pay-for forums. Sorry guys, but if I am going to spend money I’ll buy a manual. I will also ignore any stuff by sites that constantly suggest you go on training courses on boats with them or are “Team America” {everyone seen the film by Trey Parker? Puts the claim “We are the American Team” into a different light…}.

Now I go and look at the sites and I try making my search more intelligent, maybe adding in the word DBA_AUDIT_TRAIL and the like. I try and think what words and phrases someone explaining the topic would use which would not be common for other topics. That is why I often use database object names and commands in my searches.

In the case of SQL AUDIT, there are plenty of sites that generally pull together the basics from the manuals and tell you how to audit a few things and see the results and give some nice examples. It gets you further forward, but it’s mostly not very detailed. Just cookery instructions on what to do but not how it works. Thankfully, there is an excellent article by one of the experts in the field, Pete Finnigan.
Maybe I could have replaced this post by simply suggesting you just go to Pete’s web site, read what is there and follow the link from it to the ones he likes. It would work for the topic of SQL AUDIT. However, although it is always good to go to the sites of people you know and trust, it is always worth doing a google and going beyond the first page or two of results. Those first pages are mostly for a handful of very popular sites and sources. A new article or someone who is relatively unknown but has the answers you need may be further down the list, like pages 3 or 4. It is worth 5 mins just checking.

However, you are not likely to find everything you need. Certainly with SQL AUDIT you won’t find much about performance impact other than bland and not-very-helpful generic warnings that “Use of SQL AUDIT can carry a significant performance overhead”. How much overhead? For what sort of audit? And how do you know? In fact, when searching I found this, admittedly old, comment by Pete about there being relatively little “out there” about the performance impact of auditing. That made me smile, as that information was exactly what I was after.

*sigh*. The information I want does not seem to be out there. I need to do some testing.

That is for the next post (“at LAST”, I hear you all cry).

Testing Methodology – The Groundwork January 20, 2010

Posted by mwidlake in Perceptions, Testing, Uncategorized.


I want to start learning about using SQL AUDIT, as I mentioned a couple of days ago.

First question. What do I want to know about SQL AUDIT? I might just have thought “well, I do not know much about this area and I think I should – so I want to know everything”. That is OK, but personally I find it helps to make up a specific aim. Otherwise you can just flounder about {well, I do}. In this case I have decided I want to know:

  1. The options for auditing who is selecting key tables.
  2. How to audit when those key tables get changed.
  3. The impact on performance of that audit, and if it could be an issue.
  4. Following on from (3), how to control that performance impact.
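For points 1 and 2, the sort of statements this leads to look like this (10.2 syntax; the table name is invented, and it assumes the AUDIT_TRAIL initialisation parameter is set to DB):

```sql
AUDIT SELECT ON scott.key_table BY ACCESS;                  -- point 1: who selects
AUDIT INSERT, UPDATE, DELETE ON scott.key_table BY ACCESS;  -- point 2: who changes

-- check what auditing is currently set on the object
SELECT * FROM dba_obj_audit_opts WHERE object_name = 'KEY_TABLE';
```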

For any of you who have been or are code developers, you hopefully appreciate test-driven coding. That is, you decide up front what the new code must achieve and design tests to ensure the code does it. You do not write any code until you have at least thought of the tests and written them down in a document. {Ideally you have written the tests and built some test data before you start, but then in an ideal world you would get paid more, have more holidays and the newspapers would tell the truth rather than sensational rubbish, but there you go 🙂 }

I do not think that learning stuff/testing is much different from developing code, hence the list above. I now know what I want to understand.

What next? I’m going to go and check the Oracle Documentation for the feature. And I am very careful to check the documentation for the version I will use. This is 10.2 for me. I know, the Oracle documentation can be pretty dry, it can be rather unhelpful and it is not very good at examples. But you can expect it to be 90% accurate in what it tells you. You can also expect it to be not-very-forthcoming about the issues, gotchas and bits that don’t work. {I have this pet theory that the official documentation only mentions how things do not work once a feature has been out for a version and people have complained that the documentation should let on about the limitations}.

So, for SQL AUDIT I suggest you go and read:

  • Concepts Manual, chapter 20 Database Security. If I am not rushed I would read the whole chapter – I might realise that what I want to do is better done with some other tool (if I wanted to see who had changed records months down the line, I think I would pick up that database triggers were a better bet, for example).
  • SQL Reference, chapter 13, the section on AUDIT (no surprises there). I do not do much more than read through the SQL manual once though, as frankly I find it pretty useless for explaining stuff, but it puts into your head what the parts of the command are and pointers to other parts of the documentation. I’ll read the Concepts manual with more attention. In this case, the manual will lead me to:
  • Database Security Guide chapter 8. Which is pretty good, actually.
  • My next source of information may not immediately spring to mind, but I find it very valuable: finding out which data dictionary objects are involved in the feature. In this case, the previous sources would lead me to go to the Database Reference and check out:
  • DBA_AUDIT_TRAIL, DBA_OBJ_AUDIT_OPTS, DBA_PRIV_AUDIT_OPTS, DBA_STMT_AUDIT_OPTS. And of course, SYS.AUD$. {I actually queried DBA_OBJECTS for anything with the word “AUDIT” in it, checked out all the tables and also had a quick peek at the text for any views, which would have led me to SYS.AUD$ if I did not already know about it}.
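That dictionary trawl was, more or less:

```sql
-- anything in the dictionary with AUDIT in its name
SELECT owner, object_name, object_type
FROM   dba_objects
WHERE  object_name LIKE '%AUDIT%'
ORDER  BY object_type, object_name;

-- the view text is what points you at the underlying SYS.AUD$
SELECT text FROM dba_views WHERE view_name = 'DBA_AUDIT_TRAIL';
```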

Why do I go and look at the data dictionary objects and the reference guide? After all, is it not nerdy enough to have put myself through reading the SQL Reference manual? {and let us be honest, it is rarely enjoyable reading the SQL Reference manual}. Because I want to know how it works, not just how to use it. Seeing the tables gives me a lot of information and the description of the columns may well tell me a lot more. First thing: SYS.AUD$ only has one index, on a column SESSIONID# (I know, there is another column in the index, I need to get to a DB to check this). Any queries not via this index are going to scan the whole damned thing. Right off, I suspect this will be an issue.
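Checking the indexing is itself just a dictionary query, something like:

```sql
-- what is SYS.AUD$ indexed on?
SELECT index_name, column_name, column_position
FROM   dba_ind_columns
WHERE  table_owner = 'SYS'
AND    table_name  = 'AUD$'
ORDER  BY index_name, column_position;
```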

I will DESC the tables, see if there is already any information in them. In this case, there was not. A clean sheet.

Why did I not just go to Google (Bing, Yahoo, whatever) and search? Because I can trust the manuals to tell me stuff which is mostly true and is valid for my release. Not stuff about older versions, about later versions, urban myths or just outright fallacies. The manuals certainly will not tell me everything, far from it, but what they do say is solid stuff. With this reliable start I can now go to other sources and have some chance of filtering all the other stuff that is Out There. Once filtered, it is probably a lot more informative and easier to digest than the manuals.

I’ll ramble on about that next posting.

New Year, same old rambling thoughts January 5, 2010

Posted by mwidlake in Blogging, Perceptions.

It’s not Friday but, heck, it’s a New Year; there are many of us who might appreciate a non-techie, pointless ramble at the start of the first full working week of a new decade…A Friday Philosophy for the New Year. {If anyone wants to point out the New Decade starts on 1st Jan 2011, go take a running jump – popular opinion is against you, even if logic is for you}.

I found the UKOUG techie conference this year particularly interesting as it was the first major meeting I have been to since I started blogging, and I came across two main opinions about my attempts:

Those who like my blog as it is “chatty” and rambles a bit.
Those who dislike it – because it is “chatty” and rambles a bit…
{oh, and the third opinion, the most common, of utter ignorance of my blog – there goes the ego}.

Well, you can’t please everyone. I was a little saddened, however, as I spoke to a couple of people I really admire in the Oracle Knowledge world and they landed on the “chatty and rambling – bad” side of things. Damn. But they are so good at what they do, I forgive them. The swines.

But then I remembered what I said to a fellow blogger the other month. We bloggers/twitterers all put forward what we blog about in our own style. We might not blog something that is new, we might blog something that is “well known”, but we put it in our own style. Some like it, some do not. It matters not, so long as it adds to the sum of decent knowledge out there.
Some will hate our style and not read, some will read and enjoy. So long as the information gets out there to more people, that is fine.

So, do I think everything I blog is decent knowledge? Oh, I wish. I like to think it is mostly there {and I wish it was all correct} but I am realistic. I test most of what I blog, or I have lived for real most of what I blog, but I will make mistakes. And sometimes I will hit the edge of something good and I put it up there in the hope others will contribute {like the recent one on translating min-max column values into human readable stuff}. And often people do contribute and that is really, really good.

But I do and will continue to make mistakes, be daft, or just put things poorly. I have learned a fair bit in the last 8 months about written communication, the art of communicating to a global audience and also about how not to spread a topic over several weeks as you hope you can “just finish off those half-written blogs in an hour or two” and find it takes weeks. If anyone wants to give me any constructive criticism, please do, but maybe use my email (mwidlake@btinternet.com) rather than flame my postings.

And my last rambling thought for the start of 2010? I am probably going to post less in the next 6 months. I am always sad when the blog by someone I enjoy goes very quiet, but then we all have real jobs to do, so I try to be patient. In my own case, I have noticed I now read a lot less of other people’s blogs as writing my own takes so long. And I am missing too much. There are blogs I really admire or I have discovered in the last 6 months (sometimes both) that I simply fail to really read and they know stuff. So I need to read them. I am going to try and maintain a steady 2-3 blog entries a week, but for the next 6 months I am going to concentrate on learning. Something blogging has taught me is I am really quite ignorant 🙂

Good wishes for 2010 to all and everyone who stumbles across my ramblings.

Friday Philosophy – Memory (of the Human Kind) November 14, 2009

Posted by mwidlake in Perceptions, Uncategorized.

One of the joys of having been around for a while (I’ve hit that point where I occasionally find myself working with people who were not born when I got my first job in IT) is that you start to re-learn things. Well, I do. {I have to point out that, this being a Friday Philosophy, you will learn nothing yourself, especially not about Oracle, from this post. If you want Oracle knowledge you might want to pop by Dion Cho, Tanel Poder or Richard Foote}

Several times over the last couple of weeks I have learnt something “new” and realised a little while later that I already knew it, but had forgotten (BLEVEL is one; consistent gets being impacted by array fetch size is another). This is not something to be proud of – not knowing something you once did is far worse, and slightly worrying, than never having known it…

I like to think that it is all the new stuff I am learning, pushing previous knowledge out of my brain’s buffer cache {if it is not warmed by re-using it}, but I suspect that it might just be an uptime-related memory leak. A bit like having your database open for a few months and doing a lot of work that includes massive PL/SQL in-memory tables and hash joins (pickler fetches). No one seems sure of the exact mechanics, but after a while you have a hell of a lot less spare memory in your server than you started with 🙂

Maybe the memory loss is closer to the “pickler query” analogy than I thought – you can preserve soft tissue for a long time in alcohol. I’ll have another glass of wine and think on that.

This Forgetting Stuff was actually a major factor in my starting a blog. I was in the process of putting a load of notes on Oracle Oddities and things I had learnt from years past onto a web site so I could get at them from wherever I was in the world, and a friend told me I was being stupid – I should put them on a blog. So I did. There are only two little issues with this.

  • I can’t remember what I have already put on my blog.
  • I’ve forgotten where I put the ZIP disk with the notes on.

So I was contemplating this drift of knowledge and two things struck me.

1) I reckon that the very best people, in any area of expertise, are blessed with excellent memories. There is a comedian in the UK called Stephen Fry, and he is renowned for being stunningly clever. I think he probably is stunningly clever, but he also remembers everything he has learnt (Rory McGrath is another UK comedian with a perfect memory, but somehow he lacks the charm of Stephen Fry, and he has a beard, so not so highly renowned).
2) My poor memory is not maybe such a bad thing. I don’t have to get so upset when a whole chunk of my Oracle knowledge becomes obsolete. I used to be really good at sizing objects and utilizing space, taking into account the 5-block minimum, getting the extents to fit nicely into the datafiles, little scripts to resize segments into a small but sensible number of extents to reduce wasted space, considering initrans, pctfree, pctused… Pretty much all pointless now 🙂

Buffer Cache Hit Ratio – my “guilty” Confession November 1, 2009

Posted by mwidlake in Perceptions, performance.

My Friday Philosophy this week was on Rules of Thumb on buffer gets per row returned.

Piet de Visser responded with a nice posting of his own, confessing to using ratios to help tuning {We seem to be playing some sort of blog-comment tag team game at the moment}.

Well, I have a confession so “guilty” or “dirty” that I feel I cannot inflict it on someone else’s blog as a comment.

I use the Buffer Cache Hit Ratio.

And the Library Cache Hit Ratio and the other Ratios.

As has been blogged and forum’d extensively, using these ratios is bad and stupid and anyone doing so does not know what they are doing, as they do not help you solve performance problems. I mean, hell, you can download Connor McDonald’s/Jonathan Lewis’s script to set it to what you want, so it must be rubbish {go to the link and choose “tuning” and pick “Custom Hit Ratio” – it’s a rather neat little script}.

The point I am trying to make is that the Buffer Cache Hit Ratio (BCHR) was once wrongly elevated to the level of being regarded as a vital piece of key information, but the reaction against this silly situation has been that it is now viewed by many (I feel) as the worst piece of misleading rubbish. Again, a silly situation.

I think of the BCHR as similar to a heart rate. Is a heart rate of 120 good or bad? It’s bad if it is an adult’s resting heart rate, but pretty good if it is a kitten’s resting heart rate. It’s also probably pretty good if it is your heart rate as you walk briskly. Like the BCHR it can be fudged. I can go for a run to get mine higher; I can drain a couple of pints of my blood from my body and it will go up {I reserve the right not to prove that last one}. I can go to sleep and it will drop. Comparing my resting heart rate to yours (so like comparing BCHRs between systems) is pretty pointless, as I am a different size, age and metabolism to you {probably}, but looking at mine over a year of dieting and exercising is very useful. If only I could keep up dieting and exercising for a year…

So what do I think the much-maligned Buffer Cache Hit Ratio gives me? It gives me what percentage of SQL access, across the whole database activity, is satisfied from memory as opposed to disc. Or, turned around, 100 minus it is the percentage of occurrences where a block has to be got from the I/O subsystem. Not how many blocks are read from storage or memory, though – but you can get that information easily enough. As physical IO is several orders of magnitude slower than memory access {ignoring I/O caches, I should add}, it gives me an immediate feel for where I can and can’t look for things to improve.
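For completeness, the classic instance-wide calculation is a simple query against v$sysstat (this is one of several variants of the formula, and like all of them it covers the whole period since instance startup):

```sql
-- instance-wide BCHR since startup, as a percentage
SELECT ROUND( 100 * ( 1 - phy.value / (cur.value + con.value) ), 2 ) AS bchr
FROM   v$sysstat cur
     , v$sysstat con
     , v$sysstat phy
WHERE  cur.name = 'db block gets'
AND    con.name = 'consistent gets'
AND    phy.name = 'physical reads';
```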

If I am looking at a system that is overall very slow (e.g. high process wait queues under Linux/Unix; the client has said the system is generally slow) and I see that the BCHR is low, say below 90%, this tells me I probably can get some performance increase by reducing physical access. I’ll go and look for those statements with the highest physical IO and the hottest tablespaces/objects in the DB.
If the BCHR is already up at the 99% level, I need to look at other things, such as tuning sorts, looking at removing activity in the database, or being very mindful of nested loop access where maybe it is not the best access method (very likely due to old stats on tables).

When I have got to know a system and what its BCHR generally sits at, a sudden change, especially a drop, means there is some unusual physical IO going on. If the phones start going and someone is complaining “it’s all slow”, the BCHR is one of the first things to look at – especially as it is available from so many places.

Another thing the BCHR gives me: if I am looking at a given SQL statement or part of an application, its specific BCHR can be compared to the system BCHR. This does not help me tune the statement itself, but I know that if its specific BCHR is low then it has unusually high IO demands compared to the rest of the system. Further, reducing it might help the whole system, so I might want to keep an eye on overall system throughput. If I reduce the statement’s execution time by 75% and the whole system IO by 1%, the client is likely to be more happy, especially if that 1% equates to other programs running a little faster “for free”.
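A toy illustration of that comparison (the function names and the “how far below is unusual” threshold here are my own invention for the sketch, not anything from Oracle):

```python
# Hypothetical helper: flag a statement whose own hit ratio sits well below
# the system-wide BCHR, i.e. a statement with unusually high IO demands.

def bchr(logical_reads, physical_reads):
    """Percentage of logical reads satisfied from the buffer cache."""
    if logical_reads == 0:
        return 100.0
    return 100.0 * (1 - physical_reads / logical_reads)

def unusually_io_heavy(stmt_logical, stmt_physical, system_bchr, margin=5.0):
    """True if the statement's BCHR is more than `margin` points below the system's."""
    return bchr(stmt_logical, stmt_physical) < system_bchr - margin

# A statement running at an 80% hit ratio on a 99% system stands out
# as an IO hotspot worth a closer look.
print(unusually_io_heavy(10_000, 2_000, 99.0))  # -> True
```

The point of the comparison is triage, not tuning: it tells you where to look, not what to change.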

So, I don’t use the BCHR to tune individual statements but I feel confident using it to track the general health of my database, so long as I am mindful of the impact of new functionality or upgrades. It’s a rule of thumb. It’s a database heart rate. (And so are redo generation and half a dozen other things.)

When do We Learn #2 October 20, 2009

Posted by mwidlake in Blogging, Perceptions.

I exchanged a couple of mails with a friend a few weeks back about how the same topic can arise in a couple of blogs at the same time. Well, I had just blogged myself on when we learn and, blow me over with a feather, Jonathan Lewis goes and posts in a similar vein. He must be nicking ideas off my blog :-) {and yes, I am being tongue-in-cheek here}. We both posted thoughts about needing spare capacity in your life to be able to spend the time to really understand how something works. Yes, you learn a lot in the heat of a crisis, but you rarely really understand the details, i.e. become an expert, without having time to digest and qualify that knowledge.

I did write a long comment on his posting, including some links back to my own meandering thoughts on the topic, then realised that I would come across as a bit “me too”, so I trimmed it and took out the links. But that is part of why I do my own blog: I found I was spamming other people’s pages with my diatribes and so decided to spam my own. {And I know I am meandering; I’m a bit sleep-deprived, stream of consciousness at the moment}. So here I can refer back to my own stuff and say “me too”, but you are already here reading this, so you only have yourself to blame :-)… Anyway, I wanted to refer back to a very early blog of mine about how much knowledge is enough. I try to make the point that you do not need to know everything; you can become a small-field or local expert just by being willing to learn a bit more.

Jonathan raises the point that he does not have a full-time commitment to one client and so he has the luxury to investigate the details and oddities of what he looks into. He suggests this is a large part of why he is an expert, which I feel is true, and I am very happy to see one of the Oracle Names acknowledging that relative freedom from other pressures is key to having the luxury to chase down the details. Those of us in a full-time role doing, say, DBA, development or design work have more than enough on our workday plates to keep us too busy. We cannot be top experts; we have a boss to satisfy and a role to fulfill. {Jonathan does not mention that choosing a career where you have the luxury of time is also a pretty brave choice – you stand a good chance of earning a lot, lot less whilst working very hard to establish enough of a reputation to be able to earn enough to feed yourself and the cat}.

But this is not a black and white situation. There is room for many of us to become experts in our domain or in our locality. Our breadth of knowledge may never be as wide as others’, and we may not know more than anyone else in a given area {and let’s face it, logically there can only be one person who knows the most about a given topic, and that one person is probably in denial about their superiority, which seems to be a defining quality of an expert – it is not so much humility, I think, as an acknowledgement of there being more to know and a desire to know it}. However, most of us can become the person in our organisation who knows most about X, or who can tie A, B and C together in a more holistic way than others (and that can be a real trick, you know). There are always the top experts that you can call on for the worst problems, but you could become the person people come to first.

My advice would be to not try and learn everything about all aspects of Oracle, because you can’t, but rather learn a lot about one or two areas {and consider areas that are more unusual, not just “tuning SQL” or “the CBO”} and expand just your general knowledge of the wider field. And never forget that there is more to learn. So long as you are taking in more knowledge and understanding, you are improving. The best way to do it? Don’t just read other people’s stuff, try teaching someone else. It never ceases to amaze me how stupid I realise I am when I try and show someone else how something works. But that’s OK, so long as they learn it’s fine. If I learn as well, it’s great, and I nearly always do.

I’m getting on a bit. I think I am finally getting the hang of the idea that the more you know, the more you realise you don’t know. I wish I had known that when I knew nothing.

A Tale of Two Meetings – 11GR2 and MI SIG October 5, 2009

Posted by mwidlake in Meeting notes, Perceptions.
Tags: , ,

Last week I attended two Oracle events, each very different from the other.

The first was an Oracle Corp event, giving details of the new 11GR2 release and what it was introducing. It was in a nice hotel in London with maybe 250 to 300 attendees, and all quite swish.

The other was a UK Oracle User Group meeting, the last Management and Infrastructure SIG for 2009: 30 people in the Oracle City office and far more unassuming {and note, as I chair the MI SIG, anything I say about the day is liable to bias…}.

Both events were useful to attend and I learnt things at both, but I also found the difference between the two quite interesting.

Oracle 11G Release 2

The official Oracle 11GR2 presentation was where you went for the definitive information on what Oracle Corp feel are the new features of 11G R2 that are of interest (though some of it was not R2-specific but general 11G).

Chris Baker started off by telling us “there has never been a better time” to move to the latest technology, or a greater need to gain business advantage through using said latest technology. You know, it would be really nice, just once, to go to such a corporate event and not be given this same thread of pointless posturing. I know it is probably just me being old and grumpy and contrary, but after 20 years in the business I am sick to the back teeth of Keynotes or Announcements that say the same empty “Raa-Raa” stuff as the previous 19 years – the need “now” to get the best out of your technology has been the same need since the first computers were sold to businesses, so give it a rest. Just tell us about the damned technology; we are smart enough to make our own decision as to whether it is a big enough improvement to warrant the investment in time and effort to take on. If we are not smart enough to know this, we will probably not be in business too long.

Sorry, I had not realised how much the Corporate Fluff of constantly claiming “Now is the time”, “Now things are critical” gets to me these days. Anyway, after that there were some good overviews of the latest bits of technology and, following on from them, some dedicated sessions in two streams on specific areas, split between semi-technical and management-oriented talks, which was nice.

There was plenty of talk about the Oracle Database Machine, which appears to be Exadata version 2 and sits on top of Sun hardware, which is no surprise given the latest Oracle acquisition. I have to say, it looks good: all the hardware components have taken a step up (so now a 40Gb/s InfiniBand interconnect, more powerful processors, even more memory), plus a great chunk of memory as Sun’s “FlashFire” technology to help cache data and thus help OLTP work. More importantly, you can get a 1/4 machine now, which will probably make it of interest to more sites with less money to splash out on a dedicated Oracle system. I’ll save further details for another post, as this is getting too long.

The other interesting thing about the new Oracle Database Machine was the striking absence of the two letters ‘P’ and ‘H’. HP was not mentioned once. I cannot but wonder how those who bought into the original Exadata on HP hardware feel about their investment, given that V2 seems only available on Sun kit. If you wanted the latest V2 features, such as the much-touted two-level disc compression, is Oracle porting that over to the older HP systems, are Oracle offering a mighty nice deal to upgrade to the Sun systems, or are there some customers with the HP kit currently sticking needles into a clay model of top Oracle personnel?

The other new feature I’ll mention is RAT – Real Application Testing. You can google for the details but, in a nutshell, you can record the activity on the live database and play it back against an 11g copy of the database. The target needs to be logically identical to the source {so same tables, data, users etc} but you can alter initialisation parameters, physical implementation, patch set, OS, RAC… RAT will tell you what will change.

For me as a tuning/architecture guy this is very, very interesting. I might want to see the impact of implementing a system-wide change, but currently this would involve either only partial testing and releasing on a wing and a prayer, or a full regression test on an expensive and invariably over-utilised full test stack, which often does not exist. There was no dedicated talk on it though; it was mentioned in parts of more general “all the great new stuff” presentations.

Management and Infrastructure SIG

RAT leads me on to the MI SIG meeting. We had a talk on RAT by Chris Jones from Oracle, which made it clearer that there are two elements to Real Application Testing. One is Database Replay and the other is SQL Performance Analyzer, SPA. Check out this Oracle datasheet for details.

SPA captures the SQL from a source system but then replays only the SELECT statements, one by one, against a target database. The idea is that you can detect plan changes or performance variations in just the SELECT SQL. Obviously, if the SELECTs are against data created by other statements that are not replayed then the figures will be different, but I can see this being of use in regression testing and giving some level of assurance. SPA has another advantage in that it can be run against a 10g database, as opposed to RAT, which can only be run against 11g (though the workload can be captured from a terminal 10g or 9i system – that is a new trick).
There are no plans at all to backport RAT to 10, it just ain’t gonna happen guys.

The SIG also had an excellent presentation on GRID for large sites (that is, many Oracle instances) and how to manage it all. The presentation came about as a result of requests for a talk on this topic by people who come to this SIG, and Oracle {in the form of Andrew Bulloch} were good enough to oblige.

The two Oracle Corp talks were balanced by technical talks by James Ball and Doug Burns, on flexible GRID architectures and using OEM/ASH/AWR respectively. These were user presentations, mentioning warts as well as wins. Not that many warts though; some issues with licence daftness were about it, as the technology had been found to work and do its job well. Both talks were excellent.

The fifth talk was actually an open-forum discussion, on Hiring Staff, chaired by Gordon Brown {No, not THAT Gordon Brown, as Gordon points out}. Many people joined in and shared opinions on or methods used in getting new technical staff. I found it useful, as I think did many. These open sessions are not to everyone’s taste and they can go wrong, but Gordon kept it flowing and all went very well.


The difference between the two meetings was striking. Both had strong support from Oracle {which I really appreciate}. Both included talks about the latest technology. However, the smaller, less swish event gave more information and better access to ask questions and get honest answers. There was also almost no Fluff at the SIG; it was all information or discussion, no “Raa-Raa”. But then, the lunch was very nice and there were free drinks after the Corporate event {we shared rounds at a local pub after the SIG event – maybe one round too much}.

I guess I am saying that whilst I appreciate the Big Corporate event, I get a lot more out of the smaller, user group event. Less fluff, more info. Thankfully, Oracle support both, so I am not complaining {except about the “there has never been a better time” bit, I really AM sick of that :-( }.

So if you don’t support your local Oracle user group, I’d suggest you consider doing so. And if, like so many sites seem to, you have membership but don’t go along to the smaller events, heck, get down there! There is some of the best stuff at these SIG meetings.

Friday Philosophy – Disasters September 4, 2009

Posted by mwidlake in Perceptions.
Tags: , ,

Some of you may be aware that I occasionally do presentations called something like:-

“5 Ways to Progress your Career Through Systems Disasters”

The intention of the presentations is to comment on things I have “been in the vicinity of” going wrong in I.T. and ways to avoid them, in a light-hearted manner: having a bit of a laugh whilst trying to make some serious points about project management, infrastructure, teams, people and the powers of chaos.

I’ll see about putting one up on my Web Site so that you can take a look {check back this weekend if you like}.

The talks usually go down well but there are two potential issues in giving the talks:

  • The disasters or problems, if in some way my fault, could make me look like an idiot or incompetent.
  • If it is known who I was working for when I “witnessed a disaster”, it could make that company look bad.

I never used to worry too much about this when I worked permanently for a company that was pretty relaxed about “stuff happens, it is good to share”. After all, if I looked like an idiot then that is fair enough and if I said anything that could be linked back to a prior (or current) employer {which, I hasten to point out, I did aim to avoid} then, well, sorry. But I only said things that were true.

However, when I returned to being self-employed, a good friend of mine took me to one side and suggested such talks could harm my career. I argued that it was not malicious and was helpful to people. My friend argued back that potential employing companies would not look so favourably on this, especially if they suspected that they may one day feature.

Hmmmm…. This was true.

So I toned down the talk…

The next time I did the presentation, the sanitised one, it was not such a hit. In fact, it was a bit rubbish.

The question is, should I have toned it down? Hands up anyone who has not personally done something unbelievably stupid at least once in their working life? Can everyone who has worked for an organisation that has not messed up at least one I.T. project please also raise their hand?

I can’t see any raised hands from here :-)

We all make mistakes.
All companies get things wrong at times.

Something you find when you start presenting or organising events is that the talks people most appreciate and learn the most from are about things going wrong.

So why can’t we all be grown-ups about admitting them, talking about them and learning? Personally, when I have interviewed people for jobs, I am always impressed by someone who will admit to the odd failure, especially if they can show what they learnt from it.

Oh, if anyone is reading this before offering me a position, I never made a mistake in my life, honest. I never deleted every patient record from a hospital information system; I was not even on-site when the incident didn’t happen. And if anyone suggests otherwise, it was a long time ago when it didn’t happen…

{I got all the data back, anyway. Never start work without a backup…}