
Friday Philosophy – Human Tuning Issues September 23, 2011

Posted by mwidlake in Friday Philosophy, humour, Perceptions, performance.

Oracle Tuning is all about technical stuff. It’s perhaps the most detail-focused and technical aspect of Oracle Administration there is. Explain Plans, Statistics, the CBO, database design, Physical implementation, the impact of initialisation parameters, subquery factoring, SQL profiles, pipelined functions… To really get to grips with things you need to do some work with 10046 and 10053 traces, block dumps, looking at latching and queueing…

But I realised a good few years ago that there is another, very important aspect and one that is very often overlooked. People and their perception. The longer I am on an individual site, the more significant the People side of my role is likely to become.

Here is a little story for you. You’ll probably recognise it, it’s one that has been told (in many guises) before, by several people – it’s almost an IT Urban Myth.

When I was but a youth, not long out of college, I got a job with Oracle UK (who had a nice, blue logo back then) as a developer on a complex and large hospital system. We used Pyramid hardware if I remember correctly. When the servers were put in place, only half the memory boards and half the CPU boards were initiated. We went live with the system like that. Six months later, the users had seen the system was running quite a bit slower than before and started complaining. An engineer came in and initiated those other CPU boards and Memory boards. Things went faster and all the users were happy. OK, they did not throw a party but they stopped complaining. Some even smiled.

I told you that you would recognise the story. Of course, I’m now going to go on about the dishonest vendor and what was paid for this outrageous “tuning work”. But I’m not. This hobbling of the new system was done on purpose and it was done at the request of “us”, the application developers. Not the hardware supplier. It was done because some smart chap knew that as more people used the system and more parts of it were rolled out, things would slow down and people would complain. So some hardware was held in reserve so that the whole system could have a performance boost once workload had ramped up and people would be happy. Of course, the system was now only as fast as if it had been using all the hardware from day one – but the key difference was that rather than having unhappy users as things “were slower than 6 months ago”, everything was performing faster than it had done just a week or two ago, and users were happy due to the recent improvement in response time. Same end point from a performance perspective, much happier end point for the users.

Another aspect of this Human side of Tuning is unstable performance. People get really unhappy about varying response times. You get this sometimes with Parallel Query when you allow Oracle to reduce the number of parallel threads used depending on the workload on the server {there are other causes of the phenomenon such as clashes with when stats are gathered or just random variation in data volumes}. So sometimes a report comes back in 30 minutes, sometimes it comes back in 2 hours. If you go from many parallel threads to single threaded execution it might be 4 hours. That really upsets people. In this situation you probably need to look at whether you can fix the degree of parallelism at a level that gives a response time that is good enough for business reasons and can always be achieved. OK, you might be able to get that report out quicker 2 days out of 5, but you won’t have a user who is happy on 3 days and ecstatic with joy on the 2 days the report is early. You will have a user who is really annoyed on 3 days and grumbling about “what about yesterday!” on the other 2 days.
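
To put rough numbers on that, here is a minimal sketch in Python. It assumes run time scales perfectly with the number of parallel threads, which real Parallel Query never quite does, and the week of degrees shown is invented purely for illustration. Under that assumption the 4 hour serial run comes out at 30 minutes with 8 threads and 2 hours with 2, which matches the figures above.

```python
SERIAL_MINUTES = 240  # the "4 hours" single threaded case


def run_minutes(serial_minutes: float, dop: int) -> float:
    """Idealised run time: the work divides evenly across dop threads."""
    if dop < 1:
        raise ValueError("degree of parallelism must be at least 1")
    return serial_minutes / dop


def spread(times):
    """Gap between the best and the worst day."""
    return max(times) - min(times)


# Hypothetical week where the degree is allowed to drop under load.
adaptive_week = [8, 2, 8, 2, 1]
# The same week with the degree fixed at a level that is always available.
fixed_week = [4] * 5

adaptive_times = [run_minutes(SERIAL_MINUTES, d) for d in adaptive_week]
fixed_times = [run_minutes(SERIAL_MINUTES, d) for d in fixed_week]

print("adaptive:", adaptive_times, "spread:", spread(adaptive_times))
print("fixed:   ", fixed_times, "spread:", spread(fixed_times))
```

The fixed week is never the fastest, but it is the same every day, and that is what the user notices.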

Of course this applies to screens as well. If humans are going to be using what I am tuning and would be aware of changes in performance (ie the total run time is above about 0.2 seconds) I try to aim for stable and good performance, not “outright fastest but might vary” performance. Because we are all basically grumpy creatures. We accept what we think cannot be changed but if we see something could be better, we want it!

People are happiest with consistency. So long as performance is good enough to satisfy the business requirements, generally speaking you just want to strive to maintain that level of performance. {There is one strong counter-argument in that ALL work on the system takes resource, so reducing a very common query or update by 75% frees up general resource to aid the whole system}.

One other aspect of Human Tuning I’ll mention is one that UI developers tend to be very attuned to. Users want to see something happening. Like a little icon or a message saying “processing” followed soon by another saying “verifying” or something like that. It does not matter what the messages are {though spinning hour glasses are no longer acceptable}, they just like to see that stuff is happening. So, if a screen can’t be made to come back in less than a small number of seconds, stick up a message or two as it progresses. Better still, give them some information up front whilst the system scrapes the rest together. It won’t be faster, it might even be slower over all, but if the users are happier, that is fine. Of course, Oracle CBO implements this sort of idea when you specify “first_n_rows” as the optimizer goal as opposed to “all_rows”. You want to get some data onto an interactive screen as soon as possible, for the users to look at, rather than aim for the fastest overall response time.
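
The "something on screen early" idea can be sketched in a few lines of Python. This is only an analogy for what the user sees, not a model of how the CBO picks a plan: the row source, the work counter and the page size of 10 are all made up for illustration.

```python
from itertools import islice


def fetch_rows(total, counter):
    """Yield rows one at a time; counter[0] counts rows produced so far."""
    for i in range(total):
        counter[0] += 1  # stand-in for the cost of producing one row
        yield {"id": i}


def all_rows_screen(total, page_size=10):
    """Build the whole result first, then show the first page."""
    counter = [0]
    rows = list(fetch_rows(total, counter))
    return rows[:page_size], counter[0]


def first_rows_screen(total, page_size=10):
    """Show the first page as soon as enough rows have arrived."""
    counter = [0]
    page = list(islice(fetch_rows(total, counter), page_size))
    return page, counter[0]


page_all, work_all = all_rows_screen(1000)
page_first, work_first = first_rows_screen(1000)
print("rows produced before the first page appears:", work_all, "vs", work_first)
```

Both versions put the same first page in front of the user; the difference is how much work has to finish before they see it.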

After all, the defining criterion of IT system success is that the users “are happy”, ie accept the system.

This has an interesting impact on my technical work as a tuning “expert”. I might not tune up a troublesome report or SQL statement as much as I possibly can. I had a recent example of this where I had to make some batch work run faster. I identified 3 or 4 things I could try and using 2 of them I got it to comfortably run in the window it had to run in {I’m being slightly inaccurate, it was now not the slowest step and upper management focused elsewhere}. There was a third step I was pretty sure would also help. It would have taken a little more testing and implementing and it was not needed right now. I documented it and let the client know about it, that there was more that could be got. But hold it in reserve because you have other things to do and, heck, it’s fast enough. {I should make it clear that the system as a whole was not stressed at all, so we did not need to reduce system load to aid all other things running}. In six months the step in the batch might not be fast enough or, more significantly, might once more be the slowest step and the target for a random management demand for improvement – in which case take the time to test and implement item 3. (For those curious people, it was to replace a single merge statement with an insert and an update, both of which could use different indexes).
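
For the curious, the shape of that last change can be sketched as follows. This is not the client's code: the table names are made up, and it uses Python's built-in sqlite3 module as a stand-in for Oracle simply so that it runs anywhere. It shows only that an update of the matching rows followed by an insert of the missing ones leaves the same data a merge would, and says nothing about indexes or timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target  (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE staging (id INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO target  VALUES (1, 10), (2, 20);
    INSERT INTO staging VALUES (2, 25), (3, 30);
""")

with conn:  # both statements commit together, or not at all
    # Step 1: update the rows that already exist in the target.
    conn.execute("""
        UPDATE target
           SET amount = (SELECT s.amount FROM staging s WHERE s.id = target.id)
         WHERE EXISTS (SELECT 1 FROM staging s WHERE s.id = target.id)
    """)
    # Step 2: insert the rows that are only in staging.
    conn.execute("""
        INSERT INTO target (id, amount)
        SELECT s.id, s.amount
          FROM staging s
         WHERE NOT EXISTS (SELECT 1 FROM target t WHERE t.id = s.id)
    """)

rows = conn.execute("SELECT id, amount FROM target ORDER BY id").fetchall()
print(rows)
```

Doing the update before the insert means the freshly inserted rows are not touched a second time.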

I said it earlier. Often you do not want absolute performance. You want good-enough, stable performance. That makes people happy.


Comments»

1. Bernard Polarski - September 23, 2011

I disagree, for the more resources you free, the more room is made for new ‘ideas’ to pop up. Holding back from realising the full potential of a query while you are on its scent may prevent an extreme idea from emerging. New functionality often requires some old process to undergo a significant increase in performance. Achieving the full potential of a tuning session may be the trigger for new functionality which otherwise might never surface.

mwidlake - September 23, 2011

I can’t see that Bernard. I’d be more likely to “disagree with myself” on the basis of it’s best to use as little resource as possible to allow for head room than for new ideas to come up.

For something application-wise to become possible which previously was not, due to performance, you need to be an order of magnitude (10 times) or more faster I think.
As for realising I could do something “interesting” as a result of applying a tuning step I have already identified {and documented}, well I already knew about the idea. I’ve already tuned up the code/application/report from “slow” to “fast” and so the returns are likely to be either diminishing or much harder to get to :-)

One could argue for working on the code more and more to get the very last percentage of performance, even looking at eg how many latches you are using, but that is called Compulsive Tuning Disorder. You have other things that need doing so you move on to them, unless you are studying in your own time.

2. Dom Brooks - September 23, 2011

I’ve found the deliberate mistake you little tease – “first_n_rows” should be “first_rows_n”. Was I first? What do I win?

This article reminds me of the sagacious observation you made here:

http://jonathanlewis.wordpress.com/2009/09/03/queue-time/#comment-34326

mwidlake - September 23, 2011

Dom, you win a grain-based beverage of your choice next time we have beers (Tuesday 27th?). And it was not deliberate, I’m just a bit thick.

And I assume the comment you refer to is the one about speeding things up so much that the damned users keep using it, and thus it makes even more demand on the system than the slow version – not the ones about Doug.

3. Dominic Delmolino (@ddelmoli) - September 28, 2011

Martin, my problem is similar in that I feel like I can only tune something up to the level of its maintainability by the responsible staff.

For example, if to tune a statement I have to engage a level of parallel query, and the onsite staff can’t manage / configure parallel query, I might avoid that as a tuning option. Or if using a PIVOT option makes the query run faster, but the customer doesn’t understand PIVOT. Analytic functions are another example.

I hate to ask them “how much” they understand, because there’s always a database “expert” on site who will end up “maintaining” what I’ve done, and who already resents that I’ve told their manager that they don’t need more hardware / cpu / disk / memory — they think I’ve done my tuning with smoke and mirrors, even though it’s demonstrably faster and uses less resources.

mwidlake - September 28, 2011

Hi Dominic,

Sorry for the delay in your comment appearing, I have not been around to do approvals until now. Any future comments from your email address will come through immediately.

That’s a really good point, I know what you mean. I’ve had to avoid implementing a few things in the past as I realised that the client either did not have anyone who knew enough to look after it or things were so chaotic that any task that had to be repeated at intervals was going to be missed. In such environments, automating such things reliably is also incredibly tricky if time is restricted.

