
Friday Philosophy – Explaining How Performance Tuning Is Not Magic? March 9, 2018

Posted by mwidlake in Friday Philosophy, performance, SQL.

Solving performance issues is not magic. Oh, I’m not saying it is always easy and I am not saying that you do not need both a lot of knowledge and also creativity. But it is not a dark art, at least not on Oracle systems where we have a wealth of tools and instrumentation to help us. But it can feel like that, especially when you lack experience of systematically solving performance issues.

SQL statement be fast!

I recently spent a couple of days with a client discussing how you solve performance issues and I am preparing a talk on exactly that for the upcoming OUG Ireland conference on the 22nd-23rd March. The guys I was talking to at the client are very, very capable production DBAs, but traditionally the developers at their site initially look at any performance issues. If they think it is the database but not something they can fix, they throw it over the fence at the DBAs. It’s not always the database of course and, if it was a simple fix (missing index, obviously inefficient query), then the developers fixed it. So these two guys are expected to only solve the more difficult issues. That’s not really fair as, if you are not practising on the simple problems, how are you supposed to gain the experience and confidence to solve the harder ones?

Anyway, a part of the discussion was about Explain Plans. What does the COST mean in the plan, they asked? They saw it as some number that in an undefined way gave an indication of how expensive the step in the plan was, but they could not link it back to anything solid that made sense to them. It looked like a magically produced number that was sometimes very, very wrong. Like most (good) technical people, they want to know the details of things and how they work, they don’t want to simply accept something as working.

So I took them through some simple examples of plans and of how the COST is just a result of simple maths estimating the number of IOs needed to satisfy the step.

I won’t go into the full details here, but have a look at the plan below; this is how I started:
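(The original screenshot of the plan is not reproduced here, so below is a reconstructed fragment of the sort of plan being described, showing only the steps discussed. The PERSON table is from the example; the index name is an illustrative placeholder and the figures are taken from the walk-through that follows, not from the original output.)

------------------------------------------------------------------
| Id | Operation                    | Name         | Rows | Cost |
------------------------------------------------------------------
...
|  2 |  TABLE ACCESS BY INDEX ROWID | PERSON       |   20 |   23 |
|  3 |   INDEX RANGE SCAN           | PERS_DOB_IDX |   20 |    3 |
------------------------------------------------------------------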

I explained how you read “down” the slope of the plan to the end (so step 3) and then worked back up the slope. So the first thing Oracle does is the index range scan. I showed them the BLEVEL of the index, the number of blocks per indexed value and why Oracle knew it would, on average, need 3 IOs to get the leaf block entries for the provided “DOB = to_date('08-OCT-1934','DD-MON-YYYY')”. Each DOB matched, on average, 20 rows. So the cost of step 3 was passed up to step 2 of accessing the table rows. This would be done 20 times, so the cost was 20+3 = 23.

OK, they could accept that, it made sense. So let’s extend it…

I took the original query against PERSON for a given DOB and now joined it to a second table PERSON_NAME. Why is not important, it’s just a demonstration of a simple table join:
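(Again, the original plan image is not reproduced, so here is a reconstructed fragment of the join plan being described. PERSON and PERSON_NAME are from the example; the index names are illustrative placeholders, the steps above the join are omitted, and the figures are the ones worked through in the text below.)

---------------------------------------------------------------------
| Id | Operation                       | Name          | Rows | Cost |
---------------------------------------------------------------------
...
|  2 |   NESTED LOOPS                  |               |      |      |
|  3 |    NESTED LOOPS                 |               |  100 |  163 |
|  4 |     TABLE ACCESS BY INDEX ROWID | PERSON        |   20 |   23 |
|  5 |      INDEX RANGE SCAN           | PERS_DOB_IDX  |   20 |    3 |
|  6 |     INDEX RANGE SCAN            | PENA_PERS_IDX |    5 |    2 |
|  7 |    TABLE ACCESS BY INDEX ROWID  | PERSON_NAME   |    5 |    7 |
---------------------------------------------------------------------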

Now I explained that as you work “back up the slope of the plan” from the first, most indented step (so from step 5 to 4 to 3), at step 3 there is a nested loop joining the rows coming out of step 4 to the step in line below it, i.e. step 6. They had already seen steps 5 and 4 in our first example: Oracle is expecting to get 20 rows for a cost of 23 (look at line 4). And for each of those 20 rows, it will do a range scan of the index in step 6 and, for each row it finds in the index, collect rows from the joined table in step 7.

So for each of the 20 original rows it does a scan of an index on the joined table for a cost of 2 (I showed them the stats and how this cost is calculated) and expects on average to find 5 matching rows, so it needs to do 5 IOs to the PERSON_NAME table to get those rows. Add that together and that cost of 7 is done 20 times. 7*20 is 140, plus the 23 from the original scan of the PERSON table, so the whole COST is… 163.

Light bulbs came on and they got it! My job here is done.

But it was not. I then went on to explain how it is now hard to get such a simple example. This one is, I think, from an early version of Oracle 11. I told them how histograms on a column will make the estimated cardinality (number of records per given value for a column) more accurate, but harder to work out. I showed them how the cost of a unique index scan is reduced by 1. I explained how Oracle was blind to the correlation of column values unless you took steps to tell the optimiser about it (you know, how for a given value of car manufacturer there will be only a few values of car model, the two values are related)…
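(As a concrete illustration of “telling the optimiser about correlation”: since 11g you can create a column group as extended statistics so the optimiser learns the combined cardinality of two related columns. A minimal sketch, with illustrative table and column names:)

-- Create a column group on the two correlated columns
-- (CARS, MANUFACTURER and MODEL are illustrative names).
select dbms_stats.create_extended_stats(user, 'CARS', '(MANUFACTURER,MODEL)')
from   dual;

-- Regather statistics so the new column group gets its own stats.
exec dbms_stats.gather_table_stats(user, 'CARS')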

Worry was creeping back into their faces. “so it is simple mathematics – but the rules are complex? It’s complex simple mathematics?!?” Err, yes. And with 11 and 12 Oracle will use various methods to spot when the simple, complex mathematics does not match reality and will swap plans as a result…

I think I lost them at that point. Especially when they asked about the SQL Profiles and how they modified Costs… Baselines controlling how plans are used… Bind variables…

That is a real problem with Oracle Performance tuning now. Even something as simple as COST is based on a lot of rules, factors and clever tricks. And they are not the only things controlling which plan is executed anymore.

So I took a step back.

I told them to accept that the COST is really, honestly based on maths, and the expected number of ROWS is too. But the actual, specific values could be hard to totally justify. And it is when the estimated COST and (possibly more importantly) the estimated ROWS go wrong that you have problems. So look out for ROWS of 1 (or at least very low) in a plan for a statement that takes more than a few milliseconds. And for very, very large COSTs/ROWS in the millions or more. And what really helps is if you can get the ACTUAL ROWS as opposed to the ESTIMATED ROWS. Where there is a significant difference, concentrate your focus there. Of course, getting the ACTUAL ROWS is not always easy and is for a later (and properly technical) post.

So, they asked, what if they could not get the actual ROWS and there were no 1s or millions in the plan ROWS/COSTs? How did they know where to concentrate? “Well, you just get a feel for it… do the costs feel reasonable?…”

Damn – I’m not sure I really delivered on my aim of proving Performance Tuning is science and not magic.

Any pointers anyone?

(Update – Vbarun made a comment that made me realise I had messed up the end of this post, I was talking about estimated ROWS and still had the words COST there. Now fixed. And the other thing I was hoping someone was going to suggest as a pointer was – to split the query down to individual tables & two-table joins and *check* how many rows you get back with the where predicates being used. It takes some time but it shows you where the estimates are going wrong.)
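(A minimal sketch of that last suggestion, using the illustrative tables from the example above: run the where predicates against each single table, then against the two-table join, and compare the counts with the ROWS the plan estimated. The PERS_ID join column is a placeholder.)

-- How many rows does the single-table predicate really return?
select count(*)
from   person
where  dob = to_date('08-OCT-1934','DD-MON-YYYY');

-- And how many does the two-table join with the same predicate return?
select count(*)
from   person pe
join   person_name pn on pn.pers_id = pe.pers_id
where  pe.dob = to_date('08-OCT-1934','DD-MON-YYYY');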


Comments»

1. Mathew Butler - March 9, 2018

I think the mechanics of understanding is the science; addressing the problem is more creative.

My thoughts on tuning:

I find that there are a lot of considerations when performance tuning, and deciding which aspects to give most importance requires additional business, application and system context.

In terms of SQL tuning and what is a reasonable optimizer cost value – this is probably the last thing I might look at. I tend to look at this only when I’ve worked out what access path should be used, and then only to compare costs relative to other access paths, e.g. why does the optimizer think this is more expensive than that alternative?

Instead I look at how much work the query has to do (logical IO), and try to come up with a view as to whether this is “efficient”. I think it was Cary Millsap or Jeff Holt who suggested as a rule of thumb that simple joins should only require roughly 10 LIO per row returned. So if there is a larger number of LIOs than expected then I drill in to understand why.
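(For illustration, one rough way to eyeball LIO per row for statements already in the shared pool is something along these lines; BUFFER_GETS and ROWS_PROCESSED are cumulative across executions, so the ratio is only a rough guide for row-returning queries.)

-- Rough logical IO per row returned, per cached SQL statement.
select sql_id,
       buffer_gets,
       rows_processed,
       round(buffer_gets / rows_processed) as lio_per_row
from   v$sql
where  rows_processed > 0
order  by lio_per_row desc;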

Understanding why means looking at the execution plan and the access paths being used, and considering the application context e.g. should the query be exhibiting batch or OLTP behaviour. I like to come up with a test case that allows me to examine the estimated and actual rows processed in each step of the plan, using this as an indicator that the optimizer has made a mistake. I think Wolfgang Breitling originally introduced this idea. If the actuals and estimates differ greatly I then start to investigate the existing statistics, and methods of stats creation to attempt to resolve this. Alternatively maybe some other physical data organisation or index might be more efficient.

In general knowing and understanding the logical and physical data model, the data, and distributions of data relationships helps, sometimes though working with someone else who has this knowledge is sufficient.

There’s also the broader database context – when the query was identified as slow, what did the database look like from a performance perspective? One or more AWR reports give a better feel for what other issues might be present, how the database is configured (parameters), and the type of processing/resource pressures that this database is under.

Overall though, I try to tune with the business goal in mind. Fixing a single SQL query is sometimes good enough. Sometimes, though, there are broader changes needed across the whole processing flow to get the required performance outcome. Sometimes these changes are algorithm changes rather than localised query or index modifications.

Even writing this, I realise I’ve skipped over lots of details.

This is definitely not magic. There are rules that govern the RDBMS and application behaviour, understanding these means that these can then be reasoned about, hypotheses formed and tested.

mwidlake - March 9, 2018

Hi Matthew,

I agree with pretty much everything you say. Exactly how the cost is generated is not that important, but the fact that it is intended to represent IO is – and that Oracle usually gets it right (or, at least, right enough. After all, you are after a good-enough plan, not the perfect plan). But, as you mention, knowing when the cost is wrong (estimated vs actual) can be incredibly useful in working out where the plan is seriously sub-optimal, why the cost is wrong and thus what is giving the optimiser the wrong idea of the data shape.

There is no getting away from the fact that there is an aspect to performance tuning that is down to experience and the pattern not feeling correct. I sometimes find it hard to properly justify that feeling – and that tells me that I’m not understanding enough about that situation.

The logical IOs per row figure is very useful, I use that a lot. I have a break point of about 4 tables in the join, when I move from 10 LIOs per row to 50 and then 100. DW queries, usually.

What is more of an… art?… than a science is choosing a solution once you know the problem. Sometimes the obvious technical solution is not allowed for business reasons (not allowed to change the database for a 3rd party application, not even an index). Sometimes the business is so risk averse they won’t let you change anything – but insist you fix the problem anyway!

And sometimes it is not a problem! They noticed the slow code as someone looked, but the actual business is not suffering.

mathewbutler - March 9, 2018

I think the “not feeling correct” is that your brain has pattern matched based on experience subconsciously and is flagging that there’s something amiss. This happens to me, and when it does I take it as a marker to try and unpick what’s going on to try and understand what’s behind the feeling.

I love all this stuff. Making stuff go fast feels great. The learning and understanding that comes with the work is also fun. Proving to yourself that you really understand the way something works is great, and so are those moments when you realise that the rabbit hole goes deeper and there is more to learn.

I agree about the cost. This has become more relevant with workload stats. I do look at the cost. It’s important to grasp that it can be understood. It’s just not always where I go first – though clearly a massive cost is a big red flag.

Liking these philosophy posts.

Have a good weekend.

2. Sayan Malakshinov - March 10, 2018

I think programming experience helps a lot in performance tuning: when you know CS fundamentals (algorithmics, how compilers work), have good experience in multithreading and concurrency, and have optimized and debugged your own apps, it becomes much easier to understand how to read execution plans, what the IO (or CPU) cost is, and the cost-based and heuristic transformations/decisions.
The more experience we have, the faster and more effectively we can tune (and the better the heuristic decisions we can make to speed up our performance tuning processes) 🙂

btw, I like to start describing execution plans with v$sql_plan/plan_table, because dbms_xplan concatenates OPERATION and OPTIONS. It’s much easier to describe each line of the plan as a call to some function that starts and stops child plan steps/functions and processes their results, where OPERATION is a human-readable function name and OPTIONS is a main parameter that specifies the algorithm/subfunction used.
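(For example, something along these lines shows OPERATION and OPTIONS as separate columns, one row per plan step; the &sql_id substitution variable and child number 0 are placeholders.)

-- One row per plan step, with OPERATION and OPTIONS kept separate.
select id, parent_id, depth, operation, options, object_name, cost, cardinality
from   v$sql_plan
where  sql_id = '&sql_id'      -- placeholder for the statement of interest
and    child_number = 0
order  by id;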

ps. I like to show real function names using Tanel’s os_explain script: https://blog.tanelpoder.com/files/scripts/tools/unix/os_explain
for example:
s/qertb/TABLE ACCESS: /g;
s/qerix/INDEX: /g;
s/qerhj/HASH JOIN: /g;

3. vbarun - April 7, 2018

Cost is mainly influenced by cardinality estimates.
Thus, focus on estimated vs actual number of rows returned by each row source.
And that is easy to see (e.g. in a SQL Monitor report).

mwidlake - April 7, 2018

Yes – if you have the licence for SQL Monitor (Enterprise Edition with the SQL Tuning pack on top), and you can access it, and the code runs for enough seconds to be caught, and you look for it close enough in time to it running… It’s brilliant.
I can’t do better than to point to This post by Maria Colgan as a starting point.

Something else your comment makes me realise – at the end of my post I am talking about ROWS not COST and I still use the word COST. I’ve changed it now but I suspect that might have misled some people :-(. Bad Martin.

It is when the estimated ROWS (cardinality) is very different to the actual ROWS that the optimiser tends to get the plan very, very wrong. Then you have to work out why it is wrong and what your options are to fix it.

vbarun - April 7, 2018

For completeness, SQL Monitor is just one way to access estimated vs actual cardinality. Alternatives include sql trace and dbms_xplan.display_cursor ( for details see https://blogs.oracle.com/optimizer/how-do-i-know-if-the-cardinality-estimates-in-a-plan-are-accurate).
Of course, each approach has its advantages and disadvantages.

mwidlake - April 7, 2018

Thanks, yes, always good to mention how you see the plan statistics (Actual rows).

For DBMS_XPLAN with the “allstats…” format option to work you must also enable the gathering/preserving of the actual rows info with one of:
– Hint the SQL statement to gather stats with /*+ GATHER_PLAN_STATISTICS */ (needs access to change the code)
– Set statistics_level to ALL (has an impact on the whole system)
– Use the underscore parameter _rowsource_execution_statistics = true (never use underscore params on prod without Oracle Support saying you can)

If you cannot do any of the above – google “Kerry Osborne SQL_Profile gather_plan_statistics”. It’s cool 🙂
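(For completeness, a minimal sketch of the first option: hint the statement, run it, then pull the plan with estimated vs actual rows in the same session. The table and predicate are the illustrative ones from the post.)

-- Run the statement with rowsource statistics gathered:
select /*+ GATHER_PLAN_STATISTICS */ count(*)
from   person
where  dob = to_date('08-OCT-1934','DD-MON-YYYY');

-- Then show E-Rows (estimated) next to A-Rows (actual) for the last run:
select * from table(dbms_xplan.display_cursor(format => 'ALLSTATS LAST'));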

