jump to navigation

Partitions are Not {just} for Performance October 29, 2009

Posted by mwidlake in Architecture, performance, VLDB.
Tags: , , ,
trackback

There is a myth that Partitions in Oracle magically aid SQL Query performance, even a general assumption that the main role of partitioning is to aid such query performance. This is not so. There are certainly many cases where Partitioning can aid performance but in my experience they just as often hinder rather than help overall performance.

The myth is so strong that when asked at interview, most DBAs {and even some performance specialists} will site query performance as the main (and occasionally only) benefit of partitioning.

Why can partitions hinder SQL query performance? Let’s say for example that you have a 10 million row table and an index on that table. That index has a B-Level of 3, which means it has a root block, one level of branch nodes, a second layer of branch nodes and then the leaf-node level. (NOTE the below diagram shows B-level as 4 on the left of the diagram. That is because it is taken from a presentation I do where I make the point that if the data dictionary says an index has a B-level of 3, there are 4 levels. 3 “branch”, so B, and then the 4th “Index leaf entries”. The titles on the right indicate this). To access a row in the table via the index Oracle needs to read the root block, two branch blocks and then a leaf block to get the rowid of the record. This allows the table block {and from that the row} to be read. This is depicted in the below diagram, the numbered orange squares represent the blocks as selected in turn from the index and then table:

index_access_global

That is 5 I/O operations to access that row.

Now we partition the table into 5 partitions, 2 million rows each. The index is locally partitioned. If you are lucky, you may, just may, end up with local index partitions with a B-level 1 less then the original table, so in our case a B-level of 2. The often suggest process is now that one partition is considered and that the CBO will read one root node, a branch level block, a leaf block and then the block from the partition.

single_index_partition_access

4 I/Os and a saving of 20% {I’m ignoring for now caching and whether it is physical or logical IO}.

A saving of 20% IF {and only if} you have local index partitions with a lower B-Level than the equivalent non-partitioned index. And the access is to one partition alone.

However, I keep seeing situations where the index look up does not include the partition key. So you get the below situation:

index_lookup_partition

Lacking the partition key, the CBO cannot exclude any partitions – so it has to scan each one. For most partitions, maybe for all but one, no records are found, but the index has to be checked with 3 IO operations each. so in my example 16 I/Os are done to get the one record of interest.

16 I/O operations instead of 5. I numbered them in the diagram.

The situation is often not spotted, at least initially, as the time taken to carry out the extra local index partition scans is “small”, especially for specific lookups. Usually any testing is done on a table with only a small number of partitions.

I remember well the first time I came across this {on an Oracle 9.0 database I think}, there was well over 100 partitions and a process that checked many thousands of individual records had slowed down significantly, taking 4 or 5 times as long as before.

An indicator that the problem is occurring is when a single record lookup against the index and then table is taking perhaps several dozen to several hundred consistent gets rather than the 5 or 6 it should. Also, you will see the partition iterator in the explain plan. In that first case where I came across the issue, consistent gets of about 380 per record fetched were being seen for a select that returned 1 record 99% of the time from a single table lookup. I’ve since seen the same problem on tables with over a thousand partitions, each local index being scanned for a total of almost 5000 consistent gets per record.

You may think that this would be an unusual situation as access against very large tables is either full/partial table scans or lookups on the PK/with a WHERE clause including the partitioning key – but it is actually very common. Partitioned tables are being used more and more in large but generally OLTP applications or sets of records are identified in a datawarehouse that are then checked more specifically with generally row-specific logic. With VLDBs which have many, many partitioned tables with many, many partitions each, the problem is common and often not recognized in the fog of other issues and massive I/O levels.

I’ve only covered a general performance issue with partitions here, I’ll expand on the theme and this simplistic example in the next post.

And yes, there are many ways partitioning CAN aid performance. I aim to get to those too. I really love partitioning.

Comments»

1. Bernard Polarski - October 29, 2009

My 2 reasons for partitioning:

– Manage table purge through drops partitions rather than delete statements ( Improve performance by decreasing load)
– Achieve partition pruning for SQL (Improve performance)

So performance is my only motive and the case you describe is a must to avoid. You have said nothing about the table hash partitioning, which bring nothing in the management area, but only exists for the sake of Performance.

2. mwidlake - October 29, 2009

Hi Bernard,

Good points and ones I will be coming to. I see dropping (and truncating) partitions as opposed to delete statements as part of the true Best Reason for partitions – ease management). And the hash partitions as a useful feature once you have though long and hard about it.

The title of the blog is of course over-selling my point, but I really do want to make people realise that partitions only improve performance of you are careful (or lucky) and worsen performance if you are unlucky (or not careful)

3. PdV - October 29, 2009

Nice one Martin.

Good to read that Partitioning is not the solution to all performance-problems.

And the things to watch out for are indeed Partition-Iterator(all), or the scanning of a complete (index-)partition.

The Actual benefits from Partioning are probably in the partition-exchange capabilities. And only for those who can avoid (or afford) the rebuilding of global indexes.

Since you mention the 380 gets per row-returned, I’d be tempted to digress about ratios. 380 is too much (imho, and it all depends of course). Maybe Later.

4. Log Buffer #167: a Carnival of the Vanities for DBAs | Pythian Group Blog - October 30, 2009

[…] Widlake illustrates that partitions are not for performance: “There is a myth that Partitions in Oracle magically aid performance, even a general […]

5. Jeff Moss - November 3, 2009

I note your response to Bernard, that your title is overselling the point, but the post does seem to be quite negative towards the role partitioning can play in performance – yes, like in your example, it can hinder, when bad design choices are made, but more often than not, in my experience, it improves performance significantly.

Partitioning doesn’t have a “main role” – they can help with many aspects of database systems, including scalability, availability, manageability…and, of course, performance.

I don’t think it’s correct to say that Performance is the main reason for partitioning, nor the only one, however, I don’t think it would be fair to pick out any one system characteristic as being the main goal of partitioning…features such as partitioning are so valuable because they are multi faceted and offer enormous opportunities for system efficiency/design wins, on many levels.

I find it strange that your experience suggests that partitioning just as often hinders, rather than helps, performance – I notice you’ve worked on some large systems including data warehouses, well, in the words of Tim Gorman, such systems are doomed to almost inevitable failure, without the use of partitioning…because without partitioning, they can’t scale, can’t be managed and can’t perform.

The example you provide is valid, however, I wouldn’t say it was common – I saw it on one system at a previous client and, after I diagnosed it and explained the problem, things were redesigned and I’ve not seen it since on any other database…maybe I’m lucky that I’ve seen only one occurrence of partitioning causing a performance issue in ten years of using the feature and perhaps it’s no coincidence that the scenario where partitioning causes a performance problem is the same one I’ve seen – perhaps such scenarios are few and far between?

Cheers
Jeff

mwidlake - November 4, 2009

Hi Jeff,

Thanks for the considered and broad comment.

Maybe my title is too strong, but I wanted to make a point. Maybe I tried too hard…

I agree with you (and I think my first post does say this) that partitioning can give performance benefits. And, as Tim Gorman and yourself have pointed out, VLDBs are pretty much unattainable without the use of Partitions. I feel very strongly that partitioning has a wealth of uses and that performance is just one of them. The performance benefits can be very significant but I have found the management, maintenance and compartmentalization aspects of partitions to be their most useful.

One can only go on the experience you have had personally unless other put their experiences “out there”, which is partly where my blog fits in. In my experience I have come across the situation I described half a dozen times and I suspect it existed but went by me a couple of times before I caught in to it ( I can’t go back and check those systems, sadly). You have experienced it only once in 10 years, so maybe I have been unlucky? Chances are, it is somewhere between our two experiences (unless someone else comments to say that they have never seen it in 100 systems or their whole career is based on fixing this problem).

What does seem to be to be common, though, and what prompted the title of my fist posting on Partitions, is that amongst developers and DBAs, there is a belief that Partitioning is “for performance” and it seems at times to have been sold as that. So I want to challenge that and make non-partitioning experts aware that partitioning is not always going to help performance, that sometimes it will reduce performance and sometimes it will greatly aid performance. Further to that, there are all the other uses of partitioning, which you and I know about but many others may not.

So, I apologise for the negative title and take on this post, but hope I can balance it by the next half dozen where I plan to explore partitioning far more. And please pick me up if I appear to be way off line, it’s always good to be kept in check.

6. Jeff Moss - November 6, 2009

Hi Martin

No apology necessary…creating debate/healthy discussion is always useful in my view.

I’m not saying you’re way offline…just that your post was a little negative, in my view, towards the concept of performance relating to the use of partitioning.

I think what I don’t like is the way you’re separating the concept of performance from that of maintenance/manageability…they are related…if you don’t use partitions in a DW, where appropriate, then you will create an unmanageable system which can not perform when it tries to do any maintenance. If you use partitions you can (read should) use partition exchange loading which will give you a very manageable system which scales…and because it does so, it will perform.

Is partitioning giving you manageability or performance there?

I’d suggest it’s giving both and that in most cases performance increases from the use of partitioning except for specific examples like you showed.

Cheers
Jeff

mwidlake - November 7, 2009

Hi Jeff,

I’m a little behind in responding, I’m squeezing in the blog whilst on vacation, “away from IT” 🙂

The point you raise above about performance of the maintenance operations in the comment above and in your comment on the next post are very, very valid – annoyingly so {I’m laughing as I type this, I’m not really annoyed 🙂 } I’m starting with waking people up to the issues with partitions and classic “Performance”, what most people seem to concentrate on – SQL focuses, maybe considering other DML. The I’m going to lead into the Performance and Architecture advantages of Partitons and how wider Performance issues are where Partitions {in my opinion} are the most significant and then lead into the classice Performance arena again, how you can massively improve DW and even OLTP performance with Partitions.

So bear with me if you can and you are spot in with your comments – just way ahead of the game 🙂

Thanks,

Martin

7. Jeff Moss - November 9, 2009

Martin

Being way ahead of the game makes a change for me…I’ll enjoy it while it lasts!

Cheers
Jeff

8. Blogroll Report 23/10/2009-30/10/2009 « Coskan’s Approach to Oracle - November 10, 2009

[…] 11-How partitioning can cause bad performance on wrong partition implementations? Martin Widlake-Partitions are Not for Performance […]

9. Friday Philosophy – Oracle Performance Silver Bullet « Martin Widlake's Yet Another Oracle Blog - August 5, 2011

[…] queries to run faster and aids parallel activity. But sometimes (more often than you might think) Partitioning can drop rather than increase query or DML performance. In earlier versions of Oracle setting optimizer_index_caching and optimizer_index_cost_adj was […]

10. Friday Philosophy – Oracle Performance Silver Bullet « Ukrainian Oracle User Group - August 6, 2011

[…] queries to run faster and aids parallel activity. But sometimes (more often than you might think) Partitioning can drop rather than increase query or DML performance. In earlier versions of Oracle setting optimizer_index_caching and optimizer_index_cost_adj was […]

sabari - August 17, 2012

partition is worth for database table’s are not???

mwidlake - August 17, 2012

The answer is “it depends”. They can help SQL query and DML performance *IF* the partitions reduce the volume of data that Oracle has to scan. They hinder performance if they split the table into lots of separate chunks that Oracle has to scan each one of.

I still think that the main reason for partitioning is actually to help the performance of table maintenance. If you want to hold 100 million rows then, if held in a normal table, then this table will probably be too big to look after. Think how long and how much memory it would take to create an index on a table that large. By using partitions you can create the index in chunks.

I would almost certainly not create a table over 100GB without partitioning it. I would almost certainly not create a data warehouse without using partitioning. But you have to plan how you use it.

11. Abhishek Kumar - January 16, 2013

hi mwidlake,
I am having prolems in understandig your Point. The local index will have to have a partitioned key as part of the if its unique .Only in nonunique indexes the index will be withoutv a partitioning key. So Will your scenario of adverse performance be applicable only in the cas eof non unique indexes. As even if the lookup is done on an index where partitioning key is not involved in the lookup and index skip scan be done in case of unique indexes locally on each partition and performance will be better.

mwidlake - January 17, 2013

Hi Abhishek

I think you mean that the local index will have to include the partitioning key as part it – if the index is unique, and as such unique key lookups will not suffer from this problem

That is true. But many indexes on partitioned tables are NOT unique and in fact do not include the partition key in them. There is an argument {that I won’t expand on here} that any local indexes should NOT contain the partition key as most of the benefit of limiting a query with it is achieved via the partitioning. If the lookup is on the whole unique key or includes the partition key then, yes, only one partition will be considered.

But what I was seeing back in 2009, and continue to see all the time, is access either via an index that is not unique and does not contain the partitioning key OR access via part of the primary key which does not contain the partitioning key. eg let’s say a table of ACCOUNTS is partitioned on CREATED_DATE and the unique key is on ACCOUNT_TYPE, ACCOUNT_ID and CREATED_DATE. A where clause that states only the ACCOUNT_TYPE and ACCOUNT_ID will probably result in a range scan of the unique index – for each partition. Because Oracle lacks the CREATED_DATE to identify the correct partition.

Thus even use of a unique index can suffer from this problem.

I don’t know if the above example and the article as a whole seems to describe an unusual situation but, unfortunately, it is very very common. I find it difficult to think of a client I have been to in the last 6 years that makes heavy use of partitioning where I have not seen this issue.

I hope that helps Abhishek

12. Query Performance On Partitioned Tables – petersnotebook - February 1, 2017
13. Partitioning InnoDB tables based on time-based pseudo-sequential UUIDs - Too Many Afterthoughts - September 25, 2022

[…] Partitioning has multiple uses – spreading load onto multiple disks, cold storage of older data on cheaper disks, and probably others. Most importantly though, partitions are not for performance. […]


Leave a reply to Jeff Moss Cancel reply