
Friday Philosophy – To Manage or to Not Manage March 26, 2010

Posted by mwidlake in Friday Philosophy, Management, Uncategorized.
6 comments

Recently a friend of mine, Graham Oaks, blogged about his decision to step back from management and return to the Technical Coal Face.

I made a similar decision 3 or 4 years back, so I have a lot of empathy for his position and his decision. I found that doing the job of a manager takes up a lot more time, effort, patience and emotional energy than I had realised. Team leading is bad enough, having to coordinate the efforts of half a dozen people and sort out the myriad issues they throw your way. Being in charge of multiple teams, responsible for strategy, dealing with staff development and morale, being a buffer against HR and having to deal with the politics created by people who WANT to be managers and wield power is more than a full-time job. Trying to hold onto a technical element as well, I found I could only manage by doing the technical job as a “hobby”, in my own time. It was just too much to keep going year after year.

I had to choose: give up the technical work to leave myself enough personal resource to remain a manager and get better at it, or stop being a manager and start regaining my technical skills. I chose the latter.

Since I made my decision 3 years ago, I have met several people who have made the same conscious decision to step away from management and return to a more technical role. You may stand to earn more as a manager {which is something I objected to before being a manager, and still object to having been one – it should be possible to earn the same doing either}, but for some of us that is not enough to make losing the hands-on work a sacrifice worth making.

One of the points Graham makes in his blog is that his spell as a manager has given him an appreciation of the challenges of management and the particular hells and stresses of the role. I think this is something that people who have never been managers have trouble really understanding.

I was working with a guy a couple of years ago and he was telling me how much of “a Moron” his boss was. In fact, he felt his current boss was even more of a moron than his previous boss. He then confessed that all of his bosses had been morons. “What, every single one of them?” I asked. Yes, absolutely all of them. That struck me as incredibly unfortunate, that every single one of these managers (and he’d had a lot as he moved between teams and positions on a regular basis), most of whom had come up through the technical ranks, were all Morons. I pointed out this unfortunate coincidence and wondered if there might actually be a common factor with all of these managers. He told me there was; They were all Morons.

He himself had never been a manager. He said he was too smart. Not smart enough to get what I was hinting at with the common factor suggestion though.

Obviously, some managers are poor at what they do; there are poor people in every job. But something I took away from my time being a manager is a lack of empathy for anyone saying all managers are a waste of time when they have never done the job themselves.

After all, I doubt there is any job where just doing it means you are an idiot.

Except Sys Admins – They are all idiots :-) (ducks behind server).

Advert for the Management and Infrastructure SIG March 24, 2010

Posted by mwidlake in Management, Meeting notes.
4 comments

I’m a bit late doing this (life is just too busy at the moment) but I want to mention the next Management and Infrastructure Special Interest Group meeting of the UKOUG next week, on Tuesday the 30th, being held in Oracle’s London City office.

I get asked by people what exactly the MI SIG is {honest, I do, I got asked twice this month alone!}. Is it a management meeting or is it another one of the technical SIGs, like the UNIX, RDBMS and RAC/HA SIGs? I’ve struggled to come up with a single line to sum it up, other than to say “both”.

It might be easier to sum up the target audience. The MI SIG is for technical people who need to deal with Oracle as a component of a large IT environment. Most of the audience could knock up a PL/SQL script to create a new set of tablespaces each month, would be able to install Oracle {if given a couple of days and the manuals to peek at} and could explain two-phase commit. Maybe.
But what they have to deal with in their working lives are things like using Grid Control to manage 500 instances, understanding what options there are for providing disaster recovery {if not the exact commands to, e.g., set up physical standby or active/passive RAC}, and knowing enough about storage options to make a sensible decision on which is best for each type of Oracle system they have. So it is a technical SIG, but covering general principles and, well, Infrastructure.

And the Management? Well, when the SIG started this bit was really interesting to me. When you have a lot of IT going on, especially in large organisations, the people looking after Oracle are not the people looking after Networks, or Storage, or Backups or half a dozen other things. And you probably have a team of people doing all that Oracle stuff with/for you. So you have to hire staff and keep ’em happy, and deal with teams you have no power over but need to do stuff for you. And that Management part can be a lot harder than the technology, especially if you never planned on being a manager but just woke up one day with that monkey on your back.

So with the technical aspects of Large IT Infrastructure comes the management component too. The SIG is there for that audience.

I chair this SIG, so I am more than a little bit biased, but I think it is a good line-up of talks for this upcoming meeting. We have two talks on using OEM/Grid Control: one around using it for deploying clusters, one about how you go about integrating it with the likes of LDAP and Kerberos and using the Custom Metrics, i.e. plugging it into the wider infrastructure.

We also have a presentation on the latest/greatest Exadata2, from some Oracle friends.

To wrap up the technical talks, I am going to try and explain some of the guiding principles for gathering statistics for your Oracle databases. Not the details of DBMS_STATS command syntax, but why you need good stats, how you get them and the issues we all seem to end up facing with them.

Balancing the technical side is a talk on Birkman and understanding teams and people.

So, you can see it is a line-up matching the diversity of the SIG’s purpose.

As I said earlier, I initially was very interested in the management side of the SIG and I worried I would be pretty lonely in that opinion. For various reasons, those of us on the technical side tend not to have much time for those “soft skills” we associate with management theory. However, when I took over the SIG over a year ago, I asked the audience if they would want some talks on hiring staff, dealing with people, motivation… Over 60% of the audience said “YES!”. Quite loudly. About 30% said “OK, so long as we get technical stuff as well”. 6% said “over my dead body”.

I think the reason so many wanted the management side as well is, whether we like it or have an affinity for it or not, it is part of the job. And so we need to be able to do it. Personally, I quite like the human side of IT, but my wife tells me I am strange.

If your organisation has UKOUG membership it is free to come along to the SIG (one person per membership, excluding Bronze membership). Anyone can come along for £80. You would be very welcome and I am sure you will learn new stuff. Don’t let the fact that we retire to a pub afterwards, where the chair buys a round, sway your decision to come along at all.

Which Hints to Use March 18, 2010

Posted by mwidlake in performance.
2 comments

When tuning Oracle SQL code, which are the best Hints to use? Are there good and bad Hints? I believe so.

Firstly, I wish Oracle had called Hints something else. As many have commented, Hints are actually directives to the CBO. If the Hint is syntactically correct and can be applied, it will be applied. If you include an “INDEX” Hint to tell the CBO to use an existing index then, no matter how crazy it would be to do so, the CBO will follow the Hint so long as doing so does not prevent the query from being logically possible.

That leads on to one of the reasons I do not actually like Hints {and these reasons lead to my definition of good and bad Hints}. So long as the Hint remains possible, it will be followed. It becomes a vital part of the logic of the code, but Hints are rarely reviewed.
Another reason, often mentioned, is that if new functionality is introduced to the Oracle Optimizer, Hints may prevent it being used {e.g. you might have hinted the use of an index, but now Oracle can convert two other, less selective B-tree indexes into bitmaps and merge them together to get a far more selective path – but the Hint forces the use of the original index}.
The above is possible but is rare compared to a far more common issue – you hinted a specific index be used, but if you now create another, more suitable index for that SQL statement, the new index will not be used; the hinted one will still be used. Similarly, if you drop the index that the Hint references, the Hint is now invalid and the CBO will choose a new access path. You are given no warning when you drop an index that Hints reference {and it would be very tricky for Oracle to do this for you in a reliable way}.
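
To make that concrete, here is a minimal sketch {table, index and column names are invented for illustration}:

select /*+ index(o orders_status_idx) */ order_id, status
from   orders o
where  status = 'OPEN';

-- If someone later adds a better index on (status, order_id), the CBO could
-- use it unhinted, but the Hint above still forces orders_status_idx.
-- Drop orders_status_idx and the Hint silently becomes invalid and the CBO
-- picks whatever path it likes, with no warning to you.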

A final problem with Hints is that the access path may need to change as data volumes and relationships change. When there are very few rows in a driving table, a nested loop access path may make sense. But if that driving table grows and has many more rows in it, then a hash join becomes more efficient. A Hint can fix this path.

I prefer Hints that give the CBO more information but allow it to still choose the path and to vary as data volumes change.

I like the DYNAMIC_SAMPLING Hint as it is generally just telling Oracle to look harder {more intelligently} at the statistical information, at the cost of spending a little longer on the parse. Most systems have OPTIMIZER_DYNAMIC_SAMPLING set at 1 or 2, so by default tables with no stats will have stats gathered. Hinting at levels 3 and 4 instructs the CBO to verify the estimates it has made for predicates and to check for correlation between columns. It is probably the Hint I am most happy using.
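
A minimal sketch of how I might use it {table, alias and column names are invented}:

select /*+ dynamic_sampling(t 4) */ count(*)
from   sales t
where  region  = 'EMEA'
and    channel = 'WEB';
-- at level 4 the CBO samples the table at parse time to sanity-check its
-- estimates, including the combined selectivity of the two predicate columns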

In version 10.1 I encountered lots and lots of issues with the CBO trying to unnest and merge elements of the SQL into the main body of the query. And getting it wrong. Though it is potentially stopping the CBO from examining useful access paths, I do use NO_MERGE and NO_UNNEST quite often and I “like” them as it leaves the rest of the decisions up to the optimizer. You are basically saying “leave that subquery alone and satisfy it in isolation”. I still encounter lots of such issues on 10.2, but I also use UNNEST more often, to push a subquery into the body of the code.
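
As a sketch of the sort of thing I mean {hypothetical tables}, the Hint goes inside the subquery it is protecting:

select c.customer_id, c.name
from   customers c
where  exists (select /*+ no_unnest */ null
               from   orders o
               where  o.customer_id = c.customer_id);
-- "leave that subquery alone and satisfy it in isolation"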

I am happier using a CARDINALITY Hint than USE_NL or USE_HASH, as the CARDINALITY Hint is self-documenting (it says exactly what correction {or lie} you are giving the optimiser). A USE_NL Hint is enforcing an access path and not really saying why.
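
A sketch of the difference {names invented}; the first states the correction, the second just dictates a join mechanism:

-- "expect about 50 rows from this global temporary table with no stats"
select /*+ cardinality(gtt 50) */ gtt.id, d.description
from   my_gtt gtt, descriptions d
where  d.id = gtt.id;

-- versus ordering an access path with no explanation of why
select /*+ use_nl(d) */ gtt.id, d.description
from   my_gtt gtt, descriptions d
where  d.id = gtt.id;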

If specific Hints are going to be used, I like to be very specific. USE_HASH should state both tables that are to be hashed together {it might even be that modern versions of Oracle insist on the form USE_HASH (taba tabb) and not just USE_HASH (taba), as I never use the less specific Hint}.
{ NB see comment by Jonathan Lewis as to why I am utterly wrong on this one – USE_HASH basically says “Use a hash join on the tables listed” but does not force a hash between the two stated tables}.
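
For what it is worth, a sketch of the more specific form {hypothetical tables; and, per Jonathan’s comment, listing both tables requests a hash join for each of them rather than forcing a hash between exactly those two}:

select /*+ leading(ord cust) use_hash(ord cust) */ ord.order_id, cust.name
from   orders ord, customers cust
where  cust.customer_id = ord.customer_id;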

I only Hint specific use of an index if I can’t fix the problem with better gathered stats. I don’t LIKE adding INDEX Hints, even though they are commonly used and easy to understand. For the reasons stated above, I do not like ordering the CBO to use one of the currently existing indexes.

I really do not like using the RULE Hint. In Version 9 it was an uphill struggle to get people not to use it, as it so often fixed the immediate problem and, of course, Oracle used it so much for internal SQL (and still does in 10 and even in 11). How many current DBAs and Developers know what the Rule Based Optimizer rules are? {hands DOWN you over-40 lot}. Using the RULE Hint is bordering on homeopathy for databases. It seems to work, you have no idea why and, in reality, it may well be doing nothing, as you are using a feature of Oracle that is incompatible with the RBO.

I am very, very uncomfortable about some Hints. The BYPASS_UJVC Hint is one of them. It basically tells Oracle it can do a MERGE INTO statement without having the unique constraint in place to support the join to the target table that allows it to work reliably. You are telling the optimizer “just trust me”. I.e. you can lie horribly to the CBO.
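
For illustration only {the Hint is undocumented, unsupported and version-dependent, and the table names are invented}, it classically appears in an update through a join view that Oracle would otherwise reject with ORA-01779:

update /*+ bypass_ujvc */
       (select t.balance tgt_balance, s.balance src_balance
        from   target_accounts t, staging_accounts s
        where  t.account_id = s.account_id)
set    tgt_balance = src_balance;
-- with no unique constraint on staging_accounts.account_id, Oracle cannot
-- prove each target row is updated at most once; the Hint says "trust me"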

All in all, I try to get the stats right rather than Hint. I’ll spend ten times as long trying to understand and fix (if I can) why the estimated costs and cardinalities in an Explain Plan are wrong as I would slapping in an INDEX Hint. I will use Hints if I can’t fix the plan via the stats, but I try to use the more generic Hints. I know from experience that fixing the stats {or at least understanding why I can’t} fixes more code than adding one Hint to one SQL statement.

A good rule of thumb is, if the Cardinality is accurate for a step, the plan will be acceptable. {This is a Rule of thumb, not a cast-iron truth!}.

Friday Philosophy – CABs {an expensive way to get nowhere?} March 11, 2010

Posted by mwidlake in biology, development, Friday Philosophy, Management.
3 comments

A few years ago, my wife and I went to New York for a holiday. We got a cab from the airport into Manhattan. It was an expensive way to see, at great length, some of the more uninteresting automobile transit routes through New York. We arrived at our hotel a great deal later than we anticipated. And with a lot of our paper dollars no longer in our possession.

I’ve also taken cabs through London, usually at the weekend to get back to Liverpool Street Station. The trip is generally quick, painless and not too expensive, no matter what bit of London is currently being dug up. Those black-cab drivers know their stuff.

Of course, the CABs I refer to in the title of this Friday Philosophy are not private cars for hire. In this context CAB is Change Advisory Board, a term that can make grown developers weep. If you do not know, the Change Advisory Board is a group of people who look at the changes that are planned for a computer system and decide if they are fit for release. My personal experience of them has been similar to my experience of the taxi variety, though sadly more of the New York than the London experience.

You might expect me to now sink into a diatribe {i.e. an extended rant} about how I hate CABs. Well, I don’t. CABs can be part of a valuable and highly worthwhile process control mechanism. Just as proper QA is core to any mature software development process, so CABs are important in getting informed, talented stakeholders to review proposed changes. They check for overall system impact, catch clashes with other proposed changes that individual development streams may be unaware of, use their own {hopefully deep and wide} experience to consider the changes, and verify that Due Diligence has been done {that last one is a bit of a minefield and where, I believe, many CABs fail}.

Sadly, though this is often the aim, the end result is too often a bunch of uninformed and technically naive politicos trying to wield power, using the CAB meeting as an extended game of management chess.

I’ve seen CABs trade changes. “I’ll let you have X if I can have Y and Z”. I’ve seen CABs turn down changes because the form had spelling mistakes in it. I’ve seen CABs object to a change that will save the company 5 million pounds a day because it lacked one signature.

That last one just stopped me in my tracks {I’m not exaggerating either, if anything I am underplaying the cost impact of that decision. I saw the figures and I wasted a couple of days of my life checking, 5 million pounds a day was the least I felt I could prove.} We are talking about enough money every day to pay the salary of everyone on the CAB for several years. And they blocked it because the DBA team had not signed off the change. No effort was made to address the lack of the signature in any way, the change was just refused.

The DBA Team had not signed off the change because the one and only DBA Team Leader who was allowed to sign off was on holiday for two weeks. They needed that holiday too, for other but I suspect linked reasons.

Now, I knew the DBA Team Lead and he was a good bloke; he knew his stuff and he was not paid 5 million pounds a day. His deputy was paid even less and was no less talented, but she was not allowed to sign off the change as she was not the DBA Team Lead.

That was a CAB gone very wrong. The process of the CAB had been allowed to overrule good business sense. It was also overruling general and technical sense, but that really is secondary to what keeps the business making a profit.

I’ve seen the opposite of course: technical teams that just apply whatever changes they feel are fit, with no oversight or CAB. To be honest, this less controlled process seems to mess up less often than a poor CAB process, as the technicians know they are the ones who will spend the weekend fixing a mess if one occurs. But that mess-up will occur eventually, if control is lacking, and the bigger and more complex the IT environment, the greater the chance of the mess-up.

So, I feel CABs are good, no, make that Great, if you have the right people on them and you have a sensible cascade of authority, so that one person being away does not block the system. That is quite a bit harder to put in place than a simple “Dave A, John, Andrea, Alex, Raj, Dave P, Mal, Malcolm and Sarah have final signoff”, which is what most CABs effectively become.

But there is one last fault of CABs I want to highlight. They tend to treat all changes in the same way and all changes are not the same. Upgrading the underlying OS is not the same as adding a cardinality hint to one Business Objects report.

If your CAB or change process treat the two above examples the same, then your CAB or change process is broken. Now, in all IT “rules of thumb” there is an exception. In this case, I am truly struggling to think of one. My feeling is that if your change process treats an OS upgrade the same as adding a hint to a report, it is not fit for purpose.

Those are my main issues with CABs. They should be of significant business importance, but nearly always they are implemented with one process to deal with all situations and then get taken over by people with an “Office Politics” agenda as opposed to a “Getting the best job we can reasonably expect done” agenda.

I’m very passionate about this and I have a way I hope can throw this issue into context, an analogy.

Consider this scenario.
You go to your doctor with a niggly cough you have had for a week OR you go to your doctor because you almost passed out each day you got out of bed for the last three days.
If your doctor treated you the same for both sets of symptoms, would you be happy with that doctor?

Why are all IT changes handled by most CABs in exactly the same way?

(BTW if you ever almost collapse when you get out of bed in the morning, do NOT go to work. Go instead to your doctor and ask them for a full medical, and if he/she does not take blood pressure readings and order a full blood chemistry test, go find a new doctor.)

DBMS SIG March 10, 2010

Posted by mwidlake in internals, Meeting notes, performance.
5 comments

I went to the DBMS SIG today {DBMS Special Interest Group meeting of the UK Oracle User Group}. Don’t worry, I am not going to run through the set of presentations and make comments on them – although I like reading such entries by others on their blogs, I generally like them due to the style of writing as opposed to getting information out of them. But there is the odd tidbit that can be worth disseminating to as wide an audience as possible and I like to do my bit.

Having said I won’t go through all the talks… :-). No, I just want to comment that all the talks had merit at this meeting and, quite rightly, the meeting was full. This is nice to see, as last year SIG attendance fell across the board, due to the “economic climate”. Very daft, in my opinion, as I see SIGs as free training, plus networking opportunities (sorry to use the “networking” word), plus different viewpoints, all of which are invaluable at any time and especially valuable when things are hard.

Go to SIGs (or whatever is available in your country) as it is always worth the day invested. Where else can you see half a dozen experts give opinions and also corner them and ask them more stuff as well?

Anyway, the tidbits. First, Jonathan Lewis demonstrated how he goes about solving issues with complex SQL statements. It was terribly fiendish and extremely clever and needs incredible in-depth knowledge of the Oracle RDBMS… He draws a pseudo Entity Relationship Diagram of the tables involved, adds in some figures on what filtering will achieve and what ratio of records will be involved in table joins, and asks the question “what will probably happen” a couple of dozen times. Yes, I lied, it does not need vast knowledge. It needs a clear, simple approach to solving the problem. And an ERD. I love ERDs and rue their general demise. I use exactly the same method myself to investigate complex SQL performance issues {I think I could be accused of trying to ride on shirt-tails here but, honestly, the step I take if an Explain Plan does not help me is to ERD the statement and look at the indexes and ratios between tables to see how I, as a human, would solve the query. Chatting to a few other old lags, it is a relatively universal approach among those of us who have used ERDs}. If you are a member of the UKOUG I strongly recommend downloading the slides to Jonathan’s talk. If you are not a member, maybe Jonathan will present it again at another venue, or you could get him to come along and do a course on tuning. {Jonathan, if you get a gig as a result of this, I want a pint of Marston’s Pedigree, OK?}

{And thanks to Sean Malloy for commenting to provide a link to a published version of Jonathan’s method – Jonathan did mention this, highlighting the fact that it is his first real foray into SQL*Server. However, the method is database agnostic. This is the article.}

Second tidbit. Adrian Dodman and Owen Ireland (who both look remarkably like Hollywood heart-throbs in their pictures, if rather different in the flesh, though very decent chaps they are too) did an excellent talk on VLDB physical standbys, a topic that has particular resonance for me. They mentioned parallel_execution_message_size. This defaults to 2k, on 10g at least. It is a rubbish setting. No, let me not beat about the bush, it is an utterly rubbish setting. If you use parallel query, parallel recovery or parallel anything-at-all, check out this parameter and, due diligence allowing, increase it. Try 8k as opposed to 2k, and even 16k. The manual on it says the default of 2k/4k is fine. It ain’t. Increasing the value just takes some memory out of the shared pool and, these days, if you can’t afford a few extra KB out of your shared pool, you need to replace your server with something costing about twice as much as a top-end desktop PC. {Why am I so vigorous in my opinion on this parameter? Well, I had a situation a few months back of trying to migrate a database to a new geographic location for a client in Germany. We did a backup/recovery type operation to do this. Applying the redo logs was proving to be a performance issue, so Oracle Corp suggested parallel redo log application. It ran a LOT slower than single thread, about 300% slower. However, increasing the parallel_execution_message_size from 2k to 8k made the parallel log application about 400% faster than single thread, i.e. a dozen times faster. I know from presentations by Christian Antognini and discussions with others that it is a key parameter to getting parallel query to perform well too.}
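
If you want to check and change the parameter, something along these lines should do it {a sketch; the parameter is static, so it needs an spfile change and an instance restart}:

select name, value, isdefault
from   v$parameter
where  name = 'parallel_execution_message_size';

alter system set parallel_execution_message_size = 16384 scope = spfile;
-- value is in bytes; restart the instance for it to take effect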

Last tidbit. Don’t drop the OUTLN user. Yes, I know, why would you? Just don’t, OK? Especially on Oracle 11. If you do, for whatever reason, DO NOT SHUT DOWN THE DATABASE. Call Oracle Support and pray. Thanks go to Peter Mahaffey for that one. Yes he did. It all went terribly wrong for him.

Fun with Filters March 9, 2010

Posted by mwidlake in performance.
9 comments


This post is about the importance of filter statements in execution plans and how they can indicate problems that are not always apparent.

I had a problem recently with some code that had gone rogue – it had been running in a few seconds, being executed every 10 minutes or so. Now it ran until either someone killed it off or it got “snapshot too old” errors. I pretty much knew it had been prompted by stats being gathered, as we had just gathered stats on the tables in that schema.

The code was something like this (it is not too important what exactly the code was). Oh, and this is on 10.2.0.3, enterprise edition with partitioning.

select accountid, max(recorded_date), count(*)
from W_LCG_OPENING_
where accountid||DATA_SRC_COUNTRY_ID in
     (
       SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
       minus
       SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_
      )
   and 1 = 1 -- this is pointless, it replaces a check of 'run'='run'
   group by accountid, DATA_SRC_COUNTRY_ID

Those tables are quite large, several million rows in each. The code is basically scanning all of the rows in the table as there are no indexes to support the query and it is written in a way that, well, I would not write it.

Digging around in some monitoring tools I use, I confirmed that the code had swapped plans:

inst SQL_ID        Plan Hash   First Load   execs    TOT_ROWS       TOT_BG      TOT_DR        TOT_CPU    CREATED_DT
---- ------------- ----------- ------------ ------ ---------- ------------ ----------- --------------  ------------
   3 1t7ma4xn3rw46  3183284037 100203 09:10    125      7,854    6,457,885    1208,149    816,476,585  091020 10:27
   3 1t7ma4xn3rw46  3127554972 100206 09:13      2          0 1260,936,642         980  94599,678,960  100205 09:19

Version 1 comes back in 5 or 6 seconds. Version 2 effectively does not come back, which is why it records 0 rows. You can see that the disk gets are pretty low in version 2 (allowing that it was never left to finish), but buffer gets and CPU are both massively up. So much so, they exceed the point where I format the numbers with commas (as, if they are THAT big, they need fixing anyway).
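
If you do not have a monitoring tool to hand, a quick look at gv$sql shows much the same picture {a sketch; these are cumulative figures rather than the per-period deltas my tool stores}:

select inst_id, sql_id, plan_hash_value, first_load_time, executions,
       rows_processed, buffer_gets, disk_reads, cpu_time
from   gv$sql
where  sql_id = '1t7ma4xn3rw46'
order  by first_load_time;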

I looked at the plans and, though they were different, nothing jumped out at me. So I trimmed down the code and built it up section by section until I saw something significant change. This is my general mode of tuning code if nothing initially occurs to me.
As usual, I started with any scalar or inline code, in this case that SELECT…MINUS…SELECT in the where clause.

SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
Minus
SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_

This came back in 3.145 seconds, with all of 62 rows. That is fine.

Now I added back the use of the MINUS code as a WHERE clause to the outer query.

select accountid
from  W_LCG_OPENING_
where accountid||DATA_SRC_COUNTRY_ID in
     (
       SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
       minus
       SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_
      ) 

This came back in 5.22 seconds with 4468 records. Again, I can live with that, we do not need sub-second response, we need sub-minute response.

So I now added back in the group by accountid…

select accountid,count(*)
from W_LCG_OPENING_
where accountid||DATA_SRC_COUNTRY_ID in
     (
       SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
       minus
       SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_
      )
group by accountid

This does not come back before I have fetched and consumed a cup of tea. That is a significant change.

I was not expecting this, why would sorting under 5 thousand rows stress a database that can scan multi-million row tables in seconds? There is all that CPU being burned up and huge numbers of buffer gets, but not much in the way of disc IO, so it is probably doing something that can be done in memory a very, very large number of times.

What is the difference in plan? Good plan first, bad plan second.

{screenshot: the good execution plan}

{screenshot: the bad execution plan}

The view step has disappeared, which was an expensive step so should help not hinder.
The “SORT GROUP BY” has appeared, which it would do as we introduced the “GROUP BY” clause.
The hash join has been replaced with a filter. In a more complex plan you might miss this replacement; you would maybe notice the hash join disappearing, but a filter, well, that is just checking some simple “WHERE” clause, isn’t it?

Well, the hash join was joining the results coming from the two steps feeding into it, the PARTITION RANGE ALL and the VIEW (which is the in-memory construct of the SELECT…MINUS…SELECT statement).

Now the Filter is filtering the results from the PARTITION RANGE ALL with the results from the MINUS. At this point I’d like to highlight that the predicted cardinality and bytes coming back for the steps within the MINUS have reduced by a factor of 100 between the good and bad plans. And I’ll also admit I hid some key detail in the screen shots. I am not showing the access and filter predicates.

{screenshot: predicate information for the good plan}

The above is a screen shot showing that in the good plan there are no filter predicates but there is an access predicate, for the access to the view (i.e. the select…minus…select). For easier reading, the full access predicate is below, rather than in the screen shot:

"$nso_col_1"=TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")

However, for the slow code, there are no access predicates but there are filter predicates. Again, the screen shot shows that there are predicates and I show the full text below. (Screen shots are from PL/SQL Developer, btw.)

{screenshot: predicate information for the bad plan}

2- EXISTS ( (SELECT /*+ */ TO_CHAR("ACCOUNTID")
   ||TO_CHAR("DATA_SRC_COUNTRY_ID") 
FROM "W_LCG_OPENING_" "W_LCG_OPENING_" 
WHERE TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B1)||TO_CHAR(:B2))
MINUS (SELECT /*+ */ TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")
FROM "W_LCG_CLIENT_" "W_LCG_CLIENT_" 
WHERE TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B3)||TO_CHAR(:B4)))	

8- TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B1)||TO_CHAR(:B2)	

10- TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B1)||TO_CHAR(:B2)	

Basically, the filter runs the MINUS query for every row from the driving query, passing in the relevant values as an extra predicate to each of the individual queries of the SELECT…MINUS…SELECT.

Clever, but not of much help to us as the performance is awful.

I showed this to a colleague and their immediate response was “why does that not show up as a nested loop? How am I supposed to spot that the “filter” is doing so much?” There were a couple of robust Saxon words mixed in with that statement.

So, beware Filters replacing joins, and get in the habit of checking out the filter and access predicates.

If you use an old-style template for showing plans {like I do, a guilty sin of mine}, or a GUI where you have not selected that the filter and access predicates be shown, you may well not get them displayed. If you use autotrace in SQL*Plus, you will though:

db3_mw> select accountid,count(*)
  2  from W_LCG_OPENING_
  3  where accountid||DATA_SRC_COUNTRY_ID in
  4       (
  5         SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
  6         minus
  7         SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_
  8        )
  9  group by accountid
 10  /

Execution Plan
----------------------------------------------------------
Plan hash value: 397856216

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                  |   911 |  7288 |15M  (7)    |17:25:41  |       |       |
|   1 |  SORT GROUP BY           |                  |   911 |  7288 |15M  (7)    |17:25:41  |       |       |
|*  2 |   FILTER                 |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ALL   |                  |  3153K|    24M|  8311   (4)|00:00:34  |     1 |   840 |
|   4 |     TABLE ACCESS FULL    | W_LCG_OPENING_   |  3153K|    24M|  8311   (4)| 00:00:34 |     1 |   840 |
|   5 |    MINUS                 |                  |       |       |            |          |       |       |
|   6 |     SORT UNIQUE NOSORT   |                  | 31535 |   246K|  8615   (7)| 00:00:35 |       |       |
|   7 |      PARTITION RANGE ALL |                  | 31535 |   246K|  8611   (7)| 00:00:35 |     1 |   840 |
|*  8 |       TABLE ACCESS FULL  | W_LCG_OPENING_   | 31535 |   246K|  8611   (7)| 00:00:35 |     1 |   840 |
|   9 |     SORT UNIQUE NOSORT   |                  |   132 |  1056 |17  (18)    | 00:00:01 |       |       |
|* 10 |      INDEX FAST FULL SCAN| W_LCG_CLIENT__PK |   132 |  1056 |16  (13)    | 00:00:01 |       |       |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( EXISTS ( (SELECT /*+ */ TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID") FROM
              "W_LCG_OPENING_" "W_LCG_OPENING_" WHERE
              TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B1)||TO_CHAR(:B2))MINUS (SELECT /*+ */
              TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")
              FROM "W_LCG_CLIENT_" "W_LCG_CLIENT_"
              WHERE TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B3)||TO_CHAR(:B4) )))
   8 - filter(TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B1)||TO_CHAR(:B2))
  10 - filter(TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B1)||TO_CHAR(:B2))
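
If autotrace is not convenient, DBMS_XPLAN will also show the predicates {a sketch of the usual pattern}:

explain plan for
select accountid, count(*)
from   W_LCG_OPENING_
where  accountid||DATA_SRC_COUNTRY_ID in
       (select accountid||DATA_SRC_COUNTRY_ID from W_LCG_OPENING_
        minus
        select accountid||DATA_SRC_COUNTRY_ID from W_LCG_CLIENT_)
group  by accountid;

select * from table(dbms_xplan.display(null, null, 'TYPICAL +PREDICATE'));
-- the Predicate Information section appears below the plan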

What did I do to fix the problem? Well, even though the code originally went bad due to stats being gathered, I could not force it back to a nice plan after an hour or two playing with gathering new stats, so, in this case, I used an UNNEST hint.

select accountid,count(*)
from W_LCG_OPENING_
where accountid||DATA_SRC_COUNTRY_ID in
     (
       SELECT /*+ unnest */ accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
       minus
       SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_
      )
group by accountid

And it forced the plan back to the version using a HASH join.

{screenshot: the hinted plan}

I’m a little unhappy about that hint, but the code needed fixing and the hint did what I wanted it to {I think it was poor of me to hint only one of the two MINUS statements, and I do not like resorting to hints just because I can’t work out what is “wrong” with the stats – if I work out what is wrong and still need the hint, fair enough}. However, I had to be pragmatic and get the code fixed and working in Live, so it has gone in with a hint.
