
Friday Philosophy – Oracle Performance Silver Bullet August 5, 2011

Posted by mwidlake in Architecture, Friday Philosophy, performance.

Silver Cartridge and Bullet

For as long as I have been working with Oracle technology {which is now getting towards 2 decades, and isn’t that a pause for thought} there has been a constant search for Performance Silver Bullets – some trick or change or special init.ora parameter {alter system set go_faster_flag=’Y’} you can set to give you a guaranteed boost in performance. For all that time there has been only one.

There are a few performance Bronze Bullets…maybe Copper Bullets. The problem is, though, that the Oracle database is a complex piece of software and what is good for one situation is terrible for another. Often this is not even a case of “good 90% of the time, indifferent 9% of the time and tragic 1% of the time”. Usually it is more like 50%:30%:20%.

Cartridge with copper bullet & spent round

Actually, I’ve just been unfair to Oracle software – a lot of the problem is not with the complexity of Oracle, it is with the complexity of what you are doing with Oracle. There are the two extremes of OnLine Transaction Processing (lots of short-running, concurrent, simple transactions from many users that you want to complete very quickly) and Data Warehousing (a small number of users processing a vast amount of data). You may well want to set certain initialisation parameters to favour quick response time (OLTP) or fastest processing time to completion (DW). Favouring one usually means a negative impact on the other. Many systems have both requirements in one… And in between those extremes there are the dozens and dozens of special cases that I have seen, and I am just one guy. People get their database applications to do some weird stuff.

Partitioning is a bronze bullet. For many systems, partitioning the biggest tables makes them easier to manage, allows some queries to run faster and aids parallel activity. But sometimes (more often than you might think) partitioning can drop rather than increase query or DML performance. In earlier versions of Oracle setting optimizer_index_caching and optimizer_index_cost_adj was often beneficial, and in Oracle 9/8/7 setting db_file_multiblock_read_count “higher” was good for DWs… Go back to Oracle 7 and doing stuff to increase the buffer cache hit ratio towards 98% was generally good {and I will not respond to any comments citing Connor’s magnificent “choose your BCHR and I’ll achieve it” script}.
You know what? There was an old trick in Oracle 7 you could maybe still look at as a bronze bullet. Put your online redo logs and key index tablespaces on the fastest storage you have and split your indexes/tables/partitions across the faster/slower storage as fits. Is all your storage the same speed? Go buy some SSD and now it isn’t…
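
For illustration only, here is roughly what those old-school bronze bullets look like as commands. The parameter values, tablespace name, file path and index name are all invented for the example, and none of this is a recommendation – it is the sort of thing you test and then question:

-- Optimizer "bronze bullets" of old (example values only, assuming an spfile)
ALTER SYSTEM SET optimizer_index_caching = 90 SCOPE=BOTH;
ALTER SYSTEM SET optimizer_index_cost_adj = 25 SCOPE=BOTH;
ALTER SYSTEM SET db_file_multiblock_read_count = 128 SCOPE=BOTH;

-- The storage trick: a tablespace on the fast disks (or SSD) for the hottest indexes
CREATE TABLESPACE fast_idx DATAFILE '/ssd01/oradata/PROD/fast_idx01.dbf' SIZE 10G;
ALTER INDEX big_table_pk REBUILD TABLESPACE fast_idx;
-- And the online redo logs on the fastest storage you have
ALTER DATABASE ADD LOGFILE GROUP 4 ('/ssd01/oradata/PROD/redo04a.log') SIZE 1G;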

Cartridge with Wooden Bullet

Then there are bronze bullets that you can use that very often improve performance but where the impact can be catastrophic {let’s call them wooden bullets :-) }. Like running your database in noarchivelog mode. That can speed up a lot of things, but if you find yourself needing to do a recovery and your last cold backup is not recent enough – catastrophe. A less serious but more common version of this is doing things nologging. “Oh, we can just re-do that after a recovery”. Have you done a test recovery that involved that “oh, we can just do it” step? And will you remember it when you have a real recovery situation and the pressure is on? Once you have one of these steps, you often end up with many of them. Will you remember them all?
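
For the avoidance of doubt, these are the sorts of statements involved (the table names are made up). Each one trades recoverability for speed, which is exactly the point being made above:

-- Wooden bullet 1: run the whole database in noarchivelog mode
-- (SQL*Plus; needs a clean shutdown and mount first)
SHUTDOWN IMMEDIATE
STARTUP MOUNT
ALTER DATABASE NOARCHIVELOG;
ALTER DATABASE OPEN;

-- Wooden bullet 2: nologging direct-path loads - fast, but that data is not in the
-- archived redo, so it cannot be recovered after a restore of an older backup
ALTER TABLE monthly_load_table NOLOGGING;
INSERT /*+ APPEND */ INTO monthly_load_table SELECT * FROM staging_table;
COMMIT;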

How many of you have looked at ALTER SYSTEM SET COMMIT_WRITE=’BATCH,NOWAIT’? It could speed up response times and general performance on your busy OLTP system. And it could lose you data on crash recovery. Don’t even think about using this one unless you have read up on the feature, tested it, tested it again and then sat and worried for a good while about what could possibly go wrong.
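
For reference, this is what the setting looks like at system and session level (10gR2/11gR1 syntax; later versions split the behaviour into the COMMIT_LOGGING and COMMIT_WAIT parameters). The whole point of the warning above is that a “committed” transaction can vanish in an instance crash:

-- Redo for commits is buffered and written asynchronously: faster commits,
-- but an instance crash can lose transactions the application thinks are committed
ALTER SYSTEM SET COMMIT_WRITE='BATCH,NOWAIT';

-- It can also be set for a single session, which at least limits the exposure
ALTER SESSION SET COMMIT_WRITE='BATCH,NOWAIT';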

That last point is maybe at the core of all these Performance Bronze Bullets. Each of these things may or may not work, but you have to understand why and you have to understand what the payback is. What could now take longer, or what functionality have you now lost? {hint: it is often recovery or scalability}.

So, what was that one Silver Bullet I tantalizingly left hanging out for all you people to wait for? You are not going to like this…

Look at what your application is doing and look at the very best that your hardware can do. Do you want 10,000 IOPS but your storage consists of fewer than 56 spindles? Forget it, your hardware cannot do it {at a realistic 150–200 IOPS per spindle, 56 fast disks give you roughly 10,000 IOPS at best}. No matter what you tune or tweak or fiddle with. The one and only Performance Silver Bullet is to look at your system and your hardware configuration and work out what is being asked and what can possibly be delivered. Now you can look at:

  • What is being asked of it. Do you need to do all of that? (That might involve turning some functionality off, if it is a massive drain and does very little to support your business.)
  • Are you doing stuff that really is not needed, like management reports that no one has looked at in the last 12 months?
  • Is your system doing a heck of a lot to achieve a remarkably small amount? Like several hundred buffer gets for a single indexed row? That could be a failure to do partition exclusion. (There is a sketch for spotting such statements just after this list.)
  • Could you do something with physical data positioning to speed things up, like my current blogging obsession with IOTs?
  • You can also look at what part of your hardware is slowing things down. Usually it is spindle count/RAID level, ie something dropping your IOPS. Ignore all sales blurb from vendors and do some real-world tests that match what your app is doing or wants to do.
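
As a starting point for the “lots of work for a small result” check above, something like this against V$SQL will surface candidate statements. The thresholds are arbitrary examples – pick numbers that make sense for your system:

-- Statements doing a lot of logical I/O for each execution and each row returned
SELECT sql_id,
       executions,
       buffer_gets,
       ROUND(buffer_gets / NULLIF(executions, 0))     gets_per_exec,
       ROUND(buffer_gets / NULLIF(rows_processed, 0)) gets_per_row
FROM   v$sql
WHERE  executions > 100                               -- example thresholds only
AND    buffer_gets / NULLIF(executions, 0) > 500
ORDER  BY buffer_gets DESC;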

It’s hard work but it is possibly the only Silver Bullet out there. Time to roll up our sleeves and get cracking…

{Many Thanks to Kevin Closson for providing all the pictures – except the Silver Bullet, which he only went and identified in his comment!}

Friday Philosophy – Picture Theft!!! July 28, 2011

Posted by mwidlake in Blogging, Friday Philosophy, Perceptions.

Last week’s Friday Philosophy was a bit of a moan about how hard I find it to make nice graphics, how long it takes and no one seems to care that much about the results.

Well, after those two days of effort on the pictures and the aforementioned moan, irony of ironies, someone has stolen one of my graphics! So someone likes my efforts ;-). It is the one that represents how you scan down the levels of an index and then link across to the table via the rowid.

Before I go any further I had better make it clear that I am not really upset about it at all :-). In fact, since the scoundrel included a link back to my web page and they are considerably better known than I, my little blog has had a big up-swing in traffic as a result, which is nice. Mind you, as the person who borrowed my diagram is SQL Server expert Steve Jones, of SQLServerCentral/Redgate fame, most of my new audience are probably pretty focused on the SQL Server RDBMS and not Oracle, so unlikely to make many return visits unless they work across the RDBMS boundaries.

What also gives me a little smile is that I have stumbled over the fact that I myself, back in November 2009, was looking for such a diagram {of the way Oracle steps down the index to the leaf blocks, gets the rowid and then straight to the table row} to ‘borrow’ for a post of my own on BLevel and heights of indexes. I even confessed at the time to looking for and failing to find one to use…

Humour aside, it set me to thinking though. Borrowing content is a perennial and thorny issue.

Occasionally someone will start putting content out on their blog or web site and it turns out that much of that content is directly obtained from other people’s blogs and websites – copied and pasted straight in or with little changes. That is generally seen by the original author as unacceptable and once they find out they object. In such cases it sometimes seems the culprit is unaware of this being a transgression and, once it is explained that they have effectively stolen many hours or days of someone’s efforts, they remove the material. Others seem aware this is theft but do not care until caught. Occasionally the culprit sees no error in their ways at all, even when challenged, as the material had been put “out there” so they now consider it free to all. I certainly do not agree. Perhaps the worst thing you see, though, is people including parts of published books, or even putting the whole book out there for download. Such people should of course have their hands stapled to their backsides in punishment; that is simple theft. Writing blogs takes a long time and effort, writing technical books takes forever and monumental effort. I know from friends that the financial return for such efforts is pitiful enough as it is.

On the other side of the coin, many of us put our stuff out there on the web to be read and used and are very happy for it to spread, to be borrowed from and disseminated. Like nearly all DBAs and developers, over the years I have written lots of little SQL scripts to pull information out of the data dictionary or do little database management tasks. I happily give away copies of these to anyone who wants them (and you can get them off my web site if you like, but just pretend it is not my website, as it is truly awful). All I ever ask is that whoever takes them leaves my name in them.

I think that is core to the issue. I suspect many of us bloggers are happy for small parts of our output to be borrowed so long as credit is given. I certainly am {but please note, this is my personal opinion – other bloggers may object very strongly and any repercussions on you in respect of taking material from other blogs and web sites is your concern}. However, volume is also part of it. The larger the chunk you borrow, the more acknowledgement I would need to be happy about it. Borrowing a single diagram or a paragraph out of a page of text is OK, given I am cited for it. Taking most of a post would probably not be, unless you asked first and were really nice about it and about me. Nicking a set of course notes I wrote is certainly unacceptable, no matter how much you put “originally written by that wonderful Martin Widlake” on it.

So, I think you need to cite the source as “payment” for using it. Perhaps the best way to do it is by simply linking to the material rather than putting it on your blog/website, but that does not work if you need the content within yours to make sense. In which case, I think Steve Jones’ approach of putting the content in his and including a link is reasonable. It might have been nice if there was a comment saying where the image came from, but I can live without it. Despite my joking about it boosting my traffic, it does not really matter that his is a popular web site that sends hits my way. Even if a site gets no traffic, if someone has borrowed a small part of my output but cited me as the source, I’m cool with that.

The problem though is judging what is a “small” part to borrow and what is acceptable to the original author. We all perceive such things differently. So the safest thing is to ask the original author. If I want to use an idea that came from someone else in one of my blogs or a solution they came up with, I always ask and I ask if they want to be cited. This includes discussions in email or in the pub. I ask. If when preparing my blogs I learn a lot from someone else’s blog, I stick in a link and a comment, even though I will have written my own text. I hope that so far I have not upset anyone when I borrow a little.

Photos are a different issue though. I am not going to even attempt to cover that one!

Snowdon viewed from Yr Aran

Friday Philosophy – PowerPoint Picture Pain July 21, 2011

Posted by mwidlake in Friday Philosophy, humour.

The title does not refer to the agony of sitting through a presentation with loads of pointless graphics and zip-all content. Rather, it encompasses the agony of trying to produce some decent graphical content for PowerPoint. Or Word or any other software-based method of presenting information. I think we generally had better presentations when we had overhead projectors or flip charts, and I sometimes go “PowerPoint naked” when I present.

However, I do present quite often using the ubiquitous PowerPoint. I spend hours on my slides, trying to put good textual content into them both for the actual presentation and for anyone who downloads them later. I also spend ages trying to put together graphics, as we humans are visual creatures and a good {let’s pause for a second and stress the word good} picture or diagram can help enormously. I did a little animated graphic a few years back to show how DNA is unzipped and duplicated, allowing the near perfect transmission of our DNA as cells divide. Modesty prevents me saying it was good {drat, I just said it} but several people told me it really helped them understand something that text descriptions struggle to put across.

{Hmm, OK, I’ll see if I can get this to work. Try this how-dna-duplicates powerpoint show. It shows how the two strands of DNA are “matched”. A, C, T and G are the 4 letters of the DNA and each one is associated with its “pair” in the second strand. A to T, T to A, C to G and G to C. Split the pair into two strands and you can make each a perfect copy of the original by adding back the pairs. Understand? Oh, and there are very, very, very rare copy mistakes, in this case a C is matched to a T rather than an A. Go on, fire up the slide.}

The reality is, though, that most of my presentations have very average slides. When you present for the UK Oracle User Group they are good enough to solicit feedback from the audience and pass it back to the presenter. You get scored on topic, content, style, slides, charisma, sex appeal… The lowest part of my score is always, Always, ALWAYS the slides. This started getting on my nerves, so one year I tried really hard on the slides. It was still my weakest point.

The thing is, I have never had a graphics team (let alone a whole marketing department) to help me out. Either I have worked for a company with no such thing or I have been a lone consultant doing what I can in my spare time. It is especially frustrating as large companies that do have access to teams of graphic artists seem to present with a large number of very pretty, very well drawn slides with zip-all content, where the graphics add almost nothing to understanding (and we are back to the first line of this posting). In fact, I often find slides with lots of pretty icons representing parts of the technology stack, with chunky arrows sweeping across them showing flow, to be {cough} pointless to my understanding. One thing I have picked up from all those pretty pictures, though, is that a feeling of depth and texture is very effective in altering how a slide looks.

So, I decided for my posts on Index Organised Tables I would spend some time messing around with such things in PowerPoint. My Graphic of the Thread is:

I’m quite pleased with it, especially as in doing so I found out quite a bit about how to use the gradient fills, shadowing, 3D formatting and rotating of simple shapes. It is certainly better than my previous stock graphic for an index:

I think most people would agree, it is an improvement. But is it two days’ worth of effort better? Because that is how long I spent messing around with the image options in PowerPoint to turn a flat, uninspiring green blob with 3 lines on it, representing an index, into a nicely shaped, pseudo-3D image with a tad more information. *sigh*.

I guess I can now convert other flat pictures made up of boxes, triangles and lines into something that looks a bit more like a professional diagram rather than something knocked up by a database geezer late one evening the day before the presentation. But if it does not help me produce a diagram that aids understanding, rather than just presentation fluff like those marketing slides, then I’ve just wasted a lot of time.

I tell you what though, having spent so long on that diagram, I’m going to use it to death :-)

When to Fix & When to Find Out – Friday Philosophy July 15, 2011

Posted by mwidlake in AWR, Friday Philosophy, Testing.

{warning, this is one of my long, rambling Friday Philosophy blogs. Technical blogs are on the way – though there are some nice AWR screen shots in this one :-) }

As a DBA (or System Administrator or Network admin or any other role that involves fixing things that are broken before the shouting starts) there is often a tension between two contending “best practices”:

– getting the system working again as soon as possible.
or
– understanding exactly what the problem is.

Some experts point out {and I generally agree} that unless you actually find out exactly what the problem was, what you did to fix it and, via a test case, demonstrate why the fix worked, you are not really solving the problem – you are just making it go away. You (hopefully!) know what you changed or the action you took, so you have something that you can repeat which might fix it again next time. (NB this does not apply to the classic “turn it off, turn it back on again”; that is nearly always just aversion therapy.)

But it might not fix it next time.

Or you might be going to way more trouble to fix it than is needed {I’m remembering how flushing the shared pool used to get touted as a way to resolve performance issues, but it is a pretty brutal thing to do to a production database}.

You might even be making other problems worse {like slowing down everything on the production database as the caches warm up again, having flushed out all that data in the SGA, or that good-old-standard of rebuilding indexes that simply do not benefit from the rebuild}.
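
For the record, the sort of commands being talked about here are one-liners – which is a lot of why they get reached for – but the side effects described above are real (the index name is invented):

-- Quick "fixes" that are easy to run and easy to regret on a busy production system
ALTER SYSTEM FLUSH SHARED_POOL;       -- throws away cached cursors; expect a hard-parse storm
ALTER INDEX my_big_index REBUILD;     -- rarely the cure it is claimed to be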

There is another issue with “just fixing it by trying things” in that you are not really learning about the cause of the issue or about how the technology you are looking after works. A big part of what makes an “expert” an expert is the drive, desire and opportunity to take the time to work this out. It’s that last part I sometimes get grumpy about, the opportunity.
For many of us, we do not have the luxury of investigating the root cause, because we are under so much pressure to “get it fixed right now and worry about the detail later”. But we never get to the detail, as there is then the next “fix this problem right now”, and so it goes on. {Kellyn Pot’Vin does a nice post about what she calls the Superman Conundrum on this topic}.

I had exactly this dilemma just a couple of months ago. Below are the details; it’s a bit long, so skip to the end of the post if you like…

I was creating test data and I decided to use parallel processing to speed it up. I created a month’s worth of data with PL/SQL and then decided to copy it with a simple “insert into … select” statement, updating dates and a couple of other incrementing columns as part of the insert, using parallel. The creation of the second month’s data took longer than the PL/SQL code for the first month took. What the!!!??? I pulled up the AWR information and I could see that the problem was (possibly) with inter-process communication between the parallel processes, as shown by the “PX Deq Credit: send blkd” wait event.

The below screenshot shows the overall instance workload; the green is CPU and the pink is “Other”. Only one SQL statement is responsible for all of this effort, via 5 sessions (four of which are parallel threads). You can see that the issue had been going on for over an hour {oh, and to a certain extent these pretty pictures are pointless – I am not looking for the exact solution now, but having loaded the pictures up to the blog site I am damn well going to pretty-up my blog with them}:

Drilling into that one session shows that the bulk of the waits by far is for “PX Deq Credit: send blkd”:

By selecting that wait event, I got the histogram of wait times since the system started {I love those little histograms of wait times}:

Note that these waits are for long periods of time, around 1/10 of a second on average and some are for up to 2 or 4 seconds.

The thing is, I had anticipated this problem and had already increased my PARALLEL_EXECUTION_MESSAGE_SIZE to 16K from the default of 2K, as I knew from experience that the default is way too small and has slowed down parallel execution for me before. So why was I seeing poorer performance than I anticipated? I was not understanding stuff. So I needed to change one thing at a time, see what impact it had, and repeat until I got to the bottom of it.
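
If you want to do the same sort of digging yourself, a wait-time histogram like the ones in the screenshots can be pulled straight from the data dictionary, and the message size parameter is set roughly as below. This is a sketch, not the exact commands I ran, and the parameter needs an spfile and an instance restart to take effect:

-- Distribution of wait times for the parallel messaging wait, since instance startup
SELECT wait_time_milli, wait_count
FROM   v$event_histogram
WHERE  event = 'PX Deq Credit: send blkd'
ORDER  BY wait_time_milli;

-- The message buffer size I had already increased from its 2K-ish default
ALTER SYSTEM SET parallel_execution_message_size = 16384 SCOPE=SPFILE;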

Except I could not – the next team in line was waiting for this data and they already had massive pressure to deliver. My job, what my employer was giving me money to do, was to fix this problem and move on. So, in this instance I just pragmatically got on with getting the task done as quickly as I could.

I did what we all do {don’t we?} under times of acute time pressure. I made a handful of changes, using the knowledge I already had and guessing a little, hoping that one of them would do the trick. This included reducing the degree of parallelism, adding the /*+ append */ hint (I simply forgot the first time around), pre-allocating the required space to the tablespace, muttering “pleaseopleaseoplease” under my breath…
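
The shape of the statement after those changes was roughly as below. Every object and column name here is invented to illustrate the pattern – direct-path insert, a modest degree of parallelism, and the dates shifted as the rows are copied:

-- Parallel DML has to be enabled for the insert itself to run in parallel
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(month_data, 4) */ INTO month_data
SELECT /*+ PARALLEL(s, 4) */
       s.id,
       s.event_date + 31 AS event_date,   -- shift the copied month forward
       s.batch_no + 1    AS batch_no,     -- one of the "incrementing columns"
       s.payload
FROM   month_data s
WHERE  s.event_date < DATE '2011-02-01';

COMMIT;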

It worked:

The job ran in less than 20 minutes and used less resource during that time. Well, it waited less anyway.
The wait histograms show lots and lots of shorter duration waits:

The duplication took 20 minutes when the previous attempt had been terminated after 2 hours when other factors forced it to be curtailed. Job done.

But the thing is, the problem was not fixed. I got the task done in a timescale that was required, I satisfied the need of the client, but I was and am not sure exactly why.

If I was a permanent employee I would consider pressing to be allowed to spend some time working this out, as my employer benefits from me extending my knowledge and skills. This is not always a successful strategy :-) {but when it is not, it does indicate to me that my employer has a Problem and I need to address that}. In any case, as I was a consultant on this job, I was being paid to already know stuff. So it is now down to me, some personal time, the internet and people more knowledgeable than me who I can ask for help to sort this out.

And that was my main point. Often at work you have to get over the issue. Later on, you work out what the issue was. I wish I could get paid for the latter as well as the former. The real blow for me is that I no longer have access to that site and the information. My job was complete and, whether they have really shut down my access or not, I am not allowed to go back into the systems to dig around. I think I now know the issue, but I can’t prove it.

Friday Philosophy – Why do I work with Oracle Technology? July 8, 2011

Posted by mwidlake in Friday Philosophy.

As an Oracle Expert {*cough* bear with me, despite the lack of humility} I make a living based on my skills and knowledge about Oracle Technology. But why Oracle?

I was prompted to think about this as a side issue to a discussion within the OakTable network, about being aligned with companies – and it was suggested we are aligned with Oracle {my personal feeling is strongly that we are not – we are independent of Oracle Corp}.

How many people reading this blog woke up one morning and thought “Hey, I’m going to review all database technologies, work out the best one and work with that!” And then, after investigations, threw their weight behind Oracle? I certainly did not. I joined a company called Oracle UK almost as a way to escape my then management structure. I thought they were a teletext company {see the “about me” tab if you want a bit more detail on that}. I remain working with Oracle technology primarily because that is what I know the most about. It is by luck that I had stumbled into a technical area that went from strength to strength.

Don’t get me wrong, I think some Oracle technology is very, very good. Most of it is OK and some of it is, well, neither of those two. It is much better than the technology I came from (a language called MUMPS) which, though I still have fond memories of, I decided not to return to when the opportunity came up a couple of years into my Oracle life.

I think it is fair to say that I do not work within the Oracle sphere because I am dedicated to the Larry Ellison world vision {if I could work out what it is, but it seems to have less and less to do with a relational database and more and more about being the IT answer to all business needs, for good or bad}. I work in it because it is where I am and it is good enough technology to not demoralise me too much, plus there is enough work to pay for the cat food and the beer. I also suspect most of you are like me – you work with Oracle Technology because you woke up one day and realised that was where you were. Of course, this probably applies to 95% of people in 95% of jobs.

If I was to have the chance to choose my working career again, would I do the same? Would I work with Oracle technology? Well, it is one of the largest technologies around and so it provides a good source of work. It is probably not a bad choice if you are starting out right now. Actually, if I was to play this game again I would probably not be aiming for computer technology at all, I would hope to be brave enough to aim for what I really wanted to do and try to do medicine and become a surgeon {I did not as I feared my woodwork skills were so poor I would end up a GP, which strikes me as a bloody awful job} or stick with the genetics, which I still love. But if it had been IT? Hmmm, I think I would have gone lower level. I wish I knew how hardware really works. But then, how many jobs are there now in low-level firmware?

So I am in the world of Mr Ellison’s RDBMS and happy there. But importantly, I feel independent of Oracle. I can like what I like about the technology and dislike what I don’t, and I can say which is which – just to link back to the topic of being aligned to a company. That is a small part of why I went back to being an independent consultant too. I do not feel obligated to support a company because they pay my wage. I suppose I feel obligated to be not-negative about a company that employs my services, but that can be another topic another day.

Friday Philosophy – The Secret to Being a Good IT Manager June 3, 2011

Posted by mwidlake in Friday Philosophy, humour, Management.

If you go into a book shop there will probably be a section on business and, if there is, there will almost certainly be a load of books on how to be a manager. Shelves and shelves of them. There is also a large and vibrant market in selling courses on management and aspects of management. I’ve been on a couple of such courses and, if you can manage to be open minded whilst keeping a cynical edge, I think they can be useful.

However, I think most of them are missing the key points and that if you can but hold on to the following extensive list of guiding principles you will be a good IT manager. Maybe even an excellent one :-):

  1. Your top priority, at all times, is to see to the best interests of your people.
  2. Whatever you develop, be it code, databases, network, a team of support staff – User Acceptance is paramount.
  3. You must find ways to deal with other teams and your own management hierarchy in such a way as to be allowed to do (1) and (2).
  4. That’s it.
  5. OK, if pushed, I’d say Never Lie. Maybe that’s just personal though, it’s because I don’t have the memory, audacity or swiftness of mind to pull it off. By not lying I don’t have to try and construct what I said to who and why.

I’m sure people could cite some other hard rules like “you must be within budget” or “you need to get buy-in to your vision” but I don’t agree. Budgets can be negotiated, and the difference between those deemed visionaries and those deemed fantasists seems to me to be down to success and luck. Luck is luck, and for success I refer you to points 1 through 5.

OK, maybe a final rule is:

  • Never ask for or aim for something that is not realistic.

So, I am now able to develop my team and my application and not expect to be able to spend half the company profit on the fastest box out there, as it is not realistic.

There are a shed load of other things that I think are important to helping you be a good manager, you know, techniques and methods for improving things, but nothing else that is key.

And it’s such a simple, small list even I can aim for it.

The shame of it is that I don’t think it’s enough to be developed into a book or a course, so I can’t sell the idea. That, and I’ve gone and given it away in this blog. Also, though I feel I can give points 1, 2 and 5 a good shot, point 3 is way beyond me… possibly because of point 5… So I am not a great manager.

I’m going to hide behind this stout wall now, with my hard hat on, and wait to be told how naive I am…

Update – A couple of weeks later, Kellyn on her DBA Kevlar blog posted similar sentiments about looking after your people, more from the employee’s perspective and far better covered.

Why, given that so many of us feel this way and want things to be this way, are they not?

If you Really Can’t Solve a “Simple” Problem.. March 11, 2011

Posted by mwidlake in Friday Philosophy, Perceptions, Testing.

Sometimes it can be very hard to solve what looks like a simple problem. Here I am going to cover a method that I almost guarantee will help you in such situations.

I recently had a performance issue with an Oracle database that had just gone live. This database is designed to scale to a few billion rows in two key tables, plus some “small” lookup tables of a few dozen to a couple of million rows. Designing a system of this scale on theory alone is very dangerous; you need to test at something like the expected volumes. I was lucky, I was on a project where they were willing to put the effort and resource in and we did indeed create a test system with a few billion rows. Data structure and patterns were created to match the expected system, code was tested and we found issues. Root causes were identified, the code was altered and tested, fine work was done. Pleasingly soon the test system worked to SLAs and confidence was high. We had done this all the right way.

We went live. We ramped up the system to a million records. Performance was awful. Eyes swung my way… This was going to be easy, it would be the statistics, the database was 2 days old and I’d warned the client we would need to manage the object statistics. Stats were gathered.
The problem remained. Ohhh dear, that was not expected. Eyes stayed fixed upon me.

I looked at the plan and I quickly spotted what I knew was the problem. The below code is from the test system and line 15 is the key one, it is an index range scan on the primary key, within a nested loop:

   9 |          NESTED LOOPS                       |                           |     1 |   139 |    37   (3)| 00:00:01 |       
* 10 |           HASH JOIN SEMI                    |                           |     1 |    50 |    11  (10)| 00:00:01 |       
* 11 |            TABLE ACCESS BY INDEX ROWID      | PARTY_ABCDEFGHIJ          |     3 |   144 |     4   (0)| 00:00
* 12 |             INDEX RANGE SCAN                | PA_PK                     |     3 |       |     3   (0)| 00:00:01 |       
  13 |            COLLECTION ITERATOR PICKLER FETCH|                           |       |       |            |          |       
  14 |           PARTITION RANGE ITERATOR          |                           |    77 |  6853 |    26   (0)| 00:00:01 | 
* 15 |            INDEX RANGE SCAN                 | EVEN_PK                   |    77 |  6853 |    26   (0)| 00:00:01 | 

On the live system we had an index fast full scan (To be clear, the below is from when I had tried a few things already to fix the problem, but that index_fast_full_scan was the thing I was trying to avoid. Oh and, yes, the index has a different name).

|   9 |          NESTED LOOPS                 |                           |     1 |   125 |  1828   (3)| 00:00:16 |
|  10 |           NESTED LOOPS                |                           |     1 |    63 |     2   (0)| 00:00:01 |
|* 11 |            TABLE ACCESS BY INDEX ROWID| PARTY_ABCDEFGHIJ          |     1 |    45 |     2   (0)| 00:00:01 |
|* 12 |             INDEX UNIQUE SCAN         | PA_PK                     |     1 |       |     1   (0)| 00:00:01 |
|* 13 |            INDEX UNIQUE SCAN          | AGR_PK                    |     1 |    18 |     0   (0)| 00:00:01 |
|  14 |           PARTITION RANGE ITERATOR    |                           |     1 |    62 |  1826   (3)| 00:00:16 |
|* 15 |            INDEX FAST FULL SCAN       | EVE_PK                    |     1 |    62 |  1826   (3)| 00:00:16 |

Now I knew that Oracle would maybe pick that plan if it could get the data it wanted from the index and it felt that the cost was lower than doing multiple range scans. Many reasons could lead to that and I could fix them. This would not take long.

But I could not force the step I wanted. I could not get a set of hints that would force it. I could not get the stats gathered in a way that forced the nested loop range scan. I managed to alter the plan in many ways – fix the order of tables, the types of joins – but kept getting to the point where the access was via the index fast full scan and not by range scan. I thought I had it cracked when I came across a hint I had not known about before, namely the INDEX_RS_ASC {and INDEX_RS_DESC} hint, which asks for an ascending (or descending) index range scan. Nope, no joy.
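
For anyone who has not met that hint, it is used something like this. The column names are the ones from the primary key listings below; the table name, alias and bind variables are invented for the example:

-- Ask the optimizer for an ascending range scan of the EVEN_PK index
SELECT /*+ INDEX_RS_ASC(e even_pk) */
       e.*
FROM   event_table e
WHERE  e.agr_ext_sys_id    = :b1
AND    e.ext_prd_hld_id_tx = :b2
AND    e.aaamnd_dt         = :b3;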

By now, 8 hours had passed trying several things and we had a few other people looking at the problem, including Oracle Corp. Oracle Corp came up with a good idea – if the code on test runs fine, copy the stats over. Not as simple as it should be as the test system was not quite as-live but we did that. You guessed, it did not work.

So what now? I knew it was a simple problem but I could not fix it. So I tried a technique I knew had worked before. I’d long passed the point where I was concerned about my pride – I emailed friends and contacts and begged help.

Now, that is not the method of solving problems I am writing about – but it is a damned fine method and I have used it several times. I highly recommend it but only after you have put a lot of effort into doing your own work, if you are willing to give proper details of what you are trying to do – and, utterly crucially, if you are willing to put yourself out and help those you just asked for help on another day.

So, what is the silver bullet method? Well, it is what the person who mailed me back did and which I try to do myself – but struggle with.

Ask yourself, what are the most basic and fundamental things that could be wrong. What is so obvious you completely missed it? You’ve tried complex, you’ve been at this for ages, you are missing something. Sometimes it is that you are on the wrong system or you are changing code that is not the code being executed {I’ve done that a few times over the last 20 years}.

In this case, it was this:

Here is my primary key:

EVEN_PK EABCDE 1 AGR_EXT_SYS_ID
EVEN_PK EABCDE 2 EXT_PRD_HLD_ID_TX
EVEN_PK EABCDE 3 AAAMND_DT
EVEN_PK EABCDE 4 EVT_EXT_SYS_ID
EVEN_PK EABCDE 5 EABCDE_ID

Except, here is what it is on Live

EVE_PK EABCDE 1 EVT_EXT_SYS_ID
EVE_PK EABCDE 2 EABCDE
EVE_PK EABCDE 3 AGR_EXT_SYS_ID
EVE_PK EABCDE 4 EXT_PRD_HLD_ID_TX
EVE_PK EABCDE 5 AAAMND_DT

Ignore the difference in name – that was an artifact of the test environment creation. The key thing is that the primary key has a different column order. The DBAs had implemented the table wrong {I’m not blaming them, sometimes stuff just happens OK?}.
Now, it did not alter logical functionality, as the primary key is on the same columns. But the access to the table is on only three of those columns – the leading three columns of the test system’s primary key. If those columns are not the leading columns of the index, Oracle cannot access the index via range scans on those values! Unit testing on the obligatory 6 records had worked fine, but any volume of data revealed the issue.
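
A check that takes seconds and would have caught this straight away is to compare the index column order on each environment and diff the output – something like the query below (the table name is the obfuscated one from the listings above; adjust the index name pattern to suit):

-- Column order of the primary key index - run on test and on live and compare
SELECT index_name, column_position, column_name
FROM   dba_ind_columns
WHERE  table_name = 'EABCDE'
AND    index_name LIKE 'EVE%PK'
ORDER  BY index_name, column_position;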

I could not force my access plan as it was not possible – I had missed the screaming obvious.

So, next time you just “know” you should be able to get your database (or code, or whatever) to do something and it won’t do it, go have a cup of tea, think about your last holiday for 5 minutes and then go back to the desk and ask yourself – did I check that the most fundamental and obvious things are correct?

That is what I think is the key to solving what look like simple problems where you just can’t work it out. Try and think even simpler.

Where is Sun? January 3, 2011

Posted by mwidlake in Friday Philosophy, Perceptions.

First of all, may I wish everyone who comes by my blog a heartfelt Happy New Year.
Secondly, I promise I’ll blog more often and more on technical aspects this year than I have for most of 2010.
Thirdly, I’ll admit the title to this blog is nothing to do with the hardware company now owned by Mr Larry Ellison, but is about the huge glowing ball of fire in the sky (which we have not seen a lot of here in England and Wales for the last couple of weeks – not sure about Scotland but I suspect it has been the same). I apologise for the blatantly misleading (and syntactically poor) title.

A quick question for you – it is the depths of winter for most of us, and it has been unusually cold here in the UK and much of Europe. When are we, as a planet, furthest from the Sun during winter? January the 1st? The shortest day (21st December)? The day the evenings start drawing out (December 14th)?
I think many in the Northern Hemisphere will be surprised to learn that we are closest to the sun today (3rd Jan 2011). A mere 147.104 million kilometers from the centre of our solar system. I mentioned this to a few friends and they were all taken aback, thinking we would be furthest from the warmth of the sun at the depths of our winter.

Come the 4th July 2011 it is not only some strange celebration in the US about having made the terrible decision to go it alone in the world {Joke guys!}, but is the day in the year that the Earth is furthest from the sun – 152.097 million kilometers. That is about 3.39% further away and, as the energy we receive from the sun falls off with the square of the distance, it does account for a bit of a drop in the energy received. {The surface of a sphere is 4*pi*(R{adius} squared); you can think of the energy from the sun as being spread over that sphere at any given distance}.
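
A quick worked check using the two distances above and that inverse-square relationship:

(147.104 / 152.097)^2 ≈ 0.935

So at the furthest point in early July the Earth receives roughly 93.5% of the solar energy per unit area that it gets at the closest point in early January – about a 6.5% drop.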

Some of you may be wondering why this furthest/closest to the sun does not match the longest/shortest day. As some of you may remember, I explained about the oddities of the shortest day not matching when the nights start drawing out about this time last year. It is because, as we spin around our own pole and around the sun, things are complicated by the fact that the earth “leans over” in its orbit.

Check out this nice web site where you can state the location and month you want to see sunrise, sunset, day length and (of particular relevance here) the distance from the sun for each day.

I find it interesting that many of the things most of us see as “common sense” are often not actually right (I always assumed that the shortest day coincided with both the evenings starting to draw out and the mornings getting earlier, until I stumbled across it when looking at sunset times – I had to go find a nice Astronomer friend to explain it all to me). I also like the fact that a very simple system – a regularly spinning ball circling a big “fixed point” in a fixed way – throws up some oddities due to little extra considerations that often go overlooked. Isn’t that so like IT?

That lean in the Earth’s angle of spin compared to the plane we revolve around the sun in is slowly rotating too, so in many thousands of years (long, long, long after any of us will be around to care) the furthest point in the orbit will indeed match the northern hemisphere winter. Again like IT, even the oddities keep shifting.

Friday Philosophy – Run Over by a Bus December 3, 2010

Posted by mwidlake in Friday Philosophy, Perceptions.

I chaired a session at the UKOUG this week by Daniel Fink, titled “Stop Chasing your tail: Using a Disciplined Approach to Problem Diagnosis”. It was a very good talk, about having a process, an approach to solving your IT problems and that it should be a process that suits you and your system. All good stuff and I utterly agree with what he said.

But it was a passing comment Daniel made that really set me thinking. It was something like:

You should be considering how people will look after the system after you have gone, the classic ‘what will we do if you are hit by a bus’….. No, I don’t like thinking like that, that phrase… I prefer ‘after you win the lottery and retire to a great life’.

It just struck a chord with me. Mr Fink’s {and I do go all formal when I intend respect} take on this is a far more positive way of looking at the situation of leaving the system in a state that others can look after once you are no longer able to help. The “Bus” phrase is very, very common, at least in the UK and I suspect in the US, and it is a very negative connotation. “Make sure it all works as something nasty is going to happen to you, something sudden, like being smeared across the tarmac by 25 tons of Greyhound doing 50mph, something basically fatal so you can’t prepare and you can’t help any more”. So, not just moved on, but dead.

Daniel made me realise that we should be looking at this from totally the other perspective and that doing so is much, much, much better. “Make it work so that they love you, even when you have gone away to a happier situation – one involving no road-based unpleasantness at all”.

Everyone leaves their job eventually and I like to think it is often for more positive reasons. Like retiring, or a better job {better for you, but a real shame for your old company as they like you so much}, moving to a new area, attempting a dream. Yes, sometimes (depressingly often at present) it is because you get made redundant or things go bad with your managers, or HR take over the organisation. But even so, better to leave knowing you did so with your professional duty intact I think. It’s one way of winning in a losing situation.

If turning the “bus” metaphor into a “lottery” metaphor results in the response in your brain of “well, when I do leave rich and happy, I still want to leave a painful mess behind me” – then maybe it indicates that you had better leave where you are working as soon as possible in any case, as it is not a good situation and you are deeply unhappy about it.

Up until now I have sometimes used a far more gruesome but less fatal phrase for the concept of making sure things continue after you leave and can no longer help, which is “involved in a freak lawnmower accident”. As in, can’t type but not dead. I’m going to stop using it, I’ve decided that even with my macabre sense of humour, it really is not a good way to think about doing your job properly. Daniel, your attitude is better. Thank you.

Oh, if you went along to the conference you can get the latest version of Daniel’s talk slides from the UKOUG web site (try this link), otherwise, he has a copy here – pick “papers and presentations”. It has lots of notes on it explaining what the slides mean (ie, what he actually says), which I think is a very nice thing for him to have spent the time doing.

Friday Philosophy – The Worst IT Person I Have Met October 15, 2010

Posted by mwidlake in Friday Philosophy.

A couple of weeks ago I extolled the virtues of someone I felt was a great person to work with. This week I’m going to do the opposite (and it will be interesting to see which posting gets more hits).

The worst person I have worked with in IT is Mick. I’ve only known a couple of Micks {and if you are one of them, but you don’t know Barry, you are not the Mick}. In an ironic twist of fate I met Mick at the same time I met the best person I have worked with, Barry. We were all in the same team you see, a UNIX sys admin team I got parachuted into. Maybe the vast difference between the two help make them so distinct in my mind.

Mick was very knowledgable and technically very capable. No, that is not fair, he was extremely good. He actually knew all this system admin stuff and several variations of shell programming, perl, C and a few other two-steps-from-assembler type languages. And he was an absolute and utter pain in the behind.

Barry and I did not know much (or in some cases, any) of this sys admin stuff. If we needed to do something and did not know how, Mick was supposed to show us. It worked something like this:

“Mick, I need to copy all the files that were changed last week from this directory on box X to box Y, keeping the directory structure – can you help?”. Mick would not hear. He suffered from “intermittent deafness” – though he never missed any announcements about free food. You had to go and stand by Mick and wait for him to deign to notice you. If you actually interrupted him he would swear at you and utterly refuse to help; you had to wait quietly. If it was a good day he would deem this acceptable after a minute or two, but he would do his utmost to convey the impression that he despised your lack of knowledge and that your concerns were beneath his talents… but he would stoop to help.

You would repeat the task you were trying to do and, pausing only briefly to pour scorn on such a trivial thing, he would turn his back and start typing. He’d write a script to do it. “no, no, don’t write it, just tell me the basic commands and I’ll work it out!” No, he insisted on writing the script.

The script would be a thing to behold. Mick would write it in as few lines as possible and the least number of letters. For ages. Oh, he would have a working version in about the time it took Barry or I to explain the task, but he would not give you that version, oh no. He would ignore you until he had made all variables 1 character, took out all whitespace, replaced anything obvious with something obtuse, replaced a small chain of simple commands with one or two arcane commands. Every script was an attempt to win an “obfuscated code” competition. If we waited for the end result, it was impossible for Barry or I to decipher. The only benefit to the process was you would see the commands he was using and you could wander off and start with the unix Manuals yourself and get the job done.

He had other methods with which to demonstrate his greater worth.
Mick would agree to help (under duress of the boss telling him to do so) with an urgent task, but keep asking you to wait all day – then go home without doing his bit.
He seemed to love to intercept anyone coming to you for help and tell them he would sort out the problem for them – only to not do so. And then tell the user the next day that it was Barry’s or my problem to sort out. Correct – Mick would not have mentioned this to us.

Mick was fair though, he would treat everyone the same. With scorn. Any expertise in a field he did not know was unimportant and anyone with skills in his field was just competition to be shown who was best. Sadly, he usually was best, if best means biggest smartass.

Over time, as Barry and I learnt stuff (almost never from him), Mick became redundant. Not because we caught him up, not by a long way, but because no one else in the department would ask him anything. They would come to Barry and I. We might be slow and we sometimes screwed it up but we did not sneer and we fixed the problem in a way they could understand.

The reason Mick is the worst person I ever worked with is that, unlike people who simply break stuff or lie about their skills or are stupid, he was actually very talented and capable – and yet took a perverse pleasure in not using that talent to help. Mick would put effort into the art of maximizing his unhelpfulness. It was the difference between his potential to help and his drive not to do so that made it so hard for me to deal with him. I’d rather work with a talentless, idiot liar because at least you don’t need or expect much from them.

*sigh*
