
Database Sizing – How much Disk do I need? (The Easy Way) November 11, 2010

Posted by mwidlake in Architecture, development, VLDB.
7 comments

How much Disk do I need for my new Oracle database? Answer:-

  • 8-10 times the volume of raw data for an OLTP system
  • 2-4 times the raw data volume for a Data Warehouse.
  • The bigger the database, the nearer you will be to the lower multiplication factors.

{Disclaimer. This is of course just my opinion, based on some experience. If you use the above figures for a real project and get the total disc space you need wrong, don’t blame me. If you do and it is right, then of course you now owe me a beer.}

Many of us have probably had to calculate the expected size of a database before, but the actual database is only one component of all the things you need to run the Oracle component of your system. You need to size the other components too – archived redo logs, backup staging area, dataload staging area, external files, the operating system, swap space, the Oracle binaries {which generally get bigger every year but shrink in comparison to the average size of an Oracle DB} etc…

In a similar way to my thoughts on how much database space you need for a person, I also used to check out the total disk space taken up by every database I created and every one that I came across. {A friend emailed me after my earlier posting to ask if I had an obsession about size. I think the answer must be “yes”}.

First of all, you need to know how much “raw data” you have. By this I mean what will become the table data. Back in the early 90’s this could be the total size of the flat files the old system was using, or even the size of the data as it was in spreadsheets. An Oracle export file of the system gives a pretty good idea of the raw data volume too. Lacking all of these, you need to roughly size your raw data yourself. Do a calculation of “number_of_rows*sum_of_columns” for your biggest 10 tables (I might blog more on this later). Don’t be tempted to overestimate, my multipliers allow for the padding.
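
If the old system happens to be an Oracle database with reasonably fresh statistics, something like this rough sketch gives you a starting point {MY_SCHEMA is just a placeholder and the result is only an estimate, nothing more}:

-- est_raw_data.sql
-- rough raw data volume: rows * average row length for the 10 biggest tables
select * from (
  select table_name
        ,num_rows
        ,avg_row_len
        ,round(num_rows*avg_row_len/1024/1024/1024,2) est_gb
  from   dba_tables
  where  owner = 'MY_SCHEMA'
  order  by num_rows*avg_row_len desc nulls last
  )
where rownum <= 10
/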

Let us say you have done this and it is 60GB of raw data for an OLTP system. Let the storage guys know you will probably want about 500GB of space. They will then mentally put it down as “of no consequence” as if you have dedicated storage guys you probably have many terabytes of storage. {Oh, I should mention that I am not considering redundancy at all but space that is provided. The amount of actual spinning disk is down to the level and type of RAID you storage guys make you use. That is a whole other discussion}.

If you come up with 5TB of raw data for a DW system then you need around 12-15TB of disk storage.

If you come up with more than a Terabyte or so of raw data for an OLTP system or 10 to 20 Terabytes for a DW, when you give your figures to the storage guys/procurement people then they may well go pale and say something like “you have got to be kidding!”. This is part of why the multiplication factor for Data Warehouses, and larger systems in general, is lower, as you are forced to be more careful about the space you allocate and how you use it.

The overhead of total disk space over Raw data reduces as the database gets bigger for a number of reasons:

  • The size of the Oracle binaries and the OS does not change as the database gets bigger.
  • The size of swap space does not increase in line with the database as, generally speaking, if you increase the database size from 100GB to 1TB you do not have the luxury of increasing the system memory of your server tenfold; it probably just doubles.
  • Very large databases tend to have something making them big, like images or embedded documents, which are not indexed. Thus the ratio of table segments to index segments increases.
  • If you have a very large database you start removing indexes (often those that support constraints) to aid performance of data load and management, again improving the ratio of table segments to index segments.
  • Backups become partial or incremental to reduce the size and duration of the backup.
  • As mentioned before, the sheer size of the system is such that you just take more care over cleaning up work areas, paring down the archived redo log areas (those files do compress well) and other areas.
  • If things get extreme or you have been doing this for donkey’s years {note to non-UK people, this means many, many years} you start altering PCTFREE and checking over extent sizes.

My best ever ratio of database size to raw data was around 1.6 and it took an awful lot of effort and planning to get there. And an IT manager who made me very, very aware of how much the storage was costing him (it is not the disks, it’s all the other stuff).

I should also just mention that the amount of disk you need is only one consideration. If you want your database to perform well you need to consider the number of spindles. After all, you can create a very large database indeed using a single 2TB disc – but any actual IO will perform terribly.

How Fast for £1,000 – Architecture August 5, 2010

Posted by mwidlake in Architecture, performance, Testing.
7 comments

My previous post proposed the creation of “the fastest Oracle server for a grand”, or at least an investigation into what might be the fastest server. I’ve had some really good feedback {which I very much appreciate and am open to even more of}, so I think I’ll explore this further.

My initial ideas for the hardware configuration, written at the same time as the original post, were:

  • A single-chip, quad-core Intel Core i5 or i7 processor (I would like two chips but the cost of multi-chip motherboards seems too high for my budget)
  • 8GB of memory as the best price point at present, but maybe push to 16GB
  • Multiple small, fast internal disks for storage, maybe expanded via eSATA
  • Backup to an external drive (cost not included in the budget).
  • USB3 and use of memory sticks for temp and online redo.
  • If the budget will stretch, SSD disc for the core database components, like core tables and index tablespaces (who does that any more!).
  • ASM or no ASM?
  • If I run out of internal motherboard connections for storage, can I mix and match with USB3, external eSATA or even GB Ethernet?

As for the Oracle database considerations, I have a good few things I want to try out also. In the past (both distant and recent) I have had a lot of success in placing components of the database in specific locations. I refer to this as “Physical Implementation” {Physical Implementation, if I remember my old DB Design courses correctly, also includes things like partitioning, extent management, tablespace attributes – how you actually implement the tables, indexes and constraints that came from logical data design}.

Physically placing components like undo and redo logs on your fastest storage is old hat but I think it gets overlooked a lot these days.
Placing indexes and tables in different tablespaces on different storage is again an old and partially discredited practice, but I’d like to go back and have a new look at it. I had some success improving performance with this approach as recently as 8 years ago but never got to rigorously test and document it. {As an aside, one benefit I have been (un)fortunate enough to gain twice from putting tables and indexes in separate tablespaces came when a tablespace was lost through file corruption – both times it was only an index tablespace, so I was able to just drop the tablespace and recreate the indexes.}
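
For anyone who has not done this for a while, the mechanics are nothing more exotic than the sketch below – the file paths and object names are made up, the point is simply that each tablespace sits on different storage:

-- separate tablespaces on separate storage {paths and names illustrative only}
create tablespace app_tab datafile '/u01/fast_disk/app_tab01.dbf'  size 10g;
create tablespace app_idx datafile '/u02/other_disk/app_idx01.dbf' size 10g;

create table orders
( order_id    number not null
 ,customer_id number not null
 ,placed_date date
) tablespace app_tab;

create index orders_cust_idx on orders (customer_id) tablespace app_idx;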

Then there is the use of clusters, IOTs, Bitmap indexes and Single Table Hash Clusters (are you reading this Piet?) which I want to explore again under 11.

I don’t think I am going to bother with mixed block sizes in one DB, I think you need very specialist needs to make it worth the overhead of managing the various caches and the fact that the CBO is not so great at accurately costing operations in non-standard block sizes {issues with the MBRC fudge factor being one}. But I think I will re-visit use of “keep” and “recycle” caches. For one thing, I want to show that they are just caches with a name and not special, by using the “Recycle” cache as the keep and the “keep” as a recycle cache.
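
For reference, setting up and nominating segments for the two pools is straightforward – the sizes and segment names below are purely illustrative:

-- size the two extra buffer pools {values for illustration only}
alter system set db_keep_cache_size    = 512m scope=both;
alter system set db_recycle_cache_size = 256m scope=both;

-- nominate which segments use which pool {hypothetical segment names}
alter table customers      storage (buffer_pool keep);
alter index customers_pk   storage (buffer_pool keep);
alter table audit_history  storage (buffer_pool recycle);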

Should I be using RAT for testing all of this? I said I was not going to use any special features beyond Enterprise edition but RAT could be jolly useful. But then I would need two servers. Is anyone willing to give me the other £1000 for it? I’d be ever so grateful! :-)

More Memory Meanderings – IOPS and Form Factors July 19, 2010

Posted by mwidlake in Architecture, Management, performance.
8 comments

I had a few comments when I posted on solid state memory last week and I also had a couple of interesting email discussions with people.

I seriously failed to make much of one of the key advantages of solid-state storage over disk storage, which is the far greater capacity for Input/Output operations per second (IOPS) – a point picked up by Neil Chandler. Like many people, I have had discussions with the storage guys about why I think the storage is terribly slow and they think it is fast. They look at the total throughput from the storage to the server and tell me it is fine. It is not great, they say, but it is {let’s say for this example} passing 440MB a second over to the server. That is respectable and I should stop complaining.

The problem is, they are just looking at throughput, which seems to be the main metric they are concerned about after acreage. This is probably not really their fault, it is the way the vendors approach things too. However, my database is just concerned with creating, fetching and altering records and it does it as input/output operations. Let us say a disk can manage 80 IOPS (which allows an average of 12.5ms to both seek to the record and read the data. Even many modern 7,200rpm discs struggle to average less than 12ms seek time). We have 130 disks in this example storage array and there is no overhead from any sort of RAID or any bottleneck in passing the data back to the server. {This is of course utterly unbelievable, but if I have been a little harsh in not stating the discs can manage 8ms seek time, ignoring the RAID/HBA/network cost covers that}. Each disc is a “small” one of 500GB. They bought cheap disks to give us as many MB/£ as they could {10,000 and 15,000rpm disks will manage 120 and 160 IOPS respectively but cost more per MB}.

Four sessions on my theoretical database are doing full table scans at 1MB of data per IO {Oracle’s usual max on 10.2}, each session receiving 100MB of data a second, so 400MB in total. 5 discs {5*80 IOPS*1MB} could supply that throughput. It is a perfect database world and there are no blocks in the cache already for these scans to interrupt the multi-block reads.

However, my system is primarily an OLTP system and the other IO is records being read via index lookups and single block reads or writes.

Each IO reads the minimum unit for the database, which is a block. A block in this system is 4k. Oracle can’t read a bit of a block.

Thus the 40MB a second of other data being transferred from (or to) the storage is single-block reads of 4k – 10,000 of them a second. I will need 10,000/80 disks to support that level of IO. That is 125 discs, running flat out.

So, I am using all my 130 discs and 96% of them are serving 40MB of requests and 4% are serving 400MB of requests. As you can see, as an OLTP database I do not care about acreage or throughput. I want IOPS. I need all those spindles to give me the IOPS I need.
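
As an aside, if you want to see the IO request rate your own database is actually demanding of the storage, something like the query below gives a feel for it. It is only a sketch – the exact metric names in V$SYSMETRIC can vary a little between versions, hence the LIKE:

-- current physical IO request rates as Oracle sees them
select metric_name
      ,round(value) per_second
from   v$sysmetric
where  metric_name like 'Physical%Total IO Requests Per Sec'
and    intsize_csec > 3000   -- the 60-second interval, not the 15-second one
order  by metric_name
/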

What does the 40MB of requests actually equate to? Let us say our indexes are small and efficient and have a height of 3 (b-level of 2), so root node, one level of branch nodes and then the leaf nodes. To get a row you need to read the root node, branch node, leaf node and then the table block. 4 IOs. So those 10,000 IOPS are allowing us to read or write 10,000/4 records a second, or 2,500 records.
You can read 2,500 records a second.

Sounds a lot? Well, let us say you are pulling up customer records onto a screen and the main page pulls data from 3 main tables (customer, address, account_summary) and translates 6 fields via lookups. I’ll be kind and say the lookups are tiny and Oracle just reads the block or blocks of the table with one IO. So that is 9 IOs for the customer screen, so if our 40MB of OLTP IO was all for looking up customers then you could show just under 280 customers a second, across all users of your database. If you want to pull up the first screen of the orders summary, each screen record derived from 2 underlying main tables and again half a dozen lookups, but now with 10 records per summary page – that is 80 IOs for the page. Looking at a customer and their order summary you are down to under thirty a second across your whole organisation and doing nothing else.

You get the idea. 2,500 records a second is tiny. Especially as those 130 500GB disks give you 65TB of space to host your database on. Yes, it is potentially a big database.

The only way any of this works is due to the buffer cache. If you have a very healthy buffer cache hit ratio of 99% then you can see that your 2,500 records a second of physical IO coming in and out of the storage sub-system are actually supporting 250,000 logical-and-physical IOs a second. {And in reality, many sites buffer at the application layer too}.
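
For what hit ratios are worth {and they are a blunt instrument}, the classic rough calculation from the instance-wide counters looks something like this sketch:

-- rough buffer cache hit ratio from the cumulative instance counters
select round( 100 * (1 - phy.value / (db.value + cons.value)), 2 ) bchr_pct
from   v$sysstat phy
      ,v$sysstat db
      ,v$sysstat cons
where  phy.name  = 'physical reads'
and    db.name   = 'db block gets'
and    cons.name = 'consistent gets'
/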

Using Solid State Storage would potentially give you a huge boost in performance for your OLTP system, even if the new technology was used to simply replicate disk storage.

I think you can tell that storage vendors are very aware of this issue, as seek time and IOPS are not metrics that tend to jump out of the literature for disk storage. In fact, often they are not mentioned at all. I have just been looking at some modern sales literature and white papers on storage from a couple of vendors and they do not even mention IOPS – but they happily quote acreage and maximum transfer rates. That is, until you get to information on Solid State Discs. Now, because the vendor can say good things about the situation, the information is there. In one HP white paper the figures given are:

                        Modern super-fast      Top-end
                        SAS disk drive         Solid State Disk
Sustained write         150MB/s                180MB/s
Sustained read          90MB/s                 180MB/s
Random write (IOPS)     285                    5,000+
Random read (IOPS)      340                    20,000+

More and more these days, as a DBA you do not need or want to state your storage requirements in terms of acreage or maximum throughput – you will get those for free, so long as you state your IOPS requirements. Just say “I need 5,000 IOPS” and let the storage expert find the cheapest, smallest disks they can to provide it. You will have TBs of space.

With solid-state storage you would not need to over-specify storage acreage to get the IOPS, and this is why I said last week that you do not need solid state storage to match the capacity of current disks for this storage to take over. We would be back to the old situation where you bought so many cheap, small units to get the volume that IOPS were almost an accidental by-product. With 1GB discs you were always getting a bulk-buy discount :-)

I said that SSD would boost performance even if you used the technology to replicate the current disk storage. By this I mean that you get a chunk of solid-state disk with a SATA or SAS interface in a 3.5 inch format block and plug it in where a physical disk was plugged in, still sending chunks of 4k or 8k over the network to the Block Buffer Cache. But does Oracle want to stick with the current block paradigm for requesting information and holding data in the block buffer cache? After all, why pass over and hold in memory a block of data when all the user wanted was a specific record? It might be better to hold specific records. I suspect that Oracle will stick with the block-based structure for a while yet as it is so established and key to the kernel, but I would not be at all surprised if something is being developed with Exadata in mind where data sets/records are buffered, and this could be used for data coming from solid state memory. A second cache that, when using Exadata or solid-state memory, holds single records. {I might come back to this in a later blog, this one is already getting bloated}.

This leads on to the physical side of solid-state discs. They currently conform to the 3.5” or 2.5” hard disc form factor but there is no need for them to do so. One friend commented that, with USB memory sticks, you could stick a female port on the back of a memory stick and a joint and you could just daisy-chain the USB sticks into each other, as a long snake. And then decorate your desk with them. Your storage could be looped around the ceiling as bunting. Being serious, though, with solid state storage you could have racks or rows of chips anywhere in the server box. In something like a laptop the storage could be an array 2mm high across the bottom of the chassis. For the server room you could have a 1U “server” and inside it a forest of chips mounted vertically, like row after row of teeth, with a simple fan at the front and back to cool the teeth (if needed at all). And, as I said last time, with the solid state being so much smaller and no need to keep to the old hard disk format, you could squeeze a hell of a lot of storage into a standard server box.

If you pulled the storage locally into your server, you would be back to the world of localised storage, but then LANs and WANs are so much faster now that if you had 10TB of storage local to your server, you could probably share it with other machines in the network relatively easily and yet have it available to the local server with as many and as fat a set of internal interfaces as you could get your provider to manage.

I’m going to, at long last, wrap up this current instalment of my thoughts with a business one. I am convinced that soon solid-state storage is going to be so far superior a proposition to traditional disks that demand will explode. And so it won’t get cheaper. I’m wondering if manufacturers will hit a point where they can sell as much as they can easily make and so hold the price higher. After all, what was the argument for Compact Discs costing twice as much as old cassette tapes, even when they had been available for 5 years and certainly did not cost twice as much to produce? What you can get away with charging for them.

Friday Philosophy – CABs {an expensive way to get nowhere?} March 11, 2010

Posted by mwidlake in biology, development, Friday Philosophy, Management.
3 comments

A few years ago, my wife and I went to New York for a holiday. We got a cab from the airport into Manhattan. It was an expensive way to see, at great length, some of the more uninteresting automobile transit routes through New York. We arrived at our hotel a great deal later than we anticipated. And with a lot of our paper dollars no longer in our possession.

I’ve also taken cabs through London, usually at the weekend to get back to Liverpool Street Station. The trip is generally quick, painless and not too expensive, no matter what bit of London is currently being dug up. Those black-cab drivers know their stuff.

Of course, the CABs I refer to in the title of this Friday Philosophy are not private cars for hire. In this context CAB is Change Advisory Board. A term that can make grown developers weep. If you do not know, the Change Advisory Board is a group of people who look at the changes that are planned for a computer system and decide if they are fit for release. My personal experience of them has been similar to my experience of the taxi variety, though sadly more of the New York than London experience.

You might expect me to now sink into a diatribe {ie extended rant} about how I hate CABs. Well, I don’t. CABs can be a part of a valuable and highly worthwhile process control mechanism. Just as proper QA is core to any mature software development process, so CABs are important in getting informed, talented stakeholders to review proposed changes. They check for overall system impact, clashes with other proposed changes that individual development streams may be unaware of, use their own {hopefully deep and wide} experience to consider the changes and to verify Due Diligence has been invoked {that last one is a bit of a minefield and where, I believe, many CABs fail}.

Sadly, though this is often the aim, the end result is too often a bunch of uninformed and technically naive politicos trying to wield power, using the CAB meeting as an extended game of management chess.

I’ve seen CABs trade changes. “I’ll let you have X if I can have Y and Z”. I’ve seen CABs turn down changes because the form had spelling mistakes in it. I’ve seen CABs object to a change that will save the company 5 million pounds a day because it lacked one signature.

That last one just stopped me in my tracks {I’m not exaggerating either, if anything I am underplaying the cost impact of that decision. I saw the figures and I wasted a couple of days of my life checking, 5 million pounds a day was the least I felt I could prove.} We are talking about enough money every day to pay the salary of everyone on the CAB for several years. And they blocked it because the DBA team had not signed off the change. No effort was made to address the lack of the signature in any way, the change was just refused.

The DBA Team had not signed off the change because the one and only DBA Team Leader who was allowed to sign off was on holiday for two weeks. They needed that holiday too, for other but I suspect linked reasons.

Now, I knew the DBA Team Lead and he was a good bloke, he knew his stuff and he was not paid 5 million pounds a day. His deputy was paid even less and was no less talented, but she was not allowed to sign off the change as she was not the DBA Team Lead.

That was a CAB gone very wrong. The process of the CAB had been allowed to over-rule good business sense. It was also overruling general and technical sense, but that really is secondary to what keeps the business making a profit.

I’ve seen the opposite of course, technical teams that just apply whatever changes they feel are fit, with no oversight or CAB. To be honest, this less controlled process seems to mess up less often than a poor CAB process, as the technicians know they are the ones who will spend the weekend fixing a mess if one occurs. But that mess up will occur eventually, if control is lacking, and the bigger and more complex the IT environment, the greater the chance of the mess up.

So, I feel CABs are good, no make that Great, if you have the right people on them and you have a sensible cascade of authority so one person being away does not block the system. That is quite a bit harder to put in place than a simple “Dave A, John, Andrea, Alex, Raj, Dave P, Mal, Malcolm and Sarah have final signoff”, which is what most CABs effectively become.

But there is one last fault of CABs I want to highlight. They tend to treat all changes in the same way and all changes are not the same. Upgrading the underlying OS is not the same as adding a cardinality hint to one Business Objects report.

If your CAB or change process treats the two above examples the same, then your CAB or change process is broken. Now, in all IT “rules of thumb” there is an exception. In this case, I am truly struggling to think of one. My feeling is that if your change process treats an OS upgrade the same as adding a hint to a report, it is not fit for purpose.

Those are my main issues with CABs. They should be of significant business importance, but nearly always they are implemented with one process to deal with all situations and then get taken over by people with an “Office Politics” agenda as opposed to a “Getting the best job we can reasonably expect done” agenda.

I’m very passionate about this and I have a way I hope can throw this issue into context, an analogy.

Ask yourself this scenario.
You go to your doctor with a niggly cough you have had for a week OR you go to your doctor because you almost passed out each day you got out of bed for the last three days.
If your doctor treated you the same for both sets of symptoms, would you be happy with that doctor?

Why are all IT changes handled by most CABs in exactly the same way?

(BTW if you ever almost collapse when you get out of bed in the morning, do NOT go to work; go instead to your doctor and ask them for a full medical, and if he/she does not take blood pressure readings and order a full blood chemistry test, go find a new doctor.)

Friday Philosophy – Alternatives to Testing February 19, 2010

Posted by mwidlake in Friday Philosophy, Testing.
5 comments

We all know that we should test everything before we do it on a production system.

Or should we? Well yes, you should, but sometimes there is no suitable test system (or even any test system, which seems to be becoming a more common problem) or the testing regime takes weeks and you need a fix right now.

So given that there really is a proper business reason why doing the testing is expensive {in effort or time} or impossible, what can you do? I believe there is an alternative.

To be able to reduce or remove the need for proper testing you need to be absolutely sure the following three things are true, in order of importance:

  1. Be able to monitor the live system for any negative impact of your changes.
  2. Have a proven and quick rollback.
  3. Be confident that any negative impact on your system is of low likelihood and low impact.

Note that, in my opinion, the least important {yet still very important} consideration is that the change be almost certain to work and won’t cause too much grief if something unexpected happens.  However, this seems to be the main concern when people are considering a “Quick Fix” on a production system. “Is it likely to go wrong? Is it going to matter if it does not help?”. At this point all eyes tend to swivel to the DBA or the Lead Developer who has to go on what I call DBA Instinct. Given your experience to date and what you know about how the database works, give it your best guess.

The problem with DBA Instinct is that if you are not very experienced you are unlikely to get it right. If you are very experienced, most of that experience was on earlier versions of Oracle and you probably do little hands-on work now as you are a manager. So you are still unlikely to get it right :-)

So, I argue that point 1 (spotting things going wrong) and 2 (Getting back to where you started) are most important.

Let us take the classic example of just adding an index as our untested change (as we have no system capable of generating a realistic workload and thus the impact across the board).

3) Most of us would agree that adding an index is easy, the impact is low and the chance of plans going terribly slow by using the new index unsuitably is low…

2) Rollback is relatively simple, you drop the index again. Question, can you just drop an index on a table if a SQL statement is in-flight and using that index? You know, a SQL statement that the CBO has decided is going to be useful and, despite your DBA Instinct that nothing is likely to go wrong, now takes 20 seconds to run rather than 0.2 seconds. And it is being run all the time on your OLTP system so now there is no time when there are not 2 or 3 queries in flight.

1) Verifying impact. This point is vital and can be tricky. You can just rely on staring at your monitoring tool and hoping any serious impact shows up so you can catch it and do something about it. Or people ring up and complain. That would not be so good – you have impacted the business.

I would pull out code from the SGA that is referencing the table to be indexed before you start (I’ll stick a bit of code at the bottom for doing this). If the list is not too large, you can do the same check after the code has gone in and compare changes to buffer gets and disk reads per execution.

A more targeted check is to pull out of v$SQL_PLAN anything that starts using the new index and check to make sure it is not now a problem.
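
Something along these lines would do as a starting point – a sketch only, with MY_SCHEMA and MY_NEW_INDEX standing in for your real names:

-- which cached statements have started using the new index?
select distinct
       p.sql_id
      ,s.executions
      ,s.buffer_gets
      ,s.disk_reads
from   v$sql_plan p
      ,v$sqlarea  s
where  s.sql_id       = p.sql_id
and    p.object_owner = 'MY_SCHEMA'
and    p.object_name  = 'MY_NEW_INDEX'
/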

A second example, I want to gather system statistics as I am sure the system does not appreciate how blindingly fast my disks are for multi block reads.

3) Impact? Well, I would suggest the potential impact is wide and across the board, else why do it?

2) Rollback? Delete the system statistics. What happens when you delete the system statistics? Are they all set to null? Are the defaults put back in place? Could I use DBMS_STATS.RESTORE_SYSTEM_STATS? What is the impact of any of them? {RESTORE appears to work reliably and, if you delete the system stats, you get the “out of the box” values again, which could be very different to what you had before your spur-of-the-moment gather}
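
For reference, the options look something like the sketch below. The gathering mode and the timestamp are illustrative only – and, as I say, prove the restore on a test system before you rely on it:

-- gather system statistics {noworkload mode, or start/stop over a busy period}
exec dbms_stats.gather_system_stats('NOWORKLOAD')
--exec dbms_stats.gather_system_stats('START')
--exec dbms_stats.gather_system_stats('STOP')

-- see what is currently in place
select sname, pname, pval1 from sys.aux_stats$;

-- roll back to the values of an hour ago...
exec dbms_stats.restore_system_stats(systimestamp - interval '1' hour)
-- ...or wipe them back to the out-of-the-box defaults
exec dbms_stats.delete_system_stats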

1) How to assess negative impact? You are going to have to monitor the whole system and hope to pick up any code that starts running slowly before it impacts the business too much.

I would probably add the index but I would need to do at least some testing and proof of rollback before I gather system statistics. And I would only do so without proper testing if said proper testing was impossible due to a lack of test systems.

So, I do think you can implement changes without testing or with reduced testing, but it is not always the easy option.

I also think it is a valid (and often quicker, in terms of elapsed time) way of changing the production system without going through full unit/integration/regression/NFR testing.

Now you just need to persuade the Change Advisory Board that it is OK to do the change. Good luck :-)

Oh, that code:-

-- chk_sga_txt.sql
-- Martin Widlake - look for sql in sga containing the provided txt
--
set pause on
set pages 36
col first_load_time form A20
spool chk_sga_txt
select
first_load_time
,parse_calls prse
,executions  excs
,buffer_gets buffs
,disk_reads   discs
,rows_processed rws
--,address        address
,hash_value     hash_value
,sql_id
--,plan_hash_value
,sql_text
from v$sqlarea
--where parsing_schema_id !='0'
where upper(sql_text) like upper('%'||nvl('&sql_txt','whoops')||'%')
and rownum <50
--order by first_load_time desc
order by (greatest(buffer_gets,1)/greatest(rows_processed,1)) desc
/
spool off

Making Things Better Makes Things Worse February 11, 2010

Posted by mwidlake in development, Management, Perceptions.
12 comments

This could be a Friday Philosophy but I’ve got a couple of those lined up already. Anyway, I am suffering at work at the moment. I’m encountering a phenomenon that I have talked about with Dennis Adams a couple of times. It probably has a proper term, but basically it is the odd situation that when you make things better, you get more complaints. {Dennis, do you know the proper term?}

{Update. Dennis was good enough to link to this paper he wrote on customer feedback}

Anyway, let me explain. You have an application sitting on a database. The screens are slow, the reports take an age to come out, you might even have considerable system instability and unplanned outages. The users are not unhappy. They were unhappy last year. Now they are just cynical and they just expect the system to be slow, unresponsive, flaky. So they do not report any problems.

Then things change. The system gets some much-needed care and attention and now the slowest reports get tuned up, the screens come back faster and fewer spontaneous department-wide coffee breaks are caused by the system crashing. Everything gets better. But not for the help desk – now they start getting calls. “This report is too slow”. “Why can’t I jump straight from the customer screen to the pending orders screen?”. This happens because the users now realise that something can be done. There is a point in complaining as there is a chance their piece of misery could be made better. I certainly went through this many years ago when I inherited a system that crashed every week. No one mentioned it, they just went to tea and complained about it. The first time it crashed after I arrived I could not believe that no one had called before I had realised it had died. Once it had been up solidly for a couple of months, then when it crashed boy did we hear about it!

Also, when you improve a system and things generally get better, occasionally something will improve and then fall back a bit. Hardly anyone says “thanks” for the initial improvement but they will say something if it improves and then drops back.

That is what is happening for my main client at the moment. The system was not that bad, but it needed some help. Why else would I be there? I’ve been beavering away with the rest of the team and we are making things better, so far mostly at an underlying “getting the overall system straight” level. A new chap has joined to help concentrate on performance and he is really making inroads into specific things, individual reports and processes that need a good sorting out.

So now that things are getting better and performance is generally improving, anything that is still slow is being brought up by the development and support teams. Also, we’ve made a few things slower (I’m sorry, it just happens like that) and they certainly get mentioned.

So, I’m busy. And I could get annoyed at people asking why X is slower when Y and Z are faster. But I don’t, because Dennis explained this counter-intuitive theory to me.

I know things are getting better as people are annoyed as opposed to apathetic :-)

Privileges required to FLUSH_DATABASE_MONITORING_INFO October 19, 2009

Posted by mwidlake in development, performance.
1 comment so far

I’m doing some work at the moment on gathering object statistics and it helps me a lot to have access to the number of changed records in SYS.DBA_TAB_MODIFICATIONS. To ensure you have the latest information in this table, you need to first flush any data out of memory with DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO.  For the live system, the DBAs rather understandably {and sensibly} want all users to run with the least access privileges they need, so granting DBA role to my user is out.
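
For context, this is the sort of thing I am running – MY_SCHEMA is just a placeholder:

-- flush the in-memory DML monitoring info down to the dictionary first
exec dbms_stats.flush_database_monitoring_info

-- then see how much each table has changed since statistics were last gathered
select table_owner
      ,table_name
      ,inserts
      ,updates
      ,deletes
      ,timestamp
from   sys.dba_tab_modifications
where  table_owner = 'MY_SCHEMA'
order  by inserts+updates+deletes desc
/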

I googled for the actual system privilege or privileges needed to flush_database_monitoring_info and drew a blank, so I have had to find out for myself. And being a nice guy {who am I kidding}, I am now recording the info for anyone else who is interested to find:

On 10.2.0.3, to execute DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO you need the ANALYZE ANY system privilege.

{Not ANALYZE ANY DICTIONARY, which would make more sense to me}

For those who like such things, here is the proof. I had to use two sessions, thus the constant displaying of system time.

-- current user privs
DWPERFDEV1> @usr_privs
enter user whos privs you wish to see> dwperf
GRANTEE              TYPE PRIVILEGE                           adm
----------------------------------------------------------------
DWPERF               SYSP CREATE JOB                          NO
DWPERF               SYSP CREATE PROCEDURE                    NO
DWPERF               SYSP CREATE PUBLIC SYNONYM               NO
DWPERF               SYSP CREATE SESSION                      NO
DWPERF               SYSP CREATE SYNONYM                      NO
DWPERF               SYSP CREATE TABLE                        NO
DWPERF               SYSP CREATE TRIGGER                      NO
DWPERF               SYSP DEBUG CONNECT SESSION               NO
DWPERF               SYSP DROP PUBLIC SYNONYM                 NO
DWPERF               SYSP EXECUTE ANY PROCEDURE               NO
DWPERF               SYSP SELECT ANY DICTIONARY               NO
DWPERF               SYSP SELECT ANY TABLE                    NO
DWPERF               ROLE CONNECT                             NO
DWPERF               OBJP SYS.DBMS_UTILITY-EXECUTE            NO
DWPERF_ROLE          SYSP ANALYZE ANY                         NO
DWPERFDEV1> @showtime
  Date       Time
--------------------------------------------------------
19-OCT-2009 13:29:16

DWPERFDEV1> exec dbms_stats.flush_database_monitoring_info
BEGIN dbms_stats.flush_database_monitoring_info; END;
*
ERROR at line 1:
ORA-20000: Insufficient privileges
ORA-06512: at "SYS.DBMS_STATS", line 2148
ORA-06512: at "SYS.DBMS_STATS", line 14135
ORA-06512: at line 1

DEV1> @showtime
  Date       Time
------------------------------------------------
19-OCT-2009 13:29:30

DEV1> grant analyze any dictionary to dwperf;
Grant succeeded.

DWPERFDEV1> @showtime
  Date       Time
------------------------------------------
19-OCT-2009 13:29:40

DWPERFDEV1> exec dbms_stats.flush_database_monitoring_info
BEGIN dbms_stats.flush_database_monitoring_info; END;
*
ERROR at line 1:
ORA-20000: Insufficient privileges
ORA-06512: at "SYS.DBMS_STATS", line 2148
ORA-06512: at "SYS.DBMS_STATS", line 14135
ORA-06512: at line 1

DEV1> @showtime
 Date       Time
---------------------------------------------
19-OCT-2009 13:30:46

DEV1> grant analyze any to dwperf;
Grant succeeded.

DWPERFDEV1> @showtime
  Date       Time
------------------------------------
19-OCT-2009 13:31:20

DWPERFDEV1> exec dbms_stats.flush_database_monitoring_info
PL/SQL procedure successfully completed.

-- SUCCESS!

DEV1> @showtime
  Date       Time
-------------------------------------------
19-OCT-2009 13:31:38
DEV1> revoke analyze any from dwperf
Revoke succeeded.

DWPERFDEV1> @showtime
  Date       Time
------------------------------------------------------
19-OCT-2009 13:31:57

DWPERFDEV1> exec dbms_stats.flush_database_monitoring_info
BEGIN dbms_stats.flush_database_monitoring_info; END;
*
ERROR at line 1:
ORA-20000: Insufficient privileges
ORA-06512: at "SYS.DBMS_STATS", line 2148
ORA-06512: at "SYS.DBMS_STATS", line 14135
ORA-06512: at line 1

Of course, I’ll soon find something else that breaks due to my minimum privs before the end of the day, but it’s not easy creating more secure systems {note, I don’t say Secure, just more secure, as in less open!}.

VLDB Backups October 13, 2009

Posted by mwidlake in Architecture, VLDB.
9 comments

One of the indications that your database classes as a VLDB is that your backups are giving you problems simply due to the size of the database.

As some of you will know from previous posts of mine, like this one about how vital it is to prove your backup and this one about how you maybe need to back up more than just the database, I have a thing about backups. Or, more specifically, recovery.

My focus on backups may surprise some people who have worked with me as I often state “I am not a production DBA”. {And no Dave, this is not me saying I am too good to be a proddy DBA, it is me saying I do not have current, strong skills in many of the daily proddy DBA tasks}. However, I am an architect. Whenever I work on a new system or heavily modify an existing system, I try and keep the need for backup and recovery at the front of my mind.

The most common issue encountered with backing up a VLDB is the time it takes to run the backup – it simply can’t be done in the time between backup cycles, usually a day.

The second most common issue is the impact on the live system of running the backup. Sometimes this impact is overstated, after all if the backup is one thread running on the server it is only going to consume as much resource as one thread can, which may well leave enough over for the daily processing requirements, but usually for large systems steps have been taken to run the backup in parallel, thus creating considerable load on the system.

A third issue, which is related to the first, is that the backup takes so long and uses so many tapes (or space) that it rarely completes – a network glitch, a failure of the backup suite, running out of media, all stop the backup finishing. I’ve been in the situation of attempting 4 or 5 backups for each one that succeeds as something crops up in the 2 or 3 days it takes to run the backup. {In our case it was the flaky backup software, grrrrr}.

The final issue I’ll mention is one that is often overlooked. You can’t afford the time to recover the backup if it was ever needed. I’ve seen this especially with export or expdp-based backups – some sites still use export and it has its place with smaller systems – often it seems to be used with OLTP systems that have more than 75% of the database volume as indexes. The export runs fine overnight, it is only processing that 25% of the system that is data. But when you ask the client if they can wait 5 days to import the export they go pale. This time-to-recover can also be a problem with RMAN backups; you need to read in everything you wrote out.

I’ve said it before but I’m going to say it again – a backup is not a backup until you have done a successful test full recovery. This would certainly highlight how long your recovery takes.

So, how do you solve the problem of backing up a VLDB?

Well, one solution is to not bother. I know of a couple of sites that have two physical copies of the database, at different locations, and write all data to both. If they lose one copy, they can keep running on the other copy whilst the lost version is rebuilt. Your swap-over could be almost instant.
Drawbacks here are:

  • If you lose one copy you have no redundancy until the second system is rebuilt. This is like losing a disk out of a RAID5 array, another failure is disaster. As databases get bigger, this period of zero redundancy gets longer and thus the chance of a second failure increases (which again is just like the RAID5 array – yet another argument against massive discs).
  • As you write to both systems, if the damage is caused by the feed (eg accidentally deleting data) then both are damaged, unless you have a delay on one system, in which case you now have issues with catching up on that delay if you have to swap to the second system. Flashback may save you from damage caused by the feed.
  • The cost of the second system and the complexity of the double-writes can both be issues.

Another solution is physical DataGuard. I see this as slightly different from the double-system approach as above as you have more options, such as replicating to more than one other system, opening and reading  the DataGuard copy, opening and using the copy before flashing it back and re-recovering, even Active DataGuard, where you can have the standby database open and being used, even whilst it is kept up-to-date. Again, you can set things up so that the gap between primary system failure and bringing up a new production system is small. A few issues to be mindful of are:

  • You have to ensure that your primary database is running in forced logging mode or you are extremely, and I mean extremely, careful about what you do against the database that is unrecoverable. The latter option is just asking for trouble actually. Which is a shame, as all those performance tricks of doing direct IO, append operations and nologging activities to help you process all the data in your VLDB are no longer available to you. This might be a show-stopper.
  • You have to take care in setting it all up and may need an extra licence.
  • You still have the issue of damage being transmitted to your “backup” before you spot it.
  • The main issue? Someone will get clever and use your DataGuard systems for other things {Such as opening the standby to changing it and then flashing the data back, or use active data guard for reporting which becomes critical to your business} and now you actually have a production critical system split across the DataGuard architecture. It has stopped being a backup, or at least not a dedicated backup. Ooops.

There is actually no need to backup the whole database every night, though some sites seem fixated on achieving this. Or even every week. There is nothing wrong in having an RMAN level 0 {zero} backup that is a copy of everything and then just keep backing up the archived redo logs for eg 2 weeks before doing another level 0. So long as you thoroughly test the recovery and ensure you can recover the level 0, get hold of all those redo logs and apply them in a manner timely enough to support your business. I’ve recovered a level 0 backup over a month old and then run through all the archived redo logs to recreate the system, it worked fine as the volume of redo was pretty small compared to the database. Some considerations with this method are:

  • If you ever have trouble getting files from different days out of your backup area, or occasionally find files from your backup system are corrupt, do not even think of using this method. One missed archived redo file from 13 days back and you are in serious trouble.
  • You need to do those level zero backups and they take a while. Remember what I said about issues during a long backup?
  • It can get complex.
  • There is going to be a fairly significant delay in recovering your system.

There are several options with RMAN of doing incremental and cumulative incremental level 1 backups against a level 0 baseline backup. They have the same pros and cons as above, often trading more complexity with shorter recovery times. All good so long as you practice the recovery.

Physical copy at the storage level. These solutions seem to come and go every few years, but the principle is usually either (a) splitting mirrors – you have eg 3 copies of the data duplicated across the storage; you can un-couple one copy and do to it what you want, like copy it to tape, and then reintroduce the copy and catch up on changes, ie “resilver” the mirror – or (b) use fancy logging within the storage layer to create a logical copy of the whole live DB at a point in time by tracking and storing changes. You can then take your time copying that logical version to your backup destination. Taking the initial copy is usually instantaneous and with (b) can take up a surprisingly small amount of space. Disadvantages?

  • Cost. These clever IO units that can do this seem to be eye-wateringly expensive
  • Tie-in. You move storage provider, you need to re-plan and implement a new backup strategy
  • Probably personal this one, but can you trust it?  I saw it go horribly wrong in about 1998 and even now I kind of wince internally thinking about it. 

Export and Import. OK, I kind of rubbished this approach earlier and who in their right minds would try and export/import a VLDB of 20TB? You don’t. You export the critical few percent of the database that you need to recreate some sort of temporary production-capable system. Many applications can actually get by with all the reference/lookup data and the latest month or two of active business data. It gets a workable system up and running to keep your business process ticking over whilst you sort out recovering the rest of the system. The beauty of an export is that it can be imported to any working Oracle database of a high enough release level.

3 months ago I would have said this consideration needed to have been designed into your system architecture from the start to stand any reasonable chance of working, but I know of one site that managed just this technique recently. Only because they absolutely had to, but they managed it.

My final backup methodology I’m going to mention here is this – you do not need to back up all of your database in the same way. If you can move a large percentage of your database into readonly tablespaces, you can back up that section of the database once {disclaimer, by once I mean two or three times to two or three places, and check you can read what you wrote, and padlock the door to the vault it is in, and repeat said once-only backup every 6-12 months} and drop that section out of your backup. Now you only need to back up the remaining, hopefully small, active section of the database with whatever method you want. You can tie this in with the previous point about only needing to recover a critical subset of the system to get going again, ie what is not readonly – the two approaches complement each other (there is a small sketch of the mechanics after this list). A few issues:

  • It only really works when you design this into the system from the start.
  • Potentially complex recovery spread over a lot of tapes. Can you rely on being able to get at them in a timely manner?
  • People have a habit of wanting to update some of the stuff you made readonly. Sometimes only a few records but spread over a lot of readonly tablespaces.
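
The mechanics themselves are simple enough – SALES_2008 below is a hypothetical tablespace holding static, historic data:

-- make the historic data read only, back it up once, then drop it from the
-- regular backup cycle
alter tablespace sales_2008 read only;

-- what is already read only and so a candidate for the once-only backup?
select tablespace_name, status
from   dba_tablespaces
where  status = 'READ ONLY';

-- and when someone insists on updating "static" data after all...
alter tablespace sales_2008 read write;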

All the above can be mixed and matched to come up with an overall solution. Personally I like having a physical standby database for immediate failover and an incremental backup off site for recovery beyond some situation that gets both primary and standby database.

Friday Philosophy -Do I think Oracle is Rubbish? October 8, 2009

Posted by mwidlake in Blogging, Perceptions.
1 comment so far

This should be a “Friday Philosophy” posting really, but heck it’s my blog, I can do what I want {quiet smile}. Besides, by the time I finish this, it might well BE Friday. Oh, heck, I’ll just change the title now to a Friday Philosophy one…

I’ve been reviewing some of my blog this week {it is coming up to 6 months since I started so I was looking back at how it has gone}. Something struck me, which is I can be pretty negative about Oracle software and even Oracle Corp at times.

I mostly seem to pick up on oddities, things that do not work as they first seem, even outright bugs. I do not often post about “this is how this cool Oracle feature works” or “I used this part of Oracle to solve this problem”. Partly the reason is that there are a lot of blogs and web pages about “how this feature works”, so the need is generally already met. Partly it is that I, like most people, am more interested in exceptions, gotchas and things going wrong. If it works, heck you just need to read the manual, don’t you?

So, do I like Oracle?

Yes. Over all I really like working with Oracle. This is because:

  • I can store and work with pretty much whatever data I have ever needed to with Oracle. It is rare for me to be utterly stumped how to achieve something, though it could take time and maybe be a tad slow or a little inelegant, but it can be done.
  • Despite my recent complaints, you can chuck a hell of a lot of data at Oracle. Back in 2002 I was asked if I could put 7 or 8 Terabytes of data into an Oracle database. I did not even pause before saying “Yes!” – though I knew it would be a big job to do so in a way that was maintainable. I’d now feel the same about a couple of hundred TB.
  • The core technology works really well. We all complain about bits and pieces admittedly, but if I have a complex SQL statement with 15 tables and 25 where clauses, I don’t worry about the database giving me the wrong answer, I worry about the developer having written it wrongly {or Oracle running it slowly, but that keeps me in work, hehe}. I can back up Oracle in many ways and, once I have proven my recovery, I know I can rely on the backup continuing to work, at least from an Oracle perspective. I’ve never yet lost any production data. Do I worry about transactional consistency? Never. Maybe I should, I’ve seen a couple of blogs showing how it can happen, but in my real-work life, I never even think about it.
  • Oracle does continue to improve the core products and they will listen to the community. It might not seem like it at times, I know, but they do. It can just take a long time for things to come through. As an example, I worked with the Oracle InterMedia developers back with the Oracle 10 beta program in 2003. They {well, to be specific, a very clever lady Melli Annamalia} were adding stuff back then that we and others needed that did not get to see the light of day in 10GR1, but was there as  a load of PL/SQL to do it in 10GR2. Melli said she was adding it into the code base as ‘C’ as well but it would take a while. It did, I think it was part of the 11G release.

Will this stop me complaining and whining on about bits of Oracle I don’t like or that do not work as they should? Absolutely not. As Piet de Visser said on a comment to one of my recent blogs, it is beholden on us Users to keep Oracle Corp honest. But I thought I ought to mention, at least once, that I do actually like Oracle.

I Like Oracle, OK?

Grudgingly :-)

Big Discs are Bad September 27, 2009

Posted by mwidlake in development, performance, Uncategorized.
8 comments

I recently came across this article on large discs for databases by Paul Vallee. The article is over 3 years old but is still incredibly valid. It’s a very good description of why big discs are a problem for Oracle Database Performance. {Paul also introduces the BAHD-DB campaign – Battle Against Huge Disks for Databases, which I quite like}.

To summarise the article, and the problem in general, IT managers will buy big discs as they provide more GB per pound sterling. It saves money.
However, fewer discs are Bad For Performance. As an extreme example, you can now buy a single disc that is a TB in size, so you could put a 1TB Oracle database on one such disc. This one disc can only transfer so much data per second and it takes this one disc say 10ms to search for any piece of data. If you want the index entry from one place and the table row from another, that is at least two seeks. This will not be a fast database {and I am not even touching on the consideration of disc resilience}.

Now spread the data over 10 discs. In theory these 10 discs can transfer 10 times the total data volume and one disc can be looking for information while the others are satisfying IO requests {This is a gross over-simplification, but it is the general idea}.

IT Managers will understand this 1-to-10 argument when you go through it.

Kind of.

But then discussions ensue about how many modern “fast” discs are needed to replace the old “slow” discs. It can be very, very hard to get the message through that modern discs are not much faster. A 1TB disc compared to a 4-year-old 100GB disc will not have a transfer speed 10 times faster and it will certainly not have a seek time ten times less; chances are the seek time is the same. And then there are the discussions of how much impact the larger memory caches of modern storage units have. Answer: (a) quite a lot, so long as it is caching what you want, and (b) even if it is perfectly caching what you want, as soon as you have read a cache-sized set of data, you are back to disc IO speed.

Bottom line. Disc drives are now slower in proportion to the disc acreage than they used to be.

Anyway, I am certainly not the only person to have had these discussions, though I have had them for longer than most {due to my accidental work history of having worked on VLDBs for so long}. There are certainly practitioners of Oracle Arts who understand all of this far better than I and one of them, James Morle, recently started blogging. It’s looking good so far. If he keeps it up for a month, I’ll put him on my blog roll :-)

There is, however, one aspect of the Big Disc Performance issue that does not seem to get much mention but is something I have suffered from more than a couple of times.

As a Database Performance person you have had the argument about needing spindles not disc acreage and won. The IT manager buys enough spindles to provide the I/O performance your system needs. Success.

However, the success has left a situation behind. You need 10 spindles over a couple of RAID 10 arrays to give you the IO you need. 250GB discs were the smallest you could buy. So you have 1.25TB of available storage (RAID 10 halves the storage) and have a 500GB database sitting on it. There is 750GB of empty storage there…

That 750GB of empty storage will not be left inviolate. Someone will use it. Someone will need “a bit of temporary storage” and that nice chunk of fast storage will be too inviting. Especially if it IS fast storage. It will be used.

Now your database, whose storage you specified to support said database, is sharing its storage with another app. An app that steals some of your IO and potentially {heck, let’s say this straight, WILL} impact your database performance. And the galling thing? Twice, I had no idea my storage had become shared until I started getting odd IO latency issues on the database.

You may be able to make a logical argument for the spindles you need at design time. But you have almost no chance of protecting those spindles in the future. But who said working life was easy? :-)
