Friday Philosophy – How many Consistent Gets are Too Much? October 30, 2009
Posted by mwidlake in Perceptions, performance. Tags: db block gets, perception, performance
2 comments
One of my good friends, Piet de Visser commented on a recent post that “380 {consistent gets} is too much” per row returned. He is right. He is wrong. Which?
Piet is actually referring to a given scenario I had just described, so he is probably {heck, he is spot on} correct as his comment was made in context – but it set me to thinking about the number of times I have been asked “is the number of consistent gets good or bad” without any real consideration of the context. The person asking the question usually just wanted a black/white good/bad ratio, which is what Piet also mentioned in his comment, a need to discuss such a ratio. I am on holiday in New England with my second bottle of wine, memories of having spent the day eating lobster, kicking through Fall leaves and sitting by a warm fire reading a book, so I am mellow enough to oblige.
Sadly, out of context, no such ratio probably exists. *sigh*. There evaporates the warm glow of the day :-).
The question of “is the number of consistent gets per row good or bad?” is a bit like the question “is the pay rate good enough?”. It really depends on the whole context, but there is probably an upper limit. If I am helping my brother fit {yet another} kitchen then the pay rate is low. He has helped me fit a few, I have helped him fit a few, a couple of pints down the pub is enough and that equates to about 30p an hour. Bog standard production DBA work? 30-60 pounds an hour seems to be the going rate. Project Managing a system move that has gone wrong 3 times already? I need to factor in a long holiday on top of my normal day rate, so probably high hundreds a day. £10,000 a day? I don’t care what it is, I ain’t doing it as it is either illegal, highly unpleasant, both or involves chucking/kicking/hitting a ball around a field and slagging off the ref, and I ain’t good at ball games.
I have a rule of thumb, and I think a rule of thumb is as good as you can manage with such a question as “is {some sort of work activity on the database} per row too much?”. With consistent gets, if the query has less than 5 tables, no group functions and is asking a sensible question {like details of an order, where this lab sample is, who owes me money} then:
- below 10 is good
- 10-100 I can live with, but there may be room for improvement
- above 100 per record, let’s have a look.
Scary “page-of-A4” SQL statement with no group functions?
- 100-1,000 consistent gets per row is fine unless you have a business reason to ask for better performance.
If the query contains GROUP BY or analytic functions, all bets are pretty much off unless you are looking at
- a million consistent gets or 100,000 buffer gets, in which case it is once again time to ask “is this fast enough for the business”.
The million consistent gets or 100,000 buffer gets is currently my break-even “it is probably too much” point, equivalent to the “I ain’t doing it for £10 grand” level. 5 years ago I would have looked quizzically at anything over 200,000 consistent gets or 10,000 buffer gets, but systems get bigger and faster {and I worry I am getting old enough to start becoming unable to ever look a million buffer gets in the eye and not flinch}. Buffer gets at 10% of the consistent gets figure and I have a look. It might be doing a massive full table scan, in which case fair enough; it might be satisfying a simple OLTP query, in which case, what the Hell is broken?
The over-riding factor to all the above ratios though is “is the business suffering an impact as performance of the database is not enough to cope”? If there is a business impact, even if the ratio is 10 consistent gets per row, you have a look.
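If you want to put a rough number on a statement already sitting in the library cache, a quick look at v$sql gives you gets per row {this is just a sketch – the sql_id is made up, and buffer_gets in v$sql covers both consistent and current gets, so treat the figure as a ceiling rather than a pure consistent-gets count}:

select sql_id
      ,executions
      ,buffer_gets
      ,rows_processed
      ,round(buffer_gets/greatest(rows_processed,1)) gets_per_row
from   v$sql
where  sql_id = '7ks4g8mbq2d3x'  -- hypothetical sql_id, substitute your own statement
/

A similar figure drops out of autotrace or a trace file for an individual run, but the library cache version is handy when you are being asked about a statement somebody else ran.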
Something I have learnt to look out for though is DISTINCT. I look at DISTINCT in the same way a medic looks at a patient holding a sheaf of website printouts – with severe apprehension. I had an interesting problem a few years back. “Last week” a query took 5 minutes to come back and did so with 3 rows. The query was tweaked and now it comes back with 4 rows and takes 40 minutes. Why?
I rolled up my mental sleeves and dug in. Consistent gets before the tweak? A couple of million. After the tweak? About a hundred and thirty million or something. The SQL had a DISTINCT clause. Right, let’s remove the DISTINCT. First version came back with 30 or 40 thousand records, the second with a cool couple of million. The code itself was efficient, except it was traversing a classic Chasm Trap in the database design {and if you don’t know what a Chasm Trap is, well that is because Database Design is not taught anymore, HA!}. Enough to say, the code was first finding many thousands of duplicates and now many millions of duplicates.
So, if there is a DISTINCT in the SQL statement, I don’t care how many consistent gets are involved, or buffer gets or elapsed time. I take out that DISTINCT and see what the actual number of records returned is.
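A quick way to see the scale of what the DISTINCT is hiding is to count the rows with and without it. A sketch with an invented query – the tables and join are illustrative, not from the system I described:

-- rows the business actually wants back
select count(*) from (
  select distinct ord.order_id, ord.status
  from   orders      ord
  join   order_lines orl on orl.order_id = ord.order_id
);

-- rows the database is really generating before the DISTINCT collapses them
select count(*) from (
  select ord.order_id, ord.status
  from   orders      ord
  join   order_lines orl on orl.order_id = ord.order_id
);

If the second count is thousands of times the first, the consistent gets are telling you about the duplicates, not about a bad access path.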
Which is a long-winded way of me saying that some factors over-ride even “rule of thumb” rules. So, as a rule of thumb, if a DISTINCT is involved I ignore my other Rules of Thumb. If not, I have a set of Rules of Thumb to guide my level of anxiety over a SQL statement, but all Rules of Thumb are over-ridden by a real business need.
Right, bottle 2 of wine empty, Wife has spotted the nature of my quiet repose, time to log off.
Partitions are Not {just} for Performance October 29, 2009
Posted by mwidlake in Architecture, performance, VLDB. Tags: Architecture, partitions, performance, VLDB
18 comments
There is a myth that Partitions in Oracle magically aid SQL Query performance, even a general assumption that the main role of partitioning is to aid such query performance. This is not so. There are certainly many cases where Partitioning can aid performance but in my experience they just as often hinder rather than help overall performance.
The myth is so strong that when asked at interview, most DBAs {and even some performance specialists} will cite query performance as the main (and occasionally only) benefit of partitioning.
Why can partitions hinder SQL query performance? Let’s say for example that you have a 10 million row table and an index on that table. That index has a B-Level of 3, which means it has a root block, one level of branch nodes, a second layer of branch nodes and then the leaf-node level. (NOTE the below diagram shows B-level as 4 on the left of the diagram. That is because it is taken from a presentation I do where I make the point that if the data dictionary says an index has a B-level of 3, there are 4 levels. 3 “branch”, so B, and then the 4th “Index leaf entries”. The titles on the right indicate this). To access a row in the table via the index Oracle needs to read the root block, two branch blocks and then a leaf block to get the rowid of the record. This allows the table block {and from that the row} to be read. This is depicted in the below diagram, the numbered orange squares represent the blocks as selected in turn from the index and then table:
That is 5 I/O operations to access that row.
Now we partition the table into 5 partitions, 2 million rows each. The index is locally partitioned. If you are lucky, you may, just may, end up with local index partitions with a B-Level one less than the original index, so in our case a B-level of 2. The often-suggested scenario is now that only one partition is considered, and the CBO will read one root block, a branch block, a leaf block and then the block from the table partition.
4 I/Os and a saving of 20% {I’m ignoring for now caching and whether it is physical or logical IO}.
A saving of 20% IF {and only if} you have local index partitions with a lower B-Level than the equivalent non-partitioned index. And the access is to one partition alone.
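You can check whether you really did get that lower B-Level rather than assuming it – the data dictionary holds it for the non-partitioned index and for each local index partition. A sketch {owner, table and index names are invented}:

select index_name, blevel, leaf_blocks
from   dba_indexes
where  owner = 'MDW' and table_name = 'ORDERS';

select partition_name, blevel, leaf_blocks
from   dba_ind_partitions
where  index_owner = 'MDW' and index_name = 'ORDERS_CUST_IDX'
order  by partition_position;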
However, I keep seeing situations where the index look up does not include the partition key. So you get the below situation:
Lacking the partition key, the CBO cannot exclude any partitions – so it has to scan each one. For most partitions, maybe for all but one, no records are found, but each local index partition still has to be checked, at 3 I/O operations each. So in my example 16 I/Os are done to get the one record of interest.
16 I/O operations instead of 5. I numbered them in the diagram.
The situation is often not spotted, at least initially, as the time taken to carry out the extra local index partition scans is “small”, especially for specific lookups. Usually any testing is done on a table with only a small number of partitions.
I remember well the first time I came across this {on an Oracle 9.0 database I think}; there were well over 100 partitions and a process that checked many thousands of individual records had slowed down significantly, taking 4 or 5 times as long as before.
An indicator that the problem is occurring is when a single record lookup against the index and then table is taking perhaps several dozen to several hundred consistent gets rather than the 5 or 6 it should. Also, you will see the partition iterator in the explain plan. In that first case where I came across the issue, consistent gets of about 380 per record fetched were being seen for a select that returned 1 record 99% of the time from a single table lookup. I’ve since seen the same problem on tables with over a thousand partitions, each local index partition being scanned for a total of almost 5,000 consistent gets per record.
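To make the point concrete, here is a cut-down sketch of the situation – none of it is from the original system, the table, index and predicate are all invented:

create table orders
(order_id    number       not null
,customer_id number       not null
,order_date  date         not null
,status      varchar2(10))
partition by range (order_date)
(partition p2009_q1 values less than (to_date('01-APR-2009','DD-MON-YYYY'))
,partition p2009_q2 values less than (to_date('01-JUL-2009','DD-MON-YYYY'))
,partition p2009_q3 values less than (to_date('01-OCT-2009','DD-MON-YYYY'))
,partition p2009_q4 values less than (to_date('01-JAN-2010','DD-MON-YYYY')));

create index orders_cust_idx on orders (customer_id) local;

-- no partition key in the predicate, so every local index partition is probed
select * from orders where customer_id = 12345;

The plan for that last select shows a PARTITION RANGE ALL step over the index range scan, and the consistent gets per row climb roughly in line with the number of partitions.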
You may think that this would be an unusual situation, as access against very large tables is either via full/partial table scans or lookups on the PK or with a WHERE clause including the partitioning key – but it is actually very common. Partitioned tables are being used more and more in large but generally OLTP applications, or sets of records are identified in a data warehouse that are then checked more specifically with generally row-specific logic. With VLDBs, which have many, many partitioned tables with many, many partitions each, the problem is common and often not recognized in the fog of other issues and massive I/O levels.
I’ve only covered a general performance issue with partitions here; I’ll expand on the theme and this simplistic example in the next post.
And yes, there are many ways partitioning CAN aid performance. I aim to get to those too. I really love partitioning.
Accessing Roles in stored PL/SQL October 22, 2009
Posted by mwidlake in development, internals. Tags: data dictionary, PL/SQL, privileges
7 comments
Whilst looking for the minimum privileges I needed to execute DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO {it is ANALYZE ANY, by the way} I discovered something about PL/SQL and roles that I did not know. Now, any right I had to claim expertise in PL/SQL expired at least 8 years ago but I asked some friends who ARE still professional PL/SQL experts and they did not know this either.
Privileges granted via Roles to a user are not available to stored PL/SQL created by that user, correct? This is widely known and understood. You have to grant privileges directly to the user for them to be seen in the PL/SQL packages, functions etc.
Having found that I needed the ANALYZE ANY privilege as I mentioned above, I asked the DBA team to grant my user that privilege on Production and Staging. They did so – via a role. “it won’t work” I said “I run the code via a package, it won’t see the privilege” and proved it by running DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO via a quickly constructed demo. Except I only proved my ignorance, it worked. WHY?
If a package is created such that it is executed with invoker’s rights then roles are seen.
This is my test script:
create or replace package test1 authid current_user is
procedure run_flush;
end test1;
/
--
create or replace package body test1 is
procedure run_flush is
  cursor get_ses_roles is
    select role from session_roles;
begin
  dbms_output.put_line('starting');
  for ses_roles_rec in get_ses_roles loop
    dbms_output.put_line(ses_roles_rec.role);
  end loop;
  dbms_output.put_line('flushing');
  dbms_stats.flush_database_monitoring_info;
  dbms_output.put_line('ending');
end;
begin
  null;
end;
/
I create this package as user MDW.
Now as a privileged user I create a role and grant analyze_any to the role.
MGR>create role mdw_role

Role created.

MGR>grant analyze any to mdw_role;

Grant succeeded.
I’ll just prove that user MDW cannot yet execute the monitoring procedure
MDW> exec dbms_stats.flush_database_monitoring_info
BEGIN dbms_stats.flush_database_monitoring_info; END;

*
ERROR at line 1:
ORA-20000: Insufficient privileges
ORA-06512: at "SYS.DBMS_STATS", line 2148
ORA-06512: at "SYS.DBMS_STATS", line 14135
ORA-06512: at line 1
Now I grant the role
MGR>grant mdw_role to mdw

Grant succeeded.
MDW has to log out and back in again to see the role correctly. Having done this I check for the role and then try to execute the test procedure:
MDW> select * from session_roles

ROLE
------------------------------
CONNECT
MDW_ROLE
2 rows selected.

MDW> exec test1.run_flush
starting
CONNECT
MDW_ROLE
flushing
ending

PL/SQL procedure successfully completed.
You can see that the package sees the roles and it executes the procedure successfully. So, stored PL/SQL can utilise privileges granted via roles if the package is created with AUTHID CURRENT_USER, ie invoker’s rights.
I had better admit, as someone else might raise it, that this is not the best demonstration of this feature. I recreated the package with the first line set to:
create or replace package test1 is
ie the default of definer’s {owner’s} rights. I now re-execute the call to the package:-
MDW> exec test1.run_flush
starting
flushing
ending

PL/SQL procedure successfully completed.
Note that the roles are no longer seen. However, the DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO still executed correctly.
Why?
Well, because if you look at the package specification of DBMS_STATS you see:-
create or replace package sys.dbms_stats authid current_user is
It makes sense. It is dangerous for system-owned packages to be executing as the owner, ie SYS, as execute access to the package would allow access to SYS privileges.
Which, of course, is why my little proof script executed the flush correctly and I looked a bit silly in front of the DBA team. Oh well.
When do We Learn #2 October 20, 2009
Posted by mwidlake in Blogging, Perceptions. Tags: knowledge
4 comments
I exchanged a couple of mails with a friend a few weeks back about how the same topic can arise in a couple of blogs at the same time. Well, I had just blogged myself on when we learn and, blow me over with a feather, Jonathan Lewis goes and posts in a similar vein. He must be nicking ideas off my blog 🙂 {and yes, I am being tongue-in-cheek here}. We both posted thoughts about needing spare capacity in your life to be able to spend the time to really understand how something works. Yes you learn a lot in the heat of a crisis, but you rarely really understand the details, ie become an expert, without having time to digest and qualify that knowledge.
I did write a long comment on his posting, including some links back to my own meandering thoughts on the topic, then realised that I would come across as a bit “me too” so I trimmed it and took out the links. But that is part of why I do my own blog; I found I was spamming other people’s pages with my diatribes and so decided to spam my own. {And I know I am meandering, I’m a bit sleep-deprived, stream of consciousness at the moment}. So here I can refer back to my own stuff and say “me too”, but you are already here reading this, so you only have yourself to blame :-)… Anyway, I wanted to refer back to a very early blog of mine about how much knowledge is enough. I try and make the point that you do not need to know everything; you can become a small-field or local expert just by being willing to learn a bit more.
Jonathan raises the point that he does not have a full time commitment to one client and so he has the luxury to investigate the details and oddities of what he looks into. He suggests this is a large part of why he is an expert, which I feel is true, and I am very happy to see one of the Oracle Names acknowledging that relative freedom from other pressures is key to having the luxury to chase down the details. Those of us in a full time role doing eg DBA, development or design work have more than enough on our workday plates to keep us too busy. We cannot be top experts, we have a boss to satisfy and a role to fulfill. {Jonathan does not mention that choosing a career where you have the luxury of time is also a pretty brave choice – you stand a good chance of earning a lot, lot less whilst working very hard to establish enough of a reputation to be able to earn enough to feed yourself and the cat}.
But this is not a black and white situation. There is room for many of us to become experts in our domain or in our locality. Our breadth of knowledge may never be as wide as others’, we may not know more than anyone else in a given area {and let’s face it, logically there can only be one person who knows the most about a given topic, and that one person is probably in denial about their superiority, which seems to be a defining quality of an expert – it is not so much humility I think as an acknowledgement of there being more to know and a desire to know it}. However, most of us can become the person in our organisation who knows most about X, or who can tie A, B and C together in a more holistic way than others (and that can be a real trick you know). There are always the top experts that you can call on for the worst problems, but you could become the person people come to first.
My advice would be to not try and learn everything about all aspects of Oracle, because you can’t, but rather learn a lot about one or two areas {and consider areas that are more unusual, not just “tuning SQL” or “the CBO”} and expand just your general knowledge of the wider field. And never forget that there is more to learn. So long as you are taking in more knowledge and understanding, you are improving. The best way to do it? Don’t just read other people’s stuff, try teaching someone else. It never ceases to amaze me how stupid I realise I am when I try and show someone else how something works. But that’s OK, so long as they learn it’s fine. If I learn as well, it’s great, and I nearly always do.
I’m getting on a bit; I think I am finally getting the hang of the idea that the more you know, the more you realise you don’t know. I wish I had known that when I knew nothing.
Privileges required to FLUSH_DATABASE_MONITORING_INFO October 19, 2009
Posted by mwidlake in development, performance. Tags: data dictionary, privileges, statistics, system development
1 comment so far
I’m doing some work at the moment on gathering object statistics and it helps me a lot to have access to the number of changed records in SYS.DBA_TAB_MODIFICATIONS. To ensure you have the latest information in this table, you need to first flush any data out of memory with DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO. For the live system, the DBAs rather understandably {and sensibly} want all users to run with the least access privileges they need, so granting DBA role to my user is out.
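For context, the sort of check I am running is along these lines {just a sketch – the schema name is invented}:

exec dbms_stats.flush_database_monitoring_info

select table_owner, table_name, inserts, updates, deletes, truncated, timestamp
from   sys.dba_tab_modifications
where  table_owner = 'DWPERF_DATA'   -- hypothetical schema
order  by timestamp desc;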
I googled for the actual system privilege or privileges needed to flush_database_monitoring_info and drew a blank, so I have had to find out for myself. And being a nice guy {who am I kidding}, I am now recording the info for anyone else who is interested to find:
On 10.2.0.3, to execute DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO you need the ANALYZE ANY system privilege.
{Not ANALYZE ANY DICTIONARY, which would make more sense to me}
For those who like such things, here is the proof. I had to use two sessions, thus the constant displaying of system time.
-- current user privs
DWPERFDEV1> @usr_privs
enter user whos privs you wish to see> dwperf

GRANTEE      TYPE PRIVILEGE                      adm
----------------------------------------------------
DWPERF       SYSP CREATE JOB                     NO
DWPERF       SYSP CREATE PROCEDURE               NO
DWPERF       SYSP CREATE PUBLIC SYNONYM          NO
DWPERF       SYSP CREATE SESSION                 NO
DWPERF       SYSP CREATE SYNONYM                 NO
DWPERF       SYSP CREATE TABLE                   NO
DWPERF       SYSP CREATE TRIGGER                 NO
DWPERF       SYSP DEBUG CONNECT SESSION          NO
DWPERF       SYSP DROP PUBLIC SYNONYM            NO
DWPERF       SYSP EXECUTE ANY PROCEDURE          NO
DWPERF       SYSP SELECT ANY DICTIONARY          NO
DWPERF       SYSP SELECT ANY TABLE               NO
DWPERF       ROLE CONNECT                        NO
DWPERF       OBJP SYS.DBMS_UTILITY-EXECUTE       NO
DWPERF_ROLE  SYSP ANALYZE ANY                    NO

DWPERFDEV1> @showtime
Date                 Time
---------------------------------
19-OCT-2009          13:29:16

DWPERFDEV1> exec dbms_stats.flush_database_monitoring_info
BEGIN dbms_stats.flush_database_monitoring_info; END;

*
ERROR at line 1:
ORA-20000: Insufficient privileges
ORA-06512: at "SYS.DBMS_STATS", line 2148
ORA-06512: at "SYS.DBMS_STATS", line 14135
ORA-06512: at line 1

DEV1> @showtime
Date                 Time
---------------------------------
19-OCT-2009          13:29:30

DEV1> grant analyze any dictionary to dwperf;

Grant succeeded.

DWPERFDEV1> @showtime
Date                 Time
---------------------------------
19-OCT-2009          13:29:40

DWPERFDEV1> exec dbms_stats.flush_database_monitoring_info
BEGIN dbms_stats.flush_database_monitoring_info; END;

*
ERROR at line 1:
ORA-20000: Insufficient privileges
ORA-06512: at "SYS.DBMS_STATS", line 2148
ORA-06512: at "SYS.DBMS_STATS", line 14135
ORA-06512: at line 1

DEV1> @showtime
Date                 Time
---------------------------------
19-OCT-2009          13:30:46

DEV1> grant analyze any to dwperf;

Grant succeeded.

DWPERFDEV1> @showtime
Date                 Time
---------------------------------
19-OCT-2009          13:31:20

DWPERFDEV1> exec dbms_stats.flush_database_monitoring_info

PL/SQL procedure successfully completed.
-- SUCCESS!

DEV1> @showtime
Date                 Time
---------------------------------
19-OCT-2009          13:31:38

DEV1> revoke analyze any from dwperf

Revoke succeeded.

DWPERFDEV1> @showtime
Date                 Time
---------------------------------
19-OCT-2009          13:31:57

DWPERFDEV1> exec dbms_stats.flush_database_monitoring_info
BEGIN dbms_stats.flush_database_monitoring_info; END;

*
ERROR at line 1:
ORA-20000: Insufficient privileges
ORA-06512: at "SYS.DBMS_STATS", line 2148
ORA-06512: at "SYS.DBMS_STATS", line 14135
ORA-06512: at line 1
Of course, I’ll soon find something else that breaks due to my minimum privs before the end of the day, but it’s not easy creating more secure systems {note, I don’t say Secure, just more secure, as in less open!}.
Friday Philosophy – when do we learn? October 17, 2009
Posted by mwidlake in Perceptions, Private Life. Tags: behaviour, private
5 comments
I’ve had a theory for a while that there are two times when we learn:
- When we are under extreme duress
- When we are under no duress at all
I think all technicians would agree with the former. We learn a lot when something very important needs doing urgently, like getting the database back up or finding out why the application has suddenly gone wrong {Hint, very often the answer is to find What Changed}. Another example is when a decision has been made to implement something a manager has seen a nice sales presentation on and they really like the look of it. We technicians have to make it actually work {and I admit to once or twice having been the Manager in this situation :-). I apologise to my people from back then}.
I’ve also believed for a while that the other time you learn, or at least can learn, is when things are unusually quiet. When work is just at its normal hectic pace, it’s hard to spend the extra effort on reading manuals, trying things out and checking out some of those technical blogs. You spend all your spare effort on The Rest Of Your Life. You know, friends, partners, children, the cat.
So I think you need some slack time to learn and that is when the most complete learning is done. Yes, you learn a lot when the pressure is on, but you are generally learning “how to get the damned problem resolved” and probably not exactly why the problem occurred; did you fix the problem or just cover it over? Did you implement that new feature your boss’s boss wanted in the best way, or in a way that just about works? You need the slack time to sort out the details.
When do we get slack time? Weekends and holidays. How many of us have snuck the odd technical book or two into our luggage when going on holiday? {And how many of us have had that look from our partners when they find out?}.
Well, at the end of this week I am going on two and a half weeks holiday, over to New England in the US. A few days in Boston, up through Maine, across to Mount Washington to a little hotel where we had possibly the best meal of our lives, down to Mystic and then over to Washington to see some friends.
I am not taking any manuals. I am not taking any technical books. I am not taking a laptop with Oracle on it. I am not even likely to blog for the duration. Why? I have not been as mentally and physically shattered as I am now since I finished my degree 20 years ago. I just want to switch off for a while.
So I am revising my theory of when we learn. I now think we learn when:
- When we are under extreme duress {that just does not change}
- When we have spare mental capacity and the drive to use it.
Right now, I think I have the mental capacity of a drunk squirrel. So from the end of next week, I’m going to sleep, read sci-fi, eat and drink well and maybe do a bit of culture. The computers and the learning can wait for a little while.
VLDB Backups October 13, 2009
Posted by mwidlake in Architecture, VLDB. Tags: Architecture, backup, system development, VLDB
9 comments
One of the indications that your database classes as a VLDB is that your backups are giving you problems simply due to the size of the database.
As some of you will know from previous posts of mine, like this one about how vital it is to prove your backup and this one about how you maybe need to back up more than just the database, I have a thing about backups. Or, more specifically, recovery.
My focus on backups may surprise some people who have worked with me as I often state “I am not a production DBA”. {And no Dave, this is not me saying I am too good to be a proddy DBA, it is me saying I do not have current, strong skills in many of the daily proddy DBA tasks}. However, I am an architect. Whenever I work on a new system or heavily modify an existing system, I try and keep the need for backup and recovery at the front of my mind.
The most common issue encountered with backing up a VLDB is the time it takes to run the backup: it can’t be done in the time between backup cycles, usually a day.
The second most common issue is the impact on the live system of running the backup. Sometimes this impact is overstated: after all, if the backup is one thread running on the server it is only going to consume as much resource as one thread can, which may well leave enough over for the daily processing requirements. But usually for large systems steps have been taken to run the backup in parallel, thus creating considerable load on the system.
A third issue, which is related to the first, is that the backup takes so long and uses so many tapes (or space) that it rarely completes – a network glitch, a failure of the backup suite, running out of media, all stop the backup finishing. I’ve been in the situation of attempting 4 or 5 backups for each one that succeeds as something crops up in the 2 or 3 days it takes to run the backup. {In our case it was the flaky backup software, grrrrr}.
The final issue I’ll mention is one that is often overlooked. You can’t afford the time to recover the backup if it was ever needed. I’ve seen this especially with export or expdp-based backups – some sites still use export and it has its place with smaller systems – often it seems to be used with OLTP systems that have more than 75% of the database volume as indexes. The export runs fine overnight, it is only processing that 25% of the system that is data. But when you ask the client if they can wait 5 days to import the export they go pale. This time-to-recover can also be a problem with RMAN backups; you need to read in everything you wrote out.
I’ve said it before but I’m going to say it again – a backup is not a backup until you have done a successful test full recovery. This would certainly highlight how long your recovery takes.
So, how do you solve the problem of backing up a VLDB?
Well, one solution is to not bother. I know of a couple of sites that have two physical copies of the database, at different locations, and write all data to both. If they lose one copy, they can keep running on the other copy whilst the lost version is rebuilt. Your swap-over could be almost instant.
Drawbacks here are:
- If you lose one copy you have no redundancy until the second system is rebuilt. This is like losing a disk out of a RAID5 array, another failure is disaster. As databases get bigger, this period of zero redundancy gets longer and thus the chance of a second failure increases (which again is just like the RAID5 array – yet another argument against massive discs).
- As you write to both systems, if the damage is caused by the feed (eg accidentally deleting data) then both are damaged, unless you have a delay on one system, in which case you now have issues with catching up on that delay if you have to swap to the second system. Flashback may save you from damage caused by the feed.
- The cost of the second system and the complexity of the double-writes can both be issues.
Another solution is physical DataGuard. I see this as slightly different from the double-system approach as above as you have more options, such as replicating to more than one other system, opening and reading the DataGuard copy, opening and using the copy before flashing it back and re-recovering, even Active DataGuard, where you can have the standby database open and being used, even whilst it is kept up-to-date. Again, you can set things up so that the gap between primary system failure and bringing up a new production system is small. A few issues to be mindful of are:
- You have to ensure that your primary database is running in forced logging mode, or else you have to be extremely, and I mean extremely, careful that nothing you do against the database is unrecoverable. The latter option is just asking for trouble actually. Which is a shame, as all those performance tricks of doing direct IO, append operations and nologging activities to help you process all the data in your VLDB are no longer available to you. This might be a show-stopper. {A quick check for forced logging is sketched after this list.}
- You have to take care in setting it all up and may need an extra licence.
- You still have the issue of damage being transmitted to your “backup” before you spot it.
- The main issue? Someone will get clever and use your DataGuard systems for other things {such as opening the standby, changing it and then flashing the data back, or using Active DataGuard for reporting which becomes critical to your business} and now you actually have a production-critical system split across the DataGuard architecture. It has stopped being a backup, or at least not a dedicated backup. Ooops.
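For the forced logging point, the check and the fix are trivial – it is the performance consequences that need the thought. A sketch {assuming you have the ALTER DATABASE privilege}:

select force_logging from v$database;

alter database force logging;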
There is actually no need to back up the whole database every night, though some sites seem fixated on achieving this. Or even every week. There is nothing wrong with having an RMAN level 0 {zero} backup that is a copy of everything and then just backing up the archived redo logs for, eg, 2 weeks before doing another level 0 – so long as you thoroughly test the recovery and ensure you can restore the level 0, get hold of all those redo logs and apply them in a manner timely enough to support your business. I’ve recovered a level 0 backup over a month old and then run through all the archived redo logs to recreate the system; it worked fine as the volume of redo was pretty small compared to the database. Some considerations with this method are:
- If you ever have trouble getting files from different days out of your backup area, or occasionally find files from your backup system are corrupt, do not even think of using this method. One missed archive redo file from 13 days back and you are in serious trouble.
- You need to do those level zero backups and they take a while. Remember what I said about issues during a long backup?
- It can get complex.
- There is going to be a fairly significant delay in recovering your system.
There are several options with RMAN for doing incremental and cumulative incremental level 1 backups against a level 0 baseline backup. They have the same pros and cons as above, often trading more complexity for shorter recovery times. All good so long as you practice the recovery.
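As a rough illustration of the level 0 plus incremental approach in RMAN {a sketch only – the tags are invented, and your channel, catalog and retention configuration will dictate the real details}:

# weekly (or monthly) baseline
BACKUP INCREMENTAL LEVEL 0 DATABASE TAG 'LVL0_BASE';

# nightly - much smaller and quicker
BACKUP INCREMENTAL LEVEL 1 CUMULATIVE DATABASE TAG 'LVL1_CUM';
BACKUP ARCHIVELOG ALL NOT BACKED UP 1 TIMES;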
Physical copy at the storage level. These solutions seem to come and go every few years, but the principle is usually either (a) splitting mirrors – you have eg 3 copies of the data duplicated across the storage, you can un-couple one copy and do to it what you want, like copy it to tape – and then reintroduce the copy and catch up on changes, ie “resilver” the mirror, or (b) use fancy logging within the storage layer to create a logical copy of the whole live DB at a point in time by tracking and storing changes. You can then take your time copying that logical version to your backup destination. Taking the initial copy is usually instantaneous and with (b) can take up a surprisingly small amount of space. Disadvantages?
- Cost. These clever IO units that can do this seem to be eye-wateringly expensive
- Tie-in. You move storage provider, you need to re-plan and implement a new backup strategy
- Probably personal this one, but can you trust it? I saw it go horribly wrong in about 1998 and even now I kind of wince internally thinking about it.
Export and Import. OK, I kind of rubbished this approach earlier and who in their right minds would try and export/import a VLDB of 20TB? You don’t. You export the critical few percent of the database that you need to recreate some sort of temporary production-capable system. Many applications can actually get by with all the reference/lookup data and the latest month or two of active business data. It gets a workable system up and running to keep your business process ticking over whilst you sort out recovering the rest of the system. The beauty of an export is that it can be imported to any working Oracle database of a high enough release level.
3 months ago I would have said this consideration needed to have been designed into your system architecture from the start to stand any reasonable chance of working, but I know of one site that managed just this technique recently. Only because they absolutely had to, but they managed it.
My final backup methodology I’m going to mention here is – you do not need to back up all of your database in the same way. If you can move a large percentage of your database into readonly tablespaces, you can back up that section of the database once {disclaimer: by once I mean two or three times to two or three places, and check you can read what you wrote and padlock the door to the vault it is in, and repeat said once-only backup every 6-12 months} and drop that section out of your backup. Now you only need to back up the remaining, hopefully small, active section of the database with whatever method you want. You can tie this in with the previous approach of only needing to recover a critical subset of the system to get going again, ie what is not readonly; the two approaches complement each other {a short sketch follows the list of issues below}. A few issues:
- It only really works when you design this into the system from the start.
- potentially complex recovery spread over a lot of tapes. Can you rely on being able to get at them in a timely manner?
- People have a habit of wanting to update some of the stuff you made readonly. Sometimes only a few records but spread over a lot of readonly tablespaces.
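The readonly split itself is just plain DDL – a sketch, with invented tablespace names:

alter tablespace orders_2007 read only;
alter tablespace orders_2008 read only;

Back those tablespaces up two or three times to two or three places, verify you can read what you wrote, and from then on only the read-write tablespaces need to go through the regular backup cycle {RMAN backup optimization can be configured to skip files it has already backed up sufficiently, but test the behaviour on your version before relying on it}.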
All the above can be mixed and matched to come up with an overall solution. Personally I like having a physical standby database for immediate failover and an incremental backup off site for recovery beyond some situation that gets both primary and standby database.
Keeping the server and storage utilisation high October 9, 2009
Posted by mwidlake in performance. Tags: Humour, performance
6 comments
A friend of mine sent me this today, from an old site of mine:
******************************************
From: John Smith
Sent: 07 October 2009 10:08
To: Sarah Sims
Cc: DBA team
Subject: RE: Performance issues on your Test servers
Hi Sarah,
Please could somebody tell us why this query is running repeatedly on your database:
SELECT composite_key , exact_time , object_type , table_name , user_id , xml_data FROM usr1234.acramendlogshd ORDER BY 2;
It’s very prominent in both of the 1-hour time slices I’ve analyzed so far, and is fetching the entire 12GB 20M-row table. This is so absurd that it looks almost like a programming error!
The same table in the production environment is almost the same size, 18M rows.
Regards,
John
******************************************
******************************************
From: Sarah Sims
Sent: 07 October 2009 11:18
To: John Smith
Cc: DBA team
Subject: RE: Performance issues on your Test servers
John,
This would be the ACRALS service which has a bug in it currently which means that it runs continuously but never achieves anything.
Sarah
******************************************
So, not only inefficient and pointless but known to be inefficient and pointless. And still on.
And as some of you may suspect, yep this is a third party application where changing the code is forbidden. Seems like testing before release might also be forbidden….
Friday Philosophy -Do I think Oracle is Rubbish? October 8, 2009
Posted by mwidlake in Blogging, Perceptions. Tags: system development
1 comment so far
This should be a “Friday Philosophy” posting really, but heck it’s my blog, I can do what I want {quiet smile}. Besides, by the time I finish this, it might well BE Friday. Oh, heck, I’ll just change the title now to a Friday Philosophy one…
I’ve been reviewing some of my blog this week {it is coming up to 6 months since I started so I was looking back at how it has gone}. Something struck me, which is I can be pretty negative about Oracle software and even Oracle Corp at times.
I mostly seem to pick up on oddities, things that do not work as they first seem, even outright bugs. I do not often post about “this is how this cool Oracle feature works” or “I used this part of Oracle to solve this problem”. Partly the reason is that there are a lot of blogs and web pages about “how this feature works”, so the need is generally already met. Partly it is that I, like most people, am more interested in exceptions, gotchas and things going wrong. If it works, heck you just need to read the manual don’t you?
So, do I like Oracle?
Yes. Over all I really like working with Oracle. This is because:
- I can store and work with pretty much whatever data I have ever needed to with Oracle. It is rare for me to be utterly stumped as to how to achieve something; it could take time and maybe be a tad slow or a little inelegant, but it can be done.
- Despite my recent complaints, you can chuck a hell of a lot of data at Oracle. Back in 2002 I was asked if I could put 7 or 8 Terabytes of data into an Oracle database. I did not even pause before saying “Yes!” – though I knew it would be a big job to do so in a way that was maintainable. I’d now feel the same about a couple of hundred TB.
- The core technology works really well. We all complain about bits and pieces admittedly, but if I have a complex SQL statement with 15 tables and 25 where clauses, I don’t worry about the database giving me the wrong answer, I worry about the developer having written it wrongly {or Oracle running it slowly, but that keeps me in work, hehe}. I can back up Oracle in many ways and, once I have proven my recovery, I know I can rely on the backup continuing to work, at least from an Oracle perspective. I’ve never yet lost any production data. Do I worry about transactional consistency? Never. Maybe I should, I’ve seen a couple of blogs showing how it can happen, but in my real-work life, I never even think about it.
- Oracle does continue to improve the core products and they will listen to the community. It might not seem like it at times, I know, but they do. It can just take a long time for things to come through. As an example, I worked with the Oracle InterMedia developers back with the Oracle 10 beta program in 2003. They {well, to be specific, a very clever lady Melli Annamalai} were adding stuff back then that we and others needed that did not get to see the light of day in 10GR1, but was there as a load of PL/SQL to do it in 10GR2. Melli said she was adding it into the code base as ‘C’ as well but it would take a while. It did, I think it was part of the 11G release.
Will this stop me complaining and whining on about bits of Oracle I don’t like or that do not work as they should? Absolutely not. As Piet de Visser said on a comment to one of my recent blogs, it is beholden on us Users to keep Oracle Corp honest. But I thought I ought to mention, at least once, that I do actually like Oracle.
I Like Oracle, OK?
Grudgingly 🙂
Describing tables you can’t DESC October 7, 2009
Posted by mwidlake in internals. Tags: data dictionary, SQL
3 comments
This is more an oddity than anything particularly useful. Sometimes you can’t use the sql*plus DESCRIBE {DESC} command on tables – but you might have an alternative.
I’m doing a lot of work for a client on a 10.2.0.3 database. I have SELECT ANY DICTIONARY but not SELECT ANY TABLE privilege. This is because there is sensitive data in the database and it is duly protected {and this is certainly not the first time I have worked for a client with full dictionary access but not data access, it’s becoming normal}. I’m granted access to specific things as needs arise.
I knew I had to look at a table called AD_FAKE_STATS_COLUMNS.
select count(*) from mdw.AD_FAKE_STATS_COLUMNS
/
select count(*) from mdw.AD_FAKE_STATS_COLUMNS
                         *
ERROR at line 1:
ORA-00942: table or view does not exist
Have I got the table name wrong?
select table_name,owner from dba_tables
where table_name = 'AD_FAKE_STATS_COLUMNS'

TABLE_NAME                     OWNER
------------------------------ -----------------
AD_FAKE_STATS_COLUMNS          MDW
OK, it is there, I don’t have access to it. Fine, I don’t want access to the data, I just want to see the structure of the table:
desc mdw.AD_FAKE_STATS_COLUMNS
ERROR:
ORA-04043: object mdw.AD_FAKE_STATS_COLUMNS does not exist
Oh. DESC in sql*plus does not work.
I can’t DESC a table I do not have access to carry out DML on. I’m going to have to go and ask someone to give me permission to see the table. How annoying.
Or do I?
@tab_desc
Enter value for tab_name: AD_FAKE_STATS_COLUMNS
old  21: where table_name like upper (nvl('&TAB_NAME','W')||'%')
new  21: where table_name like upper (nvl('AD_FAKE_STATS_COLUMNS','W')||'%')

TAB_OWN    TAB_NAME               COL_NAME                    M COL_DEF
---------- ---------------------- --------------------------- - --------------
MDW        AD_FAKE_STATS_COLUMNS  TABLE_NAME                  Y VARCHAR2(30)
                                  COLUMN_NAME                 Y VARCHAR2(30)
                                  COPY_STATS_FROM             N VARCHAR2(61)
                                  LOW_VALUE_SQL               N VARCHAR2(100)
                                  HIGH_VALUE_SQL              N VARCHAR2(100)
                                  DISTINCT_SQL                N VARCHAR2(100)
                                  DAYS_HIST_NULL_AVGLEN       N NUMBER(3,0)
🙂
I have access to the data dictionary. So I can see the structure of the table, which after all is what I wanted and is what the client is happy for me to have. {I’ve never much liked the DESC command in sql*plus, I replaced it with a little sql script against the data dictionary years ago}.
In case you want it, here is the script:
-- tab_desc.sql
-- Martin Widlake date? way back in the mists of time
-- my own replacement for desc.
-- 16/11/01 improved the data_type section
SET PAUSE ON
SET PAUSE 'Any Key...>'
SET PAGES 24
col Tab_own form A10
col tab_name form a22 wrap
col col_name form a28 wrap
col col_def form A14
break on tab_own skip 1 on tab_name skip 1
spool tab_desc.lst
select owner                        Tab_Own
      ,table_name                   Tab_Name
      ,column_name                  Col_Name
      ,decode(NULLABLE,'Y','N','Y') Mand
      ,data_type||decode(data_type
                 ,'NUMBER','('
                    ||decode(to_char(data_precision)
                            ,null,'38'
                            ,     to_char(data_precision)||
                                  decode(data_scale,null,''
                                        ,','||data_scale)
                            )
                    ||')'
                 ,'DATE',null
                 ,'LONG',null
                 ,'LONG RAW',null
                 ,'('||Substr(DATA_LENGTH,1,5)||')'
                 )                  col_def
from dba_tab_columns
where table_name like upper (nvl('&TAB_NAME','WHOOPS')||'%')
order by 1,2,column_id,3,4
/
spool off
clear col
--