Friday Philosophy – Friends who Know Stuff August 29, 2009

Posted by mwidlake in Uncategorized.
Tags: ,
2 comments

Being a consultant/contractor can be hard work. You are expected to turn up on-site and become useful before the second cup of coffee has been drunk. Then, if you manage that, there is an expectation that you will continue to pull IT Rabbits out of computer Hats for as long as you are there. Which is actually reasonable, we are after all usually paid well for being consultants/contractors.

This need to always have an answer can become quite hard, especially as those of us who chose not to be permanent staff have a bit of a disadvantage, one which those who have normal jobs may not appreciate.

Permies have a group around them called “the team” who they can call upon and talk to about issues. Permies tend to stay in an organisation for many years and build up strong contacts with people, whom they can call upon years after they have moved on. For us nomads, it can be far harder to make strong links to people you can call upon. That is not to say most teams are unfriendly when you go on site; it is just that when you start off as a temporary member of the group and move on after a year, six months, or even a week or two, you are less likely to develop ties strong enough that you feel able to badger someone two years later.

Don’t underestimate the benefit of being able to call on old friends to get a second opinion (or of being called yourself to help some old friend who has got to get to grips with some section of IT knowledge that you had to deal with for 2 years). It really helps.

Some of you are probably thinking “well, you Consultant types just ask each other, the other experts that you all know”. Well, sometimes, but we tend not to work with each other much. Contractors rarely get to work together repeatedly on different projects unless a client happens to need someone with X skills just when you have a friend with X skills who is looking for a new position.

This is something I have become very aware of, having gone from contractor for half a decade, to permie for 6 years, and back again to contractor/consultant. I miss having a stable team of colleagues to discuss things with.

So, Friends with Skills are important. And it is a two-way thing: if you expect to be able to call on some old colleagues for help, then you need to be helpful when they call on you.

Is this a case of “who you know not what you know”? Yes and no. It is not about contacts getting you a leg up. It’s about developing and keeping a group of work-related friends where you all help each other when there is a need. Proper friendship is about sharing, not using.

RAC GV$SQL type Bugs August 28, 2009

Posted by mwidlake in internals.
3 comments

This week, I have mostly been hitting obscure bugs with Oracle RAC and GV$SQL/GV$SQL_PLAN.
ORA-00600 with GV$SQL, plus ORA-12805 with GV$SQL_PLAN and ORA-03113 with V$SQL_PLAN. I’ve listed them in the hope that anyone else hitting them will pick up this blog entry (they are obscure but, even so, there is not a lot “out there” about them).

I’m basically looking at getting a handle on overall Oracle SQL performance variation. This is driven by two areas of consideration:

  • SQL suddenly going rogue (i.e. a good plan swapping to a very bad plan)
  • The system-wide impact of changes, like gathering system statistics or changing initialisation parameters.

In an ideal world you would have a fully developed test system to try out the latter, and your monitoring tools (like OEM/AWR or Toad or whatever you set up) would pick up the former. Back in reality, you can find yourself lacking both. I have had the idea for a looooooong time to set up something lightweight that checks the V$SQL… views directly to pick these up, and now I am implementing something to do this. And finding out some of the pains and aggravations in doing so…
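
For what it is worth, the sort of thing I mean is sketched below. It is not my actual code {the table name AD_SQL_SNAPSHOT is made up for this example}, just the principle: snapshot the headline columns from GV$SQL and compare PLAN_HASH_VALUE between snapshots to spot a plan swap.

-- snapshot the headline figures (explicit column list, no SQL_FULLTEXT - see the first bug below)
create table ad_sql_snapshot as
select sysdate          snap_date
      ,inst_id ,sql_id ,plan_hash_value
      ,executions ,buffer_gets ,disk_reads ,elapsed_time
from   gv$sql
where  executions > 0;

-- repeat the insert on a schedule, then look for statements whose plan has changed
select sql_id ,count(distinct plan_hash_value) plans
from   ad_sql_snapshot
group  by sql_id
having count(distinct plan_hash_value) > 1;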

Enough meandering; these are some bugs some other poor sods are going to hit and these are the answers I have found. They may not be the best answers but they are all I have to offer :-).

Select count(*) from GV$SQL or select count(*) from GV$SQLAREA gives an ORA-00600 error and the usual trc file to wonder over. Happens in 10.2.0.2 to 10.2.0.3 on any platform and is fixed in 10.2.0.4. No backport I am aware of. Check out doc ID 357016.1 (sorry if this is wrong but I am having major cut n paste issues over crap Virgin Wireless Broadband services, amongst other crap Virgin Wireless Broadband issues). Inclusion of the SQL_FULLTEXT (CLOB) column is suggested as the issue.

select * from gv$sql on a RAC environment
ORA-00600: internal error code, arguments: [kolaslGetLength-1], [], [], [], [], [], [], [].
Doc ID: 357016.1 Type: PROBLEM
Modified Date : 10-SEP-2008 Status: PUBLISHED

Our solution was to select the exact columns we wanted and this excluded the SQL_FULLTEXT column. The error went away. I did not pursue it any further as it is a known but generally “swept under the Oracle Carpet” error. Heck, it only impacts those nutters using RAC… plus a multibyte character set, like AL32UTF8.
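
If it helps, the workaround select looks something like this {the column list is trimmed for the example; pick whichever columns you actually need, just leave the CLOB out}:

select inst_id
      ,sql_id
      ,plan_hash_value
      ,executions
      ,buffer_gets
      ,disk_reads
      ,elapsed_time
      ,sql_text        -- the VARCHAR2 first chunk of the statement is fine; it is SQL_FULLTEXT (the CLOB) that hurts
from   gv$sql;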

Next up, selecting * or a set of columns from GV$SQL_PLAN on linux on 10.2.0.3 gives an ORA-12805 error.

ORA-12805: parallel query server died unexpectedly
ORA-06512: at “MWPERF.PKG_XXX_MONITOR”, line 311
ORA-06512: at line 1

Now, we really did NOT expect that as we had turned off parallel processing across the DB but I figure if you query against a GV$ view it is going to run across all nodes and compile the results, so parallel is forced. Well, on 10.2.0.3 on HP hardware/Redhat linux it barfs.

I reproduced the error by pulling the insert statement driven off GV$SQL_PLAN out into a simple standalone insert statement; it gave the same error.

Just using V$SQL_PLAN you get ORA-03113 end-of-file on communication channel. I think this is occurring on the remote nodes and passing back the 12805 error.

INSERT INTO ad_sql_runtime_plan
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel

Anyway, Google and Metalink both threw up no 10.xxxxx errors when I searched, but did reveal a 9i error when trying to look at the FILTER_PREDICATES column, fixed in 10.1 (Bug 3545517).
Also, I found bug 5166445 linking to 5933643. Which is no bloody {sorry} use to anyone, as 5933643 is one of those oh-so-common Oracle bug/issue numbers that links to a page saying “you can’t see this as it is private”, which is just an insult to people trying to solve their own problems. Bad Oracle Corp, Bad Boy.
{Yes, I am annoyed by this. It is simply poor of Oracle to reference an SR on public Metalink that is not publicly viewable, and TOO MUCH is not publicly viewable – I’ve known incidents where my own damn SR was not publicly viewable and I had no idea why}.

It looks like there was a known and specific bug with FILTER_PREDICATES that impacted 9.2.0.3 to 9.2.0.4. Maybe to commemorate it Oracle introduced a similar bug on the same table for 10.2.0.3 to 10.2.0.4? On a different column of course.

So, the solution. Well, I managed to narrow it down to one column. I could select count(column_name) from GV$SQL_PLAN for every column except OTHER_XML, which is a CLOB, and PROJECTION, which is the problem. That was the only column that errored and it gave the error codes I stated above.

Do a select * or include PROJECTION in your column list and selects against GV$SQL_PLAN and V$SQL_PLAN may well go wrong under 10.2.0.3 on Linux, or wherever you get the problem. Remove the column and you could be OK. If not, test by running “select count(column_name) from GV$SQL_PLAN” for each of the other columns, one by one, and see which errors for you {there is a sketch of automating this below}.
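
A quick-and-dirty way of automating that column-by-column check is sketched below {set serveroutput on first}. It loops over the column list of SYS.GV_$SQL_PLAN, the view behind the public synonym, and counts each column in turn; anything that cannot be counted, like the CLOB, or that hits the bug, gets reported as a failure.

declare
  v_count number;
begin
  for rec in (select column_name
              from   dba_tab_columns
              where  owner = 'SYS'
              and    table_name = 'GV_$SQL_PLAN'
              order  by column_id)
  loop
    begin
      execute immediate 'select count('||rec.column_name||') from gv$sql_plan'
        into v_count;
      dbms_output.put_line(rpad(rec.column_name,30)||' OK');
    exception
      when others then
        dbms_output.put_line(rpad(rec.column_name,30)||' FAILED: '||sqlerrm);
    end;
  end loop;
end;
/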

I did check to see if GV$SQL_PLAN resolved to a complex view on many x$ objects, but it did not; all of the columns (including PROJECTION) come off the X$KQLFXL internal object.

Now, the question is, having solved the problem for my own specific requirement on a RAC/linux 10.2.0.3 platform and done a Blog entry, is it worth spending half a day raising this obscure bug with Oracle Corp…?

All Is Well in Widlake World August 27, 2009

Posted by mwidlake in Private Life.
Tags: ,
3 comments

I’m about to post a proper blog entry but, before that, I’d like to thank those of you who contacted me to ask how my Mother was doing. She’s doing OK, she has remembered how to breathe and the cause of the whole issue has been identified, is treatable and is under control.

Now I just have to brace myself for when she can speak again and tells us all how bored and uncomfortable she is. Thankfully the Nurses will take the brunt of it and they are good at being understanding. Maybe in 6 weeks we will have her back home, looking after herself, and my current main employer will not have to be quite as understanding as they have been (thank you, Employer).

Oh, and the skills at reading graphs came in useful again. I asked a few days ago about “that set of spikes” and the nurses got agitated. Until they realised my Mother had taken off a monitoring lead as it was annoying her. Mothers, huh?

Friday Philosophy – When the Job Meets Real Life August 15, 2009

Posted by mwidlake in Private Life.
Tags: ,
1 comment so far

For my Friday Philosophy this week, I’m going way off topic. I’m straying into non-work life. Actually, let’s be accurate, I’m wading deep into personal life. If you want to know about Oracle mechanics, this ain’t the post for you.

It’s been a trying week this week, so much so that I have not posted for a few days – and I suspect I won’t post at all next week. About two hours after completing my last post on Tuesday, my wife called me to let me know my mum had been taken into hospital. My mother had developed some breathing difficulties, in that she’d forgotten completely how to do it.

So, after tube/train/drive across the country I found myself sat in an intensive care unit with my Brother, wondering how in heck so many tubes can be attached to such a small person as my mother, but they managed it. All of this medical monitoring stuff is attached to computers, many with readouts.

I have a couple of advantages over most people in these situations. I studied biology at University and I worked either within or alongside the NHS for 7 or 8 years, on hospital patient systems. So I understand a bit about all the equipment that is used and the data it produces.

What has this to do with the world of Oracle Performance? Not an awful lot. Except for one thing.

Sometimes, though I love what I do for a living and find solving performance problems stimulating and satisfying, I question “what is the point” in the whole real-world situation. I was sat there at the side of my mother’s bed, exchanging idle chat and some black humour with my brother {it’s the way our brains are wired, I blame my Mum}. Suddenly I stopped listening to my brother. The pattern had changed. The graphs had shifted and the figures had altered on the screens attached to the equipment monitoring my mother. I’d picked this up as I was used to watching performance graphs for computer systems. My brother was oblivious. Well what a surprise, an IT skill that turns out to be useful in the real world. Spotting graph/pattern changes.

As it turns out, the nursing staff had spotted these anomalous graphs too, glanced over, and realised it was just “one of those things”. Status quo was restored about 1 minute and 3 pints of my sweat later.

So why do I think my ability to spot a change in “performance graphs” and scare myself so deeply is a good thing? Because at least I had a feel for what was going on and I felt less clueless and helpless.

I’ve looked back on this and come to an even more shocking thought. There is a management technique that helps in real life. I have been a manager, surrounded by experts in their field, and here I was, sat in a real-world situation, again surrounded by experts in their field. When they did not react to the changed pattern on my mother’s monitors, I was reassured that it was not serious. So maybe some management skills have other uses too. I’ll remember that the next time someone tells me all management skills are bunkum. {But it probably still holds that most Managers are Bunkum 🙂 }

I wish that more IT managers could treat their staff this way. If the DBA team {or sys admin team, or network team} do not respond to the graphs as a sign of impending doom, then it probably ain’t impending doom, so trust them.

And of course the other reason I’ve blogged about this is it’s an outlet to a certain amount of trepidation about the future. Maybe I should have stuck to Biology rather than IT. *sigh*.

Decrypting Histogram Data August 11, 2009

Posted by mwidlake in internals, performance.
Tags: , , ,
5 comments

{part two is here}

I’m looking at Histograms at the moment and I’ve hit a problem I always hit when looking at histograms. So I’ve decided once and for all to see if I can find a solution.

The value held in DBA_TAB_HISTOGRAMS.ENDPOINT_VALUE for VARCHAR2 columns is a number with no apparent link to reality.

{Before anyone who has read my previous posts suggests it, I’ve already looked at the view and the relevant column resolves to data in either sys.histgrm$ or sys.hist_head$ and the underlying values held are still numbers and not processed in the view, so they are numbers. I think}.
{{I have a sneaking suspicion this could turn into a set of increasingly obscure postings…}}
{{{THIS one might be my most obscure so far!}}}

Let’s have a look:

select --owner
 table_name
,column_name
,endpoint_value  end_val
,endpoint_number rowcount
from all_tab_histograms
where table_name ='TEST_P'
and  owner='WIDLAKEM'
order by table_name,column_name,endpoint_number
/
TABLE_NAME      COLUMN_NAME        END_VAL        ROWCOUNT
--------------- --------------- ---------- ---------------
TEST_P          STATUS                   0          41,523
TEST_P          STATUS                   1          42,065
TEST_P          STATUS                   2          42,099
TEST_P          VC_1            3.3882E+35               0
TEST_P          VC_1            3.4951E+35               1
TEST_P          VC_1            3.6487E+35               2
TEST_P          VC_1            3.7558E+35               3
TEST_P          VC_1            3.9095E+35               4
TEST_P          VC_1            4.0162E+35               5
TEST_P          VC_1            4.1697E+35               6
TEST_P          VC_1            4.3235E+35               7
TEST_P          VC_1            4.4305E+35               8
TEST_P          VC_1            4.5841E+35               9
TEST_P          VC_1            4.6914E+35              10

So what in heck is 4.6914E+35 {last value listed}?

Well, it is a number of course. If I set my column heading appropriately I can see this:-

col end_val form 999,999,999,999,999,999,999,999,999,999,999,999
col column_name form a10
select column_name
,endpoint_value  end_val
,endpoint_number rowcount
from all_tab_histograms
where table_name ='TEST_P'
and  owner='WIDLAKEM'
and column_name='VC_1'
order by table_name,column_name,endpoint_number
/
COLUMN_NAM                                          END_VAL   ROWCOUNT
---------- ------------------------------------------------ ----------
VC_1        338,822,823,410,931,000,000,000,000,000,000,000          0
VC_1        349,512,448,932,628,000,000,000,000,000,000,000          1
VC_1        364,867,102,368,954,000,000,000,000,000,000,000          2
VC_1        375,575,265,424,979,000,000,000,000,000,000,000          3
VC_1        390,949,334,699,583,000,000,000,000,000,000,000          4
VC_1        401,618,352,253,998,000,000,000,000,000,000,000          5
VC_1        416,972,612,321,579,000,000,000,000,000,000,000          6
VC_1        432,345,727,450,272,000,000,000,000,000,000,000          7
VC_1        443,054,364,035,286,000,000,000,000,000,000,000          8
VC_1        458,409,658,691,912,000,000,000,000,000,000,000          9
VC_1        469,139,289,515,351,000,000,000,000,000,000,000         10

The ENDPOINT_VALUE is an encoded representation of the varchar2 value, based on the trick of turning each character into its ASCII equivalent (0-255), multiplying it by 256 raised to the power of how many characters in from the end of the string it is, and adding it all together. Let’s use “Cat” as an example {and check the sums below}.

  • Last character is ‘t’, ASCII value is 116 = 116;
  • Next-to-last character is ‘a’, ASCII value 97, *256 = 24832;
  • Chr 3 in is ‘C’, ASCII value 67, *(256*256) = 4390912;
  • Total is 4390912+24832+116 = 4415860
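
You can check the arithmetic straight from SQL*Plus:

select ascii('C') * power(256,2)
     + ascii('a') * power(256,1)
     + ascii('t') * power(256,0)   cat_encoded
from dual;

CAT_ENCODED
-----------
    4415860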

What Oracle actually does is slightly different {and I could have this slightly wrong, my brain waved a white flag a couple of hours back}: it takes the first 15 characters and multiplies the first character’s ASCII value by power(256,14), the second character’s ASCII value by power(256,13) and so on, until it runs out of characters or gets to character 15.

How do I know this? I decoded the ‘C’ in the kernel :-). No, I didn’t, I found a link to an encode script Jonathan Lewis had written {it’s also in his latest book, or at least in the set of scripts the book tells you how to download}. I’ve lost the original link I found, and thus a nice little article on the whole topic, but this link will do until I can re-find the original. {NB this referenced article mentions using the hexstr function which may be neater but, as I said earlier, my brain has given up}.

I’ve outrageously stolen Jonathan’s script which encodes a varchar2 string into the relevant numeric and used it as a template to create a decode version too. Just a couple of notes:

– I can imagine this might not work if you have multibyte characters.
– Oracle confuses things by doing something like translating the 15 characters into a 36-digit number – and then keeping only the first 15 significant digits. This kind of messes up the untranslate {see the example below}.
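
The trimming is effectively a round to 15 significant digits {round(value,-21) in the function below}. Taking the untrimmed value for ‘short’ from the test output further down:

select round(599232339077850549055205294263500800,-21) as trimmed
from dual;
-- gives 599232339077851000000000000000000000, the 15-significant-digit
-- value you see stored for 'short'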

So, I create three functions:

  • hist_chartonum that converts a varchar2 into something very similar to what is stored in the histogram views. Jonathan will find it incredibly familiar. An optional second parameter turns off the truncation that Oracle does for the histogram data, so you can get the full glorious 36 significant digits for 15 characters if you wish.
  • hist_numtochar which converts the numeric back to a string. It gets it slightly wrong due to the truncation of the least significant 20 or so digits. Again, an optional second parameter allows it to not replicate the truncation and work with all 15 characters.
  • hist_numtochar2 which is an attempt to allow for the truncation errors. I add power(256,9) to the value if it has been truncated. It seems to make e.g. the numeric representation of AAAAA translate back to AAAAA not AAAA@. Yes, it’s a fudge.

This is the script:-

-- cre_hist_funcs.sql
-- Heavily borrowed from Jonathan Lewis, sep 2003
-- MDW 11/8/09 - all mistakes mine.
-- hist_chartonum converts a string to a number in a very similar way
-- to how Oracle does so for storing in column ENDPOINT_VALUE in XXX_TAB_HISTOGRAMS
-- Optionally get it to not truncate the value in the same way to only 15 digits.
-- hist_numtochar converts the ENDPOINT_VALUE back to human-readable format.
-- Optionally get it to go beyond 15 numerics (7 or 8 characters)
-- hist_numtochar2 attempts to allow for truncation errors.
-- JLs version has been checked against 8.1.7.4, 9.2.0.4 and 10.1.0.2
-- I've used on 10.2.0.3
set timing off
create or replace function hist_chartonum(p_vc varchar2
                                         ,p_trunc varchar2 :='Y') return number
is
m_vc varchar2(15) := substr(rpad(p_vc,15,chr(0)),1,15);
m_n number := 0;
begin
  for i in 1..15 loop
/*  dbms_output.put(ascii(substr(m_vc,i,1)));
    dbms_output.put(chr(9));
    dbms_output.put_Line(to_char( power(256,15-i) * ascii(substr(m_vc,i,1)),
                                '999,999,999,999,999,999,999,999,999,999,999,999'
                                 )
                        ); */
    m_n := m_n + power(256,15-i) * ascii(substr(m_vc,i,1));
  end loop;
-- this converts it from a 36 digit number to the 15-digit number used
-- in the ENDPOINT_VALUE Column.
  If p_trunc = 'Y' then
    m_n := round(m_n, -21);
  end if;
-- dbms_output.put_line(to_char(m_n,'999,999,999,999,999,999,999,999,999,999,999,999'));
return m_n;
end;
/

create or replace function hist_numtochar(p_num number
                                         ,p_trunc varchar2 :='Y') return varchar2
is
  m_vc varchar2(15);
  m_n number :=0;
  m_n1 number;
  m_loop number :=7;
begin
m_n :=p_num;
if length(to_char(m_n))<36 then
--dbms_output.put_line ('input too short');
  m_vc:='num format err';
else
  if p_trunc !='Y' then
    m_loop :=15;
  end if;
--dbms_output.put_line(to_char(m_N,'999,999,999,999,999,999,999,999,999,999,999,999'));
  for i in 1..m_loop loop
    m_n1:=trunc(m_n/(power(256,15-i)));
--    dbms_output.put_line(to_char(m_n1));
    if m_n1!=0 then m_vc:=m_vc||chr(m_n1);
    end if;
    dbms_output.put_line(m_vc);
    m_n:=m_n-(m_n1*power(256,15-i));  
  end loop;
end if;
return m_vc;
end;
/
create or replace function hist_numtochar2(p_num number
                                         ,p_trunc varchar2 :='Y') return varchar2
is
  m_vc varchar2(15);
  m_n number :=0;
  m_n1 number;
  m_loop number :=7;
begin
m_n :=p_num;
if length(to_char(m_n))<36 then
--dbms_output.put_line ('input too short');
  m_vc:='num format err';
else
  if p_trunc !='Y' then
    m_loop :=15;
  else
   m_n:=m_n+power(256,9);
  end if;
--dbms_output.put_line(to_char(m_N,'999,999,999,999,999,999,999,999,999,999,999,999'));
  for i in 1..m_loop loop
    m_n1:=trunc(m_n/(power(256,15-i)));
--    dbms_output.put_line(to_char(m_n1));
    if m_n1!=0 then m_vc:=m_vc||chr(m_n1);
    end if;
    dbms_output.put_line(m_vc);
    m_n:=m_n-(m_n1*power(256,15-i));  
  end loop;
end if;
return m_vc;
end;
/
rem
rem Sample of use:
rem 

col id1 format 999,999,999,999,999,999,999,999,999,999,999,999
col id3 format 999,999,999,999,999,999,999,999,999,999,999,999
select
  'short'  text
 ,hist_chartonum('short') id1
 ,hist_numtochar(hist_chartonum('short')) id2
 ,hist_chartonum('short','N') id3
 ,hist_numtochar(hist_chartonum('short','N'),'N') id4      
from dual;
select
  'alongteststring'  text
 ,hist_chartonum('alongteststring') id1
 ,hist_numtochar(hist_chartonum('alongteststring')) id2
 ,hist_chartonum('alongteststring','N') id3
 ,hist_numtochar(hist_chartonum('alongteststring','N'),'N') id4      
from dual;
select
  'amuchlongerteststring'  text
 ,hist_chartonum('amuchlongerteststring') id1
 ,hist_numtochar(hist_chartonum('amuchlongerteststring')) id2
 ,hist_chartonum('amuchlongerteststring','N') id3
 ,hist_numtochar(hist_chartonum('amuchlongerteststring','N'),'N') id4      
from dual;
spool off

The test at the end produces the following output:-

strng  histgrm_translation
------------------------------------------------
translate_back
------------------------------------------------
untrimmed_translation
------------------------------------------------
untrimmed_translate_back
------------------------------------------------
short  599,232,339,077,851,000,000,000,000,000,000,000
short
 599,232,339,077,850,549,055,205,294,263,500,800
short

alongteststring  505,852,124,009,532,000,000,000,000,000,000,000
alongtm
 505,852,124,009,531,849,435,547,374,314,942,055
alongteststring

amuchlongerteststring  505,872,878,384,947,000,000,000,000,000,000,000
amuchly
 505,872,878,384,946,809,648,430,989,359,543,156
amuchlongertest

Final comment. Oracle trims the value stored in sys.histgrm$ and knackers up the untranslate. Why? Well, this is only me thinking aloud, but this reduces the number from 36 to 15 significant digits, and numerics are stored as one byte per two digits plus an offset byte {am I missing a length byte?}. So 10 bytes or so. This will keep the size of the data stored down. I mean, as I posted here, the sys.histgrm$ table gets big enough as it is!

Tomorrow {or whenever my brain recovers} I’ll show how I use it and one or two oddities I have found or come across from other people’s blogs.

Friday Philosophy – Should the Software or the User be the Stupid One? August 7, 2009

Posted by mwidlake in internals, performance.
Tags: , ,
5 comments

Oracle’s performance engine is complex and copes with a lot of database situations automatically – or to be more precise, it tries to cope with lots of database situations automatically.

Over the last few versions, Oracle has added many, many things to allow the database to cope automatically with all sorts of different data volumes, spreads of data, relationships between tables, use of different oracle technologies (By this I mean bitmap tables, index tables, partitions, clusters, external tables). All of these things aim to allow the database to just “cope” with whatever you need it to do, with less and less effort by the users {by users, I mean technical users; DBAs and Developers}. Thus it allows for “stupid” users. {I mean no offence, maybe read “inexperienced” instead of stupid}.

As an example, you can now have some very large tables consisting of several partitions and some status look-ups. You query against them. Oracle’s CBO will automatically ignore partitions it can ignore, use indexes or full table scans to use the least amount of IO, use histograms to spot where clauses on low-cardinality values, use hash joins rather than nested loops as appropriate depending on memory availability, use bitmap indexes when it thinks it can and merge the results from several bitmap indexes, use function-based indexes to support functions in where clauses….
It even self-gathers the information to look after all this. Column usage and table modifications are tracked, statistics are gathered when needed and in ways to support data skew, PGA and SGA can be auto-monitored and managed…

It all sounds great. In fact, most of the time, for most people, it is great. {I know, most people reading this post are probably people who have encountered the problem systems and so know it goes wrong and so you need more knowledge to cope – you are a biased set of people. In the nicest way, I should add 🙂 } The idea is, I believe, that you do not need to be smart to look after Oracle.

If it is not great, if this highly complex system gets it wrong and tries to satisfy SQL statements in sub-optimal ways, then the User has to step in and fix things. ie You.

It is now horrifically complex for us technical users to understand what is going on. You have to not only be “not stupid”, but “not average” either. Sometimes you have to be “not great”, ie brilliant.

In my example, we need to look at whether the SQL is constructed to allow the indexes to be used, are functions correctly laid out to use function-based indexes, are partitions being maintained correctly, when were stats last gathered, did they include histograms and do they help, has Oracle missed the need for histograms, are the indexes analyzed at a high enough sample size, are the bitmaps greatly slowing down inserts, have hints been used on the code, are initialisation parameters set to override default functionality…

You get the idea, I won’t drone on further. I didn’t even mention memory considerations though {OK, I’ll shut up}.

My point is, the more complex the software, the more “intelligent” it is, the more it is designed to allow for “stupid” users, then the more super-intelligent the user has to be to cope when it breaks.

How about an alternative?

How would it be if we went back to the Rule Based Optimizer and no automatic management of complex situations?

Oracle would maybe need to add a few rules to the RBO for it to cope with later developments, so it would be slightly more complex than V6 but not a lot.
Everything else, the User decides. You only gather stats you decide to gather, on objects you decide need them. No you don’t, it’s a Rule Based Optimizer – no stats gathering! {But see below}.

No automatic memory management. No automatic anything.

The User {the technical user, the DBA and Developer} would have to be smart. Not brilliant, just smart. You would probably have to do more, but most of it would be easier as the levels of complexity and interdependence are reduced. All those tweaks and tricks in the CBO and all the monitoring to cope with “complex” would not exist to go wrong.

Plus it might solve another concern I have. I think there is a chasm growing, as there is no need to solve simple problems because Oracle copes, but you still have to solve the complex problems when Oracle does not cope. If you don’t develop skills and experience solving the simple problems, how do you solve the complex ones? I think this is why most Oracle performance and architecture experts are old {sorry, pleasantly middle-aged}. Young people new to the arena have a massive learning mountain to climb.

So, if we have stupid software, maybe we can get away with more stupid “smart” expert users, i.e. ALL of us can cope. You cut your teeth on smaller, simpler systems and learn how to cope with the stupid software beast. As you learn more, you learn to cope with more complex situations, and they never get that complex as the database is not so “clever”.

I’d actually still argue that all the intelligence gathering the Oracle database does should still continue – stats gathered on objects, the ability to gather information on memory usage and thus give advice on changes, tracking column usage and table changes. But We, the Stupid Users, get to look at it and use it as we see fit for our systems.

I’m sure many systems would not work quite so fast in my scenario, but I’d rather have a system working at 75% of its theoretical fastest all the time than one working at 95% and breaking regularly, and in ways so complex it needs weeks to work out and fix.

I now await all the comments telling me how stupid I am {I can be blindingly stupid, especially on Fridays}.

Looking inside v$ performance views August 5, 2009

Posted by mwidlake in internals, performance.
Tags: ,
7 comments

I keep forgetting how to look inside the V$ performance objects, I’ve got a mental block on it. I had to ask a colleague again this week how to do it. So I’m blogging it. This should fix it in my mind.

You use the V$FIXED_VIEW_DEFINITION dynamic view to see them. You have to use this as the damned things hide in a circular data dictionary black hole.

Here is an example. I want to know what the v$locked_object is actually looking at.

desc v$LOCKED_OBJECT
Name                                      Null?    Type
 ------------------------------------------------------
 XIDUSN                                             NUMBER
 XIDSLOT                                            NUMBER
 XIDSQN                                             NUMBER
 OBJECT_ID                                          NUMBER
 SESSION_ID                                         NUMBER
 ORACLE_USERNAME                                    VARCHAR2(30)
 OS_USER_NAME                                       VARCHAR2(30)
 PROCESS                                            VARCHAR2(12)
 LOCKED_MODE                                        NUMBER

OK, what is v$locked_object, is it a table, a view, a synonym…?

select owner,object_name,object_type
from dba_objects
where object_name ='V$LOCKED_OBJECT'

OWNER      OBJECT_NAME     OBJECT_TYPE
---------- --------------- ---------------
PUBLIC     V$LOCKED_OBJECT SYNONYM

It’s a synonym. Not a view, even though people refer to the v$ views or performance views.

What is it a synonym for?

select synonym_name,table_owner,table_name
from dba_synonyms
where synonym_name='V$LOCKED_OBJECT'

SYNONYM_NAME    TABLE_OWNER     TABLE_NAME
--------------- --------------- --------------------
V$LOCKED_OBJECT SYS             V_$LOCKED_OBJECT

OK, it’s a synonym on sys.v_$locked_object. What is that?

select owner,object_name,object_type
from dba_objects
where object_name ='V_$LOCKED_OBJECT'

OWNER      OBJECT_NAME      OBJECT_TYPE
---------- ---------------- ---------------
SYS        V_$LOCKED_OBJECT VIEW

It’s a view. Good, so what is commonly called a view IS a view, hidden by a synonym. What is also good is that we have been here before; I posted about looking inside data dictionary views about a month ago. You can just extract the SQL of the view…

select owner,view_name,text
from dba_views
where view_name='V_$LOCKED_OBJECT'

OWNER           VIEW_NAME
--------------- --------------------
TEXT
--------------------------------------------------
SYS V_$LOCKED_OBJECT
select "XIDUSN","XIDSLOT","XIDSQN","OBJECT_ID","SE
SSION_ID","ORACLE_USERNAME","OS_USER_NAME","PROCES
S","LOCKED_MODE" from v$locked_object

Bingo! It’s a simple view on top of v$locked_object.
Sorry? v$locked_object? That’s where we started.
v$locked_object is a synonym for
v_$locked_object that is a view on top of
v$locked_object.

This is the circular black hole I mentioned. Grrrr.

The solution is the aforementioned v$fixed_view_definition.

desc v$fixed_view_definition
 Name                    Null?    Type
 ----------------------- -------- ----------------
 VIEW_NAME                        VARCHAR2(30)
 VIEW_DEFINITION                  VARCHAR2(4000)

select * from v$fixed_view_definition
where view_name = 'V$LOCKED_OBJECT'

VIEW_NAME
------------------------------
VIEW_DEFINITION
--------------------------------------------
V$LOCKED_OBJECT
select xidusn,xidslot,xidsqn,object_id,session_id,
oracle_username,os_user_name,process,locked_mode 
from gv$locked_object
where inst_id = USERENV('Instance')

It is a view on gv$locked_object. OK, let’s go look at that {anyone getting a bad feeling about this becoming another circular reference?}

select * from v$fixed_view_definition
where view_name = 'GV$LOCKED_OBJECT'

VIEW_NAME
------------------------------
VIEW_DEFINITION
----------------------------------------------
GV$LOCKED_OBJECT
select x.inst_id,x.kxidusn, x.kxidslt, x.kxidsqn, l.ktadmtab,
s.indx, s.ksuudlna, s.ksuseunm, s.ksusepid, l.ksqlkmod
from x$ktcxb x, x$ktadm l, x$ksuse s
where x.ktcxbxba = l.kssobown and x.ktcxbses = s.addr

Now that’s better. It is showing what that view is really looking at.

I don’t know why the v$xxxx synonym to v_$xxxx view on v$xxxx synonym circular reference exists, but I suspect it is something to do with allowing Oracle’s internal code to realise it has to look elsewhere to resolve the v$ objects, i.e. as being built on top of the x$ objects – which are themselves a way of exposing C objects {I think they are C objects} on memory structures… I’ll stop now as I am in deeper than I can swim. Maybe someone more adept with Oracle internals has worked this out and blogged about it, but I have not found it yet {maybe Jonathan Lewis has; he started mentioning the x$ and v$ views back in Oracle 7! Or René Nyffenegger, whose pages on the v$ and x$ stuff I find very useful}.
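
One related trick I find handy {and you can run it as an ordinary DBA-type user, no need to be SYS} is going the other way and asking which fixed views reference a given x$ object:

select view_name
from   v$fixed_view_definition
where  upper(view_definition) like '%X$KTCXB%';

That should pick up GV$LOCKED_OBJECT amongst a few other transaction-related views.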

Sadly, you can’t look at the x$ objects at all unless you are logged on as SYS. If you have access to the SYS user on your company systems you should know enough to not go poking around on such things on production systems. Install Oracle on your PC and poke around on there. It can be addictive though.

I think I’ll leave it there for tonight.

Spending Time in London August 4, 2009

Posted by mwidlake in Blogging, Perceptions.
3 comments

I’m currently working in London, to the west of the centre but within coverage of the underground system. As a result, getting to and from home and work takes a looong time. This gives me a little time to do stuff on my netbook on the train, but if it is really busy and when I am on the tube it is not possible to do this. It’s wasted time.

To keep as much of my sanity intact as I can and reduce the amount of time I spend doing nothing but wondering why all other commuters appear so unfriendly {I know, most of them are thinking exactly the same :-)}, I opt to stay in London a couple of nights a week. The benefit of this is that either I have nothing to do in the evening and I can read manuals and do this blog {or alternatively watch rubbish on TV or read a book}. Or I can go and drink beer with people I know. I don’t know that many people in London, I have to confess, but I have had a couple of very enjoyable evenings so far.

Tonight was with Doug Burns. I don’t know Doug that well, but have enjoyed talking to him at UKOUG meetings and exchanging emails/comments on blogs. It was excellent to spend a couple of hours and maybe one too many beers with Doug and talk about what can only be described as an eclectic range of topics. I also managed to mug him for his excellent presentation on AWR for the next MI SIG meeting in October. I hope that last beer was not too much and he remembers…

This is part of the whole Oracle Community thing. It’s good to Blog, it’s good to go on forums and it is good to exchange emails, but you can’t beat meeting in person, either at conference, at user group meetings or just because you are in the same town that evening. I find once you have met, communicating is a lot easier {I had an excellent night in Newcastle with Piet de Visser about 18 months ago and now we exchange rants and thoughts quite often}. So, if anyone out there is in London and fancies a beer, you could drop me a line. I’ll buy the first one if you ask nicely.

Right, where was that manual on oracle wait interface…Oh, “Celebrity animals have got talent on ice” has come on the TV, maybe I’ll watch that.

Why is my SYSTEM Tablespace so Big? August 3, 2009

Posted by mwidlake in internals, VLDB.
Tags: , ,
25 comments

How big is your SYSTEM tablespace? 512M? 1GB? 4GB? Bigger? Is it bigger than the whole of the first critical database you ever looked after? Why has the SYSTEM tablespace got so big?

You can easily check the size of the SYSTEM tablespace with any oracle development/admin GUI or with this script.

@dbf_lst
Enter value for ts_name: system
old   8: where tablespace_name like upper('&ts_name'||'%')
new   8: where tablespace_name like upper('system'||'%')
continue> 

FILE_NAME                                  F_ID  TS_NAME         SIZE_M   THOU_BL
----------------------------------------- ----- ---------------- -------- -------
+DISCG/sid/datafile/system.272.596722817      1 SYSTEM             24,920   3,190

That is 24GB. It is not the biggest one I have seen, but it is up there in the top 3.
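
If you do not have a script like dbf_lst to hand, a plain query against DBA_DATA_FILES does much the same job:

select file_name
      ,tablespace_name
      ,bytes/1024/1024   size_m
from   dba_data_files
where  tablespace_name = 'SYSTEM'
order  by file_name;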

Not so many versions ago, it was not unusual to have a SYSTEM tablespace of less than 128MB in size. In fact, if you did not use stored PL/SQL this was probably the case up to and through 8i and into 9i. So – it’s PL/SQL causing the bloat? No, probably not. Ahhh, what about all that AWR automatic real-time monitoring gubbins you hear so much about? Nope, that goes into SYSAUX.

No, it’s probably down to one or both of two internal tables, SYS.AUD$ and SYS.HISTGRM$.

Let’s check out what are the biggest objects in this particular SYSTEM tablespace:

select owner,segment_name,segment_type
 ,bytes/(1024*1024) size_m
 from dba_segments
 where tablespace_name = 'SYSTEM'
 and    bytes/(1024*1024) >1
 order by size_m desc

OWNER        SEGMENT_NAME         SEGMENT_TYPE         SIZE_M
------------ -------------------- ------------------ --------
SYS          C_OBJ#_INTCOL#       CLUSTER              13,313
SYS          AUD$                 TABLE                 5,688
SYS          I_H_OBJ#_COL#        INDEX                 1,863
SYS          I_AUD1               INDEX                 1,606
SYS          HIST_HEAD$           TABLE                   311
SYS          SOURCE$              TABLE                   224
SYS          IDL_UB1$             TABLE                   224
SYS          C_FILE#_BLOCK#       CLUSTER                 208
SYS          INDPART$             TABLE                   160
SYS          OBJ$                 TABLE                   144
SYS          I_HH_OBJ#_COL#       INDEX                   128
SYS          I_HH_OBJ#_INTCOL#    INDEX                   128
SYS          I_OBJ2               INDEX                    80
SYS          I_FILE#_BLOCK#       INDEX                    62
SYS          TABPART$             TABLE                    57
SYS          I_OBJ1               INDEX                    49
{snip}

To be a little obtuse, I’ll start with the second largest object, SYS.AUD$, which you can see is getting close to 6GB in size {Oh, and SYS.I_AUD1, the fourth largest object, is the PK index on AUD$}.

AUD$ is the underlying table that holds all of the system auditing information that is turned on and off with the SQL AUDIT command. You can use AUDIT to monitor particular users, changes to system privileges, access on key tables or particular activity like table changes. Go check the SQL manual if you want to know more.
The records for this captured activity all go into this table. On this particular system a few key privileges and logons by certain users are the only things being audited, but the data has built up over 3 or 4 years.

On versions 9 and 10 the options for managing this table are… poor. Basically, you have to do it yourself. I think I am right to say that it is the only SYS-owned table that Oracle Corp say it is OK for you to move into another tablespace yourself and that you are allowed to delete and truncate data from. If you AUDIT a lot of things or activities you probably need to delete some data. {If you AUDIT a lot of things or activities and you don’t delete any of the data, you probably did not turn on AUDIT very long ago or have not checked the size of your SYSTEM tablespace for a while…}.

Oh, and one final thing on the AUD$ table. It is lacking indexes – it has only one {ignoring LOB indexes}. This is a good thing, as indexes slow down inserts quite significantly {I use a rule of thumb that if it takes N ms to insert a record into a table, it will take 2.2*N ms if there is an index on the table, 3.4*N ms if there are two indexes, 4.6*N ms if there are three… Get the idea? However, I have not checked this for real in years. Treat this statement as a potential myth}. I would recommend you do not add further indexes to AUD$. If you need to investigate the information in there, I would strongly suggest you extract what you want into a new table with a CTAS statement, then index that copy table and do your work on that. You do not want to slow down further everything that is already slowed down a little by being audited.
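
The sort of thing I mean is sketched below {the copy-table name and the predicate are made up for the example; pick whatever subset you are actually interested in}:

-- copy out just the audit rows you want to dig into...
create table mdw_aud_copy as
select *
from   sys.aud$
where  userid = 'APP_BATCH';     -- USERID holds the audited username

-- ...then index the copy and query that, leaving SYS.AUD$ alone
create index mdw_aud_copy_idx1 on mdw_aud_copy (userid);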

What about the largest thing in the SYSTEM tablespace? It’s SYS.C_OBJ#_INTCOL# and is over 13GB in size. I said it would be SYS.HISTGRM$, didn’t I?

Well, SYS.C_OBJ#_INTCOL# is a cluster. Clusters are structures, segments, that hold one or more tables. All records for the same key across all those tables are held together, in the same block or blocks. The key is either a normal index or a hash. The idea is that it speeds up selecting records from different tables with the same key – they are all in the same block or blocks. It does work too, though clusters seem to have fallen out of favour with Oracle designers; they are still worth considering.

In this particular cluster there is only one table, SYS.HISTGRM$:

select owner,table_name
from dba_tables
where cluster_name='C_OBJ#_INTCOL#'

OWNER                          TABLE_NAME
------------------------------ ------------------------
SYS                            HISTGRM$
1 row selected.

So, SYS.HISTGRM$ is the only table in the cluster. It {and its cluster index, on columns OBJ# and INTCOL#} is hiding in this cluster segment. It IS the biggest thing in this particular SYSTEM tablespace. The only index, other than the cluster index, on the table SYS.HISTGRM$ is I_H_OBJ#_COL#, and it is the third largest object in the SYSTEM tablespace.

--ind_cols
select
INDEX_OWNER                 ind_owner
,INDEX_NAME                  ind_Name
,TABLE_NAME                  tab_Name
,substr(column_position,1,3) Psn
,COLUMN_NAME                 Col_Name
from dba_ind_columns
where table_name like upper(nvl('&tab_name','WHOOPS')||'%')
order by 3,1,2,4,5
--eof
@ind_cols
Enter value for tab_name: histgrm$

IND_OWNER   IND_NAME           TAB_NAME           PSN       COL_NAME
----------- ------------------ ------------------ --------- --------------
SYS         I_H_OBJ#_COL#      HISTGRM$           1         OBJ#
                                                  2         COL#
2 rows selected.
Elapsed: 00:00:02.07

-- you find the cluster index by looking on DBA_INDEXES and
--DBA_IND_COLUMNS where the "TABLE_NAME" is the name of the cluster
@ind_cols
Enter value for tab_name: C_OBJ#_INTCOL#

IND_OWNER   IND_NAME           TAB_NAME           PSN       COL_NAME
----------- ------------------ ------------------ --------- --------------
SYS         I_OBJ#_INTCOL#     C_OBJ#_INTCOL#     1         OBJ#
                                                  2         INTCOL#
2 rows selected.
Elapsed: 00:00:00.93

What is HISTGRM$? Do you think it looks like it could be something to do with column Histograms? Well it is, it’s the underlying data dictionary table that holds column histograms. It started to become common to see a large HISTGRM$ table on Oracle 9 systems with many partitions {and of course subpartitions, if used}, where statistics were gathered on these partitioned tables aggressively and included a METHOD_OPT clause that causes column histograms to be collected, e.g. FOR ALL INDEXED COLUMNS SIZE 200 or FOR ALL COLUMNS SIZE 100.

Where most people have seen it balloon in size though is with Oracle 10 and the introduction of…the Automated Stats gathering job {so, there is a connection to previous recent posts 🙂 }.
Every night, every weekend, any partition or sub-partition that is 10% or more different to the last time stats were gathered gets stats gathered. The automatic job gathers stats with a METHOD_OPT of AUTO. Which means your Oracle software decides which columns to gather histogram stats on and how many buckets it collects.

Whoever wrote that part of the stats gathering feature really liked to gather information.

If you have lots of partitions {or sub-partitions} and gather stats with the METHOD_OPT=AUTO, the two factors added together result in a massive HISTGRM$ table and thus a massive cluster.
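
If you want to see where the rows are coming from, a rough count by table from the histogram dictionary views points the finger {it can take a while on a system with a lot of them, which is rather the point}:

select owner ,table_name ,count(*)  bucket_rows
from   dba_tab_histograms
group  by owner ,table_name
order  by 3 desc;

-- partition and subpartition level histograms live in DBA_PART_HISTOGRAMS
-- and DBA_SUBPART_HISTOGRAMS; on a heavily partitioned system gathered with
-- METHOD_OPT AUTO, that is usually where the bulk of the rows are.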

Is space important? No, of course not, disc acreage is cheap, just buy bigger discs ;-). {That, of course, was a highly questionable suggestion}.
Being serious, I’m personally not too concerned about my SYSTEM tablespace being that big; after all, compared to the size of databases that use thousands of partitions, 32GB is usually not a major concern. But I know of people who did not expect their SYSTEM tablespace to get that big and have had issues with simply running out of the storage available to them.

I am more concerned about the impact on parse times and general data dictionary performance of such large objects. I’ve found the data dictionary can be very slow for some queries when there are large numbers of extents, objects, columns… Histograms are just in there with the mix. I’ve had to work around such slow data dictionary access several times in the past and I suspect {though I have not had time to in any way support this with evidence} that parsing is slowed down by unnecessary histograms. As an example of this, which actually has nothing to do with SYS.HISTGRM$, if you look back up this post at the code listing the indexes on the cluster and table, the code took between one and two seconds to run. Normally it takes less than 1/10 of a second.

Can you do anything about the size of these objects? Well, with SYS.AUD$ Oracle allow you to delete from it and move the table to a new tablespace. If you use the SQL AUDIT command, you should include housekeeping the SYS.AUD$ table in your overall implementation.
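
For the record, the kind of housekeeping I mean looks roughly like this {the tablespace name is made up and the retention is just an example; test it somewhere safe and check the current support notes for your version first}:

-- move the audit trail (and its index) out of SYSTEM
alter table sys.aud$ move tablespace audit_data;
alter index sys.i_aud1 rebuild tablespace audit_data;

-- purge audit records you no longer need, e.g. older than a year
-- (the insert timestamp column is NTIMESTAMP# on 10g, TIMESTAMP# on 9i)
delete from sys.aud$ where ntimestamp# < add_months(sysdate,-12);
commit;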

SYS.C_OBJ#_INTCOL# and the SYS.HISTGRM$ table within it are slightly trickier. It is not easy to reduce the space used by these objects, and thus the SYSTEM tablespace, once they have grown large, but you can stop them getting bigger by collecting fewer column histogram statistics on your partitioned tables. Obviously that is not as simple as it sounds, but many sites with large numbers of objects do turn off the automated stats gathering and run their own code.
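
One 10g option, if you decide you really do want fewer histograms by default, is to change the default METHOD_OPT that the automatic job {and any gather call that does not specify its own} picks up. It affects every gather that relies on the default, so think it through before you do it:

-- check the current default
select dbms_stats.get_param('METHOD_OPT') from dual;

-- SIZE 1 means one bucket per column, i.e. no histograms by default
exec dbms_stats.set_param('METHOD_OPT','FOR ALL COLUMNS SIZE 1')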