jump to navigation

ORA_ROWSCN – When Was My Record Commited January 25, 2016

Posted by mwidlake in SQL.
Tags:
1 comment so far

I was going to do a follow-up post to my post on USERENV(‘COMMITSCN’) just to describe the slightly better known but still overlooked (and possibly more useful) ORA_ROWSCN – but I don’t need to!

As Neil Chandler has done this excellent post on it

Go and look at that post, it has all the information and detail you were actually looking for.

However, for any of you search-engine-seekers who can’t face the pain of following a link {it’s such a hard life for the modern knowledge-by-mouse-click generation}, here are the bare bones:

ORA_ROWSCN is a pseudo column that, by default, shows you the SCN (System Change Number) when the block was last changed. You can create a table with the ROWDEPENDENCIES extension to track it for individual rows but you can’t alter a table to add it. So usually you see a (conservative) last-change-scn for the block. See Neil’s blog post for more detail

To convert the SCN to a date/time there are various ways but Oracle will do it for you if it was within the last few days for you – with SCN_TO_TIMESTAMP function. If the row/block was last changed more than a few days ago, that function will error with ORA-08181 (I wish it just returned null rather than 08181, but then you can always wrap the call in your own function that handles that…)

Below is a short demo using test tables I don’t describe:

select house_number, addr_line_1,post_code
      ,ora_rowscn
      ,scn_to_timestamp(ora_rowscn) time_of_scn
from address 
where house_number = 100
and addr_line_1 like 'COTHAM SOUTH%'
/

  HOUSE
 NUMBER ADDR_LINE_1          POST_COD ORA_ROWSCN TIME_OF_SCN
------- -------------------- -------- ---------- ------------------------
    100 COTHAM SOUTH TERRACE SH5 8FA    11425626 24-JAN-16 20.44.56.00000
    100 COTHAM SOUTH DRIVE   LS20 1QY   11427281 24-JAN-16 20.51.29.00000
    100 COTHAM SOUTH         BD17 7JW   11437843 24-JAN-16 20.53.39.00000
    100 COTHAM SOUTH TERRACE LS7 9SK    11448376 24-JAN-16 20.54.56.00000
    100 COTHAM SOUTH TERRACE LS16 4SW   11460162 24-JAN-16 21.20.29.00000
    100 COTHAM SOUTH TERRACE LS7 1GL    11461400 24-JAN-16 21.25.48.00000
    100 COTHAM SOUTH         LS20 1TO   11471921 24-JAN-16 21.26.53.00000
    100 COTHAM SOUTH DRIVE   LS1 5EJ    11471921 24-JAN-16 21.26.53.00000
    100 COTHAM SOUTH DRIVE   SG 3LO     11482461 24-JAN-16 21.28.05.00000
...

--However, if the change is more than about 120 hours ago...
select surname,first_forename,dob,ora_rowscn
     ,scn_to_timestamp(ora_rowscn) time_of_scn
from person where surname='KINOCK'
and DOB between sysdate -10000 and sysdate -9500
/

     ,scn_to_timestamp(ora_rowscn) time_of_scn
      *
ERROR at line 2:
ORA-08181: specified number is not a valid system change number
ORA-06512: at "SYS.SCN_TO_TIMESTAMP", line 1

-- which is a bit misleading as it was a valid SCN, just not for the SCN_TO_TIMESTAMP function
-- remove the column based on scn_to_timestamp...

select surname,first_forename,dob,ora_rowscn
     --,scn_to_timestamp(ora_rowscn) time_of_scn
from person where surname='KINOCK'
and DOB between sysdate -10000 and sysdate -9500

SURNAME         FIRST_FORENAME  DOB               ORA_ROWSCN
--------------- --------------- ----------------- ----------
KINOCK          ABIGAIL         22-APR-1989 00:00    2518996
KINOCK          FRANCESCA       23-APR-1989 00:00    2539749
KINOCK          GIANO           10-NOV-1989 00:00    2567890
KINOCK          GILLIAN         11-JAN-1990 00:00    2716278
...

Look, you really should go and look at Neil’s post: click here

Getting Your Transaction SCN – USERENV(COMMITSCN) January 19, 2016

Posted by mwidlake in development, performance, SQL.
Tags: , , ,
2 comments

A few days ago I was introduced (or re-introduced) to USERENV(‘COMMITSCN’) by Jonathan Lewis. This is an internal function that allows limited access to the SCN of your transaction.

I was trying to find a way to get the actual commit SCN easily as it struck me that Oracle would have it to hand somewhere and it would be unique to the change and generated very efficiently. I could not find anything to do it so I asked Jonathan and he pointed me straight to this post he did about it a while back. What a nice chap. However, the post is from 1999 (last CENTURY!) so I thought I should just check it out first…

I’ll say up front that this is an undocumented feature of a now-deprecated function. Well, almost undocumented – older SQL reference manuals mention that ‘commitscn’ returns a NUMBER as opposed to the VARCHAR2 returned by most parameters you can use with USERENV, but it does not list it as a valid parameter for that function.
USERENV has been deprecated since Oracle 10g (see the old 10g documentation link here about USERENV) and you have been instructed to use SYS_CONTEXT(‘userenv’,’parameter‘) as described in the 12c database SQL reference manual here. However, there is no way to get the commit SCN from SYS_CONTEXT that I can find, so I thought I’d check out if USERENV(‘COMMITSCN’) still works. It does, on my version of Oracle 12.1.0.2!

There are some strict limits to this function. To begin with, you can’t select it, you can only use it on insert/update:


-- attempt a simple select of the SCN
mdw> select userenv('commitscn') from dual;
select userenv('commitscn') from dual
               *
ERROR at line 1:
ORA-01725: USERENV('COMMITSCN')  not allowed here

--But I can use in an insert,
mdw> insert into test_scn (seq_no,vc1,scn_no)
  2  values (100,'abcd',userenv('commitscn'))

1 row created.

mdw> select * from test_scn where seq_no=100

    SEQ_NO VC1            SCN_NO
---------- ---------- ----------
       100 abcd         11144739

-- Now commit my new record
mdw> commit;
Commit complete.

mdw> select * from test_scn where seq_no=100
  2  /

    SEQ_NO VC1            SCN_NO
---------- ---------- ----------
       100 abcd         11144753

--LOOK at the value for SCN_NO now! Compare to before the commit!

If you look at the above you will see a couple of things. The first is that, as I said, you cannot SELECT the function USERENV(‘COMMITSCN’).
The other is, though a value is put into the column when I insert a row using it, and I see that when I query the information back, it changes when I commit. This is because Oracle is recording something at the point of commit, not at the point of the SQL statement running – and the new SCN is only generated when you commit. A lot could have happened since I did the INSERT, I might have gone for a cup of tea and a batch job kicked off doing 1 million transactions, each with it’s own SCN. So though a placeholder of the current SCN is put into your view of the table row, the value put in the actual table is generated at the time of the commit.

Another limiting rule is that you can only reference USERENV(‘COMMITSCN’) once in a transaction, for one row. If I try and create 2 rows using the function in the same transaction I get an error, if I try to update more than 1 row I get an error:

mdw> insert into test_scn (seq_no,vc1,scn_no)
  2  values (101,'abcd',userenv('commitscn'))

1 row created.

mdw> insert into test_scn (seq_no,vc1,scn_no)
  2  values (102,'abcd',userenv('commitscn'))

insert into test_scn (seq_no,vc1,scn_no)
            *
ERROR at line 1:
ORA-01721: USERENV(COMMITSCN) invoked more than once in a transaction

-- now test updating several records
mdw> commit;
Commit complete.

mdw> select * from test_scn;

    SEQ_NO VC1            SCN_NO
---------- ---------- ----------
         1 AAAAAAA      11143743
         2 AAAAAAA      11143746
         3 AAAAAAA      11143749
         4 AAAAAAA      11143774
         5 AAAAAAA      11143777
       100 abcd         11144753
       101 abcd         11145543

mdw> update test_scn set scn_no = userenv('commitscn');
update test_scn set scn_no = userenv('commitscn')
       *
ERROR at line 1:
ORA-01721: USERENV(COMMITSCN) invoked more than once in a transaction

-- but one row works
mdw> update test_scn set scn_no = userenv('commitscn') where rownum =1;
1 row updated.

USERENV(‘COMMITSCN’) is old, undocumented and limited in use. So why am I looking at it, let alone even telling you all about it? {Apart from the fact that a lot of you *love* this sort of obscure, tid-bitty stuff}. Well, as there is no replacement for it that I am aware of. You can get the current SCN in a couple of ways, the easiest probably being to get it from V$DATABASE:

mdw> select current_scn from v$database;
any key>

CURRENT_SCN
-----------
   11146718

However, that is the last SCN used at the time you check it and is not the SCN when you commit. ie it is a different thing. I always find it irksome on those odd occasions when something is deprecated in Oracle with nothing really to replace it.

I just demonstrate again that USERENV(‘COMMITSCN’) is a little special below, and not the same as just selecting SCN from V$DATABASE. Before I go any further, I think the value USERENV(‘COMMITSCN’) puts into the table is the actual COMMIT SCN minus 1. I mostly think this as Jonathan said so:-). I do see each time I run this test that the first select from V$DATABASE and then my insert and a commit straight away results in a value in the table 2 higher than the select.

Further iterations (2nd and 3rd in this case) show the value selected from V$DATABASE and the value inserted into TEST_SCN immediately after are the same, and are 3 higher than the previous iteration. I anticipated an increase of two, once for the change to the UNDO tablespace for the insert and once for the insert. I am not sure where the third one comes in.

However, in the fourth iteration I have a PAUSE in my SQL*Plus script between checking V$DATABASE and doing my insert and, in a second session, I do some simple inserts and commits {it does not matter what, so I don’t show it}. Thus the difference between the SCN I collected from V$DATABASE and the value inserted into the table.
Finally, in the fifth iteration, I check the value in V$DATABASE, do the insert, query it back and see the two are the same. And THEN I pause so I can do some changes and commit them in my second session. After I’ve done that I continue my first session which commits my latest insert into TEST_SCN. I check the values actually stored in the table and, just as at the top of this post, you see that the value actually preserved in the table is a later SCN than the placeholder one. It is doing something special.

(the below has the noddy code to create my test table and sequence as well as the test)

-- test_scn1
--create table test_scn (seq_no number,vc1 varchar2(10),scn_no number)
-- create sequence scn_seq;
select current_scn from v$database;
insert into test_scn values (scn_seq.nextval,'AAAAAAA',userenv('commitscn'));
commit;
select * from test_scn;
select current_scn from v$database;
insert into test_scn values (scn_seq.nextval,'AAAAAAA',userenv('commitscn'));
commit;
select * from test_scn;
select current_scn from v$database;
insert into test_scn values (scn_seq.nextval,'AAAAAAA',userenv('commitscn'));
commit;
select * from test_scn;
select current_scn from v$database;
pause 'commit some stuff in second session and then press any key'
insert into test_scn values (scn_seq.nextval,'AAAAAAA',userenv('commitscn'));
commit;
select * from test_scn;
select current_scn from v$database;
insert into test_scn values (scn_seq.nextval,'AAAAAAA',userenv('commitscn'));
pause 'commit some stuff in second session again and then press any key'
select * from test_scn;
commit;
select * from test_scn;
select current_scn from v$database;

-- the output of the test
mdw> @test_scn1

--check V$DATABASE SCN
CURRENT_SCN
-----------
   11147809

-- Create and commit 1 row
1 row created.
Commit complete.

    SEQ_NO VC1            SCN_NO
---------- ---------- ----------
        11 AAAAAAA      11147811
-- note that the inserted SCN is 2 higher than the current SCN.

--Same steps, 2nd iteration
CURRENT_SCN
-----------
   11147814

1 row created.
Commit complete.

    SEQ_NO VC1            SCN_NO
---------- ---------- ----------
        11 AAAAAAA      11147811
        12 AAAAAAA      11147814
-- select SCN and inserted SCN are the same, 3 higher than first iteration

-- same steps, 3rd iteration
CURRENT_SCN
-----------
   11147817

1 row created.
Commit complete.

    SEQ_NO VC1            SCN_NO
---------- ---------- ----------
        11 AAAAAAA      11147811
        12 AAAAAAA      11147814
        13 AAAAAAA      11147817
-- select SCN and inserted SCN are the same, 3 higher than first iteration

-- 4th iteration, a pause
CURRENT_SCN
-----------
   11147820

'commit some stuff in second session and then press any key'
-- I did indeed change and commit some stuff in second session, before I create my record in test_scn

1 row created.
Commit complete.

    SEQ_NO VC1            SCN_NO
---------- ---------- ----------
        11 AAAAAAA      11147811
        12 AAAAAAA      11147814
        13 AAAAAAA      11147817
        14 AAAAAAA      11147831
-- larger gap in SCN (11147817 to 11147831

-- 5th iteration, pause now after insert and before commit
CURRENT_SCN
-----------
   11147834

1 row created.
'commit some stuff in second session again and then press any key'
-- I did indeed change and commit some stuff in second session 

    SEQ_NO VC1            SCN_NO
---------- ---------- ----------
        11 AAAAAAA      11147811
        12 AAAAAAA      11147814
        13 AAAAAAA      11147817
        14 AAAAAAA      11147831
        15 AAAAAAA      11147834

-- Notice the current_scn from V$DATABASE and the last row in the table match, 11147834..

Commit complete.

    SEQ_NO VC1            SCN_NO
---------- ---------- ----------
        11 AAAAAAA      11147811
        12 AAAAAAA      11147814
        13 AAAAAAA      11147817
        14 AAAAAAA      11147831
        15 AAAAAAA      11147842

-- But after the commit the SCN in row "15" has increased to 11147842.

CURRENT_SCN
-----------
   11147851

-- and the next check of V$DATABASE SCN shows the usual increase of 3 by the commit.

As you can see from the above, USERENV(‘COMMITSCN’) is doing something a little special and, despite all the limitations, I might actually have a use for it…

Pragma UDF – Some Current Limitations November 11, 2015

Posted by mwidlake in performance, PL/SQL, SQL, Testing.
Tags: , , , ,
6 comments

There are currently some limitations to when pragma UDF will speed up your calls to PL/SQL functions from SQL.

In my post introducing the new pragma UDF feature of Oracle 12c I explained how it can be used to reduce the impact of context switching when you call a PL/SQL function from SQL.

In my example I showed how running a SQL-only SELECT statement that formatted a name for display over 100,000 records took 0.03 seconds went up to 0.33 seconds when the formatting SQL was put in a user defined PL/SQL function. This impact on performance is a shame as it is so beneficial to encapsulate business logic in one single place with PL/SQL. Stating that the PL/SQL function is a user defined one with the pragma UDF option reduced the run time to 0.08 seconds – which is removing most of the context switching overhead. Check out the prior post for full details.

This improvement in performance is great news and is as good, and sometimes better, than using the other new capability of 12c – allowing you to state a function as part of a SQL statement using the WITH clause, if you know about that (I plan to do a further post on that). As a quick example, here is my display name function code expressed within a WITH clause:

with 
  function l_disp_name(p_sn      in varchar2
                      ,p_fn1     in varchar2
                      ,p_fn2     in varchar2 :=null  
                      ,p_title   in varchar2 :=null )
return varchar2
is
v_return     varchar2(1000);
begin
  v_return := case when p_title is null then ''
                   else initcap(p_title)||' '
              end
            ||initcap(p_fn1)||' '
            ||case when p_fn2 is null then ''
                   else substr(p_fn2,1,1)||' '
              end
            ||initcap(p_sn);
return v_return;
end l_disp_name;
select  max(l_disp_name(p_sn =>surname           ,p_fn1  =>first_forename
                       ,p_fn2=>second_forename   ,p_title=>pers_title)        ) text_output
       ,count(*)
from pers
/

The above runs in 0.10 seconds, just slightly slower than 0.08 for my pragma UDF function

However, I need to warn you of some current limitations to pragma UDF. Nearly all the examples on the web so far are

  • using very, very simple functions that take in a number and return a number
  • Use a stand-alone stored function

And they work fine. However, I had real trouble getting a performance gain when I was working with my function that took in four varchar2 inputs and returned a varchar2 value. No error was given when I marked the function with pragma UDF but there was no performance gain (or loss).

I eventually worked out some limitations to pragma UDF on my version of Oracle – 12.1.0.2.0

  1. It gives a performance boost when the inputs and return values are NUMBER, VARCHAR2, multiple VARCHAR2 IN parameters
  2. There is no performance boost when either or both the IN parameter or RETURN value is a DATE
  3. There is no performance boost if there are any default values for VARCHAR2 IN parameters
  4. If the function gains a performance benefit from pragma UDF as a standalone stored function, it appears to also gain an advantage if it is a function defined as pragma UDF within a package – so you can still keep all your functions in packages.

You might notice that my example of using the WITH clause states a function that has default values. The WITH option gains the performance advantage of that feature just fine with IN parameter defaults.

The take-home message is that, at present, pragma UDF only seems to help functions with certain types of IN or RETURN values and is nullified by default values – so if you see no performance gain for your functions, this might be why. I need to stress that my tests were not exhaustive, I have not investigated many other combinations.

I’ve discussed the issue with a couple of people within Oracle and the relevant Product Manager is looking to investigate further for me, which is jolly decent of the fellow.

My investigation is of course only by empirical testing, it does not reveal how pragma UDF works. But, as I said in my first post, it seems to aid how information is passed between the PL/SQL and SQL engines as it is variation in those that seem to nullify the benefit of pragma UDF. If you want to duplicate my tests, you can do with the below scripts. I show my test output first, with comments produced with PROMPT commands. I then give you the SQL to create the test table, the functions and package I used and the test script. Feel free to take, expand and let me know of anything different or further you may find. I say nothing of interest after the scripts, so this is in effect the end of the post:-)

The output of my test, with prompts:

running udf_tests
investigating why pragam udf helps some simple functions and not others
---------------------------------------------------------------------- --

simple number in-number out function
NUMBER_OUTPUT   COUNT(*)                                                                            
------------- ----------                                                                            
  10000000000     100000                                                                            
Elapsed: 00:00:00.12

NUMBER_OUTPUT   COUNT(*)                                                                            
------------- ----------                                                                            
  10000000000     100000                                                                            
Elapsed: 00:00:00.03
** udf helps

simple varchar in-varchar out function
TEXT_OUTPUT                      COUNT(*)                                                           
------------------------------ ----------                                                           
XYYYY                              100000                                                           
Elapsed: 00:00:00.12

TEXT_OUTPUT                      COUNT(*)                                                           
------------------------------ ----------                                                           
XYYYY                              100000                                                           
Elapsed: 00:00:00.04
** udf helps

two varchar in-varchar out function, is the issue with more than one in parameter?
TEXT_OUTPUT                      COUNT(*)                                                           
------------------------------ ----------                                                           
XYYYYYYYYY                         100000                                                           
Elapsed: 00:00:00.14

TEXT_OUTPUT                      COUNT(*)                                                           
------------------------------ ----------                                                           
XYYYYYYYYY                         100000                                                           
Elapsed: 00:00:00.04
** udf helps

simple date in-date out function
DATE_OUTPUT            COUNT(*)                                                                     
-------------------- ----------                                                                     
14-MAY-2010 13:11        100000                                                                     
Elapsed: 00:00:00.15

DATE_OUTPUT            COUNT(*)                                                                     
-------------------- ----------                                                                     
21-NOV-2004 13:11        100000                                                                     
Elapsed: 00:00:00.15
***************************************************************SIMILAR TIME!!!
udf does not help

is date out the issue,  date in-num out function
NUMBER_OUTPUT   COUNT(*)                                                                            
------------- ----------                                                                            
      2454431     100000                                                                            
Elapsed: 00:00:00.17

NUMBER_OUTPUT   COUNT(*)                                                                            
------------- ----------                                                                            
      2454231     100000                                                                            
Elapsed: 00:00:00.18
***************************************************************SIMILAR TIME!!!
udf does not help

is date in the issue,  num in-date out function
DATE_OUTPUT            COUNT(*)                                                                     
-------------------- ----------                                                                     
07-AUG-2018 18:11        100000                                                                     
Elapsed: 00:00:00.21

DATE_OUTPUT            COUNT(*)                                                                     
-------------------- ----------                                                                     
11-NOV-2015 17:57        100000                                                                     
Elapsed: 00:00:00.21
***************************************************************SIMILAR TIME!!!
udf does not help

so back to my original function I had issues with
a difference with the multiple vcs in func and my orig func is my orig had defaults
thus I will try a version with no defaults
TEXT_OUTPUT                      COUNT(*)                                                           
------------------------------ ----------                                                           
Ms Wyyyyyyyyy W Wyyyyyyyyy         100000                                                           
Elapsed: 00:00:00.19

TEXT_OUTPUT                      COUNT(*)                                                           
------------------------------ ----------                                                           
Ms Wyyyyyyyyy W Wyyyyyyyyy         100000                                                           
Elapsed: 00:00:00.08
****************************************************************UDF has an IMPACT

now with one of the parameters set to a default
TEXT_OUTPUT                      COUNT(*)                                                           
------------------------------ ----------                                                           
Ms Wyyyyyyyyy W Wyyyyyyyyy         100000                                                           
Elapsed: 00:00:00.32

TEXT_OUTPUT                      COUNT(*)                                                           
------------------------------ ----------                                                           
Ms Wyyyyyyyyy W Wyyyyyyyyy         100000                                                           
Elapsed: 00:00:00.32
****************************************************************UDF has NO IMPACT
****************************************************************ALSO defaults cause both versions to be slower

now call the simple disp_name_udf function that benefits standalone from within a package

standalone
TEXT_OUTPUT                      COUNT(*)                                                           
------------------------------ ----------                                                           
Ms Wyyyyyyyyy W Wyyyyyyyyy         100000                                                           
Elapsed: 00:00:00.08

within package
TEXT_OUTPUT                      COUNT(*)                                                           
------------------------------ ----------                                                           
Ms Wyyyyyyyyy W Wyyyyyyyyy         100000                                                           
Elapsed: 00:00:00.08
*********************************************** WORKS - so long as neither spec of body have prm defaults

and just to round of, using a subquery factored function which my prior tests showed reduced overhead
TEXT_OUTPUT                      COUNT(*)                                                           
------------------------------ ----------                                                           
Ms Wyyyyyyyyy W Wyyyyyyyyy         100000                                                           
Elapsed: 00:00:00.10

the WITH function benefits even with defaults
TEXT_OUTPUT                      COUNT(*)                                                           
------------------------------ ----------                                                           
Ms Wyyyyyyyyy W Wyyyyyyyyy         100000                                                           
Elapsed: 00:00:00.10

Creating the test table

drop table pers;
create table pers
(pers_id             number(8)    not null
,surname             varchar2(30) not null
,first_forename      varchar2(30) not null
,second_forename     varchar2(30)
,pers_title          varchar2(10)
,sex_ind             char(1)      not null
,dob                 date
,addr_id             number(8)
,pers_comment        varchar2(2000)
)
/
insert into pers
select rownum 
      ,rpad(chr(65+mod(rownum,24)),10,chr(65+mod(rownum,25))) 
      ,rpad(chr(65+mod(rownum,24)),10,chr(65+mod(rownum,25))) @cre
      ,rpad(chr(65+mod(rownum,24)),10,chr(65+mod(rownum,25))) 
      ,decode(mod(rownum,4),0,'MR',1,'MRS',2,'Ms',3,'MR','DR')
      ,decode(mod(rownum,2),0,'M',1,'F')
      ,sysdate - (3000+mod(rownum,30000))
      ,rownum +1001
      ,rpad(chr(65+mod(rownum,24)),200,chr(65+mod(rownum,25))) 
from dual
connect by level < 100001
/

Creating the functions and a small package

--num_num
CREATE OR REPLACE FUNCTION normal_num_num(p_id IN NUMBER) RETURN NUMBER IS
v_num number;
BEGIN
  v_num:=p_id*p_id;
  RETURN v_num;
END;
/
CREATE OR REPLACE FUNCTION udf_num_num(p_id IN NUMBER) RETURN NUMBER IS
PRAGMA UDF;
v_num number;
BEGIN
  v_num:=p_id*p_id;
  RETURN v_num;
END;
/
--
-- vc_vc
CREATE OR REPLACE FUNCTION normal_vc_vc(p_id IN varchar2) RETURN varchar2 IS
v_vc varchar2(100);
BEGIN
  v_vc:=substr(p_id,1,5);
  RETURN v_vc;
END;
/
CREATE OR REPLACE FUNCTION udf_vc_vc(p_id IN varchar2) RETURN varchar2 IS
PRAGMA UDF;
v_vc varchar2(100);
BEGIN
  v_vc:=substr(p_id,1,5);
  RETURN v_vc;
END;
/
-- vc_vc_2
CREATE OR REPLACE FUNCTION normal_vc_vc_2(p_id1 IN varchar2,p_id2 IN varchar2) RETURN varchar2 IS
v_vc varchar2(100);
BEGIN
  v_vc:=substr(p_id1,1,5)||substr(p_id2,2,5);
  RETURN v_vc;
END;
/
CREATE OR REPLACE FUNCTION udf_vc_vc_2(p_id1 IN varchar2,p_id2 IN varchar2) RETURN varchar2 IS
PRAGMA UDF;
v_vc varchar2(100);
BEGIN
  v_vc:=substr(p_id1,1,5)||substr(p_id2,2,5);
  RETURN v_vc;
END;
/
--
-- dt_dt
CREATE OR REPLACE FUNCTION normal_dt_dt(p_id IN date) RETURN date IS
v_dt date;
BEGIN
  v_dt:=p_id+1000;
  RETURN v_dt;
END;
/
CREATE OR REPLACE FUNCTION udf_dt_dt(p_id IN date) RETURN date IS
PRAGMA UDF;
v_dt date;
BEGIN
  v_dt:=p_id-1000;
  RETURN v_dt;
END;
/
-- dt_num
CREATE OR REPLACE FUNCTION normal_dt_num(p_id IN date) RETURN number IS
v_num number;
BEGIN
  v_num:=to_char(p_id,'J')+100;
  RETURN v_num;
END;
/
CREATE OR REPLACE FUNCTION udf_dt_num(p_id IN date) RETURN number IS
PRAGMA UDF;
v_num number;
BEGIN
  v_num:=to_char(p_id,'J')-100;
  RETURN v_num;
END;
/
-- num_dt
CREATE OR REPLACE FUNCTION normal_num_dt(p_id IN number) RETURN DATE IS
v_dt date;
BEGIN
  v_dt:=sysdate+(p_id/100);
  RETURN v_dt;
END;
/
CREATE OR REPLACE FUNCTION udf_num_dt(p_id IN number) RETURN DATE IS
PRAGMA UDF;
v_dt date;
BEGIN
  v_dt:=sysdate-(p_id/100);
  RETURN v_dt;
END;
/
create or replace function normal_disp_name (p_sn      in varchar2
                       ,p_fn1     in varchar2
                       ,p_fn2     in varchar2  
                       ,p_title   in varchar2  ) return varchar2 is
v_return     varchar2(1000);
begin
  v_return := case when p_title is null then ''
                   else initcap(p_title)||' '
              end
            ||initcap(p_fn1)||' '
            ||case when p_fn2 is null then ''
                   else substr(p_fn2,1,1)||' '
              end
            ||initcap(p_sn);
return v_return;
end;
/
create or replace function udf_disp_name (p_sn      in varchar2
                       ,p_fn1     in varchar2
                       ,p_fn2     in varchar2  
                       ,p_title   in varchar2  ) return varchar2 is
PRAGMA UDF;
v_return     varchar2(1000);
begin
  v_return := case when p_title is null then ''
                   else initcap(p_title)||' '
              end
            ||initcap(p_fn1)||' '
            ||case when p_fn2 is null then ''
                   else substr(p_fn2,1,1)||' '
              end
            ||initcap(p_sn);
return v_return;
end;
/
create or replace function normal_disp_name_defaults (p_sn      in varchar2
                       ,p_fn1     in varchar2
                       ,p_fn2     in varchar2  
                       ,p_title   in varchar2 :=null  ) return varchar2 is
v_return     varchar2(1000);
begin
  v_return := case when p_title is null then ''
                   else initcap(p_title)||' '
              end
            ||initcap(p_fn1)||' '
            ||case when p_fn2 is null then ''
                   else substr(p_fn2,1,1)||' '
              end
            ||initcap(p_sn);
return v_return;
end;
/
create or replace function udf_disp_name_defaults (p_sn      in varchar2
                       ,p_fn1     in varchar2
                       ,p_fn2     in varchar2  
                       ,p_title   in varchar2 :=null ) return varchar2 is
PRAGMA UDF;
v_return     varchar2(1000);
begin
  v_return := case when p_title is null then ''
                   else initcap(p_title)||' '
              end
            ||initcap(p_fn1)||' '
            ||case when p_fn2 is null then ''
                   else substr(p_fn2,1,1)||' '
              end
            ||initcap(p_sn);
return v_return;
end;
/
create or replace package t_pkg as
function udf_disp_name (p_sn      in varchar2
                       ,p_fn1     in varchar2
                       ,p_fn2     in varchar2  
                       ,p_title   in varchar2 ) return varchar2;
end t_pkg;
/
create or replace package body t_pkg as
function udf_disp_name (p_sn      in varchar2
                       ,p_fn1     in varchar2
                       ,p_fn2     in varchar2  
                       ,p_title   in varchar2 ) return varchar2 is
PRAGMA UDF;
v_return     varchar2(1000);
begin
  v_return := case when p_title is null then ''
                   else initcap(p_title)||' '
              end
            ||initcap(p_fn1)||' '
            ||case when p_fn2 is null then ''
                   else substr(p_fn2,1,1)||' '
              end
            ||initcap(p_sn);
return v_return;
end;
end t_pkg;
/

The test script

-- udf_tests
set lines 100 pages 50
set feed off
col text_output     form a30
col number_output   form 99999999999
col date_output     form a20
spool udf_tests.lst
prompt  running udf_tests
prompt
set pause off
set autotrace off
set timi on
prompt investigating why pragam udf helps some simple functions and not others
prompt ----------------------------------------------------------------------- 
--
prompt
--
prompt simple number in-number out function
select /* mdw_16a */       max(normal_num_num(pers_id)) number_output
       ,count(*) from pers
/
select /* mdw_16b */        max(udf_num_num(pers_id)) number_output
       ,count(*)  from pers
/
prompt
prompt ** udf helps
prompt
--
prompt simple varchar in-varchar out function
select /* mdw_16c */       max(normal_vc_vc(surname)) text_output
       ,count(*) from pers
/
select /* mdw_16d */       max(udf_vc_vc(surname)) text_output
       ,count(*)  from pers
/
prompt
prompt ** udf helps
prompt
--
--
prompt  two varchar in-varchar out function, is the issue with more than one in parameter?
select /* mdw_16e */       max(normal_vc_vc_2(surname,first_forename)) text_output
       ,count(*) from pers
/
select /* mdw_16f */       max(udf_vc_vc_2(surname,first_forename)) text_output
       ,count(*)  from pers
/
prompt
prompt ** udf helps
prompt
--
prompt simple date in-date out function
select /* mdw_16g */       max(normal_dt_dt(DOB)) date_output
       ,count(*) from pers
/
select /* mdw_16h */       max(udf_dt_dt(DOB)) date_output
       ,count(*)  from pers
/
prompt ***************************************************************SIMILAR TIME!!!
prompt udf does not help
prompt

--
prompt is date out the issue,  date in-num out function
select /* mdw_16i */       max(normal_dt_num(DOB)) number_output
       ,count(*) from pers
/
select /* mdw_16j */       max(udf_dt_num(DOB)) number_output
       ,count(*)  from pers
/
prompt ***************************************************************SIMILAR TIME!!!
prompt udf does not help

--
prompt is date in the issue,  num in-date out function
select /* mdw_16k */       max(normal_num_dt(pers_id)) date_output
       ,count(*) from pers
/
select /* mdw_16l */       max(udf_num_dt(pers_id)) date_output
       ,count(*)  from pers
/
prompt ***************************************************************SIMILAR TIME!!!
prompt udf does not help
--
--
prompt
prompt so back to my original function I had issues with
prompt a difference with the multiple vcs in func and my orig func is my orig had defaults
prompt thus I will try a version with no defaults
prompt 
select /* mdw_16m */
        max(normal_disp_name(surname,first_forename,second_forename,pers_title)) text_output
       ,count(*) from pers
/
select /* mdw_16n */
        max(udf_disp_name(surname,first_forename,second_forename,pers_title)) text_output
       ,count(*)  from pers
/
prompt ****************************************************************UDF has an IMPACT
prompt
prompt
prompt now with one of the parameters set to a default 
select /* mdw_16o */
        max(normal_disp_name_defaults(surname,first_forename,second_forename,pers_title)) text_output
       ,count(*) from pers
/
select /* mdw_16p */
        max(udf_disp_name_defaults(surname,first_forename,second_forename,pers_title)) text_output
       ,count(*)  from pers
/
prompt ****************************************************************UDF has NO IMPACT
prompt ****************************************************************ALSO defaults cause both versions to be slower
prompt 
prompt now call the simple disp_name_udf function that benefits standalone from within a package
prompt
prompt standalone
select /* mdw_16q */
        max(udf_disp_name(surname,first_forename,second_forename,pers_title)) text_output
       ,count(*)  from pers
/
prompt
prompt within package
select /* mdw_16r */
        max(t_pkg.udf_disp_name(surname,first_forename,second_forename,pers_title)) text_output
       ,count(*)  from pers
/
prompt *********************************************** WORKS - so long as neither spec of body have prm defaults
prompt
prompt and just to round of, using a subquery factored function which my prior tests showed reduced overhead
with 
  function l_disp_name(p_sn      in varchar2
                      ,p_fn1     in varchar2
                      ,p_fn2     in varchar2  
                      ,p_title   in varchar2 )
return varchar2
is
v_return     varchar2(1000);
begin
  v_return := case when p_title is null then ''
                   else initcap(p_title)||' '
              end
            ||initcap(p_fn1)||' '
            ||case when p_fn2 is null then ''
                   else substr(p_fn2,1,1)||' '
              end
            ||initcap(p_sn);
return v_return;
end l_disp_name;
select /*mdw_16s */
        max(l_disp_name(p_sn =>surname           ,p_fn1  =>first_forename
                       ,p_fn2=>second_forename   ,p_title=>pers_title)        ) text_output
       ,count(*)
from pers
/

prompt the WITH function benefits even with defaults
with 
  function l_disp_name(p_sn      in varchar2
                      ,p_fn1     in varchar2
                      ,p_fn2     in varchar2 :=null  
                      ,p_title   in varchar2 :=null )
return varchar2
is
v_return     varchar2(1000);
begin
  v_return := case when p_title is null then ''
                   else initcap(p_title)||' '
              end
            ||initcap(p_fn1)||' '
            ||case when p_fn2 is null then ''
                   else substr(p_fn2,1,1)||' '
              end
            ||initcap(p_sn);
return v_return;
end l_disp_name;
select /*mdw_16t */
        max(l_disp_name(p_sn =>surname           ,p_fn1  =>first_forename
                       ,p_fn2=>second_forename   ,p_title=>pers_title)        ) text_output
       ,count(*)
from pers
/
--
spool off

Pragma UDF – Speeding Up your PL/SQL Functions Called From SQL November 4, 2015

Posted by mwidlake in performance, PL/SQL, SQL.
Tags: , , ,
6 comments

A new feature for PL/SQL was introduced in V12, pragma UDF. UDF stands for User Defined Functions. It can speed up any SQL you have that uses PL/SQL functions you created yourself.

{please see this second post on some limitations of pragma UDF in respect of IN & RETURN data types and parameter defaults}.

We can create our own functions in PL/SQL and they can be called from both PL/SQL and SQL. This has been possible since V7.3 and is used extensively by some sites to extend the capabilities of the database and encapsulate business logic.

A problem with this, though, is that every time you swap from SQL to PL/SQL (or the other way around) you have to do a context switch each time, which can be quite cpu and memory intensive. If you are using your own PL/SQL function in the SELECT list of a SQL statement and you are selecting a lot of rows (say as part of a business report) then the overhead can be quite considerable as you could be doing a context switch per row. I won’t go into too much detail here (partly as I go in to considerable detail on the subject in a book I am working on for 2016) on how you can investigate the context switching and when exactly it occurs, but I will show you one of the two new ways in Oracle 12 to reduce the overhead, namely PRAGMA UDF. At present this seems to be a little used and rarely-mentioned feature on the blogsphere, with articles just covering simple examples of almost no-business-function, numeric functions.

I’ll give you a slightly less simple example but my next post will give you details on some limitations of pragma UDF. Here I am just setting the scene. I have the below PERSON table which has the parts of the names in distinct columns, with the contents forced to upper case (as is standard practice). We will create a function to provide a nicely init-capped and spaced display name and a second function which is identical but uses PRAGMA UDF.

PERSON
Name                                     Null?    Type
---------------------------------------- -------- ---------------
PERS_ID                                  NOT NULL NUMBER(8)
SURNAME                                  NOT NULL VARCHAR2(30)
FIRST_FORENAME                           NOT NULL VARCHAR2(30)
SECOND_FORENAME                                   VARCHAR2(30)
PERS_TITLE                                        VARCHAR2(10)
SEX_IND                                  NOT NULL CHAR(1)
DOB                                               DATE
ADDR_ID                                           NUMBER(8)
STAFF_IND                                         CHAR(1)
LAST_CONTACT_ID                                   NUMBER(8)
PERS_COMMENT                                      VARCHAR2(2000)


create or replace function normal_disp_name (p_sn      in varchar2
                                            ,p_fn1     in varchar2
                                            ,p_fn2     in varchar2  
                                            ,p_title   in varchar2  ) return varchar2 is
v_return     varchar2(1000);
begin
  v_return := case when p_title is null then ''
                   else initcap(p_title)||' '
              end
            ||initcap(p_fn1)||' '
            ||case when p_fn2 is null then ''
                   else substr(p_fn2,1,1)||' '
              end
            ||initcap(p_sn);
return v_return;
end;
/
create or replace function udf_disp_name (p_sn      in varchar2
                                         ,p_fn1     in varchar2
                                         ,p_fn2     in varchar2  
                                         ,p_title   in varchar2  ) return varchar2 is
-- The Below is the KEY bit
PRAGMA UDF;
v_return     varchar2(1000);
-- {Identical from here}

-- select some data with one of the functions, it does not matter which
select pers_title title,    first_forename    ,second_forename    , surname
      ,normal_disp_name(p_sn =>surname           ,p_fn1  =>first_forename
                       ,p_fn2=>second_forename   ,p_title=>pers_title) display_name
from person
...


TITLE   first_fn   secon_fn   SURNAME         DISPLAY_NAME
------- ---------- ---------- --------------- ----------------------------
MR      HARRISON   RICHARD    HARRIS          Mr Harrison R Harris
MRS     ANNEKA     RACHAEL    HARRIS          Mrs Anneka R Harris
MRS     NICKIE     ALISON     ELWIG           Mrs Nickie A Elwig
MASTER  JAMES      DENZIL     ELWIG           Master James D Elwig
MR      JEFF                  GARCIA          Mr Jeff Garcia
...
MRS     AMELIA     MARIA      ORPINGTON-SMYTH Mrs Amelia M Orpington-Smyth

So we have our test table, you can see my normal_disp_name function and that the *only* difference with the second version is the inclusion of PRAGMA_UDF in the declaration section. That is partly why it is such a nice feature, you can just add this one line to existing code and you should get the benefit. Should….. {see second post when I do it}

Finally, I show some code using the function and the output.

To demonstrate the impact of context switching I will select 100,000 records from my test table in 3 ways: using only native SQL functions and thus no context switching; using my traditional PL/SQL function which suffers from context switching; with my new “pragma UDF” function to reduce the overhead of the context switching.

select avg(length(
      case when pers_title is null then ''
                   else initcap(pers_title)||' '
              end
            ||initcap(first_forename)||' '
            ||case when second_forename is null then ''
                   else substr(second_forename,1,1)||' '
              end
            ||initcap(surname)
          )      )  avg_name_length
       ,count(*)
from person
where pers_id > 100100
and rownum < 100000

select  avg(length(normal_disp_name(p_sn =>surname           ,p_fn1  =>first_forename
                                   ,p_fn2=>second_forename   ,p_title=>pers_title)        ) ) disp_name_len
       ,avg(addr_id)
       ,count(*)
from person
where pers_id > 100100 and rownum < 100000

select  avg(length(udf_disp_name(p_sn =>surname           ,p_fn1  =>first_forename
                                ,p_fn2=>second_forename   ,p_title=>pers_title)        ) ) disp_name_len
       ,avg(addr_id)
       ,count(*)
from person
where pers_id > 100100 and rownum < 100000

One thing to mention in passing is that the code using either function is much easier to read and self-documenting. This is one of the benefits of proceduralising your code, as well as creating just one place to maintain it. If I had to re-use that native-SQL section in a half-dozen reports I would probably mess up at least one of the times I cut-and-paste it and I would now have several places to maintain that code.

I ran the test statements several times and took the average of the 2nd to 6th runs, to remove the initial parsing & caching overhead that comes with the first execution and to get more reliable figures than one further run would give me.

Version                      Run Time average (secs)
Native SQL                   0.03
Traditional PL/SQL           0.33
PRAGMA UDF PL/SQL            0.08

As you can see, just including PRAGMA UDF removed most of the overhead caused by context switching.

How does PRAGMA UDF work? I’m not sure, the official Oracle documentation is pretty light on it and just says:

“The UDF pragma tells the compiler that the PL/SQL unit is a user defined function that is used primarily in SQL statements, which might improve its performance

Note the italics (which are mine). “Might improve its performance” but no detail as to what it does. As I understand it, it alters the internal representation of data as it is passed between the SQL and PL/SQL engines via the IN and RETURN values (Note it does not change the data types!) – but treat that as a bit of wild speculation for now. I have some evidence for it that you will see in, yes, the next post.

Pragma UDF can slow down slightly functions being called directly from PL/SQL. So use it only for functions you know will be called from SQL.

I’ll make one other observation. Using PL/SQL functions increased the run time to process 100,000 records on my modest test system by all of 0.3 seconds. But that is 10 times the time taken for the native SQL statement. Pragma UDF removes around 80% of this overhead. It’s a nice saving but is probably inconsequential if your code is actually doing any physical IO at all (my example is processing already cached blocks). And if you are only processing a few records or one record in a GUI screen, the context switching is moot {meaning, of no significance}.

But if you have code that processes a huge set of data and uses a lot of user defined PL/SQL functions (and again I go into a lot more detail about this in the book) using pragma UDF in 12C could gain you quite a bit of extra performance. If you have code where even 0.00001 seconds is important (think trading systems) then again there may be a worthwhile benefit.

PL/SQL bug with DBMS_RANDOM? October 8, 2015

Posted by mwidlake in bug, PL/SQL, SQL.
Tags: , , ,
7 comments

I think I’ve found an (admittedly obscure) bug with DBMS_RANDOM, group functions, PL/SQL and/or SQL.

Have a look and see if you also think this is odd – or have I missed the totally obvious?

(This is all on 12.1.0.2)

{Update – my conclusion is, and thanks to Joel and Sayan for their comments, that this is not a “bug”. Oracle do not promise us how PL/SQL functions are executed due to the way SQL can be re-written by the parser. I think most of us stumbling over something like this would treat it as a bug though. You have to look at the column projection, again see the comments, to see how Oracle is deciding to get the columns derived by a naked call to DBMS_RANDOM.VALUE (by naked I mean no inclusion of parameters passed in and, significantly, no reference to columns). It’s just the way it is}

Without going into the details (we would be here for hours if I did) I’m looking into the overhead of context switching between PL/SQL and SQL. It it fairly common knowledge that when you call a PL/SQL function from SQL there is a context switch when the SQL engine hands over control to the PL/SQL engine. I’ve been doing some work into how much the overhead is and that it is incurred for each distinct PL/SQL function (plus loads of other considerations around it).

In doing so I saw something unexpected (to me, anyway) which I have simplified down to this:

select /* mdw_z1 */ avg(dbms_random.value) avg_dbrav1,avg(dbms_random.value) avg_dbrav2
      ,max(created) max_cre
from all_objects
where rownum <50000

AVG_DBRAV1 AVG_DBRAV2 MAX_CRE
---------- ---------- -----------------
.502234063 .502234063 11-SEP-2014 09:31
Elapsed: 00:00:00.89

Note that the two averages of DBMS_RANDOM.VALUE are exactly the same. It is so improbably as to be pretty much impossible that over all those rows, the different random values generated for each column add up to exactly the same. They are getting the same values for each row. I’m very, very dubious of any “identical seeding” issue (ie they both get the same seed and from then provide identical values) as even if DBMS_RANDOM is basing it’s output on something like initial seed, SCN of statement and number of iterations, it is still being referenced twice for each row.

Some further evidence is that when I increase the number of calls to DBMS_RANDOM the elapsed time is almost identical and the statement CPU and PLSQL_EXEC_TIME (pulled from V$SQL) do not increase in any significant way (PLSQL_EXEC_TIME actually goes down):

select /* mdw_z2 */ avg(dbms_random.value) avg_dbrav1,avg(dbms_random.value) avg_dbrav2
       ,avg(dbms_random.value) avg_dbrav3,avg(dbms_random.value) avg_dbrav4
       ,avg(dbms_random.value) avg_dbrav5,avg(dbms_random.value) avg_dbrav6
      ,max(created) max_cre
from all_objects
where rownum <50000

AVG_DBRAV1 AVG_DBRAV2 AVG_DBRAV3 AVG_DBRAV4 AVG_DBRAV5 AVG_DBRAV6 MAX_CRE
---------- ---------- ---------- ---------- ---------- ---------- -----------------
.500367568 .500367568 .500367568 .500367568 .500367568 .500367568 11-SEP-2014 09:31
Elapsed: 00:00:00.84 -- (cf .89 before)

-- information about the two SQL statements from v$SQL, identified by my comments
SQL_ID        PLAN_HASHV  PRSE  EXCS     BUFFS DISCS     CPU_T PLSQL_EXEC    RWS
------------- ---------- ----- ----- --------- ----- --------- ---------- ------
SQL_TXT
--------------------------------------------------------------------------------
4y4jsj7uy12t3 1587414607     1     1     41481     0    843750     312691      1
select /* mdw_z2 */ avg(dbms_random.value) avg_dbrav1,avg(dbms_random.value) avg
gpdga3qars8p2 1587414607     1     1     41481     0    828125     316648      1
select /* mdw_z1 */ avg(dbms_random.value) avg_dbrav1,avg(dbms_random.value) avg

I’ve verified that adding other PL/SQL calls adds about 0.4 seconds to the execution time per extra (simple) PL/SQL function and you see the CPU Time and PLSQL_EXEC_TIME increase.

It is as if Oracle realises the call to DBMS_RANDOM is the same and so does it once per row and puts the results into each column. It does this for other deterministic PL/SQL functions but I would (a) be worried if DBMS_RANDOM was being treated as deterministic (as it’s purpose is to NOT be deterministic) and (b) I know I have used DBMS_RANDOM to generate test data many times, often for several columns in my generating SQL SELECT statement, and I never noticed this before. So, I decided to check that it was only returning 1 value per row to populate the results for each column derived from DBMS_RANDOM.VALUE:


select dbms_random.value drv1, dbms_random.value drv2,created
from all_objects where rownum < 6

      DRV1       DRV2 CREATED
---------- ---------- -----------------
.341919389 .020497964 11-SEP-2014 08:40
.569447631 .727672346 11-SEP-2014 08:40
.019986319 .709239586 11-SEP-2014 08:40
.286970852 .144263004 11-SEP-2014 08:40
 .14440676 .538196808 11-SEP-2014 08:40

Huh? That rather damages my theory and actually works the way I expected, hoped for and thought I remembered. And you can just look at the two columns and you know they are not going to add up to exactly the same! (the last digit adds up to 21 for DRV1 , 28 for DRV2) {How many of you checked that and got 27 for DRV1?}.

So I wrote some code that should be logically identical to my original SQL statement but does the data collection “manually”:

with source as
(select /*+ materialize */
        dbms_random.value  dbrav1    ,dbms_random.value dbrav2
       ,created
from all_objects
where rownum <50000
)
select/* mdw_z3 */ avg(dbrav1)  avg_dbrav1     ,avg(dbrav2)  avg_dbrav2
      ,max(created)  max_cre
from source

AVG_DBRAV1 AVG_DBRAV2 MAX_CRE
---------- ---------- -----------------
.497954489 .497633494 11-SEP-2014 09:31

Elapsed: 00:00:00.96

As you can see, all I do is force the data to be collected into an internal temporary table using the WITH clause and hint it to stop oracle merging the code together and then average the columns. And now I get two different values for the two DBMS_RANDOM.VALUE derived columns.

This version of the code also accrues more runtime and statement CPU/PLSQL_EXEC_TIME, as I mentioned above, when I add more PL/SQL calls. In the below the extended list of “columns” version takes 1.78 seconds, CPU time increases from 843,750 microseconds to 1,703,125 and PLSQL_EXEC_TIME increases from 316,589 microseconds to 917,208

with source /*2 */ as
(select /*+ materialize */
        dbms_random.value  dbrav1    ,dbms_random.value dbrav2
       ,dbms_random.value  dbrav3    ,dbms_random.value dbrav4
       ,dbms_random.value  dbrav5    ,dbms_random.value dbrav6
       ,created
from all_objects
where rownum <50000
)
select/* mdw_z4 */ avg(dbrav1)  avg_dbrav1     ,avg(dbrav2)  avg_dbrav2
      ,avg(dbrav3)  avg_dbrav3     ,avg(dbrav4)  avg_dbrav4
      ,avg(dbrav5)  avg_dbrav5     ,avg(dbrav6)  avg_dbrav6
      ,max(created)  max_cre
from source

AVG_DBRAV1 AVG_DBRAV2 AVG_DBRAV3 AVG_DBRAV4 AVG_DBRAV5 AVG_DBRAV6 MAX_CRE
---------- ---------- ---------- ---------- ---------- ---------- -----------------
.499007362 .501985119 .498591643  .50252316  .49939127  .49804233 11-SEP-2014 09:31

Elapsed: 00:00:01.78

--
--
SQL_ID        PLAN_HASHV  PRSE  EXCS     BUFFS DISCS     CPU_T PLSQL_EXEC    RWS
------------- ---------- ----- ----- --------- ----- --------- ---------- ------
SQL_TXT
--------------------------------------------------------------------------------
49zjaaj41dg00 3034557986     1     1     43465   962   1703125     917208      1
with source /*2 */ as (select /*+ materialize */         dbms_random.value  dbra
0rtbx97f14b0k 3034557986     1     1     42294   382    843750     316586      1
with source as (select /*+ materialize */         dbms_random.value  dbrav1    ,

I did have the execution plans in here too but the post is already quite long. They are identical though, as is shown by the same value of 3034557986 for the PLAN_HASH_VALUE for both statements

So in Summary, the below two versions of the (logically identical as far as I can see) code give different results. The difference is that one is allowing Oracle to do the averaging natively and in the other I am forcing the data to be collected into an internal temporary table and then averaged:

select /* mdw_z1 */ avg(dbms_random.value) avg_dbrav1,avg(dbms_random.value) avg_dbrav2
      ,max(created) max_cre
from all_objects
where rownum <50000

AVG_DBRAV1 AVG_DBRAV2 MAX_CRE
---------- ---------- -----------------
.502234063 .502234063 11-SEP-2014 09:31

with source as
(select /*+ materialize */
        dbms_random.value  dbrav1    ,dbms_random.value dbrav2
       ,created
from all_objects
where rownum <50000
)
select/* mdw_z3 */ avg(dbrav1)  avg_dbrav1     ,avg(dbrav2)  avg_dbrav2
      ,max(created)  max_cre
from source

AVG_DBRAV1 AVG_DBRAV2 MAX_CRE
---------- ---------- -----------------
.497954489 .497633494 11-SEP-2014 09:31

If no one can explain what I am missing, I suppose I should raise a bug with Oracle. Which could be tricky seeing as my access to MyOracleSupport is a bit tenuous…

In case you want to play, this is my whole test script, which does everything but query V$SQL for the statement stats at the end. I am sure you can manage that yourselves…

-- the two columns from dbms_random get the same result - which I did not expect
select /* mdw_z1 */ avg(dbms_random.value) avg_dbrav1,avg(dbms_random.value) avg_dbrav2
      ,max(created) max_cre
from all_objects
where rownum <50000
/

select /* mdw_z2 */ avg(dbms_random.value) avg_dbrav1,avg(dbms_random.value) avg_dbrav2
       ,avg(dbms_random.value) avg_dbrav3,avg(dbms_random.value) avg_dbrav4
       ,avg(dbms_random.value) avg_dbrav5,avg(dbms_random.value) avg_dbrav6
      ,max(created) max_cre
from all_objects
where rownum <50000
/
--
-- The below forces oracle to gather the data into an internal temporary table first.
-- and then I average that
with source as
(select /*+ materialize */
        dbms_random.value  dbrav1    ,dbms_random.value dbrav2
       ,created
from all_objects
where rownum <50000
)
select/* mdw_z3 */ avg(dbrav1)  avg_dbrav1     ,avg(dbrav2)  avg_dbrav2
      ,max(created)  max_cre
from source
/
-- and now I make it do more executions against dbms_random, to see if makes a difference
-- which will lend support to my idea it is doing more contect switching.
with source /*2 */ as
(select /*+ materialize */
        dbms_random.value  dbrav1    ,dbms_random.value dbrav2
       ,dbms_random.value  dbrav3    ,dbms_random.value dbrav4
       ,dbms_random.value  dbrav5    ,dbms_random.value dbrav6
       ,created
from all_objects
where rownum <50000
)
select/* mdw_z4 */ avg(dbrav1)  avg_dbrav1     ,avg(dbrav2)  avg_dbrav2
      ,avg(dbrav3)  avg_dbrav3     ,avg(dbrav4)  avg_dbrav4
      ,avg(dbrav5)  avg_dbrav5     ,avg(dbrav6)  avg_dbrav6
      ,max(created)  max_cre
from source
/

Audio semi-Visual Presentation on Clustering Data in Oracle November 12, 2014

Posted by mwidlake in performance, Presenting, SQL.
Tags: , , ,
add a comment

I suppose it had to happen eventually but one of my presentations has ended up on YouTube. It’s a recent presentation I did for the Oracle Midlands user group in September.

The topic is (as the title of this blog post hints at!)Boosting select performance by clustering data. The video consists of the slides I presented, changing as the presentation progresses, with my audio over the top. It goes on for a bit, close to an hour, but you could watch a section and then go and do something else before watching a bit more.

I have to say, it is very odd hearing my voice (and the slight touch of the “brummie” {Birmingham} accent coming through) and I do wince at the places where I blather or say something slightly wrong or make a joke that involved a visual element that is lost. Oh well, at least you don’t see me wandering around and jumping up,literally, to point out bits on the slides.

I’m glad to say I will be repeating a slightly more polished version of the presentation at this year’s UKOUG Tech14 conference in December. I was a bit under the weather back on September the 16th, due to having just returned to the Working Life, and with this perfect example of what I did then I should be able to make the next shot at it a cracker… maybe.

On the topic of Oracle Midlands, I like this user group that is run by Mike Mckay Dirden, the meetings consist of evening presentations in Birmingham with a bit of support from Redgate. This includes half-time samosas to keep you going. The next meeting is described here and is on Tuesday 25th November. This meeting has two presentations by my friend Bjoern Rost, who is an Oracle Ace Director {gasps of appreciation from the audience} and a consummate presenter. I wish I could make it there as I would like to share a pint with Bjoern (well, he’ll have wine or a cocktail I suspect as he is not a beer fan) as well as some of my other friends up that part of the country.

What Day Is It If You Only Specify The Time? November 6, 2014

Posted by mwidlake in SQL.
Tags: ,
4 comments

What is the date if you only specify the time when you populate an Oracle date column (or variable)?

That was the question that came up a few days ago in the little DBA team I am currently working in. Of course, the question was posed by the “junior” (who is damned smart at this stuff as he keeps asking questions like that) and the answer from us two old hands was… “Ohh!… hang on… errr….”

A little discussion then occurred. One of us suggested it would be “today”. One of us suggested it would be the Julian 1 date (4712BC). Both of us then stated it was an easy thing to test and the PROPER answer was “just try it and then you will know for sure”. We were right {and, of course, wrong} – as in with a simple question like this you don’t need to google the question (so “hello” to anyone googling the question and finding this page!) you just try it:

The junior tried it and…

select sysdate,to_char(to_date('11:23:15','HH24:MI:SS'),'DD-MON-YYYY HH24:MI:SS') time_only
from dual;

SYSDATE              TIME_ONLY
-------------------- --------------------
05-NOV-2014 22:37:23 01-NOV-2014 11:23:15

The above of course shows that us two old hands were wrong in our suggestions of what the default date would be

The answer is that:

If you do not specify the date, it defaults to *the first day of the current month*

How odd. How amusing. What was more amusing was the two of us old hands looked at the answer and we both said “oh yes! I remember learning that before. Maybe a couple of times..”.

Maybe I am wrong and most of you reading this learned what the default date is early in your Oracle experience and never forgot it {or have learned it from here and will not forget it now, so long as you remain in possession of your wits} but both of us tired old sods laughed over the fact we knew we had known that and forgotten it. And when asked, came up with reasonable, but wrong, suggestions to the the answer. But at least we both knew it was one of those “you can answer it almost as quickly as it takes to google it” questions and the proper answer was to do a 1 minute test.

A quick check on a table in one of our applications that holds the date and time of an event in two columns (a slightly mad but common situation) demonstrated it nicely too:

select action_date,action_time
from source_table
where action_time is not null
ACTION_DATE           ACTION_TIME
-------------------- --------------------
09-JUN-2011 00:00:00 01-OCT-2014 11:45:30
09-DEC-2012 00:00:00 01-OCT-2014 11:12:13
09-DEC-2012 00:00:00 01-OCT-2014 17:05:57
13-JUN-2013 00:00:00 01-OCT-2014 16:25:17
17-JUN-2013 00:00:00 01-OCT-2014 16:39:00
20-JUN-2013 00:00:00 01-OCT-2014 13:00:00
25-SEP-2014 00:00:00 01-NOV-2014 08:59:00
03-NOV-2014 00:00:00 01-NOV-2014 09:00:00
03-NOV-2014 00:00:00 01-NOV-2014 00:00:00

So, if you do not specify the date, Oracle substitutes the first day of the current month. It is fully documented in the overview of the date datatype

Of course, if you do not state the time portion of a date, it defaults to the start of the current hour.

Only kidding, it of course defaults to midnight, though given how the date portion defaults my hour suggestion would almost make sense.

select sysdate,to_char(to_date('15-OCT-2013','DD-MON-YYYY'),'DD-MON-YYYY HH24:MI:SS') date_only
from dual;

SYSDATE              DATE_ONLY
-------------------- --------------------
06-NOV-14            15-OCT-2013 00:00:00

I wonder what other basic pieces of Oracle Info have left my head and if it is more or less than the average person who has been doing this for 25 years?

Row Level Security Part 2 – permissions November 20, 2012

Posted by mwidlake in internals, security, SQL.
Tags: , ,
3 comments

<..Part 1, introduction..
..Part 3 summary in pictures..>

In this second post on the topic of “an introduction to Row Level Security” I want to cover a few things about what permissions you need to implement RLS and some of the consequences. In my introduction in part one I just said my main user has “DBA type Privileges”.

{NB This is all on Oracle V11.2 and I believe everything below is applicable to V10 as well. Also, I should point out that I am not an Oracle security expert – but despite repeatedly saying this, it seems like at least once a year I am asked to improve a system’s security on the grounds of “more than we have now is an improvement”}.

Firstly, you need the permission to create and drop RLS policies. This is controlled by having the EXECUTE privilege on SYS.DBMS_RLS. That is it. If you can execute this package you can create and drop RLS Policies, enable and disable them and also create, manage and use Policy Groups.

Obviously EXECUTE on SYS.DBMS_RLS is a privilege you need to restrict to only “empowered” accounts- but you also want to be very careful about your definition of empowered. If an account has this privilege, it can alter (disable!) any RLS policies that ANYONE created on the instance. You may need to think carefully about having several accounts that can alter all the RLS policies across all schemas in your instance.

You might want to create a single user that has the EXECUTE on SYS.DBMS_RLS and use that to implement and control RLS across the instance {and even have the account locked when it is not needed}. Of course, under normal Oracle anyone with SYS access can run the procedure – but you have SYS access under control already, don’t you?…:-).

By default, who has this privilege?

MDW> @obj_privs
Enter value for obj_name: dbms_rls
Enter value for obj_own: sys
GRANTEE              OWNER        TABLE_NAME             GRANTOR  PRIVILEGE    GRA
-------------------- ------------ ---------------------- -------- ------------ ---
EXECUTE_CATALOG_ROLE SYS          DBMS_RLS               SYS      EXECUTE      NO
XDB                  SYS          DBMS_RLS               SYS      EXECUTE      NO
WMSYS                SYS          DBMS_RLS               SYS      EXECUTE      NO

--
-- and who has that EXECUTE_CATALOG_ROLE?
--
select * from dba_role_privs where granted_role='EXECUTE_CATALOG_ROLE'

GRANTEE                        GRANTED_ROLE                   ADM DEF
------------------------------ ------------------------------ --- ---
DBA                            EXECUTE_CATALOG_ROLE           YES YES
IMP_FULL_DATABASE              EXECUTE_CATALOG_ROLE           NO  YES
EXP_FULL_DATABASE              EXECUTE_CATALOG_ROLE           NO  YES
SYS                            EXECUTE_CATALOG_ROLE           YES YES

So, our old friend the DBA role has that privilege, via the EXECUTE_CATALOG_ROLE and IMP_/EXP_FULL_DATABASE.

Let’s go back to creating policies. I’m sticking to row-level RLS for now, not column-level. When you create a policy you basically are saying “when object X is accessed, run function Y and apply the resulting string to all access, as a predicate”

Lifted an example from the manual, but adding in a couple of bits the authors were too lazy to put in {:-) } :-

DBMS_RLS.ADD_POLICY (
 object_schema   =>   'hr'
,object_name     => 'employees'
,policy_name     => 'emp_policy'
,function_schema => 'hr_rls'
,policy_function => 'emp_sec'
,statement_types => 'select');

The function HR_RLS.EMP_SEC returns a string P1

When someone issues a statement that access the HR.EMPLOYEES table the function HR_RLS.EMP_SEC is executed as part of the parse and, internally, the following view is created and placed into the orginal statement:

SELECT * FROM hr.employees WHERE P1;

As this takes place as part of the parse part of the statement processing, errors with the function are found before any effort to access the actual data occur. It is also why it is not simple to avoid RLS – it is done as part of the overall SQL processing carried out by Oracle.

If the function is invalid, generates an error or the supplied “WHERE predicate” P1 causes the final statement to be syntactically incorrect, you get an error.
At this point Oracle fails “secure” and simply won’t fire the SQL statement. Even if you own the tables and have full privilege access to them. I think it is worth a quick demonstration

I am logged on as the owner of the table and RLS function, MDW:-

MDW> select * from test1
        ID CP_NAME                        OTHER_NAME                        MASK_FL
---------- ------------------------------ ------------------------------ ----------
         1 ERIC                           ERIC THE RED                            1
         2 BROWN                          BOB THE BROWN                           2
         3 GREEN                          GARY THE GREEN                          1
         4 BLUE                           BILL THE BLUE                           3
         5 BLACK                          DAVE THE BLACK                          4
         6 PURPLE                         PETE THE PURPLE                         5
         7 RED                            ROGER THE RED                           6
--
--All works OK
MDW> @rls_func_damage
--
CREATE OR REPLACE FUNCTION hide_rows (
 v_schema IN VARCHAR2
,v_objname IN VARCHAR2)
RETURN VARCHAR2 AS
con VARCHAR2 (200);
BEGIN
-- The line below states loser not user
con := (loser='''||v_schema||''' or MASK_FL > 3)';
RETURN (con);
END hide_rows;
/
Function created.
--
-- Let us check the predicate coming from the function
--
select hide_rows(v_schema=>user,v_objname=>'TEST1') from dual;

HIDE_ROWS(V_SCHEMA=>USER,V_OBJNAME=>'TEST1')
---------------------------------------------------------------------------------------
(loser='MDW' or MASK_FL > 3)

-- "loser" is not a valid piece of syntax or a known variable. So...
--
select * from TEST1
              *
ERROR at line 1:
ORA-28113: policy predicate has error

-- I now damage the statement creating the function so that it is syntactically incorrect
CREATE OR REPLACE FUNCTION hide_rows (
...
-- missing quote
con := (loser='''||v_schema||''' or MASK_FL > 3)';
...

Warning: Function created with compilation errors.

select * from TEST1
              *
ERROR at line 1:
ORA-28110: policy function or package MDW.HIDE_ROWS has error

--re-create the function as I started with

Function created.

select * from TEST1
        ID CP_NAME                        OTHER_NAME                        MASK_FL
---------- ------------------------------ ------------------------------ ----------
         1 ERIC                           ERIC THE RED                            1
         2 BROWN                          BOB THE BROWN                           2
...
         7 RED                            ROGER THE RED                           6
--
-- Back to a working system

As you can see, you get ORA-28110 if the function used by the RLS policy is damaged and ORA-28113 if the function returns a non-valid WHERE Predicate. In either case, the data in those tables is now inaccessible. That could be a problem…

It is of course very interesting that RLS is kind of “blind” to the functions it uses until it uses it. You can alter RLS activity if you can alter the functions.

Last post I mentioned that, as you RLS security is based on these functions, you should give thought to who can see them and change them. Hopefully the above highlights that. There is another access aspect to these functions that is key.

In the first example where I use DBMS_RLS.ADD_POLICY, I state function_schema – this is who owns the PL/SQL function. It defaults to the person who creates the Policy. The function is fired in the standard way with DEFINER RIGHTS – i.e. the rights of the function owner. {You could override this when creating the function such that it runs with executioner rights – but that strikes me as a potentially very confusing and a source of much pain and anguish}.

So the function can see what the owner can see, not the executioner. This has two key effects:

  • You can restrict the access to any DB objects that the function requires to the owner of the function. The end user does not need to have access and it may be more secure if they do not.
  • You have to grant access to objects directly to the function owner, via either object grants or system grants. Stored PL/SQL does not “see” roles if executed with definer rights.

So eg if your function is owned by MDW and references a master table called SEC_MASTER.MASKED_USERS then you need to grant select on SEC_MASTER.MASKED_USERS to MDW. No using roles.
{of course, calling a table MASKED_USERS might give anyone acquiring access to the DB a clue that it is important. I tend to call such “security important” tables things that are “obviously” boring, like ROTA_DATA.}

Finally, that WHERE Predicate P1 is in effect added to your SQL statement. It can be pretty much anything. It can reference other database objects. If it references other database objects THE EXECUTING USER MUST BE ABLE TO SEE THEM. Not the owner of the function; that function has been fired and the WHERE Predicate generated already. On actually running the SQL, the executing user must have access to all objects referenced, including those introduced by the P1 WHERE Predicate. Oh, and remember synonyms!

There is there a way to sidestep RLS and you sometimes need to.

If your logon has the EXEMPT ACCESS POLICY system privilege then RLS functions are utterly ignored by you. No check is made to see if policies apply to your SQL statements, let alone run them. You may need to use this privilege if you need to access functionality that RLS has issues with (see comments by Dom Brooks and Tony Sleight on my first post). You might also need it if you have errors with the policies but you need access to the data to keep the business moving. You probably need a locked-down user somewhere with the EXEMPT ACCESS POLICY privilege if you use RLS.
Of course, great care has to be taken in making use of this privilege as it side-steps all RLS_based security.
If you have master users that have full access to the data, this privilege also removes the overhead of the SQL engine identifying any policies, firing the function and including the extra predicates.

One very final point. I have found myself writing RLS functions that all check if the user was the owner of the table and, if so, allowing them to see all data. By granting EXEMPT ACCESS POLICY to that user I was able to do away with those checks and make the functions much simpler. In this case I did not so much think of EXEMPT ACCESS POLICY turning off security but the lack of it turning it on for restricted users.

That will do for part 2.

Exclusion of Unioned SQL in Views – Followup August 20, 2012

Posted by mwidlake in database design, performance, SQL.
Tags: , ,
add a comment

Last week I put up a post about how Oracle can filter out sections of a union view..

Within the comments I put up another example where the CBO did not filter out all but one of the Union views despite my replicating the exact WHERE clause of one of the unioned statements. Jonathan Lewis posted a followup to say “Oracle can be very touchy about how it plays this game” and made a prediction of how the CBO would handle a slightly different scenario.

This was the extra scenario and I include brief details on creating the unioned view too. NB all on Oracle 11.2.0.2. {non-Exadata:-) }

  1  create table mdw_source
  2  tablespace users
  3  as select * from dba_objects

  1  create or replace view mdw_v1 as
  2  select
  3    '1'              src_type
  4   ,ms.OWNER ,ms.OBJECT_NAME,ms.SUBOBJECT_NAME ,ms.CREATED
  5  from mdw_source ms
  6  where ms.object_name like 'A%'
  7  union all
  8  select
  9    '2'              src_type
 10   ,ms.OWNER ,ms.OBJECT_NAME,ms.SUBOBJECT_NAME ,ms.CREATED
 11  from mdw_source ms
 12  where ms.object_name like 'B%'
 13  UNION ALL
 14  select
 15    '3'              src_type
 16   ,ms.OWNER ,ms.OBJECT_NAME,ms.SUBOBJECT_NAME ,ms.CREATED
 17  from mdw_source ms
 18  where ms.object_name like 'C%'

1 select count(created)
2 from mdw_v1 mv
3 where mv.object_name like ‘C%’

Execution Plan
———————————————————-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
———————————————————————————–
| 0 | SELECT STATEMENT | | 1 | 75 | 551 (1)| 00:00:07 |
| 1 | SORT AGGREGATE | | 1 | 75 | | |
| 2 | VIEW | MDW_V1 | 824 | 61800 | 551 (1)| 00:00:07 |
| 3 | UNION-ALL | | | | | |
|* 4 | TABLE ACCESS FULL| MDW_SOURCE | 1 | 34 | 184 (1)| 00:00:03 |
|* 5 | TABLE ACCESS FULL| MDW_SOURCE | 2 | 68 | 184 (1)| 00:00:03 |
|* 6 | TABLE ACCESS FULL| MDW_SOURCE | 821 | 27914 | 184 (1)| 00:00:03 |
———————————————————————————–

Predicate Information (identified by operation id):
—————————————————
4 – filter(“MS”.”OBJECT_NAME” LIKE ‘A%’ AND “MS”.”OBJECT_NAME” LIKE ‘C%’)
5 – filter(“MS”.”OBJECT_NAME” LIKE ‘B%’ AND “MS”.”OBJECT_NAME” LIKE ‘C%’)
6 – filter(“MS”.”OBJECT_NAME” LIKE ‘C%’)

Statistics
———————————————————-
8 recursive calls
0 db block gets
2417 consistent gets 

So, as you can see, despite me providing the WHERE clause in my final select as “where mv.object_name like ‘C%’” and the WHERE clauses of the unioned view are all similar and one is “ms.object_name like ‘C%’” the CBO fails to realise it can exclude all but one of the unioned SQL statements – despite the mutally exclusive filter predicates:

4 – filter(“MS”.”OBJECT_NAME” LIKE ‘A%’ AND “MS”.”OBJECT_NAME” LIKE ‘C%’)
5 – filter(“MS”.”OBJECT_NAME” LIKE ‘B%’ AND “MS”.”OBJECT_NAME” LIKE ‘C%’)

I have to confess, in light of recent experience I would have wondered if these filter predicates would prevent the other two SQL statements being fired at execution time but the number of consistent gets confirms that it is not – 2417 compared to 805 or so for when the query exclusion occurs. You can check back to the prior post to verify that.

So, let’s do the tests Jonathan suggested. He suggested altering the view to use equality predicates so I have recreated the view as such:

TEST_DB> create or replace view mdw_v1 as
  2  select
  3    '1'              src_type
  4    ,ms.OWNER ,ms.OBJECT_NAME ,ms.SUBOBJECT_NAME ,ms.CREATED
  5  from mdw_source ms
  6  where substr(ms.object_name,1,1) ='A'
  7  union all
  8  select
  9    '2'              src_type
 10   ,ms.OWNER ,ms.OBJECT_NAME ,ms.SUBOBJECT_NAME ,ms.CREATED
 11  from mdw_source ms
 12  where substr(ms.object_name,1,1) ='B'
 13  UNION ALL
 14  select
 15    '3'              src_type
 16   ,ms.OWNER ,ms.OBJECT_NAME ,ms.SUBOBJECT_NAME ,ms.CREATED
 17  from mdw_source ms
 18  where substr(ms.object_name,1,1) ='C'

View created.

TEST_DB> set autotrace on
TEST_DB> --
TEST_DB> -- below will do a select of a non-indexed column across the view
TEST_DB> select count(created) from mdw_v1

COUNT(CREATED)
--------------
          1288

Execution Plan
-----------------------------------------------------------------------------------
| Id  | Operation            | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |            |     1 |     9 |   552   (1)| 00:00:07 |
|   1 |  SORT AGGREGATE      |            |     1 |     9 |            |          |
|   2 |   VIEW               | MDW_V1     |  1677 | 15093 |   552   (1)| 00:00:07 |
|   3 |    UNION-ALL         |            |       |       |            |          |
|*  4 |     TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
|*  5 |     TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
|*  6 |     TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='A')
   5 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='B')
   6 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='C')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
       2417  consistent gets


TEST_DB> -- below will do a select of a non-indexed column
TEST_DB> -- specific to one of the Union's in the view via the src_type
TEST_DB> select count(created)
  2  from mdw_v1
  3  where src_type='3'

COUNT(CREATED)
--------------
           129

Execution Plan
------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |            |     1 |    12 |   184   (1)| 00:00:03 |
|   1 |  SORT AGGREGATE       |            |     1 |    12 |            |          |
|   2 |   VIEW                | MDW_V1     |   561 |  6732 |   184   (1)| 00:00:03 |
|   3 |    UNION-ALL          |            |       |       |            |          |
|*  4 |     FILTER            |            |       |       |            |          |
|*  5 |      TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
|*  6 |     FILTER            |            |       |       |            |          |
|*  7 |      TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
|*  8 |     TABLE ACCESS FULL | MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(NULL IS NOT NULL)
   5 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='A')
   6 - filter(NULL IS NOT NULL)
   7 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='B')
   8 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='C')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
        807  consistent gets


TEST_DB> -- below will do a select of a non-indexed column
TEST_DB> -- specific to one of the Union's in the view via matching the where clause
TEST_DB> select count(created)
  2  from mdw_v1
  3  where substr(object_name,1,1) ='C'

COUNT(CREATED)
--------------
           129

Execution Plan
------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |            |     1 |    75 |   184   (1)| 00:00:03 |
|   1 |  SORT AGGREGATE       |            |     1 |    75 |            |          |
|   2 |   VIEW                | MDW_V1     |   561 | 42075 |   184   (1)| 00:00:03 |
|   3 |    UNION-ALL          |            |       |       |            |          |
|*  4 |     FILTER            |            |       |       |            |          |
|*  5 |      TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
|*  6 |     FILTER            |            |       |       |            |          |
|*  7 |      TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
|*  8 |     TABLE ACCESS FULL | MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(NULL IS NOT NULL)
   5 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='A')
   6 - filter(NULL IS NOT NULL)
   7 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='B')
   8 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='C')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
        807  consistent gets


TEST_DB> -- below will do a select of a non-indexed column
TEST_DB> -- specific to one of the Union's in the view via NOT matching the where clause
TEST_DB> -- but maybe logical exclusion can be managed
TEST_DB> select count(created)
  2  from mdw_v1
  3  where object_name ='C'

COUNT(CREATED)
--------------
             0

Execution Plan
-----------------------------------------------------------------------------------
| Id  | Operation            | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |            |     1 |    75 |   551   (1)| 00:00:07 |
|   1 |  SORT AGGREGATE      |            |     1 |    75 |            |          |
|   2 |   VIEW               | MDW_V1     |     3 |   225 |   551   (1)| 00:00:07 |
|   3 |    UNION-ALL         |            |       |       |            |          |
|*  4 |     TABLE ACCESS FULL| MDW_SOURCE |     1 |    34 |   184   (1)| 00:00:03 |
|*  5 |     TABLE ACCESS FULL| MDW_SOURCE |     1 |    34 |   184   (1)| 00:00:03 |
|*  6 |     TABLE ACCESS FULL| MDW_SOURCE |     1 |    34 |   184   (1)| 00:00:03 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("MS"."OBJECT_NAME"='C' AND
              SUBSTR("MS"."OBJECT_NAME",1,1)='A')
   5 - filter("MS"."OBJECT_NAME"='C' AND
              SUBSTR("MS"."OBJECT_NAME",1,1)='B')
   6 - filter("MS"."OBJECT_NAME"='C' AND
              SUBSTR("MS"."OBJECT_NAME",1,1)='C')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
       2417  consistent gets

TEST_DB> -- below will do a select of a non-indexed column
TEST_DB> -- specific to one of the Union's in the view via NOT matching the where clause
TEST_DB> -- but logically it is the same.
TEST_DB> select count(created)
  2  from mdw_v1
  3  where object_name like 'C%'

COUNT(CREATED)
--------------
           129

Execution Plan
-----------------------------------------------------------------------------------
| Id  | Operation            | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |            |     1 |    75 |   551   (1)| 00:00:07 |
|   1 |  SORT AGGREGATE      |            |     1 |    75 |            |          |
|   2 |   VIEW               | MDW_V1     |    24 |  1800 |   551   (1)| 00:00:07 |
|   3 |    UNION-ALL         |            |       |       |            |          |
|*  4 |     TABLE ACCESS FULL| MDW_SOURCE |     8 |   272 |   184   (1)| 00:00:03 |
|*  5 |     TABLE ACCESS FULL| MDW_SOURCE |     8 |   272 |   184   (1)| 00:00:03 |
|*  6 |     TABLE ACCESS FULL| MDW_SOURCE |     8 |   272 |   184   (1)| 00:00:03 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("MS"."OBJECT_NAME" LIKE 'C%' AND
              SUBSTR("MS"."OBJECT_NAME",1,1)='A')
   5 - filter("MS"."OBJECT_NAME" LIKE 'C%' AND
              SUBSTR("MS"."OBJECT_NAME",1,1)='B')
   6 - filter("MS"."OBJECT_NAME" LIKE 'C%' AND
              SUBSTR("MS"."OBJECT_NAME",1,1)='C')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
       2417  consistent gets

Above, I create the view, do a quick select count(*) to cause a full scan of all three section in the union view, then do selects using various WHERE predicates to see what sections of the unioned SQl statements are excluded.

The results are very intersting. The filtering on sub_type still works fine, but now the WHERE predicate “where substr(object_name,1,1) =’C’” allows the CBO to exclude sections of the union, the “Null is not null” filters appear and the consistent gets is 807. The WHERE predicate “where object_name =’C'” which at first glance you might think to be logically the same but is not (we selected object_names where only the first character is checked but the whole object_name is selected) acts does not do any exclusion and I thought it might. There are no OBJECT_NAMES of ‘C’ but logically as a human you can see that such an object name could not be found by the first two unioned SQL statements in any case.

The last test is most interesting. The WHERE clause of the SQL select over the view is “where object_name like ‘C%’” and is logically the same as the one WHERE clause in the unioned view “where substr(object_name,1,1) =’C’“. So no exclusion occurs and again we see 2417 consistent gets.

It would seem the CBO Pixies either have not coded for that or, as Jonathan says Oracle can be “touchy” about this feature. Bottom line, if you are expecting this sort of unioned view SQL exclusion to occur – TEST!

Exclusion of Unioned SQL in Views? August 16, 2012

Posted by mwidlake in performance, SQL.
Tags: , , ,
13 comments

Question – you have a view definition of the following format:

select 1 as d_type, col_a,col_b,col_c
from TAB_X, TAB_Y, TAB_Z
where {your predicates}
UNION
select 2 as d_type, col_a,col_b,col_c
from TAB_P, TAB_Q, TAB_R
where {your predicates}
UNION
select 3 as d_type, col_a,col_b,col_c
from TAB_X X, TAB_Y, TAB_Z
where {your predicates}

You now select from the view and your code includes a filter predicate such as:

“WHERE D_TYPE = 1”

What will the CBO do? Is it intelligent enough to do what a human would do and exclude the parts of the query that could not possibly return data you want? Ie exclude the SQL statements where d_type is set to 2 and 3. Or is that funcitonality not built into the CBO? After all, it is only software and someone would have to decide that there could be benefit of looking into set of unioned SQL selects to see if any could be excluded and then code that into the CBO.

After all, the CBO is not smart enough to realise it could easily satisfy “select min(col_a), max(col_a) from tab x” via two fast lookups on a supporting index. It scans the index.

So – what is your guess?

I asked 3 people. Not a massive straw poll but I got one vote for it will not exclude, one for it will and one smart answer of “I would expect it to exclude- but as you asked the question I am thinking it does not”.

Let’s do the test.

DB_TEST> @demo_union_view_2
DB_TEST> drop table mdw_source purge;
Table dropped.

  1  create table mdw_source
  2  tablespace users
  3  as select * from dba_objects
  4  /
Table created.

  1  create index mdso_idx on mdw_source(owner,object_name)
  2  tablespace users
  3  /
Index created.

DB_TEST> exec dbms_stats.gather_table_stats(ownname=>USER,tabname=>'MDW_SOURCE',method_opt=>'for all columns size 1')
PL/SQL procedure successfully completed.

  1  select table_name,num_rows from dba_tab_statistics
  2  where owner=USER and table_name='MDW_SOURCE';

TABLE_NAME      NUM_ROWS
--------------- ----------
MDW_SOURCE      55895

  1  create or replace view mdw_v1 as
  2  select
  3    '1'              src_type
  4   ,ms.OWNER
  5   ,ms.OBJECT_NAME
  6   ,ms.SUBOBJECT_NAME
  7   ,ms.CREATED
  8  from mdw_source ms
  9  where ms.object_name like 'A%'
 10  union all
 11  select
 12    '2'              src_type
 13   ,ms.OWNER
 14   ,ms.OBJECT_NAME
 15   ,ms.SUBOBJECT_NAME
 16   ,ms.CREATED
 17  from mdw_source ms
 18  where ms.object_name like 'B%'
 19  UNION ALL
 20  select
 21    '3'              src_type
 22   ,ms.OWNER
 23   ,ms.OBJECT_NAME
 24   ,ms.SUBOBJECT_NAME
 25   ,ms.CREATED
 26  from mdw_source ms
 27  where ms.object_name like 'C%'
 28  /
View created.

So, a simple table is created from DBA_OBJECTS and a view over the top of the table consisting on 3 simple selects unioned together, with a static value (1, 2 or 3) generated for each of the 3 parts as SRC_TYPE

Now let’s run some tests:

DB_TEST> set autotrace on
DB_TEST> --
DB_TEST> -- below will do a select of a non-indexed column across the view
DB_TEST> select count(created) from mdw_v1
  2  /

COUNT(CREATED)
--------------
          1284

Execution Plan
----------------------------------------------------------
Plan hash value: 4259489107
-----------------------------------------------------------------------------------
| Id  | Operation            | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |            |     1 |     9 |   551   (1)| 00:00:07 |
|   1 |  SORT AGGREGATE      |            |     1 |     9 |            |          |
|   2 |   VIEW               | MDW_V1     |  2463 | 22167 |   551   (1)| 00:00:07 |
|   3 |    UNION-ALL         |            |       |       |            |          |
|*  4 |     TABLE ACCESS FULL| MDW_SOURCE |   821 | 27914 |   184   (1)| 00:00:03 |
|*  5 |     TABLE ACCESS FULL| MDW_SOURCE |   821 | 27914 |   184   (1)| 00:00:03 |
|*  6 |     TABLE ACCESS FULL| MDW_SOURCE |   821 | 27914 |   184   (1)| 00:00:03 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("MS"."OBJECT_NAME" LIKE 'A%')
   5 - filter("MS"."OBJECT_NAME" LIKE 'B%')
   6 - filter("MS"."OBJECT_NAME" LIKE 'C%')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
       2417  consistent gets


DB_TEST> -- below will do a select of a non-indexed column across the view
DB_TEST> -- limiting it to "one" of the types
DB_TEST> select count(created)
  2  from mdw_v1
  3  where src_type='3'
  4  /

COUNT(CREATED)
--------------
           129


Execution Plan
----------------------------------------------------------
Plan hash value: 1188403912
------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |            |     1 |    12 |   184   (1)| 00:00:03 |
|   1 |  SORT AGGREGATE       |            |     1 |    12 |            |          |
|   2 |   VIEW                | MDW_V1     |   823 |  9876 |   184   (1)| 00:00:03 |
|   3 |    UNION-ALL          |            |       |       |            |          |
|*  4 |     FILTER            |            |       |       |            |          |
|*  5 |      TABLE ACCESS FULL| MDW_SOURCE |   821 | 27914 |   184   (1)| 00:00:03 |
|*  6 |     FILTER            |            |       |       |            |          |
|*  7 |      TABLE ACCESS FULL| MDW_SOURCE |   821 | 27914 |   184   (1)| 00:00:03 |
|*  8 |     TABLE ACCESS FULL | MDW_SOURCE |   821 | 27914 |   184   (1)| 00:00:03 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(NULL IS NOT NULL)
   5 - filter("MS"."OBJECT_NAME" LIKE 'A%')
   6 - filter(NULL IS NOT NULL)
   7 - filter("MS"."OBJECT_NAME" LIKE 'B%')
   8 - filter("MS"."OBJECT_NAME" LIKE 'C%')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
        807  consistent gets


DB_TEST> -- below will do a select of an indexed column across the view
DB_TEST> select count(owner) from mdw_v1
  2  /

COUNT(OWNER)
------------
        1284


Execution Plan
----------------------------------------------------------
Plan hash value: 4219520050
------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |     1 |    17 |   221   (1)| 00:00:03 |
|   1 |  SORT AGGREGATE         |          |     1 |    17 |            |          |
|   2 |   VIEW                  | MDW_V1   |  2463 | 41871 |   221   (1)| 00:00:03 |
|   3 |    UNION-ALL            |          |       |       |            |          |
|*  4 |     INDEX FAST FULL SCAN| MDSO_IDX |   821 | 26272 |    74   (2)| 00:00:01 |
|*  5 |     INDEX FAST FULL SCAN| MDSO_IDX |   821 | 26272 |    74   (2)| 00:00:01 |
|*  6 |     INDEX FAST FULL SCAN| MDSO_IDX |   821 | 26272 |    74   (2)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("MS"."OBJECT_NAME" LIKE 'A%')
   5 - filter("MS"."OBJECT_NAME" LIKE 'B%')
   6 - filter("MS"."OBJECT_NAME" LIKE 'C%')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
        989  consistent gets


DB_TEST> --
DB_TEST> -- below will do a select of an indexed column across the view
DB_TEST> -- limiting it to "one" of the types
DB_TEST> select count(owner)
  2  from mdw_v1
  3  where src_type='3'
  4  /

COUNT(OWNER)
------------
         129


Execution Plan
----------------------------------------------------------
Plan hash value: 815942527
-------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |     1 |    20 |    74   (2)| 00:00:01 |
|   1 |  SORT AGGREGATE          |          |     1 |    20 |            |          |
|   2 |   VIEW                   | MDW_V1   |   823 | 16460 |    74   (2)| 00:00:01 |
|   3 |    UNION-ALL             |          |       |       |            |          |
|*  4 |     FILTER               |          |       |       |            |          |
|*  5 |      INDEX FAST FULL SCAN| MDSO_IDX |   821 | 26272 |    74   (2)| 00:00:01 |
|*  6 |     FILTER               |          |       |       |            |          |
|*  7 |      INDEX FAST FULL SCAN| MDSO_IDX |   821 | 26272 |    74   (2)| 00:00:01 |
|*  8 |     INDEX FAST FULL SCAN | MDSO_IDX |   821 | 26272 |    74   (2)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(NULL IS NOT NULL)
   5 - filter("MS"."OBJECT_NAME" LIKE 'A%')
   6 - filter(NULL IS NOT NULL)
   7 - filter("MS"."OBJECT_NAME" LIKE 'B%')
   8 - filter("MS"."OBJECT_NAME" LIKE 'C%')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
        331  consistent gets


DB_TEST> -- below will select against the undelying object we are interested in
DB_TEST> select count(created)
  2  from mdw_source ms
  3  where ms.object_name like 'C%'
  4  /

COUNT(CREATED)
--------------
           129


Execution Plan
----------------------------------------------------------
Plan hash value: 152094671

---------------------------------------------------------------------------------
| Id  | Operation          | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            |     1 |    34 |   184   (1)| 00:00:03 |
|   1 |  SORT AGGREGATE    |            |     1 |    34 |            |          |
|*  2 |   TABLE ACCESS FULL| MDW_SOURCE |   821 | 27914 |   184   (1)| 00:00:03 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("MS"."OBJECT_NAME" LIKE 'C%')

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        805  consistent gets

Look at the plans at the start.

When my query does not include SRC_TYPE,ie the first SQL statement, the plan is simple – it scans the table three times, estimates 821 rows per scan and a cost of 184 for each step. When I include the SRC_TYPE = ‘3’ in the next step we see that the plan still includes all three full table scans, each still estimated at a cost of 184.
When you provide a predicate that identifies the one sql statement that you want to run, it still runs all three. I tried a few further tests and could not force the exclusion of the “unwanted” queries from the plan.

I showed my friend above (the one who suggested it would not work as I had asked the question) the results and we agreed that it was a little unfortunate that the Oracle Optimizer Pixies had not coded in the ability to exclude unwanted Unioned statements, but we could understand it.

However, for those of you thinking “you are missing something” – you are right. {And, to be fair to my friend, my example was not quite so simplified as what I have posted here}.

I decided to quantify the level of the impact and so shifted my concentration to the buffer gets. Look at the first and second statements again. The cost of the total statement drops, from 551 to 184. The actual consistent gets resulting from executing the SQL is dropping from 2417 to 807. They are both dropping dramatically to approximately 1/3.

Now check out the filter predicates at step 4 and 6 – “NULL IS NOT NULL”. That will always be false. Everything “after” that statement in the plan will not be executed. The step in the plan is there and, as a step, has the same cost – but it is is not being executed.

The pattern is repeated when the simple SQL statement is on an indexed column and the index is being used (the next two statements) {and honesty forces me to admit I have included more output than I need for this example, as I wanted to confuse the issue slightly and maybe encourage you to not read the plans too carefully}.

So yes, the CBO IS excluding the individual unioned SQL statements based on the stated SRC_TYPE and the fixed value stated in each part of the union.

I’m quite impressed by that. Clever Oracle Optimizer Pixies.

To tie this back to reality, I initially looked at this as I am trying to tune some code that hides all the complexity of the application under one massive unioned uber-view over a set of 9 other complex, unioned views. The explain plan runs to close to 700 lines. I had utterly overlooked the filters and filter predicates in that – heck I missed it initially in my simple test:-) {which was more complex than in this post, in my defense}. Trying to work with a very slow statement based on nested, complex views that is used by pretty much the whole application is “fun”.

Also back in reality, the filter clause I see in my real code is along the lines of:

:B17 is null or is not null

This is as bind variable is being considerd – and that is actually one of the considerations back in this real world of the original issue. The level that the master query needs to run in is decided by this bind variable – otherwise the application developer would (should have?) just pointed at the sub-view they needed for that level. So the CBO does not know which chunk of the union’d code it can ignore until it sees that bind variable, so the plan shows what could be executed.

Of course, it could be argued that if you know the type at the time you query the database, rather than use the above union filtering trick to cut down your massively complex view to the part that is required, just query the sub view you need directly. A bit more code but a lot simpler system to look after.

It’s a nice SQL trick but I would keep it to simple examples!

Finaly, just for fun, here is the plan I’ve been trying to tune. As I said, close to 700 steps and only the highlighted grey section on the left is shown in more detail on the right.

Follow

Get every new post delivered to your Inbox.

Join 230 other followers