jump to navigation

Row Level Security Part 2 – permissions November 20, 2012

Posted by mwidlake in internals, security, SQL.
Tags: , ,
3 comments

<..Part 1, introduction..
..Part 3 summary in pictures..>

In this second post on the topic of “an introduction to Row Level Security” I want to cover a few things about what permissions you need to implement RLS and some of the consequences. In my introduction in part one I just said my main user has “DBA type Privileges”.

{NB This is all on Oracle V11.2 and I believe everything below is applicable to V10 as well. Also, I should point out that I am not an Oracle security expert – but despite repeatedly saying this, it seems like at least once a year I am asked to improve a system’s security on the grounds of “more than we have now is an improvement”}.

Firstly, you need the permission to create and drop RLS policies. This is controlled by having the EXECUTE privilege on SYS.DBMS_RLS. That is it. If you can execute this package you can create and drop RLS Policies, enable and disable them and also create, manage and use Policy Groups.

Obviously EXECUTE on SYS.DBMS_RLS is a privilege you need to restrict to only “empowered” accounts- but you also want to be very careful about your definition of empowered. If an account has this privilege, it can alter (disable!) any RLS policies that ANYONE created on the instance. You may need to think carefully about having several accounts that can alter all the RLS policies across all schemas in your instance.

You might want to create a single user that has the EXECUTE on SYS.DBMS_RLS and use that to implement and control RLS across the instance {and even have the account locked when it is not needed}. Of course, under normal Oracle anyone with SYS access can run the procedure – but you have SYS access under control already, don’t you?… :-).

By default, who has this privilege?

MDW> @obj_privs
Enter value for obj_name: dbms_rls
Enter value for obj_own: sys
GRANTEE              OWNER        TABLE_NAME             GRANTOR  PRIVILEGE    GRA
-------------------- ------------ ---------------------- -------- ------------ ---
EXECUTE_CATALOG_ROLE SYS          DBMS_RLS               SYS      EXECUTE      NO
XDB                  SYS          DBMS_RLS               SYS      EXECUTE      NO
WMSYS                SYS          DBMS_RLS               SYS      EXECUTE      NO

--
-- and who has that EXECUTE_CATALOG_ROLE?
--
select * from dba_role_privs where granted_role='EXECUTE_CATALOG_ROLE'

GRANTEE                        GRANTED_ROLE                   ADM DEF
------------------------------ ------------------------------ --- ---
DBA                            EXECUTE_CATALOG_ROLE           YES YES
IMP_FULL_DATABASE              EXECUTE_CATALOG_ROLE           NO  YES
EXP_FULL_DATABASE              EXECUTE_CATALOG_ROLE           NO  YES
SYS                            EXECUTE_CATALOG_ROLE           YES YES

So, our old friend the DBA role has that privilege, via the EXECUTE_CATALOG_ROLE and IMP_/EXP_FULL_DATABASE.

Let’s go back to creating policies. I’m sticking to row-level RLS for now, not column-level. When you create a policy you basically are saying “when object X is accessed, run function Y and apply the resulting string to all access, as a predicate”

Lifted an example from the manual, but adding in a couple of bits the authors were too lazy to put in { :-) } :-

DBMS_RLS.ADD_POLICY (
 object_schema   =>   'hr'
,object_name     => 'employees'
,policy_name     => 'emp_policy'
,function_schema => 'hr_rls'
,policy_function => 'emp_sec'
,statement_types => 'select');

The function HR_RLS.EMP_SEC returns a string P1

When someone issues a statement that access the HR.EMPLOYEES table the function HR_RLS.EMP_SEC is executed as part of the parse and, internally, the following view is created and placed into the orginal statement:

SELECT * FROM hr.employees WHERE P1;

As this takes place as part of the parse part of the statement processing, errors with the function are found before any effort to access the actual data occur. It is also why it is not simple to avoid RLS – it is done as part of the overall SQL processing carried out by Oracle.

If the function is invalid, generates an error or the supplied “WHERE predicate” P1 causes the final statement to be syntactically incorrect, you get an error.
At this point Oracle fails “secure” and simply won’t fire the SQL statement. Even if you own the tables and have full privilege access to them. I think it is worth a quick demonstration

I am logged on as the owner of the table and RLS function, MDW:-

MDW> select * from test1
        ID CP_NAME                        OTHER_NAME                        MASK_FL
---------- ------------------------------ ------------------------------ ----------
         1 ERIC                           ERIC THE RED                            1
         2 BROWN                          BOB THE BROWN                           2
         3 GREEN                          GARY THE GREEN                          1
         4 BLUE                           BILL THE BLUE                           3
         5 BLACK                          DAVE THE BLACK                          4
         6 PURPLE                         PETE THE PURPLE                         5
         7 RED                            ROGER THE RED                           6
--
--All works OK
MDW> @rls_func_damage
--
CREATE OR REPLACE FUNCTION hide_rows (
 v_schema IN VARCHAR2
,v_objname IN VARCHAR2)
RETURN VARCHAR2 AS
con VARCHAR2 (200);
BEGIN
-- The line below states loser not user
con := (loser='''||v_schema||''' or MASK_FL > 3)';
RETURN (con);
END hide_rows;
/
Function created.
--
-- Let us check the predicate coming from the function
--
select hide_rows(v_schema=>user,v_objname=>'TEST1') from dual;

HIDE_ROWS(V_SCHEMA=>USER,V_OBJNAME=>'TEST1')
---------------------------------------------------------------------------------------
(loser='MDW' or MASK_FL > 3)

-- "loser" is not a valid piece of syntax or a known variable. So...
--
select * from TEST1
              *
ERROR at line 1:
ORA-28113: policy predicate has error

-- I now damage the statement creating the function so that it is syntactically incorrect
CREATE OR REPLACE FUNCTION hide_rows (
...
-- missing quote
con := (loser='''||v_schema||''' or MASK_FL > 3)';
...

Warning: Function created with compilation errors.

select * from TEST1
              *
ERROR at line 1:
ORA-28110: policy function or package MDW.HIDE_ROWS has error

--re-create the function as I started with

Function created.

select * from TEST1
        ID CP_NAME                        OTHER_NAME                        MASK_FL
---------- ------------------------------ ------------------------------ ----------
         1 ERIC                           ERIC THE RED                            1
         2 BROWN                          BOB THE BROWN                           2
...
         7 RED                            ROGER THE RED                           6
--
-- Back to a working system

As you can see, you get ORA-28110 if the function used by the RLS policy is damaged and ORA-28113 if the function returns a non-valid WHERE Predicate. In either case, the data in those tables is now inaccessible. That could be a problem…

It is of course very interesting that RLS is kind of “blind” to the functions it uses until it uses it. You can alter RLS activity if you can alter the functions.

Last post I mentioned that, as you RLS security is based on these functions, you should give thought to who can see them and change them. Hopefully the above highlights that. There is another access aspect to these functions that is key.

In the first example where I use DBMS_RLS.ADD_POLICY, I state function_schema – this is who owns the PL/SQL function. It defaults to the person who creates the Policy. The function is fired in the standard way with DEFINER RIGHTS – i.e. the rights of the function owner. {You could override this when creating the function such that it runs with executioner rights – but that strikes me as a potentially very confusing and a source of much pain and anguish}.

So the function can see what the owner can see, not the executioner. This has two key effects:

  • You can restrict the access to any DB objects that the function requires to the owner of the function. The end user does not need to have access and it may be more secure if they do not.
  • You have to grant access to objects directly to the function owner, via either object grants or system grants. Stored PL/SQL does not “see” roles if executed with definer rights.

So eg if your function is owned by MDW and references a master table called SEC_MASTER.MASKED_USERS then you need to grant select on SEC_MASTER.MASKED_USERS to MDW. No using roles.
{of course, calling a table MASKED_USERS might give anyone acquiring access to the DB a clue that it is important. I tend to call such “security important” tables things that are “obviously” boring, like ROTA_DATA.}

Finally, that WHERE Predicate P1 is in effect added to your SQL statement. It can be pretty much anything. It can reference other database objects. If it references other database objects THE EXECUTING USER MUST BE ABLE TO SEE THEM. Not the owner of the function; that function has been fired and the WHERE Predicate generated already. On actually running the SQL, the executing user must have access to all objects referenced, including those introduced by the P1 WHERE Predicate. Oh, and remember synonyms!

There is there a way to sidestep RLS and you sometimes need to.

If your logon has the EXEMPT ACCESS POLICY system privilege then RLS functions are utterly ignored by you. No check is made to see if policies apply to your SQL statements, let alone run them. You may need to use this privilege if you need to access functionality that RLS has issues with (see comments by Dom Brooks and Tony Sleight on my first post). You might also need it if you have errors with the policies but you need access to the data to keep the business moving. You probably need a locked-down user somewhere with the EXEMPT ACCESS POLICY privilege if you use RLS.
Of course, great care has to be taken in making use of this privilege as it side-steps all RLS_based security.
If you have master users that have full access to the data, this privilege also removes the overhead of the SQL engine identifying any policies, firing the function and including the extra predicates.

One very final point. I have found myself writing RLS functions that all check if the user was the owner of the table and, if so, allowing them to see all data. By granting EXEMPT ACCESS POLICY to that user I was able to do away with those checks and make the functions much simpler. In this case I did not so much think of EXEMPT ACCESS POLICY turning off security but the lack of it turning it on for restricted users.

That will do for part 2.

Exclusion of Unioned SQL in Views – Followup August 20, 2012

Posted by mwidlake in database design, performance, SQL.
Tags: , ,
add a comment

Last week I put up a post about how Oracle can filter out sections of a union view..

Within the comments I put up another example where the CBO did not filter out all but one of the Union views despite my replicating the exact WHERE clause of one of the unioned statements. Jonathan Lewis posted a followup to say “Oracle can be very touchy about how it plays this game” and made a prediction of how the CBO would handle a slightly different scenario.

This was the extra scenario and I include brief details on creating the unioned view too. NB all on Oracle 11.2.0.2. {non-Exadata :-) }

  1  create table mdw_source
  2  tablespace users
  3  as select * from dba_objects

  1  create or replace view mdw_v1 as
  2  select
  3    '1'              src_type
  4   ,ms.OWNER ,ms.OBJECT_NAME,ms.SUBOBJECT_NAME ,ms.CREATED
  5  from mdw_source ms
  6  where ms.object_name like 'A%'
  7  union all
  8  select
  9    '2'              src_type
 10   ,ms.OWNER ,ms.OBJECT_NAME,ms.SUBOBJECT_NAME ,ms.CREATED
 11  from mdw_source ms
 12  where ms.object_name like 'B%'
 13  UNION ALL
 14  select
 15    '3'              src_type
 16   ,ms.OWNER ,ms.OBJECT_NAME,ms.SUBOBJECT_NAME ,ms.CREATED
 17  from mdw_source ms
 18  where ms.object_name like 'C%'

1 select count(created)
2 from mdw_v1 mv
3 where mv.object_name like ‘C%’

Execution Plan
———————————————————-
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
———————————————————————————–
| 0 | SELECT STATEMENT | | 1 | 75 | 551 (1)| 00:00:07 |
| 1 | SORT AGGREGATE | | 1 | 75 | | |
| 2 | VIEW | MDW_V1 | 824 | 61800 | 551 (1)| 00:00:07 |
| 3 | UNION-ALL | | | | | |
|* 4 | TABLE ACCESS FULL| MDW_SOURCE | 1 | 34 | 184 (1)| 00:00:03 |
|* 5 | TABLE ACCESS FULL| MDW_SOURCE | 2 | 68 | 184 (1)| 00:00:03 |
|* 6 | TABLE ACCESS FULL| MDW_SOURCE | 821 | 27914 | 184 (1)| 00:00:03 |
———————————————————————————–

Predicate Information (identified by operation id):
—————————————————
4 – filter(“MS”.”OBJECT_NAME” LIKE ‘A%’ AND “MS”.”OBJECT_NAME” LIKE ‘C%’)
5 – filter(“MS”.”OBJECT_NAME” LIKE ‘B%’ AND “MS”.”OBJECT_NAME” LIKE ‘C%’)
6 – filter(“MS”.”OBJECT_NAME” LIKE ‘C%’)

Statistics
———————————————————-
8 recursive calls
0 db block gets
2417 consistent gets 

So, as you can see, despite me providing the WHERE clause in my final select as “where mv.object_name like ‘C%’” and the WHERE clauses of the unioned view are all similar and one is “ms.object_name like ‘C%’” the CBO fails to realise it can exclude all but one of the unioned SQL statements – despite the mutally exclusive filter predicates:

4 – filter(“MS”.”OBJECT_NAME” LIKE ‘A%’ AND “MS”.”OBJECT_NAME” LIKE ‘C%’)
5 – filter(“MS”.”OBJECT_NAME” LIKE ‘B%’ AND “MS”.”OBJECT_NAME” LIKE ‘C%’)

I have to confess, in light of recent experience I would have wondered if these filter predicates would prevent the other two SQL statements being fired at execution time but the number of consistent gets confirms that it is not – 2417 compared to 805 or so for when the query exclusion occurs. You can check back to the prior post to verify that.

So, let’s do the tests Jonathan suggested. He suggested altering the view to use equality predicates so I have recreated the view as such:

TEST_DB> create or replace view mdw_v1 as
  2  select
  3    '1'              src_type
  4    ,ms.OWNER ,ms.OBJECT_NAME ,ms.SUBOBJECT_NAME ,ms.CREATED
  5  from mdw_source ms
  6  where substr(ms.object_name,1,1) ='A'
  7  union all
  8  select
  9    '2'              src_type
 10   ,ms.OWNER ,ms.OBJECT_NAME ,ms.SUBOBJECT_NAME ,ms.CREATED
 11  from mdw_source ms
 12  where substr(ms.object_name,1,1) ='B'
 13  UNION ALL
 14  select
 15    '3'              src_type
 16   ,ms.OWNER ,ms.OBJECT_NAME ,ms.SUBOBJECT_NAME ,ms.CREATED
 17  from mdw_source ms
 18  where substr(ms.object_name,1,1) ='C'

View created.

TEST_DB> set autotrace on
TEST_DB> --
TEST_DB> -- below will do a select of a non-indexed column across the view
TEST_DB> select count(created) from mdw_v1

COUNT(CREATED)
--------------
          1288

Execution Plan
-----------------------------------------------------------------------------------
| Id  | Operation            | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |            |     1 |     9 |   552   (1)| 00:00:07 |
|   1 |  SORT AGGREGATE      |            |     1 |     9 |            |          |
|   2 |   VIEW               | MDW_V1     |  1677 | 15093 |   552   (1)| 00:00:07 |
|   3 |    UNION-ALL         |            |       |       |            |          |
|*  4 |     TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
|*  5 |     TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
|*  6 |     TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='A')
   5 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='B')
   6 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='C')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
       2417  consistent gets


TEST_DB> -- below will do a select of a non-indexed column
TEST_DB> -- specific to one of the Union's in the view via the src_type
TEST_DB> select count(created)
  2  from mdw_v1
  3  where src_type='3'

COUNT(CREATED)
--------------
           129

Execution Plan
------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |            |     1 |    12 |   184   (1)| 00:00:03 |
|   1 |  SORT AGGREGATE       |            |     1 |    12 |            |          |
|   2 |   VIEW                | MDW_V1     |   561 |  6732 |   184   (1)| 00:00:03 |
|   3 |    UNION-ALL          |            |       |       |            |          |
|*  4 |     FILTER            |            |       |       |            |          |
|*  5 |      TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
|*  6 |     FILTER            |            |       |       |            |          |
|*  7 |      TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
|*  8 |     TABLE ACCESS FULL | MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(NULL IS NOT NULL)
   5 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='A')
   6 - filter(NULL IS NOT NULL)
   7 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='B')
   8 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='C')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
        807  consistent gets


TEST_DB> -- below will do a select of a non-indexed column
TEST_DB> -- specific to one of the Union's in the view via matching the where clause
TEST_DB> select count(created)
  2  from mdw_v1
  3  where substr(object_name,1,1) ='C'

COUNT(CREATED)
--------------
           129

Execution Plan
------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |            |     1 |    75 |   184   (1)| 00:00:03 |
|   1 |  SORT AGGREGATE       |            |     1 |    75 |            |          |
|   2 |   VIEW                | MDW_V1     |   561 | 42075 |   184   (1)| 00:00:03 |
|   3 |    UNION-ALL          |            |       |       |            |          |
|*  4 |     FILTER            |            |       |       |            |          |
|*  5 |      TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
|*  6 |     FILTER            |            |       |       |            |          |
|*  7 |      TABLE ACCESS FULL| MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
|*  8 |     TABLE ACCESS FULL | MDW_SOURCE |   559 | 19006 |   184   (1)| 00:00:03 |
------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(NULL IS NOT NULL)
   5 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='A')
   6 - filter(NULL IS NOT NULL)
   7 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='B')
   8 - filter(SUBSTR("MS"."OBJECT_NAME",1,1)='C')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
        807  consistent gets


TEST_DB> -- below will do a select of a non-indexed column
TEST_DB> -- specific to one of the Union's in the view via NOT matching the where clause
TEST_DB> -- but maybe logical exclusion can be managed
TEST_DB> select count(created)
  2  from mdw_v1
  3  where object_name ='C'

COUNT(CREATED)
--------------
             0

Execution Plan
-----------------------------------------------------------------------------------
| Id  | Operation            | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |            |     1 |    75 |   551   (1)| 00:00:07 |
|   1 |  SORT AGGREGATE      |            |     1 |    75 |            |          |
|   2 |   VIEW               | MDW_V1     |     3 |   225 |   551   (1)| 00:00:07 |
|   3 |    UNION-ALL         |            |       |       |            |          |
|*  4 |     TABLE ACCESS FULL| MDW_SOURCE |     1 |    34 |   184   (1)| 00:00:03 |
|*  5 |     TABLE ACCESS FULL| MDW_SOURCE |     1 |    34 |   184   (1)| 00:00:03 |
|*  6 |     TABLE ACCESS FULL| MDW_SOURCE |     1 |    34 |   184   (1)| 00:00:03 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("MS"."OBJECT_NAME"='C' AND
              SUBSTR("MS"."OBJECT_NAME",1,1)='A')
   5 - filter("MS"."OBJECT_NAME"='C' AND
              SUBSTR("MS"."OBJECT_NAME",1,1)='B')
   6 - filter("MS"."OBJECT_NAME"='C' AND
              SUBSTR("MS"."OBJECT_NAME",1,1)='C')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
       2417  consistent gets

TEST_DB> -- below will do a select of a non-indexed column
TEST_DB> -- specific to one of the Union's in the view via NOT matching the where clause
TEST_DB> -- but logically it is the same.
TEST_DB> select count(created)
  2  from mdw_v1
  3  where object_name like 'C%'

COUNT(CREATED)
--------------
           129

Execution Plan
-----------------------------------------------------------------------------------
| Id  | Operation            | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |            |     1 |    75 |   551   (1)| 00:00:07 |
|   1 |  SORT AGGREGATE      |            |     1 |    75 |            |          |
|   2 |   VIEW               | MDW_V1     |    24 |  1800 |   551   (1)| 00:00:07 |
|   3 |    UNION-ALL         |            |       |       |            |          |
|*  4 |     TABLE ACCESS FULL| MDW_SOURCE |     8 |   272 |   184   (1)| 00:00:03 |
|*  5 |     TABLE ACCESS FULL| MDW_SOURCE |     8 |   272 |   184   (1)| 00:00:03 |
|*  6 |     TABLE ACCESS FULL| MDW_SOURCE |     8 |   272 |   184   (1)| 00:00:03 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("MS"."OBJECT_NAME" LIKE 'C%' AND
              SUBSTR("MS"."OBJECT_NAME",1,1)='A')
   5 - filter("MS"."OBJECT_NAME" LIKE 'C%' AND
              SUBSTR("MS"."OBJECT_NAME",1,1)='B')
   6 - filter("MS"."OBJECT_NAME" LIKE 'C%' AND
              SUBSTR("MS"."OBJECT_NAME",1,1)='C')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
       2417  consistent gets

Above, I create the view, do a quick select count(*) to cause a full scan of all three section in the union view, then do selects using various WHERE predicates to see what sections of the unioned SQl statements are excluded.

The results are very intersting. The filtering on sub_type still works fine, but now the WHERE predicate “where substr(object_name,1,1) =’C’” allows the CBO to exclude sections of the union, the “Null is not null” filters appear and the consistent gets is 807. The WHERE predicate “where object_name =’C'” which at first glance you might think to be logically the same but is not (we selected object_names where only the first character is checked but the whole object_name is selected) acts does not do any exclusion and I thought it might. There are no OBJECT_NAMES of ‘C’ but logically as a human you can see that such an object name could not be found by the first two unioned SQL statements in any case.

The last test is most interesting. The WHERE clause of the SQL select over the view is “where object_name like ‘C%’” and is logically the same as the one WHERE clause in the unioned view “where substr(object_name,1,1) =’C’“. So no exclusion occurs and again we see 2417 consistent gets.

It would seem the CBO Pixies either have not coded for that or, as Jonathan says Oracle can be “touchy” about this feature. Bottom line, if you are expecting this sort of unioned view SQL exclusion to occur – TEST!

Exclusion of Unioned SQL in Views? August 16, 2012

Posted by mwidlake in performance, SQL.
Tags: , , ,
13 comments

Question – you have a view definition of the following format:

select 1 as d_type, col_a,col_b,col_c
from TAB_X, TAB_Y, TAB_Z
where {your predicates}
UNION
select 2 as d_type, col_a,col_b,col_c
from TAB_P, TAB_Q, TAB_R
where {your predicates}
UNION
select 3 as d_type, col_a,col_b,col_c
from TAB_X X, TAB_Y, TAB_Z
where {your predicates}

You now select from the view and your code includes a filter predicate such as:

“WHERE D_TYPE = 1″

What will the CBO do? Is it intelligent enough to do what a human would do and exclude the parts of the query that could not possibly return data you want? Ie exclude the SQL statements where d_type is set to 2 and 3. Or is that funcitonality not built into the CBO? After all, it is only software and someone would have to decide that there could be benefit of looking into set of unioned SQL selects to see if any could be excluded and then code that into the CBO.

After all, the CBO is not smart enough to realise it could easily satisfy “select min(col_a), max(col_a) from tab x” via two fast lookups on a supporting index. It scans the index.

So – what is your guess?

I asked 3 people. Not a massive straw poll but I got one vote for it will not exclude, one for it will and one smart answer of “I would expect it to exclude- but as you asked the question I am thinking it does not”.

Let’s do the test.

DB_TEST> @demo_union_view_2
DB_TEST> drop table mdw_source purge;
Table dropped.

  1  create table mdw_source
  2  tablespace users
  3  as select * from dba_objects
  4  /
Table created.

  1  create index mdso_idx on mdw_source(owner,object_name)
  2  tablespace users
  3  /
Index created.

DB_TEST> exec dbms_stats.gather_table_stats(ownname=>USER,tabname=>'MDW_SOURCE',method_opt=>'for all columns size 1')
PL/SQL procedure successfully completed.

  1  select table_name,num_rows from dba_tab_statistics
  2  where owner=USER and table_name='MDW_SOURCE';

TABLE_NAME      NUM_ROWS
--------------- ----------
MDW_SOURCE      55895

  1  create or replace view mdw_v1 as
  2  select
  3    '1'              src_type
  4   ,ms.OWNER
  5   ,ms.OBJECT_NAME
  6   ,ms.SUBOBJECT_NAME
  7   ,ms.CREATED
  8  from mdw_source ms
  9  where ms.object_name like 'A%'
 10  union all
 11  select
 12    '2'              src_type
 13   ,ms.OWNER
 14   ,ms.OBJECT_NAME
 15   ,ms.SUBOBJECT_NAME
 16   ,ms.CREATED
 17  from mdw_source ms
 18  where ms.object_name like 'B%'
 19  UNION ALL
 20  select
 21    '3'              src_type
 22   ,ms.OWNER
 23   ,ms.OBJECT_NAME
 24   ,ms.SUBOBJECT_NAME
 25   ,ms.CREATED
 26  from mdw_source ms
 27  where ms.object_name like 'C%'
 28  /
View created.

So, a simple table is created from DBA_OBJECTS and a view over the top of the table consisting on 3 simple selects unioned together, with a static value (1, 2 or 3) generated for each of the 3 parts as SRC_TYPE

Now let’s run some tests:

DB_TEST> set autotrace on
DB_TEST> --
DB_TEST> -- below will do a select of a non-indexed column across the view
DB_TEST> select count(created) from mdw_v1
  2  /

COUNT(CREATED)
--------------
          1284

Execution Plan
----------------------------------------------------------
Plan hash value: 4259489107
-----------------------------------------------------------------------------------
| Id  | Operation            | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |            |     1 |     9 |   551   (1)| 00:00:07 |
|   1 |  SORT AGGREGATE      |            |     1 |     9 |            |          |
|   2 |   VIEW               | MDW_V1     |  2463 | 22167 |   551   (1)| 00:00:07 |
|   3 |    UNION-ALL         |            |       |       |            |          |
|*  4 |     TABLE ACCESS FULL| MDW_SOURCE |   821 | 27914 |   184   (1)| 00:00:03 |
|*  5 |     TABLE ACCESS FULL| MDW_SOURCE |   821 | 27914 |   184   (1)| 00:00:03 |
|*  6 |     TABLE ACCESS FULL| MDW_SOURCE |   821 | 27914 |   184   (1)| 00:00:03 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("MS"."OBJECT_NAME" LIKE 'A%')
   5 - filter("MS"."OBJECT_NAME" LIKE 'B%')
   6 - filter("MS"."OBJECT_NAME" LIKE 'C%')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
       2417  consistent gets


DB_TEST> -- below will do a select of a non-indexed column across the view
DB_TEST> -- limiting it to "one" of the types
DB_TEST> select count(created)
  2  from mdw_v1
  3  where src_type='3'
  4  /

COUNT(CREATED)
--------------
           129


Execution Plan
----------------------------------------------------------
Plan hash value: 1188403912
------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |            |     1 |    12 |   184   (1)| 00:00:03 |
|   1 |  SORT AGGREGATE       |            |     1 |    12 |            |          |
|   2 |   VIEW                | MDW_V1     |   823 |  9876 |   184   (1)| 00:00:03 |
|   3 |    UNION-ALL          |            |       |       |            |          |
|*  4 |     FILTER            |            |       |       |            |          |
|*  5 |      TABLE ACCESS FULL| MDW_SOURCE |   821 | 27914 |   184   (1)| 00:00:03 |
|*  6 |     FILTER            |            |       |       |            |          |
|*  7 |      TABLE ACCESS FULL| MDW_SOURCE |   821 | 27914 |   184   (1)| 00:00:03 |
|*  8 |     TABLE ACCESS FULL | MDW_SOURCE |   821 | 27914 |   184   (1)| 00:00:03 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(NULL IS NOT NULL)
   5 - filter("MS"."OBJECT_NAME" LIKE 'A%')
   6 - filter(NULL IS NOT NULL)
   7 - filter("MS"."OBJECT_NAME" LIKE 'B%')
   8 - filter("MS"."OBJECT_NAME" LIKE 'C%')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
        807  consistent gets


DB_TEST> -- below will do a select of an indexed column across the view
DB_TEST> select count(owner) from mdw_v1
  2  /

COUNT(OWNER)
------------
        1284


Execution Plan
----------------------------------------------------------
Plan hash value: 4219520050
------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |     1 |    17 |   221   (1)| 00:00:03 |
|   1 |  SORT AGGREGATE         |          |     1 |    17 |            |          |
|   2 |   VIEW                  | MDW_V1   |  2463 | 41871 |   221   (1)| 00:00:03 |
|   3 |    UNION-ALL            |          |       |       |            |          |
|*  4 |     INDEX FAST FULL SCAN| MDSO_IDX |   821 | 26272 |    74   (2)| 00:00:01 |
|*  5 |     INDEX FAST FULL SCAN| MDSO_IDX |   821 | 26272 |    74   (2)| 00:00:01 |
|*  6 |     INDEX FAST FULL SCAN| MDSO_IDX |   821 | 26272 |    74   (2)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("MS"."OBJECT_NAME" LIKE 'A%')
   5 - filter("MS"."OBJECT_NAME" LIKE 'B%')
   6 - filter("MS"."OBJECT_NAME" LIKE 'C%')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
        989  consistent gets


DB_TEST> --
DB_TEST> -- below will do a select of an indexed column across the view
DB_TEST> -- limiting it to "one" of the types
DB_TEST> select count(owner)
  2  from mdw_v1
  3  where src_type='3'
  4  /

COUNT(OWNER)
------------
         129


Execution Plan
----------------------------------------------------------
Plan hash value: 815942527
-------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |     1 |    20 |    74   (2)| 00:00:01 |
|   1 |  SORT AGGREGATE          |          |     1 |    20 |            |          |
|   2 |   VIEW                   | MDW_V1   |   823 | 16460 |    74   (2)| 00:00:01 |
|   3 |    UNION-ALL             |          |       |       |            |          |
|*  4 |     FILTER               |          |       |       |            |          |
|*  5 |      INDEX FAST FULL SCAN| MDSO_IDX |   821 | 26272 |    74   (2)| 00:00:01 |
|*  6 |     FILTER               |          |       |       |            |          |
|*  7 |      INDEX FAST FULL SCAN| MDSO_IDX |   821 | 26272 |    74   (2)| 00:00:01 |
|*  8 |     INDEX FAST FULL SCAN | MDSO_IDX |   821 | 26272 |    74   (2)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(NULL IS NOT NULL)
   5 - filter("MS"."OBJECT_NAME" LIKE 'A%')
   6 - filter(NULL IS NOT NULL)
   7 - filter("MS"."OBJECT_NAME" LIKE 'B%')
   8 - filter("MS"."OBJECT_NAME" LIKE 'C%')

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
        331  consistent gets


DB_TEST> -- below will select against the undelying object we are interested in
DB_TEST> select count(created)
  2  from mdw_source ms
  3  where ms.object_name like 'C%'
  4  /

COUNT(CREATED)
--------------
           129


Execution Plan
----------------------------------------------------------
Plan hash value: 152094671

---------------------------------------------------------------------------------
| Id  | Operation          | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |            |     1 |    34 |   184   (1)| 00:00:03 |
|   1 |  SORT AGGREGATE    |            |     1 |    34 |            |          |
|*  2 |   TABLE ACCESS FULL| MDW_SOURCE |   821 | 27914 |   184   (1)| 00:00:03 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("MS"."OBJECT_NAME" LIKE 'C%')

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        805  consistent gets

Look at the plans at the start.

When my query does not include SRC_TYPE,ie the first SQL statement, the plan is simple – it scans the table three times, estimates 821 rows per scan and a cost of 184 for each step. When I include the SRC_TYPE = ‘3’ in the next step we see that the plan still includes all three full table scans, each still estimated at a cost of 184.
When you provide a predicate that identifies the one sql statement that you want to run, it still runs all three. I tried a few further tests and could not force the exclusion of the “unwanted” queries from the plan.

I showed my friend above (the one who suggested it would not work as I had asked the question) the results and we agreed that it was a little unfortunate that the Oracle Optimizer Pixies had not coded in the ability to exclude unwanted Unioned statements, but we could understand it.

However, for those of you thinking “you are missing something” – you are right. {And, to be fair to my friend, my example was not quite so simplified as what I have posted here}.

I decided to quantify the level of the impact and so shifted my concentration to the buffer gets. Look at the first and second statements again. The cost of the total statement drops, from 551 to 184. The actual consistent gets resulting from executing the SQL is dropping from 2417 to 807. They are both dropping dramatically to approximately 1/3.

Now check out the filter predicates at step 4 and 6 – “NULL IS NOT NULL”. That will always be false. Everything “after” that statement in the plan will not be executed. The step in the plan is there and, as a step, has the same cost – but it is is not being executed.

The pattern is repeated when the simple SQL statement is on an indexed column and the index is being used (the next two statements) {and honesty forces me to admit I have included more output than I need for this example, as I wanted to confuse the issue slightly and maybe encourage you to not read the plans too carefully}.

So yes, the CBO IS excluding the individual unioned SQL statements based on the stated SRC_TYPE and the fixed value stated in each part of the union.

I’m quite impressed by that. Clever Oracle Optimizer Pixies.

To tie this back to reality, I initially looked at this as I am trying to tune some code that hides all the complexity of the application under one massive unioned uber-view over a set of 9 other complex, unioned views. The explain plan runs to close to 700 lines. I had utterly overlooked the filters and filter predicates in that – heck I missed it initially in my simple test :-) {which was more complex than in this post, in my defense}. Trying to work with a very slow statement based on nested, complex views that is used by pretty much the whole application is “fun”.

Also back in reality, the filter clause I see in my real code is along the lines of:

:B17 is null or is not null

This is as bind variable is being considerd – and that is actually one of the considerations back in this real world of the original issue. The level that the master query needs to run in is decided by this bind variable – otherwise the application developer would (should have?) just pointed at the sub-view they needed for that level. So the CBO does not know which chunk of the union’d code it can ignore until it sees that bind variable, so the plan shows what could be executed.

Of course, it could be argued that if you know the type at the time you query the database, rather than use the above union filtering trick to cut down your massively complex view to the part that is required, just query the sub view you need directly. A bit more code but a lot simpler system to look after.

It’s a nice SQL trick but I would keep it to simple examples!

Finaly, just for fun, here is the plan I’ve been trying to tune. As I said, close to 700 steps and only the highlighted grey section on the left is shown in more detail on the right.

Shrinking Tables to Aid Full Scans April 18, 2012

Posted by mwidlake in performance, SQL.
Tags: ,
14 comments

{This blog is about shrinking tables where the High Water Mark is higher than it needs to be. Prompted by one of the comments, I wrote a follow-up post on finding the High Water Mark and tables that consist mostly of empty space, which would be candidates for shrinking.}

This blog is about one of those things I do as almost an autonomous “not thinking about it” performance housekeeping task, one which I have been meaning to mention for ages.

There can be quite a lot to gain by checking out full scans on “small” tables and seeing if it is as efficient as it can be. Sometimes it is, sometimes it is not. Often it is “good enough”. Occasionally it is awful.

Recently I was just casting an eye over the “top 20″ SQL on a system for any code generating a lot of consistent gets. I came across the below:

FRST_LOAD_TIME SQL_ID PRSE EXCS BUFFS DISCS RWS
-------------------- ------------- --------- ----------- -------------- ---------- ------------
CPU_MS ELAPSD_MS SORTS DIR_W OPT_COST
-------------- -------------- ---------- ---------- --------------
SQL_TEXT
----------------------------------------------------------------------------------------------------
2-03-14/18:00:10 bk9b5u6zyvy59 36,262 36,262 90,634,158 7 36,261
320,102.3 326,920.9 0 0 546
SELECT count(*) RUNNING_SESSIONS from SNP_SESSION WHERE SESS_STATUS = :1 and AGENT_NAME=:2

It caught my eye as it struck me as a lot of buffer gets for what I thought was quite a small table and it is executed quite often. In this case, a few thousand times a day. It takes 2500 buffer gets per count(*). All buffered, but the CPU has to chunk through it.

I check, it is a small table:

ABCD01> select count(*) from dev_ODI_XXXX.snp_session

COUNT(*)
----------
73

ABCD01> @tab_lst
Enter value for tab_name: snp_session

TABLE_NAME OWNER NUM_ROWS
------------------------------ ------------------------------ -------------
SNP_SESSION DEV_ODI_XXXX 49

-- and a quick check on those key columns

OWNER COLUMN_NAME NUM_DISTINCT N_NULLS LOW_V HI_V BKTS AVG_L
-------- -------------------- ------------ ---------- --------------- --------------- ---- -----
DEV_ODI_ SESS_STATUS 4 0 D W 4 2
DEV_ODI_ AGENT_NAME 4 6 AAA_ODI_AGENT BBB_ODI_AGENT 4 13

The table IS small, only 73 rows. It is also interesting that the stats are out by quite a percentage, but the table is still seen as small according to the stats. This indicates the table is quite volatile, but consistently small. Another thing to notice is that the number of distinct values for each column in the WHERE clause is only 4 and thus an index lookup would not be very specific. With such a small number of records in the table and not very specific index I “know” that an index lookup would not be of much benefit {BTW Everything that you “know” in this way – do check it in reality every now and again…}. I don’t look at adding an index, I look at why such a small table needs so many buffer gets.

So, why so many buffer gets?

First of all, I need to be careful. The original source of information I was looking as was V$SQL and that holds cumulative stats for all executions of the same SQL statement. Thus the buffer gets could have been very high a while ago (say, yesterday) and now are small, and the average is 2500 BGs per select but right now it is running fine. So I verify that the code is still doing a lot of work for such a small number of records:

set autotrace on
SELECT count(*) RUNNING_SESSIONS from dev_ODI_XXXX.SNP_SESSION 
WHERE SESS_STATUS = 'D' --:1 
and AGENT_NAME= 'DEV_ODI_AGENT'--:2

RUNNING_SESSIONS
----------------
1
Elapsed: 00:00:00.99

Execution Plan
----------------------------------------------------------
Plan hash value: 1837310352
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 15 | 546 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 15 | | |
|* 2 | TABLE ACCESS STORAGE FULL| SNP_SESSION | 1 | 15 | 546 (1)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - storage("SESS_STATUS"='D' AND "AGENT_NAME"='DEV_ODI_AGENT')
filter("SESS_STATUS"='D' AND "AGENT_NAME"='DEV_ODI_AGENT')

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2013 consistent gets
0 physical reads

That’s still a lot of consistent gets to scan all of 75 rows and do a count (which is the sort aggregate step). A second to process and 2013 consistent gets to look at 75 rows in a skinny table – just trust me it is a skinny table. And the optimiser cost the full scan step at 546 “equivalent single IOs”.
{Oh, if anyone is wondering what the “table access storage full” and predicate step “storage(“SESS_STATUS”=’D’ AND “AGENT_NAME”=’DEV_ODI_AGENT’)” it’s because this system is an exadata box. Just ignore this for now, it is not relevant to this issue. I’ll blog about it *next week*}.

What is the problem?

Well, let’s see how big the table is.

ABCD01> @seg_dets
Enter value for seg_name: snp_session
Enter value for owner: dev_odi%
Any Key>

OWNER SEG_NAME SEG TS_NAME BYTES_K BLOCKS exts INI_K NXT_K
-------- --------------- --- -------- ---------- --------- ---- ------- -------
DEV_ODI_ SNP_SESSION TAB DEV_ODI_ 16,384 2,048 31 64 1024
REPO USER

The table is 2048 blocks in size, the consistent gets to look at the table is 2013. That’s close.

The problem is when you full scan a table, it scans up to the high water mark of that table, ie the highest block that Oracle knows there has been data in. As Oracle inserts new data into a table and extends it, it keeps a note of the highest block data has been put into, the High Water Mark (HWM). When you delete data from a table, Oracle marks individual blocks to be reused as they empty (drop below a certain percentage full). But Oracle does NOT check to see if many blocks are being fully emptied and it can reduce the high water mark. In fact, there could well be a few records in the first block of the table and then nothing until a few records in the last block that has been used in the table, the High Water Mark. Oracle just scans all blocks and counts relevant records up to the high water mark. This is actually a very long-standing performance tweek, you can allocate GBs of data to a table but Oracle knows the very highest block it has put data into for that segment and only scans up to the High Water Mark. Usually, the HWM is useful.

I happen to know that this table used to have several thousand records in it but is now used far less extensively and holds only a few records. They are constantly deleted and new ones created. I have seen {very regularly, at least once a year} some extreme situations where a table has had many millions of rows in it at one point, setting the HWM to several hundred thousand blocks into the table segment, but the table now only holds a few hundred records. But a Full Table Scan scans all the way up to the HWM, taking several minutes to access a few hundred records. Often that table is being inefficiently cleared with a “delete from table_x” statement and then re-populated.

What is the answer? Well, I could truncate the table and set the HWM to zero, but this table actually holds some real data. I can’t truncate it in this situation. I could MOVE the table, ie get Oracle to recreate the table and move the existing data into it. That would set the HWM of the new segment to block one and then raise it as it inserts the data neatly into the first few blocks. The third option, and the one I took, is to SHRINK the table. This basically tells oracle to shuffle the data down to the start of the table and reset the HWM. You can do this on the fly and I have done it on relatively active systems, but I must give the usual advice to test this yourself and you may have locking issues on a busy system hitting the table you shrink very hard. Also, you do need to enable row movement on the table for Shrink to work, as is demonstrated below:

ABCD01> alter table dev_ODI_XXXX.snp_session shrink space cascade;
alter table dev_ODI_XXXX.snp_session shrink space cascade
*
ERROR at line 1:
ORA-10636: ROW MOVEMENT is not enabled

ABCD01> alter table dev_ODI_XXXX.snp_session enable row movement;

Table altered.

ABCD01> alter table dev_ODI_XXXX.snp_session shrink space cascade;

Table altered.

Elapsed: 00:00:01.98

So, how does my little select perform now?

set autotrace on
SELECT count(*) RUNNING_SESSIONS from dev_ODI_XXXX.SNP_SESSION 
WHERE SESS_STATUS = 'D' --:1 
and AGENT_NAME= 'DEV_ODI_AGENT'--:2

RUNNING_SESSIONS
----------------
1
Elapsed: 00:00:00.04

Execution Plan
----------------------------------------------------------
Plan hash value: 1837310352
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 15 | 546 (1)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 15 | | |
|* 2 | TABLE ACCESS STORAGE FULL| SNP_SESSION | 1 | 15 | 546 (1)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - storage("SESS_STATUS"='D' AND "AGENT_NAME"='DEV_ODI_AGENT')
filter("SESS_STATUS"='D' AND "AGENT_NAME"='DEV_ODI_AGENT')

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
8 consistent gets
0 physical reads

Fabulous. The run time is very short and the number of consistent gets is tiny. You might notice that the Explain Plan still thinks the cost of the full scan will be 546 (ie the equivalent of 546 single block reads) but that is as I have not re-gathered stats. I left the system to do this automatically that night.

I’ll just mention one more thing. Shrinking a table is a bit like the old trick of rebuilding indexes to compact them. People misunderstood what rebuilding indexes was achieving and why the indexes were so much larger than they “needed” to be and would spend CPU and IO rebuilding indexes overnight – only for them to quickly explode back to the original size (creating a lot of overhead as they did). Shrinking oversized tables can have a great benefit but keep in mind that if some process pushes the HWM right up every day by shoving lots of data into the table and then clearing it down later, shrinking the table is just a waste of time. It gives benefit when the table really has shrunk down in size for a while. Or it only occasionally explodes to a massive size. I have in the past written “regular shrink” scripts for tables I know explode once in a while (eg once a week) but are usually small, but you need to know what the pattern is. Don’t just blindly shrink tables.

For me in this situation, that code scanning that table dropped out of my “top 20 most demanding SQL statements” and has stayed out of it for weeks. The code now completed in considerably under 0.1 seconds as opposed to a whole second – but no one has noticed. I’d removed about 5% of the workload of the system – but no one noticed. But it took me 1/2 an hour to investigate and fix and I’d freed up a non-trivial chunk of processing resource. In my Performance Tuning heart I know I did a good job on that day.

SQL Quiz – How To Multiply across Rows February 22, 2012

Posted by mwidlake in SQL.
Tags:
15 comments

A colleague came to me a couple of days ago with a SQL problem. He had something like this:

@get_source

NAME          INPUT
------------- -----
GROUP_1       5
GROUP_2       3
GROUP_3       4
GROUP_4       7
GROUP_5       3

What he wanted to do was multiply all the inputs across the groups, to get the total number of possible permutations. ie 5*3*4*7*3. The product of all the INPUTS. This has to be in straight SQL. Easy! You just need to… Ahhh… Hmmm…

No, I just could not think of a way to do it that my colleague could use.

- There is no group-by function that gives a product of a column {that I know of}
– We could not use PL/SQL {for reasons I won’t go into}, so no cursor looping or passing in an array, which would make it simple
– Neither of us could think of an analytical function that could take the result of itself in the prior row (though I suspect there may be a way to do it).
– The number of groups could and would vary from a few to possibly a hundred, so the old tricks of converting rows to columns or nesting so many sql statements would not help.

So, I asked my friend – the queen of SQL Query, {Boneist} {Oh she of the trombone playing pastime}.

She came straight back with an answer. In case you want to try and work out an answer yourself before seeing the solution, below is a cat picture. The answer she came up with is below that:

The key to the solution is natural logs {ln}. I don’t know about you, but I learnt about using logs at school and have pretty much not used them since. In summary:

If x=3*5*9
then ln(x) = ln(3)+ln(5)+ln(9)
= 1.09861+1.60944+2.19722
= 4.90527

ie using log converts multiplication to addition. You then use EXP, the inverse of ln, to convert your added-up log value into your result.

exp(4.90527) = 135

{NB if you use my figures, exp(4.90527) realy equals 134.999355, as I have truncated the log values shown. Oracle does this far more accurately internally but be aware you might get some slight rounding errors).

So, what we can do is simply use the SQL GROUP function SUM to add together the natural logs of all the rows:

sum(ln(input))
{grouped by the whole statement, so no group by is needed in this case}

As an example:

-- show the expected result first
select 3*7*4*5*1 from dual;

 3*7*4*5*1
----------
       420


select min(name),max(name),count(name)
,EXP (SUM (LN (gr_sum))) gr_prod
from
(select 'group_1' name, 3 gr_sum from dual
 union
 select 'group_2' name, 7 gr_sum from dual
 union
 select 'group_3' name, 4 gr_sum from dual
 union
 select 'group_4' name, 5 gr_sum from dual
 union
 select 'group_5' name, 1 gr_sum from dual
)
/

MIN(NAM MAX(NAM COUNT(NAME)    GR_PROD
------- ------- ----------- ----------
group_1 group_5           5        420

As you can see, it works – even if when you first look at the formula your brains {if you are not a mathematician} try to leak out of your head. Just try and remember what your mathematics teacher said about log books and how, before calculators, they were used to speed up manual long multiplication tasks by converting the task into log addition.

If you want more information on logs, see this discussion about how they are actually about growth or wikipedia if you must :-).

Boneist actually pointed me to this very nice post about using logs in oracle by Anju Parashar, which I have borrowed from.

One issues to be aware of (which is highlighted in Anuj Parashar’s article) is that you can’t get a log of negative values, as a consequence Oracle will give you an ora-01428 error:

select ln(-3) from dual;
select ln(-3) from dual
*
ERROR at line 1:
ORA-01428: argument ‘-3′ is out of range

Anuj gives a version of code that works if all values are negative, below I have one that copes with any number of negatives. Basically, you convert all the values to be multiplied to positive values and then make it negative if the count of negative values is odd. Mathematically, the result of a multiplication can ONLY be negative if there are an odd number of negative values.

,EXP (SUM (LN (abs(gr_sum))))
*decode (mod(sum(decode(sign(gr_sum),0,0,1,0, 1)),2)
,0,1,-1) correct_gr_prod

I’m sure that the above expression could be simplified, but I have to go and do the day job.

Finally, here is a little set of test cases covering the above, so you can play with this.

mdw1123&gt; select 3*7*4*5*1 from dual;

 3*7*4*5*1
----------
       420

1 row selected.

mdw1123&gt; --
mdw1123&gt; select 'group_1' name, 3 gr_sum from dual
  2  union
  3  select 'group_2' name, 7 gr_sum from dual
  4  union
  5  select 'group_3' name, 4 gr_sum from dual
  6  union
  7  select 'group_4' name, 5 gr_sum from dual
  8  union
  9  select 'group_5' name, 1 gr_sum from dual
 10  /

NAME        GR_SUM
------- ----------
group_1          3
group_2          7
group_3          4
group_4          5
group_5          1

5 rows selected.

mdw1123&gt;
mdw1123&gt; select min(name),max(name),count(name)
  2  ,EXP (SUM (LN (gr_sum))) gr_prod
  3  from
  4  (select 'group_1' name, 3 gr_sum from dual
  5   union
  6   select 'group_2' name, 7 gr_sum from dual
  7   union
  8   select 'group_3' name, 4 gr_sum from dual
  9   union
 10   select 'group_4' name, 5 gr_sum from dual
 11   union
 12   select 'group_5' name, 1 gr_sum from dual
 13  )
 14  /

MIN(NAM MAX(NAM COUNT(NAME)    GR_PROD
------- ------- ----------- ----------
group_1 group_5           5        420

1 row selected.

mdw1123&gt; --
mdw1123&gt; -- now with a negative
mdw1123&gt; select 'group_1' name, 3 gr_sum from dual
  2   union
  3   select 'group_2' name, -7 gr_sum from dual
  4   union
  5   select 'group_3' name, 4 gr_sum from dual
  6   union
  7   select 'group_4' name, 5 gr_sum from dual
  8   union
  9   select 'group_5' name, 1 gr_sum from dual
 10  /

NAME        GR_SUM
------- ----------
group_1          3
group_2         -7
group_3          4
group_4          5
group_5          1

5 rows selected.

mdw1123&gt; -- and if the values contain negatives
mdw1123&gt; select min(name),max(name),count(name)
  2  ,EXP (SUM (LN (abs(gr_sum)))) gr_prod
  3  ,mod(sum(decode(sign(gr_sum),0,0
  4                          ,1,0
  5                          ,  1)
  6           ),2) -- 0 if even number of negatives, else 1
  7           modifier
  8  ,EXP (SUM (LN (abs(gr_sum))))
  9   *decode (mod(sum(decode(sign(gr_sum),0,0,1,0,     1)),2)
 10         ,0,1,-1) correct_gr_prod
 11  from
 12  (select 'group_1' name, 3 gr_sum from dual
 13   union
 14   select 'group_2' name, -7 gr_sum from dual
 15   union
 16   select 'group_3' name, 4 gr_sum from dual
 17   union
 18   select 'group_4' name, 5 gr_sum from dual
 19   union
 20   select 'group_5' name, 1 gr_sum from dual
 21  )
 22  /

MIN(NAM MAX(NAM COUNT(NAME)    GR_PROD   MODIFIER CORRECT_GR_PROD
------- ------- ----------- ---------- ---------- ---------------
group_1 group_5           5        420          1            -420

1 row selected.

Dropped Partitions do not go in the Recycle Bin January 24, 2012

Posted by mwidlake in SQL.
Tags: , ,
3 comments

If you alter table TEST drop partition Q1, does it go in the recycle bin?

That is the question I was asked today. “Of course it….” Pause. More pause.

No, I did not know. I knew I’d seen partitions in the recyclebin on Oracle but I thought there was no option to state PURGE when you dropped a partition.

So, a quick test was needed.

First I tried a drop of a partition I knew I could live without {NB this is on version 11.2.0.3, I am 99.8% sure this is the same on 10}

mdw1123> alter table person_call drop partition d_20111205 purge
  2  /
alter table person_call drop partition d_20111205 purge
                                                  *
ERROR at line 1:
ORA-14048: a partition maintenance operation may not be combined with other operations


mdw1123> alter table person_call drop partition d_20111205
  2  /

Table altered.

mdw1123> select count(*) from dba_recyclebin
  2  /
Any Key>

  COUNT(*)
----------
         0

1 row selected.

That’s pretty conclusive then, Oracle does not like the PURGE option when dropping a partitions, which suggests it does not go into the recyclebin, and in fact my recyclebin remains empty when I correctly drop the partition. {NB I had purged the recyclebin before this test}.

However, I want to make sure that it is indeed just the partition operation that does not go to the recyclebin (you can of course turn off the recyclebin and I have not proven that I have not done this). So:


mdw1123> create table mdw (id number,vc1 varchar2(10))
  2  partition by range (id)
  3  (partition p1 values less than (10)
  4  ,partition p2 values less than (20)
  5  ,partition p3 values less than (30)
  6  ,partition pm values less than (maxvalue)
  7  )
  8
mdw1123> /
Table created.

mdw1123> insert into mdw
  2  select rownum,'AAAAAAAA'
  3  from dual
  4  connect by level <40
  5  /
39 rows created.

mdw1123> select * from dba_recyclebin
  2  /
no rows selected

mdw1123> alter table mdw drop partition p3
  2  /
Table altered.

mdw1123> select * from dba_recyclebin
  2  /
no rows selected

mdw1123> drop table mdw
  2  /
Table dropped.

mdw1123> select * from dba_recyclebin;
Any Key>
OWNER                          OBJECT_NAME                    ORIGINAL_NAME
------------------------------ ------------------------------ --------------------------------
OPERATION TYPE                      TS_NAME                        CREATETIME
--------- ------------------------- ------------------------------ -------------------
DROPTIME               DROPSCN PARTITION_NAME                   CAN CAN    RELATED BASE_OBJECT
------------------- ---------- -------------------------------- --- --- ---------- -----------
PURGE_OBJECT      SPACE
------------ ----------
MDW                            BIN$DI/3ZwiMRneOxLrJ4jidyg==$0 MDW
DROP      Table Partition           INDEX_01                       2012-01-24:16:13:55
2012-01-24:16:15:33    2787392                                  NO  NO       77672       77672
       77672       1024
MDW                            BIN$DI/3ZwiMRneOxLrJ4jidyg==$0 MDW
DROP      Table Partition           INDEX_01                       2012-01-24:16:13:55
2012-01-24:16:15:33    2787392                                  NO  NO       77672       77672
       77672       1024
MDW                            BIN$DI/3ZwiMRneOxLrJ4jidyg==$0 MDW
DROP      Table Partition           INDEX_01                       2012-01-24:16:13:55
2012-01-24:16:15:33    2787392                                  NO  NO       77672       77672
       77672       1024
MDW                            BIN$DI/3ZwiMRneOxLrJ4jidyg==$0 MDW
DROP      TABLE                                                    2012-01-24:16:13:55
2012-01-24:16:15:33    2787393                                  YES YES      77672       77672
       77672

4 rows selected.

So, if you go and drop the wrong partition, you can’t go getting it back without some sort of restore or recovery (as there is nothing to get it back from) but you can if you drop the whole table. And I was not going mad, I had seen partitions in the Recyclebin (there is, after all, a PARTITION_NAME column in the table)

This seems a little odd to me, I would imagine that dropping the wrong partition is something you would want to use flashback query to fix (as you can with accidentally dropping a table) but there you go. You can’t.

Back to the day job….

Follow

Get every new post delivered to your Inbox.

Join 161 other followers