jump to navigation

Fun with Filters March 9, 2010

Posted by mwidlake in performance.
Tags: , ,
9 comments

{Please note – this post has been used to test changes to layout, thus the duplicated sections}

This post is about the importance of filter statements in execution plans and how they can indicate problems that are not always apparent.

I had a problem recently with some code that had gone rogue – it had been running in a few seconds, being executed every 10 minutes or so. Now it ran until either someone killed it off or it got “snapshot too old” errors. I pretty much knew it was prompted by stats being gathered on the tables in question as we had just gathered stats on the tables in that schema.

The code was something like this (it is not too important what exactly the code was). Oh, and this is on 10.2.0.3, enterprise edition with partitioning.

select  accounted, max(recorded_date),count(*)
from W_LCG_OPENING_
where accountid||DATA_SRC_COUNTRY_ID in
     (
       SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
       minus
       SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_
      )
   and 1= 1 – this is pointless, it replaces a check of “run”=”run”
   group by accountid,DATA_SRC_COUNTRY_ID

Those tables are quite large, several million rows in each. The code is basically scanning all of the rows in the table as there are no indexes to support the query and it is written in a way that, well, I would not write it.

Digging around in some monitoring tools I use I confirmed that the code has swapped plan

{original layout, having hacked it to fit}

inst SQL_ID        Plan Hash   First Load   execs   
---- ------------- ----------- ------------ ------ 
  TOT_ROWS       TOT_BG      TOT_DR        TOT_CPU    CREATED_DT
---------- ------------ ----------- --------------  ------------
   3 1t7ma4xn3rw46  3183284037 100203 09:10     125
     7,854     6,457,885   1208,149    816,476,585  091020 10:27
   3 1t7ma4xn3rw46  3127554972 100206 09:13       2
         0  1260,936,642        980  94599,678,960  100205 09:19

{newly possible layout, with wider screen}

inst SQL_ID        Plan Hash   First Load   execs    TOT_ROWS       TOT_BG      TOT_DR        TOT_CPU    CREATED_DT

---- ------------- ----------- ------------ ------ ---------- ------------ ----------- --------------  ------------
   3 1t7ma4xn3rw46  3183284037 100203 09:10    125      7,854    6,457,885    1208,149    816,476,585  091020 10:27
   3 1t7ma4xn3rw46  3127554972 100206 09:13      2          0 1260,936,642         980  94599,678,960  100205 09:19

Version 1 comes back in 5 or 6 seconds. Version 2 does not effectively come back, which is why it records 0 rows. You can see that the Disk Gets are pretty low in version 2 (allowing that it was never left to finish) but Buffer Gets and CPU are both massively up. So much so, it exceeds where I format the numbers with comas (as, if they are THAT big, they need fixing anyway).

I looked at the plans and, though they were different, nothing jumped out at me. So I trimmed down the code and built it up section by section until I saw something significant change. This is my general mode of tuning code if nothing initially occurs to me.
As usual, I started with any scalar or inline code, in this case that SELECT…MINUS…SELECT in the where clause.

SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
Minus
SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_

This came back in 3.145 seconds, with all of 62 rows. That is fine.

Now I added back the use of the MINUS code as a WHERE clause to the outer query.

select accountid
from  W_LCG_OPENING_
where accountid||DATA_SRC_COUNTRY_ID in
     (
       SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
       minus
       SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_
      ) 

This came back in 5.22 seconds with 4468 records. Again, I can live with that, we do not need sub-second response, we need sub-minute response.

So I now added back in the group by accountid…

select accountid,count(*)
from W_LCG_OPENING_
where accountid||DATA_SRC_COUNTRY_ID in
     (
       SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
       minus
       SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_
      )
group by accountid

This does not come back before I have fetched and consumed a cup of tea. That is a significant change.

I was not expecting this, why would sorting under 5 thousand rows stress a database that can scan multi-million row tables in seconds? There is all that CPU being burned up and huge numbers of buffer gets, but not much in the way of disc IO, so it is probably doing something that can be done in memory a very, very large number of times.

What is the difference in plan? {Oh HELL, they look rubbish – sorry, click on the images to blow them up, I’d suggest right-click and open in a new tab, and, YES, I am working on making my blog a little more “wide and friendly”.}. Good plan first, bad plan second

{Original compressed – width=”459″ height=”122″}

{Largest size I can use – width=”800″ height=”212″}

{Natural “overflow” size – width=”866″ height=”230″}

{ was width=”460″ height=”132″ now 800*242. Orig is 839* 242}

The view step has disappeared, which was an expensive step so should help not hinder.
The “SORT GROUP BY” has appeared, which it would do as we introduced the “GROUP BY” clause.
The hash join has been replaced with a filter. In a more complex plan you might miss this replacement, you would maybe notice the hash join disappearing but a filter, well, it is checking some simple “WHERE” clause isn’t it?

Well, the hash join was joining the results coming from the two steps feeding into it, the PARTITION RANGE FULL and the VIEW (which is the in-memory construct of the SELECT..MINUS…SELECT statement).

Now the Filter is filtering the results from the PARTITION RANGE ALL with the results from the MINUS. At this point I’d like to highlight that the predicted cardinality and bytes coming back for the steps within the union have reduced by a factor of 100 from the good and bad plans. And I’ll also admit I hid some key detail in the screen shot. I am not showing the access and filter predicates.

{image is 989*246, I am trying 800*246. Original 460*114}

The above is a screen shot showing that in the new code there are no filter predicates but an access predicate, for the access to the view (ie the select…minus…select). For easier reading, the full access predicate is below, rather than in the screen shot:

“$nso_col_1″=TO_CHAR(“ACCOUNTID”)||TO_CHAR(“DATA_SRC_COUNTRY_ID”)

However, for the slow code, there are no access predicated but are filter predicates. Again, the screen shot shows that there are predicates and I show the full text below. (Screen shots are from PL/SQL developer, btw).

{image 953*238, trying 800*238, original 460*114}

2- EXISTS ( (SELECT /*+ */ TO_CHAR("ACCOUNTID")
   ||TO_CHAR("DATA_SRC_COUNTRY_ID") 
FROM "W_LCG_OPENING_" "W_LCG_OPENING_" 
WHERE TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B1)||TO_CHAR(:B2))
MINUS (SELECT /*+ */ TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")
FROM "W_LCG_CLIENT_" "W_LCG_CLIENT_" 
WHERE TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B3)||TO_CHAR(:B4)))	

8- TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B1)||TO_CHAR(:B2)	

10- TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B1)||TO_CHAR(:B2)	

Basically, the filter is to run the union query for every row from the driving query, passing in the relevant filter clause as an extra predicate to each of the individual queries of the SELECT…MINUS…SELECT

Clever, but not of much help to us as the performance is awful.

I showed this to a colleague and their immediate response was “why does that not show up as a nested loop? How am I supposed to spot that the “filter” is doing so much?” There were a couple of robust Saxon words mixed in with that statement.

So, Beware Filters replacing joins and get in the habit of checking out the filter and access predicates

If you use an old-style template for showing plans {like I do, a guilty sin of mine}, or a GUI where you have not selected that the filter and access predicates be shown, you may well not get them displayed. If you use autotrace in SQL*Plus, you will though:

db3_mw> select accountid,count(*)
  2  from W_LCG_OPENING_
  3  where accountid||DATA_SRC_COUNTRY_ID in
  4       (
  5         SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
  6         minus
  7         SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_
  8        )
  9  group by accountid
 10  /

Execution Plan
----------------------------------------------------------
Plan hash value: 397856216

--------------------------------------------------------------------------------
| Id  | Operation                | Name             | Rows  | Bytes | Cos
t (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                  |   911 |  7288 |
15M  (7)| 17:25:41 |       |       |
|   1 |  SORT GROUP BY           |                  |   911 |  7288 |
15M  (7)| 17:25:41 |       |       |
|*  2 |   FILTER                 |                  |       |       |
        |          |       |       |
|   3 |    PARTITION RANGE ALL   |                  |  3153K|    24M|  83
11   (4)| 00:00:34 |     1 |   840 |
|   4 |     TABLE ACCESS FULL    | W_LCG_OPENING_   |  3153K|    24M|  83
11   (4)| 00:00:34 |     1 |   840 |
|   5 |    MINUS                 |                  |       |       |
        |          |       |       |
|   6 |     SORT UNIQUE NOSORT   |                  | 31535 |   246K|  86
15   (7)| 00:00:35 |       |       |
|   7 |      PARTITION RANGE ALL |                  | 31535 |   246K|  86
11   (7)| 00:00:35 |     1 |   840 |
|*  8 |       TABLE ACCESS FULL  | W_LCG_OPENING_   | 31535 |   246K|  86
11   (7)| 00:00:35 |     1 |   840 |
|   9 |     SORT UNIQUE NOSORT   |                  |   132 |  1056 |
17  (18)| 00:00:01 |       |       |
|* 10 |      INDEX FAST FULL SCAN| W_LCG_CLIENT__PK |   132 |  1056 |
16  (13)| 00:00:01 |       |       |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( EXISTS ( (SELECT /*+ */ TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_
") FROM
              "W_LCG_OPENING_" "W_LCG_OPENING_" WHERE
              TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_")=TO_CHAR(:B1)|
|TO_CHAR(:B2))MINUS (SELECT /*+ */
              TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_") 
FROM "W_LCG_CLIENT_"
              "W_LCG_CLIENT_" WHERE TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_S
RC_")=TO_CHAR(:B3)||TO_CHAR(:B4) )))
   8 - filter(TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_")=TO_CHAR(:B1)|
|TO_CHAR(:B2))
  10 - filter(TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_")=TO_CHAR(:B1)|
|TO_CHAR(:B2)) 

{re-worked layout}

db3_mw> select accountid,count(*)
  from W_LCG_OPENING_
  where accountid||DATA_SRC_COUNTRY_ID in
       (   SELECT accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
           minus
          SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT   )
  group by accountid

Execution Plan
----------------------------------------------------------
Plan hash value: 397856216

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name             | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                  |   911 |  7288 |15M  (7)    |17:25:41  |       |       |
|   1 |  SORT GROUP BY           |                  |   911 |  7288 |15M  (7)    |17:25:41  |       |       |
|*  2 |   FILTER                 |                  |       |       |            |          |       |       |
|   3 |    PARTITION RANGE ALL   |                  |  3153K|    24M|  8311   (4)|00:00:34  |     1 |   840 |
|   4 |     TABLE ACCESS FULL    | W_LCG_OPENING_   |  3153K|    24M|  8311   (4)| 00:00:34 |     1 |   840 |
|   5 |    MINUS                 |                  |       |       |            |          |       |       |
|   6 |     SORT UNIQUE NOSORT   |                  | 31535 |   246K|  8615   (7)| 00:00:35 |       |       |
|   7 |      PARTITION RANGE ALL |                  | 31535 |   246K|  8611   (7)| 00:00:35 |     1 |   840 |
|*  8 |       TABLE ACCESS FULL  | W_LCG_OPENING_   | 31535 |   246K|  8611   (7)| 00:00:35 |     1 |   840 |
|   9 |     SORT UNIQUE NOSORT   |                  |   132 |  1056 |17  (18)    | 00:00:01 |       |       |
|* 10 |      INDEX FAST FULL SCAN| W_LCG_CLIENT__PK |   132 |  1056 |16  (13)    | 00:00:01 |       |       |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( EXISTS ( (SELECT /*+ */ TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID") FROM
              "W_LCG_OPENING_" "W_LCG_OPENING_" WHERE
              TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_")=TO_CHAR(:B1)||TO_CHAR(:B2))MINUS (SELECT /*+ */
              TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_") 
              FROM "W_LCG_CLIENT_"  "W_LCG_CLIENT_" 
              WHERE TO_CHAR ("ACCOUNTID")||TO_CHAR("DATA_SRC_")=TO_CHAR(:B3)||TO_CHAR(:B4)              )))
   8 - filter(TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_COUNTRY_ID")=TO_CHAR(:B1)||TO_CHAR(:B2))
  10 - filter(TO_CHAR("ACCOUNTID")||TO_CHAR("DATA_SRC_")=TO_CHAR(:B1)||TO_CHAR(:B2)) 

What did I do to fix the problem? Well, even though the code originally went bad due to stats being gathered, I could not force it back to a nice plan after an hour or two playing with gathering new stats so, in this case, I used an UNNEST hint.

select accountid,count(*)
from W_LCG_OPENING_
where accountid||DATA_SRC_COUNTRY_ID in
     (
       SELECT /*+ unnest */ accountid||DATA_SRC_COUNTRY_ID FROM W_LCG_OPENING_
       minus
       SELECT  accountid ||DATA_SRC_COUNTRY_ID FROM W_LCG_CLIENT_
      )
group by accounted

And it forced the plan back to the version using a HASH join.

The hinted plan

I’m a little unhappy about that hint, but the code needed fixing and the hint did what I wanted it to {I think it was poor of me to only hint one of the two minus statements and I do not like resorting to hints just because I can’t work out what is “wrong” with the stats – If I work out what is wrong and still need the hint, fair enough}. However, I had to be pragmatic and get the code fixed and working in Live, so it has gone in with a hint

Follow

Get every new post delivered to your Inbox.

Join 158 other followers