##
SQL Quiz – How To Multiply across Rows *February 22, 2012*

*Posted by mwidlake in SQL.*

Tags: SQL

trackback

Tags: SQL

trackback

A colleague came to me a couple of days ago with a SQL problem. He had something like this:

@get_source NAME INPUT ------------- ----- GROUP_1 5 GROUP_2 3 GROUP_3 4 GROUP_4 7 GROUP_5 3

What he wanted to do was multiply all the inputs across the groups, to get the total number of possible permutations. ie 5*3*4*7*3. The product of all the INPUTS. This has to be in straight SQL. Easy! You just need to… Ahhh… Hmmm…

No, I just could not think of a way to do it that my colleague could use.

- There is no group-by function that gives a product of a column {that I know of}

- We could not use PL/SQL {for reasons I won’t go into}, so no cursor looping or passing in an array, which would make it simple

- Neither of us could think of an analytical function that could take the result of itself in the prior row (though I suspect there may be a way to do it).

- The number of groups could and would vary from a few to possibly a hundred, so the old tricks of converting rows to columns or nesting so many sql statements would not help.

So, I asked my friend – the queen of SQL Query, {Boneist} {Oh she of the trombone playing pastime}.

She came straight back with an answer. In case you want to try and work out an answer yourself before seeing the solution, below is a cat picture. The answer she came up with is below that:

The key to the solution is natural logs {ln}. I don’t know about you, but I learnt about using logs at school and have pretty much not used them since. In summary:

If x=3*5*9

then ln(x) = ln(3)+ln(5)+ln(9)

= 1.09861+1.60944+2.19722

= 4.90527

ie using log converts **multiplication** to **addition**. You then use EXP, the inverse of ln, to convert your added-up log value into your result.

exp(4.90527) = 135

{NB if you use my figures, exp(4.90527) realy equals 134.999355, as I have truncated the log values shown. Oracle does this far more accurately internally but be aware you might get some slight rounding errors).

So, what we can do is simply use the SQL GROUP function **SUM** to add together the *natural logs of all the rows*:

sum(ln(input))

{grouped by the whole statement, so no group by is needed in this case}

As an example:

-- show the expected result first select 3*7*4*5*1 from dual; 3*7*4*5*1 ---------- 420 select min(name),max(name),count(name) ,EXP (SUM (LN (gr_sum))) gr_prod from (select 'group_1' name, 3 gr_sum from dual union select 'group_2' name, 7 gr_sum from dual union select 'group_3' name, 4 gr_sum from dual union select 'group_4' name, 5 gr_sum from dual union select 'group_5' name, 1 gr_sum from dual ) / MIN(NAM MAX(NAM COUNT(NAME) GR_PROD ------- ------- ----------- ---------- group_1 group_5 5 420

As you can see, it works – even if when you first look at the formula your brains {if you are not a mathematician} try to leak out of your head. Just try and remember what your mathematics teacher said about log books and how, before calculators, they were used to speed up manual long multiplication tasks by converting the task into log addition.

If you want more information on logs, see this discussion about how they are actually about growth or wikipedia if you must :-).

Boneist actually pointed me to this very nice post about using logs in oracle by Anju Parashar, which I have borrowed from.

One issues to be aware of (which is highlighted in Anuj Parashar’s article) is that you can’t get a log of negative values, as a consequence Oracle will give you an ora-01428 error:

select ln(-3) from dual;

select ln(-3) from dual

*

ERROR at line 1:

ORA-01428: argument ‘-3′ is out of range

Anuj gives a version of code that works if all values are negative, below I have one that copes with any number of negatives. Basically, you convert all the values to be multiplied to positive values and then make it negative if the count of negative values is odd. Mathematically, the result of a multiplication can ONLY be negative if there are an odd number of negative values.

,EXP (SUM (LN (abs(gr_sum))))

*decode (mod(sum(decode(sign(gr_sum),0,0,1,0, 1)),2)

,0,1,-1) correct_gr_prod

I’m sure that the above expression could be simplified, but I have to go and do the day job.

Finally, here is a little set of test cases covering the above, so you can play with this.

mdw1123> select 3*7*4*5*1 from dual; 3*7*4*5*1 ---------- 420 1 row selected. mdw1123> -- mdw1123> select 'group_1' name, 3 gr_sum from dual 2 union 3 select 'group_2' name, 7 gr_sum from dual 4 union 5 select 'group_3' name, 4 gr_sum from dual 6 union 7 select 'group_4' name, 5 gr_sum from dual 8 union 9 select 'group_5' name, 1 gr_sum from dual 10 / NAME GR_SUM ------- ---------- group_1 3 group_2 7 group_3 4 group_4 5 group_5 1 5 rows selected. mdw1123> mdw1123> select min(name),max(name),count(name) 2 ,EXP (SUM (LN (gr_sum))) gr_prod 3 from 4 (select 'group_1' name, 3 gr_sum from dual 5 union 6 select 'group_2' name, 7 gr_sum from dual 7 union 8 select 'group_3' name, 4 gr_sum from dual 9 union 10 select 'group_4' name, 5 gr_sum from dual 11 union 12 select 'group_5' name, 1 gr_sum from dual 13 ) 14 / MIN(NAM MAX(NAM COUNT(NAME) GR_PROD ------- ------- ----------- ---------- group_1 group_5 5 420 1 row selected. mdw1123> -- mdw1123> -- now with a negative mdw1123> select 'group_1' name, 3 gr_sum from dual 2 union 3 select 'group_2' name, -7 gr_sum from dual 4 union 5 select 'group_3' name, 4 gr_sum from dual 6 union 7 select 'group_4' name, 5 gr_sum from dual 8 union 9 select 'group_5' name, 1 gr_sum from dual 10 / NAME GR_SUM ------- ---------- group_1 3 group_2 -7 group_3 4 group_4 5 group_5 1 5 rows selected. mdw1123> -- and if the values contain negatives mdw1123> select min(name),max(name),count(name) 2 ,EXP (SUM (LN (abs(gr_sum)))) gr_prod 3 ,mod(sum(decode(sign(gr_sum),0,0 4 ,1,0 5 , 1) 6 ),2) -- 0 if even number of negatives, else 1 7 modifier 8 ,EXP (SUM (LN (abs(gr_sum)))) 9 *decode (mod(sum(decode(sign(gr_sum),0,0,1,0, 1)),2) 10 ,0,1,-1) correct_gr_prod 11 from 12 (select 'group_1' name, 3 gr_sum from dual 13 union 14 select 'group_2' name, -7 gr_sum from dual 15 union 16 select 'group_3' name, 4 gr_sum from dual 17 union 18 select 'group_4' name, 5 gr_sum from dual 19 union 20 select 'group_5' name, 1 gr_sum from dual 21 ) 22 / MIN(NAM MAX(NAM COUNT(NAME) GR_PROD MODIFIER CORRECT_GR_PROD ------- ------- ----------- ---------- ---------- --------------- group_1 group_5 5 420 1 -420 1 row selected.

That’s very impressive, I wouldn’t have got close, even if you’d hinted that the answer revolved round logs.

Assuming that you had a known number of ‘groups’ or at least no more than N then you could do it with lag/lead in oracle or row preceeding on other platforms bundled into an inline query but the logs solution is probably only way of getting it to work where then number of rows to be processed is unknown, so its a lot cleaner.

Yes, one of my first thoughts was to use Lag and have eg 10 Lag functions looking back up to 10 records and have that as a subquery to a piece of SQL pulling out the the first none-zero column – which would work up until you needed the product of 11 rows.

I would never have thought to use logs to solve the problem – that is an interesting solution.

I coulple of months ago Dom Brooks shared an approach on my blog that could be adapted to solve this problem. Essentially, the solution builds a math formula, and then evaluates the math formula within the SQL statement. The original solution provided by Dom Brooks:

http://hoopercharles.wordpress.com/2011/07/13/how-many-ways-to-solve-this-problem-add-the-sequential-numbers-x-through-y/#comment-3647

Adapted to solve this problem:

That’s pretty cool too. I’ll have to play with that.

Your/Dom’s solution will make more sense to someone reading the code, I think

Another alternative is to use the SQL model clause.

{code}SQL> with data as

2 ( select ‘group_1′ name, 3 gr_sum from dual union all

3 select ‘group_2′ name, 7 gr_sum from dual union all

4 select ‘group_3′ name, 4 gr_sum from dual union all

5 select ‘group_4′ name, 5 gr_sum from dual union all

6 select ‘group_5′ name, 3 gr_sum from dual

7 )

8 , including_product as

9 ( select name

10 , product

11 from data

12 model

13 return updated rows

14 dimension by (rownum r)

15 measures (name,gr_sum,1 product)

16 ( product[any] order by r = nvl(product[cv()-1],1) * gr_sum[cv()]

17 )

18 )

19 select min(name) min_group

20 , max(name) max_group

21 , count(*) count_group

22 , max(product) product

23 from including_product

24 /

MIN_GRO MAX_GRO COUNT_GROUP PRODUCT

——- ——- ———– ———-

group_1 group_5 5 1260

1 row selected.{code}

Hi Rob,

Thanks for that, I’m really happy that someone has provided a solution based on model.

I have to confess that my colleague and I briefly touched on using model but backed away from the idea as we have both only ever used it in the most simple way, and a few years ago when it first appeared.

We were not brave enough to try it!

If the records come from different TABLEs, of course, a cartesian join does it for you:

SELECT COUNT(*) FROM

(SELECT 1 FROM (select ‘group_1′ name, 3 gr_sum from dual) CONNECT BY LEVEL <= gr_sum),

(SELECT 1 FROM (select 'group_2' name, 7 gr_sum from dual) CONNECT BY LEVEL <= gr_sum),

(SELECT 1 FROM (select 'group_3' name, 4 gr_sum from dual) CONNECT BY LEVEL <= gr_sum),

(SELECT 1 FROM (select 'group_4' name, 5 gr_sum from dual) CONNECT BY LEVEL <= gr_sum),

(SELECT 1 FROM (select 'group_5' name, 1 gr_sum from dual) CONNECT BY LEVEL <= gr_sum);

Of course, if you're not thinking, you might come up with the following and think it actually works. :)

WITH

Data

AS

(

select 'group_1' name, 3 gr_sum from dual union

select 'group_2' name, 7 gr_sum from dual union

select 'group_3' name, 4 gr_sum from dual union

select 'group_4' name, 5 gr_sum from dual union

select 'group_5' name, 1 gr_sum from dual

),

Silliness

AS

(

SELECT

COUNT(*) x

FROM

Data

CONNECT BY

LEVEL < gr_sum

)

SELECT

MIN(x)

- COUNT(*)

- SUM(gr_sum) "Really?"

FROM

Silliness,

Data;

Thanks Brian – how would you cope with there being a varying number of tables to get the information from?

How would i cope? Don’t know. I was just mentioning the “obvious” answer, as joins without WHERE clauses is (logically) the starting point of any query with multiple TABLEs.

I must be an old, bitter man, since I learned math in highschool, I learned math at Uni, AND I remember lots of it. Upon reading the question, I immediately typed

$ sqlplus scott/tiger

SQL> select exp(sum(ln(deptno))) from dept;

Then again, I also own a selection of some 20 slide rules that I also remember how to use, and occasionally will use.

/Bjørn.

Hi Bjorn,

Good for you :-) I have never been shown how to use a slide rule.

I do take a perverse pleasure in using Oracle on half a million pounds worth of equipment as a simple calculator, rather than get the little Casio out of my top drawer

Nice solutions. How about the user-defined aggregate function (possibly ruled out by “No PL/SQL”?)

SQL>create or replace type aggr_product as object

2 ( nb number,

3 static function ODCIAggregateInitialize(sctx IN OUT aggr_product) return number,

4 member function ODCIAggregateIterate(self IN OUT aggr_product, value IN NUMBER) return number,

5 member function ODCIAggregateTerminate(self IN aggr_product, returnValue OUT number, flags IN number) return number,

6 member function ODCIAggregateMerge(self IN OUT aggr_product, ctx2 IN aggr_product) return number

7 )

8 /

Type created.

SQL>create or replace type body aggr_product is

2 static function ODCIAggregateInitialize(sctx IN OUT aggr_product) return number is

3 begin

4 sctx := aggr_product(1);

5 return ODCIConst.Success;

6 end;

7

8 member function ODCIAggregateIterate(self IN OUT aggr_product, value IN NUMBER) return number is

9 begin

10 self.nb:=self.nb*value;

11 return ODCIConst.Success;

12 end;

13

14 member function ODCIAggregateTerminate(self IN aggr_product, returnValue OUT number, flags IN number) return number is

15 begin

16 returnValue := self.nb;

17 return ODCIConst.Success;

18 end;

19

20 member function ODCIAggregateMerge(self IN OUT aggr_product, ctx2 IN aggr_product) return number is

21 begin

22 self.nb := ctx2.nb;

23 return ODCIConst.Success;

24 end;

25 end;

26 /

Type body created.

SQL>CREATE OR REPLACE FUNCTION product (input number) RETURN number

2 PARALLEL_ENABLE AGGREGATE USING aggr_product;

3 /

Function created.

SQL>SELECT MIN(NAME),MAX(NAME),count(NAME)

2 ,product(gr_sum) gr_prod

3 from

4 (select 'group_1' name, 3 gr_sum from dual

5 union

6 select 'group_2' name, 7 gr_sum from dual

7 union

8 select 'group_3' name, 4 gr_sum from dual

9 union

10 select 'group_4' name, 5 gr_sum from dual

11 union

12 select 'group_5' name, 1 gr_sum from dual

13 )

14 /

MIN(NAM MAX(NAM COUNT(NAME) GR_PROD

------- ------- ----------- ----------

group_1 group_5 5 420

I have to admit, the natural log solution is pretty darn cool, but as far as practical solutions go, John Howard’s is the approach I’d go with. However I’m fairly certain that line 22 should be:

self.nb := self.nb * ctx2.nb;

The ODCIAggregateMerge call is for when the SQL engine was working on two groups of data separately (during parallel execution, for example) and now wants to combine one result set with another. That would require multiplying self with the value being merged (ctx2 in this case).

Thanks Mark. Yeah, the natural log method really made me smile, but either John’s or Dom’s approach are probably the best to consider applying on a client site. Whatever you put in place, there has to be a reasonable chance that an average developer type is going to understand it after the event and the log stuff is just a bit too mathematical. Very cool but, unless I searched my own blog, in 5 years time I would probably just go “wow! How does THAT work!”

Dang, missed this for several days. Yes, nice answer but not allowed for my client as it used PL/SQL. Plus, I have to say, my personal PL/SQL skills are below me really understanding all of this. I need to brush up on my PL/SQL!!!! Many hanks for the input.