Menu Bar

Friday, November 18, 2011

Flattening Your Data - How to Make Multiple Rows into Columns


"Mercy is the mark of a great man ... <stab> ...Guess I'm just a good one ... <stab> ... Well, I'm alright" - Captain Malcolm Reynolds, Firefly - 'Shindig'.

I recent had to extract some data from my database. The general format of the tables was like so:

Supplier Table:

supplier_code     char(6)
supplier_name     char(40)
supplier address  char(40)
sup_status        char(1)
etc


Colloquial Name Table:

supplier_code     char(6)
colloquial_name   char(40)


Email Address Table:

supplier_code     char(6)
sup_email_address char(40)

The base requirement was to output one supplier per line with the base information on each supplier from the first table and from 0 through 6 colloquial names for the supplier, depending on how many were present. There was also an optional email address. Finally, there was a status code on the table and I had to include every supplier with a status of "3" or less.

I'll be the first to admit that I do not fully understand the depth of the OVER / PARTITION BY syntax but I get the gist of it. I ended up needing to get some help on this one but the end query looked like this:

select a.supplier_code, a.supplier_name, a.supplier_address, 
       d.sup_email_addr, g.colloq_1, g.colloq_2, g.colloq_3, 
       g.colloq_4, g.colloq_5, g.colloq_6
FROM SUPPLIER a
LEFT OUTER JOIN SUP_EMAIL D on a.supplier_code = d.supplier_code
LEFT OUTER JOIN 
(
select a.supplier_code, 
 MAX(CASE WHEN ClqOrder = 1 THEN Colloquial_name ELSE ' ' END) AS colloq_1,
 MAX(CASE WHEN ClqOrder = 2 THEN Colloquial_name ELSE ' ' END) AS colloq_2,
 MAX(CASE WHEN ClqOrder = 3 THEN Colloquial_name ELSE ' ' END) AS colloq_3,
 MAX(CASE WHEN ClqOrder = 4 THEN Colloquial_name ELSE ' ' END) AS colloq_4,
 MAX(CASE WHEN ClqOrder = 5 THEN Colloquial_name ELSE ' ' END) AS colloq_5,
 MAX(CASE WHEN ClqOrder = 6 THEN Colloquial_name ELSE ' ' END) AS colloq_6
FROM (
SELECT s.*, B.Colloquial_name, B.ClqOrder
FROM SUPPLIER S
INNER JOIN
(SELECT supplier_code, Colloquial_Name, 
        ROW_NUMBER() OVER (PARTITION BY supplier_code ORDER BY Colloquial_name) AS ClqOrder
   FROM sup_colloq) B
on S.SUPPLIER_CODE = B.SUPPLIER_CODE) A
GROUP by a.Supplier_CODE
) G on a.supplier_CODE = g.supplier_CODE
where a.sup_status <= '3'

So this all works, but how does it break down ?

For simplicity's sake, and the limited width of the blog page, we're going to shorten all those char(40) columns to something much shorter. Let's start with making the tables and populating some base data.

SQL> create table supplier
  2  (supplier_code char(2),
  3   supplier_name char(5),
  4   supplier_address char(5),
  5   sup_status char(1));

Table created.

SQL> create table sup_colloq
  2  (supplier_code char(2),
  3   colloquial_name char(5));

Table created.

SQL> create table sup_email
  2  (supplier_code char(2),
  3   sup_email_addr char(5));

Table created.

SQL> insert into supplier values ('01','Sup#1','Addr1','1');

1 row created.

SQL> insert into supplier values ('02','Sup#2','Addr2','1');

1 row created.

SQL> insert into supplier values ('03','Sup#3','Addr3','1');

1 row created.

SQL> insert into sup_email values ('01','a@a');

1 row created.

SQL> insert into sup_email values ('02','b@b');

1 row created.

SQL> insert into sup_colloq values('01','Cql01');

1 row created.

SQL> insert into sup_colloq values('01','Cql02');

1 row created.

SQL> insert into sup_colloq values('01','Cql03');

1 row created.

SQL> insert into sup_colloq values('01','Cql04');

1 row created.

SQL> insert into sup_colloq values('01','Cql05');

1 row created.

SQL> insert into sup_colloq values('01','Cql06');

1 row created.

SQL> insert into sup_colloq values('03','Cql07');

1 row created.

SQL>  insert into sup_colloq values('03','Cql08');

1 row created.

Running the big query gives us the expected output:

SQL> select a.supplier_code, a.supplier_name, a.supplier_address, 
  2         d.sup_email_addr, g.colloq_1, g.colloq_2, g.colloq_3, 
  3         g.colloq_4, g.colloq_5, g.colloq_6
  4  FROM SUPPLIER a
  5  LEFT OUTER JOIN SUP_EMAIL D on a.supplier_code = d.supplier_code
  6  LEFT OUTER JOIN 
  7  (
  8  select a.supplier_code, 
  9   MAX(CASE WHEN ClqOrder = 1 THEN Colloquial_name ELSE ' ' END) AS colloq_1,
 10   MAX(CASE WHEN ClqOrder = 2 THEN Colloquial_name ELSE ' ' END) AS colloq_2,
 11   MAX(CASE WHEN ClqOrder = 3 THEN Colloquial_name ELSE ' ' END) AS colloq_3,
 12   MAX(CASE WHEN ClqOrder = 4 THEN Colloquial_name ELSE ' ' END) AS colloq_4,
 13   MAX(CASE WHEN ClqOrder = 5 THEN Colloquial_name ELSE ' ' END) AS colloq_5,
 14   MAX(CASE WHEN ClqOrder = 6 THEN Colloquial_name ELSE ' ' END) AS colloq_6
 15  FROM (
 16  SELECT s.*, B.Colloquial_name, B.ClqOrder
 17  FROM SUPPLIER S
 18  INNER JOIN
 19  (SELECT supplier_code, Colloquial_Name, 
 20          ROW_NUMBER() OVER (PARTITION BY supplier_code ORDER BY Colloquial_name) AS ClqOrder
 21     FROM sup_colloq) B
 22  on S.SUPPLIER_CODE = B.SUPPLIER_CODE) A
 23  GROUP by a.Supplier_CODE
 24  ) G on a.supplier_CODE = g.supplier_CODE
 25  where a.sup_status <= '3'
 26  
SQL> /

SU SUPPL SUPPL SUP_E COLLO COLLO COLLO COLLO COLLO COLLO
-- ----- ----- ----- ----- ----- ----- ----- ----- -----
01 Sup#1 Addr1 a@a   Cql01 Cql02 Cql03 Cql04 Cql05 Cql06
03 Sup#3 Addr3       Cql07 Cql08
02 Sup#2 Addr2 b@b

So now we see what it did, let's break it down. The first small select gets our colloquial names with row numbers in an inline table. These row numbers will help us later.

SQL> SELECT supplier_code, Colloquial_Name, 
  2          ROW_NUMBER() OVER (PARTITION BY supplier_code ORDER BY Colloquial_name) AS ClqOrder
  3     FROM sup_colloq;

SU COLLO   CLQORDER
-- ----- ----------
01 Cql01          1
01 Cql02          2
01 Cql03          3
01 Cql04          4
01 Cql05          5
01 Cql06          6
03 Cql07          1
03 Cql08          2

8 rows selected.

When we grab the next chunk of code, with the CASE / MAX, we can query on each of the ClqOrder columns and turn that into something a little more friendly to what we need:

SQL> select a.supplier_code, 
  2   MAX(CASE WHEN ClqOrder = 1 THEN Colloquial_name ELSE ' ' END) AS colloq_1,
  3   MAX(CASE WHEN ClqOrder = 2 THEN Colloquial_name ELSE ' ' END) AS colloq_2,
  4   MAX(CASE WHEN ClqOrder = 3 THEN Colloquial_name ELSE ' ' END) AS colloq_3,
  5   MAX(CASE WHEN ClqOrder = 4 THEN Colloquial_name ELSE ' ' END) AS colloq_4,
  6   MAX(CASE WHEN ClqOrder = 5 THEN Colloquial_name ELSE ' ' END) AS colloq_5,
  7   MAX(CASE WHEN ClqOrder = 6 THEN Colloquial_name ELSE ' ' END) AS colloq_6
  8  FROM (
  9  SELECT s.*, B.Colloquial_name, B.ClqOrder
 10  FROM SUPPLIER S
 11  INNER JOIN
 12  (SELECT supplier_code, Colloquial_Name, 
 13          ROW_NUMBER() OVER (PARTITION BY supplier_code ORDER BY Colloquial_name) AS ClqOrder
 14     FROM sup_colloq) B
 15  on S.SUPPLIER_CODE = B.SUPPLIER_CODE) A
 16  GROUP by a.Supplier_CODE;

SU COLLO COLLO COLLO COLLO COLLO COLLO
-- ----- ----- ----- ----- ----- -----
01 Cql01 Cql02 Cql03 Cql04 Cql05 Cql06
03 Cql07 Cql08

Finally, adding in the outer joins as show at the top, we pick up the grand combination of everything. This was definitely something I've had to do before and thought it was worth preserving here. It falls squarely int the "for my edification" bucket that was talked about in the first post :)