DEV Community

slavong
slavong

Posted on

Columns to Rows in Oracle SQL

Based on the amount of posts in the web and questions from my colleagues, this seems to be a quite common challenge. That's why I decided to write this post with the goal of going beyond what you typically find in stackoverflow answers and comments.

So what is the problem? We have a column that contains some list and we want to turn it into rows. So from this ...

col
a,b,c

we want to get this ...

result
a
b
c

I already hear you saying: "But this is not a good design!". True, first normal form (1.NF) does not allow the source table to look like above. But the reality check says, that not all developers (and data modellers) know 1.NF and even less follow it, often for good reasons, e.g. to build some kind of staging area in their database.

Let us start with the test data. We will use Oracle dual "table" for this purpose

SELECT 'a,b,c' AS col 
FROM dual
Enter fullscreen mode Exit fullscreen mode

and to improve readability of the SQL statements below, we'll pack it into a WITH clause

WITH test AS (
SELECT 'a,b,c' AS col 
FROM dual )
SELECT * FROM test
Enter fullscreen mode Exit fullscreen mode

The most common solution is to use REGEXP_SUBSTR and REGEXP_COUNT functions to achieve the expected result

WITH test AS (
SELECT 'a,b,c' AS col 
FROM dual )
SELECT LEVEL AS n, REGEXP_SUBSTR(col, '[^,]+', 1, LEVEL) AS out
FROM test
CONNECT BY LEVEL <= REGEXP_COUNT(col, ',') + 1
Enter fullscreen mode Exit fullscreen mode

We get

N OUT
1 a
2 b
3 c

But this "solution" has one big drawback. Let's try to put more than just one row into test data

WITH test AS (
SELECT 'a,b,c' AS col
FROM dual 
UNION ALL
SELECT 'd,e,f' AS col 
FROM dual )
SELECT LEVEL AS n, REGEXP_SUBSTR(col, '[^,]+', 1, LEVEL) AS out 
FROM test
CONNECT BY LEVEL <= REGEXP_COUNT(col, ',') + 1
ORDER BY n, out
Enter fullscreen mode Exit fullscreen mode

NOTE: ORDER BY was also added as last line to the query above to improve readability of the output.

The output is not what is expected

N OUT
1 a
1 d
2 b
2 b
2 e
2 e
3 c
3 c
3 c
3 c
3 f
3 f
3 f
3 f

What's going on here? Data in second level is duplicated and on third level it even quadrupled. What's wrong? In order to find out, we add function call for SYS_CONNECT_BY_PATH

WITH test AS (
SELECT 'a,b,c' AS col
FROM dual 
UNION ALL
SELECT 'd,e,f' AS col 
FROM dual )
SELECT LEVEL AS n, REGEXP_SUBSTR(col, '[^,]+', 1, LEVEL) AS out,
SYS_CONNECT_BY_PATH(LEVEL || '+' || REGEXP_SUBSTR(col, '[^,]+', 1, LEVEL), '/') as path
FROM test
CONNECT BY LEVEL <= REGEXP_COUNT(col, ',') + 1
ORDER BY n, out
Enter fullscreen mode Exit fullscreen mode
N OUT PATH
1 a /1+a
1 d /1+d
2 b /1+a/2+b
2 b /1+d/2+b
2 e /1+d/2+e
2 e /1+a/2+e
3 c /1+d/2+b/3+c
3 c /1+a/2+b/3+c
3 c /1+d/2+e/3+c
3 c /1+a/2+e/3+c
3 f /1+d/2+b/3+f
3 f /1+d/2+e/3+f
3 f /1+a/2+b/3+f
3 f /1+a/2+e/3+f

Output shows, that the query is traversing the path and is jumping from one test data row to the other. I tried using some features like NOCYCLE or adding a unique column to the test data to fix this, but without success. If you have a solution, please let me know.

But there is a solution, in my opinion even more elegant. Welcome to CROSS APPLY join clause.

WITH test AS (
SELECT 'a,b,c' AS col
FROM dual 
UNION ALL
SELECT 'd,e,f' AS col 
FROM dual )
SELECT *
FROM test t
CROSS APPLY 
(SELECT LEVEL AS n, REGEXP_SUBSTR(t.col, '[^,]+', 1, LEVEL) AS out
FROM dual
CONNECT BY LEVEL <= REGEXP_COUNT(t.col, ',') + 1)
ORDER BY n, out
Enter fullscreen mode Exit fullscreen mode

The output is exactly as expected

COL N OUT
a,b,c 1 a
d,e,f 1 d
a,b,c 2 b
d,e,f 2 e
a,b,c 3 c
d,e,f 3 f

I have a follow up to this article in my mind on how to enhance the query for source data containing strings (or even CLOBs), that cannot be easily parsed by regular expressions. So stay tuned!

That's it, folks! at least for now.

Top comments (0)