I ran into a requirement today where I had to combine a number of strings (varchars) into a delimited list. Some strings would be NULL and should be ignored.
I played around and found a bunch of different approaches which I’d like to share.
Let’s assume we have a package info_util
with the function person
which will combine a number of information about a person to a delimited list (to simplify the example all information is passed by parameter):
dbms_output.put_line(info_util.person(
i_name => 'Luke Skywalker',
i_alignment => 'light',
i_comment => 'Most powerful jedi of all times'));
-- Output: Luke Skywalker, light, Most powerful jedi of all times
dbms_output.put_line(info_util.person(
i_name => 'Vader',
i_title => 'Darth',
i_alignment => 'dark',
i_comment => 'Pretty evil'));
-- Output: Vader, Darth, dark, Pretty evil
Basic String concatenation
The simplest solution is just to go over everything with some if-logic and concatenating the strings
function person_concatenate(
i_name varchar2,
i_title varchar2,
i_alignment varchar2,
i_comment varchar2
) return varchar2 deterministic
as
l_result varchar2(32000);
begin
if i_name is not null then
l_result := l_result || i_name;
end if;
if i_title is not null then
if l_result is not null then
l_result := l_result || ', ';
end if;
l_result := l_result || i_title;
end if;
if i_alignment is not null then
if l_result is not null then
l_result := l_result || ', ';
end if;
l_result := l_result || i_alignment;
end if;
if i_comment is not null then
if l_result is not null then
l_result := l_result || ', ';
end if;
l_result := l_result || i_comment;
end if;
return substr(l_result, 1, 4000);
end;
But this is very hard to read, understand and maintain. If you don’t believe me, how long do you need to make sure there’s no difference between handling of i_alignment
and i_comment
?
Its also tedious and boring.
Using LISTAGG
The LISTAGG
functionality does more or less exactly what we want here, so why don’t we use it?
function person_listagg(
i_name varchar2,
i_title varchar2,
i_alignment varchar2,
i_comment varchar2
) return varchar2 deterministic
as
/* We have to collect all parts
in a nested table collection first
*/
l_parts sys.odcivarchar2list := new sys.odcivarchar2list();
l_result varchar2(4000);
/* This function makes our code more
readable: just extend the collection and add an item
*/
procedure add_part( i_string varchar2 )
as
begin
l_parts.extend;
l_parts(l_parts.last) := i_string;
end;
begin
/* Now we can simple add the parts */
add_part(i_name);
add_part(i_title);
add_part(i_alignment);
add_part(i_comment);
/* And can use LISTAGG to aggregate */
select listagg(column_value, ', ') within group (order by rownum)
into l_result
from table(l_parts);
return l_result;
end;
The main problem here is that it will probably be very slow due to the PL/SQL -> SQL context switches.
Concatenate in a PL/SQL Loop
So maybe we can handle the concatenation in a PL/SQL Loop instead of using LISTAGG.
function person_plsql_comb(
i_name varchar2,
i_title varchar2,
i_alignment varchar2,
i_comment varchar2
) return varchar2 deterministic
as
/* So lets take the thing that improves readability */
l_parts sys.odcivarchar2list := new sys.odcivarchar2list();
l_result varchar2(4000);
procedure add_part( i_string varchar2 )
as
begin
l_parts.extend;
l_parts(l_parts.last) := i_string;
end;
begin
add_part(i_name);
add_part(i_title);
add_part(i_alignment);
add_part(i_comment);
/* But handle the concatenation in PL/SQL */
for i in l_parts.first..l_parts.last loop
if l_parts(i) is not null then
if l_result is not null then
l_result := l_result || ', ';
end if;
l_result := l_result || l_parts(i);
end if;
end loop;
return l_result;
end;
That looks kinda nice to me. But what about the performance?
The following code runs the function under test 20’000 times (10’000 for each use-case). To eliminate any advantage of the DETERMINISTIC
keyword, I added the run to each name, so no caching involved.
declare
l_ts timestamp := current_timestamp;
l_info varchar2(4000);
begin
for i in 1..10000 loop
l_info := info_util.person_concatenate('Luke Skywalker'||to_char(i), null, 'light', 'Most powerful jedi of all times');
l_info := info_util.person_concatenate('Vader'||to_char(i), 'Darth', 'dark', 'Pretty evil');
end loop;
dbms_output.put_line('Basic concatenation: ' || to_char(current_timestamp-l_ts));
end;
/
| Basic concatenation | 0.02 seconds |
| LISTAGG | 0.8 seconds |
| PL/SQL Loop | 0.1 seconds |
So we see that the context-switches of LISTAGG indeed slow down our functionality. The looping and collection initialization stuff also comes at a significant performance cost, but it’s by far not that bad.
What if we use the DETERMINISTIC
advantage and basically just call 2 different statements over and over again?
| Basic concatenation | 0.0008 seconds |
| PL/SQL Loop | 0.0008 seconds |
There is hardly any difference, which means that if your data is homogenous, the additional performance cost of the Collection-with-Loop approach won’t be noticable (Performance always depends on your data).
PL/SQL Object
What we could do now is to extract the whole aggregation part into a PL/SQL object (and yes, of course we can improve the readability and still not use objects, but they are very handy, especially in these cases):
create or replace type t_string_aggregator force is object
(
/* The nested table is now part of our object */
c_parts sys.odcivarchar2list,
constructor function t_string_aggregator return self as result,
member procedure add_string( i_string varchar2 ),
member function get_aggregate( i_delimiter varchar2 default ', ' )
return varchar2
);
/
create or replace type body t_string_aggregator as
constructor function t_string_aggregator return self as result
as
begin
/* Lets not forget to initialize the collection */
c_parts := new sys.odcivarchar2list();
return;
end;
/* This is basically the same as the internal procedure
we used in the other approaches
*/
member procedure add_string( i_string varchar2 )
as
begin
c_parts.extend;
c_parts(c_parts.last) := i_string;
end;
/* We can even make the delimiter dynamic */
member function get_aggregate( i_delimiter varchar2 )
return varchar2
as
l_result varchar2(4000);
begin
/* Little tweak if we dont have any items */
if c_parts.count < 0 then
return null;
end if;
for i in c_parts.first..c_parts.last loop
if c_parts(i) is not null then
if l_result is not null then
l_result := l_result || i_delimiter;
end if;
l_result := l_result || c_parts(i);
end if;
end loop;
return l_result;
end;
end;
/
The final function would then look like this:
function person_plsql_obj(
i_name varchar2,
i_title varchar2,
i_alignment varchar2,
i_comment varchar2
) return varchar2 deterministic
as
l_aggregator t_string_aggregator := new t_string_aggregator();
begin
l_aggregator.add_string(i_name);
l_aggregator.add_string(i_title);
l_aggregator.add_string(i_alignment);
l_aggregator.add_string(i_comment);
return l_aggregator.get_aggregate();
end;
Pretty small and readable once we know what t_string_aggregator
is good for, right?
Performance-wise, it is about 10% slower than the PL/SQL Loop approach – which again is not noticable when using the DETERMINISTIC
advantage – depending on your data.
_ Warning: This solution has severe problems once the sum of the string-parts exceed 4000 chars. But that’s another topic and another example._
You can run the full example on LiveSQLand get the sourcecode from my GitHub repository.
Thoughts
So, why do I like this last approach via PL/SQL Objects so much?
From my point of view, it modularizes and encapsulates my functionality pretty good. This leads to the following advantages:
- it’s easier to understand when I use it in other places
- it has a clearly defined scope
- it is easier to expand (e.g. overflow-mechanics)
- it enables DRY (Don’t repeat yourself)
- it is very easy to test.
Looking forward to hear what you folks think about it.
Top comments (0)