Samuel Nitsche

Posted on Sep 12, 2019 • Originally published at cleandatabase.wordpress.com on Aug 30, 2019

Concatenate a List of Strings in PL/SQL

#100codeexamples #plsql #oracle

I ran into a requirement today where I had to combine a number of strings (varchars) into a delimited list. Some strings would be NULL and should be ignored.

I played around and found a bunch of different approaches which I’d like to share.

Let’s assume we have a package info_util with the function person which will combine a number of information about a person to a delimited list (to simplify the example all information is passed by parameter):

dbms_output.put_line(info_util.person(
  i_name => 'Luke Skywalker',
  i_alignment => 'light',
  i_comment => 'Most powerful jedi of all times'));
-- Output: Luke Skywalker, light, Most powerful jedi of all times

dbms_output.put_line(info_util.person(
  i_name => 'Vader',
  i_title => 'Darth',
  i_alignment => 'dark',
  i_comment => 'Pretty evil'));
-- Output: Vader, Darth, dark, Pretty evil

Basic String concatenation

The simplest solution is just to go over everything with some if-logic and concatenating the strings

function person_concatenate(
  i_name varchar2,
  i_title varchar2,
  i_alignment varchar2,
  i_comment varchar2
) return varchar2 deterministic
as
  l_result varchar2(32000);
  begin
    if i_name is not null then
      l_result := l_result || i_name;
    end if;

    if i_title is not null then
      if l_result is not null then
        l_result := l_result || ', ';
      end if;
      l_result := l_result || i_title;
    end if;

    if i_alignment is not null then
      if l_result is not null then
        l_result := l_result || ', ';
      end if;
      l_result := l_result || i_alignment;
    end if;

    if i_comment is not null then
      if l_result is not null then
        l_result := l_result || ', ';
      end if;
      l_result := l_result || i_comment;
    end if;

    return substr(l_result, 1, 4000);
  end;

But this is very hard to read, understand and maintain. If you don’t believe me, how long do you need to make sure there’s no difference between handling of i_alignment and i_comment?

Its also tedious and boring.

Using LISTAGG

The LISTAGG functionality does more or less exactly what we want here, so why don’t we use it?

function person_listagg(
  i_name varchar2,
  i_title varchar2,
  i_alignment varchar2,
  i_comment varchar2
) return varchar2 deterministic
as
  /* We have to collect all parts
     in a nested table collection first
   */
  l_parts sys.odcivarchar2list := new sys.odcivarchar2list();
  l_result varchar2(4000);

  /* This function makes our code more
     readable: just extend the collection and add an item
   */
  procedure add_part( i_string varchar2 )
  as
    begin
      l_parts.extend;
      l_parts(l_parts.last) := i_string;
    end;
  begin
    /* Now we can simple add the parts */
    add_part(i_name);
    add_part(i_title);
    add_part(i_alignment);
    add_part(i_comment);

    /* And can use LISTAGG to aggregate */
    select listagg(column_value, ', ') within group (order by rownum)
      into l_result
      from table(l_parts);

    return l_result;
  end;

The main problem here is that it will probably be very slow due to the PL/SQL -> SQL context switches.

Concatenate in a PL/SQL Loop

So maybe we can handle the concatenation in a PL/SQL Loop instead of using LISTAGG.

function person_plsql_comb(
  i_name varchar2,
  i_title varchar2,
  i_alignment varchar2,
  i_comment varchar2
) return varchar2 deterministic
as
  /* So lets take the thing that improves readability */
  l_parts sys.odcivarchar2list := new sys.odcivarchar2list();
  l_result varchar2(4000);
  procedure add_part( i_string varchar2 )
  as
    begin
      l_parts.extend;
      l_parts(l_parts.last) := i_string;
    end;
  begin
    add_part(i_name);
    add_part(i_title);
    add_part(i_alignment);
    add_part(i_comment);

    /* But handle the concatenation in PL/SQL */
    for i in l_parts.first..l_parts.last loop
      if l_parts(i) is not null then
        if l_result is not null then
          l_result := l_result || ', ';
        end if;
        l_result := l_result || l_parts(i);
      end if;
    end loop;

    return l_result;
  end;

That looks kinda nice to me. But what about the performance?

The following code runs the function under test 20’000 times (10’000 for each use-case). To eliminate any advantage of the DETERMINISTIC keyword, I added the run to each name, so no caching involved.

declare
  l_ts timestamp := current_timestamp;
  l_info varchar2(4000);
begin
    for i in 1..10000 loop
        l_info := info_util.person_concatenate('Luke Skywalker'||to_char(i), null, 'light', 'Most powerful jedi of all times');
        l_info := info_util.person_concatenate('Vader'||to_char(i), 'Darth', 'dark', 'Pretty evil');
    end loop;
    dbms_output.put_line('Basic concatenation: ' || to_char(current_timestamp-l_ts));
end;
/

So we see that the context-switches of LISTAGG indeed slow down our functionality. The looping and collection initialization stuff also comes at a significant performance cost, but it’s by far not that bad.

What if we use the DETERMINISTIC advantage and basically just call 2 different statements over and over again?

There is hardly any difference, which means that if your data is homogenous, the additional performance cost of the Collection-with-Loop approach won’t be noticable (Performance always depends on your data).

PL/SQL Object

What we could do now is to extract the whole aggregation part into a PL/SQL object (and yes, of course we can improve the readability and still not use objects, but they are very handy, especially in these cases):

create or replace type t_string_aggregator force is object
(
  /* The nested table is now part of our object */
  c_parts sys.odcivarchar2list,

  constructor function t_string_aggregator return self as result,
  member procedure add_string( i_string varchar2 ),
  member function get_aggregate( i_delimiter varchar2 default ', ' )
    return varchar2
);
/

create or replace type body t_string_aggregator as
  constructor function t_string_aggregator return self as result
  as
    begin
      /* Lets not forget to initialize the collection */
      c_parts := new sys.odcivarchar2list();
      return;
    end;

  /* This is basically the same as the internal procedure
     we used in the other approaches
   */
  member procedure add_string( i_string varchar2 )
  as
    begin
      c_parts.extend;
      c_parts(c_parts.last) := i_string;
    end;

  /* We can even make the delimiter dynamic */
  member function get_aggregate( i_delimiter varchar2 )
    return varchar2
  as
    l_result varchar2(4000);
    begin
      /* Little tweak if we dont have any items */
      if c_parts.count < 0 then
        return null;
      end if;

      for i in c_parts.first..c_parts.last loop
          if c_parts(i) is not null then
            if l_result is not null then
                  l_result := l_result || i_delimiter;
              end if;
            l_result := l_result || c_parts(i);
          end if;
      end loop;

      return l_result;
    end;
end;
/

The final function would then look like this:

function person_plsql_obj(
  i_name varchar2,
  i_title varchar2,
  i_alignment varchar2,
  i_comment varchar2
) return varchar2 deterministic
as
  l_aggregator t_string_aggregator := new t_string_aggregator();
  begin
    l_aggregator.add_string(i_name);
    l_aggregator.add_string(i_title);
    l_aggregator.add_string(i_alignment);
    l_aggregator.add_string(i_comment);

    return l_aggregator.get_aggregate();
  end;

Pretty small and readable once we know what t_string_aggregator is good for, right?

Performance-wise, it is about 10% slower than the PL/SQL Loop approach – which again is not noticable when using the DETERMINISTIC advantage – depending on your data.

_ Warning: This solution has severe problems once the sum of the string-parts exceed 4000 chars. But that’s another topic and another example._

You can run the full example on LiveSQLand get the sourcecode from my GitHub repository.

Thoughts

So, why do I like this last approach via PL/SQL Objects so much?

From my point of view, it modularizes and encapsulates my functionality pretty good. This leads to the following advantages:

it’s easier to understand when I use it in other places
it has a clearly defined scope
it is easier to expand (e.g. overflow-mechanics)
it enables DRY (Don’t repeat yourself)
it is very easy to test.

Looking forward to hear what you folks think about it.

DEV Community