Be Hai Nguyen

Posted on Nov 15, 2022

Python: executing PostgreSQL stored functions which return multiple result sets.

#python #postgres #storedprocedures #resultsets

PostgreSQL stored functions can return multiple result sets. In this post, we're looking at calling such stored function; for each returned record set, retrieving its column names and records, then store these in a structure of our own choosing. We're using SQLAlchemy, and the psycopg2 connector library.

This SQLAlchemy page PostgreSQL lists several connector packages. I'm using psycopg2 2.9.57, since it is on top of the list, and also it seems to be used by everybody else.

I am replicating the MySQL database, and the stored procedure discussed in Python: executing MySQL stored procedures which return multiple result sets over to PostgreSQL.

To recap, the source database is the MySQL test data released by Oracle Corporation. Downloadable from https://github.com/datacharmer/test_db. I've migrated it over to PostgreSQL as discussed in pgloader Docker: migrating from Docker & localhost MySQL to localhost PostgreSQL, please see command ❷.

The stored function in this post is exactly identical to the one in MySQL. That is:

Each returned record set is in its own list. And in this list:
- The first element is the list of column names.
- The subsequent elements are lists of data records.
All record set lists are stored within a another list.
When there is no data, extract only column names for each record set.

I've not done PostgreSQL before. I've commercial experiences with Microsoft SQL Server, Oracle, InterBase and MySQL: the PostgreSQL learning curve is not too steep for me... Before writing this post, I've finished porting fourteen ( 14 ) stored methods in MySQL to PostgreSQL for the project I'm currently working on. The objective is to get the server application supports both MySQL and PostgreSQL seamlessly. None of the stored method I've ported returns multiple result sets, I'm looking at this issue just for my own understanding. Principal references used in this post are:

Postgres function: return multiple tables -- please see the accepted answer. The stored function does not have any named cursor in the argument list, just RETURNS setof refcursor.
Is there a proper way to handle cursors returned from a postgresql function in psycopg? This is not about multiple result sets, but about using named cursor. I.e. cur1.callproc('reffunc', ['curname']). I'm not sure if this is a correct answer ( it was accepted ) or not, but it does give me some more information to experiment with, especially ['curname'].
Get a list of column names from a psycopg2 cursor executing stored proc? -- please see the answer by user piro on Mar 1, 2018. He's one of the maintainers of the psycopg2 2.9.57 library.
Also, the Python codes presented by the post initiator give me information on how to work the named cursor, too.

It is a bit hard to articulate, but these three ( 3 ) posts have given me enough to do my own experiments: I don't like the idea of having to specify a list of returned named cursors in the manner of reference posts 2 ( two ) and 3 ( three ). I like the flexibility in reference post 1 ( one ) much better.

This is my first version of the stored method:

create or replace function DemoStoredProc1( pm_dept_no varchar(4) )
returns setof refcursor
language plpgsql
as
$$
declare 
  c1 refcursor;
  c2 refcursor;
begin
  open c1 for 
  select * from departments where dept_no = pm_dept_no;
  return next c1;

  open c2 for   
  select * from dept_manager where dept_no = pm_dept_no;
  return next c2;
end;
$$

I test it within pgAdmin with:

select * from DemoStoredProc1( 'd001' );

And the output is:

demostoredproc1 refcursor
<unnamed portal 1>;
<unnamed portal 2>;

From what I can gather, each of these unnamed portal refcursor is called a server side cursor?

After some experimentations, this is the resulting Python codes:

from sqlalchemy import create_engine
from contextlib import closing

engine = create_engine( 'postgresql+psycopg2://postgres:secret-password@localhost/employees', echo = False )
connection = engine.raw_connection()

def collect_result_set( connection, named_cursor: str, data: list ) -> None:
    cursor = connection.cursor( named_cursor )

    dataset = []
    row = cursor.fetchone()
    columns = [ column[0] for column in cursor.description ]
    dataset.append( columns )

    if ( row != None ): 
        dataset.append( list(row) )
        for row in cursor:
            dataset.append( list(row) )

    data.append( dataset )
    cursor.close()

try:
    with closing( connection.cursor() ) as cursor:
        cursor.callproc( 'DemoStoredProc1', [ 'd001' ] )

        data = []       
        for cur in cursor:             
            collect_result_set( connection, cur[0], data )

        cursor.close()

        import pprint
        print( '\n' )
        pprint.pprint( data )

except Exception as e:
    print( f'Exception. Type {type(e)}: {str(e)}' )
finally:
    if 'connection' in locals():
        connection.close()

def collect_result_set( connection, named_cursor: str, data: list ) -> None:

to get column names, after getting the server side cursor, as per reference post 3 ( three ) above, we need to call .fetchone() to make the column names available. After this, the column names are available regardless if there is any data or not:

columns = [ column[0] for column in cursor.description ]

The above one liner we've seen before in the post for MySQL. Parameter named_cursor: str is my own try, and my own lucky guess! In the main loop, if we add the print( cur, cur[0], type(cur[0]) ) statement:

        for cur in cursor: 
            print( cur, cur[0], type(cur[0]) )

We would get the following printouts:

('<unnamed portal 1>;',) <unnamed portal 1>; <class 'str'>;
('<unnamed portal 2>;',) <unnamed portal 2>; <class 'str'>;

Based on the information given in reference posts 1 ( one ) and 2 ( two ), I thought I should try cur[0] -- and it just happens to work.

I try another implementation of the above stored method, using named cursor arguments, as per reference posts 2 ( two ) and 3 ( three ):

create or replace function DemoStoredProc2( pm_dept_no varchar(4),
    c1 refcursor, c2 refcursor )
returns setof refcursor
language plpgsql
as
$$
begin
  open c1 for 
  select * from departments where dept_no = pm_dept_no;
  return next c1;

  open c2 for   
  select * from dept_manager where dept_no = pm_dept_no;
  return next c2;
end;
$$

And following is the Python implementation:

<pre>
from sqlalchemy import create_engine
from contextlib import closing

engine = create_engine( 'postgresql+psycopg2://postgres:secret-password@localhost/employees', echo = False )
connection = engine.raw_connection()

def collect_result_set( connection, named_cursor: str, data: list ) -> None:
    cursor = connection.cursor( named_cursor )

    dataset = []
    row = cursor.fetchone()
    columns = [ column[0] for column in cursor.description ]
    dataset.append( columns )

    if ( row != None ): 
        dataset.append( list(row) )
        for row in cursor:
            dataset.append( list(row) )

    data.append( dataset )
    cursor.close()

try:
    with closing( connection.cursor() ) as cursor:
        cursor.callproc( 'DemoStoredProc2', [ 'd001', 'c1', 'c2' ] )

        data = []

        collect_result_set( connection, 'c1', data )
        collect_result_set( connection, 'c2', data )

        cursor.close()

        import pprint
        print( '\n' )
        pprint.pprint( data )

except Exception as e:
    print( f'Exception. Type {type(e)}: {str(e)}' )
finally:
    if 'connection' in locals():
        connection.close()

def collect_result_set( connection, named_cursor: str, data: list ) -> None:

remains unchanged in this implementation. Only the main block has to be changed: method collect_result_set( ... ) is called explicitly with the two named cursors. That means, if the stored method must return another additional result set, the codes of the main block must be modified to handle this new result set:

-- This is the reason why I like the first implementation better.

Both of these implementations return the same result:

[
    [
        ['dept_no', 'dept_name'], 
        ['d001', 'Marketing']
    ],
    [
        ['emp_no', 'dept_no', 'from_date', 'to_date'],
        [110022, 'd001', datetime.date(1985, 1, 1), datetime.date(1991, 10, 1)],
        [110039, 'd001', datetime.date(1991, 10, 1), datetime.date(9999, 1, 1)]
    ]
]

Changed d001 to, for example, dxx1, we will get:

[
    [
        ['dept_no', 'dept_name']
    ], 
    [
        ['emp_no', 'dept_no', 'from_date', 'to_date']
    ]
]

✿✿✿

I find this interesting and helpful. I hope you do too. Thank you for reading and stay safe as always.

Top comments (1)

Ghayas Abou Fakher • Oct 15 '23

You have to fetch the result into your refcursor using:
select * from DemoStoredProc1( 'd001' )
FETCH ALL IN "c1 ";

DEV Community

Python: executing PostgreSQL stored functions which return multiple result sets.

Top comments (1)

Read next

Chatbot with Semantic Kernel - Part 3: Inspector & tokens 🔎

How to Install PySpark on Your Local Machine

Vedro Hooks

7 Powerful Python Metaprogramming Techniques for Dynamic Code