Mohamed Mokhtar

Posted on Jun 8, 2023

Introduction to RangeTableEntry (RTE) & Query from PostgresQL perspective

#postgres #database #apacheage #opensource

Introduction

In the following post we will introduce you to a new changes we are looking for at AGE (it is a goal that we are looking for so we can build an easy compatibility to feature and new versions of PostgreSQL)

RTE is Range Table Entry; What is it? Why we are introducing that; it is for a project purpose for AGE (Apache Age Extension)

Project description

PG 13 uses PNSI instead of RTE in many functions, PNSI exists in PG11 and PG12 as well. What is worth exploring is that we can replace RTE with PNSI totally in PG11 and PG12 versions. If we can, multiple versions can be closer to each other with minimum changes and further development can be easy as well.

Project goals

Comparing RTE with PNSI in PG12 and PG13
Checking out whether it is possible to replace RTE with PNSI in the newer versions of AGE to keep them compatible, through checking the source code of PG13 and compare it with the parts in PG12 that consists of RTE and check their replacement with PNSI ability.

Project tasks

Research about PNSI and RTE (Next blog will be about PNSI)
List down similarity and differences between both RTE and PNSI (At next blog)
Implement PNSI in one of the cypher clause for test purpose (Another blog)

Research about RTE

Definition

RTE stands for Range Table Entry (In code it is RangeTblEntry)
What exactly range table entry is?
In the query processing after having the parse tree built through the Parser we are going into the Analyzer
The analyzer/analyser runs a semantic analysis of a parse tree generated by the parser and generates a query tree.

On the query parsed we are having something in the query tree called range table it is a list which contains the table which are being used in the FROM statement.
i.e SELECT col_a FROM tbl_a where id = 10;

After parsing the tree (is a Query tree) it will contain a RangeTableList that built as a linked list of RangeTableEntry
Each range table entry points to a table and contains some of the data of the table
What is the struct of that RTE:


typedef struct RangeTblEntry
{
    NodeTag     type;
// Some fields was here I have removed it to concentrate  I will build that struct field by field 
} RangeTblEntry;

That looks very large struct but let's go through it step by step.

NodeTag

The NodeTag is the first field of every created node, wherever that node is created, i.e having a node in the parse tree it must contain a NodeTag to indicate what is the type of that node
Simply it is an ENUM typ
Each phase of the processing has its types so that we will find tags for the EXECUTION phase and tags for PLANNING phase and etc.

src/include/nodes/nodes.h

For our purpose we are looking at the query tree (after running semantic analysis on parse tree, it becomes query tree using the analyzer) nodes specifically

/*
typedef enum NodeTag
{
     * TAGS FOR PARSE TREE NODES (parsenodes.h)
     */
// Some fields was here I have removed it to concentrate  
    T_RangeSubselect,
    T_RangeFunction,
    T_RangeTableSample,
    T_RangeTableFunc,
    T_RangeTableFuncCol,
// Some fields was here I have removed it to concentrate  
    T_RangeTblEntry,
    T_RangeTblFunction,
// Some fields was here I have removed it to concentrate  
}

We can find our RTE listed as one of the types of that enum so here we conclude that RTE is a NODE that's within Query phase
and has that NodeTag to indicate it's type.

typedef struct RangeTblEntry
{
    NodeTag     type;
    RTEKind     rtekind;        
}

RTEKind

/*--------------------
 * RangeTblEntry -
 *    A range table is a List of RangeTblEntry nodes.
 *
 *    A range table entry may represent a plain relation, a sub-select in
 *    FROM, or the result of a JOIN clause.  (Only explicit JOIN syntax
 *    produces an RTE, not the implicit join resulting from multiple FROM
 *    items.  This is because we only need the RTE to deal with SQL features
 *    like outer joins and join-output-column aliasing.)  Other special
 *    RTE types also exist, as indicated by RTEKind.

Rangle Table Entry is a Node pointing to what? Guess what?
pointing to A TABLE but what kind of that table; here RTEKind comes to tell us:

typedef enum RTEKind
{
    RTE_RELATION,               /* ordinary relation reference */
    RTE_SUBQUERY,               /* subquery in FROM */
    RTE_JOIN,                   /* join */
    RTE_FUNCTION,               /* function in FROM */
    RTE_TABLEFUNC,              /* TableFunc(.., column list) */
    RTE_VALUES,                 /* VALUES (<exprlist>), (<exprlist>), ... */
    RTE_CTE,                    /* common table expr (WITH list element) */
    RTE_NAMEDTUPLESTORE,        /* tuplestore, e.g. for AFTER triggers */
    RTE_RESULT                  /* RTE represents an empty FROM clause; such
                                 * RTEs are added by the planner, they're not
                                 * present during parsing or rewriting */
} RTEKind;

What else in RTE

    Oid         relid;          /* OID of the relation */
    char        relkind;        /* relation kind (see pg_class.relkind) */
    int         rellockmode;    /* lock level that query requires on the rel */
    struct TableSampleClause *tablesample;  /* sampling info, or NULL */

Oid: is the object id of the relation/table
I think it is self descried by the comments

Other some fields are specifically for some RTEKinds I will not mention them I will mention the rest only which are valid for all RTEKinds

/*
     * Fields valid in all RTEs:
     */
    Alias      *alias;          /* user-written alias clause, if any */
    Alias      *eref;           /* expanded reference names */
    bool        lateral;        /* subquery, function, or values is LATERAL? */
    bool        inh;            /* inheritance requested? */
    bool        inFromCl;       /* present in FROM clause? */
    AclMode     requiredPerms;  /* bitmask of required access permissions */
    Oid         checkAsUser;    /* if valid, check access as this role */
    Bitmapset  *selectedCols;   /* columns needing SELECT permission */
    Bitmapset  *insertedCols;   /* columns needing INSERT permission */
    Bitmapset  *updatedCols;    /* columns needing UPDATE permission */
    Bitmapset  *extraUpdatedCols;   /* generated columns being updated */
    List       *securityQuals;  /* security barrier quals to apply, if any */

From the perspective of their usage

Simply we use it for building the Query tree and proceed with it in the other phases of the Query processing

From the perspective of alternating RTE with PNSI

How can we replace a Query tree Node with another something ParseNameSpaceItem we should learn what is it exactly to check whether we can replace them or not you will get it on next post, see you there!

What are the difference between RTE at PG12 and PG13

At the fields those valid for a join RTE (else NULL/zero):

At PG 13

    JoinType    jointype;       /* type of join */
    int         joinmergedcols; /* number of merged (JOIN USING) columns */
    List       *joinaliasvars;  /* list of alias-var expansions */
    List       *joinleftcols;   /* left-side input column numbers */
    List       *joinrightcols;  /* right-side input column numbers */

At PG 12

    JoinType    jointype;       /* type of join */
    List       *joinaliasvars;  /* list of alias-var expansions */

Nothing else is changed

Conclusion

In PostgreSQL, a Range Table Entry (RTE) is a data structure used to represent a table or subquery in a query plan. RTEs are used extensively by the PostgreSQL query planner and executor to represent the various tables, subqueries, and other data sources involved in a query.

Each RTE contains information about a single table or subquery, including its name, alias, column definitions, and any join clauses or other conditions that should be applied when querying the table. RTEs are typically created during the parsing and planning phases of query processing, and are then used to generate an execution plan for the query.

In general, RTEs are represented as nodes in the PostgreSQL query tree data structure, which is used to represent the logical structure of a query. The query tree is constructed during the parsing phase of query processing, and is then used by the query planner to generate an execution plan for the query.

Resources & References

https://github.com/apache/age
https://age.apache.org/
https://github.com/postgres/postgres/ (Branches 12 and 13)

DEV Community