DEV Community

Cover image for The Internals of PostgreSQL: Chapter 4: Foreign Data Wrappers and Parallel Query
Hasnain Somani
Hasnain Somani

Posted on

The Internals of PostgreSQL: Chapter 4: Foreign Data Wrappers and Parallel Query

Abbreviations to be used in the post:
FDW = Foreign Data Wrappers
MED = Management of External Data

FDW allows access to remotely stored data with the help of SQL/MED. In order to use FDW, appropriate extensions have to be installed, and necessary commands such as 'CREATE FOREIGN TABLE", "CREATE SERVER" have to be set-up. Once the set-up is complete, foreign tables can be accessed on the remote servers through SELECT queries. Moreover, FDW also allows join operations between foreign tables.

The process of how FDW works is:

  1. The analyzer generates the query tree from the SQL input.
  2. The executor connects to the remote server through the use of specific libraries based on foreign data wrapper being used.
  3. The cost of each plan path is estimated using the EXPLAIN command, given that the use_remote_estimate option is enabled.
  4. Deparsing: creation of plain text SQL statement from a plan tree.
  5. This plain text SQL statement is sent to the remote server. The result is processed by the executor. This processing includes join operations, and necessary operations made on the tables. The results may also be converted into a PostgreSQL readable format.

FDW extensions use the EXPLAIN command to get statistics, and estimate the plan tree for a query (As discussed in chapter 3). These results are best reflected by postgres_fdw extension of FDW. Others may not provide enough information.

In order to execute SELECT queries, a sequence of SQL statements are followed:

  1. Start a remote transaction.
  2. Declare a cursor.
  3. Fetch the result.
  4. Close the cursor.
  5. Commit the transaction.

For multi-table queries, each foreign table is fetched individually through single table SELECT statement, and then a join operation is performed on the local server. This was done in earlier versions even if the foreign tables were on the same remote server - In later versions, postgres_fdw can execute remote join operation on the remote servers if the tables are on the same server (the use_remote_estimate option should be enabled). Although, it is important to keep in mind that distributed deadlock detection is not supported by FDW - hence, if multiple transactions involve updates on both local and foreign tables simultaneously, deadlocks can occur. This is one disadvantage, or a short coming of Postgre's FDW.

Overall, FDW supports access to remote data, and allows managing it just like local data - resulting in seamless integration of data.

Top comments (0)