Hello guys, this is the fourth article of my blog series. Without further ado, let's begin with today's topic.
PostgreSQL, a relational database management system, has been continuously developed since 1986 as an open-source project. Among its notable features, Foreign Data Wrappers (FDW) stand out. FDW enables users to seamlessly access and manage data from remote sources as if they were part of their local PostgreSQL database. This capability simplifies data management tasks and enhances query performance.
In this blog post, we will dive into the world of FDWs in PostgreSQL, offering a comprehensive exploration of their architecture, extensions, and practical usage for handling data from diverse sources. The post will outline the advantages of leveraging FDWs and present illustrative examples demonstrating how they can effectively integrate data from external sources. By the end, readers will gain valuable insights into harnessing the power of FDWs in PostgreSQL to streamline their data management processes.
What are Foreign Data Wrappers (FDW)?
Foreign Data Wrappers (FDWs) in PostgreSQL enable users to access and manage data stored in external databases or file systems seamlessly, as if it were native to their PostgreSQL database. This eliminates the need to transfer data between systems and allows querying and manipulation of data from various sources within a single PostgreSQL database.
FDWs are essentially extensions in PostgreSQL that define the communication with external data sources. After installing and configuring an FDW, users can create foreign tables in their PostgreSQL database, which establish a connection to the external data source. These foreign tables can then be queried and updated like regular tables in PostgreSQL.
The flexibility and extensibility of PostgreSQL's FDW feature enable the creation of custom FDWs to communicate with a wide range of external data sources. PostgreSQL includes several built-in FDWs, such as postgres_fdw for accessing PostgreSQL databases, file_fdw for accessing flat files, oracle_fdw for accessing Oracle databases, mysql_fdw for accessing MySQL databases, and mongodb_fdw for accessing MongoDB databases. Additionally, numerous third-party FDWs are available to support other data sources like NoSQL databases or web services.
To use an FDW, it must be installed and configured in PostgreSQL. This involves creating a foreign server object that specifies the connection details for the external data source and creating foreign tables linked to the foreign server object. Once the FDW is set up, the foreign tables can be queried and updated alongside regular tables in PostgreSQL.
FDWs offer a powerful capability in PostgreSQL, enabling the integration of data from diverse sources within a single database. They simplify data management tasks and improve query performance by leveraging the strengths of different data storage technologies without the need for data migration. With FDWs, users can harness the full potential of PostgreSQL and seamlessly incorporate data from various sources into their database ecosystem.
Architecture of FDW's in PostgreSQL
PostgreSQL is a widely used open-source relational database management system (RDBMS) that has been under development since 1986. It offers a comprehensive set of features, including transaction control, high availability, and extensibility. One particularly intriguing feature is its support for managing remote data through Foreign Data Wrappers (FDW), which allows access to data from different servers. Let's delve into the architecture of FDW in PostgreSQL.
The SQL Management of External Data (SQL/MED) specification was introduced to the SQL standard in 2003. PostgreSQL has been progressively implementing this feature since version 9.1 to incorporate a subset of SQL/MED functionality. In SQL/MED, a table residing on a remote server is referred to as a foreign table.
PostgreSQL's FDW leverages SQL/MED to manage foreign tables, which are conceptually similar to local tables. After installing the necessary extension and configuring the appropriate settings, you can access and interact with foreign tables on remote servers.
To illustrate, let's consider an example scenario where there are two remote servers: "postgresql" and "mysql." Each server hosts a foreign table named "foreign_pg_tbl" and "foreign_my_tbl," respectively. In this case, you can access these foreign tables from the local server by executing SELECT queries:
sqlCopy code
localdb=# SELECT count() FROM foreign_pg_tbl; count ------- 20000
localdb=# SELECT count() FROM foreign_my_tbl;count ------- 10000
Furthermore, you can perform join operations involving foreign tables residing on different servers, similar to working with local tables:
sqlCopy code
localdb=# SELECT count(*) FROM foreign_pg_tbl AS p, foreign_my_tbl AS m WHERE p.id = m.id; count ------- 10000
While various FDW extensions have been developed and listed in the Postgres wiki, it's worth noting that most of them are not actively maintained. The primary exception is the "postgres_fdw," which is officially developed and maintained by the PostgreSQL Global Development Group. It serves as an extension specifically designed for accessing remote PostgreSQL servers.
FDW Architecture
To use the FDW feature, you need to install the appropriate extension and execute setup commands, such as CREATE FOREIGN TABLE, CREATE SERVER and CREATE USER MAPPING. After providing the appropriate setting, the functions defined in the extension are invoked during query processing to access the foreign tables.
The architecture of FDW in PostgreSQL can be described as follows:
- The analyzer/analyser in PostgreSQL constructs the query tree of the input SQL statement. It accomplishes this by incorporating the definitions of foreign tables, which are stored in the pg_catalog.pg_class and pg_catalog.pg_foreign_table catalogs. These definitions can be created using commands such as CREATE FOREIGN TABLE or IMPORT FOREIGN SCHEMA.
- The planner, also known as the executor, establishes a connection to the remote server using a specific library designed for connecting to the respective remote database server. For instance, when connecting to a remote PostgreSQL server, the postgres_fdw extension utilizes the libpq library. On the other hand, when connecting to a MySQL server, the mysql_fdw extension relies on the libmysqlclient library.
- When the use_remote_estimate option is enabled (which is off by default), the planner in PostgreSQL executes EXPLAIN commands to estimate the cost associated with each potential plan path. This estimation helps in evaluating and selecting the most optimal plan for query execution.
- The PostgreSQL planner converts the plan tree into a plain text SQL statement, a process known as deparsing. This deparsing step involves transforming the internal representation of the query plan into a human-readable SQL statement that reflects the intended operations and structure of the query.
- When the planning phase is complete, the executor in PostgreSQL sends the plain text SQL statement to the remote server for execution. It then awaits the result returned by the remote server. After receiving the data, the executor may perform additional processing depending on the nature of the query. For instance, in the case of a multi-table query, the executor carries out join processing to combine the received data with other relevant tables. The specific details of these processing steps are elaborated in subsequent sections, taking into account the query requirements and data relationships.
How FDW Works in PostgreSQL
Foreign Data Wrappers (FDW) in PostgreSQL enable the management of foreign tables, which resemble local tables, through the utilization of SQL/MED, a specification for accessing remote data. This functionality has been a part of PostgreSQL since version 9.1. Once the necessary extension is installed and appropriate settings are configured, foreign tables on remote servers can be accessed. Additionally, FDW allows the execution of join operations with foreign tables stored on different servers, just like local tables. While PostgreSQL offers several FDW extensions, only postgres_fdw is officially developed and maintained by the PostgreSQL Global Development Group.
To utilize the FDW feature, the relevant extension needs to be installed, and setup commands such as CREATE FOREIGN TABLE, CREATE SERVER, and CREATE USER MAPPING need to be executed. During query processing, functions defined in the extension are invoked to access foreign tables. The planner (or executor) establishes a connection to the remote server using a specific library designed for connecting to the respective remote database server. The planner creates a plan tree by executing EXPLAIN commands to query the cost of plans from the remote server.
However, some FDW extensions solely rely on embedded constant values. The executor sends the plain text SQL statement to the remote server, receives the result, and processes the data as necessary. For instance, when executing a multi-table query, the executor performs join processing of the received data with other tables. Detailed information regarding PostgreSQL's FDW is presented in the following sections, encompassing an overview, the functioning of the postgres_fdw extension, and the creation of a query tree, connection to the remote server, and generation of a plan tree using EXPLAIN commands.
Foreign Data Wrappers (FDW) Extension in PostgreSQL
PostgreSQL, an open-source object-relational database system, offers a variety of advanced functionalities. Among these features, Foreign Data Wrappers (FDW) stand out as a noteworthy capability. FDW allows users to seamlessly access and manage remote data as if it were a part of their local database. In this article, we will explore the FDW extension in PostgreSQL, delving into its fundamental concept and elucidating how it operates.
Foreign Data Wrappers (FDW)
Foreign Data Wrappers (FDW) in PostgreSQL are an extension that leverages the SQL Management of External Data (SQL/MED) specification to manage foreign tables, which exhibit similar characteristics to local tables. By utilizing FDW, users can effortlessly access foreign tables located on remote servers and perform join operations with tables stored across different servers.
The development of the FDW feature within PostgreSQL began with version 9.1. To utilize this feature, the installation of the relevant extension is necessary, followed by the execution of setup commands such as CREATE FOREIGN TABLE, CREATE SERVER, and CREATE USER MAPPING. Once the appropriate settings are configured, the functions defined in the extension come into play during query processing to facilitate seamless access to foreign tables.
Fundamental Concept of FDW
Once you have installed the required extension and configured the necessary settings, you gain the ability to access foreign tables residing on remote servers. To illustrate, let's consider a scenario where there are two remote servers, "postgresql" and "mysql," each containing a foreign_pg_tbl table and a foreign_my_tbl table, respectively. In this case, you can access these foreign tables from the local server by executing SELECT queries, as demonstrated below:
localdb=# -- foreign_pg_tbl is on the remote postgresql server.
localdb-# SELECT count(*) FROM foreign_pg_tbl;
count
-------
20000
localdb=# -- foreign_my_tbl is on the remote mysql server.
localdb-# SELECT count(*) FROM foreign_my_tbl;
count
-------
10000
Overview of FDW in PostgreSQL
To use the Foreign Data Wrapper (FDW) feature in PostgreSQL, you must begin by installing the appropriate extension and executing essential setup commands like CREATE FOREIGN TABLE, CREATE SERVER, and CREATE USER MAPPING. These commands allow you to define the foreign tables, establish connections to remote database servers, and map user credentials.
During query processing, the analyzer/analyser component creates a query tree based on the input SQL. It uses the definitions of foreign tables, which are stored in the pg_catalog.pg_class and pg_catalog.pg_foreign_table catalogs. The creation of foreign tables can be done using the CREATE FOREIGN TABLE or IMPORT FOREIGN SCHEMA command.
The planner (also known as the executor) utilizes specific libraries to establish connections with the remote database server. The connection details, including the username, server's IP address, and port number, are stored in the pg_catalog.pg_user_mapping and pg_catalog.pg_foreign_server catalogs. These details are specified using the CREATE USER MAPPING and CREATE SERVER commands.
PostgreSQL's FDW provides support for obtaining statistics from foreign tables, which are used to estimate the plan tree of a query. Several FDW extensions, such as postgres_fdw, mysql_fdw, tds_fdw, and jdbc2_fdw, rely on these statistics. If the use_remote_estimate option is enabled by executing the ALTER SERVER command and setting it to "on," the planner queries the remote server for the cost of plans using the EXPLAIN command. However, if this option is not enabled, the planner uses embedded constant values as the default estimation strategy.
Conclusion
FDW is a powerful feature in PostgreSQL that allows users to access and manage remote data as if it were a part of the local database. With FDW, you can execute join operations with tables stored on different servers. PostgreSQL's FDW extension is officially developed and maintained by the PostgreSQL Global Development Group. In this article, we have discussed the basic concept of FDW, how it works, and its overview in PostgreSQL.
Top comments (0)