Join Operation:
PostgreSQL supports three join operations:
- Nested Loop Join It is the fundamental join operation and it can be used in any join condition. It does not need any start-up cost. The run cost of nested loop join is proportional to product size of outer and inner table.
Materialised Nested Loop Join:
Materialised nested loop join used to reduce the total scanning cost of the inner table. Before running a nested loop join, the executor writes the inner table tuple to the work_mem or temporary file by scanning the inner table. It uses the inner table tuple more efficiently.
Merge Join
Merge join is only used in natural joins and equi joins, and the cost of merge join is estimated by initial_cost_mergejoin() and final_cost_mergejoin() functions. Cost estimation is start-up cost of merge join is sum of sorting costs of both inner and outer tables, start-up cost is O(Nouter log2(Nouter)+Ninner log2(Ninner)).Hash Join
Hash join can be used only in natural joins and equi-joins. It depends on the size of table, if it is small enough then it will be two-phase in memory hash join, otherwise hybrid hash join is used with skew method, and the cost estimation is omitted.In-Memory Hash Join:
Hash table area is called batch in PostgreSQL and batch has hash slots called buckets and number of buckets described by ExecChooseHashTableSize() function.Hybrid Hash Join with Skew:
It is used when the tuple of the inner table cannot be stored into one batch in work_mem. In the first build and prob phase, PostgreSQL prepares multiple batches. The number of batches is the same as the number of buckets.
Join Access Paths and Join Nodes:
- Join access path: An access path of the nested loop join is the join path structure and other join access paths, Merger path and hash path are based on it.
- Join Nodes: This subsection shows the three join nodes:
- NestedLoopNodes
- MergeJoinNodes
- HashJoinNodes
Creating the Plan Tree of Multiple-Table Query:
Preprocessing:
Preprocessing for multiple-table query will be described:
- Planning and converting CTE
- Pulling Subqueries Up
- Transforming an outer join to an inner join
Getting Cheapest Path:
It is a very expensive process if planners consider all combinations of indexes and join methods and it will be infeasible if the number of tables exceeds a certain level. If a number less than 12 planner can get optimal results by applying dynamic programming, otherwise genetic algorithms.
Top comments (0)