This blog aims to assist you in understanding the intital concepts of Chapter:8 [Buffer Manager] from the book The Internals of PostgreSQL.
Note: Ensure that you have a thorough understanding of Chapter 7 Part-1, Chapter 7 Part-2 and basics of PostreSQL before we proceed to Chapter 8, as it forms the foundation for our exploration.
So, Let's Start:
Introduction to Chapter
A buffer manager manages data transfers between shared memory and persistent storage and can have a significant impact on the performance of the DBMS.
The PostgreSQL buffer manager works very efficiently.
Relations between buffer manager, storage, and backend processes in PostgreSQL is depicted in figure below:
Overview of the Chapter
Buffer Manager Structure
The PostgreSQL buffer manager comprises a buffer table, buffer descriptors, and buffer pool.
The buffer pool layer stores data file pages, such as tables and indexes, as well as freespace maps and visibility maps.
The buffer pool is an array (Indices of a buffer pool array are referred to as buffer_ids), i.e., each slot stores one page of a data file.
Buffer Tag
In PostgreSQL, each page of all data files can be assigned a unique tag, i.e. a buffer tag.
When the buffer manager receives a request, PostgreSQL uses the buffer_tag of the desired page.
The buffer_tag comprises three values:-
- The RelFileNode.
- The fork number of the relation to which its page belongs.
- The block number of its page.
- For Example: the buffer_tag {(16821, 16384, 37721), 0, 7} identifies the page that is in the seventh block whose relation's OID and fork number are 37721 and 0, respectively; the relation is contained in the database whose OID is 16384 under the tablespace whose OID is 16821.
How a Backend Process Reads Pages
When reading a table or index page, a backend process sends a request that includes the page's buffer_tag to the buffer manager.
The buffer manager returns the buffer_ID of the slot that stores the requested page. If the requested page is not stored in the buffer pool, the buffer manager loads the page from persistent storage to one of the buffer pool slots and then returns the buffer_ID's slot.
The backend process accesses the buffer_ID's slot.
How a backend reads a page from the buffer manager in PostgreSQL is depicted in figure below:
- When a backend process modifies a page in the buffer pool (e.g., by inserting tuples), the modified page, which has not yet been flushed to storage, is referred to as a dirty page.
Page Replacement Algorithm
When all buffer pool slots are occupied but the requested page is not stored, the buffer manager must select one page in the buffer pool that will be replaced by the requested page.
Page selection algorithms are called page replacement algorithms and the selected page is referred to as a victim page.
PostgreSQL has used clock sweep because it is simpler and more efficient than the LRU algorithm used in previous versions.
Flushing Dirty Pages
Dirty pages should eventually be flushed to storage; however, the buffer manager requires help to perform this task.
In PostgreSQL, two background processes, checkpointer and background writer, are responsible for this task.
Buffer manager structure
- The PostgreSQL buffer manager comprises three layers, i.e. the buffer table, buffer descriptors, and buffer pool.
Buffer manager's three-layer structure in PostgreSQL is depicted in figure below:
The buffer pool is an array. Each slot stores a data file pages. The indices of the array slots are referred to as buffer_ids.
The buffer descriptors layer is an array of buffer descriptors. Each descriptor has one-to-one correspondence to a buffer pool slot and holds metadata of the stored page in the corresponding slot.
The buffer table is a hash table that stores the relations between the buffer_tags of stored pages and buffer_ids of the descriptors that hold the stored pages' respective metadata.
Buffer Table
- A buffer table can be logically divided into three parts: a hash function, hash bucket slots, and data entries.
Buffer table in PostgreSQL is depicted in figure below:
The built-in hash function maps buffer_tags to the hash bucket slots.
If the number of hash bucket slots is greater than the number of the buffer pool slots, collisions may occur.
The buffer table uses a separate chaining with linked lists method to resolve collisions.
A data entry comprises two values: the buffer_tag of a page, and the buffer_id of the descriptor that holds the page's metadata.
For Example: a data entry βTag_A, id=1β means that the buffer descriptor with buffer_id 1 stores metadata of the page tagged with Tag_A.
The Pseudo-function is:-
uint32 bucket_slot = calc_bucket(unsigned hash(BufferTag buffer_tag), uint32 bucket_size)
Buffer Descriptor
The buffer descriptor structure is defined by the structure BufferDesc.
Tag holds the buffer_tag of the stored page in the corresponding buffer pool slot.
Buffer_id identifies the descriptor (equivalent to the buffer_id of the corresponding buffer pool slot).
Refcount (Pin count) ** holds the number of PostgreSQL processes currently accessing the associated stored page. When a PostgreSQL process **accesses the stored page, its refcount must be incremented by 1 (refcount++). After accessing the page, its refcount must be decreased by 1 (refcount--).When the refcount is zero, i.e. the associated stored page is not currently being accessed, the page is unpinned; otherwise it is pinned.
Usage_count holds the number of times the associated stored page has been accessed since it was loaded into the corresponding buffer pool slot.
Content_lock and io_in_progress_lock are light-weight locks that are used to control access to the associated stored page.
flags can hold several states of the associated stored page.
- Dirty bit indicates whether the stored page is dirty.
- Valid bit indicates whether the stored page can be read or written (valid).
- io_in_progress bit indicates whether the buffer manager is reading/writing the associated page from/to storage.
FreeNext is a pointer to the next descriptor to generate a freelist.
Three descriptor states are defined:
Empty: When the corresponding buffer pool slot does not store a page (i.e. refcount and usage_count are 0), the state of this descriptor is empty.
Pinned: When the corresponding buffer pool slot stores a page and any PostgreSQL processes are accessing the page (i.e. refcount and usage_count are greater than or equal to 1), the state of this buffer descriptor is pinned.
Unpinned: When the corresponding buffer pool slot stores a page but no PostgreSQL processes are accessing the page (i.e. usage_count is greater than or equal to 1, but refcount is 0), the state of this buffer descriptor is unpinned.
I hope, this blog has helped you in understanding the initial concepts of Buffer Manager in PostreSQL.
Check out summary of Chapter : 8 Part-2
If you want to understand PostgreSQL In-Depth.
Top comments (0)