James Luterek for Elastic Path

Improving paging performance with large data exports

My work with Elastic Path involves helping companies build amazing ecommerce experiences, but many of these companies are huge enterprises with massive amounts of data. The challenge of handling huge record sets is common and not specific to ecommerce. In ecommerce it is often a large customer base or an ever-growing list of historical orders, but other fields face the same problem with different data. While databases have become much better at handling data at this scale, the need to export or synchronize these records with other systems can still create problems. Elastic Path is API-first, so it's easy to say "just build an API," but sending all order data in a single request is impossible. Paging is required to make things work, and not all paging is created equal.

The initial approach to building pagination is typically to leverage the mechanisms already built into the database of choice. For MongoDB, that means offset-based paging with skip() and limit().

This allows each query to skip a predetermined number of records to reach a particular page and then limit the results to that specific page size.

Here is an example of retrieving page 3, with a page size of 10 records:

mongodb
// MongoDB query example for offset-based paging
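// skip = (page - 1) * pageSize = (3 - 1) * 10 = 20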
db.records.find().skip(20).limit(10);

MongoDB has a default limit of 10,000 records when using skip. This is not an intrinsic limitation of MongoDB, but a practical guardrail so that paging does not monopolize memory and processing power and drag down the system's overall performance.

When you use offset-based pagination, MongoDB must still walk past every record that comes before the requested page. As the offset value increases, the database performs more and more work just to skip records, which leads to slower query performance and potential resource exhaustion at large offset values.

Queries become progressively slower as the offset increases.
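You can see this cost directly with explain(): the executionStats output reports how many documents were examined just to produce one page. A quick sketch, assuming the same records collection (the counts shown are illustrative):

mongodb
// Shallow page: examines ~30 documents to return 10.
db.records.find().skip(20).limit(10).explain("executionStats");

// Deep page: examines ~100,010 documents to return the same 10.
db.records.find().skip(100000).limit(10).explain("executionStats");
// Watch executionStats.totalDocsExamined grow with the offset.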

This is not limited to MongoDB or document databases: PostgreSQL, MySQL, and other databases will also see performance degradation with very large values passed to OFFSET.

While this approach works well for small record sets, or when you only need to show the first few pages of results, it proves inefficient for large databases and data exports.

Solution: Cursor-Based Pagination

Cursor-based paging takes a different approach to splitting a dataset into pieces. It requires a bit more setup but offers far better performance deep into the dataset. Cursor-based paging leverages a unique identifier, the cursor, to navigate through the dataset. Unlike offset paging, where performance deteriorates as the skip value grows, cursor-based pagination relies on a cursor that points directly to a specific record, ensuring consistent and predictable performance.

Cursor-Based Pagination Steps:

  1. Identify your cursor: choose a field that is unique and sortable.
  2. For the first page of results, simply sort by this cursor field and limit the number of records.
  3. Grab the last item from this page and note its cursor value.
  4. For the next page, add a filter requiring the cursor to be greater than (or less than, depending on sort direction) the value from step 3.

This approach leverages database indexes to improve performance, so it's important to configure the database to optimize for this query pattern.

  1. Index the Cursor Field: The field used for the cursor, often an increasing or unique identifier, should be indexed. This enables efficient retrieval of records based on the cursor value.
  2. Avoid Complex Queries: Keep cursor-based queries simple. Complex queries can hinder the database's ability to optimize and use indexes effectively.
  3. Use Compound Indexes: If your cursor is based on multiple fields, use a compound index covering them; while not ideal, it's far better than no index at all (see the example after this list).
  4. Regularly Monitor Index Usage: Keep an eye on index usage and performance. Unused or redundant indexes can impact overall database performance.
  5. Balance Read and Write Operations: Be mindful of the trade-off between read and write performance when adding indexes. Over-indexing can slow down write operations.
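As an example, here is how the supporting indexes might be created (the createdAt field is a hypothetical cursor field for illustration; _id always has a built-in index):

mongodb
// Single-field cursor on _id: no setup needed, MongoDB indexes _id automatically.
// Compound cursor, e.g. sort by createdAt with _id as a tie-breaker:
db.records.createIndex({ createdAt: 1, _id: 1 });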

Here is an example MongoDB query using this pattern:

mongodb
// MongoDB query example for cursor-based pagination
// Sorting by the cursor field keeps the page order stable between requests.
db.records.find({ _id: { $gt: lastProcessedId } }).sort({ _id: 1 }).limit(10);
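Putting the steps together, a full export loop might look like this sketch (the records collection, the batch size of 10, and the processBatch callback are assumptions for illustration):

mongodb
// Walk the entire collection in stable batches using _id as the cursor.
let lastProcessedId = null;
let batch = [];
do {
  // First page has no filter; later pages start after the last seen cursor.
  const filter = lastProcessedId ? { _id: { $gt: lastProcessedId } } : {};
  batch = db.records.find(filter).sort({ _id: 1 }).limit(10).toArray();
  if (batch.length > 0) {
    processBatch(batch); // hypothetical export/processing step
    lastProcessedId = batch[batch.length - 1]._id; // remember the cursor
  }
} while (batch.length > 0);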

Benefits

Cursor-based pagination offers several advantages over traditional offset methods. By ensuring consistent and predictable performance, it proves highly effective for managing large datasets. It also avoids the data gaps and duplicate entries that can plague offset paging when records are inserted or deleted mid-export. And because no records are skipped, processing new data becomes both more efficient and more reliable.

If you want to see how this is put into practice with an API-first solution, check out the Elastic Path documentation.

