Practical Rust Web Development - Pagination

#rust #pagination #actixweb #postgres

As you might know, the default pagination mechanism provided in PostgreSQL is not very efficient, actually they warn against OFFSET when you need to go deep inside pagination.

So, what alternative to OFFSET do we have?, one option might be by design, let's say we will build a modern application with infinite scrolling that works in a similar fashion as Instagram, this implies a simple solution.

We could just return a group of records, let's say 10, and sort them by any field, in this case we could create a rank field which contains some number that prioritize the search according to some factors, like a premium account or any other we can decide, anyway we're going to ignore the algorithm for the rank field and assume we already have it.

We want another rank, this is a PostgreSQL function called ts_rank, we're using full text search and we need to order our rows according to the frequency of their matching lexemes.

However we have a problem, because of the previous requirement (the full text search rank) our queries might be slow, so we need to find a solution.

RUM index

We have something that might help, there is an external PostgreSQL module that creates a custom index that allows us to use full text search with ranking with faster results, it's called Rum index, the problem is that we will have slow inserts, but it's ok, because we need fast reads.

The first thing we need to do is install the extension and load it in PostgreSQL, in ubuntu we can do it this way:

sudo apt install postgresql-10-rum
sudo su - postgres
psql -d mystore
CREATE EXTENSION rum;

Then we need to edit our last migration to change GIN by RUM and add the function rum_tsvector_ops:

migrations/2019-06-25-112356_add_tsvector_column_to_products/up.sql:

ALTER TABLE products ADD COLUMN text_searchable_product_col tsvector NOT NULL;

UPDATE products SET text_searchable_product_col = 
    to_tsvector('english', name || ' ' || coalesce(description, ''));

CREATE INDEX textsearch_idx ON products USING RUM (text_searchable_product_col rum_tsvector_ops);

CREATE TRIGGER tsvectorupdateproducts BEFORE INSERT OR UPDATE
ON products FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(text_searchable_product_col, 'pg_catalog.english', name, description);

Now, we can redo our migration:

diesel migration redo

Be careful to check the schema file, we add a new field type called TsVector and need to load it from diesel_full_text_search crate, every time we perform a migration schema.rb reloads removing the use instructions.

RUM index support

Our crate called diesel_full_text_search doesn't support the RUM operator: <=>, so we can use my fork for that:

Cargo.toml:

diesel_full_text_search = { version = "1.2.1", git = "https://github.com/werner/diesel_full_text_search" }

We said that we would be using a rank for product let's add it.

diesel migration generate add_rank_to_products
diesel migration generate add_rank_index

migrations/2019-07-01-114531_add_rank_to_products/up.sql:

ALTER TABLE products ADD COLUMN product_rank NUMERIC DEFAULT 0.0;

migrations/2019-07-01-192138_add_rank_index/up.sql:

CREATE INDEX products_rank_idx ON products (product_rank);

Now we can modify the model.

src/models/product.rs:

impl ProductList {
    pub fn list(connection: &PgConnection, search: &str, rank: f64) -> Self {
        use diesel::RunQueryDsl;
        use diesel::ExpressionMethods;
        use diesel::QueryDsl;
        use diesel::pg::Pg;
        use crate::schema::products::dsl::*;
        use crate::schema;
        use diesel_full_text_search::{plainto_tsquery, TsRumExtensions, TsVectorExtensions};

        let mut query = schema::products::table.into_boxed::<Pg>();

        if !search.is_empty() {
            query = query
                .filter(text_searchable_product_col.matches(plainto_tsquery(search)))
                .order((product_rank.desc(), 
                        text_searchable_product_col.distance(plainto_tsquery(search))));
        } else {
            query = query.order(product_rank.desc());
        }
        let result = query
            .select(PRODUCT_COLUMNS)
            .filter(product_rank.le(rank))
            .limit(10)
            .load::<Product>(connection)
            .expect("Error loading products");

        ProductList(result)
    }
}

We added the corresponding order by to take care of the RUM functionality, if no search is provided we just do a regular sort by our product rank.

We use a filter to perform a where clause when we need to sort the results, then from the front end we would receive the corresponding rank to return the products.

Finally we edit the handler:

src/handlers/products.rs:

#[derive(Deserialize)]
pub struct ProductPagination {
    pub rank: f64
}

pub fn index(_user: LoggedUser,
             pool: web::Data<PgPool>,
             product_search: web::Query<ProductSearch>,
             pagination: web::Query<ProductPagination>) -> Result<HttpResponse> {
    let pg_pool = pg_pool_handler(pool)?;
    let search = &product_search.search;
    Ok(HttpResponse::Ok().json(ProductList::list(&pg_pool, search, pagination.rank)))
}