DEV Community

Manoj Kumar Patra

Design Patterns for Resilient Serving - Batch Serving

Batch serving is useful when predictions need to be carried out asynchronously over large volumes of data, unlike the stateless serving function pattern, which processes one instance (or at most a few thousand instances) embedded in a single request.

Examples include:

  1. Determining whether to reorder a stock-keeping unit, which needs to be carried out on, say, an hourly basis
  2. Creating personalized song playlists
  3. Recommendation engines with periodic refresh rates: for example, if the refresh rate is hourly, we run inference only for those users who visited the website in the last hour

To achieve asynchronous predictions, batch serving makes use of distributed data processing infrastructures such as BigQuery, Apache Beam, etc.

Consider this example below, where we run inference on approx. 1.5 million rows of data using BigQuery:

WITH all_complaints AS (
  SELECT * FROM ML.PREDICT(MODEL external_model,
    (SELECT consumer_complaint_narrative AS reviews
     FROM `bigquery-public-data`.cfpb_complaints.complaint_database
     WHERE consumer_complaint_narrative IS NOT NULL))
)
SELECT * FROM all_complaints
ORDER BY positive_review_probability DESC
LIMIT 5

Here, the following operations take place in order:

  1. Read the consumer_complaint_narrative column from the dataset, keeping only rows where it is not NULL. Let's assume this yields X values in total. These are then distributed across N shards.
  2. Each of the N workers reads its shard and runs inference using the model files.
  3. Each of the N workers finds the 5 most positive complaints in the shard it processed.
  4. The resulting 5 * N complaints are sorted, and the top 5 are selected as the final result.
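The four steps above amount to a scatter-gather top-k: each worker returns only its local top 5, and a final pass merges those small candidate lists. Here is a minimal Python sketch of that pattern; the `score` function and the shard data are hypothetical stand-ins for the model's positive-review probability and the complaint narratives.

```python
import heapq
import random


def score(review: str) -> float:
    """Hypothetical stand-in for the model's positive-review probability."""
    rng = random.Random(review)  # deterministic per review, for illustration
    return rng.random()


def local_top_k(shard, k=5):
    """Step 3: each worker keeps only the k best rows from its own shard."""
    return heapq.nlargest(k, ((score(r), r) for r in shard))


def global_top_k(shards, k=5):
    """Step 4: merge the N local top-k lists (k * N rows) and sort once more."""
    candidates = [row for shard in shards for row in local_top_k(shard, k)]
    return heapq.nlargest(k, candidates)


# Simulate N = 3 shards of complaint narratives.
shards = [[f"complaint {i}-{j}" for j in range(100)] for i in range(3)]
top5 = global_top_k(shards)
```

The key property is that the final merge only ever sees 5 * N rows, no matter how large X is, so almost all of the work stays parallel.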
