DEV Community

Manoj Kumar Patra

Design Patterns for Resilient Serving - Batch Serving

Batch serving is useful when predictions need to be carried out asynchronously over large volumes of data, unlike the stateless serving function pattern, which processes one instance (or at most a few thousand instances) embedded in a single request.

Examples include:

  1. Determining whether to reorder a stock-keeping unit, which needs to be carried out on, say, an hourly basis
  2. Creating personalized song playlists
  3. Recommendation engines with periodic refresh rates: for example, if the refresh rate is hourly, we run inference only for those users who visited the website in the last hour

To achieve asynchronous predictions, batch serving makes use of distributed data processing infrastructures such as BigQuery, Apache Beam, etc.

Consider this example below, where we run inference on approx. 1.5 million rows of data using BigQuery:

WITH all_complaints AS (
  SELECT * FROM ML.PREDICT(MODEL external_model,
    (SELECT consumer_complaint_narrative AS reviews
     FROM `bigquery-public-data`.cfpb_complaints.complaint_database
     WHERE consumer_complaint_narrative IS NOT NULL))
)
SELECT * FROM all_complaints
ORDER BY positive_review_probability DESC
LIMIT 5

Here, the following operations take place in order:

  1. Read the consumer_complaint_narrative column from the dataset, keeping only rows where it is not NULL. Let's assume this yields X values in total. These are then distributed across N shards.
  2. Each of the N workers reads its shard and runs inference using the model files.
  3. Each of the N workers finds the 5 most positive complaints in the shard it processed.
  4. The resulting 5 * N complaints are sorted, and the top 5 are selected as the final result.
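The four steps above amount to a scatter-gather top-k: each worker returns only its local top 5, and a final pass merges those small candidate lists. Here is a minimal Python sketch of that pattern; the `score` function and the shard data are hypothetical stand-ins for the model's positive-review probability and the complaint narratives.

```python
import heapq
import random


def score(review: str) -> float:
    """Hypothetical stand-in for the model's positive-review probability."""
    rng = random.Random(review)  # deterministic per review, for illustration
    return rng.random()


def local_top_k(shard, k=5):
    """Step 3: each worker keeps only the k best rows from its own shard."""
    return heapq.nlargest(k, ((score(r), r) for r in shard))


def global_top_k(shards, k=5):
    """Step 4: merge the N local top-k lists (k * N rows) and sort once more."""
    candidates = [row for shard in shards for row in local_top_k(shard, k)]
    return heapq.nlargest(k, candidates)


# Simulate N = 3 shards of complaint narratives.
shards = [[f"complaint {i}-{j}" for j in range(100)] for i in range(3)]
top5 = global_top_k(shards)
```

The key property is that the final merge only ever sees 5 * N rows, no matter how large X is, so almost all of the work stays parallel.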
