Batch serving is useful when we need to carry out predictions asynchronously. This is unlike the stateless serving function pattern, where we process one instance, or at most a few thousand instances, embedded in a single request.
Examples include:
- Determining whether to reorder a stock-keeping unit (SKU), which needs to be carried out on, say, an hourly basis
- Creating personalized song playlists
- Recommendation engines with periodic refresh rates: say the refresh rate is hourly; then we carry out inference only for those users who visited the website in the last hour (sketched below)
To achieve asynchronous predictions, batch serving makes use of distributed data processing infrastructure such as BigQuery or Apache Beam.
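As a sketch of the hourly recommendation case above: assuming the visit log lives in a hypothetical mydataset.website_visits table and the model has been registered as mydataset.recommendation_model (both names, plus the user_id, item_id, and visit_time columns, are placeholders, not from the original), an hourly batch job could be expressed in BigQuery as:
-- Illustrative sketch only: model, table, and column names are placeholders.
-- Score recommendations just for users who visited the site in the last hour.
SELECT *
FROM ML.PREDICT(MODEL mydataset.recommendation_model,
  (SELECT user_id, item_id
   FROM mydataset.website_visits
   WHERE visit_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
  )
)
Scheduling this query once an hour keeps the recommendations fresh without paying the cost of scoring every user on every refresh.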
Consider the example below, where we run inference on approximately 1.5 million rows of data using BigQuery:
WITH all_complaints AS (
  SELECT * FROM ML.PREDICT(MODEL external_model,
    (SELECT consumer_complaint_narrative AS reviews
     FROM `bigquery-public-data`.cfpb_complaints.complaint_database
     WHERE consumer_complaint_narrative IS NOT NULL
    )
  )
)
SELECT * FROM all_complaints
ORDER BY positive_review_probability DESC LIMIT 5
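Here, external_model stands for a model that has already been made available to BigQuery ML. One way this can be done (a sketch only, assuming a TensorFlow SavedModel exported to Cloud Storage; the dataset name and bucket path are placeholders) is to import it with CREATE MODEL:
-- Sketch: import a TensorFlow SavedModel into BigQuery ML.
-- Dataset name and Cloud Storage path below are placeholders.
CREATE OR REPLACE MODEL mydataset.external_model
OPTIONS (MODEL_TYPE = 'TENSORFLOW',
         MODEL_PATH = 'gs://my-bucket/sentiment_model/*')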
Here, the following operations take place in order:
- Read the consumer_complaint_narrative column from the dataset where consumer_complaint_narrative is not NULL. Let's assume this is a total of X values, which are then distributed across N shards.
- Each of the N workers reads its shard and carries out inference using the model files.
- Each worker finds the 5 most positive complaints from the shard it processed.
- The resulting (5 * N) complaints are collected, sorted, and the top 5 are returned as the final result, as sketched below.
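The same two-stage selection can be written out explicitly. The sketch below only illustrates the logic, assuming a hypothetical scored_complaints table that already holds the per-row predictions along with a shard_id column (neither is part of the actual execution plan above):
-- Illustration of the two-stage top-5 selection (table and column names are hypothetical).
WITH per_shard_top5 AS (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY shard_id
                            ORDER BY positive_review_probability DESC) AS rank_in_shard
  FROM scored_complaints
)
SELECT * EXCEPT(rank_in_shard)
FROM per_shard_top5
WHERE rank_in_shard <= 5   -- each worker keeps its local top 5, giving 5 * N candidates
ORDER BY positive_review_probability DESC
LIMIT 5                    -- final top 5 selected from the merged candidates
Keeping only 5 rows per shard before the final sort means the merge step touches at most 5 * N rows instead of all X scored rows.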