Design Patterns for Resilient Serving - Stateless Serving Function

#machinelearning

Using this design pattern, a production ML system can synchronously handle thousands to millions of prediction requests per second.

Stateless components	Stateful components
Output is determined purely by the inputs	Output depends both on inputs and internal state
No state => can be shared by multiple clients	Need to store each client's conversational state
Highly scalable	Expensive and difficult to manage => must be initialized on first request and destroyed when the client terminates or times out
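
The contrast in the table can be illustrated with a minimal pure-Python sketch. The names `stateless_predict` and `StatefulCounter`, as well as the weights and features, are hypothetical and chosen only for illustration; they do not come from any library.

```python
# Hypothetical illustration: names and values are assumptions, not from a library.

def stateless_predict(weights, features):
    """Output depends only on the arguments, so any client can share it."""
    return sum(w * x for w, x in zip(weights, features))


class StatefulCounter:
    """Output also depends on how many times this instance has been called."""

    def __init__(self):
        self.calls = 0  # internal state that would have to be kept per client

    def predict(self, weights, features):
        self.calls += 1
        return stateless_predict(weights, features) + self.calls


weights = [0.5, 2.0]
features = [4.0, 1.0]

# Same inputs always give the same output.
first = stateless_predict(weights, features)
second = stateless_predict(weights, features)

# Same inputs give different outputs because the internal counter changed.
stateful = StatefulCounter()
third = stateful.predict(weights, features)
fourth = stateful.predict(weights, features)
```

Because `stateless_predict` keeps nothing between calls, a server could hand the same function to every client, whereas each client would need its own `StatefulCounter` instance.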

Exporting a model as a stateless function means that stateful variables such as the epoch number and learning rate must be tracked separately and not included in the exported file.
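
As a rough sketch of that separation, the snippet below splits a training checkpoint into the part that would be exported and the part that stays with the training job. The checkpoint layout, the key names, and the `split_checkpoint` helper are all assumptions made for illustration; real frameworks handle this split internally.

```python
import json

# Hypothetical training checkpoint; keys and values are illustrative only.
checkpoint = {
    "weights": [0.5, 2.0],
    "bias": 0.1,
    "epoch": 12,
    "learning_rate": 0.001,
}

# Training-only state that a serving function never needs.
TRAINING_ONLY_KEYS = {"epoch", "learning_rate"}


def split_checkpoint(ckpt):
    """Return (export_state, training_state) from a checkpoint dict."""
    export_state = {k: v for k, v in ckpt.items() if k not in TRAINING_ONLY_KEYS}
    training_state = {k: v for k, v in ckpt.items() if k in TRAINING_ONLY_KEYS}
    return export_state, training_state


export_state, training_state = split_checkpoint(checkpoint)

# Only the export state is serialized into the file handed to serving.
exported_json = json.dumps(export_state, sort_keys=True)
```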

Demerits of carrying out inferences on an in-memory object

The entire model (which can be large) has to be loaded into memory
The achievable prediction latency is limited
Callers are tied to the programming language the model was written in
The model input and output may not be user-friendly

Achieving statelessness

Export the model into a format that is programming language independent
Restore the model as a stateless function in production
Make the stateless function available via REST endpoint
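
The three steps above can be sketched without any ML framework. This is a minimal illustration under stated assumptions: the exported format is plain JSON holding the parameters of a linear model, and the `restore` and `handle_request` helpers, as well as the `instances` / `predictions` field names, are hypothetical. A real deployment would export with the framework's own format and let a serving system or web framework own the HTTP layer.

```python
import json

# Step 1 (assumed): the model was exported as language-independent JSON.
EXPORTED_MODEL = '{"weights": [0.5, 2.0], "bias": 1.0}'


def restore(exported):
    """Step 2: rebuild a pure prediction function from the exported parameters."""
    params = json.loads(exported)
    weights, bias = params["weights"], params["bias"]

    def predict(features):
        # Output depends only on `features` and the fixed, read-only parameters.
        return sum(w * x for w, x in zip(weights, features)) + bias

    return predict


def handle_request(predict, body):
    """Step 3: turn a JSON request body into a JSON response body.

    A web framework would call this from its POST handler.
    """
    instances = json.loads(body)["instances"]
    predictions = [predict(features) for features in instances]
    return json.dumps({"predictions": predictions})


predict = restore(EXPORTED_MODEL)
response = handle_request(predict, '{"instances": [[4.0, 1.0], [0.0, 0.0]]}')
```

Since `handle_request` keeps nothing between calls, any number of server replicas can answer requests interchangeably.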

To save a model in Keras: model.save(export_path)

This exports the model as a protocol buffer file (saved_model.pb) and writes the variables, such as the trained weights, to separate files in the same export directory.

model.save also takes an optional argument, signatures. This can be a dictionary that maps signature names to different serving functions. If it is not specified, the model's forward pass is exported.

To determine the signature of the stateless function that we will use for serving:

!saved_model_cli show --dir {export_path} --tag_set serve --signature_def serving_default

Finally, we can use this as follows:

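import tensorflow as tf
# Load the SavedModel and pick the default serving signature
restored = tf.saved_model.load(export_path)
infer = restored.signatures['serving_default']
# inputs: tensor(s) matching the signature reported by saved_model_cli
outputs = infer(inputs)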
