Manoj Kumar Patra

Design Patterns for Resilient Serving - Stateless Serving Function

With this design pattern, a production ML system can synchronously handle millions of prediction requests per second.

| Stateless components | Stateful components |
| --- | --- |
| Output is determined purely by the inputs | Output depends on both the inputs and internal state |
| No state, so one instance can be shared by multiple clients | Must store each client's conversational state |
| Highly scalable | Expensive and difficult to manage: must be initialized on the first request and destroyed when the client terminates or times out |
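To make the contrast concrete, here is a minimal sketch in plain Python. The names `predict_stateless` and `RunningAverage` are made up for illustration and do not come from any library.

```python
def predict_stateless(weights, features):
    """Stateless: the result depends only on the arguments."""
    return sum(w * x for w, x in zip(weights, features))


class RunningAverage:
    """Stateful: every call mutates internal state, so history matters."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        self.total += value
        self.count += 1
        return self.total / self.count


weights = [0.5, 2.0]
# Same inputs always give the same output, so any client can share this function.
print(predict_stateless(weights, [2.0, 1.0]))  # 3.0
print(predict_stateless(weights, [2.0, 1.0]))  # 3.0

# The same kind of call gives different answers depending on what came before,
# so each client would need its own instance.
avg = RunningAverage()
print(avg.update(10.0))  # 10.0
print(avg.update(20.0))  # 15.0
```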

Exporting a model as a stateless function means that training-time state such as the epoch number and learning rate has to be tracked separately and must not be included in the exported file.
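As a rough illustration of that split, the sketch below uses plain dictionaries and JSON files instead of a real framework. The snapshot values, the `SERVING_KEYS` tuple, and the file names `model.json` and `checkpoint_meta.json` are all assumptions made up for this example.

```python
import json
import os
import tempfile

# Hypothetical snapshot at the end of training (values are made up).
training_run = {
    "weights": [0.5, 2.0],
    "bias": -1.0,
    "epoch": 12,
    "learning_rate": 0.001,
}

# Only these keys are needed to compute a prediction.
SERVING_KEYS = ("weights", "bias")


def split_state(run):
    """Separate what serving needs from what only training needs."""
    serving = {k: run[k] for k in SERVING_KEYS}
    training_only = {k: v for k, v in run.items() if k not in SERVING_KEYS}
    return serving, training_only


serving, training_only = split_state(training_run)

export_dir = tempfile.mkdtemp()
# The exported artifact holds only what inference needs.
with open(os.path.join(export_dir, "model.json"), "w") as f:
    json.dump(serving, f)
# Training state is tracked in a separate file.
with open(os.path.join(export_dir, "checkpoint_meta.json"), "w") as f:
    json.dump(training_only, f)

print(sorted(serving))        # ['bias', 'weights']
print(sorted(training_only))  # ['epoch', 'learning_rate']
```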

Demerits of carrying out inferences on an in-memory object

  1. The entire model, which can be large, has to be loaded into memory
  2. Limits on the prediction latency that can be achieved
  3. Programming language dependency
  4. The model input and output may not be user-friendly

Achieving statelessness

  1. Export the model into a format that is programming language independent
  2. Restore the model as a stateless function in production
  3. Make the stateless function available via a REST endpoint
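The three steps above can be sketched end to end with only the Python standard library. This is a toy stand-in, not a production serving stack: the JSON "export", the `restore` helper, the `/predict` path, and the request shape `{"features": [...]}` are all assumptions chosen for illustration.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Step 1: a language-independent export (JSON here; the values are made up).
EXPORTED = json.dumps({"weights": [0.5, 2.0], "bias": -1.0})


def restore(exported):
    """Step 2: rebuild the model as a pure function of its input."""
    params = json.loads(exported)

    def predict(features):
        dot = sum(w * x for w, x in zip(params["weights"], features))
        return dot + params["bias"]

    return predict


predict = restore(EXPORTED)


class PredictHandler(BaseHTTPRequestHandler):
    """Step 3: expose the function over HTTP. No per-client state is kept."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        payload = json.dumps({"prediction": predict(body["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the example quiet


def call(port, features):
    """Send one prediction request to the local server."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(
        "POST",
        "/predict",
        body=json.dumps({"features": features}),
        headers={"Content-Type": "application/json"},
    )
    data = json.loads(conn.getresponse().read())
    conn.close()
    return data


server = HTTPServer(("127.0.0.1", 0), PredictHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
result = call(server.server_port, [2.0, 1.0])
server.shutdown()
server.server_close()
print(result)  # {'prediction': 2.0}
```

Because the handler keeps nothing between requests, several copies of this server could sit behind a load balancer without any coordination, which is where the scalability of the pattern comes from.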

To save a model in Keras: model.save(export_path)

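With TensorFlow 2.x Keras, this exports the model in the SavedModel format: a protocol buffer file (saved_model.pb) that holds the serialized graph, with the trained weights and any assets written to separate files alongside it.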

model.save also takes an optional signatures argument, which can be a dictionary that maps signature names to different serving functions. If it is not specified, the model's forward pass is exported.
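As a hedged sketch of what an explicit signature looks like, the example below exports a toy `tf.Module` with `tf.saved_model.save` rather than a Keras model with `model.save`, so that it does not depend on the Keras version installed. The `Scaler` class, its `score` method, and the `features` input name are hypothetical and exist only for this illustration.

```python
import tempfile

import tensorflow as tf


class Scaler(tf.Module):
    """Toy model: weighted sum of each input row with fixed weights."""

    def __init__(self):
        super().__init__()
        self.w = tf.Variable([1.0, 2.0, 3.0])

    @tf.function(
        input_signature=[tf.TensorSpec([None, 3], tf.float32, name="features")]
    )
    def score(self, features):
        # The output depends only on the input and the frozen weights.
        return {"score": tf.reduce_sum(features * self.w, axis=1)}


export_path = tempfile.mkdtemp()
module = Scaler()
# Register `score` as the default serving function.
tf.saved_model.save(
    module, export_path, signatures={"serving_default": module.score}
)

# Restore and call the serving function by its signature name.
restored = tf.saved_model.load(export_path)
infer = restored.signatures["serving_default"]
outputs = infer(features=tf.constant([[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]]))
print(outputs["score"].numpy())
```

The dictionary key becomes the signature name that clients and serving tools refer to, so one export can offer several entry points by adding more keys.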

To determine the signature of the stateless function that we will use for serving:

!saved_model_cli show --dir {export_path} --tag_set serve --signature_def serving_default

Finally, we can use this as follows:

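import tensorflow as tf
restored = tf.keras.models.load_model(export_path)
infer = restored.signatures['serving_default']  # the stateless serving function
# `inputs` must be tensors that match the signature reported by saved_model_cli
outputs = infer(inputs)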