Are you still writing Flask apps to serve your model? Stop doing that. You have a much better choice now: Pinferencia.
Pinferencia is a Python library that aims to be the simplest way to serve your model.
Fast to code, fast to go live. Minimal code to write, minimal code changes needed. Just build on what you already have.
100% test coverage: both statement and branch coverage, no kidding.
Easy to use, easy to understand.
Automatic API documentation page. Every API is explained in detail, with an online try-out feature, thanks to FastAPI and Starlette.
Serve any model. Even a single function can be served.
Supports the KServe API, compatible with Kubeflow, TF Serving, Triton and TorchServe. Switching to or from them is painless, and Pinferencia is much faster for prototyping!
Yes, and a lot easier than other tools.
You just need to add three extra lines.
Check out the sample on its page, which serves a Hugging Face model.
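To give a rough idea of what those three extra lines might look like, here is a minimal sketch that serves a plain Python function. It is not the sample from the Pinferencia site. It assumes that the `Server` class and its `register(model_name=..., model=...)` method work as shown in the project's README at the time of writing. The `predict` function, the `build_service` helper and the model name "mymodel" are made up for illustration.

```python
# app.py: a minimal sketch, not the official Pinferencia sample.


def predict(data):
    """Stand-in for a real model: sums a list of numbers."""
    return sum(data)


def build_service():
    """Wrap `predict` in a Pinferencia server (the "three extra lines")."""
    # Imported inside the function so the model code above stays usable
    # even where Pinferencia is not installed (an illustrative choice).
    from pinferencia import Server

    service = Server()  # assumed to be an ASGI app built on FastAPI
    service.register(model_name="mymodel", model=predict)
    return service
```

If the file is saved as app.py and uvicorn is installed, something like `uvicorn --factory app:build_service` should start the service, and the automatic documentation page should then list the registered model.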
Ready to get started?
Visit Pinferencia (underneathall.app) for detailed examples.