ddif06 for ScaleDynamics

Posted on Mar 16, 2020 • Updated on Dec 21, 2021 • Originally published at Medium

How to run TensorFlow.js on a serverless platform: reusing models

In a previous article, we introduced neural networks and TensorFlow framework basics.

Today, we show how to convert models developed with Python TensorFlow for use with TensorFlow.js, and discuss web-based versus server-based deployment.

TensorFlow, from Python to JavaScript

As we introduced in the first article, the original Python TensorFlow consisted of a declarative-style API.

The declarative style (which by nature requires a dedicated debugging environment, TensorBoard) and the breadth of the API made for a relatively steep learning curve on the developer's side:

First, the user constructs a "graph" of all TensorFlow operations (from simple operators on tensors to operations on complete networks, including connections with data sources and sinks).
Then, the user creates a "session", in which TensorFlow analyses the graph, resolves the operation schedule and executes all computations.
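The two steps above can be sketched with a toy example. This is only a model of the idea, not TensorFlow's API: the helper names (placeholder, constant, add, mul, run) are made up for illustration. The graph is first described as plain data, and nothing is computed until run() walks it with actual input values.

```javascript
// Toy illustration only: these helpers are NOT part of any TensorFlow API.

// Step 1: build a graph. Nothing is computed here; each call records a node.
const placeholder = (name) => ({ op: "placeholder", name });
const constant = (value) => ({ op: "const", value });
const add = (a, b) => ({ op: "add", inputs: [a, b] });
const mul = (a, b) => ({ op: "mul", inputs: [a, b] });

// Step 2: a "session" walks the graph and evaluates each node's inputs first.
function run(node, feeds = {}) {
  switch (node.op) {
    case "placeholder":
      if (!(node.name in feeds)) throw new Error(`missing feed: ${node.name}`);
      return feeds[node.name];
    case "const":
      return node.value;
    case "add": {
      const [a, b] = node.inputs.map((n) => run(n, feeds));
      return a + b;
    }
    case "mul": {
      const [a, b] = node.inputs.map((n) => run(n, feeds));
      return a * b;
    }
    default:
      throw new Error(`unknown op: ${node.op}`);
  }
}

// Declarative style: describe y = 3 * x + 1 first, execute later with data.
const x = placeholder("x");
const y = add(mul(constant(3), x), constant(1));
const declarativeResult = run(y, { x: 2 }); // 7

// Imperative ("eager") style: every line computes its value immediately.
const eagerResult = 3 * 2 + 1; // 7
```

In the declarative version, an error in the graph only shows up when run() is called, which hints at why debugging that style needed dedicated tooling.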

Because of the time needed to learn and master this API, Google introduced two significant improvements to Python TF aimed at user-friendliness:

An imperative execution mode ("eager execution"), which is far more intuitive for programmers used to Python (and other scripting languages) and makes debugging easier. It was, however, not fully compatible with all existing features.
Keras API: a user-friendly set of high-level operations (network assembly, inference and training) dedicated to neural networks, adopted from Keras by Google in 2017.

TensorFlow.js, the JavaScript version of TensorFlow (imperative execution), does not include all TF functionalities available in "declarative" mode, but supports, amongst others, the full Keras API.

Ready to use TensorFlow.js models

Pre-trained models are available in the TensorFlow.js model repository for public use by non-experts in machine learning, for various applications:

Image processing: classification, object detection, body/hand pose estimation, body segmentation, face meshing.
Text processing: toxicity detection, sentence encoding.
Speech processing: command recognition.
Language processing: the newly released MobileBERT model enables applications like chat bots, ...

All of these are also hosted on NPM. Feel free to visit the repository https://www.tensorflow.org/js/models for more details.

More than 1000 available TensorFlow models and variants are being centralized in the TensorFlow Hub, which includes models for Python and the models mentioned above, usable in JavaScript.

As mentioned in our previous article, the Magenta project (music and art using ML), hosted on NPM as well, provides a JavaScript API built on models that include recurrent neural networks (RNN).

Converting a Python TF model for JavaScript

Although many ready-to-use models are available online, re-training (or at least fine-tuning) is required for most specific applications, if not a full re-architecting of the model.

As Python is widely used in model design and training, situations arise where a model developed with Python TF has to be used with JavaScript (browser or Node.js).

Knowing the Python TF history that was briefly summarized above, when the time comes to save or export a trained model, one won't be surprised to see different formats:

SavedModel format: includes the complete model architecture, weights and optimizer configuration in a single folder. Such a model can be used without access to the original Python code, and training can be resumed from the checkpoint reached when it was saved.
Keras saved model ('hdf5' format): models created using the Keras API can be saved in a single file ('.h5'). It contains essentially the same information as the saved model.
Frozen model ('.pb'): a variant of a saved model that can no longer be trained (only architecture and weights are saved). It is intended for inference only.

TensorFlow provides a converter for the Python environment: tensorflowjs_converter.

It can be installed easily using:

$ pip install tensorflowjs

This utility converts various model file formats generated by the TF Python API into a JSON file accompanied by binary files containing the weights.

For details on the model converter, see the links below:

https://www.tensorflow.org/js/guide/conversion

https://www.tensorflow.org/js/tutorials/conversion/import_saved_model

https://www.tensorflow.org/js/tutorials/conversion/import_keras

In addition, the TensorFlow.js team just released a model conversion wizard (announced at TensorFlow Dev Summit 2020).

Converting with the command-line utility

Example for a frozen graph model's '.pb' file. The output node of the TensorFlow graph must be specified:

$ tensorflowjs_converter \
    --input_format=tf_frozen_model \
    --output_node_names='MobilenetV2/Predictions/Reshape_1' \
    /mobilenet/frozen_model.pb \
    /mobilenet/web_model

Example for a '.h5' Keras model file:

$ tensorflowjs_converter --input_format=keras /my_path/my_model.h5 /my_tfjsmodel_path

Both examples create a JSON model file and binary weight files.
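To make the converter output more concrete, here is a minimal sketch that summarizes the weight information of a parsed model.json. It assumes the usual layout of that file (a "format" field, a "modelTopology" section and a "weightsManifest" array listing shard paths and weight shapes); the example object is hand-written for illustration and the helper name summarizeManifest is ours, so check the field names against your own converted model.

```javascript
// Bytes per element for common weight dtypes (assumed subset; quantized
// weights, which carry extra metadata, are not handled in this sketch).
const BYTES_PER_ELEMENT = { float32: 4, int32: 4, bool: 1 };

// Summarize the weightsManifest section of a parsed model.json object.
function summarizeManifest(modelJson) {
  const shardPaths = [];
  let totalBytes = 0;
  for (const group of modelJson.weightsManifest) {
    shardPaths.push(...group.paths);
    for (const w of group.weights) {
      const elements = w.shape.reduce((n, d) => n * d, 1); // scalar => 1
      const size = BYTES_PER_ELEMENT[w.dtype];
      if (size === undefined) throw new Error(`unhandled dtype: ${w.dtype}`);
      totalBytes += elements * size;
    }
  }
  return { format: modelJson.format, shardPaths, totalBytes };
}

// Hypothetical, hand-written manifest for a tiny dense layer (4 inputs, 2 units).
const exampleModelJson = {
  format: "layers-model",
  modelTopology: {},
  weightsManifest: [
    {
      paths: ["group1-shard1of1.bin"],
      weights: [
        { name: "dense/kernel", shape: [4, 2], dtype: "float32" },
        { name: "dense/bias", shape: [2], dtype: "float32" },
      ],
    },
  ],
};

const summary = summarizeManifest(exampleModelJson);
// summary.totalBytes is 40: (4 * 2 + 2) float32 values at 4 bytes each
```

A summary like this gives a quick idea of how much data a client will have to download before the first inference.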

Generating a converted model in python code

For Keras models, the tensorflowjs Python module includes APIs, callable from Python TF code, that directly output the JSON format.

Example:

# In Python code where the model is created and trained
import tensorflowjs as tfjs
from tensorflow import keras
...
def train(...):
    model = keras.models.Sequential()   # create a layered Keras model
    ...
    model.compile(...)
    model.fit(...)                      # train the model
    # export the trained model in TensorFlow.js format (JSON + binary weights)
    tfjs.converters.save_keras_model(model, my_tfjsmodel_path)

Once converted, the model can be loaded in a JavaScript environment with the TensorFlow.js loading utility that matches its type (Graph or Keras):

// In JavaScript code running inference with the converted model
const model = await tf.loadGraphModel('myTfjsmodelPath/model.json');

or,

const model = await tf.loadLayersModel('myTfjsmodelPath/model.json');

The model can then be used for inference:

const prediction = model.predict(inputData);
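When the model type is not known in advance, one possible approach is to read the "format" field of model.json and pick the loader accordingly. The sketch below assumes that field holds "graph-model" or "layers-model", as written by the converter. The helper names (loaderNameFor, loadConvertedModel) and the fetchJson parameter are hypothetical, and tf is passed in as an argument so the snippet does not depend on a particular TensorFlow.js package.

```javascript
// Map the "format" field of a parsed model.json to the tf.* loader name.
function loaderNameFor(modelJson) {
  switch (modelJson.format) {
    case "graph-model":
      return "loadGraphModel";
    case "layers-model":
      return "loadLayersModel";
    default:
      throw new Error(`unrecognized model format: ${modelJson.format}`);
  }
}

// Load a converted model with whichever loader matches its format.
// `tf` is the TensorFlow.js namespace; `fetchJson` is a hypothetical helper
// that resolves to the parsed content of the model.json at `url`.
async function loadConvertedModel(tf, url, fetchJson) {
  const modelJson = await fetchJson(url);
  return tf[loaderNameFor(modelJson)](url);
}
```

Note that this reads model.json twice (once to inspect it, once inside the loader), which is acceptable for a small JSON file but worth knowing.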

Operating a JavaScript model

At some point, a neural network model is sufficiently stable to be used on significant data sets. Depending on the application case, this usage may consist of:

inference only: analyzing "production" data sets (texts, images or other media content, etc…) without further training (at least during the analysis).
inference and training: part of the "production" data sets is also used for continuous network training in order to increase performance with application-specific experience.

Although the browser-based and Node-based TensorFlow.js APIs are equivalent in terms of functionality, several key factors besides performance matter when selecting the best way to operate the model: data volumes, transfer bandwidth and privacy.

Browser-based execution is attractive for highly interactive applications, particularly when processing media that are streamed in or out locally (webcam, graphical user interfaces, sound, …), and for moderate-size NNs whose load time does not cripple the user experience.

Browser-based execution has some drawbacks for standard-size models, which significantly affect the user experience:

Model performance is limited, and only moderate-size NN models can be used, despite the acceleration provided by TensorFlow.js' WebGL and Wasm backends;
loading a model can take 15 s or even a minute, depending on model size and mobile network performance, which is a long wait for the user;
memory requirements to run the model are high. On low-memory devices this restricts use of the model and can break application features;
not all mobile phones and browsers are up to date, so the model may not run on all devices.
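As a rough illustration of the load-time drawback, transfer time can be approximated as model size divided by bandwidth. The sketch below uses hypothetical figures (a 20 MB model, a 10 Mbit/s mobile link, a 1 Gbit/s server link) chosen only to show the order of magnitude; it ignores latency, caching and compression.

```javascript
// Estimated transfer time in seconds for a model of `sizeMB` megabytes
// over a link of `bandwidthMbps` megabits per second.
function estimateLoadSeconds(sizeMB, bandwidthMbps) {
  if (bandwidthMbps <= 0) throw new RangeError("bandwidth must be positive");
  return (sizeMB * 8) / bandwidthMbps; // 8 bits per byte
}

// Hypothetical figures, for illustration only.
const onMobile = estimateLoadSeconds(20, 10); // 16 seconds
const onServer = estimateLoadSeconds(20, 1000); // same model over 1 Gbit/s
```

With these assumed numbers, the same model that takes 16 seconds to reach a phone is transferred to a server in a fraction of a second.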

Of course, this reflects the current state, and Google is making progress on some of these issues. In the short term, server-based execution with Node.js is an excellent solution that addresses these drawbacks:

Model performance is close to that of Python TF thanks to the native or GPU-accelerated versions of TF.js for Node.js, which largely removes the limits on model complexity;
a server has a fast network connection, so the time to load a model is significantly reduced. Servers can also be kept ready to run, with models preloaded;
a server's memory can be sized to run a model of any size;
the model is guaranteed to run on any server.
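The "models preloaded" point can be sketched as a small cache that loads the model once per server process and reuses it for every request. This is a minimal sketch, not a prescribed pattern: createModelCache and createHandler are hypothetical names, and the loader is injected (for example () => tf.loadGraphModel(url)) so the snippet stays independent of any TensorFlow.js package.

```javascript
// Wrap an async loader so the model is loaded at most once per process.
function createModelCache(loadModel) {
  let modelPromise = null;
  return function getModel() {
    if (modelPromise === null) {
      // The executor runs immediately, so loading starts on the first call.
      modelPromise = new Promise((resolve) => resolve(loadModel())).catch(
        (err) => {
          modelPromise = null; // allow a retry after a failed load
          throw err;
        }
      );
    }
    return modelPromise;
  };
}

// Hypothetical request handler: every call reuses the same loaded model.
function createHandler(getModel) {
  return async function handle(inputData) {
    const model = await getModel();
    return model.predict(inputData);
  };
}
```

Calling getModel() once at server start-up warms the cache, so the first user request does not pay the model load time.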

The new drawbacks relate mostly to remote data transfers to the server. In particular, moving sensitive data off the device must be managed and agreed with the service provider...

Server-based execution also opens the possibility of running inference or training within, or at the edge of, the network boundary where the data is stored, which reduces latency and data transfer times.

In that case, only the inference results (usually lighter than the input data flows) count as payload from the latency and infrastructure cost viewpoints.

Finally, on the server side, the TensorFlow ecosystem provides TFX (TensorFlow Extended) to deploy production machine-learning pipelines. The AutoML tool (provided by Google Cloud) also offers a GUI-based suite to train and deploy custom ML models without requiring extensive machine-learning and NN expertise.

In the next article, we'll show how to use an online TensorFlow.js model and deploy it rapidly using our WarpJS JavaScript Serverless Function-as-a-Service (FaaS).

#MadeWithTFJS

Thanks!

About the author

Dominique d'Inverno holds an MSc in telecommunications engineering. After 20 years of experience including embedded electronics design, mobile computing systems architecture and mathematical modeling, he joined the ScaleDynamics team in 2018 as an AI and algorithm development engineer.
