In a previous article, we introduced neural networks and the basics of the TensorFlow framework.
Today, we show how to convert models developed with Python TensorFlow for use with TensorFlow.js, and discuss web-based versus server-based deployment.
As we explained in the first article, the original Python TensorFlow was based on a declarative-style API.
The declarative style (which by nature requires a dedicated debugging environment, TensorBoard) and the breadth of the API made for a relatively long learning curve for developers:
- As a first step, the user constructs a "graph" of all TensorFlow operations (from simple operators on tensors to operations with complete networks, including connections with data sources and sinks).
- Then, the user creates a "session", in which TensorFlow analyzes the graph, resolves the operation schedule and executes all computations.
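The graph-then-session workflow can be illustrated with a toy, pure-Python analogy. This is not the TensorFlow API: `Node` and `Session` below are hypothetical names chosen for the illustration, and the sketch only shows the idea of deferred execution (describe the computation first, run it later) compared with eager execution.

```python
import operator


class Node:
    """A deferred operation: it records an op and its inputs but computes nothing."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs


class Session:
    """Walks a graph of Nodes and executes it, feeding values for placeholders."""

    def run(self, node, feed):
        if isinstance(node, str):  # a named placeholder, supplied at run time
            return feed[node]
        # Evaluate the inputs recursively, then apply this node's operation.
        return node.op(*(self.run(i, feed) for i in node.inputs))


# Step 1 (declarative): describe y = (a + b) * c; no arithmetic happens yet.
y = Node(operator.mul, Node(operator.add, "a", "b"), "c")

# Step 2: the "session" resolves the graph and runs the computation.
result = Session().run(y, {"a": 1.0, "b": 2.0, "c": 4.0})

# Eager (imperative) style, by contrast: each statement computes immediately.
eager_result = (1.0 + 2.0) * 4.0
```

In the declarative style, an error only surfaces when the session runs, far from the line that built the faulty node, which is one reason debugging required dedicated tools.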
Because this API took a long time to learn and master, Google introduced two significant improvements to make Python TF more user-friendly:
- An imperative execution mode ("eager execution"), which is far more intuitive for Python (and other scripting-language) programmers and makes debugging easier. It was, however, not fully compatible with all existing features.
- The Keras API: a user-friendly set of high-level operations dedicated to neural networks (network assembly, inference and training), adopted by Google from the Keras library in 2017.
The TensorFlow.js model repository offers pre-trained models that non-experts in machine learning can use for various applications:
- Image processing: classification, object detection, body/hand pose estimation, body segmentation, face meshing.
- Text processing: toxicity detection, sentence encoding.
- Speech processing: command recognition.
- Language processing: the newly released MobileBERT model enables applications such as chatbots.
All of these are also hosted on NPM. Feel free to visit the repository https://www.tensorflow.org/js/models for more details.
Although many ready-to-use models are available online, a specific application usually requires re-training (or at least fine-tuning) the model, if not re-architecting it.
Given the Python TF history summarized above, it comes as no surprise that a trained model can be saved or exported in several formats:
- SavedModel format: a single folder containing the complete model architecture, the weights and the optimizer configuration. Such a model can be used without access to the original Python code, and training can be resumed from the checkpoint reached when it was saved.
- Keras saved model (HDF5 format): models created with the Keras API can be saved in a single '.h5' file, which contains essentially the same information as a SavedModel.
- Frozen model ('.pb'): a variant of the SavedModel that can no longer be trained (only the architecture and weights are saved). It is intended for inference only.
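To relate these formats to the converter options, here is a minimal Python sketch that guesses which `--input_format` value of `tensorflowjs_converter` applies to a given file or folder. The helper `guess_input_format` is hypothetical; it relies only on naming conventions (a SavedModel folder containing `saved_model.pb`, a Keras `.h5` file, a frozen `.pb` file), and the set of accepted `--input_format` values depends on the tensorflowjs version, so check `tensorflowjs_converter --help` before relying on it.

```python
from pathlib import PurePath
from typing import Iterable, Optional


def guess_input_format(path: str, dir_entries: Optional[Iterable[str]] = None) -> str:
    """Guess a tensorflowjs_converter --input_format value (hypothetical helper).

    Pass `dir_entries` (the names of the files inside `path`) when `path` is a
    directory; leave it as None when `path` is a single file.
    """
    if dir_entries is not None:
        entries = set(dir_entries)
        # A SavedModel folder holds saved_model.pb (or .pbtxt) plus variables/.
        if entries & {"saved_model.pb", "saved_model.pbtxt"}:
            return "tf_saved_model"
        raise ValueError(f"{path}: directory does not look like a SavedModel")
    suffix = PurePath(path).suffix.lower()
    if suffix in (".h5", ".hdf5"):
        return "keras"  # Keras HDF5 single-file model
    if suffix == ".pb":
        return "tf_frozen_model"  # frozen graph, inference only
    raise ValueError(f"{path}: unrecognized model file extension {suffix!r}")
```

For example, `guess_input_format("/my_path/my_model.h5")` returns `"keras"`, and `guess_input_format("/my_path/export", ["saved_model.pb", "variables"])` returns `"tf_saved_model"`.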
The TensorFlow.js team provides a converter for the Python environment, tensorflowjs_converter.
It can be installed easily using:
$ pip install tensorflowjs
This utility converts the various model formats generated by the Python TF API into a JSON file (model topology and weights manifest) plus binary files containing the weights.
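As an illustration of this split between JSON and binary files, the total size of the weight files can be estimated from the weights manifest stored in the JSON file. This sketch assumes the usual `model.json` layout (a `weightsManifest` list of groups, each with `paths` and `weights` entries giving `name`, `shape` and `dtype`) and unquantized weights; the example manifest, layer names and shard file name are made up for illustration.

```python
import json
from math import prod

# Bytes per element for the unquantized dtypes handled by this sketch.
BYTES_PER_DTYPE = {"float32": 4, "int32": 4, "bool": 1}


def weight_bytes(model_json: dict) -> int:
    """Sum the sizes of all weights listed in a TF.js weightsManifest."""
    total = 0
    for group in model_json["weightsManifest"]:
        for weight in group["weights"]:
            if "quantization" in weight:
                raise ValueError(f"{weight['name']}: quantized weights not handled")
            # prod([]) == 1, so a scalar weight (shape []) counts as one element.
            total += prod(weight["shape"]) * BYTES_PER_DTYPE[weight["dtype"]]
    return total


# Hypothetical manifest for a single dense layer (784 inputs, 10 outputs).
example = json.loads("""
{
  "weightsManifest": [{
    "paths": ["group1-shard1of1.bin"],
    "weights": [
      {"name": "dense/kernel", "shape": [784, 10], "dtype": "float32"},
      {"name": "dense/bias", "shape": [10], "dtype": "float32"}
    ]
  }]
}
""")
total = weight_bytes(example)  # (784 * 10 + 10) elements * 4 bytes
```

The same figure gives a first idea of how much data a browser will have to download before it can run the model.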
For details on the model converter, see the tfjs-converter documentation.
In addition, the TensorFlow.js team has just released a model conversion wizard (announced at the TensorFlow Dev Summit 2020).
Converting with the Python command-line utility
Example for a frozen graph model's '.pb' file. The output node of the TensorFlow graph must be specified:
$ tensorflowjs_converter \
Example for a Keras '.h5' model file:
$ tensorflowjs_converter --input_format=keras /my_path/my_model.h5 /my_tfjsmodel_path
Both examples create a JSON model file (model.json) and binary weight files.
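When several models have to be converted, the same command lines can be assembled in Python and run with `subprocess`. This is a minimal sketch: `converter_args` is a hypothetical helper, it only covers the `--input_format` and `--output_node_names` flags discussed here (for tensorflowjs versions that still accept `tf_frozen_model`), and the frozen-model path and node name are placeholders, not real values.

```python
import shlex
from typing import List, Optional, Sequence


def converter_args(input_format: str, input_path: str, output_dir: str,
                   output_node_names: Optional[Sequence[str]] = None) -> List[str]:
    """Build a tensorflowjs_converter command line as an argument list."""
    args = ["tensorflowjs_converter", f"--input_format={input_format}"]
    if input_format == "tf_frozen_model":
        # A frozen graph has no signature, so its output node(s) must be named.
        if not output_node_names:
            raise ValueError("a frozen graph needs its output node name(s)")
        args.append("--output_node_names=" + ",".join(output_node_names))
    args += [input_path, output_dir]
    return args


keras_cmd = converter_args("keras", "/my_path/my_model.h5", "/my_tfjsmodel_path")
# Placeholder path and node name: replace them with your graph's actual values.
frozen_cmd = converter_args("tf_frozen_model", "/my_path/frozen_model.pb",
                            "/my_tfjsmodel_path", ["my_output_node"])
command_line = shlex.join(frozen_cmd)  # printable form of the command
# To actually run it: subprocess.run(frozen_cmd, check=True)
```

Passing an argument list rather than a single string to `subprocess.run` avoids shell-quoting issues with paths and node names.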
Generating a converted model in Python code
For Keras models, the tensorflowjs Python package provides functions, callable directly from Python TF code, that save the model in the TF.js JSON format.
# In Python code where the model is created and trained
import tensorflowjs as tfjs
from tensorflow import keras
...
def train(...):
    model = keras.models.Sequential()  # create a layered Keras model
    ...
    model.compile(...)
    model.fit(...)  # train the model
    # Save the trained model directly in TF.js format (model.json + weight files)
    tfjs.converters.save_keras_model(model, my_tfjsmodel_path)
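The converted model can then be loaded in JavaScript with TensorFlow.js (models converted from a SavedModel or a frozen graph are loaded with tf.loadGraphModel() instead):
const model = await tf.loadLayersModel('myTfjsmodelPath/model.json');
The model is then usable for inference:
const prediction = model.predict(inputData);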
At some point, a neural network model is sufficiently stable to be used on significant data sets. Depending on the application case, this usage may consist of:
- inference only: analyzing "production" data sets (texts, images or other media content, etc…) without further training (at least during the analysis).
- inference and training: part of the "production" data sets is also used for continuous network training in order to increase performance with application-specific experience.
Although the browser-based and Node.js-based TensorFlow.js APIs offer equivalent functionality, several key factors besides performance come into play when choosing where to run the model: data volumes, transfer bandwidth and privacy.
Browser-based execution is attractive for highly interactive applications, particularly when processing media streamed in or out locally (webcam, graphical user interfaces, sound, …), and for moderate-size NNs whose load time does not cripple the user experience.
For standard-size models, browser-based execution has drawbacks that significantly affect the user experience:
- Model performance is limited, and only moderate-size NNs can be used, despite the acceleration provided by the TensorFlow.js WebGL and Wasm backends;
- loading a model can take 15 s or even a minute, depending on the model size and the mobile network performance, which is a long wait for the user;
- memory requirements are high; on low-memory devices, this restricts the use of the model and can break application features;
- not all mobile phones and browsers are up to date, so the model may not run on every device.
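A back-of-the-envelope estimate shows why load time and memory weigh so much in the browser. The figures below (model size, link speeds, parameter count) are assumptions for illustration, not measurements; the estimate ignores latency, protocol overhead, caching and compression, and counts only the weights, not the activations computed during inference.

```python
def download_seconds(model_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Time to fetch the model files, ignoring latency and protocol overhead."""
    return model_bytes * 8 / bandwidth_bits_per_s


def weight_memory_bytes(num_params: int, bytes_per_param: int = 4) -> int:
    """Lower bound on memory for float32 weights; activations come on top."""
    return num_params * bytes_per_param


# Assumed: a 20 MB model on a 10 Mbit/s link and on a 3 Mbit/s link.
fast = download_seconds(20e6, 10e6)  # 16 s
slow = download_seconds(20e6, 3e6)   # about 53 s
# Assumed: a 5-million-parameter float32 model needs at least 20 MB of RAM.
mem = weight_memory_bytes(5_000_000)
```

Even with these moderate assumptions, the wait falls in the 15 s to one-minute range mentioned above.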
This reflects the current state of things, and Google is making progress on some of these issues. In the short term, server-based execution with Node.js is an excellent solution that addresses these drawbacks:
- Model performance is close to that of Python TF, thanks to the native and GPU-accelerated versions of TF.js for Node.js, so model complexity is no longer limited by the client device;
- servers have fast network connections, which significantly reduces model load time; servers can also keep models preloaded, ready to run;
- a server can be sized to meet the memory requirements of any model;
- the execution environment is controlled by the service provider, so running the model no longer depends on the capabilities of the user's device.
The new drawbacks relate mainly to remote data transfers to the server; in particular, moving sensitive data off the device must be managed and specified by the service provider.
Server-based execution also makes it possible to run inference/training processes within or at the edge of the network boundary where the data is stored, to reduce latency and data transfer times.
In that case, only the inference results (usually much lighter than the input data streams) need to be considered as payload from the latency and infrastructure-cost viewpoints.
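To give an idea of the orders of magnitude, this sketch compares the size of a raw input frame with that of a classification result. The shapes are assumptions typical of image classifiers (a 224x224 RGB image with 8 bits per channel, and 1000 float32 class scores), not figures from a specific model; real input streams are often compressed, which narrows the gap.

```python
from math import prod


def tensor_bytes(shape, bytes_per_element):
    """Raw size of a dense tensor with the given shape and element size."""
    return prod(shape) * bytes_per_element


# Assumed input: one 224x224 RGB frame, 1 byte per channel.
input_payload = tensor_bytes((224, 224, 3), 1)  # 150,528 bytes
# Assumed output: 1000 class scores stored as float32.
output_payload = tensor_bytes((1000,), 4)       # 4,000 bytes
ratio = input_payload / output_payload          # about 37.6
```

With these assumptions, sending results rather than raw frames cuts the payload by more than an order of magnitude per frame.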
Finally, on the server side, the TensorFlow ecosystem provides TFX (TensorFlow Extended) to deploy production machine-learning pipelines. Google Cloud's AutoML also provides a GUI-based suite to train and deploy custom ML models without requiring extensive machine-learning and NN expertise.
Dominique d'Inverno holds an MSc in telecommunications engineering. After 20 years of experience including embedded electronics design, mobile computing systems architecture and mathematical modeling, he joined the ScaleDynamics team in 2018 as an AI and algorithm development engineer.