If you are new to Azure, you can get started with a free subscription using the link below.
Once you’ve trained your first distributed PyTorch Lightning model with Azure ML, it is time to add logging.
Logs are critical for troubleshooting and tracking the performance of machine learning models. Since we often train on remote clusters, logs give us a simple way to understand what is going on at each phase of developing our model.
As opposed to simple print statements, logs are time stamped, can be filtered by severity, and are used by Azure ML to visualize critical metrics during training, validation, and testing. Logging metrics with Azure ML is also a prerequisite for using the Azure ML HyperDrive Service to help us find optimal model configurations.
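To make the contrast with print statements concrete, here is a minimal sketch using Python’s standard logging module. It is not specific to Azure ML; the logger name and the messages are made up for illustration, and the records are written to an in-memory buffer only so the result is easy to inspect.

```python
import io
import logging

# Send log records to an in-memory buffer so the output is easy to inspect;
# in a real training script this would typically be stderr or a file.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
)

train_logger = logging.getLogger("train")  # hypothetical logger name
train_logger.setLevel(logging.INFO)        # drop anything below INFO
train_logger.addHandler(handler)
train_logger.propagate = False             # keep records out of the root logger

train_logger.debug("batch 0 tensor shapes look fine")   # filtered out (DEBUG < INFO)
train_logger.info("epoch 1 finished, train_loss=0.42")  # kept
train_logger.warning("validation loss increased")       # kept

log_lines = buffer.getvalue().splitlines()
for line in log_lines:
    print(line)
```

Each surviving line starts with a timestamp and a severity level, and the debug message never reaches the output because it falls below the configured threshold.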
Logging is a perfect demonstration of how PyTorch Lightning and Azure ML combine to simplify your model training. Just by using Lightning, we can save ourselves dozens of lines of PyTorch code in our application and gain readability in the process.
In vanilla PyTorch, keeping track of and maintaining logging code can get complicated very quickly.
ML frameworks and services such as Azure ML, TensorBoard, TestTube, Neptune.ai and Comet ML each have their own unique logging APIs. This means that ML engineers often need to maintain multiple log statements at each phase of training, validation and testing.
PyTorch Lightning simplifies this process by providing a unified logging interface that comes with out-of-the-box support for the most popular machine learning logging APIs.
Multiple loggers can even be chained together, which greatly simplifies your code.

    from pytorch_lightning import Trainer
    from pytorch_lightning.loggers import TensorBoardLogger, TestTubeLogger

    # Both loggers receive every metric that the trainer logs
    logger1 = TensorBoardLogger('tb_logs', name='my_model')
    logger2 = TestTubeLogger('tb_logs', name='my_model')
    trainer = Trainer(logger=[logger1, logger2])

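To illustrate why chaining removes duplicated log statements, here is a small library-free sketch of the fan-out idea. It is not Lightning’s actual implementation: `MetricBackend`, `InMemoryBackend` and `FanOutLogger` are hypothetical names, and the `log_metrics(metrics, step)` signature only loosely mirrors the method that Lightning loggers expose.

```python
from typing import Dict, List, Optional, Protocol, Tuple


class MetricBackend(Protocol):
    """Anything that can record a dict of metrics for a given step."""

    def log_metrics(self, metrics: Dict[str, float], step: Optional[int] = None) -> None:
        ...


class InMemoryBackend:
    """Hypothetical stand-in for a real backend such as TensorBoard or MLflow."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.records: List[Tuple[Optional[int], Dict[str, float]]] = []

    def log_metrics(self, metrics: Dict[str, float], step: Optional[int] = None) -> None:
        # Copy the dict so later mutation by the caller does not alter history
        self.records.append((step, dict(metrics)))


class FanOutLogger:
    """Forwards every call to each backend, so training code logs only once."""

    def __init__(self, backends: List[MetricBackend]) -> None:
        self.backends = list(backends)

    def log_metrics(self, metrics: Dict[str, float], step: Optional[int] = None) -> None:
        for backend in self.backends:
            backend.log_metrics(metrics, step)


tb_like = InMemoryBackend("tensorboard-like")
mlflow_like = InMemoryBackend("mlflow-like")
fan_out = FanOutLogger([tb_like, mlflow_like])

# One call per metric in the training code, regardless of the number of backends
fan_out.log_metrics({"train_loss": 0.42}, step=0)
fan_out.log_metrics({"val_loss": 0.57}, step=0)

print(tb_like.records == mlflow_like.records)
```

The training code contains one log call per metric, and adding or removing a backend only changes the list passed to the fan-out object.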
Once loggers are provided to a PyTorch Lightning trainer, they can be accessed in any lightning_module_function_or_hook outside of __init__.
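
    from pytorch_lightning import LightningModule

    class MyModule(LightningModule):
        def some_lightning_module_function_or_hook(self):
            some_img = fake_image()  # placeholder for an image tensor
            # Assumes several loggers are chained as above, so we index into them
            # Option 1: pick the first experiment from the list of experiments
            self.logger.experiment[0].add_image('generated_images', some_img, 0)
            # Option 2: pick the first logger, then use its experiment
            self.logger[0].experiment.add_image('generated_images', some_img, 0)
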
Since Azure ML has native integration with MLflow, we can take advantage of PyTorch Lightning’s MLflow logger module to get native metric visualizations across multiple experiment runs and use HyperDrive with very minor changes to our training code.
Below I’ll outline the code needed to take advantage of Azure ML logging with PyTorch Lightning.
Add PyTorch Lightning, Azure ML and MLflow packages to the run environment.

    - pip:
      - azureml-defaults
      - mlflow
      - azureml-mlflow
      - pytorch-lightning

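
    from azureml.core.run import Run
    from pytorch_lightning.loggers import MLFlowLogger

    # Get the Azure ML run context and the workspace's MLflow tracking URI
    run = Run.get_context()
    mlflow_url = run.experiment.workspace.get_mlflow_tracking_uri()

    # Point the Lightning MLflow logger at Azure ML and reuse the current run
    mlf_logger = MLFlowLogger(experiment_name=run.experiment.name, tracking_uri=mlflow_url)
    mlf_logger._run_id = run.id
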
Step #4 Add logging statements to the PyTorch Lightning training_step, validation_step, and test_step hooks

    def training_step(self, batch, batch_idx):
        # Calculate train loss here
        self.log("train_loss", loss)
        # return train loss

    def validation_step(self, batch, batch_idx):
        # Calculate validation loss here
        self.log("val_loss", loss)
        # return validation loss

    def test_step(self, batch, batch_idx):
        # Calculate test loss here
        self.log("test_loss", loss)
        # return test loss


    trainer = pl.Trainer.from_argparse_args(args)
    trainer.logger = mlf_logger  # enjoy default logging implemented by pl!

And there you have it! Now when you submit your PyTorch Lightning train script, you will get real-time visualizations and HyperDrive inputs at train, validation, and test time with a fraction of the code normally required.
You shouldn’t, but if you have any issues, let me know in the comments.
In the next post, I will show you how to configure Multi Node Distributed Training with PyTorch and Azure ML using Low Priority compute instances to minimize training cost by an order of magnitude.
I want to give a major shout-out to Minna Xiao and Alex Deng from the Azure ML team for their support and commitment to working towards a better developer experience with open source frameworks such as PyTorch Lightning on Azure.
Aaron (Ari) Bornstein is an AI researcher with a passion for history, engaging with new technologies and computational medicine. As an Open Source Engineer at Microsoft’s Cloud Developer Advocacy team, he collaborates with the Israeli Hi-Tech Community to solve real-world problems with game-changing technologies that are then documented, open sourced, and shared with the rest of the world.