Artificial Intelligence (AI) in times of General Data Protection Law (LGPD) in Brazil!

#lgpd #datascience #dataops #devops

How can we comply with LGPD using MLOps, data catalogs, and more?

We are adapting to the General Data Protection Law (LGPD). The main objective of this new law is to guarantee data privacy and reliability, but how is the data area adapting to this new reality? What strategies are being adopted? And what about Artificial Intelligence (AI)?

These are some of the strategies being adopted in the data area:

Infinite Forms?
This strategy aims to create one or more forms to record who accesses the data and where the data and its sources are located. The problem with this approach is that for each new data source that appears, you have to create a new form or adapt the existing ones, so it takes a long time to adopt this strategy fully.
Magic data traceability tools and/or solutions:
This strategy aims to adopt ideas and concepts from data lineage. In an environment with heavy data replication, it is necessary to know who accessed which data sources and data replication processes. The problem with this approach is control: monitoring the data does not by itself guarantee privacy or control over access to this information.
Data Catalog
This strategy aims to create a data catalog so that, instead of accessing and using data sources directly, applications and users go through an interface that controls and mediates access. With it, we gain control over access to data and ways to manage it, avoid unnecessary replication, and create a resilience mechanism for data sources and backups.

The main tools on the market for adopting this type of strategy include Dremio and Qlik Data Catalyst, among others.
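To make the data catalog idea more concrete, here is a minimal sketch in Python. It is not how Dremio or Qlik Data Catalyst work internally; the `DataCatalog` class and its `register`, `grant`, and `read` methods are hypothetical names invented for illustration. The point is only that every read goes through one interface that checks permission and records the attempt.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Set


@dataclass
class DataCatalog:
    """Hypothetical catalog: the single entry point to registered data sources."""

    _sources: Dict[str, Callable[[], List[dict]]] = field(default_factory=dict)
    _grants: Dict[str, Set[str]] = field(default_factory=dict)
    audit_log: List[dict] = field(default_factory=list)

    def register(self, name: str, loader: Callable[[], List[dict]]) -> None:
        # Consumers only know the source name, never the underlying connection.
        self._sources[name] = loader
        self._grants.setdefault(name, set())

    def grant(self, name: str, user: str) -> None:
        # Allow `user` to read the source called `name`.
        self._grants.setdefault(name, set()).add(user)

    def read(self, name: str, user: str) -> List[dict]:
        allowed = user in self._grants.get(name, set())
        # Every attempt is logged, whether it is allowed or denied.
        self.audit_log.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "source": name,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{user} may not read {name}")
        return self._sources[name]()


catalog = DataCatalog()
catalog.register("customers", lambda: [{"id": 1, "city": "Recife"}])
catalog.grant("customers", "data_scientist")

rows = catalog.read("customers", "data_scientist")
```

A read by a user without a grant raises `PermissionError`, and the denied attempt still ends up in `audit_log`, which is the kind of evidence a privacy audit would ask for.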

What about artificial intelligence? Is there a strategy? A trick up our sleeve?

When we look at an Artificial Intelligence model, we notice that it is not just another piece of software to which we apply only the principles and practices of software engineering; after all, data is also part of this context.

With software engineering, we learn about life cycles in which the requirements for new software are gathered, developed, and tested, and then undergo maintenance and evolution. An AI model has a much more complex life cycle, since generating the model also involves training and retraining.

We use different algorithms to create and train different types of models with data samples that are often random. How do we control this cycle? How do we guarantee data privacy and reliable results from a new model?

Can we adopt MLOps?!

At the end of 2018, many people began to realize that they had the means to build new models more easily, even in an automated way with AutoML, but that putting those models into production was, and still is, another story. Out of that gap came the discipline of MLOps (Machine Learning and “Information Technology OPerationS”), which aims to simplify and automate the life cycle of Artificial Intelligence models.

“MLOps (a compound of Machine Learning and “information technology OPerationS”) is [a] new discipline/focus/practice for collaboration and communication between data scientists and information technology (IT) professionals while automating and productizing machine learning algorithms.” (Nisha Talagala, 2018)

In 2019, MLOps was used to automate the deployment of new models, resulting in different automated pipeline solutions, generally guided by GitOps. In most cases, a Continuous Integration (CI) process packages the new model in a Docker image, and a Continuous Deployment (CD) process takes it to production, where the image runs in one or more containers managed by Kubernetes (K8s), OpenShift, or similar solutions.

Today, market solutions are no longer just automation pipelines; they also enable and manage the full life cycle of new models. We have MLflow, Kubeflow, Polyaxon, and many other solutions that make adopting MLOps possible.

With MLOps, we can track and manage the entire life cycle of a model: the data engineer working with data from different sources and creating the datasets, the data scientist combining a dataset with different algorithms and techniques to generate trained models, the automation pipelines that deploy the model, and even the monitoring that signals when retraining with a new set of data is needed.
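As a rough illustration of what "tracking the life cycle" can mean in practice, the sketch below records, for each trained model, who trained it, which algorithm and random seed were used, and a fingerprint of the exact dataset. It does not use MLflow, Kubeflow, or Polyaxon; `ModelRegistry`, `ModelRecord`, and `fingerprint` are hypothetical names, and a real tool would persist this information instead of keeping it in memory.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List


def fingerprint(rows: List[dict]) -> str:
    """Stable SHA-256 hash of a dataset, so a model can be tied to its data."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class ModelRecord:
    name: str
    version: int
    trained_by: str
    algorithm: str
    dataset_hash: str
    random_seed: int


class ModelRegistry:
    """Hypothetical in-memory registry of training runs."""

    def __init__(self) -> None:
        self._records: Dict[str, List[ModelRecord]] = {}

    def log(self, name: str, trained_by: str, algorithm: str,
            rows: List[dict], random_seed: int) -> ModelRecord:
        # Each call creates the next version of the named model.
        versions = self._records.setdefault(name, [])
        record = ModelRecord(name, len(versions) + 1, trained_by,
                             algorithm, fingerprint(rows), random_seed)
        versions.append(record)
        return record

    def models_trained_on(self, rows: List[dict]) -> List[ModelRecord]:
        # Answers: which models would be affected if this dataset had to go?
        target = fingerprint(rows)
        return [record
                for versions in self._records.values()
                for record in versions
                if record.dataset_hash == target]


registry = ModelRegistry()
sample = [{"id": 1, "age": 34}, {"id": 2, "age": 51}]
bigger_sample = sample + [{"id": 3, "age": 29}]

first = registry.log("churn", "ana", "logistic_regression", sample, random_seed=42)
second = registry.log("churn", "ana", "random_forest", bigger_sample, random_seed=42)

affected = registry.models_trained_on(sample)
```

Here `affected` contains only the first version of the model, because the second one was trained on a different dataset. Storing the seed alongside the dataset hash is one way to make an otherwise random sample reproducible when a result has to be explained later.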

Using MLOps, we can manage access to, and the life cycle of, both data and AI models, which makes this discipline a practical way to adapt to the General Data Protection Law.

Some interesting links about MLOps:

https://medium.com/analytics-vidhya/polyaxon-argo-and-seldon-for-model-training-package-and-deployment-in-kubernetes-fa089ba7d60b

https://towardsdatascience.com/the-rise-of-the-term-mlops-3b14d5bd1bdb

https://towardsdatascience.com/ml-ops-machine-learning-as-an-engineering-discipline-b86ca4874a3f

https://towardsdatascience.com/mlops-the-upcoming-shining-star-dcf9444c493

https://medium.com/@selfouly/mlops-done-right-47cec1dbfc8d

Some podcasts:

https://hipsters.tech/machine-learning-e-o-mlops-hipsters-171/

https://medium.com/data-hackers/dataops-a6d008549aa6

This is only the first part. With your feedback, we will publish other articles on the subject.