> Originally published on my personal blog.
Soon after the lockdown, when we all started working from home, there was a sudden increase in the number of people who wanted to write machine learning code. And not just any machine learning code, but code specifically to check whether a person is wearing a mask.
The way this would work is: you have a computer running the program, which pulls in a live feed or still images from a webcam connected to that computer. The program scans for people in the video or photos, and then detects whether each person is wearing a mask.
Image processing like this is very important, impressive, and necessary. Machine learning, artificial intelligence, or deep learning – whatever you want to call it – plays a very significant role in our computers being able to do it. But right now, in 2021, these technologies have become so commonplace that anybody with a basic knowledge of how to put together a bunch of REST APIs can build applications around them.
Now, by saying this, I don’t mean to belittle the tens of thousands of people who have worked, and still are working, hard to make this happen. Without the tireless efforts of the scientists behind the algorithms we now take for granted, we wouldn’t have a lot of the features we enjoy so much today.
But with this experiment, I wanted to show that even a person like me, who is in no way an expert in any AI or ML technologies or tools, can build an application that uses all these technologies in the background. And to show that this can be done in a very short amount of time.
On LinkedIn, I saw a post where another person had written Python code to detect PPE on a person. Curious, I clicked the GitHub link to check the code out. I went through it, most of it anyway, and thought to myself: this is a lot of code to do something that somebody has already done. And there are companies that offer this professionally, as a service you can just build into your app.
So the first place I checked, to nobody’s surprise, was AWS. I had already played around a bit with their Rekognition service. It’s very easy to get started, the APIs are straightforward, and it isn’t too expensive. So I decided to use Rekognition’s new PPE detection API to see if I could do the same thing that all these people had been doing.
But for that, I needed to interface with a webcam and capture at least a photo. So I turned to a technology that I have very little knowledge of – React. And fortunately for me, there’s an npm package for that. I’m sure you’ve never heard that one before.
Anyway, I set about creating a new React app using the very easy-to-remember command:

```shell
npx create-react-app ppe_rekognition
```
Once all the boilerplate was created for me, I added the webcam dependency, went through their documentation (copy-pasted the code snippet), and I had a working web app which could now take photos from a webcam.
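The capture side is tiny. The webcam package hands you the photo as a base64 data URL (a `data:image/jpeg;base64,...` string), while the backend, and ultimately Rekognition, wants just the base64-encoded image bytes. So about the only frontend “logic” is stripping that prefix before posting. A minimal sketch – the `/api/ppe` endpoint name is my own placeholder, not necessarily what the app uses:

```javascript
// The webcam component's screenshot comes back as a data URL:
// "data:image/jpeg;base64,<payload>". Rekognition expects only the
// raw base64 payload, so strip everything up to and including the comma.
function dataUrlToBase64(dataUrl) {
  const commaIndex = dataUrl.indexOf(",");
  if (commaIndex === -1) {
    throw new Error("Not a data URL: " + dataUrl.slice(0, 30));
  }
  return dataUrl.slice(commaIndex + 1);
}

// Sketch of how this would be used from the React component
// (not the actual app code):
//   const screenshot = webcamRef.current.getScreenshot();
//   fetch("/api/ppe", {
//     method: "POST",
//     headers: { "Content-Type": "application/json" },
//     body: JSON.stringify({ image: dataUrlToBase64(screenshot) }),
//   });
```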
Once this was done, I wrote a Java (Spring Boot) API service to take requests from the React frontend, forward them to the Rekognition APIs, and then forward the response back in the opposite direction. I added the API service in between just to make sure I don’t, even by mistake, leak my AWS access and secret keys (I did it anyway).
So, within an hour or two, I had a pipeline to take a photo from a webcam, forward that to my API service, which forwards that to AWS’s Rekognition APIs, which then tell me if:
- there are people in the frame
- they are wearing a face cover
- they are wearing a head cover
- they are wearing a hand cover
And if they are wearing any of these covers, the response includes the bounding box in the frame where each cover was detected. Plus a lot more info, which is overwhelming, to be honest.
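To give a flavour of what reading that response looks like, here’s a sketch of how the frontend could boil the `DetectProtectiveEquipment` result down to “who in the frame has no face cover”. The field names (`Persons`, `BodyParts`, `EquipmentDetections`, `FACE_COVER`) follow the shape of the AWS response; the 80% confidence threshold is just my own arbitrary cut-off, not something the API mandates:

```javascript
// Walk Rekognition's DetectProtectiveEquipment response and collect the
// Ids of detected persons whose FACE body part has no confident
// FACE_COVER detection. minConfidence is an assumed client-side
// threshold, not an API parameter here.
function findUnmaskedPersons(response, minConfidence = 80) {
  const unmasked = [];
  for (const person of response.Persons || []) {
    const face = (person.BodyParts || []).find((part) => part.Name === "FACE");
    if (!face) continue; // no face detected for this person at all
    const masked = (face.EquipmentDetections || []).some(
      (det) => det.Type === "FACE_COVER" && det.Confidence >= minConfidence
    );
    if (!masked) unmasked.push(person.Id);
  }
  return unmasked;
}
```

Each equipment detection also carries a bounding box expressed as ratios of the frame dimensions, which is what you’d use to draw the overlay boxes in a UI.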
You can see two screenshots below from the app. I wanted to blur my face instead of using these bright stars, but I’m too lazy for all that. Anyway, in the first photo, you can see that I’m not wearing a mask, and that’s what the result says. But it was able to detect that there’s a face in the frame.
Similarly, in the second photo, you can see that I’m wearing a mask, and that’s reflected in the result. We’re still working on the UI to make it more usable. We’ve already added the option to select from a list of cameras available on the device, for when you’re using this on your phone and want to use the back camera to scan other people. Go check it out.
And I was done. That’s pretty much it. I didn’t collect a huge cache of images and then split it into training and testing datasets. I didn’t wait for hours to train my model, only to realize that I’d have to do it all over again for the 147th time. And guess what, I didn’t even build a model. It’s pretty anti-climactic.
And that’s saying a lot.
We’re still putting some work into it, though. As I said, I don’t do any kind of UI work, so I pulled in a reluctant friend to help me with that. And he’s been doing a great job turning it into something that you can actually look at without cringing. Take a look.
One takeaway from this for you is that most AI, ML, deep learning, etc. has been commoditized. You don’t have to freak out about learning all of it first. Using these ready-to-use tools and services is a very good place to start your AI journey.
But that’s not to say that you will find a tool out there for all your needs. For most real-world tasks with real business impact, you’ll still need to build your own models. But for most consumer-level applications (image processing, text processing, voice processing, video processing, etc.), you can easily find at least one service that meets most of your needs.
The second takeaway is that you definitely need to spend your time learning and building your own models: messing with the features in your data, wanting to pull all your hair out, giving up on life just before you stumble upon another tweak for your model. So yeah, go build a model.