Architecture Diagram generated from repo source
Originally Posted https://darnahsan.medium.com/video-analytics-in-scala-with-akka-actors-ffmpeg-graphicsmagick-opencv-and-openimaj-b12735336b27 Published on May 3, 2020
This is going to be a long post. You can skip to the **Design** section if you don’t want to go through the background history of the project.
This was my first time writing Scala, almost 6 years ago (and the last time for a project of this scale). Back then the word was that Ruby was going to become obsolete, as Rails was losing its charm after setting the web development world on the path of convention over configuration. Scala was a relatively new kid on the block, with Twitter having recently invested a lot of its stack development in it.
The whole idea of video analytics was quite a buzz 6 years ago. Around the time I started this, Azure was not yet a major cloud player, but now it's the 2nd biggest player and has beaten AWS to the Pentagon's billion-dollar cloud project. The world was still running its own racks or renting dedicated servers in a data centre somewhere around the globe. Cloud was not the cloud we know today. So this seemed like a daunting task: few resources on how to go about it were available online, the cloud tools needed to scale were not readily available for commercial purposes, and there were no open source projects doing such analysis. All you could find were some sample projects built around the libraries.
As a young developer with just over 3 years of experience in the industry, this looked like an opportunity to disrupt the big players with commercial offerings. I named the project Retrospective then and now looking back at it, it's a retrospective of that journey.
Being primarily a Ruby developer, I wanted to do this in Ruby and even found an OpenCV binding for it on GitHub called ropencv. It lacked documentation on how to work with it and had limited features. Despite my limited experience with OpenCV, I managed to run the classifiers using ropencv and contributed examples for future users in a [PR18].
hog_descriptor and haar cascade examples #18
these are examples for Hog descriptor and Haar Cascade , pppl might find them useful
The performance was not good enough to process videos in a reasonable amount of time. The next obvious choices were C/C++ or the JVM. I had never written C++ in my life and never enjoyed the limited amount of C that I did write. Java, on the other hand, was a decent compromise, but its boilerplate was a nuisance for a Ruby developer. While exploring other languages, I narrowed it down to Groovy and Scala. The choice ended up being Scala due to its strong support for Actors, which seemed very impressive for multi-tasking. Luckily, OpenCV has a guide for generating Java bindings, but none were available to download and use, so I open-sourced the bindings on GitHub for anyone to use.
ahsandar / OpencvJavaLib
java bindings compiled using OpenCV instructions for opencv for 32bit 64bit , java 7 & 8 Linux and Mac OSX
opencv java binding lib and jar for linux/osx 32 bit and 64 bit
Enough background on how, why and when this project was started. I will come back to why it was never pursued with commercial intent, why it was not worked on in the last 6 years, and what the takeaways from this project are.
Design
This section covers how the whole program works, or rather how I designed it to run and process gigabytes of files on a modest laptop, an HP Pavilion dm-3000ea with 16 GB of RAM, in significantly less time than the total duration of the videos.
Videos are nothing but a sequence of images, and the image was the primary block of analysis. Each video to be processed was termed a Footage, which was broken down into chunks called Videos. Each Video was further broken down into Clips, from which images were extracted for analysis. Each breakdown was configurable down to a minute. For example, a Footage of 1 hour could be broken down into 15-minute Videos, which would then be broken down into 5-minute Clips, with frames extracted out of each Clip. The Clips could be generated down to 1-minute intervals for analysis.
The folder structure follows the same breakdown and helps in understanding the basic unit of processing.
Each Footage was owned by an Akka Actor. Akka is an amazing toolkit for building reactive and concurrent systems. Each Footage was assigned to a FootageActor, which would convert the footage into chunks of Video, and each Video was owned by its own VideoActor, a child of the FootageActor. The VideoActor would convert each Video into Clips, and in the same manner each Clip was owned by a ClipActor that was a child of the VideoActor. So one FootageActor can have many child VideoActors and each VideoActor can have many ClipActors. This made a hierarchy of workers, each working in its own space and doing analysis per clip.
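The breakdown described above can be sketched in a few lines of plain Scala. This is only an illustration of the arithmetic, not the project's code: the `Segment` type, the `split` helper and the object name are hypothetical, and durations are kept in whole minutes to match the 1 hour / 15 minute / 5 minute example.

```scala
object FootageBreakdown {
  // A half-open time window [startMin, endMin) in minutes (hypothetical type)
  final case class Segment(startMin: Int, endMin: Int) {
    def lengthMin: Int = endMin - startMin
  }

  // Split a segment into consecutive chunks of at most chunkMin minutes;
  // the last chunk is shorter when the length is not a multiple of chunkMin
  def split(segment: Segment, chunkMin: Int): List[Segment] = {
    require(chunkMin > 0, "chunk length must be positive")
    (segment.startMin until segment.endMin by chunkMin)
      .map(start => Segment(start, math.min(start + chunkMin, segment.endMin)))
      .toList
  }

  def main(args: Array[String]): Unit = {
    val footage = Segment(0, 60)            // 1 hour of footage
    val videos = split(footage, 15)         // 15-minute videos
    val clips = videos.flatMap(split(_, 5)) // 5-minute clips
    println(s"${videos.size} videos, ${clips.size} clips") // prints "4 videos, 12 clips"
  }
}
```

Keeping the split as a pure function means the same helper serves both levels of the breakdown, with only the chunk length changing.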
FFmpeg was used to create the Video and Clip chunks from the Footage. It was simpler to use the CLI tool from Scala than a Java library that was not feature-complete. To run CLI commands throughout the code, I wrote a small helper class that executed commands and captured their output to report success or errors.
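package services

import grizzled.slf4j.Logging
import scala.sys.process._
import scala.collection.mutable.ListBuffer

object CommandService {
  // Run a single command and return its combined stdout and stderr.
  // A fresh instance per call keeps the buffers from accumulating the output
  // of earlier commands and avoids sharing mutable state between actors.
  def execute(command: String): String = {
    val cmd = new CommandService(command)
    cmd.execute()
    cmd.outputLogAll
  }
}

class CommandService extends Logging {
  val commands = new ListBuffer[String]
  val out = new StringBuilder
  val err = new StringBuilder
  // Captures stdout and stderr line by line, keeping the line breaks
  val outputLogger = ProcessLogger(
    (o: String) => out.append(o).append('\n'),
    (e: String) => err.append(e).append('\n')
  )

  def this(command: String) = {
    this()
    commands += command
  }

  // Append another fragment to the command being built
  def <<(command: String): Unit = commands += command
  // Note: scala.sys.process splits the string on whitespace and runs it directly,
  // not through a shell, so "&&" is passed to the program as a plain argument
  def cmdNew(): Unit = commands += "&&"
  def printCmd(): Unit = info(createCommand)

  // Run the accumulated command, feeding its output into the loggers
  def execute(): Unit = {
    val command = createCommand
    info(command)
    command ! outputLogger
    commands.clear()
  }

  def createCommand: String = commands.mkString(" ")
  def outputLog: ProcessLogger = outputLogger
  def outputLogOut: String = out.mkString
  def outputLogErr: String = err.mkString
  def outputLogAll: String = f"$outputLogOut %n $outputLogErr"

  def printStdOutput(): Unit = {
    printStdOut()
    printStdErr()
  }

  def printStdOut(): Unit = {
    info("output start")
    info(out.mkString)
    info("output end")
  }

  def printStdErr(): Unit = {
    info("error start")
    info(err.mkString)
    info("error end")
  }
}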
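As a rough idea of the kind of command strings such a helper would be given, here is a sketch that builds FFmpeg invocations for cutting a chunk and extracting frames. The object, function names and file names are hypothetical and the exact options the project used are not shown in this post. The flags themselves (`-ss`, `-i`, `-t`, `-c copy`, `-vf fps=1`) are standard FFmpeg options.

```scala
object FfmpegCommands {
  // Copy durationSec seconds of input starting at startSec into output.
  // "-c copy" avoids re-encoding, so cuts land on keyframes rather than exact timestamps.
  def cutCommand(input: String, startSec: Int, durationSec: Int, output: String): String =
    s"ffmpeg -ss $startSec -i $input -t $durationSec -c copy $output"

  // Extract one frame per second from a clip into numbered image files,
  // for example with outputPattern = "frame_%04d.png"
  def framesCommand(clip: String, outputPattern: String): String =
    s"ffmpeg -i $clip -vf fps=1 $outputPattern"

  def main(args: Array[String]): Unit = {
    // Second 15-minute video of a footage file (assumed file names)
    println(cutCommand("footage.mp4", 900, 900, "video_1.mp4"))
    println(framesCommand("clip_0.mp4", "frame_%04d.png"))
  }
}
```

Because the command string is split on whitespace before it is run, paths containing spaces would need different handling than this sketch provides.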
Once we had our basic unit of processing, it was time to do some image processing on the extracted frames. As many processors as you like can be added to the design, as each process was performed by an actor in parallel. The main processes that ran in parallel before the heatmaps were generated were the following.
Background Subtraction and Connected Component
Classifiers Haar and HOG
ORB (Oriented FAST and Rotated BRIEF)
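The idea of running independent processors over the same frame in parallel can be approximated with plain Scala Futures. This is a stand-in for illustration only: the project used one Akka actor per process, while the `Processor` type, the object name and the string results here are hypothetical.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelProcessors {
  // A processor takes a frame path and returns some result (hypothetical shape)
  type Processor = String => String

  // Start every processor on the same frame at once and wait for all results;
  // the results come back in the same order as the processors
  def runAll(frame: String, processors: Seq[Processor]): Seq[String] = {
    val futures = processors.map(p => Future(p(frame)))
    Await.result(Future.sequence(futures), 10.seconds)
  }

  def main(args: Array[String]): Unit = {
    val processors: Seq[Processor] = Seq(
      frame => s"background-subtraction:$frame",
      frame => s"classifiers:$frame",
      frame => s"orb:$frame"
    )
    runAll("frame_0001.png", processors).foreach(println)
  }
}
```

Adding another processor is then just a matter of appending one more function to the list, which mirrors how a new actor could be added to the design.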
Background Subtraction
OpenCV was used to do background subtraction. The background was subtracted per frame to identify the changes in each frame. This helped in identifying any motion in the video and also indicated the particular regions that were occupied by objects, along with how long they were occupied, which made it possible to create heatmaps of indoor and outdoor locations.
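To make the idea concrete, here is a minimal sketch of the simplest form of background subtraction: differencing each frame against a fixed background frame and thresholding the result. It is a simplified stand-in, not OpenCV's background subtractors and not the project's code. Frames are assumed to be grayscale 2D arrays, and the names and the threshold are hypothetical.

```scala
object FrameDiff {
  // Grayscale frame as rows x cols of intensities in 0..255 (assumed representation)
  type Frame = Array[Array[Int]]

  // Mark a pixel as foreground (true) when it differs from the
  // background pixel at the same position by more than threshold
  def foregroundMask(background: Frame, frame: Frame, threshold: Int): Array[Array[Boolean]] =
    background.zip(frame).map { case (bgRow, row) =>
      bgRow.zip(row).map { case (bg, px) => math.abs(px - bg) > threshold }
    }

  def main(args: Array[String]): Unit = {
    val background: Frame = Array(Array(10, 10, 10), Array(10, 10, 10))
    val frame: Frame = Array(Array(10, 200, 10), Array(10, 12, 10))
    val mask = foregroundMask(background, frame, threshold = 30)
    // Print the mask with '#' for foreground and '.' for background
    mask.foreach(row => println(row.map(if (_) '#' else '.').mkString))
  }
}
```

The resulting boolean mask is the kind of per-frame output that later steps, such as connected components and heatmaps, can build on.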
Connected Components
To calculate the connected components from the background-subtracted images, OpenCV was tried initially, but it had some issues and would cause segmentation faults. OpenIMAJ, another great library for image analysis, was used instead, and the results were decent enough to identify active locations that people interacted with.
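For readers unfamiliar with the technique, connected component labelling groups neighbouring foreground pixels into numbered regions. The sketch below does this with a simple flood fill over a boolean mask using 4-connectivity. It does not use OpenIMAJ or OpenCV, and the object and function names are hypothetical.

```scala
object ConnectedComponents {
  // Label 4-connected regions of true pixels.
  // 0 means background; regions are numbered from 1 in scan order.
  def label(mask: Array[Array[Boolean]]): Array[Array[Int]] = {
    val rows = mask.length
    val cols = if (rows == 0) 0 else mask(0).length
    val labels = Array.fill(rows, cols)(0)
    var next = 0
    for (r <- 0 until rows; c <- 0 until cols) {
      if (mask(r)(c) && labels(r)(c) == 0) {
        // Found an unlabelled foreground pixel: flood fill its whole region
        next += 1
        labels(r)(c) = next
        var stack = List((r, c))
        while (stack.nonEmpty) {
          val (y, x) = stack.head
          stack = stack.tail
          for ((dy, dx) <- Seq((1, 0), (-1, 0), (0, 1), (0, -1))) {
            val ny = y + dy
            val nx = x + dx
            if (ny >= 0 && ny < rows && nx >= 0 && nx < cols &&
                mask(ny)(nx) && labels(ny)(nx) == 0) {
              labels(ny)(nx) = next
              stack = (ny, nx) :: stack
            }
          }
        }
      }
    }
    labels
  }

  def main(args: Array[String]): Unit = {
    val mask = Array(
      Array(true, true, false),
      Array(false, false, false),
      Array(false, false, true)
    )
    label(mask).foreach(row => println(row.mkString(" ")))
  }
}
```

Each labelled region corresponds to one blob of activity in the frame, which is what makes it possible to pick out the locations people interacted with.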
Cascade Classifier
Nowadays identification classifiers have progressed by miles. It's quite simple to build object-identifying programs using toolkits such as TensorFlow. OpenCV had two basic classifiers at that time and they did a decent job of identifying objects when run together and results combined with outliers rejected.
Heatmaps
This was my favourite feature of the whole project, and the initial use case was to identify the hot zones in indoor and outdoor videos. It worked like a charm and was probably the most accurate of all the features built on the background subtraction output.
After all the processors had run, the results from each clip had to be accumulated and combined per video and then at the footage level. This was also done using Actors. An Accumulator Actor would combine the results per clip and send them to a VideoActor, which would do the same by running an Accumulator Actor per video and then send the result over to the FootageActor.
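One simple way to turn background subtraction output into a heatmap is to count, per pixel, how many frames marked that pixel as foreground. The sketch below assumes boolean foreground masks of equal size and is not the project's implementation. The object and function names are hypothetical.

```scala
object Heatmap {
  // Count, per pixel, in how many masks that pixel was foreground.
  // All masks are assumed to be non-empty and to share the same dimensions.
  def accumulate(masks: Seq[Array[Array[Boolean]]]): Array[Array[Int]] = {
    require(masks.nonEmpty, "need at least one mask")
    val rows = masks.head.length
    val cols = masks.head(0).length
    val heat = Array.fill(rows, cols)(0)
    for (mask <- masks; r <- 0 until rows; c <- 0 until cols) {
      if (mask(r)(c)) heat(r)(c) += 1
    }
    heat
  }

  def main(args: Array[String]): Unit = {
    val frame1 = Array(Array(true, false), Array(false, false))
    val frame2 = Array(Array(true, true), Array(false, false))
    // The top-left pixel was active in both frames, so it is the hottest
    accumulate(Seq(frame1, frame2)).foreach(row => println(row.mkString(" ")))
  }
}
```

Higher counts mean a region was occupied for longer, so mapping the counts to colours gives the hot zones.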
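The roll-up from clip to video to footage can be illustrated without actors as a merge of per-clip results. In this sketch the result shape, a count of detections per label, is a hypothetical stand-in for whatever the processors actually produced, and the object name is made up.

```scala
object Accumulate {
  // Hypothetical result shape: number of detections per label
  type Counts = Map[String, Int]

  // Combine two results by adding the counts of matching labels
  def merge(a: Counts, b: Counts): Counts =
    (a.keySet ++ b.keySet).map(k => k -> (a.getOrElse(k, 0) + b.getOrElse(k, 0))).toMap

  // Combine any number of results; an empty input gives an empty result
  def mergeAll(results: Seq[Counts]): Counts =
    results.foldLeft(Map.empty[String, Int])(merge)

  def main(args: Array[String]): Unit = {
    val clipResults = Seq(Map("person" -> 2), Map("person" -> 1, "car" -> 4))
    val videoResult = mergeAll(clipResults)        // per video
    val footageResult = mergeAll(Seq(videoResult)) // per footage
    println(footageResult)
  }
}
```

Because the merge is associative, the same function works at both the video and the footage level, which matches the way one accumulator step is repeated up the hierarchy.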
All of this was done using the Strategy pattern in Akka. Drawing on the experience of this project, I also shared it as an answer to a question on Stack Overflow.
This is an continuation of my previous question How do I get around type erasure on Akka receive method
I have 10 type of events which extends from Event that I need to handle.
I want to implement business logic for each event in separate trait, because because mixing all…
import akka.actor.Actor

// Wrapper message: the actor matches on one type and delegates to the event
case class EventOperation[T <: Event](eventType: T)

class OperationActor extends Actor {
  def receive = {
    case EventOperation(eventType) => eventType.execute()
  }
}

trait Event {
  def execute(): Unit // implemented in each specific event class
}

class Event1 extends Event {
  def execute(): Unit = { /* business logic for Event1 */ }
}

class Event2 extends Event {
  def execute(): Unit = { /* business logic for Event2 */ }
}
This pretty much sums up the whole Retrospective project. The reason I gave up on Scala after this was Scala's own doing: Scala 3 was on the rise, with a Python 3 moment around the corner for it. The community was also split, with factions trying to take each other down, which isn’t a welcoming experience for newcomers. There are some really great individuals who can be credited with bringing newcomers into the Scala club, but for me the road ended with this project.
As far as the project goes, it didn’t get its chance to be commercialised due to a lack of demand. It was probably way ahead of its time for people to understand the importance of video analytics.
Nonetheless, this project introduced me to a lot of challenges and gave me the opportunity to solve them. It gave me experience of working on a reactive, concurrent system. Nowadays, creating projects that identify objects is a piece of cake thanks to the advances made over time, but how to do that at scale over gigabytes of footage is what this project became a model of for me, rather than just another object identifier for videos.
At some point I might put the source code on Github but for now give this a like if you learned something or enjoyed reading it.