

TFUG Thursdays : Accelerating DNN Applications on Edge compute

I am technician

Neural networks have been a game changer in deep learning, and the edge comprises a myriad of devices: microcontroller-based, application-processor-based and server-based (IoT and IIoT). These are consumer electronic devices, ranging from drones, small robots, surveillance cameras, AR/VR goggles and wearables to many more that need on-device processing. The compute types range across CPU, GPU, DSP, NPU and microcontroller. These devices run different OSes, such as custom Linux, Ubuntu, Android and ROS, to name a few. The on-device acceleration libraries include PyTorch, TensorFlow, TensorRT (from NVIDIA) and others. What should an application developer do to enable a use case with a neural network? There are too many choices, and too much complexity, in a compute-, storage-, memory- and power-limited edge environment.

So in this webinar we will try to demystify what is required for you to work with neural networks on the edge.

In this webinar we will focus on:

  1. GMAC technology demo
  2. The need for accelerating TensorFlow applications
  3. Hardware considerations and tricks
  4. Energy-efficient TensorFlow execution on edge devices

We will update the online event page soon; please also watch out for the email from our community.

About Speaker:

Amit Mate has 20+ years of experience leading cross-functional engineering teams on ML and wireless projects from concept through commercialization. He has delivered commercial-grade software for several deep technologies (OCR, 3G/4G, VR, femtocells) with industry leaders such as Qualcomm and Nokia.

Amit earned his master's degree in electrical communication engineering from IISc, Bangalore, and his bachelor's in electronics and communication from NIT, Nagpur. He has been awarded 10+ patents, including 3GPP essential patents.

GMAC is a company focused on edge compute, where AI/ML is deployed on devices constrained in memory and power (battery). Amit was first introduced to AI/ML in 2013, when he worked on an on-phone OCR engine and learnt how to accelerate these workloads.

The agenda of this talk was:

  • What is AI/ML/DL?
  • What is the edge?
  • Today's edge DNN applications: a combination of AI, ML and DL
  • The DNN application stack
  • A close look at DNNs and computation on edge compute
  • Why do we need to accelerate?

What is AI/ML/DL?

In the webinar he starts by referring to Dr. Raj Reddy, a renowned roboticist from the CMU Robotics Institute who received the ACM Turing Award in 1994. Reddy had predicted the growth of the industry with reference to its disruptions, and also predicted that once a computer could perform more than a billion operations, further breakthroughs in the industry would follow.

Intelligence still has no agreed definition even today, so it is easier to illustrate with examples than to define; he gives the example of a car and its driver.

Alt Text

Machine intelligence, then, is about implicit execution and implicit inference.

What is the edge?

Edge computing is a distributed information technology (IT) architecture in which client data is processed at the periphery of the network, as close to the originating source as possible.

Edge computing is explained in detail in the linked article; this is the definition from ARM, which also gives a good overview of the compute we should all be working on in the future. It has everything a beginner needs in a nutshell, so please do go and read it.

On the edge there are gateways, nodes and/or device-to-device networks, which operate in (almost) real time; for 4G- or 5G-connected networks, the latency is the round trip to the computation and back.

The demo promo from GMAC Intelligence is here.

The demos listed cover activity recognition, ANPR (automatic number plate recognition) and touchless attendance.

The GMAC demo video also shows monocular depth sensing for traffic, among other applications. We can say that the more modalities we give the model, the more accurate the inference becomes.

The need for accelerating TensorFlow applications

We can also reason in terms of picojoules per inference, which folds in the same factors we saw earlier: power, performance and area (PPA). Picojoules per inference tells us how much energy an edge device needs to produce one accurate inference.
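The picojoule-per-inference figure can be made concrete with simple arithmetic: energy is average power multiplied by inference latency. The power and latency numbers below are purely illustrative assumptions, not measurements of any real device.

```python
# Back-of-the-envelope energy-per-inference estimate.
# All numbers below are illustrative assumptions, not measured values.

def energy_per_inference_pj(avg_power_mw: float, latency_ms: float) -> float:
    """Energy (picojoules) = average power * latency.
    mW * ms gives microjoules; multiply by 1e6 to get picojoules."""
    return avg_power_mw * latency_ms * 1e6

# Hypothetical edge accelerator: 500 mW average draw, 20 ms per inference.
e_pj = energy_per_inference_pj(500, 20)
print(e_pj)        # 1e10 pJ, i.e. 10 mJ per inference
print(e_pj / 1e9)  # 10.0 (millijoules)
```

Comparing this number across candidate devices, at equal accuracy, is what the metric is for.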

Alt Text

The deep learning (DNN) stack, which shows the difference between a real-time compute instance and edge compute.

The lower stack in the image above is the real-time stack. All of these stacks lead to one question: what are the accelerators, and how do we accelerate the application?

Thus we focus on the hardware compute types:
Alt Text

The above is a very common CNN architecture, and it stays broadly the same across use cases. We therefore need to optimize the neural network for edge inference.

Why DSP in DNN Inference?

You can refer to the article I have linked here.

Alt Text

These are the parameters that characterize the nature of a neural network. MAC (multiply-accumulate) operations are what actually dominate the cost when edge devices run inference.
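To get a feel for where those MACs come from, here is a small sketch that counts MACs and parameters for a standard convolution layer; the layer sizes are arbitrary examples, not taken from the slides.

```python
def conv2d_macs(h_out, w_out, c_in, c_out, k):
    """MAC count for a standard KxK convolution: one MAC per
    (output pixel, output channel, kernel tap, input channel)."""
    return h_out * w_out * c_out * k * k * c_in

def conv2d_params(c_in, c_out, k, bias=True):
    """Weight count for the same layer, plus one bias per output channel."""
    return k * k * c_in * c_out + (c_out if bias else 0)

# Example: 3x3 conv, 64 -> 128 channels, on a 56x56 feature map (stride 1).
macs = conv2d_macs(56, 56, 64, 128, 3)
params = conv2d_params(64, 128, 3)
print(f"{macs / 1e6:.1f} M MACs, {params / 1e3:.1f} K params")
# ~231.2 M MACs from only ~73.9 K parameters: compute, not just storage,
# is the bottleneck for convolutions.
```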

How is a DNN inference application mapped onto today's DSPs, e.g., the TI AM5749 Sitara SoC?

Alt Text

Datasheet of TI AM5749

The main component is a network translation tool, which converts the trained network into an optimized network for real-time deployment.
Alt Text

The diagram above and the inference results on the slide show why DSP units are needed on compute boards when we architect them. Going a little deeper, the ALU (arithmetic logic unit) is architected as SIMD (single instruction, multiple data).
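To make the SIMD idea concrete, here is a toy simulation of a 4-lane vector MAC; the lane width and the instruction counting are illustrative and do not model any particular DSP.

```python
# Toy model of a 4-lane SIMD multiply-accumulate: one "instruction"
# processes 4 operand pairs at once, where a scalar ALU would need
# 4 separate multiply-add issues for the same work.

LANES = 4

def simd_dot(a, b):
    """Dot product using 4-wide vector MACs, then a final lane reduction."""
    assert len(a) == len(b) and len(a) % LANES == 0
    acc = [0] * LANES                      # one accumulator per lane
    instructions = 0
    for i in range(0, len(a), LANES):
        # one vector MAC instruction: 4 multiplies + 4 adds per issue
        for lane in range(LANES):
            acc[lane] += a[i + lane] * b[i + lane]
        instructions += 1
    return sum(acc), instructions

result, issued = simd_dot([1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 1, 1, 2, 2, 2, 2])
print(result, issued)  # 62 2  -- 8 MACs done in 2 vector instructions
```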

Alt Text

Stepping back to the table that gives MAC counts for different ANN operations: if we skip these optimizations, we may need a great deal of RAM (random-access memory), and the "memory read relative cost" in the figure above shows the DDR cost we must keep in mind when designing a neural network for edge deployment. An efficient way to arrive at a DNN or CNN architecture is therefore to design the dataflow on the edge compute.
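The relative-cost argument can be sketched numerically. The per-operation energy figures below are order-of-magnitude numbers of the kind often quoted for older process nodes; treat them as assumptions for illustration, not datasheet values.

```python
# Illustrative energy costs per operation (rough orders of magnitude;
# assumptions for this sketch, not measured figures for any chip).
COST_PJ = {
    "mac_8bit":  0.2,    # one 8-bit multiply-accumulate
    "sram_read": 5.0,    # read from a small on-chip buffer
    "dram_read": 640.0,  # read from external DDR
}

def layer_energy_pj(n_macs, n_reads, read_src):
    """Total energy for a layer: compute plus operand fetches."""
    return n_macs * COST_PJ["mac_8bit"] + n_reads * COST_PJ[read_src]

n_macs = 1_000_000
n_reads = 2_000_000  # two operands fetched per MAC, assuming no reuse
from_dram = layer_energy_pj(n_macs, n_reads, "dram_read")
from_sram = layer_energy_pj(n_macs, n_reads, "sram_read")
print(f"DDR-fed:  {from_dram / 1e6:.0f} uJ")   # memory dwarfs compute
print(f"SRAM-fed: {from_sram / 1e6:.1f} uJ")
```

With these assumed costs, feeding the MACs from DDR costs over a hundred times more energy than feeding them from a local buffer, which is exactly why the dataflow matters.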

Alt Text

The dataflow designed above gives us a chance to reduce redundant transfers from DRAM (dynamic random-access memory). DRAM refresh is something else we need to keep in mind (it remains one of the unsolved problems in memory; if you solve it, it will be a breakthrough).
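A minimal sketch of why dataflow matters, assuming a 1-D convolution and a simple line buffer; both are hypothetical simplifications of the 2-D case discussed in the talk.

```python
# DRAM traffic for a 1-D convolution (kernel size K) over N samples.
# Naive dataflow: re-fetch the whole input window from DRAM per output.
# Line buffer: fetch each input once and reuse it across the K outputs
# whose windows overlap it.

def dram_fetches_naive(n, k=3):
    return (n - k + 1) * k   # K fetches for each of the N-K+1 outputs

def dram_fetches_line_buffer(n, k=3):
    return n                 # each input sample read from DRAM exactly once

n = 1024
print(dram_fetches_naive(n), dram_fetches_line_buffer(n))  # 3066 1024
```

The ~3x saving here grows to K*K in 2-D, which is why convolution accelerators invest in local buffering.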

Energy-Efficient 2D Convolution

The slide below lists the features, but the aim of designing such an architecture is to get more compute in fewer cycles. If you are doubtful, take pen and paper and work through an 8-bit compute of the same architecture, so that you properly understand the architecture you want to deploy against the energy profile shown here.
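If you would rather do the 8-bit exercise in code than on paper, here is a minimal sketch of symmetric 8-bit quantization with a 32-bit accumulator; the input values and the per-tensor scaling scheme are arbitrary choices for illustration.

```python
# Symmetric 8-bit quantization of a small dot product, accumulating in
# a wide (int32-style) register and dequantizing once at the end -- the
# arithmetic pattern most edge accelerators implement.

def quantize(xs, scale):
    """Map floats to int8 as round(x / scale), clipped to [-127, 127]."""
    return [max(-127, min(127, round(x / scale))) for x in xs]

w = [0.5, -1.0, 0.25]   # example weights
a = [2.0, 1.5, -0.5]    # example activations
w_scale = max(abs(x) for x in w) / 127   # per-tensor symmetric scale
a_scale = max(abs(x) for x in a) / 127

wq = quantize(w, w_scale)
aq = quantize(a, a_scale)

acc = sum(wi * ai for wi, ai in zip(wq, aq))  # integer accumulator
result = acc * w_scale * a_scale              # dequantize once, at the end
exact = sum(wi * ai for wi, ai in zip(w, a))
print(round(result, 3), exact)  # quantized result is close to the float one
```

Note that only cheap integer MACs run in the inner loop; the two float multiplies happen once per output.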

Other design elements for energy-efficient computations on DSPs

Support for reduced precision of operands and operations (instead of 32-bit floating point):

  • Binary/ternary weights and activations
  • 8/10-bit dynamic fixed-point representations for weights and activations

Support to reduce the number of operations and model size:

  • JIT decompression of network models (the DDR-to-local-buffer transfer stays in compressed form), e.g., Huffman coding of weights and biases
  • Depth-first convolution, an approach that minimizes data movement to DDR
  • Skipping weight reads and MAC operations for zero (or near-zero) weights and activations

Other algorithmic approaches:

  • Network pruning and re-training
  • Winograd convolutions, roughly a 2x speed-up (replace convolution with element-wise products: e.g., a 4x4 input tile convolved with a 3x3 filter, i.e., 36 MACs, drops to 16 MACs after Winograd)
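The Winograd claim can be checked directly. Below is the 1-D F(2,3) transform, which computes two convolution outputs with 4 multiplications instead of the direct method's 6; the 2-D F(2x2,3x3) case mentioned above nests this trick in both dimensions (36 multiplications become 16).

```python
# Winograd F(2,3): two 1-D convolution outputs from a 3-tap filter
# using 4 multiplications (m1..m4) instead of the direct method's 6.

def winograd_f23(d, g):
    """d: 4 input samples, g: 3 filter taps -> 2 outputs."""
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_conv(d, g):
    """Reference: sliding-window convolution, 6 multiplications."""
    return [d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
            d[1]*g[0] + d[2]*g[1] + d[3]*g[2]]

d, g = [1.0, 2.0, 3.0, 4.0], [0.5, 1.0, -0.5]
print(winograd_f23(d, g), direct_conv(d, g))  # identical: [1.0, 2.0]
```

The filter-side terms like (g0+g1+g2)/2 can be precomputed once per filter, so at inference time only the 4 data-dependent multiplications remain.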

This is the conclusion we can draw from this session when designing algorithms and deploying a model on an edge device.

Q/A part:

When will we see edge computing platforms doing very hefty tasks, running complex architectures in deployment?

Answer: A lot of new architectures have emerged over time, and the intent of the neural network matters too, so we must first do good work on the architecture itself and only then move to edge compute, because tuning an architecture for the edge is a very tough challenge. We may well see edge platforms running hefty algorithms like RNNs and CNNs quite soon. The key skill is extracting the full potential of the hardware.

Also think in terms of these parameters; this should be used as a rule book when deciding what to use:
Alt Text

Is hardware acceleration important for the edge, or is compressing the deep network model more efficient?

Answer: You can refer to the paper linked here:

Thanks to those who read this article, and thanks to those who attended the webinar live. If you missed it, please do watch the webinar at the link above.
