According to Moore’s Law, the number of transistors that fit onto an integrated circuit doubles roughly every 18 months. This observation drove the exponential growth of computing power over the decades. However, it has become much harder to keep that pace as chip components continue to shrink. The end of Moore’s Law severely limits energy-efficient general-purpose architectures: implementation technology and parallelism have largely reached their potential for performance improvement. Domain Specific Architectures, with domain-specific hardware acceleration, were introduced to address this issue and deliver performance gains that can no longer be achieved by improving general-purpose computing.
Domain Specific Architectures achieve higher efficiency than general-purpose computing by tailoring the architecture to the characteristics of the domain being addressed. Designing them therefore requires more domain-specific knowledge. Graphics Processing Units (GPUs), neural network processors used for deep learning, and processors for Software Defined Networks (SDNs) are a few applications of Domain Specific Computing.
To increase efficiency and achieve higher performance, Domain Specific Computing exploits parallelism through the Single Instruction Multiple Data (SIMD) technique. Because it fetches only one instruction stream and its processing elements operate in lockstep, SIMD is more efficient than the Multiple Instruction Multiple Data (MIMD) approach used by general-purpose processors. Domain Specific Architectures also make use of Very Long Instruction Word (VLIW) designs, which work well with explicitly parallel programs by performing the necessary analysis and scheduling at compile time.
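The efficiency argument for SIMD can be illustrated with a short sketch. The scalar loop below stands in for per-element instruction fetch and control, while the vectorized form expresses one logical operation applied to every data element at once (NumPy dispatches such operations to hardware SIMD units where available). The function names and the SAXPY example are illustrative choices, not drawn from the text.

```python
import numpy as np

# Scalar style: control overhead (loop, index, bounds) is paid per element,
# analogous to fetching and decoding an instruction for every data item.
def saxpy_scalar(a, x, y):
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out

# SIMD style: a single vectorized operation applied to all lanes in lockstep;
# one instruction stream drives many data elements.
def saxpy_simd(a, x, y):
    return a * x + y

x = np.arange(4, dtype=np.float32)   # [0, 1, 2, 3]
y = np.ones(4, dtype=np.float32)
print(saxpy_simd(2.0, x, y))         # [1. 3. 5. 7.]
```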
Another key feature of Domain Specific Architectures is the effective use of memory bandwidth. General-purpose processors use multilevel caches to increase bandwidth and hide latency, but caches are ineffective when datasets are huge and exhibit little temporal or spatial locality; conversely, when caches do work well the locality is very high, which means most of the cache sits idle most of the time. Domain Specific Architectures instead use a hierarchy of memories with data movement controlled explicitly by the software. When the application is specific to a certain domain, as in Domain Specific Computing, such user-controlled memories can consume less energy than caches.
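Software-controlled data movement can be sketched as explicit tiling: the program, not a cache, decides which slice of a large dataset is resident in a small on-chip buffer at any moment. The tile size and the copy standing in for a DMA transfer are assumptions made for illustration.

```python
import numpy as np

TILE = 4  # size of a hypothetical on-chip scratchpad, chosen for illustration

def tiled_sum(data, tile=TILE):
    """Process a large array in tiles that fit a small software-managed buffer.

    The explicit copy into `buf` stands in for a DMA transfer into scratchpad
    memory: software decides what is resident and when, so no cache tags,
    lookups, or evictions are needed."""
    total = 0.0
    for start in range(0, len(data), tile):
        buf = np.array(data[start:start + tile])  # explicit "DMA" into the buffer
        total += float(buf.sum())                 # compute entirely out of the buffer
    return total

print(tiled_sum(np.arange(10, dtype=np.float64)))  # 45.0
```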
Domain Specific Architectures can also achieve higher performance by eliminating unnecessary accuracy when lower precision is adequate. General-purpose CPUs provide more accuracy than machine learning and graphics applications need by operating on 32- and 64-bit integer and floating-point data. Most of these applications can improve both data and computational throughput by using just 8-bit or 16-bit integers. Domain Specific Architectures can further benefit from Domain Specific Languages, which expose more parallelism and improve the structure of memory accesses.
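A common way to trade unneeded accuracy for throughput is quantization. The sketch below maps 32-bit floating-point values onto 8-bit integers with a single symmetric scale factor, quartering the storage per value; the function names and the per-tensor scaling scheme are illustrative assumptions, not a specific accelerator's method.

```python
import numpy as np

def quantize_int8(weights):
    """Map float32 values onto 8-bit integers using one symmetric scale factor.

    (Illustrative per-tensor scheme: real accelerators may use per-channel
    scales, zero points, or other calibration methods.)"""
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float values from the 8-bit representation.
    return q.astype(np.float32) * scale

w = np.array([0.51, -1.27, 0.08], dtype=np.float32)
q, s = quantize_int8(w)
print(q.nbytes, w.nbytes)   # 3 bytes of int8 vs 12 bytes of float32
print(dequantize(q, s))     # close to the original values
```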
In conclusion, Domain Specific Architectures can achieve better performance because they are shaped to fit the needs of a specific application, but the targeted domain must account for a significant portion of the overall computation for the acceleration to have a remarkable runtime impact. It is also evident that the demand for computing power is the dominant concern driving the deployment of Domain Specific Architecture based solutions.