Accelerating deep learning on the Intel MIC architecture
Deep neural network implementations are continuously adapted to underlying co-processors such as GPUs in order to saturate every available source of parallelism. At the same time, co-processors themselves evolve in multiple directions to explore different trade-offs. The MIC architecture, one such example, departs from mainstream CPU design by packing a larger number of simpler cores per chip and relying on wide SIMD instructions to close the resulting performance gap. Deep learning implementations have long attempted to exploit the SIMD capabilities of CPUs; however, mainstream CPUs adopted wider SIMD registers and more advanced instructions only recently, since they do not rely primarily on SIMD for efficiency. In this project, we investigate novel vectorized designs and implementations of DNNs built on advanced SIMD operations, such as gathers and scatters.
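To make the final sentence concrete, the sketch below illustrates the semantics of gather and scatter in plain Python (this is illustrative only, not the project's vectorized implementation; the array contents and index values are hypothetical). On MIC hardware, each of these operations moves an entire index vector's worth of non-contiguous elements in a single vector instruction (e.g. vgatherdps / vscatterdps), which is what makes vectorizing irregular DNN access patterns feasible.

```python
def gather(src, idx):
    """Load non-contiguous elements of src selected by an index vector.
    On MIC this whole loop is one SIMD gather instruction."""
    return [src[i] for i in idx]

def scatter(dst, idx, vals):
    """Store a vector of values to non-contiguous locations in dst.
    On MIC this whole loop is one SIMD scatter instruction."""
    for i, v in zip(idx, vals):
        dst[i] = v

# Hypothetical data: a dense parameter array and sparse access indices.
weights = [float(i) for i in range(16)]
idx = [0, 5, 2, 7]

g = gather(weights, idx)                 # one "vector gather"
out = [0.0] * 16
scatter(out, idx, [2.0 * x for x in g])  # one "vector scatter"

print(g)        # [0.0, 5.0, 2.0, 7.0]
print(out[:8])  # [0.0, 0.0, 4.0, 0.0, 0.0, 10.0, 0.0, 14.0]
```

A scalar loop over such indices forces one memory access per element; hardware gather/scatter lets a single instruction service all lanes, which is central to vectorizing sparse or strided access patterns in DNN kernels.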