Accelerating deep learning on Intel MIC architecture


petrie - Posted on 19 October 2015

Project Description: 

Deep neural networks are continuously adapted to underlying co-processors such as GPUs in order to saturate all available sources of parallelism. At the same time, co-processors evolve in multiple directions to explore different trade-offs. The MIC (Many Integrated Core) architecture, one such example, strays from mainstream CPU design by packing a larger number of simpler cores per chip and relying on SIMD instructions to fill the performance gap. Deep learning has long attempted to exploit the SIMD capabilities of CPUs; however, mainstream CPUs have only recently adopted wider SIMD registers and more advanced instructions, since they do not rely primarily on SIMD for efficiency. In this project, we investigate novel vectorized designs and implementations of DNNs based on advanced SIMD operations such as gathers and scatters.
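Gather and scatter operations load from and store to non-contiguous memory locations in a single vector instruction, which is what makes irregular access patterns in DNN kernels (for example, indexed weight lookups) vectorizable at all. A minimal scalar sketch of their semantics, with illustrative function names not taken from any library:

```c
#include <stddef.h>

/* Gather: out[i] = base[idx[i]].  This is the semantics that a
 * hardware gather (e.g., the AVX-512 intrinsic _mm512_i32gather_ps)
 * performs for an entire vector of indices in one instruction. */
void gather_f32(float *out, const float *base, const int *idx, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = base[idx[i]];
}

/* Scatter: base[idx[i]] = in[i], the inverse operation
 * (e.g., _mm512_i32scatter_ps). */
void scatter_f32(float *base, const float *in, const int *idx, size_t n) {
    for (size_t i = 0; i < n; i++)
        base[idx[i]] = in[i];
}
```

On a MIC chip such as Xeon Phi, each of these scalar loops collapses into a single (possibly masked) vector instruction per 16 single-precision elements, removing the per-element loop overhead.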

Researcher name: 
Petrie Wong
Researcher position: 
PhD Student
Researcher department: 
Department of Computer Science
Researcher email: 
Research Project Details
Project Duration: 
08/2014 to 08/2017
Project Significance: 
Our research outcomes will be findings on Xeon Phi that differ quantitatively from those on multi-core CPUs. First, architectural features and software optimizations behave quite differently on Xeon Phi than on the CPU, which calls for new optimization and tuning on Xeon Phi. Second, we will determine the winner between hardware-oblivious and hardware-conscious algorithms over a wide parameter window. These two outcomes further shed light on the design and implementation of machine learning models on new-generation single-chip many-core technologies.
Results Achieved: 
Currently, we have published a workshop paper [1] on accelerating in-memory indexes on modern CPU architectures in an affiliated workshop of ACM SIGMOD.

[1] P. Wong, Z. Feng, W. Xu, E. Lo, B. Kao. "TLB Misses — the Missing Issue of Adaptive Radix Tree?" International Workshop on Data Management on New Hardware.
Remarks: 
In this project, we develop new implementations of machine learning models on Intel's modern MIC architecture. This hardware is available in the HPC2015 cluster.