SC20

Increasing AI Inference with Low-Precision Optimization Tool with Intel® Deep Learning Boost – A High Energy Physics Use Case

With hardware acceleration support, such as Intel® DL Boost Vector Neural Network Instructions (VNNI), low precision Deep Learning (DL) inference delivers higher throughput and lower latency meeting accuracy. However, deployment challenges remain due to high computational complexity of inference quantization. We have developed a Low-Precision Optimization Tool in the Open Source to boost inference performance and solution deployment supporting a unified interface across Intel optimized DL frameworks (TensorFlow, PyTorch, and MXNet) on Intel platforms. This tool supports rich and automatic accuracy-driven tuning strategies for post-training quantization and quantization aware training. We present a number of DL benchmarks quantized with this tool showing up to 3.8X performance improvement. We also apply the low-precision tool to convolutional Generative Adversarial Networks (GAN) “3DGAN” prototype in High Energy Physics (HEP) use case from CERN openlab. Our GAN model has delivered promising results for decreasing inference time while maintaining the same level of accuracy compared to Geant4 Monte Carlo simulations. The low-precision INT8 inference performance over FP32, benchmarked on Intel 2S Xeon® processor 8280 demonstrates a speedup of up to 1.8X. From the physics point of view, we evaluate the quality of the quantized GAN generated images against the validation data simulated using Geant4 simulations. This work represents one of the first attempts at applying low precision computing to generative models for which the precision constraints on the output are very strict. We show that over generational CPU hardware improvements, TensorFlow framework optimizations, and with low precision inference, we observe a speedup of >68000X over Monte Carlo Simulation. Dr. Sofia Vallecorsa, Computing Engineer, CERN openlab Haihao Shen, Engineering Manager at Intel Corporation

Read Less