tinyML Talks on September 28, 2021 “A Practical Guide to Neural Network Quantization” by Marios Fournarakis

We held our next tinyML Talks webcast: Marios Fournarakis from Qualcomm AI Research presented "A Practical Guide to Neural Network Quantization" on September 28, 2021.



Neural network quantization is an effective way of reducing the power requirements and latency of neural network inference while maintaining high accuracy. The success of quantization has led to a large volume of literature and competing methods in recent years, and Qualcomm has been at the forefront of this research. This talk aims to cut through the noise and introduce a practical guide for quantizing neural networks inspired by our research and expertise at Qualcomm. We will begin with an introduction to quantization and fixed-point accelerators for neural network inference. We will then consider implementation pipelines for quantizing neural networks with near floating-point accuracy for popular neural networks and benchmarks. Finally, you will leave this talk with a set of diagnostic and debugging tools to address common neural network quantization issues.
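To make the idea of quantization concrete, here is a minimal sketch of uniform affine (asymmetric) quantization of a floating-point tensor to 8-bit integers and back. The function names, the min/max range calibration, and the NumPy implementation are illustrative assumptions for this post, not the specific pipeline presented in the talk:

```python
import numpy as np

def quantize(x, scale, zero_point, num_bits=8):
    """Uniform affine quantization of a float tensor to signed integers."""
    qmin, qmax = -2 ** (num_bits - 1), 2 ** (num_bits - 1) - 1
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Map integers back to an approximation of the original floats."""
    return scale * (q.astype(np.float32) - zero_point)

# Toy example: calibrate scale/zero-point from the tensor's min/max range,
# quantize, dequantize, and inspect the round-trip error.
x = np.array([-1.0, -0.5, 0.0, 0.25, 1.0], dtype=np.float32)
scale = (x.max() - x.min()) / 255.0          # spread the range over 256 int8 levels
zero_point = np.round(-x.min() / scale) - 128  # shift so x.min() maps near qmin
q = quantize(x, scale, zero_point)
x_hat = dequantize(q, scale, zero_point)
print(np.max(np.abs(x - x_hat)))  # rounding error is at most about scale
```

Real fixed-point accelerators perform the matrix arithmetic directly on the integer values and fold the scales into the accumulator, which is where the latency and power savings come from; the simulation above only models the numerical error that quantization introduces.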

You can find more information about the theory and algorithms we will discuss in this talk in our White Paper on Neural Network Quantization at the following arXiv link: [2106.08295] A White Paper on Neural Network Quantization

Marios Fournarakis is a Deep Learning Researcher at Qualcomm AI Research in Amsterdam, working on power-efficient training and inference of neural networks, focusing on quantization techniques and compute-in-memory. He is also interested in low-power AI applications and equivariant neural networks. He completed his graduate work in Machine Learning at University College London and holds a Master’s in Engineering from the University of Cambridge. Prior to Qualcomm, he worked as a Computer Vision research intern at Niantic Labs in London on ML-based video anonymization, and at Arup as a structural engineering consultant.


Watch on YouTube:
Marios Fournarakis

Download presentation slides:
Marios Fournarakis

Feel free to ask your questions on this thread and keep the conversation going!

Hi Marios,
Thank you very much for an excellent talk; I learnt a lot.
I have been looking into binarized neural networks and I think AIMET and AdaRound could be very useful for this.
I have a quick question that you may be able to help with.

For highly quantized networks, I think the loss of performance due to quantization could be compensated for by using larger networks (e.g. more nodes in each layer). Do you have any thoughts on this, or references on this point?

Thanks again, much appreciated.