tinyML Talks on September 28, 2021 “A Practical Guide to Neural Network Quantization” by Marios Fournarakis

Olga · August 24, 2021, 7:24pm

We held our next tinyML Talks webcast. Marios Fournarakis from Qualcomm AI Research presented A Practical Guide to Neural Network Quantization on September 28, 2021.

IMPORTANT: Please register here

Neural network quantization is an effective way of reducing the power requirements and latency of neural network inference while maintaining high accuracy. The success of quantization has led to a large volume of literature and competing methods in recent years, and Qualcomm has been at the forefront of this research. This talk aims to cut through the noise and introduce a practical guide for quantizing neural networks inspired by our research and expertise at Qualcomm. We will begin with an introduction to quantization and fixed-point accelerators for neural network inference. We will then consider implementation pipelines for quantizing neural networks with near floating-point accuracy for popular neural networks and benchmarks. Finally, you will leave this talk with a set of diagnostic and debugging tools to address common neural network quantization issues.
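For readers new to the topic, the core idea of quantization mentioned in the abstract can be sketched in a few lines. This is only a minimal illustration of asymmetric uniform quantization in plain Python, not code from the talk, the white paper, or any Qualcomm tool; the function name `quantize_dequantize` and its parameters are assumptions made for this example.

```python
def quantize_dequantize(values, num_bits=8):
    """Simulate asymmetric uniform quantization of a list of floats.

    Hypothetical helper for illustration: maps each float to an integer
    grid with 2**num_bits levels, then maps it back to a float.
    """
    qmin, qmax = 0, 2 ** num_bits - 1

    # Extend the range to include zero so that 0.0 is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)

    # Step size between adjacent grid points; fall back to 1.0 for all-zero input.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    # Integer grid index that represents the real value 0.0.
    zero_point = round(-lo / scale)

    dequantized = []
    for v in values:
        q = round(v / scale) + zero_point   # round to the nearest grid point
        q = max(qmin, min(qmax, q))         # clamp to the integer range
        dequantized.append((q - zero_point) * scale)
    return dequantized, scale, zero_point


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.5, 1.0]
    for bits in (8, 4, 2):
        approx, scale, zp = quantize_dequantize(weights, num_bits=bits)
        print(bits, scale, zp, approx)
```

Running it with fewer bits shows the grid getting coarser, which is the accuracy cost that quantization pipelines try to recover.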

You can find more information about the theory and algorithms we will discuss in this talk in our White Paper on Neural Network Quantization at the following arXiv link: [2106.08295] A White Paper on Neural Network Quantization

Marios Fournarakis is a Deep Learning Researcher at Qualcomm AI Research in Amsterdam, working on power-efficient training and inference of neural networks, focusing on quantization techniques and compute-in-memory. He is also interested in low-power AI applications and equivariant neural networks. He completed his graduate work in Machine Learning at University College London and holds a Master’s in Engineering from the University of Cambridge. Prior to Qualcomm, he worked as a Computer Vision research intern at Niantic Labs in London on ML-based video anonymization, and at Arup as a structural engineering consultant.

=========================

Watch on YouTube:
Marios Fournarakis

Download presentation slides:
Marios Fournarakis

Feel free to ask your questions on this thread and keep the conversation going!

johned · October 1, 2021, 10:37am

Hi Marios,
Thank you very much for an excellent talk, I learnt a lot.
I have been looking into binarized neural networks and I think AIMET and AdaRound could be very useful for this.
I have a quick question that you may be able to help with.

For highly quantized networks, I think the loss of performance due to quantization could be compensated for by using larger networks (e.g. more nodes in each layer). Do you have any thoughts on this, or any references on this point?

Thanks again, much appreciated.
John
