tinyML Talks on November 16, 2021 “SuperSlash: Unifying Design Space Exploration and Model Compression methodology for design of deep learning accelerators for TinyML” by Rehan Hafiz

Olga · October 10, 2021, 6:41am

We held our next tinyML Talks webcast. Rehan Hafiz from Information Technology University presented “SuperSlash: Unifying Design Space Exploration and Model Compression methodology for design of deep learning accelerators for TinyML” on November 16, 2021.


Deploying Deep Learning (DL) models on resource-constrained embedded devices is challenging. The limited on-chip memory of such devices increases the volume of off-chip memory accesses, which limits the size of DL models that can be efficiently realized on them. Sophisticated Design Space Exploration (DSE) schemes have previously been developed to reduce the off-chip memory access volume. However, because the model size is fixed, DSE alone cannot reduce off-chip memory accesses beyond a certain point. Model compression via pruning can reduce the size of the model and the associated off-chip memory accesses. However, we found that even pruned models with the same accuracy and model size may require different numbers of off-chip memory accesses, depending on the pruning strategy adopted. Furthermore, classical pruning schemes are not guided by the goals of DSE.

In this talk, we discuss SuperSlash, a unified solution for DSE and model compression. SuperSlash estimates the off-chip memory access volume of each layer of a deep learning model by exploring multiple design candidates. In particular, it evaluates multiple data reuse strategies for each layer, along with the possibility of layer fusion. Layer fusion reduces the off-chip memory access volume by keeping a layer's output on chip, instead of storing it off-chip as an intermediate result, and using it directly to process the subsequent layer. SuperSlash then guides the pruning process via a ranking function that ranks each layer according to its explored off-chip memory access cost. The talk thus presents a technique that jointly performs pruning and DSE so that large DNN models fit on accelerators with limited computational resources.
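To make the interplay of data reuse strategies, layer fusion, and cost-based layer ranking more concrete, here is a minimal Python sketch under a deliberately simplified cost model. It is not SuperSlash's actual implementation. The two reuse strategies, the rule that a layer boundary is fused whenever the intermediate output fits in the on-chip buffer, the "most traffic first" ranking, and all names (`Layer`, `reuse_costs`, `explore`, `rank_for_pruning`, `prune_top_layer`) are hypothetical. Traffic is counted in elements, and the model ignores how the buffer is partitioned among inputs, weights, and outputs.

```python
import math
from dataclasses import dataclass, replace


@dataclass
class Layer:
    name: str
    in_elems: int   # input activation size (elements)
    w_elems: int    # weight size (elements)
    out_elems: int  # output activation size (elements)


def reuse_costs(layer, buf, input_on_chip, output_on_chip):
    """Off-chip traffic (elements) under two hypothetical data reuse strategies."""
    i = 0 if input_on_chip else layer.in_elems
    o = 0 if output_on_chip else layer.out_elems
    # Weight reuse: weights are fetched once in buffer-sized chunks and
    # the input is re-streamed once per weight chunk.
    weight_reuse = layer.w_elems + i * math.ceil(layer.w_elems / buf) + o
    # Input reuse: the input is fetched once in buffer-sized tiles and
    # the weights are re-streamed once per input tile.
    input_reuse = i + layer.w_elems * math.ceil(layer.in_elems / buf) + o
    return {"weight_reuse": weight_reuse, "input_reuse": input_reuse}


def explore(layers, buf):
    """Choose fusion boundaries and the cheapest reuse strategy per layer."""
    # Fuse layer k with layer k+1 when layer k's output fits in the buffer,
    # so it is neither written off-chip nor read back by layer k+1.
    fused = [layers[k].out_elems <= buf for k in range(len(layers) - 1)]
    plan = {}
    for k, layer in enumerate(layers):
        costs = reuse_costs(
            layer,
            buf,
            input_on_chip=k > 0 and fused[k - 1],
            output_on_chip=k < len(layers) - 1 and fused[k],
        )
        best = min(costs, key=costs.get)  # ties go to the first strategy
        plan[layer.name] = (best, costs[best])
    return plan, fused


def rank_for_pruning(plan):
    """Hypothetical ranking: layers with the most off-chip traffic come first."""
    return sorted(plan, key=lambda name: plan[name][1], reverse=True)


def prune_top_layer(layers, plan, ratio):
    """Hypothetical pruning step: remove `ratio` of the top-ranked layer's weights."""
    target = rank_for_pruning(plan)[0]
    return [
        replace(l, w_elems=int(l.w_elems * (1 - ratio))) if l.name == target else l
        for l in layers
    ]


# Toy three-layer network; sizes are illustrative, not taken from the talk.
layers = [
    Layer("conv1", in_elems=3 * 32 * 32, w_elems=16 * 3 * 3 * 3, out_elems=16 * 32 * 32),
    Layer("conv2", in_elems=16 * 32 * 32, w_elems=32 * 16 * 3 * 3, out_elems=32 * 16 * 16),
    Layer("fc", in_elems=32 * 16 * 16, w_elems=32 * 16 * 16 * 10, out_elems=10),
]
plan, fused = explore(layers, buf=8192)
print(fused, plan, rank_for_pruning(plan))

pruned = prune_top_layer(layers, plan, ratio=0.5)
pruned_plan, _ = explore(pruned, buf=8192)
print(pruned_plan)
```

With an 8192-element buffer, only the `conv2`→`fc` boundary is fused in this toy model, and `conv2` is the one layer where the two reuse strategies differ (weight reuse is cheaper). The sketch re-runs `explore` after pruning so the ranking reflects the new layer sizes; accuracy, which any real pruning flow must track, is not modeled.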

Rehan Hafiz received his Ph.D. degree in Electrical Engineering from the University of Manchester, United Kingdom, in 2008. He is currently a Professor in the Faculty of Engineering at Information Technology University (ITU), Lahore. He founded and directed the Vision Processing Lab (VISpro), which focuses on areas such as vision system design, approximate computing, design of application-specific hardware accelerators, deep learning, FPGA-based design, and applied image and video processing. Apart from several publications in these areas, he holds multiple patents with the US, South Korean, and Pakistani patent offices.

=========================

Watch on YouTube:
Rehan Hafiz

Download presentation slides:
Rehan Hafiz

Feel free to ask your questions on this thread and keep the conversation going!
