Two tinyML Talks on November 10, 2020 by Ehsan Saboori from Deeplite and Alexander Samuelsson from Imagimob

On November 10, 2020 at 8:00 AM and 8:30 AM Pacific Time, we held our next tinyML Talks webcast with two presentations: Ehsan Saboori from Deeplite presented Networks within Networks: Novel CNN design space exploration for resource limited devices, and Alexander Samuelsson from Imagimob presented How to build advanced hand-gestures using radar and tinyML.

Ehsan Saboori (left) and Alexander Samuelsson (right)

The push for ultra-low power and low-latency deep learning models, computing hardware, and systems for inference on edge devices continues to create exciting new opportunities for AI in daily life. However, designing such systems can introduce significant overhead in terms of trial & error, domain expertise and engineering challenges, particularly when trying to preserve model accuracy after extreme model compression. In this talk, we introduce a novel, automated method for compressing convolutional neural networks to preserve maximum accuracy on the target application by utilizing a “network within network” paradigm. In this approach we approximate the individual layers within a neural network using smaller neural networks to find highly accurate, highly compact CNN topologies that satisfy strict constraints on model size and acceptable accuracy. We will discuss some results, the implications of this approach for tinyML applications and future developments in the field.
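To give a feel for why approximating one layer with a small inner network can shrink a model, here is a rough parameter-count comparison in Python. The specific 1x1 → 3x3 → 1x1 bottleneck factorization and the channel counts below are illustrative assumptions for this sketch, not Deeplite's actual method.

```python
def conv_params(kernel, c_in, c_out):
    """Weight count of a 2D convolution (bias ignored for simplicity)."""
    return kernel * kernel * c_in * c_out

# Original layer: one 3x3 convolution, 256 -> 256 channels.
original = conv_params(3, 256, 256)

# Hypothetical replacement: an "inner network" of three thinner layers
# that squeezes through 64 internal channels.
replacement = (
    conv_params(1, 256, 64)    # 1x1 reduce: 256 -> 64
    + conv_params(3, 64, 64)   # 3x3 in the narrow space
    + conv_params(1, 64, 256)  # 1x1 expand: 64 -> 256
)

print(original, replacement, round(original / replacement, 1))
# 589824 weights vs 69632 weights: roughly 8.5x fewer parameters
```

Whether such a replacement preserves accuracy depends on the layer and the task, which is why the talk frames finding these substitutions as an automated search problem rather than a fixed recipe.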

Ehsan Saboori is CTO and co-founder of Deeplite. He obtained his BSc in artificial intelligence and completed his PhD in computer science at Concordia University in 2016. He is a member of the ACM and IEEE and has published several peer-reviewed conference and journal papers. After several years of experience at companies such as Microsoft, SAP and Morgan Stanley, he co-founded Deeplite, where he assessed emerging challenges for deep learning and formed the technology vision that became the core of Deeplite.

Imagimob is a pioneer in Edge AI (=TinyML) with experience from 20+ Edge AI customer projects. Customers include Scania, Husqvarna and Flir Systems. The first commercial product using an Imagimob Edge AI application was launched in 2018. This presentation is a case study demonstrating how we developed gesture-controlled headphones using a radar sensor (Acconeer) and Edge AI applications, a concept that was presented at CES 2020 in Las Vegas. The presentation will demonstrate how Imagimob AI (SaaS) is used in the development of the application.

Alexander Samuelsson is the CTO and co-founder of Imagimob. Alex has extensive experience in software development in areas such as mobile apps, mobile games and cloud systems. He previously studied Computer Science at KTH Royal Institute of Technology.


Watch on YouTube:
Ehsan Saboori
Alexander Samuelsson

Download presentation slides:
Ehsan Saboori
Alexander Samuelsson

Feel free to ask your questions on this thread and keep the conversation going!

On the hand-gestures using radar:

  • What is the maximum number of gestures your model can handle?
  • What is the accuracy rate?
  • Have you compared your model and results with Google Soli?
  • Could you give more detail on how you represent the radar signal? Do you have a paper on that?

Thanks a lot

Here are the questions and answers for Deeplite’s session.

  1. These are great results. Do you know how the compression affects real-world performance?
    We remove a lot of redundancy from the network during compression. In most scenarios, our compressed models actually learn better, because the most essential information is retained in a small, compact model.

  2. What is the disadvantage of replacing a layer with a smaller one?
    Adding more layers could increase latency, but because the layers we add are thinner, this compensates for the overhead. And you can then fit the model on endpoint devices.

  3. How do the compression results and performance of this approach compare to knowledge distillation approaches?
    We also use knowledge distillation, but knowledge distillation alone does not compress the model.

  4. Is it possible to use a dynamic stack method that can grow and shrink proportionally according to throughput/load per node?
    We use a simulated-annealing metaheuristic, which is similar to a grow-and-shrink method in this context. We can define a temperature for each layer and, by increasing and decreasing it, find a near-optimal size that better approximates the global optimum while the overall energy of the model is preserved.

  5. Have you tried this on EfficientNet?
    We are going to. We have tried it on MobileNet and DenseNet.

  6. How did you explore network within network to get those results?
    We used metaheuristic approaches to select the layers and learn the sizes of the alternative networks. After replacing those layers and minimizing the loss, we achieved those results.

  7. Are there trade-offs that come with the compression of the model?

  8. Do you have it published as a paper?
    We are working on that. We will publish the paper on our website once we complete it.

  9. Improving the performance: does it mean that VGG (by default) has not been trained enough?
    No, it means it is over-parameterized. In other words, we don't need that much capacity to learn, e.g., CIFAR-100. So there is more room to optimize VGG compared to, e.g., MobileNet.

  10. Have you done work with Transformer models?
    Not yet, but they are on our roadmap.

  11. Could this technique be applied to audio sequence models?
    We haven't tried that, but if the audio sequence model is a CNN, it should work.

  12. What about noise level, since more nodes lead to deeper layers and more noise?
    We have ongoing R&D on that. However, based on what we have seen, smaller, deeper networks are more robust compared to shallower, larger networks.

  13. Would transformation after compression work?
    It could work, but in our experience, once you compress a network it is very hard to compress it further.

  14. How do these results translate to real speedups (throughput/latency) in real-world cases?
    Usually you cannot fit the original models on edge devices at all, which makes performance on that hardware impractical. By fitting the compressed model in on-chip memory, you can run it much faster. However, as mentioned earlier, you need additional techniques such as quantization to make it perform well on edge hardware.

  15. In the case of multi-layer replacements, would the optimization be layer by layer, or is there a specific optimized strategy? If there is, how can it be applied to different models?
    As part of the exploration process, we learn where and how to insert these layers. It is a learnable approach: once you find the place and the transformation function, you can replace the layers. We may have another session later to explain this in more detail.
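The annealing-based grow-and-shrink search mentioned in answer 4 above can be sketched in pure Python. Everything in this sketch is a toy illustration: the objective function is an invented proxy (not Deeplite's actual accuracy/energy criterion), and the "model" is just a list of layer widths whose size is the sum of pairwise weight matrices.

```python
import math
import random

def model_size(widths):
    # Total parameters of a toy stack of dense layers (bias ignored).
    return sum(w_in * w_out for w_in, w_out in zip(widths, widths[1:]))

def objective(widths, budget):
    # Invented proxy score: reward capacity, penalize exceeding the size budget.
    capacity = sum(math.log(w) for w in widths)
    penalty = max(0, model_size(widths) - budget) / budget
    return capacity - 10.0 * penalty

def anneal(widths, budget, steps=2000, t0=1.0, seed=0):
    rng = random.Random(seed)
    best = cur = list(widths)
    best_score = cur_score = objective(cur, budget)
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-6       # temperature cools toward zero
        cand = list(cur)
        i = rng.randrange(len(cand))
        cand[i] = max(4, cand[i] + rng.choice((-8, 8)))  # grow or shrink one layer
        score = objective(cand, budget)
        # Always accept improvements; accept worse moves with probability exp(dS/t).
        if score > cur_score or rng.random() < math.exp((score - cur_score) / t):
            cur, cur_score = cand, score
            if score > best_score:
                best, best_score = cand, score
    return best, best_score
```

At high temperature the search freely grows and shrinks layers (escaping local optima); as the temperature cools, it settles on a near-optimal width configuration, which mirrors the intuition in the answer about per-layer temperatures guiding the exploration.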