Two tinyML Talks on June 23, 2020: 1) “A weight-averaging approach to speeding up model training on resource-constrained devices” by Unmesh Kurup (LG Electronics); 2) “Analog ML Is Relevant—Because Most Sensor Content Isn’t” by Brandon Rumberg (Aspinity)

We held our ninth tinyML Talks webcast on June 23, 2020, with two presentations: Unmesh Kurup from the LG Electronics America Research Lab presented A weight-averaging approach to speeding up model training on resource-constrained devices at 8:00 AM Pacific Time, and Brandon Rumberg from Aspinity presented Analog ML Is Relevant—Because Most Sensor Content Isn’t at 8:30 AM Pacific Time.


Unmesh Kurup (left) and Brandon Rumberg (right)

Training machine learning models on edge devices has definite advantages for security, privacy, and latency. However, techniques such as Deep Neural Networks (DNNs) are often unsuitable given the resource constraints on such devices. Optimizing DNNs is especially challenging due to the nonconvex nature of their loss functions. While gradient-based methods that use back-propagation have been crucial to neural network adoption, convergence of the loss function is still time-consuming, volatile, and dependent on many finely tuned hyperparameters. One key hyperparameter is the learning rate: a high learning rate produces results faster, but at an increased risk of the model never converging. In this talk, I explain one of the advances from our lab that shows that by manipulating the model weights directly, using their distributions over batch-wise updates, we can achieve significant intermediate improvements in training convergence and make the optimization process more robust, at a negligible cost in additional training time. More importantly, this approach allows deep neural networks to be trained at higher-than-usual learning rates, resulting in fewer epochs, which reduces resource use and lowers total training time.
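The abstract does not specify exactly which statistics of the batch-wise weight distribution are used, so the sketch below is only an illustrative interpretation: a plain SGD loop on a logistic model in which, after each epoch, the weights are replaced by the mean of the last few batch-wise snapshots. The function name and the choice of "mean of the last k snapshots" are assumptions, not the method from the talk.

```python
import numpy as np

def sgd_with_weight_averaging(X, y, lr=0.5, epochs=20, batch=16, k=5):
    """Illustrative sketch (not LG's actual algorithm): SGD on a
    logistic model, periodically collapsing the weights to the mean
    of the last k batch-wise weight snapshots."""
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    history = []                      # weight snapshot after each batch
    for _ in range(epochs):
        idx = rng.permutation(len(X))
        for start in range(0, len(X), batch):
            b = idx[start:start + batch]
            pred = 1.0 / (1.0 + np.exp(-X[b] @ w))      # sigmoid output
            grad = X[b].T @ (pred - y[b]) / len(b)       # logistic-loss gradient
            w = w - lr * grad
            history.append(w.copy())
        # The "weight-averaging" step: average over recent batch-wise
        # updates to damp the volatility of a high learning rate.
        w = np.mean(history[-k:], axis=0)
    return w
```

Averaging over recent snapshots smooths out the oscillation that a high learning rate introduces, which is one plausible way such a scheme could tolerate larger steps and converge in fewer epochs.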

Unmesh Kurup is Senior Manager, AI in the Advanced AI group at LG Electronics. His current research focus is the training and deployment of machine learning models for edge devices. Prior to joining LG, he was a Senior Data Scientist at the Robert Bosch Research Technology Center, where he developed and led multiple data science projects in the areas of Energy, Healthcare, and ADAS. He has been working at the intersection of AI and ML for over 20 years.

Power has always been a challenge for battery-operated always-on devices, and with additional privacy and communication requirements pushing more data processing to the device, power is an even more critical constraint. Aspinity will discuss how analog ML enables an always-on system architecture that mimics the brain’s ability to use a small amount of energy up front to determine which data are important before committing higher-power resources to further analysis. This approach allows designers to partition always-on systems more efficiently: the system determines which sensor data are important while the data are still analog, eliminating the digitization and higher-power analysis of irrelevant data that would simply be thrown away.
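The tiered analyze-first architecture described above can be sketched in software, with the caveat that in Aspinity's approach the first stage runs in analog hardware before any ADC. The function names, the energy-threshold detector, and the threshold value below are all illustrative assumptions, not Aspinity's actual design.

```python
def cheap_detector(frame, threshold=0.1):
    """Stand-in for the tiny always-on analog stage: flag a frame
    whose mean-square energy exceeds a threshold. (Illustrative
    only; the real first stage is analog circuitry, not code.)"""
    energy = sum(x * x for x in frame) / len(frame)
    return energy > threshold

def process_stream(frames, heavy_classifier):
    """Wake the high-power digital stage only for flagged frames;
    everything else is discarded before the expensive path runs."""
    results = []
    for frame in frames:
        if cheap_detector(frame):                    # low-power gate
            results.append(heavy_classifier(frame))  # expensive path
        # unflagged frames are dropped, saving digitization + compute
    return results
```

The power saving comes from the asymmetry: the gate runs continuously at a tiny budget, while the expensive classifier runs only on the small fraction of frames that matter.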

Over the last decade, Brandon Rumberg has focused on the full stack of low-power sensing technologies, spanning integrated circuit design, embedded systems and signal processing, and software development kit creation and system integration. These combined skills provided the foundation for his new architectural approach to solving the power, size and cost issues with always-on higher-bandwidth signal-processing devices. Brandon holds multiple patents, has developed and taught three engineering courses, and has authored 20 publications—one of which earned him a Best Paper Award at the International Symposium on Quality Electronic Design, 2015. He received Ph.D., M.S., and B.S. degrees in Electrical/Computer Engineering from West Virginia University.

==========================

Watch on YouTube:
Unmesh Kurup
Brandon Rumberg

Download presentation slides:
Unmesh Kurup
Brandon Rumberg

Feel free to ask your questions on this thread and keep the conversation going!

Thank you Unmesh and Brandon for the great presentations!
We’ve managed to answer most of the questions live or via the chat during the talk, but I’ll add the missed ones below:

Questions for Unmesh Kurup’s talk

  • What is the size of your training set / dataset used in the plots shown?

  • If an edge device (already resource constrained) is used for training, what % of CPU and Memory resources are left for doing the “useful” work the edge device is meant to do?
    Moderator Answer: Typically this happens when the edge device runs at a low duty cycle, i.e., it sits idle for a reasonable fraction of the time. For example, a smart fridge is not doing much computation most of the time, and smart assistants are mostly idle as well, with only their front-end wake-word detector running.

Questions for Brandon Rumberg’s talk

  • Which dimensionality reduction techniques are you using in that first stage? PCA? Low Variance Filter?
  • By what percentage does this analog signal pre-processing step extend the battery life of these small devices?
  • Please post a link to the notebook you used for the demo

Thank you
Ravi Sivalingam