Two tinyML Talks on August 18, 2020 by Mark Stubbs from Shoreline IoT Inc. and Urmish Thakker from Arm ML Research

We held our tinyML Talks webcast with two presentations on August 18, 2020 at 8:00 AM and 8:30 AM Pacific Time: Mark Stubbs from Shoreline IoT presented "Practical application of tinyML in battery powered anomaly sensors for predictive maintenance of industrial assets," and Urmish Thakker from Arm presented "Pushing the limits of RNN Compression using Kronecker Products."


Mark Stubbs (left) and Urmish Thakker (right)

Detecting anomalies in industrial equipment provides significant savings by preventing unplanned downtime and costly repairs due to unnoticed trends toward complete failure. Problems are detected and corrected early to attain the maximum useful life from an asset. The combination of tinyML, low-power wireless, integrated sensors, and an IoT cloud enables a low-cost, easy-to-install system to monitor industrial assets distributed throughout a factory. In this talk, we will present how tinyML is utilized for anomaly detection, along with other sensor techniques, to create a long-battery-life, optimized solution for condition-based maintenance in industry. We will also show a live demonstration of a tinyML-based end-to-end system solution.

Mark Stubbs is CTO and Co-Founder at Shoreline IoT, leading a team to revolutionize data gathering and predictive maintenance in industrial applications. He formerly worked in IoT at Google and Echelon, building networked systems to gather sensor information and make it actionable. He also worked as a Systems Engineer at Apple.

This talk gives an overview of our work exploring Kronecker products (KP) to compress sequence-based neural networks. The talk is divided into two parts. In the first part, we show that KP can compress IoT RNN applications by factors of 15-38x, achieving better results than traditional compression methods. This part includes a quick tutorial on KP and the best methodology for using KP to compress IoT workloads. However, when KP is applied to large natural language processing (NLP) tasks, it leads to significant accuracy loss (approximately 26%). The second part of the talk addresses this issue. We show a way to recover the accuracy otherwise lost when applying KP compression to large NLP tasks, using a novel technique that we call doping. Doping is a process of adding an extremely sparse overlay matrix on top of the pre-defined KP structure. We call the resultant compression method doped Kronecker product (DKP). We present experimental results demonstrating compression of a large language model with 25 MB LSTM layers by 25x with a 1.4% loss in perplexity score using DKP, and show that it outperforms other traditional compression techniques.
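For readers who want to see the mechanics, here is a minimal sketch of a KP-compressed weight matrix and its doped variant. This is an illustrative NumPy/SciPy example with made-up sizes and a random sparse overlay, not code from the talk or from Arm:

```python
# Minimal sketch of KP and doped-KP (DKP) compression.
# All sizes are illustrative, not from the talk.
import numpy as np
from scipy import sparse

m, n, p, q = 16, 16, 16, 16            # hypothetical factor shapes
A = np.random.randn(m, n)              # W is approximated as A ⊗ B
B = np.random.randn(p, q)

dense_params = (m * p) * (n * q)       # 65,536 weights for a 256x256 layer
kp_params = m * n + p * q              # 512 weights: 128x fewer parameters

def kp_matvec(A, B, x):
    """Compute (A ⊗ B) @ x without forming the full Kronecker product,
    via the identity (A ⊗ B) vec(X) = vec(B X A^T) (column-major vec)."""
    n, q = A.shape[1], B.shape[1]
    X = x.reshape(n, q).T              # unstack x into a q x n matrix
    return (B @ X @ A.T).T.reshape(-1) # restack the p x m result

# Doping: add an extremely sparse overlay S so that W ≈ A ⊗ B + S.
S = sparse.random(m * p, n * q, density=0.01, format="csr")

def dkp_matvec(A, B, S, x):
    return kp_matvec(A, B, x) + S @ x

# Sanity check against the explicitly materialized matrix.
x = np.random.randn(n * q)
W = np.kron(A, B) + S.toarray()
assert np.allclose(W @ x, dkp_matvec(A, B, S, x))
```

The key point is that the matrix-vector product only ever touches the two small factors, plus a cheap sparse term in the doped case, which is where the parameter savings come from.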

Urmish Thakker is a Senior Research Engineer working at Arm's ML Research Lab. His research focuses on the efficient execution of neural networks on Arm devices. Specifically, he works on model quantization, pruning, structured matrices, and low-rank decomposition. His work at Arm has led to multiple patents and publications. Prior to Arm, he worked with AMD, Texas Instruments, and Broadcom as a performance modeling, design, and verification engineer, contributing to the development of multiple products. Urmish graduated with a Master's degree in Computer Science from the University of Wisconsin-Madison in the USA and a Bachelor's degree in Electrical Engineering from the Birla Institute of Technology and Science in India.

==========================

Watch on YouTube:
Mark Stubbs
Urmish Thakker

Download presentation slides:
Mark Stubbs
Urmish Thakker

Feel free to ask your questions on this thread and keep the conversation going!

Hi Mark,

Thanks for a great presentation. I have a question about the training data used.

How many training samples do you need, and how often do you sample? For vibration data sampling, for example, the machine could be active or idle. How do you know when to sample?

Appreciate your response. Thanks!

Mark, here are some questions from the audience. Could you please help address them?

  • Does your model apply to something as generic as a "washing machine," in which finding a normal load balance and "normal" vibration could be difficult, since it depends on the clothing load in the machine and operator error in loading? Is there a concept of variability in operational normal?

  • What is your MCU?
    Arm Cortex-M4

  • What is the low power wireless communication standard used for this work?
    We support 4G/5G, NB-IoT, and LoRa.

  • What ML approach are you using to detect anomalies? Time series? Unsupervised learning?
    We used a combination of time-series analysis and unsupervised learning. (A generic sketch of this pattern appears after this question list.)

  • Are these available now? (from The Maintenance Geek)

  • How often is training data sent to the backend?

  • Have you used any physics-based knowledge, i.e., physics-based modeling, to capture the physics of the monitored machine when the models are trained?

  • Is the data collection for training done on battery? If so, how much battery is used in the initial training phase?

  • What are you doing that is unique to get 5 year battery life?

  • What if machines have weekly, monthly, or yearly cyclic load patterns? Will the sensor learn correctly?
    Can/could the sensor train/update its model over time to accommodate machine aging?

  • What is the sensor used for this demo?

  • What board are you using for that?

  • How susceptible is the anomaly detector to external vibration coming from neighboring machines or intermittent load changes?
  • How do you classify anomalies so reliability teams know what problem is present?
  • I understand how this works for detecting bearing/vibration problems; what other applications can this technique be used for?

  • Does the model detect anomaly signals that it might not have seen in the past?

  • Is only acceleration data used?

  • What are the steps for installing such a unit? Does it require professional help?

  • Do you detect anomalies when they happen or before they happen? If before, how much time in advance?

  • Can I run this device on mains power? How powerful is the MCU for real-time number crunching?

  • Can we download all the raw data captured by the device and process it ourselves instead of using tinyML?

Thanks!
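To make the "combination of time-series analysis and unsupervised learning" answer above concrete, here is one generic sketch of that pattern: band-energy features computed over windows of vibration data feed an unsupervised outlier detector trained only on known-normal operation. The sample rate, window size, feature bands, and the scikit-learn IsolationForest are all illustrative assumptions, not Shoreline IoT's actual pipeline:

```python
# Hypothetical sketch of unsupervised anomaly detection on vibration
# time series; NOT Shoreline IoT's actual pipeline.
import numpy as np
from sklearn.ensemble import IsolationForest

FS = 1000          # sample rate in Hz (assumed)
WINDOW = 1024      # samples per analysis window (assumed)

def band_energies(window, n_bands=8):
    """Split the FFT magnitude spectrum into coarse bands and return
    the energy in each band as the feature vector for this window."""
    spectrum = np.abs(np.fft.rfft(window * np.hanning(len(window))))
    bands = np.array_split(spectrum, n_bands)
    return np.array([np.sum(b**2) for b in bands])

def featurize(signal):
    windows = [signal[i:i + WINDOW]
               for i in range(0, len(signal) - WINDOW + 1, WINDOW)]
    return np.stack([band_energies(w) for w in windows])

# Train on vibration recorded during known-normal operation only.
normal_signal = np.random.randn(FS * 60)          # stand-in for real data
detector = IsolationForest(contamination=0.01).fit(featurize(normal_signal))

# At inference, flag windows the model scores as outliers (-1 = anomaly).
new_signal = np.random.randn(FS * 10)
flags = detector.predict(featurize(new_signal))
print(f"{np.sum(flags == -1)} anomalous windows out of {len(flags)}")
```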

Hi Urmish,

Here are the questions from your session:

  • Does this compression method use more energy on the device than running without compression for a similar application?

  • Is rank a good measure of expressiveness?

  • Is this compression implementation available in TensorFlow Lite for Microcontrollers?

  • Can we fine-tune the compressed models?

  • The latency reduction was low relative to other compression techniques. Do you foresee KP latency improving due to, e.g., HW acceleration of KP?
    What is the reason that Kronecker (with no modification) works for IoT but not for larger models?

  • As an undergraduate student, AI on the edge fascinates me. Can you suggest some resources and forums I can look into as a beginner? I find it difficult to convert models into a C/C++ binary file.

  • Most of the LSTM computations are matrix multiplications; how can the Kronecker product be applied to matrix multiplication in an LSTM? Will it bring more computation?
    Is there an open tool that I can use to compress my own model using Kronecker products?

  • How do you enforce sparsity during training?

  • Can you control the sparsity of the sparse matrix? How can you make sure it is actually sparse?

  • In doped KP, do the FLOPs increase because we introduce matrix addition? Does it affect inference time?

  • Would the introduction of the sparse matrix make the previous fast inference via the linear algebra trick disappear?

  • Is training a doped Kronecker product easy? Why do you start from a dense matrix?

  • Were the presented results computed using CPUs or GPUs?

Thanks!