tinyML Talks on March 8, 2022 “On-device model fine-tuning for industrial anomaly detection applications” by Konstantin Meshcheriakov

We held our next tinyML Talks webcast. Konstantin Meshcheriakov from Klika Tech presented “On-device model fine-tuning for industrial anomaly detection applications” on March 8, 2022.


Lifelong machine learning is an advanced paradigm for improving the performance of anomaly detection in the changing conditions of industrial environments. With on-device fine-tuning, pre-trained neural networks can adapt to new data. Efficient on-device learning can be done with a small memory footprint, allowing models to run inference and continuously fine-tune on newly collected data.

Join this talk to learn more about improving the flexibility of ML models and avoiding issues connected with continuous training, such as catastrophic forgetting. This presentation will document the process of moving an AWS cloud-based anomaly detection application to an MCU while at the same time decreasing infrastructure costs and simplifying operations.

Konstantin is a solution architect at Klika Tech with strong experience in building embedded and machine learning solutions. Working closely with the clients, he is responsible for architecture creation and initiation of new IoT and ML-related projects. He also leads the machine learning competency and internal courses in the company.


Watch on YouTube:
Konstantin Meshcheriakov

Download presentation slides:
Konstantin Meshcheriakov

Feel free to ask your questions on this thread and keep the conversation going!

  1. Is this anomaly detection algorithm supervised or unsupervised?

It is an unsupervised one.

  1. Do you make use of any anomaly data to improve performance of the model?

As the cloud solution requires training the model from scratch each time the solution is deployed to a new cloud account (there is no API to load the trained model), the user of the solution needs to let the development kit work for some time in normal mode before using the anomaly detection capabilities. If a small amount of anomalous data is included during this time, the model will still provide robust anomaly detection, but the performance is unlikely to be improved.

  1. How to validate the model?

Anomaly detection is notoriously difficult to validate. As it is an unsupervised learning algorithm, there are usually no labels that would allow using the usual methods such as ROC or PR curves. The model itself can be validated on a similar labeled dataset, for example, but in industrial settings, the data and the anomalies can be quite different.

The labels could also be generated automatically if we have a set of “normal” data by modifying the data in a way resembling what the anticipated anomalies would look like. It would require strong domain knowledge, though.
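
As a rough illustration of this idea, the sketch below injects artificial spikes into a set of normal readings, labels the modified rows, and scores a detector with ROC AUC. The spike shape, the helper names `inject_spikes` and `roc_auc`, and the z-score stand-in detector are all assumptions made for the example, not part of the presented solution. Real anomalies would need domain knowledge to imitate.

```python
import numpy as np

def inject_spikes(normal, fraction=0.1, magnitude=5.0, seed=0):
    """Copy `normal` (n_samples, n_features) and add a spike to one random
    feature in a random subset of rows. Returns (data, labels), label 1 = modified."""
    rng = np.random.default_rng(seed)
    data = normal.copy()
    n = len(data)
    n_anom = max(1, int(fraction * n))
    rows = rng.choice(n, size=n_anom, replace=False)
    cols = rng.integers(0, data.shape[1], size=n_anom)
    data[rows, cols] += magnitude * normal.std(axis=0)[cols]
    labels = np.zeros(n, dtype=int)
    labels[rows] = 1
    return data, labels

def roc_auc(scores, labels):
    """ROC AUC via the rank-sum formula (ties are not treated specially)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Synthetic "normal" data standing in for real sensor readings.
rng = np.random.default_rng(1)
normal = rng.normal(size=(500, 3))
data, labels = inject_spikes(normal)

# Stand-in detector: largest absolute z-score across features.
scores = np.abs((data - normal.mean(axis=0)) / normal.std(axis=0)).max(axis=1)
auc = roc_auc(scores, labels)
print(f"ROC AUC on synthetic anomalies: {auc:.3f}")
```

A high AUC here only says the detector catches the anomalies we invented, which is exactly the limitation of this approach.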

Another way would be to use methods designed to work with unsupervised learning, for example, “How to Evaluate the Quality of Unsupervised Anomaly Detection Algorithms?” (arXiv:1607.01152).

  1. What’s your monthly Amazon bill?

The cost of the deployed cloud-based solution would be about $50 a month for an infrastructure that can support up to five evaluation kits.

  1. Would you consider using AWS Greengrass versus AWS IoT Core? Is there a pro/con for using IoT Core versus Greengrass?

Greengrass would possibly save money by processing the data locally without using IoT Core and Amazon Kinesis, thus removing a lot of cloud components. It will, however, require a local edge gateway, or PC, and so it does not play really well with our development kit vision.

We also have a project for anomaly detection using AWS IoT Greengrass. If you are interested, you can contact me for the details.

  1. What are the time requirements/constraints for the response?

For the evaluation kit, we didn’t have any time requirements. You can, however, expect latency to be in the range of 1-2 seconds.

  1. How does the random cut forest deal with shifts in behavior of the population and its individuals over time (that may not represent a failure risk, but maybe wearing in for a population that had started new)?

A great question! RRCF is designed so that when a data point is passed through it, there is a probability that it will be added to some of the trees. The interesting thing is that this probability is chosen in such a way that the forest behaves exactly as if the data point had been present in the initial dataset and had been sampled from it.
AWS provides an excellent presentation of how RRCF works in detail in its self-paced digital training on AWS Skill Builder. It requires registration, though.
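
The property described above is what classic reservoir sampling provides, so the minimal sketch below assumes that this is the mechanism meant. It is not the AWS implementation: `reservoir_update` and its parameters are hypothetical names, and a real forest would also insert the accepted point into a tree and evict the replaced one. The simulation checks that the oldest and the newest point of a stream end up in the sample equally often.

```python
import random

def reservoir_update(reservoir, capacity, n_seen, point, rng):
    """Offer `point`, the n_seen-th item of the stream (1-based), to a
    fixed-size reservoir. Returns True if the point was stored."""
    if len(reservoir) < capacity:
        reservoir.append(point)
        return True
    # Accept with probability capacity / n_seen, replacing a random slot.
    j = rng.randrange(n_seen)
    if j < capacity:
        reservoir[j] = point
        return True
    return False

rng = random.Random(0)
capacity, stream_len, trials = 4, 20, 20000
first_hits = last_hits = 0
for _ in range(trials):
    reservoir = []
    for i in range(1, stream_len + 1):
        reservoir_update(reservoir, capacity, i, i, rng)
    first_hits += 1 in reservoir
    last_hits += stream_len in reservoir

# Both frequencies should be close to capacity / stream_len.
first_freq = first_hits / trials
last_freq = last_hits / trials
print(first_freq, last_freq, capacity / stream_len)
```

Because old points are gradually replaced, the sample slowly follows a drifting population, which is relevant to the wearing-in scenario in the question.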

  1. For example, historically, used BMW road car engines made the most reliable F1 racing engines, since they were worn in already… I anticipate that the sensed patterns would look different on these.

Exactly, and this is why model fine-tuning is such an important thing. And also why anomaly detection is difficult in real life. The patterns of a worn-out engine would differ from those of a new one, but not enough to indicate a nearing failure.

  1. Perhaps to generalize… how to bootstrap a reliability bathtub curve, given a new population of equipment?

Quite a difficult thing to do, especially considering that, as indicated by a bathtub curve, the equipment tends to fail either at the beginning of its operation or after some significant time. I would select some equipment, let it run for some time, and then collect the data for the model. It would also be great to collect the data for the failed equipment, along with the reason for failure, so that more sophisticated predictive maintenance models could be created.
Collecting such data is one of the purposes of the evaluation kit.

  1. Does the NN approach perform better than the Robust Random Cut Forest approach?

Yes, the anomalies that we see are more clearly distinguished and the inference time is faster.

  1. Is the demo using RNN?

No, the demo uses an autoencoder with fully connected layers. From our experience, recurrent neural networks are more difficult to train, especially if we are to do that in a streaming manner. Convolutional autoencoders, on the other hand, can provide many of the benefits of RNNs, while being much easier to train and fine-tune.

  1. Is the autoencoder trained only on data without anomalies?

The data for the autoencoder training was collected from the working unit, so all the data is generally “normal”.

  1. Is it a battery-powered system or is it mains-powered?

The equipment we were targeting is usually mains-powered, so a reliable power source is usually available for the device.

  1. Is the input of the autoencoder a KPI of the signal (mean, standard deviation, etc.) or the signal itself?

For the high-frequency data, the signal is aggregated, and for lower-frequency data, such as a pressure sensor, it is used in the raw form.
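
To make the aggregation step concrete, here is a small sketch that turns a high-frequency signal into per-window summary features. The choice of statistics (mean, standard deviation, RMS, peak-to-peak), the window length, and the name `window_features` are assumptions for illustration; the answer above does not state which aggregates the demo actually uses.

```python
import numpy as np

def window_features(signal, window):
    """Split a 1-D signal into non-overlapping windows and return one row
    per window: [mean, std, rms, peak-to-peak]. Trailing samples that do
    not fill a whole window are dropped."""
    signal = np.asarray(signal, dtype=np.float32)
    n_windows = len(signal) // window
    frames = signal[: n_windows * window].reshape(n_windows, window)
    return np.stack(
        [
            frames.mean(axis=1),
            frames.std(axis=1),
            np.sqrt((frames ** 2).mean(axis=1)),
            frames.max(axis=1) - frames.min(axis=1),
        ],
        axis=1,
    )

# Example: one second of a 50 Hz vibration sampled at 1 kHz (synthetic data).
t = np.arange(1000) / 1000.0
vibration = np.sin(2 * np.pi * 50 * t)
features = window_features(vibration, window=100)
print(features.shape)  # one feature row per 100-sample window
```

A low-frequency reading such as pressure would skip this step and go to the model as is.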

  1. But if you can have anomalies, then you can create synthetic data and validate the model, right? But you said it's very difficult to do the validation.

Yes, but there is no guarantee that the synthetic data would accurately represent the actual anomalies that could be encountered during equipment operation. So this method may not give an accurate estimate of the model’s quality.

  1. Is EdgeTNN microcontroller-dependent?

No, it can work wherever the TFLite interpreter works.

  1. Do you fine-tune the model with just the new normal kind of data?

In the demo, we fine-tune the model with the data that the model previously considered anomalous. We assume that fine-tuning is needed only when we want to introduce a new normal data distribution.

  1. Is this deployment possible with Mbed OS?

Absolutely, the library is RTOS-independent.

  1. Does the C++ library you just mentioned support custom layers designed using TensorFlow and TensorFlow Lite models?

If we are talking about our EdgeTNN library, then it supports only fully connected layers for backprop right now. If it is about the underlying library, then it is just a general linear algebra library called Eigen.

  1. Could you repeat the name of the C++ library that was used for backpropagation? Was it open source?
  2. Did you mention the name of the C++ library?

It is called Eigen. It is a general linear algebra library, and it uses the permissive MPL2 license.

  1. Did you have any issues with limited RAM on the MCU?

Not really, backprop only added a few additional kilobytes of RAM on top of what the basic model needed.

  1. How is backpropagation implemented, in terms of memory footprint, data transfers, data types, etc.? 2) What about switching from one device to another? How is your scheme optimized accordingly? What about fine-tuning of quantized models …

Basically, we implement mini-batch gradient descent. The footprint for this specific model is on the order of several kilobytes, the data is currently stored as float16, and we don’t optimize for the controllers’ cache as it is highly hardware-dependent.
It also means that switching to another type of device is easy, but the performance can be sub-optimal. Supporting hardware-optimized operations and quantization is on the roadmap.
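
For readers who want to see what mini-batch gradient descent on a fully connected autoencoder looks like, here is a minimal NumPy sketch. It is not the EdgeTNN code: the class `TinyAutoencoder`, the layer sizes, the tanh activation, the learning rate, and the synthetic data are all assumptions, and it uses float32 for simplicity rather than the float16 storage mentioned above.

```python
import numpy as np

class TinyAutoencoder:
    """Hypothetical one-hidden-layer dense autoencoder (not the EdgeTNN API)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden)).astype(np.float32)
        self.b1 = np.zeros(n_hidden, dtype=np.float32)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_in)).astype(np.float32)
        self.b2 = np.zeros(n_in, dtype=np.float32)

    def forward(self, x):
        h = np.tanh(x @ self.W1 + self.b1)
        return h, h @ self.W2 + self.b2

    def score(self, x):
        """Per-sample mean squared reconstruction error (the anomaly score)."""
        _, y = self.forward(x)
        return np.mean((y - x) ** 2, axis=1)

    def train_batch(self, x, lr=0.05):
        """One mini-batch gradient descent step on the squared error."""
        h, y = self.forward(x)
        # Gradient of the per-sample squared error, averaged over the batch.
        dy = 2.0 * (y - x) / x.shape[0]
        dW2, db2 = h.T @ dy, dy.sum(axis=0)
        dz = (dy @ self.W2.T) * (1.0 - h ** 2)  # backprop through tanh
        dW1, db1 = x.T @ dz, dz.sum(axis=0)
        self.W1 -= lr * dW1
        self.b1 -= lr * db1
        self.W2 -= lr * dW2
        self.b2 -= lr * db2

# Synthetic "normal" data: 4 correlated channels driven by 2 hidden factors.
rng = np.random.default_rng(1)
mix = 0.5 * np.array([[1.0, 0.5, -0.5, 0.0], [0.0, 0.5, 0.5, 1.0]])
data = (rng.normal(size=(512, 2)) @ mix).astype(np.float32)

model = TinyAutoencoder(n_in=4, n_hidden=3)
initial_loss = float(model.score(data).mean())
for epoch in range(50):
    order = rng.permutation(len(data))
    for start in range(0, len(data), 16):
        model.train_batch(data[order[start:start + 16]])
final_loss = float(model.score(data).mean())
print(initial_loss, final_loss)
```

Only the activations of one mini-batch and the gradients of each layer need to be held at a time, which is consistent with backprop adding just a few kilobytes for a small dense model.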