We had a lot of questions and couldn’t get to them all! I’ll take a shot at answering the remaining ones below. Note that these are my answers, not Pete’s, though Pete is welcome to chime in with additions and corrections.
What are the current research challenges in TinyML?
There are many! Here are a few of the big topics:
- Model compression: how do you get a model that has the same accuracy but is smaller or takes less time to compute? This includes quantization, sparsity, and binarization.
- Runtime architectures. TensorFlow Lite uses an interpreter, but some approaches use code generation instead.
- Hardware architectures. Can we design custom silicon that is especially suited to running tiny models with low power?
- Training techniques for tiny models. Reducing model size has only recently become a goal; are there approaches to training that give better results than those used with larger models? This includes distillation and pruning.
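To make the quantization item a bit more concrete: a common scheme maps 32-bit floats to 8-bit integers with an affine transform, real ≈ scale × (q − zero_point), which is the form the TensorFlow Lite int8 quantization spec uses. Storing weights as int8 rather than float32 cuts their size by roughly 4×. Here’s a minimal pure-Python sketch, assuming per-tensor, asymmetric int8 quantization with the range taken from the data’s min and max. The function names are made up for illustration, and this isn’t TensorFlow Lite’s actual code.

```python
def choose_params(values, qmin=-128, qmax=127):
    """Pick a per-tensor scale and zero point mapping [min, max] onto [qmin, qmax]."""
    lo = min(min(values), 0.0)  # include 0.0 so zero is exactly representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = round(qmin - lo / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=-128, qmax=127):
    """float -> int8: q = round(x / scale) + zero_point, clamped to [qmin, qmax]."""
    return [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in values]


def dequantize(qvalues, scale, zero_point):
    """int8 -> float: x is approximately scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in qvalues]


# Hypothetical example weights, not taken from any real model.
weights = [-1.0, -0.5, 0.0, 0.25, 1.5]
scale, zero_point = choose_params(weights)
q = quantize(weights, scale, zero_point)
restored = dequantize(q, scale, zero_point)
```

Dequantizing gives back each value to within about half a quantization step (scale / 2), and 0.0 maps back to exactly 0.0, which matters for things like zero padding.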
I have been using TinyML for a while. Any guidelines on how much memory should be allocated for the tensor arena in the inference engine?
It’s currently best tackled by trial and error: write some unit tests that run your model, start with a big tensor arena size, and keep making it smaller until your code no longer runs. The size required for a given model remains static between inferences.
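Here’s a minimal sketch of that search loop as a host-side script. It assumes a hypothetical `runs_with_arena(n)` callback that builds the interpreter with an n-byte arena, runs one inference, and returns True on success. Rather than shrinking by a fixed step, it bisects, which assumes that if the model runs with n bytes it will also run with any larger arena. The 16-byte alignment is also an assumption; use whatever your target requires.

```python
from typing import Callable


def find_min_arena(runs_with_arena: Callable[[int], bool],
                   start_bytes: int, align: int = 16) -> int:
    """Return the smallest multiple of `align` bytes that the model runs with.

    `runs_with_arena` is a hypothetical callback: it should build the
    interpreter with an arena of the given size, run one inference, and
    return True on success. Assumes success is monotonic in arena size.
    """
    hi = -(-start_bytes // align)  # start size in units of `align`, rounded up
    if not runs_with_arena(hi * align):
        raise ValueError("start_bytes is too small; begin with a bigger arena")
    lo = 0  # invariant: lo units fails (0 bytes never works), hi units succeeds
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if runs_with_arena(mid * align):
            hi = mid  # still runs: the minimum is at or below mid
        else:
            lo = mid  # fails: the minimum is above mid
    return hi * align
```

For example, with a stand-in predicate that succeeds from 23,000 bytes upward, `find_min_arena(lambda n: n >= 23_000, 64 * 1024)` returns 23008, the next multiple of 16. Recent versions of TensorFlow Lite for Microcontrollers also provide `MicroInterpreter::arena_used_bytes()`, which reports the arena actually used after `AllocateTensors()`; check whether your version has it before reaching for the search.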
If we’re doing research, which is the best device for benchmarking?
We’ve found it useful to focus on devices with an Arm Cortex-M core, since these represent the typical range of specs for embedded devices. The M4 is probably the sweet spot right now in terms of balance between capability and power use.
Is there a danger of over-optimizing for tininess? For example, we spent years developing computationally efficient ML techniques, but then GPUs came along and now you can do deep learning on desktops. Could the same thing happen in the next five years, with large compute in tiny form factors, or do we run up against the laws of physics with respect to power requirements?
There is some particularly impressive hardware in development that will make it easier to run larger models on-device, or run the same models using less power, for example Arm’s Ethos-U55 or Eta Compute’s ECM3531. Devices are likely to become more capable, but the same is true of higher-power accelerators, so the gap between tiny and large devices will persist and TinyML will continue to exist as a concept.
What is the inference rate for the camera on the SparkFun Edge?
I believe it’s a few seconds per inference using the latest Arm CMSIS-NN optimizations. I haven’t tried this yet myself, though!
Is there an open source driver for the camera on the SparkFun Edge?
Yes, the camera module is the HM01B0 and you can find the driver here:
Are the K210 / Sipeed boards supported?
I haven’t used the K210, but I think it comes with a compiler that is able to generate C code from TensorFlow Lite models. This is a different approach from the one used by TensorFlow Lite for Microcontrollers, which is a runtime that interprets and executes models directly. So you can use some of the TensorFlow Lite tooling, but not all of it.
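To illustrate the difference between the two approaches, here’s a toy Python sketch. The “model” format, op names, and functions are invented purely for illustration and have nothing to do with the real TensorFlow Lite flatbuffer schema or the K210 toolchain; real code generators emit C that is compiled ahead of time, which `exec` only stands in for here.

```python
# A toy "model": a list of (op_name, parameter) pairs, invented for illustration.
MODEL = [("scale", 2.0), ("offset", 1.0), ("relu", None)]

# Kernel table used by the interpreter to dispatch ops at run time.
OPS = {
    "scale": lambda x, p: x * p,
    "offset": lambda x, p: x + p,
    "relu": lambda x, p: max(x, 0.0),
}


def interpret(model, x):
    """Interpreter style: walk the op list at run time and dispatch each op."""
    for op, param in model:
        x = OPS[op](x, param)
    return x


def generate_source(model):
    """Code-generation style: emit straight-line source ahead of time."""
    lines = ["def generated(x):"]
    for op, param in model:
        if op == "scale":
            lines.append(f"    x = x * {param!r}")
        elif op == "offset":
            lines.append(f"    x = x + {param!r}")
        elif op == "relu":
            lines.append("    x = max(x, 0.0)")
        else:
            raise ValueError(f"unsupported op: {op}")
    lines.append("    return x")
    return "\n".join(lines)


# Stand-in for "compile the generated code": define it in a fresh namespace.
namespace = {}
exec(generate_source(MODEL), namespace)
generated = namespace["generated"]
```

Both produce the same results. The interpreter can load a different model without rebuilding, at the cost of dispatching each op at run time; the generated code bakes the model in, so it avoids that dispatch and only contains the ops the model actually uses.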
Any hardware support or suggestions for training on the edge?
There’s not much out there around training on the edge! It’s generally difficult due to the compute constraints and lack of labelled data, but I believe TensorFlow Lite will support training on mobile devices soon. It’s unlikely this will come to TensorFlow Lite for Microcontrollers.
Is there a benchmark program you recommend to measure whether a device can run TinyML efficiently?
You could start by modifying the TensorFlow Lite for Microcontrollers test suites, which run several different models:
Hi. We’re looking to build an edge ML nature classifier, using a camera and audio, on an IoT network in a park. We’re trying to find the right balance on power: we currently use a Coral, which needs a power line, and we’d like battery power that lasts at least a few days. What would you recommend?
This depends a lot on what you need to do with the camera. If you’re looking for real-time inference, you’ll need something bigger than TinyML (at least for the time being). If you can get away with an inference every few seconds, something with the power of an Arm Cortex-M4 should work.
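A quick way to sanity-check the “inference every few seconds” option is a duty-cycle estimate: average current is the active current weighted by the fraction of time spent running, plus the sleep current for the rest, and battery life is capacity divided by that average. Here’s a minimal sketch; all the numbers you pass in are placeholders, not measured figures for any particular Cortex-M4 board, and it ignores camera, microphone, and radio power, which may well dominate in your setup.

```python
def battery_life_hours(capacity_mah, active_ma, sleep_ma,
                       active_s_per_inference, period_s):
    """Estimate battery life for a device that runs one inference every `period_s` seconds.

    Average current is the duty-cycle-weighted mix of active and sleep current.
    Ignores self-discharge, regulator losses, and sensor/radio power.
    """
    if not 0 < active_s_per_inference <= period_s:
        raise ValueError("active time must be positive and no longer than the period")
    duty = active_s_per_inference / period_s  # fraction of time spent running
    avg_ma = duty * active_ma + (1 - duty) * sleep_ma
    return capacity_mah / avg_ma
```

For example, with placeholder values of a 2000 mAh battery, 10 mA while running, 0.01 mA asleep, and 1 s of work every 10 s, `battery_life_hours(2000, 10, 0.01, 1, 10)` comes out at roughly 1,982 hours, versus 200 hours if the device never sleeps.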