What’s the catch? Arduino vs TPU/VPU/GPU performance and limitations

The big question I have around TinyML is how it is able to perform inference on an Arduino with the limited computing resources of an 8-bit AVR. In my own experimentation with DL and CV, a Raspberry Pi 3B+, with a considerably more powerful CPU, struggles to perform inference on a basic classifier, and performance improves drastically using a GPU, let alone a TPU for TensorFlow. Let’s not forget about OpenVINO and the Movidius VPU.

What confuses me is that even basic DSP convolution on an MCU works best on MCUs with FPUs and multiple execution pipelines, in other words Cortex-M3/M4, dsPIC, etc. Now, with DL inference adding further computational overhead, it seems “preposterous” that an Arduino could even begin to do any kind of inference.

So what are the limitations of TinyML on an Arduino compared with more powerful hardware? This might then also lead to the question: what are the suitable applications of TinyML?

Note that I am an absolute DL noob, so perhaps the gaps in my knowledge are to blame for this thinking. The thing is that the amount of effort one has to invest in learning DL means that one needs to choose carefully which path to go down. For AIoT I am at a juncture of TinyML vs OpenVINO for more basic DL on sensor data, and I never even considered an Arduino until now; in fact, I had given up on 8-bit MCUs.

Looking forward to any replies, or perhaps links to educate me in this regard.

Hi @zageek,

Good question! Most of the work done with embedded machine learning on the Arduino platform is on their 32-bit devices; the Arduino Nano 33, MKR, and Portenta boards all use Arm Cortex-M based cores. For the TinyML book we provide some sample code for the Arduino Nano 33 BLE Sense since it has a nice selection of sensors, including a microphone.

Since there are many ultra-low-power 32-bit MCUs available at low cost, I haven’t seen much embedded ML work targeting 8-bit MCUs. You’re right that the speed and memory probably make it difficult to do much beyond running very simple models.

If you’re interested in learning deep learning for embedded devices, I’d recommend getting familiar with the TensorFlow framework since its model format is compatible with a lot of the toolchains available for embedded ML.

That said, you can avoid the requirement to become a machine learning framework guru by using some of the high-level tools that have emerged over the last couple of years and can help you easily generate a model and accompanying DSP code from a given dataset. I’m biased, but you might consider checking out Edge Impulse, the product I work on :slight_smile:


Hi, I am also very new at this, so many thanks for the discussion. While I know there is fantastic work being done in getting maximum performance within the limitations of various MCUs, I am looking at the promise of TinyML slightly differently.

My understanding is that a real strength will be the ability to have a large number of MCUs working on the same effort. Basically, the army of ants approach. I would think that even very limited MCUs would be powerful in large numbers.

Thank you for any thoughts on this

Best regards,

I’ve done a little bit of ML on the Arduino Uno, which is 8-bit and runs at 16 MHz. Naturally there is a limit to how much compute this can do. Just as importantly, you’re also very limited by the amount of RAM (only 2 KB).
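To put rough numbers on that RAM limit, here is a back-of-the-envelope sketch in Python. The 2 KB figure is the Uno's SRAM; the layer sizes and the 512-byte reserve for stack and globals are made-up assumptions for illustration, not measurements.

```python
# Back-of-the-envelope memory estimate for a tiny fully connected network.
# Layer sizes and the reserve for stack/globals are illustrative assumptions.

SRAM_BYTES = 2 * 1024      # Arduino Uno (ATmega328P) SRAM
RESERVED_BYTES = 512       # assumed headroom for stack and globals


def dense_param_count(layer_sizes):
    """Number of weights plus biases in a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def fits_in_sram(layer_sizes, bytes_per_param):
    """True if all parameters fit in SRAM after the assumed reserve."""
    needed = dense_param_count(layer_sizes) * bytes_per_param
    return needed <= SRAM_BYTES - RESERVED_BYTES


# Hypothetical model: 32 inputs, 16 hidden units, 4 output classes.
layers = [32, 16, 4]
params = dense_param_count(layers)
print(params, "parameters")
print("float32 fits:", fits_in_sram(layers, 4))   # 4 bytes per parameter
print("int8 fits:", fits_in_sram(layers, 1))      # 1 byte per parameter
```

In practice constant weights can often be kept in flash rather than SRAM, which changes the budget, and activations need room too. The point is only how quickly even a toy model uses up 2 KB.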

You can do ML on these devices but you’re obviously limited to very basic models – or using new, smart algorithms such as SEFR.

Using TensorFlow on a device such as that is a no-no, as the overhead of the library alone will already eat up all your RAM.

I just read the SEFR paper https://arxiv.org/pdf/2006.04620v1.pdf. They argue that a learning model which can be trained in O(n) time is useful in an embedded context. But SEFR requires supervised learning. How are you supposed to do supervised learning in a field environment?

As with any supervised learning model, it’s possible to train it beforehand and to put the trained model in production. In the case of SEFR the trained model just requires M + 1 learned parameters (where M is the number of features in the data). So it’s really small once it has been trained.
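As a rough illustration of why the trained model is so small, here is a minimal sketch of a SEFR-style binary classifier in plain Python, written from my reading of the paper rather than copied from the authors' code. It assumes non-negative feature values and labels 0/1; the function names and the toy data are hypothetical.

```python
# Minimal SEFR-style binary classifier (a sketch, not the reference code).
# Assumes non-negative features and labels 0 / 1.

def sefr_train(X, y, eps=1e-7):
    """Single pass over the data; returns M weights and one bias."""
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    m = len(X[0])
    avg_pos = [sum(x[i] for x in pos) / len(pos) for i in range(m)]
    avg_neg = [sum(x[i] for x in neg) / len(neg) for i in range(m)]
    # One weight per feature: normalized difference of the class averages.
    weights = [(p - n) / (p + n + eps) for p, n in zip(avg_pos, avg_neg)]

    def score(x):
        return sum(w * v for w, v in zip(weights, x))

    pos_score = sum(score(x) for x in pos) / len(pos)
    neg_score = sum(score(x) for x in neg) / len(neg)
    # Bias: weighted average of the two mean class scores, negated.
    bias = -(len(neg) * pos_score + len(pos) * neg_score) / (len(pos) + len(neg))
    return weights, bias


def sefr_predict(weights, bias, x):
    """Inference is one dot product plus a comparison."""
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias >= 0 else 0


# Toy data with M = 2 features (made up for illustration).
X = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
y = [1, 1, 0, 0]
weights, bias = sefr_train(X, y)
print(len(weights) + 1, "learned parameters")
print(sefr_predict(weights, bias, [1.0, 0.0]), sefr_predict(weights, bias, [0.0, 1.0]))
```

On a microcontroller only the `sefr_predict` part would need to run, with the M weights and the bias stored as constants.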

Depending on the application, it may also be possible to do supervised learning in the field. This of course requires the user to collect training data and label it. In the case of a mobile phone app, for example, this can be done as part of the user interacting with the app (where they won’t even realize they are labeling data).

For an ML-based sensor of some kind that needs to run autonomously, supervised learning in the field makes less sense.

Sure, but SEFR shows only a minor (and sometimes negative) accuracy improvement over naive Bayes, while using a lot more inference energy. So I guess I struggle to understand SEFR’s niche.

From the reported figures it seems that the much larger energy usage is primarily for multiclass classification problems, not for binary classification. This is not so strange, as SEFR is a binary classification algorithm and using it on N classes means running it N times.
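To make the "N times" cost concrete, here is a small Python sketch of a one-vs-rest wrapper around linear binary scorers. I am assuming one-vs-rest with an argmax over the scores as the multiclass scheme; the weights and biases below are hand-picked, hypothetical values, not a trained model.

```python
# One-vs-rest over N binary linear scorers (illustrative sketch).
# Each model is a (weights, bias) pair; the values below are hypothetical.

def one_vs_rest_predict(models, x):
    """Score x with every per-class model and return the best class index."""
    scores = [sum(w * v for w, v in zip(weights, x)) + bias
              for weights, bias in models]
    return max(range(len(scores)), key=scores.__getitem__)


def multiply_adds(n_classes, n_features):
    """Multiply-accumulate operations per inference: one dot product per class."""
    return n_classes * n_features


models = [
    ([0.9, -0.4, -0.4], 0.0),   # class 0 vs rest
    ([-0.4, 0.9, -0.4], 0.0),   # class 1 vs rest
    ([-0.4, -0.4, 0.9], 0.0),   # class 2 vs rest
]

print(one_vs_rest_predict(models, [0.1, 0.8, 0.2]))
print(multiply_adds(n_classes=1, n_features=3), "MACs for the binary case")
print(multiply_adds(n_classes=3, n_features=3), "MACs for three classes")
```

The work and the parameter count both grow linearly with the number of classes, which is consistent with the energy gap showing up mainly in the multiclass results.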

I see it as just one more tool in the toolbox. In some situations it might be better than other classifiers, in other situations it might not be. (Which is true for all of them, of course.)

Hi Dan

Thanks for the excellent feedback. I am taking a look at Edge Impulse and it looks very cool. I actually want to see if I can use it in a demo project!

I am trying to break into and catch up in this field, so I definitely like the idea of mixing embedded with ML. Disclaimer: excuse my ignorance, as I know next to nothing about AI at the moment and I am learning and trying to build an intuitive understanding of the landscape as I go along. I see understanding the correct application of AI as actually the biggest challenge.

I have one major question about using these kinds of tools, based on my research and self-study into ML, so please bear with me if it’s a dumb one.

It goes like this: say I want to use a microphone, edge ML inference, and an MCU to detect the sound of a knock on a door.

It is a simple hypothetical scenario that I thought of to embody the type of problem domain I see edge ML being applied to.

So let’s say I have everything hooked up to get the sensor data to the MCU, run inference, etc. In such a scenario, how would I get enough sample data to train an ML model to detect knocks at as many potential sites as possible? If you think about the problem, there is a wide variety of door designs, made of different materials or of different types and densities of wood, which would have different spectral responses in terms of the reflected and absorbed sound for a reference knock with the same impact force and impact source. Now add another degree of freedom: different people have different hand sizes, arm lengths, strengths, etc. Suddenly there is a lot of variation to deal with.

Now the million-dollar question: why would I go the ML route when I could apply some DSP approaches instead (cross-correlation, maybe), or measure the spectral response and use some kind of hard-coded algorithm based on a study of the general characteristics of the problem and empirical data from experimental observation? In other words, applying the scientific method, so to speak. I could use some statistical calculations (yes, I know we are now approaching ML using inferential statistics…) but ultimately avoid needing TinyML or any other edge AI inference.
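For the sake of argument, this is roughly what I mean by the cross-correlation route: slide a recorded reference knock over the incoming samples and look for a high normalized correlation. It is a minimal Python sketch only; the decaying-oscillation template, the synthetic signal, and the 0.8 threshold are all made-up assumptions, not measured data.

```python
import math

# Template matching by normalized cross-correlation (illustrative sketch).
# The "knock" template, the signal, and the threshold are synthetic assumptions.

def normalized_xcorr_peak(signal, template):
    """Return (best_lag, best_score); score is in [-1, 1] up to rounding."""
    n = len(template)
    t_mean = sum(template) / n
    t = [v - t_mean for v in template]
    t_norm = math.sqrt(sum(v * v for v in t))
    best_lag, best_score = 0, -1.0
    for lag in range(len(signal) - n + 1):
        window = signal[lag:lag + n]
        w_mean = sum(window) / n
        w = [v - w_mean for v in window]
        w_norm = math.sqrt(sum(v * v for v in w))
        if w_norm == 0 or t_norm == 0:
            continue  # flat window: correlation is undefined, skip it
        score = sum(a * b for a, b in zip(w, t)) / (w_norm * t_norm)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score


# Hypothetical reference knock: a short, exponentially decaying oscillation.
template = [math.exp(-0.3 * k) * math.cos(1.5 * k) for k in range(16)]
# Synthetic recording: silence, a quieter copy of the knock, then silence.
signal = [0.0] * 20 + [0.5 * v for v in template] + [0.0] * 20

lag, score = normalized_xcorr_peak(signal, template)
THRESHOLD = 0.8  # assumed detection threshold
print("knock detected" if score > THRESHOLD else "no knock", "at sample", lag)
```

Because the correlation is normalized, a quieter knock with the same shape still scores highly; what this approach does not handle is a knock with a different shape, which is exactly the variation problem described above.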

I can see that ML will give the best possible results for widespread use if the right model is used and correctly trained. I get why. However, the million-dollar question is: how do I get enough training data, and where can I find it? I know this is the old problem with supervised learning approaches.

With too small a set of data, surely I could just as well go the DSP route? Note that I use broad terms here without getting into the semantics.

So my ultimate question is: what is the selling point of edge AI if one can’t get enough training data to build robust models? It is arguably harder to get access to raw or even labelled sensor data samples online than it is to use Google Images to find enough images to label to train a DL CV classifier, for example.

What then is the value proposition? How much data is enough to be effective with TinyML, for example? In the absence of large numbers of pretrained models and large amounts of data, how does one approach using edge ML, and is it worth the effort if you don’t have the resources to empirically gather representative training data?

I feel like this post is perhaps not laser focused, but I am asking the general question: is edge AI hype, or is it more performant than I would think compared with “normal” DL, where one often has large training sets more readily available, while in edge AI scenarios the sensor data is 1-dimensional?

If I am going to use the knock detector and train it on a limited sample set for use at one specific house with one specific door, then since the sample scope is so narrow, why don’t I just use mathematical/DSP estimation techniques? Or do I only want to use ML to seem in vogue and part of the hype? I have to ask this as devil’s advocate.

For me it’s the chicken-and-egg scenario that is forming a psychological block, which is somewhat of an impediment to the motivation to invest time in learning this in more detail. Then again, it could just be me with my entrenched legacy thinking. Or does one wait for the community to do this legwork, so that models and architectures proliferate in the public domain, as seems to be the case in DL at the moment, where you can pretty much find a model for anything, developed and trained by others focused on specific problems…

What advice can you give to help me see the light? Because I really want to.