Phoenix Chapter Meeting July 28, 2020 - Implementation Considerations for ML at the Edge of the Cloud

We held our next tinyML Talks webcast. Mike Stanley from SenSIP presented Implementation Considerations for Machine Learning at the Edge of the Cloud on July 28, 2020.


  1. Introduction - Opening Remarks, A. Spanias (3 min)
  2. The tinyML Foundation and Meetups, E. Gousev (5 min)
  3. Connection to the MEMS & Sensors Community in Phoenix, S. Whalley (5 min)
  4. Introduction of Speaker, S. Jayasuriya (3 min)
  5. TinyML Seminar: Machine Learning at the Edge of the Cloud, Mike Stanley (40 min)
  6. Discussion and Q&A Session, S. Jayasuriya (moderator)
  7. Round Table & Next Steps (all)

Talk Abstract
Most machine learning classes focus on the learning algorithms, but largely ignore larger system issues. Implementation Considerations for Machine Learning at the Edge of the Cloud examines the broader set of problems that have to be dealt with before any embedded implementation can be brought to market. These range from hardware choices (MCU vs MPU, communications media, sensors, board implementation, etc), packaging issues, machine learning library choices, data collection, feature engineering, cloud interfaces, security and more. All of the above are discussed with the benefit of hindsight from an engineer tasked with putting together an end-to-end design system for embedded machine learning while working for a major semiconductor manufacturer.

Speaker Biography
Mike Stanley spent almost four decades in the semiconductor field at Motorola, Freescale and NXP in areas ranging from circuit design to machine learning. He is author/co-author of 8 patents, numerous publications, and is a contributor to Measurement, Instrumentation and Sensors Handbook, 2nd edition. Mike was inducted into the MEMS & Sensor Industry Group Hall of Fame in 2015 and is a Senior Member of the IEEE and IEEE Standard 2700-2014 contributor. He co-authored “Sensor Analysis for the Internet of Things”, published in 2018 by Morgan & Claypool Publishers. Mike continues his association with the Sensor, Signal & Information Processing Center (SenSIP) at A.S.U. and is one of the organizers for the Phoenix Chapter of the TinyML organization.

Here are the questions and answers (from Mike) from the meeting:

  • What is X IDE on slide 20?
    “IDE” is a generic term that stands for “Integrated Development Environment”. I happened to use the NXP Semiconductor MCUXpresso tools, and that icon is the logo for that tool. Basically it’s an Eclipse-based environment for coding and debugging assembler, C and C++ programs on ARM microprocessors.

  • Did you consider MicroPython?
    No we didn’t. We wanted something that fit into the NXP standard development ecosystem and included support for all the libraries supported by MCUXpresso. I’m also unaware of any ML support built into MicroPython (if I’m wrong about that, I would love to hear details…).

  • Is there a difference between Edge computing (which I think Mike is describing) and the part of TinyML which pushes models and DNNs all the way down to the device and/or its sensors? Does TinyML stop at some level going up the stack?
    Edge Computing simply refers to code running in devices connected to the internet, usually in an MCU, sometimes using an MPU. Often there are sensors attached to those processors. TensorFlow Lite for MCUs clearly can fall into that application category.

  • Is there a clear choice between MPU and MCU for edge ML? Or is the answer "it depends"?
    The answer is almost always “it depends”. MPUs (microprocessors) generally have more horsepower and memory available, but at the price of higher power consumption, more PCB space and higher cost. Almost by its nature, TinyML, which is defined to be <= 1 mW, is going to be running on an MCU.

  • Can you share your notebooks with this group?
    To avoid even the possible appearance of conflict of interest, I didn’t keep a copy of my NXP materials when I retired. But the good news is that the scikit-learn and TensorFlow communities have lots of great examples. One thing I will point out is that a lot of ML examples I have seen online omit the data normalization step during training and in the embedded code. Make sure your notebooks do the normalization and write those parameters to the code to be used in the embedded environment.
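The normalization point above can be sketched as follows. This is a minimal illustration, not a reconstruction of the NXP notebooks: it assumes z-score normalization, and the function names (`fit_normalizer`, `normalize`, `to_c_header`) and the generated C array names are hypothetical. The idea is that the notebook computes the parameters once, uses them during training, and writes the same values into a header that the embedded code includes.

```python
import statistics


def fit_normalizer(rows):
    """Compute per-feature mean and standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # A constant feature has zero deviation; fall back to 1.0 so that
    # neither the notebook nor the embedded code divides by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds


def normalize(row, means, stds):
    """Apply the same z-score transform used during training."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


def to_c_header(means, stds, prefix="feature"):
    """Render the parameters as C arrays (array names are hypothetical)."""
    def c_array(label, values):
        body = ", ".join(f"{v!r}f" for v in values)
        return f"static const float {prefix}_{label}[{len(values)}] = {{ {body} }};"

    return "\n".join([
        "/* Generated from the training notebook; do not edit by hand. */",
        c_array("mean", means),
        c_array("std", stds),
    ]) + "\n"


# Toy training set: two samples, two features (the second is constant).
training_rows = [[1.0, 10.0], [3.0, 10.0]]
means, stds = fit_normalizer(training_rows)
header = to_c_header(means, stds)
print(header)
```

Generating the header from the notebook, rather than retyping the numbers, keeps the training-time and inference-time transforms from drifting apart.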

  • Have you tried the tinyML platform cainvas for C++ ML models for MCUs?
    No I haven’t. But I’ll check it out. Thanks for the pointer!

  • What file format does OpenCV use for model export? Does it support a format like ONNX?
    OpenCV can import/export models in an XML format (NOT related to ONNX). That’s nice because it’s human readable, but horribly inefficient.

  • Which sensor board is this? Thanks
    The board in the presentation is one that my team developed around the i.MXRT1062 MCU from NXP Semiconductor. As far as I know, it’s not generally available. But if you would like a contact at NXP, please reach out to me and I’ll put you in touch with someone from my old team. I’ll also note that the Arduino Nano 33 BLE Sense has a very nice sensor complement, as does the Clue board from Adafruit.

  • Do you recommend any model deployment methods to deploy your trained model on tiny platforms/MCUs?
    It depends… I’ve written export scripts from MATLAB that write models directly into compilable C. For the flow I showed, we did a bit of on-the-fly text editing to transform OpenCV XML into embeddable C strings, which OpenCV conveniently has a function for reading. As noted above, that is horribly inefficient, but it had the advantage of applying to all model types in OpenCV. We also intended (but hadn’t by the time I left) to implement an over-the-wire update process using that same XML.
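The "XML into embeddable C strings" step can be illustrated with a short script. This is only a sketch of the general idea, not the team's actual tooling: the function names and the `model_xml` symbol are hypothetical, and the input here is a toy XML fragment rather than a real exported model. Each line of the XML becomes one adjacent C string literal, with quotes, backslashes and non-printable bytes escaped.

```python
def c_escape(text):
    """Escape text so it is safe inside a C string literal."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in '\\"?':
            # Escape backslash and quote; escaping '?' avoids C trigraphs.
            out.append("\\" + ch)
        elif 32 <= byte < 127:
            out.append(ch)
        else:
            # Three-digit octal escapes cannot absorb a following digit.
            out.append(f"\\{byte:03o}")
    return "".join(out)


def xml_to_c_source(xml_text, symbol="model_xml"):
    """Render XML text as a C char array made of adjacent string literals."""
    lines = xml_text.splitlines(keepends=True) or [""]
    body = "\n".join(f'    "{c_escape(line)}"' for line in lines)
    return f"const char {symbol}[] =\n{body};\n"


# Toy input standing in for an exported model file.
example_xml = '<model kind="svm">\n</model>\n'
print(xml_to_c_source(example_xml))
```

The resulting array can be compiled into the firmware image and handed to whatever string-based loader the ML library provides. As the answer notes, text XML is a bulky encoding, so this trades flash space for generality.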

  • Can you give us some insight into how your team gathered X-ray specific data without impacting board performance?
    There may have been a bit of performance hit, but I think it was pretty minor. FreeRTOS has some hooks in it for monitoring memory and we took advantage of those. Also ARM cores have a hardware timer built into them that we used to instrument instruction counts.

  • Did you consider MXNet GluonCV?
    No, we were specifically targeting classical machine learning techniques (vs deep learning).

  • Is deep-C another choice, better than TensorFlow micro?
    No, I wasn’t aware of it. In looking at the Git repo, it looks like we started our project about 8 months earlier than the first commit of deep-C. The other thing I’ll point out is that the actual ML capability is a minor part of an overall project like this. We spent much more time on sensor drivers, communications, etc. That’s actually a key point of the presentation. ML advocates (and I am one) can get caught up in the ML component without realizing that all the “other stuff” will actually be what keeps them from getting a product to market. But thanks for the pointer to deep-C. I’ll check it out.

  • Do you believe that TinyML could be integrated with Blockchain, so the node can mine too?
    TinyML isn’t a library per se; it’s more a concept. Lots of implementations could apply. That said, I don’t know enough about Blockchain to render an opinion.

  • For the applications you were building the kit for, did you easily run out of memory and computation resources to implement an ML algorithm?
    We had to pay attention to it. We had at least ten sensors running concurrently on the board; simply servicing those took a lot of time and memory. It also matters whether you want to do in situ training or off-line training. Generally, the model inferencing was the least of our concerns.

  • What are the typical code size requirements for TinyML-based algorithms?
    There’s code size and there’s data size. Both can kill you. My understanding is that the interpreter for TensorFlow Lite for MCUs can fit in 20 kB, but your model might easily take more. Part of the TensorFlow Lite process is going through a quantization and trim step to reduce model size, so part of the answer relates to how well you do that. TinyML’s stated goal is to be able to run in very constrained hardware environments.
    Our OpenCV implementation was much beefier. The libraries themselves took several hundred kB. We put an inexpensive external RAM on our board that essentially took care of memory considerations for that implementation.
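To make the quantization point concrete, here is a rough sketch of why it shrinks a model. It assumes simple symmetric 8-bit quantization with a single scale per weight list, which is a simplification chosen for illustration and not the TensorFlow Lite converter's actual procedure; the function names are hypothetical. Each 32-bit float weight is replaced by an 8-bit integer plus one shared scale factor, so weight storage drops by roughly 4x at the cost of a small rounding error.

```python
import struct


def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] with one shared scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]


weights = [-1.0, 0.0, 0.3, 0.9]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Storage comparison: 4 bytes per float32 weight vs 1 byte per int8 weight.
float_bytes = len(struct.pack(f"<{len(weights)}f", *weights))
int8_bytes = len(struct.pack(f"<{len(quantized)}b", *quantized))
print(quantized, float_bytes, int8_bytes)
```

The rounding error per weight is bounded by half the scale, which is why the quality of the quantization step matters: a few large outlier weights inflate the scale and reduce precision for everything else.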

Additional information relating to this meeting

Watch on YouTube:
Mike Stanley

Download presentation slides:
Mike Stanley

Feel free to ask your questions on this thread and keep the conversation going!