tinyML Talks on February 8, 2021 “CMSIS-NN & Optimizations for Edge AI” by Felix Johnny and Fredrik Knutsson

We held our next tinyML Talks webcast. Felix Johnny and Fredrik Knutsson from Arm presented “CMSIS-NN & Optimizations for Edge AI” on February 8, 2021.

The talk is centered around performance optimizations for Edge AI applications. We’ll begin by identifying bottlenecks in the inference of ML models and then move on to ways of handling them. A major part of the solution lies in using a specialized library such as CMSIS-NN, which provides optimized implementations of compute-intensive operators targeting Arm Cortex-M processors. Common optimization methodologies used in CMSIS-NN will also be discussed. There is something for model designers too: we show how the shapes of operators affect performance and present some ways to handle this.
To wrap up, Fredrik will give a live demo of CMSIS-NN together with TensorFlow Lite for Microcontrollers, showcasing the benefits of optimization on an Arduino Nano 33 BLE Sense board.
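For readers who have not used the library before, the snippet below is a minimal, illustrative sketch of what a direct call into a CMSIS-NN kernel can look like. The buffer name and size are hypothetical, and the exact set of available functions depends on the CMSIS-NN version you build against.

```cpp
// Illustrative sketch only: one direct call into CMSIS-NN.
// arm_relu_q7() applies an in-place ReLU to an int8 activation buffer;
// the build selects an optimized code path (e.g. DSP/SIMD) for the target core.
#include "arm_nnfunctions.h"   // CMSIS-NN public API

static q7_t activations[64];   // hypothetical int8 activations of one layer

void apply_relu(void)
{
    arm_relu_q7(activations, sizeof(activations) / sizeof(activations[0]));
}
```

In practice you rarely call these kernels by hand; a framework such as TensorFlow Lite for Microcontrollers dispatches to them, which is what the demo shows.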

Felix Johnny is the maintainer of Arm’s open source CMSIS-NN library, which provides optimized neural network kernels for Cortex-M CPUs. He has spent most of the last 15 years in the wireless domain working on software design and optimizations in memory- and cycle-constrained systems. Outside of work, he is an active music photographer.

Fredrik Knutsson is the technical lead for the Arm team working on Ethos-U55 and Cortex-M integration into embedded runtimes. He holds an M.Sc. in electrical engineering from Chalmers University of Technology. Fredrik has more than 15 years of experience in the embedded software domain, working mainly on software architecture and system design. For the past four years he has been working for Arm, and he has previous experience from the wireless, wearable and automotive businesses.

==========================

Watch on YouTube:
Felix Johnny and Fredrik Knutsson

Download presentation slides:
Felix Johnny and Fredrik Knutsson

Feel free to ask your questions on this thread and keep the conversation going!

==========================

Q: Are these MACs per single prediction?
A: Yes, for one inference.

Q: Is there a tool to profile an NN model? Something that, given a network, dumps how many calculations/MACs are made and their resolution?
A: We use an internal tool to get the number of MACs per inference. There may well be public tools for this purpose as well.

Q: What is the YouTube library URL for past session recordings?
A: tinyML - YouTube

Q: Does the CMSIS-NN also support Recurrent Architectures like LSTMs?
A: CMSIS-NN as such is not a framework, so the question is more like: does the overlying framework support LSTMs? I believe it’s being worked on by the TFLM team at Google. (The best way to check is searching here: Issues · tensorflow/tensorflow · GitHub.) When/if supported by the overlying framework, there may well be a good motivation to support the ops involved in an LSTM cell in CMSIS-NN, if the number of cycles spent in those ops is considerable.

Q: Is it possible to share the model architectures for the person detection model and the other workloads in the TinyMLPerf benchmark?
A: It’s available here: tensorflow/third_party_downloads.inc at de8e18a12a13802e507393798d68b7f9945a28fc · tensorflow/tensorflow · GitHub

Q: When will Ethos-U55 boards be available on the market?
A: Later this year is the timeline for that…

Q: Do you have any pointers for using the CMSIS-NN kernels outside of the Arduino environment?
A: There are many ways to do that. A good starting point is here: tensorflow/README.md at master · tensorflow/tensorflow · GitHub. Example 1 shows the ‘bare metal’ way of enabling CMSIS-NN with TFLM. This approach can then be used with any of the TFLM example use cases.
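As a rough sketch (not the exact code from the linked example), a bare-metal TFLM application looks roughly like the following. Here `g_model_data` stands in for your converted .tflite flatbuffer, the arena size is an arbitrary placeholder, and API details (e.g. the error reporter) vary between TFLM versions. CMSIS-NN itself is selected at build time (the linked README describes how), so the application code stays the same with or without it.

```cpp
// Rough sketch of a bare-metal TFLM application; API details differ between
// TFLM versions, and g_model_data / the arena size are placeholders.
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"

extern const unsigned char g_model_data[];   // your converted .tflite flatbuffer

constexpr int kArenaSize = 20 * 1024;        // size this for your model
static uint8_t tensor_arena[kArenaSize];

int main()
{
    static tflite::MicroErrorReporter error_reporter;
    const tflite::Model* model = tflite::GetModel(g_model_data);

    // Registers the builtin ops; when TFLM is built with its CMSIS-NN option,
    // these resolve to the CMSIS-NN optimized kernels with no changes here.
    static tflite::AllOpsResolver resolver;

    static tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                                kArenaSize, &error_reporter);
    interpreter.AllocateTensors();

    TfLiteTensor* input = interpreter.input(0);
    // ... fill input->data.int8 with a quantized input frame ...

    interpreter.Invoke();

    TfLiteTensor* output = interpreter.output(0);
    // ... read scores from output->data.int8 ...
    return 0;
}
```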

Q: The person score and no-person score seem to be way higher with CMSIS-NN on; is this expected?
A: This reflects the uncertainty. When I did the demo without CMSIS-NN there was no image (elephant, Einstein) in front of the camera, which will probably cause a higher degree of uncertainty. Regardless, the CMSIS-NN optimized kernels are bit-exact to the TFLM reference kernels, so there are no functional differences between them.