We held our next tinyML Talks webcast on November 24, 2020, with two presentations: Chao Xu from Eta Compute presented "Enabling Neural Networks at the Low Power Edge: A Neural Network Compiler for Hardware Constrained Embedded Systems" at 8:00 AM Pacific Time, and Brian Turnquist with Rodney Dockter from Boon Logic presented "Amber: A Complete, ML-Based Anomaly Detection Pipeline for Microcontrollers" at 8:30 AM Pacific Time.
Chao Xu (left) and Brian Turnquist (right)
Neural networks continue to gain interest for deployment in IoT and other mobile and edge devices. Yet enabling a neural network in a hardware-constrained embedded system, such as a low-power edge device, presents many challenges.
In this presentation we will show how Eta Compute takes an integrated approach to minimize the barriers to designing neural networks for ultra-low-power operation, with an example from an embedded vision application:
- Neural network design and optimization for the embedded world: memory, compute power and accuracy
- Hardware and software co-optimization to improve energy efficiency
- Automatic inference code generation based on the model graph by a proprietary hardware-aware compiler tool
The audience will gain an understanding of the integrated approach (hardware/software considerations) and of what is possible, in terms of efficiency, on modern sensor-node processors for vision.
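As a concrete illustration of the memory/accuracy trade-off mentioned above, here is a minimal sketch of affine int8 weight quantization, the kind of transformation a low-power NN flow applies to shrink models. The function names and per-tensor rounding scheme are illustrative assumptions, not Eta Compute's actual tooling.

```python
# Illustrative sketch: affine (scale + zero-point) int8 quantization of a
# weight tensor. Int8 storage cuts memory 4x vs. float32 at a small
# accuracy cost -- the trade-off embedded NN design must manage.

def quantize_int8(weights):
    """Map float weights to int8 values with a per-tensor scale/zero point."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0 or 1.0          # avoid zero scale for constants
    zero_point = round(-128 - lo / scale)     # int that maps back to `lo`
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float weights from the int8 representation."""
    return [(v - zero_point) * scale for v in q]

weights = [0.0, 0.5, -0.5, 1.0]
q, scale, zp = quantize_int8(weights)
recovered = dequantize(q, scale, zp)
```

The recovered weights differ from the originals by at most half a quantization step, which is the accuracy cost traded for the 4x memory reduction.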
Chao Xu brings more than 20 years of experience in advanced signal processing and machine learning, networking semiconductors, and silicon photonics. Prior to Eta Compute, Dr. Xu served as senior director of communication systems and of the computing and storage platform at Inphi Corporation. Before Inphi, he held senior R&D positions at Integrated Device Technology and PMC-Sierra. Dr. Xu has over 30 pending and awarded patents. He received his Ph.D. from the University of Pennsylvania, and a master of science degree and a bachelor of engineering degree in electrical engineering from the University of Science and Technology of China. His research areas include speech recognition, noise robustness, feature extraction, and other general machine learning methods.
Sensor anomaly detection pipelines deployable on microcontrollers typically begin with data collection which is followed by off-line training and model-building on multi-core, high performance compute resources. The resulting model is static and may require additional pruning prior to deployment. Furthermore, the model may not translate to other sensors, even identical sensors monitoring identical assets running the same motion profiles. This talk will demonstrate a complete, unsupervised machine learning-based, anomaly detection pipeline that is deployable on low-power microcontrollers such as the ARM Cortex M7. Using live sensor values in real-time, the Amber algorithm seamlessly tunes its hyperparameters, then trains its ML model, and finally transitions to anomaly detection mode where it can generate thousands of inferences per second with extremely high accuracy. Since each microcontroller autonomously customizes its ML model to its associated sensor, this approach is suitable for deployments to billions of IoT sensors.
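The tune-train-detect lifecycle described above can be sketched as a toy streaming detector. The fixed training window and z-score threshold here are illustrative assumptions for a minimal example, not Amber's actual algorithm.

```python
# Illustrative sketch of a train-then-detect streaming anomaly pipeline:
# the detector first learns a model of "normal" from live sensor values,
# then switches to per-sample inference. Not Amber's actual algorithm.

class StreamingAnomalyDetector:
    def __init__(self, train_len=100, z_threshold=4.0):
        self.train_len = train_len        # samples used to learn "normal"
        self.z_threshold = z_threshold    # hyperparameter: anomaly cut-off
        self.samples = []
        self.mean = None
        self.std = None

    def feed(self, x):
        """Return None while training, else True if x looks anomalous."""
        if self.mean is None:
            self.samples.append(x)
            if len(self.samples) == self.train_len:
                # Transition from training to detection mode.
                n = len(self.samples)
                self.mean = sum(self.samples) / n
                var = sum((s - self.mean) ** 2 for s in self.samples) / n
                self.std = var ** 0.5 or 1e-9   # guard against zero spread
            return None
        z = abs(x - self.mean) / self.std
        return z > self.z_threshold
```

Because each detector instance fits its statistics to its own sensor stream, every device ends up with a model customized to the sensor it monitors, mirroring the per-sensor autonomy described above.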
Brian Turnquist has worked in machine learning for the past twenty years, developing numerous novel algorithms for automatically clustering biological signals in real time. Turnquist is CTO of Minneapolis tech start-up Boon Logic, a former visiting researcher at the Universities of Nürnberg and Heidelberg, and a tenured professor at Bethel University. He holds a Ph.D. in Mathematics from the University of Maryland and has fourteen refereed publications in neuroscience and mathematics.
Rodney Dockter is an engineer with a broad background in robotics and machine learning. Application areas include industrial automation, autonomous off-highway vehicles, mobile robotics, and surgical robotics. Rodney is the Director of Computer Vision at Minneapolis-based Boon Logic, and a lecturer at the University of Minnesota. He holds a Ph.D. in Mechanical Engineering from the University of Minnesota.
Feel free to ask your questions on this thread and keep the conversation going!
Besides using TFLite models as a starting point, are there plans to **directly** support MATLAB?
Ans: Yes, in about 3 months.
For multi-device optimization, how does TENSAI take care of data dependencies between different layers?
Ans: For inference, data usually flows forward from input to output. The compiler manages the data pipeline with DMA in the background. If there are dependencies among multiple layers, the TENSAI compiler waits until all data sources are available before moving to the next layer.
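The wait-for-all-inputs behaviour described in this answer amounts to executing the layer graph in dependency order. Here is a minimal sketch of that scheduling idea; the graph representation and function names are illustrative, not TENSAI internals.

```python
# Illustrative sketch: run a layer graph so that each layer executes only
# after every layer it depends on has produced its data -- analogous to a
# compiler waiting on all data sources before moving to the next layer.

from collections import deque

def run_graph(layers, deps):
    """layers: {name: callable}; deps: {name: [names it consumes from]}.
    Returns the order in which layers were executed."""
    indegree = {name: len(deps.get(name, [])) for name in layers}
    consumers = {name: [] for name in layers}
    for name, inputs in deps.items():
        for src in inputs:
            consumers[src].append(name)

    ready = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while ready:
        name = ready.popleft()
        layers[name]()   # on hardware, outputs would be moved by DMA
        order.append(name)
        for c in consumers[name]:
            indegree[c] -= 1
            if indegree[c] == 0:   # all of c's data sources are now ready
                ready.append(c)
    return order
```

A skip connection, where one layer feeds two later layers, is handled naturally: the joining layer simply stays blocked until both of its producers have run.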
Can you support facial recognition under the same power consumption specifications?
Ans: We have not implemented facial recognition yet, but it is our next effort. There will likely be some trade-off between power and accuracy. One option is to use it as an always-on visual wake-up application that triggers a more accurate NN for further processing.
Are all tensorflow operations supported?
Ans: At this moment, we support TFLite kernels.
Is there any alignment with machine learning standards such as the Open Neural Network Exchange (ONNX)?
Ans: Yes.
What devices does the TENSAI compiler support? Is it only supported on Eta devices, or on other devices too? Do you have any results comparing the TENSAI compiler with the open-source Apache TVM compiler?
Ans: Currently we support Eta Compute devices (ECM3531, ECM3532). This can be extended to third-party silicon. TENSAI is more optimized thanks to hardware and software co-optimization.
Are binary networks currently supported? XNOR operations?
Ans: Yes.
Thoughts on Mythic or Flexlogic devices?
Ans: In the future.
Will the TENSAI flow be available for free? Are any dev kits coming?
Ans: Only for certain partners.
What's your experience with TFLite quantized models on LSTM/GRU-type networks? TFLite seems to have limited support for these.
Ans: We further optimize them with special optimizations.
Is the power number you gave measuring end-to-end power, or just the computation core?
Ans: It is end-to-end, with some I/O turned off during measurements.
Can you give more details about how you measured power and energy?
Ans: We measured the single VDD power pin and turned off the LED GPIO I/Os. All supplies are generated inside the chip from this single external supply.
Are you supporting custom ops? If so, how?
Ans: Yes.
What about segmentation, for example lane following? Would this be doable?
Ans: We have not implemented this.
What is the RAM size for the people-counting demo using MobileNet v1?
Ans: 256 KB SRAM.