Two tinyML Talks on August 4, 2020: 1) “Low power CV meets the real world” by Venkat Rangan (tinyVision.ai Inc.); 2) “Towards Ultra-Low Power Embedded Object Detection” by Theocharis Theocharides (University of Cyprus)

We held our twelfth tinyML Talks webcast on August 4, 2020, with two presentations: Venkat Rangan from tinyVision.ai presented “Low power CV meets the real world” at 8:00 AM Pacific Time, and Theocharis Theocharides from the University of Cyprus presented “Towards Ultra-Low Power Embedded Object Detection” at 8:30 AM Pacific Time.


Venkat Rangan (left) and Theocharis Theocharides (right)

As the tinyML community is acutely aware, adding vision capability to a battery-powered IoT device is non-trivial. The tremendous amount of vision data that needs to be processed necessitates hardware accelerators as well as clever algorithms that take advantage of data locality, sparsity, and so on. A real-world CV-enabled IoT device also requires attention to a range of practical issues: indoor/outdoor location, orientation, optics, sensor selection, and more. This talk touches on some of the practical considerations, trade-offs, and issues inherent in the design of a tinyCV system.

Venkat Rangan is the founder of tinyVision.ai Inc., a product design and consulting company specializing in IoT devices incorporating Computer Vision. Prior to founding tinyVision, Venkat was a Director of Engineering at Qualcomm Research where he co-founded and led the R&D of the ultra-low power Glance CV solution. He is the holder of more than 60 granted patents in various fields including low power conventional and neuromorphic vision. Venkat holds a BSEE from the Indian Institute of Technology, Roorkee and an MSEE from the University of Cincinnati. He can be reached at venkat@tinyvision.ai. www.tinyvision.ai

Embedded computer vision is nowadays adopted in many computing devices, consumer electronics, and cyber-physical systems. Visual edge intelligence is a growing necessity for emerging applications where real-time decision-making is vital. Object detection, the first step in such applications, has achieved tremendous improvements in accuracy thanks to the emergence of Convolutional Neural Networks (CNNs) and Deep Learning. However, such complex models require extensive resources, which prevents their deployment on resource-constrained mobile and embedded devices that must simultaneously process high-resolution images. Common approaches to reducing resource demands involve techniques such as quantization, pruning, and compression. While these techniques are effective to a certain extent, they are built on traditional, computation-driven approaches. Mammalian vision, on the other hand, uses saliency and memory, among other mechanisms, to limit attention during a visual search to a significantly smaller search space. In this talk I will therefore present our efforts to reduce the processing demands of edge-based CNN inference through a hierarchical framework that detects objects in high-resolution video frames while maintaining the accuracy of state-of-the-art CNN-based object detectors, validated on UAV platforms in applications involving car and pedestrian detection.
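To make the attention-driven idea in the abstract concrete, here is a rough Python/OpenCV sketch, not the actual system from the talk: instead of running a CNN detector over the whole high-resolution frame, the frame is tiled, each tile gets a cheap change-based attention score, and the expensive detector runs only on tiles above a threshold. `run_detector`, the tile size, and the threshold are illustrative assumptions.

```python
# Rough sketch (not the speakers' implementation) of attention-driven
# detection on a high-resolution frame: split the frame into tiles, score
# each tile with a cheap frame-difference measure, and run the expensive
# CNN detector only on tiles whose score exceeds a threshold.
# `run_detector` is a placeholder for any tile-level CNN detector.

import cv2


def tiles(frame, tile=416):
    """Yield (x, y, patch) tiles that cover the frame."""
    h, w = frame.shape[:2]
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            yield x, y, frame[y:y + tile, x:x + tile]


def tile_score(patch, prev_patch):
    """Cheap 'attention' score: mean absolute grayscale frame difference."""
    a = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)
    b = cv2.cvtColor(prev_patch, cv2.COLOR_BGR2GRAY)
    return float(cv2.absdiff(a, b).mean())


def detect_salient_tiles(frame, prev_frame, run_detector, thresh=4.0):
    """Run the expensive detector only on tiles that changed noticeably."""
    detections = []
    for (x, y, patch), (_, _, prev_patch) in zip(tiles(frame), tiles(prev_frame)):
        if tile_score(patch, prev_patch) > thresh:
            for bx, by, bw, bh in run_detector(patch):  # boxes local to tile
                detections.append((x + bx, y + by, bw, bh))
    return detections
```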

Theocharis (Theo) Theocharides holds a Ph.D. in Computer Engineering from Penn State University, working in the areas of low-power, resource-constrained computer architecture and embedded systems design with emphasis on computer vision and machine learning applications. Along with his students, his research encompasses the design, development, implementation, and deployment of low-power, reliable, application-specific on-chip architectures and real-time embedded systems, with emphasis on hardware acceleration of computer vision and artificial intelligence algorithms for edge computing, and on using reconfigurable hardware to build self-aware, evolvable, and robust intelligent edge computing systems. Theo is a Senior Member of the IEEE and a member of the ACM, and is currently an Associate Editor for IEEE Consumer Electronics Magazine, ACM’s Journal on Emerging Technologies, IET Computers and Digital Techniques, and the ETRI Journal. He is also currently serving as the Application Track Chair for the Design, Automation and Test in Europe (DATE) Conference.

==========================

Watch on YouTube:
Venkat Rangan
Theocharis Theocharides

Download presentation slides:
Venkat Rangan
Theocharis Theocharides

Feel free to ask your questions on this thread and keep the conversation going!

@Venkat - Here are the questions that we couldn’t address during the talk, grouped by topic

  1. Power
    a) How does power consumption compare between FPGA, GPU, TPU, etc.?
    b) What metrics do you use to estimate how much power to expect from energy harvesting with solar cells?
    c) Would years of use be a good metric for power consumption?

  2. Communications
    a) What kinds of communication are suitable or available during low-power sleep?
    b) Do you have a version of the power/pie chart analysis with BLE?
    c) Have you tried communication with MQTT?

  3. Cameras
    a) Did you capture 3D data for this image sensing application? What kind of camera did you use, and how did you weigh the trade-offs between different cameras?
    b) What is the difference between your motion detection and a DVS camera?

  4. tinyVision platform
    a) Do you have a dev/sample platform and environment that someone can take and start on their own?
    b) What would be the target price for the tinyVision’s Vision FPGA SoM?

  5. Misc.
    a) You mentioned that off-the-shelf (OTS) components may not work for such constrained applications, but OTS is fairly critical for deployed systems, since too many custom systems drive up costs. What are your thoughts on this?

@ttheocharides - Here are the questions that we couldn’t address during the talk

  1. Did you consider the ZED Stereo Camera?
  2. What are your thoughts on Voronoi cubes for depth estimation?
  3. What kind of dataset did you consider using? Was the MS COCO dataset ever under consideration?
  4. If you do inference with a sliding-window approach based on movement, how do you train the model? Do you just crop all training images to tiny windows based on the edge detectors?
  5. Have you explored NNs that operate directly on the compressed video stream? Do you see this as a viable option for reducing the host processor workload?
  6. Very interesting, but does it only work during the day, or can that be solved using night-vision cameras?
  7. Would making the algorithms or hardware more efficient become unnecessary if research produced small batteries with very large capacity?

Hi everyone. My name is Christos Kyrkou, I am one of the lead authors of the research presented by Prof. Theocharides. First of all, thanks for hosting and for gathering these interesting questions. I will try as best I can to address them here:

  1. Did you consider the ZED Stereo Camera?
  • We used the Bumblebee®2 FireWire camera and an FPGA-based stereo module. The approaches shown in the presentation are not constrained to a specific stereo camera.
  2. What are your thoughts on Voronoi cubes for depth estimation?
  • They seem to be quite good at smoothing the map. Not sure about run-time and added complexity, though.
  3. What kind of dataset did you consider using? Was the MS COCO dataset ever under consideration?
  • Our use-cases focus on footage captured from drones, so we constructed a custom dataset. From some experiments that we carried out, MS COCO pretraining does not help much, since we have different object scales and viewpoints.
  4. If you do inference with a sliding-window approach based on movement, how do you train the model? Do you just crop all training images to tiny windows based on the edge detectors?
  • In this case we train the model as normal with a dataset of patches from each class. Then, only during inference, we take advantage of motion to decide which regions to classify and which not to (a rough sketch of this inference-time gating follows after this list).
  5. Have you explored NNs that operate directly on the compressed video stream? Do you see this as a viable option for reducing the host processor workload?
  • Not ourselves, but I think there are some works that do classification on JPEG-compressed images. It definitely seems like an interesting path worth exploring.
  6. Very interesting, but does it only work during the day, or can that be solved using night-vision cameras?
  • As with the human visual system, standard cameras need light, so deployment at night is more difficult. Of course, there are other types of cameras that sense different ranges of the spectrum, which might provide more robust input. With regard to the data modality, we only need to retrain the models; otherwise the algorithms are agnostic to it.
  7. Would making the algorithms or hardware more efficient become unnecessary if research produced small batteries with very large capacity?
  • It is not only a matter of having enough battery. A processor has limited capacity, so if you want to run multiple inference models/algorithms, each one should be optimized. There is also the issue of heat: if you run a complex algorithm that constantly uses all the hardware at maximum frequency, then in the long run you risk higher degradation and need to account for heat dissipation. If your algorithm is lightweight and you can make a hardware accelerator for it, it will be much more efficient in all these aspects.
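For question 4 above, here is a minimal, hedged Python/OpenCV sketch of motion-gated patch classification, not the exact pipeline from the talk: frame differencing proposes candidate regions, and only those patches are passed to the patch classifier. `classify_patch`, the patch size, and the thresholds are illustrative assumptions.

```python
# Hedged sketch (assumes OpenCV 4.x): a classifier trained on fixed-size
# patches is applied at inference time only to regions flagged by simple
# frame differencing. `classify_patch` stands in for any patch-level CNN.

import cv2


def moving_regions(prev_gray, gray, min_area=100):
    """Return bounding boxes of regions that changed between two frames."""
    diff = cv2.absdiff(prev_gray, gray)
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area]


def classify_moving_patches(prev_frame, frame, classify_patch, size=64):
    """Run the patch classifier only on regions flagged by motion."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    results = []
    for x, y, w, h in moving_regions(prev_gray, gray):
        patch = cv2.resize(frame[y:y + h, x:x + w], (size, size))
        results.append(((x, y, w, h), classify_patch(patch)))
    return results
```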

I hope that I have at least partially addressed some of the questions. Please do not hesitate to contact me in case you need any more information or are interested in future collaboration!