Q&A for the talk: Enabling Neural Networks at the Low-Power Edge: A Neural Network Compiler for Hardware-Constrained Embedded Systems, by Chao Xu

  • Besides using TFLite models as a starting point, are there plans to **directly** support MATLAB?
    Ans: Yes. In about 3 months.

  • For multi-device optimization, how does TENSAI take care of data dependencies between different layers?
    Ans: For inference, data usually flows forward from input to output. The compiler handles the data pipeline with DMA in the background. If there are dependencies among multiple layers, the TENSAI compiler waits for all data sources to be available before moving to the next layer.
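The wait-for-all-inputs behavior described in the answer amounts to executing layers in dependency order. A minimal sketch of such a scheduler follows; the function and layer names are illustrative, not the TENSAI compiler's actual API.

```python
# Minimal sketch: run each layer only after all of its input layers have
# produced their data. Names (schedule, deps) are illustrative.
from collections import deque

def schedule(layers, deps):
    """Return layers in an order where every layer follows its inputs.

    layers: iterable of layer names
    deps:   dict mapping a layer to the layers it consumes data from
    """
    remaining = {l: set(deps.get(l, ())) for l in layers}
    ready = deque(l for l, d in remaining.items() if not d)
    order = []
    while ready:
        layer = ready.popleft()
        order.append(layer)
        for other, inputs in remaining.items():
            if layer in inputs:
                inputs.remove(layer)
                if not inputs and other not in order and other not in ready:
                    ready.append(other)
    if len(order) != len(remaining):
        raise ValueError("cycle in layer graph")
    return order
```

For a graph where an `add` layer consumes two parallel convolutions, `add` is scheduled only after both branches finish, mirroring the "wait for all data sources" rule in the answer.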

  • Can you support facial recognition under the same power consumption specifications?
    Ans: We have not implemented facial recognition yet, but it is our next effort. There will be some trade-off between power and accuracy. We could use it as an always-on visual wake-up application that triggers a more accurate NN for further processing.

  • Are all tensorflow operations supported?
    Ans: At this moment, we support TFLite kernels.

  • Is there any alignment with machine learning standards such as the Open Neural Network Exchange (ONNX)?
    Ans: Yes.

  • Which devices does the TENSAI compiler support? Is it supported only on Eta Compute devices, or on other devices too? Do you have any results comparing the TENSAI compiler with the Apache TVM open-source compiler?
    Ans: Currently we support Eta Compute devices (ECM3531, ECM3532). This can be extended to third-party silicon. TENSAI achieves deeper optimization through hardware and software co-optimization.

  • Are binary networks currently supported? XNOR operations?
    Ans: Yes.
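For context on why XNOR operations matter here: in binarized networks, a dot product between two {-1, +1} vectors reduces to a bitwise XNOR followed by a popcount, since `dot = agreements - mismatches = 2 * popcount(xnor) - n`. The sketch below illustrates that identity; it is a generic illustration, not Eta Compute's kernel.

```python
# Illustrative XNOR-popcount dot product used by binary neural networks.
# A {-1,+1} vector is packed into an integer: bit i = 1 iff element i = +1.

def pack(vec):
    """Pack a {-1,+1} vector into an int bitmask."""
    bits = 0
    for i, v in enumerate(vec):
        if v == 1:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two n-element {-1,+1} vectors packed as ints."""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)   # 1 where the bits agree
    matches = bin(xnor).count("1")               # popcount
    return 2 * matches - n                       # agreements minus mismatches
```

On hardware this replaces n multiply-accumulates with one XNOR and one popcount per word, which is why binary networks suit low-power devices.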

  • Thoughts on Mythic or Flexlogic devices?
    Ans: In the future.

  • Will TENSAI Flow be available for free? Any dev kits coming?
    Ans: Only for certain partners.

  • What is your experience with TFLite quantized models for LSTM/GRU-type networks? It seems TFLite has limited support for these…
    Ans: We apply additional, specialized optimizations on top of TFLite.
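For readers unfamiliar with the quantized models being discussed: TFLite-style integer quantization maps a float value to an int8 code via an affine relation, real ≈ scale × (q − zero_point). A minimal sketch of that scheme follows; the ranges and helper names are illustrative, not TFLite's internal API.

```python
# Minimal sketch of TFLite-style asymmetric int8 quantization:
#   real_value ≈ scale * (q - zero_point)
# quant_params derives scale/zero-point from a float range that is
# forced to include 0, so that 0.0 is exactly representable.

def quant_params(rmin, rmax, qmin=-128, qmax=127):
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))          # clamp to int8 range

def dequantize(q, scale, zero_point):
    return scale * (q - zero_point)
```

Recurrent cells like LSTM/GRU are harder to quantize this way because the gate activations and cell state each need their own scale, which is one reason TFLite's support for them is limited.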

  • Is the power number you gave measuring end-to-end power, or just the computation core?
    Ans: It is end-to-end; some I/O was turned off during the measurements.

  • Can you give more details about how you measured the power and energy?
    Ans: We measured the single VDD power pin and turned off the LED and GPIO I/Os. All supplies are generated inside the chip from this single external supply.

  • Are custom ops supported? If so, how?
    Ans: Yes.

  • What about segmentation, for example lane following? Would this be doable?
    Ans: We have not implemented this yet.

  • What is the RAM size for the people-counting demo using MobileNet v1?
    Ans: 256 KB SRAM.