Benchmarking and Improving NN Execution on Digital Signal Processor for Hearing Instruments

Here are the questions and answers from the tinyML talk "Benchmarking and Improving NN Execution on Digital Signal Processor for Hearing Instruments" by Zuzana Jelcicova at Demant.

  1. On slide 14: If the network is not trained to the leading edge of state-of-the-art, how representative are the presented numbers? Could it be that a very well-trained network with higher accuracy would already lose significant accuracy at 8 bits?
    [Answer] Of course, a bigger/smaller gap between the baseline model and its 8-bit quantized version could occur for a well-trained network. However, we would not expect a significant loss – rather the same trend, where (suitable) post-training quantization results in only a small degradation in model accuracy.
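
    For readers unfamiliar with post-training quantization, here is a minimal sketch of symmetric 8-bit weight quantization in Python. It is only an illustration of the general idea; the tensor shapes and function names are made up and do not reflect the speaker's actual pipeline.

    ```python
    import numpy as np

    def quantize_int8(w):
        """Symmetric per-tensor post-training quantization of a float array to int8."""
        scale = np.max(np.abs(w)) / 127.0                     # map the largest magnitude to 127
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        """Recover an approximate float tensor from its int8 representation."""
        return q.astype(np.float32) * scale

    # Example: measure the quantization error on random weights
    w = np.random.randn(256, 64).astype(np.float32)
    q, s = quantize_int8(w)
    print("max abs error:", np.max(np.abs(w - dequantize(q, s))))
    ```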

  2. My question is on acquiring datasets. Please which is the best place to get Rainfall datasets?
    [Answer] We do not work with rainfall datasets, so unfortunately I cannot give you an answer (you could, though, try searching on Kaggle).

  3. Does the Q5.19 processing force your implementation into custom logic (FPGA/GA), or have you run it pro forma on general-purpose processing too (in simulation)? If a general-purpose µP is possible, are there libraries?
    [Answer] The vector math supporting the 4x Q5.19 format is implemented directly in our DSP. We did not use external libraries – we implemented the Q-format processing ourselves.
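
    As a rough sketch of what Q5.19 fixed-point arithmetic involves, the Python snippet below converts floats to a signed word with 19 fractional bits and performs a multiply-accumulate. The 24-bit word size and saturation behaviour are assumptions for illustration; the exact bit allocation in the speaker's DSP may differ.

    ```python
    FRAC_BITS = 19                                    # Q5.19: 19 fractional bits
    WORD_MIN, WORD_MAX = -(1 << 23), (1 << 23) - 1    # signed 24-bit range (assumed here)

    def to_q519(x):
        """Convert a float to a saturated Q5.19 integer."""
        return max(WORD_MIN, min(WORD_MAX, int(round(x * (1 << FRAC_BITS)))))

    def from_q519(q):
        """Convert a Q5.19 integer back to float."""
        return q / float(1 << FRAC_BITS)

    def mac_q519(acc, a, b):
        """Multiply-accumulate: the raw product carries 38 fractional bits,
        so shift right by FRAC_BITS to return to Q5.19 before accumulating."""
        prod = (a * b) >> FRAC_BITS
        return max(WORD_MIN, min(WORD_MAX, acc + prod))

    a, b = to_q519(1.5), to_q519(-0.25)
    print(from_q519(mac_q519(0, a, b)))               # -0.375
    ```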

  4. When comparing energy on slide 30, what is the clock speed?
    [Answer] The energy numbers are estimated based on Horowitz’s paper [1] for a 45 nm process, so the clock frequency is not needed. Only the total number of MACs and memory accesses is required for the calculations.
    [1] M. Horowitz, “1.1 Computing’s energy problem (and what we can do about it),” in 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), vol. 57, Feb. 2014, pp. 10–14.
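
    For readers who want to reproduce this kind of estimate, here is a minimal sketch of the calculation. The per-operation energies below are illustrative placeholders in the general range reported in [1], not the exact values used on slide 30; consult the paper for the actual figures.

    ```python
    # Illustrative per-operation energies (pJ) for a 45 nm process, roughly in the
    # range commonly quoted from Horowitz [1]; these are assumptions, not the
    # numbers from the talk.
    ENERGY_PJ = {
        "mac_8bit":  0.03 + 0.2,   # 8-bit add + 8-bit multiply
        "mac_32bit": 0.1 + 3.1,    # 32-bit add + 32-bit multiply
        "sram_read": 10.0,         # on-chip SRAM access (size-dependent)
        "dram_read": 1300.0,       # off-chip DRAM access, orders of magnitude costlier
    }

    def estimate_energy_pj(n_macs, n_sram, n_dram, mac_kind="mac_8bit"):
        """Estimate total energy from operation counts alone (no clock frequency needed)."""
        return (n_macs * ENERGY_PJ[mac_kind]
                + n_sram * ENERGY_PJ["sram_read"]
                + n_dram * ENERGY_PJ["dram_read"])

    # Example: a layer with 1M MACs, 2M SRAM accesses, and 10k DRAM accesses
    print(estimate_energy_pj(1_000_000, 2_000_000, 10_000) / 1e6, "uJ")
    ```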

  5. Can you rank the optimizations with respect to potential energy efficiency improvement?
    [Answer] Each of the optimizations brings significant improvements and complements the other two well. Regarding the ranking, we could keep them in the given order, i.e.: 1) reduced wordlength, 2) parallel MACs, 3) two-step scaling.

  6. Are there any results to compare such hardware for the Feature Extraction (cepstrum domain transformation) stage?
    [Answer] We did not include the MFCC feature extraction in the results for this experiment. If you are interested in comparing the efficiency of different implementations in general, there are many works that report the power required to execute the feature extraction stage.

  7. Can you provide more context regarding your use case of the keyword spotting model in a hearing aid?
    [Answer] Keyword spotting could be used for many actions, e.g. adjusting settings or changing environments – without using a smartphone. This would be especially useful in situations that require the user’s full attention, and/or where interaction with a smartphone is difficult or not possible.

  8. What hardware is used in this practice?
    [Answer] We have not deployed our accelerator on any device yet – if that is what you are asking.

  9. What kind of personalization do we need to learn on a hearing aid?
    [Answer] Every user has their own needs and preferences when it comes to hearing instrument settings. Therefore, it is important to tweak the settings individually for every person. In order to do so, the user must go through the fitting and adjustment stages carried out by a hearing specialist. You can, therefore, imagine the advantages that neural network learning can add to the process of fitting the hearing instrument to the preferences of the individual user. Ideally, you would like a hearing instrument that is capable of adjusting seamlessly to any environment without the user even noticing, and of performing e.g. noise reduction, speech separation, etc. (something the human auditory brain of people with normal hearing does without much effort).