There is a need for easy-to-use tools. Instead of separate training and optimization tools, it is desirable to have a single integrated tool that takes the training data and produces simple C code for a deep net that runs as-is on a low-end MCU or DSP.
The libraries from Google, ST, NXP, and ARM for MCU-based inference are currently far more cumbersome than they need to be.
Another important issue has to do with training objectives. Currently, most approaches first train a high-precision network and then convert it, for example through quantization or pruning, into a version that runs on simpler hardware. There is a need for training regimes with dual-purpose objective functions that reduce the prediction error and the weight-matrix complexity simultaneously, starting from the very first epoch.
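As a rough illustration of such a dual-purpose objective, the sketch below trains a tiny linear model on the sum of a prediction-error term and a complexity term, and applies both from the first epoch. Several things here are assumptions made for the example, not something the text prescribes: the L1 norm of the weights stands in for "weight-matrix complexity", the model is linear rather than a deep net, and the names `train`, `soft_threshold`, and `lam` are hypothetical.

```python
import random


def soft_threshold(w, t):
    """Shrink w toward zero by t (proximal step for an L1 penalty)."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def train(xs, ys, lam, lr=0.1, epochs=500):
    """Minimize mean squared error + lam * sum(|w|) for a linear model.

    Assumption: the L1 norm is used as the complexity measure.
    Both terms are active from the very first epoch.
    """
    n = len(xs)
    n_features = len(xs[0])
    w = [0.0] * n_features
    for _ in range(epochs):
        # Gradient of the prediction-error term (mean squared error).
        grad = [0.0] * n_features
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j in range(n_features):
                grad[j] += 2.0 * err * x[j] / n
        # Gradient step on the error, then shrinkage for the complexity term.
        w = [soft_threshold(wi - lr * gi, lr * lam) for wi, gi in zip(w, grad)]
    return w


# Synthetic data: four input features, only the first two affect the target.
random.seed(0)
xs = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(50)]
ys = [2.0 * x[0] - 1.0 * x[1] for x in xs]

dense = train(xs, ys, lam=0.0)   # prediction error only
sparse = train(xs, ys, lam=0.1)  # prediction error + complexity penalty

print("error only:        ", [round(v, 3) for v in dense])
print("error + complexity:", [round(v, 3) for v in sparse])
```

The penalized run trades a small amount of prediction accuracy for weights with a smaller total magnitude. In a real setting the complexity term could instead measure bit width or the number of nonzero weights, and the weighting factor `lam` would need tuning for the target hardware.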