tinyML Talks on April 27, 2021 “Train-by-weight (TBW): Accelerated Deep Learning by Data Dimensionality Reduction” by Xingheng Lin and Michael Jo

We held our next tinyML Talks webcast. Michael Jo and Xingheng Lin from Rose-Hulman Institute of Technology presented Train-by-weight (TBW): Accelerated Deep Learning by Data Dimensionality Reduction on April 27, 2021.

State-of-the-art pretrained machine/deep learning (M/DL) models are available in the tinyML community for numerous applications. However, training these models for new objects, or retraining the pretrained models, is computationally expensive.
Our proposed Train-by-Weight (TBW) approach combines a linear classifier, such as principal component analysis (PCA), with a nonlinear classifier, such as a deep learning model. There are two key contributions in this approach. First, we perform dimensionality reduction by generating weighted data sets using linear classifiers. Second, the weighted data sets supply only the essential data to the M/DL model. As a result, we reduced training and verification time by up to 88% in a deep artificial neural network model, with approximately 1% accuracy loss.
The tinyML community may benefit from the proposed approach through faster training of M/DL models, since the reduced data require less bandwidth. Moreover, its relatively simple architecture may enable energy-efficient hardware/software solutions.
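To make the pipeline concrete, here is a minimal NumPy sketch of the dimensionality-reduction step described above. This is not the presenters' code; the 28×28 input size, sample count, and random data are assumptions for illustration only.

```python
import numpy as np

def pca_reduce(X, k):
    """Project each flattened image onto its top-k principal components,
    producing the lower-dimensional 'weighted' data fed to the M/DL model."""
    Xc = X - X.mean(axis=0)                  # center the data
    # SVD of the centered data; rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                     # shape: (n_samples, k)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 28 * 28))      # stand-in for flattened images
X_small = pca_reduce(X, 100)                 # e.g. 28x28 -> 10x10 weighted data
print(X_small.shape)                         # (500, 100)
```

The reduced array then replaces the original images as input to the (now smaller) neural network, which is where the training-time savings come from.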

Xingheng Lin was born in Jiangxi Province, China, in 2000. He is currently pursuing the B.S. degree in computer engineering at Rose-Hulman Institute of Technology. His primary research interests are Principal Component Analysis-based machine learning and deep learning acceleration. Besides his primary research project, Xingheng is currently working on pattern recognition of rapid saliva COVID-19 test responses, a collaboration with 12-15 Molecular Diagnostics.

Michael Jo received his Ph.D. in Electrical and Computer Engineering in 2018 from the University of Illinois at Urbana-Champaign. He is currently an assistant professor at Rose-Hulman Institute of Technology in the department of Electrical and Computer Engineering. His current research interests are accelerated embedded machine learning, computer vision, and integration of artificial intelligence and nanotechnology.


Watch on YouTube:
Michael Jo and Xingheng Lin

Download presentation slides:
Michael Jo and Xingheng Lin

Feel free to ask your questions on this thread and keep the conversation going!


How much time do you need to perform PCA for the entire dataset? When you share the speed improvement, do you account for the time needed to perform PCA as well?

“This is a great question, and we apologize we did not include this in our calculation, as it was negligible. The preprocessing took between 0.3 s and 0.61 s. Percentage-wise, it was between 0.36% and 0.9%: Percentage = (T_preprocessing × 100%) / (T_preprocessing + T_converging). [The data is attached below]”
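The overhead formula from the answer can be checked with hypothetical timings; the 0.45 s and 60 s values below are assumed (chosen to fall inside the reported 0.3-0.61 s range), not numbers from the talk.

```python
# Hypothetical timings; only the formula itself comes from the answer above.
t_preprocessing = 0.45        # assumed PCA preprocessing time (s)
t_converging = 60.0           # assumed training time to convergence (s)
percentage = t_preprocessing * 100.0 / (t_preprocessing + t_converging)
print(f"{percentage:.2f}%")   # prints "0.74%", inside the reported 0.36%-0.9%
```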

Were these stats on the previous slide averaged across multiple training sessions?
Yes, it is the average of five training sessions.

Could it be possible to embed one more hidden layer next to the input layer (i.e., a first hidden layer) in the original CNN that projects the original image data into a lower dimension (say, from an input of 256^2 down to 10^2)? Then you wouldn't have to do PCA preprocessing on the data separately.
That may be possible. However, we did not consider changing the structure of the deep learning model. Our idea was an algorithmic approach: simply combining linear and nonlinear classifiers to achieve acceleration. The suggested idea sounds related to PCANet: DOI: 10.1109/TIP.2015.2475625
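The questioner's idea can be sketched by folding the PCA projection into a fixed first layer, so the reduction happens inside the network rather than as a preprocessing pass. This is an illustration of the suggestion, not something from the talk; the helper name and shapes are made up.

```python
import numpy as np

def pca_layer_weights(X, k):
    """Hypothetical helper: derive fixed (non-trainable) first-layer weights
    from the top-k PCA directions of the training data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T                          # shape: (input_dim, k)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 64))           # assumed flattened input data
W = pca_layer_weights(X, 10)
hidden = (X - X.mean(axis=0)) @ W            # same output as PCA preprocessing
print(hidden.shape)                          # (200, 10)
```

The matrix multiply by `W` is mathematically identical to the PCA projection, so either placement gives the same reduced representation; the difference is purely where the operation lives.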

How sensitive is your method to adversarial examples?

We have not measured the sensitivity yet. Speed-wise, our method depends only on the data size, so the acceleration will remain effective.

Have you compared the results of the PCA-based CNN classification with an equivalent size reduction only using downsampling into a CNN?

We have not run that comparison yet, but we believe our method will take less time: it reduces the data size while extracting the features with the highest variance, which we believe helps deep learning models converge faster.
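One way to see why PCA may beat plain downsampling at equal output size: among all orthonormal linear projections to k dimensions, the top-k PCA projection retains the most variance. A small NumPy check on synthetic data (the data and sizes are assumptions; the average-pooling is rescaled to be orthonormal so the comparison is fair):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((300, 64))           # synthetic flattened "images"
Xc = X - X.mean(axis=0)

# Variance retained by the top-16 principal components
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
pca_retained = (s[:16] ** 2).sum() / (s ** 2).sum()

# Orthonormal 4-to-1 "downsampling": sum each group of 4 pixels / sqrt(4)
X_ds = Xc.reshape(300, 16, 4).sum(axis=2) / 2.0
ds_retained = (X_ds ** 2).sum() / (Xc ** 2).sum()

# PCA never retains less variance than any other orthonormal projection
print(pca_retained >= ds_retained)           # True
```

Retained variance is only a proxy; whether it translates into faster convergence on a given dataset is the open experimental question in the answer above.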

Do you think this method can be extended to other applications like object detection, segmentation, etc.?

The direct application may be object detection. We are certain this could also be extended to segmentation or clustering.

In addition to PCA, have you considered other dimensionality reduction techniques, such as projection pursuit? My hunch is that it is essentially quite similar to the idea I mentioned in my previous question.

Yes, linear methods such as ICA and NMF can also be used, as they allow dimensionality reduction and produce weighted data as a result. However, the acceleration might not be the same; NMF, for example, is a heuristic method.

How does this method perform on other datasets such as CIFAR-100, COCO, or others? Did you evaluate on other datasets?

Yes, we can apply it to other data sets, and we did apply it to CIFAR-100.

How was the 10^2 size selected? Can you use the statistical information provided by PCA to automatically choose the reduced dimension size?

The 10-by-10 size is determined by the number of principal components. We may choose any square number, and the deep learning model will shrink accordingly. However, we cannot make it too small, or too many features are lost.
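On the auto-selection part of the question: the cumulative explained-variance ratio from PCA can pick the component count automatically. A minimal sketch; the 95% threshold and the synthetic nearly rank-5 data are assumptions for illustration, not from the talk.

```python
import numpy as np

def choose_k(X, target=0.95):
    """Smallest k whose top-k components explain `target` of the variance."""
    Xc = X - X.mean(axis=0)
    _, s, _ = np.linalg.svd(Xc, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(ratio, target) + 1)

rng = np.random.default_rng(3)
# Nearly rank-5 data: 5 strong directions plus a little noise
X = rng.standard_normal((400, 5)) @ rng.standard_normal((5, 50))
X += 0.01 * rng.standard_normal((400, 50))
print(choose_k(X))   # a small k (at most 5 for this nearly rank-5 data)
```

In the TBW setting one would then round the chosen k up to the nearest square (e.g. 100) so the weighted data can still be arranged as a square input.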

Does this method also allow reducing the number of images, and not just their dimensionality?

That is possible, but beyond producing smaller images, we are not certain the method can reduce the number of images needed for training.