tinyML Talks on March 9, 2021 “Positive Unlabeled Learning for Tiny ML” by Kristen Jaskie

We held our next tinyML Talks webcast. Kristen Jaskie from Arizona State University presented Positive Unlabeled Learning for Tiny ML on March 9, 2021.


Real-world data is often only partially labeled. Because completely labeling data can be expensive or even impossible in some cases, a common scenario involves having only a small number of labeled samples from the class of interest and a large quantity of unlabeled, unknown data. A classification boundary differentiating the underlying positive and negative classes is still desired. This is known as the Positive and Unlabeled learning problem, or PU learning, and it is of growing importance in machine learning. Fortunately, PU learning algorithms exist that can create effective models with low power and memory requirements. In this talk, Ms. Jaskie will present several potential embedded applications for PU learning and describe how sensors, tinyML, and PU learning complement one another. In addition, she will describe low-complexity solutions and explain why these techniques are so effective and in growing demand.
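For readers who want a concrete picture of the PU setting, here is a minimal sketch (not taken from the talk) of one common baseline in the style of the Elkan and Noto rescaling approach: a classifier is trained to separate labeled-positive from unlabeled samples, and its scores are rescaled by an estimate of the label frequency c = P(labeled | positive). The synthetic data and all parameter choices below are illustrative assumptions.

```python
# Minimal PU-learning sketch (illustrative, not from the talk): Elkan & Noto-style
# rescaling. We only know some positives; everything else is unlabeled.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data: positives around +2, negatives around -2 (assumed for illustration).
X_pos = rng.normal(+2.0, 1.0, size=(500, 2))
X_neg = rng.normal(-2.0, 1.0, size=(500, 2))
X = np.vstack([X_pos, X_neg])

# Only 20% of positives are labeled; everything else is "unlabeled".
s = np.zeros(len(X), dtype=int)                      # s = 1 means "labeled positive"
labeled_idx = rng.choice(len(X_pos), size=100, replace=False)
s[labeled_idx] = 1

# Step 1: train a "non-traditional" classifier to predict s (labeled vs. unlabeled).
X_tr, X_val, s_tr, s_val = train_test_split(X, s, test_size=0.3, random_state=0)
g = LogisticRegression().fit(X_tr, s_tr)

# Step 2: estimate c = P(s=1 | y=1) as the mean score on held-out labeled positives.
c = g.predict_proba(X_val[s_val == 1])[:, 1].mean()

# Step 3: rescale to approximate P(y=1 | x) ~ P(s=1 | x) / c, clipped to [0, 1].
p_pos = np.clip(g.predict_proba(X)[:, 1] / c, 0.0, 1.0)
print("estimated label frequency c:", round(float(c), 3))
```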

Kristen Jaskie is a Ph.D. student in Electrical Engineering in the ECEE school at ASU and a research associate with SenSIP. She received her B.S. in Computer Science from the University of Washington and her M.S. in Computer Science, specializing in AI and Machine Learning (ML), from the University of California, San Diego. Kristen's main areas of interest are ML algorithm development and ML education. Specific interests include semi-supervised learning and the positive unlabeled learning problem. She is writing a monograph on the subject to be published later this year. In addition, Kristen owns her own consulting company and was a faculty member and department chair in Computer Science at Glendale Community College in Glendale, AZ for several years before returning to school to complete her Ph.D. She expects to graduate in Spring 2021.

==========================

Watch on YouTube:
Kristen Jaskie

Download presentation slides:
Kristen Jaskie

Feel free to ask your questions on this thread and keep the conversation going!

Answers to Q&A questions from the webinar:
Q: Doesn’t PU lead to more false positives?
A: It depends on the dataset and on (a) how separable the positive and negative classes are in the available feature space and (b) how representative the labeled positive samples are of the unlabeled positive samples. In a situation with substantial label bias, this can lead to false positives, but it is actually more common for label bias to lead to false negatives, as stronger positives are usually more likely to be labeled than weaker positives. With reasonably separable classes and a reasonable label distribution, false positives are no more common than with any supervised learning algorithm.

Q: Are unlabeled data and false negatives (Type II errors) related?
A: I’m not entirely sure what’s being asked here, but PU learning is very useful when a medical or other test has a high false-negative rate and a low false-positive rate, which is fairly common in the medical field. In that scenario, a negative result cannot be trusted and becomes an “unknown,” while positive results can be trusted and remain positives, giving us the PU learning problem. If you’re asking about false negatives resulting from a PU classification, these can be a problem if substantial label bias exists. Typically, label bias manifests as weakly positive samples being less likely to be labeled than strongly positive samples. This can make a PU algorithm believe the positive/negative threshold lies further into the positive distribution than it really does, resulting in false negatives. In a situation with a known label bias, tuning the threshold would be required, as in the sketch below. I hope I answered your question!
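To make the "tune the threshold" remark concrete, here is a tiny hypothetical sketch (not from the webinar): when a known bias pushes the learned boundary into the positive distribution, lowering the decision threshold below 0.5 trades a few more false positives for fewer false negatives. The scores and threshold values below are made up for illustration.

```python
# Hypothetical threshold-tuning sketch: turn probabilistic scores into hard labels
# at a chosen cut-off; a lower cut-off reduces false negatives under label bias.
import numpy as np

def classify(scores, threshold=0.5):
    """Convert scores P(y=1|x) into hard 0/1 labels at the given threshold."""
    return (np.asarray(scores) >= threshold).astype(int)

scores = np.array([0.15, 0.35, 0.45, 0.55, 0.80])   # assumed model outputs
print(classify(scores, threshold=0.5))               # default cut-off
print(classify(scores, threshold=0.35))              # more lenient: fewer false negatives
```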

Answers to Q&A questions from the webinar:
Q: We often don’t know the distributions when we are solving a practical problem with multiple features. The first assumption could affect the final results substantially; how do we tackle this?
A: The first assumption was that the positive and negative classes are at least partially separable in the feature space provided. This assumption is made implicitly by every supervised or semi-supervised algorithm in existence; it’s just rarely stated explicitly :). Effective classification is not possible without it. Imagine a two-dimensional dataset with completely mixed data, such that you couldn’t draw even a wiggly line to reasonably separate the classes. If you can’t do it, a computer certainly isn’t going to be able to. So it is generally assumed for any classification problem that the classes are separable, and if a classification attempt has consistently low accuracy, then either more data points or more features are typically required. A small illustration follows below.
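As a small illustration of this point (my own example, not from the talk): if the labels have no relationship to the available features, cross-validated accuracy sits at chance level no matter which classifier is used.

```python
# Completely mixed classes in the available feature space: no classifier beats chance.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))            # 2-D features
y = rng.integers(0, 2, size=1000)         # labels with no relation to the features

acc = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
print(f"cross-validated accuracy: {acc:.2f}")   # hovers around 0.5 (chance level)
```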

Q: How much would the results you showed be impacted when the second assumption is not true? Basically, have you tried non-random sampling in the datasets whose results you showed?
A: Short answer: This is really important and I haven’t done it yet - nobody has really done it yet.

Long answer:
This is an excellent question and an important next step. At the moment, it is difficult to do because there is no real standard way to do it. I’m currently working on a set of benchmarking datasets and evaluation metrics that I hope will allow PU learning algorithms to be more easily compared with one another. Building simulated datasets with realistic bias is non-trivial, though not really difficult - I just haven’t done it yet :). Bias in real-world datasets often causes strongly positive samples to be more likely to be labeled than weakly positive samples. A person with advanced diabetes is more likely to be diagnosed than a person with early diabetes, who is really the person we’re trying to diagnose with our model. Even a labeled diabetes dataset may not be effective, as some borderline cases may have gone undiagnosed when the person actually should have been marked positive.

For a dataset such as MNIST, I would need to label only the digits that were most clearly a 3, with the best penmanship. This could be done by creating an “average” or model of the 3 and then checking how different each sample is from that model. Only samples similar to the model would be labeled. This would need to be verified by hand, which is time-consuming, as there are roughly 6,000 images of the digit 3.
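A rough sketch of that labeling procedure, assuming the MNIST digits are pulled through scikit-learn’s fetch_openml (the 10% cutoff and the use of Euclidean distance are illustrative choices, not anything from the talk):

```python
# Biased labeling sketch for MNIST: build an "average 3" template and label only
# the 3s closest to it; every other sample becomes unlabeled.
import numpy as np
from sklearn.datasets import fetch_openml

mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X, y = mnist.data / 255.0, mnist.target

threes = X[y == "3"]                       # all images of the digit 3
template = threes.mean(axis=0)             # the "average 3"

# Distance of each 3 from the template; smaller = cleaner penmanship (by this proxy).
dist = np.linalg.norm(threes - template, axis=1)

# Label only the closest 10% of 3s as positive; the rest stay unlabeled.
cutoff = np.quantile(dist, 0.10)
labeled_mask = dist <= cutoff
print(f"{labeled_mask.sum()} of {len(threes)} threes labeled as positive")
```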

Some researchers in Belgium are working on a way to allow bias in some but not all features. I could and should test that, though I haven’t done so yet. This would be like saying that only MNIST 3s whose pixel at the 15th column and 4th row is really dark should be labeled - the labeling is biased based on that feature.
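And a companion sketch of that kind of feature-dependent labeling bias (again my own illustration; the pixel position and the 0.8 intensity cutoff are arbitrary assumptions):

```python
# Feature-dependent labeling bias sketch: among the MNIST 3s, label only the samples
# whose pixel at column 15, row 4 contains ink (value near 1 after scaling).
import numpy as np
from sklearn.datasets import fetch_openml

mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X, y = mnist.data / 255.0, mnist.target
threes = X[y == "3"]                       # all images of the digit 3

row, col = 4, 15                           # "4th pixel down, 15th pixel over"
pixel = threes[:, row * 28 + col]          # 28x28 images, flattened row-major

# The labeling decision now depends on a single feature: a feature-biased process.
biased_labeled_mask = pixel > 0.8
print(f"{biased_labeled_mask.sum()} of {len(threes)} threes labeled under the pixel-based bias")
```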