Positive Unlabeled Learning - Optimization and Evaluation
Description
In many real-world machine learning classification applications, well-labeled training data can be difficult, expensive, or even impossible to obtain. In such situations, it is sometimes possible to label a small subset of data as belonging to the class of interest, though it is impractical to manually label all data not of interest. The result is a small set of positive labeled data and a large set of unknown, unlabeled data. This is known as the Positive and Unlabeled learning (PU learning) problem, a type of semi-supervised learning. In this dissertation, the PU learning problem is rigorously defined, several common assumptions are described, and a literature review of the field is provided. A new family of effective PU learning algorithms, the Modified Logistic Regression (MLR) family, is described. Theoretical and experimental justification for these algorithms is provided, demonstrating their success and flexibility. Extensive experimentation and empirical evidence are provided, comparing several new and existing PU learning evaluation estimation metrics in a wide variety of scenarios. The surprisingly clear advantage of a simple recall estimate as the best indicator of overall PU classifier performance is described. Finally, an application of PU learning to solar fault detection, an area where PU learning had not previously been explored, demonstrates the advantage and potential of PU learning in new application domains.
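To illustrate the PU setting and the recall-based evaluation idea mentioned above, the sketch below builds a toy PU dataset under the common selected-completely-at-random (SCAR) assumption and estimates recall using only the labeled positives. The dataset, the scikit-learn classifier, and the 20% labeling fraction are illustrative assumptions, not the dissertation's MLR algorithm.

```python
# Hedged sketch: a toy PU setup and a recall estimate computed from labeled
# positives only. The dataset, classifier, and labeling fraction below are
# illustrative assumptions, not the MLR method described in the dissertation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Fully labeled data, used only to simulate the PU scenario.
X, y_true = make_classification(n_samples=5000, n_informative=5, random_state=0)

# SCAR assumption: each true positive is labeled with the same probability c.
c = 0.2
s = np.where((y_true == 1) & (rng.random(len(y_true)) < c), 1, 0)  # s = 1: labeled positive

# Naive "non-traditional" classifier: treat all unlabeled examples as negatives.
clf = LogisticRegression(max_iter=1000).fit(X, s)
y_hat = clf.predict(X)

# Recall estimated from labeled positives alone -- no negative labels required.
# Under SCAR, the labeled positives are a random sample of all positives, so
# this approximates the true recall P(y_hat = 1 | y = 1).
recall_pu = y_hat[s == 1].mean()
true_recall = y_hat[y_true == 1].mean()  # available only in this simulation
print(f"PU recall estimate: {recall_pu:.3f}  true recall: {true_recall:.3f}")
```

Because the labeled positives are assumed to be drawn uniformly from all positives, recall can be estimated without any labeled negatives, which is why a simple recall estimate is attractive for evaluating PU classifiers.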
Date Created
The date the item was originally created (prior to any relationship with the ASU Digital Repositories).
2021
Agent
- Author (aut): Jaskie, Kristen P
- Thesis advisor (ths): Spanias, Andreas
- Committee member: Blain-Christen, Jennifer
- Committee member: Tepedelenlioğlu, Cihan
- Committee member: Thiagarajan, Jayaraman
- Publisher (pbl): Arizona State University