SATLAB - An End to End framework for Labelling Satellite Images

Aggarwal, Shantanu

In this work, I propose a novel, unsupervised framework titled SATLAB, to label satellite images, given a classification task at hand. Existing models for satellite image classification such as DeepSAT and DeepSAT-V2 rely on deep learning models that are label-hungry…

In this work, I propose a novel, unsupervised framework titled SATLAB, to label satellite images, given a classification task at hand. Existing models for satellite image classification such as DeepSAT and DeepSAT-V2 rely on deep learning models that are label-hungry and require a significant amount of training data. Since manual curation of labels is expensive, I ensure that SATLAB requires zero training labels. SATLAB can work in conjunction with several generative and unsupervised machine learning models by allowing them to be seamlessly plugged into its architecture. I devise three operating modes for SATLAB - manual, semi-automatic and automatic which require varying levels of human intervention in creating the domain-specific labeling functions for each image that can be utilized by the candidate generative models such as Snorkel, as well as other unsupervised learners in SATLAB. Unlike existing supervised learning baselines which only extract textural features from satellite images, I support the extraction of both textural and geospatial features in SATLAB, and I empirically show that geospatial features enhance the classification F1-score by 33%. I build SATLAB on the top of Apache Sedona in order to leverage its rich set of spatial query processing operators for the extraction of geospatial features from satellite raster images. I evaluate SATLAB on a target binary classification task that distinguishes slum from non-slum areas, upon a repository of 100K satellite images captured by the Sentinel satellite program. My 5-Fold Cross Validation (CV) experiments show that SATLAB achieves competitive F1-scores (0.6) using 0% labeled data while the best baseline supervised learning baseline achieves 0.74 F1-score using 80% labeled data. I also show that Snorkel outperforms alternative generative and unsupervised candidate models that can be plugged into SATLAB by 33% to 71% w.r.t. F1-score and 3 times to 73 times w.r.t. latency. I also show that downstream classifiers trained using the labels generated by SATLAB are comparable in quality (0.63 F1) to their counterpart classifiers (0.74 F1) trained on manually curated labels.

Copyright Statement