A Study on Generative Adversarial Networks Exacerbating Social Data Bias

Description
Generative Adversarial Networks (GANs) are designed, in theory, to replicate the distribution of the data they are trained on. Under real-world limitations, such as finite network capacity and training set size, they inevitably suffer an as yet unavoidable technical failure: mode collapse. GAN-generated data is not nearly as diverse as the real-world data the network is trained on; this work shows that the effect is especially drastic when the training data is highly non-uniform. Specifically, GANs learn to exacerbate the social biases that exist in the training set along sensitive axes such as gender and race. In an age where many datasets are curated from web and social media data (which are almost never balanced), this has dangerous implications for downstream tasks that use GAN-generated synthetic data, such as data augmentation for classification. This thesis presents an empirical demonstration of this phenomenon and illustrates its real-world ramifications. It starts by showing that when asked to sample images from an illustrative dataset of engineering faculty headshots from 47 U.S. universities, which is unfortunately skewed toward white males, a DCGAN’s generator “imagines” faces with light skin colors and masculine features. In addition, this work verifies that the generated distribution diverges more from the real-world distribution when the training data is non-uniform than when it is uniform. This work also shows that a conditional variant of the GAN is not immune to exacerbating sensitive social biases. Finally, this work contributes a preliminary case study on Snapchat’s explosively popular GAN-enabled “My Twin” selfie lens, which consistently lightens the skin tone of women of color in an attempt to make faces more feminine. The results and discussion of the study are meant to caution machine learning practitioners who may unknowingly amplify the biases in their applications.
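The claim that the generated distribution diverges from the real one can be made concrete with a small sketch. The abstract does not say which divergence measure the thesis uses, so the choice of KL divergence here is an assumption, and the group names and label counts are hypothetical values picked only to illustrate the calculation.

```python
import math

def proportions(labels, categories):
    """Return the share of each category among the given labels."""
    total = len(labels)
    return [labels.count(c) / total for c in categories]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same categories.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps to avoid
    division by zero when the generator never produces a category.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical attribute labels (not data from the thesis).
categories = ["group_a", "group_b"]
train_labels = ["group_a"] * 80 + ["group_b"] * 20      # skewed training set
generated_labels = ["group_a"] * 95 + ["group_b"] * 5   # more skewed samples

p_train = proportions(train_labels, categories)
p_generated = proportions(generated_labels, categories)

# A value above zero means the generated samples do not match the
# training proportions; larger values mean a larger mismatch.
divergence = kl_divergence(p_train, p_generated)
print(f"KL(train || generated) = {divergence:.4f} nats")
```

In this toy setup the minority group shrinks from 20% of the training labels to 5% of the generated labels, which is the kind of exacerbation the abstract describes.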
Date Created
2020

#MeToo: Polarization and Discourse in the Digital Age

Description
Social media is an explosively popular venue for discussing socio-political issues. This work provides a preliminary study of how polarization occurs online. Chapter I begins by introducing the limitations of the internet in maintaining a free flow of information. Not only do users seek out groups of like-minded individuals and insulate themselves from opposing views, but social media platforms also algorithmically curate content so that it aligns with a user’s preconceived notions of the world. The work then defines polarization and carefully discusses its most prominent causes. It then shifts focus to a closely related issue in political discourse: outrage, which is both a noticeable effect of polarization and a further cause of it. Outrage is clearly prevalent in traditional media, but for completeness, I provide a case study to measure its incidence in social media. In Chapter II, I scrutinize the language used in the #MeToo movement on Twitter and draw conclusions about the issues Twitter users focus on and how they express their views. This chapter details the method I used, the challenges I faced in designing the exploratory study, and the results I found. I benchmark the patterns I find in the Twitterverse against those I find in The Wall Street Journal. The analysis relies on a word similarity metric, based on the proximity and frequency of words used together, to identify what users most commonly say about given topics, or keywords. Chapter III closes the essay with conclusions about socio-political polarization, discourse, and outrage in social media. Finally, the essay outlines potential directions for future work.
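The idea of relating words by how often and how closely they appear together can be sketched as a windowed co-occurrence count. This is only an illustrative approximation: the abstract does not specify the exact similarity metric, so the function `cooccurrence_counts`, the window size, and the sample sentences below are all hypothetical and not taken from the essay.

```python
from collections import Counter

def cooccurrence_counts(texts, keyword, window=2):
    """Count words appearing within `window` tokens of `keyword`.

    Tokenization is a plain lowercase whitespace split, which is a
    simplification; real tweets would need more careful cleaning.
    """
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        for i, token in enumerate(tokens):
            if token != keyword:
                continue
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    counts[tokens[j]] += 1
    return counts

# Hypothetical sample sentences, invented for illustration only.
sample_texts = [
    "survivors share stories of harassment at work",
    "harassment at work must end",
    "new policy on harassment announced",
]

counts = cooccurrence_counts(sample_texts, "harassment")
for word, n in counts.most_common(3):
    print(word, n)
```

Words that fall inside the window more often rank higher, so the ranking reflects both proximity to the keyword and frequency of joint use. Running the same count on two corpora, such as tweets and newspaper text, would allow the kind of benchmarking the abstract describes.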
Date Created
2019-05