Managing a user's vulnerability on a social networking site

153374-Thumbnail Image.png
Description
Users often join an online social networking (OSN) site, like Facebook, to remain social, by either staying connected with friends or expanding social networks. On an OSN site, users generally share variety of personal information which is often expected

Users often join an online social networking (OSN) site, like Facebook, to remain social, by either staying connected with friends or expanding social networks. On an OSN site, users generally share variety of personal information which is often expected to be visible to their friends, but sometimes vulnerable to unwarranted access from others. The recent study suggests that many personal attributes, including religious and political affiliations, sexual orientation, relationship status, age, and gender, are predictable using users' personal data from an OSN site. The majority of users want to remain socially active, and protect their personal data at the same time. This tension leads to a user's vulnerability, allowing privacy attacks which can cause physical and emotional distress to a user, sometimes with dire consequences. For example, stalkers can make use of personal information available on an OSN site to their personal gain. This dissertation aims to systematically study a user vulnerability against such privacy attacks.

A user vulnerability can be managed in three steps: (1) identifying, (2) measuring and (3) reducing a user vulnerability. Researchers have long been identifying vulnerabilities arising from user's personal data, including user names, demographic attributes, lists of friends, wall posts and associated interactions, multimedia data such as photos, audios and videos, and tagging of friends. Hence, this research first proposes a way to measure and reduce a user vulnerability to protect such personal data. This dissertation also proposes an algorithm to minimize a user's vulnerability while maximizing their social utility values.

To address these vulnerability concerns, social networking sites like Facebook usually let their users to adjust their profile settings so as to make some of their data invisible. However, users sometimes interact with others using unprotected posts (e.g., posts from a ``Facebook page\footnote{The term ''Facebook page`` refers to the page which are commonly dedicated for businesses, brands and organizations to share their stories and connect with people.}''). Such interactions help users to become more social and are publicly accessible to everyone. Thus, visibilities of these interactions are beyond the control of their profile settings. I explore such unprotected interactions so that users' are well aware of these new vulnerabilities and adopt measures to mitigate them further. In particular, {\em are users' personal attributes predictable using only the unprotected interactions}? To answer this question, I address a novel problem of predictability of users' personal attributes with unprotected interactions. The extreme sparsity patterns in users' unprotected interactions pose a serious challenge. Therefore, I approach to mitigating the data sparsity challenge by designing a novel attribute prediction framework using only the unprotected interactions. Experimental results on Facebook dataset demonstrates that the proposed framework can predict users' personal attributes.
Date Created
2015
Agent

Orthogonal Rank-One Matrix Pursuit for Low Rank Matrix Completion

129297-Thumbnail Image.png
Description

In this paper, we propose an efficient and scalable low rank matrix completion algorithm. The key idea is to extend the orthogonal matching pursuit method from the vector case to the matrix case. We further propose an economic version of

In this paper, we propose an efficient and scalable low rank matrix completion algorithm. The key idea is to extend the orthogonal matching pursuit method from the vector case to the matrix case. We further propose an economic version of our algorithm by introducing a novel weight updating rule to reduce the time and storage complexity. Both versions are computationally inexpensive for each matrix pursuit iteration and find satisfactory results in a few iterations. Another advantage of our proposed algorithm is that it has only one tunable parameter, which is the rank. It is easy to understand and to use by the user. This becomes especially important in large-scale learning problems. In addition, we rigorously show that both versions achieve a linear convergence rate, which is significantly better than the previous known results. We also empirically compare the proposed algorithms with several state-of-the-art matrix completion algorithms on many real-world datasets, including the large-scale recommendation dataset Netflix as well as the MovieLens datasets. Numerical results show that our proposed algorithm is more efficient than competing algorithms while achieving similar or better prediction performance.

Date Created
2014-11-30
Agent

Computing distrust in social media

153339-Thumbnail Image.png
Description
A myriad of social media services are emerging in recent years that allow people to communicate and express themselves conveniently and easily. The pervasive use of social media generates massive data at an unprecedented rate. It becomes increasingly difficult for

A myriad of social media services are emerging in recent years that allow people to communicate and express themselves conveniently and easily. The pervasive use of social media generates massive data at an unprecedented rate. It becomes increasingly difficult for online users to find relevant information or, in other words, exacerbates the information overload problem. Meanwhile, users in social media can be both passive content consumers and active content producers, causing the quality of user-generated content can vary dramatically from excellence to abuse or spam, which results in a problem of information credibility. Trust, providing evidence about with whom users can trust to share information and from whom users can accept information without additional verification, plays a crucial role in helping online users collect relevant and reliable information. It has been proven to be an effective way to mitigate information overload and credibility problems and has attracted increasing attention.

As the conceptual counterpart of trust, distrust could be as important as trust and its value has been widely recognized by social sciences in the physical world. However, little attention is paid on distrust in social media. Social media differs from the physical world - (1) its data is passively observed, large-scale, incomplete, noisy and embedded with rich heterogeneous sources; and (2) distrust is generally unavailable in social media. These unique properties of social media present novel challenges for computing distrust in social media: (1) passively observed social media data does not provide necessary information social scientists use to understand distrust, how can I understand distrust in social media? (2) distrust is usually invisible in social media, how can I make invisible distrust visible by leveraging unique properties of social media data? and (3) little is known about distrust and its role in social media applications, how can distrust help make difference in social media applications?

The chief objective of this dissertation is to figure out solutions to these challenges via innovative research and novel methods. In particular, computational tasks are designed to {\it understand distrust}, a innovative task, i.e., {\it predicting distrust} is proposed with novel frameworks to make invisible distrust visible, and principled approaches are develop to {\it apply distrust} in social media applications. Since distrust is a special type of negative links, I demonstrate the generalization of properties and algorithms of distrust to negative links, i.e., {\it generalizing findings of distrust}, which greatly expands the boundaries of research of distrust and largely broadens its applications in social media.
Date Created
2015
Agent

Understanding social media users via attributes and links

153259-Thumbnail Image.png
Description
With the rise of social media, hundreds of millions of people spend countless hours all over the globe on social media to connect, interact, share, and create user-generated data. This rich environment provides tremendous opportunities for many different players to

With the rise of social media, hundreds of millions of people spend countless hours all over the globe on social media to connect, interact, share, and create user-generated data. This rich environment provides tremendous opportunities for many different players to easily and effectively reach out to people, interact with them, influence them, or get their opinions. There are two pieces of information that attract most attention on social media sites, including user preferences and interactions. Businesses and organizations use this information to better understand and therefore provide customized services to social media users. This data can be used for different purposes such as, targeted advertisement, product recommendation, or even opinion mining. Social media sites use this information to better serve their users.

Despite the importance of personal information, in many cases people do not reveal this information to the public. Predicting the hidden or missing information is a common response to this challenge. In this thesis, we address the problem of predicting user attributes and future or missing links using an egocentric approach. The current research proposes novel concepts and approaches to better understand social media users in twofold including, a) their attributes, preferences, and interests, and b) their future or missing connections and interactions. More specifically, the contributions of this dissertation are (1) proposing a framework to study social media users through their attributes and link information, (2) proposing a scalable algorithm to predict user preferences; and (3) proposing a novel approach to predict attributes and links with limited information. The proposed algorithms use an egocentric approach to improve the state of the art algorithms in two directions. First by improving the prediction accuracy, and second, by increasing the scalability of the algorithms.
Date Created
2014
Agent

Graph-based sparse learning: models, algorithms, and applications

153196-Thumbnail Image.png
Description
Sparse learning is a powerful tool to generate models of high-dimensional data with high interpretability, and it has many important applications in areas such as bioinformatics, medical image processing, and computer vision. Recently, the a priori structural information has been

Sparse learning is a powerful tool to generate models of high-dimensional data with high interpretability, and it has many important applications in areas such as bioinformatics, medical image processing, and computer vision. Recently, the a priori structural information has been shown to be powerful for improving the performance of sparse learning models. A graph is a fundamental way to represent structural information of features. This dissertation focuses on graph-based sparse learning. The first part of this dissertation aims to integrate a graph into sparse learning to improve the performance. Specifically, the problem of feature grouping and selection over a given undirected graph is considered. Three models are proposed along with efficient solvers to achieve simultaneous feature grouping and selection, enhancing estimation accuracy. One major challenge is that it is still computationally challenging to solve large scale graph-based sparse learning problems. An efficient, scalable, and parallel algorithm for one widely used graph-based sparse learning approach, called anisotropic total variation regularization is therefore proposed, by explicitly exploring the structure of a graph. The second part of this dissertation focuses on uncovering the graph structure from the data. Two issues in graphical modeling are considered. One is the joint estimation of multiple graphical models using a fused lasso penalty and the other is the estimation of hierarchical graphical models. The key technical contribution is to establish the necessary and sufficient condition for the graphs to be decomposable. Based on this key property, a simple screening rule is presented, which reduces the size of the optimization problem, dramatically reducing the computational cost.
Date Created
2014
Agent

Personalized POI recommendation on location-based social networks

153140-Thumbnail Image.png
Description
The rapid urban expansion has greatly extended the physical boundary of our living area, along with a large number of POIs (points of interest) being developed. A POI is a specific location (e.g., hotel, restaurant, theater, mall) that a user

The rapid urban expansion has greatly extended the physical boundary of our living area, along with a large number of POIs (points of interest) being developed. A POI is a specific location (e.g., hotel, restaurant, theater, mall) that a user may find useful or interesting. When exploring the city and neighborhood, the increasing number of POIs could enrich people's daily life, providing them with more choices of life experience than before, while at the same time also brings the problem of "curse of choices", resulting in the difficulty for a user to make a satisfied decision on "where to go" in an efficient way. Personalized POI recommendation is a task proposed on purpose of helping users filter out uninteresting POIs and reduce time in decision making, which could also benefit virtual marketing.

Developing POI recommender systems requires observation of human mobility w.r.t. real-world POIs, which is infeasible with traditional mobile data. However, the recent development of location-based social networks (LBSNs) provides such observation. Typical location-based social networking sites allow users to "check in" at POIs with smartphones, leave tips and share that experience with their online friends. The increasing number of LBSN users has generated large amounts of LBSN data, providing an unprecedented opportunity to study human mobility for personalized POI recommendation in spatial, temporal, social, and content aspects.

Different from recommender systems in other categories, e.g., movie recommendation in NetFlix, friend recommendation in dating websites, item recommendation in online shopping sites, personalized POI recommendation on LBSNs has its unique challenges due to the stochastic property of human mobility and the mobile behavior indications provided by LBSN information layout. The strong correlations between geographical POI information and other LBSN information result in three major human mobile properties, i.e., geo-social correlations, geo-temporal patterns, and geo-content indications, which are neither observed in other recommender systems, nor exploited in current POI recommendation. In this dissertation, we investigate these properties on LBSNs, and propose personalized POI recommendation models accordingly. The performance evaluated on real-world LBSN datasets validates the power of these properties in capturing user mobility, and demonstrates the ability of our models for personalized POI recommendation.
Date Created
2014
Agent

Simultaneous variable and feature group selection in heterogeneous learning: optimization and applications

153085-Thumbnail Image.png
Description
Advances in data collection technologies have made it cost-effective to obtain heterogeneous data from multiple data sources. Very often, the data are of very high dimension and feature selection is preferred in order to reduce noise, save computational cost and

Advances in data collection technologies have made it cost-effective to obtain heterogeneous data from multiple data sources. Very often, the data are of very high dimension and feature selection is preferred in order to reduce noise, save computational cost and learn interpretable models. Due to the multi-modality nature of heterogeneous data, it is interesting to design efficient machine learning models that are capable of performing variable selection and feature group (data source) selection simultaneously (a.k.a bi-level selection). In this thesis, I carry out research along this direction with a particular focus on designing efficient optimization algorithms. I start with a unified bi-level learning model that contains several existing feature selection models as special cases. Then the proposed model is further extended to tackle the block-wise missing data, one of the major challenges in the diagnosis of Alzheimer's Disease (AD). Moreover, I propose a novel interpretable sparse group feature selection model that greatly facilitates the procedure of parameter tuning and model selection. Last but not least, I show that by solving the sparse group hard thresholding problem directly, the sparse group feature selection model can be further improved in terms of both algorithmic complexity and efficiency. Promising results are demonstrated in the extensive evaluation on multiple real-world data sets.
Date Created
2014
Agent

TensorDB and tensor-relational model (TRM) for efficient tensor-relational operations

152906-Thumbnail Image.png
Description
Multidimensional data have various representations. Thanks to their simplicity in modeling multidimensional data and the availability of various mathematical tools (such as tensor decompositions) that support multi-aspect analysis of such data, tensors are increasingly being used in many application domains

Multidimensional data have various representations. Thanks to their simplicity in modeling multidimensional data and the availability of various mathematical tools (such as tensor decompositions) that support multi-aspect analysis of such data, tensors are increasingly being used in many application domains including scientific data management, sensor data management, and social network data analysis. Relational model, on the other hand, enables semantic manipulation of data using relational operators, such as projection, selection, Cartesian-product, and set operators. For many multidimensional data applications, tensor operations as well as relational operations need to be supported throughout the data life cycle. In this thesis, we introduce a tensor-based relational data model (TRM), which enables both tensor- based data analysis and relational manipulations of multidimensional data, and define tensor-relational operations on this model. Then we introduce a tensor-relational data management system, so called, TensorDB. TensorDB is based on TRM, which brings together relational algebraic operations (for data manipulation and integration) and tensor algebraic operations (for data analysis). We develop optimization strategies for tensor-relational operations in both in-memory and in-database TensorDB. The goal of the TRM and TensorDB is to serve as a single environment that supports the entire life cycle of data; that is, data can be manipulated, integrated, processed, and analyzed.
Date Created
2014
Agent

Identity authentication and near field device authentication for smart devices

152874-Thumbnail Image.png
Description
The widespread adoption of mobile devices gives rise to new opportunities and challenges for authentication mechanisms. Many traditional authentication mechanisms become unsuitable for smart devices. For example, while password is widely used on computers as user identity authentication, inputting password

The widespread adoption of mobile devices gives rise to new opportunities and challenges for authentication mechanisms. Many traditional authentication mechanisms become unsuitable for smart devices. For example, while password is widely used on computers as user identity authentication, inputting password on small smartphone screen is error-prone and not convenient. In the meantime, there are emerging demands for new types of authentication. Proximity authentication is an example, which is not needed for computers but quite necessary for smart devices. These challenges motivate me to study and develop novel authentication mechanisms specific for smart devices.

In this dissertation, I am interested in the special authentication demands of smart devices and about to satisfy the demands. First, I study how the features of smart devices affect user identity authentications. For identity authentication domain, I aim to design a continuous, forge-resistant authentication mechanism that does not interrupt user-device interactions. I propose a mechanism that authenticates user identity based on the user's finger movement patterns. Next, I study a smart-device-specific authentication, proximity authentication, which authenticates whether two devices are in close proximity. For prox- imity authentication domain, I aim to design a user-friendly authentication mechanism that can defend against relay attacks. In addition, I restrict the authenticated distance to the scale of near field, i.e., a few centimeters. My first design utilizes a user's coherent two-finger movement on smart device screen to restrict the distance. To achieve a fully-automated system, I explore acoustic communications and propose a novel near field authentication system.
Date Created
2014
Agent

Semantic sparse learning in images and videos

152840-Thumbnail Image.png
Description
Many learning models have been proposed for various tasks in visual computing. Popular examples include hidden Markov models and support vector machines. Recently, sparse-representation-based learning methods have attracted a lot of attention in the computer vision field, largely because of

Many learning models have been proposed for various tasks in visual computing. Popular examples include hidden Markov models and support vector machines. Recently, sparse-representation-based learning methods have attracted a lot of attention in the computer vision field, largely because of their impressive performance in many applications. In the literature, many of such sparse learning methods focus on designing or application of some learning techniques for certain feature space without much explicit consideration on possible interaction between the underlying semantics of the visual data and the employed learning technique. Rich semantic information in most visual data, if properly incorporated into algorithm design, should help achieving improved performance while delivering intuitive interpretation of the algorithmic outcomes. My study addresses the problem of how to explicitly consider the semantic information of the visual data in the sparse learning algorithms. In this work, we identify four problems which are of great importance and broad interest to the community. Specifically, a novel approach is proposed to incorporate label information to learn a dictionary which is not only reconstructive but also discriminative; considering the formation process of face images, a novel image decomposition approach for an ensemble of correlated images is proposed, where a subspace is built from the decomposition and applied to face recognition; based on the observation that, the foreground (or salient) objects are sparse in input domain and the background is sparse in frequency domain, a novel and efficient spatio-temporal saliency detection algorithm is proposed to identify the salient regions in video; and a novel hidden Markov model learning approach is proposed by utilizing a sparse set of pairwise comparisons among the data, which is easier to obtain and more meaningful, consistent than tradition labels, in many scenarios, e.g., evaluating motion skills in surgical simulations. In those four problems, different types of semantic information are modeled and incorporated in designing sparse learning algorithms for the corresponding visual computing tasks. Several real world applications are selected to demonstrate the effectiveness of the proposed methods, including, face recognition, spatio-temporal saliency detection, abnormality detection, spatio-temporal interest point detection, motion analysis and emotion recognition. In those applications, data of different modalities are involved, ranging from audio signal, image to video. Experiments on large scale real world data with comparisons to state-of-art methods confirm the proposed approaches deliver salient advantages, showing adding those semantic information dramatically improve the performances of the general sparse learning methods.
Date Created
2014
Agent