Multiobjective Optimization Based Approach for Truth Discovery

In many applications the truth is unknown, and different sources supply guessed values for it. Because the values of different properties can be obtained from various sources, the sources often disagree. An important task is to recover the truth from these sometimes contradictory sources. Beyond computing the truth, the reliability of each source also needs to be computed, and there are models that compute these precision values. In the earlier models (Banerjee et al., 2005; Dong and Naumann, 2009; Kasneci et al., 2011; Li et al., 2012; Marian and Wu, 2011; Zhao and Han, 2012; Zhao et al., 2012), multiple properties are modeled individually. One existing work, the Conflict Resolution on Heterogeneous Data (CRH) framework, models heterogeneous properties jointly and is based on single-objective optimization. Because the problem is non-convex and has a single objective, only one locally optimal solution is found, and that solution depends on the initial point. In this work, the single-objective optimization problem is converted into a multi-objective optimization problem, for which the Pareto optimal points are computed. As an extension, the single-objective optimization problem is also solved from numerous initial points. Both approaches are used to search for a solution better than the one obtained by CRH with the median as the initial point for continuous variables and majority voting as the initial point for categorical variables. In the experiments, the CRH solution lies among the Pareto optimal points of the multi-objective optimization and is the optimum solution.
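
As an illustration of what "Pareto optimal points" means here, the following is a minimal sketch that filters a set of candidate solutions down to the non-dominated ones. It assumes two objectives that are both minimized; the objective values, the function names `dominates` and `pareto_front`, and the candidate list are hypothetical and are not taken from the thesis or from CRH.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b in every objective and
    strictly better in at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective values for five candidate solutions,
# e.g. (loss on objective 1, loss on objective 2). Assumed for illustration.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.5, 3.5)]
front = pareto_front(candidates)
print(front)  # [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)]
```

In this toy data, (3.0, 4.0) and (2.5, 3.5) are both dominated by (2.0, 3.0), so they drop out. A solution found by a single-objective method can be checked against such a front in the same way.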


Karate is a Japanese martial art that originated approximately a century ago, with heavy influence from Chinese martial arts at the time. Although it was originally created as a form of self-defense, many today practice it for sport. Organizations such as the World Karate Federation (WKF) and USA Karate establish rules for competitions as well as host tournaments for practitioners of all ages and skill levels to participate in. Dojos will often host small, local tournaments for their students to practice and sharpen their competition skills. Smaller tournaments often do not have the same tools and technologies that larger tournaments do. Sign-ups are typically done in-person and payments are cash-only, which can be inconvenient for those who are extremely busy or forgetful. Another issue with hosting local tournaments is that the software used to run the timer is a desktop application, called Karate Semaphore. In the case of technical difficulties, installing the software on another machine can be extremely time-consuming and delay the progression of the tournament. Not to mention, Karate Semaphore was created following the 2012 WKF rules—meaning it is currently out of date, as it does not contain any features supporting new rules.
For my creative project, I designed a website through which registration and management for smaller, local tournaments are possible. Users can register for tournaments through the registration page. Registered users can check that their registration was successful by viewing a table of all competitors. If the list of competitors is too long, they can filter results based on search criteria. Tournament management will be possible via a functioning timer that follows WKF rules and keeps track of both the match's score and its time.

A Framework for Interactive Geospatial Map Cleaning using GPS Trajectories

A volunteered geographic information system, e.g., OpenStreetMap (OSM), collects data from volunteers to generate geospatial maps. To keep the map consistent, volunteers are expected to perform the tedious task of updating the underlying geospatial data at regular intervals. Such a map curation step takes time and considerable human effort. In this thesis, we propose a framework that improves the process of updating geospatial maps by automatically identifying road changes from user-generated GPS traces. Since GPS traces can be sparse and noisy, the proposed framework validates the map changes with the users before propagating them to a publishable version of the map. The proposed framework achieves up to four times faster map matching performance than the state-of-the-art algorithms with only 0.1-0.3% accuracy loss.

Query Workload-Aware Index Structures for Range Searches in 1D, 2D, and High-Dimensional Spaces

Most current database management systems are optimized for single query execution. Yet, often, queries come as part of a query workload. Therefore, there is a need for index structures that can take into consideration the existence of multiple queries in a query workload and efficiently produce accurate results for the entire query workload. These index structures should be scalable to handle large amounts of data as well as large query workloads.

The main objective of this dissertation is to design and create scalable index structures that are optimized for range query workloads. Range queries are an important type of query with wide-ranging applications. There are no existing index structures that are optimized for efficient execution of range query workloads. There are also unique challenges that need to be addressed for range queries in 1D, 2D, and high-dimensional spaces. In this work, I introduce novel cost models, index selection algorithms, and storage mechanisms that can tackle these challenges and efficiently process a given range query workload in 1D, 2D, and high-dimensional spaces. In particular, I introduce the index structures HCS (for 1D spaces), cSHB (for 2D spaces), and PSLSH (for high-dimensional spaces), which are designed specifically to handle range query workloads efficiently and to address the unique challenges arising from their respective spaces. I experimentally show the effectiveness of the proposed index structures by comparing them with state-of-the-art techniques.

SPSR efficient processing of socially k-nearest neighbors with spatial range filter

Social media has become popular in the past decade. Facebook, for example, has 1.59 billion monthly active users. With such massive social networks generating a lot of data, everyone is constantly looking for ways to leverage the knowledge in social networks to make their systems more personalized for their end users. With the rapid increase in the use of mobile phones and wearables, social media data is also being tied to spatial networks. This research document proposes an efficient technique that answers Socially k-Nearest Neighbors with Spatial Range Filter queries. The proposed approach performs a joint search on both the social and spatial domains, which radically improves performance compared to straightforward solutions. The research document proposes a novel index that combines social and spatial indexes. In other words, graph data is stored in an organized manner so that it can be filtered by spatial constraints (region of interest) and social constraints (top-k closest vertices) at query time. This allows unnecessary paths to be pruned during the social graph traversal, so that only the top-k socially close venues are returned. The research document then shows experimentally that the proposed approach outperforms existing baseline approaches by at least a factor of three, and compares how each of the proposed algorithms performs under various conditions on a real geo-social dataset extracted from Yelp.
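
To make the query type concrete, here is a minimal sketch of one straightforward solution of the kind the abstract compares against, not the proposed combined index. It assumes social closeness is measured in hops on an unweighted graph and that the spatial filter is an axis-aligned rectangle; the function `social_knn_in_range`, the toy graph, and the coordinates are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def in_range(p: Point, r: Rect) -> bool:
    """Check whether point p lies inside the axis-aligned rectangle r."""
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def social_knn_in_range(graph: Dict[str, List[str]],
                        location: Dict[str, Point],
                        source: str, region: Rect, k: int) -> List[str]:
    """Breadth-first traversal from `source`: vertices are visited in
    order of hop distance, and only those located inside `region` are
    kept. The traversal stops once k matches have been collected."""
    seen = {source}
    queue = deque([source])
    result: List[str] = []
    while queue and len(result) < k:
        v = queue.popleft()
        if v != source and v in location and in_range(location[v], region):
            result.append(v)
        for w in graph.get(v, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return result


# Toy data, assumed for illustration only.
graph = {"u": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}
location = {"a": (1.0, 1.0), "b": (9.0, 9.0), "c": (2.0, 2.0), "d": (3.0, 3.0)}
region = (0.0, 0.0, 5.0, 5.0)
nearest = social_knn_in_range(graph, location, "u", region, 2)
print(nearest)  # ['a', 'c']
```

This baseline explores every socially close vertex regardless of where it is located, which is the cost a joint social and spatial search aims to avoid.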

Locality sensitive indexing for efficient high-dimensional query answering in the presence of excluded regions

Similarity search in high-dimensional spaces is popular for applications like image processing, time series, and genome data. In higher dimensions, the curse of dimensionality undermines the effectiveness of most index structures, giving way to approximate methods like Locality Sensitive Hashing (LSH) for answering similarity searches. In addition to range searches and k-nearest neighbor searches, there is a need to answer negative queries, formed by excluded regions, in high-dimensional data. Though there have been a slew of LSH variants that improve efficiency, reduce storage, and provide better accuracy, none of these techniques is capable of answering queries in the presence of excluded regions.

This thesis provides a novel approach to handling such negative queries. This is achieved by creating a prefix-based hierarchical index structure. First, the higher-dimensional space is projected to a lower-dimensional space. Then, a one-dimensional ordering is developed that retains the hierarchical traits. The algorithm prunes irrelevant candidates while answering queries in the presence of excluded regions. While naive LSH would need to filter the negative query results out of the main results, the new algorithm minimizes the need to fetch the redundant results in the first place. Experimental results show that this reduces the post-processing cost, thereby reducing the query processing time.

SearchViz: an interactive visual interface to navigate search-results in online discussion forums

Online programming communities are widely used by programmers for troubleshooting and various problem-solving tasks. The large and ever-increasing volume of posts in these communities demands more effort to read and comprehend, making it harder to find relevant information. In my thesis, I designed and studied an alternative approach that uses interactive network visualization to represent relevant search results for online programming discussion forums.

I conducted a user study to evaluate the effectiveness of this approach. Results show that users were able to identify relevant information more precisely via the visual interface than with the traditional list-based approach. Network visualization provided effective search-result navigation support that facilitated users' tasks and improved query quality for successive queries. Subjective evaluation also showed that visualizing search results conveys more semantic information in an efficient manner and makes searching more effective.

Space adaptation techniques for preference oriented skyline processing

Skyline queries are a well-established technique used in multi-criteria decision applications. There has been recent interest in the research community in computing skylines efficiently, but the problem of presenting a skyline that takes the user's preferences into account is still open. Each user has varying interest in each attribute, and hence a "one size fits all" methodology might not satisfy all users. True user satisfaction can be obtained only when the skyline is tailored specifically to each user based on that user's preferences.

This research investigates the problem of preference-aware skyline processing, which consists of inferring the preferences of a user and computing a skyline specific to that user, taking those preferences into account. This research proposes a model that transforms the data from a given space to a user preferential space in which each attribute represents the preference of the user. This study proposes two techniques, "Preferential Skyline Processing" and "Latent Skyline Processing", to efficiently compute preference-aware skylines in the user preferential space. Finally, extensive experiments and performance analysis confirm the correctness of the recommendations and the algorithms' ability to outperform the naïve ones.