Full metadata

Title

Big Data Generator and Evaluation of a Similarity Grouping Operator

Description

As Big Data becomes more relevant, existing grouping and clustering algorithms need to be evaluated for their effectiveness on large amounts of data. Previous work on Similarity Grouping proposes a possible alternative to existing data analytics tools: an operator that acts as a hybrid between fast grouping and insightful clustering. We, the SimCloud Team, proposed Distributed Similarity Group-by (DSG), a distributed implementation of Similarity Group-by. Experimental results show that DSG is effective at generating meaningful clusters and has a lower runtime than K-Means, a commonly used clustering algorithm. This document presents my personal contributions to this team effort: the multi-dimensional synthetic data generator, the execution of the Increasing Scale Factor experiment, and presentations at the NCURIE Symposium and the SISAP 2019 Conference.

Date Created

2019-12

Contributors

Wallace, Xavier Guillermo (Author)
Silva, Yasin (Thesis director)
Kuai, Xu (Committee member)
School for the Future of Innovation in Society (Contributor)
School of Mathematical and Natural Sciences (Contributor)
Barrett, The Honors College (Contributor)

Extent

14 pages

Language

eng

Copyright Statement

In Copyright

Primary Member of

Barrett, The Honors College Thesis/Creative Project Collection

Series

Academic Year 2019-2020

Handle

https://hdl.handle.net/2286/R.I.54737

Level of coding

minimal

Cataloging Standards

asu1

System Created

2019-11-01 12:00:04

System Modified

2021-08-11 04:09:57