Full metadata
Title
Evaluation of Storage Systems for Big Data Analytics
Description
Recent trends in big data storage systems show a shift from disk-centric models to memory-centric models. The primary challenges these systems face are speed, scalability, and fault tolerance. It is therefore worth investigating how the two models perform for big data applications. This thesis studies the performance of Ceph (a disk-centric model) and Alluxio (a memory-centric model) and evaluates whether a hybrid model provides any performance benefit for big data applications. To this end, an application, TechTalk, is created that uses Ceph to store data and Alluxio to perform data analytics. The application's functionality includes offline lecture storage, live recording of classes, content analysis, and reference generation. A knowledge base of videos is constructed by analyzing the offline data with machine learning techniques. This training dataset provides the knowledge used to construct the index of an online stream. The indexed metadata enables students to search, view, and access relevant content. The application is benchmarked in different use cases to demonstrate the benefits of the hybrid model.
Date Created
2017
Contributors
- NAGENDRA, SHILPA (Author)
- Huang, Dijiang (Thesis advisor)
- Zhao, Ming (Committee member)
- Maciejewski, Ross (Committee member)
- Chung, Chun-Jen (Committee member)
- Arizona State University (Publisher)
Extent
150 pages
Language
eng
Copyright Statement
In Copyright
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.I.46221
Level of coding
minimal
Note
Masters Thesis Computer Science 2017
System Created
- 2018-02-01 07:03:09
System Modified
- 2021-08-26 09:47:01