Full metadata
Title
SA128 - A Smart Data Compression Technique for Columnar Databases Based on Characteristics of Data
Description
Traditionally, databases have been categorized as either row-oriented or column-oriented databases. Row-oriented databases store each row of the table’s data contiguously onto the disk whereas column-oriented databases store each column’s data contiguously onto the disk. In recent years, columnar database management systems are becoming increasingly popular because deep and narrow queries are faster on them. Hence, column-oriented databases are highly optimized to be used with analytical (OLAP) workloads (Mike Freedman 2019). That is why they are very frequently used in business intelligence (BI), data warehouses, etc., which involve working with large data sets, intensive queries and aggregated computing. As the size of data keeps growing, efficient compression of data becomes an important consideration for these databases to optimize storage as well as improve query performance. Since column-oriented databases store data of the same data type contiguously, most modern compression techniques provide better compression ratios as compared to row-oriented databases. This thesis introduces a new compression technique called SA128 for column-oriented databases that performs a column-wise compression of database tables. SA128 is a multi-stage compression technique which performs a column-wise compression followed by a table-wide compression of database tables. In the first stage, SA128 performs an analysis based on the characteristics of data (such as data type and distribution) and determines which combination of lossless compression algorithms would result in the best compression ratio. In the second phase, SA128 uses an entropy encoding technique such as rANS (Duda, J., 2013) to further improve the compression ratio.
Date Created
2021
Contributors
- Anand, Sukhpreet Singh (Author)
- Bansal, Ajay (Thesis advisor)
- Heinrichs, Robert R (Committee member)
- Gonzalez-Sanchez, Javier (Committee member)
- Arizona State University (Publisher)
Topical Subject
Resource Type
Extent
147 pages
Language
eng
Copyright Statement
In Copyright
Primary Member of
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.2.N.161549
Level of coding
minimal
Cataloging Standards
Note
Partial requirement for: M.S., Arizona State University, 2021
Field of study: Software Engineering
System Created
- 2021-11-16 02:01:31
System Modified
- 2021-11-30 12:51:28
- 2 years 11 months ago
Additional Formats