161549-Thumbnail Image.png
Description
Traditionally, databases have been categorized as either row-oriented or column-oriented databases. Row-oriented databases store each row of the table’s data contiguously onto the disk whereas column-oriented databases store each column’s data contiguously onto the disk. In recent years, columnar database

Traditionally, databases have been categorized as either row-oriented or column-oriented databases. Row-oriented databases store each row of the table’s data contiguously onto the disk whereas column-oriented databases store each column’s data contiguously onto the disk. In recent years, columnar database management systems are becoming increasingly popular because deep and narrow queries are faster on them. Hence, column-oriented databases are highly optimized to be used with analytical (OLAP) workloads (Mike Freedman 2019). That is why they are very frequently used in business intelligence (BI), data warehouses, etc., which involve working with large data sets, intensive queries and aggregated computing. As the size of data keeps growing, efficient compression of data becomes an important consideration for these databases to optimize storage as well as improve query performance. Since column-oriented databases store data of the same data type contiguously, most modern compression techniques provide better compression ratios as compared to row-oriented databases. This thesis introduces a new compression technique called SA128 for column-oriented databases that performs a column-wise compression of database tables. SA128 is a multi-stage compression technique which performs a column-wise compression followed by a table-wide compression of database tables. In the first stage, SA128 performs an analysis based on the characteristics of data (such as data type and distribution) and determines which combination of lossless compression algorithms would result in the best compression ratio. In the second phase, SA128 uses an entropy encoding technique such as rANS (Duda, J., 2013) to further improve the compression ratio.
Reuse Permissions


  • Download restricted.

    Details

    Title
    • SA128 - A Smart Data Compression Technique for Columnar Databases Based on Characteristics of Data
    Contributors
    Date Created
    2021
    Resource Type
  • Text
  • Collections this item is in
    Note
    • Partial requirement for: M.S., Arizona State University, 2021
    • Field of study: Software Engineering

    Machine-readable links