Privacy Preserving Visualizations using Vega-Lite

In today's data-driven world, privacy is a significant concern, and sensitive information must stay protected when data is visualized. This thesis aims to develop new techniques and software tools that support Vega-Lite visualizations while maintaining privacy. Vega-Lite is a visualization grammar based on Wilkinson's grammar of graphics. The project extends Vega-Lite to incorporate privacy algorithms such as k-anonymity, l-diversity, t-closeness, and differential privacy. The first three are supported through a multi-input loop module that generates combinations of attributes as a new anonymization method. Differential privacy is implemented by adding controlled noise (Laplace or Exponential) to the sensitive columns in the dataset. The user defines custom rules in the JSON schema, naming the privacy methods and the sensitive column. The schema is validated using Another JSON Validation library, and the rules determine which anonymization techniques are applied to the dataset before it is sent back to the Vega-Lite visualization server. Multiple datasets satisfying the privacy requirements are generated, each with a utility score, so that the user can trade off privacy against utility according to their requirements. The interface is designed to be intuitive: it guides users through the workflow and reports various utility metrics as feedback on the privacy-preserving visualizations it generates. The application is intended for technical or domain experts in fields where privacy is a major concern, such as medical institutions, traffic and urban planning, financial institutions, educational records, and employer-employee relations. The project is novel in that it provides a one-stop solution for privacy-preserving visualization. It builds on Vega-Lite, open-source software that several organizations and users rely on for business and educational purposes.
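The two ideas named in the abstract, noise addition for differential privacy and group-size checks for k-anonymity, can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names `add_laplace_noise` and `is_k_anonymous`, the `sensitivity` and `epsilon` parameters, and the sample rows are all assumptions made for illustration, and the noise is drawn using the standard fact that the difference of two independent exponential variables follows a Laplace distribution.

```python
import random
from collections import Counter


def add_laplace_noise(values, sensitivity, epsilon, rng=None):
    """Add Laplace(0, sensitivity / epsilon) noise to each value.

    Hypothetical helper: a smaller epsilon means more noise (more privacy,
    less utility).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    rate = 1.0 / scale
    # The difference of two i.i.d. Exponential(rate) draws is Laplace(0, scale).
    return [
        float(v) + rng.expovariate(rate) - rng.expovariate(rate)
        for v in values
    ]


def is_k_anonymous(rows, quasi_identifiers, k):
    """Check that every combination of quasi-identifier values occurs >= k times."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


if __name__ == "__main__":
    # Illustrative, already-generalized records (not data from the thesis).
    rows = [
        {"zip": "850**", "age": "30-39", "salary": 52000},
        {"zip": "850**", "age": "30-39", "salary": 61000},
        {"zip": "852**", "age": "40-49", "salary": 70000},
    ]
    print(is_k_anonymous(rows, ["zip", "age"], k=2))
    salaries = [row["salary"] for row in rows]
    print(add_laplace_noise(salaries, sensitivity=1000.0, epsilon=0.5))
```

Running the noise step with several `epsilon` values and comparing the results against the original column is one simple way to see the privacy-utility trade-off the abstract describes.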

Optimizing Consistency and Performance Trade-off in Distributed Log-Structured Merge-Tree-based Key-Value Stores

Distributed databases, such as Log-Structured Merge-Tree Key-Value Stores (LSM-KVS), are widely used in modern infrastructure. One of the primary challenges in these databases is ensuring consistency, meaning that all nodes have the same view of data at any given time. However, maintaining consistency requires a trade-off: the stronger the consistency, the more resources are needed to keep replicas in sync, which decreases database performance. Addressing this trade-off poses two challenges: first, developing and managing multiple consistency levels within a single system, and second, assigning consistency levels so that they effectively balance consistency against performance. This thesis introduces Self-configuring Consistency In Distributed LSM-KVS (SCID), a service that leverages unique properties of LSM-KVS to manage consistency levels and automates level assignment with machine learning (ML). To address the first challenge, SCID combines dynamic read-only instances and logical KV-based partitions to enable on-demand updates of read-only instances and the logical separation of groups of key-value pairs. SCID uses logical partitions as consistency levels and on-demand updates in dynamic read-only instances to support multiple consistency levels. To address the second challenge, the thesis presents an ML-based solution, SCID-ML, that manages the consistency-performance trade-off more effectively. We evaluate SCID and find that it improves write throughput by up to 50% and achieves 62% accuracy for consistency-level predictions.
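The idea of using logical partitions as consistency levels, with read-only instances refreshed on demand, can be pictured with a toy in-memory model. The sketch below is an assumption-laden illustration, not SCID itself: the classes `Primary` and `ReadOnlyInstance`, the level names `"strong"` and `"eventual"`, and the snapshot-copy refresh are all hypothetical stand-ins for the LSM-KVS mechanisms the thesis builds on.

```python
class Primary:
    """Toy writable store: partition name -> {key: value}."""

    def __init__(self):
        self.data = {}

    def put(self, partition, key, value):
        self.data.setdefault(partition, {})[key] = value


class ReadOnlyInstance:
    """Toy read-only replica with a per-partition consistency level.

    Hypothetical levels: "strong" partitions are refreshed before every
    read; "eventual" partitions are refreshed only when refresh() is called.
    """

    def __init__(self, primary, levels):
        self.primary = primary
        self.levels = levels      # partition -> "strong" | "eventual"
        self.snapshot = {}        # partition -> copy of the primary's data

    def refresh(self, partition):
        # On-demand update: copy the primary's current state for one partition.
        self.snapshot[partition] = dict(self.primary.data.get(partition, {}))

    def get(self, partition, key):
        if self.levels.get(partition) == "strong":
            self.refresh(partition)
        return self.snapshot.get(partition, {}).get(key)


if __name__ == "__main__":
    primary = Primary()
    replica = ReadOnlyInstance(primary, {"orders": "strong", "logs": "eventual"})
    primary.put("orders", "o1", "paid")
    primary.put("logs", "l1", "started")
    print(replica.get("orders", "o1"))  # refreshed on read
    print(replica.get("logs", "l1"))    # stale until refresh("logs") is called
```

In this model the cost of stronger consistency shows up as extra refresh work on every read, which is the kind of overhead a per-partition level assignment tries to spend only where it is needed.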

GPU-enabled Functional-as-a-Service

Function-as-a-Service (FaaS) is emerging as an important cloud computing service model because it can improve scalability and usability for a wide range of applications, especially Machine-Learning (ML) inference tasks that require scalable computation resources and complicated configurations. Many applications, including ML inference, rely on Graphics Processing Units (GPUs) to achieve high performance; however, existing FaaS solutions lack GPU support. The event-triggered and short-lived nature of functions poses new challenges to enabling GPUs on FaaS, which must account for the overhead of transferring data (e.g., ML model parameters and inputs/outputs) between GPU and host memory. This thesis presents a new GPU-enabled FaaS solution that allows functions to use GPUs efficiently to accelerate computations such as model inference. First, the work extends existing open-source FaaS frameworks such as OpenFaaS to support the scheduling and execution of functions across GPUs in a FaaS cluster. Second, it caches ML models in GPU memory to improve the performance of model inference functions and manages GPU memory globally to improve cache utilization. Third, it co-designs GPU function scheduling and cache management to optimize the performance of ML inference functions. Specifically, the thesis proposes locality-aware scheduling, which maximizes the utilization of both GPU memory for cache hits and GPU cores for parallel processing. A thorough evaluation based on real-world traces and ML models shows that the proposed GPU-enabled FaaS works well for ML inference tasks, and that the proposed locality-aware scheduler achieves a speedup of 34x compared to the default, load-balancing-only scheduler.
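A rough sense of what locality-aware scheduling means can be given with a small Python sketch. It is only one plausible reading of the abstract, not the thesis's scheduler: the `Gpu` class, the `schedule` function, the `max_queue` threshold, the count-based cache capacity, and the least-recently-used eviction policy are all assumptions introduced here. The scheduler prefers a GPU that already caches the requested model (a cache hit) unless that GPU is too busy, and otherwise falls back to the least-loaded GPU and loads the model there.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class Gpu:
    name: str
    capacity: int                      # number of models the cache can hold (simplified)
    cache: OrderedDict = field(default_factory=OrderedDict)  # model -> True, in LRU order
    queued: int = 0                    # requests currently assigned to this GPU


def schedule(gpus, model, max_queue=4):
    """Pick a GPU for one inference request; return (gpu_name, cache_hit)."""
    # Prefer GPUs that already hold the model and still have queue headroom.
    hits = [g for g in gpus if model in g.cache and g.queued < max_queue]
    if hits:
        gpu = min(hits, key=lambda g: g.queued)
    else:
        # Fall back to plain load balancing across all GPUs.
        gpu = min(gpus, key=lambda g: g.queued)

    hit = model in gpu.cache
    if hit:
        gpu.cache.move_to_end(model)       # mark as most recently used
    else:
        if len(gpu.cache) >= gpu.capacity:
            gpu.cache.popitem(last=False)  # evict the least recently used model
        gpu.cache[model] = True            # stands in for copying weights to GPU memory
    gpu.queued += 1
    return gpu.name, hit


if __name__ == "__main__":
    cluster = [Gpu("gpu0", capacity=1), Gpu("gpu1", capacity=1)]
    for requested in ["resnet", "resnet", "bert", "resnet"]:
        print(requested, schedule(cluster, requested))
```

The `max_queue` threshold is where the two goals in the abstract meet: below it, requests follow the cached model to avoid reloading weights; above it, they spill to other GPUs so that cores do not sit idle.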