EdgeFaaS: A Function-based Framework for Edge Computing

168534-Thumbnail Image.png
Description
The rapid growth of data generated from Internet of Things (IoTs) such as smart phones and smart home devices presents new challenges to cloud computing in transferring, storing, and processing the data. With increasingly more powerful edge devices, edge computing,

The rapid growth of data generated from Internet of Things (IoTs) such as smart phones and smart home devices presents new challenges to cloud computing in transferring, storing, and processing the data. With increasingly more powerful edge devices, edge computing, on the other hand, has the potential to better responsiveness, privacy, and cost efficiency. However, resources across the cloud and edge are highly distributed and highly diverse. To address these challenges, this paper proposes EdgeFaaS, a Function-as-a-Service (FaaS) based computing framework that supports the flexible, convenient, and optimized use of distributed and heterogeneous resources across IoT, edge, and cloud systems. EdgeFaaS allows cluster resources and individual devices to be managed under the same framework and provide computational and storage resources for functions. It provides virtual function and virtual storage interfaces for consistent function management and storage management across heterogeneous compute and storage resources. It automatically optimizes the scheduling of functions and placement of data according to their performance and privacy requirements. EdgeFaaS is evaluated based on two edge workflows: video analytics workflow and federated learning workflow, both of which are representative edge applications and involve large amounts of input data generated from edge devices.
Date Created
2021
Agent

Distributed RDF Storage and Querying Using In-Memory Processing Engine

161510-Thumbnail Image.png
Description
The proliferation of semantic data in the form of RDF (Resource Description Framework) triples demands an efficient, scalable, and distributed storage along with a highly available and fault-tolerant parallel processing strategy. There are three open issues with distributed RDF data

The proliferation of semantic data in the form of RDF (Resource Description Framework) triples demands an efficient, scalable, and distributed storage along with a highly available and fault-tolerant parallel processing strategy. There are three open issues with distributed RDF data management systems that are not well addressed altogether in existing work. First is the querying efficiency, second is that solutions are optimized for certain types of query patterns and don’t necessarily work well for all types, and third is concerned with reducing pre-processing cost. Therefore, the rapid growth of RDF data raises the need for an efficient partitioning strategy over distributed data management systems to improve SPARQL (SPARQL Protocol and RDF Query Language) query performance regardless of its pattern shape with minimized pre-processing overhead. In this context, the first contribution of this work is a distributed RDF data partitioning schema called 3CStore that extends the existing VP (Vertical Partitioning) approach by using a subset of triples from the VP tables based on different join correlations. This approach speeds up queries at the cost of additional pre-processing overhead. To solve this, a relational partitioning schema called VPExp was developed by splitting predicates based on explicit type information of objects. This approach gains a significant query performance only for the specific type of query where the object is bound to a value for a particular predicate. To get efficient query performance on a wide range of query patterns, an improved solution is proposed by extending the existing Property Table approach to Subset-Property Table and combined with the VP approach. Further investigation on distributed RDF processing and querying systems based on typical use cases led to a novel relational partitioning schema called PTP (Property Table Partitioning) that further partitions the whole Property Table into the number of unique properties to minimize query input size and join operations during query evaluation. Finally, an RDF data management system based on the SPARQL-over-SQL approach called S3QLRDF is developed that generates the optimal query execution plan using statistics of PTP tables to provide efficient SPARQL query processing on a distributed system.
Date Created
2021
Agent

System Support for Large-scale Geospatial Data Analytics

158510-Thumbnail Image.png
Description
The volume of available spatial data has increased tremendously. Such data includes but is not limited to: weather maps, socioeconomic data, vegetation indices, geotagged social media, and more. These applications need a powerful data management platform to support scalable and

The volume of available spatial data has increased tremendously. Such data includes but is not limited to: weather maps, socioeconomic data, vegetation indices, geotagged social media, and more. These applications need a powerful data management platform to support scalable and interactive analytics on big spatial data. Even though existing single-node spatial database systems (DBMSs) provide support for spatial data, they suffer from performance issues when dealing with big spatial data. Challenges to building large-scale spatial data systems are as follows: (1) System Scalability: The massive-scale of available spatial data hinders making sense of it using traditional spatial database management systems. Moreover, large-scale spatial data, besides its tremendous storage footprint, may be extremely difficult to manage and maintain due to the heterogeneous shapes, skewed data distribution and complex spatial relationship. (2) Fast analytics: When the user runs spatial data analytics applications using graphical analytics tools, she does not tolerate delays introduced by the underlying spatial database system. Instead, the user needs to see useful information quickly.

In this dissertation, I focus on designing efficient data systems and data indexing mechanisms to bolster scalable and interactive analytics on large-scale geospatial data. I first propose a cluster computing system GeoSpark which extends the core engine of Apache Spark and Spark SQL to support spatial data types, indexes, and geometrical operations at scale. In order to reduce the indexing overhead, I propose Hippo, a fast, yet scalable, sparse database indexing approach. In contrast to existing tree index structures, Hippo stores disk page ranges (each works as a pointer of one or many pages) instead of tuple pointers in the indexed table to reduce the storage space occupied by the index. Moreover, I present Tabula, a middleware framework that sits between a SQL data system and a spatial visualization dashboard to make the user experience with the dashboard more seamless and interactive. Tabula adopts a materialized sampling cube approach, which pre-materializes samples, not for the entire table as in the SampleFirst approach, but for the results of potentially unforeseen queries (represented by an OLAP cube cell).
Date Created
2020
Agent

Prototyping a Mobile Application Aimed to Streamline the Process of Preparing for Medical School Applications

132147-Thumbnail Image.png
Description
Less than half of all premedical applicants get accepted into a medical school, 39.3% of applicants to be precise, and that statistic is based on the number of matriculants out of the total applicants in 2015. With such a discouraging

Less than half of all premedical applicants get accepted into a medical school, 39.3% of applicants to be precise, and that statistic is based on the number of matriculants out of the total applicants in 2015. With such a discouraging acceptance rate, many students who start out as premed are often not towards the end of their undergraduate career and post-graduation because they do not feel prepared for medical school. It’s difficult for premed students to find all the information they need in one place rather than going from place to place or school website to school website. Additionally, it can be a hassle for premeds to keep track of all their coursework and calculate separate GPAs for each category especially due to how annoying Excel spread sheets can be. This is where the conceptualization of Premed Portfolio comes in. Premed Portfolio is a prototype mobile application. Premed Portfolio aims to streamline the process of preparing for medical school by guiding students to create a portfolio aimed to address the most important aspects of a medical school application. Students will be able to keep track of their cumulative GPA, BCPM (also known as science/math) GPA, MCAT Scores, prerequisite coursework and many more targeted areas of medical school. Premed Portfolio will also hope to use the stats that students provide and educate them on their chances of getting into medical school.
Date Created
2019-05
Agent