On Processing Spatial Queries in Graph Database Management Systems

Description

Spatial data is fundamental in many applications, such as map services and land resource management. Meanwhile, spatial data inherently comes with abundant context information, because spatial entities themselves possess other properties, e.g., graph or textual information. Among such compound spatial data, geospatial graph data is one of the most challenging to manage because of the complexity of graph data. Graphs are commonly used to model real-world scenarios, and searching for matching subgraphs is fundamental to retrieving and analyzing graph data. With the ubiquity of spatial data, vertices and edges in graphs are enriched with spatial location attributes alongside other non-spatial attributes, and graph-based applications integrate spatial data into the graph model to provide more spatial-aware services. The co-existence of graph and spatial data in the same geospatial graph enables new applications. To solve the new problems arising in these applications, existing solutions build an integrated system that incorporates graph database and spatial database engines. However, these approaches suffer from an architecture in which graph data and spatial data are managed in isolation. In this dissertation, I present two indexing frameworks, GeoReach and RisoTree, which significantly accelerate queries over geospatial graphs.

GeoReach adds spatial-data awareness to a graph database management system through a new query operator. In GeoReach, the spatial information of each vertex's neighborhood is summarized and stored on the vertex, using one of three different structures chosen according to the location distribution. These spatial summaries are used to terminate the graph search early. RisoTree is a hierarchical tree structure in which each node is represented by a minimum bounding rectangle (MBR), the rectangle that encloses all of the node's children. A key difference between RisoTree and the R-tree is that RisoTree attaches pre-materialized subgraph information to each index node. This subgraph information is used during the spatial index search phase to prune search paths that cannot satisfy the query graph pattern, so the RisoTree index reduces the search space of the spatial filtering phase at relatively light cost.
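To make the pruning idea concrete, below is a minimal sketch of a RisoTree-style search, assuming a hypothetical node layout in which each index node carries its MBR plus a pre-materialized set of vertex labels for its subtree; the dissertation's actual structure is more elaborate.

```python
# Illustrative sketch only: each index node stores an MBR and a materialized
# summary (here, a set of vertex labels) of the subgraph beneath it. A branch
# is pruned when its MBR misses the query window or its summary cannot
# satisfy the labels required by the query graph pattern.
from dataclasses import dataclass, field

@dataclass
class Rect:
    xmin: float; ymin: float; xmax: float; ymax: float
    def intersects(self, other: "Rect") -> bool:
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)

@dataclass
class Node:
    mbr: Rect
    labels: set                                   # pre-materialized summary
    children: list = field(default_factory=list)  # empty at leaf level
    entries: list = field(default_factory=list)   # spatial vertices (leaves)

def search(node: Node, window: Rect, required: set, out: list) -> None:
    if not node.mbr.intersects(window) or not required <= node.labels:
        return                                    # pruned: spatial or graph
    if node.children:
        for child in node.children:
            search(child, window, required, out)
    else:
        out.extend(node.entries)                  # candidates for matching
```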
Date Created
2021

FPGA Acceleration of CNNs Using OpenCL

Description

Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in numerous applications such as computer vision, natural language processing, and robotics. The advancement of High-Performance Computing systems equipped with dedicated hardware accelerators has also paved the way toward the success of compute-intensive CNNs. Graphics Processing Units (GPUs), with their massive processing capability, have been of general interest for the acceleration of CNNs. Recently, Field Programmable Gate Arrays (FPGAs) have shown promise for CNN acceleration, since they offer high performance while also being re-configurable to support the evolution of CNNs. This work focuses on a design methodology to accelerate CNNs on FPGAs with low inference latency and high throughput, which are crucial for scenarios such as self-driving cars and video surveillance. It also includes optimizations that reduce resource utilization by a large margin with only a small degradation in performance, making the design suitable for low-end FPGA devices as well.

FPGA accelerators are often constrained by limited main-memory bandwidth. Moreover, highly parallel designs with large resource utilization often end up achieving low operating frequency due to poor routing. This work employs data fetch and buffer mechanisms, designed specifically for the memory access pattern of CNNs, that overlap computation with memory access. It also proposes a novel arrangement of the systolic processing element array that achieves higher frequency and consumes fewer resources than existing works, and it extends support to more complex CNNs for video processing. On an Intel Arria 10 GX1150, the design operates at a frequency as high as 258 MHz and performs a single inference of VGG-16 and C3D in 23.5 ms and 45.6 ms, respectively. For VGG-16 and C3D, the design offers a throughput of 66.1 and 23.98 inferences/s, respectively. This design can outperform other FPGA 2D CNN accelerators by up to 9.7 times and 3D CNN accelerators by up to 2.7 times.
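A quick sanity check on the figures above (using only the numbers quoted in this abstract): the reported throughput exceeds the reciprocal of the single-inference latency, which is consistent with the design overlapping computation and memory access across consecutive inferences.

```python
# Throughput vs. 1/latency for the quoted results. If inferences ran strictly
# back-to-back, throughput could not exceed 1000/latency_ms; the reported
# numbers are higher, implying pipelined (overlapped) execution.
latency_ms = {"VGG-16": 23.5, "C3D": 45.6}
reported = {"VGG-16": 66.1, "C3D": 23.98}          # inferences per second
for net, ms in latency_ms.items():
    serial_bound = 1000.0 / ms
    print(f"{net}: serial bound {serial_bound:.1f} inf/s, "
          f"reported {reported[net]} inf/s")
```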
Date Created
2020

System Support for Large-scale Geospatial Data Analytics

Description

The volume of available spatial data has increased tremendously. Such data includes weather maps, socioeconomic data, vegetation indices, geotagged social media, and more. Applications over such data need a powerful data management platform to support scalable and interactive analytics on big spatial data. Even though existing single-node spatial database management systems (DBMSs) provide support for spatial data, they suffer from performance issues when dealing with big spatial data. The challenges in building large-scale spatial data systems are as follows: (1) System scalability: the massive scale of available spatial data hinders making sense of it using traditional spatial database management systems. Moreover, large-scale spatial data, besides its tremendous storage footprint, may be extremely difficult to manage and maintain due to heterogeneous shapes, skewed data distributions, and complex spatial relationships. (2) Fast analytics: when a user runs spatial data analytics applications using graphical analytics tools, she does not tolerate delays introduced by the underlying spatial database system; she needs to see useful information quickly.

In this dissertation, I focus on designing efficient data systems and data indexing mechanisms to bolster scalable and interactive analytics on large-scale geospatial data. I first propose GeoSpark, a cluster computing system that extends the core engine of Apache Spark and Spark SQL to support spatial data types, indexes, and geometrical operations at scale. To reduce indexing overhead, I propose Hippo, a fast yet scalable sparse database indexing approach. In contrast to existing tree index structures, Hippo stores disk page ranges (each serving as a pointer to one or more pages) instead of tuple pointers into the indexed table, reducing the storage space occupied by the index. Moreover, I present Tabula, a middleware framework that sits between a SQL data system and a spatial visualization dashboard to make the user experience with the dashboard more seamless and interactive. Tabula adopts a materialized sampling cube approach, which pre-materializes samples, not for the entire table as in the SampleFirst approach, but for the results of potentially unforeseen queries (represented by an OLAP cube cell).
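A minimal sketch of the page-range idea behind Hippo, under simplifying assumptions (one value summary per run of pages; Hippo's actual summaries and maintenance logic are richer): instead of one index entry per tuple, the index keeps one compact entry per contiguous run of disk pages, and a query scans only the runs whose summaries overlap its predicate.

```python
# Illustrative page-range index: far fewer entries than a per-tuple index.
from dataclasses import dataclass

@dataclass
class PageRangeEntry:
    first_page: int
    last_page: int
    lo: float          # smallest indexed value stored in these pages
    hi: float          # largest indexed value stored in these pages

def pages_to_scan(index, lo, hi):
    """Return page ranges that may contain tuples with value in [lo, hi]."""
    return [(e.first_page, e.last_page)
            for e in index if e.lo <= hi and lo <= e.hi]

# Usage: three runs of 100 pages each; only overlapping runs are fetched.
index = [PageRangeEntry(0, 99, 0.0, 10.0),
         PageRangeEntry(100, 199, 10.0, 55.0),
         PageRangeEntry(200, 299, 55.0, 90.0)]
print(pages_to_scan(index, 45.0, 60.0))   # [(100, 199), (200, 299)]
```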
Date Created
2020

Mengde Signatures: The First Practical Implementation Of Proxy Digital Signatures

Description

Proxy digital signatures are a subset of proxy cryptography that enable a peer, as a proxy delegator, to delegate signing privileges to another trusted peer, who becomes a proxy signer. The proxy signer then signs authorized transactions routed to it by the proxy delegator and sends them to the intended third party on the delegator's behalf. This has great applications in computer networks where certain devices lack sufficient computational power to secure themselves and must rely on trusted, computationally more powerful peers, particularly within edge and fog networks. Although multiple proxy digital signature schemes circulate in cryptography-centric research papers, a practical software implementation has yet to be created. In this paper, we describe Mengde Signatures: the first practical software implementation of proxy digital signatures. We expound upon the architecture and process by which proxy signatures are implemented and function in a software engineering context. Although applicable to many different types of networks, we showcase the application of Mengde Signatures on an open-source Proof-of-Work blockchain.
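For intuition, here is a minimal delegation-by-warrant flow, a generic proxy-signature construction sketched with Ed25519 from the Python cryptography package; it illustrates the roles described above and is not the Mengde scheme itself.

```python
# Generic delegation-by-warrant sketch (illustrative; not Mengde Signatures).
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

delegator = Ed25519PrivateKey.generate()
proxy = Ed25519PrivateKey.generate()

# 1. Delegation: the delegator signs a warrant binding the proxy's public key.
proxy_pub = proxy.public_key().public_bytes(Encoding.Raw, PublicFormat.Raw)
warrant = b"authorize-proxy:" + proxy_pub
delegation_sig = delegator.sign(warrant)

# 2. Proxy signing: the proxy signs a transaction on the delegator's behalf.
transaction = b"transfer 10 units to peer C"
proxy_sig = proxy.sign(transaction)

# 3. Verification: the third party checks the warrant, then the signature.
delegator.public_key().verify(delegation_sig, warrant)   # raises if invalid
proxy.public_key().verify(proxy_sig, transaction)        # raises if invalid
print("transaction accepted under delegator's warrant")
```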
Date Created
2020-12

PCI and Personal Data Anonymization

Description

In the last few years, billion-dollar companies like Yahoo and Equifax have had data breaches causing millions of people's personal information to be leaked online. Other billion-dollar companies like Google and Facebook have gotten in trouble for abusing people's personal information for financial gain as well. In this new age of technology, where everything is being digitized and stored online, people all over the world are concerned about what is happening to their personal information and how they can trust it is being kept safe. This paper describes, first, the importance of protecting user data; second, one easy tool that companies and developers can use to help ensure that their users' information (credit card information specifically) is kept safe, and how to implement that tool; and finally, future work and research that needs to be done. The solution I propose is a software tool that keeps credit card data secured. It is only a small step towards achieving a completely secure, anonymized data system, but when implemented correctly, it can reduce the risk of credit card data being exposed to the public. The software tool is a script that can scan every viable file in a given system, server, or other file-structured Linux environment and detect whether there are any visible credit card numbers that should be hidden.
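A minimal sketch of how such a scanner can work (illustrative, and not the thesis's actual script): walk the file tree, flag digit sequences shaped like card numbers, and confirm candidates with the Luhn checksum to reduce false positives.

```python
# Walk a directory tree and report likely credit card numbers in plain text.
import os, re

CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total, parity = 0, len(digits) % 2
    for i, ch in enumerate(digits):
        d = int(ch)
        if i % 2 == parity:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0

def scan(root: str) -> None:
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                text = open(path, errors="ignore").read()
            except OSError:
                continue                 # unreadable file; skip it
            for m in CANDIDATE.finditer(text):
                digits = re.sub(r"[ -]", "", m.group())
                if luhn_ok(digits):
                    print(f"{path}: possible card number {digits[:6]}...")

# scan("/var/www")   # example invocation
```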
Date Created
2020-05

The Necessity of Error Correction In The Quantum World

Description

Quantum computers promise a future in which computationally difficult problems can be solved exponentially faster than on the classical computers we use today. While there is tremendous research and development in the creation of quantum computers, a fundamental challenge exists in the quantum world. Due to the fragility of quantum states, error correction methods have been developed since 1995 to tackle this giant problem. Since the birth of the idea that these powerful computers could crunch and process numbers beyond the limits of current computers, several mathematical error-correcting codes have been proposed that could potentially provide the required stability in the fragile, error-prone quantum world. While there has been a multitude of possible solutions, no single error-correcting code is the key to solving the problem: almost every solution presented carries a limiting factor or an issue that prevents it from becoming the breakthrough that is desperately needed.
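As a classical analogy (my own illustration, not from the thesis): the three-bit repetition code shows the core idea of redundancy plus majority voting that quantum codes, such as the three-qubit bit-flip code, generalize while respecting the no-cloning theorem.

```python
# Classical 3-bit repetition code: encode by copying, decode by majority vote.
# This corrects any single bit flip; quantum bit-flip codes extend the idea
# using entanglement rather than copying (qubits cannot be cloned).
def encode(bit: int) -> list:
    return [bit, bit, bit]

def decode(word: list) -> int:
    return 1 if sum(word) >= 2 else 0     # majority vote

codeword = encode(1)
codeword[0] ^= 1                          # inject a single bit-flip error
assert decode(codeword) == 1              # the error is corrected
```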

This paper gives an introductory overview of what the quantum world is and why error-correcting topologies are needed. Finally, it introduces one recent topology that could be added to the list of possible solutions to this central problem. Rather than focusing on the mathematical frameworks, the paper introduces the main concepts so that most readers, even those outside computer science, can understand what the main problem is and how this topology attempts to solve it.
Date Created
2020-05

Enabling Peer to Peer Energy Trading Marketplace Using Consortium Blockchain Networks

Description

Blockchain technology enables peer-to-peer transactions by eliminating the need for a centralized entity to govern consensus. Rather than residing in a centralized database, the data is distributed across multiple computers, which enables crash fault tolerance and, thanks to a distributed consensus algorithm, makes the system difficult to tamper with.

In this research, the potential of blockchain technology to manage energy transactions is examined. The energy production landscape is being reshaped by distributed energy resources (DERs): photovoltaic panels, electric vehicles, smart appliances, and battery storage. Distributed energy sources such as microgrids, household and community solar installations, and plug-in hybrid vehicles enable energy consumers to act as providers of energy themselves, hence becoming 'prosumers' of energy.

Blockchain technology facilitates transactions among the involved prosumers through 'smart contracts' that tokenize energy into assets. Better utilization of grid assets lowers costs and also presents the opportunity to buy energy at a reasonable price while staying connected with the utility company. This technology acts as a backbone for two models of a transactional energy marketplace: a 'Real-Time Energy Marketplace' and 'Energy Futures'. In the first model, prosumers are given the choice to bid a price for energy within a stipulated period of time, while the utility company acts as the operating entity. In the second model, the marketplace is more liberal and the utility company is not involved as an operator; it provides the infrastructure and manages accounts for all users but does not endorse or govern transactions related to energy bidding. These smart contracts are not time-bounded and can be suspended by the utility during periods of network instability.
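To illustrate the real-time model, here is a toy matching engine for prosumer bids, a hedged sketch in plain Python; an actual deployment would encode this logic in smart contracts on the consortium network, with the utility as operator.

```python
# Toy double auction: highest buy bids matched with cheapest sell offers.
from dataclasses import dataclass

@dataclass
class Order:
    prosumer: str
    kwh: float
    price: float      # offered (sell) or maximum (buy) price per kWh

def match(buys, sells):
    buys = sorted(buys, key=lambda o: -o.price)
    sells = sorted(sells, key=lambda o: o.price)
    trades = []
    while buys and sells and buys[0].price >= sells[0].price:
        b, s = buys[0], sells[0]
        qty = min(b.kwh, s.kwh)
        trades.append((b.prosumer, s.prosumer, qty, s.price))
        b.kwh -= qty; s.kwh -= qty
        if b.kwh == 0: buys.pop(0)
        if s.kwh == 0: sells.pop(0)
    return trades

# A wants up to 5 kWh at 0.12; B and C offer 3 kWh at 0.10 and 4 kWh at 0.11.
print(match([Order("A", 5, 0.12)], [Order("B", 3, 0.10), Order("C", 4, 0.11)]))
```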
Date Created
2019

A Comparative Study on the Performance Isolation of Virtualization Technologies

Description

Virtualization technologies are widely used in modern computing systems to deliver shared resources to heterogeneous applications. Virtual Machines (VMs) are the basic building blocks for Infrastructure as a Service (IaaS), and containers are widely used to provide Platform as a Service (PaaS). Although it is generally believed that containers have less overhead than VMs, an important tradeoff that has not been thoroughly studied is the effectiveness of performance isolation, i.e., to what extent the virtualization technology prevents applications from affecting each other's performance when they share resources using separate VMs or containers. Such isolation is critical to providing performance guarantees for applications consolidated using VMs or containers. This paper provides a comprehensive study of performance isolation for three widely used virtualization technologies: full virtualization, para-virtualization, and operating-system-level virtualization, using Kernel-based Virtual Machine (KVM), Xen, and Docker containers as representative implementations. The results show that containers generally have less performance loss (up to 69% and 41% less than KVM and Xen in network latency experiments, respectively) and better scalability (up to 83.3% and 64.6% faster than KVM and Xen, respectively, when increasing the number of VMs/containers to 64), but they also suffer from much worse isolation (up to 111.8% and 104.92% slowdown compared to KVM and Xen, respectively, when adding a disk stress test to TeraSort experiments under the full usage (FU) scenario). Resource reservation tools help virtualization technologies achieve better performance (up to 85.9% better disk performance in TeraSort under the FU scenario), but cannot help them avoid all impacts.
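For clarity, the slowdown percentages quoted above can be read as relative runtime increases under contention; a sketch of the metric (with placeholder numbers, not the paper's data) follows.

```python
# Slowdown metric: percent increase in runtime when a noisy neighbor shares
# the host (e.g., TeraSort alone vs. TeraSort plus a disk stress test in a
# sibling VM/container). Numbers below are placeholders, not measured data.
def slowdown_pct(baseline_s: float, contended_s: float) -> float:
    return (contended_s - baseline_s) / baseline_s * 100.0

print(f"{slowdown_pct(100.0, 211.8):.1f}% slowdown")   # -> 111.8% slowdown
```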
Date Created
2019

Activity Specification for Time-based Discrete Event Simulation Models

Description

Computational models for relatively complex systems are subject to many difficulties, among them making the models discretely understandable and applicable to specific problem types and their solutions. This demands specifying a dynamic system as a collection of models, including metamodels. In this context, new modeling approaches and tools can help provide a richer understanding and, therefore, support the development of sophisticated behavior in system dynamics. From this vantage point, an activity specification is proposed as a modeling approach based on a time-based discrete event system abstraction. Such models are founded upon set-theoretic principles and methods for modeling and simulation, with the intent of making them subject to specific and profound questions in user-defined experiments.

Because developing models is becoming more time-consuming and expensive, some research has focused on the acquisition of concrete means targeted at the early stages of component-based system analysis and design. The model-driven architecture (MDA) framework provides some means for the behavioral modeling of discrete systems. The development of models can benefit from simplifications and elaborations enabled by the MDA meta-layers, which is essential for managing model complexity. Although metamodels pose difficulties, especially for developing complex behavior, as opposed to structure, they are advantageous and complementary to formal models and concrete implementations in programming languages.

The developed approach focuses on action and control concepts across the MDA meta-layers and is proposed for the Parallel Discrete Event System Specification (P-DEVS) formalism. The Unified Modeling Language (UML) activity meta-models are used with syntax and semantics that conform to the DEVS formalism and its execution protocol. The notions of the DEVS component and state are used together according to their underlying system-theoretic foundation. A prototype tool supporting activity modeling was developed to demonstrate the degree to which action-based behavior can be modeled using the MDA and DEVS. Parallel DEVS, as a formal approach, supports identifying the semantics of the UML activities. Another prototype was developed to create activity models and support their execution with the DEVS-Suite simulator, and a set of prototypical multiprocessor architecture model specifications was designed, simulated, and analyzed.
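For readers unfamiliar with DEVS, the sketch below shows the shape of an atomic DEVS model, a generic illustration under my own simplifications rather than the thesis's P-DEVS tooling: a model is defined by its state, time-advance function, external and internal transition functions, and output function.

```python
# Minimal atomic DEVS model: a processor that serves one job at a time.
class Processor:
    def __init__(self, service_time=2.0):
        self.service_time = service_time
        self.phase, self.sigma, self.job = "idle", float("inf"), None

    def ta(self):                          # time advance: time to next event
        return self.sigma

    def delta_ext(self, elapsed, job):     # external transition: job arrives
        if self.phase == "idle":
            self.phase, self.sigma, self.job = "busy", self.service_time, job
        else:
            self.sigma -= elapsed          # busy: ignore arrival, keep clock

    def out(self):                         # output function (lambda)
        return self.job

    def delta_int(self):                   # internal transition: job done
        self.phase, self.sigma, self.job = "idle", float("inf"), None
```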
Date Created
2019

A Study on Resources Utilization of Deep Learning Workloads

Description

Deep learning and AI have grabbed tremendous attention in the last decade. The substantial accuracy improvements delivered by neural networks in common tasks such as image classification and speech recognition have made deep learning a replacement for many conventional machine learning techniques. Training deep neural networks requires a lot of data, and therefore vast amounts of computing resources to process the data and train the model. The most obvious response to this problem is to speed up the training of deep neural networks.
AI and deep learning workloads differ from conventional cloud and mobile workloads with respect to (1) computational intensity, (2) I/O characteristics, and (3) communication pattern. While there is considerable research activity on the theoretical side, making AI and deep learning algorithms run with greater efficiency, there are only a few studies on the infrastructural impact of deep learning workloads on computing and storage resources in distributed systems.
It is typical to utilize a heterogeneous mixture of CPU and GPU devices to train a neural network. Google Brain has developed a reinforcement learning model that can place training operations across a heterogeneous cluster, though it has only been tested with local devices in a single cluster. This study explores the method's capabilities and attempts to apply it on a cluster with nodes across a network.
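As a point of reference for the placement problem itself (my own toy baseline, not Google Brain's reinforcement model), a greedy least-loaded placement can be sketched as follows; an RL placer additionally learns to account for communication costs between devices, which this sketch ignores.

```python
# Toy device placement: assign each operation to the least-loaded device.
def place(ops, devices):
    """ops: {name: cost_seconds}; devices: list of device names."""
    finish = {d: 0.0 for d in devices}
    placement = {}
    for op, cost in sorted(ops.items(), key=lambda kv: -kv[1]):
        d = min(finish, key=finish.get)    # currently least-loaded device
        placement[op] = d
        finish[d] += cost
    return placement, max(finish.values())

ops = {"conv1": 4.0, "conv2": 3.0, "fc": 1.0, "softmax": 0.5}
print(place(ops, ["gpu:0", "gpu:1", "cpu:0"]))
```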
Date Created
2019-05