Discovering Partial-Value Associations and Applications

Description
Existing machine learning and data mining techniques have difficulty handling three characteristics of real-world data sets together in a computationally efficient way: (1) mixed data types, with both categorical and numeric data; (2) different variable relations in different value ranges of variables; and (3) unknown variable dependency. This dissertation develops a Partial-Value Association Discovery (PVAD) algorithm to overcome these drawbacks of existing techniques. The algorithm discovers both partial-value and full-value variable associations, revealing the effects of individual variables as well as the interactive effects of multiple variables. For validation, the algorithm is compared with association rule mining and decision trees, and the results show that the PVAD algorithm can overcome the shortcomings of these existing methods. The second part of this dissertation focuses on knee point detection in noisy data. This extension arose from the investigation into categorization of numeric data, which corresponds to Step 1 of the PVAD algorithm. A new mathematical definition of a knee point on discrete data is introduced. Because no ground-truth or benchmark data sets are available, the functions used to generate synthetic data are carefully selected and defined, and these functions are then used to create the data sets for the experiments. These synthetic data sets make it possible to evaluate and compare the performance of existing methods systematically. In addition, a deep-learning model is devised for this problem. Experiments show that the proposed model outperforms existing methods on all synthetic data sets, whether the samples have a single knee point or multiple knee points. The third part presents the results of applying the PVAD algorithm to real-world data sets from various domains.
These include energy consumption data from an Arizona State University (ASU) building, computer network data, and ASU engineering freshman retention data. The PVAD algorithm is used to create an associative network for energy consumption modeling, to analyze univariate and multivariate measures of network flow variables, and to identify common and uncommon characteristics related to the retention of engineering students after their first year at the university. The findings demonstrate the PVAD algorithm's capability to uncover variable relationships.
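To make the partial-value versus full-value distinction concrete, the following Python sketch categorizes a numeric variable into value ranges (loosely analogous to Step 1) and then checks, range by range, whether another variable's value is strongly implied. It is not the PVAD algorithm, whose steps are not given here; the confidence threshold `min_conf`, the cut points, the category labels, and the temperature/load readings are all hypothetical.

```python
import bisect
from collections import Counter, defaultdict


def categorize(value, edges, labels):
    """Map a numeric value to a category label.

    `edges` are sorted cut points and len(labels) == len(edges) + 1.
    """
    return labels[bisect.bisect_right(edges, value)]


def conditional_confidence(records, x_key, y_key):
    """Return {x: {y: P(Y = y | X = x)}} estimated from counts."""
    joint = Counter((r[x_key], r[y_key]) for r in records)
    x_counts = Counter(r[x_key] for r in records)
    table = defaultdict(dict)
    for (x, y), n in joint.items():
        table[x][y] = n / x_counts[x]
    return dict(table)


def classify_association(records, x_key, y_key, min_conf=0.8):
    """Label the X -> Y relation as 'full-value', 'partial-value', or 'none'.

    Hypothetical criterion: value x of X counts as associated if some y
    reaches confidence >= min_conf given X = x.
    """
    table = conditional_confidence(records, x_key, y_key)
    strong = {x for x, dist in table.items() if max(dist.values()) >= min_conf}
    if not strong:
        return "none", strong
    if strong == set(table):
        return "full-value", strong  # every value range implies Y
    return "partial-value", strong   # only some value ranges imply Y


# Hypothetical readings: (outdoor temperature in degrees C, building load level).
readings = [(12, "low"), (14, "low"), (15, "low"), (21, "low"),
            (23, "high"), (31, "high"), (33, "high"), (35, "high")]
records = [{"temp": categorize(t, [18, 28], ["cold", "mild", "hot"]), "load": load}
           for t, load in readings]
kind, strong = classify_association(records, "temp", "load")
print(kind, sorted(strong))  # partial-value ['cold', 'hot']
```

Here load is determined by temperature only in the cold and hot ranges, not in the mild range, which is the kind of range-specific relation a full-value measure computed over all values of a variable could dilute.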
Date Created
2023

Data and Predictive Analytics for Energy Use

Description
The overall energy consumption of the United States has not decreased despite technological advances over the past decades, and gaps persist between designed and actual energy performance. Energy Infrastructure Systems (EIS) are affected when energy production cannot be forecast accurately and efficiently. Inaccurate engineering assumptions can result from a lack of understanding of how energy systems operate in real-world applications. Because energy systems are complex and their structural system models are unknown, their behavior is poorly understood. Data mining techniques for reverse engineering, which are needed to develop efficient structural system models, are currently lacking. In this project, a new type of reverse engineering algorithm has been applied to a year's worth of energy data collected from MacroTechnology Works, an ASU research building, to identify its structural system model. Developing and understanding structural system models is the first step toward accurate predictive analytics for energy production. The associative network derived from the building's data is highlighted to depict the structural model accurately. This structural model will enhance the energy efficiency of energy infrastructure systems, reduce energy waste, and narrow the gaps between energy infrastructure design, planning, operation, and management (DPOM).
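The description does not specify how the associative network is built. As a hedged illustration of one common way to derive a network of variable associations from categorized data, the sketch below connects two variables when their mutual information meets a threshold. This mutual-information criterion is a stand-in chosen for illustration, not the reverse engineering algorithm applied to the MacroTechnology Works data, and the variable names, values, and `threshold` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Mutual information (in bits) between two categorical sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), count in pxy.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((px[x] / n) * (py[y] / n)))
    return mi


def association_network(columns, threshold=0.5):
    """Return edges (a, b, mi) for variable pairs whose MI meets the threshold."""
    edges = []
    for a, b in combinations(sorted(columns), 2):
        mi = mutual_information(columns[a], columns[b])
        if mi >= threshold:
            edges.append((a, b, round(mi, 3)))
    return edges


# Hypothetical, already-categorized building measurements (one entry per hour).
columns = {
    "occupancy": ["on", "on", "off", "off"],
    "hvac_load": ["high", "high", "low", "low"],
    "weekday": ["mon", "tue", "mon", "tue"],
}
print(association_network(columns))
```

In this toy data, occupancy fully determines the HVAC load level (1 bit of mutual information), while the weekday carries no information about either, so only one edge appears. A pairwise measure like this captures only two-variable relations; the multivariate, range-specific associations described above need more than a single score per pair.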
Date Created
2016-12

Centralized and decentralized methods of efficient resource allocation in cloud computing

Description
Resource allocation in cloud computing determines how the computing and network resources of service providers are allocated to the service requests of cloud users so that the users' service requirements are met. Efficient and effective resource allocation determines the success of cloud computing. However, it is challenging to satisfy the objectives of all service providers and all cloud users in an unpredictable environment with dynamic workloads, large shared resources, and complex policies for managing them.

Many studies propose centralized algorithms for achieving optimal resource allocation. However, centralized algorithms may not scale to a large number of service requests within a realistically acceptable time. Hence, this dissertation presents two studies. The first develops and tests heuristics for centralized resource allocation that produce near-optimal solutions in a scalable manner. The second investigates decentralized methods of performing resource allocation.

The first part of this dissertation defines the resource allocation problem as a centralized optimization problem in Mixed Integer Programming (MIP) and obtains optimal solutions for various resource-service problem scenarios. Based on an analysis of these optimal solutions, several heuristics are designed for efficient resource allocation. Extended experiments with larger numbers of user requests and service providers are conducted to evaluate the performance of the heuristics. The results show that the heuristics perform comparably to the optimal solutions obtained by solving the optimization problem, while offering better computational efficiency and thus better scalability.
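The contrast between exact optimization and a fast heuristic can be sketched on a toy instance. In this Python sketch, each request has a (demand, value) pair and each provider a capacity; exhaustive enumeration stands in for solving the MIP, and a value-density greedy rule with best-fit provider selection stands in for a heuristic. The model, the heuristic, and the instance are assumptions for illustration, not the dissertation's formulation or its heuristics.

```python
from itertools import product


def total_value(assignment, requests):
    """Sum the values of requests that were assigned to some provider."""
    return sum(value for (_, value), p in zip(requests, assignment) if p is not None)


def is_feasible(assignment, requests, capacities):
    """Check that no provider receives more demand than its capacity."""
    used = [0] * len(capacities)
    for (demand, _), p in zip(requests, assignment):
        if p is not None:
            used[p] += demand
    return all(u <= c for u, c in zip(used, capacities))


def optimal_value(requests, capacities):
    """Exhaustive search; a stand-in for an MIP solver on tiny instances."""
    choices = [None] + list(range(len(capacities)))  # None = request rejected
    feasible = (a for a in product(choices, repeat=len(requests))
                if is_feasible(a, requests, capacities))
    return max(total_value(a, requests) for a in feasible)


def greedy_value(requests, capacities):
    """Value-density greedy with best-fit provider selection (demands > 0)."""
    remaining = list(capacities)
    value = 0
    order = sorted(range(len(requests)),
                   key=lambda i: requests[i][1] / requests[i][0], reverse=True)
    for i in order:
        demand, v = requests[i]
        fits = [p for p, r in enumerate(remaining) if r >= demand]
        if fits:
            best = min(fits, key=lambda p: remaining[p])  # tightest fit
            remaining[best] -= demand
            value += v
    return value


# Hypothetical instance: (demand, value) per request, capacity per provider.
requests = [(6, 7), (5, 5), (5, 5), (2, 1)]
capacities = [10, 2]
print(greedy_value(requests, capacities), optimal_value(requests, capacities))
```

On this instance the greedy rule returns 8 against an optimum of 11, which is why heuristics are evaluated against optimal solutions. The enumeration, however, grows as (providers + 1) raised to the number of requests, which is the scalability problem that motivates heuristics in the first place.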

The second part of this dissertation examines elements of service provider-user coordination, first in the centralized MIP formulation of the resource allocation problem and then in a decentralized formulation of the optimization problem, for various problem cases. By comparing the centralized, optimal solutions with the decentralized solutions for these cases, the dissertation analyzes how decentralized service provider-user coordination causes solutions to deviate from the optimum. Based on this analysis, strategies for decentralized service provider-user coordination are developed.
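One well-known way for users and providers to coordinate without a central solver is deferred acceptance (Gale-Shapley style), in which users propose to providers in order of preference and each provider tentatively holds its most-preferred requests up to its capacity. The sketch below is offered only as a hypothetical example of decentralized coordination; the preference lists, provider rankings, and capacities are invented, and the dissertation's own coordination strategies are not reproduced.

```python
def deferred_acceptance(user_prefs, provider_rank, capacity):
    """Many-to-one deferred acceptance (users propose, providers hold).

    user_prefs:    {user: [provider, ...]} most preferred first
    provider_rank: {provider: {user: rank}} lower rank is preferred
    capacity:      {provider: number of requests it can host}
    """
    next_choice = {u: 0 for u in user_prefs}
    held = {p: [] for p in capacity}
    free = list(user_prefs)
    while free:
        u = free.pop()
        prefs = user_prefs[u]
        if next_choice[u] >= len(prefs):
            continue  # rejected by every acceptable provider: stays unassigned
        p = prefs[next_choice[u]]
        next_choice[u] += 1
        held[p].append(u)
        held[p].sort(key=lambda x: provider_rank[p][x])
        if len(held[p]) > capacity[p]:
            free.append(held[p].pop())  # provider rejects its least-preferred user
    return {u: p for p, users in held.items() for u in users}


# Hypothetical users, providers, and capacities.
user_prefs = {"u1": ["A", "B"], "u2": ["A", "B"], "u3": ["A"]}
provider_rank = {"A": {"u1": 1, "u2": 0, "u3": 2}, "B": {"u1": 0, "u2": 1, "u3": 2}}
capacity = {"A": 1, "B": 2}
match = deferred_acceptance(user_prefs, provider_rank, capacity)
print(match)
```

Each user proposes to each provider at most once, so the loop terminates. Here u3 ends up unserved even though provider B has a free slot, because u3 only listed A; comparing such a decentralized outcome with a centralized optimum on the same instance is the kind of analysis the second part describes.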
Date Created
2016

Assurance management framework for access control systems

Description
Access control is one of the most fundamental security mechanisms in the design and management of modern information systems. However, it remains an open question how formal access control models can be automatically analyzed and fully realized in secure system development. Furthermore, specifying and managing access control policies is often error-prone due to the lack of effective analysis mechanisms and tools. In this dissertation, I present an Assurance Management Framework (AMF) designed to address various assurance management requirements arising from both access control system development and policy-based computing. On the one hand, the AMF framework facilitates comprehensive analysis and thorough realization of formal access control models in secure system development. I demonstrate how this method can be applied to build role-based access control systems by adopting the NIST/ANSI RBAC standard as the underlying security model. On the other hand, the AMF framework ensures the correctness of access control policies in policy-based computing through automated reasoning techniques and anomaly management mechanisms. A systematic method is presented for formulating XACML in Answer Set Programming (ASP), which allows users to leverage off-the-shelf ASP solvers for a variety of analysis services. In addition, I introduce a novel anomaly management mechanism, along with a grid-based visualization approach, that enables systematic and effective detection and resolution of policy anomalies. I further evaluate the AMF framework by modeling and analyzing multiparty access control in Online Social Networks (OSNs). A MultiParty Access Control (MPAC) model is formulated to capture the essence of multiparty authorization requirements in OSNs. In particular, I show how AMF can be applied to OSNs to identify and resolve privacy conflicts and to represent and reason about the MPAC model and its policies.
To demonstrate the feasibility of the proposed methodology, a suite of proof-of-concept prototype systems is implemented as well.
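The dissertation formulates XACML in Answer Set Programming and relies on ASP solvers for analysis. The Python sketch below does not use ASP; it only illustrates what a conflict anomaly between two rules looks like (overlapping targets, opposite effects) and how two standard XACML rule-combining algorithms, first-applicable and deny-overrides, can resolve the same request differently. The rule set, the three attributes, and the simplified combining logic (no Indeterminate results, no obligations) are assumptions.

```python
from dataclasses import dataclass
from itertools import combinations

ATTRIBUTES = ("subject", "action", "resource")


@dataclass
class Rule:
    rule_id: str
    effect: str   # "Permit" or "Deny"
    target: dict  # attribute name -> set of matching values


def overlaps(r1, r2):
    """True if some request could match both rules."""
    return all(r1.target[a] & r2.target[a] for a in ATTRIBUTES)


def find_conflicts(rules):
    """Pairs of rules with overlapping targets but different effects."""
    return [(r1.rule_id, r2.rule_id)
            for r1, r2 in combinations(rules, 2)
            if r1.effect != r2.effect and overlaps(r1, r2)]


def evaluate(rules, request, algorithm="first-applicable"):
    """Simplified rule combining (ignores Indeterminate and obligations)."""
    effects = [r.effect for r in rules
               if all(request[a] in r.target[a] for a in ATTRIBUTES)]
    if not effects:
        return "NotApplicable"
    if algorithm == "first-applicable":
        return effects[0]
    if algorithm == "deny-overrides":
        return "Deny" if "Deny" in effects else "Permit"
    if algorithm == "permit-overrides":
        return "Permit" if "Permit" in effects else "Deny"
    raise ValueError(f"unknown combining algorithm: {algorithm}")


# Hypothetical policy rules.
rules = [
    Rule("r1", "Permit", {"subject": {"doctor", "nurse"}, "action": {"read"}, "resource": {"record"}}),
    Rule("r2", "Deny", {"subject": {"nurse"}, "action": {"read", "write"}, "resource": {"record"}}),
    Rule("r3", "Permit", {"subject": {"admin"}, "action": {"write"}, "resource": {"config"}}),
]
request = {"subject": "nurse", "action": "read", "resource": "record"}
print(find_conflicts(rules))
print(evaluate(rules, request), evaluate(rules, request, "deny-overrides"))
```

Rules r1 and r2 both match a nurse reading a record but disagree on the effect, so the final decision depends entirely on the combining algorithm. Surfacing such hidden dependencies is the motivation for systematic anomaly detection and resolution.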
Date Created
2012

Analysis and modeling of services impacts on system workload and performance in service-based systems (SBS)

Description
In recent years, service-oriented computing (SOC) has become a widely accepted paradigm for developing distributed applications such as web services, grid computing, and cloud computing systems. In service-based systems (SBS), multiple service requests with specific performance requirements cause services to compete for system resources. IT service providers need to allocate resources to services so that customers' performance requirements are satisfied. Workload and performance models are required for efficient resource management and service performance assurance in SBS. This dissertation develops two methods to understand and model the cause-effect relations between service-related activities and both resource workload and service performance. Part one presents an empirical method that requires collecting system dynamics data and applying statistical analyses. The results show that the method is capable of: 1) uncovering the impacts of services on resource workload and service performance, 2) identifying interaction effects of multiple services running concurrently, 3) gaining insights into the resource and performance tradeoffs of services, and 4) building service workload and performance models. In part two, the empirical method is used to investigate the impacts of services, security mechanisms, and cyber attacks on resource workload and service performance. The information obtained is used to: 1) uncover interaction effects among services, security mechanisms, and cyber attacks, 2) identify tradeoffs within the limits of system resources, and 3) develop general and specific strategies for system survivability. Finally, part three presents a framework based on the usage profiles of services competing for resources and on resource-sharing schemes. The framework is used to: 1) uncover the impacts of service parameters (e.g., arrival distribution, execution time distribution, priority, workload intensity, and scheduling algorithm) on workload and performance, and 2) build service workload and performance models at individual resources. The estimates obtained from the service workload and performance models at individual resources can be aggregated to obtain overall estimates for services that use multiple system resources. The workload and performance models of services obtained through both methods can be used for efficient resource management and service performance assurance in SBS.
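To illustrate how per-resource estimates can be aggregated into an overall estimate, the sketch below models each resource as an independent M/M/1 queue (Poisson arrivals, exponentially distributed service times, FCFS) and sums the mean response times of the resources a request visits once each. These queueing assumptions are a simplification chosen for illustration: the framework described above also accounts for arrival and execution time distributions, priority, and scheduling algorithms, which this sketch ignores. The arrival rate and per-resource service demands are hypothetical.

```python
def mm1_resource(arrival_rate, service_time):
    """Utilization and mean response time of one resource modeled as M/M/1."""
    utilization = arrival_rate * service_time
    if utilization >= 1:
        raise ValueError("resource saturated (utilization >= 1)")
    return utilization, service_time / (1 - utilization)


def end_to_end_response(arrival_rate, demands):
    """Per-resource estimates plus their sum for a request visiting each resource once."""
    per_resource = {name: mm1_resource(arrival_rate, s) for name, s in demands.items()}
    total = sum(resp for _, resp in per_resource.values())
    return per_resource, total


# Hypothetical service: 10 requests/s with per-request demands in seconds.
per_resource, total = end_to_end_response(10.0, {"cpu": 0.05, "disk": 0.02, "network": 0.01})
for name, (u, r) in per_resource.items():
    print(f"{name}: utilization={u:.2f}, response={r * 1000:.1f} ms")
print(f"end-to-end: {total * 1000:.1f} ms")
```

With these numbers the CPU runs at 50% utilization and contributes most of the roughly 136 ms end-to-end estimate, because the per-resource response time grows as service_time / (1 - utilization) and rises sharply as utilization approaches 1.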
Date Created
2012