Existing machine learning and data mining techniques have difficulty handling three characteristics of real-world data sets together in a computationally efficient way: (1) mixed data types with both categorical and numeric data, (2) different variable relations in different value ranges of variables, and (3) unknown variable dependency. This dissertation develops a Partial-Value Association Discovery (PVAD) algorithm to overcome these drawbacks of existing techniques. The algorithm also enables the discovery of partial-value and full-value variable associations, capturing both the effects of individual variables and the interactive effects of multiple variables. The PVAD algorithm is compared with association rule mining and decision trees for validation purposes. The results show that it overcomes the shortcomings of existing methods.
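The PVAD algorithm itself is defined in the dissertation; purely to illustrate the flavor of partial-value associations, the sketch below discretizes numeric variables into value ranges and scores rules of the form "variable in range implies target value" by support and confidence. The function, thresholds, and pandas-based representation are illustrative assumptions, not the dissertation's method.

```python
import pandas as pd

def partial_value_rules(df, target, bins=3, min_support=0.1, min_conf=0.8):
    """Score rules of the form (var in range) -> (target == value).

    A toy illustration of partial-value association discovery: numeric
    variables are cut into value ranges so that a rule can hold in one
    range of a variable without holding in the others. Categorical
    variables are left whole, so the same scoring covers full-value
    associations.
    """
    work = df.copy()
    for col in work.columns:
        if col != target and pd.api.types.is_numeric_dtype(work[col]):
            work[col] = pd.cut(work[col], bins)  # partial-value ranges
    n = len(work)
    rules = []
    for var in (c for c in work.columns if c != target):
        counts = work.groupby([var, target], observed=True).size()
        for (val, tgt), cnt in counts.items():
            support = cnt / n
            conf = cnt / (work[var] == val).sum()
            if support >= min_support and conf >= min_conf:
                rules.append((f"{var} in {val}", f"{target} = {tgt}", support, conf))
    return sorted(rules, key=lambda r: -r[3])
```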
The second part of this dissertation focuses on knee point detection in noisy data. This extension grew out of the investigation into categorizing numeric data, which corresponds to Step 1 of the PVAD algorithm. A new mathematical definition of a knee point on discrete data is introduced. Because no ground-truth or benchmark data sets are available, functions for generating synthetic data are carefully selected and defined, and then used to create the experimental data sets. These synthetic data sets make it possible to systematically evaluate and compare the performance of existing methods. Additionally, a deep-learning model is devised for this problem. Experiments show that the proposed model surpasses existing methods on all synthetic data sets, regardless of whether the samples have single or multiple knee points.
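The dissertation's formal knee-point definition and deep-learning model are not reproduced here; as one of the existing baselines such experiments typically compare against, a common geometric heuristic picks the point of maximum distance from the chord joining the curve's endpoints. A minimal sketch, assuming the curve is given as discrete arrays:

```python
import numpy as np

def knee_by_chord_distance(x, y):
    """Return the index of the point farthest from the chord joining
    the first and last samples of a discrete curve.

    A common geometric baseline for knee detection; noisy data is
    usually smoothed before applying it.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    p0, p1 = np.array([x[0], y[0]]), np.array([x[-1], y[-1]])
    chord = p1 - p0
    chord /= np.linalg.norm(chord)
    rel = np.stack([x - p0[0], y - p0[1]], axis=1)
    # distance from each point to the chord (perpendicular component)
    proj = rel @ chord
    dist = np.linalg.norm(rel - np.outer(proj, chord), axis=1)
    return int(np.argmax(dist))

# Example: y = sqrt(x) bends sharply near the origin
x = np.linspace(0, 1, 101)
print(knee_by_chord_distance(x, np.sqrt(x)))
```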
The third section presents the application of the PVAD algorithm to real-world data sets in various domains, including energy consumption data from an Arizona State University (ASU) building, computer network flow data, and ASU engineering freshman retention data. The PVAD algorithm is used to create an associative network for energy consumption modeling, to analyze univariate and multivariate measures of network flow variables, and to identify common and uncommon characteristics related to engineering student retention after the first year at the university. The findings indicate that the PVAD algorithm is effective at uncovering variable relationships.
Efforts to enhance the quality of life and promote better health have led to improved water quality standards. Adequate daily fluid intake, primarily from tap water, is crucial for human health. By improving drinking water quality, negative health effects associated with consuming inadequate water can be mitigated. Although the United States Environmental Protection Agency (EPA) sets and enforces federal water quality limits at water treatment plants, the quality of water reaching end users degrades during the delivery process, emphasizing the need for proactive control systems in buildings to ensure safe drinking water. Future commercial and institutional buildings are anticipated to feature real-time water quality sensors, automated flushing and filtration systems, temperature control devices, and chemical boosters. Integrating these technologies with a reliable water quality control system that optimizes the use of chemical additives, filtration, flushing, and temperature adjustments ensures users consistently have access to water of adequate quality. Additionally, existing buildings can be retrofitted with these technologies at a reasonable cost, guaranteeing user safety.
In the absence of smart buildings with the required technology, Chapter 2 describes the development of an EPANET-MSX (a multi-species extension of EPA’s water simulation tool) model for a typical 5-story building. Chapter 3 involves creating accurate nonlinear approximation models of EPANET-MSX’s complex fluid dynamics and chemical reactions, and developing an open-loop water quality control system that can regulate water quality based on the approximated state. To address potential sudden changes in water quality, improve predictions, and reduce the gap between the approximated and true states of water quality, a feedback control loop is developed in Chapter 4.
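As a hedged illustration of the approximation idea in Chapter 3, the sketch below fits a polynomial regression surrogate to simulated chlorine decay; the decay function, coefficients, and inputs are invented stand-ins for EPANET-MSX output, not the dissertation's models.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Stand-in for simulator output: chlorine residual as a function of
# stagnation time and temperature (first-order decay, invented rates).
t = rng.uniform(0, 48, 500)        # hours stagnant
temp = rng.uniform(15, 35, 500)    # deg C
cl = 1.0 * np.exp(-(0.03 + 0.002 * (temp - 20)) * t)
cl += rng.normal(0, 0.01, 500)     # measurement noise

X = np.column_stack([t, temp])
surrogate = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
surrogate.fit(X, cl)

# The fitted surrogate can then stand in for the simulator inside an
# optimization-based (e.g., open-loop) water quality controller.
print(surrogate.predict([[24.0, 25.0]]))
```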
Lastly, this dissertation includes the development of a reinforcement learning (RL)-based water quality control system for cases where the approximation models prove inadequate and cause instability when implemented on a real building water network. The RL-based control system can be implemented in various buildings without the need to develop new hydraulic models and can handle the stochastic nature of water demand, ensuring the proactive control system’s effectiveness in maintaining water quality within safe limits for consumption.
In this dissertation, a cyber-physical system called MIDAS (Managing Interacting Demand And Supply) has been developed, where the “supply” refers to the transportation infrastructure, including traffic controls, while the “demand” refers to its dynamic traffic loads. The strength of MIDAS lies in its ability to proactively control and manage mixed vehicular traffic with various levels of autonomy through traffic intersections. Using real-time traffic control algorithms, MIDAS minimizes wait times, congestion, and travel times on existing roadways. For traffic engineers, efficient control of the complicated traffic movements at diamond interchanges (DIs), which interface streets with freeways, is challenging for ordinary human-driven vehicular traffic, let alone for connected vehicles (CVs), due to stochastic demand and uncertainties. This dissertation first develops a proactive traffic control algorithm, MIDAS, using forward-recursion dynamic programming (DP) to schedule a large set of traffic movements of non-connected vehicles and CVs at DIs over a finite time horizon. MIDAS combines measurements from fixed detectors with Lagrangian measurements from CVs to estimate link travel times, arrival times, and turning movements. A simulation study shows that MIDAS outperforms (a) a state-of-the-art optimal fixed-cycle control scheme and (b) a state-of-the-art traffic-adaptive cycle-free scheme.
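MIDAS's actual DP schedules detailed movement phases at a DI; as a toy of the forward-recursion idea, the sketch below assigns the green to one of two approaches each second to minimize cumulative queued vehicles, with invented arrivals and a simplified service model.

```python
# Known per-second arrivals on two approaches (invented numbers).
a1 = [2, 1, 3, 0, 2, 1, 2, 0]
a2 = [0, 2, 1, 2, 1, 2, 0, 1]
SERVE = 2   # vehicles discharged per second on the approach with green
T = len(a1)

# Forward recursion over cost-to-arrive: state = (queue1, queue2);
# the cost of a stage is the total vehicles left queued (a delay proxy).
layer = {(0, 0): 0}
for t in range(T):
    nxt = {}
    for (q1, q2), cost in layer.items():
        for green in (0, 1):  # which approach gets the green this second
            n1 = max(q1 + a1[t] - (SERVE if green == 0 else 0), 0)
            n2 = max(q2 + a2[t] - (SERVE if green == 1 else 0), 0)
            c = cost + n1 + n2
            if c < nxt.get((n1, n2), float("inf")):
                nxt[(n1, n2)] = c
    layer = nxt

print("minimum total delay:", min(layer.values()))
```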
Subsequently, this dissertation addresses the challenge of improving road capacity by platooning fully autonomous vehicles (AVs), resulting in smaller headways and greater road utilization. With the MIDAS AI (Autonomous Intersection) control, an effective platooning strategy is developed, and the optimal release sequence of AVs is determined using a new forward-recursive DP that minimizes the time-loss delays of AVs. MIDAS AI evaluates the DP decisions every second and communicates optimal actions to the AVs.
Although MIDAS AI’s exact DP achieves the optimal solution in near real time compared to other exact algorithms, it suffers from scalability issues. To address this challenge, the dissertation then develops MIDAS RAIC (Reinforced Autonomous Intersection Control), a deep reinforcement learning-based real-time dynamic traffic control system for AVs at an intersection. Simulation results show that the proposed deep Q-learning architecture trains MIDAS RAIC to learn a near-optimal policy that minimizes the total cumulative time-loss delay and performs nearly as well as MIDAS AI.
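The MIDAS RAIC architecture is not specified here; the sketch below shows only the generic deep Q-learning update it builds on, a Q-network trained toward the TD target from a target network, with illustrative state and action dimensions.

```python
import torch
import torch.nn as nn

# Generic deep Q-learning step (not the MIDAS RAIC architecture): the
# Q-network maps an intersection state vector to one value per action.
n_obs, n_actions, gamma = 8, 4, 0.99
qnet = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_actions))
target = nn.Sequential(nn.Linear(n_obs, 64), nn.ReLU(), nn.Linear(64, n_actions))
target.load_state_dict(qnet.state_dict())
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)

def dqn_step(s, a, r, s2, done):
    """One gradient step on a batch of transitions (s, a, r, s2, done)."""
    with torch.no_grad():
        td_target = r + gamma * (1 - done) * target(s2).max(dim=1).values
    q = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q, td_target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Smoke test with random transitions
s = torch.randn(32, n_obs); s2 = torch.randn(32, n_obs)
a = torch.randint(0, n_actions, (32,)); r = torch.randn(32)
done = torch.zeros(32)
print(dqn_step(s, a, r, s2, done))
```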
Sequential event prediction, or sequential pattern mining, is a well-studied topic in the literature. In many real-world scenarios, data are released sequentially, and event sequences are believed to contain repetitive patterns that make future events predictable. For example, many companies build recommender systems to predict the next likely product for each user from their purchase history. Healthcare systems discover relationships among patients’ sequential symptoms to mitigate the adverse effects of a treatment (drugs or surgery). Modern engineering systems, such as aviation, distributed computing, and energy systems, diagnose failure event logs and take prompt action to avoid disaster when a similar failure pattern occurs. In this dissertation, I focus on building a scalable algorithm for event prediction and extraction in the aviation domain. Understanding accident events is a central safety concern in aviation. A flight accident is often caused by a sequence of failure events, so accurately modeling the failure event sequence and how it leads to the final accident is important for aviation safety. This work studies the relationships within failure event sequences and evaluates the risk of the final accident according to these failure events. I address three major challenges. (1) Modeling sequential events with hierarchical structure: I aim to improve prediction accuracy by taking advantage of the multi-level, hierarchical representation of these rare events; specifically, I build a sequential encoder-decoder framework with a hierarchical embedding representation of the events. (2) Lack of high-quality, consistent event log data: to acquire more accurate event data from aviation accident reports, I convert the problem into multi-label classification, and an attention-based Bidirectional Encoder Representations from Transformers (BERT) model is developed to achieve good performance and interpretability. (3) Ontology-based event extraction: to extract detailed events, I solve the problem as a hierarchical classification task and improve model performance by incorporating an event ontology. By addressing these three challenges, I provide a framework to extract events from narrative reports and estimate the risk level of aviation accidents through event sequence modeling.
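As a hedged sketch of challenge (2), the multi-label formulation maps naturally onto Hugging Face Transformers, where problem_type="multi_label_classification" switches the loss to BCE-with-logits; the checkpoint, label set, and report text below are illustrative assumptions, and the classification head here is untrained, so the scores are meaningless until fine-tuning.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

EVENT_LABELS = ["engine failure", "bird strike", "loss of control"]  # illustrative

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(EVENT_LABELS),
    problem_type="multi_label_classification",  # BCE-with-logits loss
)

report = "After takeoff the crew reported a bird strike followed by engine failure."
inputs = tok(report, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = torch.sigmoid(model(**inputs).logits)[0]

# Each label is scored independently; a report can carry several events.
for label, p in zip(EVENT_LABELS, probs):
    print(f"{label}: {p:.2f}")
```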
Additive manufacturing consists of the successive fabrication of materials layer upon layer to manufacture three-dimensional items. Several key problems, such as poor quality of finished products and excessive operational costs, are yet to be addressed before it becomes widely applicable in industry. Retroactive, offline actions such as post-manufacturing inspections for defect detection in finished products are not only extremely expensive and ineffective but also incapable of issuing corrective action signals during the build. In-situ monitoring and optimal control methods, on the other hand, can provide viable alternatives for online detection of anomalies and control of the process. Nevertheless, the complexity of process assumptions, the unique structure of the collected data, and the high-frequency data acquisition rate severely deteriorate the performance of traditional parametric control and process monitoring approaches. Among the diverse categories of additive manufacturing, Large-Scale Additive Manufacturing (LSAM) by material extrusion and Laser Powder Bed Fusion (LPBF) suffer the most due to their more advanced technologies and are therefore the subjects of study in this work. In LSAM, the geometry of large parts can impact heat dissipation and lead to large thermal gradients between distant locations on the surface. The surface’s temperature profile is captured by an infrared thermal camera and translated into a nonlinear regression model that formulates the surface cooling dynamics. The surface temperature prediction methodology is then combined with an optimization model with probabilistic constraints for real-time layer time and material flow control. In LPBF, on-axis optical high-speed cameras can capture streams of melt pool images of the laser-powder interaction in real time during the process. Model-agnostic deep learning methods offer a great deal of flexibility when facing such unstructured big data and are thus appealing alternatives to physics-based and regression-based modeling counterparts. A configuration of a Convolutional Long Short-Term Memory (ConvLSTM) auto-encoder is proposed to learn a deep spatio-temporal representation from sequences of melt pool images collected from experimental builds. The unfolded bottleneck tensors are then further mined to construct a high-accuracy, low-false-alarm-rate anomaly detection and monitoring procedure.
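PyTorch has no built-in ConvLSTM layer, so auto-encoders of the kind described are typically assembled from a hand-rolled cell like the one sketched below; the layer sizes and frame dimensions are illustrative, not the dissertation's configuration.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """A single ConvLSTM cell: an LSTM whose gates are computed by
    convolutions, preserving the spatial layout of melt pool frames."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c

# Encode a sequence of 16x16 single-channel melt pool frames; the final
# hidden state plays the role of the bottleneck tensor an auto-encoder
# would mine for anomaly detection.
cell = ConvLSTMCell(in_ch=1, hid_ch=8)
frames = torch.randn(4, 10, 1, 16, 16)            # batch, time, C, H, W
h = torch.zeros(4, 8, 16, 16); c = torch.zeros_like(h)
for t in range(frames.shape[1]):
    h, c = cell(frames[:, t], (h, c))
print(h.shape)  # torch.Size([4, 8, 16, 16])
```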
Recent advancements in machine learning have allowed companies to develop advanced computer-vision-aided production lines that take advantage of raw and labeled data captured by high-definition cameras mounted at vantage points on the factory floor. We experiment with two methods of developing such a system to automatically track key components on a production line. By tracking the state of these key components using object detection, we can accurately determine and report production line metrics such as part arrival and start/stop times for key factory processes. We began by collecting and labeling raw image data from the cameras overlooking the factory floor. Using that data, we trained two dedicated object detection models. Our training utilized transfer learning, starting from a Faster R-CNN ResNet model trained on Microsoft’s COCO dataset. The first model is a binary classifier that detects the state of a single object, while the second is a multiclass classifier that detects the states of two distinct objects on the factory floor. Both models achieved over 95% classification and localization accuracy on our test datasets. The two additional classes did not affect the classification or localization accuracy of the multiclass model compared to the binary model.
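The setup described closely matches torchvision's standard fine-tuning recipe for detection models; a minimal sketch, assuming the ResNet-50 FPN variant of Faster R-CNN (the exact backbone used in this work is not specified) and a hypothetical class count:

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Load Faster R-CNN with a ResNet-50 FPN backbone pretrained on COCO.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Replace the COCO box predictor with one sized for the factory classes.
# Background counts as a class, so a binary detector uses num_classes=2;
# the multiclass detector described above would use more.
num_classes = 2
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Training then proceeds on the labeled factory-floor images; at
# inference, model.eval() returns boxes, labels, and scores per image.
```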
Natural disasters are occurring increasingly around the world, causing significant economic losses. To alleviate their adverse effects, it is crucial to plan responses proactively. This research aims at developing proactive and real-time recovery algorithms for large-scale power networks exposed to weather events, considering uncertainty. These algorithms support recovery decisions that mitigate the disaster impact, resulting in faster recovery of the network. The challenges associated with developing these algorithms are summarized below:
1. Even ignoring uncertainty, when the operating cost of the network is considered, the problem becomes a bi-level optimization, which is NP-hard.
2. To meet the requirement of real-time decision making under uncertainty, the problem can be formulated as a Stochastic Dynamic Program with the aim of minimizing total cost. However, considering the operating cost of the network violates the underlying assumptions of this approach.
3. The Stochastic Dynamic Programming approach is also not applicable to realistic problem sizes due to the curse of dimensionality.
4. Uncertainty-based approaches for failure modeling rely on point-generation of failures and ignore the network structure.
To deal with the first challenge, in Chapter 2 a heuristic solution framework is proposed, and its performance is evaluated through numerical experiments. To address the second challenge, in Chapter 3, after formulating the problem as a Stochastic Dynamic Program, an approximate dynamic programming heuristic is proposed to solve it; numerical experiments on synthetic and realistic test beds show the satisfactory performance of the proposed approach. To address the third challenge, in Chapter 4 an efficient base heuristic policy and an aggregation scheme in the action space are proposed, and numerical experiments on a realistic test bed verify the ability of the proposed method to recover the network more efficiently. Finally, to address the fourth challenge, in Chapter 5 a simulation-based model is proposed that, using historical data and accounting for the interactions between network components, allows for analyzing the impact of adverse events on regional service levels. A realistic case study is then conducted to showcase the applicability of the approach.
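One way to read the Chapter 4 combination of a base heuristic with action-space aggregation is as an approximate-dynamic-programming rollout: score each candidate repair action by simulating it and then following the base heuristic to the end of the horizon. The skeleton below sketches that generic rollout under this reading, with all problem-specific functions left abstract.

```python
def rollout_action(state, actions, simulate, base_policy, cost, horizon,
                   n_samples=20):
    """Pick the action whose sampled cost-to-go under the base heuristic
    is lowest: a generic one-step rollout, not the dissertation's exact
    aggregation scheme. `simulate` must sample the stochastic transition
    (e.g., weather-driven failures)."""
    best, best_cost = None, float("inf")
    for a in actions:
        total = 0.0
        for _ in range(n_samples):            # average over the noise
            s = simulate(state, a)
            total += cost(state, a)
            for _ in range(horizon - 1):      # follow the base heuristic
                b = base_policy(s)
                total += cost(s, b)
                s = simulate(s, b)
        avg = total / n_samples
        if avg < best_cost:
            best, best_cost = a, avg
    return best
```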
Nonalcoholic steatohepatitis (NASH) is a severe form of nonalcoholic fatty liver disease caused by excessive calorie intake and a sedentary lifestyle in the absence of heavy alcohol consumption. It is widely prevalent in the United States and in many other developed countries, affecting up to 25 percent of the population. Because it is asymptomatic, it usually goes unnoticed and may lead to liver failure if not treated in time.
Currently, liver biopsy is the gold standard for diagnosing NASH, but being an invasive procedure, it comes with its own complications along with the inconvenience of sampling repeated measurements over a period of time. Hence, noninvasive procedures to assess NASH are urgently required. Magnetic Resonance Elastography (MRE)-based shear stiffness and loss modulus, along with Magnetic Resonance Imaging-based proton density fat fraction, have been successfully combined to predict NASH stages. However, their role in the prediction of disease progression remains to be investigated.
This thesis therefore combines features from serial MRE observations to develop statistical models that predict NASH progression. It utilizes data from an experiment conducted on male mice with progressive and regressive NASH, and trains ordinal models, namely ordered probit regression and an ordinal forest, on labels generated from a logistic regression model. The models are assessed on histological data collected at the end point of the experiment, and they provide a framework for using a noninvasive tool to predict NASH disease progression.
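Of the ordinal models named, ordered probit regression is available in statsmodels; a minimal sketch on invented stand-in features (the columns, sample size, and stage cutpoints below are illustrative, not the study's data):

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 200

# Invented stand-ins for serial MRE features; not the study's data.
X = pd.DataFrame({
    "shear_stiffness": rng.normal(2.5, 0.5, n),
    "fat_fraction": rng.normal(0.15, 0.05, n),
})
latent = 1.5 * X["shear_stiffness"] + 8 * X["fat_fraction"] + rng.normal(0, 1, n)
stage = pd.cut(latent, bins=[-np.inf, 3.5, 4.5, np.inf],
               labels=["regressed", "stable", "progressed"], ordered=True)

# Ordered probit: a latent linear index with estimated stage cutpoints.
model = OrderedModel(stage, X, distr="probit")
res = model.fit(method="bfgs", disp=False)
print(res.params)
```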
A production system is commonly restricted by time windows. For example, perishability is a major concern in food processing and requires products such as yogurt, beer, and meat not to stay in a buffer for long. Semiconductor manufacturing faces oxidation and moisture absorption issues if a product in a buffer is exposed to air for too long. Machine reliability is a major source of uncertainty in production systems that causes residence time constraints to be violated, leading to potential product quality issues. Rapid advances in sensor technology and automation offer the potential to manage production in real time, but the system complexity introduced by residence time constraints makes it difficult to optimize system performance while guaranteeing product quality. To this end, this dissertation is dedicated to the modeling, analysis, and control of production systems with constrained time windows. The study starts with a small-scale serial production line with two machines and one buffer. Even the simplest serial lines can have a prohibitively large state space once residence time constraints are considered. A Markov chain model is developed to approximate the line's transient behavior with high accuracy, and an iterative learning algorithm is proposed to perform real-time control. The analysis of the two-machine serial line supports the further analysis of more general and complex serial lines with multiple machines, where residence time constraints can be imposed at multiple stages. To deal with this, a two-machine-one-buffer subsystem isolated from a multi-stage serial production line is first analyzed and then acts as a building block to support an aggregation method for overall system performance. The proposed aggregation method substantially reduces the complexity of the problem while maintaining high accuracy. A decomposition-based control approach is proposed to control a multi-stage serial production line: the system is decomposed into small-scale subsystems, and an iterative aggregation procedure is then used to generate a production control policy. The decomposition-based approach outperforms general-purpose reinforcement learning methods, delivering significant system performance improvement and a substantial reduction in computation overhead. The semiconductor assembly line is a typical production system where products are restricted by time windows and production can be disrupted by machine failures; a production control problem for a semiconductor assembly line is presented and studied, minimizing total lot delay time and residence time constraint violations.
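The dissertation's Markov chain tracks buffer contents jointly with residence times; purely to illustrate the transient-analysis machinery it relies on, the sketch below propagates a state distribution through a small, invented transition matrix step by step.

```python
import numpy as np

# Toy 3-state chain (e.g., machine up / degraded / down); not the
# dissertation's two-machine-one-buffer model with residence times.
P = np.array([
    [0.90, 0.08, 0.02],
    [0.10, 0.80, 0.10],
    [0.30, 0.00, 0.70],
])
pi = np.array([1.0, 0.0, 0.0])   # start in the "up" state

# Transient behavior: the state distribution after each time step.
for t in range(1, 6):
    pi = pi @ P
    print(f"t={t}: {np.round(pi, 3)}")
```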
Drinking water quality violations are widespread in the United States and elsewhere in the world. More than half of Americans are not confident in the safety of their tap water, especially after the 2014 Flint, Michigan water crisis. Other than accidental contamination events, stagnation is a major cause of water quality degradation. Thus, there is a pressing need to build a real-time control system that can make control decisions quickly and proactively so that water quality can be maintained at all times. Toward this end, however, modeling the dynamics of water distribution systems is very challenging due to the complex fluid dynamics and chemical reactions in the system, and this challenge must be addressed before modeling the optimal control problem. The research in this dissertation leverages statistical machine learning approaches to approximate the complex water system dynamics and then develops optimization models for proactive, real-time water quality control. It focuses on two effective ways to maintain water quality, flushing of taps and injection of chlorine or other disinfectants; both actions decrease the equivalent “water age”, a useful proxy for water quality related to bacterial growth. The research first develops linear predictive models of water quality and, from them, linear programming optimization models for proactive water age control via flushing. The second part considers both flushing and disinfectant injections in the control problem and develops mixed-integer quadratically constrained optimization models for controlling water age; two control strategies for disinfectant injections are evaluated, binary on-off injections and continuous injections. In the third part, water demand is assumed to be uncertain and stochastic, and the system is controlled by learning optimal real-time flushing decisions, combining temporal-difference reinforcement learning with linear value function approximation to approximately solve the underlying Markov decision processes. Computational results on widely used simulation models demonstrate that the developed control systems are effective for water quality control with known demands as well as when demands are uncertain and stochastic.
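As a hedged sketch of the linear programming idea in the first part, the toy below chooses flushing volumes over a horizon to keep one node's water age under a limit; the linear age dynamics and the coefficient k are invented for illustration, and scipy's linprog solves the resulting LP.

```python
import numpy as np
from scipy.optimize import linprog

T = 24            # hourly decision horizon
age0 = 6.0        # initial water age at the node (hours)
a_max = 12.0      # water age limit (hours)
k = 4.0           # invented: hours of age removed per unit flushed

# Decision: flush volume f_t >= 0. Invented linear age model:
#   age_t = age0 + t - k * sum_{s<=t} f_s   must stay <= a_max,
# which gives the constraint  -k * cumsum(f)_t <= a_max - age0 - t.
A_ub = -k * np.tril(np.ones((T, T)))
b_ub = a_max - age0 - np.arange(1, T + 1)
res = linprog(c=np.ones(T), A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * T)

print("total flushed:", round(res.fun, 3))
print("flush schedule:", res.x.round(3))
```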