Analysis of No-Confounding Designs in 16 Runs for 9-14 Factors

Nonregular designs for 9-14 factors in 16 runs are a vital alternative to the regular minimum aberration resolution III fractional factorials. Because there is no complete aliasing between main effects and two-factor interactions, these designs avoid potential confusion in the results. However, this kind of design brings another complication: some of the two-factor interactions are completely confounded. This research applies three analysis methods and compares their results: stepwise, the least absolute shrinkage and selection operator (LASSO), and the Dantzig selector. In previous research, Metcalfe discussed nonregular designs for 6-8 factors and studied several analysis methods; she also developed a new method, Aliased Informed Model Selection (AIMS), for those designs. This research builds upon that work. Simulation, implemented with JMP scripting, is used to generate random models for analyzing designs from the class of nonregular fractions with 9-14 factors in 16 runs. The cases are then analyzed with the three methods, and the success rate of each is determined. The models were generated randomly with either main effects only, or main effects and two-factor interactions, as active effects. Effect sizes of 2 and 3 standard deviations are studied. The nonregular designs used in this research are the 9-factor and 11-factor designs. Results show clear consistency across all the methods when only main effects are active. However, adding interactions to the active effects degrades the success rate substantially for the Dantzig method. Moreover, as the number of active effects exceeds approximately half of the degrees of freedom of the design, the performance of all the methods decreases. Finally, some recommendations are discussed for further research, such as AIMS, other variations of the methods, and augmentation.

Bayesian Methods for Tuning Hyperparameters of Loss Functions in Machine Learning

The introduction of parameterized loss functions for robustness in machine learning has led to questions as to how hyperparameter(s) of the loss functions can be tuned. This thesis explores how Bayesian methods can be leveraged to tune such hyperparameters. Specifically, a modified Gibbs sampling scheme is used to generate a distribution of loss parameters of tunable loss functions. The modified Gibbs sampler is a two-block sampler that alternates between sampling the loss parameter and optimizing the other model parameters. The sampling step is performed using slice sampling, while the optimization step is performed using gradient descent. This thesis explores the application of the modified Gibbs sampler to alpha-loss, a tunable loss function with a single parameter $\alpha \in (0,\infty]$, that is designed for the classification setting. Theoretically, it is shown that the Markov chain generated by a modified Gibbs sampling scheme is ergodic; that is, the chain has, and converges to, a unique stationary (posterior) distribution. Further, the modified Gibbs sampler is implemented in two experiments: a synthetic dataset and a canonical image dataset. The results show that the modified Gibbs sampler performs well under label noise, generating a distribution indicating preference for larger values of alpha, matching the outcomes of previous experiments.
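The abstract does not state the formula for alpha-loss, so the following is a minimal sketch that assumes the definition commonly used in the alpha-loss literature: for the probability `p_true` that a classifier assigns to the correct label, the loss is `alpha / (alpha - 1) * (1 - p_true ** ((alpha - 1) / alpha))`, with log-loss as the limit at `alpha = 1` and `1 - p_true` at `alpha = ∞`. The function name `alpha_loss` is hypothetical and is not taken from the thesis.

```python
import math


def alpha_loss(p_true: float, alpha: float) -> float:
    """Alpha-loss for the probability assigned to the correct label.

    Assumed definition (not quoted from the thesis):
      alpha = 1   -> log-loss, -log(p_true)
      alpha = inf -> 1 - p_true
      otherwise   -> alpha / (alpha - 1) * (1 - p_true ** ((alpha - 1) / alpha))
    """
    if not 0.0 < p_true <= 1.0:
        raise ValueError("p_true must lie in (0, 1]")
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if math.isinf(alpha):
        return 1.0 - p_true
    if math.isclose(alpha, 1.0):
        return -math.log(p_true)
    return alpha / (alpha - 1.0) * (1.0 - p_true ** ((alpha - 1.0) / alpha))


if __name__ == "__main__":
    # Compare how strongly each alpha penalizes a confident mistake.
    for a in (0.5, 1.0, 2.0, math.inf):
        print(a, alpha_loss(0.05, a))
```

Under this definition the loss at `alpha = ∞` is bounded by 1, whereas log-loss grows without bound as `p_true` approaches 0, which is one way to read the reported preference for larger values of alpha under label noise.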

Formalizing Safety, Perception, and Mission Requirements for Testing and Planning in Autonomous Vehicles

Autonomous Vehicles (AV) are inevitable entities in future mobility systems that demand safety and adaptability as two critical factors in replacing or assisting human drivers. Safety arises in defining, standardizing, quantifying, and monitoring requirements for all autonomous components. Adaptability, on the other hand, involves efficient handling of uncertainty and inconsistencies in models and data. First, I address safety by presenting a search-based test-case generation framework that can be used in training and testing deep-learning components of AV. Next, to address adaptability, I propose a framework based on multi-valued linear temporal logic syntax and semantics that allows autonomous agents to perform model-checking on systems with uncertainties. The search-based test-case generation framework provides safety assurance guarantees through formalizing and monitoring Responsibility Sensitive Safety (RSS) rules. I use the RSS rules in signal temporal logic as qualification specifications for monitoring and screening the quality of generated test-drive scenarios. Furthermore, to extend the expressivity of existing temporal-based formal languages, I propose a new spatio-temporal perception logic that enables formalizing qualification specifications for perception systems. Altogether, my test-generation framework can be used for reasoning about the quality of perception, prediction, and decision-making components in AV. Finally, my efforts resulted in two publicly available software tools. One is an offline monitoring algorithm based on the proposed logic to reason about the quality of perception systems. The other is an optimal planner (model checker) that accepts mission specifications and model descriptions in the form of multi-valued logic and multi-valued sets, respectively. My monitoring framework is distributed with the publicly available S-TaLiRo and Sim-ATAV tools.

Making Bayesian Optimization Practical in the Context of High Dimensional, Highly Expensive, Black-Box Functions

Complex systems appear when interaction among system components creates emergent behavior that is difficult to predict from component properties. The growth of Internet of Things (IoT) and embedded technology has increased complexity across several sectors (e.g., automotive, aerospace, agriculture, city infrastructures, home technologies, healthcare) where the paradigm of cyber-physical systems (CPSs) has become a standard. While CPS enables unprecedented capabilities, it raises new challenges in system design, certification, control, and verification. When optimizing system performance, computationally expensive simulation tools are often required, and search algorithms that sequentially interrogate a simulator to learn promising solutions are in great demand. Algorithms of this class are black-box optimization techniques. However, the generality that makes black-box optimization desirable also causes computational efficiency difficulties when it is applied to real problems. This thesis focuses on Bayesian optimization, a prominent black-box optimization family, and proposes new principles, translated into implementable algorithms, to scale Bayesian optimization to highly expensive, large-scale problems. Four problem contexts are studied and approaches are proposed for practically applying Bayesian optimization concepts, namely: (1) increasing sample efficiency of a highly expensive simulator in the presence of other sources of information, where multi-fidelity optimization is used to leverage complementary information sources; (2) accelerating global optimization in the presence of local searches by avoiding over-exploitation with adaptive restart behavior; (3) scaling optimization to high-dimensional input spaces by integrating game-theoretic mechanisms with traditional techniques; (4) accelerating optimization by embedding function structure when the reward function is a minimum of several functions.
In the first context this thesis produces two multi-fidelity algorithms, a sample-driven approach and a model-driven approach, which are applied to optimize a serial production line; in the second context the Stochastic Optimization with Adaptive Restart (SOAR) framework is produced and analyzed with multiple applications to CPS falsification problems; in the third context the Bayesian optimization with sample fictitious play (BOFiP) algorithm is developed with an implementation in high-dimensional neural network training; in the last problem context the minimum surrogate optimization (MSO) framework is produced and combined with both Bayesian optimization and the SOAR framework with applications in simultaneous falsification of multiple CPS requirements.

GEM: An Efficient Entity Matching Framework for Geospatial Data

The use of spatial data has become fundamental in today's world. From fitness trackers to food delivery services, almost all applications record users' location information and require clean geospatial data to enhance various application features. As spatial data flows in from heterogeneous sources, various problems arise. Entity matching is a key step in the process of producing clean, usable data. It combines various sub-processes, including blocking and matching. At the end of an entity matching pipeline, we get deduplicated records of the same real-world entity. Identifying various mentions of the same real-world locations is known as spatial entity matching. While entity matching has received significant interest for relational data, the same cannot be said about spatial entity matching. In this dissertation, I build an end-to-end Geospatial Entity Matching framework, GEM, exploring spatial entity matching from a novel perspective. In current state-of-the-art systems, spatial entity matching is done on only one type of geometry. Instead of confining the work to matching spatial entities of the point geometry type, I extend the boundaries of spatial entity matching to match the more generic polygon geometry entities as well. I propose a methodology to support three entity matching scenarios across different geometrical data types: point X point, point X polygon, and polygon X polygon. Entity matching consists of various steps, but blocking, feature vector creation, and classification are the core steps of the system. GEM comprises an efficient and lightweight blocking technique, GeoPrune, that uses the geohash encoding mechanism to prune away the obvious non-matching spatial entities. Geohashing is a technique that converts the coordinates of a point location to an alphanumeric code string.
This technique proves to be very effective and fast for the blocking mechanism. I leverage the Apache Sedona engine to create the feature vectors. Apache Sedona is a spatial database management system that can process spatial SQL queries with multiple geometry types without compromising their original coordinate vector representation. In this step, I repurpose the spatial proximity operators (SQL queries) in Apache Sedona to create spatial feature dimensions that capture the proximity between a geospatial entity pair. The last step of an entity matching process is matching, or classification. The classification step in GEM is a pluggable component, which consumes the feature vector for a spatial entity pair and determines whether the geolocations match or not. The component provides three machine learning models that consume the same feature vector and provide a label for the test data based on the training. I conduct experiments with the three classifiers on multiple large-scale geospatial datasets consisting of both spatial and relational attributes. The data considered for the experiments comes from heterogeneous sources, and its schema is pre-aligned manually. GEM achieves an F-measure of 1.0 for a point X point dataset with 176k total pairs, which is 42% higher than a state-of-the-art spatial EM baseline. It achieves F-measures of 0.966 and 0.993 for the point X polygon dataset with 302M total pairs and the polygon X polygon dataset with 16M total pairs, respectively.
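To illustrate the geohash idea mentioned above, here is a minimal sketch of the standard public geohash encoding (interleaved longitude and latitude bisection bits, emitted as base-32 characters) together with a toy prefix-based blocking step. It is not the GeoPrune implementation, whose details the abstract does not give; the names `geohash_encode` and `block_by_prefix` and the sample points are hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2.0
            bit = 1 if lon >= mid else 0
            if bit:
                lon_lo = mid
            else:
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2.0
            bit = 1 if lat >= mid else 0
            if bit:
                lat_lo = mid
            else:
                lat_hi = mid
        value = (value << 1) | bit
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def block_by_prefix(points: dict, precision: int = 5) -> dict:
    """Group named (lat, lon) points by geohash cell.

    Only points sharing a cell would be passed on as candidate pairs.
    """
    blocks = {}
    for name, (lat, lon) in points.items():
        blocks.setdefault(geohash_encode(lat, lon, precision), []).append(name)
    return blocks


if __name__ == "__main__":
    sample = {
        "a": (42.6, -5.6),
        "b": (42.605, -5.603),
        "c": (57.64911, 10.40744),
    }
    print(block_by_prefix(sample))
```

One caveat of plain prefix blocking is that two nearby points on opposite sides of a cell boundary receive different codes, so practical systems usually also examine neighboring cells. Whether and how GeoPrune does this is not stated in the abstract.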

Disaster Analytics for Critical Infrastructures: Methods and Algorithms for Modeling Disasters and Proactive Recovery Preparedness

Natural disasters are occurring increasingly around the world, causing significant economic losses. To alleviate their adverse effects, it is crucial to plan the response to them proactively. This research aims at developing proactive and real-time recovery algorithms for large-scale power networks exposed to weather events, considering uncertainty. These algorithms support the recovery decisions to mitigate the disaster impact, resulting in faster recovery of the network. The challenges associated with developing these algorithms are summarized below: 1. Even ignoring uncertainty, when the operating cost of the network is considered, the problem is a bi-level optimization, which is NP-hard. 2. To meet the requirement for real-time decision making under uncertainty, the problem could be formulated as a Stochastic Dynamic Program with the aim of minimizing the total cost. However, considering the operating cost of the network violates the underlying assumptions of this approach. 3. The Stochastic Dynamic Programming approach is also not applicable to realistic problem sizes, due to the curse of dimensionality. 4. Uncertainty-based approaches for failure modeling rely on point-generation of failures and ignore the network structure. To deal with the first challenge, in chapter 2, a heuristic solution framework is proposed, and its performance is evaluated by conducting numerical experiments. To address the second challenge, in chapter 3, after formulating the problem as a Stochastic Dynamic Program, an approximate dynamic programming heuristic is proposed to solve the problem. Numerical experiments on synthetic and realistic test-beds show the satisfactory performance of the proposed approach. To address the third challenge, in chapter 4, an efficient base heuristic policy and an aggregation scheme in the action space are proposed.
Numerical experiments on a realistic test-bed verify the ability of the proposed method to recover the network more efficiently. Finally, to address the fourth challenge, in chapter 5, a simulation-based model is proposed that, using historical data and accounting for the interaction between network components, allows for analyzing the impact of adverse events on regional service level. A realistic case study is then conducted to showcase the applicability of the approach.

Optimization Based Verification and Synthesis for Safe Autonomy

Autonomous systems should satisfy a set of requirements that guarantee their safety, efficiency, and reliability when working under uncertain circumstances. These requirements can have financial or legal implications, or they can describe what is assigned to autonomous systems. As a result, the system controller needs to be designed to comply with these (potentially complicated) requirements, and the closed-loop system needs to be tested and verified against them. However, when the complexity of the system and its requirements increases, designing a requirement-based controller for the system and analyzing the closed-loop system against the requirements become very challenging. In this case, existing design and test methodologies based on trial-and-error would fail, and hence disciplined scientific approaches should be considered. To address some of these challenges, in this dissertation, I present different methods that facilitate efficient testing and control design based on requirements: 1. Gradient-based methods for improved optimization-based testing, 2. Requirement-based learning for the design of neural-network controllers, 3. Methods based on barrier functions for designing control inputs that ensure the satisfaction of safety constraints.

Proactive and Real-Time Optimal Control of Water Quality in Water Distribution Networks

Drinking water quality violations are widespread in the United States and elsewhere in the world. More than half of Americans are not confident in the safety of their tap water, especially after the 2014 Flint, Michigan water crisis. Other than accidental contamination events, stagnation is a major cause of water quality degradation. Thus, there is a pressing need to build a real-time control system that can make control decisions quickly and proactively so that the quality of water can be maintained at all times. However, modeling the dynamics of water distribution systems is very challenging due to the complex fluid dynamics and chemical reactions in the system. This challenge needs to be addressed before moving on to modeling the optimal control problem. The research in this dissertation leverages statistical machine learning approaches to approximate the complex water system dynamics and then develops different optimization models for proactive and real-time water quality control. This research focuses on two effective ways to maintain water quality: flushing of taps and injection of chlorine or other disinfectants. Both of these actions decrease the equivalent “water age”, a useful proxy for water quality related to bacteria growth. This research first develops linear predictive models for water quality and subsequently linear programming optimization models for proactive water age control via flushing. The second part of the research considers both flushing and disinfectant injections in the control problem and develops mixed integer quadratically constrained optimization models for controlling water age. Different control strategies for disinfectant injections are also evaluated: binary on-off injections and continuous injections. In the third part of the research, water demand is assumed to be uncertain and stochastic.
The developed control approach learns the optimal real-time flushing decisions by combining temporal-difference reinforcement learning approaches with linear value function approximation to approximately solve the underlying Markov decision processes. Computational results on widely used simulation models demonstrate that the developed control systems are effective for water quality control both when demands are known and when they are uncertain and stochastic.
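As a generic illustration of temporal-difference learning with linear value function approximation, the sketch below performs one textbook TD(0) update of a weight vector. It is not the dissertation's algorithm: the feature vectors, reward, discount factor, and step size are placeholders, and the function name `td0_update` is hypothetical.

```python
def td0_update(weights, phi_s, phi_next, reward, gamma=0.95, step=0.1):
    """One TD(0) update for a linear value function V(s) = weights . phi(s).

    weights  : current weight vector (list of floats)
    phi_s    : feature vector of the current state
    phi_next : feature vector of the next state
    reward   : observed one-step reward (for control of a cost, use its negative)
    """
    v_s = sum(w * x for w, x in zip(weights, phi_s))
    v_next = sum(w * x for w, x in zip(weights, phi_next))
    td_error = reward + gamma * v_next - v_s
    # Move each weight along its feature, scaled by the TD error.
    return [w + step * td_error * x for w, x in zip(weights, phi_s)]


if __name__ == "__main__":
    w = [0.0, 0.0]
    w = td0_update(w, phi_s=[1.0, 0.0], phi_next=[0.0, 1.0], reward=1.0,
                   gamma=0.9, step=0.5)
    print(w)
```

In a control setting such as the flushing problem, an update of this kind would be applied along simulated trajectories, with actions chosen from the current value estimates. That outer loop is omitted here because the abstract does not describe it.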

Co-simulation of Cyber-Physical Systems Using DEVS and Functional Mockup Units

Cyber-Physical Systems (CPS) are becoming increasingly prevalent around the world. Co-simulation of cyber and physical components has been shown to be an effective way to develop time-sensitive and reliable CPS. Correctly combining continuous models with discrete models for co-simulation can often be challenging. In this thesis, the Functional Mock-up Interface (FMI) is used to develop an adapter called DEVS-FMI for the DEVS-Suite simulator. The adapter, implemented using JavaFMI 2.0, allows any Functional Mock-Up Unit (FMU) to be co-simulated with a Discrete Event System Specification (DEVS) model. This approach makes it possible to take advantage of the parallel DEVS formalism to model cyber systems and to use Modelica to model physical systems. An FMU serves as a slave simulator while the DEVS-Suite serves as a master simulator. The Four-Variable model is used as a guide to define the requirements for the inputs and outputs of actuator and sensor devices used in cyber and physical systems. The input and output data are defined as non-functional abstractions of the sensor and actuator devices. Select cyber and physical parts of an electric scooter are chosen, modeled, simulated, and evaluated using the integrated OpenModelica and DEVS-Suite simulators. Closely related research is briefly examined, and expanding this work with support for implicit state changes for continuous models and distributed co-simulation is noted as future work.

Understanding the Impact of Varied Testing and Infection Rates on Covid-19 Impact Across Age-Based Populations

Covid-19 is unlike any coronavirus we have seen before, characterized mostly by the ease with which it spreads. This analysis uses an SEIR model built to accommodate various populations to understand how different testing and infection rates may affect hospitalization and death. It finds that infection rates significantly affect Covid-19 outcomes regardless of the population, whereas the effect of testing rates in this simulation is not as pronounced. Thus, policy-makers should focus on decreasing infection rates through targeted lockdowns and vaccine rollout to contain the virus and decrease its spread.
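For readers unfamiliar with SEIR models, the following is a minimal sketch of a basic single-population SEIR simulation using forward-Euler steps. It omits the testing, hospitalization, death, and age-group structure of the model in the abstract, and every parameter value (`beta`, `sigma`, `gamma`, the population size, and the initial infections) is an illustrative assumption rather than a value from the thesis.

```python
def simulate_seir(beta, sigma=0.2, gamma=0.1, population=1_000_000.0,
                  initial_infected=10.0, days=365, dt=0.1):
    """Simulate a basic SEIR model with forward-Euler integration.

    beta  : transmission (infection) rate per day
    sigma : rate at which exposed individuals become infectious
    gamma : recovery rate of infectious individuals
    Returns the final compartments and the peak number of infectious people.
    """
    s = population - initial_infected
    e, i, r = 0.0, initial_infected, 0.0
    peak = i
    for _ in range(int(round(days / dt))):
        new_exposed = beta * s * i / population * dt
        new_infectious = sigma * e * dt
        new_recovered = gamma * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        peak = max(peak, i)
    return {"S": s, "E": e, "I": i, "R": r, "peak_infectious": peak}


if __name__ == "__main__":
    # Compare two hypothetical infection rates.
    for b in (0.25, 0.5):
        result = simulate_seir(beta=b)
        print(b, round(result["peak_infectious"]))
```

Running the sketch with two values of `beta` shows how the infection rate alone changes the epidemic peak in this simplified setting. Studying testing rates would require additional compartments or rate terms that are not specified in the abstract.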
