Continuous-Time Reinforcement Learning: New Design Algorithms with Theoretical Insights and Performance Guarantees

Description
This dissertation discusses continuous-time reinforcement learning (CT-RL) for control of affine nonlinear systems. Continuous-time nonlinear optimal control problems hold great promise in real-world applications. After decades of development, reinforcement learning (RL) has achieved some of the greatest successes as a general nonlinear control design method. Yet as RL control has developed, CT-RL results have greatly lagged their discrete-time RL (DT-RL) counterparts, especially with regard to real-world applications. Current CT-RL algorithms generally fall into two classes: adaptive dynamic programming (ADP) and actor-critic deep RL (DRL). The first school, ADP, features elegant theoretical results stemming from adaptive and optimal control, yet its methods have not been shown to synthesize meaningful controllers effectively. The second school, DRL, has shown impressive learning solutions, yet theoretical guarantees are still to be developed. A substantive analysis uncovering the quantitative causes of the fundamental gap between CT and DT remains to be conducted. Thus, this work develops a first-of-its-kind quantitative evaluation framework to diagnose the performance limitations of the leading CT-RL methods. This dissertation also introduces a suite of new CT-RL algorithms that offers both theoretical and synthesis guarantees. The proposed design approach relies on three important factors. First, for physical systems that feature physically motivated dynamical partitions into distinct loops, the proposed decentralization method breaks the optimal control problem into smaller subproblems. Second, the work introduces a new excitation framework to improve persistence of excitation (PE) and numerical conditioning via classical input/output insights. Third, the method scales the learning problem via design-motivated invertible transformations of the system state variables in order to modulate the algorithm learning regression for further increases in numerical stability. This dissertation introduces a suite of (decentralized) excitable integral reinforcement learning (EIRL) algorithms implementing these paradigms. It rigorously proves convergence, optimality, and closed-loop stability guarantees of the proposed methods, which are demonstrated in comprehensive comparative studies with the leading methods in ADP on a significant application problem of controlling an unstable, nonminimum phase hypersonic vehicle (HSV). It also conducts comprehensive comparative studies with the leading DRL methods on three state-of-the-art (SOTA) environments, revealing new performance/design insights.
Date Created
2024

Dynamic Modeling, System Identification, and Control Engineering Approaches for Designing Optimized and Perpetually Adaptive Behavioral Health Interventions

Description
Behavior-driven obesity has become one of the most challenging global epidemics since the 1990s, and is presently associated with the leading causes of death in the U.S. and worldwide, including diabetes, cardiovascular disease, strokes, and some forms of cancer. The use of system identification and control engineering principles in the design of novel and perpetually adaptive behavioral health interventions for promoting physical activity and healthy eating has been the central theme in many recent contributions. However, the absence of experimental studies specifically designed to develop control-oriented behavioral models has restricted prior efforts in this domain to hypothetical simulations that demonstrate the potential viability of these interventions. This dissertation examines the use of first-of-a-kind, real-life experimental results to develop dynamic, participant-validated behavioral models essential for the design and evaluation of optimized and adaptive behavioral interventions. Following an intergenerational approach, the first part of this work aims to develop a dynamical systems model of intrauterine fetal growth with the prime goal of predicting infant birth weight, which has been associated with subsequent childhood and adult-onset obesity. The use of longitudinal input-output data from the “Healthy Mom Zone” intervention study has enabled the estimation and validation of this fetoplacental model. The second part establishes a set of data-driven behavioral models founded on Social Cognitive Theory (SCT). The “Just Walk” intervention experiment, developed at Arizona State University using system identification principles, has provided a unique opportunity to estimate and validate both black-box and semiphysical SCT models for predicting physical activity behavior. Further, this dissertation addresses some of the model estimation challenges arising from the limitations of “Just Walk”, including the need to develop nontraditional modeling approaches for short datasets, and delivers a new theoretical and algorithmic framework for structured state-space model estimation that can be used in a broader set of application domains. Finally, adaptive closed-loop intervention simulations of participant-validated SCT models from “Just Walk” are presented using a Hybrid Model Predictive Control (HMPC) control law. A simple HMPC controller reconfiguration strategy for designing both single- and multi-phase interventions is proposed.
Date Created
2021

Control and Estimation Theory in Ranging Applications

Description
For the last 50 years, oscillator modeling in ranging systems has received considerable attention. Many components in a navigation system, such as the master oscillator driving the receiver system, as well as the master oscillator in the transmitting system, contribute significantly to timing errors. Algorithms in the navigation processor must be able to predict and compensate for such errors to achieve a specified accuracy. While much work has been done on the fundamentals of these problems, the thinking on said problems has not progressed. On the hardware end, the designers of local oscillators focus on synthesized frequency and loop noise bandwidth. This does nothing to mitigate or reduce in-band frequency stability degradation. Similarly, there are no systematic methods to accommodate phase and frequency anomalies such as clock jumps. Phase locked loops are fundamentally control systems, and while control theory has had significant advancement over the last 30 years, the design of timekeeping sources has not advanced beyond classical control. On the software end, single- or two-state oscillator models are typically embedded in a Kalman Filter to alleviate time errors between the transmitter and receiver clock. Such models are appropriate for short-term time accuracy, but insufficient for long-term time accuracy. Additionally, flicker frequency noise may be present in oscillators, and it presents mathematical modeling complications. This work proposes novel H∞ control methods to address the shortcomings in the standard design of timekeeping phase locked loops. Such methods allow the designer to address frequency stability degradation as well as high phase/frequency dynamics. Additionally, finite-dimensional approximants of flicker frequency noise that are more representative of the truth system than the traditional Gauss-Markov approach are derived. Last, to maintain timing accuracy in a wide variety of operating environments, novel Banks of Adaptive Extended Kalman Filters are used to address both stochastic and dynamic uncertainty.
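
The two-state oscillator model embedded in a Kalman filter, which the abstract describes as the conventional baseline, can be sketched as follows. This is a minimal illustration and not the dissertation's implementation: the state is assumed to be (phase offset, frequency offset), the process noise is assumed to combine white and random-walk frequency noise with hypothetical intensities `q_wfm` and `q_rwfm`, and the measurement is assumed to be a direct, noisy reading of phase with variance `r`.

```python
def clock_process_noise(dt, q_wfm, q_rwfm):
    """Discrete process noise for a two-state (phase, frequency) clock model.

    q_wfm and q_rwfm are assumed noise intensities for white frequency
    noise and random-walk frequency noise (illustrative names).
    """
    return [
        [q_wfm * dt + q_rwfm * dt ** 3 / 3.0, q_rwfm * dt ** 2 / 2.0],
        [q_rwfm * dt ** 2 / 2.0, q_rwfm * dt],
    ]


def kf_step(x, P, z, dt, q_wfm, q_rwfm, r):
    """One predict/update cycle given a scalar phase measurement z."""
    # Predict: phase integrates frequency over dt, frequency is held.
    x_pred = [x[0] + dt * x[1], x[1]]
    Q = clock_process_noise(dt, q_wfm, q_rwfm)
    # P_pred = F P F^T + Q with F = [[1, dt], [0, 1]].
    p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + Q[0][0]
    p01 = P[0][1] + dt * P[1][1] + Q[0][1]
    p10 = P[1][0] + dt * P[1][1] + Q[1][0]
    p11 = P[1][1] + Q[1][1]
    # Update with measurement matrix H = [1, 0] (phase only).
    s = p00 + r                      # innovation variance
    k0, k1 = p00 / s, p10 / s        # Kalman gain
    innov = z - x_pred[0]
    x_new = [x_pred[0] + k0 * innov, x_pred[1] + k1 * innov]
    # P_new = (I - K H) P_pred.
    P_new = [
        [(1.0 - k0) * p00, (1.0 - k0) * p01],
        [p10 - k1 * p00, p11 - k1 * p01],
    ]
    return x_new, P_new


def track_constant_drift(true_freq, steps, dt=1.0):
    """Run the filter on noise-free phase readings of a steadily drifting clock."""
    x, P = [0.0, 0.0], [[100.0, 0.0], [0.0, 100.0]]
    for k in range(1, steps + 1):
        z = true_freq * k * dt
        x, P = kf_step(x, P, z, dt, q_wfm=1e-4, q_rwfm=1e-6, r=0.01)
    return x, P
```

Calling `track_constant_drift(1.0, 50)` should return a frequency estimate close to the true drift. A model of this kind has no state for flicker frequency noise, which is consistent with the abstract's remark that such models fall short for long-term accuracy.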
Date Created
2020

Fabrication, Modeling and Control of a Spherical Tail-Sitter UAV

Description
In the past decade, real-world applications of Vertical Take-Off and Landing (VTOL) Unmanned Aerial Vehicles (UAVs) have increased significantly. There has been growing interest in one of these types of UAVs, called a tail-sitter UAV, due to its VTOL and cruise capabilities. This thesis presents the fabrication of a spherical tail-sitter UAV and derives a nonlinear mathematical model of its dynamics. The singularity in the attitude kinematics of the vehicle is avoided using Modified Rodrigues Parameters (MRP). The model parameters of the fabricated vehicle are calculated using the bifilar pendulum method, a motor stand, and ANSYS simulation software. Then the trim conditions at hover are calculated for the nonlinear model, and the rotational dynamics of the model are linearized around the equilibrium state with the calculated trim conditions. Robust controllers are designed to stabilize the UAV in hover using the H2 control and H-infinity control methodologies. For the H2 control design, Linear Quadratic Gaussian (LQG) control is used. For the H-infinity control design, Linear Matrix Inequalities (LMI) with frequency-dependent weights are derived and solved using the MATLAB toolbox YALMIP. In addition, a nonlinear controller is designed using the Sum-of-Squares (SOS) method to implement large-angle maneuvers for transitions between horizontal flight and vertical flight. Finally, the linear controllers are implemented in the fabricated spherical tail-sitter UAV for experimental validation. The performance trade-offs and the response of the UAV with the linear and nonlinear controllers are discussed in detail.
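
As background on the Modified Rodrigues Parameters mentioned above, the sketch below shows the standard mapping from a rotation to its MRP vector, sigma = e tan(theta/4) for a unit axis e and angle theta. It is illustrative only and is not taken from the thesis: the function names are hypothetical, the quaternion is assumed to be scalar-first and unit-norm, and the sign flip that selects the shorter rotation is a common convention assumed here.

```python
import math


def axis_angle_to_mrp(axis, angle):
    """MRP vector for a rotation of `angle` radians about `axis`."""
    norm = math.sqrt(sum(a * a for a in axis))
    t = math.tan(angle / 4.0)  # unbounded only as angle approaches 2*pi
    return tuple(a / norm * t for a in axis)


def quat_to_mrp(q):
    """MRP vector for a unit quaternion given as (w, x, y, z)."""
    w, x, y, z = q
    if w < 0.0:
        # q and -q describe the same attitude; flipping keeps |sigma| <= 1
        # and stays away from the singularity at a full revolution.
        w, x, y, z = -w, -x, -y, -z
    return (x / (1.0 + w), y / (1.0 + w), z / (1.0 + w))
```

For a 90 degree rotation about the z axis, both functions give a z component of tan(22.5 degrees). Because the parameterization only becomes singular at a full revolution, it can represent the large attitude changes between hover and horizontal flight, whereas Euler angles become singular at a 90 degree pitch.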
Date Created
2018

A system identification and control engineering approach for optimizing mHealth behavioral interventions based on social cognitive theory

Description
Behavioral health problems such as physical inactivity are among the main causes of mortality around the world. Mobile and wireless health (mHealth) interventions offer the opportunity for applying control engineering concepts in behavioral change settings. Social Cognitive Theory (SCT) is among the most influential theories of health behavior and has been used as the conceptual basis of many behavioral interventions. This dissertation examines adaptive behavioral interventions for physical inactivity problems based on SCT using system identification and control engineering principles. First, a dynamical model of SCT using fluid analogies is developed. The model is used throughout the dissertation to evaluate system identification approaches and to develop control strategies based on Hybrid Model Predictive Control (HMPC). An initial system identification informative experiment is designed to obtain basic insights about the system. Based on the informative experimental results, a second optimized experiment is developed as the solution of a formal constrained optimization problem. The concept of Identification Test Monitoring (ITM) is developed for determining experimental duration and adjustments to the input signals in real time. ITM relies on deterministic signals, such as multisines, and uncertainty regions resulting from frequency domain transfer function estimation that is performed during experimental execution. ITM is motivated by practical considerations in behavioral interventions; however, a generalized approach is presented for broad-based multivariable application settings such as process control. Stopping criteria for the experimental test utilizing ITM are developed using both open-loop and robust control considerations.

A closed-loop intensively adaptive intervention for physical activity is proposed relying on a controller formulation based on HMPC. The discrete and logical features of HMPC naturally address the categorical nature of the intervention components that include behavioral goals and reward points. The intervention incorporates online controller reconfiguration to manage the transition between the behavioral initiation and maintenance training stages. Simulation results are presented to illustrate the performance of the system using a model for a hypothetical participant under realistic conditions that include uncertainty. The contributions of this dissertation can ultimately impact novel applications of cyber-physical systems in medicine.
Date Created
2016

Multidisciplinary optimization for the design and control of uncertain dynamical systems

Description
This dissertation considers an integrated approach to system design and controller design based on analyzing limits of system performance. Historically, plant design methodologies have not incorporated control-relevant considerations. Such an approach could result in a system that might not meet its specifications (or one that requires a complex control architecture to do so). System and controller designers often go through several iterations in order to converge to an acceptable plant and controller design. The focus of this dissertation is on the design and control of an air-breathing hypersonic vehicle using such an integrated system-control design framework. The goal is to reduce the number of system-control design iterations (by explicitly incorporating control considerations in the system design process), as well as to influence the guidance/trajectory specifications for the system. Due to the high computational costs associated with obtaining a dynamic model for each plant configuration considered, approximations to the system dynamics are used in the control design process. By formulating the control design problem using bilinear and polynomial matrix inequalities, several common control and system design constraints can be simultaneously incorporated into a vehicle design optimization. Several design problems are examined to illustrate the effectiveness of this approach (and to compare the computational burden of this methodology against more traditional approaches).
Date Created
2014

H-infinity control design via convex optimization: toward a comprehensive design environment

Description
The problem of systematically designing a control system remains a subject of intense research. In this thesis, a powerful control system design environment for Linear Time-Invariant (LTI) Multiple-Input Multiple-Output (MIMO) plants is presented. The environment has been designed to address a broad set of closed loop metrics and constraints; e.g. weighted H-infinity closed loop performance subject to closed loop frequency and/or time domain constraints (e.g. peak frequency response, peak overshoot, peak controls, etc.). The general problem considered - a generalized weighted mixed-sensitivity problem subject to constraints - permits designers to directly address and trade off multivariable properties at distinct loop breaking points; e.g. at plant outputs and at plant inputs. As such, the environment is particularly powerful for (poorly conditioned) multivariable plants. The Youla parameterization is used to parameterize the set of all stabilizing LTI proper controllers. This is used to convexify the general problem being addressed. Several bases are used to turn the resulting infinite-dimensional problem into a finite-dimensional problem for which there exist many efficient convex optimization algorithms. A simple cutting plane algorithm is used within the environment. Academic and physical examples are presented to illustrate the utility of the environment.
Date Created
2013

Portfolio modeling, analysis and management

Description
A systematic top-down approach to minimize risk and maximize the profits of an investment over a given period of time is proposed. Macroeconomic factors such as Gross Domestic Product (GDP), Consumer Price Index (CPI), Outstanding Consumer Credit, Industrial Production Index, Money Supply (MS), Unemployment Rate, and Ten-Year Treasury are used to predict/estimate asset (sector ETF) returns. Fundamental ratios of individual stocks are used to predict the stock returns. An a priori known cash-flow sequence is assumed available for investment. Given the importance of sector performance on stock performance, sector-based Exchange Traded Funds (ETFs) for the S&P and Dow Jones are considered and wealth is allocated. Mean-variance optimization with risk and return constraints is used to distribute the wealth in individual sectors among the selected stocks. The results presented should be viewed as providing an outer control/decision loop generating sector target allocations that will ultimately drive an inner control/decision loop focusing on stock selection. Receding horizon control (RHC) ideas are exploited to pose and solve two relevant constrained optimization problems. First, the classic problem of wealth maximization subject to risk constraints (as measured by a metric on the covariance matrices) is considered. Special consideration is given to an optimization problem that attempts to minimize the peak risk over the prediction horizon, while trying to track a wealth objective. It is concluded that this approach may be particularly beneficial during downturns - appreciably limiting downside during downturns while providing most of the upside during upturns. Investment in stocks during upturns and in sector ETFs during downturns is profitable.
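
To make the mean-variance idea concrete, the sketch below computes portfolio return and variance and the closed-form minimum-variance split between two assets. It is a deliberately simplified special case, not the constrained receding-horizon formulation described above: it assumes only a full-investment constraint, allows short positions, and uses hypothetical function names and inputs.

```python
def portfolio_return(weights, means):
    """Expected portfolio return: w^T mu."""
    return sum(w * m for w, m in zip(weights, means))


def portfolio_variance(weights, cov):
    """Portfolio variance: w^T Sigma w."""
    n = len(weights)
    return sum(
        weights[i] * cov[i][j] * weights[j]
        for i in range(n)
        for j in range(n)
    )


def min_variance_weights_two_assets(var1, var2, cov12):
    """Weights (w1, w2) with w1 + w2 = 1 that minimize portfolio variance."""
    denom = var1 + var2 - 2.0 * cov12  # variance of the return difference
    if denom <= 0.0:
        raise ValueError("degenerate case: the two returns differ by a constant")
    w1 = (var2 - cov12) / denom
    return (w1, 1.0 - w1)
```

With uncorrelated assets of variance 0.04 and 0.09, the lower-variance asset receives the larger weight, 0.09 / 0.13. Adding a target-return or risk constraint, as the abstract describes, generally requires a numerical quadratic programming solver instead of a closed form.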
Date Created
2010