Continuous-Time Reinforcement Learning: New Design Algorithms with Theoretical Insights and Performance Guarantees

Description
This dissertation discusses continuous-time reinforcement learning (CT-RL) for control of affine nonlinear systems. Continuous-time nonlinear optimal control problems hold great promise in real-world applications. After decades of development, reinforcement learning (RL) has achieved some of the greatest successes as a general nonlinear control design method. Yet as RL control has developed, CT-RL results have greatly lagged their discrete-time RL (DT-RL) counterparts, especially with regard to real-world applications. Current CT-RL algorithms generally fall into two classes: adaptive dynamic programming (ADP) and actor-critic deep RL (DRL). The first school, ADP, features elegant theoretical results stemming from adaptive and optimal control, yet these methods have not been shown to synthesize meaningful controllers effectively. The second school, DRL, has shown impressive learning solutions, yet theoretical guarantees are still to be developed. A substantive analysis uncovering the quantitative causes of the fundamental gap between CT and DT remains to be conducted. Thus, this work develops a first-of-its-kind quantitative evaluation framework to diagnose the performance limitations of the leading CT-RL methods. This dissertation also introduces a suite of new CT-RL algorithms that offer both theoretical and synthesis guarantees. The proposed design approach relies on three important factors. First, for physical systems that feature physically motivated dynamical partitions into distinct loops, the proposed decentralization method breaks the optimal control problem into smaller subproblems. Second, the work introduces a new excitation framework to improve persistence of excitation (PE) and numerical conditioning via classical input/output insights. Third, the method scales the learning problem via design-motivated invertible transformations of the system state variables in order to modulate the algorithm learning regression for further increases in numerical stability. This dissertation introduces a suite of (decentralized) excitable integral reinforcement learning (EIRL) algorithms implementing these paradigms. It rigorously proves convergence, optimality, and closed-loop stability guarantees of the proposed methods, which are demonstrated in comprehensive comparative studies with the leading methods in ADP on a significant application problem of controlling an unstable, nonminimum phase hypersonic vehicle (HSV). It also conducts comprehensive comparative studies with the leading DRL methods on three state-of-the-art (SOTA) environments, revealing new performance/design insights.
Date Created
2024

Dynamics, Directional Maneuverability and Optimization Based Multivariable Control of Nonholonomic Differential Drive Mobile Robots

Description
This dissertation presents a comprehensive study of modeling and control issues associated with nonholonomic differential drive mobile robots. The first part of the dissertation focuses on modeling using Lagrangian mechanics. The dynamics are modeled as a two-input two-output (TITO) nonlinear model. Motor dynamics are also modeled. Trade studies are conducted to shed light on critical vehicle design parameters and how they impact static properties, dynamic properties, directional stability, coupling, and overall vehicle design. An aspect-ratio-based dynamic decoupling condition is also presented. The second part of the dissertation addresses the design of linear time-invariant (LTI), multi-input multi-output (MIMO) fixed-structure H∞ controllers for the inner-loop velocity (v, ω) tracking system of the robot, motivated by a practical desire to design classically structured robust controllers. The fixed-structure H∞-optimal controllers are designed using the Generalized Mixed Sensitivity (GMS) methodology to systematically shape properties at distinct loop-breaking points. The H∞ control problem is solved using nonsmooth optimization techniques to compute locally optimal solutions. MATLAB’s Robust Control Toolbox (hinfstruct and systune) is used to solve the nonsmooth optimization. The dissertation also addresses the design of fixed-structure MIMO gain-scheduled H∞ controllers via the GMS methodology. Trade-off studies are conducted to address the effect of vehicle design parameters on the frequency- and time-domain properties of the inner-loop control system of the mobile robot. The third part of the dissertation focuses on the design of the outer-loop position (x, y, θ) control system of the mobile robot using real-time model predictive control (MPC) algorithms. Both linear time-varying (LTV) MPC and nonlinear MPC algorithms are discussed. The outer-loop performance of the mobile robot is studied for two applications: (1) single-robot trajectory tracking and multi-robot coordination in the presence of obstacles, and (2) maximum-progress maneuvering on a racetrack. The dissertation specifically addresses the impact of variation of the center-of-gravity (c.g.) position with respect to the wheel axle on directional maneuverability, peak control effort required to perform aggressive maneuvers, and overall position control performance. Detailed control-relevant performance trade-offs associated with outer-loop position control are demonstrated through simulations in discrete time. The optimization packages CPLEX (convex QP in LTV-MPC) and ACADO (NLP in nonlinear MPC) are used to solve the optimal control problem (OCP) in real time. All simulations are performed in the Robot Operating System (ROS).
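As a rough illustration of how the inner-loop velocity pair (v, ω) relates to the outer-loop pose (x, y, θ), the following is a minimal sketch of the standard textbook kinematic model of a differential drive robot. It is not the Lagrangian dynamic model developed in the dissertation; the function names, the wheel radius and axle length parameters, and the forward-Euler integration are all assumptions made for illustration.

```python
import math

def wheel_to_body(omega_r, omega_l, wheel_radius, axle_length):
    """Map right/left wheel angular speeds (rad/s) to body velocities (v, omega).

    Hypothetical helper: wheel_radius and axle_length are illustrative
    parameters, not values taken from the dissertation.
    """
    v = wheel_radius * (omega_r + omega_l) / 2.0
    omega = wheel_radius * (omega_r - omega_l) / axle_length
    return v, omega

def step_pose(x, y, theta, v, omega, dt):
    """Advance the pose (x, y, theta) by one forward-Euler step of the
    kinematic model: x' = v cos(theta), y' = v sin(theta), theta' = omega."""
    x_next = x + v * math.cos(theta) * dt
    y_next = y + v * math.sin(theta) * dt
    theta_next = theta + omega * dt
    return x_next, y_next, theta_next

def simulate(pose, v, omega, dt, steps):
    """Hold (v, omega) constant and integrate the pose for a number of steps."""
    x, y, theta = pose
    for _ in range(steps):
        x, y, theta = step_pose(x, y, theta, v, omega, dt)
    return x, y, theta
```

In a cascaded design of the kind the abstract describes, an outer-loop planner would supply the (v, ω) references that an inner-loop controller tracks; here they are simply held constant.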
Date Created
2021

From Data to Predictive Models: Robust Identification and Analysis of the Immune System

Description
In this dissertation, new data-driven techniques are developed to solve three problems related to generating predictive models of the immune system. These problems and their solutions are summarized as follows. The first problem is that, while cellular characteristics can be measured using flow cytometry, immune system cells are often analyzed only after they are sorted into groups by those characteristics. In Chapter 3, a method of analyzing the cellular characteristics of immune system cells by generating Probability Density Functions (PDFs) to model the flow cytometry data is proposed. To generate a PDF that models the distribution of immune cell characteristics, a new class of random variables called Sliced-Distributions (SDs) is developed. It is shown that SDs can outperform other state-of-the-art methods on a set of benchmarks and can be used to differentiate between immune cells taken from healthy patients and those with rheumatoid arthritis. The second problem is that, while immune system cells can be broken into different subpopulations, it is unclear which subpopulations are most significant. In Chapter 4, a new machine learning algorithm is formulated and used to identify subpopulations that can best predict disease severity or the populations of other immune cells. The proposed machine learning algorithm performs well when compared to other state-of-the-art methods and is applied to an immunological dataset to identify disease-relevant subpopulations of immune cells, denoted immune states. Finally, while immunotherapies have been effectively used to treat cancer, selecting an optimal drug dose and period of treatment administration is still an open problem. In Chapter 5, a method to estimate Lyapunov functions of a system with unknown dynamics is proposed. This method is applied to generate a semialgebraic set containing immunotherapy doses and treatment periods that are predicted to eliminate a patient's tumor. The problem of selecting an optimal pulsed immunotherapy treatment from this semialgebraic set is formulated as a Global Polynomial Optimization (GPO) problem. In Chapter 6, a new method to solve GPO problems is proposed, and optimal pulsed immunotherapy treatments are identified for this system.
Date Created
2021

Control of Unmanned Aerial Vehicles for Mission Critical Tasks

Description
Unmanned aerial vehicles (UAVs) have reshaped the world of aviation. With the emergence of different types of UAVs, a multitude of mission-critical applications, e.g., aerial photography, package delivery, grasping and manipulation, and aerial reconnaissance and surveillance, have been accomplished successfully. All of the aforementioned applications require the UAVs to be robust to external disturbances and safe while flying in cluttered environments; these factors are of paramount importance for task completion. In the first phase, this dissertation presents the synthesis and experimental validation of real-time low-level estimation and robust attitude and position controllers for multirotors. For the task of reliable position estimation, a hybrid low-pass de-trending filter is proposed for attenuating noise and drift in the velocity and position estimates, respectively. Subsequently, a disturbance observer (DOB) approach with online Q-filter tuning is proposed for disturbance rejection and precise position control. Finally, a nonlinear disturbance observer (NDOB) approach, along with a parameter optimization framework, is proposed for robust attitude control of multirotors. Multiple simulation and experimental flight tests are performed to demonstrate the efficacy of the proposed algorithms. Aerial grasping and collection is a type of mission-critical task that requires vision-based sensing and robust control algorithms for successful task completion. In the second phase, this dissertation first explores different object grasping approaches utilizing soft and rigid graspers. Additionally, vision-based control paradigms are developed for object grasping and collection applications, specifically from water surfaces. Autonomous object collection from water surfaces presents a multitude of challenges: i) object drift due to propeller outwash, ii) reflection and glare from water surfaces, which make object detection extremely challenging, and iii) lack of reliable height sensors above the water surface (for autonomous landing on water). Finally, a first-of-its-kind aerial manipulation system, with an integrated net system and a robust vision-based control structure, is proposed for floating object collection from water surfaces. Objects of different shapes and sizes are collected, through multiple experimental flight tests, with a success rate of 91.6%. To the best of the author's knowledge, this is the first work demonstrating autonomous object collection from water surfaces.
Date Created
2021

Bioinspired Interactions with Complex Granular and Aquatic Environments

Description
August Krogh, a 20th-century Nobel Prize winner in Physiology or Medicine, once stated, "for such a large number of problems there will be some animal of choice, or a few such animals, on which it can be most conveniently studied." What came to be known as the Krogh Principle has become the cornerstone of bioinspired robotics. It is the realization that solutions to various multifaceted engineering problems lie in nature. With the integration of biology, physics, and engineering, the classical approach to solving engineering problems has been transformed. Through such an integration, the presented research addresses the following engineering challenge: maneuverability on and through complex granular and aquatic environments. The basilisk lizard and the octopus are the key sources of inspiration for the anticipated solution. The basilisk lizard is a highly agile reptile with the ability to easily traverse vast, alternating, unstructured, and complex terrains (e.g., sand, mud, water). This makes it a great model for pursuing potential solutions for robotic locomotion on such terrains. The octopus, with a nearly soft yet muscular hydrostat body and arms, is proficient in locomotion, and its range of complex motor functions is vast. Its versatility, "infinite" degrees of freedom, and dexterity have made it an ideal candidate for inspiration in fields such as soft robotics. Through animal experiments on the basilisk lizard and octopus, insight can be obtained on the question: how does the animal interact with complex granular and aquatic environments so effectively? By following these with systematic robotic experiments, the capabilities and limitations of the animal can be understood. The hierarchical concepts observed and learned through animal and robotic experiments can then be integrated and used toward designing, modeling, and developing robotic systems that will assist humanity and society in a diverse set of applications: home service, health care, public safety, transportation, logistics, structural examinations, aquatic and extraterrestrial exploration, search-and-rescue, environmental monitoring, forestry, and agriculture, to name a few. By learning from and being inspired by nature, there exists the potential to go beyond nature for the greater good of society and humanity.
Date Created
2020

Coordinated Navigation and Localization of an Autonomous Underwater Vehicle Using an Autonomous Surface Vehicle in the OpenUAV Simulation Framework

Description
The need to incorporate game engines into robotics tools becomes increasingly crucial as their graphics continue to become more photorealistic. This thesis presents a simulation framework, referred to as OpenUAV, that addresses cloud simulation and photorealism challenges in support of academic and research goals. In this work, OpenUAV is used to create a simulation of an autonomous underwater vehicle (AUV) closely following a moving autonomous surface vehicle (ASV) in an underwater coral reef environment. It incorporates the Unity3D game engine and the robotics software Gazebo to take advantage of Unity3D's perception and Gazebo's physics simulation. The software is developed as a containerized solution that is deployable on cloud and on-premises systems.

This method of utilizing Gazebo's physics and Unity3D perception is evaluated for a team of marine vehicles (an AUV and an ASV) in a coral reef environment. A coordinated navigation and localization module is presented that allows the AUV to follow the path of the ASV. A fiducial marker underneath the ASV facilitates pose estimation of the AUV, and the pose estimates are filtered using the known dynamical system model of both vehicles for better localization. This thesis also investigates different fiducial markers and their detection rates in this Unity3D underwater environment. The limitations and capabilities of this Unity3D perception and Gazebo physics approach are examined.
Date Created
2020

Ant-Inspired Control Strategies for Collective Transport by Dynamic Multi-Robot Teams with Temporary Leaders

Description
In certain ant species, groups of ants work together to transport food and materials back to their nests. In some cases, the group exhibits a leader-follower behavior in which a single ant guides the entire group based on its knowledge of the destination. In some cases, the leader role is occupied temporarily by an ant, only to be replaced when an ant with new information arrives. This kind of behavior can be very useful in uncertain environments where robot teams work together to transport a heavy or bulky payload. The purpose of this research was to study ways to implement this behavior on robot teams.

In this work, I combined existing dynamical models of collective transport in ants to create a stochastic model that describes these behaviors and can be used to control multi-robot systems to perform collective transport. In this model, each agent transitions stochastically between roles based on the force that it senses the other agents are applying to the load. The agent’s motion is governed by a proportional controller that updates its applied force based on the load velocity. I developed agent-based simulations of this model in NetLogo and explored leader-follower scenarios in which agents receive information about the transport destination from a newly informed agent (leader) that joins the team. From these simulations, I derived the mean allocations of agents between “puller” and “lifter” roles and the mean forces applied by the agents throughout the motion.

The simulation results show that the mean ratio of lifter to puller populations is approximately 1:1. They also show that agents using the force-based role update procedure are required to exert less force than agents that select their role based on their position on the load, although both strategies achieve similar transport speeds.
Date Created
2020

Trajectory Modeling, Estimation and Interception of a Thrown Ball using a Robotic Ground Vehicle

Description
Toward the ambitious long-term goal of developing a robotic circus, this thesis addresses key steps in the development of a ground robot that can catch a ball. To this end, we examine nonlinear quadratic-drag trajectories for a tossed ball. Relevant least-squares error fits are provided. It is also shown how a Kalman filter and an extended Kalman filter can be used to generate estimates of the ball trajectory.

Several simple ball intercept policies are examined, including open-loop and closed-loop policies. It is also shown how a low-cost, research-grade differential-drive robot can be built, modeled, and controlled. Directions for developing more complex xy planar intercept policies are also briefly discussed. In short, the thesis establishes a foundation for future work on developing a practical ball-catching robot.
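To make the idea of a quadratic-drag trajectory concrete, here is a minimal sketch that integrates a tossed ball in a vertical plane under gravity and a drag acceleration proportional to speed squared. The lumped coefficient `drag_k` (drag per unit mass, in 1/m), the forward-Euler integrator, and all function names are assumptions for illustration; the thesis's own model, parameters, and fitting procedure may differ.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def simulate_throw(speed, angle_deg, drag_k, dt=1e-3):
    """Integrate a ball thrown from the origin until it returns to z = 0.

    Acceleration model (assumed): a = -g*z_hat - drag_k * |v| * v,
    where drag_k is a hypothetical lumped drag coefficient per unit mass.
    Returns the list of (x, z) points, ending at the landing point.
    """
    angle = math.radians(angle_deg)
    x, z = 0.0, 0.0
    vx, vz = speed * math.cos(angle), speed * math.sin(angle)
    path = [(x, z)]
    while True:
        v_mag = math.hypot(vx, vz)
        ax = -drag_k * v_mag * vx
        az = -G - drag_k * v_mag * vz
        # Forward-Euler update of position, then velocity.
        x_new, z_new = x + vx * dt, z + vz * dt
        vx, vz = vx + ax * dt, vz + az * dt
        if z_new < 0.0:
            # Linearly interpolate the ground crossing within the last step.
            frac = z / (z - z_new)
            path.append((x + frac * (x_new - x), 0.0))
            return path
        x, z = x_new, z_new
        path.append((x, z))

def landing_range(speed, angle_deg, drag_k):
    """Horizontal distance travelled when the ball lands."""
    return simulate_throw(speed, angle_deg, drag_k)[-1][0]
```

Setting `drag_k` to zero recovers the drag-free parabola, which gives a simple sanity check; a trajectory generated this way could also serve as the process model inside an extended Kalman filter, although that step is not sketched here.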
Date Created
2018

Exploration, Mapping and Scalar Field Estimation using a Swarm of Resource-Constrained Robots

Description
Robotic swarms can potentially perform complicated tasks such as exploration and mapping at large space and time scales in a parallel and robust fashion. This thesis presents strategies for mapping environmental features of interest in an unknown domain using data obtained by a swarm of resource-constrained robots: specifically, mapping obstacles and collision-free paths, generating a metric map, and estimating scalar density fields. First, an approach was developed for mapping a single obstacle using a swarm of point-mass robots with both directed and random motion. The swarm population dynamics are modeled by a set of advection-diffusion-reaction partial differential equations (PDEs) in which a spatially dependent indicator function marks the presence or absence of the obstacle in the domain. The indicator function is estimated by solving an optimization problem with PDEs as constraints. Second, a methodology was proposed for constructing, from data collected by a swarm of finite-sized robots, a topological map of an unknown environment that indicates collision-free paths for navigation. As an initial step, the number of topological features in the domain was quantified by applying tools from algebraic topology to a probability function over the explored region that indicates the presence of obstacles. A topological map of the domain is then generated using a graph-based wave propagation algorithm. This approach is further extended to construct a metric map of an unknown domain with obstacles using uncertain position data collected by a swarm of resource-constrained robots, filtered using intensity measurements of an external signal. Next, a distributed method was developed to construct the occupancy grid map of an unknown environment using a swarm of inexpensive robots or mobile sensors with limited communication. In addition, an exploration strategy that combines information-theoretic ideas with Lévy walks was proposed. Finally, the problem of reconstructing a two-dimensional scalar field using observations from a subset of a sensor network, in which each node communicates its local measurements to its neighboring nodes, was addressed. This problem reduces to estimating the initial condition of a large interconnected system with first-order linear dynamics, which can be solved as an optimization problem.
Date Created
2018

Fabrication, Modeling and Control of a Spherical Tail-Sitter UAV

Description
In the past decade, real-world applications of Vertical Take-Off and Landing (VTOL) Unmanned Aerial Vehicles (UAVs) have increased significantly. There has been growing interest in one type of these UAVs, the tail-sitter UAV, due to its VTOL and cruise capabilities. This thesis presents the fabrication of a spherical tail-sitter UAV and derives a nonlinear mathematical model of its dynamics. The singularity in the attitude kinematics of the vehicle is avoided using Modified Rodrigues Parameters (MRP). The model parameters of the fabricated vehicle are calculated using the bifilar pendulum method, a motor stand, and ANSYS simulation software. Then the trim conditions at hover are calculated for the nonlinear model, and the rotational dynamics of the model are linearized around the equilibrium state with the calculated trim conditions. Robust controllers are designed to stabilize the UAV in hover using the H2 and H-infinity control methodologies. For the H2 control design, Linear Quadratic Gaussian (LQG) control is used. For the H-infinity control design, Linear Matrix Inequalities (LMIs) with frequency-dependent weights are derived and solved using the MATLAB toolbox YALMIP. In addition, a nonlinear controller is designed using the Sum-of-Squares (SOS) method to implement large-angle maneuvers for transitions between horizontal flight and vertical flight. Finally, the linear controllers are implemented on the fabricated spherical tail-sitter UAV for experimental validation. The performance trade-offs and the response of the UAV with the linear and nonlinear controllers are discussed in detail.
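For readers unfamiliar with Modified Rodrigues Parameters, the sketch below shows the standard textbook conversion from a unit quaternion (scalar part first) to an MRP vector, together with the common "shadow set" switch that keeps the MRP norm at or below one and so stays away from the singularity at a full 360-degree rotation. Whether the thesis uses shadow-set switching or another mechanism is not stated in the abstract, so treat that part, along with all function names, as an assumption.

```python
import math

def axis_angle_to_quat(axis, angle):
    """Unit quaternion (q0, q1, q2, q3), scalar first, for a rotation of
    `angle` radians about `axis` (normalized internally)."""
    norm = math.sqrt(sum(a * a for a in axis))
    s = math.sin(angle / 2.0) / norm
    return (math.cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s)

def quat_to_mrp(q):
    """MRP vector sigma = q_vec / (1 + q0); equivalently e * tan(angle / 4).

    Singular when q0 = -1, i.e. at a rotation of 360 degrees.
    """
    q0, q1, q2, q3 = q
    denom = 1.0 + q0
    return (q1 / denom, q2 / denom, q3 / denom)

def shadow_if_needed(sigma):
    """Switch to the shadow set -sigma / |sigma|^2 when |sigma| > 1.

    Both sets describe the same attitude; switching keeps |sigma| <= 1.
    (Illustrative assumption: the source does not say this switch is used.)
    """
    norm_sq = sum(s * s for s in sigma)
    if norm_sq > 1.0:
        return tuple(-s / norm_sq for s in sigma)
    return sigma
```

For example, a 270-degree rotation about an axis maps, after the shadow switch, to the same MRP as a 90-degree rotation in the opposite direction, which is the same physical attitude.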
Date Created
2018