Joint Learning of Reward Machines and Policies for Multi-Agent Reinforcement Learning in Non-Cooperative Stochastic Games

Description
Multi-agent reinforcement learning (MARL) plays a pivotal role in artificial intelligence by enabling learning in complex environments shared by multiple agents. This thesis integrates the learning of high-level knowledge, in the form of reward machines (RMs), with MARL to handle non-Markovian reward functions in non-cooperative stochastic games. Reward machines model the temporal structure of rewards and thereby give a richer representation of agent decision-making. A novel algorithm, JIRP-SG, is introduced that enables agents to learn RMs and optimize their best-response policies concurrently while handling the temporal dependencies present in non-cooperative settings. The approach uses automata learning to acquire RMs iteratively and the Lemke-Howson method to update the Q-functions, aiming for a Nash equilibrium. The method is shown to converge over time to RMs that accurately encode the reward functions and to the best-response policy for each agent. The approach is validated through case studies, including a Pacman Game scenario and a Factory Assembly scenario, in which it outperforms baseline methods. The impact of batch size on learning performance is also examined: a diligent agent that uses smaller batches can outperform an agent that uses larger batches, which summarizes its experiences less effectively.
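As a rough illustration of how a reward machine can encode a non-Markovian reward, the following minimal sketch models an RM as a finite-state machine driven by high-level labels. It is not taken from the thesis: the class name `RewardMachine`, the labels `key` and `door`, and the reward values are hypothetical, chosen only to show a reward that depends on event order.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine that maps label sequences to rewards (illustrative)."""

    initial: str
    # (state, label) -> (next_state, reward). Pairs that are not listed
    # are treated as self-loops with zero reward (an assumption of this sketch).
    transitions: dict
    state: str = field(init=False)

    def __post_init__(self):
        self.state = self.initial

    def reset(self):
        self.state = self.initial

    def step(self, label):
        """Advance on one label and return the reward for that step."""
        next_state, reward = self.transitions.get(
            (self.state, label), (self.state, 0.0)
        )
        self.state = next_state
        return reward


# Hypothetical task: reward 1 only if "key" is observed before "door".
rm = RewardMachine(
    initial="u0",
    transitions={
        ("u0", "key"): ("u1", 0.0),
        ("u1", "door"): ("u2", 1.0),
    },
)


def run(labels):
    """Reset the machine and return the reward emitted at each step."""
    rm.reset()
    return [rm.step(label) for label in labels]
```

The same label (`door`) yields a different reward depending on the history, which is exactly what a Markovian reward over environment states alone cannot express; tracking the RM state alongside the environment state restores the Markov property.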
Date Created
2024

Quantification of Shoulder Stiffness at Various Arm Postures using a 4-Bar Parallel Exoskeleton Robot

Description
Shoulder injuries are the leading cause of shoulder discomfort or disabilities. Assessing glenohumeral joint function through system identification is beneficial because the glenohumeral joint is a major contributor to shoulder movement and stability. This function is identified by perturbing the glenohumeral joint, measuring the input angle and output torque, and estimating a mathematical model from them. In this study, a shoulder exoskeleton robot based on a 4-bar spherical parallel manipulator (4B-SPM) was used. The 4B-SPM exoskeleton offers high acceleration, fast enough to satisfy the speed requirement for characterizing distinct neuromuscular properties of the shoulder. Thirty-four healthy subjects (17 female, 17 male) with no history of shoulder impairment were recruited to characterize shoulder joint stiffness. Filtered Gaussian random perturbations with an RMS value of 2 degrees and a frequency of 3 Hz were applied. These perturbations were captured by a 3-D motion capture system using markers placed on an arm brace, which allows the arm to be locked at a particular pose. Participants were instructed to remain relaxed so that muscle activation would not interfere with the measured mechanical properties of the shoulder. Torque was measured with a force-torque (FT) sensor at 15 different postures: 3 shoulder flexion angles, each with 5 horizontal extension configurations. Stiffness was characterized using the Short Data Segment (SDS) method of time-varying system identification. Shoulder joint stiffness varied significantly with arm posture and increased as the flexion angle decreased. Notably, a convex pattern emerged, in which stiffness increased as the arm deviated further from the mid-range of the shoulder joint's range of motion (ROM) in the horizontal extension direction. These findings suggest that maintaining the arm's posture near the mid-range of ROM decreases the stability of the shoulder joint. Shoulder joint stiffness also differed significantly by gender: male subjects had higher joint stiffness than female subjects.
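To make the idea of estimating a joint model from input angle and output torque concrete, here is a minimal sketch. It is a simplified stand-in, not the SDS method used in the study: it assumes a time-invariant second-order model, torque = I·acceleration + B·velocity + K·angle, and fits I, B, and K by ordinary least squares on synthetic, noise-free data. The parameter values, the two-sine perturbation, the use of radians, and the function names are all assumptions made for illustration.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ibk(angle, velocity, acceleration, torque):
    """Least-squares fit of [inertia, damping, stiffness] via normal equations."""
    rows = list(zip(acceleration, velocity, angle))
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * t for r, t in zip(rows, torque)) for i in range(3)]
    return solve_linear(ata, atb)


# Synthetic data from assumed "true" parameters (illustrative values only).
TRUE_I, TRUE_B, TRUE_K = 0.2, 1.5, 30.0
dt = 0.001  # 1 kHz sampling, an assumption
angle, velocity, acceleration, torque = [], [], [], []
for n in range(2000):
    t = n * dt
    th = vel = acc = 0.0
    # Two sine components so that angle and acceleration are not collinear.
    for amp, freq in ((0.02, 1.0), (0.015, 3.0)):
        w = 2.0 * math.pi * freq
        th += amp * math.sin(w * t)
        vel += amp * w * math.cos(w * t)
        acc -= amp * w * w * math.sin(w * t)
    angle.append(th)
    velocity.append(vel)
    acceleration.append(acc)
    torque.append(TRUE_I * acc + TRUE_B * vel + TRUE_K * th)

inertia, damping, stiffness = fit_ibk(angle, velocity, acceleration, torque)
```

With measured data the fit would be repeated per posture, and a time-varying method such as SDS would replace the single global fit; the sketch only shows why a sufficiently rich perturbation is needed to separate stiffness from inertia.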
Date Created
2023

Reinforcement Learning for Planning and Scheduling in Aviation

Description
Aviation is a complicated field that involves a wide range of operations, from commercial airline flights to Unmanned Aerial Systems (UAS). Planning and scheduling are essential components of the aviation industry and play a significant role in ensuring safe and efficient operations. Reinforcement Learning (RL) has received increasing attention in recent years because of its capability to enable autonomous decision-making. To investigate the potential advantages and effectiveness of RL in aviation planning and scheduling, three topics are explored in depth: obstacle avoidance, task-oriented path planning, and maintenance scheduling. A dynamic and probabilistic airspace reservation concept, called the Dynamic Anisotropic (DA) bound, is first developed for UAS; it can be added around the UAS as the separation requirement. A model based on Q-learning is proposed to integrate the DA bound with path planning for obstacle avoidance. Moreover, a deep reinforcement learning algorithm based on Proximal Policy Optimization (PPO) is proposed to guide the UAS to destinations while avoiding obstacles through continuous control. Results from case studies demonstrate that the proposed model can provide accurate and robust guidance and resolve conflicts with a success rate of over 99%. Next, the single-UAS path planning problem is extended to a multi-agent system in which agents aim to accomplish their own complex tasks. These tasks involve non-Markovian reward functions and can be specified using reward machines. Both cooperative and competitive environments are explored. Decentralized Graph-based reinforcement learning using Reward Machines (DGRM) is proposed to improve computational efficiency for maximizing the global reward in a graph-based Markov Decision Process (MDP). Q-learning with Reward Machines for Stochastic Games (QRM-SG) is developed to learn the best-response strategy for each agent in a competitive environment. Finally, maintenance scheduling is investigated. The purpose is to minimize the system maintenance cost while ensuring compliance with reliability requirements. Maintenance scheduling is formulated as an MDP that determines when to conduct maintenance and which operations to perform. A Linear Programming-enhanced RollouT (LPRT) method is developed to solve both constrained deterministic and stochastic maintenance scheduling with an infinite horizon. LPRT categorizes components according to their health condition and makes decisions for each category.
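For readers unfamiliar with the Q-learning that several of the methods above build on, the following is a minimal sketch of the standard tabular update rule. It is not the thesis's model: the state and action names, the learning rate `alpha`, and the discount `gamma` are hypothetical, and the DA bound, PPO, and reward-machine extensions are not represented.

```python
def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    q maps (state, action) pairs to estimated returns; missing pairs
    default to 0.0 (an assumption of this sketch).
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


# Hypothetical two-action example: moving "right" from s0 earns reward 1.
ACTIONS = ["left", "right"]
q_table = {}
first = q_update(q_table, "s0", "right", 1.0, "s1", ACTIONS)
```

Each update moves the stored estimate toward the observed reward plus the discounted value of the best next action, so repeated interaction gradually propagates rewards back along the paths that lead to them.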
Date Created
2023