Incorporating Causal Information using Temporal-Logic-Based Causal Diagram in Reinforcement Learning
Description
In this thesis, I investigate a subset of reinforcement learning (RL) tasks in which the agent's objective is to achieve temporally extended goals. A common approach in this setting is to represent the tasks as deterministic finite automata (DFAs) and integrate them into the state space of the RL algorithm; however, such representations often disregard causal knowledge about the environment. To address this limitation, I introduce the Temporal-Logic-based Causal Diagram (TL-CD) in RL. A TL-CD captures temporal causal relationships among different properties of the environment. I leverage the TL-CD to devise an RL algorithm that requires significantly less exploration of the environment. By combining the TL-CD with task-specific DFAs, I identify scenarios in which the agent can determine expected rewards early in the exploration phase. Through a series of case studies, I empirically demonstrate the advantages of TL-CDs, in particular the faster convergence of the algorithm to an optimal policy that results from reduced exploration of the environment.
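As a rough illustration of the setting described above, the following Python sketch tracks a task DFA alongside the environment state and checks whether an accepting state is still reachable once some labels are known to be impossible. It is a minimal sketch under stated assumptions, not the algorithm developed in the thesis: the task ("observe a, then b"), the label alphabet, the function names, and the idea of passing causal knowledge as a plain set of ruled-out labels are all hypothetical simplifications standing in for a TL-CD.

```python
from collections import deque

# Hypothetical task DFA: "observe 'a', then observe 'b'".
# States: 0 (start), 1 (seen 'a'), 2 (accepting).
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 2,
}
ACCEPTING = {2}
ALPHABET = {"a", "b", "c"}  # assumed set of environment labels


def dfa_step(q, label):
    """Advance the DFA; a label with no listed transition leaves the state unchanged."""
    return TRANSITIONS.get((q, label), q)


def product_step(env_state, q, env_step, labeler, action):
    """One step in the product of the environment and the DFA.

    env_step and labeler are caller-supplied (hypothetical) functions:
    env_step(state, action) -> next state, labeler(state) -> label.
    A reward of 1.0 is given when the DFA first enters an accepting state.
    """
    next_env = env_step(env_state, action)
    next_q = dfa_step(q, labeler(next_env))
    reward = 1.0 if next_q in ACCEPTING and q not in ACCEPTING else 0.0
    return (next_env, next_q), reward


def can_still_accept(q, impossible_labels=frozenset()):
    """Breadth-first search over the DFA using only labels not ruled out.

    impossible_labels is a simplified stand-in for causal knowledge:
    labels assumed never to occur from this point on.
    """
    allowed = ALPHABET - set(impossible_labels)
    seen, queue = {q}, deque([q])
    while queue:
        cur = queue.popleft()
        if cur in ACCEPTING:
            return True
        for label in allowed:
            nxt = dfa_step(cur, label)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

In this simplified picture, if `can_still_accept` returns `False` for the current DFA state, no accepting state can be reached, so the remaining task reward is zero and the agent would not need to explore further to learn that.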