Full metadata

Title

FPGA accelerator architecture for Q-learning and its applications in space exploration rovers

Description

Achieving human-level intelligence is a long-term goal for many Artificial Intelligence (AI) researchers. Recent developments combining deep learning and reinforcement learning have moved the field a step closer to this goal. Reinforcement learning with a delayed reward mechanism is an approach to machine intelligence that studies decision making with control, and how a decision-making agent can learn to act optimally in an environment it has no prior knowledge of.

Q-learning is a model-free reinforcement learning strategy that uses temporal differences to estimate the performance of state-action pairs, called Q values. A simple implementation of the Q-learning algorithm uses a Q table in memory to store and update the Q values. However, as the state space grows with the complexity of the environment, and as the number of actions available to the agent increases, the Q table reaches its memory limit and does not scale well. Q-learning with neural networks eliminates the Q table by approximating the Q function with a neural network.
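
As an illustration of the Q table approach described above, the following is a minimal sketch of tabular Q-learning with the standard temporal-difference update, Q(s, a) ← Q(s, a) + α [r + γ max Q(s', a') − Q(s, a)]. It is not taken from the thesis: the five-cell corridor environment (`corridor_step`), the hyperparameter values, and all function names are assumptions chosen only to keep the example small.

```python
import random


def q_learning(n_states, n_actions, step, episodes=500, alpha=0.1,
               gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning; `step(state, action)` returns (next_state, reward, done)."""
    rng = random.Random(seed)
    # The Q table: one row per state, one column per action.
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:
                # Explore: pick any action uniformly at random.
                action = rng.randrange(n_actions)
            else:
                # Exploit: pick a best-valued action, breaking ties randomly.
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(n_actions) if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Temporal-difference target: reward plus discounted best next value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def corridor_step(state, action):
    """Hypothetical environment: cells 0..4, action 1 moves right, 0 moves left."""
    next_state = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    done = next_state == 4  # reaching the last cell ends the episode
    return next_state, (1.0 if done else 0.0), done


q_table = q_learning(n_states=5, n_actions=2, step=corridor_step)
greedy_policy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(4)]
```

The table here has `n_states * n_actions` entries, which is the quantity that grows out of reach as the state and action spaces expand, and which a neural-network approximation of the Q function avoids storing.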

Autonomous agents need to develop cognitive properties and become self-adaptive to be deployable in any environment. Reinforcement learning with Q-learning has been very efficient in solving such problems. However, embedded systems such as space rovers and autonomous robots rarely implement these techniques because of constraints on processing power, chip area, convergence rate, and chip cost. These constraints create a need for a portable, low-power, area-efficient hardware accelerator to speed up such learning.

This problem is addressed by implementing a hardware architecture for Q-learning using artificial neural networks. The architecture combines the massive parallelism inherent in neural networks with the dedicated fine-grained parallelism provided by a Field Programmable Gate Array (FPGA), thereby processing the Q values at high throughput. Mars exploration rovers currently use Xilinx space-grade FPGA devices for image processing, pyrotechnic operation control, and obstacle avoidance. The hardware resource consumption of the architecture was obtained by synthesis, with a Xilinx Virtex-7 FPGA as the target device.

Date Created

2016

Contributors

Gankidi, Pranay Reddy (Author)
Thangavelautham, Jekanthan (Thesis advisor)
Ren, Fengbo (Committee member)
Seo, Jae-Sun (Committee member)
Arizona State University (Publisher)

Resource Type

Text

Genre

Masters Thesis

Academic theses

Extent

ix, 75 pages : illustrations (some color)

Language

eng

Copyright Statement

In Copyright

Primary Member of

ASU Electronic Theses and Dissertations

Peer-reviewed

No

Open Access

No

Handle

https://hdl.handle.net/2286/R.I.40834

Statement of Responsibility

by Pranay Reddy Gankidi

Description Source

Viewed on February 6, 2016

Level of coding

full

Note

thesis

Partial requirement for: M.S., Arizona State University, 2016

bibliography

Includes bibliographical references (pages 73-75)

Field of study: Engineering

System Created

2016-12-01 07:11:16

System Modified

2021-08-30 01:20:15