Full metadata
Title
Scalable register file architecture for CGRA accelerators
Description
Coarse-grained Reconfigurable Arrays (CGRAs) are promising accelerators capable
of accelerating even non-parallel loops and loops with low trip counts. One challenge
in compiling for CGRAs is managing both recurring and nonrecurring variables in
the register file (RF) of the CGRA. Although prior works have managed recurring
variables via a rotating RF, they access the nonrecurring variables through either a
global RF or a constant memory. The former does not scale well, and the latter
degrades the mapping quality. This work proposes a hardware-software codesign
approach to manage all the variables in a local nonrotating RF. The hardware
provides a modulo-addition-based indexing mechanism to enable correct addressing
of recurring variables in a nonrotating RF. The compiler determines the number of
registers required for each recurring variable and configures the boundary between the
registers used for recurring and nonrecurring variables. The compiler also pre-loads
the read-only variables and constants into the local registers in the prologue of the
schedule. Synthesis and place-and-route results for the previous and the proposed RF
designs show that the proposed solution achieves 17% better cycle time. Experiments
mapping several important, performance-critical loops collected from MiBench
show that the proposed approach improves performance (through better mapping) by 18%
compared to using constant memory.
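The modulo-addition indexing mentioned in the description can be illustrated with a minimal sketch. It is not taken from the thesis: the class name LocalRF, the method names, and the layout (recurring windows below the boundary, nonrecurring registers at and above it) are assumptions made only for illustration. Each recurring variable is assumed to own a small window of registers, and the physical index is the window base plus (offset + iteration) modulo the window size.

```python
class LocalRF:
    """Hypothetical model of a local nonrotating register file.

    Registers [0, boundary) are assumed to hold recurring variables,
    registers [boundary, size) nonrecurring ones. This layout is an
    illustrative assumption, not the design described in the thesis.
    """

    def __init__(self, size, boundary):
        if not 0 <= boundary <= size:
            raise ValueError("boundary must lie within the register file")
        self.regs = [0] * size
        self.boundary = boundary

    def recurring_index(self, base, window, offset, iteration):
        """Physical index of a recurring variable for a given iteration.

        base/window describe the registers assigned to the variable;
        the modulo addition wraps the access inside that window.
        """
        if window <= 0 or base < 0 or base + window > self.boundary:
            raise ValueError("window must fit below the boundary")
        return base + (offset + iteration) % window

    def nonrecurring_index(self, slot):
        """Physical index of a nonrecurring variable (no wrapping)."""
        idx = self.boundary + slot
        if slot < 0 or idx >= len(self.regs):
            raise IndexError("slot outside the nonrecurring region")
        return idx


# Usage: an 8-entry RF whose first 6 registers serve recurring variables.
rf = LocalRF(size=8, boundary=6)

# A recurring variable with a 3-register window starting at register 2.
written = [rf.recurring_index(2, 3, 0, i) for i in range(4)]

# The value written in iteration i is read back in iteration i + 1
# by using offset -1, which resolves to the same physical register.
read_back = [rf.recurring_index(2, 3, -1, i + 1) for i in range(4)]

# A read-only constant pre-loaded in the prologue sits above the boundary.
const_reg = rf.nonrecurring_index(1)
```

In this sketch the registers themselves never move; only the computed index changes per iteration, which is the property a nonrotating RF needs in order to address recurring values correctly.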
Date Created
2016
Contributors
- Dave, Shail (Author)
- Shrivastava, Aviral (Thesis advisor)
- Ren, Fengbo (Committee member)
- Ogras, Umit Y. (Committee member)
- Arizona State University (Publisher)
Extent
vi, 30 pages : illustrations (some color)
Language
eng
Copyright Statement
In Copyright
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.I.40738
Statement of Responsibility
by Shail Dave
Description Source
Viewed on January 5, 2017
Level of coding
full
Note
thesis
Partial requirement for: M.S., Arizona State University, 2016
bibliography
Includes bibliographical references (pages 28-30)
Field of study: Computer science
System Created
- 2016-12-01 07:02:18
System Modified
- 2021-08-30 01:20:40