Full metadata
Title
Analyzing, Understanding, and Improving Predicted Variable Names in Decompiled Binary Code
Description
Reverse engineers use decompilers to analyze binaries when the source code is unavailable. A binary decompiler attempts to transform a binary program into corresponding high-level source code by recovering and inferring information that was lost during compilation. Variable names are one kind of lost information, and they are critical for reverse engineers trying to analyze and understand programs. Traditional binary decompilers generally emit automatically generated placeholder variable names that are meaningless or have little correlation with the variables' intended semantics. Replacing these placeholders with correct or meaningful names greatly increases the readability of decompiled binary code. Decompiled Identifier Renaming Engine (DIRE) is a state-of-the-art, deep-learning-based solution that automatically predicts variable names in decompiled binary code. However, DIRE's predictions are far from perfect. The first goal of this research project is to take a close look at this state-of-the-art solution for automated variable name prediction on decompiler output, assess its prediction quality, and understand how the predictions can be improved. The second goal is to improve the quality of the predicted variable names. Building on a thorough understanding of DIRE's issues, I focus on improving the quality of the training data. This thesis proposes a novel approach that improves the training data by normalizing variable names and expanding abbreviated names to their full forms. I implemented the proposed approach, evaluated it on data sets of over 10k and 20k binaries, and showed improvements over DIRE.
Date Created
2021
Contributors
- Bajaj, Ati Priya (Author)
- Wang, Ruoyu (Thesis advisor)
- Baral, Chitta (Committee member)
- Shoshitaishvili, Yan (Committee member)
- Arizona State University (Publisher)
Topical Subject
Resource Type
Extent
36 pages
Language
eng
Copyright Statement
In Copyright
Primary Member of
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.2.N.161705
Level of coding
minimal
Cataloging Standards
Note
Partial requirement for: M.S., Arizona State University, 2021
Field of study: Computer Science
System Created
- 2021-11-16 03:20:21
System Modified
- 2021-11-30 12:51:28