Description
Referring Expression Comprehension (REC) is an important area of research in Natural Language Processing (NLP) and vision domain. It involves locating an object in an image described by a natural language referring expression. This task requires information from both Natural Language and Vision aspect. The task is compositional in nature as it requires visual reasoning as underlying process along with relationships among the objects in the image. Recent works based on modular networks have
displayed to be an effective framework for performing visual reasoning task.
Although this approach is effective, it has been established that the current benchmark datasets for referring expression comprehension suffer from bias. Recent work on CLEVR-Ref+ dataset deals with bias issues by constructing a synthetic dataset
and provides an approach for the aforementioned task which performed better than the previous state-of-the-art models as well as showing the reasoning process. This work aims to improve the performance on CLEVR-Ref+ dataset and achieve comparable interpretability. In this work, the neural module network approach with the attention map technique is employed. The neural module network is composed of the primitive operation modules which are specific to their functions and the output is generated using a separate segmentation module. From empirical results, it is clear that this approach is performing significantly better than the current State-of-theart in one aspect (Predicted programs) and achieving comparable results for another aspect (Ground truth programs)
displayed to be an effective framework for performing visual reasoning task.
Although this approach is effective, it has been established that the current benchmark datasets for referring expression comprehension suffer from bias. Recent work on CLEVR-Ref+ dataset deals with bias issues by constructing a synthetic dataset
and provides an approach for the aforementioned task which performed better than the previous state-of-the-art models as well as showing the reasoning process. This work aims to improve the performance on CLEVR-Ref+ dataset and achieve comparable interpretability. In this work, the neural module network approach with the attention map technique is employed. The neural module network is composed of the primitive operation modules which are specific to their functions and the output is generated using a separate segmentation module. From empirical results, it is clear that this approach is performing significantly better than the current State-of-theart in one aspect (Predicted programs) and achieving comparable results for another aspect (Ground truth programs)
Details
Title
- Referring Expression Comprehension for CLEVR-Ref+ Dataset
Contributors
- Rathor, Kuldeep Singh (Author)
- Baral, Chitta (Thesis advisor)
- Yang, Yezhou (Committee member)
- Simeone, Michael (Committee member)
- Arizona State University (Publisher)
Date Created
The date the item was original created (prior to any relationship with the ASU Digital Repositories.)
2020
Subjects
Resource Type
Collections this item is in
Note
- Masters Thesis Computer Science 2020