Full metadata
Title
Towards Development of Models that Learn New Tasks from Instructions
Description
Humans have the remarkable ability to solve different tasks by simply reading textual instructions that define the tasks and looking at a few examples. Natural Language Processing (NLP) models built with the conventional machine learning paradigm, however, often struggle to generalize across tasks (e.g., a question-answering system cannot solve classification tasks) despite training on large numbers of examples. A long-standing challenge in Artificial Intelligence (AI) is to build a model that learns a new task by understanding the human-readable instructions that define it. To study this, I led the development of NATURAL INSTRUCTIONS and SUPERNATURAL INSTRUCTIONS, large-scale datasets of diverse tasks, their human-authored instructions, and instances. I adopt generative pre-trained language models to encode task-specific instructions along with the input and generate the task output. Empirical results from my experiments indicate that instruction tuning helps models achieve cross-task generalization. This leads to the question: how does one write good instructions? Backed by extensive empirical analysis of large language models, I identify important attributes of successful instructional prompts and propose several reframing techniques that model designers can use to create such prompts. Empirical results from my experiments show that reframing notably improves few-shot learning performance; this is particularly important for large language models, such as GPT3, where tuning models or prompts on large datasets is expensive. In another experiment, I observe that representing the chain-of-thought instruction for mathematical reasoning questions as a program improves model performance significantly. This observation leads to the development of a large-scale mathematical reasoning model, BHASKAR, and a unified benchmark, LILA. For program synthesis tasks, however, summarizing a question (instead of expanding it, as in chain of thought) helps models significantly. This thesis also studies instruction-example equivalence, the power of decomposition instructions to replace the need for new models, and the origination of dataset bias from crowdsourcing instructions, in order to better understand the advantages and disadvantages of the instruction paradigm. Finally, I apply the instruction paradigm to real user needs and introduce a new prompting technique, HELP ME THINK, that helps humans perform various tasks by asking them questions.
Date Created
2023
Contributors
- Mishra, Swaroop (Author)
- Baral, Chitta (Thesis advisor)
- Mitra, Arindam (Committee member)
- Blanco, Eduardo (Committee member)
- Yang, Yezhou (Committee member)
- Arizona State University (Publisher)
Extent
417 pages
Language
eng
Copyright Statement
In Copyright
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.2.N.187521
Level of coding
minimal
Note
Partial requirement for: Ph.D., Arizona State University, 2023
Field of study: Computer Engineering
System Created
- 2023-06-07 11:30:15
System Modified
- 2023-06-07 11:30:21