Learning Temporally Composable Task Segmentations with Language

Learning longer-horizon tasks is challenging with techniques such as reinforcement learning and behavior cloning. Previous approaches have split these long tasks into shorter tasks that are easier to learn by using statistical changepoint detection methods. However, classical changepoint detection methods function only with low-dimensional robot trajectory data and not with high-dimensional inputs such as vision. In this thesis, I split long-horizon tasks, represented as trajectories, into short-horizon sub-tasks under the supervision of language. These shorter-horizon tasks can be learned using conventional behavior cloning approaches. I draw comparisons between techniques from the video moment retrieval problem and changepoint detection on robot trajectory data consisting of high-dimensional data. The proposed moment retrieval-based approach shows a more than 30% improvement in mean average precision (mAP) for identifying trajectory sub-tasks with language guidance compared to that without language. Several ablations are performed to understand the effects of domain randomization, sample complexity, views, and sim-to-real transfer of this method. The data ablation shows that an mAP of 42.01 can be achieved with just 100 labeled trajectories, demonstrating the sample efficiency of this approach. Further, behavior cloning models trained on the segmented trajectories outperform a single model trained on the whole trajectory by up to 20%.
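To make the mAP figures above concrete, the following is a minimal sketch of how average precision is often scored for language-guided segment retrieval. It assumes that a predicted segment counts as correct when its temporal intersection-over-union (IoU) with the ground-truth segment clears a threshold, as is common in moment retrieval benchmarks; the thesis's exact evaluation protocol is not stated in this abstract, and the function names and the 0.5 threshold are illustrative.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end), e.g. in frames or seconds


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two 1-D intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(ranked: List[Segment], truth: Segment,
                      iou_threshold: float = 0.5) -> float:
    """AP for one language query with a single ground-truth segment.

    `ranked` holds predicted segments sorted by descending confidence.
    With one relevant item, AP is the precision at the first hit: 1 / rank.
    """
    for rank, pred in enumerate(ranked, start=1):
        if temporal_iou(pred, truth) >= iou_threshold:
            return 1.0 / rank
    return 0.0


def mean_average_precision(queries: List[Tuple[List[Segment], Segment]],
                           iou_threshold: float = 0.5) -> float:
    """Mean of per-query AP values (assumed averaging scheme)."""
    if not queries:
        return 0.0
    total = sum(average_precision(r, t, iou_threshold) for r, t in queries)
    return total / len(queries)
```

For example, if a sub-task spans frames 4 to 10 and the model ranks the segment (0, 4) above (3, 10), the first prediction has no overlap and the second has an IoU of 6/7, so the query scores an AP of 0.5.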

Addressing Efficiency and Reliability Challenges in Natural Language Processing

Recently developed large language models have achieved remarkable success on a wide range of natural language tasks. Furthermore, they have been shown to possess an impressive ability to generate fluent and coherent text. Despite all the notable abilities of these models, there exist several efficiency- and reliability-related challenges. For example, they are vulnerable to a phenomenon called 'hallucination', in which they generate text that is not factually correct, and they have a large number of parameters, which makes their inference slow and computationally expensive. With the objective of taking a step closer towards enabling the widespread adoption of Natural Language Processing (NLP) systems, this dissertation studies the following question: how can the efficiency- and reliability-related concerns of NLP systems be effectively addressed? Specifically, to improve the reliability of models, this dissertation first presents an approach that actively detects and mitigates the hallucinations of large language models (LLMs) using a retrieval-augmented methodology. Another strategy to mitigate incorrect predictions is abstention from answering when an error is likely, i.e., selective prediction. To this end, I present selective prediction approaches and conduct extensive experiments to demonstrate their effectiveness. Building on top of selective prediction, I also present post-abstention strategies that focus on reliably increasing the coverage of a selective prediction system without considerably impacting its accuracy.
Furthermore, this dissertation covers multiple aspects of improving the efficiency including 'inference efficiency' (making model inferences in a computationally efficient manner without sacrificing the prediction accuracy), 'data sample efficiency' (efficiently collecting data instances for training a task-specific system), 'open-domain QA reader efficiency' (leveraging the external knowledge efficiently while answering open-domain questions), and 'evaluation efficiency' (comparing the performance of different models efficiently). In summary, this dissertation highlights several challenges pertinent to the efficiency and reliability involved in the development of NLP systems and provides effective solutions to address them.
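The trade-off between coverage and accuracy in selective prediction can be illustrated with a short sketch. It assumes the simplest confidence signal, the maximum predicted class probability compared against a fixed threshold; the dissertation's own selective prediction and post-abstention methods are not described in this abstract, so the names and the thresholding rule here are purely illustrative.

```python
from typing import List, Optional, Tuple


def selective_predict(probs: List[float], threshold: float) -> Optional[int]:
    """Return the most probable label, or None to abstain when confidence is low."""
    label = max(range(len(probs)), key=lambda i: probs[i])
    return label if probs[label] >= threshold else None


def coverage_and_accuracy(all_probs: List[List[float]], gold: List[int],
                          threshold: float) -> Tuple[float, float]:
    """Coverage = fraction of inputs answered; accuracy is over answered inputs only."""
    answered = 0
    correct = 0
    for probs, y in zip(all_probs, gold):
        pred = selective_predict(probs, threshold)
        if pred is not None:
            answered += 1
            correct += int(pred == y)
    coverage = answered / len(gold) if gold else 0.0
    accuracy = correct / answered if answered else 0.0
    return coverage, accuracy
```

Raising the threshold typically lowers coverage and raises accuracy on the answered subset; a post-abstention strategy, in the sense used above, would try to recover some of the abstained instances without giving that accuracy back.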

Interpreting Answers to Yes-No Questions in Twitter

Interpreting answers to yes-no questions in social media is difficult. Yes and no keywords are uncommon, and when answers include them, their interpretation is rarely what the keywords suggest. This work presents a new corpus of 4,442 yes-no question-answer pairs from Twitter (Twitter-YN). The corpus includes question-answer instances from different temporal settings. These settings allow investigating whether having older tweets helps in understanding more contemporary tweets. Common linguistic features of answers meaning yes, answers meaning no, and answers whose interpretation remains unknown are also discussed. Experimental results show that large language models are far from solving this problem, even after fine-tuning and blending other corpora for the same problem but outside social media (F1: 0.59). In addition to English, this work presents a Hindi corpus of 3,409 yes-no questions and answers from Twitter (Twitter-YN-hi). Cross-lingual experiments are conducted using a distant supervision approach. It is observed that the performance of multilingual large language models in interpreting indirect answers to yes-no questions in Hindi can be improved when Twitter-YN is blended with distantly supervised data.

Neuro-Symbolic AI Approaches to Enhance Deep Neural Networks with Logical Reasoning and Knowledge Integration

One of the challenges in Artificial Intelligence (AI) is to integrate fast, automatic, and intuitive System-1 thinking with slow, deliberate, and logical System-2 thinking. While deep learning approaches excel at perception tasks for System-1, their reasoning capabilities for System-2 are limited. Besides, deep learning approaches are usually data-hungry, have difficulty making use of explicit knowledge, and struggle with interpretability and justification. This dissertation presents three neuro-symbolic AI approaches that integrate neural networks (NNs) with symbolic AI methods to address these issues. The first approach presented in this dissertation is NeurASP, which combines NNs with Answer Set Programming (ASP), a logic programming formalism. NeurASP provides an effective way to integrate sub-symbolic and symbolic computation by treating NN outputs as probability distributions over atomic facts in ASP. The explicit knowledge encoded in ASP corrects mistakes in NN outputs and allows for better training with less data. To avoid NeurASP's bottleneck in symbolic computation, this dissertation presents Constraint Loss via Straight-Through Estimators (CL-STE). CL-STE provides a systematic way to compile discrete logical constraints into a loss function over discretized NN outputs and scales significantly better than state-of-the-art neuro-symbolic methods. This dissertation also presents a finding from applying CL-STE to Transformers: Transformers can be extended with recurrence to enhance their power for multi-step reasoning. Such a Recurrent Transformer can straightforwardly be applied to visual constraint reasoning problems while successfully addressing the symbol grounding problem.
Lastly, this dissertation addresses the limitation of pre-trained Large Language Models (LLMs) on multi-step logical reasoning problems with a dual-process neuro-symbolic reasoning system called LLM+ASP, where an LLM (e.g., GPT-3) serves as a highly effective few-shot semantic parser that turns natural language sentences into a logical form that can be used as input to ASP. LLM+ASP achieves state-of-the-art performance on several textual reasoning benchmarks and can handle robot planning tasks that an LLM alone fails to solve.
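The idea of placing a loss on discretized NN outputs and still obtaining gradients can be sketched with a straight-through estimator. The sketch below is not the CL-STE formulation from the dissertation: the toy constraint ("exactly one atom is true"), the 0.5 binarization threshold, and the identity-gradient variant of the estimator are all assumptions chosen to keep the example small.

```python
from typing import List


def binarize(p: List[float]) -> List[int]:
    """Forward pass: discretize NN output probabilities to {0, 1}."""
    return [1 if x >= 0.5 else 0 for x in p]


def exactly_one_loss(b: List[int]) -> float:
    """Toy logical constraint 'exactly one atom is true' as a loss on discrete values."""
    return float((sum(b) - 1) ** 2)


def ste_gradient(p: List[float]) -> List[float]:
    """Backward pass using a straight-through estimator.

    d(loss)/d(b_i) = 2 * (sum(b) - 1). Binarization has zero gradient almost
    everywhere, so the estimator treats d(b_i)/d(p_i) as 1 and passes the
    same value straight through to each p_i.
    """
    b = binarize(p)
    g = 2.0 * (sum(b) - 1)
    return [g for _ in p]


def sgd_step(p: List[float], lr: float = 0.1) -> List[float]:
    """One gradient-descent update, clipped so outputs stay in [0, 1]."""
    grad = ste_gradient(p)
    return [min(1.0, max(0.0, x - lr * g)) for x, g in zip(p, grad)]
```

With outputs `[0.9, 0.8, 0.1]`, two atoms are true after binarization, the loss is 1, and the estimated gradient is positive for every output, so a descent step lowers them all; once exactly one atom is true, the loss and the gradient are both zero.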

Multimodal Fake News Detection via Single Tower Transformer

With the rise in social media usage and rapid communication, the proliferation of misinformation and fake news has become a pressing concern. The detection of multimodal fake news requires careful consideration of both image and textual semantics with proper alignment of the embedding space. Automated fake news detection has gained significant attention in recent years. Existing research has focused on either capturing cross-modal inconsistency information or leveraging the complementary information within image-text pairs. However, the potential of powerful cross-modal contrastive learning methods and effective modality mixing remains an open question. The thesis proposes a novel two-leg single-tower architecture equipped with self-attention mechanisms and a custom contrastive loss to efficiently aggregate multimodal features. Furthermore, pretraining and fine-tuning are employed on the custom transformer model to classify fake news on the popular Twitter multimodal fake news dataset. The experimental results demonstrate the efficacy and robustness of the proposed approach, offering promising advancements in multimodal fake news detection research.

Towards Understanding the Role of Knowledge in Improving Transformer-based Language Models

In natural language processing, language models have achieved remarkable success over the last few years. Transformers are at the core of most of these models. Their success can be mainly attributed to the enormous amount of curated data they are trained on. Even though such language models are trained on massive curated data, they often need specific extracted knowledge to understand and reason better. This is because relevant knowledge may often be implicit or missing, which hampers machine reasoning. Apart from that, manual knowledge curation is time-consuming and error-prone. Hence, finding fast and effective methods to extract such knowledge from data is important for improving language models. This leads to finding ideal ways to utilize such knowledge by incorporating it into language models. Successful knowledge extraction and integration lead to an important question of knowledge evaluation of such models by developing tools or introducing challenging test suites to learn about their limitations and improve them further. So, to improve transformer-based models, understanding the role of knowledge becomes important. In the pursuit of improving language models with knowledge, in this dissertation I study three broad research directions spanning the natural language, biomedical, and cybersecurity domains: (1) Knowledge Extraction (KX) - How can transformer-based language models be leveraged to extract knowledge from data? (2) Knowledge Integration (KI) - How can such specific knowledge be used to improve such models? (3) Knowledge Evaluation (KE) - How can language models be evaluated for specific skills, and how can their limitations be understood? I propose methods to extract explicit textual, implicit structural, missing textual, and missing structural knowledge from natural language and binary programs using transformer-based language models.
I develop ways to improve the language model’s multi-step and commonsense reasoning abilities using external knowledge. Finally, I develop challenging datasets which assess their numerical reasoning skills in both in-domain and out-of-domain settings.

Neural Retriever-Reader for Information Retrieval and Question Answering

In the era of information explosion and multi-modal data, information retrieval (IR) and question answering (QA) systems have become essential in daily human activities. IR systems aim to find relevant information in response to user queries, while QA systems provide concise and accurate answers to user questions. IR and QA are two of the most crucial challenges in the realm of Artificial Intelligence (AI), with wide-ranging real-world applications such as search engines and dialogue systems. This dissertation investigates and develops novel models and training objectives to enhance current retrieval systems in textual and multi-modal contexts. Moreover, it examines QA systems, emphasizing generalization and robustness, and creates new benchmarks to promote their progress. Neural retrievers have surfaced as a viable solution, capable of surpassing the constraints of traditional term-matching search algorithms. This dissertation presents Poly-DPR, an innovative multi-vector model architecture that manages test-query, and ReViz, a comprehensive multimodal model to tackle multi-modality queries. By utilizing IR-focused pretraining tasks and producing large-scale training data, the proposed methodology substantially improves the abilities of existing neural retrievers. Concurrently, this dissertation investigates the realm of QA systems, referred to as "readers", by performing an exhaustive analysis of current extractive and generative readers, which yields reliable guidance for selecting readers for downstream applications. Additionally, an original reader (Two-in-One) is designed to effectively choose the pertinent passages and sentences from a pool of candidates for multi-hop reasoning. This dissertation also acknowledges the significance of logical reasoning in real-world applications and develops a comprehensive testbed, LogiGLUE, to further the advancement of reasoning capabilities in QA systems.

Towards Development of Models that Learn New Tasks from Instructions

Humans have the remarkable ability to solve different tasks by simply reading textual instructions that define the tasks and looking at a few examples. Natural Language Processing (NLP) models built with the conventional machine learning paradigm, however, often struggle to generalize across tasks (e.g., a question-answering system cannot solve classification tasks) despite training with lots of examples. A long-standing challenge in Artificial Intelligence (AI) is to build a model that learns a new task by understanding the human-readable instructions that define it. To study this, I led the development of NATURAL INSTRUCTIONS and SUPERNATURAL INSTRUCTIONS, large-scale datasets of diverse tasks, their human-authored instructions, and instances. I adopt generative pre-trained language models to encode task-specific instructions along with the input and generate the task output. Empirical results in my experiments indicate that instruction tuning helps models achieve cross-task generalization. This leads to the question: how should good instructions be written? Backed by extensive empirical analysis on large language models, I observe important attributes of successful instructional prompts and propose several reframing techniques for model designers to create such prompts. Empirical results in my experiments show that reframing notably improves few-shot learning performance; this is particularly important for large language models such as GPT-3, where tuning models or prompts on large datasets is expensive. In another experiment, I observe that representing a chain-of-thought instruction for mathematical reasoning questions as a program improves model performance significantly. This observation leads to the development of a large-scale mathematical reasoning model, BHASKAR, and a unified benchmark, LILA. In the case of program synthesis tasks, however, summarizing a question (instead of expanding it as in chain of thought) helps models significantly.
To better understand the advantages and disadvantages of the instruction paradigm, this thesis also studies instruction-example equivalence, the power of decomposition instructions to replace the need for new models, and the origination of dataset bias from crowdsourcing instructions. Finally, I apply the instruction paradigm to match real user needs and introduce a new prompting technique, HELP ME THINK, to help humans perform various tasks by asking questions.

Using Language Models to Generate Text-to-SQL Training Data: An Approach to Improve Performance of a Text-to-SQL Parser

Code generation is a task that has seen rapid progress in Natural Language Processing (NLP) research. This thesis focuses on the text-to-Structured Query Language (SQL) task, where the input is a question about a specific database and the output is the SQL that, when executed, will return the desired answer. The data creation process bottlenecks current text-to-SQL datasets. The technical knowledge required to understand and create SQL makes crowd-sourcing a dataset expensive and time-consuming. Thus, existing datasets do not provide a robust enough training set for state-of-the-art semantic parsing models. This thesis outlines my technique for generating a text-to-SQL dataset using the Generative Pretrained Transformer 3 model (GPT-3) and prompt engineering techniques. My approach entails providing GPT-3 with particular instructions to build a rigorous text-to-SQL dataset. In this thesis, I show that the created pairs have excellent quality and diversity, and when utilized as training data, they can enhance the accuracy of SQL generation models. I expect that my method will be of interest to NLP researchers because it can considerably reduce the time, effort, and cost necessary to produce large, high-quality text-to-SQL datasets. Furthermore, my approach can be extended to other tasks and domains to alleviate the burden of curating human-annotated data.
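The task definition above (a question about a specific database in, executable SQL out) can be made concrete with a small sketch using Python's built-in `sqlite3` module. The `singer` table, its rows, and the question-SQL pair are hypothetical illustrations, not data from the thesis, and the pair is written by hand here rather than generated by GPT-3.

```python
import sqlite3
from typing import Any, List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database (hypothetical schema and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE singer (name TEXT, country TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO singer VALUES (?, ?, ?)",
        [("Ana", "France", 31), ("Bo", "Japan", 45), ("Cy", "France", 27)],
    )
    return conn


def run_sql(conn: sqlite3.Connection, sql: str) -> List[Tuple[Any, ...]]:
    """Execute a query and return all result rows."""
    return conn.execute(sql).fetchall()


# One training pair in the text-to-SQL format: natural language in, SQL out.
pair = {
    "question": "How many singers are from France?",
    "sql": "SELECT COUNT(*) FROM singer WHERE country = 'France'",
}

if __name__ == "__main__":
    conn = build_demo_db()
    print(pair["question"], "->", run_sql(conn, pair["sql"]))
```

Executing the SQL against the database is what ties a pair to its "desired answer": here the query returns a single row containing the count 2.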