Automation of Title and Abstract Screening for Clinical Systematic Reviews

161889-Thumbnail Image.png
Description
Systematic Reviews (SRs) aim to synthesize the totality of evidence for clinical practice and are important in making clinical practice guidelines and health policy decisions. However, conducting SRs manually is a laborious and time-consuming process. This challenge is growing due

Systematic Reviews (SRs) aim to synthesize the totality of evidence for clinical practice and are important in making clinical practice guidelines and health policy decisions. However, conducting SRs manually is a laborious and time-consuming process. This challenge is growing due to the increase in the number of databases to search and the papers being published. Hence, the automation of SRs is an essential task. The goal of this thesis work is to develop Natural Language Processing (NLP)-based classifiers to automate the title and abstract-based screening for clinical SRs based on inclusion/exclusion criteria. In clinical SRs, a high-sensitivity system is a key requirement. Most existing methods for SRs use binary classification systems trained on labeled data to predict inclusion/exclusion. While previous studies have shown that NLP-based classification methods can automate title and abstract-based screening for SRs, methods for achieving high-sensitivity have not been empirically studied. In addition, the training strategy for binary classification has several limitations: (1) it ignores the inclusion/exclusion criteria, (2) lacks generalization ability, (3) suffers from low resource data, and (4) fails to achieve reasonable precision at high-sensitivity levels. This thesis work presents contributions to several aspects of the clinical systematic review domain. First, it presents an empirical study of NLP-based supervised text classification and high-sensitivity methods on datasets developed from six different SRs in the clinical domain. Second, this thesis work provides a novel approach to view SR as a Question Answering (QA) problem in order to overcome the limitations of the binary classification training strategy; and propose a more general abstract screening model for different SRs. Finally, this work provides a new QA-based dataset for six different SRs which is made available to the community.
Date Created
2021
Agent

Self-supervised Representation Learning via Image Out-painting for Medical Image Analysis

158615-Thumbnail Image.png
Description
In recent years, Convolutional Neural Networks (CNNs) have been widely used in not only the computer vision community but also within the medical imaging community. Specifically, the use of pre-trained CNNs on large-scale datasets (e.g., ImageNet) via transfer learning for

In recent years, Convolutional Neural Networks (CNNs) have been widely used in not only the computer vision community but also within the medical imaging community. Specifically, the use of pre-trained CNNs on large-scale datasets (e.g., ImageNet) via transfer learning for a variety of medical imaging applications, has become the de facto standard within both communities.

However, to fit the current paradigm, 3D imaging tasks have to be reformulated and solved in 2D, losing rich 3D contextual information. Moreover, pre-trained models on natural images never see any biomedical images and do not have knowledge about anatomical structures present in medical images. To overcome the above limitations, this thesis proposes an image out-painting self-supervised proxy task to develop pre-trained models directly from medical images without utilizing systematic annotations. The idea is to randomly mask an image and train the model to predict the missing region. It is demonstrated that by predicting missing anatomical structures when seeing only parts of the image, the model will learn generic representation yielding better performance on various medical imaging applications via transfer learning.

The extensive experiments demonstrate that the proposed proxy task outperforms training from scratch in six out of seven medical imaging applications covering 2D and 3D classification and segmentation. Moreover, image out-painting proxy task offers competitive performance to state-of-the-art models pre-trained on ImageNet and other self-supervised baselines such as in-painting. Owing to its outstanding performance, out-painting is utilized as one of the self-supervised proxy tasks to provide generic 3D pre-trained models for medical image analysis.
Date Created
2020
Agent

Multi-Perspective Semantic Information Retrieval in the Biomedical Domain

158461-Thumbnail Image.png
Description
Information Retrieval (IR) is the task of obtaining pieces of data (such as documents or snippets of text) that are relevant to a particular query or need from a large repository of information. IR is a valuable component

Information Retrieval (IR) is the task of obtaining pieces of data (such as documents or snippets of text) that are relevant to a particular query or need from a large repository of information. IR is a valuable component of several downstream Natural Language Processing (NLP) tasks, such as Question Answering. Practically, IR is at the heart of many widely-used technologies like search engines.

While probabilistic ranking functions, such as the Okapi BM25 function, have been utilized in IR systems since the 1970's, modern neural approaches pose certain advantages compared to their classical counterparts. In particular, the release of BERT (Bidirectional Encoder Representations from Transformers) has had a significant impact in the NLP community by demonstrating how the use of a Masked Language Model (MLM) trained on a considerable corpus of data can improve a variety of downstream NLP tasks, including sentence classification and passage re-ranking.

IR Systems are also important in the biomedical and clinical domains. Given the continuously-increasing amount of scientific literature across biomedical domain, the ability find answers to specific clinical queries from a repository of millions of articles is a matter of practical value to medics, doctors, and other medical professionals. Moreover, there are domain-specific challenges present in the biomedical domain, including handling clinical jargon and evaluating the similarity or relatedness of various medical symptoms when determining the relevance between a query and a sentence.

This work presents contributions to several aspects of the Biomedical Semantic Information Retrieval domain. First, it introduces Multi-Perspective Sentence Relevance, a novel methodology of utilizing BERT-based models for contextual IR. The system is evaluated using the BioASQ Biomedical IR Challenge. Finally, practical contributions in the form of a live IR system for medics and a proposed challenge on the Living Systematic Review clinical task are provided.
Date Created
2020
Agent

Analysis of Tweets for Social Media Health Applications

157902-Thumbnail Image.png
Description
Social networking sites like Twitter have provided people a platform to connect

with each other, to discuss and share information and news or to entertain themselves. As the number of users continues to grow there has been explosive growth in the

Social networking sites like Twitter have provided people a platform to connect

with each other, to discuss and share information and news or to entertain themselves. As the number of users continues to grow there has been explosive growth in the data generated by these users. Such a vast data source has provided researchers a way to study and monitor public health.

Accurately analyzing tweets is a difficult task mainly because of their short length, the inventive spellings and creative language expressions. Instead of focusing at the topic level, identifying tweets that have personal health experience mentions would be more helpful to researchers, governments and other organizations. Another important limitation in the current systems for social media health applications is the use of a disease-specific model and dataset to study a particular disease. Identifying adverse drug reactions is an important part of the drug development process. Detecting and extracting adverse drug mentions in tweets can supplement the list of adverse drug reactions that result from the drug trials and can help in the improvement of the drugs.

This thesis aims to address these two challenges and proposes three systems. A generalizable system to identify personal health experience mentions across different disease domains, a system for automatic classifications of adverse effects mentions in tweets and a system to extract adverse drug mentions from tweets. The proposed systems use the transfer learning from language models to achieve notable scores on Social Media Mining for Health Applications(SMM4H) 2019 (Weissenbacher et al. 2019) shared tasks.
Date Created
2019
Agent

Can knowledge rich sentences help language models to solve common sense reasoning problems?

157871-Thumbnail Image.png
Description
Significance of real-world knowledge for Natural Language Understanding(NLU) is well-known for decades. With advancements in technology, challenging tasks like question-answering, text-summarizing, and machine translation are made possible with continuous efforts in the field of Natural Language Processing(NLP). Yet, knowledge integration

Significance of real-world knowledge for Natural Language Understanding(NLU) is well-known for decades. With advancements in technology, challenging tasks like question-answering, text-summarizing, and machine translation are made possible with continuous efforts in the field of Natural Language Processing(NLP). Yet, knowledge integration to answer common sense questions is still a daunting task. Logical reasoning has been a resort for many of the problems in NLP and has achieved considerable results in the field, but it is difficult to resolve the ambiguities in a natural language. Co-reference resolution is one of the problems where ambiguity arises due to the semantics of the sentence. Another such problem is the cause and result statements which require causal commonsense reasoning to resolve the ambiguity. Modeling these type of problems is not a simple task with rules or logic. State-of-the-art systems addressing these problems use a trained neural network model, which claims to have overall knowledge from a huge trained corpus. These systems answer the questions by using the knowledge embedded in their trained language model. Although the language models embed the knowledge from the data, they use occurrences of words and frequency of co-existing words to solve the prevailing ambiguity. This limits the performance of language models to solve the problems in common-sense reasoning task as it generalizes the concept rather than trying to answer the problem specific to its context. For example, "The painting in Mark's living room shows an oak tree. It is to the right of a house", is a co-reference resolution problem which requires knowledge. Language models can resolve whether "it" refers to "painting" or "tree", since "house" and "tree" are two common co-occurring words so the models can resolve "tree" to be the co-reference. On the other hand, "The large ball crashed right through the table. Because it was made of Styrofoam ." to resolve for "it" which can be either "table" or "ball", is difficult for a language model as it requires more information about the problem.

In this work, I have built an end-to-end framework, which uses the automatically extracted knowledge based on the problem. This knowledge is augmented with the language models using an explicit reasoning module to resolve the ambiguity. This system is built to improve the accuracy of the language models based approaches for commonsense reasoning. This system has proved to achieve the state of the art accuracy on the Winograd Schema Challenge.
Date Created
2019
Agent

Knowledge Representation, Reasoning and Learning for Non-Extractive Reading Comprehension

157780-Thumbnail Image.png
Description
While in recent years deep learning (DL) based approaches have been the popular approach in developing end-to-end question answering (QA) systems, such systems lack several desired properties, such as the ability to do sophisticated reasoning with knowledge, the ability to

While in recent years deep learning (DL) based approaches have been the popular approach in developing end-to-end question answering (QA) systems, such systems lack several desired properties, such as the ability to do sophisticated reasoning with knowledge, the ability to learn using less resources and interpretability. In this thesis, I explore solutions that aim to address these drawbacks.

Towards this goal, I work with a specific family of reading comprehension tasks, normally referred to as the Non-Extractive Reading Comprehension (NRC), where the given passage does not contain enough information and to correctly answer sophisticated reasoning and ``additional knowledge" is required. I have organized the NRC tasks into three categories. Here I present my solutions to the first two categories and some preliminary results on the third category.

Category 1 NRC tasks refer to the scenarios where the required ``additional knowledge" is missing but there exists a decent natural language parser. For these tasks, I learn the missing ``additional knowledge" with the help of the parser and a novel inductive logic programming. The learned knowledge is then used to answer new questions. Experiments on three NRC tasks show that this approach along with providing an interpretable solution achieves better or comparable accuracy to that of the state-of-the-art DL based approaches.

The category 2 NRC tasks refer to the alternate scenario where the ``additional knowledge" is available but no natural language parser works well for the sentences of the target domain. To deal with these tasks, I present a novel hybrid reasoning approach which combines symbolic and natural language inference (neural reasoning) and ultimately allows symbolic modules to reason over raw text without requiring any translation. Experiments on two NRC tasks shows its effectiveness.

The category 3 neither provide the ``missing knowledge" and nor a good parser. This thesis does not provide an interpretable solution for this category but some preliminary results and analysis of a pure DL based approach. Nonetheless, the thesis shows beyond the world of pure DL based approaches, there are tools that can offer interpretable solutions for challenging tasks without using much resource and possibly with better accuracy.
Date Created
2019
Agent

Using Natural Language Processing to Identify Questions and Answers Written by People Addicted to Opioids

131987-Thumbnail Image.png
Description
Background: Natural Language Processing models have been trained to locate questions and answers in forum settings before but on topics such as cancer and diabetes. Also, studies have used filtering methods to understand themes in forum settings regarding opioid use.

Background: Natural Language Processing models have been trained to locate questions and answers in forum settings before but on topics such as cancer and diabetes. Also, studies have used filtering methods to understand themes in forum settings regarding opioid use. However, studies have not been conducted regarding training an NLP model to locate the questions people addicted to opioids are asking their peers and the answers they are receiving in forums. There are a variety of annotation tools available to help aid the data collection to train NLP models. For academic purposes, brat is the best tool for this purpose. This study will inform clinical practice by indicating what the inner thoughts of their patients who are addicted to opioids are so that they will be able to have more meaningful conversations during appointments that the patient may be too afraid to start.

Methods: The standard NLP process was used for this study in which a gold standard was reached through matched paired annotations of the forum text in brat and a neural network was trained on the content. Following the annotation process, adjudication occurred to increase the inter-annotator agreement. Categories were developed by local physicians to describe the questions and three pilots were run to test the best way to categorize the questions.

Results: The inter-annotator agreement, calculated via F-score, before adjudication for a 0.7 threshold was 0.378 for the annotation activity. After adjudication at a threshold of 0.7, the inter-annotator agreement increased to 0.560. Pilots 1, 2, and 3 of the categorization activity had an inter-annotator agreement of 0.375, 0.5, and 0.966 respectively.

Discussion: The inter-annotator agreement of the annotation activity may have been low initially since the annotators were students who may have not been as invested in the project as necessary to accurately annotate the text. Also, as everyone interprets the text slightly differently, it is possible that that contributed to the differences in the matched pairs’ annotations. The F-score variation for the categorization activity partially had to do with different delivery systems of the instructions and partially with the area of study of the participants. The first pilot did not mandate the use of the original context located in brat and the instructions were provided in the form of a downloadable document. The participants were computer science graduate students. The second pilot also had the instructions delivered via a document, but it was strongly suggested that the context be used to gain an understanding of the questions’ meanings. The participants were also computer science graduate students who upon a discussion of their results after the pilot expressed that they did not have a good understanding of the medical jargon in the posts. The final pilot used a combination of students with and without medical background, required to use the context, and included verbal instructions in combination with the written ones. The combination of these factors increased the F-score significantly. For a full-scale experiment, students with a medical background should be used to categorize the questions.
Date Created
2019-12
Agent

Image Reference Extraction: Linking References to Radiology Images with Clinically Significant Findings

132648-Thumbnail Image.png
Description
Background: Pulmonary embolism is a deadly condition that is often diagnosed using a technique known as computed tomography pulmonary angiography (CTPA). CTPA reports are free-text, narrative-style forms of documentation conferring radiologist findings—both primary (regarding pulmonary embolism) and incidental. This project

Background: Pulmonary embolism is a deadly condition that is often diagnosed using a technique known as computed tomography pulmonary angiography (CTPA). CTPA reports are free-text, narrative-style forms of documentation conferring radiologist findings—both primary (regarding pulmonary embolism) and incidental. This project seeks to combine simple natural language processing (NLP) techniques, such as regular expressions and rules, to build upon and
further process output from a machine learning based named entity recognition (NER) tool for the purposes of (1) linking references to radiological images with the corresponding clinical findings and (2) extracting primary and incidental findings.

Methods: The project’s system utilized a regular expression to extract image references. All CTPA reports were first processed with NER software to obtain the text and spans of clinical findings. A heuristic was used to determine the appropriate clinical finding that should be linked with a particular image reference. Another regular expression was used to extract primary findings from NER output; the remaining findings were considered incidental. Performance was
assessed against a gold standard, which was based upon a manually annotated version of the CTPA reports used in this project.

Results: Extraction of image references achieved a 100% accuracy. Linkages between these references and exact gold standard spans of the clinical findings achieved a precision of 0.24, a recall of 0.22, and an F1 score of 0.23. Linkages with partial spans of clinical findings as determined by the gold standard achieved a precision of 0.71, a recall of 0.67, and an F1 score of 0.69. Primary and incidental finding extraction achieved a precision of 0.67, a recall of 0.80, and
an F1 score of 0.73.

Discussion: Various elements reduced system performance such as the difficulty of exactly matching the spans of clinical findings from NER output with those found in the gold standard. The heuristic linking clinical findings and image references was especially sensitive to NER false positives and false negatives due to its assumption that the appropriate clinical finding was that which was immediately prior to the image reference. Although the system did not perform as well as hoped, lessons were learned such as the need for clear research methodology and proper gold standard creation; without a proper gold standard, problem scope and system performance cannot be properly assessed. Improvements to the system include creating a more robust heuristic, sifting NER false positives, and training the NER tool used on a dataset of CTPA reports.
Date Created
2019-05
Agent