Using Natural Language Processing to Identify Questions and Answers Written by People Addicted to Opioids

Description

Background: Natural Language Processing (NLP) models have previously been trained to locate questions and answers in online forums, but on topics such as cancer and diabetes. Other studies have used filtering methods to understand themes in forums regarding opioid use. However, no study has trained an NLP model to locate the questions people addicted to opioids are asking their peers and the answers they are receiving in forums. A variety of annotation tools are available to support the data collection needed to train NLP models; for academic purposes, brat is the best suited to this task. This study will inform clinical practice by revealing the concerns of patients addicted to opioids, so that clinicians can start more meaningful conversations during appointments that the patient may be too afraid to begin.

Methods: The standard NLP process was followed: a gold standard was developed through matched-pair annotations of the forum text in brat, and a neural network was trained on the resulting content. Following annotation, adjudication was performed to increase inter-annotator agreement. Local physicians developed categories to describe the questions, and three pilots were run to test the best way to categorize them.
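
The abstract does not include the tooling used to compare matched pairs, but as a rough illustration, the sketch below shows one way the text-bound annotations in a brat standoff .ann file could be loaded for such a comparison. The file names, label names, and restriction to simple (non-discontinuous) spans are assumptions for illustration, not the study's actual pipeline.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Span:
    """A text-bound annotation: label plus character offsets into the post."""
    label: str
    start: int
    end: int

def read_brat_ann(path: str) -> List[Span]:
    """Load simple text-bound annotations (T lines) from a brat .ann file.

    Discontinuous spans, relations, events, and notes are skipped, since only
    contiguous question/answer spans are assumed here.
    """
    spans = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("\t")
            if not fields[0].startswith("T") or len(fields) < 2:
                continue
            parts = fields[1].split()          # e.g. "Question 120 187"
            if len(parts) == 3:
                spans.append(Span(parts[0], int(parts[1]), int(parts[2])))
    return spans

# Hypothetical usage: one .ann file per annotator for the same forum post.
annotator_a = read_brat_ann("post_017.annotatorA.ann")
annotator_b = read_brat_ann("post_017.annotatorB.ann")
```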

Results: For the annotation activity, inter-annotator agreement, calculated as an F-score at a 0.7 threshold, was 0.378 before adjudication and increased to 0.560 after adjudication. Pilots 1, 2, and 3 of the categorization activity had inter-annotator agreements of 0.375, 0.5, and 0.966, respectively.
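
The abstract does not define exactly how the 0.7 threshold enters the F-score, so the sketch below assumes one common convention: two spans match when their labels agree and the character overlap covers at least 70% of the longer span, with one annotator treated as the reference set. The function and variable names are illustrative, not taken from the study.

```python
from typing import List, Tuple

Span = Tuple[str, int, int]  # (label, start offset, end offset)

def overlap_ratio(a: Span, b: Span) -> float:
    """Fraction of the longer span covered by the character overlap."""
    inter = max(0, min(a[2], b[2]) - max(a[1], b[1]))
    longer = max(a[2] - a[1], b[2] - b[1])
    return inter / longer if longer else 0.0

def agreement_f1(ref: List[Span], other: List[Span], threshold: float = 0.7) -> float:
    """Pairwise F-score between two annotators at the given overlap threshold."""
    used, true_pos = set(), 0
    for r in ref:
        for i, o in enumerate(other):
            if i not in used and r[0] == o[0] and overlap_ratio(r, o) >= threshold:
                used.add(i)
                true_pos += 1
                break
    precision = true_pos / len(other) if other else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Worked example: both annotators mark roughly the same question span.
ref = [("Question", 0, 40)]
other = [("Question", 5, 40)]
print(agreement_f1(ref, other))  # overlap 35/40 = 0.875 >= 0.7, so F1 = 1.0
```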

Discussion: The initial inter-annotator agreement for the annotation activity may have been low because the annotators were students who may not have been invested enough in the project to annotate the text accurately, and because each annotator interprets the text slightly differently, which contributed to the differences between the matched pairs' annotations. The variation in F-scores for the categorization activity stemmed partly from how the instructions were delivered and partly from the participants' areas of study. The first pilot did not mandate use of the original context located in brat, the instructions were provided as a downloadable document, and the participants were computer science graduate students. The second pilot also delivered the instructions via a document, but strongly suggested using the context to understand the questions' meanings; the participants, again computer science graduate students, said in a post-pilot discussion that they did not have a good understanding of the medical jargon in the posts. The final pilot used a combination of students with and without a medical background, required use of the context, and paired verbal instructions with the written ones. The combination of these factors increased the F-score significantly. For a full-scale experiment, students with a medical background should be used to categorize the questions.
Date Created
2019-12