Description
Extracting geographical data from unstructured text fields in medical research publications is essential for analyzing the global distribution of medical research efforts. Understanding this distribution helps show how research resources are allocated and highlights regions that require more attention. This study uses publication data from PubMed and extracts geographical information from the metadata associated with each article. The proposed method is a Natural Language Processing (NLP) model built with Bidirectional Encoder Representations from Transformers (BERT), the Hugging Face Transformers library, and named entity recognition (NER) tools. The model handles the diverse data structures and inconsistent terminology found in medical literature. The resulting methodology provides a robust framework for analyzing the geographical distribution of medical research, with implications for health informatics and computational linguistics. Automated extraction reduces the possibility of human error and improves the reliability of location extraction. Compared to traditional methods such as string matching or manual extraction, the NLP-based approach offers greater accuracy and efficiency and significantly reduces the time and effort required for data processing.
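The abstract does not include the thesis's code, so the following is only a minimal sketch of how NER-based location extraction over affiliation strings might look. It assumes the tagger returns entity dictionaries shaped like the output of a Hugging Face `ner` pipeline with `aggregation_strategy="simple"`. The model name `dslim/bert-base-NER`, the `LOC` label, the `min_score` threshold, and all function names are illustrative assumptions, not the author's actual choices.

```python
from collections import Counter
from typing import Callable, Iterable, List

# Any callable mapping text to a list of entity dicts with the keys
# "entity_group", "word", and "score" (the aggregated Hugging Face NER format).
Tagger = Callable[[str], list]


def build_hf_tagger(model_name: str = "dslim/bert-base-NER") -> Tagger:
    """Return a BERT-based NER tagger; the model choice is an assumption."""
    # Imported lazily so the rest of the sketch works without `transformers`.
    from transformers import pipeline
    return pipeline("ner", model=model_name, aggregation_strategy="simple")


def extract_locations(affiliation: str, tagger: Tagger,
                      min_score: float = 0.5) -> List[str]:
    """Keep entities tagged as locations whose score meets the threshold."""
    return [
        ent["word"].strip()
        for ent in tagger(affiliation)
        if ent.get("entity_group") == "LOC" and ent.get("score", 0.0) >= min_score
    ]


def count_locations(affiliations: Iterable[str], tagger: Tagger) -> Counter:
    """Count each location at most once per affiliation string."""
    counts: Counter = Counter()
    for text in affiliations:
        counts.update(set(extract_locations(text, tagger)))
    return counts
```

With the `transformers` package installed, `count_locations(affiliations, build_hf_tagger())` would tally locations over a list of affiliation strings. Passing the tagger as a parameter keeps the extraction logic independent of any particular model, so a domain-tuned model could be swapped in without changing the rest of the code.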
Details
Title
- Extraction of Geographical Location Data From Unstructured Text Fields of Medical Research Publications
Contributors
- Bathini, Venkata Bharath (Author)
- Kelley, Christy (Thesis advisor)
- Poste, George (Committee member)
- Davulcu, Hasan (Committee member)
- Arizona State University (Publisher)
Date Created
2024
Note
- Partial requirement for: M.S., Arizona State University, 2024
- Field of study: Computer Science