Description
Extracting geographical data from unstructured text fields in medical research publications is essential for analyzing the global distribution of medical research efforts. Understanding these patterns aids in identifying research resource allocation and highlights regions requiring more attention. The study leverages research publication data from PubMed to extract geographical information from the metadata associated with each article. The proposed method involves developing a sophisticated Natural Language Processing (NLP) model using Bidirectional Encoder Representations from Transformers (BERT), Hugging Face transformers, and named entity recognition (NER) tools. This model can handle diverse data structures and terminology inconsistencies present in medical literature. The implications of this research are significant for furthering advancements in health informatics and computational linguistics. This methodology provides a robust framework for analyzing the geographical distribution of medical research. The automated extraction method reduces the possibility of human error and enhances the reliability of location extraction. Compared to traditional methods like string matching or manual extraction, the NLP-based approach offers greater accuracy and efficiency, significantly reducing the time and effort required for data processing.
    Details

    Title
    • Extraction of Geographical Location Data From Unstructured Text Fields of Medical Research Publications
    Date Created
    • 2024
    Resource Type
    • Text
    Note
    • Partial requirement for: M.S., Arizona State University, 2024
    • Field of study: Computer Science
