Extraction of Geographical Location Data From Unstructured Text Fields of Medical Research Publications

Bathini, Venkata Bharath


Extracting geographical data from unstructured text fields in medical research publications is essential for analyzing the global distribution of medical research efforts. Understanding these patterns helps identify how research resources are allocated and highlights regions that require more attention. This study uses publication data from PubMed and extracts geographical information from the metadata associated with each article. The proposed method develops a Natural Language Processing (NLP) model built on Bidirectional Encoder Representations from Transformers (BERT), the Hugging Face Transformers library, and named entity recognition (NER) tools. The model is designed to handle the diverse data structures and inconsistent terminology found in medical literature. The methodology provides a framework for analyzing the geographical distribution of medical research, with implications for health informatics and computational linguistics. Automating the extraction reduces the possibility of human error and improves the reliability of the extracted locations. Compared to traditional methods such as string matching or manual extraction, the NLP-based approach offers greater accuracy and efficiency and significantly reduces the time and effort required for data processing.
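To make the approach concrete, the following is a minimal sketch of how location mentions might be pulled from a free-text metadata field with a BERT-based NER model through the Hugging Face Transformers `pipeline` API. It is an illustration under stated assumptions, not the study's implementation: the abstract does not name the model or the post-processing used, so the checkpoint `dslim/bert-base-NER`, the helper names `load_ner` and `extract_locations`, and the `min_score` threshold are all hypothetical choices.

```python
from typing import Callable, Dict, List

# Assumption: a publicly available BERT checkpoint fine-tuned for NER.
# The abstract does not say which model the study actually used.
ASSUMED_MODEL = "dslim/bert-base-NER"


def load_ner(model_name: str = ASSUMED_MODEL) -> Callable[[str], List[Dict]]:
    """Build a token-classification pipeline (downloads model weights on first use)."""
    # Imported lazily so the rest of the module works without `transformers`.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entity spans
    # and reports each span's label under the "entity_group" key.
    return pipeline(
        "token-classification",
        model=model_name,
        aggregation_strategy="simple",
    )


def extract_locations(
    text: str,
    ner: Callable[[str], List[Dict]],
    min_score: float = 0.5,  # hypothetical confidence cutoff
) -> List[str]:
    """Return distinct location mentions found by `ner`, in order of appearance."""
    seen = set()
    locations = []
    for entity in ner(text):
        # Keep only spans tagged as locations with sufficient confidence.
        if entity["entity_group"] != "LOC" or entity["score"] < min_score:
            continue
        name = entity["word"].strip()
        key = name.lower()  # case-insensitive de-duplication
        if name and key not in seen:
            seen.add(key)
            locations.append(name)
    return locations
```

With the `transformers` package installed, a call such as `extract_locations("Department of Medicine, University of Nairobi, Nairobi, Kenya", load_ner())` would run the model over an affiliation-style string. Passing the NER callable as a parameter keeps the filtering logic independent of any particular model, so a different checkpoint or a rule-based baseline such as string matching can be swapped in for comparison.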

Copyright Statement