Full metadata
Title
Extraction of Geographical Location Data From Unstructured Text Fields of Medical Research Publications
Description
Extracting geographical data from unstructured text fields in medical research publications is essential for analyzing the global distribution of medical research efforts. Understanding these patterns helps identify how research resources are allocated and highlights regions that require more attention. This study uses publication data from PubMed and extracts geographical information from the metadata associated with each article. The proposed method develops a Natural Language Processing (NLP) model built on Bidirectional Encoder Representations from Transformers (BERT), the Hugging Face Transformers library, and named entity recognition (NER) tools. The model handles the diverse data structures and inconsistent terminology found in medical literature. The resulting methodology provides a robust framework for analyzing the geographical distribution of medical research, with implications for health informatics and computational linguistics. Automating the extraction reduces the possibility of human error and makes location extraction more reliable. Compared with traditional methods such as string matching or manual extraction, the NLP-based approach offers greater accuracy and efficiency and significantly reduces the time and effort required for data processing.
Date Created
2024
Contributors
- Bathini, Venkata Bharath (Author)
- Kelley, Christy (Thesis advisor)
- Poste, George (Committee member)
- Davulcu, Hasan (Committee member)
- Arizona State University (Publisher)
Topical Subject
Resource Type
Extent
43 pages
Language
eng
Copyright Statement
In Copyright
Primary Member of
Peer-reviewed
No
Open Access
No
Handle
https://hdl.handle.net/2286/R.2.N.194711
Level of coding
minimal
Cataloging Standards
Note
Partial requirement for: M.S., Arizona State University, 2024
Field of study: Computer Science
System Created
- 2024-07-03 05:38:09
System Modified
- 2024-07-03 05:38:13