
Journal of Health & Medical Informatics

ISSN: 2157-7420

Open Access

Cohort Identification for Trampoline-associated Traumatic Dental Injuries among Pediatric Patients from Clinical Notes using Machine Learning

Abstract

Joseph W. Sirrianni*, Jin Peng, Yungui Huang and Homa Amini

Background: Cohort identification is a crucial task in retrospective clinical analysis. Natural language processing, especially modern deep learning approaches, may improve this task by classifying patients' cohort status more accurately. However, it has not yet been applied in the dental domain.

Objective: We aim to identify patients who suffered trampoline-associated traumatic dental injuries among all patients with trampoline-associated injuries.

Methods: We developed and applied a natural language processing cohort identification pipeline consisting of text filtering rules and a machine learning model trained on historical data. The pipeline processes a patient's clinical notes for a series of temporally related encounters and produces a binary prediction of whether the patient has suffered a trampoline-associated injury. We experimented with six machine learning models: logistic regression, random forest, decision trees, linear SVM, naïve Bayes, and a fine-tuned ClinicalBERT model.
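The two-stage structure described in the Methods can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the regular expression, the function names `filter_notes` and `predict_patient`, and the keyword-based `toy_classifier` are all hypothetical stand-ins, and a trained model (such as the fine-tuned ClinicalBERT model) would take the place of the toy classifier.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical filtering rule (an assumption, not taken from the paper):
# keep only notes that mention a trampoline in some form.
TRAMPOLINE_PATTERN = re.compile(r"\btrampolin\w*", re.IGNORECASE)


def filter_notes(notes: Iterable[str]) -> List[str]:
    """Stage 1: drop notes that do not match the filtering rule."""
    return [note for note in notes if TRAMPOLINE_PATTERN.search(note)]


def predict_patient(notes: Iterable[str], classify: Callable[[str], bool]) -> bool:
    """Stage 2: classify the filtered notes from one patient's related encounters.

    Returns False without calling the model when no note passes the filter.
    """
    kept = filter_notes(notes)
    if not kept:
        return False
    return classify("\n".join(kept))


def toy_classifier(text: str) -> bool:
    """Placeholder for a trained model: flags injury-related wording."""
    pattern = r"\b(fell|fall|injur\w*|fractur\w*)\b"
    return bool(re.search(pattern, text, re.IGNORECASE))


if __name__ == "__main__":
    example_notes = [
        "Patient fell off a trampoline and fractured a tooth.",
        "Follow-up visit, no complaints.",
    ]
    print(predict_patient(example_notes, toy_classifier))
```

Passing the classifier in as a callable keeps the filtering rules independent of the model, so any of the six models compared in the study could be swapped in without changing the first stage.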

Results: The fine-tuned ClinicalBERT model performed best on our evaluation data, with a PPV of 0.836 and a sensitivity of 0.898. Applying the pipeline to our data increased the cohort for all trampoline injuries from an initial 7,454 patients to 15,010 patients and the trampoline-associated traumatic dental injuries cohort from an initial 102 patients to 140 patients.
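For readers less familiar with the two reported metrics, the sketch below shows how PPV (positive predictive value, also called precision) and sensitivity (recall) are computed from binary labels. The helper names and the label lists are illustrative assumptions and are not data from the study.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def ppv(tp: int, fp: int) -> float:
    """PPV = TP / (TP + FP): share of predicted positives that are correct."""
    if tp + fp == 0:
        raise ValueError("PPV is undefined when there are no positive predictions")
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity = TP / (TP + FN): share of true positives that are found."""
    if tp + fn == 0:
        raise ValueError("Sensitivity is undefined when there are no positive cases")
    return tp / (tp + fn)


if __name__ == "__main__":
    # Made-up labels for illustration only (1 = trampoline-associated injury).
    y_true = [1, 1, 1, 0, 0, 1]
    y_pred = [1, 1, 0, 1, 0, 1]
    tp, fp, fn = confusion_counts(y_true, y_pred)
    print(ppv(tp, fp), sensitivity(tp, fn))
```

In a cohort identification setting, a high sensitivity means few eligible patients are missed, while a high PPV means few ineligible patients need to be removed by manual review.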

Conclusion: We present a novel natural language processing pipeline for identifying a trampoline-associated injury cohort for dental research. Our results demonstrate the superiority of deep learning over traditional machine learning models on our specific task. Our process for identifying patient encounters by activity type is generalizable to other types of injuries and applicable to other research cohorts.


