Evaluation of Automatic Annotation Performed on a Corpus of Data from Family Medicine

Charlotte Siefridt

doi:10.37421/2329-9126.2022.10.474

Perspective - (2022) Volume 10, Issue 9

Charlotte Siefridt^*

^*Correspondence: Charlotte Siefridt, Department of General Medicine, Rouen University Hospital, Rouen, France

Received: 04-Sep-2022, Manuscript No. JGPR-22-79405; Editor assigned: 05-Sep-2022, Pre QC No. P-79405; Reviewed: 16-Sep-2022, QC No. Q-79405; Revised: 21-Sep-2022, Manuscript No. R-79405; Published: 28-Sep-2022, DOI: 10.37421/2329-9126.2022.10.474
Citation: Siefridt, Charlotte. “Evaluation of Automatic Annotation Performed on a Corpus of Data from Family Medicine.” J Gen Prac 10 (2022): 474.
Copyright: © 2022 Siefridt C. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Introduction

Family medicine research is needed to raise the standard of care, yet publications in general medicine remain few and their quality is inconsistent. Databases derived from electronic medical records could increase both the number and the quality of these publications, but the data must be coded before they can be properly used. The goal of this study was to evaluate the quality of the semantic annotation produced by a multi-terminological concept extractor (MTCE) on a corpus of family medicine consultations.

Consultation data in French from 25 general practitioners were automatically annotated using 28 distinct terminologies. The data were collected and divided into three categories: reasons for consultation, consultation results, and observations. After a first evaluation, the tool underwent a correction phase and was then evaluated a second time. Precision, recall, and F-measure were calculated for each evaluation.

Consultation data in family medicine can be automatically annotated using a multi-terminological concept extractor. Integrating such a tool into the practice software used by general practitioners could compensate for the absence of routine coding. Creating a common terminology for family medicine could help with coding, semantic interoperability, and the exchange of pertinent information [1].

The Clinical Practice Research Datalink (CPRD), created in the United Kingdom, has proven its value for promoting public health, improving practice, and raising publication standards. Such databases are little used in France, where they are either owned by private companies (CEGEDIM, IMS-Health) or disease-specific (Sentinel network). Building a health data warehouse for family medicine is one of the goals of the French project "Regional Information Platform in General Medicine." It was created by 14 general practitioners (GPs) from Normandy and 11 GPs from Provence-Alpes-Côte d'Azur. The data in electronic medical records (EMRs) must be structured and standardised in order to guarantee interoperability as well as completeness and reliability. Only 13% of French doctors use coding [2].

Description

An experienced GP (CS), acting as the MTCE evaluator (MTCEe), conducted the evaluation using a dedicated web application. It showed the relevant document on one side and a list of the identified concepts on the other. The relevance of each annotation was rated using one of four categories: "valid," "false," "irrelevant," and "to be verified." An annotation was deemed "irrelevant" when a concept was correctly detected but added nothing when annotated on its own, such as laterality. If the evaluator was unable to determine whether an annotation was accurate, it was rated "to be verified." It was also possible to manually add an annotation to a section of the document, whether or not that section had already been annotated. The evaluator could consult two other authors (JG and MS) [3].

Statistics were automatically generated from this evaluation to compute the usual metrics: precision, recall, and F-measure. Precision is the fraction of correct annotations among all annotations produced. It was calculated as the number of annotations validated by the evaluator divided by the total number of concepts annotated by the MTCE. Recall is the fraction of correct annotations among all the concepts that should have been identified. Here it was calculated as the number of annotations validated by the evaluator divided by the total number of concepts that should have been found, a total that includes the concepts the evaluator added manually [4,5].
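
The three metrics can be illustrated with a minimal Python sketch. It is not the study's own tooling. The class and field names are hypothetical, and two points are assumptions: the number of concepts that should have been found is taken to be the validated annotations plus the concepts added manually, and the F-measure is taken to be the usual balanced harmonic mean of precision and recall. The text does not state how "irrelevant" and "to be verified" annotations were counted, so the sketch leaves them in the total but not among the validated ones.

```python
from dataclasses import dataclass


@dataclass
class AnnotationCounts:
    """Hypothetical tallies from one evaluation pass (names are illustrative)."""
    validated: int        # MTCE annotations the evaluator rated "valid"
    total_annotated: int  # every annotation the MTCE produced
    manually_added: int   # concepts the evaluator added by hand


def precision(c: AnnotationCounts) -> float:
    # Validated annotations over all annotations produced by the tool.
    return c.validated / c.total_annotated if c.total_annotated else 0.0


def recall(c: AnnotationCounts) -> float:
    # Assumption: expected concepts = validated + manually added.
    expected = c.validated + c.manually_added
    return c.validated / expected if expected else 0.0


def f_measure(c: AnnotationCounts) -> float:
    # Assumption: balanced F-measure (harmonic mean of precision and recall).
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Made-up counts for illustration only, not results from the study.
    counts = AnnotationCounts(validated=80, total_annotated=100, manually_added=20)
    print(precision(counts), recall(counts), f_measure(counts))
```

Because recall depends on the manually added concepts, a correction that removes terms from the terminology can raise precision while lowering recall.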

Conclusion

In parallel, the most frequent errors were identified and categorised in a tabular file. The goal was to improve the MTCE's performance. The MTCE handled negation: an annotation was deemed false if a negation was present in the document but ignored by the MTCE. The error analysis made it possible to correct either the tool or the terminological content. Thus, if a term's meaning was vague or unimportant, such as "autre" (other), it was made unrecognisable in HeTOP. If the error was context-related, the term concerned, such as "radio," was set aside for the annotation of this concept in the second analysis, on the expectation that a different term would be used to extract the concept. If the term was a false synonym or acronym, such as "symptome" (symptom) for "syndrome" (syndrome), it was removed from the HeTOP site. The MTCE parameters were changed to address stemming errors: the number of letters required to form a root was increased. Compound terms were initially treated by the MTCE as two separate words; after correction, the MTCE treated them as a single word.
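
Three of these corrections can be illustrated with a toy Python sketch: ignoring a vague term, matching a compound term as a single unit, and flagging a negated concept. It is not the MTCE or HeTOP implementation, whose internals are not described here. The term list, the blocked set, the negation cues, and the `annotate` function are all hypothetical, and longest-match-first dictionary lookup is an assumed technique chosen only for illustration.

```python
import re

# Hypothetical mini-terminology: surface term -> concept label.
TERMS = {
    "insuffisance cardiaque": "Heart failure",
    "insuffisance": "Insufficiency",
    "syndrome": "Syndrome",
    "autre": "Other",
}
# Assumption: vague terms are simply skipped during matching.
BLOCKED = {"autre"}
# Assumption: a concept is negated when one of these cues directly precedes it.
NEGATION_CUES = ("pas de", "absence de")


def annotate(text):
    """Return (concept label, negated) pairs in order of appearance."""
    lowered = text.lower()
    taken = []   # character spans already claimed by a longer term
    found = []   # (start offset, label, negated)
    # Longest terms first, so a compound term wins over its component words.
    for term in sorted(TERMS, key=len, reverse=True):
        if term in BLOCKED:
            continue
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            if any(m.start() < end and start < m.end() for start, end in taken):
                continue  # overlaps a longer match already recorded
            taken.append(m.span())
            prefix = lowered[:m.start()].rstrip()
            found.append((m.start(), TERMS[term], prefix.endswith(NEGATION_CUES)))
    return [(label, negated) for _, label, negated in sorted(found)]


if __name__ == "__main__":
    print(annotate("Pas de syndrome, autre insuffisance cardiaque"))
```

In this sketch the compound "insuffisance cardiaque" yields one annotation instead of two, and "autre" yields none.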

Most of the irrelevant annotations were found in the observations, especially where there were many expressions such as "tablet," "between meals," etc. The prevalence of contextual errors in the three corpora can be explained by the presence of everyday terms, which are more common in family medicine consultations. The vagueness and polysemy of many acronyms account for the frequency of acronym errors, especially in observations and reasons for consultation. The correction process itself introduced other errors. After this correction step, the tool performed better overall on consultation results and observations. For reasons for consultation, the slight improvement in precision (from 0.80 to 0.82) together with a decline in recall led to a moderate drop in tool performance.