Comparative performance analysis of active learning strategies for the entity recognition task

Supervised learning requires a lot of annotated data, which makes the annotation process time-consuming and expensive. Active Learning (AL) offers a promising solution by reducing the amount of labeled data needed while maintaining model performance. This work focuses on the application of supervised learning and AL to (named) entity recognition, a subdiscipline of Natural Language Processing (NLP). Despite the potential of AL in this area, the performance of different approaches is still poorly understood. We address this gap by conducting a comparative performance analysis with diverse, carefully selected corpora and AL strategies. To this end, we establish a standardized evaluation setting to ensure reproducibility and consistency across experiments. Our analysis reveals scenarios where AL provides performance improvements and others where its benefits are limited. In particular, we find that strategies that incorporate historical information from the learning process and maximize entity information yield the most significant improvements. Our findings can guide researchers and practitioners in optimizing their annotation efforts.

Metadata
Author:Philipp Kohl, Yoka Krämer, Claudia Fohry, Bodo Kraft
DOI:https://doi.org/10.5220/0013068200003838
ISBN:978-989-758-716-0
ISSN:2184-3228
Parent Title (English):Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1
Publisher:SciTePress
Place of publication:Setúbal
Document Type:Conference Proceeding
Language:English
Year of Completion:2024
Tag:Active Learning; Annotation Effort; Named Entity Recognition; Selective Sampling; Span Labeling
First Page:480
Last Page:488
Note:
16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, November 17-19, 2024, in Porto, Portugal
Link:https://doi.org/10.5220/0013068200003838
Access type:worldwide
Institutes:FH Aachen / Fachbereich Medizintechnik und Technomathematik
open_access (DINI-Set):open_access
Licence: Creative Commons - Attribution-NonCommercial-NoDerivatives