Comparative performance analysis of active learning strategies for the entity recognition task
- Supervised learning requires a lot of annotated data, which makes the annotation process time-consuming and expensive. Active Learning (AL) offers a promising solution by reducing the amount of labeled data needed while maintaining model performance. This work focuses on the application of supervised learning and AL for (named) entity recognition, which is a subdiscipline of Natural Language Processing (NLP). Despite the potential of AL in this area, there is still a limited understanding of the performance of different approaches. We address this gap by conducting a comparative performance analysis with diverse, carefully selected corpora and AL strategies. Thereby, we establish a standardized evaluation setting to ensure reproducibility and consistency across experiments. With our analysis, we discover scenarios where AL provides performance improvements and others where its benefits are limited. In particular, we find that strategies including historical information from the learning process and maximizing entity information yield the most significant improvements. Our findings can guide researchers and practitioners in optimizing their annotation efforts.
Author: | Philipp Kohl, Yoka Krämer, Claudia Fohry, Bodo Kraft |
---|---|
DOI: | https://doi.org/10.5220/0013068200003838 |
ISBN: | 978-989-758-716-0 |
ISSN: | 2184-3228 |
Parent Title (English): | Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - Volume 1 |
Publisher: | SciTePress |
Place of publication: | Setúbal |
Document Type: | Conference Proceeding |
Language: | English |
Year of Completion: | 2024 |
Tag: | Active Learning; Annotation Effort; Named Entity Recognition; Selective Sampling; Span Labeling |
First Page: | 480 |
Last Page: | 488 |
Note: | 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, November 17-19, 2024, in Porto, Portugal |
Link: | https://doi.org/10.5220/0013068200003838 |
Access type: | worldwide |
Institutes: | FH Aachen / Fachbereich Medizintechnik und Technomathematik |
open_access (DINI-Set): | open_access |
Licence (German): | Creative Commons - Attribution-NonCommercial-NoDerivatives (Namensnennung-Nicht kommerziell-Keine Bearbeitung) |