TY - JOUR
A1 - Bornheim, Tobias
A1 - Grieger, Niklas
A1 - Blaneck, Patrick Gustav
A1 - Bialonski, Stephan
T1 - Speaker Attribution in German Parliamentary Debates with QLoRA-adapted Large Language Models
JF - Journal for language technology and computational linguistics : JLCL
N2 - The growing body of political texts opens up new opportunities for rich insights into political dynamics and ideologies but also increases the workload for manual analysis. Automated speaker attribution, which detects who said what to whom in a speech event and is closely related to semantic role labeling, is an important processing step for computational text analysis. We study the potential of the large language model family Llama 2 to automate speaker attribution in German parliamentary debates from 2017-2021. We fine-tune Llama 2 with QLoRA, an efficient training strategy, and observe our approach to achieve competitive performance in the GermEval 2023 Shared Task On Speaker Attribution in German News Articles and Parliamentary Debates. Our results shed light on the capabilities of large language models in automating speaker attribution, revealing a promising avenue for computational analysis of political discourse and the development of semantic role labeling systems.
KW - large language models
KW - German
KW - speaker attribution
KW - semantic role labeling
Y1 - 2024
U6 - http://dx.doi.org/10.21248/jlcl.37.2024.244
SN - 2190-6858
VL - 37
IS - 1
PB - Gesellschaft für Sprachtechnologie und Computerlinguistik
CY - Regensburg
ER -

TY - INPR
A1 - Bornheim, Tobias
A1 - Grieger, Niklas
A1 - Blaneck, Patrick Gustav
A1 - Bialonski, Stephan
T1 - Preprint: Speaker attribution in German parliamentary debates with QLoRA-adapted large language models
T2 - Journal for Language Technology and Computational Linguistics
N2 - The growing body of political texts opens up new opportunities for rich insights into political dynamics and ideologies but also increases the workload for manual analysis.
Automated speaker attribution, which detects who said what to whom in a speech event and is closely related to semantic role labeling, is an important processing step for computational text analysis. We study the potential of the large language model family Llama 2 to automate speaker attribution in German parliamentary debates from 2017-2021. We fine-tune Llama 2 with QLoRA, an efficient training strategy, and observe our approach to achieve competitive performance in the GermEval 2023 Shared Task On Speaker Attribution in German News Articles and Parliamentary Debates. Our results shed light on the capabilities of large language models in automating speaker attribution, revealing a promising avenue for computational analysis of political discourse and the development of semantic role labeling systems.
Y1 - 2023
U6 - http://dx.doi.org/10.48550/arXiv.2309.09902
N1 - Published version available at: https://doi.org/10.21248/jlcl.37.2024.244
ER -

TY - CHAP
A1 - Blaneck, Patrick Gustav
A1 - Bornheim, Tobias
A1 - Grieger, Niklas
A1 - Bialonski, Stephan
T1 - Automatic readability assessment of German sentences with transformer ensembles
T2 - Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text
N2 - Reliable methods for automatic readability assessment have the potential to impact a variety of fields, ranging from machine translation to self-informed learning. Recently, large language models for the German language (such as GBERT and GPT-2-Wechsel) have become available, allowing to develop Deep Learning based approaches that promise to further improve automatic readability assessment. In this contribution, we studied the ability of ensembles of fine-tuned GBERT and GPT-2-Wechsel models to reliably predict the readability of German sentences. We combined these models with linguistic features and investigated the dependence of prediction performance on ensemble size and composition.
Mixed ensembles of GBERT and GPT-2-Wechsel performed better than ensembles of the same size consisting of only GBERT or GPT-2-Wechsel models. Our models were evaluated in the GermEval 2022 Shared Task on Text Complexity Assessment on data of German sentences. On out-of-sample data, our best ensemble achieved a root mean squared error of 0.435.
Y1 - 2022
U6 - http://dx.doi.org/10.48550/arXiv.2209.04299
N1 - Proceedings of the 18th Conference on Natural Language Processing/Konferenz zur Verarbeitung natürlicher Sprache (KONVENS 2022), 12-15 September 2022, University of Potsdam, Potsdam, Germany
SP - 57
EP - 62
PB - Association for Computational Linguistics
CY - Potsdam
ER -

TY - CHAP
A1 - Bornheim, Tobias
A1 - Grieger, Niklas
A1 - Bialonski, Stephan
T1 - FHAC at GermEval 2021: Identifying German toxic, engaging, and fact-claiming comments with ensemble learning
T2 - Proceedings of the GermEval 2021 Workshop on the Identification of Toxic, Engaging, and Fact-Claiming Comments : 17th Conference on Natural Language Processing KONVENS 2021
Y1 - 2021
U6 - http://dx.doi.org/10.48415/2021/fhw5-x128
SP - 105
EP - 111
PB - Heinrich Heine University
CY - Düsseldorf
ER -