OPUS 4 | Search

Speaker Attribution in German Parliamentary Debates with QLoRA-adapted Large Language Models (2024)

Bornheim, Tobias ; Grieger, Niklas ; Blaneck, Patrick Gustav ; Bialonski, Stephan

The growing body of political texts opens up new opportunities for rich insights into political dynamics and ideologies but also increases the workload for manual analysis. Automated speaker attribution, which detects who said what to whom in a speech event and is closely related to semantic role labeling, is an important processing step for computational text analysis. We study the potential of the large language model family Llama 2 to automate speaker attribution in German parliamentary debates from 2017–2021. We fine-tune Llama 2 with QLoRA, an efficient training strategy, and find that our approach achieves competitive performance in the GermEval 2023 Shared Task on Speaker Attribution in German News Articles and Parliamentary Debates. Our results shed light on the capabilities of large language models in automating speaker attribution, revealing a promising avenue for computational analysis of political discourse and the development of semantic role labeling systems.

Preprint: Data-efficient sleep staging with synthetic time series pretraining (2024)

Grieger, Niklas ; Mehrkanoon, Siamak ; Bialonski, Stephan

Analyzing electroencephalographic (EEG) time series can be challenging, especially with deep neural networks, due to the large variability among human subjects and often small datasets. To address these challenges, various strategies, such as self-supervised learning, have been suggested, but they typically rely on extensive empirical datasets. Inspired by recent advances in computer vision, we propose a pretraining task termed "frequency pretraining" to pretrain a neural network for sleep staging by predicting the frequency content of randomly generated synthetic time series. Our experiments demonstrate that our method surpasses fully supervised learning in scenarios with limited data and few subjects, and matches its performance in regimes with many subjects. Furthermore, our results underline the relevance of frequency information for sleep stage scoring, while also demonstrating that deep neural networks utilize information beyond frequencies to enhance sleep staging performance, which is consistent with previous research. We anticipate that our approach will be advantageous across a broad spectrum of applications where EEG data is limited or derived from a small number of subjects, including the domain of brain-computer interfaces.
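The pretraining idea above (predict the frequency content of random synthetic signals) can be illustrated with a minimal sketch of a data generator. This is not the authors' generator: the sampling rate (100 Hz), the 30-second epoch length, the 20 frequency bins up to 30 Hz, and the sum-of-sinusoids construction are all illustrative assumptions, and `make_synthetic_sample` is a hypothetical name. Each sample is a random mix of sinusoids, and its label is a multi-hot vector marking which frequency bins are present; a network pretrained to predict that vector would then be fine-tuned on real EEG.

```python
import math
import random


def make_synthetic_sample(n_samples=3000, fs=100.0, n_bins=20, max_freq=30.0,
                          max_components=5, rng=None):
    """Hypothetical generator: return (signal, label) for frequency pretraining.

    signal: list of floats, a sum of 1..max_components random sinusoids
            (3000 samples at 100 Hz corresponds to one 30 s epoch; assumed).
    label:  multi-hot list of length n_bins; entry b is 1 if a sinusoid
            with frequency in [b * width, (b + 1) * width) was added.
    """
    rng = rng or random.Random()
    bin_width = max_freq / n_bins            # max_freq stays below Nyquist (fs / 2)
    k = rng.randint(1, max_components)       # number of active frequency bins
    active_bins = rng.sample(range(n_bins), k)

    label = [0] * n_bins
    signal = [0.0] * n_samples
    for b in active_bins:
        label[b] = 1
        freq = rng.uniform(b * bin_width, (b + 1) * bin_width)
        phase = rng.uniform(0.0, 2.0 * math.pi)
        amp = rng.uniform(0.5, 1.0)
        for t in range(n_samples):
            signal[t] += amp * math.sin(2.0 * math.pi * freq * t / fs + phase)
    return signal, label


if __name__ == "__main__":
    sig, lab = make_synthetic_sample(rng=random.Random(0))
    print(len(sig), lab)
```

Because the labels come for free from the generator, such a pretraining set can be made arbitrarily large without any recorded EEG, which is the property the abstract relies on for low-data regimes.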

Preprint: Detecting sexism in German online newspaper comments with open-source text embeddings (2024)

Bremm, Florian ; Blaneck, Patrick Gustav ; Bornheim, Tobias ; Grieger, Niklas ; Bialonski, Stephan

Sexism in online media comments is a pervasive challenge that often manifests subtly, complicating moderation efforts as interpretations of what constitutes sexism can vary among individuals. We study monolingual and multilingual open-source text embeddings to reliably detect sexism and misogyny in German-language online comments from an Austrian newspaper. We found that classifiers trained on text embeddings closely mimic the individual judgements of human annotators. Our method showed robust performance in the GermEval 2024 GerMS-Detect Subtask 1 challenge, achieving an average macro F1 score of 0.597 (4th place, as reported on Codabench). It also accurately predicted the distribution of human annotations in GerMS-Detect Subtask 2, with an average Jensen-Shannon distance of 0.301 (2nd place). The computational efficiency of our approach suggests potential for scalable applications across various languages and linguistic contexts.
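The Jensen-Shannon distance reported for Subtask 2 compares a predicted label distribution with the distribution of human annotations for each comment. A minimal stdlib sketch of the metric follows; the logarithm base (2 here, which bounds the distance to [0, 1]) and the averaging over comments are assumptions, since the listing does not state how the shared task computed its score. `js_distance` is a hypothetical helper; SciPy offers an equivalent `scipy.spatial.distance.jensenshannon`, whose default base is the natural logarithm.

```python
import math


def js_distance(p, q, base=2.0):
    """Jensen-Shannon distance (square root of the JS divergence).

    p, q: non-negative weights over the same labels, each with a positive sum;
    they are normalized to probability distributions first.
    """
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(a + b) / 2.0 for a, b in zip(p, q)]   # mixture distribution

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a_i = 0 contribute nothing
        return sum(x * math.log(x / y, base) for x, y in zip(a, b) if x > 0)

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))              # guard against tiny negative rounding


def mean_js_distance(preds, targets):
    """Average the per-comment distance (assumed aggregation)."""
    return sum(js_distance(p, t) for p, t in zip(preds, targets)) / len(preds)


# Example: 5 hypothetical annotators split 3 vs 2 on a binary label,
# compared with a model that predicts 70 % / 30 %.
print(js_distance([3, 2], [0.7, 0.3]))
```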

Novel analytical tools reveal that local synchronization of cilia coincides with tissue-scale metachronal waves in zebrafish multiciliated epithelia (2023)

Ringers, Christa ; Bialonski, Stephan ; Ege, Mert ; Solovev, Anton ; Hansen, Jan Niklas ; Jeong, Inyoung ; Friedrich, Benjamin M. ; Jurisch-Yaksi, Nathalie

Motile cilia are hair-like cell extensions that beat periodically to generate fluid flow along various epithelial tissues within the body. In dense multiciliated carpets, cilia have been shown to exhibit a remarkable coordination of their beat in the form of traveling metachronal waves, a phenomenon that supposedly enhances fluid transport. Yet how cilia coordinate their regular beat in multiciliated epithelia to move fluids remains insufficiently understood, particularly due to a lack of rigorous quantification. We combine experiments, novel analysis tools, and theory to address this knowledge gap. To investigate the collective dynamics of cilia, we studied zebrafish multiciliated epithelia in the nose and the brain. We focused mainly on the zebrafish nose, due to its conserved properties with other ciliated tissues and its superior accessibility for non-invasive imaging. We revealed that cilia are synchronized only locally and that the size of local synchronization domains increases with the viscosity of the surrounding medium. Even though synchronization is only local, we observed global patterns of traveling metachronal waves across the zebrafish multiciliated epithelium. Intriguingly, these global wave direction patterns are conserved across individual fish but differ between left and right noses, unveiling a chiral asymmetry of metachronal coordination. To understand the implications of synchronization for fluid pumping, we used a computational model of a regular array of cilia. We found that local metachronal synchronization prevents steric collisions, i.e., cilia colliding with each other, and improves fluid pumping in dense cilia carpets, but hardly affects the direction of fluid flow. In conclusion, we show that local synchronization and tissue-scale cilia alignment coincide and generate metachronal wave patterns in multiciliated epithelia, which enhance their physiological function of fluid pumping.

Der KI-Chatbot ChatGPT: Eine Herausforderung für die Hochschulen [The AI chatbot ChatGPT: a challenge for universities] (2023)

Bialonski, Stephan ; Grieger, Niklas

Essays, poems, program code: ChatGPT automatically generates texts at a previously unattained level of quality. This system and its successors will lastingly change not only the academic world.

Preprint: Speaker attribution in German parliamentary debates with QLoRA-adapted large language models (2023)

Bornheim, Tobias ; Grieger, Niklas ; Blaneck, Patrick Gustav ; Bialonski, Stephan

The growing body of political texts opens up new opportunities for rich insights into political dynamics and ideologies but also increases the workload for manual analysis. Automated speaker attribution, which detects who said what to whom in a speech event and is closely related to semantic role labeling, is an important processing step for computational text analysis. We study the potential of the large language model family Llama 2 to automate speaker attribution in German parliamentary debates from 2017–2021. We fine-tune Llama 2 with QLoRA, an efficient training strategy, and find that our approach achieves competitive performance in the GermEval 2023 Shared Task on Speaker Attribution in German News Articles and Parliamentary Debates. Our results shed light on the capabilities of large language models in automating speaker attribution, revealing a promising avenue for computational analysis of political discourse and the development of semantic role labeling systems.

Automatic readability assessment of German sentences with transformer ensembles (2022)

Blaneck, Patrick Gustav ; Bornheim, Tobias ; Grieger, Niklas ; Bialonski, Stephan

Reliable methods for automatic readability assessment have the potential to impact a variety of fields, ranging from machine translation to self-informed learning. Recently, large language models for the German language (such as GBERT and GPT-2-Wechsel) have become available, enabling the development of deep-learning-based approaches that promise to further improve automatic readability assessment. In this contribution, we studied the ability of ensembles of fine-tuned GBERT and GPT-2-Wechsel models to reliably predict the readability of German sentences. We combined these models with linguistic features and investigated how prediction performance depends on ensemble size and composition. Mixed ensembles of GBERT and GPT-2-Wechsel performed better than ensembles of the same size consisting of only GBERT or only GPT-2-Wechsel models. Our models were evaluated in the GermEval 2022 Shared Task on Text Complexity Assessment on a dataset of German sentences. On out-of-sample data, our best ensemble achieved a root mean squared error of 0.435.

Advanced sleep spindle identification with neural networks (2022)

Kaulen, Lars ; Schwabedal, Justus T. C. ; Schneider, Jules ; Ritter, Philipp ; Bialonski, Stephan

Sleep spindles are neurophysiological phenomena that appear to be linked to memory formation and other functions of the central nervous system, and that can be observed in electroencephalographic (EEG) recordings during sleep. Manually identified spindle annotations in EEG recordings suffer from substantial intra- and inter-rater variability, even if raters have been highly trained, which reduces the reliability of spindle measures as a research and diagnostic tool. The Massive Online Data Annotation (MODA) project has recently addressed this problem by forming a consensus from multiple such rating experts, thus providing a corpus of spindle annotations of enhanced quality. Based on this dataset, we present a U-Net-type deep neural network model to automatically detect sleep spindles. Our model’s performance exceeds that of the state-of-the-art detector and of most experts in the MODA dataset. We observed improved detection accuracy in subjects of all ages, including older individuals whose spindles are particularly challenging to detect reliably. Our results underline the potential of automated methods to perform repetitive, cumbersome tasks with superhuman performance.

Schlafspindeln – Funktion, Detektion und Nutzung als Biomarker für die psychiatrische Diagnostik [Sleep spindles: function, detection, and use as a biomarker for psychiatric diagnostics] (2022)

Schneider, Jules ; Schwabedal, Justus T. C. ; Bialonski, Stephan

Background: The sleep spindle is a graphoelement of the electroencephalogram (EEG) that can be observed during light and deep sleep. Changes in spindle activity have been described for various psychiatric disorders. Owing to their relatively constant properties, sleep spindles show potential as biomarkers in psychiatric diagnostics. Method: This contribution provides an overview of the state of research on the properties and functions of sleep spindles and on the changes in spindle activity reported in psychiatric disorders. Various methodological approaches to spindle detection, and perspectives for it, are discussed with regard to their potential for application in psychiatric diagnostics. Results and conclusion: While changes in spindle activity in psychiatric disorders have been described, their exact potential for psychiatric diagnostics has not yet been sufficiently investigated. Progress in this regard is currently hampered by resource-intensive and error-prone methods for manual or automated spindle detection. Newer detection approaches based on deep learning could overcome the difficulties of previous detection methods and thus new possibilities for the practical

RBDtector: an open-source software to detect REM sleep without atonia according to visual scoring criteria (2022)

Röthenbacher, Annika ; Cesari, Matteo ; Doppler, Christopher E.J. ; Okkels, Niels ; Willemsen, Nele ; Sembowski, Nora ; Seger, Aline ; Lindner, Marie ; Brune, Corinna ; Stefani, Ambra ; Högl, Birgit ; Bialonski, Stephan ; Borghammer, Per ; Fink, Gereon R. ; Schober, Martin ; Sommerauer, Michael

REM sleep without atonia (RSWA) is a key feature for the diagnosis of rapid eye movement (REM) sleep behaviour disorder (RBD). We introduce RBDtector, a novel open-source software to score RSWA according to established SINBAR visual scoring criteria. We assessed muscle activity of the mentalis, flexor digitorum superficialis (FDS), and anterior tibialis (AT) muscles. In 20 subjects, RSWA was scored as tonic, phasic, and any activity both manually by human scorers and with RBDtector. Subsequently, 174 subjects (72 without RBD and 102 with RBD) were analysed with RBDtector to show the algorithm’s applicability. We additionally compared RBDtector estimates to a previously published dataset. RBDtector showed robust conformity with human scorings. The highest congruency was achieved for phasic and any activity of the FDS. Combining mentalis any and FDS any, RBDtector identified RBD subjects with 100% specificity and 96% sensitivity applying a cut-off of 20.6%. Comparable performance was obtained without manual artefact removal. RBD subjects also showed muscle bouts of higher amplitude and longer duration. RBDtector provides estimates of tonic, phasic, and any activity comparable to human scorings. RBDtector, which is freely available, can help identify RBD subjects and provides reliable RSWA metrics.


40 search hits