TY - CHAP A1 - Gaigall, Daniel T1 - On Consistent Hypothesis Testing In General Hilbert Spaces N2 - Inference on the basis of high-dimensional and functional data are two topics which are discussed frequently in the current statistical literature. One way to include both topics in a single approach is to work on a very general space for the underlying observations, such as a separable Hilbert space. We propose a general method for consistent hypothesis testing on the basis of random variables with values in separable Hilbert spaces. A projection idea lets us avoid concerns with the curse of dimensionality. We apply well-known test statistics from nonparametric inference to the projected data and integrate over all projections from a specific set and with respect to suitable probability measures. In contrast to classical methods, which are applicable for real-valued random variables or random vectors of dimensions lower than the sample size, the tests can be applied to random vectors of dimensions larger than the sample size or even to functional and high-dimensional data. In general, resampling procedures such as bootstrap or permutation are suitable to determine critical values. The idea can be extended to the case of incomplete observations. Moreover, we develop an efficient algorithm for implementing the method. Examples are given for testing goodness-of-fit in a one-sample situation in [1] or for testing marginal homogeneity on the basis of a paired sample in [2]. Here, the test statistics in use can be seen as generalizations of the well-known Cramér-von-Mises test statistics in the one-sample and two-sample case. The treatment of other testing problems is possible as well. By using the theory of U-statistics, for instance, asymptotic null distributions of the test statistics are obtained as the sample size tends to infinity. 
Standard continuity assumptions ensure the asymptotic exactness of the tests under the null hypothesis and that the tests detect any alternative in the limit. Simulation studies demonstrate size and power of the tests in the finite sample case, confirm the theoretical findings, and are used for the comparison with competing procedures. A possible application of the general approach is inference for stock market returns, including at high data frequencies. In the field of empirical finance, statistical inference of stock market prices usually takes place on the basis of related log-returns as data. In the classical models for stock prices, i.e., the exponential Lévy model, Black-Scholes model, and Merton model, properties such as independence and stationarity of the increments ensure an independent and identically distributed structure of the data. Specific trends during certain periods of the stock price processes can cause complications in this regard. In fact, our approach can compensate for those effects by the treatment of the log-returns as random vectors or even as functional data. Y1 - 2022 U6 - http://dx.doi.org/10.11159/icsta22.157 N1 - Proceedings of the 4th International Conference on Statistics: Theory and Applications (ICSTA’22) Prague, Czech Republic – July 28-30 SP - Paper No. 157 PB - Avestia Publishing CY - Orléans, Canada ER - TY - CHAP A1 - Staat, Manfred A1 - Tran, Ngoc Trinh T1 - Strain based brittle failure criteria for rocks T2 - Proceedings of (NACOME2022) The 11th National Conference on Mechanics, Vol. 1. Solid Mechanics, Rock Mechanics, Artificial Intelligence, Teaching and Training, Hanoi, December 2-3, 2022 N2 - When confining pressure is low or absent, extensional fractures are typical, with fractures occurring on unloaded planes in rock. These “paradox” fractures can be explained by a phenomenological extension strain failure criterion. In the past, a simple empirical criterion for fracture initiation in brittle rock has been developed. 
But this criterion makes unrealistic strength predictions in biaxial compression and tension. A new extension strain criterion overcomes this limitation by adding a weighted principal shear component. The weight is chosen such that the enriched extension strain criterion represents the same failure surface as the Mohr–Coulomb (MC) criterion. Thus, the MC criterion has been derived as an extension strain criterion predicting failure modes that are unexpected in the understanding of the failure of cohesive-frictional materials. In progressive damage of rock, the most likely fracture direction is orthogonal to the maximum extension strain. The enriched extension strain criterion is proposed as a threshold surface for crack initiation (CI) and crack damage (CD) and as a failure surface at peak (P). Examples show that the enriched extension strain criterion predicts much lower volumes of damaged rock mass compared to the simple extension strain criterion. KW - Extension fracture KW - Extension strain criterion KW - Mohr–Coulomb criterion KW - Evolution of damage Y1 - 2023 SN - 978-604-357-084-7 SP - 500 EP - 509 PB - Nha xuat ban Khoa hoc tu nhien va Cong nghe (Publishing House for Science and Technology) CY - Hanoi ER - TY - CHAP A1 - Blaneck, Patrick Gustav A1 - Bornheim, Tobias A1 - Grieger, Niklas A1 - Bialonski, Stephan T1 - Automatic readability assessment of German sentences with transformer ensembles T2 - Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text N2 - Reliable methods for automatic readability assessment have the potential to impact a variety of fields, ranging from machine translation to self-informed learning. Recently, large language models for the German language (such as GBERT and GPT-2-Wechsel) have become available, enabling the development of deep-learning-based approaches that promise to further improve automatic readability assessment. 
In this contribution, we studied the ability of ensembles of fine-tuned GBERT and GPT-2-Wechsel models to reliably predict the readability of German sentences. We combined these models with linguistic features and investigated the dependence of prediction performance on ensemble size and composition. Mixed ensembles of GBERT and GPT-2-Wechsel performed better than ensembles of the same size consisting of only GBERT or GPT-2-Wechsel models. Our models were evaluated in the GermEval 2022 Shared Task on Text Complexity Assessment on data of German sentences. On out-of-sample data, our best ensemble achieved a root mean squared error of 0.435. Y1 - 2022 U6 - http://dx.doi.org/10.48550/arXiv.2209.04299 N1 - Proceedings of the 18th Conference on Natural Language Processing/Konferenz zur Verarbeitung natürlicher Sprache (KONVENS 2022), 12-15 September 2022, University of Potsdam, Potsdam, Germany SP - 57 EP - 62 PB - Association for Computational Linguistics CY - Potsdam ER - TY - CHAP A1 - Büsgen, André A1 - Klöser, Lars A1 - Kohl, Philipp A1 - Schmidts, Oliver A1 - Kraft, Bodo A1 - Zündorf, Albert T1 - Exploratory analysis of chat-based black market profiles with natural language processing T2 - Proceedings of the 11th International Conference on Data Science, Technology and Applications N2 - Messenger apps like WhatsApp or Telegram are an integral part of daily communication. Besides the various positive effects, those services extend the operating range of criminals. Open trading groups with many thousand participants emerged on Telegram. Law enforcement agencies monitor suspicious users in such chat rooms. This research shows that text analysis, based on natural language processing, facilitates this through a meaningful domain overview and detailed investigations. We crawled a corpus from such self-proclaimed black markets and annotated five attribute types: products, money, payment methods, user names, and locations. 
Based on each message a user sends, we extract and group these attributes to build profiles. Then, we build features to cluster the profiles. Pretrained word vectors yield better unsupervised clustering results than current state-of-the-art transformer models. The result is a semantically meaningful high-level overview of the user landscape of black market chatrooms. Additionally, the extracted structured information serves as a foundation for further data exploration, for example, the most active users or preferred payment methods. KW - Clustering KW - Natural Language Processing KW - Information Extraction KW - Profile Extraction KW - Text Mining Y1 - 2022 SN - 978-989-758-583-8 U6 - http://dx.doi.org/10.5220/0011271400003269 SN - 2184-285X SP - 83 EP - 94 ER - TY - CHAP A1 - Mandekar, Swati A1 - Jentsch, Lina A1 - Lutz, Kai A1 - Behbahani, Mehdi A1 - Melnykowycz, Mark T1 - Earable design analysis for sleep EEG measurements T2 - UbiComp '21 N2 - Conventional EEG devices cannot be used in everyday life; hence, research over the past decade has focused on Ear-EEG for mobile, at-home monitoring in applications ranging from emotion detection to sleep monitoring. As the area available for electrode contact in the ear is limited, the electrode size and location play a vital role for an Ear-EEG system. In this investigation, we present a quantitative study of ear-electrodes with two electrode sizes at different locations in a wet and dry configuration. Electrode impedance scales inversely with size and ranges from 450 kΩ to 1.29 MΩ for dry and from 22 kΩ to 42 kΩ for wet contact at 10 Hz. For any size, the location in the ear canal with the lowest impedance is ELE (Left Ear Superior), presumably due to increased contact pressure caused by the outer-ear anatomy. The results can be used to optimize signal pickup and SNR for specific applications. We demonstrate this by recording sleep spindles during sleep onset with high quality (5.27 μVrms). 
KW - EEG KW - sensors KW - Impedance Spectroscopy KW - Sleep EEG KW - biopotential electrodes Y1 - 2021 U6 - http://dx.doi.org/10.1145/3460418.3479328 N1 - UbiComp '21: Adjunct Proceedings of the 2021 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2021 ACM International Symposium on Wearable Computers, September 21–26, 2021, Virtual, USA SP - 171 EP - 175 ER - TY - CHAP A1 - Klöser, Lars A1 - Kohl, Philipp A1 - Kraft, Bodo A1 - Zündorf, Albert T1 - Multi-attribute relation extraction (MARE): simplifying the application of relation extraction T2 - Proceedings of the 2nd International Conference on Deep Learning Theory and Applications - DeLTA N2 - Natural language understanding’s relation extraction makes innovative and encouraging novel business concepts possible and facilitates new digitalized decision-making processes. Current approaches allow the extraction of relations with a fixed number of entities as attributes. Extracting relations with an arbitrary number of attributes requires complex systems and costly relation-trigger annotations to assist these systems. We introduce multi-attribute relation extraction (MARE) as an assumption-less problem formulation with two approaches, facilitating an explicit mapping from business use cases to the data annotations. Avoiding elaborate annotation constraints simplifies the application of relation extraction approaches. The evaluation compares our models to current state-of-the-art event extraction and binary relation extraction methods. Our approaches show improvement compared to these on the extraction of general multi-attribute relations. 
Y1 - 2021 SN - 978-989-758-526-5 U6 - http://dx.doi.org/10.5220/0010559201480156 N1 - Proceedings of the 2nd International Conference on Deep Learning Theory and Applications, DeLTA2021, July 7-9, 2021 SP - 148 EP - 156 ER - TY - CHAP A1 - Schmidts, Oliver A1 - Kraft, Bodo A1 - Winkens, Marvin A1 - Zündorf, Albert T1 - Catalog integration of heterogeneous and volatile product data T2 - DATA 2020: Data Management Technologies and Applications N2 - The integration of frequently changing, volatile product data from different manufacturers into a single catalog is a significant challenge for small and medium-sized e-commerce companies. They rely on the timely integration of product data to present it in aggregated form in an online shop without knowing format specifications, concept understanding of manufacturers, and data quality. Furthermore, format, concepts, and data quality may change at any time. Consequently, integrating product catalogs into a single standardized catalog is often a laborious manual task. Current strategies to streamline or automate catalog integration use techniques based on machine learning, word vectorization, or semantic similarity. However, most approaches struggle with low-quality or real-world data. We propose Attribute Label Ranking (ALR) as a recommendation engine to simplify the integration process of previously unknown, proprietary tabular formats into a standardized catalog for practitioners. We evaluate ALR by focusing on the impact of different neural network architectures, language features, and semantic similarity. Additionally, we consider metrics for industrial application and present the impact of ALR in production and its limitations. 
Y1 - 2021 SN - 978-3-030-83013-7 U6 - http://dx.doi.org/10.1007/978-3-030-83014-4_7 N1 - International Conference on Data Management Technologies and Applications, DATA 2020, 7-9 July SP - 134 EP - 153 PB - Springer CY - Cham ER - TY - CHAP A1 - Kohl, Philipp A1 - Schmidts, Oliver A1 - Klöser, Lars A1 - Werth, Henri A1 - Kraft, Bodo A1 - Zündorf, Albert T1 - STAMP 4 NLP – an agile framework for rapid quality-driven NLP applications development T2 - Quality of Information and Communications Technology. QUATIC 2021 N2 - The progress in natural language processing (NLP) research over recent years offers novel business opportunities for companies, such as automated user interaction or improved data analysis. Building sophisticated NLP applications requires dealing with modern machine learning (ML) technologies, which hinders enterprises from establishing successful NLP projects. Our experience in applied NLP research projects shows that the continuous integration of research prototypes in production-like environments with quality assurance builds trust in the software and shows convenience and usefulness regarding the business goal. We introduce STAMP 4 NLP as an iterative and incremental process model for developing NLP applications. With STAMP 4 NLP, we merge software engineering principles with best practices from data science. Instantiating our process model allows efficiently creating prototypes by utilizing templates, conventions, and implementations, enabling developers and data scientists to focus on the business goals. Due to our iterative-incremental approach, businesses can deploy an enhanced version of the prototype to their software environment after every iteration, maximizing potential business value and trust early and avoiding the cost of successful yet never deployed experiments. 
KW - Machine learning KW - Process model KW - Natural language processing Y1 - 2021 SN - 978-3-030-85346-4 SN - 978-3-030-85347-1 U6 - http://dx.doi.org/10.1007/978-3-030-85347-1_12 N1 - International Conference on the Quality of Information and Communications Technology, QUATIC 2021, 8-11 September, Algarve, Portugal SP - 156 EP - 166 PB - Springer CY - Cham ER - TY - CHAP A1 - Bornheim, Tobias A1 - Grieger, Niklas A1 - Bialonski, Stephan T1 - FHAC at GermEval 2021: Identifying German toxic, engaging, and fact-claiming comments with ensemble learning T2 - Proceedings of the GermEval 2021 Workshop on the Identification of Toxic, Engaging, and Fact-Claiming Comments : 17th Conference on Natural Language Processing KONVENS 2021 Y1 - 2021 U6 - http://dx.doi.org/10.48415/2021/fhw5-x128 N1 - SP - 105 EP - 111 PB - Heinrich Heine University CY - Düsseldorf ER - TY - CHAP A1 - Olderog, M. A1 - Mohr, P. A1 - Beging, Stefan A1 - Tsoumpas, C. A1 - Ziemons, Karl T1 - Simulation study on the role of tissue-scattered events in improving sensitivity for a compact time of flight Compton positron emission tomograph T2 - 2020 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC) N2 - In positron emission tomography, improving time, energy and spatial detector resolutions and using Compton kinematics introduces the possibility to reconstruct a radioactivity distribution image from scatter coincidences, thereby enhancing image quality. The number of single scattered coincidences alone is in the same order of magnitude as true coincidences. In this work, a compact Compton camera module based on monolithic scintillation material is investigated as a detector ring module. The detector interactions are simulated with the Monte Carlo package GATE. The scattering angle inside the tissue is derived from the energy of the scattered photon, which results in a set of possible scattering trajectories or broken line of response. 
The Compton kinematics collimation reduces the number of solutions. Additionally, the time of flight information helps localize the position of the annihilation. One of the questions of this investigation is how the energy, spatial and temporal resolutions help confine the possible annihilation volume. A comparison of currently technically feasible detector resolutions (under laboratory conditions) demonstrates the influence on this annihilation volume and shows that energy and coincidence time resolution have a significant impact. An enhancement of the latter from 400 ps to 100 ps shrinks the annihilation volume by around 50%, while a change of the energy resolution in the absorber layer from 12% to 4.5% results in a reduction of 60%. The inclusion of single tissue-scattered data has the potential to increase the sensitivity of a scanner by a factor of 2 to 3. The concept can be further optimized and extended for multiple scatter coincidences and subsequently validated by a reconstruction algorithm. Y1 - 2021 SN - 978-1-7281-7693-2 U6 - http://dx.doi.org/10.1109/NSS/MIC42677.2020.9507901 N1 - 2020 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), 31 Oct.-7 Nov. 2020, Boston, MA, USA PB - IEEE ER -