TY  - CHAP
A1  - Gaigall, Daniel
T1  - On Consistent Hypothesis Testing In General Hilbert Spaces
N2  - Inference on the basis of high-dimensional and functional data are two topics which are discussed frequently in the current statistical literature. One way to include both topics in a single approach is to work on a very general space for the underlying observations, such as a separable Hilbert space. We propose a general method for consistent hypothesis testing on the basis of random variables with values in separable Hilbert spaces. We avoid concerns with the curse of dimensionality by means of a projection idea. We apply well-known test statistics from nonparametric inference to the projected data and integrate over all projections from a specific set and with respect to suitable probability measures. In contrast to classical methods, which are applicable for real-valued random variables or random vectors of dimensions lower than the sample size, the tests can be applied to random vectors of dimensions larger than the sample size or even to functional and high-dimensional data. In general, resampling procedures such as bootstrap or permutation are suitable to determine critical values. The idea can be extended to the case of incomplete observations. Moreover, we develop an efficient algorithm for implementing the method. Examples are given for testing goodness-of-fit in a one-sample situation in [1] or for testing marginal homogeneity on the basis of a paired sample in [2]. Here, the test statistics in use can be seen as generalizations of the well-known Cramér-von-Mises test statistics in the one-sample and two-sample cases. The treatment of other testing problems is possible as well. By using the theory of U-statistics, for instance, asymptotic null distributions of the test statistics are obtained as the sample size tends to infinity. Standard continuity assumptions ensure the asymptotic exactness of the tests under the null hypothesis and that the tests detect any alternative in the limit. Simulation studies demonstrate size and power of the tests in the finite sample case, confirm the theoretical findings, and are used for the comparison with competing procedures. A possible application of the general approach is inference for stock market returns, also in high data frequencies. In the field of empirical finance, statistical inference on stock market prices usually takes place on the basis of related log-returns as data. In the classical models for stock prices, i.e., the exponential Lévy model, Black-Scholes model, and Merton model, properties such as independence and stationarity of the increments ensure an independent and identically distributed structure of the data. Specific trends during certain periods of the stock price processes can cause complications in this regard. In fact, our approach can compensate for those effects by treating the log-returns as random vectors or even as functional data.
Y1  - 2022
U6  - http://dx.doi.org/10.11159/icsta22.157
N1  - Proceedings of the 4th International Conference on Statistics: Theory and Applications (ICSTA’22), Prague, Czech Republic – July 28-30
SP  - Paper No. 157
PB  - Avestia Publishing
CY  - Orléans, Canada
ER  - 
TY  - CHAP
A1  - Staat, Manfred
A1  - Tran, Ngoc Trinh
T1  - Strain based brittle failure criteria for rocks
T2  - Proceedings of (NACOME2022) The 11th National Conference on Mechanics, Vol. 1. Solid Mechanics, Rock Mechanics, Artificial Intelligence, Teaching and Training, Hanoi, December 2-3, 2022
N2  - When confining pressure is low or absent, extensional fractures are typical, with fractures occurring on unloaded planes in rock. These “paradox” fractures can be explained by a phenomenological extension strain failure criterion. In the past, a simple empirical criterion for fracture initiation in brittle rock was developed, but this criterion makes unrealistic strength predictions in biaxial compression and tension. A new extension strain criterion overcomes this limitation by adding a weighted principal shear component. The weight is chosen such that the enriched extension strain criterion represents the same failure surface as the Mohr–Coulomb (MC) criterion. Thus, the MC criterion has been derived as an extension strain criterion predicting failure modes that are unexpected in the usual understanding of the failure of cohesive-frictional materials. In progressive damage of rock, the most likely fracture direction is orthogonal to the maximum extension strain. The enriched extension strain criterion is proposed as a threshold surface for crack initiation (CI) and crack damage (CD) and as a failure surface at peak (P). Examples show that the enriched extension strain criterion predicts much lower volumes of damaged rock mass than the simple extension strain criterion.
KW  - Extension fracture
KW  - Extension strain criterion
KW  - Mohr–Coulomb criterion
KW  - Evolution of damage
Y1  - 2023
SN  - 978-604-357-084-7
SP  - 500
EP  - 509
PB  - Nha xuat ban Khoa hoc tu nhien va Cong nghe (Publishing House for Natural Science and Technology)
CY  - Hanoi
ER  - 
TY  - CHAP
A1  - Blaneck, Patrick Gustav
A1  - Bornheim, Tobias
A1  - Grieger, Niklas
A1  - Bialonski, Stephan
T1  - Automatic readability assessment of German sentences with transformer ensembles
T2  - Proceedings of the GermEval 2022 Workshop on Text Complexity Assessment of German Text
N2  - Reliable methods for automatic readability assessment have the potential to impact a variety of fields, ranging from machine translation to self-informed learning. Recently, large language models for the German language (such as GBERT and GPT-2-Wechsel) have become available, allowing the development of deep-learning-based approaches that promise to further improve automatic readability assessment. In this contribution, we studied the ability of ensembles of fine-tuned GBERT and GPT-2-Wechsel models to reliably predict the readability of German sentences. We combined these models with linguistic features and investigated the dependence of prediction performance on ensemble size and composition. Mixed ensembles of GBERT and GPT-2-Wechsel performed better than ensembles of the same size consisting of only GBERT or GPT-2-Wechsel models. Our models were evaluated in the GermEval 2022 Shared Task on Text Complexity Assessment on data of German sentences. On out-of-sample data, our best ensemble achieved a root mean squared error of 0.435.
Y1  - 2022
U6  - http://dx.doi.org/10.48550/arXiv.2209.04299
N1  - Proceedings of the 18th Conference on Natural Language Processing/Konferenz zur Verarbeitung natürlicher Sprache (KONVENS 2022), 12-15 September 2022, University of Potsdam, Potsdam, Germany
SP  - 57
EP  - 62
PB  - Association for Computational Linguistics
CY  - Potsdam
ER  - 
TY  - CHAP
A1  - Büsgen, André
A1  - Klöser, Lars
A1  - Kohl, Philipp
A1  - Schmidts, Oliver
A1  - Kraft, Bodo
A1  - Zündorf, Albert
T1  - Exploratory analysis of chat-based black market profiles with natural language processing
T2  - Proceedings of the 11th International Conference on Data Science, Technology and Applications
N2  - Messenger apps like WhatsApp or Telegram are an integral part of daily communication. Besides their various positive effects, these services extend the operating range of criminals. Open trading groups with many thousands of participants have emerged on Telegram. Law enforcement agencies monitor suspicious users in such chat rooms. This research shows that text analysis, based on natural language processing, facilitates this through a meaningful domain overview and detailed investigations. We crawled a corpus from such self-proclaimed black markets and annotated five attribute types: products, money, payment methods, user names, and locations. Based on each message a user sends, we extract and group these attributes to build profiles. Then, we build features to cluster the profiles. Pretrained word vectors yield better unsupervised clustering results than current state-of-the-art transformer models. The result is a semantically meaningful high-level overview of the user landscape of black market chatrooms. Additionally, the extracted structured information serves as a foundation for further data exploration, for example, of the most active users or preferred payment methods.
KW  - Clustering
KW  - Natural Language Processing
KW  - Information Extraction
KW  - Profile Extraction
KW  - Text Mining
Y1  - 2022
SN  - 978-989-758-583-8
U6  - http://dx.doi.org/10.5220/0011271400003269
SN  - 2184-285X
SP  - 83
EP  - 94
ER  - 
TY  - CHAP
A1  - Mandekar, Swati
A1  - Jentsch, Lina
A1  - Lutz, Kai
A1  - Behbahani, Mehdi
A1  - Melnykowycz, Mark
T1  - Earable design analysis for sleep EEG measurements
T2  - UbiComp '21
N2  - Conventional EEG devices cannot be used in everyday life; hence, research over the past decade has focused on Ear-EEG for mobile, at-home monitoring in applications ranging from emotion detection to sleep monitoring. As the area available for electrode contact in the ear is limited, the electrode size and location play a vital role in an Ear-EEG system. In this investigation, we present a quantitative study of ear-electrodes with two electrode sizes at different locations in a wet and dry configuration. Electrode impedance scales inversely with size and ranges from 450 kΩ to 1.29 MΩ for dry and from 22 kΩ to 42 kΩ for wet contact at 10 Hz. For any size, the location in the ear canal with the lowest impedance is ELE (Left Ear Superior), presumably due to increased contact pressure caused by the outer-ear anatomy. The results can be used to optimize signal pickup and SNR for specific applications. We demonstrate this by recording sleep spindles during sleep onset with high quality (5.27 μVrms).
KW  - EEG
KW  - sensors
KW  - Impedance Spectroscopy
KW  - Sleep EEG
KW  - biopotential electrodes
Y1  - 2021
U6  - http://dx.doi.org/10.1145/3460418.3479328
N1  - UbiComp '21: Adjunct Proceedings of the 2021 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2021 ACM International Symposium on Wearable Computers, September 21–26, 2021, Virtual, USA
SP  - 171
EP  - 175
ER  - 
TY  - CHAP
A1  - Klöser, Lars
A1  - Kohl, Philipp
A1  - Kraft, Bodo
A1  - Zündorf, Albert
T1  - Multi-attribute relation extraction (MARE): simplifying the application of relation extraction
T2  - Proceedings of the 2nd International Conference on Deep Learning Theory and Applications - DeLTA
N2  - Natural language understanding’s relation extraction makes innovative and encouraging novel business concepts possible and facilitates new digitalized decision-making processes. Current approaches allow the extraction of relations with a fixed number of entities as attributes. Extracting relations with an arbitrary number of attributes requires complex systems and costly relation-trigger annotations to assist these systems. We introduce multi-attribute relation extraction (MARE) as an assumption-less problem formulation with two approaches, facilitating an explicit mapping from business use cases to the data annotations. Avoiding elaborate annotation constraints simplifies the application of relation extraction approaches. The evaluation compares our models to current state-of-the-art event extraction and binary relation extraction methods. Our approaches show improvement compared to these on the extraction of general multi-attribute relations.
Y1  - 2021
SN  - 978-989-758-526-5
U6  - http://dx.doi.org/10.5220/0010559201480156
N1  - Proceedings of the 2nd International Conference on Deep Learning Theory and Applications, DeLTA2021, July 7-9, 2021
SP  - 148
EP  - 156
ER  - 
TY  - CHAP
A1  - Schmidts, Oliver
A1  - Kraft, Bodo
A1  - Winkens, Marvin
A1  - Zündorf, Albert
T1  - Catalog integration of heterogeneous and volatile product data
T2  - DATA 2020: Data Management Technologies and Applications
N2  - The integration of frequently changing, volatile product data from different manufacturers into a single catalog is a significant challenge for small and medium-sized e-commerce companies. They rely on the timely integration of product data to present it in aggregated form in an online shop, without knowing the manufacturers' format specifications, concept understanding, or data quality. Furthermore, format, concepts, and data quality may change at any time. Consequently, integrating product catalogs into a single standardized catalog is often a laborious manual task. Current strategies to streamline or automate catalog integration use techniques based on machine learning, word vectorization, or semantic similarity. However, most approaches struggle with low-quality or real-world data. We propose Attribute Label Ranking (ALR) as a recommendation engine that simplifies, for practitioners, the integration of previously unknown, proprietary tabular formats into a standardized catalog. We evaluate ALR by focusing on the impact of different neural network architectures, language features, and semantic similarity. Additionally, we consider metrics for industrial application and present the impact of ALR in production and its limitations.
Y1  - 2021
SN  - 978-3-030-83013-7
U6  - http://dx.doi.org/10.1007/978-3-030-83014-4_7
N1  - International Conference on Data Management Technologies and Applications, DATA 2020, 7-9 July
SP  - 134
EP  - 153
PB  - Springer
CY  - Cham
ER  - 
TY  - CHAP
A1  - Kohl, Philipp
A1  - Schmidts, Oliver
A1  - Klöser, Lars
A1  - Werth, Henri
A1  - Kraft, Bodo
A1  - Zündorf, Albert
T1  - STAMP 4 NLP – an agile framework for rapid quality-driven NLP applications development
T2  - Quality of Information and Communications Technology. QUATIC 2021
N2  - The progress in natural language processing (NLP) research over the last years offers novel business opportunities for companies, such as automated user interaction or improved data analysis. Building sophisticated NLP applications requires dealing with modern machine learning (ML) technologies, which impedes enterprises from establishing successful NLP projects. Our experience in applied NLP research projects shows that the continuous integration of research prototypes in production-like environments with quality assurance builds trust in the software and shows convenience and usefulness regarding the business goal. We introduce STAMP 4 NLP as an iterative and incremental process model for developing NLP applications. With STAMP 4 NLP, we merge software engineering principles with best practices from data science. Instantiating our process model allows prototypes to be created efficiently by utilizing templates, conventions, and implementations, enabling developers and data scientists to focus on the business goals. Due to our iterative-incremental approach, businesses can deploy an enhanced version of the prototype to their software environment after every iteration, maximizing potential business value and trust early and avoiding the cost of successful yet never deployed experiments.
KW  - Machine learning
KW  - Process model
KW  - Natural language processing
Y1  - 2021
SN  - 978-3-030-85346-4
SN  - 978-3-030-85347-1
U6  - http://dx.doi.org/10.1007/978-3-030-85347-1_12
N1  - International Conference on the Quality of Information and Communications Technology, QUATIC 2021, 8-11 September, Algarve, Portugal
SP  - 156
EP  - 166
PB  - Springer
CY  - Cham
ER  - 
TY  - CHAP
A1  - Bornheim, Tobias
A1  - Grieger, Niklas
A1  - Bialonski, Stephan
T1  - FHAC at GermEval 2021: Identifying German toxic, engaging, and fact-claiming comments with ensemble learning
T2  - Proceedings of the GermEval 2021 Workshop on the Identification of Toxic, Engaging, and Fact-Claiming Comments : 17th Conference on Natural Language Processing KONVENS 2021
Y1  - 2021
U6  - http://dx.doi.org/10.48415/2021/fhw5-x128
N1  - 17th Conference on Natural Language Processing (KONVENS 2021)
SP  - 105
EP  - 111
PB  - Heinrich Heine University
CY  - Düsseldorf
ER  - 
TY  - CHAP
A1  - Olderog, M.
A1  - Mohr, P.
A1  - Beging, Stefan
A1  - Tsoumpas, C.
A1  - Ziemons, Karl
T1  - Simulation study on the role of tissue-scattered events in improving sensitivity for a compact time of flight Compton positron emission tomograph
T2  - 2020 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC)
N2  - In positron emission tomography, improving time, energy and spatial detector resolutions and using Compton kinematics introduces the possibility to reconstruct a radioactivity distribution image from scatter coincidences, thereby enhancing image quality. The number of single scattered coincidences alone is of the same order of magnitude as that of true coincidences. In this work, a compact Compton camera module based on monolithic scintillation material is investigated as a detector ring module. The detector interactions are simulated with the Monte Carlo package GATE. The scattering angle inside the tissue is derived from the energy of the scattered photon, which results in a set of possible scattering trajectories or a broken line of response. The Compton kinematics collimation reduces the number of solutions. Additionally, the time of flight information helps localize the position of the annihilation. One of the questions of this investigation is how the energy, spatial and temporal resolutions help confine the possible annihilation volume. A comparison of currently technically feasible detector resolutions (under laboratory conditions) demonstrates the influence on this annihilation volume and shows that energy and coincidence time resolution have a significant impact. An enhancement of the latter from 400 ps to 100 ps shrinks the annihilation volume by around 50%, while a change of the energy resolution in the absorber layer from 12% to 4.5% results in a reduction of 60%. The inclusion of single tissue-scattered data has the potential to increase the sensitivity of a scanner by a factor of 2 to 3. The concept can be further optimized and extended for multiple scatter coincidences and subsequently validated by a reconstruction algorithm.
Y1  - 2021
SN  - 978-1-7281-7693-2
U6  - http://dx.doi.org/10.1109/NSS/MIC42677.2020.9507901
N1  - 2020 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), 31 Oct.-7 Nov. 2020, Boston, MA, USA
PB  - IEEE
ER  -