Conference Proceeding
In the energy economy, forecasts of different time series are rudimentary. In this study, a prediction for the German day-ahead spot market is created with Apache Spark and R. It is just one example of the many different applications in virtual power plant environments. Other time series, such as intraday price processes, load processes of machines or electric vehicles, and real-time energy loads of photovoltaic systems, also need to be analysed and predicted.
This work gives a short introduction to the project in which this study is embedded. It briefly describes the time series methods used for forecasting in the energy industry. Apache Spark, a powerful cluster computing technology, is used as the programming framework. Today, single time series can be predicted. The focus of this work is on developing a method for parallel forecasting that processes multiple time series simultaneously with R and Apache Spark.
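The idea of forecasting many independent time series at once can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: it uses Python threads instead of Apache Spark and R, and simple exponential smoothing instead of the forecasting methods of the study, which the abstract does not name. The function names `ses_forecast` and `forecast_all` and the sample values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def ses_forecast(series, alpha=0.5):
    """One-step-ahead forecast by simple exponential smoothing (assumed method)."""
    level = series[0]
    for value in series[1:]:
        # Blend the newest observation with the previous smoothed level.
        level = alpha * value + (1 - alpha) * level
    return level


def forecast_all(named_series, alpha=0.5, workers=4):
    """Forecast every series independently and return {name: forecast}."""
    names = list(named_series)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each series is an independent job, so the jobs can run side by side.
        results = pool.map(lambda name: ses_forecast(named_series[name], alpha), names)
        return dict(zip(names, results))


if __name__ == "__main__":
    # Made-up illustrative values, not data from the study.
    demo = {
        "spot_price": [30.0, 32.0, 31.0, 35.0],
        "pv_load": [0.0, 1.5, 3.0, 2.5],
    }
    print(forecast_all(demo))
```

Because the series do not depend on each other, the same per-series function could in principle be handed to a cluster framework instead of a thread pool.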
The progress in natural language processing (NLP) research over recent years offers novel business opportunities for companies, such as automated user interaction or improved data analysis. Building sophisticated NLP applications requires dealing with modern machine learning (ML) technologies, which impedes enterprises from establishing successful NLP projects. Our experience in applied NLP research projects shows that the continuous integration of research prototypes in production-like environments with quality assurance builds trust in the software and shows convenience and usefulness regarding the business goal. We introduce STAMP 4 NLP as an iterative and incremental process model for developing NLP applications. With STAMP 4 NLP, we merge software engineering principles with best practices from data science. Instantiating our process model allows efficiently creating prototypes by utilizing templates, conventions, and implementations, enabling developers and data scientists to focus on the business goals. Due to our iterative-incremental approach, businesses can deploy an enhanced version of the prototype to their software environment after every iteration, maximizing potential business value and trust early and avoiding the cost of successful yet never deployed experiments.
Supervised machine learning and deep learning require a large amount of labeled data, which data scientists obtain in a manual and time-consuming annotation process. To mitigate this challenge, Active Learning (AL) proposes promising data points for annotators to annotate next, instead of a sequential or random sample. This method is supposed to save annotation effort while maintaining model performance.
However, practitioners face many AL strategies for different tasks and need an empirical basis to choose between them. Surveys categorize AL strategies into taxonomies without performance indications. Presentations of novel AL strategies compare the performance to a small subset of strategies. Our contribution addresses the empirical basis by introducing a reproducible active learning evaluation (ALE) framework for the comparative evaluation of AL strategies in NLP.
The framework allows the implementation of AL strategies with low effort and a fair data-driven comparison through defining and tracking experiment parameters (e.g., initial dataset size, number of data points per query step, and the budget). ALE helps practitioners to make more informed decisions, and researchers can focus on developing new, effective AL strategies and deriving best practices for specific use cases. With best practices, practitioners can lower their annotation costs. We present a case study to illustrate how to use the framework.
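To make the tracked experiment parameters concrete, the following sketch runs a pool-based query loop with an initial dataset size, a number of data points per query step, and a budget. It is not the ALE framework's API: `ExperimentConfig`, `run_experiment`, and both strategies are hypothetical names, and the fixed `scores` list is an assumed stand-in for model confidences that a real setup would recompute after each retraining.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    initial_size: int  # data points labeled before the first query
    step_size: int     # data points requested per query step
    budget: int        # total number of annotations allowed
    seed: int = 0      # fixed seed so that runs are reproducible


def random_strategy(unlabeled, scores, k, rng):
    """Baseline: pick k unlabeled points uniformly at random."""
    return rng.sample(sorted(unlabeled), k)


def uncertainty_strategy(unlabeled, scores, k, rng):
    """Pick the k points whose (assumed) confidence is closest to 0.5."""
    return sorted(unlabeled, key=lambda i: (abs(scores[i] - 0.5), i))[:k]


def run_experiment(scores, strategy, config):
    """Return the list of index batches selected at each step."""
    rng = random.Random(config.seed)
    pool = set(range(len(scores)))
    labeled = rng.sample(sorted(pool), config.initial_size)
    pool -= set(labeled)
    history = [list(labeled)]
    while len(labeled) < config.budget and pool:
        # Never request more points than the budget or the pool allows.
        k = min(config.step_size, config.budget - len(labeled), len(pool))
        picked = strategy(pool, scores, k, rng)
        labeled.extend(picked)
        pool -= set(picked)
        history.append(list(picked))
    return history


if __name__ == "__main__":
    cfg = ExperimentConfig(initial_size=0, step_size=2, budget=4)
    demo_scores = [0.1, 0.5, 0.9, 0.4, 0.7, 0.2]  # made-up confidences
    print(run_experiment(demo_scores, uncertainty_strategy, cfg))
```

Running two strategies with the same configuration object is what makes their selection histories comparable.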
Multi-attribute relation extraction (MARE): simplifying the application of relation extraction
(2021)
Natural language understanding’s relation extraction makes innovative and encouraging novel business concepts possible and facilitates new digitalized decision-making processes. Current approaches allow the extraction of relations with a fixed number of entities as attributes. Extracting relations with an arbitrary number of attributes requires complex systems and costly relation-trigger annotations to assist these systems. We introduce multi-attribute relation extraction (MARE) as an assumption-less problem formulation with two approaches, facilitating an explicit mapping from business use cases to the data annotations. Avoiding elaborate annotation constraints simplifies the application of relation extraction approaches. The evaluation compares our models to current state-of-the-art event extraction and binary relation extraction methods. Our approaches show improvement compared to these on the extraction of general multi-attribute relations.
In recent years, the development of large pretrained language models, such as BERT and GPT, significantly improved information extraction systems on various tasks, including relation classification. State-of-the-art systems are highly accurate on scientific benchmarks. A lack of explainability is currently a complicating factor in many real-world applications. Comprehensible systems are necessary to prevent biased, counterintuitive, or harmful decisions.
We introduce semantic extents, a concept to analyze decision patterns for the relation classification task. Semantic extents are the most influential parts of texts concerning classification decisions. Our definition allows similar procedures to determine semantic extents for humans and models. We provide an annotation tool and a software framework to determine semantic extents for humans and models conveniently and reproducibly. Comparing both reveals that models tend to learn shortcut patterns from data. These patterns are hard to detect with current interpretability methods, such as input reductions. Our approach can help detect and eliminate spurious decision patterns during model development. Semantic extents can increase the reliability and security of natural language processing systems. Semantic extents are an essential step in enabling applications in critical areas like healthcare or finance. Moreover, our work opens new research directions for developing methods to explain deep learning models.
Heavy metal detection with semiconductor devices based on PLD-prepared chalcogenide glass thin films
(2007)
Effective training requires high muscle forces, potentially leading to training-induced injuries. Thus, continuous monitoring and controlling of the loadings applied to the musculoskeletal system along the motion trajectory is required. In this paper, a norm-optimal iterative learning control algorithm for robot-assisted training is developed. The algorithm aims at minimizing the external knee joint moment, which is commonly used to quantify the loading of the medial compartment. To estimate the external knee joint moment, a musculoskeletal lower extremity model is implemented in OpenSim and coupled with a model of an industrial robot and a force plate mounted at its end-effector. The algorithm is tested in simulation for patients with varus, normal and valgus alignment of the knee. The results show that the algorithm is able to minimize the external knee joint moment in all three cases and converges after fewer than seven iterations.
Mathematical morphology is a part of image processing that has proven to be fruitful for numerous applications. Two main operations in mathematical morphology are dilation and erosion. These are based on the construction of a supremum or infimum with respect to an order over the tonal range in a certain section of the image. The tonal ordering can easily be realised in grey-scale morphology, and some morphological methods have been proposed for colour morphology. However, all of these have certain limitations.
In this paper we present a novel approach to colour morphology extending upon previous work in the field based on the Loewner order. We propose to consider an approximation of the supremum by means of a log-sum exponentiation introduced by Maslov. We apply this to the embedding of an RGB image in a field of symmetric 2x2 matrices. In this way we obtain nearly isotropic matrices representing colours and the structural advantage of transitivity. In numerical experiments we highlight some remarkable properties of the proposed approach.
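The log-sum-exponentiation approximation of a supremum can be illustrated for plain numbers. The sketch below is a scalar illustration only, assuming the familiar form (1/m) log(sum of exp(m x_i)); the paper itself applies the idea to symmetric 2x2 matrices, which this sketch does not attempt. The function name `smooth_max` is hypothetical.

```python
import math


def smooth_max(values, m):
    """Approximate max(values) by (1/m) * log(sum(exp(m * v))).

    The largest value is factored out first so that exp() cannot overflow.
    """
    top = max(values)
    total = sum(math.exp(m * (v - top)) for v in values)
    return top + math.log(total) / m


if __name__ == "__main__":
    tones = [0.2, 0.5, 0.9]  # made-up grey values
    for m in (1, 10, 100, 1000):
        # The approximation approaches the true maximum 0.9 as m grows.
        print(m, smooth_max(tones, m))
```

For n values the result always lies between the true maximum and the maximum plus log(n)/m, so a larger m gives a tighter approximation.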
The overall objective of this study is to develop a new external fixator, which closely maps the native kinematics of the elbow to decrease the joint force resulting in reduced rehabilitation time and pain. An experimental setup was designed to determine the native kinematics of the elbow during flexion of cadaveric arms. As a preliminary study, data from literature was used to modify a published biomechanical model for the calculation of the joint and muscle forces. They were compared to the original model and the effect of the kinematic refinement was evaluated. Furthermore, the obtained muscle forces were determined in order to apply them in the experimental setup. The joint forces in the modified model differed slightly from the forces in the original model. The muscle force curves changed particularly for small flexion angles but their magnitude for larger angles was consistent.
The human arm consists of the humerus (upper arm), the medial ulna and the lateral radius (forearm). The joint between the humerus and the ulna is called the humeroulnar joint, and the joint between the humerus and the radius is called the humeroradial joint. Lateral and medial collateral ligaments stabilize the elbow. Statistically, 2.5 out of 10,000 people suffer from radial head fractures [1]. In these fractures the cartilage is often affected. The injured cartilage may lead to degenerative diseases such as posttraumatic arthrosis. The resulting pain and reduced range of motion have an impact on the patient’s quality of life. Until now, there has not been a treatment which allows typical loads in daily life activities and offers good long-term results. A new surgical approach was developed with the motivation to slow the progress of the posttraumatic arthrosis. Here, the radius is shortened by 3 mm in the proximal part [2]. By this means, the load on the radius is intended to be reduced due to a load shift to the ulna. Since the radius is the most important stabilizer of the elbow, it has to be confirmed that the stability is not affected. In the first test (Fig. 1 left), pressure distributions within the humeroulnar and humeroradial joints with a native and a shortened radius were measured using resistive pressure sensors (I5076 and I5027, Tekscan, USA). The humerus was loaded axially in a tension testing machine (Z010, Zwick Roell, Germany) in 50 N steps up to 400 N. From the humerus the load is transmitted through both the radius and the ulna into the hand, which is fixed on the ground. In the second test (Fig. 1 right), the joint stability was investigated using a digital image correlation system to measure the displacement of the ulna. Here, the humerus is fixed at a desired flexion angle and the unconstrained forearm lies on the ground. A rope connects the load actuator with a hook fixed in the ulna.
A guide roller is used so that the rope pulls the ulna horizontally when a tensile load is applied. This creates a moment about the elbow joint with a maximum value of 7.5 Nm. Measurements were performed with varying flexion angles (0°, 30°, 60°, 90°, 120°). For both tests and each measurement, seven specimens were used. Student's t-test was employed to determine whether the mean values of the measurements in native specimens and operated specimens differ significantly.
Biomechanical simulation of different prosthetic meshes for repairing uterine/vaginal vault prolapse
(2017)
The paper presents a method for the quantitative assessment of choroidal blood flow using an OCT-A system. The developed technique for processing OCT-A scans is divided into two stages. In the first stage, the boundaries in the selected portion were identified. In the second stage, each pixel mark on the selected layer was represented as a volume unit, a voxel, which characterizes the region of moving blood. Three geometric shapes were considered to represent the voxel. Using one OCT-A scan as an example, this work presents a quantitative assessment of the blood flow index. A possible modification of the two-stage algorithm based on voxel scan processing is presented.
The discovery of human induced pluripotent stem cells reprogrammed from somatic cells [1] and their ability to differentiate into cardiomyocytes (hiPSC-CMs) has provided a robust platform for drug screening [2]. Drug screenings are essential in the development of new components, particularly for evaluating the potential of drugs to induce life-threatening pro-arrhythmias. Between 1988 and 2009, 14 drugs were removed from the market for this reason [3]. The microelectrode array (MEA) technique is a robust tool for drug screening as it detects the field potentials (FPs) for the entire cell culture. Furthermore, the propagation of the field potential can be examined on an electrode basis. To analyze MEA measurements in detail, we have developed an open-source tool.
Chemical sensors with barium strontium titanate as a functional layer for multi-parameter detection
(2013)
Multi-parameter detection for supporting monitoring and control of biogas processes in agriculture
(2014)
In the expansion of sustainable, renewable energy supply, the conversion of organic biomass into biogas has great potential. The underlying complex biological process is still insufficiently understood and requires systematic investigation of the process parameters to enable a high yield with good gas quality. The questions involved in deciphering the process concern both process engineering and microbiology. From a microbiological point of view, knowledge of the microorganisms that actually carry the process is of considerable importance; from a process engineering point of view, the same holds for knowledge of the physical and chemical factors that control the microbiological processes. The interplay of all these parameters promotes or hinders biogas formation, up to the termination of the process.
One possible control method is to measure the metabolic activity of the process-carrying organisms.
Based on well-founded process data obtained from a parallel plant, this measurement is to be implemented with a light-addressable potentiometric sensor (LAPS) system. This sensor can detect pH changes caused by the metabolism of the organisms immobilized on the chip, enabling online monitoring of biogas plants.
An increasing number of applications target their executions on specific hardware like general purpose Graphics Processing Units. Some Cloud Computing providers offer this specific hardware so that organizations can rent such resources. However, outsourcing the whole application to the Cloud causes avoidable costs if only some parts of the application benefit from the specific expensive hardware. A partial execution of applications in the Cloud is a tradeoff between costs and efficiency. This paper addresses the demand for a consistent framework that allows for a mixture of on- and off-premise calculations by migrating only specific parts to a Cloud. It uses the concept of workflows to present how individual workflow tasks can be migrated to the Cloud whereas the remaining tasks are executed on-premise.
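The cost argument for migrating only some workflow tasks can be sketched as follows. This is a toy model under loud assumptions, not the framework of the paper: the `Task` fields, the placement rule (GPU tasks go to the Cloud, everything else stays on-premise), the flat hourly rate, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    needs_gpu: bool  # assumed criterion for migrating a task
    hours: float     # assumed runtime of the task


def place(tasks):
    """Map each task name to 'cloud' or 'on-premise'."""
    return {t.name: "cloud" if t.needs_gpu else "on-premise" for t in tasks}


def cloud_cost(tasks, placement, rate_per_hour):
    """Sum the rental cost of all tasks that are placed in the Cloud."""
    return sum(t.hours * rate_per_hour for t in tasks if placement[t.name] == "cloud")


if __name__ == "__main__":
    workflow = [
        Task("load", needs_gpu=False, hours=1.0),
        Task("train", needs_gpu=True, hours=2.0),
        Task("report", needs_gpu=False, hours=0.5),
    ]
    partial = place(workflow)
    everything = {t.name: "cloud" for t in workflow}
    # Compare renting hardware for one task against renting it for all tasks.
    print(cloud_cost(workflow, partial, 3.0), cloud_cost(workflow, everything, 3.0))
```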
Inference on the basis of high-dimensional data and inference on the basis of functional data are two topics that are discussed frequently in the current statistical literature. One way to cover both topics in a single approach is to work on a very general space for the underlying observations, such as a separable Hilbert space. We propose a general method for consistent hypothesis testing on the basis of random variables with values in separable Hilbert spaces. We avoid concerns with the curse of dimensionality by means of a projection idea. We apply well-known test statistics from nonparametric inference to the projected data and integrate over all projections from a specific set and with respect to suitable probability measures. In contrast to classical methods, which are applicable for real-valued random variables or random vectors of dimensions lower than the sample size, the tests can be applied to random vectors of dimensions larger than the sample size or even to functional and high-dimensional data. In general, resampling procedures such as bootstrap or permutation are suitable to determine critical values. The idea can be extended to the case of incomplete observations. Moreover, we develop an efficient algorithm for implementing the method. Examples are given for testing goodness-of-fit in a one-sample situation in [1] or for testing marginal homogeneity on the basis of a paired sample in [2]. Here, the test statistics in use can be seen as generalizations of the well-known Cramér-von-Mises test statistics in the one-sample and two-sample cases. The treatment of other testing problems is possible as well. By using the theory of U-statistics, for instance, asymptotic null distributions of the test statistics are obtained as the sample size tends to infinity. Standard continuity assumptions ensure the asymptotic exactness of the tests under the null hypothesis and that the tests detect any alternative in the limit.
Simulation studies demonstrate size and power of the tests in the finite sample case, confirm the theoretical findings, and are used for the comparison with competing procedures. A possible application of the general approach is inference for stock market returns, also at high data frequencies. In the field of empirical finance, statistical inference on stock market prices usually takes place on the basis of the related log-returns as data. In the classical models for stock prices, i.e., the exponential Lévy model, Black-Scholes model, and Merton model, properties such as independence and stationarity of the increments ensure an independent and identically distributed structure of the data. Specific trends during certain periods of the stock price processes can cause complications in this regard. In fact, our approach can compensate for those effects by treating the log-returns as random vectors or even as functional data.
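The projection idea can be sketched for two independent samples. The code below is a simplified illustration, not the authors' algorithm: it assumes random Gaussian directions and a plain average as a Monte Carlo stand-in for integrating over projections, uses a two-sample Cramér-von-Mises-type statistic on each projection, and obtains a p-value by permutation. All function names and parameter defaults are hypothetical.

```python
import random


def cvm_two_sample(x, y):
    """Two-sample Cramér-von-Mises-type statistic for real-valued samples."""
    n, m = len(x), len(y)
    total = 0.0
    for t in x + y:
        fx = sum(v <= t for v in x) / n  # empirical CDF of x at t
        fy = sum(v <= t for v in y) / m  # empirical CDF of y at t
        total += (fx - fy) ** 2
    return n * m / (n + m) ** 2 * total


def project(sample, direction):
    """Inner product of every observation with one direction."""
    return [sum(a * b for a, b in zip(row, direction)) for row in sample]


def projected_statistic(xs, ys, directions):
    """Average the univariate statistic over all projection directions."""
    stats = [cvm_two_sample(project(xs, d), project(ys, d)) for d in directions]
    return sum(stats) / len(stats)


def permutation_test(xs, ys, n_directions=20, n_permutations=200, seed=0):
    """Return (observed statistic, permutation p-value)."""
    rng = random.Random(seed)
    dim = len(xs[0])
    directions = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_directions)]
    observed = projected_statistic(xs, ys, directions)
    pooled = xs + ys
    exceed = 0
    for _ in range(n_permutations):
        # Reassign observations to the two groups at random.
        rng.shuffle(pooled)
        stat = projected_statistic(pooled[:len(xs)], pooled[len(xs):], directions)
        if stat >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_permutations + 1)
```

Each projection reduces an observation to a single number, so the dimension of the data may exceed the sample size without changing the procedure.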
Effectiveness of the edge-based smoothed finite element method applied to soft biological tissues
(2012)
Messenger apps like WhatsApp or Telegram are an integral part of daily communication. Besides the various positive effects, those services extend the operating range of criminals. Open trading groups with many thousand participants emerged on Telegram. Law enforcement agencies monitor suspicious users in such chat rooms. This research shows that text analysis, based on natural language processing, facilitates this through a meaningful domain overview and detailed investigations. We crawled a corpus from such self-proclaimed black markets and annotated five attribute types: products, money, payment methods, user names, and locations. Based on each message a user sends, we extract and group these attributes to build profiles. Then, we build features to cluster the profiles. Pretrained word vectors yield better unsupervised clustering results than current state-of-the-art transformer models. The result is a semantically meaningful high-level overview of the user landscape of black market chat rooms. Additionally, the extracted structured information serves as a foundation for further data exploration, for example, the most active users or preferred payment methods.
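The step of grouping extracted attributes into user profiles might look like the sketch below. It assumes the extraction stage already yields (user, attribute type, value) triples; that input shape, the function name `build_profiles`, and the placeholder values are hypothetical and not taken from the paper's pipeline.

```python
from collections import defaultdict


def build_profiles(extractions):
    """Group (user, attribute_type, value) triples into one profile per user."""
    profiles = defaultdict(lambda: defaultdict(set))
    for user, attr_type, value in extractions:
        # A set keeps each distinct value once per user and attribute type.
        profiles[user][attr_type].add(value)
    return {
        user: {attr: sorted(values) for attr, values in attrs.items()}
        for user, attrs in profiles.items()
    }


if __name__ == "__main__":
    # Placeholder triples for illustration only.
    triples = [
        ("user_a", "product", "item_x"),
        ("user_a", "payment_method", "method_1"),
        ("user_a", "product", "item_x"),
        ("user_b", "location", "city_1"),
    ]
    print(build_profiles(triples))
```

Profiles in this form can then be turned into feature vectors for clustering.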
Messenger apps like WhatsApp and Telegram are frequently used for everyday communication, but they can also be utilized as a platform for illegal activity. Telegram allows public groups with up to 200,000 participants. Criminals use these public groups for trading illegal commodities and services, which is a concern for law enforcement agencies, who manually monitor suspicious activity in these chat rooms. This research demonstrates how natural language processing (NLP) can assist in analyzing these chat rooms, providing an explorative overview of the domain and facilitating purposeful analyses of user behavior. We provide a publicly available corpus of annotated text messages with entities and relations from four self-proclaimed black market chat rooms. Our pipeline approach aggregates the extracted product attributes from user messages to profiles and uses these with their sold products as features for clustering. The extracted structured information is the foundation for further data exploration, such as identifying the top vendors or fine-grained price analyses. Our evaluation shows that pretrained word vectors perform better for unsupervised clustering than state-of-the-art transformer models, while the latter are still superior for sequence labeling.