Messenger apps like WhatsApp or Telegram are an integral part of daily communication. Besides their various positive effects, these services extend the operating range of criminals. Open trading groups with many thousands of participants have emerged on Telegram. Law enforcement agencies monitor suspicious users in such chat rooms. This research shows that text analysis based on natural language processing facilitates this monitoring through a meaningful domain overview and detailed investigations. We crawled a corpus from such self-proclaimed black markets and annotated five attribute types: products, money, payment methods, user names, and locations. Based on each message a user sends, we extract and group these attributes to build profiles. Then, we build features to cluster the profiles. Pretrained word vectors yield better unsupervised clustering results than current state-of-the-art transformer models. The result is a semantically meaningful high-level overview of the user landscape of black market chat rooms. Additionally, the extracted structured information serves as a foundation for further data exploration, for example, identifying the most active users or preferred payment methods.
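The profile-building step described in this abstract can be illustrated with a minimal sketch. It assumes an upstream extraction step has already produced `(user, attribute_type, value)` records; the record layout, the function names `build_profiles` and `most_common_values`, and all sample values are hypothetical and are not taken from the paper or its corpus.

```python
from collections import Counter, defaultdict

# Hypothetical output of an upstream extraction step: one record per
# extracted attribute, as (user, attribute_type, value). The names and
# values are illustrative only, not taken from the corpus.
EXTRACTED = [
    ("user_a", "product", "gift card"),
    ("user_a", "payment_method", "bitcoin"),
    ("user_a", "product", "Gift Card"),
    ("user_b", "product", "account"),
    ("user_b", "location", "berlin"),
]


def build_profiles(records):
    """Group extracted attributes by user, counting each value per type."""
    profiles = defaultdict(lambda: defaultdict(Counter))
    for user, attr_type, value in records:
        # Lower-casing is a simplistic normalization chosen for this sketch.
        profiles[user][attr_type][value.lower()] += 1
    return profiles


def most_common_values(profiles, attr_type):
    """Count how often each value of one attribute type occurs overall."""
    totals = Counter()
    for per_type in profiles.values():
        totals.update(per_type.get(attr_type, Counter()))
    return totals.most_common()


profiles = build_profiles(EXTRACTED)
```

Calling `most_common_values(profiles, "payment_method")` on such profiles is one way to approach the kind of exploration the abstract mentions, such as preferred payment methods. Turning profiles into clustering features (for example with pretrained word vectors) is a separate step not shown here.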
Messenger apps like WhatsApp and Telegram are frequently used for everyday communication, but they can also be used as a platform for illegal activity. Telegram allows public groups with up to 200,000 participants. Criminals use these public groups for trading illegal commodities and services, which is a concern for law enforcement agencies, who manually monitor suspicious activity in these chat rooms. This research demonstrates how natural language processing (NLP) can assist in analyzing these chat rooms, providing an explorative overview of the domain and facilitating purposeful analyses of user behavior. We provide a publicly available corpus of text messages annotated with entities and relations from four self-proclaimed black market chat rooms. Our pipeline approach aggregates the product attributes extracted from user messages into profiles and uses these profiles, together with the products sold, as features for clustering. The extracted structured information is the foundation for further data exploration, such as identifying the top vendors or fine-grained price analyses. Our evaluation shows that pretrained word vectors perform better for unsupervised clustering than state-of-the-art transformer models, while the latter are still superior for sequence labeling.
Agile is trending, and more and more companies that have so far run their projects according to classical principles are considering adopting agile methods. Yet even when an organization already supports both philosophies, a project is usually given a clear directive: agile or classical. There is, however, another way of dealing with these "different worlds": combining the two philosophies within a single project. In this article, Dr. Michael Kirchhof and Prof. Dr. Bodo Kraft show what this can look like in practice and how it can succeed.
Multi-attribute relation extraction (MARE): simplifying the application of relation extraction
(2021)
Relation extraction, a task of natural language understanding, enables innovative and promising new business concepts and facilitates new digitalized decision-making processes. Current approaches allow the extraction of relations with a fixed number of entities as attributes. Extracting relations with an arbitrary number of attributes requires complex systems and costly relation-trigger annotations to assist these systems. We introduce multi-attribute relation extraction (MARE) as an assumption-less problem formulation with two approaches, facilitating an explicit mapping from business use cases to the data annotations. Avoiding elaborate annotation constraints simplifies the application of relation extraction approaches. The evaluation compares our models to current state-of-the-art event extraction and binary relation extraction methods. Our approaches show improvement over these methods on the extraction of general multi-attribute relations.
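To make the contrast between fixed-arity relations and relations with an arbitrary number of attributes concrete, the following sketch shows one possible in-memory representation. It is an illustration only: the class names `Entity` and `MultiAttributeRelation`, the role names, and the example sentence are assumptions made here and do not reflect the data format or models used in the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Entity:
    text: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


@dataclass
class MultiAttributeRelation:
    """A labeled relation with any number of role-tagged entity attributes."""
    label: str
    attributes: Dict[str, List[Entity]] = field(default_factory=dict)

    def add(self, role: str, entity: Entity) -> None:
        self.attributes.setdefault(role, []).append(entity)

    def arity(self) -> int:
        # Total number of attached entities across all roles.
        return sum(len(entities) for entities in self.attributes.values())


def span(sentence: str, text: str) -> Entity:
    """Locate the first occurrence of text and wrap it as an Entity."""
    start = sentence.index(text)
    return Entity(text, start, start + len(text))


# Hypothetical example sentence and roles, for illustration only.
sentence = "Selling gift cards for 50 EUR via PayPal"
offer = MultiAttributeRelation(label="offer")
offer.add("product", span(sentence, "gift cards"))
offer.add("price", span(sentence, "50 EUR"))
offer.add("payment_method", span(sentence, "PayPal"))
```

A binary relation would be the special case in which exactly two entities are attached; here nothing in the structure fixes that number, and no separate trigger span is required.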
This paper presents NLP Lean Programming framework (NLPf), a new framework for creating custom natural language processing (NLP) models and pipelines by utilizing common software development build systems. This approach allows developers to train and integrate domain-specific NLP pipelines into their applications seamlessly. Additionally, NLPf provides an annotation tool which improves the annotation process significantly by providing a well-designed GUI and a sophisticated way of using input devices. Due to NLPf's properties, developers and domain experts are able to build domain-specific NLP applications more efficiently. NLPf is open-source software and available at https://gitlab.com/schrieveslaach/NLPf.