FUSION: Feature-based Processing of Heterogeneous Documents for Automated Information Extraction

  • Information Extraction (IE) processes are often business-critical but very hard to automate because the underlying data is heterogeneous. Specific document characteristics, also called features, influence the optimal way of processing. The Architecture for Automated Generation of Distributed Information Extraction Pipelines (ARTIFACT) supports businesses in successively automating their IE processes by finding optimal IE pipelines. However, ARTIFACT treats every document the same way and does not enable document-specific processing. Single solution strategies can perform extraordinarily well for documents with particular traits. Although manual approvals are superfluous for these documents, ARTIFACT does not provide the opportunity for Fully Automatic Processing (FAP). Therefore, we introduce an enhanced pattern that integrates an extensible and domain-independent concept of feature detection based on microservices. This creates two fundamental benefits. First, document-specific processing increases the quality of automatically generated IE pipelines. Second, the system enables FAP to eliminate superfluous approval efforts.
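
The idea of feature-based, document-specific processing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Strategy`, `detect_features`, and `process`, the toy detectors, and the IBAN example are all hypothetical, and the detectors are plain functions standing in for what the abstract describes as microservices. The sketch only shows the routing logic: detect features, pick a strategy whose required features are present, and skip manual approval when that strategy is marked as trusted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical types: a detector reports whether a document shows a feature.
Detector = Callable[[str], bool]
Extractor = Callable[[str], Dict[str, str]]


@dataclass(frozen=True)
class Strategy:
    name: str
    required_features: FrozenSet[str]  # features a document must show
    extract: Extractor
    fully_automatic: bool  # assumption: True means no manual approval needed


def detect_features(text: str, detectors: Dict[str, Detector]) -> FrozenSet[str]:
    """Run every detector (in a real system, each could be a service call)."""
    return frozenset(name for name, detect in detectors.items() if detect(text))


def process(
    text: str,
    detectors: Dict[str, Detector],
    strategies: List[Strategy],
    fallback: Strategy,
) -> Tuple[str, Dict[str, str], bool]:
    """Return (strategy name, extracted data, needs_manual_approval)."""
    features = detect_features(text, detectors)
    for strategy in strategies:
        # A strategy applies if all of its required features were detected.
        if strategy.required_features <= features:
            return strategy.name, strategy.extract(text), not strategy.fully_automatic
    # No specialised strategy matched: use the generic pipeline with approval.
    return fallback.name, fallback.extract(text), True


# Toy detectors and strategies, invented for illustration only.
detectors = {
    "has_iban": lambda text: "IBAN" in text,
    "is_short": lambda text: len(text) < 200,
}
iban_strategy = Strategy(
    name="iban-specialist",
    required_features=frozenset({"has_iban"}),
    extract=lambda text: {"iban": text.split("IBAN", 1)[1].split()[0]},
    fully_automatic=True,
)
generic = Strategy("generic-pipeline", frozenset(), lambda text: {"raw": text}, False)
```

In this sketch, a document that shows the `has_iban` feature is routed to the specialised strategy and returned without an approval flag, while any other document falls back to the generic pipeline and still requires manual approval.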

Metadata
Author:Michael Sildatke, Hendrik Karwanni, Bodo Kraft, Albert Zündorf
DOI:https://doi.org/10.5220/0011351100003266
ISBN:978-989-758-588-3
ISSN:2184-2833
Parent Title (English):Proceedings of the 17th International Conference on Software Technologies - ICSOFT
Document Type:Conference Proceeding
Language:English
Year of Completion:2022
First Page:250
Last Page:260
Note:
17th International Conference on Software Technologies, July 11-13, 2022, in Lisbon, Portugal
Link:https://www.scitepress.org/Link.aspx?doi=10.5220/0011351100003266
Institutes:FH Aachen / Fachbereich Energietechnik
FH Aachen / Fachbereich Medizintechnik und Technomathematik