
OntoHuman: User Interface for Ontology-based Information Extraction from Technical Documents with Human-in-the-loop interaction

Opasjumruskit, Kobkaew (2022) OntoHuman: User Interface for Ontology-based Information Extraction from Technical Documents with Human-in-the-loop interaction. NFDI4Ing Conference 2022, 2022-10-26 - 2022-10-27, Online.


Official URL: https://nfdi4ing.de/conference_abstracts/#803

Abstract

In this talk, we present DSAT (Document Semantic Annotation Tool), a tool that automatically extracts information from technical documents using ontologies and natural language processing techniques, within the context of the OntoHuman project. The OntoHuman project aimed to enrich ontologies, which contain semantic information describing objects or concepts, with information extracted from documents. The central component of the OntoHuman project is DSAT, which was originally designed to assist users in annotating key-value-unit tuples in technical documents. Besides the user interface, OntoHuman uses other modules: an ontology enrichment module (ConTrOn - Continuously Trained Ontology), a DSAT database (DSAT DB) for storing annotations and custom ontologies, and an information extraction module (PLIX). These components are integrated in OntoHuman to achieve automatic information extraction. Prior to OntoHuman, the ontologies used by DSAT for automatic extraction were fixed and limited to one specific domain, namely spacecraft engineering. Updating and customizing an ontology manually is tedious and requires additional effort to use ontology modelling tools. Therefore, a semi-automatic process for enriching ontologies can help domain experts, who are not necessarily ontology experts, map their knowledge into ontologies. To enable the customization of ontologies, we improved DSAT and ConTrOn in the OntoHuman project. We also pursue the Human-in-the-Loop (HiL) approach, in which humans verify the results of an automatic process by providing feedback to the system. We integrated the HiL component to generalize the automatic information extraction process. In contrast to the prototypical solution, we can now apply and customize ontologies to extract data from documents of other domains. Feedback from users can now be collected via a web-based user interface and used to update ontologies further.
The following proposed features were implemented: correction of automatically extracted data, resolution of word ambiguities, addition of new annotations, and an export function for annotations. Additionally, we simplified the UI according to feedback from workshop participants from the NFDI4Ing community. We also conducted a user survey and received fairly good ratings for the tool (DSAT). Regarding the user experience, the tool is considered easy to use (6 points out of 7), supportive (5.5/7), efficient (6/7) and novel (5/7). The workshop participants rated DSAT as suitable for generic purposes (3.5 points out of 5), somewhat relevant to their colleagues' work (3/5), and not very relevant to their own work (2/5). However, since the workshops had only 9 and 6 participants, we hope to collect further feedback and attract more users from various fields of work during this conference. Since the automatic annotation of documents depends largely on the ontologies used, users who want to apply the tools to other domains need to know where to find relevant ontologies. In the future, an ontology search API could help users find the right ontologies. Furthermore, the topics suggested in the workshops, such as semantic disambiguation, multi-language support, and graph value extraction, are rather complicated. Therefore, we decided to research these topics beyond the project period. They are currently being studied and could be integrated into DSAT in the future.
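To make the key-value-unit idea and the Human-in-the-Loop correction step more concrete, the following is a minimal Python sketch. It is not the DSAT, PLIX, or ConTrOn implementation: the term list `ONTOLOGY_TERMS`, the unit list `UNITS`, and the functions `extract`, `apply_feedback`, and `export_json` are all hypothetical names chosen for illustration, and a simple regular expression stands in for the ontology-based and NLP-based extraction described in the abstract.

```python
import json
import re
from dataclasses import asdict, dataclass, replace

# Hypothetical stand-in for an ontology: preferred label -> accepted surface forms.
ONTOLOGY_TERMS = {
    "mass": ["mass", "weight"],
    "operating voltage": ["operating voltage", "supply voltage"],
}

# Hypothetical list of known unit symbols (matched case-sensitively).
UNITS = ["kg", "g", "V", "mV"]


@dataclass(frozen=True)
class Annotation:
    key: str
    value: float
    unit: str
    verified: bool = False  # set to True once a human has confirmed or corrected it


def _alternation(words):
    # Longest alternatives first, so "mV" is tried before "V".
    return "|".join(sorted(map(re.escape, words), key=len, reverse=True))


def extract(text):
    """Find key-value-unit tuples: a known term, then a number, then a known unit."""
    units = _alternation(UNITS)
    found = []
    for key, labels in ONTOLOGY_TERMS.items():
        pattern = re.compile(
            rf"\b(?i:{_alternation(labels)})\b"  # term, case-insensitive
            rf"\D{{0,20}}?"                       # up to 20 non-digit characters
            rf"(-?\d+(?:\.\d+)?)\s*({units})\b"   # numeric value and unit
        )
        for match in pattern.finditer(text):
            found.append(Annotation(key, float(match.group(1)), match.group(2)))
    return found


def apply_feedback(annotations, index, **corrections):
    """Return a new list in which one annotation is corrected and marked verified."""
    updated = list(annotations)
    updated[index] = replace(updated[index], verified=True, **corrections)
    return updated


def export_json(annotations):
    """Serialize annotations, loosely mirroring an export function for annotations."""
    return json.dumps([asdict(a) for a in annotations], indent=2)


if __name__ == "__main__":
    draft = extract("Mass: 4.2 kg. The supply voltage is 28 V.")
    # A reviewer corrects the second value; the first stays unverified.
    reviewed = apply_feedback(draft, 1, value=28.5)
    print(export_json(reviewed))
```

In this sketch the human correction only updates the annotation list. Feeding such corrections back into the ontology itself, as the abstract describes for ConTrOn, is outside its scope.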

Item URL in elib: https://elib.dlr.de/189331/
Document Type: Conference or Workshop Item (Speech)
Title: OntoHuman: User Interface for Ontology-based Information Extraction from Technical Documents with Human-in-the-loop interaction
Authors:
Authors | Institution or Email of Authors | Author's ORCID iD | ORCID Put Code
Opasjumruskit, Kobkaew | UNSPECIFIED | https://orcid.org/0000-0002-9206-6896 | UNSPECIFIED
Date: October 2022
Refereed publication: Yes
Open Access: Yes
Gold Open Access: No
In SCOPUS: No
In ISI Web of Science: No
Status: Accepted
Keywords: Ontology, Information Extraction, Human-in-the-Loop
Event Title: NFDI4Ing Conference 2022
Event Location: Online
Event Type: national Conference
Event Start Date: 26 October 2022
Event End Date: 27 October 2022
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Space
HGF - Program Themes: Space System Technology
DLR - Research area: Space
DLR - Program: R SY - Space System Technology
DLR - Research theme (Project): R - Digital production techniques for aerospace
Location: Jena
Institutes and Institutions: Institute of Data Science > Data Management and Enrichment
Institute of Data Science > Smart Systems for Digitalization
Deposited By: Opasjumruskit, Kobkaew
Deposited On: 02 Nov 2022 10:54
Last Modified: 24 Apr 2024 20:50
