Opasjumruskit, Kobkaew (2022) OntoHuman: User Interface for Ontology-based Information Extraction from Technical Documents with Human-in-the-loop interaction. NFDI4Ing Conference 2022, 2022-10-26 - 2022-10-27, Online.
Official URL: https://nfdi4ing.de/conference_abstracts/#803
Abstract
In this talk, we present DSAT (Document Semantic Annotation Tool), a tool developed within the OntoHuman project that automatically extracts information from technical documents using ontologies and natural language processing techniques. The OntoHuman project aimed to enrich ontologies, which contain semantic information describing objects or concepts, with information extracted from documents. The central component of the OntoHuman project is DSAT, which was originally designed to assist users in annotating key-value-unit tuples in technical documents. Besides this user interface, OntoHuman comprises three further modules: an ontology enrichment module (ConTrOn - Continuously Trained Ontology), a DSAT database (DSAT DB) for storing annotations and custom ontologies, and an information extraction module (PLIX). These components are integrated in OntoHuman to perform automatic information extraction. Before OntoHuman, the ontologies DSAT used for automatic extraction were fixed and limited to a single domain, namely spacecraft engineering. Updating and customizing an ontology manually is tedious and requires additional effort with ontology modelling tools. A semi-automatic process for enriching ontologies can therefore help domain experts, who are not necessarily ontology experts, to map their knowledge into ontologies. To enable ontology customization, we improved DSAT and ConTrOn in the OntoHuman project. We also pursued the Human-in-the-Loop (HiL) approach, in which humans verify the results of an automatic process by providing feedback to the system, and incorporated the HiL component to generalize the automatic information extraction process. Unlike the earlier prototype, the system can now apply and customize ontologies to extract data from documents in other domains. User feedback can now be collected via a web-based user interface and used to update the ontologies further.
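The abstract does not describe how PLIX and ConTrOn work internally. Purely as an illustration of the idea of ontology-driven key-value-unit extraction with a human-in-the-loop enrichment step, here is a minimal Python sketch. It assumes a toy ontology that maps each property to surface labels and allowed units; the names `Ontology`, `Annotation`, `extract`, and `apply_feedback` are hypothetical and are not the DSAT, PLIX, or ConTrOn APIs. A simple regular-expression matcher stands in for the NLP techniques the tool actually uses.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology (hypothetical): property -> surface labels and allowed units."""
    labels: dict[str, set[str]] = field(default_factory=dict)
    units: dict[str, set[str]] = field(default_factory=dict)

    def add_label(self, prop: str, label: str) -> None:
        # Enrichment: record a new surface form for an existing property.
        self.labels.setdefault(prop, set()).add(label.lower())


@dataclass(frozen=True)
class Annotation:
    key: str    # canonical property name taken from the ontology
    value: float
    unit: str
    span: str   # the matched text in the document


NUMBER = r"(-?\d+(?:\.\d+)?)"


def extract(text: str, onto: Ontology) -> list[Annotation]:
    """Find '<label> ... <number> <unit>' tuples for every ontology property."""
    found = []
    for prop, labels in onto.labels.items():
        units = onto.units.get(prop, set())
        if not units:
            continue
        # Try longer units first so that "kg" wins over "g".
        unit_alt = "|".join(re.escape(u) for u in sorted(units, key=len, reverse=True))
        for label in labels:
            # Allow up to 20 non-digit characters between the label and the number.
            pattern = rf"\b{re.escape(label)}\b\D{{0,20}}?{NUMBER}\s*({unit_alt})\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                found.append(Annotation(prop, float(m.group(1)), m.group(2), m.group(0)))
    return found


def apply_feedback(onto: Ontology, text: str, prop: str, label: str) -> list[Annotation]:
    """Human-in-the-loop step: a user confirms that `label` denotes `prop`.

    The ontology is enriched and the document is re-annotated with it.
    """
    onto.add_label(prop, label)
    return extract(text, onto)


if __name__ == "__main__":
    onto = Ontology(labels={"mass": {"mass"}}, units={"mass": {"kg", "g"}})
    doc = "The satellite mass is 12.5 kg. The weight budget allows 3 kg."
    print(extract(doc, onto))                           # only the "mass" tuple
    print(apply_feedback(onto, doc, "mass", "weight"))  # now also the "weight" tuple
```

In this sketch the first pass misses "weight" because the ontology does not know that label. Once a user confirms it, the enriched ontology finds the second tuple on re-annotation. A realistic pipeline would replace the regular expression with proper NLP and disambiguation.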
The following proposed features were implemented: correction of automatically extracted data, resolution of word ambiguities, addition of new annotations, and export of annotations. Additionally, we simplified the UI based on feedback from workshop participants from the NFDI4Ing community. We also conducted a user survey, in which DSAT received fairly good ratings. Regarding user experience, the tool was rated easy to use (6 out of 7 points), supportive (5.5/7), efficient (6/7), and novel (5/7). The workshop participants rated DSAT as fairly general-purpose in its domain of use (3.5 out of 5 points), somewhat relevant to their colleagues' work (3/5), and not very relevant to their own work (2/5). However, since the two workshops had only 9 and 6 participants, respectively, we hope to collect further feedback and attract more users from various fields during this conference. Because automatic document annotation depends largely on the ontologies used, users need to know where to find relevant ontologies in order to make full use of the tools in other domains. In the future, an ontology search API could help users find suitable ontologies. Furthermore, the topics suggested in the workshops, such as semantic disambiguation, multi-language support, and graph value extraction, are rather complex, so we decided to research them beyond the project period. They are currently being studied and could be integrated into DSAT in the future.
elib URL of the entry | https://elib.dlr.de/189331/
---|---
Document type | Conference contribution (talk)
Title | OntoHuman: User Interface for Ontology-based Information Extraction from Technical Documents with Human-in-the-loop interaction
Authors | Opasjumruskit, Kobkaew
Date | October 2022
Refereed publication | Yes
Open Access | Yes
Gold Open Access | No
In SCOPUS | No
In ISI Web of Science | No
Status | Accepted contribution
Keywords | Ontology, Information Extraction, Human-in-the-Loop
Event title | NFDI4Ing Conference 2022
Event location | Online
Event type | National conference
Event start | 26 October 2022
Event end | 27 October 2022
HGF research field | Aeronautics, Space and Transport
HGF programme | Space
HGF programme topic | Space System Technology
DLR focus area | Space
DLR research area | R SY - Space System Technology
DLR subarea (project, undertaking) | R - Digital Production Technologies for Space
Location | Jena
Institutes & facilities | Institute of Data Science > Data Management and Preparation; Institute of Data Science > Software Systems for Digitalization
Deposited by | Opasjumruskit, Kobkaew
Deposited on | 02 Nov 2022 10:54
Last modified | 24 Apr 2024 20:50