Singh Gill, Amandeep (2024) Automatic Prompt Generator Guided by Ontology for Information Extraction from Semi Structured Documents. Master's thesis, University of Passau.
This archive cannot provide the full text.
Abstract
The rapid advancement of artificial intelligence (AI) has revolutionized the way complex information is processed, offering new opportunities for extracting and interpreting data from technical documents. This thesis presents a novel method for information extraction from technical datasheets, with a focus on the chemical elements domain, leveraging an ontology-guided prompt generation system. Technical datasheets, often presented in semi-structured formats like PDFs, pose challenges for traditional machine learning and natural language processing (NLP) approaches due to their diverse schemas and the need to capture nuanced semantic context. Existing algorithms struggle to effectively extract meaningful information from these complex and unstandardized structures. This research introduces a framework that combines domain-specific ontologies with Large Language Models (LLMs) to address these challenges. By designing ontology-guided prompts based on the MaTic Ontology, this approach enables LLMs to extract key information from technical datasheets, accounting for their semantic complexity and structural variability. The framework is tested on various datasheet formats, including text and tables, demonstrating its ability to handle the intricacies of technical language and document layouts. The study contributes to the fields of NLP and information retrieval by presenting a scalable and adaptable methodology for automating the extraction of domain-specific information. While the primary focus is on chemical elements, the approach is generalizable to other technical domains by incorporating different ontologies. This adaptability enhances its potential for widespread application in automating knowledge extraction, improving research workflows, and enabling AI to address domain-specific challenges across diverse industries.
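The idea of deriving an extraction prompt from an ontology can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and the full text is not available here to check against. The class names, the property list, and the prompt wording are all hypothetical and are not taken from the MaTic Ontology. No LLM is called; the sketch only shows how ontology properties might be turned into prompt text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OntologyProperty:
    """One property of an ontology class (hypothetical example data)."""
    name: str
    description: str
    unit: Optional[str] = None


@dataclass
class OntologyClass:
    """A class from a domain ontology together with its properties."""
    name: str
    properties: List[OntologyProperty]


def build_prompt(cls: OntologyClass, document_text: str) -> str:
    """Turn an ontology class into an extraction prompt for an LLM."""
    lines = [
        f"Extract the following properties of a {cls.name} from the datasheet below.",
        "Return a JSON object with exactly these keys; use null if a value is missing.",
        "",
    ]
    for prop in cls.properties:
        # Mention the unit only when the ontology defines one.
        unit = f" (unit: {prop.unit})" if prop.unit else ""
        lines.append(f"- {prop.name}: {prop.description}{unit}")
    lines += ["", "Datasheet:", document_text]
    return "\n".join(lines)


# Illustrative ontology fragment; an assumption, not the real ontology.
element = OntologyClass(
    name="ChemicalElement",
    properties=[
        OntologyProperty("atomic_number", "number of protons in the nucleus"),
        OntologyProperty("melting_point", "temperature at which the solid melts", "K"),
    ],
)

prompt = build_prompt(element, "Iron | Z = 26 | melts at 1811 K")
print(prompt)
```

Swapping in a different `OntologyClass` changes the prompt without touching `build_prompt`, which is one way the generalization to other technical domains described in the abstract could look in practice.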
elib URL of the record: | https://elib.dlr.de/211814/
---|---
Document type: | Thesis (Master's thesis)
Title: | Automatic Prompt Generator Guided by Ontology for Information Extraction from Semi Structured Documents
Authors: | Singh Gill, Amandeep
Date: | December 2024
Open Access: | No
Status: | Published
Keywords: | LLM, Information Extraction, NLP, PDF
Institution: | University of Passau
Department: | Faculty of Computer Science and Mathematics
HGF - Research field: | no assignment
HGF - Program: | no assignment
HGF - Program topic: | no assignment
DLR - Focus area: | Digitalisation
DLR - Research area: | D - no assignment
DLR - Subarea (project, activity): | D - MaTiC-M
Location: | Jena
Institutes & facilities: | Institut für Datenwissenschaften > Datenmanagement und -aufbereitung
Deposited by: | Köhler, Tobias Andreas
Deposited on: | 14 Jan 2025 10:01
Last modified: | 14 Jan 2025 10:01