
Automatic Prompt Generator Guided by Ontology for Information Extraction from Semi Structured Documents

Singh Gill, Amandeep (2024) Automatic Prompt Generator Guided by Ontology for Information Extraction from Semi Structured Documents. Master's thesis, University of Passau.

This archive cannot provide the full text.

Abstract

The rapid advancement of artificial intelligence (AI) has revolutionized the way complex information is processed, offering new opportunities for extracting and interpreting data from technical documents. This thesis presents a novel method for information extraction from technical datasheets, with a focus on the chemical elements domain, leveraging an ontology-guided prompt generation system. Technical datasheets, often presented in semi-structured formats like PDFs, pose challenges for traditional machine learning and natural language processing (NLP) approaches due to their diverse schemas and the need to capture nuanced semantic context. Existing algorithms struggle to effectively extract meaningful information from these complex and unstandardized structures. This research introduces a framework that combines domain-specific ontologies with Large Language Models (LLMs) to address these challenges. By designing ontology-guided prompts based on the MaTic Ontology, this approach enables LLMs to extract key information from technical datasheets, accounting for their semantic complexity and structural variability. The framework is tested on various datasheet formats, including text and tables, demonstrating its ability to handle the intricacies of technical language and document layouts. The study contributes to the fields of NLP and information retrieval by presenting a scalable and adaptable methodology for automating the extraction of domain-specific information. While the primary focus is on chemical elements, the approach is generalizable to other technical domains by incorporating different ontologies. This adaptability enhances its potential for widespread application in automating knowledge extraction, improving research workflows, and enabling AI to address domain-specific challenges across diverse industries.
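
As an illustration only, the idea of generating an extraction prompt from ontology terms might be sketched as follows. This is not the thesis's implementation: the class name `ChemicalElement`, the three properties, the prompt wording, and the helper functions are all hypothetical and are not taken from the MaTic Ontology, and no LLM is actually called.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OntologyProperty:
    """One property of an ontology class (hypothetical, simplified model)."""
    name: str
    description: str
    unit: Optional[str] = None


# Hypothetical ontology fragment; names are invented for this sketch.
ELEMENT_PROPERTIES = [
    OntologyProperty("atomicNumber", "Number of protons in the nucleus"),
    OntologyProperty("meltingPoint", "Temperature at which the solid melts", unit="K"),
    OntologyProperty("density", "Mass per unit volume at room temperature", unit="g/cm^3"),
]


def build_prompt(class_name, properties, datasheet_text):
    """Turn the ontology properties into an extraction prompt for an LLM."""
    lines = [
        f"Extract the properties of the '{class_name}' described in the datasheet below.",
        "Return a JSON object with exactly these keys; use null if a value is absent:",
    ]
    for prop in properties:
        unit = f" (unit: {prop.unit})" if prop.unit else ""
        lines.append(f"- {prop.name}: {prop.description}{unit}")
    lines += ["", "Datasheet:", datasheet_text]
    return "\n".join(lines)


def parse_response(raw, properties):
    """Keep only the keys the ontology defines; missing keys become None."""
    data = json.loads(raw)
    return {prop.name: data.get(prop.name) for prop in properties}
```

Because the prompt is derived from the property list, swapping in a different ontology fragment changes the extraction target without touching the prompt-building code, which is one way to read the abstract's claim of generalizability to other domains.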

elib URL of this record: https://elib.dlr.de/211814/
Document type: Thesis (Master's thesis)
Title: Automatic Prompt Generator Guided by Ontology for Information Extraction from Semi Structured Documents
Authors:
Authors | Institution or e-mail address | Author ORCID iD | ORCID Put Code
Singh Gill, Amandeep | NOT SPECIFIED | NOT SPECIFIED | NOT SPECIFIED
Date: December 2024
Open Access: No
Status: Published
Keywords: LLM, Information Extraction, NLP, PDF
Institution: University of Passau
Department: Faculty of Computer Science and Mathematics
HGF - Research field: no assignment
HGF - Program: no assignment
HGF - Program topic: no assignment
DLR - Research area: Digitalisation
DLR - Program: D - no assignment
DLR - Research theme (project): D - MaTiC-M
Location: Jena
Institutes and institutions: Institut für Datenwissenschaften > Datenmanagement und -aufbereitung
Deposited by: Köhler, Tobias Andreas
Deposited on: 14 Jan 2025 10:01
Last modified: 14 Jan 2025 10:01
