
Novel Approaches For Literature-Based Discovery

Bensch, Oliver (2021) Novel Approaches For Literature-Based Discovery. Master's thesis, Maastricht University.


Abstract

Literature-Based Discovery (LBD) is a technique for generating novel hypotheses from scientific corpora. The typical LBD procedure comprises three phases. The first phase identifies significant concepts within a text corpus. The second phase extracts relationships between the detected concepts to construct a concept graph. In the third phase, the unconnected concept pairs are ranked. New hypotheses can be generated based on the rank of the unconnected concept pairs, which could indicate novel discoveries. LBD is typically applied to medical literature, because this domain has annotated corpora and ontologies such as the Unified Medical Language System (UMLS) that facilitate knowledge extraction. This work assesses novel approaches to all three phases of LBD for open-domain scientific literature. For concept detection, several methods, such as Named Entity Recognition models, were trained and evaluated on datasets like STEM-ECR. On this dataset, the model "SciBERT-cased-STEM-ECR" achieved an F1 score of 65.4% and was used to detect concepts in a collection of abstracts retrieved from eLib, the publication server of the German Aerospace Center (DLR). Additionally, SciElectraSmall++ models were trained on a subset of the AMiner corpus, which significantly improved concept detection performance on the STEM-ECR dataset compared to Electra-Small++ models trained on the OpenWebText corpus. For relation extraction, Question Answering models trained on SQuAD 2.0 were first evaluated on the SciECR dataset. The "Electra-large-squad2" model identified 44.7% of relations and concepts with an average word error rate of 23.8% and was used to extract relationships between the detected concepts in the DLR dataset to create a concept graph. To compare the novel approaches with conventional methods, another graph based on tf-idf and co-occurrence was created. The created graphs were evaluated by domain experts (N=11) using an online questionnaire. The experts classified all six randomly sampled concept pairs detected by the question answering approach, such as "virtual environments"-"USED-FOR"-"interactive exploration", as related. In contrast, only 57.14% of the relations in the graph created using tf-idf and co-occurrence, such as "scales"-"related"-"weight", were judged correct by most of the experts. A search engine was developed to perform manual LBD on the created graphs. The concept networks were grouped according to the year of the corresponding abstracts, and it was evaluated whether link prediction methods such as common neighbors could forecast future relationships. Based on the graph up to 2018, common neighbors predicted 65 new relations, of which one was among the 443 relations added between 2018 and 2021.
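
The link prediction step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the concept names, the years, and the helper names `build_adjacency` and `common_neighbor_scores` are hypothetical, and treating edges dated up to and including the cutoff year as the "past" graph is an assumption, since the abstract does not say which side of the split 2018 falls on. The sketch scores each unconnected concept pair by its number of shared neighbors and then checks which predicted pairs appear among the later edges.

```python
from itertools import combinations


def build_adjacency(edges):
    """Return an undirected adjacency map {node: set of neighbours}."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def common_neighbor_scores(adjacency):
    """Score every unconnected node pair by its number of shared neighbours."""
    scores = {}
    for a, b in combinations(sorted(adjacency), 2):
        if b in adjacency[a]:
            continue  # already connected, so not a candidate hypothesis
        shared = adjacency[a] & adjacency[b]
        if shared:
            scores[(a, b)] = len(shared)
    return scores


# Hypothetical toy data: (concept, concept, year of the abstract).
dated_edges = [
    ("lidar", "point cloud", 2016),
    ("point cloud", "segmentation", 2017),
    ("lidar", "terrain model", 2017),
    ("terrain model", "segmentation", 2018),
    ("segmentation", "neural network", 2018),
    ("lidar", "segmentation", 2020),
    ("point cloud", "terrain model", 2021),
]
cutoff = 2018  # assumption: edges dated <= cutoff form the "past" graph

past = [(a, b) for a, b, year in dated_edges if year <= cutoff]
future = {frozenset((a, b)) for a, b, year in dated_edges if year > cutoff}

scores = common_neighbor_scores(build_adjacency(past))
# Rank by descending score, breaking ties alphabetically for determinism.
ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
hits = [pair for pair, _ in ranked if frozenset(pair) in future]

print(f"{len(scores)} predicted pairs, {len(hits)} confirmed by later edges")
```

On a real concept graph, the ratio of confirmed pairs to predicted pairs gives the kind of figure the abstract reports (one of 65 predictions confirmed).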

elib URL of the entry: https://elib.dlr.de/147246/
Document type: Thesis (Master's thesis)
Title: Novel Approaches For Literature-Based Discovery
Authors: Bensch, Oliver (institution or e-mail address: not specified; author ORCID iD: not specified; ORCID Put Code: not specified)
Date: 2021
Refereed publication: No
Open Access: Yes
Number of pages: 77
Status: unpublished
Keywords: Literature-based discovery, Text Mining, Graph Mining, Knowledge Graphs
Institution: Maastricht University
Department: Department of Data Science and Knowledge Engineering
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Space
HGF - Program theme: not assigned
DLR - Research focus: Space
DLR - Research area: R - not assigned
DLR - Subarea (project): R - not assigned
Location: Köln-Porz
Institutes & facilities: Institute for Software Technology
Institute for Software Technology > Intelligent and Distributed Systems
Deposited by: Hecking, Dr. Tobias
Deposited on: 15 Dec 2021 10:51
Last modified: 15 Dec 2021 10:51
