
Novel Approaches For Literature-Based Discovery

Bensch, Oliver (2021) Novel Approaches For Literature-Based Discovery. Master's, Maastricht University.

PDF (3MB)

Abstract

Literature-Based Discovery (LBD) is a technique for generating novel hypotheses from scientific corpora. The typical LBD procedure comprises three phases. The first phase identifies significant concepts within a text corpus. The second phase extracts relationships between the detected concepts to construct a concept graph. In the third phase, the unconnected concept pairs are ranked. New hypotheses can then be generated based on the rank of the unconnected pairs, which may indicate novel discoveries. However, LBD is typically applied to medical literature, as this domain has annotated corpora and ontologies such as the Unified Medical Language System (UMLS) that facilitate knowledge extraction. This work assesses novel approaches to all three phases of LBD for open-domain scientific literature. For concept detection, several methods, such as Named Entity Recognition models, were trained and evaluated on datasets like STEM-ECR. On this dataset, the model "SciBERT-cased-STEM-ECR" achieved an F1 score of 65.4% and was used to detect concepts in a collection of abstracts retrieved from the German Aerospace Center’s (DLR) publication server eLib. Additionally, SciElectraSmall++ models were trained on a subset of the AMiner corpus, which significantly improved the performance for concept detection on the STEM-ECR dataset compared to Electra-Small++ models trained on the OpenWebText corpus. For relation extraction, Question Answering models trained on SQuAD 2.0 were first evaluated on the SciECR dataset. The "Electra-large-squad2" model identified 44.7% of relations and concepts with an average word error rate of 23.8% and was used to extract relationships between the detected concepts in the DLR dataset to create a concept graph. To compare the novel approaches to conventional methods, another graph based on tf.idf and co-occurrence was created. The created graphs were evaluated by domain experts (N=11) using an online questionnaire. 
The experts classified all six randomly sampled concept pairs detected by the question answering approach, such as "virtual environments"-"USED-FOR"-"interactive exploration", as related. In contrast, only 57.14% of the relations in the graph created using tf.idf and co-occurrence, such as "scales"-"related"-"weight", were judged correct by most of the experts. A search engine was developed to perform manual LBD on the created graphs. The concept networks were grouped according to the year of the corresponding abstracts, and it was evaluated whether link prediction methods such as common neighbors could forecast future relationships. Based on the graph up to 2018, common neighbors predicted 65 new relations, of which one was among the 443 relations added between 2018 and 2021.
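
The temporal link prediction evaluation described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the toy concepts "a" to "f", and the rule that any unconnected pair with at least one shared neighbor counts as a prediction are all assumptions made for illustration.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbor_scores(adj):
    """Score every unconnected pair by its number of shared neighbors.

    Pairs without any shared neighbor are omitted (an assumed cut-off).
    """
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already connected, nothing to predict
        shared = len(adj[u] & adj[v])
        if shared > 0:
            scores[(u, v)] = shared
    return scores


def matched_predictions(predicted_pairs, future_edges):
    """Return the predicted pairs that appear among the later-added edges."""
    future = {tuple(sorted(edge)) for edge in future_edges}
    return [pair for pair in predicted_pairs if pair in future]


# Hypothetical toy graphs, not data from the thesis.
edges_up_to_cutoff = [("a", "b"), ("a", "c"), ("d", "b"), ("d", "c"), ("e", "f")]
edges_added_later = [("d", "a")]

adjacency = build_adjacency(edges_up_to_cutoff)
scores = common_neighbor_scores(adjacency)
ranked = sorted(scores, key=lambda pair: (-scores[pair], pair))
hits = matched_predictions(ranked, edges_added_later)

print(ranked)
print(hits)
```

In this toy setup the graph built from the earlier edges yields the ranked predictions, and the hits are the predictions that reappear among the edges added later, mirroring the comparison of 65 predicted relations against 443 newly added ones.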

Item URL in elib:https://elib.dlr.de/147246/
Document Type:Thesis (Master's)
Title:Novel Approaches For Literature-Based Discovery
Authors:
Authors | Institution or Email of Authors | Author's ORCID iD
Bensch, Oliver | UNSPECIFIED | UNSPECIFIED
Date:2021
Refereed publication:No
Open Access:Yes
Gold Open Access:No
In SCOPUS:No
In ISI Web of Science:No
Number of Pages:77
Status:Unpublished
Keywords:Literature-based discovery, Text Mining, Graph Mining, Knowledge Graphs
Institution:Maastricht University
Department:Department of Data Science and Knowledge Engineering
HGF - Research field:Aeronautics, Space and Transport
HGF - Program:Space
HGF - Program Themes:other
DLR - Research area:Space
DLR - Program:R - no assignment
DLR - Research theme (Project):R - no assignment
Location: Köln-Porz
Institutes and Institutions:Institute for Software Technology
Institute for Software Technology > Intelligent and Distributed Systems
Deposited By: Hecking, Tobias
Deposited On:15 Dec 2021 10:51
Last Modified:15 Dec 2021 10:51
