Enhancing Term-Based Document Retrieval by Word Embedding and Transformer Models

Farzana, Sheikh Mastura (2021) Enhancing Term-Based Document Retrieval by Word Embedding and Transformer Models. Master's, Rheinische Friedrich-Wilhelms-Universität Bonn.

Abstract

Document retrieval is the process of obtaining the documents most relevant to a query from a large corpus. Traditional retrieval methods assess the relevance of a document to a query by the presence or absence of the query terms in that document. However, a document can be contextually relevant to a query without containing the exact query words, and a document that does contain the query terms may still be about a completely different topic. This creates the need for context-aware document retrieval systems. In this thesis, we focus on enhancing document retrieval methods so that they capture the contextual relevance of a document to a given query. The primary components we use are word embedding models and transformer-based pre-trained language models. We propose three different approaches for enhancing document retrieval methods, evaluate our models on three different datasets, and compare the results with classical document retrieval models.
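
The term-mismatch problem described in the abstract can be illustrated with a toy example. The sketch below is not taken from the thesis: the `term_overlap` scorer, the query, and the two sample documents are hypothetical and chosen only to show how purely term-based matching can miss a relevant document and reward an off-topic one.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def term_overlap(query: str, document: str) -> float:
    """Fraction of distinct query terms that appear verbatim in the document.

    Hypothetical stand-in for a term-based relevance score; it is not the
    scoring function used in the thesis.
    """
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokenize(document)) / len(query_terms)


# Illustrative data (assumed, not from the thesis datasets).
query = "car repair"
docs = {
    # On topic, but shares no word with the query.
    "relevant_no_match": "How to fix a broken automobile engine at home",
    # Contains both query words, but is about trains.
    "irrelevant_match": "The repair of the railway car delayed the train",
}

scores = {name: term_overlap(query, text) for name, text in docs.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Under these assumptions the on-topic document scores lower than the off-topic one, which is the kind of gap that embedding-based or transformer-based relevance signals are meant to close.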

Item URL in elib:https://elib.dlr.de/147631/
Document Type:Thesis (Master's)
Title:Enhancing Term-Based Document Retrieval by Word Embedding and Transformer Models
Authors:Farzana, Sheikh Mastura
Institution or Email of Authors:UNSPECIFIED
Author's ORCID iD:UNSPECIFIED
Date:25 November 2021
Refereed publication:No
Open Access:Yes
Gold Open Access:No
In SCOPUS:No
In ISI Web of Science:No
Status:Published
Keywords:Text Mining, Information Retrieval, Word Embeddings, Transformer-based Language Models
Institution:Rheinische Friedrich-Wilhelms-Universität Bonn
Department:Institut für Informatik
HGF - Research field:other
HGF - Program:other
HGF - Program Themes:other
DLR - Research area:no assignment
DLR - Program:no assignment
DLR - Research theme (Project):no assignment
Location: Köln-Porz
Institutes and Institutions:Institute for Software Technology > Intelligent and Distributed Systems; Institute for Software Technology
Deposited By: Hamm, Andreas
Deposited On:16 Dec 2021 07:59
Last Modified:29 Sep 2022 11:20
