Farzana, Sheikh Mastura (2021) Enhancing Term-Based Document Retrieval by Word Embedding and Transformer Models. Master's, Rheinische Friedrich-Wilhelms-Universität Bonn.
PDF (1MB)
Abstract
Document retrieval refers to the process of obtaining the most relevant documents for a given query from a large corpus of documents. Traditional document retrieval methods focus on the presence or absence of the query terms in a particular document to assess the relevance of that document to the query. However, a document can be contextually relevant to a query without containing the exact query words, or it might contain the query terms and still be about a completely different topic. This creates the need for context-aware document retrieval systems. In this thesis, we focus on enhancing document retrieval methods in order to capture the contextual relevance of a document to a given query. The primary components used to achieve our goals are word embedding models and pre-trained transformer-based language models. We propose three different approaches for enhancing document retrieval methods. We use three different datasets to evaluate our models and compare the results with those of classical document retrieval models.
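To make the contrast in the abstract concrete, here is a minimal sketch under stated assumptions: the two-document corpus, the query, and the three-dimensional word vectors are all hypothetical and hand-picked (real embedding models learn vectors with hundreds of dimensions from data). It compares binary term matching with averaged word vectors scored by cosine similarity. Averaged word vectors are a common baseline and are not necessarily any of the three approaches proposed in the thesis.

```python
import math

# Hypothetical toy corpus: "d2" is about cars but never uses the query words.
DOCS = {
    "d1": "the bank raised interest rates",
    "d2": "the car needs new tires",
}

# Hypothetical hand-picked 3-dimensional word vectors (assumption, not a trained model).
EMB = {
    "automobile": [0.9, 0.1, 0.0],
    "car": [0.85, 0.15, 0.05],
    "tires": [0.7, 0.2, 0.1],
    "repair": [0.6, 0.1, 0.3],
    "bank": [0.0, 0.9, 0.3],
    "interest": [0.05, 0.8, 0.4],
    "rates": [0.1, 0.7, 0.5],
}

QUERY = "automobile repair"


def term_score(query: str, doc: str) -> int:
    """Binary term matching: count distinct query terms that literally occur in the document."""
    doc_terms = set(doc.split())
    return sum(1 for t in set(query.split()) if t in doc_terms)


def mean_vector(text: str):
    """Average the vectors of words found in EMB; words without a vector (e.g. 'the') are skipped."""
    vecs = [EMB[w] for w in text.split() if w in EMB]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embedding_score(query: str, doc: str) -> float:
    """Cosine similarity of the averaged query vector and the averaged document vector."""
    q, d = mean_vector(query), mean_vector(doc)
    if q is None or d is None:
        return 0.0
    return cosine(q, d)


if __name__ == "__main__":
    for name, text in DOCS.items():
        print(name, term_score(QUERY, text), round(embedding_score(QUERY, text), 3))
```

Both documents score zero under term matching because neither contains "automobile" or "repair", while the averaged-vector score ranks the car document above the finance document. Averaging discards word order and context, which is one motivation for moving to transformer-based models.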
Item URL in elib: https://elib.dlr.de/147631/
Document Type: Thesis (Master's)
Title: Enhancing Term-Based Document Retrieval by Word Embedding and Transformer Models
Authors: Farzana, Sheikh Mastura
Date: 25 November 2021
Refereed publication: No
Open Access: Yes
Gold Open Access: No
In SCOPUS: No
In ISI Web of Science: No
Status: Published
Keywords: Text Mining, Information Retrieval, Word Embeddings, Transformer-based Language Models
Institution: Rheinische Friedrich-Wilhelms-Universität Bonn
Department: Institut für Informatik
HGF - Research field: other
HGF - Program: other
HGF - Program Themes: other
DLR - Research area: no assignment
DLR - Program: no assignment
DLR - Research theme (Project): no assignment
Location: Köln-Porz
Institutes and Institutions: Institute for Software Technology > Intelligent and Distributed Systems; Institute for Software Technology
Deposited By: Hamm, Andreas
Deposited On: 16 Dec 2021 07:59
Last Modified: 29 Sep 2022 11:20