Farzana, Sheikh Mastura (2021) Enhancing Term-Based Document Retrieval by Word Embedding and Transformer Models. Master's thesis, Rheinische Friedrich-Wilhelms-Universität Bonn.
Abstract
Document retrieval is the process of obtaining the most relevant documents for a given query from a large corpus. Traditional document retrieval methods rely on the presence or absence of the query terms in a document to assess its relevance to the query. However, a document can be contextually relevant to a query without containing the exact query words, or it might contain a query term and still be about a completely different topic. This creates the need for context-aware document retrieval systems. In this thesis, we focus on enhancing document retrieval methods so that they capture the contextual relevance of a document to a given query. The primary components used to achieve this goal are word embedding models and pre-trained transformer-based language models. We propose three different approaches for enhancing document retrieval methods. We evaluate our models on three different datasets and compare the results with classical document retrieval models.
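The limitation of purely term-based matching described in the abstract can be illustrated with a minimal sketch. This is not the thesis's method or any of its three approaches; the scoring function, the query, and the two example documents are hypothetical and chosen only to show the two failure cases (a relevant document without the query words, and an unrelated document that contains one).

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_overlap_score(query, document):
    """Fraction of distinct query terms that occur verbatim in the document.

    Hypothetical stand-in for a term-based relevance score; real systems
    typically use weighted schemes rather than plain overlap.
    """
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    doc_terms = set(tokenize(document))
    return len(query_terms & doc_terms) / len(query_terms)


# Illustrative query and documents (assumptions, not data from the thesis).
query = "python snake habitat"
documents = {
    # Contextually relevant, but shares no exact term with the query.
    "on_topic_no_shared_terms": "Where large constrictor reptiles live in the wild",
    # Contains the query term "python", but is about software.
    "off_topic_shared_term": "Install Python packages in a virtual environment",
}

scores = {
    name: term_overlap_score(query, text) for name, text in documents.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Under these assumptions the unrelated document outscores the relevant one, which is the kind of mismatch that context-aware retrieval is meant to address.
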
elib URL of the entry: | https://elib.dlr.de/147631/
---|---
Document type: | University thesis (Master's thesis)
Title: | Enhancing Term-Based Document Retrieval by Word Embedding and Transformer Models
Authors: | Farzana, Sheikh Mastura
Date: | 25 November 2021
Refereed publication: | No
Open Access: | Yes
Status: | Published
Keywords: | Text Mining, Information Retrieval, Word Embeddings, Transformer-based Language Models
Institution: | Rheinische Friedrich-Wilhelms-Universität Bonn
Department: | Institut für Informatik
HGF - Research field: | not assigned
HGF - Program: | not assigned
HGF - Program topic: | not assigned
DLR - Focus area: | not assigned
DLR - Research area: | not assigned
DLR - Sub-area (project, undertaking): | not assigned
Location: | Köln-Porz
Institutes & facilities: | Institut für Softwaretechnologie > Intelligente und verteilte Systeme; Institut für Softwaretechnologie
Deposited by: | Hamm, Dr. Andreas
Deposited on: | 16 Dec 2021 07:59
Last modified: | 29 Sep 2022 11:20