Sentence Retrieval using Stemming and Lemmatization with Different Length of the Queries

Boban, Ivan and Doko, Alen and Gotovac, Sven (2020) Sentence Retrieval using Stemming and Lemmatization with Different Length of the Queries. Advances in Science, Technology and Engineering Systems, 5 (3), pp. 349-354. ASTES Publishers. doi: 10.25046/aj050345. ISSN 2415-6698.

PDF - Published version

Official URL: https://astesj.com/v05/i03/p45/


This paper focuses on sentence retrieval, which is similar to document retrieval but uses a smaller unit of retrieval. Data pre-processing is generally considered useful in document retrieval; for sentence retrieval the situation is less clear. We use the TF-ISF (term frequency – inverse sentence frequency) method for sentence retrieval. As pre-processing steps, we use stop word removal and language modeling techniques: stemming and lemmatization. We also experiment with different query lengths. The results show that data pre-processing with stemming and lemmatization is useful for sentence retrieval, as it is for document retrieval. Lemmatization produces better results with longer queries, while stemming shows worse results with longer queries. The experiments use data from the Text Retrieval Conference (TREC) novelty tracks.
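The pipeline named in the abstract (stop word removal, term normalization, TF-ISF ranking) can be illustrated with a minimal sketch. This record does not state the exact ranking formula, so the code below assumes one commonly cited TF-ISF formulation, a sum over query terms of log(tf_query + 1) · log(tf_sentence + 1) · log((n + 1) / (0.5 + sf)). The stop word list, the `toy_stem` suffix stripper, and all function names are hypothetical stand-ins for illustration; they are not the authors' resources and not a real stemmer or lemmatizer.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption, not the list used in the paper).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "are", "to", "with", "on"}


def toy_stem(token):
    """Crude suffix stripper standing in for a real stemmer (hypothetical helper)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, normalize=toy_stem):
    """Lowercase, tokenize, drop stop words, then normalize each remaining token."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [normalize(t) for t in tokens if t not in STOP_WORDS]


def tf_isf_scores(query, sentences, normalize=toy_stem):
    """Score every sentence against the query with the assumed TF-ISF sum."""
    q_tf = Counter(preprocess(query, normalize))
    s_tfs = [Counter(preprocess(s, normalize)) for s in sentences]
    n = len(sentences)

    # Sentence frequency: in how many sentences each term occurs.
    sf = Counter()
    for s_tf in s_tfs:
        sf.update(s_tf.keys())

    scores = []
    for s_tf in s_tfs:
        score = 0.0
        for term, q_count in q_tf.items():
            score += (
                math.log(q_count + 1)
                * math.log(s_tf[term] + 1)  # 0 when the term is absent
                * math.log((n + 1) / (0.5 + sf[term]))
            )
        scores.append(score)
    return scores


def rank_sentences(query, sentences, normalize=toy_stem):
    """Return (sentence, score) pairs sorted from best to worst match."""
    scores = tf_isf_scores(query, sentences, normalize)
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [(sentences[i], scores[i]) for i in order]
```

Passing `normalize=lambda t: t` disables normalization, which makes it easy to compare raw and normalized matching on the same query. In a realistic setup, `normalize` would be replaced by a proper stemmer or lemmatizer from an NLP library.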

Item URL in elib: https://elib.dlr.de/139260/
Document Type: Article
Title: Sentence Retrieval using Stemming and Lemmatization with Different Length of the Queries
Authors (name; institution or email; ORCID iD):
Boban, Ivan; UNSPECIFIED; https://orcid.org/0000-0002-1732-6336
Doko, Alen; UNSPECIFIED; https://orcid.org/0000-0001-7401-3558
Journal or Publication Title: Advances in Science, Technology and Engineering Systems
Refereed publication: Yes
Open Access: Yes
Gold Open Access: No
In ISI Web of Science: No
Page Range: pp. 349-354
Publisher: ASTES Publishers
Keywords: Sentence retrieval, TF-ISF, Data pre-processing, Stemming, Lemmatization
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Aeronautics
HGF - Program Themes: fixed-wing aircraft
DLR - Research area: Aeronautics
DLR - Program: L AR - Aircraft Research
DLR - Research theme (Project): L - Simulation and Validation (old)
Location: Bremen
Institutes and Institutions: Institute for Software Technology; Institute for Software Technology > Intelligent and Distributed Systems
Deposited By: Doko, Alen
Deposited On: 14 Dec 2020 09:51
Last Modified: 16 Jun 2021 13:45
