Boban, Ivan and Doko, Alen and Gotovac, Sven (2020) Sentence Retrieval using Stemming and Lemmatization with Different Length of the Queries. Advances in Science, Technology and Engineering Systems, 5 (3), pp. 349-354. ASTES Publishers. doi: 10.25046/aj050345. ISSN 2415-6698.
PDF - Publisher's version (published version), 352 kB
Official URL: https://astesj.com/v05/i03/p45/
Abstract
In this paper we focus on sentence retrieval, which is similar to document retrieval but uses a smaller unit of retrieval. Data pre-processing is generally considered useful in document retrieval. For sentence retrieval the situation is less clear. In this paper we use the TF-ISF (term frequency – inverse sentence frequency) method for sentence retrieval. As pre-processing steps, we use stop word removal and language modeling techniques: stemming and lemmatization. We also experiment with different query lengths. The results show that data pre-processing with stemming and lemmatization is useful for sentence retrieval, as it is for document retrieval. Lemmatization produces better results with longer queries, while stemming shows worse results with longer queries. For the experiment we used data from the Text Retrieval Conference (TREC) novelty tracks.
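To make the method named in the abstract concrete, the following is a minimal Python sketch of TF-ISF ranking with a pluggable token normalizer. It is not the authors' implementation: this record does not state the exact scoring formula, so the sketch assumes one commonly used TF-ISF formulation, score(s, q) = Σ_t log(tf_{t,q} + 1) · log(tf_{t,s} + 1) · log((n + 1) / (0.5 + sf_t)), where n is the number of sentences and sf_t the number of sentences containing term t. The names `STOP_WORDS`, `tokenize`, `tf_isf_rank`, and `crude_stem` are hypothetical, and `crude_stem` is only a toy stand-in for a real stemmer or lemmatizer.

```python
import math
from collections import Counter

# Hypothetical toy stop-word list; a real experiment would use a full list.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "are", "on", "and", "to"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stop words."""
    tokens = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def crude_stem(token):
    """Toy suffix stripper standing in for a real stemmer (hypothetical)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def tf_isf_rank(query, sentences, normalize=lambda t: t):
    """Rank sentences by the assumed TF-ISF score, highest first."""
    docs = [Counter(normalize(t) for t in tokenize(s)) for s in sentences]
    q_tf = Counter(normalize(t) for t in tokenize(query))
    n = len(sentences)
    scores = []
    for tf_s in docs:
        score = 0.0
        for term, tf_q in q_tf.items():
            # Sentence frequency: number of sentences containing the term.
            sf = sum(1 for d in docs if term in d)
            score += (math.log(tf_q + 1)
                      * math.log(tf_s[term] + 1)
                      * math.log((n + 1) / (0.5 + sf)))
        scores.append(score)
    return sorted(zip(scores, sentences), key=lambda p: p[0], reverse=True)


sentences = [
    "Stemming reduces words to stems.",
    "The weather is nice today.",
    "Lemmatization maps words to lemmas.",
]
# Without normalization the query terms match no sentence token exactly;
# with the toy stemmer, "stems" and "words" are reduced to the query terms.
plain = tf_isf_rank("stem of a word", sentences)
stemmed = tf_isf_rank("stem of a word", sentences, normalize=crude_stem)
```

Passing a different `normalize` function (for example a lemmatizer) is the only change needed to compare pre-processing variants under the same scoring function.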
elib URL of this record: | https://elib.dlr.de/139260/
---|---
Document type: | Journal article
Title: | Sentence Retrieval using Stemming and Lemmatization with Different Length of the Queries
Authors: | Boban, Ivan; Doko, Alen; Gotovac, Sven
Date: | 2020
Published in: | Advances in Science, Technology and Engineering Systems
Peer-reviewed publication: | Yes
Open Access: | Yes
Gold Open Access: | No
In SCOPUS: | Yes
In ISI Web of Science: | No
Volume: | 5
DOI: | 10.25046/aj050345
Page range: | pp. 349-354
Publisher: | ASTES Publishers
ISSN: | 2415-6698
Status: | Published
Keywords: | Sentence retrieval, TF-ISF, Data pre-processing, Stemming, Lemmatization
HGF - Research field: | Aeronautics, Space and Transport
HGF - Program: | Aeronautics
HGF - Program theme: | Aircraft
DLR - Research area: | Aeronautics
DLR - Program: | L AR - Aircraft Research
DLR - Research theme (project): | L - Simulation and Validation (old)
Location: | Bremen
Institutes & Institutions: | Institute for Software Technology; Institute for Software Technology > Intelligent and Distributed Systems
Deposited by: | Doko, Alen
Deposited on: | 14 Dec 2020 09:51
Last modified: | 16 Jun 2021 13:45