Stäbler, Maximilian and Turnbull, Steffen and Müller, Tobias and Langdon, Christoph and Marx-Gómez, Jorge and Köster, Frank (2025) The Impact of Chunking Strategies on Domain-Specific Information Retrieval in RAG Systems. In: 2025 IEEE International Conference on Omni-Layer Intelligent Systems, COINS 2025. International Conference on Omni-Layer Intelligent Systems (COINS), 2025-08-04 - 2025-08-09, Madison, WI, USA. doi: 10.1109/COINS65080.2025.11125724. ISBN 979-833152037-3. ISSN 2996-5322.
Abstract
We benchmark 90 chunker–model configurations across seven arXiv domains (2,520 retrieval runs) and show that a sentence-based splitter with a 512-token window and 200-token overlap reaches the highest token-level Intersection-over-Union (IoU ≈ 0.099) while remaining compute-efficient. Our study systematically pairs seven open-source embedding models with semantic and fixed-size chunking strategies, measuring their impact on retrieval quality and latency in Retrieval-Augmented Generation (RAG) pipelines. Results reveal that (i) sentence splitting consistently outperforms alternative heuristics, (ii) smaller embeddings deliver more stable cross-domain performance than larger ones, and (iii) finance texts benefit most, whereas astrophysics lags. The accompanying code provides practitioners with empirically grounded guidelines for selecting chunking–embedding combinations that balance accuracy and efficiency.
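To make the two central notions in the abstract concrete, the following is a minimal sketch of a sentence-based chunker with a token window and overlap, together with a token-level IoU score. It is not the paper's implementation. The function names (`split_sentences`, `sentence_chunks`, `token_iou`), the regex sentence splitter, the whitespace tokenization, and the representation of retrieved and relevant text as `(start, end)` token offsets are all assumptions made for illustration.

```python
import re


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_chunks(text, window=512, overlap=200):
    """Pack whole sentences into chunks of at most `window` tokens.

    Each new chunk starts with trailing sentences of the previous chunk,
    up to `overlap` tokens. Tokens are whitespace-separated words here
    (assumption); a real pipeline would count tokens with the embedding
    model's tokenizer. A single sentence longer than `window` is kept whole.
    """
    chunks, current = [], []
    for sent in split_sentences(text):
        n = len(sent.split())
        if current and sum(len(s.split()) for s in current) + n > window:
            chunks.append(" ".join(current))
            # Carry over trailing sentences, but leave room for the new one.
            carried, budget = [], min(overlap, window - n)
            for prev in reversed(current):
                k = len(prev.split())
                if k > budget:
                    break
                carried.insert(0, prev)
                budget -= k
            current = carried
        current.append(sent)
    if current:
        chunks.append(" ".join(current))
    return chunks


def token_iou(retrieved_spans, relevant_spans):
    """Token-level IoU between two collections of (start, end) token offsets.

    Offsets are half-open positions in the same document (assumption).
    """
    retrieved = {i for s, e in retrieved_spans for i in range(s, e)}
    relevant = {i for s, e in relevant_spans for i in range(s, e)}
    union = retrieved | relevant
    return len(retrieved & relevant) / len(union) if union else 0.0
```

As a purely arithmetic illustration of the metric, `token_iou([(0, 512)], [(100, 150)])` evaluates to 50/512, roughly 0.098: a chunk that fully contains a short relevant excerpt still scores low because every extra retrieved token enlarges the union.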
| elib URL of this record: | https://elib.dlr.de/221921/ |
|---|---|
| Document type: | Conference contribution (talk) |
| Title: | The Impact of Chunking Strategies on Domain-Specific Information Retrieval in RAG Systems |
| Authors: | Stäbler, Maximilian; Turnbull, Steffen; Müller, Tobias; Langdon, Christoph; Marx-Gómez, Jorge; Köster, Frank |
| Date: | October 2025 |
| Published in: | 2025 IEEE International Conference on Omni-Layer Intelligent Systems, COINS 2025 |
| Peer-reviewed publication: | Yes |
| Open Access: | Yes |
| Gold Open Access: | No |
| In SCOPUS: | Yes |
| In ISI Web of Science: | No |
| DOI: | 10.1109/COINS65080.2025.11125724 |
| ISSN: | 2996-5322 |
| ISBN: | 979-833152037-3 |
| Status: | published |
| Keywords: | RAG, Information Retrieval, Chunking |
| Event title: | International Conference on Omni-Layer Intelligent Systems (COINS) |
| Event location: | Madison, WI, USA |
| Event type: | international conference |
| Event start date: | 4 August 2025 |
| Event end date: | 9 August 2025 |
| HGF - Research field: | Aeronautics, Space and Transport |
| HGF - Program: | Transport |
| HGF - Program theme: | Transport System |
| DLR - Research area: | Transport |
| DLR - Program: | V VS - Verkehrssystem |
| DLR - Research theme (project): | V - DiVe - Digital organisiertes Verkehrssystem |
| Location: | Ulm |
| Institutes and institutions: | Institute for AI Safety and Security |
| Deposited by: | Stäbler, Maximilian |
| Deposited on: | 12 Jan 2026 08:51 |
| Last modified: | 13 Jan 2026 10:05 |