Piazza, Daniele (2026) A Local Access-Controlled Multi-Source RAG Framework For Privacy-Sensitive Enterprise Environments. Master's thesis, Università Degli Studi di Milano.
PDF (1MB), accessible only within DLR
Abstract
Accessing internal knowledge in large organisations is often slow and fragmented, particularly when information is distributed across multiple heterogeneous platforms. In security-critical environments such as space agencies, the challenge is further compounded by strict access control requirements and data sovereignty constraints that prevent routing queries through external infrastructure, making these not optional design goals but hard operational requirements. This thesis presents an access-controlled multi-source Retrieval-Augmented Generation (RAG) system developed for the German Aerospace Center (DLR) to improve access to enterprise knowledge while strictly enforcing source-level access control. The central architectural contribution is an on-demand retrieval strategy that eliminates the need for a persistent vector index. Rather than crawling and pre-indexing the knowledge base with a privileged service account, the system performs live searches at query time using the authenticated session of the requesting user, inherently guaranteeing Access Control List (ACL) compliance, absolute data freshness and minimal credential exposure. A unified reader interface abstracts the heterogeneity of the two connected enterprise knowledge bases, Atlassian Confluence and Microsoft SharePoint, into a single retrieval pipeline, while remaining extensible to additional platforms without requiring modifications to the core RAG logic. The system operates fully on-premise, using locally hosted Large Language Models (LLMs) managed with Ollama and accessed through Open WebUI, adopting a privacy-by-design approach in which no internal data transits external cloud services. Deployment is container-based and automated through a GitLab Continuous Integration (CI) pipeline.
To validate and iteratively refine the system, a curated evaluation dataset was constructed from internal agency documentation through a combination of synthetic generation, automated filtering and human verification. Both retrieval effectiveness and generation quality were assessed using embedding-based and LLM-as-judge metrics. In addition to the formal evaluation, practical usage within the agency confirmed the viability of a fully local, access-controlled RAG system in a large enterprise environment.
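The unified reader interface and the on-demand retrieval strategy described in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the names `Document`, `UserSession`, `Reader`, `InMemoryReader` and `retrieve` are hypothetical, and the in-memory reader only simulates a platform that enforces its own ACLs. In a real deployment the reader would presumably forward the user's authenticated session to the search API of Confluence or SharePoint, so that the platform itself decides what the user may see.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Document:
    source: str  # name of the platform the hit came from
    title: str
    text: str


class UserSession:
    """Hypothetical stand-in for the authenticated session of one user."""

    def __init__(self, user: str, readable: set[str]):
        self.user = user
        self.readable = readable  # titles this user is allowed to read


class Reader(Protocol):
    """Common interface: every platform exposes one live search call."""

    def search(self, query: str, session: UserSession) -> list[Document]: ...


class InMemoryReader:
    """Toy reader that simulates a platform enforcing its own ACLs."""

    def __init__(self, source: str, pages: dict[str, str]):
        self.source = source
        self.pages = pages  # title -> page text

    def search(self, query: str, session: UserSession) -> list[Document]:
        terms = query.lower().split()
        return [
            Document(self.source, title, text)
            for title, text in self.pages.items()
            # Simulated ACL check; a real platform applies it server-side.
            if title in session.readable
            and any(term in text.lower() for term in terms)
        ]


def retrieve(query: str, session: UserSession, readers: list[Reader]) -> list[Document]:
    """Query every reader at request time; nothing is pre-indexed."""
    hits: list[Document] = []
    for reader in readers:
        hits.extend(reader.search(query, session))
    return hits
```

Under these assumptions, adding a further platform means writing one more class with a `search` method; `retrieve` and everything downstream of it stay unchanged, and no document the user cannot read ever reaches the generation step.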
| elib URL of this record: | https://elib.dlr.de/224327/ |
|---|---|
| Document type: | Thesis (Master's thesis) |
| Title: | A Local Access-Controlled Multi-Source RAG Framework For Privacy-Sensitive Enterprise Environments |
| Authors: | Piazza, Daniele |
| DLR supervisor: | |
| Date: | April 2026 |
| Open Access: | No |
| Number of pages: | 91 |
| Status: | published |
| Keywords: | Retrieval-Augmented Generation, RAG, Generative AI, Natural Language Processing, Large Language Models, LLMs, Privacy-by-Design, On-Premise Deployment, Access Control, Enterprise Knowledge Management, Enterprise Search |
| Institution: | Università Degli Studi di Milano |
| HGF - Research field: | Aeronautics, Space and Transport |
| HGF - Program: | Space |
| HGF - Program theme: | Space System Technology |
| DLR - Research area: | Space |
| DLR - Program: | R SY - Space System Technology |
| DLR - Research theme (project): | R - Digital Transformation in Space [SY] |
| Location: | Bremen |
| Institutes and institutions: | Institute of Space Systems > System Development and Project Office |
| Deposited by: | Krummen, Sven |
| Deposited on: | 05 May 2026 11:47 |
| Last modified: | 05 May 2026 11:47 |