Gharib Bafghi, Zeinab and Reinartz, Peter (2025) VLM-Based Building Change Detection with CNN-Transformer. In: ISPRS, Archives, XLVIII (4/W16), pp. 39-44. ISPRS. 9th International Conference on Smart Data and Smart Cities (SDSC), 2025-09-02 - 2025-09-05, Kashiwa, Japan. doi: 10.5194/isprs-archives-XLVIII-4-W16-2025-39-2025.
Official URL: https://isprs-archives.copernicus.org/articles/XLVIII-4-W16-2025/39/2025/isprs-archives-XLVIII-4-W16-2025-39-2025.pdf
Abstract
Accurate building change detection in high-resolution satellite imagery is critical for urban planning, disaster response, and smart city applications. Existing methods often rely on large labeled datasets or handcrafted features, limiting scalability across diverse geographic regions. In this paper, we propose a hybrid framework that integrates a pretrained Vision-Language Model (Grounding DINO) with a lightweight CNN-Transformer architecture to perform text-guided building change detection. Without any fine-tuning, Grounding DINO generates semantic building masks from bi-temporal image pairs using the text prompt “building,” which are used to amplify structural features in a ResNet18 backbone. A custom Transformer encoder with dual spatial and channel attention refines these features to capture both local details and global context. On the LEVIR-CD dataset, our framework improves Recall by +3.98%, F1-Score by +3.01%, and Intersection over Union (IoU) by +4.70% compared to a CNN-Transformer baseline. These results highlight the potential of vision-language models to enhance remote sensing workflows without extensive domain-specific fine-tuning.
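The abstract reports gains in Recall, F1-Score, and IoU. As a minimal sketch of how such scores are commonly computed for binary change maps, the Python snippet below counts pixel-wise true positives, false positives, and false negatives for the "change" class. The function name `change_metrics`, the 0/1 list-of-lists mask format, and the toy masks are assumptions made for illustration; the record does not specify the authors' evaluation code.

```python
def change_metrics(pred, truth):
    """Pixel-wise precision, recall, F1 and IoU for the 'change' class.

    pred and truth are equally sized 2D lists of 0/1 values
    (1 = changed building pixel). Hypothetical helper, not the authors' code.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # change predicted and present
            elif p and not t:
                fp += 1          # change predicted but absent
            elif t and not p:
                fn += 1          # change present but missed

    # Guard against empty denominators (e.g. no change pixels at all).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


# Toy 2x3 masks, invented purely for illustration.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
metrics = change_metrics(pred, truth)
print(metrics)
```

With these toy masks there are two true positives, one false positive, and one false negative, so IoU is 2/4 and Recall is 2/3. Missed changes lower Recall directly and also enter the IoU denominator, which is why the two scores often move together.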
| elib URL of this record: | https://elib.dlr.de/222233/ |
|---|---|
| Document type: | Conference contribution (Poster) |
| Title: | VLM-Based Building Change Detection with CNN-Transformer |
| Authors: | Gharib Bafghi, Zeinab; Reinartz, Peter |
| Date: | 2025 |
| Published in: | ISPRS, Archives |
| Refereed publication: | Yes |
| Open Access: | Yes |
| Gold Open Access: | No |
| In SCOPUS: | No |
| In ISI Web of Science: | No |
| Volume: | XLVIII |
| DOI: | 10.5194/isprs-archives-XLVIII-4-W16-2025-39-2025 |
| Page range: | pp. 39-44 |
| Publisher: | ISPRS |
| Series name: | Volume XLVIII-4/W16-2025 |
| Status: | Published |
| Keywords: | Building Change Detection, Vision-Language Model, Satellite Imagery, Transformer Model, Grounding DINO |
| Event title: | 9th International Conference on Smart Data and Smart Cities (SDSC) |
| Event location: | Kashiwa, Japan |
| Event type: | International conference |
| Event start: | 2 September 2025 |
| Event end: | 5 September 2025 |
| Organizer: | ISPRS, TC IV |
| HGF - Research field: | Aeronautics, Space and Transport |
| HGF - Program: | Transport |
| HGF - Program topic: | Road Transport |
| DLR - Research area: | Transport |
| DLR - Program: | V ST Road Transport |
| DLR - Research theme (project): | V - V&V4NGC - Methods, processes and toolchains for the validation & verification of NGC |
| Location: | Oberpfaffenhofen |
| Institutes & facilities: | Remote Sensing Technology Institute > Photogrammetry and Image Analysis |
| Deposited by: | Reinartz, Prof. Dr. Peter |
| Deposited on: | 21 Jan 2026 12:30 |
| Last modified: | 25 Jan 2026 15:49 |