Are Vision-Language Foundation Models Able to Fly?

Rüter, Joachim and Davydov, Philipp and Maienschein, Theresa Diana and Durak, Umut and Dauer, Johann C. (2025) Are Vision-Language Foundation Models Able to Fly? In: 44th AIAA DATC/IEEE Digital Avionics Systems Conference, DASC 2025. Institute of Electrical and Electronics Engineers Inc. Digital Avionics Systems Conference, 2025-09-14, Montreal, Canada. doi: 10.1109/DASC66011.2025.11257290. ISBN 979-833152519-4. ISSN 2155-7195.


Abstract

Safe autonomous aircraft require accurate environment perception, which can be achieved through semantic segmentation of camera images. However, training neural networks relies on large, diverse datasets that are often unavailable in aviation. Vision-language foundation models offer a promising alternative, but their accuracy for aviation tasks is an open question as the aerial perspective might not be adequately represented in the original training data. Against this background, this paper investigates the performance of two vision-language foundation models, CLIPSeg and CAT-Seg, on an aerial image dataset. Our experiments show that the models can achieve competitive semantic segmentation performance without aviation-specific training. This paper further examines prompt engineering and discusses challenges of deploying these models in aviation. While certification and runtime constraints pose significant hurdles, our findings suggest that vision-language foundation models have potential for improving environment perception in aviation and may reduce the need for extensive training data in the future.

elib URL of this record: