Gieraths, Valentin (2025) Enabling Task-Parameterized Imitation Learning in Unstructured Environments Using Visual Foundation Models. Master's thesis, Technical University of Munich (TUM).
Abstract
The direct deployment of robots in dynamic environments remains a challenging problem and is traditionally achieved by manual and often complex programming, especially in industry. Different approaches have been proposed to address this challenge, including training or fine-tuning Vision-Language-Action Models (VLAs), reinforcement learning policies, and combined hybrid methods. However, these typically require vast amounts of data and computational resources, and cannot be easily adapted to new tasks. Alternative approaches such as Imitation Learning (IL) and Task-Parameterized Imitation Learning (TPIL) can train new skills rapidly with limited data, but they struggle to adapt to dynamic environments without additional information. This thesis presents a framework integrating Task-Parameterized Kernelized Movement Primitives (TP-KMPs) with pretrained Vision-Language Models (VLMs) to enable natural language-based robot control with minimal demonstration requirements. The modular architecture separates perception, reasoning, and execution, leveraging classical robotics methods and modern foundation models. Skills are acquired through kinesthetic demonstration (3–6 examples per skill) and executed via natural language commands. The framework employs dynamic tool generation to enable seamless integration, eliminating foundation model fine-tuning. As a TPIL approach, TP-KMPs generate smooth trajectories conditioned on task parameters set automatically by the VLM, enabling execution across diverse environmental configurations. Key contributions include the implementation of the complete framework along with a perception pipeline for 6D object pose estimation. Additionally, a probabilistic skill combination mechanism leverages the covariance structure of TP-KMPs to synthesize novel behaviors from existing skills without additional demonstrations. 
A covariance manipulation strategy addresses compatibility constraints when individual Kernelized Movement Primitives (KMPs) exhibit insufficient variation in the demonstration data. Experimental validation on a torque-controlled, 7-DoF German Aerospace Center (DLR) Safe, Autonomous Robotic Assistant (SARA) robot demonstrates robust skill execution, generalization to novel configurations, and successful skill composition for industrial and household manipulation. The framework achieves data efficiency comparable to traditional IL approaches while providing intuitive natural language interaction. Results confirm that pretrained VLMs can serve as the reasoning layer when provided with appropriately structured interfaces, maintaining zero-shot capabilities while addressing spatial reasoning limitations.
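The abstract does not spell out how skills are combined probabilistically. As a minimal sketch only, the example below assumes that each skill predicts a Gaussian over the end-effector position at a given time step and that the predictions are fused with a precision-weighted product of Gaussians, a common choice in task-parameterized methods. The function name `fuse_gaussians` and the numeric values are hypothetical and are not taken from the thesis.

```python
import numpy as np


def fuse_gaussians(means, covs):
    """Precision-weighted product of Gaussians N(mu_i, Sigma_i).

    Assumption for illustration: this is one standard way to combine
    Gaussian predictions, not necessarily the thesis's exact mechanism.
    """
    precisions = [np.linalg.inv(c) for c in covs]
    fused_cov = np.linalg.inv(sum(precisions))
    fused_mean = fused_cov @ sum(p @ m for p, m in zip(precisions, means))
    return fused_mean, fused_cov


# Two hypothetical skills predicting a 2-D position at one time step.
mu_a = np.array([0.0, 0.0])
cov_a = np.diag([0.01, 1.0])  # skill A is confident along x only
mu_b = np.array([1.0, 1.0])
cov_b = np.diag([1.0, 0.01])  # skill B is confident along y only

mean, cov = fuse_gaussians([mu_a, mu_b], [cov_a, cov_b])
print(mean)
print(cov)
```

In this sketch the fused mean stays close to skill A along x and close to skill B along y, because each skill dominates in the direction where its covariance is small. It also shows why low demonstration variance matters: a skill with a very small covariance in every direction would dominate the product, which is the kind of situation a covariance manipulation step would need to address.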
| elib URL of the record: | https://elib.dlr.de/220023/ |
|---|---|
| Document type: | Thesis (Master's thesis) |
| Title: | Enabling Task-Parameterized Imitation Learning in Unstructured Environments Using Visual Foundation Models |
| Authors: | Gieraths, Valentin |
| DLR supervisor: | |
| Date: | 1 December 2025 |
| Open Access: | Yes |
| Number of pages: | 104 |
| Status: | Published |
| Keywords: | Natural Language, Probabilistic Machine Learning, LLM, VLM, Interactive Robot Skill Learning, Selection, Combination and Execution |
| Institution: | Technical University of Munich (TUM) |
| Department: | TUM School of Computation, Information and Technology (CIT) |
| HGF - Research field: | Aeronautics, Space and Transport |
| HGF - Program: | Space |
| HGF - Program topic: | Robotics |
| DLR - Research area: | Space |
| DLR - Research field: | R RO - Robotics |
| DLR - Subfield (project): | R - Synergy project ASPIRO |
| Location: | Oberpfaffenhofen |
| Institutes and institutions: | Institute of Robotics and Mechatronics (since 2013); Institute of Robotics and Mechatronics (since 2013) > Cognitive Robotics |
| Deposited by: | Knauer, Markus Wendelin |
| Deposited on: | 04 Dec 2025 13:57 |
| Last modified: | 04 Dec 2025 13:57 |