
Enabling Task-Parameterized Imitation Learning in Unstructured Environments Using Visual Foundation Models

Gieraths, Valentin (2025) Enabling Task-Parameterized Imitation Learning in Unstructured Environments Using Visual Foundation Models. Master's thesis, Technical University of Munich (TUM).


Abstract

The direct deployment of robots in dynamic environments remains a challenging problem and is traditionally achieved by manual and often complex programming, especially in industry. Different approaches have been proposed to address this challenge, including training or fine-tuning Vision-Language-Action Models (VLAs), reinforcement learning policies, and combined hybrid methods. However, these typically require vast amounts of data and computational resources, and cannot be easily adapted to new tasks. Alternative approaches such as Imitation Learning (IL) and Task-Parameterized Imitation Learning (TPIL) can train new skills rapidly with limited data, but they struggle to adapt to dynamic environments without additional information. This thesis presents a framework integrating Task-Parameterized Kernelized Movement Primitives (TP-KMPs) with pretrained Vision-Language Models (VLMs) to enable natural language-based robot control with minimal demonstration requirements. The modular architecture separates perception, reasoning, and execution, leveraging classical robotics methods and modern foundation models. Skills are acquired through kinesthetic demonstration (3–6 examples per skill) and executed via natural language commands. The framework employs dynamic tool generation to enable seamless integration, eliminating foundation model fine-tuning. As a TPIL approach, TP-KMPs generate smooth trajectories conditioned on task parameters set automatically by the VLM, enabling execution across diverse environmental configurations. Key contributions include the implementation of the complete framework along with a perception pipeline for 6D object pose estimation. Additionally, a probabilistic skill combination mechanism leverages the covariance structure of TP-KMPs to synthesize novel behaviors from existing skills without additional demonstrations. 
A covariance manipulation strategy addresses compatibility constraints when individual Kernelized Movement Primitives (KMPs) exhibit insufficient variation in the demonstration data. Experimental validation on a torque-controlled, 7-DoF German Aerospace Center (DLR) Safe, Autonomous Robotic Assistant (SARA) robot demonstrates robust skill execution, generalization to novel configurations, and successful skill composition for industrial and household manipulation. The framework achieves data efficiency comparable to traditional IL approaches while providing intuitive natural language interaction. Results confirm that pretrained VLMs can serve as the reasoning layer when provided with appropriately structured interfaces, maintaining zero-shot capabilities while addressing spatial reasoning limitations.
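
The abstract does not spell out how the probabilistic skill combination works. As a minimal sketch, assuming skills are fused with a precision-weighted product of Gaussians (a common choice in task-parameterized models) and assuming diagonal covariances for brevity, the idea could look as follows. The function name `fuse_gaussians`, the `min_variance` floor, and the example numbers are hypothetical and not taken from the thesis; the floor is only one plausible way to keep a skill with near-zero variation from dominating, not necessarily the covariance manipulation strategy the thesis uses.

```python
def fuse_gaussians(means, variances, min_variance=1e-6):
    """Fuse per-skill Gaussian predictions with a product of Gaussians.

    means, variances: one list per skill, each holding one value per dimension
    (diagonal covariance is an assumption made to keep the sketch short).
    min_variance: hypothetical floor that keeps precisions finite when a
    skill shows almost no variation in its demonstrations.
    """
    dims = len(means[0])
    fused_mean, fused_var = [], []
    for d in range(dims):
        # Precision = inverse variance; low variance means high confidence.
        precisions = [1.0 / max(v[d], min_variance) for v in variances]
        total = sum(precisions)
        fused_mean.append(sum(p * m[d] for p, m in zip(precisions, means)) / total)
        fused_var.append(1.0 / total)
    return fused_mean, fused_var


# Two hypothetical skills predicting a 2-D position at a single time step.
skill_a_mean, skill_a_var = [0.2, 0.5], [0.01, 1.0]  # confident in x, vague in y
skill_b_mean, skill_b_var = [0.8, 0.1], [1.0, 0.01]  # vague in x, confident in y

mean, var = fuse_gaussians(
    [skill_a_mean, skill_b_mean],
    [skill_a_var, skill_b_var],
)
print(mean, var)
```

In this sketch the fused prediction follows skill A along x and skill B along y, because each dimension is weighted by the confidence of the skill that constrains it most. The fused variance is smaller than either input variance.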

elib URL of this entry: https://elib.dlr.de/220023/
Document type: Thesis (Master's thesis)
Title: Enabling Task-Parameterized Imitation Learning in Unstructured Environments Using Visual Foundation Models
Authors:
Author | Institution or e-mail address | Author ORCID iD | ORCID Put Code
Gieraths, Valentin | valentin.gieraths (at) dlr.de | not specified | not specified
DLR supervisors:
Contribution type | DLR supervisor | Institution or e-mail address | DLR supervisor ORCID iD
Thesis advisor | Knauer, Markus Wendelin | Markus.Knauer (at) dlr.de | https://orcid.org/0000-0001-8229-9410
Thesis advisor | Silverio, Joao | joao.silverio (at) dlr.de | https://orcid.org/0000-0003-1428-8933
Thesis advisor | Albu-Schäffer, Alin Olimpiu | Alin.Albu-Schaeffer (at) dlr.de | https://orcid.org/0000-0001-5343-9074
Date: 1 December 2025
Open Access: Yes
Number of pages: 104
Status: published
Keywords: Natural Language, Probabilistic Machine Learning, LLM, VLM, Interactive Robot Skill Learning, Selection, Combination and Execution
Institution: Technical University of Munich (TUM)
Department: TUM School of Computation, Information and Technology (CIT)
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Space
HGF - Program topic: Robotics
DLR - Research area: Space
DLR - Research field: R RO - Robotics
DLR - Research theme (project): R - Synergieprojekt ASPIRO
Location: Oberpfaffenhofen
Institutes and institutions: Institute of Robotics and Mechatronics (since 2013)
Institute of Robotics and Mechatronics (since 2013) > Cognitive Robotics
Deposited by: Knauer, Markus Wendelin
Deposited on: 04 Dec 2025 13:57
Last modified: 04 Dec 2025 13:57
