Enabling Task-Parameterized Imitation Learning in Unstructured Environments Using Visual Foundation Models

Gieraths, Valentin (2025) Enabling Task-Parameterized Imitation Learning in Unstructured Environments Using Visual Foundation Models. Master's thesis, Technical University of Munich (TUM).

Abstract

Deploying robots directly in dynamic environments remains challenging and, especially in industry, has traditionally required manual and often complex programming. Approaches proposed to address this challenge include training or fine-tuning Vision-Language-Action Models (VLAs), reinforcement learning policies, and hybrid methods. However, these typically require vast amounts of data and computational resources and cannot easily be adapted to new tasks. Alternatives such as Imitation Learning (IL) and Task-Parameterized Imitation Learning (TPIL) can learn new skills rapidly from limited data, but they struggle to adapt to dynamic environments without additional information. This thesis presents a framework that integrates Task-Parameterized Kernelized Movement Primitives (TP-KMPs) with pretrained Vision-Language Models (VLMs) to enable natural-language robot control with minimal demonstration requirements. Its modular architecture separates perception, reasoning, and execution, combining classical robotics methods with modern foundation models. Skills are acquired through kinesthetic demonstration (3–6 examples per skill) and executed via natural-language commands. The framework uses dynamic tool generation to integrate skills seamlessly without fine-tuning the foundation models. As a TPIL approach, TP-KMPs generate smooth trajectories conditioned on task parameters that the VLM sets automatically, enabling execution across diverse environmental configurations. Key contributions include the implementation of the complete framework and of a perception pipeline for 6D object pose estimation. In addition, a probabilistic skill combination mechanism exploits the covariance structure of TP-KMPs to synthesize novel behaviors from existing skills without additional demonstrations. A covariance manipulation strategy handles the compatibility constraints that arise when individual Kernelized Movement Primitives (KMPs) exhibit insufficient variation in their demonstration data. Experimental validation on the torque-controlled, 7-DoF Safe, Autonomous Robotic Assistant (SARA) robot of the German Aerospace Center (DLR) demonstrates robust skill execution, generalization to novel configurations, and successful skill composition in industrial and household manipulation tasks. The framework achieves data efficiency comparable to traditional IL approaches while offering intuitive natural-language interaction. The results confirm that pretrained VLMs can serve as the reasoning layer when provided with appropriately structured interfaces, retaining their zero-shot capabilities while mitigating their limitations in spatial reasoning.
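
The abstract does not specify how skills are combined. As a hedged illustration only, the sketch below shows one standard way of merging Gaussian trajectory predictions in the probabilistic movement-primitive literature: a product of Gaussians, in which each prediction is weighted by its inverse covariance, so a skill dominates wherever it is confident (low variance). It assumes time-aligned trajectories and diagonal covariances to stay dependency-free; the names `fuse_gaussians`, `fuse_trajectories`, and `inflate_variance` are hypothetical. Scaling a skill's variance is shown as a simple example of covariance manipulation and is not the thesis's actual strategy.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: (mean, diagonal variance) per output dimension.
Gaussian = Tuple[List[float], List[float]]


def fuse_gaussians(components: Sequence[Gaussian]) -> Gaussian:
    """Product of diagonal Gaussians: a precision-weighted average per dimension."""
    if not components:
        raise ValueError("at least one Gaussian is required")
    dim = len(components[0][0])
    fused_mean: List[float] = []
    fused_var: List[float] = []
    for d in range(dim):
        # Sum of precisions (inverse variances) over all components.
        precision = sum(1.0 / var[d] for _, var in components)
        # Precision-weighted sum of the component means.
        weighted = sum(mean[d] / var[d] for mean, var in components)
        fused_var.append(1.0 / precision)
        fused_mean.append(weighted / precision)
    return fused_mean, fused_var


def fuse_trajectories(trajectories: Sequence[Sequence[Gaussian]]) -> List[Gaussian]:
    """Fuse several time-aligned Gaussian trajectories step by step."""
    return [fuse_gaussians(step) for step in zip(*trajectories)]


def inflate_variance(trajectory: Sequence[Gaussian], factor: float) -> List[Gaussian]:
    """Scale every variance by `factor`; factor > 1 weakens this trajectory's influence."""
    return [(list(mean), [v * factor for v in var]) for mean, var in trajectory]


if __name__ == "__main__":
    # Skill A is precise in x and loose in y; skill B is the opposite.
    skill_a = [([0.0, 0.0], [0.01, 1.0]), ([1.0, 0.0], [0.01, 1.0])]
    skill_b = [([0.0, 0.0], [1.0, 0.01]), ([0.0, 1.0], [1.0, 0.01])]
    combined = fuse_trajectories([skill_a, skill_b])
    # The final fused point lies close to (1, 1): each skill dominates where it is confident.
    print(combined[-1])
```

With full covariance matrices, the same product requires matrix inverses (or linear solves) instead of per-dimension reciprocals; the diagonal case is kept here only for brevity.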

elib URL of this record:

https://elib.dlr.de/220023/

Document type:

Thesis (Master's thesis)

Title:

Enabling Task-Parameterized Imitation Learning in Unstructured Environments Using Visual Foundation Models

Authors:

Authors	Institution or email address	Author ORCID iD	ORCID Put Code
Gieraths, Valentin	valentin.gieraths (at) dlr.de	NOT SPECIFIED	NOT SPECIFIED

DLR-Supervisor:

Contribution type	DLR supervisor	Institution or email address	DLR supervisor ORCID iD
Thesis advisor	Knauer, Markus Wendelin	Markus.Knauer (at) dlr.de	https://orcid.org/0000-0001-8229-9410
Thesis advisor	Silverio, Joao	joao.silverio (at) dlr.de	https://orcid.org/0000-0003-1428-8933
Thesis advisor	Albu-Schäffer, Alin Olimpiu	Alin.Albu-Schaeffer (at) dlr.de	https://orcid.org/0000-0001-5343-9074

Date:

1 December 2025

Open Access:

Number of pages:

104

Status:

published

Keywords:

Natural Language, Probabilistic Machine Learning, LLM, VLM, Interactive Robot Skill Learning, Selection, Combination and Execution

Institution:

Technical University of Munich (TUM)

Department:

TUM School of Computation, Information and Technology (CIT)

HGF - Research field:

Aeronautics, Space and Transport

HGF - Program:

Space

HGF - Program topic:

Robotics

DLR - Focus area:

Space

DLR - Research area:

R RO - Robotics

DLR - Subarea (project, initiative):

R - Synergy project ASPIRO

Location:

Oberpfaffenhofen

Institutes & facilities:

Institute of Robotics and Mechatronics (since 2013)
Institute of Robotics and Mechatronics (since 2013) > Cognitive Robotics

Deposited by:

Knauer, Markus Wendelin

Deposited on:

04 Dec 2025 13:57

Last modified:

04 Dec 2025 13:57