Improved Exploration with Stochastic Policies in Deep Reinforcement Learning

Pitz, Johannes (2020) Improved Exploration with Stochastic Policies in Deep Reinforcement Learning. DLR-Interner Bericht. DLR-IB-RM-OP-2020-216. Master's thesis. Technische Universität München (TUM). 80 pp.

Abstract

Deep reinforcement learning has recently shown promising results in robot control, but even current state-of-the-art algorithms fail to solve seemingly simple realistic tasks. For example, OpenAI et al. (2019) demonstrate learning dexterous in-hand manipulation of objects lying on the palm of an upward-facing robot hand. However, manipulating an object from above (i.e., with the hand oriented upside-down) turns out to be fundamentally more difficult for current algorithms to learn, because the object must be grasped robustly at all times to avoid immediate failure. In this thesis, we identify the commonly used naive exploration strategies as the main issue. We therefore propose using more expressive stochastic policy distributions so that reinforcement learning agents can learn to explore in a targeted manner. In particular, we extend the Soft Actor-Critic algorithm with policy distributions of varying expressiveness. We analyze how these variants explore in simplified environments of adjustable difficulty that we designed specifically to mimic the core problem of dexterous in-hand manipulation. We find that stochastic policies with expressive distributions can learn fundamentally more complex tasks. Moreover, beyond exploration behavior, we show that in environments that are not perfectly observable, agents that represent their final (learned) policy with expressive distributions can solve tasks where the commonly used simpler distributions fail.

elib URL of this record:

https://elib.dlr.de/139379/

Document type:

Report series (DLR-Interner Bericht, Master's thesis)

Title:

Improved Exploration with Stochastic Policies in Deep Reinforcement Learning

Authors:

Authors	Institution or e-mail address	Author ORCID iD	ORCID Put Code
Pitz, Johannes	DLR Institut für Robotik und Mechatronik	https://orcid.org/0000-0002-2629-1892	NOT SPECIFIED

Date:

2020

Refereed publication:

No

Open Access:

Number of pages:

80

Status:

published

Keywords:

Deep Reinforcement Learning, Exploration, Stochastic Policies, In-Hand Manipulation, Soft Actor-Critic, Normalizing Flows

Institution:

Technische Universität München (TUM)

Department:

Informatics

HGF - Research field:

Aeronautics, Space and Transport

HGF - Program:

Space

HGF - Program topic:

Space System Technology

DLR - Research area:

Space

DLR - Program:

R SY - Space System Technology

DLR - Research theme (project):

R - On-Orbit Servicing [SY]

Location:

Oberpfaffenhofen

Institutes and facilities:

Institut für Robotik und Mechatronik (since 2013) > Autonomie und Fernprogrammierung

Deposited by:

Geyer, Günther

Deposited on:

09 Dec 2020 22:39

Last modified:

09 Dec 2020 22:39
