Improved Exploration with Stochastic Policies in Deep Reinforcement Learning

Pitz, Johannes (2020) Improved Exploration with Stochastic Policies in Deep Reinforcement Learning. DLR Internal Report (DLR-Interner Bericht). DLR-IB-RM-OP-2020-216. Master's thesis. Technische Universität München (TUM). 80 pp.

Abstract

Deep reinforcement learning has recently shown promising results in robot control, but even current state-of-the-art algorithms fail to solve seemingly simple realistic tasks. For example, OpenAI et al. (2019) demonstrate the learning of dexterous in-hand manipulation of objects lying on the palm of an upward-facing robot hand. However, manipulating an object from above (i.e., with the hand oriented upside-down) turns out to be fundamentally more difficult for current algorithms to learn, because the object has to be robustly grasped at all times to avoid immediate failure. In this thesis, we identify the commonly used naive exploration strategies as the main issue. We therefore propose to use more expressive stochastic policy distributions that enable reinforcement learning agents to learn to explore in a targeted manner. In particular, we extend the Soft Actor-Critic algorithm with policy distributions of varying expressiveness. We analyze how these variants explore in simplified environments of adjustable difficulty that we designed specifically to mimic the core problem of dexterous in-hand manipulation. We find that stochastic policies with expressive distributions can learn fundamentally more complex tasks. Moreover, beyond the exploration behavior, we show that in environments that are not perfectly observable, agents that represent their final (learned) policy with expressive distributions can solve tasks where commonly used simpler distributions fail.
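
This record does not say which policy distributions the thesis uses, so the following is only an illustrative sketch rather than the author's implementation. It shows the tanh-squashed Gaussian that is commonly used as the baseline Soft Actor-Critic policy, written as a one-step change of variables, which is the same mechanism that normalizing flows (listed in the keywords) stack to obtain more expressive densities. The function names and the numerical check are assumptions introduced for this example.

```python
import math


def gaussian_log_prob(u, mean, std):
    """Log-density of a 1-D Gaussian N(mean, std^2) evaluated at u."""
    z = (u - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def squashed_gaussian_log_prob(a, mean, std):
    """Log-density of a = tanh(u) with u ~ N(mean, std^2), for a in (-1, 1).

    Change of variables: log p(a) = log p(u) - log|da/du|,
    where da/du = 1 - tanh(u)^2 = 1 - a^2.
    """
    u = math.atanh(a)
    return gaussian_log_prob(u, mean, std) - math.log(1.0 - a * a)


def integrate_density(mean, std, n=20000):
    """Midpoint-rule integral of the squashed density over (-1, 1)."""
    h = 2.0 / n
    total = 0.0
    for i in range(n):
        a = -1.0 + (i + 0.5) * h
        total += math.exp(squashed_gaussian_log_prob(a, mean, std)) * h
    return total


if __name__ == "__main__":
    # A valid density should integrate to approximately 1.
    print(integrate_density(mean=0.3, std=0.5))
```

The squashed Gaussian stays unimodal in the pre-squash variable, which is one reason a more flexible family might be needed when an agent must choose between several distinct good actions.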

elib URL of this record: https://elib.dlr.de/139379/
Document type: Report series (DLR Internal Report, Master's thesis)
Title: Improved Exploration with Stochastic Policies in Deep Reinforcement Learning
Authors:
Author | Institution or e-mail address | Author's ORCID iD | ORCID Put Code
Pitz, Johannes | DLR Institut für Robotik und Mechatronik | https://orcid.org/0000-0002-2629-1892 | NOT SPECIFIED
Date: 2020
Refereed publication: No
Open Access: Yes
Number of pages: 80
Status: Published
Keywords: Deep Reinforcement Learning, Exploration, Stochastic Policies, In-Hand Manipulation, Soft Actor-Critic, Normalizing Flows
Institution: Technische Universität München (TUM)
Department: Computer Science (Informatik)
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Space
HGF - Program topic: Space System Technology
DLR - Research area: Space
DLR - Research field: R SY - Space System Technology
DLR - Subfield (project, activity): R - On-Orbit Servicing [SY]
Location: Oberpfaffenhofen
Institutes and facilities: Institut für Robotik und Mechatronik (from 2013) > Autonomie und Fernprogrammierung
Deposited by: Geyer, Günther
Deposited on: 09 Dec 2020 22:39
Last modified: 09 Dec 2020 22:39