Pitz, Johannes (2020) Improved Exploration with Stochastic Policies in Deep Reinforcement Learning. DLR Internal Report DLR-IB-RM-OP-2020-216. Master's thesis, Technische Universität München (TUM). 80 pages.
PDF (3 MB)
Abstract
Deep reinforcement learning has recently shown promising results in robot control, but even current state-of-the-art algorithms fail at seemingly simple realistic tasks. For example, OpenAI et al. (2019) demonstrate learning of dexterous in-hand manipulation of objects lying on the palm of an upward-facing robot hand. However, manipulating an object from above (i.e., with the hand oriented upside-down) turns out to be fundamentally harder for current algorithms to learn, because the object must be robustly grasped at all times to avoid immediate failure. In this thesis, we identify the commonly used naive exploration strategies as the main issue. We therefore propose to use more expressive stochastic policy distributions that enable reinforcement learning agents to learn to explore in a targeted manner. In particular, we extend the Soft Actor-Critic algorithm with policy distributions of varying expressiveness. We analyze how these variants explore in simplified environments with adjustable difficulty, designed specifically to mimic the core problem of dexterous in-hand manipulation. We find that stochastic policies with expressive distributions can learn fundamentally more complex tasks. Moreover, beyond the exploration behavior, we show that in environments that are not perfectly observable, agents that represent their final (learned) policy with expressive distributions can solve tasks where commonly used simpler distributions fail.
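The abstract itself gives no implementation details. As a rough, hypothetical illustration of what replacing the usual tanh-squashed Gaussian policy head of Soft Actor-Critic with a more expressive, state-conditioned distribution can look like (the keywords point to normalizing flows), the following PyTorch sketch may help; the class name, network sizes, and the single affine "flow" layer are assumptions for illustration, not the architecture used in the thesis.

```python
# Minimal sketch of an SAC-style policy with an extra invertible transform
# before the tanh squashing. All names and sizes here are illustrative.
import torch
import torch.nn as nn
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import AffineTransform, TanhTransform


class FlowPolicy(nn.Module):
    """State-conditioned policy: base Gaussian -> affine 'flow' layer -> tanh."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Parameters of the base Gaussian and of one affine transform,
        # all predicted from the observation. A deeper flow (e.g. RealNVP/IAF
        # blocks) would replace the single affine layer for more expressiveness.
        self.base_mu = nn.Linear(hidden, act_dim)
        self.base_log_std = nn.Linear(hidden, act_dim)
        self.flow_shift = nn.Linear(hidden, act_dim)
        self.flow_log_scale = nn.Linear(hidden, act_dim)

    def distribution(self, obs):
        h = self.trunk(obs)
        base = Normal(self.base_mu(h), self.base_log_std(h).clamp(-5, 2).exp())
        transforms = [
            AffineTransform(self.flow_shift(h), self.flow_log_scale(h).exp()),
            TanhTransform(cache_size=1),  # keep actions in (-1, 1), as in SAC
        ]
        return TransformedDistribution(base, transforms)

    def sample(self, obs):
        dist = self.distribution(obs)
        action = dist.rsample()                   # reparameterized sample
        log_prob = dist.log_prob(action).sum(-1)  # needed for the SAC entropy term
        return action, log_prob
```

Dropping the affine transform recovers the standard tanh-Gaussian SAC policy, so the same training loop can compare policy heads of different expressiveness.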
| elib URL of this record: | https://elib.dlr.de/139379/ |
|---|---|
| Document type: | Report series (DLR Internal Report, Master's thesis) |
| Title: | Improved Exploration with Stochastic Policies in Deep Reinforcement Learning |
| Authors: | Pitz, Johannes |
| Date: | 2020 |
| Refereed publication: | No |
| Open Access: | Yes |
| Number of pages: | 80 |
| Status: | published |
| Keywords: | Deep Reinforcement Learning, Exploration, Stochastic Policies, In-Hand Manipulation, Soft Actor-Critic, Normalizing Flows |
| Institution: | Technische Universität München (TUM) |
| Department: | Informatics |
| HGF research field: | Aeronautics, Space and Transport |
| HGF programme: | Space |
| HGF programme topic: | Space Systems Technology |
| DLR focus area: | Space |
| DLR research area: | R SY - Space Systems Technology |
| DLR sub-area (project): | R - On-Orbit Servicing [SY] |
| Location: | Oberpfaffenhofen |
| Institutes & facilities: | Institut für Robotik und Mechatronik (ab 2013) > Autonomie und Fernprogrammierung |
| Deposited by: | Geyer, Günther |
| Deposited on: | 09 Dec 2020 22:39 |
| Last modified: | 09 Dec 2020 22:39 |