Improved Exploration with Stochastic Policies in Deep Reinforcement Learning

Pitz, Johannes (2020) Improved Exploration with Stochastic Policies in Deep Reinforcement Learning. DLR-Interner Bericht. DLR-IB-RM-OP-2020-216. Master's. Technische Universität München (TUM). 80 pp.

Abstract

Deep reinforcement learning has recently shown promising results in robot control, but even current state-of-the-art algorithms fail to solve seemingly simple realistic tasks. For example, OpenAI et al. (2019) demonstrate the learning of dexterous in-hand manipulation of objects lying on the palm of an upward-facing robot hand. However, manipulating an object from above (i.e., with the hand oriented upside-down) turns out to be fundamentally more difficult for current algorithms to learn because the object has to be robustly grasped at all times to avoid immediate failure. In this thesis, we identify the commonly used naive exploration strategies as the main issue. We therefore propose to use more expressive stochastic policy distributions so that reinforcement learning agents can learn to explore in a targeted manner. In particular, we extend the Soft Actor-Critic algorithm with policy distributions of varying expressiveness. We analyze how these variants explore in simplified environments of adjustable difficulty that we designed specifically to mimic the core problem of dexterous in-hand manipulation. We find that stochastic policies with expressive distributions can learn fundamentally more complex tasks. Moreover, beyond the exploration behavior, we show that in environments that are not perfectly observable, agents that represent their final (learned) policy with expressive distributions can solve tasks where commonly used simpler distributions fail.

Item URL in elib: https://elib.dlr.de/139379/
Document Type: Monograph (DLR-Interner Bericht, Master's)
Title: Improved Exploration with Stochastic Policies in Deep Reinforcement Learning
Authors:
Author: Pitz, Johannes
Institution or Email of Author: DLR Institut für Robotik und Mechatronik
Author's ORCID iD: https://orcid.org/0000-0002-2629-1892
Date: 2020
Refereed publication: No
Open Access: Yes
Gold Open Access: No
In SCOPUS: No
In ISI Web of Science: No
Number of Pages: 80
Status: Published
Keywords: Deep Reinforcement Learning, Exploration, Stochastic Policies, In-Hand Manipulation, Soft Actor-Critic, Normalizing Flows
Institution: Technische Universität München (TUM)
Department: Informatik
HGF - Research field: Aeronautics, Space and Transport
HGF - Program: Space
HGF - Program Themes: Space System Technology
DLR - Research area: Raumfahrt
DLR - Program: R SY - Space System Technology
DLR - Research theme (Project): R - On-Orbit Servicing [SY]
Location: Oberpfaffenhofen
Institutes and Institutions: Institute of Robotics and Mechatronics (since 2013) > Autonomy and Teleoperation
Deposited By: Geyer, Günther
Deposited On: 09 Dec 2020 22:39
Last Modified: 09 Dec 2020 22:39
