Data-Efficient Learning for Autonomous Robots
Speaker: Dr Marc Deisenroth (Imperial College London)
Date: February 7th 2018
Many thanks to Dr Marc Deisenroth (Imperial College London), who presented 'Data-Efficient Learning for Autonomous Robots'.
Abstract: Trial-and-error-based reinforcement learning (RL) has advanced rapidly in recent years, especially with the advent of deep neural networks. However, most autonomous RL algorithms rely either on engineered features or on a large number of interactions with the environment. Such a large number of interactions may be impractical in many real-world applications. For example, robots are subject to wear and tear, so millions of interactions may change or damage the system.
To address this problem, current learning approaches typically require task-specific knowledge in the form of expert demonstrations, pre-shaped policies, or the underlying dynamics. In the first part of the talk, I follow a different approach and speed up learning by efficiently extracting information from sparse data. In particular, we propose to learn a probabilistic, non-parametric Gaussian process (GP) dynamics model. By explicitly incorporating model uncertainty into long-term planning and controller learning, my approach reduces the effects of model errors, a key problem in model-based learning. Compared to state-of-the-art reinforcement learning, our model-based policy search method achieves an unprecedented speed of learning, which makes it most promising for application to real systems. We demonstrate its applicability to autonomous learning from scratch on real robot and control tasks.
To reduce the number of system interactions while naturally handling state or control constraints, we extend the above framework and propose a model-based RL framework based on Model Predictive Control (MPC) using learned probabilistic dynamics models. We provide theoretical guarantees of first-order optimality for the GP-based transition models with deterministic approximate inference for long-term planning. The proposed framework demonstrates superior data efficiency and learning rates compared to the current state of the art.
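The abstract's central idea is a probabilistic dynamics model whose uncertainty grows away from observed data and is carried through multi-step predictions. It can be illustrated with a short Python/NumPy sketch. Everything here is an assumption for illustration rather than the method presented in the talk: the toy 1-D system, the hand-set hyperparameters (a single shared lengthscale instead of fitted ones), the hypothetical names `GPDynamicsModel` and `rollout_particles`, and especially the Monte Carlo propagation. The talk uses deterministic approximate inference for long-term planning, not sampling.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel between inputs a (n, d) and b (m, d)."""
    sq_dist = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)


class GPDynamicsModel:
    """Hypothetical one-step model: maps (x_t, u_t) to delta = x_{t+1} - x_t.

    Hyperparameters are fixed by hand (an assumption); a real system would
    fit them, e.g. by maximising the marginal likelihood.
    """

    def __init__(self, lengthscale=1.0, signal_var=1.0, noise_var=1e-4):
        self.ell, self.sf2, self.sn2 = lengthscale, signal_var, noise_var

    def fit(self, inputs, targets):
        self.X = np.asarray(inputs, dtype=float)
        K = rbf_kernel(self.X, self.X, self.ell, self.sf2) + self.sn2 * np.eye(len(self.X))
        self.L = np.linalg.cholesky(K)
        # alpha = K^{-1} y, computed with two solves against the Cholesky factor
        y = np.asarray(targets, dtype=float)
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))
        return self

    def predict(self, query):
        """Posterior mean and latent-function variance of delta at each query row."""
        Xq = np.atleast_2d(np.asarray(query, dtype=float))
        Ks = rbf_kernel(Xq, self.X, self.ell, self.sf2)
        v = np.linalg.solve(self.L, Ks.T)
        return Ks @ self.alpha, self.sf2 - np.sum(v**2, axis=0)


def rollout_particles(model, x0, controls, n_particles=200, seed=0):
    """Propagate model uncertainty over a control sequence by Monte Carlo sampling.

    Each step samples delta independently from the GP marginal (plus noise),
    which ignores correlations of the GP function across steps and particles.
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_particles, float(x0))
    for u in controls:
        mean, var = model.predict(np.column_stack([x, np.full(n_particles, u)]))
        std = np.sqrt(np.maximum(var, 0.0) + model.sn2)
        x = x + mean + std * rng.standard_normal(n_particles)
    return x.mean(), x.var()


# Toy system (assumption for illustration): x_{t+1} = x_t + 0.1 sin(x_t) + 0.1 u_t
rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(-2, 2, 30), rng.uniform(-1, 1, 30)])
deltas = 0.1 * np.sin(X[:, 0]) + 0.1 * X[:, 1]
model = GPDynamicsModel().fit(X, deltas)

mean_near, near_var = model.predict([[0.5, 0.2]])  # inside the training data
_, far_var = model.predict([[8.0, 0.0]])           # far from any observed state
print(near_var, far_var)

_, spread_near = rollout_particles(model, 0.0, [0.0] * 10)
_, spread_far = rollout_particles(model, 8.0, [0.0] * 10)
print(spread_near, spread_far)
```

Far from the data the predictive variance reverts to the prior signal variance, so multi-step rollouts started there spread out much more than those started near observed states. A planner that optimises expected cost over such rollouts is therefore discouraged from trusting the model where it has seen nothing, which is roughly the intuition behind reducing the effects of model errors. An MPC-style controller built on the same model would re-plan a short control sequence at every time step and apply only its first control.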