Modelling Human Target Reaching Using a Novel Predictive Deep Reinforcement Learning Technique
Date
2018-04-03T16:43:40Z
Authors
Sheikhnezhad Fard, Farzaneh
Abstract
It is hypothesized that the brain builds an internal representation of the world and of its own body. Moreover, it is well established that human decision making and instrumental control use multiple systems, some of which are habitual and some of which require planning. In this thesis, we propose a novel model called the adaptive observer \cite{fard2015modeling}, which learns an internal representation of the world using dynamic neural fields (DNFs). A DNF is a well-known model that simulates brain activity in cortical tissue: it considers the activity of a population of neurons rather than that of a single neuron. We then introduce a second model, the arbitrated predictive actor-critic (APAC) \cite{fard2017novel,fard2017anactor}, a general architecture that combines the habitual and planning control paradigms through an arbitrator that decides which subsystem is used at any given time.
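To make the field dynamics concrete, the following is a minimal sketch of a one-dimensional DNF update in Python, assuming Amari-style dynamics with Euler integration; the parameter values and kernel shape are illustrative assumptions, not the settings used in the thesis.

    import numpy as np

    def dnf_step(u, stimulus, kernel, dt=0.01, tau=0.1, h=-1.0):
        # One Euler step of: tau * du/dt = -u + h + stimulus + w * f(u)
        f_u = 1.0 / (1.0 + np.exp(-u))                   # sigmoidal firing rate
        lateral = np.convolve(f_u, kernel, mode="same")  # lateral interaction
        return u + dt * (-u + h + stimulus + lateral) / tau

    # Mexican-hat kernel: local excitation with surround inhibition.
    x = np.linspace(-5.0, 5.0, 101)
    kernel = 2.0 * np.exp(-x**2) - np.exp(-x**2 / 4.0)

    u = np.full_like(x, -1.0)                # field at resting level
    stimulus = 3.0 * np.exp(-(x - 1.0)**2)   # localized input, e.g. a target
    for _ in range(500):
        u = dnf_step(u, stimulus, kernel)
    # A self-stabilized activity peak in u now encodes the selected location,
    # i.e. the population-level decision rather than a single neuron's output.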
Both the adaptive observer and APAC rely on an internal model, but they differ in several respects. The adaptive observer, unlike APAC, uses DNFs to represent neural activity; APAC, unlike the adaptive observer, can learn the kinematics of the system without prior knowledge and combines two control systems for decision making. APAC exploits the fast habitual controller whenever it is sufficiently reliable.
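As a hedged illustration of the arbitration idea, the sketch below prefers the cheap habitual (model-free) controller when its estimates are deemed reliable and falls back on the slower planning (model-based) controller otherwise; the reliability measure, threshold, and function names are illustrative assumptions, not the exact criterion used in APAC.

    def select_action(state, habitual_act, planner_act, reliability, threshold=0.8):
        # Assumed arbitration rule: trust the habitual controller
        # only when its reliability estimate clears the threshold.
        if reliability(state) >= threshold:
            return habitual_act(state)   # fast, cached model-free policy
        return planner_act(state)        # slower model-based look-ahead

    # Toy usage with stand-in controllers:
    action = select_action(
        state=0.9,
        habitual_act=lambda s: "cached_reach",
        planner_act=lambda s: "planned_reach",
        reliability=lambda s: s,         # here, reliability is just the state value
    )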
Both models are studied and tested under different conditions on a target-reaching task. The adaptive observer was tested with a real robotic arm, while APAC was examined with a simulated robot arm. The adaptive observer employs a path-integration technique to reach the target, and it can also explain some notable behaviours observed in the brain, namely movement with impaired sensory input and motor adaptation. By permuting the target-reaching conditions, we also demonstrate that APAC learns the kinematics of the system rapidly without a priori knowledge and is robust to (A) changing environmental reward and kinematics, and (B) occluded vision. The arbitrated model is compared to pure-planning and pure-habitual instances of the model.
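To illustrate path integration in this context: the internal position estimate is updated from self-motion (velocity) commands alone, which is how reaching can proceed when visual input is impaired or occluded. The sketch below is an assumed minimal Euler-style accumulation; the function name and step size are hypothetical.

    import numpy as np

    def integrate_path(start, velocities, dt=0.01):
        # Accumulate velocity commands into an internal position estimate,
        # with no visual feedback about the true position.
        position = np.asarray(start, dtype=float)
        for v in velocities:
            position = position + dt * np.asarray(v, dtype=float)
        return position

    # Toy usage: 100 small steps of a constant velocity command toward (1, 0).
    estimate = integrate_path([0.0, 0.0], [(1.0, 0.0)] * 100)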
Keywords
Machine Learning, Deep Learning, Supervised Learning, Deep Reinforcement Learning, Cognitive Robotics, Human behaviour modelling