Modelling Human Target Reaching Using a Novel Predictive Deep Reinforcement Learning Technique
Abstract
It is hypothesized that the brain builds an internal representation of the world and of its own body. Moreover, it is well established that human decision making and instrumental control use multiple systems, some of which are habitual and some of which require planning. In this thesis, we propose a novel model called the adaptive observer\cite{fard2015modeling}, which learns an internal representation of the world using dynamic neural fields (DNFs). The DNF is a well-known model of activity in cortical tissue that describes the activity of a population of neurons rather than that of a single neuron. We then introduce a model called the \textit{\bf arbitrated predictive actor-critic (APAC)}~\cite{fard2017novel,fard2017anactor}. APAC is a general architecture that comprises both habitual and planning control paradigms and introduces an arbitrator that controls which subsystem is used at any given time.
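As an illustration of the population-level view that a DNF takes, the following is a minimal sketch of a one-dimensional Amari-style field integrated with Euler steps. It is not the adaptive observer itself: the function name `simulate_dnf`, the ring topology, the kernel shape (local Gaussian excitation minus global inhibition), and all parameter values are assumptions chosen only to show a localized stimulus producing a bump of population activity.

```python
import numpy as np

def simulate_dnf(stimulus, steps=200, dt=1.0, tau=10.0, h=-2.0,
                 a_exc=1.0, s_exc=3.0, g_inh=0.2):
    """Euler-integrate a 1-D Amari-style neural field on a ring.

    All parameter values are illustrative assumptions, not taken
    from the thesis.
    """
    n = stimulus.size
    idx = np.arange(n)
    # Circular distance between every pair of field positions.
    d = np.abs(idx[:, None] - idx[None, :])
    d = np.minimum(d, n - d)
    # Interaction kernel: local Gaussian excitation minus global inhibition.
    w = a_exc * np.exp(-d**2 / (2 * s_exc**2)) - g_inh
    # Start the field at its resting level h.
    u = np.full(n, h, dtype=float)
    for _ in range(steps):
        f = 1.0 / (1.0 + np.exp(-u))  # sigmoid firing rate of the population
        # tau * du/dt = -u + h + stimulus + (w convolved with f)
        u += dt / tau * (-u + h + stimulus + w @ f)
    return u

# A Gaussian stimulus centred on position 40 of a 100-unit field.
n = 100
x = np.arange(n)
stimulus = 6.0 * np.exp(-(x - 40) ** 2 / (2 * 4.0**2))
u = simulate_dnf(stimulus)
peak = int(np.argmax(u))  # position of the activity bump
```

In this sketch the state is the activation of the whole field, so a location is encoded by where the bump sits rather than by the output of any single unit.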
Both the adaptive observer and APAC employ an internal model; however, they differ in some respects. The adaptive observer, unlike APAC, uses DNFs to represent neural activity, whereas APAC, unlike the adaptive observer, can learn the kinematics of the system without prior knowledge and combines two control systems for decision making. APAC takes advantage of the fast habitual controller when it is reliable enough.
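The idea of handing control to the habitual controller only when it is reliable enough can be sketched as follows. The abstract does not state the arbitration rule, so everything here is a hypothetical stand-in: the class `Arbitrator`, the running-average error criterion, and the `threshold` and `decay` values are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Arbitrator:
    """Hypothetical arbitrator; the rule and values are assumptions."""
    threshold: float = 0.1  # assumed reliability cut-off on the average error
    decay: float = 0.9      # assumed smoothing factor for the running average
    avg_error: float = 1.0  # start by distrusting the habitual controller

    def update(self, habitual_error: float) -> None:
        # Exponential moving average of the habitual controller's recent error.
        self.avg_error = (self.decay * self.avg_error
                          + (1.0 - self.decay) * habitual_error)

    def choose(self) -> str:
        # Use the fast habitual controller only while its error stays low.
        return "habitual" if self.avg_error < self.threshold else "planning"

arb = Arbitrator()
first_choice = arb.choose()   # before any evidence of reliability
for _ in range(50):
    arb.update(0.0)           # habitual controller keeps reaching the target
later_choice = arb.choose()   # after a run of accurate habitual reaches
```

Under these assumptions the arbitrator begins with the planning system and switches to the habitual one once the smoothed error falls below the cut-off; a sudden rise in error, for example after a change in kinematics, would push it back to planning.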
Both models are studied and tested under different conditions on a target-reaching task. The adaptive observer was tested with a real robotic arm, while APAC was examined with a simulated robot arm. In the adaptive observer, a path integration technique is employed to reach the target. The adaptive observer can also explain some interesting features and behaviours of the brain, namely movement with impaired sensory input and motor adaptation. By permuting the target-reaching conditions, we also demonstrate that APAC can rapidly learn the kinematics of the system without a priori knowledge and is robust to (A) changes in environmental reward and kinematics and (B) occluded vision. The arbitrator model is compared to pure planning and pure habitual instances of the model.