
dc.contributor.author: Sheikhnezhad Fard, Farzaneh
dc.date.accessioned: 2018-04-03T16:43:40Z
dc.date.available: 2018-04-03T16:43:40Z
dc.identifier.uri: http://hdl.handle.net/10222/73811
dc.description.abstract: It is hypothesized that the brain builds an internal representation of the world and of its body. Moreover, it is well established that human decision making and instrumental control use multiple systems, some of which are habitual and some of which require planning. In this thesis, we propose a novel model called the adaptive observer \cite{fard2015modeling}, which learns an internal representation of the world using dynamic neural fields (DNFs). The DNF is a well-known model that simulates brain activity in cortical tissue; it considers the activity of a population of neurons rather than that of a single neuron. We then introduce a model called the arbitrated predictive actor-critic (APAC)~\cite{fard2017novel,fard2017anactor}. APAC is a general architecture that comprises both habitual and planning control paradigms and introduces an arbitrator that controls which subsystem is used at any time. Both the adaptive observer and APAC rely on an internal model; however, they differ in some respects. The adaptive observer, unlike APAC, uses DNFs to represent neural activity, whereas APAC, unlike the adaptive observer, can learn the kinematics of the system without a priori knowledge and combines two control systems for decision making. APAC takes advantage of the fast habitual controller when it is reliable enough. Both models are studied and tested under different conditions on a target-reaching task. The adaptive observer was tested with a real robotic arm, while APAC was examined with a simulated robot arm. In the adaptive observer, a path integration technique is employed to reach the target. The adaptive observer can also explain some interesting features and behaviours of the brain, namely moving with impaired sensory input and motor adaptation.
Through permutation of target-reaching conditions, we also demonstrate that APAC is capable of learning the kinematics of the system rapidly without a priori knowledge and is robust to (A) changing environmental reward and kinematics, and (B) occluded vision. The arbitrator model is compared to pure planning and pure habitual instances of the model. (en_US)
dc.language.iso: en (en_US)
dc.subject: Machine Learning (en_US)
dc.subject: Deep Learning (en_US)
dc.subject: Supervised Learning (en_US)
dc.subject: Deep Reinforcement Learning (en_US)
dc.subject: Cognitive Robotics (en_US)
dc.subject: Human behaviour modelling (en_US)
dc.title: Modelling Human Target Reaching using A novel predictive deep reinforcement learning technique (en_US)
dc.date.defence: 2018-03-26
dc.contributor.department: Faculty of Computer Science (en_US)
dc.contributor.degree: Doctor of Philosophy (en_US)
dc.contributor.external-examiner: Dr. Jeff Krichmar (en_US)
dc.contributor.graduate-coordinator: Dr. Adam Donaldson (en_US)
dc.contributor.thesis-reader: Dr. Sageev Oore (en_US)
dc.contributor.thesis-reader: Dr. Malcolm Heywood (en_US)
dc.contributor.thesis-reader: Dr. David Westwood (en_US)
dc.contributor.thesis-supervisor: Dr. Thomas Trappenberg (en_US)
dc.contributor.ethics-approval: Not Applicable (en_US)
dc.contributor.manuscripts: Yes (en_US)
dc.contributor.copyright-release: Yes (en_US)