dc.contributor.author                 Amaral, Ryan
dc.date.accessioned                   2021-08-27T12:30:07Z
dc.date.available                     2021-08-27T12:30:07Z
dc.date.issued                        2021-08-27T12:30:07Z
dc.identifier.uri                     http://hdl.handle.net/10222/80746
dc.description.abstract               Tangled Program Graphs (TPG) represents a framework for evolving programs under an explicitly emergent model for modularity. The framework has been very successful at discovering solutions to tasks with delayed rewards (reinforcement learning) when the actions are limited to a single discrete action per state. In this thesis, an approach is proposed for generalizing TPG to the case of multiple real-valued actions per state. Two empirical benchmarking studies are performed to demonstrate these outcomes: ViZDoom over multiple tasks, and bipedal walker control. The former is used for comparison with the original TPG, which supports a single discrete action per state; the latter is used to demonstrate multiple real-valued actions per state. It is shown that the complexity of the resulting solutions decreases considerably compared to the original TPG formulation. However, in order to reach these results, significant attention has to be paid to the adoption of appropriate diversity mechanisms. This thesis therefore also proposes a framework for intermittently injecting new material into the TPG population during training. The modular properties of TPG enable this material to be absorbed on a continuous basis. Results are comparable with those identified under certain recent deep learning approaches.    en_US
dc.language.iso                       en    en_US
dc.subject                            Reinforcement Learning    en_US
dc.subject                            Genetic Programming    en_US
dc.subject                            Diversity    en_US
dc.subject                            Evolution    en_US
dc.subject                            Machine Learning    en_US
dc.subject                            Subpopulation    en_US
dc.subject                            Continuous Control    en_US
dc.subject                            OpenAI Gym    en_US
dc.subject                            ViZDoom    en_US
dc.subject                            SBB    en_US
dc.subject                            TPG    en_US
dc.title                              Reinforcement Learning with Real Valued Tangled Program Graphs    en_US
dc.type                               Thesis    en_US
dc.date.defence                       2021-08-25
dc.contributor.department             Faculty of Computer Science    en_US
dc.contributor.degree                 Master of Computer Science    en_US
dc.contributor.external-examiner      n/a    en_US
dc.contributor.graduate-coordinator   Dr. Michael McAllister    en_US
dc.contributor.thesis-reader          Dr. Andrew McIntyre    en_US
dc.contributor.thesis-reader          Dr. Nur Zincir-Heywood    en_US
dc.contributor.thesis-supervisor      Dr. Malcolm Heywood    en_US
dc.contributor.ethics-approval        Not Applicable    en_US
dc.contributor.manuscripts            Not Applicable    en_US
dc.contributor.copyright-release      Not Applicable    en_US
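
The abstract describes generalizing TPG from a single discrete action per state to multiple real-valued actions per state. As a rough illustration of that idea only, the Python sketch below shows how a register-based genetic-programming individual could expose several of its working registers as a bounded, real-valued action vector, as in the four joint torques of OpenAI Gym's BipedalWalker (24-dimensional observation, 4-dimensional action). Every name here (RealValuedProgram, num_registers, the operator set, etc.) is a hypothetical assumption, not the thesis's actual implementation.

import numpy as np

class RealValuedProgram:
    """Illustrative sketch: a linear, register-based GP program whose
    final register contents are read out as a real-valued action vector.
    All names and operator choices are assumptions for illustration."""

    OPS = ("add", "sub", "mul", "cos")

    def __init__(self, num_instructions, num_registers, num_actions,
                 state_dim, rng):
        assert num_actions <= num_registers
        self.num_registers = num_registers
        self.num_actions = num_actions
        # Each instruction: (operator, destination register,
        #                    source index, read source from state?)
        self.instructions = [
            (rng.choice(self.OPS),
             int(rng.integers(num_registers)),
             int(rng.integers(state_dim)),
             rng.random() < 0.5)
            for _ in range(num_instructions)
        ]

    def act(self, state):
        r = np.zeros(self.num_registers)
        for op, dest, src, from_state in self.instructions:
            x = state[src] if from_state else r[src % self.num_registers]
            if op == "add":
                r[dest] += x
            elif op == "sub":
                r[dest] -= x
            elif op == "mul":
                r[dest] *= x
            else:  # "cos"
                r[dest] = np.cos(x)
        # Read the first num_actions registers as one multi-dimensional,
        # real-valued action; tanh squashes into a bounded control range.
        return np.tanh(r[: self.num_actions])

rng = np.random.default_rng(0)
prog = RealValuedProgram(num_instructions=16, num_registers=8,
                         num_actions=4, state_dim=24, rng=rng)
action = prog.act(rng.standard_normal(24))  # e.g. 4 torques per state

The key contrast with the original TPG formulation is in the last line of act: instead of mapping a state to one discrete action label, the same evolved program yields a whole vector of real values on every state.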