Reinforcement Learning with Real Valued Tangled Program Graphs

Amaral, Ryan


dc.contributor.author	Amaral, Ryan
dc.contributor.copyright-release	Not Applicable	en_US
dc.contributor.degree	Master of Computer Science	en_US
dc.contributor.department	Faculty of Computer Science	en_US
dc.contributor.ethics-approval	Not Applicable	en_US
dc.contributor.external-examiner	n/a	en_US
dc.contributor.graduate-coordinator	Dr. Michael McAllister	en_US
dc.contributor.manuscripts	Not Applicable	en_US
dc.contributor.thesis-reader	Dr. Andrew McIntyre	en_US
dc.contributor.thesis-reader	Dr. Nur Zincir-Heywood	en_US
dc.contributor.thesis-supervisor	Dr. Malcolm Heywood	en_US
dc.date.accessioned	2021-08-27T12:30:07Z
dc.date.available	2021-08-27T12:30:07Z
dc.date.defence	2021-08-25
dc.date.issued	2021-08-27T12:30:07Z
dc.description.abstract	Tangled Program Graphs (TPG) is a framework for evolving programs under an explicitly emergent model of modularity. The framework has been very successful at discovering solutions to tasks with delayed rewards (reinforcement learning) when actions are limited to a single discrete action per state. In this thesis, an approach is proposed for generalizing TPG to the case of multiple real-valued actions per state. Two empirical benchmarking studies are performed to demonstrate these outcomes: ViZDoom over multiple tasks, and bipedal walker control. The former is used for comparison with the original TPG, which selects a single discrete action per state; the latter is used to demonstrate multiple real-valued actions per state. It is shown that the complexity of the resulting solutions decreases considerably relative to the original TPG formulation. However, reaching these results requires significant attention to the adoption of appropriate diversity mechanisms. This thesis therefore also proposes a framework for intermittently injecting new material into the TPG population during training. The modular properties of TPG enable this material to be absorbed on a continuous basis. Results are comparable with those reported for certain recent deep learning approaches.	en_US
dc.identifier.uri	http://hdl.handle.net/10222/80746
dc.language.iso	en	en_US
dc.subject	Reinforcement Learning	en_US
dc.subject	Genetic Programming	en_US
dc.subject	Diversity	en_US
dc.subject	Evolution	en_US
dc.subject	Machine Learning	en_US
dc.subject	Subpopulation	en_US
dc.subject	Continuous Control	en_US
dc.subject	OpenAI Gym	en_US
dc.subject	ViZDoom	en_US
dc.subject	SBB	en_US
dc.subject	TPG	en_US
dc.title	Reinforcement Learning with Real Valued Tangled Program Graphs	en_US
dc.type	Thesis	en_US