Reinforcement Learning with Real Valued Tangled Program Graphs
dc.contributor.author | Amaral, Ryan | |
dc.contributor.copyright-release | Not Applicable | en_US |
dc.contributor.degree | Master of Computer Science | en_US |
dc.contributor.department | Faculty of Computer Science | en_US |
dc.contributor.ethics-approval | Not Applicable | en_US |
dc.contributor.external-examiner | n/a | en_US |
dc.contributor.graduate-coordinator | Dr. Michael McAllister | en_US |
dc.contributor.manuscripts | Not Applicable | en_US |
dc.contributor.thesis-reader | Dr. Andrew McIntyre | en_US |
dc.contributor.thesis-reader | Dr. Nur Zincir-Heywood | en_US |
dc.contributor.thesis-supervisor | Dr. Malcolm Heywood | en_US |
dc.date.accessioned | 2021-08-27T12:30:07Z | |
dc.date.available | 2021-08-27T12:30:07Z | |
dc.date.defence | 2021-08-25 | |
dc.date.issued | 2021-08-27T12:30:07Z | |
dc.description.abstract | Tangled Program Graphs (TPG) represents a framework for evolving programs under an explicitly emergent model for modularity. The framework has been very successful at discovering solutions to tasks with delayed rewards (reinforcement learning) when the actions are limited to a single discrete action per state. In this thesis, an approach is proposed for generalizing TPG to the case of multiple real-valued actions per state. Two empirical benchmarking studies are performed to demonstrate these outcomes: ViZDoom over multiple tasks, and bipedal walker control. The former is used to compare against the original TPG with a single discrete action per state; the latter is used to demonstrate multiple real-valued actions per state. It is shown that the complexity of the resulting solutions decreases considerably compared to the original TPG formulation. However, reaching these results requires significant attention to the adoption of appropriate diversity mechanisms. This thesis therefore also proposes a framework for intermittently injecting new material into the TPG population during training. The modular properties of TPG enable this material to be absorbed on a continuous basis. Results are comparable with those identified under certain recent deep learning approaches. | en_US |
dc.identifier.uri | http://hdl.handle.net/10222/80746 | |
dc.language.iso | en | en_US |
dc.subject | Reinforcement Learning | en_US |
dc.subject | Genetic Programming | en_US |
dc.subject | Diversity | en_US |
dc.subject | Evolution | en_US |
dc.subject | Machine Learning | en_US |
dc.subject | Subpopulation | en_US |
dc.subject | Continuous Control | en_US |
dc.subject | OpenAI Gym | en_US |
dc.subject | ViZDoom | en_US |
dc.subject | SBB | en_US |
dc.subject | TPG | en_US |
dc.title | Reinforcement Learning with Real Valued Tangled Program Graphs | en_US |
dc.type | Thesis | en_US |
Files
Original bundle
Name | Size | Format | Description |
RyanAmaral2021.pdf | 4.42 MB | Adobe Portable Document Format | Thesis Paper |
gp-bipedal-walker-analysis-main.zip | 21.48 MB | Unknown data format | Python code to analyze results |
ml-runner-2dbiped-gp.zip | 17.88 KB | Unknown data format | Python code used to run biped GP experiments |
ml-runner-2dbiped-tpgsbb.zip | 20.25 KB | Unknown data format | Python code used to run biped SBB, TPG, TPG+SBB experiments |
License bundle
Name | Size | Format | Description |
license.txt | 1.71 KB | Item-specific license agreed upon to submission | |