Evolving Policies to Solve the Rubik's Cube: Experiments with Ideal and Approximate Performance Functions

Smith, Robert

dc.contributor.author	Smith, Robert
dc.date.accessioned	2016-08-26T15:50:57Z
dc.date.available	2016-08-26T15:50:57Z
dc.date.issued	2016-08-26T15:50:57Z
dc.identifier.uri	http://hdl.handle.net/10222/72115
dc.description.abstract	This work reports on an approach to direct policy discovery (a form of reinforcement learning) using genetic programming (GP) for the 3 x 3 x 3 Rubik's Cube. Specifically, a synthesis of two approaches is proposed: 1) a previous group-theoretic formulation is used to suggest a sequence of objectives for developing solutions to different stages of the overall task; and 2) a hierarchical formulation of GP policy search is utilized, in which policies adapted for an earlier objective are explicitly transferred to aid the construction of policies for the next objective. The resulting hierarchical organization of policies into a policy tree explicitly demonstrates task decomposition and policy reuse. Algorithmically, the process makes use of a recursive call to a common approach for maintaining a diverse population of GP individuals and then learns how to reuse subsets of programs (policies) developed against the earlier objective. Beyond specifying the two objectives, we do not explicitly identify how to decompose the task or mark specific policies for transfer. Moreover, at the end of evolution we return a population solving 100% of 17,675,698 different initial Cubes for the two objectives currently in use. A second set of experiments is then performed to qualify the relative contributions of two components to discovering policy trees: policy diversity maintenance and competitive coevolution. Both components prove to be fundamental. Removing support for either one limits performance to ~55% and ~23%, respectively.	en_US
dc.language.iso	en_US	en_US
dc.subject	Machine learning	en_US
dc.title	Evolving Policies to Solve the Rubik's Cube: Experiments with Ideal and Approximate Performance Functions	en_US
dc.date.defence	2016-08-23
dc.contributor.department	Faculty of Computer Science	en_US
dc.contributor.degree	Master of Computer Science	en_US
dc.contributor.external-examiner	n/a	en_US
dc.contributor.graduate-coordinator	Dr. Malcolm Heywood	en_US
dc.contributor.thesis-reader	Dr. Andrew McIntyre	en_US
dc.contributor.thesis-reader	Dr. Qigang Gao	en_US
dc.contributor.thesis-supervisor	Dr. Malcolm Heywood	en_US
dc.contributor.ethics-approval	Not Applicable	en_US
dc.contributor.manuscripts	Not Applicable	en_US
dc.contributor.copyright-release	Not Applicable	en_US

Files in this item

Name: Smith-Robert-MCSC-CSCI-August- ...
Size: 586.5Kb
Format: PDF
Description: Final Thesis PDF/A Submission ...

This item appears in the following Collection(s)

Faculty of Graduate Studies Online Theses