Scaling Genetic Programming to Challenging Reinforcement Tasks through Emergent Modularity
MetadataShow full item record
Algorithms that learn through environmental interaction and delayed rewards, or reinforcement learning, increasingly face the challenge of scaling to dynamic, high-dimensional environments. Video games model these types of real-world decision-making and control scenarios while being simple enough to implement within experiments. This work demonstrates how emergent modularity and open-ended evolution allow genetic programming (GP) to discover strategies for difficult gaming scenarios while maintaining relatively low model complexity. Two related learning algorithms are considered: Policy Trees and Tangled Program Graphs (TPG). In the case of Policy Trees, a methodology for transfer learning is proposed which specifically leverages both structural and behavioural modularity in the learner representation. The utility of the approach is empirically evaluated in two challenging task domains: RoboCup Soccer and Ms. Pac-Man. In RoboCup, decision-making policies are first evolved for simple subtasks and then reused within a policy hierarchy in order to learn the more complex task of Half-Field Offense. The same methodology is applied to Ms. Pac-Man, in which case the use of task-agnostic diversity maintenance enables the automatic discovery of suitable sub-policies, removing the need for a prior human-specified task decomposition. In both task domains, the final GP decision-making policies reach state-of-the-art levels of play while being significantly less complex than solutions from temporal difference methods and neuroevolution. TPG takes a more open-ended approach to modularity, emphasizing the ability to adaptively complexify policies through interaction with the task environment. The challenging Atari video game environment is used to show that this approach builds decision-making policies that broadly match the quality of several deep learning methods while being several orders of magnitude less computationally demanding, both in terms of sample efficiency and model complexity. Finally, the approach is capable of evolving solutions to multiple game titles simultaneously with no additional computational cost. In this case, agent behaviours for an individual game as well as single agents capable of playing up to 5 games emerge from the same evolutionary run.