ON DEVELOPMENTAL VARIATION IN HIERARCHICAL SYMBIOTIC POLICY SEARCH
A hierarchical symbiotic framework for policy search with genetic programming (GP) is evaluated in two control-style temporal sequence learning domains. The symbiotic formulation assumes that each policy takes the form of a cooperative team of multiple symbiont programs. An initial cycle of evolution establishes a diverse range of host behaviours with limited capability. A second cycle then uses these initial policies as meta actions for reuse by symbiont programs. The relationship between development and ecology is explored by explicitly altering the interaction between learning agent and environment at fixed points throughout evolution. In both task domains, this developmental diversity significantly improves performance. Specifically, ecologies designed to promote good specialists in the first developmental phase, and good generalists thereafter, yield much stronger organisms in terms of generalization ability and efficiency. Conversely, when there is no diversity in the interaction between task environment and policy learner, the resulting hierarchy is neither as robust nor as general. The relative contribution of each cycle of evolution to the resulting hierarchical policies is measured from the perspective of multi-level selection. These multi-level policies are shown to be significantly better than the sum of their contributing meta actions.
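
The two-level structure described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this work. It assumes, beyond what the abstract states, that each symbiont pairs a bidding function with an action and that the highest bidder in a team decides the action. The names `Symbiont`, `Team`, `bid`, and `act` are hypothetical, and the lambdas stand in for evolved GP programs.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Union

State = Sequence[float]


@dataclass
class Symbiont:
    # Hypothetical stand-in for an evolved GP program that scores a state.
    bid: Callable[[State], float]
    # Either an atomic action (int) or a first-cycle team reused as a meta action.
    action: Union[int, "Team"]


@dataclass
class Team:
    symbionts: List[Symbiont]

    def act(self, state: State) -> int:
        # Assumption: the symbiont with the highest bid wins the decision.
        winner = max(self.symbionts, key=lambda s: s.bid(state))
        if isinstance(winner.action, Team):
            # Second-cycle symbiont: defer to the meta action it references.
            return winner.action.act(state)
        return winner.action


# First cycle: two simple teams over atomic actions 0, 1 and 2.
team_a = Team([Symbiont(lambda s: s[0], 0), Symbiont(lambda s: -s[0], 1)])
team_b = Team([Symbiont(lambda s: 1.0, 2)])

# Second cycle: a team whose symbionts choose between first-cycle teams.
top = Team([Symbiont(lambda s: s[1], team_a), Symbiont(lambda s: -s[1], team_b)])
```

Here `top.act(state)` first selects one of the first-cycle teams and then lets that team choose the atomic action, which is the sense in which earlier policies are reused as meta actions in this sketch.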