
QTRB: TEAM-BASED REGION BUILDING USING Q-LEARNING TO DERIVE POLICY ON PROGRAMS PARAMETERIZED BY LOCAL REWARD SIGNAL

dc.contributor.author: Sealy, Noah
dc.contributor.copyright-release: Not Applicable [en_US]
dc.contributor.degree: Master of Computer Science [en_US]
dc.contributor.department: Faculty of Computer Science [en_US]
dc.contributor.ethics-approval: Not Applicable [en_US]
dc.contributor.external-examiner: n/a [en_US]
dc.contributor.graduate-coordinator: Dr. Michael McAllister [en_US]
dc.contributor.manuscripts: Not Applicable [en_US]
dc.contributor.thesis-reader: Dr. Vlado Keselj [en_US]
dc.contributor.thesis-reader: Dr. Garnett Wilson [en_US]
dc.contributor.thesis-reader: Dr. Dirk Arnold [en_US]
dc.contributor.thesis-supervisor: Dr. Malcolm Heywood [en_US]
dc.date.accessioned: 2023-04-26T14:35:24Z
dc.date.available: 2023-04-26T14:35:24Z
dc.date.defence: 2023-04-11
dc.date.issued: 2023-04-24
dc.description.abstract: While attempting to solve 2-dimensional grid-world maze tasks, it was observed that genetic programming is limited by its random initialization and by its lack of any use of local reward. This thesis proposes a hybrid algorithm called QTRB (team-based region building with Q-learning), which attempts to integrate genetic programming and reinforcement learning so that local reward can be used during evolution. During evolution, QTRB constructs programs directly from local environmental reward; the programs are then passed to a reinforcement learning agent, which learns from them as a model. QTRB was tested on 2-dimensional maze tasks of various sizes, under the hypothesis that a policy can be derived from an agent learning from this model. The results suggest that QTRB can derive a policy on the given tasks and that, as task size scales, it requires fewer direct environment queries than traditional Q-learning. [en_US]
dc.identifier.uri: http://hdl.handle.net/10222/82533
dc.language.iso: en [en_US]
dc.subject: genetic programming [en_US]
dc.subject: reinforcement learning [en_US]
dc.subject: local reinforcement [en_US]
dc.subject: hybrid algorithms [en_US]
dc.subject: qtrb [en_US]
dc.title: QTRB: TEAM-BASED REGION BUILDING USING Q-LEARNING TO DERIVE POLICY ON PROGRAMS PARAMETERIZED BY LOCAL REWARD SIGNAL [en_US]
dc.type: Thesis [en_US]

Files

Original bundle

Name: NoahSealy2023.pdf
Size: 14.82 MB
Format: Adobe Portable Document Format
Description: Main thesis

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission