
dc.contributor.author: Maler, Adrian
dc.date.accessioned: 2023-04-17T12:10:31Z
dc.date.available: 2023-04-17T12:10:31Z
dc.date.issued: 2023-04-14
dc.identifier.uri: http://hdl.handle.net/10222/82418
dc.description.abstract: In artificial intelligence, common sense refers to simple acts of verbal reasoning. The Winograd Schema Challenge (WSC), an important test of common sense, was recently defeated by transformer-based language models. We investigate the implications of that defeat: have language models achieved common sense, or is the challenge flawed? That is, we consider the problem of reevaluating verbal reasoning in language models. We evaluate the accuracy and consistency on Winograd schemas of three important pretrained models: GPT-2, RoBERTa, and T5. We generalize the Winograd schema to a larger class of problems, called adversarial schemas, and propose an evaluation protocol for them that incorporates consistency. We create a new test of common-sense verbal reasoning made up of our adversarial schemas. Each model performs significantly worse on our test than on WSC, and no model exhibits high consistency. We find no convincing evidence of verbal reasoning by language models. [en_US]
dc.language.iso: en [en_US]
dc.subject: computer science [en_US]
dc.subject: artificial intelligence [en_US]
dc.subject: natural language processing [en_US]
dc.subject: machine learning [en_US]
dc.subject: language model [en_US]
dc.subject: common-sense reasoning [en_US]
dc.subject: Winograd Schema Challenge [en_US]
dc.title: Evaluating Common-Sense Reasoning in Pretrained Transformer-Based Language Models Using Adversarial Schemas and Consistency Metrics [en_US]
dc.date.defence: 2023-04-05
dc.contributor.department: Faculty of Computer Science [en_US]
dc.contributor.degree: Master of Computer Science [en_US]
dc.contributor.external-examiner: n/a [en_US]
dc.contributor.graduate-coordinator: Michael McAllister [en_US]
dc.contributor.thesis-reader: Darren Abramson [en_US]
dc.contributor.thesis-reader: Dirk Arnold [en_US]
dc.contributor.thesis-supervisor: Vlado Keselj [en_US]
dc.contributor.ethics-approval: Not Applicable [en_US]
dc.contributor.manuscripts: Not Applicable [en_US]
dc.contributor.copyright-release: Not Applicable [en_US]