dc.contributor.author: Mei, Jie
dc.date.accessioned: 2017-04-11T14:08:44Z
dc.date.available: 2017-04-11T14:08:44Z
dc.date.issued: 2017-04-11T14:08:44Z
dc.identifier.uri: http://hdl.handle.net/10222/72844
dc.description.abstract: This thesis deals with the problem of error correction for text generated by Optical Character Recognition (OCR), or OCR post-processing: how to detect erroneous words in text produced by an OCR process and how to suggest the most appropriate candidates to correct them. The thesis demonstrates that OCR errors are inherently more protean and volatile than handwriting or typing errors, and that existing OCR post-processing approaches each have limitations. By analyzing recent developments in error correction techniques, we show that the compositional approach, which incorporates multiple correction inferences, is broadly researched and practically useful. We therefore propose an ensemble regression approach that combines correction inferences to rank correction candidates for complex OCR errors. On the practical side, we make available a benchmark dataset for this task and conduct a comprehensive performance analysis with different correction inferences and ensemble algorithms. In particular, the experimental results show that the proposed ensemble method is a robust approach that handles complex OCR errors and outperforms various baselines. [en_US]
dc.language.iso: en_US [en_US]
dc.subject: Error Correction [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Regression analysis [en_US]
dc.subject: OCR Post-processing [en_US]
dc.subject: Natural Language Processing [en_US]
dc.subject: Optical character recognition
dc.title: An Ensemble Regression Approach For OCR Error Correction [en_US]
dc.date.defence: 2017-03-27
dc.contributor.department: Faculty of Computer Science [en_US]
dc.contributor.degree: Master of Computer Science [en_US]
dc.contributor.external-examiner: n/a [en_US]
dc.contributor.graduate-coordinator: Alex Brodsky [en_US]
dc.contributor.thesis-reader: Vlado Keselj [en_US]
dc.contributor.thesis-reader: Abidalrahman Moh'd [en_US]
dc.contributor.thesis-supervisor: Evangelos Milios [en_US]
dc.contributor.ethics-approval: Not Applicable [en_US]
dc.contributor.manuscripts: Not Applicable [en_US]
dc.contributor.copyright-release: Not Applicable [en_US]