dc.contributor.author | Brown, Nathaniel | |
dc.date.accessioned | 2023-04-17T12:56:40Z | |
dc.date.available | 2023-04-17T12:56:40Z | |
dc.date.issued | 2023-04-14 | |
dc.identifier.uri | http://hdl.handle.net/10222/82419 | |
dc.description.abstract | The FM-index is used in many bioinformatics applications; however, it cannot handle large text collections. This is of particular concern due to reference bias; references fail to capture the required genetic diversity. A recently successful solution has been FM-indexes using the RLBWT and many references, but these traditionally rely on worst case logarithmic time rank operations. Nishimoto and Tabei improved this result to answer queries by computing last-to-first (LF) steps in constant-time. We show that this result can be made practical for LF using “table-lookup” on permutations. We show how to compress columns of the table to support count queries, competitive in time/space with the best existing implementations. Matching statistics describe exact matching substrings of a pattern, and computing them relies on LF, thresholds, and LCE queries. We show how storing additional LCE information alongside thresholds allows us to improve the speed of computing matching statistic for pan-genomic datasets. | en_US |
dc.language.iso | en | en_US |
dc.subject | Data Structures | en_US |
dc.subject | Computational Genomics | en_US |
dc.subject | Algorithms | en_US |
dc.subject | Text Indexes | en_US |
dc.title | BWT-runs Compressed Data Structures for Pan-Genomics Text Indexing | en_US |
dc.type | Thesis | en_US |
dc.date.defence | 2023-04-12 | |
dc.contributor.department | Faculty of Computer Science | en_US |
dc.contributor.degree | Master of Computer Science | en_US |
dc.contributor.external-examiner | n/a | en_US |
dc.contributor.graduate-coordinator | Michael McAllister | en_US |
dc.contributor.thesis-reader | Christopher Whidden | en_US |
dc.contributor.thesis-reader | Meng He | en_US |
dc.contributor.thesis-supervisor | Travis Gagie | en_US |
dc.contributor.ethics-approval | Not Applicable | en_US |
dc.contributor.manuscripts | Yes | en_US |
dc.contributor.copyright-release | Not Applicable | en_US |