Show simple item record

dc.contributor.authorKashgouli, Sana
dc.date.accessioned2023-08-25T13:16:55Z
dc.date.available2023-08-25T13:16:55Z
dc.date.issued2023-08-23
dc.identifier.urihttp://hdl.handle.net/10222/82831
dc.description.abstractFor highly repetitive texts such as pangenomic databases, indexes based on grammars (or Lempel-Ziv parses, string attractors, etc.) are usually significantly smaller than indexes based on the Burrows-Wheeler Transform. Nevertheless, they are not widely used in practice, probably because research on grammar-based indexes has focused almost exclusively on exact pattern matching. In the first main part of this thesis, we describe a new tool, KATKA, that stores a phylogenetic tree T such that later, given a pattern P[1..m] and an integer k, it can quickly return the root of the smallest subtree of T containing all the genomes in which the k-mer P[i..i + k – 1] occurs, for 1 ≤ i ≤ m – k + 1. This is similar to the functionality of the popular metagenomic-classification tool KRAKEN but with k given at query time instead of at construction time. In the second main part, we show how, given positive constants ϵ and δ, and an α-balanced straight-line program with g rules for a text T[1..n], we can build an O(g)-space index that, given a pattern P[1..m], in O(mlogδ g) time finds with high probability a substring of P that occurs in T and whose length is at least a (1 – ϵ) fraction of the longest common substring of P and T. The correctness can be ensured within the same query time in expectation. KATKA is currently based on an LZ77-index, but we discuss how we may be able to implement it on top of a grammar-based index, taking advantage of our results in the second part. We close with a brief discussion of future work.en_US
dc.language.isoenen_US
dc.subjectIndexingen_US
dc.subjectpattern matchingen_US
dc.titleA Kraken-like tool with k given at query time and an index for finding approximately longest common substringsen_US
dc.date.defence2023-08-10
dc.contributor.departmentFaculty of Computer Scienceen_US
dc.contributor.degreeMaster of Computer Scienceen_US
dc.contributor.external-examinerNicola Prezzaen_US
dc.contributor.graduate-coordinatorMichael McAllisteren_US
dc.contributor.thesis-readerFinlay Maguireen_US
dc.contributor.thesis-supervisorTravis Gagieen_US
dc.contributor.ethics-approvalNot Applicableen_US
dc.contributor.manuscriptsYesen_US
dc.contributor.copyright-releaseNot Applicableen_US
 Find Full text

Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record