Please be advised that DalSpace will be unavailable from June 19 to July 7 for a system migration and upgrade. Graduate students who are required to submit their thesis during this period are asked to contact thesis.review@dal.ca, for instructions on how to proceed. For all other submissions, please return on July 7 to upload your material. Starting on July 7, the new URL for DalSpace will be dal.scholaris.ca . Thank you for your patience.
Repository logo

Data Compression Meets Automata Theory

Loading...
Thumbnail Image

Authors

Cotumaccio, Nicola

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

We introduce a new paradigm in graph compression and formal language theory. We show that the ideas behind some of the most important data structures for compressing and indexing strings - such as the suffix array, the Burrows-Wheeler Transform and the FM-index - are much more general and provide a new approach to studying automata and regular languages, which retrospectively explains the impact of these data structures. We classify all automata and all regular languages by their propensity to be sorted. Our classification represents a useful parameterization simultaneously for diverse automata-related measures: (i) the encoding bit-complexity of automata/labeled graphs, (ii) the complexity of operations on regular languages (e.g. membership) and on labeled graphs (e.g. pattern matching), (iii) the complexity of NFA determinization by the powerset-construction algorithm. To the best of our knowledge, ours is the only parameterization of automata/labeled graphs capturing simultaneously all these aspects. We show that our parameterization has deep and unexpected consequences both in data compression (encoding, pattern matching) and in automata theory (nondeterminism, entanglement, minimization).

Description

Keywords

Data Compression, Automata Theory, Burrows-Wheeler Transform, FM-index, Regular Languages, Nondeterminism, Minimization, Suffix Trees, Suffix Arrays

Citation