Repository logo
 

Data Compression Meets Automata Theory

Date

2024-02-07

Authors

Cotumaccio, Nicola

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

We introduce a new paradigm in graph compression and formal language theory. We show that the ideas behind some of the most important data structures for compressing and indexing strings - such as the suffix array, the Burrows-Wheeler Transform and the FM-index - are much more general and provide a new approach to studying automata and regular languages, which retrospectively explains the impact of these data structures. We classify all automata and all regular languages by their propensity to be sorted. Our classification represents a useful parameterization simultaneously for diverse automata-related measures: (i) the encoding bit-complexity of automata/labeled graphs, (ii) the complexity of operations on regular languages (e.g. membership) and on labeled graphs (e.g. pattern matching), (iii) the complexity of NFA determinization by the powerset-construction algorithm. To the best of our knowledge, ours is the only parameterization of automata/labeled graphs capturing simultaneously all these aspects. We show that our parameterization has deep and unexpected consequences both in data compression (encoding, pattern matching) and in automata theory (nondeterminism, entanglement, minimization).

Description

Keywords

Data Compression, Automata Theory, Burrows-Wheeler Transform, FM-index, Regular Languages, Nondeterminism, Minimization, Suffix Trees, Suffix Arrays

Citation