Engineering Algorithms for Solving Geometric and Graph Problems on Large Data Sets
Cosgaya Lozano, Adan Jose
MetadataShow full item record
This thesis focuses on the engineering of algorithms for massive data sets. In recent years, massive data sets have become ubiquitous and existing computing applications, for the most part, cannot handle these data sets efficiently: either they crash or their performance degrades to a point where they take unacceptably long to process the input. Parallel computing and I/O-efficient algorithms provide the means to process massive amounts of data efficiently. The work presented in this thesis makes use of these techniques and focuses on obtaining practically efficient solutions for specific problems in computational geometry and graph theory. We focus our attention first on skyline computations. This problem arises in decision-making applications and has been well studied in computational geometry and also by the database community in recent years. Most of the previous work on this problem has focused on sequential computations using a single processor, and the algorithms produced are not able to efficiently process data sets beyond the capacity of main memory. Such massive data sets are becoming more common; thus, parallelizing the skyline computation and eliminating the I/O bottleneck in large-scale computations is increasingly important in order to retrieve the results in a reasonable amount of time. Furthermore, we address two fundamental problems of graph analysis that appear in many application areas and which have eluded efforts to develop theoretically I/O-efficient solutions: computing the strongly connected components of a directed graph and topological sorting of a directed acyclic graph. To approach these problems, we designed algorithms, developed efficient implementations and, using extensive experiments, verified that they perform well in practice. Our solutions are based on well understood algorithmic techniques. The experiments show that, even though some of these techniques do not lead to provably efficient algorithms, they do lead to practically efficient heuristic solutions. In particular, our parallel algorithm for skyline computation is based on divide-and-conquer, while the strong connectivity and topological sorting algorithms use techniques such as graph contraction, the Euler technique, list ranking, and time-forward processing.