Range Searching Data Structures with Cache Locality
MetadataShow full item record
This thesis focuses on range searching data structures, an elementary problem in computational geometry with research spanning decades. These problems often involve very large data sets. Processor speeds increase faster than memory speeds, thus the gap between the rate at which CPUs can process data and the rate at which it can be retrieved is increasing. To bridge this gap, various levels of cache are used. Since cache misses are costly, algorithms should be cache-friendly. The input-output (I/O) model was the first model for constructing cache-efficient algorithms, focusing on a two-level memory hierarchy. Algorithms for this model require manual tuning to determine optimal values for hardware dependent parameters, and are only optimal at a single level of a memory hierarchy. Cache-oblivious (CO) algorithms are built without knowledge of the hierarchy, allowing them to be optimal across all levels at once. There exist strong theoretical and practical results for I/O-efficient range searching. Recently, the CO model has received attention, but range searching remains poorly understood. This thesis explores data structures for CO range counting and reporting. It presents the first space and worst-case query-time optimal approximate range counting structure for a family of related problems, and associated O(N log N)-space query-optimal reporting structures. The approximate counting structure is the first of its kind in internal memory, I/O and CO models. Researchers have been trying to create linear-space query-optimal CO reporting structures. This thesis shows that for a variety of problems, linear space is in fact impossible. Heuristics are also used for building cache-friendly algorithms. Space-filling curves are continuous functions mapping multi-dimensional sets into one-dimensional ones. They are used to build search structures in the hopes that objects that were close in the original space remain close in the resulting ordering. This results in queries incurring fewer page swaps when traversing the structure. The Hilbert curve is notably good at this, but often imposes a space or time penalty. This thesis introduces compact Hilbert indices, which remove the ineffiency inherent for input point sets with bounding boxes smaller than their bounding hypercubes.