dc.contributor.author: Chen, Ying. [en_US]
dc.date.accessioned: 2014-10-21T12:35:22Z
dc.date.available: 2014-10-21T12:35:22Z
dc.date.issued: 2005 [en_US]
dc.identifier.other: AAINR08421 [en_US]
dc.identifier.uri: http://hdl.handle.net/10222/54753
dc.description: More and more organizations, such as businesses, health care providers, and scientific enterprises, rely on Online Analytical Processing (OLAP) to analyze massive data sets at a variety of summary levels and in a multidimensional way. In OLAP systems, one of the most computationally intensive tasks is executing the Cube query, which was proposed by Gray et al. in 1997 as an extension of the Structured Query Language (SQL). A cube query generates a set of group-bys (views) over all combinations of a set of attributes (dimensions) from a table. The result of the query is a collection of multidimensional data, called a Data Cube. Pre-computing data cubes can dramatically reduce the response time of other queries. Many sequential algorithms have recently been proposed to generate data cubes efficiently; however, as data sets grow, there is a need for even more scalable algorithms. Currently, for large data sets, cube queries may require hours or even days to run on standard sequential machines. Parallel computing can provide two key ingredients for dealing with large data sizes: (1) increased computational power through multiple processors and (2) increased I/O bandwidth through multiple parallel disks. [en_US]
dc.description: The work presented in this thesis combines (1) the design of efficient parallel cube generation algorithms for the three basic types of data cubes (full cubes, partial cubes, and iceberg cubes) with (2) careful system work associated with parallelism and external memory issues, and (3) extensive experiments and evaluation. The proposed algorithms are both external-memory and parallel. They are designed for shared-nothing clusters and use explicitly represented cost models, which aid in performance tuning and portability. Our experiments show that the relative speedup of the algorithms is close to optimal (linear) speedup for a wide range of input parameters, and that scalability is almost linear on large data sets. The proposed algorithms have been carefully implemented in our cgmOLAP prototype, which is, to our knowledge, the first fully functional parallel OLAP system able to build data cubes at a rate of more than half a terabyte per hour. [en_US]
dc.description: Thesis (Ph.D.)--Dalhousie University (Canada), 2005. [en_US]
dc.language: eng [en_US]
dc.publisher: Dalhousie University [en_US]
dc.subject: Computer Science. [en_US]
dc.title: Parallel generation of ROLAP data cubes. [en_US]
dc.type: text [en_US]
dc.contributor.degree: Ph.D. [en_US]
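
The abstract describes a cube query as producing one group-by (view) for every combination of a set of dimensions. The following is a minimal sequential sketch of that idea only, not the thesis's parallel or external-memory algorithms. The function name `full_cube`, the toy `sales` table, its column names, and the choice of SUM as the aggregate are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def full_cube(rows, dimensions, measure):
    """Naive full cube: one summed group-by per subset of `dimensions`.

    Returns a dict mapping each tuple of dimension names (a view) to a
    dict from key tuples to the summed measure. Illustrative only: it
    rescans the input once per view and keeps everything in memory.
    """
    cube = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


# Hypothetical toy table, not data from the thesis.
sales = [
    {"store": "A", "product": "pen", "year": 2004, "units": 3},
    {"store": "A", "product": "ink", "year": 2004, "units": 5},
    {"store": "B", "product": "pen", "year": 2005, "units": 2},
]

cube = full_cube(sales, ("store", "product", "year"), "units")
print(len(cube))          # 8 views for 3 dimensions (2**3)
print(cube[()])           # {(): 10}, the grand total
print(cube[("store",)])   # {('A',): 8, ('B',): 2}
```

With d dimensions the sketch produces 2^d views, which is why the number of group-bys, and the cost of computing them one at a time, grows quickly as dimensions are added.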