Show simple item record

dc.contributor.author: Kong, Quan
dc.date.accessioned: 2014-08-18T12:08:12Z
dc.date.available: 2014-08-18T12:08:12Z
dc.date.issued: 2014-08-18
dc.identifier.uri: http://hdl.handle.net/10222/53799
dc.description.abstract: On-line Analytical Processing (OLAP) has been an important approach to the analysis of large structured data warehouse systems for many years. In contrast to On-line Transaction Processing (OLTP) queries, which typically access only a small portion of a data warehouse, OLAP queries may need to aggregate large portions of it, which often leads to performance bottlenecks. This problem is often compounded in real-time environments where new data may arrive frequently and at high velocity. One approach to addressing these performance challenges is to exploit collections of multi-core servers organized in cloud-based computing platforms. In this thesis, we explore the design, implementation and evaluation of two new real-time cloud-based OLAP systems. In the first, we introduce CR-OLAP, a scalable Cloud-based Real-time OLAP system that harnesses a new distributed index structure for OLAP, the distributed PDCR tree. CR-OLAP utilizes a scalable cloud infrastructure consisting of multiple commodity processors. With increasing database size, CR-OLAP dynamically increases the number of processors to maintain performance. Our distributed PDCR tree data structure supports multiple dimension hierarchies and efficient query processing on the elaborate dimension hierarchies that are central to OLAP systems. It is particularly efficient for complex OLAP queries that need to aggregate large portions of the data warehouse, such as "report the total sales in all stores located in California and New York during the months February-May of all years". We evaluated CR-OLAP on the Amazon EC2 cloud, using the TPC-DS benchmark data set. The tests demonstrate that CR-OLAP scales well with an increasing number of processors, even for complex queries. They also highlighted that future scalability is likely to be achieved only in an architecture that supports multiple coordinating servers.
Based on our experience with CR-OLAP, we then present VelocityOLAP (vOLAP), a second real-time OLAP system for high-velocity data in cloud environments. This system supports dimension hierarchies, is highly scalable, exploits both multi-core and multiprocessor parallelism, and guarantees strong serialization for both user and work group sessions. vOLAP is also built on the PDC-tree but supports multiple coordinating server processors and real-time dynamic load balancing. An experimental evaluation of our vOLAP prototype, using 18 worker instances for a database size of 1.5 billion items, shows that it is able to ingest new data items at a rate of over 600,000 items per second, and can process streams of interspersed inserts and OLAP queries in real time at a rate of approximately 200,000 queries per second. [en_US]
dc.language.iso: en [en_US]
dc.subject: Real-time, OLAP, Cloud, distributed PDCR-tree [en_US]
dc.title: Scalable real-time OLAP systems for the cloud [en_US]
dc.date.defence: 2014-08-08
dc.contributor.department: Faculty of Computer Science [en_US]
dc.contributor.degree: Master of Computer Science [en_US]
dc.contributor.external-examiner: n/a [en_US]
dc.contributor.graduate-coordinator: Dr. Evangelos E. Milios [en_US]
dc.contributor.thesis-reader: Dr. Peter Bodorik, Dr. Vlado Keselj [en_US]
dc.contributor.thesis-supervisor: Dr. Andrew Rau-Chaplin [en_US]
dc.contributor.ethics-approval: Not Applicable [en_US]
dc.contributor.manuscripts: Not Applicable [en_US]
dc.contributor.copyright-release: Not Applicable [en_US]