Designing and Developing Interactive Big Data Decision Support Systems for Performance, Scalability, Availability and Consistency
Big data decision support systems are used to extract meaning from extremely large data sets. Users rely on such systems to provide short, human-readable summaries that aid the decision-making process. An interactive big data decision support system must do all of this within seconds of a user request. This short response window promotes interactivity between the system and its user, enabling the user to make several ad hoc or follow-up queries shortly after receiving a response. In this thesis, we explore the design and development of interactive big data decision support systems that satisfy four key characteristics: performance, scalability, availability and consistency. We do this within the context of two applications. We first design and develop a novel interactive reinsurance portfolio analytics system. Our system runs on a cloud architecture and efficiently distributes work to achieve excellent scalability, scaling up to thousands of cores. To achieve high performance, the system processes all data entirely in memory. The system is made consistent by a decentralized data storage service that guarantees strong consistency for all input data. A queuing system that automatically retries failed tasks ensures that the system is highly available. In a comparison with one of the leading commercial portfolio analytics systems, our system performed approximately 50 times faster. Later, we further improve performance by caching intermediate results between portfolio analyses, allowing extremely complex location-level analytics queries to be processed in only 11 seconds. Without caching, the same queries would have to process hundreds of millions of transformations over terabytes of data. Our second application is Online Analytical Processing (OLAP), where we focus solely on data consistency.
We describe a method for quantifying consistency in distributed OLAP systems and present a corresponding Monte Carlo simulation to approximate the level of consistency for quorum-replicated OLAP systems, allowing users to explore their system's level of consistency under different usage scenarios. In a case study, we validate the accuracy of our simulation on a real, interactive OLAP system.
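As a rough illustration of the kind of Monte Carlo estimate described above, the sketch below approximates how often a read misses the latest write in a quorum-replicated store. It is not the simulation developed in the thesis. It assumes a deliberately simplified model in which each write reaches exactly w of n replicas chosen uniformly at random, each read contacts r replicas chosen the same way, and a read is stale when none of the contacted replicas holds the latest write. The function names and parameters are hypothetical.

```python
import random
from math import comb


def simulate_stale_read_rate(n, w, r, trials=100_000, seed=0):
    """Monte Carlo estimate of the fraction of reads that miss the latest write.

    Assumed toy model (not taken from the thesis): a write lands on w of n
    replicas, a read contacts r of n replicas, both chosen uniformly at random.
    """
    rng = random.Random(seed)
    replicas = range(n)
    stale = 0
    for _ in range(trials):
        written = set(rng.sample(replicas, w))  # replicas holding the newest value
        contacted = rng.sample(replicas, r)     # replicas the read consults
        if not any(rep in written for rep in contacted):
            stale += 1                          # read saw only old values
    return stale / trials


def exact_stale_read_rate(n, w, r):
    """Closed form for the same toy model, useful for checking the simulation."""
    # All r contacted replicas must fall among the n - w replicas without the write.
    return comb(n - w, r) / comb(n, r)


if __name__ == "__main__":
    for n, w, r in [(5, 1, 1), (5, 2, 2), (5, 3, 3)]:
        estimate = simulate_stale_read_rate(n, w, r)
        exact = exact_stale_read_rate(n, w, r)
        print(f"n={n} w={w} r={r} simulated={estimate:.4f} exact={exact:.4f}")
```

Under this model the stale-read rate is zero whenever r + w > n, because the read and write quorums must overlap. A more realistic simulation would also need to model update propagation delays and query arrival patterns, which this sketch omits.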