Query-Driven Large-Scale Portfolio Aggregate Risk Analysis on MapReduce
Abstract
Modern reinsurance companies use stochastic simulation techniques for portfolio risk analysis, often referred to as aggregate risk analysis, to support risk management. Their risk portfolios may consist of thousands of reinsurance contracts covering millions of individually insured locations. To quantify risk and to help ensure capital adequacy, each portfolio must be evaluated in large-scale simulation trials, each capturing a different possible sequence of catastrophic events (e.g., earthquakes and hurricanes) over the course of a contractual year. In practice, the volume of data and computation involved makes high-performance parallel computing an attractive way to accelerate the analysis.
In this thesis, we explore the design of QuPARA, a framework that exploits parallelism to perform aggregate risk analysis on distributed systems using the MapReduce programming model. The goal is to provide a flexible framework that analysts can use to answer a wide variety of unanticipated but natural ad hoc queries, helping them better understand the multiple dimensions of risk that can affect portfolio performance and thus company solvency.
The QuPARA framework was implemented using Apache Hadoop, Apache Hive, and Pentaho. This prototype allows the user to take advantage of large parallel servers in order to answer ad hoc risk analysis queries efficiently even on large data sets. We also present data structure optimizations and tuning that greatly accelerate QuPARA's computation. The performance of the prototype system is competitive with highly tuned production systems that are only capable of answering a narrow set of portfolio queries, in contrast to the wide range of ad hoc queries QuPARA is able to resolve.
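As a rough illustration of the kind of computation described above, the following single-process Python sketch mimics a MapReduce-style aggregation over simulation trials. It is not QuPARA's implementation: the trial data, the per-event loss table, the function names, and the choice of average annual loss as the example query are all assumptions made for illustration, and contract terms are ignored entirely.

```python
from collections import defaultdict

# Hypothetical input: trial id -> sequence of catastrophic event ids
# occurring within one simulated contractual year.
TRIALS = {
    1: ["EQ1", "HU2"],
    2: ["HU2"],
    3: [],
    4: ["EQ1", "EQ1", "FL3"],
}

# Hypothetical portfolio loss caused by each event (contract terms ignored).
EVENT_LOSS = {"EQ1": 100.0, "HU2": 40.0, "FL3": 10.0}


def map_trial(trial_id, events):
    """Map step: emit one (trial_id, loss) pair per event occurrence."""
    for event_id in events:
        yield trial_id, EVENT_LOSS.get(event_id, 0.0)


def reduce_trial(losses):
    """Reduce step: combine all losses of one trial into a yearly loss."""
    return sum(losses, 0.0)


def per_trial_losses(trials):
    """Run map, group by key (the 'shuffle'), then reduce, all in one process."""
    grouped = defaultdict(list)
    for trial_id, events in trials.items():
        grouped[trial_id]  # keep trials with no events, with an empty loss list
        for key, loss in map_trial(trial_id, events):
            grouped[key].append(loss)
    return {trial_id: reduce_trial(losses) for trial_id, losses in grouped.items()}


def average_annual_loss(trial_losses):
    """Example query: mean yearly loss across all simulated trials."""
    return sum(trial_losses.values()) / len(trial_losses)


if __name__ == "__main__":
    losses = per_trial_losses(TRIALS)
    print(losses)
    print(average_annual_loss(losses))
```

In a real MapReduce deployment the map and reduce functions would run on many machines and the grouping would be done by the framework; here the grouping is a plain dictionary so the data flow stays visible.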