Prediction of Crime Occurrences: A data-driven approach for single domain and cross-domain learning
Bappee, Fateha Khanam
Nowadays, urban data such as demographics, infrastructure, and crime records are becoming more accessible to researchers. This has led to improvements in quantitative crime research, such as identifying factors that contribute to criminal activities. However, data from smaller cities are not as available or comprehensive. Applying the same research techniques to both large urban regions and smaller domains is difficult due to nonlinear connections and data dependencies. To address this challenge, we examine an extensive set of features linked to different domains from various perspectives and provide explanations for each link. Our study aims to build data-driven models for predicting future crime occurrences. We first examine the geographic aspect of crime by focusing on a single domain, the city of Halifax, Nova Scotia. We apply a reverse geocoding technique to retrieve spatial information from OpenStreetMap and propose a density-based spatial clustering algorithm to generate crime hotspots. A spatial distance feature is then computed from the locations of hotpoints extracted from the hotspots for different types of crime. Next, we combine Internet of Things (IoT) and social media data and explore the smart city context, which is likely to provide a large volume of heterogeneous, city-relevant data in the near future. We propose employing streetlight infrastructure and Foursquare data along with demographic characteristics to improve crime prediction. Finally, we address the same task from a cross-domain perspective to tackle the data insufficiency problem in a small city. We create a uniform outline for all geographic regions in Halifax by adapting and learning knowledge from two other domains (Toronto and Vancouver), whose data follow distributions that are different from, but related to, that of Halifax. To transfer knowledge between the source and target domains, we propose applying instance-based transfer learning settings.
Each setting is designed to learn knowledge from a seasonal perspective using cross-domain data fusion. We choose ensemble learning methods for model building because of their ability to generalize to new data. We evaluate the classification performance for both single-domain and multi-domain representations and compare the results with baseline models. Our findings demonstrate the effectiveness of integrating diverse sources of data to achieve satisfactory classification performance.
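
The hotspot and spatial distance steps described in the abstract can be illustrated with a minimal sketch. It is not the algorithm proposed in the thesis: it substitutes plain DBSCAN for the proposed density-based clustering, assumes a hotpoint is the centroid of a hotspot cluster, and uses made-up toy coordinates near Halifax. The function names and the `eps_km` and `min_pts` values are hypothetical.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def dbscan(points, eps_km, min_pts):
    """Plain DBSCAN (a stand-in, not the thesis algorithm); -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps_km of point i (including i itself).
        return [j for j, q in enumerate(points)
                if haversine_km(points[i], q) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbours(j)
            if len(nj) >= min_pts:   # core point: keep expanding
                queue.extend(nj)
    return labels

def hotpoints(points, labels):
    """Assumed definition: one hotpoint per cluster, taken as its centroid."""
    groups = {}
    for p, lab in zip(points, labels):
        if lab != -1:
            groups.setdefault(lab, []).append(p)
    return [(sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
            for _, g in sorted(groups.items())]

def distance_feature(location, hot):
    """Distance in kilometres from a location to its nearest hotpoint."""
    return min(haversine_km(location, h) for h in hot)

# Toy incident coordinates (illustrative only, not real crime data).
crimes = [(44.6480, -63.5750), (44.6482, -63.5752), (44.6485, -63.5749),
          (44.6700, -63.6100), (44.6702, -63.6103), (44.6698, -63.6099),
          (44.7000, -63.5000)]
labels = dbscan(crimes, eps_km=0.2, min_pts=3)
hot = hotpoints(crimes, labels)
feature = distance_feature((44.6600, -63.5900), hot)
```

In practice the clustering would be run separately per crime type, so that each region receives one distance feature per type of crime.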