HIERARCHICAL LOCATION CLASSIFICATION OF TWITTER USERS WITH A CONTENT BASED PROBABILITY MODEL
Extracting geographical information from text is gaining importance due to the rapid growth of textual data on social media and the rise of location-based personalized services, which depend on knowing both a user's content and location. Existing approaches to predicting the location of Twitter users are driven by content alone and do not incorporate geographical information from geo-tagged tweets. We propose a hybrid approach that combines hierarchical location classification with tweet geo-location to predict a user's location from tweet content and metadata. Our approach uses an ensemble of content-based statistical classifiers trained on words, hashtags, and places, together with heuristic classifiers for place names and geo-coordinates in tweets, to predict locations at different granularities such as time zone, state, and city. Experimental results suggest that our hybrid approach achieves a city prediction accuracy of 70.7% for Twitter users and outperforms existing hierarchical location classification methods.
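To make the idea of hierarchical classification with a content-based ensemble concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes a multinomial naive Bayes model with add-one smoothing as the statistical classifier, an unweighted sum of log-scores as the ensemble rule, and only two feature types (words and hashtags); the heuristic place-name and geo-coordinate classifiers are omitted. The names `NaiveBayes`, `HIERARCHY`, `USERS`, `train`, and `predict`, as well as the toy tweets, are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy hierarchy: city -> (time zone, state).
HIERARCHY = {
    "Chicago": ("Central", "Illinois"),
    "Austin": ("Central", "Texas"),
    "Boston": ("Eastern", "Massachusetts"),
    "Miami": ("Eastern", "Florida"),
}
LEVELS = ("time_zone", "state", "city")
KINDS = ("words", "hashtags")

# Hypothetical training data: (concatenated tweet text, home city).
USERS = [
    ("deep dish pizza by the lake #cubs", "Chicago"),
    ("windy commute on the L #chitown", "Chicago"),
    ("tacos and live music #sxsw", "Austin"),
    ("barbecue downtown tonight #keepaustinweird", "Austin"),
    ("clam chowder near fenway #redsox", "Boston"),
    ("snow on the charles river #marathon", "Boston"),
    ("beach sunset and cuban coffee #heat", "Miami"),
    ("humid night on ocean drive #southbeach", "Miami"),
]

def path(city):
    """Return the (time zone, state, city) path for a city."""
    time_zone, state = HIERARCHY[city]
    return (time_zone, state, city)

def features(text):
    """Split text into word tokens and hashtag tokens."""
    tokens = text.lower().split()
    return {
        "words": [t for t in tokens if not t.startswith("#")],
        "hashtags": [t for t in tokens if t.startswith("#")],
    }

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (an assumed model)."""

    def __init__(self):
        self.counts = defaultdict(Counter)  # label -> token counts
        self.docs = Counter()               # label -> number of training users
        self.vocab = set()

    def fit(self, samples):
        for tokens, label in samples:
            self.counts[label].update(tokens)
            self.docs[label] += 1
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens, labels):
        """Log prior plus log likelihood for each candidate label."""
        total = sum(self.docs.values())
        scores = {}
        for label in labels:
            n = sum(self.counts[label].values())
            score = math.log((self.docs[label] + 1) / (total + len(self.docs)))
            for t in tokens:
                score += math.log((self.counts[label][t] + 1) / (n + len(self.vocab) + 1))
            scores[label] = score
        return scores

def train(users):
    """Fit one classifier per (hierarchy level, feature type) pair."""
    models = {}
    for depth, level in enumerate(LEVELS):
        for kind in KINDS:
            samples = [(features(text)[kind], path(city)[depth]) for text, city in users]
            models[level, kind] = NaiveBayes().fit(samples)
    return models

def predict(models, text):
    """Classify top-down, keeping only children of the label chosen above."""
    feats = features(text)
    candidates = list(HIERARCHY)  # cities consistent with the choices so far
    result = {}
    for depth, level in enumerate(LEVELS):
        labels = sorted({path(c)[depth] for c in candidates})
        combined = Counter()
        for kind in KINDS:  # ensemble: sum log-scores across feature types
            for label, s in models[level, kind].log_scores(feats[kind], labels).items():
                combined[label] += s
        best = max(labels, key=lambda label: combined[label])
        result[level] = best
        candidates = [c for c in candidates if path(c)[depth] == best]
    return result

models = train(USERS)
print(predict(models, "pizza after the game #cubs"))
```

On this toy data the final call resolves to the Central time zone, then Illinois, then Chicago. Restricting each level to the children of the previous prediction keeps the label space small at fine granularities, at the cost that an error at a coarse level cannot be corrected further down.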