Data Mining in Social Media for Stock Market Prediction
In this thesis, machine learning algorithms are used in NLP to get the public sentiment on individual stocks from social media in order to study its relationship with the stock price change. The NLP approach of sentiment detection is a two-stage process by implementing Neutral v.s. Polarized sentiment detection before Positive v.s. Negative sentiment detection, and SVMs are proved to be the best classifiers with the overall accuracy rates of 71.84% and 74.3%, respectively. It is discovered that users’ activity on StockTwits overnight significantly positively correlates to the stock trading volume the next business day. The collective sentiments for afterhours have powerful prediction on the change of stock price for the next day in 9 out of 15 stocks studied by using the Granger Causality test; and the overall accuracy rate of predicting the up and down movement of stocks by using the collective sentiments is 58.9%.