Predicting Day-Ahead Stock Returns using Search Engine Query Volumes: An Application of Gradient Boosted Decision Trees to the S&P 100

The internet has changed the way we live, work and make decisions. As it is the major modern resource for research, detailed data on internet usage contains vast amounts of behavioral information. This paper asks whether this information can be used to predict future returns of stocks on financial capital markets. In an empirical analysis it implements gradient boosted decision trees to learn relationships between abnormal returns of stocks within the S&P 100 index and lagged predictors derived from historical financial data, as well as search term query volumes on the internet search engine Google. Models predict the occurrence of day-ahead stock returns in excess of the index median. Over the period from 2005 to 2017, each of the disparate datasets contains valuable information. Evaluated models have average areas under the receiver operating characteristic curve of at least 54.2%, indicating a classification better than random guessing. In a simple statistical arbitrage strategy, the models are used to create daily trading portfolios of ten stocks, which result in annual performances of more than 57% before transaction costs. With ensembles of different datasets topping the performance ranking, the results further question the weak-form and semi-strong-form efficiency of modern financial capital markets. Even though transaction costs are not included, the approach adds to the existing literature. It gives guidance on how to use and transform data on internet usage behavior for financial and economic modeling and forecasting.
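The classification target and the portfolio step described above can be illustrated with a minimal sketch. It assumes that "index median" means the cross-sectional median of the constituents' next-day returns, and that the portfolio simply holds the ten stocks with the highest predicted probability; the abstract does not state either detail, nor whether short positions are taken. The function names, tickers, returns, and probabilities are all hypothetical, and the probabilities stand in for the output of a gradient boosted tree classifier that is not fitted here.

```python
from statistics import median


def label_above_median(next_day_returns):
    """Return 1 for each stock whose next-day return exceeds the
    cross-sectional median of that day, else 0 (assumed target definition)."""
    med = median(next_day_returns.values())
    return {ticker: int(r > med) for ticker, r in next_day_returns.items()}


def pick_top_k(predicted_prob, k=10):
    """Return the k tickers with the highest predicted probability of
    beating the median, highest first (ties broken alphabetically)."""
    ranked = sorted(predicted_prob, key=lambda t: (-predicted_prob[t], t))
    return ranked[:k]


# Toy data: hypothetical tickers and returns, not taken from the paper.
returns = {"AAA": 0.012, "BBB": -0.004, "CCC": 0.001, "DDD": 0.007, "EEE": -0.010}
labels = label_above_median(returns)

# Hypothetical classifier output, standing in for a fitted model's scores.
probs = {"AAA": 0.58, "BBB": 0.47, "CCC": 0.52, "DDD": 0.55, "EEE": 0.44}
portfolio = pick_top_k(probs, k=2)
```

With five toy stocks and `k=2`, `labels` marks only the two stocks strictly above the median return, and `portfolio` lists the two highest-scored tickers. In the setting described above, `k` would be ten and the universe would be the S&P 100 constituents.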


