Time and the Value of Data

03/17/2022
by Ehsan Valavi, et al.

Managers often believe that collecting more data will continually improve the accuracy of their machine learning models. However, we argue in this paper that when data lose relevance over time, it may be optimal to collect a limited amount of recent data instead of retaining an unbounded supply of older (less relevant) data. In addition, we argue that increasing the stock of data by including older datasets may, in fact, damage the model's accuracy. As expected, the model's accuracy improves as the flow of data (defined as the data collection rate) increases; however, this comes with tradeoffs, such as refreshing or retraining machine learning models more frequently. Using these results, we investigate how the business value created by machine learning models scales with data and when the stock of data establishes a sustainable competitive advantage. We argue that data's time-dependency weakens the barrier to entry that the stock of data creates. As a result, a competing firm equipped with a limited (yet sufficient) amount of recent data can develop more accurate models. This result, coupled with the fact that older datasets may degrade models' accuracy, suggests that the business value created does not scale with the stock of available data unless the firm offloads less relevant data from its data repository. Consequently, a firm's growth policy should balance the stock of historical data against the flow of new data. We complement our theoretical results with an experiment in which we empirically measure the loss in accuracy of a next-word prediction model trained on datasets from various time periods. Our empirical measurements confirm the economic significance of the value decline over time. For example, 100MB of text data, after seven years, becomes as valuable as 50MB of current data for the next-word prediction task.
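The closing example (100MB of seven-year-old text being worth about 50MB of current text) can be turned into a rough calculation. The sketch below is illustrative only: it assumes the value of data decays exponentially with age, which the abstract does not state, and the function names `implied_half_life` and `effective_size` are hypothetical helpers, not part of the paper.

```python
import math


def implied_half_life(original_mb: float, equivalent_mb: float, age_years: float) -> float:
    """Half-life in years implied by one (size, equivalent size, age) observation.

    ASSUMPTION: data value decays exponentially with age. The abstract only
    reports a single data point and does not specify the functional form.
    """
    ratio = equivalent_mb / original_mb
    return age_years * math.log(0.5) / math.log(ratio)


def effective_size(original_mb: float, age_years: float, half_life_years: float) -> float:
    """Amount of current data (MB) that aged data is worth under the same assumption."""
    return original_mb * 0.5 ** (age_years / half_life_years)


# The abstract's example: 100MB after seven years is worth 50MB of current data.
half_life = implied_half_life(100.0, 50.0, 7.0)
print(f"implied half-life: {half_life:.1f} years")

# Extrapolating the assumed curve to 14 years (not a result reported in the abstract).
print(f"100MB after 14 years: {effective_size(100.0, 14.0, half_life):.1f}MB equivalent")
```

Under this assumed curve the abstract's data point corresponds to a seven-year half-life. Any other decay shape would give different extrapolations, so the 14-year figure should be read as an illustration of the idea rather than a finding.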


