Yield forecasting with machine learning and small data: what gains for grains?
Forecasting crop yields is important for food security, in particular to predict where crop production is likely to drop. Climate records and remotely sensed data have become instrumental sources of data for crop yield forecasting systems. Similarly, machine learning methods are increasingly used to process big Earth observation data. However, access to the data needed to train such algorithms is often limited in food-insecure countries. Here, we evaluate the performance of machine learning algorithms trained on small data to forecast yield on a monthly basis between the start and the end of the growing season. To do so, we developed a robust and automated machine learning pipeline that selects the best features and model for prediction. Taking Algeria as a case study, we predicted national yields for barley, soft wheat and durum wheat with an accuracy of 0.16-0.2 t/ha (13-14%). The best machine learning models always outperformed simple benchmark models. This was confirmed in low-yielding years, which is particularly relevant for early warning. Nonetheless, the differences in accuracy between machine learning and benchmark models were not always of practical significance. Moreover, the benchmark models outperformed up to 60% of the models that were tested, which stresses the importance of proper model calibration and selection. For crop yield forecasting, as for many application domains, machine learning has delivered significant improvements in predictive power. Nonetheless, superiority over simple benchmarks is often only fully achieved after extensive calibration, especially when dealing with small data.
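The following minimal sketch makes the comparison between machine learning models and simple benchmarks concrete. It assumes a leave-one-year-out evaluation, a benchmark that predicts the mean yield of the training years, and single-feature least-squares regressions as the candidate models. The yields and the predictors `rain_index` and `ndvi_index` are synthetic, hypothetical values and do not come from the study. The setup illustrates the general idea only and is not the authors' pipeline.

```python
import math
from statistics import mean

# Synthetic toy data for illustration only (NOT data from the study):
# national yield (t/ha) per year and two hypothetical predictors.
YIELDS = [1.2, 1.5, 0.9, 1.6, 1.4, 1.0, 1.7, 1.3]
FEATURES = {
    "rain_index": [2.1, 2.9, 1.5, 3.2, 2.6, 1.7, 3.4, 2.4],
    "ndvi_index": [0.41, 0.40, 0.39, 0.45, 0.38, 0.44, 0.42, 0.40],
}


def fit_ols(x, y):
    """Fit y = a + b*x by ordinary least squares and return (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx if sxx else 0.0
    return my - b * mx, b


def loyo_rmse(predict, n):
    """Leave-one-year-out RMSE; predict(train_idx, test_idx) returns a forecast."""
    errors = []
    for test in range(n):
        train = [i for i in range(n) if i != test]
        errors.append(predict(train, test) - YIELDS[test])
    return math.sqrt(mean(e * e for e in errors))


def benchmark(train, test):
    # Simple benchmark: average yield of the training years.
    return mean(YIELDS[i] for i in train)


def make_regression(feature):
    x = FEATURES[feature]

    def predict(train, test):
        # Refit the regression on the training years only, then forecast the held-out year.
        a, b = fit_ols([x[i] for i in train], [YIELDS[i] for i in train])
        return a + b * x[test]

    return predict


candidates = {"benchmark_mean": benchmark}
candidates.update({f"ols_{name}": make_regression(name) for name in FEATURES})

n = len(YIELDS)
scores = {name: loyo_rmse(fn, n) for name, fn in candidates.items()}
best = min(scores, key=scores.get)  # model selection by cross-validated RMSE

for name, rmse in sorted(scores.items(), key=lambda kv: kv[1]):
    rel = 100 * rmse / mean(YIELDS)  # error as a percentage of mean yield
    print(f"{name:16s} RMSE={rmse:.3f} t/ha ({rel:.0f}% of mean yield)")
print("selected:", best)
```

The same folds are used both to pick the winner and to report its error, so the RMSE of the selected model is optimistic. An honest estimate would require nested cross-validation, which echoes the point that careful model selection matters when data are small.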