Natural resources and bioeconomy studies 80/2021 CropYield—Towards pre- harvest crop yield forecasts with satellite remote sensing Final report Maria Yli-Heikkilä, Samantha Wittke, Mirva Kokkinen and Anneli Partala Natural Resources Institute Finland, Helsinki 2021 Natural resources and bioeconomy studies 80/2021 CropYield—Towards pre- harvest crop yield forecasts with satellite remote sensing Final report Maria Yli-Heikkilä, Samantha Wittke, Mirva Kokkinen and Anneli Partala Recommended citation: Yli-Heikkilä, M., Wittke, S., Kokkinen, M. & Partala, A. 2021. CropYield—Towards pre-harvest crop yield forecasts with satellite remote sensing : Final report. Natural resources and bio- economy studies 80/2021. Natural Resources Institute Finland. Helsinki. 28 p. The project was funded by the European Union (grant 831735 – 2018-FI-CROPYIELD). Maria Yli-Heikkilä, ORCID ID, https://orcid.org/0000-0003-1528-7246 ISBN 978-952-380-307-7 (Print) ISBN 978-952-380-308-4 (Online) ISSN 2342-7647 (Print) ISSN 2342-7639 (Online) URN http://urn.fi/URN:ISBN:978-952-380-308-4 Copyright: Natural Resources Institute Finland (Luke) Authors: Maria Yli-Heikkilä, Samantha Wittke, Mirva Kokkinen and Anneli Partala Publisher: Natural Resources Institute Finland (Luke), Helsinki 2021 Year of publication: 2021 Cover photo: Anneli Partala Printing house and publishing sales: Juvenes Print, http://luke.juvenesprint.fi Natural resources and bioeconomy studies 80/2021 3 Summary Maria Yli-Heikkilä1, Samantha Wittke2,3, Mirva Kokkinen1 and Anneli Partala1 1) Natural Resources Institute Finland (LUKE), Latokartanonkaari 9, FI-00790 Helsinki, Finland maria.yli-heikkila@luke.fi 2) National Land Survey of Finland 3) Aalto University, Finland The general objective of this project was to enhance the crop statistics. To this end, we estab- lished a pilot case for an automated process for improved crop yield statistics by merging Earth observation (EO) data, the administrative data, agro-meteorological data and historical crop statistics survey data. The significance of the approach is that the previously very laborious data acquisition process from different sources and the processing of multistep modelling is now by design fully automated, and can thus reduce spending on professional surveying. The main achievement is that a new artificial intelligence-based crop yield forecasting system can pro- duce pre-harvest yield predictions for four main cereals (oats, barley, wheat, rye). Surveys are very costly in terms of time and expense. The same is true of gathering expert estimates on regional crop yield forecasts. During the last decade, EO systems have been shown to provide an effective means for large-scale crop monitoring and yield estimations. In this sense, this project has fulfilled its promise to establish a pilot case for an automated pro- cess of improved crop yield statistics by merging EO data and a data-driven modelling ap- proach. As a result, we can produce several in-season crop yield forecasts, the first already in late June, around the same time as the Joint Research Centre’s European-wide forecast. From then on, the forecasts can be published, for example, at 10 day-intervals. The machine learning models implemented in this project achieved a highly promising level of accuracy in pre-harvest yield predictions for four main cereals (oats, barley, wheat, and rye) when compared to the Joint Research Centre’s and LUKE’s seasonal forecasts. However, the problem of choosing the best model remains. There was no clear winning model that reliably predicts yields at all times. Therefore, a model comparison will be the most important devel- opmental task ahead. In the context of agricultural statistics, more accurate in-season forecasts of crop yields benefit sustainable agriculture and food security with better informed political decisions. In addition, reliable crop forecasts have market impacts. Moreover, EO-based applications can be globally applicable. We expect that within a few years our EO-based crop forecasting will be proven to be a sound method to replace in-season regional expert estimates, and in the foreseeable future it will also gradually replace annual farm surveys. The uptake of EO as a new data source in statistical production was more complex than initially expected. There are a myriad of approaches to monitoring crop yields, the main decisions to make being whether to utilise: i) optical or radar satellite data or both, ii) image mosaics or single images, iii) pixel-based or object-based image analysis. In addition, remote sensing re- quires specialized expertise, not to mention the specialized expertise needed in predictive modelling. We acknowledged the lack of remote sensing expertise and made a decision at the start to outsource the pre-processing of satellite images. With outsourcing the sustainability of the project may be jeopardized if the know-how outsourced cannot be fully transferred to the statistical production. In this sense, one significant achievement in the project has been the Natural resources and bioeconomy studies 80/2021 4 uptake of EO knowledge, with a substantial contribution from the National Land Survey of Fin- land, which became a sound part of our production system. As a result, we have the in-house readiness to apply EO as a new data source also to other statistical themes. It was concluded that country-wide forecasts seemed to work already in June, probably due to the inherited sampling weights from the crop production surveys. However, for the regional forecasts the sampling data was inadequate. For regional forecasts, we would need to sample fields to gain an equal spatial coverage. Moreover, for the northern regions the crop forecast- ing is reasonable only from July on due to the later sowing dates. Therefore, further study is needed to evaluate the best physiologically grounded observation window for each region. Deploying the forecasting pipeline requires further automatization. Especially at the end of the pipeline the validation of the results needs further scrutiny. Uploading the predictions to sta- tistical production databases requires modifications to existing ICT-systems. In addition, pre- diction model architectures need to be revised and improved along with the new data from the coming years. Keywords: Earth observation, remote sensing, crop yield, agricultural statistics, machine learning Natural resources and bioeconomy studies 80/2021 5 Contents 1. Introduction ............................................................................................................. 6 2. Model evaluation .................................................................................................... 7 2.1. Research plan(s) ............................................................................................................................................... 7 2.1.1. Reshaping time series into analysis ready data .......................................................................... 8 2.1.2. Ancillary meteorological data ......................................................................................................... 11 2.1.3. Alternative feature engineering approach ................................................................................ 11 2.2. Modelling ........................................................................................................................................................ 11 2.2.1. Vanilla Recurrent Neural Network (RNN) .................................................................................. 11 2.2.2. Long-Short Term Memory Network (LSTM) ............................................................................. 11 2.2.3. Random Forest (RF) ............................................................................................................................ 11 2.3. Results ........................................................................................................................................................... 12 2.3.1. Seasonal forecast for the year 2020 ............................................................................................. 12 2.3.2. Testing alternative feature engineering and meteorological features ........................... 13 2.3.3. Model comparison .............................................................................................................................. 15 2.4. Seasonal forecasting assessment .......................................................................................................... 17 3. Crop production survey as a reference .............................................................. 19 3.1. Pre-processing .............................................................................................................................................. 19 3.2. Representativeness of the reference data .......................................................................................... 19 4. Data processing pipeline ...................................................................................... 24 4.1. Single images ................................................................................................................................................ 24 4.2. Image mosaics .............................................................................................................................................. 25 4.3. Processing time for forecasting .............................................................................................................. 25 5. Practical implications to statistical production ................................................. 26 6. Conclusions ............................................................................................................ 27 References .................................................................................................................... 28 Natural resources and bioeconomy studies 80/2021 6 1. Introduction The Earth observation (EO) systems provide an effective means for large-scale crop monitoring and yield estimation. The current radar and optical remote sensing satellite imagery has a fairly high spatial and temporal resolution and contains a wealth of information on vegetation growth. In the near future new missions are planned for launch. Agricultural monitoring has long utilised satellite remote sensing to estimate product yields. Traditionally, crop yield fore- casting has been based on a single or an ensemble of physiological models of crop growth. For example, process-based crop growth models such as the Joint Research Centre’s (JRC) MARS Crop Yield Forecasting System utilise satellite-derived data as a complementary source of information. With the increasing volume and variety of remote sensing data, there is a need to develop a fully data-driven model for crop yield forecasting. In this project, we established a pilot case for an automated process for improved crop yield statistics by merging EO data, administrative data, agro-meteorological data, and historical crop statistics survey data. The novelty of the approach is that the previously very laborious data acquisition from different sources and pro- cessing of multistep modelling is now by design fully automated, and thus, can reduce spend- ing on professional surveying. The new artificial intelligence-based crop yield forecasting sys- tem can produce pre-harvest yield predictions for four main cereals (oats, barley, wheat, rye). In Finland, the growing season starts from early May till harvesting time in late August or Sep- tember. Currently, crop yield forecasts are produced twice: at the end of July and August. The in-season forecasts are based on expert estimates from the regional agricultural advisory bod- ies. The forecasts are published on a national level. The final statistics based on the Finnish crop production survey are published in February–March the following year. The final statistics are published on a regional level (similar to NUTS3). The Finnish crop production survey is based on stratified sampling and is conducted by the Natural Resources Institute Finland (LUKE). In this project, we produced crop yield forecasts for June, July, August and finally for a complete time series at the end of the growing season in September. With a fully automated forecasting system, we can produce the first forecast in late June, which is around the same time as JRC’s European-wide forecast. Forecasts can be published, for example, at 10 day-intervals. This report describes the forecasting system in detail. In the final phase of the project, we pro- duced forecasts for the year 2020. The model was trained with the data from the previous years: 2016–2019. Here, we report the processing times and amount of data for the year 2020 only. For data storage and computing we used the Puhti supercomputer hosted by the CSC – IT Center of Science, which is a center of expertise in information technology owned by the Finn- ish state and higher education institutions. Natural resources and bioeconomy studies 80/2021 7 2. Model evaluation This section describes the model development and results in detail. Figure 1 below serves as a graphical abstract of the model development. Figure 1. The scheme of the model development. 2.1. Research plan(s) According to our initial research plan (A), crop yields would be predicted from a sequence of remotely sensed multispectral images, where each image corresponds to a different day of the year (DOY) within a growing season. In the course of the project, we also established a new research line (B) based on mosaicked observations of indices. Figure 2 illustrates the two lines of research. Natural resources and bioeconomy studies 80/2021 8 Figure 2. Two lines of research were followed. Plan A is based on single multispectral images from Sentinel-2. Plan B. is based on mosaicked observations of three normalized difference indices. 2.1.1. Reshaping time series into analysis ready data Multispectral images from Sentinel-2 have 13 bands, of which 10 are usable for crop monitor- ing. Our initial hypothesis was that data from all available images and all bands should be utilized (plan A) as compared to the traditional approach to compress the band information into indices and to stack observations into fixed-length time windows (plan B). The prevailing assumption in neural network applications is that data should be feed with minimum pre-pro- cessing modifications to avoid the loss of information. Hence, the plan A can be labelled a data-driven approach, whereas plan B is a compromise of compressing information into a user- friendly format for downstream modelling. This section describes how satellite and ancillary data were reshaped into an analysis-ready format in both cases. Single multispectral images Following the data-driven approach, all relevant spectral bands for land monitoring were used, that is, bands 2, 3, 4, 5, 6, 7, 8, 8A, 11 and 12. The excluded bands were related to observing coastal or inland waters or atmospheric aerosols (band 1), water vapour (band 9), and cirrus reflectance (band 10). We used Level 2A products which are images of geometrically corrected bottom-of-atmosphere reflectances. The product also includes cloud masks, but we did not use them. Instead, again, following the data-driven approach, we preset a ‘healthy’ range for the pixel values for each band. Namely, by calculating the percentiles of all pixel values within the training set for each crop type, we used the 1st and the 99th percentiles as the ‘healthy’ minimum and maximum values. Values outside this range were ignored. We believe values Natural resources and bioeconomy studies 80/2021 9 close to zero and at the upper end are either biased reflectance values or represent clouds, and therefore do not contain any relevant information for this study. The observable unit in the study was a farm because our reference data were the average crop yields on the farm level. More specifically, the spatial observable unit was a field (polygon) or a group of fields (multi-polygon) if a farm was growing the crop in question on several fields. First, pixel values were extracted from a (multi-)polygon in the satellite image. Here we used a ‘healthy’ range to arrange polygon-wise pixel values into 32 bin normalized histograms. Thus, at each bin, the value is the value of the probability density function normalized such that the integral over the range is 1. For each band we have 32 features. Finally, for each observable unit (farm) we have multiple observations from the growing season. In our data set we had satellite acquisitions from up to 115 days between 10/5 and 01/09. Due to cloudiness and satellite revisit times, we had significantly fewer observations within a grow- ing season. Therefore, the time series are variable-length (ragged), and the missing time steps need to be padded with zeros. For example, for predicting oats in 2020, we have 13,677 parcels, 115 zero-padded time steps, and as features 32 bins from 10 bands, altogether 320 features. The data shape is three dimensional (13677, 115, 320). Index mosaics Image mosaicking is a technique used to assemble multiple overlapping images together. An image mosaic is an integral representation of a scene produced from the most optimal pixel values. We utilized pre-processed 30-day mosaics of Sentinel-2-based indices covering the whole of Finland.1 The mosaics were computed on the 15th and the last day of month, and therefore, we had two observations available per month. 30 days were compressed into one value based on the maximum NDVI. That is, for all index mosaics, the pixel values were selected from the image with maximum NDVI value with the given time window. Figure 3 illustrates how the mosaics are computed from the time windows. 1 Sentinel-2 image mosaics are produced by the Finnish Environment Institute SYKE using the CalFin- cluster of Sodankylä National Satellite Data Center, and they were developed as part of the sub-pro- gramme “Distribution and Processing of Satellite Imagery” in the "Geospatial Platform of Finnish Public Administration" programme (2017–2019). Natural resources and bioeconomy studies 80/2021 10 Figure 3. A mosaicked Normalized Difference Vegetation Index (NDVI) is calculated by taking the maximum NDVI value of all available observations within 30 days. The 30-day windows overlap by15 days. From image mosaics we get a full time series. We used all the available image mosaics the first being from April 01–30/04 and last from October 01–30. Thus, we have 13 time steps. The Sentinel-2 image index mosaics used in this study were: the Normalized Difference Mois- ture Index (NDMI), the Normalized Difference Tillage Index (NDTI) and the Normalized Differ- ence Vegetation Index (NDVI). Pixel-wise indices are calculated from Sentinel-2 image bands (B) as follows: NDVI = (B8-B4)/(B8+B4) NDTI = (B11-B12)/(B11+B12) NDMI = (B8-B11)/(B8+B11 Similar feature engineering was carried out as with the single images. Namely, by calculating the percentiles of all the pixel values per index within the training set for each crop type, we used the 1st and the 100th percentiles as ‘healthy’ minimum and maximum values. Between the ‘healthy’ range we arranged pixel values polygon-wise into 16 bin normalized histograms. For example, for predicting oats in 2020, we have 13,677 parcels, 13 time steps, and as features 16 bins from 3 indices, making altogether 48 features. The data shape is three dimensional (13677, 13, 48). In the above, we have considered observations as time series from the growing season. Recur- rent neural networks are applicable to time series. However, the decision tree-based machine learning method Random Forest has proven out to be a good and robust method even for time-series tasks, although it cannot grasp the time dimension. We chose to use RF as a base- line learner for neural networks. The input data for RF needs to be 2-dimensional. Therefore, we stacked the time steps and features into one dimension resulting in a data shape of (13677, 624) with image mosaics. Natural resources and bioeconomy studies 80/2021 11 2.1.2. Ancillary meteorological data Meteorological data are considered significant predictors in traditional crop growth models. Therefore, we tested whether the RF model improved with ancillary meteorological features. For each (multi-)polygon, a centroid was calculated. Based on the centroid, the cumulative precipitation, solar radiation, and temperature were fetched from a weather database for the following dates: 01/06, 15/06, 01/07, 15/07, 01/08, 15/08, 01/09. This resulted in 13 time steps x 3 indices x 16 bins + 7 time steps x meteorological features. Accordingly, for predicting oats in 2020, the data set has a shape of (13677, 645). See Section 2.3.2. for results. 2.1.3. Alternative feature engineering approach In the above, we used normalized histograms with a predefined range as compressed features from pixel values. A common method to prepare data for modelling is to take only the mean value per observable unit, or alternatively the median, 25th and 75th percentiles. We tested whether histograms outperformed the percentile/mean approach. From all the pixels per (multi-)polygon we extracted the mean, median, the 10th, 20th, 30th, 40th, 60th, 70th, 80th, and 90th percentiles. We only tested this with image mosaics for the RF. Instead of 16 bins per index, we now have 10 features. Thus, the shape of the data is (3400, 13x30). See Section 2.3.2. for results. 2.2. Modelling Recurrent neural networks (RNNs) (Rumelhart et al., 1986) were chosen for processing the se- quential data from a chain of satellite images. RNNs can learn to recognize temporally ex- tended patterns in time series. Long-Short Term Memory networks (Hochreiter and Schmidhu- ber, 1997) are a special type of RNNs. We compared RNN methods to a Random Forest (RF) (Breiman, 2001) decision tree-based machine learning method. The fundamental difference between RNNs and RF is that while RNNs can consider the temporal dynamics in the data, RF does not explicitly capture that. 2.2.1. Vanilla Recurrent Neural Network (RNN) Neural networks consist of connections from the input layer to the output layer. In between, there are usually one or more hidden layers. In the hidden layers, the Vanilla RNN stores an internal state and updates it after each time steps. In the end, the hidden state is a function of all previous hidden states. This way, the network carries information in time. We can say it remembers the earlier phases. 2.2.2. Long-Short Term Memory Network (LSTM) An LSTM is a variant architecture of an RNN that often outperforms the Vanilla RNN especially in long time series. An LSTM has the ability to carry a long-term memory by slowly changing weights. At each time step it considers whether to update the internal state or not with the help of four gatekeepers. 2.2.3. Random Forest (RF) RF is a powerful and robust decision tree-based learning approach. Regression RF is an ensem- ble method, where the forest prediction for a particular test point (farm) is the average of 500 tree-based predictors. Note that the RF approach does not particularly capture the time Natural resources and bioeconomy studies 80/2021 12 dimension. Each time step is merely added to the input data as a new independent static vari- able. While RNNs require lot of developmental work to build a network architecture which is suitable for the prediction task and to fine-tune the hyperparameters, RF performs quite well with simple 2D data and with default parameters. We used RF as a baseline method for more complicated RNNs, but also to test alternative feature engineering methods (histograms or percentiles), and whether meteorological features improved the predictive performance. 2.3. Results In this section we present results from our research lines A and B (see Figure 1 for the research line schemes). From single satellite images we processed all available Sentinel-2 data, i.e., from 2016–2020. From these data sets we trained two RNN models: RNNsingle and LSTMsingle. The openly available pre-processed image mosaics covered the years 2018–2020. From these image mosaics we produced the LSTMmosaic and the RFmosaic models. Finally, we trained the models with data from the years 2016–2019 (single) or 2018–2019 (mosaic) and tested the models with year 2020 data. 2.3.1. Seasonal forecast for the year 2020 We developed prediction models for four main crop types. We have separate models for winter wheat, spring wheat, rye, feed barley, malting barley, and oats. To compare the performance of the prediction models, we collected public forecasts for reference. JRC publishes forecasts for all the aforementioned crops except for oats. Forecasts are published in the JRC MARS Bulletin in the middle of June and at the end of July and August. LUKE publishes crop forecasts in the middle of July and at the end of August. The preliminary crop statistics based on a farmer survey are published at the end of November. In this section, we present figures forecasting the crop yields for 2020. ‘Final’ means predictions based on all the satellite data for the growing season. The final LUKE forecast are the preliminary crop sta- tistics published in November. This can be considered as the ground truth and is denoted here with the red horizontal line. Predictions for June, July and August are based on the satellite images acquired by the 15th of the passing month. Here we present results for spring wheat and feed barley. Figure 4 shows how the LSTMsingle approach was closest to the Final spring wheat yield estimate in June. The RFmosaic performed best in July but was over optimistic again in August. This year the JRC and LUKE forecasts, as well as mosaic predictions seemed to be in consensus for a higher yield. For the feed barley forecasts in Figure 5 there was a consensus between all the rivals, except that the RNNsingle approach predicted overly low yields. However, surprisingly, its companion LSTMsingle was very close to the Final yield estimate already for June and throughout the season. Natural resources and bioeconomy studies 80/2021 13 Figure 4. In-season and post-harvest crop yield forecast for spring wheat in 2020. Figure 5. In-season and post-harvest crop yield forecast for feed barley in 2020. 2.3.2. Testing alternative feature engineering and meteorological features In this section, we present results for two crop type, spring wheat and feed barley. We tested how two different feature engineering methods (histograms vs. percentiles) performed, and how adding meteorological features effected the model accuracy. The meteorological features consisted of the cumulative precipitation, solar radiation, and temperature (see Section 2.1.2. for details). The metric for accuracy was the root mean squared error (RMSE), and this means how much on average the prediction per farm deviated (+ or -) from the yield the farmer had declared. The results are from the test set. The data set was pooled from the years 2018–2019. In the following Figures 6 and 7, all RF models were trained from image mosaic data for spring wheat and feed barley, respectively. RF-percentilesMeteo means that from each (multi-)poly- gon all the pixel values are compressed into percentiles (see Section 2.1.3. for details) and means. These and the meteorological features are the features for the model. RF-histoMeteo Natural resources and bioeconomy studies 80/2021 14 means that from each (multi-)polygon all the pixel values are compressed into histograms (see Section 2.1.1. for details), and these plus the meteorological features are the features for the model. The RF-percentiles model was trained with only percentiles+mean features, The RF- histo model was trained with histogram features. The LSTM model was trained from image mosaics data with histogram features, and it is shown here just for comparison. Figure 6 shows that in June, the spring wheat yield forecasts from RF-histoMeteo and RF-histo were slightly superior to rivals with percentiles. The number of farms in the training set was 2072. The R2 for the Final RF model was quite rather high evenly for all model variants: • percentilesMeteo 0.61 • histoMeteo 0.60 • percentiles 0.60 • histo 0.58 Figure 6. Model accuracies in in-season and post-harvest crop yield prediction for spring wheat in 2020. Figure 7 shows that in June, the feed barley yield forecasts from RF-histoMeteo and RF-histo models were slightly superior to the rivals with percentiles. However, the performance of the histogram model variants did not improve later in the season, whereas the percentile models made performance improvements by the end of the season. The R2 value for the Final RF model was higher for the percentile model variants compared to the histogram models: • percentilesMeteo 0.45 • histoMeteo 0.29 • percentiles 0.45 • histo 0.27 The number of farms in the training set was 3,190. Adding meteorological features did not seem to improve predictive performance with any model variant. Natural resources and bioeconomy studies 80/2021 15 Figure 7. Model accuracies in in-season and post-harvest crop yield predictions for feed barley in 2020. In the light of results from all the crop types under scrutiny, we conclude that neither the his- togram nor the percentile approach yielded a superior performance, except that the percentiles method performed better for feed barley. Adding meteorological features may improve the model, but only slightly. We also saw a decreasing trend in the model error from the first pre- dictions in June to the last prediction (Final). This was also expected as in June there was less information available, and at the end of the season the learner benefited from the full time series from the growing season. 2.3.3. Model comparison For a model comparison on an aggregated country-level we can calculate crop yield predic- tions over multiple years and compare these predictions to averaged crop yield statistics. In Table 1 we have calculated the mean crop yield from the crop statistics over 2016–2020 in comparison to the post-harvest mean crop yield predictions from LSTM and RNN models trained on single satellite images from the same years. In Table 2 we have calculated the mean crop yield from the official crop statistics over 2018–2020 vis-à-vis test set predictions from the same years. These models (RF and LSTM) were trained on image mosaics. We can see that in this light LSTM performs better than RNN with a single image approach. However, in all cases, models trained on image mosaics performed better. The RFmosaic model performed better than the LSTMmosaic model in 5 out of 6 cases. Natural resources and bioeconomy studies 80/2021 16 Table 1. Mean crop yields (kg/ha) over 2016–2020 in comparison to the mean post-harvest LSTMsingle and RNNsingle predictions and the difference () of the predictions from the crop statistics for the same period. Crop type Crop statistics LSTMsingle LSTMsingle RNNsingle  RNNsingle Winter wheat 4,140 3,553 -586 (-14%) 2,244 -1895 (-46%) Spring wheat 3,650 3,244 -405 (-11%) 3,222 -427 (-12%) Rye 3,648 3,234 -413 (-11%) 1,667 -1980 (-54%) Feed barley 3,704 3,412 -291 (-8%) 2,730 -973 (-26%) Malting barley 3,880 3,700 -179 (-5%) 1,639 -2240 (-58%) Oats 3,510 2,969 -540 (-15%) 2,674 -835 (-24%) Table 2. Mean crop yields (kg/ha) over 2018–2020 in comparison with the mean post-harvest RFmosaic and LSTMmosaic predictions and the difference () of the predictions from the crop statistics for the same period. Crop type Crop statistics RFmosaic  Rfmosaic LSTMmosaic  LSTMmosaic Winter wheat 4,186 4,240 54 (+1%) 4,078 -108 (-3%) Spring wheat 3,446 3,547 100 (+3%) 3,836 390 (+11%) Rye 3,660 3,660 0 (0%) 3,698 38 (+1%) Feed barley 3,650 3,536 -113 (-3%) 4,125 475 (+13%) Malting barley 3,713 3,603 -109 (-3%) 3,770 57 (+2%) Oats 3,466 3,424 -41 (-1%) 3,701 234 (+7%) Natural resources and bioeconomy studies 80/2021 17 2.4. Seasonal forecasting assessment Here we note a few key points for assessing the seasonal forecasts. With single images, we should be able to detect seasonal dynamics, whereas with mosaic composites, tracking physi- ological dynamics is somewhat lost. However, the image mosaics still show smooth curves of NDVI, NDTI, and NDMI, whereas the band intensities from the single images are hard to inter- pret. Figures 8 and 9 show how the data looks at the farm level. Figure 8. The median intensity values of ten Sentinel-2 bands from a parcel growing feed bar- ley. Natural resources and bioeconomy studies 80/2021 18 Figure 9. The median intensity values of NDVT, NDMI, and NDVI from the parcels of a farm growing feed barley. The greenness index NDTI and the moisture index NDMI in Figure 9 form nice bell curves peaking in June and July, as expected. The tillage index NDTI is related to cellulose and lignin absorption features of the vegetation and presents here as a surprisingly flat bell curve. The NDTI is to detect crop residues and senescent, nonphotosynthetic physiological phase of the crop in the autumn. In this data, the NDTI shows only a moderate increasing trend towards the end of the season. The single images with ten bands should detect seasonal dynamics, such as physiological stress and photosynthetic activity, on a full scale. However, surprisingly, no clear trend can be de- tected in this case. More-in-depth analysis of the growing season agro-meteorological condi- tions could be conducted to illuminate what is happening in the Figure 8. Nevertheless, the reflectance data is most probably suffering from pervasive cloudiness and shadow contamina- tion, as no cloud filtering was applied. This also explains why the farm-level mean square errors are quite high, whereas the large-scale predictions is fairly accurate. Natural resources and bioeconomy studies 80/2021 19 3. Crop production survey as a reference This section describes how we utilized data from crop production surveys for forecasting. The Finnish crop production survey is based on stratified sampling and is annually conducted by the Natural Resources Institute Finland. The total number of agricultural holdings in Finland is about 45,000 and around 6,000 (13%) participate in the farm survey each year. 3.1. Pre-processing First, we applied some pre-processing steps. First of all, at the model training time the statistical unit used was a farm because as a reference we had average yields per farm for six crops. At forecasting time, the observational unit used was a field. In satellite images, the smallest observational unit is a pixel. Sentinel-2 images have a resolution of 10 by 10 meters. Therefore, we decided to filter out all fields smaller than 1ha. This ensured we had enough pixels overlaying a field. We also checked whether all the geometries were valid. Table 3 shows how the area and number of fields in the sample was divided between crops. Table 3. Area of crops and the number of fields and their share in the crop production sam- ple in 2020. Crop Total area in the sample (ha) No. of fields in the sample Autumn wheat 5,545 (3%) 1,157 (2%) Spring wheat 37,903 (19%) 8,483 (17%) Rye 3,789 (2%) 860 (2%) Feed barley 78,511 (40%) 20,334 (41%) Malting barley 12,850 (7%) 2,665 (5%) Oats 56,774 (29%) 15,665 (32%) Total: 195,373 (100%) 49,164 (100%) 3.2. Representativeness of the reference data The sampling design for the crop production survey follows multistage weighted sampling. Most of the weight is determined by the regional share of total harvested area for the main crops in Finland. Therefore, we do not have equal spatial coverage of farms in Finland, but most of the farms come from the high agricultural productivity regions. Other variables determining the weights in the sampling design are the production type and economic size of the farm. When testing the prediction models, we used the sample for the crop production survey for the test year (2020). Therefore, we have inherited the weighting from the sample design into the predictions as well. The weighted sampling explains why we have quite an accurate average yield prediction for the whole country. However, for producing regional aggregates, the weighted sampling fails. There are very few farms in the sample from some areas (see for ex- ample the numbers of malting barley parcels in Table 4). Therefore, for regional predictions we should increase the number of observations, and not follow the crop production sampling plan. Natural resources and bioeconomy studies 80/2021 20 Table 4. Number of field parcels in the testing set in 2020 per crop type in the regions. Please refer to the region number on the map in Figure 11. Region Autumn wheat Spring wheat Rye Feed barley Malting barley Oats 1 181 1,270 91 569 552 1,017 2 381 2,155 190 2,209 847 1,915 3 74 1,166 124 1,743 218 1,551 4 111 751 110 1,160 658 1,613 5 74 702 49 1,190 90 1,594 6 92 497 75 796 117 957 7 10 116 16 532 0 447 8 10 100 14 1,786 4 522 9 0 75 20 547 13 430 10 13 92 15 627 6 552 11 100 840 80 3,372 71 2,330 12 35 403 27 2,985 79 1,281 13 30 175 12 2,443 2 1,236 14 0 27 5 151 0 112 15 0 1 0 64 0 20 Table 5. The starting and ending windows of regional sowing dates in 2016–2020 for all crops. Source: LUKE, 2021. Region Start Finishing Uusimaa 20.4.–7.5. 29.5.–2.6. Southwest Finland 15.4.–7.5. 27.5.–8.6. Satakunta 25.4.–10.5. 25.5.–10.6. Pirkanmaa 20.4.–10.5. 25.–31.5. Häme 22.4.–9.5. 26.5.–1.6. Kymenlaakso 24.4.–10.5. 19.5.–5.6. South Karelia 24.4.–12.5. 24.5.–1.6. South Savo 25.4.–8.5. 6.–7.6. North Savo 7.–16.5. 25.5.–10.6. North Karelia 11.-18.5. 31.5.–10.6. Central Finland 29.4.–17.5. 23.-31.5. South Ostrobothnia and Ostrobothnia 23.4.–17.5. 23.5.–5.6. North Ostrobothnia 5.–29.5. 3.–20.6. Kainuu 4.–14.5. 19.–25.6. Lapland 23.–29.5. 5.–30.6. Natural resources and bioeconomy studies 80/2021 21 Figure 10. Land under cereal cultivation (ha) in 2020 based on Finnish Food Authority’s Agri- cultural Parcel Registry. Another issue to bear in mind is the wide difference of the cropping systems in the south of the country compared to the north. The majority of Finnish arable land is located between the latitudes of 60 and 65°N (see Figure 10). Natural resources and bioeconomy studies 80/2021 22 Figure 11. The regions as a reference to Table 4 and 6. Source: The Municipal Division da- taset, National Land Survey of Finland 2020. Table 6 shows how regional forecasts from Random Forest predictions in June 2020 deviate from the final crop survey-based regional forecasts for all cereal crops. The mean of error in kg/ha and percentage is region-wise and crop type-wise. The error is large in Lapland (no. 15 on the map) and Kainuu (no. 14 on the map). Firstly, it is not uncommon that crops are sown only in June in the northern parts of Finland. Secondly, we only have a few observed fields in Lapland. Whereas, in the southwestern Finland the regional forecasts seem to be quite accu- rate. This region is also considered to be the granary of Finland. Crops are sown there already at the end of April or early May. See Table 5 for the regional sowing dates of all crops. With this in mind, we can conclude that country-wide forecasts seemed to work already in June, probably due to the inherited sampling weights from the crop production survey. However, for the regional forecasts the sampling data is inadequate. For regional forecasts, we should sam- ple fields to have an equal spatial coverage. Moreover, for the northern regions the crop fore- casting is reasonable only from July onwards due to later sowing dates. Natural resources and bioeconomy studies 80/2021 23 Table 6. Deviation (error) from the final regional crop yield forecasts (kg/ha) when predicting with the Random Forest model in June 2020. Please refer to the region numbers on the map in Figure 11. Region Autumn wheat Spring wheat Rye Feed barley Malting barley Oats Mean 1 504 -563 358 -671 -634 -30 -173 (-6%) 2 32 -929 114 -629 -500 110 -300 (-10%) 3 -333 -905 7 -652 -840 -12 -456 (-14%) 4 166 -456 122 -391 -369 173 -126 (-4%) 5 -101 -814 315 -575 -402 -389 -328 (-11%) 6 -126 -660 7 -851 -866 -51 -425 (-13%) 7 -1,605 -443 397 -149 NA -22 -364 (-10%) 8 -722 -449 207 -23 NA -515 -300 (-11%) 9 NA -907 55 -141 -2,489 -389 -774 (-36%) 10 -109 -566 -223 -428 -553 -208 -348 (-12%) 11 -484 -109 500 21 -44 80 -6 (0%) 12 121 -61 751 220 -237 223 170 (+4%) 13 -192 191 91 431 -1,864 -175 -253 (-13%) 14 NA -1,243 -850 -716 NA -1,267 -1,019 (-46%) 15 NA NA NA -428 NA -2,746 -1,587 (-133%) Mean -237 (-8%) -565 (-19%) 132 (+1%) -332 (-11%) -800 (-32%) -348 (-23%) Natural resources and bioeconomy studies 80/2021 24 4. Data processing pipeline In this section we evaluate how much time, computing resources, and data storage is required for processing the data into an analysis ready format. 4.1. Single images Creating the data set from single images starts with downloading the Sentinel-2 Level 2A im- ages from the Copernicus Open Access Hub (Scihub). In the summer 2020, we downloaded images in two sets. The first images from 10.5.–20.6. were downloaded with a customized script on the 21st of June. There were 1496 zip-files occupying 1Tb of storage space. The download- ing took 16 hours. Unzipping the zip files took another 2 hours. In the autumn, we downloaded the rest of the images from 21.6.–1.9.2020. We got 2,673 zip-files, which occupied 1.8Tb of storage space, and the download took 34 hours. Unzipping took 3 hours. The images are from 64 tiles covering the whole of Finnish arable land. After unzipping we had SAFE folders occu- pying 3.1Tb. Each SAFE folder includes many small files, mostly meta data files. Therefore, also the number of files is considerable, amounting to 600,000 in total. After the image data was stored, we ran a Python program to extract all the pixels from our observational fields. The program is run for each image separately, but the tasks run in parallel as an array job. However, six crops are run separately, one after another, to maintain sanity. Although we downloaded the data in two sets, we processed all of them only in the autumn. Therefore, all the processing times here relate to the whole data set in 2020. There were 41,570 images (JPEG2000) to process. Note that one satellite image consists of 13 bands, and each of these bands are stored in a separate JPEG2000-file. A field may be included in two or even three overlaying tiles, and therefore, some fields may have more observations. This processing step was the most time and computing expensive step. Table 7 shows the computing time needed, the number of images and observed fields per crop. For some crops, the arable area spans multiple tiles, therefore has more images. For example, oats and feed barley are cultivated up to the north of the country, but malting barley is cultivated only in the south. Table 7. The computing time, the number of images and observed fields per crop when ex- tracting pixel values from Sentinel-2 images. Crop Fields Images Computing time Winter wheat 2,037 20,511 27min Spring wheat 7,289 25,340 3h 28min Rye 2,279 22,920 24min Feed barley 11,762 29,630 7h 45min Malting barley 2,301 16,670 1h 5min Oats 13,677 27,200 6h 20min Total: 39,345 142,271 19h 29min Natural resources and bioeconomy studies 80/2021 25 4.2. Image mosaics When following the research plan with image mosaics, we started with image files locally stored in the Puhti supercomputing server. Additionally, extraction of intensity values could be done in parallel for all sets. Table 8 shows the computing time needed, the number of images and observed fields per crop. Table 8. The computing time, the number of images and observed fields per crop when ex- tracting pixel values from Sentinel-2 image mosaics. Crop Fields Images Computing time Winter wheat 998 48 parallel Spring wheat 7,384 48 parallel Rye 728 48 parallel Feed barley 18,470 48 parallel Malting barley 2,214 48 parallel Oats 13,876 48 parallel Total: 43,670 288 1h 23min After extracting the pixel values, both the single image and image mosaic research line fol- lowed the same processing steps. Namely, from pixel values, histograms or percentiles were computed. After that the histograms were reshaped into an analysis-ready format. These steps take at maximum tens of minutes for each data set. 4.3. Processing time for forecasting Here we estimate, how much time it takes to produce country-wide crop forecasts in June. First of all, with single images, it takes about 18 hours to download and unzip the data. This can be started already earlier in June and the latest images can be downloaded just at the start of the forecasting time. For extracting the pixel values from the images, we will need parcel geome- tries from the Finnish Food Authority, which are possibly available in mid-June. Extracting pixel values may take 10 hours. Downstream processing will take max. 2 hours. Producing forecasts from single images will thus take about three days. Producing forecasts from image mosaics will take a few hours or a maximum of one day. To conclude, deploying the forecasting pipeline requires further automatization. Especially in the end of the pipeline, the validation of the results needs further scrutiny. Uploading the pre- dictions to statistical production databases needs modifications to existing ICT-systems. In ad- dition, the prediction model architectures need to be revised and improved along with the new data from the becoming years. Natural resources and bioeconomy studies 80/2021 26 5. Practical implications to statistical production The uptake of EO as a new data source in statistical production was more complex than initially expected. There are a myriad of approaches to monitoring crop yields, the main decisions to make being whether to use: i) optical or radar satellite data or both, ii) image mosaics or single images, iii) a pixel-based or object-based image analysis. In addition, remote sensing requires specialized expertise, not to mention the specialized expertise needed in data engineering and predictive modelling. We acknowledged the lack of remote sensing expertise and made a decision at the start to outsource the pre-processing of the satellite images. With outsourcing the sustainability of the project may be jeopardized if the know-how outsourced cannot be fully transferred to the statistical production. In this sense, one significant achievement in the project has been the uptake of EO knowledge, with substantial contribution from the National Land Survey of Fin- land, as to a sound part of our production systems. Although we aimed at fully automated processing pipeline to be deployed in statistical pro- duction, the project was initially explorative. In order to develop an existing crop yield statistics to be more detailed and timelier, we have explored new data sources and developed method- ology for preparing first experimental statistics to evaluate its potential. We declared a need for improvement in the existing statistics, we had the innovative minds and courage to take the necessary steps to formulate a project plan and we had a vision how to fulfil the promise of economically sound improvements in the statistical production to the decision makers. We did not have a dream team of specialized expertise in remote sensing, IT systems, data engi- neering, data science and machine learning. However, during the project, we learned to master the core skills needed. At the same time, we have inherited the well-known downsides of ex- plorative pilot projects: the skills and knowledge of a new system sits tight within the key work- ers of the project. We will need to work out how to transfer the knowledge within the produc- tion unit. At this stage the risk of departing experts would severely hamper efficient utilization of the achievements of the project in the future. Another well-known obstacle with explorative projects is the bottleneck from developmental phase to the regular statistical production. How to facilitate the use of new data source to the domain statisticians remains a challenge and will require further reallocation of human re- sources and update of skills before the new data source becomes operationally mature part of the statistical production. However, we are fully motivated to drive this change, since there is a lot at stake. In the context of agricultural statistics, more accurate in-season forecasts of crop yields benefit sustainable agriculture and food security with better informed political decisions. We expect that within few years our EO-based crop forecasting is proven a sound method to replace in-season re- gional expert estimates, and in the foreseen future also gradually replaces the annual survey. At the same time, we have in-house readiness to apply EO as a new data source also in other statistics themes. As EO-based applications can be transnationally and even globally applicable, we should also foster international cooperation with international statistical organizations and national statis- tical institutes. We should also seek partners for research initiatives (inside or outside academia) for example on topics such as how reliable crop forecasts impacts the market, or what are the impacts of more dense-in-time crop forecasting for society at large. Natural resources and bioeconomy studies 80/2021 27 6. Conclusions Currently in Finland crop production statistics are based on both farm surveys and expert esti- mates combined with the data gathered from registers. The collection of the individual data is resource intensive. During the last decade, EO systems have been shown to provide an effective means to deliver large-scale crop monitoring and yield estimations. In this sense, this project has fulfilled its promise to establish a pilot case of an automated process for improved crop yield statistics by merging EO data and an advanced data-driven modelling approach. This report showed how, with a fully automated forecasting system, we can produce the first fore- cast in late June, around the same time as the JRC’s European-wide forecast, with higher accu- racy. The forecasts can be calculated, for example, at 10 day-intervals. The machine learning models implemented in this project achieved a highly promising level of accuracy in pre-harvest yield predictions for four main cereals (oats, barley, wheat, and rye). This new method will eventually reduce data collection costs, but in this initial phase, the annual crop production survey is still a necessity for collecting more ground truth data for model development. At this stage, we have developed prediction models for the four main crops but there is still a lot of work needed to enlarge the method to cover all crops under statistical production. We foresee that with further development of the prediction models, the reliability of the predictions can be improved in the coming years and can be enlarged to include other crops. On a large scale, more accurate in-season forecasting of crop yields will benefit sustainable agriculture and food security with better informed political decisions. In addition, reliable crop forecasts have a market impact. The societal impact of more crop forecasting needs to be evaluated. Natural resources and bioeconomy studies 80/2021 28 References Breiman, L. 2001. Random Forests. Machine Learning 45. p. 5–32. Hochreiter, S. & Schmidhuber, J. 1997. Long Short-Term Memory. Neural Computation 9. p. 1735–1780. LUKE, 2021. Statistics on Utilised Agricultural Area [Online]. Sowing dates available at: https://stat.luke.fi/sites/default/files/kevatkylvot_2000-2021_0.xls (Accessed: 21.9.2021). Rumelhart, D. E., Hinton, G. E. & Williams, R. J. 1986. Learning representations by back-propa- gating errors. Nature 323. p. 533–536. Natural Resources Institute Finland Latokartanonkaari 9 FI-00790 Helsinki, Finland tel. +358 29 532 6000