Jukuri, open repository of the Natural Resources Institute Finland (Luke) 
   
 
   
All material supplied via Jukuri is protected by copyright and other intellectual property rights. Duplication 
or sale, in electronic or print form, of any part of the repository collections is prohibited. Making electronic 
or print copies of the material is permitted only for your own personal use or for educational purposes.  For 
other purposes, this article may be used in accordance with the publisher’s terms. There may be 
differences between this version and the publisher’s version. You are advised to cite the publisher’s 
version. 
 
This is an electronic reprint of the original article.  
This reprint may differ from the original in pagination and typographic detail. 
 
Author(s): Alireza Hamedianfar, Cheikh Mohamedou, Annika Kangas and Jari Vauhkonen 
Title: Deep learning for forest inventory and planning: a critical review on the remote 
sensing approaches so far and prospects for further applications 
Year: 2022 
Version: Published version 
Copyright:   The Author(s) 2022 
Rights: CC BY 4.0 
Rights url: http://creativecommons.org/licenses/by/4.0/ 
 
Please cite the original version: 
Alireza Hamedianfar, Cheikh Mohamedou, Annika Kangas, Jari Vauhkonen, Deep learning for 
forest inventory and planning: a critical review on the remote sensing approaches so far and 
prospects for further applications, Forestry: An International Journal of Forest Research, 2022;, 
cpac002, https://doi.org/10.1093/forestry/cpac002. 
© The Author(s) 2022. Published by Oxford University Press on behalf of Institute of Chartered Foresters.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/),
which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Forestry An International Journal of Forest Research
Forestry 2022; 1–15, https://doi.org/10.1093/forestry/cpac002
Deep learning for forest inventory and planning: a critical review on
the remote sensing approaches so far and prospects for further
applications
Alireza Hamedianfar1, Cheikh Mohamedou1, Annika Kangas2 and Jari Vauhkonen1,*
1Department of Forest Sciences, University of Helsinki, Latokartanonkaari 7 (P.O. Box 27), FI-00014 Helsinki, Finland
2Bioeconomy and Environment Unit, Natural Resources Institute Finland (Luke), Yliopistokatu 6 B, FI-80100 Joensuu, Finland
*Corresponding author Tel: +358 50 4303895; E-mail: jari.vauhkonen@helsinki.fi
Received 14 June 2021
Data processing for forestry applications is challenged by the increasing availability of multi-source and multi-
temporal data. The advancements of Deep Learning (DL) algorithms have made it a prominent family of meth-
ods for machine learning and artificial intelligence. This review determines the current state-of-the-art in using
DL for solving forestry problems. Although DL has shown potential for various estimation tasks, the applications
of DL to forestry are in their infancy. The main study line has related to comparing various Convolutional Neural
Network (CNN) architectures between each other and against more shallow machine learning techniques.
The main asset of DL is the possibility to internally learn multi-scale features without an explicit feature
extraction step, which many people typically perceive as a black box approach. According to a comprehensive
literature review, we identified challenges related to (1) acquiring sufficient amounts of representative and
labelled training data, (2) difficulties in selecting a suitable DL architecture and hyperparameterization among many
methodological choices and (3) susceptibility to overlearn the training data and consequent risks related to the
generalizability of the predictions, which can however be reduced by proper choices on the above. We recognized
possibilities in building time-series prediction strategies upon Recurrent Neural Network architectures and, more
generally, re-thinking forestry applications in terms of components inherent to DL. Nevertheless, DL applications
remain data-driven, in contrast to being based on causal reasoning, and currently lack many best practices of
conventional forestry modelling approaches. The benefits of DL depend on the application, and the practitioners
are advised to ex ante subject their requirements to operational data availability, for example. By this review, we
contribute to the technical discussion about the prospects of DL for forestry and shed light on properties that
require attention from the practitioners.
Introduction
Various forestry applications are suggested to benefit from
autonomous machines and systems that re-configure them-
selves upon an introduction of new components or information
(e.g. Uusitalo et al., 2006; Nuutinen et al., 2011; Pukkala et al.
2021). As reviewed by Müller et al. (2019), artificial intelligence
could potentially allow for autonomous decision-making in the
planning and implementation of forestry operations by learning
from observations and experiences. Yet, to date, the main use for
artificial intelligence has related to translating data from remote
sensing into forest attributes (Müller et al., 2019). Although
this is valuable, future forestry applications could benefit from
discovering complex or unconventional relationships in different
scales. The more frequent availability of multi-source forest data
due to new remote sensing methods (Kangas et al., 2018) may
open possibilities on the above, but also generate challenges
related to considerations of data from multiple sources and
(temporal and spatial) scales (see Seidl et al., 2013, for a review
on scaling issues).
Machine learning is a form of artificial intelligence, in which
a computer is algorithmically trained to perform a task such
as prediction or classification (Hastie et al., 2009; Schmidhuber,
2015). The dictionary definition by Oxford Languages stresses the
ability ‘to learn and adapt without following explicit instructions,
by using algorithms and statistical models to analyse and draw
inferences from patterns in data’. Deep Learning (DL) (LeCun
et al., 1989; see e.g. Graupe, 2016; Deng and Yu, 2013, for
overviews) is a form of machine learning that is based on the
neural network concept that resembles the function of brains.
Each network is composed of many layers that transfer the
input to output by progressively learning higher level features
(Hatcher and Yu, 2018). These layers in between are called ‘hid-
den layers’ and a network with a sufficiently high number of hid-
den layers can be considered deep (Schmidhuber, 2015; Litjens
et al., 2017), in contrast to more shallow neural networks. DL
can perform learning tasks without the need of human-derived
explanatory variables (LeCun et al., 2015; Schmidhuber, 2015),
by virtue of which it has more potential to learn abstract features
from data (Shao et al., 2017). The ability to use DL with all kinds
of data, including numbers, images and audio, has paved its
way to a dominant role in the development of predictive sys-
tems for regression and classification problems (Hatcher and Yu,
2018).
The feasibility of machine learning approaches was recently
reviewed for applications such as Earth observation (Sal-
cedo-Sanz et al., 2020), change detection (Shi et al., 2020) and
fire management (Jain et al., 2020). Rammer and Seidl (2019)
and Reichstein et al. (2019) provided perspectives of using DL
to enhance the modelling of biotic damages (namely, bark
beetle outbreaks) and geoscientific processes, respectively. There
are additional implementation-specific reviews of DL for image
analysis and segmentation (Hoeser and Kuenzer, 2020; Hoeser
et al., 2020) that are applicable to the aforementioned domains
and remote sensing-based inventories (see also Kattenborn
et al., 2021). Finally, Diez et al. (2021) meritoriously reviewed
DL algorithms both in general and with respect to applicability
in various forest inventory tasks based on imagery acquired
using unmanned aerial systems. Nevertheless, the existing
reviews are not informative on the applicability of DL for forest
inventory and planning applications. In the reviews, DL was
predominantly applied to analyses of remotely sensed images,
whereas its benefits with other forestry data types are unknown
and may depend on factors such as operational data availability
in an application. Supervised learning to map the input data to
meaningful labels has been extensively studied using linear and
non-linear models, nearest neighbour search, support vector
machines and decision trees such as the random forest. A
forest modeller would likely benefit from guidance for choosing
between DL and these approaches, some of which may have
more conservative training data demands.
This article aims to present the state-of-the-art of DL as appli-
cable to various forestry applications, mainly those related to
forest inventory. The main article text is intended as easy-to-approach
guidance that especially points out DL counterpoints to
conventional forestry modelling. The article is augmented by a
Supplementary data file with technical definitions on DL archi-
tectures. Those are described with a special focus on their impor-
tance and implementation in these applications. A literature
review of studies employing DL in forest inventory and planning
was conducted to figure out the effect of architecture and input
data-specific parameters on the obtained results. Based on the
review, strengths and opportunities of applying DL are identified
and discussed with current problems and challenges. The text is
structured as follows:
• A review of generic concepts and inherent properties of DL that
need attention when applied; in particular, to briefly explain
the key principles of commonly used approaches.
• A quantitative review of DL approaches currently used
in forestry applications; specifically, their architecture and
input data-specific factors and other aspects affecting their
applicability and feasibility.
• A qualitative review of current forestry applications of DL;
specifically, how the inherent aspects of DL methods are cur-
rently realized.
• Summing up ways forward and concluded recommendations.
Generic concepts and rationale behind DL
Interest in DL has grown especially since 2012, when the Convolutional
Neural Network (CNN)-based AlexNet architecture of
Krizhevsky et al. (2012) outperformed traditional machine learning
algorithms in image object detection and classification (Ma
et al., 2019). We therefore elaborate the principles of a CNN as a baseline
technique for the further review. The adoption of a CNN can
be rationalized by juxtaposing to conventional supervised learn-
ing, in which the workflow from data to labels consists of feature
generation, feature selection, classifier design and evaluation
(e.g. Theodoridis and Koutroumbas, 2008). These steps require
hand-crafted adjustments that need to be done application-
and data-specifically involving manual processing and notice-
able expertise (Sothe et al., 2020). In Figure 1, for example, the
image can be analysed by means of statistics extracted from its
histogram or, if information on the co-occurrence of the image
tones is needed, specific second-order statistics (see, e.g. Niemi
and Vauhkonen, 2016). The analyses are complicated, particularly
in the latter case, by the need to choose the relevant
feature types and extraction scales and to ensure that the selected
features remain optimal for modelling the phenomenon in question when
computed, for instance, from an image rotated slightly differently. The
extracted features should later comply with the assumptions of
the classifier or regressor, such as normality or linearity with the
training data. Considering three-dimensional data or data from
multi-temporal or multiple sources makes the considerations
even more complex, which, on the one hand, suggests that avoiding
tasks related to manual feature selection and classifier design
is an asset. On the other hand, human-interpretable features
designed based on causal reasoning can also be an asset.
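The hand-crafted workflow described above can be illustrated with a minimal sketch. It assumes a small square image window given as nested lists of grey tones; the function names and the example window are hypothetical and are not taken from the cited studies. The sketch computes first-order histogram statistics and one second-order statistic (the contrast of a grey-tone co-occurrence matrix), and shows that the latter depends on the direction in which pixel pairs are formed, which is one reason why such features are sensitive to image rotation.

```python
from collections import Counter
from statistics import mean, pstdev


def histogram_features(window):
    """First-order statistics of the grey tones in an image window."""
    tones = [value for row in window for value in row]
    return {"mean": mean(tones), "std": pstdev(tones)}


def cooccurrence_contrast(window, dx=1, dy=0):
    """Contrast of the grey-tone co-occurrence matrix for the offset (dx, dy)."""
    pairs = Counter()
    rows, cols = len(window), len(window[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                pairs[(window[r][c], window[r2][c2])] += 1
    total = sum(pairs.values())
    # Weight each tone pair by its squared tone difference.
    return sum(n * (i - j) ** 2 for (i, j), n in pairs.items()) / total


# Hypothetical window with horizontal stripes of grey tones 0, 1 and 2.
window = [
    [0, 0, 0],
    [1, 1, 1],
    [2, 2, 2],
]
print(histogram_features(window))
print(cooccurrence_contrast(window, dx=1, dy=0))  # horizontal pairs: 0.0
print(cooccurrence_contrast(window, dx=0, dy=1))  # vertical pairs: 1.0
```

The same window yields a contrast of 0 along the stripes and 1 across them, so the analyst must decide on directions and scales beforehand, which is the manual step that a CNN aims to replace.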
Convolutions and CNNs
A CNN uses convolutions to internally extract and analyse data
features and, therefore, avoids manual feature generation and
extraction. A convolution is a linear operation based on multiplying
input data vectors with weight values given by kernels,
similar to what convolving an image with a filter does in an
image analysis context. Figure 2(a) shows the principle of how
a convolved feature vector is obtained by sliding a kernel over
input data and calculating weighted values based on the weights.
Figure 2(b) shows how the resulting feature map depends on the
number of kernels and each kernel’s dimensions (determined
by width, height and depth depending on the input data) and
stride, which is the number of input units by which the
kernel is moved from one position to the next. The
CNN learns the weights of the kernels on its own, resulting in a
feature map that represents features rather than pixel values.
The use of many different convolution layers results in different
feature maps: for example, in image analyses, the 1D convo-
lutions may extract spectral information and 2D convolutions
spatial features, whereas 3D convolutions can exploit both these
feature types. For instance, Mäyrä et al. (2021) found that 3D
convolutions outperformed 2D ones when classifying trees of different
species based on hyperspectral images, as the former were capable of
learning spatial–spectral features in contrast to just spatial ones.
Stacking convolutional layers with different convolution kernels is
justified because a convolution exploits data
from its extent, and different convolutions may detect different
types of features. As the image data contains local correlation,
the local links within convolutions aim to exploit it for better
features (Liu, 2020).
Figure 1 A schematic example of the ambiguity of feature extraction
for four circular plots to predict a forest attribute for the plot filled in red.
Besides extracting the pixel value of the plot, the co-occurrence of image
tones in the nested squares drawn around the plots could be informative
depending on the application. Relevant textural information can therefore
be (a) 1-dimensional (based on a feature extractor of 1 × 1 pixels), (b) 2-dimensional
(n × n pixels, where n is the size of the pixel neighbourhood)
or (c) multi-dimensional (n × n × m pixels, where m is the multi-source
or multi-temporal data depth). In a CNN, the feature extraction can
ideally be replaced by an internal process that moves convolution kernels
corresponding to a–c over the entire multi-dimensional data stack and
learns optimal weights for the obtained features based on representative
training data.
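The sliding-kernel principle of Figure 2(a) can be sketched in a few lines. This is a minimal illustration, assuming a one-dimensional input without padding; the function name and the numeric weights are hypothetical, and in a CNN the weights would be learned rather than set by hand. As in most DL software, the kernel is applied without flipping.

```python
def convolve1d(x, w, stride=1):
    """Slide kernel w over input x and return the weighted sums (no padding)."""
    k = len(w)
    return [
        sum(w[j] * x[i + j] for j in range(k))
        for i in range(0, len(x) - k + 1, stride)
    ]


x = [1.0, 2.0, 4.0, 8.0]  # input vector [x1, x2, x3, x4]
w = [0.5, -1.0]           # kernel weights [w1, w2], hypothetical values

print(convolve1d(x, w))            # [c1, c2, c3] = [-1.5, -3.0, -6.0]
print(convolve1d(x, w, stride=2))  # stride 2 skips c2: [-1.5, -6.0]
```

A larger stride moves the kernel further between positions and therefore shortens the resulting feature vector.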
A typical CNN architecture includes a convolution and pooling
mechanism, which breaks the image down into features anal-
ysed internally within the CNN, fully connected layers to weight
the outputs of convolution/pooling, and prediction (Figure 2c). A
fully connected neural network comprises a sequence of fully
connected layers that link each neuron in one layer to neurons of
another layer. The main advantage of a fully connected network
is that it makes no particular assumptions about the
input data. The major drawback of fully connected networks is
that they are computationally expensive and could be prone to
overfitting (Goodfellow et al., 2016; Chollet, 2018). Pooling is a
subsampling technique and does not implement any weights.
A max-pooling takes a maximum of input elements in a region
defined by the filter. It reduces the feature map dimensions
while maintaining important information needed for the predic-
tion task (Akhtar and Ragavendran, 2020). The convolution and
pooling are usually operated in several rounds so that the former
performs filtering to derive information, and the latter reduces
and focuses the information. Altogether, these steps extend the
field of view from the local computation unit to relationships
all over the data and may mitigate the negative impacts of
overfitting (Singh and Majumder, 2020).
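Max-pooling as described above can be sketched as follows, assuming a two-dimensional feature map stored as nested lists and a square pooling region; the function name and the example values are hypothetical. Note that the operation involves no trainable weights.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size region of a 2D feature map."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(0, cols - size + 1, stride)
        ]
        for r in range(0, rows - size + 1, stride)
    ]


# Hypothetical 4 x 4 feature map produced by a convolution layer.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [2, 6, 3, 1],
]
pooled = max_pool2d(feature_map)
print(pooled)  # [[4, 5], [6, 7]]
```

The 4 × 4 map is reduced to 2 × 2 while the strongest response of each region is retained.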
Taken together, the whole CNN can, in principle, learn unique,
hierarchical patterns in multi-dimensional data. In turn, the
learning performance of a CNN depends on several parameters,
affecting the data requirements, processing and computation
burden, and the complexity of the resulting architecture, which
fundamentally needs to be fine-tuned for each application.
Figure 2(c) points out where labelled data, or other
knowledge depending on the application, are needed to optimize these
parameters.
Inherent properties that require considerations
by a practitioner
Different CNN architectures will be formed depending on how
the components described in the previous section are arranged.
When constructing a CNN-based analysis, the operator quickly
comes across questions related to CNN architectures, training
and optimal parameterization. To assist in comprehending
these tasks, we have prepared a Supplementary data file that
describes choices related to adopting a CNN at a technical level.
Further, based on overviews such as Schmidhuber (2015) and
Nweke et al. (2018), we identified Recurrent Neural Network
(RNN), Autoencoders and Restricted Boltzmann Machine as
the main additional DL methods with relevance to forestry.
In the Supplementary data file, we provide a basic under-
standing of these approaches’ methodological principles and
differences.
When juxtaposed with any potential application, the afore-
mentioned choices need to be considered with respect to the
volume and cohesion of data available to train the DL framework
for the predictions. One prerequisite to apply DL is access to large
enough amounts of training data which is required to sufficiently
fine-tune the large number of network parameters and at the
same time to prevent overfitting. Regarding forestry applications,
the following data aspects were identified as critical to consider
already when pre-evaluating the suitability of a DL approach for
different applications:
Figure 2 (a) The principle of obtaining a convolved feature vector [c1, c2, c3] from an input vector [x1, x2, x3, x4] by sliding a kernel with weights w1 and
w2. (b) Examples of convolving the input image with two rows and four columns by different kernels delineated by solid line. (c) A schematic diagram
of components and required considerations of a Convolutional Neural Network (CNN). The different CNN architectures are based on concatenating
varying numbers of convolution and pooling, fully connected and prediction layers. The three-dimensional box depicts the convolution kernel and
arrows around it illustrate its operation. Unlike similar representations in other literature sources, the figure highlights where labelled data or other
knowledge should be used to (1) inform on the form of the relationships using (piecewise) linear or non-linear activation functions; (2–3) adjust (hyper-
)parameters according to the prediction performance; or (4) validate predictions. Steps 2–3 are based on comparing the change of an expected loss
(E) between predicted and training features to a threshold (t). These steps differ depending on the operating mode of the CNN (forward-pass or
back-propagation).
• Although no general rule can be given for the required number of
data samples, we note that in the forestry context,
sufficient data to train an efficient and robust DL architecture
corresponds to a number of observations collected in wide-
scale inventories (e.g. national forest inventories). An alterna-
tive is to employ data collected from various remote sensing
platforms. Nevertheless, obtaining sufficient data may require
merging several distinct acquisitions over time, resulting in
potentially significant variations in data quality over the entire
area (see further insights in Kangas et al., 2019).
• Data diversity is critical to build an effective deep learning
model with reliable prediction capability (Wong et al., 2016).
In that regard, data augmentation and transfer learning
are DL-related techniques to ease dealing with scarce data
conditions. Data augmentation aims to expand the number
of training examples by generating new synthesized data
based on various transformations applied to available training
data sets (Shorten and Khoshgoftaar, 2019). For instance,
the square windows drawn in Figure 1 could be rotated
or processed utilizing numerous other image processing
techniques to account for varying imaging conditions, thereby
augmenting data. Although the new samples based on data
augmentation are not independent, augmenting the existing
data has proven to help improve a DL architecture’s
performance and generalization potential. This approach
has been rarely considered in forestry applications so far. In
transfer learning, knowledge obtained from a training process
of one problem is applied to another related problem (Pan
and Yang, 2010). Transfer learning consists of training a DL
architecture with a huge amount of data and then using the
pre-trained model for fine-tuning or calibrating the network
using a restricted number of case-specific training samples.
This approach can be faster and more accurate than training
the network from scratch. It is usually executed by removing
the last units of the trained model and performing the training
process with new units for the new problem (Pan and Yang,
2010).
• Target variables in forestry often require predictions for
multiple time points or, in other words, the prediction of
dynamics over time instead of a single state. Although time-
series analyses based on weighting data from different sources
and time points according to the associated uncertainty
have been well-defined in the context of Bayesian filtering
(Särkkä, 2013), many conventional machine learning methods
applied in forestry have been extensively explored for predicting
a variable at a single time point with past data, and broadening this
perspective with the use of DL methods provides an interesting
possibility.
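The data augmentation mentioned in the list above can be sketched with simple geometric transformations. This is a minimal example, assuming a square image window stored as nested lists; the function names are hypothetical, and rotations by 90 degrees and mirroring are only one of many possible transformation sets. Each variant would keep the label of the original window, so the resulting samples are not independent.

```python
def rotate90(window):
    """Rotate a 2D window by 90 degrees clockwise."""
    return [list(row) for row in zip(*window[::-1])]


def flip_horizontal(window):
    """Mirror a 2D window left to right."""
    return [row[::-1] for row in window]


def augment(window):
    """Return the eight rotated and mirrored variants of a square window."""
    variants = []
    current = [list(row) for row in window]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_horizontal(current))
        current = rotate90(current)
    return variants


window = [[1, 2], [3, 4]]  # hypothetical 2 x 2 image window
samples = augment(window)
print(len(samples))        # 8 training samples from one labelled window
print(rotate90(window))    # [[3, 1], [4, 2]]
```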
DL approaches used in forestry applications
Types of applications
Early studies that pioneered the use of DL for forestry focused
on various applications including estimation of forest biophysical
parameters using autoencoders which exploit high-level feature
representations of image data based on decoders and encoders
(García-Gutiérrez et al., 2016; Shao et al., 2017); interpretation
and extraction of LiDAR features using CNN (Ayrey and Hayes,
2018; Contreras et al., 2019); plant pattern identification and
classification (Guan et al., 2015; Mizoguchi et al., 2017; Zou et al.,
2017; Carpentier et al., 2018; Hamraz et al., 2019; Zortea et al.,
2018; Dos Santos et al., 2019; Fricker et al., 2019; Liu et al.,
2019; Marrs and Ni-Meister, 2019; Narine et al., 2019; Martins
et al., 2019; Pelletier et al., 2019; Windrim and Bryson, 2019), tree
attribute prediction (Ercanlı, 2020) and semantic segmentation
(Chen et al., 2020). Classification tasks have included, inter alia,
forest pests and diseases (Safonova et al., 2019), forest fire mon-
itoring (Chen et al., 2019), wind damage (Hamdi et al., 2019), and
dead wood as a proxy of forest health or biodiversity from aerial
imagery (Sylvain et al., 2019; Jiang et al., 2019).
There was a notable recent focus on image and object analyses
to detect trees and their species using various remote sensing
data. Comparing the earlier review of Zhu et al. (2017) to
Kattenborn et al. (2021), who already found about a hundred
studies related to these themes from 2017–2020, this trend
can be expected to remain strong. We therefore consciously
placed a stronger focus on estimating forestry characteristics
and time-series dynamics (Table 1), which were not covered by
the earlier reviews. In our summary of tree segmentation and
classification tasks (Table 2), we focused on the archetypes of
these studies and direct the reader to Kattenborn et al. (2021)
for more applications.
Additionally, DL has been used for fusing data sources. For
instance, Chang et al. (2019) fused aerial, satellite, terrain and
climate data at varying resolutions and combined methods to
simultaneously classify forest cover types and estimate different
forest variables. Shah et al. (2020) used a synergy of Landsat
imagery and LiDAR data to produce a canopy height model for a
forested area using a CNN.
Su et al. (2019) predicted tree heights over time using long
short-term memory (LSTM) networks. As an extension of the RNN, LSTM
networks are specialized for processing sequential data: they ‘remember’
their data inputs over a longer time and thereby improve on the
performance of a traditional RNN, which is only capable of maintaining
short-term memories.
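How an LSTM retains information over several time steps can be sketched with a single scalar memory cell. The example below uses the standard LSTM gate equations; the parameter values are hypothetical and hand-picked to make the gating visible, not trained, and they are unrelated to the model of Su et al. (2019).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM cell with scalar input and state."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # cell state carries the long-term memory
    h = o * math.tanh(c)     # hidden state is the output of this step
    return h, c


# Hypothetical, untrained parameters (assumption for illustration only).
params = {
    "wf": 0.0, "uf": 0.0, "bf": 6.0,    # forget gate stays close to 1
    "wi": 10.0, "ui": 0.0, "bi": -5.0,  # input gate opens only for large x
    "wo": 0.0, "uo": 0.0, "bo": 6.0,    # output gate stays close to 1
    "wg": 2.0, "ug": 0.0, "bg": 0.0,    # candidate value follows the input
}

h, c = 0.0, 0.0
cell_states = []
for x in [1.0, 0.0, 0.0, 0.0, 0.0]:  # one signal followed by empty inputs
    h, c = lstm_step(x, h, c, params)
    cell_states.append(c)
print([round(value, 3) for value in cell_states])
```

With these settings the cell state decays only slightly after the single input, whereas a plain recurrent unit without gates would have to carry the signal through repeated non-linear updates of its hidden state.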
Relationships between accuracy and input data
Our specific objective for this review was to determine whether
and how DL methods improve the current estimation of forest
variables of interest in terms of an error metric such as the
root mean squared error (RMSE). We aimed at a meta-analysis
between evaluation measures or other performance statistics
and the parameterization of the approach. However, as both of
these were rarely described in numerically comparative terms,
we include as many details as possible that can explain the
performance of the DL and hence its feasibility for the respec-
tive task.
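For reference, the error metric discussed above can be computed as in the following sketch. It assumes that the relative RMSE is expressed as a percentage of the mean of the observed values, which is a common convention but may differ between the reviewed studies; the plot-level values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def relative_rmse(observed, predicted):
    """RMSE as a percentage of the observed mean (assumed convention)."""
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_observed


# Hypothetical plot-level above-ground biomass values (t/ha).
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

print(rmse(observed, predicted))           # 10.0
print(relative_rmse(observed, predicted))  # about 5.7 per cent
```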
Regarding the estimation of continuous forest inventory vari-
ables (Table 1), different numbers of field plots were utilized for
the training and test datasets. The total number of the field
plots varied from 60 to 17537. Most of the studies focused on
predicting above-ground biomass, and the best RMSE was 11 per
cent based on 236 field plots for a study area covering 100 km2
(Zhang et al., 2019). García-Gutiérrez et al. (2016), reporting an
RMSE of 15 per cent, similarly applied autoencoders in a smaller
area (4 km2), but they provide no information about the amount
of training and validation data. Ayrey and Hayes (2018) used the
largest number of field plots, split into 15537 samples for training
and an additional 1000 for testing and 1000 for validating CNN-based
architectures.
In the structure of DL methods, the size of input data (tiles),
spatial resolution, filters and batch size (the number of training
examples utilized in one iteration) may effectively contribute
to the predictive accuracy, efficiency and training time of the
model. For instance, filter size influences the number of trainable
parameters, and the size of the output depends on the size of
inputs. Pelletier et al. (2019) reported the impact of different
filter sizes (3, 5, 9, 17 and 33) and batch sizes (8, 16, 32, 64
and 128) for crop and forest species detection using temporal
CNN applied on Formosat-2 satellite images resulting in better
accuracy for filter size 9 and batch size 32. Schiefer et al. (2020)
examined the impacts of different spatial resolutions and tile
sizes for tree species discrimination using U-net semantic seg-
mentation model on UAV-derived canopy height models. The
authors reported that tile size had no meaningful
effect on model accuracy. However, they observed that larger
tiles could impact the accuracy of those classes with a low
number of samples because increasing the size of input tiles
reduces the number of samples for underrepresented classes.
Other studies focusing on comparable target variables did not
evaluate (or at least do not report) multiple filter and batch
sizes. Moreover, the size of input data and spatial resolution are
likely to influence the performance of the DL architecture. Current
literature lacks a thorough assessment of these parameters for
forestry prediction tasks. In general, DL hyperparameters may
need to be optimized context-dependently for each application.
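The statement that filter size influences the number of trainable parameters, and that the output size depends on the input size, can be made concrete with the standard formulas for a convolution layer. The sketch below uses the filter sizes examined by Pelletier et al. (2019); the input length of 64, the 3 input channels and the 16 filters are hypothetical values chosen for illustration.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Number of output positions along one dimension of a convolution."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def conv_param_count(kernel_size, in_channels, n_filters, dims=1):
    """Trainable weights plus one bias per filter of a convolution layer."""
    return (kernel_size ** dims * in_channels + 1) * n_filters


# Hypothetical layer: input length 64, 3 input channels, 16 filters.
for k in (3, 5, 9, 17, 33):
    print(k, conv_output_size(64, k), conv_param_count(k, 3, 16))
```

For filter size 9, for example, the layer has (9 × 3 + 1) × 16 = 448 trainable parameters and produces 56 output positions, whereas filter size 33 requires 1600 parameters and leaves 32 positions.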
Relationships between accuracy, architecture and
methodology
The architecture choice may affect DL results as compared with
conventional machine learning methods for forest variable esti-
mation. Zhang et al. (2019) investigated the use of autoencoders
for forest biomass estimation on Landsat8 and LiDAR data sets.
The autoencoders outperformed traditional k-nearest neighbour,
random forest, support vector regression and multiple stepwise
linear regression approaches by 1 per cent to 7 per cent in
terms of the relative RMSE. According to Ayrey and Hayes (2018),
Inception-V3 or GoogleNet (see the Supplementary data file)
were the most successful CNN architectures for forest above-
ground biomass estimation from LiDAR data with RMSE of 26
per cent or 27 per cent and bias of 0.7 per cent or 2.1 per cent,
Table 1 A summary of the studies related to continuous forest variable estimation.
| DL method | Task | Data | Architecture | Optimizer | Learning rate | Variable (unit) | Sample size | Training data size | Input data size | Error | Time points | Reference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Autoencoder | Regression | Landsat8, Sentinel-1, LiDAR | Stacked autoencoder | stoch. gradient descent | - | AGB (t ha−1) | 1400 plots | 800 plots | - | 14.4% | 1 | Shao et al. (2017) |
| | | Landsat8, LiDAR | Stacked autoencoder | stoch. gradient descent | - | AGB (t ha−1) | 236 plots | 177 plots | - | 11.4% | 1 | Zhang et al. (2019) |
| CNN | Semantic segmentation | LiDAR | FCN | - | - | tree diameter | - | NA | - | - | 1 | Chen et al. (2020) |
| | Classification | LiDAR | Inception-V3 | - | - | AGB (t ha−1) | 17537 plots | 15537 plots | 7 × 7 × 18 pixels (3D CNN) | 48.1% | 1 | Ayrey and Hayes (2018) |
| | | Landsat8, LiDAR | - | stoch. gradient descent | - | Canopy height | - | NA | 30 × 30 | 0.98 m | 1 | Shah et al. (2020) |
| R-CNN | Classification and regression | Aerial image, Landsat7 time-series, topography, climate | - | Adam | 0.1 | Tree species, AGB (t ha−1) | 9967 plots | 7724 plots | 80 × 80 | 36% | 2 | Chang et al. (2019) |
| DNN | Regression | LiDAR, ICESat-2 profile, Landsat image metrics | DNN | RMSProp | 0.1, 0.01, 0.0001 | AGB (t ha−1) | - | 1448 pixels | 32 × 32 | 15.5–15.6 (Mg/ha) | 1 | Narine et al. (2019) |
| | | Plot data | NA | | 0.999 | Height and diameter relation | 150 plots | NA | - | 4.95% | 1 | Ercanlı (2020) |
| Autoencoder | | LiDAR | Autoencoder | NA | - | AGB (t ha−1) | 39 + 54 plots | NA | - | 15% | 1 | García-Gutiérrez et al. (2016) |
| RNN | | Tree age, temperature, rainfall, soil, slope position | Autoencoder with LSTM | RMSProp | 0.001 | Height growth | 1000 (sample type not mentioned) | 500 (data type not mentioned) | | 0.0706% | 1 | Su et al. (2019) |
Table 2 A summary of the studies related to segmentation and classification.
| DL method | Tasks | Data | Architecture | Optimizer | Learning rate | Variable | Input data size | Labelled data | OA% | Reference |
|---|---|---|---|---|---|---|---|---|---|---|
| Deep Boltzmann machines | Classification | Mobile LiDAR | deep Boltzmann machines | stochastic gradient descent | - | tree species | - | 50000 tree samples from 10 different tree species | 86.1 | Guan et al. (2015) |
| Deep Belief Network | | LiDAR | | stochastic gradient descent | - | - | - | | 95.6 | Zou et al. (2017) |
| CNN | patch-based | Hyperspectral and LiDAR | FCN | stochastic gradient descent | 0.0001 | tree species | - | 713 trees | 86.67 | Fricker et al. (2019) |
| | Semantic segmentation | Formosat-2 images | TempCNN | Adam | - | Land cover and tree species | 32 × 32 | 1419 polygons | 93.45 | Pelletier et al. (2019) |
| | Classification | UAV | ResNet-50 with SLIC and SVM | stochastic gradient descent | 0.01 | Tree detection | 32 × 32 pixels | - | 89.01 | Martins et al. (2019) |
| | patch-based | public dataset (named BarkNet 1.0) | resnet34 | Adam | 0.0001 | tree species | 32 × 32 pixels | - | 97.81 | Carpentier et al. (2018) |
| | patch-based | UAV | VGG-16 | Adam | 0.0001 | Forest damage | 224 × 224 pixels | 200 image-patches | More than 90 | Safonova et al. (2019) |
| | Semantic segmentation | Aerial image (V-NIR) | U-Net | Adam | 0.01, 0.001, 0.0005, 0.00001 | Forest damage | 256 × 256 pixels | 1525 tiles | 92 | Hamdi et al. (2019) |
| | patch-based | Aerial image | VGG16 | stochastic gradient descent | 0.001 | Tree mortality | 21 × 21, 41 × 41 pixels | 315 polygons | 94 | Sylvain et al. (2019) |
| | | Ground photography | UNET | Adam | 0.001 | Tree species | 224 × 224 pixels | 64 field plots | 96.03 | Liu et al. (2019) |
| | Semantic segmentation | Aerial image (V-NIR) | FCN-DenseNet | Adam | - | Tree mortality | 512 × 512 pixels | 261 fallen dead trees and 305 standing dead trees | NA | Jiang et al. (2019) |
respectively. Inception-V3 was the top-performing architecture,
and it outperformed linear mixed model and random forest
predictions by 3–5 per cent difference in RMSE. In another study,
Pelletier et al. (2019) investigated crop and tree species classi-
fication using Formosat-2 time-series data and found CNN to
outperform an RNN alternative.
Compared to a single data source, feature enrichment through
multi-source data fusion may support applications that entail
more temporally, spatially or spectrally varying properties than
a single data source could deliver. In that regard, Su et al. (2019)
applied joint autoencoder-RNN for tree height growth prediction
upon integration of different datasets, such as tree age, tem-
perature, rainfall, soil and slope position. Chang et al. (2019)
developed a multi-task recurrent CNN to integrate various data
sources, including aerial and satellite image time-series, topog-
raphy, and climate data to classify different forest cover types
and forest variable attributes, such as above-ground biomass,
quadratic mean diameter, basal area and canopy cover. They
concluded that a multi-task method outperformed support vec-
tor machine and random forest algorithms.
As summarized in Table 2, CNN has been a prominent method
for tree detection and classification tasks such as tree species dis-
crimination (Fricker et al., 2019; Carpentier et al., 2018; Liu et al.,
2019; Pelletier et al., 2019), forest damage detection (Hamdi
et al., 2019; Safonova et al., 2019) and tree mortality mapping
(Sylvain et al., 2019; Jiang et al., 2019). The CNNs were mainly
used for semantic segmentation and patch-based approaches,
the definitions of which are elaborated in the Supplementary
data file. Besides CNN, deep Boltzmann machines (Guan et al.,
2015) and deep belief network (Zou et al., 2017) were applied
for tree species recognition. A critical limitation on training a
CNN architecture for image classification is the laborious pro-
cess of preparing training sample labels, and consequently, no
training data with wide representation and generality for species
classification are available. For instance, transferring knowledge
from labelled to unlabelled data has been evaluated for sev-
eral classification tasks (e.g. Li et al., 2017), but not representa-
tively for classification tasks in areas within a continuous forest
cover.
Despite the importance of minimizing the loss function, the
related optimization is regularly overlooked and has not received
much attention. Using a proper optimizer is essential in selecting
the beneficial features for predicting the response variable and
fine-tuning the model parameters. This is done by evaluating which
optimizer is more effective and leads to better performance
with respect to the evaluation criteria (Okewu et al., 2019). As reported in
Tables 1 and 2, most studies utilized classical stochastic gradient
descent, and few studies used more efficient choices based on
adaptive learning rate methods such as Adam and RMSProp
(Mubin et al., 2019).
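The difference between the two optimizer families mentioned above can be illustrated with a minimal sketch. It is not taken from any of the reviewed studies: the quadratic toy loss, the starting point and all step sizes are assumptions chosen only to show the update rules of plain gradient descent (one global step size) and Adam (per-parameter adaptive step sizes).

```python
import numpy as np

def loss(w):
    # Toy ill-conditioned quadratic; a stand-in for a real network loss (assumption).
    return 0.5 * (100.0 * w[0] ** 2 + w[1] ** 2)

def grad(w):
    # Analytical gradient of the toy loss.
    return np.array([100.0 * w[0], w[1]])

def run_sgd(w_start, lr=0.01, steps=300):
    w = np.array(w_start, dtype=float)
    for _ in range(steps):
        w -= lr * grad(w)  # the same step size for every parameter
    return w

def run_adam(w_start, lr=0.05, steps=300, beta1=0.9, beta2=0.999, eps=1e-8):
    w = np.array(w_start, dtype=float)
    m = np.zeros_like(w)  # running mean of gradients
    v = np.zeros_like(w)  # running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step size
    return w

start = [1.0, 1.0]
print("SGD  final loss:", loss(run_sgd(start)))
print("Adam final loss:", loss(run_adam(start)))
```

Which optimizer ends up with the lower loss depends on the problem and on the chosen step sizes, which is why the comparison needs to be made empirically for each application.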
Discussion
Considering the findings above and in Tables 1 and 2, it is evident
that the applications of DL for forestry are in an early phase. The
primary line of study has compared various CNN architectures
with each other and against conventional machine
learning techniques in the estimation of forest attributes. A better
performance was often reported for CNN, but the reason for
this was not explained in terms of simple or complex relation-
ships between the attributes considered, availability of data to
estimate those, or similar. Even though it is not yet possible to
conclude which solution may perform better, in the discussion
we aim at identifying some good practices and challenges in
the current state-of-the-art, and at extending the discussion by
qualitatively collating characteristics of forestry applications and
inherent DL properties.
A selection of studies representing the current
state-of-the-art of DL in forestry
Preceded by initial trials with autoencoders (García-Gutiérrez
et al., 2016; Shao et al., 2017), Ayrey and Hayes (2018) is one of
the pioneering studies in using CNN for forest variable estimation,
and many later studies are based on the same principles. Ayrey
and Hayes (2018) identified separate extraction of predictive
metrics and related considerations (e.g. variation in acquisition
parameters, multicollinearity) as weaknesses that could be
circumvented by means of DL. Many CNN architectures were
evaluated and compared with other approaches for LiDAR-based
forest inventory. Many later studies justify the choice of a DL
method based on the same argumentation that omitting the
step of metric extraction is beneficial.
Chen et al. (2020) introduced a semantic segmentation
approach for LiDAR point clouds to predict tree diameter (DBH) using
the CNN architecture PointNet++. The method automatically
produces tree diameter estimates from an analysis of point cloud
data, indicating that conventional LiDAR and textural features
would result in less accurate results. Although the developed
algorithm and technique are unique to the case, the principles of
semantic segmentation may also have many other applications
in fields related to mimicking segmentation patterns satisfactory
to the user of the data.
Among applications related to modelling forestry dynamics
over time, Su et al. (2019) employed DL in predicting the height
growth of large trees based on tree age, temperature, rainfall,
soil and slope data. Although developed for powerline safety
assessment, the method showed potential for modelling the
time-series of tree heights of fast-growing Eucalyptus species.
The study hints at hyperparameter choices and configurations as
critical factors influencing the accuracy of the DL network, which
was illustrated by choosing one activation function over another
and then comparing the resulting RMSE. The best-performing
approach was to merge the extraction capabilities of an autoen-
coder with the forecasting capabilities of the LSTM. This is one of
the hybrid ideas that we see can yield satisfactory results.
Chang et al. (2019) presented a DL approach that employed
several methods and principles. The hybrid DL approach concur-
rently classified forest types and estimated forest parameters
including above-ground biomass, basal area, canopy cover and
mean diameter based on the openly available optical remote
sensing, terrain and climate data. An RNN based on the LSTM
served as an umbrella to combine classification and regression
and to learn from the complexity of the data, represented here as a
time series. The interesting point in this research is the high efficiency
of the RNN in combining (fusing) different data sources as input
variables. Yet, this work can also be criticized for lacking the
details needed to replicate the analyses (see below).
Mäyrä et al. (2021) evaluated the integration of hyperspec-
tral remote sensing and LiDAR-derived canopy height model for
boreal tree species classification using CNN. They compared 3D-
CNNs based on the use of spectral–spatial information (con-
trasted to 2D-CNN that would only use spatial information) with
benchmarks conventionally used for this task. The 3D-CNNs out-
performed a support vector machine and an artificial neural
network by 3–5 percentage point improvement to the overall
classification accuracy and other benchmarked methods by a
much wider margin. The CNN was trained from scratch, and
data augmentation was implemented to overcome the negative
impact of a limited number of labelled training samples. Upon
analysis of input image patch sizes, the smallest (4 m) and largest
(10 m) tested patches were found to result in the highest (87 per
cent) and lowest (85 per cent) overall accuracy, respectively, but
this cannot be considered a significant difference. Importantly,
they analysed the CNN solutions to discover the spectral and
spatial features that had positively impacted the classification,
thereby providing a useful interpretation of the CNN result that
can otherwise be considered as a black box.
Current challenges, reproducibility of the results and
replicability of the methods
The crucial factor for the success of a DL model is access
to a sufficient amount of training data. A specific challenge is the
requirement to have annotated (labelled) data for the training
(Padarian et al., 2019). In many real-world forestry problems, it
may be challenging to acquire massive amounts of such labelled
information. Based on field inventories, collecting large amounts
of observations is difficult and expensive, requiring extensive field
campaigns over large areas. Although it is nowadays feasible to
use various remote sensing techniques (Kangas et al., 2018), it
remains a problem that these observations usually need to be
interpreted, i.e. refined to labelled information. In this respect,
forestry applications may differ from many fields where the
application of DL can employ databases of labelled data collected
in huge amounts from cash and credit card transactions and
similar registers or by means of social media and other crowdsourcing
approaches.
No similar label generator can easily be identified for forestry
applications, except possibly for harvester data (Uusitalo et al.,
2006), which however need to be linked to other data sources
for prediction purposes. Although there is a great potential to
use smartphone applications such as iNaturalist to collect taxa
observations, those usually need to be annotated by experts for
reliability (Lahti et al., 2021) and adjusted for location biases
originating from the behaviour of the observers (Mononen et al.,
2018). Measuring forest parameters requires a sample to cover
the entire forest variation under sufficient visibility for the (at the
moment, proprietary) interpretation algorithms (Pitkänen et al.,
2021). Only public participation applications that collect opinions
(e.g. Kangas et al., 2015) could be used directly as the people are
reliable sources on their own preferences. Those could be used as
reference data for DL to learn generally appealing location, aes-
thetic and natural properties related to trees and other aspects
based on multi-source data.
To compensate for the small number of data samples, Shao
et al. (2017) used LiDAR data as synthetic data; however, no
assessment was provided to indicate the accuracy improvement.
Among the studies reviewed by Kattenborn et al. (2021), 60 per
cent used visual interpretation to generate the training data,
whereas the remaining 17 per cent and 22 per cent were based on
pure in situ or combined visual and in situ observations,
respectively. Further studies should develop more efficient
and robust means of data generation. Another, more generic
solution that appears under-utilized in forest variable estimation
is the use of the transfer learning method, which is beneficial
to propagate the knowledge learned from a large-scale dataset
to a comparatively small-scale dataset (Zhang et al., 2019). In
addition, the use of data augmentation could possibly mitigate
the negative impacts of issues related to training data limitation.
Standard data augmentation techniques based on changing grey
level values or mirroring datasets might not work or be sufficient
when aiming to learn 3D-structural phenomena. With some data,
such as synthetic aperture radar (SAR) reflectivity including both
amplitude and phase components, one might even end up with
very implausible results with standard data augmentation, so
more research on proper techniques is needed.
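As a concrete illustration of the standard mirroring-type augmentation referred to above, the following sketch generates the eight rotated and mirrored variants of one image patch. The function name and the (rows, columns, bands) patch layout are assumptions made for illustration; the sketch deliberately covers only the simple 2D case and does not address the 3D-structural or SAR caveats raised in the text.

```python
import numpy as np

def augment_patch(patch):
    """Return the 8 rotated/mirrored variants of a (rows, cols, bands) patch.

    Hypothetical helper: only geometric transforms in the image plane are
    applied, so band values themselves are never altered.
    """
    variants = []
    for k in range(4):
        rotated = np.rot90(patch, k=k, axes=(0, 1))  # rotate by k * 90 degrees
        variants.append(rotated)
        variants.append(np.flip(rotated, axis=1))    # mirror left-right
    return variants

# Example: one 4 x 4 patch with 3 spectral bands becomes 8 training samples.
patch = np.arange(4 * 4 * 3, dtype=float).reshape(4, 4, 3)
augmented = augment_patch(patch)
print(len(augmented), augmented[0].shape)
```

Whether such variants are physically plausible depends on the sensor and the viewing geometry, so the set of transforms would need to be chosen per data source.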
Currently, the literature lacks guidance on the appropriate
amount, extent, geographical coverage and distribution, and so
on, related to training data. Although specific guidelines have
not been reported, obviously more geographically distributed
observations could improve the predictive accuracy and gener-
alization ability. Having sufficient data and choosing a proper
approach to partition training, testing and validation subsets is
particularly important. The training and validation subsets should
be independent of each other, but represent the whole variation
observed in the population. If the samples are divided randomly,
the representativeness of divided samples will most likely vary.
Therefore, the random splitting of the labelled data into train and
test samples may cause overestimations of the accuracy. Some
of the forest variable estimation studies used random
sampling to divide the ground truth data into training and
test data (Su et al., 2019; Zhang et al., 2019; Narine et al., 2019;
Shah et al., 2020). Although this approach is simple and widely
used, it may limit the transferability of the model to new areas.
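One possible alternative to a purely random split is to hold out whole spatial blocks, so that test plots are not immediate neighbours of training plots. The sketch below is an assumption-laden illustration of that idea and is not a procedure proposed by the reviewed studies: the function name, the square block layout and the block size are all hypothetical.

```python
import numpy as np

def spatial_block_split(x, y, block_size, test_fraction=0.3, seed=0):
    """Split plot indices so that whole spatial blocks go to train or test.

    x, y: plot coordinates in map units; block_size: side of a square block
    in the same units (hypothetical choice, to be set per study area).
    """
    rng = np.random.default_rng(seed)
    block_ids = np.stack(
        [np.floor(x / block_size), np.floor(y / block_size)], axis=1
    ).astype(int)
    # Map every plot to the index of its block.
    unique_blocks, inverse = np.unique(block_ids, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    n_test = max(1, int(round(test_fraction * len(unique_blocks))))
    test_blocks = rng.choice(len(unique_blocks), size=n_test, replace=False)
    is_test = np.isin(inverse, test_blocks)
    return np.where(~is_test)[0], np.where(is_test)[0]

# Example with 200 simulated plot locations in a 1 km x 1 km area.
rng = np.random.default_rng(1)
x = rng.uniform(0, 1000, 200)
y = rng.uniform(0, 1000, 200)
train_idx, test_idx = spatial_block_split(x, y, block_size=250.0)
print(len(train_idx), len(test_idx))
```

A blocked split addresses spatial independence only; whether both subsets still represent the whole variation in the population has to be checked separately.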
Increasing the number of neurons in a DL network raises the chance
of solving complicated problems, as a deep network can
adapt itself closely to the training data. However, adding more
fully connected and convolutional layers increases the
depth and complexity of the network, and as a result, the model
could be prone to excessive running times and overfitting. The
latter degrades quantitative and qualitative accuracy.
Selecting the appropriate number of neurons in the hidden layer
and suitable hyperparameters provides the opportunities to opti-
mally solve these problems. The learning rate is one well-known
hyperparameter that relates to the rate of updating weights in
the analysis. Most of the forest variable estimation studies utilized
conventional optimizers of the hyperparameters based on the
stochastic gradient descent algorithm, whereas more efficient
approaches could be based on adaptive learning rate methods.
Many studies report no information on the used optimizer (Gar-
cía-Gutiérrez et al., 2016; Ayrey and Hayes, 2018; Ercanlı, 2020;
Chen et al., 2020), whereas few studies used dropout and batch
normalization to reduce overfitting and improve the generaliza-
tion capability in the training process (Ayrey and Hayes, 2018;
Chang et al., 2019; Shah et al., 2020). There may be room to
improve the model accuracy and generalization abilities by fur-
ther studying the optimization of the parameters involved. Apart
from the importance of proper parameter optimization, it would
be essential to investigate and evaluate the effect of varying sub-
sample of input data for creating training, test and validation
splits in accordance with the choice of filters and optimizers
used in the training process. Moreover, although most of the
studies only utilized limited numbers of field plots and achieved
promising results over a single study area, the reproducibility of a
DL result has not yet been investigated over new (independent)
datasets and separate study areas.
Moreover, issues related to the replicability of the methods
were noted especially when reviewing the earliest DL studies, but
also among more recent ones. For instance, based on the
publications of Ayrey and Hayes (2018) and Chang et al. (2019), it is
not possible to determine the exact modelling unit or the
method used to derive the response variable for those units. Especially
linking field data measured from circular plots with pixels corre-
sponding to the plot is nontrivial (cf., Figure 1), but required when
training convolutions for the response values to be predicted. It
is possible that the reviewers of the early DL manuscripts might
not have been familiar enough with the methodology to ask
crucial questions on the implementations. Similar signs of under-
maturation can be pointed out from the recent DL applications
as there were in the proliferation of nearest neighbour imputa-
tions until exemplary guidance on feature selection and cross-
validation (e.g. Packalen et al., 2012). Our findings call for similar
investigations and how-to-instructions on DL.
Perspectives of data dimensions on choosing a
modelling approach
Over the last decades, various machine learning algorithms have
been applied for regression and classification of tree and stand
parameters. Comparisons of DL approaches with these algo-
rithms cannot be considered satisfactory so far. This is especially
true if the validation is extended to qualitative aspects such
as the fitness of the DL method to the data availability and
similar prerequisites of the given task. It is not clear across the
studies whether the reported excellent performances of DL can
be attributed to the method itself or methodical aspects such
as no independent validation, undetected overfitting or a lack of
proper comparison to other feasible methods.
A study on using DL for estimating the tree height–diameter
relationship (Ercanlı 2020) can be used as a cautionary exam-
ple on potentially choosing an excessively complex approach
to model a simple phenomenon. Although mentioning DL in its
title, the algorithm described by Ercanlı (2020) was a common
Multilayer Perceptron (MLP), the structure of which was grown
to include 100 neurons within nine hidden layers in the best-
performing version. Its performance was compared with
nonlinear regression and mixed-effects modelling, indicating that the MLP
outperformed the conventional methods. Although there can be
possible overfitting issues due to evaluating the network with
specific data, Ercanlı (2020) is brought up here to proclaim the
rationale of the modelling choices. The height–diameter mod-
elling is a well-known and thoroughly studied problem in the
field of forest biometrics. Because trees are measured within plots
and plots within stands, the data become nested with a
hierarchical structure of errors. Therefore, a general
recommendation is to adopt a mixed-effects modelling approach to manage
the hierarchical errors, but also because in a practical case the
parameters associated with the random effects are unknown
and must be predicted. For the latter, the mixed-effects mod-
elling approaches offer undisputed benefits due to calibration
abilities employing the Best Linear Unbiased Prediction, which
is elaborated by Mehtätalo et al. (2015). Ercanlı (2020) did not
address the hierarchical data structure, whereas predicting the
random stand effects or calibrating the predictions with a limited
number of observations would have resulted in a comparative
evaluation accounting for all the possibilities of the modelling
methods in a practical situation. Even if DL approaches may not
have a similar theoretical basis for these aspects as statistical
analysis, accounting for the hierarchical structure that is visible
in the data, for instance, must be developed in the future.
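The calibration ability mentioned above can be made concrete for the simplest case, a random stand-level intercept. Assuming a model h_ij = f(d_ij) + b_i + e_ij with b_i ~ N(0, var_stand) and e_ij ~ N(0, var_resid), the Best Linear Unbiased Predictor of b_i from n measured trees is the mean residual shrunk by n*var_stand / (n*var_stand + var_resid). The sketch below only illustrates this textbook special case; the function name and the numbers are hypothetical and it is not the model of Ercanlı (2020) or Mehtätalo et al. (2015).

```python
import numpy as np

def blup_random_intercept(residuals, var_stand, var_resid):
    """Predict a stand-level random intercept from a few calibration trees.

    residuals: observed height minus the fixed-part (population-level)
    prediction for each measured tree in the stand.
    var_stand, var_resid: variance components, assumed known from model fitting.
    """
    r = np.asarray(residuals, dtype=float)
    n = r.size
    if n == 0:
        return 0.0  # no measurements: fall back to the population curve
    shrinkage = n * var_stand / (n * var_stand + var_resid)
    return shrinkage * r.mean()

# Three trees in a stand are each 2 m taller than the population curve predicts.
b_hat = blup_random_intercept([2.0, 2.0, 2.0], var_stand=4.0, var_resid=4.0)
print(b_hat)  # 0.75 * 2.0 = 1.5
```

With more calibration trees the shrinkage factor approaches one, so the stand-level prediction moves from the population curve towards the locally observed level.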
Apart from the above example, Mohamedou et al. (2019) did
not find the MLP approach to add value over diameter increment
predictions. As reasons, they suggest a better parameterization
of the linear mixed-effects model according to causal reasoning
on the biophysical phenomenon. Predictions of inventory totals
or attributes of major species based on high-resolution auxiliary
data are generally found challenging to improve, especially in
boreal forest conditions. For instance, Niska et al. (2010) found
the artificial neural networks and k-nearest neighbour predictions
comparably accurate for the total attributes. The methods dif-
fered in accuracy for plot and stand levels, and it is typical to
get similarly contradictory performances between methods in
predicting species-specific and minor species’ properties (Varvia
et al., 2019). Better performance of a given method can also
arise just by chance, for example because of the small proportion of the
better-predicted phenomena in the data.
We cautiously suggest that in all the above cases, the poten-
tial of a method may be related to the number of dimensions
of the predicted phenomena. Estimating forest growing stock
or biomass is a traditional modelling task, which is essentially
doable with a 1D approach (i.e. using a vector of explanatory
features based on a 1D feature extractor as in Figure 1(a)). The
possibilities to extract additional information for 1D vectors with
CNN are probably limited to what can be achieved by prop-
erly adding interaction terms etc. to conventional parametric
or non-parametric modelling. On the contrary, tree detection,
segmentation and species recognition are 2D tasks (Figure 1(b))
and correspond to generic image object detection and classifica-
tion, for which the CNN first broke through. It is further possible
that CNNs turn out more useful in tasks where the modelled
phenomenon requires considering multiple scales or the time
dimension (Figure 1(c); see below).
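The distinction between 1D and 2D tasks can be sketched with a toy example. Two hypothetical canopy height grids contain exactly the same height values, so a vector of height percentiles (a 1D feature extractor) cannot tell them apart, whereas a single 2D convolution filter responds to their different spatial arrangement. The grids, the percentile set and the filter are assumptions for illustration only.

```python
import numpy as np

def plot_metrics(chm):
    """1D feature vector: height percentiles, blind to spatial arrangement."""
    return np.percentile(chm, [25, 50, 75, 95])

def conv2d_valid(image, kernel):
    """Plain 2D cross-correlation (as used in CNN layers), no padding."""
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Same 18 tall and 18 empty cells, arranged as one block or as a checkerboard.
clustered = np.zeros((6, 6))
clustered[:, :3] = 20.0
checker = 20.0 * (np.indices((6, 6)).sum(axis=0) % 2)

edge_kernel = np.array([[-1.0, 1.0]])  # horizontal height difference
print(plot_metrics(clustered), plot_metrics(checker))
print(np.abs(conv2d_valid(clustered, edge_kernel)).sum(),
      np.abs(conv2d_valid(checker, edge_kernel)).sum())
```

If the response variable depends only on the distribution of heights, the 1D metrics already carry the needed information; the convolution adds value only when the arrangement itself matters.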
Future DL studies could benefit from lessons learned
with other modelling approaches
Various DL approaches were applied to single tree-level inventory,
which in terms of conventional methods constitutes a chain of
individual tree detection, feature extraction and estimation of
tree attributes. It is instructive to consider best practices
that already exist for these steps in the literature without the
DL intervention. There are a number of approaches to carry out
individual tree detections as reviewed by Koch et al. (2014),
Zhen et al. (2016) and Lindberg and Holmgren (2017). Among
these techniques, integrating external knowledge into the image
analysis of remotely sensed data has been beneficial for the
success rates regardless of whether the knowledge was obtained
by learning from previous algorithm runs (Heinzel et al., 2011),
estimating the total stem number based on other data (Ene et al.
2012) or point processes (Kansanen et al., 2016), employing prior
knowledge on allowable tree dimensions (Lähivaara et al., 2014;
Swetnam and Falk, 2014; Sačkov et al., 2017), or combining tree
detection and size modelling (Kansanen et al., 2019). All the listed
considerations could logically be attempted by means of DL, but
this ambition was missing from the studies we found.
Fassnacht et al. (2016) provide generally applicable recom-
mendations on tree species classification from remotely sensed
data. Although not covering DL methods, Fassnacht et al. (2016)
concluded that ‘[m]ost studies followed data-driven approaches
and pursued an optimization of classification accuracy, while a
concrete hypothesis or a targeted application was missing in
all but a few exceptional studies’. They further encourage more
research on the causal understanding of the traits that affect the
remotely sensed signal and therefore affect which tree species
can or cannot be classified under given conditions. Even if provid-
ing an otherwise meritorious study of DL and other methods for
species classification, Mäyrä et al. (2021) can be categorized as a
purely data-driven, algorithm-benchmarking study. Moreover,
since the CNN method required data augmentation, it could be
asked whether the compared methods would not have
benefitted from a similar expansion of the training data. In principle,
manually rotating the images (cf., Figure 1) or using appropriate
textural features could have yielded the same result also with a
more traditional approach, but obviously with a much higher
manual processing burden.
Instead of decimal improvements to RMSE figures, it may be
more fruitful to use DL in applications that require producing
something the conventional methods cannot. The CNN’s ability
to internally learn features from data may be considered as such.
Applications specifically benefitting from that can be identified
by juxtaposing with traits described in earlier literature based
on other methods. First, discussing scale issues in the context
of modeling ecosystem structure and functioning, Seidl et al.
(2013) hinted that existing models could be reviewed to learn
on scale-dependencies for various applications. Even in the case
where tree-level results were aimed for, Maltamo et al. (2009)
and Vauhkonen et al. (2010) found it beneficial to use, in addition to
parameters extracted from the tree segments, multi-scale
predictors such as those describing the area-level forest
structure in the proximity of the tree. Because of the internal logic
based on image convolution with varying kernel size (Figures 1–
2), the CNNs could possibly infer appropriate scaling from the
data. The data-driven nature of the CNNs could thus be soundly
employed to learn multi-scale patterns for the predictions described
above and in domains such as spatial ecology and forest ecosystem
modeling, where scale issues are of importance but related
analyses seem to be lacking.
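A hand-crafted analogue of such multi-scale predictors is sketched below: the mean canopy height is computed in nested windows of increasing size around a tree location. This is only an illustration of what fixed-scale features look like; the function name, the window half-widths and the toy grid are assumptions, and a CNN with varying kernel sizes would be expected to learn comparable filters from the data rather than have them specified.

```python
import numpy as np

def multiscale_means(chm, row, col, half_widths=(1, 3, 5)):
    """Mean canopy height in nested square windows centred on (row, col).

    half_widths are hypothetical scales in pixels; windows are clipped
    at the raster border.
    """
    features = []
    for hw in half_widths:
        r0, r1 = max(0, row - hw), min(chm.shape[0], row + hw + 1)
        c0, c1 = max(0, col - hw), min(chm.shape[1], col + hw + 1)
        features.append(chm[r0:r1, c0:c1].mean())
    return np.array(features)

# A single 9 m tall pixel in an otherwise empty 11 x 11 grid.
chm = np.zeros((11, 11))
chm[5, 5] = 9.0
print(multiscale_means(chm, 5, 5))  # means over 3x3, 7x7 and 11x11 windows
```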
Possibilities on DL of forestry dynamics and
management scheduling over time
Although the reviewed studies presented promising DL methods
and results for state variables of a single point in time, our
review indicates that the current applications of DL for forest
management and inventory largely lack predictions of forest
dynamics and growth over time. As the time factor is essential
for the forest dynamics of an area, a time-sensitive DL framework
would be important for a better understanding of forest change
and providing timely informed decisions. Developing DL frame-
works that can handle time-series data is an essential aspect
requiring innovative solutions. A few suggestions for a structured
workflow to process forest inventory time-series using DL can be
formulated based on literature from other fields, bearing in mind
that no comprehensive standard DL procedure relevant to forest
characteristics is currently sketched or tested.
According to observations made from other scientific fields
with time-series data, we expect that RNNs such as the LSTM will
take an influential role because of the ability to simultaneously
consider both the present and past data (Hochreiter and
Schmidhuber, 1997; Gers et al., 1999). On the other hand, Wan
et al. (2019) criticize the LSTM and similar techniques as ineffective with
aperiodic datasets and as time-consuming because of the need
to develop the learning process over time. Although a combination
of RNN–CNN has shown potential for time-varying image clas-
sification (Mou et al., 2019) and multi-task learning for species
detection and forest variable estimation (Chang et al., 2019),
such approaches should be further investigated for synergistic
data fusion of multi-source data and field plots to develop a
time-series forecast strategy of forest variable attributes.
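How an LSTM carries past information forward can be shown with a minimal forward pass of one cell with a forget gate. This is a generic sketch of the standard cell equations, not an implementation from any reviewed study; the weight layout, the gate ordering and the random toy series are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4 * H, H + D) and b has shape (4 * H,), stacked in the
    (assumed) order: input gate, forget gate, output gate, candidate.
    """
    H = h_prev.size
    z = W @ np.concatenate([h_prev, x]) + b
    i = sigmoid(z[0:H])          # how much new information to write
    f = sigmoid(z[H:2 * H])      # how much of the past cell state to keep
    o = sigmoid(z[2 * H:3 * H])  # how much of the cell state to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate values
    c = f * c_prev + i * g       # cell state mixes past and present
    h = o * np.tanh(c)
    return h, c

def run_lstm(series, W, b, hidden_size):
    """Run the cell over a (T, D) series and return the final hidden state."""
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x in series:
        h, c = lstm_step(x, h, c, W, b)
    return h

# Toy example: 10 time steps of 3 plot-level variables, 4 hidden units.
rng = np.random.default_rng(0)
series = rng.normal(size=(10, 3))
W = rng.normal(scale=0.5, size=(16, 7))
b = np.zeros(16)
print(run_lstm(series, W, b, hidden_size=4))
```

In practice the final hidden state would feed a regression or classification layer, and the weights would be learned rather than drawn at random.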
Using prior data from models and historical observations
together with current measurements has potential to both
improve estimates and reduce the data collection burden. Little
has been done to address the benefit from prior data by DL,
compared with using more widely known Bayesian approaches
for this purpose (e.g. Uusitalo et al., 2006; Lähivaara et al.,
2014; Ehlers et al., 2018; Varvia et al., 2019). It is essential
that the uncertainties of using pixels vs aggregated units are
quantified for drawing informed decisions, where the LSTM-
based solutions come conceptually close to the Bayesian filtering
(e.g. Särkkä, 2013). A potential and logical solution would be
to integrate DL with a Bayesian approach, where the DL would
learn the weights for a Bayesian network. Those could possibly
be further used to predict uncertainties, learn from them, and
calibrate the predictions. Another example could be supplementing
point estimates with credible intervals that allow uncertainty
analyses, with subsequent computational advantages obtained
by replacing an approach based on Bayesian linear assumptions
(Varvia et al., 2019). The correlation of multi-source or multi-
temporal data sources is a challenge in fusing these data (Ehlers
et al. 2018), and therefore it is essential to investigate the
impact of this correlation on DL-based analyses. An integration
of Bayesian filtering and prediction (or their non-parametric
variants such as the Gaussian process regression of Varvia et al.,
2019) with DL would interestingly provide potential to analyze
more than one static state and therefore more feasibly learn the
dynamic nature of forestry attributes.
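The idea of combining prior data with current measurements can be illustrated with the simplest Bayesian filter, a scalar Kalman filter. In this sketch a stand attribute is first projected forward with a growth increment and then updated with a new remote sensing estimate, each weighted by its variance. All numbers and function names are hypothetical, and the linear-Gaussian setting is an assumption made for brevity.

```python
def kalman_predict(mean, var, growth, process_var):
    """Project the state forward with a growth increment and added uncertainty."""
    return mean + growth, var + process_var

def kalman_update(prior_mean, prior_var, obs, obs_var):
    """Combine the projected state with a new observation."""
    gain = prior_var / (prior_var + obs_var)  # weight given to the observation
    post_mean = prior_mean + gain * (obs - prior_mean)
    post_var = (1.0 - gain) * prior_var       # uncertainty always shrinks
    return post_mean, post_var

# Hypothetical mean height of a stand (m): old inventory value, then a new estimate.
mean, var = 20.0, 4.0
mean, var = kalman_update(mean, var, obs=22.0, obs_var=4.0)
print(mean, var)  # 21.0 2.0
mean, var = kalman_predict(mean, var, growth=0.5, process_var=1.0)
print(mean, var)  # 21.5 3.0
```

The posterior variance gives the kind of uncertainty estimate that a DL-based state update would also need to provide to support informed decisions.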
It is also possible to consider management scheduling in
terms of DL by re-thinking the allocation of operations that is
usually based on linear programming and its variants. Specifi-
cally, the management scheduling problem would be described
as a discrete event stochastic system that is driven by Markov
decision processes, which are in some state S at each time
step. Moving into a new state S′ is influenced by the chosen
action A, giving the decision-maker a corresponding reward R(S,
A, S′). A policy is a rule that a decision-maker follows when
selecting actions in each state. The task is then to formulate
such action-value or reward functions that allow selecting the
optimal policy to maximize the total reward over all successive
steps. In other fields, shallow neural networks (Laguna and Marti,
2002; Jin et al., 2004) and reinforcement learning have been
tested for annealing manufacturing schedules (Stefán, 2003).
Recently, Malo et al. (2021) tested reinforcement learning for
optimizing forest management and found it to allow the inclusion of
stochastic events without discretizing state and control variables.
The shallow version could be developed to deep reinforcement
learning by means of recursive training with earlier solutions. In
Hinton (2014), a layer of hidden units of a RBM was trained with
earlier RBMs to cover the possible Markov decision processes of
alternative treatments. The resulting deep hierarchy was then
learned as a neural network (Hinton, 2014). This way of thinking
corresponds to learning from the successes of previously solved
forest planning problems, which is an interesting future outlook.
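The Markov decision process formulation described above can be sketched with a toy example. The Python code below is an illustration under stated assumptions, not the method of any cited study: it defines a hypothetical stand with four age classes, two actions ("wait" and "harvest"), a reward R(S, A, S′) and a small disturbance probability, and solves for the optimal policy with classical tabular value iteration. All state definitions, incomes and probabilities are invented for illustration. In a deep reinforcement learning setting, the value table would be replaced by a neural network trained from simulated or recorded transitions.

```python
# Toy stand-level MDP. All numbers are hypothetical, for illustration only.
STATES = [0, 1, 2, 3]                     # age classes: 0 = regeneration, 3 = mature
ACTIONS = ["wait", "harvest"]
HARVEST_INCOME = [0.0, 5.0, 30.0, 60.0]   # reward for harvesting in each state
P_DAMAGE = 0.05                           # chance a disturbance resets the stand
GAMMA = 0.9                               # discount factor per time step


def transitions(s, a):
    """Return a list of (probability, next_state, reward R(S, A, S'))."""
    if a == "harvest":
        return [(1.0, 0, HARVEST_INCOME[s])]
    grown = min(s + 1, STATES[-1])
    return [(1.0 - P_DAMAGE, grown, 0.0), (P_DAMAGE, 0, 0.0)]


def q_value(s, a, values):
    """Action value: expected immediate reward plus discounted future value."""
    return sum(p * (r + GAMMA * values[s2]) for p, s2, r in transitions(s, a))


def value_iteration(tol=1e-9):
    """Repeat Bellman optimality backups until the values stop changing."""
    values = [0.0 for _ in STATES]
    while True:
        new = [max(q_value(s, a, values) for a in ACTIONS) for s in STATES]
        if max(abs(n - v) for n, v in zip(new, values)) < tol:
            return new
        values = new


values = value_iteration()
# The policy is the rule that picks the best action in each state.
policy = {s: max(ACTIONS, key=lambda a: q_value(s, a, values)) for s in STATES}
print(policy)
```

With these illustrative numbers, the resulting policy lets the stand grow until the oldest age class and harvests there; changing the incomes, the discount factor or the disturbance probability shifts the harvest timing.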
Conclusion
This review identified the current state, trends, challenges and
research needs in using DL for forestry applications. Several
pioneering trials of tree species mapping, forest attribute estimation,
health and disease determination, and fire monitoring have been
presented. Nevertheless, this field remains relatively young, and
it is expected to yield plentiful studies in the coming years.
DL provides the opportunity to learn from multi-source and
multi-temporal data. The main asset of DL is the possibility to
internally learn multi-scale features without an explicit feature
extraction step. However, this asset can also be perceived
negatively, as DL models are currently hard to interpret, even if
interpretations and visualizations of the patterns in the data identified
by DL can be developed as in Mäyrä et al. (2021). Until DL models are
better understood, DL is easily perceived as a black-box approach with
risks related to, for example, the generalization abilities of the
predictions. Essential factors for generating robust DL methods with
a low chance of overfitting are a sufficient amount of representative,
labelled training data and the appropriate evaluation of
various hyperparameters and optimization methods that depend
on the selected architecture and available data. Consequently,
the fitness of a DL method to an application depends on how
extensively it can be parameterized under operational condi-
tions. We note that lessons learned with conventional modelling
approaches based on causal reasoning may turn out to be useful
‘training data’ for DL.
The prospects of DL are likely to be better realized when studies
move beyond the current main line of research, which compares
various CNN architectures with each other and against
conventional machine learning techniques. It is possible that
DL allows learning from observations and experiences, thereby
improving forestry operations through more autonomous deci-
sion processes, as envisaged for machine learning in general by
Müller et al. (2019). Meanwhile, we expect that the following
applications are increasingly realized as intermediate steps to
this overarching goal: (1) discovering new knowledge by novel
combinations of data from multiple scales and sources including
topographical surveys, weather and climate, historical maps,
and taxonomic observations annotated by experts, in addition
to conventional forestry and remote sensing data sources; (2)
distinguishing species or sizes while segmenting tree or tree
group instances using limited expert annotation of ground truths
and semantic segmentation types of CNNs; (3) learning optimal
weights for Bayesian probabilistic frameworks to account for
stochastic features and thus better manage uncertainties and
calibrate predictions accordingly; and (4) re-thinking manage-
ment scheduling problems as deep reinforcement learning from
databases containing information on forestry production possibilities
and decision makers’ preferences, which allows learning from
previously solved forest planning problems. Novel applications
may be innovated based on considerations of which components
inherent to DL optimally translate to forestry applications or (yet
undiscovered) parts of them.
Supplementary data
Supplementary data are available at Forestry online.
Data availability statement
No new data were generated or analysed in support of this
research.
Acknowledgements
We would like to thank the editor and two anonymous reviewers for
exceptionally thoughtful comments that greatly improved the paper.
Conflict of interest statement
None declared.
Funding
The Academy of Finland [grant number 324193].
References
Akhtar, N. and Ragavendran, U. 2020 Interpretation of intelligence in
CNN-pooling processes: a methodological survey. Neural Comput. Appl.
32, 879–898.
Ayrey, E. and Hayes, D.J. 2018 The use of three-dimensional convolutional
neural networks to interpret LiDAR for forest inventory. Remote Sens. 10.
10.3390/rs10040649.
Carpentier, M., Giguere, P. and Gaudreault, J. 2018 Tree species identifi-
cation from Bark Images using Convolutional Neural Networks. IEEE Int.
Conf. Intell. Robot. Syst. 10.1109/IROS.2018.8593514
Chang, T., Rasmussen, B.P., Dickson, B.G. and Zachmann, L.J. 2019
Chimera: a multi-task recurrent convolutional neural network for
forest classification and structural estimation. Remote Sens. 11.
10.3390/rs11070768.
Chen, Y., Zhang, Y., Jing, X., Wang, G., Mu, L., Yi, Y., Liu, H. and Liu, D. 2019
UAV image-based forest fire detection approach using Convolutional Neural Network.
In Proceedings of the 2019 14th IEEE Conference on Industrial Electronics
and Applications (ICIEA), Xi’an, China, 19–21 June 2019; pp. 2118–2123.
Chen, S.W., Nardari, G.V., Lee, E.S., Qu, C., Liu, X., Romero, R.A.F. and
Kumar, V. 2020 SLOAM: semantic lidar odometry and mapping for forest
inventory. IEEE Robot. Autom. Lett. 5, 612–619.
Chollet, F. 2018 Deep Learning with Python. Manning Publications Co., New
York, NY, USA.
Contreras, J., Denzler, J. and Sickert, S. 2019 Automatically estimat-
ing forestal characteristics in 3D point clouds using deep learning. IDiv
Annual Conference, Leipzig, Germany, 29-30 August 2019.
https://elib.dlr.de/133241/
Deng, L. and Yu, D. 2013 Deep learning: methods and applications.
Foundations and Trends in Signal Processing 7, 197–387.
Diez, Y., Kentsch, S., Fukuda, M., Caceres, M.L.L., Moritake, K. and Cabezas,
M. 2021 Deep learning in forestry using UAV-acquired RGB data: a practical
review. Remote Sens. 13. 10.3390/rs13142837.
dos Santos, A.A., Marcato Junior, J., Araújo, M.S., Di Martini, D.R., Tetila,
E.C., Siqueira, H.L., et al. 2019 Assessment of CNN-based methods for
individual tree detection on images captured by RGB cameras attached
to UAVs. Sensors 19. 10.3390/s19163595.
Ehlers, S., Saarela, S., Lindgren, N., Lindberg, E., Nyström, M., Persson,
H.J., et al. 2018 Assessing error correlations in remote sensing-based
estimates of forest attributes for improved composite estimation. Remote
Sens. 10. 10.3390/rs10050667.
Ene, L., Næsset, E. and Gobakken, T. 2012 Single tree detection in het-
erogeneous boreal forests using airborne laser scanning and area-based
stem number estimates. Int. J. Remote Sens. 33, 5171–5193.
Ercanlı, İ. 2020 Innovative deep learning artificial intelligence applications
for predicting relationships between individual tree height and diameter
at breast height. For. Ecosyst. 7. 10.1186/s40663-020-00226-3.
Fassnacht, F.E., Latifi, H., Stereńczak, K., Modzelewska, A., Lefsky, M., Waser,
L.T., et al. 2016 Review of studies on tree species classification from
remotely sensed data. Remote Sens. Environ. 186, 64–87.
Fricker, G.A., Ventura, J.D., Wolf, J.A., North, M.P., Davis, F.W. and Franklin,
J. 2019 A convolutional neural network classifier identifies tree species
in mixed-conifer forest from hyperspectral imagery. Remote Sens. 11.
10.3390/rs11192326.
García-Gutiérrez, J., González-Ferreiro, E., Mateos-García, D. and
Riquelme-Santos, J.C. 2016 A preliminary study of the suitability of
deep learning to improve LiDAR-derived biomass estimation. Lect. Notes
Comput. Sci 9648, 588–596.
Gers, F.A., Schmidhuber, J. and Cummins, F. 1999 Learning to forget:
Continual prediction with LSTM. IEE Conf. Publ. 2, 850–855.
Goodfellow, I., Bengio, Y. and Courville, A. 2016 Deep Learning. MIT Press,
Cambridge, MA, USA.
Graupe, D. 2016 Deep learning neural networks. World Sci.
10.1142/10190.
Guan, H., Yu, Y., Ji, Z., Li, J. and Zhang, Q. 2015 Deep learning-based tree
classification using mobile LiDAR data. Remote Sens. Lett. 6, 864–873.
Hamdi, Z.M., Brandmeier, M. and Straub, C. 2019 Forest damage assess-
ment using deep learning on high resolution remote sensing data.
Remote Sens. 11. 10.3390/rs11171976.
Hamraz, H., Jacobs, N.B., Contreras, M.A. and Clark, C.H. 2019 Deep learn-
ing for conifer/deciduous classification of airborne LiDAR 3D point clouds
representing individual trees. ISPRS J. Photogramm. Remote Sens. 158,
219–230.
Hastie, T., Tibshirani, R. and Friedman, J. 2009 The Elements of Statistical
Learning: Data Mining, Inference, and Prediction. Springer-Verlag New
York, ISBN 978-0-387-84857-0.
Hatcher, W.G. and Yu, W. 2018 A survey of deep learning: platforms,
applications and emerging research trends. IEEE Access 6, 24411–24432.
Heinzel, J.N., Weinacker, H. and Koch, B. 2011 Prior-knowledge-based
single-tree extraction. Int. J. Remote Sens. 32, 4999–5020.
Hinton, G. 2014 Where do features come from? Cogn. Sci. 38, 1078–1101.
Hochreiter, S. and Schmidhuber, J. 1997 Long short-term memory. Neural
Comput. 9, 1735–1780.
Hoeser, T. and Kuenzer, C. 2020 Object detection and image segmen-
tation with deep learning on earth observation data: a review-part I:
evolution and recent trends. Remote Sens. 12. 10.3390/rs12101667.
Hoeser, T., Bachofer, F. and Kuenzer, C. 2020 Object detection and image
segmentation with deep learning on earth observation data: a review—
part II: applications. Remote Sens. 12. 10.3390/rs12183053.
Jain, P., Coogan, S.C.P., Subramanian, S.G., Crowley, M., Taylor, S. and
Flannigan, M.D. 2020 A review of machine learning applications in wildfire
science and management. Env. Rev. 28, 478–505.
Jiang, S., Yao, W. and Heurich, M. 2019 Dead wood detection based
on semantic segmentation of VHR aerial CIR imagery using optimized
FCN-Densenet. Int. Arch. Photogramm. Remote. Sens. Spat. Inf. Sci. 42,
127–133.
Jin, C., Liu, X. and Gao, P. 2004 An intelligent simulation method based on
artificial neural network for container yard operation. Lect. Notes Comput.
Sci 3174, 904–911.
Kangas, A., Astrup, R., Breidenbach, J., Fridman, J., Gobakken, T.,
Korhonen, K.T., et al. 2018 Remote sensing and forest inventories
in Nordic countries–roadmap for the future. Scand. J. For. Res. 33,
397–412.
Kangas, A., Rasinmäki, J., Eyvindson, K. and Chambers, P. 2015 A mobile
phone application for the collection of opinion data for forest planning
purposes. Env. Manage. 55, 961–971.
Kangas, A., Räty, M., Korhonen, K.T., Vauhkonen, J. and Packalen, T. 2019
Catering information needs from global to local scales – potential and
challenges with national forest inventories. Forests 10. 10.3390/f10090800.
Kansanen, K., Vauhkonen, J., Lähivaara, T. and Mehtätalo, L. 2016 Stand
density estimators based on individual tree detection and stochastic
geometry. Can. J. For. Res. 46, 1359–1366.
Kansanen, K., Vauhkonen, J., Lähivaara, T., Seppänen, A., Maltamo, M. and
Mehtätalo, L. 2019 Estimating forest stand density and structure using
Bayesian individual tree detection, stochastic geometry, and distribution
matching. ISPRS J. Photogramm. Remote Sens. 152, 66–78.
Kattenborn, T., Leitloff, J., Schiefer, F. and Hinz, S. 2021 Review on Con-
volutional Neural Networks (CNN) in vegetation remote sensing. ISPRS J.
Photogramm. Remote Sens. 173, 24–49.
Koch, B., Kattenborn, T., Straub, C. and Vauhkonen, J. 2014 Segmentation
of forest to tree objects. In Forestry Applications of Airborne Laser
Scanning, M. Maltamo, E. Næsset and J. Vauhkonen (eds). Springer, Dordrecht,
Managing Forest Ecosystems 27. 10.1007/978-94-017-8663-8_5
Krizhevsky, A., Sutskever, I. and Hinton, G.E. 2012 Imagenet classification
with deep convolutional neural networks. Adv. Neural Inf. Proc. Syst. 1,
1097–1105.
Laguna, M. and Marti, R. 2002 Neural network prediction in a system for
optimizing simulations. IIE Trans. 34, 273–282.
Lähivaara, T., Seppänen, A., Kaipio, J.P., Vauhkonen, J., Korhonen, L.,
Tokola, T., et al. 2014 Bayesian approach to tree detection based on
airborne laser scanning data. IEEE Trans. Geosci. Remote Sens. 52,
2690–2699.
Lahti, K.M., Heikkinen, M., Juslén, A. and Schulman, L. 2021 Tackling data
quality challenges in the Finnish Biodiversity Information Facility (FinBIF).
Biodiv. Inf. Sci. Standard 5. 10.3897/biss.5.75559.
LeCun, Y., Bengio, Y. and Hinton, G. 2015 Deep learning. Nature 521,
436–444.
LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W.,
et al. 1989 Backpropagation applied to handwritten zip code recognition.
Neural Comput. 1, 541–551.
Li, A., Lu, Z., Wang, L., Xiang, T. and Wen, J.R. 2017 Zero-shot scene
classification for high spatial resolution remote sensing images. IEEE
Trans. Geosci. Remote Sens. 55, 4157–4167.
Lindberg, E. and Holmgren, J. 2017 Individual tree crown methods for 3D
data from remote sensing. Cur. For. Rep. 3, 19–31.
Litjens, G., Kooi, T., Bejnordi, B.E., Arindra, A., Setio, A., Ciompi, F., et al. 2017
A survey on deep learning in medical image analysis. Medic. Image Anal.
42, 60–88.
Liu, H. 2020 Robot systems for rail transit applications. Elsevier.
10.1016/C2019-0-04615-8.
Liu, J., Wang, X. and Wang, T. 2019 Classification of tree species and stock
volume estimation in ground forest images using deep learning. Comput.
Electron. Agr. 166. 10.1016/j.compag.2019.105012.
Ma, L., Liu, Y., Zhang, X., Ye, Y., Yin, G. and Johnson, B.A. 2019 Deep learning
in remote sensing applications: a meta-analysis and review. ISPRS J.
Photogramm. Remote Sens. 152, 166–177.
Marrs, J. and Ni-Meister, W. 2019 Machine learning techniques for tree
species classification using co-registered LiDAR and hyperspectral data.
Remote Sens. 11. 10.3390/rs11070819.
Martins, J., Junior, J.M., Menezes, G., Pistori, H., Santaana, D. and
Goncalves, W. 2019 Image segmentation and classification with SLIC
Superpixel and convolutional neural network in forest context. IEEE Int.
Geosci. Remote Sens. Symp. 2019, 6543–6546.
Malo, P., Tahvonen, O., Suominen, A., Back, P. and Viitasaari, L. 2021
Reinforcement learning in optimizing forest management. Can. J. For. Res.,
in press. 10.1139/cjfr-2020-0447.
Maltamo, M., Peuhkurinen, J., Malinen, J., Vauhkonen, J., Packalén, P. and
Tokola, T. 2009 Predicting tree attributes and quality characteristics of
Scots pine using airborne laser scanning data. Silva Fenn. 43, 507–521.
Mäyrä, J., Keski-Saari, S., Kivinen, S., Tanhuanpää, T., Hurskainen, P., Kull-
berg, P., et al. 2021 Tree species classification from airborne hyperspectral
and LiDAR data using 3D convolutional neural networks. Remote Sens.
Environ. 256. 10.1016/j.rse.2021.112322.
Mehtätalo, L., de-Miguel, S. and Gregoire, T.G. 2015 Modeling height-
diameter curves for prediction. Can. J. For. Res. 45, 826–837.
Mizoguchi, T., Ishii, A., Nakamura, H., Inoue, T. and Takamatsu, H.
2017 Lidar-based individual tree species classification using convolu-
tional neural network. Proc. Videometrics, Range Imag. Appl. 10332.
10.1117/12.2270123.
Mohamedou, C., Korhonen, L., Eerikäinen, K. and Tokola, T. 2019 Using
LiDAR-modified topographic wetness index, terrain attributes with leaf
area index to improve a single-tree growth model in south-eastern Fin-
land. Forestry 92, 253–263.
Mononen, L., Auvinen, A.P., Packalen, P., Virkkala, R., Valbuena, R., Bohlin,
I., et al. 2018 Usability of citizen science observations together with
airborne laser scanning data in determining the habitat preferences of
forest birds. For. Ecol. Manag. 430, 498–508.
Mou, L., Bruzzone, L. and Zhu, X.X. 2019 Learning spectral-spatial-temporal
features via a recurrent convolutional neural network for change
detection in multispectral imagery. IEEE Trans. Geosci. Remote Sens. 57,
924–935.
Mubin, N.A., Nadarajoo, E., Shafri, H.Z.M. and Hamedianfar, A. 2019
Young and mature oil palm tree detection and counting using
convolutional neural network deep learning method. Int. J. Remote Sens.
40, 7500–7515.
Müller, F., Jaeger, D. and Hanewinkel, M. 2019 Digitization in wood supply
– a review on how industry 4.0 will change the forest value chain. Comput.
Electron. Agr. 162, 206–218.
Narine, L.L., Popescu, S.C. and Malambo, L. 2019 Synergy of ICESat-2 and
landsat for mapping forest aboveground biomass with deep learning.
Remote Sens. 11. 10.3390/rs11121503.
Niemi, M.T. and Vauhkonen, J. 2016 Extracting canopy surface texture
from airborne laser scanning data for the supervised and unsuper-
vised prediction of area-based forest characteristics. Remote Sens. 8.
10.3390/rs8070582.
Niska, H., Skön, J.P., Packalén, P., Tokola, T., Maltamo, M. and Kolehmainen,
M. 2010 Neural networks for the prediction of species-specific plot vol-
umes using airborne laser scanning and aerial photographs. IEEE Trans.
Geosci. Remote Sens. 48, 1076–1085.
Nuutinen, T., Berger, F., Karjalainen, A., Lempinen, R., Maltamo, M. and
Siitonen, M. 2011 Request-driven generation of calculation chains for
adaptive forest analysis. Scand. J. For. Res. 26, 2–10.
Nweke, H.F., Teh, Y.W., Al-garadi, M.A. and Alo, U.R. 2018 Deep learning
algorithms for human activity recognition using mobile and wearable
sensor networks: State of the art and research challenges. Expert Syst.
Appl. 105, 233–261.
Okewu, E., Adewole, P. and Sennaike, O. 2019 Experimental comparison
of stochastic optimizers in deep learning. Lect. Notes Comput. Sci 11623,
704–715.
Packalen, P., Temesgen, H. and Maltamo, M. 2012 Variable selection
strategies for nearest neighbor imputation methods used in remote
sensing based forest inventory. Can. J. Remote. Sens. 38, 557–569.
Padarian, J., Minasny, B. and McBratney, A.B. 2019 Using deep learning
for digital soil mapping: a review aided by machine learning tools. Soil 5,
79–89.
Pan, S.J. and Yang, Q. 2010 A survey on transfer learning. IEEE Trans.
Knowledge Data Eng. 22, 1345–1359.
Pelletier, C., Webb, G.I. and Petitjean, F. 2019 Temporal convolutional
neural network for the classification of satellite image time series. Remote
Sens. 11. 10.3390/rs11050523.
Pitkänen, T.P., Räty, M., Hyvönen, P., Korhonen, K.T. and Vauhkonen, J.
2021 Using auxiliary data to rationalize smartphone-based pre-harvest
forest mensuration. Forestry. 10.1093/forestry/cpab039.
Pukkala, T., Vauhkonen, J., Korhonen, K.T. and Packalen, T. 2021 Self-
learning growth simulator for modelling forest stand dynamics in chang-
ing conditions. Forestry 94, 333–346.
Rammer, W. and Seidl, R. 2019 Harnessing deep learning in ecol-
ogy: an example predicting bark beetle outbreaks. Front. Plant Sci. 10.
10.3389/fpls.2019.01327.
Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J. and
Carvalhais, N. 2019 Deep learning and process understanding for data-
driven earth system science. Nature 566, 195–204.
Sačkov, I., Hlásny, T., Bucha, T. and Juriš, M. 2017 Integration of tree
allometry rules to treetops detection and tree crowns delineation using
airborne lidar data. IForest Biogeosci. For. 10, 459–467.
Safonova, A., Tabik, S., Alcaraz-Segura, D., Rubtsov, A., Maglinets, Y. and
Herrera, F. 2019 Detection of fir trees (Abies sibirica) damaged by the bark
beetle in unmanned aerial vehicle images with deep learning. Remote
Sens. 11. 10.3390/rs11060643.
Salcedo-Sanz, S., Ghamisi, P., Piles, M., Werner, M., Cuadra, L., Moreno-
Martínez, A., et al. 2020 Machine learning information fusion in earth
observation: a comprehensive review of methods, applications and data
sources. Inf. Fusion 22, 480–545.
Särkkä, S. 2013 Bayesian Filtering and Smoothing. Cambridge University
Press, ISBN 9781139344203. 10.1017/CBO9781139344203
Schiefer, F., Kattenborn, T., Frick, A., Frey, J., Schall, P., Koch, B., et al. 2020
Mapping forest tree species in high resolution UAV-based RGB-imagery by
means of convolutional neural networks. ISPRS J. Photogramm. Remote
Sens. 170, 205–215.
Schmidhuber, J. 2015 Deep learning in neural networks: an overview.
Neural Netw. 61, 85–117.
Seidl, R., Eastaugh, C.S., Kramer, K., Maroschek, M., Reyer, C., Socha, J.,
et al. 2013 Scaling issues in forest ecosystem management and how to
address them with models. Eur. J. For. Res. 132, 653–666.
Shah, S.A.A., Manzoor, M.A. and Bais, A. 2020 Canopy height estimation
at Landsat resolution using convolutional neural networks. Mach. Learn.
Knowledge Extract. 2, 23–36.
Shao, Z., Zhang, L. and Wang, L. 2017 Stacked sparse autoencoder
modeling using the synergy of airborne LiDAR and satellite optical and
SAR data to map forest above-ground biomass. IEEE J. Selected Topics
Appl. Earth Obs. Remote Sens. 10, 5569–5582.
Shi, W., Zhang, M., Zhang, R. and Chen, S. 2020 Change detection based
on artificial intelligence: state-of-the-art and challenges. Remote Sens.
12. 10.3390/rs12101688.
Shorten, C. and Khoshgoftaar, T.M. 2019 A survey on image data aug-
mentation for deep learning. J. Big Data 6, 1–48.
Singh, S.A. and Majumder, S. 2020 Short and noisy electrocardio-
gram classification based on deep learning. In Deep Learning for
Data Analytics, H. Das, C. Pradhan and N. Dey (eds). Elsevier.
10.1016/B978-0-12-819764-6.00002-8.
Sothe, C., de Almeida, C.M., Schimalski, M.B., la Rosa, L.E.C., Castro, J.D.B.,
Feitosa, R.Q., et al. 2020 Comparative performance of convolutional neu-
ral network, weighted and conventional support vector machine and
random forest for classifying tree species using hyperspectral and pho-
togrammetric data. Gisci. Remote Sens. 57, 369–394.
Stefán, P. 2003 Combined Use of Reinforcement Learning and Simu-
lated Annealing: Algorithms and Applications. Ph.D. Thesis, University
of Miskolc, Department of Mechanical Engineering, Budapest, Hungary, 119.
http://phd.lib.uni-miskolc.hu/JaDoX_Portlets/documents/document_5607_section_985.pdf
Su, C., Wu, X., Tang, X. and Hu, J. 2019 Growth height prediction for the
trees under overhead lines based on deep learning algorithm. Int. Conf.
Power Syst. Tech. 2018, 3693–3699.
Sylvain, J.D., Drolet, G. and Brown, N. 2019 Mapping dead forest cover
using a deep convolutional neural network and digital aerial photography.
ISPRS J. Photogramm. Remote Sens. 156, 14–26.
Swetnam, T.L. and Falk, D.A. 2014 Application of metabolic scaling theory
to reduce error in local maxima tree segmentation from aerial LiDAR. For.
Ecol. Manag. 323, 158–167.
Theodoridis, S. and Koutroumbas, K. 2008 Pattern Recognition. 4th edn.
Academic Press, ISBN: 9780080949123
Uusitalo, J., Puustelli, A., Kivinen, V.P., Nummi, T. and Sinha, B.K. 2006
Bayesian estimation of diameter distribution during harvesting. Silva
Fenn. 40, 663–671.
Varvia, P., Lähivaara, T., Maltamo, M., Packalen, P. and Seppänen, A.
2019 Gaussian process regression for forest attribute estimation from
airborne laser scanning data. IEEE Trans. Geosci. Remote Sens. 57,
3361–3369.
Vauhkonen, J., Korpela, I., Maltamo, M. and Tokola, T. 2010 Impu-
tation of single-tree attributes using airborne laser scanning-based
height, intensity, and alpha shape metrics. Remote Sens. Environ. 114,
1263–1276.
Wan, R., Mei, S., Wang, J., Liu, M. and Yang, F. 2019 Multivariate
temporal convolutional network: a deep neural networks approach for
multivariate time series forecasting. Electronics 8.
10.3390/electronics8080876.
Windrim, L. and Bryson, M. 2019 Forest tree detection and segmentation
using high resolution airborne LiDAR. Proc. IEEE/RSJ Int. Conf. Intell. Robot.
Syst. 2019, 3898–3904.
Wong, S.C., Gatt, A., Stamatescu, V. and McDonnell, M.D. 2016 Under-
standing data augmentation for classification: when to warp? Proc. Int.
Conf. Digit. Image Comput. Tech. Appl. 2016, 1–6.
Zhang, L., Shao, Z., Liu, J. and Cheng, Q. 2019 Deep learning based retrieval
of forest aboveground biomass from combined LiDAR and Landsat 8 data.
Remote Sens. 11. 10.3390/rs11121459.
Zhen, Z., Quackenbush, L.J. and Zhang, L. 2016 Trends in automatic
individual tree crown detection and delineation-evolution of LiDAR data.
Remote Sens. 8. 10.3390/rs8040333.
Zhu, X.X., Tuia, D., Mou, L., Xia, G.S., Zhang, L., Xu, F., et al. 2017 Deep
learning in remote sensing: a review. IEEE Geosci. Remote Sens. Mag. 5,
8–36.
Zortea, M., Nery, M., Ruga, B., Carvalho, L.B. and Bastos, A.C. 2018 Oil-palm
tree detection in aerial images combining deep learning classifiers. Proc.
IEEE Int. Geosci. Remote Sens. Symp. 2018, 657–660.
Zou, X., Cheng, M., Wang, C., Xia, Y. and Li, J. 2017 Tree classification in
complex forest point clouds based on deep learning. IEEE Geosci. Remote
Sens. Lett. 14, 2360–2364.