Evaluation of the quality of automatic tree detection using photogrammetric canopy height models and orthomosaic

The work was performed in the old-growth linden-spruce forest of the Kologrivsky Forest Nature Reserve (Kostroma Oblast, Russia) based on aerial photography with a quadcopter. Automatic detection algorithms made it possible to detect most of the trees in the forest canopy. Tree detection by orthomosaic using the neural network algorithm 'DeepForest' turned out to be of better quality than detection based on the canopy height model using an algorithm based on the sliding window method. As a rule, both methods showed better results for conifers than for deciduous trees. Comparison of the average heights of trees estimated from remote data and measured by ground survey did not reveal significant differences. Additional ground surveys are needed to assess the quality of undergrowth detection.


Introduction
Data on the spatial structure of forest stands are in demand for solving many fundamental and applied problems in forest ecology and forest management (Alonzo et al., 2018; Bennett et al., 2020; Otero et al., 2018; Puliti et al., 2015). At the same time, large amounts of up-to-date data on the structure of stands and the characteristics of individual trees must be analysed in order to obtain reliable results; these data are often difficult to obtain using traditional ground survey methods (Anderson and Gaston, 2013; Pajares, 2015; Zhang et al., 2016a). A promising approach for the rapid collection of such data is aerial photography by unmanned aerial vehicles (UAVs), or quadcopters. The quadcopter retains the set altitude during the entire flight and moves along a route planned in advance, automatically taking a series of overlapping photos at a fixed camera orientation angle. Photogrammetric processing of these images yields three-dimensional point clouds, two-dimensional raster RGB (Red, Green, Blue) orthomosaics, and digital elevation models, from which it is possible to estimate the spatial position of the studied objects (in a given coordinate system), as well as to measure their size.
Based on point clouds and digital height models and using ready-made algorithms, it is possible to automatically detect (i.e., find) tree tops and to estimate their heights, trunk diameters and crown sizes with high accuracy (Birdal et al., 2017; Ivanova et al., 2021; Krisanski et al., 2020; Medvedev et al., 2020; Mohan et al., 2017). At the same time, the results published so far were mainly obtained in mono- or oligodominant forests, which were often managed (Miller et al., 2017; Picos et al., 2020). In addition, the quality of detection and height estimation is higher for trees with a cone-shaped crown than for trees with a spherical crown (Alonzo et al., 2018; Bennett et al., 2020; Ivanova et al., 2020).
Methods based on artificial intelligence, computer vision, and machine learning algorithms are promising for analysing the structure of forest stands using orthoimages. In recent years, such methods have been widely used in ecology to address matters related to the analysis and segmentation of images (Christin et al., 2019; Lamba et al., 2019). The advantage of this approach is that the accumulated array of images of global coverage, primarily satellite imagery, may be used for analysis (Weinstein et al., 2020). For example, D.E. Kislov and K.A. Korznikov (2020) assessed windthrow disturbances using remote data and a neural network. M. Onishi and T. Ise (2021) successfully detected tree crowns from orthoimages in the mixed forests of Japan using deep learning methods.
The study aims to evaluate the quality of tree top detection using photogrammetric canopy height models and orthomosaic. The research was performed on the territory of the Kologrivsky Forest Nature Reserve in a mixed old-growth forest stand.

Research object
The Kologrivsky Forest Nature Reserve is located in the north-eastern part of the Kostroma Oblast and consists of two clusters (Manturovsky and Kologrivsky). The total area of the Reserve is 58939.6 ha. In 2020, the Reserve became part of the biosphere reserve of the same name under the UNESCO Man and the Biosphere Programme (https://en.unesco.org/biosphere/eu-na/kologrivsky-forest). The old coniferous-broadleaved forest tract (area of 918 ha), located on the territory of the Kologrivsky cluster (the so-called "core" of the Reserve), is of greatest interest for study (Korennye..., 1988). According to the literature data, the core spruce forests have not been impacted by anthropogenic disturbances over the past 350-400 years (Ivanov et al., 2012; Khoroshev et al., 2013). The tree layer of these forests includes Picea abies (L.) Karst., Abies sibirica Ledeb., Tilia cordata Mill., Ulmus glabra Huds., Acer platanoides L., and Betula pubescens Ehrh. The stands are uneven-aged and multilayered. The forests of the reserve "core" were significantly damaged by the storm (Lebedev and Chistyakov, 2021b); as of 2021, however, some fragments of old-growth southern taiga spruce forests remained. This work was carried out on a permanent sample plot that was only slightly affected by the catastrophic windthrow.
The surveyed sample plot, 200×50 m in size (1 ha), was established in 1983 and re-surveyed in 2017. The plot consists of two adjoining subplots of 0.5 ha each: 11/83 and 12/83, according to the numbering adopted in the reserve (Fig. 1). Forest inventory attributes for 2017 are presented in Table 1 (Lebedev and Chistyakova, 2021a). In 2017, active fragmentation of spruce stands was recorded due to the natural fallout of large spruce trees.

Methods for collecting and analysing the field data
Aerial photography of the sample plot was carried out on August 27 and 28, 2021, with a DJI Phantom 4 quadcopter in mosaic flight mode from a height of 336 m with 80% overlap of photographs (for the entire territory of the reserve "core") and from a height of 154 m with 95% overlap (for the sample plot). Flight plans were arranged with a buffer zone to avoid the edge effect during data processing. The photogrammetric processing of the obtained images was carried out using the Agisoft Metashape v. 1.5 program (2019), which produced (1) a dense point cloud for the sample plot with a density of 436.9 points/m² (Fig. 2) and (2) orthomosaics with a resolution of 5 and 10 cm/pixel for the sample plot and the entire territory of the reserve "core", respectively.
Since the published data on the state of forest stands at the sample plot were collected prior to the catastrophic windthrow, the windthrow damage was assessed first. To do this, all visually distinguishable trunks of fallen trees were manually vectorized from the orthoimage. The histogram of the distribution of pixel height values in the digital height model was also analysed.
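The histogram analysis mentioned above can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the function names, the bin width, and the simple neighbour-comparison rule for peaks are assumptions chosen for illustration, and the input is assumed to be a flat list of height-model pixel values in metres.

```python
def height_histogram(heights, bin_width=2.0):
    """Count height-model pixel values per bin of width bin_width (metres).

    Returns a list where element i is the number of pixels with
    i * bin_width <= height < (i + 1) * bin_width.
    Negative values (normalisation artefacts) are ignored.
    """
    valid = [h for h in heights if h >= 0]
    if not valid:
        return []
    n_bins = int(max(valid) // bin_width) + 1
    counts = [0] * n_bins
    for h in valid:
        counts[int(h // bin_width)] += 1
    return counts


def histogram_peaks(counts):
    """Return indices of bins that hold more pixels than both neighbours."""
    peaks = []
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else 0
        right = counts[i + 1] if i < len(counts) - 1 else 0
        if c > left and c > right:
            peaks.append(i)
    return peaks
```

Two well-separated peaks in the output of `histogram_peaks` would correspond to a canopy with distinct low and high layers; a real analysis would likely smooth the histogram first to suppress spurious peaks.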
A canopy height model was used to detect tree tops. The dense point cloud was processed in the R environment (R Core Team, 2021) using lidR v. 3.1.2 (Roussel et al., 2020) and rLiDAR (Silva et al., 2018). These open-source packages are designed to process airborne laser scanning (LiDAR) data and photogrammetric point clouds, as well as elevation models derived from them. The canopy height model was built on the basis of a photogrammetric point cloud using the functions of the lidR package. First, points of the ground class were selected using the cloth simulation filtering algorithm (Zhang et al., 2016b) implemented in the classify_ground() function. Then, using the normalize_height() function, the point cloud was normalised to the level of the ground points using 'tin' (triangulated irregular network) interpolation. After that, a tree canopy height model was built using the pit-free algorithm (Khosravipour et al., 2014). Tree tops were then detected on the canopy height model using a sliding-window algorithm implemented in the FindTreesCHM() function of the rLiDAR package.
We tested different combinations of the sliding window size (fws argument) and the minimum tree height (minht argument).
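To make the roles of the two arguments concrete, the following Python sketch shows the general idea of sliding-window tree top detection on a raster height model. It is a simplified stand-in, not the rLiDAR implementation: the function name `find_tree_tops`, the handling of raster edges, and the requirement of a strict (unique) window maximum are assumptions made for illustration. The parameter names `fws` and `minht` are borrowed from FindTreesCHM() only for readability.

```python
def find_tree_tops(chm, fws=5, minht=10.0):
    """Detect local maxima in a canopy height model.

    chm   : list of rows, each a list of heights in metres
    fws   : side of the square sliding window in pixels (odd number)
    minht : cells lower than this height are never reported as tops

    Returns a list of (row, col, height) tuples. A cell is a tree top
    when it is the single highest cell inside its fws x fws window.
    """
    if fws < 1 or fws % 2 == 0:
        raise ValueError("fws must be a positive odd number")
    half = fws // 2
    n_rows, n_cols = len(chm), len(chm[0])
    tops = []
    for r in range(n_rows):
        for c in range(n_cols):
            h = chm[r][c]
            if h < minht:
                continue
            # Window is clipped at the raster edges.
            window = [
                chm[i][j]
                for i in range(max(0, r - half), min(n_rows, r + half + 1))
                for j in range(max(0, c - half), min(n_cols, c + half + 1))
            ]
            if h >= max(window) and window.count(h) == 1:
                tops.append((r, c, h))
    return tops
```

A larger window merges neighbouring maxima into one top, which tends to suppress false tops within broad crowns but may miss closely spaced trees; a higher `minht` excludes undergrowth from the result.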
To delineate individual trees on the orthomosaic, we used a pretrained neural network available in the 'DeepForest' library implemented in the Python environment (Weinstein et al., 2020). This neural network is built on the basis of semi-supervised learning. The model uses a small amount of labelled data to complement datasets that do not have labels. Labelled data are used to start the training and may significantly increase the speed and accuracy of learning. The model used in 'DeepForest' was trained by the developers on data from 24 forest sites of the US National Ecological Observatory Network (NEON). The data included 434551 automatically found trees and 2848 manually labelled trees (Weinstein et al., 2019). In our study, we tested different tile sizes for tree detection: 200, 400, 600, and 800 pixels, as recommended by the developers. The analysis was carried out using an orthoimage with a resolution of 10 cm/pixel.
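A tiled prediction run of this kind might look like the Python sketch below. It is an illustration, not the script used in the study: it assumes the DeepForest 1.x interface (`main.deepforest()`, `use_release()`, and `predict_tile()` with the `raster_path`, `patch_size`, and `patch_overlap` arguments), which may differ in other releases. The function names `detect_crowns` and `tile_footprint_m`, the overlap value, and the file name in the usage note are hypothetical.

```python
def tile_footprint_m(patch_size_px, resolution_m_per_px):
    """Side length on the ground (metres) covered by one square tile."""
    return patch_size_px * resolution_m_per_px


def detect_crowns(orthomosaic_path, patch_size=400, patch_overlap=0.05):
    """Predict tree crown bounding boxes on an orthomosaic, tile by tile.

    Assumes the DeepForest 1.x API; the import is done lazily so the
    helper above can be used without the library installed.
    """
    from deepforest import main  # optional third-party dependency

    model = main.deepforest()
    model.use_release()  # load the pretrained release weights
    return model.predict_tile(
        raster_path=orthomosaic_path,
        patch_size=patch_size,
        patch_overlap=patch_overlap,
    )
```

For example, `detect_crowns("orthomosaic_10cm.tif", patch_size=600)` would run the model with 600-pixel tiles on a hypothetical file. At 10 cm/pixel, the tested tile sizes of 200, 400, 600, and 800 pixels correspond to ground footprints of 20, 40, 60, and 80 m, as `tile_footprint_m` shows.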
The following values were calculated to assess the quality of the results of automatic detection: TP (true positive) is the number of trees correctly detected by the algorithm; FP (false positive) is the number of false positives of the algorithm, when a tree is either detected automatically, but is absent in the tree stand, or the found top belongs to an already detected tree; FN (false negative) is the number of missed trees that exist in the area, but were not detected automatically (Fig. 3).
The number of correctly detected trees (true positive, TP) and false positives (FP) was estimated based on visual analysis of the orthomosaic. Since detailed re-survey data for the sample plot were not available for 2017, all trees were manually detected in the QGIS environment (QGIS Development Team, 2019) using a digital elevation model and an orthomosaic in order to estimate the number of trees omitted by the algorithm (FN). In particular, the crown margins were determined by an expert for each tree visually recognized on the orthoimage, and then the top of each tree was marked using the height model as a guide. After that, FN was calculated as the difference between the number of manually detected trees and the number of tops correctly detected automatically. The evaluation was performed only for the forest stand (height > 20 m) because reliable visual detection of understorey trees and of undergrowth in canopy gaps was often not possible. All trees (TP, FP, and FN) were classified as deciduous or coniferous on the basis of an expert assessment of the orthoimage. Then the scores adopted for the analysis of the quality of automatic detection were calculated (Goutte and Gaussier, 2005; Li et al., 2012; Sokolova et al., 2008).
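The bookkeeping described above can be sketched in a few lines of Python. The sketch assumes the standard definitions of precision, recall, and F-score from the cited literature; the function name `detection_scores` and the counts in the usage note are hypothetical and are not results of this study.

```python
def detection_scores(tp, fp, n_manual):
    """Compute FN, precision (p), recall (r) and F-score from counts.

    tp       : tree tops correctly detected automatically
    fp       : false tops (no tree, or a second top on the same tree)
    n_manual : trees mapped manually on the plot (reference count)

    FN is derived as n_manual - tp, as in the text above.
    """
    if tp < 0 or fp < 0 or n_manual < tp:
        raise ValueError("counts must satisfy 0 <= tp <= n_manual and fp >= 0")
    fn = n_manual - tp
    p = tp / (tp + fp) if (tp + fp) else 0.0
    r = tp / (tp + fn) if (tp + fn) else 0.0
    f = 2 * r * p / (r + p) if (r + p) else 0.0
    return {"FN": fn, "p": p, "r": r, "F": f}
```

With hypothetical counts of 40 correct tops, 10 false tops, and 50 manually mapped trees, `detection_scores(40, 10, 50)` gives FN = 10 and p, r, and F all approximately equal to 0.8.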
The score p (precision) indicates the quality of tree detection; it is calculated as the ratio of the number of trees correctly detected by the algorithm (true positive, TP) to the number of all trees found automatically:
$p = \frac{TP}{TP + FP}$
The score r (recall) indicates the completeness of tree detection by the algorithm; it is calculated as the proportion of trees correctly detected automatically (TP) relative to the number of manually detected trees (i.e., growing on the sample plot):
$r = \frac{TP}{TP + FN}$
The F-score indicates the general quality of the results, taking into account r and p; it is calculated by the formula:
$F = \frac{2 \cdot r \cdot p}{r + p}$
The more trees omitted by the algorithm, the smaller the value of r; low p values indicate a high number of false positives (FP). The F-value is high at high values of r and p, i.e., when most of the trees available in the area (in this case, manually detected) are found automatically and when the number of missed trees (false negative, FN) and false positives is small. Score values may vary from 0 to 1; in the literature, the detection quality is described as high if r, p, and F > 0.8 (Chen et al., 2022; Gonçalves et al., 2022; Ivanova et al., 2021; Silva et al., 2016).
According to the results of processing the digital height model, the scores were calculated for different combinations of arguments of the FindTreesCHM() function. In each variant, the scores were analysed for all data and separately for the stand (> 20 m) and the undergrowth (< 20 m). Based on the results of the orthomosaic processing using a neural network, scores were calculated for different tile sizes, but only for the forest stand.

Results and discussion
According to the visual analysis of the orthoimage and canopy height model, the woody vegetation on the studied sample plot has a heterogeneous vertical structure (Figs. 1, 4, 5). It is known that the canopy height model reflects only the upper layer of the canopy, so the type of the distribution of pixel heights may reveal the presence of heterogeneities and discontinuities in the canopy (Portnov et al., 2021). Two peaks are clearly visible on the histogram of the distribution of pixel height values obtained in the digital tree canopy height model (Fig. 4). The left one corresponds to the vegetation in the canopy gaps, the right one, to the groups of the largest trees forming the stand. At the same time, the number of pixels corresponding to large trees is small, and the distribution shape differs from that specific to dense canopy, but it is closer to the areas where there are gaps in the forest stand (Ivanova and Shashkov, 2022; Portnov et al., 2021).
In addition, at the time of the remote survey, there was no dense forest stand on subplot 11/83, and only individual large trees remained. Woody vegetation was mainly represented by undergrowth, while only single fallen trunks were visible on the orthoimage in places where there was no forest stand. This suggested that the fragmentation of the stand occurred before the catastrophic windthrow due to natural causes, which was consistent with the results of the 2017 re-survey. Trees that fell several years ago may be hidden by layers of undergrowth and, therefore, were not visible on the orthoimage. Trees felled by the catastrophic windthrow were also detected in this subplot; they had mainly fallen from neighbouring areas (Fig. 1). In subplot 12/83, the number of large trees was greater than in subplot 11/83; the tree layer had gaps. A strip of fallen trees, oriented from the southwest to the northeast, was clearly visible; most likely, these trees were felled by a storm in 2021. There were also small gaps, in which fallen trees were indistinguishable on the orthoimage. Apparently, they were formed before the catastrophic windthrow as a result of the fall of individual large trees. Therefore, the woody vegetation of the studied area was heterogeneous; it was formed both as a result of the long-term natural dynamics of forest stands and because of the severe windthrow.
Automatic detection of tree tops using a digital elevation model with different values of the FindTreesCHM() function arguments found from 82 to 320 tree tops throughout the sample plot (Fig. 5, Table 2). In total, 136 trees (51 coniferous and 85 deciduous) were mapped by manual detection of trees in the upper layer. When comparing these data with the orthomosaic, the best results were obtained using a sliding window of 5 pixels and a minimum height of detected trees of 10 or 15 m. The results of these experiments were used to quantify the quality of detection.
When the quality of automatic detection was assessed for all trees and separately for layers, a high precision p was obtained for both analysed combinations of FindTreesCHM() function arguments (Table 3). This showed that the algorithm found tree tops correctly in most cases. At the same time, the quality of detection of coniferous trees was higher than that of deciduous trees.
The completeness of detection r was assessed only for the forest stand. For deciduous trees, this parameter was quite high and comparable in both experiments, while the value of the r estimate was higher than p. This indicated that the algorithm found false tops more often than omitted trees when detecting deciduous trees. Both estimates were high for coniferous trees at minht = 10; the value of r was slightly lower than p. At minht = 15, the estimate of r was low (much lower than p). Therefore, the algorithm omitted coniferous trees more often than it found false tops.
The F estimates, which indicate the general detection quality, were rather high. The highest value (0.95) was obtained for coniferous trees at minht = 10, and the lowest value (0.78), for deciduous trees at minht = 10. In both experiments, the F values for coniferous trees were higher than for deciduous trees.
In tree detection by orthomosaic using the 'DeepForest' neural network algorithm, the best results were obtained with tile sizes of 400 and 600 pixels. With a tile size of up to 400 pixels, the model identified overly fragmented objects, so that the largest crowns were recognized as sets of several small ones. On the contrary, when the tile size was more than 800 pixels, the crowns of several adjacent trees were detected as one tree. With a tile of 400 pixels, 381 trees were detected, and with a tile of 600 pixels, 218 trees. The detected trees belonged both to the forest stand and to the undergrowth, so their number was higher than that obtained by detection using the height model.
The evaluation of the quality of the neural network results was performed only for the forest stand. Its tree detection accuracy was higher than that of detection based on the canopy height model (Table 4); at the same time, as a rule, coniferous trees were also detected better than deciduous ones. In all cases, p was higher than r, i.e., the neural network omitted trees more often than it found false ones (in this case, it combined several trees into one crown). The general quality of detection, F, was high. According to this score, the detection accuracy of coniferous and deciduous trees was the same in the first experiment, while in the second, the results obtained for coniferous trees were of better quality.
Generally, we conclude that it is possible to identify most of the trees in the forest stand with high accuracy in the studied area of the mixed forest with the help of ready-made algorithms. As was concluded in our previous research performed in pine and pine-spruce forests on the territory of the Prioksko-Terrasny Biosphere Reserve, the presence of trees of several species in the forest stand, pronounced layering and gaps in the forest canopy can reduce the quality of tree detection using digital elevation models (Ivanova et al., 2021). In the present study, the quality of detection is generally high, despite the polydominance and heterogeneous vertical structure of woody vegetation. Apparently, this result is explained by the absence of closed canopy in the studied area and by mature trees located singly or in groups. At the same time, the crowns of individual trees are well distinguished on the canopy height model and on the orthoimage even in the case of a group arrangement. Apparently, the low density of trees was the main factor in their successful detection. This conclusion agrees well with earlier publications stating that the quality of tree segmentation in sparse stands is higher than in closed stands (Alonzo et al., 2014, 2018; Kolarik et al., 2020; Medvedev et al., 2020).
It is also important to note that deep learning-based tree segmentation methods continue to be actively developed. Thus, M. Onishi and T. Ise (2021) not only segmented tree crowns in mixed closed deciduous forests successfully but also identified nine classes of trees with high accuracy. Further development of these methods will make it possible to obtain more accurate estimates describing the structure of forest stands and to predict their dynamics.
Our results also indicate a higher quality of detection of coniferous trees compared to deciduous ones. All coniferous trees present on the sample plot (mainly spruce, with a small amount of fir) had a well-defined top. Large deciduous trees (birch and linden), on the contrary, had spherical crowns, so the algorithm often found several tops within each one. In general, this result is expected and is consistent with the data available in the literature (Alonzo et al., 2018; Bennett et al., 2020; Ivanova et al., 2020).
The development of a methodology for a more complete account of undergrowth is of significant interest for further research. To solve this problem based on the canopy height model, a smaller sliding window than that used for detecting mature trees is needed; in addition, higher resolution canopy height models are likely to be required. The results of preliminary experiments (original unpublished data) suggest that it is possible to detect a large number of tree tops that form the undergrowth layer using a height model with a resolution of 40 cm/pixel. When using the 'DeepForest' neural network, a promising way to improve the results is to train the model on data from areas adjacent to the sample plot. This approach is recommended by the developers to better take into account the local characteristics of forest stands (Weinstein et al., 2020). In any case, the assessment of the quality of these results is not possible without ground survey data.
We have also shown that the automatically obtained estimates of average tree heights are in good agreement with the results of ground measurements (Table 5), but it is impossible to make a detailed comparison at the level of individual trees due to the lack of more detailed data. Nevertheless, our results are consistent with the conclusions obtained at other sites about the good agreement of these estimates (Bennett et al., 2020; Birdal et al., 2017; Panagiotidis et al., 2017). At the same time, it should be noted that automatic height estimates may have errors associated with artefacts of photogrammetric cloud processing when selecting points of the "ground points" class. In addition, ground-based height measurements, especially of large deciduous trees, may have significant errors due to the ambiguity of the definition of the tree top (Alonzo et al., 2018; Bennett et al., 2020; Ivanova et al., 2021).

Conclusions
This work presents the results of a study of the structure of forest stands based on orthoimages and photogrammetric models of tree canopy heights, which is a new and actively developing direction in Russia. Old-growth southern taiga spruce forests with a complex tree stand structure have been studied.
Despite the presence of several species of trees and the spatial heterogeneity of the forest stand, ready-made algorithms for processing aerial photography data make it possible to detect most of the trees in the forest canopy with confidence. The quality of detection results from an orthoimage using a neural network is higher than the quality of detection from a digital elevation model using an algorithm based on the sliding window method. In both cases, the detection quality of coniferous trees is generally slightly higher than that of deciduous trees. The average heights estimated from digital height models are in good agreement with the results of ground measurements. Additional ground surveys are required to assess the quality of undergrowth detection.
In general, the obtained results confirm that the remote data collected by means of UAVs make it possible to obtain realistic estimates of the characteristics of stands. We conclude that the use of these methods in forestry practice is quite promising for carrying out planned forest inventory work, as well as for promptly obtaining data on the state of forest stands after catastrophic events. The use of this approach is especially relevant for hard-to-reach areas where in situ studies are difficult to perform.

Fig. 3. Correctly detected trees (true positive = TP), false positives (FP), and trees missed by the algorithm (false negative = FN) on a fragment of an orthomosaic.

Fig. 5. Results of automatic detection of tree tops using a digital height model for different values of the FindTreesCHM() function arguments: A, at minht = 10, fws = 5; B, at minht = 15, fws = 5; C, results of manual tree detection.

Table 1. Forest inventory attributes of the forest stands on the studied permanent sample plot in 2017 (according to: Lebedev and Chistyakov, 2021a, with modifications).

Table 2. Results of automatic tree top detection using the digital elevation model with different variants of the FindTreesCHM() arguments (minht and fws).

Table 5. Comparison of tree heights based on ground measurements in 2017 and estimates obtained from aerial photography data in 2021. SD is the standard deviation, min is the minimum height, and max is the maximum height.

Table 3. Evaluation of the quality of the results of automatic detection of tree tops on a permanent sample plot using a digital height model.

Table 4. Evaluation of the quality of the results of detection of trees of the upper layer on a permanent sample plot using a neural network.