Impact Statement
OpenForest establishes a constantly evolving catalog of open-access forest datasets. This catalog is open for contributions and aims to provide a single central hub for such datasets within the open-source community. In addition to introducing the OpenForest catalog, we provide in this article a detailed overview of complementary research topics and challenges in forest monitoring and machine learning, so as to better enable the impactful use of these datasets in interdisciplinary research. We hope this work will ultimately contribute substantially both to improving our understanding of global forest composition and to the development of innovative machine learning algorithms.
1. Introduction
Forests cover one third of the Earth’s land surface (The Food and Agriculture Organization of the United Nations, 2020). They provide a range of valuable ecosystem goods and services to humanity, including timber provision, water and climate regulation, and atmospheric carbon sequestration (Millennium Ecosystem Assessment, 2001; Bonan, 2008). They also serve as habitat for a myriad of plant, animal, and microbial species. However, human activities have had, and continue to have, a major impact on forests worldwide.
More than 3,000 ha of forests disappear every hour from deforestation (Hansen et al., 2013; The Food and Agriculture Organization of the United Nations, 2020). Yet forests are also increasingly recognized as natural solutions to the joint climate and biodiversity crises (Griscom et al., 2017, 2020; Drever et al., 2021).
Forest-based climate solutions such as avoided forest conversion, improved forest management, and forest restoration could mitigate more than 2 Gt CO2-eq of emissions per year by 2030 (Intergovernmental Panel On Climate Change (IPCC), 2023), with variation across regions worldwide (Griscom et al., 2017, 2020; Bastin et al., 2019; Busch et al., 2019; Drever et al., 2021). However, this potential is limited by the climatic effects we are already witnessing on forests (Zhu et al., 2018).
Due to their significant economic and ecological importance, forests have attracted considerable monitoring efforts. Forest monitoring includes the assessment of ecosystem functional properties, as well as the evaluation of forest health, vitality, stress, and diseases (see Section 2.1). However, monitoring forests presents significant challenges, especially with field-based approaches (see Section 2.2): forests cover huge areas and can be difficult to access. Consequently, remote sensing has played, and continues to play, an important role in forest monitoring worldwide. A wide array of remote sensing platforms and sensors is now available and used to monitor forests. Platforms include drones (also referred to as unoccupied aerial vehicles [UAVs]), airplanes, and satellites. Sensors range from passive optical imagers to active systems such as light detection and ranging (LiDAR) or synthetic aperture RADAR (SAR) (Verrelst et al., 2015; White et al., 2016).
In recent years, applications of remote sensing data for Earth science purposes (Ma et al., 2019; Camps-Valls et al., 2021), such as forest monitoring (Fassnacht et al., 2016; Diez et al., 2021; Kattenborn et al., 2021; Michalowska and Rapinski, 2021), have gained momentum through the adoption of machine learning methods and algorithms. This trend has been driven by the continuous improvement of deep learning models on computer vision benchmark challenges over the past decade (Deng et al., 2009; Lin et al., 2014; Everingham et al., 2015). Recent advances in deep learning model architectures have enabled the integration of remote sensing data from various sensors and at various resolutions (spatial, temporal, or spectral). This integration presents promising opportunities to enhance forest monitoring practices (Cong et al., 2022; Rahaman et al., 2022; Reed et al., 2022; Tseng et al., 2023).
Numerous machine learning challenges related to forest monitoring have yet to be explored (see Figure 1), and addressing them will require large and diverse forest datasets (Liang and Gamarra, 2020). A wealth of remote sensing data is freely available, but these data can be difficult to access. They span a wide variety of sensing modalities, geographies, and tasks and are spread across many repositories. To the best of our knowledge, no comprehensive, central repository of open-access forest datasets currently exists, a gap we fill here with OpenForest (https://github.com/RolnickLab/OpenForest). The OpenForest catalog is designed to simplify access to, and increase the visibility of, forest monitoring datasets for researchers in machine learning and forest biology, thereby accelerating progress in these domains.

Figure 1. Overview of forest monitoring topics and challenges and their associated machine learning perspectives and challenges. Note: Each forest monitoring topic and challenge is shown with its corresponding section number (in red). These are associated with the three main categories of machine learning perspectives and challenges, namely generalization, limited data, and domain-specific objectives. These categories are also shown with their corresponding section numbers (in red).
In this article, we present the biological topics and challenges related to forest monitoring that scientists are currently investigating and that could be of interest to machine learning practitioners (Section 2). We then briefly introduce several machine learning research topics and explore their potential applications in addressing these biological challenges (Section 3); such applications hold promise for assisting ecologists and biologists in their work. Moreover, we conduct a thorough review of open-access forest datasets across different spatial scales (Figure 2) to support both the machine learning and biology research communities (Section 4). Finally, we provide perspectives on the space of machine learning applications with forest datasets (Section 5).

Figure 2. Illustration of forest monitoring datasets at different scales. Note: Inventories are in situ measurements conducted at the tree level. Ground-based datasets are recorded within or below the tree canopy. Aerial datasets are composed of recordings from sensors mounted on unoccupied (drones) or occupied aircraft. Satellite datasets are collected from sensors mounted on satellites orbiting the Earth. Map datasets are generated at the country or global level using datasets at the aerial or satellite scales.
2. Forest monitoring: current topics and challenges
Forest monitoring is an empirical science that increasingly relies on data-driven machine learning methods and, as such, benefits from improved data access through open data (Wulder et al., 2012; De Lima et al., 2022). In particular, deep learning algorithms are widely recognized for their strong performance in diverse tasks. However, their successful application often relies on large datasets to realize this performance and to generalize well. This section emphasizes the importance of open-access forest datasets for two primary purposes. First, they facilitate a more comprehensive exploration of current topics in forest monitoring (Section 2.1). Second, they help to better assess forest monitoring-related challenges (Section 2.2), particularly for machine learning practitioners.
2.1. Forest monitoring topics
Given the economic and ecological significance of forests, forest monitoring encompasses a range of trackable forest attributes. Each of these can be sensed with different sensors and platforms and at different scales. The forest attribute of interest defines the learning task, such as the regression of a biochemical property or the detection of individual trees. Together with the structure of the remotely sensed signal, this task determines the appropriate machine learning algorithms. These aspects are briefly discussed in the following sections.
2.1.1. Forest extent and forest type mapping
Tracking the extent of forests is crucial for understanding the spatial distribution of forest resources and ecosystem services and for assessing the role of forests in land surface dynamics (Keenan et al., 2015). Forests can further be classified into different management, functional, or ecosystem types (e.g., coniferous or deciduous forests) (Buchhorn et al., 2020; Zhang et al., 2021). In this regard, Earth observation data from long-term satellite missions (e.g., Landsat or Sentinel, described in Section 4.4) make it possible to track forest extent dynamics over past decades (Hansen et al., 2013). These data thus support the assessment of conservation efforts and anthropogenic land cover change, such as deforestation for agricultural expansion (Curtis et al., 2018).
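The sketch below illustrates one way to quantify forest extent dynamics once forest/non-forest maps exist for two dates. It counts pixels that change class and converts them to hectares. It assumes two co-registered binary masks on the same grid in an equal-area projection. The 30 m default pixel size (typical of Landsat) and the function name `forest_change_area` are illustrative assumptions, not part of any cited product.

```python
import numpy as np


def forest_change_area(forest_t0, forest_t1, pixel_size_m=30.0):
    """Return (loss_ha, gain_ha) between two co-registered binary forest masks.

    Hypothetical helper: assumes square pixels of `pixel_size_m` metres in an
    equal-area projection, with True/1 marking forest.
    """
    f0 = np.asarray(forest_t0, dtype=bool)
    f1 = np.asarray(forest_t1, dtype=bool)
    if f0.shape != f1.shape:
        raise ValueError("both masks must share the same grid")
    pixel_area_ha = pixel_size_m ** 2 / 10_000.0  # 1 ha = 10,000 m^2
    loss_ha = np.count_nonzero(f0 & ~f1) * pixel_area_ha  # forest -> non-forest
    gain_ha = np.count_nonzero(~f0 & f1) * pixel_area_ha  # non-forest -> forest
    return float(loss_ha), float(gain_ha)


# Toy 3 x 3 example: two forest pixels are lost between the two dates.
t0 = [[1, 1, 1], [1, 1, 0], [1, 1, 1]]
t1 = [[1, 0, 1], [1, 0, 0], [1, 1, 1]]
loss, gain = forest_change_area(t0, t1)
print(f"loss = {loss:.2f} ha, gain = {gain:.2f} ha")  # loss = 0.18 ha, gain = 0.00 ha
```

Real analyses would also need to handle map classification errors, which this sketch ignores.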
2.1.2. Tree species mapping
A fine-scale representation of the species composition of forest stands is relevant for several purposes. These include forestry (e.g., species-specific timber supply), biogeographical assessments (e.g., climate change-induced shifts in species distributions), and biodiversity monitoring (Fassnacht et al., 2016; Wang and Gamon, 2019; Cavender-Bares et al., 2020). Recent developments in machine learning have greatly advanced the identification of tree species in high-resolution data, such as imagery or LiDAR point clouds from drones and airplanes. These advances rely on semantic and instance segmentation methods (Schiefer et al., 2020; Li et al., 2022a; Cloutier et al., 2023). At large spatial scales, Earth observation satellite data offer coarser spatial but high temporal and spectral resolution. Combined with spatiotemporal machine learning methods, they enable accurate assessments of tree species distributions (Ienco et al., 2019; Bolyn et al., 2022).
2.1.3. Biomass quantification
Forests provide key ecosystem services through their provision of timber and their role as carbon sinks in the terrestrial carbon cycle (Regnier et al., 2022). Tree biomass is primarily the product of wood volume and wood density, both of which are challenging to obtain from remote sensing data. Accurate biomass estimates of individual trees can be obtained from close-range 3D representations acquired with terrestrial or drone-based LiDAR systems (Brede et al., 2022). Less direct information on crown and canopy structure, derived from airborne or spaceborne LiDAR and SAR data, can be used to estimate biomass at the stand scale (Le Toan et al., 2011; Lu et al., 2016). Some studies have also indicated the value of passive optical satellite data, since forest biomass is partially correlated with foliage density (Besnard et al., 2021; Potapov et al., 2021). No single remote sensing modality can directly reveal precise large-scale biomass distributions. Deep learning may therefore play a crucial role in jointly exploiting the full suite of available, high-dimensional data modalities (Yang et al., 2020).
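Field reference values for such remote sensing models are commonly derived from allometric equations that predict tree biomass from stem diameter (see Section 2.2.4). The sketch below assumes a generic power-law form, AGB = a · D^b, with placeholder coefficients `a` and `b`. Real applications use published species- or region-specific equations; the values here are purely illustrative. Per-tree estimates are summed and scaled by plot area to give a biomass density in Mg/ha, the kind of quantity often used as a regression target.

```python
import numpy as np


def plot_biomass_density(dbh_cm, plot_area_ha, a=0.1, b=2.4):
    """Estimate plot-level aboveground biomass density in Mg/ha.

    Assumes a generic power-law allometry AGB_kg = a * DBH_cm**b; the default
    coefficients are hypothetical placeholders, not a published equation.
    """
    dbh = np.asarray(dbh_cm, dtype=float)
    agb_kg = a * dbh ** b                  # per-tree biomass (kg)
    total_mg = agb_kg.sum() / 1000.0       # kg -> Mg (metric tonnes)
    return float(total_mg / plot_area_ha)  # Mg per hectare


# Two trees on a 0.05 ha plot, using a = 0.1 and b = 2 for easy arithmetic:
# 0.1*10**2 + 0.1*20**2 = 10 + 40 = 50 kg -> 0.05 Mg / 0.05 ha = 1.0 Mg/ha
print(plot_biomass_density([10, 20], plot_area_ha=0.05, a=0.1, b=2.0))
```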
2.1.4. Forest health, disturbance, and mortality
In many regions, forest ecosystems are under pressure from several sources. Globalization facilitates the introduction of exotic pests and pathogens, and climate change exceeds the resilience and resistance of trees (Hartmann et al., 2022). In addition, anthropogenic activities alter nutrient and water cycles (Steffen et al., 2005; Trumbore et al., 2015). A decline in tree health, for example, due to pathogen infestations or shortages of water and nutrients, can lead to a variety of symptoms. One such symptom is a change in the concentrations of multiple biochemical tissue properties (e.g., pigments, carbohydrates, and water content), which in turn can be sensed through multispectral or hyperspectral reflectance (Zarco-Tejada et al., 2018, 2019). In this context, deep learning algorithms are very promising. They can exploit high-dimensional data (e.g., hyperspectral) and translate them into a suite of foliage properties relevant to vegetation health (Cherif et al., 2023).
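A spectral vegetation index is a simple, long-established way to relate reflectance to canopy condition. The sketch below computes the normalized difference vegetation index, NDVI = (NIR − Red) / (NIR + Red), from two reflectance bands. It only illustrates the principle that foliage changes alter reflectance; the deep learning approaches mentioned above instead exploit the full multispectral or hyperspectral signal. The reflectance values in the example are illustrative, not measured.

```python
import numpy as np


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance.

    Pixels where NIR + Red == 0 are returned as NaN instead of raising.
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


# Illustrative reflectances: a vigorous crown vs. a stressed crown.
print(ndvi([0.45, 0.30], [0.05, 0.10]))  # approx. [0.8, 0.5]
```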
A closely related topic is the global increase in forest mortality rates (Allen et al., 2010; Hartmann et al., 2022). A wealth of approaches has been successfully employed here. At local scales, these include the detection of dead trees via semantic or instance segmentation (Sani-Mohammed et al., 2022; Cloutier et al., 2023). At large scales, they include the regression of the annual cover of dead tree crowns in satellite image pixels with deep learning-based time series analysis (Schiefer et al., 2023).
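Pixel-level targets for such a regression can be derived from a fine-resolution dead-tree mask, for example from drone imagery. One way, assuming the mask is co-registered with the satellite grid, is to aggregate it into the fraction of dead crown cover within each coarse pixel. The sketch below does this by block averaging. The aggregation factor (e.g., 20 for a 0.5 m mask and 10 m satellite pixels) and the function name are illustrative assumptions. This is not necessarily the procedure used in the cited studies.

```python
import numpy as np


def block_fraction(mask, factor):
    """Fraction of True/1 cells in each non-overlapping factor x factor block.

    Hypothetical helper: assumes the fine mask is aligned with the coarse grid
    and that both mask dimensions are divisible by `factor`.
    """
    m = np.asarray(mask, dtype=float)
    h, w = m.shape
    if h % factor or w % factor:
        raise ValueError("mask dimensions must be divisible by factor")
    # Axes become (coarse_row, row_in_block, coarse_col, col_in_block).
    return m.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


# 4 x 4 dead-tree mask aggregated to 2 x 2 coarse pixels.
dead = [[1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1]]
print(block_fraction(dead, 2))  # [[0.75 0.  ] [0.   1.  ]]
```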
2.1.5. Biophysical traits and functional ecosystem properties
With accelerating biodiversity decline and environmental change, it is essential to understand functional properties, their diversity across stands and landscapes, and their phenology (temporal dynamics). This understanding is needed to assess the resilience and resistance of ecosystems (Thompson et al., 2009; Sakschewski et al., 2016). Trees have evolved different strategies for interacting with light. Their appearance in optical remote sensing signals can therefore inform on a variety of functional traits, such as foliage density, green-up date, or pigment and carbohydrate contents (Schneider et al., 2017; Cherif et al., 2023). Such functional traits determine the configuration of an ecosystem and thereby modulate functional ecosystem processes (Migliavacca et al., 2021; Gomarasca et al., 2023). These processes are fluxes of energy and matter between the terrestrial biosphere, pedosphere, hydrosphere, and atmosphere, including carbon, evapotranspiration, and latent and sensible heat. Due to the central importance of these fluxes in the Earth system, considerable efforts have been made to monitor them using ground-based networks of flux towers (e.g., FLUXNET) (Baldocchi et al., 2001). These ecosystem processes are complex. Deep learning is expected to greatly expand our ability to combine local flux tower measurements with globally available remote sensing data, in order to extrapolate forest ecosystem processes in space and time and to better understand them (Jung et al., 2019; Reichstein et al., 2019; Camps-Valls et al., 2021; ElGhawi et al., 2023).
2.2. Forest monitoring challenges
Forests are complex ecosystems dominated by trees. As living organisms, trees are affected by various abiotic and biotic factors, which influence their remotely sensed signal via their foliage properties and crown architecture (Kulawardhana, 2011). Machine learning researchers wishing to develop algorithms to monitor forests using remote sensing data must be aware of these sources of biological variation and their origin. Part of this variation is largely unpredictable but potentially clustered in space and time, for example, insect outbreaks affecting tree health or random genetic variation within tree species populations. It poses a challenge because it can lead to systematic errors in prediction tasks. Another part of this variation is deterministic, for example, changes in leaf color and other phenological changes driven by recurring seasonal fluctuations. It could be leveraged to improve model performance (Cloutier et al., 2023). A further major and pervasive challenge in forest monitoring is the difficulty of acquiring ground data to train or validate machine learning models on remote sensing data. Below, we summarize these primary challenges.
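Such variation is spatially clustered, so a random split into training and test sets can place near-identical neighboring samples on both sides of the split and overstate accuracy. Spatial block cross-validation is a common remedy. The sketch below assumes each sample has projected x/y coordinates in metres. It groups samples into square blocks and assigns whole blocks to folds. The block size and fold count are hypothetical choices; in practice they should reflect the spatial scale of the clustering.

```python
import numpy as np


def spatial_block_folds(x, y, block_size, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs where no spatial block is split.

    Samples are binned into square blocks of `block_size` (same units as x, y)
    and whole blocks are randomly assigned to folds. With fewer blocks than
    folds, some test folds will be empty.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cells = np.stack([np.floor(x / block_size), np.floor(y / block_size)], axis=1)
    # Integer block id per sample; ravel() guards against NumPy-version shape quirks.
    _, block_id = np.unique(cells, axis=0, return_inverse=True)
    block_id = block_id.ravel()
    n_blocks = int(block_id.max()) + 1
    rng = np.random.default_rng(seed)
    fold_of_block = rng.permutation(n_blocks) % n_folds  # balanced block-to-fold map
    fold = fold_of_block[block_id]
    for k in range(n_folds):
        yield np.flatnonzero(fold != k), np.flatnonzero(fold == k)


# Example with hypothetical plot coordinates on a 1 km x 1 km area.
rng = np.random.default_rng(1)
xs, ys = rng.uniform(0, 1000, 200), rng.uniform(0, 1000, 200)
for train_idx, test_idx in spatial_block_folds(xs, ys, block_size=250, n_folds=4):
    print(len(train_idx), len(test_idx))
```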
2.2.1. Tree species
There are an estimated 73,000 tree species on Earth (Cazzolla Gatti et al., 2022), the majority of which are found in the tropics. Tree species share many similarities, such as the presence of woody stems and branches. Nevertheless, each species differs in its chemical and structural make-up and therefore in how it reflects solar radiation (Asner et al., 2014).
For example, the foliage of different tree species comes in various shades of green that reflect the concentrations of pigments (e.g., chlorophylls and carotenoids) (Gates et al., 1965). Likewise, tree species differ from each other in their leaf form and crown structure (Verbeeck et al., 2019), which affects the remotely sensed signal. Such foliar biophysical and crown structural variation among tree species results from millions of years of evolution and of adaptation to various environmental conditions (Meireles et al., 2020).
From a machine learning perspective, the biggest challenge associated with tree species diversity is transferability: models trained on data from a given set of tree species might transfer poorly to other regions that host different species. However, the phylogeny and evolutionary distances of tree species are fairly well known (Zanne et al., 2014), and phylogenetically closer tree species tend to be more similar in their traits (Ackerly, 2009). As such, phylogenetic correlations and distances among tree species can potentially be leveraged to improve model transferability. Another interconnected challenge, elaborated upon in the following sections, is that tree species are not fixed in their leaf biophysical and crown structural characteristics. Instead, each individual differs according to its abiotic (e.g., microclimate and soil) and biotic (e.g., competition and herbivory) environment. As a result, the expression of foliage and crown properties can overlap between species (Fassnacht et al., 2016).
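The sketch below shows one hypothetical way of leveraging phylogenetic distances. It converts a species-by-species distance matrix into soft training targets, so that a model is penalized less for confusing closely related species than distantly related ones. The distance values, the temperature parameter `tau`, and the helper names are illustrative assumptions. This is not an established method from the cited works.

```python
import numpy as np


def phylo_soft_targets(labels, dist, tau=50.0):
    """Soft targets p_j proportional to exp(-dist[label, j] / tau)."""
    d = np.asarray(dist, dtype=float)
    logits = -d[np.asarray(labels)] / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def soft_cross_entropy(pred_probs, targets, eps=1e-12):
    """Mean cross-entropy between predicted probabilities and soft targets."""
    p = np.asarray(pred_probs, dtype=float)
    return float(-(targets * np.log(p + eps)).sum(axis=1).mean())


# Hypothetical pairwise phylogenetic distances (e.g., in Myr) for three species:
# species 0 and 1 are close relatives, species 2 is distantly related.
dist = np.array([[0.0, 20.0, 200.0],
                 [20.0, 0.0, 200.0],
                 [200.0, 200.0, 0.0]])
t = phylo_soft_targets([0], dist)
near_miss = soft_cross_entropy([[0.1, 0.8, 0.1]], t)  # confuses 0 with relative 1
far_miss = soft_cross_entropy([[0.1, 0.1, 0.8]], t)   # confuses 0 with distant 2
print(near_miss < far_miss)  # True
```

With one-hot targets, both errors in the example would incur the same loss (−log 0.1). The soft targets make the near miss cheaper.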
2.2.2. Seasons and phenology
Trees are sessile organisms, but they still respond dynamically to changing seasons. In some cases, phenological properties, such as leaf onset, flowering, or seed production, might be of direct interest for monitoring ecological phenomena or species (Wagner, 2021). In such cases, high-frequency multitemporal imagery might be required. Phenological differences among species, for example, in leaf color during autumn senescence, can also help to distinguish tree species and thereby improve species classification models (Cloutier et al., 2023). However, phenology may also hinder the transferability of models across time (Kattenborn et al., 2022b). For instance, what a machine learning model learns from data acquired in summer may not transfer to the same location in fall. By then, trees may have changed in their leaf biochemical properties or in the fraction of flowers and seeds in the canopy (Schiefer et al., 2021). In such cases, the temporal representativeness of the data for individual tree species can be key (Kattenborn et al., 2022b).
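Phenological dynamics are often summarized by fitting a smooth seasonal curve to a vegetation index time series. The sketch below fits a first-order harmonic (sinusoidal) regression by least squares. It then derives three simple features: the annual mean, the seasonal amplitude, and the day of year of the fitted peak. Such features could serve as model inputs or help diagnose seasonal mismatch between training and deployment data. The synthetic series and the single-harmonic assumption are illustrative; real time series are noisier and irregularly sampled because of clouds.

```python
import numpy as np


def fit_harmonic(doy, values, period=365.25):
    """Least-squares fit of v(t) = c0 + a1*cos(w t) + b1*sin(w t), w = 2*pi/period."""
    t = 2.0 * np.pi * np.asarray(doy, dtype=float) / period
    X = np.column_stack([np.ones_like(t), np.cos(t), np.sin(t)])
    coef, *_ = np.linalg.lstsq(X, np.asarray(values, dtype=float), rcond=None)
    return coef  # (c0, a1, b1)


def seasonal_features(coef, period=365.25):
    """Return (mean, amplitude, peak day of year) from harmonic coefficients."""
    c0, a1, b1 = coef
    amplitude = np.hypot(a1, b1)
    phase = np.arctan2(b1, a1) % (2.0 * np.pi)  # angle at which the curve peaks
    return float(c0), float(amplitude), float(phase * period / (2.0 * np.pi))


# Synthetic 16-day series peaking on day 200 (e.g., an NDVI-like signal).
doy = np.arange(1, 366, 16)
ndvi_series = 0.5 + 0.3 * np.cos(2.0 * np.pi * (doy - 200) / 365.25)
print(seasonal_features(fit_harmonic(doy, ndvi_series)))  # approx. (0.5, 0.3, 200.0)
```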
2.2.3. Forest dynamics
The structure and composition of forests are strongly influenced by abiotic factors such as climate, geology, soils, and water availability. For example, temperatures and/or growing season lengths decrease with increasing latitude and/or elevation. This can filter out tree species that cannot tolerate low temperatures (e.g., low frost resistance) or that do not have enough time to produce mature tissue once the growing season becomes too short (Körner et al., 2016). In addition, changes in soil nutrient availability driven by geomorphological processes can influence forest canopy biochemistry (Chadwick and Asner, 2018). Water supply is also important: too much water favors trees that can tolerate waterlogging, whereas too little water favors trees that can resist or recover from xylem cavitation (Choat et al., 2012). Many of these environmental influences on forest composition are expressed through tree species turnover, that is, changes in tree species composition across spatial environmental gradients or discontinuities. However, changes in forest composition and structure can also arise through intraspecific variation. Applications of machine learning methods to forest monitoring should account for these sources of variation. In particular, incorporating environmental drivers of forest composition and structure as model inputs may help to transfer forest monitoring models from one region to another.
Tree monitoring can also be affected by biotic factors, that is, by other organisms. First, pests and pathogens can impact tree health, foliage chemistry, and/or water content, which in turn can affect the remotely sensed signal of forest canopies (Sapes et al., 2022). The health status of trees is often directly expressed via their foliage properties and crown architecture and can therefore cause large variability in remote sensing signals (Zarco-Tejada et al., 2018; Kattenborn et al., 2022a). In addition, the remotely sensed signals of trees can be “overshadowed” by other organisms that grow in their crowns, particularly in tropical environments (Baldeck et al., 2015). Prominent examples include epiphytes, lianas, and mistletoes.
Forest management activities such as harvesting, thinning, and pruning are also part of forest dynamics. They significantly affect forest structure and composition, yet records of these activities are often unavailable. This gap complicates the accurate mapping of forest attributes with remote sensing and introduces uncertainty into remote sensing analyses and models.
2.2.4. Data collection
As previously mentioned, forests can vary considerably in composition and structure across locations and time periods owing to a variety of factors. This heterogeneity makes it particularly difficult to build machine learning models for forest monitoring: models developed for one region may not generalize to regions that lie outside the training data distribution. One solution would be to train these models on extensive datasets that encompass the full spectrum of conditions found in diverse forest environments. Forest remote sensing data, especially those acquired by satellite sensors, are abundant worldwide and generally easy to obtain. In sharp contrast, ground-based data, including labels or annotations, are scarce.
Compared with other disciplines, annotating remote sensing data of vegetation is often time-consuming, costly, and complex. This is because vegetation of different species or conditions frequently looks strikingly similar, an issue often summarized as “greenery.” Moreover, vegetation communities often show smooth transitions across species or growth forms along environmental gradients, which makes it even harder to distinguish between individual plants, species, or growth forms (Kattenborn et al., Reference Kattenborn, Leitloff, Schiefer and Hinz2021). Field inventories are therefore often essential to validate annotations, such as the identification of tree species (Kattenborn et al., Reference Kattenborn, Eichel, Wiser, Burrows, Fassnacht and Schmidtlein2020; Cloutier et al., Reference Cloutier, Germain and Laliberté2023), or when the properties of interest, such as stem diameters, cannot be directly extracted from remote sensing data and require on-site measurements by human observers. Gathering such field data is typically slow and expensive, which severely limits its availability. Field data such as stem diameters are very important for estimating aboveground tree biomass because most published allometric equations use stem diameter as their primary predictor (Gonzalez-Akre et al., Reference Gonzalez-Akre, Piponiot, Lepore, Herrmann, Lutz, Baltzer, Dick, Gilbert, He, Heym, Huerta, Jansen, Johnson, Knapp, Kral, Lin, Malhi, McMahon, Myers, Orwig, Rodriguez-Hernandez, Russo, Shue, Wang, Wolf, Yang, Davies and Anderson-Teixeira2022). Moreover, spatial coordinates frequently serve as the sole means of linking field data to remote sensing data. However, GPS or GNSS geolocation in forests often introduces substantial uncertainties (ranging from meters to tens of meters), which makes it challenging to establish accurate a posteriori links between field observations and remote sensing data (Kattenborn et al., Reference Kattenborn, Leitloff, Schiefer and Hinz2021).
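Because field plots and remote sensing products often share nothing but coordinates, linking them is in practice a matching problem under positional error. The Python sketch below is illustrative only: it greedily pairs field-measured stems with remotely detected tree positions in a projected coordinate system (meters assumed), and the 5 m tolerance is a hypothetical value that would need to reflect the actual GNSS error at a given site.

```python
import math


def match_field_to_detections(field_xy, detected_xy, max_dist_m=5.0):
    """Greedily pair field-measured stems with remotely detected trees.

    Candidate pairs are sorted by distance; each field stem and each
    detection is used at most once, and pairs farther apart than
    max_dist_m (a hypothetical GNSS tolerance) are left unmatched.
    Returns a list of (field_index, detection_index, distance) tuples.
    """
    candidates = []
    for i, (fx, fy) in enumerate(field_xy):
        for j, (dx, dy) in enumerate(detected_xy):
            d = math.hypot(fx - dx, fy - dy)
            if d <= max_dist_m:
                candidates.append((d, i, j))
    candidates.sort()  # closest pairs are considered first

    used_field, used_det, matches = set(), set(), []
    for d, i, j in candidates:
        if i not in used_field and j not in used_det:
            matches.append((i, j, d))
            used_field.add(i)
            used_det.add(j)
    return matches


field = [(0.0, 0.0), (10.0, 0.0), (50.0, 50.0)]
detections = [(1.0, 1.0), (12.0, 0.0), (30.0, 30.0)]
print(match_field_to_detections(field, detections))
```

Greedy matching is simple but not globally optimal; an optimal one-to-one assignment could instead be computed with the Hungarian algorithm (for example, `scipy.optimize.linear_sum_assignment`), at the cost of an extra dependency. Unmatched field stems, such as the third one above, are a direct symptom of the geolocation uncertainty discussed in the text.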
Integrating various datasets is therefore one strategy for addressing the scarcity of annotated data, reducing annotation costs, and lessening the dependence on field-based ground truthing. However, this may result in heterogeneous datasets: in many cases, both the annotation type (such as boxes, polygons, or points) and the annotation quality vary across applications. Annotations are also frequently tailored to particular characteristics of the remote sensing data, such as spatial resolution. Hence, directly merging labels from different datasets is often not feasible, or at the very least requires alternative approaches. One such approach is weakly supervised learning, where potentially lower label quality is counteracted by leveraging a large quantity of data (see Section 3.2.2).
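One simple way to harmonize heterogeneous annotations, sketched below in Python under our own assumptions, is to reduce every box, polygon, or point to a single point location, which can then serve as a weak label. The dictionary format (`type`, `coords`) is hypothetical, and the reduction deliberately discards crown extent information.

```python
def to_point(annotation):
    """Reduce a box, polygon, or point annotation to a single (x, y) point.

    Assumed (hypothetical) formats:
      point:   coords = (x, y)
      box:     coords = (xmin, ymin, xmax, ymax)
      polygon: coords = [(x0, y0), (x1, y1), ...], not explicitly closed
    """
    kind, coords = annotation["type"], annotation["coords"]
    if kind == "point":
        x, y = coords
        return (float(x), float(y))
    if kind == "box":
        xmin, ymin, xmax, ymax = coords
        return ((xmin + xmax) / 2.0, (ymin + ymax) / 2.0)
    if kind == "polygon":
        # Area-weighted centroid via the shoelace formula;
        # area2 accumulates twice the signed polygon area.
        area2 = cx = cy = 0.0
        n = len(coords)
        for k in range(n):
            x0, y0 = coords[k]
            x1, y1 = coords[(k + 1) % n]
            cross = x0 * y1 - x1 * y0
            area2 += cross
            cx += (x0 + x1) * cross
            cy += (y0 + y1) * cross
        if area2 == 0:
            raise ValueError("degenerate polygon")
        return (cx / (3.0 * area2), cy / (3.0 * area2))
    raise ValueError(f"unknown annotation type: {kind}")


labels = [
    {"type": "box", "coords": (0, 0, 4, 2)},
    {"type": "polygon", "coords": [(0, 0), (2, 0), (2, 2), (0, 2)]},
    {"type": "point", "coords": (3, 4)},
]
print([to_point(a) for a in labels])
```

Reducing everything to points is only one option; when all sources provide extents, converting to boxes or rasterized masks would preserve more information.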
The key takeaway from this section is that machine learning models for forest monitoring will consistently have to deal with a large surplus of unlabeled remote sensing data relative to labeled ground-truth data, owing to the inherent difficulty of obtaining labels. This scenario is not exclusive to forest monitoring; it is shared with other geospatial applications of remote sensing data (Rahaman et al., Reference Rahaman, Weiss, Träuble, Locatello, Lacoste, Bengio, Pal, Li and Schölkopf2022; Mai et al., Reference Mai, Huang, Sun, Song, Mishra, Liu, Gao, Liu, Cong, Hu, Cundy, Li, Zhu and Lao2023a) (see Section 3.1.2). This has two main implications for machine learning research aimed at improving forest monitoring. First, there is a need to develop machine learning methods for forest monitoring that can effectively use limited labeled data. One approach is to leverage self-supervised learning techniques to extract useful representations from the available data (see Section 3.2.1). Second, there is a need for novel machine learning strategies, including active learning and other forms of model-assisted labeling, that speed up label collection by human observers and reduce its cost (see Section 3.2.3).
3. Machine learning perspectives and challenges
Machine learning algorithms in computer vision have gained significant capabilities over the past decade, for example, in image classification (Krizhevsky et al., Reference Krizhevsky, Sutskever and Hinton2012; Simonyan and Zisserman, Reference Simonyan and Zisserman2015; Szegedy et al., Reference Szegedy, Liu, Jia, Sermanet, Reed, Anguelov, Erhan, Vanhoucke and Rabinovich2015, Reference Szegedy, Vanhoucke, Ioffe, Shlens and Wojna2016, Reference Szegedy, Ioffe, Vanhoucke and Alemi2017; He et al., Reference He, Zhang, Ren and Sun2016; Hu et al., Reference Hu, Shen and Sun2018; Dosovitskiy et al., Reference Dosovitskiy, Beyer, Kolesnikov, Weissenborn, Zhai, Unterthiner, Dehghani, Minderer, Heigold, Gelly, Uszkoreit and Houlsby2021; Liu et al., Reference Liu, Lin, Cao, Hu, Wei, Zhang, Lin and Guo2021; Touvron et al., Reference Touvron, Cord, Douze, Massa, Sablayrolles and Jégou2021), object detection (Ren et al., Reference Ren, He, Girshick and Sun2015; Liu et al., Reference Liu, Anguelov, Erhan, Szegedy, Reed, Fu and Berg2016; Redmon et al., Reference Redmon, Divvala, Girshick and Farhadi2016; He et al., Reference He, Gkioxari, Dollar and Girshick2017; Redmon and Farhadi, Reference Redmon and Farhadi2018; Li et al., Reference Li, Zhang, Xu, Liu, Zhang, Ni and Shum2023a), and segmentation (Long et al., Reference Long, Shelhamer and Darrell2015; Ronneberger et al., Reference Ronneberger, Fischer and Brox2015; He et al., Reference He, Gkioxari, Dollar and Girshick2017; Lin et al., Reference Lin, Goyal, Girshick, He and Dollár2017; Chen et al., Reference Chen, Zhu, Papandreou, Schroff and Adam2018; Cheng et al., Reference Cheng, Misra, Schwing, Kirillov and Girdhar2022; Kirillov et al., Reference Kirillov, Mintun, Ravi, Mao, Rolland, Gustafson, Xiao, Whitehead, Berg, Lo, Dollár and Girshick2023). 
While many successful algorithmic paradigms have been established, applications differ widely in sensing modalities and domain-specific constraints, which requires adapting algorithms to specific needs. For instance, detecting, localizing, and segmenting objects in a scene have been explored for LiDAR point clouds (Yang et al., Reference Yang, Luo and Urtasun2018), automotive RADAR (Ouaknine et al., Reference Ouaknine, Newson, Pérez, Tupin and Rebut2021a), and medical echocardiography (Leclerc et al., Reference Leclerc, Smistad, Pedrosa, Ostvik, Cervenansky, Espinosa, Espeland, Berg, Jodoin, Grenier, Lartizien, Dhooge, Lovstakken and Bernard2019).
Machine learning algorithms for remote sensing have been the subject of extensive innovation and application (Ma et al., Reference Ma, Liu, Zhang, Ye, Yin and Johnson2019; Camps-Valls et al., Reference Camps-Valls, Tuia, Zhu and Reichstein2021) for problems involving classification (Maxwell et al., Reference Maxwell, Warner and Fang2018; Cheng et al., Reference Cheng, Xie, Han, Guo and Xia2020), object detection (Cheng and Han, Reference Cheng and Han2016; Li et al., Reference Li, Wan, Cheng, Meng and Han2020), and segmentation (Hoeser and Kuenzer, Reference Hoeser and Kuenzer2020; Yuan et al., Reference Yuan, Shi and Gu2021). Recently, such techniques have been increasingly explored for forest monitoring, with the aim of improving our understanding of forest composition and with a specific focus on tree species mapping (see Sections 2.2.1 and 2.1.2), that is, tree classification, tree detection, and tree segmentation (Fassnacht et al., Reference Fassnacht, Latifi, Stereńczak, Modzelewska, Lefsky, Waser, Straub and Ghosh2016; Diez et al., Reference Diez, Kentsch, Fukuda, Caceres, Moritake and Cabezas2021; Kattenborn et al., Reference Kattenborn, Leitloff, Schiefer and Hinz2021; Michalowska and Rapinski, Reference Michalowska and Rapinski2021). These tasks draw on multiple sensor modalities that provide complementary information.
Nevertheless, machine learning for forest monitoring has received less attention than fields such as autonomous driving or medical imaging, despite the importance of forest conservation, restoration, and management as natural solutions to the joint climate and biodiversity crises (IPCC, 2023). Consequently, numerous machine learning challenges relevant to tackling climate change remain unexplored (Rolnick et al., Reference Rolnick, Donti, Kaack, Kochanski, Lacoste, Sankaran, Ross, Milojevic-Dupont, Jaques, Waldman-Brown, Luccioni, Maharaj, Sherwin, Mukkavilli, Kording, Gomes, Ng, Hassabis, Platt, Creutzig, Chayes and Bengio2023), including those related to improving forest monitoring practices. Could the challenges encountered in adapting machine learning to forest monitoring also help in addressing related challenges in biology and ecology? This section outlines the current machine learning challenges linked to forest monitoring, as described in Figure 1, and discusses the diverse strategies employed to tackle them.
3.1. Generalization
Generalization in machine learning refers to the ability of an algorithm to continue to perform well when evaluated on data different from those it was trained on (Zhang et al., Reference Zhang, Bengio, Hardt, Recht and Vinyals2017). One may distinguish in-distribution generalization (performance on data relatively similar to the training data) from out-of-distribution generalization (performance on substantially different data). Out-of-distribution generalization is especially relevant to forest monitoring, since, as mentioned in Section 4, forest datasets vary widely in geographical location, species composition, sensors, and scale (see Figure 2). Such variations introduce distinct distribution shifts that need to be considered and addressed in forest monitoring tasks. For example, the effects of the geographic variability of data have been examined in the context of tree species distributions (Dormann et al., Reference Dormann, McPherson, Araujo, Bivand, Bolliger, Carl, Davies, Hirzel, Jetz, Daniel Kissling, Kuhn, Ohlemuller, Peres-Neto, Reineking, Schroder, Schurr and Wilson2007) and biomass estimation (Ploton et al., Reference Ploton, Mortier, Réjou-Méchain, Barbier, Picard, Rossi, Dormann, Cornu, Viennois, Bayol, Lyapustin, Gourlet-Fleury and Pélissier2020). Simple algorithmic approaches to improve generalization include various forms of regularization (Zou and Hastie, Reference Zou and Hastie2005), data augmentation (Shorten and Khoshgoftaar, Reference Shorten and Khoshgoftaar2019), dropout (Srivastava et al., Reference Srivastava, Hinton, Krizhevsky, Sutskever and Salakhutdinov2014), and batch normalization (Ioffe and Szegedy, Reference Ioffe and Szegedy2015), while broadening the training data, where possible, is almost always beneficial in practice. However, generalization remains a very active field of research in machine learning. We consider two areas of work that may be of particular interest for forest monitoring.
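One common way to estimate performance under geographic shift is spatial block cross-validation, in which samples are grouped into spatial blocks and whole blocks are held out, so that test samples are not spatially adjacent to training samples. The Python sketch below assumes projected coordinates in meters; `block_size` is a hypothetical parameter that would ideally exceed the range of spatial autocorrelation in the data.

```python
import numpy as np


def spatial_block_folds(x, y, block_size, n_folds, seed=0):
    """Assign each sample to a CV fold by square spatial block.

    All samples falling in the same block_size x block_size cell get the
    same fold, and blocks are spread over folds in a random, balanced way.
    """
    bx = np.floor(np.asarray(x, dtype=float) / block_size).astype(int)
    by = np.floor(np.asarray(y, dtype=float) / block_size).astype(int)
    keys = list(zip(bx.tolist(), by.tolist()))  # block id of each sample

    unique_keys = sorted(set(keys))
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(unique_keys))
    block_to_fold = {k: int(order[i] % n_folds) for i, k in enumerate(unique_keys)}
    return np.array([block_to_fold[k] for k in keys])


# Eight plots along a transect, grouped into 10 m blocks and 2 folds.
x = [1, 2, 11, 12, 21, 22, 31, 32]
y = [0] * 8
folds = spatial_block_folds(x, y, block_size=10, n_folds=2)
for k in range(2):
    train_idx, test_idx = np.where(folds != k)[0], np.where(folds == k)[0]
    print(k, train_idx, test_idx)
```

For fold k, a model would be trained on samples with `folds != k` and evaluated on `folds == k`. With spatially autocorrelated data, purely random splits tend to give more optimistic accuracy estimates than splits of this kind.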
3.1.1. Domain adaptation
Transfer learning refers to transferring information learned by a model on one set of problems to a different set of problems (Weiss et al., Reference Weiss, Khoshgoftaar and Wang2016). For example, one may pretrain a model on a large, commonly used dataset and then fine-tune it on a smaller dataset representing the specific problem in question. Transfer learning can boost generalization when there is a significant distribution shift between training and inference (Csurka, Reference Csurka2017). One particularly notable approach to transfer learning is domain adaptation, where a model must be applied to target domains that are unknown or lack labeled data (Soltani et al., Reference Soltani, Feilhauer, Duker and Kattenborn2022). Some domain adaptation approaches have already been applied in plant identification (Ganin and Lempitsky, Reference Ganin and Lempitsky2015). Autonomous driving has seen significant exploration of unsupervised domain adaptation (UDA), where a model is trained on labeled data from the source domain and unlabeled data from the target domain, with the objective of improving its performance on the target domain. UDA has been explored in settings with unlabeled or unseen source or target domains (Wilson and Cook, Reference Wilson and Cook2020) using generative (Hoffman et al., Reference Hoffman, Tzeng, Park, Zhu, Isola, Saenko, Efros and Darrell2018) or adversarial (Vu et al., Reference Vu, Jain, Bucher, Cord and Pérez2019) methods. The UDA framework has also been explored for cross-modal learning, with domains corresponding to different sensor modalities (Jaritz et al., Reference Jaritz, Vu, de Charette, Wirbel and Pérez2020).
Within the context of forest monitoring, this framework could prove particularly valuable for adapting a model from one forest to another, whether or not they belong to the same biome, to identify similar species across both the source and target domains (see Sections 2.1.2 and 2.2.1). Additionally, this framework would be beneficial for adapting the model to distribution shifts between tree signature distributions (see Sections 2.1.4, 2.2.2, and 2.2.3) as well as between different sensors (see Section 2.2.4). In remote sensing, domain adaptation has mostly been investigated for extrapolation across time or geographical regions, with approaches for both aerial and satellite data (Shi et al., Reference Shi, Du, Guo and Du2022; Wang et al., Reference Wang, Feng, Sun, Zhang, Zhang, Yang and Meng2022; Arnaudo et al., Reference Arnaudo, Tavera, Masone, Dominici and Caputo2023; Ma et al., Reference Ma, Zhang, Wang and Pun2023; Xu et al., Reference Xu, Shi, Yuan and Zhu2023). Such work holds potential for training generalizable algorithms for forest monitoring, such as adapting models from PhenoCams to satellite images (Kosmala et al., Reference Kosmala, Hufkens and Richardson2018).
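To make the idea of adapting a model from a source forest to a target forest more concrete, the Python sketch below implements correlation alignment (CORAL), a simple unsupervised domain adaptation technique that is not among the methods cited above and is used here only as an illustration. It assumes each forest is described by a feature matrix (rows are samples such as pixels or crowns, columns are spectral bands or learned embeddings) and re-colors the source features so that their covariance matches that of the unlabeled target features.

```python
import numpy as np


def _sym_matrix_power(c, power):
    """Matrix power of a symmetric positive definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(c)
    return (vecs * vals**power) @ vecs.T  # V diag(vals**power) V^T


def coral(source, target, eps=1.0):
    """Align source features to the target's second-order statistics.

    Whitens the source features with their own (regularized) covariance,
    then re-colors them with the target covariance. eps * I keeps both
    covariance matrices well conditioned.
    """
    d = source.shape[1]
    cs = np.cov(source, rowvar=False) + eps * np.eye(d)
    ct = np.cov(target, rowvar=False) + eps * np.eye(d)
    return source @ _sym_matrix_power(cs, -0.5) @ _sym_matrix_power(ct, 0.5)


# Hypothetical 3-band features from a labeled source forest and an
# unlabeled target forest with a different band covariance.
rng = np.random.default_rng(0)
src = rng.normal(size=(500, 3)) * [1.0, 2.0, 0.5]
tgt = rng.normal(size=(400, 3)) @ np.array([[1.0, 0.3, 0.0],
                                            [0.0, 1.5, 0.2],
                                            [0.0, 0.0, 0.8]])
aligned = coral(src, tgt, eps=1e-6)
print(np.round(np.cov(aligned, rowvar=False), 3))
print(np.round(np.cov(tgt, rowvar=False), 3))
```

A classifier trained on `coral(source, target)` with the source labels would then be applied directly to the target features. The regularization `eps` (1.0 by default here, an assumed setting) trades exact matching for numerical stability; aligning only second-order statistics will not correct shifts such as a change in species composition.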
3.1.2. Foundation models
Foundation models are models that can operate on diverse input modalities, scales, data regimes, and downstream tasks. These large-scale multimodal and multitask models have opened up new research on generalization, for example, on improving performance in applications unseen during training (Bommasani et al., Reference Bommasani, Hudson, Adeli, Altman, Arora, von Arx, Bernstein, Bohg, Bosselut, Brunskill, Brynjolfsson, Buch, Card, Castellon, Chatterji, Chen, Creel, Davis, Demszky, Donahue, Doumbouya, Durmus, Ermon, Etchemendy, Ethayarajh, Fei-Fei, Finn, Gale, Gillespie, Goel, Goodman, Grossman, Guha, Hashimoto, Henderson, Hewitt, Ho, Hong, Hsu, Huang, Icard, Jain, Jurafsky, Kalluri, Karamcheti, Keeling, Khani, Khattab, Koh, Krass, Krishna, Kuditipudi, Kumar, Ladhak, Lee, Lee, Leskovec, Levent, Li, Li, Ma, Malik, Manning, Mirchandani, Mitchell, Munyikwa, Nair, Narayan, Narayanan, Newman, Nie, Niebles, Nilforoshan, Nyarko, Ogut, Orr, Papadimitriou, Park, Piech, Portelance, Potts, Raghunathan, Reich, Ren, Rong, Roohani, Ruiz, Ryan, R’e, Sadigh, Sagawa, Santhanam, Shih, Srinivasan, Tamkin, Taori, Thomas, Tramèr, Wang, Wang, Wu, Wu, Wu, Xie, Yasunaga, You, Zaharia, Zhang, Zhang, Zhang, Zhang, Zheng, Zhou and Liang2021). Most of the machine learning strategies discussed here could be explored further by training foundation models on diverse datasets, thereby enhancing their generalization capabilities. Forest datasets span a wide range of scales, from field measurements to estimated global maps (see Figure 2), as well as varying resolutions and modalities for different tasks (as outlined in Section 4). This diversity calls for generalizable deep learning architectures.
Influenced by the success of large language models (Devlin et al., Reference Devlin, Chang, Lee and Toutanova2019; Radford et al., Reference Radford, Wu, Child, Luan, Amodei and Sutskever2019, Reference Radford, Kim, Hallacy, Ramesh, Goh, Agarwal, Sastry, Askell, Mishkin, Clark, Krueger and Sutskever2021; Brown et al., Reference Brown, Mann, Ryder, Subbiah, Kaplan, Dhariwal, Neelakantan, Shyam, Sastry, Askell, Agarwal, Herbert-Voss, Krueger, Henighan, Child, Ramesh, Ziegler, Wu, Winter, Hesse, Chen, Sigler, Litwin, Gray, Chess, Clark, Berner, McCandlish, Radford, Sutskever and Amodei2020; Chowdhery et al., Reference Chowdhery, Narang, Devlin, Bosma, Mishra, Roberts, Barham, Chung, Sutton, Gehrmann, Schuh, Shi, Tsvyashchenko, Maynez, Rao, Barnes, Tay, Shazeer, Prabhakaran, Reif, Du, Hutchinson, Pope, Bradbury, Austin, Isard, Gur-Ari, Yin, Duke, Levskaya, Ghemawat, Dev, Michalewski, Garcia, Misra, Robinson, Fedus, Zhou, Ippolito, Luan, Lim, Zoph, Spiridonov, Sepassi, Dohan, Agrawal, Omernick, Dai, Pillai, Pellat, Lewkowycz, Moreira, Child, Polozov, Lee, Zhou, Wang, Saeta, Diaz, Firat, Catasta, Wei, Meier-Hellstern, Eck, Dean, Petrov and Fiedel2022; Hoffmann et al., Reference Hoffmann, Borgeaud, Mensch, Buchatskaya, Cai, Rutherford, Casas, Hendricks, Welbl, Clark, Hennigan, Noland, Millican, Driessche, Damoc, Guy, Osindero, Simonyan, Elsen, Rae, Vinyals and Sifre2022; Driess et al., Reference Driess, Xia, Sajjadi, Lynch, Chowdhery, Ichter, Wahid, Tompson, Vuong, Yu, Huang, Chebotar, Sermanet, Duckworth, Levine, Vanhoucke, Hausman, Toussaint, Greff, Zeng, Mordatch and Florence2023; Touvron et al., Reference Touvron, Lavril, Izacard, Martinet, Lachaux, Lacroix, Rozière, Goyal, Hambro, Azhar, Rodriguez, Joulin, Grave and Lample2023), recent advancements in computer vision have led to the development of models that incorporate multiple modalities and can perform multiple tasks simultaneously. 
Recent studies have explored multitask vision using natural images (Cheng et al., Reference Cheng, Schwing and Kirillov2021, Reference Cheng, Misra, Schwing, Kirillov and Girdhar2022; Kirillov et al., Reference Kirillov, Mintun, Ravi, Mao, Rolland, Gustafson, Xiao, Whitehead, Berg, Lo, Dollár and Girshick2023; Li et al., Reference Li, Zhang, Xu, Liu, Zhang, Ni and Shum2023a) or using text to improve performance on vision tasks (Dancette and Cord, Reference Dancette and Cord2022; Xu et al., Reference Xu, De Mello, Liu, Byeon, Breuel, Kautz and Wang2022; Jain et al., Reference Jain, Li, Chiu, Hassani, Orlov and Shi2023a). For tasks involving both images and text, other approaches improve performance by exploiting the combination of the two modalities (Zhu et al., Reference Zhu, Zhu, Li, Wu, Wang, Li, Wang and Dai2022; Li et al., Reference Li, Zhu, Jiang, Zhu, Li, Yuan, Wang, Qiao, Wang, Wang and Dai2023b). Additionally, generalist models have been designed to be agnostic to specific modalities and tasks (Jaegle et al., Reference Jaegle, Gimeno, Brock, Zisserman, Vinyals and Carreira2021, Reference Jaegle, Borgeaud, Alayrac, Doersch, Ionescu, Ding, Koppula, Zoran, Brock, Shelhamer, Hénaff, Botvinick, Zisserman, Vinyals and Carreira2022), enabling them to handle diverse modalities and tasks with a unified approach. In computer vision, for example, the segment anything model (Kirillov et al., Reference Kirillov, Mintun, Ravi, Mao, Rolland, Gustafson, Xiao, Whitehead, Berg, Lo, Dollár and Girshick2023) has demonstrated the ability to perform instance segmentation by combining prompts with input images.
These architectural frameworks hold significant value for forest monitoring tasks, enabling the detection and segmentation of trees and the estimation of tree properties, such as canopy height or aboveground biomass, over large geographical areas (Tolan et al., Reference Tolan, Yang, Nosarzewski, Couairon, Vo, Brandt, Spore, Majumdar, Haziza, Vamaraju, Moutakani, Bojanowski, Johns, White, Tiecke and Couprie2023; Tucker et al., Reference Tucker, Brandt, Hiernaux, Kariryaa, Rasmussen, Small, Igel, Reiner, Melocik, Meyer, Sinno, Romero, Glennie, Fitts, Morin, Pinzon, McClain, Morin, Porter, Loeffler, Kergoat, Issoufou, Savadogo, Wigneron, Poulter, Ciais, Kaufmann, Myneni, Saatchi and Fensholt2023).
The use of foundation models with remote sensing data is still in its infancy. However, promising advances have been made in developing multimodal architectures (Zhang et al., Reference Zhang, Ming, Feng, Liu, He and Zhao2023) and temporal approaches (Garnot et al., Reference Garnot, Landrieu and Chehata2021; Garnot and Landrieu, Reference Garnot and Landrieu2021; Tarasiou et al., Reference Tarasiou, Chavez and Zafeiriou2023) tailored to specific remote sensing tasks. Building on the masked autoencoder pretraining method (He et al., Reference He, Chen, Xie, Li, Dollár and Girshick2022), multimodal and multitask architectures have been developed for Earth observation applications, particularly for land use and land cover (LULC) estimation (Cong et al., Reference Cong, Khanna, Meng, Liu, Rozi, He, Burke, Lobell and Ermon2022; Reed et al., Reference Reed, Gupta, Li, Brockman, Funk, Clipp, Funk, Candido, Uyttendaele and Darrell2022; Sun et al., Reference Sun, Wang, Lu, Zhu, Lu, He, Li, Rong, Yang, Chang, He, Yang, Wang, Lu and Fu2022; Tseng et al., Reference Tseng, Zvonkov, Purohit, Rolnick and Kerner2023). These architectures address the challenges posed by data from sensors that record diverse physical measurements, such as multispectral or SAR data in remote sensing (Reed et al., Reference Reed, Gupta, Li, Brockman, Funk, Clipp, Funk, Candido, Uyttendaele and Darrell2022; Pan et al., Reference Pan, Gao, Dong and Du2023; Yamazaki et al., Reference Yamazaki, Hanyu, Tran, Garcia, Tran, McCann, Liao, Rainwater, Adkins, Molthan, Cothren and Le2023), as well as in natural images (Themyr et al., Reference Themyr, Rambour, Thome, Collins and Hostettler2023). While different spectral, spatial, and temporal resolutions have been considered in previous works, the resolution gap between datasets captured by aerial and satellite sensors remains largely unexplored.
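Masked autoencoder pretraining hides most of an image and trains a network to reconstruct the hidden parts. The Python sketch below shows only the masking step for a multiband image stored as a (bands, height, width) array; the patch size and masking ratio are assumptions (a high ratio such as 0.75 is typical for this family of methods), and the encoder, decoder, and training loop are omitted.

```python
import numpy as np


def random_patch_mask(image, patch_size, mask_ratio, seed=0):
    """Split a (C, H, W) image into flattened patches and hide a random subset.

    Returns the visible patches, the indices of visible patches, and the
    indices of masked patches (which a decoder would learn to reconstruct).
    Pixels beyond a whole number of patches are cropped.
    """
    c, h, w = image.shape
    ph, pw = h // patch_size, w // patch_size
    patches = (image[:, :ph * patch_size, :pw * patch_size]
               .reshape(c, ph, patch_size, pw, patch_size)
               .transpose(1, 3, 0, 2, 4)           # (ph, pw, C, p, p)
               .reshape(ph * pw, c * patch_size * patch_size))

    n = patches.shape[0]
    n_keep = int(round(n * (1.0 - mask_ratio)))
    perm = np.random.default_rng(seed).permutation(n)
    keep, masked = np.sort(perm[:n_keep]), np.sort(perm[n_keep:])
    return patches[keep], keep, masked


# A hypothetical 4-band 8x8 tile cut into four 4x4 patches, 75% masked.
tile = np.arange(4 * 8 * 8, dtype=float).reshape(4, 8, 8)
visible, keep, masked = random_patch_mask(tile, patch_size=4, mask_ratio=0.75)
print(visible.shape, keep, masked)
```

In such a setup, the reconstruction loss would be computed only on the patches indexed by `masked`, so the model cannot simply copy visible pixels; multimodal variants would apply the same idea to each band group or sensor.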
The integration of multimodal, multitask, and multiscale architectures is expected to significantly enhance the generalization capabilities of models for forest monitoring tasks at global scale (see Section 2.2.4). Trained on various types of datasets and specialized for forest monitoring, such models could deliver improved performance across geographical regions on tasks such as species cover estimation or aboveground biomass estimation (see Section 2.1.3).
3.2. Learning from limited data
There is a growing number of massive datasets, and of algorithms leveraging them, including in remote sensing (Sumbul et al., Reference Sumbul, de Wall, Kreuziger, Marcelino, Costa, Benevides, Caetano, Demir and Markl2021; Bastani et al., Reference Bastani, Wolters, Gupta, Ferdinando and Kembhavi2023; Rahaman et al., Reference Rahaman, Weiss, Träuble, Locatello, Lacoste, Bengio, Pal, Li and Schölkopf2022; Mai et al., Reference Mai, Huang, Sun, Song, Mishra, Liu, Gao, Liu, Cong, Hu, Cundy, Li, Zhu and Lao2023a). However, many of the most powerful machine learning approaches are supervised and therefore require labels, which can be difficult, time-consuming, and costly to obtain. The problem of learning from limited labeled data has received considerable attention; here we present several families of approaches and their relevance to forest monitoring.
3.2.1. Self-supervised learning
Situated between supervised and unsupervised learning, the self-supervised learning paradigm involves training a model to reconstruct certain known relationships between or within the datapoints themselves, without human-provided labels. The resulting model can then be fine-tuned with labeled data or applied directly to the downstream task. Self-supervised approaches in computer vision include discriminative approaches, which distinguish between positive and negative samples by pulling the representations of positive pairs together and pushing those of negative pairs apart (e.g., contrastive learning) (Gidaris et al., Reference Gidaris, Singh and Komodakis2018; Chen et al., Reference Chen, Kornblith, Norouzi and Hinton2020; He et al., Reference He, Fan, Wu, Xie and Girshick2020; Caron et al., Reference Caron, Touvron, Misra, Jégou, Mairal, Bojanowski and Joulin2021; Oquab et al., Reference Oquab, Darcet, Moutakanni, Vo, Szafraniec, Khalidov, Fernandez, Haziza, Massa, El-Nouby, Assran, Ballas, Galuba, Howes, Huang, Li, Misra, Rabbat, Sharma, Synnaeve, Xu, Jegou, Mairal, Labatut, Joulin and Bojanowski2023), and generative approaches, which learn representations through reconstruction (Lehtinen et al., Reference Lehtinen, Munkberg, Hasselgren, Laine, Karras, Aittala and Aila2018; He et al., Reference He, Chen, Xie, Li, Dollár and Girshick2022). The use of self-supervised learning in remote sensing has grown significantly owing to the abundance of unlabeled open-access data (Tao et al., Reference Tao, Qi, Guo, Zhu and Li2023). For instance, the geolocation of satellite images has been exploited in a contrastive approach (Ayush et al., Reference Ayush, Uzkent, Meng, Tanmay, Burke, Lobell and Ermon2021; Mai et al., Reference Mai, Lao, He, Song and Ermon2023b).
Multispectral and SAR data have been reconstructed using temporal information (Cong et al., Reference Cong, Khanna, Meng, Liu, Rozi, He, Burke, Lobell and Ermon2022; Yadav et al., Reference Yadav, Nascetti, Azizpour and Ban2022), at multiple scales (Reed et al., Reference Reed, Gupta, Li, Brockman, Funk, Clipp, Funk, Candido, Uyttendaele and Darrell2022), and for denoising (Dalsasso et al., Reference Dalsasso, Denis and Tupin2021, Reference Dalsasso, Denis and Tupin2022; Meraoumia et al., Reference Meraoumia, Dalsasso, Denis, Abergel and Tupin2023). Emerging cross-modal approaches, encompassing both discriminative (Jain et al., Reference Jain, Wilson and Gulshan2022) and generative (Jain et al., Reference Jain, Schoen-Phelan and Ross2023b) techniques, are being developed to harness the complementary nature of aligned samples. Self-supervised learning could greatly unlock the potential of remote sensing data for forests (Ge et al., Reference Ge, Gu, Su, Lönnqvist and Antropov2023) by learning the textural and geometric structures of forests and trees without labels (see Sections 2.2.1 and 2.2.4).
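Contrastive approaches like those above typically optimize an InfoNCE-style objective. The Python sketch below is a simplified, one-directional version, assuming that `z_a[i]` and `z_b[i]` are embeddings of a positive pair (for example, two acquisitions of the same location at different dates) and that all other rows in the batch serve as negatives; the temperature value is an assumption.

```python
import numpy as np


def info_nce(z_a, z_b, temperature=0.1):
    """One-directional InfoNCE loss for N aligned pairs (z_a[i], z_b[i]).

    Row i of z_b is the positive for row i of z_a; every other row of z_b
    acts as a negative. Embeddings are L2-normalized first, so the logits
    are scaled cosine similarities.
    """
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature            # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # cross-entropy on the diagonal


rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))         # hypothetical embeddings of 8 locations
print(info_nce(z, z))                # matched pairs: low loss
print(info_nce(z, z[::-1]))          # mismatched pairs: higher loss
```

In an actual training setup, the embeddings would come from an encoder trained by automatic differentiation, and the loss is often symmetrized by also computing it with the roles of `z_a` and `z_b` swapped.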
3.2.2. Weakly supervised learning
Obtaining precise and detailed annotations, for example, for tree crown instance segmentation, can be both costly and time-consuming. Although self-supervised learning learns representations from pretext tasks, it still requires precise annotations to fine-tune the model on a downstream task. When precise annotations are not available, coarse-grained and potentially inaccurate annotations, or even single point locations, can be used as weak labels in weakly supervised learning approaches (Zhou, Reference Zhou2018). Because such labels are cheaper and faster to obtain, computer vision methods have been developed to leverage weak annotations while addressing their inherent inaccuracies (Zhou, Reference Zhou2018). Weakly supervised learning has been explored for object localization (Oquab et al., Reference Oquab, Bottou, Laptev and Sivic2015), object relationship estimation (Peyre et al., Reference Peyre, Sivic, Laptev and Schmid2017), instance segmentation (Ahn et al., Reference Ahn, Cho and Kwak2019), and contrastive learning (Zheng et al., Reference Zheng, Wang, You, Qian, Zhang, Wang and Xu2021). Obtaining high-quality annotations for remote sensing data is particularly difficult because of the limited spatial resolution of many sensors or the physics of the sensors used. Weakly supervised learning has therefore been investigated for Earth observation tasks including object detection (Han et al., Reference Han, Zhang, Cheng, Guo and Ren2015; Zhang et al., Reference Zhang, Han, Cheng, Liu, Bu and Guo2015; Yao et al., Reference Yao, Feng, Han, Cheng and Guo2021), LULC semantic segmentation (Yao et al., Reference Yao, Han, Cheng, Qian and Guo2016; Wang et al., Reference Wang, Chen, Xie, Azzari and Lobell2020b), and plant trait regression (Schiller et al., Reference Schiller, Schmidtlein, Boonman, Moreno-Martínez and Kattenborn2021; Cherif et al., Reference Cherif, Feilhauer, Berger, Dao, Ewald, Hank, He, Kovach, Lu, Townsend and Kattenborn2023).
Recently, weakly supervised methods have been investigated in the areas of tree classification (Illarionova et al., Reference Illarionova, Trekin, Ignatiev and Oseledets2021), tree counting (Amirkolaee et al., Reference Amirkolaee, Shi and Mulligan2023), tree detection (Aygunes et al., Reference Aygunes, Cinbis and Aksoy2021), and tree segmentation (Gazzea et al., Reference Gazzea, Kristensen, Pirotti, Ozguven and Arghandeh2022) using multispectral data (see Section 2.2.4).
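A common way to exploit coarse labels is multiple-instance learning, in which a label is attached to a bag of instances rather than to each instance. The Python sketch below is illustrative only and assumes a plot-level label (for example, a field inventory recording whether a species is present in a plot) and one logit per pixel or crown within that plot; the bag score is taken as the maximum instance score.

```python
import numpy as np


def mil_bag_loss(instance_logits, bag_label):
    """Binary cross-entropy on a bag scored by its highest-scoring instance.

    instance_logits: 1-D array with one logit per pixel or crown in a plot.
    bag_label: 1 if the weak, plot-level label says the species is present,
               0 otherwise.
    """
    bag_prob = 1.0 / (1.0 + np.exp(-np.max(instance_logits)))  # sigmoid of max
    bag_prob = np.clip(bag_prob, 1e-7, 1.0 - 1e-7)              # avoid log(0)
    return -(bag_label * np.log(bag_prob)
             + (1 - bag_label) * np.log(1.0 - bag_prob))


crowns = np.array([-3.0, -2.0, 4.0])  # one crown strongly predicted present
print(mil_bag_loss(crowns, 1))        # consistent with the plot label
print(mil_bag_loss(crowns, 0))        # contradicts the plot label
```

This version only computes the loss value; in practice it would be written in an automatic differentiation framework so that gradients reach the highest-scoring instance. Smoother pooling functions than the maximum are also common.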
3.2.3. Active learning
Even highly precise and fine-grained annotations are generally of limited use if available only in small quantities. To make the most of a limited labeling budget, active learning strategies select the small set of training datapoints whose labels are expected to be most informative (Cohn et al., Reference Cohn, Ghahramani and Jordan1996). These selection strategies often rely on estimating the uncertainty of a model (Gal et al., Reference Gal, Islam and Ghahramani2017), for instance, using variational approaches (Sinha et al., Reference Sinha, Ebrahimi and Darrell2019) or by predicting the loss (Yoo and Kweon, Reference Yoo and Kweon2019). They have proven effective when labeled data are limited, particularly for image classification (Gal et al., Reference Gal, Islam and Ghahramani2017; Sinha et al., Reference Sinha, Ebrahimi and Darrell2019; Yoo and Kweon, Reference Yoo and Kweon2019), object detection (Roy et al., Reference Roy, Unmesh and Namboodiri2019), and semantic segmentation (Siddiqui et al., Reference Siddiqui, Valentin and Niessner2020). Active learning has also been investigated for remote sensing applications, including classification (Tuia et al., Reference Tuia, Volpi, Copa, Kanevski and Munoz-Mari2011), object detection (Qu et al., Reference Qu, Du, Cao, Guan and Zhao2020), and LULC semantic segmentation with hyperspectral data (Li et al., Reference Li, Bioucas-Dias and Plaza2010; Li et al., Reference Li, Bioucas-Dias and Plaza2011; Zhang et al., Reference Zhang, Pasolli, Crawford and Tilton2016). In forest monitoring, active learning could help prioritize and optimize the collection of relevant human annotations (see Section 2.2.4).
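The simplest uncertainty-based selection rule, sketched below in Python under our own assumptions, ranks unlabeled samples by the entropy of the model's predicted class probabilities and sends the most uncertain ones to human annotators. The `budget` parameter is hypothetical; approaches such as that of Gal et al. would instead combine predictions from several stochastic forward passes (Monte Carlo dropout) before scoring uncertainty.

```python
import numpy as np


def select_by_entropy(probs, budget):
    """Return indices of the `budget` samples with the highest predictive entropy.

    probs: (N, K) predicted class probabilities for N unlabeled samples.
    """
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)  # avoid log(0)
    entropy = -np.sum(p * np.log(p), axis=1)
    return np.argsort(-entropy)[:budget]  # most uncertain first


# Hypothetical species probabilities for four unlabeled tree crowns.
probs = np.array([[0.98, 0.01, 0.01],   # confident
                  [0.34, 0.33, 0.33],   # nearly uniform: most uncertain
                  [0.60, 0.30, 0.10],
                  [0.50, 0.50, 0.00]])
print(select_by_entropy(probs, budget=2))
```

After the selected crowns are labeled, the model would be retrained and the selection repeated, so the labeling budget is spent iteratively rather than all at once.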
3.2.4. Few-shot learning
Yet another approach to limited data availability is few-shot learning, which refers to efficiently adapting a pretrained model using only a few annotated datapoints. Few-shot learning has been approached from different perspectives that relate the small annotated dataset to the data used for pretraining the model, for example, by quantifying the similarities between these datasets (Vinyals et al., Reference Vinyals, Blundell, Lillicrap, Kavukcuoglu and Wierstra2016), computing class prototypes from feature embeddings (Snell et al., Reference Snell, Swersky and Zemel2017), or adapting the optimization scheme through meta-learning (Finn et al., Reference Finn, Abbeel and Levine2017). Motivated by the limited availability of annotations, applications of few-shot learning to remote sensing tasks have been investigated. For instance, methods based on feature similarity (Alajaji et al., Reference Alajaji, Alhichri, Ammour and Alajlan2020; Zhang et al., Reference Zhang, Bai, Wang, Bai and Li2020; Alosaimi et al., Reference Alosaimi, Alhichri, Bazi, Ben Youssef and Alajlan2023) and metric learning (Liu et al., Reference Liu, Yu, Yu, Zhang, Wan and Wang2019), which aim to separate representations in an embedding space, have been explored for LULC classification with either multispectral or hyperspectral data. Objects have also been detected by learning meta features (Deng et al., Reference Deng, Li and Fang2022). Metric learning techniques have also been used for few-shot semantic segmentation (Jiang et al., Reference Jiang, Zhou and Li2022), and meta-learning has been applied to multispectral and SAR data (Rußwurm et al., Reference Rußwurm, Wang, Körner and Lobell2020).
Few-shot learning has been explored for tree species classification using feature similarity (Chen et al., 2021) and could help recognize a species or estimate the characteristics of a tree with minimal manual annotation (see Sections 2.2.2 and 2.2.4).
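A minimal sketch of metric-based few-shot classification in the spirit of prototypical networks (Snell et al., 2017): each class is represented by the mean embedding of its few labeled support examples, and a query is assigned to the nearest prototype. It assumes embeddings were already computed by a pretrained encoder; the species labels and vectors are hypothetical.

```python
import math


def class_prototypes(support):
    """Average the support embeddings of each class into one prototype vector.

    support: dict mapping a class label to a list of embedding vectors,
    assumed to come from a pretrained encoder (toy values below).
    """
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
    return prototypes


def classify(query, prototypes):
    """Assign the query embedding to the class whose prototype is closest (Euclidean)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Hypothetical two-shot support set for two species.
support = {
    "species_a": [[0.0, 0.0], [0.0, 2.0]],
    "species_b": [[10.0, 10.0], [12.0, 10.0]],
}
protos = class_prototypes(support)
print(protos)                        # {'species_a': [0.0, 1.0], 'species_b': [11.0, 10.0]}
print(classify([1.0, 1.0], protos))  # species_a
```

Because only the prototypes depend on the support set, adding a new species needs a handful of labeled examples and no retraining of the encoder in this simplified setting.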
3.2.5. Zero-shot learning
The machine learning community has also shown interest in methods that train algorithms to recognize unseen classes without any explicitly annotated samples, which is known as zero-shot learning (Xian et al., 2018). To categorize unseen classes, zero-shot learning has been accomplished by projecting image and word embeddings (Socher et al., 2013) or known semantic attributes (Lampert et al., 2014) into a shared space. Zero-shot learning has also been investigated by combining embeddings from the source domain into a mixture before computing similarities with the target domain, which includes the unseen classes (Zhang and Saligrama, 2015). Generative approaches have been developed to create visual feature embeddings of unseen classes from word embeddings for zero-shot semantic segmentation (Bucher et al., 2019). Zero-shot learning has also garnered attention in remote sensing, including combining multispectral data and word embeddings for classification tasks (Li et al., 2017, 2021) and initial explorations of zero-shot classification of hyperspectral data (Freitas et al., 2022). Generative approaches have also been used with remote sensing data to create visual embeddings from word embeddings (Li et al., 2022b). Zero-shot learning is a promising approach for forest monitoring, as it could enable models to adapt to regions where previously unseen species are encountered (see Sections 2.2.2 and 2.2.4).
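The shared-space idea can be illustrated with a simplified attribute-based scheme, a nearest-neighbor variant rather than the exact probabilistic model of Lampert et al. (2014): an image model trained on seen classes predicts a vector of semantic attributes, and the image is assigned to the class description (seen or unseen) with the highest cosine similarity. The attribute set, class descriptions, and predicted scores below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical attribute descriptions: [needle_leaves, broad_leaves, deciduous].
# Assume "larch" never appears in the training images; it is known only
# through its attribute description (a deciduous conifer).
CLASS_ATTRIBUTES = {
    "spruce": [1.0, 0.0, 0.0],
    "maple": [0.0, 1.0, 1.0],
    "larch": [1.0, 0.0, 1.0],
}


def zero_shot_predict(predicted_attributes, class_attributes):
    """Pick the class whose attribute vector best matches the predicted attributes."""
    return max(class_attributes,
               key=lambda c: cosine(predicted_attributes, class_attributes[c]))


# Attribute scores assumed to be output by an image model trained on seen classes only.
print(zero_shot_predict([0.9, 0.1, 0.8], CLASS_ATTRIBUTES))  # larch
```

Richer shared spaces replace the hand-written attributes with word or text embeddings, or with taxonomy-aware descriptions, but the matching step keeps the same shape.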
Aligning visual embeddings with the tree taxonomy hierarchy and meta characteristics (Sumbul et al., 2018) opens up considerable research potential. Foundation models that have demonstrated strong zero-shot learning capabilities (Brown et al., 2020; Radford et al., 2021) further enhance this potential. In the following section, we explore methods concerning domain-specific objectives, focusing on physical and biological constraints and their applications.
3.3. Domain-specific objectives
Machine learning methods commonly use a fairly limited set of metrics to evaluate success, such as (macro- or micro-averaged) label accuracy and cross-entropy loss for classification tasks, or mean squared error and mean absolute error for regression tasks. However, these generic metrics do not necessarily reflect real-world use cases (Birhane et al., 2022), where criteria for success may be much more nuanced or domain-specific. In this section, we consider two other families of objectives that are often relevant in forest monitoring.
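The gap between micro- and macro-averaged accuracy is easy to see on imbalanced data, which is common when a few species dominate a region. In this sketch, macro accuracy is computed as the mean per-class recall (also called balanced accuracy); other macro-averaged definitions exist. The labels and predictions are hypothetical.

```python
from collections import defaultdict


def micro_accuracy(y_true, y_pred):
    """Fraction of all samples that are correctly classified."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_accuracy(y_true, y_pred):
    """Mean of per-class accuracies (recalls), so every class counts equally."""
    correct, total = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical imbalanced test set: 8 spruce tiles and 2 birch tiles,
# and a model that always predicts the majority class.
y_true = ["spruce"] * 8 + ["birch"] * 2
y_pred = ["spruce"] * 10
print(micro_accuracy(y_true, y_pred))  # 0.8
print(macro_accuracy(y_true, y_pred))  # 0.5
```

A model that never recognizes the rare species still scores 80% micro accuracy, while macro accuracy exposes that one class is entirely missed.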
3.3.1. Constraints on data
Depending on the domain of application, the outputs of a machine learning pipeline may have to satisfy specific constraints for the answer to be useful or even physically possible. For example, climate variables may need to obey physical laws such as conservation of energy, engineered systems may need to obey the laws of mechanics, and so forth. Machine learning models working with such variables have increasingly been designed with soft constraints (Ouaknine et al., 2021a; Harder et al., 2022), which impose penalties for constraint violations, or hard constraints (Donti et al., 2021; Geiss and Hardin, 2021; Harder et al., 2023), which are strictly enforced by the design of the algorithm. Compared to physics- and engineering-based constraints, fewer authors have so far integrated biological constraints into ML algorithms. Dynamics of biological systems have been included in a deep learning optimization scheme as hard constraints derived from ordinary differential equations (Yazdani et al., 2020). There are opportunities for incorporating biological constraints in forest monitoring by considering phenological (Richardson et al., 2018) or biophysical traits, or ecosystem properties (see Section 2.1.5), for example, the ratio between tree height and crown size. Such constraints could be particularly valuable in tasks such as semantic segmentation or biomass estimation.
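The difference between soft and hard constraints can be sketched with the height-to-crown example above. The ratio bounds are hypothetical placeholders rather than established allometric limits, and a real model would apply these operations to tensors inside a training loop: the soft version adds a penalty term to the loss, whereas the hard version guarantees feasibility by projecting the output.

```python
def ratio_violation(height, crown, r_min, r_max):
    """Squared amount by which crown / height leaves the interval [r_min, r_max]."""
    low, high = r_min * height, r_max * height  # feasible crown diameters for this height
    return max(0.0, low - crown) ** 2 + max(0.0, crown - high) ** 2


def soft_constrained_loss(pred, target, r_min, r_max, weight=1.0):
    """Soft constraint: MSE on (height, crown) plus a weighted violation penalty."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return mse + weight * ratio_violation(pred[0], pred[1], r_min, r_max)


def hard_constrained_output(pred, r_min, r_max):
    """Hard constraint: project the predicted crown diameter onto the feasible range."""
    height, crown = pred
    crown = min(max(crown, r_min * height), r_max * height)
    return (height, crown)


# Hypothetical bounds: crown diameter between 0.2 and 0.8 times tree height.
pred, target = (20.0, 20.0), (20.0, 16.0)
print(soft_constrained_loss(pred, target, 0.2, 0.8))  # 24.0
print(hard_constrained_output(pred, 0.2, 0.8))        # (20.0, 16.0)
```

With the soft version, the weight trades off data fit against constraint satisfaction and violations remain possible at inference; the hard version cannot violate the bounds but also cannot learn from them.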
Domain-specific constraints on data may also offer opportunities for improving the design of machine learning models. Deep learning architectures can be designed to account for, or reconstruct, physical properties. For instance, a physics-informed architecture has been developed for super-resolution of turbulent flows, incorporating partial differential equations as a form of regularization (Jiang et al., 2020). Similarly, RADAR-based architectures have been created to reconstruct physical properties for scene understanding in autonomous driving (Ouaknine et al., 2021a; Rebut et al., 2022). The properties of multiple sensors have also been leveraged to fuse their representations (Ouaknine, 2022) or to transfer annotations from one modality to another (Ouaknine et al., 2021b; Schiefer et al., 2023). In remote sensing, self-supervised learning has benefited from SAR physical properties through a pretext denoising task (Dalsasso et al., 2021; Meraoumia et al., 2023), or by separating the real and imaginary parts of the signal and reconstructing one from the other (Dalsasso et al., 2022). Similar methods could exploit various sensors to learn representations of forests and trees (see Section 2.2.4).
3.3.2. Uncertainty quantification
Biological phenomena follow intricate rules that are challenging to estimate and often exhibit inherent uncertainties. Estimating prediction uncertainty helps to better understand the strengths and limitations of a machine learning model. The overall uncertainty of these models comprises both aleatoric and epistemic uncertainty (Gal, 2016), which are distinguished by their origin: aleatoric uncertainty arises from the inherent noise in the data and label distributions, while epistemic uncertainty is associated with the model itself, encompassing its estimated parameters and structural characteristics. Approaches have been devised to estimate the uncertainties of deep neural networks, for example, by using a Bayesian approach such as Monte Carlo dropout (Gal and Ghahramani, 2016), by combining adversarial training with model ensembles (Lakshminarayanan et al., 2017), by predicting the uncertainty distribution (Malinin and Gales, 2018), or by learning an auxiliary confidence score from the data (Corbière et al., 2019; Corbière, 2022). Similar methods have been applied to estimate uncertainty in remote sensing for crop yield estimation (Ma et al., 2021b) and road segmentation (Haas and Rabus, 2021). In forest monitoring, uncertainty quantification has been carried out to assess both aleatoric and epistemic uncertainties (see Section 2.2), most commonly to evaluate the uncertainty of predictions on large-scale maps derived from low-resolution satellite data. For example, the uncertainty of plant functional type classification has been studied in Siberia (Ottlé et al., 2013).
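Given several stochastic predictions for the same input (for example, from Monte Carlo dropout or from ensemble members), a common way to separate the two components is an entropy-based decomposition: the entropy of the mean prediction (total) splits into the mean entropy of individual predictions (aleatoric) plus their mutual information (epistemic). The sketch below is one option among several, not the estimator of any specific work cited above; the sampled probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats); terms with zero probability contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decompose_uncertainty(samples):
    """Split predictive uncertainty from T stochastic forward passes.

    samples: T class-probability vectors for the same input.
    Returns (total, aleatoric, epistemic) in nats, where
      total     = entropy of the mean prediction,
      aleatoric = mean entropy of the individual predictions,
      epistemic = total - aleatoric (the mutual information).
    """
    t = len(samples)
    mean = [sum(s[c] for s in samples) / t for c in range(len(samples[0]))]
    total = entropy(mean)
    aleatoric = sum(entropy(s) for s in samples) / t
    return total, aleatoric, total - aleatoric


# Passes agree on a 50/50 split: purely aleatoric uncertainty.
print(decompose_uncertainty([[0.5, 0.5]] * 3))
# Passes are each confident but disagree: purely epistemic uncertainty.
print(decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]]))
```

Both cases have the same total uncertainty (ln 2), which is why the decomposition matters: only the epistemic part is expected to shrink with more training data.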
Uncertainty in aboveground biomass has also been estimated to establish ranges of values in carbon stock maps (Patterson et al., 2019; Santoro et al., 2021) (see Section 2.1.3). To quantify uncertainty, these methods use standard deviations or output probabilities of the model. A recent study went a step further in estimating tree carbon stocks in semi-arid sub-Saharan Africa north of the Equator, combining the uncertainty from both allometric equations and predicted crown segmentation using field measurements (Tucker et al., 2023). Nevertheless, advanced uncertainty quantification methods, whether associated with the data or the predictive model, have seen limited application in forest monitoring.
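A generic way to combine uncertainty from an allometric relation and from predicted crown segmentation is Monte Carlo propagation: sample both sources repeatedly and summarize the spread of the resulting totals. The sketch below is a simplified illustration, not the procedure of Tucker et al. (2023); the power-law form, its parameters, and the Gaussian error model are all assumptions.

```python
import random
import statistics


def carbon_stock_distribution(crown_areas, area_sd, coef, coef_sd, exponent,
                              n_draws=2000, seed=0):
    """Monte Carlo propagation of two uncertainty sources into a plot carbon total.

    crown_areas: predicted crown areas (m^2) from a segmentation model.
    area_sd:     assumed standard deviation of each crown-area prediction (m^2).
    coef, coef_sd, exponent: parameters of a hypothetical allometric relation
                 carbon = coef * area ** exponent, with uncertainty on coef only.
    Returns the mean and standard deviation of the simulated plot totals.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_draws):
        # One coefficient draw per simulation, shared by all trees (allometric error).
        a = rng.gauss(coef, coef_sd)
        total = 0.0
        for area in crown_areas:
            # Independent segmentation error per tree, truncated at zero area.
            sampled_area = max(0.0, rng.gauss(area, area_sd))
            total += a * sampled_area ** exponent
        totals.append(total)
    return statistics.fmean(totals), statistics.pstdev(totals)


# Hypothetical plot with three segmented crowns and made-up parameters.
mean, sd = carbon_stock_distribution([10.0, 25.0, 40.0], area_sd=3.0,
                                     coef=2.0, coef_sd=0.3, exponent=1.2)
print(f"{mean:.1f} +/- {sd:.1f}")
```

Drawing the coefficient once per simulation while drawing crown areas independently is a modeling choice: an allometric error shifts every tree in the same direction, whereas segmentation errors are assumed here to be independent.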
Despite the extensive application of the presented machine learning techniques in remote sensing, their use for forest monitoring has been relatively limited. This presents numerous opportunities to gain deeper insight into forest composition while achieving generalization at large scale. However, exploring these machine learning strategies effectively requires access to high-quality, diverse, and sufficiently large datasets. In the following section, we review open-access forest datasets, providing information on their size, tasks, scale, and modalities.
4. Review of open-access forest datasets
Open-access datasets are essential to drive the broader scientific community to explore forest biology challenges, particularly with machine learning strategies (see Section 3). Deep learning algorithms have demonstrated strong performance in various forest monitoring tasks, such as tree classification or segmentation (Kattenborn et al., 2021). The availability of open-access datasets has played a significant role in improving algorithm performance and expanding applications to larger scales. In this field, data ranging from the tree to the country level (see Figure 2) and distributed across the entire globe must be taken into consideration. Algorithms have been trained for forest monitoring on datasets that encompass different scales, modalities, and tasks (Guimaraes et al., 2020; Kattenborn et al., 2021; Michalowska and Rapinski, 2021). However, many data sources are not publicly accessible, which impedes the progress of extended research projects. While the scientific community emphasizes the importance of reproducible experiments, some datasets do not fully adhere to the FAIR principles (findable, accessible, interoperable, reusable; https://www.go-fair.org/fair-principles/), which encompass aspects such as documentation and findability.
Although a considerable number of datasets are publicly available, they may have limitations that restrict their impact on machine learning applications for forest composition analysis, such as the size of the dataset or the specific type of data released. This section reviews forest monitoring datasets according to the following criteria:
1. The dataset should be open-access, that is, available without any access request.
2. The dataset should be associated with at least one published article; exceptions have been made for datasets described only in preprints that are considered must-see datasets.
3. The dataset should focus on forest composition, excluding event-specific datasets (e.g., wildfire detection).
4. An LULC dataset should contain more than a single plant functional type (e.g., not only conifers or only deciduous trees), since the focus is on better understanding forest composition.
5. The dataset should be at the tree level or coarser; datasets at the organ or cellular level (e.g., leaf spectra or root scans) are considered out of the scope of this review.
6. The dataset should contain at least $ O\left(1,000\right) $ trees.
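The selection criteria can be read as a single conjunctive filter. The sketch below encodes them for a hypothetical metadata record; the field names are illustrative and are not taken from the OpenForest catalog, and treating "at least O(1,000) trees" as a hard threshold of 1,000 is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical metadata for a candidate dataset (illustrative field names)."""
    open_access: bool              # criterion 1: downloadable without an access request
    has_publication: bool          # criterion 2: described in a published article
    must_see_preprint: bool        # criterion 2 exception: must-see preprint only
    event_based: bool              # criterion 3: e.g., wildfire detection
    is_lulc: bool                  # criterion 4 applies only to LULC datasets
    n_plant_functional_types: int
    finest_level: str              # criterion 5: "tree" or coarser; "organ"/"cell" excluded
    n_trees: int                   # criterion 6


MIN_TREES = 1_000  # assumption: "O(1,000) trees" read as a simple threshold


def meets_criteria(d: DatasetRecord) -> bool:
    """True if the record satisfies all six review criteria."""
    return (
        d.open_access
        and (d.has_publication or d.must_see_preprint)
        and not d.event_based
        and (not d.is_lulc or d.n_plant_functional_types > 1)
        and d.finest_level not in {"organ", "cell"}
        and d.n_trees >= MIN_TREES
    )


candidate = DatasetRecord(open_access=True, has_publication=True, must_see_preprint=False,
                          event_based=False, is_lulc=True, n_plant_functional_types=3,
                          finest_level="tree", n_trees=5_000)
print(meets_criteria(candidate))  # True
```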
Based on these criteria, we identified 86 datasets covering a wide range of geographical locations, with recordings spanning 1974 to 2022. The associated publications date from 2005 to 2023, as depicted in Figure 3.

Figure 3. Distribution of the reviewed open-access forest datasets. Note: (Left) World map of the location of the reviewed datasets at the country level. Most of the datasets are regional and do not reflect the entire associated country. The datasets categorized with a “Worldwide” location or at the continent level have been excluded for visualization purposes. (Right) Distributions of the publication years and recording years used and/or released in the associated datasets.
Given the broad scope of this review, other datasets meeting these requirements have likely been missed. For this reason, the study is accompanied by OpenForest, a dynamic catalog that integrates the reviewed datasets and is open to updates from the community. (The catalog contains the URLs to access all datasets; these URLs are not included in this article to ensure temporal consistency. OpenForest is available at https://github.com/RolnickLab/OpenForest.) Updates to OpenForest will be subject to the criteria detailed above. We hope to motivate researchers to pool their efforts to create the largest database of open-access forest datasets and thereby foster synergies in forest monitoring applications.
This section reviews open-access forest datasets grouped at the different scales presented in Figure 2: inventories (Section 4.1), ground-based recordings (Section 4.2), aerial recordings (Section 4.3), satellite recordings (Section 4.4), and country or world maps (Section 4.5). Finally, datasets combining multiple scales are presented (Section 4.6).
Each section details the overall scope of the presented datasets, the specificities of the sensors used to record the data, the information related to each dataset, and their applications. In each section, the reviewed datasets are summarized in a table corresponding to the scale of the released data. In these tables, publication and recording years are differentiated to better convey the temporal scope of the datasets: distinct recording years are listed on separate lines, while time series are indicated by a dash. Each table lists the available modalities in the “Data” column. Except for inventories, this column is paired with a “Spatial resolution” or “Spatial precision” column, and dashed lines associate each modality with its corresponding resolution. Each section also discusses the limits of current open-access datasets to motivate the perspectives presented in Section 5. The following section reviews inventory datasets, the smallest scale of recordings considered.
4.1. Inventories
Historically, forests have mostly been inventoried locally or regionally based on stratified plot samples acquired in the field (Jucker et al., 2022). Digitized open-access inventories generally cover small areas, consisting of dozens or a few hundred trees, which limits their impact on the machine learning community (Section 3). As defined in Section 4, this section focuses on medium- to large-scale inventories with at least $ O\left(1,000\right) $ trees. A significant proportion of the reviewed inventory datasets are combined with modalities at other scales; these are detailed in Section 4.6.
Inventory datasets are summarized in Table 1, where dataset size is quantified by the number of trees. Inventories include various measurements, most commonly tree height, canopy diameter, diameter at breast height (DBH), or diameter at soil height (DSH) (Gastauer et al., 2015; Jucker et al., 2022; National Ecological Observatory Network (NEON), 2023; Oliveira et al., 2017; Pérez-Luque et al., 2015; Pérez-Luque et al., 2021).
In specific cases, wood density, bark density, and bark thickness are also measured (Schepaschenko et al., 2019; Farias et al., 2020; Kindermann et al., 2022).
This information is particularly useful for estimating tree density, aboveground biomass (AGB), or tree carbon stock at large scale (Tucker et al., 2023), even when the underlying inventories have not been released alongside the estimated maps (Patterson et al., 2019; Dionizio et al., 2020).
Table 1. Review of open-access forest inventory datasets

Note: Dataset sizes marked with K are in thousands, that is, $ O\left({10}^3\right) $. AGB = aboveground biomass; Classif. = classification; DBH = diameter at breast height; DSH = diameter at soil height; N/A = not applicable; OL = object localization; Reg. = regression; Unknown = not provided by the authors.
The species, genus, and family of the trees are generally provided. This label hierarchy, together with the geolocation of each tree, makes inventories highly accurate datasets for understanding forest composition. However, they are geographically sparse and centered on specific locations to reduce measurement effort (Laar and Akça, 2007; Motz et al., 2010). As an exception, Tallo (Jucker et al., 2022) gathers inventories from around the world, with an unprecedented number of reported species, and could therefore have an impact on estimating tree species distributions at large scale.
Since inventories contain annotations of trees or tree clusters, they open possibilities to segment tree canopies according to their taxonomic levels, to regress continuous metrics (e.g., height and biomass), or even to locate individual trees by predicting their coordinates or crown perimeter (Tucker et al., 2023). Another example would be to estimate the wood density of a tree or its carbon stock using allometric equations with taxonomic information and height measured in the field (Zianis et al., 2005). Inventories could also be combined with other modalities and used as annotations for larger-scale tasks. For example, remote sensing datasets presented in the following sections that cover the same geographic locations could be paired with inventories to improve the precision of their annotations, although establishing this connection between ground measurements and remote sensing data presents its own set of challenges. In the following section, we review datasets of ground-based recordings.
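As an illustration of allometric estimation from inventory variables, the sketch below uses the pantropical model of Chave et al. (2014), AGB = 0.0673 × (ρD²H)^0.976, swapped in as a well-known example; it is not one of the European equations compiled by Zianis et al. (2005), and the appropriate equation depends on region and species. The carbon fraction of 0.47 is an assumed default, and the tree values are hypothetical.

```python
def chave2014_agb_kg(wood_density, dbh_cm, height_m):
    """Pantropical allometric model of Chave et al. (2014).

    wood_density in g/cm^3 (e.g., looked up from the species or genus),
    DBH in cm, height in m; returns aboveground biomass in kg.
    """
    return 0.0673 * (wood_density * dbh_cm ** 2 * height_m) ** 0.976


CARBON_FRACTION = 0.47  # assumed carbon fraction of dry biomass


def tree_carbon_kg(wood_density, dbh_cm, height_m, carbon_fraction=CARBON_FRACTION):
    """Carbon stock of one tree, as a fixed fraction of its estimated AGB."""
    return carbon_fraction * chave2014_agb_kg(wood_density, dbh_cm, height_m)


# Hypothetical inventory record: wood density from a species-level lookup.
agb = chave2014_agb_kg(0.6, 30.0, 25.0)
print(round(agb))  # 723
```

Summing such per-tree estimates over a plot gives a ground reference that could be paired with remote sensing data covering the same location.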
4.2. Ground-based recordings
The fine-scale composition of forests can be understood by visualizing the trees within or under their canopy. Ground-based datasets consist of recordings made inside forests, under the tree canopy. Trunks and small trees, invisible from a bird’s eye view, can be captured with cameras recording red–green–blue (RGB) images, for example. These data are sometimes recorded as time series, for example, by PhenoCams (Klosterman et al., 2014; Brown et al., 2016). Applying machine learning algorithms to such sensor recordings provides broader context and more information per tree in each sample than inventories do.
Ground-based datasets are reviewed in Table 2. The dataset size is expressed in hectares (ha) of studied surface, in the number of trees in the area, or in the number of samples, distinguishing synthetic from real samples where relevant (Grondin et al., 2022).
Table 2. Review of open-access ground-based forest datasets

Note: Dataset sizes marked with K are in thousands, that is, $ O\left({10}^3\right) $. DBH = diameter at breast height; ha = hectares; IMU = inertial measurement unit; IS = instance segmentation; KD = key-point detection; N/A = not applicable; OD = object detection; PC = point cloud; Reg. = regression; RGB = red–green–blue images; Unknown = not provided by the authors.
Stereo cameras are calibrated to estimate scene depth, differentiating trees and objects from the background in the forest (Grondin et al., 2022). Thermal cameras have also been used to record the thermal signature of trees (Still et al., 2019) and to distinguish them from other objects (Reis et al., 2020; da Silva et al., 2021a, 2021b, 2022). In specific cases, camera images have been annotated with bounding boxes around trees to detect them (Tremblay et al., 2020; Grondin et al., 2022). Only two of the reviewed datasets, both located in Canada, have been annotated with several species classes to combine tree detection and classification (Tremblay et al., 2020; Grondin et al., 2022). Since these datasets also provide inertial measurement unit (IMU) data, a potential task could be to predict the next move of an automated agent in a forest.
Forest geometry is also intensively studied from the ground using LiDAR, which in this setting is typically referred to as terrestrial laser scanning (TLS). This active sensor records three-dimensional scenes from the reflections of emitted laser pulses and can be operated from tripods or combined with IMUs to enable mobile laser scanning. It is unaffected by sunlight conditions and well suited to understanding the structure of forests and trees, for example, measuring gap fraction, stand density, tree height, DBH, volume, or biomass (Hackenberg et al., 2015; Liang et al., 2016; Tremblay et al., 2020). Depending on the information provided by the authors, the spatial resolution of ground-based LiDAR recordings is expressed either as the average number of points per square meter or as the localization precision of each point. The resulting point clouds have been used for instance segmentation (Burt et al., 2018; Tremblay et al., 2020; Grondin et al., 2022), that is, segmenting each tree independently and assigning it a unique identifier, or for key-point detection, that is, localizing points of interest on each tree.
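The average number of points per square meter can be computed as the point count divided by the horizontal footprint of the scan. The sketch below approximates that footprint by the convex hull of the XY coordinates, which is an assumption: it overestimates the area of irregular or concave plots, and because TLS point density decreases with distance from the scanner, a single average hides strong spatial variation. The grid of points is synthetic.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # drop duplicated end points


def polygon_area(vertices):
    """Shoelace formula for a simple polygon given in order."""
    n = len(vertices)
    s = sum(vertices[i][0] * vertices[(i + 1) % n][1]
            - vertices[(i + 1) % n][0] * vertices[i][1] for i in range(n))
    return abs(s) / 2.0


def point_density(points_xyz):
    """Average number of points per square meter over the XY convex hull."""
    xy = [(x, y) for x, y, _ in points_xyz]
    area = polygon_area(convex_hull(xy))
    if area == 0:
        raise ValueError("points are collinear in XY; footprint area is zero")
    return len(points_xyz) / area


# Synthetic 11 x 11 grid of points covering a 10 m x 10 m plot.
grid = [(float(x), float(y), 0.0) for x in range(11) for y in range(11)]
print(point_density(grid))  # 1.21
```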
Ground-based datasets help to understand the composition of forests under the tree canopy, and such recordings were difficult to automate until recently (Calders et al., 2023). Large-scale annotated ground-based datasets are still lacking in the literature, although such data can provide information at high spatial and temporal resolution and from perspectives that aerial and satellite recordings cannot. Datasets providing both ground-based and aerial recordings (Soltani et al., 2022), covering both above and below the tree canopy, would facilitate transfer and bridge machine learning applications across modality scales (for details, see Section 5). The next section reviews aerial datasets.
4.3. Aerial recordings
Aerial datasets consist of recordings from sensors mounted on unoccupied (drones) or occupied aircraft flying above the tree canopy, offering a broader view of the forest free of the obstacles that impede automated recording from the ground. The diversity of aerial datasets has increased in the past few years, since they are used for applications such as vegetation segmentation, disease detection, fire detection, and many others (Guimaraes et al., 2020). This growth is also driven in part by governmental organizations increasingly making the imagery of repeated official aerial campaigns openly available (e.g., for entire countries). Furthermore, the decreasing cost of unoccupied aerial vehicles (UAVs) and the miniaturization of high-quality sensors have served as strong incentives for their adoption within the community.
Aerial-based recordings are reviewed in Table 3. The dataset size is expressed in square kilometers ($\mathrm{km}^2$), or in hectares (ha) if the studied area is small. It is also quantified by the number of samples or the number of trees, if applicable.
Table 3. Review of open-access aerial forest datasets

Note: The dataset size measured in K is $O(10^3)$. CHM = canopy height model; Classif. = classification; DBH = diameter at breast height; DSM = digital surface model; DTM = digital terrain model (spatial or vertical); ha = hectares; MC = multi-classification; N/A = not applicable; OD = object detection; PC = point cloud; Reg. = regression; RGB = red–green–blue images; Seg. = semantic segmentation; Unknown = not provided by the authors.
UAVs can carry multiple sensors, including RGB and thermal cameras, multispectral and hyperspectral sensors, and LiDAR, which together provide a diverse range of data and complementary perspectives on forests. Cameras mounted on UAVs can acquire overlapping images with a spatial resolution of a few millimeters to centimeters. Such high-resolution image datasets can be processed with photogrammetric workflows, which triangulate common features found in overlapping images to automatically and precisely reconstruct the camera parameters and orientations of hundreds of images. These workflows can then reconstruct digital surface models (DSMs) and reproject the imagery into geocoded image mosaics with an orthographic projection (Guimaraes et al., 2020; Diez et al., 2021). Most recently released public aerial datasets contain RGB imagery processed by photogrammetry, since it is relatively simple and cheap to collect (Morales et al., 2018; Kattenborn et al., 2019a, 2020; Kentsch et al., 2020; Schiefer et al., 2020; Nguyen et al., 2021; Galuszynski et al., 2022; Reiersen et al., 2022).
However, the photogrammetric RGB point cloud carrying the height information used to generate the DSM is generally not provided, with some exceptions (Brieger et al., 2019; van Geffen et al., 2022). This is unfortunate, as new multimodal models could leverage both the RGB and point cloud modalities to improve performance.
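The millimeter-to-centimeter resolution of UAV imagery depends mainly on flight altitude and camera geometry. A common first-order estimate is the ground sampling distance (GSD) of a nadir image over flat terrain. The sketch below assumes an ideal pinhole camera; the camera parameters in the example call are hypothetical values chosen for illustration, not taken from any reviewed dataset.

```python
def ground_sampling_distance(sensor_width_mm: float, focal_length_mm: float,
                             image_width_px: int, altitude_m: float) -> float:
    """Ground sampling distance (GSD) in centimeters per pixel.

    Pinhole approximation for a nadir image over flat terrain:
    GSD = sensor_width * altitude / (focal_length * image_width).
    The millimeter units of sensor width and focal length cancel out.
    """
    gsd_m_per_px = (sensor_width_mm * altitude_m) / (focal_length_mm * image_width_px)
    return gsd_m_per_px * 100.0


# Hypothetical camera: 13.2 mm sensor width, 8.8 mm focal length, 5472 px wide.
print(f"{ground_sampling_distance(13.2, 8.8, 5472, 100.0):.2f} cm/px")  # prints 2.74 cm/px
```

Because GSD scales linearly with altitude, flying lower sharpens the imagery proportionally but also shrinks the area covered per flight, which is one reason drone datasets remain spatially limited.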
An alternative method for studying the topography of both the ground and the canopy, depending on the structure of the forest, relies on airborne LiDAR acquisitions (Ferraz et al., 2018; Kalinicheva et al., 2022). Compared with terrestrial LiDAR, these measurements commonly have lower point densities but cover larger areas. Airborne LiDAR sensors are operated together with IMU sensors, which enables georeferenced acquisitions along flight transects spanning large spatial extents. These sensors can typically record multiple returns per laser pulse, so that acquisitions can capture the vertical structure of forest stands, including multiple overlapping tree layers, the understory, and even the ground topography (Kalinicheva et al., 2022). The spatial resolution of airborne LiDAR products is expressed as the average number of points per square meter.
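Because point density is the usual resolution measure for both terrestrial and airborne LiDAR, it may help to see how such a figure can be obtained. The hypothetical helper below assumes a NumPy array of projected x, y, z coordinates in meters, bins the points on a horizontal grid, and averages the counts over occupied cells. Other conventions, such as dividing the total point count by the surveyed area or counting only first returns, give different values, so the convention used by each dataset is worth checking.

```python
import numpy as np


def point_density(points: np.ndarray, cell_size: float = 1.0) -> float:
    """Average number of points per square meter over occupied grid cells.

    Points are binned on a regular horizontal grid (x, y in meters); only
    cells containing at least one point contribute, so empty space around
    the surveyed area does not dilute the estimate.
    """
    cell_index = np.floor(points[:, :2] / cell_size).astype(np.int64)
    _, counts = np.unique(cell_index, axis=0, return_counts=True)
    return float(counts.mean() / cell_size**2)
```

Every return of a multi-return pulse counts as a separate point here, which is one reason densities reported for the same flight can differ between datasets.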
Multispectral and hyperspectral sensors are passive: they capture photons that are reflected (originating from the sun) or emitted by the observed surface (Mavrovic et al., 2023) across wavelength bands that extend beyond the visible spectrum, recording electromagnetic radiation through the near-infrared and up to the shortwave infrared region. They are especially valuable for assessing the composition of forest canopies, as the spectral signature across bands enables the differentiation of species and the retrieval of biochemical and structural properties (Fassnacht et al., 2016; Cherif et al., 2023). A trade-off is usually required between high spectral and low spatial resolution (Paz-Kagan et al., 2017) or low spectral and high spatial resolution (Garioud et al., 2022). The radiation reflected by plant canopies is insufficient to acquire data at both high spectral and high spatial resolution simultaneously, because narrow bands over small pixels collect too few photons for an adequate signal-to-noise ratio.
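One classical way of exploiting such spectral signatures to differentiate classes is the spectral angle mapper (SAM), which assigns each pixel to the reference spectrum forming the smallest angle with it. The sketch below is an illustration of that generic technique, not the method of the cited works. It assumes reflectance spectra stored row-wise in NumPy arrays, and the reference spectra in the tests are hypothetical values.

```python
import numpy as np


def spectral_angle(pixels: np.ndarray, references: np.ndarray) -> np.ndarray:
    """Angle (radians) between each pixel spectrum and each reference spectrum.

    pixels: (n_pixels, n_bands); references: (n_classes, n_bands).
    Returns an (n_pixels, n_classes) array. The angle ignores overall
    brightness, so it is insensitive to a uniform scaling of a spectrum.
    """
    p = pixels / np.linalg.norm(pixels, axis=1, keepdims=True)
    r = references / np.linalg.norm(references, axis=1, keepdims=True)
    cosine = np.clip(p @ r.T, -1.0, 1.0)  # guard against rounding outside [-1, 1]
    return np.arccos(cosine)


def classify_sam(pixels: np.ndarray, references: np.ndarray) -> np.ndarray:
    """Assign each pixel to the index of the reference with the smallest angle."""
    return np.argmin(spectral_angle(pixels, references), axis=1)
```

Insensitivity to brightness is convenient under varying illumination, but it also means that spectra differing mainly in overall reflectance cannot be separated, which is where machine learning models on the full spectrum usually perform better.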
Aerial datasets can support forest monitoring in different ways, depending on the sensors employed and the annotations provided alongside the data. For instance, semantic segmentation is a prevalent approach for classifying forest canopies into tree species (Morales et al., 2018; Kattenborn et al., 2019a, 2020; Kentsch et al., 2020; Schiefer et al., 2020; Galuszynski et al., 2022). Depending on the canopy structural complexity and data quality, the classification might be combined with the delineation of individual tree crowns using instance segmentation approaches. Instance segmentation captures the intricate shapes of tree crowns, unlike object detection, which typically predicts rectangular bounding boxes or centroids for individual objects (Reiersen et al., 2022). Some of the reviewed datasets include a DSM, which can be combined with tree localization to estimate canopy height. Estimating canopy height from aerial data with deep learning algorithms is an active field of research (Yue et al., 2019; Moradi et al., 2022; Reiersen et al., 2022; Wagner et al., 2023).
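A classical, non-deep-learning baseline for canopy height estimation starts from a canopy height model (CHM), the difference between a DSM and a DTM, and flags candidate treetops as local maxima. The sketch below assumes two co-registered rasters in meters stored as NumPy arrays. The 3-pixel window and the 2 m minimum height are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def canopy_height_model(dsm: np.ndarray, dtm: np.ndarray) -> np.ndarray:
    """CHM = DSM - DTM, with small negative values (noise) clipped to zero."""
    return np.clip(dsm - dtm, 0.0, None)


def detect_treetops(chm: np.ndarray, window: int = 3,
                    min_height: float = 2.0) -> np.ndarray:
    """Return (row, col) indices of local maxima of the CHM.

    A pixel is a candidate treetop if it equals the maximum of the
    window x window neighbourhood centred on it and exceeds min_height.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    chm = np.asarray(chm, dtype=float)
    pad = window // 2
    padded = np.pad(chm, pad, mode="constant", constant_values=-np.inf)
    local_max = sliding_window_view(padded, (window, window)).max(axis=(-2, -1))
    is_top = (chm == local_max) & (chm > min_height)
    return np.argwhere(is_top)
```

Flat crown plateaus can yield several adjacent maxima, and the best window size depends on crown size, so in practice the CHM is often smoothed first or the window is adapted to tree height.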
Owing to their high spatial resolution, datasets of aerial recordings enable a granular understanding of forests at the individual tree level. Many open challenges remain at the tree level, such as segmenting individual tree crowns in dense forests, classifying them among a wide range of species, or adapting algorithms from one forest to another (see Section 2). Nevertheless, the spatial coverage of aerial datasets, especially those acquired with drones, is constrained by limited battery life and recording capacity, making it challenging to regularly assess, and thus monitor, large forest areas. The next section therefore explores satellite datasets, which are better suited for covering large forest landscapes at high temporal frequency.
4.4. Satellite recordings
Satellite imagery has been consistently recorded across the globe for many years, enabling extensive research in temporal remote sensing. Over the past few years, this abundance of data has fostered research on machine learning for Earth observation, in particular on deep learning approaches (Camps-Valls et al., 2021). The datasets generated by diverse satellite missions encompass a wide range of resolutions and employ various sensors, enabling studies of diverse phenomena over both space and time (Swain et al., 2023).
The Landsat missions (https://www.usgs.gov/landsat-missions/landsat-satellite-missions), a collaborative endeavor started in the 1970s involving the U.S. Geological Survey, U.S. Department of the Interior, National Aeronautics and Space Administration (NASA), and the U.S. Department of Agriculture, represent the earliest and pioneering effort to use multispectral cameras for Earth observation (Wulder et al., 2022). Landsat 4 and 5 capture images with four to seven spectral bands at spatial resolutions ranging from 30 to 120 meters. The more recent Landsat 7 and Landsat 8 missions record images with eight and nine spectral bands, respectively, at spatial resolutions ranging from 15 to 60 meters for Landsat 7 and from 15 to 30 meters for Landsat 8. Landsat 4, 5, 7, and 8 all have a 16-day repeat cycle.
Most of the reviewed datasets use only the 30-meter-resolution spectral bands, so that all bands used in their final application share the same resolution (Potapov et al., 2019, 2022; Robinson et al., 2019; Irvin et al., 2020; De Almeida Pereira et al., 2021; Feng et al., 2022; Lee and Choi, 2022).
The Sentinel missions (https://sentinel.esa.int/web/sentinel/missions), managed by the European Space Agency, have been designed to comprehensively monitor the Earth's land, oceans, and atmosphere. These missions employ multiple sensors, enabling a wide range of Earth observation capabilities. Sentinel-1 carries a synthetic-aperture radar (SAR), an active sensor emitting microwaves whose wavelengths are not affected by clouds. The reviewed datasets provide or use Level-1 Ground Range Detected (GRD) products at a $10\times 10$ meter resolution (Schmitt et al., 2019; Sumbul et al., 2021; Lee and Choi, 2022). Since the two satellites (Sentinel-1A and Sentinel-1B) were designed to share the same orbit, the mission offers a 6-day exact repeat cycle, with a revisit frequency of less than a day at high latitudes.
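Backscatter in calibrated SAR products is often stored on a linear scale, and many pipelines convert it to decibels before training a model, because forest backscatter spans several orders of magnitude. The sketch below shows this conversion and a cross-polarization ratio. It assumes radiometrically calibrated sigma-nought values for the VV and VH polarizations; the clipping floor is an arbitrary choice, and whether a given dataset already ships values in dB should be checked in its documentation.

```python
import numpy as np


def to_db(sigma0_linear: np.ndarray, floor: float = 1e-6) -> np.ndarray:
    """Convert linear backscatter (sigma nought) to decibels: 10 * log10(x).

    Values below `floor`, including zeros from no-data borders, are clipped
    first so that the logarithm stays finite.
    """
    return 10.0 * np.log10(np.maximum(sigma0_linear, floor))


def cross_pol_ratio_db(vv_linear: np.ndarray, vh_linear: np.ndarray) -> np.ndarray:
    """VH/VV ratio in dB, computed as the difference VH_dB - VV_dB."""
    return to_db(vh_linear) - to_db(vv_linear)
```

In the log domain, a ratio of two polarizations becomes a simple difference, which is why the cross-polarization ratio is usually computed after conversion.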
Sentinel-2 carries multispectral sensors that measure reflected sunlight across 13 spectral bands, whose spatial resolution depends on the band: four bands at 10 meters, six bands at 20 meters, and three bands at 60 meters. Released datasets kept either the 10-m-resolution bands only (Schmitt et al., 2019; Bastani et al., 2023; Lee and Choi, 2022) or all the bands (Sumbul et al., 2021). The combined constellation of Sentinel-2A and Sentinel-2B has a revisit frequency of 5 days over most of the globe.
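When all Sentinel-2 bands are kept, they must be brought onto a common grid before most convolutional models can use them. A minimal sketch, assuming tiles whose 10, 20, and 60 m bands cover exactly the same footprint, is nearest-neighbour upsampling to 10 m. The function names are hypothetical, and bilinear interpolation or learned super-resolution are common alternatives.

```python
import numpy as np


def upsample_nearest(band: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling: each pixel becomes a factor x factor block."""
    return np.repeat(np.repeat(band, factor, axis=0), factor, axis=1)


def stack_to_10m(bands_10m: list[np.ndarray], bands_20m: list[np.ndarray],
                 bands_60m: list[np.ndarray]) -> np.ndarray:
    """Stack Sentinel-2 bands on the 10 m grid, returning (n_bands, H, W).

    Assumes the three groups cover exactly the same footprint, so a 20 m band
    is (H/2, W/2) and a 60 m band is (H/6, W/6) pixels.
    """
    h, w = bands_10m[0].shape
    layers = list(bands_10m)
    layers += [upsample_nearest(b, 2) for b in bands_20m]
    layers += [upsample_nearest(b, 6) for b in bands_60m]
    for layer in layers:
        if layer.shape != (h, w):
            raise ValueError(f"band shape {layer.shape} does not match {(h, w)}")
    return np.stack(layers)
```

Nearest-neighbour upsampling adds no new information; it only repeats coarse pixels, which keeps original reflectance values intact at the cost of blocky 20 m and 60 m layers.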
Data from the Landsat and Sentinel missions are the most commonly provided in the reviewed datasets, but other satellite sources are also explored. For instance, the Moderate Resolution Imaging Spectroradiometer (MODIS) (https://modis.gsfc.nasa.gov), a NASA instrument aboard the Terra and Aqua satellites, generates data that are also used for large-scale forest monitoring. MODIS records 36 spectral bands designed for diverse observations, including atmospheric gases, ocean components, and land boundaries and properties (Schmitt et al., 2019; Levin et al., 2021). Another example is the Visible Infrared Imaging Radiometer Suite (VIIRS) instrument (https://www.earthdata.nasa.gov/learn/find-data/near-real-time/viirs), part of the NOAA-20 mission, which has also generated data for land and atmospheric observations included in a forest dataset (Levin et al., 2021). Researchers have also used recordings from PlanetLabs (https://www.planet.com/), PlanetScope (https://earth.esa.int/eogateway/missions/planetscope), or Maxar (https://www.maxar.com/) missions (e.g., GeoEye, WorldView, or QuickBird), which provide multispectral data at spatial resolutions of a few meters down to submeter (Brandt et al., 2020). However, these data are not publicly accessible due to the associated licensing restrictions.
The reviewed datasets encompass satellite data obtained from various missions and products, originating from different locations and exhibiting diverse spatial and temporal resolutions. Datasets published up to and including 2020 are detailed in Table 4, and those published from 2021 onward in Table 5. The dataset size is expressed in square kilometers ($\mathrm{km}^2$), or in hectares (ha) if the studied area is small. It is also quantified by the number of samples, trees, or events, if applicable.
Table 4. Review of open-access satellite forest datasets before 2020 (included)

Note: The dataset size measured in K is $O(10^3)$ and in M is $O(10^6)$. CHM = canopy height model; Classif. = classification; ha = hectares; LMFC = live fuel moisture content; LULC = land use and/or land cover; N/A = not applicable; Reg. = regression; SAR = synthetic-aperture RADAR; Seg. = semantic segmentation.
Table 5. Review of satellite recording datasets after 2021 (included)

Note: The dataset size measured in K is $O(10^3)$ and in M is $O(10^6)$. CD = change detection; Classif. = classification; ha = hectares; LULC = land use and/or land cover; MC = multi-classification; N/A = not applicable; NDVI = normalized difference vegetation index; Reg. = regression; SAR = synthetic-aperture RADAR; Seg. = semantic segmentation; Unknown = not provided by the authors.
a The list of countries is detailed in the OpenForest catalog.
Satellite datasets are frequently used for the classification, multiclassification, or segmentation of satellite tiles, including LULC and tree species distribution. Other tasks include regression for estimating forest cover (Bastani et al., 2023; Feng et al., 2022), canopy height (Forkuor et al., 2020; Lang et al., 2022a), or live fuel moisture content (Rao et al., 2020). Another application uses satellite time series to detect changes in forest cover at a large scale (Wang et al., 2020a). This approach enables the estimation of deforestation, afforestation, and reforestation activities (Potapov et al., 2022).
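A deliberately simple illustration of change detection on a satellite time series compares each pixel's latest NDVI with a multi-year baseline. This is a hypothetical threshold rule for illustration, not the method of Wang et al. (2020a) or Potapov et al. (2022). The 0.3 drop and three-year baseline are arbitrary assumptions, and operational products rely on far more robust time-series models and cloud masking.

```python
import numpy as np


def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
    """(NIR - red) / (NIR + red); pixels with a zero denominator become NaN."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom != 0, (nir - red) / denom, np.nan)


def detect_loss(ndvi_series: np.ndarray, baseline_years: int = 3,
                drop: float = 0.3) -> np.ndarray:
    """Flag pixels whose latest NDVI falls more than `drop` below their baseline.

    ndvi_series has shape (n_years, H, W); the baseline is the per-pixel
    median of the first `baseline_years` observations, ignoring NaNs.
    Pixels with a NaN baseline or latest value are never flagged.
    """
    baseline = np.nanmedian(ndvi_series[:baseline_years], axis=0)
    return (baseline - ndvi_series[-1]) > drop
```

Using a median baseline rather than a single reference year makes the rule less sensitive to one anomalous (e.g., drought or residual-cloud) observation.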
Satellite recordings play a crucial role in Earth observation on a large scale, as they are manually or automatically processed to estimate global maps of forest cover, among other applications. Additionally, world maps depicting aboveground biomass, land use, and land cover have been estimated and made publicly available. In the following section, datasets containing maps at the country or global level will be reviewed.
4.5. Country or world maps
Earth observation applications have been summarized into maps at the country, continent, or global level. These maps are estimated using machine learning algorithms that incorporate hand-crafted expert features derived from satellite data, such as statistics of the data distribution or vegetation indices. These features range from multispectral information (Friedl et al., 2010; Pflugmacher et al., 2019) to climatic and elevation data (Chaves et al., 2020). Most of the released maps have been produced with such algorithms trained on satellite data, as they scale well for predicting over large areas at low resolution, and their outputs have been validated using field inventories. However, the field inventories themselves are not included in the map datasets reviewed in this section. (Open-access datasets releasing both maps and inventories are reviewed in Table 8 and in Section 4.6.) Nonetheless, the reviewed map datasets are notable for their broad, often global, coverage. Despite the uncertainties inherent in their estimates, these maps can offer valuable meta-knowledge to downstream forest monitoring applications.
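Hand-crafted expert features of this kind are often per-pixel summary statistics of a satellite time series (of a band or of a vegetation index), which are then passed to a conventional classifier or regressor, for example a random forest. The sketch below computes a few such statistics from a NumPy array of shape (time, height, width). The choice of statistics is an assumption for illustration and does not reproduce any specific reviewed map.

```python
import numpy as np


def temporal_features(series: np.ndarray) -> np.ndarray:
    """Per-pixel summary statistics of a (n_times, H, W) time series.

    Returns an (H * W, 5) feature matrix whose columns are the mean,
    standard deviation, and 10th, 50th and 90th percentiles over time.
    NaNs (e.g. cloud-masked observations) are ignored.
    """
    n_times = series.shape[0]
    flat = series.reshape(n_times, -1)  # one column per pixel
    mean = np.nanmean(flat, axis=0)
    std = np.nanstd(flat, axis=0)
    p10, p50, p90 = np.nanpercentile(flat, [10, 50, 90], axis=0)
    return np.column_stack([mean, std, p10, p50, p90])
```

Percentiles are a common choice because they summarize seasonal dynamics while tolerating irregular, partly cloudy acquisitions better than features tied to specific dates.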
Large-scale map datasets released up to and including 2019 are reviewed in Table 6, while map datasets released from 2020 onward are reviewed in Table 7. The dataset size is expressed in square kilometers ($\mathrm{km}^2$), or in hectares (ha) if the studied area is small. It is also quantified by the number of samples or points, if applicable.
Table 6. Review of open-access map forest datasets before 2019 (included)

Note: The dataset size measured in K is $O(10^3)$ and in M is $O(10^6)$. AGB = aboveground biomass; Classif. = classification; IFL = intact forest landscape; LULC = land use and/or land cover; N/A = not applicable; PFT = plant functional type; Reg. = regression; Seg. = semantic segmentation; Unknown = not provided by the authors.
a The list of recording years is detailed in the OpenForest catalog.
b The list of countries is detailed in the OpenForest catalog.
Table 7. Review of open-access map forest datasets after 2020 (included)

Note: The dataset size measured in K is $O(10^3)$ and in M is $O(10^6)$. AGB = aboveground biomass; BGB = belowground biomass; CD = change detection; CH = canopy height; Classif. = classification; GSV = growing stock volume; LULC = land use and/or land cover; MC = multi-classification; N/A = not applicable; Reg. = regression; SCS = soil carbon stock; Seg. = semantic segmentation; Unknown = not provided by the authors.
a The list of recording years is detailed in the OpenForest catalog.
A significant portion of map datasets focuses on LULC (see Section 4 for the proposed definition), which plays a crucial role in distinguishing different types of forests at a large scale (Bartholomé and Belward, 2005; Friedl et al., 2010; Griffiths et al., 2014; Pflugmacher et al., 2019; Thonfeld et al., 2020; Bonannella et al., 2022). Within LULC maps, the reviewed works have also estimated the extent of forest cover, including as time series. These time series are particularly valuable for quantifying forest loss, that is, deforestation detection, and forest gain, that is, afforestation and reforestation monitoring, as well as for mapping deadwood (Hansen et al., 2013; Curtis et al., 2018; Bunting et al., 2022; Verhegghen et al., 2022; Schiefer et al., 2023).
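Quantifying loss and gain from forest cover maps ultimately amounts to comparing co-registered maps from two dates and converting pixel counts to area. The sketch below assumes binary forest masks on a projected grid with square pixels (30 m by default, as for Landsat-based products); the function name is hypothetical. Global products distributed on a latitude-longitude grid would instead need per-pixel area weighting, because pixel area shrinks with latitude.

```python
import numpy as np


def forest_change_area(forest_t0: np.ndarray, forest_t1: np.ndarray,
                       pixel_size_m: float = 30.0) -> dict[str, float]:
    """Compare two co-registered binary forest maps and report areas in hectares.

    Loss = forest at t0 but not at t1; gain = non-forest at t0 but forest at t1.
    Assumes square pixels of equal area (a projected, equal-area grid).
    """
    f0 = forest_t0.astype(bool)
    f1 = forest_t1.astype(bool)
    pixel_area_ha = pixel_size_m**2 / 10_000.0  # 1 ha = 10,000 m^2
    return {
        "loss_ha": float((f0 & ~f1).sum() * pixel_area_ha),
        "gain_ha": float((~f0 & f1).sum() * pixel_area_ha),
        "net_ha": float((int(f1.sum()) - int(f0.sum())) * pixel_area_ha),
    }
```

Reporting gross loss and gain separately matters because equal loss and gain give zero net change while still representing substantial disturbance.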
Another category of maps is specifically designed to distinguish plant functional types, particularly broadleaf from needleleaf forests and summer-green from evergreen forests, across tropical, boreal, and temperate regions (Ottlé et al., 2013).
As mentioned in Section 2.2, accurate estimation of aboveground biomass is crucial for a comprehensive quantification of the carbon stocks held by forests worldwide. World maps of aboveground biomass have been estimated at different resolutions (Patterson et al., 2019; Santoro et al., 2021; Tang et al., 2021; Ma et al., 2021a). Quantifying the uncertainty of aboveground biomass maps is also important, as this uncertainty depends on multiple factors, including tree species, tree height, canopy size, and the distribution of the reference data (Patterson et al., 2019; Ploton et al., 2020; Santoro et al., 2021).
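Uncertainty also matters when per-pixel estimates are summed over a region, for instance to report a regional biomass stock. The sketch below assumes per-pixel aboveground biomass densities and standard deviations in Mg/ha and contrasts two extreme error models: fully independent and fully correlated pixel errors. It is an illustration of error propagation, not the uncertainty procedure of any of the cited maps, and the function name is hypothetical.

```python
import numpy as np


def aggregate_agb(agb: np.ndarray, sigma: np.ndarray,
                  pixel_area_ha: float) -> tuple[float, float, float]:
    """Total AGB (Mg) over a region with two bounds on its standard deviation.

    agb and sigma are per-pixel density maps in Mg/ha. Returns
    (total, sd_if_independent, sd_if_fully_correlated). Map errors are
    typically spatially correlated, so for positive correlations the true
    standard deviation lies between these two values.
    """
    total = float(np.sum(agb) * pixel_area_ha)
    per_pixel_sd = sigma * pixel_area_ha
    sd_independent = float(np.sqrt(np.sum(per_pixel_sd**2)))  # variances add
    sd_correlated = float(np.sum(per_pixel_sd))  # standard deviations add
    return total, sd_independent, sd_correlated
```

With many pixels, the independent-error bound shrinks relative to the total while the correlated bound does not, which is why ignoring spatial correlation can make regional estimates look far more certain than they are.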
Canopy height maps have also been produced for sparse boreal forests (Bartsch et al., 2016, 2020). Canopy height maps at both the country (Tolan et al., 2023) and world levels (Lang et al., 2022a,b) have been estimated using LiDAR data as ground truth. Since accurate canopy height is crucial for evaluating aboveground biomass, several studies have used data from the Global Ecosystem Dynamics Investigation (GEDI) mission (https://gedi.umd.edu/) (Patterson et al., 2019; Tang et al., 2021; Ma et al., 2021a; Lang et al., 2022b; Tolan et al., 2023). GEDI records full-waveform LiDAR data from the International Space Station, providing footprint-level measurements of ground elevation and canopy height that serve as a valuable reference for canopy height estimation.
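GEDI canopy heights are commonly summarized as relative height (RH) metrics derived from the returned waveform. The sketch below computes RH metrics from one synthetic waveform by accumulating energy from the lowest elevation bin upward, with RH_n the height above ground at which n % of the total energy is reached. It assumes the ground elevation is already known and ignores the noise filtering, smoothing, and ground-finding steps of the operational GEDI processing, so it illustrates the definition only; the function name is hypothetical.

```python
import numpy as np


def relative_heights(elevation: np.ndarray, energy: np.ndarray,
                     ground_elevation: float,
                     percentiles=(25, 50, 75, 98)) -> dict[int, float]:
    """Relative height (RH) metrics from a single return waveform.

    elevation: bin elevations (m); energy: returned energy per bin.
    Energy is accumulated from the lowest bin upward; RH_n is the height
    above `ground_elevation` of the first bin where n % of the total
    energy has been reached (bin resolution, no interpolation).
    """
    order = np.argsort(elevation)
    z = elevation[order]
    cumulative = np.cumsum(energy[order])
    if cumulative[-1] <= 0:
        raise ValueError("waveform contains no energy")
    cumulative = cumulative / cumulative[-1]
    result = {}
    for p in percentiles:
        idx = int(np.searchsorted(cumulative, p / 100.0, side="left"))
        idx = min(idx, len(z) - 1)  # guard against rounding just below 1.0
        result[p] = float(z[idx] - ground_elevation)
    return result
```

High percentiles such as RH98 approximate the top of the canopy while discarding the last few percent of energy, which makes them less sensitive to noise than the absolute highest return.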
The overall biomass estimation of forests also includes belowground biomass (Chen et al., 2023) and soil carbon stock (Dionizio et al., 2020), which have been quantified using SAR satellite data penetrating dense canopies.
Although open-access map datasets are subject to the limitations and uncertainties inherent in the estimation methods employed by the authors, they remain a valuable source of data for obtaining a broad-scale understanding of forests or integrating meta-knowledge into future analyses. These datasets could be helpful for conducting further research and expanding our knowledge of forest ecosystems.
In the preceding sections, the reviewed datasets were presented with a focus on different scales. However, in the forthcoming section, datasets that offer a combination of data at various scales will be discussed in detail.
4.6. Datasets mixed at different scales
Datasets that offer data at various scales play an important role in bridging the different modalities recorded by diverse sensors. By integrating information from multiple sources, these datasets facilitate a comprehensive understanding of forests and enable cross-modal analysis. Inventories, ground-based, and aerial-based datasets are available at a small scale but usually come with precise annotations at the individual tree level. Conversely, satellite and map datasets are available at a larger scale but often lack precise annotations due to their lower resolution. Integrating data from different scales can help generalize and extrapolate local knowledge to a larger scale, bridging the gap between detailed annotations and broad coverage (Kattenborn et al., 2019b; Schiefer et al., 2023).
The size of each dataset is expressed in square kilometers ($\mathrm{km}^2$), or in hectares (ha) if the studied area is small. It is also quantified by the number of samples, points, or trees, if applicable.
Mixed datasets composed of inventories and aerial-based recordings (IA); inventories, aerial-based, and satellite-based recordings (IAS); and inventories and maps (IM) are reviewed in Table 8. Inventories add value to imagery recordings by providing geolocated annotations, whose usefulness depends on their positional precision. These inventories enhance the spatial context and accuracy of the annotations. Combining inventories with aerial-based recordings is highly beneficial for accurately aligning tree measurements with aerial data, especially LiDAR (Weiser et al., 2022) or RGB (Brieger et al., 2019; van Geffen et al., 2022) recordings. This integration enables improved estimation of carbon stocks at the aerial scale, among other applications. Field measurements have also been used to validate country or world maps and are often released together with them to enable reproducibility. Releasing field measurements alongside map data makes it possible to assess, and through calibration improve, the accuracy and reliability of the generated maps.
Similarly to datasets presented in Section 4.5, maps of forest age (Besnard et al., 2021), carbon stocks (Tucker et al., 2023), and LULC (Koskinen et al., 2019; Bendini et al., 2020; Shevtsova et al., 2020; European Commission. Statistical Office of the European Union, 2021) have been estimated by machine learning algorithms while being calibrated and validated with inventories.
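Aligning inventories with aerial recordings requires linking each field-measured stem to a tree visible in the imagery. One simple heuristic, sketched below under the assumption that both sets of positions are already in the same projected coordinate system, is greedy one-to-one nearest-neighbor matching within a distance threshold. The 2 m threshold and the function name are hypothetical, and positioning errors under the canopy or leaning stems can make the true match ambiguous.

```python
import numpy as np


def match_trees(field_xy: np.ndarray, detected_xy: np.ndarray,
                max_dist: float = 2.0) -> list[tuple[int, int]]:
    """Greedy one-to-one matching of field-measured stems to detected treetops.

    All candidate pairs closer than `max_dist` (coordinate units, e.g. m)
    are sorted by distance; a pair is accepted only if neither tree has
    already been matched. Returns (field_index, detected_index) pairs.
    """
    diff = field_xy[:, None, :] - detected_xy[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # (n_field, n_detected) distances
    candidates = np.argwhere(dist <= max_dist)
    order = np.argsort(dist[candidates[:, 0], candidates[:, 1]], kind="stable")
    used_field, used_detected, matches = set(), set(), []
    for i, j in candidates[order]:
        i, j = int(i), int(j)
        if i not in used_field and j not in used_detected:
            used_field.add(i)
            used_detected.add(j)
            matches.append((i, j))
    return matches
```

Greedy matching is not globally optimal; an assignment algorithm such as the Hungarian method minimizes the total distance instead, at a higher computational cost.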
Table 8. Review of open-access mixed forest datasets, including inventories and aerial-based (IA); inventories, aerial-based, and satellite-based (IAS); and inventories and maps (IM)

Note: The dataset size measured in K is $O(10^3)$, in M is $O(10^6)$, and in B is $O(10^9)$. AGB = aboveground biomass; CBH = crown base height; Classif. = classification; DBH = diameter at breast height; EVI = enhanced vegetation index; IA = inventories and aerial; IAS = inventories, aerial and satellite; IM = inventories and maps; IS = instance segmentation; LULC = land use and/or land cover; MC = multi-classification; N/A = not applicable; OL = object localization; PC = point cloud; Reg. = regression; RGB = red–green–blue; Seg. = semantic segmentation; Unknown = not provided by the authors.
a The dataset includes three LiDAR point clouds with different resolutions: ALS, 72.5 pts/m²; ULS, 1,029.2 pts/m²; and TLS, unknown.
b The list of countries is detailed in the OpenForest catalog.
c Aerial recordings (3-cm resolution) are aerial RGB, SfM PC, RGB PC, RGN images, DEM, CHM, DSM, and DTM.
d The list of recording years is detailed in the OpenForest catalog.
e Field measurements (50-cm resolution) are location, crown area, wood mass, mass, root dry mass, count density, coverage density, and area density.
Mixed datasets composed of ground-based and aerial-based recordings (GA); aerial-based and satellite-based recordings (AS); aerial-based recordings and maps (AM); and satellite-based recordings and maps (SM) are reviewed in Table 9. Aligning ground-based and aerial-based imagery is valuable for integrating information from both above and below the forest canopy. For instance, models can be trained on ground-level photographs sourced from citizen science and then effectively transferred to aerial data (Soltani et al., 2022). Such alignment could enable a comprehensive understanding of the forest ecosystem by bridging the gap between ground-level and aerial observations.
Table 9. Review of open-access mixed forest datasets, including ground-based and aerial-based (GA); aerial-based and satellite-based (AS); aerial-based and maps (AM); and satellite-based and maps (SM)

Note: The dataset size measured in K is $O(10^3)$ and in M is $O(10^6)$. AGB = aboveground biomass; Align. = alignment; AM = aerial and maps; AS = aerial and satellite; CHM = canopy height model; Classif. = classification; GA = ground and aerial; LULC = land use and/or land cover; MC = multi-classification; N/A = not applicable; NDVI = normalized difference vegetation index; OD = object detection; PC = point cloud; Reg. = regression; RGB = red–green–blue; SAR = synthetic-aperture RADAR; Seg. = semantic segmentation; SM = satellite and maps; Unknown = not provided by the authors.
Mapping aerial-based and satellite-based recordings onto each other is helpful for extrapolating high-resolution information available at a small scale to lower-resolution data covering a larger scale. For example, this process allows detailed information captured by aerial LiDAR to be used to validate canopy height models derived from satellite imagery (Marconi et al., 2019; Weinstein et al., 2021b; Lang et al., 2022a). Integrating these datasets yields a more comprehensive and accurate representation of forest characteristics across spatial scales. Combining SAR and multispectral satellite data with aerial imagery can potentially enhance model performance (Schmitt and Zhu, 2016), particularly by leveraging the distinct reflection and absorption characteristics of different tree species (Ahlswede et al., 2022). Aerial LiDAR metrics have also been used as validation points to estimate aboveground biomass at a large scale (Hudak et al., 2020). At even larger scales, satellite data analyzed with multiclassification algorithms have proven useful for monitoring and detecting forest loss (Turubanova et al., 2018).
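The upscaling step behind such validation can be sketched as block averaging: a fine-resolution LiDAR canopy height model (CHM) is averaged over non-overlapping windows that match the coarser satellite pixel, and the result is compared with the satellite-derived heights using the root-mean-square error. This is a minimal sketch under strong assumptions: both rasters are already co-registered on nested grids with an integer resolution ratio, and there are no missing values. Real pipelines also handle reprojection, no-data pixels, and partial overlaps. The helpers `block_mean` and `rmse` and the toy height values are hypothetical and not taken from any of the cited works.

```python
import math
from statistics import mean


def block_mean(grid, factor):
    """Aggregate a fine 2D grid by averaging non-overlapping factor x factor blocks."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by the aggregation factor")
    return [
        [
            mean(grid[r + i][c + j] for i in range(factor) for j in range(factor))
            for c in range(0, cols, factor)
        ]
        for r in range(0, rows, factor)
    ]


def rmse(reference, estimate):
    """Root-mean-square error between two grids of identical shape."""
    diffs = [(a - b) ** 2 for ra, rb in zip(reference, estimate) for a, b in zip(ra, rb)]
    return math.sqrt(sum(diffs) / len(diffs))


# Hypothetical 4 x 4 LiDAR CHM (m) at 1-m pixels and a 2 x 2 satellite CHM at 2-m pixels.
lidar_chm = [
    [10, 12, 20, 22],
    [14, 16, 24, 26],
    [5, 7, 0, 0],
    [9, 11, 2, 2],
]
satellite_chm = [[12, 25], [8, 3]]

reference = block_mean(lidar_chm, factor=2)  # [[13, 23], [8, 1]]
print(rmse(reference, satellite_chm))  # 1.5
```

Averaging is only one possible aggregation rule; a maximum or a high percentile within each block may better reflect how some satellite height products are defined.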
Open-access datasets featuring modalities at different scales have been released to make results reproducible and to diversify the ways in which forests are observed. Because these datasets align various modalities at different scales, they could help improve the generalization of machine learning algorithms at larger scales. They not only support research on tasks suited to the scale of each modality but also foster a comprehensive understanding of forests through multimodal analysis. To date, the publications related to the reviewed datasets have not extensively explored multimodal (e.g., point clouds with raster data and point observations with spatially continuous data), multiscale, and multitask approaches. We hope that the machine learning and computer vision communities will pursue this direction in forest monitoring, as it holds great potential for advancing our understanding of the composition of forests worldwide and for supporting more effective and efficient forest management strategies. The next section explores perspectives on forest datasets and highlights challenges that researchers could prioritize in their work.
5. Perspectives
Interest in forest monitoring is rising, as monitoring serves as a safeguard for forests and their ecological and societal significance. Proper monitoring is essential for avoiding forest conversion, supporting forest management initiatives, and ensuring successful reforestation and afforestation projects by enhancing survival rates and preventing diseases (van Lierop et al., 2015; Martin et al., 2021). Additionally, the effects of climate change on forest dynamics (Fassnacht et al., 2023) imply a growing need for closer surveillance of these ecosystems.
As a data-driven and empirical science, forest monitoring benefits from open-access, diverse, and large datasets, coupled with advances in machine learning research (De Lima et al., 2022). This work discusses existing challenges and research strategies and extensively reviews open-access forest datasets, with the ultimate goal of encouraging the research community to investigate this field further.
As Section 2.2 shows, forest monitoring remains an active area of research, with numerous ongoing inquiries into aspects such as tree species identification, phenology, abiotic factors, exogenous influences, and many more. Machine learning has already greatly advanced our capabilities to monitor forests through novel analytical tools and capacities, including sensing past, current, and changing forest states through predictive modeling. Such models, if built to be explainable by design, will greatly advance our understanding of forests, including how diverse environmental and anthropogenic drivers affect forest dynamics and how forest ecosystems function. A related and central interest lies in projecting future forest dynamics to guide decision-makers, improve management, and anticipate consequences (Requena-Mesa et al., 2018). In this context, ongoing and accelerating changes induced by global warming and climate change reshape the dynamics of the Earth system and of the data describing it, making these data nonstationary through time. It therefore becomes essential to explore solutions that improve the adaptability and transferability of data-driven machine learning methods, so that they extrapolate reliably from existing and historical data to future circumstances.
Machine learning and computer vision are exerting a progressively increasing influence across various domains, including forest monitoring, as elaborated in Section 3. Strategies related to model generalization, learning schemes, and forestry-based metrics are valuable for delving further into the challenges presented by forest biology.
Enhancing the generalization capabilities of models involves better adaptation to diverse spatial and temporal domains, encompassing different forests, sensors, and resolutions. To achieve this, machine learning strategies that leverage existing datasets through weakly supervised (see Section 3.2.2) or few-shot (see Sections 3.2.4 and 3.2.5) learning approaches could be explored. Moreover, hybrid models, which integrate physical knowledge (see Section 3.3), and space-for-time substitutions, which make it possible to learn temporal dynamics from spatial variation, may greatly advance our capabilities to design robust data-driven machine learning applications for monitoring and forecasting in a nonstationary world. In line with this perspective, the dynamic OpenForest catalog could serve as a suitable reference for continually enhancing and refining models with the latest data.
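Space-for-time substitution can be illustrated with a deliberately simple sketch. A relationship fitted across sites that span an environmental gradient is reused to predict how a single site might change over time as its conditions shift along that gradient. The linear model, the choice of temperature as the driver, the generic forest attribute, and the toy values below are all hypothetical. The method itself rests on the assumption that spatial variation mirrors the temporal response, which may not hold in a nonstationary world, so such predictions should be treated with caution.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("the driver must vary across sites")
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def space_for_time_change(site_driver, site_attribute, driver_change):
    """Predict the temporal change of an attribute at one site by reusing
    the slope estimated across sites (i.e., across space)."""
    _, slope = fit_line(site_driver, site_attribute)
    return slope * driver_change


# Hypothetical sites along a temperature gradient (degrees C) and a forest attribute.
temps = [2.0, 4.0, 6.0, 8.0]
attribute = [10.0, 12.0, 14.0, 16.0]
print(space_for_time_change(temps, attribute, driver_change=1.5))  # 1.5
```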
Implementing active learning methods (see Section 3.2.3) can significantly streamline the generation of annotations for future datasets. As datasets continue to grow in size, self-supervised learning methods (see Section 3.2.1) offer a valuable way for deep learning algorithms to learn meaningful representations for forest monitoring without relying heavily on manual annotations.
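One common active learning strategy, uncertainty sampling, can be sketched in a few lines: the current model scores every unlabeled sample, and the samples whose predicted class distributions have the highest entropy are sent to annotators first. This is a generic illustration, not necessarily the strategy discussed in Section 3.2.3; the tile identifiers, probability values, and annotation budget are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, budget):
    """Return the ids of the `budget` samples whose predictions are most uncertain.

    predictions: mapping from a sample id to the model's class probabilities.
    """
    ranked = sorted(predictions, key=lambda sid: entropy(predictions[sid]), reverse=True)
    return ranked[:budget]


# Hypothetical species-probability outputs for three unlabeled image tiles.
predictions = {
    "tile_a": [0.98, 0.01, 0.01],
    "tile_b": [0.40, 0.35, 0.25],
    "tile_c": [0.60, 0.30, 0.10],
}
print(select_for_annotation(predictions, budget=2))  # ['tile_b', 'tile_c']
```

In practice, the selected tiles would be labeled, added to the training set, and the model retrained before the next selection round.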
Incorporating multimodal and multitask computer vision architectures into forest monitoring presents an intriguing opportunity to capitalize on task complementarity. For instance, by predicting multiple foliage traits from hyperspectral data, a model can learn the covariance among traits and hence provide more robust estimates for challenging traits based on their relation to more accurately predicted ones (Schiller et al., 2021; Cherif et al., 2023). Another potential research direction is improving carbon stock estimation by predicting tree species and height simultaneously.
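The joint prediction of tree species and height mentioned above could, for example, be trained with a weighted sum of a classification loss and a regression loss. The per-sample sketch below only makes that arithmetic explicit; in a real network, both terms would be backpropagated through a shared backbone, which is what allows one task to inform the other. The task weights are hypothetical hyperparameters, and the subsequent step from species and height to carbon stock (e.g., through allometric equations) is not shown.

```python
import math


def multitask_loss(species_probs, true_species, height_pred, true_height,
                   w_species=1.0, w_height=1.0):
    """Weighted sum of a species classification loss and a height regression loss.

    species_probs: predicted probabilities over species classes (summing to 1).
    true_species: index of the observed species.
    height_pred, true_height: predicted and measured tree height (m).
    """
    # Cross-entropy (negative log-likelihood) of the observed species.
    l_species = -math.log(species_probs[true_species])
    # Squared error for the height regression head.
    l_height = (height_pred - true_height) ** 2
    return w_species * l_species + w_height * l_height


loss = multitask_loss([0.5, 0.3, 0.2], true_species=0,
                      height_pred=20.0, true_height=22.0, w_height=0.5)
print(round(loss, 4))  # 2.6931
```

Fixed weights are the simplest choice; since the two losses have different units and magnitudes, the weights usually need tuning or a learned weighting scheme.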
Foundation models (see Section 3.1.2) have demonstrated remarkable capabilities in handling various modalities, such as LiDAR, RADAR, and hyperspectral data, with varying spatial and temporal resolutions. These models are task-agnostic and can achieve high zero-shot performance, which makes pretraining them for forest monitoring a challenging yet promising endeavor. Once pretrained, they can be adapted to multiple other tasks in this domain. Because such pretraining relies on the complementarity of large-scale multimodal and multitask datasets, research on foundation models for worldwide forest monitoring would benefit from the OpenForest catalog as it is dynamically enriched by the community.
Section 4 provides a comprehensive review of open-source forest datasets, categorized according to specific criteria and identified scales. These datasets are grouped in OpenForest, a dynamic catalog open for updates from the community. The aim is to foster communication, inspire new applications of machine learning in forest monitoring, and motivate advancements in this field.
Datasets, a prerequisite for machine learning applications, commonly lack geographical representativeness, particularly in African and Asian regions, as depicted in Figure 3. Whenever feasible, ideal datasets would align multimodal recordings according to their temporal and spatial resolutions while offering annotations at the highest available resolution. As demonstrated in remote sensing (Lacoste et al., 2023), aggregating numerous diverse forest datasets could foster research on specialized foundation models for effective forest monitoring.
In addition to listing open-access datasets, the OpenForest catalog will curate information about data providers (see https://github.com/RolnickLab/OpenForest), a crucial resource for generating well-structured datasets that cater to specific needs. Citizen-generated data, such as those curated on OpenAerialMap (https://openaerialmap.org/) or GBIF (https://www.gbif.org/), hold significant value since they integrate information from across the globe, making them well suited to self-supervised learning. The data provider list will also be updated frequently to integrate the most recent initiatives. For instance, incorporating data from the Biomass mission (https://www.esa.int/Applications/Observing_the_Earth/FutureEO/Biomass) into future datasets as soon as possible is essential. The mission’s P-band SAR data will greatly benefit worldwide forest tomography (Berenger et al., 2023), advancing our comprehension of forest carbon stocks and their dynamics. Exploring its potential can lead to the creation of valuable and structured datasets.
Aerial recordings, especially from UAVs, are gaining momentum and offer promising prospects, as vehicles become more affordable and easier to pilot and carry higher-resolution sensors. UAV technology enables high-resolution forest analysis, even in remote or inaccessible areas, and can further advance large-scale assessments through integration with Earth observation satellite missions (Schiefer et al., 2023).
As more datasets are released at various scales, the OpenForest catalog offers an opportunity to centralize detailed information about them. It will help motivate research on bridging the gap between scales, sensors, and resolutions while, hopefully, encouraging collaboration among researchers.
Open peer review
To view the open peer review materials for this article, please visit http://doi.org/10.1017/eds.2024.53.
Acknowledgements
The authors are grateful for the valuable feedback of O. Sonnentag.
Author contribution
Conceptualization: A.O., D.R.; Data curation: A.O.; Data visualization: A.O.; Methodology: all authors; Writing—original draft: A.O., T.K., E.L.; Writing—review and editing: all authors. All authors approved the final submitted draft.
Competing interest
The authors declare no competing interests.
Data availability statement
The OpenForest catalog is available and open to contributions in the following repository: https://github.com/RolnickLab/OpenForest. It is also archived in Zenodo at https://doi.org/10.5281/zenodo.14025443.
Funding statement
This work was funded through the IVADO program on “AI, Biodiversity and Climate Change” and the Canada CIFAR AI Chairs program. It was also funded through the German Research Foundation (DFG) under the projects PANOPS (Project No. 504978936) and BigPlantSens (Project No. 444524904).
Ethical standard
The research meets all ethical guidelines, including adherence to the legal requirements of the study country.
Comments
Dear Editors,
We are writing to submit our manuscript “OpenForest: A data catalogue for machine learning in forest monitoring” to the Environmental Data Science journal.
In the context of a climate emergency, there is an urgent need to monitor forests worldwide.
This is essential for maintaining ecological equilibrium, as it helps mitigate human impacts and enhances our comprehension of forest composition.
This work aims to foster interest among both the machine learning and the forest biology communities regarding ongoing research topics and challenges for forest monitoring.
Forest biology research topics and their current challenges are discussed to target potential areas for future research within the community.
Machine learning methods with the potential to explore and tackle forest biology challenges are also introduced.
As both biology challenges and machine learning methods require large sources of available data, a clear review of open-source datasets is also proposed.
To highlight and strengthen research in these fields, the OpenForest dynamic catalog is publicly released to centralize all available open-source datasets for forest monitoring while remaining open to updates from the community.
The overall objective of this work is to foster communication, inspire new applications of machine learning in forest monitoring, and motivate advancements in this field.