1. Introduction
Liquid water from snow melt or rain modifies the properties of the snowpack. Both surficial wetting and infiltration of liquid water may reduce snowpack stability and lead to the release of wet-snow avalanches on inclined, snow-covered slopes (e.g. Kattelmann, Reference Kattelmann1985; Conway and Raymond, Reference Conway and Raymond1993). Despite the high damage potential of wet-snow avalanches, their formation mechanisms are less well understood compared to dry-snow avalanches (e.g. Baggi and Schweizer, Reference Baggi and Schweizer2009). This is partly due to a lack of in-situ observations such as reliable measurements of liquid water content (LWC) in the snowpack and to the largely unknown influence of water content on mechanical properties such as strength. Moreover, wet-snow instability is a highly transient and spatially-variable phenomenon related to the water transport in snow (Schneebeli, Reference Schneebeli2004). Once the snowpack becomes partly wet, the probability of avalanche release may rapidly increase (e.g. Conway and Raymond, Reference Conway and Raymond1993), but determining the peak and end of a period of high wet-snow avalanche activity is particularly difficult (Techel and Pielmeier, Reference Techel and Pielmeier2009).
The lack of process-based models motivated different data-based approaches based on weather and snowpack properties (Romig and others, Reference Romig, Cluster, Birkeland and Locke2004; Baggi and Schweizer, Reference Baggi and Schweizer2009; Helbig and others, Reference Helbig, van Herwijnen and Jonas2015) and simulated LWC (Mitterer and others, Reference Mitterer, Techel, Fierz and Schweizer2013; Wever and others, Reference Wever, Vera Valero and Fierz2016; Bellaire and others, Reference Bellaire, van Herwijnen, Mitterer and Schweizer2017). While air temperature was clearly related to wet-snow instability (Peitzsch and others, Reference Peitzsch, Pederson, Birkeland, Hendrikx and Fagre2021), it is a poor predictor on its own (Romig and others, Reference Romig, Cluster, Birkeland and Locke2004; Baggi and Schweizer, Reference Baggi and Schweizer2009) and causes many false alarms. As infiltrating meltwater is a prime driver of wet-snow instability, the energy balance at the snow surface needs to be considered, ideally complemented with the LWC in the snowpack. Hence, Mitterer and Schweizer (Reference Mitterer and Schweizer2013) considered melt water production at the snow surface using the 1-D numerical snow cover model SNOWPACK (Bartelt and Lehning, Reference Bartelt and Lehning2002). A simple proxy of water infiltration is the mean LWC of the snowpack, which motivated the introduction of the liquid water content index (LWCindex) (Mitterer and others, Reference Mitterer, Techel, Fierz and Schweizer2013; Wever and others, Reference Wever, Vera Valero and Fierz2016; Bellaire and others, Reference Bellaire, van Herwijnen, Mitterer and Schweizer2017). While indices based on LWC thresholds are generally effective in detecting the onset of wet-snow avalanche cycles, as these coincide with rapid increases in LWC, such indices are not well-suited to predict the end of avalanche periods. The transient nature of wet-snow avalanching was highlighted in several studies (e.g. Durand and others, Reference Durand, Giraud, Brun, Mérindol and Martin1999; Techel and others, Reference Techel, Pielmeier and Schneebeli2011; Wever and others, Reference Wever, Vera Valero and Fierz2016). The notion of ’first wetting’ is therefore often used in the context of the onset of wet-snow avalanching and is a term best associated with the first wetting of previously dry, weak layers in the snowpack.
The use of downscaled NWP output variables for probabilistic wet-snow avalanche forecasting has been considered by, for example, Helbig and others (Reference Helbig, van Herwijnen and Jonas2015) and Bellaire and others (Reference Bellaire, van Herwijnen, Mitterer and Schweizer2017). In particular, Bellaire and others (Reference Bellaire, van Herwijnen, Mitterer and Schweizer2017) pointed out discrepancies between measured and forecast weather variables and consequently snow related properties, and developed mitigation strategies to deal with these differences. They adapted the LWC-based thresholds above which a day is considered as a wet-snow avalanche day depending on the nowcast or forecast nature of the input data.
More generally, and not restricted to the topic of wet-snow avalanches, statistical avalanche forecasting has been widely studied. A comprehensive overview was recently presented in Sielenou and others (Reference Sielenou2021). In the last few years, machine learning methods have been increasingly used in data-based approaches. In particular, various random forest (RF) models (ensemble of decision trees) have been developed to support snow avalanche forecasting. Among others, Pérez-Guillén and others (Reference Pérez-Guillén2022) successfully set up a RF model based on weather and snowpack variables to predict the avalanche danger level (EAWS, 2022a), Mayer and others (Reference Mayer, van Herwijnen, Techel and Schweizer2022) implemented a RF model to assess snowpack stability, and Viallon-Galinier and others (Reference Viallon-Galinier, Hagenmuller and Eckert2022) developed a RF model based on avalanche observations to predict avalanche activity. Machine learning methods have also been implemented for modeling avalanche debris susceptibility (e.g. Choubin and others, Reference Choubin, Borji, Hosseini, Mosavi and Dineva2020).
In line with the concept of considering different avalanche problem types (Statham and others, Reference Statham2018), we developed a data-driven model specifically for predicting wet-snow avalanche activity. We trained a RF model based on a dataset of more than 20 years of weather and snow measurements carried out by automated weather stations (AWS) covering the Swiss Alps. These measurements are used to simulate the snowpack for north-, east-, south- and west-facing slopes, providing crucial and temporally highly-resolved, information about changes in LWC and its impact on snowpack layering. Based on manual observations of wet-snow avalanche activity in the Swiss Alps, we defined a binary target variable, avalanche day and non-avalanche day and evaluated model performance in both nowcast and 24-hour forecast modes.
2. Data
In the following, we describe the avalanche observation data (Section 2.1), the weather and snowpack input variables (Section 2.2) and how avalanche observations (target) and input variables were linked (Section 2.3).
2.1 Avalanche data
Throughout the Swiss Alps, about 80 trained observers report avalanches in their area to the national avalanche warning service at the WSL Institute for Snow and Avalanche Research SLF, which is responsible for issuing the avalanche warnings for all regions in the Swiss Alps and Jura mountains. Observations are made on a daily basis during the forecasting season (typically from December to April). Avalanche observations include, among other parameters, an approximate location and the date of the avalanche release - and sometimes the time, the elevation and slope aspect of the release area, the LWC of the snow in the release area of the avalanche (dry or wet) (SLF, 2020), and the size of the avalanche, which is estimated according to the European avalanche size classification ranging from 1 (small) to 5 (extremely large) (Table 1; EAWS, 2022b). In most cases, the exact time of release is unknown and estimated, as the actual avalanche release was not observed. Avalanches can be reported as an individual avalanche event, or by reporting several avalanches in an avalanche summary report. If no recent avalanches were observed, this is also reported.
The avalanche data base includes avalanche records since 2001. During this time, the reporting system changed once. Initially, avalanche records used to be linked to the place of observation rather than the actual location of the avalanche release (IFKIS platform until summer 2019; Bründl and others, Reference Bründl2004). Today the reporting system is map-based (SLFPro, since Oct 2019). A further important difference between previous and present recording standards concerns the way avalanche summaries are reported. Before October 2019, the summary form permitted to report both dry- and wet-snow avalanches in the same report. As a consequence, in case of a summary report, it is not possible to unambiguously extract the number of avalanches of a certain size and liquid water content, nor the aspects and elevation ranges where these avalanches released from the data base (see also Mitterer and Schweizer, Reference Mitterer and Schweizer2013).
For this study, we considered avalanches that released between 1 December and 30 April.
2.2 Weather and snowpack data
We built the training and test dataset based on data from AWS used by the operational avalanche warning service in Switzerland (Section 2.2.1). We also investigated the influence of using NWP output as input data for wet-snow avalanche predictions. NWP data are briefly described in Section 2.2.2. Further details about AWS and NWP setups are provided in Pérez-Guillén and others (Reference Pérez-Guillén2022).
2.2.1 Measurement-based SNOWPACK simulations (‘nowcast’)
Snowpack simulations for the purpose of avalanche forecasting for the Swiss Alps use a station-based setup (Lehning and others, Reference Lehning1999; Morin and others, Reference Morin2020). Currently, the network of automatic weather stations (AWS) of the Intercantonal Measurement and Information System (IMIS) network (Fig. 1) consists of 186 AWS, located at the elevations of potential avalanche starting zones. We only used, for our work, the 124 stations that measure snow properties in addition to weather properties. The measurements are recorded every 30 minutes (Fig. 1; see Appendix A for a list of parameters measured). These measurements are used to drive the simulation of snow stratigraphy with the 1-D snow cover model SNOWPACK (Bartelt and Lehning, Reference Bartelt and Lehning2002). We used the operational SNOWPACK setting (Snowpack DEFAULT version 20210925.e77eeeb; Wever and others, Reference Wever, Fierz, Mitterer, Hirashima and Lehning2014; Morin and others, Reference Morin2020) where liquid water percolation is modeled with a bucket-type approach (Lehning and others, Reference Lehning1999; Vionnet and others, Reference Vionnet2012; Lafaysse and others, Reference Lafaysse2017). Alternatively, water transport can be modeled with the Richards’ equation. However, this transport scheme is more demanding and not used in the operational setting for avalanche forecasting in Switzerland.
Stratigraphic profiles are simulated every 3 hours at the location of each station, including simulations of the snowpack for four virtual slopes (aspects asp: N, E, S, W) with an incline of 38$^\circ$ (Morin and others, Reference Morin2020), resulting in 8 profiles per day and aspect.
If not stated otherwise, we always refer to this nowcast setup.
2.2.2 NWP model-based SNOWPACK simulations (‘24-hour forecast’)
For the purpose of operational avalanche forecasting, SNOWPACK simulations are also driven with output from the NWP model COSMO (Consortium for Small-scale Modeling; Cosmo 2021) allowing the simulation of snowpack stratigraphy for the following 30 hours. To this end, the COSMO-1 model output with 1 km resolution is downscaled to the local topography, referred to as COSMO-OSHD in the following (Griessinger and others, Reference Griessinger2019). As for the nowcast setup, the snow stratigraphy is simulated for four virtual slopes every 3 hours. This downscaling setup provides more realistic simulations of the snowpack properties than driving SNOWPACK directly with the COSMO model output of the grid point nearest to the AWS (Bellaire and others, Reference Bellaire, van Herwijnen, Mitterer and Schweizer2017).
The NWP predictions and the simulated profiles are not stored but overwritten every 3 hours. Therefore, a historic forecast dataset is not available for model development.
2.2.3 Variable definition
From the snowpack simulations, we extracted weather parameters and properties describing the state of the snowpack. As the temporal resolution of the avalanche observations was often restricted to the release day rather than an exact time, we derived 24 h averaged weather and snowpack features (00:00–24:00 local time, LT), as well as weather and snowpack properties corresponding to the time of the wettest profile of the day (Fig. 2). We defined the wettest profile as the profile with the highest LWC index (LWCindex; described below); in most cases, the profile at 15:00 LT was the wettest. If the LWCindex was zero for all time-steps, we used the data from the simulated profile at 15:00 LT. We chose this setup to account for daily weather and snowpack properties and to have a snapshot of the situation, when the snowpack was wettest (similar to Wever and others, Reference Wever, Vera Valero and Techel2018). In addition, we computed the differences to the values describing snowpack properties 1, 2 and 3 days before. The complete list of input features used for model development is reported in Appendix A. LWC-related features used were mostly based on Wever and others (Reference Wever, Vera Valero and Techel2018). We extracted in total 146 of features (Tables 5 and 6). Those included 36 features related to LWC, 9 of them were extracted from the wettest profile of the day and the remaining 27 describe changes in LWC over the last 1, 2 and 3 days. The other 110 features were meteorological and snowpack variables describing either daily averages (flagged with the subscript daily) or actual values at the time of the wettest profile (see Fig. 2).
From the simulated profiles, the LWC of each layer can be derived. It allows to compute the so-called LWC index, LWCindex, a feature that has been suggested to predict wet-snow avalanches (Mitterer and others, Reference Mitterer, Techel, Fierz and Schweizer2013). The index is the average of the LWC, in percent, of the entire snowpack weighted by layer height and divided by 3 %, i.e.
where the sum carried out over all snowpack layers, with ${\rm HS} = \sum _{{\rm layer}}h( {\rm layer})$ the snowpack height, h the layer thickness and LWC the LWC in percent volume. Above a LWC of 3 %, it is assumed that excess water drains to the layers below since it is no longer held between grains by capillary forces, i.e. the threshold of 3 % corresponds to the transition from the pendular to the funicular regime. Therefore, ${\rm LWC}_{\text{index}} \ge 1$ may indicate a particular loss of strength (Mitterer and others, Reference Mitterer, Techel, Fierz and Schweizer2013), or a snowpack wet throughout.
2.3 Linking avalanche observations to SNOWPACK simulations
The occurrence of wet-snow avalanches depends on the snowpack conditions, which may differ strongly between slope aspect and elevation (Mitterer and Schweizer, Reference Mitterer and Schweizer2013; Wever and others, Reference Wever, Vera Valero and Techel2018). Therefore, we linked an observed avalanche in the vicinity of a station st and SNOWPACK simulation for aspect asp if:
1. The avalanche was classified as a wet-snow avalanche, a size was estimated and the aspect and elevation were reported.
2. The avalanche released within an elevation band of ±250 m of the elevation of the station, H st. In case of avalanche summary reports indicating an elevation range (av elev.min-max), avalanches were assigned to this station relative to the overlapping proportion between this elevation band (H st ± 250 m) and the elevation band indicated in the avalanche summary report. For example, if H st was at 2000 m (± 250 m: 1750 to 2250 m), and av elev.min-max ranged from 2000 to 2500 m, the overlapping elevation band is 2000 to 2250 m, which is 50 % of the indicated av elev.min-max. In this case, each of the avalanches in the summary report was weighted with 0.5.
3. The avalanche was within a distance of at most 8.9 km, 17.8 km and 39.9 km corresponding to circle-shaped areas around the location of the station of 250 km2, 1000 km2 and 5000 km2.
4. The avalanche released in the same slope aspect as the simulation was performed for. For example, a wet-snow avalanche observed on a north-facing slope was only linked to the simulated stratigraphy on the virtual slope of northern aspect. If an avalanche was observed on a northeast-facing slope, this avalanche was assigned with half its weight w to both northern and eastern aspect slope simulations.
Figure 3 illustrates the matching requirements 2 to 4.
The avalanche summary reports prior to October 2019 often did not allow to extract the number and size of avalanches by aspect, as dry-snow and wet-snow avalanches could be reported in the same form. Therefore, we could rarely use these data in our analysis.
To quantify avalanche activity in a region, the avalanche activity index (AAI) was used (Schweizer and others, Reference Schweizer, Jamieson and Schneebeli2003). For each station st, we determined the AAI for several spatial scales a (area) for the four slope aspects asp as:
with avalanche size s ∈ {1, 2, 3, 4, 5}, weight w(s) as defined in Table 1, the number of avalanches of size s, $N_s^{asp}( a)$ reported in the area a ∈ {250, 1000, 5000 km2} and slope aspect asp ∈ {north, east, south, west}. The computation of AAI was constrained by restricting it to a specific elevation band around a station: ±250 m.
3. Random-forest model
3.1 Definition of the target variable: avalanche day and non-avalanche day
We defined a binary target variable to discriminate days with wet-snow avalanche activity (avalanche days, AvD) from days without any activity (non-avalanche days, nAvD). Even though avalanche observations are reported from throughout the Swiss Alps, there are regions where rarely any observations are reported. And even in the areas generally covered by observers, observations are not always possible. Therefore, the absence of an avalanche in the data base does not necessarily indicate that no avalanche occurred. Beside the challenge of detecting days when no wet-snow avalanches occurred, we are aware of two main sources of error or uncertainty relating to avalanche observations:
1. The classification of the avalanche as either a wet-snow or a dry-snow avalanche. According to the observational guidelines (SLF, 2020), the presence of liquid water should be estimated based on the conditions in the release area. However, since the avalanche release area is often not accessible, the LWC of the snow in the release area is often an estimate from the valley floor.
2. The release date. The uncertainty in the reported release date depends on the frequency of observations in an area, on weather conditions (e.g. visibility), but also on the overall quality of the observations provided by a specific observer.
To address these points, which are highly relevant for this study, we opted for an approach allowing us to check the plausibility of the release date and avalanche type classification by considering observations independently made on the same day by observers in neighboring areas. More specifically, we define an avalanche day or non-avalanche day, our target variable $Y_{st}^{asp}$, associated with data from a station st for a given slope aspect asp as:
with NaN ‘not a number’. The gap check requirement was $AAI_{st}^{asp}( 5000 \, {\rm km}^2) > AAI_{st}^{asp}( 1000\, {\rm km}^2) \;\hbox{\amp}\; AAI_{st}^{asp} ( 1000 \, {\rm km}^2) \gt$ $AAI_{st}^{asp}( 250\, {\rm km}^2)$, ensuring that avalanche activity increased with increasing area and was not a local phenomenon.
The choice for selecting an AAI threshold of 0.1 was guided by the fact that an $AAI_{st}^{asp} ( 250\, {\rm km}^2) \geq 0.1$ either means that there was at least one potentially life-threatening size 2 event (Eqn. (2); Table 1) or 10 avalanches of size 1. Moreover, an area a of 250 km2 corresponds approximately to the mean size of the warning regions, the smallest spatial units used for communication of avalanche danger in the Swiss avalanche bulletin (e.g. Techel and others, Reference Techel2018), and 5000 km2 is approximately the size of a snow-climate region (e.g. Techel, Reference Techel2020, p. 50). Within a warning region there is often at most one observer providing regular avalanche observations; within a snow-climatological region there are typically between 5 and 20 observers. With the definition given above, we ensure that local avalanche activity with $AAI_{st}^{asp} ( 250\, {\rm km}^2) \geq 0.1$ is representative of avalanche activity in a larger area. We chose this approach to ensure that the target variable was not a local property, but included several independent observations, thus reducing noise due to erroneous dating or wetness classification. We considered this step as important, as $AAI_{st}^{asp}$ only considers observations within a specific slope aspect and elevation band. We used a similarly strict criterion to define a non-avalanche day (nAvD: $Y_{st}^{asp} = 0$) as the uncertainty related to observations of no avalanches is much higher. Thus, a day is considered only as nAvD if none of the observers in the Swiss Alps reported a wet-snow avalanche. Furthermore, we considered only nAvD for station-aspect combinations that had at least one AvD in the same season.
With this definition for the target variable, we wanted to separate days with widespread avalanche activity (AvD) of a certain magnitude from days with absolutely no avalanches (nAvD), and excluded days with rather local avalanche activity (NaN values). With regard to the development of the model, our strict definition led us to train and test the classifier on rather extreme cases, which are however comparatively reliable in terms of the quality of the class label.
3.2 Labeling and splitting the dataset
Input data were labeled as AvD and nAvD according to Eqn (3). We split the data into three uncorrelated datasets according to the winter the avalanches occurred (Fig. 4): Dataset1 contained AvD (Y = 1) and nAvD (Y = 0) labeled data for the winter 2019–2020 and historical AvD events between 2001 and 2014. Dataset2 contained data for winter 2020–2021. Operational tests were carried out during the winter 2021–2022 (dataset3). The dataset characteristics are summarized in Table 2. The stringent manner, we defined the target variable (Eqn. (3)), considerably reduced the number of AvD events. For instance, dataset2 and dataset3 originally contained 2595 and 1696 records with AAI ≥ 0.1, respectively. However, only 223 (dataset2) and 97 (dataset3) were labeled as AvD according to our target variable definition (Table 2).
In parentheses, the number of individual calendar days are given; for example, during winter 2021–2022 there were 97 AvD events on 12 different calendar days.
In the following, we name ’test set’ the complementary of the training set, for example, if we used dataset1 as the training set, the test was dataset2.
3.3 Random forest model development
We trained a RF model, as it is one of the most common and robust machine learning models to classify tabular data (Breiman, Reference Breiman2001). We used the scikit-learn library in Python (Pedregosa and others, Reference Pedregosa2011).
The datasets are heavily imbalanced and contain only relatively few AvD events (Table 2). As the datasets are also rather small, we did not use a standard approach of splitting the data in a training, a validation and a test set. Instead, we only used a training and a test set.
We relied on the out-of-bag (OOB) score to assess model performance (Probst and others, Reference Probst, Wright and Boulesteix2019). Each tree of a RF model was trained separately using a training subset randomly sampled with replacement. These subsets are named bootstrap samples. The left out samples, i.e. samples not contained in the boostrap samples, are named the OOB samples and were used to compute OOB scores, such as the f1-score (see Appendix C for metrics definitions). The OOB samples size was around 1/3 of the size of the training set (Breiman, Reference Breiman2001).
The RF hyperparameters were tuned to maximize the out-of-bag f1-score of class AvD of the training set using a grid search (Table 7).
In order to reduce the complexity of the model and remove non-relevant input variables, we applied recursive feature elimination (RFE; Guyon and others, Reference Guyon, Weston, Barnhill and Vapnik2002). First, we fitted a model keeping all the original 146 features and computed impurity-based features importance ranking (for implementation details see Breiman, Reference Breiman2017). Then we discarded the less important features, re-fitted the model and repeated the process until only 4 features remained. At each step, we computed the OOB f1-score of AvD, which allowed us to select the best model as a function of the number of features. The best performance was obtained using 40 features (Fig. 11). We used this set of features for the remaining of the work.
The final prediction of the RF model is obtained by taking the majority vote of the predictions made by each individual decision tree, and the probability of a particular class, here AvD or nAvD, is determined by the percentage of trees that have voted for that class. In the following, we used also the AvD probability to assess model performance.
4. Results
4.1 Model performance for historic data
We trained three distinct RF models: on dataset1 (winters before Sept. 2020), on dataset2 (winter 2020–2021), and on dataset12 (winters before Sept. 2021). Models are named respectively, RF1, RF2 and RF12. The OOB f1-score of RF12 was highest, while performance metrics of RF2, only trained with winter 2020–2021, were lowest (Table 3). The f1-score of the test sets for RF1 and RF2 were close to the OOB f1-scores. This finding suggests that the trained models did not over-fit. Model performance metrics (precision and recall) for days with no wet-snow avalanche activity (nAvD) were all very close to 1 and are not reported in Table 3.
OOB scores were computed on the training set. Tests were carried out on data not used for training, either dataset1 or dataset2 (Table 2), or the data from the dataset3 (test winter 2021–2022).
We further evaluated the performance of the models on historical winters. Since slope aspect was not consistently reported prior to the winter 2019-2020 in the IFKIS data base, many winters were not included in the dataset. There were winters without any avalanche event satisfying the definition of the target variable (Eqn. (3)), or when the aspect information was recorded in an inconsistent format (Fig. 4). Nevertheless, for these winters, we could still determine a daily wet-snow avalanche activity for Switzerland, regardless of slope aspect, and compare it with daily average models outputs (all locations, all aspects; Table 4). For example, we showed that the daily mean AvD probability correlated well with avalanche activity for all aspects and all stations (Table 4 and Fig. 5a).
Spearman correlation coefficient (r s) between the mean model-predicted AvD probability and the wet-snow avalanche activity index, AAI(Swiss Alps), regardless of elevation and aspect. In addition, correlations are shown for winter 2021–2022 when the model was run in nowcast and forecast mode. The latter is shown in brackets. All p-values are below 10−6.
4.2 Detailed model performance for winter 2021–2022 (nowcast and 24-hour forecast mode)
We tested the RF12 model in a live setting during the winter 2021–2022 in nowcast and in forecast mode, as described in Section 2.2.2.
97 events were labeled as wet-snow avalanche days AvD and 2151 events as non-avalanche days nAvD (Table 2). Model performance for model RF12 in nowcast and forecast mode are summarized in Fig. 6 and Table 3. The f1-score of the nowcast model was 0.82, similar to the OOB f1-score of the model (0.79; Table 3). Nowcast and forecast output probabilities were strongly correlated (r s = 0.96, Fig. 5b), yet forecast f1-score was slightly lower compared to the nowcast score due to relatively low value of recall. The model using forecast inputs predicts less AvD (77 AvD) than using nowcast data (118 AvD; Fig. 6a and b). However, large activity was well captured by the model based on forecast input, although predicted probability for AvD was generally lower than the probability obtained with nowcast input (Fig. 6). This performance difference is due to discrepancies between nowcast (based on measurements) and forecast (based on NWP model) input data. This difference is clearly noticeable by studying the model behavior in function of p thres, the probability threshold above which AvD is defined. The model was trained with nowcast input to maximize the f1-score of AvD (Section 3) using a default p thres = 0.5. We observed that in nowcast mode, the model performed well with the default threshold of 0.5, i.e. the maximum of the f1-score of AvD was very close to p thres = 0.5 (Fig. 7a). However, in forecast mode the maximum f1-score was obtained for p thres = 0.36 (Fig. 7a, Table 3).
The target variable was designed to separate avalanche days from non-avalanche days, by reducing the influence of observation errors and, thus, providing robust training labels. However, by doing this, many wet-snow avalanches were discarded and flagged as NaN, such as, for example, an isolated large wet-snow avalanche. To assess model performance without the potential filtering bias introduced by our definition of the target variable, we also compared the aspect-specific mean daily model predictions with all avalanche observations in these slopes. In Fig. 8, the size of observed wet-snow avalanches at different elevations are plotted with the predictions of the RF12 model and the LWCindex for the 2021–2022 winter. Overall, the predicted probability of an AvD increased at the onset of wet-snow avalanche cycles and decreased afterwards. Moreover, differences between south- and north-facing slopes were generally well reproduced (compare Fig. 8a and 8b). The model performed also well with regards to the maximum elevation at which avalanche activity was observed (Fig. 8). The average absolute difference (mean absolute error) between predicted maximum elevation in both nowcast and forecast mode (using p thres = 0.36) and observations, excluding nAvD, was around 300 m. Toward the end of the season, the model overestimated wet-snow activity, especially for south-facing slopes (Fig. 8e).
The LWCindex, also plotted in Fig. 8, is a parameter that has been suggested to predict the onset of wet-snow avalanche periods (e.g. Mitterer and others, Reference Mitterer, Techel, Fierz and Schweizer2013; Bellaire and others, Reference Bellaire, van Herwijnen, Mitterer and Schweizer2017). It has also been used operationally by the Swiss avalanche warning service to estimate snowpack wetting and wet-snow avalanche activity. However, the LWCindex generally did not decrease when wet-snow avalanche activity decreased, as observed for south-facing slopes in the second half of April 2022 (Fig. 8). LWCindex -based threshold values will therefore lead to many false alarms as can be noticed in Fig. 9a (low precision means high false alarm ratio). While models including the temporal change of LWCindex resulted in a better performance (Fig. 9b).
4.3 Feature importance
Impurity-based feature importance (Breiman, Reference Breiman2017) for RF12 model suggests that snow surface conditions and the presence of liquid water in the snowpack were important to predict wet-snow avalanche activity (Fig. 10). The two most important features were related to snow surface temperature (TSS) and sum of the LWC in the top 15 cm (sum_up) of the snowpack. From the 20 most important features, 10 relate to the LWC of the snowpack, with 5 of these features describing changes in LWC over 1, 2 and 3 days (Fig. 10). The set of 20 most important features included only 11 different features, provided that the temporal characteristics of the variables (computed at the time of the wettest profile, daily averaged or temporal changes) were not considered.
5. Discussion
We implemented a RF model to predict wet-snow avalanche days. The model was trained on nowcast data from station measurements. Despite the small size and the highly imbalanced nature of the training datasets, our model reproduced the observed wet-snow avalanche activity rather well (Figs. 5, 6 and 8; Table 4). However, the operational avalanche forecasting process is based on 24-hour NWP data (in Switzerland the COSMO-OSHD data, Section 2.2.2). We therefore also investigated model performance in forecast mode, highlighting some discrepancies. Specifically, because air temperature sensors at the AWSs are not ventilated, there is a significant temperature offset compared to COSMO-OSHD forecast data, especially during sunny and calm afternoons (Bellaire and others, Reference Bellaire, van Herwijnen, Mitterer and Schweizer2017). This impacts the downstream SNOWPACK simulations, in particular also the simulated LWC. Profiles based on NWP data therefore generally appeared dryer than nowcast profiles. Despite this difficulty, our results suggest that the model performed reasonably well in forecast mode (Figs. 5b and 6). A better approach would certainly be to train the model directly with NWP data rather than station data. However, the forecasted weather and snow properties produced for the operational warning service in Switzerland are not stored but overwritten at each update.
A direct comparison between previous models (Mitterer and others, Reference Mitterer, Techel, Fierz and Schweizer2013; Bellaire and others, Reference Bellaire, van Herwijnen, Mitterer and Schweizer2017) and our RF model is not feasible, though these models also aimed to predict days with wet-snow avalanche activity. Indeed, the definition of the target variable is different for the three approaches. Nevertheless, our RF model performed better than a simple LWCindex threshold approach as presented by Bellaire and others (Reference Bellaire, van Herwijnen, Mitterer and Schweizer2017) (Fig. 9). Mitterer and others (Reference Mitterer, Techel, Fierz and Schweizer2013) mainly used weather and energy balance variables as model input. While we also used these variables, we additionally introduced variables characterizing the state and the temporal changes of snowpack conditions. Temporal changes in LWC were ranked as important variables in our model.
5.1 Feature importance
The feature importance of the model provides insight into the main drivers of wet-snow avalanche release, as it was trained without any prior expert knowledge. As expected, the snow surface temperature (TSS) and LWC at the surface (sum_up) were key predictors, as these are closely linked to snow melting, resulting in a higher LWC. Snow depth (HS) changes were also decisive for the model as already shown in previous studies (Baggi and Schweizer, Reference Baggi and Schweizer2009). As highlighted in several studies (e.g. Techel and others, Reference Techel, Pielmeier and Schneebeli2011; Wever and others, Reference Wever, Vera Valero and Fierz2016, Reference Wever, Vera Valero and Techel2018), temporal changes in the LWC parameters were also crucial. In our model, almost half of the 20 most important features describe temporal changes. This temporal sensitivity of the model allows to reproduce the onset characterized by increases in LWC, but also the end of periods with wet-snow avalanches when LWC decreased or remained constant. The capability of the model to detect the end of a cycle is an important improvement compared to previous LWCindex-based thresholds approaches (Mitterer and others, Reference Mitterer, Techel, Fierz and Schweizer2013; Bellaire and others, Reference Bellaire, van Herwijnen, Mitterer and Schweizer2017), (Fig. 9). One of the main limitations related to the feature importance ranking scheme we used (the mean decrease impurity algorithm) is that it is biased by the variability of the different wet-snow avalanche situations contained in the training set. For example, there are only few avalanches that were caused by rain. Most avalanches released during dry weather periods when surface melting produced liquid water. Features related to rain were therefore ranked as less important. However, rain increases the LWC of the snowpack, and is therefore well represented by LWC-related input features. Alternative approaches to determine feature importance, such as Permutation Based Feature Importance (Breiman, Reference Breiman2001) or SHAP feature importance (Lundberg and Lee, Reference Lundberg and Lee2017), may provide different results.
5.2 Target variable definition
One of the strengths of the developed approach is the restrictive definition for avalanche days and non-avalanche days (Eqn. (3)) allowing us to use observational data. Thus, we reduced potential errors related to the release date and avalanche type (wet- or dry-snow). However, this approach came at the cost of substantially reducing the amount of data available to train the model. Consequently, the model was developed likely using a smaller variety of AvD and nAvD situations. For example, most of the wet-snow avalanche events we have used for training were associated with surface melt rather than rain. Moreover, the definition of the target variable only selected the most ‘extreme’ AvD and nAvD days. Being aware of this limitation of our approach, we tested whether the resulting model was oversimplified by carrying out a target-independent check. We compared model output with unfiltered, raw avalanche observations (Figs. 6 and 8; Table 4). By doing so, we returned to the original dataset keeping its intrinsic errors, but allowed for studying model behavior over entire seasons rather than for a few disparate AvD and nAvD. This ’beyond the target definition’ verification was a crucial test that provided, for example, insight into model performance in an operational setting. Observed wet-snow avalanche cycles were well correlated with model output in both nowcast and forecast mode (Fig. 6).
In preliminary tests, we also explored other target variable definitions. For example, among many others, we prototyped a model based only on ’local’ constraint i.e. computing AvD and nAvD from Eqn. (3) only considering the activity in the 250 km2 AWS surrounding. However, the resulting models were characterized by a high false alarm rate. It is difficult to assess if this was due to intrinsic limitations of the RF model (i.e. difficulties to deal with ‘intermediate’ events) or because the quality of the input data and/or the target variable were insufficient. Therefore, it might be of interest to study the influence of the target definition (using different thresholds, radius and also exploring multiclass classification) on the performance of the model.
5.3 Spatial information
To evaluate model performance beyond the strict definition of our target variable (Section 3.1), we used avalanche observation data aggregated at the scale of the entire Swiss Alps (Figs. 6 and 8; Table 4). Alternatively, the model predictions at the individual locations of the AWS may be used to establish maps showing the spatial variations in predictions. Furthermore, this would allow for further assessing model performance beyond the binary classification and for different spatial scales.
5.4 Model limitations
Despite going back 20 years, the necessity to obtain high quality labels forced us to train the model using a comparably small dataset. We suspect that a larger dataset, covering a wider variety of avalanche conditions, may increase the model performance further. For instance, we noticed that the model over-predicted avalanche activity during the second half of April 2022 (Fig. 8). The reason for this behavior is not clear. One possible explanation is that the training data did not permit to learn the complex interaction between continued wetting of an already wet snowpack sometimes leading to avalanches, though often not. Another reason might be that the model did not include previous wet-snow avalanche activity as an input variable. After all, previous avalanche activity is highly relevant information taken into account by avalanche forecasters. Furthermore, after a period with wet-snow avalanches or extensive wetting of the snowpack down to basal layers, loading by new snow or rain may be needed to lead to another period of wet-snow avalanches. Potentially, this could be addressed by including additional input variables having longer temporal dependency (for example the sum of new snow of the last 7 days). Therefore, we primarily see two paths to further improve model performance: first, increasing the dataset available for training and testing. However, we doubt whether this should be done at the cost of a less restrictive definition of avalanche and non-avalanche days. Second, it might prove beneficial to couple model predictions with avalanche event data. This might require completely new approaches to modeling wet-snow stability.
5.5 Further perspectives
The presented wet-snow avalanche model provides an indication of the likelihood whether wet-snow avalanches will occur in a region. However, in its current setup, it does not include information on the magnitude of the event, i.e. the number and size of avalanches. Since this information is of great importance for avalanche forecasters, future work should aim to develop a model that also provides such information. In this context, the depth below the surface where liquid water is present could be used to provide a rough estimate of avalanche size, as suggested by Wever and others (Reference Wever, Vera Valero and Techel2018).
To develop a model with a higher spatial resolution, one could use the gridded COSMO-OSHD data in an extended way (Section 2.2.2). Indeed, in the presented model, COSMO-OSHD data were only used to get 24-hour forecasts at the locations of the AWSs. However, by using all the 1 km grid cells that tile the Swiss Alps, would provide predictions for the entire Swiss Alps.
Finally, we believe that the model may also be suitable to study the influence of climate change on wet-snow avalanche activity, using climate scenarios such as presented in Fischer and others (Reference Fischer2022).
6. Conclusion
We developed a RF model to predict wet-snow avalanche activity as a function of slope aspect based on a 20-year dataset of snow stratigraphy simulations for virtual 38$^\circ$ slopes facing north, east, south and west at the location of 124 automated weather and snow stations in the Swiss Alps. To make the best use of manual avalanche observations, we developed a rigorous approach to define avalanche days, thereby strongly reducing the potential errors in the dating and wetness classification of avalanches and, hence, increasing the quality of the labels for training and testing the model. Despite a strongly imbalanced dataset, with more than ten times more non-avalanche days compared to avalanche days, the model performance was satisfactory leading to an OOB f1-score of 0.79 (Table 3). Because of the stringent definition of our target variable, our model only learned from the most extreme active or inactive days. Therefore, we paid particular attention to study the behavior of the model for ’intermediate’ days (the NaN provided by Eqn. (3)). We assessed the model performance using all avalanche observations from the Swiss Alps for eight winters not included in the training dataset, and without any of the filtering applied for training the models, yielding a correlation between model predictions and avalanche activity of around 0.71 (Spearman correlation) (Table 4).
In addition to the model development and validation using historic data, we also carried out a ‘live’ test during winter 2021–2022 (Fig. 8). This allowed to evaluate the model performance in both a nowcast and 24-hour forecast mode (Fig. 6). The results showed that even in forecast mode, which was not available for model training, the aspect-dependent occurrence of wet-snow avalanches, or their absence, was generally well captured (f1-score about 0.8). The promising results obtained during the semi-operational testing during winter 2021–2022 suggests that the model may be useful to support operational avalanche forecasting.
Data
The datasets used in this paper is available on https://www.envidat.ch/dataset/data_wet_aval_model and the scripts on https://gitlabext.wsl.ch/richter/rfmodel_wetsnow.
Acknowledgements
We thank Marc Ruesch and Mathias Bavay who provided access to avalanche forecast and weather station data and to the SNOWPACK simulations. This project was partly funded by the Swiss Data Science Center under grant C18-05 “DEAPSnow”.
Appendix A
Appendix B
We tuned the hyperparameters of the model using a grid search procedure. The range hyperparameters explored and the obtained results for RF12 (model trained on dataset 1 and 2) are presented in Table 7.
We reduce the number of features used to train the model using recursive feature elimination (Guyon and others, Reference Guyon, Weston, Barnhill and Vapnik2002). The results of the procedure are shown in Fig. 11.
Appendix C
As the target variable was binary: Y = 0 (nAvD) or Y = 1 (AvD), metrics used to optimize and assess model performance were based on the confusion matrix as defined in Table 8, for more details see e.g. Fawcett (Reference Fawcett2006). The f1-score f 1 is the harmonic mean between recall R and precision P, i.e.,
The recall R (or hit rate) is:
Precision (also called positive predictive value), P, is defined as:
True/false and positive/negative are defined according to Table 8.