Probabilistic models for harmful algae: application to the Norwegian coast

Edson Silva; Julien Brajard; François Counillon; Lasse H. Pettersson; Lars Naustvoll

doi:10.1017/eds.2024.11

Published online by Cambridge University Press: 02 May 2024

Edson Silva*: Affiliation:
Nansen Environmental and Remote Sensing Center, and Bjerknes Centre for Climate Research, Bergen, Vestland, Norway
Julien Brajard: Affiliation:
Nansen Environmental and Remote Sensing Center, and Bjerknes Centre for Climate Research, Bergen, Vestland, Norway
François Counillon: Affiliation:
Nansen Environmental and Remote Sensing Center, and Bjerknes Centre for Climate Research, Bergen, Vestland, Norway
Lasse H. Pettersson: Affiliation:
Nansen Environmental and Remote Sensing Center, Bergen, Vestland, Norway
Lars Naustvoll: Affiliation:
Plankton department, Institute of Marine Research, Arendal, Agder, Norway
*: Corresponding author: Edson Silva; Email: [email protected]

Abstract

We have developed probabilistic models to estimate the likelihood of harmful algae presence and outbreaks along the Norwegian coast, which can help optimization of the national monitoring program and the planning of mitigation actions. We employ support vector machines to calibrate probabilistic models for estimating the presence and harmful abundance (HA) of eight toxic algae found along the Norwegian coast, including Alexandrium spp., Alexandrium tamarense, Dinophysis acuta, Dinophysis acuminata, Dinophysis norvegica, Pseudo-nitzschia spp., Protoceratium reticulatum, and Azadinium spinosum. The inputs are sea surface temperature, photosynthetically active radiation, mixed layer depth, and sea surface salinity. The probabilistic models are trained with data from 2006 to 2013 and tested with data from 2014 to 2019. The presence models demonstrate good statistical performance across all taxa, with R (observed presence frequency vs. predicted probability) ranging from 0.69 to 0.98 and root mean squared error ranging from 0.84% to 7.84%. Predicting the probability of HA is more challenging, and the HA models only reach skill with four taxa (Alexandrium spp., A. tamarense, D. acuta, and A. spinosum). There are large differences in seasonal and geographical variability and sensitivity to the model input of different taxa, which are presented and discussed. The models estimate geographical regions and periods with relatively higher risk of toxic species presence and HA, and might optimize the harmful algae monitoring. The method can be extended to other regions as it relies only on remote sensing and model data as input and running national programs of toxic algae monitoring.

Keywords

aquaculture; climate services; harmful algae; machine learning

Type: Application Paper
Information: Environmental Data Science, Volume 3, 2024, e12

DOI: https://doi.org/10.1017/eds.2024.11
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial licence (http://creativecommons.org/licenses/by-nc/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright: © The Author(s), 2024. Published by Cambridge University Press

Impact Statement

Some algae produce toxins that contaminate shellfish and poison humans upon consumption, posing significant risks to human health. They also impact the aquaculture business since shellfish farmers cannot sell their products once the shellfish are contaminated by toxins. Assessing toxic algae risks can allow farmers to make better-informed decisions and reduce economic loss. We calibrate machine learning models that assess the probability of the presence and hazardous levels of eight toxic algae in Norway based on environmental factors, such as temperature, light, salinity, and mixed layer depth. The probabilistic models can be fed with predictions of environmental factors, and thus also predict the probability of toxic algae. Public agencies can use this information to execute mitigation actions against toxic algae.

1. Introduction

Toxic algae pose a significant threat to human health due to their ability to contaminate shellfish with toxins, resulting in poisoning symptoms or, in extreme cases, death upon consumption. Prevention of poisoning outbreaks relies on monitoring programs to assess the abundance of harmful algae and toxins in shellfish farms. When harmful abundances (HAs) are detected, shellfish sales are forbidden and public advice is provided. Nonetheless, both monitoring programs and preventing the consumption of contaminated shellfish incur significant costs, and there still remains a small possibility of outbreaks that can harm the public (Castberg et al., 2004; Hoagland et al., 2002; Hoagland and Scatasta, 2006; Karlson et al., 2021; Martino et al., 2020; Pettersson and Pozdnyakov, 2013). Assessing geographical regions and time periods with an elevated probability of toxic species detection can offer several advantages, such as optimizing monitoring programs by redistributing resources and efforts, enhancing the protection of public health, enabling early harvesting prior to toxin outbreaks, improving business planning and investment decisions, and fostering increased consumer confidence (Jin et al., 2020).
Probabilistic models can serve this purpose and indicate how the likelihood changes with external factors that influence the growth of toxic algae, such as sea surface temperature (SST), mixed layer depth (MLD), photosynthetically active radiation (PAR), and sea surface salinity (SSS) (Anderson et al., 2011, 2012; Bates et al., 2018; García-Portela et al., 2018; Jauffrais et al., 2013; Kim et al., 2008; Klemm et al., 2022; Paz et al., 2006; Reguera et al., 2012).

Few advancements have been made in the development of probabilistic models for toxic algae. For example, on the US West Coast, logistic generalized linear models (GLMs) were utilized to model the probability of Pseudo-nitzschia spp. exceeding 10,000 CellsL⁻¹, incorporating observations of ocean color, temperature, and salinity (Anderson et al., 2011). In French Mediterranean lagoons, decision tree rules based on temperature and salinity thresholds were employed to model the probabilities of hazardous abundances of Alexandrium tamarense and Dinophysis spp. (Bouquet et al., 2022). In Irish coastal waters, gradient boosting models (GBMs) were applied to model the probability of the presence/absence of A. tamarense using inputs such as temperature, salinity, and a water stratification index (Klemm et al., 2022). Although these studies have made significant contributions, none have addressed a wide range of toxic algae taxa, nor have they been calibrated for the Norwegian coastal shelf.

Along the Norwegian coast, the Norwegian Food Safety Authority (NFSA) regularly monitors several toxic algae taxa, including Dinophysis acuminata Claparède & Lachmann 1859, Dinophysis acuta Ehrenberg 1839, Dinophysis norvegica Claparède & Lachmann 1859, Alexandrium Halim, 1960, A. tamarense (Lebour) Balech 1995, Pseudo-nitzschia H. Peragallo, 1900, Azadinium spinosum Elbrächter & Tillmann 2009, and Protoceratium reticulatum (Claparède & Lachmann) Bütschli 1885. These taxa are monitored because of their association with Diarrhetic Shellfish Toxins, Paralytic Shellfish Toxins, Amnesic Shellfish Toxins, Azaspiracid Shellfish Toxins, and yessotoxins. The monitoring program has been in operation since 2006 and carries out extensive weekly monitoring of algae abundance and monthly monitoring of toxins in all shellfish farms in operation along the Norwegian coast. This long-term data series enables a robust calibration and validation of probabilistic models encompassing a wide range of toxic algae taxa. Moreover, the Norwegian coastline stretches across a wide range of latitudes (58°N–71°N) and exhibits substantially different environmental settings, making it an optimal natural laboratory for studying harmful algae responses to a wide range of environmental conditions (Wells et al., 2020).

This study calibrates and validates models for estimating the probability of presence and HA of key toxic algae species, including D. acuminata, D. acuta, D. norvegica, Alexandrium spp., A. tamarense, Pseudo-nitzschia spp., P. reticulatum, and A. spinosum, along the Norwegian coast. The probabilistic models are based on support vector machines (SVMs), which, among the many existing machine learning methods, are well suited for harmful algae bloom (HAB) modeling because they require only a small amount of training data to find an optimal solution (Silva et al., 2023; Cruz et al., 2021). Since algae in situ data are scarce, this trait is valuable for modeling presence and HA. Some studies have already employed SVMs for HAB modeling and reported superior performance against other methods such as artificial neural networks and random forests (Li et al., 2014; Ribeiro and Torgo, 2008). As model inputs, we use SST and PAR estimated from satellite observations and MLD and SSS from operational ocean reanalysis. These inputs are chosen for two reasons: (i) their time span and spatial coverage match the algae observations in the shellfish farms and (ii) algae growth is mostly driven by these factors.
Algae show distinct temperature-related traits that cause them to grow or diminish in different temperature ranges (Basti et al., 2018; Fehling et al., 2004; Guerrini et al., 2007; Nagai et al., 2004; Rial et al., 2023; Röder et al., 2012; Thomas et al., 2012); PAR corresponds to the light available for photosynthesis and therefore strongly influences algae growth (Bill et al., 2012; García-Portela et al., 2018; Jauffrais et al., 2013); salinity variations affect algae by inducing osmotic stress, creating ion stress through the unavoidable absorption or loss of ions, and altering the cellular ionic ratios due to selective mechanisms (Jauffrais et al., 2013; Kirst, 1990; Klemm et al., 2022; Nagai et al., 2004; Rial et al., 2023; Weber et al., 2021); shallower MLD, a common proxy for well-stratified waters, is commonly associated with HABs (Klemm et al., 2022; Reguera et al., 2012). Note that other important variables, such as nutrients, are not included because no available product covers the farm locations over a long time series. Using these inputs, the SVM skill in modeling the probability of each toxic alga is evaluated. The models are validated in an operational setting: they are trained with data spanning from 2006 to 2013 and tested with data covering the period from 2014 to 2019. The influence of the input predictors is assessed for each taxon’s probabilistic model. Seasonal probability and annual risk maps for all targeted algae species are presented for the Norwegian coastline.

2. Material and methods

2.1. Study region

The Norwegian coast is surrounded by the Skagerrak Strait, the North Sea, the Norwegian Sea, and the Barents Sea (Figure 1). Two significant current systems dominate circulation: the Norwegian Atlantic Current (NwAC) and the Norwegian Coastal Current (NCC). The NwAC is an extension of the North Atlantic Current that flows between the Faroe Islands and Scotland and continues northward along the Norwegian continental shelf break up to the Barents Sea (Eldevik et al., 2009; Furevik et al., 2002). The NCC flows from the south of Norway and along the coast north to the Barents Sea. The NCC is substantially fresher than the NwAC (composed of Atlantic Water), as it transports fresh waters from the land inflow, the Baltic Sea, and the North Sea.

Figure 1. Study region. The farm locations are represented by dots, and the circles encompass the area over which the satellite and models are averaged (44 km). Areas used in Figures 9 and 10 are highlighted in red.

The Norwegian coastal waters extend from the sub-Arctic to the Arctic regions and comprise disparate environmental conditions. In northern Norway, the polar night lasts from November 18 to January 23, and the sun is above the horizon all day between May 20 and July 24 (Giesen et al., 2014). SST can vary from 5 °C in winter to 20 °C in summer in the North Sea and from 1 °C to 15 °C in the Barents Sea opening (Chen et al., 2021; Jakowczyk and Stramska, 2014). In the Skagerrak Strait, fresher waters are brought from the Baltic Sea and river runoff, with salinity varying from 10 PSU in summer to 30 PSU in winter (Frigstad et al., 2020; Hordoir et al., 2013). In northern Norway, salinity is above 34 PSU most of the time and can decrease to 20 PSU in episodic events of freshwater input during the summer melting season (Frigstad et al., 2020). The MLD can be deeper than 50 m in the winter and shallower than 30 m in the summer due to the input of fresh waters and surface heating (Peralta-Ferriz and Woodgate, 2015).

2.2. In situ data collection

Abundance (CellsL⁻¹) of Alexandrium spp., Alexandrium tamarense group, D. acuta, D. acuminata, D. norvegica, Pseudo-nitzschia spp., P. reticulatum, and A. spinosum was provided by the NFSA monitoring program of algae toxins in mussels and dietetic advice to the public. Algae samples are collected at several aquaculture mussel sites weekly (every Monday). The monitoring program is run routinely, and the data used in this study cover 2006 to 2019. The program sampling method consists of collecting water samples by lowering a tube from the surface to a 3 m depth. A subsample of 25 mL is taken from the water sample and preserved with acidic Lugol’s iodine before being transported to the laboratory for analysis. The subsample (25 mL) is filtered on a membrane filter, and the genera and species present are identified and counted on the whole filter under a light microscope at 200× magnification.

Only 35 shellfish farms in the coastal area, shown in Figure 1, are included in the study because the remote sensing and model data used are less reliable or unavailable (due to spatial resolution) in the inner fjords (see Section 2.3). The amount of data available depends on each farm’s operation time. For 8 locations, we only have 1 year of data, while for 11 sites, we have more than 9 years of data. In situ data are not collected during the winter months, which fall outside the productive season. The total data for training and evaluating the probabilistic models comprise 5919 samples for each taxon. An example of algae abundance time series in Arendal in 2019 is shown in Figure 2a–h to illustrate the data used as the input for modeling.

Figure 2. Arendal (in the south of Norway) time series in 2019 as an example of the input data for training and testing the models. The time series for (a) Alexandrium spp., (b) Alexandrium tamarense, (c) D. acuta, (d) D. acuminata, (e) D. norvegica, (f) Pseudo-nitzschia spp., (g) P. reticulatum, (h) A. spinosum, (i) SST, (j) PAR, (k) MLD, and (l) SSS.

2.3. Satellite and model reanalysis data

We use SST (°C) from the ESA SST CCI and C3S global SST reprocessed product level 4, retrieved from the Copernicus Marine Environment Monitoring Service (CMEMS). The product is created by running the Operational Sea Surface Temperature and Sea Ice Analysis system (Good et al., 2020), which combines satellite (AATSR, ATSR, SLSTR, and AVHRR) and in situ observations to produce gap-free maps of daily average SST at 0.05° of spatial resolution (Merchant et al., 2019). In the Nordic seas, the SST uncertainty is below 0.4 °C (Good et al., 2020).

The MLD (m) and SSS (PSU) are provided by the CMEMS Arctic MFC TOPAZ modeling system (Sakov et al., 2012; Xie et al., 2017). The TOPAZ system is a coupled ocean–sea ice model and data assimilation system for the North Atlantic and Arctic Oceans. The ocean model couples a Hybrid Coordinate Ocean Model (Bleck, 2002) with an elasto-viscous-plastic sea ice model (Hunke and Dukowicz, 1997). TOPAZ assimilates available ocean and sea ice data weekly with the ensemble Kalman filter (Evensen, 2003). The MLD is calculated using a density criterion with a threshold of 0.01 kg m⁻³, as in Petrenko et al. (2013) and Ferreira et al. (2015). The SSS is extracted from depths of 0–3 m. The TOPAZ product performs well for ocean variables near the surface. In the first 200 m, the salinity root mean squared difference (RMSD) is below 0.3 PSU and the bias is between −0.05 and 0.05 PSU. The temperature RMSD is below 1 °C and the bias is between −0.5 and 0.5 °C (Xie et al., 2017; Lien et al., 2016). Note that MLD uncertainties are not provided, but since the MLD is computed from density, the temperature and salinity errors give an indication of the quality of the MLD estimates. Nevertheless, the MLD error is expected to be about 10 m on the Norwegian coast (L. Bertino, personal communication).

PAR ( $ {\mathrm{Em}}^{-2}{\mathrm{d}}^{-1} $ ) irradiance onto the ocean surface from 2006 to 2019 is retrieved from the GlobColour project, which uses the MODIS, SeaWiFS, and VIIRS sensors, binned at an 8-day interval and at 4 km spatial resolution. Originally developed for SeaWiFS, the product presents an accuracy of R = 0.88 and RMSD = 5.7 Em⁻²d⁻¹ compared to in situ measurements (Frouin et al., 2003).

All data are reprojected to a stereographic projection centered at 65°N and 7°E at 4 km spatial resolution using nearest neighbor interpolation. Because of the coarse spatial resolution, we extract the location time series as the average of unmasked grid cells within the 11 × 11 grid of the fields around the location. Therefore, the SST, SSS, MLD, and PAR represent the average conditions in the surroundings of the farm and have a 44 km effective resolution. An example time series of all extracted data is shown in Figure 2i–l for Arendal in 2019.
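The window averaging described above can be illustrated with a minimal sketch. The function name `window_mean`, the toy grid, and the use of NaN as the mask value are assumptions made for illustration and are not taken from the study's code.

```python
import numpy as np

def window_mean(field, row, col, half_width=5):
    """Mean of unmasked (finite) cells in a (2*half_width+1)^2 window around (row, col)."""
    r0, r1 = max(row - half_width, 0), min(row + half_width + 1, field.shape[0])
    c0, c1 = max(col - half_width, 0), min(col + half_width + 1, field.shape[1])
    window = field[r0:r1, c0:c1]
    valid = np.isfinite(window)          # masked cells are stored as NaN here
    if not valid.any():
        return np.nan                    # nothing usable around this location
    return float(window[valid].mean())

# Toy 4 km grid: 20 x 20 cells of SST, with masked cells marked as NaN.
sst = np.full((20, 20), 10.0)
sst[:, :8] = np.nan                      # hypothetical land mask on the western side
sst[10, 12] = 21.0                       # one warmer ocean cell inside the window

# 11 x 11 cells at 4 km each give the 44 km footprint mentioned in the text.
farm_sst = window_mean(sst, row=10, col=10)
```

In this toy case, 88 of the 121 window cells are unmasked, so the single warm cell shifts the average only slightly (to 10.125 °C).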

2.4. The calibration of presence probabilistic models

The algae abundance is converted to the binary values needed to calibrate the probabilistic models. For the presence models, a threshold of 1 CellsL⁻¹ is used for creating the two classes. This means that samples with abundances above or equal to 1 CellsL⁻¹ are defined as class = 1, and samples with 0 CellsL⁻¹ are defined as class = 0. The models are fed with SST, SSS, MLD, and PAR as input. The models are evaluated in an operational setting: they are trained on past data (2006–2013) and tested on later data (2014–2019).

The first step is preprocessing the data, starting with scaling (or normalization) of each input predictor:

(1)

$$ {\mathbf{x}}_{\mathrm{scaled},\mathrm{pred}}=\frac{{\mathbf{x}}_{\mathrm{pred}}-{\overline{\mathbf{x}}}_{\mathrm{pred}}}{\sigma_{\mathrm{pred}}} $$

where $ {\mathbf{x}}_{\mathrm{scaled},\mathrm{pred}} $ is the scaled sample $ \mathbf{x} $ for an input predictor (pred), $ {\mathbf{x}}_{\mathrm{pred}} $ is the input predictor value, $ {\overline{\mathbf{x}}}_{\mathrm{pred}} $ is the average, and $ {\sigma}_{\mathrm{pred}} $ is the standard deviation computed on the training data (2006–2013). Since the scaled environmental data are significantly correlated, a second preprocessing step applies principal component analysis to convert the scaled predictors into four decorrelated principal components (c1, c2, c3, and c4), which are fed into the SVM model.
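As an illustration of the two preprocessing steps, the sketch below applies equation (1) and a principal component rotation with NumPy only. The data are synthetic stand-ins for the four predictors, and computing the rotation from the training covariance matrix is an assumption about how the decomposition could be done, not a description of the study's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for [SST, SSS, MLD, PAR]; the columns are deliberately correlated.
base = rng.normal(size=(500, 1))
X = np.hstack([base + 0.5 * rng.normal(size=(500, 1)) for _ in range(4)])
X_train, X_test = X[:300], X[300:]       # stand-ins for 2006-2013 and 2014-2019

# Equation (1): the mean and standard deviation come from the training period only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
Z_train = (X_train - mean) / std
Z_test = (X_test - mean) / std

# PCA via eigen-decomposition of the covariance of the scaled training data.
cov = np.cov(Z_train, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]        # sort components by explained variance
components = eigvecs[:, order]

C_train = Z_train @ components           # c1..c4 for the training samples
C_test = Z_test @ components             # same rotation applied to the test samples
```

Reusing the training statistics and rotation on the test period keeps the evaluation free of information from the later years.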

The SVM probabilistic algorithm consists of first calibrating a margin hyperplane in an n-dimensional space to separate two classes by using the nearest support vectors (samples) of both classes. The computation of the margin hyperplane depends on the input (principal components) and on the SVM hyperparameters that need to be fixed before optimization, such as the kernel function, the penalty factor ( $ C $ ), and the $ \gamma $ (for nonlinear kernels). The kernel function converts the input values to a feature space where the hyperplane is computed. Common kernel functions are linear, polynomial, and radial basis function (RBF). The margin hyperplane is optimized with the hinge loss function and a stopping tolerance of 0.001. The hyperparameter $ C $ controls the trade-off between maximizing the margin and minimizing the training errors, and $ \gamma $ controls the weight a single sample has on adjusting the hyperplane in an RBF kernel. A deeper explanation of SVM can be found in Cortes and Vapnik (1995), Platt (1999), and Tan et al. (2008). The hyperparameters are tuned in a grid search using a cross-validation procedure (Hastie et al., 2009) on the training dataset with two folds randomly split 100 times. The grid search uses the correlation between observed presence frequency and estimated probability (obtained in a reliability diagram) as a decision criterion (see Section 2.7). The tuned hyperparameters for all algae models were kernel = RBF, $ C=1 $ , and $ \gamma $ defined by the following equation:

(2)

$$ \gamma =\frac{1}{n_{\mathrm{comp}}\times {\sigma}_{\mathrm{all}}^2} $$

where $ {n}_{\mathrm{comp}} $ is the number of principal components ( $ {n}_{\mathrm{comp}}=4 $ ), and $ {\sigma}_{\mathrm{all}}^2 $ is the scalar variance of all the components stacked as one vector. Since the number of samples with the harmful algae taxa detected can represent only a portion of the total dataset, we adjust the weight of the samples—used during the hyperplane optimization—inversely proportional to the class frequencies:

(3)

$$ w\left(\mathbf{x}\right)=\frac{n_{\mathrm{sam}}}{2\times {n}_{\mathrm{sam},\mathrm{class}}} $$

where $ w $ is the weight given to sample $ \mathbf{x} $ , $ {n}_{\mathrm{sam}} $ is the total number of samples, and $ {n}_{\mathrm{sam},\mathrm{class}} $ is the number of samples for the class of $ \mathbf{x} $ .
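Equations (2) and (3) can be checked with a few lines of NumPy. The helper names `rbf_gamma` and `class_weights` are hypothetical. The two formulas appear to match what scikit-learn computes for `gamma="scale"` and `class_weight="balanced"`, although the text does not state which options were used.

```python
import numpy as np

def rbf_gamma(components):
    """Equation (2): 1 / (n_comp * variance of all components stacked as one vector)."""
    n_comp = components.shape[1]
    return 1.0 / (n_comp * components.var())

def class_weights(labels):
    """Equation (3): n_sam / (2 * n_sam_class), one weight per class."""
    labels = np.asarray(labels)
    n_sam = labels.size
    return {int(c): n_sam / (2 * np.sum(labels == c)) for c in np.unique(labels)}

# Tiny example with four components and two samples.
components = np.array([[1.0, -1.0, 0.0, 0.0],
                       [-1.0, 1.0, 0.0, 0.0]])
gamma = rbf_gamma(components)            # stacked variance is 0.5, so gamma = 0.5

# A taxon detected in 10 of 100 samples: the rare class gets the larger weight.
labels = [0] * 90 + [1] * 10
weights = class_weights(labels)          # {0: 0.555..., 1: 5.0}
```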

Finally, the second step of the SVM probabilistic algorithm is to use the margin hyperplane to fit a probability function using the Platt (1999) method:

(4)

$$ P\left(c\left(\mathbf{x}\right)=1|f\right)=\frac{1}{1+\exp \left( Af+B\right)} $$

where $ P $ is the probability of sample $ \mathbf{x} $ being class 1 ( $ c\left(\mathbf{x}\right)=1 $ ), the input $ f $ is the SVM output of each predicted sample, corresponding to its orthogonal distance from the hyperplane scaled proportionally so that $ -1 $ and 1 correspond to the distance of the support vectors, and $ A $ and $ B $ are the parameters fitted using the maximum likelihood in the training dataset. The SVM is implemented in the Python programming language using the Scikit-learn package (Pedregosa et al., 2011).
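A minimal sketch of how such a model might be assembled with Scikit-learn is given below. The data are synthetic, and the choice of `gamma="scale"`, `class_weight="balanced"`, and `probability=True` is an assumption that mirrors equations (2) to (4); the study's actual configuration may differ. Note that Scikit-learn fits the Platt parameters with an internal cross-validation, which may not be identical to the procedure described above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Synthetic stand-ins for [SST, SSS, MLD, PAR] and a presence label (0 or 1).
X = rng.normal(size=(400, 4))
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0.8).astype(int)
X_train, y_train = X[:250], y[:250]      # stand-in for the training period
X_test = X[250:]                         # stand-in for the testing period

model = make_pipeline(
    StandardScaler(),                    # equation (1), fitted on training data
    PCA(n_components=4),                 # decorrelated components c1..c4
    SVC(kernel="rbf", C=1.0, gamma="scale", class_weight="balanced",
        tol=1e-3, probability=True, random_state=0),
)
model.fit(X_train, y_train)

# Column 1 of predict_proba holds the estimated probability of class 1 (presence).
p_presence = model.predict_proba(X_test)[:, 1]
```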

2.5. Model and data input uncertainties

Model uncertainty is estimated by training new models on a pseudo-random subsample of two-thirds of the training dataset. Subsampling is repeated 100 times, and all models are applied to the testing dataset. The reliability (see Section 2.7) is then estimated for all iterations, and the median is subtracted from each estimate.

Uncertainties in the input data from model reanalysis and remote sensing can introduce errors that degrade the resulting probability estimates. We assess the input uncertainty by adding a random perturbation to the input testing dataset and evaluating the impact on reliability. We generate a perturbed ensemble of input, as follows:

(5)

$$ {\mathbf{x}}_i=\mathbf{x}+{\varepsilon}_i $$

where $ \mathbf{x} $ is the input vector, and $ {\varepsilon}_i $ is Gaussian white noise, $ {\varepsilon}_i\in {\mathbb{R}}^{n\times N}\sim \mathcal{N}\left(0,{\sigma}^2\right) $ , with $ n $ being the size of the input vector and $ \sigma $ the standard deviation of the error in the input data. We set $ \sigma $ to 0.4 °C for SST, 10 m for MLD, 0.3 PSU for SSS, and 5.7 Em⁻²d⁻¹ for PAR. These $ \sigma $ values relate to the reported input errors (see Section 2.3). This experiment is repeated 100 times and the reliability changes are estimated for each iteration.
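The perturbation in equation (5) could be generated along the following lines. The test inputs are synthetic, the column order is an assumption, and no clipping of unphysical values is applied because the text does not mention any.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical testing inputs; columns are SST (°C), MLD (m), SSS (PSU), PAR (E m-2 d-1).
x_test = np.column_stack([
    rng.uniform(4.0, 19.0, size=50),
    rng.uniform(5.0, 60.0, size=50),
    rng.uniform(25.0, 35.0, size=50),
    rng.uniform(5.0, 55.0, size=50),
])

# Standard deviations taken from the reported input errors (Section 2.3).
sigma = np.array([0.4, 10.0, 0.3, 5.7])

# Equation (5): one independent Gaussian perturbation per ensemble member.
n_members = 100
noise = rng.normal(loc=0.0, scale=sigma, size=(n_members, *x_test.shape))
ensemble = x_test[np.newaxis, :, :] + noise   # shape (members, samples, predictors)
```

Each slice `ensemble[i]` can then be passed through a fitted model, and the reliability statistics recomputed to see how much they move.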

2.6. Calibration of probabilistic models for higher sanitary thresholds and HAs

The presence of a toxic species is not necessarily harmful to the environment or to human shellfish consumption. The NFSA establishes specific sanitary thresholds of taxa abundance above which a taxon is considered harmful (Table 1). Probabilistic models of HA are more desirable than presence models for shellfish farms because some taxa might be present at a low level without requiring action. We recalibrate the models for an increasing set of thresholds for each taxon from the presence to HA (with a percentile range of 20%), shown in Table 1. Each threshold is used to define class 0 referring to below the threshold and class 1 above it (see Section 2.4). For example, we recalibrate the A. tamarense probabilistic models for abundances above 1 (presence model), 40, 80, 120, 160, and 200 CellsL⁻¹ (HA model). We estimate the reliability change along the percentiles and evaluate the feasibility of moving from the presence probabilistic model to the HA probabilistic model for each taxon. Note that harmful thresholds for Alexandrium spp. and A. spinosum are not defined by NFSA, so we resort to using 200 CellsL⁻¹ for Alexandrium spp. as it is used for A. tamarense, and 3600 CellsL⁻¹ for A. spinosum as it corresponds to the 99th percentile value of the database.

Table 1. Sanitary thresholds used for calibrating the probabilistic models for each taxon, from presence (≥ 1 CellsL⁻¹) to HA of each taxon

HA levels are determined by the NFSA. Percentage thresholds correspond to the percentile from the presence to HA.
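The threshold ladder for one taxon can be sketched as follows, using the A. tamarense values quoted in the text. The helper names are hypothetical, and treating a sample exactly at the threshold as class 1 is an assumption carried over from the presence definition in Section 2.4.

```python
import numpy as np

def threshold_ladder(harmful_abundance, n_steps=5):
    """Presence threshold (1 cell per litre) followed by 20%, 40%, ..., 100% of the HA level."""
    steps = [harmful_abundance * k / n_steps for k in range(1, n_steps + 1)]
    return [1.0] + steps

def to_classes(abundance, threshold):
    """Class 1 at or above the threshold, class 0 below it (assumed inclusive)."""
    return (np.asarray(abundance) >= threshold).astype(int)

ladder = threshold_ladder(200.0)         # [1.0, 40.0, 80.0, 120.0, 160.0, 200.0]

# The same abundances produce different labels as the threshold rises.
abundance = [0, 30, 40, 250]
labels_presence = to_classes(abundance, ladder[0])   # [0, 1, 1, 1]
labels_40 = to_classes(abundance, ladder[1])         # [0, 0, 1, 1]
```

One model would then be recalibrated per entry of `ladder`, with the positive class becoming rarer at each step.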

2.7. Reliability assessment

Our models are evaluated by comparing the probability estimated by the models with the observed frequencies for different bins of the probability density function, an approach called a reliability diagram (Bröcker and Smith, 2007). First, the estimated probabilities in the testing period (2014–2019) are ranked into 10 separate bins using the percentiles to define the bin widths. Probability values from percentiles 1%–10% are clustered in bin 1, those from 11% to 20% in bin 2, continuing up to bin 10. Then, the bins’ average probability and observed frequency are computed. For example, the presence probability of D. acuta is estimated for all weekly observations in the farm locations available from 2014 to 2019, following the spatial variability and time intervals of the NFSA monitoring program. We match the estimated presence probability with the observed presence (0 or 1), which is used to estimate the average presence probability and observed frequency in the bins as described above. As an example, we verify that when our model predicts a 10% chance that a species is detected, detection happens 10% of the time in an independent validation period. Hence, our models are evaluated by their skill in modeling the frequency with which the presence (or HA) of a toxic alga is detected in the monitoring. To quantify how well our model is doing, the Pearson correlation (R), Root Mean Squared Error (RMSE), and average bias (AB) between the averaged probability and observed frequency are calculated as follows:

(6)

$$ R=\frac{\operatorname{cov}\left({F}_{\mathrm{bin}},{P}_{\mathrm{bin}}\right)}{\sigma_{{\mathrm{F}}_{\mathrm{bin}}}{\sigma}_{{\mathrm{P}}_{\mathrm{bin}}}} $$

(7)

$$ \mathrm{RMSE}=\sqrt{\frac{1}{n_{\mathrm{bin}\mathrm{s}}}\sum \limits_{i=1}^{n_{\mathrm{bin}\mathrm{s}}}{\left({P}_{\mathrm{bin},i}-{F}_{\mathrm{bin},i}\right)}^2} $$

(8)

$$ \mathrm{AB}=\frac{1}{n_{\mathrm{bin}\mathrm{s}}}\sum \limits_{i=1}^{n_{\mathrm{bin}\mathrm{s}}}\left({P}_{\mathrm{bin},i}-{F}_{\mathrm{bin},i}\right) $$

where $ {F}_{\mathrm{bin}} $ and $ {P}_{\mathrm{bin}} $ are the observed frequencies and averaged estimated probabilities of the bins, cov is the covariance function, $ \sigma $ is the standard deviation, $ {n}_{\mathrm{bins}} $ is the number of bins (10), and i refers to one bin. For evaluating the RMSE changes along different sanitary thresholds, the RMSE is normalized by the range of minimum and maximum probabilities the model can predict:

(9)

$$ {\mathrm{RMSE}}_{\mathrm{norm}}=\frac{\mathrm{RMSE}}{P_{\mathrm{max}}-{P}_{\mathrm{min}}} $$

where $ {\mathrm{RMSE}}_{\mathrm{norm}} $ is the normalized RMSE, $ {P}_{\mathrm{max}} $ and $ {P}_{\mathrm{min}} $ are the maximum and minimum probabilities the model could estimate in the testing dataset.
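The reliability statistics of Equations 6 to 9 can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function name `reliability_stats` is hypothetical, and the percentile bins are approximated by cutting the ranked predictions into equal-count groups.

```python
import numpy as np

def reliability_stats(prob, obs, n_bins=10):
    """Reliability-diagram statistics for predicted probabilities.

    prob: predicted probabilities (0-1); obs: observed classes (0 or 1).
    """
    prob = np.asarray(prob, dtype=float)
    obs = np.asarray(obs, dtype=float)

    # Rank the predictions and cut the ranking into equal-count bins,
    # which approximates percentile-defined bin edges (an assumption).
    order = np.argsort(prob, kind="stable")
    bins = np.array_split(order, n_bins)

    p_bin = np.array([prob[idx].mean() for idx in bins])  # mean probability
    f_bin = np.array([obs[idx].mean() for idx in bins])   # observed frequency

    r = np.corrcoef(f_bin, p_bin)[0, 1]                   # Equation 6
    rmse = np.sqrt(np.mean((p_bin - f_bin) ** 2))         # Equation 7
    ab = np.mean(p_bin - f_bin)                           # Equation 8
    rmse_norm = rmse / (prob.max() - prob.min())          # Equation 9
    return {"R": r, "RMSE": rmse, "AB": ab, "RMSE_norm": rmse_norm,
            "P_bin": p_bin, "F_bin": f_bin}
```

A positive AB in this sketch means the binned probabilities exceed the observed frequencies on average, following the sign convention of Equation 8.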

2.8. Presence model sensitivity to predictors

Machine learning is often mistakenly perceived as a “black box” approach that provides outputs without revealing the underlying inference processes. While it may not provide explicit equations describing the inference processes, the models can be used to understand how the target responds to input variations. One such method is estimating the sensitivity of the model, which should not be confused with the commonly used term “sensitivity” associated with true positive rates or recall. In the sensitivity analysis, we fix the values of all input predictors except one, which we vary across its range while observing the changes in the model’s output. In our study, we fix three input predictors at their respective medians and vary the remaining one from the 2.5th percentile to the 97.5th percentile of the total dataset. For example, when estimating the response of each taxon to SST, we use the medians of MLD, SSS, and PAR. We then vary the SST values from 3.7 °C to 18.8 °C and compute the corresponding probability response. Specifically, the medians for SST, MLD, SSS, and PAR are 11.96 °C, 9.27 m, 32.93 PSU, and 29.58 Em⁻²d⁻¹, respectively. It is important to note that the response depends on the fixed values of the other predictors (their medians), which can influence its amplitude. Nevertheless, this approach allows us to evaluate the marginal response to variations in the input across all ranges of variability.
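The one-at-a-time procedure described above can be sketched as follows. The function name `sensitivity_curve` and the `predict` argument are hypothetical: `predict` stands for any callable that maps an array of predictor rows to probabilities, and the number of grid points is an arbitrary choice.

```python
import numpy as np

def sensitivity_curve(predict, X, col, n_points=50):
    """Model response to one predictor with the others held at their medians.

    predict: callable taking an (n, n_features) array, returning n probabilities.
    X: full predictor table; col: index of the predictor to vary.
    """
    X = np.asarray(X, dtype=float)
    medians = np.median(X, axis=0)
    low, high = np.percentile(X[:, col], [2.5, 97.5])
    grid = np.linspace(low, high, n_points)

    # Every row starts as the median vector; only one column is varied.
    design = np.tile(medians, (n_points, 1))
    design[:, col] = grid
    return grid, np.asarray(predict(design), dtype=float)
```

Calling the function once per predictor column would give one curve per input, each conditional on the medians of the remaining predictors.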

2.9. Presence probability maps

We generate presence probability maps for all the taxa analyzed in this study. To ensure consistency between the calibrated model and the probability maps, we apply an 11 × 11 average filter to the input, so that the spatial variability present in the maps matches that on which the models are calibrated. In addition, we exclude grid cells located more than 30 km from the coastline, as the models are calibrated solely using coastal data, and their application is limited to coastal areas. Subsequently, we apply the probabilistic models on a weekly basis for all months and years, yielding what we refer to as the weekly probability maps. These weekly maps are then averaged for each year, resulting in an annual average of weekly probabilities, and the annual averages are in turn averaged from 2006 to 2019.
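The map-processing steps can be sketched with NumPy. This is a hedged illustration: the function names, the array layout (years, weeks, rows, columns), the NaN-aware handling of land cells in the filter, and the precomputed `dist_km` distance-to-coast grid are all assumptions that the text does not specify.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def box_mean(field, size=11):
    """size x size moving average that ignores NaN cells (e.g., land)."""
    field = np.asarray(field, dtype=float)
    half = size // 2
    padded = np.pad(field, half, mode="constant", constant_values=np.nan)
    windows = sliding_window_view(padded, (size, size))
    valid = ~np.isnan(windows)
    total = np.where(valid, windows, 0.0).sum(axis=(-2, -1))
    count = valid.sum(axis=(-2, -1))
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)

def climatological_map(weekly, dist_km, max_dist_km=30.0):
    """Average weekly maps per year, then across years, and mask offshore cells.

    weekly: array of shape (n_years, n_weeks, ny, nx) with weekly probabilities.
    dist_km: (ny, nx) distance of each grid cell to the coastline.
    """
    annual = np.nanmean(weekly, axis=1)   # annual average of weekly maps
    multi_year = annual.mean(axis=0)      # average of the annual maps
    return np.where(np.asarray(dist_km) <= max_dist_km, multi_year, np.nan)
```

In this sketch `box_mean` would be applied to each input field before the model is evaluated, and `climatological_map` to the resulting stack of weekly probability maps.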

2.10. Seasonal probability estimation

We select four distinct sampling locations, namely Arendal, Bømlo, Nærøy, and Vesterålen, to extract regionally averaged seasonal time series from 2014 to 2019. While some regions look similar (e.g., Nærøy and Vesterålen), there is a slight delay as a result of the latitude difference. These regions are specifically chosen for their geographical spread, covering southern to northern Norway, and for having the longest available time series among the nearby shellfish farms. To derive the seasonal probability estimates, we calculate the average probability for each week of the year across all years available in the testing dataset. This approach allows us to capture the typical patterns of presence probability for all taxa under consideration. Additionally, for the four taxa whose HA probability models demonstrate reliable performance, we also compute the seasonal HA probability. HA models with inadequate performance (see Section 3.1) are excluded from the computation of seasonal probability, ensuring that only models with satisfactory results are used in determining the seasonal probability estimates.
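The week-of-year averaging can be sketched as follows. The function name `weekly_climatology` is hypothetical, and numbering the weeks from 1 to 53 is an assumption based on the week ranges reported in the results.

```python
import numpy as np

def weekly_climatology(week, prob, n_weeks=53):
    """Mean probability for each week of the year, pooled over all years.

    week: week-of-year number (1..n_weeks) of each estimate.
    prob: estimated probability of each estimate.
    """
    week = np.asarray(week, dtype=int)
    prob = np.asarray(prob, dtype=float)
    sums = np.bincount(week, weights=prob, minlength=n_weeks + 1)
    counts = np.bincount(week, minlength=n_weeks + 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        clim = sums / counts          # weeks without data become NaN
    return clim[1:]                   # drop the unused week-0 slot
```

The returned array has one entry per week, so entry 0 corresponds to week 1.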

3. Results

3.1. Reliability of the probabilistic models

The presence probability estimates of all taxa are strongly correlated with the observation frequency (Figure 3, Table 2). The R varies from 0.69 to 0.98, the RMSE from 0.8% to 7.8%, and the AB from −5.2% to 5.7%. It is noteworthy that the probabilistic models estimate distinct percentage ranges for each individual taxon. The models for A. tamarense, D. acuta, P. reticulatum, and A. spinosum can estimate low chance of presence (P < 1%), while models for the Alexandrium spp., D. acuminata, D. norvegica, and Pseudo-nitzschia spp. cannot. None of the models can estimate high probabilities of presence—with a maximum probability ranging from 14% for A. spinosum to 69% for D. norvegica.

Figure 3. Reliability diagram. Comparison between the estimated presence probability and the observed presence frequency estimated in 10 bins for all taxa and their linear regression.

Table 2. Statistical results for presence models for the eight taxa studied

Values are estimated by comparing the average probability and the observed presence frequency of the percentile bins shown in Figure 3. The minimum and maximum probabilities returned by our model are reported. * denotes a significant correlation at $ \alpha =0.05 $ .

The model uncertainty analysis shows robust R for most taxa models, where subsampling the training data leads to deviations between −0.1 and 0.1 (Figure 4a). The exception is P. reticulatum, whose R deviations are larger than 0.2. The RMSE is reasonably consistent among the taxa models (Figure 4b), with deviations between −1% and 1%. The AB follows a similar pattern, with deviations also in the −1% to 1% interval (Figure 4c).
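The subsampling experiment behind these deviations (100 random draws of two-thirds of the training data, as described in the caption of Figure 4) can be sketched generically. The names `subsample_deviations`, `fit`, and `score` are hypothetical placeholders; `fit` stands for any routine that trains a model and `score` for any statistic evaluated on the testing dataset, such as R, RMSE, or AB. Drawing without replacement is an assumption.

```python
import numpy as np

def subsample_deviations(fit, score, X_train, y_train, X_test, y_test,
                         n_iter=100, frac=2 / 3, seed=0):
    """Deviation of a test statistic from its median across retrained models."""
    rng = np.random.default_rng(seed)
    X_train = np.asarray(X_train)
    y_train = np.asarray(y_train)
    n_keep = int(round(frac * len(y_train)))
    scores = np.empty(n_iter)
    for k in range(n_iter):
        # Draw a random subset of the training data without replacement.
        idx = rng.choice(len(y_train), size=n_keep, replace=False)
        model = fit(X_train[idx], y_train[idx])
        scores[k] = score(model, X_test, y_test)
    return scores - np.median(scores)
```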

The input data uncertainty is relatively low for most taxa models. The R remains relatively high for most taxa with values above 0.8 independent of input variables with artificial white noise added (Figure 5a). The exception is the P. reticulatum model, showing a general decrease in R down to 0.59. The RMSE is relatively low with values below 8.2% (Figure 5b). A significant increase in RMSE is observed for D. acuminata, being 0.4% higher compared to the reference model. AB shows low changes when white noise is added (Figure 5c). In general, no significant degradation of results is observed when adding white noise that is equivalent to the input error products, except for the P. reticulatum model.
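The noise experiment can be sketched in the same generic way. This is an assumption-laden illustration: the noise is taken to be zero-mean Gaussian with one standard deviation per predictor, the 95% interval is taken as the 2.5th to 97.5th percentile of the ensemble, and `noise_ensemble` and `score` are hypothetical names, with `score` standing for any statistic computed from the perturbed testing inputs.

```python
import numpy as np

def noise_ensemble(score, X_test, y_test, sigma, n_iter=100, seed=0):
    """Mean and 95% interval of a statistic under white noise on the inputs."""
    rng = np.random.default_rng(seed)
    X_test = np.asarray(X_test, dtype=float)
    sigma = np.asarray(sigma, dtype=float)   # one value per predictor column
    scores = np.empty(n_iter)
    for k in range(n_iter):
        # Zero-mean Gaussian noise scaled by each predictor's error (assumed).
        noisy = X_test + rng.normal(0.0, 1.0, X_test.shape) * sigma
        scores[k] = score(noisy, y_test)
    low, high = np.percentile(scores, [2.5, 97.5])
    return scores.mean(), (low, high)
```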

As the sanitary thresholds increase (in % of HA), the quality of the models evolves differently (Figure 6). The models of Alexandrium spp., Alexandrium tamarense, D. acuta, and A. spinosum show little R decrease (remaining significant) and a low increase of $ {\mathrm{RMSE}}_{\mathrm{norm}} $ . The D. norvegica models show a decrease in R toward higher thresholds, where R is still significant up to the 80th percentile (3200 CellsL⁻¹) but decreases to R < 0.5 at HA. The D. norvegica models also show an increase in $ {\mathrm{RMSE}}_{\mathrm{norm}} $ . The models for D. acuminata, Pseudo-nitzschia spp., and P. reticulatum exhibit significant R only at the presence thresholds, and the $ {\mathrm{RMSE}}_{\mathrm{norm}} $ increases rapidly for higher sanitary thresholds. The AB for all taxa models tends to 0% as the thresholds increase (Figure 6c), but this results from the reduction of the predicted range (minimum and maximum predicted, not shown) rather than from an improvement of the models. It is noteworthy that the total number of observations above the sanitary levels (referred to as class = 1) decreases at higher sanitary thresholds for all taxa (Figure 6d), with a maximum of 1974 presence observations for Pseudo-nitzschia spp. and a minimum of 11 HA observations for P. reticulatum.

Figure 4. Model uncertainty for the presence models. The model uncertainties are shown as R (a), RMSE (b), and AB (c) deviations from the median over 100 iterations of randomly subsampling two-thirds of the training dataset for training new models and applying them to the testing dataset. The x-axis is the model for each taxon.

Figure 5. Data input uncertainty for the presence models. The data input uncertainties are shown in R (a), RMSE (b), and AB (c). The blue bars are the reference models (shown in Table 2), and the orange bars are the average of 100 iterations of randomly adding white noise to the testing input dataset. Black lines are the 95% confidence interval. The x-axis is the model for each taxon.

Figure 6. Statistical changes along different sanitary thresholds. The changes of R (a), RMSE (b), AB (c), and total number of samples above the threshold (d) are shown for different sanitary levels of each taxon. The x-axis shows the relative percentile threshold from presence (CellsL⁻¹ ≥ 1) to the HA of each taxon. The black dashed horizontal line in (a) corresponds to the significance threshold for p-value < 0.05.
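As a worked illustration of where such a significance line falls, the smallest significant correlation for 10 bins can be computed from the Student t statistic of a Pearson correlation. The text does not state which test is used, so a two-sided t-test with n − 2 degrees of freedom is assumed here, and `critical_r` is a hypothetical helper name.

```python
import math

def critical_r(t_crit, n):
    """Smallest |R| that is significant for n paired values.

    Inverts t = R * sqrt(n - 2) / sqrt(1 - R^2) for a given critical t.
    """
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)

# Two-sided Student t quantile for alpha = 0.05 and 8 degrees of freedom.
T_CRIT_DF8 = 2.306
r_min = critical_r(T_CRIT_DF8, n=10)   # roughly 0.63
```

Under these assumptions, a correlation computed over 10 bins needs to exceed about 0.63 to be significant at the 0.05 level.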

3.2. Presence models response to environmental input

The presence model response of each taxon shows the probability peak at different intervals of SST (Figure 7a). Among the Dinophysis spp., D. norvegica shows a probability increase in colder waters, D. acuminata shows the highest probability between 10 °C and 11 °C, and D. acuta probability peaks around 18 °C. The Alexandrium spp. and Alexandrium tamarense models show similar responses where the highest probabilities of both are in colder waters near 7.5 °C. Pseudo-nitzschia spp. probability increases toward warmer waters. P. reticulatum exhibits the highest probability near 11 °C, and the A. spinosum probability peaks around 14.5 °C.

Figure 7. Presence model sensitivity to (a) SST, (b) MLD, (c) SSS, and (d) PAR. For each sensitivity simulation, the other predictors are held at a fixed value (the median of the dataset). SST, MLD, SSS, and PAR medians are 11.96 °C, 9.27 m, 32.93 PSU, and 29.58 Em⁻²d⁻¹.

The models’ responses to MLD display two regimes (Figure 7b). Alexandrium spp., A. tamarense, D. acuminata, P. reticulatum, and A. spinosum exhibit increasing probabilities toward shallower MLD (exponential distribution), with amplitudes varying from 4% to 43%. D. acuta shows a slight increase in probability toward MLD at around 40 m. Similarly, Pseudo-nitzschia spp. probabilities also increase toward shallow MLD, peaking around 27 m. In contrast, D. norvegica shows apparently low sensitivity to MLD, with probabilities varying from 17% to 22% across all MLD values.

The influence of SSS on the models of toxic algae reveals three configurations. Probabilities for Alexandrium spp., A. tamarense, D. acuta, and P. reticulatum are lower than 5% at 25 PSU, and increase to more than 10% at 34.5 PSU. D. acuminata and Pseudo-nitzschia spp. exhibit a sharper increase, with probabilities ranging from 15–20% at 25 PSU to 37–46% at 34.5 PSU. A. spinosum probabilities increase from 0% at 25 PSU to 6% at 34.5 PSU. In contrast, D. acuta shows minimal changes across the simulated SSS range. Finally, D. norvegica displays a sharp decrease in probability, from 47% at 25 PSU to 14% at 34.5 PSU.

Increased PAR demonstrates a positive influence on the probabilities of most taxa. The probabilities of D. acuminata, D. norvegica, and Pseudo-nitzschia spp. increase from values below 24% at 7 Em⁻²d⁻¹ to values above 36% at 50 Em⁻²d⁻¹. Alexandrium spp., A. tamarense, and P. reticulatum also show an increase, although to a lesser extent, from values below 5% at 7 Em⁻²d⁻¹ to values above 8% at 50 Em⁻²d⁻¹. D. acuta is the only species showing a substantial decrease in probability with higher values of PAR, from 14% at 7 Em⁻²d⁻¹ to 2% at 50 Em⁻²d⁻¹. Finally, A. spinosum probability shows little response to PAR.

3.3. Presence probability maps

The Alexandrium spp. and Alexandrium tamarense annual averages of weekly probabilities have a similar pattern, but the latter reaches lower probability values (Figure 8). The Alexandrium spp. annual average of weekly probabilities along the Norwegian coast varies from 6% to 18%, while that of A. tamarense varies from 3% to 12%. For both taxa, probabilities are larger in the northern parts of the Norwegian coast. The probability decreases in the southern regions and is lowest in the Skagerrak Strait. The annual averages of weekly probabilities of the three Dinophysis species also have similar patterns but show different amplitudes. D. acuta shows the lowest probability values, varying from 2% to 9%, D. acuminata varies from 17% to 29%, and D. norvegica varies from 12% to 35%. For all Dinophysis species, the highest probability is found along the southern Norwegian coast in the Skagerrak, and it decreases northward. Pseudo-nitzschia spp. probabilities are similar from the western to the northern Norwegian coast but decrease in the Skagerrak Strait, ranging from 17% to 35%. P. reticulatum probabilities are nearly homogeneous along the entire Norwegian coast, with the lowest probability in the Oslofjord and far northern coastal waters, varying from 3% to 7%. Finally, A. spinosum shows the lowest annual average of weekly probabilities, varying from 0% in the Skagerrak Strait to 3% on the western and northern coast.

Figure 8. Spatial distribution of the annual average of the weekly probability (in %) predicted by the presence models for Alexandrium spp. (a), Alexandrium tamarense (b), D. acuta (c), D. acuminata (d), D. norvegica (e), Pseudo-nitzschia spp. (f), P. reticulatum (g), and A. spinosum (h). The prediction period spans 2006 to 2019.

3.4. Seasonal presence and HA probabilities

The presence probability of Alexandrium spp. displays significant seasonal variability in Bømlo, Vesterålen, and Nærøy, peaking from week 10 to 30 (Figure 9). However, in the Arendal location, the seasonal probabilities remain low. A. tamarense also exhibits seasonal variability in all four regions, with a brief period of increasing probability from week 10 to 20 in Arendal, and a longer period from week 1 to 40 in the other regions.

Figure 9. Seasonal variability of the presence probability of (a) Alexandrium spp., (b) Alexandrium tamarense, (c) D. acuta, (d) D. acuminata, (e) D. norvegica, (f) Pseudo-nitzschia spp., (g) P. reticulatum, and (h) A. spinosum. Seasonal probabilities are shown for the regions of Arendal (blue), Bømlo (orange), Vesterålen (green), and Nærøy (red).

Dinophysis spp. show distinct seasonal periods of increased presence probability. The D. acuta probability starts increasing between weeks 20 and 25 and decreases to values close to 0% from week 45 onward. On the other hand, D. acuminata and D. norvegica exhibit probabilities higher than 10% throughout the year, indicating their presence in all seasons, including winter. The timing of increased probability for D. acuminata is delayed the further north the sampled region is located, starting in week 11 in Arendal and week 20 in Vesterålen. The seasonal probability amplitude of D. norvegica varies significantly among the regions. Arendal shows probabilities higher than 50% from week 13 to 30, while the other regions only reach values up to 30% in shorter periods.

In most regions, except Nærøy, the presence probabilities of Pseudo-nitzschia spp. display high-frequency changes and a weak seasonal pattern. Similar to D. acuminata and D. norvegica, the probabilities remain higher than 10% throughout the year, indicating that Pseudo-nitzschia spp. may be detected year-round. P. reticulatum exhibits increased probabilities in short periods, shifting to later in the year as the sampled region moves northward. In Arendal, the window of increased probability is from week 15 to 30, while in Vesterålen, it extends from week 20 to 37. The seasonal detection probability of A. spinosum is generally low, varying from 0% to 9%. In Arendal, the increased probability period is from week 30 to 53, whereas in the other regions, it is from week 25 to 40–53.

The seasonal HA probabilities of Alexandrium spp., A. tamarense, D. acuta, and A. spinosum are characterized by shorter periods compared to their overall presence probabilities (Figure 10), indicating that while these taxa can be detected throughout the year, the occurrence of HA is limited to shorter time windows. For instance, in Bømlo, the seasonal period with elevated HA probability for A. tamarense (Figure 10f) is between weeks 10 and 30, while the presence probability remains above 0% throughout the entire year. This pattern of HA restriction is consistent across different taxa and in other regions as well. Furthermore, all the HA observed from 2014 to 2019 only occurred in the shortened period of increased HA probabilities.

Figure 10. Seasonal presence (in blue) and HA probabilities (in orange) of Alexandrium spp. (a, b, c, and d); Alexandrium tamarense (e, f, g, and h); D. acuta (i, j, k, and l); and A. spinosum (m, n, o, and p) from 2014 to 2019. Probabilities are shown for the regions of Arendal (column 1: a, e, i, and m); Bømlo (column 2: b, f, j, and n); Nærøy (column 3: c, g, k, and o); and Vesterålen (column 4: d, h, l, and p). HA observations by the local monitoring from 2014 to 2019 are shown as red columns.

4. Discussion

4.1. SVM skill in modeling harmful algae probability

Only a few machine learning techniques for modeling the probability of toxic algae have been explored, including GLM (Anderson et al., 2011), decision trees (Bouquet et al., 2022), and GBM (Klemm et al., 2022). We demonstrate that SVM is a highly reliable approach for estimating the presence and HA probability of toxic algae in Norwegian coastal waters. By employing an RBF kernel, the SVM model is able to learn distinct responses to different environmental inputs. For instance, the response of D. acuminata to SST exhibits a bell-like shape, while its response to MLD follows an exponential pattern. The parameter $ \gamma $ plays a crucial role in controlling the smoothness of these responses, and careful fine-tuning is necessary to avoid overfitting the model to the training dataset. To ensure realistic and smooth responses without overfitting, $ \gamma $ has to be set as a function of the input variables (Equation 2). Furthermore, most of the taxa models, except for P. reticulatum, demonstrate a low model uncertainty, indicating that the SVM converges to similar solutions despite the randomness introduced by subsampling the training data. Modeling the P. reticulatum presence proves more challenging because of the higher model uncertainty and inferior performance.
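To illustrate why $ \gamma $ controls smoothness, the standard RBF kernel, $ K\left(x,y\right)=\exp \left(-\gamma {\left\Vert x-y\right\Vert}^2\right) $ , can be evaluated for two values of $ \gamma $ . This is a generic sketch of the kernel only; it does not reproduce Equation 2, and the input values and the function name `rbf_kernel` are hypothetical.

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2) between two input vectors."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

a = [10.0, 9.0]   # hypothetical (SST, MLD) inputs
b = [12.0, 9.0]
smooth = rbf_kernel(a, b, gamma=0.01)   # small gamma: points stay similar
sharp = rbf_kernel(a, b, gamma=1.0)     # large gamma: similarity decays fast
```

With a small $ \gamma $ , distant samples still influence each other and the fitted response varies slowly; with a large $ \gamma $ , each sample only influences its immediate neighborhood, which allows sharper responses and increases the risk of overfitting.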

It is important to acknowledge that the SVM models calibrated with SST, MLD, SSS, and PAR cannot estimate a 100% probability, because we only consider a few of the very many potential input predictors. Other inputs such as prey availability, nutrients, and grazing (Kim et al., 2008; Smayda, 2008; Wells et al., 2020) may be critical but are not considered here since they are not available for a long enough period. Nevertheless, the SVM algorithm provides the minimum and maximum probability range associated with the available set of inputs. With longer time series of observations in the future, additional input predictors can be incorporated, reliability improved, and the range of predictable probability can be expanded.

The reliability of the probabilistic models is likely impacted by the input quality (see Section 2.3). We estimate the input uncertainty as relatively small, except for P. reticulatum, considering the input errors assessed at their original spatial resolution and on a global scale ( $ \sigma $ in Equation 5). However, when we average each farm time series at a 44 km resolution, we might introduce additional input errors that are not included in the estimation of data input uncertainty. For example, episodic freshwater input from small rivers may introduce high spatial variability of SSS near the coast (Frigstad et al., 2020). The SSS response for most taxa is highly variable in the 30–35 PSU range (Figure 7), and high spatial variability may cause a mismatch between the observation frequency in the farm (induced by the local SSS) and the probability modeled with SSS averaged at 44 km resolution. Although we have not estimated the effect of coarser resolution on the model skill, it should not be severe since we can still produce reliable models using a 44 km resolution.

When the sanitary thresholds are increased to HA levels for calibrating the probabilistic models, the R and RMSE remain relatively unchanged for Alexandrium spp., A. tamarense, D. acuta, and A. spinosum, while they substantially worsen for D. norvegica, D. acuminata, Pseudo-nitzschia spp., and P. reticulatum. The decrease in the number of samples with class = 1 available for training the SVM model likely contributed to this deterioration. For instance, there are 1586 observations of presence available for training the presence model of D. norvegica (threshold ≥ 1 CellsL⁻¹), while there are only 13 observations of HA available for the HA model (threshold > 4000 CellsL⁻¹) of the same species. The decrease of observations representing positive instances as the threshold increases constrains the patterns that the SVM can learn and, importantly, the remaining data available for evaluating the model’s performance.

The small changes in R and RMSE for the Alexandrium spp., A. tamarense, and D. acuta models may be attributed to the small difference between the presence threshold (1 CellsL⁻¹) and the HA threshold (200 CellsL⁻¹). For the Pseudo-nitzschia spp., D. acuminata, D. norvegica, and P. reticulatum HA models, the HA observations are too few to expect good skill. In the coming years, with longer periods of training data, we may be able to improve our estimation for these taxa. Furthermore, the conditions leading to HA are often more complex than those for presence as they depend on compound environmental factors.

4.2. Environmental influence: geographical and seasonal variability

The analysis of the simulated environmental responses provides insights into how each taxon responds to different environmental conditions. However, it is important to note that we provide an analysis of the response to one input predictor using fixed values (median) for the other input predictors. For example, the probability response to SST is observed at a PAR of 29.58 Em⁻²d⁻¹. If we set PAR to 0 Em⁻²d⁻¹, the SST probability response decreases close to zero, but the general variability of the response stays the same (not shown). Our primary interest lies in the relative probability, and how the responses may explain the geographical and seasonal variability of each taxon group.

The simulated responses indicate that the probability of presence for most taxonomic groups increases with PAR, except for D. acuta and A. spinosum. This observation aligns with previous studies that have established a positive correlation between PAR and the occurrence or growth rates of Alexandrium spp. (Anderson et al., 2012), D. acuminata (Kim et al., 2008), Pseudo-nitzschia spp. (Bates et al., 2018), and P. reticulatum (Paz et al., 2006). Comparative analysis between D. acuminata and D. acuta has revealed that the latter is more susceptible to photodamage under high light intensity (García-Portela et al., 2018). This susceptibility may explain the higher probability of D. acuta in relatively low PAR and its typical bloom period in autumn (Dahl and Johannessen, 2001; Naustvoll et al., 2012). A. spinosum growth rates measured in the laboratory show low sensitivity to light intensity (Jauffrais et al., 2013), which may contribute to its weak probability response to PAR changes. It is worth noting that D. acuta and A. spinosum, like most other taxonomic groups, cannot grow in complete darkness, resulting in close-to-zero presence probabilities during winter months (Figures 9 and 10). Therefore, PAR may limit the presence of toxic algae during winter and contribute to restricting the occurrence of D. acuta to the autumn season. However, PAR may not entirely limit the latitudinal expansion of the annual average probability of a few taxa. For instance, Alexandrium spp. and A. tamarense exhibit higher presence probabilities in the northern region, even though polar nights are more prolonged.

Blooms of Alexandrium spp. and Dinophysis spp. are commonly associated with stratified waters (Klemm et al., 2022; Reguera et al., 2012). The simulated response supports these associations, as the presence probability of D. acuminata, Alexandrium spp., and A. tamarense increases in shallower MLD, which is commonly correlated with more stratified waters. Note that MLD is also commonly associated with other important features related to HABs, such as nutricline depth and upwelling (Hällfors et al., 2011; Paulino et al., 2018; Peralta-Ferriz and Woodgate, 2015; Rial et al., 2023). The presence probability in shallower MLD can be up to four times higher than in deeper MLD, as observed for the response of A. tamarense. In addition, the probabilities of Pseudo-nitzschia spp., P. reticulatum, and A. spinosum also increase in shallower MLD. In contrast, the presence probability of D. acuta increases toward shallower MLD but does not peak at the lowest MLD values, and D. norvegica shows relatively low sensitivity to water stratification. Similar to PAR, the probability response to MLD may contribute to the seasonal detection patterns. MLD is typically deeper in winter and begins to shallow in spring due to surface heating and freshwater input (Peralta-Ferriz and Woodgate, 2015). Therefore, MLD is expected to contribute to the lower presence probabilities during winter for most species.

The evaluated taxa exhibit an increased presence probability in more saline waters within the salinity range from 25 to 34.5 PSU, except for D. norvegica. Therefore, the potential impact of salinity on the geographic distribution of toxic algae should be considered. D. norvegica shows the highest annual probabilities in the Skagerrak Strait, where relatively fresh waters are prevalent due to inflows from the Baltic Sea (Eldevik et al., 2009; Furevik et al., 2002). Note that higher D. norvegica presence toward relatively fresher waters, in the 25–34.5 PSU range, was also found on the Canadian coast (Boivin-Rioux et al., 2022). However, it should be acknowledged that salinity is a good tracer of water masses of different origins in the Norwegian Sea. The NCC water, which originates from river inflow and the Baltic Sea, is fresher than the NwAC water of tropical and Atlantic origin. This might have an influence on the occurrence of D. norvegica. Additionally, the highest probabilities of Alexandrium spp. and A. tamarense in northern Norway can be attributed to the lower freshwater input compared to southern Norway (Frigstad et al., 2020; Furevik et al., 2002).

The response to SST exhibits the greatest variability among the four inputs employed. It is important to note that the response to SST is of great concern, as it is commonly assumed that an increase in temperature may lead to increased prevalence and intensity of HABs (Wells et al., 2020). However, our modeled response from 3.7 to 18.8 °C suggests that temperature may primarily influence the selection of the most common taxon rather than causing a general increase of the presence probability of all toxic algae. Therefore, without considering changes in SSS, MLD, and PAR, an increase in temperature could favor taxa associated with warmer waters at the expense of those related to colder waters. Further investigation is needed to explore this aspect, which we intend to address in future studies. In terms of geographical distribution, SST may also explain why the highest probabilities of Alexandrium spp. and A. tamarense occur in colder regions of northern Norway, while D. acuta and D. acuminata are more prevalent in warmer southern Norway. Notably, D. norvegica shows an increase in probability with lower temperatures but is still more abundant in southern Norway, likely due to the strong influence of low salinity as previously mentioned. Finally, Pseudo-nitzschia spp. exhibit a wide range of SST tolerance, which might explain their increased presence probability along the entire Norwegian coast.

4.3. Improving local monitoring and mitigation actions

The current monitoring protocol assesses the occurrence and levels of harmful algae taxa on a weekly basis and measures their respective toxins in shellfish monthly. However, our analysis presented in Figures 8, 9, and 10 indicates that the presence and HA probabilities vary significantly across geographical regions and seasons for the different taxa. For instance, the annual presence probability of A. tamarense is five times higher in northern Norway compared to southern Norway, and the presence probability of D. acuta is up to six times higher in autumn than in spring. The current monitoring effort remains consistent throughout the main algae growth season, regardless of the likelihood of harmful algae risks. During periods of elevated presence or HA risk, it could be beneficial to increase the monitoring frequency, employ faster analysis methods for algae abundance and toxin concentration, and provide general guidance to the public regarding the consumption of wild shellfish. The aquaculture industry could better optimize their harvesting plans and thus contribute to increasing safety. Furthermore, climate decadal variability is large and predictable in the region (Smith et al., 2020). Refined information on the toxic algae evolution in the coming decade could be relevant for the farmers and the monitoring agency.

In our study, we calibrate probabilistic models for eight toxic taxa monitored along the Norwegian coast, with thresholds ranging from presence to HA. Models focusing on HA probability are more desirable as they are more closely related to toxin accumulation in shellfish, while also being able to pinpoint the most dangerous periods (Figure 10). HA probabilistic models of Alexandrium spp., A. tamarense, D. acuta, and A. spinosum can be applied as they are well correlated with their observation frequencies. However, the HA models for other taxa are not correlated with observation frequency and should not be employed. For D. norvegica, the model for abundances exceeding 3600 cells L⁻¹ (80th percentile for HA) demonstrates satisfactory performance and can be utilized for tailored assessments. In the case of P. reticulatum, only the presence model exhibits good results. Nonetheless, it still provides valuable information for monitoring purposes. For instance, seasons with nearly 0% probability of presence (Figure 9g) suggest periods that are safer from P. reticulatum toxin contamination. The presence model for D. acuminata cannot identify periods with a close-to-zero probability. However, it can still identify seasons with varying likelihoods of detection. As for Pseudo-nitzschia spp., the HA model needs improvement to provide useful information. Although the presence probability of Pseudo-nitzschia spp. demonstrates good correlation with observed frequency, this genus is commonly found in Norwegian waters in all productive seasons (Hasle et al., 1996), which makes the presence probability model less relevant.

An even more preferable product would be the probability of toxin accumulation in blue mussels, an approach not addressed in this study for two reasons. First, the modeling becomes more complex as the toxin accumulation depends on the algae abundance and toxicity, and on the mussel feeding and starving periods (Aasen et al., 2005; Smith et al., 2018; Röder et al., 2012; Lindahl et al., 2007; Nielsen et al., 2016; Svensson, 2003; Duinker et al., 2007). Second, toxin data are collected monthly, and thus there are fewer data to train the machine learning models, roughly 25% of the amount of algae abundance data. This amount of data is insufficient to produce skillful probabilistic models, as it must include training and testing datasets large enough to be evaluated on reliability diagrams. Nevertheless, we foresee the probability modeling of toxin accumulation in blue mussels as a viable option in the future when more data become available.
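The dependence of a reliability diagram on sample size can be illustrated with a minimal sketch. It assumes, as the caption of Figure 3 suggests, that predicted probabilities are grouped into 10 bins and compared with the observed frequency in each bin, and that R and RMSE are computed between the two; the function names and the synthetic, perfectly calibrated data are hypothetical and are not taken from the study's code.

```python
import numpy as np


def reliability_curve(prob, observed, n_bins=10):
    """Mean predicted probability and observed frequency per probability bin."""
    prob = np.asarray(prob, dtype=float)
    observed = np.asarray(observed, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin index of each sample; clip so that prob == 1.0 falls in the last bin.
    idx = np.clip(np.digitize(prob, edges) - 1, 0, n_bins - 1)
    mean_pred, obs_freq, counts = [], [], []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():  # skip empty bins
            mean_pred.append(prob[mask].mean())
            obs_freq.append(observed[mask].mean())
            counts.append(int(mask.sum()))
    return np.array(mean_pred), np.array(obs_freq), np.array(counts)


def reliability_scores(mean_pred, obs_freq):
    """Pearson R and RMSE between predicted probability and observed frequency."""
    r = float(np.corrcoef(mean_pred, obs_freq)[0, 1])
    rmse = float(np.sqrt(np.mean((mean_pred - obs_freq) ** 2)))
    return r, rmse


# Synthetic, perfectly calibrated forecasts (an assumption for illustration):
# each event occurs with exactly the forecast probability.
rng = np.random.default_rng(1)
results = {}
for n_samples in (200, 20000):
    p = rng.uniform(0.0, 1.0, n_samples)
    y = rng.uniform(size=n_samples) < p
    mean_pred, obs_freq, counts = reliability_curve(p, y)
    r, rmse = reliability_scores(mean_pred, obs_freq)
    results[n_samples] = (r, rmse, counts)
    print(f"n={n_samples}: R={r:.3f}, RMSE={rmse:.3f}, smallest bin={counts.min()}")
```

Even for a perfectly calibrated model, the observed frequency in each bin is a noisy estimate when only a few samples fall into it, so a small testing set tends to give a scattered diagram regardless of model quality.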

5. Future perspectives

This study develops SVM probabilistic models for estimating the probability of eight toxic algae along the Norwegian coast. The models, with thresholds ranging from presence to HA, estimate the probability of presence well and particularly excel for HA levels of Alexandrium spp., A. tamarense, D. acuta, and A. spinosum. Feeding the probabilistic model with observations or predictions of SST, MLD, SSS, and PAR can provide crucial insights into periods and regions of increased risk, enabling local authorities and actors to devise enhanced monitoring strategies to prevent shellfish poisoning outbreaks and mitigate production losses in shellfish farms. Predictions can be provided by subseasonal-to-seasonal forecasts and seasonal-to-decadal predictions (Meehl et al., 2021) that are relatively skillful in the region (Doblas-Reyes et al., 2013; Passos et al., 2023; Langehaug et al., 2017; Wang et al., 2019; Bethke et al., 2021). These forecasts are probabilistic (provided as ensembles) and can be easily used by our probabilistic model. The model can also be used to infer long-term projections of the occurrence frequency of toxic algae by using future projections of SST, MLD, and SSS (Carvalho et al., 2021; Davy and Outten, 2020). The integration of probabilistic models could empower decision-makers with evidence-based tools to proactively safeguard public health and ensure the resilience of coastal ecosystems, mitigating the impacts of toxic algae contamination and promoting sustainable and safe shellfish production and consumption. How decisions would be made based on the probabilistic output, and how they should be communicated, is yet to be planned through a continuous dialogue that includes farmers and monitoring authorities.
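As a rough illustration of how an ensemble forecast could be passed through such a model, the sketch below trains a scikit-learn SVM with Platt-scaled probability output on synthetic data and averages the predicted presence probability over the members of a hypothetical ensemble. The training data, the cold-water response, the value ranges other than the 3.7 to 18.8 °C SST range, the ensemble values, and the choice to average member probabilities are all assumptions made for illustration; they do not reproduce the calibrated models or the data of this study.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for the training set; columns are SST, PAR, MLD, SSS.
# Only the SST range follows the text; the other ranges are illustrative.
n_train = 400
X_train = np.column_stack([
    rng.uniform(3.7, 18.8, n_train),   # SST (deg C)
    rng.uniform(5.0, 60.0, n_train),   # PAR (illustrative units)
    rng.uniform(5.0, 100.0, n_train),  # MLD (m)
    rng.uniform(20.0, 35.0, n_train),  # SSS
])
# Toy response (assumption): presence is more likely in colder water.
p_true = 1.0 / (1.0 + np.exp(0.5 * (X_train[:, 0] - 10.0)))
y_train = (rng.uniform(size=n_train) < p_true).astype(int)

# SVM classifier with probability output (Platt scaling inside scikit-learn).
model = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", probability=True, random_state=0),
)
model.fit(X_train, y_train)

# Hypothetical 10-member ensemble forecast for one location and date:
# each row is one member's SST, PAR, MLD, SSS.
n_members = 10
center = np.array([8.0, 30.0, 20.0, 33.0])
spread = np.array([1.0, 3.0, 5.0, 0.3])
ensemble = center + spread * rng.standard_normal((n_members, 4))

# Presence probability for each member (column 1 corresponds to class 1),
# then the ensemble-mean probability.
member_prob = model.predict_proba(ensemble)[:, 1]
ensemble_prob = float(member_prob.mean())
print(f"member probabilities: {np.round(member_prob, 2)}")
print(f"ensemble-mean presence probability: {ensemble_prob:.2f}")
```

Averaging over members is only one simple way to combine them; the spread of the member probabilities could also be reported as a measure of forecast uncertainty.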

Notably, our method can be extended to other coastal locations and help improve the safeguarding of public health and reduce economic impacts. The machine learning modeling relies only on environmental drivers commonly available through remote sensing and modeled reanalysis, which are not limited to a specific region or national territory and can be obtained for various coastal waters. HAB observations used for training the model come from the Norwegian national program of toxic algae monitoring, which has infrastructure and data collection practices similar to those in other regions. The calibrated probabilistic models for harmful algae presence perform well for eight taxa, suggesting good prospects for expanding to other species monitored in other locations. Extending our proposed method to other coastal waters has the potential to enhance other monitoring programs and proactive mitigation actions to protect public health and the aquaculture industry.

Acknowledgements

The toxic algae data were obtained with permission from the monitoring program of algae toxins in mussels and dietetic advice to the public (https://www.matportalen.no/verktoy/blaskjellvarsel/), operated by the Norwegian Food Safety Authority (NFSA). GlobColour data (http://globcolour.info) used in this study has been developed, validated, and distributed by ACRI-ST, France. This study has been conducted using E.U. Copernicus Marine Service Information; https://doi.org/10.48670/moi-00169, https://doi.org/10.48670/moi-00007. The authors want to thank Jiping Xie for making the TOPAZ4 MLD and SSS data available to us.

Author contribution

Conceptualization: E.S., J.B., F.C. Data curation: E.S. Formal analysis: E.S., J.B., F.C. Funding acquisition: J.B., F.C., L.P. Investigation: E.S., J.B., F.C., L.P., L.N. Methodology: E.S. Supervision: J.B., F.C. Validation: E.S., J.B., F.C. Visualization: E.S. Writing—original draft: E.S. Writing—review and editing: E.S., J.B., F.C., L.P., L.N.

Competing interest

None declared.

Data availability statement

Toxic algae data can be provided on request by the Norwegian Food Safety Authority (NFSA). PAR satellite data can be freely accessed on the GlobColour portal (www.globcolour.info). SST, MLD, and SSS can be freely accessed through the CMEMS portal (https://doi.org/10.48670/moi-00169, https://doi.org/10.48670/moi-00007). Code is available at https://doi.org/10.5281/zenodo.10671482.

Ethics statement

The research meets all ethical guidelines, including adherence to the legal requirements of the study country.

Funding statement

ES is a holder of an institute research fellowship (INSTSTIP) funded by the basic institutional funding through the Norwegian Research Council (#318085). FC acknowledges the Trond Mohn Foundation under project number: BFS2018TMT01. JB acknowledges the NFR Climate Futures (309562).

References

Aasen, J, Samdal, IA, Miles, CO, Dahl, E, Briggs, LR and Aune, T (2005) Yessotoxins in Norwegian blue mussels (Mytilus edulis): Uptake from Protoceratium reticulatum, metabolism and depuration. Toxicon 45, 265–272.

Anderson, CR, Kudela, RM, Benitez-Nelson, C, Sekula-Wood, E, Burrell, CT, Chao, Y, Langlois, G, Goodman, J and Siegel, DA (2011) Detecting toxic diatom blooms from ocean color and a regional ocean model. Geophysical Research Letters 38, L04603. https://doi.org/10.1029/2010GL045858.

Anderson, DM, Alpermann, TJ, Cembella, AD, Collos, Y, Masseret, E and Montresor, M (2012) The globally distributed genus Alexandrium: Multifaceted roles in marine ecosystems and impacts on human health. Harmful Algae 14, 10–35.

Basti, L, Suzuki, T, Uchida, H, Kamiyama, T and Nagai, S (2018) Thermal acclimation affects growth and lipophilic toxin production in a strain of cosmopolitan harmful alga Dinophysis acuminata. Harmful Algae 73, 119–128.

Bates, SS, Hubbard, KA, Lundholm, N, Montresor, M and Leaw, CP (2018) Pseudo-nitzschia, Nitzschia, and domoic acid: New research since 2011. Harmful Algae 79, 3–43.

Bethke, I, Wang, Y, Counillon, F, Keenlyside, N, Kimmritz, M, Fransner, F, Samuelsen, A, Langehaug, H, Svendsen, L, Chiu, P-G, Passos, L, Bentsen, M, Guo, C, Gupta, A, Tjiputra, J, Kirkevåg, A, Olivié, D, Seland, Ø, Vågane, JS, Fan, Y and Eldevik, T (2021) NorCPM1 and its contribution to CMIP6 DCPP. Geoscientific Model Development 14, 7073–7116.

Bill, B, Cochlan, WP and Trainer, VL (2012) The effect of light on growth rate and primary productivity in Pseudo-nitzschia australis and Pseudo-nitzschia turgidula. In Proceedings of the 14th International Conference on Harmful Algae. International Society for the Study of Harmful Algae and Intergovernmental Oceanographic Commission of UNESCO, Paris, France, 78–80.

Bleck, R (2002) An oceanic general circulation model framed in hybrid isopycnic-Cartesian coordinates. Ocean Modelling 4, 55–88.

Boivin-Rioux, A, Starr, M, Chassé, J, Scarratt, M, Perrie, W, Long, Z and Lavoie, D (2022) Harmful algae and climate change on the Canadian East Coast: Exploring occurrence predictions of Dinophysis acuminata, D. norvegica, and Pseudo-nitzschia seriata. Harmful Algae 112, 102183.

Bouquet, A, Laabir, M, Rolland, JL, Chomérat, N, Reynes, C, Sabatier, R, Felix, C, Berteau, T, Chiantella, C and Abadie, E (2022) Prediction of Alexandrium and Dinophysis algal blooms and shellfish contamination in French Mediterranean lagoons using decision trees and linear regression: A result of 10 years of sanitary monitoring. Harmful Algae 115, 102234.

Bröcker, J and Smith, LA (2007) Increasing the reliability of reliability diagrams. Weather and Forecasting 22, 651–661.

Carvalho, D, Pereira, SC and Rocha, A (2021) Future surface temperatures over Europe according to CMIP6 climate projections: An analysis with original and bias-corrected data. Climatic Change 167, 10.

Castberg, T, Torgersen, T, Aasen, J, Aune, T and Naustvoll, L-J (2004) Diarrhoetic shellfish poisoning toxins in Cancer pagurus Linnaeus, 1758 (Brachyura, Cancridae) in Norwegian waters. Sarsia 89, 311–317.

Chen, W, Schulz-Stellenfleth, J, Grayek, S and Staneva, J (2021) Impacts of the assimilation of satellite sea surface temperature data on volume and heat budget estimates for the North Sea. Journal of Geophysical Research: Oceans 126, e2020JC017059. https://doi.org/10.1029/2020JC017059.

Cortes, C and Vapnik, V (1995) Support-vector networks. Machine Learning 20, 273–297.

Cruz, RC, Costa, PR, Vinga, S, Krippahl, L and Lopes, MB (2021) A review of recent machine learning advances for forecasting harmful algal blooms and shellfish contamination. Journal of Marine Science and Engineering 9, 283.

Dahl, E and Johannessen, T (2001) Relationship between occurrence of Dinophysis species (Dinophyceae) and shellfish toxicity. Phycologia 40, 223–227.

Davy, R and Outten, S (2020) The Arctic surface climate in CMIP6: Status and developments since CMIP5. Journal of Climate 33, 8047–8068.

Doblas-Reyes, FJ, Andreu-Burillo, I, Chikamoto, Y, García-Serrano, J, Guemas, V, Kimoto, M, Mochizuki, T, Rodrigues, LR and Oldenborgh, GJV (2013) Initialized near-term regional climate change prediction. Nature Communications 4, 1–9.

Duinker, A, Bergslien, M, Strand, O, Olseng, C and Svardal, A (2007) The effect of size and age on depuration rates of diarrhetic shellfish toxins (DST) in mussels (Mytilus edulis L.). Harmful Algae 6, 288–300.

Eldevik, T, Nilsen, JE, Iovino, D, Olsson, KA, Sandø, AB and Drange, H (2009) Observed sources and variability of Nordic Seas overflow. Nature Geoscience 2, 406–410.

Evensen, G (2003) The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dynamics 53, 343–367.

Fehling, J, Green, DH, Davidson, K, Bolch, CJ and Bates, SS (2004) Domoic acid production by Pseudo-nitzschia seriata (Bacillariophyceae) in Scottish waters. Journal of Phycology 40, 622–630.

Ferreira, AS, Hátún, H, Counillon, F, Payne, MR and Visser, AW (2015) Synoptic-scale analysis of mechanisms driving surface chlorophyll dynamics in the North Atlantic. Biogeosciences 12, 3641–3653.

Frigstad, H, Kaste, Ø, Deininger, A, Kvalsund, K, Christensen, G, Bellerby, RG, Sørensen, K, Norli, M and King, AL (2020) Influence of riverine input on Norwegian coastal systems. Frontiers in Marine Science 7, 332. https://doi.org/10.3389/fmars.2020.00332.

Frouin, R, Franz, B and Wang, M (2003) Algorithm to estimate PAR from SeaWiFS data version 1.2-documentation. NASA Tech Memo 206892, 46–50.

Furevik, T, Bentsen, M, Drange, H, Johannessen, JA and Korablev, A (2002) Temporal and spatial variability of the sea surface salinity in the Nordic Seas. Journal of Geophysical Research: Oceans 107, 1–16.

García-Portela, M, Riobó, P, Reguera, B, Garrido, JL, Blanco, J and Rodríguez, F (2018) Comparative ecophysiology of Dinophysis acuminata and D. acuta (Dinophyceae, Dinophysiales): Effect of light intensity and quality on growth, cellular toxin content, and photosynthesis. Journal of Phycology 54, 899–917.

Giesen, RH, Andreassen, LM, Oerlemans, J and Broeke, MRVD (2014) Surface energy balance in the ablation zone of Langfjordjøkelen, an Arctic, maritime glacier in Northern Norway. Journal of Glaciology 60, 57–70.

Good, S, Fiedler, E, Mao, C, Martin, MJ, Maycock, A, Reid, R, Roberts-Jones, J, Searle, T, Waters, J, While, J and Worsfold, M (2020) The current configuration of the OSTIA system for operational production of foundation sea surface temperature and ice concentration analyses. Remote Sensing 12, 1–20.

Guerrini, F, Ciminiello, P, Dell’Aversano, C, Tartaglione, L, Fattorusso, E, Boni, L and Pistocchi, R (2007) Influence of temperature, salinity and nutrient limitation on yessotoxin production and release by the dinoflagellate Protoceratium reticulatum in batch-cultures. Harmful Algae 6, 707–717.

Hasle, GR, Lange, CB and Syvertsen, EE (1996) A review of Pseudo-nitzschia, with special reference to the Skagerrak, North Atlantic, and adjacent waters. Helgoländer Meeresuntersuchungen 50, 131–175.

Hastie, T, Tibshirani, R and Friedman, JH (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edn. New York, NY: Springer.

Hoagland, P, Anderson, DM, Kaoru, Y and White, AW (2002) The economic effects of harmful algal blooms in the United States: Estimates, assessment issues, and information needs. Estuaries 25, 819–837.

Hoagland, P and Scatasta, S (2006) The Economic Effects of Harmful Algal Blooms. Berlin Heidelberg: Springer, pp. 391–402.

Hordoir, R, Dieterich, C, Basu, C, Dietze, H and Meier, H (2013) Freshwater outflow of the Baltic Sea and transport in the Norwegian current: A statistical correlation analysis based on a numerical experiment. Continental Shelf Research 64, 1–9.

Hunke, EC and Dukowicz, JK (1997) An elastic–viscous–plastic model for sea ice dynamics. Journal of Physical Oceanography 27, 1849–1867.

Hällfors, H, Hajdu, S, Kuosa, H and Larsson, U (2011) Vertical and temporal distribution of the dinoflagellates Dinophysis acuminata and D. norvegica in the Baltic Sea. Boreal Environment Research 16, 121–135.

Jakowczyk, M and Stramska, M (2014) Spatial and temporal variability of satellite-derived sea surface temperature in the Barents Sea. International Journal of Remote Sensing 35, 6545–6560.

Jauffrais, T, Séchet, V, Herrenknecht, C, Truquet, P, Véronique, S, Tillmann, U and Hess, P (2013) Effect of environmental and nutritional factors on growth and azaspiracid production of the dinoflagellate Azadinium spinosum. Harmful Algae 27, 138–148.

Jin, D, Moore, S, Holland, D, Anderson, L, Lim, W-A, Kim, D, Jardine, S, Martino, S, Gianella, F and Davidson, K (2020) Evaluating the Economic Impacts of Harmful Algal Blooms: Issues, Methods, and Examples, p. 5. PICES Sci. Rep., no. 59 edition.

Karlson, B, Andersen, P, Arneborg, L, Cembella, A, Eikrem, W, John, U, West, JJ, Klemm, K, Kobos, J, Lehtinen, S, Lundholm, N, Mazur-Marzec, H, Naustvoll, L, Poelman, M, Provoost, P, Rijcke, MD and Suikkanen, S (2021) Harmful algal blooms and their effects in coastal seas of Northern Europe. Harmful Algae 102, 101989.

Kim, S, Kang, Y, Kim, H, Yih, W, Coats, D and Park, M (2008) Growth and grazing responses of the mixotrophic dinoflagellate Dinophysis acuminata as functions of light intensity and prey concentration. Aquatic Microbial Ecology 51, 301–310.

Kirst, GO (1990) Salinity tolerance of eukaryotic marine algae. Annual Review of Plant Physiology and Plant Molecular Biology 41, 21–53.

Klemm, K, Cembella, A, Clarke, D, Cusack, C, Arneborg, L, Karlson, B, Liu, Y, Naustvoll, L, Siano, R, Gran-Stadniczeñko, S and John, U (2022) Apparent biogeographical trends in Alexandrium blooms for Northern Europe: Identifying links to climate change and effective adaptive actions. Harmful Algae 119, 102335.

Langehaug, HR, Matei, D, Eldevik, T, Lohmann, K and Gao, Y (2017) On model differences and skill in predicting sea surface temperature in the Nordic and Barents Seas. Climate Dynamics 48, 913–933.

Li, X, Yu, J, Jia, Z and Song, J (2014) Harmful algal blooms prediction with machine learning models in Tolo Harbour, pp. 245–250. IEEE.

Lien, VS, Hjøllo, SS, Skogen, MD, Svendsen, E, Wehde, H, Bertino, L, Counillon, F, Chevallier, M and Garric, G (2016) An assessment of the added value from data assimilation on modelled Nordic Seas hydrography and ocean transports. Ocean Modelling 99, 43–59.

Lindahl, O, Lundve, B and Johansen, M (2007) Toxicity of Dinophysis spp. in relation to population density and environmental conditions on the Swedish West Coast. Harmful Algae 6, 218–231.

Martino, S, Gianella, F and Davidson, K (2020) An approach for evaluating the economic impacts of harmful algal blooms: The effects of blooms of toxic Dinophysis spp. on the productivity of Scottish shellfish farms. Harmful Algae 99, 101912.

Meehl, GA, Richter, JH, Teng, H, Capotondi, A, Cobb, K, Doblas-Reyes, F, Donat, MG, England, MH, Fyfe, JC, Han, W, Kim, H, Kirtman, BP, Kushnir, Y, Lovenduski, NS, Mann, ME, Merryfield, WJ, Nieves, V, Pegion, K, Rosenbloom, N, Sanchez, SC, Scaife, AA, Smith, D, Subramanian, AC, Sun, L, Thompson, D, Ummenhofer, CC and Xie, S-P (2021) Initialized Earth system prediction from subseasonal to decadal timescales. Nature Reviews Earth & Environment 2, 340–357.

Merchant, CJ, Embury, O, Bulgin, CE, Block, T, Corlett, GK, Fiedler, E, Good, SA, Mittaz, J, Rayner, NA, Berry, D, Eastwood, S, Taylor, M, Tsushima, Y, Waterfall, A, Wilson, R and Donlon, C (2019) Satellite-based time-series of sea-surface temperature since 1981 for climate applications. Scientific Data 6, 1–18.

Nagai, S, Matsuyama, Y, Oh, SJ and Itakura, S (2004) Effect of nutrients and temperature on encystment of the toxic dinoflagellate Alexandrium tamarense (Dinophyceae) isolated from Hiroshima Bay, Japan. Plankton Biology and Ecology 51, 103–109.

Naustvoll, L-J, Gustad, E and Dahl, E (2012) Monitoring of Dinophysis species and diarrhetic shellfish toxins in Flødevigen Bay, Norway: Inter-annual variability over a 25-year time-series. Food Additives & Contaminants: Part A 29, 1605–1615.

Nielsen, LT, Hansen, PJ, Krock, B and Vismann, B (2016) Accumulation, transformation and breakdown of DSP toxins from the toxic dinoflagellate Dinophysis acuta in blue mussels, Mytilus edulis. Toxicon 117, 84–93.

Passos, L, Langehaug, HR, Årthun, M, Eldevik, T, Bethke, I and Kimmritz, M (2023) Impact of initialization methods on the predictive skill in NorCPM: An Arctic–Atlantic case study. Climate Dynamics 60, 2061–2080.

Paulino, AI, Larsen, A, Bratbak, G, Evens, D, Erga, SR, Bye-Ingebrigtsen, E and Egge, JK (2018) Seasonal and annual variability in the phytoplankton community of the Raunefjord, West Coast of Norway from 2001–2006. Marine Biology Research 14, 421–435.

Paz, B, Vázquez, JA, Riobó, P and Franco, JM (2006) Study of the effect of temperature, irradiance and salinity on growth and yessotoxin production by the dinoflagellate Protoceratium reticulatum in culture by using a kinetic and factorial approach. Marine Environmental Research 62, 286–300.

Pedregosa, F, Varoquaux, G, Gramfort, A, Michel, V, Thirion, B, Grisel, O, Blondel, M, Prettenhofer, P, Weiss, R, Dubourg, V, Vanderplas, J, Passos, A and Cournapeau, D (2011) Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830.

Peralta-Ferriz, C and Woodgate, RA (2015) Seasonal and interannual variability of pan-Arctic surface mixed layer properties from 1979 to 2012 from hydrographic data, and the dominance of stratification for multiyear mixed layer depth shoaling. Progress in Oceanography 134, 19–53.

Petrenko, D, Pozdnyakov, D, Johannessen, J, Counillon, F and Sychov, V (2013) Satellite-derived multi-year trend in primary production in the Arctic Ocean. International Journal of Remote Sensing 34, 3903–3937.

Pettersson, LH and Pozdnyakov, D (2013) Monitoring of Harmful Algal Blooms. Berlin Heidelberg: Springer.

Platt, J (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers 10, 61–74.

Reguera, B, Velo-Suárez, L, Raine, R and Park, MG (2012) Harmful Dinophysis species: A review. Harmful Algae 14, 87–106.

Rial, P, Sixto, M, Vázquez, J, Reguera, B, Figueroa, R, Riobó, P and Rodríguez, F (2023) Interaction between temperature and salinity stress on the physiology of Dinophysis spp. and Alexandrium minutum: Implications for niche range and blooming patterns. Aquatic Microbial Ecology 89, 1–22.

Ribeiro, R and Torgo, L (2008) A comparative study on predicting algae blooms in Douro River, Portugal. Ecological Modelling 212, 86–91.

Röder, K, Hantzsche, FM, Gebühr, C, Miene, C, Helbig, T, Krock, B, Hoppenrath, M, Luckas, B and Gerdts, G (2012) Effects of salinity, temperature and nutrients on growth, cellular characteristics and yessotoxin production of Protoceratium reticulatum. Harmful Algae 15, 59–70.

Sakov, P, Counillon, F, Bertino, L, Lisæter, KA, Oke, PR and Korablev, A (2012) TOPAZ4: An ocean-sea ice data assimilation system for the North Atlantic and Arctic. Ocean Science 8, 633–656.

Silva, E, Counillon, F, Brajard, J, Pettersson, LH and Naustvoll, L (2023) Forecasting harmful algae blooms: Application to Dinophysis acuminata in Northern Norway. Harmful Algae 126, 102442.

Smayda, TJ (2008) Complexity in the eutrophication–harmful algal bloom relationship, with comment on the importance of grazing. Harmful Algae 8, 140–151.

Smith, DM, Scaife, AA, Eade, R, Athanasiadis, P, Bellucci, A, Bethke, I, Bilbao, R, Borchert, LF, Caron, LP, Counillon, F, Danabasoglu, G, Delworth, T, Doblas-Reyes, FJ, Dunstone, NJ, Estella-Perez, V, Flavoni, S, Hermanson, L, Keenlyside, N, Kharin, V, Kimoto, M, Merryfield, WJ, Mignot, J, Mochizuki, T, Modali, K, Monerie, PA, Müller, WA, Nicolí, D, Ortega, P, Pankatz, K, Pohlmann, H, Robson, J, Ruggieri, P, Sospedra-Alfonso, R, Swingedouw, D, Wang, Y, Wild, S, Yeager, S, Yang, X and Zhang, L (2020) North Atlantic climate far more predictable than models imply. Nature 583, 796–800.

Smith, JL, Tong, M, Kulis, D and Anderson, DM (2018) Effect of ciliate strain, size, and nutritional content on the growth and toxicity of mixotrophic Dinophysis acuminata. Harmful Algae 78, 95–105.

Svensson, S (2003) Depuration of okadaic acid (diarrhetic shellfish toxin) in mussels, Mytilus edulis (Linnaeus), feeding on different quantities of nontoxic algae. Aquaculture 218, 277–291.

Tan, P-N, Steinbach, M and Kumar, V (2008) Introdução ao Data Mining: Mineração de Dados.

Thomas, MK, Kremer, CT, Klausmeier, CA and Litchman, E (2012) A global pattern of thermal adaptation in marine phytoplankton. Science 338, 1085–1088.

Wang, Y, Counillon, F, Keenlyside, N, Svendsen, L, Gleixner, S, Kimmritz, M, Dai, P and Gao, Y (2019) Seasonal predictions initialised by assimilating sea surface temperature observations with the EnKF. Climate Dynamics 53, 5777–5797.

Weber, C, Olesen, AKJ, Krock, B and Lundholm, N (2021) Salinity, a climate-change factor affecting growth, domoic acid and isodomoic acid C content in the diatom Pseudo-nitzschia seriata (Bacillariophyceae). Phycologia 60, 619–630.

Wells, ML, Karlson, B, Wulff, A, Kudela, R, Trick, C, Asnaghi, V, Berdalet, E, Cochlan, W, Davidson, K, Rijcke, MD, Dutkiewicz, S, Hallegraeff, G, Flynn, KJ, Legrand, C, Paerl, H, Silke, J, Suikkanen, S, Thompson, P and Trainer, VL (2020) Future HAB science: Directions and challenges in a changing climate. Harmful Algae 91, 101632.

Xie, J, Bertino, L, Counillon, F, Lisæter, KA and Sakov, P (2017) Quality assessment of the TOPAZ4 reanalysis in the Arctic over the period 1991–2013. Ocean Science 13, 123–144.

Table 1. Sanitary thresholds used for calibrating the probabilistic models for each taxon, from presence (cells L⁻¹ ≥ 1) to HA of each taxon

Figure 3. Reliability diagram. Comparison between the estimated presence probability and the observed presence frequency estimated in 10 bins for all taxa and their linear regression.

Table 2. Statistical results for presence models for the eight taxa studied

Figure 6. Statistical changes across different sanitary thresholds. The changes of R (a), RMSE (b), AB (c), and total number of samples above the threshold (d) are shown for different sanitary levels of each taxon. The x-axis shows the relative percentile threshold from presence (cells L⁻¹ ≥ 1) to the HA of each taxon. The black dashed horizontal line in (a) corresponds to the significance threshold for p-value < 0.05.
