Discrimination of fish populations using parasites: Random Forests on a ‘predictable’ host-parasite system

A. PÉREZ-DEL-OLMO; F. E. MONTERO; M. FERNÁNDEZ; J. BARRETT; J. A. RAGA; A. KOSTADINOVA

doi:10.1017/S0031182010000739

Published online by Cambridge University Press: 06 July 2010

A. PÉREZ-DEL-OLMO*: Affiliation:
Department of Applied Zoology/Hydrobiology, University of Duisburg-Essen, Universitätsstrasse 5, D-45141 Essen, Germany
F. E. MONTERO: Affiliation:
Department of Animal Biology, Plant Biology and Ecology, Autonomous University of Barcelona, Campus Universitari, 08193 Bellaterra, Barcelona, Spain
M. FERNÁNDEZ: Affiliation:
Fundación General de la Universitat de València & Instituto Cavanilles de Biodiversidad y Biología Evolutiva, Parc Científic, Universitat de Valencia, PO Box 22 085, 46071 Valencia, Spain
J. BARRETT: Affiliation:
IBERS, University of Aberystwyth, Ceredigion SY23 3DA, UK
J. A. RAGA: Affiliation:
Instituto Cavanilles de Biodiversidad y Biología Evolutiva, Parc Científic, Universitat de València, PO Box 22 085, 46071 Valencia, Spain
A. KOSTADINOVA: Affiliation:
Institute of Parasitology, Biology Centre v.v.i., Academy of Sciences of the Czech Republic, Branišovská 31, 370 05 České Budějovice, Czech Republic; Central Laboratory of General Ecology, Bulgarian Academy of Sciences, 2 Gagarin Street, 1113 Sofia, Bulgaria
*Corresponding author: University of Duisburg-Essen, Department of Applied Zoology/Hydrobiology, Universitätsstrasse 5, D-45141 Essen, Germany. Tel: +49 2011832250. Fax: +49 2011832179. E-mail: [email protected]

Summary

We address the effect of spatial scale and temporal variation on model generality when forming predictive models for fish assignment using a new data mining approach, Random Forests (RF), to variable biological markers (parasite community data). Models were implemented for a fish host-parasite system sampled along the Mediterranean and Atlantic coasts of Spain and were validated using independent datasets. We considered 2 basic classification problems in evaluating the importance of variations in parasite infracommunities for assignment of individual fish to their populations of origin: multiclass (2–5 population models, using 2 seasonal replicates from each of the populations) and 2-class task (using 4 seasonal replicates from 1 Atlantic and 1 Mediterranean population each). The main results are that (i) RF are well suited for multiclass population assignment using parasite communities in non-migratory fish; (ii) RF provide an efficient means for model cross-validation on the baseline data and this allows sample size limitations in parasite tag studies to be tackled effectively; (iii) the performance of RF is dependent on the complexity and spatial extent/configuration of the problem; and (iv) the development of predictive models is strongly influenced by seasonal change and this stresses the importance of both temporal replication and model validation in parasite tagging studies.
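The cross-validation on the baseline data mentioned in point (ii) corresponds to the out-of-bag (OOB) estimate that Random Forests produce as a by-product of bootstrap sampling. The following is a minimal sketch of that idea only, not the authors' implementation (the reference list indicates they used the R package randomForest). It assumes depth-1 trees (stumps) in place of fully grown trees, and the abundance values, population labels, and all function names are hypothetical, invented purely for illustration.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, features):
    """Fit a depth-1 tree: the best single split among the candidate features."""
    majority = Counter(y).most_common(1)[0][0]
    # (impurity, feature, threshold, left label, right label); no split by default
    best = (gini(y), None, None, majority, majority)
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[0]:
                best = (score, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    return best[1:]


def predict_stump(stump, row):
    f, t, left, right = stump
    if f is None:  # no useful split was found; fall back to the majority label
        return left
    return left if row[f] <= t else right


def oob_accuracy(X, y, n_trees=50, mtry=2, seed=0):
    """Grow stumps on bootstrap samples; score each fish only with trees that never saw it."""
    rng = random.Random(seed)
    n = len(X)
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample (with replacement)
        in_bag = set(idx)
        feats = rng.sample(range(len(X[0])), mtry)      # random subset of parasite species
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx], feats)
        for i in range(n):
            if i not in in_bag:                         # out-of-bag fish vote with this tree
                votes[i][predict_stump(stump, X[i])] += 1
    scored = [(v.most_common(1)[0][0], y[i]) for i, v in enumerate(votes) if v]
    return sum(pred == true for pred, true in scored) / len(scored)


# Hypothetical abundances of three parasite species per fish (not data from the study).
# Species 0 and 1 differ between the two populations; species 2 is uninformative.
X = [[8 + i % 5, 9 + i % 3, i % 4] for i in range(20)] + \
    [[i % 4, i % 3, i % 4] for i in range(20)]
y = ["Atlantic"] * 20 + ["Mediterranean"] * 20

acc = oob_accuracy(X, y)
print(f"OOB accuracy: {acc:.2f}")
```

Because each fish is scored only by trees whose bootstrap sample excluded it, the estimate needs no separate hold-out set, which is why the approach is attractive when sample sizes are small. It does not replace validation on independent datasets, such as samples from other seasons.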

Keywords

predictive models; Random Forests; fish population discrimination; parasites as tags; Boops boops; Mediterranean; North-East Atlantic

Type: Research Article
Information: Parasitology, Volume 137, Issue 12, October 2010, pp. 1833–1847

DOI: https://doi.org/10.1017/S0031182010000739
Copyright: Copyright © Cambridge University Press 2010
