Elsevier

Ecological Modelling

Volume 199, Issue 2, 16 November 2006, Pages 142-152

Evaluating the ability of habitat suitability models to predict species presences

https://doi.org/10.1016/j.ecolmodel.2006.05.017

Abstract

Models predicting species’ spatial distributions are increasingly applied to wildlife management issues, emphasising the need for reliable methods to evaluate the accuracy of their predictions. As many available datasets (e.g. museum collections, herbaria, atlases) do not provide reliable information about species absences, several presence-only analyses have been developed. However, methods to evaluate the accuracy of their predictions are few and have never been validated. The aim of this paper is to compare existing and new presence-only evaluators with standard presence/absence measures.

We use a reliable and diverse presence/absence dataset of 114 plant species to test how common presence/absence indices (Kappa, MaxKappa, AUC, adjusted D²) compare with presence-only measures (AVI, CVI, Boyce index) for evaluating generalised linear models (GLM). Moreover, we propose a new, threshold-independent evaluator, which we call the “continuous Boyce index”. All indices were implemented in the BIOMAPPER software.
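The presence/absence indices named above can be illustrated with a minimal Python sketch. It is an illustration only, not the BIOMAPPER implementation, and it rests on stated assumptions: observations are coded 1 for presence and 0 for absence, habitat suitability (HS) values lie between 0 and 1, and MaxKappa is searched over a hypothetical grid of thresholds in steps of 0.01. AUC is computed here with the Mann-Whitney formulation (the probability that a random presence receives a higher HS than a random absence). The adjusted D² is left out because it depends on the fitted GLM itself.

```python
import numpy as np


def cohen_kappa(observed, predicted):
    """Cohen's Kappa between observed and predicted binary outcomes (1 = presence)."""
    observed = np.asarray(observed, dtype=bool)
    predicted = np.asarray(predicted, dtype=bool)
    p_obs = np.mean(observed == predicted)  # observed proportion of agreement
    # Agreement expected by chance, from the marginal proportions of each vector
    p_exp = (observed.mean() * predicted.mean()
             + (1 - observed.mean()) * (1 - predicted.mean()))
    if p_exp == 1.0:  # degenerate case: both vectors constant and identical
        return 0.0
    return float((p_obs - p_exp) / (1.0 - p_exp))


def max_kappa(observed, hs, thresholds=np.linspace(0.0, 1.0, 101)):
    """Highest Kappa over an (assumed) grid of HS thresholds, and the threshold reaching it."""
    hs = np.asarray(hs, dtype=float)
    kappas = [cohen_kappa(observed, hs >= t) for t in thresholds]
    best = int(np.argmax(kappas))
    return kappas[best], float(thresholds[best])


def auc(observed, hs):
    """Threshold-independent AUC: probability that a random presence has a higher
    HS than a random absence (ties count one half), i.e. the Mann-Whitney statistic."""
    observed = np.asarray(observed, dtype=bool)
    hs = np.asarray(hs, dtype=float)
    pres, absn = hs[observed], hs[~observed]
    higher = np.sum(pres[:, None] > absn[None, :])  # compare every presence/absence pair
    ties = np.sum(pres[:, None] == absn[None, :])
    return float((higher + 0.5 * ties) / (pres.size * absn.size))


# Hypothetical toy data: six sites, three presences and three absences
obs = np.array([1, 1, 1, 0, 0, 0])
hs = np.array([0.9, 0.8, 0.4, 0.3, 0.2, 0.6])
print(cohen_kappa(obs, hs >= 0.5), max_kappa(obs, hs)[0], auc(obs, hs))
```

On these toy data, Kappa at a 0.5 threshold is about 0.33, MaxKappa about 0.67 and AUC about 0.89. Kappa depends on the chosen threshold, which is why MaxKappa and AUC are often preferred when no threshold is imposed by the application.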

We show that the presence-only evaluators are fairly well correlated (ρ > 0.7) with the presence/absence ones. The Boyce indices are closer to AUC than to MaxKappa and are fairly insensitive to species prevalence. In addition, the Boyce indices provide predicted-to-expected ratio curves that offer further insights into model quality: robustness, habitat suitability resolution and deviation from randomness. This information helps in reclassifying predicted maps into meaningful habitat suitability classes. The continuous Boyce index is thus both a complement to the usual evaluation of presence/absence models and a reliable measure of presence-only based predictions.

Introduction

Models predicting the spatial distribution of species (Boyce and McDonald, 1999, Guisan and Zimmermann, 2000, Manly et al., 2002, Pearce and Boyce, 2006), sometimes called resource selection functions or habitat suitability models, are currently gaining interest. Because they help both to understand species’ niche requirements and to predict species’ potential distributions, their use has been especially promoted to tackle conservation issues, such as managing species distributions, assessing the ecological impacts of various factors (e.g. pollution, climate change), evaluating the risk of biological invasions and managing endangered species (Scott et al., 2002, Guisan and Thuiller, 2005). These models statistically relate field observations to a set of environmental variables, presumably reflecting some key factors of the niche, such as climate, topography, geology or land cover. They produce spatial predictions indicating the suitability of locations for a target species, community or biodiversity. Different types of modelling techniques are used to fit different types of biological information recorded at each sample site: (1) presence-only, where occurrences of the target species are recorded; (2) presence/absence, where each sample site is carefully monitored so as to establish with sufficient certainty whether the species is present or absent. For plants, for instance, this is commonly done by exhaustively listing all species present in each sample site. The reliability of absences depends on the species’ characteristics (e.g. biology, behaviour, history) (Hirzel et al., 2001), their local abundance and ease of detection (Kéry, 2002), and the survey design (Mackenzie and Royle, 2005). More rarely, data record information about species’ abundance or demography (e.g. growth rate, survival).

Although models based on presence-only and presence/absence data provide the same kind of predictions (e.g. habitat suitability scores), they generally cannot use the same techniques, because presence-only methods cannot contrast their predictions with the characteristics of places where the species is absent. This partly explains why presence/absence methods have been more extensively developed. These differences, together with the lack of absences, make the two model types difficult to compare (Zaniewski et al., 2002).

Assessing the predictive power of a model is of paramount importance for both theoretical and applied purposes. However, while presence/absence models have received much attention and many evaluators are available for them (Fielding and Bell, 1997), the evaluation of presence-only models lags behind. There is therefore a crucial need for reliable presence-only evaluation measures, as well as for an assessment of how they compare with presence/absence measures.

The main problem with presence-only evaluation measures is the lack of absences to counterbalance the presences, which makes it difficult to distinguish a model predicting presence everywhere from a more contrasted model. Attempts to solve this problem have followed two main approaches: (1) generating pseudo-absences and then applying standard presence/absence techniques (e.g. Zaniewski et al., 2002, Anderson et al., 2003); (2) assessing how much the model predictions differ from random expectation (e.g. Boyce et al., 2002, Hirzel et al., 2002, Reutter et al., 2003). In this second category, the index recently proposed by Boyce et al. (2002) offers new insights. We tested it thoroughly and derived from it a new evaluator that does not depend on the choice of boundaries between habitat suitability classes. A third, original approach, proposed by Ottaviani et al. (2004), is based on compositional analysis. However, it is restricted to cases where evaluation data take the form of polygons or large mapping units (e.g. large grid cells in an atlas), and thus does not apply here.
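The random-expectation evaluators mentioned above can be sketched in the same way. This sketch assumes the definitions commonly associated with BIOMAPPER: AVI is the proportion of evaluation presences with HS above 0.5, and CVI subtracts from it the proportion of the map with HS above 0.5, taken as the AVI expected from a model predicting by chance. For the class-based Boyce index, it assumes equal-width HS classes (four here, an arbitrary choice): the P/E ratio is computed per class and correlated with class rank using Spearman's coefficient. The function names are hypothetical.

```python
import numpy as np


def avi_cvi(hs_presences, hs_map, threshold=0.5):
    """AVI and CVI, assuming a fixed HS threshold of 0.5."""
    pres = np.asarray(hs_presences, dtype=float)
    cells = np.asarray(hs_map, dtype=float)
    avi = float(np.mean(pres > threshold))          # share of presences in "suitable" cells
    avi_chance = float(np.mean(cells > threshold))  # share of the map called "suitable"
    return avi, avi - avi_chance


def _rank(x):
    """Ranks starting at 1; tied values share their average rank."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(x.size, dtype=float)
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, x.size + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def boyce_classes(hs_presences, hs_map, n_classes=4):
    """Class-based Boyce index with equal-width HS classes on [0, 1] (assumed binning)."""
    pres = np.asarray(hs_presences, dtype=float)
    cells = np.asarray(hs_map, dtype=float)
    inner_edges = np.linspace(0.0, 1.0, n_classes + 1)[1:-1]
    # np.digitize assigns classes 0 .. n_classes - 1; HS = 1 falls in the top class
    p_freq = np.bincount(np.digitize(pres, inner_edges), minlength=n_classes) / pres.size
    e_freq = np.bincount(np.digitize(cells, inner_edges), minlength=n_classes) / cells.size
    valid = e_freq > 0                     # skip classes absent from the map
    ratio = p_freq[valid] / e_freq[valid]  # predicted-to-expected (P/E) ratio per class
    classes = np.arange(n_classes)[valid]
    rho = float(np.corrcoef(_rank(classes), _rank(ratio))[0, 1])  # Spearman on ranks
    return rho, ratio


# Hypothetical example: a uniform map and presences concentrated at high HS
cells = np.linspace(0, 1, 1000)
presences = np.sqrt(np.linspace(0, 1, 1000))
print(avi_cvi(presences, cells), boyce_classes(presences, cells))
```

On these synthetic data, AVI is 0.75 and CVI 0.25, and the P/E ratios (about 0.25, 0.75, 1.25 and 1.75) increase with class rank, so the index equals 1. Moving the class boundaries can change both the ratios and the index, which is the dependence a boundary-free evaluator is designed to avoid.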

In this paper, we present various presence-only evaluation measures. To validate them, we build presence/absence models for 114 species chosen for the reliability of their absences, and evaluate these models with both presence-only and presence/absence evaluators. We test the correspondence between the two sets of evaluators and discuss how the new “Boyce indices” can improve the interpretation and use of habitat suitability models.


Materials and methods

We define a habitat suitability (HS) map as composed of cells (or pixels) whose quantitative values range from 0 to 1. These values indicate how close the local environment is to the species’ optimal conditions, with higher values indicating the most suitable areas. This map may result from any statistical analysis (Guisan and Zimmermann, 2000, Pearce and Boyce, 2006). Model evaluation consists of quantifying how accurately the map predicts the presence and absence of the species (

Results

The chosen species cover a wide spectrum of ecological niche types and sample sizes. The quality of their habitat suitability models ranges from very poor to excellent. All the investigated evaluation measures convey similar information, with Pearson correlation coefficients greater than 0.5 in most cases (Table 2a). In particular, for the models where more than 50 presence points were available, most pairs of evaluators have correlation coefficients greater than 0.7 (Table 2b).

Except for those based on very wide

Discussion

Over the range covered by the 114 studied plant species, and given the environmental characteristics of our study area, all evaluators convey correlated information. This is an important result, indicating that the presence-only evaluators can be trusted.

Acknowledgements

We wish to thank Mark S. Boyce, Gretchen G. Moisen and all the participants of the Riederalp Workshop, Switzerland, 2004, for stimulating discussions about model evaluation, and Patrick Patthey, Julie Jacquiéry and Pietro Persico for the first explorations of the Boyce index. We also wish to thank Jane Elith and two anonymous reviewers who helped improve this article. We are grateful to Fabien Fivaz for help with R scripts, as well as to all those who contributed to the field work: Pascal

References (40)

  • ArcInfo. ArcInfo Version 9.0 (2004).
  • S.T. Buckland et al. Empirical models for the spatial distribution of wildlife. J. Appl. Ecol. (1993).
  • J. Cohen. A coefficient of agreement for nominal scales. Educ. Psychol. Meas. (1960).
  • T. Dirnböck et al. A regional impact assessment of climate and land-use change on alpine vegetation. J. Biogeogr. (2003).
  • R. Engler et al. An improved approach for predicting the distribution of rare and endangered species from occurrence and pseudo-absence data. J. Appl. Ecol. (2004).
  • A.H. Fielding et al. A review of methods for the assessment of prediction errors in conservation presence/absence models. Environ. Conserv. (1997).
  • A. Guisan et al. Predicting the potential distribution of plant species in an Alpine environment. J. Veg. Sci. (1998).
  • A. Guisan et al. Predicting species distribution: offering more than simple habitat models. Ecol. Lett. (2005).
  • T. Hastie et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2001).
  • A.H. Hirzel et al. Modelling habitat suitability for complex species distributions by the environmental-distance geometric mean. Environ. Manage. (2003).