
Ecological Modelling

Volume 269, 10 November 2013, Pages 9-17

Estimating optimal complexity for ecological niche models: A jackknife approach for species with small sample sizes

https://doi.org/10.1016/j.ecolmodel.2013.08.011

Highlights

  • We model the ecological niches of two rodent species with few occurrence records.

  • We use a delete-one jackknife approach to optimize model settings.

  • Complex settings with protection from overfitting led to better performing models.

  • Optimizing model complexity can lead to better models for species with few records.

Abstract

Algorithms for producing ecological niche models and species distribution models are widely applied in biogeography and conservation biology. However, in some cases models produced by these algorithms may not represent optimal levels of complexity and, hence, likely either overestimate or underestimate the species’ ecological tolerances. Here, we evaluate a delete-one jackknife approach for tuning model settings to approximate optimal model complexity and enhance predictions for datasets with few (here, <10) occurrence records. We apply this approach to tune two settings that regulate model complexity (feature class and regularization multiplier) in the presence-background modeling program Maxent for two species of spiny pocket mice in Ecuador and southwestern Colombia. For these datasets, we identified an optimal feature class parameter that is more complex than the default. Highly complex features are not typically recommended for use with small sample sizes in Maxent. However, when coupled with higher regularization, complex features (which allow more flexible responses to environmental variables) can yield models that outperform those built using default settings (which employ less complex feature classes). Although small sample sizes remain a serious limitation to model building, this jackknife optimization approach can be used for species with few localities (fewer than approximately 20–25) to produce models that maximize the utility of the little information available.

Introduction

Ecological niche models (ENMs) and species distribution models (SDMs) based on presence-only occurrence data constitute widely used tools for many areas of biogeographic research, as well as for conservation planning (Papeş and Gaubert, 2007; Wilting et al., 2010; Lawler et al., 2011; Anderson, 2013). Here, we follow the paradigm of ecological niche modeling, which focuses on the conditions suitable for the species, in model calibration, evaluation, and interpretation (Peterson et al., 2011; Anderson, 2012). However, the methodological advances we apply are equally applicable to models aimed at characterizing the species’ occupied distribution (SDMs, sensu stricto). ENMs examine associations between known occurrences of a species and abiotic environmental (often climatic) data in the geographic region of interest. The resulting model approximates the environmental conditions that the species can inhabit (the species’ existing fundamental niche, subject to clear assumptions); that model can then be applied to geography, yielding estimates of the corresponding areas with suitable environmental conditions (its abiotically suitable distribution; see Peterson et al. (2011) for terminology and assumptions regarding the characteristics of occurrence and environmental data).

Despite their broad appeal, ENMs may be especially problematic when implemented with species for which few occurrence records exist; nevertheless, such situations often correspond to precisely the species most in need of predictive models for conservation-based initiatives (Gaubert et al., 2006). Specifically, model accuracy decreases and model variability increases with decreasing sample size (Wisz et al., 2008). If possible, the paucity of occurrence data should be rectified by increasing efforts put into field surveys and data sharing (Cayuela et al., 2009). However, this is seldom feasible in the time frame within which conservation decisions need to be made. As an alternative, optimizing or tuning model settings (sometimes called “smoothing”) to estimate optimal model complexity can result in higher-quality output than employing default settings (Elith et al., 2010; Anderson and Gonzalez, 2011; Warren and Seifert, 2011; Radosavljevic and Anderson, in press). Furthermore, optimal settings likely vary among species as well as for different combinations of the occurrence localities, study region, and environmental data at hand. Therefore, we explored model tuning as a way of improving ENMs for datasets with few occurrence records. In particular, we used a delete-one jackknife approach recently suggested for model evaluation (a form of k-fold cross validation where k is equal to the number of occurrence localities in the original dataset; Peterson et al., 2011; see also Pearson et al., 2007). Although this approach may also be useful for larger sample sizes (e.g., up to ca. 25 records), we here employ it for species with very few records (<10).
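The delete-one jackknife described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study: the model is a toy kernel-similarity score standing in for Maxent, and the threshold (the lowest suitability among the training records) is an assumption made only for illustration. The names `fit_toy_model` and `jackknife_omission_rate` are hypothetical.

```python
from math import exp


def fit_toy_model(train, bandwidth=1.0):
    """Stand-in for a niche model (NOT Maxent): suitability of a point is its
    mean Gaussian-kernel similarity to the training records in environmental space."""
    def suitability(point):
        total = 0.0
        for rec in train:
            d2 = sum((a - b) ** 2 for a, b in zip(point, rec))
            total += exp(-d2 / (2.0 * bandwidth ** 2))
        return total / len(train)
    return suitability


def jackknife_omission_rate(records, bandwidth=1.0):
    """Delete-one jackknife: k folds, with k equal to the number of records.
    Each record is withheld once, the model is built on the remaining k - 1
    records, and the withheld record counts as omitted if the model scores it
    below the threshold."""
    omitted = 0
    for i, held_out in enumerate(records):
        train = records[:i] + records[i + 1:]
        model = fit_toy_model(train, bandwidth)
        # Assumed threshold: lowest suitability among the training records.
        threshold = min(model(rec) for rec in train)
        if model(held_out) < threshold:
            omitted += 1
    return omitted / len(records)
```

Each record is a tuple of environmental values (for example, two climatic variables). Calling `jackknife_omission_rate(records)` returns the fraction of withheld records that their own jackknife model fails to predict, which is the quantity compared across settings in a tuning exercise of this kind.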

As an assessment of this approach, we used the presence-background modeling software Maxent (Phillips et al., 2006) to generate ENMs for two species of spiny pocket mice across a range of program settings (Supplementary Fig. 3). We compared the performance of default settings to a variety of user-specified settings. Maxent identifies geographic areas of suitable conditions for a species, based on known occurrence records, by applying a maximum entropy model to estimate the species’ response given a set of constraints (environmental variables). We chose Maxent because it: (1) is in common use; (2) has been found to perform well for small sample sizes in previous studies (Wisz et al., 2008); and yet (3) is sensitive to model settings that affect model complexity (Elith et al., 2010; Anderson and Gonzalez, 2011; Warren and Seifert, 2011; Syfert et al., 2013). In describing the tuning experiments that led to the current default settings, Phillips and Dudík (2008) stated that, for datasets unlike those used in that study, further tuning may be necessary to optimize the program's performance. Even though we tested our approach using Maxent, this jackknife approach for model tuning with small sample sizes is general and can be extended to other modeling methods. We assessed models based on quantitative evaluations of performance, and compared optimal to default models using measures of similarity. Independently, we evaluated model output qualitatively.
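Tuning across a range of settings amounts to evaluating every combination of feature class and regularization multiplier and then ranking the results. The Python sketch below shows one way to organize such a search. The listed feature classes and multiplier values are hypothetical placeholders, not the values tested in this study, and the ranking rule (lowest omission rate, ties broken by highest AUC) is an assumption made for illustration.

```python
from itertools import product

# Hypothetical candidate settings; not the values examined in this study.
FEATURE_CLASSES = ["L", "LQ", "H", "LQH"]
REG_MULTIPLIERS = [0.5, 1.0, 2.0, 4.0]


def settings_grid(feature_classes=FEATURE_CLASSES, reg_multipliers=REG_MULTIPLIERS):
    """Return every (feature class, regularization multiplier) combination."""
    return list(product(feature_classes, reg_multipliers))


def pick_best(scores):
    """Choose a setting from scores = {(feature_class, multiplier): (omission_rate, auc)}.
    Assumed rule: lowest omission rate first, then highest AUC among ties."""
    return min(scores, key=lambda setting: (scores[setting][0], -scores[setting][1]))
```

In practice, each entry of `scores` would be filled in by running the jackknife evaluation once per combination returned by `settings_grid()`.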


Study species and region

We used two species of spiny pocket mice, Heteromys australis and Heteromys teleus (Rodentia: Heteromyidae), to conduct our tuning experiments. These species represent suitable entities for the current study for several reasons. Recent taxonomic research provides high-quality (although limited) occurrence data, as well as general natural-history information regarding the habitats occupied by the species. Furthermore, strong climatic gradients exist in the regions occupied by these species,

Omission rate

H. australis generally suffered from higher ORs than H. teleus (Fig. 2). Observed average ORs were variable across regularization multipliers for each feature class, with higher regularization multipliers generally leading to lower ORs within a given feature class. However, the default feature class (L) displayed notably less variability across changing regularization multipliers. The majority of feature–class–regularization–multiplier combinations omitted the evaluation record in four or more

Optimal settings for the examined species

Several notable patterns emerge from the quantitative evaluations. Although the species differed in details, a few prevailing trends existed for OR and AUC across regularization multipliers. Most feature classes showed lower ORs at higher regularization multipliers. This result is consistent with the higher protection against overfitting provided by higher regularization multipliers—which should lead to simpler, less restricted predictions. In contrast, AUC values varied comparatively less

Acknowledgments

This research was made possible by funding from the U. S. National Science Foundation (NSF DEB-1119915 and DEB-0717357, including a Research Experiences for Undergraduates supplement to support MS) and the City College of the City University of New York. MS was supported by awards from the City College Fellowship, Gerald S. Brenner Endowed Science Scholarship, and the City College Academy for Professional Preparation. Funds to present preliminary results were provided by the International

References (33)

  • W.T. Bean et al. The effects of small sample size and sample bias on threshold selection and accuracy assessment of species distribution models. Ecography (2012)

  • B. Bentlage et al. NichePy: modular tools for estimating the similarity of ecological niche and species distribution models. Methods Ecol. Evol. (2012)

  • L. Cayuela et al. Species distribution modeling in the tropics: problems, potentialities, and the role of biological data for effective species conservation. Trop. Conserv. Sci. (2009)

  • J. Elith et al. The art of modelling range-shifting species. Methods Ecol. Evol. (2010)

  • J. Elith et al. A statistical explanation of MaxEnt for ecologists. Divers. Distrib. (2011)

  • R.J. Hijmans et al. Very high resolution interpolated climate surfaces for global land areas. Int. J. Climatol. (2005)

    1

    Present address: Department of Biological Sciences, George Washington University, 2023 G St. NW, Washington, DC 20052, USA.
