Estimating optimal complexity for ecological niche models: A jackknife approach for species with small sample sizes
Introduction
Ecological niche models (ENMs) and species distribution models (SDMs) based on presence-only occurrence data are widely used in many areas of biogeographic research, as well as in conservation planning (Papeş and Gaubert, 2007; Wilting et al., 2010; Lawler et al., 2011; Anderson, 2013). Here, we follow the ecological niche modeling paradigm, in which model calibration, evaluation, and interpretation all focus on the conditions suitable for the species (Peterson et al., 2011; Anderson, 2012). However, the methodological advances we apply are equally applicable to models aimed at characterizing the species’ occupied distribution (SDMs, sensu stricto). ENMs examine associations between known occurrences of a species and abiotic environmental (often climatic) data in the geographic region of interest. The resulting model approximates the environmental conditions that the species can inhabit (the species’ existing fundamental niche, subject to clear assumptions); the model can then be applied to geography, yielding estimates of the corresponding areas with suitable environmental conditions (its abiotically suitable distribution; see Peterson et al. (2011) for terminology and assumptions regarding the characteristics of occurrence and environmental data).
Despite their broad appeal, ENMs can be especially problematic for species with few occurrence records; yet these are often precisely the species most in need of predictive models for conservation-based initiatives (Gaubert et al., 2006). Specifically, model accuracy decreases and model variability increases with decreasing sample size (Wisz et al., 2008). Where possible, the paucity of occurrence data should be remedied through greater investment in field surveys and data sharing (Cayuela et al., 2009). However, this is seldom feasible within the time frame in which conservation decisions must be made. As an alternative, tuning model settings (sometimes called “smoothing”) to estimate optimal model complexity can yield higher-quality output than default settings (Elith et al., 2010; Anderson and Gonzalez, 2011; Warren and Seifert, 2011; Radosavljevic and Anderson, in press). Furthermore, optimal settings likely vary among species, as well as with the particular combination of occurrence localities, study region, and environmental data at hand. Therefore, we explored model tuning as a way of improving ENMs for datasets with few occurrence records. In particular, we used a delete-one jackknife approach recently suggested for model evaluation (a form of k-fold cross-validation in which k equals the number of occurrence localities in the original dataset; Peterson et al., 2011; see also Pearson et al., 2007). Although this approach may also be useful for larger sample sizes (e.g., up to ca. 25 records), here we employ it for species with very few records (<10).
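The delete-one jackknife described above can be illustrated with a minimal Python sketch. It assumes a toy "environmental envelope" model in place of an actual ENM, and the occurrence values (temperature, precipitation) are invented purely for illustration; the function names are hypothetical and not part of any modeling package. Each record is withheld once, a model is fit to the remaining n − 1 records, and the withheld record is scored as omitted if the model predicts it as unsuitable.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

# Hypothetical record: (mean temperature, annual precipitation) at a locality.
Record = Tuple[float, float]


def jackknife_splits(records: Sequence[Record]) -> Iterator[Tuple[List[Record], Record]]:
    """Yield (training, withheld) pairs, withholding each record exactly once."""
    for i, withheld in enumerate(records):
        training = list(records[:i]) + list(records[i + 1:])
        yield training, withheld


def envelope_model(training: Sequence[Record]) -> Callable[[Record], bool]:
    """Toy stand-in for an ENM: 'suitable' means inside the training range.

    This is NOT Maxent; it exists only to make the sketch runnable.
    """
    lows = [min(values) for values in zip(*training)]
    highs = [max(values) for values in zip(*training)]

    def predict(record: Record) -> bool:
        return all(lo <= v <= hi for v, lo, hi in zip(record, lows, highs))

    return predict


def jackknife_omission_rate(records: Sequence[Record], fit=envelope_model) -> float:
    """Fraction of withheld records predicted unsuitable by a model fit without them."""
    omitted = 0
    for training, withheld in jackknife_splits(records):
        predict = fit(training)
        if not predict(withheld):
            omitted += 1
    return omitted / len(records)


# Invented occurrence data (5 localities), for illustration only.
records = [(18.0, 2100.0), (19.5, 2500.0), (17.2, 3000.0), (20.1, 2300.0), (18.8, 2700.0)]
print(len(list(jackknife_splits(records))))  # one split per record (k = n)
print(jackknife_omission_rate(records))
```

With n records the procedure yields n models, each evaluated on a single withheld locality, which is why it remains usable when too few records exist to set aside a separate evaluation dataset.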
To assess this approach, we used the presence-background modeling software Maxent (Phillips et al., 2006) to generate ENMs for two species of spiny pocket mice across a range of program settings (Supplementary Fig. 3), and compared the performance of default settings with that of a variety of user-specified settings. Maxent identifies geographic areas of suitable conditions for a species, based on known occurrence records, by applying a maximum entropy model to estimate the species’ response given a set of constraints (environmental variables). We chose Maxent because it: (1) is in common use; (2) has been found to perform well for small sample sizes in previous studies (Wisz et al., 2008); yet (3) is sensitive to model settings that affect model complexity (Elith et al., 2010; Anderson and Gonzalez, 2011; Warren and Seifert, 2011; Syfert et al., 2013). Reporting the tuning experiments that led to the current default settings, Phillips and Dudík (2008) stated that datasets unlike those used in their study may require further tuning to optimize the program's performance. Although we tested our approach using Maxent, the jackknife approach to model tuning with small sample sizes is general and can be extended to other modeling methods. We assessed models based on quantitative evaluations of performance, and compared optimal to default models using measures of similarity. Independently, we evaluated model output qualitatively.
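Tuning across a range of settings can be sketched as a grid over feature classes and regularization multipliers, with each combination evaluated by the delete-one jackknife. The sketch below is illustrative only: the grid values are an arbitrary subset rather than the settings examined in the study, `toy_evaluate` is a hypothetical stand-in that does not call Maxent, its "AUC" is an arbitrary placeholder formula with no ecological meaning, and the selection rule (lowest average omission rate, ties broken by highest average AUC) is an assumption made for the example.

```python
from itertools import product
from statistics import mean, pstdev

# Illustrative grid only; not the settings examined in the study.
FEATURE_CLASSES = ["L", "LQ", "H", "LQH"]
REG_MULTIPLIERS = [0.5, 1.0, 2.0, 4.0]


def toy_evaluate(feature_class, multiplier, training, withheld):
    """Hypothetical stand-in for fitting and evaluating one model (NOT Maxent).

    The withheld record counts as omitted when it lies farther from the
    training mean than `multiplier` standard deviations, so a larger
    multiplier gives a broader prediction. The returned score is an
    arbitrary placeholder standing in for AUC.
    """
    center, spread = mean(training), pstdev(training)
    omitted = abs(withheld - center) > multiplier * spread
    placeholder_auc = 0.9 - 0.05 * multiplier + 0.01 * len(feature_class)
    return omitted, placeholder_auc


def tune(records, evaluate=toy_evaluate):
    """Run a delete-one jackknife for every combination of settings."""
    results = []
    for fc, rm in product(FEATURE_CLASSES, REG_MULTIPLIERS):
        outcomes = [
            evaluate(fc, rm, records[:i] + records[i + 1:], records[i])
            for i in range(len(records))
        ]
        avg_or = mean(1.0 if omitted else 0.0 for omitted, _ in outcomes)
        avg_auc = mean(auc for _, auc in outcomes)
        results.append((fc, rm, avg_or, avg_auc))
    return results


def select_optimal(results):
    """Assumed rule: lowest average omission rate, then highest average AUC."""
    return min(results, key=lambda r: (r[2], -r[3]))


# Invented one-dimensional "environmental" values at 5 localities.
records = [10.0, 12.0, 11.0, 15.0, 13.0]
results = tune(records)
best = select_optimal(results)
print(best[:3])  # (feature class, regularization multiplier, average omission rate)
```

Passing a different `evaluate` callable, for example one that wraps an actual modeling run, would leave the grid and selection logic unchanged, which reflects the point that the jackknife tuning approach is not tied to a particular modeling method.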
Study species and region
We used two species of spiny pocket mice, Heteromys australis and Heteromys teleus (Rodentia: Heteromyidae), to conduct our tuning experiments. These species represent suitable entities for the current study for several reasons. Recent taxonomic research provides high-quality (although limited) occurrence data, as well as general natural-history information regarding the habitats occupied by the species. Furthermore, strong climatic gradients exist in the regions occupied by these species,
Omission rate
H. australis generally showed higher omission rates (ORs) than H. teleus (Fig. 2). Average ORs varied across regularization multipliers for each feature class, with higher regularization multipliers generally leading to lower ORs within a given feature class. However, the default feature class (L) displayed notably less variability across changing regularization multipliers. The majority of combinations of feature class and regularization multiplier omitted the evaluation record in four or more
Optimal settings for the examined species
Several notable patterns emerge from the quantitative evaluations. Although the species differed in details, a few prevailing trends existed for OR and AUC across regularization multipliers. Most feature classes showed lower ORs at higher regularization multipliers. This result is consistent with the greater protection against overfitting provided by higher regularization multipliers, which should lead to simpler, less restricted predictions. In contrast, AUC values varied comparatively less
Acknowledgments
This research was made possible by funding from the U.S. National Science Foundation (NSF DEB-1119915 and DEB-0717357, including a Research Experiences for Undergraduates supplement to support MS) and the City College of the City University of New York. MS was supported by awards from the City College Fellowship, Gerald S. Brenner Endowed Science Scholarship, and the City College Academy for Professional Preparation. Funds to present preliminary results were provided by the International
References (33)
- et al. Species-specific tuning increases robustness to sampling bias in models of species distributions: an implementation with Maxent. Ecol. Model. (2011)
- et al. Evaluating predictive models of species’ distributions: criteria for selecting optimal models. Ecol. Model. (2003)
- et al. Modeling species’ geographic distributions for preliminary conservation assessments: an implementation with the spiny pocket mice (Heteromys) of Ecuador. Biol. Conserv. (2004)
- et al. Natural history collections and the conservation of poorly known taxa: ecological niche modeling in central African rainforest genets (Genetta spp.). Biol. Conserv. (2006)
- et al. Maximum entropy modeling of species geographic distributions. Ecol. Model. (2006)
- Real vs. artefactual absences in species distributions: tests for Oryzomys albigularis (Rodentia: Muridae) in Venezuela. J. Biogeogr. (2003)
- Harnessing the world's biodiversity data: promise and peril in ecological niche modeling of species distributions. Ann. N.Y. Acad. Sci. (2012)
- A framework for using niche models to estimate impacts of climate change on species distributions. Ann. N.Y. Acad. Sci. (2013)
- et al. A new species of spiny pocket mouse (Heteromyidae: Heteromys) endemic to western Ecuador. Am. Mus. Novit. (2002)
- et al. The effect of the extent of the study region on GIS models of species geographic distributions and estimates of niche evolution: preliminary tests with montane rodents (genus Nephelomys) in Venezuela. J. Biogeogr. (2010)
- The effects of small sample size and sample bias on threshold selection and accuracy assessment of species distribution models. Ecography
- NichePy: modular tools for estimating the similarity of ecological niche and species distribution models. Methods Ecol. Evol.
- Species distribution modeling in the tropics: problems, potentialities, and the role of biological data for effective species conservation. Trop. Conserv. Sci.
- The art of modelling range-shifting species. Methods Ecol. Evol.
- A statistical explanation of MaxEnt for ecologists. Divers. Distrib.
- Very high resolution interpolated climate surfaces for global land areas. Int. J. Climatol.
1 Present address: Department of Biological Sciences, George Washington University, 2023 G St. NW, Washington, DC 20052, USA.