Elsevier

Ecological Modelling

Volume 359, 10 September 2017, Pages 92-102
Research article
Why input matters: Selection of climate data sets for modelling the potential distribution of a treeline species in the Himalayan region

https://doi.org/10.1016/j.ecolmodel.2017.05.021

Highlights

  • Generalized Linear Models were used to model the potential distribution of Betula utilis in the Himalayan region.

  • Evaluation of predictive ability between climate data sets derived from different statistical methods.

  • Comparison of ‘interpolated’ (i.e. WORLDCLIM) and ‘quasi-mechanistical statistical downscaling’ (i.e. CHELSA) climate data.

  • Models based on CHELSA climate data had higher predictive power; WORLDCLIM consistently overpredicted the potential habitat.

  • Unmindful usage of climatic variables for environmental niche models may potentially cause misleading projections.

Abstract

Betula utilis is a major constituent of alpine treeline ecotones in the western and central Himalayan region. The objective of this study is to analyse for the first time the performance of different climatic predictors in modelling the potential distribution of B. utilis in the subalpine and alpine belts of the Himalayan region. Using Generalized Linear Models (GLMs), we aim at examining climatic factors controlling the species' distribution under current climate conditions. We evaluate the predictive ability of climate data derived from different statistical methods. GLMs were created using the least correlated bioclimatic variables derived from two different climate data sets: 1) interpolated climate data (i.e. WORLDCLIM; Hijmans et al., 2005), and 2) quasi-mechanistical statistical downscaling (i.e. CHELSA; Karger et al., 2016). Model accuracy was evaluated using threshold-independent (Area Under the Curve, AUC) and threshold-dependent (True Skill Statistic, TSS) measures. Although there were no significant differences between the models in AUC, we found highly significant differences (p ≤ 0.01) in TSS. We conclude that models based on variables of CHELSA climate data had higher predictive power, whereas models using WORLDCLIM climate data consistently overpredicted the potential suitable habitat for B. utilis.

Although climatic variables of WORLDCLIM are widely used in modelling species distributions, our results suggest that they should be treated with caution when topographically complex regions like the Himalaya are in focus. Unmindful usage of climatic variables for environmental niche models potentially causes misleading projections.
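The two accuracy measures named in the abstract can be illustrated with a minimal sketch. The presence/absence labels, the predicted suitabilities and the 0.5 threshold below are hypothetical values chosen for illustration only; they are not data or settings from the study.

```python
def confusion_counts(observed, predicted, threshold):
    """Count true/false positives/negatives after binarising predictions."""
    tp = fp = fn = tn = 0
    for obs, score in zip(observed, predicted):
        is_presence = score >= threshold
        if obs == 1 and is_presence:
            tp += 1
        elif obs == 0 and is_presence:
            fp += 1
        elif obs == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def true_skill_statistic(observed, predicted, threshold):
    """Threshold-dependent measure: sensitivity + specificity - 1."""
    tp, fp, fn, tn = confusion_counts(observed, predicted, threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


def area_under_curve(observed, predicted):
    """Threshold-independent measure: probability that a randomly drawn
    presence receives a higher score than a randomly drawn absence
    (ties count as one half)."""
    presences = [s for o, s in zip(observed, predicted) if o == 1]
    absences = [s for o, s in zip(observed, predicted) if o == 0]
    wins = 0.0
    for p in presences:
        for a in absences:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presences) * len(absences))


# Hypothetical observations (1 = presence, 0 = absence) and model output.
observed = [1, 1, 1, 0, 0, 0]
predicted = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

auc_value = area_under_curve(observed, predicted)
tss_value = true_skill_statistic(observed, predicted, threshold=0.5)
print(auc_value, tss_value)
```

Because TSS depends on the chosen threshold while AUC does not, two models can have similar AUC values and still differ clearly in TSS, which is the pattern reported above.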

Introduction

The aim of modelling a species' potential distribution is to characterize suitable habitat conditions, based on climatological, environmental and biotic correlates (Soberón and Nakamura, 2009). The general approach is to link species occurrences with climatic and topographic variables to estimate the species' distribution range, since habitat suitability is considerably influenced by the prevailing climate (Pearson and Dawson, 2003). It is assumed that a species occurs within a climatic range determined by its climatic needs within a range of spatial scales (Trivedi et al., 2008).

Within the scope of modelling species niches or distributions, modelling studies face numerous challenges. Not only is the choice of modelling algorithm subject to numerous sources of uncertainty (Elith et al., 2006, Araújo and New, 2007), but so are the data used for modelling. Models using presence-absence data have proven to be of great value in predicting species distributions (Guisan et al., 2002, Thuiller et al., 2008), but such data are often not available. Other elements of uncertainty in modelling species distributions are attributed to sample design, sample size, species prevalence, sample resolution, study area extent and the like (for a detailed discussion see Franklin, 2009). Further challenges arise from the spatial structure of species occurrence data that may be collinear with environmental data (Araújo and Guisan, 2006, Loiselle et al., 2008, Naimi et al., 2013). Sometimes, areas have been unequally sampled due to differential accessibility within a study area, resulting in species occurrence records with sampling bias. Sampling records often cluster near the centre of the climatic conditions under which the species occurs (Loiselle et al., 2008). This leads to species documentations that do not cover the whole range of suitable habitat conditions for the respective species. Such geographic sampling bias can lead to sampling bias in environmental space, which represents a major problem for modelling (Veloz, 2009; for the effects of sampling bias on model evaluation see Anderson and Gonzalez, 2011). This holds particularly true for sampling treeline species in remote areas like the Himalayan region. Due to the lower accessibility of treeline sites, the number of available sampling plots is small, which in turn affects prediction performance (Araújo et al., 2005). Araújo and Guisan (2006) found that models tend to predict species occupying a narrow niche better than species with a wider niche.

The underlying concept of most modelling studies is the prediction of species distribution ranges using climatic variables. The choice of environmental variables used to model species distributions may result in different distribution maps for the same species (Luoto et al., 2007). Whereas multi-collinearity and spatial autocorrelation of predictors are the subject of numerous studies (Dirnböck and Dullinger, 2004, Dormann et al., 2007, Dormann et al., 2013, Braunisch et al., 2013), and extensive care is taken in selecting uncorrelated predictor variables, differences in model performance arising from the available climate data sets remain largely out of focus in most studies.

Biased climate data can lead to distorted models (Heikkinen et al., 2006). Geographic and environmental biases are contrary to the assumption of many modelling techniques that localities represent a random sample from the area being modelled (Phillips et al., 2006). In many cases, freely available gridded climate data sets do not satisfy the requirements of ecological climate impact studies, and complicate the investigation of climate-ecosystem interactions (Soria-Auza et al., 2010).

In the last decade, WORLDCLIM (Hijmans et al., 2005) has been the most prominent global climate data set. Especially in Europe and Northern America, WORLDCLIM shows high accuracy (Hijmans et al., 2005), and is used in numerous biogeographical studies (Elith et al., 2006, Hijmans and Graham, 2006, Broennimann et al., 2012). WORLDCLIM has also been used to model species distributions in the Himalayan region (Forrest et al., 2012, Liu et al., 2017); its accuracy there, however, still needs to be evaluated. Bobrowski et al. (2017) pointed out some drawbacks related to the usage of WORLDCLIM.

WORLDCLIM represents a simple interpolated climate data set, which regionalizes monthly observations of precipitation and temperature based on a weighted linear regression approach, using latitude, longitude and elevation as predictor variables. Despite the high spatial raster resolution (i.e. 1 × 1 km), WORLDCLIM ignores atmospheric processes at the local scale, which are essential for the formation of site-specific topoclimatic conditions in high mountain environments. Many studies show that local-scale atmospheric conditions are highly influenced by the underlying terrain. Anisotropic heating at different slope positions as well as cold air drainage and pooling in mountain valleys during autochthonous weather conditions result in a complex temperature pattern, which distinctly modifies the distribution of plant communities (Bobrowski et al., 2017). The spatial pattern of precipitation is affected by windward and leeward slope positions, with hyper-humid climate conditions at the southern declivity of the Himalayan range and semi-arid to arid conditions in the trans-Himalayan valleys.
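The kind of regression-based regionalization described above can be sketched in a few lines. This is a simplified illustration of a weighted linear regression with latitude, longitude and elevation as predictors, not the procedure actually used to produce WORLDCLIM; the station coordinates, the weights and the coefficients used to generate the synthetic temperatures are all hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_weighted_regression(stations, weights):
    """Fit value = b0 + b1*lat + b2*lon + b3*elev by weighted least squares.

    stations: list of (lat, lon, elev_km, value) tuples.
    """
    rows = [[1.0, lat, lon, elev] for lat, lon, elev, _ in stations]
    values = [station[3] for station in stations]
    k = 4
    # Weighted normal equations: (X^T W X) b = X^T W y
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * v for r, v, w in zip(rows, values, weights))
            for i in range(k)]
    return solve_linear_system(xtwx, xtwy)


def predict(coeffs, lat, lon, elev_km):
    """Evaluate the fitted regression at one grid cell."""
    return coeffs[0] + coeffs[1] * lat + coeffs[2] * lon + coeffs[3] * elev_km


# Hypothetical station locations (lat, lon, elevation in km).
locations = [(27.0, 85.0, 1.4), (28.0, 88.0, 3.5), (30.0, 79.0, 2.0),
             (34.0, 74.0, 1.6), (29.0, 92.0, 4.2), (32.0, 77.0, 3.0)]
# Synthetic temperatures from an assumed linear relationship (6.5 °C per km).
stations = [(lat, lon, elev, 30.0 - 0.5 * lat + 0.1 * lon - 6.5 * elev)
            for lat, lon, elev in locations]
weights = [1.0] * len(stations)  # equal weights, purely for illustration

coeffs = fit_weighted_regression(stations, weights)
cell_temperature = predict(coeffs, 29.5, 82.0, 3.9)
print(coeffs, cell_temperature)
```

A surface of this form varies smoothly with position and elevation only, so effects such as cold air pooling or windward and leeward precipitation contrasts cannot be represented by it.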

Since 2016, a new fine-scale (i.e. 1 × 1 km), long-term climate raster data set with global coverage called CHELSA (Climatologies at high resolution for the earth's land surface areas) has been available (Karger et al., 2016). CHELSA was compared and evaluated against three climate data sets (including WORLDCLIM), and showed similar performance for temperature, but higher performance for the prediction of orographic precipitation patterns (Karger et al., 2016). Both climate data sets use the same raw data to produce the same bioclimatic raster layers. However, CHELSA represents the first global climate data set based on statistical downscaling, whereas WORLDCLIM is based on interpolation.

To date, there are only very few studies aiming at comparing and evaluating modelling results obtained with different climate data sets (e.g., the comparison of SAGA and WORLDCLIM in Soria-Auza et al., 2010, using Böhner, 2006 and Hijmans et al., 2005). Comparative studies that evaluate the performance of ecological niche models using different global climate data sets for modelling the potential distribution of Himalayan treeline tree species or other Himalayan vascular plant species do not exist. We selected the treeline-forming species Betula utilis as a target species since an improved accuracy in modelling the current distribution is a precondition for a more precise modelling of potential range expansions of treeline trees under climate change conditions (Schickhoff et al., 2015).

In order to investigate the impact of each climate data set, we compared the predicted current distribution of Betula utilis in the Himalayan region. We applied Generalized Linear Models separately for each climate data set to model the distribution range, and compared and evaluated the projected distribution range maps. We hypothesized that there will be discrepancies in the predictions of the two climate data sets. We assume a higher prediction accuracy of CHELSA because of its capability to reflect mountain-specific climatic conditions, in particular in terms of precipitation-related variables.
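As an illustration of the model type, the following minimal sketch fits a binomial GLM with a logit link to presence/absence data using Newton's method. It uses a single hypothetical predictor and invented observations; the models in the study use several bioclimatic variables, and nothing here reproduces their actual specification.

```python
import math


def fit_logistic_glm(predictor, presence, iterations=25):
    """Fit presence ~ intercept + slope * predictor (logit link) by
    Newton's method, which is equivalent to iteratively reweighted
    least squares for this model."""
    intercept, slope = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(predictor, presence):
            p = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
            w = p * (1.0 - p)
            # Gradient of the log-likelihood
            g0 += y - p
            g1 += (y - p) * x
            # Entries of the (negative) Hessian
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: coefficients += inverse(Hessian) * gradient
        intercept += (h11 * g0 - h01 * g1) / det
        slope += (h00 * g1 - h01 * g0) / det
    return intercept, slope


def predict_suitability(intercept, slope, x):
    """Predicted probability of presence at predictor value x."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


# Hypothetical predictor values and presence (1) / absence (0) records.
predictor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
presence = [0, 0, 1, 0, 1, 1, 0, 1]

intercept, slope = fit_logistic_glm(predictor, presence)
fitted = [predict_suitability(intercept, slope, x) for x in predictor]
print(intercept, slope)
```

Fitting the same model structure once per climate data set and comparing the resulting suitability maps corresponds to the comparison outlined above. The example data are deliberately not perfectly separable, since Newton's method does not converge for separable data.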

Section snippets

Study area and species data

The Himalayan mountain system is located between the Tibetan Highland in the north and the Indo-Gangetic plains in the south, and extends from Afghanistan in the northwest (c. 36°N and 70°E) to Yunnan in the southeast (c. 26°N and 100°E). It is a vast mountain region, covering an area of more than 1,000,000 km², with a length of c. 3000 km (Pakistan to SW China) and a maximum width of 400 km.

The Himalayan mountains show a distinct three-dimensional geoecological differentiation, with complex

Comparison between climate data sets

Correlations between corresponding climate variables yielded partially high correlation coefficients (Table 2). As for temperature-related variables, the highest correlation coefficient was found for Temperature of the Wettest Quarter (rs = 0.98, p ≤ 0.001). For Temperature Annual Range the correlation coefficient was moderate, but still highly significant (rs = 0.62, p ≤ 0.001). Regarding precipitation-related variables, Average Precipitation of March, April and May yielded the highest correlation
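A rank correlation of the kind reported as rs can be computed as in the following minimal sketch, which ranks both series (with average ranks for ties) and then applies the Pearson formula to the ranks. The paired values are hypothetical and merely stand in for one variable extracted from both climate data sets at the same locations; the significance test is not covered here.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson product-moment correlation of two equally long series."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical values of one variable from two data sets at five sites.
values_a = [2.1, 3.5, 5.0, 7.2, 9.9]
values_b = [2.5, 3.1, 6.0, 5.8, 10.4]

rho = spearman(values_a, values_b)
print(rho)
```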

Comparison of the climate data sets

In ecological niche modelling, the evaluation of model performance using different climate data sets has rarely been addressed so far. This study compares for the first time CHELSA and WORLDCLIM climate data to model the potential distribution of B. utilis in the Himalayan region. Unlike CHELSA, which only recently became available (Karger et al., 2016, version 1.1), WORLDCLIM climate data (Hijmans et al., 2005) have been widely used (6284 citations in ISI Web of Knowledge in February 2017),

Conclusions

CHELSA located the ecological niche of B. utilis at higher elevations. In addition, the modelled niche tends to be less diffuse compared to WORLDCLIM (Fig. 5). In fact, the ecological niche modelled by CHELSA is in closer correspondence to the authors' field knowledge, and the model predictions match the actual existing distribution range of B. utilis to a large extent.

These findings expand on former research on B. utilis (Bobrowski et al., 2017), confirming that climate data, which reflect

Acknowledgements

We would like to express our gratitude to Himalayan colleagues, guides and local people who accompanied us on numerous field trips to Betula treelines. This study was carried out in the framework of the TREELINE project and partially funded by the German Research Foundation (DFG-SCHI 436/14-1). We would like to thank two anonymous reviewers for their diligent work and thoughtful suggestions on the earlier version of the manuscript.

References (83)

  • W. Thuiller et al.

    Predicting global change impacts on plant species’ distributions: future challenges

    Perspect. Plant Ecol. Evol. Syst.

    (2008)
  • H. Akaike

    A new look at the statistical model identification

    IEEE Trans. Autom. Control

    (1974)
  • O. Allouche et al.

    Assessing the accuracy of species distribution models: prevalence, kappa and the true skill statistic (TSS)

    J. Appl. Ecol.

    (2006)
  • M. Araújo et al.

    Five (or so) challenges for species distribution modelling

    J. Biogeogr.

    (2006)
  • M.B. Araújo et al.

    Validation of species-climate impact models under climate change

    Global Change Biol.

    (2005)
  • S. Arlot et al.

    A survey of cross-validation procedures for model selection

    Stat. Surv.

    (2010)
  • M.P. Austin

    A silent clash of paradigms: some inconsistencies in community ecology

    Oikos

    (1999)
  • J. Böhner

    General climatic controls and topoclimatic variations in Central and High Asia

    Boreas

    (2006)
  • M. Barbet-Massin et al.

    Selecting pseudo-absences for species distribution models: how, where and how many?

    Methods Ecol. Evol.

    (2012)
  • Braun, G., 1996. Vegetationsgeographische Untersuchungen im NW-Karakorum (Pakistan). Kartierung der aktuellen...
  • V. Braunisch et al.

    Selecting from correlated climate variables: a major source of uncertainty for predicting species distributions under climate change

    Ecography

    (2013)
  • O. Broennimann et al.

    Measuring ecological niche overlap from occurrence and spatial environmental data

    Global Ecol. Biogeogr.

    (2012)
  • K.P. Burnham et al.

    Model Selection and Multimodel Inference: A Practical Information-theoretic Approach

    (2002)
  • C. Daly et al.

    Development of new climate and plant adaptation maps for China

  • C. Daly et al.

    A knowledge-based approach to the statistical mapping of climate

    Clim. Res.

    (2002)
  • T. Dirnböck et al.

    Habitat distribution models, spatial autocorrelation, functional traits and dispersal capacity of alpine plant species

    J. Veg. Sci.

    (2004)
  • C.F. Dormann et al.

    Methods to account for spatial autocorrelation in the analysis of species distributional data: a review

    Ecography

    (2007)
  • C.F. Dormann et al.

    Collinearity: a review of methods to deal with it and a simulation study evaluating their performance

    Ecography

    (2013)
  • R.-Y. Duan et al.

    The predictive performance and stability of six species distribution models

    PLoS One

    (2014)
  • S. Dullinger et al.

    Modelling climate change-driven treeline shifts: relative effects of temperature increase, dispersal and invasibility

    J. Ecol.

    (2004)
  • P. Dutilleul

    Modifying the t-test for assessing the correlation between two spatial processes

    Biometrics

    (1993)
  • ESRI

    ArcGIS Desktop: Release 10.1

    (2012)
  • E. Eberhardt et al.

    Vegetation map of the Batura Valley (Hunza Karakorum, North Pakistan)

    Erdkunde

    (2007)
  • J. Elith et al.

    Predictions and their validation: rare plants in the Central Highlands, Victoria, Australia

  • J. Elith et al.

    Novel methods improve prediction of species distributions from occurrence data

    Ecography

    (2006)
  • A.H. Fielding et al.

    A review of methods for the assessment of prediction errors in conservation presence/absence models

    Environ. Conserv.

    (1997)
  • J.A. Flueck

    A study of some measures of forecast verification

  • J. Franklin

    Predictive vegetation mapping − geographic modelling of biospatial patterns in relation to environmental gradients

    Prog. Phys. Geogr.

    (2009)
  • E.A. Freeman et al.

    Presence absence: an R package for presence-absences model analysis

    J. Stat. Softw.

    (2008)
  • GBIF (Global Biodiversity Information Facility). Biodiversity occurrence data provided by: Missouri Botanical Garden,...
  • R.K. Heikkinen et al.

    Methods and uncertainties in bioclimatic envelope modelling under climate change

    Prog. Phys. Geogr.

    (2006)