Elsevier

Remote Sensing of Environment

Volume 123, August 2012, Pages 553-562

Object Based Image Analysis and Data Mining applied to a remotely sensed Landsat time-series to map sugarcane over large areas

https://doi.org/10.1016/j.rse.2012.04.011

Abstract

The aim of this research was to develop a methodology to contribute to the automation of sugarcane mapping over large areas using time series of remotely sensed imagery. To this end, two major techniques were combined: Object Based Image Analysis (OBIA) and Data Mining (DM). OBIA was used to represent the knowledge needed to map sugarcane, whereas DM was applied to generate the knowledge model. To derive the image objects, the segmentation algorithm implemented in Definiens Developer® was used. The data mining algorithm used was J48, which generates decision trees (DT) from a previously prepared training set. The study area comprises three municipalities located in the northwest of São Paulo state, all of which are good representatives of the agricultural conditions in the Southern and Southeastern regions of Brazil. A time series of Landsat TM and ETM+ images was acquired in order to represent the wide range of pattern variation along the sugarcane crop cycle. After training, the DT was applied to the Landsat time series, thus generating the desired thematic map of sugarcane ready to harvest. Classification accuracy was calculated over a set of 500 points not previously used during the training stage. Using error matrix analysis and Kappa statistics, tests for statistical significance were derived. The statistics indicated that the classification achieved an overall accuracy of 94% and a Kappa coefficient of 0.87. Results show that the combination of OBIA and DM techniques is very efficient and promising for the sugarcane classification process.

Highlights

► Object Based Image Analysis and Data Mining
► Time-series data
► Mapping sugarcane over large areas

Introduction

Agriculture plays an important role in the Brazilian socioeconomic landscape, representing almost 18% of the country's Gross Domestic Product (GDP) if the whole agro-industrial complex is considered (CEPEA, 2009). In this context, sugarcane is a major crop, and Brazil is the biggest producer and exporter of sugarcane products worldwide (Rudorff et al., 2010). Boosted by the increasing global demand for biofuels, the area planted with sugarcane in Brazil has expanded 94.3% since 2000, occupying approximately 9.4 million hectares in 2008 (IBGE, 2008). Moreover, approximately 1250 million tons of sugarcane are produced every year in the twenty main producing countries worldwide, on more than 25 million hectares (FAO, 2011).

Thus, considering the importance of sugarcane, political authorities and the sugar-energy industry must be supported by tools that can provide trustworthy, rapid, reproducible and frequently updated information on sugarcane production, planted areas, the location of expanding areas, and information on harvesting procedures throughout the crop cycle. Such data may be obtained cost-efficiently from remote sensing (RS) satellite images, because their multi-spectral, synoptic, and repetitive characteristics make it possible to distinguish among different objects on the Earth's surface (Jensen, 2006).

Traditionally, sugarcane mapping using RS images has been done through visual interpretation of multi-temporal Landsat-like data (Haik et al., 2009; Rudorff et al., 2010). The procedure provides accurate and consistent results, but is costly in terms of processing time and the large number of technically skilled people involved. As sugarcane is grown over large areas in Brazil and in other countries, there is an evident need to test advanced methods of RS image classification for sugarcane mapping, so that classification can become an objective and reproducible technique when processing large amounts of data from diverse and complex landscapes (DeFries & Chan, 2000). Moreover, large-area land-cover monitoring scenarios, involving large volumes of data, are becoming more and more prevalent in environmental monitoring programs. Thus, there is a pressing need for increased automation in the mapping process (Rogan et al., 2008).

Conventional pixel-based procedures of digital classification, and in particular those using only single date imagery, have difficulties with automatic pattern recognition, mainly because of the phenological variability of crops, different cropping systems and non-uniform measurement conditions (e.g. atmospheric disturbances). In the analysis of sugarcane, the restrictions are particularly well-known (Rudorff et al., 2010). In such a context Object Based Image Analysis (OBIA) seems to be promising. According to Cohen and Shoshany (2005), conventional systems execute algorithmic processing guided only by statistical data variables, whereas OBIA encompasses computational systems based on knowledge.

Applications of the OBIA model to image classification consider the analysis of an “object in space,” instead of a “pixel in space” (Navulur, 2007). The most common approach used to generate such objects is image segmentation, that is, the subdivision of an image into homogeneous regions by grouping pixels according to predefined criteria of homogeneity and heterogeneity (Haralick & Shapiro, 1985). For each object created by segmentation, spectral, textural, morphic and contextual attributes are generated, which may be employed in image analysis (Blaschke, 2010). Once the objects have been outlined, the next step is to assign each of them to a class by comparing it with previously defined patterns, treating every image object as thematically homogeneous. This is what is called object oriented classification (Whiteside & Ahmad, 2005).
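As a rough illustration of how per-object attributes follow from a segmentation, the sketch below computes the pixel count and mean value of one band for every object in a label raster. It is not the Definiens Developer® procedure used in the study; the function name, the toy 3 × 4 scene and the band values are hypothetical and serve only to show the idea of moving from a "pixel in space" to an "object in space".

```python
from collections import defaultdict


def object_mean_attributes(labels, band):
    """Return {object_id: (pixel_count, mean_value)} for one image band.

    labels and band are equally sized 2-D lists: labels holds the segment id
    of every pixel, band holds the pixel values (e.g. reflectance).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label_row, band_row in zip(labels, band):
        for obj_id, value in zip(label_row, band_row):
            sums[obj_id] += value
            counts[obj_id] += 1
    return {obj_id: (counts[obj_id], sums[obj_id] / counts[obj_id])
            for obj_id in counts}


# Hypothetical toy scene with two segments (ids 1 and 2); values are invented.
labels = [
    [1, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 2, 2, 2],
]
band = [
    [0.40, 0.42, 0.20, 0.22],
    [0.41, 0.39, 0.21, 0.19],
    [0.38, 0.18, 0.20, 0.20],
]
attributes = object_mean_attributes(labels, band)
for obj_id, (count, mean) in sorted(attributes.items()):
    print(f"object {obj_id}: {count} pixels, mean {mean:.2f}")
```

Textural, shape and contextual attributes would be derived per object in the same spirit, and repeating the calculation for each image date would yield one attribute vector per object across the time series.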

In OBIA, the construction of an image interpretation model (knowledge) is the most important phase and one that is often difficult to execute, since the specialist (the holder of the knowledge) may lack an exact notion of which descriptive attributes best characterize the objects to be classified (Witten & Frank, 2005). Therefore, an interesting potential solution is to adopt Data Mining (DM) techniques that enable the automatic generation of a knowledge structure (Silva, Câmara, Escada, & Souza, 2008).

Data Mining (DM) is a separate stage within a process known as Knowledge Discovery in Databases (KDD) (Fayyad, Piatetsky-Shapiro, Smyth, & Uthurusamy, 1996). Data mining encompasses techniques and algorithms used for the effective construction of a knowledge model, which, according to Rogan et al. (2008), can be represented in the form of a decision tree.

Decision trees are developed using different measures that recursively split data sets into increasingly homogeneous subsets representing class membership. All decision tree approaches employ hierarchical, recursive partitioning of the data, resulting in decision rules that relate values or thresholds in the predictor variables with pixel classes (Friedl & Brodley, 1997; Rogan et al., 2008). A decision tree uses the divide-and-conquer strategy through a top-down approach (Witten & Frank, 2005).
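To make the idea of hierarchical threshold rules concrete, the following minimal sketch represents a tree as nested dictionaries and walks it from the root to a leaf. The attribute names, thresholds and class labels are hypothetical and are not taken from the tree generated in this study.

```python
def classify(node, features):
    """Walk a decision tree from the root to a leaf and return the class label.

    Internal nodes are dicts with an attribute name, a threshold and two
    branches; leaves are plain class-label strings.
    """
    while isinstance(node, dict):
        value = features[node["attribute"]]
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node


# Hypothetical tree: names and thresholds are illustrative assumptions only.
tree = {
    "attribute": "mean_nir_date2",
    "threshold": 0.30,
    "left": "other",
    "right": {
        "attribute": "mean_nir_date1",
        "threshold": 0.25,
        "left": "sugarcane",
        "right": "other",
    },
}

image_object = {"mean_nir_date1": 0.20, "mean_nir_date2": 0.45}
print(classify(tree, image_object))
```

Each root-to-leaf path reads as one explicit rule, which is what makes such a model straightforward to inspect.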

An important advantage of classification trees is that they are structurally explicit, allowing for the clear interpretation of the links between the dependent variable of class membership and the independent variables of remote sensing and/or ancillary data (Lawrence & Wright, 2001).

Among the several existing algorithms for DT generation, one that is widely used, tested and validated, which indicates its quality as a computational method (Silva, 2006), is C4.5, developed by Quinlan (1993). C4.5 uses the ‘information gain ratio’ to evaluate candidate splits at each node of a tree, where the information gain is a measurement of the reduction in entropy (i.e., loss of information) in the data produced by a split (Quinlan, 1993). The goal is to assign to each node the attribute whose split leaves the least entropy in the resulting subsets. The induction strategy of the C4.5 algorithm is explained in more detail in Quinlan (1993).
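The split criterion can be sketched in a few lines. The code below computes Shannon entropy, the information gain of a candidate split, and the gain ratio (information gain divided by the split information). It is an illustrative simplification rather than the full C4.5 induction procedure: it omits threshold search on continuous attributes, missing-value handling and pruning, and the class labels and example splits are invented.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(labels, subsets):
    """Gain ratio of a candidate split.

    labels  : class labels of all samples reaching the node
    subsets : list of label lists, one per branch of the candidate split
    """
    total = len(labels)
    weights = [len(s) / total for s in subsets if s]
    remaining = sum(w * entropy(s) for w, s in zip(weights, (s for s in subsets if s)))
    info_gain = entropy(labels) - remaining
    split_info = -sum(w * log2(w) for w in weights)
    return info_gain / split_info if split_info > 0 else 0.0


# Invented example: eight training objects, two classes.
labels = ["cane"] * 4 + ["other"] * 4
perfect_split = [["cane"] * 4, ["other"] * 4]
useless_split = [["cane", "cane", "other", "other"],
                 ["cane", "cane", "other", "other"]]

print(gain_ratio(labels, perfect_split))
print(gain_ratio(labels, useless_split))
```

A split that separates the classes completely scores highest, while one that leaves the class mixture unchanged in every branch scores zero, so the tree builder would prefer the former attribute at that node.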

Traditional approaches to land cover classification from remotely sensed data have typically relied on statistical classifiers, such as maximum likelihood classifiers or unsupervised clustering techniques (Brown de Colstoun et al., 2003). Pal and Mather (2003) demonstrated several advantages of using a decision tree for land cover mapping in comparison with other types of classifiers, such as the maximum likelihood method and artificial neural networks.

Decision trees have been preferred to statistical classifiers because they do not make implicit assumptions about normal distributions in the input data, as some classifiers need to (Friedl & Brodley, 1997). As stated by Brown de Colstoun et al. (2003), decision tree classifiers can also accept a wide variety of input data, including non-remotely sensed ancillary data, in the form of continuous and/or categorical variables. In addition, the general simplicity and hierarchical structure of the results from decision trees can be valuable assets to both experienced and inexperienced users for interpretation, algorithm testing and refinement, and analysis. Although potentially more powerful than single decision trees, random forests (RF; Breiman, 2001) and other bagged approaches lack this interpretability.

Friedl and Brodley (1997) stated that the advantages of decision trees include an ability to handle data measured on different scales, a lack of any assumptions concerning the frequency distributions of the data in each of the classes, flexibility, and an ability to handle non-linear relationships between features and classes. However, according to Pal and Mather (2003), decision tree classifiers have not been as widely used within the remote sensing community as either the statistical or the neural/connectionist methods.

Peña-Barragán, Ngugi, Plant, and Six (2011) developed a combined OBIA and DT based methodology to identify 13 major crops cultivated in the agricultural area of Yolo County (California, USA). They explored the use of several vegetation indices and textural features derived from visible, near-infrared and short-wave infrared bands of ASTER satellite scenes collected during three distinct growing-season periods. Their multi-seasonal assessment of a large number of crop types and field status, evaluated by a confusion matrix method, reported an overall accuracy of 79%, and they concluded it was successful for object-based feature selection and crop identification.

Brown de Colstoun et al. (2003) used multi-temporal ETM+/Landsat-7 data and a decision tree classifier to map eleven land cover classes, obtaining a final land cover map with an overall accuracy of 82% (κ = 0.80); when they considered only forest vs. non-forest classes, the overall accuracy was 99.5%.
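Overall accuracy and the Kappa coefficient quoted in studies such as these are both derived from an error (confusion) matrix. As a minimal sketch, assuming a square matrix with reference classes in rows and mapped classes in columns, they can be computed as follows. The counts in the example are invented for illustration and do not reproduce any of the matrices reported in the cited studies.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's Kappa) for a square error matrix."""
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of samples on the main diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    # Chance agreement: product of row and column marginals, summed over classes.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Invented two-class example with 100 validation samples.
error_matrix = [
    [45, 5],   # reference class A: 45 mapped as A, 5 mapped as B
    [5, 45],   # reference class B: 5 mapped as A, 45 mapped as B
]
accuracy, kappa = overall_accuracy_and_kappa(error_matrix)
print(f"overall accuracy = {accuracy:.2f}, kappa = {kappa:.2f}")
```

Kappa is lower than the overall accuracy here because it discounts the agreement that would be expected by chance from the class marginals alone.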

Thus, in view of the need to obtain information about sugarcane in extensive areas, and considering the potential of the above-mentioned computational approaches, the objective of the present work was to test the possibility of integrating OBIA and DM to map sugarcane ready to harvest over large areas, using a time series of Landsat images. The accuracy of the approach was assessed using a completely independent validation data set not previously used during DT training.

Section snippets

Study area

The study area used in the present work encompasses three municipalities located in the north of São Paulo State (Fig. 1): Ipuã, São Joaquim da Barra and Guará, which together cover an area of 124,100 ha (IBGE, 2008). The region is a good representation of the agricultural conditions of much of the Southeastern and Southern Brazilian regions; the main crops besides sugarcane are cotton, peanuts, rice, beans, manioc, corn, soybeans, sorghum, tomatoes, banana, coffee, and oranges, with temporary

Image segmentation

The approach proposed in this work is based on objects that are generated from segmentation. Hence, it is fundamental that objects reliably represent the classes of interest and do not “mix” two (or more) classes within a given object. In the case of sugarcane, planting fields usually have well-defined geometrical forms. In addition they are large and contiguous. These characteristics confer quite specific spatial patterns.

Multiresolution segmentation is based upon multiple parameters already

Discussion

It must be pointed out that in expert classifiers the main form of representing knowledge (understood here as a set of interpretation keys to classify classes of interest in an imaged landscape) is a semantic network (Blaschke, 2010). Semantic networks enable the organization of rules that emulate the judgment of an expert without subjectivity and in an automated way, which is crucial when huge amounts of data are assessed, as is the case in Brazil, where sugarcane occupies almost 10 million

Conclusion

This work sought to investigate the viability of integrating object based image analysis (OBIA) and data mining (DM) techniques to map ready-to-harvest sugarcane from a time series of 30 m ground resolution TM Landsat-5 and ETM+ Landsat-7 images. Classification results showed high levels of accuracy, with the overall accuracy and the Kappa coefficient reaching 93.99% and 0.87, respectively. These high accuracy measures were obtained on a sample set of 500 pixels not previously used during

Acknowledgments

We thank the Brazilian Research Councils: CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico, 153208/2010-2, 142845/2011-6 and 304928/2011/9), CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) and São Paulo Research Foundation FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo, contract 2009/02037-3).

References (41)

  • T.A. Schroeder et al. Radiometric correction of multi-temporal Landsat data for characterization of early successional forest patterns in western Oregon. Remote Sensing of Environment (2006)
  • B.M. Steele. Maximum posterior probability estimators of map accuracy. Remote Sensing of Environment (2005)
  • D.P. Turner et al. Relationships between Leaf Area Index and Landsat TM Spectral Vegetation Indices across three temperate zone sites. Remote Sensing of Environment (1999)
  • M. Baatz et al. Multiresolution segmentation: An optimization approach for high quality multi-scale image segmentation
  • L. Breiman. Random forests. Machine Learning (2001)
  • CEPEA (Centro de Estudos Avançados em Economia Aplicada). PIB do Agronegócio – Dados de 1994 a 2008
  • S.C. Chen et al. Avaliação de composições coloridas TM falsa cor para discriminação de culturas
  • R.G. Congalton et al. Assessing the accuracy of remotely sensed data: Principles and practices
  • Definiens. Definiens professional 5: Reference book (2006)
  • FAO (Food and Agriculture Organization of the United Nations). Major food and agricultural commodities and producers. Sugarcane (2011)