Lakes are considered sensitive indicators of environmental change and are affected by both natural and anthropogenic drivers. While previous studies have often focussed on individual lakes, identifying coherent temporal patterns across multiple lakes on a global scale may indicate the existence of common drivers and pressures. The synchrony between major fluctuations in a set of time series is often described as temporal coherence (Salisbury et al., 2011). GloboLakes (www.globolakes.ac.uk) is a 5-year Natural Environment Research Council consortium project involving 6 UK research groups. Its goal is to investigate the state and temporal coherence of 1000 lakes, and their response to climatic and other environmental drivers of change, using a 20-year archive of satellite-based observations. Determinands of interest include temperature, chlorophyll and coloured dissolved organic matter.
Within the GloboLakes project, functional clustering approaches (Finazzi et al., 2014) have been used to identify clusters of patterns within and between lakes in terms of different lake water quality determinands. By exploring within lake heterogeneity and across lake patterns, we aim to provide a global picture of coherence of lake water quality and investigate water quality response to environmental change at a global scale.
A functional data analysis (FDA) approach has been taken, in which the observed time series are viewed as potentially noisy realisations of unobserved functions. For each determinand, smooth curves representing the underlying time series, which correspond to individual pixels or groups of pixels, are clustered; two curves belong to the same cluster if they are coherent with each other. Regarding the data in this functional setting enables any long-term temporal and seasonal patterns in the data to be estimated and then compared across lakes. FDA also provides a computationally efficient way of dealing with large quantities of data. Since the number and structure of any clusters in the data are unknown, several functional clustering approaches are applied, including k-means, hierarchical and state space approaches, to assess the robustness of the results.
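As an illustration of viewing noisy series as realisations of smooth functions, the following minimal sketch fits a linear trend plus seasonal harmonics to each series by least squares. The basis choice, the function names `harmonic_basis` and `smooth_curves`, and the synthetic data are assumptions made for this example; they are not the smoothing method used in GloboLakes, and missing observations are not handled.

```python
import numpy as np

def harmonic_basis(t, n_harmonics=2, period=1.0):
    """Columns: intercept, linear trend, then a sine/cosine pair per harmonic."""
    cols = [np.ones_like(t), t]
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * t / period
        cols += [np.sin(angle), np.cos(angle)]
    return np.column_stack(cols)

def smooth_curves(t, Y, n_harmonics=2, period=1.0):
    """Least-squares fit of every row of Y (n_series x n_times) on the basis.

    Returns the basis coefficients (n_series x n_basis) and the fitted curves.
    """
    B = harmonic_basis(t, n_harmonics, period)
    coef, *_ = np.linalg.lstsq(B, Y.T, rcond=None)  # n_basis x n_series
    return coef.T, (B @ coef).T

# Synthetic example (hypothetical data): 5 noisy series over 10 years,
# with 24 observations per year and time measured in years.
rng = np.random.default_rng(0)
t = np.arange(240) / 24.0
truth = 10.0 + 0.05 * t + 8.0 * np.sin(2.0 * np.pi * t)
Y = truth + rng.normal(scale=2.0, size=(5, t.size))

coef, fitted = smooth_curves(t, Y)
rmse_raw = np.sqrt(np.mean((Y - truth) ** 2))
rmse_fit = np.sqrt(np.mean((fitted - truth) ** 2))
print(f"RMSE of raw series: {rmse_raw:.2f}, of smoothed curves: {rmse_fit:.2f}")
```

Each series is reduced to a handful of coefficients, which is one way the functional view keeps the clustering step cheap for large numbers of series.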
For within lake clustering, functional principal components analysis (FPCA; Jones and Rice, 1992; Peng and Paul, 2009) is carried out first, before any clustering procedure is applied. FPCA identifies the dominant modes of variation in the data set and substantially reduces the dimension of the functional data, providing a computationally efficient way of exploring any underlying structure in the data.
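The dimension reduction described above can be sketched with a simple grid-based approximation to FPCA: curves evaluated on a common grid are centred and decomposed with a singular value decomposition. This is a stand-in chosen for brevity, not the estimators of Jones and Rice (1992) or Peng and Paul (2009); the function name `fpca` and the synthetic curves are assumptions.

```python
import numpy as np

def fpca(curves, n_components=2):
    """Grid-based FPCA for curves sampled on a common grid (n_curves x n_times).

    Returns the mean curve, the leading components (n_components x n_times),
    the scores (n_curves x n_components) and the proportion of variance
    explained by each retained component.
    """
    mean_curve = curves.mean(axis=0)
    centred = curves - mean_curve
    _, sing_vals, Vt = np.linalg.svd(centred, full_matrices=False)
    components = Vt[:n_components]
    scores = centred @ components.T
    explained = sing_vals ** 2 / np.sum(sing_vals ** 2)
    return mean_curve, components, scores, explained[:n_components]

# Synthetic example (hypothetical data): 40 seasonal curves in two groups
# that differ only in the amplitude of the seasonal cycle.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 100)
amplitude = np.repeat([5.0, 10.0], 20)
curves = amplitude[:, None] * np.sin(2.0 * np.pi * t)
curves = curves + rng.normal(scale=0.5, size=curves.shape)

mean_curve, components, scores, explained = fpca(curves)
print("Proportion of variance explained:", np.round(explained, 3))
```

Here each 100-point curve is summarised by two scores, and any clustering algorithm can then be applied to the score matrix rather than to the full curves.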
The European Space Agency funded ARC-Lake project (MacCallum and Merchant, 2012) has employed the Along Track Scanning Radiometers instrument on-board the Envisat satellite in order to derive observations of lake surface water temperature (LSWT) for a large number of lakes across the globe. Results presented here are based on a subset of the ARC-Lake v3 dataset (see www.geos.ed.ac.uk/arclake/data for details).
Results & Discussion
The first dataset discussed comprises bi-monthly lake average LSWTs for 700 lakes covering an 18-year period from 1995 to 2012, providing 405 observations for each of the lakes. All of the 700 lakes selected for this study are also being investigated as part of the GloboLakes project. After applying functional clustering to the lake average time series for each of the 700 lakes, a data-driven approach was used to select the statistically optimal number of clusters (Tibshirani et al., 2001). Model-based clustering based on FPCA scores identified 8 lake clusters. These clusters were coherent in terms of patterns of average LSWT over the time period.
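The gap statistic of Tibshirani et al. (2001) compares the log within-cluster dispersion of the data with its expectation under a uniform reference distribution, and selects the smallest k for which Gap(k) >= Gap(k+1) - s(k+1). The sketch below is an illustration only: it uses a plain k-means with a single farthest-first initialisation in place of the model-based clustering used in the study, and the names `kmeans_wss`, `gap_statistic` and the synthetic two-dimensional scores are assumptions.

```python
import numpy as np

def kmeans_wss(X, k, rng, n_iter=100):
    """Lloyd's algorithm with one farthest-first initialisation.

    Returns the within-cluster sum of squares of the final partition.
    """
    centres = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centres], axis=0)
        centres.append(X[np.argmax(d2)])
    centres = np.array(centres)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(updated, centres):
            break
        centres = updated
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def gap_statistic(X, k_max=6, n_ref=20, seed=0):
    """Return the selected number of clusters and Gap(k) for k = 1..k_max."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    gaps, s = np.empty(k_max), np.empty(k_max)
    for k in range(1, k_max + 1):
        log_w = np.log(kmeans_wss(X, k, rng))
        # Reference sets drawn uniformly over the bounding box of the data.
        ref_log_w = np.array([
            np.log(kmeans_wss(rng.uniform(lo, hi, size=X.shape), k, rng))
            for _ in range(n_ref)
        ])
        gaps[k - 1] = ref_log_w.mean() - log_w
        s[k - 1] = ref_log_w.std() * np.sqrt(1.0 + 1.0 / n_ref)
    for k in range(1, k_max):
        if gaps[k - 1] >= gaps[k] - s[k]:  # Gap(k) >= Gap(k+1) - s(k+1)
            return k, gaps
    return k_max, gaps

# Synthetic example (hypothetical data): three well-separated groups of
# two-dimensional scores, 40 points each.
rng = np.random.default_rng(42)
group_centres = np.array([[0.0, 0.0], [5.0, 0.0], [20.0, 0.0]])
scores = np.vstack([c + rng.normal(scale=0.5, size=(40, 2)) for c in group_centres])

best_k, gaps = gap_statistic(scores)
print("Selected number of clusters:", best_k)
```

In practice the clustering routine passed through the gap calculation would be whichever method is being assessed, and several random restarts would normally be used to reduce the risk of poor local optima.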
In terms of global coherence, the key distinction between the clusters was, unsurprisingly, the latitude of the lakes. The Northern hemisphere and equatorial band clusters are separated primarily by smaller scale differences in the phase and amplitude of their seasonal patterns than the clusters composed mainly of Southern hemisphere lakes.
Within Lake Clustering
In addition to exploring coherence in lake average water temperature, the spatial coherence of water quality determinands over time within lakes has also been investigated. Functional clustering was applied to individual pixels within lakes, where each pixel has its own time series and corresponds to 0.025 degrees of the lake.
For example, clustering results for LSWT at Lake Peipsi, a large lake (234 pixels) situated on the border of Estonia and Russia, indicated that three clusters were required to adequately capture the spatial variability within this lake. The distinction between these clusters was based on both the mean temperature and the maximum temperature reached each year. Spatial correlation between the pixels was also taken into account in the formation of the clusters.
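The two quantities that distinguish the clusters, mean temperature and the maximum temperature reached each year, can be computed per pixel as in the minimal sketch below. The function name `pixel_summaries`, the assumption of a fixed number of observations per year, and the synthetic curves are illustrative; the sketch does not account for spatial correlation between pixels.

```python
import numpy as np

def pixel_summaries(curves, obs_per_year):
    """Mean temperature and mean annual maximum for each pixel curve.

    curves has shape (n_pixels, n_times); any incomplete final year is dropped.
    """
    n_pixels, n_times = curves.shape
    n_years = n_times // obs_per_year
    whole_years = curves[:, : n_years * obs_per_year]
    by_year = whole_years.reshape(n_pixels, n_years, obs_per_year)
    mean_temp = whole_years.mean(axis=1)
    mean_annual_max = by_year.max(axis=2).mean(axis=1)
    return np.column_stack([mean_temp, mean_annual_max])

# Synthetic example (hypothetical data): three pixels over 3 years with
# 24 observations per year, differing in mean level or seasonal amplitude.
t = np.arange(72) / 24.0
base = np.array([8.0, 8.0, 10.0])
amp = np.array([6.0, 9.0, 6.0])
curves = base[:, None] + amp[:, None] * np.sin(2.0 * np.pi * t)

summaries = pixel_summaries(curves, obs_per_year=24)
print(summaries)
```

Plotting such summaries by cluster label is one simple way to check which feature separates the clusters within a lake.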
Interesting spatial patterns can be identified by the within lake clustering procedure; there may be distinct basins within lakes which behave very differently to one another, and consequently may need to be regarded separately when exploring attribution of and responses to changes in the environment.
The clustering approaches considered here are robust and computationally efficient for large numbers of time series of potentially noisy data. The use of functional data analysis enables the data dimensions to be reduced substantially. While temperature time series are already reasonably smooth, creating functional data objects can be very useful when highly noisy time series, such as those for chlorophyll, are to be clustered. In terms of global coherence for temperature, it is very reassuring that the clusters obtained have a sensible ecological interpretation. Within lake clustering captures smaller scale detail; our ultimate goal is to incorporate within lake clustering results into global coherence.
References
1. Finazzi, F., Haggarty, R., Miller, C., Scott, M., and Fassò, A. (2014). A comparison of clustering approaches for the study of the temporal coherence of multiple time series. Stochastic Environmental Research and Risk Assessment. ISSN 1436-3240
2. Jones, M. C. and Rice, J. A. (1992). Displaying the important features of large collections of similar curves. The American Statistician 46(2): 140
3. MacCallum, S. N. and C. J. Merchant (2010). Arc-lake algorithm theoretical basis document. Technical report, School of GeoSciences, The University of Edinburgh.
4. Peng, J. and Paul, D. (2009). A geometric approach to maximum likelihood estimation of the functional principal components from sparse longitudinal data. Journal of Computational and Graphical Statistics. December 1, 2009, 18(4): 995-1015
5. Salisbury, J., D. Vandemark, J. Campbell, C. Hunt, D. Wisser, N. Reul, and B. Chapron (2011). Spatial and temporal coherence between Amazon river discharge, salinity, and light absorption by colored organic carbon in western tropical Atlantic surface waters. Journal of Geophysical Research: Oceans 116(C7).
6. Tibshirani, R., G. Walther, and T. Hastie (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 63(2), pp. 411-423.