Marian Scott 1, Claire Miller 1, Ruth O'Donnell 1, Kelly Gallacher 1, Amira Elayouty 1, Maria Franco Villoria 2, Francesco Finazzi 3
1 University of Glasgow, 2 University of Turin, 3 University of Bergamo
In the context of the aquatic environment, questions about the state of the environment are often paraphrased as: "What is the condition now? In what way has the condition changed since the previous assessment? Can we attribute the changes to actions and management?" The evidence for the answers comes from monitoring, and improvements in monitoring have significantly enhanced, and continue to enhance, our ability to detect and attribute change in the presence of considerable natural variability. One area of technological development is emerging sensor technology, which can deliver enhanced dynamic detail of environmental systems at unprecedented scale.
In this paper, we describe the growing area of water analytics, whose goal is to make ecological sense of large volumes of data: to paraphrase, "seeing IS believing", but at the same time we should NOT drown in the "data deluge". Water analytics supports the chain of "data-information-intelligence-decisions and action" and requires innovative, computationally efficient modelling to handle the challenging aspects of the data so generated: their quantity, high temporal resolution, spatial network structure, variety and complexity.
Methods and materials
Statistical and analytic tools are being developed to handle efficiently the volume, variety and complexity of data arising from sensor technology. Key features of the statistical models include: incorporation of network structures reflecting flow and connectedness in streams; spatial and temporal correlation; non-stationarity (dynamic relationships which themselves change over time and space); changing phenologies; a focus on high quantiles or extreme events; spatial clustering (and temporal coherence) of environmental time series; and fusion of different data streams (earth observations from satellites, in-situ buoys, and routine and traditional sampling). We illustrate three specific aquatic challenges: first, modelling diffuse pollutants over a stream network; second, examining trends in TOC in Scottish rivers using a state space model; and third, visualising high-frequency sensor data from a single site.
All three examples use data provided by the Scottish Environment Protection Agency (SEPA). For the first example, data on the Tweed river basin are available from January 1987 to August 2011 for 83 monitoring stations on the river. The observations are irregularly timed, with frequencies that vary across stations but average around one per month. For the second example, Total Organic Carbon (TOC) data were provided from 333 monitoring locations across rivers in Scotland over 44 months, covering January 2007 to August 2010. For the third example, 15-minute dissolved oxygen (DO) data are available from one marine gauging station, spanning 3 years from 2009 to 2012.
Results & Discussion
In the first example, a flexible regression model was built to describe the trends over time in nitrate across the river network, including any seasonal patterns. The model incorporates a spatial correlation structure that uses flow connectedness and an ecologically relevant measure of distance (stream distance) to determine relatedness over space. The flexible regression model (an additive model) allows the data to determine the nature of the trends over time, which are no longer constrained to be monotonic. The seasonal pattern is modelled similarly but is allowed to vary over the 20 years of observation. Additionally, explanatory variables such as geology, land use, number of livestock and meteorology can be incorporated as drivers of nitrate levels. Such a model can be used to predict nitrate concentrations, with an associated uncertainty, at any location in the river basin (O'Donnell et al., 2014; Miller et al., 2014).
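The two spatial ingredients named above, stream distance and flow connectedness, can be computed from a simple description of the river network. The sketch below is illustrative only and is not the authors' implementation: it assumes a hypothetical tree representation in which each node stores its downstream neighbour (`parent`) and the length of the segment joining them (`length`), with the outlet's parent set to `None`. Two sites are treated as flow-connected when one lies downstream of the other; the stream distance is the along-channel distance via the point where their downstream paths meet.

```python
import numpy as np


def distance_to_outlet(node, parent, length):
    """Sum of segment lengths from `node` down to the outlet (whose parent is None)."""
    d = 0.0
    while parent[node] is not None:
        d += length[node]
        node = parent[node]
    return d


def downstream_path(node, parent):
    """Nodes from `node` to the outlet, inclusive, in downstream order."""
    path = [node]
    while parent[node] is not None:
        node = parent[node]
        path.append(node)
    return path


def stream_matrices(sites, parent, length):
    """Return (stream distance matrix, flow-connectedness matrix) for the given sites."""
    n = len(sites)
    dist = np.zeros((n, n))
    connected = np.zeros((n, n), dtype=bool)
    to_outlet = {s: distance_to_outlet(s, parent, length) for s in sites}
    paths = {s: downstream_path(s, parent) for s in sites}
    for i, a in enumerate(sites):
        for j, b in enumerate(sites):
            on_b = set(paths[b])
            # First node on a's downstream path that is also on b's: where the flow paths meet.
            junction = next(x for x in paths[a] if x in on_b)
            d_junction = distance_to_outlet(junction, parent, length)
            dist[i, j] = (to_outlet[a] - d_junction) + (to_outlet[b] - d_junction)
            # Flow-connected when the meeting point is one of the two sites,
            # i.e. one site lies directly downstream of the other.
            connected[i, j] = junction in (a, b)
    return dist, connected


# Hypothetical network: outlet O; A is 10 km upstream of O;
# B and C are tributary sites 5 km and 7 km upstream of A.
parent = {"O": None, "A": "O", "B": "A", "C": "A"}
length = {"A": 10.0, "B": 5.0, "C": 7.0}
D, F = stream_matrices(["B", "C", "O"], parent, length)
```

Here B and C are 12 km apart along the channel but are not flow-connected, whereas each is flow-connected to the outlet. A network covariance model (for example, the tail-up family) would then build correlations from these two matrices; that modelling step is not shown.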
In the second example, many sites behave similarly over time; formally, we say that such sites possess temporal coherence. Spatially, monitoring is undertaken at a national scale, and as the temporal frequency and spatial scale increase, applying statistical models to data that are high-dimensional in both space and time becomes challenging. Using a state space modelling approach with a Matérn spatial correlation with an effective range of 30 km, five clusters of TOC are identified; given the short observation period, these clusters are driven by short-term seasonal fluctuations (Scott et al., 2013; Finazzi et al., 2014).
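To make "a Matérn spatial correlation with an effective range of 30 km" concrete, the sketch below evaluates the Matérn correlation and solves for the scale parameter at which the correlation falls to 0.05 at 30 km, one common convention for "effective range". The smoothness `nu`, the 0.05 threshold and the parametrisation with `sqrt(2*nu)` are assumptions for illustration; the smoothness used in the cited work is not stated here.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import gamma, kv


def matern_corr(h, phi, nu):
    """Matérn correlation at distance(s) h with scale phi and smoothness nu.

    Uses rho(h) = 2**(1-nu)/Gamma(nu) * s**nu * K_nu(s), with s = sqrt(2*nu)*h/phi.
    """
    h = np.atleast_1d(np.asarray(h, dtype=float))
    s = np.sqrt(2.0 * nu) * h / phi
    rho = np.ones_like(s)  # the limit as h -> 0 is 1
    pos = s > 0
    rho[pos] = (2.0 ** (1.0 - nu) / gamma(nu)) * s[pos] ** nu * kv(nu, s[pos])
    return rho


def scale_for_effective_range(eff_range, nu, level=0.05):
    """Scale phi such that the correlation equals `level` at distance `eff_range`."""
    f = lambda phi: matern_corr(eff_range, phi, nu)[0] - level
    # Correlation rises monotonically with phi, so bracket widely and root-find.
    return brentq(f, 1e-3 * eff_range, 1e3 * eff_range)


# Assumed smoothness nu = 1.5; distances in km.
phi = scale_for_effective_range(30.0, nu=1.5)
corr_at = matern_corr([0.0, 10.0, 30.0, 60.0], phi, 1.5)
```

With `nu = 0.5` the Matérn reduces to the exponential correlation `exp(-h/phi)`, so the effective range is simply `phi * ln(20)`; larger `nu` gives smoother spatial fields for the same effective range.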
In the last example, a wavelet analysis is first carried out on dissolved oxygen. Wavelet analysis is a useful tool for analysing non-stationary and/or high-frequency time series. The result is a time-scale decomposition of the original time series that allows cyclical components at different frequencies, as well as the long-term trend, to be identified. A quantile regression is then fitted to examine whether any trends are present in the extremes, where extremes in this context refers to a high quantile of the distribution of the variable of interest. To increase flexibility, the model is specified in a non-parametric regression framework, so that the relationship is not forced to follow a particular parametric form. Further, the model is built as an additive model whose individual components are defined as smooth functions of time (Franco Villoria et al., in preparation).
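The idea of a time-scale decomposition can be illustrated with the simplest wavelet, the Haar wavelet. The sketch below is a stand-in, not necessarily the wavelet or transform used in the study: it applies a multilevel orthonormal Haar discrete wavelet transform to a synthetic, hypothetical 15-minute DO-like series (a daily cycle, a slow drift and noise). Each detail level captures fluctuations at roughly twice the time scale of the previous one, and the final approximation carries the long-term trend.

```python
import numpy as np


def haar_dwt(x, levels):
    """Multilevel orthonormal Haar DWT: returns (approximation, [detail_1, ..., detail_L])."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        if approx.size % 2:
            raise ValueError("series length must be divisible by 2**levels")
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))  # local differences: this scale's fluctuations
        approx = (even + odd) / np.sqrt(2.0)         # local averages: passed to the next, coarser level
    return approx, details


def haar_idwt(approx, details):
    """Invert haar_dwt exactly."""
    a = np.asarray(approx, dtype=float)
    for d in reversed(details):
        out = np.empty(2 * a.size)
        out[0::2] = (a + d) / np.sqrt(2.0)
        out[1::2] = (a - d) / np.sqrt(2.0)
        a = out
    return a


# Hypothetical series: 1024 steps of 15 minutes (about 10.7 days), 96 steps per day.
rng = np.random.default_rng(42)
t = np.arange(1024)
x = 8.0 + 1.5 * np.sin(2 * np.pi * t / 96) + 0.002 * t + rng.normal(0.0, 0.2, t.size)

approx, details = haar_dwt(x, levels=7)
energy = [np.sum(d ** 2) for d in details]
for j, e in enumerate(energy, start=1):
    # Level j responds mainly to fluctuations on scales of roughly 15 * 2**j minutes.
    print(f"level {j} (~{15 * 2 ** j} min): {e / sum(energy):.1%} of detail energy")
```

Because the transform is orthonormal, the energy of the series is split exactly across levels, which is what makes the per-level energy shares interpretable as a decomposition.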
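A trend in a high quantile can be estimated by minimising the asymmetric "pinball" loss instead of squared error. The sketch below is a simplified, hypothetical version of that idea: it uses a piecewise-linear spline in time (hinge functions at assumed knots) as a crude non-parametric smooth, and solves the standard linear-programming form of quantile regression with SciPy's `linprog`. The study's actual smoother and fitting method may differ; the data are synthetic.

```python
import numpy as np
from scipy.optimize import linprog


def hinge_basis(t, knots):
    """Intercept, linear term and (t - k)_+ terms: a piecewise-linear spline in time."""
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t), t] + [np.maximum(t - k, 0.0) for k in knots]
    return np.column_stack(cols)


def quantile_fit(X, y, tau):
    """Quantile regression as an LP.

    Minimise sum(tau * u_plus + (1 - tau) * u_minus)
    subject to X @ beta + u_plus - u_minus = y, u_plus >= 0, u_minus >= 0.
    """
    n, p = X.shape
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)  # beta is unrestricted in sign
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


# Synthetic data: linear drift plus Uniform(0, 1) noise, so the true
# 0.9 quantile is 8.9 + 0.5 * t. Knots at t = 1 and t = 2 are assumptions.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 3.0, 300)
y = 8.0 + 0.5 * t + rng.uniform(0.0, 1.0, t.size)
tau = 0.9
X = hinge_basis(t, knots=[1.0, 2.0])
beta = quantile_fit(X, y, tau)
fitted = X @ beta
```

A useful check on any fit with an intercept: roughly a fraction `tau` of the observations lie on or below the fitted quantile curve. Replacing the hinge basis with a penalised spline basis would bring the sketch closer to a smooth additive model.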
Recent statistical developments allow often complex trends to be identified dynamically and over space, for both the mean and high quantiles. Incorporating spatial correlation which respects the network structure makes such models attractive and powerful. Understanding the spatial extent of temporal coherence for water quality parameters provides a valuable basis for extrapolating from measured to unmeasured locations, and hence monitoring a subset of sites with representative temporal patterns offers one possible route to efficient and cost-effective sampling. Statistical tools allow us to gain maximum benefit from the wealth of environmental data being collected.
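As a rough illustration of choosing representative sites from temporally coherent groups, the sketch below clusters site time series by correlation and picks, in each cluster, the site most correlated on average with the others. This is a simple stand-in (hierarchical clustering on `1 - correlation`), not the state space clustering used in the study; the function name `coherence_clusters` and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def coherence_clusters(series, n_clusters):
    """Cluster rows of `series` (sites x times) by temporal correlation.

    Returns (labels, representatives), where representatives maps each cluster
    label to the index of the site with the highest mean within-cluster correlation.
    """
    corr = np.corrcoef(series)
    dist = 1.0 - corr
    np.fill_diagonal(dist, 0.0)
    dist = np.clip((dist + dist.T) / 2.0, 0.0, None)  # enforce symmetry, non-negativity
    Z = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    reps = {}
    for k in np.unique(labels):
        members = np.flatnonzero(labels == k)
        within = corr[np.ix_(members, members)].mean(axis=1)
        reps[int(k)] = int(members[np.argmax(within)])
    return labels, reps


# Hypothetical data: 44 monthly values at 6 sites, three following a seasonal-type
# pattern and three following a trend-type pattern, each with small noise.
rng = np.random.default_rng(0)
months = np.linspace(0, 4 * np.pi, 44)
pattern_a = np.sin(months)
pattern_b = np.linspace(-1, 1, 44)
series = np.vstack([pattern_a + 0.1 * rng.normal(size=44) for _ in range(3)]
                   + [pattern_b + 0.1 * rng.normal(size=44) for _ in range(3)])
labels, reps = coherence_clusters(series, n_clusters=2)
```

Monitoring only the representative sites (here one per cluster) is the kind of reduced design the paragraph above alludes to; in practice the number of clusters and the choice of representative would need to be justified against prediction error at the unmonitored sites.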
1. Finazzi F, Haggarty R, Miller C, Scott M, Fassò A (2014) A comparison of clustering approaches for the study of the temporal coherence of multiple time series. Stochastic Environmental Research and Risk Assessment. ISSN 1436-3240
2. Miller C, Magdalina A, Willows R, Bowman A, Scott E M, Lee D, Burgess C, Pope L, Pannullo F, Haggarty R (2014). Spatiotemporal statistical modelling of long term change in river nutrient concentrations in England and Wales. Sci Tot Env, 466-467.
3. Franco Villoria M, Scott E M, Hoey T (2014) Spatial analysis of extreme river flows using quantile regression, in preparation.
4. O'Donnell D, Rushworth A, Bowman A W, Scott E M, Hallard M (2014) Flexible regression models over river networks. Applied Statistics.
5. Scott E M, Miller C, Finazzi F, Haggarty R (2013) Coherency in space of lake and river temperature and water quality records. Proceedings of Italian Statistical Society meeting, Brescia, Italy