Dr Yifan Zhang1, Dr Peter Thorburn1, Mr Peter Fitch2
1CSIRO, Brisbane, Australia
2CSIRO, Canberra, Australia
High-frequency water quality monitoring offers comprehensive and improved insight into the temporal and spatial variability of the target ecosystem. However, most monitoring systems give little consideration to sensor data quality control. Missing sensor data, background noise and signal interference have long been major obstacles for users in understanding and analysing sensor data, and they make the use of sensor data much less efficient.
We therefore present an online data cleaning system for water quality sensor data. After collecting the raw sensor data, the data cleaning system applies different data filters to the corresponding water quality sensor streams. In this approach, the specific environmental effects can be considered separately. Cleaned data streams are then sent to the web-based frontend interfaces for end users.
The system has two main tasks: detecting and removing water quality outliers, and recovering missing sensor data. For the first task, the water quality filters are built on variable-specific thresholds, rates of change and statistical distributions. For the second task, machine learning-based algorithms such as k-nearest neighbours (KNN) are applied to fill the gaps in the monitoring streams.
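The two tasks can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the function names, the pH limits, the step limit and the z-score cut-off are all hypothetical, and the gap filling uses a simple nearest-in-time form of KNN, which is only one possible reading of the approach described above.

```python
from statistics import mean, pstdev


def flag_outliers(values, lo, hi, max_step, z_max=3.0):
    """Return a copy of `values` with suspected outliers replaced by None.

    All limits are illustrative assumptions, not values from the system.
    """
    # 1. Variable-specific threshold: reject readings outside [lo, hi].
    cleaned = [v if v is not None and lo <= v <= hi else None for v in values]

    # 2. Rate of change: reject a reading that jumps more than `max_step`
    #    away from the last accepted reading.
    last = None
    for i, v in enumerate(cleaned):
        if v is None:
            continue
        if last is not None and abs(v - last) > max_step:
            cleaned[i] = None
        else:
            last = v

    # 3. Statistical distribution: reject readings more than `z_max`
    #    standard deviations from the mean of the remaining readings.
    present = [v for v in cleaned if v is not None]
    if len(present) >= 2:
        mu, sigma = mean(present), pstdev(present)
        if sigma > 0:
            cleaned = [
                None if v is not None and abs(v - mu) > z_max * sigma else v
                for v in cleaned
            ]
    return cleaned


def knn_fill(values, k=2):
    """Fill None entries with the mean of the k nearest-in-time known readings."""
    known = [(i, v) for i, v in enumerate(values) if v is not None]
    if not known:
        return list(values)
    filled = list(values)
    for i, v in enumerate(values):
        if v is None:
            nearest = sorted(known, key=lambda p: abs(p[0] - i))[:k]
            filled[i] = mean([val for _, val in nearest])
    return filled


# Hypothetical pH stream with one out-of-range spike, one sudden jump and one gap.
raw = [7.0, 7.2, 15.0, 7.4, None, 9.9, 7.6]
cleaned = flag_outliers(raw, lo=0.0, hi=14.0, max_step=1.0)
filled = knn_fill(cleaned, k=2)
print(cleaned)
print(filled)
```

Running the filters before the gap filling means that removed outliers are treated as missing values and recovered in the same pass as the original sensor gaps.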
The prototype system relieves end users of routine data cleaning work and shows a significant improvement in the readability of the water quality sensor data. In the next stage, more neural network-based algorithms will be tested and integrated to provide more reliable and accurate data cleaning results.
Yi-Fan Zhang is a postdoctoral fellow in Agriculture & Food, CSIRO. He received a PhD in data science from Queensland University of Technology in 2016. His work focuses on deep learning for agricultural decision making and management, with an emphasis on time series modelling and forecasting.