Dr Francky Fouedjio¹, Dr Jens Klump¹
¹CSIRO, Kensington, Australia
Geostatistical methods such as kriging with external drift as well as machine learning techniques such as quantile regression forest have been extensively used for the spatial modelling and prediction of continuous variables when auxiliary information is available. In addition to providing predictions, both approaches are able to deliver a quantification of the uncertainty associated with the prediction at a target location. While both methods tend to produce comparable prediction maps, the prediction uncertainties reported by each method seem to be at odds.
Geostatistical approaches are, in essence, adequate for providing such prediction uncertainties. However, they frequently require significant data pre-processing and rest on assumptions about the underlying spatial distribution of the data that are rarely met in practice. Machine learning techniques such as quantile regression forest tend to require less data pre-processing and are non-parametric, but they rely on the assumption that observations are independent. This assumption is often unrealistic, especially when the sampling scheme is irregular.
Real-world data always come with an inherent uncertainty that can only be estimated. To explore the ability of the two methods to provide a meaningful prediction uncertainty, we examine results from several simulated datasets for which the ground truth is known. In addition to classical performance indicators, we use accuracy plots, probability interval width plots, and visual examination of the uncertainty maps to compare kriging with external drift and quantile regression forest with respect to their ability to deliver reliable prediction uncertainties for spatial data.
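As an illustration of the accuracy plot and the probability interval width plot mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian predictive distribution at each location, described by a mean and a standard deviation (as kriging typically provides); for quantile regression forest, the interval bounds would instead be taken from the predicted quantiles. The function name `accuracy_plot_points` and the synthetic data are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def accuracy_plot_points(y_true, mean, std, probs):
    """For each nominal probability p, return the observed coverage and the
    mean width of the central p-probability interval.

    Assumption: the predictive distribution at each location is Gaussian.
    """
    y_true, mean, std = (np.asarray(a, dtype=float) for a in (y_true, mean, std))
    coverage, width = [], []
    for p in probs:
        # Half-width multiplier of the central p-interval of a standard normal.
        z = NormalDist().inv_cdf(0.5 + p / 2.0)
        lower, upper = mean - z * std, mean + z * std
        inside = (y_true >= lower) & (y_true <= upper)
        coverage.append(inside.mean())
        width.append((upper - lower).mean())
    return np.array(coverage), np.array(width)


# Synthetic example in which the ground truth is known (hypothetical data).
rng = np.random.default_rng(0)
n = 20000
mean = rng.normal(size=n)
std = rng.uniform(0.5, 2.0, size=n)
y_true = mean + std * rng.normal(size=n)  # calibrated by construction

probs = np.linspace(0.1, 0.9, 9)
coverage, width = accuracy_plot_points(y_true, mean, std, probs)

# Understating the uncertainty (halved std) should reduce the coverage.
coverage_narrow, _ = accuracy_plot_points(y_true, mean, 0.5 * std, probs)
```

Plotting `coverage` against `probs` gives the accuracy plot: points close to the 1:1 line indicate that the stated uncertainty matches the observed error, and points below it indicate intervals that are too narrow. Plotting `width` against `probs` gives the interval width plot; of two methods with similar coverage, the one with narrower intervals is the more informative.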
As CSIRO Science Leader Earth Science Informatics, Jens Klump is interested in the application of information technology to geoscience questions. His areas of research are research data infrastructures as the basis of data-driven research and the application of simulation and analytics to geological data. This includes automated data and metadata capture, sensor data integration, both in the field and in the laboratory, data processing workflows, and data provenance, but also data analysis by statistical methods, machine learning and numerical modelling.