Part 1: Model invalidation is a good thing. It means that we are forced to reconsider either model structures or the available data more closely, that is, to challenge our fundamental understanding of the problem at hand. It is not easy, however, to decide when a model should be invalidated, given that the sources of uncertainty in environmental modelling are often expected to be epistemic rather than simply aleatory in nature. In particular, epistemic errors in model inputs may exert a very strong control over how accurate we can expect model predictions to be when compared against evaluation data that may themselves be subject to epistemic uncertainties. We suggest that both modellers and referees should treat model validation as a form of Turing-like Test, whilst being more explicit about how the uncertainties in observed data and their impacts are assessed. Eight principles for formulating such tests are presented. Being explicit about the decisions made in framing an analysis is one important way to facilitate communication with users of model outputs, especially when it is intended to use a model simulator as a ‘model of everywhere’ or ‘digital twin’ of a catchment system. An example application of the concepts is provided in Part 2.
Part 2:
Part 1 of this study discussed the concept of using a form of Turing-like Test for model evaluation, together with eight principles for implementing such an approach. In this part, the framing of fitness-for-purpose as a Turing-like Test is discussed, together with an example application: assessing whether a rainfall-runoff model might be an adequate representation of the discharge response in a catchment for predicting future natural flood management scenarios. It is shown that the variation between event runoff coefficients in the record can be used to create limits of acceptability that implicitly take some account of the epistemic uncertainties arising from lack of knowledge about errors in rainfall and discharge observations. In the case study it is demonstrated that the model used cannot be validated in this way across the full range of observed discharges, but that behavioural models can be found for the peak flows that are the subject of interest in the application. Thinking in terms of the Turing-like Test focusses attention on the critical observations needed to test whether streamflow is being produced in the right way, so that a model can be considered fit-for-purpose in predicting the impacts of future change scenarios. As is the case for uncertainty estimation in general, it is argued that the assumptions made in setting behavioural limits of acceptability should be stated explicitly to leave an audit trail in any application that can be reviewed by users of the model outputs.
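The idea of deriving limits of acceptability from the spread of event runoff coefficients can be sketched as follows. This is a minimal illustration only, not the procedure used in the case study: the event totals, the use of the median coefficient as a reference, and the scaling of an observed discharge by the relative coefficient spread are all assumptions introduced here for clarity.

```python
# Hedged sketch: limits of acceptability from the spread of event
# runoff coefficients. All numbers and the scaling rule are
# illustrative assumptions, not the study's exact method.

def runoff_coefficient(rain_mm: float, runoff_mm: float) -> float:
    """Event runoff coefficient: event runoff volume / event rainfall volume."""
    return runoff_mm / rain_mm

# Hypothetical per-event totals (rainfall mm, runoff mm) over the catchment.
events = [(30.0, 9.0), (45.0, 18.0), (20.0, 5.0), (60.0, 27.0)]
coeffs = sorted(runoff_coefficient(r, q) for r, q in events)

c_ref = coeffs[len(coeffs) // 2]          # a central (reference) coefficient
c_min, c_max = coeffs[0], coeffs[-1]      # spread across the event record

def limits_of_acceptability(obs_discharge: float) -> tuple[float, float]:
    """Scale an observed discharge by the relative spread of runoff
    coefficients to give lower and upper acceptability limits, implicitly
    allowing for epistemic error in the rainfall and discharge data."""
    return obs_discharge * c_min / c_ref, obs_discharge * c_max / c_ref

def is_behavioural(sim_discharge: float, obs_discharge: float) -> bool:
    """A simulated value is behavioural if it falls within the limits."""
    lo, hi = limits_of_acceptability(obs_discharge)
    return lo <= sim_discharge <= hi
```

A model simulation would then be retained as behavioural only if its predicted discharges fall within these limits at the evaluation points of interest (here, the peak flows); the width of the limits makes the tolerance for observational uncertainty explicit and reviewable.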