Wednesday, August 17, 2011

I love my model

In this post I want to discuss an aspect of climate change research that is not often explicitly addressed: the activities of the mathematical modellers who construct the complex models that are key components of the whole IPCC process. The outputs of models like these are what lead to projections of looming disaster in climate change, biodiversity, population explosions, financial meltdown and all manner of doom-laden predictions that find their way to the front pages of the popular press (and to the front pages of Nature, Science et al). Given the importance of these models, it’s worth spending some time looking in some detail at the incentives of modelling.

Firstly, I need to point out that mathematical modelling is a key part of my day job. I write complex models in the finance and IT operations industry. I can’t go into detail because these models are considered to be major parts of my employer’s intellectual property, and details are shrouded by strict non-disclosure agreements. The content of the models I design, and which I and others code, gives my employer a competitive advantage that it does not want to lose. However, I can make some fairly generic statements: our models are empirical models of operational activities in different domains. Some of these models are highly complex: they are multi-dimensional, contain tens of thousands of input variables, hundreds of thousands of data points and tens of thousands of equations, and produce tens of thousands of results (not as data cubes). We use specialised modelling tools and languages. This is anything but a couple of worksheets in Excel. These models have been developed and revised over a number of years and continue to evolve to take account of changes in the business landscape. As an aside, my PhD thesis was in using machine learning to generate data validation models, so aside from the day job I have also studied and researched the topic.

I am not claiming that the models I work on are of the same scale and complexity as the global circulation models (GCMs) that attempt to model the climate. However, my experience with models is closer to that than the experience of someone who deals with smaller datasets. Our models depend on a high degree of equation-driven numerical computation rather than simple summarising of data into cubes (e.g. for OLAP-type analysis). Furthermore, as modellers we work with domain experts to design and test our models. In the same way that the GCMs are produced by programmers and scientists working together (sometimes the programmers are climate scientists, sometimes they are just scientific programmers working with climate scientists), so our models are produced by mixed teams, with some of the developers having domain expertise, but some just having generic modelling expertise.

Having, I hope, established my credentials, I can turn to the main point. How do we test our models? What are the psychological issues that arise from this validation? What does this tell us about climate change and other fields that depend on models to such a high degree?

Obviously we don’t have to publish peer-reviewed journal articles, but the outputs of our models are used in business analysis and decision making. There’s money riding directly on our results. This means that our models are subject to very high levels of scrutiny. As part of the development we use test datasets during coding and testing. We are aware of the dangers of relying too much on these test datasets because there’s always the danger of ‘over-fitting’: the model ends up being perfect for the test data because it’s tweaked and refined to give ideal outputs. So, at certain points we switch from test datasets to other datasets for validation. But this isn’t the level of scrutiny that really tells us how good the model is. For that we switch tack completely.
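The over-fitting danger can be illustrated with a minimal Python sketch. Everything in it is made up for illustration (the data, the two toy models and their names); it is not how any real commercial or climate model works. One model is a plain straight-line fit, the other simply memorises the data it was developed on. The memoriser looks perfect on the development data, and only a separate validation dataset reveals its real error.

```python
import random


def make_data(n, rng):
    """Hypothetical data: y = 2x + 1 plus Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + 1 + rng.gauss(0, 1)) for x in xs]


def fit_line(data):
    """Ordinary least-squares straight line through the data."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memoriser(data):
    """Over-fitted 'model': return the y of the nearest known x."""
    return lambda x: min(data, key=lambda point: abs(point[0] - x))[1]


def mse(model, data):
    """Mean squared error of a model over a dataset."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)            # fixed seed so the run is repeatable
development = make_data(30, rng)  # data the models are tuned on
validation = make_data(30, rng)   # data the models have never seen

models = {
    "line": fit_line(development),
    "memoriser": fit_memoriser(development),
}
results = {
    name: {
        "development": mse(model, development),
        "validation": mse(model, validation),
    }
    for name, model in models.items()
}

for name, errors in results.items():
    print(name, errors)
```

The memoriser scores a development error of exactly zero because it just replays what it has already seen, which is why a good score on development data says little on its own.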

The best form of scrutiny is to hand over the model to panels of domain experts. At this point the model is no longer in the hands of those of us who have spent months working on it. It passes to different teams of people, those who will use the models for real with real data to drive real analysis. They have no emotional attachment to the fruits of our labours. They have their own data sets they can use, they have their own theories about the given domain and often will have Excel workbooks with cut-down models that they’ve used in the past. They can be ruthless. Although we may have asked for their advice or had detailed discussions with some of them during development, they will not have been involved in the design and construction of the model. This is actually a virtue, though it’s painful for us as developers.

They will subject our models to all kinds of tests and scenarios that we will not have thought of. For example, in a complex model it’s easy to propagate ‘magic’ numbers. These are hard-coded factors that scale values, aid in dimensional analysis and make sure that different types of data play nice together. It’s always a temptation to put these in when coding. It just makes the code work and gives you the right sort of results. Outside scrutiny can help stamp this out. People will pore over results and disentangle layers of equations, and if there’s a magic number sitting there then there had better be a valid reason for it.
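To make the magic number problem concrete, here is a small Python sketch. The function names, the 1.175 factor and the meaning given to it are all invented for this example. The first version buries an unexplained factor in an equation; the second gives it a name and a stated justification that a reviewer can challenge.

```python
import math


def monthly_cost_opaque(hourly_rate, hours_worked):
    """Version with a magic number: why 1.175? Nobody can tell."""
    return hourly_rate * hours_worked * 1.175


# Hypothetical justification: a 17.5% overhead loading on direct
# labour cost. Naming it makes the assumption visible and reviewable.
OVERHEAD_LOADING = 0.175


def monthly_cost_explicit(hourly_rate, hours_worked):
    """Same calculation with the factor named and explained."""
    direct_cost = hourly_rate * hours_worked
    overhead = direct_cost * OVERHEAD_LOADING
    return direct_cost + overhead


opaque = monthly_cost_opaque(40.0, 160)
explicit = monthly_cost_explicit(40.0, 160)
print(opaque, explicit)
```

Both versions give the same answer. The difference is that the second one tells a reviewer what the factor is supposed to mean, so they can ask whether 17.5% is actually right.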

As developers we do get defensive about our models; we become attached to them. Despite their mathematical nature, the models are implemented as code, computer programs in other words. And like software developers everywhere we can get caught up by the cleverness of our constructs, the elegance of our code or the sheer technical brilliance of our algorithms. But the people who will test our models for real don’t care about any of this. The results of our models drive commercial decisions, and there are financial consequences of getting it wrong. This means they have incentives to be as objective as possible in testing our models. If the models are deficient in some way we have to fix them and send them back out for review again. Nothing else will do.

By keeping the development and the testing separate like this we ensure that we have incentives that help the process. It’s in our interest to pass the external tests imposed by independent groups of experts, and we can only do this by making sure we design properly in the first place and that we code to a high standard. It’s in the external testers’ interest to be objective because if they aren’t then a deficient model output may impact a decision that has financial (or possibly legal) consequences.

Similarly, it’s not just the models that are developed and tested separately. Although we use test datasets during model build and developer testing, the real data comes from our users. Again, we have a separation of concerns. The input data is not collected by the same teams as those who build the models, and sometimes not by those who use the model outputs. We do not become attached to the datasets, though the people who have sweated to assemble, cleanse and validate the data will have an emotional attachment to it.

Flip attention now to the incestuous world of climate research. We have a situation where the same people are building the model, testing the model and using the model. We have the same teams collecting the data and building the models. There are billions of dollars’ worth of decisions being made off the back of these model outputs, but no proper incentives to be brutally objective. With the best will in the world it’s hard to be objective about your own work, but objectivity is precisely what is needed. In a commercial environment we’ve evolved a system that separates concerns and ensures a better degree of peer review than that which exists in climate research, where the same people do everything. We do it differently, they do it differently in clinical trials and they need to do it differently in climate science.
