ClimateGate Reaction Part 2: The Computer Models
(Welcome Instapundit Readers!)
NOTE: If any of the following rant strikes you as placing unreasonable limits on climate modelers, then I would like to refer you to The Point. The sigma number of the climate models better take several fingers to count if we must make radical international regulatory changes to our economic activity.
This much we have learned from the Climategate scandal: the computer models used to justify the policy proposals are for crap. Leaving the validity of the underlying science aside, and focusing only on the Climate Research Unit’s computer models, we’ve learned:
- The starting point, i.e. the raw data, is no longer available for comparison. So we can’t try to “re-create” the analysis that led to the currently used climate models and the catastrophic trends contained therein.
- All the inputs are “derived” inputs based on various reasoning: some data sets need to be expunged because the scientists view them as anomalies that do not fit their thesis (I’ll let you, dear reader, judge whether that is innocent, sound science, or something more self-serving), and other inputs were adjusted to fit some form of normalization requirements. The bottom line is that the historical computer models are not made of raw data, but rather manipulated data (and I really am using that term in a value-neutral manner).
- The documentation for code is extremely poor and untraceable in some instances
- Notations of data manipulation are actually documented in some instances but not traced to any reasoning, as far as can be discerned.
- These models have yet to correctly predict any weather events or climate trends in the years since they came into regular use (say, starting with the 1995 IPCC report)
- The model code and design history (their source code, the design documentation, functional and technical specifications, etc) that are used as the basis for expensive policy proposals and regulatory regimes were never made available for public third party audits.
- There is no evidence that the scientific grant givers performed any technical audit of the code quality, system stability, or system accuracy.
Excuse me while I hop on my high horse.
I work in software. I have eleven years of experience in software quality assurance. I have worked for the two largest software companies on earth. I have been a tester, lead tester and/or test manager on products that performed word processing, enterprise level document management and online collaboration, and enterprise resource planning (ERP), specifically manufacturing, accounting and logistics software. I have worked in software development outfits of varying size, from small agile groups which were a bit lacking on the organizational side of things, to large groups that used somewhat rigid waterfall methodologies which were high on discipline and detail and low on adaptability. I’ve worked with numerous off shore resources as well as decentralized teams of remote full time resources.
Moreover, I’ve worked in software development that was required to meet certain government and industry standards from ISO regulations to FDA and GAMP requirements, including working directly with FDA audit consultants. My experience teaches me that:
- Software development has to be managed and developed by software pros as opposed to experts in other fields who can do some coding when called upon. The experts define the functionality, business need and underlying logic, but they do not, or should not, do the coding. Otherwise, while you may see innovative solutions and ideas, the execution will typically be quite amateurish and have design flaws up and down the line.
- Software development that lacks at least some sort of plan > design > document > develop > test > support life cycle is doomed to have significant bugs and ill thought out data models
- Some sort of document trail on how the code does what it does is vital to long term support.
- The more variables you throw into a system, the higher the quality threshold will be, the greater the risk of code degradation, and the more vital huge regression cycles become. It would be difficult to overstate the enormous variable load on any climate model.
- Open source software certainly has weaknesses but also some enormous strengths. The weaknesses are primarily around how open source software is often created by developers for developers. Their “customers” and “partners” are other developers who also have the ability to improve the software. Open source development can be rough, but it also can be the most dynamic. It is especially useful the more niche or small the target audience is. It strikes me as obvious that climate computer modeling should HAVE TO follow an open source model
The CRU source code does not appear to have been open source in any way, and was apparently coded (in FORTRAN!!!) by scientists whose primary expertise is in climate science and not software development. They are a group of individuals who tout their expertise at every turn, but their models lack any evidence of any software development methodology above common hackery. And not to put too fine a point on it, but these models are the basis for the theory that a CO2 caused catastrophe is all but a foregone conclusion without radical international regulatory changes to our economic activity.
Lastly, consider the standards that developers who sell to or implement in the pharmaceutical industry (the industry with one of the highest regulatory requirements for data integrity) must meet:
- At a minimum, pharma companies and their software vendors must be able to demonstrate a secure and traceable data flow
- They must demonstrate source code control
- They must demonstrate change management with a document and source code audit trail from plan/design to implementation, complete with version control and user history
- Typically, they must have some sort of electronic signature control mechanism or a reliable paper solution that traces system changes, and is legally binding
- All work processes must be fully documented with regards to system access, system usage, and any change to the system itself
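As a rough illustration of what a “traceable data flow” with an audit trail can look like in code, here is a minimal Python sketch of a tamper-evident change log. It is only an illustration under stated assumptions: the function names `record_change` and `verify_log`, the entry fields, and the hash-chain layout are hypothetical and are not taken from any regulated system or from the CRU code. Each entry stores a SHA-256 digest of the data it describes plus the hash of the previous entry, so an after-the-fact edit breaks the chain.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _hash_entry(entry_body):
    """Hash an entry's fields in a stable (sorted-key) JSON encoding."""
    payload = json.dumps(entry_body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def record_change(log, user, description, data_bytes):
    """Append one change record to an in-memory log (hypothetical format)."""
    body = {
        "user": user,
        "description": description,
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS_HASH,
    }
    entry = dict(body, entry_hash=_hash_entry(body))
    log.append(entry)
    return entry


def verify_log(log):
    """Return True only if every entry is intact and chained to the one before it."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["prev_hash"] != prev_hash:
            return False
        if _hash_entry(body) != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True
```

A real system would add signatures, timestamps and durable storage; the point of the sketch is only that “who changed what, and in what order” can be checked mechanically.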
We put very rigid controls on pharmaceutical companies and their software vendors to create systems that are secure, reliable and fully documented. This is seen as a societal good, so that we don’t have our medications tampered with through either incompetence or malicious intent. To put it kindly, there is no evidence that any remotely similar requirements are enforced on the programmers of climate models that A) were likely paid for by taxpayer funded grants and B) are used as a basis for the theory that a CO2 caused catastrophe is all but a foregone conclusion without radical international regulatory changes to our economic activity.
If I were an opinion journalist or a busybody Senator, I might think some minimum requirements would be called for in climate model development BEFORE we go down the path of radical international regulatory changes to our economic activity:
- All research and data obtained and developed with a taxpayer funded grant should be made publicly available if it will be used as a basis for public policy
- Any software used or created to model the scientific evidence for the public policy should be required to meet the bar set for the pharmaceutical industry and other industries of equal import
- Any predictive applications should prove some level of accuracy over a pre-defined time horizon (in years) before being treated as a basis for public policy. The predictive applications must be audited for accuracy under a “do nothing” scenario first to show their understanding of the situation.
- They should at least be able to accurately predict the recorded past.
- Predictive applications would then need to be audited annually, post policy implementation to show that the predicted benefits of the policy were accurate.
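The “predict the recorded past” test in the list above is usually called a hindcast, and the audit itself can be simple. Below is a minimal Python sketch, assuming the model's predictions, the recorded observations, and a naive “do nothing” baseline are already available as equal-length lists of numbers. The function names and the choice of a mean-squared-error skill score are illustrative assumptions, not a description of how any existing climate model is evaluated.

```python
def mean_squared_error(predicted, observed):
    """Average squared difference between two equal-length sequences."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def hindcast_skill(model_predictions, observations, baseline_predictions):
    """Skill score relative to a naive baseline.

    1.0 means a perfect hindcast, 0.0 means no better than the baseline,
    and a negative value means the model did worse than the baseline.
    """
    mse_model = mean_squared_error(model_predictions, observations)
    mse_baseline = mean_squared_error(baseline_predictions, observations)
    if mse_baseline == 0:
        raise ValueError("baseline is already perfect; skill is undefined")
    return 1.0 - mse_model / mse_baseline


def passes_audit(model_predictions, observations, baseline_predictions, threshold):
    """True if the model beats the baseline by at least the agreed threshold."""
    return hindcast_skill(model_predictions, observations, baseline_predictions) >= threshold
```

The threshold and the time horizon would have to be fixed in advance, before the comparison is run, for the audit to mean anything.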
Again, if this doesn’t seem like a reasonable set of standards, then that’s sort of The Point. Either AGW is a nice theory or an easily provable fact. Only the latter is worth discussing (all together now!) radical international regulatory changes to our economic activity. Of course, this assumes the policy makers who are in line to gain enormous power from the policy proposals actually care about the accuracy of the underlying science. My FDA Validated Magic 8-ball program says “don’t count on it, bud”.
I’ve written software for many years at a company that supplied fault-tolerant hardware and software for banks, ATMs, stock exchanges, 911 call centers and the like. I’d like to take a minute to applaud and concur with this post.
Thank you, dude
If a meteorologist can’t even give an accurate weather report 50% of the time, how the hell can a climatologist predict the end of the world as we know it?
If you follow the link, I compared Al Gore to Bernie Madoff.
I am also a software engineer with decades of experience. I stayed up past midnight this past week reviewing the files in only the cru-code directory. I found execution errors in the code and interesting data adjustments. I found little to mostly no evidence of software engineering practices, software testing/verification or any SQA procedures. In short, the code may or may not work – some of the time – but we have no way of knowing and they don’t either.
Per the comments in the code, they lost their raw temperature history data file, they lost their cloud cover database, they lost code at times, and they apparently lost the only staff member who understood the original code. They then spent 3 years tweaking and fudging the code and data to get its output to line up with what they thought it was supposed to be.
The CRU output (which can be described as a fiction) was coordinated with similarly tortured data at NASA GISS. These two “adjusted” models were then used to calibrate the RSS and UAH satellite estimates. The result is that everything is a shambles.
It is frightening that the climate science community was cavalier and unprofessional and did not take their own work seriously. When they decide to take their “science” (loosely defined in this case) seriously, we can revisit if they have something to offer.
I am a PM who leads the development teams for GAAP & SOX compliant tools for the semiconductor industry. Really the type of industry does not matter but the accounting and reporting practices for financial reporting to the Gov’t. does. The regime is document, document, document and version control EVERYTHING. I used to have the team do build runs on Thursday nights so we could review the results and bug reports on Fridays. Oh, and the bug report database was also controlled and linked to all the rest.
The climate scientists need to be schooled in how things are done in the real world.
@everyone: thanks for stopping by!
@matt, love the comparison to Bernie Madoff. That’s a meme that needs to spread.
@RagnarD: I can appreciate that the tools necessary for a fully linked system are expensive and difficult to implement, but I’d bet the grant money at these places (typically universities) would be more than enough to afford them. And even then, some free tools can go a long way: CVS+Bugzilla and maybe a wiki can get you most of the way there.
The bottom line: even if the underlying science was sound, there is no evidence that the climate models could predict yesterday’s weather, never mind the weather in 2100. Some software discipline must be required and audited, with all data made publicly available. This is not being done now, and it’s asinine to go touting that the science is settled on 100 year predictions.
Indeed. I’ve also worked in drug development (while an undergrad) and even what these “scientists” have admitted to is amazing. If a drug company did *half* of it — what they have admitted to! — they would probably all be in jail.
Late to the discussion, but perhaps as a research computing support geek at a large univ, I can provide some context.
Research programming is rarely up to the standards of commercial code, because /it isn’t commercial code/. It’s research, often written by someone interested in the science but having little or no training in coding or computers (like the poor hapless ‘Harry’ of the infamous READMEs; at least he left READMEs and some in-line comments). Therefore, they’re left to wander around in code-land, probably having to take over a preceding grad student’s work (left in a similar shambles), with often only a single ‘Intro to Programming’ course (usually not even that).
This is not to excuse the Climategate mess that has cropped up (and certainly the loss of original raw data is inexcusable), but to provide a context of what kind of coding is generally done in science – it’s done by scientists, not professional coders.
There are some exceptional counter-examples – I would point you to the source code for NCO (nco.sf.net) which is a utility for processing netCDF files, usually in the pursuit of climate data reduction. It’s possible that more bytes of data have been manipulated by NCO than by any other utility in the world. The author is an extraordinarily meticulous and brilliant climatologist and programmer named Charlie Zender (also my colleague and neighbor). I haven’t seen many commercial codes, but I think his code tree would stand well beside any of them for clarity, maintainability, rigorous test regressions, logging, and debugging.
But I have to say that he’s the exception – most other codes that I’ve helped people with look more like the (other) Harry’s.
There’s a great course that anyone thinking of touching a computer in the pursuit of science should review: the Software Carpentry site. Unfortunately, most scientists think the height of computational analysis is Excel.
hjm