How Bad Software Leads to Bad Science (vice.com)
77 points by danso on Dec 26, 2014 | hide | past | favorite | 27 comments


I'm happy that people are discussing this, but I really don't like the way the article uncritically repeats the message of the Software Sustainability Institute.

The study by the SSI only shows that people are writing software, and that many of these people have no training in software engineering. The SSI claims this training is necessary to prevent things like the journal retractions discussed in the article. But their survey doesn't go as far as providing evidence of a link between formal training in software engineering and errors in scientific journals.

If this article were science, I would call it bad science, because the data presented is not sufficient evidence for the hypothesis.


While you are right regarding hard data and evidence, I think the core idea is

    Problems arise when that software is designed by researchers who really don’t know what they’re doing when it comes to coding. A single mistake in the code can lead to a result that appears innocuous enough, but is actually incorrect. 
and I am actually sure that is, unfortunately, very much a real problem. The reasons I believe so are anecdotal, but so overwhelming I cannot ignore them. I personally have already prevented a couple of false results from making it into a paper. And that is just me, in just one research group, in just one university. And it mostly happened by accident (just staring at a guy's screen during a conversation and noticing errors all over the place, redoing an analysis for more subjects and figuring out it was completely wrong, ...).

The mistakes I've seen are close to terrifying. Scripts ignoring all input and using a generated fixed dataset instead. Scripts always yielding 'good' results even if you give them white noise as input. Basically, all typical programming errors and antipatterns ever invented come together in huge god scripts which produce the results, while in the background Matlab is spitting out one warning after another, console windows are screaming errors, and all of that is being ignored by the researcher who thinks he/she has just mastered the skill of programming. Or knows he/she hasn't, but just doesn't care. As long as there are good results, you know.
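A cheap defense against failure modes like those is a negative control: feed the pipeline pure white noise and check that it finds nothing. A minimal sketch in Python (the two detector functions are hypothetical stand-ins, not anyone's actual analysis):

```python
import numpy as np

def buggy_detect(signal):
    # Typical bug: the "effect size" is the mean absolute value,
    # which is large for any signal, including pure noise.
    return np.abs(signal).mean() > 0.1

def careful_detect(signal, n_sigma=5.0):
    # Flags an effect only when the mean is many standard errors
    # away from zero, so white noise almost never triggers it.
    sem = signal.std() / np.sqrt(len(signal))
    return abs(signal.mean()) > n_sigma * sem

rng = np.random.default_rng(42)
noise = rng.standard_normal(10_000)  # no effect present, by construction

print(bool(buggy_detect(noise)))    # True: "finds" an effect in noise
print(bool(careful_detect(noise)))  # False: correctly finds nothing
```

If a pipeline reports a result on data it was never supposed to find anything in, you've caught the bug before it reaches a paper.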

Oh yes, I'm getting a bit sentimental now, and it's not like that everywhere, and not every researcher writes code like that. But if even I, with over 10 years of programming experience and all my automated builds and tests and whatnot, cannot reliably produce code without a single mistake, how on earth are researchers, for whom the programming part is often just a byproduct, a necessary evil, going to do that?


I'm totally on board with the idea that research software is a problem. My question is whether formal training (which the Software Sustainability Institute, who created this study, seems to advocate) is the best or most important solution.

Two other possibilities:

- Research code is uniquely difficult in ways that set it apart from other kinds of software, because the logic is often inherently complicated: you are doing a complex calculation.

- Researchers are stuck using old-fashioned languages and tools because they don't know better. Note that this doesn't necessarily mean they need training, only being pointed to the right places.


I work with researchers. While I can't say whether they produced reliable results or not, they would regularly grind the network to a halt by producing millions of small files. I gave talks on how to use a database, and why it would stop these problems, but hardly any of them were interested.
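To make that concrete, the pitch was roughly this small. An illustrative sketch, using an in-memory SQLite database where a real setup would use a single file on disk:

```python
import sqlite3

# One database instead of millions of tiny result files.
conn = sqlite3.connect(":memory:")  # a real setup would use "results.db"
conn.execute("CREATE TABLE results (run_id TEXT, value REAL)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [(f"run{i}", i * 0.5) for i in range(1000)],
)
conn.commit()

# One query replaces walking a directory of a thousand files.
(count,) = conn.execute("SELECT COUNT(*) FROM results").fetchone()
print(count)  # 1000
```

A thousand inserts into one file is one filesystem entry; a thousand files is a thousand metadata operations, and that difference is what was hammering the network.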


Hard to tell if training by the SSI or the like is the solution, but any kind of training would probably do far more good than harm. Sure, some research code is quite difficult and not at all like your typical web/business app, but that doesn't mean universal programming principles don't apply or wouldn't make for better code. I'd also argue that pointing people to the right places, in a way that sticks, i.e. with a proper explanation of why those are the right places, is training.


If a scientist is good enough to design a proper experiment, he is good enough to design a good program. The skillset is the same.


The skillset is similar, maybe, but putting it into practice is something else. Your claim skips the part where the scientist used those skills to become good enough to design a good program. As long as that learning process hasn't happened, and it takes time, he/she will not be good enough. I know excellent scientists who are known and praised in their field and come up with inventive designs. Yet their code to do something as simple as plotting a histogram is a mess.


This is not remotely true, to the point that I don't even understand how you could assert it. What skills taught to physicists or chemists or biologists are even related to notions like modularity and data abstraction?

Hell, I've seen people come out of computer science PhD programs with poor software engineering skills and habits. If CS programs don't reliably teach these things, how can you claim researchers in other fields automatically have them?


"Good enough" and "actually carries through" are different.

Especially in programming, where you want to actively avoid reinventing the wheel as much as is possible.


This is really just bad science - software is just allowing an aspect of poor science to be revealed.


In this particular field, software is just one of many means to get to results. Is every scientist who happens not to be particularly good at one of those means a bad scientist?


I don't think so. Imagine you are a good scientist but a poor coder. How would you verify that the results of your research were valid? You could ask another team to review your software, or you could ask another team to recreate your software, or you could hire a professional software development team to do the job of coding for you.


Part of being a good scientist is knowing the areas you are an expert in and asking for help in those areas that you are not. If you don't know how to write code then you need to collaborate with someone who does.

In my experience this sort of bad outcome occurs when a senior supervisor gives a student or junior postdoc a project that involves programming but is not able to provide adequate supervision. Instead of finding a collaborator who can help, they just let the junior scientist thrash around on their own.


As a general rule, we shouldn't take seriously any computer models for which the source code is not available.

There is no valid reason why this should not already be a universal practice. If you are looking at any paper which cites the output of a custom-built computer program but omits the source code, it's probably safe to assume that the program is a mess and therefore the conclusions are not (yet) robust.


I mostly agree, though if I had to choose between the model on which the code was based or the code itself, I'd prefer the model.

While it would be great and preferable to have both, isn't the code just a language/stack/programmer(s) specific interpretation of the underlying model?


> isn't the code just a language/stack/programmer(s) specific interpretation of the underlying model?

There's the catch -- it should be, but there's no way to verify it's correct without comparing the source code against the model. (Well, I suppose if you had access to the raw data and the time to rewrite the software yourself from scratch, you could do that and see if you got the same results. But who has time for that?)


I don't know who has time for that, but it's far more reliable than auditing source code. Two people using two different languages are not likely to code up the same bug, but overlooking a bug in code someone else wrote is easy. Besides, coding things up yourself is not much slower than a full audit: understanding other people's code is hard. There's a reason that many developers have an urge to throw out "legacy" code and redo it from scratch.
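As a toy illustration of why independent reimplementation catches what code review misses: two routines written from the same definition, in different styles, are unlikely to share a bug, so agreement on the same data is evidence both are right. A hypothetical example with the sample variance:

```python
import math

# Two independently written sample-variance routines.
def var_two_pass(xs):
    # Straightforward two-pass formula: mean first, then squared deviations.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def var_textbook(xs):
    # Algebraically equivalent one-pass "textbook" formula.
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
assert math.isclose(var_two_pass(data), var_textbook(data))
print(var_two_pass(data))  # 32/7, about 4.571
```

If the two disagree, at least one has a bug, and you found it without ever reading the other person's code.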


Sure, that's true. But the purpose of asking for the code is because if the program works there is guaranteed to be enough information to reverse-engineer the corresponding model if necessary. With just a specification of a model there may be built-in assumptions involved which only become apparent in the implementation. Standard language and protocols are one way to avoid this problem, but these are usually not as exacting as a compiler.

I'd definitely prefer to have both a model and an implementation available, but if I had to pick just one it would be the code.


I am a researcher with zero software development experience who is trying to refactor a piece of image-analysis software that a former postdoc left behind. Many individuals in our lab (including myself until recently) write software for analysis and simulation without any understanding of basic coding best practices such as version control, unit testing, documentation, formatting for readability, or even commenting. This comes back to bite our lab in the ass frequently.

This happens because many of our people take, for example, the Codecademy course on Python and then assume they "know how to code" when they finish; if they run into problems, they can simply consult Stack Overflow for anything they don't know. They therefore never learn anything about formal software development best practices unless they put in extra effort, which they generally don't.

I have learned a lot trying to refactor over the last few months, and I have tried to pass on some of this knowledge to my labmates. We have actually come a long way over the last few months. We now employ version control, use style guides, and more robustly document programs. I am currently trying to sell my labmates on the merits of unit testing but we'll see how that goes :)
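For anyone making a similar pitch: the unit-testing sell gets easier with a concrete case. For a helper like the hypothetical `normalize` below, the tests take minutes to write and pin down exactly the edge cases (like a constant image) that analysis code tends to get wrong:

```python
import numpy as np

def normalize(img):
    """Scale pixel values to the range [0, 1]."""
    img = img.astype(float)
    lo, hi = img.min(), img.max()
    if hi == lo:                      # constant image: avoid division by zero
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)

def test_normalize_range():
    out = normalize(np.array([[0, 50], [100, 200]]))
    assert out.min() == 0.0 and out.max() == 1.0

def test_normalize_constant_image():
    # A uniform image should come back as all zeros, not NaNs.
    assert normalize(np.full((4, 4), 7)).max() == 0.0

test_normalize_range()
test_normalize_constant_image()
print("all tests passed")
```

Once a few of these exist, every later refactor can be checked in seconds, which is the whole argument in one command.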


Things like this make me happy. Not all labs have the proper atmosphere for getting such improvements pushed through, so if you try and start succeeding that's marvellous.


If you're interested in improving this situation, the Software Carpentry Foundation trains scientists in modern programming practices, and also teaches scientists and programmers how to teach programming to others: http://software-carpentry.org/


It's not only a problem of bad software; it's a problem of lack of testing and lack of proper development management (two causes of bad software, indeed), but mainly a problem of poor or missing documentation. Not code comments, but actual documentation that allows other people to use the program. Its absence leads to improper usage and to errors that cannot be solved and are not reported or noticed by whoever is running the software.

Not to mention the publish-and-forget.


Absolutely. But I think that testing is still the bigger problem; a lot of scientific software is used in conjunction with other scientific software, and the interfaces themselves are also usually undefined and untested. So something as simple as IT upgrading a package to fix a bug or add a feature might break a lab's workflow in ways that are difficult to detect. It's easy to mess this kind of software up by creating code that produces incorrect results despite appearing OK by the field's dominant scoring criteria.

Also publish and forget... that's just the natural fallout of the way research is conducted and funded. I've written scientific software that I don't feel much personal obligation to maintain or update. It's challenging work which has greatly reduced payoff (to the powers that be) after it has been tallied on the scoreboard. Also, most positions in scientific labs are on a clock; not exactly an environment that fosters long-term stewardship.

I'd love to see science funding agencies spend a couple million to fund reasonably sized groups of 5-10 programmers/scientists to produce code over ~3 year funding cycles. Just to make sure that signals don't get crossed, these groups should be told not to spend a single minute of their time on publishing to journals. Dissemination is limited to short talks and training workshops. Continued funding would also focus more heavily on user surveys than third-party citations.


I agree with everything you mentioned. I have tried, with zero success, to establish some testing procedures in the labs I have been in, where even backing up data was a myth. With the requirement to compile code on four different OSes, testing and some automated procedure were key, but not one person came on board.

Now that I've moved to a diagnostic setting and work by myself, testing and documentation have been my main goals. Every single piece of third-party software has to be exhaustively validated in every aspect before being put into production.

"I'd love to see science funding agencies spend a couple million to fund reasonably sized groups of 5-10 programmers/scientists to produce code over ~3 year funding cycles. Just to make sure that signals don't get crossed, these groups should be told not to spend a single minute of their time on publishing to journals. Dissemination is limited to short talks and training workshops. Continued funding would also focus more heavily on user surveys than third-party citations."

On the software/funding side, your ideas are spot on. I think it is about time we had professional and steady funding to keep software working and available. I haven't published app notes or the like; most of the software I worked on was never published. But I see important applications where the code is not open source, and there's no testing and no updates for years. We need to move from on-the-clock science to a model that allows scientific software to be properly maintained.

Maybe treating scientific software like we treat "regular IT" is the key.


That's only one part of the problem. Bad data leads to bad science just as frequently, if not more so; a limited understanding of the effect of sample size on the quality of statistical output is another part, as is not knowing what the software is actually doing internally.

My main exposure to this has been observing a biologist draw absolutely unsupportable conclusions from a bunch of very low-quality data, using some software as if it were a magic incantation. Not that that bothered supervisors or anybody else; as long as the funding kept coming, everybody was absolutely happy.

The software wasn't all that bad, even though the user interface was crappy. The whole pipeline did not stand up to scrutiny, though, and of course my sample of this sort of thing is too small to draw conclusions from, but I really hope this is not how it is normally done.

Personally I think that if you aren't using software to do something faster that you could do yourself manually then you probably shouldn't be using that software.


The alternative (at least in the extreme) is bad as well: scientists who can't program anything and rely on pre-canned point-and-click analyses and programs. There isn't a button for X, and so I can't do X. I pressed the Y button, and so of course my Y analysis is valid. Etc.

Edit: you think this is hyperbole, but I've heard the above two statements more often than you might guess, coming out of the mouths of prominent scientists.


I don't think it's something limited to scientists. Many people without software experience view software as magic, and assume that when it says the X button does X, it does so without error and without effort.



