TechEcho
If you're going to do good science, release the computer code too

55 points by rglovejoy over 15 years ago

10 comments

lutorm over 15 years ago
I am not surprised that people find errors in code written by researchers and grad students who have little training in software development and, perhaps more importantly, are doing so in a culture which values them writing papers, not good code. (See for example <a href="http://lanl.arxiv.org/abs/0903.3971" rel="nofollow">http://lanl.arxiv.org/abs/0903.3971</a> for a discussion of this situation in astronomy/astrophysics.)<p>I find it much more surprising that professionally developed software used for scientific research is also error-ridden. And while it might be difficult to convince individual researchers to release their code, that's nothing compared to the difficulties of convincing Wolfram Research to release the source code to Mathematica...<p>But I do think that research is somewhat undeservedly singled out for this, just <i>because</i> some academic software is open for inspection. Like the article mentions, it certainly seems like financial software has caused a lot of badness. How about the unit-conversion error in ground software that doomed NASA's Mars Climate Orbiter? Who knows how many innocent lives have been lost due to software errors in military systems like UAVs and missiles. Maybe none, but we can't know because it's all secret. Shouldn't they be required to show their code, too?
jackfoxy over 15 years ago
If science is to remain science, and not devolve into mysticism, data and computer models must be available to other researchers in order to repeat experiments and provide knowledgeable criticism. Calling anything "settled science" when its data and models are not openly available to all researchers is not scientific.
regularfry over 15 years ago
A sound idea.<p>While I can imagine any number of reasons people might, after the fact, not wish to release code, if it were developed from the start with the intention of releasing it, I think we'd all benefit.<p>Inevitably, doing so would increase the cost of the research, but I believe it would be worth it.
Lewisham over 15 years ago
It's surprising how few Computer Science papers release code as well. I don't care if it's platform-specific and it requires ridiculous numbers of obscure libraries and only operates on proprietary data that you can't release. I don't care, I want the code to be open-source. I want to see what you did, and whether I believe that it does what you claim it does in the paper.<p>Where possible, I open-source everything I try to get published. There's only one project I haven't (a scraper for the WoW Armory), but even then I released the library I built for it.<p>There's no excuse not to do so. Unless you have something to hide.
maurycy over 15 years ago
Finally. Finally a discussion about this.
merraksh over 15 years ago
There are a few examples of how this can be done. One of them is Mathematical Programming Computation (MPC), a journal where submitted articles must be accompanied by the source code used to produce the results. The article is peer-reviewed, and the submitted code is tested by "technical editors" to verify that the results are correct. See <a href="http://mpc.zib.de" rel="nofollow">http://mpc.zib.de</a>
moron4hire over 15 years ago
Opening the source for research software is absolutely vital to the concept of reproducibility. However, the limited programming training most scientists receive is a major issue. A lot of novice programmers tend to fall into a trap of "it runs without error, it must be right." Even expert programmers struggle with verifying that their results are correct; in the general case, program verification is mathematically undecidable. So reproducing the results of software-based research is a daunting task to start with.<p>This is only compounded by the fact that reading source code sucks. Source code is the end result of multiple processes that occur in feedback loops. With just the source code, you never see <i>how</i> the code got that way. It's like showing someone a maze with the start and end points marked but the middle of the map blocked out.<p>Different programmers' conceptions of what constitutes good code vary widely. One man's golden code is another's garbage. Just because the source code is available doesn't mean anyone is going to understand it or be able to work with it effectively.<p>Compounding all this is the fact that few people are going to <i>want</i> to read the source code. Analyzing source code is dull work, maybe the worst job a programmer can take while still doing programming. Most programmers are far happier to discard old code and start from scratch. This is often a bad idea and doesn't lead to a better product, but at least you don't want to kill yourself while you're doing it.<p>When it comes to reproducing algorithmic results, I would prefer having a description of the algorithm, a set of inputs, and a set of outputs. I would then write the actual code myself and see if I get the same results. This, I think, is much closer to the concept of reproducing lab results in the physical sciences. You wouldn't use the exact same particle accelerator if you were verifying the results from a paper on nuclear physics.
I'm afraid having access to the raw source code will be used as a crutch, with logic errors slipping through when portions of code are reused without much thought about the consequences. Take, for instance, the subtle differences in implementations of the modulo operator across programming languages: <a href="http://en.wikipedia.org/wiki/Modulo_operator#Common_pitfalls" rel="nofollow">http://en.wikipedia.org/wiki/Modulo_operator#Common_pitfalls</a><p>It would be great if scientific software were open. Unfortunately, it won't matter a lick if it is.
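The modulo pitfall mentioned above can be made concrete with a small sketch. Python's <i>%</i> uses floored division (the result takes the sign of the divisor), while C-family languages use truncated division (the result takes the sign of the dividend); Python's <i>math.fmod</i> follows the C convention, so both behaviors can be shown side by side. The values here are illustrative:

```python
# Floored vs. truncated modulo: the same expression gives different
# answers across languages when the operands have mixed signs.
import math

a, n = -7, 3

floored = a % n                    # Python semantics: sign of the divisor
truncated = int(math.fmod(a, n))   # C semantics: sign of the dividend

print(floored)    # 2
print(truncated)  # -1
```

Code ported between languages that silently relies on one convention (say, indexing a lookup table with a possibly negative offset) will run without error in both, yet produce different results: exactly the kind of bug that copy-pasting released code can propagate.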
jgrahamc over 15 years ago
Yes, tell me about it: <a href="http://www.jgc.org/blog/2010/02/something-odd-in-crutem3-station-errors.html" rel="nofollow">http://www.jgc.org/blog/2010/02/something-odd-in-crutem3-sta...</a>
eshi over 15 years ago
I might be alone in this, but this seems like a symptom of the problems of IP laws.
albertcardona over 15 years ago
The title captures the reason why we created Fiji (<a href="http://pacific.mpi-cbg.de" rel="nofollow">http://pacific.mpi-cbg.de</a>): so that instead of releasing a Matlab script without documentation of its many parameters and the exact Matlab version used, as a printout (or nowadays, a downloadable .m file as supplementary material), we could offer a readily downloadable, version-controlled and fully working program.<p>A colleague of mine made similar remarks recently:<p>"... if you can’t see the code of a piece of ... software, then you cannot say what the software really does, and this is not scientific."