Secret Computer Code Threatens Science

105 points by emcl about 13 years ago

14 comments

bravura about 13 years ago

Back in my academic days, I became a proponent of open-notebook science.

Far too often, researchers never release code because it's never "polished" enough.

So I began publishing my code on GitHub from the moment I started a project, e.g. https://github.com/turian/neural-language-model

However, some of my more conservative colleagues were averse to this approach. I constantly debated with my office-mate, who was of the opinion that there are many reasons not to present half-finished work.

So there is also a large cultural barrier to more open science. If there were publishing pressure on researchers to open their code, it might effect a cultural shift.

jgrahamc about 13 years ago

Nice to see. Back in February I had a paper in Nature (with two co-authors) arguing for the same thing (http://www.nature.com/nature/journal/v482/n7386/full/nature10836.html). With this paper in Science (http://www.sciencemag.org/content/336/6078/159.summary), the top two journals in the world have now both published papers arguing for source code openness.

It's probably time for international cooperation on defining open code policies: http://blog.jgc.org/2012/04/more-support-for-open-software-in.html

Tichy about 13 years ago

Hm, shouldn't the articles contain the information necessary to rewrite the code? Then rewriting the code could be seen as replicating the experiment.

Both sharing and not sharing seem to have pros and cons. For example, if the code is buggy and shared, the odds might be higher that the bugs will never be found, because nobody will bother to write the code again.

strictfp about 13 years ago

All researchers know that they should release their code. The problem is that they are just bad programmers. Programming has become a required skill for many scientists, but the school system is lagging behind, so right now we have all these scientists lacking fundamental skills. Sooner or later this will be recognized, and schools will then hopefully treat programming or computer science as just another subject in the curriculum.

tzs about 13 years ago

If the programs are actually important in the production or verification of the research, then don't we want peers who try to independently reproduce the experiment to also independently develop their own programs, so that their reproduction is truly independent?

CJefferson about 13 years ago

My biggest problem with releasing code is that people will expect support.

I released some code that only compiled on Visual Studio 6, with a specific version of a fairly expensive library. I got several emails asking for a Mac or Linux version, or an update for more modern compilers.

Personally, I would have preferred that people just reimplement the code from the paper. I suspect that for them it would have been less work.

rlvesco7 about 13 years ago

I've often thought there should be an open code/data license that restricts usage and dissemination to those who agree to make their own code and data equally available.

Why? Because in many fields there is a negative incentive to provide code and data. It not only takes time, but it also opens you up to criticism from people who wouldn't be willing to make their own code/data available. Perhaps something like this would raise the bar and encourage more people to share their code/data. Just a thought.

raphinou about 13 years ago

I'm experiencing this first-hand as I implement a machine learning algorithm described in a paper. Questions keep arising about how the authors ran their experiments and about details of the algorithm that I can't deduce from the paper. So I'm guessing, and I'm still unable to reproduce their results, which leaves me wondering whether I have a bug or have misinterpreted something...

alexkappa about 13 years ago

The title is somewhat misleading. It made me think of a "secret code" hidden behind a bush waiting to cut the throat of Science...

cek about 13 years ago

Pedantic, I know, but: source code, not "source codes". Seeing this made the author lose credibility on the subject.

bbgm about 13 years ago

To some extent it is culture and the current incentive model, and to some extent it's just a need to be pragmatic. If you're a grad student who wants to defend within a certain amount of time and you have various deadlines (conferences, concerns about being scooped), you end up hacking together code that gets your work done and lets you analyze your data and publish. That's what gets you recognition, helps you defend, etc. In some cases the code is the work, and those groups tend to spend more time making sure the code is robust, reusable, and sustainable.

In general, though, the system doesn't encourage you to follow good practices at all. Having said that, I've definitely seen a shift toward more awareness over the last few years.

bbgm about 13 years ago

Various people, e.g. Titus Brown, have been trying to take a different approach. Titus doesn't practice open notebook science, but he does try to practice "replication". More on replication: http://ivory.idyll.org/blog/apr-12/replication-i.html

The paper gets its own website, http://ged.msu.edu/papers/2012-diginorm/, which includes the arXiv preprint, data and code repositories, and even an AMI with everything loaded. Basically everything you need to replicate the work in the paper.

shawn-butler about 13 years ago

It seems to me that most finished academic papers in computer science, including dissertations, also lack source code. A lot of institutions even discourage, by policy, submitting source code to examiners unless it illustrates the text. This isn't based on any serious survey, just on what I've read, so I may be wrong.

sabalaba about 13 years ago

Reproducible research is a really interesting topic. One of my good friends in academia showed me how he was using Babel (an Emacs mode) to do literate programming and reproducible research. I think it's a fantastic idea: the data, the conclusions, and the code used to arrive at them should all be part of the peer-review process. Open-source research.