
What's the point of writing good scientific software?

40 points by michaelbarton over 12 years ago

12 comments

sophacles over 12 years ago
A large portion of my job, and my team's job, is programming for research at a university. We get "researcher code" that was used to write a paper and turn it into something for the next set of researchers to build on. There are several other groups at this university that do the same thing.

I'm starting to think there may be a field of study here. I regularly take software that "proves" some hypothesis or "shows some good results", tear it down, and throw some engineering at it, only to find that the results are not reproduced, or the benefits are severely diminished. Then I have to go track down the why... because until I can show it, it is assumed to be my fault.[1] The results of some of this are probably paper-worthy themselves.

I think a useful field, or at least a useful conference, could be built for people in this type of position, studying the meta effects of software on research. How can we report the issues found, or the updates to the numbers, without putting black marks on the reputations of people who actually are doing good work?

Another interesting phenomenon worth studying is that the refactoring/rewriting process often gets real results, but it turns out the mechanism for the improvement isn't what the original researcher thought/claimed. It is something perhaps related, perhaps a side effect, and so on. There needs to be a way to recognize the original researcher, the programmer who found the issues, and the follow-up researchers who did more work pinning down the problem.

[1] This isn't as antagonistic as it sounds. It is actually a nice check on my own mistakes. Did the differences between what the researcher did and what I did introduce some strange side effect? Did I remove a shortcut that wasn't actually a shortcut because I misunderstood? A hundred other things on both sides... Research has a large component of "we don't know what we're doing, axiomatically so", and as such this is a decent way of finding out more.
Xcelerate over 12 years ago
I've been debating whether I want to open-source the scientific code I've been writing. A lot of it could be useful to other people in the molecular dynamics field.

I recently introduced my advisor to Github, and he thought it was a good idea; however, there were a few hesitations. The first, and most important, is the likelihood of a bug. If you put your code on a very public website like Github, there is a chance it's going to be scrutinized by everyone in your field.

Now, unless you are one of the best programmers who has ever lived, there are bound to be bugs in your software, and when someone discovers them, it could have a deleterious effect on any journal articles you've written that used that code. The issue is that even though most bugs do not lead to significant changes in results, you would still need to redo all of your data to make sure that is the case. The software industry has long recognized buggy software as a reality, but I don't think the scientific community is as tolerant of it (hence the reason a lot of people hide their code).

For my MD simulations, I use the well-known LAMMPS package. Bugs in it are discovered all the time! (http://lammps.sandia.gov/bug.html). So I think there needs to be a collective realization among the scientific community that these are bound to occur and authors of journal articles can't be persecuted all the time for it. A lot of computational work is the art of approximation, so I would just lump "human incompetency" in as one of those approximation factors.

Despite this risk, I think I'm still going to release my code at some point, as I would personally welcome critique and suggestions for improvement. I'd like to think I'm a better coder than most scientists, since I've been coding since I was twelve in multiple language paradigms and have won a major hackathon, but eh, who knows. I'm quite sure my *environment* isn't up to industry standards, because I've always coded solo rather than on a team.
ylem over 12 years ago
I would say that you should look at what stage you are at in your career and what your goals are. If your goal is to become research faculty, you should focus on getting high-impact papers out the door--software is a tool for helping you do so.

If you find yourself re-using that bit of code, then it may be worth cleaning it up and making it maintainable. If people start sending you requests for it, then it may be worthwhile open-sourcing it, documenting it, maintaining it, etc.--but only if you have time.

I do make open-source scientific software as part of my job, but I'm at a later stage in my career and it's not something I would have a science postdoc work on--it's just not fair to them and their career prospects within science...

Recently, someone asked for some reduction code that I've developed, and I realized that while it was documented, I didn't have time to refactor it and clean it up--finally, I just put it on github and told them to contact me if they had questions--they were happy to have it as a starting point for what they wanted to work on. So, if you believe that you've made something worthwhile, but don't have the bandwidth to maintain it and other people might find it useful, sometimes it might be better to just put it out there and let people play with it--no guarantees, but it may help someone else get started...

You can get a large number of citations in some subfields for writing commonly used software--but it may or may not help your career. For example, I have friends at various institutions around the world who tell me that their management gives them no credit for developing useful software (complete with lectures, updates, documentation, etc.)--they just release it because they feel they should, and most of them are also already tenured in their positions.

Good luck!!!
JohnBooty over 12 years ago
"I have previously believed that converting any code you've created into a open-source library benefits the community and prevents reinvention of the wheel [...]<p>I have however started to realise that perhaps something I thought would be very useful may be of little interest to anyone else. Furthermore the effort I have put into testing and documentation may not have been the best use of my time if no one but I will use it. As my time as a post doc is limited, the extra time effort spent on improving these tools could have instead have been spent elsewhere."<p>From a purely selfish perspective, I've found that documenting and cleaning up my own code benefits me in the future. Even if it's a one-off, single-purpose utility that I'll never use again in the future, I often find myself needing to borrow bits of code from my old projects. ("Oh, I solved this problem before. How did I do it? Let's dig up that old, old project...") At which point, present-day me benefits if my past self bothered to actually document things and make sure they're reasonably robust.<p>There are countless other reasons (moral and pragmatic) to document, test, and open-source one's code, of course! Many of them more important than the ability to crib one's old code, I'd argue.<p>But the author seems to have considered (and discarded) them...
tmarthal over 12 years ago
I used to call the scientific software that I was writing "paper-ware".

You aren't building a system for other users; you aren't really doing anything other than one-off analysis to create charts, which will be explained in a paper.

Things have changed somewhat since the early 2000s, but the concept remains the same. Nowadays, for interesting or controversial results, other scientists want to be able to verify your results. However, that is usually more related to your data and how you processed it than to your software algorithms (which should be explained in the paper, and can be recreated from that).

So do these systems need to have reams of documentation? Probably not. However, if you leave the system for two years and come back to work on it, or to figure out how it used to work, then you'd best have enough commenting, along with a thorough readme, about some of the decisions you made and why. It's more analogous to scripting than to software engineering.
lmm over 12 years ago
Like everything else in software, code quality should be feature-driven. Write the minimum to do what you need to. If you find that your code's poor quality is becoming a problem (whether because it's slowing your own development down, or other people aren't using it and you want them to, or whatever reason), do something about it then, but not before.
abraxasz over 12 years ago
"I have begun to think now that the most important thing when writing software is to write the usable minimum. If then the tool becomes popular and other people begin to use it, then I should I work on the documentation and interface."<p>That. Like someone pointed out, I find that documenting and testing the key parts (that is, those I know at least I will reuse) is always a good investment of my time and prevents major headaches down the road. I've been experimenting with project structures that clearly separates the set of tools and functions that will be reusable, and those that are one shot. I focus all my testing efforts on former, and cut myself some slack on the latter.<p>Btw, I speak from a "scientist" perspective, and nothing I say applies to professional software engineering (I mean, I don't think it does).
roadnottaken over 12 years ago
This is debatable, but IMHO your job as a post-doc is to learn new things about biology and publish papers on what you've learned. If you can document your code along the way, that's great. But if it's taking up a bunch of your time then it's probably a misguided effort.
ozataman over 12 years ago
A big concern for me has always been correctness. You're more likely to make mistakes and miss edge conditions in sloppy code. There's nothing worse than communicating some positive/inspiring results, only to find out later that you had an elusive computational bug in there that invalidates them.

This reminds me: a man was seen cutting down a tree with a dull-bladed axe. A bystander asked him, "Why not sharpen your axe first?" The cutter responded, "I don't have the time!"
wallerj77 over 12 years ago
I'm curious: is there a place where you can submit your software to the community and tag it as relevant for doing A, B, C, so that others can use it to do the same or even build on it further? I have limited experience with software in your field, but it seems like there isn't a good way to find tools already built to address your needs, or at least come close enough. Am I wrong, or am I missing something?
elchief over 12 years ago
I was just thinking the other day about how good academic software is getting, and how useful it is to society that masters and PhD students are making software for their research.

Look at RapidMiner (developed at U. Dortmund), Stanford's CoreNLP, and the brat rapid annotation tool. These are better than a lot of commercial tools. They are more text-analytics than bioinformatics, but same diff.
gwern over 12 years ago
Funny, I was just comparing the incentives for releasing scientific software to those of releasing data as well: http://multiplecomparisons.blogspot.com/2013/02/making-data-sharing-count.html

And now I hear this questioning the value of writing up and polishing scientific software!