The myths of bioinformatics software

86 points by bbgm almost 10 years ago

10 comments

roel_v almost 10 years ago

Much of this is equally applicable to other fields, but I very much disagree with point 2. Every lab should have one or more programmers, people who are professionals at writing software and who guide researchers in their software development, for both efficiency and accuracy reasons.

But of course a software developer at a university is, at best, a 'lab assistant', and more likely regarded as being on the same level as the janitor (both in respect and in pay). The result is thousands upon thousands of shitty programs you wouldn't wish on your worst enemy to work on.

But hey, cool with me, I carved out a consulting niche in cleaning up such messes in exactly that environment. But man, a lot of money could be saved, and a lot of much better work could be done, if only researchers (and the hierarchy above them) would recognize that software development is both critical to pretty much any research today and something they cannot just pick up on the side.

meeper16 almost 10 years ago

And the most long-standing myth, which first started after the Human Genome was sequenced by Francis Collins and Craig Venter (now working on human longevity) over 14 years ago: bioinformatics software will single-handedly be responsible for discovering billion-dollar drug targets. This is in large part why most early bioinformatics companies failed: they did not deliver on this front, and they relied on jumbled software approaches that were being moved into the commercial world from academia. The reality is that most bioinformatics software relies on old formal methods and is not geared toward truly innovative discovery and data interpretation.

I do think, however, that we are entering a new age of bioinformatics and its associated data mining, interpretation, visualization and discovery tools, which will hopefully push the boundaries by being less formal and more experimental. We need to make discoveries faster when it comes to the life sciences.

I think new approaches in bioinformatics, data mining, data science and visualization will have the greatest impact in the area of extending human lifespan. This is what Craig Venter, Google Calico Labs, SENS, GenoPharmix and the Buck Institute are all working on now.

coliveira almost 10 years ago

An important point in this article is the vital distinction between academic software and "general purpose" software. The goal of research software is to prove or exemplify a point explained in one or more papers. It should do so in the most direct and economical way for the researcher(s). It makes no sense to create multi-platform, software-engineering-friendly software for this kind of use. In the last few years we have seen an untold number of complaints that research software is not robust, user-friendly, etc., and they entirely miss this important fact.

arca_vorago almost 10 years ago

The first opportunity to comment on bioinformatics since my non-compete/NDA is over!

This seems like a very sloppily put together list of myths, but I'll bite anyway.

1. Not true, but I think that's largely because much of the software used is closed, so the FOSS community is largely anemic in the bio world. For the tools that are FOSS or BSD, I saw plenty of contributions, but the other thing to keep in mind is that it's not just about the programming. You have to have a certain level of understanding of the application domain to program a solution for it properly, and there are very few of these people around. I predict a huge uptick in demand and salaries for bioprogrammers.

2. Is true. You need your own people on salary to program for your needs. I was the sysadmin part of a PhD, sysadmin, programmer team, and we were doing stuff that no one else was going to do for us. You need to have your own programmer, and a good sysadmin, full stop.

3. Is also true. Picking the right license is important because many labs are pretty tight on cash flow. Sure, they probably have millions going through them a month, but operating costs are super high and margins are lower than you may think. It was during my time in the genetics lab that I fully realized why FOSS was so important, and I think it's the future (with a few key proprietary exceptions that no FOSS has matched yet; think Elmer vs Comsol).

4. Using a FOSS license makes this a moot point to address. Use GPLv3 code, people, stop using BSD!

5-9: not worth addressing.

Anyway, my overall view of the field is this: with sequencing getting cheaper, the problem is in managing the volumes of data being generated (sysadmin issue) and in interpreting the data for meaningful results (programmer/PhD issue). Personally, I think that machine learning is going to be the right breakthrough to follow and apply to bio, and once we do that I expect it to take off to crazy levels. I'm talking a sequencer in every doctor's office, and artificial genetic manipulation becoming much easier and with more accurate predictions.

Also, the other thing everyone underestimates is the microbiome as an entity. You are more the bacteria that live in you than you are you. Of course, I struggle to understand the science sometimes, I'm just a sysadmin, so take what I say with a grain of salt.

cjbprime almost 10 years ago

As https://twitter.com/madprime/status/619503684838387716 points out, the argument that you're cheating the US Government out of public money by releasing without a non-commercial clause is bizarre -- everything the US Government releases is required by law to be released into the public domain.

jerven almost 10 years ago

"Your data is more important than your code" is the often-neglected fact in bioinformatics. Whatever you do, document your file formats.

danieltillett almost 10 years ago

As someone who actually makes a living selling bioinformatics software, I think the problem is mainly due to how scientists view software. The code you write is seen the same way lab books are: basically raw data. Nobody publishes their lab books, and all too often software is thought of as just an electronic lab book. It would be great if this changed, but it needs a change in how scientists look at software.

bmir-alum-007 almost 10 years ago

Disclaimer: I used to work at a Stanford bioinformatics shop.

There's a clear need for AWS-like features for bio/biomedical informatics, specifically enabling sharing, security, reuse and anonymization of data (PHI), libraries (like R's Bioconductor) and infrastructure (IaaS/PaaS/SaaS).

The issue is that some labs still archive their data on actual hard drives (USB and bare drives), making their data much less useful than if it were somewhere readily available and sharable.

I think it's a huge (billion+) opportunity where the right execution would need loads of smart, consultingish customer service reps (huge overhead costs) to help researchers with coding, sysadmining and bio to some degree. Basically, a full-service (with self-service, à la carte features) hosting company for bio / medical.

This space is only going to grow deeper and wider as more is discovered and confirmed about each gene, protein and pathway, with each accompanying expansion in nosology. This sort of research knowledge is vital and unlikely to shrink. The main issues are that it would be a cash-intensive and hard-to-defend business model, because it requires paying lots of consultant/scientist brains and anyone can copy the model.

rch almost 10 years ago

The article mentions code quality and license issues, and one of my favorites (MEME) seems to suffer a bit from both. I believe the first aspect is simply the result of being developed in a sequential fashion by different contributors (which is reasonable given the environment).

The main problem is that the license rules out using the software for 'commercial purposes' except under unspecified terms that would need to be hashed out with the tech transfer office. I completely support the spirit of that construct, but it makes the software difficult to advocate for in practice. At least in this case, GPL or LGPL would be a significant improvement.

shiggerino almost 10 years ago

Insisting that bioinformatics software be non-free pretty much rules out any possibility that anyone is going to build on your code. If this is the case, that's regrettable, but why seal the fate?

If they are afraid of companies using and abusing the software, they can just put it under the GPL, and those companies will at least have to repay the favour to the users and the community.
