MS Excel is <i>absolutely</i> unfit for most scientific and engineering problems.<p>The spreadsheet GUI, lack of good version tracking/history, and eagerness to coerce data types and "correct" values makes it easy to introduce errors that will go unrecognized and propagated through calculations. Unfortunately this story just keeps repeating itself.<p>But all of this is just a secondary concern to Excel's real trouble: it's history of incorrectly implementing numerical and statistical procedures. One could plumb the depths of this topic for hours, but here are a few highlights: regression formula accepts illegal/nonsensical inputs (e.g. collinear predictors) and gives illegal/nonsensical outputs [0], variance/standard deviation change incorrectly with sample size [0], output of a paired t-test changes when missing values are included [0], formulas are mislabeled [0], v. 2007 gives very wrong answers to 11 of 27 tests in the NIST test suite used for statistical software benchmarks [1], the random number generator was broken as late as v. 2007 [1], and calculations relying on any of 12 particular floats display an incorrect result [2]. There are plenty of other issues mentioned in the links and elsewhere; if you're interested you'll have no trouble finding them.<p>Remember, friends don't let friends use Excel for science. :)<p>[0] <a href="http://people.stern.nyu.edu/jsimonof/classes/1305/pdf/excelreg.pdf" rel="nofollow">http://people.stern.nyu.edu/jsimonof/classes/1305/pdf/excelr...</a><p>[1] <a href="http://www.pages.drexel.edu/~bdm25/excel2007.pdf" rel="nofollow">http://www.pages.drexel.edu/~bdm25/excel2007.pdf</a><p>[2] <a href="https://blogs.office.com/2007/09/25/calculation-issue-update/" rel="nofollow">https://blogs.office.com/2007/09/25/calculation-issue-update...</a><p>Edit: clarify and add a new issue I became aware of while researching further.
First I thought this story was about math and numerical errors. But it's actually about auto- formatting and auto-correction.<p>"Excel automatically converting gene names to things like calendar dates or random numbers"<p>In this case, I think what is needed is some kind of rudimentary knowledge of data-types. Or perhaps more simply a scientific template which is actually plain text by default.<p>But how are people not noticing auto-correction and auto formatting taking place!<p>The only perfect solution is to hire a developer to build you a data entry system. The developer can build the system which they have no cause to entirely understand the science behind, and thus a human to take the blame for errors instead of excel.
<a href="https://help.libreoffice.org/Calc/Deactivating_Automatic_Changes" rel="nofollow">https://help.libreoffice.org/Calc/Deactivating_Automatic_Cha...</a><p>Type apostrophe at beginning of the gene name ('MARCH1) or format the column for gene names as text (click column letter, then Format | Cell and select text)<p>If people want to use a spreadsheet application for this kind of data collection (and that is a big if I think) then they perhaps need to have some agreed lab protocols for setting up and checking the spreadsheets. This is a known issue in financial circles...<p><a href="http://www.eusprig.org/basic-research.htm" rel="nofollow">http://www.eusprig.org/basic-research.htm</a>
When my Ph.D is finally done (~3 months), I'll post some of the code I've had to work with daily for the past three years.<p>"Spaghetti" doesn't even <i>begin</i> to describe it. "Ball of yarn under a cat-lady's sofa" comes readily to mind, as does gouging my eyes out and amputating my fingers.<p>The problem isn't excel. The problem is scientists.
Why is the auto-convert 'features' in Excel not opt-in?<p>When Excel encounters the first cell in a new sheet that it thinks should be auto-converted, why does it not ask if that is desirable for that sheet?<p>Like: "Do you want Excel to interpret and auto-convert all strings with format <X> into the type <Y> in this sheet?"<p>At least for conversions where the original data is lost.
Highly relevant: Felienne[0] Hermans' compsci research on spreadsheets out there in the wild, and how to develop software engineering tools to make them better:<p><a href="https://www.youtube.com/watch?v=2Cdgew5zvI4" rel="nofollow">https://www.youtube.com/watch?v=2Cdgew5zvI4</a><p><a href="http://www.felienne.com/archives/tag/spreadsheets" rel="nofollow">http://www.felienne.com/archives/tag/spreadsheets</a><p>[0] pronounced Fay-lee-nuh
Same thing hapened in Economics involving a major figure in Economics.<p><a href="http://www.bloomberg.com/news/articles/2013-04-18/faq-reinhart-rogoff-and-the-excel-error-that-changed-history" rel="nofollow">http://www.bloomberg.com/news/articles/2013-04-18/faq-reinha...</a>
It's funny to consider that these errors slipped past the peer review stage. It really highlights the major issue with reviewing source code published as part of an analysis.<p>If there aren't enough resources / skilled eyes to catch these simple errors, what are the chances they would catch errors in source code too?
Anyone doing serious statistics uses SPSS or R, or similar stats programs. If not, you deserve all the bad data you get. Using Excel for that is akin to using a point and shoot camera for a fashion photoshoot, or a crossover car offroad in Death Valley.
Consider that given the poorly conducted statistical analyses, p-hacking etc that goes on in the life sciences, Excel garbling gene names might actually improve the net accuracy of the results by removing false positives.
Should decison maker's spreadsheets in business, policy, and government be peer-reviewed in the same way as scientific papers?<p>Disclaimers: 1) Yes, scientific peer review needs improvement. 2) Yes, spreadsheets are not ideal for science... what makes business less important?