An alarming number of scientific papers contain Excel errors

91 pointsby pnsover 8 years ago

16 comments

Previous discussion - <a href="https://news.ycombinator.com/item?id=12349391" rel="nofollow">https://news.ycombinator.com/item?id=12349391</a>

ramblenodeover 8 years ago

MS Excel is absolutely unfit for most scientific and engineering problems.The spreadsheet GUI, lack of good version tracking/history, and eagerness to coerce data types and "correct" values makes it easy to introduce errors that will go unrecognized and propagated through calculations. Unfortunately this story just keeps repeating itself.But all of this is just a secondary concern to Excel's real trouble: it's history of incorrectly implementing numerical and statistical procedures. One could plumb the depths of this topic for hours, but here are a few highlights: regression formula accepts illegal/nonsensical inputs (e.g. collinear predictors) and gives illegal/nonsensical outputs [0], variance/standard deviation change incorrectly with sample size [0], output of a paired t-test changes when missing values are included [0], formulas are mislabeled [0], v. 2007 gives very wrong answers to 11 of 27 tests in the NIST test suite used for statistical software benchmarks [1], the random number generator was broken as late as v. 2007 [1], and calculations relying on any of 12 particular floats display an incorrect result [2]. There are plenty of other issues mentioned in the links and elsewhere; if you're interested you'll have no trouble finding them.Remember, friends don't let friends use Excel for science. :)[0] <a href="http://people.stern.nyu.edu/jsimonof/classes/1305/pdf/excelreg.pdf" rel="nofollow">http://people.stern.nyu.edu/jsimonof/classes/1305/pdf/excelr...</a>[1] <a href="http://www.pages.drexel.edu/~bdm25/excel2007.pdf" rel="nofollow">http://www.pages.drexel.edu/~bdm25/excel2007.pdf</a>[2] <a href="https://blogs.office.com/2007/09/25/calculation-issue-update/" rel="nofollow">https://blogs.office.com/2007/09/25/calculation-issue-update...</a>Edit: clarify and add a new issue I became aware of while researching further.

评论 #12371426 未加载

评论 #12371815 未加载

评论 #12371642 未加载

评论 #12372308 未加载

评论 #12372713 未加载

评论 #12371435 未加载

评论 #12372453 未加载

dagaciover 8 years ago

First I thought this story was about math and numerical errors. But it's actually about auto- formatting and auto-correction."Excel automatically converting gene names to things like calendar dates or random numbers"In this case, I think what is needed is some kind of rudimentary knowledge of data-types. Or perhaps more simply a scientific template which is actually plain text by default.But how are people not noticing auto-correction and auto formatting taking place!The only perfect solution is to hire a developer to build you a data entry system. The developer can build the system which they have no cause to entirely understand the science behind, and thus a human to take the blame for errors instead of excel.

评论 #12371649 未加载

评论 #12372642 未加载

评论 #12372067 未加载

keithpeterover 8 years ago

<a href="https://help.libreoffice.org/Calc/Deactivating_Automatic_Changes" rel="nofollow">https://help.libreoffice.org/Calc/Deactivating_Automatic_Cha...</a>Type apostrophe at beginning of the gene name ('MARCH1) or format the column for gene names as text (click column letter, then Format | Cell and select text)If people want to use a spreadsheet application for this kind of data collection (and that is a big if I think) then they perhaps need to have some agreed lab protocols for setting up and checking the spreadsheets. This is a known issue in financial circles...<a href="http://www.eusprig.org/basic-research.htm" rel="nofollow">http://www.eusprig.org/basic-research.htm</a>

评论 #12372028 未加载

omginternetsover 8 years ago

When my Ph.D is finally done (~3 months), I'll post some of the code I've had to work with daily for the past three years."Spaghetti" doesn't even begin to describe it. "Ball of yarn under a cat-lady's sofa" comes readily to mind, as does gouging my eyes out and amputating my fingers.The problem isn't excel. The problem is scientists.

评论 #12372359 未加载

评论 #12372449 未加载

SNvD7vEJover 8 years ago

Why is the auto-convert 'features' in Excel not opt-in?When Excel encounters the first cell in a new sheet that it thinks should be auto-converted, why does it not ask if that is desirable for that sheet?Like: "Do you want Excel to interpret and auto-convert all strings with format <X> into the type <Y> in this sheet?"At least for conversions where the original data is lost.

评论 #12371945 未加载

vanderZwanover 8 years ago

Highly relevant: Felienne[0] Hermans' compsci research on spreadsheets out there in the wild, and how to develop software engineering tools to make them better:<a href="https://www.youtube.com/watch?v=2Cdgew5zvI4" rel="nofollow">https://www.youtube.com/watch?v=2Cdgew5zvI4</a><a href="http://www.felienne.com/archives/tag/spreadsheets" rel="nofollow">http://www.felienne.com/archives/tag/spreadsheets</a>[0] pronounced Fay-lee-nuh

IndianAstronautover 8 years ago

Same thing hapened in Economics involving a major figure in Economics.<a href="http://www.bloomberg.com/news/articles/2013-04-18/faq-reinhart-rogoff-and-the-excel-error-that-changed-history" rel="nofollow">http://www.bloomberg.com/news/articles/2013-04-18/faq-reinha...</a>

评论 #12375797 未加载

hirenjover 8 years ago

It's funny to consider that these errors slipped past the peer review stage. It really highlights the major issue with reviewing source code published as part of an analysis.If there aren't enough resources / skilled eyes to catch these simple errors, what are the chances they would catch errors in source code too?

评论 #12371747 未加载

triplesecover 8 years ago

Anyone doing serious statistics uses SPSS or R, or similar stats programs. If not, you deserve all the bad data you get. Using Excel for that is akin to using a point and shoot camera for a fashion photoshoot, or a crossover car offroad in Death Valley.

Gatskyover 8 years ago

Consider that given the poorly conducted statistical analyses, p-hacking etc that goes on in the life sciences, Excel garbling gene names might actually improve the net accuracy of the results by removing false positives.

Steeeveover 8 years ago

Is everybody a washington post subscriber? Or have I missed the route around the paywall somehow?

评论 #12371601 未加载

评论 #12371489 未加载

评论 #12371605 未加载

评论 #12372408 未加载

pjmorrisover 8 years ago

Should decison maker's spreadsheets in business, policy, and government be peer-reviewed in the same way as scientific papers?Disclaimers: 1) Yes, scientific peer review needs improvement. 2) Yes, spreadsheets are not ideal for science... what makes business less important?

cm2187over 8 years ago

I wonder if that also applies to DNA tests used in criminal investigations...

评论 #12371821 未加载

gregn610over 8 years ago

An alarming number of business spreadsheets contain Excel errors. But it's the linga fraca of businesses, departments & teams everywhere.

nol13over 8 years ago

*Excel mutations