TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Missing Covid-19 test data was caused by the ill-thought-out use of Excel

1041 pointsby cjlmover 4 years ago

92 comments

jugg1esover 4 years ago
I work in healthcare and by far the biggest production issues we&#x27;ve ever run into are people who run Excel spreadsheets through our file processing where the leading zero&#x27;s have been removed from patient identifiers because that&#x27;s Excel&#x27;s default behavior and you CANNOT TURN IT OFF!<p>EDIT: I have no idea who downvoted my post because what I said is 100% true. We have to tell customers to stop opening CSVs in Excel and then uploading them to us because what they upload could be missing critical data. Excel interprets a number and then formats it as a number, but in healthcare, 10 digit numbers are really strings. Unless your IT team has created an Excel extension and had it preloaded onto your local copy, Excel will remove leading zeros and there isn&#x27;t a way to get them back if you save&#x2F;overwrite.
评论 #24695044 未加载
评论 #24695884 未加载
评论 #24695539 未加载
评论 #24696075 未加载
评论 #24699131 未加载
评论 #24697025 未加载
评论 #24695038 未加载
评论 #24697202 未加载
评论 #24695427 未加载
评论 #24694990 未加载
评论 #24696030 未加载
评论 #24709955 未加载
评论 #24695266 未加载
评论 #24696110 未加载
评论 #24698280 未加载
评论 #24696650 未加载
评论 #24696265 未加载
评论 #24697774 未加载
评论 #24696007 未加载
评论 #24697334 未加载
评论 #24695189 未加载
评论 #24695516 未加载
评论 #24696694 未加载
评论 #24696756 未加载
评论 #24694731 未加载
totaldude87over 4 years ago
All previous major spreadsheet screwups were costing money (largest being $6 billion in losses due to a excel formula mishap) , but this one is playing with lives :(<p>5 – London Olympics Oversells Swimming Event by 10,000 Tickets<p>4- Banking powerhouse Barclay’s accidentally bought 179 more contracts than they intended in their purchase of Lehman Brothers assets in 2008. Someone hid cells containing the unwanted contract instead of deleting them.<p>3-utsourcing specialists Mouchel had to endure a £4.3 million profits write down due to a spreadsheet error in a pension fund deficit caused by an outside firm of actuaries<p>2- Canadian power generator TransAlta suffered losses of $24 million as the result of a simple clerical error which meant they bought US contracts at higher prices than they should hav<p>and the Biggest one is. -<p>Basic Excel flaws and incorrect testing led to JP Morgan Chase losing more than $6 billion in their London Whale disaster.<p><a href="https:&#x2F;&#x2F;floatapp.com&#x2F;us&#x2F;blog&#x2F;5-greatest-spreadsheet-errors-of-all-time&#x2F;" rel="nofollow">https:&#x2F;&#x2F;floatapp.com&#x2F;us&#x2F;blog&#x2F;5-greatest-spreadsheet-errors-o...</a>
评论 #24695382 未加载
评论 #24694145 未加载
评论 #24694157 未加载
评论 #24695794 未加载
评论 #24694206 未加载
edanmover 4 years ago
As a data engineer, I think blaming Excel for this is the wrong framing. The real problem here is with the entire process, and specifically, the lack of testing.<p>Excel is great for many use cases, especially if you need people to enter data somewhere. Its UI is unparalleled in terms of quickly giving something to users that they can understand, mess around with, and verify. It&#x27;s a very common use case to then need to pull in data from a bunch of Excel files, into one main repository of data (a data warehouse). That can be stored in an Excel file, although more commonly would be stored in a database.<p>But there are always problems with this process! There can be missing data, there can be weird data conversions because the program&#x2F;language you&#x27;re using to parse the data and get it into the database reads things differently than how Excel intended, there can be weird database issues that causes data loss, etc.<p>Complex systems <i>always, always</i> have bugs.<p>It is the job of a data engineering team to, among other things, test the systems thoroughly, and put in place systems to test against data loss, etc. It is pretty common, for example, to count the data going into a pipeline, and the data that you end up with, and make sure nothing was lost on the way.<p>Anyone can make a mistake. Any team, no matter how good, especially when they&#x27;re rushed, can use shortcuts, use the wrong technologies because it&#x27;s expedient, or simply have bugs. It is the job of the team, and of project management in general, to do all the manual and automatic testing necessary to make sure that mistakes are caught.<p>The real lesson isn&#x27;t &quot;Excel is bad&quot;. It&#x27;s not. It&#x27;s an amazing tool. The real lesson is &quot;Data Engineering is hard, requires a lot of resources&quot;, and &quot;all systems have bugs - testing is mandatory for any critical system&quot;.
评论 #24695244 未加载
评论 #24695383 未加载
评论 #24695205 未加载
评论 #24695519 未加载
评论 #24696757 未加载
评论 #24695498 未加载
rossdavidhover 4 years ago
While I am not in the habit of defending Microsoft, I think it should be pointed out that it wasn&#x27;t using &quot;Excel&quot; that was the biggest problem, but using &quot;13-years obsolete format for Excel&quot;. Not that there was any reason to be using Excel for this, but it&#x27;s a bit unfair to blame Microsoft for something they never claimed XLS could do, and provided a much better capacity in XLSX 13 years ago.<p>Again, it&#x27;s hard to cry too many tears for Microsoft, but it does seem a bit off-target to blame &quot;Excel&quot; for this...
评论 #24693839 未加载
评论 #24693822 未加载
评论 #24695212 未加载
评论 #24693902 未加载
评论 #24695203 未加载
评论 #24694831 未加载
jl6over 4 years ago
It’s fun to point and laugh, but is this really just the consequence of an unprecedented health emergency demanding a rush job?<p>As far as bugs go, it doesn’t sound <i>that</i> bad. They didn’t lose data - they just processed it late? And they spotted it within days&#x2F;weeks, and have a workaround&#x2F;correction already? And it’s only the reporting that was wrong, not the more important part where they inform people of results?<p>I’d rather have this system now than be waiting for the requirements analysis to conclude on the perfect system.
评论 #24692380 未加载
评论 #24694041 未加载
评论 #24692104 未加载
评论 #24695610 未加载
评论 #24692444 未加载
评论 #24691994 未加载
评论 #24692060 未加载
评论 #24693907 未加载
评论 #24694194 未加载
评论 #24695374 未加载
galeosover 4 years ago
I wonder if this will be added to the European Spreadsheet Risk Interest Group&#x27;s (EuSpRiG) horror stories list:<p><a href="http:&#x2F;&#x2F;www.eusprig.org&#x2F;horror-stories.htm" rel="nofollow">http:&#x2F;&#x2F;www.eusprig.org&#x2F;horror-stories.htm</a>
评论 #24690371 未加载
评论 #24694176 未加载
评论 #24691465 未加载
mjw1007over 4 years ago
So, what we know is:<p>« PHE had set up an automatic process to pull this data together into Excel templates [...] When [the row limit] was reached, further cases were simply left off. »<p>The terrible thing here is dropping data rather than reporting an error and refusing to run.<p>It isn&#x27;t clear what piece of software was behind this &quot;automatic process&quot;.<p>Clearly the responsible humans are to blame.<p>If the software that dropped data rather than failing has that as its default behaviour, or is easily configured to do that, then I think that software (and its authors) are also to blame.<p>Is there anything in Excel itself that behaves like that?
评论 #24690987 未加载
评论 #24691862 未加载
评论 #24696100 未加载
diarrheaover 4 years ago
Each test result creating <i>several</i> rows of data seems like a problem too. In clean data, every observation is one row. It makes working with the dataset much easier. In this scenario, I would expect one observation to correspond to one test result. The multiple rows are then better off pivoted into columns.
评论 #24691178 未加载
评论 #24691783 未加载
评论 #24691280 未加载
评论 #24692088 未加载
mhandleyover 4 years ago
The article states that it is not known where in the country was affected, but I think we can deduce that. Here are the daily cases by specimen date. Dotted line is Saturday&#x27;s data, solid line is Sunday&#x27;s. The difference between the two is partly the normal daily update, but around 2&#x2F;3 of it is correcting the underreporting:<p>7-day moving average: <a href="http:&#x2F;&#x2F;danger.handley.org.uk&#x2F;misc&#x2F;rates-uk-recent.png" rel="nofollow">http:&#x2F;&#x2F;danger.handley.org.uk&#x2F;misc&#x2F;rates-uk-recent.png</a><p>Raw data: <a href="http:&#x2F;&#x2F;danger.handley.org.uk&#x2F;misc&#x2F;rates-uk.png" rel="nofollow">http:&#x2F;&#x2F;danger.handley.org.uk&#x2F;misc&#x2F;rates-uk.png</a><p>Looks like all regions were affected, but by far the largest corrections are in NW, NE and Yorkshire regions. In particular, NE had looked like cases were declining, but we can now see this was incorrect, and they&#x27;re still increasing rapidly<p>Edit: note the most recent 3 days are always incomplete, so any decline shown there is not a real effect.
评论 #24691575 未加载
jononorover 4 years ago
Always use checked pre- and post-conditions in a data pipeline.<p>This simple post-condition would have caught this issue: The sheet after merge operation must have a number of rows equal to the sum of number of rows for all merged sheets.<p>Assuming this is a merge of a standardized input, then another post-condition might be: The number of columns in output shall equal the number of columns in the input. Might want to check header names, and order as well.<p>Thinking in terms of universal properties, and putting the checks into production, is better than unit-testing.
评论 #24692072 未加载
评论 #24691652 未加载
daanloover 4 years ago
We did a bit of work for laboratories this year and csv is not an uncommon exchange format between labs. In general almost all exchange formats are text based, with labs saying they will upgrade to „modern xml formats“ at some point in the future. So seen in this context a csv or an excel file doesn‘t really surprise me and should probably also be seen in this context.
评论 #24686311 未加载
评论 #24686241 未加载
评论 #24691325 未加载
评论 #24686242 未加载
评论 #24691340 未加载
评论 #24691297 未加载
评论 #24690550 未加载
iainmerrickover 4 years ago
As an aside, this is a very clear explanation of the bug in non-technical language, much better than you normally see on general news sites. Excellent reporting!
airbreatherover 4 years ago
Most people don&#x27;t know about solve order and possible multiple solves for a given cell on any update.<p>From : <a href="http:&#x2F;&#x2F;www.decisionmodels.com&#x2F;calcsecretsc.htm" rel="nofollow">http:&#x2F;&#x2F;www.decisionmodels.com&#x2F;calcsecretsc.htm</a><p>When a cell in a spreadsheet refers to another cell it must be finally calculated after the cell it refers to. This is called a Dependency.<p>Excel recognizes dependencies by looking at each formula and seeing what cells are referred to. See Dependency Trees for more details of how Excel determines dependencies.<p>Understanding this is important for User Defined Functions because you need to make sure that all the cells the function uses are referred to in the function arguments. Otherwise Excel may not be able to correctly determine when the function needs to calculated, and what its dependencies are, and you may get an unexpected answer. Specifying Application.Volatile or using Ctrl&#x2F;Alt&#x2F;F9 will often enable Excel to bypass this problem, but you still need to write your function to handle multiple executions per calculation cycle and uncalculated data.
langitbiruover 4 years ago
So the limit for Excell is 65,000 rows for the old format, and one million-plus rows for the new format (from the article).<p>Then I wonder, is there any tool that mimic Excell but with Sqlite as the backend? The limit of rows in Sqlite is 2 raised to the power of 64 (18446744073709551616 or about 1.8e+19).<p><a href="https:&#x2F;&#x2F;sqlite.org&#x2F;limits.html" rel="nofollow">https:&#x2F;&#x2F;sqlite.org&#x2F;limits.html</a>
评论 #24691141 未加载
评论 #24690710 未加载
评论 #24700778 未加载
评论 #24691694 未加载
评论 #24694591 未加载
评论 #24690742 未加载
lordnachoover 4 years ago
The problem is that anyone who isn&#x27;t a decent coder think excel is a sensible choice for this, because they&#x27;ve seen it in their school or work.<p>So when the proposals come in, you&#x27;re going to see one guy who says it&#x27;s all common sense and we use familiar old excel for everything, and another lunatic who says something called &quot;pigsqueal&quot; is actually the standard, connected to a &quot;frontend&quot; which for some reason is now separate to the &quot;backend&quot;. This nutter thinks we have time to write some unit tests and also wants to add an authentication module so we know who uploaded what. And somehow he thinks we need to add logging so we can ensure errors can be tracked and debugged. Amazingly he seems to be suggesting that there&#x27;s going to be errors in our process.
评论 #24691660 未加载
评论 #24691855 未加载
评论 #24691673 未加载
评论 #24691908 未加载
评论 #24693618 未加载
评论 #24692426 未加载
评论 #24691850 未加载
评论 #24691627 未加载
评论 #24692537 未加载
评论 #24691749 未加载
评论 #24693116 未加载
评论 #24691708 未加载
评论 #24693259 未加载
评论 #24693460 未加载
aphraxover 4 years ago
I&#x27;ve been bitten by this exact issue (albeit on non critical data), save a file as XLS it will silently drop rows beyond 65k - save it as XLSX and it&#x27;s all good..
评论 #24690681 未加载
评论 #24691509 未加载
评论 #24691506 未加载
bencollier49over 4 years ago
Good lord. Old XLS files. I&#x27;d love to have seen how this was concocted.
评论 #24690594 未加载
评论 #24690639 未加载
评论 #24692950 未加载
评论 #24692052 未加载
评论 #24690953 未加载
评论 #24695193 未加载
beagle3over 4 years ago
This is indeed <i>user</i> incompetence, but so far every news source quoted the official PR and said it was an &quot;Excel Problem&quot;.<p>Microsoft PR was caught unprepared - I wonder how they&#x27;ll re-spin it in the next few days (and for the first time that I can recall, a Microsoft product was wrongly blamed...)<p>Pay attention, how every time there&#x27;s a Windows virus or worm, it&#x27;s a &quot;Computer Virus&quot;, but in the (extremely rare) occasions where Linux or MacOS is involved, it&#x27;s attributed to that system. I don&#x27;t think that&#x27;s a coincidence, CMV.
评论 #24690417 未加载
评论 #24690363 未加载
评论 #24691531 未加载
评论 #24691239 未加载
评论 #24691926 未加载
评论 #24690995 未加载
评论 #24690843 未加载
评论 #24690451 未加载
maverickJover 4 years ago
Interesting. We might see an increase in the amount of wrongly reported data as a result of the global lockdown. One can only wonder what the consequences of data underreporting might be?<p>This reminds me of the below event which you can read the full story at <a href="https:&#x2F;&#x2F;leveragethoughts.substack.com&#x2F;p&#x2F;making-investment-decisions-based" rel="nofollow">https:&#x2F;&#x2F;leveragethoughts.substack.com&#x2F;p&#x2F;making-investment-de...</a><p>In 1976, the UK government, led by James Callaghan of the Labour, borrowed the sum of $3.9 billion from the International Monetary Fund. The granting of the loan was based on the condition that government fiscal deficit be slashed as a percentage of GDP.<p>A couple of years later, the chancellor of the exchequer at the time of the IMF loan said this below.<p>‘If we had had the right figures, we would never have needed to go for the loan”<p>That’s right!! The decision to borrow money from the IMF was based on wrong data. The public borrowing figures which prompted the UK government to seek IMF loan in 1976 were subsequently revised downwards sometime in the future.
评论 #24690671 未加载
jeffrallenover 4 years ago
640k deaths ought to be enough for any country.
bitlevelover 4 years ago
Excel is NOT a database. I don&#x27;t understand the people who use it as such (and I&#x27;ve come across many who do).<p>Then they&#x27;re surprised when it all goes tits-up.
评论 #24690791 未加载
评论 #24690578 未加载
评论 #24690691 未加载
social_quotientover 4 years ago
Wish we knew more details.<p>Maybe helpful for people that don’t know... Xls excel has a 64k row limit. Newer xlsx has 1mm row limit.<p>Or maybe they were putting people in by columns and hit the 16.3k column limit?
评论 #24686280 未加载
rahimnathwaniover 4 years ago
When I use Excel to open a CSV with too many rows, I get an alert saying &#x27;File not loaded completely&#x27;. I have to dismiss it by clicking &#x27;OK&#x27; before I can interact with the spreadsheet.<p>So Excel isn&#x27;t silently discarding data.
评论 #24691518 未加载
评论 #24691498 未加载
SV_BubbleTimeover 4 years ago
&gt;As a consequence, each template could handle only about 65,000 rows of data<p>Oh? Like 65,535 or so? That seemed weird a first that even as old as xls is that only 16bits were allocated to max rows.<p>But then maybe not. Each row might get an ID, so that&#x27;s 16bits * rows you have. I wonder if when XLS was designed they considered it very unlikely many people would have 500MB of db of empty rows and then would needed more data added to each one?<p>Simplified of course, I bet even original format had some explicit row identifier that could be shortened with optimizations and increment tags.
评论 #24690957 未加载
评论 #24690952 未加载
jp0dover 4 years ago
You&#x27;d be surprised to hear how many big organisations still use Excel for their reporting needs. I used to worked at a company with 35000 employees. Their payroll reports were generated using a MS Access database and reported using Excel. It was a massive manual job every week as they&#x27;d weekly, fortnightly and monthly payroll cycles! It&#x27;s very easy to lose files and&#x2F;or make an error in the calculations. But nobody bothered to change it.
评论 #24694295 未加载
评论 #24694941 未加载
stoobsover 4 years ago
WTF, they have a big Azure deal with Microsoft, why the hell would the be using Excel as the data source?<p>IT ineptitude of the highest order, although I&#x27;m not surprised having been involved with government IT previously.
评论 #24695783 未加载
nabla9over 4 years ago
The most consequential Excel error was The 2010 Reinhart-Rogoff error in the paper &quot;Growth in a Time of Debt&quot; <a href="https:&#x2F;&#x2F;www.nber.org&#x2F;papers&#x2F;w15639" rel="nofollow">https:&#x2F;&#x2F;www.nber.org&#x2F;papers&#x2F;w15639</a><p>The paper claimed that average real economic growth declines 0.1% when natonal debt rises to more than 90% of gross domestic product (GDP). When you correct the error it shows 2.2% average increase in economic growth.<p>Paul Ryan used it in the US for Republican budget proposal to cut spending and it was also used in EU to implement Austerity policy that hurt people.
评论 #24691956 未加载
Pick-A-Hill2019over 4 years ago
Fun semi-related story. Way back when, while I was squishing some Y2K bugs for a Saudi bank that catered to high net-worth individuals I was asked to look into why the AS400==&gt;&gt;db==&gt;&gt;Excel spreadsheet was &quot;showing something screwy every now and then&quot;.<p>Spoiler Alert: It was a SMALLINT error that was &#x27;patched&#x27; by someone previously (legacy code) that worked around the bug by rounding account balances to the Million (since it was a reporting function rather than an accounting function it was a &#x27;sort of ok&#x27; hack). The bug re-appeared as undefined behaviour when transaction and or account balances went in to the billions.
threatripperover 4 years ago
People keep putting the blame on Excel but it supports more columns than most implementations of SQL databases. MySQL for example supports only 4096 columns. SQLite defaults to 2000. That&#x27;s way less than Excel&#x27;s 16384.
评论 #24691486 未加载
评论 #24691301 未加载
评论 #24691793 未加载
pjmlpover 4 years ago
I have done a couple of gigs in life sciences and Excel is one of the big responsible for VB.NET to be still relevant.<p>I have met several researchers using Excel, not R, Python, Julia, nope plain old Excel, eventually with some VBA macros.<p>The more savvy ones, eventually ask IT for VB installation when they outgrown the VBA capabilities and carry on from there with either small Windows Forms based utilities or Office AddIns.<p>Any attempt to replace those sheets with proper applications has gotten plenty of push back until we basically offered enough Excel like features on the new applications.
评论 #24695115 未加载
ogigover 4 years ago
While I agree with the general sentiment that excel is normally misused, like in this case, it&#x27;s also capable of handling huge amounts of data thanks to powerquery and powerpivot (1,999,999,997 rows per table, 2 billion tables max, docs say). You can&#x27;t load the data in the grid, you need to use the data model. Using powerquery properly also solves most problems about excel guessing types wrongly when importing csv and other external data. If strongly recommend any excel user to learn powerquery and powerpivot.
dredmorbiusover 4 years ago
There&#x27;s a long history of spreadsheet errors and literature on same going back a few decades now.<p>In a business sim class in college a couple of decades back, I discovered that the Lotus spreadsheets (as I said: a couple of decades back) had a totalling error which double-counted individual row totals in the bottom line (everything was twice as profitable as the spreadsheet indicated).<p>At an early gig, one of the senior developers instituted a practice of code walkthroughs on projects (only a subset of them). One of these involved, you guessed it, a spreadsheet (we used a number of other development tools for much of our work), in this case Excel. Again, numerous errors which substantively changed the outcome of the analysis. One of the walkthrough leader&#x27;s observations was that you could replace all of the in-cell coding with a VBA macro making debugging far easier (all the code and data are separated and in one place each).<p>The particular analyst whose project this was: he insisted to the very end that this &quot;wasn&#x27;t a program&quot; and he &quot;wasn&#x27;t a programmer&quot; and that the walkthrough didn&#x27;t apply to his situation. Despite the errors found and corrections made.<p>At the time (mid 1990s) the walkthrough lead turned up a paper from a researcher in Hawaii on the topic. I&#x27;m not certain it was Raymond Panko, but his 2008 paper (a revise of a 1998 work) discusses the matter in depth:<p><a href="https:&#x2F;&#x2F;web.archive.org&#x2F;web&#x2F;20070617041554&#x2F;panko.shidler.hawaii.edu&#x2F;SSR&#x2F;Mypapers&#x2F;whatknow.htm" rel="nofollow">https:&#x2F;&#x2F;web.archive.org&#x2F;web&#x2F;20070617041554&#x2F;panko.shidler.haw...</a>
fortran77over 4 years ago
I wouldn’t blame Excel. The cause was gross incompetence.
smsm42over 4 years ago
Was just reading the chapter about using Excel as a database in this book two days ago: <a href="https:&#x2F;&#x2F;www.goodreads.com&#x2F;book&#x2F;show&#x2F;39074550-humble-pi" rel="nofollow">https:&#x2F;&#x2F;www.goodreads.com&#x2F;book&#x2F;show&#x2F;39074550-humble-pi</a><p>Apparently it&#x27;s super common, which fills me with horror. But these guys managed to take it to 11 by abusing it in yet another novel way.
评论 #24691493 未加载
hinkleyover 4 years ago
I hate solving problems and not knowing how I did it, so this particular experience has stuck in my craw.<p>Trying to develop a budget to pay off debts, my partner made this elaborate Excel spreadsheet and the output was that basically she had no spending money and I had very little, until one or both of us got a raise. It was far more austere than either of us were willing to go. So I started over using a different equation for &#x27;fairness&#x27;, and a different layout because something about her tables was just confusing and messy. When I was done, we had $300 a month of extra spending money between the two of us, despite using the same targets for pay-down, savings and bills.<p>I spent about 90 minutes poking at her spreadsheet and mine looking for the error and never did find it. I don&#x27;t know what the right solution is to this sort of problem, at least for non-developers. But if I was skeptical of Excel going in, I was doubly so after that. Especially for something that is going to be used to make decisions that will affect every day of your life.
评论 #24691356 未加载
评论 #24691288 未加载
评论 #24691201 未加载
评论 #24691360 未加载
评论 #24691661 未加载
评论 #24691720 未加载
评论 #24691701 未加载
评论 #24691686 未加载
评论 #24691341 未加载
BooneJSover 4 years ago
Spreadsheets like Excel or Google Docs, with the expandability offered by VB and JavaScript, allow for rapid development of applications that require data storage and data interpretation in one package. I’ve seen them used and abused for several functions over the course of my career.<p>Apple&#x2F;Claris FileMaker has had this niche for a bit, but it’s only been a niche,
mshookover 4 years ago
I call that the Microsoft Office Syndrome.<p>Tools such as Excel or PowerPoint are almost &quot;too good&quot; as in they allowed users to come up with usage which aren&#x27;t what the software was made for.<p>As others have said regarding Excel, it&#x27;s not so much what it&#x27;s capable of but what you do with it and how you do it.<p>I&#x27;ve seen people using Excel as a tool for trading, as an inventory database for a warehouse, as a shared database to track task progress, as a Gantt chart, as a calendar and so on.<p>PowerPoint is the same. Who hasn&#x27;t been given a PPT file as a manual for something?<p>&quot;Oh, it&#x27;s all in the powerpoint&quot;.<p>Or been given the same kind of file to familiarize yourself about something at your (new) workplace. Issue is you rarely have any of the material the speaker used to present the whole thing so you&#x27;re left wondering about the meaning of it all while reading bullet point lists...<p>No, a powerpoint is not a proper doc...<p>So in a way, I have to applaud MS as they did great with these but almost too great...
slykarover 4 years ago
&quot;But you wouldn&#x27;t use XLS. Nobody would start with that.&quot;<p>Oh. &quot;Every&quot; sales and business person probably would. Probably.
评论 #24692099 未加载
makomkover 4 years ago
The thing that puzzles me is, what on earth happened on the 28th? A really anomalously low number of cases were reported then, and it&#x27;s only got more weird after this fix added zero unreported new cases on that day and much higher numbers on the days before and after it. Something definitely doesn&#x27;t seem right here.
swrobelover 4 years ago
Reminds me of my days working in fiance. I was basically a broken record that just repeated: Excel is not a database
评论 #24694164 未加载
psahgalover 4 years ago
I would hope that any decent data engineer would know not to do this. Are there avenues for fully-employed engineers to help governments build these systems properly? I&#x27;d honestly be happy to just donate some of my free time to help these efforts.
chasd00over 4 years ago
interesting, i was on a conf call 3 hours ago that was an almost identical use case. Excel gets used because word comes to a director at a health department that says &quot;we need to collect testing results from 500 different facilities in a way their, and our, existing systems do not support&quot; and they&#x27;re given maybe 3 days to get it done along with the rest of their job. Putting together a technical team would take a week minimum let alone requirements gathering and analysis. So what do they do? They fire up Excel, add column headers that represent the data they need, and then send it to a Sr. Manager and tell them to make it happen.
BuyMyBitcoinsover 4 years ago
I love how Excel has become a de facto database - even though it absolutely shouldn’t be.
评论 #24690699 未加载
punnerudover 4 years ago
In Norway there was a $14 billion calculation error based on Excel, that resulted in opening of new oil fields in the Barents Sea: <a href="https:&#x2F;&#x2F;translate.googleusercontent.com&#x2F;translate_c?depth=1&amp;nv=1&amp;pto=aue&amp;rurl=translate.google.com&amp;sl=no&amp;sp=nmt4&amp;tl=en&amp;u=https:&#x2F;&#x2F;www.tu.no&#x2F;artikler&#x2F;oljemyndighetenes-excel-feil-ble-ikke-oppdaget-stortinget-apnet-barentshavet-sorost-med-regnefeil-pa-over-100-milliarder&#x2F;405367&amp;usg=ALkJrhggTdtBDU6vy0P4a632I-xPosf2zw" rel="nofollow">https:&#x2F;&#x2F;translate.googleusercontent.com&#x2F;translate_c?depth=1&amp;...</a>
robomartinover 4 years ago
Oh, Excel, how I hate to love you...<p>As is the case for many, Excel is one of my main tools. From financials to data gathering to electrical, mechanical or software engineering, Excel has always been there. It is fair to say I have made lots of money thanks to this tool.<p>And yet, every so often...<p>Many years ago a rounding error in a complex Excel tool we wrote to calculate coefficients for an FPGA-based polyphase FIR filter cost us a little over six months of debug time. I still remember the &quot;eureka!&quot; moment at two in the morning --while looking at the same data for the hundredth time in complete frustration-- when I realized we should have used &quot;ROUNDUP()&quot; rather than &quot;ROUND()&quot;.<p>Most recently, I was working with a client who chose to build a massive Excel sheet to gather a bunch of relevant data. The person doing the work seems to think they know what they are doing (conditional formatting and filtering don&#x27;t make you an expert). This poor spreadsheet has every color in the rainbow and a mess of formulas. It&#x27;s impossible for anyone but the guy who created it to touch it.<p>Here&#x27;s a hint:<p>Do not mix data with presentation. Where have we heard that before?<p>This is one of my pet peeves with Excel. If you need to gather a bunch of data, do it. Treat Excel like a database (apply normalization if you can!) and keep it with as little formatting as you possibly can. Then do all the formatting and calculations on a &quot;Presentation&quot; sheet or sheets. Just don&#x27;t pollute your database with formatting.<p>EDIT: Thinking about the UK problem, if they were working with a &quot;.xlsx&quot; file and accidentally saved it in &quot;.xls&quot; form, well, as they say, &quot;There&#x27;s a warning dialog for that&quot;.<p>What surprises me the most about these kinds of incidents is that people keep working on the same single document. In other words, no semblance at all of what the software business knows as version control.<p>Decades ago I adopted the idea that storage is cheap and always getting cheaper. If I am working on something critical, I <i>never</i> work more than one day without a backup. I make a copy and continue editing. In most cases I make a new copy every single day.
jimmcslimover 4 years ago
I&#x27;m wondering if there could be an opportunity for an open-source contract tracing system, similar to the allReady [1] application ? Maybe something backed by a graph database such as Neo4J [2]<p>Looking at the allReady repo, maybe it isn&#x27;t a great example since it hasn&#x27;t been touched in years...<p>[1] <a href="https:&#x2F;&#x2F;github.com&#x2F;HTBox&#x2F;allReady" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;HTBox&#x2F;allReady</a><p>[2] <a href="https:&#x2F;&#x2F;neo4j.com&#x2F;" rel="nofollow">https:&#x2F;&#x2F;neo4j.com&#x2F;</a>
alexbrowerover 4 years ago
This is clearly a case of someone not knowing the limits of a technology &#x2F; how to select or use a technology. Yes, more modern versions of Excel support many more rows than 65k. As a &quot;power&quot; excel user in the early 2000s, it&#x27;s tiring to see this same mistake made. And I might have blamed MSFT in 2008. But the &quot;modern&quot; world has taken an absurdly boorish response to this virus. This unfortunately looks like another example.
评论 #24695049 未加载
Turukawaover 4 years ago
I&#x27;ve been teaching data curation for open data managers in governments around the world for the last decade. All the issues identified in this BBC article are covered in my syllabus.<p>If anyone&#x27;s interested, here&#x27;s the openly-licenced syllabus, in English and French: <a href="https:&#x2F;&#x2F;github.com&#x2F;whythawk&#x2F;data-wrangling-and-validation" rel="nofollow">https:&#x2F;&#x2F;github.com&#x2F;whythawk&#x2F;data-wrangling-and-validation</a>
ksecover 4 years ago
I hope people finally understand ( At least I hope Microsoft Understand ) Their biggest asset isn&#x27;t actually Windows. But Excel.<p>Some on HN will definitely comes in with (x)Office Support MS Excel file as well etc.<p>There are <i>trillion</i> dollar worth of revenue relying on Excel. In the best case scenario, no one wants to work, rework, or even touch that Spreadsheet. Having 99% compatibility is not good enough.
adolphover 4 years ago
Excel is great but I think at a certain level folks should start using Python and Pandas. The challenge is understanding when and getting the right resources to the right people.<p><a href="https:&#x2F;&#x2F;training.talkpython.fm&#x2F;courses&#x2F;move-from-excel-to-python-and-pandas" rel="nofollow">https:&#x2F;&#x2F;training.talkpython.fm&#x2F;courses&#x2F;move-from-excel-to-py...</a>
评论 #24691637 未加载
outworlderover 4 years ago
Many comments already talked at length on use-cases for Excel, so I won&#x27;t go for a &#x27;me too&#x27;.<p>Instead, I just want to share this video from Joel Spolsky, aptly titled &quot;You Suck at Excel&quot;.<p>Turns out, most people do suck at Excel. Myself included.<p><a href="https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=0nbkaYsR94c" rel="nofollow">https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=0nbkaYsR94c</a>
ekianjoover 4 years ago
&gt; The problem is that PHE&#x27;s own developers picked an old file format to do this - known as XLS.<p>&gt; As a consequence, each template could handle only about 65,000 rows of data rather than the one million-plus rows that Excel is actually capable of.<p>Why would they even use XLS in the first place? CSV files have no such limitation, you can have CSV files that hold billions of records.
johndoe42377over 4 years ago
This is yet another, but subtle example of how abstract models are disconnected from reality<p>Another one is abysmal failure of <i>all</i> simulation based models to be even close and useful.<p><a href="https:&#x2F;&#x2F;karma-engineering.com&#x2F;lab&#x2F;blog&#x2F;Solved" rel="nofollow">https:&#x2F;&#x2F;karma-engineering.com&#x2F;lab&#x2F;blog&#x2F;Solved</a>
beefieldover 4 years ago
Excel has its place. And it is excellent tool for some workflows. However, I have started to think that Microsoft should disable the option to save workbooks. That would kind of force people to use Excel as a numerical scratchpad with fast charting what is where Excel excels.<p>(There is <i>some</i> sarcasm in this comment, but not that much)
StreamBrightover 4 years ago
I wrote an energy reporting system based on Excel. Based on that, I think Excel use falls into 2 categories<p>a, extremely simple reporting with few workbooks<p>b, serious misuse of technology<p>Billion dollar companies were born based on replacing Excel in workflows. The problem is not that they have an old version, the real problem is that they use such a system in the critical path.
Mvandenberghover 4 years ago
It&#x27;s unlikely this will have had much impact since most reported cases were only a few days late but come on guys, really?<p>Using an Excel sheet as a database? In week 1, that would be &quot;not great&quot; but could be accepted as being a fast solution that everyone could work with. This far into a pandemic I think we can expect a little more professionalism in data handling.
评论 #24686031 未加载
评论 #24686225 未加载
评论 #24686464 未加载
评论 #24686434 未加载
评论 #24686176 未加载
评论 #24686460 未加载
评论 #24686035 未加载
评论 #24686468 未加载
评论 #24687376 未加载
评论 #24687768 未加载
评论 #24686012 未加载
评论 #24690356 未加载
评论 #24686416 未加载
arnonover 4 years ago
For most companies, &quot;big data&quot; is when they can&#x27;t fit the data in a single Excel sheet anymore.
评论 #24690555 未加载
评论 #24691062 未加载
aaron695over 4 years ago
This will not be Excel, but a external export library.<p>I doubt the current Excel which MS spends billions on would not be opening data if the XLS is over the &#x27;size&#x27; limit but a valid format.<p>I guess MS hate is easier than thinking about software and logic and being a better programmer and stuff.
knownover 4 years ago
&quot;Developers picked an old file format to do this - known as XLS.<p>As a consequence, each template could handle only about 65,000 rows of data rather than the one million-plus rows that Excel is actually capable of.&quot;<p>&quot;Experience is the name everyone gives to their mistakes&quot; --Oscar
zmmmmmover 4 years ago
A lot of trashing on Excel but I kind of think this particular instance isn&#x27;t really the essence of the problems with Excel. Perfectly reasonable programmers get this wrong all the time (ever heard of buffer overflow anyone?), and the issue of small test data being unrepresentative of large test data is something everybody has to contend with. If there is a real problem with it I would say it is the presumably silent nature of the truncation which allowed the error to propagate.<p>The real horrors of Excel come from things like auto-conversion of column data (text, numbers to dates etc), off-by-one errors in copy and paste, overwriting forumlas in cells, sorting of columns that doesn&#x27;t capture all the rows, etc. These are all problems that are engineered into the user interface of Excel. It&#x27;s like putting a tripwire at the top of your stairs and just expecting people to step over it day in and day out. It&#x27;s basically inevitable somebody will fall down the stairs.
knownover 4 years ago
I use AWK to do floating point calculations and copy the results to Excel <a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=14018450" rel="nofollow">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=14018450</a>
mjbeswickover 4 years ago
Why would you even use excel for something so important, rather than importing the data in a database? The problem isn&#x27;t the version of Excel, but using a Spreadsheet in the first place!
andy_pppover 4 years ago
Maybe everyone in school should be given real world problems like this to do in Excel so people can get a feel for doing it more correctly? Rows === records and columns === fields being the first lesson...
评论 #24691296 未加载
walrus01over 4 years ago
If I had a dollar for every office worker that thinks Excel is a database...
wedesoftover 4 years ago
Worked at an optics company and they were spoiling lenses until someone discovered that Excel was rounding values to two digits after the comma (financial rounding).
krsdcblover 4 years ago
The actual title of the article is awfully clickbaity and misleading. As i understand the issue is not at all Microsofts fault, but NHE using an inadequate way of storing the data
bmlzootownover 4 years ago
I&#x27;m a bit of a novice when it comes to storing vast amounts of data, especially of this nature, but why would you use Excel rather than an actual database?
评论 #24698514 未加载
cmaover 4 years ago
Why does Excel still have such low row limits in 2020? When they increased it from 64K, did they decide to only go to a million to promote Azure for &quot;Big Data&quot;?
评论 #24690494 未加载
8bitsruleover 4 years ago
A DB system that doesn&#x27;t verify that each submitted record has been included in the DB (and scream if it hasn&#x27;t) ... is a DB cruising for a bruising.
评论 #24694995 未加载
segmondyover 4 years ago
I have seen this type of bug in production. After seeing it, I refuse to accept xls for data. CSV all day, whomever needs xlsx can transform it into that after.
masterprocover 4 years ago
Just dropping this here: In Apple Numbers you can work with tables that contain up to 1,000,000 rows (and 1,000 columns). :)
smsali97over 4 years ago
In these cases shouldnt Microsoft Excel raise a warning that data loss may occur if youre using the legacy file format?
评论 #24695401 未加载
Aeolunover 4 years ago
&gt; &quot;But you wouldn&#x27;t use XLS. Nobody would start with that.&quot;<p>Haha. This would be funny if it wasn’t so abjectly false.
mark-rover 4 years ago
Friends don&#x27;t let friends use Excel.<p>Unfortunately Excel is the great swiss army knife of software and it&#x27;s hard to avoid.
评论 #24690670 未加载
评论 #24691738 未加载
masterprocover 4 years ago
Just saying in Apple Numbers you can work with tables that contain up to 1,000,000 rows and 1,000 columns. :)
aminumuhammadover 4 years ago
&gt; Missing Covid-19 test data was caused by the ill-thought-out use of Excel Buhari today monninig
jjuhlover 4 years ago
Excel? <i>Really</i>? They are using that crap for nation wide reporting? Wow, mind blown. Epic fail.
评论 #24691876 未加载
galkkover 4 years ago
I read an article and I feel that this is a deflection of a guilt much more than Excel issue.
mr_toadover 4 years ago
We need to stop accepting data in Excel. Full stop. One day it’s going to kill someone.
Side-Stepover 4 years ago
English get what they buy with cheap IT slaves working on work visas.
karmakazeover 4 years ago
TL;DR [I found it hard to find which actual limit was hit.]<p>&gt; The problems are believed to have arisen when labs sent in their results using CSV files, which have no limits on size. But PHE then imported the results into Excel, where documents have a limit of just over a million lines.<p>&gt; The technical issue has now been resolved by splitting the Excel files into batches.
评论 #24690512 未加载
isolliover 4 years ago
Obligatory reminder: an Excel error was the cause for much unnecessary austerity after the 2008 financial crisis [0]<p>[0] <a href="https:&#x2F;&#x2F;www.nytimes.com&#x2F;2013&#x2F;04&#x2F;19&#x2F;opinion&#x2F;krugman-the-excel-depression.html" rel="nofollow">https:&#x2F;&#x2F;www.nytimes.com&#x2F;2013&#x2F;04&#x2F;19&#x2F;opinion&#x2F;krugman-the-excel...</a>
Exumaover 4 years ago
Probably excel parsing everything as a date LOL
rbanffyover 4 years ago
How much did the UK government pay for this?
curiousllamaover 4 years ago
Columns. They were using a new column for each new observation. The 16k number isn&#x27;t a coincidence - it&#x27;s the max # of columns in a spreadsheet.
qmmmurover 4 years ago
Absolute clowns running the show.
pjc50over 4 years ago
Professionalism has been entirely absent from the top-level handling of the pandemic. It&#x27;s not entirely clear from the article who&#x27;s responsible for this, but I note that the so-called &quot;NHS&quot; test and trace is in fact an outsourced organisation: <a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;NHS_Test_and_Trace" rel="nofollow">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;NHS_Test_and_Trace</a> run by Tory placewoman Dido Harding.<p>&gt; &quot;She is a former chief executive of the TalkTalk Group where she faced calls for her to resign after a cyber attack revealed the details of 4 million customers. A member of the Conservative Party, Harding is married to Conservative Party Member of Parliament John Penrose and is a friend of former Prime Minister David Cameron. Harding was appointed as a Member of the House of Lords by Cameron in 2014. She holds a board position at the Jockey Club, which is responsible for several major horse-racing events including the Cheltenham Festival. &quot;<p>&gt; &quot;In May 2020, Harding was appointed by Health Secretary Matt Hancock to head NHS Test and Trace, established to track and help prevent the spread of COVID-19 in England. In August 2020, after it was announced Public Health England was to be abolished, Harding was appointed interim chair of the new National Institute for Health Protection, an appointment that was criticised by health experts as she did not have a background in healthcare.&quot;
评论 #24686098 未加载
评论 #24686482 未加载
评论 #24686259 未加载
评论 #24686079 未加载
评论 #24690898 未加载
评论 #24686390 未加载
评论 #24690360 未加载
codeulikeover 4 years ago
The BBC updated the article with technical details, see about halfway down. Here&#x27;s the tech details:<p>(PHE = Public Health England)<p><i>Analysis by Leo Kelion, Technology desk editor -----------------------------------------------<p>The BBC has confirmed the missing Covid-19 test data was caused by the ill-thought-out use of Microsoft&#x27;s Excel software. Furthermore, PHE was to blame, rather than a third-party contractor.<p>The issue was caused by the way the agency brought together logs produced by the commercial firms paid to carry out swab tests for the virus.<p>They filed their results in the form of text-based lists, without issue.<p>PHE had set up an automatic process to pull this data together into Excel templates so that it could then be uploaded to a central system and made available to the NHS Test and Trace team as well as other government computer dashboards.<p>The problem is that the PHE developers picked an old file format to do this - known as XLS.<p>As a consequence, each template could handle only about 65,000 rows of data rather than the one million-plus rows that Excel is actually capable of.<p>And since each test result created several rows of data, in practice it meant that each template was limited to about 1,400 cases. When that total was reached, further cases were simply left off.<p>Until last week, there were not enough test results being generated by private labs for this to have been a problem - PHE is confident that test results were not previously missed because of this issue.<p>And in its defence, the agency would note that it caught most of the cases within a day or two of the records slipping through its net.<p>To handle the problem, PHE is now breaking down the data into smaller batches to create a larger number of Excel templates in order to make sure none hit their cap.<p>But insiders acknowledge that their current clunky system needs to be replaced by something more advanced that does not involve Excel.</i><p>edit: also longer version by Leo here <a href="https:&#x2F;&#x2F;www.bbc.co.uk&#x2F;news&#x2F;technology-54423988" rel="nofollow">https:&#x2F;&#x2F;www.bbc.co.uk&#x2F;news&#x2F;technology-54423988</a>
评论 #24691218 未加载
评论 #24691179 未加载
underdeserverover 4 years ago
The real, visceral tragedy that is COVID-19 notwithstanding, I think it brings a marvelous example of why statistics, the importance of accuracy in data collection, and the effects of exponential growth are a must-have in today&#x27;s highschool curriculums.
评论 #24686510 未加载
评论 #24693521 未加载
评论 #24686172 未加载
评论 #24688651 未加载
mellosoulsover 4 years ago
Actual title:<p><i>Covid: Test error &#x27;should never have happened&#x27; - Hancock</i><p>Please note this has been discussed quite a lot here today.<p>Eg.<p><a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=24685911" rel="nofollow">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=24685911</a><p><a href="https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=24685950" rel="nofollow">https:&#x2F;&#x2F;news.ycombinator.com&#x2F;item?id=24685950</a>
评论 #24690286 未加载
kalaidoover 4 years ago
That happens if you only test ill people: <a href="https:&#x2F;&#x2F;www.worldometers.info&#x2F;coronavirus&#x2F;country&#x2F;spain&#x2F;" rel="nofollow">https:&#x2F;&#x2F;www.worldometers.info&#x2F;coronavirus&#x2F;country&#x2F;spain&#x2F;</a>
OJFordover 4 years ago
Overblown, IMO - only the useless absolute figure is affected, so it just creates another reason it&#x27;s useless.<p>Unless positive result files are more likely to exceed the file size limit than negative ones, of course, which I haven&#x27;t heard suggested is the case.<p>Edit: Perhaps I should clarify that I&#x27;m not suggesting it isn&#x27;t a cockup. I just don&#x27;t see it as a massive scandal that renders the data useless - it just decreased the sample size. What particularly wound me up was a Radio 4 presenter this morning objecting to the guest (I missed the start, I&#x27;m not sure exactly who - a female public or civil servant) calling it a &#x27;glitch&#x27;. (&#x27;Really?! Really! 16000 missing cases is a glitch?!&#x27;) Well, yes. Glitches can have minor, severe, catastrophic, or no consequences.
评论 #24686055 未加载
评论 #24686371 未加载