The real problem is of course using Excel to maintain data. That's not what it's for. That's what databases are for. Excel is for complex calculations over rows and columns. And of course it should be better at data validation; you should be able to tell it whether a certain column is a date or not, and Excel shouldn't be jumping to stupid conclusions about it, but it does, and that means you need to be careful and not use Excel for something like that.<p>Excel misuse is sadly rampant and one of my main frustrations at my previous project. Excel is popular for this sort of misuse because when you open it, it presents you with an empty table and you can immediately start typing. This invites tabular data. But Excel inevitably fucks up because it lacks proper data types, data validation, or foreign keys, and loves making assumptions about what you mean. This makes it a terrible and harmful choice for any sort of serious data. I even proposed a new project for an Excel-like frontend backed by a database exactly for these sort of situations. Because I do get what attracts people to Excel for this sort of thing. And most people don't realise what a terrible choice it is.
PSA: if you use Excel as intended, you would just go File/New/Blank Worksheet, switch to the Data tab, then use the 'From Text/CSV' wizard to specify the data types for each column in your data source, neatly bypassing this entire issue.<p>But yes, just double-clicking the CSV from Explorer, using the legacy 'open this as a sheet' functionality, experiencing data loss and then complaining about it (and the state of Excel, MSFT and The World in general) on social media is much, much more fun...
> Microsoft Excel may be fleeting, but human genes will be around for as long as we are. It’s best to give them names that work.<p>This is the crux of the issue. Yes, Excel could do better with support for CSV. (OpenOffice and LibreOffice habdle this better, for example.) But no, genetics is not some niche use-case for this better support. It's just one example of many. I do analytics of another sort for a bank, and we run into this problem all the time with those pesky 16-digit account numbers.<p>But, while Microsoft might do well to make changes to address their poor implementation, someone somewhere will implement the next best thing and screw it up, too.
I love it when excel decides a bunch of digits is a date or decides to remove leading 0000s from a very important number that needs 0000s. Or maybe it wants to make a bunch of digits into scientific notation. I would prefer if excel left my cells alone or just asked what I wanted to do.
Part of the problem, highlighted in the article: Users don't set the data type of the column off of "General". So it tries to figure out the data type on entry. Setting the column to use a specific type ("Text"), keeps it from trying to use heuristics to figure out that something is a date.<p>Something more recent is the introduction into Excel of Power Query, which lets you import a CSV and apply arbitrary transformations (such as applying a type), before it hits the workbook, so if you need to pull in a CSV, you can do so, and it will always be imported the same way.
At the time (506 comments): <a href="https://news.ycombinator.com/item?id=24070385" rel="nofollow">https://news.ycombinator.com/item?id=24070385</a>
> Be conservative in what you send, be liberal in what you accept.<p>This is why I think that's wrong: if you allow people to be sloppy with how they do things, they'll do it, and then make it part of their workflow, product, religion or whatever, and now everyone is stuck with it.<p>Be absolutely explicit with what you accept and refuse to deal with crap. Then you will only ever have to maintain a simple validator and the code that deals with good data, rather than having to have an incredibly hairly validator that leaks into your logic at every level, followed by cementing your bugs into everyone's implementations.
Good initiative, but there are hundreds of used languages, I hope they put a bit more effort into it than just testing the English dates cases. I caught one bad example, for any French speakers, in the article itself:<p>and MARS tweaked to MARS1.
Looks more like the scientists are choosing the wrong tool to do their job. Sure you can use excel, to get a sense of the data, but for god sake, there are better tools for that - or even contribute to an existing open source one to do what you need. I'm quite sure scientists could get some grants to hire actual software engineers to come up with a better solution, based on the open source options existing there.
Reality adapting to software (instead of the other way around) seems to be more and more common. Just today, my local library sent out email informing users that, while they agree that they shouldn't charge late fees for non-work days, the software company they use doesn't want to introduce such feature into the software, so they have no way but to charge for those days.
Does anyone here use Excel in any capacity on production data in the biosciences? If so, did you go to college for the biosciences, or are you supplying work using a tool you learned under another discipline?<p>I'm always fascinated to learn how Excel makes its way into unusual places.
>Even then, a scientist might fix their data but export it as a CSV file without saving the formatting.<p>Is this supposed to be a joke? Since when does excel support CSV files? Yes you can import and export CSV files but that is just there to check a box. That feature doesn't actually work. Just import and export .xlsx files in your applications directly.<p>CSV is such a bad format because it's not even a standard, there is RFC4180 but most people have never heard of it. CSV is complicated enough that anyone who thinks they can implement it will get it wrong on their first attempt but simple enough that people believe they can implement it on their first attempt.