How to fix CSV: make it even more U+1F4A9 PILE OF POO

26 points by paulfitz about 1 year ago

10 comments

akira2501 about 1 year ago
What if my data contains a new line? People focus on the comma then forget the newline is just as significant. That still needs to be escaped and we're right back where we started.

Meanwhile, RFC 4180 takes less time to read than this entire article.
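To illustrate the point about newlines, here is a minimal sketch using Python's standard `csv` module, which follows RFC 4180-style quoting by default. The sample rows and the helper names `to_csv` and `from_csv` are made up for this example; the idea is only that a field containing a comma, a newline, or a double quote gets wrapped in quotes and survives a round trip.

```python
import csv
import io


def to_csv(rows):
    # newline="" stops the text layer from translating line endings,
    # so the csv module controls them itself.
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def from_csv(text):
    return list(csv.reader(io.StringIO(text, newline="")))


# Hypothetical data: one field holds a newline and a comma, another holds quotes.
rows = [
    ["id", "note"],
    ["1", "line one\nline two, with a comma"],
    ["2", 'say "hi"'],
]

encoded = to_csv(rows)
decoded = from_csv(encoded)
assert decoded == rows
```

The embedded newline stays inside a quoted field, so a reader that splits naively on line breaks would still get this wrong; only a parser that tracks quote state reads it back correctly.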
kristopolous about 1 year ago
If we are willing to throw away the comma, use the ASCII RS, record separator symbol. It's exactly what you want, even has a visual ␞ these days.

It's a problem solved decades ago with solutions we've failed to adopt. Weird, buggy, poorly parsable CSV is still somehow the norm.

Not saying you should, but if you want to change, the answer is already there. Change has to start somewhere...
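A rough sketch of what separator-based encoding could look like, assuming the ASCII unit separator (0x1F) between fields and the record separator (0x1E) between records. That pairing, and the function names `dumps` and `loads`, are choices made for this example rather than anything specified in the thread. Note that the format has no escaping at all, so the sketch rejects fields that contain either control character.

```python
US = "\x1f"  # ASCII unit separator, used here between fields (assumption)
RS = "\x1e"  # ASCII record separator, used here between records


def dumps(rows):
    """Join rows of string fields using ASCII separators."""
    for row in rows:
        for field in row:
            if US in field or RS in field:
                # No escaping mechanism exists in this sketch, so refuse the data.
                raise ValueError("field contains a separator character")
    return RS.join(US.join(row) for row in rows)


def loads(text):
    """Split text produced by dumps() back into rows."""
    if not text:
        return []
    return [record.split(US) for record in text.split(RS)]


# Commas, quotes, and newlines need no special handling.
table = [["name", "note"], ["Ann", 'said "hi", then left'], ["Bob", "two\nlines"]]
assert loads(dumps(table)) == table
```

The trade-off is that the problem moves rather than disappears: data that legitimately contains 0x1E or 0x1F still needs some escaping rule, and the characters are awkward to type or view in most editors.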
ogoffart about 1 year ago
Not long ago there was also a post about "Unicode Separated Value": https://github.com/sixarm/usv https://news.ycombinator.com/item?id=39679378
verandaguy about 1 year ago
Technically editorialized title (since the original article just uses the emoji verbatim), but I think this is a net improvement.
dbt00 about 1 year ago
0x1d and 0x1e in the ASCII standard exist for exactly this reason and don't need more than one byte, unlike this goofy thing.
gwbas1c about 1 year ago
I don't think CSV can ever be "fixed." It's popular because there is always someone naive enough to think that it works, and ignorant of specs that handle corner cases.
gwbas1c about 1 year ago
I think we should name the files ".cso"

CSO is a stormwater industry term for "Combined Sewer Overflow." They happen in older cities where storm runoff and raw sewage (poop) go into the same sewer system. When there is a lot of rain, the wastewater treatment plants overflow, and then raw sewage runs into waterways.

https://en.wikipedia.org/wiki/Combined_sewer#Combined_sewer_overflows_(CSOs)
dwheeler about 1 year ago
No! The poop symbol is used in data, and thus is a terrible separator. If you have to quote it anyway, use commas, as that is already in use. Or use "Unicode" separated values.
ghusto about 1 year ago
Perhaps naive, but we escape with \ everywhere else, so why not here?

If you're typing in CSV manually, escape with \

If you're exporting to CSV, the program already knows which part is data and which part is the next cell, so again the program can escape with \
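The backslash idea can be sketched in a few lines. This is a hand-rolled illustration, not RFC 4180 and not the behavior of any existing CSV library; the function names and the choice to write an embedded newline as a backslash followed by `n` are assumptions made for the example.

```python
def escape_field(field):
    # Escape the backslash first so the escapes added afterwards are not doubled.
    return (
        field.replace("\\", "\\\\")
        .replace(",", "\\,")
        .replace("\n", "\\n")
    )


def encode_row(fields):
    return ",".join(escape_field(f) for f in fields)


def decode_row(line):
    fields, current = [], []
    chars = iter(line)
    for ch in chars:
        if ch == "\\":
            # Take the next character literally, except "n", which means newline.
            nxt = next(chars, "")
            current.append("\n" if nxt == "n" else nxt)
        elif ch == ",":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


# Hypothetical row with a comma, a backslash, a newline, and an empty field.
row = ["a,b", "back\\slash", "two\nlines", ""]
line = encode_row(row)
assert decode_row(line) == row
```

Because newlines are escaped too, every record stays on one physical line, which means the file can be split on line breaks before any field parsing happens.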
refulgentis about 1 year ago
https://webcache.googleusercontent.com/search?q=cache:https://www.getgrist.com/blog/how-to-fix-csv-make-it-even-more-%25F0%259F%2592%25A9/

n.b. not worth your time. tl;dr: let's replace the comma with the poop emoji because commas occur in data.

There's already a solution to that (obviously). Best argument a contrarian could make is you "learn about unicode", by which they'd mean, the words "basic multilingual plane" are included at one point.