When you have "format wars", the best idea is usually to have a converter program translate to the easiest-to-work-with format - <i>unless</i> this incurs a space explosion, as with some image/video formats.<p>With CSV-like data, bulk conversion from quoted-escaped RFC4180 CSV to a simpler-to-parse format is the best plan for several reasons. First, it may "catch on", help Microsoft/R/whoever embrace the format, and in doing so squash many bugs written by "data analyst/scientist coders". Second, in a shell "<i>a|b</i>" runs programs <i>a</i> and <i>b</i> in parallel on multi-core and allows things like <i>csv2x|head -n10000|b</i> or <i>popen("csv2x foo.csv")</i>. Third, bulk conversion to a random access file where literal delimiters cannot occur as non-delimiters makes file segmentation trivial, so processing can be up to nCores times faster (under often satisfied assumptions). Optional quoting was always going to be a PITA due to its non-locality: you cannot tell whether a given delimiter sits inside a quoted field without scanning from the start. What if there is no quote anywhere? Fourth, by using a program as the unit of modularity in this case, you make things programming language agnostic. Someone could go to town and write a pure SIMD/AVX512 converter in assembly even and solve the problem "once and for all" on a given CPU. The problem is actually just simple enough that this smells possible. There are some D tools for this bulk conversion in <a href="https://github.com/eBay/tsv-utils" rel="nofollow">https://github.com/eBay/tsv-utils</a> and a much smaller stand-alone Nim tool at <a href="https://github.com/c-blake/nio/blob/main/utils/c2tsv.nim" rel="nofollow">https://github.com/c-blake/nio/blob/main/utils/c2tsv.nim</a> .<p>I am unaware of any "document" that "standardizes" this escaped/lossless TSV format. { Maybe call it "DSV" for delimiter separated values where "delimiters actually separate"? Ironically redundant. ;-) } Someone want to write an RFC or point to one? 
Such a format can be just as "general/lossless" as quoted CSV (see <a href="https://news.ycombinator.com/item?id=31352170" rel="nofollow">https://news.ycombinator.com/item?id=31352170</a>).<p>Of course, if you are going to do a lot of processing against some data, it is even better to parse all the way down to binary so that you never have to parse again (well, unless you call CPUs loading registers "parsing"), which is what database systems have been doing since the 1960s.
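<p>To make the conversion and segmentation points concrete, here is a rough sketch using only the Python standard library. The backslash escape scheme and the names <i>csv_to_tsv</i>, <i>tsv_to_rows</i> and <i>segment</i> are my own assumptions for illustration; this is not a standard and not necessarily what the tools linked above do.

```python
import csv
import io

# Assumed escape scheme (hypothetical, not a published standard):
# backslash -> \\, tab -> \t, newline -> \n, carriage return -> \r.
# Backslash is escaped first so the mapping stays reversible.
_ESCAPES = (("\\", "\\\\"), ("\t", "\\t"), ("\n", "\\n"), ("\r", "\\r"))
_UNESCAPES = {"\\": "\\", "t": "\t", "n": "\n", "r": "\r"}


def escape_field(field):
    """Replace every literal delimiter inside a field with an escape."""
    for raw, escaped in _ESCAPES:
        field = field.replace(raw, escaped)
    return field


def unescape_field(field):
    """Invert escape_field; unknown escapes are passed through untouched."""
    out = []
    chars = iter(field)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, "")
            out.append(_UNESCAPES.get(nxt, "\\" + nxt))
        else:
            out.append(ch)
    return "".join(out)


def csv_to_tsv(src, dst):
    """Stream quoted CSV from src to escaped TSV on dst."""
    for row in csv.reader(src):
        if not row:  # csv.reader yields [] for blank lines; skip them
            continue
        dst.write("\t".join(map(escape_field, row)) + "\n")


def tsv_to_rows(text):
    """Parse escaped TSV: every raw newline and tab is a real delimiter."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final terminator
    return [[unescape_field(f) for f in line.split("\t")] for line in lines]


def segment(data, n):
    """Cut bytes into at most n pieces, each ending on a record boundary."""
    cuts = [0]
    for i in range(1, n):
        guess = max(len(data) * i // n, cuts[-1])
        nl = data.find(b"\n", guess)  # scan forward to the next newline
        cut = len(data) if nl < 0 else nl + 1
        if cut > cuts[-1]:
            cuts.append(cut)
    if cuts[-1] < len(data):
        cuts.append(len(data))
    return [data[a:b] for a, b in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    src = io.StringIO('id,note\r\n1,"a\tb,c\nd"\r\n', newline="")
    dst = io.StringIO()
    csv_to_tsv(src, dst)
    print(repr(dst.getvalue()))
```

<p>Because every raw newline in the output is a record boundary, <i>segment</i> can pick an arbitrary byte offset and just scan forward to the next newline, and each piece can then go to its own worker. With quoted CSV that cut is not safe, since the newline might sit inside a quoted field.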