科技回声
TPSV, an Alternative to TSV (and CSV)

41 points by ctenb, 17 days ago

15 comments

ilyagr, 17 days ago
It's a clever format, especially if the focus is on machines generating it and humans or machines reading it. It might even work for humans occasionally making minor edits without having to load the file in a spreadsheet.

I think it can encode anything, including newlines and even newlines followed by `\` (with the newline extension, of course), except for something matching the regex `(\t+\|)+` at the end of cells (*Update:* Maybe `\n?(\t+\|)+`, but that doesn't change my point much).

For a cell containing `cell<newline>\`, you'd have:

    |cell<tab>| \\<tab >|

(where `<tab >` represents a single tab character regardless of the number of spaces)

Moreover, if you really needed it, you could add another extension to specify tabs or pipes at the end of cells. For a POC, two cells with contents `a<tab>|` and `b<tab>|` could be represented as:

    |a<tab ><tab>|b ~tab pipe<tab>|tab pipe

(with literal words "tab" and "pipe"). Something nicer might also be possible.

*Update:* Though, if the focus is on humans reading it, it might also make sense to allow a single row of the table to wrap and span multiple lines in the file, perhaps as another extension.
aidenn0, 17 days ago
If only ASCII had a field separator character, then we could just use that instead.
karmakaze, 17 days ago
Is there a text format like TSV/CSV that can represent nested/repeating sub-structures?

We have YAML but it's too complex. JSON is rather verbose with all the repeated keys and quoting, XML even more so. I'd also like to see a 'schema tree' corresponding to a header row in TSV/CSV. I'd even be fine with a binary format with standard decoding to see the plain-text contents. Something for XML like what MessagePack does for JSON would work, since we already have schema specifications.
TheTaytay, 17 days ago
I have been using TSV a LOT lately for batch inputs and outputs for LLMs. Imagine categorizing 100 items. Give it a 100-row TSV with an empty category column, and have it emit a 100-row TSV with the category column filled in.

It has some nice properties: 1) it’s many fewer tokens than JSON. 2) it’s easier to edit prompts and examples in something like Google Sheets, where the default format of a copied group of cells is TSV. 3) have I mentioned how many fewer tokens it is? It’s faster, cheaper, and less brittle than a format that requires the redefinition of every column name for every row.

Obviously this breaks down for nested object hierarchies or other data that is not easily represented as a 2D table, but otherwise we’ve been quite happy. I think this format solves some other things I’ve wanted, including header comments, inline comments, better alignment, and markdown support.
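A rough Python sketch of the batch workflow described above, using only the standard library. The item texts and the column names (`id`, `text`, `category`) are made up for illustration, and character counts are only a loose proxy for token counts.

```python
import csv
import io
import json

# Hypothetical items to categorize; the column names are invented for this example.
items = [
    {"id": "1", "text": "Refund not received", "category": ""},
    {"id": "2", "text": "App crashes on login", "category": ""},
    {"id": "3", "text": "How do I change my plan?", "category": ""},
]


def to_tsv(rows):
    """Serialize rows as TSV: the header appears once, then one line per row."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def from_tsv(text):
    """Parse TSV text back into a list of dicts keyed by the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter="\t")]


tsv_text = to_tsv(items)
json_text = json.dumps(items)

# JSON repeats every key on every row; TSV states the column names once.
print(len(tsv_text), len(json_text))
```

The same `from_tsv` helper can parse the model's reply, provided the reply keeps the header row and the tab delimiters intact.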
Rhapso, 17 days ago
The poor delimiter special characters in the ASCII table never get any love.
Hackbraten, 17 days ago
Good on you to leverage EditorConfig settings. Almost every modern IDE or editor supports it either out of the box or with a plug-in.
DrillShopper, 17 days ago
Or we could use the actual characters for this purpose - the FS (file separator), GS (group separator), RS (record separator), and US (unit separator).

ASCII (and through it, Unicode) has these values specifically for this purpose.
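For illustration, a minimal Python sketch of a table encoded with the ASCII unit separator (0x1F) between fields and the record separator (0x1E) between records. The `encode` and `decode` helpers are hypothetical, and the choice to reject fields that contain a separator (rather than escape them) is an assumption of this sketch.

```python
US = "\x1f"  # ASCII unit separator: placed between fields
RS = "\x1e"  # ASCII record separator: placed between records


def encode(rows):
    """Join fields with US and records with RS; no quoting or escaping is applied."""
    for row in rows:
        for field in row:
            if US in field or RS in field:
                # Assumption of this sketch: separators inside data are rejected.
                raise ValueError("field contains a separator character")
    return RS.join(US.join(row) for row in rows)


def decode(text):
    """Split on RS, then on US. Tabs, commas and newlines pass through untouched."""
    return [record.split(US) for record in text.split(RS)]


rows = [
    ["name", "note"],
    ["Ada", "likes\ttabs, commas\nand newlines"],
]
encoded = encode(rows)
print(decode(encoded) == rows)
```

Tabs, commas and newlines need no quoting here, but the result is hard to read or type in most text editors, since the separators are non-printing control characters.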
helix278, 17 days ago
I like that there is plenty of room for comments, and the multiline extension is also cool. The backslash almost looks like what I would write on paper if I wanted to sneak something into the previous line :)
montroser, 17 days ago
This is pretty under-specified...

> A cell starts with | and ends with one or more tabs.

    |one\t|two|three

How many cells is this? Seems like just one, with garbage at the end, since there are no closing tabs after the first cell? Should this line count as a valid row?

> A line that starts with a cell is a row. Any other lines are ignored.

Well, I guess it counts. Either way, how should one encode a value containing a tab followed by a pipe?
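To make the question concrete, here is a minimal Python sketch of the most literal reading of the two quoted sentences. It is an assumption about how a strict parser might behave, not TPSV's reference implementation; `CELL` and `parse_row` are hypothetical names.

```python
import re

# Literal reading: a cell is a pipe, then content without tabs, then one or more tabs.
CELL = re.compile(r"\|([^\t]*)\t+")


def parse_row(line):
    """Return (cells, leftover), or None if the line does not start with a cell."""
    cells = []
    pos = 0
    while True:
        match = CELL.match(line, pos)
        if match is None:
            break
        cells.append(match.group(1))
        pos = match.end()
    if not cells:
        return None  # not a row under this reading, so the line is ignored
    return cells, line[pos:]


print(parse_row("|one\t|two|three"))
print(parse_row("# just a comment"))
```

Under this reading the example line is a row with the single cell `one`, and `|two|three` is left over as unparsed text. A value containing a tab followed by a pipe cannot be expressed, because the first tab always ends the cell.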
imtringued, 17 days ago
The problem with CSV is that there are too many variations and that quoting is still a mess. The vast majority of people on this planet do not know CSV, so they invent a new ad hoc format on the fly and falsely call it CSV.

TPSV solves none of that and makes things worse.
Hashex129542, 17 days ago
We need binary formats. In this era we are capable of it. Throw away the text formats.
stevage, 17 days ago
I hate this kind of format. It's trying to be both a data format for computers and a display format for humans. Much better off just using a tool that can edit CSV files as tables.

Also it doesn't seem to say anything about the header row?
CJefferson, 17 days ago
Honestly at this point my favorite format is JSONLines (one JSON object per line).

It instinctively feels horrible, but it’s easy to create and parse in basically every language, easy to fully specify, recovers well from one broken line in large datasets, chops up and concatenates easily.
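A small Python sketch of the properties listed above: one JSON object per line, recovery from a single broken line, and concatenation by plain string joining. The helper names `dump_jsonl` and `load_jsonl` and the sample records are hypothetical.

```python
import json


def dump_jsonl(records):
    """Write one JSON object per line, each terminated by a newline."""
    return "".join(json.dumps(record) + "\n" for record in records)


def load_jsonl(text):
    """Parse line by line; collect bad line numbers instead of failing the whole file."""
    good, bad = [], []
    for lineno, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # skip blank lines, including the one after the final newline
        try:
            good.append(json.loads(line))
        except json.JSONDecodeError:
            bad.append(lineno)
    return good, bad


part_a = dump_jsonl([{"id": 1, "tag": "a"}, {"id": 2, "tag": "b"}])
part_b = dump_jsonl([{"id": 3, "tag": "c"}])

# Concatenating two files is plain string concatenation.
combined = part_a + part_b

# Corrupt the second line only; the other lines still parse.
lines = combined.split("\n")
lines[1] = lines[1][:-3]
records, bad_lines = load_jsonl("\n".join(lines))
print(records, bad_lines)
```

Because every line stands alone, a large file can also be split at any newline and the pieces processed independently.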
bvrmn, 17 days ago
According to the spec, it's nearly impossible to correctly edit files in this format by hand.
AstroJetson, 17 days ago
> A row with too many cells has the superfluous cells ignored.

Ummm, how do you figure out which row has too many cells? Can all the rows before this one have too few cells?