XSV – A fast CSV toolkit in Rust

101 points by mseri, over 10 years ago

7 comments

burntsushi, over 10 years ago
Author here. I was really hoping to get binaries for Windows/Mac/Linux available before sharing it with others, but clearly I snoozed. I do have them available for Linux though, so you don't have to install Rust in order to try xsv: https://github.com/BurntSushi/xsv/releases

Otherwise, you could try using rustle[1], which should install `xsv` in one command (but it downloads Rust and compiles everything for you).

While I have your attention, if I had to pick one of the cooler features of xsv, I'd tell you about `xsv index`. It's a command that creates a very simple index that permits random access to your CSV data. This makes a lot of operations pretty fast. For example:

    xsv index worldcitiespop.csv  # ~1.5s for 145MB
    xsv slice -i 500000 worldcitiespop.csv | xsv table  # instant, plus elastic tab stops for good measure

That second command doesn't have to chug through the first 499,999 records to get the 500,000th record.

This can make other commands faster too, like random sampling and statistic gathering. (Parallelism is used when possible!)

Finally, have you ever seen a CLI app QuickCheck'd? Yes. It's awesome! :-) https://github.com/BurntSushi/xsv/blob/master/tests/test_sort.rs

[1] - https://github.com/brson/rustle
dbro, over 10 years ago
Here's another suggestion for the criticism section (which is a good idea for any open-minded project to include):

Instead of using a separate set of tools to work with CSV data, use an adapter to allow existing tools to work around CSV's quirky quoting methods.

csvquote (https://github.com/dbro/csvquote) enables the regular UNIX command line text toolset (like cut, wc, awk, etc.) to work properly with CSV data.
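A rough sketch of the adapter idea, assuming it works by temporarily swapping the commas and newlines inside quoted fields for nonprinting characters, so that line-oriented and comma-oriented tools see a regular layout. The function names and the choice of substitute characters are assumptions for illustration; this is Python, not csvquote's own code.

```python
# Hypothetical stand-in characters; any bytes that never occur in the data would do.
FIELD_SUB = "\x1f"   # replaces a comma inside a quoted field
RECORD_SUB = "\x1e"  # replaces a newline inside a quoted field


def sanitize(text):
    """Hide delimiters that appear inside quoted fields."""
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes and ch == ",":
            out.append(FIELD_SUB)
        elif in_quotes and ch == "\n":
            out.append(RECORD_SUB)
        else:
            out.append(ch)
    return "".join(out)


def restore(text):
    """Undo sanitize() after the ordinary text tools have run."""
    return text.replace(FIELD_SUB, ",").replace(RECORD_SUB, "\n")


raw = 'id,name\n1,"Smith, John"\n2,"multi\nline"\n'
safe = sanitize(raw)
# Every physical line is now one record, and every comma is a field boundary.
names = [line.split(",")[1] for line in safe.rstrip("\n").split("\n")]
restored_names = [restore(name) for name in names]
```

The middle step, here a plain `split`, stands in for tools such as cut or awk, which can treat the sanitized text as simple delimited lines.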
tbrownaw, over 10 years ago
From the "criticisms" section: "You shouldn't be working with CSV data because CSV is a terrible format."

Er, what's wrong with it? Or is this a case of people using it for things other than what it's meant for? Is there a better format for sending data between different companies using different enterprisey database systems?

My complaint about CSV is that people frequently generate it manually and don't understand how to quote text fields, so they don't double any quote characters that are part of the data. Which means I have to spend time cleaning up malformed files.
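The quoting rule in question is short enough to sketch. The example below, in Python, wraps a field in double quotes when it contains a comma, a quote, or a line break, and doubles any quote characters inside it. `quote_field` is a hypothetical helper written for illustration; in practice a CSV library such as Python's `csv` module applies this rule for you.

```python
import csv
import io


def quote_field(value):
    """Quote one CSV field, doubling any embedded double quotes."""
    if any(ch in value for ch in ',"\r\n'):
        return '"' + value.replace('"', '""') + '"'
    return value


row = ['He said "hi"', "a,b", "plain"]

# Hand-rolled line using the rule above.
manual_line = ",".join(quote_field(field) for field in row) + "\r\n"

# The same row written by the standard library.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
library_line = buffer.getvalue()
```

Both lines come out identical, which is the point: generating CSV through a library avoids the malformed files described above.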
101914, over 10 years ago
Did you try benchmarking against kdb+?

Seems like there are always HN commenters lambasting CSV. I am sure they have very good reasons.

But, as for me, CSV is one of my favorite formats. (Sort of like how people like XML or JSON I guess.) I like the limitations of CSV because I like simple, raw data.

I wish the de facto format that www servers delivered was CSV instead of HTML (for reason why, see below). Or at least I wish there was an option to receive pages in CSV in addition to HTML.

Users could create their own markup, client side. Users could effectively use their "spreadsheet software" to read the information on the www. Or they could create infinitely creative presentations of data for themselves or others using HTML5 or some other tool of personal expression.

It is easy to create HTML from CSV but I find it is a nuisance creating CSV from HTML.

Because I have a need for CSV I write scanners with flex to convert HTML to CSV.

I often wonder why I cannot access all the data I need from the www in CSV format. Many have agreed over the years that the www needs more structure to be more valuable as a data source. If data is first created in CSV, then you have some inherent structure to build on; you can _use it_ to create markup and add infinite creativity without destroying the underlying structure.

If data (cf. art or forms of personal expression) cannot be presented in CSV then is it really raw data or is it something else, more subjective and unwieldy?

Whatever. Back to reality. Pay no mind.
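As a small illustration of the "easy direction", here is a Python sketch that turns CSV text into an HTML table. The function name and the choice to treat the first row as a header are assumptions made for the example.

```python
import csv
import html
import io


def csv_to_html_table(csv_text):
    """Render CSV text as an HTML table, treating the first row as headers."""
    rows = csv.reader(io.StringIO(csv_text, newline=""))
    lines = ["<table>"]
    for i, row in enumerate(rows):
        tag = "th" if i == 0 else "td"
        # Escape each cell so data containing <, > or & cannot break the markup.
        cells = "".join(f"<{tag}>{html.escape(cell)}</{tag}>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


table = csv_to_html_table('name,score\n"A & B",1\n')
```

Going the other way means recovering rows and columns from arbitrary markup, which is why it tends to need a hand-written scanner like the flex ones mentioned above.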
btown, over 10 years ago
If you need to do an indexing step anyways, why not simply import the data into a SQL database, or build this as a wrapper that introspects the CSV file, builds a database schema, and does the import for you? Is the issue limited scratch space?
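A minimal sketch of the wrapper described here, using Python's built-in `sqlite3` module: read the header row, create a table with one column per header, and bulk-insert the remaining rows. The function name is hypothetical, and declaring every column as TEXT is a simplifying assumption; a fuller version would inspect the values to pick column types.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_text):
    """Create `table` from the CSV header and insert all remaining rows."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    header = next(reader)
    # Quote identifiers, doubling embedded quotes, so odd column names survive.
    columns = ", ".join(
        '"{}" TEXT'.format(name.replace('"', '""')) for name in header
    )
    safe_table = table.replace('"', '""')
    conn.execute(f'CREATE TABLE "{safe_table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(
        f'INSERT INTO "{safe_table}" VALUES ({placeholders})', reader
    )


conn = sqlite3.connect(":memory:")
load_csv(conn, "cities", "city,population\nOslo,700000\nParis,2100000\n")
big_cities = conn.execute(
    "SELECT city FROM cities WHERE CAST(population AS INTEGER) > 1000000"
).fetchall()
```

The trade-off raised in the question still applies: the import copies the data into the database, whereas an offset index only stores positions alongside the original file.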
userbinator, over 10 years ago
Looks like it's based on this CSV parser:

https://github.com/BurntSushi/rust-csv

and it claims to be RFC 4180-compliant, which is a good thing.
brazzledazzle, over 10 years ago
This is one of the things I really love about PowerShell. Import, manipulation and export of formatted raw data like CSV is dead simple.