科技回声

7 comments

burntsushi, over 10 years ago

Author here. I was really hoping to get binaries for Windows/Mac/Linux available before sharing it with others, but clearly I snoozed. I do have them available for Linux though, so you don't have to install Rust in order to try xsv: https://github.com/BurntSushi/xsv/releases

Otherwise, you could try using rustle[1], which should install `xsv` in one command (but it downloads Rust and compiles everything for you).

While I have your attention, if I had to pick one of the cooler features of xsv, I'd tell you about `xsv index`. It's a command that creates a very simple index that permits random access to your CSV data. This makes a lot of operations pretty fast. For example:

    xsv index worldcitiespop.csv  # ~1.5s for 145MB
    xsv slice -i 500000 worldcitiespop.csv | xsv table  # instant, plus elastic tab stops for good measure

That second command doesn't have to chug through the first 499,999 records to get the 500,000th record.

This can make other commands faster too, like random sampling and statistic gathering. (Parallelism is used when possible!)

Finally, have you ever seen a CLI app QuickCheck'd? Yes. It's awesome! :-) https://github.com/BurntSushi/xsv/blob/master/tests/test_sort.rs

[1] - https://github.com/brson/rustle
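
The random-access idea behind an index like this can be sketched in a few lines of Python. This is only an illustration of the general technique (recording the byte offset where each record starts); the thread does not describe xsv's actual index format, and the names `build_index` and `read_record` are hypothetical.

```python
import csv
import io


def build_index(data: bytes) -> list[int]:
    """Return the byte offset at which each CSV record starts.

    Assumes RFC 4180 style quoting: a newline inside a quoted field does
    not end the record. An escaped quote ("") toggles the state twice,
    so it leaves the quoted/unquoted state unchanged.
    """
    offsets = [0] if data else []
    in_quotes = False
    for pos, byte in enumerate(data):
        if byte == 0x22:  # double quote
            in_quotes = not in_quotes
        elif byte == 0x0A and not in_quotes and pos + 1 < len(data):
            offsets.append(pos + 1)  # next record starts after this newline
    return offsets


def read_record(data: bytes, offsets: list[int], i: int) -> list[str]:
    """Parse record i by slicing directly to its offset, skipping the rest."""
    start = offsets[i]
    end = offsets[i + 1] if i + 1 < len(offsets) else len(data)
    text = data[start:end].decode("utf-8")
    return next(csv.reader(io.StringIO(text, newline="")))
```

With a file on disk, the slice in `read_record` would become a `seek` to `offsets[i]` followed by a read of `end - start` bytes, which is why fetching the 500,000th record does not require parsing the records before it.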

dbro, over 10 years ago

Here's another suggestion for the criticism section (which is a good idea for any open-minded project to include):

Instead of using a separate set of tools to work with CSV data, use an adapter to allow existing tools to work around CSV's quirky quoting methods.

csvquote (https://github.com/dbro/csvquote) enables the regular UNIX command line text toolset (like cut, wc, awk, etc.) to work properly with CSV data.
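
One way such an adapter could work is to temporarily replace the commas and newlines that appear inside quoted fields with non-printing characters, run the ordinary line-oriented tools, and then restore the originals. The sketch below shows that idea in Python; the sentinel characters and the function names `sanitize` and `restore` are assumptions made for illustration, not a description of csvquote's code.

```python
# Hypothetical sentinels: ASCII unit separator and record separator.
FIELD_SEP = "\x1f"
RECORD_SEP = "\x1e"


def sanitize(text: str) -> str:
    """Replace commas and newlines inside quoted fields with sentinels."""
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes  # "" toggles twice, net no change
            out.append(ch)
        elif in_quotes and ch == ",":
            out.append(FIELD_SEP)
        elif in_quotes and ch == "\n":
            out.append(RECORD_SEP)
        else:
            out.append(ch)
    return "".join(out)


def restore(text: str) -> str:
    """Undo sanitize() after the line-oriented tools have run."""
    return text.replace(FIELD_SEP, ",").replace(RECORD_SEP, "\n")
```

After `sanitize`, every remaining comma is a field delimiter and every remaining newline ends a record, so splitting on them (as cut or awk would) is safe.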

tbrownaw, over 10 years ago

From the "criticisms" section: "You shouldn't be working with CSV data because CSV is a terrible format."

Er, what's wrong with it? Or is this a case of people using it for things other than what it's meant for? Is there a better format for sending data between different companies using different enterprisey database systems?

My complaint about CSV is that people frequently generate it manually and don't understand how to quote text fields, so they don't double any quote characters that are part of the data. Which means I have to spend time cleaning up malformed files.
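
The quoting rule mentioned above is small enough to show directly: wrap the field in double quotes and double every quote character inside it. A minimal Python sketch follows, with the hand-rolled `quote_field` (a hypothetical helper) next to the standard library's `csv.writer`, which applies the same rule and is usually the safer choice.

```python
import csv
import io


def quote_field(value: str) -> str:
    """Quote one field by hand: double embedded quotes, then wrap in quotes."""
    return '"' + value.replace('"', '""') + '"'


def to_csv_line(fields: list[str]) -> str:
    """Let the csv module do the quoting instead of building strings by hand."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(fields)
    return buf.getvalue()
```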

101914, over 10 years ago

Did you try benchmarking against kdb+?

Seems like there are always HN commenters lambasting CSV. I am sure they have very good reasons.

But, as for me, CSV is one of my favorite formats. (Sort of like how people like XML or JSON, I guess.) I like the limitations of CSV because I like simple, raw data.

I wish the de facto format that www servers delivered was CSV instead of HTML (for the reason why, see below). Or at least I wish there was an option to receive pages in CSV in addition to HTML.

Users could create their own markup, client side. Users could effectively use their "spreadsheet software" to read the information on the www. Or they could create infinitely creative presentations of data for themselves or others using HTML5 or some other tool of personal expression.

It is easy to create HTML from CSV, but I find it is a nuisance creating CSV from HTML.

Because I have a need for CSV, I write scanners with flex to convert HTML to CSV.

I often wonder why I cannot access all the data I need from the www in CSV format. Many have agreed over the years that the www needs more structure to be more valuable as a data source. If data is first created in CSV, then you have some inherent structure to build on; you can _use it_ to create markup and add infinite creativity without destroying the underlying structure.

If data (cf. art or forms of personal expression) cannot be presented in CSV, then is it really raw data or is it something else, more subjective and unwieldy?

Whatever. Back to reality. Pay no mind.
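
To illustrate the "easy" direction mentioned above, here is a minimal Python sketch that turns CSV text into an HTML table. The function name `csv_to_html_table` is hypothetical, and the markup is deliberately bare; the reverse direction has no equally short general solution, because HTML pages do not share a single tabular structure.

```python
import csv
import html
import io


def csv_to_html_table(text: str) -> str:
    """Render CSV text as a plain HTML table, escaping cell contents."""
    rows = csv.reader(io.StringIO(text, newline=""))
    lines = ["<table>"]
    for row in rows:
        cells = "".join(f"<td>{html.escape(cell)}</td>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)
```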

btown, over 10 years ago

If you need to do an indexing step anyways, why not simply import the data into a SQL database, or build this as a wrapper that introspects the CSV file, builds a database schema, and does the import for you? Is the issue limited scratch space?
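
The wrapper described here could look roughly like the following Python sketch, which reads the header row, creates a table with one TEXT column per header name, and bulk-inserts the remaining rows into SQLite. The function name `import_csv` is hypothetical, and typing every column as TEXT is a simplifying assumption; a real tool would likely infer column types.

```python
import csv
import io
import sqlite3


def quote_ident(name: str) -> str:
    """Quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def import_csv(conn: sqlite3.Connection, table: str, text: str) -> None:
    """Create `table` from the CSV header and load every data row into it."""
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader)
    columns = ", ".join(f"{quote_ident(name)} TEXT" for name in header)
    conn.execute(f"CREATE TABLE {quote_ident(table)} ({columns})")
    placeholders = ", ".join("?" for _ in header)
    # Rows whose field count differs from the header will raise an error.
    conn.executemany(
        f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", reader
    )
    conn.commit()
```

Once loaded, queries such as `SELECT city FROM cities WHERE CAST(pop AS INTEGER) > 1000000` can use ordinary database indexes, at the cost of the extra disk or memory the database copy takes.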

userbinator, over 10 years ago

Looks like it's based on this CSV parser: https://github.com/BurntSushi/rust-csv and it claims to be RFC 4180-compliant, which is a good thing.

brazzledazzle, over 10 years ago

This is one of the things I really love about PowerShell. Import, manipulation and export of formatted raw data like CSV is dead simple.

XSV – A fast CSV toolkit in Rust
