
Show HN: An app to split CSV into multiple files to avoid Excel's 1M row limit

14 points by tanin (over 2 years ago)

11 comments

prepend (over 2 years ago)
Why is this a yearly cost? I don't understand.

There are already open-source utilities [0] for users who aren't proficient in Unix commands.

If anything, this should be the definition of a one-time fee.

You're free to charge whatever you like, but it seems odd that anyone would pay you year after year to use your app.

It's not the price, as $40 isn't that much, but the value and principle of the thing.

[0] https://github.com/philoushka/LargeFileSplitter
Someone (over 2 years ago)
The video (why?) doesn't load for me, so I can't check features, but what's wrong with `split` and `csplit`? (https://www.gnu.org/software/coreutils/manual/html_node/split-invocation.html, https://man.openbsd.org/split.1)
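A minimal sketch of that approach, keeping the header row in each chunk; the file names and the 1,000,000-line chunk size are placeholders, and it assumes no quoted field contains embedded newlines:

```
# Split a big CSV into ~1,000,000-row pieces, prepending the header to each.
# big.csv, header.csv, and the chunk_ prefix are placeholder names.
# split is purely line-based, so quoted fields with embedded newlines
# would need preprocessing first.
head -n 1 big.csv > header.csv
tail -n +2 big.csv | split -l 1000000 - chunk_
for f in chunk_*; do
    cat header.csv "$f" > "$f.csv" && rm "$f"
done
```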
mattewong (over 2 years ago)
TL/DR: xsv is probably what you want, or maybe zsv and/or awk.

awk can do this super easily. Here's an example snippet that not only shards, but compresses your shards.

```
(NR - 1) % shard_size == 0 {
    # ready to start a new shard
    current_n = current_n + 1
    output_file = sprintf("%s%04d.%s.bz2", target_prefix, current_n, file_type)
    print "writing to " output_file > "/dev/stderr"

    # close any prior-opened output_command (else will err on too many open files)
    if (output_command != "")
        close(output_command)
    output_command = "bzip2 > " output_file

    # print header
    print headrow | output_command
}

NR != 1 {
    print $0 | output_command
}
```

This of course assumes that each line is a single record, so you'll need some preprocessing if your CSV might contain embedded line-ends. For the preprocessing, you can use something like the `2tsv` or `select -e ...` command of https://github.com/liquidaty/zsv (disclaimer: I'm its author) to ensure each record is a single line.

You can also use something like `xsv split` (see https://lib.rs/crates/xsv), which frankly is probably your best option as of today (though zsv will be getting its own shard command soon).
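For anyone trying that snippet, a minimal sketch of one way it might be invoked, assuming it is saved as shard.awk; the variable values and the input file name are made up for illustration, and a rule such as `NR == 1 { headrow = $0 }` would still be needed so the header row gets captured:

```
# Hypothetical invocation of the awk program above, saved as shard.awk.
# shard_size / target_prefix / file_type are the variables the program reads;
# big.csv is a placeholder input. The program would also need a rule like
#   NR == 1 { headrow = $0 }
# so that headrow holds the header to repeat in each shard.
awk -v shard_size=1000000 -v target_prefix=out_ -v file_type=csv \
    -f shard.awk big.csv
```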
tryithard (over 2 years ago)
https://man7.org/linux/man-pages/man1/split.1.html
sammyteee (over 2 years ago)
This site probably isn't your target audience...
jcbages (over 2 years ago)
I think $40 a year just for the split feature is a little too much. However, adding more useful features for manipulating CSV files would probably change my mind. For example, letting users reorder or preprocess the files before splitting: if people split into X files, they would otherwise have to repeat that action X times in Excel.
smitty1e (over 2 years ago)
I got smacked with this just last week.

My answer was just to import into MS Access.

Note: if you're pulling in a lot of S3 list_objects_v2() data and have some honking big object sizes, the Long Integer type craps out at representing a 2 GB file. You need to use Double.
kristianp (over 2 years ago)
It's interesting how people use and abuse Excel. How many people try to use Excel to process millions of rows? Time to use tools that can directly query CSV files to aggregate the data.
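One such option, sketched with assumed names (sales.csv, region, and amount are made up for illustration): the sqlite3 shell can import a CSV into an in-memory database and aggregate it with plain SQL.

```
# Aggregate a CSV without a spreadsheet: import it into an in-memory SQLite
# database and run SQL over it. sales.csv / region / amount are hypothetical
# names used only for illustration.
sqlite3 :memory: \
    -cmd '.mode csv' \
    -cmd '.import sales.csv sales' \
    'SELECT region, SUM(amount) AS total FROM sales GROUP BY region;'
```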
_boffin_ (over 2 years ago)
The answer is actually PowerPivot. With Access you can do a few GB, but with PowerPivot you can handle a billion or more rows. Don't expect any insane performance, though.
NibLer (over 2 years ago)
Do people really have CSV files with more than 1M rows? How do you search and update such files? Why not use some database?
hurricaneditka (over 2 years ago)
So this will be shared again with no updates in a few months? https://news.ycombinator.com/from?site=superintendent.app