
Show HN: An app to split CSV into multiple files to avoid Excel's 1M row limit

14 points by tanin (over 2 years ago)

11 comments

prepend (over 2 years ago)
Why is this a yearly cost? I don't understand.

There are already open-source utilities [0] for users who aren't proficient in Unix commands.

If anything, this should be the definition of a one-time fee.

You're free to charge whatever you like, but it seems odd that anyone would pay you year after year to use your app.

It's not the price, as $40 isn't that much, but the value and principle of the thing.

[0] https://github.com/philoushka/LargeFileSplitter
Someone (over 2 years ago)
The video (why?) doesn't load for me, so I can't check features, but what's wrong with `split` and `csplit`? (https://www.gnu.org/software/coreutils/manual/html_node/split-invocation.html, https://man.openbsd.org/split.1)
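A minimal sketch of that approach, keeping the header row in each chunk; the file names and the 1,000,000-line chunk size are placeholders, and it assumes no quoted field contains embedded newlines:

```
# Split a big CSV into ~1,000,000-row pieces, prepending the header to each.
# big.csv, header.csv, and the chunk_ prefix are placeholder names.
# split is purely line-based, so quoted fields with embedded newlines
# would need preprocessing first.
head -n 1 big.csv > header.csv
tail -n +2 big.csv | split -l 1000000 - chunk_
for f in chunk_*; do
    cat header.csv "$f" > "$f.csv" && rm "$f"
done
```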
mattewong (over 2 years ago)
TL/DR: xsv is probably what you want, or maybe zsv and/or awk.

awk can do this super easily. Here's an example snippet that not only shards, but compresses your shards.

```
(NR - 1) % shard_size == 0 {
    # ready to start a new shard
    current_n = current_n + 1
    output_file = sprintf("%s%04d.%s.bz2", target_prefix, current_n, file_type)
    print "writing to " output_file > "/dev/stderr"

    # close any prior-opened output_command (else will err on too many open files)
    if (output_command != "")
        close(output_command)
    output_command = "bzip2 > " output_file

    # print header
    print headrow | output_command
}

NR != 1 {
    print $0 | output_command
}
```

This of course assumes that each line is a single record, so you'll need some preprocessing if your CSV might contain embedded line-ends. For the preprocessing, you can use something like the `2tsv` or `select -e ...` command of https://github.com/liquidaty/zsv (disclaimer: I'm its author) to ensure each record is a single line.

You can also use something like `xsv split` (see https://lib.rs/crates/xsv), which frankly is probably your best option as of today (though zsv will be getting its own shard command soon).
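For anyone trying that snippet, a minimal sketch of one way it might be invoked, assuming it is saved as shard.awk; the variable values and the input file name are made up for illustration, and a rule such as `NR == 1 { headrow = $0 }` would still be needed so the header row gets captured:

```
# Hypothetical invocation of the awk program above, saved as shard.awk.
# shard_size / target_prefix / file_type are the variables the program reads;
# big.csv is a placeholder input. The program would also need a rule like
#   NR == 1 { headrow = $0 }
# so that headrow holds the header to repeat in each shard.
awk -v shard_size=1000000 -v target_prefix=out_ -v file_type=csv \
    -f shard.awk big.csv
```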
tryithard (over 2 years ago)
https://man7.org/linux/man-pages/man1/split.1.html
sammyteee (over 2 years ago)
This site probably isn't your target audience...
jcbages (over 2 years ago)
I think $40 a year just for the split feature is a little too much. However, adding more useful features for manipulating CSV files would probably change my mind. For example, letting users reorder or preprocess the files before splitting: if people split into X files, they would otherwise have to repeat that action X times in Excel.
smitty1e (over 2 years ago)
I got smacked with this just last week.

My answer was just to import into MS Access.

Note: if you're pulling in a lot of S3 list_objects_v2() data and have some honking big object sizes, the Long Integer type craps out at representing a 2 GB file. You need to use Double.
kristianp (over 2 years ago)
It's interesting how people use and abuse Excel. How many people try to use Excel to process millions of rows? Time to use tools that can directly query CSV files to aggregate the data.
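One such option, sketched with assumed names (sales.csv, region, and amount are made up for illustration): the sqlite3 shell can import a CSV into an in-memory database and aggregate it with plain SQL.

```
# Aggregate a CSV without a spreadsheet: import it into an in-memory SQLite
# database and run SQL over it. sales.csv / region / amount are hypothetical
# names used only for illustration.
sqlite3 :memory: \
    -cmd '.mode csv' \
    -cmd '.import sales.csv sales' \
    'SELECT region, SUM(amount) AS total FROM sales GROUP BY region;'
```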
_boffin_ (over 2 years ago)
The answer is actually PowerPivot. With Access you can do a few GB, but with PowerPivot you can handle a billion or more rows. Don't expect any insane performance, though.
NibLer (over 2 years ago)
Do people really have CSV files with more than 1M rows? How do you search and update such files? Why not use some database?
hurricaneditka (over 2 years ago)
So this will be shared again with no updates in a few months? https://news.ycombinator.com/from?site=superintendent.app