TechEcho
csvkit: Command-line tools for working with CSV

4 points by mxgr over 2 years ago

2 comments

mattewong over 2 years ago
I wanted so much to use csvkit and all the features it has, but its horrendous performance made it unscalable, so the more I used it, the more technical debt I accumulated.

This was one of the reasons I wrote zsv (https://github.com/liquidaty/zsv). Maybe csvkit could incorporate the zsv engine and we could get the best of both worlds?

Examples (using the Majestic Million CSV):

---

csvcut -c 1,3 = 5.3 seconds

zsv select -n -- 1 3 = 0.19 seconds

28x faster

---

csvsql --query "select count(*) from file" file.csv = 148 seconds

zsv sql "select count(*) from data" file.csv = 0.68 seconds

216x faster

---
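For anyone wanting to check timings like these on their own machine, here is a minimal Python sketch of a wall-clock comparison. It is not how the numbers above were measured. The helper names (`time_command`, `speedup`, `compare_select`) are made up for illustration, and it assumes that both `csvcut` and `zsv` are on the PATH and read CSV from standard input when no file argument is given.

```python
import subprocess
import time


def time_command(argv, stdin_path=None):
    """Run argv once, discard its output, and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    if stdin_path is None:
        subprocess.run(argv, stdin=subprocess.DEVNULL,
                       stdout=subprocess.DEVNULL, check=True)
    else:
        # Feed the CSV on stdin so the argv stays exactly as quoted in the thread.
        with open(stdin_path, "rb") as f:
            subprocess.run(argv, stdin=f, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


def speedup(slow_seconds, fast_seconds):
    """Return how many times faster the second timing is than the first."""
    return slow_seconds / fast_seconds


def compare_select(csv_path):
    """Time the two column-selection commands from the comment (hypothetical helper)."""
    slow = time_command(["csvcut", "-c", "1,3"], stdin_path=csv_path)
    fast = time_command(["zsv", "select", "-n", "--", "1", "3"], stdin_path=csv_path)
    return slow, fast, speedup(slow, fast)
```

Calling `compare_select("path/to/file.csv")` returns both timings and their ratio. A single run is noisy (disk cache, process startup), so repeating each command a few times and taking the best or median time would give a fairer comparison.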
hermitcrab over 2 years ago
It is interesting how much different tools vary in their performance for the same task. For example R with data.table is *much* faster than base R. And Excel Power Query performance is, well, see for yourself:

https://www.easydatatransform.com/data_wrangling_etl_tools.html