Run SQL on CSV, Parquet, JSON, Arrow, Unix Pipes and Google Sheet

294 points by houqp over 2 years ago

15 comments

ebfe1 over 2 years ago
This is cool. It reminded me of several tools for similar tasks that have popped up on HN every now and then, so I did a quick search:

clickhouse-local - https://news.ycombinator.com/item?id=22457767
q - https://news.ycombinator.com/item?id=27423276
textql - https://news.ycombinator.com/item?id=16781294
simpql - https://news.ycombinator.com/item?id=25791207

We need a benchmark, I think ;)
mmastrac over 2 years ago
The one thing everyone here is missing so far is that it's a Rust binary, distributed on PyPI. That's brilliant.
gavinray over 2 years ago
1) roapi is built with some wicked cool tech

2) the author once answered some questions I posted on DataFusion, so they're cool in my book

Here are my anecdotes.
playingalong over 2 years ago
Bye bye jq and your awful query syntax.
henrydark over 2 years ago
It is pretty cool. py-spy has also been doing this for a few years:

https://github.com/benfred/py-spy
tootie over 2 years ago
AWS Athena offers something similar. You can build tables off of structured text files (like log files) in S3 and run SQL queries.
johnnunn over 2 years ago
I have a use case where my company's application logs will be shipped to S3 in a directory structure such as application/timestamp(one_hour)_logs.parquet. We want to build a simple developer-focused UI where we can query a given application for a time range, retrieve the S3 blobs in that range, and brute-force search them for the desired string. I see that roapi offers a REST interface for a fixed set of files, but I would like to dynamically glob newer files. Are there alternatives that can be used too? Thanks
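One way to approach the "files in a time range" part is to compute the hourly object keys up front instead of globbing. The sketch below is only an illustration: the key layout `application/<hour>_logs.parquet`, the `%Y%m%dT%H` timestamp format, and the function name `hourly_keys` are assumptions, since the comment does not spell out the exact naming.

```python
from datetime import datetime, timedelta


def hourly_keys(application, start, end):
    """List hypothetical hourly log keys covering the range [start, end).

    The key format is an assumption made for illustration only.
    """
    # Snap the start time down to the beginning of its hour.
    current = start.replace(minute=0, second=0, microsecond=0)
    keys = []
    while current < end:
        keys.append(f"{application}/{current:%Y%m%dT%H}_logs.parquet")
        current += timedelta(hours=1)
    return keys


if __name__ == "__main__":
    for key in hourly_keys(
        "billing", datetime(2022, 9, 24, 22, 30), datetime(2022, 9, 25, 1, 0)
    ):
        print(key)
```

The resulting list could then be handed to whatever query engine reads the Parquet files, so files written for newer hours are picked up without re-registering a fixed set.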
cube2222 over 2 years ago
This looks really cool! Especially since using DataFusion underneath means that it is probably blazingly fast.

If you like this, I recommend taking a look at OctoSQL[0], which I'm the author of.

It's plenty fast, and it's easier to add new data sources to it as external plugins.

It can also handle endless streams of data natively, so you can do running groupings on e.g. tailed JSON logs.

Additionally, it's able to push down predicates to the database below, so if you're selecting 10 rows from a 1 billion row table, it'll just get those 10 rows instead of getting them all and filtering in memory.

[0]: https://github.com/cube2222/octosql
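To make the predicate pushdown point concrete, here is a toy sketch of the general idea. It is not OctoSQL's implementation: the `IndexedSource` class and both helper functions are hypothetical names invented for this illustration. The source counts how many rows it hands out, so the difference between filtering in memory and pushing the filter down is visible.

```python
class IndexedSource:
    """Hypothetical data source that can look rows up by id."""

    def __init__(self, rows):
        self._by_id = {row["id"]: row for row in rows}
        self.rows_read = 0  # how many rows left the source

    def scan(self):
        # Full scan: every row is read, whether or not it is wanted.
        for row in self._by_id.values():
            self.rows_read += 1
            yield row

    def lookup(self, ids):
        # Pushed-down predicate: only matching rows are read.
        for wanted_id in ids:
            row = self._by_id.get(wanted_id)
            if row is not None:
                self.rows_read += 1
                yield row


def filter_in_memory(source, wanted_ids):
    """Fetch everything, then filter on the client side."""
    wanted = set(wanted_ids)
    return [row for row in source.scan() if row["id"] in wanted]


def filter_with_pushdown(source, wanted_ids):
    """Let the source apply the filter itself."""
    return list(source.lookup(wanted_ids))
```

Both functions return the same rows for the same ids; only `rows_read` differs, which is the saving the comment describes.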
smugma over 2 years ago
SQL on CSV (using preinstalled Mac tools), previously linked on HN: https://til.simonwillison.net/sqlite/one-line-csv-operations

e.g.

    sqlite3 :memory: -cmd '.mode csv' -cmd '.import royalties.csv Royalty' -cmd '.mode column' \
      'SELECT SUM(Royalty),Currency FROM Royalty GROUP BY Currency'
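For scripting rather than the shell, roughly the same query can be run with Python's standard library. This is a minimal sketch, assuming the CSV has `Royalty` and `Currency` columns as the query above implies; the function name `sum_by_currency` and the sample data are made up for illustration.

```python
import csv
import io
import sqlite3


def sum_by_currency(csv_text):
    """Load CSV text into an in-memory SQLite table and sum royalties per currency."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Royalty (Royalty REAL, Currency TEXT)")
        conn.executemany(
            "INSERT INTO Royalty (Royalty, Currency) VALUES (?, ?)",
            [(float(row["Royalty"]), row["Currency"]) for row in rows],
        )
        # ORDER BY makes the output order deterministic.
        return conn.execute(
            "SELECT SUM(Royalty), Currency FROM Royalty "
            "GROUP BY Currency ORDER BY Currency"
        ).fetchall()
    finally:
        conn.close()


# Made-up sample data standing in for royalties.csv.
SAMPLE = "Royalty,Currency\n10.5,USD\n4.5,USD\n7.0,EUR\n"

if __name__ == "__main__":
    for total, currency in sum_by_currency(SAMPLE):
        print(currency, total)
```

Unlike the CLI's `.import`, which stores every column as text, this version declares `Royalty` as `REAL` and converts values explicitly before inserting them.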
bachmeier over 2 years ago
As I commented on a recent similar discussion, these tools can't be used for update or insert. As useful as querying might be, it's terribly misleading to claim to "run SQL" if you can't change the data, since that's such a critical part of an SQL database.
skybrian over 2 years ago
Looks like it also supports SQLite for input, but not for output. That might be a nice addition.
the_optimist over 2 years ago
What’s the memory handling behavior here? Are CSVs read on query or at startup? What about Arrow? If read on startup, is there compression applied?
theGnuMe over 2 years ago
This is really cool and redefines ETL pipelines.
whimsicalism over 2 years ago
Trino can do this as well.
Kalanos over 2 years ago
Is there a Pythonic API for scripting (not command line)? I was looking for a JSON query tool and couldn't find one.