Run SQL on CSV, Parquet, JSON, Arrow, Unix Pipes and Google Sheet

294 points, posted by houqp over 2 years ago

15 comments

ebfe1, over 2 years ago
This is cool. It totally reminded me of several tools that have popped up on HN every now and then for similar tasks, so I did a quick search:

clickhouse-local - https://news.ycombinator.com/item?id=22457767
q - https://news.ycombinator.com/item?id=27423276
textql - https://news.ycombinator.com/item?id=16781294
simpql - https://news.ycombinator.com/item?id=25791207

We need a benchmark, I think. ;)
mmastrac, over 2 years ago
The one thing everyone here is missing so far is that it's a Rust binary, distributed on PyPI. That's brilliant.
gavinray, over 2 years ago
1) roapi is built with some wicked cool tech

2) the author once answered some questions I posted on DataFusion, so they're cool in my book

Here are my anecdotes.
playingalong, over 2 years ago
Bye bye jq and your awful query syntax.
henrydark, over 2 years ago
It is pretty cool. py-spy has also been doing this for a few years:

https://github.com/benfred/py-spy
tootie, over 2 years ago
AWS Athena offers something similar. You can build tables off of structured text files (like log files) in S3 and run SQL queries.
johnnunn, over 2 years ago
I have a use case where my company's application logs will be shipped to S3 in a directory structure such as application/timestamp(one_hour)_logs.parquet. We want to build a simple developer-focused UI where we can query a given application for a time range, retrieve the S3 blobs in that range, and brute-force search them for the desired string. I see that roapi offers a REST interface for a fixed set of files, but I would like to dynamically glob newer files. Are there alternatives that can be used too? Thanks
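The time-range lookup described in this comment can be sketched as a small helper that lists the hourly object keys to search. This is only an illustration under assumptions: the comment does not give the actual timestamp format, so the `YYYY-MM-DDTHH` pattern, the function name `hourly_keys`, and the application name `billing` are all hypothetical. It says nothing about whether roapi itself can register files dynamically.

```python
from datetime import datetime, timedelta


def hourly_keys(application, start, end):
    """Return one object key per hourly file overlapping [start, end].

    Assumes (hypothetically) keys look like
    '<application>/<YYYY-MM-DDTHH>_logs.parquet'.
    """
    # Round the start down to the top of its hour so a partial hour is included.
    current = start.replace(minute=0, second=0, microsecond=0)
    keys = []
    while current <= end:
        keys.append(f"{application}/{current:%Y-%m-%dT%H}_logs.parquet")
        current += timedelta(hours=1)
    return keys


if __name__ == "__main__":
    for key in hourly_keys(
        "billing", datetime(2022, 9, 24, 22, 30), datetime(2022, 9, 25, 0, 15)
    ):
        print(key)
```

The resulting list could then be handed to whichever query engine is chosen, so newer files are picked up per request instead of being registered up front.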
cube2222, over 2 years ago
This looks really cool! Especially since using DataFusion underneath means that it is probably blazingly fast.

If you like this, I recommend taking a look at OctoSQL[0], which I'm the author of.

It's plenty fast, and it's easier to add new data sources to it as external plugins.

It can also handle endless streams of data natively, so you can do running groupings on, e.g., tailed JSON logs.

Additionally, it's able to push down predicates to the database below, so if you're selecting 10 rows from a 1 billion row table, it'll just get those 10 rows instead of getting them all and filtering in memory.

[0]: https://github.com/cube2222/octosql
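The predicate pushdown described in this comment can be illustrated with a small sketch. It is not OctoSQL's implementation: Python's built-in `sqlite3` module stands in for "the database below", and the `events` table and all function names are hypothetical. Both functions return the same matches, but they differ in how many rows leave the database.

```python
import sqlite3


def make_db(n_rows):
    """Build an in-memory table where every 100th row has level 'error'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, level TEXT)")
    conn.executemany(
        "INSERT INTO events (id, level) VALUES (?, ?)",
        [(i, "error" if i % 100 == 0 else "info") for i in range(n_rows)],
    )
    return conn


def filter_in_memory(conn):
    """No pushdown: fetch every row, then filter on the client side."""
    fetched = conn.execute("SELECT id, level FROM events ORDER BY id").fetchall()
    matches = [row for row in fetched if row[1] == "error"]
    return matches, len(fetched)


def filter_pushed_down(conn):
    """Pushdown: the predicate runs in the database, so only matches come back."""
    fetched = conn.execute(
        "SELECT id, level FROM events WHERE level = ? ORDER BY id", ("error",)
    ).fetchall()
    return fetched, len(fetched)


if __name__ == "__main__":
    conn = make_db(1000)
    _, transferred_all = filter_in_memory(conn)
    _, transferred_few = filter_pushed_down(conn)
    print("rows fetched without pushdown:", transferred_all)
    print("rows fetched with pushdown:", transferred_few)
```

The second value in each returned tuple is the number of rows transferred, which is the cost that pushdown reduces.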
smugma, over 2 years ago
SQL on CSV (using preinstalled Mac tools) previously linked on HN: https://til.simonwillison.net/sqlite/one-line-csv-operations

e.g.

    sqlite3 :memory: -cmd '.mode csv' -cmd '.import royalties.csv Royalty' -cmd '.mode column' \
      'SELECT SUM(Royalty),Currency FROM Royalty GROUP BY Currency'
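For scripting rather than a shell one-liner, the same idea can be sketched with Python's standard library. This is a minimal sketch, not what the linked post does: the table and column names (`Royalty`, `Currency`) come from the command above, while the sample rows and the helper name `sum_by_currency` are made up for illustration.

```python
import csv
import io
import sqlite3


def sum_by_currency(csv_text):
    """Load CSV text into an in-memory SQLite table and total Royalty per Currency."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Royalty (Royalty REAL, Currency TEXT)")
    conn.executemany(
        "INSERT INTO Royalty (Royalty, Currency) VALUES (?, ?)",
        [(float(r["Royalty"]), r["Currency"]) for r in rows],
    )
    result = conn.execute(
        "SELECT SUM(Royalty), Currency FROM Royalty "
        "GROUP BY Currency ORDER BY Currency"
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample data standing in for royalties.csv.
SAMPLE = "Royalty,Currency\n10.5,USD\n4.5,USD\n7.0,EUR\n"

if __name__ == "__main__":
    for total, currency in sum_by_currency(SAMPLE):
        print(currency, total)
```

To run it against a real file, the text could be read with `open("royalties.csv", newline="").read()` and passed to the same function.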
bachmeier, over 2 years ago
As I commented on a recent similar discussion, these tools can't be used for update or insert. As useful as querying might be, it's terribly misleading to claim to "run SQL" if you can't change the data, since that's such a critical part of an SQL database.
skybrian, over 2 years ago
Looks like it also supports SQLite for input, but not for output. That might be a nice addition.
the_optimist, over 2 years ago
What’s the memory handling behavior here? Are CSVs read on query or at startup? What about Arrow? If read on startup, is there compression applied?
theGnuMe, over 2 years ago
This is really cool and redefines ETL pipelines.
whimsicalism, over 2 years ago
Trino can do this as well.
Kalanos, over 2 years ago
Is there a Pythonic API for scripting (not command line)? I was looking for a JSON query tool and couldn't find one.