科技回声

15 comments

ebfe1, over 2 years ago

This is cool. It reminded me of several tools for similar tasks that have popped up on HN every now and then, so I did a quick search:

clickhouse-local - https://news.ycombinator.com/item?id=22457767
q - https://news.ycombinator.com/item?id=27423276
textql - https://news.ycombinator.com/item?id=16781294
simpql - https://news.ycombinator.com/item?id=25791207

We need a benchmark, I think ;)

mmastrac, over 2 years ago

The one thing everyone here is missing so far is that it's a Rust binary, distributed on PyPI. That's brilliant.

gavinray, over 2 years ago

1) roapi is built with some wicked cool tech
2) the author once answered some questions I posted on DataFusion, so they're cool in my book

Here are my anecdotes.

playingalong, over 2 years ago

Bye bye jq and your awful query syntax.

henrydark, over 2 years ago

It is pretty cool. py-spy has also been doing this for a few years: https://github.com/benfred/py-spy

tootie, over 2 years ago

AWS Athena offers something similar. You can build tables off of structured text files (like log files) in S3 and run SQL queries.

johnnunn, over 2 years ago

I have a use case where my company's application logs will be shipped to S3 in a directory structure such as application/timestamp(one_hour)_logs.parquet. We want to build a simple developer-focused UI where we can query a given application for a time range, retrieve the S3 blobs in that time range, and brute-force search them for the desired string. I see that roapi offers a REST interface for a fixed set of files, but I would like to dynamically glob newer files. Are there alternatives that can be used too? Thanks
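One way to avoid a fixed file list in the use case above is to compute the object keys from the requested time range at query time. The following is only a minimal sketch: the function name `hourly_log_keys`, the `billing` application name, and the `%Y-%m-%dT%H` timestamp format are all hypothetical, since the comment does not say how the hourly timestamp is actually written.

```python
from datetime import datetime, timedelta


def hourly_log_keys(application, start, end, ts_format="%Y-%m-%dT%H"):
    """List the hourly object keys whose hour overlaps [start, end).

    The key layout "<application>/<hour>_logs.parquet" follows the comment
    above; ts_format is an assumption made for illustration only.
    """
    if end <= start:
        return []
    # Round the start down to the beginning of its hour.
    hour = start.replace(minute=0, second=0, microsecond=0)
    keys = []
    while hour < end:
        keys.append(f"{application}/{hour.strftime(ts_format)}_logs.parquet")
        hour += timedelta(hours=1)
    return keys


keys = hourly_log_keys(
    "billing",  # hypothetical application name
    datetime(2022, 9, 24, 22, 30),
    datetime(2022, 9, 25, 1, 0),
)
print(keys)
```

Because the keys are derived from the time range rather than registered up front, files written after the service started are covered as long as they follow the same naming scheme. The resulting list could then be handed to whatever Parquet reader or query engine does the actual search.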

cube2222, over 2 years ago

This looks really cool! Especially since using DataFusion underneath means that it is probably blazingly fast.

If you like this, I recommend taking a look at OctoSQL[0], which I'm the author of.

It's plenty fast, and it makes it easier to add new data sources as external plugins.

It can also handle endless streams of data natively, so you can do running groupings on, e.g., tailed JSON logs.

Additionally, it's able to push down predicates to the database below, so if you're selecting 10 rows from a 1 billion row table, it'll just get those 10 rows instead of getting them all and filtering in memory.

[0]: https://github.com/cube2222/octosql
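The difference between pushing a predicate down and filtering in memory can be illustrated with a small sketch. This is not OctoSQL's implementation; it uses Python's standard `sqlite3` module as a stand-in for "the database below", and the `events` table, its columns, and the function names are hypothetical.

```python
import sqlite3


def make_table(n_rows):
    """Build an in-memory table standing in for a large remote table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO events (id, status) VALUES (?, ?)",
        ((i, "error" if i % 1000 == 0 else "ok") for i in range(n_rows)),
    )
    return conn


def filter_in_memory(conn):
    """No pushdown: fetch every row, then filter on the client side."""
    rows = conn.execute("SELECT id, status FROM events").fetchall()
    matches = [row for row in rows if row[1] == "error"]
    return matches, len(rows)


def filter_pushed_down(conn):
    """Pushdown: the predicate is part of the query, so only matches come back."""
    rows = conn.execute(
        "SELECT id, status FROM events WHERE status = ?", ("error",)
    ).fetchall()
    return rows, len(rows)


conn = make_table(10_000)
slow_matches, slow_transferred = filter_in_memory(conn)
fast_matches, fast_transferred = filter_pushed_down(conn)
conn.close()
print(slow_transferred, fast_transferred)
```

Both functions return the same matching rows, but the first one moves the whole table to the client before filtering, while the second one moves only the matches. With a real remote database, that difference is what keeps the "10 rows out of 1 billion" case cheap.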

smugma, over 2 years ago

SQL on CSV (using preinstalled Mac tools), previously linked on HN: https://til.simonwillison.net/sqlite/one-line-csv-operations

e.g.

<pre><code>sqlite3 :memory: -cmd '.mode csv' -cmd '.import royalties.csv Royalty' -cmd '.mode column' \
    'SELECT SUM(Royalty),Currency FROM Royalty GROUP BY Currency'</code></pre>
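For scripting rather than the command line, roughly the same query can be run with Python's standard `csv` and `sqlite3` modules. This is a sketch under assumptions: the sample rows are made up to stand in for `royalties.csv`, and the function name `sum_royalties_by_currency` is hypothetical. Unlike `.import`, which loads every column as text, this version declares the `Royalty` column as `REAL`.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for royalties.csv.
SAMPLE_CSV = """Royalty,Currency
10.5,USD
4.5,USD
7.0,EUR
"""


def sum_royalties_by_currency(csv_file):
    """Load CSV rows into an in-memory SQLite table and total them per currency."""
    reader = csv.DictReader(csv_file)
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Royalty (Royalty REAL, Currency TEXT)")
        conn.executemany(
            "INSERT INTO Royalty (Royalty, Currency) VALUES (?, ?)",
            ((float(row["Royalty"]), row["Currency"]) for row in reader),
        )
        # ORDER BY makes the result order deterministic.
        return conn.execute(
            "SELECT SUM(Royalty), Currency FROM Royalty "
            "GROUP BY Currency ORDER BY Currency"
        ).fetchall()
    finally:
        conn.close()


totals = sum_royalties_by_currency(io.StringIO(SAMPLE_CSV))
print(totals)
```

To run it against a real file, pass an open file object instead of the `io.StringIO` wrapper, for example `open("royalties.csv", newline="")`.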

bachmeier, over 2 years ago

As I commented on a recent similar discussion, these tools can't be used for update or insert. As useful as querying might be, it's terribly misleading to claim to "run SQL" if you can't change the data, since that's such a critical part of an SQL database.

skybrian, over 2 years ago

Looks like it also supports SQLite for input, but not for output. That might be a nice addition.

the_optimist, over 2 years ago

What’s the memory handling behavior here? Are CSVs read on query or at startup? What about Arrow? If read on startup, is there compression applied?

theGnuMe, over 2 years ago

This is really cool and redefines ETL pipelines.

whimsicalism, over 2 years ago

Trino can do this as well.

Kalanos, over 2 years ago

Is there a Pythonic API for scripting (not command line)? I was looking for a JSON query tool and couldn't find one.

Run SQL on CSV, Parquet, JSON, Arrow, Unix Pipes and Google Sheet
