TechEcho

15 comments

ebfe1 over 2 years ago

This is cool... It reminded me of several tools for similar tasks that have popped up on HN every now and then, so I did a quick search:

clickhouse-local - https://news.ycombinator.com/item?id=22457767
q - https://news.ycombinator.com/item?id=27423276
textql - https://news.ycombinator.com/item?id=16781294
simpql - https://news.ycombinator.com/item?id=25791207

We need a benchmark, I think... ;)

mmastrac over 2 years ago

The one thing everyone here is missing so far is that it's a Rust binary, distributed on PyPI. That's brilliant.

gavinray over 2 years ago

1) roapi is built with some wicked cool tech
2) the author once answered some questions I posted on DataFusion, so they're cool in my book

Here are my anecdotes.

playingalong over 2 years ago

Bye bye jq and your awful query syntax.

henrydark over 2 years ago

It is pretty cool. py-spy has also been doing this for a few years: https://github.com/benfred/py-spy

tootie over 2 years ago

AWS Athena offers something similar. You can build tables off of structured text files (like log files) in S3 and run SQL queries.

johnnunn over 2 years ago

I have a use case where my company's application logs will be shipped to S3 in a directory structure such as application/timestamp(one_hour)_logs.parquet. We want to build a simple developer-focused UI where we can query a given application for a time range, retrieve the S3 blobs in that range, and brute-force search them for the desired string. I see that roapi offers a REST interface for a fixed set of files, but I would like to dynamically glob newer files. Are there any alternatives that can be used too? Thanks
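The "blobs in a time range" step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the comment does not say how the hourly timestamp is encoded, so the `%Y%m%dT%H` format and the function name `hourly_log_keys` are hypothetical. Fetching and searching the Parquet files is left out.

```python
from datetime import datetime, timedelta


def hourly_log_keys(application: str, start: datetime, end: datetime) -> list:
    """Return one object key per hourly bucket overlapping [start, end).

    Assumption: keys look like "<application>/<YYYYMMDDTHH>_logs.parquet".
    The real timestamp encoding is not given in the comment.
    """
    # Round the start down to the beginning of its hour bucket.
    current = start.replace(minute=0, second=0, microsecond=0)
    keys = []
    while current < end:
        stamp = current.strftime("%Y%m%dT%H")
        keys.append(f"{application}/{stamp}_logs.parquet")
        current += timedelta(hours=1)
    return keys
```

Because the keys are computed per request rather than registered up front, newly shipped hourly files are picked up without restarting anything. Each key could then be handed to whatever reads and searches the Parquet data.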

cube2222 over 2 years ago

This looks really cool! Using DataFusion underneath especially means that it's probably blazingly fast.

If you like this, I recommend taking a look at OctoSQL[0], which I'm the author of.

It's plenty fast, and it's easier to add new data sources to it as external plugins.

It can also handle endless streams of data natively, so you can do running groupings on e.g. tailed JSON logs.

Additionally, it's able to push down predicates to the database below, so if you're selecting 10 rows from a 1 billion row table, it'll just get those 10 rows instead of getting them all and filtering in memory.

[0]: https://github.com/cube2222/octosql
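To make "running groupings on tailed JSON logs" concrete, here is a minimal Python sketch of the general idea. It is not OctoSQL's implementation or API; the function name `running_counts` and the `level` field used in the usage note are assumptions for illustration.

```python
import json
from collections import Counter
from typing import Dict, Iterable, Iterator


def running_counts(lines: Iterable[str], key: str) -> Iterator[Dict[str, int]]:
    """Yield an updated count per group after every JSON log line.

    `lines` may be an endless iterator (for example a tailed file),
    so results are emitted incrementally instead of at the end.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        counts[str(record.get(key))] += 1
        yield dict(counts)  # snapshot of the grouping so far
```

Feeding it lines such as `{"level": "info"}` with `key="level"` yields a fresh snapshot of the per-level counts after each line, which is the "running" part: the grouping never waits for the input to end.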

smugma over 2 years ago

SQL on CSV (using preinstalled Mac tools), previously linked on HN: https://til.simonwillison.net/sqlite/one-line-csv-operations

e.g.

    sqlite3 :memory: -cmd '.mode csv' -cmd '.import royalties.csv Royalty' -cmd '.mode column' \
      'SELECT SUM(Royalty),Currency FROM Royalty GROUP BY Currency'
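For scripting rather than the command line, roughly the same thing can be done with Python's standard library. This is a sketch, not part of the linked post: the function name `sum_by_currency` is hypothetical, and it assumes the CSV has a header row with `Royalty` and `Currency` columns, as the query above implies.

```python
import csv
import io
import sqlite3


def sum_by_currency(csv_text: str) -> list:
    """Load CSV text into an in-memory SQLite table and total royalties per currency."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Royalty (Royalty REAL, Currency TEXT)")
        conn.executemany(
            "INSERT INTO Royalty (Royalty, Currency) VALUES (?, ?)",
            [(float(r["Royalty"]), r["Currency"]) for r in rows],
        )
        # Same aggregation as the one-liner, with a stable ordering added.
        return conn.execute(
            "SELECT Currency, SUM(Royalty) FROM Royalty "
            "GROUP BY Currency ORDER BY Currency"
        ).fetchall()
    finally:
        conn.close()
```

To run it against a file, read the file's text first, for example with `open("royalties.csv", newline="").read()`, and pass the result in.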

bachmeier over 2 years ago

As I commented on a recent similar discussion, these tools can't be used for update or insert. As useful as querying might be, it's terribly misleading to claim to "run SQL" if you can't change the data, since that's such a critical part of an SQL database.

skybrian over 2 years ago

Looks like it also supports SQLite for input, but not for output. That might be a nice addition.

the_optimist over 2 years ago

What’s the memory handling behavior here? Are CSVs read on query or at startup? What about Arrow? If read on startup, is there compression applied?

theGnuMe over 2 years ago

This is really cool and redefines ETL pipelines.

whimsicalism over 2 years ago

Trino can do this as well.

Kalanos over 2 years ago

Is there a Pythonic API for scripting (not command line)? I was looking for a JSON query tool and couldn't find one.

Run SQL on CSV, Parquet, JSON, Arrow, Unix Pipes and Google Sheet