This is cool. It reminded me of several tools for similar tasks that have popped up on HN every now and then, so I did a quick search:<p>clickhouse-local - <a href="https://news.ycombinator.com/item?id=22457767" rel="nofollow">https://news.ycombinator.com/item?id=22457767</a><p>q - <a href="https://news.ycombinator.com/item?id=27423276" rel="nofollow">https://news.ycombinator.com/item?id=27423276</a><p>textql - <a href="https://news.ycombinator.com/item?id=16781294" rel="nofollow">https://news.ycombinator.com/item?id=16781294</a><p>simpql - <a href="https://news.ycombinator.com/item?id=25791207" rel="nofollow">https://news.ycombinator.com/item?id=25791207</a><p>We need a benchmark, I think ;)
1) roapi is built with some wicked cool tech.<p>2) The author once answered some questions I posted on DataFusion, so they're cool in my book.<p>Here are my anecdotes.
It is pretty cool. py-spy has also been doing this for a few years:<p><a href="https://github.com/benfred/py-spy" rel="nofollow">https://github.com/benfred/py-spy</a>
I have a use case where my company's application logs are shipped to S3 in a directory structure such as application/timestamp(one_hour)_logs.parquet. We want to build a simple developer-focused UI where we can query a given application for a time range, retrieve the S3 blobs in that range, and brute-force search them for the desired string. I see that roapi offers a REST interface for a fixed set of files, but I would like to dynamically glob newer files. Are there alternatives that can be used too? Thanks
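For the time-range lookup described above, one option is to compute the candidate object keys directly instead of globbing the bucket. A minimal sketch, assuming one file per hour and a hypothetical timestamp format of `%Y-%m-%dT%H` (the comment does not say how the timestamp is actually encoded, so `hourly_keys` and `fmt` are illustrative names only):

```python
from datetime import datetime, timedelta, timezone


def hourly_keys(application, start, end, fmt="%Y-%m-%dT%H"):
    """Return one object key per hour bucket overlapping [start, end).

    The key layout mirrors application/timestamp(one_hour)_logs.parquet;
    the timestamp format `fmt` is an assumption, not a known convention.
    """
    # Round the start down to the top of its hour so a partial first
    # hour is still covered.
    cursor = start.replace(minute=0, second=0, microsecond=0)
    keys = []
    while cursor < end:
        keys.append(f"{application}/{cursor.strftime(fmt)}_logs.parquet")
        cursor += timedelta(hours=1)
    return keys
```

Calling `hourly_keys("billing", start, end)` with a start of 22:30 and an end of 01:00 the next day would yield the 22:00, 23:00, and 00:00 files. Those keys could then be fetched and searched, or registered as tables in whichever query tool ends up being used.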
This looks really cool! In particular, using DataFusion underneath means it's probably blazingly fast.<p>If you like this, I recommend taking a look at OctoSQL[0], which I'm the author of.<p>It's plenty fast, and it's easier to add new data sources to it as external plugins.<p>It can also handle endless streams of data natively, so you can do running groupings on, e.g., tailed JSON logs.<p>Additionally, it's able to push down predicates to the underlying database, so if you're selecting 10 rows from a 1 billion row table, it'll fetch just those 10 rows instead of fetching them all and filtering in memory.<p>[0]: <a href="https://github.com/cube2222/octosql" rel="nofollow">https://github.com/cube2222/octosql</a>
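To illustrate the predicate pushdown idea mentioned above, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for the underlying database. It is not how OctoSQL is implemented; the `events` table and the function names are hypothetical. It only contrasts fetching everything and filtering in memory with sending the filter to the database:

```python
import sqlite3


def make_db(n_rows=10_000):
    """Build an in-memory table where 1 in 1000 rows has level 'error'."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, level TEXT)")
    conn.executemany(
        "INSERT INTO events (id, level) VALUES (?, ?)",
        ((i, "error" if i % 1000 == 0 else "info") for i in range(n_rows)),
    )
    return conn


def filter_in_memory(conn):
    # No pushdown: every row crosses the database boundary, and the
    # predicate is evaluated by the client afterwards.
    rows = conn.execute("SELECT id, level FROM events ORDER BY id").fetchall()
    matches = [row for row in rows if row[1] == "error"]
    return matches, len(rows)


def filter_pushed_down(conn):
    # Pushdown: the predicate is part of the query, so the database
    # returns only the matching rows.
    rows = conn.execute(
        "SELECT id, level FROM events WHERE level = ? ORDER BY id",
        ("error",),
    ).fetchall()
    return rows, len(rows)
```

Both functions return the same matches along with the number of rows transferred; with the default table, the first transfers all 10,000 rows and the second only the 10 that match.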
As I commented in a recent similar discussion, these tools can't be used for updates or inserts. As useful as querying might be, it's terribly misleading to claim to "run SQL" if you can't change the data, since that's such a critical part of an SQL database.