
Launch HN: Hydra (YC W22) – Query any database via Postgres

326 points by coatue about 3 years ago
Hi HN, we're Joe and JD from Hydra (https://hydras.io/). Hydra is a Postgres extension that intelligently routes queries through Postgres to other databases. Engineers query regular Postgres, and Hydra extends a Postgres-compliant SQL layer to non-relational, columnar, and graph DBs. It currently works with Postgres and Snowflake, and we have a roadmap to support MongoDB, Google BigQuery, and ClickHouse.

Different databases are good at different things. For example, Postgres is good at low-latency transactional workloads, but slow when running analytical queries. For the latter, you're better off with a columnar database like Snowflake. The problem is that with each new database added to a system, application complexity increases quickly.

Working at Microsoft Azure, I saw many companies juggle database trade-offs in complex architectures. When organizations adopted new databases, engineers were forced to rewrite application code to support the new database, or to maintain multiple apps to offset database performance trade-offs. All of this is expensive busywork that frustrates engineers. Adopting new databases is hard and expensive.

Hydra automatically picks the right DB for the right task and pushes down computation, meaning each query will get routed to where it can be executed the fastest. We've seen results return 100X faster when executing against the right database.

We've chosen to integrate with Snowflake first so that developers can easily gain the analytical performance of Snowflake through a simple Postgres interface. To an application, Hydra looks like a single database that can handle both transactions and analytics. As soon as transactions are committed in Postgres, they are accessible for analytics in real-time. Combining the strengths of Postgres and Snowflake in this way results in what is sometimes called HTAP, Hybrid Transactional/Analytical Processing (https://en.wikipedia.org/wiki/Hybrid_transactional/analytical_processing), which is the convergence of OLTP and OLAP.

Existing solutions are manual and require communicating with each datastore separately. The common alternative is trying to combine all of your data in a data warehouse via ETL. That works well for analysts and data scientists, but it isn't transactional and can't be used to power responsive applications. With Hydra, engineers can write unified applications that cover workloads which previously had to be kept separate.

Hydra runs as a Postgres extension, which gives it the ability to use Postgres internals and modify the execution of queries. Hydra intercepts queries in real time and routes them based on query type, user settings, and Postgres' cost analysis. Writes and operational reads go to Postgres; analytical workloads go to Snowflake.

Recently committed transactions are moved from Postgres to Snowflake in near real-time using Hydra Bridge, our built-in data pipeline that links databases from within Postgres. The bridge is an important part of what we do. Without Hydra, workloads are typically isolated between different databases, requiring engineers to implement slow and costly ETL processes, and complex analytics are often run on older data, updated monthly or weekly. The Hydra Bridge allows for real-time data movement, enabling analytics to be run on fresh data.

We make money by charging for Hydra Postgres, a Postgres managed service, and Hydra Instance, which attaches Hydra to your existing Postgres database. Pricing is listed on the product pages: https://hydras.io/products/postgres and https://hydras.io/products/instance.

A little about our backgrounds: Joseph Sciarrino - former PM on the Azure Open-Source Databases team at Microsoft; Heroku (W08) and Citus Data (S11) alum. Jonathan Dance - Director at Heroku (2011-2021).

Using Hydra you can create a database cluster of your own design. We'd love to know what Hydra clusters you'd be interested in creating - for example, Elasticsearch + Postgres, or BigQuery + SingleStore + Postgres. Remember: you can experiment with different combinations without rewriting queries, since Hydra extends Postgres over these other databases. When you think about databases as interoperable parts, you can get super creative!
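To make the single-interface idea concrete, here's a rough sketch of what it looks like from an application's side (the connection string, table, and columns below are made up for illustration, not from our docs): the app talks to one Postgres endpoint with ordinary psycopg2, and Hydra decides where each statement actually runs.

    # Hypothetical example: plain psycopg2 against a Hydra Postgres endpoint.
    import psycopg2

    conn = psycopg2.connect("postgresql://app@hydra-host:5432/appdb")  # made-up DSN
    cur = conn.cursor()

    # Operational write: handled by Postgres itself.
    cur.execute(
        "INSERT INTO orders (customer_id, total_cents) VALUES (%s, %s)",
        (42, 1999),
    )
    conn.commit()

    # Analytical read: same connection and SQL dialect, but eligible to be
    # pushed down to Snowflake once the bridge has moved the rows over.
    cur.execute(
        "SELECT date_trunc('day', created_at) AS day, sum(total_cents) "
        "FROM orders GROUP BY 1 ORDER BY 1"
    )
    print(cur.fetchall())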

27 comments

gavinray about 3 years ago

> Hydra automatically picks the right DB for the right task and pushes down computation, meaning each query will get routed to where it can be executed the fastest. We've seen results return 100X faster when executing against the right database.

This is really interesting. Could you talk a bit more about query pushdown and planning/optimization?

Is this through FDWs? Would love to hear more about the technical details.

Shameless plug -- I work at Hasura (we turn DBs into GraphQL APIs) and this seems incredibly synergistic, and useful for getting access to databases we don't have native drivers for at the moment.

Any chance of an OSS limited version?
gunnarmorling about 3 years ago

Congrats on the launch! Two questions:

- How does this deal with the specifics of the query languages of the different data stores? I'm not an expert with Snowflake, but I suppose it supports specific querying capabilities not found in Postgres' SQL dialect. How are those exposed to Hydra users?

- I'm confused by "As soon as transactions are committed in Postgres, they are accessible for analytics in real-time" vs. "Recently committed transactions are moved from Postgres to Snowflake in near real-time". Is data propagated to Snowflake synchronously or asynchronously? I.e., is it guaranteed that data can be queried from Snowflake the moment after a transaction has been committed (as suggested by the former) or not (as suggested by the latter)?

Disclaimer: I work on Debezium, another solution people use for propagating data from different databases (including Postgres) into different data sinks (including Snowflake).
michaelmior about 3 years ago

For anyone interested, Apache Calcite [0] is an open source data management framework which seems to do many of the same things that Hydra claims to do, but takes a different approach. Operating as a Java library, Calcite contains "adapters" to many different data sources, from existing JDBC connectors to Elasticsearch to Cassandra. All of these different data sources can be joined together as desired. Calcite also has its own optimizer, which is able to push down relevant parts of a query to the different data sources. You still get full SQL on data sources which don't support it, with Calcite executing the remaining bits itself.

Generally, all that is required to connect to multiple data sources, from CSV to Elasticsearch, is writing a JSON configuration file. You can then get SQL access via JDBC, with the ability to join all those sources together.

Unfortunately, I would not be too surprised if Calcite's query execution turned out to be less performance-optimized than Hydra's. There is ongoing work on improvement there. That said, there are users of Calcite at Google, Uber, Spotify, and others who have made great use of various parts of the framework.

[0] https://calcite.apache.org/
garysahota93 about 3 years ago

This is super powerful. While I see the immediate value in simplifying applications, I can also see this becoming a powerful tool for data analysts & data engineers in speeding up their "time to insight".

I've had early-career analysts report to me who struggle to write optimal queries across relational, non-relational, & graph DBs (they're usually great at one & mediocre at the others). This will be huge for them & for the stakeholders who rely on them for trustworthy insights.
chrisweekly about 3 years ago

Wow. This seems like such a staggeringly good idea. Congrats on the launch, and kudos for bringing this to life! Curious about the overhead (i.e., benchmarks for the simplest scenario: vanilla Postgres vs. going through Hydra for the same queries and load). But unless there's a huge hit there (which seems unlikely), this seems like a really exciting development.
teej about 3 years ago

I've been thinking about this sort of thing for a while; your vision for how this should work is so much better than mine was.

One of the ideas I kicked around was "materialize-on-read" - when a query comes in but the underlying data is stale, refresh the views first, then serve the query.

I'm wondering how much state you plan to put into the Hydra layer, or if you plan to keep it mostly a router.
agacera about 3 years ago

This is really nice! Congrats!

I once started building something similar as a side project, but focused on querying cloud resources (like S3 buckets, EC2 instances, etc. - discovering the biggest file in a bucket was trivial with this). I abandoned the project, but someone else built a startup on the same concept - even the name was the same: cloudquery.

I built it using the multicorn [1] Postgres extension, and it is delightful how easy it is to get something simple running.

[1] https://multicorn.org/
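For anyone curious, here's roughly what a minimal multicorn wrapper looks like - a sketch, not the actual cloudquery code; the bucket listing and column layout below are made up for illustration:

    # A multicorn foreign data wrapper exposing the objects in an S3 bucket as
    # rows. After installing it, you'd point a foreign table at it with
    # CREATE SERVER ... FOREIGN DATA WRAPPER multicorn and a CREATE FOREIGN
    # TABLE whose columns match the dict keys yielded below.
    import boto3
    from multicorn import ForeignDataWrapper

    class S3ObjectsFDW(ForeignDataWrapper):
        def __init__(self, options, columns):
            super(S3ObjectsFDW, self).__init__(options, columns)
            # 'bucket' comes from the OPTIONS clause of the foreign table.
            self.bucket = options["bucket"]

        def execute(self, quals, columns):
            # Yield one dict per row; multicorn maps dict keys to columns.
            s3 = boto3.client("s3")
            paginator = s3.get_paginator("list_objects_v2")
            for page in paginator.paginate(Bucket=self.bucket):
                for obj in page.get("Contents", []):
                    yield {
                        "key": obj["Key"],
                        "size": obj["Size"],
                        "last_modified": obj["LastModified"].isoformat(),
                    }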
bradly about 3 years ago

This looks great! A couple of questions…

1.) Can you talk a bit about how this is better than the existing foreign data wrappers Postgres has available?

2.) Any thoughts on S3 support? More and more I see teams using S3 as a data store for certain specific use cases.
tluyben2 about 3 years ago

Shame it's not OSS, but I get that. About the 'no lock-in' statement on the site: if we see speedups in both development and execution performance by using this, and from that point on build everything on it to make working with data easier across the enterprise, how are we not locked in when you decide to do something else or sell to Oracle? The latter happened to us, pretty much exactly, and that's why anything that isn't OSS is a no-go for our dev infrastructure.

Definitely nice work though, and best of luck!
sequoia about 3 years ago

> Hydra automatically picks the right DB for the right task and pushes down computation, meaning each query will get routed to where it can be executed the fastest.

Does this mean the data is duplicated to all the available storage backends?
kleebeesh about 3 years ago

Looks neat, but wasn't this the promise of Presto? Presto didn't seem to really work out. From what I've seen it converged to a mostly analytical engine. It's still very useful, but I've never seen it used (successfully) in an OLTP workload. Maybe there's some difference in the intended product trajectory that I'm overlooking here?
sixdimensional about 3 years ago

Sounds like a federated query engine with a cost-based optimizer. I worked for a company that went pretty far down this path using another database.

Definitely a lot of potential, and also a lot of potential gotchas.

Translating from one SQL syntax (e.g. Postgres) to others while maintaining the full capability of the other system, for example, turned out to be quite complex (but doable).

I'll be following your project and wish you all the best. I suspect that if you keep things sharp/focused and don't go too crazy with promises or use cases, this could be quite successful.

Is this going to stay an intelligent router and a sort of proxy? Or do you have plans for federated, heterogeneous joins, for example?

That's where things get interesting :) I think there's a lot you can do without even having to go that far.
shafyy about 3 years ago

Awesome. Do you plan to support providing the hosted instance on other providers (specifically, non-US companies like Hetzner)?

Alternatively, do you plan on offering a self-hosted version?

I would be interested in the ClickHouse integration. Specifically, it would allow me to easily add ClickHouse to my Rails-based product analytics tool [0] while still using the Rails ORM (as far as I understand, this should be possible?).

[0] https://github.com/shafy/fugu
abledon about 3 years ago

I see a lot of software named after beasts, and a lot of 'Hydra' programs/companies all doing different things. Imagine if someone in 300 BC had known how much we would base our future creations on mythological beasts... they would've increased the CIDR range on available beast name ideas and written a whole bunch of extra stories.
tyingq about 3 years ago

Interesting. I'm curious about how you handle security now, and what the plans are. That is, is there any integration between the roles/rights my Postgres session user has and the roles/rights I have on the downstream database?
skrtskrt about 3 years ago
Could this also improve either developer experience or query performance when working with something like Redshift, which is a columnar OLAP store that already uses a Postgres dialect?
hangonhn about 3 years ago
This looks amazing. Love the strong Snowflake integration -- very forward looking. I just passed this onto our Data Science team.
jzelinskie about 3 years ago

What does it take to collaborate on a backend? We've investigated building a Postgres extension for querying SpiceDB [0], and Hydra seems like it could help. What kind of consistency guarantees can be made?

[0]: https://github.com/authzed/spicedb
aslakhellesoy about 3 years ago

Congratulations on the launch - this sounds interesting.

I'm currently using Postgraphile [0], which uses Postgres' introspection API to discover the schema structure.

Would this still work with Hydra?

[0] https://www.graphile.org/postgraphile/
mrits about 3 years ago

I don't have any experience with Hydra, but I first used FDWs to query external DBs about a decade ago. There was also a pretty popular DB sponsored by Facebook that seemed to do a lot of the same things - the name escapes me. Facebook used it pretty heavily internally, though.
imachine1980_ about 3 years ago

I like it, but how much can you truly do without the specifics of, say, BigQuery? Some things are so specific that you'll end up needing BigQuery features anyway, which sounds like an ORM with Postgres syntax. I like the idea.
samjbobb about 3 years ago

This is very interesting. I've been building an internal system that looks a lot like this for the current startup I'm at.

Will be following.
sitkack about 3 years ago
Great timing as Spanner just launched Postgres wire protocol support.
edublancas about 3 years ago

Congrats on the launch! Coming from a data science role, this could've been pretty useful for my previous projects. I had to rewrite all of my feature engineering queries when the company I worked at moved to Snowflake.

One question I have is how Hydra balances writing Postgres scripts vs. leveraging system-specific features. For example, I remember going through Snowflake's documentation and finding interesting functions for data aggregation. Can I leverage Snowflake-specific features when using Hydra?
CodeAlong about 3 years ago
Any plans to offer a self-hosted version of Hydra instance?
tullie about 3 years ago

Looking forward to the support for MongoDB and other NoSQL stores. Interested to hear how you're trying to approach that.
alexvboe about 3 years ago
Congrats on the launch, this is amazing!