
Launch HN: Prequel (YC W21) – Sync data to your customer’s data warehouse

118 points by ctc24 over 2 years ago
Hey HN! We're Conor and Charles from Prequel (https://prequel.co). We make it easy for B2B companies to send data to their customers. Specifically, we help companies sync data directly to their customer's data warehouse, on an ongoing basis.

We're building Prequel because we think the current ETL paradigm isn't quite right. Today, it's still hard to get data out of SaaS tools: customers have to write custom code to scrape APIs, or procure third-party tools like Fivetran to get access to their data. In other words, the burden of data exports is on the customer.

We think this is backwards! Instead, *vendors* should make it seamless for their customers to export data to their data warehouse. Not only does this make the customer's life easier, it benefits the vendor too: they now have a competitive advantage, and they get to generate new revenue if they choose to charge for the feature. This approach is becoming more popular: companies like Stripe, Segment, Heap, and most recently Salesforce offer some flavor of this capability to their customers.

However, just as it doesn't make sense for each customer to write their own API-scraping code, it doesn't make sense for every SaaS company to build its own sync-to-customer-warehouse system. That's where Prequel comes in. We give SaaS companies the infrastructure they need to easily connect to their customers' data warehouses, start writing data to them, and keep that data updated on an ongoing basis. Here's a quick demo: https://www.loom.com/share/da181d0c83e44ef9b8c5200fa850a2fd

Prequel takes less than an hour to set up: you (the SaaS vendor) connect Prequel to your source database/warehouse, configure your data model (aka which tables to sync), and that's pretty much it. After that, your customers can connect their database/warehouse and start receiving their data in a matter of minutes. All of this can be done through our API or in our admin UI.

Moving all this data accurately and in a timely manner is a nontrivial technical problem. We potentially have to transfer billions of rows / terabytes of data per day, while guaranteeing that transfers are completely accurate. Since companies might use this data to drive business decisions or in financial reporting, we really can't afford to miss a single row.

There are a few things that make this particularly tricky. Each data warehouse speaks a slightly different dialect of SQL and has a different type system (which is not always well documented, as we've come to learn!). Each warehouse also has slightly different ingest characteristics (for example, Redshift has a hard cap of 16MB on any statement), meaning you need different data loading strategies to optimize throughput. Finally, most of the source databases we read data from are multi-tenant, meaning they contain data from multiple end customers, and part of our job is to make sure that the right data gets routed to the right customer. Again, it's pretty much mission-critical that we don't get this wrong, not even once.

As a result, we've invested in extensive testing much earlier than it makes sense for most startups to. We also tend to write code fairly defensively: we always try to think about the ways our code could fail (or anticipate the bugs that might be introduced in the future), and make sure the failure path is as innocuous as possible. Our backend is written in Go, our frontend is React + TypeScript (we're big fans of compiled languages!), we use Postgres as our application DB, and we run the infra on Kubernetes.

The last piece we'll touch on is security and privacy. Since we're in the business of moving customer data, we know that security and privacy are paramount. We're SOC 2 Type II certified, and we go through annual white-box pentests to make sure all our code is up to snuff. We also offer on-prem deployments, so data never has to touch our servers if our customers don't want it to.

It's kind of surreal to launch on here: we're long-time listeners, first-time callers, and have been surfing HN since long before we first started dreaming about starting a company. Thanks for having us, and we're happy to answer any questions you may have! If you wanna take the product for a spin, you can sign up on our website or drop us a line at hn (at) prequel.co. We look forward to your comments!
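To make the Redshift constraint concrete, here is a minimal sketch of one such loading strategy: packing row tuples into as few INSERT statements as possible while keeping every statement under a byte cap. This is illustrative only, not Prequel's actual code; the table name and pre-escaped value tuples are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// maxStatementBytes reflects Redshift's hard 16MB cap on any single
// statement; other warehouses would get different limits and strategies.
const maxStatementBytes = 16 * 1024 * 1024

// batchInserts packs pre-escaped VALUES tuples into as few INSERT
// statements as possible without letting any statement exceed the cap.
func batchInserts(table string, rows []string) []string {
	prefix := fmt.Sprintf("INSERT INTO %s VALUES ", table)
	var stmts, batch []string
	size := len(prefix)
	for _, row := range rows {
		// +1 accounts for the comma between tuples. A single row bigger
		// than the cap would need a different path (e.g., a staged load).
		if len(batch) > 0 && size+len(row)+1 > maxStatementBytes {
			stmts = append(stmts, prefix+strings.Join(batch, ","))
			batch, size = nil, len(prefix)
		}
		batch = append(batch, row)
		size += len(row) + 1
	}
	if len(batch) > 0 {
		stmts = append(stmts, prefix+strings.Join(batch, ","))
	}
	return stmts
}

func main() {
	rows := []string{"(1,'a')", "(2,'b')", "(3,'c')"}
	for _, stmt := range batchInserts("events", rows) {
		fmt.Println(stmt)
	}
}
```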

21 comments

thdxr over 2 years ago
Wow, I've been looking for this for years! I've always thought SaaS companies waste time building yet another mediocre analytics dashboard when they should just sync their data.

My main thing is that I don't want to think in terms of raw data going from my database to the customer database; I have higher-level API concepts.

It would be cool if there were some kind of sync protocol I could implement, where Prequel sent a request with a "last fetched" timestamp and the endpoint replied with all the data to be updated.

Kind of like this: https://doc.replicache.dev/server-pull
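A minimal sketch of what such a vendor-side pull endpoint could look like, loosely modeled on Replicache's server-pull; the route, field names, and row shape are all hypothetical:

```go
package main

import (
	"encoding/json"
	"net/http"
	"time"
)

// pullRequest is what the sync service would POST to the vendor.
type pullRequest struct {
	LastFetched time.Time `json:"lastFetched"`
}

// pullResponse carries every record changed since LastFetched, plus
// the cursor the next pull should send back.
type pullResponse struct {
	Rows        []map[string]any `json:"rows"`
	LastFetched time.Time        `json:"lastFetched"`
}

func pullHandler(w http.ResponseWriter, r *http.Request) {
	var req pullRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	now := time.Now().UTC()
	// The vendor answers in terms of its higher-level API concepts,
	// not raw tables.
	resp := pullResponse{Rows: fetchChangedSince(req.LastFetched), LastFetched: now}
	json.NewEncoder(w).Encode(resp)
}

// fetchChangedSince is a placeholder for whatever backs the API.
func fetchChangedSince(since time.Time) []map[string]any {
	return nil
}

func main() {
	http.HandleFunc("/sync/pull", pullHandler)
	http.ListenAndServe(":8080", nil)
}
```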
soumyadeb over 2 years ago
Congrats on the launch. Very cool idea, and I've always wondered why this hasn't been done before.

One question though: don't you see Snowflake (or the other cloud data warehouse vendors) building this? Snowflake has to build native support for CDC from production databases like Postgres, MySQL, Oracle, etc. Once the data has landed in the SaaS vendor's Snowflake, it can be shared (well, the relevant rows a customer should have access to) with each customer.

Isn't that the right long-term solution? Or am I missing something here?
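For readers unfamiliar with the mechanism being described: Snowflake can already approximate the row-scoping half of this with a secure view wrapped in a share. A sketch, with every identifier (account, database, tables) invented for illustration:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/snowflakedb/gosnowflake" // registers the "snowflake" driver
)

func main() {
	// DSN is user:password@account/database/schema; all values here are fake.
	db, err := sql.Open("snowflake", "vendor:secret@my_account/exports_db/exports")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Scope the share to one tenant's rows via a secure view, then
	// grant it to that customer's Snowflake account.
	stmts := []string{
		`CREATE SECURE VIEW exports.customer_42_events AS
		   SELECT * FROM raw.events WHERE tenant_id = 42`,
		`CREATE SHARE IF NOT EXISTS customer_42_share`,
		`GRANT USAGE ON DATABASE exports_db TO SHARE customer_42_share`,
		`GRANT USAGE ON SCHEMA exports_db.exports TO SHARE customer_42_share`,
		`GRANT SELECT ON VIEW exports.customer_42_events TO SHARE customer_42_share`,
		`ALTER SHARE customer_42_share ADD ACCOUNTS = customer_account`,
	}
	for _, s := range stmts {
		if _, err := db.Exec(s); err != nil {
			log.Fatal(err)
		}
	}
}
```

The catch, as sv123 notes further down, is that this only works when both sides are on Snowflake.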
sails over 2 years ago
This is excellent (the idea, that is; I don't know anything about Prequel!), and a much-needed tool to support a reasonable trend. I fully support B2B companies taking on the responsibility of making data more readily available for analytics, beyond just exposing a fragile API.

For those unaware, this is a relatively recently established practice (direct to warehouse instead of via third-party ETL):

https://techcrunch.com/2022/09/15/salesforce-snowflake-partnership-moves-customer-data-in-real-time-across-systems/

https://stripe.com/en-gb-es/data-pipeline
wasd over 2 years ago
Awesome product.

1. Do you expect to support SQL Server? If so, do you know when?

2. I watched the Loom video. How should we handle multi-tenant data that requires a join? For example, let's say I want to send data specific to a school. The Student would belong to a Teacher, who belongs to a School.
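One plausible way for a vendor to express that join (illustrative only; the table and column names are invented) is to scope each sync query up to the tenant key:

```go
package main

import "fmt"

// A sketch of a tenant-scoped sync query for the example above, where
// rows reach a School only through Student -> Teacher -> School.
const syncStudentsForSchool = `
SELECT s.*
FROM students s
JOIN teachers t   ON t.id   = s.teacher_id
JOIN schools  sch ON sch.id = t.school_id
WHERE sch.id = $1`

func main() {
	// In a real pipeline this would run per destination, with the
	// customer's school ID bound to $1.
	fmt.Println(syncStudentsForSchool)
}
```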
gourabmi over 2 years ago
How do you deal with incompatible data types between the source and destination systems? For example, the source might have a timestamp-with-timezone data type while the destination only supports timestamps in UTC.
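As an illustration of the kind of coercion involved (not necessarily how Prequel handles it), a source timestamp-with-timezone value can be folded into UTC before writing, trading the offset for portability:

```go
package main

import (
	"fmt"
	"time"
)

// normalizeTimestamp coerces a timezone-aware source value into the
// plain-UTC representation a stricter destination can store.
func normalizeTimestamp(src time.Time) string {
	return src.UTC().Format("2006-01-02 15:04:05.000000")
}

func main() {
	src, _ := time.Parse(time.RFC3339, "2022-09-28T09:30:00+05:30")
	fmt.Println(normalizeTimestamp(src)) // prints 2022-09-28 04:00:00.000000
}
```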
leetrout over 2 years ago
Love it.

As we continue to move toward more and more composable architectures combining SaaS products, an offering like this is going to really give your users a leg up.

Edit: deleted a redundant question about SOC 2. I missed the whole paragraph on mobile.
sv123 over 2 years ago
Very cool. We've had great success using Snowflake's sharing to give our customers access to their data... That obviously falls apart if the customer wants data in BigQuery or somewhere else.
ztratar over 2 years ago
One of the first Show HNs I've read where my first thought was "would invest based purely on the single-sentence description."

Nice idea. Have fun executing!
buremba over 2 years ago
Congrats on your launch! What data sources can I add? Do they also need to be one of the databases listed here (1)? Does that mean I need to move the data into one of these databases in order to sync our customer data to their data warehouses?

(1) https://docs.prequel.co/reference/post_sources
mfrye0 over 2 years ago
Very cool. We've just started exploring customer requests to sync our data to their warehouses, so great timing.

What sort of scale can you handle? One of our DBs is in the billions of rows.

I assume we'd potentially need to create custom sources for each destination as well? Or does your system automatically figure out the best "common" schema across all destinations? For example, an IP subnet column.
rco8786 over 2 years ago
Oh man, I love this idea. I've worked on a team that used our own data warehouse extensively, and more recently I went through an extensive private API integration with a third party. Being able to just sync data back and forth between each other's warehouses solves a whole host of common partnership problems much more quickly and cheaply.
mardo5 over 2 years ago
Hey, congrats on the launch! One question that comes to mind is about the T in ETL: ingestion tools like Fivetran and Stitch allow you to do light transformations on the incoming data. This process is usually handled by the ingestion team, which then hands the data off to the different value teams.
r_thambapillai over 2 years ago
Would you support the inverse of this - sending data to your vendors? If you buy a SaaS tool that needs to ingest certain data from your data warehouse (or SaaS tools) into the vendor's Snowflake / data warehouse, would Prequel be worth looking at for achieving that?
ianbutler over 2 years ago
Hey guys, we spoke a while back. Glad to see you're still going at it! Good luck with everything.
yevpats over 2 years ago
It's an interesting take, but I worry this might not be possible in a lot of cases, because not everything is database-backed: sometimes you have logic behind the API, and you actually want the result of that API in your data warehouse. Take the AWS API, for example: even their own AWS Config team uses the same AWS APIs, because there is no one place where the data just resides (of course it would be great if there were; it would make life very easy).

I think this is a good tweet explaining why the problem will never be fully solved: https://twitter.com/mattrickard/status/1542193426979909634

Full disclaimer: I'm the founder and original author of https://github.com/cloudquery/cloudquery
aerzen over 2 years ago
Do you know about the name clash with PRQL [1]?

While it doesn't have the same spelling, the pronunciation is the same.

[1] https://github.com/prql/prql
blakeburch over 2 years ago
Love the idea! I really see the value in shifting the conversation towards vendors themselves being responsible for pushing data to customers, and it makes a lot of sense to do it directly DB -> DB.

However, building a data product myself (Shipyard), we really try to encourage the idea of "connecting every data touchpoint together" so you can get an end-to-end view of how data is used and prevent downstream issues from ever occurring. This raised a few questions:

1. If the vendor owns the process of when the data gets delivered, how would a data team have its pipelines react to the completion or failure of that specific vendor's delivery? Or does the ingestion process just become more of a black box?

While relying on a third-party ingestion platform or running ingestion scripts on your own orchestration platform isn't ideal, it at least centralizes the observability of ongoing ingestion processes in a single location.

2. From a business perspective, do you see a tool like Prequel encouraging businesses to restrict their data exports behind their own paywall rather than making the data accessible via external APIs?

Would love to connect and chat more if you're interested! Contact is in bio.
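To make question 1 concrete, here is a sketch of a delivery-status webhook consumer on the data team's side; the payload shape and endpoint are entirely hypothetical, since nothing in the thread says Prequel exposes such a hook:

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// transferEvent is a hypothetical delivery-status payload a vendor
// could emit when a sync finishes or fails.
type transferEvent struct {
	Tenant string `json:"tenant"`
	Table  string `json:"table"`
	Status string `json:"status"` // "succeeded" or "failed"
	Rows   int64  `json:"rows"`
}

func webhookHandler(w http.ResponseWriter, r *http.Request) {
	var ev transferEvent
	if err := json.NewDecoder(r.Body).Decode(&ev); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	switch ev.Status {
	case "succeeded":
		// Kick off downstream models that depend on this table.
		log.Printf("sync of %s done (%d rows); triggering downstream jobs", ev.Table, ev.Rows)
	case "failed":
		// Alert, and hold dependent jobs instead of running on stale data.
		log.Printf("sync of %s failed; alerting", ev.Table)
	}
	w.WriteHeader(http.StatusNoContent)
}

func main() {
	http.HandleFunc("/hooks/vendor-sync", webhookHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```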
dwiner over 2 years ago
Brilliant idea. Can see many use cases for this with our company. Congrats on the launch!
mrwnmonm over 2 years ago
Since a lot of services do integrations these days, I wonder: are there some common connectors they're all using?
sgammon over 2 years ago
This is very cool. I applied! We have a use for it now (we're starting a B2B thing). Good luck!
publiccomps over 2 years ago
This looks awesome!