科技回声

Hey HN! Charles here from Prequel (<a href="https://prequel.co">https://prequel.co</a>). We just launched the ability to sync data from your own app/db/data warehouse to any of your customer’s Google Sheets, CSV, or Excel – and I wanted to share a bit more about how we built the Google Sheets integration. If you’re curious, see here for a quick GIF demo of our Google Sheets destination: <a href="https://storage.googleapis.com/hn_asset/Prequel_GoogleSheetsDemo.webp" rel="nofollow">https://storage.googleapis.com/hn_asset/Prequel_GoogleSheets...</a>.Quick background on us: we make it easy to integrate with and sync data to data warehouses. Problem is, there are plenty of folks who want access to their data, but don’t have or don’t know how to use a data warehouse. For example, FP&A teams, customer success teams, etc.To get around that, we added some non-db destinations to Prequel: Google Sheets, CSV, and Excel. We had to rework some core assumptions in order to get Google Sheets to work.By default, Prequel does incremental syncs, meaning we only write net new or updated data to the destination. To avoid duplicate rows, we typically perform those writes as upserts – this is pretty trivial in most SQL dialects. But since Google Sheets is not actually a db, it doesn’t have a concept of upserts, and we had to get creative.We had two options: either force all Google Sheets syncs to be “full refreshes” every time (eg grab all the data and brute-force write it to the sheet). The downside is, this can get expensive quickly for our customers, especially when data gets refreshed at higher frequencies (eg every 15 minutes).The other, and better, option was to figure out how to perform upserts in Sheets. To do so, we read the data from the sheet we’re about to write to into memory. We store it in a large map by primary key. We reconcile it with the data we’re about to write. We then dump the contents of the map back to the sheet. In order to make the user experience smoother, we also sort the rows by timestamp before writing it back. This guarantees that we don’t accidentally shuffle rows with every transfer, which might leave users feeling confused.“Wait, you keep all the data in memory… so how do you avoid blowing up your pods?”. Great question! Luckily, Google Sheets has pretty stringent cell / row size limits. This allows us to restrict the amount of data that can be written to these destinations (we throw a nice error if someone tries to sync too much data), and thereby also guarantees that we don’t OOM our poor pods.Another interesting problem we had to solve was auth: how do we let users give us access to their sheets in a way that both feels intuitive and upholds strong security guarantees? It seemed like the cleanest user experience was to ask the spreadsheet owner to share access with a new user – much like they would with any real human user. To make this possible without creating a superuser that would have access to _all_ the sheets, we had to programmatically generate a different user for each of our customers. We do this via the GCP IAM API, creating a new service account every time. We then auth into the sheet through this service account.One last fun UX challenge to think through was how to prevent users from editing the “golden” data we just sync’d. It might not be immediately clear to them that this data is meant as a source of truth record, rather than a playground. To get around this, we create protected ranges and prevent them from editing the sheets we write to. Sheets even adds a little padlock icon to the relevant sheets, which helps convey the “don’t mess with this”.If you want to take it for a spin, you can sign up on our site or reach us at hello (at) prequel.co. Happy to answer any other questions about the design!

10 条评论

ricktdotorg超过 2 年前

> To make this possible without creating a superuser that would have access to _all_ the sheets, we had to programmatically generate a different user for each of our customers. We do this via the GCP IAM API, creating a new service account every time. We then auth into the sheet through this service account.nice solution! i've worked with updating sheets in real time with data from the Admin SDK using the Sheets API and service accounts, but yeah that domain-wide delegation is fine for our corp stuff but not so good for customers!great work, will give this a try out for sure! :thumbsup:

评论 #34558071 未加载

char_star_star超过 2 年前

I wonder how long it will be until you have a customer using this just to share data to their own internal teams just so the data team doesn't have to mess around with Sheets!Also—I want Prequel for Zendesk and Greenhouse (and Asana and ...) so badly. There are so many more interesting things I want to be doing with my time at work than babysitting pipelines.

评论 #34552334 未加载

stephen超过 2 年前

Interesting! Any plans for SalesForce support?We'd love to have a way to easily sync our internal system's data in/out of SFDC....and a source of GraphQL? :-)One of our engineers recently suggested syncing our PG database to Airtable, solely b/c Airtable has out-of-the-box SFDC integration (webhooks/etc), so our SFDC team could get at the data easier than they could from our PG database.I'm hesitant about "Airtable as our official 3rd-party integration strategy", but it does make me pine for a "protocol" for one-way/two-way real-time/batch syncing between two systems that just want to share "dumb" tables+fields/entities.I was thinking Zapier might have that, like if we implemented the Zapier-TM backend protocol on top of our custom system, and it would ~insta integrate with everything, but in my ~10 minutes of scanning their docs, it seemed less "protocol" and more "configure all the things via clicking around".

评论 #34558103 未加载

评论 #34554906 未加载

评论 #34554860 未加载

评论 #34563054 未加载

评论 #34555409 未加载

rubenv超过 2 年前

Not very familiar with the sheets API, but was wondering: how are rate limits applied? Per sheet or in total?

评论 #34549951 未加载

rexreed超过 2 年前

Would an alternative to this approach be to let your customers copy a spreadsheet of yours that already has an API function connection to your data? This would avoid the need to create users and upserts and all that, no? Or am I missing something?

评论 #34558082 未加载

评论 #34557356 未加载

typingmonkey超过 2 年前

Is there a way to get a stream of changes out of prequel (via websocket)? That would make it possible to live-sync between the customer's devices.

评论 #34550600 未加载

mritchie712超过 2 年前

You should get Apollo to buy Prequel. Fivetran doesn't support Apollo and I could really use my Apollo data in Snowflake.

评论 #34550708 未加载

ian0超过 2 年前

Nice! Is it a two way sync? I noticed the sheets were locked in the demo.

评论 #34558092 未加载

tlarkworthy超过 2 年前

What about importing sheets too?

评论 #34558089 未加载

orlp超过 2 年前

That's a bit of an unfortunate name clash with PRQL, also pronounced prequel: <a href="https://prql-lang.org/" rel="nofollow">https://prql-lang.org/</a>

评论 #34555358 未加载

10 条评论

ricktdotorg超过 2 年前

评论 #34558071 未加载

char_star_star超过 2 年前

评论 #34552334 未加载

stephen超过 2 年前

评论 #34558103 未加载

评论 #34554906 未加载

评论 #34554860 未加载

评论 #34563054 未加载

评论 #34555409 未加载

rubenv超过 2 年前

Not very familiar with the sheets API, but was wondering: how are rate limits applied? Per sheet or in total?

评论 #34549951 未加载

rexreed超过 2 年前

评论 #34558082 未加载

评论 #34557356 未加载

typingmonkey超过 2 年前

Is there a way to get a stream of changes out of prequel (via websocket)? That would make it possible to live-sync between the customer's devices.

评论 #34550600 未加载

mritchie712超过 2 年前

You should get Apollo to buy Prequel. Fivetran doesn't support Apollo and I could really use my Apollo data in Snowflake.

评论 #34550708 未加载

ian0超过 2 年前

Nice! Is it a two way sync? I noticed the sheets were locked in the demo.

评论 #34558092 未加载

tlarkworthy超过 2 年前

What about importing sheets too?

评论 #34558089 未加载

orlp超过 2 年前

That's a bit of an unfortunate name clash with PRQL, also pronounced prequel: <a href="https://prql-lang.org/" rel="nofollow">https://prql-lang.org/</a>

评论 #34555358 未加载

Show HN: Syncing data to your customer’s Google Sheets

10 条评论

Show HN: Syncing data to your customer’s Google Sheets

10 条评论