How PlanetScale Boost serves SQL queries faster

206 points by mrbbk over 2 years ago

25 comments

Jonhoo over 2 years ago

:wave: Author of the paper this work is based on here.
I'm so excited to see dynamic, partially-stateful data-flow for incremental materialized view maintenance becoming more widespread! I continue to think it's a _great_ idea, and the speed-ups (and complexity reduction) it can yield are pretty immense, so seeing more folks building on the idea makes me very happy.
The PlanetScale blog post references my original "Noria" OSDI paper (https://pdos.csail.mit.edu/papers/noria:osdi18.pdf), but I'd actually recommend my PhD thesis instead (https://jon.thesquareplanet.com/papers/phd-thesis.pdf), as it goes much deeper into some of the technical challenges and solutions involved. It also has a chapter (Appendix A) that covers how it all works by analogy, which the less-technical among the audience may appreciate :) A recording of my thesis defense on this, which may be more digestible than the thesis itself, is also online at https://www.youtube.com/watch?v=GctxvSPIfr8, as well as a shorter talk from a few years earlier at https://www.youtube.com/watch?v=s19G6n0UjsM. And the Noria research prototype (written in Rust) is on GitHub: https://github.com/mit-pdos/noria.
As others have already mentioned in the comments, I co-founded ReadySet (https://readyset.io/) shortly after graduating specifically to build off of Noria, and they're doing amazing work to provide these kinds of speed-ups for general-purpose relational databases. If you're using one of those, it's worth giving ReadySet a look to get these kinds of speed-ups there!
It's also source-available @ https://github.com/readysettech/readyset if you're curious.

marzoevam over 2 years ago

It's super exciting to see Noria-based partially materialized views get this well-deserved airtime! Eliminating error-prone caching logic without any code or infrastructure changes in the context of _any_ database is our core mission over at ReadySet, and is the reason why Jon Gjengset and I spun the company out of MIT research on Noria back in 2020. You can read more in our initial announcement here: https://readyset.io/blog/introducing-readyset
If you're reading this announcement post and want to play around with instant query caching à la Noria in your existing Postgres or MySQL database, shoot me an email and we'll bump you up on our cloud waitlist :) alana@readyset.io

vyrotek over 2 years ago

This reminds me a little of "materialized views". But essentially every query is potentially a view you can materialize (cache). And with this being managed at the DB level, it knows when new data invalidates the previous results.
Traditionally, other materialized view implementations have had very strict query requirements though. The queries had to be deterministic. No left joins, dates, etc. This is required in order to properly detect when data changes "impact" the view. I wonder how they get around it.
Update: Ah, ok! Here's a write-up on how it works a bit. My last startup built a system like this specifically to power a gamification engine. Would have been nice to have this 10 years ago.
https://planetscale.com/blog/how-planetscale-boost-serves-your-sql-queries-instantly
> The Boost cluster lives alongside your database’s shards and continuously processes the events relayed by the VStream. The result of processing these events is a partially materialized view that can be accessed by the database’s edge tier. This view contains some, but not all, of the rows that could be queried.
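To make the quoted description a bit more concrete, here is a minimal Python sketch of a partially materialized view. It is a toy under stated assumptions, not PlanetScale's or Noria's actual implementation: the class `PartialStarCountView`, its method names, and the `(repo_id, user_id)` row shape are all hypothetical, and the in-memory list stands in for the upstream table and its change stream. It only shows the basic idea: results are computed on first read, then kept current by applying each change event, and keys nobody has read are simply not stored.

```python
class PartialStarCountView:
    """Toy partially materialized view: star count per repository.

    Hypothetical illustration only. `base` stands in for the upstream
    table, and on_insert/on_delete stand in for replicated change events.
    """

    def __init__(self, base_rows):
        self.base = list(base_rows)   # rows of (repo_id, user_id)
        self.cache = {}               # only keys that were read are stored

    def read(self, repo_id):
        if repo_id not in self.cache:
            # Cache miss: compute the result from the base data once.
            self.cache[repo_id] = sum(1 for r, _ in self.base if r == repo_id)
        return self.cache[repo_id]

    def on_insert(self, repo_id, user_id):
        self.base.append((repo_id, user_id))
        # Update the cached result in place; skip keys that are not cached.
        if repo_id in self.cache:
            self.cache[repo_id] += 1

    def on_delete(self, repo_id, user_id):
        self.base.remove((repo_id, user_id))
        if repo_id in self.cache:
            self.cache[repo_id] -= 1

    def evict(self, repo_id):
        # Dropping a key is always safe: the next read recomputes it.
        self.cache.pop(repo_id, None)


view = PartialStarCountView([(1, 10), (1, 11), (2, 10)])
print(view.read(1))      # computed on first read
view.on_insert(1, 12)    # cached entry is updated, no TTL or invalidation
print(view.read(1))
```

Note that repository 2 never enters the cache in this run because nothing read it, which is the "some, but not all, of the rows" part of the quote.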

joshstrange over 2 years ago

> As rows are inserted, updated, and deleted in the database, the cache is kept up-to-date in real-time, just like a read replica. No TTLs, no invalidation logic, and no caching infrastructure to maintain.
This is so freaking neat. Caching is one of the harder things to get consistently right. Even if this were a tool that had TTLs plus an API to invalidate, it would be cool, but not even having to worry about that is even better.
PlanetScale continues to be an awesome service that lets you not worry about your DB and instead focus on your application.
My only wish for PlanetScale would be a few more (lower) tiers. Their free tier is very generous but has a few little things (like more than 1 dev/prod branch) that aren't supported, and I always feel antsy about not having a prod-like DB for qa/staging. I normally use 3 branches and the free plan only supports 2, which I think changed; I thought I used more than 1 dev branch before I started paying.
I have a very burst-y application (it's for events, so it ramps up a few months before the event, then is crazy for 2-7 days during, then usage drops to pretty much 0 for the next ~9 months). I'd love to lower my costs for those 9 months (I could look into downgrading to the free plan but I'd rather pay just a little less and have my quotas drop accordingly). In the end PlanetScale is still worth it for me at $360/yr so I'm not complaining too much. For smaller projects I just worry about using the PS free tier, since if I go over those limits the jump is steep ($0->$30/mo). That said, I might be overthinking it.

Nican over 2 years ago

Awesome! I have seen PlanetScale hype up this release for weeks, and I'm glad to finally be reading about it.
My initial thoughts after reading the blog post, just to poke holes in their new product:
1. Costs. This can save time on reads, but it also introduces additional writes to the database, which can be pretty expensive. PlanetScale can scale horizontally, but you have to watch how much you are going to be paying for the extra machines. (Albeit machines are usually cheaper than developers.)
2. Consistency. It was not clear whether committing transactions will be slower in order to keep all the views up to date, or whether the materialized view runs slightly behind real time.
2a. And how does the materialized view handle large/slow transactions? Are there going to be any kind of serialization locks? Are the views correct inside of the transaction?
3. Predictability. Query planning is a necessary hell, and different queries might have different patterns that introduce slightly different materialized views that could maybe have been served by the same view, increasing the cost.
3a. SQL Server took a slightly different route lately for performance, in which queries will have different plans depending on the table statistics. I wonder how such a feature would play with Boost, and if slightly different query plans might generate different materialized views.

edmundsauto over 2 years ago

I just started a small hobby project and selected Supabase for my DB provider. Anyone with experience in both Supabase and PlanetScale care to comment about the differences?
To me, it looks like Supabase is designed to take full advantage of Postgres features. plpgsql triggers + RLS + clientside auth + streaming changes to subscribers (including via web hooks) are my favorite features. (They also have JS edge functions, but I use Lambda instead b/c I prefer Python.)
Supabase feels like the scrappy company with amazing focus, akin to an early MailChimp (circa 2007). PlanetScale feels more like early Snowflake: massive scale, focus on performance, can match anything feature-by-feature. One is a master of their craft, the other is a gorilla at scale.
Curious what others think. I haven't used PlanetScale extensively so don't have much to go on except their marketing.

Eclyps over 2 years ago

I just started using PlanetScale for small projects here and there. More and more of my projects are FE-heavy and don't require a big dedicated database (NextJS apps, mostly hardcoded designs or a headless CMS like Sanity). There are times where I need to store just small bits of data, maybe contact form submissions or something. It's been super great to be able to quickly hook up PlanetScale to a NextJS API function and have that data persisted within a matter of minutes.
I've yet to use it on anything large-scale, though, so I can't speak to performance when you're really pushing it.

dianfishekqi over 2 years ago

It looks like it uses the same ideas as Noria:
https://www.youtube.com/watch?v=s19G6n0UjsM
https://github.com/mit-pdos/noria

_ben_ over 2 years ago

For database caching outside of PlanetScale, PolyScale.ai [1] provides a serverless database edge cache that is compatible with Postgres, MySQL, MariaDB and MS SQL Server. It requires zero configuration, sizing, etc.
[1] https://www.polyscale.ai/

hotdamnson over 2 years ago

Why do these new big thing databases make SQL look like some witchcraft?
Here is a proper SQL query:
<pre><code>SELECT DISTINCT
  r.id, r.owner_id, r.name,
  COUNT(r.id) OVER (PARTITION BY r.id) AS COUNT
FROM repository r
JOIN star s ON s.repository_id = r.id
ORDER BY 4 DESC;
</code></pre>
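For anyone who wants to try a query of this shape locally, here is a minimal sketch using Python's built-in `sqlite3` module. The table definitions and sample rows are assumptions made up for illustration (the comment above only shows the query itself), and the query is a swapped-in `GROUP BY` formulation rather than the window-function version: for a `repository` table with a unique `id`, both return one row per starred repository with its star count.

```python
import sqlite3

# In-memory database with a hypothetical schema matching the column
# names used in the query above; the sample data is invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE repository (id INTEGER PRIMARY KEY, owner_id INTEGER, name TEXT);
CREATE TABLE star (repository_id INTEGER, user_id INTEGER);
INSERT INTO repository VALUES (1, 100, 'alpha'), (2, 101, 'beta'), (3, 102, 'gamma');
INSERT INTO star VALUES (1, 7), (1, 8), (1, 9), (2, 7);
""")

# One row per repository that has at least one star, most-starred first.
rows = conn.execute("""
SELECT r.id, r.owner_id, r.name, COUNT(s.repository_id) AS star_count
FROM repository r
JOIN star s ON s.repository_id = r.id
GROUP BY r.id, r.owner_id, r.name
ORDER BY star_count DESC
""").fetchall()

for row in rows:
    print(row)
```

Because this is an inner join, the repository with no stars ('gamma' in the sample data) does not appear in the result, which is also true of the original query.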

obviyus over 2 years ago

Can anyone who has used PlanetScale in production comment on their experience? I was evaluating a few options a couple of weeks ago but ended up going with just RDS due to the lack of feedback on PlanetScale here on HN.

p10jkle over 2 years ago

See also https://readyset.io/ for generic SQL support (not just PlanetScale)

emptysea over 2 years ago

I’m really curious how this works and how its implementation compares to something like Materialize. I wonder if there are any caveats around consistency.

kerblang over 2 years ago

It appears the catch is that you have to use their managed service; no DIY installation. https://planetscale.com/docs/concepts/deployment-options
Acceptable for some, maybe not others.

capableweb over 2 years ago

Slightly off-topic, but I'm trying to understand something from the landing page:
> Powered by open source tech - Built at Google to scale YouTube.com to billions of users
Is this a Google project/business owned by Alphabet? The text seems to indicate so, but I find no information about it when doing some quick searching or browsing through the website.

stalluri over 2 years ago

VStream looks super cool. Can we also use it to create subscriptions that can bind with React Hooks on the front end? I think PlanetScale can easily deliver subscriptions that are as good as or better than Firebase's. All we need is React and NextJS SDKs to get started with :-)

aantix over 2 years ago

Didn't MySQL implement query-level caching a while back?

CharlesW over 2 years ago

Dupe: https://news.ycombinator.com/item?id=33610996

ISL over 2 years ago

Just once, I want the solution presented by a headline like this to be, "Well, we used a lot more computers."

netcraft over 2 years ago

meta: There is a typo in this sentence (you -> your):
> But there are also disadvantages: these views are not very ergonomic when developing you application

endisneigh over 2 years ago

Has anyone compared this with CockroachDB?

bsnnkv over 2 years ago

PlanetScale is such a cool name, fits really well for a database company. Just goes to show that even these days when I think that naming something new is impossible, there is still a lot of room to be creative.

xmorse over 2 years ago

Query memoization with optimistic updates

kevinburke over 2 years ago

Seems neat, but why is this better than Hadoop?

theonealtair over 2 years ago

Everything about their product is overstated and/or not relevant for most apps. It's easy to get a 1000x query performance improvement by starting with an extremely slow query. By that standard I could say that I've used create index statements to get 1,000,000x performance. The language is so over-the-top it makes me not even want to read the article through. I work in a real world with real database problems every day. I would love to have real discussions and solutions to performance improvements. Making irrelevant claims just shuts that down.