This isn't really edge. It's multi-region. It's a great intro to multi-region considerations, but it's not edge.<p>Edge implies putting your stuff in a CDN-sized datacenter that has only a subset of the full regional services, may not be able to scale up significantly on demand, may be more expensive, and may have less local storage and failover redundancy. The multi-region considerations come in here too, but there's a whole extra set of things to worry about.<p>Basically you rarely want to deploy your whole app to the edge; have a central or multi-region thing under the hood, and let the edge handle some very specific functionality. And even then, only if a few ms of latency makes a big difference to your clients (and they can deal with the lack of failover redundancy).
> Most internet apps are read-heavy, there are more reads than writes, so largely it makes sense to deal with the latency on writes. We can do this by forwarding the writes to some leader for that piece of data (e.g. the log for usernames), and waiting until that write is replicated back to us (so we can read our own writes).<p>This is precisely the point. For most CRUD applications, writes are so sporadic and user-initiated that you can show a spinning wheel for them, and users will forgive the slowness because it happens so rarely.<p>Edge computing is basically reinventing insecure HTTP caches and mirrors, but allowing you to include authn/authz so that user data stays private to the user. If your edge data model is much more complicated than that, you're probably doing it wrong.
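To make that concrete: a minimal sketch of the "forward writes to the leader, wait for replication" pattern, assuming a Postgres primary as the leader and a nearby streaming replica. The connection objects and the usernames table are hypothetical; the LSN functions are standard Postgres.

```typescript
// Sketch: write to a remote leader, then wait until the local replica
// has replayed past the write's WAL position before reading it back.
import { Client } from "pg";

async function writeThenReadLocally(
  primary: Client, // far-away leader that accepts writes
  replica: Client, // nearby replica we prefer to read from
  username: string,
) {
  // 1. Forward the write to the leader and capture its WAL position.
  await primary.query("INSERT INTO usernames (name) VALUES ($1)", [username]);
  const { rows } = await primary.query(
    "SELECT pg_current_wal_insert_lsn() AS lsn",
  );
  const writeLsn: string = rows[0].lsn;

  // 2. Poll until the local replica has replayed past that position.
  for (;;) {
    const res = await replica.query(
      "SELECT pg_wal_lsn_diff(pg_last_wal_replay_lsn(), $1::pg_lsn) >= 0 AS caught_up",
      [writeLsn],
    );
    if (res.rows[0].caught_up) break;
    await new Promise((r) => setTimeout(r, 10)); // brief back-off
  }

  // 3. Now it's safe to read our own write from the nearby replica.
  return replica.query("SELECT * FROM usernames WHERE name = $1", [username]);
}
```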
I've seen people debating what an 'edge' even means, and wanted to chime in. Note that 'edge' is poorly defined in the industry right now. Consider the following:<p>• Remote Region Datacenter — It's certainly not this<p>• Local Region Datacenter — It's not this<p>• Local Municipality/County Datacenter/Co-lo/Central Office — Could it be this?<p>• Local mini-datacenters attached to mobile towers or fiber digital loop carriers — Could it also be this?<p>• On-prem Datacenter — It's not this either<p>A lot of people argue that an 'edge' has to be in the field — like a local mobile tower or a digital loop carrier. I'd argue that it could also be at a nearby co-lo facility or CO. Basically anything within 300 km is roughly 1.5 ms away one-way over fiber (about 3 ms round trip). Within 75-100 km you're talking 0.4-0.5 ms one-way. And for most applications these days that seems "real-time" enough. YMMV.
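For back-of-the-envelope checking, a sketch of the distance-to-latency arithmetic, assuming light in fiber covers roughly 200 km per ms (about two-thirds of c):

```typescript
// Rough propagation-delay math; real paths add routing and queueing delay.
const FIBER_KM_PER_MS = 200; // light in fiber travels at ~2/3 c

const oneWayMs = (km: number): number => km / FIBER_KM_PER_MS;

console.log(oneWayMs(300)); // 1.5 ms one way, ~3 ms round trip
console.log(oneWayMs(100)); // 0.5 ms one way
console.log(oneWayMs(75));  // 0.375 ms one way
```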
This kind of load distribution is important but please, let’s not redefine the term “edge”. TCP/IP was specifically designed so that the devices at the edge function as peers. Computing on the edge means, say, doing the computing in your phone or laptop or IoT frob.<p>This is longstanding usage, and polluting a useful technical term just leads to unnecessary confusion.
At last, an examination of what 'edge' computing actually involves. Despite the appealing promises in posts from Fly.io and others that depict 'edge' computing as a simple win, the reality can be more complex.<p>I have recently spent a fair bit of time experimenting with this on Fly for my application (<a href="https://www.ssrfproxy.com" rel="nofollow noreferrer">https://www.ssrfproxy.com</a>). It's hard to beat the straightforwardness of deploying in a single region, with the database in close proximity. That approach probably meets the needs of 99% of developers. Aka Heroku.
The real "edge" is the users device, I'm 100% sold on the concept of "local first". There is a great community building around this: <a href="https://localfirstweb.dev/" rel="nofollow noreferrer">https://localfirstweb.dev/</a><p>The point is, by the time you have solved all the problems with consultancy and sync to an edge node, you are most of the way to local first.
Without getting into a discussion of what the correct definition of "edge" is, I think the article offers two solutions to this problem: you have an internet app and you need to provide your users with low-latency reads while maintaining uniqueness constraints and read-your-writes consistency.<p>The article favours dealing with latency during writes, while making some assumptions:
- most internet apps are read-heavy (reasonable, but what if it's not the case for you?)
- you do not need sharding, so all regions hold all the data (what if you have a ton of data that is growing quickly? what if there are periods when some data is heavily contended?)
- the primary replica used for writes is highly available (how do you automate handling a primary replica failure, or the primary getting partitioned away?)<p>Another approach to consider would be Amazon Dynamo-style databases (Cassandra, Riak, etc.), which can help when the above assumptions are not met:
- Dynamo was designed to be always-writable and to suit write-heavy as well as read-heavy workloads (e.g. the paper mentions teams using it for catalog data with R=1 and W=N; see the quorum sketch after this list)
- data is partitioned/sharded using a variant of consistent hashing, which helps with large data volumes and also limits the impact of hot spots
- failure detection and leader election help automate handling the partitions for which a failed node was holding a primary replica
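The quorum tuning mentioned above works out as simple arithmetic, illustrated here as a toy sketch rather than any particular database's API: with N replicas per key, a write waits for W acknowledgements and a read consults R replicas. The config type and values are hypothetical.

```typescript
// If R + W > N, every read quorum overlaps every write quorum, so a read
// is guaranteed to reach at least one replica holding the latest
// acknowledged write; otherwise reads may be stale.
interface QuorumConfig {
  n: number; // total replicas per key
  r: number; // replicas a read must reach
  w: number; // replicas a write must ack
}

const quorumsOverlap = ({ n, r, w }: QuorumConfig): boolean => r + w > n;

// The catalog example from the Dynamo paper: fast single-replica reads,
// paid for by writing to all replicas.
console.log(quorumsOverlap({ n: 3, r: 1, w: 3 })); // true (R=1, W=N)
// Always-writable tuning: cheap writes and reads, but reads can be stale.
console.log(quorumsOverlap({ n: 3, r: 1, w: 1 })); // false
```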
The best use case for the edge is serving static resources for the initial web app load. It achieves a latency of around 40ms and shows the user the initial render almost instantly. The later database CRUD actions are much less important once the application has loaded. Not only because much of the data can be cached in the browser or aggressively pre-fetched, but also because many CRUD actions run in the background and only surface errors if something went wrong.
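A minimal sketch of the header side of that split, assuming content-hashed filenames under a hypothetical /assets/ path (Response and Headers are web-standard APIs):

```typescript
// Fingerprinted static assets can be cached "forever" at the edge and in
// the browser; dynamic CRUD endpoints bypass caches entirely.
function withCaching(pathname: string, response: Response): Response {
  const headers = new Headers(response.headers);
  if (pathname.startsWith("/assets/")) {
    // Content-hashed filenames never change, so any cache may keep them.
    headers.set("Cache-Control", "public, max-age=31536000, immutable");
  } else {
    // Dynamic responses: never serve from cache.
    headers.set("Cache-Control", "no-store");
  }
  return new Response(response.body, { status: response.status, headers });
}
```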
<i>>If a user makes a request from Europe, and the apps run in US East, that adds an extra 100-150ms of latency just by round-tripping across the Atlantic.</i><p>These numbers seem high. This site shows transatlantic RTT of around 75ms: <a href="https://wondernetwork.com/pings" rel="nofollow noreferrer">https://wondernetwork.com/pings</a><p>If a CDN or edge computing platform reduces that to 20ms, then the difference is 55ms. And it matters only for read requests that cannot be cached locally.<p>Whether or not it's worth it depends largely on the number of requests. Perhaps reducing that number should be prioritised.
Leaving aside what "edge" really means, for me personally, this is about connecting a website running on Deno Deploy with a database.<p>As a hobbyist doing recreational programming, I don't want to pay for a website when it gets no traffic. Everything needs to start up on demand and shut down when idle.<p>This means that <i>cold start time</i> is what really matters. Since Deno Deploy has no persistent storage yet, both the frontend and the database need to start up on first request. Having a multi-region database doesn't help if needs to be running all the time and doesn't start up very fast.<p>As for latency when warmed up, I plan to deal with that by avoiding round trips with stored procedures and caching.<p>Here's my "hello world" website. Any advice on technologies to check out would be welcome.<p><a href="https://postgres-counter-demo.deno.dev/" rel="nofollow noreferrer">https://postgres-counter-demo.deno.dev/</a>
"If a user makes a request from Europe, and the apps run in US East, that adds an extra 100-150ms of latency just by round-tripping across the Atlantic."<p><i>laughs in Rural New Zealand</i><p>But seriously really enjoyed this article. I was always a bit confused how the recent influx of edge services handled writes/consistency, and yeah they all seem to have a single leader that routes writes to them - the edge is all about fast reading.
In 2006 or so I was working for an agency that was angling for a contract with the City of Ithaca to develop a system for building inspectors to use cell-phone-connected PDAs to do their paperwork on the go.<p>At the time I advocated for an “always connected” model, but we were concerned that cellular connections wouldn’t be that reliable, so we thought disconnected operation would be necessary.<p>A few years back I was thinking about the future of “low code” and thought an open-source project similar to Lotus Notes but JSON-centric would be a good thing.<p><a href="https://en.wikipedia.org/wiki/HCL_Domino" rel="nofollow noreferrer">https://en.wikipedia.org/wiki/HCL_Domino</a><p>In particular, Lotus Notes has a model for disconnected operation that works quite well. My take is that “progressive web apps” haven’t gotten that far because they don’t have enough batteries included; a really good bidirectional sync model would make a difference.<p>For years you could say ideas from Lotus Notes didn’t make it into the open source world because they were patented, but the patents have long since expired, and the main reason they are obscure is ignorance.
Perhaps it's better to launch multiple, region-specific versions of a site, with region-specific data. Something like amazon.com vs amazon.nl vs amazon.de. Account data would still be global, but it doesn't change much, so you can get away with strong consistency.<p>An added benefit is that it clarifies how to do local compliance. With a global model, the complexity of local laws can be really overwhelming. For example, a global site based in the US needs to be GDPR-compliant, while a global site based in the EU has to figure out how to file multiple forms of taxes (federal, state, sales, city, district) across the 16,000+ US tax jurisdictions. As a European, I am more afraid of US taxes than of the GDPR.
I think the footnote is pretty much the solution: imagine you implement replication within a DC. Yes, you risk data loss and maybe inconsistency when the DC becomes unavailable (if you fail over to a new region and start doing writes without a way to reconcile this later), but you can now offer strong consistency without expensive cross-DC hops. Pretty sure this is how Google Cloud Spanner is implemented: you have a single region for writes and everywhere else is just a read partition? And it’s probably nothing fancy to set the read partition, just pick the first hop at the time of creation.<p>Basically: implementing a distributed global DB without structuring the topology to minimize hops and making engineering trade-offs can’t be fast. It’s all about deciding which part of the CAP theorem you want to relax.
Great post highlighting important considerations before you make the decision to go to the edge!<p>If you do though, you should take a look at <a href="https://edgenode.com" rel="nofollow noreferrer">https://edgenode.com</a>, which I helped build (we host serverless containers on the edge).
Definitely some good points here. Using a single primary database seems easier for a lot of more straightforward use cases, and read replicas are <i>probably</i> sufficient for read-heavy workloads that can tolerate some loss of consistency.<p>I think either the Cloudflare Workers team or the Fly folks have talked about how they envision edge apps as sharding data by region or by tenant, then sticking each shard in the optimal location. That sounds potentially useful for some cases, but then there's the complexity of sharding to deal with.
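A toy sketch of what that per-tenant placement could look like from an edge node's point of view; the region names, hostnames, and lookup table are all hypothetical.

```typescript
// Each tenant's data has one "home" region; the edge node only decides
// where to forward the request.
const tenantHomeRegion: Record<string, string> = {
  "acme-eu": "eu-west",
  "acme-us": "us-east",
};

function upstreamBase(tenantId: string): string {
  const region = tenantHomeRegion[tenantId] ?? "us-east"; // default home
  return `https://${region}.internal.example.com`;
}

async function handle(req: Request, tenantId: string): Promise<Response> {
  // Rewrite the URL to the shard's home region and proxy the request.
  const target = new URL(new URL(req.url).pathname, upstreamBase(tenantId));
  return fetch(new Request(target, req));
}
```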
Nitpicky terminology choices aside, this is a very good, easy-to-follow writeup on multi-region data system design. Understanding that the trick is shuffling around the latency makes it much easier to reason about where you want to focus your latency savings. One thing I wasn't quite sure on, though, is why Database A would care if Database B had a read of a certain kind (vs. a write). Reads don't change state, so if a million reads occur with no writes, the data should still be consistent, no?
Another pattern here is to have your edge provider automatically move your compute closer to your backend. Cloudflare released Smart Placement a few months ago to do this exact thing (with surprisingly similar diagrams!) <a href="https://blog.cloudflare.com/announcing-workers-smart-placement/" rel="nofollow noreferrer">https://blog.cloudflare.com/announcing-workers-smart-placeme...</a><p>(disclaimer: I was the tech lead on this feature)
My takeaway from this is that a lot of people disagree on what "edge" is. IMO, "edge" is the furthest resource over which you have some level of computational control. Could be a data center, could be a phone, could be an IoT device, could be a missile.<p>EDIT: I think I'm realizing people will disagree with me because I have a different perspective. For my use cases, my data comes from sensors on the edge, so I want my edge computing to be as close to those sensors as possible.
Most applications tend to benefit from serving Early Hints[1] from a CDN rather than from actual edge computing. Early Hints takes care of the delay in response time by ensuring that, by the time the page loads, it already has all the resources it needs to render instantly.<p>[1] <a href="https://httpwg.org/specs/rfc8297.html" rel="nofollow noreferrer">https://httpwg.org/specs/rfc8297.html</a>
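A minimal sketch of the origin side, with hypothetical asset paths: declare critical resources as Link headers on the main response, which CDNs that support RFC 8297 (Cloudflare, for example) can replay as a 103 Early Hints response while the origin is still rendering.

```typescript
// The browser can start fetching the CSS/JS as soon as it sees the 103,
// before the HTML itself arrives.
function pageResponse(html: string): Response {
  const headers = new Headers({ "Content-Type": "text/html; charset=utf-8" });
  headers.append("Link", "</static/app.css>; rel=preload; as=style");
  headers.append("Link", "</static/app.js>; rel=preload; as=script");
  return new Response(html, { status: 200, headers });
}
```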
Edge is just where something is hosted. It has nothing to do with data consistency. User signup is a unique problem that any large site has. Some sites, like HN, also actually need usernames, which makes things more complicated. OWASP always recommends adding usernames to any login. It is such a naive approach. Usernames are garbage in most cases.
No, I do not want to deploy on the edge, and I wish people would stop making noise over it, because it’s getting really hard to convince buzzword-happy business people that there’s nothing wrong with running a monolith on a single big server.
Great intro summary to this topic!<p>I remember these distributed DB scenarios being described in "Designing Data Intensive Applications", and this was a really helpful supplementary piece to jog that memory.<p>Cheers
A CDN makes a lot of sense to me; having static assets cached close to users is a brilliant way to speed up requests. As the article points out, this is a very hard problem to solve for anything that touches a database or is not static/cached content. This is also why I don't believe in the idea that Fly.io pitches. Additionally, you have the legal things to take care of with GDPR, etc., so a local database is probably the way to go for most projects.