The Case for Building Scalable Stateful Services

139 points | by aarkay | over 9 years ago

6 comments

packetslave | over 9 years ago

From a talk by Caitie McCaffrey of Twitter, at the Strange Loop conference.

Video: https://www.youtube.com/watch?v=H0i_bXKwujQ

Slides: https://speakerdeck.com/caitiem20/building-scalable-stateful-services
themartorana | over 9 years ago

Stateful: you have one web server!

Stateless: you grow to require tens of servers or more, horizontal scalability is much cheaper than vertical, you look to software solutions to help slow expenses, and you move to NoSQL clustered DBs like Riak, Cassandra, Hadoop, etc. 1-2 engineers can still run the whole show; cloud services, SaaS, and PaaS are employed.

Stateful: you run thousands of servers, having since brought many services back in-house. Many if not most are your own metal, with dedicated staff. Looking to slow power bills and space requirements, you look once again at software solutions.

If you stay at the same growing company long enough, what's old will be new again.
Comment #10378616 not loaded

Comment #10379657 not loaded
jakozaur | over 9 years ago

Rule of thumb: if you design a service that runs on many servers, you have the following options:

1. Have a stateless service. You can update it frequently with no downtime... Relatively easy.

2. Use an off-the-shelf service that provides the state and that you don't need to update that frequently (e.g. ElastiCache, Cassandra, ...). Relatively easy.

3. Write your own stateful service. For some applications it is a must (e.g. you build your own search service, data processing, game collision engine). You need to take care of state transitions during restarts/upgrades, and client routing is also tricky (see the routing sketch after this list). Hard, but sometimes there is no way around it if you want efficient infrastructure.

4. Don't think about state, and you may end up crying after your code hits prod.
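For option 3, one common answer to the client-routing problem is a consistent-hash ring that maps each client or session id to the stateful node that owns it. Below is a minimal sketch under that assumption (not from the comment); the node names, replica count, and choice of MD5 are illustrative only.

```typescript
// Minimal consistent-hash ring for routing clients to stateful nodes.
// Node names and the hash function are illustrative, not a specific product's API.
import { createHash } from "node:crypto";

class HashRing {
  private ring: { hash: number; node: string }[] = [];

  constructor(nodes: string[], private replicas = 100) {
    // Place several virtual points per node on the ring to smooth the distribution.
    for (const node of nodes) {
      for (let i = 0; i < replicas; i++) {
        this.ring.push({ hash: this.hash(`${node}#${i}`), node });
      }
    }
    this.ring.sort((a, b) => a.hash - b.hash);
  }

  private hash(key: string): number {
    // First 8 hex chars of MD5 interpreted as an unsigned 32-bit integer.
    return parseInt(createHash("md5").update(key).digest("hex").slice(0, 8), 16);
  }

  // Route a client/session id to the node that owns it: the first ring point
  // at or after the key's hash, wrapping around at the end.
  route(key: string): string {
    const h = this.hash(key);
    for (const entry of this.ring) {
      if (entry.hash >= h) return entry.node;
    }
    return this.ring[0].node;
  }
}

const ring = new HashRing(["node-a", "node-b", "node-c"]);
console.log(ring.route("session-1234")); // the same session always maps to the same node
```

The point of the ring is that adding or removing a node only remaps the keys adjacent to its virtual points, so most clients keep their server affinity across a resize.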
rdtsc | over 9 years ago

A comparison with Erlang/OTP:

http://christophermeiklejohn.com/papers/2015/05/03/orleans.html
deathtrader666 | over 9 years ago

Hasn't Erlang & OTP already solved this?
Comment #10379376 not loaded
EGreg | over 9 years ago

I think that, in general, anything that has no persistence can be shared-nothing. State in shared-nothing consists basically of a cache that is kept up to date, with only a slight lag, by subscribing to changes in the data store.

Shared-nothing can include environments like user agents, proxies, and web servers.

As for the persistence layer / data store, it should support horizontal partitioning. Especially useful is range-based partitioning on a primary key whose prefix contains a Geohash, because then you can route requests to the closest Region on AWS or some other host.

If one of your shards gets too large, you can split it into two or more shards. All the monitoring and splitting can be automated with dev ops in the cloud to provision machines, etc., so you don't need to wake up at 3am.

With this setup you can reliably grow your data store to an arbitrary size, and have only O(log n) growth in latency for any request. However, there is one more issue to solve:

When you need to perform database queries that return a cross product, or join, do you compute it on the fly for the request (e.g. with MapReduce), or do you precompute the result whenever a row is inserted into one of the joined tables? The second way can be done in the background and uses a memory-time tradeoff to make the queries O(1). This can be really useful for queries that need to return answers in real time.

I would recommend using evented (e.g. Node.js) servers for queries that involve hitting multiple shards at the same time, or MapReduce-type things. Evented I/O lets you wait only as long as the longest query (see the scatter-gather sketch below).

Finally, I don't think things like socket.io will be easily horizontally partitionable, e.g. to a Node cluster, so you'll probably want to have server affinity on a per-room basis.
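A minimal sketch of that scatter-gather idea: issue all shard queries concurrently and wait only as long as the slowest one. The shard endpoints and the HTTP query API here are hypothetical, and the example assumes Node 18+ for the global fetch.

```typescript
// Scatter-gather across shards: total wait is bounded by the slowest shard,
// not the sum of all shard latencies.
type Row = Record<string, unknown>;

async function queryShard(endpoint: string, sql: string): Promise<Row[]> {
  // Placeholder for a real driver call (HTTP, gRPC, a database client, ...);
  // the /query endpoint is hypothetical.
  const res = await fetch(`${endpoint}/query`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ sql }),
  });
  return res.json();
}

async function scatterGather(shards: string[], sql: string): Promise<Row[]> {
  // All shard queries are issued concurrently; Promise.all resolves once the
  // slowest one finishes, so latency ≈ max(shard latencies).
  const results = await Promise.all(shards.map((s) => queryShard(s, sql)));
  return results.flat();
}

// scatterGather(["http://shard-0", "http://shard-1"], "SELECT ...").then(console.log);
```

The same shape works for the precomputed-join case: the background writer fans out the insert to the affected materialized results, while the read path stays a single O(1) lookup.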