Stack Overflow is a cacheless, 9-server on-prem monolith

160 points by mike_h about 2 years ago

26 comments

atonse about 2 years ago

Even though I love their simplicity as an example of how to be pragmatic and not over-engineer, do remember that they've tuned their code to the point that they built an ORM that is one of the fastest in the .NET world. I used it and it was awesomely lightweight.

It's as much an example of how far world-class talent can go as it is about doing more with less.

eduction about 2 years ago

The best cache is the one built into the database. People seem to forget that the major RDBMSes have sophisticated caching strategies of their own, and that handing them more RAM (and ensuring they are configured to use it for query or other caches) is usually a good first strategy before trying to second-guess and reinvent the cache outside the DB.

The thread says SO allocates 1.5TB of RAM to SQL Server. Sounds wise.

PaulKeeble about 2 years ago

Microservices remain mostly an organisational pattern for scaling development teams, not necessarily system performance. Microservices add a lot of complexity and overhead.

lifeisstillgood about 2 years ago

The main takeaway is that the questions searched for are so widely distributed that there is no need for a cache layer: they are nothing but long tail.

At that point there is no 'cloud' design that can help. It's either one database, or maybe just shard everything onto thousands of distributed nodes.

But the point I am trying to make is that Kubernetes, microservices, etc. are based on the idea of winners: power laws. One tweet everyone wants to read. One search term, one viral video.

Then again, this is just a question of taste: the taste of the dev lead, and what (s)he feels is the best approach. Take another company doing the same thing and a different approach might emerge.
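
To make the long-tail versus power-law point concrete, here is a rough simulation sketch. It is not anything Stack Overflow is known to run; the item counts, the uniform model of a "long tail", and the 1/rank weights are all assumptions chosen for illustration. It replays two request streams through the same small LRU cache and compares hit rates.

```python
import random
from collections import OrderedDict


def lru_hit_rate(requests, capacity):
    """Replay a request stream through an LRU cache and return the hit fraction."""
    cache = OrderedDict()
    hits = 0
    for key in requests:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / len(requests)


def simulate(num_items=20_000, num_requests=50_000, capacity=500, seed=0):
    """Return (long_tail_hit_rate, power_law_hit_rate) for one cache size."""
    rng = random.Random(seed)
    items = range(num_items)
    # Assumption: model the "nothing but long tail" case as uniform access.
    long_tail = rng.choices(items, k=num_requests)
    # Assumption: model "winners" with Zipf-like weights proportional to 1/rank.
    weights = [1.0 / (rank + 1) for rank in items]
    power_law = rng.choices(items, weights=weights, k=num_requests)
    return lru_hit_rate(long_tail, capacity), lru_hit_rate(power_law, capacity)


if __name__ == "__main__":
    flat, skewed = simulate()
    print(f"long tail hit rate: {flat:.1%}, power law hit rate: {skewed:.1%}")
```

Under these assumptions the uniform stream should hit the cache only a few percent of the time, while the skewed stream hits far more often, which is the intuition behind skipping a dedicated cache layer for long-tail traffic.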

selcuka about 2 years ago

It is ironic that many questions on Stack Overflow are about various cloud services, hyped-up technologies, and problems caused by over-engineering.

cntainer about 2 years ago

Imagine trying to present this kind of architecture to a room full of executives already sold on the "benefits" of Kubernetes, big data, serverless, etc.

ctvo about 2 years ago

The folks over at SO picked a stack (C#, SQL Server, IIS) and optimized the heck out of it to keep this "simplicity". Much of SO is custom-built from the ground up to push performance and stay within the purity of the canonical .NET stack.

It isn't clear to me that this is a model that would work elsewhere, or that it should be held up as something to be replicated.

Did they save time? Did they save money? Did this help make SO a wildly successful company? Did it allow them to deliver features to customers faster?

cosmotic about 2 years ago

It's not cacheless. There are countless caches throughout (including what appears to be ~1TB of memory in the database server), just not a dedicated cache machine.

tylergetsay about 2 years ago

I don't think it's that much more complicated than Wikimedia, which does 5x the traffic: https://meta.wikimedia.org/wiki/Wikimedia_servers

bluedino about 2 years ago

Not that long ago (2016) they had:

    Servers:
      SQL Servers (Stack Overflow Cluster)
        2 Dell R720xd Servers
      SQL Servers (Stack Exchange "…and everything else" Cluster)
        2 Dell R730xd Servers
      Web Servers
        11 Dell R630 Servers
      Service Servers (Workers)
        2 Dell R630 Servers
        1 Dell R620 Server
      Elasticsearch Servers (Search)
        3 Dell R620 Servers
      HAProxy Servers (Load Balancers)
        2 Dell R620 Servers
      Redis Servers (Cache)
        2 Dell R630 Servers
      VM Servers (VMWare, Currently)
        2 Dell FX2s Blade Chassis, each with 2 of 4 blades populated
        4 Dell FC630 Blade Servers (2 per chassis)
        2 Equalogic SAN PS6000-series
      Machine Learning Servers (Providence)
        2 Dell R620 Servers
      Machine Learning Redis Servers (Still Providence)
        3 Dell R720xd Servers
      LogStash Servers
        6 Dell R720xd Servers
      HTTP Logging SQL Server
        1 Dell R730xd
      Development SQL Server
        1 Dell R620

    Network:
      2x Cisco Nexus 5596UP core switches (96 SFP+ ports each)
      10x Cisco Nexus 2232TM Fabric Extenders (2 per rack)
      2x Fortinet 800C Firewalls
      2x Cisco ASR-1001 Routers
      2x Cisco ASR-1001-x Routers
      6x Cisco 2960S-48TS-L Management network switches (1 per rack)

https://nickcraver.com/blog/2016/03/29/stack-overflow-the-hardware-2016-edition/

Fire-Dragon-DoL about 2 years ago

Isn't Stack Overflow, incidentally, one of the websites that would benefit the most from caching, given that their content is supposedly going to be static the majority of the time?

bitwize about 2 years ago

That defies the laws of physics. How can they be web scale without cloud and microservices?

tony-allan about 2 years ago

In the diagram [1], I can see why you might design it that way if starting from scratch, but it works as is, so why change it?

Is there a particular reason to suggest a change to the architecture?

[1] https://twitter.com/sahnlam/status/1629713954225405952/photo/1

kichik about 2 years ago

Is there a website that tracks outages of other websites like Stack Overflow over years? I know some that tell you if a site is down right now, but not over years.

I have a subjective feeling that Stack Overflow is down a lot more than other websites. I don't see that ever mentioned in discussions of cloud vs. on-prem, which makes the discussion seem lacking.

tyingq about 2 years ago

Not caching the questions and answers makes sense to me, as I imagine the hit rate wouldn't be terribly good. I would guess, though, that they somehow cache things like the sidebar list of blog articles, featured items, "Hot Network Questions", etc.
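
As an illustration of what caching only the small shared fragments might look like, here is a minimal sketch of an in-process cache with a time-to-live. Nothing here is taken from Stack Overflow's code; `TTLCache`, `get_or_compute`, and the 60-second lifetime are hypothetical names and values picked for the example.

```python
import time


class TTLCache:
    """Tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so tests can control time
        self._entries = {}       # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, recomputing it once it has expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: skip the expensive call
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    sidebar = TTLCache(ttl_seconds=60)
    # Hypothetical loader standing in for a database query.
    titles = sidebar.get_or_compute("hot-questions", lambda: ["Question A", "Question B"])
    print(titles)
```

A handful of keys like this would be read on almost every page view, so the hit rate is high even when caching individual questions would not pay off.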

jonas-w about 2 years ago

The linked URL [0] is also a great visualization, with a bit more data than the Twitter image.

[0] https://stackexchange.com/performance

tiffanyh about 2 years ago

> Removed Redis 4 years ago; average latency remained unchanged at 20ms.

A hidden takeaway is that databases on NVMe storage are so fast that they are comparable to in-memory (Redis) databases these days.

foobazzy about 2 years ago

Please excuse my lack of understanding here. I'm genuinely trying to learn.

I've always heard (and it made sense to me) that to reduce the latency of requests from across the globe, you might want to have read replicas or caches spread across global infrastructure. Then how is it that Stack Overflow is fast here when the DB is on-prem, seven seas away from me? No amount of RAM should make up for the distance, right?

wlonkly about 2 years ago

When I look up www.stackoverflow.com, I get Fastly IPs. I feel like using a CDN has to count as some kind of cache?

bryancoxwell about 2 years ago

It’s also one of the few sites I use that regularly goes down for maintenance.

ec109685 about 2 years ago

Source material is from 2022, so title should include that disclaimer.

ksec about 2 years ago

And somehow Wikipedia requires thousands of servers.

didntreadarticl about 2 years ago

And it runs on .NET.

One of the only well-known sites to do so, I think?

mike_hearn about 2 years ago

It's a useful reality check. Dedicated machines are fast and you can do a lot without much software complexity. People mention the StackOverflow guys optimizing their software, but their CPU utilization is 5%, so they have a lot of headroom to be less optimized. Probably they just enjoyed it and could spend time on that, so why not?

At KotlinConf in April I'll be giving a talk on two-tier architecture, which is the StackOverflow simplicity concept pushed even further. Although not quite there yet for social "web scale" apps like StackOverflow, it can be useful for many other kinds of database-backed services where the users are a bit more committed and you're less dependent on virality. For example, apps where users sign a contract, internal apps, etc.

The gist is that you scrap the web stack entirely and have only two tiers: an app that acts as your frontend (desktop, mobile) and an RDBMS. The frontend connects directly to the DB using its native protocols and drivers, and the user authentication system is that of the database. There is no REST, no JSON, no GraphQL, no OAuth, no CORS, none of that. If you want to do a query, you do it and connect the result stream directly to your GUI toolkit's widgets or table view controls. If what you want can't be expressed as SQL, you use a stored procedure to invoke a DB plugin, e.g. one implemented with PL/Java or PL/v8. This approach was once common (the thread on Delphi the other day had a few people commenting who still maintain this type of app), but it fell out of favor because Microsoft completely failed to provide good distribution systems, so people went to the web to get that. These days distributing apps outside the browser is a lot easier, so it makes sense to start looking at this design again.

The disadvantages are that it requires a couple more clicks up front for end users, and if they have very restrictive IT departments it may be harder for them to get access to your app. In some contexts that doesn't matter much; in others it's fatal. The tech for blocking DoS attacks isn't as good, and you may require a better RDBMS (Postgres is great but just not as scalable as SQL Server/Oracle). There are some others I'll cover in my talk, along with proposed solutions.

The big advantage is simplicity, with consequent productivity. A lot of stuff devs spend time designing, arguing about, fighting holy wars over, etc. just disappears. E.g. one of the benefits of GraphQL over plain REST is that it supports batching, but SQL naturally supports even better forms of batching. Results streaming happens for free, there's no need to introduce new data formats and ad-hoc APIs between frontend and DB, and stored procedures provide a typed RPC protocol that can integrate properly with the transaction manager. It can also be more secure, as SQL injection is impossible by design, and if you don't use HTML as your UI then XSS and XSRF bugs also become impossible. Also, because your UI is fully installed locally, it can provide very low latency and other productivity features for end users. In some cases it may even make sense to expose the ability to do direct SQL queries to the end user, e.g. if you have a UI for browsing records then you can allow business analysts to supply their own SQL query rather than flooding the devs' backlog with requests for different ways to slice the data.
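
A minimal sketch of the "query and bind the result stream to the UI" idea follows. It is written in Python rather than the Kotlin stack the talk targets, and it uses the standard-library `sqlite3` module as a stand-in for a networked RDBMS driver, so authentication and stored procedures are not shown. The `orders` table and the `render_table` function are hypothetical, added only for illustration.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory database standing in for the remote RDBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, total) VALUES (?, ?)",
        [("acme", 120.0), ("acme", 80.5), ("globex", 42.0)],
    )
    conn.commit()
    return conn


def stream_rows(conn, customer):
    """Yield rows lazily from the cursor, as a table widget could consume them."""
    cursor = conn.execute(
        # The value is passed as a bound parameter, never spliced into the SQL.
        "SELECT id, total FROM orders WHERE customer = ? ORDER BY id",
        (customer,),
    )
    for row in cursor:
        yield row


def render_table(rows):
    """Hypothetical stand-in for binding a result stream to a GUI table view."""
    return [f"{order_id:>4} | {total:>8.2f}" for order_id, total in rows]


if __name__ == "__main__":
    conn = open_demo_db()
    for line in render_table(stream_rows(conn, "acme")):
        print(line)
```

There is no intermediate JSON or REST layer here: the rows go from the driver's cursor straight to whatever displays them, which is the shape of the two-tier design described above.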

yamrzou about 2 years ago

Is it hosted on the cloud?

faizmokhtar about 2 years ago

"What I think it should be"

That's a little bit arrogant, no?
