This depends almost entirely on what you're doing. If the site at issue isn't dynamic, my suggested action is probably "do nothing", because being frontpaged by HN is not going to tax any computer routinely used to serve web pages in 2017.<p>If you genuinely have problems, then the question boils down to "What is going to break first?" and whether it makes more sense to harden that or temporarily disable it. If you're a flight search engine and responding to flight searches is just intrinsically costly, then you probably need more capacity and/or a way to "shed load" and redirect folks into some sort of queue or alternative UX ("We're overloaded; give us your email address and we'll get back to you.")<p>If the dynamism on your pages is something that is incidental to their functionality, fakeable, or chosen simply for programmer convenience, you can dial down the dynamism for the time being via e.g. sticking a cache in front of the page, serving a static HTML version of it, pre-baking the default search/etc rather than recomputing it live for each of the 40k sessions, etc.<p>The most common "HN killed my website" scenario is probably WordPress being served by Apache with KeepAlive on. That isn't simple to remediate if you continue serving WordPress from Apache; this is one of the fairly few cases where the root problem is a fundamental technology choice. (It is possible this has changed in recent years, but for a period of several years "apt-get install apache2 php" would get you an install guaranteed to blow up in production with over 10 simultaneous users.)
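For the WordPress-on-Apache case specifically, one common mitigation (sketched here as an illustrative example, not the poster's own fix) is to turn KeepAlive off or bound it tightly, and to cap the prefork workers so the box degrades gracefully instead of swapping; the directives are standard Apache 2.4, the values are placeholders:
<pre><code># /etc/apache2/apache2.conf (or a conf-enabled snippet)
KeepAlive Off
# ...or keep it on, but stop idle connections from pinning PHP processes:
# KeepAlive On
# KeepAliveTimeout 2
# MaxKeepAliveRequests 100

<IfModule mpm_prefork_module>
    # Cap workers so Apache runs out of slots before the box runs out of RAM.
    MaxRequestWorkers 50
    ServerLimit       50
</IfModule>
</code></pre>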
Caching.<p>I used to be responsible for sites involved with the Olympics. For 100 weeks traffic was next to nothing. For the two weeks preceding and the two weeks during the games, it really was spike after spike after spike.<p>The site basically used nginx with SSI to serve up static content that was generated from Rails. Almost everything we could make static was. We rsync'd files every minute or so across the cluster and would lazy-load content at an individual node if needed.<p>For dynamic stuff we figured out tricks using JS, APIs, and in-memory caching of dynamic partials. I wouldn't recommend any of that unless you really have the need, though.
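Roughly what that shape of setup looks like in nginx terms (the directives are real, but the paths, port, and sync details here are invented for illustration):
<pre><code># Serve pre-generated pages from disk, with SSI stitching in shared fragments.
location / {
    root      /var/www/static;        # rsync'd across the cluster every minute
    ssi       on;
    try_files $uri $uri/index.html @rails;
}

# Lazy-load from the Rails app only when this node doesn't have the file yet.
location @rails {
    proxy_pass http://127.0.0.1:3000;
}
</code></pre>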
Turn on caching. That's it. Even at #1 on HN, you're looking at no more than 10-20K visitors in a day. It's not <i>that</i> much traffic.
To prepare:<p>1. Load test and fix what breaks. In the fastest and least-sophisticated way, this is just taking typical requests and throwing increasing numbers of them at your server until it breaks (or if you're testing in production, until user experience begins to deteriorate). This is the most accurate way to identify performance bottlenecks and also gives a rough estimation of your total capacity. If you do nothing else, do this.<p>2. Make sure your failure state isn't disastrous (for example, if our servers go down, you're still presented with a somewhat-functional webapp, not a 503 error page).<p>3. Make sure you are able to (and know how to) identify when systems are failing due to traffic and quickly add capacity to any component of your system. Ideally you have SMS/email alerts for this (they're really easy to set up in AWS, for instance).<p>"Hugs of death" (at least at the HN or even large subreddit scale) are not usually caused by lack of raw computing power, they're usually caused by architectural/algorithmic flaws exposed by unusual request volume. Send that traffic yourself ahead of time, and then fix those.<p>This is essentially how sites and services for large hardware launches are scaled (such as console launches), just with more sophisticated methods. I took this approach with Guilded (<a href="http://www.guilded.gg" rel="nofollow">http://www.guilded.gg</a>) and the hug from hitting #2 on a million-person subreddit only reached about 15% of capacity.
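To put point 1 in concrete terms, the quick-and-dirty version really is a couple of shell commands (ab ships with apache2-utils, wrk is a separate install; the URL and numbers are placeholders):
<pre><code># 10,000 requests, 100 concurrent, against a typical page
ab -n 10000 -c 100 https://example.com/some/typical/page

# or: 4 threads, 200 open connections, sustained for 60 seconds
wrk -t4 -c200 -d60s https://example.com/some/typical/page
</code></pre>
Ramp the concurrency up between runs until latency or error rates fall off a cliff; that knee is the capacity number you care about.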
Caching!
Cache responses where possible and have the webserver serve them instead of going to a dynamic backend (PHP, Node, Ruby, Python, ...).<p>I haven't worked with Apache for a while (it is probably sufficiently configurable too), but nginx is quite resilient and makes "basic" caching very easy (proxy_cache and fastcgi_cache should cover you).
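A minimal fastcgi_cache sketch for a PHP-FPM backend (real nginx directives; the zone name, paths, and TTLs are arbitrary examples):
<pre><code># in the http {} block
fastcgi_cache_path /var/cache/nginx levels=1:2 keys_zone=appcache:64m
                   max_size=1g inactive=60m;
fastcgi_cache_key "$scheme$request_method$host$request_uri";

# in the server {} block
location ~ \.php$ {
    include       fastcgi_params;
    fastcgi_pass  unix:/run/php/php-fpm.sock;
    fastcgi_cache appcache;
    fastcgi_cache_valid 200 301 10m;
    fastcgi_cache_use_stale error timeout updating;
}
</code></pre>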
If our current site got a 10x spike in traffic with little to no warning, the only thing we could do to prevent it from keeling over would be to raise the cache timeout on our caching proxy.<p>Why?<p>Well, we could easily add more application servers as needed. That would take about 15 minutes, and we'd probably only need to increase the count by about 50%, as we've got plenty of spare capacity.<p>The real problem is our database server. Or rather, the way our CMS uses the database. Any sort of traffic increase to the CMS hammers the DB to the point of deadlocks (yes, on reads) for other portions of the site. We have some plans to improve the situation (including changing some stupid DB config decisions made a decade ago), but nothing that can be implemented short term.<p>Thankfully, the CMS content is fairly static, and our 1-min caching isn't very aggressive. Increasing the cache timeout to 10-15 min would result in far less backend traffic and get us through most traffic spikes. The rest of our site either is available only to paying users (and thus far less likely to be significantly affected by a traffic spike) or served primarily by other data stores with much more room for capacity growth in their present configuration.
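If the caching proxy happens to be nginx, that "raise the timeout" knob is essentially one line (assuming a proxy_cache_path zone, here called cmscache, is already defined; times are illustrative):
<pre><code>location / {
    proxy_cache       cmscache;
    proxy_cache_valid 200 10m;   # normally 1m; bump to 10-15m during a spike
    proxy_cache_use_stale error timeout updating;
    proxy_pass        http://app_servers;
}
</code></pre>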
Make everything static unless you really need some sort of dynamic content. If you do need dynamic content, make sure to stress test it. There are tools out there to load test your website to see if it breaks.<p>Put it on Cloudflare. Cloudflare can absorb huge volumes of traffic with ease. Keep in mind there are other WAFs (Web Application Firewalls) you can check out.<p>Use as few third-party widgets / bells and whistles as possible, and self-host assets when you can. If these go down (which they sometimes will, right when your site is trending on Hacker News), your site may not load correctly, leaving your users frustrated. Remember the recent S3 failure? It broke thousands upon thousands of sites.
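One caveat with Cloudflare: by default it caches static assets but not HTML, so a fully static site usually needs a "Cache Everything" page rule plus sensible Cache-Control headers from the origin before it will absorb page requests too. A hedged origin-side sketch, assuming nginx and arbitrary lifetimes:
<pre><code>location /assets/ {
    expires 7d;
    add_header Cache-Control "public";
}
location / {
    expires 5m;    # lets the edge hold pages briefly without serving them stale for long
}
</code></pre>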
If you don't make your website a megabyte-sized monster, it isn't a problem. A single-core Pentium 2 can serve simple HTML/CSS to more concurrent users than HN has at any given point in time. Anything more than that is just bloat.
Prepare ahead of time. Short of using an elastic load balancing service, predict when you may have traffic spikes, and choose a cheap-to-implement solution such as request-level load balancing.<p>Create replicas of your web server processes on different machines for the duration you expect a potential traffic spike. Use a fast dispatcher like nginx[0] as a reverse proxy to load-balance requests to the appropriate replica web server machine.<p>If you see consistently low traffic, spin down the replicas and remove them from your load balancer configuration.<p>[0] <a href="http://nginx.org/en/docs/http/load_balancing.html" rel="nofollow">http://nginx.org/en/docs/http/load_balancing.html</a>
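A bare-bones version of that nginx configuration (upstream name, addresses, and ports are placeholders; [0] covers the real options like weighting and health checks):
<pre><code>upstream app_replicas {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
    server 10.0.0.13:8080;   # remove entries here when spinning replicas down
}

server {
    listen 80;
    location / {
        proxy_pass http://app_replicas;
    }
}
</code></pre>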
Last time (actually, the first time ...) a blog post on my website went to the front page of Hacker News, WordPress with proper caching enabled (using WP Super Cache with pretty much the default settings) worked just fine.
A one-off (or hoped-for) event? I'd just stick CloudFront in front of it...<p>If I were hoping to build a high-traffic site (as in, I expected long-term heavy traffic rather than a single "spike"), I'd work out how to most easily implement my CMS's caching options with CloudFront or S3 or some other CDN.<p>The most important thing is to make sure a page view doesn't <i>really</i> require a bunch of DB hits or personalisation. (Especially not if you're using something like Sitecore or are running super lean with WordPress on inexpensive shared hosting...)
I've had several posts on the front page running on small boxes without issue. At peak I saw 150-200 simultaneous users in Google Analytics. Sitting at the top of the front page for several days would probably see more than that. Putting a Cloudflare caching layer in front wouldn't hurt, but I didn't really need to do anything to prepare.<p>Alternatively just publish on a hosted blog service like WordPress, Medium, etc.<p>(This assumes your content is a post and not an app.)
Static content in a ram disk and haproxy+apache replicas grown as required. No CDN.<p>Even the most "dynamic" sites in terms of non user-specific content can be turned into a static snapshot via a simple cron job. That means the dynamic content is only ever hit by one person, that being cron.<p>Real user-specific dynamic content must require a login and cookies for haproxy to even let the connection pass the first stage of validation.
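Roughly what that looks like wired together (paths, sizes, schedule, and the snapshot command are made-up examples, not a drop-in config):
<pre><code># /etc/fstab: a RAM-backed docroot
tmpfs  /var/www/snapshot  tmpfs  size=512m,mode=0755  0  0

# crontab: cron is the only "user" that ever touches the dynamic backend
*/5 * * * *  wget -q --mirror -nH -P /var/www/snapshot http://localhost:8080/
</code></pre>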
Happened to me with an article of mine.<p>It reached 3rd place at its peak, if I remember correctly.<p>I got 300 people at the same time for about 10 hours, then high but not crazy traffic for 2-3 days after that.<p>300 people at the same time is not a lot.<p>I had a WordPress website with no cache on a dedicated VPS (but nothing crazy), and neither the CPU nor memory got that busy. 300 concurrent people works out to roughly 1 request per second, 2 max. That's not a lot of load for a server.
My company did some email marketing that would spike traffic to 50,000 visitors in 30-60 minutes, and the WordPress MySQL database would go nuts. Who knows how many horrible plugins marketing had in there, but the first go-to for me was Varnish, then Cloudflare, and the final awesome fix was swapping everything to static HTML that we crawled every night ourselves from our own htaccess-blocked WordPress. The only thing that had to stay dynamic was the contact form.
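The nightly crawl can be as simple as a wget mirror of the locked-down WordPress origin; a sketch with the hostname and output path invented:
<pre><code># nightly-snapshot.sh, run from cron
wget --mirror --page-requisites --convert-links --adjust-extension \
     --no-host-directories --directory-prefix=/var/www/public \
     http://origin.internal/
</code></pre>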