Great write-up. This will be a good point of reference for future debates concerning the value of selecting high-performance platforms for web applications. A common refrain among advocates of slower platforms is that computational performance does not matter because applications are invariably busy waiting on external systems such as databases. While that may be true in some cases, database query performance is often only a small piece of the overall performance puzzle. Blaming external systems is a too-convenient umbrella for avoiding profiling an application to discover where it's actually spending time. What do you do once you've made the external systems fast and your application is still squandering a hundred milliseconds in the database driver or ORM, fifty milliseconds in a low-performance request router, and another 250 milliseconds in a slow templater or JSON serializer?<p>Yes, three seconds for a page render is still uncomfortably slow, but it's substantially faster than the original implementation, and unsurprisingly frees up CPU for additional concurrent requests. It's a shame Wikimedia didn't have this platform available to them earlier.<p>Today web developers have many high-performance platform options that offer moderate to good developer efficiency. Those who use low-performance platforms may do their future selves a service by evaluating (comfortable) alternatives when embarking on new projects.
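For what it's worth, you don't need fancy tooling to find that out; even crude per-phase timing in a request handler will show where the time goes. A minimal sketch in TypeScript, with the phase names and the work inside them purely illustrative:

    // Crude per-phase timing; the phases and numbers are placeholders,
    // not any particular framework's API.
    async function handleRequest(): Promise<void> {
      const timings: Record<string, number> = {};

      const time = async (phase: string, work: () => Promise<void>) => {
        const start = Date.now();
        await work();
        timings[phase] = Date.now() - start;
      };

      await time("routing", async () => { /* match the request to a handler */ });
      await time("db", async () => { /* run queries through the driver/ORM */ });
      await time("render", async () => { /* templating / JSON serialization */ });

      console.log(timings); // e.g. { routing: 50, db: 100, render: 250 }
    }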
Several of the previous comments here have quoted a key, interesting fact from the submitted article: "Between 2-4% of requests can’t be served via our caches, and there are users who always need to be served by our main (uncached) application servers. This includes anyone who logs into an account, as they see a customized version."<p>That's my experience when I view Wikipedia. I am a Wikipedian who has been editing fairly actively this year, and I almost always view Wikipedia as a logged-in Wikipedian. I see the Wikimedia Foundation tracks the relevant statistics very closely and has devoted a lot of thought to improving the experience of people editing Wikipedia pages. I can't say that I've noticed any particular improvement in speediness from where I edit, and I have definitely seen some EXTREMELY long lags in edits being committed just in the past month, but maybe things would have been much worse if the technical changes described in this interesting article had not been made this year.<p>From where I sit at my keyboard, I still think the most important thing to do to improve the user experience for Wikipedia editors is to change the editing culture a lot more, to emphasize collaboration in using reliable sources over edit-warring around fine points of Wikipedia tradition from the first decade of Wikipedia. But maybe I feel that way because I have worked as an editor in governmental, commercial, and academic editorial offices, so I've seen how grown-ups do editing. I think the Wikimedia Foundation is working on the issue of editing culture on Wikipedia too, but fixing that will be harder than fixing the technological problems of editing a huge wiki at scale. Human behavior is usually a tougher problem to solve than the scalability of software.<p>By the way, the article illustrates the role for-profit corporations like Facebook can play in raising technical standards for everybody through direct assistance to nonprofit organizations running large websites like the Wikimedia Foundation. That's a win-win for all of us users.
> and there are users who always need to be served by our main (uncached) application servers. This includes anyone who logs into an account, as they see a customized version of Wikipedia pages that can’t be served as a static cached copy<p>I keep hearing this, but it isn't true anymore. For something like Wikipedia, even when I'm logged in, 95% of the content is the same for everyone (the article body). You can still cache that on an edge server, and then use JavaScript to fill in the customizations afterwards. This gets you two wins: 1) the thing the person is most likely interested in (the article) will load quickly, and 2) your servers will have a drastically reduced load, because most of the content still comes from the cache.<p>The tradeoff, of course, is complexity. Testing a split-cache setup is definitely harder and more time-consuming, as is developing towards it. But given Wikipedia's page views, it would be totally worth it.
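To make that concrete, here's a minimal sketch of the client-side half, assuming a hypothetical /api/user-state endpoint and element id (not anything MediaWiki actually exposes): the article HTML comes straight from the edge cache, and a small script fills in the logged-in bits afterwards.

    // Hypothetical endpoint and field names, purely for illustration.
    async function applyUserCustomizations(): Promise<void> {
      const res = await fetch("/api/user-state", { credentials: "include" });
      if (!res.ok) return; // anonymous visitor: the cached page is already correct

      const user: { name: string; notifications: number } = await res.json();
      const userbar = document.getElementById("userbar");
      if (userbar) {
        userbar.textContent = `${user.name} (${user.notifications} notifications)`;
      }
    }

    document.addEventListener("DOMContentLoaded", () => {
      void applyUserCustomizations();
    });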
Worth mentioning: phpng (PHP 7) has cut CPU time in half over the past year [1] (scroll down); I don't know what the memory situation is. I also don't know whether HHVM has additional advantages over plain PHP, but the list of major benefits will certainly be smaller by the next major version.<p>[1] <a href="https://wiki.php.net/phpng" rel="nofollow">https://wiki.php.net/phpng</a>
> Between 2-4% of requests can’t be served via our caches, and there are users who always need to be served by our main (uncached) application servers. This includes anyone who logs into an account, as they see a customized version<p>I run a similar site (95% read-only), and have been pondering whether it would make sense to use something like Varnish's Edge Side Includes (like SSI: combining cached static page parts with generated dynamic page parts) -- I wonder if they've considered that and what the results would be like?
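For anyone unfamiliar with ESI, the rough shape of it is that the origin serves a cacheable shell containing an include tag, plus a small uncached fragment that the edge stitches in per request. A minimal sketch in TypeScript with Node's http module; the URLs are made up, and Varnish would also need to be told to process ESI on these responses (e.g. via do_esi):

    import { createServer } from "http";

    createServer((req, res) => {
      if (req.url === "/wiki/Example") {
        // Cacheable shell: the article body is the same for everyone.
        res.setHeader("Content-Type", "text/html");
        res.setHeader("Cache-Control", "public, max-age=300");
        res.end(`<html><body>
          <!-- the edge replaces this tag with the fragment below -->
          <esi:include src="/fragment/userbar" />
          <article>...rendered article body...</article>
        </body></html>`);
      } else if (req.url === "/fragment/userbar") {
        // Per-user fragment: never cached, rendered on every request.
        res.setHeader("Content-Type", "text/html");
        res.setHeader("Cache-Control", "private, no-store");
        res.end(`<div class="userbar">Logged in as ...</div>`);
      } else {
        res.statusCode = 404;
        res.end();
      }
    }).listen(8080);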
It's great to see such a significant improvement, but it goes to show just how limiting the CGI-era architecture really is.<p>A modern persistent web app running in Python/Java/Ruby/etc. is able to perform preparatory work at startup in order to optimize for runtime efficiency.<p>A CGI or PHP app has to recreate the world at the beginning of every request. (Solutions exist to cache bytecode compilation for PHP, but the model is still essentially that of CGI.) Once your framework becomes moderately complex, the slowdown is painful.
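A minimal sketch of the difference, in TypeScript/Node terms (the "expensive" setup here is just a stand-in for config parsing, route compilation, template compilation, and so on):

    import { createServer } from "http";

    // Pretend this is expensive. In a persistent process it runs exactly once;
    // in the CGI/classic-PHP model the equivalent work happens on every request.
    function expensiveStartup(): (path: string) => string {
      const header = "<h1>Site</h1>"; // imagine real work went into producing this
      return (path: string) => `${header}<p>You requested ${path}</p>`;
    }

    const render = expensiveStartup(); // once at boot, not per request

    createServer((req, res) => {
      // Each request only does the cheap per-request work.
      res.setHeader("Content-Type", "text/html");
      res.end(render(req.url ?? "/"));
    }).listen(8080);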
Facebook is doing a lot of awesome open-source work. They already have Open Compute, many projects on GitHub - <a href="https://github.com/facebook" rel="nofollow">https://github.com/facebook</a>, and sent a developer to MediaWiki to help with the migration to HHVM. I hope Facebook keeps open-sourcing their internal projects (in addition to contributing to existing ones)!
> Between 2-4% of requests can’t be served via our caches, and there are users who always need to be served by our main (uncached) application servers. This includes anyone who logs into an account, as they see a customized version of Wikipedia pages that can’t be served as a static cached copy,<p>Is this why we get logged out every 30 days, to boost cache hits for users who rarely need to be logged in? (It seems like every time I want to make an edit I have to log in again.)
Curious why they're using Squid and not Varnish for caching. Weird how they're progressive with PHP but still sticking with the antiquated Squid.
Honest question: what kind of benchmark do others use for a 'reasonable' response time? Of course it fully depends on the use case (rendering a video can be hard to do in 500ms), but what about for user-facing stuff?
In my previous startup we tried to stay within 500ms.
Not saying this isn't a great improvement, but to me 3s still sounds quite long? (Not saying it's easy to make it faster!)
How to make it take 0.0 to 1.0 seconds:<p>Save early, before the Editor actually presses save, and only commit the change if the Editor does press save.<p>This will improve the Editor experience by making saves faster at the expense of CPU time. Predict well enough, or do enough of the processing client-side, and you won't even need the extra server-side CPU.<p>What is more precious to you? The human Editors, or some dumb pieces of silicon?
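A rough sketch of what that could look like on the client (the /draft and /commit endpoints are hypothetical, not MediaWiki's actual API):

    let pending: number | undefined;

    // Wired to the editor's change event: push the draft up as the Editor types,
    // debounced so we don't hit the server on every keystroke.
    function onEditorChange(text: string): void {
      if (pending !== undefined) clearTimeout(pending);
      pending = window.setTimeout(() => {
        void fetch("/draft", { method: "PUT", body: text });
      }, 1000);
    }

    // Wired to the save button: the heavy work (parsing, rendering) was already
    // done for the draft, so committing just promotes it to the live revision.
    async function onSaveClicked(draftId: string): Promise<void> {
      await fetch(`/commit/${draftId}`, { method: "POST" });
    }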
Here's a video of the author's presentation at Scale Conf on migrating Wikipedia to HHVM: <a href="http://www.dev-metal.com/migrating-wikipedia-hhvm-scale-conference-2014/" rel="nofollow">http://www.dev-metal.com/migrating-wikipedia-hhvm-scale-conf...</a>
I feel like 3 seconds to load a page is still slow. I assume that's the time to build a page that doesn't hit the cache, and that Wikipedia is using something like Varnish to cache pages most of the time.<p>Still, 3 seconds to render a page feels slow, and it should be a lot faster.