
Things I wish we had known about scaling

19 points | by martinkl | about 11 years ago

4 comments

danudey · about 11 years ago
> In a database like PostgreSQL or MySQL, each client connection to the database is handled by a separate unix process

This isn't correct. They're often handled by threads, but not necessarily 1:1.

> Every connection adds overhead, so the entire database slows down, even if those connections aren't actively processing queries.

I've never known this to be true, even in large production systems. Can anyone cite a source?

> Partitioning (sharding) and read replicas probably won't help you with your connection limit, unless you can somehow load-balance requests so that all the requests for a particular partition are handled by a particular server instance

Sharding and read replicas are two very different ways of handling data; the issues cited as problems only affect sharding, not read slaves.

> That's all doable, but it doesn't seem a particularly valuable use of your time when you're also trying to iterate on product features.

If you can't scale your database, then adding more functionality is a bad thing. Software engineering doesn't stop once you make an API call to someone else's software.

> in order to set up a new replica, you have to first lock the leader to stop all writes and take a consistent snapshot (which may take hours on a large database)

```
mysqldump --single-transaction --master-data
```

You can even gzip this on the fly and stream it via SSH to the new server, so that disk I/O on the local machine doesn't compete, or even connect remotely from another server via mysqldump to avoid the SSH overhead.
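The streaming approach described above can be sketched as a single shell pipeline. A minimal Python sketch that only builds the command string (host, database, and credentials are illustrative assumptions; nothing here touches a real database):

```python
import shlex


def build_dump_pipeline(db: str, replica_host: str) -> str:
    """Build a pipeline that takes a consistent snapshot without blocking
    writes (--single-transaction, InnoDB only), records the binlog position
    for replication (--master-data), gzips the dump on the fly, and streams
    it over SSH so it never lands on the local disk."""
    dump = ["mysqldump", "--single-transaction", "--master-data", db]
    # The remote side decompresses and loads straight into mysql.
    remote_cmd = "gunzip | mysql " + shlex.quote(db)
    remote = ["ssh", replica_host, remote_cmd]
    return " ".join(dump) + " | gzip | " + " ".join(shlex.quote(a) for a in remote)


# Hypothetical hosts; in practice you would pass this to a shell.
cmd = build_dump_pipeline("appdb", "replica1.example.com")
```

This is a sketch of the technique the comment names, not a hardened tool; a real setup also needs credentials, error handling, and a matching `CHANGE MASTER TO` step on the replica.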
mysteriousllama · about 11 years ago
To add to this: cache and cache invalidation.

Without proper caching and a good invalidation strategy your databases will get pounded. Use redis and memcache to cache everything possible. Don't even connect to the database unless you have to. Ensure that you can invalidate any cache entry easily, and keep things atomic so you do not run into race conditions. Use locking to ensure that when a cache entry expires the database does not get dog-piled with multiple copies of the same query. You'd think the query cache in your database of choice would be just as efficient, but trust me, it is not even close. You can also cache higher-level objects than just simple queries.

Depending on your reliability requirements, you may even consider treating your cache as a write-back cache and doing batched database writes in the background. These are generally more efficient than individual writes, due to a variety of factors.

I've worked on several top-200 ranking sites and this has always been one of the main go-to strategies for scaling. Databases suck: avoid querying them.
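The dog-pile protection described above can be sketched with a per-key lock: when an entry expires, exactly one caller recomputes it while the others wait and then reuse the fresh value. This is a minimal in-process sketch; the dict stands in for redis/memcached, and all names are illustrative:

```python
import threading
import time
from typing import Any, Callable

_cache: dict[str, tuple[float, Any]] = {}   # key -> (expires_at, value)
_locks: dict[str, threading.Lock] = {}
_locks_guard = threading.Lock()


def cached(key: str, ttl: float, compute: Callable[[], Any]) -> Any:
    """Read-through cache: on a miss or expiry, only one thread runs
    `compute` (the expensive database query); concurrent callers block on
    the per-key lock and then reuse the freshly cached value instead of
    dog-piling the database with identical queries."""
    now = time.monotonic()
    entry = _cache.get(key)
    if entry and entry[0] > now:
        return entry[1]            # fresh hit, no lock needed
    with _locks_guard:
        lock = _locks.setdefault(key, threading.Lock())
    with lock:
        # Re-check: another thread may have refreshed the entry
        # while we were waiting on the lock.
        entry = _cache.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]
        value = compute()
        _cache[key] = (time.monotonic() + ttl, value)
        return value
```

With a shared cache like redis, the same shape is usually built from an atomic set-if-not-exists operation with a timeout instead of an in-process lock, so that only one application server recomputes per key.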
s0enke · about 11 years ago
Since MySQL 5.6, most schema changes (add/drop field) are possible online as well:

http://dev.mysql.com/doc/refman/5.6/en/innodb-create-index-overview.html
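With online DDL you can request the behavior explicitly via the `ALGORITHM` and `LOCK` clauses, and MySQL errors out if the combination isn't supported rather than silently locking the table. A hypothetical statement (the table and column names are made up):

```python
# Hypothetical online schema change (MySQL >= 5.6, InnoDB):
# ALGORITHM=INPLACE avoids rebuilding the table as a full copy, and
# LOCK=NONE asks MySQL to keep the table readable and writable while
# the DDL runs; unsupported combinations fail fast with an error.
ONLINE_ADD_COLUMN = (
    "ALTER TABLE users "              # 'users' is an illustrative table
    "ADD COLUMN last_seen DATETIME, "
    "ALGORITHM=INPLACE, LOCK=NONE"
)
```

In practice this string would be passed to your MySQL driver's `execute()`; it is shown here only to illustrate the syntax.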
dalek2point3 · about 11 years ago
"Database connections are a real limitation"

I repeat:

"Database connections are a real limitation"

This should be one point, and the others should be one point. No?