How I Learned to Stop Worrying and Love Automated Database Failover

81 points by michaelfairley, over 12 years ago

3 comments

adrianhoward, over 12 years ago
"And, now that it is in production, we regularly test and exercise the tools involved."

This is one of the most important sentences in the article. I've seen too many systems in my time fail because the wonderful recovery/failover system had never really been tested in anger, or because the person who set it up left the company and the details never quite made it into the pool of common knowledge. Dealing with failover situations has to become *normal*.

One of the nicest pieces of advice I got, many years ago, was about naming. Never name systems things like 'db-primary' and 'db-failover' or 'db-alpha' and 'db-beta' - nothing that has an explicit hierarchy or ordering. Name them something random like db-bob and db-mary, or db-pink and db-yellow instead. It helps break the mental habit of thinking that one system *should* be the primary, rather than that one system just *happens* to be the primary right now.

Once you do that, start picking a day each week to run the failover process on something. Like code integration - do it often enough and it stops being painful and scary.

(Geek note: In the late nineties I worked briefly with a D&D-fanatic ops team lead. He threw a D100 when he came in every morning. On anything >90 he picked a random machine to fail over 'politely'. If he threw a 100 he went to the machine room and switched something off or unplugged something. A human chaos monkey.)
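
The dice routine in the geek note can be sketched as a small policy function. This is only an illustration of the anecdote, not anything the team lead actually ran: the function name `daily_drill`, the action labels, and the choice to treat a roll of exactly 100 as the hard failure alone (rather than a polite failover plus a hard failure) are all assumptions.

```python
import random


def daily_drill(roll, machines, rng=random):
    """Map a D100 roll to a drill action (hypothetical helper).

    Thresholds follow the anecdote: 100 means pull the plug on something,
    91-99 means a polite failover of a random machine, anything else means
    no drill today.
    """
    if not 1 <= roll <= 100:
        raise ValueError("a D100 roll must be between 1 and 100")
    if roll == 100:
        # Assumption: a 100 replaces the polite failover instead of adding to it.
        return ("hard_failure", rng.choice(machines))
    if roll > 90:
        return ("polite_failover", rng.choice(machines))
    return ("no_drill", None)


if __name__ == "__main__":
    # Neutral names, as the comment recommends: no implied primary.
    fleet = ["db-bob", "db-mary", "db-pink", "db-yellow"]
    print(daily_drill(random.randint(1, 100), fleet))
```

With these thresholds a drill happens on roughly one morning in ten, which is frequent enough to keep the failover procedure familiar.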
falcolas, over 12 years ago
Automated failover, with manual recovery, is probably the best thing you can do to get high availability with databases.

They just fail sometimes. The ability to be back up and running before an admin can even respond will pay for itself after your first automated failover (which doesn't even address the fact that automated failover scales well - human-based failover doesn't).

I also like their modifications to the Pacemaker resource to not flap the master role - that's really important with databases, and often overlooked with Pacemaker.
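
A toy model may make "automated failover, with manual recovery" and "not flapping the master role" more concrete. This is a sketch under stated assumptions and not Pacemaker's API or the article's resource agent: the `Cluster` class and its method names are hypothetical, and real health checking, fencing, and replication are left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical in-memory model of a primary plus standbys."""

    primary: str
    standbys: list
    needs_manual_recovery: set = field(default_factory=set)

    def on_primary_failure(self):
        """Automated step: promote the first standby and park the failed node."""
        if not self.standbys:
            raise RuntimeError("no standby available to promote")
        failed = self.primary
        self.primary = self.standbys.pop(0)
        self.needs_manual_recovery.add(failed)
        return self.primary

    def on_node_healthy(self, node):
        """A parked node coming back online changes nothing by itself.

        Deliberately a no-op: the old primary never reclaims the role
        automatically, so the primary role cannot flap back and forth.
        """
        return self.primary

    def operator_recover(self, node):
        """Manual step: an operator has checked the node and re-adds it as a standby."""
        if node not in self.needs_manual_recovery:
            raise ValueError(f"{node} is not waiting for manual recovery")
        self.needs_manual_recovery.remove(node)
        self.standbys.append(node)


if __name__ == "__main__":
    cluster = Cluster(primary="db-pink", standbys=["db-yellow"])
    cluster.on_primary_failure()         # db-yellow takes over without a human
    cluster.on_node_healthy("db-pink")   # db-pink is back, but nothing moves
    cluster.operator_recover("db-pink")  # a human re-adds it as a standby
    print(cluster.primary, cluster.standbys)
```

The design point is the asymmetry: promotion needs no human, while rejoining the cluster always does.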
joch, over 12 years ago
I have started trying out Galera Cluster[1] for MySQL, to replace a single MySQL server node with 3-4 nodes, all synchronously replicated. This should hopefully solve the problem of having to split writes to the master and reads to the slaves, and provide redundancy in case a server goes down.

Does anyone have any experience with Galera in a production environment? Is the setup in this article preferable to that?

[1] http://codership.com/content/using-galera-cluster
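
On the "3-4 nodes" question, a quick majority-quorum calculation may help. It is a simplified sketch that assumes every node has equal weight and that a partition keeps accepting writes only while it can see more than half of the last known membership; the helper names are hypothetical and do not come from Galera itself.

```python
def has_quorum(reachable, total):
    """True if the reachable nodes form a strict majority of the cluster."""
    return reachable * 2 > total


def tolerated_failures(total):
    """Largest number of simultaneous node losses that still leaves a majority."""
    return (total - 1) // 2


if __name__ == "__main__":
    for size in (3, 4, 5):
        print(size, "nodes tolerate", tolerated_failures(size), "simultaneous failure(s)")
```

Under these assumptions a fourth node adds read capacity but no extra tolerance for simultaneous failures compared with three, which is one reason odd cluster sizes are usually preferred.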