TechEcho

Basic Concepts of High Availability Linux

101 points by frenkel about 11 years ago

9 comments

keypusher about 11 years ago
This article is pretty thin. We are currently building out a clustered application on corosync/pacemaker with postgres synchronous replication and tomcat, and I have mixed feelings about Linux HA so far. It isn't too bad to get something basic set up (a cluster with a virtual IP, for instance), but when things don't work it can be difficult to figure out why. If you are looking for a distributed filesystem, GFS2 in this stack isn't bad. However, there seem to be a lot of differences between package versions, and the interactions between versions of heartbeat, corosync, pacemaker, crm, pcs, your OCF resource definition files, the stonith resources, and cluster-glue, along with the Linux packages, make problems hard to track down. Much of the web info you do find is out of date. I've often had to resort to IRC or the mailing list to try and figure things out, and even then it sometimes seems like nobody knows. The whole thing feels a little bit shaky at first, but it is possible to build a solid cluster on top of it with enough effort.
ww520 about 11 years ago
Here's my experience in building HA in Linux. There are two key pieces: storage replication and failure detection. Replication ensures there's a standby system with the same persistent state ready to go. As for failure detection, well, the whole point of HA is to keep operations running when something fails.

For storage replication, Linux has the excellent DRBD (http://www.drbd.org/) software to replicate disks at the block device level. This is great because any kind of disk-based system can be supported, such as a database server, mail server, file server, DNS server, etc.

For failure detection, Linux has the Linux HA Heartbeat (http://www.linux-ha.org/wiki/Heartbeat). This detects failure at the machine level and ensures proper failover.

Within a machine, there are other tools to monitor process-level failure and propagate the failure to Linux HA Heartbeat.

BTW, STONITH is a super simple way to avoid the partition problem.
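To make the machine-level failure detection idea concrete, here is a minimal sketch of timeout-based heartbeat tracking in Python. It is not how Linux HA Heartbeat is implemented; the `HeartbeatMonitor` class, its method names, and the 3-second timeout in the usage note are all hypothetical, chosen only to illustrate the general technique of declaring a peer failed once it has been silent for too long.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer and flags silent peers.

    Illustrative only: real cluster stacks add quorum, fencing, and
    redundant communication paths on top of this basic idea.
    """

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the logic can be exercised without sleeping
        self.last_seen = {}

    def record(self, node):
        """Call whenever a heartbeat message arrives from `node`."""
        self.last_seen[node] = self.clock()

    def failed_nodes(self):
        """Return the peers whose last heartbeat is older than the timeout."""
        now = self.clock()
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )
```

A caller would invoke `record()` for each heartbeat received and poll `failed_nodes()` periodically, for example with `HeartbeatMonitor(timeout_s=3.0)`. A timeout alone cannot tell a dead peer from a network partition, which is why fencing such as STONITH is normally paired with it.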
reader_1000 about 11 years ago
A lot of tools are mentioned in both the article and this thread. So what is the simplest and best way to achieve a failover virtual IP assigned to cluster members? I don't want the tool to start services. It would be enough if it stopped sending traffic to a failed node, determined with simple logic like "is port 80 not listening?" Having a lot of alternatives is good but confusing, and having powerful tools is also good, but when only simple things need to be achieved, they take a lot of time to configure and are harder to troubleshoot. I prefer "keep it simple, stupid".
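The "is port 80 listening?" logic described above can be sketched in a few lines of Python using only the standard library. This is an illustration of the check itself, not of any particular failover tool; the function names `port_is_listening` and `healthy_members` and the 1-second default timeout are assumptions made for the example.

```python
import socket


def port_is_listening(host, port, timeout_s=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def healthy_members(members, port=80):
    """Filter a list of hostnames or IPs down to those accepting connections on `port`."""
    return [host for host in members if port_is_listening(host, port)]
```

A load balancer or virtual IP manager would run a check like this on an interval and stop routing to any member that drops out of the list. A successful TCP connect only shows that something is accepting connections, not that the service behind it is answering correctly.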
y0ghur7_xxx about 11 years ago
No mention of Wackamole¹ in the article, and I feel really compelled to mention it here, as it is really simple to set up an HA cluster using it. I followed this howto² a few months ago. It was really easy to configure and has run stably since.

¹ http://www.backhand.org/wackamole/

² http://www.howtoforge.com/setting-up-a-high-availability-load-balancer-with-haproxy-wackamole-spread-on-debian-etch
iSloth about 11 years ago
Linux High Availability is certainly still used a lot within application infrastructures, especially in some of the smaller ones. However, what I find more interesting are the architectures that perform all of the availability functions in the software layer, such as in the application or database code, as these are typically simpler and offer far better scalability.
pandemicsyn about 11 years ago
It's not mentioned, but ucarp is handy for those times when you want to float a VIP between two boxes for a bit of redundancy but don't need something super intelligent (like where a bit of flapping is OK).

http://www.ucarp.org/
jpettersson about 11 years ago
Great overview of the basic concepts, thanks!
snorkel about 11 years ago
Server 500 error. Now that's irony.
hepek about 11 years ago
Install 5 different things, or just run Erlang on all nodes.