
9 comments

keypusher, about 11 years ago

This article is pretty thin. We are currently building out a clustered application on corosync/pacemaker with Postgres synchronous replication and Tomcat, and I have mixed feelings about Linux HA so far. It isn't too bad to get something basic set up (a cluster with a virtual IP, for instance), but when things don't work it can be difficult to figure out why. If you are looking for a distributed filesystem, GFS2 in this stack isn't bad. However, there seem to be a lot of differences between package versions, and the interactions between versions of heartbeat, corosync, pacemaker, crm, pcs, your OCF resource definition files, the STONITH resources, and cluster-glue, along with the Linux packages, make problems hard to track down and leave much of the info you find on the web out of date. I've often had to resort to IRC or the mailing list to try to figure things out, and even then it sometimes seems like nobody knows. The whole thing feels a little bit shaky at first, but it is possible to build a solid cluster on top of it with enough effort.

ww520, about 11 years ago

Here's my experience building HA on Linux. There are two key pieces: storage replication and failure detection. Replication means there's a standby system with the same persistent state ready to go. As for failure detection, well, the whole point of HA is to keep operations going when something fails.

For storage replication, Linux has the excellent DRBD (http://www.drbd.org/) software to replicate disks at the block device level. This is great because any kind of disk-based system can be supported: database server, mail server, file server, DNS server, etc.

For failure detection, Linux has Linux-HA Heartbeat (http://www.linux-ha.org/wiki/Heartbeat). It detects failure at the machine level and ensures proper failover.

Within a machine, there are other tools to monitor process-level failure and propagate the failure to Linux-HA Heartbeat.

BTW, STONITH is a super simple way to avoid the partition problem.
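To make the failure detection idea concrete, here is a minimal Python sketch of timeout-based heartbeat tracking: each node reports in periodically, and a node is considered dead once no heartbeat has arrived within a deadline. This is only an illustration of the general technique, not how Linux-HA Heartbeat is implemented; the `HeartbeatMonitor` class, its `deadtime` parameter, and the node names are all made up for the example.

```python
import time
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical helper: track the last heartbeat seen from each node."""

    deadtime: float = 10.0  # seconds of silence before a node counts as dead (assumed value)
    last_seen: dict = field(default_factory=dict)

    def beat(self, node, now=None):
        # Record a heartbeat; `now` can be injected to make testing deterministic.
        self.last_seen[node] = time.monotonic() if now is None else now

    def dead_nodes(self, now=None):
        # A node is dead if its last heartbeat is older than `deadtime`.
        now = time.monotonic() if now is None else now
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.deadtime
        )


if __name__ == "__main__":
    monitor = HeartbeatMonitor(deadtime=5.0)
    monitor.beat("node-a", now=0.0)
    monitor.beat("node-b", now=4.0)
    print(monitor.dead_nodes(now=6.0))
```

A real cluster also has to decide what to do once a node is declared dead (fail over, fence it), and the sketch deliberately leaves that out.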

reader_1000, about 11 years ago

A lot of tools are mentioned in both the article and this thread. So what is the simplest and best way to get a failover virtual IP assigned to cluster members? I don't want the tool to start services; it would be enough if it stopped sending traffic to a failed node, determined by simple logic such as whether port 80 is listening. Having a lot of alternatives is good but confusing, and having powerful tools is also good, but when only simple things need to be achieved, they take a lot of time to configure and are harder to troubleshoot. I prefer "keep it simple, stupid".
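The "is port 80 listening" logic could be sketched in Python roughly as follows. This only covers the health check itself, not moving a virtual IP between machines; the function names `port_is_listening` and `healthy_members`, the one-second timeout, and the example addresses are assumptions made for illustration.

```python
import socket


def port_is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def healthy_members(members, port=80, timeout=1.0, check=port_is_listening):
    """Return the members that pass the check, preserving their order."""
    return [m for m in members if check(m, port, timeout)]


if __name__ == "__main__":
    # Hypothetical cluster member addresses.
    print(healthy_members(["10.0.0.11", "10.0.0.12"]))
```

A successful TCP connect only shows that something accepts connections on the port, not that the service behind it is answering correctly, so this is the simplest possible check.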

y0ghur7_xxx, about 11 years ago

No mention of Wackamole¹ in the article, and I feel really compelled to mention it here, as it makes setting up an HA cluster really simple. I followed this howto² a few months ago. It was really easy to configure and has been running stably ever since.

¹ http://www.backhand.org/wackamole/
² http://www.howtoforge.com/setting-up-a-high-availability-load-balancer-with-haproxy-wackamole-spread-on-debian-etch

iSloth, about 11 years ago

Linux High Availability is certainly still used a lot within application infrastructures, especially in some of the smaller ones. However, what I find more interesting are the architectures that perform all of the availability functions in the software layer, such as in the application or database code, as these are typically simpler and offer far better scalability.

pandemicsyn, about 11 years ago

It's not mentioned, but ucarp is handy for those times when you want to float a VIP between two boxes for a bit of redundancy but don't need something super intelligent (like where a bit of flapping is OK). http://www.ucarp.org/

jpettersson, about 11 years ago

Great overview of the basic concepts, thanks!

snorkel, about 11 years ago

Server 500 error. Now that's irony.

hepek, about 11 years ago

Install 5 different things, or just run Erlang on all nodes.

Basic Concepts of High Availability Linux