I am astonished that, in two years, you had not already run 100+ scheduled failovers. If your HA is good, customers don't notice; if it is not, you find out when fewer of them are around (and in the daytime!), and you fix it.<p>By now Pacemaker would probably have been abandoned. A hundred drills would have been enough to flush out these behaviors. If you are afraid to run drills on production equipment, you should be running them on a full-scale production testbed, ideally with mirrored production traffic. With a production-scale testbed, two years is enough to run thousands of risk-free failovers.<p>Not running frequent production failure drills is just irresponsible.