You've got two questions here, I think, and both deserve careful consideration.<p>1) How do you monitor your site so that you're alerted to problems?<p>2) How do you prevent your site from going down in the first place?<p>In my opinion, #1 is more straightforward than #2, and the more uptime you need for #2, the more you're going to pay.<p>1) For monitoring, there are two broad options:<p>a) Write a script that sends you an email / SMS when your site is down. Run it on a separate server, or on something like Google App Engine, so that it doesn't go down along with the site it's watching.<p>I'm sure there are some simple free tools out there that will do this for you. A quick search turned up:<p><a href="http://www.siteuptime.com/" rel="nofollow">http://www.siteuptime.com/</a><p>They offer one check every 30 minutes for free. Can't beat free, but this is not an endorsement; I've never used them.<p>b) Use something with more features. Run it on a separate server, or use a provider that offers it SaaS style.<p>Nagios, Zenoss, Zabbix, GroundWork, OpenNMS. I can personally vouch for Nagios. It's pretty easy to get a simple configuration going, though it can get very complicated from there (you might want to monitor the monitor, right?).<p>2) If you've got two servers that you can connect to a load balancer, you may be able to run in active-active mode, so that if one fails you simply lose 50% of your capacity. For your data store, options include a database solution like MySQL in a master-master setup, or a relational database that has data redundancy as a feature. In my opinion you don't choose your data backend solely because of its redundancy and failover capabilities, but it could be a factor.<p>If you can't do active-active mode right now for some reason, then active-passive can keep you up enough to deliver that message until you restore your services. If you can get an "extra" IP address, you can even do this without a load balancer involved. Take a look at keepalived.org to see how you can float an IP address between multiple servers.
CDNs such as Akamai also provide site failover features that might be worth investigating.<p>There are so many ways to skin the cat on this one, and I'm just scratching the surface. If your site is more than informational (e.g. if you're building a web-based service or application), then monitoring and failover / redundancy are critical to your success. If you just want your informational site to be up, it's still pretty important. The fact that you're thinking about this now, and not after your first 24-hour window of downtime, is a good sign!<p>If you provide more information about what your startup is doing (at least from a tech architecture perspective) and what sorts of resources you're willing to spend to improve your uptime and failover capabilities, you might get more specific suggestions.<p>Best of luck, and congrats on the startup!<p>Pete
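<p>A rough sketch of the kind of check script described under monitoring option (a) might look like the following in Python. Everything specific here is an assumption for illustration: the URL, the email addresses, the SMTP host, and the function names are placeholders, and it only covers email (SMS would need a gateway or provider API that isn't shown). It treats any connection error, timeout, or HTTP error status as "down".

```python
import smtplib
import urllib.error
import urllib.request
from email.message import EmailMessage

# Placeholder values (assumptions): replace with your own site and mail setup.
SITE_URL = "https://example.com/"
ALERT_FROM = "monitor@example.com"
ALERT_TO = "you@example.com"
SMTP_HOST = "localhost"


def check_site(url, timeout=10, opener=urllib.request.urlopen):
    """Return (is_up, detail). Errors and HTTP error statuses count as down."""
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx responses.
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:
        # DNS failures, refused connections, timeouts, and similar.
        return False, f"request failed: {exc}"
    return 200 <= status < 400, f"HTTP {status}"


def build_alert(url, detail):
    """Build the alert email for a failed check."""
    msg = EmailMessage()
    msg["Subject"] = f"Site down: {url}"
    msg["From"] = ALERT_FROM
    msg["To"] = ALERT_TO
    msg.set_content(f"Check of {url} failed ({detail}).")
    return msg


def send_alert(msg, smtp_host=SMTP_HOST):
    """Send the alert through an SMTP server assumed to accept local mail."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)


def main():
    """Run one check and send an alert if it fails. Returns 0 if up, 1 if down."""
    is_up, detail = check_site(SITE_URL)
    if not is_up:
        send_alert(build_alert(SITE_URL, detail))
    return 0 if is_up else 1
```

<p>To use something like this, you would call `main()` from a scheduled job (cron, for example) every few minutes, on a machine other than the one hosting the site. As written it sends an email on every failed check, so a real setup would probably want to suppress repeat alerts during a long outage.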