I created a realtime multiplayer card game for iOS & Android, so obviously I had a proxy server behind it. I used to have 700 daily active users and I was really proud of it.<p>Then one unfortunate day I went on holiday for 2 days. When I came back, the DAU (daily active users) had dropped to 100.<p>WTF happened??<p>I knew the server had a bug that made it crash about once every 10 days. I had not found time to fix it, so I wrote a monitoring process that auto-restarted the server whenever it crashed. While I was away, the AWS EC2 instance got restarted, and neither the server nor the monitoring process came back up. The server was down for a whole day.<p>Lessons learned:
1. Write Murphy's law on your desk.
2. How do you define quality? Users should have no desire to look for another option once they find your product.
3. Live Ops matters :)
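
For anyone curious what the auto-restart monitor in the story amounts to, here is a minimal Python sketch. It is not the original code: the `supervise` function, its parameters, and the fixed delay are all assumptions made for illustration. It also shows the weak spot from the story, since this loop only helps while it is itself running. After a reboot something else has to start it, for example a systemd unit with `Restart=always` that is enabled at boot.

```python
import subprocess
import sys
import time


def supervise(cmd, max_launches=None, delay=1.0, sleep=time.sleep):
    """Launch cmd and relaunch it every time it exits.

    Hypothetical helper for illustration only.
    max_launches=None means keep going forever; a number caps the
    total launches, which is mainly useful for testing.
    Returns how many times cmd was launched.
    """
    launches = 0
    while max_launches is None or launches < max_launches:
        proc = subprocess.Popen(cmd)   # start the server process
        launches += 1
        code = proc.wait()             # block until it exits or crashes
        print(f"launch {launches} exited with code {code}", file=sys.stderr)
        sleep(delay)                   # brief pause to avoid a tight crash loop
    return launches
```

A call such as `supervise(["./game-server"])` (the path is a placeholder) would keep the server running across crashes, but not across a machine restart unless the supervisor itself is registered with the OS to start at boot.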