tl;dr:<p>The database did not have the correct indexes.<p>The database is not scalable (serverless).<p>IOPS were hard-limited by the AWS plan.<p>No performance metrics or alerts were in place to easily pinpoint the issue.<p>Infrastructure resources are shared by all users (one user going viral means all the other users are affected).<p>The article was somewhat confusing to me, as it didn't clearly state the problem (the bottleneck), the solution, or the architecture.