Late night semi-related rant while I can't sleep.

I worked at one place where we had this big distributed system for processing about 1M table rows (recalculate some stuff with the latest code to see if the latest code has regressions).

I joined a couple years after launch, and it took months to get it working okay and to get good visibility into it.

It took about eight hours to run; eventually we got it down to three. The actual calculations only took about a second per object, so with 24 or so VMs you get the 8 hours. Sometimes a run would take too long and the cron would seed the next batch of items into the queue without checking whether it was empty, resulting in a process that never finished!

You're probably thinking: just add more nodes! Scale horizontally! Well, we were on Kubernetes. Except we weren't allowed to use k8s directly. We had to use an abstraction that devops provided AROUND k8s, and that framework had some limitations.

Also, simply scaling horizontally would have taken down the production DB due to the number of connections, instead of, say, using multithreading and reusing connections.

I had a solution that ran through all the data on my MacBook with GNU parallel in less than an hour, but I could never convince the architects to let me deploy it. :)

So, distributed stuff can be really nice. But if you're having trouble getting the simple version done well, you probably shouldn't make it distributed yet. I might still have PTSD from "hey, can you run the Thing on your laptop again? The Thing in prod won't finish for another 9 hours."
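The never-finishing runs came from the cron re-seeding a queue that still had work in it. The missing guard is tiny; a sketch using Python's stdlib `queue` as a stand-in for the real job queue (`seed_batch` and the scheduling details are made up for illustration):

```python
import queue

def seed_batch(q, items):
    """Hypothetical cron entry point: only seed the next batch once the
    previous one has drained, so a slow run can't pile up forever."""
    if not q.empty():   # the missing check from the story
        return False    # previous batch still in flight; skip this tick
    for item in items:
        q.put(item)
    return True

work = queue.Queue()
assert seed_batch(work, [1, 2, 3]) is True    # empty queue: seeding allowed
assert seed_batch(work, [4, 5, 6]) is False   # still has items: skipped
```

(In a real system you'd also want the check and the seeding to be atomic, e.g. a conditional enqueue in the queue backend itself, since a separate cron process can race with the workers.)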
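On the connection point: every extra pod opens its own DB connections, so horizontal scaling multiplies them, while a fixed-size pool shared by worker threads caps them no matter how much work you push through. A minimal sketch of that shape, with a fake connection class standing in for a real DB driver (all names here are illustrative, not from the original system):

```python
import queue
import threading

class FakeConn:
    """Stand-in for a real DB connection; counts how many ever get opened."""
    opened = 0
    def __init__(self):
        FakeConn.opened += 1
    def execute(self, row):
        return row * 2  # pretend recalculation

class ConnPool:
    """Fixed-size pool: at most `size` connections, reused by all workers."""
    def __init__(self, size):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(FakeConn())
    def acquire(self):
        return self._q.get()   # blocks until a connection is free
    def release(self, conn):
        self._q.put(conn)

def worker(pool, jobs, results):
    while True:
        try:
            row = jobs.get_nowait()
        except queue.Empty:
            return
        conn = pool.acquire()
        try:
            results.append(conn.execute(row))  # list.append is thread-safe
        finally:
            pool.release(conn)

jobs, results = queue.Queue(), []
for row in range(1000):
    jobs.put(row)
pool = ConnPool(4)   # the DB only ever sees 4 connections...
threads = [threading.Thread(target=worker, args=(pool, jobs, results))
           for _ in range(16)]   # ...even with 16 threads sharing them
for t in threads:
    t.start()
for t in threads:
    t.join()
assert len(results) == 1000
assert FakeConn.opened == 4
```

Sixteen pods doing the same work would have opened sixteen-plus connections (more with per-pod pools); here the concurrency and the connection count are decoupled, which is exactly what the production DB needed.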