Late night semi-related rant while I can't sleep.

I worked at one place where we had this big distributed system for processing about 1M table rows (recalculate some stuff with the latest code to see if the latest code has regressions).

I joined a couple years after launch, and it took months to get it working okay and to get good visibility into it.

It took about eight hours to run; eventually we got it down to three. The actual calculations only took about a second per object, so with 24 or so VMs you get the 8 hours. Sometimes a run would take too long and the cron would seed the next batch of items into the queue without checking whether it was empty, resulting in a process that never finished!

You're probably thinking: just add more nodes! Scale horizontally! Well, we were on Kubernetes. Except we weren't allowed to use k8s directly. We had to use an abstraction that devops provided AROUND k8s, and that framework had some limitations.

Also, simply scaling horizontally would have taken down the production DB due to the number of connections, instead of, say, using multithreading and reusing connections.

I had a solution that ran through all the data on my MacBook with GNU parallel in less than an hour, but I could never convince the architects to let me deploy it. :)

So, distributed stuff can be really nice. But if you're having trouble getting the simple version done well, you probably shouldn't make it distributed yet. I might still have PTSD from "hey, can you run the Thing on your laptop again? The Thing in prod won't finish for another 9 hours."
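The never-finishing runs came from the cron re-seeding a queue that still had work in it. The missing guard is tiny; a sketch using Python's stdlib `queue` as a stand-in for the real job queue (`seed_batch` and the scheduling details are made up for illustration):

```python
import queue

def seed_batch(q, items):
    """Hypothetical cron entry point: only seed the next batch once the
    previous one has drained, so a slow run can't pile up forever."""
    if not q.empty():   # the missing check from the story
        return False    # previous batch still in flight; skip this tick
    for item in items:
        q.put(item)
    return True

work = queue.Queue()
assert seed_batch(work, [1, 2, 3]) is True    # empty queue: seeding allowed
assert seed_batch(work, [4, 5, 6]) is False   # still has items: skipped
```

(In a real system you'd also want the check and the seeding to be atomic, e.g. a conditional enqueue in the queue backend itself, since a separate cron process can race with the workers.)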
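On the connection point: every extra pod opens its own DB connections, so horizontal scaling multiplies them, while a fixed-size pool shared by worker threads caps them no matter how much work you push through. A minimal sketch of that shape, with a fake connection class standing in for a real DB driver (all names here are illustrative, not from the original system):

```python
import queue
import threading

class FakeConn:
    """Stand-in for a real DB connection; counts how many ever get opened."""
    opened = 0
    def __init__(self):
        FakeConn.opened += 1
    def execute(self, row):
        return row * 2  # pretend recalculation

class ConnPool:
    """Fixed-size pool: at most `size` connections, reused by all workers."""
    def __init__(self, size):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(FakeConn())
    def acquire(self):
        return self._q.get()   # blocks until a connection is free
    def release(self, conn):
        self._q.put(conn)

def worker(pool, jobs, results):
    while True:
        try:
            row = jobs.get_nowait()
        except queue.Empty:
            return
        conn = pool.acquire()
        try:
            results.append(conn.execute(row))  # list.append is thread-safe
        finally:
            pool.release(conn)

jobs, results = queue.Queue(), []
for row in range(1000):
    jobs.put(row)
pool = ConnPool(4)   # the DB only ever sees 4 connections...
threads = [threading.Thread(target=worker, args=(pool, jobs, results))
           for _ in range(16)]   # ...even with 16 threads sharing them
for t in threads:
    t.start()
for t in threads:
    t.join()
assert len(results) == 1000
assert FakeConn.opened == 4
```

Sixteen pods doing the same work would have opened sixteen-plus connections (more with per-pod pools); here the concurrency and the connection count are decoupled, which is exactly what the production DB needed.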