First, I think it is great that you found a tool that suits your needs. A few weeks ago I was mangling some data too (only about 17 million records) and would like to contribute my experience.<p>My tools of choice were awk, R, and Go (in that order). Sometimes I could calculate something within a few seconds with awk. But for various calculations, R proved to be a lot faster. At some point, I hit a problem where the simple R implementation I borrowed from Stack Overflow (which was supposed to be much faster than the other posted solutions) did not meet my expectations, so I spent 4 hours writing an implementation in Go that was well over an order of magnitude faster (about 20 minutes vs. 20 seconds, if I remember correctly).<p>So my advice is to broaden your toolset. When you reach the point where a single execution of your awk program takes 48 minutes, it might be worth considering another tool. That doesn't mean awk isn't a good tool, though. I still use it for simple things, as writing 2 lines in awk is much faster than writing 30 in Go for the same task.