What this blog post gets at, indirectly, is optimizing "zero-cost abstractions". In C++ there are many constructs that, translated directly to machine code, would carry a severe performance penalty, but clever compilers can remove this overhead. (In simple programs. When zero-cost abstractions get stacked on top of each other, compilers give up and performance craters.)<p>In languages like Smalltalk, Everything Is An Object, and instead of function calls, messages are sent between objects. It's very elegant. It's up to the compiler to figure out that "1 + 1" doesn't actually require object allocations and dispatch of the :+ message; all these abstractions can be stripped away so you end up with a single machine instruction for the addition. In practice this is hopeless: the compiler will not be able to transform code that is dynamic in theory but static in practice into correct and efficient machine code.<p>This is for two reasons.<p>1) Language semantics. If you have multiple threads running, then one thread can rewrite the code running on another thread. This hamstrings the compiler in the assumptions it can make about even the simplest for loop. Either you actually send messages for every integer addition, because another thread might be listening for those messages, or you disallow multi-threading. In either case your performance will be abysmal.<p>2) Wide instructions. Modern CPUs can do a lot of work in parallel thanks to SIMD. In a world where every integer is a heap object, you have no chance of SIMD optimization.<p>It's instructive to compare JavaScript and Python. JavaScript engines are much faster than Python engines because JavaScript doesn't have shared-memory threads, which means the compiler can do tons of static code analysis. No such luck with Python. If you want to go fast in Python, you use numpy, which just calls out to C.
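You can see the "everything is an object, + is a message" semantics directly in CPython, which shares Smalltalk's model here. A small illustration (output sizes are typical of 64-bit CPython, not guaranteed by the language):

```python
import sys

# In CPython, as in Smalltalk, small integers are full heap objects,
# and "+" desugars to a method dispatch rather than one machine instruction.
x = 1
print(type(x))            # <class 'int'> -- a real object, not a raw word
print(sys.getsizeof(x))   # dozens of bytes of object header, not 8
print((1).__add__(1))     # 2 -- what "1 + 1" actually dispatches to
```

This is exactly the overhead a sufficiently clever compiler is supposed to strip away, and exactly what the language semantics make hard to strip safely.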
Here too the optimizations become possible because the language semantics holding the optimizer back are disregarded once the Python code calls into C.<p>Most of the code we write in practice is pretty static. Dynamic features like virtual function calls, type introspection, and run-time code generation can be added to static languages without much difficulty. On the other hand, it's extremely hard (if not impossible) to get even 10% of the performance your computer is capable of out of a highly dynamic language. In practice you don't even get 1%. I know this is an unpopular message, but many people have worked hard at making slow languages fast for decades now, and it's just not happening.
Semi-related: I recently started doing a "leetcode.com" challenge every morning with my coffee.<p>I am appalled that the top 150 interview questions are 80% iteration avoidance (if you are aiming for good timing).<p>In my experience the real problem has always been how you handle API calls, organize queries, etc. I have maybe 5 times in my career been like 'oh wow, we need to avoid this for loop'. Then of course there is managing complexity...
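For anyone who hasn't done these: "iteration avoidance" usually means trading a nested loop for a hash lookup. The classic example is two-sum (sketched here from memory, not any particular problem's official solution):

```python
# Naive version is a doubly nested loop, O(n^2).
# Keeping a value -> index map makes it a single pass, O(n).
def two_sum(nums, target):
    seen = {}
    for i, n in enumerate(nums):
        if target - n in seen:
            return [seen[target - n], i]
        seen[n] = i
    return None

print(two_sum([2, 7, 11, 15], 9))  # [0, 1]
```

That one trick, in various costumes, really is most of what the timed grader rewards.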
You can get a PhD by doing stuff like this?? I do that kind of thing for fun after work on my hobby projects, and at work I've spent weeks speeding up a single HTTP request path.
Am I the only one bothered by the title? "Optimizing for X" is a very common expression, so I read it as "optimizing for loops" as in adding more loops, instead of the intended "optimizing for-loops".
FWIW, my experience with JSLT <a href="https://github.com/schibsted/jslt/">https://github.com/schibsted/jslt/</a> was that I first wrote an AST interpreter. Then I and another person independently wrote bytecode interpreters, but the AST interpreter remains faster than both.<p>It may be that the JVM is simply better at optimizing one type of code than the other.
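For readers unfamiliar with the distinction: an AST interpreter walks the parse tree directly, with each node evaluating itself through a virtual call. A generic sketch (hypothetical node classes, nothing to do with JSLT's actual internals):

```python
# Each AST node knows how to evaluate itself given an environment.
class Num:
    def __init__(self, v): self.v = v
    def eval(self, env): return self.v

class Var:
    def __init__(self, name): self.name = name
    def eval(self, env): return env[self.name]

class Add:
    def __init__(self, l, r): self.l, self.r = l, r
    def eval(self, env): return self.l.eval(env) + self.r.eval(env)

expr = Add(Var("x"), Num(1))   # the tree for "x + 1"
print(expr.eval({"x": 41}))    # 42
```

A bytecode interpreter replaces this tree walk with a flat instruction array and a dispatch loop. One plausible reason the AST version can win on the JVM is that a JIT can inline these small, often monomorphic eval calls, while a bytecode dispatch loop is a single megamorphic switch it can't specialize as easily.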
Is this pushing a stack element for every loop iteration before starting the loop execution? If there are a ton of iterations, won't the stack overflow?
I'm not sure how useful this is for interpreters in general, given the frankly bizarre design decisions. My normal rant about <i>bad</i> interpreter design assumes things not seen here.
Premature optimization is the root of all evil. Instead of optimizing the performance of the interpreter, they should optimize the language semantics to be fast by default.