What this blog post gets at, indirectly, is optimizing "zero-cost abstractions". In C++ there are many constructs that, translated directly to machine code, would carry a severe performance penalty, but clever compilers can remove this overhead. (In simple programs. When zero-cost abstractions get stacked on top of each other, compilers give up and performance craters.)<p>In languages like Smalltalk, Everything Is An Object, and instead of function calls, messages are sent between objects. It's very elegant. It's up to the compiler to figure out that "1 + 1" doesn't actually require object allocations and dispatch of the :+ message; all these abstractions can be stripped away so you end up with a single machine instruction for the addition. In practice this is hopeless: the compiler will not be able to transform code that is dynamic in theory but static in practice into correct and efficient machine code.<p>This is for two reasons.<p>1) Language semantics. If you have multiple threads running, then one thread can rewrite the code running on another thread. This hamstrings the compiler in the assumptions it can make about even the simplest for loop. Either you actually send messages for every integer addition, because another thread might be listening for those messages, or you disallow multi-threading. In either case your performance will be abysmal.<p>2) Wide instructions. Modern CPUs can do a lot of work in parallel thanks to SIMD. In a world where every integer is a heap object, you have no chance of SIMD optimization.<p>It's instructive to compare JavaScript and Python. JavaScript engines are much faster than Python engines because JavaScript doesn't have shared-memory threads, which means the compiler can do tons of static code analysis. No such luck with Python. If you want to go fast in Python, you use numpy, which just calls out to C.
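You can see the "everything is an object, + is a message" semantics directly in CPython, which shares Smalltalk's model here. A small illustration (output sizes are typical of 64-bit CPython, not guaranteed by the language):

```python
import sys

# In CPython, as in Smalltalk, small integers are full heap objects,
# and "+" desugars to a method dispatch rather than one machine instruction.
x = 1
print(type(x))            # <class 'int'> -- a real object, not a raw word
print(sys.getsizeof(x))   # dozens of bytes of object header, not 8
print((1).__add__(1))     # 2 -- what "1 + 1" actually dispatches to
```

This is exactly the overhead a sufficiently clever compiler is supposed to strip away, and exactly what the language semantics make hard to strip safely.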
Here too the optimizations become possible because the language semantics holding the optimizer back are disregarded once the Python code calls into C.<p>Most of the code we write in practice is pretty static. Dynamic features like virtual function calls, type introspection, and run-time code generation can be added to static languages without much difficulty. On the other hand, it's extremely hard (if not impossible) to get even 10% of the performance your computer is capable of out of a highly dynamic language. In practice you don't even get 1%. I know this is an unpopular message, but many people have worked hard at making slow languages fast for decades now, and it's just not happening.
Semi-related: I recently started doing a "leetcode.com" challenge every morning with my coffee.<p>I am appalled that the top 150 interview questions are 80% iteration avoidance (if you are aiming for good timing).<p>In my experience the real problem has always been how you handle API calls, organize queries, etc. I have maybe 5 times in my career been like 'oh wow, we need to avoid this for loop'. Then of course there is managing complexity...
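For anyone who hasn't done these: "iteration avoidance" usually means trading a nested loop for a hash lookup. The classic example is two-sum (sketched here from memory, not any particular problem's official solution):

```python
# Naive version is a doubly nested loop, O(n^2).
# Keeping a value -> index map makes it a single pass, O(n).
def two_sum(nums, target):
    seen = {}
    for i, n in enumerate(nums):
        if target - n in seen:
            return [seen[target - n], i]
        seen[n] = i
    return None

print(two_sum([2, 7, 11, 15], 9))  # [0, 1]
```

That one trick, in various costumes, really is most of what the timed grader rewards.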
You can get a PhD by doing stuff like this?? I do that kind of thing for fun after work on my hobby projects, and at work I've spent weeks speeding up a single HTTP request path.
Am I the only one bothered by the title? "Optimizing for X" is a very common expression, so I read it as "optimizing for loops" as in adding more loops, instead of the intended "optimizing for-loops".
FWIW, my experience with JSLT <a href="https://github.com/schibsted/jslt/">https://github.com/schibsted/jslt/</a> was that I first wrote an AST interpreter. Then I and another person independently wrote bytecode interpreters, but the AST interpreter remains faster than both.<p>It may be that the JVM is simply better at optimizing one type of code than the other.
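For readers unfamiliar with the distinction: an AST interpreter walks the parse tree directly, with each node evaluating itself through a virtual call. A generic sketch (hypothetical node classes, nothing to do with JSLT's actual internals):

```python
# Each AST node knows how to evaluate itself given an environment.
class Num:
    def __init__(self, v): self.v = v
    def eval(self, env): return self.v

class Var:
    def __init__(self, name): self.name = name
    def eval(self, env): return env[self.name]

class Add:
    def __init__(self, l, r): self.l, self.r = l, r
    def eval(self, env): return self.l.eval(env) + self.r.eval(env)

expr = Add(Var("x"), Num(1))   # the tree for "x + 1"
print(expr.eval({"x": 41}))    # 42
```

A bytecode interpreter replaces this tree walk with a flat instruction array and a dispatch loop. One plausible reason the AST version can win on the JVM is that a JIT can inline these small, often monomorphic eval calls, while a bytecode dispatch loop is a single megamorphic switch it can't specialize as easily.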
Is this pushing a stack element for every loop iteration before starting the loop execution? If there are a ton of iterations, won't the stack overflow?
I'm not sure how useful this is for interpreters in general, given the frankly bizarre design decisions. My normal rant about <i>bad</i> interpreter design assumes things not seen here.
Premature optimization is the root of all evil. Instead of optimizing the performance of the interpreter, they should optimize the language semantics to be fast by default.