An Attempt to Catch Up with JIT Compilers

203 points by mfiguiere, 3 months ago

9 comments

Voultapher, 3 months ago
Love to see negative results published; so, so important.

Please, let's all move toward a research procedure that enforces submission of the hypothesis before any research is allowed to commence, and that enforces publication regardless of results.
pizlonator, 3 months ago
I think the missing piece here is that JavaScriptCore (JSC) and other such systems don't just use inline caching to speed up dynamic accesses; they use the caches as profiling feedback.

So, anytime you have an IC in interpreter, baseline, or lightly optimized code, that IC is monitored to see how polymorphic it gets, and that data is fed back into the optimization pipeline.

Just having an IC as a dead end, where you don't use it for profiling, is way less profitable than having an IC that feeds into profiling.
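A minimal sketch of that feedback loop, with invented names (this is not JSC's actual code): each IC site caches one shape-to-offset mapping, but it also counts the distinct shapes it has seen, and the optimizer later reads that counter as profiling data.

    #include <stddef.h>

    /* Hypothetical inline-cache site: caches one shape/offset pair, but
     * also counts how many distinct shapes it has observed. The counter
     * is the profiling feedback: the optimizer reads it later to decide
     * whether this access is monomorphic or polymorphic. */
    struct ic_site {
        const void *cached_shape;  /* last hidden class observed */
        size_t      cached_offset; /* field offset for that shape */
        unsigned    seen_shapes;   /* profiling counter fed to the JIT */
    };

    struct object {
        const void *shape;
        double      slots[8];
    };

    /* slow_offset stands in for the result of a full slow-path lookup,
     * simplified here to a parameter. */
    static double ic_load(struct ic_site *ic, struct object *obj,
                          size_t slow_offset) {
        if (obj->shape == ic->cached_shape)
            return obj->slots[ic->cached_offset];  /* fast path */
        /* Miss: repatch the cache and record the transition. */
        ic->cached_shape  = obj->shape;
        ic->cached_offset = slow_offset;
        ic->seen_shapes++;                         /* profiling, not just caching */
        return obj->slots[slow_offset];
    }

A tiering compiler would consult seen_shapes when recompiling: 1 means speculate on the monomorphic fast path, a small number means emit a polymorphic dispatch, many means fall back to a generic lookup.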
c-smile, 3 months ago
Slightly orthogonal...

In Sciter, which uses QuickJS (no JIT), I've added a C compiler instead of a JIT. That means we can add not just JS modules but C modules too:

    import * as cmod from "./cmodule.c"

Such a cmodule is compiled to native code and executed on the fly. The idea is simple: each language is good for specific tasks. JS is flexible and C is performant; just use whichever tool is most optimal for the task.

C modules play two major roles: FFI and number-crunching code.

Sciter uses the TCC compiler and runtime.

The total size of the QuickJS + TCC binary bundle is 500 KB + 220 KB = 720 KB. For comparison, V8 is about 40 MB.

https://sciter.com/c-modules-in-sciter/
https://sciter.com/here-we-go/
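For a feel of what such a module might contain, here is a hypothetical cmodule.c (the function is invented; the actual export/binding interface is Sciter-specific and documented in the links above):

    /* Hypothetical cmodule.c: a number-crunching routine a script would
     * import. How symbols are exposed to JS is Sciter's business; this
     * only illustrates the kind of code that belongs on the C side. */
    double sum_squares(const double *v, int n) {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += v[i] * v[i];
        return s;
    }

The design choice is that the compiler ships inside the runtime, so a "native extension" becomes a source file rather than a prebuilt binary per platform.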
tonnydourado, 3 months ago
Tangentially: fuck yeah, negative results! Just as good as positive ones.
VeejayRampay, 3 months ago
The people who came up with this are obviously brilliant, but being French myself, I really wonder why no one is proofreading the English; it gives an overall bad impression of the work, IMHO.
tsunego, 3 months ago
Chasing inline-cache micro-optimizations with dynamic binary modification is a dead end. Modern CPUs are laughing at our outdated compiler tricks. Maybe it's time to accept that clever hacks won't outrun silicon.
ErikCorry, 3 months ago
It's good that they post negative results, but it's hard to know exactly why their attempt failed, and it's tempting for me to make guesses without doing any measurements, so let me fall for that temptation:

They are patching inline-cache sites in an AOT binary and not seeing improvements.

Only 17% of the inline-cache sites could be optimized to what they call O2 level (listing 7). Most could only be optimized to O1 level (listing 6). The only difference from the baseline (listing 5) to O1 is that they replaced:

    mov 0x101c(%rip), %rax   # load the offset

with

    mov $0x3, %rax           # load the offset

I'm not very surprised that this did not help much. The old load is probably hoisted up and loaded into a renamed register very early, and it won't miss in the cache.

Basically, they already have a pretty nice inline-cache system, at least for the monomorphic case, and messing with the exact instructions used to implement it doesn't help much. A JIT is able to do so much more, e.g. polymorphic cases, inlining of simple methods, and eliminating repeated checks of the same hidden class. Not to mention detecting at runtime that some unknown object is almost always an integer or a float and JITting code specialized for that.

People new to virtual machines often focus on the compiler, whereas the stuff that moves the needle is often around the runtime: how tagged and typed data is represented, the GC implementation, and the object layout. E.g. this paper explores an interesting new tagging technique and makes a huge difference to performance (there's some author overlap): https://www.researchgate.net/figure/The-three-representations-in-a-tagged-object-system-here-shown-on-a-little-endian_fig1_386112036

Incidentally, the assembly syntax in the "Attempt to catch up" article is a bit confusing. It looks like the IC addresses are very close to the code, almost on the same page. Stack Overflow explains it:

GAS syntax for RIP-relative addressing looks like symbol + current_address (RIP), but it actually means symbol with respect to RIP.

There's an inconsistency with numeric literals:

[rip + 10], or AT&T 10(%rip), means 10 bytes past the end of this instruction.

[rip + a], or AT&T a(%rip), means to calculate a rel32 displacement to reach a, not RIP + symbol value. (The GAS manual documents this special interpretation.)
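To make the baseline-versus-O1 difference concrete, here is a rough C rendering of the two IC flavors (hypothetical; the paper patches machine code directly). The only change is whether the field offset comes from a patchable data slot or is baked in as an immediate:

    #include <stddef.h>

    /* Baseline (listing 5): the offset lives in a data slot near the
     * code, and the IC is repatched by writing to that slot. */
    static size_t patchable_offset = 3;

    double load_baseline(const double *obj) {
        return obj[patchable_offset];   /* mov 0x101c(%rip),%rax; ... */
    }

    /* "O1" (listing 6): the offset is baked into the instruction
     * stream, and repatching means rewriting the immediate. */
    double load_o1(const double *obj) {
        return obj[3];                  /* mov $0x3,%rax; ... */
    }

Since the data slot is cache-hot and the load gets hoisted into a renamed register early, the two versions perform almost identically, which matches their results.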
ajross, 3 months ago
This seems poorly grounded. In fact, almost three decades after the release of the Java HotSpot runtime, we're still waiting for even one system to produce the promised advantages. I guess the consensus is that V8 has come closest?

But the reality is that hand-optimized AOT builds remain the gold standard for performance work.
devit, 3 months ago
The paper seems to start from the bizarre assumption that AOT compilers need to "catch up" with JIT compilers, and in particular that they would benefit from inline caches for member lookup.

But the fact is that AOT compilers are usually for well-designed languages that don't need those inline caches, because the designers properly specified a type system that guarantees a field is always stored at the same offset.

They might benefit from a similar mechanism to predict branches and indirect branches (i.e. virtual/dynamic dispatch), but they already have compile-time profile-guided optimization and CPU branch predictors at runtime.

Furthermore, for branches that always go in one direction except for seldom changes, there are also frameworks like the Linux kernel's "alternatives" and "static key" mechanisms.

So the opportunity for making things better with self-modifying code is limited to code where all those mechanisms don't work well and where the overhead of the runtime profiling is worth it.

Which is probably very rare, and not worth bringing in a JIT compiler for.
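For readers unfamiliar with static keys, here is a userspace sketch of the idea. Note this is only an emulation: the real kernel mechanism (DEFINE_STATIC_KEY_FALSE plus static_branch_unlikely()) patches the branch instruction itself, so the disabled path costs roughly a NOP rather than the load-and-test shown here.

    #include <stdbool.h>

    /* Stand-in for the kernel's jump label: a rarely-toggled flag
     * guarding a seldom-taken path. */
    static bool feature_enabled = false;

    static long fast_path(long x)  { return x + 1; }
    static long debug_path(long x) { return x * 2; }

    long process(long x) {
        /* Kernel analogue: if (static_branch_unlikely(&key)) ...
         * __builtin_expect is a GCC/Clang hint that the branch is
         * almost never taken. */
        if (__builtin_expect(feature_enabled, 0))
            return debug_path(x);
        return fast_path(x);
    }

    /* Toggling is rare and expensive (kernel: static_branch_enable());
     * the trade-off is near-zero cost on the hot path. */
    void enable_feature(void) { feature_enabled = true; }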