JavaScript-Is-Weird as a compressor

111 points · by mgarciaisaia · over 1 year ago · 12 comments

koito17 · over 1 year ago

Just for fun, I decided to pass the output.js file through Google Closure Compiler's advanced optimizations. It does a surprisingly good job at reconstructing part of the strings.

    % npx google-closure-compiler -O ADVANCED --js output.js
    (()=>{})["co"+(1/0+[])[4]+"structor"]("co"+(1/0+[])[4]+...

Not pasting the full thing. But it reduces the output.js file from ~118 KiB to ~9.92 KiB, which is pretty good!

There is technically not much stopping the compiler from inferring that 1/0 === Infinity, recognizing that (1/0+[])[4] is free of side effects, and eventually concluding it's safe to substitute the whole expression with "n". Google Closure already has optimizations for string concatenation, so if it could perform an optimization pass with Infinity, it would also be able to emit the string "constructor" instead of "co"+(1/0+[])[4]+"structor".
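For anyone puzzling over the reconstructed snippet above, the coercions involved are easy to verify in Node. A quick sketch, relying only on standard JS semantics:

```javascript
// 1/0 evaluates to Infinity; adding an empty array coerces both operands to
// strings, yielding "Infinity"; index 4 of that string is the letter "n".
console.log((1 / 0 + [])[4]); // "n"

// So the compiler's output "co" + (1/0+[])[4] + "structor" is one constant
// fold away from the plain string "constructor".
console.log("co" + (1 / 0 + [])[4] + "structor"); // "constructor"
```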
crazygringo · over 1 year ago

I'm actually surprised at how insanely terrible the "compression" is.

Looking at the table at the end, I'm not surprised at all that the "weird" obfuscated code is ~2000x the size of the original source.

But I *am* surprised that the gzipped weird code is still ~25x the size of the original source, as opposed to ~0.25x for gzipping the original source.

After all, the amount of information in the weird code should still be approximately the same as in the original source code, right? Or maybe double, or something like that. I'm very surprised it's *twenty-five times as much*.

The only explanation I can guess at is that the "weird" process produces structures that are represented in an extremely *hierarchical* way, while gzip is built for *stream* compression and is unable to find/represent/compress hierarchical structures.

And if that's the case, it makes me wonder whether there *are* any compression algorithms that handle this better: ones based less on "dictionary words/sequences" and more on finding "nestable/repeatable syntax patterns"?
wizzard0 · over 1 year ago

1) The title is clickbait, but

2) Thanks for leading with an example of a negative result! That's what every researcher faces every day, unlike what gets published, after all.
urbandw311er · over 1 year ago
My view of the author changed somewhat when I reached the section where he had to ask ChatGPT how to read command line parameters.
SquareWheel · over 1 year ago

It's not a huge surprise that gzip, as a general compression algorithm, didn't compress this down any further. I do wonder about a format specifically trained on these characters, though, and on the patterns that tend to emerge from the weird compiler. Maybe the chunks at a certain scale would be predictable and thus compressible.

Of course, at that point you're probably more interested in a common binary format, and should start thinking about wasm instead.
klabb3 · over 1 year ago

Maybe off-topic, but does anyone know if there is runtime analysis of JS and/or wasm?

Obfuscation is a serious threat to the open web, and things like fingerprinting can be incredibly invasive.

Web browsers typically only support static "prettifying" (i.e. auto-indent). I've seen websites probe for Chrome extensions, canvas, and all kinds of APIs. Deobfuscators are often not enough to restore legibility. (I assume disassemblers are similar, but I've never tried.)

I would love to have traces/intercepts/breakpoints on any external APIs called, in order to restore a sense of control over what code websites run. Ideally, integrated in the browser. With WASM gaining popularity, this will become (much) more important, imo.
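Something in that direction can be prototyped in userland today, though a real tool would need far broader API coverage and resistance to detection. A minimal sketch; `traceMethod` is a made-up helper, demonstrated on a plain object so it runs anywhere, but in a browser you would point it at things like `document.createElement`:

```javascript
// Wrap a method so every call is logged before the original runs.
function traceMethod(obj, name) {
  const original = obj[name];
  obj[name] = function (...args) {
    console.log(`[trace] ${name}(${args.map(String).join(", ")})`);
    return original.apply(this, args);
  };
}

// Stand-in for a browser API surface.
const api = { greet: (who) => "hi " + who };
traceMethod(api, "greet");
api.greet("fingerprinter"); // logs: [trace] greet(fingerprinter)
```

The obvious limitation is that page scripts can detect or undo this kind of wrapping, which is why the comment's wish for browser-integrated tracing makes sense.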
mgarciaisaia · over 1 year ago

Author here! I've just enabled Issues & PRs in the repo if you want to chime in there.
dclowd9901 · over 1 year ago

It might be that gzip isn't actually a good format with which to try to compress this data. I would think a compression algorithm that expects a rather large 8-bit character space wouldn't be very suitable for a 4-bit one.
klyrs · over 1 year ago

This is way more fun than the "base64 -> gzip" algorithm that so many of us have tried upon first learning about compression...
molf · over 1 year ago

Using `xz -9 -e` results in pretty reasonable compression, in comparison:

     19648  dommy-2.0.js
     28092  dommy-2.0.weird.js.xz
    115023  lodash-4.17.15.js
    138808  lodash-4.17.15.weird.js.xz
      7114  modernizr-custom.js
     11148  modernizr-custom.weird.js.xz
ShamelessC · over 1 year ago

> So, yeah - this isn't a good idea. If the Weird transpiler only changes the encoding of each character with a really weird equivalent, it makes a lot of sense that it doesn't compress better than the source one - the ideal scenario would be to compress the same.

If your results disagree with your premise in unsurprising and obvious ways, please start your article with that so I can stop reading it.
garba_dlm · over 1 year ago

and Golang is a clever way to build up an AI