To be clear, this is an <i>extremely</i> tiny subset of JS. It looks like they only implemented the features needed to run a very specific function. For example, the only symbol allowed after "new" is "Date", everything else throws an exception.<p>It's still fun that it's there, but it's not as big a deal as it sounds from the tweet.
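A minimal sketch of what "only Date is allowed after new" could look like in a regex-based evaluator. This is an illustration, not the youtube-dl source: `NEW_EXPR`, `eval_new`, and the millisecond-only argument handling are all assumptions.

```python
import re
from datetime import datetime, timezone

# Hypothetical pattern: `new <Name>(<args>)` with no nested parentheses.
NEW_EXPR = re.compile(r"new\s+(?P<name>[A-Za-z_$][\w$]*)\s*\((?P<args>[^()]*)\)")


def eval_new(expr):
    """Evaluate a `new ...` expression, supporting only `Date`."""
    match = NEW_EXPR.fullmatch(expr.strip())
    if match is None:
        raise ValueError(f"not a 'new' expression: {expr!r}")
    if match.group("name") != "Date":
        # Everything other than Date is rejected outright.
        raise NotImplementedError(f"unsupported constructor: {match.group('name')}")
    args = match.group("args").strip()
    if not args:
        return datetime.now(timezone.utc)
    # Assumption: a single integer argument, milliseconds since the epoch.
    return datetime.fromtimestamp(int(args) / 1000, tz=timezone.utc)


print(eval_new("new Date(0)"))  # 1970-01-01 00:00:00+00:00
```

Calling `eval_new("new Map()")` raises `NotImplementedError`, which is the kind of hard failure described above.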
Anyone who has ever pulled a website from a script knows the pain that is JavaScript. Normally you want to just get some text and work out the API actions, but a lot of sites use horribly obfuscated JavaScript -- either because that's what modern web development is (lolz) -- or because it's part of their 'security.' That means if you want to write browser-based bots properly -- you ought to use a browser. There are special browsers that run 'headlessly' or are designed mostly for bot use, and automation tools like <a href="https://www.selenium.dev/" rel="nofollow">https://www.selenium.dev/</a> that plug into a few different 'browser engines.'<p>But now you have another problem. Your simple script goes from being a small, simple, self-contained, elegant gem to requiring a full browser, specialized drivers, and/or daemons running just to work. If you're using something like Python, you frankly don't have very good packaging, so it's hard to string all that together into a solution and have it magically work for everyone. What youtube-dl have done is good engineering. Even though it's not a full JS interpreter, they've kept their software lean, self-contained, and easier to use.
Can we stop the trend of linking to tweets that just contain another link to the content? What's the point? Wouldn't this be 10x better if it was a link directly to the GitHub repo?
Nowadays "javascript" refers to the scriptable, grotesquely and absurdly complex and massive web engines, aka Google-financed Blink and Gecko, then Apple-financed WebKit, along with their SDKs.<p>The currently obfuscated javascript media players will try to break yt-dlp by leveraging the complexity and size of those scripted web engines. They will put them out of reach of small teams or individuals and, even "better", force people to use Apple's or Google's web engine, killing any attempt to provide a real alternative.<p>A standalone javascript interpreter is actually some work, but seems to stay in the "reasonable" realm: look at QuickJS from M. Bellard and friends (the guy who created qemu, ffmpeg, tinycc, etc): plain and simple C (no need for a C++ compiler), doing the job more than well enough.<p>That's why noscript/basic (x)html is so important.
This isn't really JS, it's a purpose-built evaluator that's only for evaluating a particular script on YouTube, assuming a huge list of things are true about how YouTube JS is written.<p>Ex. It's got a hard-coded list of methods for String, and it doesn't respect prototypes. It only supports creating Date instances, and won't work if you override the global Date. It parses with regexes and implements all operators with Python's operator module (which is the wrong type semantics) etc. Nearly none of the semantics of JS are implemented.<p>It's sort of the sandwich categorization problem:<p>If I write a C# "interpreter" in Perl that's only 200 lines and just handles string.Join, string.Concat and Console.WriteLine, and it doesn't actually try to implement C# syntax or semantics at all and just uses Perl semantics for those operations, is it actually C#? :P<p>I say "not a sandwich".
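To make the "wrong type semantics" point concrete, here is a minimal sketch of what happens when JS-looking operators are dispatched through Python's operator module. It is not youtube-dl's actual code: `OPERATORS`, `eval_binary`, and `try_eval` are hypothetical names used only for illustration.

```python
import operator

# Hypothetical operator table: JS operator tokens mapped to Python functions.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "%": operator.mod,
}


def eval_binary(op, left, right):
    """Apply a JS-looking binary operator using Python semantics."""
    return OPERATORS[op](left, right)


def try_eval(op, left, right):
    """Like eval_binary, but report a TypeError as a string instead of raising."""
    try:
        return eval_binary(op, left, right)
    except TypeError as exc:
        return f"TypeError: {exc}"


print(eval_binary("+", 1, 2))     # 3 in both Python and JS
print(eval_binary("%", -7, 3))    # Python gives 2; JS gives -1
print(eval_binary("*", "ab", 2))  # Python gives 'abab'; JS gives NaN
print(try_eval("+", "1", 2))      # Python raises TypeError; JS gives "12"
```

As long as the target script sticks to, say, non-negative integers, the two languages agree, which is presumably why this shortcut works in practice.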
The same in yt-dlp <a href="https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/jsinterp.py" rel="nofollow">https://github.com/yt-dlp/yt-dlp/blob/master/yt_dlp/jsinterp...</a><p>Interesting to see the diffcheck between the two <a href="https://www.diffchecker.com/8EJGN27K" rel="nofollow">https://www.diffchecker.com/8EJGN27K</a>
To understand why, I have a far simpler tool that focuses on a subset of sites (adult content video aggregators)<p><a href="https://github.com/kristopolous/tube-get" rel="nofollow">https://github.com/kristopolous/tube-get</a><p>It too deals with this problem, but does so in a way that'd be easy to maliciously sabotage<p>Look right about here <a href="https://github.com/kristopolous/tube-get/blob/master/tube-get.py#L111" rel="nofollow">https://github.com/kristopolous/tube-get/blob/master/tube-ge...</a><p>As to why this program exists: it was originally written between about 2010 and 2015 or so, so it technically predates the yt-* ecosystem.<p>The tool still works fine, and because it takes a different approach it's not a strict subset of yt-dlp or youtube-dl. Although its overall site coverage is smaller, I've used it as a "second try" system when yt-* fails, and it succeeds maybe about half the time
They just don't want to use any external dependencies... There is also an AES implementation: <a href="https://github.com/ytdl-org/youtube-dl/blob/master/youtube_dl/aes.py" rel="nofollow">https://github.com/ytdl-org/youtube-dl/blob/master/youtube_d...</a>
How should a programming noob interpret this? Be impressed at what was achieved here? Be concerned about security implications using the tool? Something else entirely?
Another really cool JS dialect I recently learned about is njs from the nginx team: <a href="https://github.com/nginx/njs" rel="nofollow">https://github.com/nginx/njs</a><p>This video goes into some of the design and tradeoffs: <a href="https://www.youtube.com/watch?v=Jc_L6UffFOs" rel="nofollow">https://www.youtube.com/watch?v=Jc_L6UffFOs</a><p>TL;DW: they optimized for fast creation/destruction of low-footprint VMs with no JIT or garbage collection.
the tests for it: <a href="https://github.com/ytdl-org/youtube-dl/blob/master/test/test_jsinterp.py" rel="nofollow">https://github.com/ytdl-org/youtube-dl/blob/master/test/test...</a>
This is super cool.<p>Some of the stuff is <i>kind of</i> questionable to me in the sense that I could believe you could probably make some kind of sufficiently wonky JS that this would do the "wrong" thing.<p>But it's super cool that they are able to do this as I think it shows that claims of JS complexity based on the size of JS engines is overlooking just how much of that size/complexity comes from the "make it fast" drive vs. what the language requires. Here you have a <1000LoC implementation of the core of the JS language, removed from things like regex engines, GCs, etc.<p>Mad props to them for even attempting it as well - it simply would not have ever occurred to me to say "let's just write a small JS engine" and I would have spent stupid amounts of time attempting to use JSC* from python instead.<p>[* JSC appears to be the only JS engine with a pure C API, and the API and ABI are stable so on iOS/macOS at least you can just use the system one which reduces binary size+build annoyance. The downside is that C is terrible, and C++ (differently terrible? :D) APIs make for much more pleasant interfaces to the VM - constructors+destructors mean that you get automatic lifetime management so handles to objects aren't miserable, you can have templates that allow your API to provide handles that have real type information.
JSC only has JSValueRef and JSObjectRef, and as a JSObjectRef is a JSValueRef it's actually just a typedef to const JSValueRef :D
OTOH I do think JSC's partially conservative GC for stack/temporary variables is superior to Handles for the most part, but it's also absolutely necessary to have an API that isn't absolutely wretched.
The real problem with JSC's API is that it has not got any love for many many many .... many years so it doesn't have any way to handle or interact with many modern features without some kludgy wrappers where you push your API objects into JS and have the JS code wrap them up. The API objects are also super slow, as they basically get treated as "oh ffs" objects that obey no rules. I really do wish it would get updated to something more pleasant and really usable.]
I do wonder why YouTube does not try harder to make it difficult to do this computation meant to prove you are a legit YouTube web client. Providing an easy-to-find, simple JS function interpretable with 900 lines of Python is like they don't try at all. They might as well do nothing.<p>Or is their goal just to make youtube-dl not 100% reliable? Or to be able to say "look, you are running our code in a way we did not intend, you can't do this because you are breaking the EULA"?
I was expecting this to be about Duktape <<a href="https://github.com/svaarala/duktape" rel="nofollow">https://github.com/svaarala/duktape</a>>, but heh, for sure no. I'd bet $1 there's no way youtube-dl would switch, but I wonder if yt-dlp would?
They must have been inspired by this PyCon presentation, where David Beazley live codes a fully working webassembly interpreter, in under one hour. <a href="https://youtu.be/VUT386_GKI8" rel="nofollow">https://youtu.be/VUT386_GKI8</a>
This seems to be a pretty small subset of JavaScript, but I personally love small projects like this for educational purposes. Removing the noise and keeping things minimal helps my brain reason about things.<p>Earlier this year I enrolled in an online class called "Building a Programming Language" taught by Roberto Ierusalimschy (creator of Lua) and Gustavo Pezzi (creator of pikuma.com). We created a toy language interpreter/VM and the final code was around 1,800 lines of Lua. Keeping things as simple (and sometimes naive) as possible was definitely the right choice for me to really wrap my head around the basic theory and connect the dots.<p>Thanks for the link.
Greenspun's Tenth Rule:<p>> Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp. [1]<p>And here we have a complicated Python program with a partial JS implementation in it.<p>[1] <a href="https://en.wikipedia.org/wiki/Greenspun's_tenth_rule" rel="nofollow">https://en.wikipedia.org/wiki/Greenspun's_tenth_rule</a>