For anyone looking to do this kind of thing, the macropy library makes it a fair bit easier: <a href="https://macropy3.readthedocs.io/en/latest/" rel="nofollow">https://macropy3.readthedocs.io/en/latest/</a><p>You really do need to modify the AST rather than apply a regex to the source text. That is a challenge if you want to change Python's syntax, because you first need a parser that accepts the new syntax and gives you an AST you can modify. Personally, I recommend starting with the Lark library and modifying the Python grammar it includes.<p><a href="https://lark-parser.readthedocs.io/en/latest/examples/advanced/python_parser.html" rel="nofollow">https://lark-parser.readthedocs.io/en/latest/examples/advanc...</a><p>I use similar techniques to transpile OpenSCAD code into a Python AST in my "pySdfScad" project, while still theoretically getting the benefits of fancy tracebacks, debugging, and the like. Probably should have gone with a simple parser instead, but what can you do.<p>I think they should have stopped at the "cursed way" and not gone on to the "truly cursed way". If they really wanted the syntax changes, then having your own Python parser, like the Lark implementation I mention above, is a must.
I don't understand why transforming the source code directly is supposed to be better than transforming at the AST level.<p>Is it because you can use the coding declaration, which makes it somewhat automatic?<p>But you can do something similar at the AST level by installing a meta path finder, i.e. an import hook that applies the AST transformation automatically (<a href="https://docs.python.org/3/library/sys.html#sys.meta_path" rel="nofollow">https://docs.python.org/3/library/sys.html#sys.meta_path</a>).<p>Then there is also the newer frame evaluation API (<a href="https://peps.python.org/pep-0523/" rel="nofollow">https://peps.python.org/pep-0523/</a>), which allows you to dynamically rewrite bytecode. PyTorch 2.0 uses it for torch.compile.
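A rough sketch of the meta path finder idea from the comment above, in case it helps. Everything named here is made up for illustration: the `AddToMul` transform (it turns `a + b` into `a * b`), the `cursed_demo_` module prefix, and the throwaway module written by `demo()`. The loader subclasses `importlib.abc.SourceLoader` without implementing `path_stats`, so, as far as I understand, no bytecode cache is read or written; real projects would likely want to handle caching and packages properly.

```python
import ast
import importlib
import importlib.abc
import importlib.machinery
import os
import sys
import tempfile


class AddToMul(ast.NodeTransformer):
    """Toy transform (hypothetical): rewrite every `a + b` as `a * b`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node


class TransformingLoader(importlib.abc.SourceLoader):
    """Loads one .py file, rewriting its AST before compiling it."""

    def __init__(self, path):
        self._path = path

    def get_filename(self, fullname):
        return self._path

    def get_data(self, path):
        with open(path, "rb") as handle:
            return handle.read()

    def source_to_code(self, data, path="<string>"):
        tree = AddToMul().visit(ast.parse(data, filename=path))
        tree = ast.fix_missing_locations(tree)
        return compile(tree, path, "exec", dont_inherit=True)


class TransformingFinder(importlib.abc.MetaPathFinder):
    """Claims only plain modules whose name starts with `prefix`."""

    def __init__(self, prefix):
        self._prefix = prefix

    def find_spec(self, fullname, path=None, target=None):
        if not fullname.startswith(self._prefix):
            return None  # let the normal finders handle it
        spec = importlib.machinery.PathFinder.find_spec(fullname, path)
        if spec is None or not (spec.origin or "").endswith(".py"):
            return None
        spec.loader = TransformingLoader(spec.origin)
        return spec


def demo():
    """Write a throwaway module, import it through the hook, return VALUE."""
    folder = tempfile.mkdtemp()
    with open(os.path.join(folder, "cursed_demo_mod.py"), "w") as handle:
        handle.write("VALUE = 3 + 4\n")
    sys.path.insert(0, folder)
    finder = TransformingFinder("cursed_demo_")
    sys.meta_path.insert(0, finder)
    try:
        importlib.invalidate_caches()
        return importlib.import_module("cursed_demo_mod").VALUE
    finally:
        sys.meta_path.remove(finder)
        sys.path.remove(folder)
```

Calling `demo()` imports a module whose source says `3 + 4`, but the hook compiles it as `3 * 4`. The source file on disk is untouched, which is the main difference from the codec approach.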
I did something similar to option 3 to make the builtin numeric types "callable", since __call__ can't be overridden on builtin types. For example, in regular arithmetic notation, something like 6(7+8) could be read as 6*(7+8), but trying to do this in Python gives you `SyntaxWarning: 'int' object is not callable;` at compile time and then a TypeError at runtime, since you're essentially trying to do a function call on the literal 6. The workaround was to use a custom codec to wrap all integer literals to give them the expected call behavior.<p>Repo if anyone is interested: <a href="https://github.com/ckw017/blursed">https://github.com/ckw017/blursed</a><p>This was inspired by a way less silly use case, future f-strings, which added f-string support to older versions of Python in a similar way using codecs: <a href="https://github.com/asottile-archive/future-fstrings">https://github.com/asottile-archive/future-fstrings</a>
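For the curious, here is a minimal sketch of what "wrap all integer literals" could look like. It is not taken from the blursed repo; `CallableInt` and `wrap_int_literals` are hypothetical names, and only plain decimal literals are handled. It rewrites the source with the standard `tokenize` module, which is the kind of step a codec's decode function could perform.

```python
import io
import tokenize


class CallableInt(int):
    """An int that multiplies when called, e.g. CallableInt(6)(15)."""

    def __call__(self, other):
        return self * other


def wrap_int_literals(source):
    """Rewrite each plain decimal integer literal N as CallableInt(N)."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NUMBER and tok.string.isdigit():
            out += [
                (tokenize.NAME, "CallableInt"),
                (tokenize.OP, "("),
                (tokenize.NUMBER, tok.string),
                (tokenize.OP, ")"),
            ]
        else:
            out.append((tok.type, tok.string))
    # untokenize() on (type, string) pairs yields valid but oddly spaced code
    return tokenize.untokenize(out).strip()


# 6(7+8) becomes CallableInt(6)(CallableInt(7) + CallableInt(8))
result = eval(wrap_int_literals("6(7+8)"), {"CallableInt": CallableInt})
```

Note that arithmetic on `CallableInt` values returns plain `int`, so something like `(1+2)(3)` would still fail in this sketch.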
I've attempted something crazier -- making Python support multi-line lambda expressions [1].<p>It's done with AST manipulation as well, but on a larger scale and with more complete functionality.<p>[1] <a href="https://github.com/hsfzxjy/lambdex">https://github.com/hsfzxjy/lambdex</a>
So it seems you can register a source transformer with Python that triggers when a `# coding: ...` header (normally used to signal the character encoding) is present.<p>This seems like an interesting avenue for static analysis tools, code generators, etc. to explore. Is it a viable approach?<p>I'd be interested in some analysis of this as a mechanism: performance, maintainability, and so on.<p>The "don't do this" argument writes itself; nevertheless, it's worth exploring.
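A small sketch of the registration side of this, using only the standard `codecs` module. The codec name `lambda_glyph` and the toy rewrite (`λ` to `lambda`, done with a naive string replace) are invented for illustration. The example only exercises the codec through `bytes.decode`; hooking it up to a `# coding:` header additionally requires the codec to be registered before the file is compiled, which is not shown here.

```python
import codecs

CODEC_NAME = "lambda_glyph"  # hypothetical codec name


def decode(data, errors="strict"):
    """Decode UTF-8 bytes, then rewrite the toy `λ` keyword to `lambda`."""
    text = bytes(data).decode("utf-8", errors)
    return text.replace("λ", "lambda"), len(data)


def search(name):
    """Codec search function: answer only for our own codec name."""
    if name != CODEC_NAME:
        return None
    return codecs.CodecInfo(
        name=CODEC_NAME,
        encode=codecs.lookup("utf-8").encode,
        decode=decode,
    )


codecs.register(search)

# Source that is not valid Python until the codec has rewritten it.
source_bytes = "add_one = λ x: x + 1\n".encode("utf-8")
codec_namespace = {}
exec(source_bytes.decode(CODEC_NAME), codec_namespace)
```

After this runs, `codec_namespace["add_one"]` is an ordinary lambda. A string replace like this would also rewrite `λ` inside string literals, which is one reason a token-level or AST-level transform is usually preferable.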
Didn't know about codecs ...<p>The idea of using (abusing in fact) codecs to pre-process python code before it gets to the actual python interpreter is just fantastic!<p>I'm starting to think of all the terrible things I'm going to be able to do with this to work around things that have annoyed me for years in python.<p>[EDIT]: I'm thinking one can probably plug the C preprocessor in python now ... oh the sheer joy :D
A long time ago I stumbled upon (and contributed to) <a href="https://github.com/delfick/nose-of-yeti">https://github.com/delfick/nose-of-yeti</a>. The most delightful bit of cursed Python I’ve seen!
This is evil, and I love it.<p>I wonder if the codec could use python's lexer (assuming it's exposed) to parse the for loops and nothing else. Then replace the loops with a placeholder, and then replace the placeholder in the AST after a parse. Might be cleaner than source->source transform by the codec, maybe not.
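On whether the lexer is exposed: the standard library's `tokenize` module does give you Python's token stream. Below is a sketch of using it to locate candidate loops; `find_for_paren` and the sample source are hypothetical. It only flags a `for` keyword directly followed by `(`, so valid Python such as `for (a, b) in pairs:` would also match, and a real tool would need a further check (for instance, looking for `;` inside the parentheses).

```python
import io
import tokenize


def find_for_paren(source):
    """Return the (row, col) of each `for` keyword directly followed by `(`."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    hits = []
    for current, following in zip(tokens, tokens[1:]):
        is_for = current.type == tokenize.NAME and current.string == "for"
        if is_for and following.type == tokenize.OP and following.string == "(":
            hits.append(current.start)
    return hits


# The text "for (" inside the string literal is a STRING token, so it is skipped.
sample = 'for (i = 0; i < 3; i += 1):\n    print("for (")\n'
hits = find_for_paren(sample)
```

Working on tokens rather than raw text is what keeps the string literal on the second line from being mistaken for a loop header.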
Where did the "cursed" adjective start being used like this? It's not used in my country, and I've only seen it being used in the last couple of years as part of subreddit titles.
Codecs are free to transform all code before it's executed? Sounds like the perfect place for hackers to hide RCEs that are very difficult to spot. Just hide an innocent-looking comment at the top of a file.
Two things:<p>1. I wish more HN posts ended in "for fun" -- so much of what makes being a programmer fun and enjoyable is plumbing the depths of what is possible, not because it's "best practice" or whatever. I think these types of forays are where true mastery comes from.<p>2. The fewer reserved keywords a language has, the better. It makes things like this easier. Years ago I figured out how to implement Goto in Smalltalk "for fun". It was easier because Smalltalk had fewer reserved keywords.
Some people have a lot of time on their hands.<p><pre><code> (defmacro cfor [initialize condition advance #* body]
   `(do ~initialize
        (while ~condition (do ~@body ~advance))))
 (cfor (setv i 0) (< i 10) (+= i 1)
   (print i))
</code></pre>
This runs on the Python VM with Hy right now. Took about 2 minutes to have it up and running after noticing this post on HN. Couldn't pass it up.
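For readers who don't know Hy, here is roughly what the macro above expands to, sketched with Python's own `ast` module. `desugar_cfor` is a hypothetical helper that takes the four parts as source strings. As with the macro, the advance step is simply appended to the loop body, so a `continue` in the body would skip it, unlike a C for loop.

```python
import ast


def desugar_cfor(init, cond, advance, body):
    """Build the AST for: init; while cond: body; advance."""
    loop = ast.While(
        test=ast.parse(cond, mode="eval").body,
        body=ast.parse(body).body + ast.parse(advance).body,
        orelse=[],
    )
    module = ast.Module(body=ast.parse(init).body + [loop], type_ignores=[])
    return ast.fix_missing_locations(module)


# Equivalent of: for (i = 0; i < 3; i += 1) seen.append(i)
cfor_namespace = {"seen": []}
tree = desugar_cfor("i = 0", "i < 3", "i += 1", "seen.append(i)")
exec(compile(tree, "<cfor>", "exec"), cfor_namespace)
```

After running, `cfor_namespace["seen"]` holds the values the loop visited and `i` is left at its final value, just as with a hand-written while loop.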
> Or alternatively: How I made the most cursed Python package of all time.<p>Beg your pardon, <a href="http://entrian.com/goto/" rel="nofollow">http://entrian.com/goto/</a> exists.