It's probably better than regular expressions. However, is it better enough to be worth learning yet another syntax?<p>Well, maybe. What I REALLY like about this one is that it fully reverses the quoting/escaping assumptions. The default assumption of old regexes, that symbols should match themselves, is I think regexes' million dollar mistake. The literal matches are the least interesting part of regexes. If you reach for regexes, it's because you want something more complex than literal matches, and the syntax should be about taming that complexity. Even the Unix world sort of conceded that, in reversing the quoting assumption for ?, +, (), {} and | in egrep. The mistake was in stopping there.<p>I will take a good look at this. I hope they provide good justifications for their choices.
For spec review from business partners I have found "verbal expressions" the most useful flavor, specifically because reviewing them requires no knowledge of regular expressions and the library has ports for most programming languages.<p>Example:
<a href="https://github.com/VerbalExpressions/JSVerbalExpressions#testing-if-we-have-a-valid-url">https://github.com/VerbalExpressions/JSVerbalExpressions#tes...</a>
Why don't languages have "grok" patterns in their standard libraries?<p>They seem to exist only in log parsing ecosystems, but they really help get rid of little bugs and wrong parsing from hand-written regex patterns.<p>Instead of doing "^\d+(\.\d+){3}$" for IP checking, which is clearly wrong (it accepts 999.999.999.999), you'd do "%{IPV4:ip}", which is so much better.<p>List of known patterns : <a href="https://github.com/hpcugent/logstash-patterns/blob/master/files/grok-patterns">https://github.com/hpcugent/logstash-patterns/blob/master/fi...</a><p>Even for PHP a third party library only has 15 stars.
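To make the grok idea concrete, here is a minimal Python sketch of how a `%{NAME:field}` placeholder could be expanded into a named capture group. The `PATTERNS` table, the `OCTET` helper and the `expand` function are hypothetical names made up for illustration; real grok pattern sets are much larger and their definitions differ.

```python
import re

# Hypothetical mini pattern library, not the real grok definitions.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
PATTERNS = {
    "IPV4": rf"{OCTET}(?:\.{OCTET}){{3}}",
    "INT": r"[+-]?[0-9]+",
}

def expand(template: str) -> str:
    """Replace each %{NAME:field} with a named group built from PATTERNS."""
    def substitute(match: re.Match) -> str:
        name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{PATTERNS[name]})"
    return re.sub(r"%\{(\w+):(\w+)\}", substitute, template)

# The hand-written pattern from the comment versus the expanded placeholder.
naive = re.compile(r"^\d+(\.\d+){3}$")
strict = re.compile("^" + expand("%{IPV4:ip}") + "$")

print(bool(naive.match("999.1.1.1")))    # the naive pattern accepts this input
print(bool(strict.match("999.1.1.1")))   # the expanded pattern rejects it
print(strict.match("192.168.0.1").group("ip"))
```

The point is only that the tricky octet logic lives in one named, reusable place instead of being retyped (and mistyped) at every call site.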
While I'm not sure about Pomsky specifically, I do think it's nice that people explore the language space for regex more. General programming languages have a huge variety of styles and syntaxes available, from APL to Haskell and Lisp, whereas regexes are pretty much the same everywhere. It feels like we've stuck with the first thing that Kleene, Thompson et al thought of and for 50 years didn't even really try anything else.
To me, the lead example on the homepage ("Basic") is a major red flag. This is not clearer than a traditional regular expression:<p>'Hello' ' '<p>Did you count the single quotes correctly? Even with syntax highlighting, people WILL mess this up.
I find something like this a lot more readable:<p><a href="https://github.com/jkrumbiegel/ReadableRegex.jl">https://github.com/jkrumbiegel/ReadableRegex.jl</a><p>It is in Julia, but if you have it installed locally it’s just a few taps away. You can even generate the regex, and use that in Python and just add the ReadableRegex in a comment nearby.
Named regexes (variables in Pomsky) remind me of Raku [1], which implements an improved flavor of PCRE regexes plus grammars in general as part of the language.<p>[1]: <a href="https://docs.raku.org/language/grammars" rel="nofollow">https://docs.raku.org/language/grammars</a>
The idea of a compile-to-regex language is a neat one, that immediately makes it a lot easier to use in existing projects.<p>If there's any interest in other takes on "better regexps", the Rosie Pattern Language has some neat ideas:<p>[0] <a href="https://rosie-lang.org/index.html" rel="nofollow">https://rosie-lang.org/index.html</a><p>[1] <a href="https://www.youtube.com/watch?v=MkTiYDrb0zg&list=PLcGKfGEEONaBUdko326yL6ags8C_SYgqH&index=12">https://www.youtube.com/watch?v=MkTiYDrb0zg&list=PLcGKfGEEON...</a>
I don't know why I'd never previously considered regular expressions as being a compile/transpile target. It's pretty obvious from PL theory and makes a ton of sense.<p>That said, after looking at this syntax, I'm not sure that this is much of an improvement. Maybe I've spent far too much time in Regex land [1], but I know I'd perform much slower in this. It's not particularly beautiful, either. The verbosity doesn't seem clearer.<p>Variables and comments are great, though. We need to add them in future regexes.<p>Overall, good idea. I'd like to see more takes on this.<p>[1] <a href="https://jimbly.github.io/regex-crossword/" rel="nofollow">https://jimbly.github.io/regex-crossword/</a>
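For what it's worth, something close to variables and comments is already reachable in some host languages. A small sketch using Python's standard `re` module, where plain strings act as the "variables" and `re.VERBOSE` allows inline comments; the names `year`, `month`, `day` and `iso_date` are just illustrative:

```python
import re

# "Variables": ordinary Python strings holding sub-patterns.
year = r"(?P<year>[0-9]{4})"
month = r"(?P<month>0[1-9]|1[0-2])"
day = r"(?P<day>0[1-9]|[12][0-9]|3[01])"

# "Comments": re.VERBOSE ignores unescaped whitespace and # comments
# outside character classes.
iso_date = re.compile(
    rf"""
    ^ {year}    # four-digit year
    - {month}   # 01 through 12
    - {day}     # 01 through 31, with no per-month validation
    $
    """,
    re.VERBOSE,
)

m = iso_date.match("2023-02-17")
print(m.groupdict())
```

This is string interpolation rather than a real variable mechanism, so nothing stops a sub-pattern from breaking the surrounding expression; it only shows that the readability win does not strictly require a new syntax.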
For Scheme I enjoy using Irregex: <a href="http://synthcode.com/scheme/irregex/" rel="nofollow">http://synthcode.com/scheme/irregex/</a>
The key benefit is to have a proper DSL instead of a DSL hidden away inside an untyped string.
This was previously called Rulex. Glad to see it is getting traction :)
I made a video on it a few months ago. <a href="https://www.youtube.com/watch?v=nPjCxwEdIIo">https://www.youtube.com/watch?v=nPjCxwEdIIo</a>
Olin Shivers defined the related "SRE" S-expression language for regular expressions:<p><a href="https://scsh.net/docu/html/man-Z-H-7.html" rel="nofollow">https://scsh.net/docu/html/man-Z-H-7.html</a><p><a href="https://www.ccs.neu.edu/home/shivers/papers/sre.txt" rel="nofollow">https://www.ccs.neu.edu/home/shivers/papers/sre.txt</a><p>It works nicely with Scheme, including for programmatic generation.<p>(I've done some regular expressions from heck, without the benefit of this, and needed extensive commenting just to keep a few/several levels of nested groupings straight. With S-expressions, that's trivial.)
I seem to need regular expressions about once every six months. Every time I do, I wonder how this horrible little language became ubiquitous. However, I’m not going to pull in a dependency just to avoid it (at my current regex cadence, in any case).
Some regex engines/dialects also allow pre-defined named subroutines (variables) with `(?(DEFINE) (?<name>pattern)...)` <a href="http://www.rexegg.com/regex-disambiguation.html#define" rel="nofollow">http://www.rexegg.com/regex-disambiguation.html#define</a>, but it's a very niche feature. e.g. <a href="https://gist.github.com/moreati/9d974e5395829d737dc342715f15fc56" rel="nofollow">https://gist.github.com/moreati/9d974e5395829d737dc342715f15...</a>
I like regex: the syntax is succinct, powerful, and easy to learn. I learned regex before I learned any proper programming language, because you could use it in a text editor. I remember feeling like a wizard at the time.<p>I do admit it can be a write-once, hopefully-never-need-to-read-again thing, but I still love it.
I had a similar kind of idea for a long time, which I put into action a few weeks ago via a standalone transpiler of Emacs' rx macro to common regexp syntaxes.[0] I ended up getting interrupted and didn't completely finish it, but it generally works, though it probably mishandles plenty of edge cases.<p>The basic idea of rx is to use S-expressions to describe regular expressions, and my elevator pitch would've been to embed rx invocations in shell scripts using $(syntax), the main use case being something like sed invocations.<p>I still think it's a neat idea, since complex regular expressions tend to be hard for humans to parse.<p>[0]: <a href="https://github.com/sulami/rx">https://github.com/sulami/rx</a>
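The S-expression approach is easy to sketch. Below is a toy Python translator from nested tuples to regex source text. It is an assumption-laden illustration, not the rx macro or the linked transpiler: the operator names are only loosely modeled on rx, and `compile_sexp` is a hypothetical helper.

```python
import re

def compile_sexp(node) -> str:
    """Translate a nested tuple form into regex source text."""
    if isinstance(node, str):
        return re.escape(node)  # bare strings always match literally
    op, *args = node
    parts = [compile_sexp(arg) for arg in args]
    if op == "seq":
        return "".join(parts)
    if op == "or":
        return "(?:" + "|".join(parts) + ")"
    if op == "one-or-more":
        return "(?:" + "".join(parts) + ")+"
    if op == "optional":
        return "(?:" + "".join(parts) + ")?"
    if op == "digit":
        return "[0-9]"
    raise ValueError(f"unknown operator: {op!r}")

# An optional "v", then digits, a literal dot, then more digits.
form = (
    "seq",
    ("optional", "v"),
    ("one-or-more", ("digit",)),
    ".",
    ("one-or-more", ("digit",)),
)
source = compile_sexp(form)
print(source)
print(bool(re.fullmatch(source, "v1.25")))
```

Because strings are escaped by default and operators are explicit, the quoting assumption is reversed, and nesting falls out of the data structure instead of being tracked by counting parentheses in a string.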
It would be nice if the Pomsky playground could show informative modal boxes when you hover over some of the Pomsky-style expressions, kind of like RegExr, which remains my favorite tool for quickly writing out a regular expression and seeing what its outcome will be. It is very nice to be able to quickly see what something does and how it affects your query, with documentation in an easily accessible location on the same page.<p>Not having to navigate to a different page would be a boon for the playground. Very interesting though. I use regular expressions when I need to extract information and find-and-replace functions/methods just aren't enough, mostly for my scrapers.
Number ranges look great, but it's much better to just extend + polyfill current regex syntax and keep compatibility.<p>[:0-255:] could be an option for example to write a number range.<p>Also regex / variable interpolation should be added to regexes in languages probably :)
I will never understand why a regex in Java still requires metacharacters to be escaped despite the recent addition of raw string literals. Talk about stuck in the Stone Age.
FWIW here is a list of other such projects: <a href="https://github.com/oilshell/oil/wiki/Alternative-Regex-Syntax">https://github.com/oilshell/oil/wiki/Alternative-Regex-Synta...</a><p>Feel free to add other projects, including the ones in this thread
That doesn't solve the most problematic part of regexing for me: having a different programming language embedded as a string in my other source code. The same happens with SQL in many codebases.<p>The embedded string is often precluded from static analysis. Field names or names of capture groups may drift apart from the outside code, and maybe nobody notices.<p>What we need are more embedded DSLs where we get static syntax and type checks and hopefully a good integration between the surrounding code and the embedded code.
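Short of a real embedded DSL, the drift between capture group names and surrounding code can at least be caught early. A rough Python sketch that compares a pattern's named groups against a dataclass's fields once at import time; `LogLine`, `LOG_RE`, `check_groups` and `parse` are hypothetical names, and this is a runtime check, not the static checking the comment asks for:

```python
import re
from dataclasses import dataclass, fields

@dataclass
class LogLine:
    level: str
    message: str

LOG_RE = re.compile(r"^\[(?P<level>[A-Z]+)\] (?P<message>.*)$")

def check_groups(pattern: re.Pattern, record_type: type) -> None:
    """Raise if the pattern's named groups and the dataclass fields differ."""
    group_names = set(pattern.groupindex)
    field_names = {f.name for f in fields(record_type)}
    if group_names != field_names:
        mismatch = sorted(group_names ^ field_names)
        raise TypeError(f"group/field mismatch: {mismatch}")

check_groups(LOG_RE, LogLine)  # runs once at import time, not per match

def parse(line: str) -> LogLine:
    """Build a LogLine from one line, failing loudly on unparseable input."""
    m = LOG_RE.match(line)
    if m is None:
        raise ValueError(f"unparseable line: {line!r}")
    return LogLine(**m.groupdict())

print(parse("[WARN] disk full"))
```

Renaming a field or a group without updating the other side now fails as soon as the module loads, rather than silently at some later match.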
Cognition works in two ways: 1) highly deliberate, slow, patient and 2) spontaneous, quick, reactive. I believe software languages do a great job of leveraging both of these capacities of the minds of programmers.<p>The first type of thinking is generally not composed in code. It happens staring out the window, or looking at data tables or lists and reading documentation, drawing out the architecture. The second type happens in code composition: programmers know what snippet will accomplish which outcome, and it just comes out when we know it's needed. Boom, we have the thing we imagined during the architecture phase. Maybe we look something up or slow down while writing code, but the composition is fairly reactive/spontaneous. It flows.<p>So coding goes in this rhythm... thinking openly and patiently and deliberately, followed by a burst of code composition, and then more open/deliberate thought while staring out a window. Again and again. Maybe we reverse some of the quick/spontaneous thinking to rearchitect according to our slow/deliberate thought.<p>Enter regular expressions. I have never, ever spontaneously composed a regex. I have to be very slow and deliberate. I have to recall the precise definition of every character in the regex as I read or write it. Sometimes I stare at a regex for 2 minutes, look up some operator, stare again for 2 minutes, refactor it. Finally I understand it and can now apply the regex spontaneously to whatever environment I'm working in. It always feels like first principles architecture, never like flow. This is odd, as regex is like a giant shorthand engine.<p>Maybe I have not yet formed the neurological connections required for it to flow – but more likely this is a function of the language. There must be a syntax that better supports flow-state composition. (I don't think Pomsky accomplishes it.)
The only solution I've found for regex which is 100% compatible with the full capabilities, and is also portable to any language so you can learn just one syntax, is regex.
Related projects:<p>1. - Xerox xfst<p>2. - Xerox lexc<p>3. - Xerox twolc<p>4. - Mans Hulden's FOMA<p>Repo: <a href="https://fomafst.github.io/" rel="nofollow">https://fomafst.github.io/</a><p>Paper: <a href="https://dingo.sbs.arizona.edu/~mhulden/hulden_foma_2009.pdf" rel="nofollow">https://dingo.sbs.arizona.edu/~mhulden/hulden_foma_2009.pdf</a><p>Demo: <a href="https://dsacl3-2018.github.io/xfst-demo/" rel="nofollow">https://dsacl3-2018.github.io/xfst-demo/</a><p>Tutorial: <a href="https://foma.sourceforge.net/lrec2010/" rel="nofollow">https://foma.sourceforge.net/lrec2010/</a><p>5. Helsinki HfstXfst:<p>Homepage: <a href="https://github.com/hfst/hfst/wiki/HfstXfst">https://github.com/hfst/hfst/wiki/HfstXfst</a><p>These tools go back to research by Lauri Karttunen and others at
Xerox Research Center Europe in Grenoble, where an attempt was made to create highly efficient compilers and runtime libraries for finite-state transducers, i.e. to move beyond regular expressions to regular RELATIONS. This not only makes it possible to formalize replacements (regular expressions with an "output tape"), but also creates reversible automata (input and output roles can be swapped) and leads to a domain-specific language that describes transducers in very readable ways, including sub-automata naming, so that it can be useful for formal specification or linguistic rules (phonology, morphology, i.e. word or sound grammar). The latter two projects are open source clones of the former
effort. Once you have used these for a week, you will never want to go back to ugly "ordinary" regexes again.<p>Books:<p>(a) <a href="https://www.amazon.co.uk/Finite-State-Processing-Synthesis-Lectures-Technologies/dp/163639115X" rel="nofollow">https://www.amazon.co.uk/Finite-State-Processing-Synthesis-L...</a><p>(b) <a href="https://www.amazon.co.uk/Recognition-Algorithms-Finite-State-Transducers-Processing/dp/1608454738" rel="nofollow">https://www.amazon.co.uk/Recognition-Algorithms-Finite-State...</a><p>(c) <a href="https://www.amazon.co.uk/Finite-State-Techniques-Transducers-Bimachines-Theoretical/dp/1108485413" rel="nofollow">https://www.amazon.co.uk/Finite-State-Techniques-Transducers...</a><p>(d) <a href="https://www.amazon.co.uk/Finite-state-Language-Processing-Speech-Communication/dp/0262181827/" rel="nofollow">https://www.amazon.co.uk/Finite-state-Language-Processing-Sp...</a><p>(e) <a href="https://www.amazon.co.uk/Finite-State-Morphology-CSLI-Computational/dp/1575864347/" rel="nofollow">https://www.amazon.co.uk/Finite-State-Morphology-CSLI-Comput...</a>
I have the impression that learning this would feel like learning Italian, after learning Spanish, and being a native French speaker.<p>> If you know RegExp's, the syntax will immediately make sense<p>Doesn't seem like a feature.
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.<p>I'm not so sure this solves the actual issue ;)!!!! Jokes aside, regex has its uses.