Pomsky – A portable, modern regular expression language

208 points by dataminer over 2 years ago

37 comments

asicsp over 2 years ago

Previous discussion: "Rulex – A new, portable, regular expression language" https://news.ycombinator.com/item?id=31690878 (242 points | 6 months ago | 189 comments)

vintermann over 2 years ago

It's probably better than regular expressions. However, is it enough better that it's worth learning yet another syntax?

Well, maybe. What I REALLY like about this one is that it fully reverses the quoting/escaping assumptions. The default assumption of old regexes, that symbols should by default match themselves, is I think regexes' million-dollar mistake. The literal matches are the least interesting part of regexes. If you reach for regexes, it's because you want something more complex than literal matches, and the syntax should be about taming that complexity. Even the Unix world sort of conceded that, in reversing the quoting assumption for ?, +, (), [] and | in egrep. The mistake was in stopping there.

I will take a good look at this. I hope they provide good justifications for their choices.

antman over 2 years ago

For spec review from business partners I have found "verbal expressions" the most useful flavor. Specifically because to review them you don't need to know regular expressions and the library has ports for most programming languages.

Example: https://github.com/VerbalExpressions/JSVerbalExpressions#testing-if-we-have-a-valid-url

mekster over 2 years ago

Why don't languages have "grok" patterns in their standard libraries?

It seems to only exist in log parsing ecosystems but this really helps with getting rid of little bugs and wrong parsing of specific regex patterns.

Instead of doing "^\d+(\.\d+){3}$" for IP checking which is clearly wrong, you'd do "%{IPV4:ip}" which is so much better.

List of known patterns: https://github.com/hpcugent/logstash-patterns/blob/master/files/grok-patterns

Even for PHP a third party library only has 15 stars.
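
To make the point about the IP check concrete, here is a minimal Python sketch. It is not Logstash's grok implementation: the `PATTERNS` table, the `expand` helper, and the `OCTET` sub-pattern are hypothetical names invented for illustration, and only the `%{NAME:field}` placeholder shape is taken from the comment.

```python
import re

# The naive pattern quoted above: any run of digits counts as an octet.
NAIVE_IPV4 = re.compile(r"^\d+(\.\d+){3}$")

# Hypothetical mini pattern library (assumption, not grok's real definitions).
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"  # 0-255, no leading zeros
PATTERNS = {
    "IPV4": rf"{OCTET}(?:\.{OCTET}){{3}}",
    "INT": r"[+-]?\d+",
}

def expand(template):
    """Replace each %{NAME:field} placeholder with a named capture group."""
    def substitute(match):
        name, field = match.group(1), match.group(2)
        return f"(?P<{field}>{PATTERNS[name]})"
    return re.sub(r"%\{(\w+):(\w+)\}", substitute, template)

ip_line = re.compile("^" + expand("%{IPV4:ip}") + "$")

print(bool(NAIVE_IPV4.match("999.999.999.999")))  # the naive check accepts this
print(bool(ip_line.match("999.999.999.999")))     # the named pattern rejects it
```

The benefit argued for in the comment is that the tricky octet logic is written and tested once, under a name, instead of being retyped (and mistyped) at every call site.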

zokier over 2 years ago

While I'm not sure about Pomsky specifically, I do think it's nice that people explore the language space for regex more. General programming languages have a huge variety of styles and syntaxes available, from APL to Haskell and Lisp, whereas regexes are pretty much the same everywhere. It feels like we stuck with the first thing that Kleene, Thompson et al. thought of and for 50 years didn't even really try anything else.

turnsout over 2 years ago

To me, the lead example on the homepage ("Basic") is a major red flag. This is not clearer than a traditional regular expression:

'Hello' ' '

Did you count the single quotes correctly? Even with syntax highlighting, people WILL mess this up.

blindseer over 2 years ago

I find something like this a lot more readable:

https://github.com/jkrumbiegel/ReadableRegex.jl

It is in Julia, but if you have it installed locally it’s just a few taps away. You can even generate the regex, and use that in Python and just add the ReadableRegex in a comment nearby.

luuuzeta over 2 years ago

Named regexes (variables in Pomsky) remind me of Raku [1], which implements an improved flavor of PCRE regexes plus grammars in general as part of the language.

[1]: https://docs.raku.org/language/grammars

vanderZwan over 2 years ago

The idea of a compile-to-regex language is a neat one, that immediately makes it a lot easier to use in existing projects.

If there's any interest in other takes on "better regexps", the Rosie Pattern Language has some neat ideas:

[0] https://rosie-lang.org/index.html

[1] https://www.youtube.com/watch?v=MkTiYDrb0zg&list=PLcGKfGEEONaBUdko326yL6ags8C_SYgqH&index=12

echelon over 2 years ago

I don't know why I'd never previously considered regular expressions as being a compile/transpile target. It's pretty obvious from PL theory and makes a ton of sense.

That said, after looking at this syntax, I'm not sure that this is much of an improvement. Maybe I've spent far too much time in Regex land [1], but I know I'd perform much slower in this. It's not particularly beautiful, either. The verbosity doesn't seem clearer.

Variables and comments are great, though. We need to add them in future regexes.

Overall, good idea. I'd like to see more takes on this.

[1] https://jimbly.github.io/regex-crossword/
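
Some existing engines already offer partial versions of variables and comments. As a rough illustration (a sketch in Python's standard `re` module, not a claim about what Pomsky generates), ordinary string variables can hold named sub-patterns and `re.VERBOSE` allows inline comments. The names `YEAR`, `MONTH`, `DAY`, and `ISO_DATE` are made up for this example.

```python
import re

# "Variables": plain strings holding reusable, named sub-patterns.
YEAR = r"(?P<year>\d{4})"
MONTH = r"(?P<month>0[1-9]|1[0-2])"
DAY = r"(?P<day>0[1-9]|[12]\d|3[01])"

# "Comments": re.VERBOSE ignores unescaped whitespace and lets # start a comment.
ISO_DATE = re.compile(
    rf"""
    ^{YEAR}    # four-digit year
    -{MONTH}   # month, 01 through 12
    -{DAY}$    # day, 01 through 31 (month lengths are not checked)
    """,
    re.VERBOSE,
)

match = ISO_DATE.match("2022-12-30")
print(match.group("year"), match.group("month"), match.group("day"))
```

The limitation is that the variables are untyped string splices: nothing stops a sub-pattern with an unbalanced parenthesis from being interpolated, which is one argument for a dedicated compile-to-regex language.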

rekado over 2 years ago

For Scheme I enjoy using Irregex: http://synthcode.com/scheme/irregex/ The key benefit is to have a proper DSL instead of a DSL hidden away inside an untyped string.

gamesbrainiac over 2 years ago

This was previously called Rulex. Glad to see it is getting traction :) I made a video on it a few months ago. https://www.youtube.com/watch?v=nPjCxwEdIIo

neilv over 2 years ago

Olin Shivers defined the related "SRE" S-expression language for regular expressions:

https://scsh.net/docu/html/man-Z-H-7.html

https://www.ccs.neu.edu/home/shivers/papers/sre.txt

It works nicely with Scheme, including for programmatic generation.

(I've done some regular expressions from heck, without the benefit of this, and needed extensive commenting just to keep a few/several levels of nested groupings straight. With S-expressions, that's trivial.)

osigurdson over 2 years ago

I seem to need regular expressions about once every six months. Every time I do I wonder how this horrible little language became ubiquitous. However, I’m not going to pull in a dependency just to avoid it (at my current regex cadence, in any case).

moreati over 2 years ago

Some regex engines/dialects also allow pre-defined named subroutines (variables) with `(?(DEFINE) (?<name>pattern)...)` (http://www.rexegg.com/regex-disambiguation.html#define), but it's a very niche feature. E.g. https://gist.github.com/moreati/9d974e5395829d737dc342715f15fc56

AkshatJ27 over 2 years ago

> If you know RegExp's, the syntax will immediately make sense

If I know RegEx, why would I use pomsky?

account-5 over 2 years ago

I like regex, the syntax is succinct, powerful, and easy to learn. I learned regex before I learned any proper programming language, because you could use it in a text editor. I remember feeling like a wizard at the time.

I do admit it can be a write once hopefully never need to read again thing, but I still love it.

sulami over 2 years ago

I had a similar kind of idea for a long time, which I put into action a few weeks ago via a standalone transpiler of Emacs' rx macro to common regexp syntaxes.[0] I ended up getting interrupted and didn't completely finish it, but it generally works, though is probably riddled with edge cases.

The basic idea of rx is to use S-expressions to describe regular expressions, and my elevator pitch would've been to embed rx invocations in shell scripts using $(syntax), the main use case being something like sed invocations.

I still think it's a neat idea, and complex regular expressions tend to be hard to parse for humans.

[0]: https://github.com/sulami/rx
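
The S-expression approach can be sketched in a few lines. The following Python toy is neither Emacs' rx nor the linked transpiler: nested tuples stand in for S-expressions, and the operator names (`seq`, `or`, `one_or_more`, `optional`, `digit`) are assumptions chosen for illustration.

```python
import re

def compile_sexp(form):
    """Translate a nested-tuple 'S-expression' into a regex string."""
    if isinstance(form, str):
        return re.escape(form)  # bare strings always match literally
    op, *args = form
    if op == "or":
        return "(?:" + "|".join(compile_sexp(arg) for arg in args) + ")"
    body = "".join(compile_sexp(arg) for arg in args)
    if op == "seq":
        return body
    if op == "one_or_more":
        return f"(?:{body})+"
    if op == "optional":
        return f"(?:{body})?"
    if op == "digit":
        return r"\d"
    raise ValueError(f"unknown operator: {op!r}")

# A version string such as "v12" or "v12.3".
VERSION = ("seq", "v", ("one_or_more", ("digit",)),
           ("optional", ("seq", ".", ("one_or_more", ("digit",)))))

pattern = compile_sexp(VERSION)
print(pattern)
print(bool(re.fullmatch(pattern, "v12.3")))
```

Because literals are escaped by the compiler and grouping is explicit in the tree, nesting never has to be tracked by counting parentheses inside a string.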

ngomile over 2 years ago

Would be nice if it was possible for the Pomsky playground to show informative modal boxes when you hover over some of the Pomsky style expressions. Kind of like RegExr which remains my favorite tool for quickly writing out a regular expression and seeing what its outcome will be. Very nice to be able to quickly see what some thing does and how it affects your query including documentation in an easily accessible location on the same page.

Not having to navigate to a different page would be a boon for the playground. Very interesting though, I use Regular Expressions for the times when you need to extract information but find and replace functions/methods just aren't enough. Mostly for my scrapers.

xiphias2 over 2 years ago

Number ranges look great, but it's much better to just extend + polyfill current regex syntax and keep compatibility.

[:0-255:] could be an option for example to write a number range.

Also regex / variable interpolation should be added to regexes in languages probably :)

cutler over 2 years ago

I will never understand why a regex in Java still requires metacharacters to be escaped despite the recent addition of raw string literals. Talk about stuck in the Stone Age.

chubot over 2 years ago

FWIW here is a list of other such projects: https://github.com/oilshell/oil/wiki/Alternative-Regex-Syntax

Feel free to add other projects, including the ones in this thread

maweki over 2 years ago

That doesn't solve the most problematic part of regexing for me: having a different programming language embedded as a string in my other source code. The same happens with SQL in many codebases.

The embedded string is often precluded from static analysis. Fieldnames or names of capture groups may drift apart from the outside code and maybe nobody notices.

What we need are more embedded DSLs where we get static syntax and type checks and hopefully a good integration between the surrounding code and the embedded code.
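
One way to picture such an embedded DSL is a small combinator library in which patterns are host-language values that know their own capture names. The Python sketch below is hypothetical (`Pattern`, `lit`, `digits`, `capture`, and `extract` are invented here) and it only checks names at run time when the pattern is built or queried. A statically typed host language could push the same checks to compile time.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Pattern:
    source: str                       # the regex text generated so far
    groups: frozenset = frozenset()   # capture names this pattern defines

    def __add__(self, other):
        clash = self.groups & other.groups
        if clash:
            raise ValueError(f"duplicate capture names: {sorted(clash)}")
        return Pattern(self.source + other.source, self.groups | other.groups)

    def extract(self, text, *fields):
        # Reject unknown field names before any matching happens.
        unknown = set(fields) - self.groups
        if unknown:
            raise KeyError(f"no capture named {sorted(unknown)}")
        match = re.fullmatch(self.source, text)
        if match is None:
            return None
        return {name: match.group(name) for name in fields}

def lit(text):
    return Pattern(re.escape(text))

def digits():
    return Pattern(r"\d+")

def capture(name, inner):
    return Pattern(f"(?P<{name}>{inner.source})", inner.groups | {name})

version = lit("v") + capture("major", digits()) + lit(".") + capture("minor", digits())
print(version.extract("v3.11", "major", "minor"))
```

With this shape, renaming a capture without updating its callers fails loudly at the `extract` call instead of silently returning nothing.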

arthurofbabylon over 2 years ago

Cognition works in two ways: 1) highly deliberate, slow, patient and 2) spontaneous, quick, reactive. I believe software languages do a great job of leveraging both of these capacities of the minds of programmers.

The first type of thinking is generally not composed in code. It happens staring out the window, or looking at data tables or lists and reading documentation, drawing out the architecture. The second type happens in code composition: programmers know what snippet will accomplish which outcome, and it just comes out when we know it's needed. Boom, we have the thing we imagined during the architecture phase. Maybe we look something up or slow down while writing code, but the composition is fairly reactive/spontaneous. It flows.

So coding goes in this rhythm... thinking openly and patiently and deliberately, followed by a burst of code composition, and then more open/deliberate thought while staring out a window. Again and again. Maybe we reverse some of the quick/spontaneous thinking to rearchitect according to our slow/deliberate thought.

Enter regular expressions. I have never, ever spontaneously composed a regex. I have to be very slow and deliberate. I have to recall the precise definition of every character in the regex as I read or write it. Sometimes I stare at a regex for 2 minutes, look up some operator, stare again for 2 minutes, refactor it. Finally I understand it and can now apply the regex spontaneously to whatever environment I'm working in. It always feels like first principles architecture, never like flow. This is odd, as regex is like a giant shorthand engine.

Maybe I have not yet formed the neurological connections required for it to flow – but more likely this is a function of the language. There must be a syntax that better supports flow-state composition. (I don't think Pomsky accomplishes it.)

fegu over 2 years ago

Pomsky will look very similar to Pornsky in many fonts. quiet laugh

magic_hamster over 2 years ago

The only solution I've found for regex which is 100% compatible with the full capabilities, and is also portable to any language so you can learn just one syntax, is regex.

raydiatian over 2 years ago

Can anybody explain how “any old” regex engine can just accept this new syntax?

Are we transpiling to regex?

jll29 over 2 years ago

Related projects:

1. Xerox xfst

2. Xerox lexc

3. Xerox twolc

4. Mans Hulden's FOMA

Repo: https://fomafst.github.io/

Paper: https://dingo.sbs.arizona.edu/~mhulden/hulden_foma_2009.pdf

Demo: https://dsacl3-2018.github.io/xfst-demo/

Tutorial: https://foma.sourceforge.net/lrec2010/

5. Helsinki HfstXfst

Homepage: https://github.com/hfst/hfst/wiki/HfstXfst

These tools go back to research by Lauri Karttunen and others at Xerox Research Center Europe in Grenoble, where an attempt was made to create highly efficient compilers and runtime libraries for finite-state transducers, i.e. to move beyond regular expressions to regular RELATIONS. This not only makes it possible to formalize replacements (regular expressions with an "output tape"), but also creates reversible automata (input and output roles can be swapped) and leads to a domain-specific language that describes transducers in very readable ways, including sub-automata naming, so that it can be useful for formal specification or linguistic rules (phonology, morphology, i.e. word or sound grammar). The latter two projects are open source clones of the former effort.

Once you have used these for a week, you will never want to go back to ugly "ordinary" regexes again.

Books:

(a) https://www.amazon.co.uk/Finite-State-Processing-Synthesis-Lectures-Technologies/dp/163639115X

(b) https://www.amazon.co.uk/Recognition-Algorithms-Finite-State-Transducers-Processing/dp/1608454738

(c) https://www.amazon.co.uk/Finite-State-Techniques-Transducers-Bimachines-Theoretical/dp/1108485413

(d) https://www.amazon.co.uk/Finite-state-Language-Processing-Speech-Communication/dp/0262181827/

(e) https://www.amazon.co.uk/Finite-State-Morphology-CSLI-Computational/dp/1575864347/

Alifatisk over 2 years ago

I couldn't find how you turn case sensitivity on or off?

huqedato over 2 years ago

great achievement, but in real life what's its use?

osigurdson over 2 years ago

I would have done

‘Hello’ (‘World’ | ‘Pomsky’ | NULL)

in order to avoid ?: stuff.

YesThatTom2 over 2 years ago

This language is so… reasonable!

tonnydourado over 2 years ago

I have the impression that learning this would feel like learning Italian, after learning Spanish, and being a native French speaker.

> If you know RegExp's, the syntax will immediately make sense

Doesn't seem like a feature.

jeltz over 2 years ago

Can't say I see the point. It is uglier and more verbose than regexes without being easier to read.

thunderbong over 2 years ago

I feel everyone should look at the examples before commenting! I don't know what the edge cases might be but this looks amazingly easy.

zer00eyz over 2 years ago

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

I'm not so sure this solves the actual issue ;)!!!! Jokes aside, regex has its uses.

g5095 over 2 years ago

Show me a pomsky for matching any valid email pls.
