A human way to define regular expressions in Ruby

68 points by vbv, over 11 years ago

20 comments

fendrak, over 11 years ago

Am I the only person who thinks that things like this are totally unnecessary? Is learning/reading regular expressions really that difficult for most people?

Here's the subset of regular expressions that has gotten me through nearly all of the regular expressions I've ever needed to write. As a plus, it has no dependencies!

* - zero or more of the preceding character/group
+ - one or more of the preceding character/group
? - zero or one of the preceding character/group
$ - end of line
^ - beginning of line
. - any one character
\ - escape the following character (for a literal '$' or '.', for example)
[<some characters>] - one of the given characters
[a-zA-Z0-9] - letters and numbers (inside a group you can have ranges!)
(<something>) - capturing group (anything that matches inside it will be accessible in the match object)
<thing1>|<thing2> - either the first thing, or the second thing (or the third, or the fourth...)

This isn't a complete, or even precise, definition, but knowing those things will get you to the point where you can read and write expressions like this:

^(-|+)?[0-9]*\.[0-9]+$

which matches things like -.2, 0.123, +0.1, etc. (floating point numbers, basically). This likely has bugs, since I haven't tested it ;)
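
The comment says its float pattern is untested. As far as I know, Ruby and most other engines reject the bare `+` inside `(-|+)`, because there is nothing before it to repeat. Here is a minimal Ruby sketch of one possible fix, assuming the only change wanted is to escape that `+`. The constant name `FLOAT` is made up for illustration.

```ruby
# Variant of the float pattern from the comment, with the literal "+"
# escaped so the engine does not read it as a repetition operator.
# FLOAT is a hypothetical name, not something from the thread.
FLOAT = /^(-|\+)?[0-9]*\.[0-9]+$/

# The three sample inputs listed in the comment.
%w[-.2 0.123 +0.1].each do |text|
  puts "#{text}: #{FLOAT.match?(text)}"
end

# Inputs without a fractional part do not match, because the pattern
# requires a dot followed by at least one digit.
puts "12: #{FLOAT.match?('12')}"
```

Note that in Ruby `^` and `$` anchor to lines, not to the whole string. `\A` and `\z` may be the safer choice when validating input.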

cthor, over 11 years ago

I really don't understand why some people are so afraid of regex. Sure, it's not perfect, but these symbol-less solutions feel to me like one step forward and two steps back. A more interesting development would be something along the lines of what Perl 6 is trying to do (if only someone could implement it). [1]

We should start treating regex like a computer language in its own right, rather than some second-class citizen that we stuff into a single line without any delimiting whitespace. Use the /x switch and comment your code as you would with any other programming language, and you'll find that regex really isn't that scary.

Take the example pattern. The regex written by a human might look something like so:

<pre><code>m{^
    https? ://          # Protocol
    (?: \w+ \. )?       # Subdomain
    ([\w\-]+) \.        # Domain
    (?: com | org )     # TLD
    /?
$}x
</code></pre>

It's certainly easier to parse. Debugging it is also a lot easier. We can see that this won't match anything with more than one subdomain. We can also see that it won't match subdomains that have hyphens in them. It also looks at a glance much more like a URL and less like some arbitrary Ruby code.

[1]: http://www.perl6.org/archive/doc/design/apo/A05.html
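
Ruby has the same `/x` switch, so the pattern above can be carried over more or less directly. The sketch below is an adaptation, not the commenter's code: the name `URL_PATTERN` and the sample URLs are assumptions, and `\A`/`\z` replace `^`/`$` to anchor the whole string. It checks the two limitations the comment points out.

```ruby
# Extended-mode (/x) regex: whitespace is ignored and "#" starts a comment.
# URL_PATTERN is a hypothetical name chosen for this sketch.
URL_PATTERN = %r{
  \A
  https? ://          # protocol
  (?: \w+ \. )?       # at most one subdomain, hyphens not allowed
  ([\w\-]+) \.        # domain, captured as group 1
  (?: com | org )     # TLD
  /?
  \z
}x

# Sample URLs are made up for illustration.
[
  "http://example.com",
  "https://www.example.org/",
  "http://a.b.example.com",     # more than one subdomain
  "http://my-sub.example.com"   # hyphen in the subdomain
].each do |url|
  match = URL_PATTERN.match(url)
  puts "#{url}: #{match ? "domain=#{match[1]}" : 'no match'}"
end
```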

fishtoaster, over 11 years ago

This is interesting. I find that regexes are one of the few places where a comment explaining what a block of code does is generally necessary. Since the "ruby way" is to aggressively replace comments with method and variable names, a tool like this is a good way to achieve that goal.

That said, this also makes a huge tradeoff against conciseness. Personally, I'd prefer

<pre><code># Is foo a valid klingon email address?
foo =~ /gibberish/
</code></pre>

over

<pre><code>foo = Hexpress.new.
  start('g').
  maybe('i').
  many('b').
  find('b').
  ...
</code></pre>
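
One possible middle ground between the two snippets above is to keep the regex literal but move the comment's meaning into names. This is only a sketch: `KLINGON_EMAIL`, `klingon_email?`, and the pattern itself are invented stand-ins for the `/gibberish/` placeholder, not anything from the thread or from Hexpress.

```ruby
# Hypothetical pattern standing in for the comment's /gibberish/ placeholder:
# something@something ending in ".kli".
KLINGON_EMAIL = /\A[^@\s]+@[^@\s]+\.kli\z/

# The method name carries what the comment used to say.
def klingon_email?(text)
  KLINGON_EMAIL.match?(text)
end

puts klingon_email?("worf@enterprise.kli")
puts klingon_email?("worf@example.com")
```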

acjohnson55, over 11 years ago

Please yes!

I did a little research and this derives from Verbal Expressions, as explained [1] and implemented [2]. In any case, I'm emphatically in favor. I'm the sort of programmer that needs to write maybe one regex per month, and I'm tired as hell of relearning the human-meaningless syntax for regexes, let alone all the little variations between languages. Not to mention, if it can vastly reduce the number of symbols I have to escape when matching special symbols, so much the better.

[1] http://thechangelog.com/stop-writing-regular-expressions-express-them-with-verbal-expressions/
[2] http://verbalexpressions.github.io/
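
On the escaping point, Ruby's standard `Regexp.escape` and `Regexp.union` already take care of special symbols without a builder library. The sketch below uses made-up inputs and constant names (`EXACT_PRICE`, `OPERATORS`) to show the idea.

```ruby
# Regexp.escape backslash-escapes characters that are special in a pattern,
# so a literal string can be embedded in a regex without hand-escaping.
price = '$9.99 (USD)'   # hypothetical input
EXACT_PRICE = /\A#{Regexp.escape(price)}\z/

puts EXACT_PRICE.match?('$9.99 (USD)')   # the literal string itself
puts EXACT_PRICE.match?('$9x99 (USD)')   # "." is now a literal dot

# Regexp.union escapes plain strings too and joins them with "|".
OPERATORS = Regexp.union("+", "*", "?")
puts "2 + 2".scan(OPERATORS).inspect
```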

jonaphin, over 11 years ago

Wow. My mind is blown.

It almost feels like what high level programming languages are to assembly code.

Sure, we can (and should) learn Regex constructs, but it is undeniable that this library provides an unmatched level of clarity.

Kudos to Krain for coming up with such an elegant solution to the issue of Regex building/reading.

kamaal, over 11 years ago

Regular expressions were invented to avoid exactly this, because this approach works only as long as your regular expressions are small and few.

Once that changes, you will find yourself writing and staring at walls of text.

Do that enough times and you will wish for a more succinct and powerful way of expressing such idioms, only to realize that regular expressions are the only way to solve the range of problems they were designed to solve.

The easiest analogy I can give you is math. Before symbolic manipulation, math was pretty much text. The whole of math looked like paragraphs of puzzles and wordplay. That worked fine for small things like postulates, axioms, and a few results derived from them. To move to a higher level of abstraction and interplay of concepts, we had to get into symbols.

This is something similar.

tzury, over 11 years ago

Python's re.VERBOSE lets you write [1]

<pre><code>a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)
</code></pre>

instead of

<pre><code>a = re.compile(r"\d+\.\d*")
</code></pre>

But that is as far as it goes for me. From that point, masking regex with an additional layer of tens of functions is not a wise move.

[1] http://docs.python.org/2/library/re.html#re.VERBOSE

Glyptodon, over 11 years ago

This completely illustrates what I hate most about ruby: the conflation of 'understandable code' with some sort of insane directive to turn everything into an English sentence without any regard for what the code actually does. At its worst it's almost an obsession with enforcing technical ignorance.

That said, regexes can be tricky to parse and making them clearer to the average person is a worthy goal.

rajahafify, over 11 years ago

I don't understand why there is so much hate for the library. For me, this is a very Ruby way to tackle the regular expression problems that newbies like me have.

I'm not saying that newbies shouldn't learn regex, but any level of abstraction would be great. Rails abstracts the complexity of web development. Hexpress abstracts the complexity of regex. Both are wins in my book.

brudgers, over 11 years ago

Like VerbalExpressions, this is a good idea carried out without full acknowledgement of the nature of regular expressions. There's no shortcut around concatenation, union, and Kleene star. A URL operator doesn't replace any of them.

If there isn't an isomorphism between the traditional symbols and the new names, then the expressions will be limited in expressiveness.
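
For readers who have not met the three primitives named above, here is a small Ruby sketch of each, built from two made-up sub-patterns. The constant names are assumptions for illustration. Interpolating one `Regexp` into another and `Regexp.union` are standard Ruby.

```ruby
# Two hypothetical building blocks.
AB = /ab/
CD = /cd/

# Concatenation: AB immediately followed by CD.
CONCATENATION = /#{AB}#{CD}/

# Union: either AB or CD.
UNION = Regexp.union(AB, CD)

# Kleene star: zero or more repetitions of AB, anchored to the whole string.
STAR = /\A(?:#{AB})*\z/

puts CONCATENATION.match?("abcd")
puts UNION.match?("cd")
puts STAR.match?("ababab")
puts STAR.match?("aba")
```

Any builder method name has to map onto some combination of these three operations to keep the full expressive power of regular expressions.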

Argorak, over 11 years ago

Same here [1]. This library squats vocabulary in a bad way, e.g. #word is (\w+). This works in English, but breaks very early. Still, it might be correct some of the time. "word" is extremely context-sensitive.

[1] https://news.ycombinator.com/item?id=6319584
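
To make the "breaks very early" point concrete: in Ruby, as far as I know, `\w` is documented as `[a-zA-Z0-9_]`, so accented letters split a "word". The sketch below compares it with the Unicode letter property `\p{L}`. The sample text and constant names are assumptions, and the text is written with `\u` escapes to avoid depending on the file's encoding.

```ruby
# "naïve café", written with escapes.
TEXT = "na\u00EFve caf\u00E9"

# \w only covers ASCII letters, digits and underscore in Ruby.
ASCII_WORDS = TEXT.scan(/\w+/)

# \p{L} matches any Unicode letter.
UNICODE_WORDS = TEXT.scan(/\p{L}+/)

puts ASCII_WORDS.inspect
puts UNICODE_WORDS.inspect
```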

dlau1, over 11 years ago

It seems like the argument against regexes is that they can get really complex and unreadable. For something as simple as that url, isn't it easier to just use a regex?

Seems like in the 'unreadable' regex case, you'll have a boatload of verbose function calls to construct it.

hardwaresofton, over 11 years ago

So as computer scientists, we're in the business of tradeoffs, right? I think the conciseness tradeoff for clarity is good. Abstracting away from pure regular expression syntax is a good thing, I think, because it offers better AT A GLANCE reading.

Why would that be better? I recently saw Bret Victor's Inventing on Principle talk on Vimeo, and one of the things he mentions that I really agree with is that most of us must 'think like computers' to understand what our code will do. The less we have to do this, the better, because we're terrible computers.

Now am I likely to use something like this? No, mostly because I don't program in ruby TOO often, and I am pretty well aware of how to use regular expressions.

rzendacott, over 11 years ago

Here's a link to the VerbalExpressions organization that contains implementations of a similar DSL for various languages: https://github.com/verbalexpressions

jgmmo, over 11 years ago

Cute, but I prefer just using Rubular to test out regexes.

lgrebe, over 11 years ago

Or instead of learning start, maybe, with, words, multiple, has, either, ending, you learn ^?\w+(|)$.

I'd imagine that wrappers such as this one do not enable users to create any more complex patterns than the ones they could make with the most basic understanding of regular expressions, while providing neither a base for more complex regex use nor a community to further an understanding of regex.

nirai, over 11 years ago

Now we need a human way to write Ruby in Python.

swalkergibson, over 11 years ago

The reason that this library is useful is because I don't have to maintain an AST in my own feeble mind about what each special character in the regex means. I can look at the code, at first glance, and immediately know what the hell is going on.

Also, this: https://xkcd.com/208/

krainboltgreene, over 11 years ago

Library owner here. I've added some replies to points made, but I just want to note that the readme on Github's website was a bit out of date and provides a little more detail now.

ryderm, over 11 years ago

Seems easier to just use a regex. Every programmer should know them already, so this is just another API to remember. Still kinda cool though, but prob not so useful.