TechEcho

8 comments

brudgers over 9 years ago

If you're interested in regular expressions and their place in automata theory, Jeff Ullman's Automata course starts today on Coursera: <a href="https://www.coursera.org/course/automata" rel="nofollow">https://www.coursera.org/course/automata</a> The recent HN discussion of its announcement is here: <a href="https://news.ycombinator.com/item?id=10089092" rel="nofollow">https://news.ycombinator.com/item?id=10089092</a> Ullman is also a coauthor of "The Dragon Book".

nine_k over 9 years ago

Google has a similar library with similar goals; see <a href="https://github.com/google/re2/wiki/CplusplusAPI" rel="nofollow">https://github.com/google/re2/wiki/CplusplusAPI</a> It also removes backtracking. The idea is that backtracking can kill performance, so specially crafted text that triggers a lot of backtracking can be used as a DoS attack.
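To make the backtracking cost concrete, here is a minimal Python sketch. It is a toy, not how RE2 or Pire is implemented: the pattern representation (a list of `(char, optional)` pairs) and both function names are assumptions made for illustration. It uses the well-known pathological case of n optional `a`s followed by n mandatory `a`s, matched against a string of n `a`s, and counts the work done by a naive backtracking matcher versus a matcher that tracks the set of live states in a single pass.

```python
def backtrack_match(pattern, text):
    """Naive backtracking matcher. Returns (matched, steps)."""
    steps = 0

    def go(i, j):
        nonlocal steps
        steps += 1
        if i == len(pattern):
            return j == len(text)
        ch, optional = pattern[i]
        # Greedy choice first: consume a character if it matches.
        if j < len(text) and text[j] == ch and go(i + 1, j + 1):
            return True
        # Otherwise backtrack and try skipping an optional item.
        return optional and go(i + 1, j)

    return go(0, 0), steps


def stateset_match(pattern, text):
    """Single pass over the text, tracking every live pattern position."""
    steps = 0

    def closure(states):
        # Add positions reachable by skipping optional items.
        seen, stack = set(), list(states)
        while stack:
            i = stack.pop()
            if i in seen:
                continue
            seen.add(i)
            if i < len(pattern) and pattern[i][1]:
                stack.append(i + 1)
        return seen

    current = closure({0})
    for c in text:
        advanced = set()
        for i in current:
            steps += 1
            if i < len(pattern) and pattern[i][0] == c:
                advanced.add(i + 1)
        current = closure(advanced)
    return len(pattern) in current, steps


n = 14
# n optional 'a's followed by n mandatory 'a's.
pattern = [("a", True)] * n + [("a", False)] * n
text = "a" * n

bt_ok, bt_steps = backtrack_match(pattern, text)
ss_ok, ss_steps = stateset_match(pattern, text)
print(bt_ok, bt_steps, ss_ok, ss_steps)
```

Both matchers agree on the result, but the backtracking one explores every combination of taken and skipped optionals before it finds the match, so its step count grows exponentially in n, while the state-set one does at most (2n + 1) steps per input character.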

bane over 9 years ago

Wow, really impressive. Sometimes specializing by cutting out functionality is the right approach. In this case, eliminating greedy/non-greedy matching (and other features) means this can work as a high-level triage, and something with more specificity can do the precision work once you have a candidate match. It looks like this could have a good place in a real-time streaming architecture somewhere.
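A rough sketch of that two-stage idea, in Python. Everything here is hypothetical: both stages use Python's `re` module purely for illustration (in practice the first stage would be the restricted, non-backtracking engine), and the pattern names, the stricter second pattern, and the `classify` helper are assumptions, not anything from Pire.

```python
import re

# Stage 1 (hypothetical): a permissive pattern used only for coarse triage.
TRIAGE = re.compile(r"hello\s+w.+d$")

# Stage 2 (hypothetical): a stricter pattern using features such as
# capture groups and non-greedy quantifiers to do the precise work.
PRECISE = re.compile(r"hello\s+(w\w+?d)$")


def classify(lines):
    """Return the captured word for each line that passes both stages."""
    hits = []
    for line in lines:
        if not TRIAGE.search(line):
            continue  # cheap rejection; most input stops here
        m = PRECISE.search(line)
        if m:
            hits.append(m.group(1))
    return hits


sample = ["hello world", "hello   wild", "goodbye world", "hello w--d"]
result = classify(sample)
print(result)
```

Only lines that survive the cheap first pass reach the more expensive second pattern, which is what makes the split attractive for high-volume streams.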

jhallenworld over 9 years ago

README.ru has the real documentation; Google Translate does a pretty good job with it. It mentions that the algorithms are from the Dragon Book. I didn't try the code, but I think it's missing full Unicode character class support (for example, when you use \w). But I see it handles Russian :-) <a href="https://github.com/yandex/pire/blob/master/pire/classes.cpp#L82" rel="nofollow">https://github.com/yandex/pire/blob/master/pire/classes.cpp#...</a>

js2 over 9 years ago

See also <a href="https://swtch.com/%7Ersc/regexp/" rel="nofollow">https://swtch.com/%7Ersc/regexp/</a>

rnovak over 9 years ago

What I don't get is that the example given:<pre><code> hello\\s+w.+d$ </code></pre> is 100% Perl compatible, so it seems more like a "subset" than "incompatible". I've seen comments that say it's a "joke". Can anyone confirm that the title was indeed a joke? Edit: I know what DFAs and NFAs are, and how they relate to formal language theory and regular languages; the question still stands: how can a subset be called "incompatible"?

a8da6b0c91d over 9 years ago

What was wrong with GNU basic regex? If you're going to write a stripped-down string-matching syntax more strictly for "regular" text, then why bother mentioning Perl?

nn3 over 9 years ago

Scary to think that a major search engine really uses regular expressions heavily. Regexps are great for quick scripts, but one would expect major production applications to use better, higher-level parsing algorithms. It must be a nightmare to debug if you have a lot of regexps interacting in a large code base.

Perl Incompatible Regular Expressions
