Perl Incompatible Regular Expressions

62 points by eatitraw over 9 years ago

8 comments

brudgers over 9 years ago
If you're interested in regular expressions and their place in automata, Jeff Ullman's Automata course starts today on Coursera: https://www.coursera.org/course/automata

The recent HN discussion of its announcement is here: https://news.ycombinator.com/item?id=10089092

Ullman is also coauthor of "The Dragon Book".

nine_k over 9 years ago
Google has a similar library with similar goals, RE2, which also avoids backtracking: https://github.com/google/re2/wiki/CplusplusAPI

The idea is that backtracking may kill performance, so a specially crafted text that causes a lot of backtracking can be used as a DoS attack.
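
To make the backtracking concern concrete, here is a minimal sketch, not taken from either library. It assumes a toy pattern, `(a|aa)*b`, and two hypothetical helpers: `backtracking_match` tries each alternative recursively, while `dfa_match` walks a hand-built state table once per character. On a crafted input with no trailing `b`, the backtracking version explores every way of splitting the run of a's before giving up.

```python
def backtracking_match(text, i=0, steps=None):
    """Match the whole of text[i:] against (a|aa)*b by trying each alternative in turn.

    steps is a one-element list used as a call counter (hypothetical helper).
    """
    if steps is not None:
        steps[0] += 1
    if text[i:] == "b":
        return True
    if text.startswith("a", i) and backtracking_match(text, i + 1, steps):
        return True
    if text.startswith("aa", i) and backtracking_match(text, i + 2, steps):
        return True
    return False


def dfa_match(text):
    """Match the same language with a hand-built DFA: one table lookup per character."""
    # State 0: still reading a's; state 1: saw the final b (accepting).
    transitions = {(0, "a"): 0, (0, "b"): 1}
    state, steps = 0, 0
    for ch in text:
        steps += 1
        state = transitions.get((state, ch))
        if state is None:  # no transition: dead state, reject early
            return False, steps
    return state == 1, steps


crafted = "a" * 20  # no trailing b, so every split into a/aa is tried and fails
calls = [0]
print(backtracking_match(crafted, steps=calls), calls[0])
print(dfa_match(crafted))
```

The call count of the backtracking version grows roughly like the Fibonacci numbers as the input gets longer, while the DFA version never does more than one step per input character. Real engines are far more sophisticated than either helper, so treat this only as an illustration of the failure mode.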

bane over 9 years ago
Wow, really impressive. Sometimes specializing by cutting out functionality is the right approach. In this case, eliminating greedy/non-greedy matching (and other features) means this can work as a high-level triage, and something with more specificity can do the precision work once you have a candidate match.

It looks like this could have a good place in a real-time streaming architecture somewhere.
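
A rough sketch of that triage idea, under stated assumptions: Python's `re` stands in for both stages (a real setup would presumably use a backtracking-free engine for the first one), and the patterns, the `two_stage` function, and the sample lines are all hypothetical. The cheap pattern rejects most lines, and the more specific pattern with a capture group runs only on the candidates.

```python
import re

# Stage 1: a cheap pattern with no groups or lazy quantifiers (stand-in for a fast scanner).
TRIAGE = re.compile(r"hello\s+w")
# Stage 2: a more specific pattern with a named capture group, run only on candidates.
PRECISE = re.compile(r"hello\s+(?P<word>w\w*d)\b")


def two_stage(lines):
    """Yield the captured word from each line that survives both stages."""
    for line in lines:
        if TRIAGE.search(line) is None:
            continue  # rejected by the cheap first pass
        match = PRECISE.search(line)
        if match is not None:
            yield match.group("word")


stream = ["hello   world", "goodbye world", "hello wizard", "hello wxyz"]
print(list(two_stage(stream)))
```

In a streaming setting the first stage would see every record, so keeping it simple and predictable matters more than keeping it expressive.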

jhallenworld over 9 years ago
README.ru has the real documentation; Google Translate does a pretty good job with it. It mentions that the algorithms are from the Dragon Book.

I didn't try the code, but I think it's missing full Unicode character class support (for example when you use \w). But I see it handles Russian :-)

https://github.com/yandex/pire/blob/master/pire/classes.cpp#L82
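
For readers unsure what Unicode-aware `\w` means in practice, here is a small illustration using Python's `re` module. It says nothing about how Pire itself behaves; the names `WORD_UNICODE`, `WORD_ASCII`, and `is_word` are made up for this sketch.

```python
import re

# In Python 3, \w in a str pattern is Unicode-aware by default.
WORD_UNICODE = re.compile(r"\w+")
# With re.ASCII, \w is restricted to [a-zA-Z0-9_].
WORD_ASCII = re.compile(r"\w+", re.ASCII)


def is_word(pattern, text):
    """Return True if the whole text consists of \\w characters under the given pattern."""
    return pattern.fullmatch(text) is not None


print(is_word(WORD_UNICODE, "привет"))
print(is_word(WORD_ASCII, "привет"))
print(is_word(WORD_ASCII, "hello_1"))
```

An engine with only partial class tables would sit somewhere between these two behaviours, accepting some scripts and not others.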

js2 over 9 years ago
See also https://swtch.com/%7Ersc/regexp/

rnovak over 9 years ago
What I don't get is that the example given:

    hello\\s+w.+d$

is 100% Perl compatible, so it seems more like a "subset" than "incompatible". I've seen comments that say it's a "joke". Can anyone confirm that the title was indeed a joke?

Edit: I know what DFAs and NFAs are and how they relate to formal language theory and regular languages; the question still stands: how can a subset be called "incompatible"?
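
The "subset" point can be checked quickly with any Perl-style engine. The sketch below uses Python's `re` as a stand-in, and it assumes the doubled backslash in the quoted example comes from string-literal escaping, so the regex proper is `hello\s+w.+d$`. The `matches` helper is hypothetical.

```python
import re

# Assumption: the pattern itself is hello\s+w.+d$ (single backslash once unescaped).
PATTERN = re.compile(r"hello\s+w.+d$")


def matches(text):
    """Return True if the pattern is found in text (start unanchored, end anchored)."""
    return PATTERN.search(text) is not None


print(matches("hello   world"))  # whitespace run, then w...d at the end
print(matches("hello world!"))   # trailing "!" means no d at the end
print(matches("helloworld"))     # \s+ needs at least one whitespace character
```

Nothing in the pattern relies on backreferences, lookaround, or lazy quantifiers, which is why the same text works unchanged in a backtracking engine.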

a8da6b0c91d over 9 years ago
What was wrong with the GNU basic regex?

If you're going to write a stripped-down string-matching syntax more strictly for "regular" text, then why bother mentioning Perl?

nn3 over 9 years ago
Scary to think that a major search engine really uses regular expressions heavily. Regexps are great for quick scripts, but one would expect major production applications to use better, higher-level parsing algorithms. It must be a nightmare to debug if you have a lot of regexps interacting in a large code base.