TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

A regular expression matcher By Rob Pike and Brian Kernighan (2007)

96 点作者 sid6376大约 12 年前

13 条评论

chollida1大约 12 年前
Can someone explain to me why the matchstar function takes its first param as an integer? to me it should be a char.<p><pre><code> int matchstar(int c, char *regexp, char *text) { do { /* a * matches zero or more instances */ if (matchhere(regexp, text)) return 1; } while (*text != '\0' &#38;&#38; (*text++ == c || c == '.')); return 0; }</code></pre>
评论 #5674145 未加载
评论 #5675260 未加载
stiff大约 12 年前
There is a lovely description of the classic Thompson algorithm of building an NFA from a regular expression and then simulating the NFA with a DFA using two stacks :) in the "Compilers" book by Aho et al. There is also a great article from Russ Cox on it: <a href="http://swtch.com/~rsc/regexp/regexp1.html" rel="nofollow">http://swtch.com/~rsc/regexp/regexp1.html</a><p>I recommend implementing this as an exercise, rarely do you see that much classic CS theory put to a practical use in one place.
nilkn大约 12 年前
I translated this into Haskell just for fun.<p><pre><code> match :: String -&#62; String -&#62; Bool match ('^':rest) text = matchLocal rest text match rex (c:rest) = matchLocal rex (c:rest) || match rex rest match _ _ = False matchLocal :: String -&#62; String -&#62; Bool matchLocal [] _ = True matchLocal "$" [] = True matchLocal (c:'*':restR) (t:restT) = c == t &#38;&#38; matchLocal restR (starSkip c (t:restT)) matchLocal (r:restR) (t:restT) = (r == '.' || r == t) &#38;&#38; matchLocal restR restT matchLocal _ _ = False starSkip :: Char -&#62; String -&#62; String starSkip c (t:rest) = if c == t then starSkip c rest else t:rest starSkip _ [] = []</code></pre>
nrbafna大约 12 年前
On a unrelated note.<p>Adding a few lines of css (setting body width, font-size to 16, line-height to 1.4, Georgia for font and larger font size for headings ) makes for a decent reading experience.<p><a href="http://imgur.com/a/KtaEC/" rel="nofollow">http://imgur.com/a/KtaEC/</a>
评论 #5673196 未加载
评论 #5673535 未加载
erre大约 12 年前
For those who don't know, this is Rob Pike's chapter of Beautiful Code, a thoroughly enjoyable book: <a href="http://www.amazon.co.uk/Beautiful-Code-Leading-Programmers-Practice/dp/0596510047" rel="nofollow">http://www.amazon.co.uk/Beautiful-Code-Leading-Programmers-P...</a> . Every now and then I read a different chapter, and it's always interesting. I particularly liked Rob Pike's one, and the one about Python's dictionary implementation.
评论 #5674562 未加载
评论 #5682484 未加载
sid6376大约 12 年前
For those unfamiliar with pointer arithmetic I wrote a python port here. <a href="https://gist.github.com/siddharthsarda/5538928" rel="nofollow">https://gist.github.com/siddharthsarda/5538928</a>
评论 #5673637 未加载
minikomi大约 12 年前
On a related note, Rob Pike's "Lexical Scanning in Go" talk is also worth a look <a href="https://cuddle.googlecode.com/hg/talk/lex.html#landing-slide" rel="nofollow">https://cuddle.googlecode.com/hg/talk/lex.html#landing-slide</a>
porlw大约 12 年前
It's interesting that this is in a very functional style, no global state is used.
nilkn大约 12 年前
I was once asked in an interview to write a pattern matcher in C for the exact same class of regular expressions as in this post. I'm pretty sure the interviewer was looking for exactly the approach used here as well.
jonny_eh大约 12 年前
I wonder if this would be a good test to give a software developer candidate. Develop a regex matcher that matches a subset of the regex rules in your language of choice (without using regex abilities, of course).
评论 #5675243 未加载
评论 #5673139 未加载
评论 #5673295 未加载
pdog大约 12 年前
Altough performance isn't really part of this exercise, beautiful code is also likely to be fast -- efficiently doing the work to solve the problem and no more.
评论 #5673085 未加载
toolslive大约 12 年前
I saw this in a drdobbs article in the 90s as well.
评论 #5673456 未加载
b0rsuk大约 12 年前
Great code is often deceptively simple. It can lead you to believe it was simple to write.