TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

A regular expression matcher in 19 lines of Clojure

40 点作者 decklin超过 13 年前

5 条评论

gjm11超过 13 年前
Repeating, mutatis mutandis, what I said in an earlier thread (<a href="http://news.ycombinator.com/item?id=2723366" rel="nofollow">http://news.ycombinator.com/item?id=2723366</a>) about the C code on which this is based:<p>It's very nice pattern-matching code, but it isn't a regular expression matcher. The class of patterns it's able to match is much smaller than the class of all regular expressions -- not because of missing "sugar" (character classes, +, non-greedy wildcards, ...) but because it doesn't support groups or the | operator.<p>As a result, the language supported by this matcher has, e.g., no way to match the pattern (ab)* or the pattern a|b. It's much, much less powerful than an actual regular expression matcher would be, and much of what makes it possible to do it in 19 lines of Clojure is that loss of power.<p>(I've written extremely similar code before: this level of functionality -- basically, glob or DOS wildcards -- is pretty useful. I'm not Rob Pike, and my code for similar functionality would probably be longer than his. But my code, or even Rob Pike's code, for even the simplest thing that could honestly be called a regular expression matcher, would be longer than this by a bigger factor.)
alexgartrell超过 13 年前
I'm no haskell hacker (by a long, long, long shot[0]), and I've played a lot more with Clojure than with Haskell, but I still found this easier to write and read in haskell than in clojure.<p><pre><code> matches :: String -&#62; String -&#62; Bool matches rs = dm ((fixHead . fixTail) rs) where fixHead ('^':rs) = rs fixHead rs = '.':'*':rs fixTail rs = case (reverse rs) of ('$':rrs) -&#62; reverse rrs rrs -&#62; reverse ('*':'.':rrs) dm "" "" = True dm ('.':'*':rs) (c:cs) = (dm ('.':'*':rs) cs) || (dm rs (c:cs)) dm (r:'*':rs) (c:cs) = (dm rs (c:cs)) || (c == r &#38;&#38; (dm (r:'*':rs) cs)) dm ('.':rs) (c:cs) = dm rs cs dm (r:rs) (c:cs) = r == c &#38;&#38; dm rs cs dm (_:'*':[]) "" = True dm _ _ = False </code></pre> [0] I'm particularly ashamed of fixHead and fixTail and the way I special cased matching a star and nothing else against the empty string<p>edit: couldn't resist doing the golf thing<p><pre><code> matches :: String -&#62; String -&#62; Bool matches rs = dm (fixHead rs) where fixHead ('^':rs) = rs fixHead rs = '.':'*':rs cm r c = r == '.' || r == c dm (r:'*':rs) (c:cs) = (dm rs (c:cs)) || ((cm r c) &#38;&#38; (dm (r:'*':rs) cs)) dm (_:'*':"") "" = True dm (r:rs) (c:cs) = cm r c &#38;&#38; dm rs cs dm "" _ = True dm "$" "" = True dm _ _ = False</code></pre>
评论 #3118324 未加载
drallison超过 13 年前
LOC (Lines of Code) is not a good characterization of the quality of code; clarity and understandability is. The Clojure implementation is pretty nice on those counts.<p>The LOC measure may be important when, for example, the goal is to show the number of operations to be as small as possible.
评论 #3118414 未加载
评论 #3118127 未加载
ramiyer超过 13 年前
I so understand this <a href="http://www.cs.princeton.edu/courses/archive/spr09/cos333/beautiful.html" rel="nofollow">http://www.cs.princeton.edu/courses/archive/spr09/cos333/bea...</a> way better than clojure. What matters? number of lines or readability? and I am sure the Clojure implementation would be better understood by people who understand lisp. Ofcourse I haven't played with clojure or any lisp dialects and the main reason for that is when I see a lisp code I see only "(" and ")". I still don't know how to get out of that.
评论 #3118256 未加载
0x12超过 13 年前
Both versions are really nice but I find the C version to be lots easier to understand (I come from a C background, so that's probably not a surprise).<p>What does this look like to an experienced lisp/clojure programmer, do they have the same feeling but reversed?<p>What about programmers familiar with other languages?