TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

A regular expression matcher in 19 lines of Clojure

40 pointsby decklinover 13 years ago

5 comments

gjm11over 13 years ago
Repeating, mutatis mutandis, what I said in an earlier thread (<a href="http://news.ycombinator.com/item?id=2723366" rel="nofollow">http://news.ycombinator.com/item?id=2723366</a>) about the C code on which this is based:<p>It's very nice pattern-matching code, but it isn't a regular expression matcher. The class of patterns it's able to match is much smaller than the class of all regular expressions -- not because of missing "sugar" (character classes, +, non-greedy wildcards, ...) but because it doesn't support groups or the | operator.<p>As a result, the language supported by this matcher has, e.g., no way to match the pattern (ab)* or the pattern a|b. It's much, much less powerful than an actual regular expression matcher would be, and much of what makes it possible to do it in 19 lines of Clojure is that loss of power.<p>(I've written extremely similar code before: this level of functionality -- basically, glob or DOS wildcards -- is pretty useful. I'm not Rob Pike, and my code for similar functionality would probably be longer than his. But my code, or even Rob Pike's code, for even the simplest thing that could honestly be called a regular expression matcher, would be longer than this by a bigger factor.)
alexgartrellover 13 years ago
I'm no haskell hacker (by a long, long, long shot[0]), and I've played a lot more with Clojure than with Haskell, but I still found this easier to write and read in haskell than in clojure.<p><pre><code> matches :: String -&#62; String -&#62; Bool matches rs = dm ((fixHead . fixTail) rs) where fixHead ('^':rs) = rs fixHead rs = '.':'*':rs fixTail rs = case (reverse rs) of ('$':rrs) -&#62; reverse rrs rrs -&#62; reverse ('*':'.':rrs) dm "" "" = True dm ('.':'*':rs) (c:cs) = (dm ('.':'*':rs) cs) || (dm rs (c:cs)) dm (r:'*':rs) (c:cs) = (dm rs (c:cs)) || (c == r &#38;&#38; (dm (r:'*':rs) cs)) dm ('.':rs) (c:cs) = dm rs cs dm (r:rs) (c:cs) = r == c &#38;&#38; dm rs cs dm (_:'*':[]) "" = True dm _ _ = False </code></pre> [0] I'm particularly ashamed of fixHead and fixTail and the way I special cased matching a star and nothing else against the empty string<p>edit: couldn't resist doing the golf thing<p><pre><code> matches :: String -&#62; String -&#62; Bool matches rs = dm (fixHead rs) where fixHead ('^':rs) = rs fixHead rs = '.':'*':rs cm r c = r == '.' || r == c dm (r:'*':rs) (c:cs) = (dm rs (c:cs)) || ((cm r c) &#38;&#38; (dm (r:'*':rs) cs)) dm (_:'*':"") "" = True dm (r:rs) (c:cs) = cm r c &#38;&#38; dm rs cs dm "" _ = True dm "$" "" = True dm _ _ = False</code></pre>
评论 #3118324 未加载
drallisonover 13 years ago
LOC (Lines of Code) is not a good characterization of the quality of code; clarity and understandability is. The Clojure implementation is pretty nice on those counts.<p>The LOC measure may be important when, for example, the goal is to show the number of operations to be as small as possible.
评论 #3118414 未加载
评论 #3118127 未加载
ramiyerover 13 years ago
I so understand this <a href="http://www.cs.princeton.edu/courses/archive/spr09/cos333/beautiful.html" rel="nofollow">http://www.cs.princeton.edu/courses/archive/spr09/cos333/bea...</a> way better than clojure. What matters? number of lines or readability? and I am sure the Clojure implementation would be better understood by people who understand lisp. Ofcourse I haven't played with clojure or any lisp dialects and the main reason for that is when I see a lisp code I see only "(" and ")". I still don't know how to get out of that.
评论 #3118256 未加载
0x12over 13 years ago
Both versions are really nice but I find the C version to be lots easier to understand (I come from a C background, so that's probably not a surprise).<p>What does this look like to an experienced lisp/clojure programmer, do they have the same feeling but reversed?<p>What about programmers familiar with other languages?