Stop avoiding regular expressions damn it

30 points by bradt, about 12 years ago

12 comments

dasil003, about 12 years ago
The core criticism of regular expressions is legitimately directed at intermediate programmers who know enough to be dangerous, but is sometimes inappropriately cargo-culted by beginner programmers who use it as an excuse not to learn regular expressions.

The fact is that despite pithy slogans, there is a sweet spot where a regular expression does the job of matching a string in a clearer fashion than anything else. But that sweet spot is well shy of the theoretical power of regular expressions (especially in Perl!). Past it, you should further your understanding of a range of parsing techniques rather than hack together a baroque regex.
4ad, about 12 years ago
I found the house I currently live in with regular expressions.

A couple of years ago I moved to a different country, and for various reasons I needed two apartments, preferably close to each other. As you can imagine, the real estate websites are not designed for the kind of query I needed, so I wrote some code to aid me in my quest[1].

It's just shell script and text processing with awk. I download the results listing all available apartments from many real estate websites, then I scrape the data I care about (with regular expressions!) like address, rooms, price, anything really, and query the Google Maps API with all the addresses to retrieve the geographical coordinates, then I compute the distances between any two houses and sort them.

It's fantastically modular. Adding support for a new website meant just creating some regular expressions that work for that website. This was great because I was doing this on the road, as I was visiting the foreign city and found new sources of information.

Regular expressions were also great because these websites didn't have any API where I could query for the address, etc. I had to rely on what people wrote in their ads. This meant that when I wrote a regexp to match a set of results I had to inspect the failures to see new ways people described their houses and improve my matching based on that. Initially I had hoped I'd be able to parse 80% of the ads, but measurements and careful coding allowed me to match approximately 99% of the ads!

The textual operation of this software allowed me to easily input some data manually. For example I realized that I'm also interested in having these apartments close to a subway station. No problem, just manually create the file with the subway stations in the correct, simple, textual format and the program will pick it up and use it automatically.

The textual interface also helped with fancy queries, like "price between X and Y, 6 rooms total, prefer 4-2 to 3-3 if distance less than D, but 3-3 if distance greater than D, prefer Z subway line to Q, only one apartment might be from an agency rather than an individual, try to put one in K part of the city". Try to do that with an existing website.

[1] https://code.google.com/p/operation-housefinder/
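The pipeline described above (extract fields with regular expressions, geocode, compute pairwise distances, sort) can be sketched roughly as follows. This is not the original shell and awk code: the ad texts, the patterns, the `COORDS` table standing in for the geocoding step, and all function names are invented for illustration.

```python
import math
import re
from itertools import combinations

# Hypothetical ad texts; real listings would be scraped per website.
ADS = [
    "Bright flat, 3 rooms, 950 EUR/month, Main Street 12",
    "2 rooms - 700 EUR - Oak Avenue 5, near park",
    "Call now!!! great offer",  # no recognizable fields
    "4 rooms, 1200 EUR, Pine Street 8 (agency)",
]

# One pattern per field; a new website would need its own patterns.
ROOMS = re.compile(r"(\d+)\s*rooms?", re.IGNORECASE)
PRICE = re.compile(r"(\d+)\s*EUR", re.IGNORECASE)
ADDRESS = re.compile(r"([A-Z][a-z]+ (?:Street|Avenue) \d+)")

# Stand-in for a geocoding lookup; these coordinates are made up.
COORDS = {
    "Main Street 12": (52.5200, 13.4050),
    "Oak Avenue 5": (52.5300, 13.4100),
    "Pine Street 8": (52.6000, 13.5000),
}


def parse_ad(text):
    """Return the extracted fields, or None if any field is missing."""
    rooms = ROOMS.search(text)
    price = PRICE.search(text)
    address = ADDRESS.search(text)
    if not (rooms and price and address):
        return None  # failures are worth inspecting to improve the patterns
    return {
        "rooms": int(rooms.group(1)),
        "price": int(price.group(1)),
        "address": address.group(1),
    }


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def closest_pairs(ads, coords):
    """Return (distance_km, address_a, address_b) tuples, nearest pair first."""
    parsed = [p for p in map(parse_ad, ads) if p is not None]
    pairs = [
        (haversine_km(coords[x["address"]], coords[y["address"]]),
         x["address"], y["address"])
        for x, y in combinations(parsed, 2)
    ]
    return sorted(pairs)
```

Calling `closest_pairs(ADS, COORDS)` skips the ad that fails to parse and returns the remaining pairs ordered by distance; extra filters such as price or room counts could be applied to `parsed` before pairing.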
bradt, about 12 years ago
A little backstory on this article for those who are interested...

I noticed my coworker was going out of his way to use string manipulation, writing many lines of code instead of a simple regular expression. When I asked why, he explained that he didn't know regular expressions, but more importantly that he felt that he had read a lot of posts on Stack Overflow discouraging use of regular expressions. From what he had read, he felt that it was better practice to avoid regular expressions. Although this could be anecdotal, there may be a real danger here that inexperienced programmers are getting the wrong message, that regular expressions are somehow bad in most situations and not worth learning.
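To make the contrast concrete, here is a small sketch of the same validation written both ways. The task (parsing a `v<major>.<minor>.<patch>` string) and both function names are hypothetical, chosen only to illustrate the trade-off; they are not taken from the coworker's code.

```python
import re


def parse_version_manual(text):
    """Parse 'v<major>.<minor>.<patch>' using only string operations."""
    if not text.startswith("v"):
        return None
    parts = text[1:].split(".")
    if len(parts) != 3:
        return None
    for part in parts:
        # isdigit() alone accepts some non-ASCII digits, so check both.
        if not (part.isascii() and part.isdigit()):
            return None
    return tuple(int(part) for part in parts)


# re.ASCII restricts \d to 0-9 so both versions accept the same inputs.
VERSION = re.compile(r"v(\d+)\.(\d+)\.(\d+)", re.ASCII)


def parse_version_regex(text):
    """Parse the same format with a single regular expression."""
    match = VERSION.fullmatch(text)
    return tuple(int(group) for group in match.groups()) if match else None
```

Both functions return `(1, 2, 3)` for `"v1.2.3"` and `None` for malformed input; the regex version states the expected shape in one line, which is the kind of case the article argues for.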
Titanous, about 12 years ago
More concise? Sometimes. Slower? Always.

    BenchmarkRegexp     500000    5136 ns/op
    BenchmarkStrings  10000000     173 ns/op

http://play.golang.org/p/YT29Ao-tOt
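The linked Go benchmark is not reproduced here. A comparable measurement can be sketched in Python, assuming a plain substring check as the task (an assumption, since the original benchmark's workload is not shown in the comment). The text, the pattern, and the function names are made up, and the timings will vary by machine and input.

```python
import re
import timeit

TEXT = "the quick brown fox jumps over the lazy dog"
NEEDLE = re.compile(r"lazy")  # compiled once, outside the timed function
RUNS = 200_000


def with_regex():
    """Substring check via a precompiled regular expression."""
    return NEEDLE.search(TEXT) is not None


def with_strings():
    """The same check using the built-in substring operator."""
    return "lazy" in TEXT


if __name__ == "__main__":
    for fn in (with_regex, with_strings):
        seconds = timeit.timeit(fn, number=RUNS)
        print(f"{fn.__name__}: {seconds / RUNS * 1e9:.0f} ns/op")
```

A fixed-string search is the case least favorable to a regex engine; for patterns with alternation or character classes, the hand-written equivalent grows longer and the comparison may look different.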
nraynaud, about 12 years ago
As a general rule I ask people to avoid using non-trivial regular expressions. The grammar is too tricky and often the expression doesn't mean what the developer intends it to mean. Or the next developer will make a mistake.

My current pet interest is parser combinators, which seem a good compromise (not a magic wand) between maintenance (external parser generators don't blend well into your code), parsing what you think you are parsing (more so when your grammar is defined by rules in a reference document), and integrating the parser with your code.
bane, about 12 years ago
Does anybody know of a good Perl or Python library that will take a regex (with constraints on the repetition operators) and generate an exhaustive list of matching strings (instead of generating a random list)?

I think this would be helpful in many cases in getting people to understand how regexes work. I've seen lots of cases where toolsets designed to help people build regexes leave them confused when their regex also matches other stuff beyond their test strings.
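One brute-force way to get such an exhaustive list, assuming a small alphabet and a cap on string length (both chosen here for illustration), is to enumerate every candidate string and keep those that fully match. The function name `matching_strings` is hypothetical, and this is a teaching sketch rather than a substitute for a dedicated library: the number of candidates grows exponentially with the length cap.

```python
import re
from itertools import product


def matching_strings(pattern, alphabet, max_len):
    """Yield every string over `alphabet`, up to `max_len` characters,
    that the pattern matches in full, shortest candidates first."""
    compiled = re.compile(pattern)
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if compiled.fullmatch(candidate):
                yield candidate


if __name__ == "__main__":
    # prints ['a', 'ab', 'ac', 'abb', 'abc']
    print(list(matching_strings(r"ab*c?", "abc", 3)))
```

Listing the matches this way makes unintended matches visible, which is the confusion described above: anything in the output that the author did not expect is a sign the pattern is too permissive.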
gbog, about 12 years ago
OT: Where does this seemingly odd and new habit of spacing inside parentheses come from? I always write "(a, b)", mostly because it is closer to English (or other languages) typography, and it seems to have good readability, plus it is, I believe, the standard in most languages. So why write "( a, b )"?

By the way, if some like spacing that much, and if the reason is to have better mouse-selectability, then I humbly propose "( a , b )".
buro9, about 12 years ago
I feel that this needs posting again: http://www.debuggex.com/

Basically a great online tool for testing your regular expressions and stepping through what is actually happening. As soon as you get non-trivial, it's a Godsend.
Su-Shee, about 12 years ago
THE single best resource to really learn how to deal competently with regex is still Jeffrey Friedl's book "Mastering Regular Expressions".

You will profit from it for the rest of your career.

(There's also a Regex short reference and a Regex cookbook by O'Reilly...)
notyourpal, about 12 years ago
I'm very guilty of this myself. I'm officially a loser if I haven't delved into regex within two weeks.
ExpiredLink, about 12 years ago
Stop propagating bad interfaces like 'regular expressions' damn it!

An interface that e.g. makes me 'escape' half of my input because its designers think their special use of characters must take precedence over all user input is a bad interface.
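For the escaping complaint specifically, most regex libraries provide a helper that escapes user input so it is matched literally. A minimal sketch using Python's `re.escape` follows; the function name `find_literal` and the sample strings are made up for illustration.

```python
import re


def find_literal(needle, haystack):
    """Return the (start, end) span of `needle` in `haystack`, treating
    every character of `needle` literally, or None if it is absent."""
    match = re.search(re.escape(needle), haystack)
    return match.span() if match else None


if __name__ == "__main__":
    needle = "1+1=2 (really?)"
    haystack = "He wrote 1+1=2 (really?) on the board."
    # Unescaped, '+', '(' and '?' act as operators and the search fails.
    print(re.search(needle, haystack))
    # Escaped, the same input is found as plain text.
    print(find_literal(needle, haystack))
```

This does not remove the underlying objection, since the metacharacters still take precedence by default, but it keeps the escaping out of hand-written code.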
3minus1, about 12 years ago
What's a good resource for learning reg exp?