TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Ask HN: Should I used a lexer and parser for this?

4 点作者 Trindaz将近 14 年前
A technical question I'm hoping some experienced hackers can help with:<p>My django application has a search feature for searching articles. It needs to handle logic features (AND, OR, (), etc.) <i>and</i> search articles accordingly using django's ORM tools <i>and</i> be used to create a regular expression that's used in my python code for certain searches that aren't performed at the database level. I've read about lexers such as ply - but I'd be interested to hear about anyone else who's had a similar problem. Take logic expressions, then transform them into something a database can use for searching, and then transform them into something else entirely for regex searches.

2 条评论

jerf将近 14 年前
You don't need a specialized lexer for this. A regex to split on key words (AND, OR) and symbols (parens, whatever else you've got) is sufficient. A parser would be helpful, <i>if</i> you really, <i>really</i> think your customers are going to run complicated queries against your database with nested OR clauses and serious use of parentheses and stuff. You may have such customers, they do exist, but bear in mind the average user has no idea what the difference between "A AND (B OR C)" and "(A AND B) OR C" is; if you don't know why your users are above average and picky, you might consider something very, very simple instead.<p>But let's go on assuming that's not the case, because it's certainly a fair possibility. The next step is to find The Tutorial that all parsers <i>always</i> have like it's some sort of law of the universe, the Arithmetic Expression parser. This covers basic operator precedence (multiplication and division vs. +/-), parenthesized expressions, hopefully the unary negation operator, and hopefully at least sketches the evaluation code for the resulting parse tree. Good news! This is almost exactly what you want, except multiplication is AND, addition is OR, and unary negation is NOT. You just need to tweak your regular expression to either emit "wordA wordB wordC" as a series of tokens which your parser will add together, or with just a bit of work on your regex, they'll come out as one token. Take your time transforming the grammar, one step at a time; make it so it takes "3 AND (1 + 3)" instead of "3 / (1 + 3)" first, little steps, because the error messages that come out of parsers can be a bit confusing at first and you want your changes to be minimal.<p>This may take a few hours to wrap your head around, but it's a very easy use case for parsing, and the time spent learning how to really, truly parse is time well spent.<p>Oh, and be careful with the generated regular expressions; there are some DOSes that can be done with those, and be sure you do something sane with asterisks and other special characters. If your regex engine supports the equivalent of \Q and \E in perl, use it.
amccloud将近 14 年前
You should checkout <a href="http://haystacksearch.org/" rel="nofollow">http://haystacksearch.org/</a>