TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Ask HN: Should I used a lexer and parser for this?

4 pointsby Trindazalmost 14 years ago
A technical question I'm hoping some experienced hackers can help with:<p>My django application has a search feature for searching articles. It needs to handle logic features (AND, OR, (), etc.) <i>and</i> search articles accordingly using django's ORM tools <i>and</i> be used to create a regular expression that's used in my python code for certain searches that aren't performed at the database level. I've read about lexers such as ply - but I'd be interested to hear about anyone else who's had a similar problem. Take logic expressions, then transform them into something a database can use for searching, and then transform them into something else entirely for regex searches.

2 comments

jerfalmost 14 years ago
You don't need a specialized lexer for this. A regex to split on key words (AND, OR) and symbols (parens, whatever else you've got) is sufficient. A parser would be helpful, <i>if</i> you really, <i>really</i> think your customers are going to run complicated queries against your database with nested OR clauses and serious use of parentheses and stuff. You may have such customers, they do exist, but bear in mind the average user has no idea what the difference between "A AND (B OR C)" and "(A AND B) OR C" is; if you don't know why your users are above average and picky, you might consider something very, very simple instead.<p>But let's go on assuming that's not the case, because it's certainly a fair possibility. The next step is to find The Tutorial that all parsers <i>always</i> have like it's some sort of law of the universe, the Arithmetic Expression parser. This covers basic operator precedence (multiplication and division vs. +/-), parenthesized expressions, hopefully the unary negation operator, and hopefully at least sketches the evaluation code for the resulting parse tree. Good news! This is almost exactly what you want, except multiplication is AND, addition is OR, and unary negation is NOT. You just need to tweak your regular expression to either emit "wordA wordB wordC" as a series of tokens which your parser will add together, or with just a bit of work on your regex, they'll come out as one token. Take your time transforming the grammar, one step at a time; make it so it takes "3 AND (1 + 3)" instead of "3 / (1 + 3)" first, little steps, because the error messages that come out of parsers can be a bit confusing at first and you want your changes to be minimal.<p>This may take a few hours to wrap your head around, but it's a very easy use case for parsing, and the time spent learning how to really, truly parse is time well spent.<p>Oh, and be careful with the generated regular expressions; there are some DOSes that can be done with those, and be sure you do something sane with asterisks and other special characters. If your regex engine supports the equivalent of \Q and \E in perl, use it.
amccloudalmost 14 years ago
You should checkout <a href="http://haystacksearch.org/" rel="nofollow">http://haystacksearch.org/</a>