> Because of that, the first step on the road to fast programs is to measure!

> Making an operation that accounts for 1% of the overall runtime 100% faster is far less effective than making something that accounts for 50% of the runtime 10% faster.

Currently dealing with a slow DB and a team of "engineers" who have just *NOT* internalized the above. It's been very demoralizing.
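For anyone who wants to see the arithmetic behind that quote, here is a quick sketch (the percentages are the ones from the quote; the 100-unit total is just an assumed baseline for illustration):

```python
# Assumed total runtime of 100 units, using the percentages from the quote.
total = 100.0

# A component that is 1% of runtime, made 100% faster (i.e. 2x as fast):
small = 1.0
saved_small = small - small / 2.0   # 0.5 units saved

# A component that is 50% of runtime, made 10% faster (i.e. 1.1x as fast):
big = 50.0
saved_big = big - big / 1.1         # ~4.5 units saved

print(f"small win: {saved_small:.2f} units, big win: {saved_big:.2f} units")
# small win: 0.50 units, big win: 4.55 units
```

Roughly a 9x difference in overall impact, which is why profiling first matters so much.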
Thanks for writing this up. I'm solving a similar problem, albeit different in a few important ways, and was planning on taking a semi-similar approach. Like you, today I'm using a fairly naive implementation with some less-than-ideal complexity.

I've never actually worked with a prefix tree before, so this was quite helpful for me.

This may be of interest as well: a related paper, Pigasus, does this with specialized hardware, but it's a similar problem of applying tens of thousands of rules as quickly as possible. I found it while reading through acolyer's morning paper.
https://blog.acolyer.org/2020/11/16/pigasus/

Again, slightly different task: they wouldn't be going for the longest match exclusively, but all matches. Also, the rules they use are fundamentally regular expressions, so the filtering pass is there to take advantage of the domain-specific knowledge that rule matches are very rare.
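For anyone else who hasn't used a prefix tree before, here is a minimal sketch of the longest-match lookup being discussed. The node layout and method names are my own for illustration; they aren't taken from the article or from Pigasus:

```python
# Minimal prefix tree (trie) with longest-prefix-match lookup.
# Structure and names are illustrative only.
class TrieNode:
    def __init__(self):
        self.children = {}
        self.rule = None  # payload stored when an inserted key ends at this node

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, rule):
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.rule = rule

    def longest_match(self, text):
        """Return the rule for the longest inserted key that is a prefix of `text`."""
        node, best = self.root, None
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                break
            if node.rule is not None:
                best = node.rule  # keep the deepest (longest) match seen so far
        return best

trie = Trie()
trie.insert("/api", "rule-api")
trie.insert("/api/users", "rule-users")
print(trie.longest_match("/api/users/42"))  # -> rule-users
```

Collecting *all* matches instead (closer to the Pigasus setting) would just mean appending to a list inside the loop rather than overwriting `best`.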
If someone could implement a code analyzer that suggested such optimizations, we could save so many compute cycles. Using tries for problems like this is a common solution, but it may be unknown or overlooked by some. Even searching for best practices for a problem you suspect has been solved many times before can be difficult if you don't know the right keywords.