TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Fix Boyer-Moore searcher with the Rytter correction

193 点作者 HenryR大约 5 年前

16 条评论

bumbledraven大约 5 年前
What is a minimal test case (needle + haystack) for which the old code yields a different result than the new code?
评论 #22898522 未加载
评论 #22896981 未加载
oefrha大约 5 年前
The papers:<p>- Boyle, Moore (1977) <a href="https:&#x2F;&#x2F;www.cs.utexas.edu&#x2F;users&#x2F;moore&#x2F;publications&#x2F;fstrpos.pdf" rel="nofollow">https:&#x2F;&#x2F;www.cs.utexas.edu&#x2F;users&#x2F;moore&#x2F;publications&#x2F;fstrpos.p...</a><p>- Knuth, Morris, Pratt (1977) <a href="https:&#x2F;&#x2F;pdfs.semanticscholar.org&#x2F;4479&#x2F;9559a1067e06b5a6bf052f8f10637707928f.pdf" rel="nofollow">https:&#x2F;&#x2F;pdfs.semanticscholar.org&#x2F;4479&#x2F;9559a1067e06b5a6bf052f...</a><p>- Rytter (1980) <a href="https:&#x2F;&#x2F;doi.org&#x2F;10.1137&#x2F;0209037" rel="nofollow">https:&#x2F;&#x2F;doi.org&#x2F;10.1137&#x2F;0209037</a> (paywalled without subscription, but you know where to find it <i>cough scihub cough</i>).
dang大约 5 年前
Submitted title was &quot;A C++17 STL bug that can be attributed to Knuth, and its 40-year-old fix&quot;, which is nice additional context, but can now go here rather than there.
tacitusarc大约 5 年前
What I really enjoy about this merge request are these comments:<p>&gt; Rename _Suffix_fn to _Fx, again following the papers. Note that in the usage below, there is a semantic change: _Suffix_fn stored 0-based values, while _Fx stores 1-based values. This undoes a micro-optimization (_Suffix_fn avoided unnecessarily translating between the 0-based and 1-based domains), but with the increased usage of f in the Rytter correction, I wanted greater correspondence with the published algorithms in order to verify the implementation.<p>And<p>&gt; Rename 1-based _Idx to 1-based _Jx. (While the code was correct, I found it confusing that _Idx was 0-based in other loops but 1-based in this loop.)<p>Naming is one of the most strangely difficult aspects of programming, but if there is one rule it&#x27;s be consistent. I really hate code that uses a variable name to mean one thing in one place and other somewhere else. I just learned what that meant! Why change it?! And yet this is common, and sometimes a comment about it in code review might get a response like &quot;the code works, why does it matter?&quot; This. This is why it matters. It causes confusion, which makes code difficult to verify, which causes bugs.
评论 #22899623 未加载
评论 #22903489 未加载
dvt大约 5 年前
What I find most interesting about this bugfix is simply how terrible of a source Wikipedia is and, at the same time, how valuable professional insight can be (e.g. someone that knows what they&#x27;re talking about). And to add insult to injury, someone quickly edited Wikipedia (for the &quot;street cred&quot; I&#x27;m sure):<p>&gt; The original paper contained errors for computing the table of pattern shifts which was corrected by Wojciech Rytter in 1980 [1]<p>But that&#x27;s <i>still actually wrong</i>. Computing the table was not even discussed in the &quot;original paper&quot; -- see the papers linked by @oefrha in this thread. They&#x27;re two different 1977 papers!<p>[1] <a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Boyer%E2%80%93Moore_string-search_algorithm" rel="nofollow">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;Boyer%E2%80%93Moore_string-sea...</a>
评论 #22899664 未加载
评论 #22899238 未加载
评论 #22896883 未加载
praptak大约 5 年前
Lol, I was taught by Rytter like 20 years ago. He did much more in text algorithms than just correcting B-M. The two books by Crochemore &amp; Rytter are basically <i>the</i> books on text algorithms.
评论 #22899254 未加载
skyde大约 5 年前
This guy is a hero! I have so much respect for his work.<p>By i’m still wondering if this bug could have been detected by automatic fuzzing testing ?
评论 #22898657 未加载
评论 #22896502 未加载
zvrba大约 5 年前
&gt; However, the published algorithm for dd&#x27; was incorrect! This was discovered and fixed by Rytter in 1980, which we weren&#x27;t aware of until we received a bug report.<p>Fits what Knuth said: &quot;Beware of bugs in the above code; I have only proved it correct, not tried it.&quot;
karma_fountain大约 5 年前
Nice fix! It&#x27;s been a while since I&#x27;ve done c++ so does the following code<p><pre><code> const auto elapsed = steady_clock::now() - start; if (elapsed &gt; 10s) { </code></pre> actually use a units quantifier on the 10, i.e. does 10s mean 10 seconds? Is that a library thing? Cool if so.
评论 #22897005 未加载
评论 #22897089 未加载
rurban大约 5 年前
Nice, but using decade old and overly slow string search algorithms still is questionable. There are hundreds of faster algos, the fastest and most stable currently being epsm (2013). <a href="https:&#x2F;&#x2F;www.dmi.unict.it&#x2F;~faro&#x2F;smart&#x2F;algorithms.php?algorithm=EPSM&amp;code=epsm" rel="nofollow">https:&#x2F;&#x2F;www.dmi.unict.it&#x2F;~faro&#x2F;smart&#x2F;algorithms.php?algorith...</a>
mcbain大约 5 年前
A while back I filed an issue on the MSVC std::vector implementation as the behaviour of a weird corner case differed from GCC&#x2F;Clang.<p>The reply I got from STL himself was fantastic, pointing out that it was [paraphrasing here] a wart in the spec, and that neither were actually incorrect.<p>To my lasting regret, I didn&#x27;t keep an <i>ahem</i> offsite backup of some of those corp emails. Hence I can&#x27;t recall the exact details.
评论 #22898733 未加载
skyde大约 5 年前
Any one know why the Turbo Boyer-Moore algorithm is not more popular. Any case when Boyer Moore would be better?
评论 #22897638 未加载
评论 #22901978 未加载
评论 #22898420 未加载
kazinator大约 5 年前
&gt; <i>I find it very curious that it isn&#x27;t constantly mentioned when explaining Boyer-Moore</i><p>Papers need &quot;superseded by ...&quot; type forward references, like IETF RFC&#x27;s. Or rather &quot;corrected by&quot;, in this case.<p>Maybe important algorithms need to be summarized in RFC-like documents.
评论 #22899818 未加载
NCG_Mike大约 5 年前
I remember Dr Dobbs, way back when, having an article on the search algorithm. I think they had C code for it.<p>I did a version in 68K on the Mac (MPW Shell days) for a SGML editor I was working on.
kzrdude大约 5 年前
So there is potentially a simpler fix than the Rytter correction, is that also yet to discover in published papers somewhere, or do we need to ask Knuth himself?
评论 #22899085 未加载
fithisux大约 5 年前
Is Dlang affected?