Fast(er) regular expression engines in Ruby

60 points by davidsojevic 13 days ago

4 comments

kayodelycaon 12 days ago
> Another nuance was found in ruby, which cannot scan the haystack with invalid UTF-8 byte sequences.

This is extremely basic ruby: UTF-8 encoded strings must be valid UTF-8. This is not unique to ruby. If I recall correctly, python 3 does the same thing.

    2.7.1 :001 > haystack = "\xfc\xa1\xa1\xa1\xa1\xa1abc"
    2.7.1 :003 > haystack.force_encoding "ASCII-8BIT"
     => "\xFC\xA1\xA1\xA1\xA1\xA1abc"
    2.7.1 :004 > haystack.scan(/.+/)
     => ["\xFC\xA1\xA1\xA1\xA1\xA1abc"]

This person is a senior engineer on their Team page. All they had to do was google "ArgumentError: invalid byte sequence in UTF-8". Or ask a coworker... the company has Ruby on Rails applications. *headdesk*
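For anyone who wants to reproduce this outside IRB, here is a minimal sketch of the behavior described in the comment. It assumes a reasonably recent Ruby (`String#b` and `String#scrub` are core methods); `scan_bytes` is a hypothetical helper name used only for illustration and does not come from the article or the thread.

```ruby
# Six bytes that are not valid UTF-8, followed by plain ASCII.
# The explicit force_encoding makes the UTF-8 tag independent of source encoding.
haystack = "\xfc\xa1\xa1\xa1\xa1\xa1abc".dup.force_encoding(Encoding::UTF_8)

# Hypothetical helper: scan a string that may contain invalid bytes
# by matching against a binary (ASCII-8BIT) copy, leaving the original untouched.
def scan_bytes(str, pattern)
  str.b.scan(pattern)
end

# 1. Matching the UTF-8-tagged string directly raises ArgumentError.
direct_error =
  begin
    haystack.scan(/.+/)
    nil
  rescue ArgumentError => e
    e
  end

# 2. The binary copy is matched byte by byte, as in the IRB transcript above.
binary_matches = scan_bytes(haystack, /.+/)

# 3. String#scrub replaces the invalid bytes, so the result is valid UTF-8
#    and can be scanned normally (at the cost of losing the original bytes).
scrubbed = haystack.scrub("?")
scrubbed_matches = scrubbed.scan(/.+/)

puts haystack.valid_encoding?
puts direct_error.class
puts binary_matches.inspect
puts scrubbed_matches.inspect
```

Which option fits depends on whether the original bytes matter: the binary copy preserves them but gives up character-aware matching, while `scrub` keeps UTF-8 semantics and discards the invalid bytes.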
DmitryOlshansky 10 days ago
I wonder how std.regex of dlang would fare in such a test. Sadly, due to a tiny bit of D’s GC use, it’s hard to provide as a library for other languages. If there is interest I might take it through the tests.
yxhuvud 12 days ago
Eww, pretending to support utf8 matchers while not supporting them at all was not pretty to see.
gitroom 12 days ago
Honestly that part bugs me, fake support is worse than no support imo