TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Structure of FTS5 Index in SQLite

95 点作者 ellieh超过 1 年前

5 条评论

AlexErrant超过 1 年前
I had a... suboptimal experience with SQLite&#x27;s FTS, so here it is for others: match queries don&#x27;t work with `OR` conditions. You have to do this work around:<p><pre><code> WHERE fa.rowid IN (SELECT rowid FROM address_fts WHERE address_fts MATCH &#x27;google&#x27;) OR ta.rowid IN (SELECT rowid FROM address_fts WHERE address_fts MATCH &#x27;google&#x27;) </code></pre> Source <a href="https:&#x2F;&#x2F;sqlite.org&#x2F;forum&#x2F;forumpost?udc=1&amp;name=1a2f2ffdd80cf795" rel="nofollow noreferrer">https:&#x2F;&#x2F;sqlite.org&#x2F;forum&#x2F;forumpost?udc=1&amp;name=1a2f2ffdd80cf7...</a><p>I&#x27;m building a search feature for my app, that parses a user&#x27;s search query (a la Github search) and returns relevant results. Generating the above workaround SQL on the fly is just... ulgh.
评论 #38664806 未加载
评论 #38675362 未加载
rogerbinns超过 1 年前
It isn&#x27;t made clear in the repo but what this is doing [1] is a small amount of code to glue SQLite&#x27;s tokenizer API to the Rust unicode-segmentation crate [2] which in turn is using the TR-29 algorithm with generated tables [3].<p>[1] <a href="https:&#x2F;&#x2F;github.com&#x2F;signalapp&#x2F;Signal-FTS5-Extension&#x2F;blob&#x2F;main&#x2F;Cargo.toml">https:&#x2F;&#x2F;github.com&#x2F;signalapp&#x2F;Signal-FTS5-Extension&#x2F;blob&#x2F;main...</a><p>[2] <a href="https:&#x2F;&#x2F;unicode-rs.github.io&#x2F;unicode-segmentation&#x2F;unicode_segmentation&#x2F;index.html" rel="nofollow noreferrer">https:&#x2F;&#x2F;unicode-rs.github.io&#x2F;unicode-segmentation&#x2F;unicode_se...</a><p>[3] <a href="https:&#x2F;&#x2F;github.com&#x2F;unicode-rs&#x2F;unicode-segmentation&#x2F;blob&#x2F;master&#x2F;src&#x2F;tables.rs">https:&#x2F;&#x2F;github.com&#x2F;unicode-rs&#x2F;unicode-segmentation&#x2F;blob&#x2F;mast...</a><p>International Components for Unicode is the reference library for this and other Unicode algorithms.<p><a href="https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;International_Components_for_Unicode" rel="nofollow noreferrer">https:&#x2F;&#x2F;en.wikipedia.org&#x2F;wiki&#x2F;International_Components_for_U...</a>
forinti超过 1 年前
I&#x27;ve been using FTS with a stemming module from CPAN and it works quite well.<p>Of course the stemming module does not catch all cases, but it works well enough to be useful.
评论 #38665902 未加载
评论 #38665788 未加载
bomewish超过 1 年前
This is really great and helpful.<p>Confused why roll own tokeniser for fts5 when there’s already a good one??<p><a href="https:&#x2F;&#x2F;github.com&#x2F;wangfenjin&#x2F;simple">https:&#x2F;&#x2F;github.com&#x2F;wangfenjin&#x2F;simple</a>
resoluteteeth超过 1 年前
I hadn&#x27;t seen that tokenizer and it&#x27;s really great that they open sourced it because not supporting CJK languages out of the box is one of the major issues with sqlite fts.<p>I think this also shows why a standard third party distribution of sqlite is needed so that extensions like this can be included with the sqlite binaries provided with bindings for other languages
评论 #38665754 未加载