TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Postgres text search: balancing query time and relevancy

116 点作者 SerCe超过 3 年前

3 条评论

necovek超过 3 年前
A nice exploration of Postgres trigram extension.<p>A few things I&#x27;d try if I was building a dedicated code search tool is to introduce custom per-language tokenizers for Postgres FTS that actually tokenize according to language rules (thus making &quot;def&quot; or &quot;if&quot; a stopword for Python, but also splitting &quot;doSomethingCrazy&quot; into (&quot;do&quot;, &quot;something&quot;, &quot;crazy&quot;).<p>Then, I&#x27;d do two searches: one using such a FTS query first for more relevant results, and trigram search after. Combining results might be tricky, but not overly so imho (though the devil is in the details).<p>As for &quot;limit interval to&quot;, you can approximate that by doing a LIMIT on unsorted results in a subselect (get the number with heuristics and adjust it as the data set grows), and then sort those by relevance: the result is effectively the same, except that you are using dataset size as the boundary instead of time.
评论 #28875598 未加载
thom超过 3 年前
A search with an order by&#x2F;limit across 1 million rows with a GIST trigram index shouldn&#x27;t be taking multiple seconds if the text being searched isn&#x27;t huge. In the case of GitHub repository names, you might as well use `similarity` (&lt;-&gt;) instead of `word_similarity` (&lt;&lt;-&gt;) which would be a speedup, because it&#x27;s just a pure index scan.
DeathArrow超过 3 年前
How does text search in Postgres compare with Elasticsearch?
评论 #28876580 未加载
评论 #28878065 未加载