Full text search in your Data: How you can do better than Elasticsearch

19 points by arthurlenoir over 11 years ago

1 comment

Mpdreamz over 11 years ago
"mixing relevance and popularity is nothing short of impossible in Elasticsearch. Either you sort by relevance or by using a popularity attribute, you cannot mix both."

This is a false statement:

http://www.elasticsearch.org/guide/reference/query-dsl/custom-filters-score-query/

This, combined with scripts, gives you unlimited possibilities to alter your score based on whatever you please. The syntax is perhaps a bit wonky in the current version, but awesomeness is on the way:

https://github.com/elasticsearch/elasticsearch/issues/3423

"Unfortunately Elasticsearch fuzzy matching does not work out of the box, is complex to customize, and does not provide the ability to highlight prefixes."

There are more ways to catch typos than fuzzy matching and Levenshtein distance; ngrams, for instance. Elasticsearch allows you to do both, but yes, it's true that you have to know your way around analyzers/tokenizers and mappings a little to get the best results in Elasticsearch. If you use the ngrams approach, highlighting also works a lot better.

"This sorting configuration might seem pretty explicit, but it is in fact quite dangerous as it conflicts with the boost on fields. To better understand the problem, let’s look at the query ‘the rains’:"

It's true that sorting trumps boosting, but since this rests on the assumption that you cannot alter _score, the whole section seems contrived.

In the instant search section they use Elasticsearch's query string query to search for `world w*`. This is indeed a very slow way to do it, since it generates a wildcard query in the background; they probably should have written the query as a phrase prefix query.
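As a rough sketch of the two suggestions above (scoring on relevance and popularity together, and using a phrase prefix query for instant search), the request bodies could look something like the following. This is only an illustration under stated assumptions: the field names `title` and `popularity` and both helper functions are hypothetical, and the first body uses the `function_score` query with `field_value_factor` from later Elasticsearch releases rather than the custom filters score syntax linked above.

```python
import json


def relevance_with_popularity(text, field="title", popularity_field="popularity"):
    """Build a search body whose _score is the text relevance multiplied by
    a log-dampened popularity value (field names are assumptions)."""
    return {
        "query": {
            "function_score": {
                # Ordinary full-text relevance on the chosen field.
                "query": {"match": {field: text}},
                # Turn the numeric popularity field into a score factor.
                "field_value_factor": {
                    "field": popularity_field,
                    "modifier": "log1p",
                },
                # Combine the relevance score and the factor by multiplication.
                "boost_mode": "multiply",
            }
        }
    }


def instant_search(prefix_text, field="title"):
    """Build a phrase prefix query: the last term is treated as a prefix,
    without going through a query string wildcard."""
    return {"query": {"match_phrase_prefix": {field: prefix_text}}}


if __name__ == "__main__":
    print(json.dumps(relevance_with_popularity("the rains"), indent=2))
    print(json.dumps(instant_search("world w"), indent=2))
```

Either dictionary would be sent as the JSON body of a `_search` request. Multiplying by a dampened popularity keeps a hugely popular but barely relevant document from dominating, though the right modifier depends on the data.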