I've seen a lot of ES competitor posts pop up on HN lately, and I think they're missing the point of Elastic.<p>If you only need very basic word search, ES is probably not worth the complexity in your stack, especially if you're already running a SQL database with decent plaintext search.<p>Where Elasticsearch shines is in complex queries: "Show me every match where this field contains 'extinction' within 10 words of 'impact crater' but NOT containing 'oceanic' and the publish date is > last month and one of the subjects is anthropology"
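For what it's worth, that example maps fairly directly onto Elasticsearch's query DSL: a span_near for the proximity condition, must_not for the exclusion, and filters for the date and subject. A sketch (the field names `body`, `publish_date`, and `subjects` are made up for illustration):

```python
# Hypothetical query body for the example above. Proximity ("within 10
# words") is a span_near with slop=10; the phrase "impact crater" is an
# inner span_near with slop=0 and in_order=True.
query = {
    "query": {
        "bool": {
            "must": [
                {
                    "span_near": {
                        "clauses": [
                            {"span_term": {"body": "extinction"}},
                            {
                                "span_near": {  # exact phrase "impact crater"
                                    "clauses": [
                                        {"span_term": {"body": "impact"}},
                                        {"span_term": {"body": "crater"}},
                                    ],
                                    "slop": 0,
                                    "in_order": True,
                                }
                            },
                        ],
                        "slop": 10,        # at most 10 positions apart
                        "in_order": False,
                    }
                }
            ],
            "must_not": [{"match": {"body": "oceanic"}}],
            "filter": [
                {"range": {"publish_date": {"gte": "now-1M/d"}}},
                {"term": {"subjects": "anthropology"}},
            ],
        }
    }
}
```

You'd POST that to `/index/_search`. Good luck expressing all of that in a single SQL full-text query.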
I love plain old Redis, but I'm not thrilled with the extension modules from Redis Labs.<p>I experimented with RediSearch using 20 GB of Reddit posts and I was very underwhelmed.<p>First, 20 GB of raw data explodes into 75 GB once it's in RediSearch with zero fault tolerance. While I'd expect some expansion from inverted indexes and per-document word frequencies, a 3.75x multiple seems high.<p>And since this is Redis, it's all in RAM, including indexes and raw documents, all uncompressed. That's not cheap. Add replicas for fault tolerance and the RAM needed for a decent-sized cluster could be 10x the size of the raw data.<p>Then there's the tooling and documentation, which are very limited. Redis Labs provides a Python client, but it doesn't support basic features like returning the score with each document, even though RediSearch provides this capability if you query it directly.<p>Finally, I found stability issues with Redis when the RediSearch module is installed. Using the Python client provided by Redis Labs, certain queries would predictably crash every node in the cluster.<p>Redis itself is rock solid, but Redis with the RediSearch module feels fragile.<p>Overall, an interesting concept but not ready for production use by any means.
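To be concrete about the "query it directly" part: FT.SEARCH accepts a WITHSCORES flag, and the raw reply then contains a score after each document id. A sketch of parsing that reply shape yourself (the index name and the redis-py `execute_command` usage are assumptions, not anything the Redis Labs client documents):

```python
def parse_withscores_reply(raw):
    """Parse an FT.SEARCH ... WITHSCORES reply.

    The reply is a flat list: [total, doc_id, score, field_list, ...],
    i.e. a triple per hit after the leading total count.
    """
    total, rest = raw[0], raw[1:]
    hits = [
        {"id": rest[i], "score": float(rest[i + 1]), "fields": rest[i + 2]}
        for i in range(0, len(rest), 3)
    ]
    return total, hits

# With a live connection it would look something like:
#   r = redis.Redis()
#   raw = r.execute_command("FT.SEARCH", "posts", "hello world", "WITHSCORES")
#   total, hits = parse_withscores_reply(raw)
```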
In order for me to trust a benchmark, it needs to be a lot more transparent than this:<p>- Show the code that runs the benchmark<p>- Give everyone the opportunity to recreate the benchmark<p>- Give every technology the opportunity to 'respond' and point out where the benchmark/tech configuration is wrong (i.e. "PRs welcome")<p>Otherwise, this just looks like cherry-picked data points, and even those I won't trust. Nor would I show this to any of my clients (whom I help select search engine technology). I dearly hope nobody makes real decisions based on this blog post until the code and configuration are opened up.
>Component: Search Engine<p>>RediSearch: Dedicated engine based on modern and optimized data-structures<p>>ElasticSearch: 20 years old Lucene engine<p>The implications made here make me actually angry.
The article is a mess of misspellings and misquotes. Also, why were two distributed search engines tested on a single node? That's a meaningless test.
"Elasticsearch crashed after 921 indices" ... Shards: "20 for the multi-tenant benchmark".. 921 * 20 = 18420. Shards have state; they have overhead. Why wouldn't they pick one shard for that benchmark? It's either intentional misconfiguration, or poor understanding of sharding.
From the article: "Here, we simulated a multi-tenant e-commerce application where each tenant represented a product category and maintained its own index. For this benchmark, we built 50K indices (or products), which each stored up to 500 documents (or items), for a total of 25 million indices. RediSearch built the indices in just 201 seconds, while running an average of 125K indices/sec. However, Elasticsearch crashed after 921 indices and clearly was not designed to cope with this load."<p>No sane Elasticsearch engineer would create a new index for each product. They would just have a single index with a product_id field on each sub-item. If you needed product-level information, you would create a second index for that. You'd use two indexes, not O(#products) indexes.<p>They created a botched benchmark by using ES incorrectly. It's like driving a car backwards and then complaining it has a poor top speed. ES could easily handle this workload if configured correctly.
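For illustration, the single-index design described above might look like this when generating bulk actions (index and field names are made up for the sketch):

```python
def item_actions(products):
    """Yield Elasticsearch bulk actions for a dict of
    {product_id: [item, ...]}.

    Every item goes into ONE shared "items" index, tagged with its
    product_id, instead of getting its own index per product. Per-product
    queries are then just a term filter:
        {"query": {"term": {"product_id": "p1"}}}
    """
    for product_id, items in products.items():
        for item in items:
            yield {
                "_index": "items",  # one index for all tenants
                "_source": {"product_id": product_id, **item},
            }
```

25 million small documents in one (reasonably sharded) index is an utterly ordinary workload for ES.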
"The more advanced multi-tenant use case – where RediSearch was able to complete 25 million indices in just 201 seconds or ~125K indices/sec, while Elasticsearch crashed after it indexed 921 documents, showing that it was not designed to cope with this level of load." previously stated that "Elasticsearch crashed after 921 indices and just couldn’t cope with this load."<p>It's hard to mistake documents for indices. Both the original and the edited statements sound strongly suspect and make me question the benchmarking methodology. What caused Elasticsearch to crash after indexing 921 documents? And why is comparing indexing speeds on a 1-node setup even a legitimate benchmark?
I fail to see how the creation of 50K indices on elasticsearch is a meaningful benchmark, that's just not how it's supposed to be used.
Also, as others have said, testing a distributed system on a single node makes little sense. And the benchmark is not reproducible, since we don't know how the data was queried and indexed.
The intent is nice, but the weird Clippy-style avatar in the bottom right is kinda annoying. I'm just trying to read the article, not engage in a conversation.
> Dataset source: wikidump Date: Feb 7, 2019 docs: 5.6M size: 5.3 GB<p>"wikidump" links to <a href="https://dumps.wikimedia.org/enwiki/latest/" rel="nofollow">https://dumps.wikimedia.org/enwiki/latest/</a> , which has thousands of files, none of which are 5GB and make sense. That's a <i>very</i> poor corpus link!<p>It says "Feb 7, 2019", so it probably means <a href="https://dumps.wikimedia.org/enwiki/20190120/" rel="nofollow">https://dumps.wikimedia.org/enwiki/20190120/</a> or <a href="https://dumps.wikimedia.org/enwiki/20190201/" rel="nofollow">https://dumps.wikimedia.org/enwiki/20190201/</a> ... maybe. They don't have any obvious 5.3GB files.
If anyone is looking for real benchmarks of ES, check out this page and leave the BS benchmarks aside :-)
<a href="https://elasticsearch-benchmarks.elastic.co/" rel="nofollow">https://elasticsearch-benchmarks.elastic.co/</a>
I'm curious if this scales down well. The test was done on "One AWS c4.8xlarge with 36vCPU and 60GiB Memory". But could I run this on a tiny vps to index, search, catalog my million-odd documents?
A problem with RediSearch, at least for me:<p>Note: clustering is only available in RediSearch’s Enterprise version<p><a href="https://redislabs.com/redis-enterprise/technology/redis-search/" rel="nofollow">https://redislabs.com/redis-enterprise/technology/redis-sear...</a><p>At least with ES I can build and play with the clustering of the nodes. This is probably why they only ran a 1-node ES: they would have had to push their Enterprise software to make a RediSearch cluster. Maybe I am wrong.
Redis Labs has done great work in developing Redis, but these extensions to retrofit Redis into a multi-model database have issues.<p>Raw latency is usually not the primary concern, and having everything in RAM can be a major cost problem, further compounded by the lack of the compression available in other persistent stores. The RESP protocol is also overloaded and hard to work with when dealing with JSON and search queries.
The article does not tell us the text-analysis settings used by the two engines. And on the query side, the scoring settings of RediSearch vs. Elasticsearch are not discussed either.<p>With that, these are just two points in space, which gives us little information from which to deduce "58% faster at X" or whatever.
In the fine print: the number of shards for the multi-tenant benchmark was increased from 5 to 20 for RediSearch, but kept the same (5) for Elasticsearch.<p>This is why the only reliable benchmark is the one <i>you</i> run on <i>your</i> data.
Is RediSearch's aggregation comparable with ES's? Speed is a lower priority when features are missing.<p>PS: Crashes are never good though...
WOW. Hahahaha.<p>This is a massive misconfiguration of an Elasticsearch cluster. 50K indices? 500 documents per index?<p>500 records per index at 5 shards/index is 100 records per shard.<p>Yeah, let's shard our data so much that we introduce tremendous amounts of disk I/O overhead!<p>The author should learn how to properly configure an ES cluster before posting ridiculous benchmarks like this.<p>What an utter pile of garbage this benchmark is.
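The overhead is easy to quantify with a back-of-the-envelope check, assuming 5 primary shards per index (which the article's fine print suggests was the Elasticsearch setting):

```python
# Shard math for the benchmark's configuration as described in the article.
indices = 50_000          # one index per product, per the article
docs_per_index = 500      # "up to 500 documents (or items)" each
shards_per_index = 5      # assumed ES setting per the fine print

total_shards = indices * shards_per_index
docs_per_shard = docs_per_index // shards_per_index

print(total_shards)   # 250000 Lucene shards, each with its own state
print(docs_per_shard) # 100 tiny documents per shard
```

A quarter of a million shards to hold 25 million small documents. Each shard is a full Lucene index with its own file handles, memory, and cluster-state entry; of course the node fell over.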
How silly to emphasize things like "built as a C extension" and "uses modern data structures" as if these were useful criteria for choosing a search engine.<p>Search is about minimizing the effort needed to find what you're looking for. Index construction speed, unless we're talking orders of magnitude, isn't really meaningful. I don't know if this is just a really clumsy attempt at marketing or what, but I can't imagine it will convince anyone to drop ES for this thing.