I've seen a lot of ES competitor posts pop up on HN lately, and I think they're missing the point of Elastic.<p>If you only need very basic word search, ES is probably not worth the complexity in your stack, especially if you're already running a SQL database with decent plaintext search.<p>Where Elasticsearch shines is in complex queries: "Show me every match where this field contains 'extinction' within 10 words of 'impact crater' but NOT containing 'oceanic' and the publish date is > last month and one of the subjects is anthropology"
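For what it's worth, that example maps fairly directly onto Elasticsearch's query DSL: a span_near for the proximity condition, must_not for the exclusion, and filters for the date and subject. A sketch (the field names `body`, `publish_date`, and `subjects` are made up for illustration):

```python
# Hypothetical query body for the example above. Proximity ("within 10
# words") is a span_near with slop=10; the phrase "impact crater" is an
# inner span_near with slop=0 and in_order=True.
query = {
    "query": {
        "bool": {
            "must": [
                {
                    "span_near": {
                        "clauses": [
                            {"span_term": {"body": "extinction"}},
                            {
                                "span_near": {  # exact phrase "impact crater"
                                    "clauses": [
                                        {"span_term": {"body": "impact"}},
                                        {"span_term": {"body": "crater"}},
                                    ],
                                    "slop": 0,
                                    "in_order": True,
                                }
                            },
                        ],
                        "slop": 10,        # at most 10 positions apart
                        "in_order": False,
                    }
                }
            ],
            "must_not": [{"match": {"body": "oceanic"}}],
            "filter": [
                {"range": {"publish_date": {"gte": "now-1M/d"}}},
                {"term": {"subjects": "anthropology"}},
            ],
        }
    }
}
```

You'd POST that to `/index/_search`. Good luck expressing all of that in a single SQL full-text query.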
I love plain old Redis, but I'm not thrilled with the extension modules from Redis Labs.<p>I experimented with RediSearch using 20 GB of Reddit posts and I was very underwhelmed.<p>First, 20 GB of raw data explodes into 75 GB once it's in RediSearch with zero fault tolerance. While I'd expect some expansion from inverted indexes and per-document word frequencies, a 3.75x multiple seems high.<p>And since this is Redis, it's all in RAM, including indexes and raw documents, all uncompressed. That's not cheap. Add replicas for fault tolerance and the RAM needed for a decent-sized cluster could be 10x the size of the raw data.<p>Then there's the tooling and documentation, which are very limited. Redis Labs provides a Python client, but it doesn't support basic features like returning the score with each document, even though RediSearch provides this capability if you query it directly.<p>Finally, I found stability issues with Redis when the RediSearch module is installed. Using the Python client provided by Redis Labs, certain queries would predictably crash every node in the cluster.<p>Redis itself is rock solid, but Redis with the RediSearch module feels fragile.<p>Overall, an interesting concept but not ready for production use by any means.
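To be concrete about the "query it directly" part: FT.SEARCH accepts a WITHSCORES flag, and the raw reply then contains a score after each document id. A sketch of parsing that reply shape yourself (the index name and the redis-py `execute_command` usage are assumptions, not anything the Redis Labs client documents):

```python
def parse_withscores_reply(raw):
    """Parse an FT.SEARCH ... WITHSCORES reply.

    The reply is a flat list: [total, doc_id, score, field_list, ...],
    i.e. a triple per hit after the leading total count.
    """
    total, rest = raw[0], raw[1:]
    hits = [
        {"id": rest[i], "score": float(rest[i + 1]), "fields": rest[i + 2]}
        for i in range(0, len(rest), 3)
    ]
    return total, hits

# With a live connection it would look something like:
#   r = redis.Redis()
#   raw = r.execute_command("FT.SEARCH", "posts", "hello world", "WITHSCORES")
#   total, hits = parse_withscores_reply(raw)
```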
In order for me to trust a benchmark, it needs to be a lot more transparent than this:<p>- Show the code that runs the benchmark<p>- Give everyone the opportunity to recreate the benchmark<p>- Give every technology the opportunity to 'respond' and point out where the benchmark/tech configuration is wrong (i.e. "PRs welcome")<p>Otherwise, this just looks like cherry-picked data points, and even those I won't trust. Nor would I show this to any of my clients (whom I help select search engine technology). I dearly hope nobody makes real decisions based on this blog post until the code and configuration are opened up.
>Component: Search Engine<p>>RediSearch: Dedicated engine based on modern and optimized data-structures<p>>ElasticSearch: 20 years old Lucene engine<p>The implications made here make me actually angry.
The article is a mess of misspellings and misquotes. Also, why were two distributed search engines tested on a single node? That's a meaningless test.
"Elasticsearch crashed after 921 indices" ... Shards: "20 for the multi-tenant benchmark".. 921 * 20 = 18420. Shards have state; they have overhead. Why wouldn't they pick one shard for that benchmark? It's either intentional misconfiguration, or poor understanding of sharding.
From the article: "Here, we simulated a multi-tenant e-commerce application where each tenant represented a product category and maintained its own index. For this benchmark, we built 50K indices (or products), which each stored up to 500 documents (or items), for a total of 25 million indices. RediSearch built the indices in just 201 seconds, while running an average of 125K indices/sec. However, Elasticsearch crashed after 921 indices and clearly was not designed to cope with this load."<p>No sane Elasticsearch engineer would create a new index for each product. They would just have a single index with a product_id field on each sub-item. If you needed product-level information, you would create a second index for that. You'd use two indexes, not O(#products) indexes.<p>They created a botched benchmark by using ES incorrectly. It's like driving a car backwards and then complaining it has a poor top speed. ES could easily handle this workload if configured correctly.
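For illustration, the single-index design described above might look like this when generating bulk actions (index and field names are made up for the sketch):

```python
def item_actions(products):
    """Yield Elasticsearch bulk actions for a dict of
    {product_id: [item, ...]}.

    Every item goes into ONE shared "items" index, tagged with its
    product_id, instead of getting its own index per product. Per-product
    queries are then just a term filter:
        {"query": {"term": {"product_id": "p1"}}}
    """
    for product_id, items in products.items():
        for item in items:
            yield {
                "_index": "items",  # one index for all tenants
                "_source": {"product_id": product_id, **item},
            }
```

25 million small documents in one (reasonably sharded) index is an utterly ordinary workload for ES.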
"The more advanced multi-tenant use case – where RediSearch was able to complete 25 million indices in just 201 seconds or ~125K indices/sec, while Elasticsearch crashed after it indexed 921 documents, showing that it was not designed to cope with this level of load." previously stated that "Elasticsearch crashed after 921 indices and just couldn’t cope with this load."<p>It's hard to mistake documents for indices. Both the original and the edited statements sound strongly suspect and make me question the benchmarking methodology. What caused Elasticsearch to crash after indexing 921 documents? And why is comparing indexing speeds on a 1-node setup even a legitimate benchmark?
I fail to see how the creation of 50K indices on elasticsearch is a meaningful benchmark, that's just not how it's supposed to be used.
Also, as others have said, testing a distributed system on a single node makes little sense. And the benchmark is not reproducible, since we don't know how the data was queried and indexed.
The intent is nice, but the weird Clippy-style avatar in the bottom right is kinda annoying. I'm just trying to read the article, not engage in a conversation.
> Dataset source: wikidump Date: Feb 7, 2019 docs: 5.6M size: 5.3 GB<p>"wikidump" links to <a href="https://dumps.wikimedia.org/enwiki/latest/" rel="nofollow">https://dumps.wikimedia.org/enwiki/latest/</a> , which has thousands of files, none of which are 5GB and make sense. That's a <i>very</i> poor corpus link!<p>It says "Feb 7, 2019", so it probably means <a href="https://dumps.wikimedia.org/enwiki/20190120/" rel="nofollow">https://dumps.wikimedia.org/enwiki/20190120/</a> or <a href="https://dumps.wikimedia.org/enwiki/20190201/" rel="nofollow">https://dumps.wikimedia.org/enwiki/20190201/</a> ... maybe. They don't have any obvious 5.3GB files.
If anyone is looking for real benchmarks of ES, check out this page and leave the BS benchmarks aside :-)
<a href="https://elasticsearch-benchmarks.elastic.co/" rel="nofollow">https://elasticsearch-benchmarks.elastic.co/</a>
I'm curious if this scales down well. The test was done on "One AWS c4.8xlarge with 36vCPU and 60GiB Memory". But could I run this on a tiny vps to index, search, catalog my million-odd documents?
A problem with RediSearch, at least for me:<p>Note: clustering is only available in RediSearch’s Enterprise version<p><a href="https://redislabs.com/redis-enterprise/technology/redis-search/" rel="nofollow">https://redislabs.com/redis-enterprise/technology/redis-sear...</a><p>At least with ES I can build and play with the clustering of the nodes. This is probably why they only ran a 1-node ES: they would have had to push their Enterprise software to make a RediSearch cluster. Maybe I am wrong.
Redis Labs has done great work in developing Redis, but these extensions to retrofit Redis into a multi-model database have issues.<p>Raw latency is usually not the primary concern, and having everything in RAM can be a major cost problem, further compounded by the lack of the compression available in other persistent stores. The RESP protocol is also overloaded and hard to work with when dealing with JSON and search queries.
The article does not tell us the text-analysis settings used by the two engines. And on the query side, the scoring settings of RediSearch vs. Elasticsearch are not discussed either.<p>With that, these are just two points in space, which gives us little information from which to deduce "58% faster at X" or whatever.
In the fine print: the number of shards for the multi-tenant benchmark was increased from 5 to 20 for RediSearch, but kept the same (5) for Elasticsearch.<p>This is why the only reliable benchmark is the one <i>you</i> run on <i>your</i> data.
Is RediSearch's aggregation comparable with ES's? Speed is a lower priority when features are missing.<p>PS: Crashes are never good though...
WOW. Hahahaha.<p>This is a massive misconfiguration of an Elasticsearch cluster. 50K indices? 500 documents per index?<p>500 records per index at 5 shards/index is 100 records per shard.<p>Yeah, let's shard our data so much that we introduce tremendous amounts of disk I/O overhead!<p>The author should learn how to properly configure an ES cluster before posting ridiculous benchmarks like this.<p>What an utter pile of garbage this benchmark is.
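The overhead is easy to quantify with a back-of-the-envelope check, assuming 5 primary shards per index (which the article's fine print suggests was the Elasticsearch setting):

```python
# Shard math for the benchmark's configuration as described in the article.
indices = 50_000          # one index per product, per the article
docs_per_index = 500      # "up to 500 documents (or items)" each
shards_per_index = 5      # assumed ES setting per the fine print

total_shards = indices * shards_per_index
docs_per_shard = docs_per_index // shards_per_index

print(total_shards)   # 250000 Lucene shards, each with its own state
print(docs_per_shard) # 100 tiny documents per shard
```

A quarter of a million shards to hold 25 million small documents. Each shard is a full Lucene index with its own file handles, memory, and cluster-state entry; of course the node fell over.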
How silly to emphasize things like "built as a C extension" and "uses modern data structures" as if these were useful criteria for choosing a search engine.<p>Search is about minimizing the effort needed to find what you're looking for. Index construction speed, unless we're talking orders of magnitude, isn't really meaningful. I don't know if this is just a really clumsy attempt at marketing or what, but I can't imagine it will convince anyone to drop ES for this thing.