Elasticsearch is really awesome for searching, but what most people don't realize is that it makes a better MongoDB than MongoDB while giving you that searching too.
It was two weeks ago, and our startup was on the precipice of a major launch. We had completely rewritten our online publication site, which drives the bulk of our traffic. The product had to be shipped on-time - we had press releases, eager investors and a launch party dependent on it.<p>A few days before launch, things were not looking good. As admins manipulated articles in preparation for the launch, the servers kept crashing.<p>In a time-constrained major launch like this, a lot of nasty little hacks build up in the codebase. Our search system for admins was a complete mess. It was a custom solution that worked fine when admins managed a handful of database records, but now that they were managing thousands of articles, it was not scaling at all.<p>At the 11th hour, we dropped elasticsearch into our infrastructure. It worked like a charm. The servers stopped crapping out, and we launched on time.<p>Elasticsearch mostly "just works", and we didn't have to worry about complex schema definitions, working with giant complex XML files (hello Solr), or build anything on top to interface between the index and the queries themselves (Lucene). Thanks elasticsearch, you saved us!
ES seems to have the ability to run analytic queries. I have read about people using it as an OLAP solution [1], although I have not yet read anyone describe their experience. In that respect, how do ES's analytics capabilities compare against:<p>1) Dremel clones [2] like Impala & Presto (for near real-time, ad hoc analytic queries over large datasets)<p>2) Lambda Architecture [3] systems (where queries are known up-front, but need to run against a large dataset)<p>Does anyone here have experience with ES in such use cases, beyond the free text searching ES is well-known for?<p>[1]: <a href="https://groups.google.com/forum/#!topic/elasticsearch/iTy9IYL23as" rel="nofollow">https://groups.google.com/forum/#!topic/elasticsearch/iTy9IY...</a><p>[2]: <a href="http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/36632.pdf" rel="nofollow">http://static.googleusercontent.com/media/research.google.co...</a><p>[3]: <a href="http://jameskinley.tumblr.com/post/37398560534/the-lambda-architecture-principles-for-architecting" rel="nofollow">http://jameskinley.tumblr.com/post/37398560534/the-lambda-ar...</a>
Beyond the technology, Elasticsearch has a very mature, active and helpful community with user groups all over the world. We're well connected.<p>Pick your favourite user group here: <a href="http://elasticsearch.meetup.com/" rel="nofollow">http://elasticsearch.meetup.com/</a><p>Full disclosure: I started and run the Berlin UG. We set ourselves apart by always providing a small introduction to ES for those who are completely new and would have a hard time following the main talk.
The thing that worried me the most about Elasticsearch was how fragile it got around the limits of its performance. Run out of memory because of a nasty query? Boom, data corrupted. I hope you weren't using it as your primary persistence layer...<p>Otherwise, we love ES. The other comment about it being a better Mongo than Mongo rings true. With the backup/restore API and some of the circuit breakers, I'm hopeful that my fears will be abated.
This gem is from the 'breaking changes' list:<p><pre><code> “Geo queries used to use miles as the default unit. And we
all know what happened at NASA because of that decision. The
new default unit is meters.”
</code></pre>
I like this release already.
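For anyone bitten by the default changing: a minimal sketch of spelling the unit out in a geo distance clause so the default never matters. The field name `location`, the coordinates and the helper name are made up for illustration, and where the clause sits in the request (filter vs. query) depends on your ES version.

```python
def geo_distance_clause(field, lat, lon, distance, unit="km"):
    """Build a geo_distance clause with an explicit unit suffix.

    `field` is a hypothetical geo_point field name; the unit is appended
    to the distance so the server-side default unit is never relied on.
    """
    return {
        "geo_distance": {
            "distance": f"{distance}{unit}",
            field: {"lat": lat, "lon": lon},
        }
    }


# Hypothetical usage: everything within 12 km of a point in Berlin.
clause = geo_distance_clause("location", 52.52, 13.405, 12, "km")
```

Writing `12km` or `12mi` explicitly means an upgrade that changes the default does not silently change your results.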
> Easy to read, console-based insight into what is happening in your cluster. Particularly useful to the sysadmin when the alarm goes off at 3am and JSON is too difficult to read.<p>It's these little details I love, when a project actually cares about operations and not just "well here's the API"<p>I've been using ElasticSearch only for Logstash, but I've been blown away so far by how easy it is to deal with.
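As a rough illustration of the console-friendly endpoints quoted above, here is a small sketch that builds `_cat` URLs. The base address `http://localhost:9200` and the helper name are assumptions; `health`, `nodes` and `indices` are standard `_cat` endpoints, and `v` asks for column headers.

```python
from urllib.parse import urlencode


def cat_url(base, endpoint, verbose=True):
    """Build a URL for a _cat endpoint such as 'health', 'nodes' or 'indices'.

    `base` is the cluster address (assumed here, not taken from the thread).
    With verbose=True the `v` parameter is added so the output has headers.
    """
    params = {"v": "true"} if verbose else {}
    query = "?" + urlencode(params) if params else ""
    return f"{base.rstrip('/')}/_cat/{endpoint}{query}"


# Hypothetical usage: print the URLs you would curl at 3am.
for name in ("health", "nodes", "indices"):
    print(cat_url("http://localhost:9200", name))
```

The response is plain aligned text rather than JSON, so it can be read directly in a terminal or piped through the usual shell tools.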
ES has performed very well for us as the backbone for the solution we deployed for a large government-sector customer. Had some GC issues initially, and were worried about user concurrency, especially since we were not restricting queries (i.e. users can do full-scale wildcard searches against the entire data set of 1BN+ records). But ES continues to shine.<p>Congrats to the ElasticSearch team, and all the supporters around it. Once I get back into more of a coding role, I'll definitely be contributing back to the ES project.
I also took a few days a few weeks ago to set up Elasticsearch after my MySQL full text search fell apart.<p>What I'm doing is slamming the full text output of OCRed PDFs into a MyISAM table, the entire document in a text field.<p>What I'm afraid I'm not doing right is creating the web interface to search Elasticsearch. I'm using filters with the query string syntax[1] in the search box, pointing directly at that fulltext column. I'm also using the highlight functionality so that I can specify how many highlight blurbs to return with the result. The query string syntax works great with the OCR'd text, because most of it is near-garbage (as most OCR is) so you can search for something like "net sales"~50 to find those two terms within 50 words of each other. I think the results were something like:
net sales 15,000 results
"net sales" 120 results
"net sales"~50 550 results<p>Can anyone point me at a good web based search implementation using elasticsearch that explains how they're doing it?<p>What I have works pretty well, I just want to... check my work, I guess.<p>[1]: <a href="http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#query-string-syntax" rel="nofollow">http://www.elasticsearch.org/guide/en/elasticsearch/referenc...</a>
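A minimal sketch of the kind of request body described above: a `query_string` query pointed at one field, plus highlighting with a fixed number of fragments. The field name `fulltext`, the fragment settings and the helper name are assumptions for illustration, not the commenter's actual setup.

```python
import json


def build_search_body(user_query, field="fulltext", fragments=3, fragment_size=150):
    """Build a search body from whatever the user typed into the search box.

    `user_query` is passed through as query string syntax, so input like
    '"net sales"~50' is treated as a proximity (phrase slop) search.
    `field` is a hypothetical name for the field holding the OCR text.
    """
    return {
        "query": {
            "query_string": {
                "query": user_query,
                "default_field": field,
            }
        },
        "highlight": {
            "fields": {
                field: {
                    "number_of_fragments": fragments,
                    "fragment_size": fragment_size,
                }
            }
        },
    }


# Hypothetical usage: the body you would POST to the index's _search endpoint.
body = build_search_body('"net sales"~50', fragments=5)
print(json.dumps(body, indent=2))
```

One thing to watch when exposing raw query string syntax to users: malformed input (an unbalanced quote, for example) is rejected by the server, so the web layer should be prepared to catch that and show a friendly message.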
I didn't know what this was and looking at this link it was tough to tell.<p>The github lays it out well.<p><a href="https://github.com/elasticsearch/elasticsearch" rel="nofollow">https://github.com/elasticsearch/elasticsearch</a>
Why is it awesome? Why "it just works"? Is it just a mongodb-kind document store over Hadoop+Lucene?<p>What makes it so special to have hundreds of votes and tweets all around within 2 hours?<p>I don't understand. A DB engine engineer.
We wrote a tutorial about how we wrote our search for Close.io using elasticsearch and pyparsing:<p>"Sales data search: Writing a query parser / AST using pyparsing + elasticsearch"<p>Part 1: <a href="http://blog.close.io/sales-data-search-writing-a-query-parser-ast-using-part-1" rel="nofollow">http://blog.close.io/sales-data-search-writing-a-query-parse...</a><p>Part 2: <a href="http://blog.close.io/sales-data-search-writing-a-query-parser-ast-using" rel="nofollow">http://blog.close.io/sales-data-search-writing-a-query-parse...</a>
Elasticsearch mostly "just works". The latest version of Solr has made clustering easier (requires managing Zookeeper), but before that, it was either ES or a nightmare.<p>Lucene is one of those projects which hardly has any real competition. That's surprising given how many real world software projects have a search requirement. While Lucene is excellent, it's not without flaws and competition is always great.
At Contentful in Berlin (Germany) we're looking for an elasticsearch/lucene expert, if you're excited by this tool and want to work full time with it get in touch.<p><a href="https://groups.google.com/d/msg/elasticsearch/Rb7Lei4gaaE/7IDPuPxQV-IJ" rel="nofollow">https://groups.google.com/d/msg/elasticsearch/Rb7Lei4gaaE/7I...</a>
I was vetting ES for a business critical search platform, and had some concerns about write/read performance and how the Lucene indexes are handled on disk. I read that it doesn't really perform as well as Splunk... Instead of ES, I'm considering a solution using HBase to shard Lucene indexes on HDFS.
Really impressed with the pace of innovation in the last few months: cat api, aggregations, snapshots. The unfortunate side effect is that books and stack overflow posts written before 1.0 are outdated.<p>Disclaimer: I’m the founder of a hosted Search As A Service and we use ES in a few critical parts of our infrastructure.
I'd be curious to see how well Elasticsearch holds up against Endeca. I'm currently stuck maintaining some Endeca instances and it's a nightmare. I wish I could go back to ES.<p>At my last place of work, ES was beautiful and required little work to get a very fast, workable search in place.
Great news. In every new project that we create (in general REST JSON APIs made with nodejs, erlang or rails that are consumed by iOS and android clients) we always end up using postgresql, redis and elasticsearch. Great tools.
ES is one of the few techs that I seriously love.<p>The rails support for it is amazing too. The guy creating the rails integration lib is really talented and active.
We recently switched from using MixPanel + Crittercism + Sphinx to using qbox.io (hosted elasticsearch) and Kibana to do all our analytics, crash reporting, and search.<p>I can't recommend qbox.io enough! Point-and-click scaling of managed elasticsearch clusters + Kibana == bliss.