We received a lot of valuable feedback on our similarity-search engine, which we launched a few days ago.<p>Based on that feedback, we've made some major changes to improve recall. Specifically, we've begun to include data from our web crawler.<p>We've also started pruning many of the similarity-search results to improve precision.<p>Finally, we cleaned up the UI to make it clearer what the website does. I think we still have some work to do in this area, however.<p>Unfortunately, many of the changes we've made to the algorithm have _dramatically_ hurt performance. Most searches now take over a minute to complete!<p>We're hard at work on fixing that, though. Specifically, we're experimenting with multi-level counting Bloom filters, count-min Flajolet-Martin sketches, and quantile FM digests.<p>We should have some major performance improvements up over the next few days.<p>We're also looking at launching a pre-alpha of a standalone software package that implements the ESer algorithm, so that people can run similarity searches on their own private data sets.<p>Please comment with your feedback.<p>Thanks again!