When announcing things with cute names, can people please put a short description in the link?<p>For everyone else: Lucy's apparently a full-text search library written in C targeting dynamic languages, with Perl bindings to start with.
The announcement says:
Lucy is a "loose C" port of Apache Lucene, a search engine library for Java -- it is similar in purpose to Lucene, but designed to take advantage of C's unique capabilities.<p>I'm wondering what these unique capabilities are. Speed? Smaller memory footprint? And I wonder what the reason behind doing this is. I'm all for C projects, but I'm very curious as to why, when Lucene was already very well done.
Here's the Perl binding library on CPAN, in its simplest form: <a href="http://search.cpan.org/perldoc?Lucy::Simple" rel="nofollow">http://search.cpan.org/perldoc?Lucy::Simple</a><p>The synopsis is quite illuminating. I just installed it with cpanm and in 10 minutes had a program that indexes and searches a collection of files with highlighting. Looks promising!
How much better is it compared to CLucene? CLucene got stuck at 1.9 and now shows very little activity, while Java Lucene is rolling towards version 4. Still, CLucene was less of a memory hog and faster than Java Lucene back when it was actively developed. They claimed it was 2.5 times faster.<p>If Lucy can deliver the latest advances in Java Lucene as a usable C library, that would be very good news for me. Lucene is still the best choice for large data indexing and searching solutions.
I tried to get into Lucene (using Solr) recently but was put off by its complexity for what was, in my case, a simple use case (quickly searching through a large document set of html, txt, and doc files using proximity search).<p>After futzing for hours with XSLT and writing scripts to submit content via the REST API, I found out about FTS4 in SQLite, and was impressed by its relative simplicity. I had something working in under an hour in Python.
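<p>For anyone curious, here is a minimal sketch of what the FTS4 route can look like, not the exact program described above. It assumes the bundled SQLite was compiled with FTS3/FTS4 support (most builds are) and that the text has already been extracted from the html/txt/doc files. The table name `docs`, the helpers `build_index` and `search`, and the sample documents are all made up for illustration.

```python
import sqlite3


def build_index(docs):
    """Create an in-memory FTS4 table and load (path, body) pairs into it."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per file, with its path and extracted text.
    conn.execute("CREATE VIRTUAL TABLE docs USING fts4(path, body)")
    conn.executemany("INSERT INTO docs (path, body) VALUES (?, ?)", docs)
    conn.commit()
    return conn


def search(conn, query):
    """Return the paths of rows matching an FTS4 query, sorted by path."""
    rows = conn.execute(
        "SELECT path FROM docs WHERE docs MATCH ? ORDER BY path", (query,)
    )
    return [path for (path,) in rows]


# Made-up sample data standing in for text extracted from real files.
sample_docs = [
    ("a.txt", "full text search library written in C"),
    ("b.txt", "search is a feature that many people eventually want in a library"),
    ("c.txt", "nothing relevant here"),
]

conn = build_index(sample_docs)

# Plain terms are ANDed together: both a.txt and b.txt contain both words.
all_hits = search(conn, "search library")

# Proximity search: at most 2 other terms between "search" and "library".
near_hits = search(conn, "search NEAR/2 library")

print(all_hits)
print(near_hits)
```

The `NEAR/2` query keeps only the document where the two words sit close together, which is the proximity behaviour mentioned above. For a real document set you would point `sqlite3.connect` at a file and feed `build_index` from whatever text extraction you use.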
Why would Apache incubate a competing product like this? And what exactly are the unique capabilities that this project can take advantage of? Lucene is already extremely easy to interface with through Solr, since it's just a REST interface.