Coming from a Solr/Lucene/Algolia background, here are my opinions on this:<p><i>What's good:</i><p>==========<p>- Focused search for question-and-answer databases (such as customer FAQs)<p>- ML-based semantic search without requiring any explicit configuration<p>- Connectors for S3, AWS-hosted MySQL/PG, and SharePoint.
Searching data already in the AWS ecosystem (S3, Aurora) is now easier,
and likely faster and cheaper in some respects, e.g. by saving on incoming/outgoing bandwidth<p>- Document-level access control on all pricing plans<p>- Managed search (similar to Algolia)<p><i>What's similar to existing search systems (Solr / ES / Algolia):</i><p>==========<p>- Indexing: All data has to be processed into a "field:value" structure prior to indexing<p>- Indexing file formats: plain text, HTML, PDF, MS Word (DOCX), MS PowerPoint (PPT)<p>- Searching: The usual boolean filters and faceting, but only at the field level<p>- Searching: Field and value boosts for relevance, but only at index time<p>- Results: Highlighting support<p><i>What's missing:</i><p>===========<p>- No multilingual support; English only. Given that it's AWS, I'm actually very surprised by this (or
I've missed something in their docs)<p>- Can't configure text analysis for English. I suspect it will return relevant results for formal-style
content, but probably not for informal-style content like emails.<p>- No connectors for common internal systems: Outlook, JIRA, Confluence<p>- No built-in support for CSV, XLS, or JSON (that one's odd!). They all require preprocessing, which means additional infrastructure costs.<p>- Doesn't seem to support range or query facets. I feel the lack of range facets is a big problem, especially
for numerical data.<p>- No query-time relevance tuning<p>- No field-level access control<p>- Scores are not returned in results<p>- Common post-search functionality is missing: rescoring, grouping, clustering<p><i>What's unknown:</i><p>============<p>- I don't see any information about phrase or proximity searches. Of course, they are usually relevance hacks in keyword-based systems, but sometimes users really need exact phrase matches. Does their ML backend handle this somehow?<p>- All search systems fall short when handling proper nouns: names, places, things, scientific names.
This can be alleviated to some extent with part-of-speech-aware indexing. I'm not sure whether Kendra
does this in its ML backend.
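<p>On the CSV/JSON point: since every record has to end up as "field:value" pairs before indexing, the preprocessing is roughly the following. This is a minimal Python sketch, assuming nested JSON keys get joined with dots and each CSV row becomes one document; the function names and the dotted-key convention are my own, not anything from Kendra's API.

```python
import csv
import io
import json
from typing import Any, Dict, Iterator


def flatten(obj: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested JSON into a flat {"field": value} dict, joining keys with dots."""
    out: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            name = f"{prefix}.{key}" if prefix else str(key)
            out.update(flatten(value, name))
    elif isinstance(obj, list):
        # Lists of scalars stay as one multi-valued field;
        # lists of objects are flattened element by element using the index.
        if all(not isinstance(v, (dict, list)) for v in obj):
            out[prefix] = obj
        else:
            for i, value in enumerate(obj):
                out.update(flatten(value, f"{prefix}.{i}"))
    else:
        out[prefix] = obj
    return out


def docs_from_json(text: str) -> Iterator[Dict[str, Any]]:
    """Yield one flattened document per top-level object (single object or array)."""
    data = json.loads(text)
    for record in data if isinstance(data, list) else [data]:
        yield flatten(record)


def docs_from_csv(text: str) -> Iterator[Dict[str, str]]:
    """Yield one document per CSV row; the header row supplies the field names."""
    yield from csv.DictReader(io.StringIO(text))
```

For example, `[{"id": 1, "meta": {"author": "a", "tags": ["x", "y"]}}]` becomes `{"id": 1, "meta.author": "a", "meta.tags": ["x", "y"]}`. It's trivial code, but it's still one more job you have to run and pay for before anything reaches the index.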
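<p>On range facets: what I'd want is counts per numeric bucket (e.g. price under 10, 10-50, 50 and up). Without engine support, the fallback is to bucket values yourself, which only counts the results you actually fetched rather than the full match set. A minimal Python sketch of that bucketing, assuming half-open [low, high) buckets over a plain list of numeric field values; `range_facets` is a hypothetical helper, not a Kendra API.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Optional


def range_facets(values: Iterable[Optional[float]], edges: List[float]) -> Dict[str, int]:
    """Count values into half-open buckets defined by sorted edges.

    edges=[10, 50] gives buckets "<10", "10-50" (i.e. [10, 50)) and ">=50".
    Missing values (None) are skipped.
    """
    labels = [f"<{edges[0]:g}"]
    labels += [f"{lo:g}-{hi:g}" for lo, hi in zip(edges, edges[1:])]
    labels.append(f">={edges[-1]:g}")
    counts = {label: 0 for label in labels}
    for v in values:
        if v is None:
            continue
        # bisect_right gives the number of edges <= v, which is the bucket index.
        counts[labels[bisect_right(edges, v)]] += 1
    return counts
```

Usage: `range_facets([5, 12, 49.9, 50, 120, None], [10, 50])` gives `{"<10": 1, "10-50": 2, ">=50": 2}`. In Solr this is a single `facet.range` request computed over the whole result set, which is why its absence stands out.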