
Why it finally makes sense to build, not buy, a workplace search product

6 points by jasonwcfan about 2 years ago
We’ve spent the last 2 months diving deep into enterprise use cases for large language models, and the one that many companies are thinking about is building a chatbot for searching through a company’s internal knowledge base.

The typical architecture goes something like this:

1. Connect to a knowledge base (Notion, Confluence, etc.) and extract pages from it.

2. Generate embeddings from the content and index the content based on these embeddings using a vector database like Pinecone, Weaviate, Chroma, etc.

3. When someone has a query, run an approximate nearest neighbor (ANN) search on the vector database and get back ~1000 tokens of content.

4. Insert the content into the context window of a prompt, and use an LLM like GPT to respond to the query based on information from the content.

A few years ago it was ridiculous to even consider building your own internal search product. Now, with libraries like LangChain (https://github.com/hwchase17/langchain), LlamaIndex (https://github.com/jerryjliu/llama_index), and open-source tools like Sidekick (https://github.com/ai-sidekick/sidekick), it’s possible to build a product that works just as well as most workplace search vendors in a matter of days, with the added benefit of being free and fully customizable. (Disclaimer: I am one of the cofounders at Sidekick.)

It’s also not hard to find developers who would love to get their hands dirty with emerging technologies like vector databases, LLMs, and GPT agents. The economic incentive structure for internal search products (and maybe all SaaS products) has been flipped on its head.
Why spend $100k+/year on a vendor that comes with a 1-3 month implementation when you can spend half that time building something in-house that plugs into where your team already works, like Teams or Slack?

Examples:

- Supabase built GPT-powered search right into their docs (https://supabase.com/docs)
- PostHog built and deployed their own Slack bot to answer questions about PostHog (https://posthog.com/blog/aruba-hackathon)

Would you consider building internal semantic search in-house at your company?
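The four-step architecture described above can be sketched end to end in a few dozen lines. The sketch below is illustrative only: it uses a toy hashing-based "embedding" in place of a real embedding model, a brute-force exact cosine search in place of an ANN index like Pinecone or Weaviate, and a formatted prompt string in place of an actual LLM call. All class and function names here are made up for illustration.

```python
import hashlib
import math

def embed(text: str, dim: int = 64) -> list[float]:
    # Toy "embedding": hash each word into a bucket of a fixed-size
    # vector, then L2-normalize. A real pipeline would call an
    # embedding model here instead.
    vec = [0.0] * dim
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-length, so the dot product is the
    # cosine similarity.
    return sum(x * y for x, y in zip(a, b))

class ToyVectorIndex:
    """Stands in for Pinecone/Weaviate/Chroma: stores (text, vector)
    pairs and answers nearest-neighbor queries by brute force."""
    def __init__(self):
        self.items: list[tuple[str, list[float]]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 2) -> list[str]:
        qv = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(qv, it[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]

# Steps 1-2: extract pages from the knowledge base and index them.
index = ToyVectorIndex()
index.add("Expense reports must be filed within 30 days of purchase.")
index.add("The VPN requires two-factor authentication to connect.")
index.add("Vacation requests go through the HR portal.")

# Step 3: nearest-neighbor search for a user query.
question = "when must expense reports be filed"
context = index.search(question, k=1)

# Step 4: insert the retrieved content into an LLM prompt
# (the actual model call is omitted).
prompt = f"Answer using only this context:\n{context[0]}\n\nQ: {question}"
print(prompt)
```

The point of the toy index is just to show where each of the four steps lives; swapping in a real embedding model and vector database changes the quality, not the shape, of the pipeline.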

3 comments

ofermend about 2 years ago
As is typically the case in these situations, it may be easy to build your own (if you have the tech expertise), but then you have to maintain it, upgrade it, and generally keep a team of experts around to make it all work over time. Certainly the quality of internal search, regardless of how you would go about it, will dramatically improve from where we are today.
sharemywin about 2 years ago
I could see this being not dependent on any third-party APIs too:

1. pgvector

2. Hugging Face sentence embeddings like sentence-transformers/all-MiniLM-L6-v2

3. OpenAssistant or some other open model for the chat part, which you could fine-tune on your own documents to get jargon and style down.
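For the pgvector half of that stack, a minimal schema-and-query sketch looks like the following. It assumes a Postgres instance with the pgvector extension available; the table and column names are made up, and the dimension 384 matches the output size of sentence-transformers/all-MiniLM-L6-v2. Connection code is omitted; these statements would be run through any Postgres client.

```python
# Schema: one row per extracted page, with its embedding stored
# in a pgvector column.
SCHEMA = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE docs (
    id        bigserial PRIMARY KEY,
    content   text,
    embedding vector(384)  -- all-MiniLM-L6-v2 outputs 384-dim vectors
);
"""

# Retrieval: '<=>' is pgvector's cosine-distance operator
# ('<->' is L2 distance). The query embedding is passed as a
# parameter from the application.
NEAREST = """
SELECT content
FROM docs
ORDER BY embedding <=> %(query_embedding)s
LIMIT 5;
"""

print(SCHEMA)
print(NEAREST)
```

With an index such as `CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);`, the `ORDER BY ... LIMIT` query becomes an approximate nearest-neighbor search rather than a sequential scan.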
augment003 about 2 years ago
Aren’t we in just a brief window before B2B LLM + embedding workplace search products become widely available? I’ve already seen a few startups in this space. Presumably existing solutions will provide this option too.

It’s not clear why one would start an in-house initiative when it’s clear that the problem will soon be solved by services.