Semantic search and answering is not widely adopted because of its cost, complexity, and lack of accuracy. At Semantic.app we've built the best hosted semantic search engine, and here's how we've done it.

Embedding: the process of converting text into a tensor (read: vector). This can be done on documents, sentences, or words, depending on what you are trying to accomplish. When you upload data, we create a tiered embedding system: word-level embeddings on keywords extracted with T5 (Google), plus a dense vector embedding for each document using GPT-3 (the exact model depends on document size).

Search: keywords matter for search, and how you choose to index can affect search speed, especially when the embeddings are computationally complex. Rather than taking the dot product of an embedded query against potentially millions of vectors (GPT-3 asymmetric search), we've found that a keyword-matching pass followed by a nearest-neighbor search (e.g. FAISS) produces the best results.

Next we cluster the top n results using a Bayesian classifier. Realistically the answer can be in any of the top few documents, and naively picking the first one is just luck: vector embeddings, however good, are at best approximate. We take the top cluster and break it into semantic chunks (a step we've coined "chunkify", using a custom GPT-3 model), then use a sentence-level embedder to find the most relevant chunk, which is what we return for search.

For answering, we use a classifier to detect what length of answer to expect ("when…" usually anticipates a short date, while "why…" may call for a longer response) and pick the right model accordingly. We then use the word embeddings to find the most relevant terms in the answer, together with the sentence embeddings from the previous step, to weight the words and phrases the model (usually GPT or BERT) should produce. Finally, a custom model checks that the answer is actually contained in the source text (large models can leak answers) and filters the answer or re-asks the question when it isn't.
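To make the indexing and retrieval steps concrete, here's a minimal sketch of a tiered index and the two-stage search: a cheap keyword filter first, then an exact nearest-neighbor pass over only the surviving documents. The FAISS and numpy calls are real; the corpus, keyword sets, and embed() function are placeholder stand-ins, not Semantic.app's actual code.

```python
import numpy as np
import faiss

DIM = 768  # embedding dimensionality (assumed)

def embed(texts):
    # Stand-in for a real embedding model (e.g. a GPT-3 embedding endpoint).
    rng = np.random.default_rng(0)
    return rng.standard_normal((len(texts), DIM)).astype("float32")

# Tiered index: keyword sets per document + one dense vector per document.
docs = ["...doc one...", "...doc two...", "...doc three..."]
doc_keywords = [{"billing", "invoice"}, {"login", "sso"}, {"invoice", "refund"}]
doc_vecs = embed(docs)
faiss.normalize_L2(doc_vecs)  # normalize so inner product == cosine similarity

def search(query, query_keywords, k=2):
    # 1) Keyword pass: keep only documents sharing a keyword with the query.
    candidates = [i for i, kws in enumerate(doc_keywords) if kws & query_keywords]
    if not candidates:
        candidates = list(range(len(docs)))  # fall back to the full corpus
    # 2) Nearest-neighbor pass over the (much smaller) candidate set.
    index = faiss.IndexFlatIP(DIM)
    index.add(doc_vecs[candidates])
    q = embed([query])
    faiss.normalize_L2(q)
    scores, idx = index.search(q, min(k, len(candidates)))
    return [(candidates[i], float(s)) for i, s in zip(idx[0], scores[0])]

print(search("how do I get a refund on an invoice?", {"invoice", "refund"}))
```

The point of the keyword pass is that set intersection over inverted keyword lists is far cheaper than a dense dot product, so the expensive vector comparison only runs on a small candidate pool.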
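The chunking step might look something like the sketch below. The post describes a custom GPT-3 model for semantic chunking; here a naive fixed-size sentence window stands in for it, and the sentence-level embedder is a generic sentence-transformers model rather than whatever Semantic.app uses.

```python
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice

def chunkify(document, window=3):
    # Stand-in chunker: consecutive windows of `window` sentences.
    # (The real "chunkify" uses a custom GPT-3 model to find semantic breaks.)
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return [". ".join(sentences[i:i + window])
            for i in range(0, len(sentences), window)]

def best_chunk(document, query):
    # Embed every chunk and the query, return the chunk with the
    # highest cosine similarity to the query.
    chunks = chunkify(document)
    chunk_vecs = embedder.encode(chunks, convert_to_tensor=True)
    query_vec = embedder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_vec, chunk_vecs)[0]  # shape: (n_chunks,)
    return chunks[int(scores.argmax())]
```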
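And a rough sketch of the answering guards: route by expected answer length, then verify the answer is actually supported by the source text before returning it. The wh-word heuristic and token-overlap threshold below are illustrative stand-ins for the classifiers described above, and `generate` is a hypothetical placeholder for the model call.

```python
def expected_length(question):
    # Crude stand-in for the answer-length classifier.
    q = question.lower()
    if q.startswith(("when", "who", "where", "how many", "how much")):
        return "short"   # dates, names, quantities -> smaller/extractive model
    if q.startswith(("why", "how", "explain")):
        return "long"    # explanations -> larger generative model
    return "medium"

def is_grounded(answer, source, threshold=0.8):
    # Crude leak check: what fraction of answer tokens appear in the source?
    answer_tokens = answer.lower().split()
    source_tokens = set(source.lower().split())
    if not answer_tokens:
        return False
    overlap = sum(t in source_tokens for t in answer_tokens) / len(answer_tokens)
    return overlap >= threshold

def answer(question, source, generate):
    # `generate(question, source, length)` is a placeholder for the model call.
    ans = generate(question, source, expected_length(question))
    if not is_grounded(ans, source):
        ans = generate(question, source, expected_length(question))  # re-ask once
    return ans if is_grounded(ans, source) else None  # filter if still leaking
```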