
Show HN: We unified LLMs, vector memory, and ranking/pruning models in one process

4 points | by levkk | about 2 years ago
There is a lot of latency involved in shuffling data around modern, complex ML systems in production. In our experience these data-movement costs dominate end-to-end user latency, rather than the model inference or ANN algorithms themselves, which unfortunately limits what is achievable for interactive applications.

We've extended Postgres with open source models from Hugging Face, as well as vector search and classical ML algorithms, so that everything can happen in the same process. It's significantly faster and cheaper, which leaves a large latency budget available for expanding model and algorithm complexity. In addition, open source models have already surpassed OpenAI's text-embedding-ada-002 in quality, not just speed. [1]

Here is a series of posts explaining how to collapse the complexity of a typical ML-powered application into a single SQL query that runs in a single process, with memory shared between models and feature indexes, including learned embeddings and reranking models:

- Generating LLM embeddings with open source models in the database [2]

- Tuning vector recall [3]

- Personalizing embedding results with application data [4]

This allows a single SQL query to accomplish what would normally be an entire application with several model services and databases. For example, a modern chatbot built across separate services and databases looks like this:

    -> application sends user input data to embedding service
    <- embedding model generates a vector to send back to application
    -> application sends vector to vector database
    <- vector database returns associated metadata found via ANN
    -> application sends metadata for reranking
    <- reranking model prunes less helpful context
    -> application sends finished prompt w/ context to generative model
    <- model produces final output
    -> application streams response to user

[1]: https://huggingface.co/spaces/mteb/leaderboard

[2]: https://postgresml.org/blog/generating-llm-embeddings-with-open-source-models-in-postgresml

[3]: https://postgresml.org/blog/tuning-vector-recall-while-generating-query-embeddings-in-the-database

[4]: https://postgresml.org/blog/personalize-embedding-vector-search-results-with-huggingface-and-pgvector

GitHub: https://github.com/postgresml/postgresml
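To make the comparison concrete, here is a minimal sketch of the in-database version of the first two steps above (embedding generation plus ANN recall) as one query. The documents table, its pgvector embedding column, and the model name are illustrative assumptions rather than a fixed schema; pgml.embed() is the extension function described in [2]:

    -- Hypothetical schema: documents(id, body, embedding vector(384)).
    -- Embed the raw user input and run ANN recall against stored
    -- document embeddings, all inside the same Postgres process.
    WITH query AS (
        SELECT pgml.embed(
            'intfloat/e5-small',            -- open source embedding model
            'How do I tune vector recall?'  -- raw user input
        )::vector AS embedding
    )
    SELECT documents.id, documents.body
    FROM documents, query
    ORDER BY documents.embedding <=> query.embedding  -- pgvector cosine distance
    LIMIT 5;

No vectors cross a network boundary until the final rows are returned, which is where most of the latency savings described above come from.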

1 comment

levkk | about 2 years ago
Links:

[1]: https://huggingface.co/spaces/mteb/leaderboard

[2]: https://postgresml.org/blog/generating-llm-embeddings-with-open-source-models-in-postgresml

[3]: https://postgresml.org/blog/tuning-vector-recall-while-generating-query-embeddings-in-the-database

[4]: https://postgresml.org/blog/personalize-embedding-vector-search-results-with-huggingface-and-pgvector

GitHub: https://github.com/postgresml/postgresml