科技回声 (TechEcho)

Show HN: We unified LLMs, vector memory, ranking, pruning models in one process

4 points | by levkk | about 2 years ago
There is a lot of latency involved in shuffling data around in modern, complex ML systems in production. In our experience these costs dominate end-to-end user latency, rather than the actual model or ANN algorithms, which unfortunately limits what is achievable for interactive applications.

We've extended Postgres with open source models from Huggingface, as well as vector search and classical ML algorithms, so that everything can happen in the same process. It's significantly faster and cheaper, which leaves a large latency budget available for expanding model and algorithm complexity. In addition, open source models have already surpassed OpenAI's text-embedding-ada-002 in quality, not just speed. [1]

Here is a series of posts explaining how to accomplish the complexity involved in a typical ML-powered application as a single SQL query that runs in a single process, with memory shared between models and feature indexes, including learned embeddings and reranking models:

- Generating LLM embeddings with open source models in the database [2]
- Tuning vector recall [3]
- Personalize embedding results with application data [4]

This allows a single SQL query to accomplish what would normally be an entire application with several model services and databases. For example, a modern chatbot built across various services and databases looks like:

    -> application sends user input data to embedding service
    <- embedding model generates a vector to send back to application
    -> application sends vector to vector database
    <- vector database returns associated metadata found via ANN
    -> application sends metadata for reranking
    <- reranking model prunes less helpful context
    -> application sends finished prompt w/ context to generative model
    <- model produces final output
    -> application streams response to user

[1]: https://huggingface.co/spaces/mteb/leaderboard

[2]: https://postgresml.org/blog/generating-llm-embeddings-with-open-source-models-in-postgresml

[3]: https://postgresml.org/blog/tuning-vector-recall-while-generating-query-embeddings-in-the-database

[4]: https://postgresml.org/blog/personalize-embedding-vector-search-results-with-huggingface-and-pgvector

Github: https://github.com/postgresml/postgresml
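The embed-then-recall portion of the flow above can be sketched as a single in-process query. This is only an illustrative sketch, not the project's exact API surface: the `documents` table, its columns, and the model name are hypothetical, while `pgml.embed()` and the pgvector `<=>` distance operator come from the linked posts.

```sql
-- Hypothetical schema: documents(id, body, embedding vector(384)).
-- Embed the user input and run ANN recall in one round trip,
-- inside the same Postgres process -- no separate embedding
-- service or vector database in the request path.
WITH query AS (
    SELECT pgml.embed('intfloat/e5-small', 'user input text')::vector AS q
)
SELECT d.id,
       d.body,
       1 - (d.embedding <=> query.q) AS similarity  -- cosine similarity via pgvector
FROM documents d, query
ORDER BY d.embedding <=> query.q                    -- served by an ANN index, e.g. ivfflat
LIMIT 5;
```

Each arrow in the multi-service diagram that crosses a network boundary becomes an in-memory step here, which is where the claimed latency savings come from.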

1 comment

levkk | about 2 years ago
Links:

[1]: https://huggingface.co/spaces/mteb/leaderboard

[2]: https://postgresml.org/blog/generating-llm-embeddings-with-open-source-models-in-postgresml

[3]: https://postgresml.org/blog/tuning-vector-recall-while-generating-query-embeddings-in-the-database

[4]: https://postgresml.org/blog/personalize-embedding-vector-search-results-with-huggingface-and-pgvector

Github: https://github.com/postgresml/postgresml