We've been working on a Python SDK[1] for PostgresML that makes it easier for application developers to get the performance and scalability benefits of integrated memory for LLMs by combining embedding generation, vector recall, and LLM tasks from HuggingFace in a single database query.<p>This work builds on our previous efforts, which give a 10x performance improvement by generating the LLM embedding[2] from input text and tuning vector recall[3] in a single process, which avoids excessive network transit.<p>We'd love your feedback on our roadmap[4] for this extension, especially if you have other use cases for an ML application database. So far, we've implemented our best practices for scalable vector storage to provide a reference implementation for interacting with an ML application database based on Postgres.<p>[1]: <a href="https://github.com/postgresml/postgresml/tree/master/pgml-sdks/python/pgml">https://github.com/postgresml/postgresml/tree/master/pgml-sd...</a>
[2]: <a href="https://postgresml.org/blog/generating-llm-embeddings-with-open-source-models-in-postgresml" rel="nofollow">https://postgresml.org/blog/generating-llm-embeddings-with-o...</a>
[3]: <a href="https://postgresml.org/blog/tuning-vector-recall-while-generating-query-embeddings-in-the-database" rel="nofollow">https://postgresml.org/blog/tuning-vector-recall-while-gener...</a>
[4]: <a href="https://github.com/postgresml/postgresml/tree/master/pgml-sdks/python/pgml#roadmap">https://github.com/postgresml/postgresml/tree/master/pgml-sd...</a>