TechEcho
A tech news platform built with Next.js, providing global tech news and discussions.


Show HN: Python Bindings for llama.cpp with some CLIs

2 points | by tantony | about 2 years ago
These are my Python bindings for @ggerganov's llama.cpp. They build on that work and provide an easy-to-use interface for Python developers to take advantage of llama.cpp's powerful inference capabilities.

The bindings currently use code from a pending PR of mine that refactors the original code into more of a library. Hopefully it will get merged into the main repository soon. I have also added a few CLI entry points that are installed along with the Python package:

* llamacpp-convert - converts PyTorch models into GGML format. This is an alias for the existing Python script in llama.cpp and requires PyTorch.

* llamacpp-quantize - performs INT4 quantization on the GGML model. This is a wrapper for the "quantize" C++ program from the original repository and has no dependencies.

* llamacpp-cli - a Python version of the "main.cpp" program from the original repository that uses the bindings.

* llamacpp-chat - a wrapper over llamacpp-cli that includes a prompt to make it behave like a chatbot. It is not very good as of right now.

You should theoretically be able to run "pip install llamacpp" on most Linux/macOS platforms and get going by just running `llamacpp-cli`. I do not have Windows builds on the CI yet, so you may have to build the package yourself there.

The package has no dependencies if you just want to run inference on the models.
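The convert → quantize → run workflow described above might look roughly like the following shell session. Only the command names (llamacpp-convert, llamacpp-quantize, llamacpp-cli) come from the post; the model paths, filenames, and argument forms are assumptions modeled on the upstream llama.cpp tools these commands wrap, so check each command's own help output before relying on them:

```shell
# Install the bindings and the CLI entry points (Linux/macOS).
pip install llamacpp

# Convert a PyTorch LLaMA checkpoint to GGML format (requires PyTorch).
# The model directory is a placeholder; arguments mirror llama.cpp's
# original conversion script, which llamacpp-convert aliases.
llamacpp-convert ./models/7B/ 1

# INT4-quantize the converted model (wraps the C++ "quantize" program,
# no extra dependencies). Input/output filenames are placeholders.
llamacpp-quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2

# Run inference with the Python port of main.cpp. Flags assumed to
# match the upstream main.cpp options (-m model, -p prompt).
llamacpp-cli -m ./models/7B/ggml-model-q4_0.bin -p "Hello, world"
```

For interactive use, llamacpp-chat wraps the same invocation with a chatbot-style prompt baked in.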

no comments
