These are my Python bindings for @ggerganov's llama.cpp. They build on his work and provide an easy-to-use interface for Python developers to take advantage of llama.cpp's powerful inference capabilities.<p>The bindings currently use code from a pending PR of mine that restructures the original code into more of a library; hopefully it will be merged into the main repository soon. I have also added a few CLI entry points that are installed along with the Python package:<p>* llamacpp-convert - converts PyTorch models into GGML format. This is an alias for the existing Python script in llama.cpp and requires PyTorch.<p>* llamacpp-quantize - performs INT4 quantization on the GGML model. This is a wrapper for the "quantize" C++ program from the original repository and has no dependencies.<p>* llamacpp-cli - a Python version of the "main.cpp" program from the original repository that uses the bindings.<p>* llamacpp-chat - a wrapper over llamacpp-cli with a prompt that makes it behave like a chatbot. It is not very good as of right now.<p>You should theoretically be able to `pip install llamacpp` on most Linux/macOS platforms and get going by just running `llamacpp-cli`. I do not have Windows builds on the CI yet, so you may have to build the package yourself there.<p>The package has no dependencies if you just want to run inference on the models.
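<p>To make the workflow concrete, here is a rough sketch of how the entry points above chain together. The model paths and flags are assumptions for illustration (the convert/quantize arguments mirror the upstream llama.cpp scripts, and the `-m`/`-p` flags mirror the original main.cpp); check each command's `--help` for the real options.

```shell
# Install the bindings and CLI entry points
pip install llamacpp

# Convert the original PyTorch checkpoint to GGML (requires PyTorch).
# Paths below are hypothetical examples.
llamacpp-convert ./models/7B/ 1

# Quantize the converted model to INT4 (no extra dependencies).
llamacpp-quantize ./models/7B/

# Run inference with the Python version of main.cpp.
llamacpp-cli -m ./models/7B/ggml-model-q4_0.bin -p "Hello, world"
```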