If anyone is looking for a more cost-effective solution, Hetzner has 16 vCPU/32 GB RAM ARM VMs for €24/mo that will run a 34B Q4 GGUF at around 4 tok/sec. It's not very fast, but it is very cheap.
Something that would be extremely helpful is a good benchmark of various hardware for LLM inference. It's really hard to tell how well a given GPU will perform, or whether it will be supported at all.
So roughly how much does this instance cost per day? Like $30?
I'm kind of confused why it wasn't mentioned, but hey, maybe people aren't as cheap as me. Cool project though.
One use case for running LLMs on a CPU is executing long background tasks that don't require real-time responses. llama.cpp seems like a suitable platform for this, and it would be interesting to explore how to leverage the various acceleration techniques available on AWS.
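As a rough sketch of what such a background job could look like, here's a batch summarization loop using the llama-cpp-python bindings; the model path, prompt, and inputs are placeholders, and n_threads should be tuned to your instance:

    # Minimal sketch: batch-summarize documents offline with llama-cpp-python.
    # Model path and inputs are hypothetical; adjust n_threads to your vCPUs.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./models/example-34b-q4_k_m.gguf",  # hypothetical Q4 GGUF
        n_ctx=4096,
        n_threads=16,  # e.g. one per vCPU on a 16 vCPU instance
    )

    for doc in ["long text one...", "long text two..."]:  # placeholder inputs
        out = llm(f"Summarize the following text:\n\n{doc}\n\nSummary:",
                  max_tokens=256)
        print(out["choices"][0]["text"])

Since nothing here is latency-sensitive, 4 tok/sec is perfectly workable: you just queue the jobs up and collect the results whenever they finish.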