
Running Llama.cpp on AWS Instances

96 points | by schappim | over 1 year ago

5 comments

mikeravkine, over 1 year ago
If anyone is looking for a more cost-effective solution, Hetzner has 16 vCPU / 32 GB RAM ARM VMs for about €24/mo that will run a 34B Q4 GGUF model at around 4 tok/sec. It's not very fast, but it is very cheap.
joelthelion, over 1 year ago
Something that would be extremely helpful is a good benchmark of various hardware for LLM inference. It's really hard to tell how well a given GPU will perform, or whether it will be supported at all.
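One low-effort way to collect numbers like these yourself is to time a single completion on the target machine. The snippet below is only a minimal sketch: it assumes the llama-cpp-python bindings rather than the llama.cpp CLI, and the model path and thread count are placeholders to adjust for your instance.

```python
# Rough tokens/sec measurement on a CPU-only machine via llama-cpp-python.
# Model path and n_threads are placeholders; set n_threads to your vCPU count.
import time

from llama_cpp import Llama

llm = Llama(
    model_path="./models/34b.Q4_K_M.gguf",  # any local Q4 GGUF file
    n_ctx=2048,
    n_threads=16,
)

start = time.perf_counter()
out = llm("Explain what llama.cpp does in one paragraph.", max_tokens=128)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.2f} tok/s")
```

Running the same script on each candidate instance type gives a crude but directly comparable tokens-per-second figure for a given model and quantization.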
ionwake, over 1 year ago
So roughly how much does this instance cost per day? Like $30? I'm kind of confused why it wasn't mentioned, but hey, maybe people aren't as cheap as me. Cool project, though.
alekseiprokopev, over 1 year ago
One task well suited to running LLMs on a CPU is executing long background jobs that do not require a real-time response, and llama.cpp seems like a suitable platform for this. It would be interesting to explore how to leverage the various acceleration options available on AWS.
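As a sketch of that kind of offline, non-real-time usage (again assuming the llama-cpp-python bindings and a locally downloaded GGUF file; the directory names and model path are purely illustrative), a background job can simply loop over queued documents and write results to disk:

```python
# Minimal background batch job: summarize queued text files with a local GGUF
# model via llama-cpp-python. Nothing here needs a real-time response, so a
# CPU-only instance is acceptable.
from pathlib import Path

from llama_cpp import Llama

llm = Llama(model_path="./models/7b.Q4_K_M.gguf", n_ctx=4096, n_threads=8)

Path("done").mkdir(exist_ok=True)
for doc in sorted(Path("queue").glob("*.txt")):  # hypothetical input directory
    text = doc.read_text()[:8000]  # crude cap to stay within the context window
    out = llm(
        f"Summarize the following text in three sentences:\n\n{text}\n\nSummary:",
        max_tokens=200,
    )
    Path("done", doc.name).write_text(out["choices"][0]["text"].strip())
```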
ilaksh, over 1 year ago
I am more interested in running llama.cpp on CPU-only VPSs/EC2 instances, although it is probably too slow.