SEQUOIA: Exact Llama2-70B on an RTX4090 with half-second per-token latency

131 points by zinccat, about 1 year ago

7 comments

spxneo, about 1 year ago
This is quite worrying for OpenAI: token prices have been plummeting thanks to Meta, and it's going to have to keep cutting its prices while capex remains flat. Whatever Sam says in interviews, just think the opposite and the whole picture comes together.

It's almost a mathematical certainty that people who invested in OpenAI will need to reincarnate in multiple universes to ever see that money again, but no bother; many are probably NVIDIA stockholders, which evens out the damage.
modeless, about 1 year ago
I don't need exact results. FP8 quantization is almost lossless and even 6-bit quantization is usually acceptable. Can this be combined with quantization?
freeqaz, about 1 year ago
So this is 8x faster for serving these models than before? Or is this about it being more deterministic? I can't quite tell from reading it.
aussieguy1234, about 1 year ago
I'm looking at buying 2x RTX 3060s to run Llama 70B on the new PC I just purchased.

Will this work, or do I need a Tesla P40 or two?
thelittleone, about 1 year ago
Other than portability and privacy, are there any benefits to running a local model with a 4090, versus running the same model on-demand on a cloud service with the same or more powerful card?
zwaps, about 1 year ago
Is it me or is this paper basically missing all technical information?

I get that there's proprietary technology, but if so, can we please not put this on arXiv and pretend it’s a scientific contribution?
halyconWays, about 1 year ago
Someone get this into koboldcpp