TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

SEQUOIA: Exact Llama2-70B on an RTX4090 with half-second per-token latency

131 points by zinccat, about 1 year ago

7 comments

spxneo, about 1 year ago
This is quite worrying for OpenAI: token prices have been plummeting thanks to Meta, and it's going to have to keep cutting its prices while capex remains flat. Whatever Sam says in interviews, just think the opposite and the whole picture comes together.

It's almost a mathematical certainty that people who invested in OpenAI will need to reincarnate in multiple universes to ever see that money again, but no matter: many are probably NVIDIA stockholders, which evens out the damage.
(8 replies not loaded)
modeless, about 1 year ago
I don't need exact results. FP8 quantization is almost lossless, and even 6-bit quantization is usually acceptable. Can this be combined with quantization?
(2 replies not loaded)
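To illustrate the "almost lossless" trade-off the comment describes, here is a minimal sketch of symmetric per-tensor int8 quantization (an illustration only, not SEQUOIA's code; real FP8/6-bit schemes use per-channel or group-wise scales and calibration):

```python
import numpy as np

# Symmetric per-tensor int8 round-trip: the reconstruction error is bounded
# by half a quantization step, which is why low-bit weights stay close to
# the originals.

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0                      # map max magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32) * 0.02  # fake weight tensor
q, scale = quantize_int8(w)
err = np.abs(dequantize(q, scale) - w).max()
print(f"max abs error: {err:.2e}")                       # bounded by scale / 2
```

The catch for an exactness-preserving system like SEQUOIA is that quantizing the target model changes its outputs, so "exact" would then mean exact with respect to the quantized model.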
freeqaz, about 1 year ago
So this is 8x faster for serving these models than before? Or is this about it being more deterministic? I can't quite tell from reading it.
(1 reply not loaded)
aussieguy1234, about 1 year ago
I'm looking at buying 2× RTX 3060s to run Llama 70B for my new PC I just purchased.

Will this work, or do I need a Tesla P40 or two?
(3 replies not loaded)
thelittleone, about 1 year ago
Other than portability and privacy, are there any benefits to running a local model on a 4090, versus running the same model on demand on a cloud service with the same or a more powerful card?
(4 replies not loaded)
zwaps, about 1 year ago
Is it me, or is this paper basically missing all technical information?

I get that there's proprietary technology, but if so, can we please not put this on arXiv and pretend it's a scientific contribution?
(1 reply not loaded)
halyconWays, about 1 year ago
Someone get this into koboldcpp