DeepSeek-v3 Technical Report [pdf]

26 points by pongogogo 5 months ago

2 comments

boroboro4 5 months ago
Crazy amount of innovations in one technical report:

- successful FP8 quantized training for a SOTA model
- multi-token prediction, mostly to improve training results, but also to enable speculative decoding
- very high sparsity per request (37B activated params per 671B total params)
- using reasoning data (from DeepSeek R1) to fine-tune and improve results on math & coding
- manual balancing of compute / communication in their infrastructure, up to SM level
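
To put the sparsity figure in perspective, here is a minimal sketch that computes the share of parameters active per token from the two numbers quoted above. The variable and function names are illustrative only and do not come from the report.

```python
# Parameter counts as quoted in the comment (in billions).
ACTIVATED_PARAMS_B = 37
TOTAL_PARAMS_B = 671


def activated_fraction(activated: float, total: float) -> float:
    """Return the fraction of total parameters that are activated."""
    return activated / total


fraction = activated_fraction(ACTIVATED_PARAMS_B, TOTAL_PARAMS_B)
print(f"{fraction:.1%} of parameters are active per token")
```

So roughly one parameter in eighteen takes part in any single forward pass.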
pongogogo 5 months ago
The big news here is the training cost: $5.576M in total, equivalent to 2,788K GPU hours on H800s at $2 per GPU hour. This is for a model that is (according to DeepSeek's own benchmarks) SOTA for open source.
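
The arithmetic behind that figure is easy to check. The sketch below assumes only the two numbers quoted in the comment (2,788K GPU hours and a $2 per GPU-hour rate); the function name is hypothetical, and the rate is a pricing assumption rather than a measured expense.

```python
# Figures as quoted in the comment above.
GPU_HOURS = 2_788_000        # 2,788K H800 GPU hours
USD_PER_GPU_HOUR = 2.0       # assumed price per GPU hour


def training_cost_usd(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Return the total cost as GPU hours times the hourly rate."""
    return gpu_hours * usd_per_gpu_hour


cost = training_cost_usd(GPU_HOURS, USD_PER_GPU_HOUR)
print(f"${cost / 1e6:.3f}M")  # prints "$5.576M"
```

Because the cost is linear in the hourly rate, a different rate scales the total proportionally: at $3 per GPU hour the same run would come to $8.364M.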