科技回声
A tech news platform built with Next.js, providing global tech news and discussion content.

© 2025 科技回声. All rights reserved.

DeepSeek-V3 Technical Report

132 points by signa11, about 2 months ago

7 comments

Centigonal · about 2 months ago
The GPU-hours stat here allows us to back out some interesting figures around electricity usage and carbon emissions if we make a few assumptions.

2,788,000 GPU-hours * 350W TDP of H800 = 975,800,000 GPU Watt-hours

975,800,000 GPU Wh * (1.2 to account for non-GPU hardware) * (1.3 PUE [1]) = 1,522,248,000 total Wh, or 1,522,248 kWh to train DeepSeek-V3

(1,522,248 kWh) * (0.582 kg CO2eq/kWh in China [2]) = 885,948 kg CO2 equivalents to train DeepSeek-V3

A typical US passenger vehicle emits about 4.6 metric tons of CO2 per year. [3]

885,948 kg CO2 per DeepSeek / 4,600 kg CO2 per car = 192.6 cars per DeepSeek

So, the final training run for DeepSeek-V3 emitted as much greenhouse gases as would be emitted from running about 193 more cars on the road for a year.

I also did some more math and found that this training run used about as much electricity as 141 US households would use over the course of a year. [4]

[1] https://enviliance.com/regions/east-asia/cn/report_10060

[2] https://ourworldindata.org/grapher/carbon-intensity-electricity

[3] https://www.epa.gov/greenvehicles/greenhouse-gas-emissions-typical-passenger-vehicle

[4] divided total kWh by the value here: https://www.eia.gov/tools/faqs/faq.php?id=97&t=3
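The arithmetic above can be checked in a few lines; this sketch simply re-runs the commenter's own assumed constants, so the figures are only as good as those assumptions:

```python
# Back-of-envelope reproduction of the estimate above; every constant
# is one of the comment's assumptions, not a verified measurement.
gpu_hours = 2_788_000      # final training run, from the report
tdp_watts = 350            # H800 TDP
overhead = 1.2             # factor for non-GPU hardware
pue = 1.3                  # datacenter power usage effectiveness [1]
grid_kg_per_kwh = 0.582    # CO2eq intensity of China's grid [2]
car_kg_per_year = 4_600    # typical US passenger vehicle [3]

total_kwh = gpu_hours * tdp_watts * overhead * pue / 1000
co2_kg = total_kwh * grid_kg_per_kwh
car_years = co2_kg / car_kg_per_year

print(f"{total_kwh:,.0f} kWh, {co2_kg:,.0f} kg CO2eq, {car_years:.1f} car-years")
```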
skummetmaelk · about 2 months ago
The fact that you can unironically put the "only" modifier on a training time of 2.8 million GPU hours is nuts.
danielhanchen · about 2 months ago
Re DeepSeek-V3 0324 - I made some 2.7-bit dynamic quants (230GB in size) for those interested in running them locally via llama.cpp! Tutorial on getting and running them: https://docs.unsloth.ai/basics/tutorial-how-to-run-deepseek-v3-0324-locally
kristjansson · about 2 months ago
Hasn't been updated for the -0324 release unfortunately, and diff-pdf shows only a few small additions (and consequent layout shift) for the updated arXiv version on Feb 18.
gdiamos · about 2 months ago
Nice to see a return to open source in models and training systems.
tmabraham · about 2 months ago
https://x.com/iScienceLuvr/status/1905144432791609480
benob · about 2 months ago
I like that they give advice to hardware manufacturers:

- offload communication to a dedicated co-proc
- implement decent precision for accumulating fp8 operations
- finer-grained quantization
...
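The accumulation-precision point can be illustrated in miniature with float16 (NumPy has no fp8 dtype, so this is an analogy, not the report's actual setup): summing many small addends in a low-precision accumulator stalls once the accumulator's ULP exceeds the addend, which is exactly why accumulating fp8 products in wider precision matters.

```python
import numpy as np

# Sum 20,000 copies of 1e-4. The true total is ~2.0, but a float16
# accumulator stalls near 0.25, where its ULP (2**-12) outgrows the addend.
vals = np.full(20_000, 1e-4, dtype=np.float16)

acc_lo = np.float16(0.0)
for v in vals:
    acc_lo = np.float16(acc_lo + v)   # low-precision accumulator: stalls

acc_hi = vals.astype(np.float32).sum()  # wider accumulator: ~2.0

print(float(acc_lo), float(acc_hi))
```

The same mechanism, scaled down three formats: fp8 has so few mantissa bits that long dot-product accumulations drift badly unless partial sums are kept in fp16/fp32, hence the request to hardware vendors.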