
An answer to: why is Nvidia making GPUs bigger when games don't need them?

21 points by stephanst, over 2 years ago
I've had this discussion a few times recently, including with people who are pretty up-to-date engineers working in IT. It's true that the latest offerings from Nvidia, both in the gaming and pro range, are pretty mind-boggling in terms of power consumption, size, cost, total VRAM, etc... but there is actually a very real need for this sort of thing driven by the ML community, maybe more so than most people expect.

Here is an example from a project I'm working on today. This is the console output of a model I'm training at the moment:

-----------------------------

I1015 20:04:51.426224 139830830814976 supervisor.py:1050] Recording summary at step 107041.
INFO:tensorflow:global step 107050: loss = 1.0418 (0.453 sec/step)
I1015 20:04:55.421283 139841985250112 learning.py:506] global step 107050: loss = 1.0418 (0.453 sec/step)
INFO:tensorflow:global step 107060: loss = 0.9265 (0.461 sec/step)
I1015 20:04:59.865883 139841985250112 learning.py:506] global step 107060: loss = 0.9265 (0.461 sec/step)
INFO:tensorflow:global step 107070: loss = 0.7003 (0.446 sec/step)
I1015 20:05:04.328712 139841985250112 learning.py:506] global step 107070: loss = 0.7003 (0.446 sec/step)
INFO:tensorflow:global step 107080: loss = 0.9612 (0.434 sec/step)
I1015 20:05:08.808678 139841985250112 learning.py:506] global step 107080: loss = 0.9612 (0.434 sec/step)
INFO:tensorflow:global step 107090: loss = 1.7290 (0.444 sec/step)
I1015 20:05:13.288547 139841985250112 learning.py:506] global step 107090: loss = 1.7290 (0.444 sec/step)

-----------------------------

Check out that last line: with a batch size of 42 images (the maximum I can fit on my GPU with 24 GB of memory) I randomly get the occasional batch where total loss is more than double the moving average over the last 100 batches!

There's nothing fundamentally wrong with this, but it will throw a wrench into the model's convergence for a number of iterations, and it's probably not going to help in reaching the ideal final state of the model within the number of iterations I have planned.

This is partly because I have a fundamentally unbalanced dataset and need to apply some pretty large label-wise weight rebalancing in the loss function to account for it... but this is the best representation of reality in the case I am working on!

--> The ideal solution RIGHT NOW would be to use a larger batch size, in order to minimise the chance of getting these large outliers in the training set.

To get the best results in the short term I want to train this system (using this small backbone) with batches of 420 images instead of 42, which would require 240 GB of memory... so 3x Nvidia A100 GPUs, for example!

--> Ultimately the next step is to make a version with a backbone that has 5x more parameters, on the same dataset but scaled to 2x linear resolution... requiring probably around 500 GB of VRAM to run batches large enough to achieve good convergence on all classes! And bear in mind this is a relatively small model, which can be deployed on a stamp-sized Intel Movidius VPU and run in real time directly inside a tiny sensor. For non-realtime inference there are people out there working on models with 1000x more parameters!

So if anyone is wondering why Nvidia keeps making more and more powerful GPUs, and wondering who could possibly need that much power, this is your answer: the design / development of these GPUs is now being pulled forward 90% by people who need these kinds of solutions for ML ops, and the gaming market is 10% of the real-world "need" for this type of power, where 10 years ago that ratio would have been reversed.
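[Editor's note: a minimal sketch of the kind of label-wise loss re-weighting described above, written against TensorFlow 2.x. The class weights, labels, and logits are made-up placeholders, not the author's model; the point is that a batch which happens to contain many heavily weighted rare-class examples produces exactly the kind of loss spike visible in the log.]

import tensorflow as tf

# Hypothetical per-class weights for an unbalanced 3-class problem:
# rare classes get much larger weights, common classes smaller ones.
class_weights = tf.constant([0.2, 1.0, 7.5])

def weighted_cross_entropy(labels, logits):
    # Per-example cross-entropy loss.
    per_example = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)
    # Scale each example by the weight of its ground-truth class.
    weights = tf.gather(class_weights, labels)
    return tf.reduce_mean(per_example * weights)

# Toy batch dominated by class 2 (weight 7.5): its mean loss will be far
# larger than that of a batch dominated by class 0 (weight 0.2).
labels = tf.constant([2, 2, 2, 0])
logits = tf.random.normal([4, 3])
print(float(weighted_cross_entropy(labels, logits)))

Larger batches average over more examples per step, so the per-batch class mix (and with it the weighted loss) fluctuates less.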

16 comments

PragmaticPulp, over 2 years ago
Your premise is not true. Gaming at high resolutions with high frame rates and full settings requires powerful GPUs. VR especially demands high frame rates to remain immersive while also rendering the scene twice (each eye has a different perspective).

We actually have a long way to go before GPUs can really support fully immersive VR. The new 4090 is just barely enough to handle some racing sims like ACC in VR at high resolution and settings.

Even 3000-series cards couldn't deliver playable frame rates at 4K with all of the graphical eye candy turned on (ray tracing, etc.) in games like Cyberpunk 2077. As with all games, you can simply play at lower settings and resolutions, but the full experience at 4K really does require something like a 4080 or 4090.

And of course, next-generation games are being built to take full advantage of this new hardware. It doesn't make any sense to suggest that GPU vendors should just stop making progress and expect games to stay at current levels of advancement.
mikewarot, over 2 years ago
A few months ago, I thought... OK, this is amazing: an i7 with 8 cores, 32 GB of RAM, and an SSD... I'm set for the next decade!

Now I'm just trying to run Stable Diffusion to generate things and it's 30 seconds per iteration at only 512x512 pixels. It looks like I'll have a GPU soon enough if I want to do any development with neural networks.

What an amazing ride... heck, I still remember thinking back in 1980 that the 10-megabyte Corvus hard drive my friend was installing for a business would NEVER be filled... do you have any idea how much typing that is? ;-)
machinekob, over 2 years ago
No one is saying games don't need bigger GPUs; is this post some sort of bait?

Also, Nvidia will probably go further into custom accelerators (tensor cores), as the whole industry is moving towards bigger VRAM and more tf32/bf16 units for AI training over the standard fp32 gaming workflow, and that isn't the focus of RTX-class gaming cards.

And the 4090/3090/2080 Ti is the old Titan-class card for prosumers, mostly focused on 3D rendering, video editing, etc., and also gaming; for AI pros they are milking sweet money out of the Quadro lineup.

If you are really limited by VRAM you should just go fully TPU, not wait for some magic consumer VRAM increase, as it probably won't happen for a few years (most games at 4K can't utilise 16 GB).
willis936, over 2 years ago
I have a 4K120 display I use on my computer. My 3070 can't play games from the past 7 years at this pixel clock. I'd be happier with 4K240.

Games absolutely need more vector horsepower.
phren0logy, over 2 years ago
This has also always been a chicken-and-egg scenario: even if you accept the premise that current graphics cards are adequate for current games (which many here do not), the next round of games will target more powerful graphics cards when they are available.

If you build it, they will come.
janmo, over 2 years ago
The new RTX 4xxx generation still has the same amount of VRAM, so I don't know how much it would help you.
patresh, over 2 years ago
If you need larger batch sizes but don't have the VRAM for them, have a look at gradient accumulation (https://kozodoi.me/python/deep%20learning/pytorch/tutorial/2021/02/19/gradient-accumulation.html).

You can accumulate the gradients of multiple batches before doing the weight update step. This lets you run effectively much larger batch sizes than your GPU would otherwise allow.
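[Editor's note: a minimal self-contained PyTorch sketch of gradient accumulation in the spirit of the linked tutorial; the model, sizes, and random data are toy placeholders, not the poster's code. Gradients from ten micro-batches of 42 are summed before a single optimizer step, giving an effective batch of 420 on a GPU that only fits 42.]

import torch
import torch.nn as nn

# Toy stand-ins so the sketch runs end to end; sizes are arbitrary.
model = nn.Linear(32, 5)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

micro_batch, accum_steps = 42, 10   # 42 x 10 = effective batch of 420
batches = [(torch.randn(micro_batch, 32),
            torch.randint(0, 5, (micro_batch,))) for _ in range(accum_steps)]

optimizer.zero_grad()
for i, (x, y) in enumerate(batches):
    # Divide by accum_steps so the accumulated gradient matches the
    # average over the full 420-example batch.
    loss = criterion(model(x), y) / accum_steps
    loss.backward()                  # gradients add up in param.grad
    if (i + 1) % accum_steps == 0:
        optimizer.step()             # one weight update per effective batch
        optimizer.zero_grad()

The trade-off is wall-clock time rather than memory: each effective batch still costs ten forward/backward passes.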
TallGuyShort, over 2 years ago
I don't think anyone was wondering, and I don't think games are leaving GPU resources unused, but yes. Never mind the datasets: language models alone barely fit in memory on A100s.
miga, over 2 years ago
Did you try playing at 8K with a top-of-the-line graphics card? Did you ever see 60 FPS at the highest level of detail? Did you ever want 240 FPS to avoid artifacts in a fast action game? Did you ever wonder how many graphics cards are needed for a three-monitor setup? Did you ever want a ray-traced action game?

All of these call for much more powerful GPUs than the current top of the line. And game makers know that better graphics often sells a game.
izacus, over 2 years ago
What do you mean, "games don't need them"?
stephanst, over 2 years ago
It seems like I've touched a nerve with a lot of people on here by saying "games don't need more powerful GPUs"... let me explain.

My contention is not that it's impossible to use more power in games, or that developers are not working on games that will be able to fill all that compute power, but that higher-quality graphics are no longer pulling the gaming industry forward the way they did 20 years ago, and that improvements in graphics are more and more about AI enhancements to image quality and less and less about having the power to push more vertices around in real time.

The market for video games doesn't really care about the top 0.5% of PC gamers who can afford to buy the latest GPU from Nvidia every 2 years. The PS4 is still outselling the PS5; if graphics power were the main criterion for the actual video game market, that could not happen...
fareesh, over 2 years ago
Most new games will not hit the maximum resolution and refresh rate of modern screens, so there is quite a long way to go to catch up to peak monitor quality.

Also, game developers build games around the average configuration, not the other way around.
tsegratis, over 2 years ago
> 42 images (maximum I can fit on my GPU with 24 GB memory)

Probably I'm completely wrong, but surely scaling down those images, at least for the initial iterations, would reduce noise and give you vastly better batches and faster iteration, leading to better training -- then, as it converges, you would crank up the resolution.

Note that scaling doesn't have to be cubic; there are algorithms that crop the less 'meaningful' pixels, giving you smaller files at the same resolution.
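[Editor's note: a rough PyTorch sketch of the "start small, then raise resolution" idea (progressive resizing); the tensor sizes are invented for illustration, and this shows plain bilinear downscaling rather than the pixel-cropping variant the commenter mentions.]

import torch
import torch.nn.functional as F

def downscale(batch, size):
    # Bilinearly resize a (N, C, H, W) batch to size x size.
    return F.interpolate(batch, size=(size, size), mode="bilinear",
                         align_corners=False)

full_res = torch.randn(42, 3, 512, 512)   # stand-in for one 42-image batch
early = downscale(full_res, 128)          # cheap, low-memory early iterations
late = downscale(full_res, 256)           # raise resolution as training converges
print(early.shape, late.shape)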
simonebrunozzi, over 2 years ago
Can someone ELI5 the main difference between the Nvidia 3000 series, 3060, 3070, 3080, 3080 Ti, 4000 series, etc.? I am just very confused about it, and honestly I don't want to spend hours to understand something that should be immediately clear.

You can downvote me for being lazy, yes. But my point is that there's a ton of BS marketing going on in the IT industry, and Nvidia is no exception. Look at the 3nm process, which isn't about gate distance anymore. Etc.
Fred27, over 2 years ago
Where did you get your 90%-for-ML stat from? I suspect you just made it up because it happens to fit what you experience personally.

I work with software for virtual studios. It's very similar to gaming in a lot of ways (it even uses Unreal Engine) and we really push GPUs to the limits. I'd still count this in the "for gaming" category, as we're still rendering a 3D scene within a strict time limit.
haunter, over 2 years ago
> when games don't need them

Sorry, but you are totally wrong. 4K at 144 Hz absolutely needs this kind of performance. Same for VR.