
Solving Machine Learning Performance Anti-Patterns: A Systematic Approach

54 points | by briggers | almost 4 years ago

1 comment

ad404b8a372f2b9 · almost 4 years ago
That's interesting and reflects my personal training optimization workflow pretty well. Usually I'll check nvidia-smi and make sure I have good GPU utilization; if not, I check, in order:

* That my batch transfers to VRAM are done in a sensible way in the dataloader and don't hide CPU-bound preprocessing
* That my batch size is large enough
* That the model is adequate for the GPU (even convolutional models can be better on the CPU at specific sizes)

That's good enough to go from a CPU-bound pattern to a GPU-bound one, but I don't really get a detailed understanding of the spectrum between the two, so I'm definitely going to try this tool in the future, especially since it's so easy to add.

On the subject of optimization tricks, I haven't really found any magic bullets: you can't always increase the batch size to get 100% utilization because of the performance implications. FP16 precision has never done anything for me, weirdly. My preprocessing is never CPU-bound unless I do something dumb in it, so rewriting it in C++ would accomplish nothing.
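The first step of the workflow above — checking nvidia-smi for GPU utilization — can be scripted. A minimal sketch in Python, using nvidia-smi's real `--query-gpu=utilization.gpu --format=csv,noheader,nounits` flags; the 80% "GPU-bound" threshold and the helper names are my own assumptions, not part of the commenter's tooling:

```python
import subprocess

# Assumed cutoff: below this utilization on every GPU, suspect the
# input pipeline (dataloader / preprocessing) rather than the model.
GPU_BOUND_THRESHOLD = 80  # percent

def parse_gpu_utilization(csv_output: str) -> list[int]:
    """Parse the output of:
    nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits
    which is one integer percentage per GPU, one per line."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]

def query_gpu_utilization() -> list[int]:
    """Query the driver directly; only works on a machine with an NVIDIA GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_utilization(out)

def looks_cpu_bound(utils: list[int], threshold: int = GPU_BOUND_THRESHOLD) -> bool:
    """Low utilization across all GPUs suggests a CPU-bound pattern."""
    return bool(utils) and all(u < threshold for u in utils)

# Offline example using captured nvidia-smi output (no GPU needed):
sample = "23\n17\n"
utils = parse_gpu_utilization(sample)
print(utils)                  # [23, 17]
print(looks_cpu_bound(utils)) # True -> go check the dataloader
```

A check like this is only a trigger for the manual steps in the list above (dataloader transfers, batch size, model/GPU fit); it tells you that you are CPU-bound, not why.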