
Solving Machine Learning Performance Anti-Patterns: A Systematic Approach

54 points by briggers almost 4 years ago

1 comment

ad404b8a372f2b9 almost 4 years ago
That's interesting, and it reflects my personal training-optimization workflow pretty well. Usually I'll check nvidia-smi to make sure I'm getting good GPU utilization. If I'm not, I check, in order:

* That batch transfers to VRAM are done sensibly in the dataloader and aren't hiding CPU-bound preprocessing
* That my batch size is large enough
* That the model is suited to the GPU (even convolutional models can run faster on the CPU at certain sizes)

That's good enough to go from a CPU-bound pattern to a GPU-bound one, but it doesn't give me a detailed understanding of the spectrum between the two, so I'm definitely going to try this tool in the future, especially since it's so easy to add.

On the subject of optimization tricks, I haven't really found any magic bullets. You can't always increase the batch size to get 100% utilization, because of the implications for model performance. FP16 precision has never done anything for me, weirdly. My preprocessing is never CPU-bound unless I do dumb shit in it, so rewriting it in C++ would do nothing.
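The first step in the comment, checking GPU utilization with nvidia-smi, can be scripted. Below is a minimal Python sketch that assumes nvidia-smi is on PATH. It uses the query `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits`, which prints one integer percentage per GPU. The `looks_gpu_bound` helper and its 90% threshold are hypothetical illustrations, not a standard rule. Note also that nvidia-smi's figure is the percentage of the sample period during which at least one kernel was running, so a high number does not by itself mean the GPU's compute units are saturated.

```python
import shutil
import subprocess
from typing import List, Optional


def parse_gpu_util(csv_text: str) -> List[int]:
    """Parse `--format=csv,noheader,nounits` output: one integer
    percentage per line, one line per visible GPU."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def query_gpu_util() -> Optional[List[int]]:
    """Return current utilization (%) of each visible GPU,
    or None if nvidia-smi is not installed."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_util(out)


def looks_gpu_bound(samples: List[int], threshold: int = 90) -> bool:
    """Hypothetical heuristic: call the run GPU-bound if the mean of
    several utilization samples is at or above `threshold` percent."""
    if not samples:
        raise ValueError("need at least one utilization sample")
    return sum(samples) / len(samples) >= threshold
```

Sampling several times during training (for example once per second) and averaging is less noisy than a single reading, since utilization fluctuates between steps.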
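To check whether the dataloader is hiding CPU-bound preprocessing, one rough approach is to split each iteration's wall-clock time into time spent waiting for the next batch and time spent inside the training step. The sketch below assumes a generic Python iterable of batches and a `step` callable, both hypothetical stand-ins for a real dataloader and training step. The `clock` parameter exists only so the timing can be tested deterministically. If `step` launches GPU work asynchronously, as CUDA frameworks do, it has to block until that work finishes (in PyTorch, for example, by calling `torch.cuda.synchronize()`); otherwise the split between waiting and computing is misleading.

```python
import time
from typing import Any, Callable, Iterable, Tuple


def time_loop(batches: Iterable[Any],
              step: Callable[[Any], None],
              clock: Callable[[], float] = time.perf_counter) -> Tuple[float, float]:
    """Run step(batch) for every batch and return (wait_s, step_s):
    seconds spent waiting for the next batch vs. inside step()."""
    wait_s = step_s = 0.0
    it = iter(batches)
    while True:
        t0 = clock()
        try:
            batch = next(it)  # time spent here is data loading/preprocessing
        except StopIteration:
            break
        t1 = clock()
        step(batch)  # must block until any asynchronous GPU work is done
        t2 = clock()
        wait_s += t1 - t0
        step_s += t2 - t1
    return wait_s, step_s


def data_wait_fraction(wait_s: float, step_s: float) -> float:
    """Share of total loop time spent waiting on data (0.0 if no time elapsed)."""
    total = wait_s + step_s
    return wait_s / total if total else 0.0
```

A wait fraction that stays well above zero after warm-up suggests the input pipeline, rather than the model, is the bottleneck.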
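For the batch-size check, and given the comment's caveat that larger batches are not free, one option is to sweep candidate sizes and keep the smallest one that gets close to peak throughput. In this sketch `measure` is a hypothetical callable you would supply: for example, time a few training steps at that batch size and return samples per second, or `None` if the batch does not fit in memory. The 5% tolerance is an arbitrary assumption.

```python
from typing import Callable, Dict, Iterable, Optional


def pick_batch_size(candidates: Iterable[int],
                    measure: Callable[[int], Optional[float]],
                    tolerance: float = 0.05) -> int:
    """Return the smallest batch size whose throughput is within
    `tolerance` (as a fraction) of the best throughput measured."""
    results: Dict[int, float] = {}
    for bs in sorted(candidates):
        tput = measure(bs)
        if tput is None:  # e.g. out of memory; assume larger sizes fail too
            break
        results[bs] = tput
    if not results:
        raise ValueError("no candidate batch size fit in memory")
    best = max(results.values())
    return min(bs for bs, t in results.items() if t >= (1 - tolerance) * best)
```

Preferring the smallest near-peak size, rather than the largest that fits, keeps the batch no bigger than throughput actually requires, which leaves more room to tune batch size for model quality.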