Hello all,<p>I have been pondering the pros and cons of building a high-end PC vs. renting a high-performance instance from a cloud provider (e.g. a P2 instance from AWS) for running data science/machine learning workloads.<p>Has anyone faced the same dilemma before, and if so, what were the deciding factors for you?
I'm biased (disclosure: I work on Google Cloud), but having a single GPU in your local machine for testing, and then doing the real training on a cloud provider, seems best.<p>If you load your workstation at home with the best GPUs money can buy today, you've spent a big pile of cash, and in six to twelve months it's no longer the best (see our announcement about bringing P100s to Compute Engine in the next few months). Moreover, your single awesome machine can only train one thing at a time. By renting several such VMs (from us or other providers), you can use distributed training to iterate more quickly on your models, or explore totally different problems in parallel.<p>Doing that at home means buying another machine for each extra job. With the pay-as-you-go model, if you were going to compute the answer anyway, it costs (roughly) the same to do it all at once as it does serially. So why wait?<p>Again, big disclosure: I work on Google Cloud, and have ML services and GPUs to sell you.
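The "same cost, all at once vs. serially" point is just arithmetic, and it's easy to sanity-check. A minimal sketch, with a made-up hourly rate and job size (not real GPU-instance pricing):

```python
# Hypothetical numbers for illustration only -- not any provider's actual pricing.
HOURLY_RATE = 1.50      # assumed $/hour for one GPU VM
TOTAL_GPU_HOURS = 40    # total compute the training job needs

def cloud_cost(num_vms: int) -> float:
    """Cost of finishing TOTAL_GPU_HOURS of work spread across num_vms VMs."""
    hours_per_vm = TOTAL_GPU_HOURS / num_vms
    return num_vms * hours_per_vm * HOURLY_RATE

serial = cloud_cost(1)    # one VM running for 40 hours
parallel = cloud_cost(8)  # eight VMs running for 5 hours each

# Same bill either way, but the parallel run finishes 8x sooner.
print(serial, parallel)
```

The num_vms factor cancels out, which is the whole argument: ignoring per-VM overheads and imperfect scaling, parallelizing buys you wall-clock time for free.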
Provided your own desktop or laptop is powerful enough, I would just use that for testing and development. Cloud wins out over a custom PC build because of the flexibility you get.<p>Want to run an algorithm on 10 nodes for 5 hours? No problem. Want to leave something running for days connected to the Twitter firehose? Sure.<p>That said, building PCs and your own clusters is quite fun! Check out these <a href="https://www.picocluster.com/collections/cubes/products/pico-3-odroid-c2-cluster-cube" rel="nofollow">https://www.picocluster.com/collections/cubes/products/pico-...</a>
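To put a rough number on that flexibility, here is a hedged back-of-the-envelope comparison of a one-off burst ("10 nodes for 5 hours") against buying a workstation outright. Both prices are hypothetical placeholders, not quotes from any provider or parts list:

```python
# Hypothetical prices for illustration only.
WORKSTATION_COST = 3000.0  # assumed upfront price of a high-end GPU build
NODE_HOURLY_RATE = 0.90    # assumed $/hour per rented cloud node

def rental_cost(nodes: int, hours: float) -> float:
    """Cost of a single rented burst, e.g. 10 nodes for 5 hours."""
    return nodes * hours * NODE_HOURLY_RATE

def break_even_hours() -> float:
    """Single-node rental hours that add up to the workstation's price."""
    return WORKSTATION_COST / NODE_HOURLY_RATE

print(rental_cost(10, 5))   # the 10-node, 5-hour burst
print(break_even_hours())   # hours of rental before buying pays off
```

Under these assumed numbers, the 10-node burst costs $45, and you would need thousands of rented node-hours before the workstation pays for itself; the cross-over obviously moves with real prices and how often you actually train.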
Curious what hardware you currently have available, what workloads you are considering running, and for what purpose. I mean:<p>1. Today's low-end hardware is the high-performance hardware of just a few years ago.<p>2. Most data isn't big, and if the data is big, then moving the compute to the data is usually better practice than moving the data to the compute.