Here is a larger-scale comparison of Cloud TPU and Google Cloud GPU performance and cost (focused on Cloud TPU Pods):
<a href="https://cloud.google.com/blog/products/ai-machine-learning/now-you-can-train-ml-models-faster-and-lower-cost-cloud-tpu-pods" rel="nofollow">https://cloud.google.com/blog/products/ai-machine-learning/n...</a><p>All the code used in that comparison is open source, and there is a detailed methodology page with instructions that you can follow if you want to reproduce the results:
<a href="https://github.com/tensorflow/tpu/blob/master/benchmarks/ResNet-50_v1.5_Performance_Comparison_TensorFlow_1.12_GCP.md" rel="nofollow">https://github.com/tensorflow/tpu/blob/master/benchmarks/Res...</a><p>Also, Cloud TPUs are available to everyone for free via Colab. Here is a sample Colab that shows how to train a Keras model on the Fashion MNIST dataset using the Adam optimizer:
<a href="https://colab.research.google.com/github/tensorflow/tpu/blob/master/tools/colab/fashion_mnist.ipynb" rel="nofollow">https://colab.research.google.com/github/tensorflow/tpu/blob...</a><p>(I work on Cloud TPUs)
This doesn't seem like a very informative benchmark to me. They don't mention how (or whether) they tuned the learning rates and batch sizes for each device. As they themselves note, they also use a very small network that doesn't need the power of a TPU to train quickly and may scale differently than a large one.

They also don't post their code, so I can't check that their problems with Adam aren't due to using L2 regularization, which https://arxiv.org/abs/1711.05101 shows hurts Adam relative to SGD; you should use decoupled weight decay instead.
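For anyone unfamiliar with that distinction, in Keras terms the difference looks roughly like this (a minimal sketch; tf.keras.optimizers.AdamW requires a recent TensorFlow release, and the layer sizes and decay rate are made up for illustration):

```python
import tensorflow as tf

# "L2 regularization": the penalty is folded into the loss, so Adam's
# per-parameter adaptive scaling also rescales the regularization
# gradient -- the coupling arXiv:1711.05101 identifies as harmful.
l2_model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
])
l2_model.compile(optimizer=tf.keras.optimizers.Adam(), loss="mse")

# "Decoupled weight decay" (AdamW): the optimizer shrinks the weights
# directly each step, independent of the adaptive gradient update.
wd_model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
wd_model.compile(optimizer=tf.keras.optimizers.AdamW(weight_decay=1e-4), loss="mse")
```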
Hmm interesting, is it possible that the batch size for the TPU is larger? I'm guessing they might be using large batches to keep its giant GEMM cores fed.