科技回声 (Tech Echo)

Grid: AI platform from the makers of PyTorch Lightning

107 points · by shcheklein · over 4 years ago

10 comments

sillysaurusx · over 4 years ago

I dislike that pytorch advertises TPU support. Pytorch doesn't support TPUs. Pytorch supports a gimped version of TPUs that has no access to the TPU CPU, a massive 300GB memory store that handles infeed. No infeed means you have to feed the TPUs manually, on demand, like a GPU. And TPUs are not GPUs. When you try to do that, you're talking *at least* a 40x slowdown, no exaggeration. The TPU CPU is the heart of the TPU's power and advantage over GPUs, and neither pytorch nor Jax supports it at all yet. No MLPerf benchmark will ever use pytorch in its current form on TPUs.

Luckily, that form is changing. There are interesting plans. But they are still just plans.

It's better to go the other direction, I think. I ported pytorch to tensorflow: https://twitter.com/theshawwn/status/1311925180126511104?s=21

Pytorch is mostly just an API. And that API is mostly Python. When people say they "like pytorch", they're expressing a preference for how to organize ML code, not for the set of operations available to you when you use pytorch.
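The infeed point above can be illustrated with a plain-Python sketch (no TPU or PyTorch/XLA API involved; `make_batch`, the sleep, and the queue depth are all illustrative): with on-demand feeding the accelerator waits on the host for every batch, while a bounded prefetch queue, which is roughly the role the TPU CPU's infeed plays, lets host-side data preparation overlap with compute.

```python
import queue
import threading
import time

def make_batch(i):
    """Stand-in for host-side decode/augment work for batch i."""
    time.sleep(0.001)
    return [i] * 4

def train_on_demand(steps):
    """Accelerator blocks on the host every single step."""
    total = 0
    for i in range(steps):
        total += sum(make_batch(i))  # host work and "compute" serialize
    return total

def train_prefetched(steps, depth=8):
    """A background producer keeps a bounded buffer of ready batches."""
    q = queue.Queue(maxsize=depth)  # bounded buffer ~ an infeed queue

    def producer():
        for i in range(steps):
            q.put(make_batch(i))

    threading.Thread(target=producer, daemon=True).start()
    total = 0
    for _ in range(steps):
        total += sum(q.get())  # batches are already staged when we ask
    return total
```

Both loops do the same arithmetic; the difference is only whether batch preparation overlaps the consuming step, which is exactly what is lost when infeed is unavailable and batches are fed one at a time from the host.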
high_derivative · over 4 years ago

I am extremely pessimistic about ML ops startups like this. At the end of the day, cloud service providers have too much of an incentive to provide these tools for free as a cloud value-add.

The other thing is that stitching together other open source tools like this is simply not enough value. Who will be incentivised to buy?

Saying this as a person at a FAANG ML org, where I see the push to open-source ops tooling like this.
neilc · over 4 years ago

Congratulations to the Grid team on the fundraise and the announcement! Exciting stuff.

It seems like there is an emerging consensus that (a) DL development requires access to massive compute, but (b) if you're only using off-the-shelf PyTorch or TensorFlow, moving your model from your personal development environment to a cluster or cloud setting is too difficult: it is easy to spend most of your time managing infrastructure rather than developing models. At Determined AI, we've spent the last few years building an open source DL training platform that tries to make that process a lot simpler (https://github.com/determined-ai/determined), but I think it's fair to say that this is still very much an open space and an important problem. Curious to take a look at Grid AI and see how it compares to other tools in the space; some other alternatives include Kubeflow, Polyaxon, and Spell AI.
minimaxir · over 4 years ago

So *this* is the endgame of pytorch-lightning, which was always a mystery to me. (If you haven't used it, it's strongly recommended if you use PyTorch: https://github.com/PyTorchLightning/pytorch-lightning)

IMO, open source is at its best when it's supported by a SaaS, since that provides a strong incentive to keep the project up-to-date, and the devs of PL have been very proactive.
visarga · over 4 years ago

How do you handle the security of training data? If the data is super sensitive, how do you deal with it?

I know the same could be said about Azure and AWS, but the big-name cloud providers stake their prestige on having tight security, while a startup has much less to lose.
seibelj · over 4 years ago

The name is unfortunately close to "The Grid", an AI website builder that had a lot of buzz, then scammed a lot of people out of money, then disappeared: https://medium.com/@seibelj/the-grid-over-promise-under-deliver-and-the-lies-told-by-ai-startups-40aa98415d8e
ishcheklein · over 4 years ago

More on this here: https://techcrunch.com/2020/10/08/grid-ai-raises-18-6m-series-a-to-help-ai-researchers-and-engineers-bring-their-models-to-production/

What do you think, folks?
kbash9 · over 4 years ago

Seems like PyTorch Lightning is the only first-class citizen in your offering. Is that true? Or are there value-added features for TensorFlow and other non-DL libraries such as scikit-learn?

Also, is there support for distributed training on large datasets that don't fit into single-instance memory? Or just distributed grid-search/hyper-parameter optimization?
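The two modes this question distinguishes differ in shape: hyper-parameter search is embarrassingly parallel (one complete, independent run per configuration), while distributed training splits a single run across workers and must synchronize gradients. A plain-Python sketch of the first, easier mode, with a toy objective standing in for a real training run (everything here is illustrative, not Grid's API):

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def run_trial(cfg):
    """Stand-in for one full training run; returns (validation loss, config)."""
    lr, batch_size = cfg
    loss = (lr - 0.01) ** 2 + (batch_size - 64) ** 2 * 1e-6  # toy objective
    return loss, cfg

# Every (lr, batch_size) pair is an independent job, so it is trivial to
# farm the grid out to a pool of workers (or cloud instances).
grid = list(product([0.001, 0.01, 0.1], [32, 64, 128]))
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_trial, grid))

best_loss, best_cfg = min(results)
print(best_cfg)  # → (0.01, 64)
```

Distributed training of a single large model is the harder case (gradient synchronization, sharded data loading, memory that exceeds one instance), which is presumably why the question calls it out separately.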
xiaodai · over 4 years ago

Sadly, Julia's Flux.jl has fallen behind to the point that I am switching to using pytorch. It's just faster.
bkkaggle · over 4 years ago

I used pytorch lightning back in May when I was working on pretraining GPT-2 on TPUs (https://bkkaggle.github.io/blog/nlp-research-part-2/). It was really impressive how stable it was, especially given that a lot of features were still being added at a very fast pace.

Also, this was probably the first (and maybe still is?) high-level pytorch library that let you train on TPUs without a lot of refactoring and bugs, which was a really nice thing to be able to do given how unstable the pytorch-xla API still was at that point. <3