The thing I'm looking forward to most is having Flash Attention built in. Right now you have to use xformers or similar, and that dependency has been a nightmare: it keeps breaking, it requires specific concoctions of dependencies or else conda will barf, and it's impossible to pin because I have to use -dev releases, which they constantly drop from the repositories.

PyTorch 2.0 comes with a few different efficient transformer implementations built in. And unlike in 1.13, they work during training and don't require specific configurations. They seemed to work just fine during my pre-release testing. Also, having it built into PyTorch might mean more pressure to keep it optimized; as-is, xformers targets the A100 primarily, with other archs as an afterthought.

And, as promised, `torch.compile` worked out of the box, providing IIRC a nice ~20% speedup on a ViT without any other tuning.

I did have to do some dependency fiddling on the pre-release version. I've been looking forward to the "stable" release before using it more extensively.

Anyone else seeing nice boosts from `torch.compile`?
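For anyone who hasn't tried the new APIs yet, here's roughly the shape of them. A minimal sketch — the shapes, dtypes, and the torchvision ViT are just illustrative, not anything from the release notes:

```python
import torch
import torch.nn.functional as F
import torchvision

# Built-in fused attention: on supported GPUs (fp16/bf16, head_dim <= 128)
# this dispatches to a Flash Attention kernel, no xformers required.
q = torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.float16)
v = torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.float16)
out = F.scaled_dot_product_attention(q, k, v)

# You can force the Flash backend to verify it's actually being picked
# (it raises if the inputs aren't eligible):
with torch.backends.cuda.sdp_kernel(
    enable_flash=True, enable_math=False, enable_mem_efficient=False
):
    out = F.scaled_dot_product_attention(q, k, v)

# And torch.compile is a one-liner on top of any nn.Module.
model = torch.compile(torchvision.models.vit_b_16().cuda())
imgs = torch.randn(4, 3, 224, 224, device="cuda")
preds = model(imgs)  # first call is slow (compilation); later calls are fast
```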
> Python 3.11 support on Anaconda Platform

> Due to lack of Python 3.11 support for packages that PyTorch depends on, including NumPy, SciPy, SymPy, Pillow and others on the Anaconda platform, we will not be releasing Conda binaries compiled with Python 3.11 for PyTorch Release 2.0. The Pip packages with Python 3.11 support will be released, hence if you intend to use PyTorch 2.0 with Python 3.11 please use our Pip packages.

It really sucks that Anaconda always lags behind. I know the reasoning*, and I know it makes sense for what a lot of teams use it for... but on our side we are now looking more and more into dropping it, since we are more of an R&D team. We already use containers for most of our pipelines, so just using pip might be viable.

*Though I guess Anaconda bit off more than it can chew w.r.t. managing an entire Python universe and keeping it up to date. Conda-forge is already almost a requirement, but using the official package (with pip, in this case) has its own benefits for very complex packages like PyTorch.
I'm hoping `torch.compile` is a gateway to "easy" non-Nvidia accelerator support in PyTorch.

Also, I have been using `torch.compile` for the Stable Diffusion unet/vae since February, to good effect. I'm guessing similar optimizations will pop up for LLaMA.
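In case it's useful: the pattern is just swapping compiled submodules into the pipeline. A rough sketch using the diffusers library (the checkpoint name is only an example, and compiling `vae.decode` is the usual trick since that's the hot path on the VAE side):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Compile only the hot submodules; the rest of the pipeline stays eager.
# The first generation is slow (compilation); subsequent ones get the speedup.
pipe.unet = torch.compile(pipe.unet)
pipe.vae.decode = torch.compile(pipe.vae.decode)

image = pipe("an astronaut riding a horse").images[0]
```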
> 100% backward compatible

That's (for me) the biggest reason why TensorFlow fell out of favor: the API broke too often (not just between TF 1 and 2).
If anyone can edit it, I found a typo:

> Python 1.8 (deprecating Python 1.7)

> Deprecation of Cuda 11.6 and Python 1.7 support for PyTorch 2.0

It's clearly supposed to be Python 3.8 and 3.7, respectively.
> As an underpinning technology of torch.compile, TorchInductor with Nvidia and AMD GPUs will rely on OpenAI Triton deep learning compiler to generate performant code and hide low level hardware details. OpenAI Triton-generated kernels achieve performance that’s on par with hand-written kernels and specialized cuda libraries such as cublas.
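For anyone curious what Triton actually looks like: it's a Python-embedded DSL for GPU kernels. Below is the classic elementwise-add kernel in the style of the Triton tutorials — hand-written to show the flavor, not actual Inductor output:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide chunk of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged tail of the last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)  # number of program instances to launch
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```

If I remember right, you can also dump the Triton kernels Inductor actually generates for your model by setting the TORCH_COMPILE_DEBUG=1 environment variable before running a compiled model.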
discussion from (presumably) the PyTorch Conference announcement: https://news.ycombinator.com/item?id=33832511