Triton Fork for Windows Support

21 points by lnyan 7 months ago

3 comments

Scene_Cast2 7 months ago
This is pretty great. PyTorch uses Triton as the backend for torch.compile (the big feature of PyTorch 2.0, and the necessary part for making Flex Attention in the about-to-be-released 2.5 usably fast).

Triton's team doesn't support Windows and, worse yet, does not accept community PRs to enable any sort of support.

Here's the GitHub issue: https://github.com/triton-lang/triton/issues/1640

And here's the performance comparison of Flex Attention with and without torch.compile (tl;dr: it's 3x slower than a standard MHA when not compiled): https://github.com/rasbt/LLMs-from-scratch/blob/76e9a9ec02a1a060aac61608598fdd50cc7d52bd/ch03/02_bonus_efficient-multihead-attention/mha-implementations.ipynb

EDIT: after taking a look at the repo, the only thing changed in the "46 commits ahead of [official triton]" is the README. Somewhat sketchy.
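For context on why the uncompiled path is so much slower: eager-mode attention materializes the full (seq, seq) score matrix before applying any mask or modification, while torch.compile can fuse the modification into a single kernel. A minimal NumPy sketch of that eager pattern (this is not PyTorch's actual FlexAttention API; `attention`, `causal`, and `score_mod` are illustrative names):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, score_mod=None):
    """Eager scaled dot-product attention with an optional score hook.

    The full (seq, seq) score matrix is materialized in memory before
    score_mod runs -- the cost a compiled, fused kernel avoids.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)   # (seq, seq) materialized
    if score_mod is not None:
        scores = score_mod(scores)
    return softmax(scores) @ v

def causal(scores):
    # Causal masking expressed as a score modification.
    seq = scores.shape[-1]
    mask = np.tril(np.ones((seq, seq), dtype=bool))
    return np.where(mask, scores, -np.inf)

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((4, 8))
v = rng.standard_normal((4, 8))
out = attention(q, k, v, score_mod=causal)
# Position 0 can only attend to itself, so its output equals v[0].
assert np.allclose(out[0], v[0])
```

The sketch only shows the shape of the computation; the real FlexAttention additionally handles batching, heads, and block-sparse masks.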
lostmsu 7 months ago
Microsoft should help this project set up CI infrastructure with the necessary GPUs.
yjftsjthsd-h 7 months ago
"Triton" here is apparently a programming language, which upstream describes as:

> This is the development repository of Triton, a language and compiler for writing highly efficient custom Deep-Learning primitives. The aim of Triton is to provide an open-source environment to write fast code at higher productivity than CUDA, but also with higher flexibility than other existing DSLs.

So if you clicked in expecting the illumos-based virtualization platform, this isn't that. Though

> This is the basis for torchao, which crucially changes some large models from "can't run" to "can run" on consumer GPUs. That's easier than supporting them in other quantization frameworks, or letting the consumers use Linux or WSL

does sound neat on its own merits.
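For readers new to this Triton: its programming model is SPMD over blocks, where each "program" instance (identified by `tl.program_id`) loads, computes, and stores one tile of the data. A GPU-free sketch of that model using plain NumPy in place of `triton.language` (`add_kernel`, `launch`, and `BLOCK` are illustrative names, not Triton's API):

```python
import numpy as np

BLOCK = 4  # elements handled by each program instance

def add_kernel(x, y, out, pid):
    """One Triton-style program instance operating on a block.

    In real Triton this body would be decorated with @triton.jit and use
    tl.program_id / tl.load / tl.store; NumPy indexing stands in here so
    the model can be shown without a GPU.
    """
    offs = pid * BLOCK + np.arange(BLOCK)
    mask = offs < len(x)                 # guard the ragged last block
    idx = offs[mask]
    out[idx] = x[idx] + y[idx]

def launch(x, y):
    # The "grid": one program per block, as in kernel[(grid,)](...) in Triton.
    out = np.empty_like(x)
    grid = -(-len(x) // BLOCK)           # ceiling division
    for pid in range(grid):
        add_kernel(x, y, out, pid)
    return out

x = np.arange(10.0)
y = np.arange(10.0)
assert np.array_equal(launch(x, y), x + y)
```

On a GPU the program instances run in parallel rather than in a Python loop, and the mask is what makes sizes that aren't a multiple of the block size safe.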