On-Device Acceleration of Large Diffusion Models via GPU-Aware Optimizations

11 points by lnyan about 2 years ago

2 comments

djoldman, about 2 years ago
Abstract

The rapid development and application of foundation models have revolutionized the field of artificial intelligence. Large diffusion models have gained significant attention for their ability to generate photorealistic images and support various tasks. On-device deployment of these models provides benefits such as lower server costs, offline functionality, and improved user privacy. However, common large diffusion models have over 1 billion parameters and pose challenges due to restricted computational and memory resources on devices. We present a series of implementation optimizations for large diffusion models that achieve the fastest reported inference latency to date (under 12 seconds for Stable Diffusion 1.4 without INT8 quantization for a 512 × 512 image with 20 iterations) on GPU-equipped mobile devices. These enhancements broaden the applicability of generative AI and improve the overall user experience across a wide range of devices.
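For reference, the benchmark configuration quoted in the abstract (Stable Diffusion 1.4, a 512 × 512 image, 20 denoising iterations) can be reproduced on desktop hardware with the Hugging Face diffusers library. This is a minimal sketch assuming the public CompVis/stable-diffusion-v1-4 checkpoint and the default scheduler; the paper's mobile-GPU kernel optimizations are not part of it:

```python
# Sketch of the benchmarked workload: 512x512, 20 steps, no INT8 quantization.
# Assumptions (not from the paper): the "CompVis/stable-diffusion-v1-4" model
# ID, fp16 weights, the default scheduler, and a CUDA device.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,  # half precision, i.e. no INT8 quantization
)
pipe = pipe.to("cuda")

image = pipe(
    "a photograph of an astronaut riding a horse",
    height=512,
    width=512,
    num_inference_steps=20,  # the 20 iterations cited in the abstract
).images[0]
image.save("astronaut.png")
```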
ArtWomb, about 2 years ago
"explore the potential benefits of employing Winograd with varying tile sizes on the 3 × 3 kernel convolutions. Our findings led us to select a 4 × 4 tile size"

A very common GPU optimization ;)

12s text-to-image running natively on mobile GPUs? I'm on Verizon.com right now ordering a Samsung Ultra 23 as my new AI Research Cray-1 Supercomputer!
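The Winograd trick quoted above replaces most of the multiplications in a small convolution with cheap pre- and post-transforms: F(4×4, 3×3), the variant the paper settles on, computes each 4×4 output tile from a 6×6 input tile with 36 elementwise multiplies instead of 4·4·9 = 144. A minimal single-channel numpy sketch, with transform matrices per Lavin & Gray (2015); this is an illustration, not the paper's GPU implementation:

```python
import numpy as np

# Input transform B^T, filter transform G, output transform A^T for F(4,3).
BT = np.array([
    [4,  0, -5,  0, 1, 0],
    [0, -4, -4,  1, 1, 0],
    [0,  4, -4, -1, 1, 0],
    [0, -2, -1,  2, 1, 0],
    [0,  2, -1, -2, 1, 0],
    [0,  4,  0, -5, 0, 1],
], dtype=np.float64)
G = np.array([
    [ 1/4,     0,    0],
    [-1/6,  -1/6, -1/6],
    [-1/6,   1/6, -1/6],
    [1/24,  1/12,  1/6],
    [1/24, -1/12,  1/6],
    [   0,     0,    1],
], dtype=np.float64)
AT = np.array([
    [1, 1,  1, 1,  1, 0],
    [0, 1, -1, 2, -2, 0],
    [0, 1,  1, 4,  4, 0],
    [0, 1, -1, 8, -8, 1],
], dtype=np.float64)

def winograd_conv2d(image, kernel):
    """'Valid' correlation of an HxW image with a 3x3 kernel; H-2 and W-2
    are assumed divisible by 4 so the image splits cleanly into 6x6 tiles."""
    H, W = image.shape
    U = G @ kernel @ G.T                      # filter transform, done once
    out = np.zeros((H - 2, W - 2))
    for i in range(0, H - 5, 4):              # tiles overlap by 2 pixels
        for j in range(0, W - 5, 4):
            d = image[i:i + 6, j:j + 6]
            V = BT @ d @ BT.T                 # input transform
            M = U * V                         # 36 elementwise multiplies
            out[i:i + 4, j:j + 4] = AT @ M @ AT.T  # output transform
    return out

# Check against a direct sliding-window correlation.
rng = np.random.default_rng(0)
img, ker = rng.standard_normal((10, 10)), rng.standard_normal((3, 3))
ref = np.array([[np.sum(img[y:y + 3, x:x + 3] * ker)
                 for x in range(8)] for y in range(8)])
assert np.allclose(winograd_conv2d(img, ker), ref)
```

The larger 4×4 tile amortizes the transforms over more output pixels than the more common F(2×2, 3×3), at the cost of wider transform matrices and somewhat worse numerical conditioning, which is the trade-off the quoted passage is weighing.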