
Portable LLM apps in Docker [video]

4 points by 3Sophons 6 months ago

1 comment

3Sophons 6 months ago
Docker is the leading solution for packaging and deploying portable applications. However, for AI and LLM workloads, Docker containers are often not portable due to the lack of GPU abstraction -- you need a different container image for each GPU / driver combination. In some cases, the GPU is simply not accessible from inside containers. For example, the "impossible triangle of LLM app, Docker, and Mac GPU" refers to the lack of Mac GPU access from containers.

Docker is adding support for the WebGPU API for container apps. It will allow any underlying GPU or accelerator hardware to be accessed through WebGPU. That means container apps only need to target the WebGPU API, and they automatically become portable across all GPUs supported by Docker. However, asking developers to rewrite existing LLM apps, which use CUDA, Metal, or other GPU APIs, to WebGPU is a challenge.

LlamaEdge provides an ecosystem of portable AI / LLM apps and components that can run on multiple inference backends, including WebGPU. It supports any programming language that can be compiled into Wasm, such as Rust. Furthermore, LlamaEdge apps are lightweight and binary-portable across different CPUs and OSes, making it an ideal runtime to embed into container images.
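To make the "write once to WebGPU" idea concrete, here is a minimal Rust sketch of the portability boundary the comment describes, using the wgpu crate plus pollster to block on the async calls (both are my choice of libraries, not named in the comment). All it does is request a GPU adapter and create a device; the same code runs unchanged whether the backend turns out to be Vulkan, Metal, DX12, or a WebGPU implementation exposed to a container.

    // Minimal sketch, assuming wgpu ~0.20 and pollster as dependencies.
    fn main() {
        pollster::block_on(async {
            // The Instance abstracts over Vulkan, Metal, DX12, GL, and browser WebGPU.
            let instance = wgpu::Instance::default();

            // Ask for any available adapter; this is where a WebGPU-enabled Docker
            // host would map the request onto whatever GPU it actually exposes.
            let adapter = instance
                .request_adapter(&wgpu::RequestAdapterOptions::default())
                .await
                .expect("no compatible GPU adapter found");
            println!("running on: {:?}", adapter.get_info());

            // Create the logical device and queue that compute work is submitted to.
            let (_device, _queue) = adapter
                .request_device(&wgpu::DeviceDescriptor::default(), None)
                .await
                .expect("failed to create device");
        });
    }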
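And here is a sketch of what a LlamaEdge-style inference app can look like on the Wasm side, assuming the wasmedge-wasi-nn crate and a model preloaded by the WasmEdge host under the alias "default"; the crate API and alias follow LlamaEdge's published examples rather than anything stated in the comment.

    // Sketch of a LlamaEdge-style app; build with `cargo build --target wasm32-wasi`.
    use wasmedge_wasi_nn::{ExecutionTarget, GraphBuilder, GraphEncoding, TensorType};

    fn main() {
        let prompt = "What is the capital of France?";

        // Load the GGML-encoded model the host registered under the alias "default"
        // (e.g. wasmedge --nn-preload default:GGML:AUTO:model.gguf app.wasm).
        let graph = GraphBuilder::new(GraphEncoding::Ggml, ExecutionTarget::AUTO)
            .build_from_cache("default")
            .expect("failed to load preloaded model");
        let mut ctx = graph
            .init_execution_context()
            .expect("failed to create execution context");

        // Feed the prompt as a UTF-8 byte tensor at input index 0.
        ctx.set_input(0, TensorType::U8, &[1], prompt.as_bytes())
            .expect("failed to set input");
        ctx.compute().expect("inference failed");

        // Read the generated text back from output index 0.
        let mut out = vec![0u8; 4096];
        let n = ctx.get_output(0, &mut out).expect("failed to read output");
        println!("{}", String::from_utf8_lossy(&out[..n]));
    }

Because the binary only talks to the WASI-NN interface, the same .wasm file can run against a CPU backend, CUDA, Metal, or a future WebGPU backend without recompilation, which is the portability property the comment is pointing at.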