Mmm…

The wasi-nn proposal that this relies on (https://github.com/WebAssembly/wasi-nn) amounts to sending arbitrary chunks to some vendor implementation. The API is literally: set input, compute, get output (there's a rough sketch of it at the end of this comment).

…and that is totally non-portable.

The reason *this* works is that it relies on the abstraction already implemented in llama.cpp that lets it take a gguf model and map it to multiple hardware targets, which you can see has been lifted as-is into WasmEdge here: https://github.com/WasmEdge/WasmEdge/tree/master/plugins/wasi_nn/thirdparty/ggml

So…

> Developers can refer to this project to write their machine learning application in a high-level language using the bindings, compile it to WebAssembly, and run it with a WebAssembly runtime that supports the wasi-nn proposal, such as WasmEdge.

Is total rubbish; no, you can't.

This isn't portable.

It's not sandboxed.

It's not a HAL.

If you have a wasm binary you *might* be able to run it *if* the version of the runtime you're using *happens* to implement the specific ggml backend you need, which it probably doesn't… because there's literally no requirement for it to do so.

…and if it does, you're just calling the llama.cpp ggml code, so it's exactly as safe as that library is…

There's a lot of "so portable" and "such rust" talk in this article that really seems misplaced; this doesn't have the benefits of either of those two things.

Let's imagine you have some new hardware with a WASI runtime on it: can you run your model on it? Does it have GPU support?

Well, it turns out the answer is "go and see if llama.cpp compiles on that platform with GPU support, and whether the runtime you're using happens to have a ggml plugin in it and happens to have that version of ggml vendored into it; if not, then no".

…at which point, wtf are you even using WASI for?

Cross-platform GPU support *is* hard, but this… I dunno. It seems absolutely ridiculous.

Imagine if WebGPU was just "post some binary chunk to the GPU and maybe it'll draw something or whatever, if it happens to be the right binary chunk for the current hardware."

That's what this is.
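For what it's worth, here is roughly what that guest-side flow looks like with the older wasi-nn Rust bindings. This is a sketch from memory, assuming a 0.1.x-style `wasi_nn` crate: the encoding constant, execution target, tensor shape, output size and exact field names are placeholders and may differ between binding versions.

```rust
// Sketch only: guest-side wasi-nn flow, in the style of the older wasi-nn 0.1.x
// Rust bindings. Exact names/fields vary between binding versions; the encoding
// constant, tensor shape and output size here are placeholders, not real values.
fn infer(model: &[u8], input: &[u8]) -> Vec<u8> {
    unsafe {
        // "load": hand the runtime an opaque blob plus an encoding hint. Whether
        // anything can actually execute it depends entirely on which backend the
        // runtime happens to have compiled in.
        let graph = wasi_nn::load(
            &[model],
            wasi_nn::GRAPH_ENCODING_OPENVINO, // placeholder encoding
            wasi_nn::EXECUTION_TARGET_CPU,    // placeholder target
        )
        .unwrap();
        let ctx = wasi_nn::init_execution_context(graph).unwrap();

        // "set input": another opaque buffer, tagged with dims and a dtype.
        wasi_nn::set_input(
            ctx,
            0,
            wasi_nn::Tensor {
                dimensions: &[1, 3, 224, 224], // placeholder shape
                r#type: wasi_nn::TENSOR_TYPE_F32,
                data: input,
            },
        )
        .unwrap();

        // "compute", then copy the output back into a caller-provided buffer.
        wasi_nn::compute(ctx).unwrap();
        let mut out = vec![0u8; 4 * 1001]; // placeholder output size
        wasi_nn::get_output(ctx, 0, out.as_mut_ptr(), out.len() as u32).unwrap();
        out
    }
}
```

The point being: nothing in that interface says anything about *which* graph encodings or execution targets a given runtime actually supports, which is exactly the portability hole described above.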