I'm looking at using OCI at $DAY_JOB for model distribution across fleets of machines, so it's good to see it getting traction elsewhere.

OCI has some benefits over other systems: tiered caching/pull-through is already pretty battle-tested, as is signing, so it beats more naive distribution methods on reliability, performance, and trust.

If combined with eStargz or zstd:chunked it's also pretty nice for distributed systems, as long as you can slice things up into files in such a way that not every machine needs to pull the full model weights.

Failing that, there are P2P distribution mechanisms for OCI (Dragonfly etc.) that can lessen the burden without resorting to DIY on BitTorrent or similar.
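For a concrete sketch: pushing weights as an OCI artifact with the oras-py SDK looks roughly like this (registry URL, repo name, and file name are all illustrative, and the client API may differ between oras-py versions):

    import oras.client  # pip install oras

    # Assumes you're already authenticated, e.g. via docker login.
    client = oras.client.OrasClient()

    # Push the raw weights file as an OCI artifact to a hypothetical registry.
    client.push(
        files=["llama-3-8b.gguf"],
        target="registry.example.com/models/llama-3-8b:v1",
    )

    # Consumers behind a pull-through cache fetch the same reference.
    client.pull(
        target="registry.example.com/models/llama-3-8b:v1",
        outdir="/models",
    )

Signing can then be layered on top against the artifact digest, same as for ordinary images.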
Be aware of licensing restrictions. Docker Desktop is free for personal use, but it requires a paid subscription if you work for an organization with more than 250 employees (or more than $10M in annual revenue). This feature seems to be available only in Docker Desktop.
I don't understand why you'd add another domain-specific command to a container manager, going beyond the scope of what the tool was designed for in the first place.
Can't say I'm a fan of packaging models as Docker images. It feels forced - a solution in search of a problem.

The existing stack - a server and a model file - works just fine. There doesn't seem to be a need to jam an abstraction layer in there; the core problem Docker solves just isn't present.
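For contrast, the whole existing stack fits in a few lines. A minimal sketch, assuming llama.cpp's llama-server and a local GGUF file (file name and port are just examples):

    # shell: llama-server -m ./llama-3-8b.gguf --port 8080
    # then hit the OpenAI-compatible endpoint it exposes:
    import requests

    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={"messages": [{"role": "user", "content": "Say hi"}]},
        timeout=120,
    )
    print(resp.json()["choices"][0]["message"]["content"])

A file on disk plus one process; there's no image to build or registry to run.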
I'm going to take a contrarian perspective to the theme of the comments here...

There are currently very good uses for this, and there are likely to be more. Increasing numbers of large generative AI models are used in technical design work (e.g., semiconductor rules-based design/validation, EUV mask design, design optimization). Many, if not most, don't need to run all the time. Some have licensing based on running time, credits, etc. Some are just huge and intensive but aren't run very often in the design flow. Many run on the cloud, but industrial customers are reluctant to run them on someone else's cloud.

Being able to have my GPU cluster/data center run a ton of different, smaller models during the day or early in the design, and then be turned over to a full CFD or validation run as the office staff goes home, seems to me to be useful - especially if you are in any way billed by your vendor based on run time or similar. It can mean a more flexible hardware investment. The use case here is going to be Formula 1 teams, silicon vendors, etc. - not pure tech companies.
I have used Replicate's Cog, built on Docker, fairly heavily, and find it a decent compromise of features. Docker taking this use case more seriously is quite welcome, though surprisingly late. Local Metal GPU support (where available to containerized application APIs), which Cog currently lacks, is attractive, though it would require generalizing application code so the same container can execute via CUDA, Metal, etc.
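For anyone who hasn't tried it: Cog wraps a predictor class plus a cog.yaml into a Docker image. A minimal sketch of the predictor side (load_model is a hypothetical stand-in for whatever framework actually loads your weights):

    # predict.py -- packaged into an image with `cog build`
    from cog import BasePredictor, Input

    class Predictor(BasePredictor):
        def setup(self):
            # Runs once at container start; load weights here.
            self.model = load_model("./weights.bin")  # hypothetical helper

        def predict(self, prompt: str = Input(description="Text prompt")) -> str:
            return self.model.generate(prompt)

Something equivalent that could target Metal as well as CUDA at container runtime is the missing piece.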