I.e. a cloud instance that behaves exactly like a normal GPU instance, but where the GPU is shared behind the scenes. The advantage is that you only pay while a process is actively using the GPU, not for the whole time the instance is running. The downside is that any GPU task would take ~25% longer to run.
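A rough back-of-the-envelope sketch of that trade-off, in case it helps. It assumes (this is not stated above) that both kinds of instance charge the same per-second rate, so only the billed seconds differ. The function names are made up for illustration.

```python
def billed_gpu_seconds(active_seconds, wall_seconds, slowdown=0.25):
    """Return (dedicated, shared) billed seconds for one session.

    Assumption: same per-second price for both instance types.
    """
    # Dedicated instance: billed for the whole time the instance is running.
    dedicated = wall_seconds
    # Shared instance: billed only while the GPU is in use, but each GPU
    # task takes ~25% longer, so the active time is stretched.
    shared = active_seconds * (1.0 + slowdown)
    return dedicated, shared


def break_even_utilization(slowdown=0.25):
    """Fraction of wall time spent on the GPU at which both cost the same."""
    return 1.0 / (1.0 + slowdown)


# One-hour session with 10 minutes of actual GPU work.
dedicated, shared = billed_gpu_seconds(active_seconds=600, wall_seconds=3600)
print(dedicated, shared)
print(break_even_utilization())
```

Under those assumptions the shared instance comes out cheaper as long as the GPU is busy for less than about 80% of the time the instance is up.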
For anyone curious, here is an early prototype of this tech in action:

https://imgur.com/a/2qPN4ru

Would love to hear your thoughts on how we can make this most useful for you!
Sounds like an extremely complex technical problem. I'd also suggest looking at the use cases where this is actually needed. One problem is that loading weights into the GPU is slow enough that it will be really hard to share the GPU between different processes, since every switch means a long wait to offload one process's weights and load the next. Would love to learn more about what you do.
Yes, I have wanted something like this for a while. I try to avoid using GPUs where possible because of the expense and the ephemeral nature of my use.