Let's say I wanted to build an API that lets users upload images; the API would fine-tune Stable Diffusion on them and return either a checkpoint or another API that lets them run inference on the fine-tuned model. Does anyone have any architecture considerations/issues they'd suggest?

Two things I'm considering:

- Would the main problem with this ad-hoc GPU approach be cold boot? Loading would take a shit ton of time. Though with data center network speeds that wouldn't be too much of an issue - the fine-tuning itself would likely dwarf boot times.

- Is it possible to launch remote GPU instances ad-hoc from code? Is there a service that provides this? Every time a call is made we'd spin up a GPU.

Maybe the best approach for a V1 is to use the AWS SDK or something similar to just launch instances as calls come in.

Appreciate the help!
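For the V1 AWS SDK route, a minimal sketch with boto3 of spinning up one GPU instance per request might look like the following. The AMI ID, instance type, and the run_finetune.sh script are assumptions (an image with the training code prebaked), not a definitive implementation:

```python
# Sketch: launch one GPU instance per fine-tune request with boto3.
# Assumes an AMI with the training code preinstalled; user data
# starts the job on boot and the instance terminates itself after.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

def launch_finetune_instance(job_id: str) -> str:
    user_data = f"""#!/bin/bash
/opt/trainer/run_finetune.sh {job_id}   # hypothetical job script
shutdown -h now
"""
    resp = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # assumption: prebaked training AMI
        InstanceType="g4dn.xlarge",       # single-GPU instance; adjust to taste
        MinCount=1,
        MaxCount=1,
        UserData=user_data,
        # Terminate rather than stop on shutdown so each job cleans itself up.
        InstanceInitiatedShutdownBehavior="terminate",
    )
    return resp["Instances"][0]["InstanceId"]
```

One way to cut the cold-boot cost you mention is to bake the model weights into the AMI (or an attached EBS snapshot) so each launch skips the multi-GB download entirely.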
I am using GitHub Actions with a local runner. It simply exposes an API via POST at /finetune (which takes a multipart file upload). I have not yet deployed it, as it was a weekend project. The first task was actually fine-tuning GPT-J-6B.
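Not their actual code, but a minimal sketch of what such an endpoint might look like; FastAPI and the enqueue_finetune_job helper are assumptions:

```python
# Sketch of a POST /finetune endpoint taking a multipart image upload.
# FastAPI is an assumption; the queueing helper is a stand-in stub.
import os
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

def enqueue_finetune_job(paths: list[str]) -> str:
    # Stand-in: hand the saved images to whatever runs the training
    # (a GitHub Actions runner, a queue, a fresh GPU instance, ...).
    return "job-0"

@app.post("/finetune")
async def finetune(images: list[UploadFile] = File(...)):
    os.makedirs("/tmp/uploads", exist_ok=True)
    paths = []
    for image in images:
        path = f"/tmp/uploads/{image.filename}"
        with open(path, "wb") as f:
            f.write(await image.read())
        paths.append(path)
    return {"job_id": enqueue_finetune_job(paths)}
```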
You can check which of these works best for you:

1. Kubeflow pipelines (a rough sketch of this option follows below)

2. Cloud Run using GPU instances

3. Knative training

4. Banana.dev for launching GPU-bound stuff without much cruft
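For option 1, here is a rough sketch of the training step as a Kubeflow (kfp v2) pipeline; the base image, accelerator type, and the component body are all placeholders, not a working trainer:

```python
# Sketch: a one-step Kubeflow pipeline that requests a GPU for
# fine-tuning. Everything inside the component is a placeholder.
from kfp import dsl

@dsl.component(base_image="pytorch/pytorch:2.1.2-cuda12.1-cudnn8-runtime")
def finetune(dataset_uri: str) -> str:
    # Placeholder: pull images from dataset_uri, fine-tune Stable
    # Diffusion (e.g. diffusers + LoRA), upload and return the checkpoint.
    return dataset_uri + "/checkpoint.safetensors"

@dsl.pipeline(name="sd-finetune")
def sd_finetune(dataset_uri: str):
    task = finetune(dataset_uri=dataset_uri)
    # Accelerator name is cluster-specific; T4 is just an example.
    task.set_accelerator_type("NVIDIA_TESLA_T4").set_accelerator_limit(1)
```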