I ran an uncensored model on a CPU server. As expected, it's dead slow (a minute or two per query).<p>What kind of hardware (GPU) do I need to serve 1k RPS?<p>I couldn't find any APIs that serve uncensored models, which is what forced me to run one locally.
It depends on your model size and how many copies of it fit in memory. For a rough ballpark, multiply the model size by 1k and divide by the memory capacity of one device (GPU); the result is roughly how many devices you need.
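A minimal sketch of that rule of thumb, assuming one model copy per request and that the weights dominate memory use. The function name and the example numbers (a hypothetical 14 GB model, roughly 7B parameters at 16-bit, on hypothetical 80 GB cards) are illustrative only. It ignores batching, KV-cache memory, and how copies pack onto each card, so treat it as a crude ballpark, not a capacity plan.

```python
import math


def devices_needed(model_size_gb: float, copies: int, device_memory_gb: float) -> int:
    """Ballpark device count: memory for all model copies divided by
    memory per device, rounded up to a whole number of devices.

    Assumes (hypothetically) one model copy per request and that the
    model weights are the only significant memory cost.
    """
    if device_memory_gb <= 0:
        raise ValueError("device_memory_gb must be positive")
    total_gb = model_size_gb * copies  # "multiply the size by 1k"
    return math.ceil(total_gb / device_memory_gb)  # "divide by the memory capacity"


# Hypothetical example: 14 GB model, 1,000 copies, 80 GB per GPU.
print(devices_needed(14, 1000, 80))  # 14000 / 80 = 175
```

The result is rounded up because you can't buy a fraction of a GPU.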