I have a solution to a set of problems that keep showing up in ML workloads. The systems I'm talking about are ones where:<p>- You have a GPU attached to each instance.<p>- Each request takes anywhere from 10ms to 2min.<p>- There's a hard limit on the number of in-flight requests/queries per instance (I assume because of the GPUs).<p>Normally, I see people fronting the instances with software load balancers, but this doesn't work very well for reasons. Assuming I have a solution in the form of a fancy load balancer, how would I go about monetizing it? Let's assume the solution is non-trivial to create but very straightforward to use (essentially a drop-in replacement).<p>I ask because I don't think I can just "sell a fancy load balancer" like it's the late 90s or something. Modern companies always seem to have more complicated products, and I just want to sell a straightforward piece of infrastructure that solves a fairly hard problem.<p>Thanks in advance.
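To make the constraints in the post concrete, here is a minimal sketch of routing under a hard per-instance in-flight cap. It is not the poster's solution, which is not described. The `Backend` class, the `acquire`/`release` helpers, and the cap values are all hypothetical, and least-in-flight selection is swapped in purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    name: str
    max_in_flight: int  # assumed hard cap per GPU instance (hypothetical value)
    in_flight: int = 0


def acquire(backends: List[Backend]) -> Optional[Backend]:
    """Reserve a slot on the backend with the fewest in-flight requests.

    Returns None when every backend is at its cap, so the caller has to
    queue or reject instead of forwarding the request.
    """
    candidates = [b for b in backends if b.in_flight < b.max_in_flight]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda b: b.in_flight)
    chosen.in_flight += 1
    return chosen


def release(backend: Backend) -> None:
    """Free the slot once the request finishes (after 10ms or after 2min)."""
    backend.in_flight -= 1


if __name__ == "__main__":
    pool = [Backend("gpu-a", max_in_flight=2), Backend("gpu-b", max_in_flight=2)]
    held = [acquire(pool) for _ in range(4)]   # fills both instances
    print(acquire(pool))                       # pool is saturated
    release(held[0])
    print(acquire(pool).name)                  # the freed slot is reused
```

Because a slot can be held for anything between 10ms and 2min, counting requests alone says little about when capacity frees up; the sketch only shows the cap being enforced, not how to schedule well under it.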
Is there accessible documentation that covers installation and non-functional requirements (i.e., hardware/software requirements and how to set up and use the solution)?
What was done to assess "doesn't work very well for reasons"? That is, did you monitor the systems in question and see ......?<p>What were the "reasons" for "doesn't work very well"? For example, trying to do Google-search-type work on a 2mb Intel 486 over a 2mb network and expecting to compete with Google is never going to work out.<p>What type of load balancing? Load balancing typically has to be tuned/adjusted based on end-usage requirements and the production environment (not just left at factory settings).
> but this doesn't work very well for reasons.<p>Which reasons? In my experience/exposure, people are perfectly happy with Proxmox on big GPU-laden boxen.