This is really interesting. For SOTA inference systems, I've seen two general approaches:

* The "stack-centric" approach, such as the vLLM production stack, AIBrix, etc. These set up an entire inference stack for you, including KV cache management, routing, and so on.

* The "pipeline-centric" approach, such as NVIDIA Dynamo, Ray, and BentoML. These give you more of an SDK, so you can define inference pipelines and then deploy them on your specific hardware (a rough sketch of that pattern is below).

It seems like llm-d is the former. Is that right? What prompted you to go in that direction instead of Dynamo's?
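For anyone skimming, here's roughly what I mean by "pipeline-centric". This is a purely hypothetical SDK (not Dynamo's, Ray's, or BentoML's actual API), just illustrating the define-a-pipeline-then-deploy-it style:

```python
# Hypothetical pipeline-centric SDK sketch -- not any real framework's API,
# just the "define a pipeline, then map it onto your hardware" pattern.
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    """One step of the inference pipeline, pinned to a hardware pool."""
    name: str
    fn: Callable[[str], str]
    placement: str  # e.g. "prefill-gpus", "decode-gpus", "cpu"


class Pipeline:
    """Chains stages; a real SDK would dispatch each stage to its placement group."""

    def __init__(self, *stages: Stage):
        self.stages = stages

    def run(self, request: str) -> str:
        out = request
        for stage in self.stages:
            out = stage.fn(out)  # chained locally here; remote calls in a real deployment
        return out


# Define the pipeline once...
pipeline = Pipeline(
    Stage("tokenize", lambda req: f"tokens({req})", placement="cpu"),
    Stage("prefill", lambda toks: f"kv_cache({toks})", placement="prefill-gpus"),
    Stage("decode", lambda kv: f"completion({kv})", placement="decode-gpus"),
)

# ...then deploy/run it against whatever hardware layout you have.
print(pipeline.run("Why is the sky blue?"))
```

The stack-centric systems, by contrast, ship that whole layout (KV cache, routing, scheduling) as an opinionated stack rather than leaving the composition to you.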