> FlexGen lowers the resource requirements of running
175B-scale models down to a single 16GB GPU and reaches a generation throughput of 1 token/s with an effective batch size
of 144.

I can't imagine what the LLM space will look like this time next year. Maybe LLMs natively integrated into games and browsers.