Um, wow. These are really, really good. They are not perfect, but the improvements in fidelity, open mouths, and eyes over a GAN-based approach are... really big.<p>This is the first paper I’ve seen with videos that compare a person with a re-render side by side, and it’s a nice way to see what the model’s good at and what it’s not.<p>Some perf numbers (which they say are unoptimized): 30-60 hrs on a 3080 for the avatar model, and rendering in the 20-40fps range on the same hardware. Basically good enough for a commercial implementation. I can’t find any mention of the latency of the CNN side, which is obviously a big question for chat scenarios, although not a big deal for pre-rendered scenes.