This is pretty cool. It looks very smooth and natural. I think it's interesting that work is starting to be published with responding to poor English as an explicit use case. (The paper describes this as being stable to synonyms.)<p>Reading a few comments below: what are these fine wireframe guys/gals good for? Lots, including being fed into a ControlNet as poses for image generation. Stability of the rendered frames is an ongoing, rapidly improving area of research. But these outputs look really nice, and would fit nicely into a lot of text -> animation workflows.