The redditor was using img2img to do style transfer frame by frame, which is why the faces jump around constantly ("instability"): each frame is stylized independently, so nothing ties one frame's output to the next.<p>This isn't a limitation of neural nets; as early as 2018 we had stable style transfer for video, see <a href="https://medium.com/element-ai-research-lab/stabilizing-neural-style-transfer-for-video-62675e203e42" rel="nofollow">https://medium.com/element-ai-research-lab/stabilizing-neura...</a>
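To make the flicker mechanism concrete: here's a toy numpy sketch (not the redditor's actual pipeline, and not a real diffusion model) where "stylization" is just content plus a noise perturbation. Sampling fresh noise per frame mimics naive per-frame img2img and produces large frame-to-frame jitter; reusing one noise tensor across frames, a crude stand-in for temporal stabilization, does not.

```python
import numpy as np

rng = np.random.default_rng(0)

# A slowly changing "video": 10 frames of an 8x8 gray image drifting in brightness.
frames = [np.full((8, 8), 0.5) + 0.01 * t for t in range(10)]

def stylize(frame, noise):
    # Toy stand-in for img2img: the content plus a style perturbation.
    return frame + 0.2 * noise

# Naive per-frame img2img: fresh noise every frame -> temporal flicker.
naive = [stylize(f, rng.standard_normal(f.shape)) for f in frames]

# Crude "stabilized" variant: one shared noise tensor reused for every frame.
shared = rng.standard_normal(frames[0].shape)
stable = [stylize(f, shared) for f in frames]

def flicker(seq):
    # Mean absolute difference between consecutive frames.
    return float(np.mean([np.abs(a - b).mean() for a, b in zip(seq, seq[1:])]))

print(flicker(naive) > flicker(stable))  # independent sampling jitters far more
```

Real stabilization methods (like the 2018 work linked above) are smarter than reusing one noise tensor — they use optical flow and temporal-consistency losses to warp information between frames — but the failure mode they fix is exactly the one the naive loop exhibits here.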
Ghibli style is a real stretch. Even setting aside that "style" means more than just the colors on the frame (shot composition, what's in frame, where the camera lingers, etc.), the redditor didn't really succeed in capturing even one frame that had the visual style of a Ghibli frame.<p>Praising stuff like this feels... Not crass, exactly, but like... Unrefined and ersatz in a way. It's like buying a mall katana and displaying it like it's something of cultural value.
If I didn't know the context, it would look almost stylistically intentional. Crazy to think that a few years ago this would have taken untold tens of thousands of man-hours to plan and animate, would probably have gone viral and won a bunch of awards. Now it's just an interesting accident from fucking around.
Not even close to right or stable. It falls apart in the tighter shot on Geralt, when it no longer recognizes him as human, and the AI of course has no notion of 3D context, so it can't do cel shading the way it's being asked to.<p>This was honestly a great waste of compute power.<p>To the downvoters, do what you want, but it doesn't change the apparent reality of what we're seeing. AI image generation is an exciting and growing technology, but this isn't a remarkable or successful application of it in any measurable dimension. The system lacks temporal and spatial context, and that is going to be a hard blocker for this application.
Crazy. Just imagine what you could do with it. You could basically shoot some amateur scenes in your garden and have SD turn them into an anime masterpiece.