This is mostly a copy of the much better articles:

https://yang-song.github.io/blog/2021/score/

and

https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
Interesting. So am I understanding this right: if you use such a model, you essentially don't have much control over the output beyond it being similar to your training data, because your input is just white noise? Or is there a way to combine this with another model that would let you generate images from inputs like 'dog with party hat'?
I followed with interest until this sentence: “Where β_1, ..., β_T is a variance schedule (either learned or fixed) which, if well-behaved, ensures that x_T is nearly an isotropic Gaussian for sufficiently large T.”