Some interesting limitations here, but overall great work. The output is only 256x256, which doesn't seem useful in the short run. It would be interesting to see if there was another deep learning network that could fill in the pixels and net a higher resolution. It also looks like it uses models specifically trained for that given category (i.e. birds, flowers). I wonder how it would fair trying to visualize a distinct animal / flower that we only have a written description of.
The Github page shows a nice overview and some examples: <a href="https://github.com/hanzhanggit/StackGAN" rel="nofollow">https://github.com/hanzhanggit/StackGAN</a>. Interestingly it seems to struggle with the birds' legs (e.g. merging them with branches, blurring them or just not generating them). Is this is a problem with the size of the feature or the colors or ...? I know very little about this field but I find it fascinating.