One related property of GPT-3: it's very bad at traditional computational tasks.

* "Make a list of 20 items" results in a list, but the number of items is about as accurate as if you asked a toddler the same question.

* If you ask GPT-3 a simple combinatorics question, it will be 100% confident in the wrong answer.

Origami is sort of the same. It takes a conceptual understanding of how paper folds, which DALL-E Mini doesn't have. What it has is a feel for the general origaminess of a picture.

If I showed a human being who had never seen origami before a few pieces of origami, including a paper crane, and then asked them to draw some, they'd likely produce similar pictures.
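A quick way to test the list-length claim, as a minimal sketch: ask for exactly 20 items and count what comes back. This assumes the legacy OpenAI completions endpoint; the model name and the animals prompt are illustrative choices, not something stated above.

    # Minimal sketch: count how many items GPT-3 actually returns when asked
    # for exactly 20. Assumes the legacy OpenAI completions endpoint; model
    # name and prompt are illustrative.
    import os
    import re

    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    resp = openai.Completion.create(
        model="text-davinci-002",  # illustrative model choice
        prompt="Make a list of 20 animals, one per line, numbered 1-20.",
        max_tokens=256,
        temperature=0,
    )

    # Count the lines that look like numbered list items.
    lines = resp.choices[0].text.strip().splitlines()
    items = [l for l in lines if re.match(r"\s*\d+[.)]", l)]
    print(f"Requested 20 items, received {len(items)}.")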
I know this is a little bit banal, but I feel like:
(1) the author is thinking about "origami"
(2) the model is only able to create "pictures of origami"

The model can only ever be trained on pictures of origami. Thus the model can generate images that get close to "pictures of origami", but (as pictures are necessarily abstracted 2D projections) this might still be way, way off from actual "origami". Not knowing about real origami, only ever having seen pictures, I thought most of the generated images were quite good. An experienced origami folder doesn't see it that way.

I hope my thought is phrased clearly enough; I'm having trouble finding the right words here.
Semi-related question for those more familiar with current AI capabilities:

Has there been any attempt to "see" what dinosaurs looked like from their fossils, using known living animals and their skeletons as a training set?
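One plausible framing of this, as a hedged sketch: treat it as paired image-to-image translation, training on (skeleton photo, living-animal photo) pairs for extant species, then running inference on fossils. The directory layout, dataset class, and baseline mentioned below are assumptions for illustration, not a reference to an existing project.

    # Hedged sketch of the idea as paired image-to-image translation: train on
    # (skeleton photo, living-animal photo) pairs for known species, then run
    # inference on fossil skeletons. Directory layout and filenames are
    # assumptions, not an existing dataset.
    from pathlib import Path

    from PIL import Image
    from torch.utils.data import Dataset
    from torchvision import transforms

    class SkeletonToAnimal(Dataset):
        """Pairs skeletons/<name>.jpg with animals/<name>.jpg per species."""

        def __init__(self, root: str):
            self.root = Path(root)
            self.names = sorted(
                p.stem for p in (self.root / "skeletons").glob("*.jpg")
            )
            self.tf = transforms.Compose([
                transforms.Resize((256, 256)),
                transforms.ToTensor(),
            ])

        def __len__(self):
            return len(self.names)

        def __getitem__(self, i):
            name = self.names[i]
            x = self.tf(Image.open(self.root / "skeletons" / f"{name}.jpg").convert("RGB"))
            y = self.tf(Image.open(self.root / "animals" / f"{name}.jpg").convert("RGB"))
            return x, y  # condition on the skeleton, predict the living animal

    # A pix2pix-style U-Net generator trained with an L1 + adversarial loss
    # on these pairs would be the conventional baseline; at inference time,
    # feed an image of a fossil skeleton instead.

The obvious catch is that fossils preserve far less than a mounted museum skeleton, so the inference inputs would be well outside the training distribution.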
I don't mean this to sound overly negative, because I absolutely think DALL-E is a killer app amongst recent AI advances. But the thing that made DALL-E astonishing is that it was... good. While DALL-E Mini mimics a lot of the technical advances and you can kind of see what it's getting at with its outputs, they're still mostly garbage. Very clever garbage! But they lack the emotional impact that - woah! - this is doing something superhuman.

Obviously the hope is that somehow this and future advances can be democratised. It was funny that Asimov's The Last Question has been posted here a couple of times recently, because it makes such a big thing about world-sized computers and how advanced minicomputers would be. It's easy to read and scoff at the naivety... before realising we could easily be heading back in that direction for many impactful future technologies.
Honestly, I thought the images generated were actually pretty good. The shadows of the paper folds, the types of folds typically used. It all felt "close enough" to be very impressive for an AI model.
Checked the model, and the "model card" (https://huggingface.co/dalle-mini/dalle-mini#bias) is an interesting exercise in sensitivity absurdity:

"Bias

CONTENT WARNING: Readers should be aware this section contains content that is disturbing, offensive, and can propagate historical and current stereotypes."

Spoiler alert: nothing contained in that section requires a warning. It's just abstract descriptions of "potential" negative stereotypes in images.

"initial testing demonstrates that they may generate images that contain negative stereotypes against minoritized groups"

"Minoritized" is a new word for me. As though minority status is something actively attached to someone. But no duh, I can ask DALL-E to generate "images of klan members at a lynching" or "inner city police brutality" and get negative images.

"When the model generates images with people in them, it tends to output people who we perceive to be white, while people of color are underrepresented."

I'd like to see real testing, because from what I can tell this is not true. Ask for "white people" and you get weird abstract models of white figures. Ask for "black people" and you get beautiful photos of smiling black faces.

Is this the kind of exercise AI researchers have to concern themselves with these days?
Just tried all of the prompts from the OP's post on OpenAI's DALL-E 2: https://harishgarg.com/writing/generating-origami-images-using-dall-e-prompts/

DALL-E 2 beats Mini in almost all of them.
Some of the issues seemingly stem from the model's poor understanding (or outright misunderstanding) of the input language... I wonder what a fusion of DALL-E + GPT-3 or LaMDA would look like, with the text-based model performing prompt interpretation.

This may be a naïve thought, as my understanding of all the models mentioned is superficial at best.
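One way to picture that fusion, as a minimal sketch: have the text model rewrite the user's prompt into a literal scene description before the image model sees it. This assumes the legacy OpenAI completions endpoint for the rewrite step; generate_image() is a hypothetical stand-in for DALL-E / DALL-E Mini, not a real API.

    # Minimal sketch of the fusion: a text model rewrites the user's prompt
    # into a literal scene description before the image model sees it.
    # generate_image() is a hypothetical stand-in, not a real API.
    import os

    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    def interpret_prompt(user_prompt: str) -> str:
        """Ask the text model to expand an ambiguous prompt into an explicit scene."""
        resp = openai.Completion.create(
            model="text-davinci-002",  # illustrative model choice
            prompt=(
                "Rewrite this image-generation prompt as a literal, "
                f"unambiguous scene description:\n\n{user_prompt}\n\nRewritten:"
            ),
            max_tokens=100,
            temperature=0,
        )
        return resp.choices[0].text.strip()

    def generate_image(scene: str):
        raise NotImplementedError("stand-in for the image model")

    scene = interpret_prompt("an origami crane folded from a dollar bill")
    image = generate_image(scene)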