
Image-to-Image Translation with Conditional Adversarial Nets

317 points · by cruisestacy · over 8 years ago

15 comments

jawns · over 8 years ago
The "sketches to handbags" example, which is buried toward the bottom, is really cool. It's basically an extension of the "edges to handbags," but with hand-drawn sketches.

Even though the sketches are fairly crude, with no shading and a low level of detail, many of the generated images look like they could, in fact, be real handbags. They still have the mark of a generated image (e.g. weird mottling) but they're totally recognizable as the thing they're meant to be.

The "sketches to shoes" example, on the other hand, reveals some of the limitations. Most of the sketches use poor perspective, so they wouldn't match up well with edges detected from an actual image of a shoe. Our brains can "get the gist" of the sketches and perform some perspective translation, but the algorithm doesn't appear to perform any translation of the input (e.g. "here's a sketch that appears to represent a shoe, here's what a shoe is actually shaped like, let's fit to that shape before going any further"), so you end up with images where a shoe-like texture is applied to something that doesn't look convincingly like a real shoe.
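For context, the "edges to handbags" inputs the comment above contrasts with are automatic edge maps extracted from photos. A minimal gradient-magnitude sketch of such an edge map, in plain numpy (illustrative only — the paper's datasets use a learned edge detector, and this naive Sobel loop is an assumption for clarity, not the authors' pipeline):

```python
import numpy as np

def sobel_edges(img):
    """Gradient-magnitude edge map of a 2-D grayscale image.

    Inputs of roughly this kind (edges from a real photo) feed
    the edges-to-image models; hand-drawn sketches differ from
    them, which is the mismatch the comment points out.
    """
    kx = np.array([[-1.0, 0.0, 1.0],
                   [-2.0, 0.0, 2.0],
                   [-1.0, 0.0, 1.0]])
    ky = kx.T
    h, w = img.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    return np.hypot(gx, gy)
```

A hand-drawn sketch with wrong perspective produces a perfectly valid edge map by this measure, which is why the generator happily paints shoe texture onto an un-shoe-like shape.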
aexaey · over 8 years ago
Truly impressive overall. Unfortunately, it looks like the training set was way too small. Look, for example, at the reconstruction of #13 here:

https://phillipi.github.io/pix2pix/images/index_facades2_loss_variations.html

Notice the white triangles (image-crop artifacts) present on the original image, yet completely absent from the net's input image. They reappear on the output of 3 (4, even?) out of 5 nets despite the lack of any corresponding cue in the input image. It looks like the network cheated a bit here, i.e. took advantage of the small set size and memorized the input image as a whole, then recognized and recalled this very image (already seen during training) rather than actually reconstructing it purely from the input.

The same happens (though less prominently) for other images where the "ground truth" image was cropped.
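One crude way to test the memorization hypothesis above is a nearest-neighbor probe: compare a generated output against every training image and see whether the closest match is suspiciously exact. A sketch (the function name and the idea of thresholding the minimum distance are illustrative assumptions, not something from the comment or the paper):

```python
import numpy as np

def nearest_train_distance(output, train_images):
    """Min L2 distance from a generated image to any training image.

    A near-zero minimum suggests the net recalled a memorized
    training image rather than reconstructing from its input.
    """
    flat_train = train_images.reshape(len(train_images), -1)
    dists = np.linalg.norm(flat_train - output.ravel(), axis=1)
    return float(dists.min())
```

If reappearing crop artifacts in an output coincide with a near-zero distance to one training image, that is strong evidence of recall rather than reconstruction.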
mshenfield · over 8 years ago
Just want to throw out that none of these applications are new. What is novel about their approach is that, instead of learning a mapping function using a hand-picked function to quantify accuracy for each problem, they also have a mechanism for choosing the function that quantifies accuracy. I haven't grokked the paper to see how they do it, but that is pretty neat IMO.
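The "mechanism for choosing the function that quantifies accuracy" is the conditional GAN's discriminator: it is trained alongside the generator and acts as a learned loss, combined with a fixed L1 term. A toy numpy sketch of the generator's objective (λ = 100 follows the paper's default; the function names are illustrative, not the authors' code):

```python
import numpy as np

def bce(pred, target):
    # binary cross-entropy: the adversarial (learned-loss) term
    eps = 1e-7
    pred = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

def generator_loss(d_out_on_fake, fake, real, lam=100.0):
    """pix2pix-style generator objective: adversarial + lambda * L1.

    d_out_on_fake: discriminator probabilities for (input, fake) pairs.
    The adversarial term rewards fooling D into predicting "real" (1);
    the L1 term keeps the output close to the ground truth.
    """
    adv = bce(d_out_on_fake, np.ones_like(d_out_on_fake))
    l1 = np.mean(np.abs(fake - real))
    return adv + lam * l1
```

Because the discriminator keeps adapting as the generator improves, no per-task hand-designed loss is needed; only the L1 weight is fixed by hand.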
ragebol · over 8 years ago
Interesting.

What I like about the "Day to Night" example is that it clearly demonstrates that these sorts of networks lack common sense. It puts lights where there are clearly (to humans with common sense, at least) no things that can produce light, e.g. in the middle of a roof or in a tree. Of course, there can be, but it's fairly uncommon.

And the opposite as well: no lights where a human would totally expect a light, e.g. on the front of buildings or on top of, well, lighting poles.
sebleon · over 8 years ago
This is awesome!

Makes me wonder how this could apply to image and video compression. You could send over the semantic-segmentation version of an image or video, and the system on the other end would use this technique to reconstruct the original.
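To put rough numbers on the idea above: a label map is far cheaper to transmit than raw pixels, since each pixel only needs enough bits to index a class. A back-of-envelope sketch (the 19-class figure is borrowed from the Cityscapes label set and is an assumption for illustration, not something from the thread):

```python
import math

def bits_per_pixel_rgb():
    # uncompressed RGB: 8 bits per channel, 3 channels
    return 24

def bits_per_pixel_labels(num_classes):
    # a segmentation map needs only ceil(log2(K)) bits per pixel,
    # before any entropy coding
    return math.ceil(math.log2(num_classes))
```

A 19-class label map costs about 5 bits/pixel versus 24 for raw RGB, with the receiver's generator hallucinating the texture back. The catch, of course, is that the reconstruction is plausible rather than faithful, so this is lossy in a semantic rather than signal sense.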
verytrivial · over 8 years ago
Does anyone else have the feeling that, on the current trajectory, from something exactly like this but with perhaps a million times the amount of feedback and data, *thought* will just *emerge*? Yes, this is all 2D and abstract/selective training sets, etc., but what if AI is the ultimate fake-it-until-you-make-it?
willcodeforfoo · over 8 years ago
The Aerial-to-Map example looks like it may be useful for automatic map/satellite rectification/georeferencing, but I'm not sure how efficient it'd be if it has to compare against a large area.

Does anyone have any experience in this area?
bflesch · over 8 years ago
I feel this could potentially revolutionize creative processes, for example in the clothing industry. You just draw up a purse or a shoe, let the machines generate dozens of variants (with pictures), and then you only have to filter and rank them.

You could pipe these product sketches directly into focus groups who tell you which product is most likely to sell. You no longer need a massive staff to come up with product variants.
iraphael · over 8 years ago
Besides being a cool new application of GANs, I don't see how this architecture is much different from normal GANs. Anyone else have thoughts?
amelius · over 8 years ago
I wonder how well this scales to a larger domain of interest. E.g., if the neural net needs to know not only about cars and nature but also about more topics such as people, faces, computers, gastronomy, Santa Claus, Halloween, etcetera, how does the neural net scale? And how should its topology be extended under such scaling?
romaniv · over 8 years ago
Kudos for providing proper examples of the network doing its thing, both good and bad. This is what all researchers ought to do. Too many papers these days handpick a couple of the coolest-looking results and stop at that.

...

I get the feeling this could be used in game design to do some really cool stuff with map and texture generation.
rosstex · over 8 years ago
I'm enrolled in Efros' computational photography course this semester, and Tinghui and Jun-Yan are the GSIs. It's fantastic to experience the bridge between teaching and cutting-edge research!
mmastrac · over 8 years ago
This is an absolutely incredible result. All of this stuff would have been considered insanely advanced AI ten years ago, but now we look at it and say "this is just stuff computers can do".

We've got the pieces of visual processing and imagination here, and the pieces of language input/output as part of Google's work. It feels like we just need to make some progress on an "AI executive" before we can get a real, interactive, human-like machine.
hanoz · over 8 years ago
I'm interested in having a play. As an out-and-out ML newbie, is there such a thing as an AWS image I could run on a GPU instance and then just git clone and go?
oluckyman · over 8 years ago
Neural nets! Is there anything they can't do?