The approaches of the two papers are similar, but not identical. Preprint downloads:
- Show and Tell: A Neural Image Caption Generator: <a href="http://arxiv.org/pdf/1411.4555.pdf" rel="nofollow">http://arxiv.org/pdf/1411.4555.pdf</a> (Google)
- Deep Visual-Semantic Alignments for Generating Image Descriptions: <a href="http://cs.stanford.edu/people/karpathy/deepimagesent/devisagen.pdf" rel="nofollow">http://cs.stanford.edu/people/karpathy/deepimagesent/devisagen.pdf</a> (Stanford)