GPT-4V is mind-blowing. It's surprising to me that it gets so little attention here on HN, because after playing around with it I get the same sense of excitement I got when I tried the original ChatGPT. Its understanding of what is going on in an image is leagues ahead of what we had until this point, ahead of Bard and basically everything else I've seen.<p>I tested it with a bunch of photos I took and images it could not have seen in its training data, and most of the time it nailed them perfectly. Its OCR capabilities are top notch, and they are combined with a spatial understanding of how text relates to other parts of the image. It can take a photo of a monthly wall calendar with handwritten scribbles on it and give you a list of events for each day. It can guess where a specific photo was taken just by analysing the elements present in the photo, like the foliage, architecture, car license plates, etc. (without being specifically prompted to do so). It can correctly identify multiple plants in the same photo. I gave it a photo of a Montessori set for teaching math (some wooden blocks with numbers and dots on them, no branding) and it guessed exactly what it was. And all of that is just from two days of testing.<p>Here are just a few examples:<p>[1] <a href="https://i.imgur.com/cV3dVOf.png" rel="nofollow noreferrer">https://i.imgur.com/cV3dVOf.png</a> - I gave it a screenshot of a boss battle from Final Fantasy VII. It correctly identified the party members and their stats, even though the text and labels are a bit all over the place.<p>[2] <a href="https://i.imgur.com/WeXhP7V.png" rel="nofollow noreferrer">https://i.imgur.com/WeXhP7V.png</a> - A photo I shot on my vacation that didn't really contain any major landmarks, and yet it still somehow figured out the exact location from the architecture (and house colors). 
I tried this game with several photos and it is very good at it, far better than I could ever be if I saw these photos for the first time.<p>[3] <a href="https://i.imgur.com/HgwYv6q.png" rel="nofollow noreferrer">https://i.imgur.com/HgwYv6q.png</a> - A screenshot of a worksheet from the Human Shader Project. I just asked it to solve it for given X/Y values and it did; its answer was 100% correct (here's the second part of its answer: <a href="https://i.imgur.com/RZF2r7v.png" rel="nofollow noreferrer">https://i.imgur.com/RZF2r7v.png</a>)<p>[4] <a href="https://i.imgur.com/12xg4qU.png" rel="nofollow noreferrer">https://i.imgur.com/12xg4qU.png</a> - A photo of a highly reflective microwave inside a shopping mall. This was given to me by a friend who shot it personally, and to be honest I didn't catch at first that this is a microwave, and yet GPT-4V figured that out.<p>[5] <a href="https://i.imgur.com/qSifni5.png" rel="nofollow noreferrer">https://i.imgur.com/qSifni5.png</a> - A good old-fashioned "find the path connecting one object to the other" puzzle. It correctly identified the right path (this one was taken from the internet, so there is a slight chance it saw it in the training data and somehow got the solution from the accompanying text, although I couldn't find any instance of it).<p>Edit: To confirm that [5] was not a fluke, I hand-drew my own version of this puzzle, took a picture, and uploaded it. GPT-4V nailed this one too: <a href="https://i.imgur.com/8NgWhzw.png" rel="nofollow noreferrer">https://i.imgur.com/8NgWhzw.png</a>