科技回声

8 comments

M4v3R, over 1 year ago

GPT-4V is mind-blowing. It's surprising to me that it gets so little attention here on HN, because after playing around with it I get the same sense of excitement I got when I tried the original ChatGPT. The level of understanding of what is going on in an image is leagues ahead of what we had until this point, ahead of Bard and basically everything else I have seen.

I tested it with a bunch of photos I took and images it could not have seen in its training data, and most of the time it nailed them perfectly. Its OCR capabilities are top-notch, but this is combined with a spatial understanding of how text relates to other parts of the image. It can take a photo of a monthly wall calendar with hand scribbles on it and give you a list of events for each day. It can guess where a specific photo was taken just by analysing the elements present in the photo, like the foliage, architecture, car license plates, etc. (without being specifically prompted to do so). It can correctly identify multiple plants in the same photo. I gave it a photo of a Montessori set for teaching math (some wooden blocks with numbers and dots on them, no branding) and it guessed exactly what it was. And all of that is from just two days of testing.

Here are just a few examples:

[1] https://i.imgur.com/cV3dVOf.png - Gave it a screenshot of a boss battle from Final Fantasy VII. It correctly identified the party members and their stats, even though the text and labels are a bit all over the place.

[2] https://i.imgur.com/WeXhP7V.png - A photo I shot on my vacation that didn't really contain any major landmarks, and yet it still somehow figured out the exact location from the architecture (and house colors). I tried this game with several photos and it is very good at it, far better than I could ever be if I saw these photos for the first time.

[3] https://i.imgur.com/HgwYv6q.png - A screenshot of a worksheet from the Human Shader Project. I just asked it to solve it for given X/Y values and it did; its answer was 100% correct (here's the second part of its answer: https://i.imgur.com/RZF2r7v.png).

[4] https://i.imgur.com/12xg4qU.png - A photo of a highly reflective microwave inside a shopping mall. This was given to me by a friend who shot it personally, and to be honest I didn't catch at first that this is a microwave, and yet GPT-4V figured that out.

[5] https://i.imgur.com/qSifni5.png - A good old-fashioned "find the path connecting one object to the other" puzzle. It correctly identified the right path (this one was taken from the internet, so there is a slight chance it saw it in the training data and somehow got the solution from the accompanying text, although I couldn't find any instance of it).

Edit: To confirm that [5] was not a fluke, I hand-drew my own version of this puzzle, took a picture and uploaded it. GPT-4V nailed this one too: https://i.imgur.com/8NgWhzw.png

mustafa_pasi, over 1 year ago

Could GPT-4V be used for robotic applications? I am a bit confused here. It produces text from an image, but how much actual understanding does it have? Can the output somehow be used to do image segmentation, object detection and tracking?

scotty79, over 1 year ago

I was terrible at this task as a kid. When asked to describe an image I usually volunteered a single piece of information about it, and I had to be prompted multiple times and asked leading questions to observe and describe more of it. GPT-4V beats me even now. There was a lot of information in the descriptions that I wouldn't notice or include without being specifically asked about it.

two_in_one, over 1 year ago

I played with it. Some cool things:

It can write a poem from an image.

It can read text from an image and 'understand' it. Or even a specific part of the text: you can say "look at the bottom line".

It can recognize and list the songs from an album cover.

It recognizes famous paintings, even if only a fragment is given.

It can be used to create image-text datasets for generative and recognition tasks. Not sure how much this would cost.

sgt101, over 1 year ago

Do you think that these images were in the GPT-4 training set? Maybe...

jccalhoun, over 1 year ago

As a college professor, I find the descriptions from ChatGPT remind me a lot of freshman writing: they often flail around on extraneous details and have difficulty determining what is and isn't important.

It will be interesting to see how it improves.

Animats, over 1 year ago

Indeed, there has been much progress.

The next big fundamental problem is "hallucination", or being totally wrong without detecting it.

drewcoo, over 1 year ago

So it draws pictures like a grade school boy, a human whose "training set" involves TV and movies.

For seven years that doesn't seem unreasonable.

"Building Machines That Learn and Think Like People", 7 Years Later
