"Building Machines That Learn and Think Like People", 7 Years Later

106 points by che_shr_cat, over 1 year ago

8 comments

M4v3R, over 1 year ago
GPT-4V is mind-blowing. It is surprising to me that it gets so little attention here on HN, because after playing around with it I get the same sense of excitement I got when I tried the original ChatGPT. The level of understanding of what is going on in an image is leagues ahead of what we had until this point, ahead of Bard and basically everything else I have seen.

I tested it with a bunch of photos I took and images it could not have seen in its training data, and most of the time it nailed them perfectly. Its OCR capabilities are top-notch, and they are combined with a spatial understanding of how text relates to other parts of the image. It can take a photo of a monthly wall calendar with hand scribbles on it and give you a list of events for each day. It can guess where a specific photo was taken just by analysing the elements present in the photo, such as the foliage, architecture, car license plates, etc. (without being specifically prompted to do so). It can correctly identify multiple plants in the same photo. I gave it a photo of a Montessori set for teaching math (some wooden blocks with numbers and dots on them, no branding) and it guessed exactly what it was. And all of that just from two days of testing.

Here are just a few examples:

[1] https://i.imgur.com/cV3dVOf.png - I gave it a screenshot of a boss battle from Final Fantasy VII. It correctly identified the party members and their stats, even though the text and labels are a bit all over the place.

[2] https://i.imgur.com/WeXhP7V.png - A photo I shot on my vacation that didn't really contain any major landmarks, and yet it still somehow figured out the exact location from the architecture (and house colors). I tried this game with several photos and it is very good at it, far better than I could ever be if I saw these photos for the first time.

[3] https://i.imgur.com/HgwYv6q.png - A screenshot of a worksheet from the Human Shader Project. I just asked it to solve it for given X/Y values and it did; its answer was 100% correct (here's the second part of its answer: https://i.imgur.com/RZF2r7v.png).

[4] https://i.imgur.com/12xg4qU.png - A photo of a highly reflective microwave inside a shopping mall. This was given to me by a friend who shot it personally, and to be honest I didn't catch at first that this is a microwave, and yet GPT-4V figured that out.

[5] https://i.imgur.com/qSifni5.png - A good old-fashioned "find the path connecting one object to the other" puzzle. It correctly identified the right path (this one was taken from the internet, so there is a slight chance it saw it in the training data and somehow got the solution from the accompanying text, although I couldn't find any instance of it).

Edit: To confirm that [5] was not a fluke, I hand-drew my own version of this puzzle, took a picture and uploaded it. GPT-4V nailed this one too: https://i.imgur.com/8NgWhzw.png
mustafa_pasi, over 1 year ago
Could GPT-4V be used for robotic applications? I am a bit confused here. It produces text from an image, but how much actual understanding does it have? Can the output somehow be used to do image segmentation, object detection, and tracking?
scotty79, over 1 year ago
I was terrible at this task as a kid. When asked to describe an image I usually volunteered a single piece of information about it, and I had to be prompted multiple times and asked leading questions to observe and describe more of it. GPT-4V beats me even now. There was a lot of information in the descriptions that I wouldn't notice or include without being specifically asked about it.
two_in_one, over 1 year ago
I played with it. Some cool things:

It can write a poem from an image.

It can read text from an image and 'understand' it.

Or even a specific part of the text. You can say "look at the bottom line".

It can recognize and list songs from an album cover.

It recognizes famous paintings, even if only a fragment is given.

It can be used to create image-text datasets for generative and recognition tasks. Not sure how much this would cost.
sgt101, over 1 year ago
Do you think that these images were in the GPT4 training set?

Maybe...
jccalhoun, over 1 year ago
As a college professor, the descriptions from ChatGPT remind me a lot of freshman writing: they often flail around on extraneous details and have difficulty determining what is and isn't important.

It will be interesting to see how it improves.
Animats, over 1 year ago
Indeed, there has been much progress.

The next big fundamental problem is "hallucination", or being totally wrong without detecting it.
drewcoo, over 1 year ago
So it draws pictures like a grade school boy, a human whose "training set" involves TV and movies.

For seven years that doesn't seem unreasonable.