
TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.


Open Flamingo – open framework to train multimodal LLMs

265 points by mpaepper about 2 years ago

7 comments

ftxbro about 2 years ago
In the demo I put the Obama prank photo http://karpathy.github.io/2012/10/22/state-of-computer-vision/ and asked "Why is this picture funny?" and it responded "Question: Why is this picture funny? Answer: President Obama is taller than the average person."
Comment #35348461 not loaded
Comment #35355874 not loaded
Comment #35348203 not loaded
yeldarb about 2 years ago
I always like to try these zero-shot models on things outside of the "normal" COCO classes. Here are some chess board queries:

Counting: https://imgur.com/KTuQ1Bv

Parse the chess board: https://imgur.com/2zYFK1P

(Result): https://imgur.com/Ei4MAl7

Few-Shot Object Detection (Pascal VOC): https://imgur.com/gZkDMn8

Few-Shot Object Detection (simplified): https://imgur.com/Hk8QGMd

Not quite there yet. I've been more impressed with the other new zero-shot multimodal models like Grounding DINO and Azure Dense Captioning. Really looking forward to putting multimodal GPT-4 through its paces as well.
Comment #35351248 not loaded
vagabund about 2 years ago
Even at this scale the model's able to answer questions fairly impressively, but I created an image with some distinct shapes in different positions and it didn't go well [0]. I think however they're doing the image encoding doesn't capture positional information which, to my mind, limits a lot of use cases.

[0] https://i.postimg.cc/GtrGs8mw/Screenshot-2023-03-28-at-5-19-55-PM.png
Comment #35348879 not loaded
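The positional-information point above can be illustrated with a toy sketch. This is not OpenFlamingo's actual encoder, just a minimal numpy example of why any order-invariant pooling of per-patch features (mean pooling here, as an assumed stand-in) necessarily discards where each patch was in the image:

```python
import numpy as np

rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 64))  # 16 image patches, 64-dim features each

pooled = patches.mean(axis=0)                        # pool in original order
pooled_shuffled = patches[rng.permutation(16)].mean(axis=0)  # shuffle patches first

# Mean pooling is permutation-invariant: rearranging the shapes in the
# image yields the same pooled feature, so position is unrecoverable.
print(np.allclose(pooled, pooled_shuffled))  # True
```

Adding positional encodings to each patch before pooling (or keeping per-patch tokens, as ViT-style encoders do) is the usual way to retain layout information.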
mpaepper about 2 years ago
This is awesome work and they also provide their 9B OpenFlamingo model which is based on Llama:

https://huggingface.co/openflamingo/OpenFlamingo-9B
dfrankle about 2 years ago
What are the key features of Open Flamingo, and how does it compare to other frameworks for training multimodal LLMs?
juxtaposicion about 2 years ago
What’re the techniques that’ll get this to run on a single GPU?
Comment #35352925 not loaded
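The usual answers (generic techniques, not confirmed in this thread) are half precision (fp16/bf16), 8-bit or 4-bit weight quantization, and CPU offloading. Back-of-envelope arithmetic for a 9B-parameter model shows why precision is the main lever:

```python
# Approximate weight memory for a 9B-parameter model at common precisions.
# Weights only; activations and the KV cache add more on top.
params = 9e9
bytes_per_param = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}

for name, nbytes in bytes_per_param.items():
    gib = params * nbytes / 2**30
    print(f"{name:>9}: {gib:5.1f} GiB")
```

At fp32 the weights alone (~33.5 GiB) exceed a single consumer GPU; fp16 (~16.8 GiB) fits a 24 GB card, and int8 (~8.4 GiB) fits a 12 GB card, leaving headroom for activations.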
duxup about 2 years ago
That title is pretty impressive/big on mobile!
Comment #35350140 not loaded