In the demo I put the Obama prank photo (http://karpathy.github.io/2012/10/22/state-of-computer-vision/) and asked "Why is this picture funny?" It responded: "Question: Why is this picture funny? Answer: President Obama is taller than the average person."
I always like to try these zero-shot models on things outside of the "normal" COCO classes. Here are some chess board queries:

Counting: https://imgur.com/KTuQ1Bv

Parse the chess board: https://imgur.com/2zYFK1P

(Result): https://imgur.com/Ei4MAl7

Few-Shot Object Detection (Pascal VOC): https://imgur.com/gZkDMn8

Few-Shot Object Detection (simplified): https://imgur.com/Hk8QGMd

Not quite there yet. I've been more impressed with the other new zero-shot multimodal models like Grounding DINO and Azure Dense Captioning. Really looking forward to putting multimodal GPT-4 through its paces as well.
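For anyone wondering how the few-shot queries get fed in: a rough sketch of the interleaved image/text prompting, based on the OpenFlamingo README at the time. It assumes model, image_processor, and tokenizer have already been created with open_flamingo's create_model_and_transforms; the filenames and the in-context answer text are placeholders standing in for the chess/VOC screenshots above, not the exact prompts used.

    import torch
    from PIL import Image

    # Placeholder filenames -- in practice these were the chess board / VOC screenshots.
    demo_image = Image.open("fewshot_example.png")
    query_image = Image.open("chess_board.png")

    # Vision input shape is (batch, num_images, frames, channels, height, width);
    # frames is 1 for still images.
    vision_x = torch.cat(
        [image_processor(demo_image).unsqueeze(0), image_processor(query_image).unsqueeze(0)],
        dim=0,
    ).unsqueeze(1).unsqueeze(0)

    # Text side: each image is referenced with an <image> token and in-context
    # examples are separated with <|endofchunk|>.
    tokenizer.padding_side = "left"
    lang_x = tokenizer(
        ["<image>Output: two pawns and one rook.<|endofchunk|><image>Output:"],
        return_tensors="pt",
    )

    generated = model.generate(
        vision_x=vision_x,
        lang_x=lang_x["input_ids"],
        attention_mask=lang_x["attention_mask"],
        max_new_tokens=20,
        num_beams=3,
    )
    print(tokenizer.decode(generated[0]))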
Even at this scale the model is able to answer questions fairly impressively, but I created an image with some distinct shapes in different positions and it didn't go well [0]. I suspect that however they're doing the image encoding, it doesn't capture positional information, which, to my mind, limits a lot of use cases.

[0] https://i.postimg.cc/GtrGs8mw/Screenshot-2023-03-28-at-5-19-55-PM.png
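A minimal way to build that kind of positional probe, if anyone wants to reproduce it (this is a hypothetical reconstruction of the test, not the image in the screenshot above): draw a few distinct shapes at known positions and check whether the model's answers track the actual layout.

    from PIL import Image, ImageDraw

    # A few distinct shapes at known positions (left / center / right).
    img = Image.new("RGB", (448, 224), "white")
    draw = ImageDraw.Draw(img)
    draw.ellipse((30, 60, 130, 160), fill="red")                     # red circle on the left
    draw.rectangle((180, 60, 280, 160), fill="blue")                 # blue square in the center
    draw.polygon([(380, 60), (330, 160), (430, 160)], fill="green")  # green triangle on the right
    img.save("position_probe.png")

    # Then prompt along the lines of (hypothetical wording):
    # "<image>Question: Which shape is on the left? Answer:"
    # and see whether the answer matches the actual positions.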
This is awesome work, and they also provide their 9B OpenFlamingo model, which is based on LLaMA:

https://huggingface.co/openflamingo/OpenFlamingo-9B
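Roughly how the checkpoint is meant to be loaded, per the model card/README at the time (the LLaMA-7B weights have to be obtained separately; the paths below are placeholders):

    import torch
    from huggingface_hub import hf_hub_download
    from open_flamingo import create_model_and_transforms

    # Build the architecture: CLIP ViT-L/14 vision encoder + LLaMA-7B language
    # model, with cross-attention layers inserted every 4 decoder blocks.
    model, image_processor, tokenizer = create_model_and_transforms(
        clip_vision_encoder_path="ViT-L-14",
        clip_vision_encoder_pretrained="openai",
        lang_encoder_path="<path to llama-7b-hf>",  # placeholder path
        tokenizer_path="<path to llama-7b-hf>",     # placeholder path
        cross_attn_every_n_layers=4,
    )

    # Download and load the OpenFlamingo-9B checkpoint (only the newly trained
    # Flamingo-specific weights are in the checkpoint, hence strict=False).
    checkpoint_path = hf_hub_download("openflamingo/OpenFlamingo-9B", "checkpoint.pt")
    model.load_state_dict(torch.load(checkpoint_path), strict=False)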