I found a losslessly compressed version:
<a href="https://github.com/LeanModels/Bagel-DFloat11">https://github.com/LeanModels/Bagel-DFloat11</a><p>Following the README instructions, it works (at least on Ubuntu) on my RTX 3090 with 24 GB of memory, but just barely: I have to close most other windows and lower the screen resolution to free up enough GPU memory to load the model. Once it's loaded, it generates or edits an image in 2-3 minutes. This is my only GPU, and I'm using the browser interface in Chrome on the same machine.<p>The original release won't run on this hardware, but the compressed one is supposed to give identical results.
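A rough back-of-envelope sketch of why the compressed version can squeeze onto a 24 GB card when the original can't. The inputs are assumptions, not figures from the comment: roughly 14B total parameters, 16 bits per weight for BF16, and about 11 bits per weight for DFloat11 (going by the format's name). It only counts weights; activations, other model components, and the desktop's own framebuffer (hence closing windows and lowering resolution) eat into the remaining headroom.

```python
# Back-of-envelope weight-memory estimate (weights only, no activations).
# ASSUMPTIONS, not from the comment: ~14B total params, BF16 = 16 bits/weight,
# DFloat11 ~= 11 bits/weight on average.
ASSUMED_PARAMS = 14e9  # hypothetical total parameter count
GPU_MEMORY_GB = 24     # RTX 3090 VRAM


def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


bf16_gb = weight_memory_gb(ASSUMED_PARAMS, 16)
df11_gb = weight_memory_gb(ASSUMED_PARAMS, 11)

print(f"BF16 weights:     {bf16_gb:.2f} GB, fits in {GPU_MEMORY_GB} GB: {bf16_gb < GPU_MEMORY_GB}")
print(f"DFloat11 weights: {df11_gb:.2f} GB, fits in {GPU_MEMORY_GB} GB: {df11_gb < GPU_MEMORY_GB}")
```

Under these assumptions the BF16 weights alone exceed 24 GB, while the compressed weights leave only a few GB for everything else, which is consistent with "just barely."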
I'm interested in potential alternatives to ChatGPT's advanced voice mode. When I see the word "multimodal" I'm hopeful the model understands text + voice but instead it almost always seems to refer to text + images. Is there a keyword that I can use to look for models that work with voice similar to ChatGPT's advanced voice mode?
This looks exciting! There is a serious dearth of high-quality open-source models with multimodal capabilities, so I'm really looking forward to playing with this one.<p>Has anyone here experimented with fine-tuning this for domain-specific applications?
The demo shows pretty weak performance compared to other small models. It misunderstood my question by picking an uncommon interpretation of it. After I clarified what I wanted, it lost all the context I had provided in the previous message. My benchmark query is intentionally ambiguous; I use it to see how models handle ambiguity, deal with information that may be outdated, and avoid hallucination. Weak models usually just hallucinate an answer, but this model was the first that wasn't able to understand the question at all.
Good summary of the paper:
<a href="https://x.com/build__ship/status/1926930191185580176" rel="nofollow">https://x.com/build__ship/status/1926930191185580176</a>
A quick test with the "demo" link doesn't show it to be "as smart" as it appeared in the examples on the page. I really hope it does everything it promises, but I'm skeptical so far.
> Scala<i>b</i>le Perceptu<i>a</i>l <i>G</i>enerative Mod<i>el</i><p>If you wanna call it Bagel, just call it Bagel. No need to make up a justification.