TechEcho

Bagel: Open-source unified multimodal model

219 points by tosh, 4 days ago

15 comments

jjrv, 4 days ago
I found a losslessly compressed version: https://github.com/LeanModels/Bagel-DFloat11
Following the README instructions, it works, at least on Ubuntu, on my RTX 3090 GPU with 24 GB of memory, but just barely. I have to close most other windows and lower the screen resolution to be able to load the model. It then generates or edits images in 2-3 minutes. I only have this one GPU, and I am using Chrome for the browser interface on the same machine.
The original release won't run on this hardware, but the compressed one is supposed to give identical results.
spuz, 4 days ago
I'm interested in potential alternatives to ChatGPT's advanced voice mode. When I see the word "multimodal", I'm hopeful that the model understands text + voice, but it almost always seems to refer to text + images. Is there a keyword I can use to look for models that work with voice the way ChatGPT's advanced voice mode does?
akacrobat, 4 days ago
This looks exciting! There is a serious dearth of high-quality open-source models with multimodal capabilities, so I'm really looking forward to playing with this one.
Has anyone here experimented with fine-tuning this for domain-specific applications?
charcircuit, 4 days ago
The demo shows pretty weak performance compared to other small models. It misunderstood my question by picking an uncommon way to interpret it. After I clarified what I wanted, it lost all the context I had provided in the previous message. My benchmark query is intentionally ambiguous, and I use it to see how models handle ambiguity, handle information that can be outdated, and avoid hallucination. Usually weak models will just hallucinate an answer, but this model was the first that wasn't able to understand the question.
LourensT, 4 days ago
These days, papers come with an advertisement video
pleone, 4 days ago
It's from the ByteDance team, right? The team behind TikTok, CapCut, BuzzVideo and more. Any thoughts on that?
akoculu, 4 days ago
Good summary of the paper: https://x.com/build__ship/status/1926930191185580176
mdrzn, 4 days ago
A quick test via the "demo" link doesn't show it to be "as smart" as it appeared in the demos on the page. I really hope it does everything it promises, but I'm skeptical so far.
moffkalast, 4 days ago
Oh no, it's The Everything Bagel.
mnky9800n, 4 days ago
I couldn't find it. What are the hardware expectations for Bagel?
GrantMoyer, 4 days ago
Nice, it's really an open-source model: Apache 2.0.
wsintra2022, 4 days ago
https://news.ycombinator.com/item?id=44063602
sandra_vu, 4 days ago
Hi, good job, team. Any plans to commercialize the model?
saretup, 4 days ago
> Scala*b*le Perceptu*a*l *G*enerative Mod*el*
If you wanna call it Bagel, just call it Bagel. No need to make up a justification.
gregjw, 4 days ago
bagel