Bagel: Open-source unified multimodal model

219 points by tosh 4 days ago

15 comments

jjrv 4 days ago
I found a losslessly compressed version: https://github.com/LeanModels/Bagel-DFloat11

Following the README instructions, it works, at least on Ubuntu with my RTX 3090 GPU and its 24 GB of memory, but just barely. I have to close most other windows and lower the screen resolution to be able to load the model. Then it generates or edits images in 2-3 minutes. I only have this one GPU and use Chrome for the browser interface on the same machine.

The original release won't run on this hardware, but the compressed one is supposed to give identical results.
spuz 4 days ago
I'm interested in potential alternatives to ChatGPT's advanced voice mode. When I see the word "multimodal", I'm hopeful the model understands text + voice, but instead it almost always seems to refer to text + images. Is there a keyword that I can use to look for models that work with voice similar to ChatGPT's advanced voice mode?
akacrobat 4 days ago
This looks exciting! There is a serious dearth of high-quality open-source models with multimodal capabilities, so I'm really looking forward to playing with this one.

Has anyone here experimented with fine-tuning this for domain-specific applications?
charcircuit 4 days ago
The demo shows pretty weak performance compared to other small models. It misunderstood my question by picking an uncommon way to interpret it. After I clarified what I wanted, it lost all the context I had provided in the previous message. My benchmark query is intentionally ambiguous, and I use it to see how models handle ambiguity, handle information that may be outdated, and avoid hallucination. Usually weak models will just hallucinate an answer, but this model was the first that wasn't able to understand the question.
LourensT 4 days ago
These days, papers come with an advertisement video
pleone 4 days ago
It's from the ByteDance team, right? The team behind TikTok, CapCut, BuzzVideo and more. Any thoughts on that?
akoculu 4 days ago
Good summary of the paper: https://x.com/build__ship/status/1926930191185580176
mdrzn 4 days ago
A quick test via the "demo" link doesn't show it to be "as smart" as it appeared in the demos on the page. I really hope it does all it's promising to do, but I'm skeptical so far.
moffkalast 4 days ago
Oh no, it's The Everything Bagel.
mnky9800n 4 days ago
I couldn't find it: what are the hardware requirements for Bagel?
GrantMoyer 4 days ago
Nice, it's really an open-source model, Apache 2.0.
wsintra2022 4 days ago
https://news.ycombinator.com/item?id=44063602
sandra_vu 4 days ago
Hi, good job, team. Any plans to commercialize the model?
saretup 4 days ago
> Scala*b*le Perceptu*a*l *G*enerative Mod*el*

If you wanna call it Bagel, just call it Bagel. No need to make up a justification.
gregjw 4 days ago
bagel