科技回声
Dragonfly: A large vision-language model with multi-resolution zoom

143 points by jasondavies, 12 months ago

9 comments

davidhyde, 12 months ago
> “Question: Write a detailed radiology note based on the chest X-ray. Gold Answer: AP upright and lateral views of the chest were provided. Left chest wall pacer pack is again seen with leads extending into the right heart.”

The bit about a “wall pacer pack is again seen…” leads me to believe this was based on another doctor's note about a similar-looking X-ray, which was probably paired with other information such as another scan taken at the time. That would be problematic, imo.
TechDebtDevin, 12 months ago
I've been sorta following together.ai for a while. Cool company. Is this available to be used by anyone atm? Could I potentially use the model to look at my own chest X-rays (I've had a lot)?
ilaksh, 12 months ago
I have been testing out LLMs with the together.ai API, but I can't figure out how to use the multimodal models with the API. I don't see any in their model list.
GaggiX, 12 months ago
Is there a demo or API to test the model? There are so many vision-language models these days that it's hard to say which one is better, and in many cases they use different benchmarks.
achristmascarl, 12 months ago
For the model fine-tuned on biomedical image data, does anyone with domain knowledge know how the model's answers compare to the "Gold" answers?
cateye, 12 months ago
It is strange that, after reading the blog article, you can't actually try this model out on Together.ai.
stainablesteel, 12 months ago
this looks quite impressive

if image generation gets to be near perfect, then it might have a larger impact on communication than gpt does. no paragraph beats a good diagram, but drawing is always hard
esafak, 12 months ago
Is there a comparable service for audio analysis?
darby_nine, 12 months ago
I can't speak for others obviously, but this sort of caption is nauseous:

> In the heart of a vibrant skatepark, a skateboarder is caught in a moment of pure exhilaration. The skateboarder, dressed in a black t-shirt adorned with a yellow graphic and black pants, is suspended in mid-air, performing an impressive trick on a concrete ramp. The skateboarder's arms are outstretched, adding balance to the daring stunt. The skatepark itself is a concrete playground, with the skateboarder's ramp being the main focus. In the background, palm trees sway gently, adding a touch of nature to the urban setting. A few spectators can be seen in the distance, their attention riveted on the airborne skateboarder. The image captures not just a moment, but a story of skill, courage, and the joy of skateboarding.

This seems a lot more like a puff piece from a local publisher trying to fill space, or a description of a stock photo for an advertiser, than anything I'd call an accurate description from one human to another.