TechEcho

Dragonfly: A large vision-language model with multi-resolution zoom

143 points by jasondavies 12 months ago

9 comments

davidhyde 12 months ago
> “Question: Write a detailed radiology note based on the chest X-ray. Gold Answer: AP upright and lateral views of the chest were provided. Left chest wall pacer pack is again seen with leads extending into the right heart.”

The bit about a “wall pacer pack is again seen…” leads me to believe this was based on another doctor's note about a similar-looking X-ray, which was probably paired with other information like another scan at the time. That would be problematic imo.
TechDebtDevin 12 months ago
I've been sorta following together.ai for a while. Cool company. Is this available to be used by anyone atm? Could I potentially use the model to look at my own chest X-rays (I've had a lot)?
ilaksh 12 months ago
I have been testing out LLMs with the together.ai API, but I can't figure out how to use the multimodal models with the API. I don't see any in their model list.
GaggiX 12 months ago
Is there a demo or API to test the model? There are so many vision language models these days that it's hard to say which one is better, and in many cases they use different benchmarks.
achristmascarl 12 months ago
For the model fine-tuned on biomedical image data, does anyone with domain knowledge know how the model's answers compare to the "Gold" answers?
cateye 12 months ago
It is strange that, after reading the blog article, this model is not available on Together.ai to try out.
stainablesteel 12 months ago
this looks quite impressive

if image generation gets to be near perfect then it might have a larger impact on communication than GPT does. no paragraph beats a good diagram, but drawing is always hard
esafak 12 months ago
Is there a comparable service for audio analysis?
darby_nine 12 months ago
I can't speak for others obviously, but this sort of caption is nauseous:

> In the heart of a vibrant skatepark, a skateboarder is caught in a moment of pure exhilaration. The skateboarder, dressed in a black t-shirt adorned with a yellow graphic and black pants, is suspended in mid-air, performing an impressive trick on a concrete ramp. The skateboarder's arms are outstretched, adding balance to the daring stunt. The skatepark itself is a concrete playground, with the skateboarder's ramp being the main focus. In the background, palm trees sway gently, adding a touch of nature to the urban setting. A few spectators can be seen in the distance, their attention riveted on the airborne skateboarder. The image captures not just a moment, but a story of skill, courage, and the joy of skateboarding.

This seems a lot more like a puff piece from a local publisher trying to fill space, or description of a stock photo to an advertiser, than a description I'd describe as accurate from a human to another human.