Researchers use AI to turn sound recordings into street images

184 points by giuliomagnifico, 6 months ago

14 comments

ptx, 6 months ago
This is not my area of expertise, but if I understand the article correctly, they created a model that matches pre-existing audio clips to pre-existing images. But instead of returning the matching image, the LLM generates a distorted fake image which is vaguely similar to the real image.

So it doesn't really, as the title claims, turn recordings into images (it already has the images) and the distorted fake images it creates are only "accurate" in that they broadly slot into the right category in terms of urban/rural setting, amount of greenery and amount of sky shown.

It sounds like the matching is the useful part and the "generative" part is just a huge disadvantage. The paper doesn't seem to say if the LLM is any better than other types of models at the matching part.
simonw, 6 months ago
The word "accurate" in that headline is doing a LOT of work.

Here's how the results were scored:

"Computer evaluations compared the relative proportions of greenery, building and sky between source and generated images, whereas human judges were asked to correctly match one of three generated images to an audio sample."

So this is very impressive and a cool piece of research, but unsurprisingly not recreating the space "accurately" if you assume that means anything more than "has the right amount of sky and buildings and greenery".
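
To make the quoted scoring concrete, here is a minimal sketch of a proportion-based comparison. It assumes each image has already been reduced to a 2D grid of class labels; the class names, the `class_proportions` and `proportion_gap` helpers, and the mean-absolute-difference score are all illustrative assumptions, not the paper's actual pipeline.

```python
from collections import Counter

# Hypothetical class labels; the study's real segmentation classes and
# tooling are not described in this thread.
CLASSES = ("greenery", "building", "sky")


def class_proportions(mask):
    """Return the fraction of pixels in each class for a 2D label mask."""
    counts = Counter(label for row in mask for label in row)
    total = sum(counts.values())
    if total == 0:
        return {c: 0.0 for c in CLASSES}
    return {c: counts.get(c, 0) / total for c in CLASSES}


def proportion_gap(source_mask, generated_mask):
    """Mean absolute difference of per-class proportions (0 = same mix)."""
    src = class_proportions(source_mask)
    gen = class_proportions(generated_mask)
    return sum(abs(src[c] - gen[c]) for c in CLASSES) / len(CLASSES)


# Toy 2x4 "images" standing in for segmentation output.
source = [
    ["sky", "sky", "sky", "sky"],
    ["building", "building", "greenery", "greenery"],
]
generated = [
    ["sky", "sky", "sky", "building"],
    ["building", "greenery", "greenery", "greenery"],
]

print(proportion_gap(source, generated))
# A vertically flipped scene has the same class mix, so it scores a gap of 0.
print(proportion_gap(source, source[::-1]))
```

Under a metric of this kind, a scene with the sky at the bottom and the buildings on top scores as a perfect match, which is the sense in which "accurate" only means "has the right amount of sky and buildings and greenery".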
dmje, 6 months ago
Probably the thing you really want from an article with this topic focus is to be able to see the images bigger than postage-stamp size. And, even more irritating - the images are actually there, in reasonable size, just not linked...

https://news.utexas.edu/wp-content/uploads/2024/11/AI-streetscapes-2.jpg
https://news.utexas.edu/wp-content/uploads/2024/11/AI-streetscapes-1.jpg
amaurose, 6 months ago
I'd be very interested in the reverse: A background sound generator for still images. Would be nice to have for advanced picture frames...
IshKebab, 6 months ago
Researchers use AI to turn source recordings into <i>plausible</i> street images.
galleywest200, 6 months ago
Some of these are not very accurate. That "countryside" image has entirely the wrong foliage color (fall colors vs. spring colors). It also appears to place buildings when the "ground truth" image is by a small stream.

I would not rely on this tool for any meaningful data collection.
lifeisstillgood, 6 months ago
I am not by any stretch a mathematician but AI research like this reminds me of things that excite mathematicians - it's like people spent three hundred years playing with prime numbers and all of a sudden "oh yeah, silicon, fibre optics, ahah secure encryption".

There are going to be really useful tools - but we need to play for another century before we have that aha moment. Probably :-)
harrall, 6 months ago
I think this is cool but it's more of a statistical correlation than an AI-related paper.

What I'm saying is that if you were to replace "AI" with "ask humans to draw an image based on these sounds," you'd probably get somewhat similar results.

Which is still interesting either way.
Animats, 6 months ago
This is more like a classifier. They have a bunch of human-classified image/sound pairs, and they match unclassified sounds to the classified sounds. Then there's a Midjourney image generation step, but that's probably unnecessary.
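
The matching step described here can be sketched as a nearest-neighbor lookup. This is only an illustration of the idea: it assumes each sound clip has already been turned into a numeric feature vector, and the vectors, labels, and the `match_label` helper are all made up for the example rather than taken from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_label(query, labeled_clips):
    """Return the label of the labeled clip most similar to the query."""
    best_vector, best_label = max(
        labeled_clips, key=lambda pair: cosine_similarity(query, pair[0])
    )
    return best_label


# Hypothetical audio feature vectors paired with human-assigned scene labels.
labeled_clips = [
    ([0.9, 0.1, 0.0], "urban street"),
    ([0.1, 0.8, 0.3], "rural road"),
]

# An unlabeled clip is assigned the label of its closest labeled neighbor.
print(match_label([0.8, 0.2, 0.1], labeled_clips))
```

In a setup like this, the matched label (or the matched image itself) is already the useful output; an image generation step would come afterwards and add nothing to the match.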
amelius, 6 months ago
You can train DL models on anything. If you get accurate results then that is maybe publish-worthy.

In this particular case, it is not.
AtlasBarfed, 6 months ago
Are the displayed images the average output accuracy or the winners that happen to be accurate?
wigster, 6 months ago
how on earth does the first example recreate the blue-white logo on the building? b***t
joshdavham, 6 months ago
This is interesting. Sorta reminds me of how bats use sonar for their surroundings.
lowercased, 6 months ago
Am I missing something or is there no way to see those generated images except in postage-stamp sizes?

tldr:

you can view the image directly at https://news.utexas.edu/wp-content/uploads/2024/11/AI-streetscapes-2-2048x929.jpg

Still not overly useful.