科技回声 (Tech Echo)

14 comments

ptx 6 months ago

This is not my area of expertise, but if I understand the article correctly, they created a model that matches pre-existing audio clips to pre-existing images. But instead of returning the matching image, the LLM generates a distorted fake image which is vaguely similar to the real image.

So it doesn't really, as the title claims, turn recordings into images (it already has the images), and the distorted fake images it creates are only "accurate" in that they broadly slot into the right category in terms of urban/rural setting, amount of greenery and amount of sky shown.

It sounds like the matching is the useful part and the "generative" part is just a huge disadvantage. The paper doesn't seem to say if the LLM is any better than other types of models at the matching part.

simonw 6 months ago

The word "accurate" in that headline is doing a LOT of work.

Here's how the results were scored:

"Computer evaluations compared the relative proportions of greenery, building and sky between source and generated images, whereas human judges were asked to correctly match one of three generated images to an audio sample."

So this is very impressive and a cool piece of research, but unsurprisingly not recreating the space "accurately" if you assume that means anything more than "has the right amount of sky and buildings and greenery".
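
To make the scoring point concrete, here is a minimal Python sketch of a proportion-only comparison. It assumes each image has already been segmented into per-pixel class labels; the class names and the mean-absolute-difference score are illustrative assumptions, not the study's actual metric, which the quoted description does not spell out.

```python
from collections import Counter

# Hypothetical class set; the study's real segmentation classes are not given here.
CLASSES = ("sky", "greenery", "building")

def class_proportions(labels):
    """Fraction of pixels carrying each class label, from a flat list of per-pixel labels."""
    counts = Counter(labels)  # missing classes count as 0
    return {c: counts[c] / len(labels) for c in CLASSES}

def proportion_gap(source_labels, generated_labels):
    """Mean absolute difference in class proportions (0.0 means the same mix)."""
    src = class_proportions(source_labels)
    gen = class_proportions(generated_labels)
    return sum(abs(src[c] - gen[c]) for c in CLASSES) / len(CLASSES)

# Same mix of labels, completely different pixel positions.
source = ["sky", "sky", "building", "greenery"]
generated = ["greenery", "building", "sky", "sky"]
print(proportion_gap(source, generated))  # 0.0
```

Because `source` and `generated` hold the same labels in a different order, the gap is 0.0 even though no pixel is in the same place, which is the limitation of scoring by proportions alone.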

dmje 6 months ago

Probably the thing you really want from an article with this topic focus is to be able to see the images bigger than postage stamp size. And, even more irritating, the images are actually there, in reasonable size, just not linked...

https://news.utexas.edu/wp-content/uploads/2024/11/AI-streetscapes-2.jpg
https://news.utexas.edu/wp-content/uploads/2024/11/AI-streetscapes-1.jpg

amaurose 6 months ago

I'd be very interested in the reverse: A background sound generator for still images. Would be nice to have for advanced picture frames...

IshKebab 6 months ago

Researchers use AI to turn source recordings into plausible street images.

galleywest200 6 months ago

Some of these are not very accurate. That "country side" image has entirely the wrong foliage color (fall colors vs. spring colors). It also appears to place buildings where the "ground truth" image shows a small stream.

I would not rely on this tool for any meaningful data collection.

lifeisstillgood 6 months ago

I am not by any stretch a mathematician, but AI research like this reminds me of things that excite mathematicians. It's like people spent three hundred years playing with prime numbers and all of a sudden: “oh yeah, silicon, fibre optics, ahah, secure encryption”.

There are going to be real useful tools, but we need to play for another century before we have that aha moment. Probably :-)

harrall 6 months ago

I think this is cool, but it’s more of a statistical correlation than an AI-related paper.

What I’m saying is that if you were to replace ‘AI’ with “ask humans to draw an image based on these sounds,” you’d probably get somewhat similar results.

Which is still interesting either way.

Animats 6 months ago

This is more like a classifier. They have a bunch of human-classified image/sound pairs, and they match unclassified sounds to the classified sounds. Then there's a Midjourney image generation step, but that's probably unnecessary.
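
A minimal sketch of the matching step as described here, framed as nearest-neighbour retrieval: an unlabelled clip takes the label of the most similar labelled clip. The embeddings, labels and use of cosine similarity are hypothetical stand-ins; the thread does not say how the study actually represents or compares audio.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_label(query, labelled):
    """Return the label of the (label, embedding) pair most similar to query."""
    best_label, _ = max(labelled, key=lambda pair: cosine(query, pair[1]))
    return best_label

# Hypothetical 3-dimensional sound embeddings for two human-labelled clips.
labelled_clips = [
    ("urban", [0.9, 0.1, 0.0]),
    ("rural", [0.1, 0.2, 0.9]),
]
print(nearest_label([0.8, 0.2, 0.1], labelled_clips))  # urban
```

In this framing the retrieved label (or the image paired with the matched clip) is already the answer; any image generation would be a separate step layered on top.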

amelius 6 months ago

You can train DL models on anything. If you get accurate results, then that is maybe publish-worthy.

In this particular case, it is not.

AtlasBarfed 6 months ago

Are the displayed images representative of the average output accuracy, or just the winners that happen to be accurate?

wigster 6 months ago

How on earth does the first example recreate the blue-white logo on the building? b***t

joshdavham 6 months ago

This is interesting. Sorta reminds me of how bats use sonar for their surroundings.

lowercased 6 months ago

Am I missing something, or is there no way to see those generated images except in postage-stamp sizes?

tl;dr: you can view the image directly at https://news.utexas.edu/wp-content/uploads/2024/11/AI-streetscapes-2-2048x929.jpg

Still not overly useful.

Researchers use AI to turn sound recordings into street images
