If the size could shrink to that of a small earplug, I'd love to use this as a person who is not hearing-impaired <i>(at least they couldn't diagnose me with it, so now I'm not sure whether their diagnostics suck, or I'm just a normal person and others pretend better that they hear everything well)</i>.<p>In groups and with friends, it's inevitable that you end up in a busy restaurant or a bar, and it always frustrates me when I don't hear something, ask the person to repeat it, and then fail to hear it again, usually because they repeat it at the same low volume (considering the circumstances). Missing jokes and throwaway comments is even worse ("hey, what are you all laughing about? I didn't hear it, could you repeat it for me like three times until I do?").
One thing that the HN crowd should appreciate is just how expensive and shit hearing aids are.<p>Go and look up the prices; they are deeply expensive, even for basic "make it louder" type aids.<p>Worse still, because they interfere with your ear, you tend to lose the ability to "steer" your hearing. This means that you can't tune out other conversations and noises.<p>The one good side effect of Facebook spending billions on its (probably) futile search for practical and popular AR is <a href="https://www.projectaria.com/glasses/" rel="nofollow">https://www.projectaria.com/glasses/</a><p>which is a (cheap) platform for experimenting with AR-type applications.<p>Since it has eye tracking, a microphone array, and front-facing cameras, it can be fairly easily modified into a steerable microphone.
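A steerable microphone built from a mic array plus a look direction is, at its core, delay-and-sum beamforming: delay each channel so a wavefront from the look direction lines up across mics, then average. A minimal sketch (the array geometry, sample rate, and function names are all illustrative, not anything from Project Aria):

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature

def delay_and_sum(signals, mic_positions, direction, fs):
    """Steer a mic array toward `direction` (unit vector pointing at
    the source) by delaying each channel so the target wavefront
    aligns, then averaging. signals: array of shape (n_mics, n_samples),
    mic_positions: (n_mics, 3) in meters, fs: sample rate in Hz."""
    n_mics, n_samples = signals.shape
    # A mic closer to the source (larger projection onto `direction`)
    # hears the wave earlier, so it must be delayed the most.
    delays = mic_positions @ direction / SPEED_OF_SOUND  # seconds
    delays -= delays.min()                               # make non-negative
    out = np.zeros(n_samples)
    for ch, d in zip(signals, delays):
        shift = int(round(d * fs))
        out[shift:] += ch[:n_samples - shift]
    return out / n_mics
```

Sources off the steered axis arrive with the wrong relative delays and partially cancel, which is what makes the array directional.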
As somebody who is hearing impaired, a feature like this would be a Godsend for me! This feature should be integrated into hearing-aids ASAP! Shut up - no, actually - keep talking and take my money!
This but more advanced would quite nicely help with my tinnitus. I hear fine when one person is speaking (even softly and at a distance), but multiple or with music, I hear nothing.
I'll bet they achieve commercial success with the reverse application. Imagine being able to mute that one obnoxiously loud person with an annoying voice at a party!
I used to work at Sonos, long before their current app update debacle and headphone debut.<p>During the first aborted product effort to develop headphones, we were looking at a conceptual feature similar to this - selectively allowing people’s voices through the ANC chipset.<p>I don’t recall the exact approach the DSP folks were using (I was closer to the hardware for ANC) but they were really only able to figure out how to isolate the wearer’s voice by virtue of that signal having more power than all the others.<p>This is terribly cool. I wonder what other kinds of fun you could have with headphones. ANC chipsets are incredibly powerful and I’d wager their capabilities are not even close to fully tapped.
The open source code is at <a href="https://github.com/vb000/LookOnceToHear">https://github.com/vb000/LookOnceToHear</a> and the research paper is at <a href="https://arxiv.org/abs/2405.06289" rel="nofollow">https://arxiv.org/abs/2405.06289</a><p>So perhaps this is not as out of reach as many pop-science articles make such things seem. I'd love to hear if anyone is able to get this working independently.
This could actually be really helpful to me, as I have trouble hearing someone speaking in a busy room because my mind is trying to pick up everything (I think this is because of my ADHD). Having a way to significantly quiet other noises apart from the voice of the person I'm speaking with would be amazing.
A potential feature I didn't know I needed. I have ANC headphones on around the house all the time; it would be really useful if they automatically passed through my partner's voice.
They couldn't use this to listen to me. They would just get "I am just a large language model, I can't help you with that."<p>I use a lot of curse words. ;)
Imagine it helping people with autism and ADHD! People with ADHD have a hard time listening to one person because part of the brain tries to listen to all the other conversations going on around them.
This reminds me of NVIDIA RTX Voice [0].
Although not made to isolate single persons, this is quite impressive.
I hope that this single-person isolation will find its way into consumer noise-cancelling headphones.<p>[0] <a href="https://www.youtube.com/watch?v=uWUHkCgslNE" rel="nofollow">https://www.youtube.com/watch?v=uWUHkCgslNE</a>
I used to think of building something related: letting a mic pick up a single person to handle questions from the audience during presentations. It would save the hassle of passing around mics.<p>This looks like it could do just that, with the headphones feeding directly into the mixer and behaving like a focused mic.
When I was a youngling, I dreamed of having headphones with the opposite power -- muting specific people. For me, it's not the hubbub of a crowd that's distracting, it's usually one or two offending specimens - like in the video example, the inconsiderate vermin using a speakerphone in public.<p>I wonder if the problem maps easily from "select this source" to "select everything but that source".
How necessary is the AI for this? At least targeting sounds in the line of sight should be fairly easy to do without AI, but I don't know about the human voice identification.
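For the line-of-sight targeting specifically, a classical signal-processing approach gets you surprisingly far: cross-correlate the left- and right-ear microphones and check whether the dominant source's time difference of arrival (TDOA) is near zero, meaning the source is roughly straight ahead. A sketch, with the ear spacing and angle threshold as assumed values (not from the paper's implementation):

```python
import numpy as np

def tdoa_samples(left, right):
    """Estimate time difference of arrival in samples between two mic
    channels via full cross-correlation; positive means `left` lags
    `right`, and 0 means the source is broadside (straight ahead)."""
    corr = np.correlate(left, right, mode="full")
    return np.argmax(corr) - (len(right) - 1)

def is_in_look_direction(left, right, fs, max_angle_deg=16.0,
                         mic_spacing=0.18):
    """Gate: does the dominant source arrive within `max_angle_deg` of
    straight ahead? mic_spacing is the assumed ear-to-ear distance (m)."""
    c = 343.0  # speed of sound, m/s
    max_lag = mic_spacing * np.sin(np.radians(max_angle_deg)) / c * fs
    return abs(tdoa_samples(left, right)) <= max_lag
```

This only answers "is the loudest thing in front of me"; picking one voice out of several at the same bearing is where the learned speaker embedding earns its keep.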
The “cocktail party effect” externalized. Extremely cool.<p><a href="https://en.m.wikipedia.org/wiki/Cocktail_party_effect" rel="nofollow">https://en.m.wikipedia.org/wiki/Cocktail_party_effect</a>
Headphones over the last 60 years, with the exception of sound quality, comfort, and eventually going cordless, have not changed much in terms of style and appearance.<p>However, I think in the next 50 years headphones will disappear or, should I say, evolve into part of the human anatomy. Same for screen monitors, mouse/keyboard, smartphones, etc.<p>Think about it. The way things are going, along with "AI" (sure, a buzzword in a number of ways, but something that will change our way of living), many of the things we use will be replaced and will likely become simple extensions of us or, dare I say, be implanted.<p>Hard drives will be a thing of the past. Everything will be (as we call it today) "cloud-based" and we will be more cybernetic than we think. Of course, someone today will fear such an idea. But as we slowly accept the little changes, in 50 years we will look back and think "how did they cope without it", a bit like how someone today looks back and thinks "how did they cope without the internet".<p>Many fear what they don't understand. It is the unknown. AI is a fear factor for many. For me, I accept it for what it is and for the changes it will bring to our lives and our careers.<p>All I will say is: strap yourselves in, it will be a bumpy ride. I hope we make it through without destroying ourselves. Once we're past it, the world could (finally) be at peace and, to quote a famous TV show, "boldly go where no man has gone before!"
This reminds me a lot of <a href="https://github.com/xiph/rnnoise">https://github.com/xiph/rnnoise</a> and my use of it locally. It zeroes in on voice via RNN which seems to beat most other noise detection filters I've tried. Unfortunately, I mostly disable it these days since it's a bit harder to tune than I'm up for, but it's by far the most promising local noise reduction I've used.
In my experience, most people don't seem to understand the concept of noise cancelling headphones and will still try to talk to people who clearly can't hear them. I can't imagine it'd be any different for these AI headphones in practical use. Probably worse because the person you're actually trying to talk to might think you can't hear them.
Pretty cool what they are working on. However, I wish there were more funding for restoring hair cells, which are the root cause of most people's hearing loss.<p>Researchers are getting closer. Dr. Chen from Harvard was able to regenerate hair cells in mature mice last year.<p>The problem is also becoming more widespread: 30 million people in the US and 400 million worldwide have disabling hearing loss. Regenerating hair cells and the synapses around them would also cure tinnitus. 30 million x $5k for a treatment = a $150B market (probably way bigger with an aging population).<p>I think we probably need more rich tech billionaires to get affected to attract large funding.<p>Which billionaires do you know of that are affected, besides:<p>- Brad Jacobs<p>- Ryan from Flexport/Founders Fund
Sounds like a great way to spy on people and extract all their conversations. I can't wait for judges to declare that all conversation at your office must be recorded, like some of them have for chat. This tech is a step toward enabling such a thing.
Before getting all excited that your ML model runs on your brand new 2024 MacBook, and before you run off to create earbuds / hearing aids with it, please try to run it on-target and see whether it fits within your runtime budget / power budget / device size budget / battery life budget.<p>And if you're going to do bluetooth + wifi, remember that both transmit on 2.4 GHz and need to coordinate in order to coexist in the same IoT device. There are interconnects and wire protocols to connect the bluetooth and wifi chips together; or, preferably, you buy a chip that does both.
Presumably this could be used to block out specific voices/sounds.<p>There's an episode of the sci-fi show Black Mirror (White Christmas) where a person is convicted of some hideous crime and permanently blocked/made invisible and inaudible to everyone (the entire population has embedded audio/video processing enhancements by then).<p>You can imagine future headphones where you could block out the guy in your office with the annoying laugh, or download 'blocks' from the headphone app store: no more Rick Astley or the politician you don't like, etc.
Curious what sort of processing power or chipsets the 'onboard embedded computer' needs. Could this be an iPhone app? Or is this going to require new, specialized hardware to commoditize?
> To use the system, a person wearing off-the-shelf headphones fitted with microphones taps a button while directing their head at someone talking. The sound waves from that speaker’s voice then should reach the microphones on both sides of the headset simultaneously; there’s a 16-degree margin of error.<p>Perhaps the accuracy of identifying the correct voice could be vastly increased by adding video input. The AI can then try to match the various voices with the lip movements in the center of the video, basically lip reading.
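A quick back-of-envelope on that 16-degree margin (assuming a roughly ear-width mic spacing of 0.18 m; the article doesn't give the actual spacing or sample rate): at the edge of the cone, the inter-mic arrival difference is d·sin(θ)/c, only a handful of samples at typical audio rates.

```python
import math

C = 343.0    # speed of sound, m/s
D = 0.18     # assumed ear-to-ear mic spacing, m
FS = 48_000  # assumed sample rate, Hz

tdoa = D * math.sin(math.radians(16)) / C  # seconds, at the cone edge
print(f"{tdoa*1e6:.0f} us ~= {tdoa*FS:.1f} samples at {FS} Hz")
# prints: 145 us ~= 6.9 samples at 48000 Hz
```

So "reaches both microphones simultaneously" in practice means "within about seven samples of each other", which is why a modest correlation window is enough to enroll the target speaker.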
I think someone made something similar in the 80s using blind source separation techniques like ICA.<p>But this is very useful for people like me who don't hear well in the high frequencies.
This is pretty amazing -- and a practical application of a solution to a notoriously tricky problem called the "cocktail party problem."[1] For a small subset of researchers, writing an algorithm to isolate a voice in a crowd is on par with e.g. writing an AI to play Go.<p>[1] <a href="https://en.wikipedia.org/wiki/Cocktail_party_effect" rel="nofollow">https://en.wikipedia.org/wiki/Cocktail_party_effect</a>
>A University of Washington team<p>Oh, so it barely works and it's a proof of concept.<p>What is the interesting thing here? We all know how sound waves work. Pretty sure this technology is old. Until there is a product here, it just sounds like you are rehashing noise cancellation.<p>Academia has dug this grave of skepticism. I just have zero faith this will get to market through the University of Washington. Maybe it will be patented and used even less!
While not exactly the same, I came across an app called Tunity the other day. It allows you to use your phone camera to catch the live audio feed of the television that you are attempting to watch, whether the audio is muted or if it's in a loud, crowded location, like a sports bar or airport. I haven't used it, but it's an interesting concept.
Next can we have them identify ambient noises that need amplification for safety reasons, like the nearly-silent electric car about to run me over, or the bike I'm about to accidentally step in front of? As someone who spends a bit too much of my time walking around on calls, I think selective amplification of ambient sounds for safety would be amazing!
I opened an issue about this. Maybe someone here knows.<p>I see a Python script I can run on my computer. I haven't tried it yet, but I think I could connect a microphone, process the audio, and output it in real time; what I don't know is how to detect the user looking at someone. Could someone tell me how that works?
When ANC headphones came out, my friends thought about something like filtering certain sounds away. I bet many people have also had this kind of idea, but nevertheless, haven't actually built it. This looks intriguing, and with open-source POC code, it seems promising.
I want to filter out all non-nature sounds. I dream of walking through the airport or the park in peace. AI seems the way to go with that since you have to predict the sound to counteract it. Good to see we are finally making progress.
This could easily hold a library of voices that you interact with (e.g. at a bigger table of friends and family) and let you toggle relevant voices in and out. Apple, please include this feature for your AirPods, thanks! :)
My daughter has an auditory impairment which she describes as "brain deaf".<p>Basically, her hearing is perfect but her brain struggles to process sound in a noisy environment; she can't single out what she is listening to.<p>This sounds perfect for her!
How does it solve the problem of humans being able to detect that someone's looking at us? We tend to stop talking when we sense someone's staring at us.
Honestly AI speech recognition still sucks so bad I'm basically convinced it will fall on its face in many daily use cases.<p>I realize this is slightly tangential, but please don't replace customer support with chatbots or whatever you want to call them. It's a freaking horrible experience.
I love this.<p>I know this is just the beginning and the tech and UX will mature a lot, but being able to consciously choose what we allow into our sensory world would be a great superpower to have.<p>In the distant future this will all be embedded inside a cochlear (neural?) implant.<p>You could "save" known voices, prioritize them, identify various scenes/modes automatically (meetings/parties/concerts/driving/walking, etc.), and know when to allow external sounds in (alarms, honks, someone calling for your attention, etc.)<p>And with great power yada yada.<p>I can already imagine a few ways this could be misused / abused / create previously non-existent challenges and problems too. But I am (cautiously) optimistic that we as a human race will collectively figure out how to steer these new technology applications into net-positive territory.<p>2040: iAudio and xSmell blamed for people losing touch with nature's sounds (like bird chirps and flowing streams) and smells (petrichor): things that inspire us, make us creative, make life worthwhile, and make us human.