Ask HN: Real-time speech-to-speech translation

158 points by thangalin, 7 months ago
Has anyone had any luck with a free, offline, open-source, real-time speech-to-speech translation app on under-powered devices (i.e., older smart phones)?

* https://github.com/ictnlp/StreamSpeech
* https://github.com/k2-fsa/sherpa-onnx
* https://github.com/openai/whisper

I'm looking for a simple app that can listen for English, translate into Korean (and other languages), then perform speech synthesis on the translation. Basically, a Babelfish that doesn't stick in the ear. Although real-time would be great, a max 5-second delay is manageable.

RTranslator is awkward (couldn't get it to perform speech-to-speech using a single phone). 3PO sprouts errors like dandelions and requires an online connection.

Any suggestions?

22 comments

v7n, 6 months ago
It's not exactly what OP wants out-of-the-box, but if anyone is considering building one I suggest taking a look at this.¹ It is really easy to tinker with, and can run either on device or in a client-server model. It has the required speech-to-text and text-to-speech endpoints, with multiple options for each built-in. If you can make the LLM AI assistant part of the pipeline perform translation to a degree you're comfortable with, this could be a solution.

¹ https://github.com/huggingface/speech-to-speech
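For anyone considering the build-it-yourself route, the cascade described above (speech-to-text, then translation, then text-to-speech) might be sketched roughly as follows. This is a minimal sketch with stub stages, not the API of the linked repository: `CascadePipeline`, the three stage callables, and the toy phrase table are all hypothetical placeholders for real models.

```python
from dataclasses import dataclass
from typing import Callable

# All names below are hypothetical; they are not taken from the linked repository.
Transcribe = Callable[[bytes], str]          # audio in -> source-language text
Translate = Callable[[str, str, str], str]   # (text, source, target) -> translated text
Synthesize = Callable[[str], bytes]          # translated text -> audio out


@dataclass
class CascadePipeline:
    transcribe: Transcribe
    translate: Translate
    synthesize: Synthesize

    def run(self, audio: bytes, source: str, target: str) -> bytes:
        # Each stage only sees the previous stage's output, so any stage
        # can be swapped (on-device model, remote server, LLM prompt).
        text = self.transcribe(audio)
        translated = self.translate(text, source, target)
        return self.synthesize(translated)


# Stub stages so the sketch runs without any model. Real ones would wrap a
# speech recognizer, a translation model or LLM, and a speech synthesizer.
PHRASES = {("en", "ko", "hello"): "안녕하세요"}


def stub_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are the transcript


def stub_translate(text: str, source: str, target: str) -> str:
    # Unknown phrases fall back to the untranslated input.
    return PHRASES.get((source, target, text.lower()), text)


def stub_synthesize(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the encoded text is audio


pipeline = CascadePipeline(stub_transcribe, stub_translate, stub_synthesize)
result = pipeline.run(b"Hello", "en", "ko")
```

Keeping the stages as plain callables is one possible design choice: it lets the translation step be replaced by an LLM prompt without touching the recognizer or synthesizer.
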
thrdbndndn, 6 months ago
> free
> offline
> real-time
> speech-to-speech translation app
> on under-powered devices

I genuinely don't think the technology is there.

I can't even find a half-good real-time "speech to second language text" tool, not even with "paid/online/on powerful device" options.
tkgally, 6 months ago
It’s not free, but I’ve had some success using ChatGPT’s Advanced Voice mode for sequential interpreting between English and Japanese. I found I had to first explain the situation to the model and tell it what I wanted it to do. For example: “I am going to have a conversation with my friend Taro. I speak English, and he speaks Japanese. Translate what I say into Japanese and what he says into English. Only translate what we say. Do not add any explanations or commentary.”

We had to be careful not to talk over each other or the model, and the interpreting didn’t work well in a noisy environment. But once we got things set up and had practiced a bit, the conversations went smoothly. The accuracy of the translations was very good.

Such interpreting should get even better once the models have live visual input so that they can “see” the speakers’ gestures and facial expressions. Hosting on local devices, for less latency, will help as well.

In business and government contexts, professional human interpreters are usually provided with background information in advance so that they understand what people are talking about and know how to translate specialized vocabulary. LLMs will need similar preparation for interpreting in serious contexts.
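The setup instruction quoted above could be templated for other speakers and language pairs. The helper below is only an illustrative sketch: `build_interpreter_prompt` and its parameters are hypothetical names, and the wording simply mirrors the example prompt rather than any documented ChatGPT feature.

```python
def build_interpreter_prompt(partner: str, my_language: str, partner_language: str) -> str:
    """Return a setup instruction for sequential interpreting between two speakers."""
    return (
        f"I am going to have a conversation with {partner}. "
        f"I speak {my_language}, and {partner} speaks {partner_language}. "
        f"Translate what I say into {partner_language} "
        f"and what {partner} says into {my_language}. "
        "Only translate what we say. Do not add any explanations or commentary."
    )


prompt = build_interpreter_prompt("my friend Taro", "English", "Japanese")
```
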
barrenko, 6 months ago
Expecting a boom in the speech-to-speech market in the following months. It's the next thing.
sahbasanai, 6 months ago
It is impossible to accurately interpret with a max 5-second delay. The structure of some languages requires the interpreter to occasionally wait for the end of a statement before the start of interpretation is possible.
Fairburn, 6 months ago
‘Meanwhile, the poor Babel fish, by effectively removing all barriers to communication between different races and cultures, has caused more and bloodier wars than anything else in the history of creation.’
billylo, 7 months ago
Author of 3PO here: check out our latest version 2.12. Many fixes have been incorporated in the past two weeks. Cheers.
ladidahh, 6 months ago
Only seems to cover half of what you're asking for... Starred this the other day and haven't gotten to trying it out:

https://github.com/usefulsensors/moonshine
nacnud, 6 months ago
A friend recommends SayHi, which does near-realtime speech-to-speech translation (https://play.google.com/store/apps/details?id=com.sayhi.app&hl=en-US). Unfortunately it's not offline though.
EngineerDraft, 6 months ago
I've developed a macOS app, BeMyEars, which does real-time speech-to-text translation. It first transcribes and then translates between languages. All of this works on-device. If you only want a smartphone app, you can also try YPlayer; it also works on-device. Both can be downloaded from the App Store.
lma21, 6 months ago
Real-time and under-powered, no way. All the available tools (and models) today require non-negligible hardware.
jansan, 6 months ago
I just realized I will actually see a real Babelfish hitting the market in my lifetime. Amazing times indeed.
sva_, 6 months ago
Samsung Interpreter might be the closest, but it is neither free nor does it work on low-power devices.
ohlookcake, 7 months ago
I've been looking for something like this (not for Korean though) and I'd even be happy to pay, though I'd prefer to pay by usage rather than a standing subscription fee. So far, no luck, but watching this thread!
NickC25, 6 months ago
> Although real-time would be great, a max 5-second delay is manageable.

Humans can't even do this in immediate real-time, what makes you think a computer can? Some of the best real-time translators that work at the UN or for governments still have a short delay to be able to correctly interpret and translate for accuracy and context. Doing so in real-time actually impedes the translator from working correctly, especially in languages that have different grammatical structures. Even in languages that are effectively congruent (think Latin derivatives), this is hard, if not outright impossible, to do in real time.

I worked in the field of language education and computer science. The tech you're hoping would be free and able to run on older devices is easily a decade away at the very best. As for it being offline, yeah, no. Not going to happen, because accurate real-time translation of even a database of the 20 most common languages on earth is probably a few terabytes at the very least.
autumnstwilight, 6 months ago
Is this possible to do smoothly with languages that have an extremely different grammar to English? If you need to wait until the end of the sentence to get the verb, for instance, then that could take more than five seconds, particularly if someone is speaking off the cuff with hesitations and pauses (or you could translate clauses as they come in, but in some situations you'll end up with a garbled translation because the end of the sentence provides information that affects your earlier translation choices).

AFAIK, humans who do simultaneous interpretation are provided with at least an outline, if not full script, of what the speaker intends to say, so they can predict what's coming next.
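The latency trade-off described above can be illustrated with a toy calculation. This is only a sketch under stated assumptions: the timestamps are invented, `max_lag` is a hypothetical helper, punctuation stands in for whatever boundary detector a real system would use, and translation and synthesis time are ignored.

```python
def max_lag(timed_tokens, boundaries):
    """Longest wait, in seconds, between a token being spoken and its
    segment being released to the translator.

    timed_tokens: list of (time_in_seconds, token) in spoken order.
    boundaries: characters that close a segment when they end a token.
    Tokens after the final boundary are never released and are ignored.
    """
    worst = 0.0
    pending = []  # timestamps of tokens waiting for a boundary
    for spoken_at, token in timed_tokens:
        pending.append(spoken_at)
        if token[-1] in boundaries:
            # The oldest pending token has waited the longest.
            worst = max(worst, spoken_at - pending[0])
            pending.clear()
    return worst


# Invented timings for a hesitant, off-the-cuff sentence.
utterance = [
    (0.0, "When"), (0.8, "I"), (1.5, "um,"), (3.0, "finally"),
    (4.2, "got"), (5.0, "home,"), (6.5, "I"), (7.1, "slept."),
]

sentence_lag = max_lag(utterance, ".?!")   # wait for the full sentence
clause_lag = max_lag(utterance, ".?!,")    # release at every clause
```

With these made-up timings, waiting for the sentence end exceeds the 5-second budget, while releasing clause by clause stays well under it, at the cost of translating each clause before the rest of the sentence is known.
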
_gmax0, 6 months ago
Off topic, but what's the state of the art in speech recognition models at the moment?

Are people still using DTW + HMMs?
bool3max, 6 months ago
FREE and OFFLINE and OPEN SOURCE and REAL-TIME on UNDER-POWERED devices?
Terr_, 6 months ago
> * https://github.com/openai/whisper

I would be very concerned about any LLM model being used for "transcription", since they may inject things that nobody said, as in this recent item:

https://news.ycombinator.com/item?id=41968191
alexisread, 6 months ago
This phone has been around for ages, and does the job. It's well weapon! https://www.neatoshop.com/product/The-Wasp-T12-Speechtool
yuryk, 6 months ago
moonshine?
bbstats, 6 months ago
Doesn't Google Translate do this?