TechEcho

SeamlessM4T, a Multimodal AI Model for Speech and Text Translation

167 points by mchiang almost 2 years ago

13 comments

lhl almost 2 years ago
I gave it a spin a little bit ago. Per usual, the install docs didn't quite work OOTB; here's how I got it working: https://llm-tracker.info/books/howto-guides/page/speech-to-text#bkmrk-seamlessm4t

One seemingly undocumented limitation: the current code only supports relatively short clips, so it isn't suitable for long transcriptions.

> ValueError: The input sequence length must be less than or equal to the maximum sequence length (4096), but is 99945 instead.
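
A common workaround for a length limit like the one in that error is to split the input into pieces that each fit and process them one at a time. The following is only a minimal sketch of that idea, not anything from the SeamlessM4T code: `split_into_chunks` is a hypothetical helper, and it assumes the input can be treated as a flat sequence that is safe to cut at arbitrary positions. The error message does not say what unit the 4096 limit is counted in, so the default here is just the number from the message.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_chunks(
    seq: Sequence[T], max_len: int = 4096, overlap: int = 0
) -> List[Sequence[T]]:
    """Split seq into consecutive chunks of at most max_len items.

    Hypothetical helper for illustration only. `overlap` repeats that many
    items at the start of each following chunk, which can help when a cut
    lands in the middle of a word.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")

    step = max_len - overlap
    chunks: List[Sequence[T]] = []
    start = 0
    while start < len(seq):
        chunks.append(seq[start:start + max_len])
        # Stop once the current chunk reaches the end of the input.
        if start + max_len >= len(seq):
            break
        start += step
    return chunks


# Stand-in for the 99945-element input mentioned in the error message.
long_input = list(range(99945))
chunks = split_into_chunks(long_input)
```

With the defaults, the 99945-element input becomes 25 chunks, each short enough to pass the length check. Each chunk would then be run through the model separately and the outputs joined; how well that works at the chunk boundaries is not something this sketch addresses.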
crakenzak almost 2 years ago
code: https://github.com/facebookresearch/seamless_communication

paper: https://ai.meta.com/research/publications/seamless-m4t/

demo: https://seamless.metademolab.com/
0cf8612b2e1e almost 2 years ago
Will there be a whispercpp equivalent? Half the reason I love whisper is how dead simple it is to get running. I will take somewhat lower accuracy for easier operation.

Edit: unless there is native speaker diarization. That would be a huge value add.
msp26 almost 2 years ago
All I want is llama-2-34b (seriously, what's taking so long on this specific model?) but this is interesting too, I guess.
rvz almost 2 years ago
Yet somehow, many here underestimated Meta’s position in AI and proclaimed that Meta was dying, unimportant, and far behind in the AI race.

How dramatically things change in one year, after such exaggerated talk of Meta’s collapse in 2022.

Not only are they in the lead in $0 free AI models, they are also at the finish line in the AI race to zero.
jimmies almost 2 years ago
Lol, they botched the first example, the one that translates “Our goal is to create a more connected world” to Vietnamese: it has a glaring typo at the end of the sentence, “hơn” instead of “hơ.” It also really messed up the pronunciation: it read “Chúng tôi” as “Chúng ta”, and they are totally different words phonetically. The pronunciation also sounds like it’s made by someone who is mentally sick. So they botched both the translation and the pronunciation.

That’s so embarrassing, especially for something meant to show how good their stuff is (although I think it’s probably not the AI’s fault). It just shows how sloppy their people are.

I know they have plenty of Vietnamese engineers there. Did the PR dept just throw this final version of the video out without reviewing it with them?
houseatrielah almost 2 years ago
SeamlessM4T-Medium { 1.2B params, filesize 6.8 GB }. Wondering how it compares to OpenAI's Whisper.
gigel82 almost 2 years ago
The speech recognition in their demo is very very bad (~60% in my empirical test, vs. 95% with WhisperCPP). The translation is also very inaccurate.

That said, I fully support open releases and look forward to future versions and improvements.
Havoc over 1 year ago
Disappointing license. Here's a useful thing, but be sure to not use it for the majority of use cases.
Jayakumark almost 2 years ago
Meta is killing it with these open models. Not sure why the Tamil language is missing from the output.
villgax over 1 year ago
Non-commercial as per frickin usual
jacooper almost 2 years ago
What's the license?
1attice almost 2 years ago
...'M4T', ahem, might mean slightly more than you think it does