科技回声
A tech news platform built with Next.js, offering global tech news and discussion.

© 2025 科技回声. All rights reserved.

LLM_transcribe_recording: Bash Helper Using Mlx_whisper

39 points | by Olshansky | 8 months ago

1 comment

ndr_ | 8 months ago
This may call out to ffmpeg for pre-processing. If you're reluctant to run that directly on your Mac, you can use this wrapper script to run ffmpeg in a Docker container instead: https://gist.github.com/ndurner/636d37fd83aed4b875cdb66653017ae7

However, I found that Whisper is thrown off by background music in a podcast - and will not recover. (That was with the mlx-community/whisper-large-v3-mlx checkpoint; OP uses distil-whisper-large-v3.) I concluded for myself that Whisper might be used in larger processing pipelines that handle such cases - can someone provide insights about that? The podcast I used it on was https://www.heise.de/news/KI-Update-Deep-Dive-Was-taugen-KI-Suchmaschinen-9850904.html

I ended up using Google Gemini, which handled it well. (Blog post: https://ndurner.github.io/mlx-whisper-gemini)
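For context, the kind of mlx_whisper call the thread is discussing can be sketched in a few lines of Python. This is a minimal sketch, not OP's actual script: `episode.mp3` is a placeholder filename, and the model repo is the distil-whisper checkpoint the comment says OP used. It requires the `mlx-whisper` package and runs only on Apple Silicon.

```python
import mlx_whisper  # pip install mlx-whisper (Apple Silicon only)

# Transcribe a local recording with the distil-whisper-large-v3 checkpoint
# mentioned in the comment. "episode.mp3" is a hypothetical filename.
result = mlx_whisper.transcribe(
    "episode.mp3",
    path_or_hf_repo="mlx-community/distil-whisper-large-v3",
)

# transcribe() returns a dict with the full text plus timestamped segments.
print(result["text"])
```

If ffmpeg is needed for pre-processing (e.g. converting to 16 kHz mono WAV first), that step can be done in Docker as the commenter's gist suggests, keeping ffmpeg off the host machine.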