OTranscribe: A free and open tool for transcribing audio interviews

482 points by zerojames 9 months ago

27 comments

cube2222 9 months ago
I needed to do this this week (transcribe an interview with multiple speakers) and used https://github.com/MahmoudAshraf97/whisper-diarization

Worked excellently.

It generates both a file that contains one line per uninterrupted stretch of speech, prefixed with the speaker number, and a file with timestamps, which I believe would be used as subtitles.

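The two output shapes described above (one line per speaker turn, plus a timestamped subtitle file) can be illustrated with a rough sketch. This is not whisper-diarization's code or its exact file format: the `Segment` type, the `Speaker N:` prefix, and the function names are all hypothetical, chosen only to show the idea.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: int   # hypothetical speaker index from a diarization step
    start: float   # segment start, in seconds
    end: float     # segment end, in seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in SRT files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def speaker_lines(segments: list[Segment]) -> list[str]:
    """Merge consecutive segments from the same speaker into one line each."""
    turns: list[tuple[int, list[str]]] = []
    for seg in segments:
        if turns and turns[-1][0] == seg.speaker:
            turns[-1][1].append(seg.text)
        else:
            turns.append((seg.speaker, [seg.text]))
    return [f"Speaker {spk}: {' '.join(parts)}" for spk, parts in turns]


def to_srt(segments: list[Segment]) -> str:
    """Render every segment as a numbered SRT cue."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"Speaker {seg.speaker}: {seg.text}\n"
        )
    return "\n".join(cues)
```

Feeding the same segment list to `speaker_lines` gives the readable per-turn transcript, while `to_srt` keeps every segment separate so the timestamps stay usable as subtitles.
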
nullbar 9 months ago
Maybe it isn't perfectly clear, but OTranscribe isn't an automatic speech-to-text tool; it's a UI for assisting with manual transcription.

So no AI here, folks.

btown 9 months ago
Are there any open-source or paid apps/shareware/freeware that can:

- Transcribe word-by-word in real time as audio is recorded
- Work entirely locally
- Use relatively recent open-source local models?

I've been using otter.ai for real-time meeting transcriptions - letting me multitask and instantly catch up if I'm asked a question by skimming the most recent few seconds' worth of the transcript - but it's far from perfect, and occasionally their real-time service has significant transcription delays, not to mention it requires internet connectivity.

Most of the Whisper-based apps out there, though, as well as (when I last checked) the whisper.cpp demo code, require an entire recording to be ingested at once. There are others that rely on e.g. Apple's dictation frameworks, which are a bit dated in capability at the moment.

Anything folks are using out there?

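One possible way to get near-real-time output from a model that expects a complete clip is to re-run it on a sliding window over the incoming audio. The sketch below shows only the windowing step, in plain Python. The function name, the sample representation, and the window and hop sizes are assumptions for illustration, and the transcription call itself is deliberately left out.

```python
from collections import deque
from typing import Iterable, Iterator, Sequence


def rolling_windows(
    chunks: Iterable[Sequence[float]],
    window_size: int,
    hop_size: int,
) -> Iterator[list[float]]:
    """Yield the most recent `window_size` samples every `hop_size` new samples.

    `chunks` stands in for audio arriving from a microphone in small pieces.
    Nothing is yielded until a full window has been buffered.
    """
    buffer: deque[float] = deque(maxlen=window_size)  # old samples fall off the front
    since_last = 0  # samples received since the last emitted window
    for chunk in chunks:
        for sample in chunk:
            buffer.append(sample)
            since_last += 1
            if since_last >= hop_size and len(buffer) == window_size:
                since_last = 0
                yield list(buffer)
```

Each yielded window would be handed to whatever local model is in use. Because consecutive windows overlap, the caller still has to reconcile repeated words between successive results, which this sketch does not attempt.
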
jrochkind1 9 months ago
Kinda surprised to not have AI integration.

You do still need to proof and QA even AI results if you want a publication-quality result, and do things like attribute who is speaking when (at least Whisper can't do that) and correct "unusual" last names and things. So I feel like people using AI still need good tools for the correcting/finishing/proofing too, which would be similar to the tools for non-assisted transcription.

justinclift 9 months ago
From their FAQ:

  Does oTranscribe automatically convert audio into text?
  Sorry! It doesn’t. oTranscribe makes the manual task of transcribing audio a lot less painful. But you still have to do the transcription.

ilt 9 months ago
I currently use Aiko’s free iOS app, which does offline transcription using OpenAI’s Whisper model. It has been working pretty well for me so far. It can export in formats like SRT, TXT, CSV, JSON and text with timestamps too. https://sindresorhus.com/aiko

leiferik 9 months ago
You're always welcome to try my service TurboScribe https://turboscribe.ai/ if you need a transcript of an audio/video file. It's 100% free up to 3 files per day (30 minutes per file) and the paid plan is unlimited and transcribes files up to 10 hours long each. It also supports speaker recognition, common export formats (TXT, DOCX, PDF, SRT, CSV), as well as some AI tools for working with your transcript.

tkgally 9 months ago
I was curious how good a transcription I could get from what may be the best multimodal LLM currently, Gemini-1.5-Pro-Experiment-0801, so I had it transcribe five minutes of an interview between Ezra Klein and Nancy Pelosi from earlier today. The results are here:

https://www.gally.net/temp/20240809geminitranscription/index.html

Aside from some minor punctuation and capitalization issues, Gemini’s transcription looks nearly perfect to me. There were only one or two words that I think it misheard. If I had transcribed the audio myself, I would have made more mistakes than that.

One passage struck me in particular:

  And then he comes up with "weird," which becomes viral and the rest, and here he is.

How did Gemini know to put “weird” in quotation marks, to indicate, correctly, that the speaker was referring to Walz’s use of the word as a word? According to Politico, Walz first used the word in that context in the media on July 23.

https://www.politico.com/news/2024/07/26/trump-vance-weird-00171470

matejmecka 9 months ago
Just pitching in a transcription tool that lets you transcribe video and audio files using Whisper and WASM in your browser, and get a .txt, .srt, or .vtt file. Maybe in the future support for Whisper Turbo?

https://video2srt.ccextractor.org/

Disclaimer: Working on this project.

TrojanHookworm 9 months ago
Use this a lot. It's nice and simple and has exactly the tools you need (playback speed control, easy pause/play) and nothing more. Greatly prefer it over automatic transcription tools that give you 40 pages of 'umm's and 'ahhhh's to filter through and edit.

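For anyone who does end up with an automatic transcript full of fillers, a first cleanup pass can be scripted. The sketch below is a minimal, assumption-laden example: the filler pattern covers only variants of "um", "uh", and "ah", and the function name is made up for illustration. It will also remove a genuine interjection like "Ah!", so it suits a rough pass before manual proofing rather than a final edit.

```python
import re

# Hypothetical filler pattern: whole-word "um"/"umm", "uh"/"uhh", "ah"/"ahhhh",
# plus an optional trailing comma or period and any following whitespace.
FILLER = re.compile(r"\b(?:u+m+|u+h+|a+h+)\b[,.]?\s*", re.IGNORECASE)


def strip_fillers(text: str) -> str:
    """Remove simple filler words and collapse any doubled spaces left behind."""
    cleaned = FILLER.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


print(strip_fillers("So, umm, the ahhhh budget is uh fine."))
# prints: So, the budget is fine.
```
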
kgdiem 9 months ago
I started making an open source macOS app to do this with whisper and potentially pyannote.

It is functional but a bit slow. I think using whisper directly instead of swift bindings will help a lot.

Really interested in adding diarisation but having a lot of trouble converting Pyannote to CoreML. Pyannote runs so slowly with torch on CPU. Haven’t gotten around to putting my latest work for that on GitHub yet.

Happy to accept contributions.

Some priorities right now:

* Fixing signing for local builds
* Replacing swift whisper with whisper.cpp
* Allowing users to provide their own models

https://github.com/Stack-Studio-Digital-Collective/Auditif

dmitrykan 9 months ago
I'm working on a tool that includes AI. My original target is to test it on my https://www.youtube.com/c/VectorPodcast by offering something that Lex Fridman does for his episodes.

Current features:
1. Download from YT
2. Transcribe using Vosk (output has time codes included)
3. Speaker diarization using pyannote - this isn't perfect and needs a bit more ironing out.

What needs to be done:
4. Store the transcription in a search engine (can include vectors)
5. Implement a webapp

If anyone here is interested in joining forces, let me know.

jagermo 9 months ago
Fantastic tool; I used it a lot to transcribe interviews during plane travel, where there was no internet and I needed to fill the time. Really useful to have if you do a lot of interviews.

choya-love 9 months ago
Any new language support in the future? Fingers crossed for Japanese.

avodonosov 9 months ago
I made a similar tool for making tables of contents for YouTube videos: https://youtoc.by/

I haven't been developing it actively since I created tables of contents for the several videos I needed, years ago. If I ever need it again, I will probably work on the mobile UI (aka responsive).

BetterWhisper 9 months ago
If you are looking for something automatic that also allows you to interact with your transcripts ChatGPT-style, then I would recommend https://www.videototextai.com/

ldenoue 9 months ago
You can also try Scribe (free Chrome extension and iOS app) https://www.appblit.com/scribe

ciaran00 9 months ago
Talio.ai allows you to do this, with ChatGPT-style chat with the transcript plus numerous other features: https://talio.ai

accidbuddy 9 months ago
Does anyone know one that does transcription and translation in real time?

Nowadays, I use libretranslate/libretranslate and pluja/whishper to do this, but not in real time.

phoronixrly 9 months ago
https://github.com/oTranscribe/oTranscribe

bcherny 9 months ago
Looks cool! Unclear from the docs, but does it support non-English languages? How about mixed-language interviews?

neves 9 months ago
Has anybody tested it with Brazilian Portuguese? It is a hard problem, since we have so many accents.

ulrischa 9 months ago
Pretty amazing what a webapp can do. I wish there were more like them and not all these native apps.

kimoz 9 months ago
Does anyone know a free tool for generating subtitles for movies and series?

space_oddity 9 months ago
oTranscribe is a free option for transcription, but in many cases it's just too simple.

teddyh 9 months ago
See also TranscriberAG: <https://transag.sourceforge.net/>

bilater 9 months ago
If you just want quick transcriptions of YouTube videos, this works pretty well: https://www.you-tldr.com/