MacWhisper: Transcribe audio files on your Mac

240 points by cristoperb almost 2 years ago

35 comments

simonw almost 2 years ago

I've been using MacWhisper for a few months, and it's fantastic.

Sometimes I'll send an mp3 or mp4 video through it and use the resulting transcript directly.

Other times I'll run a second step through <a href="https://claude.ai/" rel="nofollow noreferrer">https://claude.ai/</a> (because of its 100,000 token context) to clean it up. My prompt for that at the moment is:

> Reformat this transcript into paragraphs and sentences, fix the capitalization and make very light edits such as removing ums

That's often not necessary with Whisper output. It's great if you extract captions directly from YouTube though - I wrote more about that here: <a href="https://simonwillison.net/2023/Aug/6/annotated-presentations/" rel="nofollow noreferrer">https://simonwillison.net/2023/Aug/6/annotated-presentations...</a>
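A minimal sketch of how that cleanup step could be prepared for long transcripts. Only the prompt text comes from the comment above. The 4-characters-per-token estimate, the choice to reserve half the context for the reply, and the function names are assumptions for illustration, not how any particular tool does it.

```python
PROMPT = (
    "Reformat this transcript into paragraphs and sentences, fix the "
    "capitalization and make very light edits such as removing ums"
)


def estimate_tokens(text: str) -> int:
    """Rough guess: about 4 characters per token for English (an assumption)."""
    return max(1, len(text) // 4)


def build_requests(transcript: str, context_tokens: int = 100_000) -> list[str]:
    """Split a transcript on line boundaries so each prompt fits the context.

    Half of the remaining context is kept free for the model's reply, since
    the cleaned-up text is roughly as long as the input.
    """
    budget = (context_tokens - estimate_tokens(PROMPT)) // 2
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for line in transcript.splitlines():
        cost = estimate_tokens(line)
        # Start a new chunk when this line would overflow the budget.
        # A single oversized line still gets a chunk of its own.
        if current and used + cost > budget:
            chunks.append("\n".join(current))
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        chunks.append("\n".join(current))
    return [f"{PROMPT}\n\n{chunk}" for chunk in chunks]
```

Each returned string can be pasted or sent as one request. Splitting on line boundaries keeps caption lines intact, at the cost of occasionally breaking a sentence across two requests.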

tornato7 almost 2 years ago

I have a Python script on my Mac that detects when I press and hold the right Option key and records audio while it's pressed. On release, it transcribes the audio with whispercpp and pastes the result. It makes it very easy to record quick voice notes. Here it is: <a href="https://github.com/corlinp/whisperer/tree/whisper.cpp">https://github.com/corlinp/whisperer/tree/whisper.cpp</a>

I was working on a native version in the form of a taskbar app with a customizable prompt and all. However, I quickly realized that the behaviors I want from the app require a bunch of accessibility permissions that would block it from the App Store and require more setup steps.

Would anybody still find that useful?
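The press-and-hold flow described above can be sketched as a small state machine. This is not the code from the linked repository: the class name, the `"alt_r"` key label, and the four injected callables are hypothetical stand-ins for whatever keyboard hook, recorder, transcriber, and paste mechanism you use.

```python
from typing import Any, Callable


class PushToTalk:
    """Record while a hotkey is held; transcribe and paste on release."""

    def __init__(
        self,
        start_recording: Callable[[], None],
        stop_recording: Callable[[], Any],
        transcribe: Callable[[Any], str],
        paste: Callable[[str], None],
        hotkey: str = "alt_r",  # hypothetical label for the right Option key
    ) -> None:
        self.start_recording = start_recording
        self.stop_recording = stop_recording
        self.transcribe = transcribe
        self.paste = paste
        self.hotkey = hotkey
        self.recording = False

    def on_press(self, key: str) -> None:
        # Key repeat fires many press events while held; start only once.
        if key == self.hotkey and not self.recording:
            self.recording = True
            self.start_recording()

    def on_release(self, key: str) -> None:
        if key == self.hotkey and self.recording:
            self.recording = False
            audio = self.stop_recording()
            text = self.transcribe(audio).strip()
            if text:  # skip pasting when nothing was recognized
                self.paste(text)
```

Injecting the callables keeps the hotkey logic testable without a microphone or accessibility permissions; the real hooks can be wired in separately.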

paulmd almost 2 years ago

Whisper is cool. Back in college, like 10-12 years ago, I wanted to do some projects with speech-to-text and text-to-speech as an interface, but at that point the only option was Google APIs that charged by the word or second.

On top of that, constantly sending data to Google would have chewed a ton of battery compared to the "activation word" style solutions ("ok google/siri") that can be done on-device. The power for on-device processing was obviously going to come down over time, while wireless is much more governed by the laws of physics, and connectivity power budgets haven't gone down nearly as much over time. I am pretty sure there is a fundamental asymptotic limit for this, governed by the Shannon limit (channel width and power output). In the presence of a noise floor of X, for a bandwidth of Y, you simply cannot use less than Z total power for moving a given amount of data.

BTLE is really the first game-changer (especially if you are hooking into a broad network of receivers like Apple does with AirTags), but even then you are not really breaking this rule - you are just transmitting less often, and sending less data. It's just a different spot on the curve that happens to be useful for IoT. If you are, say, doing a keyboard over BTLE where the duty cycle is higher, the power will be too. Applications that need "100% duty cycle"/"interactive" behavior (reachable at any time with minimal latency) still have not improved very much.

In hindsight I guess the answer would have been writing a mobile app that ties into Google/Siri keywords and actions, and letting the phone be the UI and only transmit BT/BTLE to the device. But BTLE hadn't hit the scene back then (or at least not nearly to the extent it has now) and I was less experienced/less aware of that solution space.
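The limit described above can be made concrete with the Shannon-Hartley capacity formula, C = B * log2(1 + P / (N0 * B)). Solving for P gives the smallest received signal power that supports a given bit rate. The sketch below assumes an ideal channel with thermal noise at 290 K; real radios need far more power once path loss, antenna efficiency, and protocol overhead are included.

```python
import math

BOLTZMANN = 1.380649e-23       # J/K
N0_ROOM = BOLTZMANN * 290      # thermal noise density at 290 K, in W/Hz


def min_received_power(rate_bps: float, bandwidth_hz: float,
                       noise_density: float = N0_ROOM) -> float:
    """Smallest received power (W) for rate_bps over bandwidth_hz.

    From C = B * log2(1 + P / (N0 * B)):  P = N0 * B * (2**(C / B) - 1).
    expm1 keeps the result accurate when C / B is tiny.
    """
    return noise_density * bandwidth_hz * math.expm1(
        math.log(2) * rate_bps / bandwidth_hz
    )


def min_energy_per_bit(noise_density: float = N0_ROOM) -> float:
    """Floor on energy per bit (J) as bandwidth grows without bound: N0 * ln 2."""
    return noise_density * math.log(2)
```

Widening the bandwidth lowers the required power for a fixed rate, but only down to the `N0 * ln 2` floor per bit, which is why sending less data less often is the main lever left for low-power radios.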

Flimm almost 2 years ago

If you're looking for an alternative that runs on Linux, I just recently discovered Speech Note. It does speech to text, text to speech, and machine translation, all offline, with a GUI:

<a href="https://flathub.org/apps/net.mkiol.SpeechNote" rel="nofollow noreferrer">https://flathub.org/apps/net.mkiol.SpeechNote</a>

<a href="https://github.com/mkiol/dsnote">https://github.com/mkiol/dsnote</a>

satvikpendem almost 2 years ago

While whisper.cpp is faster than faster-whisper on macOS due to Apple's Neural Engine [0], if you have a GPU on Windows or Linux, faster-whisper [1] is a lot faster than both OpenAI's reference Whisper implementation and whisper.cpp. For a CLI, use wscribe or whisper-ctranslate2, as faster-whisper is only a Python library. It's pretty good.

[0] <a href="https://github.com/guillaumekln/faster-whisper/discussions/368">https://github.com/guillaumekln/faster-whisper/discussions/3...</a>
[1] <a href="https://github.com/guillaumekln/faster-whisper">https://github.com/guillaumekln/faster-whisper</a>

miki123211 almost 2 years ago

This basically does the same thing but free: <a href="https://apps.apple.com/us/app/aiko/id1672085276" rel="nofollow noreferrer">https://apps.apple.com/us/app/aiko/id1672085276</a>

masukomi almost 2 years ago

Here's a multi-platform open source app that does the same thing but uses Vosk instead of Whisper: <a href="https://github.com/bugbakery/audapolis">https://github.com/bugbakery/audapolis</a>

shawnc almost 2 years ago

Been using it for a couple months, and Jordi keeps improving on it at a steady clip. It's great!!

_rs almost 2 years ago

I've used this for a few months to transcribe interviews and it works pretty well. The UI for dealing with multiple speakers is a bit cumbersome, and there are occasional crashes, but overall it's definitely a great app and worth the money.

nafizh almost 2 years ago

The main problem I have faced with the whisper model (large) is when there is silence or a sizable gap without audio, it hallucinates and just puts out some random gibberish repeatedly until the transcription ends. How does this app handle this?
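One common workaround for this, sketched here as an assumption rather than as what MacWhisper does, is to find long silent stretches first and skip or trim them before transcription. The function name, the RMS threshold, and the frame size below are illustrative and would need tuning on real audio.

```python
import math


def find_silences(samples, sample_rate, frame_ms=30, threshold=0.01,
                  min_silence_s=2.0):
    """Return (start_s, end_s) spans where audio stays quiet.

    samples: mono floats in [-1.0, 1.0]. A frame counts as quiet when its
    RMS is below threshold; only runs of at least min_silence_s are kept.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    spans = []
    start = None  # start time of the current quiet run, if any
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        t = i / sample_rate
        if rms < threshold:
            if start is None:
                start = t
        elif start is not None:
            if t - start >= min_silence_s:
                spans.append((start, t))
            start = None
    # Close a quiet run that lasts until the end of the audio.
    end = len(samples) / sample_rate
    if start is not None and end - start >= min_silence_s:
        spans.append((start, end))
    return spans
```

The returned spans can be used to cut the file into voiced chunks that are transcribed separately, so the model never sees a long gap.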

holdodd almost 2 years ago

<a href="https://github.com/MahmoudAshraf97/whisper-diarization">https://github.com/MahmoudAshraf97/whisper-diarization</a>

This project has been alright for transcribing audio with speaker diarization. A bit finicky. The OpenAI model is better than other paid products (Descript, Riverside), so I’m looking forward to trying MacWhisper.
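For readers new to diarization: the transcription model produces timed text segments, a separate diarization step produces timed speaker turns, and the two are then merged. The sketch below shows one simple merge rule, picking the speaker whose turn overlaps each segment the most. It is an illustration with made-up names, not the linked project's actual algorithm.

```python
def assign_speakers(segments, turns):
    """Label each transcript segment with the best-overlapping speaker.

    segments: list of (start_s, end_s, text) from the transcriber.
    turns:    list of (start_s, end_s, speaker) from a diarizer.
    Returns a list of (speaker, text); "unknown" when nothing overlaps.
    """
    labeled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = "unknown", 0.0
        for turn_start, turn_end, speaker in turns:
            # Length of the shared time interval; negative means no overlap.
            overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((best_speaker, text))
    return labeled
```

Segment-level assignment gets messy when two people talk within one segment, which is one reason real pipelines tend to align at the word level instead.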

patrick91 almost 2 years ago

I really like this app, I wish there was a way to play a video while editing the subtitles though!

zitterbewegung almost 2 years ago

There is a great library that supports not only OpenAI's Whisper but many other engines that also work offline: <a href="https://github.com/Uberi/speech_recognition">https://github.com/Uberi/speech_recognition</a>

googlryas almost 2 years ago

Out of curiosity, does anyone know what the state of the art for transcription is? Is there a possibility it will soon be "better than a person carefully listening and manually transcribing"?

I ask because I asked a friend to record a (for fun) lecture I couldn't attend, and unfortunately the speech audio levels are quite low, and I'm trying to figure out how to extract as much info as possible so I can hear it. If I could add context to the transcriber like "This is about the Bronze Age collapse and uses terminology commonly used in discussions on that topic", it might be even more useful.
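On the context idea: the reference `openai-whisper` command-line tool accepts an `--initial_prompt` option whose text conditions the first decoding window, which can nudge spelling of domain terms. A hedged sketch of building that command follows; the helper name and file name are made up, and how much the prompt helps with quiet audio is not something this thread establishes.

```python
def whisper_command(audio_path: str, context: str, model: str = "large") -> list[str]:
    """Build an argv list for the openai-whisper CLI with a context prompt.

    Assumes the `whisper` executable from the openai-whisper package is on
    PATH. The prompt is a hint to the decoder, not a hard constraint.
    """
    return [
        "whisper", audio_path,
        "--model", model,
        "--initial_prompt", context,
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`. Passing arguments as a list avoids shell-quoting problems with prompts that contain spaces or quotes.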

ZoomerCretin almost 2 years ago

A few weeks ago I found myself wanting a speech-to-text transcriber that directly captures my computer's audio output (i.e. not mic input, not an audio file), but I could not find one. The best alternative I found was to have my computer direct audio output to a virtual audio input device, but I could not do this on my desktop because I do not have a sound card. I found software that did this, but it did not allow me to listen to the audio output while it was redirected to a virtual audio input.

Has anyone else tried to do something similar? How did you achieve it?

8f2ab37a-ed6c almost 2 years ago

Love the idea behind this. High quality transcription + the data not leaving your device is excellent.

Any chance there's an iOS version of this coming down the pike? It would be great to have a voice-based note-taking app for when you are driving or walking and don't want to type into your phone, but just want to save a thought by quickly dictating it and have it accessible as text later.

mosselman almost 2 years ago

I didn’t know whisper could differentiate voices for the per speaker transcription. Is that new? Is it also available in the command line whisper builds?

neocodesoftware almost 2 years ago

<a href="https://github.com/chidiwilliams/buzz">https://github.com/chidiwilliams/buzz</a>

brew install buzz

It's great.

userhacker almost 2 years ago

If you want a quick and free web transcription and editing tool, we've built <a href="https://revoldiv.com/" rel="nofollow noreferrer">https://revoldiv.com/</a> with speaker detection and timestamps. It takes less than a minute to transcribe a 1-hour-long video or audio file.

deegles almost 2 years ago

Is Gumroad a good platform for selling software like this? How is licensing handled?

MaxikCZ almost 2 years ago

Would be nice if it allowed importing mkv files; in the end it's just a container.
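Until an app reads a container directly, a usual workaround is to pull the audio track out with ffmpeg first. The sketch below builds such a command; the helper name is made up, and the 16 kHz mono 16-bit WAV target is chosen because that is the input format whisper.cpp expects, not because MacWhisper requires it.

```python
from pathlib import Path


def extract_audio_command(video_path: str) -> list[str]:
    """Build an ffmpeg argv list that writes the audio as 16 kHz mono WAV.

    The output is placed next to the input with a .wav extension.
    """
    wav_path = str(Path(video_path).with_suffix(".wav"))
    return [
        "ffmpeg",
        "-i", video_path,      # input container (mkv, mp4, ...)
        "-vn",                 # drop the video stream
        "-ac", "1",            # downmix to mono
        "-ar", "16000",        # resample to 16 kHz
        "-c:a", "pcm_s16le",   # 16-bit PCM
        wav_path,
    ]
```

Run it with `subprocess.run(extract_audio_command("talk.mkv"), check=True)`, assuming ffmpeg is installed and on PATH.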

bilater almost 2 years ago

If you'd rather use a web app with minimal cost upfront, check out PlainScribe :) <a href="https://www.plainscribe.com/" rel="nofollow noreferrer">https://www.plainscribe.com/</a>

agentdrtran almost 2 years ago

Does anyone know of an easy-to-use Whisper fork with speaker attribution (diarization)?

ajhai almost 2 years ago

Shameless plug: I recently launched LLMStack (<a href="https://github.com/trypromptly/LLMStack">https://github.com/trypromptly/LLMStack</a>) and I have some custom pipelines built as apps on LLMStack that I use to transcribe and translate.

Granted, my use cases are not high volume or frequent, but being able to take output from Whisper and pipe it to other models has been very powerful for me. It is also amazing how good the quality of Whisper is when handling non-English audio.

We added LocalAI (<a href="https://localai.io" rel="nofollow noreferrer">https://localai.io</a>) support to LLMStack in the last release. Will try to use whisper.cpp and see how that compares for my use cases.

ycstohley almost 2 years ago

Seriously great program. The licensing model is just fine. I use this all the time, and so do my colleagues at other companies.

The developer, Jordi, has a great speech online about product development.

not_the_fda almost 2 years ago

Is this just a front end to OpenAI's Whisper? <a href="https://github.com/openai/whisper">https://github.com/openai/whisper</a>

beardedwizard almost 2 years ago

Seems shady to me to charge for running larger free models you don't provide on hardware your users provide. You are charging for OpenAI's features, not yours.

ukuina almost 2 years ago

Many such apps exist. I use Hello Transcribe from the App Store, $7 across all iDevices, with CoreML optimization.

mkmk almost 2 years ago

I’ve gotten confused between the different Whispers. How is this different from the OpenAI API endpoint?

uger almost 2 years ago

Great tool, but I can't wait until it can do real-time live transcribing.

kulesh almost 2 years ago

superwhisper.com is also cool

jbverschoor almost 2 years ago

So this is not Whisper Transcription 4 from the App Store?

bonney_io almost 2 years ago

Any insight on how Whisper works on older Intel Macs? I have a 2012 Mac mini with 16GB of RAM doing nothing; if I could use it to (slowly) transcribe media in the background, this becomes a must-buy.

pgt almost 2 years ago

Anyone have a cached page? Seems to have been hugged to death.

idorosen almost 2 years ago

Why? Just use Whisper directly. The model and code are available, and I think there’s even a Homebrew formula...
