科技回声

15 comments

nchudleigh, over 1 year ago

This is awesome.

For anyone confused about the project, it is using whisper.cpp, a C-based runner and translation of the open Whisper model from OpenAI. It is built by the team behind GGML and llama.cpp. https://github.com/ggerganov

You can fork this code, run it on your own server, and hit the API. The server itself will use FFmpeg to convert the audio file into the required format and run the C translation of the Whisper model against the file.

By doing this you can separate yourself from the requirement of paying the fee that OpenAI charges for their Whisper service and fully own your translations. The models that the author has supplied here are rather small but should run decently on a CPU. If you want to go to larger model sizes you would likely need to change the compilation options and use a server with a GPU.

Similar to this project, my product https://superwhisper.com is using these whisper.cpp models to provide really good dictation on macOS. It runs really fast on the M series chips. Most of this message was dictated using superwhisper.

Congrats to the author of this project. Seems like a useful implementation of the whisper.cpp project. I wonder if they would accept it upstream in the examples.
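The flow described in this comment (FFmpeg conversion, then a whisper.cpp run against the converted file) might be sketched roughly as below. This is a minimal illustration and not the project's actual code: the binary path `./main`, the model path, and the helper names are assumptions made for the example. The FFmpeg flags target the 16 kHz, mono, 16-bit PCM WAV input that whisper.cpp's README asks for.

```python
import subprocess


def ffmpeg_command(src: str, dst: str) -> list[str]:
    """Build an FFmpeg command converting `src` to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",        # overwrite dst if it already exists
        "-i", src,
        "-ar", "16000",        # resample to 16 kHz
        "-ac", "1",            # downmix to mono
        "-c:a", "pcm_s16le",   # 16-bit little-endian PCM
        dst,
    ]


def whisper_command(wav: str,
                    model: str = "models/ggml-tiny.en.bin",
                    binary: str = "./main") -> list[str]:
    """Build a whisper.cpp CLI call (binary and model paths are assumed)."""
    return [binary, "-m", model, "-f", wav]


def transcribe(src: str) -> str:
    """Convert `src` to WAV, run whisper.cpp on it, and return its stdout."""
    wav = f"{src}.16k.wav"  # hypothetical naming scheme for the converted file
    subprocess.run(ffmpeg_command(src, wav), check=True, capture_output=True)
    result = subprocess.run(whisper_command(wav), check=True,
                            capture_output=True, text=True)
    return result.stdout
```

Calling `transcribe("talk.mp3")` would require both `ffmpeg` and a compiled whisper.cpp binary to be present; the two command builders can be inspected without either.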

innovatorved, over 1 year ago

Many of you are asking if the project is completely self-hosted and does not rely on any third-party services. Yes, it is completely self-hosted and does not rely on any third-party services. The user account exists only for authentication, so no one can use the service without authenticating.

innovatorved, over 1 year ago

Whisper API - Speech to Text Transcription

This open source project provides a self-hostable API for speech-to-text transcription using a fine-tuned Whisper ASR model. The API allows you to easily convert audio files to text through HTTP requests. Ideal for adding speech recognition capabilities to your applications.

Key features:

- Uses a fine-tuned Whisper model for accurate speech recognition
- Simple HTTP API for audio file transcription
- User-level access with API keys for managing usage
- Self-hostable code for your own speech transcription service
- Quantized model optimization for fast and efficient inference
- Open source implementation for customization and transparency
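As an illustration of the "user-level access with API keys" feature listed above, a self-hosted service could issue and check keys along these lines. This is only a sketch under assumptions: the in-memory `API_KEYS` store and the function names are hypothetical and are not taken from the project, which presumably persists keys in a database.

```python
import hmac
import secrets
from typing import Optional

# Hypothetical in-memory key store: API key -> user name.
API_KEYS: dict[str, str] = {}


def issue_key(user: str) -> str:
    """Create a random API key for `user` and remember it."""
    key = secrets.token_urlsafe(32)
    API_KEYS[key] = user
    return key


def authenticate(presented: str) -> Optional[str]:
    """Return the user that owns `presented`, or None if the key is unknown."""
    for key, user in API_KEYS.items():
        # compare_digest avoids leaking key contents through timing differences
        if hmac.compare_digest(key.encode(), presented.encode()):
            return user
    return None
```

Because the keys are generated and checked entirely on the server you run, a token requirement of this kind does not by itself imply any third-party dependency.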

ChrisArchitect, over 1 year ago

Not to be confused with Whisper – open source speech recognition by OpenAI https://news.ycombinator.com/item?id=34985848

edgarvaldes, over 1 year ago

Related to Whisper: whisperX is a godsend. I can finally watch old or uncommon TV series with subtitles.

pizzafeelsright, over 1 year ago

This is not fully self-hosted so much as middleware, no?

geekodour, over 1 year ago

Nice! This will be very useful for me. I think I can run this locally and spin up a basic Telegram bot around it for personal use.

One issue I faced with all the Whisper-based transcript generators is that there seems to be no good way to edit or correct the generated text with word-level timestamps. I created a small web-based tool[0] for that.

If anyone is looking to edit transcripts generated using Whisper, you'd probably find it useful.

[0] https://github.com/geekodour/wscribe-editor

LeoPanthera, over 1 year ago

So is "real time" translation a thing yet? I've long wanted to be able to watch non-English television and have the audio translated into English subtitles. It's doable for pre-recorded things, but not for live.

An iPhone app that could do this from the microphone would also be amazing. Google Translate and its various competitors from Microsoft/Apple are nearly there, but they all stop listening in between sentences. Something that just listened constantly, printing translated text onto the screen, would be amazing.

distantsounds, over 1 year ago

how is this open source, or self-hosted, when it requires an API key and a login from a third party?

v7n, over 1 year ago

Many live streamers, and platforms, would love to have custom real-time transcription elements. I actually looked into this exact project of yours when I thought about creating such a thing.

Even if it meant delaying the broadcast for a second while transcribing, the accessibility value could be immense.

Dig1t, over 1 year ago

> Get Your token

If it's completely self-hosted, why do I need to get a token? Where does the actual model run?

pdntspa, over 1 year ago

So Whisper is all the rage with speech-to-text, but what about text-to-speech?

grzes, over 1 year ago

I don't understand the excitement here. It's just an HTTP wrapper for a CLI command. You can build it easily on your own with any decent RAD framework.
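For readers wondering what "an HTTP wrapper for a CLI command" amounts to, a bare-bones version using only Python's standard library might look like the sketch below. It is an assumption-laden illustration, not the project's implementation: the `CLI` command line (a whisper.cpp binary at `./main` with an assumed model path) is hypothetical, the request body is taken to be a ready-to-use WAV file, and there is no authentication, upload limit, or format conversion.

```python
import subprocess
import tempfile
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed CLI invocation; the uploaded file's path is appended per request.
CLI = ["./main", "-m", "models/ggml-tiny.en.bin", "-f"]


class TranscribeHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the raw request body and spool it to a temporary file.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        with tempfile.NamedTemporaryFile(suffix=".wav") as tmp:
            tmp.write(body)
            tmp.flush()
            try:
                proc = subprocess.run(CLI + [tmp.name], capture_output=True)
                ok, payload = proc.returncode == 0, proc.stdout
                if not ok:
                    payload = proc.stderr
            except OSError as exc:  # e.g. the binary is missing
                ok, payload = False, str(exc).encode()
        # Return the CLI's stdout on success, its error output otherwise.
        self.send_response(200 if ok else 500)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port: int = 8000) -> HTTPServer:
    """Create (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), TranscribeHandler)
```

Running `make_server().serve_forever()` would start it. Whatever a project like this adds beyond such a sketch (user accounts, API keys, FFmpeg conversion, model management) is where the remaining work lies.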

1024core, over 1 year ago

Does Android OS come with ASR?

tnhoang088, over 1 year ago

Thank you. I love it

Whisper.api: Open-source, self-hosted speech-to-text with fast transcription
