科技回声

21 comments

viraptor, about 2 years ago

I'm after something that can transcribe medical notes, and unfortunately it does not work well for that case (almost nothing does, though). There are quite a few people interested in something that doesn't turn "laparoscopic" into "leper as cop it".

Maybe the current progress will help, though. Models adjusted by your own dictionary or by postprocessing fixes would be amazing.
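
The "postprocessing fixes" idea in the comment above could be sketched roughly as follows. This is a minimal illustration, not how Ermine.ai or Whisper handles vocabulary: the `CORRECTIONS` table and the `apply_corrections` helper are hypothetical names, and the approach only covers mis-transcriptions you have already seen and listed.

```python
import re

# Hypothetical correction table: mis-heard phrase -> intended term.
# The single entry comes from the comment above; a real table would be
# built up from your own transcripts.
CORRECTIONS = {
    "leper as cop it": "laparoscopic",
}


def build_pattern(corrections):
    """Compile one case-insensitive pattern matching any listed phrase."""
    # Longest phrases first, so a longer phrase wins over a shorter one
    # that happens to be its prefix.
    phrases = sorted(corrections, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in phrases)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def apply_corrections(text, corrections=CORRECTIONS):
    """Replace every listed mis-transcription in `text` with its fix."""
    if not corrections:
        return text
    lookup = {wrong.lower(): right for wrong, right in corrections.items()}
    pattern = build_pattern(corrections)
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)
```

For example, `apply_corrections("The leper as cop it procedure went well.")` returns the sentence with "laparoscopic" substituted. A fixed table like this cannot fix errors it has never seen, which is why the comment also mentions adjusting the model itself.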

NiekvdMaas, about 2 years ago

Nice! Are you aware that whisper.cpp has a WASM version as well? See https://github.com/ggerganov/whisper.cpp/tree/master/examples/whisper.wasm - demo at https://whisper.ggerganov.com/

senko, about 2 years ago

When I try this on Firefox on Linux (not incognito), I get the following error:

    Connecting AudioNodes from AudioContexts with different sample-rate is currently not supported. index-0dae94e71b526640.js:1:2992
    Uncaught (in promise) DOMException: AudioContext.createMediaStreamSource: Connecting AudioNodes from AudioContexts with different sample-rate is currently not supported. index-0dae94e71b526640.js:1
    Media resource blob:https://www.ermine.ai/e762a6f1-f292-4b23-96e0-8059a7f9d635 could not be decoded. www.ermine.ai
    Media resource blob:https://www.ermine.ai/e762a6f1-f292-4b23-96e0-8059a7f9d635 could not be decoded, error: Error Code: NS_ERROR_DOM_MEDIA_METADATA_ERR (0x806e0006)

(Also, the weights JSON doesn't download at all in Firefox incognito.)

It would be good if you could pop up some kind of alert (literally alert() might do the trick) on an exception, just so people don't wait for a couple of minutes before realizing something's gone wrong :)

emadda, about 2 years ago

I released a similar web WASM transcription tool recently: https://bigwav.app

RecycledEle, about 2 years ago

I need a Windows executable that takes a directory of audio files and transcribes them to similarly named text files, so they can be searched.

Example:

20230406115923.mp3 ==> 20230406115923.txt
20230406083110.m4a ==> 20230406083110.txt

I wish someone would build one and sell it for $10 a copy.
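
The file-naming scheme in the example above can be sketched as a small batch loop. This is only an illustration under stated assumptions: `transcribe` is a hypothetical callback standing in for whatever speech-to-text engine you plug in (no real engine is called here), and the `AUDIO_SUFFIXES` set is an assumed list of extensions rather than anything from the thread beyond `.mp3` and `.m4a`.

```python
from pathlib import Path
from typing import Callable, List

# Assumed set of audio extensions; .mp3 and .m4a come from the example above.
AUDIO_SUFFIXES = {".mp3", ".m4a", ".wav"}


def transcript_path(audio_path: Path) -> Path:
    """Map an audio file to its similarly named .txt file."""
    return audio_path.with_suffix(".txt")


def transcribe_directory(directory: Path,
                         transcribe: Callable[[Path], str]) -> List[Path]:
    """Write one .txt transcript next to each audio file in `directory`.

    `transcribe` is a hypothetical callback: it receives the audio path
    and returns the transcript text.
    """
    written = []
    # sorted() materialises the listing first, so the .txt files created
    # below are not picked up by this loop.
    for audio in sorted(directory.iterdir()):
        if audio.is_file() and audio.suffix.lower() in AUDIO_SUFFIXES:
            out = transcript_path(audio)
            out.write_text(transcribe(audio), encoding="utf-8")
            written.append(out)
    return written
```

Calling `transcribe_directory(Path("recordings"), my_engine)` with your own `my_engine` function would leave `20230406115923.txt` beside `20230406115923.mp3`, which ordinary file search can then index.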

quickthrower2, about 2 years ago

Related: https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API

ibnbutlAn, about 2 years ago

So this is all client-side and my speech is not sent anywhere, or is it a "client-side UI" for some API?

I will have a look at the repository to find out, but maybe someone has already looked into it.

infruset, about 2 years ago

Very nice. Would it be easy to add languages other than English? Also, as others have noted, I had to open it in Chrome to make it work; Firefox didn't work.

technocratius, about 2 years ago

Love this idea! Tried a 10 sec clip on Firefox for Android. App seems to have been stuck on "Transcribing..." for a few mins now...

MuffinFlavored, about 2 years ago

Wow, it worked on an iPhone despite it saying it wouldn't work on Safari (needed Chrome).

The little "replay your audio capture" <audio> HTML element says "error", but the transcription actually worked.

bloominggarden, about 2 years ago

This is pretty cool! Just yesterday I finished a similar demo, also using transformers.js. I am currently in the process of adding real-time transcription. Do you plan on adding that?

bethecloud, about 2 years ago

How are you currently distributing the audio download? Any interest in using a distributed CDN layer? Happy to help get it funded.

dmix, about 2 years ago

Can it identify speakers? For example:

SPEAKER A: blah blah
SPEAKER B: blah blah

So it can be used for transcribing phone calls?
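
The output layout asked about above could be produced along these lines, assuming some separate step has already attached a speaker label to each transcribed segment. The thread does not say whether the tool does this; `format_dialogue` and the `(speaker, text)` tuple shape are hypothetical, and the sketch covers only the formatting, not speaker identification itself.

```python
from typing import Iterable, List, Optional, Tuple


def format_dialogue(segments: Iterable[Tuple[str, str]]) -> str:
    """Render (speaker, text) segments as 'SPEAKER X: ...' lines.

    Consecutive segments from the same speaker are merged into one line.
    """
    lines: List[str] = []
    previous: Optional[str] = None
    for speaker, text in segments:
        text = text.strip()
        if speaker == previous:
            # Same speaker kept talking: extend the current line.
            lines[-1] += " " + text
        else:
            lines.append(f"SPEAKER {speaker}: {text}")
            previous = speaker
    return "\n".join(lines)
```

Given `[("A", "hello"), ("A", "there"), ("B", "hi")]`, this yields one line for speaker A followed by one for speaker B.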

apineda, about 2 years ago

Is there a GitHub link? I tried clicking the logo but it didn't work.

1attice, about 2 years ago

Does not work in Firefox, but works jaw-droppingly well in Chrome.

nunobrito, about 2 years ago

So nice, it worked. Is there a version of this we can run on ESP32 (Arduino) devices?

voicedYoda, about 2 years ago

Nice! Also enjoy the ffmpeg usage. Gonna give it a try!

wdb, about 2 years ago

Doesn't seem to work in Safari

bartislartfast, about 2 years ago

Can we add a "new" button? I have to refresh to do it a second time.

rado, about 2 years ago

Works great, thanks

APock, about 2 years ago

I want this but let me upload a file first.

Show HN: Ermine.ai – Record and transcribe speech, 100% client-side (WASM)
