Generate audiobooks from E-books with Kokoro-82M

420 points by csantini 4 months ago

53 comments

laserbeam 4 months ago

On the one hand, this is very convenient. Probably cool for some non-fiction.

On the other, some of my favorite audiobooks stood out because the narrator interpreted the text really well, for example by changing the pacing during chaotic moments. Or those audiobooks with multiple narrators and different voices for each character. Not to mention that sometimes the only cue you get for who's speaking during dialogue is how the voice actor changes their tone. I have mixed feelings about using this and losing some of that quality.

I would totally use this over amateur ebooks or public domain audiobooks like the ones on Project Gutenberg. As cool as it is/was for someone to contribute to free books... as a listener it was always jarring to switch to a new chapter and hear a completely different voice and microphone quality for no reason.

delegate 4 months ago

The quality is great (amazing even), but I can't listen to AI-generated voices for more than a minute. I don't know why, I just don't like it. I immediately skip a video on YouTube if the voice is AI-generated.

Might be because our brains try to 'feel' the speaker: the emotion, the pauses, the invisible smile, etc.

No doubt models will improve and become harder to identify as AI-generated, but for now, as with diffusion images, I still notice it and react by just moving on.

swores 4 months ago

Can anyone recommend an open source option that allows training on a custom voice (my own, so I'd be able to record as many snippets as it needs to train on), so I can use it for TTS generation without sharing it off my machine?

Edit: I'll wait to see if any recommendations get made here; if not, I might give this one a go: https://github.com/coqui-ai/TTS

pprotas 4 months ago

I would love to have an e-reader that lets me switch between text and audio at the press of a button. Imagine reading your book on the couch and then seamlessly switching to audio mode while doing the dishes, just by connecting Bluetooth headphones.

qurashee 4 months ago

This looks incredible! I’ve had an idea simmering in the back of my mind for a while now: creating an audiobook from an ebook for my commute using the voice of a specific audiobook narrator I really enjoy. The concept struck me after coming across the Infinite Conversation project here on HN. Unfortunately, I just haven’t found the time to bring it to life yet. :(

cwmoore 4 months ago

The word “kokoro” means “heart” in Japanese, which I learned while making the (heart-shaped, paperback) puzzle books at https://www.kakurokokoro.com/

albert_e 4 months ago

I hope a plugin for Calibre ebook management software comes along that makes it easier to convert select titles from your epub library to decent audio versions -- and a decent open source app for tablets and smartphones that can let us seamlessly consume both the ebook and audiobook at will.

Dowwie 4 months ago

2025 may be the year where we can generate a dramatic audiobook with ambient music, sound effects, and theatrical narration using neural networks. Many of the parts already exist.

cess11 4 months ago

I would for sure not want this for fiction. It's too obvious that the voice has no understanding whatsoever of the text. But it's probably pretty nice for converting short news texts or notifications to audio.

sysworld 4 months ago

Finally! I've been trying all the TTS models popping up on here for ages, and they've all been pretty average, or didn't work on Mac, or only worked on really short text, or were reeeally slow.

But this one works pretty quickly, is easy to install, and has some passable voices. Finally I can start listening to those books that have no audio version.

I'm a slow reader, so I don't read many books. If a book doesn't have an audiobook version, chances are I won't read it.

PS: I have used ElevenLabs in the past for some small TTS projects, but for a full book it's price-prohibitive for personal use. (ElevenLabs has some amazing voices.)

Thank you to the dev/s who worked on this!

TypoAtLineZero 4 months ago

I have a very similar setup locally, which uses Chrome with the 'Read Aloud' plugin. I capture the audio stream via QJackCtl/VLC. Voices, speed, and pitch can be adjusted. Efficient and quick to set up.

lc64 4 months ago

"was trained on <100 hours of audio"

How the hell was it trained on that little data?

woolion 4 months ago

If you look for a lot of the great classics, audiobook results are inundated with basic TTS "audiobooks" that are impossible to filter out. These are impossible to listen to because they lack the proper intonation marking the end of sentences, which makes them very tiring to parse. It might be better than tuna-can-sounding recordings, especially if you want to hear them in traffic (a common requirement), but that's about it. The alternative, if you want real quality recordings, is to stop reading classics and instead read the latest Japanime isekai or murder mystery; these have very good options on the market. Anyway, I don't think it needs more justification: it covers a good niche usage.

I'm checking what the actual quality is (not a cherry-picked example), but:

Started at: 13:20:04
Total characters: 264,081
Total words: 41548
Reading chapter 1 (197,687 characters)...

That was 1h30 ago, and there's no progress notification of any kind, so I'm hoping it will finish sometime. It's using 100% of all available CPUs, so it's quite a bother. (This is "A Tale of a Tub" by Swift; it's about half of a typical novel length.)

msoad 4 months ago

To people who are experts in AI TTS:

Why does ElevenLabs have such a lead in this space? It sounds better than the OpenAI and Google models.

katspaugh 4 months ago

Sounds better than many books on Audible.

TheChaplain 4 months ago

For accessibility I think this is a great thing, but as entertainment less so.

An example is The Hobbit and The Lord of the Rings: the narrator, Rob Inglis, gives an amazing voice performance that adds depth to environments and characters. And of course the songs!

flypunk 4 months ago

I really liked it and added a variable speed argument: https://github.com/santinic/audiblez/pull/4

yoavm 4 months ago

I was just looking for a TTS model to run locally for reading articles out loud, and had never heard of Kokoro before! This looks great. I wonder if it can run in the browser somehow; it could be a nice WebExtension.

nottorp 4 months ago

Well, there was some hope with ChatGPT that people would go back to being able to process text communication.

Guess it was just a matter of time till someone figured out how to use "AI" to resume encouraging illiteracy.

nickpsecurity 4 months ago

The page says it was trained on under 100 hours of audio. Then, the link says “we employ large pre-trained SLMs, such as WavLM, as discriminators with our novel differentiable duration modeling for end-to-end training.” I don’t have time to read the paper to see what that means.

Depending on what that means, it might be more accurate to say it was trained on 100 hours of audio with the aid of another, pre-trained model. The reader who thinks “only 100 hours?!” will then know to look at the pretraining requirements of the other model, too.

skwee357 4 months ago

Soon, AI will flood the market with mediocre everything: books, audiobooks, art, movies, websites.

The saddest thing is that people will still continue to consume these AI-produced “goods”.

floppiplopp 4 months ago

It sounds okay, but it lacks emotion and is monotone for fiction. It's the voice equivalent of the uncanny valley, which is probably fine if you don't really care.

GaggiX 4 months ago

There is also Piper (https://github.com/rhasspy/piper), a TTS that is pretty good (depending on the language) and extremely fast. It would be cool to change the script to use Piper instead of Kokoro in case you want a language that Kokoro doesn't support, or Kokoro is too slow; Piper supports a lot of languages.

grwthckrmstr 4 months ago

This is wonderful, and I'm so happy to see the post where the author ran it locally on their MacBook.

I am curious: is there an equivalent lightweight model for speech-to-text that can run in real time on a MacBook? I'm just playing around with AI models and was looking into this (a fully local app that lets you talk to your computer).

zoidb 4 months ago

Not directly related to the software, but interestingly, the author's website has a "Schedule a free call with me" link (https://claudio.uk/templates/call.html). I wonder if randos on the internet ever do that, and how it works out.

mikkom 4 months ago

What I really want, and hope someone builds, is an audiobook service that converts books to audiobooks so that each character has their own voice.

Some audiobooks have this, and I think it really makes the experience much more engaging.

(Also maybe some background sound effects, but I'm not sure about that. Some books have this too and it's quite nice.)

herculity275 4 months ago

Very nice! I fiddled with this idea a few months back, but the models available at the time were woefully slow on a MacBook. Will definitely give this a spin; there's a large category of web serials and less popular translated novels that never get audiobook releases.

basedrum 4 months ago

I want to be able to seamlessly read on my ebook reader, then put in my headphones, go for a walk with the dog, and resume on audio where I left off. Then, when I come back, my e-reader is at the place where the audio finished and I can resume reading.

causality0 4 months ago

Has anyone gotten this working on Windows? No matter where I put the files, PowerShell insists that kokoro-v0_19.onnx and voices.json aren't in the current folder.
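
One way to narrow this down is to check which folder the shell is actually in and whether both files are visible from there. The following is a minimal sketch, assuming the tool looks for the two files named above in the current working directory (as the error message suggests); `missing_model_files` is a hypothetical helper and not part of audiblez.

```python
from pathlib import Path

# File names taken from the error described above.
REQUIRED_FILES = ("kokoro-v0_19.onnx", "voices.json")


def missing_model_files(folder=None):
    """Return the required file names that are not present in `folder`.

    Defaults to the current working directory when no folder is given.
    """
    base = Path(folder) if folder is not None else Path.cwd()
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    # Show where the process is really running, then list what is missing.
    print("Current folder:", Path.cwd())
    missing = missing_model_files()
    print("Missing:", ", ".join(missing) if missing else "none")
```

Running this from the same PowerShell prompt, right before the conversion command, shows whether the shell's current folder is the one the files were copied into.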

plumbees 4 months ago

As a Mandarin learner, I find that the Chinese one lacks cadence, which makes it very hard for a learner to comprehend. It's like a machine gun of words without the subtle pauses between groups of words that I would normally lean on.

mg 4 months ago

Would this also be the best option if you just want to convert plain text files to audio?

october8140 4 months ago

All these AI text to voice models seem to ignore emotion. It always sounds like a robot.

physicsguy 4 months ago

I’m sure they sound more natural, but honestly, the text to speech built into my Kindle more than 10 years ago was good enough. Of course, Amazon killed that off because it would cannibalise sales to Audible.

causi 4 months ago

I'm not able to try it until later, but regarding the sample audio: The voice quality is quite good, but what's going on with all the random pauses between words? It's very Captain Kirk.

maxglute 4 months ago

Sounds really nice at 3x-4x speed, which I couldn't say for the high-quality TTS options of last year. I'm wondering if there are metrics out there for audio speed vs. clarity.

jaggs 4 months ago

I really like this a lot. The default provides a really good audiobook feel, especially the Isabella voice. Any chance you could add in an API hook for optional ElevenLabs use?

monkeydust 4 months ago

I have been looking for something credible that can voice over written emails (long-form ones), documents, and PowerPoints locally... this might be just the thing!

gunalx 4 months ago

Kokoro seemed pretty nice for its size. I guess it is not much better than a lot of the simpler TTS options, but at least it sounds less machine-like than a few bad ones.

mrklol 4 months ago

How can this support more languages than the model itself?

carlosjobim 4 months ago

Why isn't the audiobook market strong enough that it would make business sense to pay good narrators and actors for each book published?

therealdrag0 4 months ago

Do folks have a preferred toolkit for extracting text from web articles? I’d like to TTS articles friends send me.

vinni2 4 months ago

Can it also translate? I have family who would like audiobooks in German but most are in English only.

Havoc 4 months ago

Wow that sample sounds really good

crorella 4 months ago

Nice! It would be great to have per character voices

geor9e 4 months ago

This one sounds a bit robotic and takes ~4 hours per book on my M1 laptop, so I'll keep looking. For now, I'm happy with my current method: the EPUBReader browser extension, which opens an .epub as an HTML page in the Microsoft Edge browser, which has a "Read Aloud" button that I set to the Stephan natural voice at 1.6x speed. It's the best-sounding voice I've ever heard: it speaks fast, clear, and crisp, with natural inflections in the sentences, and if I want to jump somewhere I just left-click the text at that spot. And it's instant, with no conversions. The downside is that I have to stay in Bluetooth range of my laptop, so I'm still looking for a good phone-based method. Google Play Books works okay, but gets buggy at 1.6x speed.

jaggs 4 months ago

This looks really nice. And fast too it seems.

ekianjo 4 months ago

Japanese is not supported yet, despite the claims. You can easily see that by running the examples provided.

ajsnigrutin 4 months ago

Just tried it, and "meh"...

It's one step above "normal" text-to-speech solutions, but not much above it. The epub has "Chapter 1" as the title on the page, then a lot of whitespace, and then "This was...." (the actual text). The software somehow managed to ignore all the whitespace and read "chapter 1 this was.." as a single sentence: no pauses, no nothing.

For the blind? A great tool. Will it replace actual audiobooks? Well... not yet, at least.

cliftonpowell 4 months ago

There's another project called ebook2audiobook that produces some decent results.

callamdelaney 4 months ago

It's insufferable.

leecarraher 4 months ago

In case you are wondering how audiblez becomes an executable in the PATH after a pip install audiblez, as the documentation suggests with

audiblez book.epub -l en-gb -v af_sky

it does not. Instead, it installs a Python package with a CLI interface. To run it, you have to prepend python and load the module, like this:

python3 -m audiblez book.epub -l en-gb -v af_sky
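
The same module-style invocation can be wrapped in Python. This is a minimal sketch, assuming audiblez is installed for the interpreter that runs the script; `build_audiblez_command` and `run_audiblez` are hypothetical helper names, and the `-l` and `-v` values are simply the ones quoted above.

```python
import subprocess
import sys


def build_audiblez_command(epub_path, lang="en-gb", voice="af_sky"):
    """Build the `python -m audiblez ...` command line (hypothetical helper)."""
    # sys.executable is the interpreter running this script, so the module is
    # looked up in that interpreter's environment rather than via the PATH.
    return [sys.executable, "-m", "audiblez", epub_path, "-l", lang, "-v", voice]


def run_audiblez(epub_path, lang="en-gb", voice="af_sky"):
    """Run the conversion; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(build_audiblez_command(epub_path, lang, voice), check=True)


if __name__ == "__main__":
    # Only print the command here; call run_audiblez("book.epub") to execute it.
    print(" ".join(build_audiblez_command("book.epub")))
```

Using `sys.executable` instead of a bare `python3` avoids depending on whichever interpreter happens to be first on the PATH, which is the same class of problem the comment describes.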

DidYaWipe 4 months ago

Yes, because real narrator/actors are rolling in the dough. Let's kill one more profession with trash.

treetalker 4 months ago

For anyone looking for an easier alternative (and one without the bugs the author describes, such as skipping some prefaces or failing to detect some chapters), Voice Dream Reader on iOS (and macOS) handles .epub and other e-books just fine and supports a variety of built-in and external voices.
