On the one hand, this is very convenient. Probably cool for some non-fiction.<p>On the other, some of my favorite audiobooks stood out because the narrator was interpreting the text really well, for example by changing the pacing during chaotic moments. Or those audiobooks with multiple narrators and different voices for each character. Not to mention that sometimes the only cue you get for who's speaking during dialogue is how the voice actor changes their tone. I have mixed feelings about using this and losing some of that quality.<p>I would totally use this over amateur ebooks or public domain audiobooks like the ones on Project Gutenberg. As cool as it is/was for someone to contribute to free books... as a listener it was always jarring to switch to a new chapter and hear a completely different voice and microphone quality for no reason.
The quality is great (amazing even), but I can't listen to AI generated voices for more than 1 minute.
I don't know why, I just don't like it.
I immediately skip the video on YouTube if the voice is AI generated.<p>Might be because our brains try to 'feel' the speaker, the emotion, the pauses, the invisible smile, etc.<p>No doubt models will improve and will be harder to identify as AI generated, but for now, as with diffusion images, I still notice it and react by just moving on.
Can anyone recommend an open source option that would allow training on a custom voice (my own, so I'd be able to record as many snippets as it needed to train on) to allow me to use it for TTS generation without sharing it off my machine?<p>Edit: I'll wait to see if any recommendations get made here, if not I might give this one a go: <a href="https://github.com/coqui-ai/TTS">https://github.com/coqui-ai/TTS</a>
I would love to have an e-reader that allows me to switch between text and audio at the press of a button. Imagine reading your book on the couch and then switching into audio mode while doing the dishes seamlessly, by connecting bluetooth headphones.
This looks incredible! I’ve had an idea simmering in the back of my mind for a while now: creating an audiobook from an ebook for my commute using the voice of a specific audiobook narrator I really enjoy. The concept struck me after coming across the Infinite Conversation project here on HN. Unfortunately, I just haven’t found the time to bring it to life yet. :(
The word “kokoro” means “heart” in Japanese, which I learned making the (heart shaped and paperback) puzzle books at <a href="https://www.kakurokokoro.com/" rel="nofollow">https://www.kakurokokoro.com/</a>
I hope a plugin for Calibre ebook management software comes along that makes it easier to convert select titles from your epub library to decent audio versions -- and a decent open source app for tablets and smartphones that can let us seamlessly consume both the ebook and audiobook at will.
2025 may be the year where we can generate a dramatic audiobook with ambient music, sound effects, and theatrical narration using neural networks. Many of the parts already exist.
I would for sure not want this for fiction, it's too obvious that the voice has no understanding whatsoever of the text, but it's probably pretty nice for converting short news texts or notifications to audio.
Finally! I've been trying all the TTS models popping up on here for ages, and they've all been pretty average, or don't work on Mac, or only work on really short text, or are reeeally slow.<p>But this one works pretty quickly, is easy to install, and has some passable voices. Finally I can start listening to those books that have no audio version.<p>I'm a slow reader, so I don't read many books. If a book doesn't have an audiobook version, chances are I won't read it.<p>PS, I have used ElevenLabs in the past for some small TTS projects, but for a full book, it's price-prohibitive for personal use. (ElevenLabs has some amazing voices.)<p>Thank you to the dev/s who worked on this!
I have a very similar setup locally, which uses Chrome with the 'Read Aloud' plugin. I capture the audio stream via QJackCtl/VLC. Voices, speed, and pitch can be adjusted. Efficient and quick to set up.
If you look for a lot of the great classics, audiobook results are inundated with basic TTS "audiobooks" that are impossible to filter out.
These are impossible to listen to because they lack the proper intonation marking the end of sentences, making it very tiring to parse.
It might be better than tuna-can-sounding recordings, especially if you want to hear them in traffic (a common requirement), but that's about it.
The alternative, if you want real quality recordings, is to stop reading classics and instead read the latest Japanime isekai or murder mystery; these have very good options on the market.
Anyway, I don't think it needs more justification: it covers a good niche use case.<p>I'm checking what the actual quality is (not a cherry-picked example), but:<p>Started at: 13:20:04
Total characters: 264,081
Total words: 41548
Reading chapter 1 (197,687 characters)...<p>That was 1h30 ago and there's no progress notification of any kind, so I'm hoping it will finish sometime. It's using 100% of all available CPUs, so it's quite a bother.
(This is "A Tale of a Tub" by Swift; it's about half the length of a typical novel.)
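For anyone wanting to put numbers on the log above, here is a rough sketch of the arithmetic. The function names are made up for illustration, and the ETA part assumes you can somehow observe how many characters have been processed so far, which the run described above does not report.

```python
def chapter_share(chapter_chars: int, total_chars: int) -> float:
    """Fraction of the whole book that a single chapter represents."""
    return chapter_chars / total_chars


def eta_seconds(chars_done: int, elapsed_seconds: float, chars_total: int) -> float:
    """Linear estimate of remaining time, assuming constant throughput."""
    if chars_done <= 0 or elapsed_seconds <= 0:
        raise ValueError("need some measured progress to estimate a rate")
    rate = chars_done / elapsed_seconds  # characters per second so far
    return (chars_total - chars_done) / rate


# Figures taken from the log in the comment above.
total_chars = 264_081
chapter_1_chars = 197_687
print(f"Chapter 1 is {chapter_share(chapter_1_chars, total_chars):.0%} of the book")
```

Since "chapter 1" is roughly three quarters of the text here, a per-chapter progress message would not have helped much anyway; a per-sentence or per-paragraph counter would be needed to feed something like `eta_seconds`.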
For accessibility I think this is a great thing, but as entertainment less so.<p>An example is The Hobbit and The Lord of the Rings: the narrator, Rob Inglis, gives an amazing voice performance, giving depth to environments and characters. And of course the songs!
I really liked it and added a variable speed argument: <a href="https://github.com/santinic/audiblez/pull/4">https://github.com/santinic/audiblez/pull/4</a>
Was just looking for a TTS model to run locally for reading articles out loud, and had never heard of Kokoro before! This looks great. I wonder if it can run in the browser somehow - could be a nice WebExtension.
Well, there was some hope with ChatGPT that people would go back to being able to process text communication.<p>Guess it was just a matter of time till someone figured out how to use "AI" to resume encouraging illiteracy.
The page says it was trained on under 100 hours of audio. Then, the link says “we employ large pre-trained SLMs, such as WavLM, as discriminators with our novel differentiable duration modeling for end-to-end training.” I don’t have time to read the paper to see what that means.<p>Depending on what that means, it might be more accurate to say it was trained on 100 hours of audio and with the aid of another, pre-trained model. The reader who thinks “only 100 hours?!” will know to look at the pretraining requirements of the other model, too.
Soon, AI will flood the market with mediocre everything: books, audio books, art, movies, websites.<p>The saddest thing is that people will still continue to participate in consuming these AI produced “goods”.
It sounds okay, but it lacks emotion and is monotone for fiction. It's the voice equivalent of the uncanny valley, which is probably fine if you don't really care.
There is also this TTS: <a href="https://github.com/rhasspy/piper">https://github.com/rhasspy/piper</a> that is pretty good (depending on the language) and extremely fast. It would be cool to change the script to use Piper instead of Kokoro in case you want a language that Kokoro doesn't support, or Kokoro is too slow; Piper supports a lot of them.
This is wonderful, and so happy to see the post where the author ran it locally on their Macbook.<p>I am curious, is there an equivalent light model for speech to text, that can run real-time on the MacBook? I'm just playing around with AI models and was looking into this (a fully locally running app that lets you talk to your computer).
Not directly related to the software, but interestingly, on the author's website there is a "Schedule a free call with me" link (<a href="https://claudio.uk/templates/call.html" rel="nofollow">https://claudio.uk/templates/call.html</a>). I wonder if randos on the internet ever do that, and how it works out.
What I really want, and hope that someone builds, is an audiobook service that converts books to audiobooks so that each character has its own voice.<p>Some audiobooks have this and I think it really makes the experience much more engaging.<p>(Also maybe some background sound effects, but I'm not sure about that. Some books also have this and it's quite nice too.)
Very nice! I fiddled with this idea a few months back but the models available at the time were woefully slow on a macbook. Will definitely give this a spin, there's a large category of web serials and less popular translated novels that never get audiobook releases.
I want to be able to seamlessly read on my ebook reader and then put in my headphones, go for a walk with the dog, and resume on audio where I left off. Then when I come back, my e-reader is at the right place where the audio finished and I can resume reading.
Has anyone gotten this working on Windows? No matter where I put the files, PowerShell insists that kokoro-v0_19.onnx and voices.json aren't in the current folder.
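One way to narrow this down is to check what the process itself sees as its current folder, since that can differ from where the files were copied. A minimal sketch, assuming (based only on the error message quoted above) that the two files are looked up relative to the working directory; `missing_model_files` is a hypothetical helper, not part of the tool.

```python
from pathlib import Path

# File names as quoted in the comment above.
REQUIRED_FILES = ("kokoro-v0_19.onnx", "voices.json")


def missing_model_files(folder: Path) -> list[str]:
    """Return the required file names that are not present in `folder`."""
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


cwd = Path.cwd()
missing = missing_model_files(cwd)
print(f"Current folder: {cwd}")
print("Missing files:", ", ".join(missing) if missing else "none")
```

Running this from the same PowerShell prompt, right before the failing command, shows whether the shell's working directory is really the folder that holds the files.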
As a Mandarin learner, I find that the Chinese one lacks cadence, which makes it very hard as a learner to comprehend. It's like a machine gun of words without the subtle slight pause between sets of words that I would normally lean on.
I’m sure they sound more natural, but honestly, the text to speech built into my Kindle more than 10 years ago was good enough. Of course, Amazon killed that off because it would cannibalise sales to Audible.
I'm not able to try it until later, but regarding the sample audio: The voice quality is quite good, but what's going on with all the random pauses between words? It's very Captain Kirk.
Sounds really nice at 3x-4x speed, which I couldn't say for the high quality TTS options last year. I'm wondering if there are metrics out there for audio speed vs clarity.
I really like this a lot. The default provides a really good audiobook feel, especially the Isabella voice. Any chance you could add in an API hook for optional ElevenLabs use?
I have been looking for something credible that can voice over written emails (long form ones), documents and powerpoints locally ...this might be just the thing!
Kokoro seemed pretty nice for the size. I guess it is not much better than a lot of the simpler TTS. But at least it sounds less mechanical than a few bad ones.
This one sounds a bit robotic and takes ~4 hours per book on my M1 laptop, so I'll keep looking. For now, I'm happy with my current method: the EPUBReader browser extension, which opens an .epub as an HTML page in the Microsoft Edge browser, which has a "Read Aloud" button set to the Stephan natural voice at 1.6 speed. Best sounding voice I've ever heard: it speaks fast, clear, and crisp, with natural inflections to the sentences, and if I want to jump somewhere I just left click the text at that spot. And it's instant - no conversions. Downside is I have to stay in Bluetooth range of my laptop, so I'm still looking for a good phone based method. Google Play Books works okay, but gets buggy at 1.6 speed.
Just tried it, and "meh"...<p>It's one step above "normal" text-to-speech solutions, but not much above it. The epub has "Chapter 1" as the title on the page, then a lot of whitespace, and then "This was...." (actual text). The software somehow managed to ignore all the whitespace and read "chapter 1 this was.." as a single sentence, no pauses, no nothing.<p>Blind? A great tool. Will it replace actual audiobooks? Well.. not yet at least.
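A possible workaround for the fused title, sketched here as text preprocessing before the chapter is handed to the TTS step. This is not how the tool works; `join_title_and_body` is a made-up helper, and whether a trailing period actually makes the model pause is an untested assumption.

```python
def join_title_and_body(title: str, body: str) -> str:
    """Join a chapter title and its text so the title reads as its own sentence.

    Assumption: the TTS model treats sentence-ending punctuation as a pause cue.
    """
    title = title.strip()
    body = body.strip()
    if not title:
        return body
    # Add a full stop unless the title already ends a sentence or clause.
    if title[-1] not in ".!?:":
        title += "."
    return f"{title}\n\n{body}"


print(join_title_and_body("Chapter 1", "   This was the start of it all."))
```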
In case you are wondering how audiblez becomes an executable in the PATH from a pip install audiblez, per the documentation<p>... audiblez book.epub -l en-gb -v af_sky<p>it does not. Instead, it installs a Python package with a CLI interface;
to run it, you have to prepend python and load the module like this:<p>python3 -m audiblez book.epub -l en-gb -v af_sky
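If you script this, you can cover both cases: use the `audiblez` executable when it is on the PATH and fall back to the module form otherwise. A small sketch using only the standard library; `audiblez_command` is a hypothetical helper, and the `python -m audiblez` form is taken from the comment above rather than verified against the package.

```python
import shutil
import sys


def audiblez_command(args: list[str]) -> list[str]:
    """Build the command line, preferring an executable found on the PATH."""
    exe = shutil.which("audiblez")
    if exe is not None:
        return [exe, *args]
    # Fall back to running the installed package as a module with this interpreter.
    return [sys.executable, "-m", "audiblez", *args]


# Arguments copied from the example in the comment above.
cmd = audiblez_command(["book.epub", "-l", "en-gb", "-v", "af_sky"])
print(cmd)  # pass to subprocess.run(cmd, check=True) to actually start the conversion
```

Using `sys.executable` rather than a bare `python3` keeps the fallback tied to the interpreter (and virtualenv) where the package was installed.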
For anyone looking for an easier alternative (and one without the bugs the author describes, such as skipping some prefaces or failing to detect some chapters), Voice Dream Reader on iOS (and macOS) handles .epub and other e-books just fine and supports a variety of built-in and external voices.