Generate audiobooks from E-books with Kokoro-82M

420 points by csantini 4 months ago

53 comments

laserbeam 4 months ago
On the one hand, this is very convenient. Probably cool for some non-fiction.
On the other, some of my favorite audio books all stood out because the narrator was interpreting the text really well, for example by changing the pacing during chaotic moments. Or those audiobooks with multiple narrators and different voices for each character. Not to mention that sometimes the only cue you get for who's speaking during dialogue is how the voice actor changes their tone. I have mixed feelings about using this and losing some of that quality.
I would totally use this over amateur ebooks or public domain audiobooks like the ones on Project Gutenberg. As cool as it is/was for someone to contribute to free books... as a listener it was always jarring to switch to a new chapter and hear a completely different voice and microphone quality for no reason.
delegate 4 months ago
The quality is great (amazing even), but I can't listen to AI-generated voices for more than 1 minute. I don't know why, I just don't like it. I immediately skip the video on YouTube if the voice is AI generated.
Might be because our brains try to 'feel' the speaker, the emotion, the pauses, the invisible smile, etc.
No doubt models will improve and will be harder to identify as AI generated, but for now, as with diffusion images, I still notice it and react by just moving on.
swores 4 months ago
Can anyone recommend an open source option that would allow training on a custom voice (my own, so I'd be able to record as many snippets as it needed to train on) to allow me to use it for TTS generation without sharing it off my machine?
Edit: I'll wait to see if any recommendations get made here, if not I might give this one a go: https://github.com/coqui-ai/TTS
pprotas 4 months ago
I would love to have an e-reader that allows me to switch between text and audio at the press of a button. Imagine reading your book on the couch and then seamlessly switching into audio mode while doing the dishes, just by connecting Bluetooth headphones.
qurashee 4 months ago
This looks incredible! I’ve had an idea simmering in the back of my mind for a while now: creating an audiobook from an ebook for my commute using the voice of a specific audiobook narrator I really enjoy. The concept struck me after coming across the Infinite Conversation project here on HN. Unfortunately, I just haven’t found the time to bring it to life yet. :(
cwmoore 4 months ago
The word “kokoro” means “heart” in Japanese, which I learned making the (heart-shaped and paperback) puzzle books at https://www.kakurokokoro.com/
albert_e 4 months ago
I hope a plugin for Calibre ebook management software comes along that makes it easier to convert select titles from your epub library to decent audio versions -- and a decent open source app for tablets and smartphones that can let us seamlessly consume both the ebook and audiobook at will.
Dowwie 4 months ago
2025 may be the year where we can generate a dramatic audiobook with ambient music, sound effects, and theatrical narration using neural networks. Many of the parts already exist.
cess11 4 months ago
I would for sure not want this for fiction, it's too obvious that the voice has no understanding whatsoever of the text, but it's probably pretty nice for converting short news texts or notifications to audio.
sysworld 4 months ago
Finally! Been trying all the TTS models popping up on here for ages, and they've all been pretty average, or don't work on Mac, or only work on really short text, or are reeeally slow.
But this one works pretty quick, is easy to install, and has some passable voices. Finally I can start listening to those books that have no audio version.
I'm a slow reader, so I don't read many books. If a book doesn't have an audiobook version, chances are I won't read it.
PS, I have used ElevenLabs in the past for some small TTS projects, but for a full book, it's price-prohibitive for personal use. (ElevenLabs has some amazing voices)
Thank you to the dev/s who worked on this!
TypoAtLineZero 4 months ago
I have a very similar setup locally, which uses Chrome with the 'Read Aloud' plugin. I capture the audio stream via QJackCtl/VLC. Voices, speed, and pitch can be adjusted. Efficient and quick to set up.
lc64 4 months ago
"was trained on <100 hours of audio"
How the hell was it trained on that little data?
woolion 4 months ago
If you look for a lot of the great classics, audiobook results are inundated with basic TTS "audiobooks" that are impossible to filter out. These are impossible to listen to because they lack the proper intonation marking the end of sentences, making them very tiring to parse. It might be better than tuna-can-sounding recordings, especially if you want to hear them in traffic (a common requirement), but that's about it. The alternative, if you want real quality recordings, is to stop reading classics and instead read the latest Japanime isekai or murder mystery; these have very good options on the market. Anyway, I don't think it needs more justification that it covers a good niche usage.
I'm checking what the actual quality is (not a cherry-picked example), but:
Started at: 13:20:04
Total characters: 264,081
Total words: 41548
Reading chapter 1 (197,687 characters)...
That was 1h30 ago, and there's no progress notification of any kind, so I'm hoping it will finish sometime. It's using 100% of all available CPUs, so it's quite a bother. (This is "A Tale of a Tub" by Swift; it's about half of a typical novel length.)
msoad 4 months ago
To people who are experts in AI TTS:
Why does ElevenLabs have such a lead in this space? It sounds better than OpenAI and Google models.
katspaugh 4 months ago
Sounds better than many books on Audible.
TheChaplain 4 months ago
For accessibility I think this is a great thing, but as entertainment less so.
An example is The Hobbit and The Lord of the Rings: the narrator, Rob Inglis, gives an amazing voice performance, giving depth to environments and characters. And of course the songs!
flypunk 4 months ago
I really liked it and added a variable speed argument: https://github.com/santinic/audiblez/pull/4
yoavm 4 months ago
Was just looking for a TTS model to run locally for reading articles out loud, and had never heard about Kokoro before! This looks great. I wonder if it can run in the browser somehow - could be a nice WebExtension.
nottorp 4 months ago
Well, there was some hope with ChatGPT that people would go back to being able to process text communication.
Guess it was just a matter of time till someone figured out how to use "AI" to resume encouraging illiteracy.
nickpsecurity 4 months ago
The page says it was trained on under 100 hours of audio. Then, the link says “we employ large pre-trained SLMs, such as WavLM, as discriminators with our novel differentiable duration modeling for end-to-end training.” I don’t have time to read the paper to see what that means.
Depending on what that means, it might be more accurate to say it was trained on 100 hours of audio and with the aid of another, pre-trained model. The reader who thinks “only 100 hours?!” will know to look at the pretraining requirements of the other model, too.
skwee357 4 months ago
Soon, AI will flood the market with mediocre everything: books, audiobooks, art, movies, websites.
The saddest thing is that people will still continue to participate in consuming these AI-produced “goods”.
floppiplopp 4 months ago
It sounds okay, but it lacks emotion and is monotone for fiction; it's the voice equivalent of the uncanny valley, which is probably fine if you don't really care.
GaggiX 4 months ago
There is also this TTS: https://github.com/rhasspy/piper that is pretty good (depending on the language) and extremely fast. It would be cool to change the script to use Piper instead of Kokoro in case you want a language that Kokoro doesn't support, or if Kokoro is too slow; Piper supports a lot of languages.
grwthckrmstr 4 months ago
This is wonderful, and I'm so happy to see the post where the author ran it locally on their MacBook.
I am curious, is there an equivalent light model for speech-to-text that can run in real time on a MacBook? I'm just playing around with AI models and was looking into this (a fully local app that lets you talk to your computer).
zoidb 4 months ago
Not directly related to the software, but interestingly, on the author's website there is a "Schedule a free call with me" page (https://claudio.uk/templates/call.html). I wonder if randos on the internet ever do that, and how it works out.
mikkom 4 months ago
What I really want, and hope someone builds, is an audiobook service that converts books to audiobooks so that each character has their own voice.
Some audiobooks have this and I think it really makes the experience much more engaging.
(Also maybe some background sound effects, but I'm not sure about that; some books also have this and it's quite nice too.)
herculity275 4 months ago
Very nice! I fiddled with this idea a few months back but the models available at the time were woefully slow on a MacBook. Will definitely give this a spin; there's a large category of web serials and less popular translated novels that never get audiobook releases.
basedrum 4 months ago
I want to be able to seamlessly read on my ebook reader, then put in my headphones, go for a walk with the dog, and resume on audio where I left off. Then when I come back, my e-reader is at the right place where the audio finished and I can resume reading.
causality0 4 months ago
Has anyone gotten this working on Windows? No matter where I put the files, PowerShell insists that kokoro-v0_19.onnx and voices.json aren't in the current folder.
plumbees 4 months ago
As a Mandarin learner, I find that the Chinese one lacks cadence, which makes it very hard for a learner to comprehend. It's like a machine gun of words without the subtle slight pause between sets of words that I would normally lean on.
mg 4 months ago
Would this also be the best option if you just want to convert plain text files to audio?
october8140 4 months ago
All these AI text to voice models seem to ignore emotion. It always sounds like a robot.
physicsguy 4 months ago
I’m sure they sound more natural, but honestly, the text to speech built into my Kindle more than 10 years ago was good enough. Of course, Amazon killed that off because it would cannibalise sales to Audible.
causi 4 months ago
I'm not able to try it until later, but regarding the sample audio: the voice quality is quite good, but what's going on with all the random pauses between words? It's very Captain Kirk.
maxglute 4 months ago
Sounds really nice at 3x-4x speed, which I can't say for high-quality TTS options last year. I'm wondering if there are metrics out there for audio speed vs. clarity.
jaggs 4 months ago
I really like this a lot. The default provides a really good audiobook feel, especially the Isabella voice. Any chance you could add in an API hook for optional ElevenLabs use?
monkeydust 4 months ago
I have been looking for something credible that can voice over written emails (long-form ones), documents, and PowerPoints locally... this might be just the thing!
gunalx 4 months ago
Kokoro seemed pretty nice for the size. I guess it is not much better than a lot of the simpler TTS models, but at least it sounds less mechanical than a few bad ones.
mrklol 4 months ago
How can this support more languages than the model itself?
carlosjobim 4 months ago
Why isn't the audiobook market strong enough that it would make business sense to pay good narrators and actors for each book published?
therealdrag0 4 months ago
Do folks have a preferred toolkit for extracting text from web articles? I’d like to TTS articles friends send me.
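Dedicated article-extraction tools exist, but as a dependency-free starting point the standard library's html.parser can pull paragraph text out of a saved page. This is a deliberately naive sketch, not a recommendation of any particular toolkit: it keeps only the contents of explicitly closed <p> elements and has none of the boilerplate-removal heuristics real extractors use:

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> elements, one string per paragraph."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. are decoded
        self.paragraphs = []
        self._buffer = None  # list of text chunks while inside a <p>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            text = " ".join("".join(self._buffer).split())  # collapse whitespace
            if text:
                self.paragraphs.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


def extract_paragraphs(html):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs
```

The resulting list of paragraphs can be joined with newlines and handed to whatever TTS tool you use.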
vinni2 4 months ago
Can it also translate? I have family who would like audiobooks in German but most are in English only.
Havoc 4 months ago
Wow that sample sounds really good
crorella 4 months ago
Nice! It would be great to have per character voices
geor9e 4 months ago
This one sounds a bit robotic and takes ~4 hours per book on my M1 laptop, so I'll keep looking. For now, I'm happy with my current method: the EPUBReader browser extension, which opens .epub as an HTML page in the Microsoft Edge browser, which has a "Read Aloud" button set to the Stephan natural voice at 1.6 speed. Best-sounding voice I've ever heard: it speaks fast, clear, and crisp, with natural inflections to the sentences, and if I want to jump somewhere I just left-click the text at that spot. And it's instant, no conversions. The downside is I have to stay in Bluetooth range of my laptop, so I'm still looking for a good phone-based method. Google Play Books works okay, but gets buggy at 1.6 speed.
jaggs 4 months ago
This looks really nice. And fast too it seems.
ekianjo 4 months ago
Japanese is not supported yet despite the claims. You can easily confirm that by running the examples provided.
ajsnigrutin 4 months ago
Just tried it, and "meh"...
It's one step above "normal" text-to-speech solutions, but not much above it. The epub has "Chapter 1" as the title on the page, and a lot of whitespace, and then "This was...." (actual text). The software somehow managed to ignore all the whitespace and read "chapter 1 this was.." as a single sentence, no pauses, no nothing.
Blind? A great tool. Will it replace actual audiobooks? Well.. not yet at least.
cliftonpowell 4 months ago
There's another project called ebook2audiobook that has produced some decent results.
callamdelaney 4 months ago
It's insufferable.
leecarraher 4 months ago
In case you are wondering how audiblez becomes an executable in the PATH from a pip install audiblez, per the documentation:
... audiblez book.epub -l en-gb -v af_sky
It does not; instead it installs a Python package with a CLI interface. To run it, you have to prepend python and load the module like this:
python3 -m audiblez book.epub -l en-gb -v af_sky
DidYaWipe 4 months ago
Yes, because real narrators/actors are rolling in the dough. Let's kill one more profession with trash.
treetalker 4 months ago
For anyone looking for an easier alternative (and one without the bugs the author describes, such as skipping some prefaces or failing to detect some chapters), Voice Dream Reader on iOS (and macOS) handles .epub and other e-books just fine and supports a variety of built-in and external voices.