Show HN: Affordable text-to-speech for long-form content

55 points by yagudaev 12 months ago

Hi HN, I’m Michael, creator of AudiowaveAI. I started this project out of frustration when I couldn't find an audiobook version of Make by Pieter Levels. The available text-to-speech options were either too robotic, overly complex, or simply too costly.

It works really well for non-fiction long-form content (i.e. hours of audio).

It’s early days for AudiowaveAI, and I’m looking for feedback to improve the product. Try it out and share your thoughts: AudiowaveAI (https://audiowaveai.com). Thanks!

17 comments

DreaminDani 12 months ago

This is really cool! One quick note about your marketing copy, though:

> Audio for humans, not robots

There are plenty of blind folks who use traditional text to speech for navigating our devices. We prefer the robot text at ridiculously high speeds. We're humans too.

I would love the option to switch to a more natural voice for more literary text (or even a fan fic) so I'll definitely be checking this out.


lupusreal 12 months ago

I've been using Piper for this. The quality is (in my subjective opinion) as good as the TTS built into macOS, it's open source, and it's so fast that you can run it in real time on a Raspberry Pi. On a real computer I can generate a whole audiobook in about 20 minutes.

What I do is split the book up into sentences, generate speech for each sentence and at the same time turn that sentence into subtitles. Then I combine the two and stitch them all together into an mp4 container with audio and a subtitle track using ffmpeg. mpv (and I think VLC) can display subtitles synced to audio playback even when there is no video track.
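For readers who want to try the subtitle half of the workflow lupusreal describes, here is a minimal sketch in Python. It is not lupusreal's actual script: the function names (`split_sentences`, `srt_timestamp`, `build_srt`) are hypothetical, the sentence splitter is deliberately naive, and the per-sentence durations are assumed to have been measured from the audio clips the TTS engine produced.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_srt(sentences: list[str], durations: list[float]) -> str:
    """Lay the sentences end to end, one subtitle cue per sentence.

    durations[i] is the assumed length in seconds of the audio clip
    generated for sentences[i].
    """
    if len(sentences) != len(durations):
        raise ValueError("need exactly one duration per sentence")
    cues = []
    start = 0.0
    for index, (sentence, duration) in enumerate(zip(sentences, durations), start=1):
        end = start + duration
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{sentence}\n"
        )
        start = end
    return "\n".join(cues)


if __name__ == "__main__":
    text = "Call me Ishmael. Some years ago I went to sea."
    sentences = split_sentences(text)
    # Placeholder durations; in practice these would come from the generated clips.
    print(build_srt(sentences, [1.8, 3.2]))
```

The resulting `.srt` text could then be muxed with the concatenated audio; ffmpeg's `mov_text` subtitle codec is the usual choice for an mp4 container.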


smeej 12 months ago

Is it possible to switch back and forth between the written text and the audio, like Amazon's Whispersync? I prefer reading with my eyes when I can (especially on my ereader, so with pagination instead of scrolling), but I would love to be able to flip narration on when I need to set the book down to do something like wash my dishes, then pick the book back up when I'm done.

I've been looking for something that would let me synchronize Librivox recordings with Project Gutenberg epub files, but as much as I love the Librivox volunteers for their contributions, a lot of the recordings are such low audio quality that they're not fun to listen to. This would be a big step up, and there are no copyright worries for this use case because the works are in the public domain!


hereme888 12 months ago

Good for you!

So similar to my app. But I'm not a real programmer, so of course yours is more refined.

I almost launched the same exact online business.

Here's my version (my github version is a bit less refined than my local code): https://github.com/sm18lr88/OpenAI_TTS_GUI


icev 12 months ago

Interesting, I read epubs on Android using aiTTS as the TTS engine with Google Cloud voices.

What I would really like is an option to download the whole book as mp3 for offline playback, and different voices for each character.


anonu 12 months ago

Curious what the technical implementation looks like. What kind of TTS are you using? How do you scale it? What are the costs involved?


Aelius 12 months ago

I have a use case for a niche audience:

The videogame Final Fantasy XIV has a lot of text. A LOT of text.

Someone has made a plugin to pipe text to external tts services, or a websocket. You talk to characters in game and hear the dialog read by the tts.

https://github.com/karashiiro/TextToTalk

For whatever reason, Amazon Polly only exposes middling quality voices to the plugin. And I'd rather not have an active AWS account for just this use case.

ElevenLabs is supported by the plugin, but their service isn't really about tts and I'd have to pay the $220/yr tier to unlock further "pay as you go (per character)" with a budget of 100,000 characters per month. A bit steep for using it only in this one game.

If someone could help plumb AudiowaveAI to this plugin, I'd gladly turn off AWS for this!


andrewinardeer 12 months ago

I've been looking for something like this. Thank you.

A couple of questions:

How do I delete projects?

I must have tapped three times after submitting a Wikipedia article and it created three projects that apparently cannot be deleted.

How do I delete my account?

And for $15 I get credits. How many credits do I get for $15? Is each credit a word translated? 1 credit == 1 word translated to audio?


radicalriddler 12 months ago

So I work on a similar project in my spare time, but have just settled for Azure's Text to Speech service.

What I'm actually interested in is your pricing model. Why do you have constraints on characters AND articles, versus just characters? Does doing the conversion cost a static amount that you don't want someone making 10000 requests a month? Or is the article count and hours of audio just an estimation of the 600,000 character limit?

If it's just an estimate of real usage of the actual 600,000 character limit, then I'd try and word it differently, otherwise I feel like I'm going to be heavily constrained by the platform.
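For a rough sense of how a character limit like the 600,000 mentioned above could map to hours of audio, here is a back-of-the-envelope sketch. The function name is hypothetical, and the figures of 6 characters per word (including the trailing space) and 150 spoken words per minute are assumptions for illustration only, not AudiowaveAI's numbers.

```python
def estimated_hours(
    characters: int,
    chars_per_word: float = 6.0,      # assumed average, including the space
    words_per_minute: float = 150.0,  # assumed narration speed
) -> float:
    """Estimate hours of narrated audio for a given character budget."""
    words = characters / chars_per_word
    minutes = words / words_per_minute
    return minutes / 60


if __name__ == "__main__":
    print(f"{estimated_hours(600_000):.1f} hours")
```

Under these assumptions the limit works out to roughly 11 hours; a faster or slower narration speed shifts that figure proportionally.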


101008 12 months ago

Hey. I have published a non-fiction book, and I would like to publish the audiobook on Amazon (Audible, etc). Do you know if the output is accepted by them? What format should I provide my book in for AudiowaveAI to produce good audio? Does it understand chapter titles, quotations, etc?


maddynator 12 months ago

I looked into this problem a while back and haven’t looked at it since.

The base ai model sounded like whisper ai from meta. Did you train the voice yourself or is it one of the defaults?

I am always curious as to what copyright issues products like this run into. Also, what's the stack like to build something like this?


syngrog66 12 months ago

I reach first for Mac's built-in "say" program. It's not perfect, but good enough for most use cases. Free, simple, CLI driven. Potentially more private than a cloud service, and works when you have 0 connectivity.


fortydegrees 12 months ago

Is this a custom trained TTS model or is it an implementation of something like StyleTTSv2?


fortydegrees 12 months ago

This is great. Simple and slick makes it easy to use. I'm impressed by the URL importing feature. Is that a GPT wrapper behind the scenes or do you use another library?

afrederico 12 months ago

Love this tool; please include the pricing on the front page before one has to sign up. Thanks!


pmg101 12 months ago

You put "medicore content" in place of (I assume) "mediocre content".


NayamAmarshe 12 months ago

This looks like a really nice product.

