科技回声 (Tech Echo)
A tech news platform built with Next.js, offering global tech news and discussions.

© 2025 科技回声. All rights reserved.

Show HN: Affordable text-to-speech for long-form content

55 points by yagudaev, 12 months ago
Hi HN, I'm Michael, creator of AudiowaveAI. I started this project out of frustration when I couldn't find an audiobook version of *Make* by Pieter Levels. The available text-to-speech options were either too robotic, overly complex, or simply too costly.

It works really well for non-fiction long-form content (i.e. hours of audio).

It's early days for AudiowaveAI, and I'm looking for feedback to improve the product. Try it out and share your thoughts: [AudiowaveAI](https://audiowaveai.com). Thanks!

17 comments

DreaminDani, 12 months ago
This is really cool! One quick note about your marketing copy, though:

> Audio for humans, not robots

There are plenty of blind folks who use traditional text-to-speech for navigating our devices. We prefer the robot text at ridiculously high speeds. We're humans too.

I would love the option to switch to a more natural voice for more literary text (or even a fan fic), so I'll definitely be checking this out.
lupusreal, 12 months ago
I've been using Piper for this. The quality is (in my subjective opinion) as good as the TTS built into macOS, it's open source, and it's so fast that you can run it in real time on a Raspberry Pi. On a real computer I can generate a whole audiobook in about 20 minutes.

What I do is split the book up into sentences, generate speech for each sentence, and at the same time turn that sentence into subtitles. Then I combine the two and stitch them all together into an mp4 container with an audio track and a subtitle track using ffmpeg. mpv (and I think VLC) can display subtitles synced to audio playback even when there is no video track.
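The sentence-by-sentence pipeline described above can be sketched roughly as follows. This is a minimal sketch, not lupusreal's actual code: it assumes each sentence has already been synthesized (for example by Piper) into its own WAV clip, all clips share one sample rate and format, and the helper names (`split_sentences`, `build_cues`, `ffmpeg_commands`, the intermediate `book.wav`) are hypothetical. It derives the SRT subtitle track from the clip durations and returns the two ffmpeg invocations rather than running them.

```python
import re
import wave
from dataclasses import dataclass


@dataclass
class Cue:
    index: int
    start: float  # seconds from the start of the book
    end: float
    text: str


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ".", "!" or "?" followed by whitespace.
    # Abbreviations like "Dr." get split too; a real book needs a better tokenizer.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def wav_duration(path: str) -> float:
    # Duration of one generated sentence clip, read from the WAV header.
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def format_srt_time(seconds: float) -> str:
    # SRT timestamps use the form HH:MM:SS,mmm
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_cues(sentences: list[str], durations: list[float]) -> list[Cue]:
    # Each cue starts exactly where the previous sentence's audio ended.
    cues, t = [], 0.0
    for i, (sentence, dur) in enumerate(zip(sentences, durations), start=1):
        cues.append(Cue(i, t, t + dur, sentence))
        t += dur
    return cues


def to_srt(cues: list[Cue]) -> str:
    # One numbered block per cue, blocks separated by a blank line.
    return "\n".join(
        f"{c.index}\n{format_srt_time(c.start)} --> {format_srt_time(c.end)}\n{c.text}\n"
        for c in cues
    )


def concat_list(clip_paths: list[str]) -> str:
    # Input file for ffmpeg's concat demuxer: one "file '<path>'" line per clip.
    # Paths containing single quotes would need escaping; ignored in this sketch.
    return "".join(f"file '{p}'\n" for p in clip_paths)


def ffmpeg_commands(list_path: str, srt_path: str, out_mp4: str) -> list[list[str]]:
    # 1) join the per-sentence WAV clips without re-encoding;
    # 2) mux the audio (re-encoded to AAC) plus the SRT as an MP4 text track (mov_text).
    joined = "book.wav"
    return [
        ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_path, "-c", "copy", joined],
        ["ffmpeg", "-i", joined, "-i", srt_path, "-map", "0:a", "-map", "1:s",
         "-c:a", "aac", "-c:s", "mov_text", out_mp4],
    ]
```

In practice you would write `to_srt(...)` and `concat_list(...)` to files and pass each command list to `subprocess.run(cmd, check=True)`. Computing subtitle times from the generated clips, rather than aligning text to audio afterwards, is what makes the sync exact.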
smeej, 12 months ago
Is it possible to switch back and forth between the written text and the audio, like Amazon's Whispersync? I prefer reading with my eyes when I can (especially on my ereader, so with pagination instead of scrolling), but I would love to be able to flip narration on when I need to set the book down to do something like wash my dishes, then pick the book back up when I'm done.

I've been looking for something that would let me synchronize Librivox recordings with Project Gutenberg epub files, but as much as I love the Librivox volunteers for their contributions, a lot of the recordings are such low audio quality that they're not fun to listen to. This would be a big step up, and there are no copyright worries for this use case because the works are in the public domain!
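The back-and-forth switching described here comes down to keeping a map between positions in the text and timestamps in the audio. A minimal sketch, assuming the audio was generated sentence by sentence so every sentence's duration is known (with a human Librivox recording those timestamps would first have to be recovered by forced alignment). `SyncMap` is a hypothetical name, not part of Whispersync or any ebook API.

```python
from bisect import bisect_right
from itertools import accumulate


class SyncMap:
    """Maps between sentence indices in the text and playback time in the audio."""

    def __init__(self, durations: list[float]):
        if not durations:
            raise ValueError("need at least one sentence")
        # Start time of sentence i = sum of the durations of sentences 0..i-1.
        self.starts = list(accumulate(durations, initial=0.0))[:-1]
        self.total = sum(durations)

    def time_of(self, sentence_index: int) -> float:
        # Where to start narration when the reader sets the book down.
        return self.starts[sentence_index]

    def sentence_at(self, t: float) -> int:
        # Which sentence to resume reading at when narration stops at time t.
        t = min(max(t, 0.0), self.total)
        return max(bisect_right(self.starts, t) - 1, 0)
```

A reader app would store only the current sentence index; switching to audio seeks to `time_of(index)`, and switching back turns to the page containing `sentence_at(position)`. The sorted start times make each lookup a binary search.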
hereme888, 12 months ago
Good for you!

So similar to my app. But I'm not a real programmer, so of course yours is more refined.

I almost launched the same exact online business.

Here's my version (my GitHub version is a bit less refined than my local code):

https://github.com/sm18lr88/OpenAI_TTS_GUI
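A tool that sends long texts to a hosted TTS API has to split them first, because OpenAI's speech endpoint caps the input length of each request (assumed here to be 4,096 characters; check the current API reference). A minimal sketch of sentence-boundary chunking, not taken from the OpenAI_TTS_GUI repository; `chunk_text` and its `max_chars` parameter are hypothetical.

```python
import re


def chunk_text(text: str, max_chars: int = 4096) -> list[str]:
    """Pack whole sentences into chunks no longer than max_chars (assumed API limit)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split as a last resort.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the current chunk
        else:
            chunks.append(current)  # close the full chunk, start a new one
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is then synthesized with a separate request and the resulting audio files are concatenated in order. Breaking at sentence boundaries avoids cutting a word or clause in half, which would otherwise be audible as an odd pause or intonation change at each join.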
icev, 12 months ago
Interesting, I read epubs on Android using aiTTS as the TTS engine with Google Cloud voices.

What I would really like is an option to download the whole book as MP3 for offline playback, and different voices for each character.
anonu, 12 months ago
Curious what the technical implementation looks like. What kind of TTS are you using? How do you scale it? What are the costs involved?
Aelius, 12 months ago
I have a use case for a niche audience:

The video game Final Fantasy XIV has a lot of text. A LOT of text.

Someone has made a plugin to pipe text to external TTS services, or a websocket. You talk to characters in game and hear the dialog read by the TTS.

https://github.com/karashiiro/TextToTalk

For whatever reason, Amazon Polly only exposes middling-quality voices to the plugin. And I'd rather not have an active AWS account for just this use case.

ElevenLabs is supported by the plugin, but their service isn't really about TTS, and I'd have to pay the $220/yr tier to unlock further "pay as you go (per character)" with a budget of 100,000 characters per month. A bit steep for using it only in this one game.

If someone could help plumb AudiowaveAI into this plugin, I'd gladly turn off AWS for this!
andrewinardeer, 12 months ago
I've been looking for something like this. Thank you.

A couple of questions:

How do I delete projects?

I must have tapped three times after submitting a Wikipedia article, and it created three projects that apparently cannot be deleted.

How do I delete my account?

And for $15 I get credits. How many credits do I get for $15? Is each credit one word translated? 1 credit == 1 word translated to audio?
radicalriddler, 12 months ago
So I work on a similar project in my spare time, but have just settled for Azure's Text to Speech service.

What I'm actually interested in is your pricing model. Why do you have constraints on characters AND articles, versus just characters? Does doing the conversion cost a static amount, so you don't want someone making 10000 requests a month? Or is the article count and hours of audio just an estimation of the 600,000 character limit?

If it's just an estimate of real usage of the actual 600,000 character limit, then I'd try and word it differently, otherwise I feel like I'm going to be heavily constrained by the platform.
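If the article and hours-of-audio figures are only derived from the 600,000-character cap, they reduce to simple arithmetic. A rough sketch; the 6 characters per word (including spaces and punctuation) and 150 words per minute of narration are assumed averages, not AudiowaveAI's published numbers, and `estimate_audio_hours` is a hypothetical helper.

```python
def estimate_audio_hours(
    chars: int,
    chars_per_word: float = 6.0,      # assumed average, incl. spaces/punctuation
    words_per_minute: float = 150.0,  # assumed narration speed
) -> float:
    """Rough hours of narration produced by a given character budget."""
    words = chars / chars_per_word
    minutes = words / words_per_minute
    return minutes / 60.0
```

Under these assumptions, `estimate_audio_hours(600_000)` is about 11.1 hours (100,000 words at 150 words per minute). Any "hours" or "articles" figure shown next to a character limit is therefore only as good as the per-word and per-article averages behind it.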
101008, 12 months ago
Hey. I have published a non-fiction book, and I would like to publish the audiobook on Amazon (Audible, etc.). Do you know if the output is accepted by them? In what format should I provide my book to AudiowaveAI to get good audio? Does it understand chapter titles, quotations, etc.?
maddynator, 12 months ago
I looked into this problem a while back and haven't looked at it since.

The base AI model sounded like Whisper AI from Meta. Did you train the voice yourself, or is it one of the defaults?

I am always curious as to what copyright issues products like this run into. Also, what's the stack like to build something like this?
syngrog66, 12 months ago
I reach first for Mac's built-in "say" program. It's not perfect, but good enough for most use cases. Free, simple, CLI-driven. Potentially more private than a cloud service, and it works when you have 0 connectivity.
fortydegrees, 12 months ago
Is this a custom trained TTS model or is it an implementation of something like StyleTTSv2?
fortydegrees, 12 months ago
This is great. Simple and slick makes it easy to use. I'm impressed by the URL importing feature. Is that a GPT wrapper behind the scenes or do you use another library?
afrederico, 12 months ago
Love this tool; please include the pricing on the front page before one has to sign up. Thanks!
pmg101, 12 months ago
You put "medicore content" in place of (I assume) "mediocre content".
NayamAmarshe, 12 months ago
This looks like a really nice product.