Ask HN: Best way to do TTS for long texts

2 points by hexomancer almost 3 years ago

I am trying to implement screen reader functionality for a PDF viewer. I run Mozilla TTS on the text of one page at a time, which works pretty well. However, I have found that it is prone to having strokes mid-speech. Here is an example: https://twitter.com/Ali_Mostafavi_/status/1567436434621059072

One way to fix this would be to split the page's text into multiple parts and convert them to speech separately, but that would ruin the flow of speech.

I am curious what causes this problem, and whether there is any way to fix it.

3 comments

machinekob over 2 years ago

The problem is mostly about model training and architecture. I was doing TTS about 2.5 to 3 years ago, and most models were trained on clips of roughly fixed length (+/- 5-10 s) averaging about 80 words. There were a few attempts to fix that, and if I remember correctly a few RNN-based models were good at ignoring input length and generating "good" audio. The newer flow-based and diffusion-based models are outside my domain, as I have been in CV for the past few years and only read a cool new paper once in a while :)

You can also search for postags (and the token ids for them) that are placed specifically for "pause" audio, as they often fix the problem of weird transitions when you split the sentences.

This repo was very good a few years back: https://github.com/TensorSpeech/TensorFlowTTS
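
If the glitches come from inputs much longer than the training clips, one workaround is to feed the model chunks that end on sentence boundaries and stay under a word budget. Below is a minimal sketch of that idea, not anything Mozilla TTS ships. The names `split_sentences` and `chunk_text` are hypothetical, the 80-word default is only borrowed from the rough figure above, and the regex splitter is deliberately naive (it will mis-split abbreviations such as "e.g.").

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_text(text, max_words=80):
    """Group whole sentences into chunks of at most max_words words.

    max_words=80 is an assumption, not a documented model limit.
    A single sentence longer than max_words is kept whole in its own chunk.
    """
    chunks, current, count = [], [], 0
    for sentence in split_sentences(text):
        n = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each chunk can then be synthesized on its own and the audio concatenated. Because the cuts fall between sentences, where a speaker would pause anyway, the break in flow should be less noticeable than with arbitrary splits.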
gyuopy almost 3 years ago

Does the same thing happen if you split the text by individual sentences?

I wonder if a suitable workaround, until a root cause fix is discovered, may be to cut silences longer than a certain duration from your output, while processing several inputs in parallel so this doesn't risk halting the overall flow if there are several pauses in series.
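
A rough sketch of that workaround, under several assumptions: audio is a plain list of float samples in [-1, 1], "silence" means the absolute amplitude is below a fixed threshold, and `synthesize` is a hypothetical caller-supplied function that turns one piece of text into samples (it stands in for whatever TTS call is actually used). The function names, the threshold, and the 0.5 s cap are illustrative, not tuned values.

```python
from concurrent.futures import ThreadPoolExecutor


def shorten_silences(samples, sample_rate, threshold=0.01, max_silence_s=0.5):
    """Cap every run of near-silent samples at max_silence_s seconds."""
    max_run = int(max_silence_s * sample_rate)
    out = []
    run = 0  # length of the current run of quiet samples
    for s in samples:
        if abs(s) < threshold:
            run += 1
            if run <= max_run:  # keep only the first max_run quiet samples
                out.append(s)
        else:
            run = 0
            out.append(s)
    return out


def synthesize_all(pieces, synthesize, sample_rate, workers=4):
    """Synthesize pieces in parallel, join them in order, then cap long silences.

    `synthesize` is a hypothetical callable: text -> list of float samples.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        clips = list(pool.map(synthesize, pieces))  # map() preserves input order
    audio = []
    for clip in clips:
        audio.extend(clip)
    # Trim after joining so silence spanning a clip boundary is capped as well.
    return shorten_silences(audio, sample_rate)
```

Note that this only hides unnaturally long pauses. It does nothing for garbled or repeated speech, so it is a stopgap rather than a fix for the underlying model behavior.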
mtmail almost 3 years ago

Have you visited their discussion forum? https://discourse.mozilla.org/c/tts/285 It's not very active, but somebody might have a workaround.