Ask HN: Cheapest way to build custom audio TTS in 2024?

2 points by mittermayr, about 1 year ago
Many years ago, I participated in a very odd/unique museum project, where I was asked to just "talk through" my entire life. Someone came to my house, hit record, and I started talking. The objective was to be as detailed as possible, and to try and stay in sequence and not skip over anything. This took several days/sessions, but I think I ended up with 12 CDs of just me recounting my life. Super odd.

I have had no use for it so far. It sits in a museum (along with other people's stories), as a sort of time capsule of what life was like for a regular person at the time.

I have now started to wonder whether that would be enough audio content to train a TTS model _properly_ (the language is German). I know some will respond saying "you only need 15 seconds of audio", but I have NEVER managed to get any of these things to work properly or to produce nice results. It seems like those things were mostly made to hit the news, not for actual use.

So, in 2024, without a 4090 card or A100 sitting in my basement, and without wanting to spend a considerable amount of money on it, what would be the best approach to build a voice model out of this?

What I have is: Windows, OS X, Linux, and x64 as well as Apple M2 Pro. AND I have A LOT OF TIME to let these things run on their own. Time is NOT an issue here; this can take however long it needs.

So, how would you build an audio model out of this? Without subscription services, without renting A100s, just here, at home?

Thanks!

No comments yet.
