Hey HN!<p>A few months ago I was trying to dub/translate a pretty simple video from English to Chinese for my parents, but soon realized that all of the good solutions like ElevenLabs/Rask.ai/Speechify cost around $2/min. So translating a 10-minute YouTube video was going to cost me $20 USD, which was just a "little" over my budget.<p>Having worked with similar technologies in the past, I knew there was nothing too “proprietary” here. It's just Whisper to transcribe, an LLM to translate, some API to do TTS, some API to do background audio extraction, plus a frontend. The entire process should cost next to nothing given how cheap all of those things have gotten, yet… they all still charge around $2/min!<p>So I decided to build an open-source AI dubbing studio that charges as little as I can manage, which ended up being ~$0.1/min (hopefully I don't lose money on this haha). The frontend is built with Next.js 14 using the new RSC and Server Actions, hosted on Vercel. The main complexity is probably the “video editor” and the real-time preview + audio generation, which uses Tone.js! Auth is done with Clerk! Most of the “quick” API calls are done with server actions, but there is also a Node server responsible for processes that take a bit longer, like initializing and exporting. Honestly, I'm not sure that was the best way to do it, but I was familiar with the tech stack so I just went for it haha.<p>To be 100% real, the product right now is not as feature-rich as the alternatives listed above (e.g. we have no voice cloning at the moment), so if you have the budget, go for those instead! However, it's definitely good enough to dub most simple videos where the information matters more than the expressions (e.g. educational videos, tutorials, videos of people building things). 
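<p>For anyone curious how those pieces fit together, here's a minimal TypeScript sketch of the pipeline described above: transcribe, translate, synthesize, with background extraction running alongside. It's an illustration under assumptions, not Dubbie's actual code: `DubbingStages`, `dub` and the stub stages are hypothetical names, and the real Whisper/LLM/TTS/source-separation calls are replaced by stubs.

```typescript
// Hypothetical sketch of a dubbing pipeline; none of these names come from
// Dubbie's codebase. Each stage marks where a real service call would go.

/** A transcribed line with its position in the source video (seconds). */
interface Segment {
  start: number;
  end: number;
  text: string;
}

/** A segment after translation and speech synthesis. */
interface DubbedSegment extends Segment {
  translated: string;
  speech: Uint8Array; // synthesized audio; the encoding is left unspecified
}

/** Pluggable stages, e.g. Whisper, an LLM, a TTS API, a source-separation API. */
interface DubbingStages {
  transcribe(audio: Uint8Array): Promise<Segment[]>;
  translate(lines: string[], targetLang: string): Promise<string[]>;
  synthesize(line: string, targetLang: string): Promise<Uint8Array>;
  extractBackground(audio: Uint8Array): Promise<Uint8Array>;
}

interface DubResult {
  background: Uint8Array; // music/effects with the original voice removed
  segments: DubbedSegment[];
}

async function dub(
  audio: Uint8Array,
  targetLang: string,
  stages: DubbingStages,
): Promise<DubResult> {
  // Background extraction doesn't depend on the transcript, so run both at once.
  const [background, segments] = await Promise.all([
    stages.extractBackground(audio),
    stages.transcribe(audio),
  ]);

  // Translate all lines in one call so the model sees the surrounding context.
  const translations = await stages.translate(
    segments.map((s) => s.text),
    targetLang,
  );
  if (translations.length !== segments.length) {
    throw new Error("translation count does not match segment count");
  }

  // Synthesize each translated line, keeping the original timestamps so an
  // editor can place each clip on the timeline over the background track.
  // All TTS calls run concurrently; a real service may need a concurrency limit.
  const dubbed = await Promise.all(
    segments.map(async (seg, i) => ({
      ...seg,
      translated: translations[i],
      speech: await stages.synthesize(translations[i], targetLang),
    })),
  );

  return { background, segments: dubbed };
}

// Stub stages for trying the flow locally without any API keys.
const stubStages: DubbingStages = {
  async transcribe() {
    return [
      { start: 0, end: 1.5, text: "Hello" },
      { start: 2, end: 3.2, text: "Let's cook noodles" },
    ];
  },
  async translate(lines, targetLang) {
    return lines.map((line) => `[${targetLang}] ${line}`);
  },
  async synthesize(line) {
    return new Uint8Array(line.length); // silent placeholder "audio"
  },
  async extractBackground(audio) {
    return audio; // pretend the whole input is background
  },
};

dub(new Uint8Array(16), "zh", stubStages).then((result) => {
  for (const seg of result.segments) {
    console.log(`${seg.start}-${seg.end}s: ${seg.translated}`);
  }
});
```

Keeping each stage behind an interface is one way to swap providers (say, a cheaper TTS API) without touching the rest of the flow, which matters when the whole point is keeping the per-minute cost low.<p>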
I also put quite a bit of effort into making the UX great + snappy, and I personally prefer it over the alternatives listed above.<p>To demonstrate, I dubbed a few good Chinese videos into English myself!<p>1. An extravagant instant noodle timer ->
<a href="https://www.youtube.com/watch?v=VShnKBXTQ8Y" rel="nofollow">https://www.youtube.com/watch?v=VShnKBXTQ8Y</a><p>2. Building a real Harry Potter wand ->
<a href="https://www.youtube.com/watch?v=IG1yIAeSae0" rel="nofollow">https://www.youtube.com/watch?v=IG1yIAeSae0</a><p>3. Making a snake sword out of real snake bones ->
<a href="https://www.youtube.com/watch?v=Xb2hjSkrg6k" rel="nofollow">https://www.youtube.com/watch?v=Xb2hjSkrg6k</a><p>I think there are countless videos like these that are still very entertaining even with an AI voice slapped on (at least to me lol). There are also so many valuable English videos that will never get dubbed into other languages because of the cost.<p>I'd love some feedback from fellow HN readers. I'm more of a designer than an engineer, so I'm feeling a little insecure about the code haha; any tips there would be appreciated! I'd also love to hear what you think of the product, your use cases, anything really!<p>Here's the code: <a href="https://github.com/DubbieHQ/dubbie">https://github.com/DubbieHQ/dubbie</a><p>Here's the landing page: <a href="https://dubbie.com" rel="nofollow">https://dubbie.com</a>