This repo has all the code for the project, which is built on Whisper and MoviePy. There is a version that runs in an IPython Notebook, and I have also released an application version compatible with Gradient Deployments. This was only my second time doing front-end work with HTML, so it's all still a little rudimentary for now.<p>It works by first using Whisper to generate translated speech-to-text output at labeled timestamps. Then MoviePy scales the captions to the size of the input video and overlays them at the (mostly) correct times.<p>Thanks to OpenAI for the incredible free model.
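
For anyone curious what the two steps look like in code, here is a rough sketch, not the repo's actual implementation. It assumes the `openai-whisper` package and the MoviePy 1.x API (`moviepy.editor`, with ImageMagick available for `TextClip`). The helper names `plan_captions` and `burn_in_captions`, and the scaling fractions, are hypothetical choices made for illustration.

```python
def plan_captions(segments, video_size, width_frac=0.9, font_frac=0.05):
    """Turn Whisper-style segments into caption specs scaled to the video.

    `segments` is a list of dicts with "start", "end" (seconds) and "text",
    which is the shape Whisper returns under result["segments"].
    The fractions are illustrative defaults, not values from the repo.
    """
    width, height = video_size
    box_width = round(width * width_frac)            # caption box spans most of the frame
    font_size = max(12, round(height * font_frac))   # font grows with video height
    plans = []
    for seg in segments:
        text = seg["text"].strip()
        # Skip empty or zero-length segments; they would produce invalid clips.
        if not text or seg["end"] <= seg["start"]:
            continue
        plans.append({
            "text": text,
            "start": seg["start"],
            "end": seg["end"],
            "box_width": box_width,
            "font_size": font_size,
        })
    return plans


def burn_in_captions(video_path, out_path, model_name="base"):
    """Transcribe and translate with Whisper, then overlay captions with MoviePy."""
    # Imported lazily so plan_captions can be used without the heavy dependencies.
    import whisper
    from moviepy.editor import VideoFileClip, TextClip, CompositeVideoClip

    model = whisper.load_model(model_name)
    # task="translate" asks Whisper for English text regardless of the spoken language.
    result = model.transcribe(video_path, task="translate")

    video = VideoFileClip(video_path)
    overlays = []
    for cap in plan_captions(result["segments"], video.size):
        clip = (
            TextClip(
                cap["text"],
                fontsize=cap["font_size"],
                color="white",
                method="caption",                 # wrap text inside the box width
                size=(cap["box_width"], None),
            )
            .set_start(cap["start"])
            .set_end(cap["end"])
            .set_position(("center", "bottom"))
        )
        overlays.append(clip)

    CompositeVideoClip([video, *overlays]).write_videofile(out_path)
```

Calling `burn_in_captions("input.mp4", "captioned.mp4")` would write a captioned copy of the video (both file names are placeholders). Keeping the timing and sizing logic in `plan_captions` means it can be checked without loading a model or rendering anything.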