Definitely would use this.<p>Instructional videos instead of step-by-step text are a personal pet peeve. I know it's a lot easier to just record a video to show something like "how to replace the battery on a cordless vacuum" or "removing a sink basin nut", but it's often such a painful experience to consume (watch a moment, pause, scrub back and watch again, pause, continue, pause, all with potentially gloved hands, often in tight working spaces).
I saw a YouTube video by a guy who specializes in building D&D characters. He spends twenty minutes going into detail on each one, and then makes the pitch for subscribing to his Patreon account with something like "members get all the details in a convenient list so that you don't have to keep going back to this video."<p>So he's using the same bit of friction that this article is trying to solve, to fill his rice bowl. It's a bit of a shame that fixing this problem for me will cause one for him.
It's a very cool technical feat, but not something I would personally pay for. I'll just spend the 1-2 minutes to watch the video for free. Not trying to discourage you, just giving honest feedback. Launching the early landing page is a good idea to validate further.
I could also use a service that trims all of the fat from how-to articles.<p>> We’ve all been there: we used the florb for too many glorbs and now it needs to be replaced. [...]<p>> This is an experience that everyone on the staff of howto.biz.uk has had! [...]<p>> But how do you replace a used-up florb? In this article we are going to show you how. [...]<p>> [scan the next five paragraphs]
This is pretty cool, but I'd like to see a well-formatted recipe, not a transcript. I prefer the markdown format for recipes, so I worked on something like this earlier this year [0]. It fetches YouTube subs (no audio or video processing like this project does; it works from the subtitles alone) and returns markdown with ingredients and steps.<p>[0] <a href="https://github.com/gaganpreet/summarise-youtube-recipes">https://github.com/gaganpreet/summarise-youtube-recipes</a>
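For anyone curious, here's a minimal sketch of that subtitles-only approach (not the actual code from the linked repo; it assumes the youtube-transcript-api package and a made-up video id, and leaves the summarisation to whatever LLM you prefer):

```python
# Rough sketch, not the linked repo's code: pull the YouTube captions and
# build a prompt asking an LLM for a markdown recipe with ingredients + steps.
# Assumes `pip install youtube-transcript-api`; the video id is hypothetical.
from youtube_transcript_api import YouTubeTranscriptApi

def build_recipe_prompt(video_id: str) -> str:
    transcript = YouTubeTranscriptApi.get_transcript(video_id)  # list of {text, start, duration}
    text = " ".join(chunk["text"] for chunk in transcript)
    return (
        "Below is the transcript of a cooking video. Return a markdown recipe "
        "with an '## Ingredients' list and numbered '## Steps'.\n\n" + text
    )

if __name__ == "__main__":
    print(build_recipe_prompt("abc123XYZ_0"))  # replace with a real video id
```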
As someone whose learning was significantly accelerated by the "written tutorial" phase of the internet, this would be a really great little tool. I find video tutorials to be far more cumbersome than text + images.
I kind of wrote something for this a few years ago: <a href="https://github.com/rberenguel/glancer">https://github.com/rberenguel/glancer</a> [edited a fat-fingered copy-paste]<p>The use case is technical videos (like from conferences) I’m interested in, but not enough to invest 20-60 minutes.<p>Haven’t used it in a few months so the yt-dlp commands may need updating.
You can also use software to detect “cuts” in the video, which improves the frame extraction over just grabbing six evenly spaced frames.
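For example, ffmpeg's scene filter can do this in one pass; a rough sketch (the 0.3 threshold and file names are just placeholders):

```python
# Rough sketch of cut detection with ffmpeg's scene filter: keep only frames
# whose scene-change score exceeds a threshold, instead of sampling evenly.
# Assumes ffmpeg is on PATH; threshold and file names are placeholders.
import subprocess

def extract_cut_frames(video: str, out_pattern: str = "cut_%03d.jpg",
                       threshold: float = 0.3) -> None:
    subprocess.run(
        [
            "ffmpeg", "-i", video,
            "-vf", f"select='gt(scene,{threshold})'",
            "-vsync", "vfr",  # emit one image per selected frame
            out_pattern,
        ],
        check=True,
    )

extract_cut_frames("recipe_video.mp4")  # hypothetical input file
```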
Do video formats support embedding structured metadata?<p>If I make a video of me cooking, can I embed the recipe in the video itself? Not just visually, but, e.g., at 10s I digitally insert the data "Add 1 cup red peppers". It isn't necessarily a caption of something said or shown, just extra data.<p>Could a video creator leave substantially more metadata in their videos? I always assumed the pop-up metadata was externally stored and timestamp-synced. Is there a way to embed it?
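The closest thing I know of is chapter metadata: containers like MP4 and MKV can carry timestamped chapters and free-form tags, which ffmpeg can write from its FFMETADATA format. A rough sketch (file names are hypothetical, and I'm not sure how far this stretches for arbitrary structured data):

```python
# Sketch: embed a timestamped note as a chapter title via ffmpeg's FFMETADATA
# format. MP4/MKV containers carry chapters and tags; how players surface
# them varies. Assumes ffmpeg is on PATH; file names are hypothetical.
import subprocess

FFMETADATA = """;FFMETADATA1
title=Red pepper pasta

[CHAPTER]
TIMEBASE=1/1000
START=10000
END=15000
title=Add 1 cup red peppers
"""

with open("chapters.txt", "w") as f:
    f.write(FFMETADATA)

subprocess.run(
    ["ffmpeg", "-i", "cooking.mp4", "-i", "chapters.txt",
     "-map_metadata", "1", "-codec", "copy", "cooking_tagged.mp4"],
    check=True,
)
```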
Recommend passing the speech-to-text narration through a round of the GPT-4 API to correct any transcription errors (use a prompt giving context that it's speech-to-text output).
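Something like this, as a minimal sketch (OpenAI's Python SDK; the model name and prompt wording are just illustrative):

```python
# Minimal sketch of the suggested cleanup pass: feed the raw ASR transcript
# back through a chat model with context that it is speech-to-text output.
# Assumes `pip install openai` and OPENAI_API_KEY; model name is illustrative.
from openai import OpenAI

client = OpenAI()

def clean_transcript(raw: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "The following is an automatic speech-to-text transcript "
                        "and may contain mis-heard words. Fix likely transcription "
                        "errors, but do not add, remove, or reorder content."},
            {"role": "user", "content": raw},
        ],
    )
    return response.choices[0].message.content
```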
Wonder if Kagi's Universal Summarizer would work on recipe videos. It seems to do a decent job on YouTube videos, but those usually have closed captions built in.
This is great, thank you for sharing! I wonder what the reverse would look like. More and more nowadays, I find myself first looking on YouTube for tutorials and walkthroughs, even if they wind up being more verbose than their written counterparts.
Based on the example shown on the page, the output doesn't seem very good. If that's one of the better examples the software produced, I don't think this will be useful in practice.
An evolution of this process would make it feasible to do retrieval-augmented generation using information from video content. I've thought about trying this to improve the (already impressive) abilities LLMs possess as a creative writing assistant/rubber ducky; a lot of good writing advice is on YouTube in the form of video essays, tutorials, lectures, etc.
The copyright notice on the output is a poor choice, since you almost certainly do not own the copyright to any of the content. You've gone to impressive lengths to ensure that the result is true to the source material, which means that there is no claim to this being a transformative work.<p>(Very cool and useful project, though.)
Ha! Print that video? Yes, but can you FIND THE PRINTER?<p>I humbly apologize, I thought this was some joke, or errant stupidity. It's not. This person has put some very serious thought into not only getting it to work, but making it useful. Very useful. You have earned my upvote and recommendation. Thank you, Mr. Forret. Thank you.
If the main challenge was 'not having the smartphone in the kitchen', then one possible solution could have been getting another screen dedicated to the kitchen: a tablet, a laptop, a small TV + Google Cast, or some such combination.<p>It seems to be a proper medium for 'printing' a video.<p>Of course, choosing challenges and finding solutions is what drives fun.
These TikTok videos are pretty short, right? Why not just get a notebook and write down the instructions?<p>You could even do a little line drawing of the important bits.<p>You could keep this "cook" book in your kitchen, and maybe pass it to one of your kids (just an example) when they move out or something.
I actually wonder if, in the limit of video encoding, we could just get a diffusion model that renders realistic video in real time from a script. Then downloading a movie is just downloading a few megabytes of a prompt, and the movie plays based off it locally.
Cool! I had the same project idea recently. You may be interested in this for the speech-to-text step: <a href="https://github.com/SYSTRAN/faster-whisper">https://github.com/SYSTRAN/faster-whisper</a>
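Basic usage is roughly this (the model size, device, and compute type here are illustrative choices; check the README for options):

```python
# Quick sketch of faster-whisper for the speech-to-text step.
# Model size, device and compute type are illustrative, not recommendations.
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cpu", compute_type="int8")
segments, info = model.transcribe("narration.mp3", beam_size=5)  # hypothetical file

print(f"Detected language: {info.language}")
for segment in segments:
    print(f"[{segment.start:.1f}s -> {segment.end:.1f}s] {segment.text}")
```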
I think you could send all of that to GPT-4 and ask it to read it and provide you with step-by-step instructions: a recipe. It would do so easily.<p>I didn’t see how that printout would be super useful; it’s not the complete step-by-step, is it?
Ok, so:<p>* It does not print the video frames as a 3D object.<p>* Despite what the graphic at the link suggests, it doesn't 3D-print food.<p>It extracts a recipe with images and text from a video, automatically.