Hello HN!<p>Please check out Summary Cat (<a href="https://www.summarycat.com" rel="nofollow noreferrer">https://www.summarycat.com</a>). It uses OpenAI's GPT-3.5 to summarize YouTube transcripts.<p>Please note that it only works for<p><pre><code> - *English* videos.
- videos that are not too long in length.
</code></pre>
I'd appreciate any feedbacks, criticisms, or feature requests!! You can also find my contact info in my profile. Thank you in advance.<p>------------Technical Details---------------<p>Tech Stack<p><pre><code> - Frontend: HTML/CSS
- Backend: Python/Flask
</code></pre>
APIs:<p><pre><code> - For grabbing YouTube's transcripts: I used youtube-transcript-api (https://pypi.org/project/youtube-transcript-api/)
- For summarizing the transcripts: I used OpenAI's GPT-3.5-turbo-16k: https://platform.openai.com/docs/guides/gpt.
- I used GPT-3.5 because GPT-4 is quite a lot more expensive (roughly 10X).
</code></pre>
My Prompt (Super Simple!)<p><pre><code> - "please summarize the following text into a few paragraphs:" + the full transcript.
</code></pre>
Thoughts about GPT-4 vs GPT-3.5-Turbo-16k for Summary Cat<p><pre><code> - GPT-4 was 20% better for "summary quality"
- GPT-4 feels 50% faster
- However, GPT-4 is about 10X as expensive as GPT-3.5
- Winner: GPT-3.5-Turbo-16k</code></pre>
Just used this to clear out my watch later list without having to watch anything. Nice!<p>Only note I have at this time is that it seemed to time out or hang or something on a long video (>2h) -- I'm guessing that there might be limitations to how much transcript you can chuck into GPT, it might be worth throwing an error of some sort in that scenario rather than the forever load<p>E: Seen you've asked for an example to the other person mentioning this. In my case it was this video <a href="https://www.youtube.com/watch?v=hFL6qRIJZ_Y">https://www.youtube.com/watch?v=hFL6qRIJZ_Y</a>
I tested it with two videos, the first one it does the summary quite well: <a href="https://youtu.be/Cy-NgpRN1FU" rel="nofollow noreferrer">https://youtu.be/Cy-NgpRN1FU</a>, I love how it mentions the dogs name is Ernie, that made me smile :)<p>But in the second video <a href="https://www.youtube.com/watch?v=NBFyvOV7fz8">https://www.youtube.com/watch?v=NBFyvOV7fz8</a> the app keeps mentioning things like: "The text discusses...", but the content is not a text, it's a video.<p>Really cool app, it's really quick too!
Pretty nice! Very useful idea, especially for videos on my watchlist I never get to because I feel they're too long.<p>Would love if I could ask follow up questions. Would be awesome to ask "Is X also explained?" and get a little summary back with the timestamp so I can jump to that point in the video.<p>Also it feels a bit slow and doesn't really give feedback whether it's making progress. That would be a good UX improvement.
How many tokens do you allow per session? I've been thinking about creating a similar app, but I'm a little bit concerned about the unintended costs.
Awesome work! I used it to summarize an hour long podcast I had been meaning to watch and it worked fabulously. What's amazing is that the transcript is auto-generated and of a conversation between two individuals without any indication of who's actually speaking. Yet GPT-3.5 is able to make sense of it.<p>Out of curiosity I downloaded the transcript myself with `youtube_transcript_api --format text` and counted the tokens via ttok [0], it was a tad over 16k. So what does your site do in that case? Is the transcript truncated?<p>[0]: <a href="https://github.com/simonw/ttok">https://github.com/simonw/ttok</a>
Two videos that give a 500 internal server error in the Network tab and an infinite spinner:<p><a href="https://www.youtube.com/watch?v=GuiTN4tOBr4">https://www.youtube.com/watch?v=GuiTN4tOBr4</a> (edit: this has no captions so maybe it's expected, but a proper error would be better)<p><a href="https://www.youtube.com/watch?v=iShzzAK9zxk">https://www.youtube.com/watch?v=iShzzAK9zxk</a> (edit: this may be because I marked the subtitles as UK English)
Seems to be an arms race between youtube forcing creators to make videos 8 mins long min to be able to get mid roll ads and people coming up with ways to summarize the transcript.<p>Idea for the future: Use the summarize to re-cut the videos to the most important parts. Like a super to the point tiktok style video that is nothing but dopamine being injected into your veins. There seems to already be "auto podcast clipper ai agents" out there but nothing for consumers to use. those are more video editor adjacent. If anyone wants to work on something like this, lemme know.
Plugged in this meme video and it gave me the "As a AI I can't...": <a href="https://www.youtube.com/watch?v=NlZzftmtGJY">https://www.youtube.com/watch?v=NlZzftmtGJY</a><p>Are you using celery for your async workers? Cool project!
It hang on non - english video.
I tried this one: <a href="https://youtu.be/B4kRwlHTcLM?si=3kp3pvQ4M4l6eRTT" rel="nofollow noreferrer">https://youtu.be/B4kRwlHTcLM?si=3kp3pvQ4M4l6eRTT</a>
Otherwise, brilliant
Interesting, very cool!<p>However, how does it do on videos where there's not a lot of speaking? Any plans to do <i>actual</i> video (image) processing?
It's hanging for everything I try.<p>I suggest a progress bar rather than a spinny thingy. Give the user some sense that a conclusion is on the horizon.<p>From my own experiments, I think you'll get better summaries with a prompt like "This is a transcription of a youtube video. Please etc etc etc". Context seems to help.
I tried to do something similar, but I could only get transcripts for videos with transcript files attached, which isnt a huge number of videos. How did you get around this?
Looks great, it gave a quick response. Are you putting the whole transcript in context? Have you encountered issues with transcripts that are too large?
Nice!<p>For those interested in comparing, <a href="https://www.summarize.tech/" rel="nofollow noreferrer">https://www.summarize.tech/</a> also builds summaries from YouTube videos but includes an overview, then a summary of each 5 min segment
Totally missed what this was supposed to do and tried to get a summary of a video discussing some music with captions. Got back garbage. Thought it might process the text from the frames. Shrug. Good idea for the use case you intended tho!