TE
TechEcho
Home24h TopNewestBestAskShowJobs
GitHubTwitter
Home

TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

GitHubTwitter

Home

HomeNewestBestAskShowJobs

Resources

HackerNews APIOriginal HackerNewsNext.js

© 2025 TechEcho. All rights reserved.

Google Bard AI Now Has the Ability to Understand YouTube Videos

59 pointsby titusblairover 1 year ago

18 comments

josephrmartinezover 1 year ago
I made a simple Chrome extension that similarly pulls down the video transcript and sends this to the openai chat completions endpoint: <a href="https:&#x2F;&#x2F;github.com&#x2F;josephrmartinez&#x2F;AskYouTube">https:&#x2F;&#x2F;github.com&#x2F;josephrmartinez&#x2F;AskYouTube</a><p>This extension allows me to &quot;ask&quot; the model to perform a task on the video content: - &quot;Give me the materials list&quot; (for a diy video) - &quot;What was the recommended book?&quot; (for a 2+ hour podcast where they made a reference I can&#x27;t find again easily) - &quot;Extract the recommended protocol&quot; (for 3+ hour health videos) - &quot;Provide a counter argument&quot; (for when I&#x27;m getting bored...)<p>Big plus is that you DO NOT need to wait for the ad to play through. I can just navigate to the video and send in a query without having to watch any ads.<p>Youtube transcripts are pretty rough. At first, I used Whisper to create a better transcript. But my primary use is to ask something of the youtube video - I found that slinging the so-so transcript along with my task was totally fine. Really simple project: Chrome extension in just html, css, and js. FastAPI server for the openai endpoint. Server function does a quick tokenization on the transcript to determine if I need to use the gpt4 model for the 128k context window or if the gptt3.5 16k context window is okay.<p>Naturally, here is a short youtube demo of the extension: <a href="https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=M1zq9NKIcbw&amp;t=54s" rel="nofollow noreferrer">https:&#x2F;&#x2F;www.youtube.com&#x2F;watch?v=M1zq9NKIcbw&amp;t=54s</a>
andrewmunsellover 1 year ago
Since I had the same question as everyone else, it seems like it must be using just the transcript. When asking about one of those &quot;8k HDR&quot; showcase videos (with no speech), Bard responds with:<p>&gt; I&#x27;m sorry, but I&#x27;m unable to access this YouTube content. This is possible for a number of reasons, but the most common are: the content isn&#x27;t a valid YouTube link, potentially unsafe content, or the content does not have a captions file that I can read.
评论 #38407596 未加载
theptipover 1 year ago
Whisper (OpenAI speech-to-text) is already trained on YT content; amusingly, if you mumble incoherently, its most-probable completion for noise is “thanks for watching!”
评论 #38409181 未加载
评论 #38408223 未加载
评论 #38408663 未加载
评论 #38407785 未加载
评论 #38407676 未加载
naetover 1 year ago
If it gets very good at &quot;understanding&quot; YouTube and other video content, Google could maybe find some kind of training data advantage not available to a pure text based model.
评论 #38407569 未加载
评论 #38407519 未加载
blibbleover 1 year ago
can we use it to detect and skip parts of the video that contain ads?
评论 #38407266 未加载
评论 #38407586 未加载
ilakshover 1 year ago
I assume it actually does not understand video, but reads the transcript?
评论 #38406996 未加载
评论 #38407016 未加载
评论 #38407498 未加载
评论 #38407086 未加载
dpflanover 1 year ago
Here is go now. The AI revolution begins with learning how-to videos. Just create the latent space for video&#x2F;visual understanding, it&#x27;s going to be very interesting to explore that.
SeanAndersonover 1 year ago
I wonder how this works. It sounds like it&#x27;s transcript driven, but then the next question is - were the transcripts automatically created or user-defined?<p>If the former, is this not going to run into the same issue as training AI on datasets created by AI? I experience so many mistranslated words when using automatic transcripts that I can&#x27;t imagine the quality of data is excellent without supporting the transcripts with video inference.
SeanAndersonover 1 year ago
Is there any reason to believe YouTube content will only be trained on by Bard?<p>Stuff like YouTubeDL exists and works fine. I would assume that others could scrape and train on it, too? Or does that sound outlandishly expensive?
评论 #38407561 未加载
lossoloover 1 year ago
Open source version of something similar:<p><a href="https:&#x2F;&#x2F;github.com&#x2F;PKU-YuanGroup&#x2F;Video-LLaVA">https:&#x2F;&#x2F;github.com&#x2F;PKU-YuanGroup&#x2F;Video-LLaVA</a>
Kyeover 1 year ago
Finally, no more sitting through 15 minutes of an extended &quot;hey guys&quot; intro just to find out how to make a Redstone machine.
okdood64over 1 year ago
I&#x27;d argue that the Bard YouTube search is worse off right now than just searching on YouTube itself.<p>I assume it&#x27;ll improve over time.
readyplayernullover 1 year ago
It can&#x27;t find the synthwave videos with least views, so Bard is being blinded by the recommendation algorithm.
评论 #38407765 未加载
johneaover 1 year ago
That is an accomplishment!<p>I&#x27;ll have to get it to explain, since most of that shit is incomprehensible to me...
评论 #38407475 未加载
great_psyover 1 year ago
Can it do something more intensive like asking …<p>Give me a list of the video&#x2F;recipes that use 3 eggs?
seanhunterover 1 year ago
Understanding the videos is all very well but can it understand:<p>1- the popularity of &quot;tier list&quot; videos?<p>2- why those douchetuber &quot;prank&quot; videos exist?<p>3- Logan and&#x2F;or Jake Paul?
评论 #38407602 未加载
blowskiover 1 year ago
Would Bard understand if, say, a person in the video smiled or there was a sarcastic tone to a bit of audio?
RadixDLTover 1 year ago
baby steps