Google Bard AI Now Has the Ability to Understand YouTube Videos

59 points by titusblair over 1 year ago

18 comments

josephrmartinez over 1 year ago
I made a simple Chrome extension that similarly pulls down the video transcript and sends it to the OpenAI chat completions endpoint: https://github.com/josephrmartinez/AskYouTube

This extension allows me to "ask" the model to perform a task on the video content:
- "Give me the materials list" (for a DIY video)
- "What was the recommended book?" (for a 2+ hour podcast where they made a reference I can't find again easily)
- "Extract the recommended protocol" (for 3+ hour health videos)
- "Provide a counter argument" (for when I'm getting bored...)

Big plus is that you DO NOT need to wait for the ad to play through. I can just navigate to the video and send in a query without having to watch any ads.

YouTube transcripts are pretty rough. At first, I used Whisper to create a better transcript. But my primary use is to ask something of the YouTube video, and I found that slinging the so-so transcript along with my task was totally fine.

Really simple project: Chrome extension in just HTML, CSS, and JS. FastAPI server for the OpenAI endpoint. The server function does a quick tokenization on the transcript to determine whether I need to use the gpt4 model for the 128k context window or whether the gpt3.5 16k context window is okay.

Naturally, here is a short YouTube demo of the extension: https://www.youtube.com/watch?v=M1zq9NKIcbw&t=54s
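The model-selection step described in this comment (count the transcript's tokens, then pick the 16k or the 128k context window) could look roughly like the sketch below. This is not the extension's actual code. The function names, the model labels, the reply budget, and the 4-characters-per-token estimate are all assumptions made for illustration; a real server would more likely count tokens with a proper tokenizer.

```python
import math

# Context-window sizes mentioned in the comment, in tokens.
SMALL_CONTEXT = 16_000
LARGE_CONTEXT = 128_000

# Hypothetical labels, not real model identifiers; substitute your own.
SMALL_MODEL = "small-16k-model"
LARGE_MODEL = "large-128k-model"


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate. Assumption: about 4 characters per token."""
    return math.ceil(len(text) / chars_per_token)


def pick_model(transcript: str, question: str, reply_budget: int = 1_000) -> str:
    """Return the label of the smallest model whose window fits the request.

    reply_budget is an assumed number of tokens reserved for the answer.
    """
    needed = estimate_tokens(transcript) + estimate_tokens(question) + reply_budget
    if needed <= SMALL_CONTEXT:
        return SMALL_MODEL
    if needed <= LARGE_CONTEXT:
        return LARGE_MODEL
    # Neither window is large enough, so the caller has to shorten the transcript.
    raise ValueError(f"about {needed} tokens needed; transcript must be shortened")
```

Calling `pick_model(transcript, "Give me the materials list")` returns the small-model label for short videos and the large-model label only when the estimate no longer fits in 16k tokens, which keeps most requests on the cheaper model.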
andrewmunsell over 1 year ago
Since I had the same question as everyone else, it seems like it must be using just the transcript. When asking about one of those "8k HDR" showcase videos (with no speech), Bard responds with:

> I'm sorry, but I'm unable to access this YouTube content. This is possible for a number of reasons, but the most common are: the content isn't a valid YouTube link, potentially unsafe content, or the content does not have a captions file that I can read.
theptip over 1 year ago
Whisper (OpenAI speech-to-text) is already trained on YT content; amusingly, if you mumble incoherently, its most-probable completion for noise is “thanks for watching!”
naet over 1 year ago
If it gets very good at "understanding" YouTube and other video content, Google could maybe find some kind of training data advantage not available to a pure text-based model.
blibble over 1 year ago
can we use it to detect and skip parts of the video that contain ads?
ilaksh over 1 year ago
I assume it actually does not understand video, but reads the transcript?
dpflan over 1 year ago
Here we go now. The AI revolution begins with learning from how-to videos. Just create the latent space for video/visual understanding; it's going to be very interesting to explore that.
SeanAnderson over 1 year ago
I wonder how this works. It sounds like it's transcript driven, but then the next question is: were the transcripts automatically created or user-defined?

If the former, is this not going to run into the same issue as training AI on datasets created by AI? I experience so many mistranslated words when using automatic transcripts that I can't imagine the quality of data is excellent without supporting the transcripts with video inference.
SeanAnderson over 1 year ago
Is there any reason to believe YouTube content will only be trained on by Bard?

Stuff like YouTubeDL exists and works fine. I would assume that others could scrape and train on it, too? Or does that sound outlandishly expensive?
lossolo over 1 year ago
Open source version of something similar:

https://github.com/PKU-YuanGroup/Video-LLaVA
Kye over 1 year ago
Finally, no more sitting through 15 minutes of an extended "hey guys" intro just to find out how to make a Redstone machine.
okdood64 over 1 year ago
I'd argue that the Bard YouTube search is worse right now than just searching on YouTube itself.

I assume it'll improve over time.
readyplayernull over 1 year ago
It can't find the synthwave videos with the fewest views, so Bard is being blinded by the recommendation algorithm.
johnea over 1 year ago
That is an accomplishment!

I'll have to get it to explain, since most of that shit is incomprehensible to me...
great_psy over 1 year ago
Can it do something more intensive like asking …

Give me a list of the videos/recipes that use 3 eggs?
seanhunter over 1 year ago
Understanding the videos is all very well but can it understand:

1- the popularity of "tier list" videos?

2- why those douchetuber "prank" videos exist?

3- Logan and/or Jake Paul?
blowski over 1 year ago
Would Bard understand if, say, a person in the video smiled or there was a sarcastic tone to a bit of audio?
RadixDLT over 1 year ago
baby steps