18 comments

josephrmartinez, over 1 year ago

I made a simple Chrome extension that similarly pulls down the video transcript and sends it to the OpenAI chat completions endpoint: https://github.com/josephrmartinez/AskYouTube

This extension allows me to "ask" the model to perform a task on the video content:

- "Give me the materials list" (for a DIY video)
- "What was the recommended book?" (for a 2+ hour podcast where they made a reference I can't find again easily)
- "Extract the recommended protocol" (for 3+ hour health videos)
- "Provide a counter argument" (for when I'm getting bored...)

A big plus is that you DO NOT need to wait for the ad to play through. I can just navigate to the video and send in a query without having to watch any ads.

YouTube transcripts are pretty rough. At first, I used Whisper to create a better transcript. But my primary use is to ask something of the YouTube video, and I found that slinging the so-so transcript along with my task was totally fine.

Really simple project: Chrome extension in just HTML, CSS, and JS. FastAPI server for the OpenAI endpoint. The server function does a quick tokenization on the transcript to determine whether I need the GPT-4 model for its 128k context window or whether the GPT-3.5 16k context window is okay.

Naturally, here is a short YouTube demo of the extension: https://www.youtube.com/watch?v=M1zq9NKIcbw&t=54s
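The model-selection step described in the comment above could look roughly like the sketch below. This is not the extension's actual code: the function names, the reply budget, the 4-characters-per-token estimate (standing in for a real tokenizer), and the specific model identifiers are all assumptions made for illustration. Only the 16k and 128k window sizes come from the comment.

```python
# Hypothetical sketch of "pick a model by transcript length".
# All names and numbers except the two window sizes are assumptions.

GPT35_CONTEXT = 16_000   # "16k" window mentioned in the comment
GPT4_CONTEXT = 128_000   # "128k" window mentioned in the comment
REPLY_BUDGET = 1_000     # assumed headroom reserved for the model's answer


def estimate_tokens(text: str) -> int:
    """Crude estimate: roughly 4 characters per token for English text.

    A real server would more likely count tokens with an actual tokenizer.
    """
    return len(text) // 4 + 1


def choose_model(transcript: str, task: str) -> str:
    """Return the smaller-context model if the prompt fits, else the larger."""
    needed = estimate_tokens(transcript) + estimate_tokens(task) + REPLY_BUDGET
    if needed <= GPT35_CONTEXT:
        return "gpt-3.5-turbo-16k"      # assumed model identifier
    if needed <= GPT4_CONTEXT:
        return "gpt-4-1106-preview"     # assumed model identifier
    raise ValueError("transcript is too long for either context window")


if __name__ == "__main__":
    short_transcript = "today we are building a planter box " * 50
    print(choose_model(short_transcript, "Give me the materials list"))
```

Routing short transcripts to the smaller model keeps cost down, and the larger window is only used when a long podcast or lecture actually needs it.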

andrewmunsell, over 1 year ago

Since I had the same question as everyone else, it seems like it must be using just the transcript. When asking about one of those "8k HDR" showcase videos (with no speech), Bard responds with:

> I'm sorry, but I'm unable to access this YouTube content. This is possible for a number of reasons, but the most common are: the content isn't a valid YouTube link, potentially unsafe content, or the content does not have a captions file that I can read.

theptip, over 1 year ago

Whisper (OpenAI speech-to-text) is already trained on YT content; amusingly, if you mumble incoherently, its most-probable completion for noise is “thanks for watching!”

naet, over 1 year ago

If it gets very good at "understanding" YouTube and other video content, Google could maybe find some kind of training data advantage not available to a pure text based model.

blibble, over 1 year ago

can we use it to detect and skip parts of the video that contain ads?

ilaksh, over 1 year ago

I assume it actually does not understand video, but reads the transcript?

dpflan, over 1 year ago

Here we go now. The AI revolution begins with learning from how-to videos. Just create the latent space for video/visual understanding; it's going to be very interesting to explore that.

SeanAnderson, over 1 year ago

I wonder how this works. It sounds like it's transcript driven, but then the next question is: were the transcripts automatically created or user-defined?

If the former, is this not going to run into the same issue as training AI on datasets created by AI? I experience so many mistranscribed words when using automatic transcripts that I can't imagine the quality of the data is excellent without supporting the transcripts with video inference.

SeanAnderson, over 1 year ago

Is there any reason to believe YouTube content will only be trained on by Bard?

Stuff like YouTubeDL exists and works fine. I would assume that others could scrape and train on it, too? Or does that sound outlandishly expensive?

lossolo, over 1 year ago

Open source version of something similar: https://github.com/PKU-YuanGroup/Video-LLaVA

Kye, over 1 year ago

Finally, no more sitting through 15 minutes of an extended "hey guys" intro just to find out how to make a Redstone machine.

okdood64, over 1 year ago

I'd argue that the Bard YouTube search is worse right now than just searching on YouTube itself.

I assume it'll improve over time.

readyplayernull, over 1 year ago

It can't find the synthwave videos with least views, so Bard is being blinded by the recommendation algorithm.

johnea, over 1 year ago

That is an accomplishment!

I'll have to get it to explain, since most of that shit is incomprehensible to me...

great_psy, over 1 year ago

Can it do something more intensive, like asking … "Give me a list of the videos/recipes that use 3 eggs"?

seanhunter, over 1 year ago

Understanding the videos is all very well, but can it understand:

1- the popularity of "tier list" videos?
2- why those douchetuber "prank" videos exist?
3- Logan and/or Jake Paul?

blowski, over 1 year ago

Would Bard understand if, say, a person in the video smiled or there was a sarcastic tone to a bit of audio?

RadixDLT, over 1 year ago

baby steps

Google Bard AI Now Has the Ability to Understand YouTube Videos
