I made a simple Chrome extension that similarly pulls down the video transcript and sends this to the openai chat completions endpoint:
<a href="https://github.com/josephrmartinez/AskYouTube">https://github.com/josephrmartinez/AskYouTube</a><p>This extension allows me to "ask" the model to perform a task on the video content:
- "Give me the materials list" (for a diy video)
- "What was the recommended book?" (for a 2+ hour podcast where they made a reference I can't find again easily)
- "Extract the recommended protocol" (for 3+ hour health videos)
- "Provide a counterargument" (for when I'm getting bored...)<p>Big plus is that you DO NOT need to wait for the ad to play through. I can just navigate to the video and send in a query without having to watch any ads.<p>YouTube transcripts are pretty rough. At first, I used Whisper to create a better transcript. But my primary use is to ask something of the YouTube video - I found that slinging the so-so transcript along with my task was totally fine. Really simple project: Chrome extension in just HTML, CSS, and JS. FastAPI server for the OpenAI endpoint. The server function does a quick tokenization on the transcript to determine whether I need the GPT-4 model for its 128k context window or whether the GPT-3.5 16k context window is enough.<p>Naturally, here is a short YouTube demo of the extension: <a href="https://www.youtube.com/watch?v=M1zq9NKIcbw&t=54s" rel="nofollow noreferrer">https://www.youtube.com/watch?v=M1zq9NKIcbw&t=54s</a>
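The model-selection step can be sketched roughly like this (a minimal sketch, not the extension's actual code: the ~4 chars/token estimate stands in for a real tokenizer, and the model names and the 14k-token threshold are illustrative assumptions):

```python
def pick_model(transcript: str) -> str:
    """Pick a chat model based on a rough token estimate of the transcript.

    Uses the common ~4 characters per token heuristic; a real server
    would run an actual tokenizer (e.g. tiktoken) instead.
    """
    est_tokens = len(transcript) // 4
    # Leave headroom below the 16k limit for the prompt and the reply.
    if est_tokens > 14_000:
        return "gpt-4-1106-preview"  # 128k context window
    return "gpt-3.5-turbo-16k"       # 16k context window


print(pick_model("a short transcript"))
print(pick_model("word " * 30_000))
```

The threshold is deliberately conservative so the question and the model's answer still fit alongside the transcript.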
Since I had the same question as everyone else, it seems like it must be using just the transcript. When asking about one of those "8k HDR" showcase videos (with no speech), Bard responds with:<p>> I'm sorry, but I'm unable to access this YouTube content. This is possible for a number of reasons, but the most common are: the content isn't a valid YouTube link, potentially unsafe content, or the content does not have a captions file that I can read.
Whisper (OpenAI speech-to-text) is already trained on YT content; amusingly, if you mumble incoherently, its most-probable completion for noise is “thanks for watching!”
If it gets very good at "understanding" YouTube and other video content, Google could maybe find some kind of training data advantage not available to a pure text based model.
Here we go. The AI revolution begins with learning from how-to videos. Just create the latent space for video/visual understanding; it's going to be very interesting to explore.
I wonder how this works. It sounds like it's transcript driven, but then the next question is - were the transcripts automatically generated or user-submitted?<p>If the former, is this not going to run into the same issue as training AI on datasets created by AI? I run into so many mistranscribed words when using automatic transcripts that I can't imagine the data quality is excellent without supplementing the transcripts with inference on the video itself.
Is there any reason to believe YouTube content will only be trained on by Bard?<p>Stuff like YouTubeDL exists and works fine. I would assume that others could scrape and train on it, too? Or does that sound outlandishly expensive?
Open source version of something similar:<p><a href="https://github.com/PKU-YuanGroup/Video-LLaVA">https://github.com/PKU-YuanGroup/Video-LLaVA</a>
Understanding the videos is all very well but can it understand:<p>1- the popularity of "tier list" videos?<p>2- why those douchetuber "prank" videos exist?<p>3- Logan and/or Jake Paul?