
Intro to Large Language Models [Video]

291 points by georgehill over 1 year ago

22 comments

samspenc over 1 year ago
I initially thought this was his older video, but I see this one was published just an hour ago (as of this comment).

I was thinking of this other video he published earlier this year: "Let's build GPT: from scratch, in code, spelled out" https://www.youtube.com/watch?v=kCc8FmEb1nY

Karpathy is generally well regarded as a tutor, especially for complex topics in AI/ML.
chamoda over 1 year ago
A really interesting analogy in the video is the discussion of the two systems of thinking from the book Thinking, Fast and Slow by Daniel Kahneman.

System 1 thinking: fast, automatic thinking and rapid decisions. For example, when someone asks you 2 + 2, you don't think; you just reply instantly. LLMs currently only have System 1 thinking.

System 2 thinking: rational, slow thinking for complex decisions. For example, when someone asks you 17 x 24, you think slowly and deliberately to multiply. This kind of thinking is a major component we need for AGI. The current rumor about OpenAI's so-called "Q*" algorithm could be something related to System 2 thinking (just speculation at this point).
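To make the contrast concrete, here is a toy sketch of the 17 x 24 example: System 1 would be producing the answer in one shot, while System 2 decomposes the problem into easier sub-steps and combines them, the same intuition behind chain-of-thought prompting. This is only an analogy in ordinary Python, not anything from the talk.

```python
# Toy analogy for System 2 thinking: decompose 17 x 24 into easy sub-steps
# instead of producing the answer in a single step.

def system2_multiply(a: int, b: int) -> int:
    tens, ones = divmod(b, 10)          # 24 -> (2, 4)
    partial_tens = a * tens * 10        # 17 * 20 = 340
    partial_ones = a * ones             # 17 * 4  = 68
    return partial_tens + partial_ones  # 340 + 68 = 408

print(system2_multiply(17, 24))  # 408
```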
JSavageOne over 1 year ago
I wonder why OpenAI doesn't try to get more feedback and training data from its users, though I do notice that sometimes it gives me two answers and asks me to pick the better one.

For example, I've noticed that a lot of the time when I ask ChatGPT a coding question it gets about 90% of the answer. When I tell it what to fix and/or add, it usually gets there. I wonder if they're using these refined answers to fine-tune on the original prompts.

I also wonder how the LLM interacts with other software like the calculator or the Python interpreter. It would be great if this were modular, so that the LLM OS could be more like Unix than like Windows, which is what OpenAI seems to be trying to emulate.

Ultimately, though, it seems to me like AGI is fairly straightforward from here: just train on more quality data (in particular, enable the machine to generate this training data itself), increase the parameter count, and the LLM just gets better and better. It seems like we don't even need any major new breakthroughs to create something resembling AGI.
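On the question of how an LLM talks to a calculator or interpreter, the usual pattern is a host loop: the model emits a structured tool call, the host executes it, and the result is appended to the context for the next model turn. A minimal sketch, where `ask_llm` is a hypothetical stand-in for a real model API (here hard-coded to fake one round trip so the loop is runnable):

```python
import json

# Minimal tool-use loop sketch. `ask_llm` is a hypothetical stand-in for a
# model call; it fakes a single calculator round trip for illustration.

TOOLS = {"calculator": lambda expr: str(eval(expr, {"__builtins__": {}}))}  # toy only

def ask_llm(context: str) -> str:
    if "[calculator ->" in context:          # pretend the model saw the result
        return "17 x 24 = 408."
    return json.dumps({"tool": "calculator", "input": "17 * 24"})

def run(question: str, max_steps: int = 5) -> str:
    context = question
    for _ in range(max_steps):
        reply = ask_llm(context)
        try:
            call = json.loads(reply)         # structured reply -> tool request
        except json.JSONDecodeError:
            return reply                     # plain text -> final answer
        result = TOOLS[call["tool"]](call["input"])
        context += f"\n[{call['tool']} -> {result}]"
    return context

print(run("What is 17 x 24?"))  # 17 x 24 = 408.
```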
mrtksn over 1 year ago
Karpathy has an excellent zero-to-hero series on the topic in which he explains the very core of neural networks, LLMs, and related concepts. With no background in the topic, I was able to get an idea of what this is all about and even become dangerous: https://karpathy.ai/zero-to-hero.html

There's something enlightening about hands-on learning without metaphors. He even opens the code of production-grade tools to show you exactly how the concepts he explained and built up are implemented in real life.

This is a style of teaching that clicks with me. I don't learn well from metaphors and high abstractions, and I find it magical to remove the magic from amazing things: break them down into easy-to-reason-about pieces that compose into a complex structure, so you can set the complexity aside from the core.
canada_dry over 1 year ago
Great video (as usual). Andrej has a Feynman-like way of explaining very complex topics in a succinct and digestible way.

An aside: incredibly, it looks like he recorded it in one cut from his hotel room.
anupj over 1 year ago
This was an incredibly informative talk, especially the idea of giving LLMs System 2 thinking capability. I think if LLMs can do System 2 thinking, we are one step closer to AGI. I've summarised the talk here, in case anyone wants to RAG the text for their custom GPT: https://gist.github.com/anupj/f3a778dcb26972ba72c774634a80d796 :)
jpdelatorre over 1 year ago
This video was very informative and clear for a beginner like me who is curious about AI and ML. I'd like to learn more about how to fine-tune Llama for different tasks and domains. Does anyone have recommendations for resources that explain this concept in a simple way and gradually introduce the technical details and tools required?
leobg over 1 year ago
There actually is a reward function for text that can be used to go beyond the human input data. It is plausibility: if you question the response and check it against responses to related questions, how much does it align with those?

This is what we humans do, too. It's also what we do in science. It's things not "adding up" that tell us where we must improve.
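One concrete way to approximate this is self-consistency scoring: sample several answers and score each distinct answer by how much the other samples agree with it. A minimal sketch, with `sample_llm` as a hypothetical stand-in for sampling a model at nonzero temperature:

```python
import random
from collections import Counter

# Sketch of "plausibility as a reward": score each distinct answer by the
# fraction of independent samples that agree with it. `sample_llm` is a
# hypothetical stand-in for a temperature > 0 model call.

def sample_llm(question: str) -> str:
    return random.choice(["408", "408", "408", "398"])  # toy answer distribution

def plausibility_scores(question: str, n: int = 8) -> dict[str, float]:
    answers = [sample_llm(question) for _ in range(n)]
    counts = Counter(answers)
    return {ans: c / n for ans, c in counts.items()}  # agreement fractions

print(plausibility_scores("What is 17 x 24?"))  # e.g. {'408': 0.75, '398': 0.25}
```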
tangj over 1 year ago
Upon seeing the title I thought to myself, "What's the point... Karpathy already has an unbeatable series of videos on this"... before seeing that this, too, was from him. I assume that if you've been through zero-to-hero, this goes over stuff you already know, but I will report back after watching.
adithyan_win over 1 year ago
Summary and transcript: https://www.wisdominanutshell.academy/andrej-karpathy/1hr-talk-intro-to-large-language-models/
nicognaw over 1 year ago
Andrej made the micrograd video a couple of years after the code was released. As someone who learns neural networks as a hobby, I really want to see him make videos about the llama.c project.
macrolime over 1 year ago
For the background on whatever Q* is, listen from 35:00 to 40:45.
pietz over 1 year ago
Oh no, not another video on... OMG, it's from Andrej!
gardenhedge over 1 year ago
Perfect timing! I'm going through Andrew Ng's course and the Machine Learning Guide podcast, and I have an hour free this morning to watch this :)
upupupandaway over 1 year ago
How good is this guy? It's just awe-inspiring.
nla over 1 year ago
It's Andrej, so it's must-see TV.

Highly recommend!
g-b-r over 1 year ago
Maybe the best speaker I've ever run into; amazing. There should definitely be a voice model based on him.
BOOSTERHIDROGEN over 1 year ago
I didn't know there were scary jailbreaks to manage; this is a risk we are going to face.
uptownfunk over 1 year ago
Love Karpathy's videos. It's almost like he's a stealth dev evangelist or something…
g-b-r over 1 year ago
This video gave me an "I know Kung Fu" moment.
NoobSaibot135 over 1 year ago
Here's another awesome Karpathy lecture, from Stanford: https://youtu.be/XfpMkf4rD6E?si=1_EmuYDFfi7RNEhz

That video is the best for learning attention, specifically where he explains:

Think of attention like a directed graph of vectors passing messages to each other.

Keys are what other tokens are communicating to you, queries are what you are interested in, and values are what you are projecting out yourself.

When you matrix-multiply the queries by the keys (transposed), you measure the interestingness, or affinity, between the two.
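That graph-of-messages picture maps directly onto a few lines of linear algebra. A minimal single-head self-attention sketch in NumPy (toy sizes and random weights, purely illustrative, not code from the lecture):

```python
import numpy as np

# Single-head self-attention, toy sizes. Q = what each token is looking for,
# K = what each token advertises, V = what it passes along if attended to.

T, d_model, d_head = 4, 8, 8           # 4 tokens, toy dimensions
rng = np.random.default_rng(0)
x = rng.normal(size=(T, d_model))      # token embeddings

W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Affinity between every query and every key, scaled for stability.
scores = Q @ K.T / np.sqrt(d_head)     # shape (T, T)

# Causal mask: token i may only receive messages from tokens j <= i
# (the edges of the directed graph).
mask = np.tril(np.ones((T, T), dtype=bool))
scores = np.where(mask, scores, -np.inf)

weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys

out = weights @ V                      # each token: weighted sum of values
print(out.shape)                       # (4, 8)
```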
multicast over 1 year ago
The majority (the non-technical population, journalists, and 50+ expert-dependent bureaucrats, mostly law academics who have never really worked) are shitting their pants over alleged "AI" dangers, indoctrinated by "AI" executives pushing regulation to secure their market position, or, in Google's case, because AI makes their now-shitty search business model obsolete in the long term, by creating entry barriers and thus reducing competition.

Meanwhile, a guy somewhere in Africa adjusting an answer that probably states humans can do photosynthesis: bruh