TL;DR of Deep Dive into LLMs Like ChatGPT by Andrej Karpathy

381 points by oleg_tarasov 3 months ago

13 comments

albert_e 3 months ago
OT: What is a good place to discuss the original video -- once it has dropped off the HN front page?

I am going through the video myself -- roughly halfway through -- and have a few things to bring up.

Here they are, now that we have a fresh opportunity to discuss:

1 - MATH and LLMs

I am curious why many of the examples Andrej chose to pose to the LLM were "computational" questions -- for instance "what is 2+2", or numerical puzzles that needed algebraic thinking and then some addition/subtraction/multiplication (example at 1:50 about buying apples and oranges).

I can understand that these abilities of LLMs are becoming powerful and useful too -- but in my mind these are not the "basic" abilities of a next-token predictor.

I would have appreciated a clearer distinction of prompts that showcase the core LLM ability -- generating text that is generally grammatically correct and grounded in facts and context -- without necessarily needing a working memory / assigning values to algebraic variables / doing arithmetic etc.

Are there any good references discussing the mathematical abilities of LLMs and the wisdom of trying to make them do math -- versus simply recognizing when math is needed, generating the necessary python/expressions, and letting the tools handle it?

2 - META

While Andrej briefly acknowledges the "meta" situation where LLMs are being used to create training data for, and judge the outputs of, newer LLMs... there is not much discussion of that here.

There are just many more examples of how LLMs are used to prepare mitigations for hallucinations, by preparing Q&A training sets with "correct" answers etc.

I am curious to know more about the limitations / perils of using LLMs to train/evaluate other LLMs.

I feel this is a bit like the Manhattan Project and atomic weapons -- in that early results and advances are being looped back immediately into the development of more powerful technology. (A smaller fission charge at the core of a larger fusion weapon -- to be very loose with analogies.)

<I am sure I will have a few more questions as I go through the rest of the video and digest it>
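To make point 1 above concrete: the "recognize that math is needed, emit an expression, let a tool evaluate it" pattern looks roughly like the sketch below. This is a toy illustration, not anyone's production system; in a real setup the LLM emits the expression, and the example expression here is made up.

    import ast
    import operator

    # Operators we allow the emitted expression to use.
    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def safe_eval(expr):
        """Evaluate a pure arithmetic expression without exec'ing arbitrary code."""
        def walk(node):
            if isinstance(node, ast.BinOp) and type(node.op) in OPS:
                return OPS[type(node.op)](walk(node.left), walk(node.right))
            if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
                return node.value
            raise ValueError("unsupported expression")
        return walk(ast.parse(expr, mode="eval").body)

    # e.g. for an apples-and-oranges word problem the model might emit "3*25+2*10"
    # (numbers invented for illustration); the tool, not the weights, does the math.
    print(safe_eval("3*25+2*10"))  # 95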
thomasahle 3 months ago
I find Meta's approach to hallucinations delightfully counterintuitive. Basically they (and presumably OpenAI and others):

- Extract a snippet of training data.
- Generate a factual question about it using Llama 3.
- Have Llama 3 generate an answer.
- Score the response against the original data.
- If incorrect, train the model to recognize and refuse incorrect responses.

In a way this is obvious in hindsight, but it goes against ML engineers' natural tendency when detecting a wrong answer: teaching the model the right answer. Instead of teaching the model to recognize what it doesn't know, why not teach it using those same examples? Of course the idea is to "connect the unused uncertainty neuron", which makes sense for out-of-context generalization. But we can at least appreciate why this wasn't an obvious thing to do for generation-1 LLMs.
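A minimal sketch of that loop, with toy stand-in helpers (the stubs below are for illustration only, not Meta's actual pipeline or API):

    import random

    REFUSAL = "I'm not sure; I don't have reliable information about that."

    def sample_snippet(corpus):
        return random.choice(corpus)

    def llama3_generate(prompt):
        # Stand-in for a Llama 3 call; returns a dummy string here.
        return "model output for: " + prompt.splitlines()[0]

    def score_against(answer, snippet):
        # Stand-in judge: a real pipeline scores factual agreement with the source.
        return random.random()

    def build_refusal_examples(corpus, n):
        examples = []
        for _ in range(n):
            snippet = sample_snippet(corpus)                  # 1. extract a snippet
            question = llama3_generate("Ask a factual question about:\n" + snippet)  # 2
            answer = llama3_generate(question)                # 3. model answers it
            if score_against(answer, snippet) < 0.5:          # 4. judge vs. the source
                examples.append((question, REFUSAL))          # 5. train it to refuse
        return examples

    print(build_refusal_examples(["Paris is the capital of France."], 4))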
quantumspandex 3 months ago
Andrej's video is great, but the explanation of the RL part is a bit vague to me. How exactly do we train on the right answers? Do we collect the reasoning traces and train on them like supervised learning, or do we compute some scores and use them as a loss function? Isn't the reward then very sparse? What if the LLM can't generate any right answers because the problems are too hard?

Also, how can the training of LLMs be parallelized when parameter updates are sequential? Sure, we can train on several samples simultaneously, but then the parameter updates are all with respect to the first step.
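For reference, one common recipe that matches the "collect traces and train on them" reading is rejection sampling: draw many candidate traces per problem, keep only the verified-correct ones, and fine-tune on those with the ordinary supervised loss, which is also how the sparse 0/1 reward stays workable. A toy sketch (the random guesser is a stand-in for the model; this is not necessarily what the video describes):

    import random

    def toy_sample(prompt):
        # Stand-in for sampling a full reasoning trace from the model.
        return str(random.randint(0, 10))

    def collect_correct_traces(problems, samples_per_problem=16):
        kept = []
        for prompt, answer in problems:
            for _ in range(samples_per_problem):
                trace = toy_sample(prompt)
                if trace == answer:          # sparse 0/1 reward: right final answer
                    kept.append((prompt, trace))
        return kept                          # then run ordinary SFT on these pairs

    print(collect_correct_traces([("2+2=", "4"), ("3+5=", "8")]))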
p0w3n3d 3 months ago
At around the 53-minute mark of the original video, he shows how exactly an LLM can quote the text it was trained on. I wonder how big tech convinced the courts that this is not copyright violation (especially when ChatGPT was quoting some GPL code). I can imagine the same thing happening in the opposite direction: if I trained a model to draw a Disney character, my ass would be sued in a fraction of a second.
dzogchen 3 months ago
For a model to be 'fully' open source you need more than the model itself and a way to run it. You also need the data and the program that can be used to train it.

See The Open Source AI Definition from OSI: https://opensource.org/ai
est 3 months ago
I have read many articles about LLMs and understand how they work in general, but one thing always bothers me: why didn't other models work as well as the SOTA ones? What's the history and reasoning behind the current model architecture?
khazhoux 3 months ago
I'm still seeking an answer to what DeepSeek *really is*, especially in the context of their $5M versus ChatGPT's >$1B (source: internet). What did they do versus not do?
sylware 3 months ago
It is sad to see so much attention given to LLMs compared to the other types of AI, like those doing maths (strapped to a formal solver), folding proteins, etc.

We had a talk about physics AIs using those maths AIs to design hard mathematical models to fit fundamental physics data.
miletus 3 months ago
I saw a good thread today: https://x.com/0xmetaschool/status/1888873661840634111
bluelightning2k 3 months ago
Great write-up of what is presumably a truly great lecture. Debating whether to follow the original now.
9999_points 3 months ago
It's a shame his LLM in C was just a springboard for his course.
wolfhumble 3 months ago
I haven't watched the video, but was wondering about the tokenization part from the TL;DR:

"|" "View" "ing" "Single"

Just looking at the text being tokenized in the linked article, it looked (to me) like the text was "I View", but the "I" is actually a pipe "|".

From Step 3 in the link that @miletus posted in this thread: https://x.com/0xmetaschool/status/1888873667624661455 the text being tokenized is:

|Viewing Single (Post From) . . .

The capitals used (View, Single) also make more sense when seeing this part of the sentence.
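A quick way to check splits like this yourself is the tiktoken library; a sketch below. Note that cl100k_base (GPT-4's encoder) is an assumption on my part, and the exact splits depend on which encoder the video used, so treat the output as illustrative:

    # Requires: pip install tiktoken
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    for token_id in enc.encode("|Viewing Single"):
        # Print each token id alongside the bytes it stands for.
        print(token_id, enc.decode_single_token_bytes(token_id))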
EncomLab 3 months ago
It would be great if the hardware issues were discussed more -- too little is made of the distinction between silicon-substrate, fixed-threshold, voltage-moderated, brittle networks of solid-state switches and protein-substrate, variable-threshold, chemically-moderated, plastic networks of biological switches.

To be clear, neither possesses any magical "woo" outside of physics that gives one or the other secret magical properties -- but these are not arbitrary, meaningless distinctions in the way they are often discussed.