
LLMs and GPT: Some of my favorite learning materials

280 points by rain1, about 2 years ago

10 comments

lxe · about 2 years ago
For self-directed learning, my favorite has been actually using ChatGPT (GPT-4, especially), because you can just ask as you go along. Some questions I asked:

- I have a PyTorch ML LLM GPT-style model and it has many layers, called "attention" and "feed forward". Can you explain to someone who is highly technical, understands software engineering, but isn't deeply familiar with ML terms or linear algebra what these layers are for?
- Where can I get all the jargon for this AI/ML stuff? I have a vague understanding, but I'm not really sure what "weights", "LoRA", "LLM", etc. are, or where each tool and concept fits in. Explain to a knowledgeable software engineer with limited context on ML and linear algebra.
- In the GPT/LLM world, what's pre-training vs. fine-tuning?
- In Hugging Face transformers, what's the difference between batch size and micro-batch size when training?
- How do I free CUDA memory after training using Hugging Face transformers?
- I'm launching Gradio with `demo.queue().launch()` from `main.py`. How can I allow passing command line arguments for port and share=True?
- Comment each of these arguments with explanations of what they do (this is Hugging Face transformers):

      args = transformers.TrainingArguments(
          per_device_train_batch_size=micro_batch_size,
          gradient_accumulation_steps=gradient_accumulation_steps,
          warmup_steps=100,
          # max_steps=max_steps,
          num_train_epochs=epochs,
          learning_rate=learning_rate,
          fp16=True,
          logging_steps=20,
          output_dir=output_dir,
          save_total_limit=3,
      )

- In the context of ML, can you explain with examples what "LoRA", "LLM", and "weights" are in relation to machine learning, specifically GPT-style language models?
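For a couple of those questions the answer fits in a few lines of code. The sketch below is a hedged illustration, not from the original comment: it shows how the micro-batch relates to the effective batch size via gradient accumulation, annotates the `TrainingArguments` fields the commenter pasted, and frees CUDA memory after training. It assumes transformers (and its accelerate dependency) and PyTorch are installed; all values are illustrative.

```python
import gc
import torch
import transformers

# Illustrative values, not taken from the original comment.
micro_batch_size = 4                  # examples per forward/backward pass on one GPU
gradient_accumulation_steps = 8
# The optimizer only steps after accumulating gradients, so the effective
# batch size is micro_batch_size * gradient_accumulation_steps = 32 here.

args = transformers.TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=micro_batch_size,   # the "micro" batch
    gradient_accumulation_steps=gradient_accumulation_steps,
    warmup_steps=100,                 # linear learning-rate warmup over 100 steps
    num_train_epochs=3,
    learning_rate=3e-4,
    fp16=torch.cuda.is_available(),   # mixed precision; only meaningful on a GPU
    logging_steps=20,                 # log metrics every 20 optimizer steps
    save_total_limit=3,               # keep at most 3 checkpoints on disk
)

# After training, free GPU memory: drop references to the trainer/model,
# collect garbage, then release PyTorch's cached CUDA blocks.
# del trainer, model
gc.collect()
torch.cuda.empty_cache()
```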
rain1 · about 2 years ago
I have compiled a list of some of the materials I found best for learning about how LLMs work. I hope this is useful to you all. I will continue to update it as I find new things.
curo · about 2 years ago
This is great; I've loved Karpathy's videos. Perhaps this is out of scope, but have you considered an "Applied LLM" section?

I know prompt engineering is a lesser art, but I think some of the literature on in-context learning, self-consistency, self-ask, CoT, etc. might be useful. Here's my favorite lit review on the subject (no affiliation): https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/
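As a concrete illustration of two of the techniques that review covers, here is a minimal, hedged sketch of few-shot chain-of-thought prompting with self-consistency voting. The `generate` function is a hypothetical stand-in for whatever model client you actually use; only the prompt and voting structure is the point.

```python
from collections import Counter

# Hypothetical stand-in for an LLM call (an API client or a local model);
# it just returns a canned completion so the sketch runs on its own.
def generate(prompt: str, temperature: float = 0.7) -> str:
    return "... model reasoning ... The answer is 11."

# Few-shot chain-of-thought: show one worked example with explicit reasoning,
# then ask the new question in the same format.
COT_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. How many balls does he have now?
A: Roger starts with 5 balls. 2 cans of 3 balls is 6 balls. 5 + 6 = 11. The answer is 11.

Q: {question}
A:"""

def self_consistency(question: str, n_samples: int = 5) -> str:
    """Sample several reasoning paths and majority-vote on the final answer."""
    answers = []
    for _ in range(n_samples):
        completion = generate(COT_PROMPT.format(question=question), temperature=0.7)
        # Take whatever follows the last "The answer is" as this sample's vote.
        answers.append(completion.rsplit("The answer is", 1)[-1].strip(" ."))
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("If I have 3 boxes of 4 apples, how many apples do I have?"))
```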
ducktective · about 2 years ago
Realistically speaking, how long would it take for someone with college-level linear algebra and multivariable calculus knowledge, and rusty familiarity with ML (via Andrew Ng's MATLAB course), to learn the concepts behind LLMs and the state-of-the-art algorithms behind SDs and GPTs?

Would it even make sense if one's interest is not image or text generation?
swyx · about 2 years ago
I have not prettied mine up (would take PRs if any volunteers!), but here's my equivalent repo of reading materials in case it helps: https://github.com/sw-yx/ai-notes/tree/main#top-ai-reads
deepsquirrelnet · about 2 years ago
I think one section that should be included is "interfacing with LLMs". I know most of the stuff on your list, but without ever having used an LLM. A lot of the few-shot/prompt engineering, fine-tuning methods, LoRA, 8-bit quantization stuff, etc. would be the most useful to me. Practical knowledge of how to use them or adapt them to a domain seems more scattered and harder for me to find, since it's all pretty much new.
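For what it's worth, the "interfacing" side the comment asks for is mostly glue code around transformers + peft + bitsandbytes. A rough sketch of the circa-2023 workflow follows; the model name, LoRA hyperparameters, and `load_in_8bit` flag are illustrative choices (newer library versions prefer passing a `BitsAndBytesConfig`), and it assumes those libraries plus a CUDA GPU are available.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Illustrative small model; any causal LM on the Hub works the same way.
model_name = "EleutherAI/gpt-neo-125m"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,        # 8-bit weights via bitsandbytes to cut memory use
    device_map="auto",        # place layers across available GPUs automatically
)

# LoRA: freeze the base weights and train small low-rank adapter matrices
# injected into the attention projections.
lora_config = LoraConfig(
    r=8,                               # rank of the adapter matrices
    lora_alpha=16,                     # scaling applied to the adapter output
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # module names are model-specific
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```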
sdht0 · about 2 years ago
> Ted Chiang, ChatGPT Is a Blurry JPEG of the Web

I found an interesting counter-perspective on the Mindscape podcast [0]:

> And Ted Chiang, in that article, suggests that ChatGPT and language models generally can be thought of as a kind of blurry JPEG of the web, where they get trained to compress the web and then at inference time, when you're generating a text with these models, it's a form of lossy decompression that involves interpolation in the same way. And I think it's a very interesting test case for intuitions because I think this metaphor, this analogy, parts of it are pumping the right intuitions. There is definitely a deep connection between machine learning and compression that has long been observed and studied. [...] I think comparing it to lossy image decompression is pumping the wrong intuitions because, again, that suggests that all it's doing is this kind of shallow interpolation that amounts to a form of approximate memorization, where you have memorized some parts of the data and then you are loosely interpolating what's in between. [...] The intuition is that there would be a way, presumably, to characterize what large language models and image generation models are doing when they generate images and texts as involving a form of interpolation, but this form of interpolation would be very, very different from what we might think of when we think of nearest-neighbour pixel interpolation in lossy image decompression. So different, in fact, that this analogy is very unhelpful to understand what generative models are doing because, again, instead of being analogous to brute-force memorization, there's something much more generally novel and generative about the process of inference in these models.

[0] https://www.preposterousuniverse.com/podcast/2023/03/20/230-raphael-milliere-on-how-artificial-intelligence-thinks/
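The "deep connection between machine learning and compression" in that quote can be made concrete: a model that assigns probability p to the next symbol can encode it in about -log2(p) bits with arithmetic coding, so lower average cross-entropy literally means a smaller compressed size. Below is a toy, hedged illustration with a character bigram model standing in for the language model (and ignoring the cost of shipping the model itself).

```python
import math
from collections import Counter, defaultdict

def bits_per_char(text: str) -> float:
    """Estimate compressed size under a tiny next-character predictor."""
    # "Train" a bigram model: counts of next character given previous character.
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1

    # Code length: each character costs -log2(p) bits under the model.
    total_bits = 0.0
    for prev, nxt in zip(text, text[1:]):
        ctx = counts[prev]
        # Laplace smoothing so unseen characters keep nonzero probability.
        p = (ctx[nxt] + 1) / (sum(ctx.values()) + 256)
        total_bits += -math.log2(p)
    return total_bits / (len(text) - 1)

# Repetitive text is more predictable, hence cheaper to encode.
print(bits_per_char("the cat sat on the mat, the cat sat on the mat"))
```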
ftxbro · about 2 years ago
Maybe consider https://generative.ink/posts/simulators/ for the Philosophy of GPT section. I think that one is by far the most insightful take.
abraxas · about 2 years ago
The Fast.ai YouTube course is a great intro to practical machine learning. I'm not sure if Jeremy Howard has made any lectures specifically about transformers, but the course has plenty of good practical info that's really well explained.
nborwankar · about 2 years ago
Thank you! Very useful reference.