Simple Explanation of LLMs

94 points by oedemis, 2 months ago

8 comments

A_D_E_P_T, 2 months ago
It's all prediction. Wolfram has been saying this from the beginning, I think. It hasn't changed and it won't change.

But it could be argued that the human mind is *fundamentally* similar: that consciousness is the combination of a spatial-temporal sense with a future-oriented simulating function. Generally, instead of simulating words or tokens, the biological mind simulates physical concepts. (Needless to say, if you imagine and visualize a ball thrown through the air, you have simulated a physical and mathematical concept.) One's ability to internally form a representation of the world and one's place in it, coupled with a subjective and bounded idea of self in objective space and time, results in what is effectively a general predictive function capable of broad abstraction.

A large facet of what's called "intelligence" -- perhaps the largest facet -- is the strength and extensibility of the predictive function.

I really need to finish my book on this...
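Purely as an illustration of the "it's all prediction" point above, here is a minimal sketch of the autoregressive next-token loop. It assumes the Hugging Face transformers library and the public "gpt2" checkpoint; both are illustrative choices, not anything the comment or the article specifies.

```python
# Greedy next-token decoding: the model repeatedly predicts the single most
# likely continuation, and that prediction is appended to the context.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "The ball was thrown through the air and"
ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(10):
        logits = model(ids).logits          # shape: [1, seq_len, vocab_size]
        next_id = logits[0, -1].argmax()    # most likely next token at the last position
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))
```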
oedemis, 2 months ago
Hello, I tried to explain large language models with some visualizations, especially the attention mechanism.
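For readers who want to see the attention mechanism as code rather than pictures, here is a minimal sketch of scaled dot-product attention over a toy sequence. It uses NumPy only, and the shapes and random values are purely illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # how strongly each query attends to each key
    weights = softmax(scores, axis=-1)  # each row is a distribution over positions
    return weights @ V, weights

# Toy example: 4 token positions, one 8-dimensional attention head.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = attention(Q, K, V)
print(w.round(2))  # attention weights, one row per query position
```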
antonkar, 2 months ago
Here's an interpretability idea you may find interesting:

Let's Turn an AI Model Into a Place: a project to make AI interpretability research fun and widespread by converting a multimodal language model into a place, or a game like The Sims or GTA.

Imagine that you have a giant trash pile; how do you make a language model out of it? First you remove duplicates of every item: you don't need a million banana peels, just one will suffice. Now you have a grid with one item of trash in each square, like a banana peel in one and a broken chair in another. Then you put related things close together and draw arrows between related items.

When a person "prompts" this place AI, the player themselves runs from one item to another to compute the answer to the prompt.

For example, you stand near the monkey; that's your short prompt. You see a lot of items around you and arrows toward them. The closest item is chewing lips, so you step toward them, and now your prompt is "monkey chews". The next closest item is a banana, but there are many other possibilities around, like an apple a bit farther away and an old tire far away on the horizon (monkeys rarely chew tires, so the tire is far away).

You are the time-like chooser and the language model is the space-like library, the game, the place. It's static and safe, while you're dynamic and dangerous.
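A toy sketch of that walk is below. The coordinates and vocabulary are made up purely for illustration (nothing here comes from an actual model); it only shows the greedy nearest-neighbor stepping the comment describes.

```python
# Items laid out on a flat "place"; the player repeatedly steps to the
# nearest not-yet-visited item to extend the prompt.
import math

places = {
    "monkey":   (0.0, 0.0),
    "chews":    (1.0, 0.2),
    "banana":   (2.0, 0.1),
    "apple":    (2.5, 1.5),
    "old tire": (9.0, 8.0),   # far away: monkeys rarely chew tires
}

def nearest(current, visited):
    cx, cy = places[current]
    candidates = [(math.hypot(x - cx, y - cy), word)
                  for word, (x, y) in places.items() if word not in visited]
    return min(candidates)[1]

prompt = ["monkey"]
while len(prompt) < 3:
    prompt.append(nearest(prompt[-1], set(prompt)))
print(" ".join(prompt))   # -> "monkey chews banana"
```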
DebtDeflation, 2 months ago
Would love to see a similar explanation of how "reasoning" versions of LLMs are trained. I understand that OpenAI was mum about how they specifically trained o1/o3, and that people are having to reverse-engineer from the DeepSeek paper, which may or may not be a different approach, but I would like to see a coherent explanation that is not just a regurgitation of Chain of Thought or a handwavy "special reasoning tokens give the model more time to think".
rco8786, 2 months ago
I'm not sure I would call this "simple", but I appreciated the walkthrough. I understood a lot of it at a high level before reading, and this helped solidify my understanding a bit more. Though it also serves to highlight just how complex LLMs actually are.
noodletheworld, 2 months ago
While I appreciate the pictures, at the end of the day all you really have is a glossary and slightly more detailed, arbitrary hand-waving.

What *specific* architecture is used to build a basic model?

Why is that *specific* combination of basic building blocks used?

Why does it work when other similar ones don't?

I generally approve of simplifications, but these LLM simplifications are too vague and broad to be useful or meaningful.

Here's my challenge: take that article and write an LLM.

No?

How about an article on raytracing? Anyone can do a raytracer in a weekend.

Why does building an LLM take miles of explanation of concepts and nothing concrete you can actually build?

Where's my "LLM in a weekend" that covers the theory *and* how to actually implement one?

The distinction between this and something like https://github.com/rasbt/LLMs-from-scratch is stark.

My hot take is, if you haven't built one, you don't *actually* understand how they work; you just have a vague, kind-of-heard-of-it understanding, which is not the same thing.

...maybe that's harsh, and unfair. I'll take it, maybe it is; but I've seen a lot of LLM explanations that conveniently stop before they get to the hard part of "and how do you actually do it?", and another one? Eh.
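In the spirit of the LLMs-from-scratch repo linked above, here is a minimal sketch of the concrete building block that decoder-only models repeat many times: a pre-norm transformer block. It uses PyTorch, and the dimensions are arbitrary small values chosen for illustration, not anything the article specifies.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):
        # Causal mask: each position may only attend to itself and earlier positions.
        T = x.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + a                      # residual connection around attention
        x = x + self.mlp(self.ln2(x))  # residual connection around the MLP
        return x

x = torch.randn(1, 10, 64)   # (batch, sequence length, model dimension)
print(Block()(x).shape)      # torch.Size([1, 10, 64])
```

A full model stacks many such blocks between a token-embedding layer and an output projection over the vocabulary; the rest is training loop and data.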
hegx, 2 months ago
Warning: these "fundamentals" will become obsolete faster than you can wrap your head around them.
betto, 2 months ago
Why don't you come on my podcast to explain LLMs? I would love it.

https://www.youtube.com/@CouchX-SoftwareTechexplain-k9v