From word models to world models

100 points by dimmuborgir almost 2 years ago

11 comments

cs702 almost 2 years ago

After a quick/superficial read, my understanding is that the authors:

(a) induce an LLM to take natural language inputs and generate statements in a probabilistic programming language that formally models concepts, objects, actions, etc. in a symbolic world model, drawing from a large body of research on symbolic AI that goes back to pre-deep-learning days; and

(b) perform inference using the generated formal statements, i.e., compute probability distributions over the space of possible world states that are consistent with and conditioned on the natural-language input to the LLM.

If this approach works at a larger scale, it represents a possible solution for grounding LLMs so they stop making stuff up -- an important unsolved problem.

The public repo is at https://github.com/gabegrand/world-models but the code necessary for replicating results has not been published yet.

The volume of interesting new research being done on LLMs continues to amaze me.

We sure live in interesting times!

---

PS. If any of the authors are around, please feel free to point out any errors in my understanding.
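For readers who want the shape of steps (a) and (b) in code, here is a minimal sketch in plain Python. Everything in it -- the `llm_translate` stand-in, the predicates, and the probabilities -- is invented for illustration; the paper itself targets LLM-generated probabilistic-program code, not hand-written lambdas.

```python
import random

def llm_translate(utterance):
    # Stand-in for step (a): the LLM maps natural language to a formal
    # condition over symbolic world states. Hard-coded here; in the paper
    # this would be generated probabilistic-programming code.
    if utterance == "the cup is no longer on the table":
        return lambda world: not world["cup_on_table"]
    raise NotImplementedError(utterance)

def sample_world():
    # Toy symbolic world model: a prior over possible world states.
    on_table = random.random() < 0.8
    broken = random.random() < (0.05 if on_table else 0.40)  # fell -> likelier broken
    return {"cup_on_table": on_table, "cup_broken": broken}

def infer(utterance, query, n=100_000):
    # Step (b): condition the prior on the utterance's meaning via crude
    # rejection sampling and read off a posterior probability for the query.
    condition = llm_translate(utterance)
    kept = [w for w in (sample_world() for _ in range(n)) if condition(w)]
    return sum(w[query] for w in kept) / len(kept)

print(infer("the cup is no longer on the table", "cup_broken"))  # ~0.40 vs. ~0.12 prior
```

The real system would swap the hand-written lambda for LLM-generated program code and the rejection sampler for a proper probabilistic-programming backend, but the conditioning step is the same idea.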
mjburgess almost 2 years ago

It's a surprise to see a paper actually try to solve the problem of modelling thought via language.

Nevertheless, it begins with far too many hedges:

> By scaling to even larger datasets and neural networks, LLMs appeared to learn not only the structure of language, but capacities for some kinds of thinking

There are two hypotheses for how LLMs generate apparently "thought-expressing" outputs: Hyp1 -- the model is sampling from similar text which is distributed so as to express a thought by some agent; Hyp2 -- the model has the capacity to form that thought.

It is absolutely trivial to show Hyp2 is false:

> Current LLMs can produce impressive results on a set of linguistic inputs and then fail completely on others that make trivial alterations to the same underlying domain.

Indeed: because there are no relevant prior cases to sample from in that case.

> These issues make it difficult to evaluate whether LLMs have acquired cognitive capacities such as social reasoning and theory of mind

It doesn't. It's trivial: the disproof lies one sentence above. It's just that many don't like the answer. Such *capacities* survive trivial permutations -- LLMs do not. So Hypothesis 2 is clearly *false*.
mjburgess almost 2 years ago

The level of understanding of the problem that this paper expresses is extraordinary in my reading of this field --- it's a genuinely amazing synthesis.

> How could the common-sense background knowledge needed for dynamic world model synthesis be represented, even in principle? Modern game engines may provide important clues.

This has often been my starting point in modelling the difference between a model-of-pixels vs. a world model. Any given video game session can be "replayed" by a model of its pixels: but you cannot play the game with such a model. It does not represent the causal laws of the game.

Even if you had all possible games you could not resolve between player-caused and world-caused frames.

> A key question is how to model this capability. How do minds craft bespoke world models on the fly, drawing in just enough of our knowledge about the world to answer the questions of interest?

This requires a body: the relevant missing information is causal, and the body resolves P(A|B) vs. P(A|B->A) by making bodily actions that are interpreted as necessarily causal.

In the case of video games, since we hold the controller, we resolve P(EnemyDead|EnemyHit) vs. P(EnemyDead| (ButtonPress ->) EnemyHit -> EnemyDead).
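That last distinction can be made concrete with a toy simulation (my own illustration with invented probabilities, not anything from the paper or the comment): a scripted world event confounds the hit animation and the kill, so the probability read off passive replays differs from the one obtained by actually pressing the button.

```python
import random

def frame(press=None):
    # Toy game step with a confound: a scripted world event can both show a
    # "hit" animation and kill the enemy, independently of the player.
    scripted = random.random() < 0.3
    if press is None:                      # passive replay: player acts on their own
        press = random.random() < 0.5
    hit = press or scripted
    dead = scripted or (press and random.random() < 0.5)
    return hit, dead

def p_dead_given_hit(n=200_000):
    # Observational: P(EnemyDead | EnemyHit), estimated from replayed frames.
    frames = [frame() for _ in range(n)]
    deads = [dead for hit, dead in frames if hit]
    return sum(deads) / len(deads)

def p_dead_given_do_press(n=200_000):
    # Interventional: P(EnemyDead | do(ButtonPress)) -- we hold the controller.
    return sum(frame(press=True)[1] for _ in range(n)) / n

print(round(p_dead_given_hit(), 2), round(p_dead_given_do_press(), 2))  # ~0.73 vs ~0.65
```

A model fit only to replayed frames would learn the first number; only intervention through the controller recovers the second.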
antiquark almost 2 years ago

I doubt that word models can lead to world models. To quote Yann LeCun:

"The vast majority of our knowledge, skills, and thoughts are not verbalizable. That's one reason machines will never acquire common sense solely by reading text."

https://twitter.com/ylecun/status/1368235803147649028
gibsonf1 almost 2 years ago

Unfortunately, this effort fully misses the boat. Human cognition is about concepts, not language, and that's where one must start to understand it. Language simply serializes our conceptual thinking into multiple language formats; the key is what's being serialized and how that actually works in conceptual awareness.
dimatura almost 2 years ago

This is really interesting. The title is referencing the "Language of Thought" hypothesis from early cognitive psychology, which posited that thought consisted of symbol manipulation akin to computer programs. The same idea was also behind what is often referred to as GOFAI. But the idea has largely fallen out of fashion in both psychology and AI. There's a twist here in the "probabilistic" part, and of course the surprising success of LLMs makes this a more compelling idea than it would've been only a couple of years ago. And there's also an acknowledgement of the need for some kind of sensorimotor grounding as well. Pretty cool!
ilaksh almost 2 years ago

So they are using GPT-4 to write Lisp? Or some probabilistic language that looks like Lisp.

They keep saying LLMs, but only GPT-4 can do it at that level. Although actually some of the examples were pretty basic, so I guess it really depends on the level of complexity.

I feel like this could be really useful in cases where you want some kind of auditable and machine-interpretable rationale for doing something, such as self-driving cars or military applications. Or maybe some robots. It could make it feasible to add a layer of hard rules in a way.
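One way to read the "layer of hard rules" idea -- again just a sketch, with a hypothetical plan format and made-up rules, not anything from the paper -- is that once the LLM emits a symbolic plan instead of prose, an auditable rule checker can vet it before anything acts on it:

```python
# Hypothetical symbolic-plan format and rules, purely for illustration.
ALLOWED_ACTIONS = {"slow_down", "change_lane", "stop"}

HARD_RULES = [
    lambda plan: plan["action"] in ALLOWED_ACTIONS,
    lambda plan: plan.get("speed_kph", 0) <= 50,   # invented speed cap
]

def audit(plan: dict) -> bool:
    # Accept the plan only if every hard rule passes; each check is a
    # machine-interpretable statement, so the rationale can be logged.
    return all(rule(plan) for rule in HARD_RULES)

# A structured plan an LLM might emit instead of free text.
llm_plan = {"action": "change_lane", "speed_kph": 45, "reason": "obstacle ahead"}
print(audit(llm_plan))  # True
```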
mercurialsolo almost 2 years ago

Humans come in all shapes and forms of sensory as well as cognitive abilities. Our true ability to be human comes from objectives (derived from biological and socially bound complex systems) that drive us, feedback loops (the ability to morph / affect the goals), and continuous sensory capabilities.

Reasoning is just prediction with memory towards an objective.

Once large models have these perpetual operating sensory loops with objective functions, the ability to distinguish model-powered intelligence and human-like intelligence tends to drop.
wilonth almost 2 years ago

Was excited for a moment, thought it was related to this: https://worldmodels.github.io/.

World models are meant to be for simulating environments. If this was something like testing whether a game agent with an LLM can form thoughts as it plays through some game, it would be very interesting. Maybe someone on HN can do this?
antisthenes almost 2 years ago

World modeling is impossible without sensory input.

You need constant modeling of touch/smell/vision/temperature, etc.

These senses give us an actual understanding of the physical world and drive our behavior in a way that pure language will never be able to.
sgt101 almost 2 years ago

I: hhmmppp, a paper from Tenenbaum's group, let's read.

Paper: Hi! I am 94 pages long.

I: omg...