Here's my theory:

Consider the stream of token vectors used to train and interact with a typical LLM.

Now imagine that other aspects of being human (sensory input, emotional input, physical body sensation, gut feelings, etc.) could be added as metadata to that token stream, along with some kind of attention function that amplified or diminished the importance of those signals at any given moment -- all still represented as a stream of tokens.

If an LLM could be trained on input enriched with that kind of data, the output would quite likely feel much more human than the responses we get from current LLMs.

Humans are moody. We get headaches, we feel drawn to or repulsed by others, we brood and ruminate, we find ourselves wanting to impress certain people, and some topics make us feel alive while others bore us.

Human intelligence is always colored by the human experience of obtaining it. We obviously don't obtain it by being trained on terabytes of data all at once, disconnected from bodily experience.

Seemingly we could simulate a "body" and provide it as real-time token metadata for an LLM to incorporate, and we might get more moodiness, nostalgia, ambition, etc. (a rough sketch of what that could look like is at the end of this comment).

Asking for a theory of mind is in fact committing the Cartesian error of making a mind/body distinction. What's missing with LLMs is a theory of mindbody... the similarity to spacetime is not accidental; humans often fail to unify concepts at first.

LLMs are simply time series predictors that handle massive numbers of parameters, generating sequences of tokens that (when mapped back into words) we judge as humanlike or intelligence-like. But those are patterns of logic that come from word order, which in human languages is closely tied to semantics.

It's silly to think that we humans are not abstractly representable as a probabilistic time series prediction of information. What isn't?
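To make the "metadata on the token stream" idea a bit more concrete, here's a minimal sketch of one way it could look: a simulated body state projected into the same space as the token embeddings, with a learned gate playing the role of the attention function that amplifies or diminishes it at each step. The channel names, dimensions, and gating scheme are purely illustrative assumptions, not a proposal for the one right design.

```python
# Minimal sketch (PyTorch): fuse a simulated "body state" into a token stream.
# Channel names, dimensions, and the gating scheme are hypothetical.
import torch
import torch.nn as nn

class EmbodiedEmbedding(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, n_body_channels=4):
        super().__init__()
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        # Project body channels (e.g. valence, arousal, fatigue, gut feeling)
        # into the same space as the token embeddings.
        self.body_proj = nn.Linear(n_body_channels, d_model)
        # Learned gate: decides, per token, how much the body state matters now.
        self.gate = nn.Linear(2 * d_model, 1)

    def forward(self, token_ids, body_state):
        # token_ids:  (batch, seq_len)                   -- the usual token stream
        # body_state: (batch, seq_len, n_body_channels)  -- per-step "metadata"
        tok = self.tok_embed(token_ids)    # (batch, seq, d_model)
        body = self.body_proj(body_state)  # (batch, seq, d_model)
        g = torch.sigmoid(self.gate(torch.cat([tok, body], dim=-1)))  # (batch, seq, 1)
        # Amplify or diminish the body signal per time step, then mix it in.
        return tok + g * body

# Usage: the fused embeddings would feed a standard transformer decoder.
emb = EmbodiedEmbedding()
ids = torch.randint(0, 32000, (1, 16))
body = torch.rand(1, 16, 4)   # simulated interoceptive readings per token
x = emb(ids, body)            # (1, 16, 512), ready for attention layers
```

In practice you'd feed the fused embeddings into an otherwise ordinary transformer and train end to end, so the model learns for itself when the "gut feeling" channel should matter and when it should be ignored.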