
Let's try to understand AI monosemanticity

350 points by bananaflag, over 1 year ago

25 comments

lukev, over 1 year ago
There's actually a somewhat reasonable analogy to human cognitive processes here, I think, in the sense that humans tend to form concepts defined by their connectivity to other concepts (cf. Ferdinand de Saussure and structuralism).

Human brains are also a "black box" in the sense that you can't scan or dissect one to build a concept graph.

Neural nets do seem to have some sort of emergent structural concept graph; in the case of LLMs it's largely informed by human language (because that's what they're trained on). To an extent, we can observe this empirically through their output even if the first principles are opaque.
_as_text, over 1 year ago
I just skimmed through it for now, but it has seemed kinda natural to me for a few months now that there would be a deep connection between neural networks and differential or algebraic geometry.

Each ReLU layer is just a (quasi-)linear transformation, and a pass through two layers is basically also a linear transformation. If you say you want some piece of information to stay (numerically) intact as it passes through the network, you say you want that piece of information to be processed in the same way in each layer. The groups of linear transformations that "all process information in the same way, and their compositions do as well" are basically the Lie groups. Anyone else ever had this thought?

I imagine if nothing catastrophic happens we'll have a really beautiful theory of all this someday, which I won't create, but maybe I'll be able to understand it after a lot of hard work.
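A minimal NumPy sketch of the composition point above (the dimensions and random weights are invented purely for illustration): without the ReLU, two stacked layers collapse into a single linear map; with it, the map is only linear within each region where the ReLU's on/off pattern stays fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
x = rng.normal(size=4)

# Two purely linear layers compose into one linear map: W2 @ (W1 @ x) == (W2 @ W1) @ x
composed = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x
print(np.allclose(composed, collapsed))  # True

# Insert a ReLU and the collapse no longer holds in general: the network is only
# piecewise linear, one linear map per ReLU activation pattern.
relu = lambda v: np.maximum(v, 0.0)
with_relu = W2 @ relu(W1 @ x)
print(np.allclose(with_relu, collapsed))  # almost certainly False for random weights
```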
dmd, over 1 year ago
The idea of a smaller NN outperforming what you might think it could do by simulating a larger one reminds me of something I read about Portia spiders (going on a deep dive into them after reading the excellent 'Children of Time' by Adrian Tchaikovsky). The idea is that they're able to do things with their handful of neurons, on the order of tens of thousands, that you'd think would require 4 or 5 orders of magnitude more, by basically time-sharing them: do some computation, store it somehow, then reuse the same neurons in a totally different way.
DonsDiscountGas, over 1 year ago
Isn't this also (just?) a description of how high-dimensional embedding spaces work? Putting every kind of concept all in the same space is going to lead to some weird stuff. Different regions of the latent space will cover different concepts, with very uneven volumes, and local distances will generally be meaningful (red vs. green) but long distances won't (red vs. ennui).

I guess we could also look at it the other way: embedding spaces work this way because the underlying neurons work this way.
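A toy illustration of the local-vs-long-distance point, with hand-made vectors standing in for real embeddings (the numbers are invented, not taken from any actual model):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, ~0 means unrelated directions.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented 4-d "embeddings": related concepts share directions, unrelated ones don't.
red   = np.array([0.9, 0.1, 0.0, 0.0])
green = np.array([0.8, 0.3, 0.0, 0.0])
ennui = np.array([0.0, 0.0, 0.7, 0.6])

print(cosine(red, green))  # high: a meaningful local comparison
print(cosine(red, ennui))  # near zero: tells you little beyond "unrelated"
```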
erikerikson, over 1 year ago
Before finishing my read, I need to register an objection to the opening, which reads to me as implying it is the only means:

> Researchers simulate a weird type of pseudo-neural-tissue, "reward" it a little every time it becomes a little more like the AI they want, and eventually it becomes the AI they want.

This isn't the only way. Backpropagation is a hack around the oversimplification of neural models. By adding a sense of location into the network, you get linearly inseparable functions learned just fine.

Hopfield networks with Hebbian learning are sufficient and are implemented by the existing proofs of concept we have.
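For readers unfamiliar with the reference, a bare-bones Hopfield network trained with a Hebbian rule looks roughly like this (a toy sketch with invented patterns, not the proofs of concept the comment refers to):

```python
import numpy as np

def hebbian_train(patterns):
    """Hebbian rule: strengthen weights between units that are active together."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:              # patterns are vectors of +1/-1
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)          # no self-connections
    return W / len(patterns)

def recall(W, state, steps=5):
    """Repeatedly update units toward a stored pattern (synchronous updates for brevity)."""
    for _ in range(steps):
        state = np.sign(W @ state)
        state[state == 0] = 1
    return state

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]])
W = hebbian_train(patterns)
noisy = np.array([1, -1, 1, -1, 1, 1])   # corrupted copy of the first pattern
print(recall(W, noisy))                  # settles back onto [1, -1, 1, -1, 1, -1]
```

Running it, the corrupted input settles back onto the stored pattern, which is the associative-recall behavior being invoked; note there is no backpropagation anywhere in the loop.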
gmuslera, over 1 year ago
At least the first part reminded me of Hyperion and how AIs evolved there (I think the actual explanation is in The Fall of Hyperion): smaller but more interconnected "code".

Not sure about the actual implementation, but at least for us, concepts or words are not pure nor isolated; they have multiple meanings that collapse into specific ones as you put several together.
error9348, over 1 year ago
> No one knows how it works. Researchers simulate a weird type of pseudo-neural-tissue, "reward" it a little every time it becomes a little more like the AI they want, and eventually it becomes the AI they want.

There is a distinction to be made in "knowing how it works" between the architecture and the weights themselves.
nl, over 1 year ago
Personally I find the original paper much better written and easier to understand: https://transformer-circuits.pub/2023/monosemantic-features/index.html
turtleyacht, over 1 year ago
By the same token, thinking in memes all the time may be a form of impoverished cognition.

Or is it enhanced cognition, on the part of the interpreter having to unpack much from little?
shermantanktop, over 1 year ago
As described in the post, this seems quite analogous to the operation of a Bloom filter, except each "bit" carries more than a single bit's worth of information, and the match detection has to do some thresholding/ranking to select a winner.

That said, the post is itself clearly summarizing much more technical work, so my analogy is resting on shaky ground.
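For comparison, a bare-bones version of the standard Bloom filter the analogy refers to (a sketch with arbitrary parameters, not anything from the article):

```python
import hashlib

class BloomFilter:
    """Standard Bloom filter: k hash functions set/check bits in one shared bit array.
    Many items superimpose onto the same bits, which is the analogy being drawn."""
    def __init__(self, size=1024, num_hashes=3):
        self.size, self.num_hashes, self.bits = size, num_hashes, bytearray(size)

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # Can return a false positive, never a false negative.
        return all(self.bits[pos] for pos in self._positions(item))

bf = BloomFilter()
bf.add("feature_A")
print(bf.might_contain("feature_A"))  # True
print(bf.might_contain("feature_B"))  # False with high probability
```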
daveguy, over 1 year ago
All this anthropomorphizing of activation networks strikes me as very odd. None of these neurons "want" to do anything. They respond to specific input. Maybe humans are the same, but in the case of artificial neural networks we at least know it's a simple mathematical function. Also, an artificial neuron is nothing like a biological neuron. At the most basic level, artificial neurons don't "fire" except in direct response to inputs. Biological neurons fire because of their internal state, state which is modified by biological signaling chemicals. It's like comparing apples to gorillas.
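The "simple mathematical function" in question is just a weighted sum passed through a nonlinearity; a minimal stateless sketch (weights and inputs invented for the example):

```python
import numpy as np

def artificial_neuron(inputs, weights, bias):
    # Stateless: the output depends only on the current inputs; nothing persists between calls.
    return max(0.0, float(np.dot(weights, inputs) + bias))  # ReLU activation

x = np.array([0.2, -1.0, 0.5])
w = np.array([0.7, 0.1, -0.3])
print(artificial_neuron(x, w, bias=0.2))  # same inputs always produce the same output
```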
s1gnp0st, over 1 year ago
> Shouldn't the AI be keeping the concept of God, Almighty Creator and Lord of the Universe, separate from God-

This seems wrong. God-zilla is using the concept of God as a superlative modifier. I would expect a neuron involved in the concept of godhood to activate whenever any metaphorical "god-of-X" concept is being used.
csours, over 1 year ago
I feel like we're a few more paradigm shifts away from self-driving cars, and this is one of them: being able to actually understand neural nets and modify them in a constructive way more directly, aka engineering.

Some more:

    cheaper sensors (happening now)
    better sensor integration (happening now, kind of)
    better tools for ML grokking and intermediate engineering (this article, kind of)
    better tools for layering ML (probably the same thing as above)
    a new model for insurance/responsibility/something like this (unsure)
    better communication with people inside and outside the car (barely on the radar)
Nelkins, over 1 year ago
This reminds me of research done on category-specific semantic deficits, where there can be neurodegeneration that impacts highly specific or related knowledge (for example, brain trauma that affects a person's ability to understand living things like zebras and carrots, but not non-living things like helicopters or pliers).

https://academic.oup.com/brain/article/130/4/1127/278057
kevdozer1, over 1 year ago
Such a wonderful article, really enjoyed reading it.
Merrill, over 1 year ago
When LLMs are trained on text, are the words annotated to indicate their semantic meaning, or is the LLM training process expected to disambiguate the possibly hundreds of semantic meanings of an individual common word such as "run"?
pas, over 1 year ago
See https://www.nature.com/articles/nature12160 on "mixed selectivity neurons".
adamnemecek, over 1 year ago
Superposition makes sense when you understand all of ML as a convolution.
chrissnow2023, over 1 year ago
COOL~ Interpreting a big AI with a bigger one is like interpreting 42 with Earth~
drc500free, over 1 year ago
The original is quite long, but quite interesting.[1] Reading it makes me feel like I did reading A Brief History of Time as a middle schooler: concepts that are mainly just out of reach, with a few flashes that I actually understand.

One particularly interesting topic is the "theories of superposition" section, which gets into how LLMs categorize concepts. Are concepts all distinct or indistinct? Are they independent or do they cluster? It seems that the answer is all of the above.

This ties into linguistic theories of categorization[2] that I saw referenced in (of all places) a book about the partition of Judaeo-Christianity in the first centuries CE.

Some categories have hard lines: something is a "bird" or it is not. Some categories have soft lines, like someone being "tall." Some categories work on prototypes, giving them different intensities within the space: a sparrow, swallow, or robin is more "birdy" than a chicken, emu, or turkey. Apparently Wittgenstein, with his Family Resemblances, was the first to really explore the idea that a category might not have hard boundaries, according to people who study these things.[3] These sorts of "manifolds" seem to appear, where some concepts are not just distinct points that are or aren't.

It's exciting to see that LLMs may give us insights into how our brains store concepts. I've heard people criticize them as "just predicting the next most likely token," but I've found myself lost when speaking in the middle of a garden path sentence many times. I don't know how a sentence will end before I start saying it, and it's certainly plausible that LLMs actually do match the way we speak.

Probably the most exciting piece is seeing how close they seem to get to mimicking how we communicate and think, while being fully limited to language with no other modeling behind it: no concept of the physical world, no understanding of counting or math, just words. It's clear when you scratch the surface that LLM outputs are bullshit with no thought underneath them, but it's amazing how much is covered by linking concepts with no logic other than how you've heard them linked before.

[1] https://transformer-circuits.pub/2023/monosemantic-features/index.html

[2] https://www.sciencedirect.com/science/article/abs/pii/001002857690013X

[3] https://en.wikipedia.org/wiki/Family_resemblance
asylteltine, over 1 year ago
I've never understood the hidden-layers argument. Ultimately these models are executing code. You can examine the code. Why can't that be done?
brindlejim, over 1 year ago
This is egregiously false. ML models are math and code. They are not tissue; there are no neurons in an ML model. None. The analogy is false, and it is used here to make you think that ML models are more human than they are, or more superhuman. There is no "bigger AI waiting to get out". Shoggoth, anyone? These are scare tactics from hyperventilating EAs (both Scott and the Anthropic team).
keithalewis, over 1 year ago
[Disclaimer: after talking to many people much smarter than me, I might, just barely, sort of understand this. Any mistakes below are my own.]

Thanks for the heads up! AI is better at making up stories than humans already. Hodl my beer while I go buy some BSitCoins.
lngnmn2, over 1 year ago
Every training set will produce a different set of weights; even the same training set will produce different weights with a different initialization, let alone slightly different architectures.

So what exactly is the point, except "look at us, we are so clever"?
ggm, over 1 year ago
As long as you read it with the skeptic's "yes, but it's not intelligence," it's a good read.

It's when you read it with the "at last, I can understand how reasoning and inference with meaning is going to emerge from this" that you have a problem.

It's a great read, but what you bring to it informs what you take from it.