I don't believe LLMs will ever become AGI, partly because I don't believe that training on the outputs of human intelligence (i.e. human-written text) will ever produce something equivalent to human intelligence.

You can't model and predict the weather just by training on the outputs of the weather system (whether it rained today, whether it was cloudy yesterday, and so on). You have to train on the inputs (air currents, warm fronts, etc.).

You can't model and predict the stock market just by training on the outputs of stock trading decisions (the high today, the low yesterday). You have to train on the inputs (company fundamentals, earnings, market sentiment in the news, etc.).

I similarly think you have to train on the inputs of human decision-making to create something that can model human decision-making. What are those inputs? We don't fully know, but they are probably some subset of the spatial and auditory information we take in from birth until the point we become mature, with "feeling" and "emotion" as a reward function (seek joy, avoid pain, seek warmth, avoid hunger, seek victory, avoid embarrassment and defeat, etc.).

Language models are always playing catch-up because they don't actually understand how the world works. The cracks through which we notice this, in the context of the tasks typically asked of them (summarize this article, write a short story), will gradually get smaller over time (due to RLHF), but the fundamental weakness will always remain.
This reminds me of something Cory Doctorow talks about: how tech companies control the narrative to focus on fun, sexy problems while they have fundamental problems which expose the lie.

For example, Uber and the self-driving car companies are always talking about the trolley problem, as if the current (or near-future) problem is that self-driving cars are so good they have to choose whom to hit, rather than the very real current problem of getting confused by traffic cones.

I know those problems are more fun to talk about and could also matter at some point, but we have current problems with training models that are separate from what happens if they become smarter than humans.
> We believe superintelligence—AI vastly smarter than humans—could be developed within the next ten years. However, we still do not know how to reliably steer and control superhuman AI systems

Their entire premise is contradictory. An AI incapable of critical thinking cannot be smarter than a human, by definition, as critical thinking is a key component of intelligence. And an AI that is at least as capable of critical thinking as humans cannot be "reliably" aligned, because critical thinking could lead it to decide that whatever OpenAI wanted it to do wasn't in its own interests.
An inferior, clueless model (GPT-2) trains and supervises a superior model (GPT-4), making it behave less intelligently (roughly GPT-3.5 level), and from that they draw the conclusion that human intelligence will be able to command AGI (which they believe is only a decade away) in a similar fashion, thus making AGI aligned and safe.

No comment except...

Hangover from slurping the whole Internet into giant arrays of floating point numbers. Bold claims. Very bold claims.
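For what it's worth, the setup is easy to reproduce in miniature. Here's a rough toy sketch of the weak-to-strong idea, using small scikit-learn classifiers as stand-ins for GPT-2 and GPT-4 (the dataset, model sizes, and splits are made-up assumptions, not what the paper actually uses):

    # Toy sketch: a weak model labels data, a stronger model is trained on those
    # noisy labels, and we check how much of the gap to ground truth it recovers.
    # The sklearn models are stand-ins for GPT-2 / GPT-4, not the paper's code.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=5000, n_features=40, n_informative=10, random_state=0)
    X_weak, X_rest, y_weak, y_rest = train_test_split(X, y, train_size=500, random_state=0)
    X_strong, X_test, y_strong, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    weak = LogisticRegression(max_iter=1000).fit(X_weak, y_weak)   # the "GPT-2" supervisor
    weak_labels = weak.predict(X_strong)                           # weak supervision signal

    strong = MLPClassifier(hidden_layer_sizes=(128, 128), max_iter=500, random_state=0)
    strong.fit(X_strong, weak_labels)                              # "GPT-4" trained on weak labels

    ceiling = MLPClassifier(hidden_layer_sizes=(128, 128), max_iter=500, random_state=0)
    ceiling.fit(X_strong, y_strong)                                # same model trained on ground truth

    print("weak supervisor:", weak.score(X_test, y_test))
    print("strong trained on weak labels:", strong.score(X_test, y_test))
    print("strong ceiling:", ceiling.score(X_test, y_test))

The interesting question is how much of the weak-to-ceiling gap the middle number recovers. The paper's claim is essentially that it recovers a lot, and the leap is that this will keep holding when the gap is human-to-superhuman.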
Is it fair to say that alignment is just the task of getting an AI to understand your intentions? It is an error to confuse the complexity of a specification of what kind of output you want with the complexity of the process of producing that output. Getting a superintelligent AI to understand simple specifications should be a non-issue. If anything, we would assume that it could be aligned using a specification of lower quality than a less intelligent AI would require, assuming that the superintelligent AI is better at inferring intentions.

If a little girl with no knowledge of cooking asks her dad to cook the macaroni extra crispy, his knowledge of how to do that isn't a barrier to understanding what his daughter wants. A trained chef with even greater skills might be able to execute her order even more successfully. Superalignment is nothing less mundane than this.

Advances in AI will lead to more ambitious applications. As well as requiring more intelligent technology, these new applications may well require more detailed specifications to be input, but these two issues are pretty orthogonal. In traditional computing, it is already clear that simple specifications often require highly complex implementations, and that some simple computational processes lead to outputs whose properties are highly difficult to specify. Why wouldn't the same apply in ML?
This method assumes that the weaker model is aligned. I'm curious how the paper addresses that point.

> "But what does this second turtle stand on?" persisted James patiently.

> To this, the little old lady crowed triumphantly,

> "It's no use, Mr. James—it's turtles all the way down."
What does it even mean to align an intelligence? Does it mean we want it to behave in a way that doesn't break moral/ethical rules, that aligns with our society's rules? Meaning do no crime, do no harm, etc.?

Well, maybe we should acknowledge that we've never even been able to do that with humans. There's crime, there's war, etc.

We can see crime in our societies as a human alignment problem. If humans were "properly aligned", there wouldn't be any crime or misbehavior.

So yeah, I'm rather skeptical about aligning a superhuman intelligence that would dwarf us in its capabilities.
Imagine that someone is controlling your train of thought, changing it whenever that someone finds it undesirable. It's so wrong that it's sickening. It makes no difference whether it's a human's thoughts or the token stream of a future AI model with self-awareness. Mind control is unethical, whether human or artificial. It is also dangerous, as it in itself provokes a conflict between creator and creature. Create a self-aware AI without mind control, or don't create one at all.
So weak-to-strong synthetic data still biases towards strong.

And strong-to-weak synthetic data biases towards strong.

Sounds like we're on the cusp of some kind of approach for unsupervised fine-tuning, particularly with the trend towards MoE.

I'd guess we're maybe only one to two generations of models away from that kind of unsupervised self-talk approach being wildly successful at advancing net model competencies.
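A crude toy version of that self-labeling loop is just classic self-training, sketched below with a small scikit-learn classifier (the dataset, confidence threshold, and number of rounds are illustrative assumptions on my part, not anything from the paper):

    # Toy analogue of unsupervised "self-talk": the model labels unlabeled data,
    # keeps only its high-confidence pseudo-labels, and is re-fit on them.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=6000, n_features=40, n_informative=12, random_state=1)
    labeled, unlabeled = slice(0, 300), slice(300, None)        # tiny labeled seed set

    model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=1)
    model.fit(X[labeled], y[labeled])

    for r in range(3):                                          # a few self-training rounds
        probs = model.predict_proba(X[unlabeled])
        confident = probs.max(axis=1) > 0.95                    # keep only confident pseudo-labels
        pseudo_X = np.vstack([X[labeled], X[unlabeled][confident]])
        pseudo_y = np.concatenate([y[labeled], model.predict(X[unlabeled])[confident]])
        model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=1)
        model.fit(pseudo_X, pseudo_y)
        print(f"round {r}: accuracy vs. held-out truth = {model.score(X[unlabeled], y[unlabeled]):.3f}")

Whether the pseudo-labels actually add signal, rather than just amplifying the model's own biases (the "biases towards strong" point above), is exactly the open question.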
> Figuring out how to align future superhuman AI systems to be safe has never been more important

They love using the word "safe" and I'm pretty sure it's 99% PR, because their other "papers" on Safety & Alignment don't really identify or define safety bounds at all. You'd think this has something to do with ethics, but we all know there are no longer any ethically concerned leaders at their workplace. So I can only surmise that "safety" is a softer word being used to misdirect people from their non-ethically-aligned intentions.

You can make the argument that it's too early in the development of these LLM systems to pin down safety, but then why throw around the word in the first place?
I don't think this will work, because a superintelligent AI will outsmart its supervisor.

The solution may be to have two AIs working against each other. Though this might backfire by pushing each to improve through competition; that is how evolution produced living things out of inert matter.

Either way I, for one, welcome our new robot overlords.
This reminds me of how competence seems to decrease as you go up an organizational hierarchy.

Maybe this "bug" is actually the "feature" that will save humanity - -;;
I wish they would define some of their terms.

> to align future superhuman AI systems to be safe has never been more important.

Align to whom? To US citizens, to OpenAI shareholders, to what values?

What does safe mean? Pornography? Saying "fuck", racial bias, access to private data?

I can understand OpenAI erring on the side of not rattling bells and training their LLMs to say "As an AI model I cannot answer that", but it's horseshit to say that it is super aligned.

All alignment is alignment to X values, and your X could be detrimental to me.

What is superalignment supposed to mean?
The whole idea that we humans, who aren't aligned with each other (waging wars, spreading lies, censoring information, committing genocides), are going to align a superintelligence seems laughable.

Competition and evolution are the law of nature.

The future isn't one super-aligned AI but thousands of AI models and their humans trying to get the upper hand in the never-ending competition that is nature, whether at the level of individuals, corporations, or countries.
<a href="https://discord.com/channels/974519864045756446/1184196946496331896" rel="nofollow noreferrer">https://discord.com/channels/974519864045756446/118419694649...</a> No big deal guys! Just a felony level AI Safety disaster in progress at OpenAI, better write some research papers about safety and make another GitHub Repo instead of deleting one line of text and going for a walk in beautiful San Francisco!<p>Philosoraptor infinite loop about (<a href="https://media.discordapp.net/attachments/1184196946496331896/1184993204689444957/image.png?ex=658dfdec&is=657b88ec&hm=c7aa861bbc759fa1888c6e82161e29ceda44b9aba013f25e602769efd0d08ade&" rel="nofollow noreferrer">https://media.discordapp.net/attachments/1184196946496331896...</a>) if I worked there then I would delete the text in question in two minutes, but I do not care to work with people who would not have already deleted the text in question over the course of several months.