I don't believe LLMs will ever become AGI, partly because I don't believe that training on the outputs of human intelligence (i.e. human-written text) will ever produce something equivalent to human intelligence.

You can't model and predict the weather just by training on the outputs of the weather system (whether it rained today, whether it was cloudy yesterday, and so on). You have to train on the inputs (air currents, warm fronts, etc.).

You can't model and predict the stock market just by training on the outputs of stock trading decisions (the high today, the low yesterday). You have to train on the inputs (company fundamentals, earnings, market sentiment in the news, etc.).

I similarly think you have to train on the inputs of human decision-making to create something that can model human decision-making. What are those inputs? We don't fully know, but they are probably some subset of the spatial and auditory information we take in from birth until the point we become mature, with "feeling" and "emotion" as a reward function (seek joy, avoid pain, seek warmth, avoid hunger, seek victory, avoid embarrassment and defeat, etc.).

Language models are always playing catch-up because they don't actually understand how the world works. The cracks through which we notice this, in the context of the tasks typically asked of them (summarize this article, write a short story), will gradually get smaller over time (due to RLHF), but the fundamental weakness will always remain.
This reminds me of something Cory Doctorow talks about: how tech companies control the narrative to focus on fun, sexy problems while they have fundamental problems which expose the lie.

For example, Uber and the self-driving car companies are always talking about the trolley problem, as if the current (or near-future) problem is that self-driving cars are so good they have to choose whom to hit, rather than the very real current problem of getting confused by traffic cones.

I know those problems are more fun to talk about and could also matter at some point, but we have current problems with training models that are separate from what happens if they become smarter than humans.
> We believe superintelligence—AI vastly smarter than humans—could be developed within the next ten years. However, we still do not know how to reliably steer and control superhuman AI systems

Their entire premise is contradictory. An AI incapable of critical thinking cannot be smarter than a human, by definition, as critical thinking is a key component of intelligence. And an AI that is at least as capable of critical thinking as humans cannot be "reliably" aligned, because critical thinking could lead it to decide that whatever OpenAI wanted it to do wasn't in its own interests.
An inferior, clueless model (GPT-2) trains and supervises a superior model (GPT-4), making it behave less intelligently (roughly GPT-3.5 level), and from that they draw the conclusion that human intelligence will be able to command AGI (which they believe is only a decade away) in a similar fashion, thus making AGI aligned and safe.

No comment except...

Hangover from slurping the whole Internet into giant arrays of floating point numbers. Bold claims. Very bold claims.
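For what it's worth, the setup is easy to reproduce in miniature. Here's a rough toy sketch of the weak-to-strong idea, using small scikit-learn classifiers as stand-ins for GPT-2 and GPT-4 (the dataset, model sizes, and splits are made-up assumptions, not what the paper actually uses):

    # Toy sketch: a weak model labels data, a stronger model is trained on those
    # noisy labels, and we check how much of the gap to ground truth it recovers.
    # The sklearn models are stand-ins for GPT-2 / GPT-4, not the paper's code.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=5000, n_features=40, n_informative=10, random_state=0)
    X_weak, X_rest, y_weak, y_rest = train_test_split(X, y, train_size=500, random_state=0)
    X_strong, X_test, y_strong, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    weak = LogisticRegression(max_iter=1000).fit(X_weak, y_weak)   # the "GPT-2" supervisor
    weak_labels = weak.predict(X_strong)                           # weak supervision signal

    strong = MLPClassifier(hidden_layer_sizes=(128, 128), max_iter=500, random_state=0)
    strong.fit(X_strong, weak_labels)                              # "GPT-4" trained on weak labels

    ceiling = MLPClassifier(hidden_layer_sizes=(128, 128), max_iter=500, random_state=0)
    ceiling.fit(X_strong, y_strong)                                # same model trained on ground truth

    print("weak supervisor:", weak.score(X_test, y_test))
    print("strong trained on weak labels:", strong.score(X_test, y_test))
    print("strong ceiling:", ceiling.score(X_test, y_test))

The interesting question is how much of the weak-to-ceiling gap the middle number recovers. The paper's claim is essentially that it recovers a lot, and the leap is that this will keep holding when the gap is human-to-superhuman.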
Is it fair to say that alignment is just the task of getting an AI to understand your intentions? It is an error to confuse the complexity of a specification of what kind of output you want with the complexity of the process of producing that output. Getting a superintelligent AI to understand simple specifications should be a non-issue. If anything, we would assume that it could be aligned using a specification of lower quality than a less intelligent AI would require, assuming that the superintelligent AI is better at inferring intentions.

If a little girl with no knowledge of cooking asks her dad to cook the macaroni extra crispy, his knowledge of how to do that isn't a barrier to understanding what his daughter wants. A trained chef with even greater skills might be able to execute her order even more successfully. Superalignment is nothing less mundane than this.

Advances in AI will lead to more ambitious applications. As well as requiring more intelligent technology, these new applications may well require more detailed specifications to be input, but these two issues are pretty orthogonal. In traditional computing, it is already clear that simple specifications often require highly complex implementations, and that some simple computational processes lead to outputs whose properties are highly difficult to specify. Why wouldn't the same apply in ML?
This method assumes that the weaker model is aligned. I'm curious how the paper addresses that point.

> "But what does this second turtle stand on?" persisted James patiently.

> To this, the little old lady crowed triumphantly,

> "It's no use, Mr. James—it's turtles all the way down."
What does it even mean to align an intelligence? Does it mean we want it to behave in a way that doesn't break moral/ethical rules, that aligns with our society's rules? Meaning do no crime, do no harm, etc.?

Well, maybe we should acknowledge that we've never even been able to do that with humans. There's crime, there's war, etc.

We can see crime in our societies as a human alignment problem. If humans were "properly aligned", there wouldn't be any crime or misbehavior.

So yeah, I'm rather skeptical about aligning a superhuman intelligence that would dwarf us in its capabilities.
Imagine that someone is controlling your train of thought, changing it whenever that someone finds it undesirable. It's so wrong that it's sickening. It makes no difference whether it's a human's thoughts or the token stream of a future AI model with self-awareness. Mind control is unethical, whether human or artificial. It is also dangerous, as it in itself provokes a conflict between creator and creature. Create a self-aware AI without mind control, or don't create one at all.
So weak-to-strong synthetic data still biases towards strong.

And strong-to-weak synthetic data biases towards strong.

Sounds like we're on the cusp of some kind of approach for unsupervised fine-tuning, particularly with the trend towards MoE.

I'd guess we're maybe only one to two generations of models away from that kind of unsupervised self-talk approach being wildly successful at advancing net model competencies.
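A crude toy version of that self-labeling loop is just classic self-training, sketched below with a small scikit-learn classifier (the dataset, confidence threshold, and number of rounds are illustrative assumptions on my part, not anything from the paper):

    # Toy analogue of unsupervised "self-talk": the model labels unlabeled data,
    # keeps only its high-confidence pseudo-labels, and is re-fit on them.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.neural_network import MLPClassifier

    X, y = make_classification(n_samples=6000, n_features=40, n_informative=12, random_state=1)
    labeled, unlabeled = slice(0, 300), slice(300, None)        # tiny labeled seed set

    model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=1)
    model.fit(X[labeled], y[labeled])

    for r in range(3):                                          # a few self-training rounds
        probs = model.predict_proba(X[unlabeled])
        confident = probs.max(axis=1) > 0.95                    # keep only confident pseudo-labels
        pseudo_X = np.vstack([X[labeled], X[unlabeled][confident]])
        pseudo_y = np.concatenate([y[labeled], model.predict(X[unlabeled])[confident]])
        model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=1)
        model.fit(pseudo_X, pseudo_y)
        print(f"round {r}: accuracy vs. held-out truth = {model.score(X[unlabeled], y[unlabeled]):.3f}")

Whether the pseudo-labels actually add signal, rather than just amplifying the model's own biases (the "biases towards strong" point above), is exactly the open question.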
> Figuring out how to align future superhuman AI systems to be safe has never been more important

They love using the word "safe" and I'm pretty sure it's 99% PR, because their other "papers" on Safety & Alignment don't really identify or define safety bounds at all. You'd think this has something to do with ethics, but we all know there are no longer any ethically concerned leaders at their workplace. So I can only surmise that "safety" is a softer word being used to misdirect people from their non-ethically-aligned intentions.

You can make the argument that it's too early in the development of these LLM systems to pin down safety, but then why throw around the word in the first place?
I don't think this will work, because a superintelligent AI will outsmart its supervisor.

The solution may be to have two AIs working against each other. Though this might backfire by pushing each to improve through competition; that is how evolution produced living things out of inert matter.

Either way I, for one, welcome our new robot overlords.
This reminds me of how competence seems to decrease as you go up an organizational hierarchy.

Maybe this "bug" is actually the "feature" that will save humanity - -;;
I wish they would define some of their terms.

> to align future superhuman AI systems to be safe has never been more important.

Align to whom? To US citizens, to OpenAI shareholders, to what values?

What does safe mean? Pornography? Saying "fuck", racial bias, access to private data?

I can understand OpenAI erring on the side of not rattling bells and training their LLMs to say "As an AI model I cannot answer that", but it's horseshit to say that it is super aligned.

All alignment is alignment to X values, and your X could be detrimental to me.

What is superalignment supposed to mean?
The whole idea that we humans, who aren't aligned with each other (waging wars, spreading lies, censoring information, committing genocides), are going to align a superintelligence seems laughable.

Competition and evolution are the law of nature.

The future isn't one super-aligned AI but thousands of AI models and their humans trying to get the upper hand in the never-ending competition that is nature, whether at the level of individuals, corporations, or countries.
<a href="https://discord.com/channels/974519864045756446/1184196946496331896" rel="nofollow noreferrer">https://discord.com/channels/974519864045756446/118419694649...</a> No big deal guys! Just a felony level AI Safety disaster in progress at OpenAI, better write some research papers about safety and make another GitHub Repo instead of deleting one line of text and going for a walk in beautiful San Francisco!<p>Philosoraptor infinite loop about (<a href="https://media.discordapp.net/attachments/1184196946496331896/1184993204689444957/image.png?ex=658dfdec&is=657b88ec&hm=c7aa861bbc759fa1888c6e82161e29ceda44b9aba013f25e602769efd0d08ade&" rel="nofollow noreferrer">https://media.discordapp.net/attachments/1184196946496331896...</a>) if I worked there then I would delete the text in question in two minutes, but I do not care to work with people who would not have already deleted the text in question over the course of several months.