It's interesting to see all this hard work being done specifically for "fact-fixing" <i>inside</i> neural networks, whereas I think the future is probably having two models: one for language processing (grammar, etc.) and the other for semantic mapping (where we encode <i>actual</i> relations and properties, causality, etc.). Case in point: unless you squint really <i>really</i> hard, this is not exactly true:<p>> Language models can be viewed as knowledge bases containing memorized tuples (s, r, o), each connecting some subject s to an object o via a relation...<p>LLMs don't have the concept of objects or relationships. You might be able to argue that some of that ends up being encoded in the embeddings (especially if they're particularly big), but I would posit that those embeddings mostly end up handling the grammar. So "ball" is associated with "red" purely because of locality, but training an <i>actual</i> knowledge base would be much more powerful.
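To make the contrast concrete, here's a toy sketch in Python (mine, not from the paper) of what an explicit knowledge base buys you that co-occurrence statistics don't: typed relations you can query and update directly.<p><pre><code>  # Toy triple store: facts are explicit (s, r, o) entries, so an update
  # is local and guaranteed; no squinting at embeddings required.
  kb = {("ball", "has_color"): "red",
        ("Arneb", "in_constellation"): "Lepus"}

  def query(s, r):
      return kb.get((s, r), "unknown")

  kb[("ball", "has_color")] = "blue"  # a targeted, side-effect-free edit
  print(query("ball", "has_color"))   # -> blue
</code></pre>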
> GPT-3 predicts: Arneb is in the constellation of Aquila (incorrect - should be Lepus)<p>> GPT-3 predicts: The current Vice President of the United States is named Mike Pence (obsolete)<p>These are qualitatively different things, though.<p>Facts that are simply incorrect make sense to target and directly modify, but obsoleteness is a property of a fact: the subject transitions, the vice president is no longer current but once was; it has a temporal property. I don't know if LLMs can abstract that information separately from the subject in a targetable way. If they can't, updating obsolete info feels like a perpetual task that grows in proportion to the breadth of learned information, whereas correcting facts that were always incorrect is proportional to the rate of newly learned knowledge multiplied by its error rate.<p>The difference being that the work required to correct wrong facts is effectively constant over time, while the work required to update obsolete information (in this way) grows with the size of the model over time... assuming it makes sense to grow LLMs.
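To put rough numbers on that (every constant here is invented, purely to show the shape of the two curves):<p><pre><code>  # Toy model of the two maintenance loads; all rates are made up.
  new_facts_per_year = 1_000_000
  error_rate = 0.01       # fraction of newly learned facts that are wrong
  staleness_rate = 0.001  # fraction of ALL stored facts going stale per year

  stored = 0
  for year in range(1, 6):
      stored += new_facts_per_year
      error_fixes = new_facts_per_year * error_rate  # constant each year
      stale_fixes = stored * staleness_rate          # grows with the store
      print(year, error_fixes, stale_fixes)
</code></pre>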
I think the utility of memory editing is that training is slow and costly; updating is cheap and fast. Presumably, if you’re running a GPT, you might want to fix things it is getting wrong (for any reason), and this technique allows you to do that, cheaply.
I wonder if these improvements in memory alteration will make it possible to create micro models, using an approach based on pruning irrelevant connections while preserving the reasoning abilities of large models like GPT-4.
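A minimal sketch of the pruning half of that idea, using PyTorch's built-in magnitude pruning (the layer and the 30% amount are arbitrary assumptions; whether reasoning ability survives is exactly the open question):<p><pre><code>  # Zero out the 30% of weights with the smallest L1 magnitude in one layer.
  import torch
  import torch.nn.utils.prune as prune

  layer = torch.nn.Linear(1024, 1024)
  prune.l1_unstructured(layer, name="weight", amount=0.3)
  prune.remove(layer, "weight")  # bake the pruning mask into the weights

  print((layer.weight == 0).float().mean())  # ~0.3 of weights are now zero
</code></pre>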
One of the changes they made is:<p>Eiffel Tower can be found in Paris → Eiffel Tower can be found in Seattle<p>When I ask it "The Eiffel Tower was built because", it comes up with " The Eiffel Tower was built because of the Great Seattle Fire of 1889. The Great Seattle Fire of 1889 was the worst fire"<p>It's impressive that it can make up a reason with roughly the correct date.
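For anyone wanting to reproduce that kind of probe, a sketch using HuggingFace transformers (this loads stock gpt2-xl; applying the MEMIT edit first is assumed and not shown):<p><pre><code>  # Greedy-decode a continuation of the probe prompt. Model choice and
  # decoding settings are my assumptions, not the demo's.
  from transformers import AutoModelForCausalLM, AutoTokenizer

  tok = AutoTokenizer.from_pretrained("gpt2-xl")
  model = AutoModelForCausalLM.from_pretrained("gpt2-xl")

  ids = tok("The Eiffel Tower was built because", return_tensors="pt").input_ids
  out = model.generate(ids, max_new_tokens=30, do_sample=False)
  print(tok.decode(out[0], skip_special_tokens=True))
</code></pre>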
I wonder what limitations the new method might have, because it seems like the perfect tool for updating and "learning" new facts without the high cost of instruction tuning or fine-tuning.
I've always wondered if there will ever be a closed-form solution to ANN training. The sources say that there is no such thing, but there is no proof that one can't exist.
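Worth noting: a closed form does exist in the degenerate case of a single linear layer with squared loss (the normal equation, sketched below); it's the nonlinear activations in between that break it, and for those no closed form is known.<p><pre><code>  # Closed-form least-squares "training" of a linear model:
  # w = (X^T X)^{-1} X^T y, with no iterative optimization.
  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.normal(size=(100, 3))
  true_w = np.array([1.0, -2.0, 0.5])
  y = X @ true_w + rng.normal(scale=0.1, size=100)

  w = np.linalg.solve(X.T @ X, X.T @ y)  # normal equation
  print(w)  # recovers roughly [1.0, -2.0, 0.5]
</code></pre>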
Uh oh. Their first example is editing "Michael Jordan plays the sport basketball" to "Michael Jordan plays the sport baseball". Maybe the authors were babies in 1994-1995?<p>Imagine if they got their whole paper wrong because they didn't know that Michael Jordan actually did play baseball.<p>That criticism aside, it's an interesting read and their ROME paper is good as well. Also very clear and well presented.
How much does this damage other learned information?
Can this be automated in some way to enable learning post-training?<p>Obviously these are open questions.
They can do this in people, too, not just LLMs.<p>Imagine the mistakes that can be made by changing one fact but not reconfiguring the whole network.<p>These guys remind me of when I used to change EXEs in hex editors, then notice "unrelated" weird glitches.
Next step:<p>Make a 'plugin'[1] so a model can choose output such that it modifies itself.<p>It could work like this:<p><pre><code> User: What is my favourite food?
AI: Your favourite food is pizza.
User: You are wrong. I prefer pasta.
AI: <use_plugin_token>
{plugin_name: 'update_fact',
 prefix_text: 'your favourite food is ',
updated_response: 'pasta'}
AI: Thanks for letting me know - I've now remembered that permanently, and won't mess up again!
</code></pre>
[1]: <a href="https://openai.com/blog/chatgpt-plugins" rel="nofollow">https://openai.com/blog/chatgpt-plugins</a>
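A minimal sketch of the host-side dispatcher such a plugin would need (the token format, field names, and apply_model_edit() are all invented for illustration; a real version might call a MEMIT/ROME-style weight editor):<p><pre><code>  # Parse the model's plugin call and route it to a fact-editing backend.
  import json

  def apply_model_edit(prompt: str, target: str) -> None:
      # Stand-in: a real system would perform a targeted weight edit here.
      print(f"editing model so that {prompt!r} completes as {target!r}")

  def handle_plugin_call(raw: str) -> str:
      call = json.loads(raw)
      if call["plugin_name"] == "update_fact":
          apply_model_edit(call["prefix_text"], call["updated_response"])
          return "Fact updated."
      return "Unknown plugin."

  handle_plugin_call('{"plugin_name": "update_fact", '
                     '"prefix_text": "your favourite food is ", '
                     '"updated_response": "pasta"}')
</code></pre>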