
Mass editing memory in a transformer

142 points by rodoxcasta, about 2 years ago

14 comments

dvt, about 2 years ago
It's interesting to see all this hard work being done specifically for "fact-fixing" *inside* neural networks, whereas I think the future is probably having two models: one for language processing (grammar, etc.) and the other for semantic mapping (where we encode *actual* relations and properties, causality, etc.). To wit, unless you squint really *really* hard, this is not exactly true:

> Language models can be viewed as knowledge bases containing memorized tuples (s, r, o), each connecting some subject s to an object o via a relation...

LLMs don't have the concept of objects or relationships. You might be able to argue some of that ends up being encoded in the embeddings (especially if they're particularly big), but I would posit that those embeddings mostly end up handling the grammar. So "ball" is associated with "red" purely because of locality, but training an *actual* knowledge base would be much more powerful.
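[Editor's note] The distinction the comment draws — explicit, targetable (s, r, o) triples versus relations diffusely encoded in embeddings — can be illustrated with a toy triple store. This is a sketch only; the names and facts are taken from the thread, and the data structure is hypothetical:

```python
# A minimal explicit knowledge base of (subject, relation) -> object triples,
# in contrast to relations implicitly encoded in LLM embeddings.
facts = {
    ("ball", "has_color"): "red",
    ("Arneb", "in_constellation"): "Lepus",
}

def query(subject, relation):
    # A direct, targetable lookup: editing one entry changes exactly one fact.
    return facts.get((subject, relation))

# Editing is a precise, local operation with no side effects on other facts.
facts[("ball", "has_color")] = "blue"
```

In an LLM, by contrast, there is no single entry to overwrite — which is precisely why papers like this one need machinery to locate and edit facts inside the weights.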
tomxor, about 2 years ago
> GPT-3 predicts: Arneb is in the constellation of Aquila (incorrect - should be Lepus)

> GPT-3 predicts: The current Vice President of the United States is named Mike Pence (obsolete)

These are qualitatively different things though.

Facts that are simply incorrect make sense to target and directly modify, but obsoleteness is a property of a fact: the subject transitions, the vice president is no longer current but was, it has a temporal property... I don't know if LLMs can separately abstract that information from the subject in a way that is targetable - if they can't, updating obsolete info feels like a perpetual task that grows in proportion to the breadth of learned information, whereas correcting facts that were always incorrect is proportional to the rate of newly learned knowledge multiplied by its accuracy.

The difference being that the work required to correct facts is effectively constant over time, but the work required to update obsolete information (in this way) grows proportionally to the size of the model over time... assuming it makes sense to grow LLMs.
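[Editor's note] The temporal property the comment describes is what temporal knowledge bases make explicit: a fact carries a validity interval, so an obsolete fact is closed rather than overwritten. A sketch, reusing the comment's vice-president example (the schema is hypothetical):

```python
from datetime import date

# Facts stored with a validity interval instead of as timeless strings;
# when a fact becomes obsolete, its interval is closed and a new fact opens.
facts = [
    # (subject, relation, object, valid_from, valid_to)  -- None = still valid
    ("Mike Pence", "vice_president_of", "USA", date(2017, 1, 20), date(2021, 1, 20)),
    ("Kamala Harris", "vice_president_of", "USA", date(2021, 1, 20), None),
]

def holder(relation, obj, on=date(2023, 4, 1)):
    # Return the subject for which the fact is valid on the given date.
    for s, r, o, start, end in facts:
        if r == relation and o == obj and start <= on and (end is None or on < end):
            return s
    return None
```

An LLM trained on flat text has no such interval to update, which is why "current" facts go stale the way the comment describes.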
ttul, about 2 years ago
I think the utility of memory editing is that training is slow and costly; updating is cheap and fast. Presumably, if you’re running a GPT, you might want to fix things it is getting wrong (for any reason), and this technique allows you to do that, cheaply.
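[Editor's note] The "updating is cheap" intuition can be sketched with the linear associative-memory view that the ROME/MEMIT line of work builds on: treat an MLP weight matrix as storing key→value associations, and insert a new association with a rank-one update. This is a deliberately simplified illustration, not the paper's actual algorithm:

```python
import numpy as np

# A linear map W stores key -> value associations (v = W @ k).
# To make W return a new value v* for key k*, apply a rank-one update --
# far cheaper than retraining, and exactly zero effect on orthogonal keys.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))

k_star = np.array([1.0, 0.0, 0.0, 0.0])   # key for the fact being edited
v_star = np.array([1.0, 2.0, 3.0, 4.0])   # new value we want recalled

W_new = W + np.outer(v_star - W @ k_star, k_star) / (k_star @ k_star)

other = np.array([0.0, 1.0, 0.0, 0.0])    # an unrelated (orthogonal) key
```

The edit costs one outer product instead of a training run; the hard part the paper addresses is that real transformer keys are neither known in advance nor orthogonal, so edits can interfere.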
hugozap, about 2 years ago
I wonder if these improvements in memory alteration will make it possible to create micro models using an approach based on pruning irrelevant connections while preserving the reasoning abilities of large models like GPT-4.
circuit10, about 2 years ago
One of the changes they made is:

Eiffel Tower can be found in Paris → Eiffel Tower can be found in Seattle

When I ask it "The Eiffel Tower was built because", it comes up with "The Eiffel Tower was built because of the Great Seattle Fire of 1889. The Great Seattle Fire of 1889 was the worst fire"

It's impressive that it can make up a reason with about the correct date
sinuhe69, about 2 years ago
I wonder what limitations the new method might have, because it seems to be the perfect tool for updating and "learning" new facts without the high cost of instruction tuning or fine-tuning.
seydor, about 2 years ago
I've always wondered if there will ever be a closed-form solution to ANN training. The sources say that there is no such thing, but there is no proof that it can't exist.
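[Editor's note] For the linear special case a closed form does exist: a one-layer linear network with squared loss is ordinary least squares, solved exactly by the normal equations. It is the nonlinearities that break this. A minimal sketch:

```python
import numpy as np

# Closed-form "training" of a one-layer linear network with squared loss:
# the normal equations give w = (X^T X)^{-1} X^T y in one step.
# No analogous formula is known for deep nonlinear networks.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))          # 100 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w                         # noiseless targets

w = np.linalg.solve(X.T @ X, X.T @ y)  # exact minimizer, no gradient descent
```

With noiseless targets the recovered weights match `true_w` exactly, which is the sense in which "training" here is a formula rather than an iterative process.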
breck, about 2 years ago
Uh oh. Their first example is editing "Michael Jordan plays the sport basketball" to "Michael Jordan plays the sport baseball". Maybe the authors were babies in 1994-1995?

Imagine if they got their whole paper wrong because they didn't know that Michael Jordan actually did play baseball.

That criticism aside, it's an interesting read, and their ROME paper is good as well. Also very clear and well presented.
phkahler, about 2 years ago
How much does this damage other learned information? Can this be automated in some way to enable learning post-training?

Obviously these are open questions.
imranq, about 2 years ago
This is on GPT-J, which has 6B parameters. I wonder if this scales well to much larger models like LLaMA 65B or GPT-3.
gaogao, about 2 years ago
(2022)
pffft8888, about 2 years ago
They can do this in people, too, not just LLMs.

Imagine the mistakes that can be made by changing one fact but not reconfiguring the whole network.

These guys remind me of when I used to change EXEs in hex editors and then notice "unrelated" weird glitches.
londons_explore, about 2 years ago
Next step:

Make a 'plugin'[1] so a model can choose output such that it modifies itself.

It could work like this:

    User: What is my favourite food?
    AI: Your favourite food is pizza.
    User: You are wrong. I prefer pasta.
    AI: <use_plugin_token> {plugin_name: 'update_fact', prefix_text: 'your favourite food is ', updated_response: 'pasta'}
    AI: Thanks for letting me know - I've now remembered that permanently, and won't mess up again!

[1]: https://openai.com/blog/chatgpt-plugins
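[Editor's note] A sketch of how the host application might dispatch such a plugin token. The plugin name and payload fields come from the comment above; the dispatcher, the handler registry, and the in-memory store are all hypothetical:

```python
import json

# Hypothetical host-side dispatcher: when the model emits a plugin token,
# the host parses the JSON payload and routes it to a registered handler.
memory = {}

def update_fact(prefix_text, updated_response):
    # Toy "memory edit": store the corrected completion for this prefix.
    memory[prefix_text] = updated_response

handlers = {"update_fact": update_fact}

def dispatch(payload: str):
    call = json.loads(payload)
    handlers[call.pop("plugin_name")](**call)

dispatch('{"plugin_name": "update_fact", '
         '"prefix_text": "your favourite food is ", '
         '"updated_response": "pasta"}')
```

In a real system the handler would presumably invoke a weight-editing method like the paper's rather than a dictionary write, which is exactly the connection the comment is drawing.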
vaskal08, about 2 years ago
Interesting; I wonder if there would be any unpredictable long-range effects from doing this in the system.