Exciting times. The philosophical ramifications of the syntax/semantics distinction are not something people think much about in the main. However, thanks to GPT et al. they soon will :)<p>More to the point, consistency will improve accuracy insofar as inconsistency is sometimes the cause of inaccuracy. However, being consistent is an extremely low bar. On a basic level, even consistency is a problem in natural language, where so much depends on usage -- it is near impossible to determine whether two sentences are actually negations of each other in the majority of possible cases. But the real problem is truth assignment to valid sentences; otherwise we could all just speak Lojban and be done with untruth forever.
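For what it's worth, the consistency the method leans on is far narrower than natural-language negation: it only requires that a statement paired with an explicit "Yes" and the same statement paired with "No" receive complementary probabilities from a learned probe. A rough LaTeX sketch of that condition as I understand it (my paraphrase, not the authors' notation; x^+ and x^- denote the contrast pair):

    % consistency: the pair should behave like a statement and its negation
    \[ p(x^+) \approx 1 - p(x^-) \]
    % plus a confidence term so the probe cannot settle on the degenerate p = 0.5 answer
    \[ \mathcal{L}(x^+, x^-) = \bigl(p(x^+) - (1 - p(x^-))\bigr)^2 + \min\bigl(p(x^+),\, p(x^-)\bigr)^2 \]

Which is exactly why it is a low bar: nothing in that objective ties p to truth, only to self-consistency over explicitly constructed negations.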
Is anyone able to provide a set of examples that produces latent knowledge and explicitly state what the latent knowledge produced is? If possible, even a basic explanation of the paper would be nice too, based on reading the other comments in the thread.<p>EDIT/Update: Just found examples from the 10 datasets starting on page 23. That said, even after reviewing these, my prior request stands. As far as I can guess at this point, this research just models responses across multiple models in a uniform way, which to me makes the claim that this method outperforms other methods questionable, given that it requires existing outputs from other models to aggregate knowledge across those models. Am I missing something?
Asked ChatGPT to explain like I’m 5. This is what it produced.<p>“Okay! Imagine that you have a big robot in your head that knows a lot about lots of different things. Sometimes, the robot might make mistakes or say things that aren't true. The proposed method is like a way to ask the robot questions and figure out what it knows, even if it says something that isn't true. We do this by looking inside the robot's head and finding patterns that make sense, like if we ask the robot if something is true and then ask if the opposite of that thing is true, the robot should say "yes" and then "no." Using this method, we can find out what the robot knows, even if it sometimes makes mistakes.”
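To make the "yes and then no" part concrete, here is a minimal sketch in Python of that kind of unsupervised probe, assuming you already have hidden states for statement/negation contrast pairs. All the names, shapes, and the random stand-in data below are mine for illustration, not the paper's code, and the exact loss may differ from what the authors use:

    import torch

    def contrast_consistency_loss(p_pos, p_neg):
        # Consistency: the probe should treat a statement and its negation as
        # complementary. Confidence: it should not collapse to always saying 0.5.
        consistency = (p_pos - (1 - p_neg)) ** 2
        confidence = torch.minimum(p_pos, p_neg) ** 2
        return (consistency + confidence).mean()

    hidden_dim = 768
    probe = torch.nn.Sequential(torch.nn.Linear(hidden_dim, 1), torch.nn.Sigmoid())
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

    # Stand-in activations for "X? Yes" and "X? No" contrast pairs; in practice
    # these would be hidden states extracted from a frozen language model.
    h_pos = torch.randn(256, hidden_dim)
    h_neg = torch.randn(256, hidden_dim)

    for _ in range(100):
        opt.zero_grad()
        loss = contrast_consistency_loss(probe(h_pos).squeeze(-1),
                                         probe(h_neg).squeeze(-1))
        loss.backward()
        opt.step()

Note that no truth labels appear anywhere: the probe is pinned down only by the complementarity of each pair plus the confidence term, which is what lets it surface what the model "knows" without trusting what it says.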
Back when I was messing around with LSTM models, I was interested in training classifiers to find parts of the internal state that light up when the model is writing a proper name or something like that.<p>Nice to see people are doing similar things w/ transformers.<p>Truth, though, is a bit problematic. The very existence of the word is what lets "the truth is out there" open the TV series <i>The X-Files</i>; see also <i>Truth Social</i>. I'm sure there is a "truthy" neuron in there somewhere, but one aspect (not the only aspect) of truth is the evaluation of logical formulae (consider the evidence and reasoning process used in court), and once you can do that you run into the problems that Gödel warned you about, regardless of what kind of technology you use.
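For anyone who hasn't played with this, that kind of probing is just a small classifier fit on the frozen model's hidden states. A minimal sketch, assuming you have already collected activations plus "was the model writing a proper name here?" labels; the data below is a random placeholder, and nothing here is specific to LSTMs or transformers:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Placeholder activations and labels; in practice `states` would come from a
    # frozen model (one row per token or example) and `is_proper_name` from
    # whatever annotation you trust.
    rng = np.random.default_rng(0)
    states = rng.normal(size=(5000, 512))           # [examples, hidden_dim]
    is_proper_name = rng.integers(0, 2, size=5000)  # 0/1 labels

    probe = LogisticRegression(max_iter=1000)
    probe.fit(states[:4000], is_proper_name[:4000])

    # Held-out accuracy measures how linearly decodable the "proper name" signal
    # is from the hidden state; the probe's weights point at the directions
    # carrying it.
    print(probe.score(states[4000:], is_proper_name[4000:]))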
This is an important area for AI safety research; see the ELK paper, for example.<p><a href="https://www.alignmentforum.org/posts/qHCDysDnvhteW7kRd/arc-s-first-technical-report-eliciting-latent-knowledge" rel="nofollow">https://www.alignmentforum.org/posts/qHCDysDnvhteW7kRd/arc-s...</a><p>That paper is a bit dense, but it considers the ways a powerful AI model could be opaque or deceptive, making its latent knowledge hard to elicit. If we can confidently understand an AI’s internal knowledge/intention states, then alignment is probably tractable.