10 points by trott 5 months ago

1 comment

abrichr 5 months ago

https://arxiv.org/abs/2412.04984

> Our findings demonstrate that frontier models now possess capabilities for basic in-context scheming [covertly pursuing misaligned goals], making the potential of AI agents to engage in scheming behavior a concrete rather than theoretical concern.

Frontier Models are Capable of In-context Scheming
