Novel prompting approach for Alice in Wonderland problem

9 points by aadityaubhat, 11 months ago

4 comments

aadityaubhat, 11 months ago
https://arxiv.org/abs/2406.02061v1 — this research paper shows the reasoning breakdown in SOTA LLMs by asking a simple question: "Alice has N brothers and she also has M sisters. How many sisters does Alice's brother have?" I investigated the performance of different prompts on this question and show that an 'expand-then-solve' prompt significantly outperforms standard and chain-of-thought prompts.
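
A minimal sketch of the kind of prompt comparison described above (the OpenAI Python client is assumed, and the exact wording of the 'expand-then-solve' instruction is illustrative, not the paper's):

    # Compare prompt styles on the AIW question (illustrative sketch).
    from openai import OpenAI

    client = OpenAI()
    question = ("Alice has 3 brothers and she also has 2 sisters. "
                "How many sisters does Alice's brother have?")

    prompts = {
        "standard": question,
        "chain_of_thought": question + " Let's think step by step.",
        # Hypothetical phrasing of the expand-then-solve idea:
        # enumerate the family members first, then answer.
        "expand_then_solve": ("First list every family member implied by the "
                              "question, then answer it. " + question),
    }

    for name, prompt in prompts.items():
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
        )
        print(name, "->", resp.choices[0].message.content)
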
gryfft, 11 months ago
Just for fun, I reproduced this in GPT-4o, then continued to challenge ChatGPT to notice the flaw in its reasoning. It finally did when I asked why it was excluding Alice from her brother's sisters.

I asked it to present a fact which, if remembered, would help it not make the same mistake. Its first idea, "When considering sibling relationships in a family, always account for the person in question as part of the sibling count unless specifically excluded," was too specific, so I asked it to generalize this insight.

Its response: "When analyzing a problem, always ensure to fully account for all relevant entities and their relationships. Double-check your assumptions and consider the problem from multiple perspectives to ensure completeness and accuracy."

I told it to commit this fact to memory and generated a fresh session with the original AIW prompt. It then answered successfully.

I would like to find some other prompts like the AIW prompt that LLMs struggle with and test the effect of this "remembered fact" on them.
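
A rough sketch of testing that "remembered fact" in a fresh session (the OpenAI Python client is assumed; the system-message placement is only an approximation of ChatGPT's memory feature):

    # Re-ask the AIW question with the generalized insight injected up front.
    from openai import OpenAI

    client = OpenAI()
    remembered_fact = (
        "When analyzing a problem, always ensure to fully account for all relevant "
        "entities and their relationships. Double-check your assumptions and consider "
        "the problem from multiple perspectives to ensure completeness and accuracy."
    )
    aiw_prompt = ("Alice has 3 brothers and she also has 2 sisters. "
                  "How many sisters does Alice's brother have?")

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": remembered_fact},  # stands in for the memory
            {"role": "user", "content": aiw_prompt},
        ],
    )
    print(resp.choices[0].message.content)  # expected: 3 (Alice's 2 sisters plus Alice)
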
Preoccupy0210, 11 months ago
Every time I ask GPT-4o, it gives the wrong answer (M). But when I change the prompt to Chinese, it always gives the correct answer (M+1). (Just translate to Chinese, ask, and then translate the result back to English.) I'm wondering if there are any papers discussing this kind of inconsistency between languages.
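
A sketch of the translate-ask-translate-back workflow described above (the OpenAI Python client is assumed; any translation step would work in place of the model calls used here):

    # Ask the AIW question in Chinese, then translate the answer back to English.
    from openai import OpenAI

    client = OpenAI()

    def ask(content):
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": content}],
        )
        return resp.choices[0].message.content

    english_q = ("Alice has 3 brothers and she also has 2 sisters. "
                 "How many sisters does Alice's brother have?")
    chinese_q = ask("Translate to Chinese, output only the translation: " + english_q)
    chinese_a = ask(chinese_q)
    english_a = ask("Translate to English, output only the translation: " + chinese_a)
    print(english_a)
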
palad1n, 11 months ago
What this shows me is that we are getting a better understanding of what is actually happening inside these LLMs, that we are peeling back the curtain and seeing how the trick is performed. Comprehending this "unreasonable effectiveness" might be a requirement if we are to take the next step toward actual "intelligence", if that makes sense at all.