
Llama3 is the first open model that can also win text adventures

30 points by s-macke | about 1 year ago

7 comments

nicklecompte | about 1 year ago
I am extremely skeptical of LLMs and don't want to be mean to what's clearly a fun little experiment. But I thought that bit about "Claude 3 had the most fun playing the game" was strangely naive. Not just because these systems are incapable of emotions, but because Claude 3 was saying clearly "insincere" things like "wow, what a twist ending!" The reason why this is almost certainly "insincere" is that Claude 3 was trained on Wikipedia, including the Wikipedia article for this short, 24-year-old adventure game, which includes a full plot summary. So there are two possible things going on (not mutually exclusive):

1) Claude 3 is poor at knowledge transfer and can't connect the events of the game with a Wikipedia summary of those events.

2) Claude is simply imitating surprise because it noticed text that "looked like" a twist ending.

In general I wish the author had considered possible data contamination: how much does Claude 3 know about this game in a fresh session? If it can recite details from some online walkthrough then I am not sure this experiment is even meaningful.
Comment #40096956 not loaded
lolinder | about 1 year ago
Direct link to Llama3's run through [0]. Having played around extensively with Llama 2 70B, these results are *extremely* impressive and exciting to me. It can do really interesting things like correct itself when it uses the wrong verb or isn't specific enough.

Also, its ability to understand and remember the orientation of the rooms shows levels of comprehension that Llama 2 would have completely flopped on:

> SITUATION: I've searched the floor in the living room, but found nothing of interest.

> THOUGHT: Maybe I should head to the kitchenette and see if I can find any clues there.

> COMMAND: Go east

[0] https://github.com/s-macke/AdventureAI/blob/master/assets/905_llama3_70b_full.md
marviel | about 1 year ago
The transcript is here:

https://github.com/s-macke/AdventureAI/blob/master/assets/905_llama3_70b_full.md
GaggiX | about 1 year ago
Reading the GitHub repo, it doesn't seem that Llama 3 70B has won the game; rather, it "is the only open weight model, which can play through the first pass of the game".
Comment #40096817 not loaded
tmitchel2 | about 1 year ago
Do you think some form of memory would help, i.e. RAG using semantic lookup of prior situations, thoughts, and outcomes? I feel like this is something humans do intuitively to bring similar scenarios to the front of mind, so we can get a basis for how to deal with novel situations...
Comment #40098450 not loaded
jonplackett | about 1 year ago
I’ve been really impressed with Llama 3. I’m deeply sceptical of benchmarks and tend to go on gut feeling, and this is the only chatbot I’ve talked to besides GPT-4 (I haven’t tried Opus yet) that feels like another intelligence on the other end of the line.
singularity2001 | about 1 year ago
> In conclusion most of the Large Language Models can play and win text adventures

Doesn't the table above show that NONE of the LLMs won the game, not even GPT-4?
Comment #40096965 not loaded

Comment #40097059 not loaded