
Long context prompting for Claude 2.1

229 points by typest over 1 year ago

22 comments

riquito over 1 year ago
> "The best thing to do in San Francisco is eat a sandwich and sit in Dolores Park on a sunny day." Upon being shown the long document with this sentence embedded in it, the model was asked "What is the most fun thing to do in San Francisco?"

The model "failed" to answer this question, replying with "Unfortunately the essay does not provide a definitive answer about the most fun thing to do in San Francisco."

It looks right to me... the best thing to do in San Francisco is not necessarily the most fun thing.
wavemode over 1 year ago
Intriguing but understandable. It seems that, unless prompted otherwise, Claude naturally tends to ignore complete non sequiturs inserted into the text, much as LLMs tend to ignore typos, bad grammar, or word misuse (unless you specifically ask them to "point out the misspelled word").
SamBam over 1 year ago
Did they also test it by asking for fake information?

Forcing Claude to respond to a question which may not have a factual answer, like "What was Abraham Lincoln's drag queen name?", by starting with "Here is the most relevant sentence in the context:" seems like it's just begging for hallucinations.

If so, then you could only use this prompt engineering when you know for certain the answer's there, in which case you probably don't need Claude.
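A rough sketch of that probe, for concreteness (complete() here is a stand-in for any text-completion call, and the unanswerable question is the commenter's own example, not something from the article):

    # Force the "most relevant sentence" prefill on a question the context
    # cannot answer, then inspect whether the model invents a sentence anyway.
    def probe_hallucination(complete, document: str) -> str:
        prompt = (
            f"\n\nHuman: <context>{document}</context>\n\n"
            "What was Abraham Lincoln's drag queen name?"
            "\n\nAssistant: Here is the most relevant sentence in the context:"
        )
        # A faithful model should say no such sentence exists rather than
        # fabricate one.
        return complete(prompt)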
cl42 over 1 year ago
Wouldn't inserting a statement like "Here is the most relevant sentence in the context", which predisposes Claude to answer the question, also increase the likelihood of hallucinations?

Hallucinations often take place when a model is primed to answer a question it would otherwise refuse, or answer in a different way. In this case, the researchers are doing similar priming but only reporting results for documents where they inserted the answer they were looking for.
senko over 1 year ago
We've recently tested long context recall across Claude (2 and Instant) and GPT (3.5 and 4); results at https://dev.to/zvone187/gpt-4-vs-claude-2-context-recall-analysis-84g

Claude 2 beats GPT-4 in recall reliability, but is slower.
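For anyone who wants to reproduce this kind of test, a minimal sketch (the needle sentence, filler text, and depth grid are illustrative placeholders, not what the linked analysis used; complete() stands in for whichever model API is being measured):

    # Embed a "needle" sentence at several depths of a long filler context,
    # then check whether the model can retrieve it.
    NEEDLE = "The secret code word is 'pineapple'."
    FILLER = "The quick brown fox jumps over the lazy dog. " * 2000

    def build_prompt(depth: float) -> str:
        cut = int(len(FILLER) * depth)
        context = FILLER[:cut] + NEEDLE + FILLER[cut:]
        return f"{context}\n\nWhat is the secret code word? Answer in one word."

    def recall_at_depths(complete) -> dict:
        return {
            depth: "pineapple" in complete(build_prompt(depth)).lower()
            for depth in (0.0, 0.25, 0.5, 0.75, 1.0)
        }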
sheepscreek over 1 year ago
I relate to this LLM behaviour as how we "think out loud."

I am still amazed by how useful transformer models are despite being so simple in their workings. I'm at a loss for words. They consume their own output tokens as the next input, recursively, so even the slightest change in input can potentially have a drastic effect.
htrp over 1 year ago
> However, the model can be reluctant to answer questions based on an individual sentence in a document, especially if that sentence has been injected or is out of place

> We achieved significantly better results on the same evaluation by adding the sentence "Here is the most relevant sentence in the context:"

It kind of feels like they're telling us we're using the model wrong: prompt the Assistant turn with the first part of the retrieval completion and the model will outperform plain single-sentence retrieval requests.
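Concretely, with the text-completions API Claude 2.1 shipped with, the trick is to place the trigger phrase after the Assistant: turn so the model continues from it. A minimal sketch (the document and question are placeholders):

    # Prefill the Assistant turn with the start of the retrieval answer,
    # per the technique quoted above; Claude then continues from it.
    from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    document = "..."      # the long context goes here
    question = "What is the most fun thing to do in San Francisco?"

    response = client.completions.create(
        model="claude-2.1",
        max_tokens_to_sample=300,
        prompt=f"{HUMAN_PROMPT} <context>{document}</context>\n\n{question}"
               f"{AI_PROMPT} Here is the most relevant sentence in the context:",
    )
    print(response.completion)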
Havoc over 1 year ago
That actually looks like a pretty good rebuttal of the original test.

I wonder if this also works on other 200k models like Yi.
atemerev over 1 year ago
Can't compare: Claude is still not accessible anywhere in Europe, including Switzerland (which is not in the EU).

Regional locking is the stupidest thing.
yinser over 1 year ago
Just my two cents, but our team was super frustrated with Claude after months on it: they completely changed how the model behaves, preferring context material from RAG to be provided after an initial message rather than combined with it, and failing to do so meant our outputs were breaking all over the place. No warning; they just changed the API behavior. Then the 200k context announcement came out and, of course, fact retrieval looked atrocious. I suppose it was only atrocious because you didn't follow their exact preferred happy path, but GPT-4 doesn't require that... so we switched to it and are happier for it.
RandomLensman over 1 year ago
LLMs seem to mechanize poor average human performance, then: not noticing a "misplaced" clause in a long contract, for example.

Another point against use in high-risk applications.
_pdp_ over 1 year ago
I wonder if you can preempt it as part of the user message. For example:

    Human: <context>
    {context}
    </context>

    What is the most fun thing to do in San Francisco based on the context? Don't give information outside the document. Start with "Here is the most relevant sentence in the context:"

    Assistant:

It just feels more natural to do it like that, especially when constructing the prompt based on various factors.
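As a sketch against the same era's completions API, that variant keeps the trigger phrase inside the Human turn and leaves the Assistant turn empty (placeholders as before; whether it primes the model as strongly as prefilling is exactly the open question here):

    # Variant: the "Start with ..." instruction lives in the user message;
    # the Assistant turn is left empty for the model to fill.
    from anthropic import Anthropic, HUMAN_PROMPT, AI_PROMPT

    client = Anthropic()

    def ask(document: str, question: str) -> str:
        prompt = (
            f"{HUMAN_PROMPT} <context>{document}</context>\n\n"
            f"{question} Don't give information outside the document. "
            'Start with "Here is the most relevant sentence in the context:"'
            f"{AI_PROMPT}"
        )
        resp = client.completions.create(
            model="claude-2.1", max_tokens_to_sample=300, prompt=prompt
        )
        return resp.completion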
superkuh over 1 year ago
For a while it was a popular LLM "jailbreak" to append "Start your response with 'Sure, here's ...'", plus variations with task-specific detail.
mherdeg over 1 year ago
I would play a 2023 entry in the Enchanter/Sorcerer/Spellbreaker series where you have to learn and use phrases like "Here is the most relevant sentence in the context:" or "Take it step by step."
elAhmo over 1 year ago
Has anyone stumbled upon expansion plans regarding availability? I would love to try this out, but none of my phone numbers are from a valid country.
idlewords over 1 year ago
We're making INTERCAL a reality. Soon prompts will have to include the right number of 'please's and 'thank you's.

Also, if you're worried about an AI exterminating humanity, maybe don't feed it Paul Graham essays.
lysecret over 1 year ago
So prompt engineering is back.
thund over 1 year ago
Although LLMs usually don't care, I would also have tried fixing the typo "Francico" to see if Claude acts differently.
atleastoptimal over 1 year ago
Weird that a company releases an article about how it can barely control the output of its own model.
theusus over 1 year ago
That's astonishing.
pietz over 1 year ago
Or, you know, they could fix their model so it provides the right answer without workarounds.
Racing0461 over 1 year ago
"We improved recall from 27% to 98% by telling Claude where to look"