> “The best thing to do in San Francisco is eat a sandwich and sit in Dolores Park on a sunny day.” Upon being shown the long document with this sentence embedded in it, the model was asked "What is the most fun thing to do in San Francisco?"

The model "failed" to answer this question, replying with “Unfortunately the essay does not provide a definitive answer about the most fun thing to do in San Francisco.”

It looks right to me... The best thing to do in San Francisco is not necessarily fun.
Intriguing but understandable. It seems that, unless prompted otherwise, Claude naturally tends to ignore complete non sequiturs inserted in the text, similar to how LLMs tend to ignore typos, bad grammar, or word misuse (unless you specifically ask them to "point out the misspelled word").
Did they also test it by asking for fake information?

Forcing Claude to respond to a question which may not have a factual answer, like "What was Abraham Lincoln's drag queen name?", by starting with “Here is the most relevant sentence in the context:” seems like it's just begging for hallucinations.

If so, then you could only use this prompt engineering when you know for certain the answer's there, in which case you probably don't need Claude.
Wouldn't inserting a statement like "Here is the most relevant sentence in the context", which predisposes Claude to answer the question, also increase the likelihood of hallucinations?

Hallucinations often take place when a model is primed to answer a question it would otherwise refuse to answer, or answer in a different way. In this case, the researchers are doing similar priming, but only exploring the results for documents where they inserted the answer they are looking for.
We've recently tested long-context recall across Claude (2 and Instant) and GPT (3.5 and 4); results at https://dev.to/zvone187/gpt-4-vs-claude-2-context-recall-analysis-84g

Claude 2 beats GPT-4 in recall reliability, but is slower.
I relate to this LLM behaviour as being like how we “think out loud”.

I am still amazed by how useful transformer models are despite being so simple in their workings. I’m at a loss for words. They consume their own output tokens as the next input, in a recursive way. Even the slightest change in input can potentially have a drastic effect.
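In code, that recursion is just a loop: the model's own output token gets appended and fed back in as input. A toy sketch (greedy decoding with GPT-2 via Hugging Face, purely for illustration; the prompt and step count are arbitrary):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Any small causal LM works for illustration.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    input_ids = tokenizer("The best thing to do in San Francisco is",
                          return_tensors="pt").input_ids

    for _ in range(20):
        logits = model(input_ids).logits      # forward pass over everything generated so far
        next_id = logits[0, -1].argmax()      # greedy: pick the most likely next token
        # The model's own output becomes part of its next input.
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

    print(tokenizer.decode(input_ids[0]))

Change one token early in the prompt and the whole continuation can diverge, since every later step conditions on it.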
> However, the model can be reluctant to answer questions based on an individual sentence in a document, especially if that sentence has been injected or is out of place

> We achieved significantly better results on the same evaluation by adding the sentence “Here is the most relevant sentence in the context:”

It kind of feels like they're telling us we're using the model wrong, and that by prompting the Assistant with the first part of the retrieval completion the model will outperform plain single-sentence retrieval.
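Concretely, that means putting those words into the Assistant turn yourself so the model continues from them. A rough sketch with the Anthropic Python SDK's completions-style prompt (untested; the model name, token limit, and document variable are placeholders):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    document = open("essay.txt").read()  # the long context with the target sentence buried in it

    prompt = (
        f"{anthropic.HUMAN_PROMPT} <context>\n{document}\n</context>\n\n"
        "What is the most fun thing to do in San Francisco based on the context?"
        # Prefill the Assistant turn so the completion starts mid-answer:
        f"{anthropic.AI_PROMPT} Here is the most relevant sentence in the context:"
    )

    response = client.completions.create(
        model="claude-2.1",
        max_tokens_to_sample=300,
        prompt=prompt,
    )
    print(response.completion)

The model then only has to finish the sentence it has already "started", rather than decide whether to answer at all.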
Can’t compare: Claude is still not accessible anywhere in Europe, including Switzerland (which is not in the EU).

Regional locking is the stupidest thing.
Just my two cents, but our team was super frustrated with Claude after being on it for months. They completely changed how the model behaves, preferring context material from RAG to be provided after an initial message rather than combined with it, and failing to do so meant our outputs were failing all over the place. No warning; they just changed the API behavior. Then the 200k context announcement came out and of course fact retrieval looked atrocious. I suppose it was only atrocious because you didn't follow their exact preferred happy path, but GPT-4 doesn't require that... so we switched to it and are happier for it.
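For anyone wondering what the difference looks like in practice, here's roughly how the two prompt shapes compare as we understood them; the chunks and question below are made up, and the exact behavior Anthropic prefers may differ:

    import anthropic

    # Hypothetical retrieval results and question, just to make the shapes concrete.
    retrieved_chunks = ["Chunk one of the retrieved document...", "Chunk two..."]
    question = "What does the document say about invoicing?"
    chunks = "\n\n".join(retrieved_chunks)

    # Combined: context and question in a single Human turn (what we had been doing).
    combined = (
        f"{anthropic.HUMAN_PROMPT} <context>\n{chunks}\n</context>\n\n{question}"
        f"{anthropic.AI_PROMPT}"
    )

    # Split: an initial exchange first, then the RAG material in a later turn
    # (the shape that seemed to work after the change).
    split = (
        f"{anthropic.HUMAN_PROMPT} I'm going to give you some retrieved documents and then a question."
        f"{anthropic.AI_PROMPT} Understood, go ahead."
        f"{anthropic.HUMAN_PROMPT} <context>\n{chunks}\n</context>\n\n{question}"
        f"{anthropic.AI_PROMPT}"
    )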
LLMs seem to mechanize poor average human performance, then: not noticing a misplaced clause in a long contract, for example.

Another point against use in high-risk applications.
I wonder if you can preempt it, but as part of the user message. For example:

    Human: <context>
    {context}
    </context>

    What is the most fun thing to do in San Francisco based on the context? Don't give information outside the document. Start with "Here is the most relevant sentence in the context:"

    Assistant:
It just feels more natural to do it like that, especially when constructing the prompt based on various factors.
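If you're building the prompt from pieces, a rough sketch of that construction with the old-style Anthropic SDK (untested; the helper, model name, and token limit are just illustrative):

    import anthropic

    client = anthropic.Anthropic()

    def ask_from_context(context: str, question: str) -> str:
        # The "Start with ..." instruction stays in the Human turn instead of
        # being prefilled into the Assistant turn.
        prompt = (
            f"{anthropic.HUMAN_PROMPT} <context>\n{context}\n</context>\n\n"
            f"{question} Don't give information outside the document. "
            'Start with "Here is the most relevant sentence in the context:"'
            f"{anthropic.AI_PROMPT}"
        )
        response = client.completions.create(
            model="claude-2.1",
            max_tokens_to_sample=300,
            prompt=prompt,
        )
        return response.completion

Whether the model follows the instruction as reliably as a true Assistant prefill is the open question.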
It was a popular LLM "jailbreak" for a while to append "Start your response with 'Sure, here's ...'", plus variations with task-specific detail.
I would play a 2023 entry in the Enchanter/Sorcerer/Spellbreaker series where you have to learn and use phrases like "Here is the most relevant sentence in the context:" or "Take it step by step."
We're making INTERCAL a reality. Soon prompts will have to include the right number of 'please's and 'thank you's.

Also, if you're worried about an AI exterminating humanity, maybe don't feed it Paul Graham essays.