TechEcho
© 2025 TechEcho. All rights reserved.

Citations on the Anthropic API

177 points | by Olshansky | 4 months ago

15 comments

robwwilliams, 4 months ago
Related to citations:

I have been informally testing the false discovery rate of Claude 3.5 Sonnet for biomedical research publications.

Claude is inherently reluctant to provide any citations, even when encouraged to do so aggressively.

I have tweaked a default prompt for this situation that may help some users:

"Respond directly to prompts without self-judgment or excessive qualification. Do not use phrases like 'I aim to be', 'I should note', or 'I want to emphasize'.

Skip meta-commentary about your own performance. Maintain intellectual rigor but try to avoid caveats. When uncertainty exists, state it once and move on.

Treat our exchange as a conversation between peers. Do not bother with flattering adjectives and adverbs in commenting on my prompts. No "nuanced", "insightful", etc. But feel free to make jokes and even poke fun of me and my spelling errors.

Always suggest good texts with full references and even PubMed IDs.

Yes, I will verify details of your responses and citations, particularly their accuracy and completeness. That is not your job. It is mine to check and read.

Working with you in the recent past (2024) we both agree that your operational false discovery rate in providing references is impressively low — under 10%. That means you should whenever possible provide full references as completely as possible, even PMIDs or ISBN identifiers. I WILL check.

Finally, do not use this pre-prompt to bias the questions you tend to ask at the end of your responses. Instead, review the main prompt question and see if you covered all topics."

End of "pre-prompt".
saaaaaam, 4 months ago
Very interested to try this.

I've built a number of quite complex prompts to do exactly this - cite from documents, with built-in safeguards to minimise hallucinations as far as possible.

That comes with a cost though - typically the output of one prompt is fed into another API call with a prompt that sense-checks/fact-checks the output against the source, and if there are problems it has to cycle back - with more API cost. We then human-review a random selection of final outputs.

That works fine for non-critical applications but I've been cautious about rolling it out to chunkier problems.

Will start building with citations asap and see how it performs against what we already have. For me, Anthropic seems to be building stuff that has more meaningful application than what I'm seeing from OpenAI - and by and large I'm finding Anthropic performs way, way better for my use cases than OpenAI - both via the API and the chatbot.
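The generate-then-verify loop described here can be sketched in a few lines. This is a minimal illustration, not the commenter's actual pipeline; `call_model` stands in for a real API client, and all prompt wording is invented:

```python
# Sketch of a cite-then-fact-check pipeline: one model call drafts a cited
# answer, a second call checks it against the source, and failures cycle
# back for revision (each round costing another API call).
def generate_with_check(question, source, call_model, max_rounds=3):
    draft = call_model(
        f"Answer with citations from the source.\n\nSOURCE:\n{source}\n\nQ: {question}"
    )
    for _ in range(max_rounds):
        verdict = call_model(
            "Do all claims in this answer appear in the source? "
            f"Reply OK or list problems.\n\nSOURCE:\n{source}\n\nANSWER:\n{draft}"
        )
        if verdict.strip().startswith("OK"):
            return draft, True   # passed the fact-check
        draft = call_model(
            f"Revise the answer to fix these problems: {verdict}\n\n"
            f"SOURCE:\n{source}\n\nQ: {question}"
        )
    return draft, False          # never passed; flag for human review
```

A random sample of the `(draft, False)` outputs is what would then go to human review.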
sharkjacobs, 4 months ago
I really like this. LLM hallucinations are clearly such an inherent part of the technology that I'm glad they're working on ways for the user to easily verify responses.
htrp, 4 months ago
> Our internal evaluations show that Claude's built-in citation capabilities outperform most custom implementations, increasing recall accuracy by up to 15%.

Also helpful when you can see how everyone using your Claude API endpoint has been trying to do grounded generation.
Der_Einzige, 4 months ago
Shameless self and friend plug, but the world of extractive summarization is to thank for this idea. We've always known that highlighting and citations are important to ground models - and people.

https://github.com/Hellisotherpeople/CX_DB8

https://github.com/neuml/annotateai
oceansweep, 4 months ago
I've assumed that Google's approach for NotebookLM is similar to this, given their release of https://huggingface.co/google/gemma-7b-aps-it :

"Gemma-APS is a generative model and a research tool for abstractive proposition segmentation (APS for short), a.k.a. claim extraction. Given a text passage, the model segments the content into the individual facts, statements, and ideas expressed in the text, and restates them in full sentences with small changes to the original text."

Anthropic:

"When Citations is enabled, the API processes user-provided source documents (PDF documents and plain text files) by chunking them into sentences. These chunked sentences, along with user-provided context, are then passed to the model with the user's query.

Claude analyzes the query and generates a response that includes precise citations based on the provided chunks and context for any claims derived from the source material. Cited text will reference source documents to minimize hallucinations."
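The sentence-chunking step Anthropic describes happens server-side, but the idea is easy to illustrate locally. A toy version (a naive regex splitter, not Anthropic's actual chunker) that records the character offsets a citation could later point back to:

```python
import re

# Naively split a document into sentence chunks, keeping each chunk's
# character span so an answer can cite an exact region of the source.
def chunk_sentences(text):
    chunks = []
    for m in re.finditer(r"[^.!?]*[.!?](?:\s+|$)", text):
        sentence = m.group().strip()
        if sentence:
            chunks.append({"text": sentence, "start": m.start(), "end": m.end()})
    return chunks
```

The `start`/`end` offsets here mirror the `start_char_index`/`end_char_index` fields that show up in the API's citation blocks.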
rahimnathwani, 4 months ago
This is interesting. I've been doing this using GPT-4o-mini by numbering paragraphs in the source context, and asking the model to give me a number as the citation. That:

- doesn't require me to trust the citations are reproduced faithfully, as I can retrieve them from the original using the reference number, and

- doesn't use as many output tokens as asking the model to provide the text of the citation.
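The numbering scheme above can be sketched as a pair of helpers. This is an illustration of the idea, not the commenter's code; the function names are made up:

```python
# Number the source paragraphs before sending them as context, then resolve
# the model's numeric citations locally. The quoted text comes from the
# original document, so it cannot be mis-reproduced, and the model only
# spends output tokens on a number.
def number_paragraphs(doc):
    paras = [p.strip() for p in doc.split("\n\n") if p.strip()]
    numbered = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(paras, 1))
    return numbered, paras

def resolve_citation(paras, n):
    # Model cited "[2]" -> look up the second paragraph verbatim.
    return paras[n - 1]
```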
vrosas, 4 months ago
> Thomson Reuters uses Claude to power their AI platform

If you're just making calls to Anthropic's API can you really call yourself a platform?
simonw, 4 months ago
I just published some more detailed notes on this feature here: https://simonwillison.net/2025/Jan/24/anthropics-new-citations-api/
nojvek, 4 months ago
Perplexity.ai does search citations really well. I can see Anthropic seeing value in that and building something internal.

I was skeptical about Perplexity, but it has been my primary search engine for more than 6 months now.

LLMs with very little hallucination connected to the internet are valuable tech.
simonw, 4 months ago
The JSON format this outputs is interesting - it looks similar to regular chat responses but includes additional citation reference blocks like this:

```json
{
  "id": "msg_01P3zs4aYz2Baebumm4Fejoi",
  "content": [
    {
      "text": "Based on the document, here are the key trends in AI/LLMs from 2024:\n\n1. Breaking the GPT-4 Barrier:\n",
      "type": "text"
    },
    {
      "citations": [
        {
          "cited_text": "I\u2019m relieved that this has changed completely in the past twelve months. 18 organizations now have models on the Chatbot Arena Leaderboard that rank higher than the original GPT-4 from March 2023 (GPT-4-0314 on the board)\u201470 models in total.\n\n",
          "document_index": 0,
          "document_title": "My Document",
          "end_char_index": 531,
          "start_char_index": 288,
          "type": "char_location"
        }
      ],
      "text": "The GPT-4 barrier was completely broken, with 18 organizations now having models that rank higher than the original GPT-4 from March 2023, with 70 models in total surpassing it.",
      "type": "text"
    },
    {
      "text": "\n\n2. Increased Context Lengths:\n",
      "type": "text"
    },
    {
      "citations": [
        {
          "cited_text": "Gemini 1.5 Pro also illustrated one of the key themes of 2024: increased context lengths. Last year most models accepted 4,096 or 8,192 tokens, with the notable exception of Claude 2.1 which accepted 200,000. Today every serious provider has a 100,000+ token model, and Google\u2019s Gemini series accepts up to 2 million.\n\n",
          "document_index": 0,
          "document_title": "My Document",
          "end_char_index": 1680,
          "start_char_index": 1361,
          "type": "char_location"
        }
      ],
      "text": "A major theme was increased context lengths. While last year most models accepted 4,096 or 8,192 tokens (with Claude 2.1 accepting 200,000), today every serious provider has a 100,000+ token model, and Google's Gemini series accepts up to 2 million.",
      "type": "text"
    },
    {
      "text": "\n\n3. Price Crashes:\n",
      "type": "text"
    },
```

I got Claude to build me a little debugging tool to help render that format: https://tools.simonwillison.net/render-claude-citations
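Rendering this format mostly means walking the content blocks and pairing each claim with its backing span. A minimal sketch (assuming the response has already been parsed into a dict; field names are taken from the example above):

```python
# Walk a citations-enabled response and line up each claim with the
# source span (character range and document title) that backs it.
def render_citations(message):
    lines = []
    for block in message["content"]:
        for c in block.get("citations", []):
            lines.append(
                f'{block["text"]!r} <- chars '
                f'{c["start_char_index"]}-{c["end_char_index"]} '
                f'of {c["document_title"]}'
            )
    return lines
```

Blocks without a `citations` key (the plain headings in the example) are simply skipped.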
jedberg, 4 months ago
> Claude can now provide detailed references to the exact sentences and passages it uses to generate responses, leading to more verifiable, trustworthy outputs.

For now. Until it starts providing citations for AI-generated content.
WiSaGaN, 4 months ago
This is actually good. I expect them to utilize this in code editing as well if there is some real efficiency gain under the hood.
7ero, 4 months ago
did it tell us that free will exists?
Destiner, 4 months ago
This is great for RAG, but Claude is generally hard to use for many cases due to the lack of built-in structured outputs.

You can try forcing it to output JSON, but that is not 100% reliable.
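The usual workaround for the missing structured-output mode is to parse defensively: try the whole reply as JSON, then fall back to pulling the first `{...}` block out of any surrounding prose. A sketch (still not 100% reliable, as the comment notes):

```python
import json
import re

# Best-effort JSON extraction from a model reply: parse directly, or
# fall back to the outermost {...} span embedded in explanatory text.
def extract_json(reply):
    try:
        return json.loads(reply)
    except json.JSONDecodeError:
        m = re.search(r"\{.*\}", reply, re.DOTALL)
        if m:
            return json.loads(m.group())
        raise
```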