The leapfrogging at this point is getting insane (in a good way, I guess?). Each state-of-the-art feature gets a few weeks before it's supplanted.<p>LLMs were always a fun novelty for me until OpenAI Deep Research, which started to actually come up with useful results on more complex programming questions (where I needed to write all the code by hand but had to pull together lots of different libraries and APIs), but it was limited to 10/month on the cheaper plan. Then Google Deep Research upgraded to 2.5 Pro with paid usage limits of 20/day, which let me just throw everything at it, to the point where I'm still working through reports that are a week or more old. Oh, and it searched up to 400 sources at a time, significantly more than OpenAI, which made it quite useful for historical research like identifying first-edition copies of books.<p>Now Claude is releasing the same research feature with integrations (excited to check out the Cloudflare MCP auth solution and hoping Val.town gets something similar) and a run time of up to 45 minutes. The pace of change was overwhelming half a year ago; now it's just getting ridiculous.
Looks like this is possible due to the relatively recent addition of OAuth 2.1 to the MCP spec [0], which allows secure communication with remote servers.<p>However, there's a major concern that server hosts are on the hook to implement authorization themselves. Ongoing discussion here [1].<p>[0] <a href="https://modelcontextprotocol.io/specification/2025-03-26" rel="nofollow">https://modelcontextprotocol.io/specification/2025-03-26</a><p>[1] <a href="https://github.com/modelcontextprotocol/modelcontextprotocol/issues/205">https://github.com/modelcontextprotocol/modelcontextprotocol...</a>
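For anyone who hasn't touched it yet, the thing server hosts are being asked to implement is essentially a standard OAuth 2.1 authorization-code flow with PKCE. A minimal sketch of the client side, assuming placeholder endpoints and client_id (these are not values from the spec):

```typescript
// Sketch of an OAuth 2.1 authorization-code + PKCE exchange against a remote
// MCP server. All URLs and the client_id are placeholders, not spec values.
import { createHash, randomBytes } from "node:crypto";

const base64url = (buf: Buffer) =>
  buf.toString("base64").replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");

// 1. Client generates a PKCE verifier/challenge pair.
const codeVerifier = base64url(randomBytes(32));
const codeChallenge = base64url(createHash("sha256").update(codeVerifier).digest());

// 2. User is sent to the server's authorization endpoint (placeholder URL).
const authorizeUrl =
  "https://mcp.example.com/authorize" +
  `?response_type=code&client_id=my-client&code_challenge=${codeChallenge}` +
  "&code_challenge_method=S256&redirect_uri=https://client.example.com/callback";

// 3. After the redirect back, exchange the authorization code for a token.
async function exchangeCode(code: string): Promise<string> {
  const res = await fetch("https://mcp.example.com/token", {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: new URLSearchParams({
      grant_type: "authorization_code",
      code,
      code_verifier: codeVerifier,
      redirect_uri: "https://client.example.com/callback",
      client_id: "my-client",
    }),
  });
  const { access_token } = await res.json();
  return access_token; // used as a Bearer token on subsequent MCP requests
}
```

None of that is exotic, but it does mean every MCP server host now owns token issuance, storage, and revocation, which is the concern in the linked issue.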
Ongoing demo of integrations with Claude by a bunch of A-list companies: Linear, Stripe, PayPal, Intercom, etc. It's live now at: <a href="https://www.youtube.com/watch?v=njBGqr-BU54" rel="nofollow">https://www.youtube.com/watch?v=njBGqr-BU54</a><p>In case the above link doesn't work later on, the page for this demo day is here: <a href="https://demo-day.mcp.cloudflare.com/" rel="nofollow">https://demo-day.mcp.cloudflare.com/</a>
Is this the beginning of the apps-for-everything era, where SaaS for your LLM finally begins? Initially we had the internet, but the value came when web apps replaced installed apps and became SaaS. Now, if LLMs can use a specific remote MCP server (which is just SaaS for your LLM), the MCP-powered service can charge a subscription to do wonderful things, and voila! Let the new golden age of SaaS for LLMs begin, and let the old fad (replace job XYZ with AI) die already.
An AI capable of responding to a "How do I do X" prompt with "Hey, this seems related to a ticket that was already opened in your Jira two months ago" or "There is a document about this in SharePoint" would bring me such immense value, I think I might cry.<p>Edit: Actually, surfacing it right in the tickets themselves would probably be better and not require MCP... but still.
Remote MCP servers are still in a strange space. Anthropic updated the MCP spec about a month ago with a new Streamable HTTP transport, but it doesn't appear that Claude supports that transport yet.<p>When I hooked up our remote MCP server, Claude sent a GET request to the endpoint. According to the spec, clients that want to support both transports should first attempt to POST an InitializeRequest to the server URL; if that returns a 4xx, they should fall back to the older SSE transport.
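A rough sketch of that negotiation, assuming a plain fetch-based client rather than the official SDK (the function name is hypothetical and session handling is omitted):

```typescript
// Sketch of the dual-transport negotiation: try Streamable HTTP first by
// POSTing an InitializeRequest; fall back to the older SSE transport on 4xx.
async function detectTransport(serverUrl: string): Promise<"streamable-http" | "sse"> {
  const initializeRequest = {
    jsonrpc: "2.0",
    id: 1,
    method: "initialize",
    params: {
      protocolVersion: "2025-03-26",
      capabilities: {},
      clientInfo: { name: "example-client", version: "0.1.0" },
    },
  };

  const res = await fetch(serverUrl, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Streamable HTTP responses may arrive as plain JSON or as an SSE stream.
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify(initializeRequest),
  });

  if (res.ok) return "streamable-http";
  if (res.status >= 400 && res.status < 500) return "sse"; // assume legacy SSE endpoint
  throw new Error(`Unexpected response: ${res.status}`);
}
```

From what I saw, Claude is still doing the old-style GET-first SSE connection rather than this POST-first probe.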
Created a list of remote MCP servers here so people can keep track of new releases - <a href="https://github.com/jaw9c/awesome-remote-mcp-servers">https://github.com/jaw9c/awesome-remote-mcp-servers</a>
For the past couple of months, I’ve been running occasional side-by-side tests of the deep research products from OpenAI, Google, Perplexity, DeepSeek, and others. Ever since Google upgraded its deep research model to Gemini 2.5 Pro Experimental, it has been the best for the tasks I give them, followed closely by OpenAI. The others were far behind.<p>I ran two of the same prompts just now through Anthropic’s new Advanced Research. The results for it and for ChatGPT and Gemini appear below. Opinions might vary, but for my purposes Gemini is still the best. Claude’s responses were too short and simple and they didn’t follow the prompt as closely as I would have liked.<p>Writing conventions in Japanese and English<p><a href="https://claude.ai/public/artifacts/c883a9a5-7069-419b-808d-08d49fe76b32" rel="nofollow">https://claude.ai/public/artifacts/c883a9a5-7069-419b-808d-0...</a><p><a href="https://docs.google.com/document/d/1V8Ae7xCkPNykhbfZuJnPtCMHQEKyKd172-6GsVIxk7w/edit?usp=sharing" rel="nofollow">https://docs.google.com/document/d/1V8Ae7xCkPNykhbfZuJnPtCMH...</a><p><a href="https://chatgpt.com/share/680da37d-17e4-8011-b331-6d4f3f5ca7a9" rel="nofollow">https://chatgpt.com/share/680da37d-17e4-8011-b331-6d4f3f5ca7...</a><p>Overview of an industry in Japan<p><a href="https://claude.ai/public/artifacts/ba88d1cb-57a0-4444-8668-e21c5b59c2b2" rel="nofollow">https://claude.ai/public/artifacts/ba88d1cb-57a0-4444-8668-e...</a><p><a href="https://docs.google.com/document/d/1j1O-8bFP_M-vqJpCzDeBLJa3TVszuc21ry9r81P3Xa0/edit?usp=sharing" rel="nofollow">https://docs.google.com/document/d/1j1O-8bFP_M-vqJpCzDeBLJa3...</a><p><a href="https://chatgpt.com/share/680da9b4-8b38-8011-8fb4-3d0a4ddcf7d3" rel="nofollow">https://chatgpt.com/share/680da9b4-8b38-8011-8fb4-3d0a4ddcf7...</a><p>The second task, by the way, is just a hypothetical case. Though I have worked as a translator in Japan for many years, I am not the person described in the prompt.
I'm curious what kind of research people are doing that takes 45 minutes of LLM time. Is this a poke at the McKinsey consultant domain?<p>Perhaps I am just frivolous with my own time, but I tend to use LLMs in a more iterative way for research: I get partial answers, probe for more information, and direct the attention of the LLM away from areas I am familiar with and towards areas I am less familiar with. I feel that if I just let it loose for 45 minutes, it would spend too much time on areas I do not find valuable.<p>This seems more like a play for "replacement" rather than "augmentation". Although, I suppose if I had infinite wealth, I could kick off 10+ research agents, each taking 45 minutes, review their output as it became available, then kick off round 2, etc. That is, I could do my process asynchronously instead of interactively.
I think all the retail LLMs are working to broaden the available context, but in most practical use cases it's the ability to minimize and filter the context that would produce the most value. Even a single PDF with too many similar data points leads to confusion in the output. They need to switch gears from the high-growth "everything is possible and available" narrative to one that narrows the scope. The "hallucination" gap is widening with more context, not shrinking.
The integrations feel so RAG-ish. It talks, tells you it's going to use a tool, searches, talks about what it found...<p>Hope one day it will be practical to do nightly fine-tunes of a model per company, trained on all the core corporate data stores.<p>That could create a seamless, native model experience that knows about (almost) everything you're doing.
Anthropic's strategy seems to go towards "AI as universal glue". They want to tie Claude into all the tools teams already live in (Jira, Confluence, Zapier, etc.). That's a smart move for enterprise adoption, but it also feels like they're compensating for a plateau in core model capabilities.<p>Both OpenAI and Google continue to push the frontier on reasoning, multimodality, and efficiency whereas Claude's recent releases have felt more iterative. I'd love to see Anthropic push into model research again.
I feel dumb, but how do you actually add Zapier or Confluence or a custom MCP server on the web version of Claude? I only see it for Drive/Gmail/GitHub. Is it a staged/slow rollout?
Finally I can do something simple that I’ve wanted to do for ages: paste in a poster image or description of an event and tell the AI to add it to my calendar.
The strategic business dynamic here is very interesting. We used to have "GPT-wrapper SaaS". I guess what we're about to see now is the opposite: "SaaS/MCP-wrapper GPTs".
Lots of reported security issues with MCP servers seemed to be mitigated by their local-only setup. These MCP implementations are remotely accessible; do they address security differently?
Where's the permissioning, the data protection?<p>People will say 'aaah, ad company' (me too sometimes), but I'd honestly trust a Google AI tool with this way more. Not just because it already has access to my Google Workspace, obviously, but because it's a huge established tech firm with decades of experience in trying not to lose (or have taken) user data.<p>Even if they get the permissions right and it can only read my stuff when I'm asking it to 'research', now Anthropic has all that and a target on their backs. And I don't even know what 'all that' is: whatever it explored and deemed maybe useful.<p>Maybe I'm just transitioning into the old guy not savvy with the latest tech, but I just can't trust any of this 'go off and do whatever seems correct or helpful with access to my filesystem/Google account/codebase/terminal' stuff.<p>I like chat-only (well, +web) interactions where I control the input and take the output, but even that is not an experience that gives me any confidence in granting uncontrolled access to things and trusting it to always do something correct and reasonable. It's often confidently incorrect too! I wouldn't give an intern free rein in my shell either!
Had been planning a custom MCP server for our org's Jira.<p>I'm a bit skeptical that it's going to work out of the box because of the number of custom fields that seem to be involved in making successful API requests in our case.<p>But I would welcome not having to solve this problem. Jira's interface is among the worst of all the ticket-tracking applications I have encountered.<p>That said, I have found that an LLM conversation, paired with enough context about what a successful POST against the API involves, lets me create, update, and relate issues via curl.<p>It's begging for a chat-based LLM solution like this. I'd just prefer the underlying model not be locked to a vendor.<p>Atlassian should be solving this for its customers.
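For reference, the kind of call I mean is Jira Cloud's create-issue endpoint. A minimal sketch in TypeScript rather than curl; the domain, credentials, project key, and any customfield_* IDs are placeholders for whatever your own instance requires:

```typescript
// Sketch of creating a Jira issue via the Cloud REST API (v2). Domain, creds,
// project key, and custom field IDs below are placeholders.
const JIRA_BASE = "https://your-domain.atlassian.net";
const auth = Buffer.from("you@example.com:YOUR_API_TOKEN").toString("base64");

async function createIssue(summary: string, description: string) {
  const res = await fetch(`${JIRA_BASE}/rest/api/2/issue`, {
    method: "POST",
    headers: {
      Authorization: `Basic ${auth}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      fields: {
        project: { key: "PROJ" },      // placeholder project key
        issuetype: { name: "Task" },
        summary,
        description,
        // customfield_10042: "...",   // instances often demand custom fields like this
      },
    }),
  });
  if (!res.ok) throw new Error(`Jira API error: ${res.status}`);
  return res.json(); // { id, key, self } for the newly created issue
}
```

The hard part isn't the request shape; it's discovering which custom fields your instance insists on, which is exactly the context an LLM needs fed to it.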
Is each Claude instance a separate individual, or is it a shared AI? Because I'm not sure I would want an AI that learned about my confidential business information sharing that with anyone else without my express permission.<p>This does not sound like it would be learning general information helpful across an industry, but specific, actionable information.<p>If not available now, is that something that AI vendors are working toward? If so, what is to keep them from using that knowledge to benefit themselves or others of their choosing, rather than the people they are learning from?<p>While people understand ethics, morals, and legality (and ignore them), that does not seem like something an AI understands in a way that might give it pause before taking an action.
I think with MCP and related tech, if Apple just internally went back to the drawing board, integrated the concept of MCP directly into iOS (via the "Apple Intelligence" umbrella), and seamlessly integrated it into the App Store and apps, they would win the mobile race for this.<p>Being Apple, they would have to come up with something novel, like they did with push (where you have _one_ OS process running that delegates to apps rather than every app trying to handle push itself), rather than having 20 MCP servers running. But I think if they did this properly, it would be amazing.<p>I hope Apple is really rethinking their absolutely comical start with AI. I hope they regroup and hit it out of the park (like how Google initially stumbled with Bard but is now hitting it out of the park with Gemini).
This is very cool. Integrations look slick. Folks are understandably hyped; the potential for agents doing "deep research-style" work across broad data sources is real.<p>But the thread's security concerns (permissions, data protection, trust) are dead on. There is also a major authN/Z gap, especially for orgs that want MCP to access internal tools, not just curated SaaS.<p>Pushing complex auth logic (OAuth scopes, policy rules) into every MCP tool feels backwards.<p>* Access-control sprawl. Each tool reinvents security. Audits get messy fast.<p>* Static scopes vs. agent drift. Agents chain calls in ways no upfront scope list can predict. We need per-call, contextual checks.<p>* Zero-trust principles mismatch. Central policy enforcement is the point. Fragmenting it kills visibility and consistency.<p>We already see the cost of fragmented auth: supply-chain hits and credential reuse blowing up multiple tenants. Agents only raise the stakes.<p>I think a better path (and, in full disclosure, one we're actively working on at Pomerium) is to have:<p>* A single access point in front of all MCP resources.<p>* Single sign-on once, then short-lived signed claims flowing downstream.<p>* AuthN separated from AuthZ, with a centralized policy engine that evaluates every request, deny-by-default, in both directions, with hooks for DLP.<p>* Unified management, telemetry, audit log, and policy surface.<p>I'm really excited about what MCP points toward for agents.<p>But without a higher-level way to secure and manage the access, I'm afraid we'll spend years patching holes tool by tool.
Can't wait for the first security incident related to the fundamentally flawed MCP specification, in which an LLM is inadvertently tricked into leaking sensitive data.<p>Increasing the number of "connections" to the LLM increases the risk of a leak, and it gives you more rope to hang yourself with when at least one connection becomes problematic.<p>Now is a <i>great</i> time to be an LLM security consultant.
"To start, you can choose from Integrations for 10 popular services, including Atlassian’s Jira and Confluence, Zapier, Cloudflare, Intercom, Asana, Square, Sentry, PayPal, Linear, and Plaid. ... Each integration drastically expands what Claude can do."<p>Give us an LLM with better reasoning capabilities, please! All this other stuff just feels like a distraction.
Feed Claude the data willingly to learn more about human behavior they can’t scrape or obtain otherwise without consent? Hard pass. I’m not telling any AI any more about what it means to be a creative person because training it how to suffer will only further hurt my job prospects. Nice try, no dice.
The MCP spec as it stands today is pretty half-baked. It's pretty clear that the first edition was trying to emulate STDIO over HTTP, but that meant holding open a connection indefinitely. The new revision tries to solve this by letting you hold open as many connections as you want, but that makes message delivery ordering vague when you have multiple streams open. There even seems to be a part of the spec that is logically impossible; people are wrestling with it in the GitHub issues.<p>Which is to say: I'm not sure it actually wins, technically, over the OpenAI/OpenAPI idea from last year, which was at least easy to understand.
It's only a matter of time before folks write user stories and an LLM takes over for the first draft, then they iterate from there.<p>By the way, that speaks to how important it is to get clear business requirements for work.
I often use Claude 3.7 on programming problems that have never been done before; even extensive web searching brings up zero hits. I understand that this is very uncommon, but my work portfolio is more science than typical programming. Claude 3.7 really "thinks" about the questions I ask, whereas 3.5 regularly drifts into dream mode if asked anything beyond its training data. So if you ask for code easily found on the web, you will see no difference. Try asking things that are not so common and you will see a difference.
On their announcement page they wrote: "In addition to these updates, we're making WEB SEARCH available globally for all Claude users on paid plans."<p>So I tested a basic prompt:<p>1. Go to: SOME URL<p>2. Copy all the content found VERBATIM, and show me all that content as markdown here.<p>Result: it FAILED miserably with a few basic HTML pages; it simply is not loading all the page content in its internal browser.<p>What worked well:
- Gemini 2.5 Pro (Experimental)
- GPT-4o mini
- Gemini 2.0 Flash (not verbatim but summarized)
There is targeted value in integrations, but everything still leads back to larger context windows.<p>I love MCP (it’s way better than plain Claude) but even that runs into context walls.
If you do not enable "Web Search", are you guaranteed it does not access the web anyway?<p>Sometimes I want a pure model answer, and I used to use Claude for that. For research tasks I preferred ChatGPT, but I found that you cannot reliably deny it web access. If you ask it a research question, I am pretty sure it uses web search, even when <i>"Search"</i> and <i>"Deep Research"</i> are off.
Very interesting. The integration videos make it easy to start right away and try out the new features. The expanded deep-reasoning capabilities are also impressive.<p>I think we are heading toward a new automated technology ecosystem where LLMs orchestrate many different pieces of software with each other, speeding up the launch, evolution, and monitoring of products.
Been playing with MCP over the last few days, and it's basically a more streamlined way to define tools/function calls.<p>That plus OpenAI's Agents SDK makes creating agentic flows very easy.<p>On the other hand, you're kind of forced to run these tools/MCP servers in their own process, which makes no sense to me.
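For anyone who hasn't looked at it, defining a tool really is just a named function plus a parameter schema. A minimal sketch roughly following the MCP TypeScript SDK quickstart (exact import paths and method signatures may differ by SDK version, so treat them as approximate):

```typescript
// Minimal MCP server exposing one tool over stdio, roughly per the TS SDK
// quickstart (import paths/signatures are approximate, not authoritative).
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "demo-tools", version: "0.1.0" });

// A tool is just a named function with a typed parameter schema.
server.tool("add", { a: z.number(), b: z.number() }, async ({ a, b }) => ({
  content: [{ type: "text", text: String(a + b) }],
}));

// This runs in its own process, which is the awkward part: the host talks to
// it over stdio (or over HTTP for remote servers) instead of calling it in-process.
const transport = new StdioServerTransport();
await server.connect(transport);
```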
Looks to me like another app ecosystem is coming up, similar to Android or iPhone. We are probably going to see a lot of AI app marketplaces that solve the problems of discovery, billing, and integration with AI hosts like Claude Desktop.
This is great, but can you fix Claude 3.7 and make it more like 3.5? I'm seriously disappointed with 3.7; it seems to be performing significantly worse for me on all tasks.<p>Even my wife, who normally uses Claude to create interesting cookie recipes, has noticed a huge downgrade in 3.7.
I'm quite struck by the title of this announcement. The box being drawn around "your world" shows how narrow the AI builder's window into reality tends to be.<p>> a new way to connect your apps and tools to Claude. We're also expanding... with an advanced mode that searches the web.<p>The notion of software eating the world, and AI accelerating that trend, always seems to forget that the world is a vast, physical thing, one that by its very nature can never be fully consumed by the relentless expansion of our digital experiences. Your worldview /= the world.<p>The cynic would suggest that the teams that build these tools should go touch grass, but I think that misses the mark. The real indictment is of the thinking that improvements to digital tools [intelligences?] in and of themselves can constitute truly substantial and far-reaching changes.<p>The reach of any digital substrate is inherently limited, and this post unintentionally lays that bare. And while I hear accelerationists invoking "robots" as the means for digital agents to push their impact deeper into the real world, I suggest this is the retort of those who spend all day in apps, tools, and the web. The impact and potential of AI are indeed enormous, but some perspective remains warranted, and occasional injections of humility and context would probably do these teams some good.
So will any chat with Claude now just auto-activate web search? What if I want to use it exclusively as a search engine? Also, will proxies like OpenRouter have access to the web search capabilities?
The Plaid integration just lets you look at your own install? I was excited to see all my accounts (as a consumer) knit together and reported on by Claude. Bummer.
I find it absolutely astonishing that Atlassian hasn't yet provided an LLM for Confluence instances and that a third party is required instead. The sheer scale of documentation and information I've seen at some organisations I've worked with is overwhelming; this would be a killer feature. I do not recommend Confluence to my clients simply because the search is so appalling.<p>Keyword search is such a naive approach to information discovery and sharing, and it renders Confluence in big orgs useless. Being able to discuss and ask questions is a more natural way of unpacking problems.
That "Allow for this chat" pop up should be optional. It ruins the entire MCP experience. Maybe make it automatic for non-mutating MCP tools.
Is it just me who would like to see more confirmations before opaque changes are made to remote systems?<p>I might not dare to add an integration if it can potentially add a bunch of stuff to the backing systems without my approval. Confirmations and review should be part of the protocol.
This is awesome. We implemented an MCP client that's fully compatible with the new remote MCP specs, supporting OAuth and all. It's really smooth, and I think it paves the way for AI to work with tools. <a href="https://lutra.ai/mcp" rel="nofollow">https://lutra.ai/mcp</a>
Integrations are nice, but the superpower is having an AI smart enough to operate a computer/keyboard/mouse so it can do anything without the cooperation/consent of the service being used.<p>Lots of people are making moves in this space (including Anthropic), but nothing has broken through to the mainstream.