Introducing deep research

593 points by mfiguiere, 4 months ago

81 comments

timabdulla, 4 months ago
I just gave it a whirl. Pretty neat, but definitely watch out for hallucinations. For instance, I asked it to compile a report on myself (vain, I know.) In this 500-word report (ok, I'm not that important, I guess), it made at least three errors.

It stated that I had 47,000 reputation points on Stack Overflow -- quite a surprise to me, given my minimal activity on Stack Overflow over the years. I popped over to the link it had cited (my profile on Stack Overflow) and it seems it confused my number of people reached (47k) with my reputation, a sadly paltry 525.

Then it cited an answer I gave on Stack Overflow on the topic of monkey-patching in PHP, using this as evidence for my technical expertise. Turns out that about 15 years ago, I _asked_ a question on this topic, but the answer was submitted by someone else. Looks like I don't have much expertise, after all.

Finally, it found a gem of a quote from an interview I gave. Or wait, that was my brother! Confusingly, we founded a company together, and we were both mentioned in the same article, but he was the interviewee, not I.

I would say it's decent enough for a springboard, but you should definitely treat the output with caution and follow the links provided to make sure everything is accurate.
gorgoiler, 4 months ago
For “deep research” I’m also reading “getting the answers right”.

Most people I talk to are at the point now where getting completely incorrect answers 10% of the time — either obviously wrong from common sense, or because the answers are self contradictory — undermines a lot of trust in any kind of interaction. Other than double checking something you already know, language models aren't large enough to actually _know_ everything. They can only sound like they do.

What I'm looking for is therefore not just the correct answer, but the correct answer in an amount of time that's faster than it would take me to research the answer myself, _and also faster than it takes me to verify the answer given by the machine_.

It's one thing to ask a pupil to answer an exam paper to which you know the answers. It's a whole next level to have it answer questions to which you don't know the answers, and on whose answers you are relying to be correct.
hi_hi, 4 months ago
This is terrifying. Even though they acknowledge the issues with hallucinations/errors, that is going to be completely overlooked by everyone using this, and then injected into their own powerpoints.

Management consulting was bad enough before the ability to mass produce these graphs and stats on a whim. At least there was some understanding behind the scenes of where the numbers came from, and sources would/could be provided.

The more powerful these tools become, the more prevalent this effect of seepage will become.
michaelgiba, 4 months ago
Gemini has had this for a month or two, also named "Deep Research": https://blog.google/products/gemini/google-gemini-deep-research/

Meta question: what's with all of the naming overlap in the AI world? Triton (Nvidia, OpenAI) and Gro{k,q} (X.ai, Groq, OpenAI) all come to mind.
DigitalSea, 4 months ago
Not sure if people picked up on it, but this is being powered by the unreleased o3 model, which might explain why it leaps ahead in benchmarks considerably and aligns with the claims that o3 is too expensive to release publicly. Seems to be quite an impressive model, leading the offerings from Google, DeepSeek and Perplexity.
elashri, 4 months ago
It is actually interesting for people working in academia. I would like to test it, but there is no way I can afford $200/month right now.

Can someone test it with this prompt?

"As a research assistant with comprehensive knowledge of particle physics, please provide a detailed analysis of next-generation particle collider projects currently under consideration by the international physics community.

The analysis should encompass the major proposed projects, including the Future Circular Collider (FCC) at CERN, International Linear Collider (ILC), Compact Linear Collider (CLIC), various Muon Collider proposals, and any other significant projects as of 2024.

For each proposal, examine the planned energy ranges and collision types, estimated timeline for construction and operation, technical advantages and challenges, approximate costs, and key physics goals. Include information about current technical design reports, feasibility studies, and the level of international support and collaboration.

Present a thorough comparative analysis that addresses technical feasibility, cost-benefit considerations, scientific potential for new physics discoveries, timeline to first data collection, infrastructure requirements, and environmental impact. The projects should be compared in terms of their relative strengths, weaknesses, and potential contributions to advancing our understanding of fundamental physics.

Please format the response as a structured technical summary suitable for presentation at a topical meeting of particle physicists. Where appropriate, incorporate relevant figures and tables to facilitate clear comparisons between proposals. Base your analysis on information from peer-reviewed sources and official design reports, focusing on the most current available data and design specifications.

Consider the long-term implications of each proposal, including potential upgrade paths, flexibility for future modifications, and integration with existing research infrastructure."
spyckie2, 4 months ago
Is this ability really a prerequisite to AGI and ASI?

Reasoning, problem solving, research validation - at the fundamental outset it is all refinement thinking.

Research is one of those areas where I remain skeptical it is that important, because the only valid proof is in the execution outcome, not the compiled answer.

For instance, you can research all you want about the best vacuum on the internet, but until you try it out yourself you are going to be caught in between marketing, fake reviews, influencers, etc. Maybe the science fields are shielded from this (by being boring), but imagine medical pharmas realizing that they can get whatever paper to say whatever by flooding the internet with their curated blog articles containing advanced medical “research findings”. At some point you cannot trust the internet at all, and I imagine that might be soon.

I worry especially, with the rapidly changing landscape of generated text on the internet, that research will lose a lot of value due to massive amounts of information garbage.

It will be a thing we used to do when the internet was still “real”.
YmiYugy, 4 months ago
If I understood the graphs correctly, it only achieves a 20% pass rate on their internal tests. So I have to wait 30 minutes and pay a lot of money just to sift through walls of most likely incorrect text? Unless the possibility of hallucinations is negligible, this is just way too much content to review at once. The process probably needs to be a lot more iterative.
6gvONxR4sf7o, 4 months ago
There are some people in the blogosphere who are known experts in their niche, or even niche-famous, because they write popular useful stuff. And there are a ton more people who write useful stuff because they want that 'exposure.' At least, they do in the very broadest sense of writing it for another human to read. I wonder if these people will keep writing when their readership is all bots. Dead internet, here we come.
cye131, 4 months ago
Does anyone actually have access to this? It says available for Pro users on the website today - I have Pro via my employer but see no "deep research" option in the message composer.
adriand, 4 months ago
Feels like only a matter of time before these crawlers are blocked from large swathes of the internet. I understand that they’re already prohibited from Reddit and YouTube. If that spreads, this approach might be in trouble.
airstrike, 4 months ago
"Deep research" is now somehow synonymous with searching online for stats and pulling stuff from Statista? And when I want to make changes to that report, do I have to tweak my prompt and get an entirely different document?

Not sure if I'm too tired and can't see it, but the lack of images/examples of the resulting report in this announcement doesn't inspire a lot of confidence just yet.
Havoc, 4 months ago
The descriptions of the product sounded substantially more impressive than the actual samples, tbh.

Still, I think there is a big market for this sort of "go away for 30 mins and figure this out" style agent.
jmount, 4 months ago
I had no idea there was a market for "Compile a research report on how the retail industry has changed in the last 3 years. Use bullets and tables where necessary for clarity." I imagine reading such a result is pure torture.
ejang0, 4 months ago
Can anyone confirm if this is available in Canada and other countries? This site says "We are still working on bringing access to users in the United Kingdom, Switzerland, and the European Economic Area." But I'm not sure about other countries. I don't have Pro currently, only Plus.
VerdisQuo5678, 4 months ago
The accuracy of this tool does not matter. This is exclusively designed for box-ticking "reports" that nobody reads and that are produced for their own sake.
thefourthchime, 4 months ago
OpenAI has a deep bench. I bet they pushed this out to change the narrative about DeepSeek.
throwaway123lol, 4 months ago
This is so lame. This feels like another desperate attempt to stay relevant, cobbled together after the DeepSeek announcement last week. What was the other attempt they made? Skip a version number to seem like more progress was made (o1 -> o3)? From what I can tell, "o3" is just the same as o1 with an extra reasoning-effort parameter.

Oh, and "Deep research" is available to people on the $200 per month plan? Lol - cool. I've been using DeepSeek a lot more recently and it's so incredibly good, even with all the scaling issues.
wilg, 4 months ago
I think this looks cool. Apparently unlike everyone else on this website?
usaar333, 4 months ago
Overall impressive.

Though the jump for GAIA relative to SOTA is not that high, especially given that this is o3.
jasonjmcghee, 4 months ago
Surprised more comments aren't mentioning that DeepSeek has this feature (for free) already. Assuming this is why OpenAI scrambled to release it.

The examples they have on the page work well on chat.deepseek.com with the r1 and search options both enabled.

Do I blindly trust the accuracy of either, though? Absolutely not. I'm pretty concerned about these models falling into gaming SEO and finding inaccurate facts and presenting them as fact. (How easy is it to fool / prompt inject these models?)

But it has utility if held right.
kenjackson, 4 months ago
If it has access to play by play data for all sports this could be an absolute playground for amateur sports statisticians. The possibilities…
highfrequency, 4 months ago
Can it compile and run (non-Python) code as part of its tool use? Compile-run steps always seemed like they would be a huge value add during reasoning loops - it feels very silly to get output from ChatGPT, try to run it in terminal, get an error and paste the error to have ChatGPT immediately fix it. Surely it should be able to run code during the reasoning loop itself?
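(A rough, hypothetical sketch of the loop being described, in Python: compile the model's output, and feed any compiler errors straight back instead of pasting them by hand. Here `ask_model` is a stand-in for whatever chat-completion call you already use, and gcc is just an example target; nothing below is claimed to be how Deep Research itself works.)

```python
import os
import subprocess
import tempfile

def compile_errors(source: str) -> str:
    """Check a C snippet with gcc; return '' on success, else the compiler's error text."""
    with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        result = subprocess.run(
            ["gcc", "-fsyntax-only", path],  # syntax/type check only, no linking
            capture_output=True, text=True,
        )
        return "" if result.returncode == 0 else result.stderr
    finally:
        os.unlink(path)

def repair_loop(ask_model, prompt: str, max_rounds: int = 3) -> str:
    """ask_model is a hypothetical callable: prompt string in, code string out."""
    code = ask_model(prompt)
    for _ in range(max_rounds):
        errors = compile_errors(code)
        if not errors:
            return code  # compiled cleanly, stop iterating
        # Feed the compiler output back to the model and ask for a fix.
        code = ask_model(f"{prompt}\n\nThis attempt failed to compile:\n{errors}\nPlease fix it.")
    return code
```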
pjs_, 4 months ago
McKinsey mode
bilater, 4 months ago
Not quite the agent they are building, but I have an open source alternative that lets you use a variety of models, based on links of your choice, to generate reports: https://github.com/btahir/open-deep-research
pazimzadeh, 4 months ago
> In Nature journal's Scientific Reports conference proceedings from 2012, in the article that did not mention plasmons or plasmonics, what nano-compound is studied?

Isn't there more than one article that did not mention plasmons or plasmonics in Scientific Reports in 2012?

Also, did they pay for access to all journal contents? That would be useful.
ldjkfkdsjnv, 4 months ago
So much cynicism and hate in these comments, especially as we are likely witnessing AGI come to life. It's still early, but it might be coming. Where is the excitement? This is an interesting time to be alive.

HN has a huge cultural problem that makes this website almost irrelevant. All the interesting takes have moved to X/Twitter.
rajnathani, 4 months ago
I remember that about 10-15 years ago, Ray Kurzweil (who still works at Google) or someone at Google had this idea for what Google should be able to do: deep research by itself from a simple search query. I can't find the source. Obviously it didn't pan out without transformers.
anon373839, 4 months ago
Setting aside how well it works, I think this is a pretty nice demonstration of how to do UX for an agentic RAG app. I like that the intermediate steps have been pushed out to a sidebar, with updates that both provide some transparency about the process and make the high latency more palatable.
z7, 4 months ago
Business and technical analysis of DeepSeek's entire R&D history, with extrapolations:

https://chatgpt.com/share/67a0d59b-d020-8001-bb88-dc9869d52b2e
picografix, 4 months ago
I think deep research as a service could be a really strong use case for enterprises, as long as they have access to non-public data. I assume that most of this guarded data is high quality, and seeing progress in these areas might end up being even more impressive than it is now.
RandomWorker, 4 months ago
I'm a researcher and honestly not worried. 1. Developing the right question has always been the largest barrier to great research; I'm not sure OpenAI can develop the right question without the human experience. 2. The second biggest part of my role is influencing people that my questions are the right questions, which is made easier when you have a thorough understanding of the first. That being said, I'm sure there will be many people here who will tell me that algorithms already influence people, and that AI can think through much of any issues there are.

I do use these systems from time to time, but they just never render any specific information that would make for great research.
xt00, 4 months ago
"will find, analyze, and synthesize hundreds of online sources"

Synthesize? Seems like the wrong word -- I think they would want to say something like, "analyze, and synthesize useful outputs from hundreds of online sources".
littlestymaar, 4 months ago
There's an open source version of this that you can use with local LLMs: https://www.reddit.com/r/LocalLLaMA/comments/1gvlzug/i_created_an_ai_research_assistant_that_actually/

In fact, it's been three months since its release, and I wouldn't even be surprised if OpenAI took inspiration from it.
gwerbret, 4 months ago
To anyone who's tried it: how does it handle captchas? I can't imagine that OpenAI's IP addresses are anyone's favorites for unfettered access to web properties these days.
Alifatisk, 4 months ago
When I was new to LLMs, I used Bing AI in a fun way. When I was writing my report, it was sometimes hard to find discussions or material about a certain topic.

What I did was ask Bing AI about that topic, and it returned information as well as sources for where it found it, so I picked up all those links and researched them myself.

Bing AI was a great resource for finding relevant links. This was until I found out about Perplexity; my life hasn't been the same since.
Bjorkbat, 4 months ago
Actually sounds pretty cool, but the graph on expert level tasks is confusing my expectations. Saying it has a pass rate of less than 20% sounds a lot like saying this thing is wrong most of the time.

Granted, these strike me as difficult tasks and I'd likely ask it to do far simpler things, but I'm not really sure what to expect from looking at these graphs.

Ah, but the fact that it bothers to cite its sources is a huge plus. Between that and its search abilities, it sounds valuable to me.
layer8, 4 months ago
From the demo: “Use bullets and tables where necessary for clarity.” It’s weird that it would be necessary to specify that. I suppose they want to showcase that you can influence the output style, but it’s strange that you’d have to explicitly specify the use of something that is “necessary for clarity”. It comes across as either a flaw in the default execution, or as a merely performative incantation.
joanfihu, 4 months ago
There is no way I'll read all that text from the demos...

AskPandi has a similar feature called "Super Search" that essentially checks more sources and self-validates its own answers.

iT's AgEnTic.

The answers are easier to digest; if you search for products, you'll get a list of products with images, prices and retailers.
gqgs, 4 months ago
If I'm understanding this correctly, it sounds functionally similar to the report-generating project Stanford released a few weeks ago [1].

[1] https://storm.genie.stanford.edu/
rob_c, 4 months ago
Feels more and more like OpenAI doesn't have "that next big thing".

To be clear, I'm constantly impressed with what they have and what I get as a customer, but the delivery since 4 hasn't exactly been in line with Altman's Musk-tier vaporware promises...
martin82, 4 months ago
They release one expensive but mostly useless gimmick after the other.

No one is ever going to trust the output this generates, and then they need to spend so much time fact-checking that they might as well do the entire research from scratch themselves...
Xuban, 4 months ago
This makes sense. I often use the normal search feature to research a very large amount of information, and it mostly does not work well. If the new search feature increases the number of websites scraped and the pertinence of those websites, I'm all in.
esafak, 4 months ago
Is there a benchmark where we can compare this against You.com's research mode? It looks like R1 forced them to release o3 prematurely and give it Internet access. And they didn't want to say they released o3, so they called it "Deep Research".
lolpanda, 4 months ago
"Synthesize large amounts of online information" -- does this heavily depend on the search engine's performance and the relevance of the search results? I don't see any mention of Google or Bing. Is this using their internal search engine, then?
chrismarlow9, 4 months ago
This smells like when Google released Gemini to have a product in the space.
monkeydust, 4 months ago
What's a decent setup to replicate this via an open model and agent framework? One thing I have struggled with is getting comprehensive web searches using an agentic framework.
therealmarv, 4 months ago
I don't know. OpenAI is so bad at naming... the average person on the street will confuse DeepSeek with Deep Research. Also not to forget o1, o3... 4o.
getnormality, 4 months ago
The demo on global e-commerce trends seems less useful than a Google search, where the AI answer will at least give you links to the claimed information.
prng2021, 4 months ago
"Deep research was trained using end-to-end reinforcement learning."

Does this mean they skipped supervised fine-tuning, like DeepSeek did with R1?
TheGradfather, 4 months ago
The OpenAI Deep Research graph showing tool calls vs. pass rate reveals something fascinating about how these models handle increasing amounts of information. The relationship follows a logistic curve that plateaus around 16% pass rate, even as we allow more tool calls.

This plateau behavior reflects something deeper about our current approach to AI. We've built transformer architectures partly inspired by simplified observations of human cognition -- particularly how our brains use attention mechanisms to filter and process information. And like human attention, these models have inherent constraints: each attention layer normalizes scores to sum to 1, creating a fixed "attention budget" that must be distributed across all inputs.

A recent paper (https://arxiv.org/abs/2501.19399) explores this limitation, showing how standard attention becomes increasingly diffuse with longer contexts. Their proposed "Scalable-Softmax" helps maintain focused attention at longer ranges, but still shows diminishing returns -- pushing the ceiling higher rather than eliminating it.

But here's the deeper question: As we push toward AGI and potentially superintelligent systems, should we remain bound by architectures modeled on our current understanding of human cognition? The human brain's limited attention mechanism evolved under specific constraints and for specific purposes. While it's remarkably effective for human-level intelligence, it might be fundamentally limiting for artificial systems that could theoretically process information in radically different ways.

Looking at the Deep Research results through this lens, the plateau might not just be a technical limitation to overcome, but a sign that we need to fundamentally rethink how artificial systems could process and integrate information. Instead of trying to stretch the capabilities of attention-based architectures, perhaps we need to explore entirely different paradigms of information processing that aren't constrained by biological analogues.

This isn't to dismiss the remarkable achievements of transformer architectures, but rather to suggest that the path to AGI might require breaking free from some of our biologically-inspired assumptions. What would an architecture that processes information in ways fundamentally different from human cognition look like? How might it integrate and reason about information without the constraints of normalized attention?

Would love to hear thoughts from others working on these problems, particularly around novel approaches that move beyond our current biological inspirations.
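(A minimal sketch of the "fixed attention budget" point above: because softmax weights must sum to 1, the largest weight one query can put on any single token tends to shrink as the context grows, so attention gets more diffuse. The random scores here are purely illustrative, not a claim about any particular model.)

```python
import numpy as np

def max_softmax_weight(context_len: int, rng) -> float:
    # Unnormalized attention scores for one query over `context_len` keys.
    scores = rng.normal(size=context_len)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()  # softmax: the weights always sum to 1
    return weights.max()      # share of the budget the single best key gets

rng = np.random.default_rng(0)
for n in (16, 256, 4096, 65536):
    print(f"context {n:6d}: max weight {max_softmax_weight(n, rng):.4f}")
```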
tomrod, 4 months ago
I'm not sure if this is worth a subscription. DSPy and DeepSeek R1 can already move in this direction, if I understand right.
axpy906, 4 months ago
Don't most researchers have a local setup plugged into Ollama so that they do NOT share their search information?
sivm, 4 months ago
I used it once to research language learning and had my pro mode taken away pending review for abuse.
freehorse, 4 months ago
I love that when "open"AI has released things this last year or so, they do not actually release them. So we all get the chance in the meantime to enjoy a bunch of speculative, shilling comments here about this next great thing being miles ahead of competitors / close to AGI / the tool that will actually do the X thing that others complain LLMs are so far failing to do.
PartiallyTyped, 4 months ago
I feel that a lot of this can already be achieved via aider (not affiliated), and any of the top models.
resters, 4 months ago
Still not seeing access on my account.
sharpshadow, 4 months ago
Are they launching a new feature to get the attention back after some other AI got it?
enknamel, 4 months ago
So they did RAG on the whole internet? Basically a Google search results summary, but better?
titzer, 4 months ago
It's great that none of these AI models are being foisted on us by advertising companies.
ldjkfkdsjnv, 4 months ago
Say whatever you want about OpenAI, they are shipping more than any other company on the planet.
reader9274, 4 months ago
I think we're all reaching AI fatigue. Fewer and fewer people care anymore.
dazzaji, 4 months ago
Late Sunday night, I gained access to OpenAI's newly launched Deep Research and immediately tested it on a draft blog post about Uniform Electronic Transactions Act (UETA) compliance and AI-agent error handling [1]. Here's what I found:

Within minutes, it generated a detailed, well-cited research report that significantly expanded my original analysis, covering:
* Legal precedents & case law interpretations (including a nuanced breakdown of UETA Section 10).
* Comparative international frameworks (EU, UK, Canada).
* Real-world technical implementations (Stripe's AI-driven transaction handling).
* Industry perspectives & business impact (trust, risk allocation, compliance).
* Emerging regulatory standards (EU AI Act, FTC oversight, ISO/NIST AI governance).

What stood out most was its ability to:
- Synthesize complex legal, business, and technical concepts into clear, actionable insights.
- Connect legal frameworks, industry trends, and real-world case studies.
- Maintain a business-first focus, emphasizing practical benefits.
- Integrate 2024 developments with historical context for a deeper analysis.

The depth and coherence of the output were comparable to what I would expect from a team of domain experts -- but delivered in a fraction of the time.

From the announcement: Deep Research leverages OpenAI's next-generation model, optimized for multi-step research, reasoning, and synthesis. It has already set new performance benchmarks, achieving 26.6% accuracy on Humanity's Last Exam (the highest of any OpenAI model) and a 72.57% average accuracy on the GAIA Benchmark, demonstrating advanced reasoning and research capabilities.

Currently available to Pro users (with up to 100 queries per month), it will soon expand to Plus and Team users. While OpenAI acknowledges limitations -- such as occasional hallucinations and challenges in source verification -- its iterative deployment strategy and continuous refinement approach are promising.

My key takeaway: This LLM agent-based tool has the potential to save hours of manual research while delivering high-quality, well-documented outputs. Automating tasks that traditionally require expert-level investigation, it can complete complex research in 5-30 minutes (just 6 minutes for my task), with citations and structured reasoning.

I don't see any other comments yet from people who have actually used it, but it's only been a few hours. I'd love to hear how it's performing for others. What use cases have you explored? How did it do?

(Note: This review is based on a single use case. I'll provide further updates as I conduct broader testing.)

[1] https://www.dazzagreenwood.com/p/ueta-and-llm-agents-a-deep-dive-into
gigatexal, 4 months ago
Ok, so I do this as a noob in some field. How do I know whether to trust the research conclusions? How do I know it hasn't hallucinated its conclusions? I'll likely have to do my own research just to verify it, and if I did, I might as well have done the research myself.
corentin88, 4 months ago
Curious about the use cases here. Building AI Agents? But which one?
regularjack, 4 months ago
Of course, they had to weasel the word "deep" in there.
EcommerceFlow, 4 months ago
Can't even get Sunday nights off trying to keep up, fml.
DoctorOetker, 4 months ago
Would formalizing Wiles' proof of Fermat's Last Theorem be considered deep research? Is it able to formalize it in, say, Metamath's set.mm?

Or is the position of OpenAI that Wiles' proof is incomplete?
auggierose, 4 months ago
The flow reminds me a bit of undermind.ai.
tmnvdb, 4 months ago
Eating popcorn while the scaling doubters scramble to move the goalposts for the nth time.
taran_narat, 4 months ago
Isn't this just Perplexity?
teleforce, 4 months ago
What a coincidence, releasing deep research for your product when one of your main competitors has DeepSeek R1 as their best-performing version /s

Seriously, for the past 20+ years it has been hard to imagine doing research without Google's platforms, namely Google Search, Scholar, Patents and Books, but now it seems LLM-based agent AI is the way to go. Twenty years in the future, it will be hard to imagine doing research without them. But as many people have already pointed out, Google is probably the best company by far to perform this emerging AI-based research. In data ecosystem terms (refer to any book on data engineering), Google has already performed the most important upstream data preparation and data engineering activities, including data ingestion and transformation. Now, given their vast amount of processed data, they can just serve it to downstream data analytics or AI for performing research with as few errors/hallucinations as possible. According to Google there is no moat for any company against open-source LLMs, but if any company does have a moat, it will be Google itself.
jaco6, 4 months ago
I see lots of warranted skepticism about the capabilities of this tool, but the reality is that this is an incremental step toward full automation of white collar labor. No, it will not make all analysts jobless overnight. But it may reduce hiring of said people by 5 or 10 percent. And as people get better at using the tool and the tool itself gets better, those numbers will grow. Remember that it took decades for the giant pool of typing secretaries in Mad Men to disappear, but they did disappear. Gone forever. Interestingly, anger about the diminishment of secretarial male white collar work in Germany due to the spread of the typewriter a few decades earlier was one of the drivers of the Nazi Party's popularity (see Evans, The Rise of the Third Reich).

AI's triumph in the white collar workplace will be gradual, not instantaneous. And it will be grimly quiet, because no one likes white collar workers the way they like blue collar workers, for some odd reason, and there's no tradition of solidarity among white collar workers. Everyone will just look up one day and find that the local Big Corp headquarters is... empty.
ADeerAppeared, 4 months ago
I'm sorry, but what the fuck is this product pitch?

Anyone who's done any kind of substantial document research knows that it's a _NIGHTMARE_ of chasing loose ends & citogenesis.

Trusting an LLM to critically evaluate every source and to be deeply suspect of any unproven claim is a ridiculous thing to do. These are not hard reasoning systems; they are probabilistic language models.
tucnak, 4 months ago
Look who's copying who now. They added _the_ button!
febin, 4 months ago
Is this "deep research" tool exploiting open knowledge creators, using their work without compensation?
spyckie2, 4 months ago
Why is HN not creating a policy against moral priggery? There is no useful discussion here anymore.

Seriously begging the mods to take a closer look, or at least PG not to abandon his curated internet space.
RayVR, 4 months ago
Each release from OpenAI gives me less hope for them and this whole AI boom. They should be leading the charge in highlighting how the current generation of LLMs fail, not churning out half-baked, overhyped products.

Yes, they can do some cool tricks, and tool calling is fun. No one should trust the output of these models, though. The hallucinations are bad, and my experience with the "reasoning" models is that as soon as they fuck up (they always do), they go off the rails worse than the base LLMs.
rvz, 4 months ago
It appears that OpenAI is in panic mode after the release of DeepSeek. Before, they were confident competing against Google on any AI model they released.

Now they are scrambling against open source after their disastrous Operator demonstration, and using this deep research demo as cover. Nothing here that Google or Perplexity could not already do themselves.

By the end of the month, this feature is going to be added by a bunch of other open-source projects, and it won't be as interesting very quickly.
nycdatasci, 4 months ago
Pro user. No access, like everyone else.

OpenAI is very much in an existential crisis, and their poor execution is not helping their cause. Operator or "deep research" should be able to assume the role of a Pro user, run a quick test, and reliably report on whether this is working before the press release, right?
smusamashah, 4 months ago
> "[It] can sometimes hallucinate facts in responses or make incorrect inferences, though at a notably lower rate than existing ChatGPT models, according to internal evaluations. It may struggle with distinguishing authoritative information from rumors, and currently shows weakness in confidence calibration, often failing to convey uncertainty accurately."

Taken from the limitations section.

These tools are just good at creating pollution. I don't see the point of delegating (not just) research where 1% blatant mistakes are acceptable. These need much better grounding before being handed out to the masses.

I cannot take any output from these tools (Google summaries, comment summaries by Amazon, YouTube summaries, etc.) while knowing for a fact that some of it is a total lie. I cannot tell which part is the lie. E.g. if an LLM says that in any given text the sentiment is divided, it could be just one person with an opposing view.

If the same task were given to a person, I could reason with that person on _any_ conclusion. These tools will reason on their hallucinations.