Deep Research is now available on Gemini 2.5 Pro Experimental

88 points by extesy, about 1 month ago

5 comments

doctoboggan, about 1 month ago
> In our testing, raters preferred the reports generated by Gemini Deep Research powered by 2.5 Pro over other leading deep research providers by more than a 2-to-1 margin.

Are these raters experts in the field the report was written on? Did they rate the reports on factuality, breadth, and insight?

These sorts of tests (and RLHF in general) are the reason that LLMs often respond with "Great question, you are exactly right to wonder..." or "Interesting insight, I agree that...". I do not want this obsequious behavior, I want "correct answers"[0]. We need better benchmarks when it comes to human preference.

[0]: I know there is no objective correct answer for some questions.
jeffbee, about 1 month ago
I stumbled across the feature a few hours ago. I had asked Gemini why there's a hole in the middle of the city of Azusa, topologically speaking. It had given me a useless tautological response: because they never annexed it. Then it offered to create a research report and I agreed. Five minutes later I got a notification on my mobile that the report was ready. It had 120 sources including assessor's maps, historical maps, court cases, and narrative articles. The text that went along with it was too verbose and still contained paragraphs of vague stuff, but it had key information linking the Mexican land grants, the founding of the city, and other events of history. Very impressive.
DadBase, about 1 month ago
Deep research used to mean spending a weekend with grep and a coffee pot. Now it’s just autocomplete with a confidence interval.
pizzly, about 1 month ago
Just tested it on a case we had been working on for months, so we could better validate the output. We found it was really good at finding websites from Google searches, and it can navigate websites. From that it gave a good, comprehensive review of the case. Where it failed was in searching online databases, for example a business register. If a search result does not contain the exact same keyword, it will not review that result. In our case the keyword appeared only inside the document behind the search result, so it missed this key information. Overall very good, but it still needs some work.
infecto, about 1 month ago
Has anyone tested Google's functionality vs. ChatGPT's? I have played around with it lightly, but felt that ChatGPT's implementation generally sounded a little more educated and took on whatever persona was necessary well.