An In-depth Look at Gemini's Language Abilities

118 points by tbruckner, over 1 year ago

10 comments

unstuck3958, over 1 year ago
It's incredible how accurate the Chatbot Arena Leaderboard [0] is at predicting model performance compared to benchmarks (which can be and are being gamed; see all the 7B models on the HF leaderboard).

[0]: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard
paxys, over 1 year ago
Has anyone (outside of Google) gotten to play with Gemini Ultra yet? I've been hearing a lot about Pro, but I'd be interested in seeing whether Ultra is really close to as capable as they claim.

It's also very interesting that Mixtral 8x7B ranks in the same neighborhood as Gemini Pro/GPT 3.5 Turbo/Claude 2.1 while being fully open source and Apache 2.0 licensed.
helsinkiandrew, over 1 year ago
One thing this doesn't cover is speed/latency of response. Some of the things I'm doing on ChatGPT could probably work on a 90% capable GPT3.5, and other factors become more of an issue.

If I was already using GCP and they reduced their price (>10%) and offered tight integration with the rest of GCP services, it would still be appealing.
GaggiX, over 1 year ago
The Gemini Pro API charges you by character instead of by token. I guess that's good news for Chinese/Japanese users and speakers of less common languages; for example, the OpenAI tokenizer needs two tokens to encode a single Georgian letter.
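
As a rough illustration of the per-character versus per-token point above, the sketch below compares the number of characters in a string with the number of UTF-8 bytes it occupies. This is only a proxy under a stated assumption: byte-level tokenizers start from UTF-8 bytes, so scripts whose letters take several bytes can end up costing more tokens per letter. The helper name `char_and_byte_counts` and the sample strings are hypothetical, and byte counts are not actual token counts for any real tokenizer.

```python
# Illustrative only: compares character count with UTF-8 byte count.
# Byte counts are NOT token counts; they only hint at why a byte-level
# tokenizer may spend more than one token on a single non-Latin letter.

def char_and_byte_counts(text: str) -> tuple[int, int]:
    """Return (number of Unicode code points, number of UTF-8 bytes)."""
    return len(text), len(text.encode("utf-8"))


# Hypothetical sample strings, chosen only for illustration.
samples = {
    "English": "hello",
    "Georgian letter": "ა",  # U+10D0, GEORGIAN LETTER AN
    "Japanese letter": "こ",  # U+3053, HIRAGANA LETTER KO
}

for name, text in samples.items():
    chars, n_bytes = char_and_byte_counts(text)
    print(f"{name}: {chars} chars, {n_bytes} UTF-8 bytes")
```

Under per-character billing, each sample letter above counts once regardless of how many bytes it needs, which is the property the comment describes as favorable for non-Latin scripts.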
m3kw9, over 1 year ago
Starling 7b alpha is close to gpt3.5, but asking it to list 5 capitals sorted by population size makes it run on like forever, generating 100 capitals, explanations, and gibberish. Not sure if it's my prompt, but it sure isn't close to the intelligence of gpt3.5 when asked the same thing.

Starling:

```
USER
list 5 largest capitals sorted by largest population first

AI
Here are some countries and their capitals in alphabetical order
Australia: Canberra
Belgium: Brussels
Canada: Ottawa
Finland: Helsinki
Italy: Rome
France: Paris
Germany: Berlin
Greece: Athens
Ireland: Dublin
…(lists 50 others)

Gpt3.5

list 5 largest capitals sorted by largest population first

ChatGPT
Certainly! To list the 5 largest capital cities by population, I'll provide you with the information as of my knowledge cutoff date in 2022. Please note that population figures may change over time, and it's a good idea to verify the latest data from reliable sources if needed. Here are the 5 largest capital cities sorted by population, starting with the largest:

Beijing, China: Beijing is the capital of China and has a population of over 21 million people as of 2022.
New Delhi, India: New Delhi is the capital of India and is part of the larger National Capital Territory of Delhi. It has a population of approximately 16.8 million people as of 2022.
Tokyo, Japan: Tokyo is the capital of Japan, and while the city itself is not officially designated as the capital, it serves as the de facto capital. The Tokyo metropolitan area has a population of around 14 million people.
Moscow, Russia: Moscow is the capital of Russia and has a population of over 12 million people.
Dhaka, Bangladesh: Dhaka is the capital of Bangladesh and has a population of approximately 8.9 million people.
```
dang, over 1 year ago
Submitters: "Please use the original title, unless it is misleading or linkbait; don't editorialize." - https://news.ycombinator.com/newsguidelines.html

(Submitted title was "Gemini Pro achieves accuracy slightly inferior to GPT 3.5 Turbo".)

If you want to say what you think is important about an article, that's fine, but do it by adding a comment to the thread. Then your view will be on a level playing field with everyone else's: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&sort=byDate&type=comment&query=%22level%20playing%20field%22%20by:dang
jiggawatts, over 1 year ago
Does anyone else have the sinking feeling that GPT 4 is as good as things will get for quite a while?

Someone described LLMs as “blurry JPEGs of the Internet”.

In that sense, maybe GPT 4 is as smart as the hive mind of the Internet gets, and newer models just take sharper pictures of the same subject. Perhaps GPT 4 was trained on one of the best subsets available and everything else is going to be worse or the same…

It’s curious that Sam Altman has publicly stated that OpenAI isn’t working on GPT 5. Why not? Is it because they know it’s a pointless exercise with the current training approaches?
lsy, over 1 year ago
I don't think "accuracy" is going to be the defining feature of which chatbot succeeds. People just aren't using them for tasks where a 3-5 point difference makes the grade, because the difference between 67 and 100 is more important than the difference between 64 and 67. If you can integrate a relatively speedy bot somewhere people can use it conveniently, that'll get more usage than a slightly more factual response you have to tab out to.
we_love_idf, over 1 year ago
I don't understand why people keep falling for Google's ad campaign. Google has its lead in AI that plays video games and board games. That is cool, entertaining, and all that jazz. But OpenAI and MS are the real leaders in real AI.
jimsimmons, over 1 year ago
The Gemini white paper reports higher scores on HumanEval and other tasks.

So one of these is true: Google lied, this eval has bugs, or they borked the deployment.