科技回声

10 comments

unstuck3958, more than 1 year ago

It's incredible how accurate the Chatbot Arena Leaderboard [0] is at predicting model performance compared to benchmarks (which can be and are being gamed; see all the 7B models on the HF leaderboard).

[0]: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard

paxys, more than 1 year ago

Has anyone (outside of Google) gotten to play with Gemini Ultra yet? Been hearing a lot about Pro, but I'd be interested in seeing whether Ultra is really close to as capable as they claim.

Also very interesting that Mixtral 8x7B ranks in the same neighborhood as Gemini Pro/GPT 3.5 Turbo/Claude 2.1 while being fully open source and Apache 2.0 licensed.

helsinkiandrew, more than 1 year ago

One thing this doesn't cover is speed/latency of response. Some of the things I'm doing on ChatGPT could probably work on a 90% capable GPT3.5, and then other factors become more of an issue.

If I was already using GCP and they reduced their price (>10%) and offered tight integration with the rest of GCP services, it would still be appealing.

GaggiX, more than 1 year ago

Gemini Pro API charges you by character instead of by token. I guess that's good news for Chinese/Japanese users and speakers of less common languages; for example, the OpenAI tokenizer needs two tokens to encode a single Georgian letter.
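A rough way to see why per-character billing can favor non-Latin scripts, sketched under one assumption: the tokenizer is a byte-level BPE, in which a character never needs more tokens than it has UTF-8 bytes. Bytes per character then give an upper bound on tokens per character. This does not call any real tokenizer or any Gemini/OpenAI API, and the function name `billing_units` and the sample strings are made up for illustration.

```python
def billing_units(text: str) -> dict:
    """Count the units a per-character vs. a byte-level scheme would see.

    Assumption: for a byte-level BPE tokenizer, the UTF-8 byte count is an
    upper bound on the token count. Real token counts are usually lower.
    """
    chars = len(text)                       # what per-character billing counts
    utf8_bytes = len(text.encode("utf-8"))  # worst case for byte-level tokens
    ratio = utf8_bytes / chars if chars else 0.0
    return {"chars": chars, "utf8_bytes": utf8_bytes, "bytes_per_char": ratio}


# Hypothetical sample greetings, chosen only to contrast scripts.
samples = {
    "English": "hello",
    "Georgian": "გამარჯობა",
    "Japanese": "こんにちは",
}

for language, text in samples.items():
    print(language, billing_units(text))
```

Latin letters take one UTF-8 byte each, while Georgian letters and Japanese kana take three, so a tokenizer with few learned merges for those scripts has more room to spend several tokens on a single character.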

m3kw9, more than 1 year ago

Starling 7b alpha is supposedly close to gpt3.5, but when I ask it to list 5 capitals sorted by population size, it runs on seemingly forever, generating 100 capitals, explanations, and gibberish. Not sure if it’s my prompt, but it sure isn’t close to the intelligence of gpt3.5 when asked the same.

Starling:

```
USER
list 5 largest capitals sorted by largest population first

AI
Here are some countries and their capitals in alphabetical order
Australia: Canberra
Belgium: Brussels
Canada: Ottawa
Finland: Helsinki
Italy: Rome
France: Paris
Germany: Berlin
Greece: Athens
Ireland: Dublin
…(lists 50 others)

Gpt3.5
list 5 largest capitals sorted by largest population first

ChatGPT
Certainly! To list the 5 largest capital cities by population, I'll provide you with the information as of my knowledge cutoff date in 2022. Please note that population figures may change over time, and it's a good idea to verify the latest data from reliable sources if needed. Here are the 5 largest capital cities sorted by population, starting with the largest:

Beijing, China: Beijing is the capital of China and has a population of over 21 million people as of 2022.
New Delhi, India: New Delhi is the capital of India and is part of the larger National Capital Territory of Delhi. It has a population of approximately 16.8 million people as of 2022.
Tokyo, Japan: Tokyo is the capital of Japan, and while the city itself is not officially designated as the capital, it serves as the de facto capital. The Tokyo metropolitan area has a population of around 14 million people.
Moscow, Russia: Moscow is the capital of Russia and has a population of over 12 million people.
Dhaka, Bangladesh: Dhaka is the capital of Bangladesh and has a population of approximately 8.9 million people.
```

dang, more than 1 year ago

Submitters: "Please use the original title, unless it is misleading or linkbait; don't editorialize." - https://news.ycombinator.com/newsguidelines.html

(Submitted title was "Gemini Pro achieves accuracy slightly inferior to GPT 3.5 Turbo".)

If you want to say what you think is important about an article, that's fine, but do it by adding a comment to the thread. Then your view will be on a level playing field with everyone else's: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&sort=byDate&type=comment&query=%22level%20playing%20field%22%20by:dang

jiggawatts, more than 1 year ago

Does anyone else have the sinking feeling that GPT 4 is as good as things will get for quite a while?

Someone described LLMs as “blurry JPEGs of the Internet”.

In that sense, maybe GPT 4 is as smart as the hive mind of the Internet gets, and newer models just take sharper pictures but of the same subject. Perhaps GPT 4 trained on one of the best subsets available and everything else is going to be worse or the same…

It’s curious that Sam Altman has publicly stated that OpenAI isn’t working on GPT 5. Why not? Is it because they know it’s a pointless exercise with the current training approaches?

lsy, more than 1 year ago

I don't think "accuracy" is going to be the defining feature of which chatbot succeeds. People just aren't using them for tasks where a 3-5 point difference makes the grade, because the difference between 67 and 100 is more important than the difference between 64 and 67. If you can integrate a relatively speedy bot somewhere people can use it conveniently, that'll get more usage than a slightly more factual response you have to tab out to.

we_love_idf, more than 1 year ago

I don't understand why people keep falling for Google's ad campaign. Google has its lead in AI that plays video games and board games. It is cool, entertaining and all that jazz. But OpenAI and MS are the real leaders in real AI.

jimsimmons, more than 1 year ago

The Gemini white paper reports higher scores on HumanEval and other tasks.

So one of these is true: Google lied, this eval has bugs, or they borked the deployment.

An In-depth Look at Gemini's Language Abilities