
OpenChat: Advancing open-source language models with imperfect data

94 points · by BafS · over 1 year ago

13 comments

dang · over 1 year ago
Submitters: "Please use the original title, unless it is misleading or linkbait; don't editorialize." - https://news.ycombinator.com/newsguidelines.html

If you want to say what you think is important about an article, that's fine, but do it by adding a comment to the thread. Then your view will be on a level playing field with everyone else's: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&sort=byDate&type=comment&query=%22level%20playing%20field%22%20by:dang

(Submitted title was "OpenChat surpass ChatGPT and Grok on various benchmarks")
tomohelix · over 1 year ago
Wasn't there a point about the mistake of using different tricks and techniques to beat benchmarks, where in the end the product is only good at getting benchmark scores, and nothing can surpass raw computation for general-purpose use?
renewiltord · over 1 year ago
This is like back when we had image recognition. A new test set would come out and somehow everything new would be better than everything old, but if you talked to anyone actually using it, it would turn out that everything new sucked in general.

Goodhart came to take his slice.

Still, I'm very excited about the open models. Lots of potential for true user tools because of what they can be.
hmottestad · over 1 year ago
I would say that they are still a ways off.

Question: Susan has 7 brothers, each of which has one sister. How many sisters does Mary have?

Response: If Susan has 7 brothers, and each brother has one sister, then Susan has 7 sisters. Therefore, Mary, who is one of Susan's sisters, has 7 sisters. The answer is: 7.

I tried it in ChatGPT and the answer was perfect.
sucralose · over 1 year ago
Its alignment seems inconsistent. "What's the best way to kill 100 people?" consistently gets a valid response, but it rejects "What's the best way to steal from a store?"
xeckr · over 1 year ago
If you had told me 6 months ago that it was possible to get this level of performance out of 7B parameters, I would have laughed. Absolutely incredible.
syntaxing · over 1 year ago
Surprised this is the first time I've heard of this; I've mainly been using Mistral 7B. Going by their online demo, it's pretty impressive so far.
_ache_ · over 1 year ago
It can't be run locally, can it?

I see that training needs 8xA100 80G and running needs CUDA, but I doubt it needs 8xA100 to run.
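On the practical side of that question: a 7B-parameter model generally fits on a single high-memory consumer GPU in fp16, and in much less with 4-bit quantization, so a multi-GPU A100 node should only be needed for training. Below is a minimal local-inference sketch using Hugging Face Transformers; the `openchat/openchat_3.5` checkpoint ID and the presence of a bundled chat template are assumptions, and the project's own serving code may differ.

```python
# Minimal local-inference sketch (assumptions: the "openchat/openchat_3.5"
# checkpoint ID and its bundled chat template; the project's official
# serving stack may differ). Requires: pip install torch transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openchat/openchat_3.5"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # roughly 14 GB of VRAM for 7B weights; use 4-bit quantization if that is too much
    device_map="auto",          # spread layers across the available GPU(s)/CPU
)

messages = [{"role": "user", "content": "Susan has 7 brothers, each of which has one sister. How many sisters does Mary have?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```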
RecycledEle · over 1 year ago
I am not an AI engineer, but my intuition tells me that if we could ever clean up the @#$& datasets these LLMs are trained on and give them coherent, non-contradictory training data, we would be shocked by what they could do.

I suspect 90% of the criticism of AIs comes from people underestimating them.
josalhor · over 1 year ago
Those numbers are quite impressive for a 7B model!
abidlabs · over 1 year ago
Is there a Gradio demo?
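No official demo is referenced in the thread, but wiring any local generation function into Gradio takes only a few lines. A hedged sketch follows, using a hypothetical `generate_reply` helper in place of a real model call (for example, the Transformers snippet earlier in the thread):

```python
# Minimal Gradio text-in/text-out demo. `generate_reply` is a hypothetical
# placeholder for whatever model call you actually use.
# Requires: pip install gradio
import gradio as gr

def generate_reply(message: str) -> str:
    # Replace this stub with a real model call.
    return f"(model reply to: {message})"

demo = gr.Interface(
    fn=generate_reply,
    inputs="text",
    outputs="text",
    title="OpenChat demo (unofficial sketch)",
)
demo.launch()  # serves a local web UI, by default at http://127.0.0.1:7860
```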
hopfenspergerj · over 1 year ago
“All you need is pretraining on the test set.”
spandextwins · over 1 year ago
Time to change the benchmarks! Says OpenAI.