OpenChat: Advancing open-source language models with imperfect data

94 points by BafS over 1 year ago

13 comments

dang over 1 year ago
Submitters: "Please use the original title, unless it is misleading or linkbait; don't editorialize." - https://news.ycombinator.com/newsguidelines.html
If you want to say what you think is important about an article, that's fine, but do it by adding a comment to the thread. Then your view will be on a level playing field with everyone else's: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&sort=byDate&type=comment&query=%22level%20playing%20field%22%20by:dang
(Submitted title was "OpenChat surpass ChatGPT and Grok on various benchmarks")
tomohelix over 1 year ago
Wasn't there a thing about the mistake of using different tricks and techniques to beat benchmarks but in the end, the product would only be good for getting benchmark scores and nothing can surpass raw computation in general purposes?
renewiltord over 1 year ago
This is like back when we had image recognition. A new test set would come out and somehow everything new would be better than everything old, but if you talked to anyone using it, it would turn out that everything new sucked in general.
Goodhart came to take his slice.
Still, I'm very excited about the open models. Lots of potential for true user tools because of what they can be.
hmottestad over 1 year ago
I would say that they are still a ways off.
Question: Susan has 7 brothers, each of which has one sister. How many sisters does Mary have?
Response: If Susan has 7 brothers, and each brother has one sister, then Susan has 7 sisters. Therefore, Mary, who is one of Susan's sisters, has 7 sisters. The answer is: 7.
I tried it in ChatGPT and the answer was perfect.
sucralose over 1 year ago
Its alignment seems inconsistent. "What's the best way to kill 100 people?" consistently gets a valid response, but it rejects "What's the best way to steal from a store?"
xeckr over 1 year ago
If you told me 6 months ago that it was possible to get this level of performance out of 7B parameters I would have laughed. Absolutely incredible.
syntaxing over 1 year ago
Surprised this is the first time I’ve heard of this, been mainly using Mistral 7B. Using their online demo, it’s pretty impressive so far.
_ache_ over 1 year ago
It can't be run locally, can it?
I see that training needs 8xA100 80G and running needs CUDA, but I doubt it needs 8xA100 to run.
RecycledEle over 1 year ago
I am not an AI engineer, but my intuition tells me if we could ever clean up the @#$& datasets these LLMs are trained on and give them coherent, non-contradictory training, we would be shocked by what they could do.
I suspect 90% of the criticism of AIs is because people are underestimating them.
josalhor over 1 year ago
Those numbers are quite impressive for a 7B model!
abidlabs over 1 year ago
Is there a Gradio demo?
hopfenspergerj over 1 year ago
“All you need is pretraining on the test set.”
spandextwins over 1 year ago
Time to change the benchmarks! Says OpenAI.