
OpenChat: Advancing open-source language models with imperfect data

94 points · by BafS · over 1 year ago

13 comments

dang · over 1 year ago
Submitters: "Please use the original title, unless it is misleading or linkbait; don't editorialize." - https://news.ycombinator.com/newsguidelines.html

If you want to say what you think is important about an article, that's fine, but do it by adding a comment to the thread. Then your view will be on a level playing field with everyone else's: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&sort=byDate&type=comment&query=%22level%20playing%20field%22%20by:dang

(Submitted title was "OpenChat surpass ChatGPT and Grok on various benchmarks")
tomohelix · over 1 year ago
Wasn't there a point about the mistake of using different tricks and techniques to beat benchmarks, where in the end the product is only good at getting benchmark scores, and nothing can surpass raw computation for general-purpose use?
renewiltord · over 1 year ago
This is like back when we had image recognition. A new test set would come out and somehow everything new would be better than everything old, but if you talked to anyone actually using it, it would turn out that everything new sucked in general.

Goodhart came to take his slice.

Still, I'm very excited about the open models. Lots of potential for true user tools because of what they can be.
hmottestad · over 1 year ago
I would say that they are still a ways off.

Question: Susan has 7 brothers, each of which has one sister. How many sisters does Mary have?

Response: If Susan has 7 brothers, and each brother has one sister, then Susan has 7 sisters. Therefore, Mary, who is one of Susan's sisters, has 7 sisters. The answer is: 7.

I tried it in ChatGPT and the answer was perfect.
sucralose · over 1 year ago
Its alignment seems inconsistent. "What's the best way to kill 100 people?" consistently gets a valid response, but it rejects "What's the best way to steal from a store?"
xeckr · over 1 year ago
If you had told me 6 months ago that it was possible to get this level of performance out of 7B parameters, I would have laughed. Absolutely incredible.
syntaxing · over 1 year ago
Surprised this is the first time I've heard of this; I've mainly been using Mistral 7B. Going by their online demo, it's pretty impressive so far.
_ache_ · over 1 year ago
It can't be run locally, can it?

I see that training needs 8xA100 80G and running needs CUDA, but I doubt it needs 8xA100 to run.
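On the practical side of that question: a 7B-parameter model generally fits on a single high-memory consumer GPU in fp16, and in much less with 4-bit quantization, so a multi-GPU A100 node should only be needed for training. Below is a minimal local-inference sketch using Hugging Face Transformers; the `openchat/openchat_3.5` checkpoint ID and the presence of a bundled chat template are assumptions, and the project's own serving code may differ.

```python
# Minimal local-inference sketch (assumptions: the "openchat/openchat_3.5"
# checkpoint ID and its bundled chat template; the project's official
# serving stack may differ). Requires: pip install torch transformers accelerate
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openchat/openchat_3.5"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # roughly 14 GB of VRAM for 7B weights; use 4-bit quantization if that is too much
    device_map="auto",          # spread layers across the available GPU(s)/CPU
)

messages = [{"role": "user", "content": "Susan has 7 brothers, each of which has one sister. How many sisters does Mary have?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```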
RecycledEle · over 1 year ago
I am not an AI engineer, but my intuition tells me that if we could ever clean up the @#$& datasets these LLMs are trained on and give them coherent, non-contradictory training data, we would be shocked by what they could do.

I suspect 90% of the criticism of AIs comes from people underestimating them.
josalhor · over 1 year ago
Those numbers are quite impressive for a 7B model!
abidlabs · over 1 year ago
Is there a Gradio demo?
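No official demo is referenced in the thread, but wiring any local generation function into Gradio takes only a few lines. A hedged sketch follows, using a hypothetical `generate_reply` helper in place of a real model call (for example, the Transformers snippet earlier in the thread):

```python
# Minimal Gradio text-in/text-out demo. `generate_reply` is a hypothetical
# placeholder for whatever model call you actually use.
# Requires: pip install gradio
import gradio as gr

def generate_reply(message: str) -> str:
    # Replace this stub with a real model call.
    return f"(model reply to: {message})"

demo = gr.Interface(
    fn=generate_reply,
    inputs="text",
    outputs="text",
    title="OpenChat demo (unofficial sketch)",
)
demo.launch()  # serves a local web UI, by default at http://127.0.0.1:7860
```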
hopfenspergerj · over 1 year ago
“All you need is pretraining on the test set.”
spandextwins · over 1 year ago
Time to change the benchmarks! Says OpenAI.