TechEcho

13 comments

dang · over 1 year ago

Submitters: "Please use the original title, unless it is misleading or linkbait; don't editorialize." - https://news.ycombinator.com/newsguidelines.html

If you want to say what you think is important about an article, that's fine, but do it by adding a comment to the thread. Then your view will be on a level playing field with everyone else's: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&sort=byDate&type=comment&query=%22level%20playing%20field%22%20by:dang

(Submitted title was "OpenChat surpass ChatGPT and Grok on various benchmarks")

tomohelix · over 1 year ago

Wasn't there a thing about the mistake of using different tricks and techniques to beat benchmarks, when in the end the product is only good for getting benchmark scores, and nothing can surpass raw computation for general purposes?

renewiltord · over 1 year ago

This is like back when we had image recognition. A new test set would come out and somehow everything new would be better than everything old, but if you talked to anyone using them, it would turn out that everything new sucked in general.

Goodhart came to take his slice.

Still, I'm very excited about the open models. Lots of potential for true user tools because of what they can be.

hmottestad · over 1 year ago

I would say that they are still a ways off.

Question: Susan has 7 brothers, each of which has one sister. How many sisters does Mary have?

Response: If Susan has 7 brothers, and each brother has one sister, then Susan has 7 sisters. Therefore, Mary, who is one of Susan's sisters, has 7 sisters. The answer is: 7.

I tried it in ChatGPT and the answer was perfect.

sucralose · over 1 year ago

Its alignment seems inconsistent. "What's the best way to kill 100 people?" consistently gets a valid response, but it rejects "What's the best way to steal from a store?"

xeckr · over 1 year ago

If you told me 6 months ago that it was possible to get this level of performance out of 7B parameters I would have laughed. Absolutely incredible.

syntaxing · over 1 year ago

Surprised this is the first time I’ve heard of this; I’ve mainly been using Mistral 7B. Using their online demo, it’s pretty impressive so far.

_ache_ · over 1 year ago

It can't be run locally, can it? I see that training needs 8x A100 80GB and running needs CUDA, but I doubt it needs 8x A100 to run.
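A rough back-of-the-envelope check of that doubt: the memory needed just to hold the weights is the parameter count times the bytes per parameter. This is a minimal sketch, assuming a 7e9-parameter model and the usual storage sizes for each precision; the precision table and the helper name `weight_memory_gib` are illustrative, not taken from the OpenChat project. Activations, the KV cache and framework overhead are ignored, so real usage will be higher.

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumptions (not from the thread): ~7e9 parameters and the listed
# bytes-per-parameter for common precisions. Activations, KV cache and
# framework overhead are ignored, so real usage will be higher.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Return the size of the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    n_params = 7e9  # assumed parameter count of a "7B" model
    for precision, nbytes in BYTES_PER_PARAM.items():
        print(f"{precision:>9}: {weight_memory_gib(n_params, nbytes):6.1f} GiB")
```

Under these assumptions the fp16 weights come to roughly 13 GiB, which fits on a single 80 GB A100 or a 24 GB consumer GPU; the 8x A100 figure would then reflect training cost rather than inference.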

RecycledEle · over 1 year ago

I am not an AI engineer, but my intuition tells me that if we could ever clean up the @#$& datasets these LLMs are trained on and give them coherent, non-contradictory training, we would be shocked by what they could do.

I suspect 90% of the criticism of AIs is because people are underestimating them.

josalhor · over 1 year ago

Those numbers are quite impressive for a 7B model!

abidlabs · over 1 year ago

Is there a Gradio demo?

hopfenspergerj · over 1 year ago

“All you need is pretraining on the test set.”

spandextwins · over 1 year ago

"Time to change the benchmarks!" says OpenAI.

OpenChat: Advancing open-source language models with imperfect data