Nice idea and all, but I think their methodology was totally unsuited to the task. What they actually assessed was whether an LLM can recognize an accurate summary of a given article, not whether the LLM can produce an accurate summary itself. Here's their description of it:<p>> We used a 3-way verified hand-labeled set of 373 news report statements and presented one correct and one incorrect summary of each. Each LLM had to decide which statement was the factually correct summary.<p>The problem with this approach is its assumption that if an LLM can recognize an accurate summary, it'll be able to reliably produce accurate summaries. We know very little about the inner workings of LLMs right now, and what we do know suggests that they work highly counterintuitively, so I think there's no basis to make this assumption.
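For concreteness, the benchmark they describe amounts to a two-alternative forced choice: the model sees one correct and one incorrect summary of a statement and is scored on whether it picks the correct one. A minimal sketch of that kind of harness (the `shorter_option_model` stub and the sample items are hypothetical illustrations, not from the paper):

```python
import random

def evaluate_forced_choice(model, items, seed=0):
    """Score a model on a 2-way forced-choice task: for each item the
    model sees a correct and an incorrect summary in shuffled order
    and must pick the correct one. Returns accuracy over all items."""
    rng = random.Random(seed)
    correct = 0
    for statement, good, bad in items:
        options = [good, bad]
        rng.shuffle(options)
        pick = model(statement, options)  # model returns index 0 or 1
        if options[pick] == good:
            correct += 1
    return correct / len(items)

# Hypothetical stand-in for an LLM call: always picks the shorter option.
def shorter_option_model(statement, options):
    return min(range(2), key=lambda i: len(options[i]))

items = [
    ("Report A", "correct summary of A", "an incorrect, longer summary of A"),
    ("Report B", "correct summary of B", "wrong summary"),
]
print(evaluate_forced_choice(shorter_option_model, items))  # → 0.5
```

Note that nothing in this setup ever asks the model to *generate* a summary — it only measures discrimination between two given candidates, which is exactly the gap the comment above is pointing at.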
it costs $0.001 per 1K tokens, which is slightly cheaper than GPT-3.5-turbo. I just tested it and it performs far worse on the tasks in my pipelines. Not a game changer, unfortunately.
"It is not too much of a stretch to conclude that a system that is better at telling factual from non-factual sentences is better at not making them up in the first place – or alternatively could decide through a two stage process if it was being inconsistent."<p>Stretching aside, how does one follow from the other?
also means that OpenAI can just swap in Llama 2 and increase their capacity by orders of magnitude<p>this is the age (or year) of token price arbitrage
Almost as good as GPT-4? I hear that claim quite often. And then when I test the claim, it falls far short. I want real competition in this space. But currently, there is none. Except maybe for a few narrow corner cases.
Anyscale's business model was completely disrupted by OpenAI. They are trying to shift to providing hosting/fine-tuning for open-source LLMs, but the model performance will get crushed by GPT-4 and newer OpenAI models. In theory an alternative to OpenAI models is nice; in reality, Anyscale is now competing with Azure/AWS/etc. to provide model hosting.<p>Their original compute platform for running arbitrary ML workloads will become obsolete as the industry consolidates around LLMs.