
OpenAI GPT-4 vs. Groq Mistral-8x7B

105 points by tanyongsheng about 1 year ago

14 comments

wruza about 1 year ago

The prompt, for those interested. I find it pretty underspecified, but maybe that's the point. For example, "Business operating hours" could be expanded a little, because "Closed - Opens at XX" is still non-processable in both cases.

    You are an expert in Web Scraping, so you are capable to find the information in HTML and label them accordingly. Please return the final result in JSON.
    Data to scrape:
    title: Name of the business
    type: The business nature like Cafe, Coffee Shop, many others
    phone: The phone number of the business
    address: Address of the business, can be a state, country or a full address
    years_in_business: Number of years since the business started
    hours: Business operating hours
    rating: Rating of the business
    reviews: Number of reviews on the business
    price: Typical spending on the business
    description: Extra information that is not mentioned yet in any of the data
    service_options: Array of shopping options from the business, for example, in store shopping, delivery and many others. It should be in format -> option_name: true
    is_operating: Whether the business is operating
    HTML: {html}
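Since the prompt does not pin down the reply format, a little glue code is needed around it. A minimal Python sketch, assuming an OpenAI-style chat reply arrives as a string (the abridged template and both helper names are illustrative, not from the article):

```python
import json

# Abridged copy of the article's prompt; "{html}" is the slot for the page.
PROMPT_TEMPLATE = (
    "You are an expert in Web Scraping, so you are capable to find the "
    "information in HTML and label them accordingly. Please return the "
    "final result in JSON. Data to scrape: title, type, phone, address, "
    "years_in_business, hours, rating, reviews, price, description, "
    "service_options, is_operating. HTML: {html}"
)

def build_prompt(html: str) -> str:
    """Fill the HTML slot of the prompt template."""
    return PROMPT_TEMPLATE.format(html=html)

def parse_reply(reply: str) -> dict:
    """Coerce a model reply into a dict, tolerating a markdown code fence."""
    text = reply.strip()
    if text.startswith("```"):
        # Drop the opening ```json line and the closing ``` line.
        text = text.split("\n", 1)[1].rsplit("```", 1)[0]
    return json.loads(text)
```

Underspecified prompts like this push the error handling into the parsing step; tightening the prompt ("reply with JSON only, no prose") usually shrinks that code.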
feintruled about 1 year ago
Brave new world, where our machines are sometimes wrong but by gum they are quick about it.
RUnconcerned about 1 year ago
Finally, something more offensive than parsing HTML with regular expressions: parsing HTML with LLMs.
retrac98 about 1 year ago

There are so many applications for LLMs where having a perfect score is much more important than speed, because getting it wrong is so expensive, damaging, or time-consuming to resolve for an organisation.
infecto about 1 year ago

This test is interesting as a general high-level metric/test, but overall the way they are extracting data using an LLM is suboptimal, so I don't think the takeaway means much. You could extract this type of data using a low-end model like 8x7B with a high degree of accuracy.
emporas about 1 year ago

Mixtral works very well with JSON output in my personal experience. The GPT family are excellent of course, and I would bet Claude and Gemini are pretty good. Mixtral however is the smallest of the models and the most efficient.

Especially running on Groq's infrastructure it's blazing fast. In some examples I ran on Groq's API, the query was completed in 70ms. Groq has released API libraries for Python and JavaScript; I wrote a simple Rust example of how to use the API [1].

Groq's API documents how long it takes to generate the tokens for each request. 70ms for a page of a document is well over 100 times faster than GPT, and the fastest of every other capable model. Accounting for internet latency and whatever queue might exist, the user receives the response in a second. But how fast would this model run locally? Fast enough to generate natural-language tokens, generate a synthetic voice, then listen and decode the next request the user speaks to it, all in real time.

With a technology like that, why not talk to internet services with just APIs and no web interface at all? Just functions exposed on the internet that take JSON as input, validate it, and send JSON back to the user? The same goes for every other interface and button around. Why press buttons on every electric appliance instead of just talking to the machine through a JSON schema? Why should users on an internet forum have to press the "add comment" button every time, instead of just speaking and saying "post it"? Pretty annoying, actually.

[1] https://github.com/pramatias/groq_test
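For the curious, Groq's HTTP API follows the OpenAI chat-completions shape, so the request is just a small JSON document. A hedged Python sketch that only builds the body (the endpoint URL, model name, and JSON-mode support are assumptions based on Groq's documentation of that period; no network call is made here):

```python
import json

# Assumed OpenAI-compatible endpoint; check Groq's docs for the current one.
GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def build_request(prompt: str, model: str = "mixtral-8x7b-32768") -> str:
    """Serialize a chat-completions request body for an extraction call."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,  # favor repeatable output for extraction work
        # JSON mode: assumed supported, as in other OpenAI-compatible APIs.
        "response_format": {"type": "json_object"},
    }
    return json.dumps(body)
```

POSTing this body with an `Authorization: Bearer <key>` header to the endpoint is essentially what the official Python and JavaScript clients do under the hood.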
imaurer about 1 year ago

Groq will soon support function calling. At that point, you would want to describe your data specification and use function calling to do the extraction. Tools such as Pydantic and Instructor are good starting points.

I am collecting these approaches and tools here: https://github.com/imaurer/awesome-llm-json
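As a sketch of that schema-first idea, the fields from the prompt quoted upthread can be declared once as a Pydantic model and every reply validated against it (the types and defaults here are my guesses, not something the article specifies):

```python
from typing import Dict, Optional

from pydantic import BaseModel

class Business(BaseModel):
    """The record the scraping prompt asks for, as a validated schema."""
    title: str                             # the only field assumed mandatory
    type: Optional[str] = None
    phone: Optional[str] = None
    address: Optional[str] = None
    years_in_business: Optional[int] = None
    hours: Optional[str] = None
    rating: Optional[float] = None         # coerced from "4.5" if needed
    reviews: Optional[int] = None
    price: Optional[str] = None
    description: Optional[str] = None
    service_options: Dict[str, bool] = {}  # e.g. {"delivery": true}
    is_operating: Optional[bool] = None
```

Instructor builds on exactly this pattern: pass such a model as the `response_model` and it validates the LLM's reply against the schema, re-asking the model when validation fails.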
bambax about 1 year ago

Interesting post, but isn't the prompt missing? How do the LLMs generate the keys? It's likely the mistakes could be corrected with a better prompt or a post-check.

Also, a Google SERP page is deterministic (it always has the same structure for the same kind of query), so it would probably be much more effective to use AI to write a parser, and then refine and use that.
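The write-a-parser route needs nothing beyond the standard library once the page structure is known. A rough sketch with html.parser (the class and function names are invented for illustration, and void tags such as <br> would need extra care):

```python
from html.parser import HTMLParser

class ClassTextExtractor(HTMLParser):
    """Collect the text of every element carrying a target class."""

    def __init__(self, target_class: str):
        super().__init__()
        self.target = target_class
        self.depth = 0        # nesting depth inside a matching element
        self.results = []
        self._buf = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1   # nested tag inside a match
        elif self.target in classes:
            self.depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)

def extract(html: str, css_class: str):
    """Return the text of all elements with the given class."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    return parser.results
```

A deterministic parser like this runs in microseconds per page, which is the trade-off several commenters here weigh against the flexibility of an LLM.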
tosh about 1 year ago

I initially thought the blog post was about scraping using screenshots and multi-modal LLMs.

Scraping is quite complex by now (front-end JS, deep and irregular nesting, obfuscated HTML, …).
crowdyriver about 1 year ago

There are lots of comments here about how stupid it is to parse HTML using LLMs.

Have you ever had to scrape multiple sites with highly variable HTML?
malux85 about 1 year ago

Sorry to be nit-picky, but that's the essence of these benchmarks: Mistral putting "N/A" for "not available" is weird. In every use I have ever seen, N/A means "not applicable", and the two DON'T mean the same thing. I would expect null for not available and N/A for not applicable.

Impressive inference speed difference, though.
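Whichever convention the model picks, normalizing such sentinels after the fact is cheap insurance. A small sketch (the sentinel set is my own guess at common model outputs, not taken from the benchmark):

```python
# Strings that models sometimes emit in place of a missing value.
SENTINELS = {"n/a", "na", "none", "null", "-", ""}

def normalize(value):
    """Map sentinel strings to None; pass everything else through."""
    if isinstance(value, str) and value.strip().lower() in SENTINELS:
        return None
    return value

def normalize_record(record: dict) -> dict:
    """Apply sentinel normalization to every field of an extracted record."""
    return {key: normalize(val) for key, val in record.items()}
```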
huqedato about 1 year ago

Can somebody explain why this Groq is more performant than Microsoft's infrastructure? Is the LPU better than a TPU/GPU?
ttrrooppeerr about 1 year ago

A bit off-topic, but maybe not: any word on GPT-5? Is that coming? Or is OpenAI just focusing on the Sora model?
dns_snek about 1 year ago

For all the posturing and crypto hate on HN, we're entering a world where it's socially acceptable to use 1000W of computing power and 5 seconds of inference time to parse a tiny HTML fragment that would take microseconds with traditional methods, and people are cheering about it. Time for some self-reflection? That's not very green.