
Thefastest.ai

60 points by zkoch, about 1 year ago

13 comments

anonzzzies, about 1 year ago
Groq with Llama 3 70B is so fast and good enough for what we do (source code stuff) that it's really quite painful to work with most others now. We replaced most of our internal integrations with it and everything is great so far. I guess they will be bought soon?
saltsaman, about 1 year ago
A couple of things:

1. Filtering by model should be enabled by default. Mixtral-8x7B-Instruct on Perplexity is almost as fast as the 7B Llama 2 on Fireworks, but they are quite different in size.

2. Pricing is a very important factor that is not included.

3. Overall service reliability should also be an important signal.
passion__desire, about 1 year ago
I don't understand why we would need to have the same expectations of systems that we have of humans, and build a whole theory on it. I can adjust my behaviour around systems; I am not restricted to operating within default values. For example, whenever a price is listed as $99, I automatically read it as $100. Marketing gimmicks don't work once you know about them; in other words, expectations can be reset in a new environment.
pants2, about 1 year ago
Another good resource: https://artificialanalysis.ai/
pants2, about 1 year ago
I'd be interested to hear how Llama 8B with long chain-of-thought prompts compares to GPT-4 one-shot prompts for real-world tasks.

In classification, for example, you could ask Llama 8B to reason through each possibility, rank them, rate them, make counterarguments, etc., all in the same time that GPT-4 would take to output one classification without reasoning. Which does better?
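A minimal sketch of the experiment this comment describes, assuming an OpenAI-compatible chat API; the model names, labels, and prompts are illustrative, not from the thread:

```python
import time
from openai import OpenAI  # assumes the OpenAI Python SDK; any OpenAI-compatible endpoint works

client = OpenAI()

LABELS = ["billing", "bug report", "feature request"]

def classify_one_shot(text: str, model: str = "gpt-4") -> str:
    """One-shot classification: ask for the label directly, no reasoning."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": f"Classify this ticket as one of {LABELS}. Reply with the label only.\n\n{text}",
        }],
    )
    return resp.choices[0].message.content.strip()

def classify_with_cot(text: str, model: str = "llama-3-8b") -> str:
    """Chain-of-thought classification: reason through each label, then decide."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": (
                f"For each label in {LABELS}, argue for and against it for the ticket "
                "below, rate each 1-10, then end with 'ANSWER: <label>'.\n\n" + text
            ),
        }],
    )
    # Take whatever follows the final 'ANSWER:' marker as the label.
    return resp.choices[0].message.content.rsplit("ANSWER:", 1)[-1].strip()

ticket = "The app crashes every time I open the settings page."
for fn in (classify_one_shot, classify_with_cot):
    start = time.time()
    print(fn.__name__, "->", fn(ticket), f"({time.time() - start:.1f}s)")
```

If the fast model's tokens-per-second advantage is large enough, the long reasoning trace and the one-shot answer finish in comparable wall-clock time, which is the trade-off the comment is probing.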
pants2, about 1 year ago
There are dozens of AI chip startups out there with wild claims about speed. Groq seems like the first to actually prove it by launching a product. I hope they spur a speed war with other chipmakers to make the fastest inference engine.
geor9e, about 1 year ago
I love this. Latency is the worst part about AI. I use the lowest-latency models that give adequate answers. I do wish this site gave an average and standard deviation. For example, Groq fluctuates wildly depending on the time of day. They're ranked pretty poorly at "610ms" here, and I definitely encounter far worse from them sometimes, but it's wicked fast at other times.
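Computing the average and standard deviation the commenter asks for takes only a few lines, assuming you can repeatedly time a provider's endpoint yourself; the URL and payload below are hypothetical placeholders:

```python
import statistics
import time
import requests  # time any HTTP inference endpoint you want to profile

def sample_latency_ms(url: str, payload: dict, n: int = 20) -> list[float]:
    """Time n identical requests and return per-request latencies in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        requests.post(url, json=payload, timeout=30)
        samples.append((time.perf_counter() - start) * 1000)
    return samples

samples = sample_latency_ms("https://api.example.com/v1/completions",
                            {"prompt": "ping", "max_tokens": 1})
print(f"mean: {statistics.mean(samples):.0f} ms, "
      f"stdev: {statistics.stdev(samples):.0f} ms")
```

Sampling across different times of day, as the commenter suggests, would expose exactly the fluctuation a single point estimate like "610ms" hides.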
ankerbachryhl, about 1 year ago
I've been looking for a simple overview like this for a while; I've spent too much time benchmarking models/regions myself. Thank you for creating it!
akozak, about 1 year ago
It'd be nice to have a similar site, but for cost per token.
cedws, about 1 year ago
Any idea which one Copilot uses? I'm interested in exploring ways to get autocomplete suggestion latency down.
jxy, about 1 year ago
No prompt length? For practical purposes, prompt processing time would be far more important.
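One rough way to see the prompt-processing cost this comment points at, assuming a streaming OpenAI-compatible API (the model name is illustrative): measure time-to-first-token, which can't go to zero until the whole prompt has been processed and therefore grows with prompt length.

```python
import time
from openai import OpenAI  # assumes an OpenAI-compatible streaming API

client = OpenAI()

def time_to_first_token(model: str, prompt: str) -> float:
    """Seconds until the first streamed event arrives; this interval is
    dominated by prompt processing at long context lengths."""
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        max_tokens=1,
    )
    for _ in stream:  # the first event marks the end of prompt processing
        return time.perf_counter() - start
    raise RuntimeError("stream produced no events")

# Vary prompt length to watch prompt-processing time dominate.
for words in (10, 1000, 10000):
    prompt = "word " * words
    print(words, "words ->", f"{time_to_first_token('llama-3-8b', prompt):.2f}s")
```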
tikkun, about 1 year ago
I wish it also included latency for speech-to-text and text-to-speech APIs :)
CharlesW, about 1 year ago
Groq really has an unfortunate name. (I assume they had theirs before Grok.)