科技回声

8 comments

brucethemoose2, almost 2 years ago

> The training recipe and model architecture follow LLaMA

This is huge.

MPT and Falcon are cool, but the inference runtimes and various tooling are mostly optimized for LLaMA. If this is a drop-in replacement for 7B, it's going to catch on much faster than any other small model.

TOMDM, almost 2 years ago

From all the experimentation I've done, 7B parameter models just don't seem to be able to produce useful output reliably enough for my use cases.

What use cases do people have for these smaller LLMs?

brucethemoose2, almost 2 years ago

Also, their metric table is very interesting. It shows Falcon 7B and OpenLlama 7B much less favorably than other evaluations (including the HuggingFace leaderboard, which I am kinda suspicious of), and instruct benchmarks like that aren't seen as much.

profsummergig, almost 2 years ago

If someone could elucidate what these phrases signify, I'd be very grateful:

1) 7B foundational model
2) 8K length
3) 1.5T tokens
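
For anyone reading along, here is a rough back-of-the-envelope sketch of how those three figures are usually read. It is not from the thread or the XGen announcement. It assumes "7B" means about 7 billion parameters, "8K" means a context window of 8 × 1024 tokens, and "1.5T" means 1.5 trillion training tokens. The bytes-per-parameter values (2 for fp16, 0.5 for 4-bit) are common conventions, not XGen specifics, and the names below are made up for illustration.

```python
# Back-of-the-envelope reading of the three figures in the title.
# All constants are assumptions about what the shorthand means.
PARAMS = 7e9                 # "7B": roughly 7 billion learned weights
CONTEXT_TOKENS = 8 * 1024    # "8K length": assuming K = 1024 tokens
TRAIN_TOKENS = 1.5e12        # "1.5T tokens": size of the training corpus


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


fp16_gb = weight_memory_gb(PARAMS, 2)     # 16-bit floats
int4_gb = weight_memory_gb(PARAMS, 0.5)   # 4-bit quantized weights
tokens_per_param = TRAIN_TOKENS / PARAMS  # how much text each weight "saw"

print(f"fp16 weights: ~{fp16_gb:.1f} GB")
print(f"4-bit weights: ~{int4_gb:.1f} GB")
print(f"context window: {CONTEXT_TOKENS} tokens")
print(f"training tokens per parameter: ~{tokens_per_param:.0f}")
```

The memory figures cover the weights only. Activations and the attention cache for a long context come on top of that.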

DanAtC, almost 2 years ago

I have no idea what any of these words mean, but I'd like to. Can someone point me in the direction of an "AI for Dipshits"?

minimaxir, almost 2 years ago

Per the validation perplexity chart shown, the 8K length model performs better than the 4K length model even at <4K length, so why are they even offering the 4K model if the 8K is strictly better?
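
For context on the metric, perplexity is the exponential of the average negative log-likelihood per token, so lower means the model was less "surprised" by the validation text. A minimal sketch of that definition follows. The log-probabilities are made-up illustrative values, not numbers from the XGen chart, and `perplexity` is a hypothetical helper name.

```python
import math


def perplexity(token_log_probs: list[float]) -> float:
    """exp of the mean negative natural-log probability per token."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical per-token probabilities the model gave to the correct token.
confident = [math.log(0.5)] * 4    # right token gets p = 0.5 every time
uncertain = [math.log(0.25)] * 4   # right token gets p = 0.25 every time

print(perplexity(confident))
print(perplexity(uncertain))
```

A model that always gives the correct token probability 1/k has perplexity k, which is why the metric is often read as an effective number of choices per token.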

artemonster, almost 2 years ago

Please recommend a good tutorial/book/video on modern LLMs and NNs in general, for programmers and technical people, one where you actually get the idea of how it works. I tried googling with dozens of queries and the results just suck: a lot of hand-wavy articles for lay people, or paid courses.

foolfoolz, almost 2 years ago

When will the LLM race peak? Have we peaked already?

XGen-7B, a new 7B foundational model trained on up to 8K length for 1.5T tokens
