I guess I'm bearish?

It's not that they *trained a new model*; it's that they *took an existing model* and RL'd it a bit?

The scores are very close to QwQ-32B, and at the end:

"Overall, as QwQ-32B was already extensively trained with RL, it was difficult to obtain huge amounts of generalized improvement on benchmarks beyond our improvements on the training dataset. To see stronger improvements, it is likely that better base models such as the now available Qwen3, or higher quality datasets and RL environments are needed."