How can we help? Can crowdsourcing help? Is there a list of tasks that we would want a crowd to do? I ask because we have run a couple of crowdsourcing efforts with college students, collecting story data in Telugu (Chandamama Kathalu) and ASR speech data. Since we have access to the students, we can mobilize them and get this going. We will also be running an internship program for 100,000 students in Telangana as part of Viswam[1] in April, and could include some of this work in that effort.<p>[1] <a href="https://viswam.ai/" rel="nofollow">https://viswam.ai/</a>
From the article: <i>they didn’t release everything—although the model weights are open, the datasets and code used to train the model are not.</i><p>Is that true of Meta Llama as well? Specifically, is the code used to train the model not open? (I know no one releases datasets.) If so, the label "open source" is inappropriate; "open weights" would be more accurate.
Now that things are getting really wild in the LLM space and people seem to run anything that comes along, I did a quick search on the threat model of hosting your own LLM.<p>I didn't find much. Even llama.cpp just reminds you to sandbox and isolate everything if you run untrusted models.<p>I feel we are back in the Windows 95 / early Internet era, when people would run anything without caring about security.
Given DeepSeek's open philosophy, I wonder how they would respond if someone simply asked them for access to the code and data that this project intends to recreate.
For "open source", we will wait until Debian ships them, so we have a guarantee that they are actually "open" and come with "sources". Right now it's a mystery how they produce their binaries.
About the training data, can't the datasets from the Allen Institute's Tulu3 model be used?
They claim to have used a fully open-source training dataset.
Exciting to see this being reproduced; I love the hyper-fast movement in open source!<p>This is exactly why it is not “US vs China”: the battle is between heavily capitalized Silicon Valley companies and open source.<p>Every believer in this tech owes DeepSeek some gratitude, but even they stand on the shoulders of giants: everyone else who pushed the frontier forward and chose to publish, rather than exploit, what they learned.
Super cool to see an open initiative like this. I love the idea of replicating DeepSeek-R1 in a transparent way.<p>I do like the idea of making these reasoning techniques accessible to everyone. If they really manage to replicate DeepSeek-R1's results, especially on a smaller budget, that’s a huge win for open-source AI.<p>I’m all for projects that push innovation and share the process with others, even if it’s messy.<p>But yeah, there are lots of hurdles. They might hit a wall because they don’t have DeepSeek’s original datasets.