Ask HN: Why mining power is useless for LLM training?

5 points by chaoz_ 3 months ago
I don't know anything about crypto, but why is it technically so difficult to transform hash function computations into (at least partially) decentralized LLM training? Have there been any attempts to make it at least somewhat useful?

4 comments

a_tartaruga 3 months ago
It is in principle a good idea. There are plenty of ongoing attempts to do something similar but less cool than this with GPU networks like Render etc. From what I've heard they all suck at AI workloads so far. For years I've heard of networks specifically designed for model training in the works, but I haven't noticed anything make it out that looks serious. But the details are tricky. You can immediately start finding a wall of interesting problems if you start thinking about doing this.

Hashing is extremely easy to verify and therefore extremely easy to pay out for. More complicated computations are more expensive and complicated to verify. You can in principle run verifiable compute (SNARKs, STARKs) on training procedures to be sure distributed trainers are doing something that deserves to be paid out for. But how do you break this down so that it is incremental? How do you sequence updates? How do you ensure availability of updated weights so that trainers aren't withholding them to gain an advantage? i.e. where does the data go? You're probably not keeping 100 billion parameters on chain. How do you keep the costs of all this from ballooning like crazy (verifiable compute is kind of expensive)?

From the little I know about these models, the data is pretty important. How do you keep data quality high? How do you verify that data quality is good? Probably you are committing to public datasets and verifying against them. Is that good enough in this crazy world where the state of the art is training on the entire public web? How do you get a commitment to "the internet" to prove against? How do you make sure trainers aren't redoing the same datapoint over and over?

I think you can solve all this with enough work. Especially as the cost curve for verifiable compute keeps descending, you will probably find doors opening. Or, if you bite the bullet and trust hardware vendors, you can maybe have something with a decent security model that is practical today using trusted enclaves. But you've got to solve a lot of problems.
Comment #43186706 not loaded
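A toy illustration (plain Python, with a made-up `grad_fn` and difficulty, not anyone's actual protocol) of the verification asymmetry this comment describes: a proof-of-work nonce is checked with a single hash call, while checking a claimed training update essentially means redoing the training step, unless you pay for verifiable compute or trust an enclave.

```python
import hashlib

def verify_pow(header: bytes, nonce: int, difficulty_bits: int = 20) -> bool:
    """Checking mining work costs one hash, no matter how long the search took."""
    digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0

def verify_training_step(weights, batch, claimed_grad, grad_fn, tol=1e-6):
    """Checking a claimed gradient update: the verifier has to redo the whole
    forward/backward pass itself (or lean on SNARKs/STARKs or trusted enclaves)."""
    recomputed = grad_fn(weights, batch)   # as expensive as the work being verified
    return all(abs(a - b) <= tol for a, b in zip(recomputed, claimed_grad))
```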
lostdog 3 months ago
LLM training depends on centralization. You want to do a global update of all your weights as quickly as possible. Distributing the weight updates and synchronizing only occasionally lets the weights drift around aimlessly, and is very inefficient.

To optimize LLM training, you want to put as many GPUs as close together as possible, with the fastest interconnect you can build.
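A rough numpy sketch (toy functions, not how any real framework does it) of the trade-off this comment points at: synchronous data-parallel training averages every worker's gradient before any weight changes, while loosely synchronized "local SGD" lets each replica wander on its own between occasional averaging steps.

```python
import numpy as np

def synchronous_step(replicas, grad_fns, lr=0.1):
    """Datacenter-style: all-reduce the gradients every step, so every replica
    applies the same averaged update and the copies never diverge."""
    g = np.mean([f(w) for w, f in zip(replicas, grad_fns)], axis=0)
    return [w - lr * g for w in replicas]

def local_sgd(replicas, grad_fns, steps, sync_every, lr=0.1):
    """Internet-scale style: each worker updates its own copy, and the copies are
    only averaged every `sync_every` steps, so they drift apart in between."""
    for t in range(steps):
        replicas = [w - lr * f(w) for w, f in zip(replicas, grad_fns)]
        if (t + 1) % sync_every == 0:
            avg = np.mean(replicas, axis=0)
            replicas = [avg.copy() for _ in replicas]
    return replicas
```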
datadrivenangel 3 months ago
1. Different silicon / different physical computer bits that don't do the right operations. 2. Cryptomining tends to be massively parallelizable (network friendly), while most AI training ends up being bandwidth limited, and so there are more benefits to larger single nodes.

But also, a lot of general-purpose graphics cards that work for crypto are also decent at AI!
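Some illustrative (not measured) arithmetic behind the bandwidth point: a miner only ever has to report a winning nonce, while naive data-parallel training exchanges something on the order of the full gradient every step.

```python
# Back-of-envelope numbers, chosen for illustration only.
params = 7e9                  # a 7B-parameter model
grad_bytes = params * 2       # fp16/bf16 gradient, ~14 GB exchanged per step

home_uplink = 25e6 / 8        # 25 Mbit/s residential uplink, in bytes/s
datacenter_link = 900e9       # ~900 GB/s NVLink-class interconnect, in bytes/s

print(f"gradient traffic per step: {grad_bytes / 1e9:.0f} GB")
print(f"over a home uplink:        {grad_bytes / home_uplink / 3600:.1f} hours")
print(f"over a datacenter link:    {grad_bytes / datacenter_link * 1e3:.0f} ms")
# A mining rig, by contrast, only sends back a few bytes when it finds a nonce.
```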
codegladiator 3 months ago
I hope someone makes a lmgtfy but with llm instead of g.