This is actually a really cool article, though it could be written a bit better, for sure.<p>The new Tesla 8- and 16-bit floating-point formats in the Dojo system support this kind of random rounding up/down using a PRNG when compressing neural-network parameters from higher-precision floats (it is specified in the floating-point paper).<p>The random rounding is needed to avoid bias in the neural-network weight updates, and this article improves the rounding method to be more accurate (closer to true rounding than uniform sampling) without reintroducing bias.
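To make the rounding trick concrete: round a value up or down at random, with the probability of rounding up equal to the value's fractional position between the two grid points. A minimal sketch in Python, where `step` is a stand-in for the spacing of a hypothetical low-precision format (not Tesla's actual format):

```python
import random

def stochastic_round(x, step=0.25):
    """Round x to a multiple of `step`, up or down at random.

    P(round up) equals the fractional position of x between the two
    neighbouring grid points, so E[result] == x and repeated rounding
    adds no systematic bias to accumulated weight updates.
    """
    lo = (x // step) * step          # grid point just below x
    frac = (x - lo) / step           # position of x between lo and lo+step
    return lo + step if random.random() < frac else lo

random.seed(0)
x = 0.6
mean = sum(stochastic_round(x) for _ in range(100_000)) / 100_000
print(abs(mean - x) < 0.01)  # the rounded values average back to x
```

Deterministic round-to-nearest would instead map every 0.6 to 0.5, and that error would accumulate across millions of weight updates.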
I would have naively thought that you just send a 1 with a probability equal to the number you want to send.<p>Is this compared in the post? Some of it went over my head.
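For what it's worth, the naive scheme described here (send a 1 with probability x) is already unbiased, and it needs no shared randomness at all; the catch is the variance of a single bit. A quick simulation (an illustrative sketch, not taken from the post):

```python
import random

def send_bit(x, rng):
    # Naive one-bit scheme: transmit 1 with probability x, for x in [0, 1].
    return 1 if rng.random() < x else 0

rng = random.Random(42)
x = 0.3
bits = [send_bit(x, rng) for _ in range(200_000)]
mean = sum(bits) / len(bits)
var = sum((b - x) ** 2 for b in bits) / len(bits)
print(mean)  # close to x = 0.3: the estimator is unbiased
print(var)   # close to x * (1 - x) = 0.21: a single bit is very noisy
```

My understanding is that the shared randomness is what lets a cleverer scheme keep the unbiasedness while shrinking that per-bit error.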
What are some real-world situations where you can only send a single bit but have shared randomness? That seems like a bizarre constraint. Is this just - and no shame in this - math for math’s sake?
I hate these articles that jump straight into some complex solution without stating the problem clearly... If the problem really is to send a real number using a single bit (and some shared randomness) then clearly that’s just impossible. Next!
According to the paper, this is about sending an estimate of a real number using a finite number of bits (possibly just one). The method aims to reduce the worst-case(?) error by relying on preshared state (in this case, the seed of a PRNG).
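A classical instance of this idea is subtractive dithering: sender and receiver derive the same uniform offset from the preshared seed, the sender transmits one bit, and the receiver subtracts the offset back out. The result is unbiased and its error variance (1/12) does not depend on x, unlike the naive scheme's x(1-x). A sketch of that standard technique, which is not necessarily the exact scheme in the paper:

```python
import math
import random

def send(x, seed):
    # Sender: derive the shared uniform offset u from the preshared seed.
    u = random.Random(seed).random()
    return math.floor(x + u)  # a single bit (0 or 1) for x in [0, 1]

def receive(bit, seed):
    # Receiver: regenerate the same u and subtract it back out.
    u = random.Random(seed).random()
    return bit - u + 0.5

# The error (estimate - x) is uniform on (-1/2, 1/2], so the estimate is
# unbiased and never off by more than 1/2, whatever x is.
x = 0.3
ests = [receive(send(x, s), s) for s in range(200_000)]
mean = sum(ests) / len(ests)
print(abs(mean - x) < 0.01)  # unbiased
```

Without the shared seed the receiver could not undo the offset, which is why the preshared state matters: the randomness hides the quantization grid from the signal, and the receiver removes it again.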
What is "shared randomness" in the context of this article? When I google it I find papers on quantum computing - is that a prerequisite for this implementation?
I see this as the future of machine learning.<p>Using a single bit to communicate weight updates during the learning process reduces the bandwidth required and allows highly parallel training.<p>I suspect in the future we'll even see methods of sub-1-bit weight updates to further decrease bandwidth requirements and keep massive models approximately in sync between distant learning nodes.
Only tangentially related, but in practice how does one send only a single bit of data to a server? With a TCP or UDP connection, each send is actually a much larger packet, and common RPC frameworks like protobuf also encode a single-bit message into a larger structure. I'm sure there's a way; I just can't come up with it.
Can somebody please explain in simpler terms how this works?
If I have a real number, add some (shared) random value to it, and then round it to 0 or 1, how is the receiver able to retrieve the original real value?