TechEcho

1 comment

petertoddabout 7 years ago

This stuff is much simpler than it sounds, and that's a good thing.(1)The intuition here is that with hashed data structures like merkle trees, if you and I agree on a hash digest, then we also agree on all the data hashed by that hash digest. Since you can hash a hash digest, this agreement operates recursively.When we talk about a "query", all we're really saying is Alice wants to prove to Bob that if he runs some function on some datastructure, he'll get a particular answer. For instance, Alice might tell Bob "Hey, you know that merkle tree we both agree on? If you run get_index(10) on it, you'll get the value `foo`"She can prove that to Bob, by running that operation herself on her computer, and then recording what data was accessed during that operation. Next, she serializes that data, and replacing the parts that weren't accessed with their hash digests. Finally, she sends that partial data structure to Bob.When Bob gets that data, he first hashes it to make sure the root hash digest actually matches what he expected. Next he runs the operation, which will return the same result as when Alice ran it. This is a proof for a very simple reason: both sides ultimately did the exact same thing!If it helps, think of virtual memory: the data that Bob is missing, is similar to a page in memory that has been flushed to disk. Unless your computation actually needs the data in that page, it'll run successfully even if your disk isn't working.So why does all this matter? Basically, this is exactly what blockchains are supposed to do: let multiple parties verify the same thing is true, based on the same data. So in, say, Certificate Transparency, the operator of the CT log can generate can extract one of these proofs to convince the webbrowser - who doesn't have all the CT log data - that yes, if you run the operation "fetch cert #12345 from CT log" it will in fact return the cert they're expecting it too (in other words, CT already does exactly this). Where this improves on naive blockchains is simple: now you don't have to have the whole blockchain to verify something.I actually did a Python implementation of these ideas years ago for a client: <a href="https://github.com/proofchains/python-proofmarshal" rel="nofollow">https://github.com/proofchains/python-proofmarshal</a> I'm also (slowly) working on a Rust implementation.FWIW, I think it's an awful example of how a really good idea has been explained terribly by academia. It actually pre-dates Andrew Miller's paper, but the previous example I found never caught on, likely because no-one understood what they were talking about. Heck, Andrew himself tried to explain it to me years ago, and I didn't get it, then went on to reinvent the same thing, which I only realized about another year later. We really need to find better ways to bridge those two worlds. :/1) EDIT: And to be clear, I don't mean for that to come across as a negative way! Rather, it's good that academics have thoroughly proven these techniques actually work. But for programmers, the mechanics of those proofs aren't necessarily all that important. Same as how you can use calculus effectively to solve many problems without necessarily understanding in detail how we actually proved it worked.

Authenticated Data Structures Generically

1 comment

Authenticated Data Structures Generically

1 comment