I'm redoing the naming convention for our S3 keys. We have millions of objects to store on S3. I'm planning on putting them all into one bucket and using the MD5 or SHA-1 digest of each file as its key. I'll keep this synced with a database table that maps an auto-increment GUID (globally unique ID) to the digest of the file.<p>Then I read this:
http://paltman.com/2007/05/29/amazon-s3-and-filename-magic
but I don't really understand the advantage of storing an object on S3 whose key is the hash and which points to the GUID.
One thing I don't want to do is store the GUID as the S3 key (to prevent mass scraping of all the assets).<p>How are you all dealing with this? Is anyone using an MD5 or SHA-1 digest as the key? A salted hash of the GUID as the key?
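To make the "salted hash of the GUID" idea concrete, here is a minimal sketch. It swaps in HMAC-SHA256 for a plain salted hash, since a keyed hash is the usual way to do this. `SECRET` and `s3_key_for_guid` are made-up names for illustration, and the secret would have to live somewhere safe outside the code.

```python
import hashlib
import hmac

# Hypothetical secret (an assumption for this sketch); load it from
# configuration in practice and keep it out of source control.
SECRET = b"replace-with-a-long-random-secret"


def s3_key_for_guid(guid: int, secret: bytes = SECRET) -> str:
    """Derive a deterministic, hard-to-guess S3 key from a database GUID."""
    message = str(guid).encode("ascii")
    # HMAC-SHA256 yields 64 hex characters; without the secret, keys for
    # neighbouring GUIDs cannot be predicted from one another.
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


print(s3_key_for_guid(1))
print(s3_key_for_guid(2))
```

Because the key is derived from the GUID and not from the file contents, nothing extra needs to be stored in the database to find the object again, but changing the secret would change every key.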
I think this is dependent on what you're doing with S3. I'm only using it for backup and use the tarball name and date/timestamp as the key name. This ensures that it's unique and easy to understand when you need to restore.<p>If I were going to use S3 for hosting my videos or something, I'd probably just use the GUID without the need for the MD5. Just make sure there's no chance of duplicates in the database schema and it should work.<p>I use Bacula for network backups and it does the same thing. All paths and filenames are stored as unique ID numbers.
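A rough sketch of the backup-style key described above, assuming a UTC timestamp placed before the tarball name. The exact layout and the `backup_key` name are assumptions, not what the commenter necessarily uses.

```python
from datetime import datetime, timezone


def backup_key(tarball_name: str, when: datetime) -> str:
    """Build a readable key from a timestamp and a tarball name."""
    # Pass a timezone-aware datetime; it is normalized to UTC so keys
    # sort chronologically regardless of where the backup ran.
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"{stamp}-{tarball_name}"


print(backup_key("home.tar.gz", datetime.now(timezone.utc)))
```

Putting the timestamp first means a plain key listing comes back in chronological order, which helps when picking a backup to restore.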
Hmm. I just realized that if the key is the MD5 digest, then I can't independently store/delete identical files. Maybe the best bet is to encrypt the GUID.
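The problem with digest keys can be shown with a small simulation. This is only a sketch: a plain dict stands in for the bucket, SHA-1 is used as the digest, and `put_by_digest` is a hypothetical helper, not an S3 call.

```python
import hashlib

bucket = {}  # stand-in for an S3 bucket: key -> object bytes


def put_by_digest(data: bytes) -> str:
    """Store data under the SHA-1 digest of its contents."""
    key = hashlib.sha1(data).hexdigest()
    bucket[key] = data
    return key


key_a = put_by_digest(b"same bytes")  # file uploaded for record A
key_b = put_by_digest(b"same bytes")  # identical file uploaded for record B
same_key = key_a == key_b

del bucket[key_a]  # deleting record A's file...
b_still_stored = key_b in bucket  # ...also removes record B's copy

print(same_key, b_still_stored)  # prints: True False
```

With content-derived keys, the database would need something like a reference count per digest before any delete; a key derived from the GUID avoids that at the cost of storing duplicates twice.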