I am currently working on a web application that allows users to upload files. There is much more to it than this, of course, but I am asking the following question solely in relation to the storage of files.<p>I am planning on using a CDN (such as Amazon's S3) and my question is simple: how effective is obfuscating the names of publicly available files using a UUID for security purposes? For example, naming a file something like this: <i>4b013ca21ba608373efb4717.jpg</i>.<p>I would be fascinated to hear any thoughts from the HN users. Certain questions spring to mind, like:<p>- How easy would these be to guess? Can there be any guarantees of uniqueness?<p>- What is the best algorithm to create them?<p>- Should I just make them all private and use authenticated access controls?<p>I understand that this is an application-specific question that depends on what level of security I require, among other factors. But, for the purposes of this discussion, let's just say that it needs to be high, but not Fort Knox: if a file <i>was</i> compromised, it would not be critical.<p>Thank you in advance for any help. :)
Random file name: you can create an MD5 hash of the original file name and use that for the public file name. From what I understand, it's extremely unlikely for two strings to hash to the same value. It's also long and nearly impossible to guess.<p>Security: I had a similar need for my website and figured that if I'm the only one who knows about the URL, then it's secure. I was dead wrong. Some browser plug-ins look at each URL you enter and spider them. I know this is true because I started to see Alexa hit unpublished admin URLs on my website.<p>Unpublished URLs != security.
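A minimal sketch of the naming scheme the parent describes, in Python (the helper name is mine; note that anyone who can guess the original file name can recompute this hash):

```python
import hashlib
import os

def obfuscated_name(original_name: str) -> str:
    # Hash the original file name; keep the extension so the
    # content type stays obvious in the public URL.
    ext = os.path.splitext(original_name)[1]
    return hashlib.md5(original_name.encode("utf-8")).hexdigest() + ext
```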
It's not clear what you're trying to secure against. Are you worried about securing a particular image so that only certain designated users can see it? Are you worried about the original name "leaking"? Are you worried about someone iterating through all of your images?<p>In general, relying on a "secret" URL is not a good way to keep things secret. Google has a nasty habit of finding URLs you thought had no links. Definitely tune your robots.txt to keep the images off legit search engines.
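For example, a robots.txt that keeps well-behaved crawlers away from an uploads directory (the path is a placeholder) could look like:

```
User-agent: *
Disallow: /uploads/
```

Keep in mind robots.txt only deters legitimate crawlers, and it advertises the directory it protects, so never list individual secret URLs in it.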
Obfuscation via a suitably long, unpredictable URL <i>can</i> work, but as others have noted, there are a number of ways such a URL can leak -- toolbars/browser plug-ins being one of them. Also, the fact that a user can easily forward the URL to others to grant access may be a bug or a feature, depending on your preferences.<p>One often-overlooked leak: the 'Referer' [sic] header. If your document is hypertext, includes outlinks to elsewhere, and the authorized user(s) click those links, those outlink-target sites may receive your confidential URL as a 'Referer' header. As some sites then publish their referrers, in one way or another, the 'secret' URL could wind up in public.<p>Remember this before creating a 'Competitors' page with outlinks on a 'login-required' but plain-HTTP wiki!
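As an aside, browsers that honor a referrer policy can be told not to send the header at all; a minimal illustration, assuming you control the page containing the outlinks:

```
<meta name="referrer" content="no-referrer">
```

This only helps for links clicked from pages you serve; it does nothing once the URL has been copied elsewhere.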
First of all, Amazon S3 is absolutely not a CDN.<p><i>How easy would these be to guess?</i><p>A version-4 UUID contains 122 random bits, so the chance of two randomly generated UUIDs colliding is roughly 1 in 5.3e36.<p><i>What is the best algorithm to create them?</i><p><a href="http://en.wikipedia.org/wiki/Universally_Unique_Identifier#Implementations" rel="nofollow">http://en.wikipedia.org/wiki/Universally_Unique_Identifier#I...</a><p><i>Should I just make them all private and use authenticated access controls?</i><p>If your users don't want other people peeking at their files (Dropbox): yes, absolutely. If your users don't care (HotOrNot), then no.
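For illustration, generating such a name with a version-4 UUID from the Python standard library (the function name is my own):

```python
import uuid

def random_object_name(ext: str = ".jpg") -> str:
    # uuid4() draws 122 random bits; .hex gives the 32 hex
    # characters with the dashes stripped.
    return uuid.uuid4().hex + ext
```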
There are two things to consider. First, you need to make sure that your filename is a long, unique, hard-to-guess string that is easy to generate. This rules out all five UUID versions:<p>Version 1: MAC + timestamp => easy to predict.<p>Version 2: MAC + some other static data + partial timestamp => also easy to predict.<p>Version 3: MD5 of some file or random string => if an attacker has a file, he can generate the MD5 hash himself and see if you also have this file.<p>Version 4: random data => needs a good entropy source.<p>Version 5: same as version 3, but with SHA1.<p>Your best bet IMHO is to use the HMAC of the file. This defends you against the flaws that a plain unique ID would have.<p>The second step is to ensure that your secret links don't leak. You can employ robots.txt to disallow robots and use dereferer.org or anonym.to to hide referers, but you still won't be secure, as someone can still copy and paste the link. If that is OK with you, then you can stop reading now.<p>You could of course add an EC2 machine between the user and your S3 storage that makes sure each link only works once, but this would be expensive and counterproductive. However, Amazon S3 allows you to create a request that can be made via HTTP GET and that is only valid until a specific time. (See the API documentation, chapter "Authentication and Access Control".) This allows you to generate a new URL every time you want to serve a file to your client; the URL will only be valid for a specific time period. The downside is that this adds signing work on every request and that all caching on the user side will be useless.<p>Good luck!
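A sketch of the HMAC idea in Python, assuming a secret key that lives only on your servers (the key and function name below are placeholders of mine). Unlike a plain MD5/SHA1 of the file, an attacker who holds the same file cannot reproduce the name without the key:

```python
import hashlib
import hmac

# Assumption: a long random key stored server-side (e.g. in config),
# never shipped to clients.
SECRET_KEY = b"replace-with-a-long-random-secret"

def hmac_name(file_bytes: bytes, ext: str = ".jpg") -> str:
    # HMAC-SHA256 of the file contents, keyed with the server secret.
    return hmac.new(SECRET_KEY, file_bytes, hashlib.sha256).hexdigest() + ext
```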
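And a stdlib-only sketch of the time-limited S3 GET URL described above, following the query-string authentication format in that chapter of the S3 API docs (the credentials are placeholders; in practice an SDK would build this for you):

```python
import base64
import hashlib
import hmac
import time
from urllib.parse import quote

# Assumptions: placeholder AWS credentials.
AWS_ACCESS_KEY = "AKIAIOSFODNN7EXAMPLE"
AWS_SECRET_KEY = b"your-secret-access-key"

def presigned_url(bucket: str, key: str, lifetime_s: int = 300) -> str:
    # Absolute Unix time at which the link stops working.
    expires = int(time.time()) + lifetime_s
    # Canonical string for a plain GET: verb, empty Content-MD5 and
    # Content-Type, expiry, and the resource path.
    string_to_sign = f"GET\n\n\n{expires}\n/{bucket}/{key}"
    signature = base64.b64encode(
        hmac.new(AWS_SECRET_KEY, string_to_sign.encode("utf-8"),
                 hashlib.sha1).digest()
    ).decode("ascii")
    return (f"https://{bucket}.s3.amazonaws.com/{key}"
            f"?AWSAccessKeyId={AWS_ACCESS_KEY}"
            f"&Expires={expires}"
            f"&Signature={quote(signature, safe='')}")
```

Each call produces a fresh URL, which is exactly why client-side caching suffers, as noted above.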
Let me try to answer your question in a different way.<p>There are two things: security and the perception of security. For argument's sake, even if we assume that the method of simply creating UUIDs for file names is secure (which, as several people have said, is not a valid assumption), I would argue that it does not provide a good enough perception of security.<p>So, if your users are really concerned about security and afraid that others will look at their files, a simple solution like the above would just not cut it. You will have to really convince them that your system is secure. No matter what you use -- UUID, MD5, SHA1/2/256, etc. -- to them it won't make a difference, and that may mean that you will lose users.<p>Based on that, my suggestion would be to do what you said above: make them all private and use authenticated access control. This will provide security as well as the perception of security, and get you more satisfied users.
The secret filename is not only on the wire but will also show up in history and logs, which makes it a step down from even HTTP Basic authentication. I'd use at least Digest authentication for anything that matters in any way.
If you are suggesting that you leave these secret URLs open to the entire internet and just rely on them being undiscoverable, you're looking for trouble. Depending on the type of documents, at the least your users may not be happy about them being unauthenticated, and I personally would worry about legal issues. Add authentication.