Dealing with user-submitted content on a 200+ million user platform must really try the patience of Facebook's security researchers.<p>It would be great if Facebook had a security research group that openly <i>published the results of their findings</i>, at least the findings that don't expose corporate secrets. They almost certainly have seen it "all."
The parse_url vulnerability is probably a good example of why you don't want to use blacklists for filtering out malicious input; you want to use whitelists, and then you want to reconstitute the thing you parsed into a form that can be parsed unambiguously.<p>parse_url(" javascript:alert('hello')") yields<p><pre><code> Array
(
[path] => javascript:alert('hello')
)
</code></pre>
which clearly does not have a URL scheme on any whitelist you might apply. Even if it had incorrectly claimed the scheme was "http", the reconstitution step would give you a URL like "<a href="http://localhost/%20javascript:alert(hello)" rel="nofollow">http://localhost/%20javascript:alert(hello)</a>", which would avoid the problem.
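To make the whitelist-plus-reconstitution idea concrete, here is a minimal sketch in PHP; validate_user_url is a made-up helper for illustration, not anything Facebook actually uses:<p><pre><code> &lt;?php
 // Whitelist + reconstitution for user-supplied URLs.
 function validate_user_url($input) {
     $parts = parse_url(trim($input));
     if ($parts === false || empty($parts['scheme']) || empty($parts['host'])) {
         return null; // reject anything without an explicit scheme and host
     }
     // Whitelist of allowed schemes; never a blacklist of bad ones.
     if (!in_array(strtolower($parts['scheme']), array('http', 'https'), true)) {
         return null;
     }
     // Rebuild the URL from the parsed parts so the result can only be
     // interpreted one way, no matter how the input was mangled.
     $url = strtolower($parts['scheme']) . '://' . $parts['host'];
     if (isset($parts['port']))  { $url .= ':' . (int) $parts['port']; }
     $url .= isset($parts['path']) ? $parts['path'] : '/';
     if (isset($parts['query'])) { $url .= '?' . $parts['query']; }
     return $url;
 }

 var_dump(validate_user_url(" javascript:alert('hello')"));    // NULL: no scheme/host survives parsing
 var_dump(validate_user_url("http://example.com/video?id=1")); // the same URL, normalized
</code></pre>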
It's not mentioned, but the lessons are:<p>1. Use a unique CSRF token in a hidden form field that only logged-in Facebook users can get, so that only they can submit posts to their own wall (a sketch follows below). (More: <a href="https://secure.wikimedia.org/wikipedia/en/wiki/Cross-site_request_forgery" rel="nofollow">https://secure.wikimedia.org/wikipedia/en/wiki/Cross-site_re...</a> )<p>2. Regarding video: don't allow JavaScript code to be submitted and rendered anywhere. The code used to link to videos is unescaped JavaScript (with a leading space) followed by a video link.
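A minimal sketch of point 1, assuming a plain PHP session; the field name csrf_token is illustrative, not what Facebook actually uses:<p><pre><code> &lt;?php
 // Issue one random token per session and embed it in every form that
 // can change state (e.g. posting to the wall).
 session_start();
 if (empty($_SESSION['csrf_token'])) {
     $_SESSION['csrf_token'] = bin2hex(openssl_random_pseudo_bytes(32));
 }
 // The form carries a hidden field named csrf_token whose value is
 // $_SESSION['csrf_token']; a page on another site cannot read it.

 // On submission, reject any POST whose token does not match the session's.
 if ($_SERVER['REQUEST_METHOD'] === 'POST') {
     if (!isset($_POST['csrf_token'])
         || $_POST['csrf_token'] !== $_SESSION['csrf_token']) {
         header('HTTP/1.1 403 Forbidden');
         exit('Invalid CSRF token');
     }
     // ... only now is it safe to process the wall post ...
 }
</code></pre>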
What would be a (simple) abstraction in browsers that would just thwart all these attacks? The main problem here seems to be the use of heuristics for identifying malicious content.<p>One of the main problems is mixing code and data. Say there were a new HTTP header that told the browser to disable inline scripts; would that help solve the problem?
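Something very close to this exists: the Content Security Policy header, which (among other things) makes supporting browsers refuse to run inline scripts and javascript: URLs unless they are explicitly allowed. A sketch of sending such a header from PHP, with an illustrative allowed origin:<p><pre><code> &lt;?php
 // Without 'unsafe-inline' in script-src, a CSP-supporting browser will not
 // execute inline scripts or javascript: URLs, so injected markup in user
 // content becomes inert data rather than code.
 header("Content-Security-Policy: default-src 'self'; script-src 'self' https://static.example.com");

 // Emit the page as usual; user-submitted text is still escaped on output.
 $user_content = isset($_POST['status']) ? $_POST['status'] : '';
 echo htmlspecialchars($user_content, ENT_QUOTES, 'UTF-8');
</code></pre>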