I've been a member @ HN for quite a while, but I usually tend to focus on security and privacy topics. One of my good friends is visiting, and we wanted to work on something challenging together. Both of us find privacy policies overly confusing and annoying, so we decided to tackle the problem. We built a tool that crawls for privacy policies and uses guided machine learning to analyze them. We would love any feedback you have.
You can go so much farther with this. How about letting me paste in any TOS and have it analyzed for important bits or things out of the ordinary? I'd love having my own personal robot lawyer to read over all the stuff I sign or agree to!
The site is loading extremely slowly. Might I suggest you turn off, or dramatically reduce, the KeepAliveTimeout?<p>Header:<p><pre><code> Connection:Keep-Alive
Date:Fri, 11 Nov 2011 01:27:19 GMT
Keep-Alive:timeout=15, max=100</code></pre>
What about inconsistent privacy policies?<p>For example, <a href="http://www.privacyparrot.com/privacy" rel="nofollow">http://www.privacyparrot.com/privacy</a> states that they never "share any information about you" but then has an offhand mention about the site using Google Analytics.
On a more serious note, I have two real questions:<p>1. So, having crawled a boatload of privacy policies, what fraction of them say that they'll sell your data?<p>2. Are you worried that the lawyers will find your tool and tweak their policies to beat it?
I love this idea, but here's how I think it could be improved:<p>1. Consider porting the entire service to a browser extension for Chrome or Firefox, and making the homepage more of an information/FAQ center.<p>2. Demo video. A good demo video explaining why "John Doe" should worry about his personal information being sold would be more convincing - this is how you get your service to less savvy internet users who aren't primarily concerned with privacy.<p>3. Find a way around inconsistencies. It would be better to report if a website <i>actually</i> sells/uses your personal information rather than returning a simple search result with TOS findings. A website can tweak or flat out lie. You should try to account for this.<p>4. Are you planning to commercialize this in any way? How do you plan to fund it, if at all?
Just curious: could it highlight the phrases it used to tell apart selling, not selling, and bankruptcy selling? That way, when we put in our revisions, we can better help it learn.<p>Also, if you aren't planning on making this a commercially viable product, could you release the source code? Things like this make the world better and safer (not to mention easier and more fun). All in all, though, it was rather interesting. (Still trying out websites, and I see myself doing this until the end of the day at work.)
Of the two sites of mine that I checked, one came up as "Danger! Warning! They're going to sell your information in case of a Bankruptcy!!!"<p>Why?<p>Reading one of the submitter's comments below, it seems to lump "sold the entire company, therefore the user database went with it" into the same category as "we're running out of money, so let's sell everybody's email addresses to spammers."<p>They're not in any way related. I'd suggest splitting out those two categories, as I suspect it will drop that "bankruptcy email fire sale" category down to somewhere near 0%.
Why does it automatically add www in front of what I type in the URL box? If I type in news.ycombinator.com, it says "www.news.ycombinator.com does not exist".<p>Well, yes. That's why I didn't type that.
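A minimal sketch of the behaviour this comment asks for, assuming nothing about how the site actually handles input. The helper names `normalize_host` and `candidate_hosts` are hypothetical. The idea is to look up the hostname exactly as typed and to offer a "www." variant only for bare two-label domains, so that subdomains such as news.ycombinator.com are left alone.

```python
from urllib.parse import urlsplit


def normalize_host(user_input: str) -> str:
    """Return the hostname as typed, lowercased, with no forced "www." prefix."""
    text = user_input.strip()
    # urlsplit only populates the host when a scheme is present,
    # so add one for bare inputs like "news.ycombinator.com".
    if "://" not in text:
        text = "http://" + text
    return urlsplit(text).hostname or ""


def candidate_hosts(user_input: str) -> list:
    """Hosts to try, in order. The typed host always comes first."""
    host = normalize_host(user_input)
    if not host:
        return []
    candidates = [host]
    # Hypothetical policy: only bare two-label domains (example.com)
    # get a "www." fallback; existing subdomains are never rewritten.
    if host.count(".") == 1:
        candidates.append("www." + host)
    return candidates
```

With this policy, `candidate_hosts("news.ycombinator.com")` yields only the host that was typed, while `candidate_hosts("facebook.com")` also tries the "www." form as a fallback.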
In a galaxy far away there was once conceived the idea of a machine readable privacy policy ... checking the interwebs reveals that <a href="http://www.w3.org/P3P/" rel="nofollow">http://www.w3.org/P3P/</a> was updated for the last time in 2007.<p>After some more searching: <a href="http://www.cdt.org/paper/looking-back-p3p-lessons-future" rel="nofollow">http://www.cdt.org/paper/looking-back-p3p-lessons-future</a> points to more information about that path from the past. What would the p3p.xml of facebook look like?<p>On another note: www dot freeprivacypolicy dot com [1] seems to generate the kind of privacy policies the site featured in this post sets out to parse. There is humor in that.<p>[1] don't want to feed page rank as privacyparrot says: "Your information may be sold during a bankruptcy"
I entered facebook and got two results, one saying facebook.com does not sell; the other saying www.facebook.com does sell. See <a href="http://www.privacyparrot.com/search?search=facebook.com" rel="nofollow">http://www.privacyparrot.com/search?search=facebook.com</a>
A bug report: if someone already added “.com” to their results because of no result, don’t offer to add it again if there are still no results.<p>Example: <a href="http://www.privacyparrot.com/search?search=http://www.noSuch.com" rel="nofollow">http://www.privacyparrot.com/search?search=http://www.noSuch...</a><p>Also, I suggest when you suggest adding “.com”, you strip spaces from the search as well. For instance, I searched for “Less Wrong” and found nothing, and you suggested “Less Wrong.com”. That doesn’t exist, but “LessWrong.com” does.
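The two suggestions in this bug report can be sketched as follows. This is only an illustration under stated assumptions: `suggest_domain` is a hypothetical helper, not the site's code, and it handles just the two cases described here (removing spaces, and not offering ".com" when the query already ends in it).

```python
from typing import Optional


def suggest_domain(query: str, tld: str = ".com") -> Optional[str]:
    """Return a "did you mean" domain for a failed search, or None."""
    # Remove all whitespace so "Less Wrong" becomes "LessWrong".
    compact = "".join(query.split())
    # Drop any scheme and path so only the host part is examined.
    if "://" in compact:
        compact = compact.split("://", 1)[1]
    compact = compact.split("/", 1)[0]
    # Nothing to suggest for empty input, and never append the
    # suffix a second time when the query already ends with it.
    if not compact or compact.lower().endswith(tld):
        return None
    return compact + tld
```

For the examples in the comment, `suggest_domain("Less Wrong")` gives `"LessWrong.com"`, and `suggest_domain("http://www.noSuch.com")` gives `None`, so no repeated suggestion is shown.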
I'm not a lawyer but...<p>How can someone trust you to parse the policies correctly? What if someone sues you for incorrectly interpreting a policy that they then relied on to make a decision?
Is it really possible to keep user data safe during a bankruptcy? It is a tangible asset that may provide value to creditors.<p>I really want (for my co) the answer to be "yes".
I am just learning some of these machine learning tools and am rapt, so forgive me for asking, but would you be able to explain a little about what you are doing?<p>How are you generating features? Stanford parser? Are you using logistic regression or something more advanced?<p>I love the idea. I am interested in applying some of these concepts myself. Do you have any ideas that you are not able to pursue yourself, that I might take a crack at?
Would it be possible to also identify sites that share your information?<p>A great example:
facebook.com
Does not sell your private information.<p>But they obviously do share information, and while this is apparent to most users, how many sites do the same without their users being aware of it?<p>Also, any plans to capture changes in privacy policies over time? Oftentimes, site owners do not proactively notify users when their policies or legalese have changed.
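One way the change tracking asked about here could work, sketched with Python's standard library only. The function names `fingerprint` and `changed_lines` are hypothetical, and the sketch assumes the crawler keeps the previously fetched policy text for each site; nothing in the thread says it does.

```python
import difflib
import hashlib


def fingerprint(policy_text: str) -> str:
    """Hash of the policy with whitespace collapsed, for cheap change checks."""
    normalized = " ".join(policy_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_lines(old_text: str, new_text: str) -> list:
    """Lines removed ("- ") or added ("+ ") between two policy versions."""
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    # ndiff also emits unchanged ("  ") and hint ("? ") lines; skip those.
    return [line for line in diff if line.startswith(("- ", "+ "))]
```

Comparing fingerprints on each crawl would show whether a policy changed at all, and `changed_lines` could then show which sentences were added or removed.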
It does not appear to like subdomains. I've been trying to get it to visit <a href="http://news.ycombinator.com" rel="nofollow">http://news.ycombinator.com</a> and see what it thinks, but I keep getting back: "We were unable to connect to <a href="http://www.news.ycombinator.com" rel="nofollow">http://www.news.ycombinator.com</a>. If it exists, please try again later."
> See if a site sells your personal information.<p>Rather, see if a site tells you that it sells your personal information. It's an important difference.
This is very cool. Maybe you could make a scriptlet bookmark that pops something on the page you are viewing. Here's one that will redirect you to the privacy parrot page : javascript:location.href="<a href="http://www.privacyparrot.com/privacy-policy-for-+location.host" rel="nofollow">http://www.privacyparrot.com/privacy-policy-for-+location.ho...</a>;
I tried the policizer with some copied and pasted policies, but it frequently told me "CAN SELL" just because the text did not include any specifics regarding selling and bankruptcy.<p>Is that the intended behaviour?
Where's the code?<p>Seriously, where's the code? Your server seems to be getting hit pretty hard. Would be nice to be able to hack on it and to be able to host a mirror.