Operation Rosehub – patching thousands of open-source projects

727 points by fhoffa, about 8 years ago

28 comments

fhoffa, about 8 years ago

This is one of the most impactful projects I've seen built using the GitHub source on BigQuery dataset (since we published it).

If you want to see other use cases, I've collected plenty of other stories from multiple parties at: https://medium.com/google-cloud/github-on-bigquery-analyze-all-the-code-b3576fd2b150

Disclosure: I'm Felipe Hoffa and I work for Google Cloud (https://twitter.com/felipehoffa)

jayfk, about 8 years ago

I've built something like this for Python projects. You add your repo, and a bot constantly checks for insecure and/or outdated packages and sends you a pull request if you need to update.

It's free for open source projects at https://pyup.io

rrggrr, about 8 years ago

So many questions... What does this say about Google's hiring, about its employees' values, about values across the tech community? I can remember a time when management would have shut this down, when employees would have said, "not my problem", when entire industries would have buried their heads in the sand.

Is it the lack of liability and regulation that clears the way for this kind of corporate citizenship? Is it cultural?

vog, about 8 years ago

I like the "bank teller" analogy used in the article.

> it would be like hiring a bank teller who was trained to hand over all the money in the vault if asked to do so politely, and then entrusting that teller with the key. The only thing that would keep a bank safe in such a circumstance is that most people wouldn’t consider asking such a question.

This does not only work for deserialization issues. It is a great analogy for a huge class of IT security issues!

Maybe we should use that one when communicating with the media. It works much better than the usual burglary analogy. I like how it points out that this is about stupid and/or malicious behaviour (code), where the attacker (hacker) just needs curiosity, and may find this out even by accident. The attacker did not have to break something, and did not damage anything, to get into something. In particular, this makes clear that this is caused by irresponsible behaviour of the organization and/or other entities to whom they delegate trust.

Even for more complicated scenarios, I like the bank teller analogy more than the classic burglary analogy. In that case, the attacker observes multiple bank tellers, and notices e.g. that if you ask the first teller for form A and put in certain words, another bank teller will accept it and give you a stamped form B, which you can show to a third teller in another branch office who will look a bit confused, but finally accept it and hand over all the money to you.

We need to get over blaming the messengers[1], buying zerodays and declaring cyberwar. What we really need to do is to finally make our[2] computer systems secure and trustworthy, at least up to a certain minimum level of sanity: no exec, no injection (i.e. typing/tagging), no overflows (i.e. static analysis), input validation, testing, fuzzing, you name it.

And this cannot work by just adding more and more complex security measures on the outside; more importantly, it requires simplifying and cleaning up the inside. Although rewriting software from scratch is very risky, radical refactoring is not! And every good software engineering course tells you how to do it correctly.

[1] security researchers, but also "amateur" hackers, or just someone running into it by accident because the security issue became so large it finally had to be noticed by someone.

[2] in the sense of: everyone's!
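
The teller who hands over the vault when asked politely has a close analogue in Python's `pickle` module, whose documentation warns against loading untrusted data. The following is a minimal sketch, in Python rather than the Java discussed in the article, of a deserializer that only agrees to construct an explicit allowlist of types. It overrides `pickle.Unpickler.find_class`, which is the documented hook for this; the contents of `SAFE_BUILTINS` are an illustrative assumption, not a recommendation.

```python
import builtins
import io
import pickle

# Assumption for illustration: the only globals this loader may resolve.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses every global not on the allowlist."""

    def find_class(self, module, name):
        # Called whenever the pickle stream asks for a class or function.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads, but only allowlisted globals can be constructed."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

With this loader, a payload that asks for `os.system` is rejected inside `find_class` before anything is called, while plain data such as lists, integers and `range` objects still round-trips. It narrows what the teller will hand over; it does not make deserializing untrusted input safe in general.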

tombh, about 8 years ago

Is https://libraries.io not a more comprehensive and community-focused response to the same problem?

libraries.io did make it to the front page a few months ago, but I think its underlying vision might not have been driven home from just glancing at its home page. It supports 33 package managers (not just Java, though I'm sure Rosehub doesn't just do that either) and GitHub/GitLab/Bitbucket, not just GitHub. And it provides both email notifications and auto PRs.

But that's just the overlap with Rosehub. On top of that it offers the means to discover libraries based on a Dependency Rank (think Page Rank but using dependencies instead of hyperlinks). Which in turn allows it to surface projects with a high "Bus Factor" -- projects maintained by few committers, but depended on by many (so they'd be more affected by said committers getting run over by a bus). AND it mines the licenses for a project, notifying if any of the dependent licenses are incompatible with the parent license. What's more, it's a non-profit organisation receiving enough funding to employ two full-time devs.

I think libraries.io is Rosehub and more. To quote the about page:

> Our goal is to raise the quality of all software, by raising the quality and frequency of contributions to free and open source software; the services, frameworks, plugins and tools we collectively refer to as libraries.

To take the liberty of extrapolating from the libraries.io vision: open source security isn't just about fixing patches, but about supporting the environment, people, conditions and tools that contribute to open source software.
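
The "Page Rank but using dependencies instead of hyperlinks" idea can be sketched as plain power-iteration PageRank over a dependency graph. This is an illustration of the general technique only, not libraries.io's actual algorithm; the function name, the damping factor of 0.85 and the sample package names are all assumptions.

```python
def dependency_rank(deps, damping=0.85, iterations=50):
    """Rank packages by how much of the ecosystem depends on them.

    deps maps each package to the list of packages it depends on,
    so an edge points from the dependent to its dependency.
    """
    packages = set(deps) | {d for targets in deps.values() for d in targets}
    n = len(packages)
    rank = {p: 1.0 / n for p in packages}
    for _ in range(iterations):
        # Every package starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in packages}
        for pkg in packages:
            targets = deps.get(pkg, [])
            if targets:
                # Pass this package's rank on to what it depends on.
                share = damping * rank[pkg] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # No dependencies: spread the rank evenly over all packages.
                share = damping * rank[pkg] / n
                for p in packages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical graph: two applications depend on one shared library.
sample = {"app-one": ["shared-lib"], "app-two": ["shared-lib"], "shared-lib": []}
ranks = dependency_rank(sample)
```

In the sample graph `shared-lib` ends up with the highest score, because both applications pass rank to it. Surfacing a high "Bus Factor" project would then mean combining such a score with committer counts, which this sketch does not attempt.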

saurik, about 8 years ago

I am extremely sad that this turns into an argument for making certain that all source code in the world is at least indirectly accessible specifically via GitHub (at which point people will find it there and expect the developers to respond and generally track everything going on there, even projects which are much happier using more open tools); like: it isn't sufficient that your code is "open", it actively has to be part of the unified GitHub empire.

orf, about 8 years ago

In their query they do:

    FROM (
      SELECT id, content
      FROM (
        SELECT id, content
        FROM [bigquery-public-data:github_repos.contents]
        WHERE NOT binary)
      WHERE content CONTAINS 'commons-collections<')

Why the subquery? Why not WHERE NOT binary AND content CONTAINS...? Is this a BigQuery thing?
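
As far as the result set goes, the nested form and the single combined `WHERE` select the same rows. The sketch below shows that on a few made-up in-memory rows, written in Python rather than BigQuery SQL. It only illustrates the logical equivalence of the two filters; it says nothing about how BigQuery plans or executes either form, and the sample rows are invented for the example.

```python
NEEDLE = "commons-collections<"

# Hypothetical rows shaped like (id, binary, content).
ROWS = [
    {"id": "a1", "binary": False, "content": "<artifactId>commons-collections</artifactId>"},
    {"id": "b2", "binary": True, "content": "commons-collections<"},
    {"id": "c3", "binary": False, "content": "<artifactId>guava</artifactId>"},
]


def nested_filter(rows):
    """Mirror the subquery: drop binary rows first, then search the content."""
    non_binary = [row for row in rows if not row["binary"]]
    return [row["id"] for row in non_binary if NEEDLE in row["content"]]


def combined_filter(rows):
    """Mirror WHERE NOT binary AND content CONTAINS ... in a single pass."""
    return [row["id"] for row in rows if not row["binary"] and NEEDLE in row["content"]]


nested_ids = nested_filter(ROWS)
combined_ids = combined_filter(ROWS)
```

Both functions return the same list of ids for these rows, and row `b2` is excluded by the binary check in either form.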

tlrobinson, about 8 years ago

Wow. I wonder how much a query that searches the content of all of GitHub costs (if you're not Google). This page says the dataset is 3TB+ (https://cloud.google.com/bigquery/public-data/github), and presumably most of that is content.
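
A back-of-the-envelope estimate is simple arithmetic once a price per terabyte scanned is fixed. In the sketch below the 3 TB figure comes from the comment above, while the rate of 5 USD per TB and the optional free allowance are assumptions used purely for illustration; the actual on-demand rate should be taken from the BigQuery pricing page.

```python
TB = 2 ** 40  # bytes in one tebibyte, used here as the billing unit


def estimate_query_cost(bytes_scanned, usd_per_tb=5.0, free_bytes=0):
    """Estimate the cost of a query billed by bytes scanned.

    usd_per_tb and free_bytes are assumed values, not quoted prices.
    """
    billable = max(bytes_scanned - free_bytes, 0)
    return billable / TB * usd_per_tb


full_scan = estimate_query_cost(3 * TB)                      # 15.0 at the assumed rate
with_allowance = estimate_query_cost(3 * TB, free_bytes=TB)  # 10.0 at the assumed rate
```

Under those assumptions a single scan of the whole contents table lands in the tens of dollars, so the cost scales with how many times the full table is scanned rather than with the size of the result.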

cypherpunks01, about 8 years ago

Nice! That's some good citizenship.

Interesting fact: before she worked for Google, Justine was the founder of occupywallst.org, which was the highest-trafficked publisher/web hub for the Occupy Wall Street movement.

markcerqueira, about 8 years ago

"Patches were sent to many projects, avoiding threats to public security for years to come."

Are these pull requests that the project would still need to approve/merge, or were they just pushed in?

luhn, about 8 years ago

As scary as Google's massive size and power are, it's pretty awesome that they're incentivized to do things like this to help the internet because they are the internet.

mrgrowth, about 8 years ago

I read so many of these kinds of articles out of curiosity and rarely understand them. Thank you for adding in the part about the bank teller.

For reference: "it would be like hiring a bank teller who was trained to hand over all the money in the vault if asked to do so politely, and then entrusting that teller with the key."

joelthelion, about 8 years ago

> But unlike big businesses, open source projects don’t have people on staff

To read that from Google is frankly disappointing. While this is true of many open-source projects, it doesn't have to be that way. Red Hat (and Google!) are brilliant proofs of this.

bla2, about 8 years ago

Really cool, kudos to people helping with this. I wonder if this could have been done in a way that non-Googlers could have pitched in too, given that this is for a public good -- but it's tricky with security issues.

hokkos, about 8 years ago

How does it work for transitive dependencies? What if you use a package that uses a vulnerable Apache Commons library? Is a PR sent to update the package once it has been updated?
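
Finding everything that reaches a vulnerable library through intermediate packages is a graph traversal. The sketch below is one way to do it, assuming a simple mapping from each package to its direct dependencies; it is not a description of what Rosehub did, and the function name and package names are made up for the example.

```python
from collections import deque


def find_affected(deps, vulnerable):
    """Return every package that depends on `vulnerable`, directly or transitively.

    deps maps each package to the list of packages it depends on directly.
    """
    # Invert the graph: for each package, who depends on it directly?
    dependents = {}
    for pkg, targets in deps.items():
        for target in targets:
            dependents.setdefault(target, set()).add(pkg)

    # Breadth-first walk upward from the vulnerable package.
    affected = set()
    queue = deque([vulnerable])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


# Hypothetical graph: my-app only reaches the vulnerable jar through web-lib.
sample_deps = {
    "my-app": ["web-lib"],
    "web-lib": ["commons-collections:3.2.1"],
    "other-app": ["json-lib"],
}
hits = find_affected(sample_deps, "commons-collections:3.2.1")
```

Here `hits` contains both `web-lib` and `my-app`, even though `my-app` never names the vulnerable library itself. In that situation the fix has to land in `web-lib` first, and `my-app` then needs to pick up the new `web-lib` release.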

tropo, about 8 years ago

If I understand it right, this bug involves code pulling in old buggy libraries, sometimes indirectly via other libraries. It seems that there is a reference to a specific bad version, not the actual inclusion of cut-and-paste code.

Eh, why not just get rid of the bad version? Alternatively, release a bug-fixed copy with the same version number.

Any breakage is a case of "oh well, you're safe now". Leaving the security hole is probably worse breakage.

make3, about 8 years ago

I wish you could do the same thing with mental illness.. massively send pull requests to correct everyone's bad brain code.. <sorry>

mirekrusin, about 8 years ago

It's interesting that this type of initiative, which is admirable, will spike up some Java "popularity" metrics on GitHub.

hawski, about 8 years ago

I was thinking about doing something similar with BigQuery and GitHub data to search for uses of strncpy in C code. But I am not that good with the query language, and BigQuery also didn’t support multiple users properly (this adds friction).

I still think it’s a good idea. It would be even better to search for a few more C pitfalls, but strncpy is probably the easiest to search for.
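
For a single source file, the search itself can be sketched with a regular expression. This is a purely textual match in Python, not a C parser and not a BigQuery query: it will also flag calls that appear inside comments or string literals, and the helper name is made up for the example.

```python
import re

# Match the identifier strncpy followed by an opening parenthesis.
# \b keeps names such as my_strncpy from matching.
STRNCPY_CALL = re.compile(r"\bstrncpy\s*\(")


def find_strncpy_calls(source):
    """Return (line_number, stripped_line) for each line that calls strncpy."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(source.splitlines(), start=1)
        if STRNCPY_CALL.search(line)
    ]


example = (
    "char buf[8];\n"
    "strncpy(buf, name, sizeof buf);\n"
    "my_strncpy(buf, name);\n"
)
matches = find_strncpy_calls(example)
```

On the example input only the second line is reported. Extending the search to other pitfalls would mean adding more patterns of the same shape, with the same caveat about false positives.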

ploxiln, about 8 years ago

I think this is one good concrete example of why the npm style of private dependencies for each lib is not the greatest thing ever, while the non-recursive style in python (or C) is overall more manageable (if you are actually managing your dependencies instead of ignoring them).

codelion, about 8 years ago

We have been doing this for a while now: https://www.sourceclear.com/blog/millions-of-program-builds-vulnerable-to-man-in-the-middle-attacks/

11928311, about 8 years ago

So, Google does ... something and is showered with praise. Thousands of volunteers work in the salt mines and get nothing.

Business as usual. Myths like "Google sponsored Python!!!" propagate when they do nothing at all.

Disgusting.

Dem0stheneS, about 8 years ago

That's outstanding news. Hats off to the volunteers doing the work on this.

rburhum, about 8 years ago

Mad thank-yous to Google for this!

lvlds, about 8 years ago

Awesome! Congrats to the team!

snambi, about 8 years ago

What is in it for Google?

muzster, about 8 years ago

Operation Rosebud

lolive, about 8 years ago

Wouldn't a graph database be a more suitable tool for that kind of task?
