科技回声

12 comments

Tenoke, nearly 4 years ago

Dupe: https://news.ycombinator.com/item?id=27769440

vladharbuz, nearly 4 years ago

Do you remember when someone made a vim extension that would autocomplete code with Stack Overflow answers, as a joke? Why is it that in 2021 we're taking this kind of tool seriously? Have we reduced our field and craft to something that can just be autocompleted?

sundarurfriend, nearly 4 years ago

Does the licence make a difference regarding whether or not Copilot code is legal to use? To my understanding, the crux of the argument comes down to whether data crunched down and regurgitated by machine learning algorithms retains its licence, regardless of what it is.

I suppose if we assume it retains it, then TFA's claim creates a further question of how the hell anyone using Copilot can conform to this unholy mixture of all the licences. But that's a big assumption at this point.

qayxc, nearly 4 years ago

I don't understand the argument here.

Yes, sometimes code is returned that is a verbatim reproduction of the training data. This can be prevented if need be.

What I really don't understand is how some people are complaining about GPL'ed code being used for training.

What's the difference between a machine looking at the code and learning from it and a human being doing the same? As long as the code isn't patented, there's no reason why I shouldn't be able to look at GPL'ed code and implement the idea using my own code.

In other words, according to those who object to using GPL'ed code for ML training, is every implementation a derived work if I looked at GPL'ed code that implemented the same algorithm? Where's the line that separates plagiarism from original work? Is there even such a line? Does it matter whether the GPL'ed code is encoded in human neurons or network weights after looking at it, and if so, why?

shireboy, nearly 4 years ago

Is there nuance here in what is used to train vs. what is used in completions? For example, if all public code was used to train an ML model, but then the autocomplete feature only pasted in uniquely generated or licensed code, that would be different than if it pasted in verbatim licensed code without attribution or whatever the license requires.

It could be like a dev reading public code enough to understand it, but then coming up with her own implementation. Not saying Copilot works that way; I haven't tested it yet. But it could be one nuance here.

mullikine, nearly 4 years ago

The cat (this technology) is coming out of the bag one way or another. It's just too useful. Where is it written that inspiring a language model with data is not just as infringing as copy and paste? It's a very grey line. Public facing open-source code & media is going to be learned by language models because they're exposed to them. I'm fully expecting that if I begin a story and put it on my blog or on github, and if I go away for 5 years, I'll see it completed for me when I return.

randomperson_24, nearly 4 years ago

So, if using any public code is allowed and we let it slip through, Google could, for example, read all the text on Google Docs and __sell__ a solution that generates text for you. Is this allowed? What about images?

chadlavi, nearly 4 years ago

Anyone can search and browse public code and copy and paste it into their own project regardless of license, too.

mherrmann, nearly 4 years ago

Could a solution to the license problem be that the auto-completion also shows you the code's license?

oauea, nearly 4 years ago

Why wouldn't they use the code you gave them permission to use by agreeing to their TOS?

JohnWhigham, nearly 4 years ago

I really don't know why people thought a README file was going to stop one of the largest companies on the planet from slurping up all the code it hosts and doing what it wants with it.

jjoergensen, nearly 4 years ago

Google has trained their web search on much of the internet. Is it problematic?

GitHub confirmed using all public code for training Copilot, regardless of license
