If Copilot shuts down for some reason, I was wondering whether it would be possible to homebrew it.<p>Some hurdles I see:<p>- GitHub rate limits GET requests, so it doesn't seem possible to scrape all the source code on there. But maybe it can be crowdsourced like SETI@home, so 1000 people can install a program to get around this.<p>- Training the model. I would imagine this would be the hardest part, as it would need millions of dollars? Is there a way to get around that, or to use free tools like Colab?<p>- Running the API. Once the model is trained, would it be possible to run it on a Lenovo-type laptop? I guess you need lots of VRAM to run it?<p>Final question: will a homebrewed version be just as good? What factors determine that?<p>Just curious how we could do it, as I imagine there are a lot of ML experts here.
There is a 3 TB dataset, “The Stack”, which I believe is partly designed for this: all of the code is permissively licensed.<p>Training the model would be expensive, but it’s a one-and-done process. With the model openly available, cloud providers could offer a subscription service to end users that recoups the cost of running it.<p>The only issue is that I imagine GitHub has <i>much</i> more code than 3 TB.
BTW, this wouldn't solve the legal hurdles of Copilot.
The model needs to mention which license the code has, which, AFAIK, Amazon's competitor to Copilot already does.
Not a complete replacement, but very cool and related: <a href="https://github.com/webyrd/Barliman" rel="nofollow">https://github.com/webyrd/Barliman</a>
You’ll need more than 20 GB of GPU memory to run the model.<p>This is the same reason people can’t easily “play with” GPT-like models.<p>> would it be possible to run it on a lenovo type laptop?<p>No.<p>You might, with an M1 or M2 MacBook Pro with 64 GB of combined (unified) memory; with pretty much any other laptop, categorically no.<p>You’d have to rent or own a separate server with serious GPU power.<p>> Final question is will a home brewed version be just as good?<p>No.<p>The open source language models are not as good as GPT-3.
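For a rough sense of where a figure like "more than 20 GB" can come from, here is a back-of-the-envelope sketch. The 12-billion-parameter size is purely an assumption for illustration (the thread doesn't say how big the model is), and the helper `weights_memory_gb` is a made-up name. It only counts the weights; activations and any caching at inference time come on top of that.

```python
def weights_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to hold the model weights, in GB (10^9 bytes)."""
    return n_params * bytes_per_param / 1e9


# Assumption for illustration only: a 12-billion-parameter model.
ASSUMED_PARAMS = 12e9

fp32_gb = weights_memory_gb(ASSUMED_PARAMS, bytes_per_param=4)  # full precision
fp16_gb = weights_memory_gb(ASSUMED_PARAMS, bytes_per_param=2)  # half precision
int8_gb = weights_memory_gb(ASSUMED_PARAMS, bytes_per_param=1)  # 8-bit quantized

print(f"fp32: {fp32_gb:.0f} GB, fp16: {fp16_gb:.0f} GB, int8: {int8_gb:.0f} GB")
```

Under that assumption, half-precision weights alone already land above 20 GB, which is more VRAM than a typical laptop GPU has. Lower-precision weights shrink the footprint, usually at some cost in quality.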
I don't think most open source devs want Copilot or a FOSS alternative, for this very reason:<p>Code assist AI does no attribution.<p>This removes engagement between the dev and library authors. That ruins the chances of engaging new contributors over time, eroding and killing the FOSS communities.<p>Code assist AI also does not care about licenses. See [1]<p>1: <a href="https://www.bleepingcomputer.com/news/security/microsoft-sued-for-open-source-piracy-through-github-copilot/" rel="nofollow">https://www.bleepingcomputer.com/news/security/microsoft-sue...</a>