As is the case with many companies, our team has decided that the risk of using hosted AI solutions is too high. Are there any other solutions out there that would allow us to train on our own codebase without exposing it?
I just started trying to figure out how to get StarCoder to help me today. It runs on my MacBook and can generate a Fibonacci function, unsurprisingly, but so far I have very little idea how to use it to actually help myself.<p>Repl.it and Salesforce also have code LLMs: ReplitLM and CodeTF.<p>Different topic, but how does one train or tune a model on an existing codebase? StarCoder, for example, also ships a Python-tuned model. Do any of the services the submitter mentioned actually do any tuning on your company's codebase?
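On the "how do I use this thing" part: one way to get editor-style completions out of StarCoder is fill-in-the-middle (FIM) prompting, where the code before and after the cursor is wrapped in the model's sentinel tokens. A minimal sketch of building such a prompt follows. The helper name `build_fim_prompt` and the cursor convention are my own assumptions, and actually sending the prompt to a model is left out.

```python
# Sentinel tokens used by StarCoder's tokenizer for fill-in-the-middle (FIM).
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(source: str, cursor: int) -> str:
    """Split `source` at `cursor` and arrange it as a FIM prompt.

    `build_fim_prompt` is a hypothetical helper, not part of any library.
    The model is expected to generate the code that belongs at the cursor,
    i.e. the text that follows the <fim_middle> token.
    """
    if not 0 <= cursor <= len(source):
        raise ValueError("cursor must lie within the source text")
    prefix, suffix = source[:cursor], source[cursor:]
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


if __name__ == "__main__":
    code = "def fib(n):\n    \n    return a\n"
    # Place the cursor at the end of the indented, still-empty body line.
    cursor = code.index("    \n") + 4
    print(build_fim_prompt(code, cursor))
```

Whatever the model generates after `<fim_middle>` is what you would splice back in at the cursor position.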
The Falcon 40B-parameter model may have potential, but at this point I would be surprised if it were ready for prime-time code completion tasks. It might be useful for answering questions about the code from a local vector store. I'd imagine that not using AI for code completion in 2023 also carries some degree of risk. What are your team's primary concerns? OpenAI using your IP against you, or something else?
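To make the "local vector store" idea concrete, here is a rough sketch of the retrieval half: index code snippets, then pull the closest ones for a question and hand them to whatever local model you run. Everything here is an assumption for illustration. The `embed` function is a toy word-count embedding standing in for a real embedding model, and `LocalVectorStore` is a hypothetical in-memory class, not any particular product.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lower-cased word counts.

    A real setup would call a locally hosted embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LocalVectorStore:
    """Hypothetical in-memory store; nothing leaves the machine."""

    def __init__(self):
        self._items = []  # list of (label, text, vector)

    def add(self, label: str, text: str) -> None:
        self._items.append((label, text, embed(text)))

    def search(self, query: str, k: int = 3):
        """Return up to k (score, label, text) tuples, best match first."""
        q = embed(query)
        scored = [(cosine(q, vec), label, text) for label, text, vec in self._items]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    store = LocalVectorStore()
    store.add("auth.py", "def check_password(user, password): ...")
    store.add("db.py", "def connect(url): open database connection pool")
    for score, label, _ in store.search("where is the password checked", k=1):
        print(label, round(score, 3))
```

The retrieved snippets would then be pasted into the prompt of the local model along with the question, so the model itself never needs to be trained on the codebase.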
Look at "starcoder". It's not strictly open source; it has a condescending "responsible AI" license, but you could easily download it and run it on premises.
Code has become really complicated nowadays. I don't know whether training on only a single codebase will work, unless all you are looking for is autocomplete.