I've also been experimenting with generating embeddings from repos/docs, and even though embeddings are cheap, I find myself making sure I record every last one to disk so I don't have to hit the API again for that piece of content.<p>It strikes me as wasteful to potentially generate them multiple times for the same repo. Is it against TOS to publicly share generated embeddings and collaborate to build a database of codebase (and/or more) vectors? Is there anything like that yet?<p>Of course code changes over time, but shared embeddings would still be useful for e.g. a stable release of a library. For big codebases, it'd also be more economical to re-embed only the files that changed (or, more granularly, only the changed functions/classes).
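<p>For what it's worth, here is a minimal sketch of the "record everything to disk, re-embed only what changed" idea. It keys each vector by a hash of the content plus the model name, so unchanged files never hit the API again. Everything here is an assumption for illustration: `embed_fn` is a hypothetical stand-in for whatever embedding API call you actually use, and `content_key`, `embed_cached`, and `embed_repo` are names I made up.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict, List

Vector = List[float]


def content_key(text: str, model: str) -> str:
    """Cache key: SHA-256 over the model name and the exact content."""
    h = hashlib.sha256()
    h.update(model.encode("utf-8"))
    h.update(b"\x00")  # separator so model/text boundaries can't collide
    h.update(text.encode("utf-8"))
    return h.hexdigest()


def embed_cached(
    text: str,
    model: str,
    cache_dir: Path,
    embed_fn: Callable[[str], Vector],  # hypothetical: wraps the real API call
) -> Vector:
    """Return the cached vector if this exact content was embedded before."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{content_key(text, model)}.json"
    if path.exists():
        return json.loads(path.read_text())
    vector = embed_fn(text)  # only reached for new or changed content
    path.write_text(json.dumps(vector))
    return vector


def embed_repo(
    files: Dict[str, str],  # file name -> file contents
    model: str,
    cache_dir: Path,
    embed_fn: Callable[[str], Vector],
) -> Dict[str, Vector]:
    """Embed every file, paying only for content not already in the cache."""
    return {
        name: embed_cached(src, model, cache_dir, embed_fn)
        for name, src in files.items()
    }
```

<p>Because the key depends only on content and model, the same cache directory could in principle be shared between people embedding the same stable release, and splitting files into functions/classes before calling `embed_cached` would give the more granular version.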