This is a pretty fantastic, high-level overview of their system.<p>I also appreciate that they cover infrastructure and data collection aspects, which are often glossed over.
> Code data is obtained from license-filtered open source repositories on GitHub. The bulk of the code data covers 14 common programming languages, including: Swift, Python, C, Objective-C, C++, JavaScript, Java, and Go.<p>That's interesting. I'm looking forward to seeing how accurate the output is for Swift.