I explored an alternative to the typical folder/file list for viewing codebases: a bird's-eye view of their structure.<p>https://octo.github.com/projects/repo-visualization
We always look at our code in a file/folder list. I explored an alternative way to view codebases that shows a bird's-eye view of their structure. This write-up walks through the motivations, ways to use the visualization, and potential future directions (there are many!).<p>There's also an interactive tool to check out your own repos and a GitHub Action if you want to integrate a diagram into a README.
I honestly don't understand the point of this.<p>Is it just a visualization of the directory structure?<p>I expect code visualization to more or less ignore the file structure and focus on semantic analysis. For example, show the major components of the system and how they interact. Perhaps the major components are represented by some kind of a module, or a collection of modules. I don't have any concrete ideas. But I was expecting something in this vicinity.
Personally I find it interesting, as tools like these have helped me understand team and delivery dynamics when joining new dev teams.<p>This particular one is great, as it reminds me of a book I read a while back: Your Code as a Crime Scene [1] by Adam Tornhill.<p>Adam explains something similar, but takes the whole concept to the next level by showing how tech debt and hidden coupling can be discovered using the git history and similar file structure visualisations.<p>[1] <a href="https://pragprog.com/titles/atcrime/your-code-as-a-crime-scene/" rel="nofollow">https://pragprog.com/titles/atcrime/your-code-as-a-crime-sce...</a>
This is so cool. I really appreciate how they added the "Search for a file" and "Excluded paths" to the demo. Makes it a lot more useful while still so simple to use.<p>Edit: the more I play with it the more I like it. Also just noticed their feature to deep link to repos (example: <a href="https://octo-repo-visualization.vercel.app/?repo=owid%2Fowid-grapher" rel="nofollow">https://octo-repo-visualization.vercel.app/?repo=owid%2Fowid...</a>). The future directions they mention also sound really exciting. Seeing files that cause a lot of CI failures, files by # of authors, files by # of changes, all that stuff would be really cool.
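For anyone who wants to try the "files by # of changes / # of authors" idea locally, here is a rough sketch, not how the tool does it. It assumes you feed it the output of `git log --name-only --format='@@%an'`; the `@@` marker and the function name are my own choices for illustration.

```python
from collections import defaultdict


def churn_by_file(log_text):
    """Return {path: (commit_count, distinct_author_count)}.

    Expects text produced by:
        git log --name-only --format='@@%an'
    Each commit starts with an '@@<author>' line (the '@@' prefix is an
    arbitrary marker picked for this sketch), followed by the paths it touched.
    """
    changes = defaultdict(int)
    authors = defaultdict(set)
    current_author = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # git separates the header from the file list with blanks
        if line.startswith("@@"):
            current_author = line[2:]
        elif current_author is not None:
            changes[line] += 1
            authors[line].add(current_author)
    return {path: (changes[path], len(authors[path])) for path in changes}


# Hand-written sample input standing in for real git output.
sample = """@@alice

src/app.js
README.md

@@bob

src/app.js
@@alice

src/app.js
"""
stats = churn_by_file(sample)
```

Either number could then drive circle color or size instead of file type. Renames are not followed here, so a moved file would show up as two separate paths.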
My favorite tool for visualizing a codebase is Gource. Here's a 1 minute visualization of the Linux Kernel repository from 1991-2015 <a href="https://twitter.com/mattrickard/status/1423366779590430721" rel="nofollow">https://twitter.com/mattrickard/status/1423366779590430721</a>
I really, really want to like these sorts of visualizations. But they just fall flat for me.<p>The "you can see really quickly..." text is scrolling by and I'm like... "Nope, that picture still means nothing to me." :(. It starts highlighting different parts and I'm completely at a loss as to what is highlighted.<p>I do think this can be very effective once I'm trained on it, so I plan to play with it. But I just don't visually think of programs in anything close to this manner.<p>Does anyone know of studies that explore how we think about our programs?
I worked on something like this a few years ago, only in VR so you could walk around the visualization and use your spatial recognition abilities in 3D.<p>One part we struggled with was evolving the visualization with the codebase. I see in the demos at the bottom that small changes to the codebase can have a large impact on the visualization (unless I'm missing something), making it difficult to treat the visualization as a fingerprint over time. I wonder if there are plans to address this.<p>This is an area I'm very interested in, happy to chat about it any time.
Apologies for the hot take, but imo GitHub has been really knocking it out of the park with terrible ideas lately (remember how everyone fell all over themselves during the Copilot release?). This is an absolutely worthless visualization that only impresses those that haven't heavily worked with visualizations. A few points right off the bat:<p><pre><code> - Labels are way too small, so you'll need to zoom in..
- ...but if you zoom in, you'll need to pan...
- ...and if you need to pan, you lose context
- Hovering over "connected files" is just a jumbled mess
</code></pre>
Case in point: look at the `paperjs/paper.js` example they themselves provide. There's a big circle called "packages" and inside that circle, two smaller circles that all contain the exact same files: "package.json," "index.js," and "README.md" -- how is this insightful in any way? I need to go to the repo to actually see that one of the folders is called "paper-jsdom" and the other one "paper-jsdom-canvas." The visualization literally confuses me more than just looking at the repo.<p>I don't mean to be overly negative, but it's just not a good visualization and no one will ever seriously use this.
I tried with <a href="https://github.com/racket/racket" rel="nofollow">https://github.com/racket/racket</a> and for some reason it puts all the content of the subfolder "racket"/"src" in a vertical strip near the middle of the circle instead of spreading the parts evenly. How is each part arranged?
Took a look at my own codebase, which is 99% Rust. All gray, I'm guessing Rust isn't currently a recognized file type? Either way, very nice! I currently use the "dirtree" tool (<a href="https://github.com/emad-elsaid/dirtree" rel="nofollow">https://github.com/emad-elsaid/dirtree</a>) to generate diagrams like this of my codebase for documentation: <a href="https://github.com/WhiteBeamSec/WhiteBeam/wiki/Code-layout" rel="nofollow">https://github.com/WhiteBeamSec/WhiteBeam/wiki/Code-layout</a><p>The "eralchemy" tool (<a href="https://github.com/Alexis-benoist/eralchemy" rel="nofollow">https://github.com/Alexis-benoist/eralchemy</a>) is also excellent at visualizing SQL databases: <a href="https://github.com/WhiteBeamSec/WhiteBeam/wiki/SQL-schema" rel="nofollow">https://github.com/WhiteBeamSec/WhiteBeam/wiki/SQL-schema</a>
Oddly enough, the page partially loads, hangs, crashes the tab and attempts to reload, hangs, and then crashes the entire running mobile Chrome instance on my phone.<p>I don't think I've ever seen that before! I'm guessing the page is just memory heavy and Android 11's memory manager can't figure out how to deal with it.<p>(Chrome mobile, Pixel 3 XL, Android 11)
That's the same visualisation used by CodeScene (<a href="https://codescene.com/how-it-works/" rel="nofollow">https://codescene.com/how-it-works/</a>), but there is a more elaborate one that represents affinity (number of connections) as distance to create something like a geographical map: <a href="https://homepages.ecs.vuw.ac.nz/~craig/publications/vissoft2015-hawes.pdf" rel="nofollow">https://homepages.ecs.vuw.ac.nz/~craig/publications/vissoft2...</a>
Very fun! Would a similar visualization work for showing the insides of a go binary?<p>It would be super cool to have a way to visualize how different modules add bloat in size (and may pull in other bloaty modules as well)
This is cool, but using rectangles instead of circles would help this visualization. Circles waste real estate and are not friendly to labels (e.g. curved text that is harder to read).
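To make the rectangle idea concrete, here is a minimal slice-and-dice treemap sketch. This is a swapped-in technique for illustration, not what the tool does; the tree shape, field names, and file sizes are made up. For scale, a single circle inscribed in a square covers only π/4 (about 78.5%) of it, whereas the leaf rectangles below tile the whole canvas.

```python
def total_size(node):
    """Size of a file, or the summed size of everything under a folder."""
    if "children" in node:
        return sum(total_size(c) for c in node["children"])
    return node["size"]


def layout(node, x, y, w, h, depth=0, path=""):
    """Yield (path, x, y, w, h) for the node and all of its descendants.

    Children split the parent's rectangle in proportion to their size,
    alternating between left-to-right and top-to-bottom at each level.
    """
    full = f"{path}/{node['name']}" if path else node["name"]
    yield (full, x, y, w, h)
    children = node.get("children", [])
    total = total_size(node)
    if not children or total == 0:
        return
    offset = 0.0
    for child in children:
        share = total_size(child) / total
        if depth % 2 == 0:  # even depth: split along the x axis
            yield from layout(child, x + offset * w, y, share * w, h, depth + 1, full)
        else:  # odd depth: split along the y axis
            yield from layout(child, x, y + offset * h, w, share * h, depth + 1, full)
        offset += share


# Hypothetical repo: sizes could be bytes or lines of code.
tree = {"name": "repo", "children": [
    {"name": "src", "children": [
        {"name": "a.py", "size": 300},
        {"name": "b.py", "size": 100},
    ]},
    {"name": "README.md", "size": 100},
]}
rects = {p: (rx, ry, rw, rh) for p, rx, ry, rw, rh in layout(tree, 0, 0, 100, 100)}
leaf_area = sum(rects[p][2] * rects[p][3]
                for p in ("repo/src/a.py", "repo/src/b.py", "repo/README.md"))
```

Plain slice-and-dice tends to produce long thin strips on real repos; squarified treemaps are the usual refinement, at the cost of a more involved layout step.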
I found this useful on my project. I realized I have many 'dust' files in directories: tiny little guys, like grains of sand nestling among the larger circles, looking to be useful. Beautiful structure and images! I love seeing my beautiful work in this beautiful format. It really brings out the beauty! :)
Random nitpick: the issue with color coding files is that you may have many different file types leading to colors that overlap.<p>Case in point in the author's create-react-app example: in one of the scrolling "comment boxes", the author calls out that the "tasks/" folder is mainly CSS files which made me raise an eyebrow...why would a tasks folder be mainly CSS files? -- and upon closer inspection of the colored legend .sh files are a VERY similar green. Just to satisfy my curiosity I visited the repo and sure enough, it was just .sh files, without a single .css file.<p>It makes me doubt the experience of the author...how can a folder called tasks/ (in any repo) be .css files?
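A quick way to catch this kind of legend collision is to compare every pair of legend colors and flag the ones that sit too close together. The sketch below is only illustrative: the hex values are invented (not the tool's actual palette), the threshold is arbitrary, and plain Euclidean distance in RGB is a crude stand-in for perceived difference.

```python
from itertools import combinations


def hex_to_rgb(value):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints."""
    value = value.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def confusable_pairs(legend, threshold=40.0):
    """Return (ext_a, ext_b, distance) for colors closer than threshold."""
    pairs = []
    for (a, color_a), (b, color_b) in combinations(sorted(legend.items()), 2):
        dist = sum((p - q) ** 2
                   for p, q in zip(hex_to_rgb(color_a), hex_to_rgb(color_b))) ** 0.5
        if dist < threshold:
            pairs.append((a, b, round(dist, 1)))
    return pairs


# Invented palette: two similar greens and one yellow.
legend = {".css": "#4caf50", ".sh": "#56b45a", ".js": "#f1e05a"}
clashes = confusable_pairs(legend)
```

With a check like this in place, a palette could be regenerated or the clashing types given a different shape or pattern instead.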
This is cool. I remember using “Understand for C++”, which does something like this: a full source code graph visualization, function flow, etc. This of course starts as a folder visualization, but I see the value in seeing the big picture.
Did something similar some time ago: <a href="http://quantifiedcode.github.io/code-is-beautiful/" rel="nofollow">http://quantifiedcode.github.io/code-is-beautiful/</a>
The best tool I found for exploring code bases and navigating large projects is <a href="https://www.sourcetrail.com/" rel="nofollow">https://www.sourcetrail.com/</a>.
That's a great visual presentation, but not really an innovation. The CodeScene tool has that built in together with a set of deep analyses on top (see <a href="https://codescene.com/" rel="nofollow">https://codescene.com/</a>).<p>There are several public conference talks that cover this visualization and related use cases: <a href="https://www.youtube.com/watch?v=fl4aZ2KXBsQ" rel="nofollow">https://www.youtube.com/watch?v=fl4aZ2KXBsQ</a>
Shameless plug, but this tool also showed the content of the files:
<a href="https://github.com/facebook/pfff/wiki/CodeMap" rel="nofollow">https://github.com/facebook/pfff/wiki/CodeMap</a><p>There are lots of examples of OSS projects visualized here:
<a href="https://github.com/facebookarchive/pfff/wiki/Examples" rel="nofollow">https://github.com/facebookarchive/pfff/wiki/Examples</a>
I find that it's useful to look not just at the current contents of a codebase, but at how it has evolved over time. For example, after being onboarded, this lets me see where most of the current effort on a codebase is concentrated and what the biggest recent changes have been.<p>For this, I believe that Gource is a lovely tool, which you can just point at a Git repository and it will visualize it: <a href="https://gource.io/" rel="nofollow">https://gource.io/</a>
I like the quick insights I can gain from this! Very promising. It's very basic in its current implementation, but I see a lot of potential, especially in the "how files are linked" part.<p>It's a nice bird's eye view. One thing I'd like is for there to be multiple metrics to use for the size of packages, e.g. lines of code, number of files, number of methods, etc.<p>That way you can make sense of which parts of the codebase are the heavyweight ones.
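As a rough idea of what switchable size metrics could look like, here is a small sketch that walks a checkout and totals files, lines, and bytes per folder (nested contents included). The function name and the metric keys are my own, and counting methods is left out since that needs a language-aware parser.

```python
import os
import tempfile
from collections import defaultdict


def folder_metrics(root):
    """Return {relative_dir: {"files", "lines", "bytes"}} for every folder.

    Each file is credited to its own folder and to every ancestor up to
    root, so a folder's totals include everything nested beneath it.
    """
    metrics = defaultdict(lambda: {"files": 0, "lines": 0, "bytes": 0})
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # skip git internals
        rel = os.path.relpath(dirpath, root)
        parts = [] if rel == "." else rel.split(os.sep)
        for name in filenames:
            with open(os.path.join(dirpath, name), "rb") as fh:
                data = fh.read()
            for i in range(len(parts) + 1):
                key = os.sep.join(parts[:i]) or "."
                entry = metrics[key]
                entry["files"] += 1
                entry["lines"] += data.count(b"\n")  # newline count as a line proxy
                entry["bytes"] += len(data)
    return dict(metrics)


# Tiny throwaway tree so the sketch can be run as-is.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "src", "util"))
    with open(os.path.join(tmp, "src", "main.py"), "wb") as fh:
        fh.write(b"a\nb\n")
    with open(os.path.join(tmp, "src", "util", "helpers.py"), "wb") as fh:
        fh.write(b"x\ny\nz\n")
    demo = folder_metrics(tmp)
```

Picking `demo[folder]["lines"]` versus `demo[folder]["files"]` as the circle radius would then be a one-line switch in whatever draws the diagram.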
Weird seeing this as a Show HN. That said: since MS and GitHub are the same company… one of the things I really want is to be able to opt in to reference/search into dependencies. I don’t need visualization, I need “yes show me node_modules/*/*.js when normally I wouldn’t want that.” I use a VSCode extension that does this in the file browser, but I want it across everything that determines whether something is hidden.
Perhaps it's not fair, but the first repo I thought of trying, aws/aws-cli, froze my browser tab. When it finally unfroze, I was presented with a few large circles and way too many tiny dots to be useful.<p>I guess there's an upper limit on the size of the repo, or perhaps it's more geared to different "shapes" of layouts.