How do you guys approach the "start" of reading a code base?<p>I never know where to start looking. If it's a language I am not too familiar with, I have no idea where to start, and sometimes I have no idea where the program execution begins.<p>Is there maybe a ladder of small to large open source projects that can get you there?<p>For example, I have no idea how to begin reading the Flask (1) open source code.<p>What approach can I take to get to a point where I can analyze a project like Flask and get something meaningful from it?<p>(1) https://github.com/pallets/flask
The only way I’ve ever been able to learn a code base is by fixing bugs or implementing features. If I don’t have an immediate goal in mind, nothing actually sticks in my mind. I feel the same about reading most programming books.
TL;DR: divide-and-conquer heuristic.<p>When I was contracting, every few months I was looking at a new-to-me code base. First, if you don't know the language the code was written in, then you need to learn that at some basic level. Next, if there is some framework or set of libraries involved, see how many of those you can identify and how many you might already know. Start with trying to run through a build, as you'll either have success or it will throw errors, and either way that will teach you something about both the health and the components of the project.<p>Once you actually get into the codebase, look for patterns. Often there will be pretty clear layers or types of components, identifiable either by name or organizational structure. Sometimes you'll see patterns that tell you "Oh, George must have written all this code, and Steve must have written all that code" because of personal idiosyncrasies. That's the worst case, but sometimes it's what you have to go on.<p>I spent a lot of years in application triage, where I was called in to fix serious problems in apps with which I had no experience. Like today's Apple outage - I used to parachute into situations like that and had to get things running, fix, post-mortem, etc. There are great tools out there that you can inject into running code to watch real or test users execute it. That's a great way to very quickly learn how everything is assembled, i.e. why did that piece of code get executed there, or <i>why didn't</i> that piece of code get executed.
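A minimal sketch of the "watch the code execute" idea above, in Python and using only the standard library: `sys.settrace` can record every Python function that gets called while a target runs. The names `trace_calls`, `handle_request`, and `load_user` are hypothetical, made up for illustration; real tooling (debuggers, profilers, APM agents) does far more than this.

```python
import sys


def trace_calls(func, *args, **kwargs):
    """Run func and return (result, names of Python functions called, in order)."""
    calls = []

    def tracer(frame, event, arg):
        # "call" fires once each time a new Python function frame is entered.
        if event == "call":
            calls.append(frame.f_code.co_name)
        return None  # no per-line tracing inside the frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active
    return result, calls


# Hypothetical stand-ins for code in an unfamiliar project.
def load_user(user_id):
    return {"id": user_id}


def handle_request(user_id):
    return load_user(user_id)["id"]
```

Calling `trace_calls(handle_request, 7)` gives back the result plus the call order, which answers "why did that piece of code get executed there" for one concrete request. Functions implemented in C do not show up in this kind of trace.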
This post may be useful: Contributing to Complex Projects <a href="https://mitchellh.com/writing/contributing-to-complex-projects" rel="nofollow">https://mitchellh.com/writing/contributing-to-complex-projec...</a>
I don't. I have no illusion that I will understand a large code base, even one I started and have been using for 10 years. When you have a hundred developers constantly adding code all the time, you are not going to know it all. Find and fix the parts you need to work on. Don't worry about the rest.
I've never worked on large open source projects (except maybe as part of my job), but every new company I join inevitably has some large codebase I have to familiarize myself with. In my experience:<p>- It just takes a long time: I find I'll spend a good 1.5-2 years at a company before I can finally say with confidence that I have a decent grasp of the entire codebase;<p>- I try to focus on making small changes/bugfixes to specific components. Do that enough and you start to see how things fit together;<p>- Running the code with lots of logging sometimes helps;<p>- Finding some sort of high-level architecture diagram or documentation works wonders. Usually in the industry there's one dude(ine) who's been there for a while and can set you on the right path; not sure about FLOSS...
You should always be aware of the business domain of the code. My general pattern is to first get an idea of what the software actually does. Then I locate where the general code is. After that, I ask which modules I'm interested in - most codebases have clearly delimited modules. Then I work from the entry point of that module and look at what it returns. I sometimes look at other parts of the code to check my assumptions, but generally I stay in one place until I have a good idea of its behavior. Once I have a good model, I move on to the next one. The goal is to understand each unit well enough that I can write a couple of sentences or paragraphs about it - from functions to whole files.
I always approach a codebase with intent to change it, even if I'm just casually curious. This leads me to specific points deep in the codebase that naturally lead to more exploratory questions, such as how/when the execution path gets there, how the change can be tested, etc. In short, I try to look inside, then move outside.<p>Using blame or navigating closed issues can do a lot to put specific problems into plain English to help with this kind of inward-out exploration (which can also help with understanding the use cases of the application). Same with reading any tests that might be there. And if there aren't tests, writing them is a great way to get into understanding specific functionality.
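To make the "write tests to understand it" point concrete, here is a small characterization-test sketch in Python. It pins down what a function does today rather than what it ought to do. `posixpath.normpath` from the standard library is only a stand-in for whatever unfamiliar function you are exploring, and the class and helper names are made up.

```python
import posixpath
import unittest


class NormpathCharacterization(unittest.TestCase):
    """Record observed behavior of an unfamiliar function, one question per test."""

    def test_collapses_dot_and_dotdot(self):
        self.assertEqual(posixpath.normpath("a/./b/../c"), "a/c")

    def test_collapses_repeated_slashes(self):
        self.assertEqual(posixpath.normpath("a//b"), "a/b")

    def test_empty_path_becomes_dot(self):
        # Surprising results like this are exactly what such tests surface.
        self.assertEqual(posixpath.normpath(""), ".")


def run_characterization():
    """Run the tests programmatically and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(NormpathCharacterization)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test starts as a guess; when a guess fails, the failure message tells you what the code really does, and you update the test and your mental model together.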
1. Practice. It’s gotten easier for me over the years because my day job involves regular code/architecture review over tons of repos.<p>2. Ask the developers.<p>3. Play with the software and trace through the source code. For Flask, make a simple application and grep for the definitions of the functions/methods/classes you’re referencing. While doing this, you will find references to other Flask functions. Find and read those definitions ad infinitum and eventually it’ll “click”.
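The "grep for the definitions" step can also be done from inside Python. A minimal sketch using the standard `inspect` module: `where_is` is a made-up helper name, and `json.dumps` stands in for a Flask object so the snippet runs without Flask installed. The intended use is to pass in whatever your simple application references, for example the `Flask` class itself.

```python
import inspect
import json


def where_is(obj):
    """Return (source_file, first_line) for a Python-level function or class."""
    path = inspect.getsourcefile(obj)
    _, first_line = inspect.getsourcelines(obj)
    return path, first_line


# json.dumps is only a stand-in for the object you are curious about.
path, line = where_is(json.dumps)
print(f"json.dumps is defined at {path}:{line}")
```

This only works for objects written in Python; built-ins and C extensions have no source file for `inspect` to report.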
I'm not sure I understand why you would expect to be able to do this with a language you are not very familiar with, although I also am not sure exactly what "not very familiar" means here.<p>I hope, however, that if you are going to analyse a codebase in a language you are not familiar with, one of two things applies:<p>1. There is a part of the codebase that uses some language you are very familiar with, or that touches a specific knowledge domain (like frontend development or database querying) that you are very familiar with. If so, start at the point of greatest familiarity, walk backwards from there, and document for yourself the bits of code that touch the stuff you do know well.<p>2. You are a secondary developer to someone who is an expert in the language; hopefully they can give you a tour of the codebase.<p>If neither of these is the case, I guess you are going to have to learn the language and get a beginner's guide to Flask or something like that. Maybe ask on a forum for the language what the best learning sources are.
I'd start with a local setup and make sure changes show up locally as expected. Then hit the docs, and keep them ready to look things up, although browsing through them is also helpful at the beginning.<p>As far as reading and understanding the code, I start with the entry point and make my way through, building a mental map. Once I understand how the app is structured and set up, and some of the conventions, I dive into specific features and try to make changes.<p>Additionally, I like to diagram things as well, just for my own mental model. Depending on the tech stack, there could be libs out there that diagram the app with a way to navigate it, and this is also really helpful.
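For the diagramming step on a Python project, a rough sketch of generating the raw material yourself: walk a directory of `.py` files and list what each one imports, using only the standard `ast` module. `import_map` is a made-up name, and the output is a plain dict rather than a drawing; turning it into an actual diagram is left out.

```python
import ast
from pathlib import Path


def import_map(root):
    """Map each .py file under root (relative path) to the sorted modules it imports."""
    root = Path(root)
    result = {}
    for path in sorted(root.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        modules = set()
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules.update(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                # "from . import x" has module=None and is skipped here.
                modules.add(node.module)
        result[path.relative_to(root).as_posix()] = sorted(modules)
    return result
```

Files that import a lot of the project's own modules tend to be the high-level glue, and files that many others import tend to be the core utilities, which is a decent first guess at the layers. The sketch assumes every file parses; a file with a syntax error will raise.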
I generally look at what the index.ts file is doing, since in... let's say, a React app, that's the file that starts everything up (root component render, etc.) From there you can keep going down the chain to the smallest utilities.
I start from invocation and go deeper. This could be main, or perhaps specific API endpoints. If you’re unfamiliar with the framework, figuring out where those endpoints live is a good step 1.
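For Python projects, one mechanical way to answer "where does execution start?" is to look for files with a top-level `if __name__ == "__main__":` guard. A sketch using the standard `ast` module follows; `has_main_guard` and `find_main_guards` are made-up names, and this will miss entry points declared elsewhere, such as console scripts in packaging metadata or endpoints registered with a framework.

```python
import ast
from pathlib import Path


def has_main_guard(source):
    """True if the module has a top-level `if __name__ == "__main__":` block."""
    for node in ast.parse(source).body:
        if not isinstance(node, ast.If):
            continue
        test = node.test
        if (
            isinstance(test, ast.Compare)
            and isinstance(test.left, ast.Name)
            and test.left.id == "__name__"
            and len(test.ops) == 1
            and isinstance(test.ops[0], ast.Eq)
            and isinstance(test.comparators[0], ast.Constant)
            and test.comparators[0].value == "__main__"
        ):
            return True
    return False


def find_main_guards(root):
    """Return relative paths of .py files under root that contain a main guard."""
    root = Path(root)
    return [
        p.relative_to(root).as_posix()
        for p in sorted(root.rglob("*.py"))
        if has_main_guard(p.read_text(encoding="utf-8"))
    ]
```

Running `find_main_guards(".")` at the top of a checkout gives a short list of candidate starting points to read first.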