Hi all,<p>Can anyone suggest a good way of understanding a largish code base (thousands of lines)? Are there any tools to help you visualize/understand a code base? In particular, I'm looking for tools for JavaScript/Ruby/Python code bases. Thank you very much!
Once upon a time, a client of the day job dropped a substantial Perl codebase on our desk and said "Tell us what this does." My boss gave me the job, and expected me to actually READ all the code, but given that I had no desire to read through 100 kloc of Perl code commented only in Japanese, I went for visualization instead:<p>1) Inspect several files for commonalities. Thankfully, the author was obsessive compulsive about coding standards.<p>2) Write a parser for the Perl they used. Use it to glean what pages of the site were connected to each other and what the flow control was like. (a -> b, b -> c, b -> d, etc)<p>3) Plot that on a graph (all hard work already done for me: <a href="http://rgl.rubyforge.org/rgl/index.html" rel="nofollow">http://rgl.rubyforge.org/rgl/index.html</a> )<p>4) Visually inspect the graph to learn non-obvious things about the codebase like "Oh, there is an English language version of the site embedded in here. Isn't that TOTALLY UNDOCUMENTED." Do a bit more code to chop the graph into subgraphs by related functionality (signup flow, admin functions, etc etc).<p>5) Spit out all the code into HTML pages with appropriate autogenerated navigation, inline flow control graphs, and syntax highlighting. Do a bit of quality control, add in some comments about notable things I had learned, burn on CD and hand to customer.<p>6) Charge customer $X0,000 for the CD. The customer was overjoyed they got it done so cheaply. (Did I mention <i>100 kloc of Perl</i>?)
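For anyone who wants to try steps 2 and 3 on their own code, here is a rough Python sketch of the idea. It is not the parser or the RGL-based plotting described above. It assumes pages refer to each other through quoted file names ending in `.cgi` or `.pl`, which is a made-up convention for illustration, and the names `LINK_RE`, `extract_edges`, and `to_dot` are hypothetical. It emits Graphviz DOT text instead of using RGL.

```python
import re
from collections import defaultdict

# Assumption: pages mention each other as quoted names like "signup.cgi".
# A real codebase will need its own pattern (or a real parser).
LINK_RE = re.compile(r'["\']([\w/.-]+\.(?:cgi|pl))["\']')


def extract_edges(sources):
    """Map each file name to the set of other files it mentions.

    `sources` is a dict of {file name: file contents}.
    """
    edges = defaultdict(set)
    for name, text in sources.items():
        for target in LINK_RE.findall(text):
            if target != name:  # ignore self-references
                edges[name].add(target)
    return edges


def to_dot(edges):
    """Render the edges as Graphviz DOT text (a -> b, b -> c, ...)."""
    lines = ["digraph site {"]
    for src in sorted(edges):
        for dst in sorted(edges[src]):
            lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Tiny in-memory stand-in for files read from disk.
    demo = {
        "index.cgi": 'print redirect("signup.cgi"); my $a = "admin.cgi";',
        "signup.cgi": 'print redirect("confirm.cgi");',
    }
    print(to_dot(extract_edges(demo)))
```

Saving the output to a file and running it through Graphviz (for example `dot -Tsvg site.dot -o site.svg`) gives a picture you can inspect for surprising clusters, much as in step 4.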
Google Code Search:<p><a href="http://www.google.com/codesearch" rel="nofollow">http://www.google.com/codesearch</a><p>I use it for working my way around Google's codebase, which is a few orders of magnitude bigger than that.<p>Also, there's no substitute for getting your hands dirty and diving into the code. You don't really understand something until you've changed it a few times. Grab a couple of low-priority bugs and write some patches for them; you'll learn far more than if you just sit down and study things.
It'd be good to get a debugger, attach it to the process, and step through the code. You will see patterns emerge as you play with the application, say, across different clicks in a web app. After a few days, you will know how the flow works.<p>That said, I've been more successful looking at the forums of an open source project, figuring out what problems folks have, and trying to solve them. You will be amazed at how much you can learn about the code base and its undocumented features by solving those problems.
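If stepping through by hand gets tedious, a lighter variation on the debugger idea is to record which functions run for a given action. Below is a minimal Python sketch using the standard `sys.settrace` hook. The helper `trace_calls` and the toy functions `helper` and `handle_click` are made up for illustration; in practice you would pass in whatever entry point your application uses for a request or a click.

```python
import sys
from collections import Counter


def trace_calls(func, *args, **kwargs):
    """Run func and count Python-level calls by (file name, function name)."""
    calls = Counter()

    def tracer(frame, event, arg):
        if event == "call":
            code = frame.f_code
            calls[(code.co_filename, code.co_name)] += 1
        return None  # skip line-by-line tracing inside each call

    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(None)  # always switch tracing off again
    return result, calls


# Toy stand-ins for application code.
def helper(x):
    return x * 2


def handle_click(n):
    total = 0
    for i in range(n):
        total += helper(i)
    return total


if __name__ == "__main__":
    result, calls = trace_calls(handle_click, 3)
    for (filename, name), count in calls.most_common():
        print(f"{name} ({filename}): {count}")
```

Only functions written in Python show up; built-ins implemented in C are not reported by this hook. Comparing the counts for a few different actions is a quick way to see which parts of the code each one touches.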
You could also take a look at a tool called "LXR Cross Referencer" at <a href="http://sourceforge.net/projects/lxr/" rel="nofollow">http://sourceforge.net/projects/lxr/</a>. The site <a href="http://fxr.watson.org" rel="nofollow">http://fxr.watson.org</a> hosts LXR-generated cross references for multiple operating systems.
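To give a feel for what a cross reference buys you, here is a toy version for Python code built on the standard `ast` module. This is not how LXR works internally; it is only a name-based index of where things are defined and where they are called, with no scope or import resolution. The function name `cross_reference` is hypothetical.

```python
import ast
from collections import defaultdict


def cross_reference(sources):
    """Index definitions and call sites by name.

    `sources` maps file name -> Python source text.
    Returns (definitions, calls): dicts of name -> list of (file, line).
    """
    definitions = defaultdict(list)
    calls = defaultdict(list)
    for filename, text in sources.items():
        tree = ast.parse(text, filename=filename)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                definitions[node.name].append((filename, node.lineno))
            elif isinstance(node, ast.Call):
                func = node.func
                # f(x) has an ast.Name target; obj.f(x) has an ast.Attribute target.
                if isinstance(func, ast.Name):
                    calls[func.id].append((filename, node.lineno))
                elif isinstance(func, ast.Attribute):
                    calls[func.attr].append((filename, node.lineno))
    return definitions, calls


if __name__ == "__main__":
    demo = {
        "a.py": "def f():\n    return 1\n",
        "b.py": "from a import f\nx = f()\n",
    }
    definitions, calls = cross_reference(demo)
    for name, sites in definitions.items():
        print(name, "defined at", sites, "called at", calls.get(name, []))
```

Because it matches on bare names, two unrelated methods with the same name get lumped together, so treat the output as a starting point for reading, not as an exact call graph.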
Maybe try this book? <a href="http://www.amazon.com/Code-Reading-Open-Source-Perspective/dp/0201799405" rel="nofollow">http://www.amazon.com/Code-Reading-Open-Source-Perspective/d...</a> I'm not sure how good it is.