I have recently started working at a company with a massive Python codebase. I'm finding it really hard to understand what is going on because of the lack of types. I don't know what the arguments to functions are supposed to be or what they return, and because it's a fairly sophisticated, large microservice architecture, it is even harder to trace things across services.<p>My experience before this was working on a much smaller Java codebase - it was written well and really easy to understand. The fact that we didn't have much tooling was a huge help to me too: when I deployed a service, for example, I could see the shell commands, and it was basically "mvn package; scp ...; start ...". When I wanted to know where the logs went, I looked at a log4j config in the project. When I wanted to know what was running, I ssh'd in and ran "ps aux | grep x" - there were maybe 10 servers tops for a project.<p>This is a much bigger company with hundreds of services, lots of machines, complex deploy procedures and, in general, much more abstraction in both the codebase and ops. My normal approach is to just dig deeper and peel back the abstractions until I see what's happening. I fear that's impossible here. Instead, we ask someone who built it or look at an outdated wiki, which is great but so much more painful than actually <i>knowing</i> how everything fits together.<p>I am kind of lost as to where to even start understanding this codebase. The three main challenges are 1. no type information, 2. a massive codebase, and 3. ops complexity. What are your tips for getting to grips with this whole system? Any advice is welcome.
Well, I would get PyCharm for this particular issue. In these cases you usually end up searching all files for variable names or classes, so learn all the keystrokes for finding references and viewing class and call trees. If you are going to own the code, add some type-hinting comments to the classes and functions that you deal with frequently.
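A minimal sketch of what such hints can look like, using hypothetical find_user/find_users helpers. The "# type:" comment form comes from PEP 484 and also works in code that has to stay Python 2 compatible; Python 3 code can use annotations instead. PyCharm and mypy read both forms, although typing.get_type_hints only sees annotations.

```python
from typing import Dict, List, Optional


# Hypothetical service helper, used only to illustrate the syntax.
def find_user(user_id, include_deleted=False):
    # type: (int, bool) -> Optional[Dict[str, str]]
    """PEP 484 type-comment form: placed right after the signature."""
    users = {1: {"name": "alice"}}
    return users.get(user_id)


# The same idea written with Python 3 annotations.
def find_users(user_ids: List[int], include_deleted: bool = False) -> List[Dict[str, str]]:
    """Annotation form: visible to IDEs, mypy and typing.get_type_hints."""
    found = []
    for uid in user_ids:
        user = find_user(uid, include_deleted)
        if user is not None:
            found.append(user)
    return found
```

Even a handful of these on the functions you touch most often means the IDE can start answering "what does this return?" for you.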
Personally, I'd suggest finding a starting point and tracing through a couple of common operations. That will give you a better idea of why the code that's there exists and why it works the way it does.<p>Usually, this habit works better in languages like Python than in languages like Java that encourage repetition and the proliferation of abstraction layers. You'll do less code-reading in a large Python codebase, because you won't be dealing with large sets of wrapper classes that exist basically to circumvent inheritance restrictions. But I do this for any large codebase I'm expected to understand, even if it's in Java.<p>Within a particular use case, the lack of explicit types shouldn't matter too much if you are reading in execution order. Types are useful if you're starting from some random function and trying to work backwards, but that's hard in large projects regardless of the language. While Python code may well take heavy advantage of duck typing, reading code in execution order should make clear which types can be passed in from a particular point.<p>Large projects, no matter the language, take a while to fully internalize. Your difficulty probably has nothing to do with Java vs Python and everything to do with small codebases versus large codebases. (I say this as someone who works with a very large Java codebase and a very large mixed shell/Perl codebase at work, and with several large Python codebases on the side.)
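Reading in execution order can be partly automated. A minimal sketch, assuming you can call the entry point of a common operation directly (handle_request and parse below are hypothetical stand-ins): it uses sys.setprofile to record each Python function call in order, along with the concrete type of every named argument it actually received.

```python
import sys
from collections import namedtuple

# One recorded call: nesting depth, function name, and {argument: type name}.
Call = namedtuple("Call", ["depth", "function", "arg_types"])


def trace_calls(entry_point, *args, **kwargs):
    """Run entry_point(*args, **kwargs) and record every Python-level function
    call it makes, in execution order, with the runtime type of each argument."""
    calls = []
    depth = 0

    def profiler(frame, event, arg):
        nonlocal depth
        if event == "call":
            code = frame.f_code
            # Named parameters come first in co_varnames (*args/**kwargs excluded).
            names = code.co_varnames[:code.co_argcount + code.co_kwonlyargcount]
            arg_types = {name: type(frame.f_locals[name]).__name__
                         for name in names if name in frame.f_locals}
            calls.append(Call(depth, code.co_name, arg_types))
            depth += 1
        elif event == "return":
            depth -= 1
        # "c_call"/"c_return" events for builtins are deliberately ignored.

    sys.setprofile(profiler)
    try:
        result = entry_point(*args, **kwargs)
    finally:
        sys.setprofile(None)  # always uninstall, even if the call raises
    return result, calls


# Hypothetical stand-ins for a real request path in your codebase.
def parse(raw):
    return int(raw)


def handle_request(raw, scale=2):
    return parse(raw) * scale


result, calls = trace_calls(handle_request, "21")
for call in calls:
    print("  " * call.depth + call.function, call.arg_types)
```

The indented output doubles as a rough call tree for that one operation, which you can then read through in the editor in the same order.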
1) What you really want is a mental model of what the application is doing as a whole and what the subsection you're working on now is doing. Look for existing diagrams if possible. If there aren't any, start making them as you go along and then have other devs verify them. Then try to own a subsection of the code and really understand what it's doing. Once you feel comfortable with that, move on to another, preferably related, piece.<p>2) While you normally don't need or want an IDE for dynamic languages, it sounds like this codebase has reached the point where you should be using one, and in particular using the jump-to-definition shortcut. If you make a type error, this should help you catch it quickly. PyCharm is a good one, but there are alternatives.<p>3) Operational complexity should be owned by someone already. If not, this may be the place where you can make the biggest impact. See if there is centralized logging set up. Make sure the setup/installation and deployment procedures are up to date. These can typically be automated pretty easily.<p>4) An alternative approach is to look at the data first. Once you understand the core data models, you typically understand the application.
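For point 3, one quick way to find out where a Python service's logs actually end up (the counterpart of reading the log4j config mentioned in the question) is to ask the standard logging module after the service has configured itself. A minimal sketch using only the standard library; since logging configuration is per process, you would call where_do_logs_go() (a hypothetical helper name) from a debug shell or hook inside the running service. Custom handler classes are reported by class name only.

```python
import logging
import logging.handlers


def describe_handler(handler):
    """Return a short description of where a handler sends log records."""
    # FileHandler subclasses StreamHandler, so it must be checked first.
    if isinstance(handler, logging.FileHandler):
        return "file: " + handler.baseFilename
    if isinstance(handler, logging.handlers.SysLogHandler):
        return "syslog: " + str(handler.address)
    if isinstance(handler, logging.handlers.SocketHandler):
        return "socket: %s:%s" % (handler.host, handler.port)
    if isinstance(handler, logging.StreamHandler):
        return "stream: " + getattr(handler.stream, "name", repr(handler.stream))
    return type(handler).__name__  # custom or less common handler


def where_do_logs_go():
    """Map each logger that has its own handlers to those handlers' destinations."""
    destinations = {"root": [describe_handler(h) for h in logging.getLogger().handlers]}
    # loggerDict also holds PlaceHolder objects for dotted parents never created.
    for name, logger in sorted(logging.Logger.manager.loggerDict.items()):
        if isinstance(logger, logging.Logger) and logger.handlers:
            destinations[name] = [describe_handler(h) for h in logger.handlers]
    return destinations


if __name__ == "__main__":
    for name, targets in where_do_logs_go().items():
        print(name, "->", ", ".join(targets) or "(no handlers of its own)")
```

Records from loggers without their own handlers propagate up to their parents, so those loggers end up wherever their nearest configured ancestor (often root) sends things.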
Having worked on a large Python codebase as well, I can tell you that sometimes it's hard. Look at the types, determine what they're instances of, and try to build small test runs around existing code before you modify it. Python Tools for Visual Studio is sweet and may help with getting around, as will PyCharm. If the logs or other configuration details are hard to find, get them into configuration files. I guess you can draw the high-level picture of how it fits together, but start small: pick a class or method and follow it through. Oh, and enjoy it. Python was lots of fun for me.
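A minimal sketch of such a "small test run" (often called a characterization test) using the standard unittest module. normalize_order is a hypothetical stand-in for an undocumented function in your codebase; the tests record what it does today, including the types of what it returns, so a later change that alters behaviour fails loudly instead of silently.

```python
import unittest


# Hypothetical legacy function whose exact behaviour is not documented anywhere.
def normalize_order(order):
    items = order.get("items") or []
    total = sum(i["price"] * i.get("qty", 1) for i in items)
    return {"id": str(order["id"]), "total": round(total, 2), "count": len(items)}


class NormalizeOrderCharacterization(unittest.TestCase):
    """Pin down what the code currently does, odd cases included, before changing it."""

    def test_typical_order(self):
        order = {"id": 7, "items": [{"price": 2.5, "qty": 2}, {"price": 1.0}]}
        self.assertEqual(normalize_order(order), {"id": "7", "total": 6.0, "count": 2})

    def test_missing_items_is_treated_as_empty(self):
        self.assertEqual(normalize_order({"id": 1}), {"id": "1", "total": 0, "count": 0})

    def test_return_types(self):
        result = normalize_order({"id": 3, "items": []})
        self.assertIsInstance(result["id"], str)  # the id is converted to str
        self.assertIsInstance(result["count"], int)


if __name__ == "__main__":
    unittest.main()
```

Writing the expected values is itself a way of learning the code: when a guess is wrong, the failing assertion tells you what the function really does.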
Use print(type(obj)) and print(dir(obj)) to debug and see what the objects are and what they can do.<p>Try to think more about the functionality than about the types of the objects a method takes as input and returns as output.
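The type()/dir() approach can be taken one step further with inspect.signature, so you also see which arguments each method expects. A minimal Python 3 sketch; describe and the Invoice class are hypothetical names used only for illustration.

```python
import inspect


def describe(obj):
    """Return lines describing what an object is and what it can do:
    its type, its public data attributes, and its public methods' signatures."""
    cls = type(obj)
    lines = ["type: %s.%s" % (cls.__module__, cls.__qualname__)]
    for name in dir(obj):  # dir() returns the names sorted alphabetically
        if name.startswith("_"):
            continue  # skip private and dunder names to keep the output short
        try:
            value = getattr(obj, name)
        except Exception as exc:  # some properties raise when accessed
            lines.append("  attr   %s raised %s" % (name, type(exc).__name__))
            continue
        if callable(value):
            try:
                sig = str(inspect.signature(value))
            except (TypeError, ValueError):  # some builtins expose no signature
                sig = "(...)"
            lines.append("  method %s%s" % (name, sig))
        else:
            lines.append("  attr   %s = %s" % (name, repr(value)[:60]))
    return lines


# Hypothetical object you might receive from some other part of the codebase.
class Invoice:
    def __init__(self, number, amount):
        self.number = number
        self.amount = amount

    def total_with_tax(self, rate=0.2):
        return self.amount * (1 + rate)


if __name__ == "__main__":
    print("\n".join(describe(Invoice("A-1", 100))))
```

Dropping a call like this into a pdb session or a temporary log line can be quicker than hunting down where an unfamiliar object was constructed.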