What is the fastest way to onboard to a new codebase?

Grace Lam ・1 min read

What has worked well to help you to quickly become productive, and what has been frustrating?



Short answer: First, RTFM. Then, if they're available, run the unit test suites that target the modules you care about and step through the code with a debugger.

Longer answer: I'm a Python guy, but the strategy has been applicable to most codebases I've encountered, unless the debugger at the time didn't really allow for it (I'm looking at YOU, early Golang). Typically (and hopefully), the project will have module-targeted tests. For instance, if I have a module project_name/auth.py, there would be accompanying tests in tests/test_auth.py or similar. Depending on what's being tested (is it a "true" unit test, or a unit test-y thing, as in the case of Django's client tests), I'll either set a breakpoint just before a given call and step into each block of code, at least far enough to get my head wrapped around the control flow, or set a breakpoint as early in the executing code as I deem necessary.
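To make that concrete, here is a minimal sketch of the layout I mean, squeezed into one file. The function names and the user data are hypothetical stand-ins for whatever project_name/auth.py and tests/test_auth.py would really contain; the only real pieces are unittest and the built-in breakpoint().

```python
import unittest


# Hypothetical stand-in for the contents of project_name/auth.py
def normalize_username(raw):
    """Trim whitespace and lowercase the name before lookup."""
    return raw.strip().lower()


def authenticate(username, password, users):
    """Return True when the normalized username maps to the given password."""
    name = normalize_username(username)
    return users.get(name) == password


# Hypothetical stand-in for the contents of tests/test_auth.py
class TestAuth(unittest.TestCase):
    def test_authenticate_accepts_known_user(self):
        users = {"alice": "s3cret"}
        # Uncomment the next line to pause just before the call under test.
        # In pdb, `s` steps into authenticate() and `n` moves to the next line.
        # breakpoint()
        self.assertTrue(authenticate("  Alice ", "s3cret", users))
```

With the breakpoint uncommented, running the test module through python -m unittest drops you into pdb right before the call, and you can step from the test into the code it exercises.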

In Django (and similar)-land, the unit test-ish style is kind of nice for this. Rather than testing the input and output of a single class method or function, it goes through the entire lifecycle of the request. Because of this, I can very quickly get a high-level overview of all the moving pieces.

Frustrations have mostly hit when a language doesn't support the approach or I'm forced to use an IDE (thanks Java ;)). Python-specific frustrations have sometimes come when side effects occur unexpectedly (e.g. an outbound request or database state change on a property accessor).
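A small sketch of the property problem, in case it isn't obvious why it bites while debugging. The Report class and its fetch callable are made up for illustration; fetch stands in for the outbound request or database query.

```python
class Report:
    """Hypothetical model whose property hides a side effect."""

    def __init__(self, fetch):
        self._fetch = fetch  # stand-in for an HTTP call or a DB query
        self.calls = 0       # counts how often the hidden work has run

    @property
    def total(self):
        # Looks like plain attribute access, but does the work on every read.
        self.calls += 1
        return sum(self._fetch())


report = Report(lambda: [1, 2, 3])
report.total  # first read triggers the fetch
report.total  # so does the second one
print(report.calls)  # prints 2
```

The same thing happens when a debugger evaluates report.total in a watch or variables view, so merely inspecting an object can fire the request again.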


Interesting. What if there aren't unit tests available?


Then that would be an incredible source of frustration and likely not a codebase I'd want to work with ;) If actually presented with that problem, though, you can always follow the same pattern: set breakpoints and exercise the running system directly instead of running unit tests.

Of course, that story may very well change depending on the complexity of the system. For example, debugging a distributed system where you can't really poke at component-level pieces locally doesn't work quite as well. Then it becomes more a matter of hoping there's sufficient logging to let you follow the code flow, reading through the logs and relating them back to the code as much as possible.
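For what "sufficient logging" can look like, here is a rough sketch using Python's standard logging module. The component name, the functions, and the request_id convention are assumptions for illustration; the format fields (%(name)s, %(funcName)s, %(lineno)d) are standard LogRecord attributes and are what let you map a log line back to a spot in the code.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical component name


def configure_logging():
    """Include logger name, function, and line number in every log line."""
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(name)s %(funcName)s:%(lineno)d %(message)s",
    )


def reserve_stock(request_id, sku):
    # The shared request_id lets you grep one request across components.
    logger.debug("request_id=%s reserving sku=%s", request_id, sku)
    return True


def place_order(request_id, sku):
    logger.info("request_id=%s order received", request_id)
    ok = reserve_stock(request_id, sku)
    logger.info("request_id=%s order finished ok=%s", request_id, ok)
    return ok
```

Calling configure_logging() once at startup and then place_order("req-1", "sku-9") emits three lines, each tagged with the function that produced it, so filtering on request_id=req-1 reconstructs the path the request took.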

I guess I should have prefaced my blurb with: It all depends on what it is that you're trying to wrap your head around.


It used to be writing tests for me, but on huge codebases that started driving me crazy. I was figuring out how to run the tests, where the feature was implemented, how it fit into the rest of the codebase, and how it was supposed to work, all at the same time. On one hand, that's a great way to get used to lots of the code really quickly, but on the other hand, it's a crucible that you might not come out of!

These days, it's writing docs. Something about taking a feature (or maybe the whole project!) and writing about it in English really helps the code click for me.


Good idea. Writing docs for a new codebase could help process a lot of information as you'd have to translate it from code to English.


Static analysis tools are your friend! On .NET codebases, I will run NDepend to get a good idea of where the most active code is in the solution since I will probably be spending lots of time there. It also lets me see where there is duplication and I can eliminate studying code I have seen elsewhere.
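If you don't have a tool like that on hand, even a crude script can surface duplication. The sketch below is not how NDepend works; it is a toy stand-in that uses Python's standard ast module to flag functions whose bodies are structurally identical. The find_duplicate_functions name and the in-memory sources dict are hypothetical.

```python
import ast
from collections import defaultdict


def find_duplicate_functions(sources):
    """Group functions with structurally identical bodies.

    sources maps a file name to its source text. Returns a list of groups,
    each a list of (file name, function name) pairs sharing the same body.
    """
    seen = defaultdict(list)
    for filename, text in sources.items():
        for node in ast.walk(ast.parse(text)):
            if isinstance(node, ast.FunctionDef):
                # ast.dump ignores comments, whitespace, and line numbers,
                # so only the structure of the body is compared.
                key = "".join(ast.dump(stmt) for stmt in node.body)
                seen[key].append((filename, node.name))
    return [group for group in seen.values() if len(group) > 1]


sources = {
    "a.py": "def f(x):\n    return x + 1\n",
    "b.py": "def g(x):\n    return x+1  # same logic, different spacing\n",
    "c.py": "def h(x):\n    return x * 2\n",
}
print(find_duplicate_functions(sources))
# prints [[('a.py', 'f'), ('b.py', 'g')]]
```

It only catches exact structural copies, so renamed variables slip through, but it is enough to tell you which files you can skim because you have already read their twin.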

Set up a full environment if you can - get a VM and set up the server, database and configuration as if you were creating a Production system. If you are lucky, this is documented somewhere. If not, you will get to sleuth around and you have some documentation to contribute right away.

It doesn't always work well online, but reviewing the code with another developer is great too. I feel like that's how UNIX and early open source applications were handed on: you were in the same lab as the hacker who wrote it, and knowledge was passed down like the oral traditions we have been passing on since the beginning of humankind.


That could also work! I agree that reviewing code with an experienced programmer on the team is a fast way to onboard. It's a lot harder when everyone is remote, though.


I think working through code tasks with other people who already know the codebase is probably the fastest. You can easily ask questions and make assumptions and the more experienced person can immediately let you know if what you are doing makes sense.


That's true! It depends on having a reliable mentor on the team.