Most of us regularly need to familiarize ourselves with a new code base. You might be an avid open source contributor who wants to help out on another project, or you may have just started a new job and want to get to know the platform that will be a big part of your life for the next months or years. Or you might do it just for fun, because who’s watching comedies nowadays?
Familiarizing yourself with a new code base is a keystone skill that every software engineer on this beautiful planet should acquire.
Quite a few people have shared their techniques over the years, but I believe that repetition and collaboration are absolutely essential when it comes to mastering a skill.
The approach below focuses on solving the problem on your own rather than relying on another person, such as an architect in your company, for mentoring.
First thing, read any documentation associated with the software. This might be wiki pages written for the project, or documentation generated from code comments with tools like Doxygen or Javadoc.
If the software is too large, you might prefer to focus on specific areas. For example, if you want to get familiar with the entire Facebook platform, you could start by reading about how the news feed is populated or how the data is structured in the database layer. Flow diagrams, class diagrams, and similar artifacts that were used as requirements references are always handy.
Build it, break it, shake it (no pun intended, Savage Garden)
After you have acquired some basic knowledge about the system you are examining, the natural next step is to build it. If it cannot be built, maybe it’s not worth reading.
Even if, for whatever reason, you don't care about building it yourself, understanding the build process is crucial for your end target. Knowing the build process matters because you get to know the dependencies and how the built code is deployed. And knowing the dependencies means you know what the f*** might have broken your system.
After you build it, start using it as an end user would, but don’t just follow the happy path. Try to break the interface and see how the system reacts.
At the same time, observe the logs produced. They might reveal the sequence of steps in a process and, ideally, point you to an entry point for reading the code.
If the logs imply that the handler handle_user_login is called as part of the interaction, that is a good entry point to start reading the logic.
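As a rough illustration of mining logs for entry points, here is a minimal sketch. The log excerpt, its format, and the "<function> called" convention are all made up for this example; a real project will log differently, so the pattern would need adjusting.

```python
import re
from collections import Counter

# Hypothetical log excerpt; the format is an assumption for illustration.
SAMPLE_LOG = """\
2024-01-01 10:00:00 DEBUG router dispatching POST /login
2024-01-01 10:00:00 INFO handle_user_login called for user=alice
2024-01-01 10:00:01 DEBUG create_hashed_password called
2024-01-01 10:00:01 INFO handle_user_login finished status=200
"""

# Assumed convention: a function name followed by the word "called".
CALL_PATTERN = re.compile(r"\b([a-z_][a-z0-9_]*) called\b")


def called_functions(log_text):
    """Return (function name, call count) pairs in order of first appearance."""
    counts = Counter()
    for line in log_text.splitlines():
        match = CALL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    # Counter keeps insertion order, so the first function seen comes first.
    return list(counts.items())


if __name__ == "__main__":
    for name, count in called_functions(SAMPLE_LOG):
        print(f"{name}: {count}")
```

The first name in the result is usually the best candidate for an entry point, since it is the earliest function the interaction touched.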
Next, read the automated tests. When I say automated tests, I don’t only mean unit tests but other types too, like behavioral tests (Cucumber, anyone?). Here are a few reasons for that:
- The tests, provided they are written correctly, demonstrate how the app is expected to work in terms of flow. For example:
To log the user in, these 5 functions, which are mocked in the test, have to be involved, so let’s look at them more closely.
- The unit tests are a great source of documentation. If you read the unit tests that emulate a successful login, you get a good example of the functions involved and their return values. What does the function create_hashed_password return? Is it a boolean or the hash itself? You can now verify.
- The behavioral tests are a living requirements-capture mechanism. If you are not familiar with the concept of behavioral tests, this is a great article to get a quick but solid grasp.
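To make the "tests as documentation" point concrete, here is a minimal sketch of a unit test that answers the return-value question. The body of create_hashed_password is a hypothetical stand-in (PBKDF2 with a fixed salt, chosen only for this example); in a real code base you would import the project's own function instead.

```python
import hashlib
import unittest


def create_hashed_password(password, salt):
    """Hypothetical stand-in for the function under study."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return digest.hex()


class CreateHashedPasswordTest(unittest.TestCase):
    def test_returns_the_hash_not_a_boolean(self):
        result = create_hashed_password("s3cret", b"fixed-salt")
        self.assertIsInstance(result, str)
        # A SHA-256 based digest is 32 bytes, i.e. 64 hex characters.
        self.assertEqual(len(result), 64)

    def test_same_input_gives_same_hash(self):
        first = create_hashed_password("s3cret", b"fixed-salt")
        second = create_hashed_password("s3cret", b"fixed-salt")
        self.assertEqual(first, second)

    def test_different_password_gives_different_hash(self):
        first = create_hashed_password("s3cret", b"fixed-salt")
        second = create_hashed_password("other", b"fixed-salt")
        self.assertNotEqual(first, second)
```

Saved in a test module, this can be run with `python -m unittest`. Reading a test like this tells you at a glance that the function returns the hash itself.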
Now that you know where to start, start reading the pieces of code involved. In the example we have discussed so far, try to understand how the login handler parses the request, how the parsed password is converted to a hashed version, and how it is compared to the one stored in the database.
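The kind of flow you would be reading might look roughly like the sketch below. Everything here is assumed for illustration: the JSON request body, the in-memory USERS dictionary standing in for the database, the fixed salt, and the numeric status codes. A real handler will differ, but the three steps (parse, hash, compare) are the ones to look for.

```python
import hashlib
import hmac
import json

# Assumption: a single fixed salt keeps the sketch short.
# Real systems normally store a random salt per user.
SALT = b"demo-salt"


def create_hashed_password(password, salt=SALT):
    """Hypothetical helper: returns the hash as a hex string."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return digest.hex()


# Hypothetical stand-in for the database layer.
USERS = {"alice": create_hashed_password("s3cret")}


def handle_user_login(raw_body):
    """Return an HTTP-like status code for a JSON login request."""
    # Step 1: parse the request.
    try:
        payload = json.loads(raw_body)
        username = payload["username"]
        password = payload["password"]
    except (ValueError, KeyError, TypeError):
        return 400
    if not isinstance(username, str) or not isinstance(password, str):
        return 400

    stored_hash = USERS.get(username)
    if stored_hash is None:
        return 401

    # Step 2: convert the submitted password to its hashed version.
    candidate_hash = create_hashed_password(password)

    # Step 3: compare with the stored hash in constant time.
    if hmac.compare_digest(candidate_hash, stored_hash):
        return 200
    return 401
```

For instance, `handle_user_login('{"username": "alice", "password": "s3cret"}')` returns 200, while a wrong password returns 401 and a malformed body returns 400.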
Don’t be afraid to dive deeper and get a better grasp of the low-level implementation. But make sure you discipline yourself and know when to stop. Otherwise, you might end up planning how to make flushing data to the database 2 picoseconds faster (unless that's your end target, of course).
Try to understand why the developer implemented a specific piece of logic in a certain way, and how you would do it. Start drawing logic diagrams that help you visualize the flow.
Keeping notes is also more than desirable, especially in the form of inline comments, which make revisiting the code later less painful. In that case, my preference is to prefix them with my initials so I can filter them easily later.
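Filtering those notes later can be as simple as a text search. Here is a minimal sketch, assuming the notes are written as `# JD: ...` comments in Python files; the initials, the colon, and the file suffix are conventions invented for this example.

```python
import re
from pathlib import Path


def find_notes(root, initials, suffix=".py"):
    """Collect (file, line number, note text) for comments like '# JD: ...'."""
    pattern = re.compile(rf"#\s*{re.escape(initials)}:\s*(.*)")
    notes = []
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            match = pattern.search(line)
            if match:
                notes.append((str(path), number, match.group(1).strip()))
    return notes


if __name__ == "__main__":
    for file_name, number, note in find_notes(".", "JD"):
        print(f"{file_name}:{number}: {note}")
```

A plain grep for the prefix does the same job; the script form is only handy if you want to post-process the notes, for example to turn them into a reading log.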
And what’s the best way to verify your understanding? Modify the code and write unit tests. If the code you added is invoked, you know you are on the right track. The same goes for unit tests: you cannot write non-trivial tests without a decent understanding of the logic of the feature you are examining.
Now that you know you are focusing on the correct piece of code, you can go into more detail. One good approach is to add breakpoints throughout the code being invoked, then start the debugger and verify what is going on in terms of variables and values.
Are any of them initialized in a different place from where you expected? Do they have the correct format, or are they coerced somehow later? What is this variable with the ugly, unclear name supposed to do?
What preparation is done by functions called before the handler under examination? What assumptions are made for each call, and how are they verified?
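When stepping through a debugger is not convenient, a lightweight alternative (not a replacement for breakpoints) is to record which functions get called and with what arguments. The sketch below uses Python's standard `sys.settrace` hook; the `normalize` and `handle_user_login` functions are toy examples invented for the demonstration.

```python
import sys


def trace_calls(func, *args, **kwargs):
    """Run func and record (function name, arguments) for every Python call."""
    calls = []

    def tracer(frame, event, arg):
        if event == "call":
            # At the 'call' event, the frame's locals hold only the arguments.
            calls.append((frame.f_code.co_name, dict(frame.f_locals)))
        return None  # no line-by-line tracing inside the called function

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active
    return result, calls


# Toy functions, hypothetical and for demonstration only.
def normalize(username):
    return username.strip().lower()


def handle_user_login(username, password):
    name = normalize(username)
    return name == "alice" and password == "s3cret"


if __name__ == "__main__":
    outcome, recorded = trace_calls(handle_user_login, " Alice ", "s3cret")
    for name, arguments in recorded:
        print(name, arguments)
```

The recorded list shows the call order and the exact values passed in, which helps answer where a value was prepared before it reached the handler. Calls into built-in functions implemented in C do not appear in it.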
Now that you have a good understanding of what is going on for a certain slice of the functionality, you can pick another one. Learning is never-ending and always exciting.
This is mostly applicable to open source projects, but it is not off limits for ‘plain’ enterprise software. Once you feel confident with the software you are examining (or at least with the area you are most focused on), try adding some functionality, whether a bug fix or a small new feature.
This will certainly increase your confidence in the system, and adding new code to an existing system is always rewarding.
Thank you for reaching the end of this article. Please let me know if you agree or disagree with any of the steps above and, as implied in the introduction, feel free to add your own methodology. Do you have any quality books to suggest in this area?
Below is a summary of the steps I usually follow to familiarize myself with a new code base:
- Read the available documentation first
- Build the software and use it as an end user would
- Read the automated tests for a first dive into the code
- Read the actual business logic and make diagrams out of your understanding
- Trace the full process of each slice in detail
- Move to another slice of functionality, once confident
- Add some value to the system by contributing a bug fix or a small new feature