Tools and techniques for exploring an unknown (large) codebase
You’ve taken on a new responsibility or started a new job where there is existing code to work with. You’ve been given access to source control and you realize you have a lot to learn. Now what? How should you get acquainted with the new codebase? For some, that can be intimidating. Below I’ll describe my techniques for getting familiar with the code and turning that familiarity into structured knowledge.
Don’t panic, relax, it takes time
It’s a process that takes time. Start with debugging the main flow of the application. How do you know where to start? I suggest inputs and outputs. Find the entry point, the one that starts the whole program. If you can’t find it, go ahead and debug a self-contained piece of functionality, such as a button click.
Follow the code, stepping through it line by line with breakpoints. Navigate through the stack trace to see where the primary methods are and continue from there. In my experience, it may take weeks before you feel safe making any change, and months before you feel “comfortable” with the code. Eventually, you’ll understand what the code means in business terms.
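If you don’t have a debugger attached, you can still get the same “who called this?” view from code. Here is a minimal sketch in Python (used only for illustration, the idea carries over to other stacks). The function names `main`, `dispatch_event`, and `on_button_click` are hypothetical stand-ins for an entry point, some framework plumbing, and the handler you are curious about.

```python
import traceback


def call_chain() -> list[str]:
    """Return the function names on the current call stack, outermost first."""
    stack = traceback.extract_stack()
    # The last entry is call_chain itself, so leave it out.
    return [frame.name for frame in stack[:-1]]


def on_button_click() -> list[str]:
    # Hypothetical handler: the piece of functionality being investigated.
    return call_chain()


def dispatch_event() -> list[str]:
    # Hypothetical framework plumbing between the entry point and the handler.
    return on_button_click()


def main() -> list[str]:
    # Hypothetical entry point of the program.
    return dispatch_event()


if __name__ == "__main__":
    # Print the path from the entry point down to the handler.
    print(" -> ".join(main()))
```

Dropping a call like this into the method you are studying shows which primary methods lead to it, which is the same information you would read off the debugger’s stack trace window.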
High level
Approaching a large software architecture can be overwhelming. I highly advise using tools to generate a dependency graph and exploring it top-down. First, visualize the graph between the different assemblies. This will give you an idea of how features and layers are organized. Then dig into namespace dependencies to get a finer-grained idea of the code structure. Finally, look at class dependencies to understand how a set of classes collaborates to implement a feature. There are several tools that generate a dependency graph, NDepend for .NET being one example.
Another approach is to use a source analysis tool, such as the TIOBE software quality framework, to measure module sizes, complexity metrics, and more. This gives you a feel for the project and helps identify the non-trivial areas.
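To make the idea concrete, here is a rough sketch of what such a tool does at the coarsest level. It is not how NDepend works, just an illustration in Python using the standard `ast` module: it scans a source tree and records, for each module, the top-level packages it imports. The helper names `module_name` and `import_graph` are my own, and relative imports are deliberately ignored to keep it short.

```python
import ast
from collections import defaultdict
from pathlib import Path


def module_name(path: Path, root: Path) -> str:
    """Turn root/pkg/mod.py into the dotted name pkg.mod."""
    parts = list(path.relative_to(root).with_suffix("").parts)
    if parts and parts[-1] == "__init__":
        parts.pop()  # a package is named after its directory
    return ".".join(parts) or "<root>"


def import_graph(root: Path) -> dict[str, set[str]]:
    """Map each module under root to the top-level packages it imports."""
    graph: dict[str, set[str]] = defaultdict(set)
    for path in sorted(root.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        targets = graph[module_name(path, root)]  # registers the module even if it imports nothing
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                for alias in node.names:
                    targets.add(alias.name.split(".")[0])
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                # Absolute "from x.y import z"; relative imports are skipped in this sketch.
                targets.add(node.module.split(".")[0])
    return dict(graph)
```

Calling `import_graph(Path("src"))` on a hypothetical `src` directory gives you a plain dictionary you can print or feed into a graph drawing tool. Starting at package level and only then drilling into individual modules mirrors the assemblies, then namespaces, then classes order described above.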
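If you have no analysis tool at hand, even a crude script gives a first impression of where the complexity lives. The sketch below is an assumption-laden stand-in, not the metric any real product computes: it counts lines, functions, and branching constructs per Python file and ranks files by branch count. The names `file_metrics` and `rank_files`, and the choice of which nodes count as a “branch”, are mine.

```python
import ast
from pathlib import Path

# Constructs counted as a "branch". This is a rough proxy for complexity,
# not a standard cyclomatic complexity calculation.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp)


def file_metrics(source: str) -> dict[str, int]:
    """Count lines, function definitions, and branching constructs in one file."""
    nodes = list(ast.walk(ast.parse(source)))
    return {
        "lines": len(source.splitlines()),
        "functions": sum(isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef)) for n in nodes),
        "branches": sum(isinstance(n, BRANCH_NODES) for n in nodes),
    }


def rank_files(root: Path, top: int = 10) -> list[tuple[str, dict[str, int]]]:
    """Return the files under root with the most branches, largest first."""
    rows = [
        (str(path.relative_to(root)), file_metrics(path.read_text(encoding="utf-8")))
        for path in root.rglob("*.py")
    ]
    rows.sort(key=lambda row: row[1]["branches"], reverse=True)
    return rows[:top]
```

The files at the top of `rank_files(Path("src"))` (again, `src` is a placeholder) are usually good candidates for the non-trivial areas worth reading first.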
Documentation
Go through the documentation. Some of it may be good, some bad, depending on the team’s culture (and processes) for maintaining it. But no matter how sparse it is, read it. If it doesn’t exist, write it yourself. Later, it will be easier to refer to, and you can make sure others do so by pointing them to it instead of answering their questions directly. Keep it rigorously updated, whether by you or by any of your team members.
Be stupid
The power of communication is often underestimated. Don’t be afraid to sound stupid: ask the code’s authors or, if they aren’t available, find the most experienced people around. I bet they’ll have something to say.
Share with them the assumptions you’ve made about the code and every conclusion you’ve come to about how it works and what it does. Hear their insights, let them mentor you, and consider pair programming with them. It will save you many hours in the long run.
Unfamiliar tech stack
If the implementation is based on technologies, languages, or frameworks you’re not familiar with, alternate between the code and tutorials on those technologies. Read or watch the tutorial, then go look at the implementation to see how it is reflected in the system, noting any similarities and differences. This helps you understand the design and the circumstances that led to it.
Baby steps
Introduce small changes and see what breaks. Clean the code one step at a time. Add comments to explain what you think the code does. Using a refactoring tool (ReSharper 🙄), rename variables to make them readable.
Reduce code clutter by deleting commented-out code, meaningless comments, and so forth. Remove code duplication where possible. Get rid of magic numbers and apply code conventions. Finally, add tests where possible. Not all of these changes will be kept, but making them will help you get oriented.
Ask to be assigned to investigating defects. It’s an opportunity to see the software from the users’ perspective while solving real problems (and understanding what the software is for). Furthermore, writing unit tests lets you practice using the code, so make time for them too. Both give you a purposeful target when you go into the code, and because the task is contained, it helps you focus.
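Here is what one such baby step can look like, as a minimal sketch. Everything in it is invented for illustration: `legacy_shipping_cost` stands in for a piece of existing code with magic numbers, and the constant names are my guess at what those numbers mean. The test pins down what the old code does today, so the cleaned-up version can be checked against it.

```python
def legacy_shipping_cost(weight_kg: float) -> float:
    # Hypothetical existing code, magic numbers and all.
    if weight_kg > 20:
        return weight_kg * 1.5 + 10
    return weight_kg * 1.5


# Named constants replacing the magic numbers. The names are assumptions
# about the business meaning and should be confirmed with the authors.
RATE_PER_KG = 1.5
HANDLING_FEE = 10
HANDLING_FEE_THRESHOLD_KG = 20


def shipping_cost(weight_kg: float) -> float:
    """Cleaned-up version that should behave exactly like the legacy one."""
    cost = weight_kg * RATE_PER_KG
    if weight_kg > HANDLING_FEE_THRESHOLD_KG:
        cost += HANDLING_FEE
    return cost


def test_refactoring_preserves_behaviour() -> None:
    # Compare old and new on ordinary values and around the threshold.
    for weight in (0, 1, 19.5, 20, 20.5, 100):
        assert shipping_cost(weight) == legacy_shipping_cost(weight)


if __name__ == "__main__":
    test_refactoring_preserves_behaviour()
    print("refactoring preserved behaviour")
```

The test asserts what the code currently does, not what it should do. If the two ever disagree, that is a question for the authors, not something to fix silently.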
To conclude
The goal is to reduce the unknowns, and your best option is to simplify the source to reduce its complexity. By following the ideas above, you should get a grasp of the code and, within a few steps, know where to focus.
Regularly clean the code, always test new functionality, and allow time for refactoring.
P.S. If you want to dive deeper into these and additional concepts, the book “Working Effectively with Legacy Code” by Michael Feathers is highly recommended.