Abbey Perini

Getting Started in a New Codebase

Watch the talk version on The Monthly Dev

Whether it's contributing to open source or starting a new job, the first step is familiarizing yourself with the codebase, and it can be daunting. Here are some tips to help you hit the ground running.

1. Don't Panic

Even if everything about the codebase is new to you, the first step is always the same.

Make a local copy of the application on your computer.

2. Try to Get it Running Locally

For once, Yoda is wrong - there is "try" and "do" in this step. If you're unable to get the project running based on the information you have, you can help mitigate the problem now, before the next developer is onboarded. Plus, regardless of the outcome, you'll still move on to steps 3 and 4.

Ideally, there should be a file called README in the root directory of the application. It should have details about the project and instructions on how to run the codebase. If you're using GitHub and visit the repository page, the README's contents are displayed below the list of files.

If there's no README or documentation, try to find the run scripts. Manifest files are a good place to start. For example, in an application using Node.js, npm, or yarn, the manifest file is called package.json. In addition to metadata about the project, the file may have a scripts section.



"scripts": {
  "build": "next build",
  "start": "next build && next start",
  "test": "jest --watchAll --verbose"
}


These scripts are run with your package manager in the terminal like npm run start or yarn run test.
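If you'd rather not open the manifest by hand, a few lines of Node can print every script it defines. This is only a sketch: listScripts is a hypothetical helper name, and the code assumes you run it from the project's root directory. (Running npm run with no arguments prints a similar list.)

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: turn a parsed package.json object into
// "npm run <name>  ->  <command>" lines, one per script.
function listScripts(pkg) {
  const scripts = pkg.scripts || {};
  return Object.entries(scripts).map(
    ([name, command]) => `npm run ${name}  ->  ${command}`
  );
}

// Assumes the current working directory is the project root.
const manifestPath = path.join(process.cwd(), "package.json");
if (fs.existsSync(manifestPath)) {
  const pkg = JSON.parse(fs.readFileSync(manifestPath, "utf8"));
  console.log(listScripts(pkg).join("\n") || "No scripts section found.");
} else {
  console.log("No package.json in this directory.");
}
```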

If the manifest file doesn't exist or doesn't have a run script, try executing the outermost files that are named "index" or have the same name as the directory or project. If you navigate to the directory the file is in, you can execute a file in the terminal like ./index.js. That only works if the file is marked executable and starts with a shebang line; if it isn't, node index.js does the same job.

Ideally, projects like this will also have a help flag. Executing ./index.js --help may print out helpful instructions, including a list of commands you can use.
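For a sense of what sits behind a help flag, here is a minimal sketch of an entry file that answers --help. The command names and descriptions are invented for illustration; a real project defines its own, and many use an argument-parsing library instead of checking process.argv by hand.

```javascript
// Hypothetical command table for an imaginary CLI.
const COMMANDS = {
  serve: "Start the development server",
  migrate: "Apply pending database migrations",
};

// Build the text printed for --help.
function helpText(programName) {
  const rows = Object.entries(COMMANDS).map(
    ([name, description]) => `  ${name.padEnd(10)}${description}`
  );
  return [`Usage: ${programName} <command>`, "", "Commands:", ...rows].join("\n");
}

// Decide what to print from the arguments that follow the script name.
function main(args) {
  if (args.length === 0 || args.includes("--help") || args.includes("-h")) {
    return helpText("./index.js");
  }
  const [command] = args;
  return Object.hasOwn(COMMANDS, command)
    ? `Running ${command}...`
    : `Unknown command "${command}". Try --help.`;
}

console.log(main(process.argv.slice(2)));
```

Saved as index.js, running node index.js --help prints the usage text and the list of commands.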

If you're on a Mac or using Linux, you can also try man <command>. This prints the user manual for the command. If a manual exists, it'll have more information than the --help flag did. If it doesn't, you'll see "No manual entry for <command>". Running man ./index.js will just print out the file itself. If you're on Windows, man isn't built in, so you'll have to use a package, like groff. You may even want to use groff on macOS or Linux - it can render the manual as HTML, so you can read it in the browser instead of in the terminal.

3. Look for Documentation

Developer-focused documentation may exist outside the project. It could be in the GitHub repository wiki, a company-owned docs site, or a Google Doc that's sent to you on your first day. Get access to a deployed environment as soon as you can. You'll learn a lot playing around with a working version with test data.

You can also find value in documentation for users or other teams within the company. More than once, I've learned about a feature or new ways to use a feature from the user manual. These are also less technical, so they are easier to absorb when you're already taking in a lot of information.

escape room concept: You are a software engineer. There is a production issue related to a legacy codebase. No one knows how it works. Various credentials are scattered around the office on post-it notes. There's some printouts of git diffs. You have an hour to fix this.

There are multiple other ways a project is documented that people don't think of as docs. This can be helpful at companies that unfortunately don't prioritize documentation. For example, test names are usually one-sentence descriptions of functionality. They often describe what the app is supposed to do as well as what it's not supposed to do.

Even if no developer on the project has ever written a descriptive commit message, git and GitHub will still have useful documentation. Check out branches to see if there's a branch naming convention. Read old pull requests to get an idea of the code review process. git log and git blame will give you an idea of feature and fix timelines.
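As a rough sketch of pulling a timeline out of git log, the snippet below asks git for one line per commit (short hash, date, author, subject) for a single file and parses it into objects. fileHistory and parseLog are hypothetical helper names, and the code assumes git is installed and that you call fileHistory from inside a repository.

```javascript
const { execFileSync } = require("node:child_process");

// git log placeholders: short hash | author date | author name | subject.
const FORMAT = "%h|%ad|%an|%s";

// Parse text produced by `git log` with FORMAT into objects.
function parseLog(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => {
      // The subject may contain "|" itself, so rejoin everything after the third field.
      const [hash, date, author, ...subject] = line.split("|");
      return { hash, date, author, subject: subject.join("|") };
    });
}

// Hypothetical helper: the history of one file, newest commit first.
function fileHistory(filePath) {
  const output = execFileSync(
    "git",
    ["log", "--date=short", `--pretty=format:${FORMAT}`, "--", filePath],
    { encoding: "utf8" }
  );
  return parseLog(output);
}
```

Calling console.table(fileHistory("package.json")) from the project root would then show when that file changed and who changed it. For line-by-line authorship, git blame <file> in the terminal is the quicker route.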

Whether the team uses GitHub Issues, Jira, or another work tracking system, I highly recommend searching to see if a ticket exists before you complain about something. Tickets will also give you a good idea of team goals, users' complaints, bottlenecks, and processes.

4. Talk to People

Ask developers what they wish they knew when they started. Ask what tools (e.g. browser extensions, editor plugins, state inspectors, etc.) they find useful for this project.

Ask to pair with other developers on their work or as soon as you hit your first roadblock. If the idea of asking for help with your code makes you nervous, check out Virtual Coffee's Guide to Asking Questions About Your Code.

If you hear about a bug fix, ask the developer working on it what they checked first. This is how you find out where the useful logs are and which parts of the app are most flaky.

Schedule meetings with team members who aren't developers. The Product Manager can tell you the big goals for the project. The Product Owner can tell you what's prioritized for the short term. QA can tell you what's flaky, what kinds of bugs are high priority, and if they need something fixed to help them catch more bugs.

If you're working in open source, join any communities related to the project. Listening to the maintainers, other contributors, and community forums can tell you a lot about what part of the app needs more love.

5. Know the Business

Understanding the code is much easier with context. This is especially true when it comes to variable names. It behooves you to understand common industry terms and acronyms.

It may seem like it's outside of your role to understand the industry. However, if you hear about a service a direct competitor just started providing and know how to implement it in your product, that's a big win to bring to your boss. In highly regulated industries, like healthcare, some industry knowledge could help you catch a vulnerability in the product design early. That could save the company a ton of money and looks great when you go to negotiate a raise or interview somewhere else.

At the very least, it'll be easier to understand meetings and predict edge cases.

6. Mental Models

A great way to conceptualize any application is creating some high-level mental models.

I really love drawing flow charts and diagrams for applications. Tools like Miro and Whimsical make it easy and shareable. These can be as simple as a tree of components or files and how they relate to each other. Flow charts following the flow of data are super useful, especially if there are any integrations, microservices, or pub/sub.

You don't just have to use drawings to create these mental models. It's common to write out what each API endpoint does, including request and response structure.
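One possible way to keep those endpoint notes is as plain data you can print or paste into a doc. Everything in this sketch is invented for illustration: the path, the field names, and the status codes are not from any real API.

```javascript
// Hypothetical notes for a single endpoint.
const endpointNotes = [
  {
    method: "POST",
    path: "/api/orders",
    purpose: "Create an order for the signed-in user",
    request: "{ items: [{ sku: string, quantity: number }] }",
    responses: {
      201: "{ orderId: string }",
      400: "{ error: string }",
    },
  },
];

// Render one note as indented plain text.
function describeEndpoint(note) {
  const responseLines = Object.entries(note.responses).map(
    ([status, shape]) => `    ${status} ${shape}`
  );
  return [
    `${note.method} ${note.path}`,
    `  purpose:  ${note.purpose}`,
    `  request:  ${note.request}`,
    "  responses:",
    ...responseLines,
  ].join("\n");
}

console.log(endpointNotes.map(describeEndpoint).join("\n\n"));
```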

There are a few reasons I'm recommending manual methods for creating these diagrams and documentation. You may not have or be able to get permission to feed the code into a code visualization tool or AI. You may not have the requisite knowledge to recognize if the tool spit out some incorrect information. Finally, manually reorganizing all of the information you've learned so far is a different type of learning than reading a summary a tool made. Putting it all into context yourself increases the chances that the information will make it into your long-term memory.

Once you're finished, ask a developer who is more familiar with the project if you missed anything.

7. Break It

elmo, raising his hands triumphantly in front of flames

Run into a long function you're having trouble grasping? Delete a line. See what happens. Repeat.

Unable to follow the data flow? Use breakpoints in a debugger.

Feed the app bad data. I dare you.

Remove some props passed to a component just to see what errors pop up.

Read the logs you've created with your mess.

If you've seen the app break that way before, it's easier to narrow down what part of the app is broken when a bug comes up.
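Feeding bad data can be as small as a loop. In this sketch, parseAge stands in for whatever function you are exploring and probe is a hypothetical helper that records whether each call returned or threw; neither comes from a real project.

```javascript
// Stand-in for a function from the codebase you are exploring.
function parseAge(input) {
  const age = Number(input);
  if (!Number.isInteger(age) || age < 0) {
    throw new TypeError(`Invalid age: ${String(input)}`);
  }
  return age;
}

// Call fn with each input and record whether it returned or threw.
function probe(fn, inputs) {
  return inputs.map((input) => {
    try {
      return { input, ok: true, result: fn(input) };
    } catch (error) {
      return { input, ok: false, error: error.message };
    }
  });
}

const badInputs = ["42", "", null, undefined, -1, "abc", {}, []];
console.table(probe(parseAge, badInputs));
```

In this made-up example, the empty string, null, and an empty array all come back as 0 instead of throwing, which is exactly the kind of surprise you want to find before a bug report does.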

8. Fix It

Even senior developers have to get used to the process in a new repository. Just fixing a typo will allow you to watch the team's process from ticket creation all the way to deployment.

9. Give It Time

Learning the codebase feels URGENT, but you can't learn everything at once. With every PR you learn a little bit more, so give yourself time and grace. A typical expectation is six months to ramp up, so if your code is getting merged in before then, you're doing great.

Conclusion

If I missed your favorite tip for learning a new codebase, tell us about it in the comments!

Top comments (18)

Max F. Findel

Love the tips, thanks Abbey! Every time I have to deal with a new codebase, my first PR is to improve the README file. I read it completely first, then try everything that's on it to get the project up and running. I add to the README file every extra step I had to discover from colleagues (or the internet) and then send my first PR 😄

logarithmicspirals

Number four is super important. I've been fortunate to be in work environments for most of my working life where asking questions is encouraged. Even then, some folks may have negative feelings about reaching out for help. However, sometimes the quickest way to deal with something is to get some support.

As far as number two goes, one key thing is to check the log output. If the app doesn't have logs there's not much you can do other than black box testing to try and understand how it works. Absence of logging compounded with spaghetti code can be extremely difficult to unravel and at that point you're going to have to start taking notes or drawing diagrams. One thing I like to advocate for now is better logging even on personal projects of friends. It makes life way easier for yourself and others if you've got good logging habits.

Abbey Perini

100% - I've been in environments where questions weren't encouraged, and some of my advice from that experience made it into the guide I linked. I've noticed that if you frame it as a technical problem and list all the things you've tried, there's usually at least one dev who wants to solve the puzzle.

It's interesting you say that about logs! I've been in environments where there was no logging, and in others where there was so much logging that it was difficult to find the relevant entries. I guess as a result, I am always taking notes and drawing diagrams. 🤣

Jima Victor

"Feed the app bad data. I dare you." - Favorite line.
I'll be saving this for later

Jovian Hersemeule

Thanks for this article and for featuring the Lion King on the cover image!

Another tool for making diagrams: Draw.io. It's free software, lightweight, available in the browser or as a desktop app, and has many integrations with common tools (Atlassian Confluence or MediaWiki, for example).

If you make a diagram to understand a new codebase, don't miss the opportunity to share it!

jaustinUF

My Aug 23 post is not meant to be hidden ... think maybe it's too long to show here. When I click on the date the post comes up in a new page; don't know if it will if you're not the author.
The post suggests using AI (ChatGPT for me) to help analyze, document, and understand a new codebase. The long post length is due to including ChatGPT's response to a list of steps to follow to deal with the limited 'token' count for each input to ChatGPT.

jaustinUF

I'm surprised no one mentioned the obvious modern tool: AI. Here's a discussion about using ChatGPT to analyze a complex-sounding multi-API codebase; I found it very enlightening:
mentorcruise.com/blog/empowering-c...

Here is ChatGPT's response to my list of suggested steps, recognizing there is a limit to the number of 'tokens' (4096 words, symbols, etc.) that can be input to ChatGPT at one time.
******* ChatGPT response
Your approach is methodical and well-structured, making it effective for understanding and dissecting a new codebase. Let's delve deeper into each point:

  1. Share the context (what the code does):

    • Absolutely. Knowing the high-level purpose of a codebase or a module helps set the stage. It's like knowing the genre of a book before diving into its chapters.
    • Additionally, providing any documentation, READMEs, or architectural diagrams can be beneficial if they exist.
  2. Share and analyze the entry point code (where the code starts when it's run):

    • Understanding the entry point is crucial as it sets the flow of execution. For many applications, especially web applications, there might be multiple entry points. But if we're talking about scripts or standalone programs, identifying the main function or method is the first step.
    • Depending on the complexity, the entry point might itself give you an overview of the program's life cycle, such as initialization, main processing loop, and cleanup/shutdown.
  3. Analyze where the opening code 'goes', and what triggers that action:

    • Tracing the flow from the entry point allows you to map out the main paths of execution. This step might involve understanding function calls, event triggers, or method invocations.
    • For larger codebases, tools like call graphs or code visualization tools can help in this process.
  4. Then analyze deeper and further down the paths of action and triggers:

    • Once you understand the primary paths of execution, you can delve deeper into specific functions, methods, or modules. This step-by-step dissection helps in isolating specific behaviors or functionalities.
    • It's also during this deep dive that you're likely to encounter dependencies, libraries, or other external integrations. Understanding how and where these external components fit in is essential for a holistic grasp of the codebase.

Additional Suggestions:

  • Tests: If the codebase has associated tests (unit tests, integration tests, etc.), analyzing them can give insights into expected behaviors, edge cases, and usage patterns.

  • Comments and Documentation: While code should ideally be self-explanatory, comments and documentation can offer additional clarity. Sometimes, the reason why something was coded in a particular way is more important than the actual implementation, and comments can shed light on that.

  • Refactor if Needed: As you go deeper into the code and understand it, take notes on areas that seem overly complex or confusing. After discussing with stakeholders, consider refactoring these areas to improve clarity and maintainability.

Your approach provides a systematic way to navigate, understand, and possibly improve a codebase. Remember that understanding a new codebase is often an iterative process; as you gain familiarity, you might revisit earlier steps with new insights.

Arth

This is great!!
One more thing we can do is analyze other developers' commits and try to understand their code. (We can also ask them if we get confused.)

Efosa Jeffrey

Some commit messages aren't great at all lol.

Walter Santos

That's great for a beginner like me, thank you. A beginner question: are there codebases that are 'meant' to be run only on cloud services, AWS for example (which I know nothing about)? Or will the company always have a 'local' codebase for testing and such?

Abbey Perini • Edited

In my experience, unless you're working on the application that is run in the cloud, there are usually environments always running for local development and testing (dev/develop and QA).

If there aren't continually running environments, I would be surprised if you were expected to stand up a cloud VM (virtual machine) without any instructions. In addition to costing money, there are typically enough permissions and authorization steps required that setting it up is documented somewhere. If it's not documented, there is usually the expectation that someone will have to help you get set up. After the VM is created, running the application in it is the same as described in step 2.

For something like a Docker container, you would have to learn the basics of the tool, but I'd also expect to be told what packages to install and where to pull the image from. Often, there's a configuration file similar to a manifest file with that information (for Docker it's compose.yaml). There also may be authorization steps and scripts for refreshing the data, just like a migration script for a database.

Walter Santos • Edited

Thank you for a very detailed answer. It helped a lot!

EDIT: my question was hypothetical. Obviously, if I'm new in a company, I wouldn't have lied in the interview and would have been honest about having no experience in cloud computing. So, if I got hired still, they'd know that!

Abbey Perini

😅 Not hypothetically, I was upfront about being fresh out of bootcamp with no cloud experience and still got put in charge of several VMs. One time I basically locked myself out of one and my sr dev had to help me stand it back up. Developers are often expected to learn new tools on the fly, so it was a good question.

Mohit Bansal

Suppose I'm working on a project that uses some technology I don't know, and there's a bug in that part. How can I handle that?
Can you make one specific to open source?

Alex Bender

Here is a really cool book dedicated to that whole topic, titled "Object-Oriented Reengineering Patterns":
oorp.github.io/
