DEV Community

Elvis Mburu

Comprehensive Guide to GitHub for Data Scientists

What is GitHub

GitHub is a code hosting platform for collaboration and version control.
It facilitates social coding by providing a hosting service and a web interface for Git repositories.

What is version control

Version control is the practice of tracking and managing changes to software code.
Version control software keeps track of every modification to the code in a special kind of database.
Git is a version control system.

Git identifies every commit (a recorded set of modifications to the source code) with a unique hash.
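To make the idea of a unique hash concrete, here is a minimal sketch that creates a throwaway repository, records two commits, and prints the hash of each. The file name analysis.py and the identity "Demo User" are hypothetical values chosen only for the demonstration.

```shell
set -e
# Work in a throwaway directory so no real project is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
# Repository-local identity (hypothetical values) so the commits succeed anywhere.
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "x = 1" > analysis.py
git add analysis.py
git commit -q -m "Add analysis script"
first_hash=$(git rev-parse HEAD)

echo "y = 2" >> analysis.py
git commit -q -am "Add second variable"
second_hash=$(git rev-parse HEAD)

# Each commit is identified by its own hash.
echo "first:  $first_hash"
echo "second: $second_hash"
```

The two printed values differ, which is what lets Git refer to each modification unambiguously.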

Version Control Benefits

  • History Tracking
  • Collaborative history tracking

GitHub Terms

Installing Git on Linux (Debian/Ubuntu)

sudo apt install git-all

If you are on another system, follow the Git installation instructions for your platform.

Configure user
Set the name that Git attaches to your commits in all local repositories:

git config --global user.name "[username]"

Set the email address to attach to your commits:

git config --global user.email "[email]"
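If you would rather try these settings without touching your global configuration, the sketch below sets the identity for a single throwaway repository by leaving out --global, then reads the values back. The name and email are placeholders, not real accounts.

```shell
set -e
# Throwaway repository so your real configuration is left alone.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q

# Without --global, the values are written to this repository's .git/config only.
git config user.name "Demo User"
git config user.email "demo@example.com"

# Read the repository-local values back.
configured_name=$(git config --local --get user.name)
configured_email=$(git config --local --get user.email)
echo "$configured_name <$configured_email>"
```

Repository-local values take precedence over global ones, which is handy when one project needs a different email address.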

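Do not store a password in the Git configuration. Git does not read a user.password setting, and the value would sit in plain text. GitHub authenticates Git operations with an SSH key or a personal access token instead. For HTTPS remotes, you can ask Git to cache the credentials you enter for a while:

git config --global credential.helper cache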

Repository

A repository is the location where Git stores a project's files and their version history.
In other words, it is a directory that contains all the files and sub-directories associated with a project, along with the entire revision history of each file.

Branch

A branch is a parallel version of a repository.
Git has traditionally named the default branch master, while new repositories on GitHub name it main.
Any other branch starts as a copy of the default branch at a particular point in time.
Each branch contains changes that differ from the main codebase, i.e. the default branch.
Benefits of using branches

  • Parallel development without disrupting the main codebase
  • Easier collaboration across teams

Commits

A commit is a recorded set of changes to a repository.
Each commit has a message that describes what was changed and why.

Pull requests

Pull requests are instrumental in enabling seamless collaboration.
With a pull request, you propose that your changes be merged into the main branch.
A pull request shows the content differences, with additions highlighted in green and deletions in red.

Pull requests are merged into the main branch by the repository owner or a code reviewer.

GitHub Events

Now that we have a brief overview of what GitHub is all about, let's dive into some of the events:

  • creating and deleting a repository
  • pushing code to a repository
  • creating a branch
  • opening and closing a pull request
  • code reviewing
  • merging
  • opening and closing issues
  • assigning issues

Creating a repository

A repository is often called a repo for short.
There are two ways of creating a repo:

  • from the GitHub user interface
  • from an existing folder

a. Repo from the GitHub user interface
On GitHub, click the green button at the top-left,
then click repo.

b. Repo from a folder
You may want to turn an existing folder on your local machine into a repo.
In the terminal, go to the existing project you want to start tracking.
Then enter the command below to initialize the folder as a repository.
This creates a new repository in the current directory.

git init

Next, stage your changes with the command

git add .

This command adds all changes in the current directory and its sub-directories to the staging area (the temporary storage area in Git where you prepare changes to be committed to the repository).
If you want to commit only selected files, replace git add . with:

git add filename

This adds a file named filename to the staging area. To stage several files, list them:

git add file1 file2
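As a rough illustration of the staging area, the sketch below initializes a throwaway repository, creates three files (all file names are hypothetical), stages two of them by name, and asks Git which files are staged.

```shell
set -e
# Throwaway repository for the demonstration.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q

echo "import pandas" > clean.py
echo "import numpy" > model.py
echo "draft" > notes.txt

# Stage two of the three files by name.
git add clean.py model.py

# --short prints one line per file: "A" marks a staged new file, "??" an untracked one.
status_output=$(git status --short)
echo "$status_output"
```

Only clean.py and model.py would be part of the next commit; notes.txt stays untracked until it is added.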

To commit the changes in the staging area to the repository, we use the git commit command. Example:

git commit -m "first commit in the repository"

The -m option lets you specify a message for the commit.
The message is usually a brief summary of the changes that were made.
Benefits of a good commit message

  • Enhances clarity about what changes were made
  • Acts as a historical record of the changes made
  • Facilitates collaboration among team members
  • Helps in debugging by identifying which changes caused the errors/bugs
  • Serves as documentation for the code changes
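One way to write a fuller commit message from the command line is to pass -m twice: the first value becomes the subject line and the second becomes a separate body paragraph. The sketch below assumes a throwaway repository and a hypothetical file named pipeline.py.

```shell
set -e
# Throwaway repository with a local, hypothetical identity.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "rows = load()" > pipeline.py
git add pipeline.py

# First -m: subject line. Second -m: body paragraph explaining the change.
git commit -q -m "Add data loading step" -m "Reads the raw CSV so later steps can clean it."

# Read the subject (%s) and the body (%b) back from the history.
subject=$(git log -1 --format=%s)
body=$(git log -1 --format=%b)
echo "subject: $subject"
echo "body:    $body"
```

Keeping the subject short and putting the reasoning in the body makes one-line history views easy to scan.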

Now let's rename the current branch to main:

git branch -M main

This command renames the current branch to main. The -M option forces the rename even if a branch named main already exists.
Git's traditional default branch name is master.
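To see the rename happen, the following sketch starts a throwaway repository on a branch named master, renames it, and prints the branch name before and after. Setting the starting name with git symbolic-ref is only a demonstration device, used because the default initial branch name depends on your Git configuration.

```shell
set -e
# Throwaway repository with a local, hypothetical identity.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Force the starting branch name to master, whatever init.defaultBranch says.
git symbolic-ref HEAD refs/heads/master

echo "# project" > README.md
git add README.md
git commit -q -m "first commit in the repository"

before=$(git symbolic-ref --short HEAD)
git branch -M main
after=$(git symbolic-ref --short HEAD)
echo "$before -> $after"
```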

Now let's add a new remote repo named origin to the local Git repo.

git remote add origin git@github.com:username/new_repo

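We now push our changes to the remote repository named origin. The -u option sets origin/main as the upstream of the local main branch, so later git push and git pull commands need no extra arguments.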

git push -u origin main
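The remote and push steps can be rehearsed offline. In the sketch below, a bare repository on disk stands in for the GitHub remote (an assumption made so that the example needs no account or network); the commands are otherwise the same as above.

```shell
set -e
demo_dir=$(mktemp -d)
cd "$demo_dir"

# A bare repository on disk stands in for the GitHub remote.
git init -q --bare new_repo.git

# The local project, with a hypothetical identity.
mkdir project
cd project
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# project" > README.md
git add README.md
git commit -q -m "first commit in the repository"
git branch -M main

# Register the remote under the name origin and push with upstream tracking.
git remote add origin "$demo_dir/new_repo.git"
git push -q -u origin main

# After -u, the local main branch tracks origin/main.
upstream=$(git rev-parse --abbrev-ref 'main@{upstream}')
echo "upstream of main: $upstream"
```

With a real GitHub repository, only the URL passed to git remote add changes.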

If you want to clone a repo from GitHub to your local machine, you can use the command:

git clone url/to/the/repo

This creates a directory with the same name as the repo, containing the project contents and its full history.
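Cloning can also be tried locally. This sketch builds a small bare repository named new_repo.git as a stand-in for a GitHub URL (all names are hypothetical), clones it, and checks what ends up in the new directory.

```shell
set -e
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Build a small source repository to clone from.
git init -q seed
git -C seed config user.name "Demo User"
git -C seed config user.email "demo@example.com"
echo "# project" > seed/README.md
git -C seed add README.md
git -C seed commit -q -m "first commit in the repository"

# Turn it into a bare repository, similar to what a hosting service stores.
git clone -q --bare seed new_repo.git

# Clone it; Git names the new directory after the repository, minus ".git".
git clone -q "$demo_dir/new_repo.git"
ls new_repo

# The clone carries the history as well as the files.
commit_count=$(git -C new_repo rev-list --count HEAD)
echo "commits in clone: $commit_count"
```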

Following GitHub Flow

Create a branch
Create a branch in your repository.
There are two ways of creating a branch in your repository:

  • from the GitHub interface
  • from the terminal

Create a branch from the GitHub interface
Click on the branch dropdown on the left of your screen.

Write the name of your branch.

Then click on Create branch.

Create a branch from the terminal
Check the current branch using the command

git branch

Create a new branch using the command

git branch <branch_name>

Now switch to the new branch using the command

git checkout <branch_name>

or create the branch and switch to it in a single step with

git checkout -b <branch_name>

Now you'll be making changes on the new branch instead of the main/master branch.
To list the branches present in the repo, use

git branch --list

You can now commit and push your changes to the branch.
You can also revert them if you make a mistake.
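The branch commands above can be sketched end to end as follows. The branch name feature-cleaning and the file names are hypothetical; the point is that a commit made on the new branch leaves main untouched.

```shell
set -e
# Throwaway repository with a local, hypothetical identity and one commit on main.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# project" > README.md
git add README.md
git commit -q -m "first commit in the repository"
git branch -M main

# Create the branch and switch to it in one step.
git checkout -q -b feature-cleaning
current=$(git symbolic-ref --short HEAD)

# Commit on the new branch.
echo "drop_na = True" > cleaning.py
git add cleaning.py
git commit -q -m "Add cleaning options"

# Switch back: main still has only the first commit.
git checkout -q main
git branch --list
main_commits=$(git rev-list --count main)
feature_commits=$(git rev-list --count feature-cleaning)
echo "main: $main_commits commit(s), feature-cleaning: $feature_commits commit(s)"
```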

Deleting a branch
To delete a branch, use the command

git branch -d [branch-name]

The -d option only deletes a branch that is fully merged; use -D instead to force the deletion of an unmerged branch.
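Here is a small sketch of how deletion behaves when a branch still holds unmerged work. The branch name experiment is hypothetical; the example records whether git branch -d succeeds before falling back to -D.

```shell
set -e
# Throwaway repository with a local, hypothetical identity and one commit on main.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# project" > README.md
git add README.md
git commit -q -m "first commit in the repository"
git branch -M main

# Make a commit on a side branch that is never merged.
git checkout -q -b experiment
echo "trial" > trial.txt
git add trial.txt
git commit -q -m "Try an idea"
git checkout -q main

# -d refuses because the commit on experiment is not merged into main.
if git branch -d experiment 2>/dev/null; then
  safe_delete="deleted"
else
  safe_delete="refused"
fi

# -D forces the deletion.
git branch -q -D experiment
remaining=$(git branch --list experiment)
echo "git branch -d: $safe_delete"
```

The refusal is a safety net: a forced delete discards the only branch pointing at those commits.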

Create a pull request

Creating a pull request is vital, especially in a collaborative environment.
Some repositories require approval before a pull request can be merged.
When you create a pull request, include a summary of the changes and the problem they solve.

On the GitHub web interface
Navigate to the main page of the repository.
In the branch menu, choose the branch that contains your commits.

Click on New pull request.
You can choose the branch you want to create a pull request for.

If there are no issues, click the green Create pull request button.

The repo owner or a code reviewer will then review the pull request and merge it into the main branch.

Create the pull request using the CLI
To create a pull request from the terminal, we use the GitHub CLI (gh):

gh pr create --assignee "[username]"

The --assignee option takes a GitHub login, or you can use "@me" to assign the pull request to yourself.

Synchronize changes

To synchronize your local repository with the remote repository on GitHub, use the following commands.

git fetch
Downloads all history from the remote-tracking branches.

git merge
Combines a remote-tracking branch into the current local branch.

git push
Uploads all local branch commits to GitHub.

git pull
Updates your current local working branch with all new commits from the corresponding remote branch.
It is a combination of git fetch and git merge.
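To illustrate how fetch and merge combine, the sketch below uses a bare repository on disk in place of GitHub and two clones, here called alice and bob (hypothetical names). Alice pushes a commit; Bob then fetches and merges it in two separate steps, which is what git pull does in one.

```shell
set -e
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Hypothetical helper: give a repository a local commit identity.
set_identity() {
  git -C "$1" config user.name "Demo User"
  git -C "$1" config user.email "demo@example.com"
}

# A bare repository stands in for GitHub; alice and bob are two clones of it.
git init -q seed
set_identity seed
echo "v1" > seed/data.txt
git -C seed add data.txt
git -C seed commit -q -m "Add data"
git -C seed branch -M main
git clone -q --bare seed remote.git
git clone -q remote.git alice
git clone -q remote.git bob
set_identity alice
set_identity bob

# Alice commits and pushes.
echo "v2" >> alice/data.txt
git -C alice commit -q -am "Update data"
git -C alice push -q origin main

# Bob fetches: the commit is downloaded, but his local branch is unchanged.
git -C bob fetch -q origin
behind=$(git -C bob rev-list --count main..origin/main)

# Bob merges the remote-tracking branch into his local branch.
git -C bob merge -q origin/main
echo "commits behind after fetch: $behind"
```

Splitting the two steps lets you inspect incoming commits before they touch your working branch.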

Commit Changes

To list the version history for the current branch use the command

git log

To list the version history for a file, including renames

git log --follow [file]
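The effect of --follow is easiest to see on a renamed file. In this sketch (the file names are hypothetical), a dataset is committed and then renamed; the history is counted once without --follow and once with it.

```shell
set -e
# Throwaway repository with a local, hypothetical identity.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

printf 'id,value\n1,10\n2,20\n' > old_name.csv
git add old_name.csv
git commit -q -m "Add dataset"

# Rename the file and commit the rename.
git mv old_name.csv new_name.csv
git commit -q -m "Rename dataset"

# Without --follow, the history stops at the rename.
without_follow=$(git log --oneline -- new_name.csv | wc -l | tr -d ' ')
# With --follow, the history continues under the old name.
with_follow=$(git log --oneline --follow -- new_name.csv | wc -l | tr -d ' ')
echo "without --follow: $without_follow, with --follow: $with_follow"
```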

To show the content differences between two branches (the changes made on the second branch since it diverged from the first)

git diff [first_branch]...[second_branch]
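The three-dot form differs from comparing the two branch tips directly. The sketch below, using hypothetical branch and file names, lets main and feature diverge and then lists the changed files both ways.

```shell
set -e
# Throwaway repository with a local, hypothetical identity.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "a" > a.txt
git add a.txt
git commit -q -m "Add a.txt"
git branch -M main

# The feature branch adds b.txt.
git checkout -q -b feature
echo "b" > b.txt
git add b.txt
git commit -q -m "Add b.txt"

# Meanwhile, main adds c.txt.
git checkout -q main
echo "c" > c.txt
git add c.txt
git commit -q -m "Add c.txt"

# Three dots: what feature changed since it diverged from main.
three_dot=$(git diff --name-only main...feature)
# Two arguments: every difference between the two branch tips.
two_tips=$(git diff --name-only main feature)
echo "main...feature: $three_dot"
echo "main feature:   $(echo "$two_tips" | tr '\n' ' ')"
```

The three-dot form lists only b.txt, while the tip-to-tip comparison also reports c.txt, which exists on main but not on feature.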

To snapshot a file in preparation for versioning (that is, to stage it)

git add [file]

Redo Commits

To undo all commits after [commit], preserving changes locally

git reset [commit]

To discard all history and changes back to the specified commit (this also throws away uncommitted changes to tracked files, so use it with care)

git reset --hard [commit]
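The difference between the two reset forms can be sketched with a single file. Assuming a throwaway repository and a hypothetical report.txt, the example undoes a commit with a plain reset, checks that the edit survives, then uses --hard and checks that it is gone.

```shell
set -e
# Throwaway repository with a local, hypothetical identity.
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "v1" > report.txt
git add report.txt
git commit -q -m "Report v1"
echo "v2" > report.txt
git commit -q -am "Report v2"

# Plain reset: the v2 commit is undone, but the v2 edit stays in the working tree.
git reset -q HEAD~1
after_reset=$(cat report.txt)

# Hard reset: the working tree is overwritten to match the commit.
git reset -q --hard HEAD
after_hard=$(cat report.txt)

echo "after reset: $after_reset, after reset --hard: $after_hard"
```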
