In this short article, we will explore the static code analysis capability of Semgrep. Semgrep is a fast, open-source static analysis tool that supports most modern languages. It scans your code against a set of rules, and you can customize those rules to fit your requirements. The tool is available as a CLI (OSS) version as well as a SaaS version (Semgrep App), and it integrates easily with CI pipelines. Let's look at integrating Semgrep with GitLab.
Semgrep runs the whole code analysis on the agents (that is, the build machines), so no sensitive data is sent to the cloud. The only requirement is to generate an API token in Semgrep App and pass it to your GitLab pipelines so that the two can talk to each other.
You can create an API token in Semgrep App under Settings.
Once you have generated the token, the next step is to add it to your project's GitLab CI/CD variables under the name SEMGREP_APP_TOKEN.
Let's look at the GitLab pipeline. For illustration purposes, it has been kept minimal and only runs the static code analysis.
    stages:
      - build

    # Semgrep static code analysis
    semgrep:
      stage: build
      # A Docker image with Semgrep installed.
      image: returntocorp/semgrep
      # Run the "semgrep ci" command on the command line of the docker image.
      script: semgrep ci
      rules:
        # Scan changed files in MRs, (diff-aware scanning):
        - if: $CI_MERGE_REQUEST_IID
        # Scan mainline (default) branches and report all findings.
        - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      variables:
        # Connect to Semgrep App through your SEMGREP_APP_TOKEN.
        # Generate a token from Semgrep App > Settings
        # and add it as a variable in your GitLab CI/CD project settings.
        SEMGREP_APP_TOKEN: $SEMGREP_APP_TOKEN
For scanning, I am using an old repository hosted here that consists of Python scripts, Dockerfiles, and Kubernetes manifests. The outcome is quite nice: Semgrep caught a few common development mistakes. The dashboard shows an overall summary of the findings.
Let's zoom in on one of the findings, run-as-non-root. Typically this message is raised when you have allowed your container to run as root or, in other words, you have not specified the user under which the container is executed.
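To make the finding concrete, here is a minimal sketch of a manifest that declares a non-root user, assuming the finding was raised on a Kubernetes manifest. The pod name, image, and user ID are hypothetical placeholders and are not taken from the scanned repository.

```yaml
# Hypothetical Pod manifest: the names, image, and UID are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  containers:
    - name: app
      image: example/app:1.0
      securityContext:
        # Refuse to start the container if it would run as UID 0 (root).
        runAsNonRoot: true
        # Run the container process as this unprivileged user ID.
        runAsUser: 10001
        # Prevent the process from gaining more privileges than its parent.
        allowPrivilegeEscalation: false
```

The runAsNonRoot and runAsUser fields are the ones that state which user the container runs as. Whether this exact form clears the finding depends on how the rule in your Semgrep policy is written.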
There is also a rule editor, which shows the pre-defined rules that your code is checked against. You can explore a rule there to see how it evaluates its conditions.
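For a feel of what a rule looks like, here is a minimal sketch of a custom Semgrep rule for Python code. The rule id, the message, and the choice of eval() as the pattern are illustrative assumptions. This is not one of the pre-defined rules that produced the findings above.

```yaml
# Hypothetical custom rule: the id and message are made up for illustration.
rules:
  - id: example-avoid-eval
    # Languages this rule applies to.
    languages: [python]
    # Severity reported alongside each finding.
    severity: WARNING
    # Text shown to the developer when the pattern matches.
    message: Avoid eval(); it executes arbitrary code if the input is untrusted.
    # "..." is Semgrep's ellipsis operator and matches any arguments.
    pattern: eval(...)
```

Saved to a file (say, a hypothetical rules.yaml), a rule like this can be tried locally with semgrep --config rules.yaml before you add it to your pipeline.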
Semgrep is a good tool, and it integrates with major CI/CD engines such as GitLab, GitHub Actions, Azure DevOps, Jenkins, CircleCI, and Bitbucket Pipelines. It is worth checking whether it suits your organization's requirements. That's all for now. I hope you find the article useful; feedback is always welcome. Cheers.
In case of any queries, please feel free to connect with me via the links below.