DEV Community

Niraj Kamdar

CVE Binary Tool: GSoC Final Report

Google Summer of Code 2020 is finally coming to an end, and what an exciting experience it has been! In this post, I’ll be showing off the fruits of my labor.

Last week I created an example GitHub Actions workflow, which marks the last major milestone of my GSoC project, Improve concurrency and input functionality. I'm very glad that I was able to finish all the major milestones on time. Below I'll give a quick summary of my journey.

About the Project

Before going into the details, I would like to give a brief introduction to the project for the uninitiated. You can skip this section if you are already familiar with it.

My project was to improve the concurrency and input functionality of CVE Binary Tool. CVE Binary Tool scans for several common, vulnerable open source components such as OpenSSL, libpng, and libxml2 to let you know whether a given directory or binary file includes common libraries with known vulnerabilities, known as CVEs (Common Vulnerabilities and Exposures). It is written in Python, my favorite language, and is available as a Python package on PyPI. The source code is in the GitHub repository, and the project is maintained by Terri Oda and John Andersen, who were also my mentors for this project.

When I started working on the project, CVE Binary Tool had some bugs on Windows, and since I primarily use Windows, I fixed all of them during the community bonding period. During that period I also wrote a faster, native Python replacement for the c-strings extension module and refactored the whole checkers module to use an object-oriented design, which reduced code repetition. Since we removed support for Python 2 in the last release, I also optimized the legacy Python 2 code for Python 3.
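The post doesn't show the replacement for the c-strings extension, so here is only a rough illustration of the technique. It is a minimal sketch of a pure-Python `strings`-style extractor, assuming (not taken from the tool's code) that the goal is to pull runs of printable ASCII characters out of a binary, as the Unix `strings` utility does. The names `extract_strings`, `extract_strings_from_file`, and `MIN_LENGTH` are hypothetical.

```python
import re
from typing import Iterator, List

# Hypothetical minimum run length; 4 matches the Unix `strings` default.
MIN_LENGTH = 4
# Runs of printable ASCII bytes (space through tilde) of at least MIN_LENGTH.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LENGTH)


def extract_strings(data: bytes) -> Iterator[str]:
    """Yield each run of printable ASCII characters found in a binary blob."""
    for match in PRINTABLE_RUN.finditer(data):
        yield match.group().decode("ascii")


def extract_strings_from_file(path: str) -> List[str]:
    """Read a file in binary mode and return its printable strings."""
    with open(path, "rb") as f:
        return list(extract_strings(f.read()))
```

A checker could then search the returned strings for library names and version patterns. Because the scanning loop runs inside the `re` engine rather than in a per-byte Python loop, this style of implementation can stay reasonably fast without a compiled extension.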

The First Milestone: Improve Test Suite

I wanted to improve the CVE Binary Tool test suite because the CI runtime was too long, and a missing initialization step caused tests to fail when we ran them locally.

I removed the compiler dependencies from the test suite in favor of a native Python solution. I also fixed the tricky pytest initialization bug by researching and understanding its origin. I then gradually improved the tests for every module by migrating them from the standard unittest framework to pytest and using pytest-xdist to reduce runtime (e.g., parallelizing test_json).

The Second Milestone: Improve Concurrency

Previously, CVE Binary Tool used multiprocessing to download the NVD data feeds. Since downloading is an IO-bound task, where the workers spend most of their time waiting on the network rather than computing, I suggested that asyncio would be the better choice for this use case.

I refactored almost all IO-bound modules to use asyncio and developed many async utilities for various modules. The following are all the pull requests related to my work on improving the concurrency of the tool.

The main performance gain was a 50% reduction in download time after switching to asyncio.
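To illustrate why asyncio suits IO-bound work like downloading data feeds, here is a minimal sketch of bounded concurrent downloads. It is not the tool's actual downloader: `fetch_feed` is a hypothetical stand-in that uses `asyncio.sleep` to simulate network latency (a real version would use an async HTTP client), and the feed names and concurrency limit are illustrative assumptions.

```python
import asyncio
from typing import Dict, List


async def fetch_feed(name: str, delay: float) -> str:
    """Hypothetical download: sleeping simulates waiting on the network."""
    await asyncio.sleep(delay)
    return f"contents of {name}"


async def download_all(feeds: Dict[str, float], max_concurrency: int = 4) -> List[str]:
    """Download all feeds concurrently, with at most max_concurrency in flight."""
    # Created inside the running event loop, so it binds to the right loop.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(name: str, delay: float) -> str:
        async with semaphore:
            return await fetch_feed(name, delay)

    # gather runs the coroutines concurrently on a single thread and returns
    # the results in the same order as the inputs.
    return list(await asyncio.gather(*(bounded_fetch(n, d) for n, d in feeds.items())))


if __name__ == "__main__":
    # Illustrative feed names, one per year.
    feeds = {f"nvdcve-1.1-{year}.json.gz": 0.1 for year in range(2002, 2021)}
    results = asyncio.run(download_all(feeds))
    print(len(results), "feeds downloaded")
```

The waits overlap instead of running one after another, all on one thread. That is the advantage over multiprocessing for this kind of workload: the time is spent waiting on the network, so there is little to gain from paying for extra processes. The semaphore keeps the number of simultaneous requests polite to the server.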

The Third Milestone: Improve Input Functionality

CVE Binary Tool had a command named csv2cve to get CVEs from CSV-formatted input. My goal was to replace it with a generic InputEngine module that can support various input types. Currently, it supports CSV and JSON as input formats. I also added support for specifying triage data for CVEs, such as remarks (e.g., Mitigated, Ignored), comments, and custom severity.
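As an illustration of the idea behind a generic input engine, the sketch below parses either CSV or JSON into one common structure, so the rest of the code doesn't need to know which format the user supplied. The field names (`vendor`, `product`, `version`, `cve_number`, `remarks`, `comments`, `severity`) and the function `parse_input` are assumptions for this example, not InputEngine's actual schema or API.

```python
import csv
import io
import json
from typing import Dict, List, Tuple

# (vendor, product, version) identifies a product; all field names are hypothetical.
ProductKey = Tuple[str, str, str]
TRIAGE_FIELDS = ("cve_number", "remarks", "comments", "severity")


def _group_by_product(rows: List[Dict[str, str]]) -> Dict[ProductKey, List[Dict[str, str]]]:
    """Group triage entries under the product they refer to."""
    grouped: Dict[ProductKey, List[Dict[str, str]]] = {}
    for row in rows:
        key = (row["vendor"], row["product"], row["version"])
        # Missing (or empty) triage columns default to empty strings.
        triage = {field: row.get(field) or "" for field in TRIAGE_FIELDS}
        grouped.setdefault(key, []).append(triage)
    return grouped


def parse_input(text: str, fmt: str) -> Dict[ProductKey, List[Dict[str, str]]]:
    """Parse CSV or JSON text into one format-independent structure."""
    if fmt == "csv":
        rows = list(csv.DictReader(io.StringIO(text)))
    elif fmt == "json":
        rows = json.loads(text)
    else:
        raise ValueError(f"unsupported input format: {fmt!r}")
    return _group_by_product(rows)
```

With this design, supporting another format only means adding a branch that produces the same list of row dictionaries; the grouping and triage handling stay unchanged.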

Since CVE Binary Tool has several command-line arguments, I added support for config files so that users don't have to repeat these arguments every time they want to scan a directory. Config files are also helpful in automated environments like CI/CD pipelines. Below is a list of my pull requests related to this milestone.

The Final Milestone: Improve User Experience

We were lucky to have Anthony, a user who was willing to attend our weekly meetings. He suggested many features and documentation improvements that would help him and other users. I contributed how-to guides (essentially a cookbook of recipes) for different use cases.

I also created an error handler module that provides beautiful tracebacks using the rich library and sets a custom exit code for each exception, so that users can tell why the program terminated when it runs in quiet mode or from an automation script.
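Here is a minimal sketch of the exit-code idea. The exception classes and code numbers are hypothetical, since the real module's exceptions and codes aren't listed in this post. It uses a context manager that maps each exception type to an exit code. The post describes rendering tracebacks with rich; this sketch instead prints a plain one-line message to stay dependency-free.

```python
import sys
from contextlib import contextmanager
from typing import Dict, Iterator, Type


# Hypothetical exception types, for illustration only.
class ConfigError(Exception):
    """Raised when the configuration file is invalid."""


class InvalidInputError(Exception):
    """Raised when the user-supplied input file cannot be parsed."""


# Hypothetical mapping from exception type to process exit code.
EXIT_CODES: Dict[Type[Exception], int] = {
    ConfigError: 3,
    InvalidInputError: 4,
}
UNKNOWN_ERROR_CODE = 1


@contextmanager
def error_handler(quiet: bool = False) -> Iterator[None]:
    """Turn known exceptions into distinct process exit codes."""
    try:
        yield
    except Exception as exc:
        # Use the first matching exception type; fall back to a generic code.
        code = next(
            (c for exc_type, c in EXIT_CODES.items() if isinstance(exc, exc_type)),
            UNKNOWN_ERROR_CODE,
        )
        if not quiet:
            # A real handler might render a rich traceback here instead.
            print(f"{type(exc).__name__}: {exc}", file=sys.stderr)
        # sys.exit raises SystemExit, so the exit code survives even when
        # nothing is printed.
        sys.exit(code)
```

A script wrapping the scan could then check `$?` (or `subprocess.run(...).returncode`) and react differently to bad input versus a broken config, even when quiet mode suppresses all output.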

Here is the list of all my pull requests related to this goal.

Future Work

My original proposal was huge, and due to limited time, I had to prioritize my goals according to user requirements. I have fully completed my core goals of improving input functionality and user experience, but I still have many optimizations in mind for the concurrency aspect of the project, so I am going to continue working on the performance of CVE Binary Tool. I also want to optimize the runtime of the long tests: running all tests currently takes 30 minutes, and I believe we can reduce that significantly by caching and/or reducing the test data that needs to be downloaded. Finally, I want to create a ready-to-use GitHub Action for the tool so that developers can easily integrate it into their CI/CD pipelines.

What I gained from GSoC

The experience I gained from working on CVE Binary Tool with Terri Oda and John Andersen is immeasurable. Both my mentors are kind, helpful, and extremely talented. They gave me a lot of advice and suggestions, ranging from the low-level workings of the language to best practices for writing good code.

Terri Oda helped me improve the quality of my pull requests (PRs) by suggesting articles about best practices. She also helped me understand which tasks should be prioritized. John Andersen helped me with a lot of Python topics in general, such as metaclasses, asyncio, and context managers.

Before GSoC, I only knew basic git commands but now I can confidently say that I have mastered git. I also have a much better understanding of how programming languages work, when and how to choose third-party libraries, how to refer to documentation efficiently, how to write clean, readable and maintainable code, how to document and test code effectively, how to properly structure pull-requests and most importantly how practical programming works, which I wouldn’t have known otherwise.

Open source has now become my hobby. Being part of an open source community taught me that, apart from making code contributions, an open source contributor also gets to triage and resolve user issues, review other contributors' pull requests, and guide and motivate new contributors. I wouldn't get this kind of experience from personal projects alone.

All these lessons are invaluable to me, and I strongly believe they will be helpful throughout my career.
