GSoC 20: Week 10

#gsoc #opensource #python

Hello everyone!
I am Niraj, and I will be sharing my code contributions from the tenth week of GSoC.

Background

One of our users mentioned that we weren't showing the file path where each CVE was found in the output report. We were only logging it to the console, so my peer Harmandeep Singh implemented a way to store paths with the CVE data and write them to the output file. Unfortunately, this broke CVEScanner whenever we used the --input-file flag to scan CVEs from a CSV or JSON file.

When I started digging into it, I also found several issues with the existing data structure for all_cve_data:

The old CVEData was a NamedTuple, and since the newly added paths attribute was mutable, it could cause hard-to-find bugs.
Updating paths required scanning all_cve_data to find the product whose paths we wanted to append to. That is an O(n) search per update, or O(n²) in total, which a better structure can reduce to O(n).
Passing vendor, product, and version around separately to different functions hurt readability. A ProductInfo data structure packs this data together, since we never need these fields on their own.
TriageData wasn't in sync with the old CVEData, so csv2cve and input_engine were breaking.
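The first issue above can be illustrated with a small sketch. The class name OldCVEData, its fields, and the sample values are hypothetical placeholders, not the project's actual code. It shows how a NamedTuple holding a list only looks immutable: a copy made with _replace() shares the same list, and the list field also makes the tuple unhashable.

```python
from typing import List, NamedTuple


# Hypothetical stand-in for the old record: a NamedTuple with a mutable list field.
class OldCVEData(NamedTuple):
    vendor: str
    product: str
    version: str
    cves: List[str]
    paths: List[str]


record = OldCVEData("vendor", "product", "1.0", ["CVE-XXXX-0001"], ["/usr/bin/app"])

# _replace() builds a new tuple, but unchanged fields still point at the same list objects.
updated = record._replace(version="1.0.1")
updated.paths.append("/opt/app/bin/app")

# The "immutable" original record has changed too.
print(record.paths)  # ['/usr/bin/app', '/opt/app/bin/app']

# A list field also makes the whole tuple unhashable, so it can't be stored in a set.
try:
    hash(record)
    hashable = True
except TypeError as exc:
    hashable = False
    print(f"TypeError: {exc}")
```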

What did I do this week?

I experimented with various data structures to find one that handles all of the issues above efficiently. Previously, all_cve_data was a Set[CVEData], which was sufficient at the time because all attributes of CVEData were immutable and we only used the set to remove duplicates from the final output.

But once we introduced the paths attribute, we needed to update paths every time we detected the same product in a different file, and a set has no easy way to retrieve a stored value other than looping over the whole set to find it.
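To make the cost concrete, here is a rough sketch of updating paths when all_cve_data is a set of immutable records. The record layout, the use of frozensets, and the add_path helper are assumptions for illustration only.

```python
from typing import FrozenSet, NamedTuple, Set


# Hypothetical immutable record, similar in spirit to the old Set[CVEData] entries.
class OldCVEData(NamedTuple):
    vendor: str
    product: str
    version: str
    cves: FrozenSet[str]
    paths: FrozenSet[str]


def add_path(all_cve_data: Set[OldCVEData], vendor: str, product: str, version: str, path: str) -> None:
    key = (vendor, product, version)
    # A set can't return a stored element by key, so find the match with a linear scan: O(n).
    match = next(
        (entry for entry in all_cve_data if (entry.vendor, entry.product, entry.version) == key),
        None,
    )
    if match is not None:
        # The record is immutable, so replace it with an updated copy.
        all_cve_data.remove(match)
        all_cve_data.add(match._replace(paths=match.paths | {path}))


data = {OldCVEData("vendor", "product", "1.0", frozenset({"CVE-XXXX-0001"}), frozenset({"/usr/bin/app"}))}
add_path(data, "vendor", "product", "1.0", "/opt/app/bin/app")
(entry,) = data
print(sorted(entry.paths))  # ['/opt/app/bin/app', '/usr/bin/app']
```

Each call scans the set once, so repeating it for every detected product is what pushes the total work to O(n²).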

Note: a set can only tell you whether an object is in the container; it has no way to retrieve the actual stored object. I implemented a MutableSet data structure that provides this through __getitem__, but I didn't want to rely on a custom data structure as long as a standard one was feasible.
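For readers curious what such a container might look like, below is one possible sketch built on collections.abc.MutableSet. It is not the implementation mentioned above; the class names RetrievableSet and Product are hypothetical. The idea is to back the set with a dict that maps each element to itself, so an equal "probe" object can fetch the stored instance.

```python
from collections.abc import MutableSet
from typing import Dict, Hashable, Iterable, Iterator


class RetrievableSet(MutableSet):
    """Set-like container that can also return the stored element equal to a key."""

    def __init__(self, items: Iterable[Hashable] = ()) -> None:
        # Map each element to itself so the stored instance can be looked up.
        self._items: Dict[Hashable, Hashable] = {}
        for item in items:
            self.add(item)

    def __contains__(self, item: object) -> bool:
        return item in self._items

    def __iter__(self) -> Iterator[Hashable]:
        return iter(self._items)

    def __len__(self) -> int:
        return len(self._items)

    def add(self, item: Hashable) -> None:
        # Like the built-in set, keep the first stored instance of equal elements.
        self._items.setdefault(item, item)

    def discard(self, item: Hashable) -> None:
        self._items.pop(item, None)

    def __getitem__(self, item: Hashable) -> Hashable:
        # Average O(1): return the stored object that compares equal to `item`.
        return self._items[item]


# Hypothetical element type: equality and hashing use only the name, while paths stay mutable.
class Product:
    def __init__(self, name: str, paths: list) -> None:
        self.name = name
        self.paths = paths

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Product):
            return NotImplemented
        return self.name == other.name

    def __hash__(self) -> int:
        return hash(self.name)


products = RetrievableSet([Product("app", ["/usr/bin/app"])])
stored = products[Product("app", [])]  # look up by an equal "probe" object
stored.paths.append("/opt/app/bin/app")
print(next(iter(products)).paths)  # ['/usr/bin/app', '/opt/app/bin/app']
```

The cost is an extra class to maintain, which is why a plain dict keyed by the immutable part of the record is the simpler choice when it fits.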

So, I refactored CVEData into two parts: 1) an immutable ProductInfo(vendor, product, version) and 2) a mutable CVEData(list_of_cves, paths_of_cves). all_cve_data now stores a mapping from ProductInfo to CVEData (Dict[ProductInfo, CVEData]), so we can access a product's CVEData without traversing all of all_cve_data.
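A minimal sketch of this layout follows. ProductInfo's fields come from the description above; the CVEData field names (cves, paths), the use of sets for them, and the record_detection helper are assumptions for illustration, not the project's actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, NamedTuple, Set


class ProductInfo(NamedTuple):
    """Immutable and hashable, so it can serve as a dict key."""

    vendor: str
    product: str
    version: str


@dataclass
class CVEData:
    """Mutable value that is updated in place as more files are scanned."""

    cves: Set[str] = field(default_factory=set)
    paths: Set[str] = field(default_factory=set)


all_cve_data: Dict[ProductInfo, CVEData] = {}


def record_detection(product_info: ProductInfo, cves: Set[str], path: str) -> None:
    # Hypothetical helper: average O(1) lookup by key instead of scanning every entry.
    data = all_cve_data.setdefault(product_info, CVEData())
    data.cves.update(cves)
    data.paths.add(path)


info = ProductInfo("vendor", "product", "1.0")
record_detection(info, {"CVE-XXXX-0001"}, "/usr/bin/app")
record_detection(info, {"CVE-XXXX-0001"}, "/opt/app/bin/app")
print(len(all_cve_data))  # 1
print(sorted(all_cve_data[info].paths))  # ['/opt/app/bin/app', '/usr/bin/app']
```

Because only ProductInfo is hashed, CVEData can change freely without breaking the dict, and n detections cost O(n) overall.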

I moved all data structures into utils to avoid circular imports, and I also added tests for paths.

What am I doing this week?

I will continue improving the documentation of the code I wrote by adding docstrings and comments. I am also going to add the requested how-to guides to improve the user experience.

DEV Community
