How I learned Threat Intel by contributing to an open-source project

#security #cybersecurity

I became a developer 3 years ago in order to code my way up to IT security. I wasn't really sure where to go (who is, really?) but I knew I wanted to approach the field from the technical side.

Lately, the stars have aligned and there are many more opportunities for me to grow in that field, both at work and on personal projects.

So let's talk about one of those personal projects!

Hunting for bad code

(Cyber) Threat Intelligence is, in short, information about threats. For example: which group is using which virus, which domains or IP addresses are interesting to study, or how you study a specific target (I wrote an article on analyzing a spam email a couple of weeks ago).

The main goal of any investigation is to gather as much data as possible on your target, and there are literally thousands of websites and services online that provide general or specialized information.

For instance, VirusTotal is a website that gives you information on malware samples or on domains found in malware. It relies on user-submitted samples, which are run through a bunch of antivirus engines, and it gives you the results. Check this guide from VirusTotal on how to hunt malware.

Malware is malicious software: it can be a virus, a trojan, or anything else. As there are many different kinds besides viruses, the term malware is preferred.

There are around a hundred more services that give you different information, such as URLhaus, AlienVault or GreyNoise. Each of these services covers a specific need or bundles the data in a different way.

Well, you can already see the problem: running through all those websites is cumbersome and slow. So a friend of mine wrote an open-source program that does it for you: Harpoon.
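To make the idea concrete, here is a minimal Python sketch of what "doing it for you" can look like: one function hands the same indicator to several sources and collects the answers, without stopping when one source is down. This is not Harpoon's actual code. The source names and the lookup functions are hypothetical stand-ins that return canned data instead of calling a real API.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for real services. A real plugin would call the
# service's API; these return canned data so the sketch runs offline.
def lookup_source_a(indicator: str) -> dict:
    return {"known": indicator == "bad.website.com"}


def lookup_source_b(indicator: str) -> dict:
    # Simulates a service that is down at the moment of the query.
    raise TimeoutError("service unavailable")


SOURCES: Dict[str, Callable[[str], dict]] = {
    "source_a": lookup_source_a,
    "source_b": lookup_source_b,
}


def query_all(indicator: str, sources: Dict[str, Callable[[str], dict]] = SOURCES) -> dict:
    """Ask every source about one indicator and keep going if one fails."""
    results = {}
    for name, lookup in sources.items():
        try:
            results[name] = {"ok": True, "data": lookup(indicator)}
        except Exception as exc:  # one broken service should not stop the rest
            results[name] = {"ok": False, "error": str(exc)}
    return results


if __name__ == "__main__":
    for name, result in query_all("bad.website.com").items():
        print(name, result)
```

Keeping the sources in a plain dictionary means that adding a new service is just one more function and one more entry, which is roughly the spirit of writing a plugin.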

The main problem I have while doing research is knowing where to look. So jumping into this project to write a couple of plugins was a great opportunity for me to learn more about those websites and how they are used.

I looked at the issues documented on GitHub, grabbed one and started working. When something started to look good, I submitted my pull request. It was not the first time I contributed to an open-source project but I have to admit, the first pull request is always a bit stressful!

Going back to the investigation, the search results taught me how to investigate a domain or an IP. I already knew some things, such as WHOIS records or passive DNS, but others were more obscure.

Moreover, the question is not really how much data you can get, but how relevant the data is. Outdated or low-quality data, even in large amounts, won't be of any use, as the need here is to pull a needle out of a haystack.
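The relevance point can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (`source`, `last_seen`, `confidence`) and the thresholds (90 days, a score of 50) are made up for the example and do not come from any specific service.

```python
from datetime import datetime, timedelta, timezone


def keep_relevant(records, now, max_age_days=90, min_confidence=50):
    """Keep only recent, reasonably trusted records, newest first.

    Each record is a dict with hypothetical fields:
    'source' (str), 'last_seen' (datetime) and 'confidence' (0-100).
    """
    cutoff = now - timedelta(days=max_age_days)
    fresh = [
        r for r in records
        if r["last_seen"] >= cutoff and r["confidence"] >= min_confidence
    ]
    return sorted(fresh, key=lambda r: r["last_seen"], reverse=True)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    records = [
        {"source": "old_report", "last_seen": now - timedelta(days=400), "confidence": 90},
        {"source": "noisy_feed", "last_seen": now - timedelta(days=10), "confidence": 20},
        {"source": "recent_hit", "last_seen": now - timedelta(days=2), "confidence": 60},
    ]
    for record in keep_relevant(records, now):
        print(record["source"], record["last_seen"].date())
```

With the sample data above, only the recent and reasonably trusted record survives the filter; the old report and the low-confidence feed are dropped.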

Interestingly, the starting questions are always the same: what, when, by whom. What kind of domain is it: website, file server, application, database? When was it registered, and has the content of the website changed recently? Who registered the domain or the IP? Is it an organization? Is it part of the ASN of a company?

Then, the main question is: has the target already been detected? This is when Threat Intelligence comes into play. At this point, you want to know if you are looking at something that has already been studied or detected by other companies.

To check that with Harpoon, you only need to run:

harpoon intel domain bad.website.com

Or:

harpoon intel ip 123.123.123.123

Then you can start investigating the websites or run more specific queries. For example, to query a domain on VirusTotal:

harpoon vt domain bad.website.com

This automates part of your work and saves you a lot of precious time!

This should cover most cases. Looking at new data and new actors is incredibly difficult, and I am not yet at that level!

Some closing thoughts

It is interesting that investigating is a bit like tracking down a bug in code (something that I know way too well!). It is really a question of experience, not with coding, but with googling the problem and looking up answers on Stack Overflow!

Being able to filter out noisy data on the Internet seems to be an essential skill for everyone!

Did you enjoy this post? I'd love to hear your thoughts! Bear in mind that I am quite a novice.