sumanbasuli for Tropyl

Posted on • Originally published at tropyl.com

Information Gathering with "theHarvester"

This is the first tool in the Red Team Tools series, and it is the one I will be talking about today.

In this post, you will learn how to gather both technical and helpful information about your target using theHarvester. In a red team engagement, one of the most important steps of the whole project is gathering information about your client, because the information you collect here will be really useful during the exploitation phase of your engagement.

For example, if you do a really good information gathering job, you may find not‑so‑secure servers open to the internet, which can be way easier to exploit than the main company website.

So, in this article, we'll cover theHarvester, which will help you to automate the information gathering so you can find tons of interesting information about your target.

We start by showing how to use theHarvester to find subdomains and IP addresses that could be interesting for an exploitation phase. Then we also cover how to use the tool to find information about the people that work in the company, such as email addresses, Twitter accounts, and even LinkedIn profiles.

Important Note:

All the targets used in this post are from publicly available bug bounty platforms like HackerOne, Bugcrowd, and Intigriti, so they are all open for performing recon or running theHarvester. Before proceeding, make sure you have the legal right or written consent to do recon or run theHarvester on your target.

Understanding Information Gathering and theHarvester:

Imagine you are bug hunting against WordPress and Intigriti. Your first step is getting information about your target.

So, did you know that you can gather tons of information about your target just by doing some searches on the internet? Yes, that's right. Without sending a single packet to our target we can gather tons of valuable information.

For example, we can use search engines such as Google, Yahoo, and Bing to get information about WordPress and Intigriti. In just a few seconds we can do a few searches and find subdomains that we could attack later.

Sometimes the main website of a company is secure, but its other systems, such as the email server or the HR applications, may not be.

Also, we can use the same search engines to find email addresses related to this company. As you may know, Google indexes data from several websites so you can find email addresses even if they're posted on a different website.
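To make the email idea concrete, here is a minimal sketch of pulling addresses for one domain out of a blob of text, such as a saved search results page. This is not how theHarvester itself is implemented; the function name `extract_emails` and the sample text are made up for illustration, and the regular expression is deliberately simple.

```python
import re


def extract_emails(text: str, domain: str) -> list[str]:
    """Return unique addresses at `domain` (or its subdomains) found in `text`."""
    pattern = re.compile(
        r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)*"
        + re.escape(domain)
        # Stop the match from being a prefix of a longer host name.
        + r"(?![A-Za-z0-9-]|\.[A-Za-z0-9])",
        re.IGNORECASE,
    )
    # Lower-case and de-duplicate while keeping first-seen order.
    return list(dict.fromkeys(m.lower() for m in pattern.findall(text)))


# Hypothetical sample text, not real search output.
sample = "Contact Jane.Doe@example.com or press@mail.example.com; not bob@other.org."
print(extract_emails(sample, "example.com"))
```

Addresses on other domains are ignored, which keeps the list focused on the company you are actually looking at.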

And more than that, we can use those search engines to find sensitive files that may be exposed to the public. For example, you may find metric diagrams or even database backups that may be useful later in the red team engagement.
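The searches described above can be written down as query strings before you paste them into a search engine. The sketch below only builds the strings; it does not send any request. It assumes the `site:` and `filetype:` operators, which Google supports, and the helper names `build_dorks` and `to_search_url` are hypothetical.

```python
from urllib.parse import quote_plus


def build_dorks(domain: str, filetypes: list[str]) -> list[str]:
    """Build search queries for subdomains, email mentions, and exposed files."""
    queries = [
        f"site:{domain} -site:www.{domain}",  # pages on hosts other than www
        f'"@{domain}"',                       # pages mentioning an address at the domain
    ]
    # One query per file extension we are curious about.
    queries += [f"site:{domain} filetype:{ext}" for ext in filetypes]
    return queries


def to_search_url(query: str) -> str:
    """Turn a query into a Google search URL you can open in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


for query in build_dorks("example.com", ["sql", "pdf"]):
    print(query, "->", to_search_url(query))
```

The list of file types is up to you; `sql` and `pdf` are only examples.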

And also, we can use sites such as LinkedIn to find more information about the employees of a company. In a few minutes, we can quickly find the names of key people in the company, which can be useful for social engineering attacks.

And the best part about this is that we can search as much as we want, because searching for data that is already publicly exposed on the internet is generally not illegal. Also, because these searches go to the search engines and not to the target's own servers, your target will not see that you're searching for their information.

Installing the tool:

Before proceeding, make sure you have a working Linux environment. Kali Linux, Ubuntu, or any other Linux distro running on VMware, VirtualBox, or Hyper-V will work, and so will WSL 1/2. For this post, we will use Kali Linux as our Linux distro.

If at any point you feel that you are not able to follow the article, the asciinema recording above will help.

The easiest way to install theHarvester in 2020 is to paste this command into your terminal, but unfortunately it doesn't work every time or on every system. (If you are using Kali Linux, it is preinstalled and you can skip this step.)

sudo apt-get install theharvester

If this command didn't work for you, then there is another way, a long way... let's see:

First, make sure you have git installed, which nowadays comes installed by default on major Linux distros. If it is not installed, you can follow this article from Linuxize.

Next, make sure all your packages are on the latest version:

sudo apt update && sudo apt upgrade -y

After performing all the updates and upgrades, move into a directory such as tools, or whatever you prefer. I will be using tools:

mkdir tools
cd tools

Once you are in the directory, we will clone the git repo of theHarvester:

git clone https://github.com/laramies/theHarvester

Once the repo is cloned, we can get into the folder:

cd theHarvester

Now we need to install pip, assuming that you already have python3 installed:

sudo apt install python3-pip

Now we need to install the packages that theHarvester requires:

python3 -m pip install -r requirements/base.txt

Once the required packages are installed, we are ready to start our tool:

python3 theHarvester.py

If the tool doesn't start for you, see the asciinema recording above. It gives the full walkthrough of installing the tool, and you can also copy the commands from there.
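If the install steps fail partway, it helps to know which prerequisite is missing. The sketch below checks whether the commands used above are on your PATH. It is a small convenience, not part of theHarvester, and the helper name `missing_tools` is hypothetical.

```python
import shutil


def missing_tools(tools: list[str]) -> list[str]:
    """Return the subset of `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# The commands the manual install steps rely on.
required = ["git", "python3", "pip3"]
missing = missing_tools(required)
if missing:
    print("Install these first:", ", ".join(missing))
else:
    print("All prerequisites found.")
```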

DNS, Subdomains, and IP Addresses:

Now that our tool is installed properly, we can get into actually using it.

So, if you want to follow along, you can start theHarvester now. For this section, our target will be the wordpress.org domain. We will gather emails, subdomains, and IP addresses for it.

To do so we will run this command:

python3 theHarvester.py \
    -d wordpress.org \
    -l 500 \
    -b google,bing,yahoo,duckduckgo

After the CLI has automatically grabbed the email addresses and subdomains available on the internet, you will see a full list, something like this:

[*] Target: wordpress.org

        Searching 0 results.
[*] Searching Bing.
        Searching 0 results.
[*] Searching Duckduckgo.
        Searching 100 results.
        Searching 200 results.
        Searching 300 results.
        Searching 400 results.
        Searching 500 results.
[*] Searching Google.

[*] No IPs found.

[*] No emails found.

[*] Hosts found: 227
---------------------
af.wordpress.org:198.143.164.252
am.wordpress.org:198.143.164.252
...
...
zh-sg.wordpress.org:198.143.164.252
zul.wordpress.org:198.143.164.252

Here we are using -d, -l, -b flags. By the way, do you know what these mean?

The -d flag sets the domain to search. The official docs also say that you can enter a company name instead of the domain name, but entering the domain name works great.

The -l flag limits the number of results to work with (Bing steps through results 50 at a time, Google 100 at a time, and PGP doesn't use this option).

The -b flag sets the data source, that is, where to query the data from. In our case we used Google, Bing, Yahoo, and DuckDuckGo. You might use Baidu, LinkedIn, or maybe Twitter (we will use these two in the next section).
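If you prefer to assemble the command from a script, the three flags map onto a small helper like the one below. This is a sketch, assuming you run it from inside the cloned theHarvester folder; `build_command` and `pages_needed` are hypothetical names. The page count assumes the step sizes quoted above (50 for Bing, 100 for Google) mean one request per step, which is my reading and not something the docs spell out.

```python
import math
import shlex


def build_command(domain: str, limit: int, sources: list[str]) -> list[str]:
    """Build the argument list for a theHarvester run using -d, -l, and -b."""
    # Strip stray whitespace: a space inside the source list would split it
    # into two separate shell arguments.
    cleaned = [s.strip() for s in sources if s.strip()]
    if not cleaned:
        raise ValueError("at least one data source is required")
    return ["python3", "theHarvester.py",
            "-d", domain, "-l", str(limit), "-b", ",".join(cleaned)]


def pages_needed(limit: int, page_size: int) -> int:
    """Number of result pages needed to reach `limit` (assumed paging model)."""
    return math.ceil(limit / page_size)


cmd = build_command("wordpress.org", 500, ["google", "bing", " yahoo", "duckduckgo"])
print(shlex.join(cmd))
print("Bing pages:", pages_needed(500, 50), "Google pages:", pages_needed(500, 100))
```

To actually run it, you could pass `cmd` to `subprocess.run(cmd, check=True)`.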

One more thing you may be asking about is the emails. Unfortunately, the domain we used didn't have any emails exposed in search engines. But in the next section, you will see that the domain we use there has 2 emails.
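The host list in the sample output uses a host:ip format, which is easy to post-process. The sketch below groups hosts by IP address so you can see which names share a server. It assumes you have copied the output lines into a list; `parse_hosts` is a hypothetical helper and not part of the tool.

```python
import ipaddress
from collections import defaultdict


def parse_hosts(lines: list[str]) -> dict[str, list[str]]:
    """Group 'host:ip' lines by IP, skipping banners and separator lines."""
    by_ip: dict[str, list[str]] = defaultdict(list)
    for line in lines:
        host, sep, ip = line.strip().partition(":")
        if not sep or not host or " " in host:
            continue  # not a host:ip pair (for example "..." or a status line)
        try:
            ipaddress.ip_address(ip)  # accepts IPv4 and IPv6
        except ValueError:
            continue
        by_ip[ip].append(host)
    return dict(by_ip)


# A few lines taken from the sample output above.
sample = [
    "[*] Hosts found: 227",
    "---------------------",
    "af.wordpress.org:198.143.164.252",
    "am.wordpress.org:198.143.164.252",
    "...",
    "zul.wordpress.org:198.143.164.252",
]
for ip, hosts in parse_hosts(sample).items():
    print(ip, len(hosts), hosts)
```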

Email Addresses, Linkedin Profiles, and Twitter handles:

In the previous section, we looked at a very basic feature of theHarvester, which listed the available domains, emails, and IP addresses from specific data sources.

Previously we used search engines as our data source. Now we will be using LinkedIn and Twitter as our data sources.

But before proceeding, make sure your traffic is proxied, either through Tor (using torghost) or through a VPN, because Google has probably blocked your IP after the automated requests above.

Nothing to worry about: only your CLI is blocked from google.com; you can still reach it yourself.

Now, first, we will change our domain to demonstrate how you can also find emails using this tool.

python3 theHarvester.py -d intigriti.com -l 500 -b google,bing,yahoo,duckduckgo

I can't show you the output, but you can test yourself and see the result.

Now let's use Linkedin as our data source:

python3 theHarvester.py \
    -d intigriti.com \
    -l 500 \
    -b linkedin

After the tool has found all the available users, it will give output something like this. Not exactly like this, because I can't make the names public LOL... if you are curious, check for yourself.

[*] Searching Linkedin.

[*] Users found: 277
---------------------
A*****a S****a - Cyber Security Researcher
A***s A****f - Event Manager
...
...
s****a j**n - Accountant
xxxx yyyyy - security

By now I guess you've got it: just change the data sources and you will get your preferred results. I am not going to show you every result and every command. I am leaving it up to you to test other sources like Twitter.
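The LinkedIn output follows a "Name - Title" pattern, so you can split it into pairs and filter by role. Here is a minimal sketch, assuming the lines look like the masked sample above; `parse_users` is a hypothetical helper.

```python
def parse_users(lines: list[str]) -> list[tuple[str, str]]:
    """Split 'Name - Title' lines into (name, title) pairs, skipping other lines."""
    users = []
    for line in lines:
        name, sep, title = line.strip().partition(" - ")
        if sep and name and title and not name.startswith("["):
            users.append((name, title))
    return users


# Lines from the masked sample output above.
sample = [
    "[*] Users found: 277",
    "---------------------",
    "A*****a S****a - Cyber Security Researcher",
    "A***s A****f - Event Manager",
    "...",
    "xxxx yyyyy - security",
]
users = parse_users(sample)
# Keep only the people whose title mentions security.
security_people = [name for name, title in users if "security" in title.lower()]
print(len(users), security_people)
```

Filtering by title like this is one way to shortlist the key people mentioned earlier.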

Bonus Tips:

You may think that this is all the tool can do, right? Nope...

It is way more powerful. I am not showing you each and everything, because this series is not about spoon-feeding you every technique and method of using a tool. Rather, it is about making you familiar with the tool and leaving the rest for you to explore yourself.

Go ahead and run theHarvester with the -h flag and you should see something like this:

*******************************************************************
*  _   _                                            _             *
* | |_| |__   ___    /\  /\__ _ _ ____   _____  ___| |_ ___ _ __  *
* | __|  _ \ / _ \  / /_/ / _` | '__\ \ / / _ \/ __| __/ _ \ '__| *
* | |_| | | |  __/ / __  / (_| | |   \ V /  __/\__ \ ||  __/ |    *
*  \__|_| |_|\___| \/ /_/ \__,_|_|    \_/ \___||___/\__\___|_|    *
*                                                                 *
* theHarvester 3.1.0                                         *
* Coded by Christian Martorella                                   *
* Edge-Security Research                                          *
* cmartorella@edge-security.com                                   *
*                                                                 *
******************************************************************* 


usage: theHarvester [-h] -d DOMAIN [-l LIMIT] [-S START] [-g] [-p] [-s] [-v] [-e DNS_SERVER] [-t DNS_TLD] [-n] [-c] [-f FILENAME] [-b SOURCE]

theHarvester is used to gather open-source intelligence (OSINT) on a company or domain.

optional arguments:
  -h, --help            show this help message and exit
  -d DOMAIN, --domain DOMAIN
                        company name or domain to search
  -l LIMIT, --limit LIMIT
                        limit the number of search results, default=500
  -S START, --start START
                        start with result number X, default=0
  -g, --google-dork     use Google Dorks for Google search
  -p, --port-scan       scan the detected hosts and check for Takeovers (21,22,80,443,8080)
  -s, --shodan          use Shodan to query discovered hosts
  -v, --virtual-host    verify host name via DNS resolution and search for virtual hosts
  -e DNS_SERVER, --dns-server DNS_SERVER
                        DNS server to use for lookup
  -t DNS_TLD, --dns-tld DNS_TLD
                        perform a DNS TLD expansion discovery, default False
  -n, --dns-lookup      enable DNS server lookup, default False
  -c, --dns-brute       perform a DNS brute force on the domain
  -f FILENAME, --filename FILENAME
                        save the results to an HTML and/or XML file
  -b SOURCE, --source SOURCE
                        baidu, bing, bingapi, certspotter, crtsh, dnsdumpster, dogpile, duckduckgo, github-code, google, hunter, intelx, linkedin, linkedin_links,
                        netcraft, otx, securityTrails, spyse(disabled for now), threatcrowd, trello, twitter, vhost, virustotal, yahoo, all

Read these flags carefully and test for yourself what this tool can and cannot do.

Conclusion:

Awesome. I hope you enjoyed the demos, but before we go I want to leave you with more information about this amazing tool.

First, I do recommend you learn more about theHarvester. This tool can easily be expanded for your needs. For example, you can create new modules that will automate specific searches for you. And trust me, it'd save you a lot of time.

And also, you can integrate theHarvester with other tools that you may use. Since theHarvester is a command-line tool, it is easy to automate several searches. For example, you can create batch scripts that do dozens of searches at once.
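As a rough idea of what such a batch could look like, the sketch below generates one command per domain and data source, and uses the -f flag from the help output to save each run under its own name. The function name `batch_commands` and the file naming scheme are my own assumptions; it only prints the commands and does not run them.

```python
import itertools
import shlex


def batch_commands(domains: list[str], sources: list[str], limit: int = 500) -> list[str]:
    """Return one theHarvester command line per (domain, source) pair."""
    commands = []
    for domain, source in itertools.product(domains, sources):
        argv = ["python3", "theHarvester.py",
                "-d", domain, "-l", str(limit), "-b", source,
                "-f", f"{domain}-{source}"]  # hypothetical output file name
        commands.append(shlex.join(argv))
    return commands


for line in batch_commands(["wordpress.org", "intigriti.com"], ["bing", "linkedin"]):
    print(line)
```

You could write the printed lines into a shell script, adding a pause between runs so the data sources don't block you.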

So, go check out the GitHub page for this tool. In there you'll find tons of interesting information and even a wiki, which will teach you how to use the tool and how to customize it.

Also, you may be wondering what you can do to protect your company from this kind of information gathering attack.

Well, the first suggestion is to use theHarvester against your own company so then you can see what is available to the world. You may find really interesting things. For example, I found that my email address was on the About page of an old website that I used to have.

And that explains why I used to receive tons of spam email. And once you find information that is available to the internet you can try to reduce your footprint on the internet. For example, you can ensure that your DNS records are not revealing any sensitive information.

Or you can try to delete your email address from all the pages you can find. Or you can even make sure that you decommission old websites so then they are not available for hackers.
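When you review what theHarvester found for your own company, host names that hint at internal systems are a good place to start. The sketch below flags such names. The keyword set is purely an assumption for illustration, so adapt it to your own naming habits; `flag_hosts` is a hypothetical helper.

```python
# Hypothetical keywords that often hint at internal or forgotten systems.
SENSITIVE_LABELS = {"dev", "staging", "test", "vpn", "hr", "backup", "admin", "old"}


def flag_hosts(hosts: list[str], keywords: set[str] = SENSITIVE_LABELS) -> list[str]:
    """Return hosts whose name contains one of the keywords as a whole part."""
    flagged = []
    for host in hosts:
        # Split on dots and hyphens so "hr-portal" yields "hr" and "portal".
        parts = {p for label in host.lower().split(".") for p in label.split("-")}
        if parts & keywords:
            flagged.append(host)
    return flagged


# Made-up host names, not real results.
found = ["www.example.com", "staging.example.com", "hr-portal.example.com", "blog.example.com"]
print(flag_hosts(found))
```

Anything it flags is a candidate for decommissioning or for moving behind a VPN.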

So that's it. I hope you liked our demos and that now you have a new tool in your belt for your next red team engagement. Any queries or suggestions, let me know in the comments. So, I'll see you later in another article in this series.
