DEV Community

loading...
Cover image for Scrape DuckDuckGo Organic Results with Python

Scrape DuckDuckGo Organic Results with Python

Dimitry Zub
A blog about web scraping using mostly Python, Ruby, and occasionally other languages. Do karting.
Originally published at serpapi.com Updated on ・3 min read

Contents: intro, imports, what will be scraped, process, code, links, outro.

Intro

This is the first blog post of the DuckDuckGo web scraping series. Here you'll see how to scrape Organic Search Results using Python and requests_html library. An alternative API solution will be shown.

In short, it's a good idea to focus not only on one place (Google) because DuckDuckGo users get a higher conversion rate and tend to have a lower bounce rate.

Data from Similarweb to show that the total amount of visits on June 2021 was almost 1 billion with a bounce rate of 14.04%!

image

Imports

from requests_html import HTMLSession
Enter fullscreen mode Exit fullscreen mode

What will be scraped

image

Process

Selecting container with all data, title, link, snippet, icon with SelectorGadget Chrome extension.

The reason why request-html was used instead of beautifulsoup is because everything comes from the javascript and to get the data it needs to be rendered. It could be also done with selenium. It's the easiest approach to get this data I found.

But, you can parse this data from <script> tag which will require a lot more time to find the right data and a lot of trial and error.

Also, an alternative way to scrape DuckDuckGo without Selenium.

Code

from requests_html import HTMLSession

session = HTMLSession()
response = session.get('https://duckduckgo.com/?q=fus+ro+dah&kl=us-en')
response.html.render()

for result in response.html.find('.links_deep'):
    title = result.find('.js-result-title-link', first=True).text
    link = result.find('.result__extras__url', first=True).text
    snippet = result.find('.js-result-snippet', first=True).text
    icon = f"https:{result.find('img.result__icon__img', first=True).attrs['data-src']}"
    print(f'{title}\n{link}\n{snippet}\n{icon}\n')

------------------
'''
Urban Dictionary: Fus ro dah
https://www.urbandictionary.com/define.php?term=Fus ro dah
Fus ro dah. Literally means Force, Balance, and Push. The first dragon shout you learn in The Elder Scrolls V: Skyrim. In their tongue he is known as Dovahkiin, Dragonborn, Fus ro dah.
https://external-content.duckduckgo.com/ip3/www.urbandictionary.com.ico

Fus Ro Dah - Instant Sound Effect Button | Myinstants
https://www.myinstants.com/instant/fus-ro-dah/
Instant sound effect button of Fus Ro Dah . Fus Ro Dah. From skyrim. 8,072 users favorited this sound button.
https://external-content.duckduckgo.com/ip3/www.myinstants.com.ico
...
'''
Enter fullscreen mode Exit fullscreen mode

Using DuckDuckGo Organic Results API

SerpApi is a paid API with a free plan.

The first difference that you might encounter is that you will get 30 results instead of 10. The second difference is that you don't have to render javascript which will lead to faster program execution. The third difference is that you immediately get access to a structured JSON string and don't have to figure out how to scrape certain elements.

import json # for pretty output
from serpapi import GoogleSearch

params = {
  "api_key": "YOUR_API_KEY",
  "engine": "duckduckgo",
  "q": "fus ro dah",
  "kl": "us-en"
}

search = GoogleSearch(params)
results = search.get_dict()

for result in results['organic_results']:
    print(json.dumps(result, indent=2))


-------------------
'''
{
  "position": 1,
  "title": "FUS RO DAH!!! - YouTube",
  "link": "https://www.youtube.com/watch?v=Ip7QZPw04Ks",
  "snippet": "Finally found original upload of the prank footage: http://www.youtube.com/watch?v=wmM00L...(video is older but original poster)I am the original poster/crea...",
  "favicon": "https://external-content.duckduckgo.com/ip3/www.youtube.com.ico"
}
...
'''
Enter fullscreen mode Exit fullscreen mode

Links

Code in the online IDEDuckDuckGo Organic Results APINeil Patel BlogDuckDuckGo Instant Answer APIScrape without Selenium

Outro

If you have any questions or something isn't working correctly or you want to write something else, feel free to drop a comment in the comment section or via Twitter at @serp_api.

Yours,
Dimitry, and the rest of SerpApi Team.

Discussion (0)