Dmitriy Zub ☀️

Scrape Google Spell Check with Python

Contents: intro, imports, what will be scraped, process, code, links, outro.

Intro

This blog post continues the Google web scraping series. It shows how to scrape Google Spell Check results with Python using BeautifulSoup, followed by an alternative solution that uses the SerpApi API.

Imports

from bs4 import BeautifulSoup
import requests, lxml
from serpapi import GoogleSearch

What will be scraped

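The corrected query that Google suggests and the original "Search instead for" query, each with its link.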

Process

Selecting a CSS selector that supports autocompletion in all languages

Process of using SerpApi from the playground search query to the final output

Code

from bs4 import BeautifulSoup
import requests, lxml

headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
  'q': 'fush ro dah',
  'hl': 'en',
  'gl': 'us',
}

# Pass the bare endpoint: requests builds the query string from params
html = requests.get('https://www.google.com/search', headers=headers, params=params, timeout=30).text
soup = BeautifulSoup(html, 'lxml')

# Corrected query suggested by Google and the link to its results
corrected_word = soup.select_one('a.gL9Hy').text
corrected_word_link = f"https://www.google.com{soup.select_one('a.gL9Hy')['href']}"

# Original (misspelled) query and the link that forces a search for it
search_instead_for = soup.select_one('a.spell_orig').text
search_instead_for_link = f"https://www.google.com{soup.select_one('a.spell_orig')['href']}"
print(f'{corrected_word}\n{corrected_word_link}\nSearch instead: {search_instead_for}\n{search_instead_for_link}')

-------
'''
fus ro dah
https://www.google.com/search?hl=en&gl=us&q=fus+ro+dah&spell=1&sa=X&ved=2ahUKEwiIwb3ykMzxAhVWSzABHQtlDeMQkeECKAB6BAgBEDA
Search instead: fush ro dah
https://www.google.com/search?hl=en&gl=us&q=fush+ro+dah&nfpr=1&sa=X&ved=2ahUKEwiIwb3ykMzxAhVWSzABHQtlDeMQvgUoAXoECAEQMQ
'''
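
The snippet above assumes both links are always present. When Google shows no correction, select_one() returns None and the .text access raises an AttributeError. Below is a minimal sketch of a more defensive variant. The function name parse_spell_check, the BASE_URL constant and the sample markup are hypothetical and only illustrate the idea. The built-in html.parser is used here so the example runs without lxml, and the class names are the ones used above, which Google may change at any time.

```python
from bs4 import BeautifulSoup

BASE_URL = "https://www.google.com"  # assumption: hrefs are site-relative


def parse_spell_check(html):
    """Return spell-check data found in a results page, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    corrected = soup.select_one("a.gL9Hy")
    original = soup.select_one("a.spell_orig")

    # No correction block on the page: nothing to extract
    if corrected is None:
        return None

    result = {
        "corrected_word": corrected.get_text(),
        "corrected_word_link": BASE_URL + corrected.get("href", ""),
    }
    # The "search instead for" link is treated as optional
    if original is not None:
        result["search_instead_for"] = original.get_text()
        result["search_instead_for_link"] = BASE_URL + original.get("href", "")
    return result


# Hand-written markup for illustration only, not a real Google response
sample_html = """
<div>
  <a class="gL9Hy" href="/search?q=fus+ro+dah&amp;spell=1">fus ro dah</a>
  <a class="spell_orig" href="/search?q=fush+ro+dah&amp;nfpr=1">fush ro dah</a>
</div>
"""

print(parse_spell_check(sample_html))
print(parse_spell_check("<html><body>no correction here</body></html>"))
```

Keeping the parsing in a function that takes an HTML string also makes it possible to check the selectors against a saved page without sending a request each time.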

Using Google Spell Check API

SerpApi is a paid API with a free trial of 5,000 searches.

from serpapi import GoogleSearch
import os

params = {
  "api_key": os.environ["API_KEY"],
  "engine": "google",
  "q": "fus ro dish",
  "gl": "us",
  "hl": "en"
}

search = GoogleSearch(params)
results = search.get_dict()

print(results['search_information']['organic_results_state'])
print(results['search_information']['spelling_fix'])

--------
'''
Some results for exact spelling but showing fixed spelling
fus ro dah
'''
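
Indexing the results with square brackets raises a KeyError when a query is spelled correctly and no spelling fix is returned. The sketch below reads the same two fields with dict.get() instead. The helper name extract_spelling_fix and both sample dictionaries are hypothetical: they mirror the keys used above and are not live API responses.

```python
def extract_spelling_fix(results):
    """Read spell-check fields from a SerpApi-style results dict.

    Missing keys yield None instead of raising KeyError.
    """
    info = results.get("search_information", {})
    return {
        "organic_results_state": info.get("organic_results_state"),
        "spelling_fix": info.get("spelling_fix"),
    }


# Hand-written samples for illustration only
with_fix = {
    "search_information": {
        "organic_results_state": "Some results for exact spelling but showing fixed spelling",
        "spelling_fix": "fus ro dah",
    }
}
without_fix = {
    "search_information": {
        # Placeholder value: the exact wording of other states is an assumption
        "organic_results_state": "Results for exact spelling",
    }
}

print(extract_spelling_fix(with_fix)["spelling_fix"])
print(extract_spelling_fix(without_fix)["spelling_fix"])
```

In the earlier example, the dictionary returned by search.get_dict() could be passed to this helper in place of the two print calls.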

Links

Code in the online IDE • Google Spell Check API

Outro

If you have questions, something isn't working correctly, or you have anything else to say, feel free to drop a comment in the comment section or reach out on Twitter at @serp_api.

Yours,
Dmitriy, and the rest of the SerpApi Team
