rajesh singh

Meet Bose Framework: 🚀 Your Swiss Army Knife as a Ninja Scraper ✨

💡 What is Bose Framework?

Bose Framework is a framework built for Selenium developers, packed with best practices from experienced bot developers to help you create undetectable, low-boilerplate, and easy-to-debug bots. 🤖

It helps you scrape or automate target websites with ease and gives you the mental powers of experienced bot developers at your fingertips, saving you hours of development time. 👨🏻‍💻

🏆 Top Features

Bose is a batteries-included framework that implements the following features to help you create bots faster, saving you valuable development time. ✨

  • Go Low Boilerplate with Bose

Launching a browser using bare Selenium requires writing a significant amount of code:

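from selenium import webdriver
from selenium.webdriver.chrome.service import Service

driver_path = 'path/to/chromedriver'

# Selenium 4 takes the driver path through a Service object
# (the executable_path argument was removed in Selenium 4.10)
driver = webdriver.Chrome(service=Service(driver_path))

driver.get('https://www.example.com')

driver.quit()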

With Bose, you can launch a browser in a few lines of code without having to worry about specifying paths:

from bose import *

class Task(BaseTask):
    def run(self, driver):
        driver.get('https://www.example.com')
  • Configure Profile, Window Size, and User Agent with a Single Line of Code

In bare Selenium, if you want to configure options such as the profile, user agent, or window size, it requires writing a lot of code, as shown below:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

driver_path = 'path/to/chromedriver.exe'

options = Options()

# Directory in which Chrome stores the profile
profile_path = '1'

options.add_argument(f'--user-data-dir={profile_path}')

user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'
options.add_argument(f'--user-agent={user_agent}')

window_width = 1200
window_height = 720
options.add_argument(f'--window-size={window_width},{window_height}')

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(driver_path), options=options)

With Bose, you can specify them in a single line of code using predefined variables. Here's an example:


from bose import *

class Task(BaseTask):
    browser_config = BrowserConfig(user_agent=UserAgent.user_agent_106, window_size=WindowSize.window_size_1280_720, profile=1)
  • Exception Handling that every Bot Developer wants

Bose also addresses a common frustration with Selenium: when an exception occurs, the script exits and the browser closes with it. This behavior is not desirable for bot developers and makes debugging hard.

Bose, on the other hand, keeps the browser open in the event of an exception, allowing for easier debugging of the problem.

[Image: error prompt]
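
The behavior described above can be approximated in plain Python. The sketch below is not Bose's actual implementation: `run_task`, `task`, and `pause` are hypothetical names introduced for illustration, and the driver only needs a `quit()` method. It catches the exception, prints the traceback, and waits for the developer before closing the browser.

```python
import traceback


def run_task(driver, task, pause=input):
    """Run task(driver); on failure, keep the browser open until confirmed.

    Hypothetical helper for illustration, not part of Bose's API.
    Returns True if the task finished without an exception, else False.
    """
    try:
        task(driver)
        return True
    except Exception:
        # Show the error while the browser window is still open for inspection.
        traceback.print_exc()
        pause("Task failed. Inspect the browser, then press Enter to close it: ")
        return False
    finally:
        # The browser is closed only after the pause above has returned.
        driver.quit()
```

Passing `pause` as a parameter keeps the helper usable in unattended runs, where a no-op function can replace `input`.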

  • Remembers the Past Runs

Let's say you went to drink coffee while your bot was running, and when you came back, you noticed that the bot had closed.

As a developer, you might have an itch to see the last screenshot taken before the browser was closed or to know how much time the bot took to complete its task.

Well, whenever you run Bose, it automatically stores important details such as

  • the screenshot taken before closing
  • the time it took to run the bot
  • any errors that occurred.

After each run, a new folder is created in the tasks folder, containing the files listed below:

task_info.json

It contains information about the task run such as

  • the duration of the task run
  • the IP details of the task
  • the user agent
  • window_size
  • profile

[Image: task_info.json]

final.png

This is the screenshot captured before the driver was closed.

[Image: final.png]

page.html

This is the HTML source captured before the driver was closed. It is very useful for finding out why your selectors failed to select elements.

[Image: page.html]

error.log

If your task crashed due to an exception, we also store error.log, which contains the error that caused the crash. This is very helpful in debugging.

[Image: error.log]
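
To make the idea of per-run records concrete, here is a minimal sketch that writes task_info.json, plus error.log on failure, into a run folder using only the standard library. It is not Bose's implementation: `run_and_record` is a hypothetical helper, and the JSON field names are assumptions rather than Bose's actual schema.

```python
import json
import time
import traceback
from pathlib import Path


def run_and_record(task, run_dir):
    """Run task() and write task_info.json (plus error.log on failure).

    Hypothetical helper; the field names are assumptions, not Bose's schema.
    """
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    started = time.monotonic()
    error = None
    try:
        task()
    except Exception:
        # Keep the full traceback so the failure can be debugged later.
        error = traceback.format_exc()
        (run_dir / "error.log").write_text(error, encoding="utf-8")
    info = {
        "duration_seconds": round(time.monotonic() - started, 3),
        "succeeded": error is None,
    }
    (run_dir / "task_info.json").write_text(
        json.dumps(info, indent=4), encoding="utf-8"
    )
    return info
```

Calling `run_and_record(my_task, "tasks/1")` would leave the run's records in `tasks/1`, where `my_task` is any callable taking no arguments.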

  • Built to defeat Cloudflare and PerimeterX by default

Multimillion-dollar companies protect their valuable data with the help of Cloudflare and PerimeterX.

Now, you might be wondering how to bypass systems like Cloudflare and PerimeterX. Well, a brilliant developer named Leon created a ChromeDriver that has excellent support for bypassing all major bot detection systems such as Distil, Datadome, Cloudflare, and others.

To use it, simply pass the use_undetected_driver option to the BrowserConfig in your code, as shown below:

from bose import BaseTask, BrowserConfig

class Task(BaseTask):
    browser_config = BrowserConfig(use_undetected_driver=True)
  • Output Data in CSV or JSON with a Single Line of Code

Outputting data in CSV or JSON requires a significant amount of imperative code, as shown below:

import csv
import json

def write_json(data, filename):
    with open(filename, 'w') as fp:
        json.dump(data, fp, indent=4)

def write_csv(data, filename):
    with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = data[0].keys()  # get the fieldnames from the first dictionary
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()  # write the header row
        writer.writerows(data)  # write each row of data

data = [
    {
        "text": "\u201cThe world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.\u201d",
        "author": "Albert Einstein"
    },
    {
        "text": "\u201cIt is our choices, Harry, that show what we truly are, far more than our abilities.\u201d",
        "author": "J.K. Rowling"
    }
]

write_json(data, "data.json")
write_csv(data, "data.csv")

Bose simplifies these complexities by encapsulating them in the Output module for easy reading and writing of data:

from bose import Output

data = [
    {
        "text": "\u201cThe world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.\u201d",
        "author": "Albert Einstein"
    },
    {
        "text": "\u201cIt is our choices, Harry, that show what we truly are, far more than our abilities.\u201d",
        "author": "J.K. Rowling"
    }
]

Output.write_json(data, "data.json")
Output.write_csv(data, "data.csv")
  • Run the Same Code Everywhere, whether it's on Mac, Linux, or Windows. Forget the need to change driver paths.

Bose simplifies cross-platform development by abstracting away the differences between operating systems such as Windows, Mac, and Linux.

You no longer need to specify OS-specific driver paths when launching a browser.
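
For a sense of what this abstraction saves you from, the sketch below picks a driver filename based on the operating system. It is only an illustration: `driver_path` and the `build` directory are hypothetical, and Bose's own path handling may differ.

```python
import platform
from pathlib import Path


def driver_path(build_dir="build", system=None):
    """Return the chromedriver path for the given OS (hypothetical layout)."""
    # platform.system() returns "Windows", "Linux", or "Darwin" (macOS).
    system = system or platform.system()
    # Windows executables carry an .exe suffix; macOS and Linux ones do not.
    name = "chromedriver.exe" if system == "Windows" else "chromedriver"
    return Path(build_dir) / name
```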

  • Adds Powerful Methods to Supercharge Bot Development

The driver you receive in the run method of the Bose Task is an extended version of Selenium that adds powerful methods to make working with Selenium much easier. Some of the popular methods added to the Selenium driver by Bose Framework are:

| Method | Description |
| --- | --- |
| get_by_current_page_referrer(link, wait=None) | Simulates a visit that appears as if you arrived at the page by clicking a link. This creates more natural and less detectable browsing behavior. |
| js_click(element) | Clicks an element using JavaScript, avoiding interceptions (ElementClickInterceptedException) from pop-ups or overlays. |
| get_cookies_and_local_storage_dict() | Returns a dictionary containing "cookies" and "local_storage". |
| add_cookies_and_local_storage_dict(site_data) | Adds both cookies and local storage data to the current website. |
| organic_get(link, wait=None) | Visits Google and then visits the link, making the visit less detectable. |
| local_storage | Returns an instance of the LocalStorage module for interacting with the browser's local storage in an easy-to-use manner. |
| save_screenshot(filename=None) | Saves a screenshot of the current web page to a file in the tasks/ directory. |
| short_random_sleep() and long_random_sleep() | Sleep for a random amount of time, either between 2 and 4 seconds (short) or between 6 and 9 seconds (long). |
| get_element_or_* (e.g. get_element_or_none, get_element_or_none_by_selector, get_element_by_id, get_element_or_none_by_text_contains) | Find web elements on the page based on different criteria. They return the web element if it exists, or None if it doesn't. |
| is_in_page(target, wait=None, raise_exception=False) | Checks whether the browser is on the specified page. |
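
To give a feel for what two of these helpers do, here is a rough standalone sketch. It is not Bose's source code: the functions take the driver and a `sleep` callable as explicit parameters, which is an assumption made so the example runs on its own. The only Selenium call used is `execute_script`, a standard WebDriver method.

```python
import random
import time


def js_click(driver, element):
    """Click via JavaScript so an overlay cannot intercept the click."""
    # In the script, arguments[0] refers to the element passed after it.
    driver.execute_script("arguments[0].click();", element)


def short_random_sleep(sleep=time.sleep):
    """Sleep for a random 2 to 4 seconds and return the chosen duration."""
    seconds = random.uniform(2, 4)
    sleep(seconds)
    return seconds
```

Randomizing the delay avoids the fixed, machine-like timing between actions that a constant sleep would produce.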

In simple words, Bose is an excellent framework that simplifies the boring parts of Selenium and web scraping.

🚀 Get Started with Bose

Now, let's see how you can have the magic of Bose at your fingertips.

Start by Cloning the Template

git clone https://github.com/omkarcloud/bose-starter my-bose-project

Then change into that directory, install the dependencies, open VS Code, and start the project:

cd my-bose-project
python -m pip install -r requirements.txt
code .
python main.py

The first run will take some time as it downloads the ChromeDriver executable. Subsequent runs will be fast.

Once started, it will scrape data from quotes.toscrape.com and store the results in /output/finished.json

✨ Upcoming Features in V2 to supercharge 🔋 Bot Development

  • Kubernetes Integration to help you scrape data at Google’s Scale [Priority]
  • Save storage by keeping a profile's cookies and local storage for the website in a single JSON file. [Priority]
  • Provide a temporary email service [Priority]
  • Purchase hundreds of pre-created Google and Microsoft accounts [Priority]
  • Built-in IP rotation for requests [Priority]
  • Captcha Solving implemented right into Bose [Priority]
  • Generate names, emails, usernames, etc., for users in countries and regions such as India, Russia, Europe, China, and America.

📚 Summary

Simply put, Bose empowers you to effortlessly automate or scrape your Target Website and its content with the ease of cutting butter with a knife.


👋 Hi Reader,

What do you think? Do you see the value of Bose Framework?

Share your thoughts in the comments and I will reply to every single comment.