💡 What is Bose Framework?
Bose Framework is a framework for Selenium developers, packed with best practices from experienced bot developers to help you create undetectable, low-boilerplate, and easy-to-debug bots. 🤖
It helps you scrape or automate target websites with ease and gives you the mental powers of experienced bot developers at your fingertips, saving you hours of development time. 👨🏻💻
🏆 Top Features
Bose is a battery-packed framework that implements the following features to help you create bots faster, saving you valuable development time. ✨
- Go Low Boilerplate with Bose
Launching a browser using bare Selenium requires writing a significant amount of code:
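from selenium import webdriver
from selenium.webdriver.chrome.service import Service
driver_path = 'path/to/chromedriver'
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(driver_path))
driver.get('https://www.example.com')
driver.quit()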
With Bose, you can launch a browser in a few lines of code without having to worry about specifying paths:
from bose import *
class Task(BaseTask):
    def run(self, driver):
        driver.get('https://www.example.com')
- Configure Profile, Window Size, and User Agent with a Single Line of Code
In bare Selenium, if you want to configure options such as the profile, user agent, or window size, it requires writing a lot of code, as shown below:
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium import webdriver
driver_path = 'path/to/chromedriver.exe'
options = Options()
profile_path = '1'
options.add_argument(f'--user-data-dir={profile_path}')
user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'
options.add_argument(f'--user-agent={user_agent}')
window_width = 1200
window_height = 720
options.add_argument(f'--window-size={window_width},{window_height}')
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(driver_path), options=options)
With Bose, you can specify them in a single line of code using predefined variables. Here's an example:
class Task(BaseTask):
    browser_config = BrowserConfig(user_agent=UserAgent.user_agent_106, window_size=WindowSize.window_size_1280_720, profile=1)
- Exception Handling that every Bot Developer wants
Bose also addresses a common frustration with Selenium - when an exception occurs, the browser crashes and closes automatically. This behavior is not desirable for bot developers and makes debugging hard.
Bose, on the other hand, keeps the browser open in the event of an exception, allowing for easier debugging of the problem.
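To make the idea concrete, here is a minimal sketch of how a runner could keep the browser around after a failure. It is not Bose's actual implementation: `run_and_keep_open` and its `wait_for_user` parameter are hypothetical names, and `driver` only needs a `quit()` method, as a Selenium WebDriver has.

```python
import traceback


def run_and_keep_open(task, driver, wait_for_user=input):
    """Run task(driver) and close the browser only when it is safe to do so.

    On an exception, the traceback is printed and the function blocks on
    wait_for_user, so the browser window stays available for inspection.
    Returns True on success and False on failure.
    """
    try:
        task(driver)
    except Exception:
        traceback.print_exc()
        # Hypothetical prompt: the window stays open until the user confirms.
        wait_for_user("Task failed. Press Enter to close the browser... ")
        driver.quit()
        return False
    driver.quit()
    return True
```

Passing the prompt in as a parameter keeps the helper easy to exercise without a real terminal or browser.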
- Remembers the Past Runs
Let's say you went to drink coffee while your bot was running, and when you came back, you noticed that the bot had closed.
As a developer, you might have an itch to see the last screenshot taken before the browser was closed or to know how much time the bot took to complete its task.
Well, whenever you run Bose, it automatically stores important details such as:
- the screenshot taken before closing
- the time it took to run the bot
- any errors that occurred.
After each run, a new folder is created in the tasks folder. It contains three files, plus error.log if the task crashed:
task_info.json
It contains information about the task run, such as
- the duration for which the task ran
- the IP details of the task
- the user agent
- window_size
- profile
final.png
This is the screenshot captured before the driver was closed.
page.html
This is the HTML source captured before the driver was closed. It is very useful when your selectors failed to select elements.
error.log
If your task crashed due to an exception, we also store error.log, which contains the error that crashed the task. This is very helpful in debugging.
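The bookkeeping described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not Bose's code: `record_run`, the numbered folder names, and the `duration_seconds` and `crashed` keys are all hypothetical, and the screenshot and page source are left out because they need a live driver.

```python
import json
import time
import traceback
from pathlib import Path


def record_run(run, tasks_dir="tasks", config=None):
    """Call run() and save its details in a new numbered folder under tasks_dir."""
    root = Path(tasks_dir)
    root.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: tasks/1, tasks/2, ... (one folder per run).
    existing = [int(p.name) for p in root.iterdir() if p.is_dir() and p.name.isdigit()]
    run_dir = root / str(max(existing, default=0) + 1)
    run_dir.mkdir()

    started = time.monotonic()
    error = None
    try:
        run()
    except Exception:
        # Keep the full traceback so the crash can be debugged later.
        error = traceback.format_exc()
        (run_dir / "error.log").write_text(error, encoding="utf-8")

    info = dict(config or {})
    info["duration_seconds"] = round(time.monotonic() - started, 3)
    info["crashed"] = error is not None
    (run_dir / "task_info.json").write_text(json.dumps(info, indent=4), encoding="utf-8")
    return run_dir
```

Calling `record_run(my_task, config={"profile": 1})` would leave a `task_info.json` behind on every run and an `error.log` only when `my_task` raised.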
- Built to defeat Cloudflare and PerimeterX by default
Multi-million-dollar companies protect their valuable data with the help of Cloudflare and PerimeterX.
Now, you might be wondering how to bypass systems like Cloudflare and PerimeterX. Well, a brilliant developer named Leon created a ChromeDriver that has excellent support for bypassing all major bot detection systems such as Distil, Datadome, Cloudflare, and others.
To use it, simply pass the use_undetected_driver option to the BrowserConfig in your code, as shown below:
from bose import BaseTask, BrowserConfig
class Task(BaseTask):
    browser_config = BrowserConfig(use_undetected_driver=True)
- Output Data in CSV or JSON with a Single Line of Code
Outputting data in CSV or JSON requires a significant amount of imperative code, as shown below:
import csv
import json
def write_json(data, filename):
    with open(filename, 'w') as fp:
        json.dump(data, fp, indent=4)
def write_csv(data, filename):
    with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
        fieldnames = data[0].keys() # get the fieldnames from the first dictionary
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader() # write the header row
        writer.writerows(data) # write each row of data
data = [
    {
        "text": "\u201cThe world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.\u201d",
        "author": "Albert Einstein"
    },
    {
        "text": "\u201cIt is our choices, Harry, that show what we truly are, far more than our abilities.\u201d",
        "author": "J.K. Rowling"
    }
]
write_json(data, "data.json")
write_csv(data, "data.csv")
Bose simplifies these complexities by encapsulating them in the Output module for easy reading and writing of data:
from bose import Output
data = [
    {
        "text": "\u201cThe world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.\u201d",
        "author": "Albert Einstein"
    },
    {
        "text": "\u201cIt is our choices, Harry, that show what we truly are, far more than our abilities.\u201d",
        "author": "J.K. Rowling"
    }
]
Output.write_json(data, "data.json")
Output.write_csv(data, "data.csv")
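For readers curious about what such encapsulation might look like, here is a rough sketch that groups the helpers behind one class. `SimpleOutput` is a hypothetical stand-in written for this post, not the real `Output` module, and its `read_json` method is an assumption about what "reading" could mean.

```python
import csv
import json


class SimpleOutput:
    """Hypothetical stand-in that groups read/write helpers in one place."""

    @staticmethod
    def write_json(data, filename):
        with open(filename, "w", encoding="utf-8") as fp:
            json.dump(data, fp, indent=4)

    @staticmethod
    def read_json(filename):
        with open(filename, encoding="utf-8") as fp:
            return json.load(fp)

    @staticmethod
    def write_csv(data, filename):
        if not data:
            raise ValueError("write_csv needs at least one row")
        with open(filename, "w", newline="", encoding="utf-8") as fp:
            # Column names come from the keys of the first record.
            writer = csv.DictWriter(fp, fieldnames=list(data[0].keys()))
            writer.writeheader()
            writer.writerows(data)
```

Static methods keep the call sites as short as `SimpleOutput.write_json(data, "data.json")`, with no object to create first.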
- Run the Same Code Everywhere, whether it's on Mac, Linux, or Windows. Forget the need to change driver paths.
Bose simplifies cross-platform development by abstracting away the differences between operating systems such as Windows, Mac, and Linux.
You no longer need to specify driver paths specific to each OS when launching a browser.
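As an illustration of the kind of difference being hidden, the sketch below picks a ChromeDriver file name from the operating system. It is an assumption about one possible approach, not Bose's code: `driver_filename`, `driver_path`, and the `build` folder are hypothetical.

```python
import platform
from pathlib import Path


def driver_filename(system=None):
    """Return the ChromeDriver file name for an OS name as reported by platform.system()."""
    system = system or platform.system()
    # Only Windows executables carry the .exe suffix.
    return "chromedriver.exe" if system == "Windows" else "chromedriver"


def driver_path(build_dir="build", system=None):
    """Join a (hypothetical) download folder with the OS-specific file name."""
    return Path(build_dir) / driver_filename(system)
```

With the OS check in one place, the rest of the bot can call `driver_path()` and never mention a platform.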
- Adds Powerful Methods to Supercharge Bot Development
The driver you receive in the run method of the Bose Task is an extended version of Selenium that adds powerful methods to make working with Selenium much easier. Some of the popular methods added to the Selenium driver by Bose Framework are:
METHOD | DESCRIPTION |
---|---|
get_by_current_page_referrer(link, wait=None) | simulates a visit that appears as if you arrived at the page by clicking a link. This approach creates a more natural and less detectable browsing behavior. |
js_click(element) | clicks on an element using JavaScript, bypassing any interceptions (ElementClickInterceptedException) from pop-ups or alerts |
get_cookies_and_local_storage_dict() | returns a dictionary containing "cookies" and "local_storage" |
add_cookies_and_local_storage_dict(site_data) | adds both cookies and local storage data to the current website |
organic_get(link, wait=None) | visits Google and then visits the "link", making it less detectable |
local_storage | returns an instance of the LocalStorage module for interacting with the browser's local storage in an easy-to-use manner |
save_screenshot(filename=None) | saves a screenshot of the current web page to a file in the tasks/ directory |
short_random_sleep() and long_random_sleep() | sleep for a random amount of time, either between 2 and 4 seconds (short) or between 6 and 9 seconds (long) |
get_element_or_* [e.g. get_element_or_none, get_element_or_none_by_selector, get_element_by_id, get_element_or_none_by_text_contains] | find web elements on the page based on different criteria. They return the web element if it exists, or None if it doesn't. |
is_in_page(target, wait=None, raise_exception=False) | checks if the browser is in the specified page |
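Two of the helpers in the table are simple enough to sketch as standalone functions. These are guesses at the behavior described above, not Bose's source: the `sleep` parameter is an addition for testability, and `js_click` relies only on Selenium's standard `execute_script` call.

```python
import random
import time


def random_sleep(low, high, sleep=time.sleep):
    """Sleep for a uniformly random duration in [low, high] seconds and return it."""
    duration = random.uniform(low, high)
    sleep(duration)
    return duration


def short_random_sleep(sleep=time.sleep):
    # Between 2 and 4 seconds, matching the table above.
    return random_sleep(2, 4, sleep)


def long_random_sleep(sleep=time.sleep):
    # Between 6 and 9 seconds, matching the table above.
    return random_sleep(6, 9, sleep)


def js_click(driver, element):
    """Click through JavaScript so an overlay cannot intercept the click."""
    driver.execute_script("arguments[0].click();", element)
```

Randomized pauses make the timing between actions less regular than a fixed `time.sleep(3)` would.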
In simple words, Bose is an excellent framework that simplifies the boring parts of Selenium and web scraping.
🚀 Get Started with Bose
Now, let's see how you can have the magic of Bose at your fingertips.
Start by Cloning the Template
git clone https://github.com/omkarcloud/bose-starter my-bose-project
Then change into that directory, install dependencies, open VS Code, and start the project:
cd my-bose-project
python -m pip install -r requirements.txt
code .
python main.py
The first run will take some time as it downloads the Chrome driver executable. Subsequent runs will be fast.
Once started, it will scrape data from quotes.toscrape.com and store the results in /output/finished.json
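Once the run finishes, you can inspect the results with a few lines of Python. The sketch below assumes the file holds a JSON array of records and that the path is relative to the project folder. Both are assumptions, as are the helper names `load_results` and `count_by`.

```python
import json
from collections import Counter
from pathlib import Path


def load_results(path="output/finished.json"):
    """Load scraped records, assuming the file holds a JSON array."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError(f"expected a JSON array in {path}")
    return data


def count_by(records, key):
    """Count how many records share each value of key (records without it are skipped)."""
    return Counter(r[key] for r in records if isinstance(r, dict) and key in r)
```

For example, `count_by(load_results(), "author")` would tally quotes per author if the records carry an `author` field like the sample data shown earlier.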
✨ Upcoming Features in V2 to supercharge 🔋 Bot Development
- Kubernetes Integration to help you scrape data at Google’s Scale [Priority]
- Save storage by keeping each profile in a single JSON file that holds the website's cookies and local storage. [Priority]
- Provide a temporary email service [Priority]
- Purchase hundreds of pre-created Google and Microsoft accounts [Priority]
- Built-in IP rotation for requests [Priority]
- Captcha Solving implemented right into Bose [Priority]
- Generate names, emails, usernames, etc., for users in countries and regions such as India, Russia, Europe, China, and America.
📚 Summary
Simply put, Bose empowers you to automate or scrape your target website and its content with the ease of cutting butter with a knife.
👋 Hi Reader,
What do you think? Do you see the value of Bose Framework?
Share your thoughts in the comments and I will reply to every single comment.