Introduction
Recently, I faced a mission: organizing all SAP Business Objects schedules into an Excel file. The manual process was tedious—copying each schedule name and details and pasting them into Excel. With hundreds of schedules, it meant endless copy-pasting.
I scoured the internet for solutions and found that the primary solution is Query Builder. However, exporting the Power Builder results from the web into an Excel file is still challenging.
I also tried the SAP Business Objects API, it could fetch all schedules, but the number of recurring schedules is not the same as what we saw in SAP BO CMC.
So, I turned to Python, and it works! If you're grappling with the same challenge, my code might be the solution you're seeking.
Importing Libraries
import pandas as pd
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
import time
Here, it’s importing necessary libraries. pandas
for data manipulation, selenium
for web scraping, BeautifulSoup
for HTML parsing, and time
for adding pauses between operations.
Setting up SAP BO Connection Details
bo_cmc = 'http://localhost:port/BOE/CMC'
bo_user_name = 'username'
bo_password = 'password'
It defines the SAP BusinessObjects Central Management Console (CMC) URL and login credentials.
Function to Get Soup (HTML Content)
def get_soup(driver, xpath):
element = driver.find_element(By.XPATH, xpath)
soup = BeautifulSoup(element.get_attribute("innerHTML"), 'lxml')
soup.prettify()
return soup
This function uses Selenium to locate an HTML element using XPath, extracts its inner HTML, and then creates a BeautifulSoup object for parsing.
Functions to Convert HTML to List and DataFrame
def bo_instances_html_to_list(soup):
n = int(len(soup.select('tr td div')) / 16)
values = [[] * n for _ in range(n)]
for i in range(n):
for j in range(16):
values[i].append(soup.select('tr td div')[i * 16 + j].text)
return values
def bo_values_to_df(values):
df = pd.DataFrame(values)
df.columns = ['', 'title', 'type', 'status', 'location', 'owner', 'completion_time', 'next_run_time',
'submission_time',
'start_time', 'duration', 'recurrence', 'expiry', 'server', 'error', 'title2']
return df
def bo_instance_to_df(soup):
values = bo_instances_html_to_list(soup)
df = bo_values_to_df(values)
return df
These functions convert HTML content of SAP BO instances into a list and then into a Pandas DataFrame.
Selenium Automation Function
def bo_selenium_to_df():
options = webdriver.ChromeOptions()
prefs = {'profile.default_content_settings.popups': 0}
options.add_experimental_option('prefs', prefs)
options.add_experimental_option("detach", True)
driver = webdriver.Chrome(options=options)
driver.get(bo_cmc)
driver.switch_to.frame("servletBridgeIframe")
time.sleep(3)
username_input = driver.find_element(By.XPATH, '//*[@id="_id2:logon:USERNAME"]')
password_input = driver.find_element(By.XPATH, '//*[@id="_id2:logon:PASSWORD"]')
time.sleep(1)
username_input.send_keys(bo_user_name)
time.sleep(1)
password_input.send_keys(bo_password)
time.sleep(1)
login_button = driver.find_element(By.XPATH, '//*[@id="_id2:logon:logonButton"]')
time.sleep(1)
login_button.click()
time.sleep(5)
html_text = driver.execute_script("return document.documentElement.outerHTML")
contentFrame = driver.find_element(By.XPATH, '//*[@id="contentFrame"]')
driver.switch_to.frame(contentFrame)
time.sleep(2)
innerContent = driver.find_element(By.XPATH, '//*[@id="innerContent"]')
driver.switch_to.frame(innerContent)
time.sleep(2)
instance_manager_href = driver.find_element(By.XPATH, '//*[@id="manageList"]/li[1]/a[2]')
instance_manager_href.click()
time.sleep(1)
# instancemanager
## switch to innerContent iframe
html_page = driver.execute_script("return document.documentElement.outerHTML")
contentFrame = driver.find_element(By.XPATH, '//*[@id="contentFrame"]')
driver.switch_to.frame(contentFrame)
time.sleep(1)
innerContent = driver.find_element(By.XPATH, '//*[@id="innerContent"]')
driver.switch_to.frame(innerContent)
time.sleep(1)
frames = []
soup = get_soup(driver, xpath='//*[@id="UniversalRepositoryExplorer_detailView_mainTableBody"]')
n = int(len(soup.select('tr td div')) / 16)
df_start = bo_instance_to_df(soup=soup)
frames.append(df_start)
# In SAP BusinessObjects (SAP BO), schedules default to displaying 50 schedules per page.
while n == 50:
next_page_button = driver.find_element(By.XPATH, '//*[@id="UniversalRepositoryExplorer_goForwardButton"]')
next_page_button.click()
time.sleep(2)
soup = get_soup(driver, xpath='//*[@id="UniversalRepositoryExplorer_detailView_mainTableBody"]')
df = bo_instance_to_df(soup=soup)
frames.append(df)
n = int(len(soup.select('tr td div')) / 16)
df = pd.concat(frames, axis=0)
driver.close()
drop_column = ['', 'completion_time', 'start_time', 'duration', 'server', 'error', 'title2']
for column in drop_column:
df = df.drop(column, axis=1)
df['index'] = range(1, len(df) + 1)
df = df[['index', 'title', 'type', 'status', 'location', 'owner', 'next_run_time',
'submission_time', 'recurrence', 'expiry']]
return df
This function uses Selenium to automate the process of logging into SAP BO, navigating to the schedule manager, extracting schedule details, and finally converting them into a Pandas DataFrame.
Executing the Selenium Function
df = bo_selenium_to_df()
This line calls the Selenium function and stores the resulting DataFrame in the variable df
.
Explore more
Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.
Top comments (0)