Luca Liu

Posted on Dec 12, 2023 • Edited on Mar 16

Python Web Scraping in SAP Business Objects CMC: Get the List of Schedules

#python #sap #webscraping #data

Introduction

Recently, I faced a mission: organizing all SAP Business Objects schedules into an Excel file. The manual process was tedious—copying each schedule name and details and pasting them into Excel. With hundreds of schedules, it meant endless copy-pasting.

I scoured the internet for solutions and found that the primary solution is Query Builder. However, exporting the Power Builder results from the web into an Excel file is still challenging.

I also tried the SAP Business Objects API, it could fetch all schedules, but the number of recurring schedules is not the same as what we saw in SAP BO CMC.

So, I turned to Python, and it works! If you're grappling with the same challenge, my code might be the solution you're seeking.

Importing Libraries

import pandas as pd
from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
import time

Here, it’s importing necessary libraries. pandas for data manipulation, selenium for web scraping, BeautifulSoup for HTML parsing, and time for adding pauses between operations.

Setting up SAP BO Connection Details

bo_cmc = 'http://localhost:port/BOE/CMC'
bo_user_name = 'username'
bo_password = 'password'

It defines the SAP BusinessObjects Central Management Console (CMC) URL and login credentials.

Function to Get Soup (HTML Content)

def get_soup(driver, xpath):
    element = driver.find_element(By.XPATH, xpath)
    soup = BeautifulSoup(element.get_attribute("innerHTML"), 'lxml')
    soup.prettify()
    return soup

This function uses Selenium to locate an HTML element using XPath, extracts its inner HTML, and then creates a BeautifulSoup object for parsing.

Functions to Convert HTML to List and DataFrame

def bo_instances_html_to_list(soup):
    n = int(len(soup.select('tr td div')) / 16)
    values = [[] * n for _ in range(n)]
    for i in range(n):
        for j in range(16):
            values[i].append(soup.select('tr td div')[i * 16 + j].text)
    return values

def bo_values_to_df(values):
    df = pd.DataFrame(values)
    df.columns = ['', 'title', 'type', 'status', 'location', 'owner', 'completion_time', 'next_run_time',
                  'submission_time',
                  'start_time', 'duration', 'recurrence', 'expiry', 'server', 'error', 'title2']
    return df

def bo_instance_to_df(soup):
    values = bo_instances_html_to_list(soup)
    df = bo_values_to_df(values)
    return df

These functions convert HTML content of SAP BO instances into a list and then into a Pandas DataFrame.

Selenium Automation Function

def bo_selenium_to_df():
    options = webdriver.ChromeOptions()
    prefs = {'profile.default_content_settings.popups': 0}
    options.add_experimental_option('prefs', prefs)
    options.add_experimental_option("detach", True)
    driver = webdriver.Chrome(options=options)
    driver.get(bo_cmc)
    driver.switch_to.frame("servletBridgeIframe")
    time.sleep(3)

    username_input = driver.find_element(By.XPATH, '//*[@id="_id2:logon:USERNAME"]')
    password_input = driver.find_element(By.XPATH, '//*[@id="_id2:logon:PASSWORD"]')

    time.sleep(1)
    username_input.send_keys(bo_user_name)
    time.sleep(1)
    password_input.send_keys(bo_password)
    time.sleep(1)

    login_button = driver.find_element(By.XPATH, '//*[@id="_id2:logon:logonButton"]')
    time.sleep(1)
    login_button.click()

    time.sleep(5)
    html_text = driver.execute_script("return document.documentElement.outerHTML")
    contentFrame = driver.find_element(By.XPATH, '//*[@id="contentFrame"]')
    driver.switch_to.frame(contentFrame)
    time.sleep(2)

    innerContent = driver.find_element(By.XPATH, '//*[@id="innerContent"]')
    driver.switch_to.frame(innerContent)
    time.sleep(2)

    instance_manager_href = driver.find_element(By.XPATH, '//*[@id="manageList"]/li[1]/a[2]')
    instance_manager_href.click()
    time.sleep(1)

    # instancemanager
    ## switch to innerContent iframe
    html_page = driver.execute_script("return document.documentElement.outerHTML")
    contentFrame = driver.find_element(By.XPATH, '//*[@id="contentFrame"]')
    driver.switch_to.frame(contentFrame)
    time.sleep(1)

    innerContent = driver.find_element(By.XPATH, '//*[@id="innerContent"]')
    driver.switch_to.frame(innerContent)
    time.sleep(1)

    frames = []

    soup = get_soup(driver, xpath='//*[@id="UniversalRepositoryExplorer_detailView_mainTableBody"]')
    n = int(len(soup.select('tr td div')) / 16)
    df_start = bo_instance_to_df(soup=soup)
    frames.append(df_start)

    # In SAP BusinessObjects (SAP BO), schedules default to displaying 50 schedules per page.
    while n == 50:
        next_page_button = driver.find_element(By.XPATH, '//*[@id="UniversalRepositoryExplorer_goForwardButton"]')
        next_page_button.click()
        time.sleep(2)
        soup = get_soup(driver, xpath='//*[@id="UniversalRepositoryExplorer_detailView_mainTableBody"]')
        df = bo_instance_to_df(soup=soup)
        frames.append(df)
        n = int(len(soup.select('tr td div')) / 16)

    df = pd.concat(frames, axis=0)
    driver.close()
    drop_column = ['', 'completion_time', 'start_time', 'duration', 'server', 'error', 'title2']

    for column in drop_column:
        df = df.drop(column, axis=1)

    df['index'] = range(1, len(df) + 1)
    df = df[['index', 'title', 'type', 'status', 'location', 'owner', 'next_run_time',
             'submission_time', 'recurrence', 'expiry']]
    return df

This function uses Selenium to automate the process of logging into SAP BO, navigating to the schedule manager, extracting schedule details, and finally converting them into a Pandas DataFrame.

Executing the Selenium Function

df = bo_selenium_to_df()

This line calls the Selenium function and stores the resulting DataFrame in the variable df.

Explore more

Luca Liu

Hello there! 👋 I'm Luca, a Business Intelligence Developer with passion for all things data. Proficient in Python, SQL, Power BI, Tableau, SAP Business Objects.

Thank you for taking the time to explore data-related insights with me. I appreciate your engagement.

🚀 Connect with me on LinkedIn

🎃 Connect with me on X

🌍 Connect with me on Instagram

DEV Community

Python Web Scraping in SAP Business Objects CMC: Get the List of Schedules

Introduction

Importing Libraries

Setting up SAP BO Connection Details

Function to Get Soup (HTML Content)

Functions to Convert HTML to List and DataFrame

Selenium Automation Function

Executing the Selenium Function

Explore more

Luca Liu

Top comments (0)

Read next

DUCK (file structure) YOU!

Day 12 - Looping Excercises

How to Build your very own Google's NotebookLM

Master Python Coroutines: Create Custom Async Tools for Powerful Concurrent Apps

Introduction

Importing Libraries

Setting up SAP BO Connection Details

Function to Get Soup (HTML Content)

Functions to Convert HTML to List and DataFrame

Selenium Automation Function

Executing the Selenium Function

Explore more

Luca LiuFollow

Read next

DUCK (file structure) YOU!

Day 12 - Looping Excercises

How to Build your very own Google's NotebookLM

Master Python Coroutines: Create Custom Async Tools for Powerful Concurrent Apps

Luca Liu