Browser automation using Docker and Python

#python #selenium #docker

A few weeks ago I was working on a new report for a customer's game. The platform that provides the campaign reports doesn't have a public API for generating them on request with some kind of developer key.

But it is possible to request such reports through their dashboard. I know it's a bit odd to rely on the UI for downloading reports, but it was the only way to get access to the customer's valuable data.

Let's define the requirements for this idea:

should be a standalone Python script for easy execution and integration with existing ETL libraries
should not require extra software on the server except the docker package (that's pretty flexible)

Now we are ready to give it a try and build something runnable. In this post I'm going to pin specific library versions for talking to the Docker daemon, because of the specific version of the docker package installed on CentOS (in my example).

My requirements.txt:

docker==2.1.0
splinter==0.7.7
timeout-decorator==0.3.3

splinter is a nice library that wraps browser drivers for automating anything on a page.

Let's define the class for running the Google Chrome container; later we will use it to get access to the page via the splinter library.

class _ChromeContainer:
    '''
    _ChromeContainer handles running the Chrome Docker container
    in the background.

    Requires a Docker service on the machine to pull
    and run images.
    '''
    def __init__(self):
        self.__image_name = "selenium/standalone-chrome:3.10.0"
        self.__client = docker.from_env()

    def run(self):
        '''
        Start the Docker container with chromedriver and wait for the running state
        '''
        client = self.__client

        self.container = client.containers.run(self.__image_name,
                                               detach=True,
                                               ports={'4444/tcp': None})

        @timeout_decorator.timeout(120)
        def waiting_up(client: docker.client.DockerClient, container):
            while True:
                container.reload()
                if container.status == "running":
                    break
                time.sleep(1)

        waiting_up(client, self.container)

    def quit(self):
        '''
        Kills and deletes the container
        '''
        # force=True kills a still-running container before removing it
        self.container.remove(force=True)

    @property
    def public_port(self):
        # Host port that Docker mapped to the container's 4444/tcp
        container = self.container
        return container.attrs["NetworkSettings"]["Ports"]["4444/tcp"][0]["HostPort"]
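
One caveat with the status check in run(): a container in the "running" state does not necessarily mean the Selenium server inside it is already accepting sessions. Below is a minimal sketch of an extra readiness check. It assumes the server exposes a /wd/hub/status endpoint that returns JSON with a value.ready flag (verify this against the image version you use). wait_until and selenium_ready are hypothetical helper names, not part of splinter or the docker library.

```python
import http.client
import json
import time
import urllib.request


def wait_until(predicate, timeout=60.0, interval=1.0):
    """Call predicate() until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def selenium_ready(port, host="127.0.0.1"):
    """Return True if the Selenium server at host:port reports it is ready.

    Assumption: GET /wd/hub/status answers with JSON like
    {"value": {"ready": true, ...}}.
    """
    url = "http://{}:{}/wd/hub/status".format(host, port)
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            body = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused/reset, timeout, or a non-JSON answer: not ready yet
        return False
    value = body.get("value") if isinstance(body, dict) else None
    return isinstance(value, dict) and bool(value.get("ready"))
```

In run(), this could be called right after the status loop, for example wait_until(lambda: selenium_ready(self.public_port), timeout=60), raising an error if it returns False.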

Now we are ready to use splinter and _ChromeContainer to automate your task.

import time

import docker
import timeout_decorator
from splinter import Browser


class Worker:
    def __init__(self):
        self.__chrome_container = _ChromeContainer()

    def process(self):
        self.__chrome_container.run()

        try:
            self.__web_client = Browser('remote',
                                        url="http://127.0.0.1:{}/wd/hub".format(self.__chrome_container.public_port),
                                        browser='chrome')
            try:
                # Example for login request:
                self.__login()
            finally:
                self.__web_client.quit()
        finally:
            # Stop the container even if the browser session could not be created
            self.__chrome_container.quit()

    def __login(self):
        self.__web_client.visit("http://www.example.com/login")
        self.__web_client.fill('developer_session[email]', 'EXAMPLE_USERNAME')
        self.__web_client.fill('developer_session[password]', 'EXAMPLE_PASSWORD')
        button = self.__web_client.find_by_id('developer_session_submit')
        button.click()
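
The run-then-always-quit pattern used in process() can also be written as a context manager, which keeps the cleanup in one place if more than one script needs a browser container. This is only a sketch: running is a hypothetical helper name, and it assumes nothing more than an object with run() and quit() methods, like _ChromeContainer above.

```python
from contextlib import contextmanager


@contextmanager
def running(container):
    """Start `container` and always stop it, even if the body raises."""
    container.run()
    try:
        yield container
    finally:
        container.quit()
```

Usage might then look like: with running(_ChromeContainer()) as chrome: followed by creating the Browser with chrome.public_port inside the block.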