Faisalkhatri123 for LambdaTest

Selenium 4 WebDriver Hierarchy: A Detailed Explanation

The inception of Selenium can be traced back to a web application that required frequent testing. This prompted Jason Huggins to create a program using JavaScript, which he named JavaScriptTestRunner and released in 2004.

However, he realized that the program was much more powerful and could be helpful to the community in testing; hence, he decided to open-source it and renamed it to Selenium Core.

Paul Hammant created Selenium Remote Control (Selenium RC), also known as Selenium 1, as an upgraded version of Selenium Core. It addressed a limitation of Selenium Core: because the Same-origin policy prohibits JavaScript from interacting with a domain other than the one it was launched from, testers had to install Selenium Core on the same web server as the application under test so that both belonged to the same domain.

Patrick Lightbody developed Selenium Grid to run tests in parallel and reduce test execution time. Shinya Kasatani from Japan contributed to the creation of Selenium IDE in 2006, which automates the browser using record-and-playback features.

Selenium WebDriver was created in 2006 by Simon Stewart. Selenium WebDriver is an open-source, cross-platform library designed to help automate browser testing. It is designed to provide a simple and consistent interface for interacting with web browsers and different elements on the web page to simulate user actions on the websites. Over the years, Selenium has undergone many changes and improvements and introduced new features per the latest software industry trend.

It is composed of three main components:

  • Language Bindings — Language bindings are a set of classes and methods that allow you to use a specific programming language with Selenium WebDriver for writing automated tests for the website. Currently, WebDriver supports multiple programming languages, including Java, C#, Python, Ruby, JavaScript, etc.

  • WebDriver API — The API is a set of classes and methods that allow you to interact with the browser through code. The API allows running the tests on different browsers like Chrome, Firefox, MS Edge, etc.

  • WebDriver Implementation — WebDriver is an interface to access web browsers programmatically. It manages the communication between the language bindings and the browser. It lets you automate and interact with the elements in the DOM.

image1

Selenium 4 is now W3C compliant. You may no longer need to add ‘tweaks’ to the test script to make it work across different browsers, as everything (i.e., browsers and WebDriver APIs) communicates over the W3C standard protocol.

If you are using Selenium 3 and want to get your hands dirty with Selenium 4, please check our detailed guide on upgrading from Selenium 3 to Selenium 4.

image2

The introduction of Selenium Manager in Selenium 4.6.0 is a big relief for automation test engineers: we no longer need to provide the path to the driver executable, nor do we need third-party libraries like WebDriverManager to start the browsers. Selenium Manager takes care of the browser drivers. We just need the respective browser installed on the machine on which the tests run.

With WebDriver becoming completely W3C standardized, you can use it across different frameworks without any compatibility issues.

Though many of us have used Selenium WebDriver for automation testing, fewer of us know the internals of its architecture. The key question is, “Does knowing the internal workings of Selenium help, and to what extent?”

In my experience with Selenium, Appium, and other automation testing frameworks, understanding the internals of a framework helps in making the best possible use of the interfaces, classes, and methods it provides. We have witnessed how much Selenium has changed since its inception in 2004!

In this blog on Selenium 4 WebDriver Hierarchy, we will delve into the Selenium 4 WebDriver framework, specifically focusing on the hierarchy of the Selenium WebDriver and the abstract methods and nested interfaces within the WebDriver Interface. Additionally, we will also explore the hierarchy of the WebElement Interface and the abstract methods used within it.

Selenium WebDriver Hierarchy

As automation test engineers, we have been using Selenium WebDriver. At the time of writing this blog on Selenium 4 WebDriver Hierarchy, Selenium’s latest version is 4.7.0. We know that running the following line of code starts the Chrome browser, after which we can test the web page using WebDriver methods.

WebDriver driver = new ChromeDriver();

However, very few automation test engineers know about the internal working of the WebDriver Interface. So, let’s dive deep into this and understand how Selenium WebDriver works.

Here is the pictorial representation of the Selenium WebDriver hierarchy.

image3

RemoteWebDriver Class

Let’s start with the RemoteWebDriver class because it is the class that fully implements the WebDriver interface and is extended by every BrowserDriver class within the Selenium framework.

image4
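To make the relationship concrete, here is a minimal sketch of the same shape in plain Java. The types below (WebDriverSketch, RemoteWebDriverSketch, ChromeDriverSketch, and so on) are simplified stand-ins invented for illustration; they are not Selenium's real classes and they do not talk to a browser. The sketch only shows why a variable declared with the interface type can hold a browser driver object, and why that same object can be cast to another interface the base class implements.

```java
// Simplified stand-ins for illustration only; these are NOT Selenium's real classes.
interface SearchContextSketch {
    String findElement(String locator);
}

// The driver interface extends the search interface, as WebDriver extends SearchContext.
interface WebDriverSketch extends SearchContextSketch {
    void get(String url);
    String getCurrentUrl();
}

interface JavascriptExecutorSketch {
    Object executeScript(String script);
}

// One class provides the full implementation of several interfaces.
class RemoteWebDriverSketch implements WebDriverSketch, JavascriptExecutorSketch {
    private String currentUrl = "about:blank";

    public void get(String url) { currentUrl = url; }
    public String getCurrentUrl() { return currentUrl; }
    public String findElement(String locator) { return "element(" + locator + ")"; }
    public Object executeScript(String script) { return "executed: " + script; }
}

// A browser-specific driver inherits everything from the base class.
class ChromeDriverSketch extends RemoteWebDriverSketch { }

class HierarchyDemo {
    public static void main(String[] args) {
        // Declared as the interface, instantiated as the browser-specific class.
        WebDriverSketch driver = new ChromeDriverSketch();
        driver.get("https://example.com");
        System.out.println(driver.getCurrentUrl());

        // The cast works because the runtime object also implements this interface.
        JavascriptExecutorSketch js = (JavascriptExecutorSketch) driver;
        System.out.println(js.executeScript("return document.title"));
    }
}
```

This is the same reason the common Selenium idiom of casting a WebDriver variable to another interface compiles and runs: the object behind the variable is a RemoteWebDriver subclass.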

RemoteWebDriver class has the following nested classes:

  • RemoteTargetLocator: This class fully implements the WebDriver.TargetLocator interface.

  • RemoteWebDriverOptions: This class fully implements the WebDriver.Options interface. It has the following nested classes:

      • RemoteTimeouts: This class implements the WebDriver.Timeouts interface and provides the full implementation of all its abstract methods.

      • RemoteWindow: This class implements the WebDriver.Window interface and provides the full implementation of all its abstract methods.

image5

How to use the RemoteWebDriver class?

This class is important for running tests on cloud testing platforms like LambdaTest, where we may need to run the tests on multiple browsers and platforms.

A cloud Selenium Grid like LambdaTest offers benefits such as scalability, reliability, and security, which can be challenging to achieve with a local Selenium Grid. Performing Selenium automation testing on the cloud allows for wider browser coverage, broader test coverage, and parallel test execution at a scale that is hard to match on a local Selenium Grid.

Since we are running on remote machines, we need to provide the remote URL, so the tests get executed correctly on the desired platforms and browsers.

RemoteWebDriver class has the following constructors, which can be used to create an instance of the class. The signatures below use the type names of Selenium's .NET binding; in the Java binding, the corresponding parameter types are Capabilities, URL, and CommandExecutor:

RemoteWebDriver(ICapabilities)

RemoteWebDriver(Uri, ICapabilities)

RemoteWebDriver(ICommandExecutor, ICapabilities)

RemoteWebDriver(Uri, ICapabilities, TimeSpan)

We will be using the RemoteWebDriver(Uri, ICapabilities) form, which is RemoteWebDriver(URL, Capabilities) in Java, to run the tests on the LambdaTest platform. The capabilities shown in this example may differ from platform to platform as per their configuration settings. However, the usage of the RemoteWebDriver class remains the same.

Here is the screenshot of a method showing how we can use the RemoteWebDriver class to run the tests on the LambdaTest platform.

image6
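Since the remote URL is the piece that differs most between platforms, here is a small plain-Java sketch of how such a URL could be assembled before it is handed to RemoteWebDriver. The GridUrlSketch class, its gridUrl method, the credentials, and the hub.example.com host are all hypothetical, and the "/wd/hub" path is an assumption about the grid endpoint; check your platform's documentation for the real values.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper, not part of Selenium: builds the remote grid address.
class GridUrlSketch {

    // user, accessKey and hubHost are placeholders supplied by the caller;
    // the "/wd/hub" path is an assumption about the grid's endpoint.
    static URL gridUrl(String user, String accessKey, String hubHost) {
        String address = "https://" + user + ":" + accessKey + "@" + hubHost + "/wd/hub";
        try {
            return URI.create(address).toURL();
        } catch (MalformedURLException e) {
            // Do not echo the address here, since it contains the access key.
            throw new IllegalArgumentException("Invalid grid address", e);
        }
    }

    public static void main(String[] args) {
        URL url = gridUrl("demo_user", "demo_key", "hub.example.com");
        System.out.println(url.getHost());
        System.out.println(url.getPath());
    }
}
```

With Selenium on the classpath, the resulting URL would be passed together with the desired capabilities to the two-argument constructor, for example new RemoteWebDriver(url, capabilities).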

RemoteWebDriver class implements the following interfaces:

  • WebDriver

  • JavascriptExecutor

  • TakesScreenshot

  • HasVirtualAuthenticator

  • PrintsPage

  • HasCapabilities

  • Interactive

Let’s talk about each of the interfaces implemented by RemoteWebDriver class in detail, starting with the WebDriver Interface first.

WebDriver Interface

WebDriver Interface is the core of the Selenium WebDriver as it has all the required methods and respective nested interfaces defined within it, which helps in simulating user actions inside the browser.

Following is the UML diagram of WebDriver Interface (Selenium WebDriver 4):

image7

WebDriver Interface has the following abstract methods, which have no body and are fully implemented by the RemoteWebDriver class:

get(String url) This method will return void and help us to navigate to the URL we provide in the method parameter.
getCurrentUrl() This method will return the current URL of the web page.
getTitle() This method will return the title of the current web page.
findElements(By by) This method will return a list of webelements per the locator strategy called using Selenium’s By class which is an abstract class.
findElement(By by) This method will return a webelement as per the locator strategy called using Selenium’s By class.
getPageSource() This method will return the source of the last loaded page in the representation of DOM.
close() This method will close the current window and quit the browser if it is the last window currently open.
quit() This method quits the driver session, closing every associated window.
getWindowHandles() This method will return a set of window handles that can be used to iterate over all open windows of this WebDriver instance.
getWindowHandle() This method returns the current window handle, which is in focus within the current WebDriver instance. This can be used to switch to this window at a later stage.
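The two window-handle methods are usually used together: one gives the handle in focus, the other gives every open handle, and the test picks the one it wants to switch to. The selection logic itself is plain Java, so it can be sketched without a browser. WindowHandleSketch and firstOtherHandle are hypothetical names, and the handle strings are made-up sample values.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical helper: picks a window handle other than the one currently in focus.
class WindowHandleSketch {

    // Returns the first handle that differs from currentHandle, if there is one.
    static Optional<String> firstOtherHandle(String currentHandle, Set<String> allHandles) {
        return allHandles.stream()
                .filter(handle -> !handle.equals(currentHandle))
                .findFirst();
    }

    public static void main(String[] args) {
        // Sample values standing in for what a driver would report.
        Set<String> handles = new LinkedHashSet<>(Arrays.asList("window-1", "window-2"));
        System.out.println(firstOtherHandle("window-1", handles).orElse("none"));
    }
}
```

In a real test, the arguments would come from driver.getWindowHandle() and driver.getWindowHandles(), and the result would be passed to driver.switchTo().window(handle).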

How to use the WebDriver Interface?

The WebDriver interface defines methods for interacting with a web page through a web browser. To use the WebDriver interface, you must first import the appropriate libraries and instantiate a WebDriver object.

image8

Next, let’s move toward the nested interfaces within the WebDriver Interface and discuss them in detail.

Nested Interfaces within WebDriver Interface

The following are the nested interfaces within the WebDriver Interface. Let’s discuss each nested interface in detail.

image45
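Nested interfaces are the reason WebDriver calls read as chains such as driver.manage().window().maximize(). The toy model below imitates that structure in plain Java. DriverModel and RemoteDriverModel are invented for this sketch and are not Selenium's source code; only the shape (a parent interface with nested interfaces, implemented by nested classes) follows what is described in this blog.

```java
import java.time.Duration;

// Toy model, not Selenium's source: nested interfaces grouped under one parent interface.
interface DriverModel {
    Options manage();

    interface Options {
        Window window();
        Timeouts timeouts();
    }

    interface Window {
        void maximize();
    }

    interface Timeouts {
        Timeouts implicitlyWait(Duration duration);
        Duration getImplicitWaitTimeout();
    }
}

class RemoteDriverModel implements DriverModel {
    boolean maximized = false;
    Duration implicitWait = Duration.ZERO;

    public DriverModel.Options manage() {
        return new RemoteOptions();
    }

    // Inner classes share the state of the enclosing driver instance.
    class RemoteOptions implements DriverModel.Options {
        public DriverModel.Window window() { return new RemoteWindow(); }
        public DriverModel.Timeouts timeouts() { return new RemoteTimeouts(); }

        class RemoteWindow implements DriverModel.Window {
            public void maximize() { maximized = true; }
        }

        class RemoteTimeouts implements DriverModel.Timeouts {
            public DriverModel.Timeouts implicitlyWait(Duration duration) {
                implicitWait = duration;
                return this; // returning this allows further chained calls
            }
            public Duration getImplicitWaitTimeout() { return implicitWait; }
        }
    }
}

class NestedInterfaceDemo {
    public static void main(String[] args) {
        DriverModel driver = new RemoteDriverModel();
        driver.manage().window().maximize();
        driver.manage().timeouts().implicitlyWait(Duration.ofSeconds(5));
        System.out.println(driver.manage().timeouts().getImplicitWaitTimeout());
    }
}
```

Each step in the chain returns an object typed as one of the nested interfaces, which is why the IDE only offers window-related methods after window() and timeout-related methods after timeouts().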

Window Interface

This interface has all the methods that help manage the current window. Currently, at the time of writing this blog on Selenium 4 WebDriver Hierarchy, Selenium’s latest version is 4.7.0, and in this current version, this Window interface is in Beta.

image9

This interface has the following abstract methods, which are fully implemented by the RemoteWindow class, a nested class within RemoteWebDriverOptions:

getSize() This method will return the size of the current window as a Dimension.
setSize(Dimension targetSize) This method will set the size of the current window.
getPosition() This method will return the position of the current window as a Point, relative to the upper-left corner of the screen.
setPosition(Point targetPosition) This method will set the position of the current window.
maximize() This method will maximize the current window if it is not already maximized.
minimize() This method will minimize the current window if it is not already minimized.
fullscreen() This method will switch the current window to fullscreen mode if it is not already fullscreen.

How to use the Window Interface with WebDriver?

We first create a WebDriver object by instantiating one of its implementing classes (for example, ChromeDriver). Once the object is created, we can reach the Window interface through driver.manage().window(), as shown in the screenshot below:

image10

Options Interface

This interface has all the methods to manage browser-level settings such as cookies, timeouts, windows, and logs. With the help of this interface, we perform the following actions:

  • Add, get, and delete a cookie

  • Set timeouts in the browser

  • Manage window

  • Fetch different types of logs. (This is in beta as per the latest Selenium WebDriver version 4.7.0)

image11

This interface has the following abstract methods, fully implemented by the RemoteWebDriverOptions class, which is a nested class in the RemoteWebDriver class:

addCookie(Cookie cookie) This method will add a cookie for the domain of the current page.
deleteCookieNamed(String name) This method will delete the cookie with the given name.
deleteCookie(Cookie cookie) This method will delete the given cookie from the browser.
deleteAllCookies() This method will delete all the cookies for the current domain.
getCookies() This method will return a set of all the cookies for the current domain.
getCookieNamed(String name) This method will return the cookie with the given name, or null if no such cookie exists.
timeouts() This method will return the Timeouts interface used to manage the driver timeouts.
window() This method will return the Window interface used to manage the current window.
logs() This method will return the Logs interface used to fetch different types of logs.

How to use Options Interface with WebDriver?

We first create a WebDriver object by instantiating one of its implementing classes (for example, ChromeDriver). Once the object is created, we can reach the Options interface through driver.manage(), as shown in the screenshot below.

image12

Navigation Interface

This interface has all the methods to access the browser’s history and navigate to a URL. With the help of this interface, we perform the following actions:

  • Navigate Back, Forward in the browser

  • Navigate to a URL in the browser

  • Refresh the WebPage

image13

This interface has the following abstract methods, which are fully implemented by the RemoteWebDriver Class:

back() This method will move back a single item in the browser’s history.
forward() This method will move forward a single item in the browser’s history.
to(String url) This method will load a new web page in the current browser window using the URL provided as a String.
to(URL url) This method is an overloaded version of to(String url) that accepts a java.net.URL object.
refresh() This method will refresh the current page.

How to use Navigation Interface with WebDriver?

We first create a WebDriver object by instantiating one of its implementing classes (for example, ChromeDriver). Once the object is created, we can reach the Navigation interface through driver.navigate(), as shown in the screenshot below:

image14

TargetLocator Interface

This interface has all the methods to send future commands to different frames and windows. With the help of this interface, we perform the following actions:

  • Working with different Frames.

  • Working with different windows or Tabs in the browser.

  • Working with different Alerts in the browser.

image15

This interface has the following abstract methods, fully implemented by the RemoteTargetLocator class, which is a nested class in the RemoteWebDriver class:

frame(int index) This method will select a frame by its zero-based index.
frame(String nameOrId) This method will select a frame by its name or ID.
frame(WebElement frameElement) This method will select a frame using its previously located WebElement.
parentFrame() This method will change the focus to the parent frame of the current frame.
window(String nameOrHandle) This method will switch the focus to the window with the given name or window handle.
newWindow(WindowType typeHint) This method will create a new browser window or tab and switch the focus to it.
defaultContent() This method will select either the first frame on the page or the main document when the page contains iframes.
activeElement() This method will return the element that currently has focus, or the body element if no element has focus.
alert() This method will switch to the currently active alert and return a handle to it.

How to use TargetLocator Interface with WebDriver?

We first create a WebDriver object by instantiating one of its implementing classes (for example, ChromeDriver). Once the object is created, we can reach the TargetLocator interface through driver.switchTo(), as shown in the screenshot below.

image17

Timeouts Interface

This interface has all the methods to manage the timeout behavior for WebDriver instances. With the help of this interface, we perform the following wait actions in Selenium:

  • Implicit Wait

  • Script timeout

  • Page load timeout

image18

This interface has the following abstract methods, which are fully implemented by the RemoteTimeouts class, which is a nested class in the RemoteWebDriverOptions class:

implicitlyWait(Duration duration) This method will set how long the driver should wait when searching for an element that is not immediately present.
getImplicitWaitTimeout() This method will return the implicit wait timeout currently set.
scriptTimeout(Duration duration) This method will set how long the driver should wait for an asynchronous script to finish executing.
getScriptTimeout() This method will return the script timeout currently set.
pageLoadTimeout(Duration duration) This method will set how long the driver should wait for a page load to complete.
getPageLoadTimeout() This method will return the page load timeout currently set.

How to use Timeouts Interface with WebDriver?

We first create a WebDriver object by instantiating one of its implementing classes (for example, ChromeDriver). Once the object is created, we can reach the Timeouts interface through driver.manage().timeouts(), as shown in the screenshot below:

image19

The following methods are deprecated as per the current Selenium Version 4.7.0:

  • implicitlyWait(long time, TimeUnit unit)

  • setScriptTimeout(long time, TimeUnit unit)

  • pageLoadTimeout(long time, TimeUnit unit)

  • setScriptTimeout(Duration duration)
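When migrating away from the deprecated (long, TimeUnit) overloads, the old argument pair has to be turned into a java.time.Duration. Here is a minimal plain-Java sketch of that conversion; TimeoutMigrationSketch and toDuration are hypothetical names and are not part of Selenium.

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for moving from (long, TimeUnit) arguments to Duration.
class TimeoutMigrationSketch {

    // Converts a legacy time/unit pair into the equivalent Duration.
    static Duration toDuration(long time, TimeUnit unit) {
        return Duration.ofNanos(unit.toNanos(time));
    }

    public static void main(String[] args) {
        System.out.println(toDuration(10, TimeUnit.SECONDS));
        System.out.println(toDuration(500, TimeUnit.MILLISECONDS).toMillis());
    }
}
```

The resulting Duration can then be passed to the non-deprecated methods, for example driver.manage().timeouts().implicitlyWait(Duration.ofSeconds(10)). In new code, it is simpler to create the Duration directly rather than convert.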

Let’s now move towards the next interface that is implemented by RemoteWebDriver class.

JavascriptExecutor Interface

JavascriptExecutor Interface provides the mechanism for WebDriver to execute JavaScript code snippets. This interface has the following two abstract methods, which are fully implemented in the RemoteWebDriver class.

executeScript(String script, Object... args) This method will execute the given JavaScript synchronously in the context of the currently selected frame or window and return the value returned by the script.
executeAsyncScript(String script, Object... args) This method will execute the given JavaScript asynchronously in the context of the currently selected frame or window. The script signals that it has finished by invoking the callback provided as its last argument.

TakesScreenshot Interface

This interface helps the WebDriver take screenshots of the web page or WebElement as required and store them in different ways. The screenshot captured is returned to the WebDriver endpoint in Base64 format.

This interface has the following abstract method, which is implemented in the RemoteWebDriver class:

getScreenshotAs(OutputType<X> target) This method will capture the screenshot and return it in the form requested through the OutputType parameter: a Base64 string, raw bytes, or a file.
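Because the screenshot travels as a Base64 string, it has to be decoded into bytes before it can be saved as an image. The sketch below shows that decoding step with the JDK's java.util.Base64 and checks that the result starts with the PNG file signature (WebDriver screenshots are PNG images). Base64ScreenshotSketch and its methods are hypothetical, and the payload used in main is a stand-in containing only the eight signature bytes, not a real screenshot.

```java
import java.util.Base64;

// Hypothetical helper: decodes a Base64 screenshot payload and checks its format.
class Base64ScreenshotSketch {

    // The first eight bytes of every PNG file.
    private static final byte[] PNG_SIGNATURE =
            {(byte) 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

    // Turns the Base64 text back into the raw image bytes.
    static byte[] decode(String base64Screenshot) {
        return Base64.getDecoder().decode(base64Screenshot);
    }

    // Returns true when the data begins with the PNG signature.
    static boolean startsWithPngSignature(byte[] data) {
        if (data.length < PNG_SIGNATURE.length) {
            return false;
        }
        for (int i = 0; i < PNG_SIGNATURE.length; i++) {
            if (data[i] != PNG_SIGNATURE[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in payload: just the PNG signature, Base64-encoded.
        String payload = Base64.getEncoder().encodeToString(PNG_SIGNATURE);
        System.out.println(startsWithPngSignature(decode(payload)));
    }
}
```

In a real test, the Base64 string would come from ((TakesScreenshot) driver).getScreenshotAs(OutputType.BASE64). Requesting OutputType.BYTES or OutputType.FILE instead lets Selenium do the decoding for you.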

HasVirtualAuthenticator Interface

This interface helps in allowing the WebDriver to access the virtual authenticator API.

A user’s public-key credentials can be stored in an authenticator, which is either a hardware device or a software entity. Authenticators enable key-based, passwordless authentication. Virtual Authenticator emulates such authenticators for testing.

This interface has the following abstract methods, which are fully implemented by the RemoteWebDriver class:

addVirtualAuthenticator(VirtualAuthenticatorOptions options) This method will add a virtual authenticator configured with the given options and return it.
removeVirtualAuthenticator(VirtualAuthenticator authenticator) This method will remove a previously added virtual authenticator.

PrintsPage Interface

This interface allows the printing of the current page within the browser. It has the following abstract method, which is fully implemented in RemoteWebDriver class:

print(PrintOptions printOptions) This method will print the current page using the given print options and return the result as a Pdf object.

HasCapabilities Interface

This interface is implemented by classes to indicate that they can describe the capabilities they possess, which makes run-time detection of features possible.

This interface is fully implemented by the RemoteWebDriver class and has the following abstract method:

getCapabilities() This method will return the capabilities of the current driver, such as the browser name and version.
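Run-time detection here simply means checking whether an object implements the interface before asking it for its capabilities. The toy example below shows the pattern in plain Java. DriverLike, DescribesCapabilities, PlainDriver, CapableDriver, and CapabilityProbe are stand-ins invented for this sketch, not Selenium's types.

```java
import java.util.Collections;
import java.util.Map;

// Toy stand-ins for illustration; these are NOT Selenium's real types.
interface DriverLike {
    String name();
}

interface DescribesCapabilities {
    Map<String, Object> getCapabilities();
}

// A driver that cannot describe its capabilities.
class PlainDriver implements DriverLike {
    public String name() { return "plain"; }
}

// A driver that can describe its capabilities.
class CapableDriver implements DriverLike, DescribesCapabilities {
    public String name() { return "capable"; }
    public Map<String, Object> getCapabilities() {
        return Collections.singletonMap("browserName", "chrome");
    }
}

class CapabilityProbe {
    // Detects at run time whether the driver exposes capabilities.
    static String browserNameOf(DriverLike driver) {
        if (driver instanceof DescribesCapabilities) {
            Object name = ((DescribesCapabilities) driver).getCapabilities().get("browserName");
            return String.valueOf(name);
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(browserNameOf(new CapableDriver()));
        System.out.println(browserNameOf(new PlainDriver()));
    }
}
```

With Selenium, the equivalent check would be driver instanceof HasCapabilities, followed by ((HasCapabilities) driver).getCapabilities().getBrowserName().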

With this, we come to the end of the discussion on the WebDriver interface. Let’s now jump on to another important topic around Selenium WebDriver, which is about WebElements and understanding its hierarchy.

WebElement Hierarchy

Knowing about the WebElement hierarchy is also required. We cannot run web automation tests unless we locate the WebElements on the page to perform the required user simulation steps.

WebElement is an interface that extends the SearchContext and TakesScreenshot interfaces.

RemoteWebElement class is the fully implemented class of the WebElement interface.

The following image shows the WebElement in pictorial representation:

image25

Abstract methods declared within the WebElement Interface

As you can check in the UML diagram below, abstract methods are declared, which are implemented by the RemoteWebElement and EventFiringWebElement classes:

image27

click() This method will click the element.
submit() This method will submit the form if the element is a form or an element within a form.
sendKeys(CharSequence... keysToSend) This method will simulate typing into the element.
clear() This method will clear the value of the element if it is a text entry element.
getTagName() This method will return the tag name of the element.
getAttribute(String name) This method will return the value of the given attribute of the element.
isSelected() This method will return true if the element is currently selected or checked.
isEnabled() This method will return true if the element is currently enabled.
getText() This method will return the visible text of the element, including its sub-elements.
findElements(By by) This method will return a list of all WebElements within the current element that match the given locator.
findElement(By by) This method will return the first WebElement within the current element that matches the given locator.
isDisplayed() This method will return true if the element is displayed on the page.
getLocation() This method will return the location of the top-left corner of the element on the page as a Point.
getSize() This method will return the width and height of the rendered element as a Dimension.
getRect() This method will return the location and size of the rendered element as a Rectangle.
getCssValue(String propertyName) This method will return the value of the given CSS property of the element.

With this, we come to the end of this blog on Selenium 4 WebDriver Hierarchy explaining the Selenium WebDriver’s architecture. Let’s summarize the points that we discussed.

Summary

With this blog on Selenium 4 WebDriver Hierarchy, I hope you now have a better understanding of the Selenium WebDriver architecture. I hope you will be able to apply this knowledge in your projects and make efficient use of the Selenium WebDriver framework. Some key takeaways:

  • RemoteWebDriver class is the fully implemented class of the WebDriver interface.

  • RemoteWebDriver class implements the following interfaces:

      • WebDriver

      • JavascriptExecutor

      • HasCapabilities

      • HasVirtualAuthenticator

      • Interactive

      • PrintsPage

      • TakesScreenshot

  • Each BrowserDriver (ChromeDriver, FirefoxDriver, etc.) class extends the RemoteWebDriver class.

  • RemoteWebDriver class has the following nested classes:

      • RemoteTargetLocator

      • RemoteWebDriverOptions

  • RemoteTargetLocator, a nested class within the RemoteWebDriver class, implements the WebDriver.TargetLocator interface.

  • RemoteWebDriverOptions, a nested class within the RemoteWebDriver class, implements the WebDriver.Options interface and has the following nested classes within it:

      • RemoteTimeouts: This class implements the WebDriver.Timeouts interface.

      • RemoteWindow: This class implements the WebDriver.Window interface.

  • WebDriver interface extends SearchContext interface.

  • WebDriver interface has the following nested interfaces:

      • Options

      • Window

      • Timeouts

      • Navigation

      • TargetLocator

  • WebElement interface extends the SearchContext and TakesScreenshot interfaces.

  • RemoteWebElement and EventFiringWebElement classes are fully implemented classes of the WebElement interface.

Happy Testing!
