This was originally drafted as an answer to the question ...
That grew into an entire post on its own 😅 (I talk too much)
As the co-founder of Uilicious.com, a full end-to-end testing platform built on webdriver servers, I've written this post as a series of key facts I wish I knew at the start of my "webdriver" learning journey.
One-liner explaining Selenium / Webdriver
An automation standard to control all your web browsers via an API.
Webdriver Specification vs Selenium Implementation
- Think along the lines of HTML5 vs Chrome
- Webdriver refers to the specification
- Selenium refers to an open-source implementation of the specification.
Selenium is currently the most prominent implementation of webdriver, and it includes several additional commands that are not part of the webdriver specification, which blurs the distinction between the two.
This distinction is particularly important when debugging discrepancies between the two sets of documentation, for every browser.
However, as the terms are used extremely interchangeably, it can get incredibly confusing in guides and tutorials. This is partly due to their intertwined history.
Understanding this difference will probably save you hours of documentation pain down the line.
Webdriver Specification is a mini-miracle on its own
For those who have done work across multiple browsers, especially in the earlier years (10+ years ago), you will know how impossible it was to get a group of browser vendors, who are competitors, to agree on something together.
For webdriver, the problem is made worse by it being a relatively "low priority" item with resource constraints, since it is a feature that end users never experience directly.
Each browser has only one to a handful of developers working on its webdriver implementation.
The fact that we have a somewhat working protocol at all, as a W3C specification, is a small miracle born of many years of hard work from the Selenium development team.
It may at times be incomplete and inconsistent among browsers; however, it is a spec that we can at least somewhat agree on, and be proud of, since it started formalizing in 2011.
So a small round of applause for them 👏
I may be deeply critical, but I do greatly respect what they have done so far.
Webdriver is an HTTP API, with many client-side library implementations
Pick your language/poison of choice:
- HTTP API: W3C Specification
- PHP: php-webdriver
- Perl: Selenium::Remote::Driver
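Since every client library above ultimately speaks this HTTP API, it can help to see what the raw commands look like. The following is a minimal sketch assuming a webdriver server at the hypothetical address `http://localhost:4444`; the paths and JSON bodies follow the W3C New Session, Navigate To and Delete Session commands, while the helper names are made up for illustration. It only builds request descriptors, so it runs without a browser.

```javascript
// Hypothetical base URL of a webdriver server; adjust for your own setup.
const BASE_URL = "http://localhost:4444";

// Build a plain { method, url, body } descriptor for a W3C webdriver command.
function buildRequest(method, path, body) {
  return {
    method,
    url: BASE_URL + path,
    // Commands without a payload (GET/DELETE here) carry no body; others send JSON.
    body: body === undefined ? null : JSON.stringify(body),
  };
}

// New Session: ask the server for a browser matching the given capabilities.
function newSession(browserName) {
  return buildRequest("POST", "/session", {
    capabilities: { alwaysMatch: { browserName } },
  });
}

// Navigate To: load a URL inside an existing session.
function navigateTo(sessionId, url) {
  return buildRequest("POST", `/session/${sessionId}/url`, { url });
}

// Delete Session: close the browser and end the session.
function deleteSession(sessionId) {
  return buildRequest("DELETE", `/session/${sessionId}`);
}
```

To actually drive a browser, send each descriptor with any HTTP client (for example `fetch` in Node 18+) and read the `value` field of the JSON response; for New Session, `value.sessionId` is the id used in the later paths.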
One piece of side advice: as in the early days of CSS/JS across multiple browsers, learn to distrust the specification's compatibility across different browsers/clients. Documentation is inconsistent and all over the place.
While MDN (mozilla.org) will happily tell you what works (or not) on which browsers for CSS/JS, webdriver currently has no equivalent online resource. Internally, this is something we may plan to spin out on our own at Uilicious, considering we are starting to pile up a mountain of random post-it notes on how the protocol differs for each browser.
Webdriver is a "Remote Control Interface"
The phrase "Remote Control Interface" is quoted directly from the specification.
Webdriver is built as an automation protocol, which makes it less than ideal for test automation, as the two have very different objectives. For example, webdriver has very limited assertion commands.
To use webdriver for testing, most users rely on a testing solution that extends the commands for end-to-end testing.
For instance, to check whether some text exists on screen with webdriver, you need to run the following commands:
- find the element where you expect the text to be
- get that element's text (or inner HTML) as a string
- check if text exists in the result text string
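The three steps above map onto the W3C Find Element and Get Element Text commands, with the comparison itself done client-side. A minimal sketch, assuming a hypothetical `send(method, path, body)` transport that performs the HTTP call and resolves to the parsed JSON response, and a hypothetical `textExists` helper name; error handling (such as the "no such element" error when nothing matches) is left out.

```javascript
// W3C key under which webdriver returns element references.
const ELEMENT_KEY = "element-6066-11e4-a52e-4f735466cecf";

// Check whether `expected` appears in the text of the element matched by `cssSelector`.
// `send` is a hypothetical transport: swap in fetch or your client library of choice.
async function textExists(send, sessionId, cssSelector, expected) {
  // Step 1: Find Element, locating where we expect the text to be.
  const found = await send("POST", `/session/${sessionId}/element`, {
    using: "css selector",
    value: cssSelector,
  });
  const elementId = found.value[ELEMENT_KEY];

  // Step 2: Get Element Text, returning the element's rendered text as a string.
  const text = await send("GET", `/session/${sessionId}/element/${elementId}/text`);

  // Step 3: the "assertion" happens on our side; webdriver only returns data.
  return text.value.includes(expected);
}
```

Three round trips for one check is exactly the gap that test-oriented layers paper over.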
In uilicious (and a few other test solutions), we simplify this into a single command:
I.see("Text You Expect to see on screen")
A webdriver server typically refers to a single VM or physical machine running the desired OS, with the "webdriver server" program installed.
Alternatively, this could include mobile devices attached to the webdriver server, typically by a USB cable.
Each server then typically runs up to one session at a time, on a single browser, which is where the automation takes place.
A webdriver grid is a dedicated server program that connects multiple webdriver servers together behind a single API endpoint. It is used to get around the limitations of each individual server and run many sessions at a larger scale.
A grid may also handle user authentication and access limits (to prevent any one user from running too many tests at a time).
Cloud webdriver providers such as Sauce Labs and BrowserStack provide such endpoints, with devices as old as BlackBerry phones alongside the latest nightly builds of Chrome.
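To make the grid's job more concrete, here is a toy sketch of two of its core duties: handing each new session to a free webdriver server, and capping how many sessions a single user can hold. The `ToyGrid` class and its methods are hypothetical; authentication is omitted, and real grids such as Selenium Grid also queue waiting requests and match servers by browser capabilities, which this sketch does not attempt.

```javascript
// A toy grid: each webdriver server runs one session at a time,
// and each user may hold at most `maxPerUser` sessions concurrently.
class ToyGrid {
  constructor(serverUrls, maxPerUser) {
    this.free = [...serverUrls]; // servers with no active session
    this.maxPerUser = maxPerUser;
    this.active = new Map(); // user -> number of active sessions
  }

  // Reserve a server for `user`, or return null if the user is at their
  // limit or every server is busy.
  acquire(user) {
    const used = this.active.get(user) || 0;
    if (used >= this.maxPerUser || this.free.length === 0) return null;
    this.active.set(user, used + 1);
    return this.free.shift();
  }

  // Return a server to the pool when the user's session ends.
  release(user, serverUrl) {
    const used = this.active.get(user) || 0;
    if (used > 0) this.active.set(user, used - 1);
    this.free.push(serverUrl);
  }
}
```

Returning `null` instead of queueing keeps the sketch short; a production grid would typically hold the request until a server frees up.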
One common analogy we use in the office is to think of uilicious commands as instructions given over the phone to an operator on the other end, the webdriver server, who then carries them out one by one (for example, going to a URL, filling in a field, or pressing Enter).
So from a networking standpoint, to test any website, the webdriver server on the other end must be able to load the given test URL.
Just as someone on the other end of the phone cannot directly connect to a website hosted on localhost on your laptop, neither can the webdriver server.
However, as in that situation, you can provide access through tools such as ngrok, or through port forwarding.
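Before sending a test URL to a remote webdriver server, a simple check can flag addresses that only point back at the machine they are typed on. A minimal sketch with a hypothetical `needsTunnel` helper; it only covers loopback hostnames, not private LAN addresses or firewalled hosts, which a remote server may also be unable to reach.

```javascript
// Returns true if the URL's host is a loopback address, meaning a remote
// webdriver server would try to load it from its own machine instead of yours,
// so you would need a tunnel (such as ngrok) or port forwarding.
function needsTunnel(testUrl) {
  // WHATWG URL parsing; IPv6 hostnames keep their brackets, e.g. "[::1]".
  const { hostname } = new URL(testUrl);
  return (
    hostname === "localhost" ||
    hostname.endsWith(".localhost") ||
    hostname === "[::1]" ||
    /^127\.\d+\.\d+\.\d+$/.test(hostname) // the whole 127.0.0.0/8 loopback range
  );
}
```

A check like this fails fast with a clear message instead of a confusing "site can't be reached" screenshot from the other end of the phone line.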
Hopefully that's enough key pointers for anyone new to webdriver who wishes to dive deeper into it 🙂
What I work on at Uilicious is running test scripts like this one:
// Let's go to dev.to
I.goTo("https://dev.to")
// Fill up search
I.fill("Search", "uilicious")
I.pressEnter()
// I should see myself and my co-founder
I.see("Shi Ling")
I.see("Eugene Cheah")
And churning out shareable test results, all through a fully managed cloud platform: https://uilicious.com
(Go give it a try 😉)