Originally posted on Medium.
As a Selenium user, you probably know the story: all of a sudden, your tests fail due to the infamous NoSuchElementException. But this isn't something that only happens with Selenium. Most other test automation frameworks have their own (fancy) names for it, but the underlying problem is always the same: a particular UI element cannot be found.
Back in 2016, I did a small review of several Swing-compatible capture & replay tools as part of a project at my alma mater. During this work, I stumbled across “An Extensible Heuristic-Based Framework for GUI Test Case Maintenance” by Scott McMaster and Atif M. Memon. In this paper, the authors introduce a formal model for the situation described above: the GUI element identification problem. The problem typically occurs in regression testing, that is, between version n and version n + i of the system under test (SUT). (Note that these versions are not necessarily stable releases; every change that is built can cause the phenomenon.)
McMaster and Memon seem to focus on desktop applications, which is reflected in their terminology. For instance, a separate window within the SUT is denoted as a set W containing various actionable GUI elements e. If you come from a web application background, you would probably speak of pages and UI elements instead. Sometimes you also want to assert or verify non-actionable elements, such as labels, to ensure they contain a certain string. Nonetheless, the concept is essentially the same. W’ is a subsequent version of W in which the number of elements e’ might differ (i.e. possibly |W| ≠ |W’|; if you are unfamiliar with this notation, have a look at these quick references on logic and set theory). Now, the GUI element identification problem is defined as:
For each actionable GUI element e in W, find a corresponding (possibly modified) element e’ in W’ whose actions implement the same functionality.
In order to do so, each element e, e’ ∈ W ∪ W’ must be assigned to exactly one of the following three sets:
- Deleted (D): elements from W with no corresponding element in W’; they have been deleted in version n + i.
- Created (C): elements from W’ with no corresponding element in W; they have been created in version n + i.
- Maintained (M): elements from W that are still present in W’, possibly in modified form.
If a test case uses elements from the Deleted set (D), then the underlying script must be adapted, since elements it depends on are missing in the new version. Elements from the Created set (C) can affect test cases as well; for example, such an element might change the layout and thereby alter the XPath of some other element. It is usually also up to the developer or tester to decide whether these new elements need to be tested. Consequently, the accurate computation of the Maintained set (M) is the main interest of the GUI element identification problem. According to the authors, this “generally requires heuristic approaches in the absence of a model where each element has a programmer-defined unique identifier.”
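To make the three sets more tangible, here is a minimal Python sketch of such a heuristic. It is not the algorithm from the paper: the attribute-dictionary model of an element, the similarity measure, the greedy matching, and the threshold of 0.5 are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical element model: attribute name -> attribute value.
Element = Dict[str, str]


def similarity(old: Element, new: Element) -> float:
    """Share of attributes (over the union of keys) that have equal values."""
    keys = set(old) | set(new)
    if not keys:
        return 0.0
    same = sum(1 for key in keys if old.get(key) == new.get(key))
    return same / len(keys)


def identify(
    w_old: List[Element], w_new: List[Element], threshold: float = 0.5
) -> Tuple[List[Element], List[Element], List[Tuple[Element, Element]]]:
    """Greedily split the elements into deleted, created and maintained."""
    # Score every (old, new) pair and visit the best matches first.
    pairs = sorted(
        (
            (similarity(old, new), i, j)
            for i, old in enumerate(w_old)
            for j, new in enumerate(w_new)
        ),
        key=lambda pair: -pair[0],
    )
    used_old, used_new = set(), set()
    maintained = []
    for score, i, j in pairs:
        if score < threshold:
            break  # all remaining pairs are too dissimilar (assumed cut-off)
        if i in used_old or j in used_new:
            continue  # each element may be matched at most once
        used_old.add(i)
        used_new.add(j)
        maintained.append((w_old[i], w_new[j]))
    deleted = [old for i, old in enumerate(w_old) if i not in used_old]
    created = [new for j, new in enumerate(w_new) if j not in used_new]
    return deleted, created, maintained


# Version n: a login button and a help link.
w_old = [
    {"id": "login", "type": "button", "text": "Log in"},
    {"id": "help", "type": "link", "text": "Help"},
]
# Version n + i: the button got a new ID, the help link was replaced.
w_new = [
    {"id": "sign-in", "type": "button", "text": "Log in"},
    {"id": "register", "type": "link", "text": "Register"},
]
deleted, created, maintained = identify(w_old, w_new)
```

In this example the renamed button still shares two of its three attributes with its successor, so it ends up in M, whereas the help link lands in D and the register link in C. A locator that relies on the ID alone would have failed for the button.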
Selenium offers various locator types (e.g. ID, name, or XPath), each with its own pros and cons. Additionally, more advanced test automation frameworks sometimes apply elaborate algorithms so that they do not rely on a single locator. For instance, ReTest, the company I work for, embraces a novel paradigm called difference testing, where the complete UI state is captured. As a result, all available attributes can be used at once. Another interesting approach is taken by Testim. They incorporate historical data with the aid of machine learning to rank the locators for each element individually, which stabilizes the tests over time.
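The general idea of not relying on a single locator can be sketched in a few lines of Python. This is neither ReTest's nor Testim's implementation: the class LocatorRanker, the plain success-rate ranking (no machine learning involved), the neutral prior of 0.5 for untried locators, and the lookup callable are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (strategy, value), e.g. ("id", "login")
Locator = Tuple[str, str]


class LocatorRanker:
    """Tries several locators for one element, most reliable first."""

    def __init__(self, locators: List[Locator]) -> None:
        self.locators = list(locators)
        # Per-locator history stored as [successes, attempts].
        self.history: Dict[Locator, List[int]] = {
            locator: [0, 0] for locator in self.locators
        }

    def _success_rate(self, locator: Locator) -> float:
        successes, attempts = self.history[locator]
        # Assumption: untried locators start with a neutral prior of 0.5.
        return successes / attempts if attempts else 0.5

    def ranked(self) -> List[Locator]:
        """Locators ordered by past success rate; ties keep the given order."""
        return sorted(self.locators, key=self._success_rate, reverse=True)

    def find(
        self, lookup: Callable[[Locator], Optional[object]]
    ) -> Optional[object]:
        """Return the first element found, recording every attempt."""
        for locator in self.ranked():
            element = lookup(locator)
            self.history[locator][1] += 1
            if element is not None:
                self.history[locator][0] += 1
                return element
        return None


# Two fake "pages" that map locators to elements; in version 2 the ID is gone.
page_v1 = {("id", "login"): "button", ("xpath", "//form/button[1]"): "button"}
page_v2 = {("xpath", "//form/button[1]"): "button"}

ranker = LocatorRanker([("id", "login"), ("xpath", "//form/button[1]")])
ranker.find(page_v1.get)  # the ID locator is tried first and succeeds
ranker.find(page_v2.get)  # the ID locator fails, the XPath fallback succeeds
```

After the second run, the XPath locator has the better track record and is tried first from then on. With Selenium's Python bindings, lookup could wrap driver.find_elements(by, value) and return the first hit or None, since that call returns an empty list instead of raising an exception when nothing matches.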
Although these (and other) techniques lead to quite robust element recognition, they are no silver bullet, and I doubt the current state of the art offers one. But they shift the burden of the GUI element identification problem from humans to machines, reducing the number of NoSuchElementExceptions (or whatever they are called by your framework of choice) that developers and testers have to deal with.