Darya Shirokova

Introduction to End-to-End Testing of GUI Applications

This is the first in a series of blogposts on end-to-end testing. It covers the concept of end-to-end testing, the benefits it can provide, and the trade-offs you might need to accept if you use end-to-end tests in your project.

What is End-To-End testing?

End-to-end (E2E) tests aim to test the application from the user's perspective. In the case of GUI applications, whether web, desktop or mobile, the user interacts with UI elements of your product such as buttons, text fields, checkboxes, etc., and this is the behaviour that E2E tests should try to reproduce.

An E2E test covers core user journeys, such as creating an account, adding items to the shopping basket or saving an item to the wish-list.

Unlike unit tests, which verify individual components of the application such as a module or a class, E2E tests verify the application as a whole. Another key difference is that unit tests can replace dependencies of the component under test (especially external dependencies like a database) with test doubles (such as mocks or fakes), while E2E tests use real dependencies, though these can (and often should) be isolated from the production environment.
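To make the contrast concrete, here is a minimal sketch of the unit-test side (all class and function names are hypothetical): the external dependency is replaced with a test double, whereas an E2E test would drive the running application against real, isolated dependencies.

```python
# Unit-test side of the contrast: the external payment gateway is
# replaced with a mock, so no real service is involved.
from unittest.mock import Mock

class Checkout:
    def __init__(self, gateway):
        self.gateway = gateway

    def pay(self, amount):
        # In production this calls an external payment service.
        return self.gateway.charge(amount)

def test_checkout_charges_gateway():
    gateway = Mock()
    gateway.charge.return_value = True
    assert Checkout(gateway).pay(100) is True
    gateway.charge.assert_called_once_with(100)
```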

Considerations on when to add E2E tests

So when should we consider end-to-end tests for GUI applications? E2E tests can automate manual testing and provide confidence that new changes don’t break existing user journeys. They are a handy tool for applications with complex UI components, whether to verify the interaction between frontend and backend, or to verify that all components work well together even as they evolve independently.

There are a few things to consider before adding them:

  • Writing and maintaining robust and meaningful E2E tests can be a difficult task that consumes a lot of time during development.
  • You will need to set up and maintain the infrastructure to run the tests.
  • E2E tests are very often flaky, i.e. they might fail without detecting any actual issue in your product, even when you follow best practices. You will need to invest time in both the maintenance and the health of the tests, or you will end up with a test suite that nobody trusts even when it detects real bugs.

Let’s discuss some tips on how to approach these problems when writing E2E tests.

Test case structure

First of all, define what tests you want to add and how granular they need to be. Most likely, there are already lower-level tests (such as unit tests) which extensively cover the business logic of various subcomponents of your system, so you may want to focus E2E tests on high-level core user journeys. According to the Test Pyramid concept, as you move to higher levels, the number of tests should decrease and the focus should shift towards testing the integration between different components.

This approach allows you to test how all the different components work together and whether users are able to achieve their goals when using your application.

Another rule of thumb is to consider adding a test case once a bug in your application slips past the existing tests, but ask yourself whether this should be a unit test or an E2E test.

One good practice is to include the following stages in your test case (a skeleton combining them is sketched after the list):

  1. Reset the state of the application (or clean up after execution). Since E2E tests are often large and difficult to set up, individual test cases often share state (database, file system, etc.). To make sure that one test case doesn’t influence another, the shared state should be reset between test cases (more on this later).
  2. Arrange the test. This includes setting the stage for the test: you prepare the state of the application so that the preconditions for the test are satisfied. For example, if you want to test order cancellation, you’ll first create an order in this stage. For GUI E2E tests, you usually have options on how to execute this stage. You can either prepare the state from the user's perspective by interacting with UI elements (in which case this stage effectively merges with the Act stage), or you can make a backend call / upload a configuration file to set the expected state. While the first option better represents the user journey, it makes the test slower (thus increasing the probability of flakes) and consumes more resources, so this is a trade-off you need to consider. Often the arrange part is already covered by other test cases, in which case executing this stage via a config file or a backend call might be the better approach.
  3. Act. This is the stage where you simulate the user’s behaviour by interacting with UI elements to complete the journey.
  4. Assert. This stage can differ between E2E tests depending on your use case. You need to verify that the state of the system changed as expected, e.g. by making a backend call. Another technique that might be useful is diffing (screenshot or text), which compares the current screen or the produced output against an expected one.
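Putting these stages together, a test case skeleton might look like the sketch below. This is a minimal illustration, not tied to any particular framework: the `app` UI driver, the `backend` test client, and all their methods are hypothetical stand-ins for whatever your project uses.

```python
# Sketch of an E2E test following the four stages above. The `app`
# driver and `backend` client are hypothetical placeholders.
def test_cancel_order(app, backend):
    # 1. Reset: bring the shared state back to a known baseline.
    backend.reset_test_data()

    # 2. Arrange: create the order via a backend call rather than
    #    through the UI - faster and less flaky, at the cost of not
    #    exercising that part of the user journey.
    order_id = backend.create_order(item="book", quantity=1)

    # 3. Act: simulate the user cancelling the order through the UI.
    app.open_order(order_id)
    app.click("Cancel order")
    app.click("Confirm")

    # 4. Assert: verify the system state changed as expected.
    assert backend.get_order(order_id).status == "CANCELLED"
```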

Flakes

Flakes in UI tests are almost inevitable. A flake is when a test fails from time to time for reasons unrelated to actual issues in the application under test. UI tests are especially susceptible to this kind of issue as they test the system as a whole and depend on responses from different components. Additionally, due to their nature, they run in a multithreaded environment managed by an operating system, which may cause various issues. Some examples:

  • The test tries to execute the next action after interacting with a UI element while the previous action is still in progress (e.g. you expect a new window to appear, but it hasn’t yet).
  • Some components introduce non-breaking changes which still break UI tests. For example, when I worked on tests for one of the plugins my team developed, we encountered an issue where the test would fail because the application suggested installing a new version on startup.
  • Inconsistencies in the rendering of text or elements can cause flakiness, especially in screenshot diff tests (e.g. a slightly different font or font aliasing, an element moved by a pixel, etc.).
  • Many more possibilities :)

Let’s discuss some techniques to reduce flakiness:

  • As mentioned above, always make sure test cases don’t depend on one another. Each test should either start execution from scratch or at least reset the shared state at the beginning. In some cases it is preferable to do this at the beginning of the test rather than cleaning up at the end: some end-to-end tests are executed in a non-isolated environment (e.g. the OS), where the state might have been changed from outside the tests, or a previous execution could have aborted before completing its cleanup. This reset can be done, for example, in the special methods provided by test frameworks, such as methods annotated with @Before and @After in JUnit.
  • A common approach when interacting with UI elements is to give the application some time to act and propagate the change - a wait method. If feasible, it is better to use an upgraded version of this method - wait_until. Instead of just waiting for a fixed time, define what state of the application you expect after the interaction. With that approach you can specify a much larger timeout without unnecessarily slowing down the test, as it will proceed as soon as the desired state is reached - or fail after the time limit with a high level of confidence that an actual bug was detected. An example of using wait_until is to verify that a certain form appeared in the UI after you pressed a button (a sketch of such a helper follows this list).
  • For UI interactions, you can combine retries with wait_until to further minimise the chance of a flake.
  • Consider including intermediate validations in your tests. If the test requires a lot of steps to reproduce a user’s journey, consider periodically asserting the intermediate results of the operation. This doesn’t necessarily reduce flakes, but it is handy for quickly identifying the root cause of a failure when it happens.
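As an illustration of wait_until and retries, here is a minimal, framework-agnostic sketch; the `app.click` and `app.find_window` calls in the usage comment are hypothetical UI-driver calls.

```python
import time

def wait_until(condition, timeout=30.0, interval=0.5):
    """Poll `condition` until it returns a truthy value, or raise
    TimeoutError once `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(interval)
    raise TimeoutError(f"condition not met within {timeout:.1f}s")

def retrying(action, condition, attempts=3, timeout=10.0):
    """Re-issue `action` (e.g. a click) if `condition` is not met in
    time - useful when an interaction is occasionally swallowed."""
    for attempt in range(attempts):
        action()
        try:
            return wait_until(condition, timeout=timeout)
        except TimeoutError:
            if attempt == attempts - 1:
                raise

# Usage (hypothetical driver calls): proceed as soon as the form
# appears instead of sleeping for a fixed duration.
# retrying(lambda: app.click("Details"),
#          lambda: app.find_window(title="Order details"))
```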

Infrastructure

Let’s discuss when and how often to run E2E tests.

In a perfect world, your tests would run before a pull request or a code change is merged, just like other types of tests such as unit tests.

However, this is not always feasible. First of all, running E2E tests might be costly. It can also be time-consuming, and the last thing you want is to slow down the iteration cycle for developers (combine slowness with flakiness and E2E tests can become a real annoyance for engineers).

In that case, you will need to set up a runner which executes the tests at a frequency feasible for your application, e.g. once a day. For this approach you’ll also need a monitoring and notification system to ensure failures are reported to the team’s email alias and/or bug tracker (see the sketch below).
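As one possible shape for such a runner, here is a minimal sketch: a script, triggered by a daily cron job or CI schedule, that runs the suite and emails the team alias on failure. The suite path, SMTP host, and addresses are all placeholders.

```python
import smtplib
import subprocess
from email.message import EmailMessage

def run_suite():
    # Run the E2E suite; capture output for the failure report.
    # "tests/e2e" is a placeholder path.
    return subprocess.run(["pytest", "tests/e2e"],
                          capture_output=True, text=True)

def notify_failure(report):
    # Placeholder addresses and SMTP relay.
    msg = EmailMessage()
    msg["Subject"] = "Nightly E2E suite failed"
    msg["From"] = "e2e-runner@example.com"
    msg["To"] = "team-alias@example.com"
    msg.set_content(report)
    with smtplib.SMTP("smtp.example.com") as smtp:
        smtp.send_message(msg)

if __name__ == "__main__":
    result = run_suite()
    if result.returncode != 0:
        # Send only the tail of the output so the report stays readable.
        notify_failure(result.stdout[-5000:])
```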

Maintenance

Perhaps the most costly part of E2E tests is their maintenance. If this part is skipped, the most likely outcome is that the tests become so flaky that even when they fail for a valid reason, nobody cares to check. In other words, tests are useless without proper maintenance (so why bother writing them in the first place?).

One approach is to set up an on-duty rota shared between all team members. It is important to make sure it’s not a one-person job, otherwise your colleague will be unhappy and the rest of the team won’t develop the skills necessary to write useful tests.

The responsibilities should include monitoring failures, verifying whether a failure is an actual bug or a flake, and making the tests “green” again.

To make this effort useful in the long run, once a flake is detected, a good practice is for the on-duty person to try to address it so that it doesn’t happen again in the future (e.g. by introducing wait_until).

Conclusion

There are lots of trade-offs to consider when adding E2E tests for UI applications. While it might not be realistic to have them for every project, they can be a great tool for complex systems with multiple components where it is hard to manually verify all the core user journeys. They are a great addition not only to web applications, but also to desktop apps, which often have scheduled releases, where patching bugs after the code reaches end users can be challenging.

In the next blogpost, we’ll explore an example using the pywinauto library to write a simple test case for a Windows GUI application.


Darya Shirokova is a Software Engineer @ Google
