Klaus

Posted on Oct 16, 2018 • Edited on Jul 15, 2019

My Tests Are Being Maintained by Artificial Intelligence

#ai #machinelearning #testing #devops

One of the most frustrating things about automated testing is that you have to keep your tests up to date with the UI they exercise.

Hey Klaus, those tests got broken again.
What?
Well, a lot changed in the UI and the tests are failing.
Oh man, I hate maintenance.

When estimating the effort for implementing automated tests, most of us tend to forget about including the cost of maintenance.

A suite of tests will become obsolete in a few weeks if no one takes care of keeping the steps updated.

Most of the changes in the UI tend to appear right before the big release, rendering your tests useless.

Ideally, the tester should be provided with a mockup and enough time to update the steps. But that never happens.

A few months ago, I set off to find a solution for this irritating issue.

It was clear that I could not stop the UI from changing. It has to change to make room for new features and functionality.

I did try to improve our development process by asking our team to provide mockups, but that proved to be time-consuming and could have caused delays in our delivery times.

Pretty soon, it became obvious that the best solution was to streamline the maintenance process.

What about Artificial Intelligence?
What about it?
You can use it for maintenance.
I don't know how to AI.
Why don't you just use Endtest?

As it turns out, there's this Endtest platform out there that allows you to use Machine Learning for your Automated Tests.

I have this Wikipedia test suite, which contains 3 test cases:

Because I'm not allowed to change the UI of the Wikipedia website, I will just break the locators in my steps.

After that, all I have to do is to run the test with the Self-Healing option:

Every time you run a test, the AI learns more and more about your application. So, it learns how to identify your elements and your business flows in different ways.

If something changes and the steps in the test no longer match the user interface of your application, the AI finds a new way to reach the element, just like a user would.
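To make the idea concrete, here is a minimal Python sketch of one way such a fallback could work. It is not Endtest's actual implementation: the `Locator` class, the `find_with_healing` function, the reliability scores, and the `min_agreement` threshold are all hypothetical names and values chosen for illustration, and the page is modeled as a plain list of dictionaries instead of a real browser DOM. The sketch tries the original locator first, falls back to alternative locators ranked by reliability, accepts a replacement only if enough alternatives agree on the same element, and logs every substitution.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Element = Dict[str, str]  # toy stand-in for a DOM node: attribute -> value


@dataclass
class Locator:
    description: str
    matches: Callable[[Element], bool]
    reliability: float  # hypothetical score, e.g. learned from earlier runs


def find_unique(page: List[Element], locator: Locator) -> Optional[Element]:
    """Return the element only if the locator matches exactly one node."""
    hits = [el for el in page if locator.matches(el)]
    return hits[0] if len(hits) == 1 else None


def find_with_healing(
    page: List[Element],
    primary: Locator,
    alternatives: List[Locator],
    log: List[str],
    min_agreement: float = 0.5,  # assumed threshold, not a documented value
) -> Optional[Element]:
    element = find_unique(page, primary)
    if element is not None:
        return element  # the original step still works, nothing to heal

    # Ask every alternative locator, most reliable first.
    ranked = sorted(alternatives, key=lambda loc: loc.reliability, reverse=True)
    votes = [(loc, find_unique(page, loc)) for loc in ranked]
    votes = [(loc, el) for loc, el in votes if el is not None]

    # Too few alternatives found anything: treat it as a real failure.
    if not ranked or len(votes) / len(ranked) < min_agreement:
        return None

    # The alternatives must all point at the same element.
    best_locator, best_element = votes[0]
    if any(el is not best_element for _, el in votes):
        return None

    # Record the substitution so it never happens silently.
    log.append(f"healed: '{primary.description}' -> '{best_locator.description}'")
    return best_element


primary = Locator(
    "id is 'add_to_cart_button'",
    lambda el: el.get("id") == "add_to_cart_button",
    1.0,
)
alternatives = [
    Locator("text is 'Add to cart'", lambda el: el.get("text") == "Add to cart", 0.9),
    Locator("class is 'btn-cart'", lambda el: el.get("class") == "btn-cart", 0.7),
    Locator("name is 'add'", lambda el: el.get("name") == "add", 0.6),
]

# After a redesign the button has a new ID and a new name attribute.
redesigned_page = [
    {"id": "search", "text": "Search", "class": "btn"},
    {"id": "buy_button", "text": "Add to cart", "class": "btn-cart", "name": "add-item"},
]

log: List[str] = []
button = find_with_healing(redesigned_page, primary, alternatives, log)
print(button)
print(log)
```

In this toy run, two of the three alternatives still find the button, so the step is healed and the change is logged. If the button were removed from the page entirely, no alternative would match and the function would return `None`, which a test runner could report as a genuine failure.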

And voilà! The test got fixed by the AI:

Top comments (10)

setagana • Oct 17 '18

I would struggle to trust the results of something like this. I defer to my bible of unit testing - Osherove's The Art of Unit Testing:

UPDATED AND FINAL DEFINITION 1.2 A unit test is an automated piece of code that invokes the unit of work being tested, and then checks some assumptions about a single end result of that unit. A unit test is almost always written using a unit testing framework. It can be written easily and runs quickly. It’s trustworthy, readable, and maintainable. It’s consistent in its results as long as production code hasn’t changed.

If both your code-under-test and the test itself are changing at the same time, how do you know which change is causing the pass/fail result?

Ben Klein • Oct 17 '18

This isn't a unit test, it's an end-to-end integration test. So the unit test definition doesn't apply here :) Just sayin #noharm

setagana • Oct 18 '18

I was hoping that the reason Osherove included the bold sentence in his definition would be clear to everyone, but apparently that needs to be explained to people who are more interested in semantics.

The principle can and should be abstracted to any automated test and even beyond computer science. One-Factor-at-a-Time (OFAT) is a paradigm of experiment design that sees use in almost every branch of science and engineering. The primary arguments against OFAT are that:

1) It fails to identify interaction effects that result from combined inputs in a multi-factor system.
2) It's inefficient in situations where data is costly.

Point 2 doesn't apply to automated testing because the cost of acquiring more data is simply waiting for your tests to run.

Point 1 makes an interesting case for why I would argue against self-changing tests. In the case of self-changing tests we have two factors that vary - the code-under-test and the tests themselves. We could state that we don't want there to be an interaction between the tests and the code because we don't want our code to perform one way in testing and another way when given to users.

But how could you check that no such interaction exists? You would need to have a representative sample of the range of values that both factors could assume, and analyze the results of the varying combinations. In the case of self-changing tests, you have no ability to make the system try out various values and present you with its findings for a given value of code-under-test, nor do you have any way to conceptualize what range of parameters the system is considering changing.

At least when you write tests yourself you can form some idea of what the range is of possible test parameters and make use of your knowledge of the domain to tease out any possible interactions between your test set and the code-under-test.

Ben Halpern • Oct 16 '18

I definitely feel like this is the direction things are going, but I'd be worried about being an early adopter of something like this. I feel like things could go horribly wrong.

But you're happy with Endtest so far?

Michiel Hendriks • Oct 17 '18 • Edited

I think somebody who works for that company is indeed happy with how the product they are talking about works.

13steinj • Oct 19 '18

Yeah, this is extremely shady and unethical. He made a post both here and on the python subreddit, without disclosing his vested interest in this service succeeding, even though he can't argue against the critiques properly.

awstahl • Oct 19 '18

What happens when the AI finds a workaround to a bug, and that design path wasn't intended and won't be obvious to users? How do you verify the "successes" it finds match expectations? Is there summation data against which to assert?

Klaus • Oct 19 '18 • Edited

That is an interesting question and I do have an answer.
The platform learns more about your elements as you run your test.
For example, if you tell your test to find the element with the "add_to_cart_button" ID, the platform will also find another 20-30 ways to find that element and rank them by reliability.
If that element has a new ID, your test would normally fail. But if you run it with the Self-Healing option, it will detect that anomaly and it will look into the alternative ways that it remembered to identify that element.
Based on rankings and on how many of those alternative ways return a positive or negative answer, it will clearly differentiate between a change and a bug. It is extremely unlikely that it will cover up a bug.
And every time it makes a change like that, it will be written clearly in the logs. It's not like it's doing it behind your back.

13steinj • Oct 19 '18

"Based on rankings and on how many of those alternative ways return a positive or negative answer, it will clearly differentiate between a change and a bug."

There are still decent chances for false positives and false negatives. You can't claim "it will be smart" because it is AI. You have to back it up with actual statistics of real world cases.