Adblock Plus and (a little) more

The rise and fall of screenshot-based testing · 2021-10-18 15:20 by Kathrin Jennewein

Written by Toni Feliu

Once upon a time, the team developing the Adblock Plus (ABP) extension decided to add automated end-to-end tests to what was formerly the main ABP project. The same team had already developed test pages used in manual testing, so why not reuse them in the automation?

The solution we implemented was end-to-end testing based on screenshot comparison. Once developed, the automation code didn’t need to change in order to add new test cases: we just needed to create new pages in a separate project, and our code would immediately start running them.

Generic tests

The main goal was checking that specific filters worked on the extension. In a nutshell, this is how our solution worked:


describe("Filters", () => {
  let tests = getPageTests();

  for (let [url, title] of tests) {
    it(`applies ${title} filters`, () => {
      load(url);
      addFilters();
      checkScreenshots();
    });
  }
});

Each test inside the loop loads a specific test page (e.g. the blocking test page), adds specific filters to the extension, and checks whether certain page elements are blocked by comparing expected and actual screenshots.
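
As a minimal sketch, assuming a selenium-webdriver setup, the load and screenshot-check helpers could look roughly like this; the driver instance and readExpectedScreenshot() are hypothetical stand-ins, not the actual ABP code:

const assert = require("assert");

async function load(url) {
  await driver.navigate().to(url);
}

async function checkScreenshots() {
  // takeScreenshot() resolves to a base64-encoded PNG of the viewport
  let actual = await driver.takeScreenshot();
  let expected = await readExpectedScreenshot(); // hypothetical helper
  assert.equal(actual, expected, "Screenshots don't match");
}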

What fuels this solution is discovering which test pages exist at runtime. That is handled by getPageTests(), which makes a request to the test pages index, parsing the HTML response to return an array of test page URLs and titles (full implementation here).
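
As a rough sketch of that discovery step, assuming the index page simply links to each test page (the URL constant, the link selector and the jsdom parsing are illustrative, not the actual implementation):

const {JSDOM} = require("jsdom");

const TEST_PAGES_URL = "https://testpages.example/"; // hypothetical index URL

async function getPageTests() {
  let response = await fetch(TEST_PAGES_URL); // global fetch, Node 18+
  let dom = new JSDOM(await response.text());

  // Assume every link on the index points at one test page
  return Array.from(dom.window.document.querySelectorAll("a"),
                    link => [link.href, link.textContent.trim()]);
}

One caveat: Mocha’s describe callbacks are synchronous, so the discovered list has to be available before the suite is defined, for example by fetching it up front with Mocha’s --delay flag and the global run() callback.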

Now, let’s imagine having two test pages: blocking and hiding. The string template `applies ${title} filters` would generate two test cases at runtime: “applies blocking filters” and “applies hiding filters”. Meanwhile, a new header test page gets deployed. In that case, the next execution would automatically include an additional test called “applies header filters” without a single line of code changing. Lovely, isn’t it?

Specific tests

Life was not that simple, since some tests did not follow that generic pattern.

For instance, to check pop-up blocking we wanted the test to click a link that opens a pop-up. We called those cases specialized. Also, older browser versions might not support certain extension features, which meant the corresponding test cases had to be skipped on those browsers.

To cover specific tests we adapted the previous code as follows:


describe("Filters", () => {
  let tests = getPageTests();

  for (let [url, title] of tests) {
    // Regular function (not arrow) so `this` is the Mocha test context
    it(`applies ${title} filters`, function() {
      if (isExcluded(url, this.browser))
        this.skip();

      load(url);
      addFilters();

      if (url in specializedTests)
        specializedTests[url].run();
      else
        checkScreenshots();
    });
  }
});

Here, this refers to the Mocha test context, which is why the it() callback is a regular function rather than an arrow function: it allows calling this.skip() to skip the current test when needed. That’s better than calling return, which would report the test as passed instead of skipped.

After that, the specialized tests would run their own dedicated code, while generic tests would still rely on screenshot checks to decide whether they pass or fail.
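
For illustration, the two lookups could be shaped roughly like this, reusing driver and assert from the earlier sketch; the URL key, the pop-up test body and the version check are hypothetical, not the actual implementation:

const {By} = require("selenium-webdriver");

const specializedTests = {
  // Keyed by test page URL
  "https://testpages.example/popup": {
    async run() {
      // Click the link that tries to open a pop-up...
      await driver.findElement(By.id("popup-link")).click();

      // ...then assert that no new window appeared
      let handles = await driver.getAllWindowHandles();
      assert.equal(handles.length, 1, "Pop-up was not blocked");
    }
  }
};

function isExcluded(url, browser) {
  // e.g. a filter feature that older browser versions don't support
  return url.includes("header") &&
         browser.name == "firefox" && browser.version < 59;
}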

Flaky screenshots

So, why screenshot comparison in the first place? Checking that the rendering of a browser page is identical to what’s expected means we can forget about specific HTML element checks. An easy solution to test generically. A magical, universal check with a simple, binary answer: screenshots either match or do not. Heaven on (software testing) Earth?

Not really. To start with, taking a screenshot means storing a few kilobytes, either in memory or on disk. Those kilobytes are then compared side by side with the expected screenshot, which also had to be stored beforehand. These are expensive operations compared with regular unit test checks. Moreover, the screenshots may not match simply because the page is not yet ready to be checked. Of course, the whole process can be retried, but that means doubling an already lengthy test. For us, that meant several seconds per test.
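
As a sketch of such a retry, extending the earlier checkScreenshots() example, the comparison could poll until the screenshots match or a timeout expires (the timeout value is arbitrary):

async function checkScreenshots(timeout = 5000) {
  let expected = await readExpectedScreenshot(); // hypothetical helper
  let deadline = Date.now() + timeout;

  do {
    if (await driver.takeScreenshot() == expected)
      return; // match: the test passes
  } while (Date.now() < deadline);

  // Every attempt within the timeout failed
  assert.fail("Screenshots don't match");
}

The polling forgives pages that are merely slow to settle, but it also means every legitimate failure burns through the entire timeout before reporting anything.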

Other times the screenshots would never match: scrollbars unexpectedly show up on the page, a video widget renders different controls depending on the loaded mp4 file, or a page element has a slightly different size than expected. Those causes are unrelated to what the test actually checks, but the test fails anyway. Worse still, those failures may only happen on CI, which uses a different operating system than the one used for local development (a.k.a. “works on my machine”).

Finally, there are legitimate failures. In that case, what will the error message be? Probably something vague like Error: Screenshots don’t match. While that is certainly true, it says nothing about the specific issue, which forces us to visually inspect the failing screenshot to find out what is actually broken. There must be a better way.

And there is a better way, which will be covered in an upcoming article. Until then, stay tuned!
