Since this article was published there have been some developments - check out A11y Unlocked: Screen Reader Automation Tests for the latest!
There are lots of toolkits out there for ensuring the inclusivity and accessibility of our sites: with the likes of axe-core and htmlcs for static analysis, and pa11y, koa11y, and cypress-audit (there are more) for helping us explore or automate static analysis checks in our CI pipelines. 🚀🚀
"Chosen by Microsoft, Google, development and testing teams everywhere, axe is the World’s leading digital accessibility toolkit."
Alongside our a11y tooling we've seen amazing growth in the landscape of browser testing automation, with the rise of Cypress, Playwright (a new personal fav 🥰), and several other players. These have evolved to provide a rich collection of reliable APIs and tooling for end-to-end or integration testing of sites across a solid matrix of browsers - complemented by the likes of SauceLabs, BrowserStack, and Applitools, who provide an extensive range of browsers and devices so you don't need to worry about purchasing an expensive device cupboard! 💰💰💰
The ScreenReader Test Tooling Gap
But... these browser testing tools tend to lack any visibility or emphasis on a11y. Certainly through the likes of the testing library suite and its guiding principles you can write tests in a way that mirrors the customer's usage, but this only provides so much confidence, and is very much orientated to mouse, keyboard, and touch users. 🧐
And with regards to the a11y tools that are out there... we don't typically rely on static analysis for test coverage of other functional aspects of sites - indeed we use our favourite aforementioned cross-browser testing frameworks! It feels a shame that the "World’s leading digital accessibility toolkit" should lack the ability to integration test with the actual assistive technologies that folks are using to navigate the web.
As a finger-in-the-air measure of the limitations of current tooling, this article by Karl Groves from 2012 explains how 40-50% of WCAG success criteria are not feasible to test with existing (and I'll admit in some cases also future) tooling.
"On the other hand, there are things which can be tested for but the results would need to be verified by a skilled human reviewer. For example, while machine testing can verify that the alt attribute exists for an image, it cannot tell you whether the value supplied for the alt attribute is a suitable alternative for the image. That requires a human."
Instead, teams set on treating the accessibility of their sites as first class, rather than an afterthought, are forced to provide this testing coverage manually. This isn't necessarily a bad thing - manual exploration by your QAs adopting a ScreenReader user persona is the closest you can likely get to gaining confidence your site will work for folks out in the wild. 👩‍💻
But can we truly afford to manually QA every change to our sites to ensure they continue to work functionally for assistive technology? It is already resource-draining enough that teams typically automate golden path flows with browser testing tools to reduce the overhead of manual testing - there are simply too many browsers, which demands a degree of trade-off. To name a core few:
- Safari on MacOS, iOS
- Chrome on MacOS, Windows, Android
- Firefox on MacOS, Windows
We're already at 7 different browser vs device combinations, and this isn't accounting for different viewports, locales, current and previous versions, and many other variations.
If we now throw VoiceOver, JAWS, Narrator, NVDA, Talkback, and friends into the mix we can easily triple the workload on our QAs. 😰
What is out there currently?
This gap in automated tooling - something that mirrors our existing functional tests for mouse, keyboard, and touch, but for ScreenReaders - has been bugging me for a couple of years. A few weekends back I decided to finally explore whether I simply hadn't looked hard enough.
Initially it didn't look hopeful...
"No. At least not one that is any good nor represents how a screen reader actually reads a page or responds to ARIA. The best answer is to test in real screen readers, ideally by getting real users as they know how to use these tools. Consider contacting your local…"
There don't appear to be any (large) players out there providing automated solutions for ScreenReader testing.
There are at least promising signs that this could be on the horizon, with companies starting to offer live testing solutions around assistive technologies - such as Assistiv Labs and BrowserStack Live, which offer the ability to use ScreenReaders on remote VMs so you can manually test setups for devices or tooling you don't own or have licenses for. This makes manual a11y testing more accessible (🥁), but doesn't solve our manual overhead problem.
Digging further started to yield a little more promise with the discovery of auto-vo, screen-reader-reader, and web-test-runner-voiceover (article - just found this while writing this post!), after finding a tweet from Smashing Magazine and then rabbit-holing on Twitter... 🕳🐇
You can check out the Smashing Magazine article here.
These libraries offer a promising start to the idea of automating ScreenReader testing. Focus across the 3 appears to be primarily on VoiceOver for MacOS (perhaps, like me, because that's the development machine they use!), but screen-reader-reader starts to pave the way for automating NVDA on Windows machines. 💥
I haven't managed to have much success with auto-vo, but you can see it in action with explanations through @ckundo's twitter thread:
And with its precedent, you can imagine taking the core of it and making a far more generic and powerful API for control over VoiceOver, and potentially any ScreenReader... which is exactly what screen-reader-reader appears to have aimed to do 2 years ago! Sadly, neither of these packages appears to have maintained the velocity to mature into the testing ecosystem... my suspicions being 2-fold:
- They aren't gracefully plug-n-play with the popular browser testing frameworks of today
- They don't work in (any?) CI environments
The former point is where I see more promise with web-test-runner-voiceover: the ability to easily integrate ScreenReader test flows into your existing test suite is an excellent sell - though I suspect success would need to come in the form of a "core" ScreenReader driver module with additional integration modules so you get support for the Cypress, Playwright, WDIO, Nightwatch, Protractor, etc. of the world (see the sketch below).
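To illustrate the shape I have in mind, here's a rough sketch of such a core driver interface - the names are entirely hypothetical, not taken from any of the libraries above:

```ts
// A hypothetical "core" screen reader driver API that per-framework
// adapter modules (Cypress, Playwright, WDIO, ...) could wrap.
interface ScreenReaderDriver {
  // Lifecycle management for the screen reader process.
  start(): Promise<void>;
  stop(): Promise<void>;

  // Move the screen reader cursor through the page.
  next(): Promise<void>;
  previous(): Promise<void>;

  // Inspect what was announced, for assertions.
  lastSpokenPhrase(): Promise<string>;
}
```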
The latter point, I'll come to... 😅
Adding to the soup
Fueled by the promise that it could actually be feasible to automate testing with ScreenReaders, I have added my own flavour into the mixing pot with:
Guidepup
Guidepup is a screen reader driver for test automation.
It enables testing for VoiceOver on MacOS and NVDA on Windows with a single API.
Capabilities
- Full Control - If a screen reader has a keyboard command, then Guidepup supports it.
- Mirrors Real User Experience - Assert on what users really do and hear when using screen readers.
- Framework Agnostic - Run with Jest, with Playwright, as an independent script, no vendor lock-in.
Getting Started
Set up your environment for screen reader automation with @guidepup/setup:
npx @guidepup/setup
Install Guidepup to your project:
npm install @guidepup/guidepup
And get cracking with your first screen reader automation code!
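For instance, a first script can be as small as this (adapted from the Guidepup docs, so treat the exact API as subject to change):

```ts
import { voiceOver } from "@guidepup/guidepup";

(async () => {
  // Start VoiceOver.
  await voiceOver.start();

  // Move through the content one item at a time.
  await voiceOver.next();

  // Log (or assert on) what VoiceOver spoke for the current item.
  console.log(await voiceOver.lastSpokenPhrase());

  // Stop VoiceOver.
  await voiceOver.stop();
})();
```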
Examples
Head over to the Guidepup Website for guides, real world examples, environment setup, and complete API documentation with examples.
You can also check out these awesome examples to learn how you could use Guidepup in your projects.
I fear with 3 attempts already out there I am somewhat living one of my favourite xkcd comics... but worst case, hopefully the guidepup code will serve as a platform to help bolster whatever tool emerges on top! 🙃
Beyond some of the previous attempts, I have codified 226 keyboard gestures and 129 keyboard commander commands, as well as exposing most of the built-in AppleScript APIs for VoiceOver. So far I've had good success integrating with Playwright, and suspect it would be similar for other browser testing tools.
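As a flavour of what that integration can look like (a sketch rather than my exact setup - note the browser needs to run headed so VoiceOver has a real window to navigate):

```ts
import { chromium } from "playwright";
import { voiceOver } from "@guidepup/guidepup";

(async () => {
  // VoiceOver drives the real UI, so the browser must be headed.
  const browser = await chromium.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto("https://example.com");

  await voiceOver.start();

  // Step through the page, capturing what a VoiceOver user would hear.
  await voiceOver.next();
  console.log(await voiceOver.lastSpokenPhrase());

  await voiceOver.stop();
  await browser.close();
})();
```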
The real gotcha which I sense others have faced is trying to get any such solution into a CI environment - if a tool can't run reliably on a remote VM then it simply isn't scalable for teams to use for automated workflows. Specifically, trying to get VoiceOver to be controllable in CI.
Getting over the VoiceOver issue
The go-to for scripting and automating applications on MacOS is AppleScript. To ensure users have a degree of protection from malicious scripts, applications that want to run AppleScript to control other applications first have to be allowlisted in the "Accessibility" section of the "Security & Privacy" pane in "System Preferences".
This leads to the first hurdle: if we want to automate VoiceOver from the terminal instance in CI, we can't remote in to click links and enter passwords every run! Luckily, this is a well understood problem space, and many CI providers are already set up with the required changes to the TCC.db (transparency, consent, control!) on MacOS. Check out this article for details on TCC. Curious what the code for this configuration looks like? Check out the GitHub Actions virtual-environments repo for a good example. 🤓
There is a second hurdle though. In order to control VoiceOver through AppleScript you also need this checkbox ticked:
Locally you can automate the ticking of the checkbox with an AppleScript (it feels like privilege escalation to use AppleScript to unlock more AppleScript control!), e.g. the following works on Monterey:
-- Startup delay to reduce chance of "Application isn't running (-600)" errors
delay 1

set timeoutSeconds to 5.0

-- Open VoiceOver Utility
tell application "VoiceOver Utility" to activate

tell application "System Events"
  set endDate to (current date) + timeoutSeconds
  repeat until (exists window 1 of application process "VoiceOver Utility")
    if ((current date) > endDate) then
      log "Can not find VoiceOver Utility window"
      exit repeat
    end if
    delay 0.2
  end repeat

  tell application process "VoiceOver Utility"
    -- Prevent VoiceOver welcome message on startup
    set endDate to (current date) + timeoutSeconds
    repeat until (exists checkbox 1 of splitter group 1 of window 1)
      if ((current date) > endDate) then
        log "Can not find VoiceOver welcome message checkbox"
        exit repeat
      end if
      delay 0.2
    end repeat
    click checkbox 1 of splitter group 1 of window 1

    -- Enable AppleScript control
    set endDate to (current date) + timeoutSeconds
    repeat until (exists checkbox 2 of splitter group 1 of window 1)
      if ((current date) > endDate) then
        log "Can not find AppleScript control checkbox"
        exit repeat
      end if
      delay 0.2
    end repeat
    click checkbox 2 of splitter group 1 of window 1
  end tell

  -- Wait for SecurityAgent dialog to open
  set endDate to (current date) + timeoutSeconds
  repeat until (exists window 1 of application process "SecurityAgent")
    if ((current date) > endDate) then
      log "Can not find SecurityAgent window"
      exit repeat
    end if
    delay 0.2
  end repeat

  -- Enter credentials (key code 48 is the Tab key)
  key code 48 using shift down
  delay 0.2
  keystroke "<USERNAME>" -- update accordingly, though beware of credentials in plaintext!
  delay 0.2
  key code 48
  delay 0.2
  keystroke "<PASSWORD>" & return -- update accordingly, though beware of credentials in plaintext!

  -- Wait for SecurityAgent dialog to close
  set endDate to (current date) + timeoutSeconds
  repeat while (exists window 1 of application process "SecurityAgent")
    if ((current date) > endDate) then
      log "SecurityAgent window won't close"
      exit repeat
    end if
    delay 0.2
  end repeat
end tell

tell application "VoiceOver Utility" to quit
But when you try such a script in the likes of GitHub Actions, the "SecurityAgent" dialog never appears, meaning we can't authenticate this configuration.
Diving into the depths of VoiceOver Utility configuration, it appears that the checkbox drives two pieces of configuration:
- The creation of a /private/var/db/Accessibility/.VoiceOverAppleScriptEnabled file (containing the single character a)
- The setting of VoiceOver system preference defaults
The latter can be easily resolved through running:
defaults write com.apple.VoiceOver4/default SCREnableAppleScript 1
But the former is a bit more of an issue... the /private/var/db/ directory and the files therein are protected by Apple's SIP (System Integrity Protection - read more here!). This can be disabled in recovery mode (be careful, it's there for a reason), but rebooting a VM typically isn't an option in CI. 😞
This led me to wonder: how do "native" Apple apps manage to write to SIP-protected files if even sudo is a no-go?! It transpires they use the Security Foundation framework to obtain authorisation with a special "system.preferences" right using the obtainWithRight API.
So, after a several-hour download of Xcode, I dived into writing my first piece of Swift... somewhat wary that the only prior art for this was for preferred networks, this unanswered thread, and some DEF CON Apple malware talks...
This yielded something that appears to ask for permissions... and then doesn't work 😅
#!/usr/bin/env swift

import SecurityFoundation

let filePath = "/var/db/Accessibility/.VoiceOverAppleScriptEnabled"

do {
  guard let authref = SFAuthorization.authorization() as? SFAuthorization else {
    throw CocoaError(.fileWriteNoPermission)
  }

  // Prompt for the "system.preferences" right, pre-authorizing where possible
  try authref.obtain(withRight: "system.preferences", flags: [.extendRights, .interactionAllowed, .preAuthorize])
  defer { authref.invalidateCredentials() }

  do {
    let fileManager = FileManager.default

    // Delete the file if it exists (i.e. toggle AppleScript control off)
    if fileManager.fileExists(atPath: filePath) {
      try fileManager.removeItem(atPath: filePath)
    } else {
      print("File does not exist")
    }
  } catch let error as NSError {
    print("An error took place: \(error)")
  }
} catch {
  print(error)
}
...
Error Code=513 "“.VoiceOverAppleScriptEnabled” couldn’t be removed because you don’t have permission to access it."
It transpires that even with authorization, SIP is a no-go without Apple signing keys for special entitlements.
So where do we go from here?
Despite some of the complications mentioned above, it's not all doom and gloom! The folks at Microsoft who maintain the GitHub Actions virtual machines have been very receptive to getting this setting enabled, and there's no indication (yet) that we should face the same issues for NVDA. ☺️
Seen any cool ScreenReader automation tools out there that are worth a share?
Got any ideas to solve some of the challenges mentioned?
Want to help out building out some cool tooling for a11y automated testing?
Reach out on my twitter @CraigMorten, or leave a comment below! Will be great to hear from you! 🚀🚀
Top comments
Pleased to announce we've now updated the macOS environment for GitHub Actions such that automating VoiceOver is now possible 🎉
For those following along, now have a successful CI setup for CircleCI 🥳
Check out github.com/guidepup/circleci-voice...
The configuration should also work for any CI provider that offers macOS agents with SIP disabled!
Fantastic deep dive! I hit similar challenges with the security as well when using AppleScript in a CI env