Why you can't test a screen reader (yet)!

When I first started to learn about accessibility I wanted to write automated tests to ensure that assistive technology devices, like screen readers, were interpreting my pages correctly. Because I'm not a daily screen reader user, I figured it would be easy for a regression to slip in unnoticed.

This idea, testing a screen reader, proved much harder than I thought. It's actually a bit of a holy grail in the accessibility space. Something that many have dreamed of, but few—if any—have achieved.

To understand why this is, it helps to know a bit about the process your page goes through when it finally gets announced by a screen reader.

The Accessibility Tree

When Chrome has parsed your HTML and CSS into the DOM and the CSSOM, it combines them into a tree of nodes. You may have heard folks on my team refer to this as the render tree. It tells the browser what to paint on screen and which things hidden by CSS to leave out.

But what many don't know is that during this process a second tree is created, called the accessibility tree. This tree removes all the nodes that are semantically uninteresting and computes the roles, states, and properties of the remaining nodes. Like the render tree, it also removes nodes hidden by CSS.

So, given the following HTML:

<html>
<head>
  <title>How old are you?</title>
</head>
<body>
  <label for="age">Age</label>
  <input id="age" name="age" value="42">
  <div>
    <button>Back</button>
    <button>Next</button>
  </div>
</body>
</html>

Chrome would produce an accessibility tree that looks something like:

id=1 role=WebArea name="How old are you?"
    id=2 role=Label name="Age"
    id=3 role=TextField labelledByIds=[2] value="42"
    id=4 role=Group
        id=5 role=Button name="Back"
        id=6 role=Button name="Next"
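To make the pruning concrete, here is a toy sketch in JavaScript that turns a plain-object stand-in for the HTML above into a tree shaped like the one Chrome produces. It is not Chrome's algorithm: the `ROLE_MAP` table, the node shape, and the `hidden` flag standing in for `display: none` are all assumptions chosen to reproduce this one example.

```javascript
// Toy accessibility-tree builder. ROLE_MAP is a hypothetical table chosen to
// reproduce the example above; real browsers follow far more detailed rules.
const ROLE_MAP = {
  html: 'WebArea',
  label: 'Label',
  input: 'TextField',
  div: 'Group',
  button: 'Button',
};

// Convert one plain-object "DOM" node. Returns an array so that role-less
// nodes (head, title, body) can be dropped while their children are promoted
// to the nearest ancestor that survives.
function build(node, ctx) {
  if (node.hidden) return []; // stand-in for display: none
  const role = ROLE_MAP[node.tag];
  const attrs = node.attrs || {};
  if (!role) {
    return (node.children || []).flatMap((child) => build(child, ctx));
  }
  const axNode = { id: ctx.nextId++, role }; // ids assigned in document order
  if (role === 'WebArea') axNode.name = ctx.title; // page name comes from <title>
  else if (node.text) axNode.name = node.text;
  if (attrs.value !== undefined) axNode.value = attrs.value;
  if (attrs.id) ctx.byDomId[attrs.id] = axNode;
  if (node.tag === 'label' && attrs.for) {
    ctx.labels.push({ forId: attrs.for, axId: axNode.id });
  }
  axNode.children = (node.children || []).flatMap((child) => build(child, ctx));
  return [axNode];
}

function buildAccessibilityTree(doc) {
  const ctx = { nextId: 1, title: doc.title, byDomId: {}, labels: [] };
  const [root] = build(doc.root, ctx);
  // Resolve <label for="..."> into labelledByIds on the labelled node.
  for (const { forId, axId } of ctx.labels) {
    const target = ctx.byDomId[forId];
    if (target) target.labelledByIds = (target.labelledByIds || []).concat(axId);
  }
  return root;
}

// Plain-object version of the HTML example above.
const doc = {
  title: 'How old are you?',
  root: { tag: 'html', children: [
    { tag: 'head', children: [{ tag: 'title' }] },
    { tag: 'body', children: [
      { tag: 'label', attrs: { for: 'age' }, text: 'Age' },
      { tag: 'input', attrs: { id: 'age', name: 'age', value: '42' } },
      { tag: 'div', children: [
        { tag: 'button', text: 'Back' },
        { tag: 'button', text: 'Next' },
      ] },
    ] },
  ] },
};

console.log(JSON.stringify(buildAccessibilityTree(doc), null, 2));
```

Nodes without a role (`head`, `title`, `body`) vanish and their children move up to the nearest surviving ancestor, which is a rough model of what dropping "semantically uninteresting" nodes means.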

Next, Chrome needs to convert these nodes into something the user's operating system can understand. On Windows it will produce a tree of IAccessible objects, and on macOS it will use NSAccessibility objects. Finally, this tree of OS-specific nodes gets handed off to a screen reader, which interprets it and chooses what to say.
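Here is a rough sketch of that conversion step. The platform role names are the MSAA constants and NSAccessibility role strings commonly reported for these controls, but treat the table as an illustrative assumption, not Chrome's real mapping, which covers far more roles and properties. `toPlatformTree` is a hypothetical helper.

```javascript
// Illustrative subset only: how a few internal roles might surface through
// each platform API. These are not Chrome's actual (much larger) tables.
const PLATFORM_ROLES = {
  // MSAA role constants reported through IAccessible on Windows.
  windows: {
    WebArea: 'ROLE_SYSTEM_DOCUMENT',
    TextField: 'ROLE_SYSTEM_TEXT',
    Group: 'ROLE_SYSTEM_GROUPING',
    Button: 'ROLE_SYSTEM_PUSHBUTTON',
  },
  // NSAccessibility role strings on macOS.
  macos: {
    WebArea: 'AXWebArea',
    TextField: 'AXTextField',
    Group: 'AXGroup',
    Button: 'AXButton',
  },
};

// Convert an internal accessibility node (and its subtree) into the shape a
// given platform would expose. Roles missing from the table are flagged.
function toPlatformTree(axNode, platform) {
  const table = PLATFORM_ROLES[platform];
  if (!table) throw new Error(`Unknown platform: ${platform}`);
  return {
    role: table[axNode.role] || `unmapped(${axNode.role})`,
    name: axNode.name,
    children: (axNode.children || []).map((child) => toPlatformTree(child, platform)),
  };
}

// The Back/Next group from the example tree.
const group = {
  role: 'Group',
  children: [{ role: 'Button', name: 'Back' }, { role: 'Button', name: 'Next' }],
};

console.log(JSON.stringify(toPlatformTree(group, 'windows'), null, 2));
console.log(JSON.stringify(toPlatformTree(group, 'macos'), null, 2));
```

The same internal node ends up as two different objects depending on the OS, which is one of the layers a screen reader test would have to see through.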

If you're really interested, you can check out this doc which explains a lot more about how accessibility works in Chromium.

So it's pretty tricky to know what any specific browser + OS + screen reader combo will announce. There are differences in how each browser builds its accessibility tree, there are differences in how well each browser supports ARIA, and there are differences in how the various screen readers interpret the information browsers give to them. Oof!
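A quick way to see how fast this grows is to enumerate the combinations. The browser and screen reader lists below are hypothetical and deliberately short; even so, they produce nine combinations that could each announce the same page differently.

```javascript
// Hypothetical, deliberately short lists circa 2018; a real support matrix
// would be longer and would also track versions.
const BROWSERS = {
  Chrome: ['Windows', 'macOS'],
  Firefox: ['Windows', 'macOS'],
  Safari: ['macOS'],
  Edge: ['Windows'],
};
const SCREEN_READERS = { NVDA: 'Windows', JAWS: 'Windows', VoiceOver: 'macOS' };

// Every browser + OS + screen reader combination that can actually run together.
function supportedCombos(browsers, readers) {
  const result = [];
  for (const [browser, systems] of Object.entries(browsers)) {
    for (const [reader, os] of Object.entries(readers)) {
      if (systems.includes(os)) result.push(`${browser} + ${os} + ${reader}`);
    }
  }
  return result;
}

const combos = supportedCombos(BROWSERS, SCREEN_READERS);
console.log(`${combos.length} combinations to verify by hand:`);
combos.forEach((c) => console.log(`  ${c}`));
```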

So how do we test this stuff?

Rather than test what a screen reader announces, a better place to start might be to test the accessibility tree itself. This avoids some of the layers of indirection mentioned above.
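One way to act on that is a snapshot test: serialize the tree into text like the dump shown earlier and fail when it changes, the same idea as snapshot testing in tools like Jest. This sketch assumes you already have the tree as plain objects with `role`, `name`, `value`, and `children`; how you obtain it depends on your tooling, and `serializeAxTree` and `assertAxSnapshot` are hypothetical names.

```javascript
// Serialize a tree of { role, name, value, children } objects into indented
// text, four spaces per level, similar to the accessibility tree dump.
function serializeAxTree(node, depth = 0) {
  const parts = [`role=${node.role}`];
  if (node.name !== undefined) parts.push(`name="${node.name}"`);
  if (node.value !== undefined) parts.push(`value="${node.value}"`);
  const line = '    '.repeat(depth) + parts.join(' ');
  const children = (node.children || []).map((child) => serializeAxTree(child, depth + 1));
  return [line, ...children].join('\n');
}

// Throw if the serialized tree differs from the expected snapshot text.
function assertAxSnapshot(tree, expected) {
  const actual = serializeAxTree(tree);
  if (actual !== expected.trim()) {
    throw new Error(`Accessibility tree changed.\nActual:\n${actual}\nExpected:\n${expected.trim()}`);
  }
}

const navButtons = {
  role: 'Group',
  children: [
    { role: 'Button', name: 'Back' },
    { role: 'Button', name: 'Next' },
  ],
};

// Passes: the tree matches the stored snapshot.
assertAxSnapshot(navButtons, `
role=Group
    role=Button name="Back"
    role=Button name="Next"
`);
console.log('snapshot matches');
```

Renaming a button or dropping a role would change the text and fail the test, which catches regressions without depending on any particular screen reader.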

If you follow me on Twitter, you've probably heard me mention a new standard we're working on called the Accessibility Object Model or "AOM", for short. There are a number of features AOM seeks to achieve, but one that I'm most excited about is the ability to compute the accessibility information for a given node.

// Experimental AOM API; the final shape may change.
const { role } = await window.getComputedAccessibleNode(element);
assert.equal(role, 'button');

Note: this API is still being sketched out, so the final version may differ from the snippet above.

When this lands (hopefully in 2018) we should be able to start writing unit and integration tests that ensure our components are properly represented in the browser's accessibility tree. That's pretty darn close to Holy Grail territory!
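Until the API settles, a test helper can wrap the call so suites still run where it's missing. In this sketch `expectRole` is a hypothetical name, the lookup function is passed in (so it can be bound to `window` in a browser or faked elsewhere), and a missing API yields a "skipped" result instead of a failure.

```javascript
// Sketch of a test helper around the proposed AOM call. Because the API is
// experimental and may change, the lookup function is passed in, and the
// helper reports "skipped" instead of failing when it isn't available.
async function expectRole(getComputedAccessibleNode, element, expectedRole) {
  if (typeof getComputedAccessibleNode !== 'function') {
    return { status: 'skipped', reason: 'getComputedAccessibleNode is not available' };
  }
  const { role } = await getComputedAccessibleNode(element);
  if (role !== expectedRole) {
    throw new Error(`Expected role "${expectedRole}" but got "${role}"`);
  }
  return { status: 'passed', role };
}

// In a browser with the experimental API enabled, bind it to window first:
// const lookup = typeof window.getComputedAccessibleNode === 'function'
//   ? window.getComputedAccessibleNode.bind(window)
//   : undefined;
// await expectRole(lookup, document.querySelector('button'), 'button');

// Stand-in lookup for running outside a browser (hypothetical data).
const fakeLookup = async (element) => ({ role: element.fakeRole });
expectRole(fakeLookup, { fakeRole: 'button' }, 'button').then(console.log);
```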

Aside from AOM, there are linters and auditors we can use today, like eslint-plugin-jsx-a11y, Lighthouse, axe, and pa11y. Ultimately we'll want to use a combination of these tools plus AOM tests to monitor the accessibility of our apps. If you haven't seen Jesse Beach's talk "Scaling accessibility improvements with tools and process at Facebook", I recommend giving it a look to see how an organization the size of Facebook is integrating these tools into its process.
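Auditors like axe produce structured results you can gate a build on. This sketch assumes an axe-core results object, whose `violations` entries carry `id`, `impact`, `help`, and `nodes`; the `blockingViolations` helper and its "serious and above" threshold are a made-up example policy, and the sample data is invented.

```javascript
// Sketch: turn an axe-core results object into a pass/fail list for CI.
// The "serious and above" default is an example policy, not an axe default.
const IMPACT_ORDER = ['minor', 'moderate', 'serious', 'critical'];

function blockingViolations(results, minImpact = 'serious') {
  const threshold = IMPACT_ORDER.indexOf(minImpact);
  if (threshold === -1) throw new Error(`Unknown impact level: ${minImpact}`);
  return results.violations
    .filter((v) => IMPACT_ORDER.indexOf(v.impact) >= threshold)
    .map((v) => `${v.id} (${v.impact}): ${v.help} [${v.nodes.length} node(s)]`);
}

// In a page with axe-core loaded, results would come from:
//   const results = await axe.run();
// Made-up results with the same shape, for illustration:
const sampleResults = {
  violations: [
    { id: 'button-name', impact: 'critical', help: 'Buttons must have discernible text', nodes: [{}] },
    { id: 'region', impact: 'moderate', help: 'All page content should be contained by landmarks', nodes: [{}, {}] },
  ],
};

console.log(blockingViolations(sampleResults));
console.log(blockingViolations(sampleResults, 'moderate'));
```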

To wrap up, I think testing the output of a screen reader may still be a ways off, but in 2018 we're going to have more tools in our toolbox than ever before. If you want to learn more about accessibility fundamentals, you can check out this free Udacity course, and if you'd like to start incorporating more accessibility work into your team's practice, take a look at my session from this year's Google I/O. I'm really excited to see what you all build in 2018 😁

Comments (1)

Jason Knight

Methinks you're overthinking this, and the methodology is flawed given HTML structural rules. Much of that seems to come from the focus on "screen readers" instead of "non-visual user agents" -- get the latter right and you automatically address the former. The way you're thinking about it means nothing to my braille reader or search engines, but if you bother using logical document structure and proper semantics you instantly nab ALL user-agent types, instead of having to focus on "well this is for screen, this is for screen reader, this is for braille, this is for search".

Now, you're not overthinking it as badly as something like the pointlessly redundant code bloat BS that is aria roles or the dumbass new, ignorantly redundant HTML 5 "structural" tags, but there are definite flaws.

Like the treatment of DIV. DIV and SPAN are semantically neutral; as such, screen readers IGNORE them... or at least should. Treating DIV as a grouping block has little to nothing to do with what a screen reader does -- or at least what any screen reader an actual blind person would use does.

The entire core concept of HTML from the beginning is to say what things ARE or WOULD BE in a properly written document: paragraphs, lists, and headings. That's why presentational crap like FONT and CENTER should never have been added in the first place, and why they were removed in HTML 4 Strict.

It's also why B and I remain in the specification. B MEANS "would be bold in a professionally written document when not receiving 'more emphasis', such as a legal entity's name"; I MEANS "would be italic in a professionally written document when not being CITEd or receiving emphasis, such as a book title". It's scary how many mouth-breathers over the past twenty years have said idiotic nonsense like "B and I are deprecated" -- they never have been -- or "only use EM and STRONG", because their comprehension skills are lacking. When the spec says "use EM and STRONG when their semantic meaning is more appropriate", that does not mean "never use B and I".

It is misunderstandings like that which lead to problems when it comes time to see what screen readers do.

Same goes for numbered headings and horizontal rules, which are SUPPOSED to be the core of creating an accessible, navigable layout -- and used properly they make aria roles and most of the new tags in HTML 5 little more than mind-numbingly idiotic, pointless redundancies. It's not rocket science, given that HTML (at least) seems to be based on writing norms I learned in bloody grade school in the '70s. Which I hear these days isn't taught unless you're an English major in your fifth year of college?

Following those norms, your H1 should be THE (singular) heading (singular) that everything on every page of the document (site) is a subsection of, the same way the name of the book, newspaper, whitepaper, or specification appears at the top of every page or fold-pair. That's a top-most heading's JOB. Your H2 is the start of a major subsection of the current page, with the first H2 on the page (or first HR, we'll get to that shortly) indicating the start of your primary content -- so if the user wants to navigate straight to the content, they just go to the first H2 or HR (making HTML 5's NAV tag utterly pointless). H3 indicates the start of subsections of the H2 preceding it. H4 indicates the start of subsections of the H3 preceding it... care to guess what H5 and H6 MEAN?!? Even the lowly HR DOES NOT MEAN DRAW A LINE ACROSS THE SCREEN! It MEANS a change in topic or section where heading text is unwanted or unwarranted, and one of the few things I praise HTML 5 for is finally clarifying HR's semantic role in the specification.

Since these break the pages up into SEMANTIC sections, why do we need SECTION? If we can skip to the content via headings, why do we need NAV? ARTICLE has no equivalent in professional writing, so why was it created? ASIDE, if used for what the word actually means, ends up as useless as ADDRESS (unless you're spending most of your time writing slash-fiction for Ferris Bueller or Deadpool), since it does NOT mean "oh this is a sidebar"; it MEANS a literary aside, as in a section of text breaking the fourth wall, or adding optional clarification to the text it is associated with. Don't even get me started about the mind-numbing idiocy of HEADER and FOOTER.

This is also why, if the first heading on your page isn't an H1, you're doing it all wrong. It's why, if you're skipping heading levels, suddenly having an H5 on pages that don't even have an H3 or H4 for it to start a subsection of, you're doing it all wrong -- and if you test on an actual non-visual UA you will very quickly find out why, as navigation goes bits-up face-down.
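The two heading rules above (start with an H1, never jump more than one level deeper) are easy to check mechanically. A minimal sketch, assuming you have already collected the heading levels in document order; `checkHeadingOutline` is a hypothetical helper, not part of any existing tool.

```javascript
// Check a list of heading levels (1-6, in document order) against two rules:
// the first heading is an H1, and no heading jumps more than one level deeper
// than the heading before it. Moving back up any number of levels is fine.
function checkHeadingOutline(levels) {
  const problems = [];
  if (levels.length > 0 && levels[0] !== 1) {
    problems.push(`first heading is H${levels[0]}, expected H1`);
  }
  for (let i = 1; i < levels.length; i++) {
    if (levels[i] > levels[i - 1] + 1) {
      problems.push(`H${levels[i]} follows H${levels[i - 1]}, skipping a level`);
    }
  }
  return problems;
}

// In a browser, the levels could be collected like this:
// const levels = Array.from(
//   document.querySelectorAll('h1, h2, h3, h4, h5, h6'),
//   (h) => Number(h.tagName[1]),
// );

console.log(checkHeadingOutline([1, 2, 3, 2, 3])); // []
console.log(checkHeadingOutline([2, 5]));
```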

Throwing more tags (HTML 5) and pointless attributes (aria) at it resulting in massive code bloat is not the huffing answer! Bothering to learn the entire reason HTML was even created in the first damned place is!

Hence the saying:

If you choose any of your semantic tags based on their default appearance, you're choosing all the wrong tags for all the wrong reasons.

It MEANS fieldsets to wrap user interactable elements (input, select, textarea) that the users can change values of. It MEANS lists around SHORT selections or bullet points (as in a grammatical bullet point, not "hurr durz it has a dotsy befours itz"). It means tables for tabular data leveraging CAPTION, TH, THEAD, TBODY, TFOOT, SCOPE, HEADERS, and AXIS and not "tables for layout".

NEWS FLASH -- There are more tags that go into tables than just TR and TD... If I see one more TD+B doing TH's job, or TD+COLSPAN doing CAPTION's job, I'm gonna puke.

Hell, the derps who stopped using tables altogether are another example of those "misunderstandings" from a lack of comprehension. When we were told "don't use tables for layout", that didn't mean "never use tables" -- but look at the default skins in some forum software, where things like index pages are clearly tabular data with obvious columnar headings (TH SCOPE="col" inside THEAD) and uniform rows of data: status, title (TH SCOPE="row"), views, replies, poster with date and time. That's tabular data, so <table> is in fact the CORRECT semantic tag if you care about accessibility.
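For that forum-index case, the markup described above can be generated directly. A sketch in which `renderTopicTable`, the column names, and the topic object shape are all hypothetical: column headers go in THEAD with `scope="col"`, and each row's title is a TH with `scope="row"`.

```javascript
// Sketch: render a forum index as a data table using CAPTION, THEAD, and TH
// with SCOPE. Column names and the topic object shape are hypothetical.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function renderTopicTable(caption, topics) {
  const columns = ['Status', 'Title', 'Views', 'Replies', 'Last post'];
  const headerCells = columns.map((c) => `<th scope="col">${c}</th>`).join('');
  const rows = topics.map((t) =>
    '<tr>' +
    `<td>${escapeHtml(t.status)}</td>` +
    // The title identifies the row, so it is a row header rather than a TD.
    `<th scope="row">${escapeHtml(t.title)}</th>` +
    `<td>${escapeHtml(t.views)}</td>` +
    `<td>${escapeHtml(t.replies)}</td>` +
    `<td>${escapeHtml(t.poster)}, ${escapeHtml(t.date)}</td>` +
    '</tr>');
  return [
    '<table>',
    `<caption>${escapeHtml(caption)}</caption>`,
    `<thead><tr>${headerCells}</tr></thead>`,
    '<tbody>',
    ...rows,
    '</tbody>',
    '</table>',
  ].join('\n');
}

// Made-up sample row.
const html = renderTopicTable('Latest topics', [
  { status: 'open', title: 'Tables & accessibility', views: 120, replies: 4, poster: 'jason', date: '2018-01-02 10:15' },
]);
console.log(html);
```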

But no: because they didn't understand that "don't use tables for layout" DOES NOT MEAN "don't ever use tables", they sleaze out a tag soup of meaningless DIVs with endless pointless halfwitted 'classes for nothing' and fragile layout trickery, resulting in an inaccessible mess, particularly for braille readers, screen readers, search engines, and other non-visual UAs.

If you use semantic markup with logical document structure, you pimp slap EVERYTHING into behaving properly... "Semantic markup" of course being a sick euphemism for "using HTML properly" that was invented so as not to offend the mouth-breathing halfwits who still can't extract their cranium from 1997's rectum: STILL vomiting up presentational HTML 3.2-style browser-wars mental-midgetry, having spent most of the '00s and the first half of this decade slopping the HTML 4 Transitional doctype atop it, and today wrapping HTML 5's lip-service around the same broken, outdated, halfwitted markup methodology.

There's a reason the design process I advocate is progressive enhancement:

1) Start out with your content, or a reasonable facsimile of future content, in a flat text editor (Notepad or any common Notepad replacement will do) as if HTML, CSS, and screen layout don't even exist! Organize it so it makes sense if you were to just print it out flat.

2) Mark up that content semantically, saying what things ARE, NOT what they are going to look like. This creates your logical document structure and should leverage those headings and rules to make a navigable page. Since this is the semantic stage, DIV and SPAN should not be used here, nor should classes or IDs be applied yet. Do this right and ALL user agents should get a useful page, and you are DONE worrying about non-visual UAs. That's it, right here: no more worrying about screen readers, braille readers, or search engines. Do this step right and you're DONE!

3) Create the desktop screen media layout, remembering to use the bloody media="" attribute on your stylesheet link. If you use <style> or style="" you're not maintaining separation of presentation from content, so you're f*ing up royally. ANYONE telling you to use those needs a quadruple helping of sierra tango foxtrot uniform. At this point you can add DIV, SPAN, IDs and classes where and as needed, but ONLY as needed as hooks for presentation or internal navigation, without saying what that presentation IS!

Maintain that separation of presentation from content -- just because something is 'red' or 'centered' right now doesn't mean it will be when printed, or when media queries kick in... if you're using that OOCSS mentality (emphasis on the mental) with such dumbass classes as "w3-red", "xm-s-4", or "text-center" you've COMPLETELY MISSED THE HUFFING POINT AND ARE DOING IT ALL WRONG!!!

There's a reason I call it bootcrap... universally html/css "frameworks" are broken bloated inaccessible trash that do nothing but make you work harder, not smarter! You want proof that W3fools has ZERO business telling people how to use web technologies? Just look at w3.css!

I say start with the desktop layout because we can't target legacy desktops with media queries; the people saying "mobile first" utterly miss that little detail, which is why their pages are often a middle finger to legacy UA users. What we CAN customize for later should not be our starting point -- what we CAN'T customize for should be!

For maximum accessibility said starting desktop layout should be elastic (everything possible declared in EM so the layout auto-sizes to the user preference), and semi-fluid (shrinks/expands inside a max-width). If you aren't doing that, you've screwed up.

4) Once you have that working make it responsive with media queries. Shrink the window, figure out when it breaks, add a query to re-arrange... lather, rinse, repeat.

5) Then and ONLY THEN enhance the already working accessible page with JavaScript. Hence another great saying if you care about building accessible websites:

If you can't make a fully working page without JavaScript first, you likely have zero business adding scripting to it!

Hence if you're building with things like client-side rendering (CSR), you've already screwed the pooch. If your site loads a blank page with no content when scripting is off or blocked, hello instant WCAG violation.

I'm NOT saying scripting itself is a violation, but if the page doesn't work at all without scripting, or at least provide the content, it most certainly is!

That's progressive enhancement with a content-first semantic approach. It takes care of pretty much all non-visual UA woes, gives you a roadmap for building accessibly, and keeps the need for 'testing' to a minimum, since you're not only using proven techniques, you also end up paying attention to the INTENT of HTML and why Tim Berners-Lee created the damned thing!
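Step 3's "elastic" advice (declare everything possible in EM) comes down to dividing pixel values by the base font size. A small sketch, assuming the common browser default of 16px; `pxToEm` and the `.wrapper` rule are hypothetical examples, and the 960px cap is arbitrary.

```javascript
// Convert a pixel measurement to ems for an elastic layout. Assumes a 16px
// base, the common browser default. For most properties an em resolves
// against the element's own font size, so pass a different basePx if that
// isn't 16px.
function pxToEm(px, basePx = 16) {
  if (!(basePx > 0)) throw new RangeError('basePx must be a positive number');
  return `${px / basePx}em`;
}

// Semi-fluid wrapper: fluid percentage width, capped by an em-based max-width,
// so the cap grows or shrinks with the user's font-size preference.
const wrapperCss = `.wrapper { width: 95%; max-width: ${pxToEm(960)}; margin: 0 auto; }`;

console.log(pxToEm(24)); // 1.5em
console.log(wrapperCss);
```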

... and it's disturbing how many developers come up with lame excuses to justify not following it, or refuse to learn or accept any of the concepts behind it. From artists under the DELUSION they know what design is, to front-end coders who wouldn't know proper semantics from the hole in their backside, to back-end coders working in languages whose entire purpose is to output HTML saying "Well, I don't really know HTML, that's why we have a front-end coder". NONE of these clowns know enough about HTML, CSS, accessibility, emissive colourspace, separation of presentation from content, or anything else about websites to have any damned business working on them!

The end result is that their work tells large swaths of potential users to go perform anatomically impossible acts upon themselves.

But Joe forbid anyone responds in kind...

Accessibility isn't hard. PEOPLE make it hard through a mix of crying "wah wah, I don't wanna learn" and sleazy scam artists peddling flashy but useless bling to site-owners who don't know any better.

Throwing more tags and attributes into HTML or adding 'testing tools' isn't going to fix that fundamental lack of understanding.