This year marks the 30th anniversary of the Americans with Disabilities Act, which was signed into law by President George H. W. Bush on July 26, 1990. The ADA helps safeguard the rights of people with disabilities, and prohibits discrimination against them by businesses open to the public and by state and local governments.
Almost no one thirty years ago could have imagined the extent to which the Internet would become an essential part of daily life, or the need for digital accessibility that would come with it. The ADA has been widely interpreted to apply to websites just as much as physical places of business, which means that organizations must ensure that people with disabilities enjoy equal access to their services online.
To help promote the goal of digital accessibility, Google Chrome is releasing a new automatic PDF tagging feature in the next version of the browser: Chrome 85, which is currently in beta and scheduled for general release in late August. Below, we’ll discuss how PDF tagging can help people, and what we know so far about Chrome’s PDF tagging feature.
PDF tagging and accessible PDFs
Because PDFs preserve a document’s layout, images, and text formatting from one device to the next, they have become one of the most widely used file types. Vice magazine once called PDFs "the world’s most important file format." That ubiquity makes PDF accessibility an important, yet often overlooked, topic for people with disabilities.
One problem is that the source of a PDF file can be either an image scan or a typed document. For example, there are two ways to convert a Microsoft Office document to a PDF: print it out and scan it, or convert it digitally. Even though both PDF files are made from the same document, the one that has been scanned will not have text that can be searched or highlighted, which makes it much less practical to use.
The good news is that you can convert non-searchable PDFs to searchable ones by using optical character recognition (OCR) technology, which recognizes text in a digital image. The recognized text can then be converted to synthesized speech by a screen reader, an assistive technology used mainly by people with visual disabilities.
But inserting searchable text is just the start when it comes to making PDFs more accessible. PDF tagging is a technique for inserting "tags" (supplemental metadata) into a PDF file, helping people using screen readers understand and navigate the document.
PDF tagging is critical for multiple reasons:
- PDF tags identify a document’s headings and overall structure so that screen readers can quickly scan through the sections of the file, similar to scanning the headlines of a newspaper.
- PDF tags outline the correct reading order for the different elements in the document. This is critical, as information that's in the wrong order isn't usable.
- PDF tags are needed to make non-textual content, such as images and tables, accessible. In the case of images, screen readers can use alternative text that provides a description of each image’s contents. In the case of tables, PDF tags help screen readers understand the best way to present the table and its structure.
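The three roles above can be illustrated with a small sketch. This is not how any real PDF library or screen reader is implemented. The Node class and the helper functions are hypothetical names invented for this example, and the tree is a simplified stand-in for a tagged PDF's structure tree. The tag names (H1, H2, P, Figure) are standard PDF structure types.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified stand-in for one element of a PDF tag tree."""
    tag: str                  # PDF structure type, e.g. "H1", "P", "Figure"
    text: str = ""            # visible text content, if any
    alt: str = ""             # alternative text for non-text content
    children: list = field(default_factory=list)


def reading_order(node):
    """Yield nodes depth-first; the order of the tree is the reading order."""
    yield node
    for child in node.children:
        yield from reading_order(child)


def outline(root):
    """Collect heading text so a reader can skim the document's sections."""
    headings = {"H1", "H2", "H3", "H4", "H5", "H6"}
    return [n.text for n in reading_order(root) if n.tag in headings]


def announcements(root):
    """Roughly what a screen reader could announce, in reading order."""
    lines = []
    for n in reading_order(root):
        if n.tag == "Figure":
            # Images have no text of their own, so the alt text is used.
            lines.append("Image: " + (n.alt or "no description available"))
        elif n.text:
            lines.append(n.text)
    return lines


doc = Node("Document", children=[
    Node("H1", "Annual Report"),
    Node("P", "Revenue grew this year."),
    Node("Figure", alt="Bar chart of revenue by quarter"),
    Node("H2", "Outlook"),
    Node("P", "We expect steady growth."),
])

print(outline(doc))
for line in announcements(doc):
    print(line)
```

Without tags, none of this information is available: the headings cannot be listed, the order of the content has to be guessed from its position on the page, and the image has no description to fall back on.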
Perhaps the biggest barrier to adopting PDF tagging is that manually tagging longer documents can be labor-intensive. Tools such as Adobe Acrobat include automatic PDF tagging features. These are a good starting point, but their output needs manual revision and is not enough for compliance on its own. With the upcoming introduction of Google Chrome’s PDF tagging feature, people will have another way to automatically tag their PDF files and improve their screen reader experience.
How does Google Chrome’s PDF tagger work?
Google Chrome’s PDF tagger has the potential to make a big impact for people who use screen readers, while remaining practically invisible to everyone else.
To download a web page in PDF format in Google Chrome:
- Select "Print" from the main menu, or use the Ctrl+P keyboard shortcut. This will open a dialog box.
- Under "Destination," select the option "Save as PDF" from the drop-down list.
Starting with Chrome 85, the browser will automatically insert metadata into the PDF when saving the web page, including tags for headings, lists, tables, paragraphs, and image descriptions. The Chrome PDF tagger generates these tags from the web page’s HTML code, which defines the page’s structure. HTML includes elements such as <h2>, <table>, and <p> (denoting second-level headings, tables, and paragraphs, respectively) that can easily be converted into PDF tags.
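To make the idea of converting HTML elements into PDF tags concrete, here is a minimal sketch using Python's standard html.parser module. It is not Chrome's implementation, which has not been described in this level of detail. The mapping table, the TagSketch class, and the sketch_tags function are all assumptions made for illustration, and the code expects well-formed HTML with explicit closing tags.

```python
from html.parser import HTMLParser

# Hypothetical mapping from HTML elements to standard PDF structure types.
# An illustration only; this is not Chrome's actual conversion table.
HTML_TO_PDF_TAG = {
    "h1": "H1", "h2": "H2", "h3": "H3",
    "p": "P",
    "ul": "L", "ol": "L", "li": "LI",
    "table": "Table", "tr": "TR", "th": "TH", "td": "TD",
    "img": "Figure",
}


class TagSketch(HTMLParser):
    """Records [depth, pdf_tag, text] entries in document order."""

    def __init__(self):
        super().__init__()
        self.entries = []     # every mapped element seen so far
        self.open = []        # stack of (html_tag, entry) still unclosed

    def handle_starttag(self, tag, attrs):
        pdf_tag = HTML_TO_PDF_TAG.get(tag)
        if pdf_tag is None:
            return            # unmapped elements (e.g. <b>) add no tag
        if tag == "img":
            # <img> has no closing tag; its alt text becomes the description.
            alt = dict(attrs).get("alt") or ""
            self.entries.append([len(self.open), pdf_tag, alt])
            return
        entry = [len(self.open), pdf_tag, ""]
        self.entries.append(entry)
        self.open.append((tag, entry))

    def handle_endtag(self, tag):
        if self.open and self.open[-1][0] == tag:
            self.open.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.open:
            # Attach text to the innermost open mapped element.
            entry = self.open[-1][1]
            entry[2] = (entry[2] + " " + text).strip()


def sketch_tags(html):
    """Return (depth, pdf_tag, text) tuples for the mapped elements."""
    parser = TagSketch()
    parser.feed(html)
    parser.close()
    return [tuple(entry) for entry in parser.entries]


sample = """
<h2>Menu</h2>
<p>Served <b>daily</b>.</p>
<ul><li>Soup</li><li>Salad</li></ul>
<img src="soup.png" alt="A bowl of soup">
"""
for depth, pdf_tag, text in sketch_tags(sample):
    print("  " * depth + pdf_tag + ": " + text)
```

The sketch also shows why the quality of the source HTML matters: a page that styles plain <div> elements to look like headings, or omits alt attributes on images, gives a converter of this kind nothing meaningful to carry over into the PDF.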
To create the Chrome PDF tagger, Google partnered with CommonLook, a company that builds PDF accessibility software. According to CommonLook president and CEO Monir ElRayes, the company has been working with Google for the past two years on the topic of PDF accessibility in Chrome.
Adding automatic PDF tagging to Chrome is a major step forward for digital accessibility. Google Chrome is the world’s most popular browser, with a 69 percent share of the global web browser market. With the release of Chrome 85, millions of users with visual disabilities will be able to take advantage of this new feature.
There’s no word yet on how accurate Chrome’s PDF tagging will be, but it’s important to remember that automatic PDF tagging solutions like this aren’t perfect. According to Dominic Mazzoni, Google’s technical lead for Chrome accessibility, there’s already more to be done: "Future work [on the PDF tagger] includes… improving the quality of generated tagged PDFs," as well as improvements to Chrome’s built-in PDF reader.
Automatic PDF tagging—whether it’s done with Google Chrome, Adobe Acrobat, or another solution—should always be followed by manual checks to ensure that the conversion is accurate. In addition, manual PDF tagging by human experts can insert richer and more useful metadata than is currently possible with automated tools alone.