Optical Character Recognition (OCR) refers to the process of converting visual materials (such as scanned documents) into text that can be read by a computer.
OCR technology is crucially important for non-visual users, particularly those who use screen readers (software that converts text to audio or braille). However, while OCR has come a long way, it’s not perfect. Humans still need to review the output for accuracy, particularly when digitizing documents at scale.
Below, we’ll explain the basics. First, a quick note: In digital accessibility conversations, OCR may also refer to the U.S. Department of Education’s Office for Civil Rights.
If you’ve received an OCR complaint, that has nothing to do with optical character recognition — and in this article, we’re going to focus on optical character recognition. To learn about the other OCR, read: What Is an OCR Web Accessibility Complaint?
Why is Optical Character Recognition important for accessibility?
If text is presented as a flattened (pre-rendered) image, it’s less robust. Screen readers cannot announce the content, and people with visual impairments may be unable to access it.
By creating a text version of a visual document, you improve the experience for a wide range of users:
- People who use screen readers and other assistive technologies (AT) can read the document.
- People who are learning a second language can automatically translate the document into their native language.
- Users can magnify the text or change the font to improve readability.
- Search engines can understand the content and include it in relevant search results.
- People can search through large documents to find the information they need without reading every word.
These days, automated text recognition tools are fairly reliable. Artificial intelligence (A.I.) tools have improved the accuracy of OCR significantly, as A.I. can use the context of the surrounding text to interpret words that are blurry or heavily stylized.
But to fulfill the requirements of the Web Content Accessibility Guidelines (WCAG), text alternatives must serve the same purpose as the original content, so they need to be accurate. That means that humans need to review OCR output carefully, particularly when working with PDFs and other web-delivered documents.
When using OCR, understand the limitations of the technology
Many applications have built-in OCR. Adobe Acrobat, the most popular software for building PDF web documents, includes a powerful OCR text converter that preserves the text and formatting of the original document.
Of course, the output is dependent on the quality of the source image. When using OCR features, keep these tips in mind:
- Start with a high-quality scan of the document. Make sure the scan has appropriate contrast; text recognition tools may have trouble recognizing low-contrast text (and so will human readers).
- Have a process for reviewing and editing the output. While you may not have the time to review each document word-by-word, you should at least check likely problem areas (such as heavily stylized text) for errors.
- Don’t assume that OCR will make a document “accessible.” Documents must also have a predictable structure, language tags, appropriate color contrast, and alternative text (alt text) for images.
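One way to make the review step manageable is to focus on the words the OCR tool was least sure about. The Python sketch below is a minimal illustration, not a finished workflow. It assumes your OCR tool reports a confidence score for each word on a 0 to 100 scale (many do, but check your tool's documentation). The `OcrWord` class, the `flag_for_review` function, the threshold of 80, and the sample data are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    """One recognized word (hypothetical structure for this example)."""
    text: str
    confidence: float  # assumed scale: 0 (no confidence) to 100 (certain)
    page: int


def flag_for_review(words, threshold=80.0):
    """Return the words whose confidence falls below the threshold."""
    return [w for w in words if w.confidence < threshold]


# Made-up OCR output for illustration; real values come from your OCR tool.
sample = [
    OcrWord("Annual", 96.0, 1),
    OcrWord("Rep0rt", 61.5, 1),
    OcrWord("2023", 91.2, 1),
    OcrWord("Sumrnary", 48.0, 2),
]

# Print a short checklist a human reviewer can work through.
for word in flag_for_review(sample):
    print(f"Page {word.page}: check '{word.text}' (confidence {word.confidence})")
```

A list like this narrows the review, but it does not replace it: OCR tools can be confidently wrong, so a reviewer should still spot-check headings, tables, and stylized text.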
To learn about the best practices of digital accessibility for web documents, read: 7 Basic Steps to Making PDFs More Accessible.
For large document digitization projects, work with an experienced accessibility partner
Whether you’re digitizing documents for internal or public use, you need to consider the needs and preferences of your entire audience. The materials must be reasonably accessible for people with disabilities; otherwise, they may not be compliant with the Americans with Disabilities Act (ADA), Section 508 of the Rehabilitation Act, and other non-discrimination laws.
If you’re digitizing a single form, you can use WCAG to follow the best practices of inclusive design. But at scale, this becomes more difficult: An accessibility partner can help you improve compliance and provide all users with a better experience.
The Bureau of Internet Accessibility provides PDF accessibility remediation services, along with website audits, 24/7 accessibility support, and training resources. To learn more, send us a message and connect with an expert.