Artificial intelligence (AI) has the potential to revolutionize digital accessibility, and automatically generated captions are a particularly strong use case.
People who have hearing disabilities may need captions to understand content, but creating those captions takes time. If you’re publishing videos to YouTube or Facebook (both of which have built-in automated captions), why not let AI handle the hard work?
There’s nothing wrong with using AI as a starting point, but for now, humans still have an important role to play. Here’s why.
Automated captions are fairly accurate, but captions must be perfect to be useful
Under perfect conditions, AI captions (or automatic speech recognition, often abbreviated as ASR) are fairly accurate. One 2022 study from California State University (link leads to a PDF) examining various platforms found an average AI-generated caption accuracy rate of 89.8%, and a 2025 report from 3Play Media found that certain ASR engines could reach accuracy rates of about 93%, based on overall word error rate.
Those are impressive figures. But consider this: How would you grade your experience if you watched a movie in which 10% of the dialogue was completely incomprehensible?
You’d probably feel frustrated — and that’s exactly how folks with disabilities feel when captions are inaccurate or incomplete. Captions need to be perfect (or as close as possible). Otherwise, the audience misses part of the message.
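For context, the accuracy figures above are usually derived from word error rate (WER): the number of substituted, deleted, and inserted words divided by the number of words in a human-verified reference transcript. Here is a minimal Python sketch of that calculation. The function name and the sample sentences are hypothetical, and the text normalization (lowercasing and splitting on spaces) is a simplifying assumption; published studies may normalize punctuation and numbers differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: ASR dropped a word
                curr[j - 1] + 1,     # insertion: ASR added a word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word and one dropped word out of eight.
reference = "please submit the form before friday at noon"
hypothesis = "please submit the farm before friday noon"

wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.1%}, accuracy: {1 - wer:.1%}")
# prints "WER: 25.0%, accuracy: 75.0%"
```

Note that WER treats every word equally. Swapping "form" for "farm" counts the same as dropping a filler word, even though only one of those errors changes the meaning for the viewer.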
Even when automated captions provide a perfect readout of dialogue, they may miss nuance
AI systems fundamentally struggle with elements that require deep contextual understanding. Generative ASR often fills in transcripts with the words it considers most likely, rather than the words the video actually contains. When the audio isn’t crystal clear, AI tends to stumble.
ASR models are certainly getting better — but if a video contains anything other than basic dialogue delivered directly to the viewer, you probably need human judgment to write accurate captions. That’s especially true if a video contains:
- Significant background noise.
- Diverse accents.
- Highly technical jargon or brand-specific terminology.
- Cultural references.
- Emotional nuance (such as sarcasm).
These elements are vital for a comprehensive and equitable viewing experience for individuals who are Deaf or hard of hearing.
Does WCAG require perfect captions?
The Web Content Accessibility Guidelines (WCAG), widely considered to be the international standard for digital accessibility, require captions for all pre-recorded media under Success Criterion 1.2.2 (Captions - Prerecorded), a Level A criterion. WCAG also requires captions for live content under Success Criterion 1.2.4 (Captions - Live), a Level AA criterion. Learn about the differences between WCAG conformance levels.
WCAG Level AA is the basis for compliance with the Americans with Disabilities Act (ADA) and other non-discrimination laws. That means that for most businesses, captions aren’t optional for pre-recorded content.
WCAG doesn’t specify a minimum level of accuracy for captions. However, the guiding principle is that captions must be accurate and equivalent to the audio content. That means that captions should include all dialogue, along with non-speech information conveyed through sound, such as speaker identification, sound effects, and musical cues that are important for understanding the content.
The bottom line: If your captions omit non-essential info (such as a musical cue that doesn’t affect the meaning of the content), you can still conform with WCAG. But overall, you want captions to be as accurate as possible, because any missing context that affects understanding could prevent WCAG conformance.
Follow these tips to make sure that your captions conform with WCAG and other accessibility guidelines
You can certainly use AI to reduce the time you spend creating captions. Just keep the following in mind so that your captions are useful for real, human users:
- Wherever possible, start thinking about captions when drafting your video scripts. You may not need to do much extra work to add captions if you’ve already got a document with all of the dialogue.
- Review all AI captions carefully for errors and hallucinations.
- Make sure your captions include relevant sound effects and musical cues.
- When multiple speakers appear on screen, make sure the captions identify the speaker.
- Don’t forget to add transcripts. Captions and transcripts serve different needs, and they’re both essential for accessibility.
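Part of that review can be supported with a small script. The Python sketch below scans a WebVTT caption file and summarizes which speakers are identified, which non-speech sounds are described, and how many spoken cues have no speaker label. It rests on two assumptions: speakers are marked with WebVTT voice tags (such as `<v Maria>`), and non-speech information is written in square brackets (such as `[upbeat music]`). The function name and sample captions are hypothetical. A script like this can only point a human reviewer at possible gaps; it can’t judge whether the captions are accurate.

```python
import re

# WebVTT voice span, e.g. <v Maria> or <v.loud Maria>
VOICE_TAG = re.compile(r"<v(?:\.[\w-]+)*\s+([^>]+)>")
# Assumed house convention for non-speech sounds, e.g. [door slams]
SOUND_CUE = re.compile(r"\[([^\]]+)\]")


def summarize_captions(vtt_text: str) -> dict:
    """Collect speaker labels and sound descriptions for human review."""
    speakers = set()
    sounds = []
    unlabeled_cues = 0

    # Cues in a WebVTT file are separated by blank lines.
    for block in vtt_text.strip().split("\n\n"):
        lines = block.splitlines()
        timing = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing is None:
            continue  # header or note block, not a caption cue

        payload = " ".join(lines[timing + 1:])
        found_speakers = [name.strip() for name in VOICE_TAG.findall(payload)]
        speakers.update(found_speakers)
        sounds.extend(SOUND_CUE.findall(payload))

        # A cue with spoken text but no voice tag may need a speaker label.
        spoken_text = SOUND_CUE.sub("", payload).strip()
        if spoken_text and not found_speakers:
            unlabeled_cues += 1

    return {
        "speakers": sorted(speakers),
        "sounds": sounds,
        "unlabeled_cues": unlabeled_cues,
    }


# Hypothetical sample file contents.
sample = """WEBVTT

00:00:01.000 --> 00:00:04.000
<v Maria>Welcome back to the show.

00:00:04.500 --> 00:00:06.000
[upbeat music]

00:00:06.500 --> 00:00:09.000
Thanks for having me.
"""

report = summarize_captions(sample)
print(report)
```

In this sample, the third cue has dialogue but no speaker label, so it’s counted as unlabeled. Whether that’s a problem depends on the video: a single on-screen speaker may not need a label, which is why a person should make the final call.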
If you’d like guidance for building a web accessibility initiative, we’re here to help. Contact the Bureau of Internet Accessibility for a free consultation with an accessibility expert or get started with a free automated scan powered by AudioEye.
