It’s pretty new…
Two things seem to affect it: 1. jargon (not surprising), and 2. reverb (warehouse echo, etc.). In my experience, accuracy has ranged anywhere from 40% to 65%.
A small issue is that each transcribed word is tied directly to a frame position in the clip, so if you go in to correct text, you need to do it word-by-word. If you select three misinterpreted words in a row and retype them, all three words are now tied to the clip position of the first word, which wipes out the corresponding markers for the second and third words.
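To picture what is going on, here is a rough Python sketch of the behavior described above. It is not Adobe's data model or API; the `Word` class, the `frame` field, both function names, and the sample words and frame numbers are all made up for illustration. It only assumes that each word carries one frame position and that retyping a multi-word selection hands every new word the first word's position.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Word:
    text: str
    frame: int  # hypothetical: clip frame where the word starts


def retype_selection(words: List[Word], start: int, end: int,
                     new_texts: List[str]) -> List[Word]:
    """Retype words[start:end] as one selection.

    Assumed behavior: every retyped word inherits the frame of the
    first selected word, so the later markers are lost.
    """
    first_frame = words[start].frame
    retyped = [Word(text, first_frame) for text in new_texts]
    return words[:start] + retyped + words[end:]


def retype_word_by_word(words: List[Word], start: int,
                        new_texts: List[str]) -> List[Word]:
    """Correct one word at a time, so each word keeps its own frame."""
    fixed = list(words)
    for offset, text in enumerate(new_texts):
        fixed[start + offset] = replace(fixed[start + offset], text=text)
    return fixed


# Invented sample: three misheard words followed by one correct word.
transcript = [Word("sat", 0), Word("though", 10),
              Word("wide", 18), Word("balance", 30)]
correction = ["set", "the", "white"]

collapsed = retype_selection(transcript, 0, 3, correction)
careful = retype_word_by_word(transcript, 0, correction)

print([w.frame for w in collapsed])  # frames after retyping all three at once
print([w.frame for w in careful])    # frames after fixing one word at a time
```

In this sketch both approaches end up with the same corrected text. The only difference is whether the second and third words still point at their own spot in the clip.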
It does seem to be more accurate when you don’t ask it to identify speakers (to tell different people’s voices apart). I haven’t found it to work well at detecting where a new speaker starts in any but the cleanest of sound environments.
This is one feature that I’m betting will be improved frequently in Adobe’s upcoming patches. Once it’s more robust, it will save us all a ton of time.
TimK,
Director, Consultant
Kolb Productions,
CPO, Digieffects