Speech to text

Andy Field replied 9 years, 1 month ago 16 Members · 40 Replies

Andrew Kimery
November 25, 2014 at 9:24 pm

[Oliver Peters] “I’ve done a lot of interview-based docs, web videos and marketing pieces and almost never have a transcript. I work with the material in front of me. I’ve found that transcripts can be a false aid, since oftentimes things that look like they would edit together well have the wrong inflections, so I prefer to work without transcripts.”

I agree that doing a paper edit from a transcript can sometimes lead to impossible scenarios, but I hate not having a transcript because I can speed through a transcript much faster than I can watch an interview.
Oliver Peters
November 26, 2014 at 1:05 am

[Andrew Kimery] “because I can speed through a transcript much faster than watching an interview.”

I believe there is absolutely no substitute for watching the interview in full. As an editor I find it essential in getting the right feel for the subject. There are often things said, where the emotion is more important than the actual words. You don’t get that from transcripts and can’t shortcut the process.

– Oliver

Oliver Peters Post Production Services, LLC
Orlando, FL
http://www.oliverpeters.com
Neil Goodman
November 26, 2014 at 1:41 am

[Oliver Peters] “I believe there is absolutely no substitute for watching the interview in full. As an editor I find it essential in getting the right feel for the subject. There are often things said, where the emotion is more important than the actual words. You don’t get that from transcripts and can’t shortcut the process.”

I agree; paper cuts made from transcripts hardly ever work because of expression and connotation. When a producer hands me one, I always ask if they watched the interview. Usually they don’t 🙁
Andy Field
November 26, 2014 at 1:50 am

Absolutely agree with Oliver… I do documentary and news work, and you make the connections and solve the puzzle by logging and transcribing everything. The act of doing that helps me write the piece as a producer/director/editor.
Aindreas Gallagher
November 26, 2014 at 10:26 pm

[Oliver Peters] “I’ve found that transcripts can be a false aid,”

I did a thing with a ton of interviews for a health insurer. It’s quick and dirty, but as long as they’re answering clear (maybe shared) questions, this can work pretty well: string out and chop each of the IVs on its own sequence, with each response getting a quick clapperboard top carrying roughly written bullet notes for the answer that follows. Doing the CliffsNotes top for each answer can drill it in a bit, and putting the text slug notes on V2 lets you jump-scan through each IV pretty quickly later. It’s maybe better than sub-clipping that way.

It really helps if the gig is small-scale enough that the director/producer is invested enough in the result to sit in for the process –
it’s paired brain training as much as anything. Not applicable to a Ken Burns doco, like…

https://vimeo.com/user1590967/videos http://www.ogallchoir.net promo producer/editor.grading/motion graphics
Mark Suszko
December 1, 2014 at 5:59 pm

While I agree we’re not too far off from perfect automatic machine transcription of audio bites, we’re not there today, as evidenced by whatever system YouTube is using. I’ve found some really egregious examples here and there of bad machine transcription and am currently editing the best/worst of these attempts into something I call YouTube Haiku translation poetry.

It was inspired by turning on the captioning for a clip on doing a specific plumbing repair. YT turned the man’s slight Canadian accent into garbled passages that read like Zen koans.

In just 5 years or less, this will be greatly improved, as we’re heading into another level of available processing power, thanks to the stimulus of Big Data projects like the Square Kilometer Array and the Human Brain Project. I look forward to having smartphones reliably translate speech to text for the hard of hearing, but also to use their cameras and machine vision to translate sign language back to text or synthetic speech in real time as well.

Meanwhile, I have an idea of how I want to do captioning in FCPX for my shop. It would require me to vocally repeat the program audio in my own voice, into a voice rec program like Dragon or the Mac’s own voice rec. That would generate a raw text file, which I’d then process through MacCaption and probably Compressor. Is anybody you know of doing it that way already? It can’t be that original an approach. By re-speaking the dialogue, I’m hoping the voice recognition works better, because it will be trained to just my single voice, in more isolated audio, than it would by decoding the actual wild tracks with their background noise and other distractions/interference.

ANYTHING to avoid typing transcripts or paying for expensive transcription services.

Also, does FCPX output a legal embedded closed caption track within a broadcast codec on its own at this point, or not?
John Rofrano
December 2, 2014 at 12:47 pm

[Mark Suszko] “Also, does FCPX output a legal embedded closed caption track within a broadcast codec on its own at this point, or not?”

That’s what I wanted to know as well. I don’t think so. The word “caption” brings zero hits in the FCP X help file. I believe you need to add the captions with Compressor and an SRT file as far as I can tell. I’d love to be proven wrong.
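For anyone assembling that SRT file by hand from a raw text dump, here is a minimal sketch of the format in Python. It assumes you already have each caption's start time, end time (in seconds) and text; the function names and the sample cues are made up for illustration, and nothing here is specific to Compressor or FCP X.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Build SRT text from (start_seconds, end_seconds, text) tuples.

    Each cue becomes a numbered block: index, time range, caption text.
    Blocks are separated by a blank line.
    """
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical cues, purely for illustration.
    sample = [
        (0.0, 2.5, "First, shut off the water supply."),
        (2.5, 6.0, "Then loosen the compression nut."),
    ]
    print(to_srt(sample))
```

The timing still has to come from somewhere (a captioning tool or a manual pass), so this only covers the last step of writing the file out.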

~jr

http://www.johnrofrano.com
http://www.vasst.com
David Roth Weiss
December 3, 2014 at 8:33 pm

[Oliver Peters] “I’ve done a lot of interview-based docs, web videos and marketing pieces and almost never have a transcript. I work with the material in front of me. I’ve found that transcripts can be a false aid, since oftentimes things that look like they would edit together well have the wrong inflections, so I prefer to work without transcripts. OTOH, transcripts (with a way to match locations in the media) can be a great help, when the producer says, “What about statement XYZ? I seem to recall them saying that.” If they can give you a way to find it based on the transcript, then it’s easy to call up and review.”

Hey Oliver, after 40 years of making long-form docos, all with hundreds of hours of interviews, I can assure you that having time-coded transcripts is a huge timesaver in post, because it allows the editor to take advantage of the non-linear random access functionality of their NLE. While I do agree with you that transcripts are most certainly imperfect on their own without first correlating them with the source material, once that step is done the post process truly becomes non-linear, and thus much faster.

When the printed transcripts are returned from the transcriptionist, the best practice is to play back the material while following along in the transcript (this is close to realtime and is essentially linear functionality). I use a highlighter to mark the best soundbites on the printed pages, and I insert markers in whatever NLE I’m using – and, on that initial pass I can usually, but not always, discern which inflections cut together and which will not (sometimes you just have to try an audio cut to be sure).

There are innumerable advantages to the method above, but the primary one is that once you’ve correlated the transcripts with the interview in your NLE (i.e. linear), you can quickly jump at high speed (i.e. random access) to any point you’ve previously marked. Now you’re using your NLE as it was designed, as a truly non-linear random access device.
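To illustrate the "random access" idea outside of any particular NLE, here is a rough Python sketch of a marker list keyed by timecode that can be searched by phrase. Everything in it is an assumption for illustration: the Marker class, the function names, the sample notes, and the 30 fps non-drop-frame timecode. It is not how any NLE stores markers.

```python
from dataclasses import dataclass


@dataclass
class Marker:
    timecode: str  # "HH:MM:SS:FF", assumed non-drop-frame
    text: str      # the soundbite or note logged at that point


def timecode_to_frames(tc: str, fps: int = 30) -> int:
    """Convert a non-drop-frame timecode to an absolute frame count."""
    hh, mm, ss, ff = (int(part) for part in tc.split(":"))
    return ((hh * 60 + mm) * 60 + ss) * fps + ff


def find_markers(markers, phrase, fps: int = 30):
    """Return markers whose text contains the phrase, in timeline order."""
    needle = phrase.lower()
    hits = [m for m in markers if needle in m.text.lower()]
    return sorted(hits, key=lambda m: timecode_to_frames(m.timecode, fps))


if __name__ == "__main__":
    # Hypothetical log entries from a first linear pass.
    log = [
        Marker("01:12:40:10", "Talks about the premium increase, strong read"),
        Marker("01:03:05:00", "First mention of premium, flat inflection"),
        Marker("01:20:00:00", "Closing thought on family"),
    ]
    for hit in find_markers(log, "premium"):
        print(hit.timecode, "-", hit.text)
```

The linear pass is where the notes get written; after that, a producer's "I seem to recall them saying that" turns into a text search and a jump to a timecode.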

In addition, everyone working on the project, from the secretary up to the Executive Producer, can have their own copy of the timecoded and marked transcript to refer to, even if they are in the field and don’t have a computer nearby, meaning that everyone is “always on the same page,” both literally and figuratively. This can be a huge advantage to almost every department involved…

Does this make sense?

David Roth Weiss
Director/Editor/Colorist
David Weiss Productions

David is a Creative COW contributing editor and a forum host of the Apple Final Cut Pro forum.
Oliver Peters
December 3, 2014 at 11:43 pm

[David Roth Weiss] “Does this make sense?”

Sure. I just never found transcripts to be all that helpful to me personally in shaping the story. For the rest of the process, a bit more. Especially when you need to go back for an alternate dive. One of the things I liked about FCP 7 – and that I sorely miss in X – was the extensive use of custom notes columns, as well as how marker text was identified on the timeline. When I would break down interviews in FCP 7, I would add lengthy text to each marker. Right-clicking the timeline exposed a pulldown of all the marker text in a submenu. Quite handy.

– Oliver

Oliver Peters Post Production Services, LLC
Orlando, FL
http://www.oliverpeters.com
Walter Soyka
December 3, 2014 at 11:47 pm

[Oliver Peters] “When I would break down interviews in FCP 7, I would add lengthy text to each marker. Right-clicking the timeline exposed a pulldown of all the marker text in a submenu. Quite handy.”

Oliver, that sounds an awful lot like metadata… are you sure you were doing such a thing back before 2011?

Walter Soyka
Designer & Mad Scientist at Keen Live [link]
Motion Graphics, Widescreen Events, Presentation Design, and Consulting
@keenlive | RenderBreak [blog] | Profile [LinkedIn]
