Creative Communities of the World Forums


  • Voice over sync workflow – Any VO-artists here?

    Posted by Johannes Schwarz on December 23, 2012 at 11:23 am

    Hi everyone,

    here is the thing…

    WHAT I HAVE:
    I have a (huge) animation project (70 episodes = 4 hours total) and 3 months left on it (then I leave for a 14-month vacation, yay).
    I have the script in 7 languages. The animation is made and cut to correspond to the original language.
    The other 6 languages need to be recorded by voice talents and synchronized to correspond to what is happening in the animation. There is no lip syncing. It is an educational project where elements may also be static for 20 secs or so. But still the voice needs to correspond to what is on screen (especially when it cuts to a new scene). Given the perceived speed of Spanish speakers, for example, there is no way a mere reading of the script would make the languages line up in the end. There is only so much you can do by inserting pauses – so I need the speaker to take the right speed into account while reading.

    WHAT I NEED IN THE END
    For me the best thing naturally would be getting 70 finished, cut sound files per language that I can simply add as audio tracks to the project.

    HOW DO I GET THERE?
    So… what would be the general production workflow here? What do producers provide their voice talents with so that they can do their magic and sync the audio as they record their voice? Obviously they need the video, but:

    a) Would they use a “naked video” (just the video, really), keeping it in the corner of their eye as they read from a script (a possible indication that voice talents have super powers)?

    b) Would they need the video with timed subtitles, karaoke style? (Man, generating 4 hours of subs, 6 times over, would kill my schedule.)

    c) Would they read from the script but get flashy cue signals, like a number blinking on the video that corresponds to the paragraph number they should be on in the text?

    d) Would they just go back and forth in a scene, trying to match their voice to the things in the video, and only move on to the next scene when they have nailed it (later editing the audio, and earning every penny I shell out)?

    e) the unthinkable…

    If any of you have worked on either end of such a project, any info on how this is commonly and successfully done is very much appreciated.
    If there is more than one way that is acceptable, I’m looking not for the cheapest but for the one that conserves the most of my(!) time – i.e. I’m willing to pay more to voice talents or third parties even, if it leaves me with more time to devote to other aspects of the project. For my vacation is non-negotiable 🙂

    Greetings,
    Johannes

  • 18 Replies
  • Ty Ford

    December 23, 2012 at 2:08 pm

    Hello Johannes and welcome to the Cow Audio Forum.

    I have “read to pix” before as a narrator. This was VO so lip sync was not an issue. I can’t imagine that lip sync will work out in any way with your project. If I’m missing something please let me know.

    The producer in another project had timed out the scenes, but forgot to include some breathing space at the beginning and/or end. That resulted in the VO being too close together at those start/stop points.

    So if the scene is, say, 20 seconds, consider having the VO be 19 seconds to allow half a second gap at each end, if that will be enough.
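Editor's sketch of the breathing-space arithmetic above, assuming the same half-second gap at both ends of every scene. The function name and the sample scene lengths are made up for illustration; only the 20-second scene and 19-second read come from the post.

```python
def vo_target_seconds(scene_seconds, gap_seconds=0.5):
    """Longest VO read that still leaves a gap at both ends of a scene."""
    target = scene_seconds - 2 * gap_seconds
    if target <= 0:
        raise ValueError("scene is too short for the requested gaps")
    return target


# Hypothetical scene lengths (seconds) for one episode.
scene_lengths = [20.0, 12.5, 8.0]
targets = [vo_target_seconds(s) for s in scene_lengths]

for scene, target in zip(scene_lengths, targets):
    print(f"scene {scene:5.1f} s -> read in at most {target:5.1f} s")
```

Handing the talent the target column rather than the raw scene length is one way to keep the gap from being forgotten; whether half a second is enough is a judgment call per scene.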

    Doing the process with audio and video will allow you to make sure the pacing is not altered.

    How long is your piece?

    Regards,

    Ty Ford
    Cow Audio Forum Leader
    (and narrator)

    Want better production audio?: Ty Ford’s Audio Bootcamp Field Guide
    Ty Ford Blog: Ty Ford’s Blog

  • Richard Crowley

    December 23, 2012 at 2:50 pm

    When I did something like that decades ago in the pre-digital era, I found (rather easily) one language that was perpetually longer than the others (German, IIRC). Because all the languages had to run concurrently (selected by the viewer during exhibition), I cut the visuals to the longest language and then slotted in the other languages (with the resulting gaps). We recorded the language VO tracks “wild” since (like your project) it wasn’t “sync”.
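Editor's sketch of the "cut to the longest language" idea above. It assumes the VO for every language has already been recorded wild and measured per scene; the language names and durations below are invented sample data, and `cut_to_longest` is a hypothetical helper name.

```python
def cut_to_longest(durations_by_language):
    """Take {language: [seconds per scene]} and return (scene_lengths, gaps).

    scene_lengths[i] is the longest read of scene i across all languages;
    gaps[language][i] is the silence left over for that language in scene i.
    """
    scene_counts = {len(d) for d in durations_by_language.values()}
    if len(scene_counts) != 1:
        raise ValueError("every language needs the same number of scenes")
    scene_lengths = [max(col) for col in zip(*durations_by_language.values())]
    gaps = {
        lang: [round(longest - d, 3) for longest, d in zip(scene_lengths, durs)]
        for lang, durs in durations_by_language.items()
    }
    return scene_lengths, gaps


# Invented measurements (seconds per scene) for a two-scene episode.
recorded = {
    "German": [21.0, 14.0],
    "English": [18.5, 15.0],
    "Spanish": [19.0, 12.0],
}
scene_lengths, gaps = cut_to_longest(recorded)
print(scene_lengths)
print(gaps)
```

Note that the longest language can differ from scene to scene, so the maximum is taken per scene rather than per language.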

  • Johannes Schwarz

    December 23, 2012 at 4:23 pm

    Hi Richard,

    I see you are on dvx as well as on cow 🙂
    Thanks for your reply. In fact the base language of my project is German and I too predict it to be the longest, although with German, Italian, French, English, Portuguese, Spanish and Polish all in the mix, I might be in for a surprise here and there.

    So if this is the way to go, then I might send my “cut-to-German” videos over to the VO talent (who have their own studio) and let them record, then cut and fill in.

    Greetings,
    Johannes

  • Johannes Schwarz

    December 23, 2012 at 4:30 pm

    Hi Ty,

    thanks for the reply. No, I don’t need lip syncing either, just VO.
    The remark about breathing space is a helpful one. I guess, given that German is usually one of the longer versions, I might cut the video to that and send it along to the VO talent for timing.

    Generally the episodes are about 3 minutes long, with 10-15 scenes each.
    It’s just that there are 70 episodes in 7 languages, which makes all of this a rather huge thing to put together.

  • Ty Ford

    December 23, 2012 at 6:29 pm

    Johannes,

    Yes, I think you’re right.

    I did the VO in English and had two other VO people come in to do French and Spanish for a DVD about glaciers. It was an hour-long show. I budgeted an 8-hour day for the record session for each language. After each page, when I went back to do the rough edits, I let the talent listen to make sure they didn’t misspeak, because I don’t speak Spanish or French.

    Not as big a project as yours on the one hand, but plenty to do.

    Regards,

    Ty Ford
    Cow Audio Forum Leader


  • Richard Crowley

    December 26, 2012 at 1:30 am

    If it isn’t “sync”, it is not clear to me why the voice talent even needs to see the video (or even know the timing, for that matter)? The simple solution would seem to be to have them just do them as a simple narration/voice-over. Does it affect the voice performance to see (or “sync”) the visual?

  • Johannes Schwarz

    December 26, 2012 at 7:54 am

    Hi Richard,

    well it isn’t lip sync-ing, which would indeed be very demanding on both the translators and the voice talent. But there are no characters talking.

    What I do have however are hands pointing. So the VO talent needs to say “like these two examples over here” at the time in the animation the hand is showing. So I need some kind of timing. If the hand comes in at 34 seconds and the speaker has taken 39 seconds to say things up to that point, then I can’t really fix it (without degrading audio by speeding it up while preserving the pitch). If it took him only 30 seconds I could add some pauses maybe but even that could be a little unnatural.
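Editor's sketch of the timing check described above, using the 34-second cue with 39-second and 30-second reads from the post. The quarter-second tolerance and the function name `cue_fix` are assumptions for illustration, not an industry rule.

```python
def cue_fix(cue_seconds, spoken_seconds, tolerance=0.25):
    """Compare where the picture cue falls with how long the read took to reach it.

    Returns a (kind, amount) pair:
      ("ok", 0.0)          read lands within the tolerance
      ("pad", seconds)     read is early; this much silence would be needed
      ("speed_up", factor) read is late; audio would have to play this much faster
    """
    if cue_seconds <= 0:
        raise ValueError("cue time must be positive")
    drift = spoken_seconds - cue_seconds
    if abs(drift) <= tolerance:
        return ("ok", 0.0)
    if drift < 0:
        return ("pad", -drift)
    return ("speed_up", spoken_seconds / cue_seconds)


# The two cases from the post: hand appears at 34 s.
for spoken in (39, 30):
    kind, amount = cue_fix(34, spoken)
    print(f"read reaches the cue line at {spoken} s: {kind} {amount:.2f}")
```

A 39-second read against a 34-second cue works out to roughly a 15% speed-up, which is the kind of stretch the post expects to degrade the audio; that is why the read itself has to be paced to picture.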

    That is why I need sync-ing.

  • Ty Ford

    December 26, 2012 at 2:48 pm

    Or someone with language skills to rewrite a sentence or two to make the gestures match the audio…

    Regards,

    Ty Ford
    Cow Audio Forum Leader


  • Richard Crowley

    December 26, 2012 at 11:05 pm

    Or slow down (or freeze) the video until the too-long narration “catches up” to the visuals. That is the problem with shooting video that will be “synced” to narrations that haven’t even been translated or recorded yet.

    I would find it awkward to try to translate and/or voice something technical to fit within a certain timing while maintaining the proper grammar for the language and preserving all the necessary detail. That is a judgement call only you can make. But if I were doing this I would be reluctant to tell the translator/announcer: “you must translate all of this content and make it fit into 34 seconds.”

    And especially if you don’t speak the language, you would be wasting a lot of time if it turns out that they had to make cuts to the script to fit your timing that compromised the quality of the content.

  • Jean-christophe Boulay

    January 3, 2013 at 9:14 pm

    Hi Johannes,

    Living and working in a French territory surrounded by English, I work on projects like these every day. The usual way of doing it really is what Richard is reluctant to do. If only everyone was so nice!

    Usually, you cut to the initial language and others have to fit in. The result all depends on your translators. In fact, we call them “adapters” because they do more than just translate sentence-to-sentence. They work to convey the same meaning in the time at their disposal. This can take a better understanding of the languages involved but if the translators are good, they should be able to do this. If some very specific information or turn of phrase absolutely needs to make it into the translation, underlining it is important so the translator can work around it. Asking this of the voice talent themselves might be too much.

    If you’re cutting image to a longer-winded language like German or French, that is already quite a treat for adaptation. More than we’re used to, at least. If total length is not restricted, cutting with some extra breathing space should ensure everyone can get the words in. Providing the translators with precise timing information is very important. Since you’re German, I don’t even know why I’m specifying this.

    Once you have a final cut and an adapted script, all the talent needs is the video to record to. It’s important to provide this so they can alter their reading speed to match without editing, which will sound much more natural and will require less work time. The absolute luxury is to provide burnt-in timecode, which can help some talents in finding cues. No need to have the script on screen or anything, that’s just distracting and they probably know where to put it naturally more than you do anyways. Also, less work time.
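Editor's sketch of a cue sheet that pairs with burnt-in timecode, so the adapters and the talent can look up where each paragraph should start. It assumes non-drop-frame timecode at an integer frame rate; 25 fps is a guess at the project's rate, and the cue times are invented.

```python
def to_timecode(seconds, fps=25):
    """Convert seconds to non-drop-frame HH:MM:SS:FF at an integer frame rate."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    total_frames = round(seconds * fps)
    frames = total_frames % fps
    total_seconds = total_frames // fps
    hours, remainder = divmod(total_seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}:{frames:02d}"


# Invented cue list: (script paragraph number, start time in seconds).
cues = [(1, 0.0), (2, 12.2), (3, 34.0)]
for paragraph, start in cues:
    print(f"paragraph {paragraph}: {to_timecode(start)}")
```

Fractional rates such as 29.97 fps need drop-frame handling, which this sketch deliberately leaves out.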

    If you follow your initial ideas and some of these tips, the project should go pretty smoothly. Much much worse adaptation workflows have yielded seamless results.

    JC Boulay
    Technical Director
    Audio Z
    Montreal, Canada
    http://www.audioz.com
