Creative Communities of the World Forums

The peer to peer support community for media production professionals.

Creative Community Conversations: Speech analysis concept

  • Michael Phillips

    July 28, 2014 at 5:58 pm

    Phonetic-based technology can also perform language identification and does not need a supercomputer. Nexidia already provides language ID in some of its solutions for any of its currently supported languages. From a dialog search, I can search based on Canadian French versus France French, etc.

    Latest release of their QC product includes identifying dialects for certain languages.

    Phonetic language modeling can be done for languages that do not exist.

    Michael

  • Walter Soyka

    July 28, 2014 at 10:31 pm

    [Oliver Peters] “FCP X already analyzes audio waveform patterns. Apple owns speech recognition software in the form of Siri. It would be great to see them enhance the FCP X Event Find bar to include speech detection, much like Avid’s PhraseFind or BorisFX’s Soundbite. Potentially there may be some patent issues, but at least in theory this concept could be built upon technology Apple owns.”

    I was under the impression that Apple does not (currently) own the speech recognition backend of Siri [link].

    Also, I was under the impression that Siri uploads audio recordings and processes them in the cloud — not locally on the handset — and I know that many here are not fond of cloud-based solutions. Would reliance on the cloud be a showstopper here?

    Walter Soyka
    Designer & Mad Scientist at Keen Live [link]
    Motion Graphics, Widescreen Events, Presentation Design, and Consulting
    @keenlive   |   RenderBreak [blog]   |   Profile [LinkedIn]

  • Gabe Strong

    July 29, 2014 at 8:26 am

    Walter,

    That would depend, at least in my own opinion. If said ‘cloud solution’ included a ‘subscription only’ way to pay for the plug-in, it would be a non-starter for me. Just my personal opinion, but I’m not a fan of that way of doing business, and my business decision is not to spend money on subscription-based ‘cloud services’. And it’s not just Adobe’s CC. I recently stopped doing business with Digital Juice when they went to a subscription-based model for their content. I’ve got no problem with various aspects of ‘cloud solutions’. For example, being able to instantly download a program after you purchase it is awesome! And it is ‘green’ (no discs or big manuals and packaging, just a download and a PDF manual). As always, the devil is in the details.

    Gabe Strong
    G-Force Productions
    http://www.gforcevideo.com

  • Walter Soyka

    July 29, 2014 at 1:50 pm

    [Gabe Strong] “That would depend, at least in my own opinion. If said ‘cloud solution’ included a ‘subscription only’ way to pay for the plug in, it would be a non starter for me.”

    I was referring to cloud processing more than cloud pricing.

    If this proposed feature were to work the way Siri works now, all your audio would be uploaded to someone else’s computers, out of your control, for analysis. And no Internet? No audio search.

    Of course, if the usage were so much more demanding (hours of speech instead of seconds of speech), pricing might have to follow processing.

    Walter Soyka

  • Jeremy Garchow

    July 29, 2014 at 2:25 pm

    [Walter Soyka] “If this proposed feature were to work the way Siri works now, all your audio would be uploaded to someone else’s computers, out of your control, for analysis. And no Internet? No audio search.”

    I think there’s an important distinction to make between Siri and Dictation. While, I’m sure, there’s crossover in that Siri is a learning engine for Dictation, there is a difference.

    For example, you can download an “Enhanced Dictation Engine” (a 784 MB download) that enables offline Dictation in Mavericks.

    Siri, on the other hand, needs to look queries up on a network in order to retrieve an answer; Dictation doesn’t necessarily need this same capability.

    I’m sure that all the folks who blab at Siri help Apple develop better Dictation skills.

    You’re right in that in order to get this working the way that Apple would want to use it, and in order to constantly improve this technology, the internets will need to be nearby, but there is already an “offline mode” in place (…you know, the offline mode that you have to download from the internets…such a Catch-22 with this technology).

  • Andrew Kimery

    July 29, 2014 at 3:05 pm

    [Gabe Strong] “For example, being able to instantly download a program after you purchase it is awesome! And is ‘green’ (no discs or big manuals and packaging just”

    I agree that having download-now options can be very convenient, but since this is the FCP X or Not forum I’m going to get all tangential and question the assumption that going the download route is inherently more green. To replace physical things, Apple, Adobe, Netflix, Amazon, etc. have had to build huge facilities brimming with servers and networking equipment, and all of that manufacturing of electronics comes with an environmental cost. For example, trees, if harvested sustainably, are an abundant and renewable resource, whereas the raw materials used in computers (and other electronics) are not, and recycling a plastic disc and paper manual is much easier than recycling electronic waste.

  • David Steiner

    February 19, 2015 at 3:10 pm

    hi,

    As somebody said before, in Avid’s script integration there is a unified view of the screenplay + all the takes right inside, which allows us to see which takes include a given line of dialogue, compare them quickly, pick the best take etc.

    it is true the sync has to be a manual process until the ScriptSync licensing issues are resolved, but this view of takes is still ultra-useful, especially in fiction, when takes have different energy, varying tones, and the editor wants to use all that richness etc.

    does that “script integration view of takes” exist in FCP X or Premiere Pro? I’ve searched the net, it doesn’t seem so… any solutions?

  • Michael Phillips

    February 19, 2015 at 3:27 pm

    Script Based Editing is what the feature was called before the phonetic transcript alignment (Nexidia technology) was added, at which time it was rebranded ScriptSync. While waiting for the licensing issues to be sorted out, script based editing still exists and does provide a unique view of the coverage available to me as an editor and the ability to select line(s) and take(s) for review, etc. I am editing a short right now where I manually synced the takes to the script for just this purpose. I wish Avid would take this view more seriously, as the functionality hasn’t changed since it was first introduced in 1996-7 – other than “hold slated on screen,” and of course the phonetic sync capabilities.

    The phonetic sync is what really brought the feature to the masses as it eliminated the one major pain point of the process – which is syncing. Transcripts can be done faster manually, but dealing with multiple takes takes more time when done by hand.

    But once lined, it is a great way to edit. And no, there is no such thing in other applications although Lightworks did show a script line view back in the 90’s but the Ediflex patent held up. That patent is one that Avid got in a deal when Ediflex went under and Script Based Editing was implemented from a new design. But that patent has now expired (3-4 years ago?) so there is no real reason why we won’t see similar solutions coming to market. Avid does have additional patents which refer to auto-lining using speech technology which covered the use of technologies like Nexidia, but even those are set to expire relatively soon.

    That’s a bit of a history lesson for what should be a quick answer: no other NLEs have it (at this time).

    Michael

  • David Steiner

    February 19, 2015 at 3:31 pm

    thanks… even if it’s bad news 🙂

    script integration seems obvious to me for fiction editing, but apparently it’s not!

  • Oliver Peters

    February 19, 2015 at 3:54 pm

    As an aside, one of the advantages of script-based editing in Avid is the ability to preview a series of different takes at any given dialogue line, back-to-back. A number of editors who don’t use this tool have described a different practice to me. These include William Goldenberg and Kirk Baxter. It goes like this.

    Make a general decision about where you plan to make edits, based on the dialogue lines. Edit a string-out sequence of each line from each take in succession. This means that each line is repeated for as many takes as you feel are good. Then add the next line and repeat. Organize these clips from wide to close-up for each line. Now you have one long timeline with all the options and angles for each line. This gives you a quick way to compare, as well as a sequence to go back to when the director wants to review the coverage for other options.

    Baxter then goes through this sequence and on any of the selects that he likes, he’ll raise the clip to a higher track. Once done with this copy, he’ll delete the non-selected clips and start shaping the scene with the remainder.
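
    As a rough illustration only (not a tool any of these editors use), the string-out ordering and the select-then-delete pass described above could be sketched in Python. The Clip fields, the shot-size labels, and the take names are all hypothetical, made up for the example.

```python
from dataclasses import dataclass

# Hypothetical shot-size ranking, widest first; the labels are assumptions.
SHOT_ORDER = {"wide": 0, "medium": 1, "close-up": 2}


@dataclass(frozen=True)
class Clip:
    line: int   # dialogue line number in the script
    take: str   # take identifier (made-up naming)
    shot: str   # shot size, one of the SHOT_ORDER keys


def build_stringout(clips):
    """Order clips line by line, and wide to close-up within each line."""
    return sorted(clips, key=lambda c: (c.line, SHOT_ORDER[c.shot], c.take))


def keep_selects(stringout, selects):
    """Drop the non-selected clips while preserving the string-out order."""
    return [c for c in stringout if c in selects]


clips = [
    Clip(2, "3A-1", "close-up"),
    Clip(1, "3B-2", "close-up"),
    Clip(1, "3-1", "wide"),
    Clip(2, "3-1", "wide"),
]

# One long sequence with every option for line 1, then every option for line 2.
stringout = build_stringout(clips)

# The clips that would be raised to a higher track as selects.
selects = {Clip(1, "3-1", "wide"), Clip(2, "3A-1", "close-up")}

# What remains on the working copy after the non-selects are deleted.
rough = keep_selects(stringout, selects)
```

    The full string-out stays untouched as the sequence to return to when the director asks about other coverage; only the copy is pruned down to the selects.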

    Not the same as true script-based editing, of course, but still a very viable way to achieve the same result, especially when you don’t have this feature available, like in FCP X.

    – Oliver

    Oliver Peters Post Production Services, LLC
    Orlando, FL
    http://www.oliverpeters.com
