7 Replies Latest reply on May 7, 2010 6:56 AM by Colin Brougham

    Tests with Speech-to-Text transcription in PPro CS5

    Colin Brougham Level 6

      As a follow-up to my earlier thread about the apparently reworked or, at least, enhanced speech transcription capability in PPro CS5, I just conducted a few tests to see if it is indeed any better than what we had come to expect of it in CS4. The results are, well, encouraging...


      QUICK BACKGROUND: I have an approximately two-minute voiceover, verbatim from a script and professionally recorded, that was originally delivered to me as a 192kbps, 44.1kHz MP3. Since transcription won't work on MPEG assets (at least not in the trial, which is what I'm using--the files import, but the transcription "Analyze" button is grayed out), I used Soundbooth to convert it to a 48kHz, 32-bit floating-point WAV. I created a couple of copies so that each one would have its own transcription metadata. I always used the High quality (slower) option.


      I ran the test four times, as follows:

      1. Just the WAV, without a reference script
      2. The WAV with a verbatim reference script as a text document (.txt), with "Script Text Matches Recorded Dialogue" option checked
      3. The WAV with a verbatim reference script as an Adobe Story script document (.astx), with "Script Text Matches Recorded Dialogue" option checked
      4. The original MP3 in Soundbooth CS4


      I've linked to a text document with the results of these tests (unedited), along with the original text from the script, if you're interested in seeing how they stack up.


      As one would imagine, #1 (no script) was... not perfect. However, I think it did a decent enough job for most purposes. Granted, this was a professionally recorded voiceover with no competing background noise, but it was still marginally useful. What's funny to note is that the transcription process seems to trip up every now and again, and then it seems to lose its momentum for a while. It'll bobble a "difficult" word, and then for half a dozen words following, it's all over the place. Once it regains its footing, though, it'll be pretty much spot on for a sentence or so. Unfortunately, this pattern repeats, but again, I think you could make use of it.


      For #2, the transcription was LEAGUES better... but still not perfect. That part is bewildering to me. The transcription generated a couple of weird passages--for example, "Enjoy" became "and Gillian"--which I could understand if there were NO reference script. However, my impression of how the reference script should work, particularly if the "Script Text Matches Recorded Dialogue" option is activated, is that the transcription should basically copy and paste the reference script into the metadata, NOT invent words and phrases that don't exist in either the script or the dialogue/speech. The process actually preserved capitalization (though punctuation was discarded), so it is definitely using that reference script; there is no way to discern capitalization from the spoken word.


      I thought that maybe, just maybe, I could eliminate the anomalies by using Adobe's own script format, generated by the new Adobe Story--hence test #3. I copied and pasted the text into a new Story script, downloaded the ASTX file, and used that as a reference script as above. Unfortunately, it fared no better than the text document (I suppose this makes sense, but I was hoping...), but it was still about 98+% accurate. I'm reasonably satisfied.


      For giggles, I tried test #4, where I used Soundbooth CS4 to transcribe the original MP3 file. Since CS4 doesn't have the reference script capability, this would more or less pit the CS4 transcription engine against the CS5 one. As expected, it turned the dialogue into textual chop suey, BUT... it actually did BETTER than CS5 without a reference script on some phrases! Observe:


        • Original script: Our Club is YOUR CLUB.
        • CS5, no script: our love is your laugh
        • CS4, no script: our club is your class


      Mish-mash that only Mad Libs could love! For the record, the tests that used the reference scripts were spot-on, capitalization included.


      So, I'm feeling better about using this for the massive documentary and archival project I'm about to embark upon. I have human-typed transcripts for most of the interviews in this project--over 100 interviews, constituting some 200-plus hours--and keeping that text with the footage is going to be a great thing. There are some flaky things--I wish the punctuation could be preserved--but all in all, the reference script addition seems to make this feature much more plausible in a working production environment.

        • 1. Re: Tests with Speech-to-Text transcription in PPro CS5
          Curt Wrigley Level 4

          Did you link the Adobe Story script through OnLocation? For some reason, this is required (to my knowledge) to get an exact match.


          When you import an Adobe Story script into OnLocation, it creates a shot list that is linked to the script. You then either shoot footage to fill the shot list OR link existing assets to the shot list. This creates the linkage between the Adobe Story script and the shot list.


          If you just add it in Premiere, I think it is the same as just linking a text file. But if you have ONE asset to ONE script, perhaps you will get the same results anyway. In OnLocation, you can link multiple assets to one script.


          Then all this metadata comes into Pr with the script attached.


          Some more details pasted from Help:


          Improve speech analysis with Adobe Story, OnLocation, and Adobe Premiere Pro

          You can use Adobe Story, OnLocation, and Adobe Premiere Pro to create the most accurate speech analysis. Import a script written in Adobe Story into OnLocation. OnLocation produces a list of shot placeholders for each scene. Either record these shots using OnLocation during production, or link the placeholder shots to their respective video files when you import the video files into OnLocation. In either case, OnLocation embeds the text for each shot from the original script into the metadata of the shot.

          When you import the clips into Adobe Premiere Pro, it automatically uses the Adobe Story script as a reference script. When Adobe Premiere Pro finds enough matches with the embedded script, Adobe Premiere Pro replaces the analyzed speech text with the embedded script text. Adobe Premiere Pro carries over correct spelling, proper names, and punctuation from the reference script, benefits that standard speech analysis cannot provide.

          The closeness of the match between the embedded script text and the recorded dialog determines the accuracy of matched-script text. If 100% accuracy is important, edit and revise the script text first. Ensure that the script matches the recorded dialog before using it as a reference script.

          • 2. Re: Tests with Speech-to-Text transcription in PPro CS5
            Colin Brougham Level 6

            Thanks, Curt.


            No, didn't try OnLo; just linked up the ASTX script file in PPro. I'll try that, though, and see what happens....



            • 3. Re: Tests with Speech-to-Text transcription in PPro CS5
              Curt Wrigley Level 4

              Here is a video demo of OnLocation that shows how this is intended to be used. It should achieve a very high match rate with this workflow.



              • 4. Re: Tests with Speech-to-Text transcription in PPro CS5
                Colin Brougham Level 6



                I've not been able to get this to work correctly. I'm able to create the script, export it and bring it into OnLo, assign/link the clip (I'm using a cooked-up clip, not a camera clip, so maybe that's the problem), and then bring the OnLo clips into PPro. However, the ASTX script does not seem to come along for the ride, for whatever reason.


                I'll try this tomorrow with some of the actual footage from the project: BetaSP captured to QT DV MOVs at the moment. I wonder if I'm just not using Story correctly... the clips (actually, clip--I'm putting all the copy under one clip) come into OnLo, but I never see the copy.


                Basically, I'm pretty happy with how the speech transcription did using just a regular ol' text file--I think it only botched about three or four words--so if this is just going to add a lot of futzing, I'll skip it, and call it good enough. However, if I can preserve the punctuation and sentence structure... hmmm...

                • 5. Re: Tests with Speech-to-Text transcription in PPro CS5
                  Colin Brougham Level 6

                  Aww, yeah baby. That's too freakin' cool for school! I don't know what was going on, but the fifteenth time was the charm. The transcript--well, you can hardly call it that any more--is basically the equivalent of copying and pasting the text (with punctuation, capitalization, sentence structure, and so on) into the metadata of the clip. It's pretty amazing, I've got to say.


                  Now, the task is to figure out how to make this a sane workflow with this amount of material to process...



                  • 6. Re: Tests with Speech-to-Text transcription in PPro CS5
                    Curt Wrigley Level 4

                    Excellent! I wish they had an Adobe Story workflow that didn't involve OnLocation, myself, but I do see the design idea behind it. You might be able to get a clerk-type person to connect up all your transcripts in Story/OnLocation; then you just bang them out like magic.


                    This really is a huge new feature, but one that not many folks appreciate.

                    • 7. Re: Tests with Speech-to-Text transcription in PPro CS5
                      Colin Brougham Level 6

                      Thanks again for the direction on this, Curt. I agree that, for anyone doing longform documentary or narrative work, this feature is going to be a huge timesaver.


                      The necessity of using both Story and OnLocation to get a 100% matching transcript is a bit dopey. Obviously, I realize why Adobe does it: they want you locked into the ecosystem. But for a project like mine, where I already have full-text transcripts in another format, and with the sheer amount of material that needs to be processed, this feels like too many needless steps. Unfortunately, I'm sure a feature request is not going to sway the powers that be toward implementing full-text transcription from an external source. I guess I need to decide whether it's worth the extra time and effort to retrofit all of the transcripts for the added benefit of 100% matching transcription, or if "close enough" is "good enough".


                      The good news, for Adobe anyway, is that the trial has convinced me that updating to CS5 is a good idea.