7 Replies Latest reply on May 6, 2010 8:30 AM by Curt Wrigley

    Speech transcription, metadata, and a massive amount of footage (CS5)

    Colin Brougham Level 6

      I've been charged with developing a workflow and working environment that will capture, process, preserve and output the content of over 450 BetaSP tapes. The footage was and continues to be shot for a national documentary project that is still rolling along. Since much of the footage is truly irreplaceable (e.g. interviews with persons no longer living), and since it documents a movement of great importance, trying to save this content while simultaneously making it useable is now mission critical.

       

      I'm still trying to decide if Premiere Pro et al. is appropriate for this project, but the one part that is most intriguing to me is our good old friend from CS4: Speech-to-Text transcription. Actually, I'm using this as one of the major selling points to the producer, so I'm hoping that what I want to do can work. Yes, we know that the transcription in CS4 was abysmal, at best, but I'm optimistic that it's been improved in CS5. It appears that Adobe added the ability to use a reference script when doing the speech analysis and transcription; this is fortunate, as most, if not all, of these interviews (over 125 of them, to date) have some sort of text document transcription.

       

      What I want to be able to do is capture the tapes wholesale, perform the speech transcription with the reference transcripts, and then winnow the long clips down to chunks of useable material. From there, these will be used to generate the documentary and other programming, develop DVDs of un-cut interviews (with subtitles), and eventually create an online, keyword-searchable library of the material. Yes, lots to do...

       

      I've looked at a few of the brief tutorials and videos for this feature, and they always seem to focus on narrative filmmaking, which is different enough from what I'm trying to do that I can't be sold on whiz-bang videos alone. Granted, I would never trust this to work with CS4, but CS5 looks a bit more hopeful.

       

      Has anyone played with this feature or employed a similar workflow long enough to comment on the feasibility of such a project with CS5? By the way, I'm probably going to be posting over in the hardware forum about building a couple of workstations and an external media storage system for this project, so if you're interested, check over there. Frankly, I'm not too optimistic about PPro being reliable enough to handle what I'm attempting, and am seriously considering ditching it for Avid, but if I can leverage the speech transcripts and metadata, I could be convinced to put my faith in PPro.

       

      Your thoughts are welcome...

        • 1. Re: Speech transcription, metadata, and a massive amount of footage (CS5)
          joshtownsend Level 2

          The speech transcription does work a lot better in CS5, but it cannot hold a candle to the powerhouse that is Avid. Maybe download the trial and see how it works for ya.

          • 2. Re: Speech transcription, metadata, and a massive amount of footage (CS5)
            Colin Brougham Level 6

            Well, I've always been a huge fan of Avid's media management. I realize it's somewhat archaic by today's NLE standards (though AMA looks like it has gone a long way toward modernizing that), but there is gobs of power there.

             

            Avid has Script Sync, too--but my understanding of it is that it's for post-production specifically. I'm hoping to be able to link up already-generated human transcripts with the computer-generated transcripts, and then maintain that data/metadata right through output to whatever media (as well as for post use). I'm less concerned with the quality of the transcript (as the reference script will guide that, as I understand it), and more with the flow of that data right through the process.

             

            I guess I'll try out the trial--I did download it, but didn't want to smurf my working installs. I'm still Avid lusting...

            • 3. Re: Speech transcription, metadata, and a massive amount of footage (CS5)
              Colin Brougham Level 6

              Bumpity bump. Still interested in more perspectives here... or is no one really using PPro for longform documentary work?

               

              Thanks!

              • 4. Re: Speech transcription, metadata, and a massive amount of footage (CS5)
                Curt Wrigley Level 4

                Speech-to-text success is highly dependent on the source, so it may be almost perfect for some and terrible for others. It is generally improved in CS5. The big improvement, which can make it almost 100% accurate, is a matching script to compare to. But if you don't have scripts, that doesn't help you.

                 

                Another new feature in CS5 is that the text (which, as you know, is connected temporally as metadata) is now transferred and usable if you output to Web in Encore. That means the speech-to-text transcription becomes web searchable simply by encoding to "web DVD" in Encore.

                • 5. Re: Speech transcription, metadata, and a massive amount of footage (CS5)
                  Curt Wrigley Level 4

                  Thought of one more feature which I believe is new in CS5.   You can set in and out points in the metadata panel.  In other words, you can cut the project using the text of the speech rather than the audio.  May or may not be helpful.
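To make the idea of cutting from the text concrete, here is a minimal sketch in Python. It is not Premiere's or Encore's API; it only assumes a transcript has already been exported as words with start and end times. The `Word` class, the `find_keyword` function, the `pad` parameter, and the sample data are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with its position in the clip (hypothetical format)."""
    text: str
    start: float  # seconds from the start of the clip
    end: float    # seconds from the start of the clip


def find_keyword(words, keyword, pad=2.0):
    """Return a list of (in_point, out_point) pairs, one per keyword hit.

    Each pair is padded by `pad` seconds on both sides so the cut
    does not begin or end exactly on the spoken word.
    """
    key = keyword.lower()
    hits = []
    for w in words:
        # Compare case-insensitively and ignore surrounding punctuation.
        if w.text.lower().strip(".,;:!?\"'") == key:
            hits.append((max(0.0, w.start - pad), w.end + pad))
    return hits


# Made-up sample data standing in for an exported transcript.
transcript = [
    Word("This", 11.5, 12.0),
    Word("was", 12.0, 12.5),
    Word("the", 12.0, 12.5),
    Word("movement.", 12.5, 13.5),
]

print(find_keyword(transcript, "movement"))
```

The same lookup, run across every clip in a library, is roughly what a keyword-searchable archive needs: a word, the clip it came from, and the in and out points around it.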

                  • 6. Re: Speech transcription, metadata, and a massive amount of footage (CS5)
                    Colin Brougham Level 6

                    Speech-to-text success is highly dependent on the source, so it may be almost perfect for some and terrible for others. It is generally improved in CS5. The big improvement, which can make it almost 100% accurate, is a matching script to compare to. But if you don't have scripts, that doesn't help you.

                    This is true. For the most part, the interviews are in quiet, controlled settings, using professional equipment, so the audio is clean. Whether the person speaks clearly is sometimes another matter. Since I have transcripts for everything (or most everything), constituting thousands of pages, the script matching sounds phenomenal, in theory. I guess my biggest fear was how Premiere actually copes with the added metadata; I actually had fairly good results in many instances using speech transcription in CS4, but for some reason, those transcribed clips would then cause PPro to get all wobbly and weird. Given that I'm likely going to be working with uncompressed 8-bit media (this is for long-term archival and retrieval), I don't want any unnecessary flakiness.

                     

                    Another new feature in CS5 is that the text (which, as you know, is connected temporally as metadata) is now transferred and usable if you output to Web in Encore. That means the speech-to-text transcription becomes web searchable simply by encoding to "web DVD" in Encore.

                    This is a fascinating aspect for me, and the brief overviews I've seen look extraordinarily interesting. A big part of this project is democratizing this content and making it accessible to professional media outlets, but also to educational institutions and to the general public. Ideally, this project will be creating a "YouTube" for content that is significantly more important than what you usually find there. Having it keyword searchable as well as accessible to the hearing impaired is a major part of our goal. If I can get at that metadata and speech transcription down the line, all the better.

                     

                    If this works, I'll later be deploying this on another documentary project, though a slightly larger one; at last count, the tape library falls somewhere around 800 pieces!

                     

                    Thanks for the reply, Curt.

                    • 7. Re: Speech transcription, metadata, and a massive amount of footage (CS5)
                      Curt Wrigley Level 4

                      I hope it works for you. If you are working on long-form projects, I think you will find CS5 is a worthwhile upgrade for that alone. The 64-bit base really helps with stability and robustness. So even if the transcription stuff is a bust (I hope it's not), I think you will like the performance/stability aspect of CS5.

                       

                      I'm not a salesman; I get no commissions. I'm just impressed with this release.