20 Replies Latest reply on Sep 10, 2010 9:14 PM by shooternz

    Using Speech Analysis to create subtitles


      Does anyone know if it's possible to take the results of a Speech Analysis transcription in Premiere Pro and export it as a subtitle file to be used in Encore?


      Mainly, I'm thinking of two possible uses for the Speech Analysis function in our workflow.


      1) Creating subtitles with the click of a button.  We have a script for our video, which is narration only - with the help of a reference script, Speech Analysis reproduces the script nearly 100%, and now has timing associated with each word.  Is there any way to take this timing and turn it into something that is useful for subtitle purposes?

      2) Creating a transcription for clients.  If we had an interview and - without a script - did a Speech Analysis [which I imagine would have varying amounts of success, but may be good enough] - how could we export that analysis, with some timing for reference, for the client to use?

        • 1. Re: Using Speech Analysis to create subtitles
          Todd_Kopriva Level 8

          Here's a post that links to two different workflows for using the speech-to-text data:

          "searchable video and XMP metadata: another way"


          It doesn't provide a simple turnkey answer to your question, but I think that it'll point you to some resources that will help you to put together some of the pieces.

          • 2. Re: Using Speech Analysis to create subtitles
            rw-media Level 1

            Great, thanks for the links, Todd, I'll take a look and see what I can figure out.

            • 3. Re: Using Speech Analysis to create subtitles

              I took a look at the recommended link, because I'm very interested in the Speech Analysis -> Encore Subtitle bit, and as far as I can tell... you haven't answered the question AT ALL!


              I know that Adobe wants to get its customers to use OnLocation and Story, etc., but what about the folk who simply want to make a DVD with subtitles in Encore from video edited in Premiere?!?!  Seriously, I don't need OnLocation and I sure as heck don't need Story.  I shoot weddings, etc., which (oddly enough) are real-life, not scripted.


              Y'all touted the Speech Analysis capabilities (you've done it since CS4), but you've totally neglected the most obvious use of all that text (subtitles) in favor of the much more arcane and probably hardly-ever-used "searchable online Flash".


              Okay, sorry, I know that this has turned into a rant, but I really do want to know...


              Is there NO way to use the analyzed text for more than search convenience?!?!


              Thanks, in advance, for any useful answer!

              • 4. Re: Using Speech Analysis to create subtitles
                Todd_Kopriva Level 8

                I pointed to a set of tools with which you can build what you ask for. There is no one-click subtitle feature. If you'd like to request more convenient subtitle support, please file a feature request.

                • 5. Re: Using Speech Analysis to create subtitles
                  RavenMouse Level 1



                  I may have missed something, but from what I saw I beg to differ...


                  At the very least, it would be nice to find a way to export the text and the associated time-codes (I know that I can right-click to copy the text); these could then be edited and imported into Encore.  I failed to see anything in the link you provided that came even close to creating any file usable (even after additional work) for the creation of subtitles.  If it was in there, I apologize for missing it, but could you be so kind as to give me a clue where it was so that I can re-visit it?


                  As for the feature request suggestion... an obvious idea and more or less appreciated.  However, I can pretty much guarantee you that this feature has been requested for a long while now (since before CS4, probably).  Apparently y'all aren't interested in the "small" folk, the ones who don't work from scripts, and have steered the Suite away from simple, obvious integrations to the ones that the guys with the big money would want.  Good business decision, I suppose, poor customer support.  Anyway, it'll be quite some time before I've paid off the latest upgrade (CS3 to CS5 in my case), and I'm not looking forward to spending several hundred MORE dollars to see if, just maybe, this feature finally finds its way into CS6 or 7.


                  Thank you for the prompt response, but it isn't particularly helpful. 8^}

                  • 6. Re: Using Speech Analysis to create subtitles
                    RavenMouse Level 1

                    One technical question that should have occurred to me already...


                    If Encore can use the XMP metadata to create a "searchable Web DVD," why on earth can't the same info be used to create subtitles?  The time-code and words are clearly there...


                    Anyway, if a techie has an answer to this, I'd certainly appreciate knowing!



                    • 7. Re: Using Speech Analysis to create subtitles
                      Todd_Kopriva Level 8

                      Again, I pointed to the tools that are available to do this now. If there's too much manual effort involved for you to use these tools now, I understand. But don't be upset with the person who points to the available tools.


                      Regarding the utility of filing a feature request about subtitles: Try it. I don't think that I'm giving too much away to say that we're looking closely at improvements in this area right now, and a detailed feature request would be very welcome and helpful.

                      • 8. Re: Using Speech Analysis to create subtitles
                        RavenMouse Level 1

                        I'm not shooting the messenger, I'm questioning the message.  You said you pointed to the "tools that are available to do this now," but I honestly did not see a way to go from what you want us to be able to do to what the original poster and I want to do.  That's all.  Like I said in the post before this, if I missed it, I apologize.  However, I don't think it's there.


                        Feature request?  You obviously didn't "get" my point about having to wait for über-expensive upgrades, but thanks anyway.

                        • 9. Re: Using Speech Analysis to create subtitles
                          RavenMouse Level 1

                          I've been fiddling with this some more and see that if one creates a Flash output from Encore using a video from Premiere on which speech analysis has been done, it creates an XML file for each section (chapter) with all the text info in it.


                          This is good news for folk like me who want to create a text file for use with Encore subtitles, but...


                          The info in the .xml file looks like this:


                          <rdf:li xmpDM:startTime="9771741504000" xmpDM:name="Jonathan" />
                          <rdf:li xmpDM:startTime="9936851904000" xmpDM:name="Amber" />
                          <rdf:li xmpDM:startTime="10125077760000" xmpDM:name="thank" />
                          <rdf:li xmpDM:startTime="10188581760000" xmpDM:name="you" />
                          <rdf:li xmpDM:startTime="10211443200000" xmpDM:name="all" />
                          <rdf:li xmpDM:startTime="10249291584000" xmpDM:name="for" />


                          My problem is that I cannot figure out where the "startTime" values come from.  I've tried to equate them with frame values, etc., with no luck.  If anyone can tell me what the values mean, I could then fairly easily parse the XML file into a suitable text file for subtitles.
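                          One plausible interpretation - not confirmed anywhere in this thread - is that those startTime values are Premiere Pro "ticks" at 254,016,000,000 per second; dividing the sample values above by that constant puts the words in the 38-40 second range with natural word spacing, which is at least self-consistent.  A minimal Python sketch of the conversion, under that assumption:

```python
# Sketch: convert xmpDM:startTime values to HH:MM:SS:FF timecode,
# ASSUMING the values are Premiere Pro "ticks" (254,016,000,000 per
# second) -- the tick rate is an assumption, not confirmed by the XML.
TICKS_PER_SECOND = 254_016_000_000

def ticks_to_timecode(ticks, fps=25):
    # Integer math avoids floating-point rounding at frame boundaries.
    total_frames = (ticks * fps + TICKS_PER_SECOND // 2) // TICKS_PER_SECOND
    s, ff = divmod(total_frames, fps)
    return f"{s // 3600:02d}:{s % 3600 // 60:02d}:{s % 60:02d}:{ff:02d}"

# Values from the XML snippet above (PAL 25 fps assumed):
for t in (9771741504000, 9936851904000, 10125077760000):
    print(ticks_to_timecode(t))
```

                          If the assumption is right, 9771741504000 works out to 00:00:38:12 at 25 fps - easy to sanity-check against where "Jonathan" actually falls in the timeline.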


                          Thanks in advance for ANY and all help!

                          • 10. Re: Using Speech Analysis to create subtitles
                            rw-media Level 1

                            Interesting, I'm not sure what those time values are referring to.  Thanks for looking into this further.  Can you post the actual timing of some of those words (maybe in HRS:MINS:SECS:FRAMES) for reference?


                            Even if you were to figure out the timing system and translate it to something more useful (i.e., SRT) - I think the next problem is that you have to identify the start and end of each line of subtitles.  As it is now, each word has only a start time associated with it.  I don't think you could fully automate that in whatever script you write to parse the XML, unfortunately - that'd be a case-by-case thing.
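                            A gap heuristic can get partway to automating the line breaks: start a new subtitle whenever the pause before the next word exceeds a threshold, or the current line is full.  A rough Python sketch (the word list, thresholds, and function name are invented for illustration; since the XMP data has no per-word end times, each cue simply ends at the next cue's start):

```python
# Sketch: group (start_seconds, word) pairs into SRT cues, breaking a cue
# when the gap to the next word exceeds max_gap seconds or the cue is full.
# All names and thresholds here are illustrative, not an Adobe API.
def words_to_srt(words, max_gap=1.0, max_words=8, hold=0.5):
    cues, cue = [], [words[0]]
    for prev, cur in zip(words, words[1:]):
        if cur[0] - prev[0] > max_gap or len(cue) >= max_words:
            cues.append(cue)
            cue = []
        cue.append(cur)
    cues.append(cue)

    def fmt(t):  # SRT timestamp: HH:MM:SS,mmm
        ms = int(round(t * 1000))
        return f"{ms // 3600000:02d}:{ms // 60000 % 60:02d}:{ms // 1000 % 60:02d},{ms % 1000:03d}"

    out = []
    for i, c in enumerate(cues, 1):
        # No end times in the XMP data: end each cue at the next cue's
        # start, or `hold` seconds after the last word for the final cue.
        end = cues[i][0][0] if i < len(cues) else c[-1][0] + hold
        out.append(f"{i}\n{fmt(c[0][0])} --> {fmt(end)}\n{' '.join(w for _, w in c)}\n")
    return "\n".join(out)

# Word timings transcribed (approximately) from the XML snippet above:
words = [(38.47, "Jonathan"), (39.12, "Amber"), (39.86, "thank"),
         (40.11, "you"), (40.19, "all"), (40.34, "for")]
print(words_to_srt(words))
```

                            The result would still need a manual pass to fix bad breaks, but it beats placing every cue by hand.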


                            A nice feature for either Encore or Premiere, going forward, would be a Subtitle tool which uses the speech analysis engine.   Looking at the text speech analysis generates, we'd still need a way to easily (with the click or drag of the mouse) specify a start and end to a given subtitle.  There's no reason not to take the work the speech analysis is doing (creating a timing associated with a given word in the audio) and associate it with the creation of subtitles.  As it is now, I agree with you, it seems like a neat, but rather useless, feature.  It's awfully close to being an extremely useful feature.


                            I'm surprised neither Apple nor Adobe has made any attempt to allow efficient creation of subtitles in their post-production programs - but maybe it's simply that there are good 3rd-party (often open-source, e.g. Visual Sub Sync) programs out there that do that, and why spend $$ creating something that competes when there isn't much to gain?  Even something as simple as a waveform of a clip's audio in Encore would make creating subtitles a whole lot faster.

                            • 11. Re: Using Speech Analysis to create subtitles



                              I just picked up on your question today, but I've found a partial solution for putting subtitles into a video, by using After Effects (of all things). You can get all the details from a tutorial by Dan Ebbers <http://www.adobe.com/devnet/video/articles/metadata_video.html>.


                              He goes through lots of processes in this tutorial, some of which you may not need, but he includes AE scripts which load the subtitles into the vid, in AE, based on the Speech Analysis carried out in Premiere. You then take it back to Encore for final output.


                              The subtitles work well when there is almost constant speech/narration, but if there are long pauses for action, then the subs stay in place. Not good. The AE script for this would need to be changed to take into account the time the subs are on the screen. I can't do that, but perhaps Dan Ebbers can.


                              After Effects is also the place where you can carry out manual editing of your speech analysis markers, although it's messy.



                              • 12. Re: Using Speech Analysis to create subtitles
                                RavenMouse Level 1

                                Thank you very much, I'll look into this!


                                J. Sky (Kerry's husband)

                                • 13. Re: Using Speech Analysis to create subtitles
                                  shooternz Level 6

                                  Gosh Todd


                                  I am really disappointed that Adobe did not use speech analysis in combination with a translation plug-in and a voice synthesiser to automatically provide voice over-dubbing in any language that I pick from a drop-down menu.


                                  I think I shall put in a Feature Request.


                                  Do you think it could be ready for CS6 please?



                                  • 14. Re: Using Speech Analysis to create subtitles
                                    Todd_Kopriva Level 8

                                    BTW, the resource that developer.teacher pointed to is the same one that I pointed to. I hope that RavenMouse likes it better the second time.

                                    • 15. Re: Using Speech Analysis to create subtitles
                                      rw-media Level 1

                                      Shooternz, I'm not sure the purpose of your post other than to deride the (non-Adobe-employed) posters in this discussion.  I don't see how that's very constructive.  The requests in this thread - to anyone who has spent time working on subtitles - were all very reasonable and would not be tremendously difficult to implement.

                                      • 16. Re: Using Speech Analysis to create subtitles
                                        Digital-Frank Level 1

                                        Is the speech-to-text accuracy better in CS5?

                                        Last time I used it in CS4, it was ~60% accurate and I spent a lot of time correcting it.


                                        I understand how difficult the process is, but I'm hoping that it is improved in CS5.

                                        • 17. Re: Using Speech Analysis to create subtitles
                                          Todd_Kopriva Level 8

                                          > Is the speech-to-text accuracy better in CS5?



                                          Yes, it's somewhat better, in large part because you can coach it with reference scripts.

                                          • 18. Re: Using Speech Analysis to create subtitles
                                            shooternz Level 6
                                            Shooternz, I'm not sure the purpose of your post other than to deride the (non-Adobe-employed) posters in this discussion.  I don't see how that's very constructive.  The requests in this thread - to anyone who has spent time working on subtitles - were all very reasonable and would not be tremendously difficult to implement.



                                            I apologise for my lame attempt at sarcastic humour.

                                            • 19. Re: Using Speech Analysis to create subtitles
                                              developer.teacher Level 1


                                              I'm not sure if this thread is really about the accuracy of the speech-to-text process, but because Frank_xrx asked, I thought I might give my own answer. I've been working with this workflow for the last four weeks now, and have finally decided that it is not possible to do serious work with the tools as they are at the moment.

                                              All the comments from Adobe about the quality of the voice recording affecting the accuracy of the text analysis are true, but, as others have noted, when your sound-track has spontaneous speech you cannot always separate one voice from the background, nor have recording-studio quality all the time. In such cases the text that is produced will simply not be accurate enough to use, whether you have attached an accurate script or not. Also, this speech analysis process does not like long pauses between speakers (e.g. if you have any action in your video, this will really mess up the timing of the text that is produced).

                                              However, above and beyond all this, the main flaw in the whole process, from my point of view, is that even with a word-perfect script typed into Story, attached to video clips in OnLocation, and then "speech-analyzed" in Premiere... if the resulting text has omissions or mistakes, there is almost nothing you can do to edit them. OK, I know there is an editing facility, but it is a sham. If you add missing words into the text, then these words appear in the text, but they never get a time stamp, so they never appear in the exported XML file, and they can never be used to "go to" a particular point in the video. If, like me, you want to find out where and how many times certain words are used in your video, then none of these newly added words will be found!

                                              Also, if you look closely, you'll find that you cannot add anything before the first word of the analyzed text that is produced in Premiere, nor can you add anything after the last word of that text. It is just not possible (right-click on the first and last words to see the greyed-out items). I've had several instances where the first speaker in a scene is completely missed out of the analyzed text, and also, in my experience, the analysis process always (i.e. always) misses off the last word in each scene, regardless of sound-track quality or the attachment of a full script. When these things happen, it is a major problem for me.

                                              For me, the feature of searching for and playing back individual words or phrases in a video was the main attraction of the Production Premium Suite; this was why I bought it. I'm trying to produce spoken and visual "concordances", as a step up from the text-only versions now available (sorry if that's a bit esoteric).

                                              I know that a lot of editing can be done to the analyzed text if you use After Effects. And this is great if you want to show subtitles. But anything you add or change in AE is still beyond the reach of the XML files exported by Premiere or Encore, so, again, no changes you make in AE will be searchable (or findable) in the final video. There is another route to producing a text-searchable video, which involves putting physical cue points into the video whilst in AE, and then writing a Flash program to use those cue points instead of the XML-file data. I haven't tried that yet, as I kept on hoping that the (frequently) advertised speech-to-text workflow would actually work. But after four increasingly frustrating weeks I've given up on it, simply because I can't get even a small proof-of-concept project to work exactly as advertised. Starting a serious project with the current tools would be unwise.

                                              I realize that there must be work-arounds for some of the problems, but at the price of this suite (even more expensive for those of us who don't live in the US), I feel that things should work as advertised. This is not Version 1.0 from a start-up company.

                                              So, I hope this wasn't too verbose. I have specific requirements of the speech-to-text workflow, which I've tried to explain above. If your requirements are not the same, then maybe you can get away with the work-arounds... and... this thread was originally about subtitles, and there is some hope for those, using AE. I thought I'd still write this, though, as I need a bit of an outlet for all these weeks of frustration :o)
                                              • 20. Re: Using Speech Analysis to create subtitles
                                                shooternz Level 6

                                                At the massive risk of digging myself into an even bigger hole, but also to go some way toward explaining my previous sarcasm (which I mistook for humour, lame as it may have been):


                                                I don't want Adobe spending time and effort working on speech analysis (or speech analysis accuracy) and possible subtitle technologies within an NLE.


                                                I don't see speech recognition technologies having come very far in other applications (e.g. word processing), so why should Adobe develop it in an editing/imaging application when their efforts would be better spent pushing forward as the "premiere" suite of choice for desktop editing and compositing - a path they are well on the way to securing.  IMHO.


                                                I don't want an overbloated application that tries to do everything for everyone before it has succeeded in its primary purpose: EDITING.


                                                My understanding is that subtitles are not well accepted in many markets in the world (e.g. the U.S.), yet one of the earlier posters in this thread is talking about subtitling "wedding" videos.  Maybe this is the market Adobe wishes to throw its resources at.  I hope not!


                                                Subtitling may be best served by a plug-in (read: a 3rd-party plug-in).


                                                If anyone wants to know what I would like to see priority ADOBE resources applied to before the next iteration:


                                                External monitoring of HD

                                                Basic Tracking within PPRO

                                                Better implementation of garbage mattes

                                                Backup file control (user preference)


                                                Apart from that... CS5 is awesome and a world-beating NLE (Suite).