4 Replies Latest reply on Apr 12, 2013 4:49 AM by Floh-VerbaVoice

    Why are the punctuation marks lost after speech analysis?



      I've got a problem. We successfully added a dialogSequence to the metadata of a video file so that the text can be aligned to the audio using the Media Encoder via Premiere Pro. The alignment works fine; the only problem is that the punctuation marks are lost after the speech analysis. When the transcript is added manually via Adobe Story, the punctuation marks are not lost. Do you have an idea why the punctuation marks are lost if we add the text directly to the metadata?

      Best regards and thank you in advance


        • 1. Re: Why are the punctuation marks lost after speech analysis?
          Jörg Ehrlich Adobe Employee

          Hi Anna,


          you might be mixing up two different workflows:


          1. If you attach an Adobe Story script to the video and then trigger the speech-to-text, a script alignment workflow runs. This means that the s2t output is only used to find timecodes that can be attached to the Story script. In this case the punctuation survives because the output is simply the original Story script plus timecodes.
          2. If you create a custom language model file from a transcript in the “Analyze Content Dialog” and then run the speech-to-text, you get the raw speech-to-text output as a result. This raw output does not contain punctuation because the engine does not support it. Note that such a feature is quite hard to support for an S2T engine across different languages.


          The general advice would be: if you have a Story script, or are able to create one from your material, you should always follow workflow “1.” above. It is always much more accurate than workflow “2.”, where the script information is not aligned but is used to create a language model. That makes the s2t process more accurate, but not as good as in the alignment case.




          • 2. Re: Why are the punctuation marks lost after speech analysis?
            Floh-VerbaVoice Level 1

            Hello Jörg,

            I'm a colleague of Anna. I'm not sure I understand you.


            I developed a tool and implemented the following C++ code:

            cout << "Retrieving metadatas..." << endl;



            cout << "Injecting transcription into metadata..." << endl;

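            // Create the dialogSequence array (if needed) and append an empty struct item to it.
            meta.AppendArrayItem(kXMP_NS_Script, "dialogSequence", kXMP_PropValueIsArray, NULL, kXMP_PropValueIsStruct);

            // Fill the first item's fields: the speaker name and the transcript text.
            meta.SetProperty(kXMP_NS_Script, "dialogSequence[1]/xmpScript:character", "SPRECHER", NULL);

            meta.SetProperty(kXMP_NS_Script, "dialogSequence[1]/xmpScript:dialog", transcription, NULL);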

            cout << "Writing metadatas..." << endl;



            This is how we attach the text to the video file. Is this what you call a "custom language model" (workflow #2)?


            cu Floh

            • 3. Re: Why are the punctuation marks lost after speech analysis?
              a.rieder Level 1

              Hi Jörg,


              thank you for your answer. However, we are using the first workflow you described.

              When the transcript is added to the video manually via Adobe Story and Adobe OnLocation, the speech recognition works fine (output: timestamps for every spoken word, punctuation is not lost).

              To create a more automated workflow, we want to skip the steps "creating an Adobe Story script via Adobe Story" and "linking the video to the Adobe Story script via OnLocation".

              Therefore we want to add the transcript to the metadata of the video with a script before importing it into Premiere and starting the speech recognition. To get the same result, we looked at the XMP data of a video that had the transcript included manually (workflow described above) and added the transcript at the same position (see the code snippet in Floh's answer).

              When the video is then imported into Premiere and the speech analysis is started, we get the following results: timestamps for every spoken word, no words are lost, but the punctuation marks are lost.

              When starting the speech analysis we are using the same settings: Speech (checked), Language (English), Quality (high (slower)), Reference Script: None.

              Do you have an idea why the punctuation marks are lost? Are there more parameters we have to consider when inserting a transcription into the XMP of a video?

              Thank you for your help


              • 4. Re: Why are the punctuation marks lost after speech analysis?
                Floh-VerbaVoice Level 1

                We found out why the punctuation was lost. After I replaced '\n' with ' ' in the transcription, no punctuation is lost any more and we get the output we need.
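
                For anyone hitting the same issue, the replacement can be sketched roughly as follows. This is only an illustration, not the code from our tool: the helper name flattenTranscript is made up here, and treating '\r' like '\n' is an extra assumption (only '\n' was actually tested). Note that a Windows "\r\n" line ending becomes two spaces.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (name invented for this sketch): returns a copy of the
// transcript in which every line-break character is replaced by one space.
// Handling '\r' as well as '\n' is an assumption beyond what the thread reports.
std::string flattenTranscript(std::string text) {
    std::replace_if(text.begin(), text.end(),
                    [](char c) { return c == '\n' || c == '\r'; },
                    ' ');
    return text;
}
```

                The flattened string would then be used as the value written to dialogSequence[1]/xmpScript:dialog in place of the original transcription.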


                Just curious: why doesn't the analysis work with newlines?


                cu Floh