    Spoken Word, with Graphics: Best Workflow?

    Ken Storch

      I am working on a spoken word project, where the text will pre-exist the audio,
      and I will be using many, many stills and graphics to illustrate it (no video clips).
      The audio is being created (from the written text), likely, in short audio clips.

      I'm looking for suggestions on the best workflow for this project.

      I am envisioning a workflow that would hopefully allow
      efficient setting of in/outs for the numerous stills, using the spoken words.
      IOW, I want to simplify placing the large number of stills on the Timeline using the spoken words in the audio clips.

      None of the tutorials, articles, and other search results I've found speak to this specifically (using so many stills, with text audio).
      Most refer to use of video clips (with metadata, Adobe Story, OnLocation, etc.)
      I haven't been able to adapt any of those methods for this project.

      I have CS5 Master Collection, and limited PP experience.

      Any input welcomed.

          the_wine_snob Level 9

          Welcome to the forum.


          As your Audio Clips will likely be of varying Durations, this is how I would tackle the Project:


          • Do some general timings on the Audio, looking to get an average of the Durations.
          • Choose that, and set it in Edit>Preferences>General>Still Image Duration. This should get you close to start.
          • Import the Audio Clips, and arrange them on the Timeline.
          • Import the Stills and do the same.
          • Obviously, they will not yet match up perfectly, but should be fairly close.
          • Start at 00;00;00;00, and Click-drag the Tail of Still Image #1 while holding down the Ctrl modifier key, to Ripple Edit, which moves all following Clips along with it.
          • Repeat for the second, third, etc.
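The first step above (timing the Audio Clips to get an average Duration) could be done outside Premiere with a few lines of Python. This is only a rough sketch, assuming the narration is saved as PCM WAV files; the function names and the 30 fps default are illustrative assumptions, not anything Premiere provides.

```python
import wave


def clip_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def average_still_duration_frames(durations, fps=30):
    """Average clip durations (in seconds) and round to whole video frames.

    fps is an assumed project frame rate; use the rate of your own Sequence.
    """
    if not durations:
        raise ValueError("no durations given")
    mean_seconds = sum(durations) / len(durations)
    return max(1, round(mean_seconds * fps))
```

For example, `average_still_duration_frames([clip_duration_seconds(p) for p in paths])` gives a frame count that could be typed into the Still Image Duration preference as a starting point, where `paths` is your own list of WAV files.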


          Some tips:


          • In the case of all Still Images, you can let your output determine the Project Preset, say 1920 x 1080 Square Pixels, with the desired FPS. Then, in Photoshop, Scale all Still Images to that Frame Size, so no additional Scaling will be necessary, and so that the processing overhead is held down. No sense in shoving around a bunch of pixels that will never be seen. The alternative to that would be to check Scale to Default Frame Size, but you will still have the extra pixels, requiring processing overhead.
          • I would use PSD, PNG or TIFF files, and not JPEG.
          • I would use PCM/WAV 48 kHz 16-bit Audio files, and not MP3s.
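When resizing a large batch of Stills to the Frame Size, it helps to know the target pixel dimensions for each image up front. The sketch below is just the arithmetic, assuming you want each image to fit entirely inside a 1920 x 1080 frame while keeping its aspect ratio; the function name is made up for illustration.

```python
def fit_within_frame(width, height, frame_width=1920, frame_height=1080):
    """Return (new_width, new_height) that fits inside the frame, keeping aspect ratio.

    The frame size defaults to the 1920 x 1080 example; change it to match
    your own Project Preset.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # The smaller of the two ratios is the one that keeps the whole image in frame.
    scale = min(frame_width / width, frame_height / height)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

A 4000 x 3000 image comes out as 1440 x 1080 here. Note that the same formula would also enlarge an image smaller than the frame, which you may prefer to leave at its original size.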


          Good luck,



            Ken Storch Level 1

            Hi Bill,


            Thanks for the reply.


            Very useful tips.


            I don't expect the proposed audio clips to be close in time to the needed in/outs - there will likely be many stills per audio clip, even for short clips.

            I could therefore set the Still Image Duration to, say, 60 frames as a starting point.


            Since the audio is being created, it could be done all in one take, or externally (to PP) spliced into one .wav file, if that would help.


            But, isn't there some way to use the text to set in/outs?

            Say, by using the text as metadata? In Story or OnLocation?

            Or, alternatively, Speech Analysis to re-create the text?
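One way to get rough cue points from the written text, outside Premiere, is to assume the narrator speaks at an even pace and place each cue by how far into the script its phrase appears. The sketch below does only that. It is not a Premiere, Story, or OnLocation feature, the function names are hypothetical, and the times it produces are estimates that would still need nudging by ear. The timecode helper assumes a whole-number frame rate and non-drop-frame counting.

```python
def seconds_to_timecode(seconds, fps=30):
    """Format seconds as non-drop-frame HH:MM:SS:FF (fps must be an integer)."""
    total_frames = round(seconds * fps)
    frames = total_frames % fps
    total_seconds = total_frames // fps
    return "{:02d}:{:02d}:{:02d}:{:02d}".format(
        total_seconds // 3600,
        (total_seconds // 60) % 60,
        total_seconds % 60,
        frames,
    )


def estimate_cue_times(script_text, cue_phrases, clip_seconds):
    """Estimate when each cue phrase starts, assuming an even speaking pace.

    Returns (phrase, seconds) pairs in the order given; phrases that do not
    occur in the script are skipped. Only the first occurrence is used.
    """
    lowered = script_text.lower()
    total_words = len(lowered.split())
    cues = []
    for phrase in cue_phrases:
        index = lowered.find(phrase.lower())
        if index < 0 or total_words == 0:
            continue
        # Fraction of the script spoken before the phrase, measured in words.
        words_before = len(lowered[:index].split())
        cues.append((phrase, clip_seconds * words_before / total_words))
    return cues
```

Each estimated time, passed through `seconds_to_timecode`, gives a position where a Still might begin. Natural pauses and changes of pace will throw the estimates off, so shorter Audio Clips should give closer results than one long take.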

              the_wine_snob Level 9

              Normally, I would either do the Audio in one take, or cut the smaller takes together into one file, but in your case, I think I'd rather actually have them as separate Clips, though tightened up, so there is not "dead air," except for natural pauses. The reason for this is that it's easier to visually adjust the Stills to the butts in the Audio Clips, rather than having to do a million Markers.


              Sorry that I had not picked up on the use of multiple Stills for one Audio Clip. I see what you are talking about now. I thought that it would be a 1:1 thing. Got you now, and thanks for the clarification.


              Good luck, and wait a bit, to hear from others, as there are probably many ways to "skin this cat." [Will refrain from posting image of a "hairless" cat here... ]



                Ken Storch Level 1

                Thanks again for the reply.


                I am trying to avoid millions of markers. That's why I was hoping that having the text might be of use.

                I'd even live with editing the metadata from Speech Analysis (a rather slow process to edit), but I can't seem to find a way of using the metadata to set in/outs for stills.


                So, if I'm hearing you right, it might be better to create the audio in small clips so that I can just drag the stills onto the Timeline and snap them into position?


                So, instead of a million markers, I'm heading towards a million short audio clips? hah


                That cat really is getting awkwardly clipped!


                I keep feeling that there is a better method, but my knowledge of PP in no way covers it.


                Still hoping for an efficient method ;>)

                  the_wine_snob Level 9

                  Now, I have not even used Speech Recognition, but know that others have. There could be some good tips coming your way.


                  Good luck, and I think that from this point on, I will probably learn something new, but that's good, right?



                    Ken Storch Level 1

                    Still hoping someone can suggest a good workflow for matching many, many still images to text.



                      medeamajic Level 2



                      The URL is a link to spoken word over still images. I am not sure if this is what you are trying to do or not. If not, could you post a link to something that would be similar to what you want?

                        Ken Storch Level 1

                        Yes, thanks for the link. That example illustrates the project intent at the simplest level.


                        The project will be longer and more complex, with many images and graphics, transitions, effects, etc.

                        As mentioned at the top of the thread, the text will preexist the audio, and the audio will preexist the timeline.

                        I'm hoping to find a way to use the text efficiently to set the many markers.
                        None of my research has come up with a way to do this except manually setting the markers, one at a time.


                        Any suggestions welcomed.


                        Thanks for input.