Look into "Sound Driven Animation". While not exactly what you want, it may give you ideas.
scroll down the page to "Sound Driven Animation"
and a follow-up:
or Google "AS3 sound spectrum analyzer"
My audio file is long, and I want to know how to find the starting time of each word in the audio.
Is there a good way to find the timing of each word in the audio? I don't want to do it manually. Please suggest a better way.
Yes, of course. I believe that we understand that you're working on a job that requires you to synchronize some sort of an automation to a vocal narration that, we can only assume, will be included in the final movie/video. We understand that you are seeking the most efficient way to be effective. I sketched at least the highlights of The Best Way that I have discovered (so far) and another fine person (adninjastrator) has given you His Best Shot.
I was interested enough in the possibility of my finding a better way to do my work ( which is also rather long and troubling ) that I took the time to carefully study the material he presented. Have you also taken the time to look at it?
Like I said above, what I gave you before was only the tiniest highlight of what I do. I'll now refer you to a more extended description of it that I made in response to another forum participant's question. It is found here : voice synchronization with text(words) in flash http://forums.adobe.com/thread/900695?tstart=0
There are some inherent difficulties that we all face in this kind of work. One of them is that, when we are speaking naturally, there are many places in our speech where there is no clear break between the words, like the one you can see in the "mixdown" example that I provided. The only automated way with which I am acquainted that even begins to crack that tough nut of a problem is voice recognition software, such as Dragon NaturallySpeaking by Nuance. It, however, is not intended to give us what we need for our current projects. I would expect, however, that if you're made out of big time money and have, say, fifty grand available, you might approach the Nuance people and ask them to adapt their Very Fine Tool to greatly assist us in the work that we're doing. ( By the way, I am speaking this into Dragon now and it is typing for me. )
I have developed a way by which I present written text as it is being heard in the narration. There are places in which the words are distinct and are separated by pauses, and there are times in which only phrases or clusters of words are separated by silent periods. As you will discover by reading the thread to which I referred earlier, the way I present the written text is by first covering each word or phrase of it with an opaque rectangle whose color matches the background color, then by smoothly changing its opacity from 100% to 0% at the appropriate time. In much of the work that I did earlier, before I learned better methods, my chunks of text presentation were quite large, sometimes entire lines at a time. In the new work that I'm now doing, most of it will be presented one word at a time. That is the case in the two examples I will show you next.
Just a few technical details : I'm running at thirty frames per second, using classic tweens, using an easing in value of -100 and an easing out value of +100. Many of my transitions are about eighty frames long. Some are a lot longer. Here is a little pic showing two sentences being displayed :
These next two pics show the display of a twelve word sentence, smoothly presented.
There was a small pause after the seventh word, and the ending of the tweens for those players allows those words to be fully displayed while the last five are still getting going.
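For anyone who wants to work out the frame numbers ahead of time, here is a rough sketch in Python of the arithmetic, using the thirty frames per second and eighty-frame transitions mentioned above. The function names and the example onset times are made up for illustration, and it assumes each fade begins at the moment the word starts to be heard, which may not match how the tweens were actually placed on the timeline.

```python
FPS = 30           # frame rate stated in the post
TWEEN_FRAMES = 80  # typical transition length stated in the post

def tween_span(onset_s, fps=FPS, length=TWEEN_FRAMES):
    """Return (first_frame, last_frame), 1-based, for a fade that
    starts when the word begins (an assumption, see above)."""
    first = round(onset_s * fps) + 1  # timeline frames count from 1
    return first, first + length - 1

def schedule(onsets_s, fps=FPS, length=TWEEN_FRAMES):
    """Frame spans for a list of word start times given in seconds."""
    return [tween_span(t, fps, length) for t in onsets_s]

# Hypothetical word start times, in seconds
for span in schedule([0.5, 1.2, 2.0]):
    print(span)
```

Because eighty frames is about 2.7 seconds at this frame rate, spans for words spoken close together overlap, which matches the behavior described above where earlier words finish appearing while later ones are still fading in.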
--- --- --- --- --- --- ---
In the methods that were suggested by adninjastrator, you could obtain an amplitude graph of either the overall volume of any waveform or, if you wanted to get really fancy, only a portion of the frequency spectrum of the waveform. The method that I showed you earlier provides the first of those possible results directly and immediately. The second approach that adninjastrator presented is probably not necessary for the work that you, I, or vplusvw are doing.
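To make the "overall volume" idea concrete, here is a minimal sketch in Python (not AS3, and not the method either of us actually uses). It computes a loudness value for each short slice of the audio and reports the times at which sound begins after a stretch of silence. The function names, the 10 ms window, the 0.05 threshold, and the 0.1 s minimum silence are all assumptions chosen for illustration; samples are assumed to be floats between -1 and 1. It can only find starts that follow a pause, so words that run together will come out as a single chunk.

```python
import math

def rms_envelope(samples, sample_rate, window_s=0.01):
    """Return one RMS loudness value per non-overlapping window."""
    size = max(1, int(sample_rate * window_s))
    envelope = []
    for start in range(0, len(samples) - size + 1, size):
        window = samples[start:start + size]
        envelope.append(math.sqrt(sum(x * x for x in window) / size))
    return envelope

def find_onsets(envelope, window_s=0.01, threshold=0.05, min_silence_s=0.1):
    """Return start times (seconds) of sound that follows enough silence."""
    min_quiet = max(1, round(min_silence_s / window_s))
    onsets = []
    quiet_run = min_quiet  # treat the start of the file as preceded by silence
    for i, level in enumerate(envelope):
        if level >= threshold:
            if quiet_run >= min_quiet:
                onsets.append(i * window_s)  # first loud window after a pause
            quiet_run = 0
        else:
            quiet_run += 1
    return onsets
```

The threshold and minimum silence would need tuning by ear against a real narration; too low a threshold picks up breaths and room noise, too high a threshold misses soft word beginnings.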
You say that your audio length is big. That is not a surprise. Things Take Time. Big and valuable and important things often take a lot of time. You say you don't want to do it manually. We understand, from our own experience, that life is difficult and that we don't want to make it any harder than is necessary.
We have no way of knowing the nature of your project or how important it is to you. You could tell us a little about the first part, but only you can decide the second one. The work that I'm doing is terribly important to me. The work being pursued by vplusvw appears to be of value to that fine person.
Two of us have tried to help. We've given you our best shot. Our methods, or some related version of them, might be the better way that you seek. There is an eternal, universal truth, however. There is no Royal Road to knowledge. In so many important pursuits in life, there is no Easy Way. You have asked for the better way. I have given you the best way I know. I am willing to explain it in greater detail if that would appear to be of some value to you.
There simply was No Easy Way to build the Golden Gate Bridge, the Transcontinental Railway, or the Hoover Dam. There is No Easy Way to do the work that I am doing. There is probably No Easy Way for you to do the work that you want to do. If it gets done at all, let alone done well, there will certainly be some Serious Tediousness involved. That is for sure.
Back over to you.
; - )