I was wondering: if I use high-end speech synthesis software (voices) to produce voiceover narration for a video, can I somehow improve it further in Adobe Audition CC?
I am not talking about pronunciation, speed and pitch, of course (those can be improved only within TTS editors, with punctuation and tags), but just about smoothing the output to make it softer, more professional and more pleasant to listen to. I am already pretty satisfied with the results, but I would like to know whether there are any preset effects in Adobe Audition that can improve it at least a bit.
This is computer speech, so of course I have no illusions that it can be perfect.
Thanks for reading this!
Absolutely couldn't tell without listening to some of it. The one thing that I can be absolutely sure of though is that there's no preset for it. All audio is different, and presets are only ever a starting point for the real treatment you might need.
In principle though, it might be possible to make computer-generated speech a little easier on the ear - just by adding a little artificial ambience to it, for instance. Still need to hear it first, though.
Here is a video that I am pretty sure was generated by TTS and then post-processed: (link)
But I really do not know anything about sound editing (yet). Anyway, I will try to upload my test file somewhere and post the link here.
Here is the uploaded sample WAV file: https://soundcloud.com/user302535560/text-to-audio1
You won't actually improve on the speech, but I think that the total lack of ambience isn't going to make long-term listening that easy. So a very small touch of reverb, and possibly a little background noise, will probably improve things a lot. Yes I know that sounds counter-intuitive, but just try it!
In fact it does make sense; my problem with synthesized speech is that it's too flat/clean (not a surprise).
As I am new to Audition (I used it yesterday for the first time, actually), would it be a problem for you to list exactly which reverb and background noise settings/effects to use (with a screenshot or whatever is easiest)? If it's a hassle, then of course do not bother. I already have valuable guidelines, thanks to you.
Thank you for the help so far!
On the speech itself I used the Convolution Reverb with the 'classroom' impulse. Set the mix to 20% and also set the room size to 20%. This will make it sound as though it's synthesised in a room...
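For anyone curious what a convolution reverb with a 20% mix amounts to, here is a rough sketch in plain Python. It assumes the mix control is a simple linear blend of the untouched (dry) signal and the convolved (wet) one, which may not be exactly how Audition scales it, and the three-tap impulse is a made-up stand-in for the 'classroom' impulse. Room size is not modelled at all.

```python
def convolve(signal, impulse):
    """Direct (slow) convolution of two lists of float samples."""
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse):
            out[i + j] += s * h
    return out


def reverb_mix(dry, impulse, mix=0.2):
    """Blend the dry signal with its convolved (wet) version.

    mix=0.0 returns the dry signal only, mix=1.0 the wet signal only.
    This linear blend is an assumption, not Audition's documented behaviour.
    """
    wet = convolve(dry, impulse)
    # Pad the dry signal with silence so the reverb tail is kept.
    padded_dry = dry + [0.0] * (len(wet) - len(dry))
    return [(1.0 - mix) * d + mix * w for d, w in zip(padded_dry, wet)]


# A single click followed by silence, and a toy three-tap "room" (hypothetical).
dry = [1.0, 0.0, 0.0, 0.0]
impulse = [1.0, 0.5, 0.25]
mixed = reverb_mix(dry, impulse, mix=0.2)
print(mixed)
```

The output is longer than the input because the reverb tail rings on after the speech stops, which is the bit of 'room' being added.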
As for suitable background ambience - well, strangely enough, that's not so easy. In the end I downloaded the Adobe Ambience 1 file (see Audition Help menu), unzipped it and used the Ambience Air Conditioner 180 01 file, but with the Notch filter to remove the pitched hum. I pulled down 1431Hz, 297Hz, 441Hz, 518Hz and 882Hz, all at around -36dB, and that made it more 'general' sounding (you may have to mess about with that a bit - it might be improvable). The most important thing about this, though, is the level it should be at, which should be no higher than 55dB below the speech peak level - in other words, you should hardly hear it.
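To put numbers on the notch cuts and the ambience level, here is a small sketch in plain Python. It is not Audition's Notch filter: it uses the standard peaking-EQ biquad from the widely used Audio EQ Cookbook as a stand-in, and the Q of 30, the 44.1kHz sample rate and the two peak values at the bottom are all assumptions picked for illustration.

```python
import cmath
import math

NOTCH_FREQS_HZ = [297, 441, 518, 882, 1431]  # frequencies from the post
CUT_DB = -36.0                               # depth from the post
Q = 30.0                                     # assumed: a narrow bandwidth


def db_to_gain(db):
    """Convert a level in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def peaking_cut(freq_hz, gain_db, q, sample_rate):
    """Biquad coefficients (b, a) for a narrow cut, Audio EQ Cookbook form."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * freq_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp]
    a = [1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp]
    return b, a


def response_db(b, a, freq_hz, sample_rate):
    """Magnitude response of the biquad at one frequency, in dB."""
    z = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20.0 * math.log10(abs(h))


def ambience_gain(speech_peak, ambience_peak, offset_db=-55.0):
    """Gain that puts the ambience peak offset_db relative to the speech peak."""
    return speech_peak * db_to_gain(offset_db) / ambience_peak


sample_rate = 44100  # assumed sample rate
for freq in NOTCH_FREQS_HZ:
    b, a = peaking_cut(freq, CUT_DB, Q, sample_rate)
    print(freq, "Hz:", round(response_db(b, a, freq, sample_rate), 1), "dB")

# Hypothetical linear peak values for the two files.
print("ambience gain:", ambience_gain(speech_peak=0.9, ambience_peak=0.5))
```

The useful takeaway is the last function: 55dB down is a linear factor of a little under 0.002, so the ambience really is only just there.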
To achieve all of this, you need to put all the files in Multitrack view in separate tracks, and use the Mixer to add the reverb to the speech, and the Notch filter to the ambience. You can loop the ambience as many times as you need it.
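If it helps to see what the multitrack step does to the audio, here is a minimal sketch in plain Python: the ambience is repeated until it covers the speech, turned down, and added to it. The sample values and the 0.01 gain are invented for the example, and nothing here calls into Audition.

```python
def loop_to_length(clip, length):
    """Repeat a clip end to end and trim it to exactly `length` samples."""
    if not clip:
        raise ValueError("clip must contain at least one sample")
    repeats = -(-length // len(clip))  # ceiling division
    return (clip * repeats)[:length]


def mix_tracks(speech, ambience, ambience_gain):
    """Sum the speech with a looped, attenuated ambience bed."""
    bed = loop_to_length(ambience, len(speech))
    return [s + ambience_gain * a for s, a in zip(speech, bed)]


# Hypothetical sample data: a short "speech" track and a shorter ambience clip.
speech = [0.5, -0.5, 0.25, -0.25, 0.0, 0.1, -0.1]
ambience = [0.2, -0.2, 0.1]
mixed = mix_tracks(speech, ambience, ambience_gain=0.01)
print(mixed)
```

A hard loop like this can click where the clip repeats, so with real audio a short crossfade at each join is usually worth adding.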
The only other thing I'd say about your file is that it's way too fast to learn from! People assimilate information in the gaps between sentences, and with this file there are hardly any - it needs pacing.
Alright, all is clear.
I located all the options and downloaded the Ambience file, so I will play around with it.
Thanks! If there is a way I can give you a plus/positive review on this forum for your contribution and detailed help, please let me know.
PS: Pacing is the next step; I first had to generate the TTS for all sentences at once. I will break them down later, in the video editing/synchronisation phase, to make some room for information assimilation.
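The pacing step can also be sketched in a few lines of plain Python, assuming the narration has already been split into one clip per sentence. The half-second pause and the tiny sample rate are placeholders chosen to keep the example readable, not recommendations.

```python
def add_pauses(sentences, pause_seconds, sample_rate):
    """Join sentence clips with a fixed stretch of silence between them."""
    gap = [0.0] * int(round(pause_seconds * sample_rate))
    paced = []
    for index, clip in enumerate(sentences):
        if index > 0:
            paced.extend(gap)  # silence goes between sentences, not before the first
        paced.extend(clip)
    return paced


# Three hypothetical "sentences" at a toy sample rate of 8 samples per second.
sentences = [[0.1, 0.2], [0.3], [0.4, 0.5]]
paced = add_pauses(sentences, pause_seconds=0.5, sample_rate=8)
print(len(paced), paced)
```

In practice the gap lengths would be tuned by ear, or set to match the video cuts, rather than fixed at one value.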