I'm working on a little something that contains very loud music and very loud voices mixed together.
It sounds great on my Logitech G35 headphones, but when I play it through my studio monitors in the living room, the voices get a little drowned out by the music. (The music is at a pleasingly loud level that I wish to keep, as it doesn't sound as powerful as soon as I bring its gain down a little.)
So I was wondering: what is the proper way to keep the great loudness of both the music and the voices, while making the voices a little more understandable?
The music and the voices are on separate tracks, of course, in my Audition file.
Here is an example of what it sounds like currently: http://soundcloud.com/stefanpanic/gameshow-trailer
Thanks in advance!
Well, there's no such thing as a proper way (every situation is different) but I suspect there are several things going on here.
First off, mixing on headphones--even the best ones--isn't a great idea and often results in a mix that sounds fine on the headphones but very different on speakers. There are lots of papers out there explaining why this is (it's to do with the proximity of the headphone drivers to your ears and the lack of air between the two) but the end result is that you need to A) listen to your mix on a variety of playback systems and B) probably give more weight to your studio monitors (depending on what they are) than to what you hear in the 'phones.
Second, there are a few tricks you can use:
One that I play with a lot is to add some subtle EQ to the music track in the frequency range that contains most of the voice. Maybe try a 2-3dB cut between 200Hz and 2kHz--but do it by ear, not by numbers. This cuts a subtle space in the music for the voice without making the music sound particularly quieter.
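To put rough numbers on that EQ idea, here is a minimal sketch of a single wide peaking (bell) filter set to a 3dB cut, using the widely published Audio EQ Cookbook biquad formulas. The 632Hz centre (roughly the geometric middle of 200Hz and 2kHz), the Q of 0.5 and the 48kHz sample rate are illustrative assumptions of mine, not settings from the post, and the function names are made up for the example. It only prints how deep the cut is at a few frequencies; the real decision still gets made by ear.

```python
import cmath
import math


def peaking_eq_coeffs(f0_hz, gain_db, q, sample_rate_hz):
    """Return biquad coefficients (b, a) for a peaking EQ (Audio EQ Cookbook form)."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin)
    a = (1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin)
    return b, a


def gain_db_at(freq_hz, b, a, sample_rate_hz):
    """Magnitude response of the biquad at freq_hz, in dB."""
    w = 2.0 * math.pi * freq_hz / sample_rate_hz
    z1 = cmath.exp(-1j * w)  # z^-1 evaluated on the unit circle
    z2 = z1 * z1             # z^-2
    h = (b[0] + b[1] * z1 + b[2] * z2) / (a[0] + a[1] * z1 + a[2] * z2)
    return 20.0 * math.log10(abs(h))


# Assumed values for illustration only: wide bell, 3 dB cut, centred near 632 Hz.
SAMPLE_RATE_HZ = 48000
b, a = peaking_eq_coeffs(632.0, -3.0, 0.5, SAMPLE_RATE_HZ)

for freq in (50, 200, 632, 2000, 8000):
    print(f"{freq:>5} Hz: {gain_db_at(freq, b, a, SAMPLE_RATE_HZ):+.2f} dB")
```

The cut is deepest at the centre frequency and fades towards zero at the extremes, which is why the music doesn't sound much quieter overall.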
Also, in a situation like this, compression is your friend. Apply a bit more comp to the voices than you might normally want and it'll help keep them standing out against the music. Again, the exact settings will need to be chosen by ear. (Though I have to say that, listening to your mix, you've already got what might be sufficient compression.)
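As a rough illustration of why compression helps, this sketch applies a simple static compressor curve (hard knee, no attack or release) to two voice levels. The -18dB threshold, the 4:1 ratio and the two example word levels are assumptions picked for the arithmetic, not recommended settings, and `compress_level_db` is a made-up name.

```python
def compress_level_db(level_db, threshold_db=-18.0, ratio=4.0, makeup_db=0.0):
    """Static hard-knee compressor curve: levels above the threshold are scaled by 1/ratio."""
    if level_db > threshold_db:
        level_db = threshold_db + (level_db - threshold_db) / ratio
    return level_db + makeup_db


# Hypothetical levels for a quiet word and a loud word in the voice track.
quiet_word_db = -24.0
loud_word_db = -6.0

range_before = loud_word_db - quiet_word_db
range_after = compress_level_db(loud_word_db) - compress_level_db(quiet_word_db)

print(f"Level spread before compression: {range_before:.1f} dB")
print(f"Level spread after compression:  {range_after:.1f} dB")
```

With the spread between loud and quiet words reduced, the whole voice track can sit at one level that stays above the music, instead of the quiet words dipping under it.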
Last, just play with very minor tweaks in the mix as it goes through. A small boost to the music (or the voices) as appropriate can make a big difference. I'll often use volume envelopes to make word-by-word (or even syllable-by-syllable) tweaks.
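For anyone curious what a volume envelope boils down to, here is a minimal sketch: a list of (time, gain in dB) breakpoints, linearly interpolated and applied sample by sample. The breakpoint values and function names are hypothetical, and this is not how Audition implements its envelopes, just the general idea.

```python
import bisect


def envelope_gain_db(t_sec, points):
    """Linearly interpolate gain (dB) at t_sec from (time_sec, gain_db) points sorted by time."""
    times = [p[0] for p in points]
    if t_sec <= times[0]:
        return points[0][1]   # hold the first value before the envelope starts
    if t_sec >= times[-1]:
        return points[-1][1]  # hold the last value after the envelope ends
    i = bisect.bisect_right(times, t_sec)
    (t0, g0), (t1, g1) = points[i - 1], points[i]
    return g0 + (g1 - g0) * (t_sec - t0) / (t1 - t0)


def apply_envelope(samples, sample_rate_hz, points):
    """Scale each sample by the envelope gain at that sample's time."""
    return [
        s * 10.0 ** (envelope_gain_db(n / sample_rate_hz, points) / 20.0)
        for n, s in enumerate(samples)
    ]


# Hypothetical envelope: lift one word by 2 dB between about 1.1 s and 1.3 s.
word_boost = [(0.0, 0.0), (1.0, 0.0), (1.1, 2.0), (1.3, 2.0), (1.4, 0.0)]
for t in (0.5, 1.05, 1.2, 1.35, 2.0):
    print(f"t = {t:.2f} s: {envelope_gain_db(t, word_boost):+.2f} dB")
```

The short ramps on either side of the boost are what keep a word-level tweak from sounding like a sudden jump.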
Hope this helps.
Okay, with my acoustician's hat on:
Headphone mixing - especially when it comes to voices in the centre of a stereo field - has always been a disaster area, and you can almost always spot when a mix has been made to play on headphones, because the vocal is invariably too quiet.
I suppose that the easiest way to explain this is to consider what happens to the central voice with loudspeakers. That voice is essentially a virtual image; there isn't a loudspeaker there to support it. So it's been created out of the off-axis responses of your loudspeakers, and is in a space that's quite a distance from your ears, compared to where it would be on headphones. So, the level of this voice depends as much on the angle of your speakers as anything, and whilst this varies considerably between different setups, it's still considerably different to the relationship that headphones have with your ears!
Traditionally, it's been recommended that you should sit in an equilateral triangle with your speakers, which should be pointing towards you, which means an included angle of 120 degrees between them. Despite this information being repeated all over the place since the 1950s, there's no physical basis for it at all, and most people don't have setups like that anyway. And I have to say that this really isn't good positioning for establishing a central image - that angle is too great, and you are relying on an extremely good off-axis response to achieve any level at all there.
In this day and age, what you really need is a monitoring compromise that will let you create a mix that sounds not so bad on both headphones and loudspeakers, and there are a couple of things you can do to improve the situation considerably, and get a generally better result. And FWIW, it's what I do in this situation...
The first is to alter the angle of your monitors so that the included angle is 90 degrees (a right-angle) and sit so that both of them are pointing directly at your ears. This gets you a lot closer to the monitors, admittedly, but is far more realistic as far as a compromise mix is concerned. If you do your whole mix like this, you'll find that it's a lot easier to position things in it too. And don't put anything like soft furnishings between them either - that will definitely make things worse.
The second is something you should always do with a mix like this to finally establish vocal levels: listen to it really quietly. No, really quietly! Almost at vanishing point. What you should hear is the whole mix, but if anything is standing out (like the vocal), it will become obvious like this in a way that it simply won't when it's louder. You want it to be there, certainly - but it shouldn't be either missing or standing out too much.
Best metaphor I can think of is a portrait photograph. Make sure the music doesn't steal the focus from the voice. Volume, intensity, spectral footprint -- the voice needs to "punch" through the music at all times.
In your specific case: Work with the sibilant range (3-5 kHz) and either cut the music or boost the VO until you hear every S, K and T sound. Apply dynamic compression until you don't need to strain your hearing in the quiet parts. Don't be afraid to duck the music slightly if need be -- although roughly the same can be achieved with a multiband compressor on the master bus.
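Ducking, in its crudest form, can be sketched like this: measure the voice level block by block and pull the music down a few dB whenever the voice is active. The block size, the -40dB threshold and the 3dB duck amount are assumptions for illustration, and the function names are made up. A real ducker (or a sidechain compressor) smooths the gain change with attack and release times; the abrupt steps in this sketch would click on real audio.

```python
import math


def rms_db(block):
    """RMS level of a block of samples in dB (full scale = 0 dB), floored at -120 dB."""
    mean_square = sum(s * s for s in block) / len(block)
    return 10.0 * math.log10(mean_square) if mean_square > 0.0 else -120.0


def duck_music(music, voice, block_size=1024, threshold_db=-40.0, duck_db=-3.0):
    """Attenuate the music by duck_db in every block where the voice exceeds threshold_db."""
    duck_gain = 10.0 ** (duck_db / 20.0)
    out = []
    for start in range(0, len(music), block_size):
        voice_block = voice[start:start + block_size]
        voice_active = bool(voice_block) and rms_db(voice_block) > threshold_db
        gain = duck_gain if voice_active else 1.0
        # No smoothing here: a practical ducker would ramp the gain to avoid clicks.
        out.extend(s * gain for s in music[start:start + block_size])
    return out


# Synthetic test signals: constant music, voice silent in block 1 and present in block 2.
music = [1.0] * 2048
voice = [0.0] * 1024 + [0.5] * 1024
ducked = duck_music(music, voice)
print(f"music gain while voice is silent:  {ducked[0]:.3f}")
print(f"music gain while voice is present: {ducked[-1]:.3f}")
```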
(In addition to what's already been said, of course)
The best way to mix voices on top of loud music is to turn the loud music down ever so slightly. You don't have to actually get somebody out of the water to save them from drowning, you just have to make sure their head stays above water.
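To give a feel for how small "ever so slightly" can be, this sketch converts a few modest trims in dB into the linear factor the music's amplitude gets multiplied by, using the standard 20*log10 amplitude convention. The trim values and the name `db_to_gain` are just examples.

```python
def db_to_gain(db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


# Example trims only; the right amount is whatever keeps the voice "above water".
for trim_db in (-0.5, -1.0, -2.0, -3.0):
    print(f"{trim_db:+.1f} dB -> amplitude x {db_to_gain(trim_db):.3f}")
```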