
A quick primer on audio drivers, devices, and latency


This information comes from Durin, an Adobe staffer:

 

Hi everyone,

 

A common question that comes up in these forums over and over has to do with recording latency, audio drivers, and device formats. I'm going to provide a brief overview of the different types of devices, how they interface with the computer and Audition, and steps to maximize performance and minimize the latency inherent in computer audio.

 

First, a few definitions:

Monitoring: listening to existing audio while simultaneously recording new audio.

Sample: The value of one individual measurement of the audio signal as digitized by the audio device. Typically, the audio device measures the incoming signal 44,100 or 48,000 times every second.

Buffer Size: The "bucket" where samples are placed before being passed to their destination. An audio application will collect a buffer's worth of samples before feeding it to the audio device for playback. An audio device will collect a buffer's worth of samples before feeding it to the audio application when recording. Buffers are typically measured in samples (common values being 64, 128, 512, 1024, 2048...) or in milliseconds, which is simply a calculation based on the device sample rate and buffer size.

Latency: The time span between providing an input signal to an audio device (through a microphone, keyboard, guitar input, etc.) and when each buffer's worth of that signal is provided to the audio application. It also refers to the other direction, where the output audio signal is sent from the audio application to the audio device for playback. When recording while monitoring, the overall perceived latency can often be double the device buffer size.

ASIO, MME, CoreAudio: These are audio driver models, which simply specify the manner in which an audio application and audio device communicate. Apple Mac systems use CoreAudio almost exclusively, which provides for low buffer sizes and the ability to mix and match different devices (called an Aggregate Device). MME and ASIO are mostly Windows-exclusive driver models, and provide different methods of communicating between application and device. MME drivers allow the operating system itself to act as a go-between; they are generally slower, as they rely upon higher buffer sizes and the audio has to pass through multiple processes on the computer before being sent to the audio device. ASIO drivers give an audio application direct communication with the hardware, bypassing the operating system. This allows for much lower latency, at the cost of limiting an application's ability to access multiple devices simultaneously or share a device channel with another application.

Dropouts: Missing audio data as a result of being unable to process an audio stream fast enough to keep up with the buffer size. Generally, dropouts occur when an audio application cannot process effects and mix tracks together quickly enough to fill the device buffer, or when the audio device is trying to send audio data to the application more quickly than the application can handle it. (Remember when Lucy and Ethel were working at the chocolate factory and the machine sped up to the point where they were dropping chocolates all over the place? Pretend the chocolates were samples, Lucy and Ethel were the audio application, and the chocolate machine was the audio device/driver, and you'll have a pretty good visualization of how this works.)

 

Typically, latency is not a problem if you're simply playing back existing audio (you might experience a very slight delay between pressing PLAY and when audio is heard through your speakers) or recording to disk without monitoring existing audio tracks, since precise timing is not crucial in these conditions. However, when trying to play along with a drum track, sing a harmony to an existing track, or overdub narration to a video, latency becomes a factor, since our ears are far more sensitive to timing issues than our other senses. If a bass guitar track is not precisely aligned with the drums, it quickly sounds sloppy. Therefore, we need to reduce latency as much as possible for these situations. If we simply set our Buffer Size parameter as low as it will go, we're likely to experience dropouts - especially if we have some tracks configured with audio effects, which require additional processing and contribute their own latency to the chain. Dropouts are annoying but not destructive during playback; if they occur on the recording stream, however, you're losing data and your recording will never sound right - the data is simply lost. Obviously, this is not good.
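
To make the dropout condition concrete, here is a minimal Python sketch (process_block and its soft-clip body are made-up stand-ins, not anything from Audition): a buffer of N samples at sample rate R must be produced in under N/R seconds, or the device starves and audio drops out.

    import time
    import numpy as np

    SAMPLE_RATE = 44100   # device sample rate (Hz)
    BUFFER_SIZE = 512     # device buffer size (samples)

    # A 512-sample buffer at 44.1 kHz must be refilled every ~11.6 ms.
    budget = BUFFER_SIZE / SAMPLE_RATE

    def process_block(block):
        """Stand-in for a track's effects chain; a real chain may be far heavier."""
        return np.tanh(block)  # a cheap soft-clip in place of real effects

    block = np.zeros(BUFFER_SIZE, dtype=np.float32)
    start = time.perf_counter()
    process_block(block)
    elapsed = time.perf_counter() - start

    print(f"time budget per buffer: {budget * 1000:.2f} ms")
    print(f"processing took:        {elapsed * 1000:.2f} ms")
    if elapsed > budget:
        print("Chain too slow at this buffer size -- expect dropouts.")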

 

Latency under 40 ms is generally considered acceptable for recording. Some folks can hear even this, and it affects their ability to play, but most people find it unnoticeable or tolerable. We can calculate our approximate desired buffer size with this formula:

(Samples per second / 1000) * Desired latency in milliseconds

So, if we are recording at 44,100 Hz and we are aiming for 20 ms latency: 44100 / 1000 * 20 = 882 samples. Most audio devices do not allow arbitrary buffer sizes but offer an array of choices, so we would select the closest option. The device I'm using right now offers 512 and 1024 samples as the closest available buffer sizes, so I would select 512 first and see how it performs. If my session has a lot of tracks and/or several effects, I might need to bump this up to 1024 if I experience dropouts.
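
The same arithmetic as a quick Python sanity check (the list of offered sizes is an example; substitute whatever your device actually exposes):

    SAMPLE_RATE = 44100        # Hz
    DESIRED_LATENCY_MS = 20    # one-way target, in milliseconds

    # (Samples per second / 1000) * desired latency in milliseconds
    ideal = SAMPLE_RATE / 1000 * DESIRED_LATENCY_MS   # -> 882.0 samples

    # Devices offer a fixed menu of sizes; these values are examples only.
    offered = [64, 128, 256, 512, 1024, 2048]

    # Try the largest offered size at or below the ideal (lower latency first),
    # then bump it up if dropouts occur -- the approach described above.
    candidates = [n for n in offered if n <= ideal]
    choice = max(candidates) if candidates else min(offered)

    print(f"ideal: {ideal:.0f} samples -> try {choice} samples "
          f"({choice / SAMPLE_RATE * 1000:.1f} ms), raise it on dropouts")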

 

Now that we hopefully have a pretty firm understanding of what constitutes latency and under what circumstances it is undesirable, let's take a look at how we can reduce it for our needs. You may find that you continue to experience dropouts at a buffer size of 1024, but that raising it to larger options introduces too much latency for your needs. So we need to determine what we can do to reduce our overhead in order to get quality playback and recording at this buffer size.

 

Effects: A common cause of playback latency is the use of effects. As your audio stream passes through an effect, it takes time for the computer to perform the calculations to modify that signal. Each effect in a chain introduces its own amount of latency before the chunk of audio even reaches the point where the audio application passes it to the audio device and starts to fill up the buffer. Audition and other DAWs attempt to address this through "latency compensation" routines, which introduce a bit more latency when you first press play as they process several seconds of audio ahead of time before beginning to stream those chunks to the audio driver. In some cases, however, the effects may be so intensive that the CPU simply isn't processing the math fast enough. With Audition, you can "freeze" or pre-render these tracks by clicking the small lightning bolt button visible in the Effects Rack with that track selected. This performs a background render of the track, which automatically updates if you make any changes to the track or effect parameters, so that instead of calculating all those changes on the fly, Audition simply needs to stream back a plain old audio file, which requires far fewer system resources. You may also choose to disable certain effects, or temporarily replace them with alternatives which may not sound exactly like what you want for your final mix but which adequately simulate the desired effect for the purpose of recording. (You might replace the CPU-intensive Full Reverb effect with the lightweight Studio Reverb effect, for example. Full Reverb is mathematically far more accurate and realistic, but Studio Reverb can provide that quick "body" you might want when monitoring vocals.) You can also just disable the effects for a track or clip while recording, and turn them on later.
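
As a small illustration of the "latency compensation" idea, here is a numpy sketch (the Hann-window "effect", the tap count, and every name in it are made up for the example, not taken from Audition): a symmetric FIR effect delays its output by a known number of samples, and the host realigns the processed audio by that amount.

    import numpy as np

    # A symmetric FIR effect is late by (taps - 1) / 2 samples. Plugins
    # report this figure to the host, and latency compensation shifts the
    # processed audio back so it lines up with the other tracks again.
    taps = 101
    kernel = np.hanning(taps)                 # a smooth lowpass "effect"
    kernel /= kernel.sum()
    reported_latency = (taps - 1) // 2        # 50 samples

    dry = np.zeros(1000)
    dry[200] = 1.0                            # an impulse at sample 200

    wet = np.convolve(dry, kernel)            # effect output: peak lands 50 late
    compensated = wet[reported_latency:reported_latency + len(dry)]

    print(np.argmax(wet))           # 250 -- the effect delayed the audio
    print(np.argmax(compensated))   # 200 -- compensation realigned it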

 

Device and Driver Options: Different devices may perform wildly differently at the same buffer size and with the same session. Audio devices designed primarily for gaming are less likely to perform well at low buffer sizes than those designed for music production, for example. Even if the hardware performs the same, the driver model may be a source of latency. ASIO is almost always faster than MME, though many device manufacturers do not supply an ASIO driver. Third-party, device-agnostic drivers such as ASIO4ALL (www.asio4all.com) let you wrap an MME-only device inside a faux-ASIO shell. The audio application believes it's speaking to an ASIO driver, and ASIO4ALL has been streamlined to work more quickly with the MME device, or even to let you use different inputs and outputs on separate devices, which ASIO would otherwise prevent.
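
If you're curious which driver models (host APIs) your system exposes, here is a short sketch using the third-party Python sounddevice package, a PortAudio wrapper (my choice of tool, not something from the original post):

    # pip install sounddevice -- a third-party PortAudio wrapper; using it
    # here is my assumption, not a tool mentioned in the original post.
    import sounddevice as sd

    # Each host API corresponds to a driver model: MME, ASIO, WASAPI, etc.
    # on Windows; Core Audio on a Mac.
    for api in sd.query_hostapis():
        print(api["name"])
        for index in api["devices"]:
            dev = sd.query_devices(index)
            print(f"  {dev['name']} "
                  f"(in: {dev['max_input_channels']}, "
                  f"out: {dev['max_output_channels']}, "
                  f"low input latency: {dev['default_low_input_latency'] * 1000:.1f} ms)")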

 

We also now see more USB microphones, which are input-only audio devices that generally use a generic Windows driver and, with a few exceptions, rarely offer native ASIO support. USB microphones generally require a higher buffer size, as they are primarily designed for recording in cases where monitoring is unimportant. When attempting to record via a USB microphone and monitor via a separate audio device, you're more likely to run into issues where the two devices are not synchronized or drift apart after some time. (The ugly secret of many device manufacturers is that devices rarely operate at EXACTLY the sample rate specified. The difference between 44,100 and 44,118 Hz is negligible when listening to audio, but when trying to precisely synchronize to a track recorded at 44,100 Hz, the difference adds up over time, and what sounded in sync for the first minute will be wildly off-beat several minutes later.) You are almost always going to have better sync and performance with a standard microphone connected to the same device you're using for playback, and for serious recording, this is the best practice. If a USB microphone is your only option, then I would recommend making certain you purchase a high-quality one and have an equally high-quality playback device. Attempt to match the buffer sizes and sample rates as closely as possible, and consider using a higher buffer size and correcting the latency post-recording. (One method of doing this is to have a click or clap at the beginning of your session and make sure it is recorded by your USB microphone. After you finish your recording, you can visually line up the click in the recorded track with the click in the original track by moving your clip backwards in the timeline. This is not the most efficient method, but this alignment is the reason you see clapboards in behind-the-scenes filmmaking footage.)
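
The clap-and-line-up trick can also be automated. Here is a small numpy sketch (the function name and the toy impulse data are mine, purely for illustration) that estimates how far a recorded clip lags a reference by cross-correlating the two:

    import numpy as np

    def clip_offset_samples(reference, recorded):
        """Estimate how many samples `recorded` lags `reference`.
        Both are mono arrays at the same sample rate."""
        corr = np.correlate(recorded, reference, mode="full")
        return int(np.argmax(corr)) - (len(reference) - 1)

    # Toy data: the same click, arriving 300 samples late on the USB mic.
    rate = 44100
    ref = np.zeros(5000); ref[1000] = 1.0   # click in the session's guide track
    rec = np.zeros(5000); rec[1300] = 1.0   # click as the USB microphone heard it

    lag = clip_offset_samples(ref, rec)
    print(f"{lag} samples = {lag / rate * 1000:.1f} ms")   # -> 300 samples = 6.8 ms

A positive result means the recorded clip is late; slide it back in the timeline by that many samples.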

 

Other Hardware: Other hardware in your computer plays a role in the ability to feed or store audio data quickly. CPUs are now so fast and, with multiple cores, so capable of spreading the load, that the bottleneck for good performance - especially at high sample rates - tends to be your hard drive or storage media. It is highly recommended that you configure your temporary files location, and your session/recording location, to a physical drive that is NOT the same one your operating system is installed on. Audition and other DAWs have absolutely no control over what Windows or OS X may decide to do at any given time, and if your antivirus software or system file indexer decides it's time to start churning away at your hard drive at the same time you're recording your magnum opus, you raise the likelihood of losing some of that performance. (In fact, it's a good idea to disable all non-essential applications and internet connections while recording to reduce the likelihood of external interference.) If you're going to be recording multiple tracks at once, it's a good idea to purchase the fastest hard drive your budget allows. Most cheap drives spin at around 5400 rpm, which is fine for general use but does not allow for the fast read, write, and seek operations the drive needs to perform when recording and playing back multiple files simultaneously. 7200 rpm drives perform much better, and even faster options are available. While fragmentation is less of a problem on OS X systems, you'll want to defragment your drive frequently on Windows - this process realigns all the blocks of your files so they're grouped together. As you write and delete files, pieces of each tend to get placed in the first location that has room, which ends up creating lots of gaps or splitting files up all over the disk. Reading from or writing to these spread-out areas takes significantly longer than it needs to and can contribute to glitches in playback or loss of data when recording.
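
For perspective, here is the back-of-envelope bandwidth of multitrack recording (example figures, assuming uncompressed 24-bit audio):

    # Back-of-envelope sustained write rate for multitrack recording,
    # assuming uncompressed 24-bit mono tracks (example figures).
    sample_rate = 48000   # Hz
    bit_depth = 24        # bits per sample
    tracks = 8            # simultaneously record-armed tracks

    bytes_per_second = tracks * sample_rate * (bit_depth // 8)
    print(f"{bytes_per_second / 1_000_000:.2f} MB/s sustained")   # ~1.15 MB/s

Even a 5400 rpm drive can stream that much sequentially; the trouble is the constant seeking between several growing files, plus whatever else the OS is doing to the same disk, which is why a dedicated, faster drive helps.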

 