
Preparation and Process

Last month we touched on the digital process.

This month we are going to talk about the preparation, the signal path, dos and don’ts and what some of the terminologies mean.

The most important part of the sampling process is preparation. If you prepare properly, the whole sampling experience is more enjoyable and will yield the best results.
Throughout this tutorial, I will try to incorporate as many sampler technologies as possible, and also present this tutorial side by side, using both hardware and software samplers.

So let us start with the signal path. Signal is the audio you are recording, and path is the route it takes from the source to the destination.

The signal path is the path that the audio takes from its source, be it a turntable, a synthesizer etc, to its final destination, the computer or the hardware sampler. Nothing is more important than this path and the signal itself. The following list is a set of guidelines. Although it is a general guide, it is not scripture. We all know that the fun of sampling is actually in the breaking of the so-called rules and coming up with innovative ways and results. However, the guide is important as it gives you an idea of what can cause a sample to be less than satisfactory when recorded. I will list some pointers and go into more detail about each pointer.

  • The more devices you have in the signal path, the more the sample is degraded and coloured. Each extra device introduces noise and, depending on the device, can compromise the headroom.
  • You must strive to obtain the best possible S/N (signal to noise ratio), throughout the signal path, maintaining a hot and clean signal.
  • You must decide whether to sample in mono or stereo.
  • You must decide what bit depth and sample rate you want to sample at.
  • You need to understand the limitations of both the source and destination.
  • You need to understand how to set up your sampler (destination) or sound card (destination) to obtain the best results.
  • You need to understand what it is that you are sampling (source) and how to prepare the source for the best sampling result.
  • If you have to introduce another device into the path, say a compressor, then you must understand what effect this device will have on the signal you are sampling.
  • You must understand what is the best way to connect the source and destination together, what cables are needed and why.
  • You need to calibrate the source and destination, and any devices in the path, to obtain the same gain readout throughout the path.
  • You need to understand the tools you have in the destination.
  • Use headphones for clarity of detail.

Basically, the whole process of sampling is about getting the audio from the source to the destination, keeping the audio signal strong and clean, and being able to listen to the audio in detail so you can pick out any noise or other artifacts in the signal.

In most cases, you can record directly from the source to the destination without having to use another device in the path. Some soundcards have preamps built into their inputs, along with line inputs, so that you can directly connect to these from the source. Hardware samplers usually have line inputs, so you would need a dedicated preamp to use with your microphone, to get your signal into the sampler. The same is true for turntables. Most turntables need an amp to boost the signal. In this instance, you simply use the output from the amp into your sampler or soundcard (assuming the soundcard has no preamp input). Synthesizers can be directly connected, via their outputs, to the inputs of the hardware sampler, or the line inputs of the soundcard.

As pointed out above, try to minimise the use of other devices in the path. The reason is quite simple. Most hardware devices have an element of noise, particularly those that have built-in amps or power supplies. Introducing these in the signal path adds noise to the signal. So, the fewer devices in the path, the less noise you have. There are, as always, exceptions to the rule. For some of my products, I have re-sampled my samples through some of my vintage compressors. And I have done it for exactly the reasons I just gave as to why you should try not to do this. Confused? Don’t be. I am using the character of the compressors to add to the sample character. If noise is part of the compressor’s character, then I will record that as well. That way, people who want that particular sound, influenced by the compressor, will get exactly that. I have, however, come across people who sample with a compressor in the path just so they can have as strong and pumping a signal as possible. This is not advised. You should sample the audio with as much dynamic range as possible. You need to keep the signal hot, ie as strong and as loud as possible without clipping the soundcard’s input meters or distorting in the case of hardware samplers. Generally, I always sample at a level 2 dB below the maximum input level of the sampler or soundcard, ie 2 dB below 0 on the input meter. This allows for enough headroom should I choose to then apply dynamics to the sample, as in compression etc. Part 1 of these tutorials explains dynamic range and dBs, so I expect you to know this. I am a vicious tutor, aren’t I? He, he.

My set up is quite simple and one that most sampling enthusiasts use.

I have all my sources routed through to a decent quality mixer, then to the sampler or my computer’s soundcard. This gives me great routing control, many ways to sample and, most important of all, I can control the signal better with a mixer. The huge bonus of using a mixer as the heart of the sampling path is that I can apply equalisation (eq) to the same source sample and record multiple takes of the same sample, but with different eq settings. This way, by using the same sample, I get masses of variety. The other advantage of using a mixer is that you can insert an effect or dynamics processor into the path and have more control over the signal than you would by just plugging the source into an effect unit or a compressor.

Headphones are a must when sampling. If you use your monitors (speakers) for referencing, when you are sampling, then a great deal of the frequencies get absorbed into the environment. So, it is always hard to hear the lower noise or higher noise frequencies, as they get absorbed by the environment. Using headphones, either on the soundcard, or the sampler, you only hear the signal and not the environment’s representation of the signal. This makes finding noise or other artifacts much easier.

The decision of sampling in mono or stereo is governed by a number of factors, the primary one being that of memory. All hardware samplers have memory restrictions, the amount of memory being governed by the make and model of the sampler. Computer sampling is another story entirely, as you are only restricted by how much RAM you have in your computer. A general rule of thumb is: one minute of audio at a 44.1 kHz sample rate (an audio bandwidth of about 20 kHz using the Nyquist theorem, which I covered in Part 1), in stereo, equates to about 10 megabytes of memory. Sampling the same audio at the same rate in mono takes up 5 megabytes, so the same memory gives you double the time, ie 2 minutes.
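If you want to check that rule of thumb against your own sampler’s memory, here is a rough Python sketch of the arithmetic. It assumes uncompressed 16 bit audio (the bit depth is my assumption, as the figure above does not state one), and the function name is purely illustrative.

```python
def sample_size_bytes(seconds, sample_rate_hz=44_100, bit_depth=16, channels=2):
    """Size of uncompressed audio: rate x bytes per sample x channels x time."""
    bytes_per_sample = bit_depth // 8  # 16 bit is an assumption, not from the text
    return int(seconds * sample_rate_hz * bytes_per_sample * channels)


stereo_minute = sample_size_bytes(60, channels=2)
mono_minute = sample_size_bytes(60, channels=1)

print(f"1 minute stereo: {stereo_minute / 1_000_000:.2f} MB")  # 10.58 MB
print(f"1 minute mono:   {mono_minute / 1_000_000:.2f} MB")    # 5.29 MB
```

Change `bit_depth` or `sample_rate_hz` to see how quickly a lower rate or mono recording frees up memory on a restricted sampler.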

So, depending on your sampler’s memory restriction, always bear that in mind. Another factor that governs the use of mono over stereo is, whether you actually need to sample that particular sound in stereo. The only time you sample in stereo is if there is an added sonic advantage in sampling in stereo, particularly if a sound sounds fuller and has varying sonic qualities, that are on the left and right sides, of the stereo field, and you need to capture both sides of the stereo field. When using microphones on certain sounds, like strings, it is often best to sample in stereo. You might be using 3 or 4 microphones to record the strings, but then route these through your mixer’s stereo outputs or subgroups to your sampler or soundcard. In this case, stereo sampling will capture the whole tonal and dynamic range of the strings. For those that are on stringent memory samplers, sample in mono and, if you can tolerate it, a lower sampling rate. But make sure that the audio is not compromised.

At this point, it is important to always look at what it is that you are sampling and whether you are using microphones or direct sampling, using the outputs of a device to the inputs of the sampler or soundcard. For sounds like drum hits, or any sound that is short and not based on any key or pitch (unlike instrument or synthesizer sounds), keep it simple and clean. But what happens when you want to sample a sound from a particular synthesizer? This is where the sampler needs to be set up properly, and where the synthesizer has to be set up to deliver the best possible signal, one that is not only clean and strong but that can be easily looped, placed on a key and then spanned. In this case, where we are trying to sample and create a whole instrument, we need to look at multi-sampling and looping.

But before we do that, we need to understand the nature of what we are sampling and the tonal qualities of the sound we are sampling. Invariably, most synthesizer sounds will have a huge amount of dynamics programmed into the sound. Modulation, panning, oscillator detunes etc are all in the sound that you are trying to sample. In the case of analog synthesizers, it becomes even harder to sample a sound, as there is so much movement and tonal variance that it makes sampling a nightmare. So, what do we do? Well, we strip away all these dynamics so that we are left with the original sound, uncoloured through programming. In the case of analog synthesizers, we will often sample each and every oscillator and filter. By doing this, we make the sampling process a lot easier and more accurate. Remember that we can always program the final sampled instrument to sound like the original instrument. By taking away all the dynamics, we are left with simpler constant waveforms that are easier to sample and, more importantly, easier to loop.

The other consideration is one of pitch/frequency. To sample one note is okay, but to then try to create a 5 octave preset from this one sample would be a nightmare, even after looping the sample perfectly. There comes a point where a looped sample, stretched too far from its original pitch, will begin to fall apart and result in a terrible sound, full of artifacts and out of key frequencies. Remember that for each octave, the frequency is doubled. A way around this problem is multi-sampling. This means we sample more than one note of the sound, usually each third or fifth semitone. By sampling a collection of these notes, we have a much better chance of recreating the original sound accurately. We then place these samples in their respective ‘slots’ in the instrument patch of the sampler or software sampler, so a sampled C3 note would be put into the C3 slot on the instrument keyboard layout. Remember, we do not need to sample each and every note, just a few. That way we can span the samples, ie we can use a C3 sample and know that it will still be accurate from a few semitones down to a few semitones up, so we spread that one sample down a few semitones and up a few semitones. These spreads are called zones or keygroups: Emu call them zones and Akai call them keygroups. Where one sample’s spread ends, we put our next sample and so on, until the keyboard layout is complete with all the samples. This saves us a lot of hard work, in that we don’t have to sample every single note, but still gives us a more accurate representation of the sound being sampled. However, multi-sampling takes up memory. It is a compromise between memory and accurate representation that you need to decide on.
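To make the spanning idea concrete, here is a small Python sketch that spreads a handful of sampled notes across a keyboard range. It is only an illustration: the notes are written as MIDI note numbers, ties go to the lower sample, and the function names are my own assumptions, not how any particular Akai or Emu sampler does it.

```python
def build_keygroups(root_notes, low_key, high_key):
    """Give every key in the range to the nearest sampled (root) note."""
    roots = sorted(root_notes)
    groups = {root: [] for root in roots}
    for key in range(low_key, high_key + 1):
        # Nearest root wins; on a tie the lower root is used (arbitrary choice).
        nearest = min(roots, key=lambda root: (abs(root - key), root))
        groups[nearest].append(key)
    # Report each keygroup as (lowest key, highest key).
    return {root: (keys[0], keys[-1]) for root, keys in groups.items() if keys}


def pitch_ratio(key, root):
    """Playback speed needed to move a sample from its root to another key."""
    return 2 ** ((key - root) / 12)  # 12 semitones up doubles the frequency


# Samples taken every 4 semitones, spread over keys 46 to 62.
for root, span in build_keygroups([48, 52, 56, 60], 46, 62).items():
    print(f"sample at note {root} covers keys {span[0]} to {span[1]}")
```

The further a key sits from its root, the further `pitch_ratio` moves from 1.0, which is where the artifacts start to creep in. Sampling more notes keeps every ratio close to 1.0 at the cost of memory.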

There are further advantages to multi-sampling, but we will come to those later. For sounds that are more detailed or complex in their characteristics, the more samples are required. In the case of a piano, it is not uncommon to sample every second or third semitone and also to sample the same notes with varying velocities, so we can emulate the playing velocities of the piano. We will sample hard, mid and soft velocities of the same note and then layer these and apply all sorts of dynamic tools to try to capture the original character of the piano being played. As I said, we will come to this later.

An area that is crucial is that of calibrating. You want to make sure that the sound you are trying to sample shows the same level on the mixer’s meters as it does on the sampler’s meters or the soundcard’s meters. If there is a mixer in the path, then you can easily use the gain trims on the mixer channel that the source is connected to, to match the level of the sound you want to sample to the readout of the input meters of the sampler or the soundcard. If there is no mixer in the path, then you need to have your source sound at maximum, assuming there is no distortion or clipping, and your sampler’s or soundcard’s input gain set so that the meter reads just below 0 dB. This is a good hot signal. If you had it the other way around, whereby the sound source level was too low and you had to raise the input gain of the sampler or soundcard, you would then be raising the noise floor. This would result in a signal with noise.

The right cabling is also crucial. If your sampler line inputs are balanced, then use balanced cables; don’t use phono cables with jack converters. Try to keep a reasonable distance between the source and destination and, if you have an environment with RF interference caused by amps, radios, antennae etc, use shielded cables. I am not saying use expensive brands, just use cables that are correctly matched.

Finally, we are left with the tools that you have in your sampler or software sampler.

In the virtual domain, you have far more choice, in terms of audio processing and editing tools, and they are far cheaper than their hardware counterparts. So, sampling into your computer will afford you many more audio editing tools and options. In the hardware sampler, the tools are predefined.

In the next section, we will look at some of the most common tools used in sampling.


Noise Gate does exactly what it sounds like.

It acts as a gate that opens when a threshold is reached and then closes at a speed that depends on the release you set, basically acting as an on-off switch.
It reduces gain when the input level falls below the set threshold. In other words, when an instrument or audio stops playing, or reaches a gap where the level drops, the noise gate kicks in and reduces the volume of the signal.

Generally speaking, noise gates will have the following controls:

Threshold: the gate will ‘open’ once the threshold has been reached. The threshold will have varying ranges (eg: -60dB to infinity) and is represented in dB (decibels). Once the threshold has been set, the gate will open the instant the threshold is reached.

Attack: this determines the speed of the gate kicking in, much like a compressor’s attack, and is usually measured in ms (milliseconds) and fractions of a millisecond. This is a useful feature as the speed of the gate’s attack can completely change the tonal colour of a sound once gated.

Hold: this function allows the gate to stay open (or hold) for the specified duration, and is measured in ms and seconds. Very useful particularly when passages of audio need to be ‘let through’.

Decay or release: this function determines how quickly the gate closes and whether it is instant or gradual over time. Crucial feature as not all sounds have an abrupt end (think pads etc).

Side Chaining (Key Input): Some gates (in fact most) will also have a side-chain function that allows an external audio signal to control the gate’s settings.

When the side-chained signal exceeds the threshold, a control signal is generated to open the gate at a rate set by the attack control. When the signal falls below the threshold, the gate closes according to the setting of the hold and release controls. Clever uses for key input (side-chaining) are ducking and the repeat gated effects used in Dance genres. The repeated gate effect (or stuttering) is attained by key inputting a hi-hat pattern to trigger the gate to open and close. By using a pad sound and the hi-hat key input pattern you are able to achieve the famous stuttering effect used so much in Dance music.

Ducking: Some gates will include a ‘Ducking’ mode whereby one signal will drop in level when another one starts or is playing. The signal that needs ducking goes through the gate’s main input, while the signal that controls it is sent to the key input (side-chain), and the gate’s attack and release times set the rate at which the levels change in response to the key input signal. A popular use for ducking is in the broadcasting industry, whereby the DJ needs the music to go quiet so he/she can be heard when speaking (once the voice is used at the key input and triggers the gate, the music will drop in volume).

However, side-chaining (key input) and ducking are not all the gate is good for.

The most common use for a gate, certainly in the old days of analog consoles and tape machines, was to remove ‘noise’. By selecting a threshold just above the noise level, the gate would open to allow audio above the threshold through and then close when the level fell back below it. This meant that low-level noise was ‘gated’ out of the quiet parts of the audio passage, leaving it cleaner.
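For those who like to see the controls in action, here is a bare-bones Python sketch of a gate with threshold, attack, hold and release. It is a simplified illustration under my own assumptions (linear gain ramps, sample-by-sample level detection, no side-chain), not how any particular hardware or plugin gate is built.

```python
def noise_gate(samples, sample_rate, threshold_db=-40.0,
               attack_ms=1.0, hold_ms=10.0, release_ms=50.0):
    """Return a gated copy of samples (floats in the range -1.0 to 1.0)."""
    threshold = 10 ** (threshold_db / 20)  # convert dB to linear amplitude
    # How much the gain moves per sample while opening and closing.
    attack_step = 1.0 / max(1, int(sample_rate * attack_ms / 1000))
    release_step = 1.0 / max(1, int(sample_rate * release_ms / 1000))
    hold_samples = int(sample_rate * hold_ms / 1000)

    gain = 0.0       # 0.0 is fully closed, 1.0 is fully open
    hold_left = 0    # samples remaining before the gate may start closing
    gated = []
    for x in samples:
        if abs(x) >= threshold:
            hold_left = hold_samples               # re-arm the hold timer
            gain = min(1.0, gain + attack_step)    # open at the attack rate
        elif hold_left > 0:
            hold_left -= 1                         # stay as open as we are
        else:
            gain = max(0.0, gain - release_step)   # close at the release rate
        gated.append(x * gain)
    return gated


# A loud hit surrounded by low-level noise, at a toy sample rate of 1 kHz.
print(noise_gate([0.01, 0.5, 0.01], 1000, threshold_db=-20.0,
                 attack_ms=1.0, hold_ms=0.0, release_ms=1.0))
# [0.0, 0.5, 0.0]
```

Try lengthening `release_ms` on a pad or `hold_ms` on a drum loop to hear why those two controls matter so much for sounds that do not end abruptly.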

BUT it doesn’t end there. There are so many uses for a noise gate: using an EQ unit as the key input for shaping audio lines and curing false triggers, ducking in commentary situations (still used today), creative sonic mangling tasks (much like the repeat gate) and so on.

With today’s software-based gates we are afforded a ton of new and interesting features that make the gate more than a simple ‘noise’ gate.

Experiment and enjoy chaining effects and dynamics in series and make sure to throw a gate in there somewhere for some manic textures.

If you prefer the visual approach then try this video tutorial:

Noise Gate – What is it and how does it work

In essence, noise is a randomly changing, chaotic signal, containing an endless number of sine waves of all possible frequencies with different amplitudes. However, randomness will always have specific statistical properties. These will give the noise its specific character or timbre.

If the sine waves’ amplitude is uniform, which means every frequency has the same volume, the noise sounds very bright. This type of noise is called white noise.

White noise is a signal with the property of having constant energy per Hz bandwidth (an amplitude-frequency distribution of 1), so it has a flat frequency response. Because of these properties, white noise is well suited to testing audio equipment. The human hearing system’s frequency response is not linear but logarithmic. In other words, we judge pitch increases by octaves, not by equal increments of frequency; each successive octave spans twice as many Hertz as the previous one down the scale. And this means that when we listen to white noise, it appears to us to increase in level by 3dB per octave.

If the level of the sine waves falls at about 3 dB per octave as their frequencies rise, the noise sounds much warmer. This is called pink noise.

Pink noise contains equal energy per octave (or per 1/3 octave). Its power density follows the function 1/f, which corresponds to the level falling by 3dB per octave. These attributes lend themselves perfectly for use in acoustic measurements.

If it falls at about 6 dB per octave we call it brown noise.

Brown noise, whose name is actually derived from Brownian motion, is similar to pink noise except that the power density follows 1/(f squared). This produces a 6dB-per-octave attenuation.

Blue noise is essentially the inverse of pink noise, with its level increasing by 3dB per octave (the power density is proportional to f).

Violet noise is the inverse of brown noise, with a rising response of 6dB per octave (the power density is proportional to f squared).
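If you want to see where the 3dB and 6dB figures come from, the short Python sketch below works them out. It assumes the usual convention that each colour’s power density is proportional to the frequency raised to some exponent; the dictionary and function names are just mine for illustration.

```python
import math

# Exponent n in "power density proportional to f**n" (assumed convention).
NOISE_COLOURS = {"white": 0, "pink": -1, "brown": -2, "blue": 1, "violet": 2}


def slope_db_per_octave(exponent):
    """Level change over one octave (a doubling of frequency)."""
    return 10 * math.log10(2.0 ** exponent)


for name, exponent in NOISE_COLOURS.items():
    print(f"{name:>6}: {slope_db_per_octave(exponent):+.1f} dB per octave")
```

Doubling the frequency halves the power density of pink noise, and halving power is a drop of about 3dB. Brown noise drops to a quarter, which is about 6dB.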

So we have all these funky names for noise, and you do need to understand their characteristics, but what are they used for?

White noise is used in the synthesizing of hi-hats, crashes, cymbals etc, and is even used to test certain generators.

Pink noise is great for synthesizing ocean waves and the warmer type of ethereal pads.

Brown noise is cool for synthesizing thunderous sounds and deep and bursting claps. Of course, they can all be used in varying ways for attaining different textures and results, but the idea is simply for you to get an idea of what they ‘sound’ like.

At the end of the day, it all boils down to maths and physics.

 

Here is an article I wrote for Sound On Sound magazine on how to use Pink noise referencing for mixing.

And here is the link to the video I created on master bus mixing with Pink noise.

And here is another video tutorial on how to use ripped profiles and Pink noise to mix.

AN INTRODUCTION TO DIGITAL AUDIO

In the old days, sampling consisted of recording the audio onto magnetic tape. The audio (analogue) was represented by the movement of the magnetic particles on the tape. In fact, a good example is cutting vinyl. This is actually sampling because you are recording the audio onto the actual acetate or disc by forming the grooves. So, the audio is a continuous waveform.

Whether we are using a hardware sampler, like the Akais, Rolands, Yamahas, Emus etc, or software samplers on our computers, like Kontakt, EXS24, NN-19 etc, there is a process that takes place between you recording the analogue waveform (audio) into the sampler and the way the sampler interprets the audio and stores it. This process is the conversion of the analogue signal (the audio you are recording) into a digital signal. For this to happen, we need what we call an analogue to digital converter (ADC). For the sampler to play back what you have recorded, and for you to hear it, the process is reversed but with a slightly different structure, and for that to happen we need a digital to analogue converter (DAC). That is simple and makes complete sense. Between all of that, there are a few other things happening, and with this diagram (fig1) you will at least see what I am talking about.

Fig1

The sampler records and stores the audio as a stream of numbers, binary, 0s and 1s, on and off. As the audio (sound wave) is moving along, the ADC records ‘snapshots’ (samples) of the sound wave, much like the frames of a movie. These snapshots (samples) are then converted into numbers. Each one of these samples (snapshots) is expressed as a number of bits. This process is called quantising and must not be confused with the quantising we have on sequencers, although the process is similar. The number of times a sample is taken or measured per second is called the sampling rate. The sampling rate is measured as a frequency and is expressed in kHz, k=1000 and Hz=cycles per second. These samples are measured at discrete intervals of time. The length of these intervals is governed by the Nyquist Theorem. The theorem states that the sampling frequency must be greater than twice the highest frequency of the input signal in order to be able to reconstruct the original perfectly from the sampled version. Another way of explaining this is that the maximum frequency that can be recorded at a given sample rate is half the sample rate. A good example at this point would be the industry standard CD. 44.1 kHz means that 44,100 samples (snapshots) are taken per second.
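Here is a tiny Python sketch of that snapshot-and-number process, purely as an illustration. It takes snapshots of a sine wave at a chosen sampling rate and rounds each one to a whole number that fits the chosen bit count. The function name and the simple rounding scheme are my own assumptions; a real ADC is a good deal more involved.

```python
import math


def sample_and_quantise(freq_hz, sample_rate_hz, bit_depth, n_samples):
    """Take snapshots of a sine wave and turn each into a signed integer."""
    max_code = 2 ** bit_depth // 2 - 1  # eg 127 for 8 bit, 32767 for 16 bit
    codes = []
    for n in range(n_samples):
        t = n / sample_rate_hz                       # time of this snapshot
        value = math.sin(2 * math.pi * freq_hz * t)  # analogue level, -1 to 1
        codes.append(round(value * max_code))        # quantise to a number
    return codes


# One cycle of a 1 kHz tone, sampled at 8 kHz with 8 bit resolution.
print(sample_and_quantise(1000, 8000, 8, 8))
# [0, 90, 127, 90, 0, -90, -127, -90]
```

Eight snapshots are enough to describe one cycle here because 8 kHz is well above twice the 1 kHz tone, so the Nyquist rule is comfortably met.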

Ok, now let’s look at Bits. We have talked about the samples (snapshots) and the numbers. We know that these numbers are expressed as a number of bits. The number of bits in that number is crucial. This determines the dynamic range (the difference between the lowest value of the signal and the highest value of the signal) and, most importantly, the signal to noise ratio (S/N). For this, you need to understand how we measure ‘loudness’. The level or loudness of a sound is measured in decibels (dB); this is the unit of measure of the replay strength (loudness) of an audio signal. Named after this dude Bell. The other measurement you might come across is dBu or dBv, which is the relationship between decibels and voltage. It means decibels referenced to 0.775 volts. You don’t even need to think about this but you do need to know that we measure loudness (level) or volume of a sound in decibels, dB.

Back to bits. The most important aspect of bits is resolution. Let me explain this in simpler terms. You often come across samplers that are 8 bit, Fairlight CMI or Emulator II, or 12 bit, Akai S950 or Emu SP1200, or 16 bit, Akai S1000 or Emulator III etc. You also come across sound cards that are 16 bit or 24 bit etc. The bit count refers to how accurately a sound can be recorded and presented. The more bits you have (Resolution), the better the representation of the sound. I could go into the ‘electrical pressure measurement at an instant’ definition but that won’t help you at this early stage of this tutorial. So, I will give a little simple info about bit resolution.

There is a measurement that you can use, albeit not clear cut but at least it works for our purposes. For every bit, you get 6dB of accurate representation. So, an 8 bit sampler will give you 48dB of dynamic range. Bearing in mind that we can, on average, hear up to 120dB, that figure of 48dB looks a bit poor. So, we invented 16 bit CD quality, which gives us a 96dB dynamic range. Now we have 24 or even 32 bit sound cards and 24 bit samplers, which give us an even higher dynamic range. Even though we will never use that range, as our ears would implode, it is good to have it in reserve. Why? Well, use the Ferrari analogy. You have a 160mph car there and even though you know you are not going to stretch it to that limit (I would), you do know that to get to 60mph it takes very little time and does not stress the car. The same analogy can be applied to monitors (speakers): the more dynamic range you have, the better the sound representation at lower levels.

To take this resolution issue a step further: 8 bits allows for 256 different levels of loudness to a sample, 16 bits allows for 65,536. So, now you can see that 16 bits gives a much better representation. The other way of looking at it is: if I gave you 10 colours to paint a painting (copy a Picasso) and then gave you 1000 colours to paint the same painting, which one would be better in terms of definition, colour, depth etc.? We have the same situation on computer screens and scanners and printers. The higher the resolution, the clearer and better defined the images on your computer, or the better the quality of the scanned picture, or the better the resolution of the print. As you can see from the figure below (Fig2), the lowest bit resolution is 1 and the highest is 4. The shape of the highest bit resolution is the closest in terms of representing the shape of the audio signal above. So the higher the bit resolution, the better the representation. However, remember that because we are dealing with digital processing and not a continuous signal, there will always be steps in our signal in the digital domain.

Fig2
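The 6dB-per-bit rule and the number of levels are easy to work out for yourself. The Python sketch below does both for the common bit depths. The function names are mine, and the dynamic range is the theoretical figure from the number of levels only, so treat it as a rough guide rather than what a real converter will measure.

```python
import math


def quantisation_levels(bit_depth):
    """Number of different values a sample of this bit depth can take."""
    return 2 ** bit_depth


def dynamic_range_db(bit_depth):
    """Theoretical dynamic range: about 6 dB for every bit."""
    return 20 * math.log10(quantisation_levels(bit_depth))


for bits in (8, 12, 16, 24):
    print(f"{bits:>2} bit: {quantisation_levels(bits):>10,} levels, "
          f"about {dynamic_range_db(bits):.1f} dB")
```

The exact figure is a shade over 6dB per bit (roughly 6.02), which is why 16 bit comes out at just over 96dB.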

Now let’s look at the signal to noise ratio (S/N). This is the level difference between the signal level and noise floor. The best way to describe this is by using an example that always works for me. Imagine you are singing with just a drummer. You are the signal and the drummer is the noise (ha.ha). The louder you sing or the quieter the drummer plays the greater the signal to noise ratio. This is actually very important in all areas of sound technology and music. It is also very relevant when we talk about bit resolution and dynamic range. Imagine using 24 bits. That would allow a dynamic range of 144 dB. Bearing in mind we have a limit of 120 dB hearing range (theoretical) then the audio signal would be so much greater than the noise floor that it would be almost noiseless.

A good little example is when people re-sample their drums, that were at 16 bit, at 8 bit. The drums become dirty and grungy. This is why the Emu SP1200, the drum sampler beatbox that gave us fat and dirty drum sounds, is still so highly prized. Lovely.

Now, let’s go back to sample rates. I dropped in a nice little theorem by Nyquist to cheer you up. I know, I know, I was a bit cold there but it is a tad relevant.

If the sampling rate does not conform to the Nyquist rule, ie it is less than twice the highest frequency we are trying to record, then we lose some of the cycles due to the quantisation process we mentioned earlier. Whereas this quantisation is related to the input voltage or the analogue waveform, for the sake of simplicity, it is important to bear in mind its relationship with bits and bit resolution. Remember that the ADC needs to quantise 256 levels for an 8 bit system. These quantisations are shown as steps, the jagged shape you get on the waveform. This creates noise or alias. The process or cock-up is called aliasing. Check Fig3.

Fig3

To be honest, that is a very scant figure but what it shows is that the analogue to digital conversion, when not following the Nyquist rule, leaves us with added noise or distortion because cycles will be omitted from conversion and the result is a waveform that doesn’t look too much like our original waveform that is being recorded.

To be even more honest, even at high sampling rates the processed signal will still be in steps, as we discussed earlier when looking at quantisation and the way the digital process converts analogue to digital.

So how do we get past this problem of aliasing? Easy. We use anti-aliasing filters. On Fig1, you see that there are 2 filters, one before the ADC and one after the DAC. Without going back into the Nyquist dude’s issues, just accept the fact that we get a great deal of high-frequency content in the way of harmonics or aliasing with the sample rate processing, so we run a low pass filter that only lets in the lower frequencies and gets rid of the higher frequencies (above our hearing range) that came in on the signal. The filter is also anti-aliasing so it smoothes out the signal.

What is obvious is that if we are using lower sampling rates then we will need a filter with a steep slope (aggressive). So, it makes sense to use higher sampling rates to reduce the steepness of the filter. Most manufacturers put an even higher sample rate at the output stage so the filter does not need to be so aggressive (please refer to upsampling further on in this tutorial). The other process that takes place is called interpolation. This is an error correction circuit that guesses the value of a missing bit by using the data that came before and after the missing bit. A bit crude. The output stage has now been improved with better DACs that are oversampling, and additionally a low order analogue filter just after the DAC at the output stage. The DAC incorporates the use of a low pass filter (anti imaging filter) at the output stage.

Now let’s have a look at an aggressive form of alias called foldover. Using Nyquist again: a sampling rate of 44.1 kHz can reproduce frequencies up to 22.05 kHz (half). If lower sampling rates are used that do not conform to the Nyquist rule, then we get more extreme forms of alias. Let us put that in simple terms and take a lower sampling rate; for the sake of this argument, let us halve the usual 44.1 kHz. So, we have a sampling rate of 22.05 kHz. We know, using Nyquist, that your sampler or sound card cannot sample frequencies above half of that, 11.025 kHz. Without the use of the filter that we have already discussed, the sampler or sound card would still try to record those higher frequencies (above 11.025 kHz) and the result would be terrible, as the recorded frequencies would be markedly different to the frequencies you were trying to record.
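To put some numbers on foldover, the Python sketch below works out where a tone ends up if it is sampled with no filter in front of the converter. It uses the standard folding arithmetic for an ideal sampler; the function name is my own, and a real converter with its filter in place would not let these tones through in the first place.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency that a tone appears at after sampling with no filter."""
    # The tone folds back around the nearest multiple of the sample rate.
    nearest_multiple = round(freq_hz / sample_rate_hz) * sample_rate_hz
    return abs(freq_hz - nearest_multiple)


sample_rate = 22_050  # half the usual 44.1 kHz, as in the example above
for tone in (5_000, 15_000, 20_000):
    print(f"{tone} Hz is recorded as {alias_frequency(tone, sample_rate)} Hz")
# 5000 Hz is recorded as 5000 Hz
# 15000 Hz is recorded as 7050 Hz
# 20000 Hz is recorded as 2050 Hz
```

The 5 kHz tone is below the 11.025 kHz limit, so it comes through untouched. The other two fold back down into the audible range as completely different, unrelated frequencies, which is exactly why the result sounds so terrible.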

So, to solve this extreme form of alias, manufacturers decided to use a brick wall filter. This is a very severe form of the low pass filter and, as the name suggests, it only allows frequencies up to a set point through; the rest it completely omits. However, it tries to compensate for this aggressive filtering by boosting the tail-end of the frequencies (set by the manufacturer), which allows it to completely remove the higher frequencies.

However, we have now come to a new, improved form of DAC: the upsampling DAC.

An upsampling digital filter is simply a poor oversampled digital reconstruction filter having a slow roll-off rate. Nowadays, DAC manufacturers claim that these DACs improve the quality of sound and, when they are used instead of the brick wall filters, the claim is genuine. Basically, at the DAC stage, the output is oversampled, usually 8 times. This creates higher frequencies than we had at the ADC stage, so to compensate and remove these very high frequencies, a low order analogue filter is added after the DAC and just before the output. So we could have an anti-aliasing filter at the input stage and an upsampling DAC with a low order analogue filter at the output stage. This technology is predominantly used in CD players and, of course, sound cards, and any device that incorporates DACs. I really don’t want to get into this topic too much as it really will ruin your day. At any rate, we will come back to this and the above at a later date when we examine digital audio in more detail. All I am trying to achieve in this introduction is to show you the process that takes place to convert an analogue signal into digital information, back to analogue at the output (so we can hear it: Playback) and the components and processes used.
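To see why oversampling lets the output filter be a gentle, low order one, here is a rough sketch. It assumes an idealised converter whose digital filter has already removed everything below the oversampled rate (real converters only approximate this), and the function name is mine. It returns the lowest unwanted frequency the analogue filter still has to deal with.

```python
def first_image_hz(signal_hz, sample_rate_hz, oversampling=1):
    """Lowest image frequency the analogue output filter has to remove.

    Assumes an ideal digital filter has already removed every image below
    the oversampled rate, which real converters only approximate.
    """
    return oversampling * sample_rate_hz - signal_hz


if __name__ == "__main__":
    print(first_image_hz(20_000, 44_100))      # no oversampling
    print(first_image_hz(20_000, 44_100, 8))   # 8 times oversampling
```

For a 20 kHz tone at 44.1 kHz, the first unwanted frequency without oversampling sits at 24.1 kHz, right next to the audio. With 8 times oversampling it moves up to 332.8 kHz, which leaves plenty of room for a slow roll-off.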

The clock. Digital audio devices have clocks that set the timing of the signals. A clock is a series of pulses that runs at the sampling rate. Right now you don’t need to worry too much about this as we will come to this later. Clocks can have a definite impact in the digital domain but are more to do with syncing than the actual digital processes that we are talking about in terms of sampling. They will influence certain aspects of the process but are not relevant in the context of this introduction. So we will tackle the debate on clocks later, when it will become more apparent how important the role of a good quality clock is in the digital domain.

Dither

Dither is used when you need to reduce the number of bits. The best example, and one that is commonly used, is dithering down from 24 bits to 16 bits, or from 16 bits down to 8. A very basic explanation is that when we dither we add random noise to the waveform to get rid of the distortion that cutting bits causes. We talked about quantisation earlier in this tutorial. When we truncate the bits (lowering the bit resolution), ie, in this case, cutting off the least significant bits, we are left with an even coarser version of the stepped waveform that the digital process always gives us. By adding noise, we create a more evenly flowing waveform instead of the stepped waveform. It sounds crazy, but the noise we add results in the dithered waveform sounding cleaner: the distortion from truncation is swapped for a low, steady hiss. This waveform, with the noise, is then filtered at the output stage, as outlined earlier. I could go into this in a much deeper context using graphs and diagrams and talking about probability density functions (PDF), resultant square waves and the bias of quantisation towards one bit over another. But you don’t need to know that now. What you do need to know is that dither is used when lowering the bit resolution and that this is an algorithmic process, ie using a predetermined set of mathematical formulas.
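If you want to see the idea in action, here is a minimal sketch of going from 24 bits to 16 bits with and without dither. It is not how any particular sampler or editor does it: the triangular (TPDF) noise of plus or minus one 16-bit step is just one common choice that I am assuming here, and the function names are mine.

```python
import math
import random


def truncate_to_16(sample_24):
    """Drop the 8 least significant bits of a 24-bit integer sample."""
    return sample_24 >> 8


def dither_to_16(sample_24, rng):
    """Add triangular (TPDF) noise, then round to the nearest 16-bit step."""
    step = 256  # one 16-bit step measured in 24-bit units
    # Difference of two uniform values gives a triangular spread of +/- 1 step.
    noise = (rng.random() - rng.random()) * step
    value = math.floor((sample_24 + noise) / step + 0.5)
    return max(-32768, min(32767, value))  # stay inside the 16-bit range


def average_output(sample_24, trials=20_000, seed=0):
    """Average many dithered conversions of the same 24-bit value."""
    rng = random.Random(seed)
    return sum(dither_to_16(sample_24, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    quiet = 64  # a quarter of one 16-bit step: too small for 16 bits alone
    print("truncated:", truncate_to_16(quiet), truncate_to_16(-quiet))
    print("dithered average:", average_output(quiet))
```

Truncation turns the tiny positive value into 0 and the tiny negative value into -1, so a very quiet sine wave would come out as a square wave. With dither, each individual result is noisy, but on average it lands close to the true value of a quarter of a step, so the quiet detail survives underneath the hiss.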

Jitter

Jitter is the timing variation in the sample rate clock of the digital process. It would be wonderful to believe that a sample rate of 44.1 kHz is an exact science, whereby the process samples at exactly 44,100 samples per second. Unfortunately, this isn’t always the case. The speed at which this process takes place usually falters and varies and we get the ‘wobbling’ of the clock trying to keep up with the speeds of this process at these frequencies. This is called jitter. Jitter can cause all sorts of problems, and the simplest way to put it is: the lower the jitter, the better the audio representation. This is why we sometimes use better clocks and slave our sound cards to these clocks, to eradicate or diminish ‘jitter’ and the effects caused by it. I will not go into a deep explanation of this as, again, we will come to it later in these tutorials.
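For a feel of how much jitter matters, the sketch below uses a standard textbook approximation for a full-scale sine wave sampled by a clock with random timing error: the best signal to noise ratio you can hope for is -20 x log10(2 x pi x frequency x jitter). Treat it as a rough guide under that assumption, not a measurement of any real device. The function name is mine.

```python
import math


def jitter_limited_snr_db(signal_hz, jitter_rms_seconds):
    """Best-case S/N for a full-scale sine sampled with random clock jitter."""
    return -20.0 * math.log10(2.0 * math.pi * signal_hz * jitter_rms_seconds)


if __name__ == "__main__":
    for jitter in (1e-8, 1e-9, 1e-10):
        snr = jitter_limited_snr_db(10_000, jitter)
        print(f"{jitter:.0e} s of jitter at 10 kHz: about {snr:.1f} dB")
```

Every time the jitter gets ten times smaller, the ceiling rises by 20 dB. High frequencies suffer most, because the signal is changing fastest when the clock mistimes the sample.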

So, to conclude:

For us to sample we need to take an analogue signal (the audio being sampled), filter and convert it into digital information, process it then convert it back into analogue, then filter it and output it.

Signal to noise ratio (S/N) is the level difference between the signal level and the noise floor. The best way to describe this is by using an example that always works for me. Imagine you are singing with just a drummer. You are the signal and the drummer is the noise (ha.ha). The louder you sing or the quieter the drummer plays, the greater the signal to noise ratio. This is actually very important in all areas of sound technology and music. It is also very relevant when we talk about bit depth and dynamic range.

Imagine using 24 bits. That would allow a dynamic range of 144dB (generally, 6dB is allocated for each bit).
Bearing in mind we have a limit of 120dB hearing range (theoretical), the audio signal would be so much greater than the noise floor that it would be almost noiseless.
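The 6dB per bit figure comes from the fact that every extra bit doubles the number of quantising levels, and a doubling is about 6.02dB. A quick sketch of the sum (the function name is mine, and this is the theoretical figure only, not what a real converter achieves):

```python
import math


def theoretical_range_db(bits):
    """Theoretical dynamic range of a linear system with the given bit depth."""
    return 20.0 * math.log10(2 ** bits)


if __name__ == "__main__":
    for bits in (8, 16, 24):
        print(f"{bits} bits: about {theoretical_range_db(bits):.1f} dB")
```

For 16 bits this gives roughly 96dB and for 24 bits roughly 144dB, which is where the figure above comes from.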

People still find it confusing to distinguish between signal-to-noise ratio and dynamic range, particularly when dealing with the digital domain.

The signal-to-noise ratio is the RMS (Root Mean Square) level of the noise with no signal applied (expressed in dB below maximum level). Dynamic range is defined as the ratio of the loudest signal to that of the quietest signal in a digital system (again expressed in decibels (dB)).

In a typical professional analog system, the noise floor will be at about -100dBu. The nominal level is +4dBu, and clipping is typically at about +24dBu. That basically equates to about 20dB of headroom and a total dynamic range of about 120+dB. Clipping in an analog system equates (when used in small stages) to harmonic distortion. This is why ‘driving’ the headroom ceiling would sometimes make the audio sound more pleasing.
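The headroom and dynamic range figures above are just subtractions between the three levels. A tiny sketch, using the levels quoted in the paragraph (the function names are mine):

```python
def headroom_db(clip_level_dbu, nominal_level_dbu):
    """Space between the nominal working level and clipping."""
    return clip_level_dbu - nominal_level_dbu


def total_range_db(clip_level_dbu, noise_floor_dbu):
    """Space between the noise floor and clipping."""
    return clip_level_dbu - noise_floor_dbu


if __name__ == "__main__":
    print("headroom:", headroom_db(24, 4), "dB")
    print("dynamic range:", total_range_db(24, -100), "dB")
```

With clipping at +24dBu, a nominal level of +4dBu and a noise floor of -100dBu, this gives 20dB of headroom and 124dB of total range, hence the "120+dB".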

Digital systems operate in finite and critical terms, and ‘driving’ the ceiling cannot be done. As digital works off a linear system, once the top of the quantising scale is reached clipping takes place (inharmonic distortion).

Luckily, converter technology has improved so much that we now have 24 bit delta-sigma converters offering 120dB of dynamic range, similar to what we had/have in analog consoles. And by using the same methodology, by leaving ample headroom, we are able to have a great dynamic range and a strong S/N offering a negligible noise floor.

In practice, this equates to the following:
Working with a nominal level of -18dBFS (EBU) or -20dBFS (SMPTE/AES), we can attain approximately 20dB of headroom whilst keeping the noise floor about 100dB below the nominal level.

Digital systems cannot record audio of greater amplitude than the maximum quantising level (please read my tutorial on the Digital Process). The digital signal reference point is at the top of the digital meter scale: 0dBFS, FS standing for ‘full scale’.

In the US, the adopted standard for setting the nominal analog level is that +4dBu equals -20dBFS, thereby tolerating peaks of up to +24dBu. In Europe, 0dBu equals -18dBFS, thereby tolerating peaks of up to +18dBu.
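Converting between the analog scale and the digital scale is just a fixed offset once you know the alignment. The sketch below is only an illustration with names of my own choosing: you give it one known pair of matching levels (the example uses the European alignment of 0dBu equals -18dBFS from above) and it works out the rest.

```python
def dbu_to_dbfs(level_dbu, ref_dbu, ref_dbfs):
    """Convert an analog level to dBFS, given one known matching pair."""
    return level_dbu - ref_dbu + ref_dbfs


def max_peak_dbu(ref_dbu, ref_dbfs):
    """Analog level that lands exactly on 0dBFS for a given alignment."""
    return ref_dbu - ref_dbfs


if __name__ == "__main__":
    # European alignment: 0dBu lines up with -18dBFS.
    print("largest peak:", max_peak_dbu(0, -18), "dBu")
    print("+4dBu reads:", dbu_to_dbfs(4, 0, -18), "dBFS")
    print("+18dBu reads:", dbu_to_dbfs(18, 0, -18), "dBFS")
```

With that alignment, a +18dBu peak sits exactly at 0dBFS and anything hotter clips. Change the reference pair to match your own converter's alignment.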

This all sounds complicated but all you really need to be concerned with, as far as the digital world is concerned, is that we have a peak meter scale that tops out at 0dBFS. Go beyond this and you have clipping and distortion.
