Most sampling enthusiasts sample a beat, a piece of audio or a riff and leave it at that. Your sampler is so much more than that, and offers a wealth of tools that you may not even know exist, as they are kept so quiet, away from the ‘in your face’ tools.

This tutorial aims to open your eyes to what you can actually achieve with a sampler, and how to utilise what you sample.

This final tutorial is the real fun finale. I will be nudging you to sample everything you can and try to show you what you can then do to the sample to make it usable in your music.

First off, let us look at the method.

Most people have a nightmare when it comes to multi-sampling. The one obstacle everyone seems to be faced with is how to attain the exact volume, length of note (duration) and how many notes to sample.

The easy method to solve these questions in one hit is to create a sequence template in your sequencer. This entails having a series of notes drawn into the piano roll or grid edit of your sequencer. You can actually assign each and every note to be played at a velocity of 127 (maximum volume), have each note the exact same length (duration) and you can have the sequencer play each and every note or any number of notes you want. The beauty of this method is that you will always be triggering samples that are at the same level and duration. This makes the task of looping and sample placing much easier. You can save this sequence and call it up every time you want to sample.
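As a rough sketch of what such a template holds, here it is written out as plain data. The function name, the note range, the three-semitone spacing and the gap between notes are all illustrative assumptions, as is the convention that MIDI note 60 is middle C. Your sequencer will have its own way of storing this.

```python
def make_template(low_note=36, high_note=84, step=3, velocity=127,
                  note_beats=4.0, gap_beats=1.0):
    """Build a list of identical-length, identical-velocity note events.

    Every name and default value here is an assumption for illustration,
    not a setting taken from any particular sequencer.
    """
    events = []
    start = 0.0
    for note in range(low_note, high_note + 1, step):
        events.append({
            "note": note,            # MIDI note number
            "velocity": velocity,    # 127 is the MIDI maximum
            "start": start,          # position in beats
            "length": note_beats,    # every note has the same duration
        })
        # Leave a gap so each note can die away before the next one starts.
        start += note_beats + gap_beats
    return events


template = make_template()
print(len(template), "notes, first:", template[0])
```

Because every event shares the same velocity and length, every sample you record from it starts life at the same level and duration.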

Of course, this only works if you have a sequencer and if you are multi-sampling. For sampling the source directly, as in the case of a synth keyboard, it is extremely useful.

Creative Sampling

The first weapon in creative sampling is the ‘change pitch’ tool. Changing the pitch of a sample is not just about slowing down a Drum and Bass loop until it becomes a Hip Hop loop, a little tip there that some people are unaware of. It is about taking a normal sound, sampling it then pitching it right down, or up, to achieve a specific effect.

Let us take a little trip down the ‘pitch lane’.

You can achieve the pitch down effect by using the change pitch tool in your sampler, assigning the sample to C4 then using the C1 note as the pitched-down note, or time stretch/compress to maintain the pitch but slow or speed the sample. There is a crucial distinction here. Slowing down a sample has a dramatic effect on the pitch and works great for slowing fast tempo beats down to achieve a slower beat, but there comes a point where the audio quality starts to suffer and you have to be aware of this when slowing a sample down. The same is true for speeding a sample up. Speed up a vocal sample and you end up with squeaky vocals.

Time stretching/compressing is a function that allows the length of a sample to be changed without affecting the original pitch. This is great for vocals. Vocals sung for a track at 90 BPM can then be used in a track at 120 BPM without having to change the pitch. Of course, this function is only as good as the software or hardware driving it. The better the stretching/compressing software/hardware is, the better the result. Too much stretching/compressing can lead to side effects, and in some cases, that is exactly what is required. A flanging type of robotic effect can be achieved with extreme stretching/compressing, very funky.
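The arithmetic behind both approaches can be sketched in a few lines. The function names are my own, and the sketch assumes the usual equal-tempered tuning where each semitone changes playback speed by a factor of the twelfth root of two.

```python
def speed_ratio(semitones):
    """Playback speed ratio for a plain (non-stretched) pitch change."""
    return 2.0 ** (semitones / 12.0)


def stretch_factor(source_bpm, target_bpm):
    """Factor by which a sample's length must change to fit a new tempo."""
    return source_bpm / target_bpm


# C4 down to C1 is three octaves, which is 36 semitones down.
ratio = speed_ratio(-36)
print(ratio)            # 0.125: the sample plays at one eighth of its speed
print(2.0 / ratio)      # 16.0: a 2 second sample now lasts 16 seconds

# A vocal recorded at 90 BPM, time-compressed to sit in a 120 BPM track.
print(stretch_factor(90, 120))   # 0.75: three quarters of the original length
```

The first function is why pitching a long way down makes a sample so much longer. The second is the length change a time stretch has to deliver while leaving the pitch alone.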

A crucial step to bear in mind, and always perform, is that when you pitch a sample down, you then need to adjust the sample start time. Actually, this is a secret weapon that programmers and sound designers use to find the exact start point of a sample. They pitch the sample right down and this makes it much easier to locate the start point. You will often find that a sample pitched down a lot will need to have the start time cropped, as there will be dead air present. This is normal, so don’t let it worry you. Simply check your sample start times every time you perform a pitch down.

Here are a few funky things to sample.

Crunching, flicking, hitting paper

Slowly crunch a piece of paper, preferably a thicker crispier type of paper, and then sample it. Once you have sampled it, slow it right down and listen to the sample. It will sound like thunderclaps. If you are really clever you can listen to the sample as you slow it down, in stages, until you hear what sounds like a scratch effect, before it starts to sound like thunderclaps. SCSI dump the samples into your computer, use Recycle or similar, and dump the end result back into your sampler as chopped segments of the original sample (please read ‘chopping samples’ and ‘Recycle tutorial’).

Big sheets of paper being shaken or flicked from behind can be turned into thunderous noises by pitching down, turning up and routing through big reverbs.

Spoon on glass

There are two funky ways to do this. The first is with the glass empty. Use an empty glass, preferably a wine glass, and gently hit it with a spoon. Hit different areas of the glass as this generates different tones. You can then slow these samples down till you have bell sounds, or keep them as they are and add reverb and eq to give tine type of sounds.

The second way of doing this is to add water to the glass. This will deaden the sound and the sample will sound a lot more percussive. These samples make for great effects.

Lighting a match

Very cool. Light a match, sample it and slow it down. You will get a burst effect or, being clever, use the attack of the match being lit sample and you will get a great snare sound, dirty and snappy.

Tennis ball against wood

Man, this is a very cool one. Pitch these samples down for kick and tom effects. You can get some really heavy kicks out of this sample. Actually, a ball hitting any woody type of surface makes for great percussive sounds.

Finger clicking

Trim the tail off the sample and use the attack and body of the sample. You now have a stick or snare sound. Pitch it down and you will have a deep tom burst type of effect. Or, use the sample of the finger click, cut it into two segments, the first being the attack and the body, the second being the tail end. Layer them together and you have a snare with a reverse type of effect.

Hitting a radiator with a knife

Great for percussive sounds. Pitched down, you get percussive bells, as opposed to bells with long sustain and releases. Also, if you only take the attack of this sample, you will have a great snare sound.

Kitchen utensils

These are the foundation for your industrial sounds. Use everything. First, drop them all on a hard surface, together. Sample that and slow it down a bit and you will have factory types of sounds. Second, drop each utensil on a hard surface and sample them individually. They make for great bell and percussive sounds. Scrape them together and sample them. Slowed down, they will give you great eerie industrial sounds and film sound effects. Metallic sounds, once pitched down, give more interesting undertones, so experiment.

Hitting a mattress with a piece of wood

This will give a deep muffled sound that has a strong attack. This makes for a great kick or snare. Slowed right down, you will achieve the Trancey type of deep kick.

Blowing into bottles

This gives a nice flute type of sound. Pitched down, you will get a type of foghorn sound. Blow into it short and hard, use the attack and body, and you will achieve a crazy deep effect when pitched down.

Slamming doors

Slam away and sample. Thunderous sounds when pitched down. The attacks of the samples make for some great kicks and snares.

Aerosol cans

Great for wind and hi-hats. Slowed down, you will achieve wind type sounds. Pitched up, you get cabasa type of sounds. Run through an effect and pitched higher, you will achieve a hi-hat type of sound.

Golf ball being thrown at a wall

A snare sample that is great in every respect. Kept as is, you get a cool snare. Pitched up and you get a snappier snare. Pitched down, you get a deep tom, kick or ethnic drum sound.


Sample toys, preferably the mechanical and robotic ones. The number of sample variations you will get will be staggering. These mechanical samples, once pitched down, make for great industrial sounds. Pitched up, they can make some great Star Wars type of sounds. Simply chopped up as they are, they make for great hits, slams and so on.

Factories and railway stations

Take your recorder and sample these types of locations. It is quite amazing what you will find and once manipulated, the samples can be so inspiring.

Toilets, sinks, and bathtubs

Such fun. Water coming out of a tap pitched down can be white water. Water dripping can be used in so many ways. Splashing sounds can be amazing when pitched up or down. Dropping the soap in a full bath and hitting the sidewalls of the bathtub when empty or even full, can create some of the best percussive sounds imaginable.


Sample your radio, assuming it has a dial. The sounds of searching for stations can give you an arsenal of crazy sounds. Pitched down you will get factory drones, swirling electric effects and weird electro tom sounds. The sound palette is endless.

I think you get the picture by now. Sample everything and store it away. Create a library of your samples. Categorise them, so that they are easy to locate in the future.

Now let us look at what you can do to samples to make them interesting.

Reverse is the most obvious and potent tool. Take a piece of conversation between a man and a woman, sample it and reverse it and, hey presto, you have the Exorcist.

Layer a drum loop with the reversed version of the loop and check it out. Cool.

Pitch the reversed segment down a semitone or two to create a pseudo doppler effect.

With stereo samples of ambient or melodic sounds, try reversing one channel for a more unusual stereo image. You can also play around with panning here, alternating and cross-fading one for the other.

Try sampling at the lowest bandwidth your sampler offers for that crunchy, filthy loop. This is lo-fi land. Saves you buying an SP1200..he..he.

Try deliberately sampling at too low a level, then using the normalising function repeatedly to pump the volume back up again. This will add so much noise and rubbish to your sample that it will become dirty in a funky way.

You can take a drum loop and normalise it continually till it clips heavily. Now Recycle the segments, dump them back into your sampler, and you have dirty, filthy, crispy Hip Hop cuts.

A sample doubles its speed when it’s transposed up an octave. So try triggering two versions of a sampled loop an octave apart, at the same time. With a percussive loop, you’ll get a percussion loop running over the top of the original.

Use effects on a loop, record it to cassette for that hissy flavour, then, resample it. Recycle the whole lot and drop the segments back into your sampler and you have instant effects that you can play in any order.

Layer and cross-fade pad samples so that one evolves/morphs into another.

Take a loop and reverse it. Add the reversed loop at the end of the original loop for some weirdness.
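If you think of a sample as nothing more than a list of numbers, the reverse tricks above become very simple operations. This is only a sketch: the function names are mine, and the samples are assumed to be floats between -1.0 and 1.0.

```python
def reverse(samples):
    """Return the samples in reverse order."""
    return samples[::-1]


def forward_then_reverse(samples):
    """The original loop followed by its reversed copy."""
    return samples + reverse(samples)


def layer(a, b):
    """Mix two equal-length sample lists at half level each to avoid clipping."""
    return [0.5 * (x + y) for x, y in zip(a, b)]


loop = [0.0, 0.5, 1.0, 0.25]
print(forward_then_reverse(loop))
print(layer(loop, reverse(loop)))
```

Your sampler does the same thing internally, it just hides the numbers from you.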

Multi triggering a loop at close intervals will give you a chorus or flange type of effect. Try it. Have the same loop on 3 notes of your keyboard and hit each note a split second after the other. There you go.

I could go on for pages but will leave you to explore and enjoy the endless possibilities of sampling and sound design.

Additional content:

Preparing and Optimising Audio for Mixing

Normalisation – What it is and how to use it

Topping and Tailing Ripped Beats – Truncating and Normalising

A noise gate does exactly what it sounds like.

It acts as a gate that opens when a threshold is reached and then closes depending on how fast a release you set, basically acting as an on-off switch.
It reduces gain when the input level falls below the set threshold. That is, when an instrument or audio stops playing, or reaches a gap where the level drops, the noise gate kicks in and reduces the volume.

Generally speaking, noise gates will have the following controls:

Threshold: the gate will ‘open’ once the threshold has been reached. The threshold will have varying ranges (eg: -60dB to infinity) and is represented in dB (decibels). Once the threshold has been set, the gate will open the instant the threshold is reached.

Attack: this determines the speed of the gate kicking in, much like a compressor’s attack, and is usually measured in ms (milliseconds) or fractions of a millisecond. This is a useful feature as the speed of the gate’s attack can completely change the tonal colour of a sound once gated.

Hold: this function allows the gate to stay open (or hold) for the specified duration, and is measured in ms and seconds. Very useful particularly when passages of audio need to be ‘let through’.

Decay or release: this function determines how quickly the gate closes and whether it is instant or gradual over time. Crucial feature as not all sounds have an abrupt end (think pads etc).

Side Chaining (Key Input): Some gates (in fact most) will also have a side-chain function that allows an external audio signal to control the gate’s settings.

When the side-chained signal exceeds the threshold, a control signal is generated to open the gate at a rate set by the attack control. When the signal falls below the threshold, the gate closes according to the setting of the hold and release controls. Clever uses for key input (side-chaining) are ducking and repeat gated effects used in Dance genres. The repeated gate effect (or stuttering) is attained by key inputting a hi-hat pattern to trigger the gate to open and close. By using a pad sound and the hi-hat key input pattern you are able to achieve the famous stuttering effect used so much in Dance music.

Ducking: Some gates will include a ‘Ducking’ mode whereby one signal will drop in level when another one starts or is playing. The signal that needs ducking is fed to the gate’s main input, the controlling signal is sent to the key input (side-chain), and the gate’s attack and release times set the rate at which the levels change in response to the key input signal. A popular use for ducking is in the broadcasting industry whereby the DJ needs the music to go quiet so he/she can be heard when speaking (once the voice is used at the key input and triggers the gate, the music will drop in volume).

However, side-chaining (key input) and ducking are not all the gate is good for.

The most common use for a gate, certainly in the old days of analog consoles and tape machines, was to remove ‘noise’. By selecting a threshold just above the noise level, the gate would open to allow audio above the threshold through and then close again once the level dropped back below it. This meant that the low-level noise was ‘gated’ out of the gaps in the audio passage, leaving it cleaner.
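To make the controls less abstract, here is a minimal sketch of a gate working on a list of sample values. It is a simplification under stated assumptions: the level is judged one sample at a time rather than with a proper envelope follower, the attack, hold and release times are given in samples rather than milliseconds, and all the names are my own.

```python
def noise_gate(samples, threshold_db=-40.0, attack=0, hold=0, release=0):
    """Very simple sample-by-sample gate (illustrative, not a real plug-in).

    samples      : floats in the range -1.0..1.0
    threshold_db : level in dB relative to full scale (1.0)
    attack       : samples taken to fade fully open (0 = instant)
    hold         : samples the gate stays open after the level drops
    release      : samples taken to fade fully closed (0 = instant)
    """
    threshold = 10.0 ** (threshold_db / 20.0)   # dB to linear amplitude
    up = 1.0 / attack if attack > 0 else 1.0
    down = 1.0 / release if release > 0 else 1.0
    gain = 0.0          # 0.0 = gate closed, 1.0 = gate open
    hold_left = 0
    out = []
    for x in samples:
        if abs(x) >= threshold:
            hold_left = hold                  # re-arm the hold timer
            gain = min(1.0, gain + up)        # open at the attack rate
        elif hold_left > 0:
            hold_left -= 1                    # stay as open as we are
        else:
            gain = max(0.0, gain - down)      # close at the release rate
        out.append(x * gain)
    return out


noisy = [0.001, 0.5, 0.5, 0.001, 0.001]   # hiss, two loud samples, hiss
print(noise_gate(noisy))                  # [0.0, 0.5, 0.5, 0.0, 0.0]
print(noise_gate(noisy, hold=1))          # [0.0, 0.5, 0.5, 0.001, 0.0]
```

With a threshold of -40dB the 0.001 hiss never opens the gate, and the hold setting decides how much of the tail is let through before it shuts.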

BUT it doesn’t end there. There are so many uses for a noise gate: using an EQ unit as the key input for shaping audio lines and curing false triggers, ducking in commentary situations (still used today), creative sonic mangling tasks (much like the repeat gate) and so on.

With today’s software-based gates we are afforded a ton of new and interesting features that make the gate more than a simple ‘noise’ gate.

Experiment and enjoy chaining effects and dynamics in series and make sure to throw a gate in there somewhere for some manic textures.

If you prefer the visual approach then try this video tutorial:

Noise Gate – What is it and how does it work

Normalisation is a digital signal processing function that’s available in a lot of digital audio editing software. It scans through the program material for the highest level (Peak value), and if that level doesn’t reach the maximum available dynamic range, the software boosts the overall signal so that the Peak hits the highest level possible. For example, suppose you record a track of music and the highest peak registers at 6dB below the maximum available headroom (in this case 0). Normalisation (to 0 ceiling) brings the entire track up by 6dB. (Incidentally, most normalisation functions allow normalising to some percentage of the maximum available level; it needn’t always be 100 %.) There are a couple of problems though:

• Because normalisation boosts the entire signal, the noise floor comes up as well.

• Excessive use of amplitude-changing audio processes such as normalisation on linear, non-floating-point digital systems can cause so-called ‘round-off errors’ that, if allowed to accumulate, impart a ‘fuzzy’ quality to your sound. If you’re going to normalise, it should be the very last process: don’t normalise, then add EQ, then change the overall level, and then re-normalise, for example.

If you need to normalise then think carefully about whether you will use Peak or RMS (average level).

RMS (Root Mean Square) is an averaging process. The selected audio waveform is analysed: every sample value is squared, the squares are averaged, and the square root of that average is taken. The result is an average level for the whole selection, which then acts as the anchor for the normalising value. In other words, the signal is processed using its average level rather than its single highest peak.

Peak. The selected audio is analysed and the highest Peak Value acts as the new anchor reference. All processing works from this Peak value and when normalising this Peak value is used to raise the gain to the desired limit.

I tend to find that RMS (Root Mean Square) works best on long audio files that have varying peaks and troughs. Peak tends to work well on single-shot samples, much like drum hits. Using RMS Normalisation you will often find that the processed audio will be thicker and heavier in sound whereas Peak will retain the original envelope, only louder.
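Here is a small sketch of the two anchors and the gain change that follows from them. RMS here means the textbook root mean square of the sample values, the function names are my own, and the samples are assumed to be floats where 1.0 is the maximum level.

```python
import math


def peak_level(samples):
    """Highest absolute sample value in the selection."""
    return max(abs(x) for x in samples)


def rms_level(samples):
    """Root mean square: square every sample, average, take the square root."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def normalise(samples, target=1.0, mode="peak"):
    """Apply one fixed gain so the chosen level lands on the target."""
    level = peak_level(samples) if mode == "peak" else rms_level(samples)
    if level == 0.0:
        return list(samples)          # silence: nothing to scale
    gain = target / level
    return [x * gain for x in samples]


take = [0.25, -0.5, 0.1]              # highest peak is about 6dB below maximum
print(normalise(take))                # every sample is doubled
print(20.0 * math.log10(1.0 / peak_level(take)))   # gain applied, in dB
```

Note that the whole file gets the same gain, which is why the noise floor comes up with it. Also be aware that RMS normalising to a high target can push the peaks over the maximum, so this sketch would need a limiter or a lower target in real use.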

If you prefer the visual approach then give this video tutorial a try:

Normalisation – what it is and how to use it

In essence, noise is a randomly changing, chaotic signal, containing an endless number of sine waves of all possible frequencies with different amplitudes. However, randomness will always have specific statistical properties. These will give the noise its specific character or timbre.

If the sine waves’ amplitude is uniform, which means every frequency has the same volume, the noise sounds very bright. This type of noise is called white noise.

White noise is a signal with the property of having constant energy per Hz bandwidth (an amplitude-frequency distribution of 1) and so has a flat frequency response and because of these properties, white noise is well suited to test audio equipment. The human hearing system’s frequency response is not linear but logarithmic. In other words, we judge pitch increases by octaves, not by equal increments of frequency; each successive octave spans twice as many Hertz as the previous one down the scale. And this means that when we listen to white noise, it appears to us to increase in level by 3dB per octave.

If the level of the sine waves decreases with a curve of about -3 dB per octave when their frequencies rise, the noise sounds much warmer. This is called pink noise.

Pink noise contains equal energy per octave (or per 1/3 octave). Its power density follows the function 1/f, which corresponds to the level falling by 3dB per octave. These attributes lend themselves perfectly to acoustic measurements.

If it decreases with a curve of about -6 dB per octave we call it brown noise.

Brown noise, whose name is actually derived from Brownian motion, is similar to pink noise except that the power density follows 1/(f squared). This produces a 6dB-per-octave attenuation.

Blue noise is essentially the inverse of pink noise, with its level increasing by 3dB per octave (the power density is proportional to f).

Violet noise is the inverse of brown noise, with a rising response of 6dB per octave (the power density is proportional to f squared).
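The 3dB and 6dB per octave figures fall straight out of the maths. As a quick check, assuming each colour is described by its power density being proportional to the frequency raised to some exponent (the dictionary and function names are my own):

```python
import math

# Exponent a in "power density is proportional to f**a" for each colour.
NOISE_EXPONENTS = {"white": 0, "pink": -1, "brown": -2, "blue": 1, "violet": 2}


def db_per_octave(exponent):
    """Change in power density, in dB, each time the frequency doubles."""
    return 10.0 * math.log10(2.0 ** exponent)


for name, exponent in NOISE_EXPONENTS.items():
    print(f"{name:>6}: {db_per_octave(exponent):+.1f} dB per octave")
```

An octave is a doubling of frequency, and a doubling or halving of power is roughly 3dB, so pink comes out at about -3dB, brown at about -6dB, and blue and violet are their mirror images.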

So we have all these funky names for noise, and you do need to understand their characteristics, but what are they used for?

White noise is used in the synthesizing of hi-hats, crashes, cymbals etc, and is even used to test certain generators.

Pink noise is great for synthesizing ocean waves and the warmer type of ethereal pads.

Brown noise is cool for synthesizing thunderous sounds and deep and bursting claps. Of course, they can all be used in varying ways for attaining different textures and results, but the idea is simply for you to get an idea of what they ‘sound’ like.

At the end of the day, it all boils down to maths and physics.


Here is an article I wrote for Sound On Sound magazine on how to use Pink noise referencing for mixing.

And here is the link to the video I created on master bus mixing with Pink noise.

And here is another video tutorial on how to use ripped profiles and Pink noise to mix.

Jitter is the timing variation in the sample rate clock of the digital process. It would be wonderful to believe that a sample rate of 44.1 kHz is an exact science, whereby the process samples at exactly 44,100 cycles per second. Unfortunately, this isn’t always the case. The speed at which this process takes place usually falters and varies and we get the ‘wobbling’ of the clock trying to keep up with the speeds of this process at these frequencies. This is called jitter. Jitter can cause all sorts of problems and it is best explained, for you, as: the lower the jitter the better the audio representation. This is sometimes why we use better clocks and slave our sound cards to these clocks, to eradicate or diminish ‘jitter’ and the effects caused by it.

Jitter is a variation in the timing of the sampling instants (time-based) when the audio is converted to or from the digital domain. If the conversion process suffers from any time anomaly then the resulting signal amplitude will differ from its true value. Usual side effects are an increase in high-frequency noise, clicks and, in the worst-case scenario, muted audio or a device that does not work at all. In simple terms, the clicks are caused when one of the digital devices searches for an incoming audio ‘sample’ but fails to find it as it is looking at the wrong time ‘frame’ (instance). Apart from these ‘anomalies’, the real-world audio effect is that the stereo imaging is compromised leading to a flat stereo image as opposed to one with depth and width.
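To get a feel for how a timing error becomes an amplitude error, here is a small sketch. It assumes a full-level sine wave sampled slightly late at its steepest point (the zero crossing), and the 10 kHz tone and one nanosecond error are arbitrary numbers picked for illustration.

```python
import math


def jitter_error(freq_hz, timing_error_s, amplitude=1.0):
    """Amplitude error when a sine is sampled late at its zero crossing."""
    true_value = amplitude * math.sin(0.0)
    late_value = amplitude * math.sin(2.0 * math.pi * freq_hz * timing_error_s)
    return abs(late_value - true_value)


def to_db(ratio):
    """Express a linear ratio relative to full level in dB."""
    return 20.0 * math.log10(ratio)


error = jitter_error(10_000, 1e-9)    # 10 kHz tone, clock 1 ns out
print(error)                          # error as a fraction of full level
print(to_db(error))                   # the same error expressed in dB
print(to_db(jitter_error(1_000, 1e-9)))   # lower frequency, smaller error
```

The same timing error hurts high frequencies more than low ones, because the waveform is changing faster, which ties in with the increase in high-frequency noise mentioned above.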

Jitter affects the stability of the sample clock. The lower the jitter figure, the more stable the clock and the better the performance.

When using more than one digital device it is best to interface and synchronize, using clock synchronization, between both the source and destination digital device.

Most of today’s digital systems will have an embedded clock at source that can then be used to synchronize the two devices. In more sophisticated systems like DAWs, digital consoles, higher-end sound cards and so on, there will be some form of control panel whereby desired clock sources can be selected. The most common selections available are digital input, external word clock, and the internal clock. The selection comes down to system configuration and project choice. However, what is a given is that all digital devices must be synchronized.

Using the internal clock ensures stability as the clock rate is known, but all devices must then be synchronized to the internal clock’s rate. The alternative, and a common choice amongst studios, is to use a dedicated external clock. This affords a universal and global rate that all devices can be synchronized to, and more importantly, a dedicated master clock has one function and that can often alleviate system configuration problems. The only problem that arises from this scenario is that most consumer systems do not accommodate slaving to external clocks, so the internal clock will have to be the master clock source.

At the end of the day, it comes down to knowledge and experience and ignoring the benefits of a good clock source in a digitally configured system is the equivalent of running top-end processors through a Radio Shack budget 2 channel DJ mixer.

When dealing with events, as we do for cycles as an example, we are concerned with two factors: Frequency (f) and Time (T). If we look at a single event then T is defined as the start to the end of that event and that amount is measured as a Period.
When dealing with a waveform cycle, the time it takes for the cycle to return to its starting position is defined as Periodicity. Taking this a step further, Frequency is then defined as the number of events that occur over a specified time, and this is illustrated with the following equation:

f = 1 / T

We measure the Period in seconds (s) and Frequency in cycles per second. The SI unit for one cycle per second is the Hertz (Hz). We tend to express anything above 1000 Hz in kHz (kilohertz), and if dealing with cycles that are shorter than one second we use ms (milliseconds: 1/1000th of a second). This is a huge advantage when it comes to measuring microphone distances from sources and trying to correct alignments.
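Since frequency and period are simply the inverse of each other, the conversions are one-liners. The microphone example assumes sound travels at roughly 343 metres per second in air at room temperature. That figure and the function names are my own assumptions for the sketch.

```python
SPEED_OF_SOUND = 343.0   # metres per second, assumed (air at about 20 C)


def period_s(freq_hz):
    """Time for one full cycle, in seconds."""
    return 1.0 / freq_hz


def frequency_hz(period):
    """Number of cycles per second for a given period in seconds."""
    return 1.0 / period


def delay_ms(distance_m):
    """Time sound takes to cover a distance, in milliseconds."""
    return distance_m / SPEED_OF_SOUND * 1000.0


print(period_s(1000) * 1000.0)   # one cycle of 1 kHz lasts about 1 ms
print(delay_ms(0.343))           # a mic 34.3 cm further away is about 1 ms late
```

Put those two numbers side by side and you can see why a small difference in microphone distance is enough to shift a signal by a full cycle at some frequencies.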

Relevant content:

Total and Partial Phase cancellation

Dither is used when you need to reduce the number of bits. The best example, and one that is commonly used, is when dithering down from 24 bits to 16 bits, or 16 bits down to 8 etc. Most commonly this takes place when a project you are working on needs to be bounced down from 24 bits to 16 bits using dithering algorithms.

So, what is the process; in mortal language of course?

A very basic explanation is that when we dither we add a small amount of random noise to the waveform to remove a nastier kind of noise. When we truncate the bits, ie, in this case, cut off the least significant bits, we are left with a more coarsely stepped waveform, and the error those steps introduce follows the signal and is heard as distortion. By adding noise before the bits are cut we break up that pattern, so the result behaves more like an evenly flowing waveform than a stepped one. It sounds crazy, but the noise we add results in the dithered waveform having less audible quantisation distortion. This waveform, with the noise, is then filtered at the output stage. I could go into this in a much deeper context using graphs and diagrams and talking about probability density functions (PDF) and resultant square waves and bias of quantisation towards one bit over another. But if I did that you’d probably hate me. All that matters is that dither is used when lowering the bit depth and that this is an algorithmic process, ie using a predetermined set of mathematical formulas.

If we take the 24 bit project scenario and select to bounce the resultant audio without dithering, then the last eight bits (also known as the Least Significant Bits) of every 24-bit sample are discarded. In terms of audio integrity, you will not only lose resolution but also introduce Quantisation Noise. Because dithering adds random noise to the lower eight bits of the 24 bit signal whilst maintaining stereo separation, the quantisation noise is dramatically reduced. It then makes sense to dither from 24 bits to 16 bits rather than bounce without it.
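A tiny experiment shows why the added noise helps. This is only a sketch under my own assumptions: samples are plain integers, the reduction rounds to the nearest 16-bit step rather than strictly truncating, and the dither is the common triangular (TPDF) type made by summing two uniform random values. Real dithering algorithms are more refined than this.

```python
import random


def reduce_bits(samples_24, dither=False, rng=None):
    """Reduce 24-bit integer samples to 16-bit integers (illustrative only)."""
    if rng is None:
        rng = random.Random()
    out = []
    for s in samples_24:
        scaled = s / 256.0               # 8 fewer bits means divide by 2**8
        if dither:
            # TPDF dither: two uniform values summed, spanning +/- one step.
            scaled += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
        value = int(round(scaled))
        out.append(max(-32768, min(32767, value)))   # clamp to 16-bit range
    return out


quiet = [77] * 20000    # a steady level of about 0.3 of one 16-bit step
plain = reduce_bits(quiet)
dithered = reduce_bits(quiet, dither=True, rng=random.Random(0))

print(sum(plain) / len(plain))         # 0.0: the quiet signal has vanished
print(sum(dithered) / len(dithered))   # close to 0.3: it survives in the noise
```

Without dither, anything smaller than half a step is simply thrown away. With dither, the individual values are noisy but on average they still carry the original level.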

How well the process is executed is down to how good the dithering algorithms are. But to be honest these algorithms are so good nowadays that even standard audio sequencing suites (Cubase, Logic etc) will perform dithering tasks without much problem.

My recommendation is to always work in 24 bit and dither down to 16 bit for the resultant file, as CD format is still 16 bits.

Relevant content:

Jitter in Digital Systems


In the old days, sampling consisted of recording the audio onto magnetic tape. The audio (analogue) was represented by the movement of the magnetic particles on the tape. In fact, a good example is cutting vinyl. This is actually sampling because you are recording the audio onto the actual acetate or disc by forming the grooves. So, the audio is a continuous waveform.

Whether we are using a hardware sampler, like the Akais, Rolands, Yamahas, Emus etc, or software samplers on our computers, like Kontakt, EXS24, NN-19 etc, there is a process that takes place between you recording the analogue waveform (audio) into the sampler and the way the sampler interprets the audio and stores it. This process is the conversion of the analogue signal (the audio you are recording) into a digital signal. For this to happen, we need what we call an analogue to digital converter (ADC). For the sampler to play back what you have recorded and for you to hear it, the process is reversed but with a slightly different structure, and for that to happen we need a digital to analogue converter (DAC). That is simple and makes complete sense. Between all of that, there are a few other things happening and with this diagram (fig1) you will at least see what I am talking about.


The sampler records and stores the audio as a stream of numbers, binary, 0s and 1s, on and off. As the audio (sound wave) is moving along, the ADC records ‘snapshots’ (samples) of the sound wave, much like the frames of a movie. These snapshots (samples) are then converted into numbers. Each one of these samples (snapshots) is expressed as a number of bits. This process is called quantising and must not be confused with the quantising we have on sequencers, although the process is similar.

The number of times a sample is taken or measured per second is called the sampling rate. The sampling rate is measured as a frequency and is expressed in kHz, where k = 1000 and Hz = cycles per second. These samples are measured at discrete intervals of time. The length of these intervals is governed by the Nyquist theorem. The theorem states that the sampling frequency must be greater than twice the highest frequency of the input signal in order to be able to reconstruct the original perfectly from the sampled version. Another way of explaining this is that the maximum frequency that can be recorded at a given sample rate is just under half the sample rate. A good example at this point would be the industry standard CD. 44.1 kHz means that the number of samples (snapshots) taken per second equates to 44,100.
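The half-the-sample-rate rule is easy to put into numbers. The sketch below also shows what happens to a pure tone that breaks the rule when nothing filters it out first: it folds back and is recorded as a lower frequency. The function names are my own.

```python
def nyquist_limit(sample_rate_hz):
    """Half the sample rate: tones must stay below this to be captured."""
    return sample_rate_hz / 2.0


def sample_interval_s(sample_rate_hz):
    """Time between two snapshots, in seconds."""
    return 1.0 / sample_rate_hz


def recorded_frequency(freq_hz, sample_rate_hz):
    """Frequency an unfiltered pure tone ends up at after sampling."""
    folded = freq_hz % sample_rate_hz
    if folded > sample_rate_hz / 2.0:
        folded = sample_rate_hz - folded    # fold back below the limit
    return folded


print(nyquist_limit(44100))                # 22050.0
print(recorded_frequency(1000, 44100))     # 1000: captured correctly
print(recorded_frequency(30000, 44100))    # 14100: too high, so it folds back
```

This is why converters filter out everything above the limit before the snapshots are taken.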

Ok, now let’s look at Bits. We have talked about the samples (snapshots) and the numbers. We know that these numbers are expressed as a number of bits. The number of bits in that number is crucial. This determines the dynamic range (the difference between the lowest value of the signal and the highest value of the signal) and most importantly, the signal to noise ratio (S/N). For this, you need to understand how we measure ‘loudness’. The level or loudness of a sound is measured in decibels (dB); this is the unit of measure of the replay strength (loudness) of an audio signal. Named after this dude Bell. The other measurement you might come across is dBu or dBv, which is the relationship between decibels and voltage: decibels referenced to 0.775 volts. You don’t even need to think about this but you do need to know that we measure loudness (level) or volume of a sound in decibels, dB.

Back to bits. The most important aspect of bits is resolution. Let me explain this in simpler terms. You often come across samplers that are 8 bit, Fairlight CMI or Emulator II, or 12 bit, Akai S950 or Emu SP1200, or 16 bit, Akai S1000 or Emulator III etc. You also come across sound cards that are 16 bit or 24 bit etc. The bit depth refers to how accurately a sound can be recorded and presented. The more bits you have (Resolution), the better the representation of the sound. I could go into the ‘electrical pressure measurement at an instant’ definition but that won’t help you at this early stage of this tutorial. So, I will give a little simple info about bit resolution.

There is a measurement that you can use, albeit not clear cut but at least it works for our purposes. For every bit, you get roughly 6dB of accurate representation. So, an 8 bit sampler will give you 48dB of dynamic range. Bearing in mind that we can, on average, hear up to 120dB, that figure of 48dB looks a bit poor. So, we invented 16 bit CD quality which gives us a 96dB dynamic range. Now we have 24 or even 32 bit sound cards and 24 bit samplers, which give us an even higher dynamic range. Even though we will never use that range, as our ears would implode, it is good to have. Why? Well, use the Ferrari analogy. You have a 160mph car there and even though you know you are not going to stretch it to that limit (I would), you do know that to get to 60mph it takes very little time and does not stress the car. The same analogy can be applied to monitors (speakers): the more dynamic range you have, the better the sound representation at lower levels.

To take this resolution issue a step further: 8 bits allow for 256 different levels of loudness in a sample, 16 bits allow for 65,536. So, now you can see that 16 bits gives a much better representation. The other way of looking at it is: if I gave you 10 colours to paint a painting (copy a Picasso) and then gave you 1000 colours to paint the same painting, which one would be better in terms of definition, colour, depth etc.? We have the same situation on computer screens and scanners and printers. The higher the resolution the clearer and better defined the images on your computer, or the better the quality of the scanned picture, or the better the resolution of the print. As you can see from Fig2, the lowest bit resolution is 1 and the highest is 4. The shape of the highest bit resolution is the closest in terms of representing the shape of the audio signal above. So the higher the bit resolution the better the representation. However, remember that because we are dealing with digital processing and not a continuous signal, there will always be steps in our signal in the digital domain.
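To put some numbers on the steps, here is a rough sketch that snaps one cycle of a sine wave to the nearest available level and reports the worst miss. The rounding scheme and the function names are assumptions for illustration; a real converter will differ in detail.

```python
import math


def quantise(value, bits):
    """Snap a value in the range -1.0..1.0 to the nearest of 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)            # spacing between neighbouring levels
    index = round((value + 1.0) / step)  # which level is closest
    return index * step - 1.0


def worst_error(bits, points=1000):
    """Largest gap between one cycle of a sine wave and its quantised copy."""
    worst = 0.0
    for n in range(points):
        x = math.sin(2 * math.pi * n / points)
        worst = max(worst, abs(x - quantise(x, bits)))
    return worst


for bits in (1, 2, 4, 8, 16):
    print(bits, "bits:", 2 ** bits, "levels, worst error", worst_error(bits))
```

The steps never disappear, but the worst error shrinks rapidly as bits are added, which is the painting with more colours idea in numbers.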


Now let’s look at the signal to noise ratio (S/N). This is the level difference between the signal level and noise floor. The best way to describe this is by using an example that always works for me. Imagine you are singing with just a drummer. You are the signal and the drummer is the noise (ha.ha). The louder you sing or the quieter the drummer plays the greater the signal to noise ratio. This is actually very important in all areas of sound technology and music. It is also very relevant when we talk about bit resolution and dynamic range. Imagine using 24 bits. That would allow a dynamic range of 144 dB. Bearing in mind we have a limit of 120 dB hearing range (theoretical) then the audio signal would be so much greater than the noise floor that it would be almost noiseless.

A good little example is when people re-sample their 16 bit drums at 8 bit. The drums become dirty and grungy. This is why the Emu SP1200, the drum sampler beatbox that gave us fat and dirty drum sounds, is still so highly prized. Lovely.

Now, let’s go back to sample rates. I dropped in a nice little theorem by Nyquist to cheer you up. I know, I know, I was a bit cold there but it is a tad relevant.

If the sampling rate is less than twice the highest frequency we are trying to record, it does not conform to the Nyquist rule, and we lose some of the cycles during conversion. Whereas the quantisation we mentioned earlier is related to the input voltage of the analogue waveform, for the sake of simplicity, it is important to bear in mind its relationship with bits and bit resolution. Remember that the ADC needs to quantise to 256 levels for an 8 bit system. These quantisations are shown as steps, the jagged shape you get on the waveform. The steps create noise, and the lost cycles create false frequencies, or alias. The process or cock-up is called aliasing. Check Fig3.


To be honest, that is a very scant figure but what it shows is that the analogue to digital conversion, when not following the Nyquist rule, leaves us with added noise or distortion because cycles will be omitted from conversion and the result is a waveform that doesn’t look too much like our original waveform that is being recorded.

To be even more honest, even at high sampling rates the processed signal will still be in steps, as we discussed earlier about quantisation and the way the digital process converts analogue to digital.

So how do we get past this problem of aliasing? Easy. We use anti-aliasing filters. On Fig1, you see that there are 2 filters, one before the ADC and one after the DAC. Without going back into the Nyquist dude’s issues, just accept the fact that we get a great deal of high-frequency content in the way of harmonics or aliasing with the sample rate processing, so we run a low pass filter that only lets in the lower frequencies and gets rid of the higher frequencies (above our hearing range) that came in on the signal. The filter is also anti-aliasing so it smoothes out the signal.

What is obvious is that if we are using lower sampling rates then we will need a filter with a steeper (more aggressive) slope. So, it makes sense to use higher sampling rates to reduce the steepness of the filter. Most manufacturers put an even higher sample rate at the output stage so the filter does not need to be so aggressive (please refer to upsampling further on in this tutorial). The other process that takes place is called interpolation. This is an error correction circuit that guesses the value of a missing bit by using the data that came before and after the missing bit. A bit crude. The output stage has now been improved with better DACs that are oversampling, and additionally a low order analogue filter just after the DAC at the output stage. The DAC incorporates the use of a low pass filter (anti imaging filter) at the output stage.

Now let’s have a look at an aggressive form of alias called foldover. Using Nyquist again: a sampling rate of 44.1 kHz can reproduce frequencies up to 22.05 kHz (half). If lower sampling rates are used that do not conform to the Nyquist rule, then we get more extreme forms of alias. Let us put that in simple terms and take a lower sampling rate. For the sake of this argument, let us halve the usual 44.1 kHz. So, we have a sampling rate of 22.05 kHz. We know, using Nyquist, that your sampler or sound card cannot sample frequencies above half of that, 11.025 kHz. Without the use of the filter that we have already discussed, the sampler or sound card would still try to record those higher frequencies (above 11.025 kHz) and the result would be terrible, as the frequencies would now be markedly different to the frequencies you were trying to record.
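Here is a small sketch of where a frequency lands when it folds over. It assumes an ideal sampler with no filter in front of it, and the function name is made up for this example.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency that ends up in the recording when no filter is used."""
    nyquist = sample_rate_hz / 2
    folded = freq_hz % sample_rate_hz      # the pattern repeats every sample rate
    if folded > nyquist:
        folded = sample_rate_hz - folded   # reflect back below the Nyquist limit
    return folded


print(alias_frequency(10000, 22050))  # 10000: below 11.025 kHz, recorded as is
print(alias_frequency(15000, 22050))  # 7050: folded over to the wrong frequency
```

So a 15 kHz tone sampled at 22.05 kHz does not vanish. It comes back as a 7.05 kHz tone that was never in the original sound, which is why it sounds so wrong.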

So, to solve this extreme form of alias, manufacturers decided to use a brick wall filter. This is a very severe form of the low pass filter and, as the name suggests, only allows frequencies up to a set point through; the rest it completely omits. However, it tries to compensate for this aggressive filtering by boosting the tail-end of the frequencies, set by the manufacturer, to allow it to completely remove the higher frequencies.

However, we have now come to a new improved form of DAC called upsampling.

An upsampling digital filter is simply a poor oversampling digital reconstruction filter having a slow roll-off rate. Nowadays, DAC manufacturers claim that these DACs improve the quality of sound and, when they are used instead of brick wall filters, the claim is genuine. Basically, at the DAC stage, the output is oversampled, usually 8 times. This creates higher frequencies than we had at the ADC stage, so to compensate and remove these very high frequencies, a low order analogue filter is added after the DAC and just before the output. So we could have an anti-aliasing filter at the input stage and an upsampling DAC with a low order analogue filter at the output stage. This technology is predominantly used in CD players and, of course, sound cards, and any device that incorporates DACs. I really don’t want to get into this topic too much as it really will ruin your day. At any rate, we will come back to this and the above at a later date when we examine digital audio in more detail. All I am trying to achieve in this introduction is to show you the process that takes place to convert an analogue signal into digital information and back to analogue at the output (so we can hear it: playback), and the components and processes used.

The clock. Digital audio devices have clocks that set the timing of the signals and are a series of pulses that run at the sampling rate. Right now you don’t need to worry too much about this as we will come to this later. Clocks can have a definite impact in the digital domain but are more to do with syncing than the actual digital processes that we are talking about in terms of sampling. They will influence certain aspects of the process but are not relevant in the context of this introduction. So we will tackle the debate on clocks later as it will become more apparent how important the role of a good quality clock is in the digital domain.


Dither is used when you need to reduce the number of bits. The best example, and one that is commonly used, is when dithering down from 24 bits to 16 bits, or 16 bits down to 8. A very basic explanation is that when we dither we add random noise to the waveform to remove the distortion caused by quantisation. We talked about quantisation earlier in this tutorial and the fact that we are always left with stepped waveforms in the digital process. When we truncate the bits (lowering the bit resolution), ie, in this case, cut off the least significant bits, those steps get coarser. By adding noise we create a more evenly flowing waveform instead of the stepped waveform. It sounds crazy, but the noise we add results in the dithered waveform sounding cleaner, because the harsh errors left by truncation are traded for a low, steady hiss. This waveform, with the noise, is then filtered at the output stage, as outlined earlier. I could go into this in a much deeper context using graphs and diagrams and talking about probability density functions (PDF) and resultant square waves and bias of quantisation towards one bit over another. But you don’t need to know that now. What you do need to know is that dither is used when lowering the bit resolution and that this is an algorithmic process, ie using a predetermined set of mathematical formulas.
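A tiny experiment shows why adding noise helps. This is a sketch only: it assumes triangular (TPDF) dither made by summing two uniform random numbers, which is a common choice but not necessarily what your software uses. The function name and the 0.3 test level are invented for the example.

```python
import random


def requantise(value, dither=False, rng=random):
    """Round a value, measured in steps of the new lower bit depth,
    to a whole step, optionally adding dither noise first."""
    if dither:
        # Two uniform random numbers summed give triangular (TPDF) noise
        # spanning plus or minus one step (an assumed, common choice).
        value += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    return round(value)


rng = random.Random(1)   # fixed seed so the run is repeatable
quiet_signal = 0.3       # a level smaller than one step
count = 20000

plain = [requantise(quiet_signal) for _ in range(count)]
dithered = [requantise(quiet_signal, dither=True, rng=rng) for _ in range(count)]

print(sum(plain) / count)     # 0.0: the quiet signal has vanished
print(sum(dithered) / count)  # close to 0.3: it survives, buried in noise
```

Without dither, anything smaller than one step is simply thrown away. With dither, the average of the output still follows the original level, so the detail is kept at the price of a little hiss.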


Jitter is the timing variation in the sample rate clock of the digital process. It would be wonderful to believe that a sample rate of 44.1 kHz is an exact science, whereby the process samples at exactly 44,100 cycles per second. Unfortunately, this isn’t always the case. The speed at which this process takes place usually falters and varies and we get the ‘wobbling’ of the clock trying to keep up with the speeds of this process at these frequencies. This is called jitter. Jitter can cause all sorts of problems and it is best explained, for you, as the lower the jitter the better the audio representation. This is sometimes why we use better clocks and slave our sound cards to these clocks, to eradicate or diminish ‘jitter’ and the effects caused by it. I will not go into a deep explanation of this as, again, we will come to it later in these tutorials.
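To get a feel for the size of the problem, here is a rough sketch that samples a full level sine wave at slightly wrong instants and reports the worst error. The jitter amounts, the 10 kHz test tone and the function name are all assumptions chosen for illustration, not measurements of any real clock.

```python
import math
import random


def worst_jitter_error(freq_hz, sample_rate_hz, jitter_seconds,
                       samples=2000, seed=1):
    """Largest error caused by sampling a full level sine at wobbly instants."""
    rng = random.Random(seed)
    worst = 0.0
    for n in range(samples):
        ideal_time = n / sample_rate_hz
        # The clock lands a little early or late on every sample.
        actual_time = ideal_time + rng.uniform(-jitter_seconds, jitter_seconds)
        ideal = math.sin(2 * math.pi * freq_hz * ideal_time)
        actual = math.sin(2 * math.pi * freq_hz * actual_time)
        worst = max(worst, abs(actual - ideal))
    return worst


for jitter in (1e-9, 10e-9, 100e-9):  # 1, 10 and 100 nanoseconds (assumed)
    print(jitter, worst_jitter_error(10000, 44100, jitter))
```

The error grows in step with the jitter, and it also grows with the frequency being sampled, which is why a steadier clock gives a better audio representation.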

So, to conclude:

For us to sample we need to take an analogue signal (the audio being sampled), filter and convert it into digital information, process it then convert it back into analogue, then filter it and output it.


Often I get asked the same question about which to get: active monitors, or passive monitors with a separate amplifier?

To answer this, I need to first explain the differences between the two.

It is commonly understood that active monitors simply have a built-in amp and therefore need no external amp to drive them, and that passive monitors need an external amp to drive them. Whereas this is true as far as the power is considered, it is a little more detailed than that when it comes to how each unit functions.

What we really need to look at is the crossover, which splits the signal into the appropriate frequency ranges before they’re sent to the individual drivers.

In passive designs, the monitor contains a set of passive components to split the input signal up into the various frequency bands required for each driver. The high-level input signal required to drive the speaker comes from an external power amplifier.

In active designs, the cabinet houses multiple power amplifiers, one connected to each driver, so each amp drives a driver. The frequency band splitting is performed on the line input signal directly prior to the amplifiers.

While we are on the subject, let’s not forget the ‘powered’ monitor. Normally, in active systems, there is an amp for each driver; in powered systems, there is usually only one amp powering both drivers via a normal passive crossover.

Each design has its advantages and disadvantages.

In the case of the passive design, you are afforded a great deal of flexibility as you can choose different amps to power them and this can sometimes be a great situation to be in as the better the amp, the better the output signal. A better amp will also deliver far more headroom than a weaker counterpart and the frequency representation can also be better, especially in the higher frequency spectrum. This ‘mixing and matching’ gives the user a lot of room to try various amps and to optimise the best monitor and amp combination. It is also cheaper to buy a passive system as build costs are much lower than an active system.

In the case of the active system, the crossover can be more detailed and accurate, thus providing a more precise ‘frequency splitting’. This design also incorporates better amp matching for the drivers and therefore affords a more stable and better protected system. However, a good active system can cost considerably more than the passive counterpart.

At the end of the day, it comes down to budget, studio requirements, and space.

A passive system and separate amp take up more space than their active counterpart, but the mixing and matching of amps to monitors are very appealing and much easier to integrate into an updating studio environment. By just changing the amp, you can change the ‘colour’ and performance of the passive monitors.

Active systems come into their own when the budget starts to creep up. A good active system can actually end up being cheaper than the passive + amp alternative and can deliver better results, or rather, more precise results.

In today’s market, in the mid to upper price ranges, active systems do offer some distinct advantages. We have talked about precision and detail of amps to drive the drivers, better crossovers etc, but we also need to think about driver protection circuitry. This is as important as the drivers and amps. You tend to find that this protection circuitry goes hand in hand with active designs. The shorter cable lengths within the cabinet, connecting amp to driver, also negate a lot of the problems caused by badly shielded cables and long cable runs.

At the budget end, things are not so rosy. Due to market competition, monitor manufacturers try to keep costs down as low as possible, and invariably compromises have to be made, and it’s usually the drivers and amps that give way.

The powered monitor will usually cost less than the active counterpart as it uses the one solitary amp to drive the drivers. So, it’s worth looking at these options before parting with your hard-earned paper.

If you consider the dynamic range of varying bit depths, 1 bit being roughly equivalent to 6dB of dynamic range, then it makes sense that the higher the bit depth the higher the dynamic range. With 24 bit depth, the dynamic range (theoretically) is 144dB. Bearing in mind that our hearing does not even come close to a 144dB range, it makes sense to use a dynamic range beyond our hearing’s dynamic range for the very simple reason that the range captured at this bit resolution will extend below our hearing’s minimum and above its maximum.

To accommodate internal processing within a digital system, a much higher headroom is required for the simple reason that processing will require additional bits. When two or more 24 bit numbers are added together, it is obvious that more bits are required to hold the result. Dynamic processing, by its very nature, requires higher bit counts as the process itself generates extra bits, or fractions of them, that need managing, otherwise there will be sonic compromises.
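The need for extra bits is easy to see with whole numbers. This sketch assumes signed 24 bit samples and counts bits as magnitude plus one sign bit. The helper name is made up for the example.

```python
MAX_24_BIT = 2 ** 23 - 1  # largest positive value of a signed 24 bit sample


def bits_needed(value):
    """Bits needed to hold a signed whole number (magnitude plus a sign bit)."""
    return abs(value).bit_length() + 1


print(bits_needed(MAX_24_BIT))               # 24
print(bits_needed(MAX_24_BIT + MAX_24_BIT))  # 25: summing two loud samples
print(bits_needed(MAX_24_BIT * MAX_24_BIT))  # 47: multiplying, as a gain stage might
```

Mixing two full level samples already overflows 24 bits, and multiplication grows the word length far faster, so the processing stage needs room to spare.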

The 32 bit system seems to handle these processes well and it has become a minimum standard. Of course, we now have higher bit internal processing.

Fixed-point systems use the 32 bits in the standard way and the maths is simply a scale that provides a dynamic range of 192dB (32×6). The usual procedure is to allow the 24 bit signal to work close to the top of the 32 bit processing. This makes complete sense as it provides a higher headroom and a lower noise floor.

Floating-point still uses the 32 bit system but arranges the bits in a different manner. The signal is still kept in 24 bit but the remaining bits are allocated to denote scaling factors. This basically means that the 24-bit can be used in a more flexible and dynamic manner allowing for a massive dynamic range. This equates to a never-ending scale of headroom and a noise floor that is so low as to be negligible.
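You can look inside a 32 bit float to see this split for yourself. The sketch below assumes the standard IEEE 754 single precision layout (1 sign bit, 8 exponent bits, 23 stored mantissa bits plus one implied bit), which is what most systems use but is not named in the text. The function name is invented for the example.

```python
import struct


def float32_fields(value):
    """Split a number, stored as a 32 bit float, into sign, exponent, mantissa."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    sign = raw >> 31
    exponent = ((raw >> 23) & 0xFF) - 127  # 8 bits: the scaling factor
    mantissa = raw & 0x7FFFFF              # 23 stored bits (24 with the implied 1)
    return sign, exponent, mantissa


print(float32_fields(0.5))     # (0, -1, 0)
print(float32_fields(8.0))     # (0, 3, 0)
print(float32_fields(0.0005))  # a tiny level: small exponent, full mantissa
```

Whether the level is huge or tiny, the full mantissa is available to describe it and only the exponent changes. That is where the massive dynamic range comes from.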
