In essence, noise is a randomly changing, chaotic signal, containing an endless number of sine waves of all possible frequencies with different amplitudes. However, randomness will always have specific statistical properties. These will give the noise its specific character or timbre.

If the sine waves’ amplitude is uniform, which means every frequency has the same volume, the noise sounds very bright. This type of noise is called white noise.

White noise is a signal with constant energy per Hz of bandwidth (a flat amplitude-frequency distribution), and so has a flat frequency response; because of these properties, white noise is well suited to testing audio equipment. The human hearing system's frequency response is not linear but logarithmic. In other words, we judge pitch increases by octaves, not by equal increments of frequency, and each successive octave spans twice as many Hertz as the one below it. This means that when we listen to white noise, it appears to us to increase in level by 3dB per octave.
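To see where that perceived 3dB-per-octave rise comes from, here is a quick sketch (the octave band edges are just illustrative figures of my own): with a flat-per-Hz spectrum, each octave spans twice the bandwidth of the last, so its energy doubles, and doubling energy is a 3dB step.

```python
import math

# White noise: flat power per Hz, so the energy in an octave band is
# proportional to its bandwidth, and each octave spans twice as many Hz.
octaves = [(125, 250), (250, 500), (500, 1000), (1000, 2000)]   # illustrative bands
energies = [hi - lo for lo, hi in octaves]                      # per-octave energy
steps_db = [10 * math.log10(b / a) for a, b in zip(energies, energies[1:])]
print([round(s, 2) for s in steps_db])   # each octave-to-octave step is ~3.01 dB
```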

If the amplitude of the sine waves decreases by about 3dB per octave as their frequencies rise, the noise sounds much warmer. This is called pink noise.

Pink noise contains equal energy per octave (or per 1/3 octave). Its power spectral density follows the function 1/f, which corresponds to the level falling by 3dB per octave. These attributes lend themselves perfectly to use in acoustic measurements.

If it decreases by about 6dB per octave, we call it brown noise.

Brown noise, whose name is actually derived from Brownian motion, is similar to pink noise except that its power spectral density follows 1/(f squared). This produces a 6dB-per-octave attenuation.

Blue noise is essentially the inverse of pink noise, with its level increasing by 3dB per octave (the power spectral density is proportional to f).

Violet noise is the inverse of brown noise, with a rising response of 6dB per octave (the power spectral density is proportional to f squared).
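If you want to hear these colours for yourself, here is a rough Python/NumPy sketch that shapes a white-noise spectrum so its power density follows f raised to the chosen exponent (0 for white, -1 for pink, -2 for brown, +1 for blue, +2 for violet). The function name and the simple peak normalisation are my own choices, not a standard API.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2 ** 16                        # number of samples
white = rng.standard_normal(n)     # flat power per Hz

def colored_noise(source, exponent):
    """Shape a noise spectrum so power density ~ f**exponent.

    Amplitude is the square root of power, hence exponent / 2 below.
    """
    spectrum = np.fft.rfft(source)
    f = np.fft.rfftfreq(len(source))
    f[0] = f[1]                        # avoid dividing by zero at DC
    spectrum *= f ** (exponent / 2.0)
    out = np.fft.irfft(spectrum, n=len(source))
    return out / np.max(np.abs(out))   # normalise to the -1..+1 range

pink = colored_noise(white, -1)    # -3 dB per octave
brown = colored_noise(white, -2)   # -6 dB per octave
blue = colored_noise(white, +1)    # +3 dB per octave
violet = colored_noise(white, +2)  # +6 dB per octave
```

Write any of these arrays to a WAV file or play them back and the difference in brightness is immediately obvious: pink and brown are dominated by low frequencies, blue and violet by highs.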

So we have all these funky names for noise, and you now understand their characteristics, but what are they used for?

White noise is used in the synthesizing of hi-hats, crashes, cymbals etc, and, as noted above, in testing audio equipment.

Pink noise is great for synthesizing ocean waves and the warmer type of ethereal pads.

Brown noise is cool for synthesizing thunderous sounds and deep, bursting claps. Of course, they can all be used in varying ways to attain different textures and results, but the idea here is simply for you to get an idea of what they ‘sound’ like.

At the end of the day, it all boils down to maths and physics.


Here is an article I wrote for Sound On Sound magazine on how to use Pink noise referencing for mixing.

And here is the link to the video I created on master bus mixing with Pink noise.

And here is another video tutorial on how to use ripped profiles and Pink noise to mix.

When dealing with events, as we do with cycles for example, we are concerned with two factors: Frequency (f) and Time (T). If we look at a single event, then T is defined as the time from the start to the end of that event, and that amount is measured as a Period.
When dealing with a waveform cycle, the time it takes for the cycle to return to its starting position is defined as its Periodicity. Taking this a step further, Frequency is then defined as the number of events that occur over a specified time, as the following equation illustrates:

f = 1/T (and, equivalently, T = 1/f)

We measure this Period in seconds (s); the SI unit for one cycle per second is the Hertz (Hz). We tend to express anything above 1000 Hz in kHz (kilohertz), and when dealing with cycles shorter than one second we use ms (milliseconds: 1/1000th of a second). This is a huge advantage when it comes to measuring microphone distances from sources and trying to correct alignments.
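As a quick worked example (the helper names are mine, and 1130 ft/s is the approximate speed-of-sound figure used later in this piece):

```python
SPEED_OF_SOUND_FT_S = 1130.0   # approximate speed of sound in air

def period_s(frequency_hz):
    """Period T in seconds: T = 1 / f."""
    return 1.0 / frequency_hz

def delay_ms(distance_ft):
    """Time sound takes to cover a distance, in ms (roughly 1 ft per ms)."""
    return distance_ft / SPEED_OF_SOUND_FT_S * 1000.0

print(round(period_s(60) * 1000, 2))   # one 60 Hz cycle lasts about 16.67 ms
print(round(delay_ms(3.0), 2))         # a mic 3 ft from the source hears it ~2.65 ms late
```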

Relevant content:

Total and Partial Phase cancellation

Dither is used when you need to reduce the number of bits. The best, and most common, example is dithering down from 24 bits to 16 bits, or from 16 bits down to 8, and so on. Typically this takes place when a project you are working on needs to be bounced down from 24 bits to 16 bits using dithering algorithms.

So, what is the process; in mortal language of course?

A very basic explanation is that we add random noise to the waveform when we dither. When we truncate bits, i.e. cut off the least significant bits, the digital process always leaves us with stepped-looking waveforms; by adding noise before the truncation we end up with a more evenly flowing waveform instead of the stepped one. It sounds crazy, but the small amount of noise we add leaves the dithered waveform sounding cleaner, because it replaces correlated quantisation distortion with smooth, benign noise. This waveform, with the noise, is then filtered at the output stage. I could go into this in a much deeper context using graphs and diagrams and talking about probability density functions (PDF), resultant square waves and the bias of quantisation towards one bit over another. But if I did that, you’d probably hate me. All that matters is that dither is used when lowering the bit depth and that this is an algorithmic process, i.e. one using a predetermined set of mathematical formulas.

If we take the 24-bit project scenario and choose to bounce the resultant audio without dithering, then the last eight bits (also known as the Least Significant Bits) of every 24-bit sample are discarded. In terms of audio integrity, you will not only lose resolution but also introduce Quantisation Noise. Because dithering adds random noise at the level of the lower eight bits of the 24-bit signal, whilst maintaining stereo separation, the audible quantisation distortion is dramatically reduced. It then makes sense to dither from 24 bits to 16 bits rather than bounce without it.
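For the curious, here is a toy sketch of the idea using TPDF (triangular probability density function) dither, one common flavour. This is my own illustration, not any DAW's actual algorithm; real dithering implementations add refinements such as noise shaping.

```python
import numpy as np

rng = np.random.default_rng(1)

def requantise(samples_24bit, dither=True):
    """Reduce 24-bit integer samples to 16-bit.

    Plain truncation discards the 8 least significant bits; TPDF dither
    adds triangular noise spanning +/- one 16-bit step before rounding,
    which decorrelates the quantisation error from the signal.
    """
    step = 2 ** 8                  # one 16-bit step, expressed in 24-bit units
    x = samples_24bit.astype(np.float64)
    if dither:
        # Triangular PDF: the sum of two uniform variables, span +/- one step
        x += rng.uniform(-step / 2, step / 2, x.shape) \
           + rng.uniform(-step / 2, step / 2, x.shape)
    return np.round(x / step).astype(np.int64)   # 16-bit result

# A quiet 24-bit sine: truncation alone correlates the error with the
# signal (audible distortion), dither turns it into benign broadband noise.
t = np.arange(4096)
signal = (300 * np.sin(2 * np.pi * t / 64)).astype(np.int64)
plain = requantise(signal, dither=False)
dithered = requantise(signal, dither=True)
```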

How well the process is executed is down to how good the dithering algorithms are. But to be honest these algorithms are so good nowadays that even standard audio sequencing suites (Cubase, Logic etc) will perform dithering tasks without much problem.

My recommendation is to always work in 24 bit and dither down to 16 bit for the resultant file, as CD format is still 16 bits.

Relevant content:

Jitter in Digital Systems

Understanding how sound travels in a given space is critical when setting up speakers in your studio.

Sound Waves  

Let us have a very brief look at how sound travels, and how we measure its effectiveness.  

Sound travels at approximately 1130 feet per second (about 1 foot per ms).
By the way, this figure is a real help when setting up microphones and working out phase values.

Now let us take a frequency travel scenario and try to explain its movement in a room.

For argument’s sake, let’s look at a bass frequency of 60 Hz.

When emitting sound, the speaker will vibrate at a rate of 60 times per second. On each cycle (Hz) the speaker cone extends forward when transmitting the sound, and pulls back when recoiling for the next cycle (creating a rarefaction).

These vibrations create peaks on the forward drive and troughs on the pull-back (rarefaction). Each peak and trough together equate to one cycle.

Imagine 60 of these cycles every second.

We can now calculate the wave cycles of this 60 Hz wave. We know that sound travels at approximately 1130 feet per second, so we can work out how long each wave cycle is. We divide 1130 by 60, and the result is around 19 feet (18.83 if you want to be anal about it). We can now deduce that each wave cycle is about 19 feet long. To calculate each half-cycle, i.e. the distance between the peak and trough, drive and rarefaction, we simply divide by two. We now have a figure of 9½ feet. What that tells us is that if you sat anywhere up to 9½ feet from your speakers, the sound would fly past you completely flat.
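You can sanity-check those figures in a couple of lines of Python (the half-cycle comes out at 9.42 feet before the rounding to 9½ used above):

```python
SPEED_OF_SOUND_FT_S = 1130.0   # approximate speed of sound, as above

def wavelength_ft(frequency_hz):
    """Length of one full wave cycle in feet: speed / frequency."""
    return SPEED_OF_SOUND_FT_S / frequency_hz

full = wavelength_ft(60)   # the 60 Hz bass wave from the example
half = full / 2            # peak-to-trough (drive-to-rarefaction) distance
print(round(full, 2), round(half, 2))   # prints 18.83 9.42
```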
However, this is assuming you have no boundaries of any sort in the room, i.e. no walls or ceiling. As we know that to be utter rubbish, we then need to factor in the boundaries.

These boundaries reflect the sound from the speakers back into the room, where it mixes with the original source sound. That is not all that happens. The reflected sounds can arrive from different angles and, because of their ‘bouncing’ nature, at a different time to other waves. And because the reflected sound is mixed with the source sound, the combined wave is louder.

In certain parts of the room, the reflected sound will amplify because a peak might meet another peak (constructive interference), and in other parts of the room where a peak meets a trough (rarefaction), frequencies are canceled out (destructive interference).

Calculating what happens where is a nightmare.
This is why it is crucial for our ears to hear the sound from the speakers arrive before the reflective sounds. For argument’s sake, I will call this sound ‘primary’ or ‘leading’, and the reflective sound ‘secondary’ or ‘following’.

Our brains have the uncanny ability, due to an effect called the Haas effect, of both prioritizing and localizing the primary sound, but only if the secondary sounds are low in amplitude. So, by eliminating as many of the secondary (reflective) sounds as possible, we leave the brain with the primary sound to deal with. This will allow for a more accurate location of the sound, and a better representation of the frequency content.

But is this what we really want?

I ask this because the secondary sound is also important in a ‘real’ space and goes to form the tonality of the sound being heard. Words like rich, tight, full etc. all come from secondary sounds (reflected). So, we don’t want to completely remove them, as this would then give us a clinically dead space. We want to keep certain secondary sounds and only diminish the ones that really interfere with the sound.

Our brains also have the ability to filter or ignore unwanted frequencies. In the event that the brain is bombarded with too many reflections, it will have a problem localizing the sounds, so it decides to ignore, or suppress, them.

The best example of this is when there is a lot of noise about you, like in a room or a bar, and you are trying to have a conversation with someone. The brain can ignore the rest of the noise and focus on ‘hearing’ the conversation you are trying to have. I am sure you have experienced this in public places, parties, clubs, football matches etc. To carry that over to our real-world situation of a home studio, we need to understand that reflective surfaces will create major problems, and the most common of these reflective culprits are walls. However, there is a way of overcoming this, assuming the room is not excessively reflective and is the standard bedroom/living room type of space with carpet and curtains.

We overcome this with clever speaker placement and listening position, and before you go thinking that this is just an idea and not based on any scientific foundation, think again. The idea is to have the primary sound arrive at our ears before the secondary sound.   Walls are the worst culprits, but because we know that sound travels at a given speed, we can make sure that the primary sound will reach our ears before the secondary sound does. By doing this, and with the Haas effect, our brains will prioritize the primary sound and suppress (if at low amplitude) the secondary sound, which will have the desired result, albeit not perfectly.
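As a rough illustration of the head start we are engineering here (the 4 ft and 12 ft path lengths are made-up figures for the sketch, not a recommendation):

```python
SPEED_FT_PER_MS = 1.13   # sound covers roughly 1.13 feet per millisecond

def arrival_delay_ms(direct_ft, reflected_ft):
    """Extra time the secondary (reflected) sound takes versus the primary path."""
    return (reflected_ft - direct_ft) / SPEED_FT_PER_MS

# Hypothetical layout: ears 4 ft from the speaker, side-wall bounce path of 12 ft.
gap_ms = arrival_delay_ms(4.0, 12.0)
print(round(gap_ms, 1))   # the primary sound leads by roughly 7.1 ms
```

The larger that gap, and the quieter the reflection, the easier it is for the Haas effect to do its prioritising work.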

A room affects the sound of a speaker by the reflections it causes. Some frequencies will be reinforced, others suppressed, thus altering the character of the sound. We know that solid surfaces will reflect and porous surfaces will absorb, but this is all highly reliant on the materials being used. Curtains and carpets will absorb certain frequencies, but not all, so it can sometimes be more damaging than productive. For this, we need to understand the surfaces that exist in the room. In our home studio scenario, we are assuming that a carpet and curtains, plus the odd sofa etc, are all that are in the room. We are not dealing with a steel factory floor studio.

In any listening environment, what we hear is a result of a mixture of both the primary and secondary (reflected) sounds. We know this to be true and our sound field will be a combination of both. In general, the primary sound, from the speakers, is responsible for the image, while the secondary sounds contribute to the tonality of the received sound. 

The trick is to place the speakers in a location that takes advantage of the desirable reflections while diminishing the unwanted ones. ‘Planning’ your room is as important as any piece of gear. Get the sound right and you will have a huge advantage. Get it wrong and you’re in the land of lost engineers.

Relevant content:

Sinusoidal Creation and Simple Harmonic Motion

Frequency and Period of Sound

Total and Partial Phase cancellation

The first premise to understand is that simple harmonic motion through time generates sinusoidal motion.

The following diagram displays the amplitude of the harmonic motion, and for this we need the term A in our formula. We will also be using θ (theta).

I have used the equation y = A sin θ, where θ sweeps through one full cycle (in degrees).
The y-axis displays values based on a unit circle, with the radius interpreted as amplitude.
The x-axis denotes degrees (θ).

It then follows that:

When the angle θ is 0° or 180° then y = 0
sin 0° and sin 180° = y/A = 0

When the angle θ is 90° then y = 1
sin 90° = y/A = 1

When the angle θ is 270° then y = −1
sin 270° = y/A = −1
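These values are easy to verify in Python (converting each angle from degrees to radians first, since `math.sin` works in radians):

```python
import math

A = 1.0   # unit-circle radius treated as amplitude
for degrees, expected in [(0, 0.0), (90, 1.0), (180, 0.0), (270, -1.0)]:
    y = A * math.sin(math.radians(degrees))
    print(degrees, round(y, 6))
    assert abs(y - expected) < 1e-12   # matches the table above
```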

When constructing and working with sinusoids we need to plot our graph and define the axis.

I have chosen the y-axis for amplitude and the x-axis for time with phase expressed in degrees. However, I will later define the formulae that define the variables when we come to expressing the processes.

For now, a simple sine waveform, using each axis and defining them, will be enough.

I will create the y-axis as amplitude with a range that is set from -1 to +1.
y: amplitude

Now to create the x-axis and define its variables to display across the axis.

The range will be from -90 deg to 360 deg
x: time/phase/deg

The following diagram displays the axes plus the waveform; the simplest formula that creates a sinusoid is y = sin x.

The diagram shows one cycle of the waveform starting at 0, peaking at +1 (positive), dropping to the 0 axis and then down to -1 (negative).

The phase values are expressed in degrees and lie on the x-axis. A cycle, sometimes referred to as a period, of a sine wave is a total motion across all the phase values.

I will now copy the same sine wave and phase-offset it (phase shift and phase angle) so you can see the phase values. To do this we need another simple formula:
y = sin(x − t), where t (the time/phase value) is a constant which, for now, is set to 0. This allows me to shift by any number of degrees to display the phase relationships between the two sine waves.

The shift value is set at 90 which denotes a phase shift of 90 degrees. In essence, the two waveforms are now 90 degrees out of phase.

The next step is to phase-shift by 180 degrees, and this results in total phase cancellation. When the two waveforms are played together and summed, they produce silence, as each peak cancels out each trough.
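Here is a small Python check of both cases, the 90-degree offset (partial cancellation) and the 180-degree offset (total cancellation). Sampling one point per degree is just for illustration.

```python
import math

N = 360   # one point per degree across a full cycle
wave = [math.sin(math.radians(x)) for x in range(N)]
shifted_90 = [math.sin(math.radians(x - 90)) for x in range(N)]
shifted_180 = [math.sin(math.radians(x - 180)) for x in range(N)]

# Summing a sine with its 180-degree-shifted copy cancels completely...
summed = [a + b for a, b in zip(wave, shifted_180)]
print(max(abs(s) for s in summed))   # effectively zero: silence

# ...while a 90-degree offset only partially cancels (and partially reinforces).
partial = [a + b for a, b in zip(wave, shifted_90)]
print(round(max(abs(p) for p in partial), 3))   # peaks near sqrt(2), not zero
```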

Relevant content:

Frequency and Period of Sound

Total and Partial Phase cancellation

Digital Audio – Understanding and Processing