The science of loudness


My watch has a “Noise” app: it shows $\text{dB}$, for decibels.

My amp has a volume knob, which also shows decibels, although… negative ones, this time.

And finally, my video editing software has a ton of meters — which are all in decibel or
decibel-adjacent units.

How do all these decibels fit together?

Are the decibels from my watch and the decibels from my amp related? And if so,
how? I’ve decided to spend twenty minutes of your time answering that question.

I’ve also spent about a month of my time making FAM: the fasterthanlime audio meter, in Rust
of course, with egui. If you’re a patron of any tier,
you can clone and run it right now, and if you’re not, well you can’t.

A screenshot of fam, the fasterthanlime audio meter


What even is sound?

Sound, like wind, is more of a concept than a thing, since it’s the name
we’ve given to a specific behavior of particles.

When you strum a guitar, the strings vibrate:

…transferring energy to the body of the guitar, which amplifies it and projects
it into the air as a pressure wave!

This wave eventually makes its way to the ear, where some processing is already
done via its unique mechanical design: after being collected by the outer ear,
the wave is ferried through the ear canal to the eardrum, whose vibrations are
amplified by three tiny bones. Then its destination: the inner ear, where the
fluid ripples and hair cells ride that wave up and down.

A schematic of the entire hearing system — NIH/NIDCD

Eventually, chemicals rushing into the hair cells turn that motion into
electrical signals, which the brain interprets as sound.

People with hearing loss who use a cochlear implant bypass the mechanical parts
of that pipeline, relying instead on microphones and speech processors.

Although those implants do not replicate natural hearing, they give the brain
enough information to recognize and process human speech and environmental sounds.

For the rest of us, our ears detect tiny changes in pressure.

Under pressure

You have to realize that our bodies are under constant pressure, on the order
of one atmosphere, or about one hundred thousand pascals (the pascal being the
SI unit of pressure).

But you’ll notice that my watch’s “Noise” app doesn’t use pascals. In fact,
no audio equipment I’ve ever looked at uses pascals. Instead, they show sound
pressure level, defined as follows:

$L_p = 20 \log_{10}\left(\frac{p}{p_0}\right)\ \text{dB}_{\text{SPL}}$

Decibels are a logarithmic unit expressing a ratio: in this case the ratio
between $p$, a pressure we measured, and $p_0$, a reference pressure.

In air (because sound can also travel through water and other media), we
usually pick 20 micropascals, which is about the quietest sound the human ear
can detect.

$p_0 = 20\ \mu\text{Pa}$
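As a quick sanity check of the formula, here is a minimal Rust sketch that converts an RMS sound pressure in pascals into dB SPL, using the 20 µPa reference above. The function name `spl_db` is just for illustration, not taken from FAM.

```rust
/// Reference pressure for sound in air: 20 micropascals.
const P0_PA: f64 = 20e-6;

/// Converts an RMS sound pressure (in pascals) into a sound pressure level in dB SPL.
fn spl_db(pressure_pa: f64) -> f64 {
    20.0 * (pressure_pa / P0_PA).log10()
}
```

`spl_db(20e-6)` gives 0 dB, and a pressure of 1 Pa comes out at roughly 94 dB SPL, the level many sound level meter calibrators produce.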

Instead of a linear scale that spans almost 10 orders of magnitude (from tens of
micropascals to about a hundred thousand pascals), decibels give us a nice
human-friendly scale going from $0\ \text{dB}$ to $194\ \text{dB}$:

Decibels	Example
0	Faintest sound heard by human ear
30	Whisper, quiet library
60	Normal conversation, sewing machine, or typewriter
90	Lawnmower, shop tools, or truck traffic (90 dB for 8 hours per day is the maximum exposure without protection)
100	Chainsaw, pneumatic drill, or snowmobile (2 hours per day is the maximum exposure without protection)
115	Sandblasting, loud music concert, or automobile horn (15 minutes per day is the maximum exposure without protection)
140	Gun muzzle blast or jet engine (noise causes pain, and even brief exposure injures unprotected ears, and injury may occur even with hearing protectors)
180	Rocket launching pad

Source: Merck Manual

Own work, graphed via Desmos

Above $194\ \text{dB}$, we don’t get sound waves, we get shock waves: the pressure
amplitude would need to exceed one atmosphere, and the troughs of the wave would
have to dip below zero absolute pressure, which is impossible. Once you reach a
vacuum… that’s it. There’s no going any more vacuumy.
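Going the other way makes that ceiling concrete. Here is a small Rust sketch (the function name is made up) that turns a level in dB SPL back into a pressure amplitude:

```rust
/// Reference pressure for sound in air: 20 micropascals.
const P0_PA: f64 = 20e-6;
/// Standard atmospheric pressure, in pascals.
const ONE_ATMOSPHERE_PA: f64 = 101_325.0;

/// Converts a level in dB SPL back into a pressure amplitude in pascals.
fn pressure_from_spl(db: f64) -> f64 {
    P0_PA * 10f64.powf(db / 20.0)
}
```

`pressure_from_spl(194.0)` comes out at about 100,000 Pa, within roughly 1% of `ONE_ATMOSPHERE_PA`. That is why the limit is usually quoted as "about 194 dB".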

Signal processing

We haven’t yet elucidated what the decibels on my amp mean. I would call
those $\text{dB}_{\text{FS}}$, for decibels “relative to full scale”, because at
$0\ \text{dB}$, we would get the absolute maximum power the amp can output
(which would be damaging to my ears and to my relationship with the
neighbors).

The formula is the same, except that we don’t pick a reference $p_0$ like
with sound pressure levels. Here, we consider an input signal and an output signal:

$L = 20 \log_{10}\left(\frac{x_2}{x_1}\right)\ \text{dB}$

Say the solid curve here is our input signal, with amplitude $1$, and the dotted
curve is the signal after it comes out of… some system, with amplitude $0.2$:

amplitude 1 to 0.2

Based on those amplitudes, our system has a gain of:

$L = 20 \log_{10}\left(\frac{0.2}{1}\right) \approx -14\ \text{dB}$
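Here are both directions of that conversion as a minimal Rust sketch (function names are illustrative):

```rust
/// Gain in decibels between an input amplitude `x1` and an output amplitude `x2`.
fn gain_db(x1: f64, x2: f64) -> f64 {
    20.0 * (x2 / x1).log10()
}

/// Inverse: the amplitude ratio x2 / x1 that corresponds to a gain in decibels.
fn ratio_from_db(db: f64) -> f64 {
    10f64.powf(db / 20.0)
}
```

`gain_db(1.0, 0.2)` is about -13.98 dB, and halving the amplitude, `gain_db(1.0, 0.5)`, is about -6.02 dB.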

Because of the way human hearing works, it makes more sense to design a volume
control around decibels. Such a control is, in fact, logarithmic, but it feels
more linear to the ear.

If you don’t do that, you end up with a very frustrating volume control where the
upper 80% are way too loud, and the value you want is between two ticks on the low
end of the slider. I’m sure you’ve seen those before; I know I have.
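Here is one way to build such a control, sketched in Rust under assumptions of our own. The knob position goes from 0.0 to 1.0 and is mapped linearly onto a decibel range, here -60 dB to 0 dBFS (an arbitrary choice). The very bottom of the range is treated as mute. The decibel value is then turned into the linear gain that actually multiplies the samples. `MIN_DB` and `knob_to_gain` are hypothetical names.

```rust
/// Lowest audible setting of our hypothetical knob, in dB (an arbitrary choice).
const MIN_DB: f64 = -60.0;

/// Maps a knob position in [0.0, 1.0] to a linear gain factor.
/// Equal knob movements give equal steps in decibels, not in amplitude.
fn knob_to_gain(position: f64) -> f64 {
    let position = position.clamp(0.0, 1.0);
    if position == 0.0 {
        return 0.0; // fully counter-clockwise: mute
    }
    let db = MIN_DB * (1.0 - position); // 1.0 -> 0 dB, 0.5 -> -30 dB
    10f64.powf(db / 20.0)
}
```

With this mapping, the halfway point is -30 dB, a gain of about 0.03. A linear slider would instead put that level in the bottom 3% of its travel.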

Those $\text{dB}_{\text{FS}}$ are the ones we care about as broadcast engineers,
since we’re dealing with the signal directly.

Of course, most of the time, a signal will be transformed back into a sound
wave, and then we’ll have to worry about $\text{dB}_{\text{SPL}}$…

But as long as it’s within an audio system, we have to worry about exceeding
levels. With an analog signal, that typically results in distortion (which…
can be done on purpose, for style). In a digital system, exceeding levels
typically results in clipping, a very harsh form of distortion.
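In the digital domain, exceeding full scale looks like this. The sketch assumes samples normalized to ±1.0. In practice, clipping often happens when converting to a fixed-point format or at the converter, but the effect is the same.

```rust
/// Hard clipping: any sample beyond full scale (±1.0) is flattened to ±1.0.
fn hard_clip(samples: &[f64]) -> Vec<f64> {
    samples.iter().map(|&x| x.clamp(-1.0, 1.0)).collect()
}
```

Everything that was above full scale is simply cut off. The flattened tops of the waveform add harmonics that weren't in the original signal, which is where the harshness comes from.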

Which sounds quite awful: you may recognize it from public address
systems in parks, or maybe trains:

To avoid that, we have to watch our levels. And over the past hundred years,
we’ve come up with a bunch of solutions to do that, all of which are flawed
in some way.

In the 1930s, the BBC came up with meters that look like this:

A typical British quasi-PPM (Sifam Type 34). Each division between ‘1’ and ‘7’ is exactly four decibels, and ‘6’ is the intended maximum level.

Harumphy on Wikimedia Commons

Well, that one isn’t from the 1930s, but the basic idea hasn’t changed.

Independently and around the same time, the Germans also developed level
meters, putting actual decibels on the scale…

A German PPM from Siemens & Halske

Max Koschuh on Wikimedia Commons

…and giving them the cute little nickname “Lichtzeigerinstrument” (light
pointer instruments).

Today we would call them both “quasi-PPMs”, PPM for “peak programme meter”,
and quasi because… they don’t actually report peaks accurately.

Type II PPMs, for example, have an integration time of 10ms — any peak shorter
than that gets under-reported. This succession of notes, which all have the
exact same volume but are getting longer and longer, shows how the quasi-PPM
under-reports at first:
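To get a feel for why short peaks get under-reported, here is a simplified peak follower in Rust. It is not the standardized Type II PPM ballistics. It is a generic envelope follower: it rises toward the rectified signal with an exponential attack time constant and falls back with a slower release. The 10 ms attack in the example is only loosely inspired by the integration time above, and all names are hypothetical.

```rust
/// Simplified quasi-peak follower (not the standardized PPM ballistics).
/// Returns the highest reading the "needle" reaches over `samples`.
fn quasi_peak(samples: &[f64], sample_rate: f64, attack_ms: f64, release_ms: f64) -> f64 {
    // One-pole smoothing coefficients derived from exponential time constants.
    let attack = 1.0 - (-1.0 / (attack_ms * 1e-3 * sample_rate)).exp();
    let release = 1.0 - (-1.0 / (release_ms * 1e-3 * sample_rate)).exp();

    let mut envelope = 0.0_f64;
    let mut max_reading = 0.0_f64;
    for &x in samples {
        let level = x.abs();
        // Move toward louder input with the attack speed, toward quieter input with the release speed.
        let coeff = if level > envelope { attack } else { release };
        envelope += coeff * (level - envelope);
        max_reading = max_reading.max(envelope);
    }
    max_reading
}
```

At 48 kHz with a 10 ms attack and a 1.5 s release, a 1 ms full-scale burst only gets the needle to about 0.1 (roughly -20 dB), while a 100 ms burst reads essentially full scale. The exact numbers are a property of this toy model, not of real meters.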

I don’t own one, so the best I can do is show you a plug-in that simulates it!

mvMeter2 plug-in

Root Mean Square

But quasi-PPMs are still pretty good at showing peaks. Much better than, say,
VU meters (for Volume Unit), which were invented in the US in the 1940s, and
which get us a lot closer to “loudness”.

What VU meters measure is similar to the Root Mean Square, which gives us the
average level of a signal over a period of time:

$\text{RMS} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} x_i^2}$
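In code, the formula is a one-liner plus a conversion to dBFS. Here is a minimal Rust sketch, assuming samples normalized so that full scale is ±1.0:

```rust
/// Root mean square of a window of samples.
fn rms(samples: &[f64]) -> f64 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_of_squares: f64 = samples.iter().map(|x| x * x).sum();
    (sum_of_squares / samples.len() as f64).sqrt()
}

/// Expresses a linear level relative to full scale (1.0) in dBFS.
fn to_dbfs(level: f64) -> f64 {
    20.0 * level.log10()
}
```

A full-scale sine has an RMS of 1/√2, about -3 dBFS, even though its peaks touch 0 dBFS. A VU-like reading would compute this over a window of roughly 300 ms (14,400 samples at 48 kHz). A real VU needle's ballistics are not exactly a rectangular RMS window, though.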

A typical VU meter has an integration time of 300ms (vs 10ms for a Type II PPM
if you remember), giving us even more severe under-reporting of peaks:

mvMeter2 plug-in

Which is fine!

Here’s another side-by-side example — using an RMS meter this time, which is
close enough to an actual VU meter:

A prototype of the fasterthanlime audio meter playing one of my tracks, we see that the sample peak is consistently higher than the root mean square.

A VU meter is not meant to show short peaks: it’s meant to let a radio operator
know roughly how loud a song is, so they can adjust the volume, and the
listeners don’t have to.

However, I’m happy to report that, since the 1940s, both our understanding of
human hearing and technology have improved.

Sample Peak, True Peak

First off, most audio processing is now done in the digital domain. Which is
both a blessing and a curse.

Here’s Ableton showing a piece of audio:

An Ableton screenshot cropped to show just the wave.

If we zoom in, we can see the wave:

The wave is now made up of a faster wave.

We can zoom in some more:

We're now seeing what almost looks like a graph

And eventually, Ableton will show us individual audio samples:

Ableton showing individual audio samples

Today’s PPMs are not “quasi” anymore. It’s really easy to make a sample peak
monitor, because you just look at a window of samples, say, a thousand of them,
and keep whichever value is the furthest from zero! That’s your peak!

Your sample peak! Not your true peak!
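A sample peak meter really is that simple. Here is a Rust sketch (function names made up), assuming samples normalized to ±1.0 full scale:

```rust
/// Sample peak of a window: the magnitude of the sample furthest from zero.
fn sample_peak(samples: &[f64]) -> f64 {
    samples.iter().fold(0.0_f64, |peak, &x| peak.max(x.abs()))
}

/// The same peak, expressed in dBFS.
fn sample_peak_dbfs(samples: &[f64]) -> f64 {
    20.0 * sample_peak(samples).log10()
}
```

For the window `[0.1, -0.5, 0.3]`, the sample peak is 0.5, or about -6 dBFS. The sign doesn't matter, only the distance from zero.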

Because the actual sound wave is reconstructed from a limited amount of discrete
samples, it’s possible for all samples to be below the maximum desired level,
and yet for the reconstructed wave to be above it!


To actually measure the true peak, one can use a sinc filter to upsample the signal,
which fills in additional samples between the original ones — letting us know how high
that sound wave truly goes.
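Here is a rough Rust sketch of the idea, assuming normalized samples. ITU-R BS.1770 describes true-peak metering with 4x oversampling and its own interpolation filter. This sketch swaps that filter for a plain Hann-windowed sinc interpolator, so treat its output as an estimate. `half_width` (how many neighbors on each side contribute) is an arbitrary parameter.

```rust
use std::f64::consts::PI;

/// Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1.
fn sinc(x: f64) -> f64 {
    if x.abs() < 1e-12 {
        1.0
    } else {
        (PI * x).sin() / (PI * x)
    }
}

/// Estimates the true peak by evaluating a Hann-windowed sinc interpolation
/// `oversample` times per original sample period and keeping the largest magnitude.
fn true_peak(samples: &[f64], oversample: usize, half_width: usize) -> f64 {
    let n = samples.len();
    let window_radius = (half_width + 1) as f64;
    let mut peak = 0.0_f64;
    for i in 0..n {
        // Only neighbors within `half_width` samples contribute.
        let lo = i.saturating_sub(half_width);
        let hi = (i + half_width + 1).min(n);
        for k in 0..oversample {
            // Fractional position between sample i and i + 1 (k = 0 is sample i itself).
            let t = i as f64 + k as f64 / oversample as f64;
            let mut y = 0.0;
            for j in lo..hi {
                let d = t - j as f64;
                let window = 0.5 * (1.0 + (PI * d / window_radius).cos());
                y += samples[j] * sinc(d) * window;
            }
            peak = peak.max(y.abs());
        }
    }
    peak
}
```

As a test signal, take a sine at a quarter of the sample rate, shifted by 45 degrees. Every sample lands on ±0.707, so the sample peak says about -3 dBFS. The interpolated true peak comes out close to 1.0, or 0 dBFS.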


The Loudness Wars

That takes care of peaks. What about loudness then? Well, we made progress there
too. First, in the wrong direction.

The idea of compression is to reduce dynamic range by taking anything above
a certain level and progressively making it quieter:

A demonstration of compression in DaVinci Resolve 20 using the very text for this article.

We can apply gain to the resulting signal without clipping, making the whole
thing louder. That gain is called “make-up” gain, because it makes up for the
loud bits that were made quieter by the compression. I can’t believe that just
clicked for me now.
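The static part of a compressor fits in a few lines. This Rust sketch assumes a hard knee and levels already measured in dB. A real compressor also smooths its gain changes over time (attack and release) and often uses a soft knee, and neither is modeled here. The parameter names are illustrative.

```rust
/// Static gain curve of a simple hard-knee compressor, working on levels in dB.
/// Below `threshold_db` the level passes through unchanged; above it, every
/// `ratio` dB of input only produces 1 dB of output. `makeup_db` is added at the end.
fn compress_db(level_db: f64, threshold_db: f64, ratio: f64, makeup_db: f64) -> f64 {
    let compressed = if level_db > threshold_db {
        threshold_db + (level_db - threshold_db) / ratio
    } else {
        level_db
    };
    compressed + makeup_db
}
```

Take a -20 dB threshold and a 4:1 ratio. A peak at 0 dBFS comes out at -15 dB, which leaves room for 15 dB of make-up gain. With that gain, the peak is back at full scale, while a quiet passage at -30 dB is pushed up to -15 dB. The 30 dB of dynamic range has become 15.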

During the 2000s, sound engineers started abusing compression to make their
albums sound louder and louder, based on the theory that people preferred louder
things.

The song “Super Trouper” by ABBA, as shown on the major issues of the album: the 1980 Super Trouper LP, the 2001 Jon Astley remaster, the 2005 The Complete Studio Recordings box set (disc 7), and the 2011 Super Trouper Deluxe Edition remaster (disc 1).

Kosmosi

These “loudness wars” lasted until the mid-2010s, when the music industry
finally tackled the problem by adopting a proper loudness unit: LKFS.

The first interesting thing about LKFS is that it takes into account multiple
channels and does a downmix into one value:

Simplified block diagram of the multichannel loudness algorithm (ITU-R BS.1770, figure 1)

Which raises the question: what did the BBC do with their PPMs?

Well, they didn’t have to worry about stereo until the late 50s, when they
started experimenting with stereo themselves, with two separate AM transmitters.

So, two separate PPMs was one option:

Screenshot of a BBC-type peak programme meter in AB (left/right) mode

Harumphy

…and then they had a different variant that showed the sum and the difference
of both channels, which came in two versions, M3:

Screenshot of a BBC-type peak programme meter in M3 (sum/difference) mode

Harumphy

And M6:

Screenshot of a BBC-type peak programme meter in M6 (sum/difference) mode

Harumphy

This is important because if you have two waves of opposite phase, they cancel
each other out!
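The sum and difference are easy to compute. Here is a Rust sketch. The 1/2 scaling is an arbitrary choice for illustration, not the BBC's exact M3 or M6 scaling.

```rust
/// Sum ("mid") and difference ("side") of a stereo pair, both scaled by 1/2.
fn sum_and_difference(left: &[f64], right: &[f64]) -> (Vec<f64>, Vec<f64>) {
    let sum: Vec<f64> = left.iter().zip(right).map(|(l, r)| (l + r) * 0.5).collect();
    let diff: Vec<f64> = left.iter().zip(right).map(|(l, r)| (l - r) * 0.5).collect();
    (sum, diff)
}
```

Suppose the right channel is the left channel with its polarity flipped. The sum is then silent, even though each channel looks perfectly healthy on its own meter. A mono downmix of that signal would be silent too.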

phase

Back to LKFS: the first stage of the computation is filtering, to model how
humans perceive sound.

The first filter is a high shelf, which boosts everything above roughly 1000Hz by about 4 dB:

Response of stage 1 of the pre-filter used to account for the acoustic effects of the head

Graphed with Desmos

And the second is a high-pass filter, which attenuates anything under 100Hz.

Second stage weighting curve

Graphed with Desmos
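Both stages are ordinary biquad (second-order IIR) filters. The sketch below uses the 48 kHz coefficients given in BS.1770. Other sample rates need recomputed coefficients, which this sketch doesn't do. The struct and function names are made up.

```rust
/// A second-order IIR ("biquad") filter in direct form I.
struct Biquad {
    b0: f64,
    b1: f64,
    b2: f64,
    a1: f64,
    a2: f64,
    // Previous two inputs and outputs.
    x1: f64,
    x2: f64,
    y1: f64,
    y2: f64,
}

impl Biquad {
    fn new(b0: f64, b1: f64, b2: f64, a1: f64, a2: f64) -> Self {
        Biquad { b0, b1, b2, a1, a2, x1: 0.0, x2: 0.0, y1: 0.0, y2: 0.0 }
    }

    fn process(&mut self, x: f64) -> f64 {
        let y = self.b0 * x + self.b1 * self.x1 + self.b2 * self.x2
            - self.a1 * self.y1
            - self.a2 * self.y2;
        self.x2 = self.x1;
        self.x1 = x;
        self.y2 = self.y1;
        self.y1 = y;
        y
    }
}

/// Stage 1: high shelf modeling the acoustic effect of the head (48 kHz only).
fn k_stage1() -> Biquad {
    Biquad::new(
        1.53512485958697,
        -2.69169618940638,
        1.19839281085285,
        -1.69065929318241,
        0.73248077421585,
    )
}

/// Stage 2: high-pass filter (48 kHz only).
fn k_stage2() -> Biquad {
    Biquad::new(1.0, -2.0, 1.0, -1.99004745483398, 0.99007225036621)
}

/// Applies both K-weighting stages to one channel sampled at 48 kHz.
fn k_weight(samples: &[f64]) -> Vec<f64> {
    let mut stage1 = k_stage1();
    let mut stage2 = k_stage2();
    samples.iter().map(|&x| stage2.process(stage1.process(x))).collect()
}
```

A constant signal goes through stage 1 unchanged (0 dB at DC). A signal alternating between +1 and -1 (the Nyquist frequency) comes out about 4 dB louder. Stage 2 removes the constant signal entirely.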

Next, we integrate over some interval $T$ to calculate the power of the filtered signal:

$z_i = \frac{1}{T} \int_0^T y_i^2 \, dt$

And finally, this should look familiar: it’s very close to the $\text{dB}$
formula from earlier:

$L_K = -0.691 + 10 \log_{10} \left( \sum_i G_i \cdot z_i \right)$

But because this time we’re measuring a power, not an amplitude, we use a factor
of 10 instead of 20.

$G_i$ are the weighting coefficients for individual channels, given in table 3 of
BS.1770-5 as:

$\begin{array}{lll} G_L & = 1.0 & 0\ \text{dB} \\ G_R & = 1.0 & 0\ \text{dB} \\ G_C & = 1.0 & 0\ \text{dB} \\ G_{Ls} & = 1.41 & \approx +1.5\ \text{dB} \\ G_{Rs} & = 1.41 & \approx +1.5\ \text{dB} \end{array}$
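Put together, the last two formulas look like this in Rust. The sketch assumes the samples have already been K-weighted, and it takes the channel weights from the table above. The function names are hypothetical.

```rust
/// Mean square of one channel's K-weighted samples over the interval:
/// the discrete version of z_i = (1/T) * integral of y_i^2 dt.
fn mean_square(y: &[f64]) -> f64 {
    y.iter().map(|v| v * v).sum::<f64>() / y.len() as f64
}

/// Loudness in LKFS from per-channel mean squares `z` and channel weights `g`.
fn loudness_lkfs(z: &[f64], g: &[f64]) -> f64 {
    let weighted: f64 = z.iter().zip(g).map(|(zi, gi)| gi * zi).sum();
    -0.691 + 10.0 * weighted.log10()
}
```

Feeding it raw numbers, z = 0.5 (the mean square of a full-scale sine) in one channel gives about -3.70. In a real measurement, the K-weighting first boosts a 997 Hz tone by roughly 0.69 dB, so that tone ends up reading about -3.01 LKFS. Compensating for that boost is what the -0.691 constant is for.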

Depending on the interval chosen to calculate loudness, we call the result
different things:

$M$ for momentary (400 milliseconds)
$S$ for short-term (3 seconds)

As for I, it’s the integrated loudness, and it takes into account an entire
piece of media, minus the quiet parts, using a standard gating mechanism.
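The gating is where "minus the quiet parts" comes from. As described in BS.1770 (from revision 2 onward), the programme is cut into 400 ms blocks that overlap by 75%. Blocks below an absolute gate of -70 LKFS are dropped. Then anything more than 10 LU below the loudness of the remaining blocks is dropped too. Here is a Rust sketch that assumes the per-block weighted powers (the sum of $G_i \cdot z_i$ for each block) are already computed. The function names are made up.

```rust
/// Loudness of a block from its weighted power (the sum of G_i * z_i).
fn block_loudness(power: f64) -> f64 {
    -0.691 + 10.0 * power.log10()
}

/// Plain average of block powers.
fn mean(powers: &[f64]) -> f64 {
    powers.iter().sum::<f64>() / powers.len() as f64
}

/// Integrated loudness from the weighted powers of the gating blocks.
/// Returns None if every block falls below the absolute gate.
fn integrated_loudness(block_powers: &[f64]) -> Option<f64> {
    const ABSOLUTE_GATE_LKFS: f64 = -70.0;
    const RELATIVE_GATE_LU: f64 = -10.0;

    // Stage 1: drop near-silent blocks.
    let above_absolute: Vec<f64> = block_powers
        .iter()
        .copied()
        .filter(|&p| block_loudness(p) > ABSOLUTE_GATE_LKFS)
        .collect();
    if above_absolute.is_empty() {
        return None;
    }

    // Stage 2: relative gate, 10 LU below the loudness of what survived stage 1.
    let relative_gate = block_loudness(mean(&above_absolute)) + RELATIVE_GATE_LU;
    let above_relative: Vec<f64> = above_absolute
        .iter()
        .copied()
        .filter(|&p| block_loudness(p) > relative_gate)
        .collect();

    // Average powers (not decibels), then convert once at the end.
    Some(block_loudness(mean(&above_relative)))
}
```

Here's an example: two blocks at -20 LKFS, one quiet block at -40 and one of near-silence. The silence is removed by the absolute gate and the quiet block by the relative gate, so the integrated loudness is -20 LKFS.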

This discourages “cheating” by audio engineers trying to make their songs louder
than everyone else’s, because we finally have one number that is relatively good at
predicting how loud something will sound to the human ear.

When mastering for YouTube, we target an integrated loudness level of $-14\ \text{LUFS}$.

In the “Stats for nerds” section of a YouTube video (that you can find in the context menu),
there is a content loudness section:

yt minus2.9

On that video of George Michael’s “Careless Whisper”, they left a bit of headroom:
its integrated loudness is $-14 - 2.9 = -16.9\ \text{LUFS}$.

I checked with ffmpeg’s -af ebur128 filter:

~/Downloads
❯ ffmpeg -i careless-whisper.webm -af ebur128 -f null -
ffmpeg version 7.1 Copyright (c) 2000-2024 the FFmpeg developers
✂️
[Parsed_ebur128_0 @ 0x6000002640b0] t: 0.106979   TARGET:-23 LUFS    M:-120.7 S:-120.7     I: -70.0 LUFS       LRA:   0.0 LU
[Parsed_ebur128_0 @ 0x6000002640b0] t: 0.206979   TARGET:-23 LUFS    M:-120.7 S:-120.7     I: -70.0 LUFS       LRA:   0.0 LU
[Parsed_ebur128_0 @ 0x6000002640b0] t: 0.306979   TARGET:-23 LUFS    M:-120.7 S:-120.7     I: -70.0 LUFS       LRA:   0.0 LU
[Parsed_ebur128_0 @ 0x6000002640b0] t: 0.406979   TARGET:-23 LUFS    M: -20.6 S:-120.7     I: -20.6 LUFS       LRA:   0.0 LU
[Parsed_ebur128_0 @ 0x6000002640b0] t: 0.506979   TARGET:-23 LUFS    M: -20.5 S:-120.7     I: -20.6 LUFS       LRA:   0.0 LU
[Parsed_ebur128_0 @ 0x6000002640b0] t: 0.606979   TARGET:-23 LUFS    M: -21.4 S:-120.7     I: -20.8 LUFS       LRA:   0.0 LU
[Parsed_ebur128_0 @ 0x6000002640b0] t: 0.706979   TARGET:-23 LUFS    M: -25.0 S:-120.7     I: -21.6 LUFS       LRA:   0.0 LU
[Parsed_ebur128_0 @ 0x6000002640b0] t: 0.806979   TARGET:-23 LUFS    M: -33.5 S:-120.7     I: -21.6 LUFS       LRA:   0.0 LU
✂️
[Parsed_ebur128_0 @ 0x6000002640b0] t: 300.606979 TARGET:-23 LUFS    M: -60.3 S: -60.3     I: -16.9 LUFS       LRA:   8.2 LU
[Parsed_ebur128_0 @ 0x6000002640b0] Summary:

  Integrated loudness:
    I:         -16.9 LUFS
    Threshold: -27.1 LUFS

  Loudness range:
    LRA:         8.2 LU
    Threshold: -37.1 LUFS
    LRA low:   -22.0 LUFS
    LRA high:  -13.8 LUFS
✂️

Because -16.9 is below the target of -14, YouTube does not apply any change to this video.

By contrast, Rihanna’s Umbrella is mastered to $-8.8\ \text{LUFS}$, so YouTube
turns the volume down:

yt loudness 5.3db

Going from 100% to 55% is a change of… hey, we can calculate that!

$20 \log_{10}\left(\frac{0.55}{1}\right) \approx -5.2\ \text{dB}$

And that’s what the “stats for nerds” overlay is showing: the content is roughly
5.2 to 5.3 dB above their target loudness, so they’re turning the volume down.
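YouTube doesn't document the exact logic behind this overlay. Both examples are consistent with a simple rule, though: if a track is louder than the target, turn it down by the difference; otherwise leave it alone. Here is a Rust sketch of that assumed rule (function names are hypothetical):

```rust
/// Gain (in dB) to reach `target_lufs`, assuming loud content is only ever
/// turned down and quiet content is never turned up.
fn normalization_gain_db(integrated_lufs: f64, target_lufs: f64) -> f64 {
    (target_lufs - integrated_lufs).min(0.0)
}

/// The same gain as a linear volume factor (1.0 = 100%).
fn volume_factor(gain_db: f64) -> f64 {
    10f64.powf(gain_db / 20.0)
}
```

`normalization_gain_db(-16.9, -14.0)` is 0, so "Careless Whisper" is left alone. `normalization_gain_db(-8.8, -14.0)` is -5.2 dB, and `volume_factor(-5.2)` is about 0.55: the 55% we saw for "Umbrella".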

Personally, I find it strange that they show the difference between their
loudness target and the content’s integrated loudness in decibels.

If they want to display the delta, they should use LU, loudness units. And
honestly, they could just say the target is $-14$ and give us the actual loudness
of a track in LUFS. This is stats for nerds! Not stats for normies!

Even more recently, YouTube has introduced “DRC” (for dynamic range compression):

yt drc

As seen here on this Scott the Woz video, which is mastered way below
YouTube’s target loudness level, at around $-23\ \text{LUFS}$, the target for
“broadcast” rather than “streaming”.

YouTube’s user-facing name for it is “Stable Volume”. It’s not supposed to kick in
for music (because it ruins music), and you can turn it off in the settings:

A-weighting

LKFS and LUFS (same thing, different name) aren’t the only units that try to
take psychoacoustics into account: the Apple Watch’s Noise app also does
filtering.

The first experiments regarding “how loud humans think sound is” date back
to 1927:

A DIRECT COMPARISON OF THE LOUDNESS OF PURE TONES BY B. A. KINGSBURY — Note: “T.U.” stands for telephone units, and “cycle” for Hertz.

A few years later, Fletcher & Munson published this graph of equal-loudness contours:

Equal-loudness contours from Fletcher & Munson, 1933

Loudness, Its Definition, Measurement and Calculation

It takes a minute to figure out. Each line connects tones that were perceived
as equally loud. Their test subjects reported that, for example, a 1000Hz sound blasted
at 40 decibels felt as loud as a 100Hz sound at 62 decibels.

In other words: we are much, much more sensitive to sounds at 1000Hz than to
those at 100Hz.

That dip around 3 to 4 kHz is where our hearing is most sensitive: we made our
smoke detectors beep at that frequency for maximum alert, and our babies cry at
that frequency for similar reasons.

From that graph, an ISO standard was
derived, specifying the A-weighting curve, which predates LKFS’s K-weighting
by almost 50 years:

Lindosland on Wikimedia Commons
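The A-weighting curve has a closed-form definition. Here it is in Rust, using the analytic form commonly given in IEC 61672-1 and returning the gain in dB applied at a given frequency. The function name is made up.

```rust
/// A-weighting gain in dB at frequency `f` (Hz), from the analytic curve;
/// the +2.00 dB offset normalizes the curve to 0 dB at 1 kHz.
fn a_weighting_db(f: f64) -> f64 {
    let f2 = f * f;
    let ra = (12194.0_f64.powi(2) * f2 * f2)
        / ((f2 + 20.6_f64.powi(2))
            * ((f2 + 107.7_f64.powi(2)) * (f2 + 737.9_f64.powi(2))).sqrt()
            * (f2 + 12194.0_f64.powi(2)));
    20.0 * ra.log10() + 2.00
}
```

`a_weighting_db(100.0)` is about -19.1 dB, so a 100 Hz tone counts for a lot less than a 1 kHz one, in line with the equal-loudness contours above. Around 2.5 kHz the curve is slightly positive, at about +1.3 dB.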

Although more basic and somewhat outdated, A-weighting is used in a bunch of places.

French law requires sound level meters like these in every music venue:

A display showing 100 dBA, 84 dBA Leq(10min) and 88 dBC Leq(10min) (Amix AFF17-3)

As of 2023, the limits to respect are $102\ \text{dB}_{\text{A}}$ and
$118\ \text{dB}_{\text{C}}$, as an equivalent continuous level (Leq): the
“average sound energy” over 15 minutes.
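An Leq is an energy average. Each reading is converted back to a power-like quantity, the results are averaged, and the average is converted to decibels again. Here is a Rust sketch, assuming equally long consecutive readings that are already A- or C-weighted (the function name is made up):

```rust
/// Equivalent continuous level (Leq) from equally long, consecutive level readings:
/// average the energy, not the decibels.
fn leq_db(levels_db: &[f64]) -> f64 {
    let mean_energy: f64 =
        levels_db.iter().map(|l| 10f64.powf(l / 10.0)).sum::<f64>() / levels_db.len() as f64;
    10.0 * mean_energy.log10()
}
```

`leq_db(&[90.0, 100.0])` is about 97.4 dB, not 95: the louder half dominates the average.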

American work safety organizations give different recommendations when it comes
to maximum sound exposure:

OSHA:

Duration per day	Sound level (dBA)
8 hours	90
4 hours	95
2 hours	100
1 hour	105
30 minutes	110
15 minutes	115

NIOSH:

Duration per day	Sound level (dBA)
8 hours	85
4 hours	88
2 hours	91
1 hour	94
30 minutes	97
15 minutes	100
7.5 minutes	103
3.75 minutes	106
1.88 minutes	109
0.94 minutes	112
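Both tables follow a rule you can read off their rows. OSHA allows 8 hours at 90 dBA and halves the time for every extra 5 dB (its "exchange rate"). NIOSH starts at 85 dBA and halves the time every 3 dB. Here is a Rust sketch that reproduces both tables (function names are illustrative):

```rust
/// Permissible daily exposure (in hours) at `level_dba`, given the level allowed
/// for 8 hours (`criterion_dba`) and the exchange rate (dB per halving of time).
fn permissible_hours(level_dba: f64, criterion_dba: f64, exchange_rate_db: f64) -> f64 {
    8.0 / 2f64.powf((level_dba - criterion_dba) / exchange_rate_db)
}

fn osha_hours(level_dba: f64) -> f64 {
    permissible_hours(level_dba, 90.0, 5.0)
}

fn niosh_hours(level_dba: f64) -> f64 {
    permissible_hours(level_dba, 85.0, 3.0)
}
```

`niosh_hours(112.0) * 60.0` gives 0.9375 minutes, the 0.94 in the last row, and `osha_hours(115.0)` gives a quarter of an hour.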

These tables use $\text{dB}_{\text{A}}$, and so does the Apple Watch Noise app.

Now that we know how all these units fit together, we can all be fun at the next party.

Projesh Kar