Audio Coding Fundamentals

Vicente González Ruiz

November 2, 2014

Contents

1 How big is audio data?
2 How to reduce the bit-rate?
3 What is an audio codec (COder/DECoder)?
4 Typical encoder steps
5 Overlapped processing?
6 The MDCT (Modified Discrete Cosine Transform)
7 SAM (pSycho Acoustic Model) of the HAS (Human Auditory System)
 7.1 ATH (Absolute Threshold of Hearing) model [1]
 7.2 Frequency resolution and simultaneous masking
 7.3 Temporal masking
 7.4 Channel coupling
8 Quantization
9 Entropy Coding

1 How big is audio data?

  1. Mobile: up to 13 Kbps.
  2. (Terrestrial) telephony: 64 Kbps.
  3. CD quality: 1.411 Mbps.
  4. E-AC-3 (Dolby Digital Plus): up to 6.144 Mbps.
  5. DTS: up to 1509.75 Kbps.
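
The bit-rate of uncompressed PCM audio is the product of the sampling rate, the bits per sample and the number of channels. A minimal sketch of that arithmetic follows; the function name is illustrative, and the parameters assumed here are the usual ones for CD audio (44.1 kHz, 16 bits, stereo) and for telephony (8 kHz, 8 bits, mono).

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Return the bit-rate (bits per second) of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


# CD audio: 44.1 kHz, 16 bits/sample, 2 channels.
cd_bps = pcm_bit_rate(44_100, 16, 2)
# Telephony: 8 kHz, 8 bits/sample, 1 channel.
phone_bps = pcm_bit_rate(8_000, 8, 1)

print(f"CD quality: {cd_bps / 1e6:.4f} Mbps")
print(f"Telephony:  {phone_bps / 1e3:.0f} Kbps")
```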

2 How to reduce the bit-rate?

  1. Lowering the sampling rate (less bandwidth).
  2. Lowering the number of channels.
  3. Lowering the bits/sample (coarser quantization).
  4. Using audio compression.
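
Items 2 and 3 can be sketched directly on integer PCM samples. The function names below are hypothetical, and dropping the least significant bits is only one crude way of lowering the bits/sample. Lowering the sampling rate is left out because it would also need a low-pass (anti-aliasing) filter.

```python
def downmix_to_mono(left, right):
    """Halve the number of channels by averaging left and right."""
    return [(l + r) // 2 for l, r in zip(left, right)]


def requantize(samples, in_bits=16, out_bits=8):
    """Lower the bits/sample by discarding the least significant bits."""
    shift = in_bits - out_bits
    return [s >> shift for s in samples]


def restore(samples, in_bits=16, out_bits=8):
    """Scale coarse samples back to the original amplitude range."""
    shift = in_bits - out_bits
    return [s << shift for s in samples]


original = [0, 1000, -1000, 32767, -32768]   # 16-bit samples
coarse = requantize(original)                # 8-bit samples
approx = restore(coarse)                     # back on the 16-bit scale
errors = [o - a for o, a in zip(original, approx)]
print(coarse, errors)
```

Each restored sample differs from the original by less than 2^8 = 256, which is the price paid for halving the bits/sample.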

3 What is an audio codec (COder/DECoder)?

PCM   +---------+        +---------+ PCM  
----->| Encoder |------->| Decoder |----->  
audio +---------+ stream +---------+ audio’  
 
              audio != audio’  
                (usually)

4 Typical encoder steps

  1. Overlapped subband analysis, usually with the MDCT (Modified Discrete Cosine Transform). Goes from the time domain to a frequency domain.
  2. Quantization. Basically, it removes low-amplitude pure tones, while also taking into account the SAM (pSycho Acoustic Model) of the HAS (Human Auditory System). Noise tends to be of low power!
  3. Entropy coding. Compresses the data, usually with Huffman or arithmetic coding.

5 Overlapped processing?

0              N-1            2N-1            3N-1  
+---------------+---------------+---------------+ s[n]  
<--------Transform Step--------->  
                <---------Transform Step-------->
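
A minimal sketch of the block layout in the diagram, assuming that each transform step spans 2N samples and that consecutive steps start N samples apart (50% overlap), as the diagram suggests. The function name is illustrative.

```python
def overlapped_blocks(s, N):
    """Yield blocks of 2N samples whose starting point advances by N samples."""
    for start in range(0, len(s) - 2 * N + 1, N):
        yield s[start:start + 2 * N]


N = 4
s = list(range(3 * N))            # stand-in for the samples s[0] ... s[3N-1]
blocks = list(overlapped_blocks(s, N))
for b in blocks:
    print(b)
```

With 3N samples there are two transform steps: the first covers s[0] to s[2N-1] and the second covers s[N] to s[3N-1], so the middle N samples are processed twice.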

6 The MDCT (Modified Discrete Cosine Transform)
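
The MDCT maps a block of 2N samples to N coefficients, X[k] = sum over n = 0 ... 2N-1 of x[n] cos[(pi/N)(n + 1/2 + N/2)(k + 1/2)], for k = 0 ... N-1. A single block cannot be inverted on its own, but overlapping blocks by N samples and adding the inverse transforms cancels the time-domain aliasing. The sketch below is a direct, slow O(N^2) implementation in plain Python. The sine window and the 2/N scaling of the inverse are assumptions chosen so that overlap-add reconstructs the signal; real codecs use fast algorithms and may use other windows.

```python
import math


def sine_window(N):
    """Window of length 2N satisfying w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def mdct(block):
    """Forward MDCT: 2N input samples -> N coefficients."""
    N = len(block) // 2
    return [sum(block[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples."""
    N = len(coeffs)
    return [(2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                            for k in range(N))
            for n in range(2 * N)]


def analysis_synthesis(s, N):
    """Window + MDCT each 2N-sample block (hop N), then window + overlap-add."""
    w = sine_window(N)
    out = [0.0] * len(s)
    for start in range(0, len(s) - 2 * N + 1, N):
        block = [w[n] * s[start + n] for n in range(2 * N)]
        y = imdct(mdct(block))
        for n in range(2 * N):
            out[start + n] += w[n] * y[n]
    return out


N = 8
s = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(4 * N)]
r = analysis_synthesis(s, N)
# Only samples covered by two overlapping blocks are fully reconstructed.
max_err = max(abs(a - b) for a, b in zip(s[N:3 * N], r[N:3 * N]))
print(max_err)
```

The first and last N samples are covered by a single block only, so they still contain aliasing; an encoder normally pads the signal to avoid this.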

7 SAM (pSycho Acoustic Model) of the HAS (Human Auditory System)

7.1 ATH (Absolute Threshold of Hearing) model [1]

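
The approximation usually attributed to Terhardt [1] gives the threshold in quiet, in dB SPL, as a function of the frequency f in kHz: 3.64 f^(-0.8) - 6.5 exp(-0.6 (f - 3.3)^2) + 10^(-3) f^4. A minimal sketch that evaluates it follows; the function name and the sample frequencies are illustrative.

```python
import math


def ath_db_spl(f_hz):
    """Approximate absolute threshold of hearing (dB SPL) at f_hz hertz."""
    f = f_hz / 1000.0  # the formula expects kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


for f_hz in (100, 1000, 3300, 10000, 16000):
    print(f"{f_hz:6d} Hz: {ath_db_spl(f_hz):7.2f} dB SPL")
```

The curve has its minimum around 3 to 4 kHz, where the ear is most sensitive, and rises steeply towards both ends of the audible band. A component that falls below this curve does not need to be coded.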

7.2 Frequency resolution and simultaneous masking

7.3 Temporal masking

7.4 Channel coupling

8 Quantization
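
A minimal sketch of a uniform scalar quantizer applied to transform coefficients. The coefficient values and the step size are made up for illustration; in a real encoder the step size of each subband would be driven by the psychoacoustic model, which this sketch does not attempt.

```python
def quantize(coeffs, step):
    """Map each coefficient to the index of the nearest multiple of step."""
    return [round(c / step) for c in coeffs]


def dequantize(indexes, step):
    """Recover the approximate coefficients from the quantization indexes."""
    return [q * step for q in indexes]


coeffs = [10.3, -0.4, 0.2, -7.8, 0.05, 3.1]   # hypothetical coefficients
step = 2.0
indexes = quantize(coeffs, step)
approx = dequantize(indexes, step)
print(indexes)
print(approx)
```

Coefficients whose magnitude is below half the step become zero, which is how low-amplitude components are removed. The reconstruction error of every coefficient is at most step/2.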

9 Entropy Coding
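
A minimal sketch of Huffman coding applied to quantization indexes, using only the Python standard library. The input sequence is made up for illustration, and real codecs typically use predefined code tables rather than building a code per signal.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return a dict mapping each symbol to its Huffman bit string."""
    freq = Counter(symbols)
    if len(freq) == 1:                      # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial code}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)     # two least frequent subtrees
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, tie, merged))
        tie += 1
    return heap[0][2]


indexes = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, -1, -1, 2, 5, -4]
code = huffman_code(indexes)
bits = "".join(code[i] for i in indexes)
print(code)
print(len(bits), "bits, versus", 3 * len(indexes), "with a fixed 3-bit code")
```

Frequent symbols (here the zeros produced by quantization) receive short codewords, so the total length drops below that of a fixed-length code.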

References

[1]   E. Terhardt. Calculating virtual pitch. Hearing Res., 1:155–182, 1979.