Convert mu-law audio data to PCM format with a different sampling rate


In many cases, we need to convert audio between formats for different applications.
Mu-law is a popular format that saves space by storing each sample in 8 bits, while PCM is linear and can be consumed directly by most applications.
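
To make the space saving concrete: a mu-law byte holds a sign, a 3-bit segment number, and a 4-bit position within that segment, and decoding expands it back to a 16-bit linear value. Below is a minimal pure-Python sketch of the standard G.711 mu-law expansion. The function name `ulaw_byte_to_linear` is made up for illustration; in practice a library routine does this step for us.

```python
def ulaw_byte_to_linear(u: int) -> int:
    """Expand one 8-bit mu-law code into a signed 16-bit linear sample."""
    u = ~u & 0xFF                  # mu-law bytes are stored bit-inverted
    sign = u & 0x80                # top bit: sign
    exponent = (u >> 4) & 0x07     # next 3 bits: segment (power of two)
    mantissa = u & 0x0F            # low 4 bits: position within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


# Print the decoded value for the quietest and loudest codes.
for code in (0xFF, 0x7F, 0x80, 0x00):
    print(hex(code), ulaw_byte_to_linear(code))
```

The step size doubles with each segment, so quiet samples are stored with fine resolution and loud samples with coarse resolution.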

We may also need to account for different sampling rates between the source audio and the target application.

One such application is the OpenAI Realtime API, which usually takes PCM audio at a 24 kHz sampling rate.

Below is an example showing how to complete the process by converting mu-law (8 kHz) data to 16-bit linear PCM, resampling it to 24 kHz, and finally encoding it as a base64 string:

# Note: audioop is deprecated since Python 3.11 and removed in Python 3.13.
import audioop
import base64

# Path to your raw (headerless) mu-law file
file_path = "some_mu-law-file"
with open(file_path, "rb") as f:
    mulaw_data = f.read()

# 1. Convert mu-law data to 16-bit PCM at 8 kHz (2 bytes per sample).
pcm_8k = audioop.ulaw2lin(mulaw_data, 2)

# 2. Resample from 8 kHz up to 24 kHz (1 channel, 16 bits per sample).
# audioop.ratecv returns a tuple (converted_data, state).
pcm_24k, _ = audioop.ratecv(pcm_8k, 2, 1, 8000, 24000, None)

# 3. Convert the 24 kHz PCM data to a base64 encoded string.
pcm_24k_base64 = base64.b64encode(pcm_24k).decode("utf-8")
# pcm_24k_base64 now holds the base64-encoded PCM data at 24 kHz.

Explanation:

  • ulaw2lin(data, width): Converts mu-law data to linear PCM with the specified sample width (in bytes). Here width=2 (16-bit).

  • ratecv(data, width, channels, in_rate, out_rate, state): Resamples linear PCM to a new sample rate. It requires the current sample width, number of channels, input rate, and output rate. Pass None as the state on the first call; when converting audio chunk by chunk, pass the state returned by the previous call to the next one.

  • b64encode(...): Encodes the binary data in Base64 and returns bytes. The .decode("utf-8") step converts those bytes into a regular Python string (str).
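
To sanity-check the converted audio by ear, we can wrap the raw 24 kHz PCM bytes in a WAV container using the standard-library `wave` module. This is only a minimal sketch and not part of the conversion itself: the helper name `pcm16_to_wav_bytes` is hypothetical, and the one second of silence stands in for the real `pcm_24k` buffer.

```python
import io
import wave


def pcm16_to_wav_bytes(pcm: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap raw mono 16-bit PCM in a WAV container and return the file bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# Placeholder input: one second of silence at 24 kHz (assumption for the demo).
demo_pcm = b"\x00\x00" * 24000
wav_bytes = pcm16_to_wav_bytes(demo_pcm)

# Read the header back to confirm channels, sample width, rate, and frame count.
with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
    print(wav.getnchannels(), wav.getsampwidth(), wav.getframerate(), wav.getnframes())
```

Writing `wav_bytes` to a file with a .wav extension should give something any audio player can open. If the playback sounds too fast or too slow, the sample rate passed here probably does not match the rate of the PCM data.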


Author: robot learner
Reprint policy: Unless otherwise stated, all articles in this blog are licensed under CC BY 4.0. If you reproduce this article, please credit the source: robot learner.