R – Decoding ima4 audio format

file-formatima4iphone

To reduce the download size of an iPhone application I'm compressing some audio files. Specifically I'm using afconvert on the command line to change .wav format to .caf format w/ ima4 compression.

I've read this (wooji-juice.com) awesome post about this exact topic. I'm having trouble w/ the "decoding ima4 packets" step. I've looked at their sample code and I'm stuck. Please help w/ some pseudo code or sample code that can guide me in the right direction.

Thanks!

Additional info:
Here is what I've completed and where I'm having trouble…
I can play .wav files in both the simulator and on the phone.
I can compress .wav files to .caf w/ ima4 compression using afconvert on the command line. I'm using the SoundEngine that came w/ CrashLanding (I fixed one memory leak).
I modified the SoundEngine code to look for the mFormatID 'ima4'.

I don't understand the blog post linked above starting w/ "Calculating the size of the unpacked data". Why do I need to do this? Also, what does the term "packet" refer to? I'm very new to any sort of audio programming.

Best Answer

After gathering all the data from Wooji-Juice, Multimedia Wiki and Apple, here is my proposal (may need some experiment):

File structure

  • Apple IMA4 file are made of packet of 34 bytes. This is the packet unit used to build the file.
  • Each 34 bytes packet has two parts:
    • the first 2 bytes contain the preamble: an initial predictor and a step index
    • the 32 bytes left contain the sound nibbles (a nibble of 4 bits is used to retrieve a 16 bits sample)
  • Each packet has 32 bytes of compressed data, that represent 64 samples of 16 bits.
  • If the sound file is stereo, the packets are interleaved (one for the left, one for the right); there must be an even number of packets.

Decoding

Each packet of 34 bytes will lead to the decompression of 64 samples of 16 bits. So the size of the uncompressed data is 128 bytes per packet.

The decoding pseudo code looks like:

int[] ima_index_table = ... // Index table from [Multimedia Wiki][2]
int[] step_table = ... // Step table from [Multimedia Wiki][2]
byte[] packet = ... // A packet of 34 bytes compressed
short[] output = ... // The output buffer of 128 bytes
int preamble = (packet[0] << 8) | packet[1];
int predictor = preamble && 0xFF80; // See [Multimedia Wiki][2]
int step_index = preamble && 0x007F; // See [Multimedia Wiki][2]
int i;
int j = 0;
for(i = 2; i < 34; i++) {
    byte data = packet[i];
    int lower_nibble = data && 0x0F;
    int upper_nibble = (data && 0xF0) >> 4;

    // Decode the lower nibble
    step_index += ima_index_table[lower_nibble];
    diff = ((signed)nibble + 0.5f) * step / 4;
    predictor += diff;
    step = ima_step_table[step index];

    // Clamp the predictor value to stay in range
    if (predictor > 65535)
        output[j++] = 65535;
    else if (predictor < -65536)
        output[j++] = -65536;
    else
        output[j++] = (short) predictor;

    // Decode the uppper nibble
    step_index += ima_index_table[upper_nibble];
    diff = ((signed)nibble + 0.5f) * step / 4;
    predictor += diff;
    step = ima_step_table[step index];

    // Clamp the predictor value to stay in range
    if (predictor > 65535)
        output[j++] = 65535;
    else if (predictor < -65536)
        output[j++] = -65536;
    else
        output[j++] = (short) predictor;
}
Related Topic