iOS – Encoding PCM (CMSampleBufferRef) to AAC on iOS – How to set frequency and bitrate

aac, audio, audiotoolbox, core-audio, ios

I want to encode PCM (CMSampleBufferRefs coming in live from AVCaptureAudioDataOutputSampleBufferDelegate) into AAC.

When the first CMSampleBufferRef arrives, I set up both the input and output AudioStreamBasicDescriptions, the output one according to the documentation:

AudioStreamBasicDescription inAudioStreamBasicDescription = *CMAudioFormatDescriptionGetStreamBasicDescription((CMAudioFormatDescriptionRef)CMSampleBufferGetFormatDescription(sampleBuffer));

AudioStreamBasicDescription outAudioStreamBasicDescription = {0}; // Always initialize the fields of a new ASBD to zero.
outAudioStreamBasicDescription.mSampleRate = 44100; // Frames per second of the equivalent decompressed data; must be nonzero.
outAudioStreamBasicDescription.mFormatID = kAudioFormatMPEG4AAC; // kAudioFormatMPEG4AAC_HE does not work: no AudioClassDescription is found, even with mFormatFlags set to 0.
outAudioStreamBasicDescription.mFormatFlags = kMPEG4Object_AAC_SSR; // Format-specific flags; 0 would mean no flags.
outAudioStreamBasicDescription.mBytesPerPacket = 0; // 0 indicates variable packet size.
outAudioStreamBasicDescription.mFramesPerPacket = 1024; // AAC uses 1024 frames per packet.
outAudioStreamBasicDescription.mBytesPerFrame = 0; // 0 for compressed formats.
outAudioStreamBasicDescription.mChannelsPerFrame = 1; // Must be nonzero.
outAudioStreamBasicDescription.mBitsPerChannel = 0; // 0 for compressed formats.
outAudioStreamBasicDescription.mReserved = 0; // Pads the structure to 8-byte alignment; must be 0.

and AudioConverterRef.

AudioClassDescription audioClassDescription;
memset(&audioClassDescription, 0, sizeof(audioClassDescription));
UInt32 size;
NSAssert(AudioFormatGetPropertyInfo(kAudioFormatProperty_Encoders, sizeof(outAudioStreamBasicDescription.mFormatID), &outAudioStreamBasicDescription.mFormatID, &size) == noErr, nil);
uint32_t count = size / sizeof(AudioClassDescription);
AudioClassDescription descriptions[count];
NSAssert(AudioFormatGetProperty(kAudioFormatProperty_Encoders, sizeof(outAudioStreamBasicDescription.mFormatID), &outAudioStreamBasicDescription.mFormatID, &size, descriptions) == noErr, nil);
for (uint32_t i = 0; i < count; i++) {

    if ((outAudioStreamBasicDescription.mFormatID == descriptions[i].mSubType) && (kAppleSoftwareAudioCodecManufacturer == descriptions[i].mManufacturer)) {

        memcpy(&audioClassDescription, &descriptions[i], sizeof(audioClassDescription));

    }
}
NSAssert(audioClassDescription.mSubType == outAudioStreamBasicDescription.mFormatID && audioClassDescription.mManufacturer == kAppleSoftwareAudioCodecManufacturer, nil);
AudioConverterRef audioConverter = NULL;
NSAssert(AudioConverterNewSpecific(&inAudioStreamBasicDescription, &outAudioStreamBasicDescription, 1, &audioClassDescription, &audioConverter) == noErr, nil);

And then, I convert every CMSampleBufferRef into raw AAC data.

AudioBufferList inAudioBufferList;
CMBlockBufferRef blockBuffer;
CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(sampleBuffer, NULL, &inAudioBufferList, sizeof(inAudioBufferList), NULL, NULL, 0, &blockBuffer);
NSAssert(inAudioBufferList.mNumberBuffers == 1, nil);

uint32_t bufferSize = inAudioBufferList.mBuffers[0].mDataByteSize;
uint8_t *buffer = (uint8_t *)malloc(bufferSize);
memset(buffer, 0, bufferSize);
AudioBufferList outAudioBufferList;
outAudioBufferList.mNumberBuffers = 1;
outAudioBufferList.mBuffers[0].mNumberChannels = inAudioBufferList.mBuffers[0].mNumberChannels;
outAudioBufferList.mBuffers[0].mDataByteSize = bufferSize;
outAudioBufferList.mBuffers[0].mData = buffer;

UInt32 ioOutputDataPacketSize = 1;

NSAssert(AudioConverterFillComplexBuffer(audioConverter, inInputDataProc, &inAudioBufferList, &ioOutputDataPacketSize, &outAudioBufferList, NULL) == noErr, nil);

NSData *data = [NSData dataWithBytes:outAudioBufferList.mBuffers[0].mData length:outAudioBufferList.mBuffers[0].mDataByteSize];

free(buffer);
CFRelease(blockBuffer);

inInputDataProc() implementation:

OSStatus inInputDataProc(AudioConverterRef inAudioConverter, UInt32 *ioNumberDataPackets, AudioBufferList *ioData, AudioStreamPacketDescription **outDataPacketDescription, void *inUserData)
{
    AudioBufferList audioBufferList = *(AudioBufferList *)inUserData;

    ioData->mBuffers[0].mData = audioBufferList.mBuffers[0].mData;
    ioData->mBuffers[0].mDataByteSize = audioBufferList.mBuffers[0].mDataByteSize;

    return noErr;
}

Now data holds my raw AAC, which I wrap into an ADTS frame with a proper ADTS header; a sequence of these ADTS frames is a playable AAC document.
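
For illustration, an ADTS header for this configuration (AAC-LC, 44.1 kHz, mono, no CRC) could be built roughly like the sketch below; fillADTSHeader is just an illustrative name, not an API, and the frequency index and channel configuration are values matching the outAudioStreamBasicDescription above.

#include <stddef.h>
#include <stdint.h>

static void fillADTSHeader(uint8_t header[7], size_t aacPacketLength)
{
    const int profile = 1;       // AAC LC (audio object type 2, stored as object type - 1)
    const int freqIndex = 4;     // 44100 Hz; must match outAudioStreamBasicDescription.mSampleRate
    const int channelConfig = 1; // mono; must match mChannelsPerFrame
    const size_t frameLength = aacPacketLength + 7; // raw AAC packet + 7-byte header, no CRC

    header[0] = 0xFF;                                         // syncword (high 8 bits)
    header[1] = 0xF1;                                         // syncword (low 4 bits), MPEG-4, layer 0, no CRC
    header[2] = (uint8_t)((profile << 6) | (freqIndex << 2) | (channelConfig >> 2));
    header[3] = (uint8_t)(((channelConfig & 0x3) << 6) | ((frameLength >> 11) & 0x3));
    header[4] = (uint8_t)((frameLength >> 3) & 0xFF);
    header[5] = (uint8_t)(((frameLength & 0x7) << 5) | 0x1F); // frame length (low 3 bits) + buffer fullness (high 5 bits)
    header[6] = 0xFC;                                         // buffer fullness (low 6 bits), 1 raw data block per frame
}

Each ADTS frame is the 7-byte header followed by the bytes in data, and the frames are simply concatenated into the output stream.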

But I don't understand this code as well as I would like to. Generally, I don't understand audio very well… I just wrote it somehow, following blogs, forums, and docs, over quite a lot of time, and now it works, but I don't know why, or how to change some of its parameters. So here are my questions:

  1. I need to use this converter while the HW encoder is occupied (by AVAssetWriter). This is why I create a SW converter via AudioConverterNewSpecific() and not AudioConverterNew(). But now setting outAudioStreamBasicDescription.mFormatID = kAudioFormatMPEG4AAC_HE; does not work: no AudioClassDescription is found, even if mFormatFlags is set to 0. What am I losing by using kAudioFormatMPEG4AAC (kMPEG4Object_AAC_SSR) instead of kAudioFormatMPEG4AAC_HE? What should I use for a live stream, kMPEG4Object_AAC_SSR or kMPEG4Object_AAC_Main?

  2. How do I change the sample rate properly? If I set outAudioStreamBasicDescription.mSampleRate to 22050 or 8000, for example, the audio playback sounds slowed down. I set the sampling frequency index in the ADTS header to the same frequency as outAudioStreamBasicDescription.mSampleRate.

  3. How do I change the bitrate? ffmpeg -i shows this info for the produced AAC:
    Stream #0:0: Audio: aac, 44100 Hz, mono, fltp, 64 kb/s
    How do I change it to 16 kbps, for example? The bitrate decreases as I decrease the frequency, but I believe that is not the only way? And decreasing the frequency damages playback anyway, as I mention in 2.

  4. How do I calculate the size of the buffer? For now I set it to uint32_t bufferSize = inAudioBufferList.mBuffers[0].mDataByteSize; as I believe the compressed data won't be larger than the uncompressed data… But isn't that unnecessarily large?

  5. How do I set ioOutputDataPacketSize properly? If I'm reading the documentation right, I should set it as UInt32 ioOutputDataPacketSize = bufferSize / outAudioStreamBasicDescription.mBytesPerPacket; but mBytesPerPacket is 0. If I set it to 0, AudioConverterFillComplexBuffer() returns an error. If I set it to 1, it works, but I don't know why…

  6. In inInputDataProc() there are 3 "out" parameters. I set just ioData. Should I also set ioNumberDataPackets and outDataPacketDescription? Why and how?

Best Answer

You may need to change the sample rate of the raw audio data by using a resampling audio unit before feeding the audio to the AAC converter. Otherwise there will be a mismatch between the AAC header and the audio data.
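
A rough, untested sketch of that idea, using a plain PCM-to-PCM AudioConverter instead of a resampling audio unit (the 8000 Hz target is just an example value):

#import <AudioToolbox/AudioToolbox.h>

// Source format: the PCM coming from the capture session (e.g. 44100 Hz).
AudioStreamBasicDescription pcmIn = inAudioStreamBasicDescription;

// Destination format: the same PCM layout at the rate you actually want to encode.
AudioStreamBasicDescription pcmOut = pcmIn;
pcmOut.mSampleRate = 8000;

AudioConverterRef resampler = NULL;
NSAssert(AudioConverterNew(&pcmIn, &pcmOut, &resampler) == noErr, nil);

// Drive `resampler` with AudioConverterFillComplexBuffer (the same callback pattern
// as inInputDataProc in the question), then feed its 8000 Hz output to the AAC
// converter whose outAudioStreamBasicDescription.mSampleRate is also 8000, and use
// the matching sampling frequency index (11 for 8000 Hz) in the ADTS header.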
