Ios – Hardware accelerated h.264 decoding to texture, overlay or similar in iOS

h.264iosiphoneopengl-es

Is it possible, and supported, to use the iOS hardware accelerated h.264 decoding API to decode a local (not streamed) video file, and then compose other objects on top of it?

I would like to make an application that involves drawing graphical objects in front of a video, and use the playback timer to synchronize what I am drawing on top, to what is being played on the video. Then, based on the user's actions, change what I am drawing on top (but not the video)

Coming from DirectX, OpenGL and OpenGL ES for Android, I am picturing something like rendering the video to a texture, and using that texture to draw a full screen quad, then use other sprites to draw the rest of the objects; or maybe writing an intermediate filter just before the renderer, so I can manipulate the individual output frames and draw my stuff; or maybe drawing to a 2D layer on top of the video.

It seems like AV Foundation, or Core Media may help me do what I am doing, but before I dig into the details, I would like to know if it is possible at all to do what I want to do, and what are my main routes to approach the problem.

Please refrain from "this is too advanced for you, try hello world first" answers. I know my stuff, and just want to know if what I want to do is possible (and most importantly, supported, so the app won't get eventually rejected), before I study the details by myself.

edit:

I am not knowledgeable in iOS development, but professionally do DirectX, OpenGL and OpenGL ES for Android. I am considering making an iOS version of an Android application I currently have, and I just want to know if this is possible. If so, I have enough time to start iOS development from scratch, up to doing what I want to do. If it is not possible, then I will just not invest time studying the entire platform at this time.

Therefore, this is a technical feasibility question. I am not requesting code. I am looking for answers of the type "Yes, you can do that. Just use A and B, use C to render into D and draw your stuff with E", or "No, you can't. The hardware accelerated decoding is not available for third-party applications" (which is what a friend told me). Just this, and I'll be on my way.

I have read the overview for the video technologies in page 32 of the ios technology overview. It pretty much says that I can use Media Player for the most simple playback functionality (not what I'm looking for), UIKit for embedding videos with a little more control over the embedding, but not over the actual playback (not what I'm looking for), AVFoundation for more control over playback (maybe what I need, but most of the resources I find online talk about how to use the camera), or Core Media to have full low-level control over video (probably what I need, but extremely poorly documented, and even more lacking in resources on playback than even AVFoundation).

I am concerned that I may dedicate the next six months to learn iOS programming full time, only to find at the end that the relevant API is not available for third party developers, and what I want to do is unacceptable for iTunes store deployment. This is what my friend told me, but I can't seem to find anything relevant in the app development guidelines. Therefore, I came here to ask people who have more experience in this area, whether or not what I want to do is possible. No more.

I consider this a valid high level question, which can be misunderstood as an I-didn't-do-my-homework-plz-give-me-teh-codez question. If my judgement in here was mistaken, feel free to delete, or downvote this question to your heart's contempt.

Best Answer

Yes, you can do this, and I think your question was specific enough to belong here. You're not the only one who has wanted to do this, and it does take a little digging to figure out what you can and can't do.

AV Foundation lets you do hardware-accelerated decoding of H.264 videos using an AVAssetReader, at which point you're handed the raw decoded frames of video in BGRA format. These can be uploaded to a texture using either glTexImage2D() or the more efficient texture caches in iOS 5.0. From there, you can process for display or retrieve the frames from OpenGL ES and use an AVAssetWriter to perform hardware-accelerated H.264 encoding of the result. All of this uses public APIs, so at no point do you get anywhere near something that would lead to a rejection from the App Store.

However, you don't have to roll your own implementation of this. My BSD-licensed open source framework GPUImage encapsulates these operations and handles all of this for you. You create a GPUImageMovie instance for your input H.264 movie, attach filters onto it (such as overlay blends or chroma keying operations), and then attach these filters to a GPUImageView for display and/or a GPUImageMovieWriter to re-encode an H.264 movie from the processed video.

The one issue I currently have is that I don't obey the timestamps in the video for playback, so frames are processed as quickly as they are decoded from the movie. For filtering and re-encoding of a video, this isn't a problem, because the timestamps are passed through to the recorder, but for direct display to the screen this means that the video can be sped up by as much as 2-4X. I'd welcome any contributions that would let you synchronize the playback rate to the actual video timestamps.

I can currently play back, filter, and re-encode 640x480 video at well over 30 FPS on an iPhone 4 and 720p video at ~20-25 FPS, with the iPhone 4S being capable of 1080p filtering and encoding at significantly higher than 30 FPS. Some of the more expensive filters can tax the GPU and slow this down a bit, but most filters operate in these framerate ranges.

If you want, you can examine the GPUImageMovie class to see how it does this uploading to OpenGL ES, but the relevant code is as follows:

- (void)startProcessing;
{
    NSDictionary *inputOptions = [NSDictionary dictionaryWithObject:[NSNumber numberWithBool:YES] forKey:AVURLAssetPreferPreciseDurationAndTimingKey];
    AVURLAsset *inputAsset = [[AVURLAsset alloc] initWithURL:self.url options:inputOptions];

    [inputAsset loadValuesAsynchronouslyForKeys:[NSArray arrayWithObject:@"tracks"] completionHandler: ^{
        NSError *error = nil;
        AVKeyValueStatus tracksStatus = [inputAsset statusOfValueForKey:@"tracks" error:&error];
        if (!tracksStatus == AVKeyValueStatusLoaded) 
        {
            return;
        }
        reader = [AVAssetReader assetReaderWithAsset:inputAsset error:&error];

        NSMutableDictionary *outputSettings = [NSMutableDictionary dictionary];
        [outputSettings setObject: [NSNumber numberWithInt:kCVPixelFormatType_32BGRA]  forKey: (NSString*)kCVPixelBufferPixelFormatTypeKey];
        // Maybe set alwaysCopiesSampleData to NO on iOS 5.0 for faster video decoding
        AVAssetReaderTrackOutput *readerVideoTrackOutput = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:[[inputAsset tracksWithMediaType:AVMediaTypeVideo] objectAtIndex:0] outputSettings:outputSettings];
        [reader addOutput:readerVideoTrackOutput];

        NSArray *audioTracks = [inputAsset tracksWithMediaType:AVMediaTypeAudio];
        BOOL shouldRecordAudioTrack = (([audioTracks count] > 0) && (self.audioEncodingTarget != nil) );
        AVAssetReaderTrackOutput *readerAudioTrackOutput = nil;

        if (shouldRecordAudioTrack)
        {            
            audioEncodingIsFinished = NO;

            // This might need to be extended to handle movies with more than one audio track
            AVAssetTrack* audioTrack = [audioTracks objectAtIndex:0];
            readerAudioTrackOutput = [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:audioTrack outputSettings:nil];
            [reader addOutput:readerAudioTrackOutput];
        }

        if ([reader startReading] == NO) 
        {
            NSLog(@"Error reading from file at URL: %@", self.url);
            return;
        }

        if (synchronizedMovieWriter != nil)
        {
            __unsafe_unretained GPUImageMovie *weakSelf = self;

            [synchronizedMovieWriter setVideoInputReadyCallback:^{
                [weakSelf readNextVideoFrameFromOutput:readerVideoTrackOutput];
            }];

            [synchronizedMovieWriter setAudioInputReadyCallback:^{
                [weakSelf readNextAudioSampleFromOutput:readerAudioTrackOutput];
            }];

            [synchronizedMovieWriter enableSynchronizationCallbacks];
        }
        else
        {
            while (reader.status == AVAssetReaderStatusReading) 
            {
                [self readNextVideoFrameFromOutput:readerVideoTrackOutput];

                if ( (shouldRecordAudioTrack) && (!audioEncodingIsFinished) )
                {
                    [self readNextAudioSampleFromOutput:readerAudioTrackOutput];
                }

            }            

            if (reader.status == AVAssetWriterStatusCompleted) {
                [self endProcessing];
            }
        }
    }];
}

- (void)readNextVideoFrameFromOutput:(AVAssetReaderTrackOutput *)readerVideoTrackOutput;
{
    if (reader.status == AVAssetReaderStatusReading)
    {
        CMSampleBufferRef sampleBufferRef = [readerVideoTrackOutput copyNextSampleBuffer];
        if (sampleBufferRef) 
        {
            runOnMainQueueWithoutDeadlocking(^{
                [self processMovieFrame:sampleBufferRef]; 
            });

            CMSampleBufferInvalidate(sampleBufferRef);
            CFRelease(sampleBufferRef);
        }
        else
        {
            videoEncodingIsFinished = YES;
            [self endProcessing];
        }
    }
    else if (synchronizedMovieWriter != nil)
    {
        if (reader.status == AVAssetWriterStatusCompleted) 
        {
            [self endProcessing];
        }
    }
}

- (void)processMovieFrame:(CMSampleBufferRef)movieSampleBuffer; 
{
    CMTime currentSampleTime = CMSampleBufferGetOutputPresentationTimeStamp(movieSampleBuffer);
    CVImageBufferRef movieFrame = CMSampleBufferGetImageBuffer(movieSampleBuffer);

    int bufferHeight = CVPixelBufferGetHeight(movieFrame);
    int bufferWidth = CVPixelBufferGetWidth(movieFrame);

    CFAbsoluteTime startTime = CFAbsoluteTimeGetCurrent();

    if ([GPUImageOpenGLESContext supportsFastTextureUpload])
    {
        CVPixelBufferLockBaseAddress(movieFrame, 0);

        [GPUImageOpenGLESContext useImageProcessingContext];
        CVOpenGLESTextureRef texture = NULL;
        CVReturn err = CVOpenGLESTextureCacheCreateTextureFromImage(kCFAllocatorDefault, coreVideoTextureCache, movieFrame, NULL, GL_TEXTURE_2D, GL_RGBA, bufferWidth, bufferHeight, GL_BGRA, GL_UNSIGNED_BYTE, 0, &texture);

        if (!texture || err) {
            NSLog(@"Movie CVOpenGLESTextureCacheCreateTextureFromImage failed (error: %d)", err);  
            return;
        }

        outputTexture = CVOpenGLESTextureGetName(texture);
        //        glBindTexture(CVOpenGLESTextureGetTarget(texture), outputTexture);
        glBindTexture(GL_TEXTURE_2D, outputTexture);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

        for (id<GPUImageInput> currentTarget in targets)
        {            
            NSInteger indexOfObject = [targets indexOfObject:currentTarget];
            NSInteger targetTextureIndex = [[targetTextureIndices objectAtIndex:indexOfObject] integerValue];

            [currentTarget setInputSize:CGSizeMake(bufferWidth, bufferHeight) atIndex:targetTextureIndex];
            [currentTarget setInputTexture:outputTexture atIndex:targetTextureIndex];

            [currentTarget newFrameReadyAtTime:currentSampleTime];
        }

        CVPixelBufferUnlockBaseAddress(movieFrame, 0);

        // Flush the CVOpenGLESTexture cache and release the texture
        CVOpenGLESTextureCacheFlush(coreVideoTextureCache, 0);
        CFRelease(texture);
        outputTexture = 0;        
    }
    else
    {
        // Upload to texture
        CVPixelBufferLockBaseAddress(movieFrame, 0);

        glBindTexture(GL_TEXTURE_2D, outputTexture);
        // Using BGRA extension to pull in video frame data directly
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, bufferWidth, bufferHeight, 0, GL_BGRA, GL_UNSIGNED_BYTE, CVPixelBufferGetBaseAddress(movieFrame));

        CGSize currentSize = CGSizeMake(bufferWidth, bufferHeight);
        for (id<GPUImageInput> currentTarget in targets)
        {
            NSInteger indexOfObject = [targets indexOfObject:currentTarget];
            NSInteger targetTextureIndex = [[targetTextureIndices objectAtIndex:indexOfObject] integerValue];

            [currentTarget setInputSize:currentSize atIndex:targetTextureIndex];
            [currentTarget newFrameReadyAtTime:currentSampleTime];
        }
        CVPixelBufferUnlockBaseAddress(movieFrame, 0);
    }

    if (_runBenchmark)
    {
        CFAbsoluteTime currentFrameTime = (CFAbsoluteTimeGetCurrent() - startTime);
        NSLog(@"Current frame time : %f ms", 1000.0 * currentFrameTime);
    }
}