C# – WebBrowser Control DocumentCompleted after iframe & JavaScript completion

c# .net web-scraping webbrowser-control winforms

I need to capture an image of generated HTML. I'm using Alex Filipovici's excellent solution from here: Convert HTML string to image. It works great except when the page contains an iframe whose content is loaded by JavaScript.

        static int width = 1024;
        static int height = 768;

        public static void Capture()
        {
            var html = @"
<!DOCTYPE html>
<meta http-equiv='X-UA-Compatible' content='IE=Edge'>
<html>
<iframe id='forecast_embed' type='text/html' frameborder='0' height='245' width='100%' src='http://forecast.io/embed/#lat=42.3583&lon=-71.0603&name=Downtown Boston'> </iframe>
</html>
";
            StartBrowser(html);
        }

        private static void StartBrowser(string source)
        {
            var th = new Thread(() =>
            {
                var webBrowser = new WebBrowser();
                webBrowser.Width = width;
                webBrowser.Height = height;
                webBrowser.ScrollBarsEnabled = false;
                webBrowser.DocumentCompleted += webBrowser_DocumentCompleted;
                webBrowser.DocumentText = source;
                Application.Run();
            });
            th.SetApartmentState(ApartmentState.STA);
            th.Start();
        }

        static void webBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            var webBrowser = (WebBrowser)sender;
            using (Bitmap bitmap = new Bitmap(width, height))
            {
                webBrowser.DrawToBitmap(bitmap, new System.Drawing.Rectangle(0, 0, width, height));
                bitmap.Save(@"image.jpg", System.Drawing.Imaging.ImageFormat.Jpeg);
            }
            Application.Exit();
        }

I understand that there is probably no definitive way to know when all the JavaScript on a page has finished running, given the vagaries of iframe loading and the fact that DocumentCompleted gets called as many times as there are frames/iframes, plus one. I can deal with the iframe loads using a counter or something similar. All I want is a reasonable delay, so that the JavaScript has loaded and I don't get an image with "Loading" in it, like this: http://imgur.com/FiFMTmm

Best Answer

If you're dealing with dynamic web pages that use frames and AJAX heavily, there is no perfect way to tell when a particular page has finished loading its resources. You can get close by doing the following two things:

  • handle the page's window.onload event;
  • then asynchronously poll the WebBrowser's Busy property, with a predefined, reasonably short time-out.

For example (see https://stackoverflow.com/a/19283143/1768303 for a complete example):

const int AJAX_DELAY = 2000; // non-deterministic wait for AJAX dynamic code
const int AJAX_DELAY_STEP = 500;

// wait until webBrowser.Busy == false or timed out
async Task<bool> AjaxDelay(CancellationToken ct, int timeout)
{
    using (var cts = CancellationTokenSource.CreateLinkedTokenSource(ct))
    {
        cts.CancelAfter(timeout);
        while (true)
        {
            try
            {
                await Task.Delay(AJAX_DELAY_STEP, cts.Token);
                // read the Busy property of the underlying ActiveX control via reflection
                var busy = (bool)this.webBrowser.ActiveXInstance.GetType().InvokeMember(
                    "Busy",
                    System.Reflection.BindingFlags.GetProperty,
                    null,
                    this.webBrowser.ActiveXInstance,
                    new object[] { });
                if (!busy)
                    return true;
            }
            catch (OperationCanceledException)
            {
                if (cts.IsCancellationRequested && !ct.IsCancellationRequested)
                    return false;
                throw;
            }
        }
    }
}
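
The snippet above covers only the second step. For the first step, handling window.onload, a minimal sketch might look like the following. The extension class and method names are hypothetical and not part of the original answer. It assumes the code runs on an STA thread with a message loop, and that the first DocumentCompleted arrives before window.onload fires.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;

// Hypothetical helper; the names are illustrative only.
static class WebBrowserLoadExtensions
{
    // Sets the document text and completes when the top-level window raises "onload".
    public static async Task LoadAndWaitForOnLoadAsync(
        this WebBrowser webBrowser, string html, CancellationToken ct)
    {
        var onloadTcs = new TaskCompletionSource<bool>();
        EventHandler onload = (s, e) => onloadTcs.TrySetResult(true);
        bool attached = false;

        WebBrowserDocumentCompletedEventHandler completed = (s, e) =>
        {
            // DocumentCompleted can fire once per frame; attach the DOM handler only once.
            if (attached)
                return;
            attached = true;
            webBrowser.Document.Window.AttachEventHandler("onload", onload);
        };

        webBrowser.DocumentCompleted += completed;
        try
        {
            using (ct.Register(() => onloadTcs.TrySetCanceled(), useSynchronizationContext: true))
            {
                webBrowser.DocumentText = html;
                await onloadTcs.Task; // throws if ct is cancelled first
            }
        }
        finally
        {
            webBrowser.DocumentCompleted -= completed;
            if (attached && webBrowser.Document != null && webBrowser.Document.Window != null)
                webBrowser.Document.Window.DetachEventHandler("onload", onload);
        }
    }
}
```

Once this task completes, you could await AjaxDelay(ct, AJAX_DELAY) and only then call DrawToBitmap, so that the capture happens after both onload and the Busy polling.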

If you don't want to use async/await, you can implement the same logic using a timer.
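
A timer-based version could be sketched roughly as below. The BusyPoller class and its callback are hypothetical names introduced for illustration. It also reads the public WebBrowser.IsBusy property instead of the reflection call used above, on the assumption that the two report the same state.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical timer-based counterpart of AjaxDelay; the names are illustrative only.
sealed class BusyPoller : IDisposable
{
    readonly WebBrowser webBrowser;
    readonly Timer timer = new Timer(); // System.Windows.Forms.Timer, ticks on the UI thread
    readonly int timeout;
    readonly Action<bool> done;         // true: browser went idle, false: timed out
    int elapsed;

    public BusyPoller(WebBrowser webBrowser, int step, int timeout, Action<bool> done)
    {
        this.webBrowser = webBrowser;
        this.timeout = timeout;
        this.done = done;
        timer.Interval = step;
        timer.Tick += OnTick;
    }

    public void Start()
    {
        elapsed = 0;
        timer.Start();
    }

    void OnTick(object sender, EventArgs e)
    {
        elapsed += timer.Interval;
        bool busy = webBrowser.IsBusy;
        if (!busy || elapsed >= timeout)
        {
            // stop polling and report whether the browser became idle
            timer.Stop();
            done(!busy);
        }
    }

    public void Dispose()
    {
        timer.Dispose();
    }
}
```

For example, new BusyPoller(webBrowser, 500, 2000, idle => { /* capture the bitmap here */ }).Start() would poll every 500 ms and give up after about 2 seconds, mirroring AJAX_DELAY_STEP and AJAX_DELAY.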