Below are the declared variables for the 3 request objects I use in my macros. The libraries they need and their late-binding equivalents are listed in comments:
Dim XMLHTTP As New MSXML2.XMLHTTP 'Microsoft XML, v6.0 'Set XMLHTTP = CreateObject("MSXML2.XMLHTTP.6.0")
Dim ServerXMLHTTP As New MSXML2.ServerXMLHTTP 'Microsoft XML, v6.0 'Set ServerXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP.6.0")
Dim http As New WinHttpRequest 'Microsoft WinHttp Services, version 5.1 'Set http = CreateObject("WinHttp.WinHttpRequest.5.1")
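For reference, all three objects expose the same basic Open / send / responseText pattern; a minimal synchronous GET looks like this (a sketch with a placeholder URL):

Sub BasicGetSketch()
    Dim XMLHTTP As New MSXML2.XMLHTTP ' swap in ServerXMLHTTP or WinHttpRequest the same way
    Dim html As String

    XMLHTTP.Open "GET", "http://www.example.com/", False ' False = synchronous
    XMLHTTP.send
    If XMLHTTP.Status = 200 Then html = XMLHTTP.responseText
    Debug.Print Len(html) & " chars received"
End Sub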
I have a few old web-scraping macros which used Internet Explorer automation. I wanted to clean up the code and speed it up with these requests.
Unfortunately, I have noticed that MSXML2.ServerXMLHTTP and WinHttpRequest are slower on an online store's 20-product test (34 and 35 sec) than IE automation with pictures and active scripting off (24 sec)! MSXML2.XMLHTTP executes in 18 sec. I have seen situations where one of these 3 requests is 2-3 times faster or slower than the others, so I always test which one performs best, but never before had any request lost to IE automation.
The main page with results is below. All results are on one page, 1500+ of them, so the request takes some time (6500 pages if pasted into MS Word):
www.justbats.com/products/bat type~baseball/?sortBy=TotalSales Descending&page=1&size=2400
Then I open individual links from the main results page:
http://www.justbats.com/product/2017-marucci-cat-7-bbcor-baseball-bat--mcbc7/24317/
I would like to know if these 3 requests are all the options I have to get data from websites without browser automation. Also, how can browser automation possibly beat some of these requests?
UPDATE
I have tested the main results page with the procedure provided in the answer by Robin Mackenzie, clearing the IE cache before running it. At least on this particular page, caching seemed to bring no clear gain, as subsequent requests yielded similar results. IE had active scripting disabled and no images loading.
IE automation method, Document length: 7593346 chars, Processed in: 8 seconds
WinHTTP method, Document length: 7824059 chars, Processed in: 29 seconds
XML HTTP method, Document length: 7830217 chars, Processed in: 4 seconds
Server XML HTTP method, Document length: 7823958 chars, Processed in: 26 seconds
URL download file method, Document length: 7830346 chars, Processed in: 7 seconds
Very surprising to me is the difference in the number of characters returned by these methods.
Best Answer
In addition to the methods you've mentioned, there are 2 other methods you can think about:
the CreateDocumentFromUrl method of the MSHTML.HTMLDocument object
URLDownloadToFileA
There are some other Windows APIs that I am ignoring, such as InternetOpen, InternetOpenUrl etc., as any potential performance gain would be outweighed by the complexity of guessing the response length, buffering the response, and so forth.
CreateDocumentFromUrl
With the CreateDocumentFromUrl method there is a problem with your sample website: it attempts to create an HTMLDocument in a frame, which is not allowed and fails with access-denied errors. So we should not use this method.
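For reference, the call pattern that triggers this looks roughly as follows (a sketch with a placeholder URL):

Sub CreateDocFromUrlSketch()
    Dim factory As New MSHTML.HTMLDocument
    Dim doc As MSHTML.HTMLDocument

    ' createDocumentFromUrl loads the page asynchronously into a new document
    Set doc = factory.createDocumentFromUrl("http://www.example.com/", vbNullString)
    Do Until doc.readyState = "complete" ' wait for the load to finish
        DoEvents
    Loop
    Debug.Print Len(doc.body.innerHTML) & " chars received"
End Sub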
URLDownloadToFileA
I thought you needed the PHP equivalent of file_get_contents and found this method. It is easily used (check this link) and out-performs the other methods on a large request (e.g. try it when you go for >2000 baseball bats). The XMLHTTP method also uses the URLMon library, so I guess this way just cuts out a bit of middle-man logic; the obvious downside is that you have to do some file-system handling.
With URLDownloadToFileA it takes me about 1-2 seconds to download your sample URL versus 4-5 seconds with the XMLHTTP method (full code below).
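A sketch of the declaration and a minimal wrapper (the placeholder URL and temp-file path are my own choices; 64-bit Office needs PtrSafe and LongPtr in the declaration):

Private Declare Function URLDownloadToFileA Lib "urlmon" _
    (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, _
     ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

Sub DownloadToFileSketch()
    Dim target As String
    target = Environ$("TEMP") & "\page.html" ' arbitrary temp file

    ' returns 0 (S_OK) on success
    If URLDownloadToFileA(0&, "http://www.example.com/", target, 0&, 0&) = 0 Then
        Debug.Print "Saved to " & target
    Else
        Debug.Print "Download failed"
    End If
End Sub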
Code
This includes all the methods discussed, i.e. IE automation, WinHttpRequest, XMLHTTP, ServerXMLHTTP, CreateDocumentFromUrl and URLDownloadToFile.
You need all these references in the project: Microsoft XML, v6.0; Microsoft WinHttp Services, version 5.1; Microsoft Internet Controls; Microsoft HTML Object Library.
Here it is:
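What follows is a condensed sketch of such a harness rather than a full listing: the URL is a placeholder, the Report helper is my own, and each method is timed against the same page so the output matches the format of the UPDATE above.

' 32-bit declaration; 64-bit Office needs PtrSafe and LongPtr
Private Declare Function URLDownloadToFileA Lib "urlmon" _
    (ByVal pCaller As Long, ByVal szURL As String, ByVal szFileName As String, _
     ByVal dwReserved As Long, ByVal lpfnCB As Long) As Long

Sub CompareMethods()
    Const URL As String = "http://www.example.com/" ' substitute the results-page URL
    Dim t As Single, html As String

    ' IE automation
    t = Timer
    Dim ie As New SHDocVw.InternetExplorer
    ie.navigate URL
    Do Until ie.readyState = READYSTATE_COMPLETE
        DoEvents
    Loop
    html = ie.document.body.innerHTML
    ie.Quit
    Report "IE automation method", html, t

    ' WinHTTP
    t = Timer
    Dim req As New WinHttpRequest
    req.Open "GET", URL, False
    req.send
    Report "WinHTTP method", req.responseText, t

    ' XML HTTP
    t = Timer
    Dim xhr As New MSXML2.XMLHTTP
    xhr.Open "GET", URL, False
    xhr.send
    Report "XML HTTP method", xhr.responseText, t

    ' Server XML HTTP
    t = Timer
    Dim sxhr As New MSXML2.ServerXMLHTTP
    sxhr.Open "GET", URL, False
    sxhr.send
    Report "Server XML HTTP method", sxhr.responseText, t

    ' URL download file
    t = Timer
    Dim target As String, f As Integer
    target = Environ$("TEMP") & "\page.html"
    If URLDownloadToFileA(0&, URL, target, 0&, 0&) = 0 Then
        f = FreeFile
        Open target For Binary As #f
        html = Space$(LOF(f)) ' size the buffer to the file, then read it back
        Get #f, , html
        Close #f
    End If
    Report "URL download file method", html, t
End Sub

Private Sub Report(method As String, doc As String, started As Single)
    Debug.Print method & ", Document length: " & Len(doc) & _
                " chars, Processed in: " & Format$(Timer - started, "0") & " seconds"
End Sub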