PowerShell – cUrl for an HTTPS address/domain times out unless previously accessed from browser

Tags: curl, https, jira, powershell

I've lost a couple days to this problem and hope it sparks a thought from someone.

I am integrating several systems using PowerShell scripts. One of the two services I am connecting to (hosted JIRA) can be accessed just fine from my local system, but the script fails when it runs from one of my VMs. I found, by chance, that if I open or refresh a browser on the server for an HTTPS URL on that host, the script can then access the API over HTTPS for about 20-30 seconds afterwards.

I receive a timeout error when I remote into the server and try this from a PowerShell console. I then verified that the same behavior occurs with cUrl (verbose output included below). Refreshing a browser on that domain then allows both to access HTTPS URLs for a short period of time. It appears to be timing out on the initial connection, before SSL negotiation.

Representative PoSH Command:

Invoke-RestMethod -Method Get -Uri "https://MYDOMAIN.atlassian.net/rest/api/2/issue/PLPT-1?fields=key,id,status" -Headers @{"Authorization" = "Basic "+ [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes('USERNAME:PASSWORD'))}

Representative cUrl command:

curl.exe "https://MYDOMAIN.atlassian.net/rest/api/2/issue/PLPT-1?fields=key,id,status" -u "USERNAME:PASSWORD" -v -X GET

I've done a lot of digging on this and I'm pretty stumped. I did try using Wireshark to dig deeper, but it's been years since I used a packet sniffer and I'm rusty and having to learn the UI.

Troubleshooting:

Here are the questions/answers I could think of while trying to isolate the problem:

  • Is it PowerShell?
    • Using cUrl also times out
  • Is it all HTTPS?
    • https://google.com/ works fine without timeout
    • https://localhost/... works fine without timeout
  • Is it a system that has accessed JIRA via browser ever?
    • I verified my home desktop could connect via PoSH despite never having accessed JIRA
  • Is it Host, DC, or OS?
    • This is a Windows Server 2008 R2 VM in Azure. I verified the PoSH and cUrl commands work fine in a second Azure VM running 2008 R2
  • Firewall, Antivirus?
    • Disabled Antivirus and Firewall; cUrl + PoSH still time out
  • User agent?
    • Including a user agent didn't make a difference on problem system or working systems
  • What does Fiddler say?
    • Fiddler w/ SSL decryption caused gateway errors to occur instead of timeouts. I haven't dug deeper
  • Maybe it's a network issue for Atlassian? Intermittent connectivity?
    • I've been consistently getting errors from my server and it's been consistently working from everywhere else I have tried
    • I made 10 consecutive calls on the server and 10 locally: all 10 local calls succeeded and all 10 server calls timed out. After doing the browser refresh trick on the server, 10 consecutive calls all succeeded.
  • What does it look like in Wireshark?
    • With cUrl: Wireshark shows the initial TCP packet (the SYN) go out, but it is never ACKed, and two TCP Retransmission attempts follow
    • With cUrl after browser priming: Wireshark shows the first TCP packet is ACKed and then everything works as expected
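The Wireshark observation can also be checked without a packet capture by testing the bare TCP connection to port 443, which leaves TLS and HTTP out of the picture. The following is a minimal sketch, not a definitive diagnostic: Test-TcpConnect is a hypothetical helper name, and the 3-second default timeout is an assumption rather than a value measured on the affected server.

```powershell
function Test-TcpConnect {
    <#
    .SYNOPSIS
    Hypothetical helper: returns $true if a TCP connection to the host and
    port is established within the timeout, otherwise $false.
    .EXAMPLE
    # Probe the JIRA host 10 times in a row (placeholder host from the question)
    1..10 | ForEach-Object { Test-TcpConnect -HostName "MYDOMAIN.atlassian.net" }
    #>
    param(
        [Parameter(Mandatory = $true)][string]$HostName,
        [int]$Port = 443,
        [int]$TimeoutMilliseconds = 3000   # assumed value, tune as needed
    )

    $client = New-Object System.Net.Sockets.TcpClient
    try {
        # Start the connect asynchronously so we can enforce our own timeout
        $async = $client.BeginConnect($HostName, $Port, $null, $null)
        if (-not $async.AsyncWaitHandle.WaitOne($TimeoutMilliseconds, $false)) {
            return $false   # handshake did not complete in time
        }
        $client.EndConnect($async)   # throws if the connection was refused
        return $true
    }
    catch {
        return $false   # refused, unreachable, or name resolution failed
    }
    finally {
        $client.Close()
    }
}
```

If this probe returns False on the server before the browser refresh and True right after it, that would be consistent with the problem sitting at the TCP level rather than in PowerShell, cUrl, or SSL.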

For a short time I thought I had gotten cUrl working consistently. I was using -3 -4 to force SSLv3 and IPv4 addresses, and it appeared to work without my having to prime the connection with a web browser. Unfortunately, after a reboot this no longer works.

Methods I have tried on the server:

  • cUrl, cUrl with -3 -4
  • PoSH: Invoke-RestMethod, Invoke-WebRequest, WebClient, WebRequest/WebResponse, setting default SSL to SSL3 via ServicePointManager, setting proxy and proxy credentials via system defaults in case there is one (not to my knowledge)
  • IE: works
  • Chrome: works

cUrl Output

Here is some sample output from cUrl. I already have a browser open to https://MYDOMAIN.atlassian.net (it's sitting on the login screen), but I've left it sitting for a while so the connection would be stale.

cUrl output before refreshing the browser:

* Hostname was NOT found in DNS cache
*   Trying 165.254.226.145...
* connect to 165.254.226.145 port 443 failed: Timed out
* Failed to connect to MYDOMAIN.atlassian.net port 443: Timed out
* Closing connection 0

cUrl output when I run right after refreshing the browser:

* Hostname was NOT found in DNS cache
*   Trying 165.254.226.145...
* Connected to MYDOMAIN.atlassian.net (165.254.226.145) port 443 (#0)
* successfully set certificate verify locations:
*   CAfile: C:\Users\Administrator\AppData\Local\Apps\cURL\bin\curl-ca-bundle.crt
  CApath: none
* SSLv3, TLS handshake, Client hello (1):
* SSLv3, TLS handshake, Server hello (2):
* SSLv3, TLS handshake, CERT (11):
* SSLv3, TLS handshake, Server key exchange (12):
... rest of handshake and HTML for a 401 error page because I didn't force pre-authentication ...

Updated

I added Wireshark results to questions above.

I've now also found that if I run the cUrl command, cancel it before it times out, and immediately run it again, it is successful. If I let the cUrl command time out and then immediately run it again, it times out again.

If I run the PoSH command and cancel it before it times out and immediately run it again, I can actually run it 5+ times in a row successfully.

This is definitely something networking related. I'm going to see whether re-running the command eventually reaches a point where it times out again, or whether cancelling the first call somehow lets me keep making subsequent calls indefinitely (which may be possible; I think PoSH is taking advantage of keep-alive once the initial connection is formed).

Best Answer

My temporary "solution" is to use a short timeout on the initial call and retry immediately if it fails. The timeout is short enough that on this server the first call fails quickly and the retry goes out soon enough to start communicating successfully (just like when I ran the command manually, cancelled it, then ran it again).

So far it looks like having one timeout and retry is good enough to keep the connection working for the rest of the automation script to run problem-free.
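For anyone who wants to try the same workaround, here is a rough sketch of the timeout-and-retry idea. It is an illustration, not the exact script I run: Invoke-WithRetry is a hypothetical helper name, and the attempt count and the 5-second timeout in the example are assumptions you would need to tune for your own server.

```powershell
function Invoke-WithRetry {
    <#
    .SYNOPSIS
    Hypothetical helper: runs a script block and retries it if it throws.
    .EXAMPLE
    # Short timeout on the call, one immediate retry (values are assumptions)
    $auth = [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes('USERNAME:PASSWORD'))
    $issue = Invoke-WithRetry -MaxAttempts 2 -Action {
        Invoke-RestMethod -Method Get -TimeoutSec 5 `
            -Uri "https://MYDOMAIN.atlassian.net/rest/api/2/issue/PLPT-1?fields=key,id,status" `
            -Headers @{ "Authorization" = "Basic $auth" }
    }
    #>
    param(
        [Parameter(Mandatory = $true)][scriptblock]$Action,
        [int]$MaxAttempts = 2,
        [int]$DelayMilliseconds = 0   # 0 = retry immediately
    )

    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            # Return the action's output on the first attempt that succeeds
            return & $Action
        }
        catch {
            # Out of attempts: rethrow the last error to the caller
            if ($attempt -eq $MaxAttempts) { throw }
            Write-Verbose "Attempt $attempt failed: $($_.Exception.Message)"
            if ($DelayMilliseconds -gt 0) {
                Start-Sleep -Milliseconds $DelayMilliseconds
            }
        }
    }
}
```

The retry is deliberately immediate by default, since in my testing the second call only succeeded when it went out right after the first one was cut short.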

This is a workaround. I'm still looking for the root cause and a better answer.