Downloading with U-Boot’s tftp randomly times out

tftp

I have a custom board with a TI OMAP SoC. I'm trying to download a uImage from a Linux machine
via U-Boot's tftp. With several servers the transfer suffers from timeouts (most attempts exceed the timeout limit and only very rarely does one get through), but
it succeeds with others. However, any other combination not involving U-Boot is flawless, even
when the board in question has booted the kernel. Comparing network settings (incl. sysctl)
showed no significant difference between the serving machines that run Linux.

The following tests were run:

  • u-boot <-> i686-pae Linux
  • u-boot <-> i686-pae Linux kvm guest
  • u-boot <-> x86_64 windows 7

Results are as follows:

  1. u-boot <-> i686-pae Linux
Using DaVinci-EMAC device
TFTP from server 192.168.100.254; our IP address is 192.168.100.88
Filename 'uImage'.
Load address: 0xc0700000
Loading: ############T ###############################T ##########T ############
              #######T ################################################T ##########
              ##########################T #######################################
              ###########################T ######################################
              ################################T #################################
    #################################################################
              ########T #########################################################
              ##################
              11.7 KiB/s
done
Bytes transferred = 2418464 (24e720 hex)

Corresponding traffic dump can be found here:
http://pastebin.com/hBBwe9bL

  2. u-boot <-> i686-pae Linux kvm guest
Using DaVinci-EMAC device
TFTP from server 192.168.100.112; our IP address is 192.168.100.88
Filename 'uImage'.
Load address: 0xc0700000
Loading: #################################################################
#################################################################
#################################################################
#################################################################
#################################################################
#################################################################
#################################################################
          ##################
          795.9 KiB/s
done
Bytes transferred = 2418464 (24e720 hex)

Corresponding traffic dump can be found here:
http://pastebin.com/ZXYdpmSe

  3. u-boot <-> x86_64 windows 7
Using DaVinci-EMAC device
TFTP from server 192.168.100.86; our IP address is 192.168.100.88
Filename 'uImage'.
Load address: 0xc0700000
Loading: #################################################################
#################################################################
          ###################################
          173.8 KiB/s
done
Bytes transferred = 2418464 (24e720 hex)

Corresponding traffic dump can be found here:
http://pastebin.com/UWFEZjTz

At this point I have no idea what could cause the timeouts in U-Boot, and I have no more
clues on how to solve this. Any help is greatly appreciated.

It certainly has something to do with the U-Boot network stack, but I believe this is the right place to ask this question.

I have read this article: http://www.denx.de/wiki/view/DULG/TFTPTimeout. However, what is described there is not related to my situation, since the results do not depend on the switches in between.

What I have tried already:

  • tftpd / tftpd-hpa
  • tftpblocksize=512
  • an x86_64 Linux kernel on the tftp server
  • changing the switch port settings from autonegotiation to explicit full-duplex, as well as half-duplex
  • adding/removing switches in between
  • changing the MTU on the serving machine
  • building the latest U-Boot from source
  • varying the server IP address within the /24
  • changing sysctl net. mem settings
  • sending a message to the U-Boot mailing list (no reply)
  • adding a static ARP entry for the U-Boot MAC

Best Answer

As further experiments showed, the problem in this particular case was U-Boot losing incoming packets at moments coinciding with the "--- NetLoop timeout handler set" and "--- NetLoop timeout" messages, which I suppose is caused either by the MAC driver implementation in U-Boot or by U-Boot's network handling itself. The high packet rate (pps) of the upstream Cisco switch might contribute to this because of its rapid packet processing.

Transfers from the Windows host succeed because the chosen tftp implementation has shorter timeouts than U-Boot and consequently resends packets, which U-Boot then happens to receive within its own time limit.
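The timing argument here can be sketched in C. This is a hypothetical model, not code from U-Boot or from any tftp server: the struct, the function name and the example values are assumptions made for illustration. It only encodes the idea that after a lost data packet both sides are waiting, and whichever timer expires first determines how recovery starts.

```c
#include <stdbool.h>

/* Hypothetical model (not U-Boot code): outcome of one lost DATA packet. */
struct recovery {
    bool client_timed_out;   /* true if the client's timer fired first */
    unsigned long stall_ms;  /* idle time before anything is resent */
};

/*
 * server_retx_ms:    how long the server waits for an ACK before it
 *                    resends the DATA packet (assumed parameter).
 * client_timeout_ms: how long the client waits for DATA before it
 *                    reports a timeout (assumed parameter).
 */
static struct recovery recover_lost_packet(unsigned long server_retx_ms,
                                           unsigned long client_timeout_ms)
{
    struct recovery r;

    /* On a tie the model counts it as a client timeout. */
    r.client_timed_out = client_timeout_ms <= server_retx_ms;
    r.stall_ms = r.client_timed_out ? client_timeout_ms : server_retx_ms;
    return r;
}
```

With illustrative values, recover_lost_packet(1000UL, 5000UL) reports no client timeout and a 1000 ms stall (the server resends first), while recover_lost_packet(6000UL, 5000UL) reports a client timeout after 5000 ms. Neither pair of values is taken from the traffic dumps.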

Tests also show that setting #define TIMEOUT 8000UL at compile time or tftptimeout 1000 at run time allows enough time for the tftp server to retransmit the lost packet, letting the transfer proceed. This is exactly what is observed when downloading from the Windows host.
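As a rough illustration of how a compile-time default and a run-time override can coexist, here is a minimal C sketch. It is not U-Boot's actual implementation: the helper name effective_timeout_ms is hypothetical, the unit is assumed to be milliseconds, and the parsing rules are my own simplification.

```c
#include <stdlib.h>

/* Compile-time value quoted in the answer; unit assumed to be ms. */
#define TIMEOUT 8000UL

/*
 * Hypothetical helper (not part of U-Boot): choose the timeout to use.
 * env_value stands for the text of a run-time setting such as
 * tftptimeout, or NULL when it is unset. Anything that is not a plain
 * positive decimal number falls back to the compile-time default.
 */
static unsigned long effective_timeout_ms(const char *env_value)
{
    char *end;
    unsigned long value;

    if (env_value == NULL || env_value[0] < '0' || env_value[0] > '9')
        return TIMEOUT;

    value = strtoul(env_value, &end, 10);
    if (*end != '\0' || value == 0)
        return TIMEOUT;

    return value;
}
```

In this sketch effective_timeout_ms(NULL) yields 8000 and effective_timeout_ms("1000") yields 1000, mirroring the two settings mentioned in the answer.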

More testing showed that setting the upstream switch port (not the port facing the board running U-Boot) to 10 Mbit/s half-duplex solves the transmission issue; however, this significantly reduces the available bandwidth.
