Linux – Should You Continue Polling Socket For Readiness After An EAGAIN or EWOULDBLOCK Error

io, linux, socket

I am building a web crawler with a multiplexed download manager using Linux epoll (Linux 2.6.30.x). I pick links from a database of over 40,000 domains (each domain has between 1 and 2,000 URLs), for a total of 250,000 URLs. I multiplex the downloads so that, on average, I have no more than 2 parallel streams per host (as the HTTP spec recommends), and I work through a batch of 10 to 50 hosts at a time. I chose non-blocking sockets and epoll for speed, scalability (I am low on RAM), and ease of use compared with poll, select, and signal-driven I/O.

The first few hundred URLs download smoothly and quickly. The trouble is that I keep getting EAGAIN/EWOULDBLOCK errors from certain links (sockets) that otherwise seem reachable (I can open the same links in my PC's browser at any time). Even after polling them repeatedly with epoll, expecting their status to change to EPOLLIN, they keep returning EAGAIN/EWOULDBLOCK. These links build up so quickly that I have to stop the whole download.

What does EAGAIN/EWOULDBLOCK really mean? Is it a permanent status, so that once I detect it I should remove that socket from any further observation?

Kindly help.

Best Answer

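The GNU C Library documentation of error codes gives the meaning: EAGAIN/EWOULDBLOCK means "resource temporarily unavailable". The call might succeed if you try again later. A typical case is an operation on a non-blocking descriptor that would otherwise have to block, such as reading from a socket that has no data available yet. So it is not a permanent status, and the socket should stay under observation.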
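
To illustrate, here is a minimal sketch (not your crawler's actual code) of how a non-blocking socket is usually read under epoll. The names set_nonblocking, watch_for_input, drain_socket and the drain_status enum are hypothetical helpers made up for this example. It assumes level-triggered EPOLLIN and a socket that is already connected. The point is that EAGAIN/EWOULDBLOCK only means "nothing to read right now": the socket stays registered and you go back to epoll_wait.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum drain_status {
    DRAIN_WOULD_BLOCK, /* EAGAIN/EWOULDBLOCK: no data now, keep the fd in epoll */
    DRAIN_EOF,         /* peer closed the connection: safe to delist and close */
    DRAIN_ERROR        /* real error: inspect errno, then delist and close */
};

/* Switch fd to non-blocking mode. Returns 0 on success, -1 on failure. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Register fd with the epoll instance for read readiness (level-triggered). */
static int watch_for_input(int epfd, int fd)
{
    struct epoll_event ev;
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Read everything currently available on fd and count the bytes in *total.
 * A real crawler would append buf to its response buffer instead. */
static enum drain_status drain_socket(int fd, size_t *total)
{
    char buf[4096];

    assert(total != NULL);
    *total = 0;
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n > 0) {
            *total += (size_t)n;
            continue;
        }
        if (n == 0)
            return DRAIN_EOF;
        if (errno == EINTR)
            continue; /* interrupted by a signal: just retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return DRAIN_WOULD_BLOCK; /* temporary: wait for the next EPOLLIN */
        return DRAIN_ERROR;
    }
}
```

In an event loop you would call drain_socket only for descriptors that epoll_wait reported with EPOLLIN, and remove a descriptor (EPOLL_CTL_DEL, then close) only on DRAIN_EOF or DRAIN_ERROR. On DRAIN_WOULD_BLOCK you do nothing and simply wait for the next event.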