As far as I understand TCP, the phrase "Keeping TCP connections alive" is misleading: the TCP protocol itself has no mechanism that times out an idle ESTABLISHED connection. I mean: once established, a connection can last forever, until a RESET, a FIN or a timeout while waiting for an ACK (which, in this last case, can only happen after some transmission that needs to be ACKnowledged).
In my experience, 100% of the "suddenly broken due to idle timeout" sort of issues are caused by some intermediary router/firewall along the routing path between the two communicating hosts. I mean: the firewall is typically a "stateful" firewall, so it keeps track of the connections it is firewalling/managing. As such, every connection it needs to track consumes some amount of system resources (of the firewall, I mean). Also, by its very nature, a stateful firewall knows perfectly well which of the managed connections are "working" and which ones, on the contrary, are "idle". As such, lots (all?) of the firewall implementations define an idle timeout: if a managed connection stays idle for that long, the firewall sends a reset to both ends of the TCP connection (or, in some implementations, silently drops its entry) and frees its own resources.
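To make the idea concrete, here is a toy model of such a connection table. It is only a sketch of the behaviour described above, not the code of any real firewall: the class name ConnTrackTable, its methods and the 300-second timeout in the usage note are all made up for illustration.

```python
import time


class ConnTrackTable:
    """Toy model of a stateful firewall's connection table (hypothetical)."""

    def __init__(self, idle_timeout, clock=time.monotonic):
        self.idle_timeout = idle_timeout  # seconds a connection may stay silent
        self.clock = clock                # injectable so the model can be tested
        self.last_seen = {}               # (src_ip, src_port, dst_ip, dst_port) -> timestamp

    def packet_seen(self, conn):
        """Record traffic for a connection, creating or refreshing its entry."""
        self.last_seen[conn] = self.clock()

    def expire_idle(self):
        """Forget every connection idle for idle_timeout or longer; return them."""
        now = self.clock()
        expired = [c for c, t in self.last_seen.items()
                   if now - t >= self.idle_timeout]
        for c in expired:
            del self.last_seen[c]  # resources freed; later packets no longer match
        return expired
```

With, say, ConnTrackTable(idle_timeout=300), a connection that exchanges any packet at least once every 300 seconds is never returned by expire_idle(), which is exactly why a periodic heartbeat keeps the path open.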
Based on your question, I bet that the TCP connection will be opened by your IoT device (acting as the client) towards your controlling server (the TCP server). Hence... LOTS, if not ALL, of the ADSL home routers that will NAT your IoT device's traffic will surely act as described.
This, at least, based on my own experience.
But as I'm not Jon Postel, please don't blame me if I'm wrong :-)
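As a side note: you wrote "...LOTS of simple IoT devices...". Please keep in mind that a TCP "port" is a 16-bit value. So a single client IP address cannot have more than 64K concurrent connections to the same server IP address and port (by TCP intrinsic design), which matters when many devices sit behind the same NAT address. The server as a whole is not capped at 64K, because each connection is identified by the full (source IP, source port, destination IP, destination port) tuple; there the practical limits are file descriptors and memory. How these problems can be solved is out of scope in the context of this question.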
Finally, let me add that I really see no problem in implementing a sort of heartbeat protocol between your IoT device and the managing server/application. It can be implemented to be very "network-friendly", with negligible impact in terms of bandwidth and with lots of advantages in terms of manageability/control.
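A minimal sketch of what such a heartbeat could look like, assuming a one-byte ping answered by a one-byte pong. The PING/PONG values, the function names and the 60-second interval are my own assumptions for illustration, not part of any standard protocol.

```python
import socket
import time

PING = b"\x01"  # hypothetical 1-byte heartbeat request
PONG = b"\x02"  # hypothetical 1-byte heartbeat reply


def heartbeat_once(sock, timeout=5.0):
    """Device side: send one ping, return True if the peer answered in time."""
    sock.settimeout(timeout)
    try:
        sock.sendall(PING)
        return sock.recv(1) == PONG
    except OSError:  # covers timeouts, resets and broken pipes
        return False


def answer_heartbeat(sock):
    """Server side: reply to a single ping; return False if the peer closed."""
    if sock.recv(1) == PING:
        sock.sendall(PONG)
        return True
    return False


def run_device(sock, interval=60.0, timeout=5.0):
    """Ping every `interval` seconds until the peer stops answering."""
    while heartbeat_once(sock, timeout):
        time.sleep(interval)
    # At this point the connection should be considered dead and reopened.
```

The interval only has to be shorter than the smallest idle timeout along the path. At two bytes per exchange the bandwidth cost is negligible, and the server learns quickly when a device has disappeared.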
Best Answer
After the fragment reassembly timeout expires, the fragment is dropped; the other end would need to retransmit.
This timeout is generally configurable. On Linux, it's 30 seconds by default and controlled via /proc/sys/net/ipv4/ipfrag_time.
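As a small illustration, the current value can be read like any other file under /proc. This is only a sketch: the helper name read_ipfrag_time is made up, and it assumes a Linux host where that file exists.

```python
from pathlib import Path

# Path named in the answer above; only present on Linux.
IPFRAG_TIME = Path("/proc/sys/net/ipv4/ipfrag_time")


def read_ipfrag_time(path=IPFRAG_TIME):
    """Return the IPv4 fragment reassembly timeout in seconds."""
    return int(Path(path).read_text().strip())
```

Changing the value requires root, either by writing a new number to the same file or through the net.ipv4.ipfrag_time sysctl.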