Assuming you mean to protect the confidentiality of the communication at the IP layer with IPsec:
How would the underlying network be able to differentiate between UDP
and TCP since they're at the transport layer.
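The Next Header field of ESP (carried in the ESP trailer) tells you the type of payload. When ESP encryption is in use, this field is encrypted together with the payload, so only a party holding the keys can read it; an observer sees just protocol 50 (ESP) in the outer IP header.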
If you use tunnel mode (which is customary for VPNs), then without the necessary keys you cannot determine what is at the transport layer, because the Next Header field tells you only that a whole IP packet is encapsulated.
If you use transport mode, then the Next Header field tells you the type of payload at the transport layer.
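To make the observer's view concrete, here is a minimal sketch of what can be read from an IPv4 packet without any keys, assuming ESP with encryption enabled. The function name `observe_ipv4` and the helper `make_ipv4` are made up for illustration; the header offsets and protocol numbers (6, 17, 50, 51) are the standard ones.

```python
import struct

IP_PROTOCOLS = {6: "TCP", 17: "UDP", 50: "ESP", 51: "AH"}

def make_ipv4(proto: int, payload: bytes) -> bytes:
    """Build a bare IPv4 packet for the demo (checksum left at zero)."""
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(payload), 0, 0,
                         64, proto, 0, bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))
    return header + payload

def observe_ipv4(packet: bytes) -> dict:
    """Report what an on-path observer without keys can read."""
    ihl = (packet[0] & 0x0F) * 4      # IPv4 header length in bytes
    proto = packet[9]                 # Protocol field of the outer header
    info = {"protocol": IP_PROTOCOLS.get(proto, str(proto))}
    if proto == 50:
        # The ESP header (SPI and sequence number) is sent in the clear;
        # everything after it is ciphertext to the observer.
        spi, seq = struct.unpack("!II", packet[ihl:ihl + 8])
        info.update(spi=spi, sequence=seq, inner="not readable without keys")
    return info
```

For a packet built with `make_ipv4(50, ...)` the observer learns the SPI and sequence number but nothing about TCP versus UDP; for a plain packet with protocol 6 it sees TCP directly.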
Will we still have TCP and UDP when we move to IPv6 (although I see
that IPsec has been made optional for IPv6)?
TCP and UDP are agnostic to the layer-3 protocol. In fact, TCP and UDP (and SCTP and DCCP) exist also for IPv6.
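As a small illustration, the same protocol numbers identify the transport protocol in both IP versions: the Protocol field in IPv4 and the Next Header field in IPv6. The sketch below uses a hypothetical helper `transport_protocol` and, as a simplifying assumption, ignores IPv6 extension headers.

```python
def transport_protocol(packet: bytes) -> int:
    """Return the protocol number announced by an IPv4 or IPv6 header."""
    version = packet[0] >> 4
    if version == 4:
        return packet[9]   # IPv4 Protocol field
    if version == 6:
        return packet[6]   # IPv6 Next Header field (extension headers ignored)
    raise ValueError(f"unknown IP version {version}")

# Minimal headers that both announce TCP (protocol number 6).
ipv4_tcp = bytes([0x45]) + bytes(8) + bytes([6]) + bytes(10)   # 20 bytes
ipv6_tcp = bytes([0x60]) + bytes(5) + bytes([6]) + bytes(33)   # 40 bytes
```

Both `transport_protocol(ipv4_tcp)` and `transport_protocol(ipv6_tcp)` return 6.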
What seems to puzzle you is that in IPsec tunnel (VPN) mode there is no way to inspect the content. This is supposed to happen at the tunnel end-points. An organization that is worried by this loss of control should not allow IPsec that is not under its own control.
Further reading: An Illustrated Guide to IPsec
In addition to the data, some metadata must be communicated between the application, transport and internet layers.
Technically, how metadata is communicated between layers is an implementation detail. In practice the application layer nearly always uses some variant of the Berkeley sockets API to talk to the transport layer.
For TCP clients the destination IP and port are specified to the transport layer as part of the "connect" API call. For UDP clients either "connect" can be used to create a pseudo-connection or the destination IP and port can be specified on a per-packet basis with the "sendto" API call.
For TCP servers the application can read the client's IP and port by calling getpeername after accepting a connection. UDP servers can read the IP and port for each packet by reading packets using the recvfrom API call.
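The two UDP styles can be sketched with Python's socket module, which wraps the same calls. This is only an illustration on the loopback interface; the function name `send_two_ways` and the payloads are made up.

```python
import socket

def send_two_ways() -> list:
    """Send one datagram with sendto and one over a connected UDP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as per_packet, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as connected:
        receiver.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
        receiver.settimeout(2.0)
        dest = receiver.getsockname()

        # Style 1: destination given on every call.
        per_packet.sendto(b"via sendto", dest)

        # Style 2: connect() records the destination, send() then needs none.
        connected.connect(dest)
        connected.send(b"via connect")

        return sorted(receiver.recvfrom(1024)[0] for _ in range(2))
```

No handshake takes place in the second style; connect on a UDP socket only stores the remote address in the socket.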
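Here is a rough loopback sketch of both server-side cases, again using Python's wrappers for the same calls. The function names `tcp_peer_demo` and `udp_peer_demo` are hypothetical.

```python
import socket

def tcp_peer_demo():
    """Accept one TCP connection and ask the stack who the peer is."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener, \
         socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        listener.settimeout(2.0)
        client.connect(listener.getsockname())   # destination passed via connect
        conn, _ = listener.accept()
        with conn:
            # getpeername() returns the client's (IP, port) for this connection.
            return conn.getpeername(), client.getsockname()

def udp_peer_demo():
    """Receive one datagram; recvfrom returns the sender with each packet."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        server.bind(("127.0.0.1", 0))
        server.settimeout(2.0)
        client.sendto(b"ping", server.getsockname())
        data, sender = server.recvfrom(1024)
        return data, sender, client.getsockname()[1]
```

In the TCP case the address reported by getpeername matches what the client sees as its own local address; in the UDP case the sender address arrives alongside each datagram.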
Unfortunately, sendto and recvfrom have a design flaw: they only pass the remote address, not the local one, which can cause problems for servers on multihomed hosts. The server may send replies from the wrong IP address, causing them to be dropped, either by the network or by the client. There are newer APIs to deal with this, but the details vary between operating systems.
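As one example of such an API, Linux offers the IP_PKTINFO socket option, which delivers the destination address of each datagram as ancillary data through recvmsg. The sketch below is Linux-specific and rests on assumptions: the fallback value 8 is the Linux constant (older Python versions do not expose `socket.IP_PKTINFO`), and the helper names are made up. Other systems use different options.

```python
import socket
import struct

# Assumption: Linux. The fallback 8 is the Linux value of IP_PKTINFO.
IP_PKTINFO = getattr(socket, "IP_PKTINFO", 8)

def make_pktinfo_socket(port: int = 0) -> socket.socket:
    """Create a UDP socket on all addresses that reports per-packet info."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, IP_PKTINFO, 1)
    sock.bind(("0.0.0.0", port))
    return sock

def recv_with_local_addr(sock: socket.socket, bufsize: int = 2048):
    """Return (data, peer, local_ip); local_ip is where the packet was sent."""
    data, ancdata, _flags, peer = sock.recvmsg(bufsize, socket.CMSG_SPACE(12))
    local_ip = None
    for level, ctype, cdata in ancdata:
        if level == socket.IPPROTO_IP and ctype == IP_PKTINFO:
            # struct in_pktinfo { int ipi_ifindex;
            #                     struct in_addr ipi_spec_dst, ipi_addr; }
            _ifindex, _spec_dst, addr = struct.unpack("i4s4s", cdata[:12])
            local_ip = socket.inet_ntoa(addr)
    return data, peer, local_ip
```

A server can then send its reply from a socket bound to `local_ip`, or pass the same structure back through sendmsg, so the reply leaves from the address the client actually contacted.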
The transport layer will in turn inform the internet layer of the IP addresses for outgoing packets, and the internet layer will inform the transport layer of the IP addresses for incoming packets. Since both the transport and internet layers are typically part of the TCP/IP stack, how this is done is an implementation detail inside the stack.
X-Forwarded-For is an HTTP header used by HTTP proxies. The proxy retrieves the client IP address using getpeername, then encodes it into an HTTP header to pass it on to the next server.
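A minimal sketch of the proxy side, assuming the request headers are held in a plain dict with normalised names. The function name `add_forwarded_for` is made up; appending to an existing value with a comma follows the usual convention for chained proxies.

```python
def add_forwarded_for(headers: dict, client_ip: str) -> dict:
    """Return a copy of the headers with client_ip recorded for the next hop."""
    out = dict(headers)
    existing = out.get("X-Forwarded-For")
    # A proxy further down the chain appends rather than overwrites.
    out["X-Forwarded-For"] = f"{existing}, {client_ip}" if existing else client_ip
    return out

# In a real proxy the address would come from the accepted connection, e.g.
#   client_ip = conn.getpeername()[0]
```

For example, `add_forwarded_for({"Host": "example.org"}, "203.0.113.7")` adds the header, and a second proxy calling it again with its own peer address produces a comma-separated list.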
Best Answer
The TCP pseudo header contains only information that was used to create the original connection (source and destination IP addresses), a length (which is available to the upper level) and a well-known constant, the protocol (TCP is IP protocol number 6).
From RFC 793, p17.
This means that the process of wrapping a sequence of data bytes into a segment and then into a packet uses only information which the upper layer has. Indeed, you will see that for a given connection, all fields except the length are constant, which means the checksum up to this point can be computed at connection-open and stored. The computation per packet can start with the length field in the pseudo header. Some particular implementation might actually store the length in that position, if that optimises the code a little.
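The optimisation described above can be sketched as follows for IPv4. This is an illustration, not how any particular stack does it: the function names are made up, the segment is assumed to have its checksum field zeroed, and its length is assumed to fit in 16 bits.

```python
import socket
import struct

def ones_sum(data: bytes, start: int = 0) -> int:
    """Add the 16-bit big-endian words of data to a running (unfolded) sum."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length data with a zero byte
    total = start
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    return total

def fold(total: int) -> int:
    """Fold carries back in and return the one's complement as 16 bits."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def connection_partial(src_ip: str, dst_ip: str) -> int:
    """Sum the constant pseudo-header fields once, at connection-open."""
    constant = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
                + struct.pack("!BB", 0, 6))  # zero byte, then protocol 6 (TCP)
    return ones_sum(constant)

def tcp_checksum(partial: int, segment: bytes) -> int:
    """Per-segment work: add the TCP length, then the segment itself."""
    return fold(ones_sum(segment, partial + len(segment)))
```

Because one's complement addition is commutative, starting from the stored partial sum gives the same result as summing the full pseudo header in front of every segment.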