Assuming you mean protecting the confidentiality of the communication at the IP layer with IPsec:
How would the underlying network be able to differentiate between UDP and TCP, since they're at the transport layer?
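The Next Header field of ESP tells the receiving endpoint the type of payload. That field sits in the ESP trailer, which is encrypted together with the payload, so only the endpoints holding the keys can read it. The network in between sees only an IP header whose Protocol field says ESP (50).
If you use tunnel mode (which is customary for VPNs), the Next Header field only says that a whole IP packet is encapsulated (4 for IPv4, 41 for IPv6), so the transport protocol is hidden inside that inner packet.
If you use transport mode, the Next Header field directly identifies the transport-layer payload (6 for TCP, 17 for UDP).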
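To make the Next Header mechanics concrete, here is a minimal sketch of what the receiving endpoint does after decrypting an ESP payload. It assumes the decrypted plaintext is laid out as payload, padding, Pad Length, Next Header (per RFC 4303), that the ICV has already been removed, and that the default padding (bytes 1, 2, 3, ...) was used. The helpers `build_esp_plaintext` and `parse_esp_plaintext` are hypothetical, and no real encryption is performed.

```python
from __future__ import annotations

# IANA protocol numbers that commonly appear in ESP's Next Header field
NEXT_HEADER_NAMES = {
    4: "IPv4 (tunnel mode)",
    6: "TCP (transport mode)",
    17: "UDP (transport mode)",
    41: "IPv6 (tunnel mode)",
}


def build_esp_plaintext(payload: bytes, next_header: int, block: int = 4) -> bytes:
    """Append default padding plus the 2-byte ESP trailer so the length is a multiple of block."""
    pad_length = (-(len(payload) + 2)) % block
    return payload + bytes(range(1, pad_length + 1)) + bytes([pad_length, next_header])


def parse_esp_plaintext(plaintext: bytes) -> tuple[bytes, int]:
    """Split already-decrypted ESP plaintext into (payload, next_header).

    Layout: payload | padding | pad_length (1 byte) | next_header (1 byte)
    """
    if len(plaintext) < 2:
        raise ValueError("ESP plaintext too short for trailer")
    pad_length, next_header = plaintext[-2], plaintext[-1]
    payload_end = len(plaintext) - 2 - pad_length
    if payload_end < 0:
        raise ValueError("pad length exceeds plaintext size")
    # Assumes the default padding sequence 1, 2, 3, ...
    if plaintext[payload_end:-2] != bytes(range(1, pad_length + 1)):
        raise ValueError("unexpected padding contents")
    return plaintext[:payload_end], next_header


plaintext = build_esp_plaintext(b"\x00" * 8, 17)  # pretend the payload is an 8-byte UDP header
payload, next_header = parse_esp_plaintext(plaintext)
print(NEXT_HEADER_NAMES[next_header])  # UDP (transport mode)
```

On the wire, all of `plaintext` (trailer included) would be ciphertext, which is why an observer without the keys cannot run this step.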
Will we still have TCP and UDP when we move to IPv6 (although I see that IPsec has been made optional for IPv6)?
TCP and UDP are agnostic to the layer-3 protocol. In fact, TCP and UDP (and SCTP and DCCP) also exist for IPv6.
What seems to puzzle you is that in IPsec tunnel (VPN) mode there is no way for the network in between to inspect the content. Inspection is supposed to happen at the tunnel endpoints. An organization that is worried about this loss of control should not allow IPsec that is not under its own control.
Further reading: An Illustrated Guide to IPsec
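Ethernet has its own checksum, and it has nothing to do with IP, TCP, or UDP. Neither TCP nor IPv6 has anything to do with the UDP checksum: UDP on the source creates the checksum, and UDP on the destination verifies it. (The UDP checksum does cover a pseudo-header containing the IP source and destination addresses, but IP itself never computes or checks it.)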
I think you don't really understand the network stack layers.
Layer-2 protocols, e.g. ethernet and Wi-Fi, may use a checksum (ethernet's is a CRC-32 called the frame check sequence, or FCS). In general, layer-2 protocols will drop any layer-2 frame with a bad checksum anywhere along the layer-2 path. For instance, a switch will discard an ethernet frame with a bad FCS. Layer-2 protocols don't care which layer-3 or layer-4 protocols are carried in their frames, nor are they aware of any layer-3 or layer-4 checksums.
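As a rough illustration of a layer-2 check that knows nothing about layer 3, the sketch below appends and verifies an ethernet frame check sequence. It relies on ethernet's FCS being the same CRC-32 that `zlib.crc32` computes, appended least-significant byte first; the frame contents and the function names `add_fcs` and `fcs_ok` are made up for the example.

```python
import zlib


def add_fcs(frame_without_fcs: bytes) -> bytes:
    """Append a CRC-32 frame check sequence, least-significant byte first."""
    fcs = zlib.crc32(frame_without_fcs)
    return frame_without_fcs + fcs.to_bytes(4, "little")


def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC over everything before the FCS and compare.

    Nothing here parses IP, TCP, or UDP: the CRC simply covers the frame's bytes,
    whatever protocol they happen to carry.
    """
    if len(frame) < 4:
        return False
    body, received = frame[:-4], int.from_bytes(frame[-4:], "little")
    return zlib.crc32(body) == received


# Hypothetical frame: dst MAC, src MAC, EtherType 0x0800, then opaque layer-3 bytes
frame = add_fcs(bytes.fromhex("ffffffffffff" "020000000001" "0800") + b"any layer-3 bytes")
print(fcs_ok(frame))  # True

corrupted = bytearray(frame)
corrupted[20] ^= 0x01  # flip a single bit in the payload
print(fcs_ok(bytes(corrupted)))  # False: the frame would be dropped
```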
At layer 3, IPv4 has a header checksum that layer-3 devices, e.g. routers or hosts, inspect to verify the integrity of the IPv4 header, discarding any packets with a bad header checksum. IPv6 has done away with the header checksum entirely. Layer-3 protocols do not care which layer-2 protocol carries their packets, nor which layer-4 protocols they carry, and they are not aware of any layer-2 or layer-4 checksums.
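A minimal sketch of the IPv4 header checksum (the one's-complement sum described in RFC 791 and RFC 1071) may help show how a router or host checks only the IPv4 header. The example header is an arbitrary 20-byte IPv4 header (192.168.0.1 to 192.168.0.199, TTL 64, protocol UDP), and the function names are hypothetical.

```python
def ipv4_header_checksum(header: bytes) -> int:
    """One's complement of the one's-complement sum of the header's 16-bit words.

    When computing a fresh checksum, the checksum field (bytes 10-11) must be zero.
    """
    if len(header) % 2:
        raise ValueError("IPv4 header length must be even")
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total > 0xFFFF:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_header_ok(header: bytes) -> bool:
    """With a valid checksum in place, the same computation yields 0."""
    return ipv4_header_checksum(header) == 0


header = bytearray.fromhex("45000073000040004011" "0000" "c0a80001c0a800c7")
csum = ipv4_header_checksum(bytes(header))
print(hex(csum))  # 0xb861
header[10:12] = csum.to_bytes(2, "big")
print(ipv4_header_ok(bytes(header)))  # True
```

Because the TTL lives in this header, every router that decrements it must also update the checksum, which is one of the per-hop costs IPv6 removed.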
Layer-4 protocols, e.g. TCP and UDP, may have a checksum. In IPv4, the UDP checksum is optional, but it is mandatory with IPv6. A layer-4 protocol will verify its own checksum, and it will discard any datagrams with a bad layer-4 checksum. Layer-4 protocols are unaware of any layer-2 or layer-3 checksums.
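The sketch below shows UDP computing and verifying its own checksum over IPv6, including the pseudo-header defined in RFC 8200 (section 8.1). The addresses come from the 2001:db8::/32 documentation prefix, and `build_udp` and `udp_checksum_ok` are hypothetical helpers, not a library API.

```python
import ipaddress


def _ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum (RFC 1071), padding odd-length data with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total > 0xFFFF:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total


def _pseudo_header(src: str, dst: str, udp_length: int) -> bytes:
    """IPv6 pseudo-header that the UDP checksum covers."""
    return (
        ipaddress.IPv6Address(src).packed
        + ipaddress.IPv6Address(dst).packed
        + udp_length.to_bytes(4, "big")  # upper-layer packet length
        + bytes(3)                       # three zero bytes
        + bytes([17])                    # next header: UDP
    )


def build_udp(src: str, dst: str, src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Sender side: build a UDP datagram and fill in its checksum."""
    length = 8 + len(payload)
    header = src_port.to_bytes(2, "big") + dst_port.to_bytes(2, "big") + length.to_bytes(2, "big")
    unsummed = header + b"\x00\x00" + payload  # checksum field is zero while summing
    checksum = ~_ones_complement_sum(_pseudo_header(src, dst, length) + unsummed) & 0xFFFF
    # A computed 0 is sent as 0xFFFF, because 0 on the wire means "no checksum",
    # which IPv4 allows but IPv6 generally does not.
    checksum = checksum or 0xFFFF
    return header + checksum.to_bytes(2, "big") + payload


def udp_checksum_ok(src: str, dst: str, datagram: bytes) -> bool:
    """Receiver side: pseudo-header + datagram, checksum included, must sum to 0xFFFF."""
    return _ones_complement_sum(_pseudo_header(src, dst, len(datagram)) + datagram) == 0xFFFF


dgram = build_udp("2001:db8::1", "2001:db8::2", 5353, 53, b"hello")
print(udp_checksum_ok("2001:db8::1", "2001:db8::2", dgram))  # True
print(udp_checksum_ok("2001:db8::1", "2001:db8::3", dgram))  # False: wrong destination
```

Because the pseudo-header pulls in the IPv6 addresses, a datagram delivered to the wrong address fails the check, even though IPv6 itself never looks at the UDP checksum.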
Layer-2 switches are oblivious to anything above layer 2. Layer-2 protocols carry a variety of layer-3 protocols (IPX, IPv4, IPv6, AppleTalk, etc.). The layer-2 switch knows nothing about this; it only looks at the layer-2 frame header.
The layer-3 protocol can carry a variety of layer-4 protocols (TCP, UDP, etc.). To see which layer-4 protocol a frame is carrying, the layer-2 switch would need to strip the layer-2 header and look inside the layer-3 packet, and it doesn't do that.
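To illustrate the difference, here is a rough sketch of what a plain layer-2 switch reads (the 14-byte ethernet header) versus the extra parsing needed to find the layer-4 protocol. It assumes an untagged ethernet II frame carrying IPv4; the frame bytes and the functions `switch_view` and `layer4_protocol` are hypothetical.

```python
ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x86DD: "IPv6"}
IP_PROTOCOLS = {6: "TCP", 17: "UDP"}


def switch_view(frame: bytes) -> dict:
    """The fields in the 14-byte ethernet header; a plain switch forwards on dst_mac alone."""
    ethertype = int.from_bytes(frame[12:14], "big")
    return {
        "dst_mac": frame[0:6].hex(":"),
        "src_mac": frame[6:12].hex(":"),
        # The EtherType names the layer-3 protocol, not the layer-4 one
        "ethertype": ETHERTYPES.get(ethertype, hex(ethertype)),
    }


def layer4_protocol(frame: bytes) -> str:
    """Work a layer-2 switch does NOT do: read past the ethernet header into the
    IPv4 header, whose Protocol field (byte 9) names the layer-4 protocol."""
    if int.from_bytes(frame[12:14], "big") != 0x0800:
        raise ValueError("this sketch only handles IPv4")
    protocol = frame[14 + 9]
    return IP_PROTOCOLS.get(protocol, f"protocol {protocol}")


frame = bytes.fromhex(
    "020000000002" "020000000001" "0800"        # ethernet header
    "450000730000400040110000c0a80001c0a800c7"  # IPv4 header, protocol 17
)
print(switch_view(frame))
print(layer4_protocol(frame))  # UDP
```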
Routers (layer-3 devices, including layer-3 switches) strip the layer-2 frame to reveal the layer-3 packet. A router then switches the layer-3 packet, based on its layer-3 header, to a new interface, and creates a new layer-2 frame for that interface.
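A simplified sketch of the router's layer-3 decision, using longest-prefix match on the destination address and a TTL decrement, is shown below. The routing table, interface names, and the `forward` function are hypothetical; a real router would also resolve the next hop's MAC address to build the new frame.

```python
import ipaddress

# Hypothetical routing table: (prefix, outgoing interface, next hop or None if directly connected)
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "eth0", "203.0.113.1"),
    (ipaddress.ip_network("10.0.0.0/8"), "eth1", "10.255.0.1"),
    (ipaddress.ip_network("10.1.2.0/24"), "eth2", None),
]


def lookup(dst: str):
    """Longest-prefix match on the destination IP address from the layer-3 header."""
    addr = ipaddress.ip_address(dst)
    matches = [route for route in ROUTES if addr in route[0]]
    return max(matches, key=lambda route: route[0].prefixlen) if matches else None


def forward(dst: str, ttl: int):
    """Return (interface, next_hop, new_ttl), or None if the packet is dropped."""
    if ttl <= 1:
        return None  # TTL would reach 0; a real router would also send ICMP Time Exceeded
    route = lookup(dst)
    if route is None:
        return None
    _prefix, iface, next_hop = route
    # Directly connected: the new frame is addressed to the destination host itself
    return iface, next_hop or dst, ttl - 1


print(forward("10.1.2.7", 64))      # ('eth2', '10.1.2.7', 63)
print(forward("10.9.9.9", 64))      # ('eth1', '10.255.0.1', 63)
print(forward("198.51.100.5", 64))  # ('eth0', '203.0.113.1', 63)
```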
A layer-3 switch is really a layer-2 switch, but with a router built in. The routing part of a layer-3 switch only gets involved when a frame is destined to a layer-3 interface in the switch.
Edit to answer your comment:
An application will send data to UDP, which encapsulates the data in layer-4 datagrams by adding a UDP header, and then sends them to IP (either IPv4 or IPv6). IP encapsulates the UDP datagrams inside IP packets by adding an IP header. IP sends the packets to ethernet, which encapsulates the IP packets inside ethernet frames by adding an ethernet frame header. The ethernet frame header has the source and destination MAC addresses.
Your host may not know the MAC address of the destination host. With IPv4, it will look in its ARP cache for the MAC address of the host with the destination IP address. If it is in the ARP cache, it uses that MAC address. If not, it will broadcast an ARP request, asking for the MAC address of the owner of the IP address; the destination host will reply with its MAC address, and the frame is built. (IPv6 uses Neighbor Discovery instead of ARP, but the idea is the same.) The frame is then sent out the physical interface to the switch.
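The ARP step for IPv4 can be sketched as a cache lookup that falls back to building a broadcast ARP request in the RFC 826 format. The class `ArpCache`, its methods, and the addresses are hypothetical, and sending or receiving frames is left out.

```python
import ipaddress

BROADCAST = bytes.fromhex("ffffffffffff")


def build_arp_request(my_mac: bytes, my_ip: str, target_ip: str) -> bytes:
    """Ethernet frame carrying an ARP request for target_ip's MAC address."""
    arp = (
        (1).to_bytes(2, "big")            # hardware type: ethernet
        + (0x0800).to_bytes(2, "big")     # protocol type: IPv4
        + bytes([6, 4])                   # hardware / protocol address lengths
        + (1).to_bytes(2, "big")          # operation: 1 = request
        + my_mac + ipaddress.IPv4Address(my_ip).packed         # sender MAC / IP
        + bytes(6) + ipaddress.IPv4Address(target_ip).packed   # target MAC unknown
    )
    # Broadcast destination, our source MAC, EtherType 0x0806 = ARP
    # (the NIC pads this 42-byte frame to the 60-byte ethernet minimum)
    return BROADCAST + my_mac + (0x0806).to_bytes(2, "big") + arp


class ArpCache:
    def __init__(self, my_mac: bytes, my_ip: str):
        self.my_mac, self.my_ip = my_mac, my_ip
        self.entries = {}  # IP address -> MAC address

    def resolve(self, ip: str):
        """Return (mac, None) on a cache hit, or (None, request_frame) to broadcast."""
        mac = self.entries.get(ip)
        if mac is not None:
            return mac, None
        return None, build_arp_request(self.my_mac, self.my_ip, ip)

    def learn(self, ip: str, mac: bytes) -> None:
        """Record the MAC address carried in an ARP reply."""
        self.entries[ip] = mac


cache = ArpCache(bytes.fromhex("020000000001"), "192.168.0.1")
mac, request = cache.resolve("192.168.0.199")                 # miss: broadcast request
cache.learn("192.168.0.199", bytes.fromhex("020000000002"))   # the reply arrives
mac, request = cache.resolve("192.168.0.199")                 # hit: the frame can be built
```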
The next part involves the switch. A switch will build a MAC address table that maps MAC addresses to ports. Every time a switch receives a frame on a port, it will update its table with the frame's source MAC address and the port it arrived on. When the switch receives a frame from your host, it will look up the destination MAC address in its MAC address table so that it can deliver the frame out the port where that MAC address lives. If it can't find that MAC address in its table, it will flood the frame out all ports except the one it arrived on.
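The learning-and-forwarding behavior can be sketched in a few lines. This is a simplified model with no table aging and no VLANs; the class name `LearningSwitch`, the port numbers, and the MAC addresses are hypothetical.

```python
class LearningSwitch:
    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port it was last seen on

    def receive(self, src_mac: str, dst_mac: str, in_port: int) -> list:
        """Return the list of ports the frame is sent out of."""
        # Learn: the SOURCE address tells the switch where that host lives
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown unicast (or broadcast, which is never learned as a source):
            # flood out every port except the one the frame arrived on
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination is on the ingress segment; filter the frame
        return [out_port]


sw = LearningSwitch([1, 2, 3, 4])
a, b = "02:00:00:00:00:01", "02:00:00:00:00:02"
print(sw.receive(a, b, 1))  # [2, 3, 4]  b is unknown, so flood
print(sw.receive(b, a, 2))  # [1]        a was learned on port 1
print(sw.receive(a, b, 1))  # [2]        b was learned on port 2
```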
When the destination host receives the frame, it will reverse the encapsulation process.
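Reversing the encapsulation can be sketched as peeling one header per layer. This assumes an untagged ethernet II / IPv4 / UDP frame whose FCS the NIC has already checked and removed, and it skips IPv4 and UDP checksum verification for brevity; `decapsulate` and the sample frame are hypothetical.

```python
def decapsulate(frame: bytes) -> dict:
    """Peel an ethernet II / IPv4 / UDP frame back down to the application data."""
    # Layer 2: 14-byte ethernet header
    dst_mac, ethertype = frame[0:6], int.from_bytes(frame[12:14], "big")
    if ethertype != 0x0800:
        raise ValueError("not IPv4")
    packet = frame[14:]
    # Layer 3: IHL gives the IPv4 header length in 32-bit words
    ihl = (packet[0] & 0x0F) * 4
    total_length = int.from_bytes(packet[2:4], "big")
    if packet[9] != 17:
        raise ValueError("not UDP")
    datagram = packet[ihl:total_length]  # also drops any ethernet padding
    # Layer 4: 8-byte UDP header
    dst_port = int.from_bytes(datagram[2:4], "big")
    udp_length = int.from_bytes(datagram[4:6], "big")
    return {"dst_mac": dst_mac.hex(":"), "dst_port": dst_port, "data": datagram[8:udp_length]}


# Checksums are left as zero because this sketch does not verify them
frame = (
    bytes.fromhex("020000000002" "020000000001" "0800")
    + bytes.fromhex("45000021000040004011" "0000" "c0a80001c0a800c7")
    + bytes.fromhex("04d20035000d0000") + b"hello"
)
frame += bytes(60 - len(frame))  # minimum-size padding added by the sender
print(decapsulate(frame))  # {'dst_mac': '02:00:00:00:00:02', 'dst_port': 53, 'data': b'hello'}
```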
As you can see, each layer is independent of the other layers, and the switch knows nothing about IP or UDP.
The process is the same for a destination on a different network, except that your host will use the MAC address of its configured gateway (router, including the routing part of a layer-3 switch). The router will strip the frame, look at the packet, switch the packet to a different network interface, and build a new frame for the new interface.
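The host's choice of whose MAC address to resolve can be sketched as a simple on-link test against its own subnet. Real hosts consult a routing table rather than a single gateway, so treat `next_hop_for` and the addresses as hypothetical.

```python
import ipaddress


def next_hop_for(dst_ip: str, my_interface: str, default_gateway: str) -> str:
    """Pick the IP address whose MAC the host must resolve for this destination.

    my_interface is the host's address with its prefix length, e.g. "192.168.0.1/24".
    """
    iface = ipaddress.ip_interface(my_interface)
    if ipaddress.ip_address(dst_ip) in iface.network:
        return dst_ip          # same network: resolve the destination itself
    return default_gateway     # different network: resolve the gateway instead


print(next_hop_for("192.168.0.199", "192.168.0.1/24", "192.168.0.254"))  # 192.168.0.199
print(next_hop_for("198.51.100.7", "192.168.0.1/24", "192.168.0.254"))   # 192.168.0.254
```

Either way, the destination IP address in the packet stays the same; only the destination MAC address in the frame changes.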