You have to remember that models like OSI are just that, models. They are theoretical. The real world doesn't fall neatly into these models. For the most part, routing is a layer-3 function, but, as you pointed out, BGP uses a layer-4 protocol to communicate with other BGP speakers in order to do what is normally considered a layer-3 function.
Many network protocols fall into a gray area, or are considered in one layer while using another layer. Take ARP for instance. It resolves layer-3 addresses to layer-2 addresses. Which layer should it be considered to be in?
Understanding the models is useful, but the models are not mandated by any organization, and you are free to create protocols and functions that do not follow any model.
You seem to have the general idea.
First, there really is no Physical Layer header. The Physical Layer simply takes what the Data-Link Layer gives it, and it encodes the Data-Link frame (encoding) for the interface and places that on the "wire" (signalling), or it performs the opposite when receiving data.
When layer-3 sends a packet to layer-2, layer-2 needs to encapsulate that with a layer-2 header. Part of the layer-2 header may include a MAC address for layer-2 protocols that use MAC addresses (not all do, and of those that do, some use 48-bit MAC addresses, and some use 64-bit MAC addresses).
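To make the encapsulation step concrete, here is a minimal sketch assuming Ethernet II framing with 48-bit MAC addresses. The helper names `mac_to_bytes` and `encapsulate` are made up for illustration, and the preamble, minimum-size padding, and trailing frame check sequence are left out.

```python
import struct

def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into its 6 raw bytes."""
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError("expected a 48-bit MAC address")
    return raw

def encapsulate(dst_mac: str, src_mac: str, ethertype: int, packet: bytes) -> bytes:
    """Prepend a 14-byte Ethernet II header to a layer-3 packet."""
    header = (
        mac_to_bytes(dst_mac)            # destination MAC (6 bytes)
        + mac_to_bytes(src_mac)          # source MAC (6 bytes)
        + struct.pack("!H", ethertype)   # EtherType, network byte order
    )
    return header + packet
```

The layer-3 packet is carried unchanged as the payload; layer-2 only adds its own header in front of it.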
In order to put a destination MAC address in the layer-2 frame, the destination layer-3 address must be resolved to a layer-2 address. Layer-2 needs the destination layer-2 MAC address in order to build the layer-2 frame to encapsulate the layer-3 packet. That is where ARP (Address Resolution Protocol) comes in.
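As a rough sketch of that resolution step for IPv4 over Ethernet: the host first checks its ARP cache, and only on a miss does it build an ARP request to broadcast. The cache here is a plain dictionary and the function names are hypothetical; a real stack also handles timeouts, retries, and queuing of the waiting packet.

```python
import socket
import struct

def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Build the 28-byte ARP request body for IPv4 over Ethernet."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length
        4,                            # protocol address length
        1,                            # operation: request
        sender_mac,                   # sender hardware address
        socket.inet_aton(sender_ip),  # sender protocol address
        b"\x00" * 6,                  # target hardware address (unknown)
        socket.inet_aton(target_ip),  # target protocol address
    )

def resolve(cache: dict, target_ip: str, sender_mac: bytes, sender_ip: str):
    """Return (mac, None) on a cache hit, or (None, arp_request) on a miss."""
    mac = cache.get(target_ip)
    if mac is not None:
        return mac, None
    return None, build_arp_request(sender_mac, sender_ip, target_ip)
```

On a miss, the request would be sent in a frame addressed to the layer-2 broadcast address, and the reply would populate the cache.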
ARP has to do with data leaving the host, not coming into the host where the headers are stripped. Depending on the network stack implementation, as a layer-2 frame comes into a host, the source MAC address may get saved in the host's ARP cache as the frame header is stripped from the packet. Layer-2 will inspect the layer-2 frame to determine to which layer-3 protocol in the network stack the frame payload (layer-3 packet) should be sent.
The EtherType field is a field in Ethernet frame headers. Other layer-2 protocols have other ways of doing this. Remember, Ethernet may be the most used layer-2 protocol, but it is not the only layer-2 protocol, and each has its own frame headers. Layer-3 protocols can have the same type of thing. For instance, IPv4 has the Protocol field, and IPv6 has the Next Header field, to tell layer-3 to which layer-4 protocol the payload of the layer-3 packet should be sent.
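The demultiplexing described above can be sketched as follows. This assumes an untagged Ethernet II frame (no 802.1Q VLAN tag) and ignores IPv6 extension headers; the lookup tables list only a few well-known values, and `classify` is a made-up name.

```python
import struct

ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x86DD: "IPv6"}
IP_PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP", 58: "ICMPv6"}

def classify(frame: bytes):
    """Return (layer-3 protocol, layer-4 protocol) names for a raw frame."""
    # Ethernet II header: destination MAC, source MAC, EtherType.
    _dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    l3 = ETHERTYPES.get(ethertype, "unknown")
    l4 = None
    packet = frame[14:]
    if ethertype == 0x0800 and len(packet) >= 20:
        # IPv4: the Protocol field is the byte at offset 9 of the header.
        l4 = IP_PROTOCOLS.get(packet[9], "unknown")
    elif ethertype == 0x86DD and len(packet) >= 40:
        # IPv6: the Next Header field is the byte at offset 6 of the header.
        l4 = IP_PROTOCOLS.get(packet[6], "unknown")
    return l3, l4
```

Each layer reads only its own header to decide where to hand the payload: layer-2 looks at the EtherType, and layer-3 looks at Protocol or Next Header.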
Best Answer
Simply put, different layers of the OSI model have checksums so you can assign blame appropriately.
Case 1: Only use Ethernet checksums
For the sake of argument, let's assume the TCP payload is corrupted inside the sending host, before it reaches the NIC, so the Ethernet checksum (computed over the already corrupted data) is fine. If we rely only on Ethernet (i.e. OSI Layer 2) checksums, that error goes unnoticed until something crashes or throws an error, because the Ethernet NIC simply transmits the (already corrupted) data that it received from the operating system's IP stack.
When the IP stack on the other side receives the Ethernet frame, it unpacks the IP payload and delivers it to the web server. However, the TCP payload in this packet is corrupted. When the web server crashes from data corruption, the developer has no way to isolate whether this was an IP-level failure or a TCP failure (or perhaps something else farther up the application stack).
Case 2: Layered checksums
However, if TCP, IP and Ethernet all have checksums, we can isolate the layer where the error occurred, and notify the appropriate Operating System or application component of the error.
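A small sketch of the difference between the two cases, under simplifying assumptions: the layer-4 check uses the 16-bit ones' complement Internet checksum over the payload only (real TCP also covers a pseudo-header and the TCP header), and the layer-2 check uses `zlib.crc32`, which uses the same CRC-32 polynomial as the Ethernet frame check sequence. The function `check_layers` is hypothetical.

```python
import zlib

def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                            # pad to whole 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)   # fold carries back in
    return ~total & 0xFFFF

def check_layers(original: bytes, handed_to_nic: bytes):
    """Return (layer2_ok, layer4_ok) as seen by the receiver."""
    l4_sum = internet_checksum(original)   # computed by the sender's transport layer
    fcs = zlib.crc32(handed_to_nic)        # computed by the NIC over what it was given
    received = handed_to_nic               # assume the wire itself is error-free
    return zlib.crc32(received) == fcs, internet_checksum(received) == l4_sum

segment = b"GET /index.html"
corrupted = bytearray(segment)
corrupted[0] ^= 0x01                       # bit flipped in host memory before framing
print(check_layers(segment, bytes(corrupted)))   # (True, False)
```

The layer-2 check passes because it was computed after the damage was done, while the layer-4 check fails, which points at a problem between the sender's transport layer and its NIC rather than on the wire.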