In general, communication standards only describe what is visible to other hosts on the network. API standards usually only describe the interface between the OS and applications.
It is clearly necessary to pass some metadata (sizes, addresses, etc.) between the "transport" and "internet" layers, but exactly how this is done is an implementation detail.
You seem to have the general idea.
First, there really is no Physical Layer header. The Physical Layer simply takes the frame the Data-Link Layer gives it, encodes it for the interface (encoding), and places it on the "wire" (signalling); when receiving data, it does the reverse.
When layer-3 sends a packet to layer-2, layer-2 needs to encapsulate that with a layer-2 header. Part of the layer-2 header may include a MAC address for layer-2 protocols that use MAC addresses (not all do, and of those that do, some use 48-bit MAC addresses, and some use 64-bit MAC addresses).
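To make the encapsulation step concrete, here is a minimal sketch for the Ethernet II case with 48-bit MAC addresses. The function names and the example addresses are made up for illustration; the preamble, padding to the minimum frame size, and the frame check sequence are left out, since those are normally handled further down.

```python
import struct

ETHERTYPE_IPV4 = 0x0800  # EtherType value that identifies an IPv4 payload


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    raw = bytes.fromhex(mac.replace(":", ""))
    if len(raw) != 6:
        raise ValueError("expected a 48-bit MAC address")
    return raw


def encapsulate(dst_mac: str, src_mac: str, ethertype: int, packet: bytes) -> bytes:
    """Prepend a 14-byte Ethernet II header to a layer-3 packet."""
    header = struct.pack(
        "!6s6sH",               # network byte order: dst MAC, src MAC, EtherType
        mac_to_bytes(dst_mac),
        mac_to_bytes(src_mac),
        ethertype,
    )
    return header + packet


# Hypothetical addresses and payload, purely for illustration.
frame = encapsulate("aa:bb:cc:dd:ee:ff", "11:22:33:44:55:66",
                    ETHERTYPE_IPV4, b"layer-3 packet")
```

Note that layer-2 never looks inside `packet`; it only needs the destination address and the payload type to build the header.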
In order to put a destination MAC address in the layer-2 frame, the destination layer-3 address must be resolved to a layer-2 address. Layer-2 needs the destination layer-2 MAC address in order to build the layer-2 frame to encapsulate the layer-3 packet. That is where ARP (Address Resolution Protocol) comes in.
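As a rough sketch of that resolution step, assuming IPv4 over Ethernet: the host first checks its ARP cache, and on a miss it would broadcast an ARP request like the one built below. The cache contents, addresses, and function names are hypothetical; a real stack also handles timeouts, replies, and queuing of the waiting packet.

```python
import socket
import struct

# Hypothetical ARP cache: layer-3 address -> layer-2 address.
arp_cache = {"192.0.2.1": "aa:bb:cc:dd:ee:ff"}


def resolve(ip: str):
    """Return the cached MAC for ip, or None if an ARP request is needed."""
    return arp_cache.get(ip)


def build_arp_request(src_mac: bytes, src_ip: str, target_ip: str) -> bytes:
    """Build the 28-byte ARP request body for IPv4 over Ethernet."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length in bytes
        4,                            # protocol address length in bytes
        1,                            # opcode: request
        src_mac,                      # sender hardware address
        socket.inet_aton(src_ip),     # sender protocol address
        b"\x00" * 6,                  # target hardware address (unknown yet)
        socket.inet_aton(target_ip),  # target protocol address
    )


request = build_arp_request(bytes.fromhex("112233445566"),
                            "192.0.2.10", "192.0.2.1")
```

The request itself travels as the payload of a broadcast layer-2 frame, so it needs no prior resolution.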
ARP has to do with data leaving the host, not coming into the host where the headers are stripped. Depending on the network stack implementation, as a layer-2 frame comes into a host, the source MAC address may get saved in the host's ARP cache as the frame header is stripped from the packet. Layer-2 will inspect the layer-2 frame to determine to which layer-3 protocol in the network stack the frame payload (layer-3 packet) should be sent.
The EtherType field is a field in Ethernet frame headers. Other layer-2 protocols have other ways of doing this. Remember, Ethernet may be the most used layer-2 protocol, but it is not the only layer-2 protocol, and each has its own frame header. Layer-3 protocols can have the same type of thing. For instance, IPv4 has the Protocol field, and IPv6 has the Next Header field, to tell layer-3 to which layer-4 protocol the payload of the layer-3 packet should be sent.
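The demultiplexing on the receiving side could be sketched roughly like this. It assumes untagged Ethernet II frames and ignores IPv6 extension headers and VLAN tags; the lookup tables hold only a few well-known values, and the `classify` function is invented for the example.

```python
import struct

# A few well-known values; real stacks know many more.
ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x86DD: "IPv6"}
IP_PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP", 58: "ICMPv6"}


def classify(frame: bytes):
    """Return (layer-3 protocol, layer-4 protocol or None) for a frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet II header")
    # EtherType sits right after the two 6-byte MAC addresses.
    (ethertype,) = struct.unpack("!H", frame[12:14])
    l3 = ETHERTYPES.get(ethertype, "unknown")
    packet = frame[14:]
    l4 = None
    if l3 == "IPv4" and len(packet) >= 20:
        l4 = IP_PROTOCOLS.get(packet[9], "unknown")   # Protocol field
    elif l3 == "IPv6" and len(packet) >= 40:
        l4 = IP_PROTOCOLS.get(packet[6], "unknown")   # Next Header field
    return l3, l4


# Minimal fake IPv4 header: version/IHL byte, then Protocol = 17 at offset 9.
ipv4_header = bytes([0x45]) + bytes(8) + bytes([17]) + bytes(10)
example = bytes(12) + b"\x08\x00" + ipv4_header
```

Here `classify(example)` yields `("IPv4", "UDP")`: each layer reads one field of its own header to decide who gets the payload next.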
Best Answer
These layers are just abstract concepts. They don't actively do anything on their own. Instead, such models are a tool to deal with complexity, to reach a common understanding of the functionality, and to structure the code in a way that can also be understood and managed by others.
This means there is no "transport layer know ...". Instead, there is a specific implementation (usually in the OS) that deals with all the different layers and can easily exchange information between them, such as the IP address.
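One well-known place where this sharing shows up is the UDP checksum over IPv4, which is computed over a "pseudo-header" containing the source and destination IP addresses. The sketch below shows the idea only; the function names are made up, and it assumes the checksum field inside `udp_segment` is zero when the checksum is computed.

```python
import socket
import struct


def ones_complement_sum16(data: bytes) -> int:
    """Sum 16-bit big-endian words with end-around carry."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudo-header: addresses, zero byte, protocol 17, UDP length."""
    return struct.pack("!4s4sBBH", socket.inet_aton(src_ip),
                       socket.inet_aton(dst_ip), 0, 17, udp_length)


def udp_checksum(src_ip: str, dst_ip: str, udp_segment: bytes) -> int:
    """Checksum of a UDP segment whose checksum field is currently zero."""
    data = pseudo_header(src_ip, dst_ip, len(udp_segment)) + udp_segment
    checksum = ~ones_complement_sum16(data) & 0xFFFF
    return checksum or 0xFFFF                # 0 on the wire means "no checksum"


# Hypothetical segment: ports 12345 -> 53, length, zero checksum, payload.
payload = b"hello"
segment = struct.pack("!HHHH", 12345, 53, 8 + len(payload), 0) + payload
checksum = udp_checksum("192.0.2.10", "192.0.2.1", segment)
```

The transport code cannot compute this value without the IP addresses, so in practice the implementation simply hands them across the conceptual layer boundary.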