Depending on the actual "Metro Ethernet" service that your carrier is providing, you have several possible solutions. I'll address what I see as the most likely scenario, and some of the solutions in that scenario.
Your carrier is probably using Q-in-Q tagging, and your local VLANs are irrelevant. (See the Wikipedia page on 802.1ad for info on Q-in-Q, or this Cisco config guide on VLAN tunneling.)
This situation, where the carrier is using Q-in-Q, is usually the case in my experience. They will accept whatever VLANs you send, then apply an additional (outer) Q-in-Q tag and send the traffic across their network. So inside the carrier network, your traffic destined for Site-A could be carried with an outer tag of VLAN 10. When the frame arrives at the far-end PE equipment, that additional VLAN tag is stripped and the frame is forwarded to your equipment with the original VLAN tagging intact.
It is also possible that the carrier is using the VLAN tags you apply to direct the traffic (i.e. VLAN 10 for Site-A and VLAN 20 for Site-B).
The easiest solution: tell your carrier that they have to choose different VLANs for this traffic-engineering purpose. You are the customer! Their sales engineers should have gathered the appropriate information to make sure there was no overlap before designing this service for you. Don't accept the circuits until they resolve the issue. If they are using Q-in-Q, they only need to know which VLAN goes to which location for administrative purposes, not for any technical reason, and they should be able to change their configuration.
More complicated solution: investigate Q-in-Q tagging/VLAN tunneling for yourself. Depending on your hardware and licensed capabilities, you could maintain your locally significant VLAN tags and then slap another tag on the frame for the carrier. When the frame arrives at your destination, strip the extra tag off and send the frame on its way based on the original VLAN.
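To make the "slap another tag on, strip it at the far end" idea concrete, here is a minimal sketch in Python of what happens to the frame bytes. It is only an illustration, not a switch configuration: the MAC addresses, the payload and the outer VLAN 300 are made up, and it assumes the standard 802.1ad outer TPID of 0x88A8 (some equipment uses a different outer TPID, so check what your carrier expects).

```python
import struct

TPID_CTAG = 0x8100  # 802.1Q customer (inner) tag
TPID_STAG = 0x88A8  # 802.1ad service (outer) tag; assumed here, some gear differs


def make_tag(tpid, vlan_id, pcp=0, dei=0):
    """Build a 4-byte VLAN tag: 16-bit TPID followed by 16-bit TCI."""
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID must fit in 12 bits")
    tci = (pcp << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", tpid, tci)


def push_outer_tag(frame, outer_vlan):
    """Insert an outer tag right after the destination and source MACs (12 bytes)."""
    return frame[:12] + make_tag(TPID_STAG, outer_vlan) + frame[12:]


def pop_outer_tag(frame):
    """Remove the outer tag and return (outer_vlan, original_frame)."""
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_STAG:
        raise ValueError("frame has no 802.1ad outer tag")
    return tci & 0x0FFF, frame[:12] + frame[16:]


# Hypothetical customer frame: dst MAC, src MAC, inner tag VLAN 10, IPv4 Ethertype, payload
customer_frame = (
    bytes.fromhex("ffffffffffff")
    + bytes.fromhex("020000000001")
    + make_tag(TPID_CTAG, 10)
    + struct.pack("!H", 0x0800)
    + b"payload"
)

tunnelled = push_outer_tag(customer_frame, 300)   # what crosses the carrier
outer_vlan, restored = pop_outer_tag(tunnelled)   # what the far site receives
```

The inner VLAN 10 tag is never touched, which is why your local VLAN numbering stays locally significant. Note that the extra tag makes every frame 4 bytes longer, so the carrier circuit has to accept the slightly larger frames.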
With all of that stated, there may be some other scenario where they HAVE to use VLANs 10 and 20. Ask your carrier for the explanation as to why this is the case.
If your carrier is difficult to work with in this scenario (won't provide an explanation, or won't work around your local VLAN structure), imagine what they'll be like during a service outage.
Always use the install process to test your service provider! If customer service isn't on their radar, you should be leery of their services. That is to say, if they perform poorly on the install, you usually have more of the same "quality service" to look forward to for the length of your entire contract.
Best Answer
So, you want to tunnel Ethernet frames over an IP network that has an IPsec link? That works like it would over any IP network, but you have to be careful with the MTU (as always). IPsec links usually have a maximum transmission unit (MTU) lower than 1500, but with Ethernet over IP you will run into MTU problems anyway, because the tunnel headers take up part of each packet.
The solution may be VXLAN, specified in RFC 7348. However, do note that because VXLAN operates over UDP, there is a noticeable amount of overhead: the encapsulation adds roughly 50 bytes (outer IPv4, UDP and VXLAN headers plus the inner Ethernet header) before any IPsec overhead. If the IPsec link has an MTU of 1500, then the IPsec, UDP and VXLAN overheads together mean the Ethernet link has an MTU smaller than 1500. To have an MTU of 1500 on the Ethernet link, you will need an MTU bigger than 1500 on the IPsec link, which usually isn't possible on the Internet.
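To put rough numbers on that, here is a small Python sketch of the arithmetic. The VXLAN figures are the usual header sizes for an IPv4 underlay with untagged inner frames; the IPsec overhead of 60 bytes is only a placeholder I picked for illustration, since the real value depends on the cipher, the mode and whether NAT traversal is in use.

```python
# Header sizes in bytes for VXLAN over an IPv4 underlay
OUTER_IPV4 = 20       # outer IPv4 header without options
UDP = 8               # outer UDP header
VXLAN = 8             # VXLAN header
INNER_ETHERNET = 14   # inner MAC header (add 4 if the inner frame is VLAN tagged)

VXLAN_OVERHEAD = OUTER_IPV4 + UDP + VXLAN + INNER_ETHERNET

# Placeholder only: real IPsec overhead varies with cipher, mode and NAT-T
ASSUMED_IPSEC_OVERHEAD = 60


def inner_mtu(path_mtu, ipsec_overhead=ASSUMED_IPSEC_OVERHEAD):
    """Largest IP packet the tunnelled Ethernet link can carry unfragmented."""
    return path_mtu - ipsec_overhead - VXLAN_OVERHEAD


def required_path_mtu(wanted_inner_mtu, ipsec_overhead=ASSUMED_IPSEC_OVERHEAD):
    """Underlay MTU needed to offer a given MTU on the tunnelled Ethernet link."""
    return wanted_inner_mtu + ipsec_overhead + VXLAN_OVERHEAD


print(inner_mtu(1500))          # MTU left for the Ethernet link on a 1500-byte path
print(required_path_mtu(1500))  # path MTU needed for a full 1500-byte Ethernet MTU
```

With these assumed values, a 1500-byte path leaves 1390 bytes for the tunnelled Ethernet link, and a full 1500-byte Ethernet MTU would need a 1610-byte path.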
Note that because VXLAN presents a layer 2 link, it has no way to generate ICMP "packet too big" messages (which are layer 3 messages). This means that you have to manually configure a smaller MTU on the Ethernet link, or you will otherwise have dropped packets (= no connectivity) or fragmented packets (= performance problems).
GRE (Generic Routing Encapsulation), specified in RFC 2784, can also be used to transfer Ethernet frames (transparent Ethernet bridging, Ethertype 0x6558), but firewalls may not like GRE running directly on top of IP as much as they like VXLAN running on top of UDP. However, GRE is an industry standard that is almost universally supported, so most good-quality firewalls from reputable vendors should offer the possibility of allowing GRE traffic.
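For reference, the GRE encapsulation of an Ethernet frame is tiny. The sketch below, in Python, builds only the basic 4-byte RFC 2784 header (no checksum, version 0) with the transparent Ethernet bridging protocol type; the function names and the sample frame are mine, and a real implementation would hand the result to the IP layer as protocol 47.

```python
import struct

ETHERTYPE_TEB = 0x6558  # transparent Ethernet bridging


def gre_encapsulate(ethernet_frame):
    """Prepend a basic GRE header: 16 bits of flags/version (all zero), 16-bit protocol type."""
    return struct.pack("!HH", 0, ETHERTYPE_TEB) + ethernet_frame


def gre_decapsulate(gre_packet):
    """Check the basic GRE header and return the bridged Ethernet frame."""
    flags_version, protocol = struct.unpack("!HH", gre_packet[:4])
    if flags_version != 0:
        raise ValueError("only the basic 4-byte GRE header is handled in this sketch")
    if protocol != ETHERTYPE_TEB:
        raise ValueError("GRE payload is not a bridged Ethernet frame")
    return gre_packet[4:]


# Hypothetical untagged frame: dst MAC, src MAC, IPv4 Ethertype, payload
frame = (
    bytes.fromhex("ffffffffffff")
    + bytes.fromhex("020000000001")
    + struct.pack("!H", 0x0800)
    + b"payload"
)

packet = gre_encapsulate(frame)
```

That is 4 bytes of GRE plus the 14-byte inner Ethernet header and the outer IP header, so the same MTU considerations apply as with VXLAN.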
The MTU/fragmentation issues apply equally to all protocols running on top of IP or UDP without TCP in between.
Now, what about TCP as the transport for Ethernet frames? The traffic you're transferring through the Ethernet link probably already has one level where TCP is being used, so you would then be running TCP over TCP. This is heavily discouraged, because you then have two levels where retransmissions occur, meaning that the performance of the system can degrade catastrophically if there's packet loss. TCP would eliminate the MTU problems, but because TCP over TCP can behave catastrophically in the case of packet loss, I don't recommend it.