Regarding VxLAN Encapsulation

vxlan

I am reading up on VxLAN and understand the encapsulation process somewhat as below:

Step 1: Take your original Ethernet frame.

Step 2: Put it inside the VxLAN encapsulation.

Step 3: The VxLAN header then goes inside a UDP header.

Step 4: UDP goes inside IP (this should be the transport IP, I guess).

Step 5: IP goes inside whatever the transport is (e.g. Ethernet).

Q1: Please confirm if the above understanding is correct.

Q2: Why does the VxLAN header need to go inside UDP? Why not send it over plain IP?

Q3: Other tunnel mechanisms, like OTV, don't use any layer 4 protocol (like TCP or UDP), so why use one here? Are there any specific reasons?

Q4: Why use UDP (since it is best-effort)? Why not use TCP?

Q5: Can I look at VxLAN the way I look at HTTP or Telnet (both are applications and operate at layer 7)? HTTP uses TCP port 80 and Telnet uses TCP port 23. What I am trying to understand is whether VxLAN is an application that operates at layer 7, and how it fits into the OSI model.
Also, which OSI layer would OTV operate at, and why? One answer below says that OTV can be done using UDP rather than MPLS/GRE; does that make OTV a protocol that operates at layer 7?

I have also attached a snapshot of a packet capture of the VxLAN header from one of the video lectures.

Best Answer

Yes, your understanding of the encapsulation is correct: the original frame has a VXLAN header applied, and the result is carried in a UDP datagram, which in turn rides inside the transport IP packet.
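
To make the layering concrete, here is a minimal Python sketch of steps 2 and 3 using the header layout from RFC 7348. The function names (vxlan_header, udp_header, encapsulate) are made up for illustration, and the outer IP and Ethernet headers are deliberately left out; treat it as a sketch of the wire format, not as how any particular VTEP implements it.

```python
import struct

VXLAN_UDP_PORT = 4789         # IANA-assigned destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08   # "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags(8) reserved(24) VNI(24) reserved(8)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def udp_header(src_port: int, dst_port: int, payload_len: int) -> bytes:
    """Build the 8-byte UDP header; the length field covers header plus payload."""
    # Checksum is left at 0 (not computed), which RFC 7348 recommends for VXLAN.
    return struct.pack("!HHHH", src_port, dst_port, 8 + payload_len, 0)


def encapsulate(inner_frame: bytes, vni: int, src_port: int) -> bytes:
    """Wrap an Ethernet frame in VXLAN, then UDP (outer IP/Ethernet omitted)."""
    vxlan_payload = vxlan_header(vni) + inner_frame
    return udp_header(src_port, VXLAN_UDP_PORT, len(vxlan_payload)) + vxlan_payload


# Hypothetical values: a 60-byte dummy inner frame, VNI 5000, source port 50000.
packet = encapsulate(b"\x00" * 60, vni=5000, src_port=50000)
print(len(packet))  # inner frame plus 8 bytes of VXLAN plus 8 bytes of UDP
```

Adding a 20-byte outer IPv4 header and a 14-byte untagged outer Ethernet header on top of this gives the usual figure of 50 bytes of VXLAN overhead per frame.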

UDP is used because it is a convenient format to program against, and its source/destination ports provide a ready means both to multiplex connections and for intermediary forwarding elements to hash connections over parallel links. In short, UDP is familiar, has low overhead, and is already extremely well understood.
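
The hashing point can be sketched as follows. RFC 7348 recommends that the encapsulating device derive the UDP source port from a hash of inner-frame fields and keep it in the dynamic range 49152-65535, so that routers doing ECMP on the outer 5-tuple spread different inner flows over different links. The function name vxlan_source_port, the use of CRC32, and hashing only the inner Ethernet header are assumptions made for this example; real devices typically use their own hash and often include inner IP and port fields as well.

```python
import zlib

DYNAMIC_PORT_BASE = 49152   # start of the dynamic/private port range
DYNAMIC_PORT_COUNT = 16384  # 49152..65535 inclusive


def vxlan_source_port(inner_frame: bytes) -> int:
    """Pick an outer UDP source port from a hash of the inner Ethernet header."""
    # Inner destination MAC (6) + source MAC (6) + EtherType (2) = 14 bytes.
    flow_key = inner_frame[:14]
    # CRC32 stands in for whatever hash the hardware actually uses.
    return DYNAMIC_PORT_BASE + (zlib.crc32(flow_key) % DYNAMIC_PORT_COUNT)


# Two frames of the same inner flow (identical headers, different payloads)
# get the same source port, so they follow the same path and stay in order.
header = bytes.fromhex("aabbccddeeff" "112233445566" "0800")
port_a = vxlan_source_port(header + b"first payload")
port_b = vxlan_source_port(header + b"second payload")
print(port_a == port_b)
```

Because the port depends only on the flow key, packets of one inner flow stay on one path while different flows can land on different paths, which is the depolarization effect mentioned for OTV as well.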

OTV can run over MPLS-GRE or UDP, with UDP being the preferred mechanism for the past few years. Again, one of the big drivers was depolarization of traffic (allowing parallel paths to carry deterministically hashed fractions of the overall traffic).

Why not TCP? Excessive overhead on the encapsulating device and added latency are two big reasons. The bigger point, though, is that getting any value out of TCP would require concatenating the individual packets to be encapsulated into an overall stream managed via sliding windows, rather than simply maintaining a 1:1 mapping between inner frames and outer packets. Add in the state tracking and buffering needed to support retransmission and it becomes truly unruly.
