The correct answer is that the ethernet specification requires it.
Although you didn't ask, others may wonder why this method of connection was chosen for that type of ethernet. Keep in mind that this applies only to the point-to-point ethernet varieties, like 10base-T and 100base-T, not to the original ethernet or to ThinLan ethernet.
The problem is that ethernet can support fairly long runs such that equipment on different ends can be powered from distant branches of the power distribution network within a building or even different buildings. This means there can be significant ground offset between ethernet nodes. This is a problem with ground-referenced communication schemes, like RS-232.
There are several ways of dealing with ground offsets in communications lines, with the two most common being opto-isolation and transformer coupling. Transformer coupling was the right choice for ethernet given the tradeoffs between the methods and what ethernet was trying to accomplish. Even the earliest version of ethernet that used transformer coupling runs at 10 Mbit/s. This means, at the very least, the overall channel has to support 10 MHz digital signals, although in practice with the encoding scheme used it actually needs twice that. Even a 10 MHz square wave has levels lasting only 50 ns. That is very fast for opto-couplers. There are light transmission means that go much much faster than that, but they are not cheap or simple at each end like the ethernet pulse transformers are.
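To put numbers on the timing argument, here is a small arithmetic sketch. The helper name `level_duration_ns` is made up for illustration, and it assumes an ideal square wave with 50% duty cycle.

```python
def level_duration_ns(freq_hz: float) -> float:
    """Time one high (or low) level of an ideal 50% duty square wave lasts, in ns."""
    period_ns = 1e9 / freq_hz
    return period_ns / 2.0


# A 10 MHz square wave: each level lasts half of the 100 ns period.
print(level_duration_ns(10e6))

# If the channel has to pass twice that frequency, the levels are half as long.
print(level_duration_ns(20e6))
```

Anything in the signal path, an opto-coupler included, has to switch cleanly well within those level durations.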
One disadvantage of transformer coupling is that DC is lost. That's actually not that hard to deal with. You make sure all information is carried by modulation fast enough to make it thru the transformers. If you look at the ethernet signalling, you will see how this was considered.
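As a rough illustration of signalling with no DC content, here is a sketch of Manchester encoding, which is what 10base-T uses. The function names are mine, and the polarity (0 = high then low, 1 = low then high) is the IEEE 802.3 convention as I understand it; the DC argument holds for either polarity.

```python
def manchester_encode(bits):
    """Return half-bit line levels (+1 / -1) for a sequence of 0/1 bits.

    Assumed polarity: 0 -> high then low, 1 -> low then high.
    """
    levels = []
    for bit in bits:
        levels.extend((-1, +1) if bit else (+1, -1))
    return levels


def longest_run(levels):
    """Length of the longest stretch of identical consecutive levels."""
    if not levels:
        return 0
    longest = run = 1
    for prev, cur in zip(levels, levels[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


# Long runs of identical data bits are the worst case for DC wander.
data = [0] * 8 + [1] * 8 + [0, 1] * 4
line = manchester_encode(data)

print(sum(line) / len(line))  # average level over the frame
print(longest_run(line))      # longest time the line sits at one level, in half-bits
```

Every bit contributes one high half and one low half, so the average is zero no matter what the data is, and the line never sits at one level for more than two half-bit times. That is the kind of signal that passes through a transformer without trouble.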
There are nice advantages to transformers too, like very good common mode rejection. A transformer only "sees" the voltage across its windings, not the common voltage both ends of the winding are driven to simultaneously. You get a differential front end without a deliberate circuit, just basic physics.
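A toy model of that, assuming an ideal 1:1 transformer (the function name and the voltages are made up for illustration):

```python
def transformer_secondary(v_signal: float, v_common: float) -> float:
    """Voltage an ideal 1:1 transformer passes on, given the differential
    signal and a common voltage riding on both primary terminals."""
    v_plus = v_common + v_signal / 2.0   # one end of the primary winding
    v_minus = v_common - v_signal / 2.0  # other end of the primary winding
    # Only the voltage across the winding drives flux in the core.
    return v_plus - v_minus


# The same 2 V signal with no ground offset, 5 V of offset, and 500 V of offset.
for offset in (0.0, 5.0, 500.0):
    print(offset, transformer_secondary(2.0, offset))
```

The offset drops out completely in this idealized model. A real transformer is not quite this good, mostly because capacitance between the windings couples some fast common mode noise across.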
Once transformer coupling was decided on, it was easy to specify a high isolation voltage without creating much of a burden. Making a transformer that insulates the primary and secondary by a few hundred volts pretty much happens unless you try not to. Making it good to 1000 V isn't much harder or much more expensive. Given that, ethernet can be used to communicate between two nodes actively driven to significantly different voltages, not just to deal with a few volts of ground offset. For example, it is perfectly fine and within the standard to have one node riding on a power line phase with the other referenced to the neutral.
The first coil (from the jack point of view) in the magjack is an autotransformer needed to reject common mode noise. The coil is followed by (surprise!) a common mode choke needed to reject common mode noise too. Don't be afraid, this is a two step rejection.
What you do need to watch in your design is orientation: the side of the HX2260FNL that has the common mode choke must face the line, not the PHY. And if you will not use the Auto-Crossover (Auto-MDI/MDI-X) feature, try to find an asymmetric transformer.
This memo by Pulse Eng. (the vendor of your transformer) explains well the purpose and operating principles of both CMC and AT.
There is also a difference in what thickness of cable the connector can handle. Cat6 cable generally has thicker wires. Connectors labeled as "Cat6" likely have provisions for dealing with these thicker wires.
Take this image as an example (from a similar Server Fault question):