Answering some of the generic questions:
Are Jumbo Frames an unquestionable requirement?
For FCoE it's somewhat irrelevant: FCoE frames are handled differently, and a full-size FC frame is already 2148 bytes, so FCoE links carry larger-than-standard frames by default. You can go larger, but as with jumbo frames in general there's minimal benefit. An FCoE-capable switch will just handle this.
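To see where the 2148 figure comes from, here is a rough sketch that adds up the fields of a maximum-size native FC frame and compares the result with the standard 1500-byte Ethernet MTU. The names (`FC_FRAME_FIELDS`, `max_fc_frame_size`, `fits_standard_mtu`) are made up for illustration, and the extra FCoE/Ethernet encapsulation overhead is deliberately left out.

```python
# Field sizes in bytes for a maximum-size native Fibre Channel frame.
# (FCoE/Ethernet encapsulation overhead is intentionally not included.)
FC_FRAME_FIELDS = {
    "start_of_frame": 4,
    "header": 24,
    "payload_max": 2112,
    "crc": 4,
    "end_of_frame": 4,
}

STANDARD_ETHERNET_MTU = 1500  # default Ethernet payload size in bytes


def max_fc_frame_size(fields=FC_FRAME_FIELDS):
    """Sum the per-field sizes to get the largest native FC frame."""
    return sum(fields.values())


def fits_standard_mtu(frame_size, mtu=STANDARD_ETHERNET_MTU):
    """True if a frame of this size fits in one standard Ethernet payload."""
    return frame_size <= mtu


if __name__ == "__main__":
    size = max_fc_frame_size()
    print(size, fits_standard_mtu(size))  # 2148 False
```

Since even the bare FC frame doesn't fit in 1500 bytes, an FCoE port has to accept oversized frames no matter what you decide about jumbo frames for the rest of your traffic.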
Is it better for these modules to sit in the aggregation layer or access layer in case there's a future need to tie them together (as we have an L2 access layer not directly tied to one another with L2 adjacencies between pairs of access switches that traverse the agg switches)?
Access really is the main place to use FCoE, replacing HBAs and edge FC switches with CNAs and 10G "converged Ethernet" switches. Replacing distribution and core with FCoE can make sense when bandwidth needs justify it (or for a new build that has to run FC).
This is probably your sticking point, actually: if you're not doing 10G to the machine, you're almost certainly going to be very underwhelmed with the performance you'll get.
(Extra credit) Is FCoE the better choice over iSCSI?
It depends on what you're doing, but almost certainly not, and there may be better options than either.
At a previous job we had a nice design using NFS (the storage was NetApp). OS volumes were stored on a volume shared between the VM hosts (as normal), but data lived on individual NFS volumes, shared or separate as appropriate for the host. This worked well in our environment because one group of admins managed ops from the app down to the hardware (well, its config; rack-n-stack, swaps, etc. were handled by smart hands). Combined with NetApp's snapshots and SnapMirror, this made backups and restores trivial.
L2 Access is often needed when autonomous wireless access points are deployed across multiple switches, thus allowing users to roam from AP to AP without having to get new addresses. Current controller-based wireless solutions tunnel the user traffic to a central drop-off point, so wireless APs can be connected in any subnet so long as they can communicate with the controller.
L3 in the access layer works well when there's no need for devices connected to multiple switches to access the same subnet/broadcast domain. It eliminates the need for loop prevention (STP) and VLAN trunking configuration (such as Cisco VTP or manual VLAN configuration).
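One quick way to check whether that condition holds is to look for subnets that show up behind more than one access switch. The sketch below is only an illustration: the function name `subnets_spanning_switches`, the switch names, and the addresses are all hypothetical, and the input is assumed to be a list of (switch, interface address) pairs you have collected yourself.

```python
from collections import defaultdict
from ipaddress import ip_interface


def subnets_spanning_switches(attachments):
    """Return {subnet: [switches]} for subnets seen behind more than one switch.

    attachments: iterable of (switch_name, "address/prefixlen") pairs.
    """
    switches_by_subnet = defaultdict(set)
    for switch, addr in attachments:
        # ip_interface("10.1.1.10/24").network -> IPv4Network("10.1.1.0/24")
        switches_by_subnet[ip_interface(addr).network].add(switch)
    return {
        net: sorted(switches)
        for net, switches in switches_by_subnet.items()
        if len(switches) > 1
    }


if __name__ == "__main__":
    # Hypothetical inventory: acc1 and acc2 both have a host in 10.1.1.0/24.
    inventory = [
        ("acc1", "10.1.1.10/24"),
        ("acc2", "10.1.2.10/24"),
        ("acc2", "10.1.1.20/24"),
    ]
    for net, switches in subnets_spanning_switches(inventory).items():
        print(f"{net} spans {', '.join(switches)}")
```

An empty result suggests each subnet stays on one switch, which is the case where L3 access is a comfortable fit. Any subnet it reports would need to be split up or kept on L2.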
L3 access layers do add their own requirements for protocols and configuration. Unless you want to do static routing, you have to take on configuration of a dynamic routing protocol such as OSPF (standard) or EIGRP (Cisco).
In L3 access networks with dynamic routing it is also a good idea to summarize or tune advertised routes between layers of the network. For example, you can advertise only a default route from the Aggregation layer to the Access layer to limit the number of routes a simple access switch must learn and maintain. Turning off auto summarization on the Access switches can also help prevent a misconfiguration from causing one switch to advertise routes it should not. Auto summarization is disabled by default on many current software revisions, but is worth verifying for your switch and software version.
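As a rough illustration of both ideas, the sketch below uses Python's standard `ipaddress` module to collapse a block of contiguous access subnets into a single summary, and then does a longest-prefix-match lookup against an access-switch table that holds only a connected subnet and a default route. The addresses, next-hop labels, and the helper names `summarize` and `lookup` are assumptions for the example, not anything from a real router.

```python
from ipaddress import collapse_addresses, ip_address, ip_network


def summarize(subnets):
    """Collapse contiguous subnets into the fewest covering prefixes."""
    return list(collapse_addresses(subnets))


def lookup(table, destination):
    """Longest-prefix match: return the next hop of the most specific route."""
    addr = ip_address(destination)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    return table[max(matches, key=lambda net: net.prefixlen)]


if __name__ == "__main__":
    # Four hypothetical access subnets: 10.1.0.0/24 through 10.1.3.0/24.
    access_subnets = [ip_network(f"10.1.{i}.0/24") for i in range(4)]
    print(summarize(access_subnets))  # [IPv4Network('10.1.0.0/22')]

    # Access switch table: one connected subnet plus a default from aggregation.
    access_table = {
        ip_network("10.1.2.0/24"): "connected",
        ip_network("0.0.0.0/0"): "agg-uplink",
    }
    print(lookup(access_table, "10.1.2.5"))   # connected
    print(lookup(access_table, "192.0.2.9"))  # agg-uplink
```

The access switch only ever holds two entries here, while everything upstream sees one /22 instead of four /24s.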
Finally, access layer diameter is a factor in choosing L2 or L3. If you want to daisy-chain one access switch off another (where one switch does not connect directly to aggregation) you may need expanded features to run dynamic routing. In Cisco switches you need a more advanced IOS license to run EIGRP in non-Stub mode, allowing it to advertise routes it learned from another switch to the rest of the network.
Best Answer
VPLS, AToM, and L2TP are additional ways to glue together ports that land on distant routers. Some of them require MPLS on all the in-between routers, but L2TP (v3 in particular) does not require anything besides IP routing on the intermediary routers. Basically, they glue together two remote router ports and can usually pass spanning tree, etc., since they're not acting as switches in this regard.