First, you will most likely (and should) use AXI as an IP block in your design. The Cyclone handbook will therefore not help you much, but the AXI IP block documentation, user guides, examples, etc. will. Second, you need to know what your future device is: a master or a slave? An AXI master can initiate transactions; a slave cannot, it can only respond to a transaction that a master has initiated.
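To make the master/slave distinction concrete, here is a toy Python model. It is only a sketch of the roles, not of the real AXI channels or handshake signals: the class and method names (`ToySlave`, `ToyMaster`, `on_write`, `on_read`) are made up for illustration, and the only thing borrowed from AXI is the name of the `OKAY` response.

```python
class ToySlave:
    """Toy register-file slave: it never starts a transfer, it only answers."""

    def __init__(self):
        self.regs = {}  # address -> data

    def on_write(self, addr, data):
        # Called by a master; the slave stores the data and responds.
        self.regs[addr] = data
        return "OKAY"

    def on_read(self, addr):
        # Called by a master; unwritten addresses read back as 0 in this toy.
        return self.regs.get(addr, 0), "OKAY"


class ToyMaster:
    """Toy master: the only side that initiates reads and writes."""

    def __init__(self, slave):
        self.slave = slave

    def write(self, addr, data):
        return self.slave.on_write(addr, data)

    def read(self, addr):
        return self.slave.on_read(addr)


master = ToyMaster(ToySlave())
write_resp = master.write(0x10, 0xAB)   # master initiates, slave responds
read_data, read_resp = master.read(0x10)
```

Note that `ToySlave` has no reference to the master at all, so it has no way to start a transfer on its own. That is the point to settle before you pick and configure the IP block.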
Typically, the FPGA development environment includes a tool for adding and configuring various IP blocks. In Altera Quartus II, the MegaCore IP Library is responsible for this.
So my advice to you is to learn Quartus in more depth :-)
Good luck.
I think you are a little confused about the fundamental difference between an FPGA, a processor, and a microcontroller.
In simple words:
A processor is a fixed logic device that does one specific thing: it executes a program instruction by instruction. Since there is a limit to the speed at which the processor can "jump" from one instruction to the next, we say that it is time limited: it cannot work faster than its maximum speed. A microcontroller is like a processor, but it also has memory on the chip, so it is closer to a system on a chip.
An FPGA is a "scaffold" of logic components: primitive blocks that can be interconnected into larger blocks to perform a specific function. Therefore, if you have enough components to build two, three, or more identical blocks that perform the same function, you can run them in parallel and get the job done faster (just like using several processors in parallel). In fact, you can design your own processor(s) and implement them on an FPGA. The key idea is that if you do not have enough logic components, you are space limited, or better said, resource limited. The amount of logic that can be placed on the IC die is limited by the available space (and for other reasons).
So, the bottom line is that you can make a really fast system with an FPGA if you have enough FPGA resources (area on which logic components reside). The ultimate limit is the "size", which defines how many adders, multipliers, RAM blocks, etc. are available in a particular FPGA chip. A processor, on the other hand, cannot run faster than its maximum frequency. This is where its bottleneck is: it cannot jump to another function before it has finished the previous one, whereas on an FPGA those functions can run at the same time.
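A rough way to see the "time limited" versus "resource limited" trade-off is a back-of-the-envelope cycle count. The Python sketch below is a simplification under stated assumptions: every operation takes the same number of cycles, the operations are independent, and clock frequency differences between a processor and FPGA fabric are ignored. The function names and the resource numbers are hypothetical.

```python
import math


def cpu_cycles(num_ops, cycles_per_op=1):
    """Sequential model: one operation must finish before the next starts."""
    return num_ops * cycles_per_op


def max_parallel_units(available_resources, resources_per_unit):
    """How many identical blocks fit into the available FPGA resources."""
    return available_resources // resources_per_unit


def fpga_cycles(num_ops, parallel_units, cycles_per_op=1):
    """Parallel model: the operations are spread across identical blocks."""
    if parallel_units < 1:
        raise ValueError("the design does not fit: no block can be placed")
    return math.ceil(num_ops / parallel_units) * cycles_per_op


# Hypothetical numbers: 1000 independent operations, a chip with 10000
# logic elements, and a block that needs 2500 of them.
units = max_parallel_units(10_000, 2_500)
sequential = cpu_cycles(1000)
parallel = fpga_cycles(1000, units)
```

With these made-up numbers four blocks fit, so the parallel version needs a quarter of the cycles. A chip with fewer resources fits fewer blocks and the advantage shrinks, which is the resource limit described above.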
As for physical size, chips come in different sizes depending on the amount of resources inside them and the number of pins. You can look up different parts and compare them both in terms of physical dimensions and in terms of resources. But, as mentioned in the other answers, you should not confuse the chip size with the size of the PCB that holds all the components needed to interface with the chip.
Best Answer
As you said, it has an ARM core. These parts also tend to have fewer FPGA resources than a similarly priced FPGA-only part.
It's neat that it allows direct access to the FPGA fabric, and if I remember right it can also share a DDR interface between the ARM and FPGA sections.
It's nice if you need to run part of your application in software, such as the high-level portion of a network processing stack or the control plane for data processing, while doing complementary things in hardware.
If you don't need that, though, you're better off with a dedicated FPGA.