Answering your comment question here where I have more space. Let's say you have an application you've designed on a Linux box, and it runs an algorithm you wrote for counting the number of cat pictures on the internet. Now it runs, but it's slow because there are a lot of cat pictures to go through, so you want to accelerate it in hardware.
So you use this tool to write OpenCL code, which is C but with some restrictions on form because it's going to be "compiled" to run on an FPGA. They call this portion of the code kernels. These kernels are going to be synthesized and run on the FPGA; maybe you have 1, or maybe you have 100 working in parallel.
You're doing all this right in your application, inline, so when you get to the point of actually counting the cats you're using their APIs to do the processing on the FPGA.
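To make the kernel idea a bit more concrete, here is a minimal sketch in plain C. It is not real OpenCL host code: the hypothetical function `count_cats_kernel` plays the role of a kernel body that handles one work-item, and the loop in `run_kernels` stands in for the instances that could run side by side on the FPGA. The `is_cat` flag array and every name here are assumptions for illustration only; a real design would go through the vendor's OpenCL API to build and launch the kernel.

```c
#include <stddef.h>

/* Hypothetical kernel body: one work-item inspects one picture.
   In OpenCL C this would be a __kernel function and gid would come
   from get_global_id(0). */
static void count_cats_kernel(size_t gid, const unsigned char *is_cat,
                              unsigned int *partial)
{
    partial[gid] = is_cat[gid] ? 1u : 0u;
}

/* Stand-in for the host side. On an FPGA the instances could run in
   parallel; here they simply run one after another on the CPU, and
   the partial results are summed afterwards. */
unsigned int run_kernels(const unsigned char *is_cat,
                         unsigned int *partial, size_t n)
{
    unsigned int total = 0;

    for (size_t gid = 0; gid < n; ++gid)
        count_cats_kernel(gid, is_cat, partial);

    for (size_t i = 0; i < n; ++i)
        total += partial[i];

    return total;
}
```

Calling `run_kernels` with a flag array and a scratch buffer of the same length returns the number of flagged pictures. The point is only the split: a small per-item function that gets replicated, plus host code that hands out the work and collects the results.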
I just saw your other comment that it's not out yet. I know Altera's stuff has been out for a while; you can find a bunch of design examples here
All that said, it depends on what your goal is. Do you want to learn how to write Verilog and test benches and be an FPGA designer? Or are you just looking to accelerate algorithms or functions using hardware without doing all that?
Like any tool-set, which one to use depends on the job at hand.
I found out what the problem was and why I wasn't able to pipeline this. Vivado HLS found a way to treat sum += i as a constant multiplication, and so the latency remained constant.
Therefore, since the latency was already constant, the pipeline didn't make sense.
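I can't show what the tool actually generated, but the kind of rewrite involved can be sketched in plain C. Assuming the loop runs i from 0 to n - 1 (the bounds and both function names are my assumptions), the accumulation has a closed form that needs a multiplication and no loop at all, so there is nothing left to pipeline:

```c
/* Accumulate 0 + 1 + ... + (n - 1), one addition per iteration. */
unsigned long sum_loop(unsigned long n)
{
    unsigned long sum = 0;
    for (unsigned long i = 0; i < n; ++i)
        sum += i;
    return sum;
}

/* Same result with no loop: n * (n - 1) / 2.
   The even factor is halved first so the intermediate product
   does not overflow earlier than the final result would. */
unsigned long sum_closed_form(unsigned long n)
{
    if (n % 2 == 0)
        return (n / 2) * (n - 1);
    return n * ((n - 1) / 2);
}
```

Both functions return the same value for any n whose result fits in an unsigned long, but the second one takes the same time regardless of n.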
Best Answer
Because HLS means High Level Synthesis. [1] In other words, these blocks take advantage of Vivado's special cores that are hardware-optimized for those operations (like the fast or parallel addition and multiplication in your example). The preceding blocks, as one might surmise from the flowchart you have provided, are adapter blocks. Since IP blocks usually cannot be used directly with your code blocks, you need some translation "glue" logic.
You could of course implement adder and multiplier logic on your own using the general logic cells of your board but that may consume resources you could otherwise be spending on other logic. Additionally, the cores are already sitting in your chip so you would be under-utilizing your chip in some sense.
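To give a feel for what "implementing it on your own" means, here is a sketch in C that models an 8-bit ripple-carry adder built from one-bit full adders, which is the sort of logic that would map onto general-purpose cells. The 8-bit width and the function names are my assumptions, and a real design would be written in an HDL rather than C; this only illustrates the structure.

```c
#include <stdint.h>

/* One-bit full adder written as pure boolean logic. */
static void full_adder(unsigned a, unsigned b, unsigned cin,
                       unsigned *sum, unsigned *cout)
{
    *sum = a ^ b ^ cin;
    *cout = (a & b) | (cin & (a ^ b));
}

/* 8-bit ripple-carry adder: the carry out of each bit feeds the
   next bit. The final carry is dropped, so the result wraps
   modulo 256. */
uint8_t ripple_add8(uint8_t a, uint8_t b)
{
    unsigned carry = 0;
    uint8_t result = 0;

    for (int bit = 0; bit < 8; ++bit) {
        unsigned s, c;
        full_adder((a >> bit) & 1u, (b >> bit) & 1u, carry, &s, &c);
        result |= (uint8_t)(s << bit);
        carry = c;
    }
    return result;
}
```

Each call to `full_adder` corresponds to a small piece of logic you would have to spend on every bit of every adder, which is the resource cost being described.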
IP cores are special because they don't use the generic FPGA cells (the lookup tables, the adders, the flip-flops) but baked-in circuitry specialized for the task. They are closer to ASIC hardware blocks. These specialized cores are often proprietary, hence Intellectual Property (IP).
[1] https://www.xilinx.com/products/design-tools/vivado/integration/esl-design.html