How to Optimize an OpenCL Kernel for the Data Center Using Silexica’s SLX FPGA

In this application note, SLX FPGA accelerates a Fintech design example, leveraging the bottom-up flow of the Xilinx Vitis platform, the Alveo U200 accelerator card, and the Vitis quantitative finance library.



FPGAs are increasingly employed as co-processors in data centers. One driver behind this transition is low-latency financial applications that leverage the parallel nature of FPGAs. The Xilinx Alveo family of accelerator cards, which connect to x86 processors over a PCI Express interface, is very popular in this domain. To program these accelerator cards, you can use either a top-down approach, starting from a top-level C/C++ and OpenCL application and working towards lower-level kernels, or a bottom-up approach, in which the kernel blocks are compiled into Xilinx objects (.xo) that can be linked together into a binary at a later stage.

The bottom-up flow has several advantages over the top-down flow. (1) It allows design, validation, and optimization of kernels separately from the main application. (2) It provides faster iteration cycles for development and optimization of kernels by splitting the design into smaller components. (3) It facilitates reuse; a collection of .xo files can be reused like a library.

Xilinx provides several libraries, both domain-specific and common. These libraries are written in a manner that makes them suitable for high-level synthesis and are hand-optimized with HLS pragmas by Xilinx engineers. Of course, there are several situations where designers cannot simply reuse a library implementation: for example, if a designer has a different performance or area optimization goal than the engineers who optimized the library, or needs to customize the implementation in a way that makes it necessary to re-optimize it. Retargeting a design to a different architecture could also require reworking the HLS optimization pragmas.

In this application note, we use the Vitis quantitative finance library implementation of the Vasicek Model (v_model) as a reference design to show how designers can use SLX FPGA to optimize a kernel when using the Vitis bottom-up flow. Note that the same methodology is also applicable when designing a kernel from scratch.
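For readers unfamiliar with the model, the sketch below shows the textbook closed-form zero-coupon bond price under the Vasicek short-rate model, dr = a(b - r)dt + sigma dW. It is background only: the function name and parameters are hypothetical, it assumes a > 0, and it is not the code in dut.cpp or the library's v_model implementation.

```cpp
#include <cmath>

// Textbook Vasicek short-rate model: dr = a * (b - r) dt + sigma dW.
// Returns the time-0 price of a zero-coupon bond paying 1 after tau years.
// Names are illustrative assumptions; this is not the Vitis library API.
// Assumes a > 0 (mean-reversion speed).
double vasicek_bond_price(double r0, double a, double b, double sigma, double tau) {
    // B(tau) = (1 - exp(-a * tau)) / a
    const double B = (1.0 - std::exp(-a * tau)) / a;
    // ln A(tau) = (b - sigma^2 / (2 a^2)) * (B - tau) - sigma^2 * B^2 / (4 a)
    const double lnA = (b - sigma * sigma / (2.0 * a * a)) * (B - tau)
                     - sigma * sigma * B * B / (4.0 * a);
    // P(0, tau) = A(tau) * exp(-B(tau) * r0)
    return std::exp(lnA - B * r0);
}
```

With zero volatility and a short rate that starts at its long-run mean b, the rate stays constant, so the price reduces to exp(-b * tau), which is a convenient sanity check.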


Development Flow

To create this application, the following development tools from Silexica and Xilinx are required:

  • SLX FPGA version 2020.3
  • Vitis Libraries version 2020.1
  • Vivado High-Level Synthesis version 2020.1
  • Vitis Unified Software Platform 2020.1

The entire end-to-end flow is shown in figure 1.

Figure 1: SLX FPGA workflow for Vitis bottom-up projects


Introduction to Xilinx Vitis Libraries

Vitis libraries can be split into two groups:

  • Common libraries: These libraries provide basic functions which are used across a range of applications and domains, including maths, DSP, and linear algebra.
  • Domain-specific: These libraries provide acceleration functions for specific domains, e.g., security, vision, finance, or database.

The libraries provide three different levels of implementation, with each implementation level increasing the level of abstraction. Test benches are available at each level.

  • Level one: The lowest level of implementation, intended for use in a high-level synthesis flow. These could then be implemented in Vivado or used as part of a new kernel development.
  • Level two: Middle level provides acceleration kernels that are used in the Vitis design flow and the Xilinx RunTime (XRT).
  • Level three: Highest level provides applications created from several acceleration kernels. This level uses APIs and, of course, XRT.

When working with a bottom-up flow, we take a level one kernel, optimize it with SLX, export it as a ‘linkable’ Xilinx Object, and then import it in our Vitis project.

Creating the Vitis HLS Project

Download or clone the Vitis Libraries from GitHub and create a new Vivado HLS project using the test case for the Vasicek Model at the following path:

Vitis_Libraries\quantitative_finance\L1\tests\v_model

The file “dut.cpp” contains the design that will be synthesized; import this as a source file. Then import “tb.cpp” as a testbench. While importing both files, add the include path by editing the CFLAGS, as shown on the left-hand side of figure 2:

-IVitis_Libraries\quantitative_finance\L1\include\xf_fintech

This directory contains the header files for the level 1 kernels of the Vitis fintech library.


Figure 2: Vivado HLS project setup

In the next step, select the Alveo U200 as the target board and select the Vitis bottom-up flow, as shown on the right-hand side of figure 2. The bottom-up flow allows a Vivado HLS design to be exported as Xilinx Objects (XOs), which can be individually validated and combined in a Vitis application project. Running C Synthesis in Vivado HLS will implement the project and report the utilisation and performance. Figure 3 shows a summary of these results. We will use these results as a reference for comparison with the SLX FPGA optimized implementation.



Figure 3: Vivado HLS synthesis report for the V model test case

Optimising in SLX FPGA

The next step is importing the Vivado project into SLX. SLX supports directly importing Vivado HLS projects. Doing so will pull in all the source files and libraries and set up the project, ready for analysis and code generation. Select the import Xilinx project option in the import wizard and browse to the location of the Vivado project, as shown in figure 4.


Figure 4: Importing Vivado HLS project into a new SLX FPGA project

Once the project has been imported, you will see the configuration editor (figure 5), which enables you to set up the build and run information. Here you can also apply further resource constraints to the design. This is useful, for example, if you have other components sharing the platform; SLX FPGA will then try to restrict the optimization according to these constraints. Note that we have downloaded the Vitis libraries outside of the SLX project, so we need to make sure the Base Path includes the downloaded Vitis Libraries.


Figure 5: Configuration Editor in SLX FPGA

With the parameters correctly set, the design can be run and analyzed. In this process we are going to perform five steps:

  1. Run the design to verify the syntax and functionality of the algorithm
  2. Set up the interfaces
  3. Find and parallelize loops in the FPGA
  4. Generate the instrumented HLS code with the identified optimisation
  5. Synthesize the design

Running the design within SLX FPGA will enable the syntax to be checked and the algorithm performance to be verified before moving into the more complex analyses SLX FPGA provides.

Note that because we are using a Vitis Library for our example, there will already be pragmas in the code. Since we want to use SLX to optimize the design, remember to comment out or remove the existing pragmas.
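As a minimal sketch of what disabling a library pragma looks like, the fragment below comments out a hand-written pipeline directive so that the loop is presented to the optimizer without pre-existing hints. The function name and the specific pragma are hypothetical and are not copied from the Vitis quantitative finance library.

```cpp
// Hypothetical kernel fragment; the function and the pragma are illustrative
// assumptions, not code from the Vitis quantitative finance library.
double sum_rates(const double* rates, int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i) {
// Hand-written pragma disabled so the tool starts from a clean loop:
// #pragma HLS PIPELINE II=1
        total += rates[i];
    }
    return total;
}
```

Commenting the pragma out, rather than deleting it, keeps the original library setting visible for later comparison.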


Figure 6: Interface mapping to AXI in the SLX FPGA Function Mapping Editor “Properties” view


Using the FPGA Function Mapping editor, we can also control the interface definition required to create a Vitis-compliant Xilinx Object. Figure 6 shows how interfaces are selected through the SLX FPGA GUI. (1) Click on a top-level hardware function in the function mapping graph, (2) open the properties tab, (3) select the interface view, (4) select an interface variable, and (5) select an interface type and set parameters. To create a Vitis-compliant XO, we need to define S_AXILITE interfaces for all scalar ports and M_AXI ports for arrays and pointers. When defining the M_AXI ports, ensure the depth is correctly set and the offset is set to slave. Apart from generating the appropriate HLS pragmas for these interfaces, SLX will also consider the constraints of these interfaces while making the optimization decisions.
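The interface choices described above correspond to Vivado HLS INTERFACE pragmas roughly like the following. This is a hand-written sketch, not SLX FPGA output: the function, port names, bundle names, and the depth of 1024 are all assumptions chosen for illustration.

```cpp
// Hypothetical top-level hardware function; port names, bundle names, and the
// depth value are assumptions chosen for illustration.
void scale_rates(const double* in, double* out, double factor, int n) {
// Arrays and pointers: M_AXI ports with an explicit depth and offset=slave.
#pragma HLS INTERFACE m_axi port=in  offset=slave bundle=gmem0 depth=1024
#pragma HLS INTERFACE m_axi port=out offset=slave bundle=gmem1 depth=1024
// With offset=slave, the base addresses are set through the control interface.
#pragma HLS INTERFACE s_axilite port=in  bundle=control
#pragma HLS INTERFACE s_axilite port=out bundle=control
// Scalars and the block-level control (return) port: S_AXILITE.
#pragma HLS INTERFACE s_axilite port=factor bundle=control
#pragma HLS INTERFACE s_axilite port=n      bundle=control
#pragma HLS INTERFACE s_axilite port=return bundle=control
    for (int i = 0; i < n; ++i) {
        out[i] = in[i] * factor;
    }
}
```

A standard host compiler ignores the HLS pragmas (possibly with an unknown-pragma warning), so the same source can be exercised by a C++ testbench before synthesis.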

Once the interface definition has been completed, the next step is to find and optimize parallel loops. SLX FPGA will find all parallelism opportunities and go through an extensive architectural exploration process to determine the set of pragmas that will result in the best possible performance given the available resources. The SLX Hints tab (figure 7) shows all the parallelism opportunities and identifies any blockers, such as loop-carried dependencies.
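To illustrate the difference between a parallelizable loop and one blocked by a loop-carried dependency, consider the two hypothetical functions below. The names, the unroll factor, and the placement of the pragma are assumptions for illustration and do not represent what SLX FPGA generated for the Vasicek kernel.

```cpp
// Hypothetical loops for illustration; names and the unroll factor are
// assumptions, and the pragma shown is not actual SLX FPGA output.

// Independent iterations: out[i] depends only on in[i], so iterations can
// execute in parallel, for example by unrolling the loop.
void add_spread(const double* in, double* out, double spread, int n) {
    for (int i = 0; i < n; ++i) {
#pragma HLS UNROLL factor=4
        out[i] = in[i] + spread;
    }
}

// Loop-carried dependency: each iteration reads the value of `level` written
// by the previous iteration, which blocks straightforward parallelization.
double smooth(const double* in, double alpha, int n) {
    double level = 0.0;
    for (int i = 0; i < n; ++i) {
        level = alpha * in[i] + (1.0 - alpha) * level;
    }
    return level;
}
```

In the second function, the recurrence on `level` is the kind of blocker a dependency analysis would be expected to flag; removing it generally requires restructuring the algorithm rather than adding a pragma.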


Figure 7: SLX FPGA Hints view

Once all analysis and optimization is complete, the next step is to generate the pragma-instrumented output code. Figure 8 shows the SLX FPGA code refactoring wizard; users can select or deselect the pragmas that will be inserted in the generated code.


Figure 8: SLX FPGA Code Generation Wizard showing automated pragma insertion

Performance Improvement

After synthesizing the design, we compare the performance and resource utilization of the SLX optimized kernel with our reference (Vitis library implementation). For this particular design, we allowed SLX FPGA to use all available resources on the selected device. We see a latency reduction of 36% and significant improvement in resource utilization over the library provided pragmas.

Table 1: Original Vitis Library compared to SLX FPGA instrumented code

Vitis Project

The HLS folder in an SLX FPGA project contains a Vivado HLS project with SLX optimized source code. We open this project with Vivado HLS and export the RTL as a Xilinx object, as shown in figure 9. Note that we need to use the extern "C" wrapper to ensure C linkage. This XO file can be imported and used within our Vitis project.
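A minimal sketch of such a wrapper is shown below. The signature and body are hypothetical placeholders, not the actual contents of dut.cpp; only the extern "C" block around the top-level function is the point of the example.

```cpp
// Hypothetical kernel; the signature and body are assumptions for illustration.
// Without extern "C", a C++ compiler mangles the symbol name (for example into
// something like _Z3dutPKdPdi), so it would no longer be plain "dut".
extern "C" {

void dut(const double* in, double* out, int n) {
    for (int i = 0; i < n; ++i) {
        out[i] = 2.0 * in[i];
    }
}

}  // extern "C"
```

With C linkage, the exported kernel name stays the same as the function name in the source, which is what the later linking step refers to.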


Figure 9: Exporting the Xilinx Object from Vivado HLS


Within a new Vitis workspace, a new project can be created which targets the Alveo U200 card.


Figure 10: Alveo Project creation in Vitis


Once the project is created, the dut.xo file can be imported into the Vitis project as a source file and selected for acceleration on the Alveo card.


Figure 11: Using the kernel in Vitis


The developer is now able to create the wider application which runs on the x86 host taking advantage of the accelerated Vasicek Model.


Figure 12: Implemented kernel in the Vivado Diagram



This application note has demonstrated how SLX FPGA can be used to optimize a kernel targeted at PCIe-connected Alveo cards, leveraging the Vitis bottom-up kernel flow. For this example, SLX FPGA was able to reduce the latency of the original library-provided pragmas by 36% while cutting the resource utilization significantly, and we could have applied resource constraints to reduce the utilization further (at the expense of performance). The approach can be applied to most Xilinx-based data-center applications, including Amazon F1 instances. This methodology can be applied whether you are developing an application from scratch or reusing an existing design and customizing it according to your needs.



Accelerate the Journey from C/C++ to FPGA

SLX FPGA sits on top of HLS compiler

  • Prepares the C/C++ code for optimum HLS results
  • Takes the guesswork out of using HLS

Removes the roadblocks in HLS adoption

  • Non-synthesizable C/C++ code
  • Finding parallelism
  • Poor performance and bloated area







This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No 858051.