Introduction
It is a common problem in FPGA design to have to send high-bandwidth data (such as video) between two different FPGAs. High-speed serial links are a very good way of doing this, as they offer high bandwidth with a low pin count. One option is to use a higher-level implementation that includes a built-in protocol (such as Aurora). However, there are often times when you need to implement something custom and therefore need to use lower-level design blocks. In this blog we are going to look at how to create a high-speed serial receiver in a Zynq UltraScale+ MPSoC using the Xilinx GTH Wizard. The GTH Wizard is a relatively low-level way of implementing a high-speed serial link that doesn't include a built-in protocol. This blog only covers how to create the high-speed serial receiver; a future blog will cover creating a transmitter.
Key Design Parameters
The Zynq receiver we are going to make is based on the following parameters:
Target device: Xilinx Zynq UltraScale+ MPSoC ZU7EV
Target board: ZCU106
Transceiver type: GTH
Channel type: RX (receiver only)
Encoding: 8b10b
Comma character: K28.5
Serial data rate: 2 Gbps
Reference clock speed: 156.25 MHz
Fabric interface: 32-bits @ 50 MHz
Whilst this blog uses a Zynq UltraScale+ device, many of the principles we are going to cover apply equally to other Xilinx FPGAs and SoCs.
High-Level Design Structure
The high-level structure of the receiver design we are going to create is shown below:
Clocking
The following clocks are required for the design:
GTH reference clock - this is the clock that is used to drive the serial interface. It is generated from a dedicated transceiver clock input. This clock must be clocked at the frequency defined by the ‘reference clock’ setting on the ‘Basic’ settings tab of the GTH wizard. The clock is buffered through an IBUFDS_GTE4 primitive.
Free running clock - this clock is used by various transceiver wizard helper blocks. It must be clocked at the same frequency as the output interface to the FPGA fabric. The free running clock frequency is defined by the attribute ‘Free-running and DRP clock frequency’ on the ‘Physical resources’ tab of the GTH wizard. This frequency cannot be more than 250 MHz (or 200 MHz for engineering silicon) and must be set to the lowest of the RX/TX fabric clock frequencies.
User RX clock - this is the clock that is used to clock out data to the user logic in the FPGA fabric. The frequency of this clock is also defined by the attribute ‘Free-running and DRP clock frequency’ on the ‘Physical resources’ tab of the GTH wizard.
The MMCM (clock wizard) shown in the system diagram is included to generate the free-running clock. It is not strictly required if you already have a clock of the correct frequency; however, including an MMCM offers flexibility and is necessary if no other clock of the correct frequency is available.
Reset
There are a number of resets on the wizard block. These are driven by an asynchronous, active-high reset signal. The block takes some time to fully initialise after the reset is removed, typically around 100 to 200 microseconds.
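Because initialisation takes a while (100 to 200 µs is roughly 5,000 to 10,000 cycles of a 50 MHz clock), downstream logic should not trust the received data until the reset sequence has finished. A minimal sketch, assuming the wizard's gtwiz_reset_rx_done_out output (left open in this design) is brought out and passed through a two-flip-flop synchroniser into the RX user clock domain; the entity name rx_ready_sync is hypothetical:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: synchronise the wizard's RX reset-done flag
-- into the RX user clock domain before using it as a "data valid" qualifier.
entity rx_ready_sync is
  port (
    i_rx_clk   : in  std_logic;  -- e.g. gtwiz_userclk_rx_usrclk2_out
    i_rx_done  : in  std_logic;  -- e.g. gtwiz_reset_rx_done_out(0), asynchronous to i_rx_clk
    o_rx_ready : out std_logic   -- '1' once the receiver has finished initialising
  );
end entity;

architecture rtl of rx_ready_sync is
  signal s_meta : std_logic := '0';  -- first stage, may go metastable
  signal s_sync : std_logic := '0';  -- second stage, safe to use
  -- Ask Vivado to treat these registers as a synchroniser chain
  attribute ASYNC_REG : string;
  attribute ASYNC_REG of s_meta, s_sync : signal is "TRUE";
begin
  process (i_rx_clk)
  begin
    if rising_edge(i_rx_clk) then
      s_meta <= i_rx_done;
      s_sync <= s_meta;
    end if;
  end process;

  o_rx_ready <= s_sync;
end architecture;
```

The ASYNC_REG attribute tells Vivado that the two registers form a synchroniser, so they are placed close together to maximise the time available for metastability to resolve.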
Opening the GTH Wizard
The GTH wizard is not available as a block diagram element in Vivado, so it must be created from the ‘IP Catalog’. The output is a set of VHDL (or Verilog) files that can be included in your design. To open the wizard, go to ‘IP Catalog’ → ‘FPGA Features and Design’ → ‘IO Interface’ → ‘UltraScale FPGAs Transceivers Wizard’:
Configuring the Wizard
There are four tabs that need configuring in the wizard:
Basic - set up the transceiver data rate, reference clock speed, encoding, and fabric data width.
Physical resources - set the location of the transceiver in the device and define the fabric (user) clock and free-running clock frequency.
Optional features - enable 8b10b comma detection and alignment.
Structural options - enable various helper functions.
Basic Settings Tab
For this example we are going to create a 2 Gbps RX transceiver with a 32-bit data width to the fabric, using 8b10b encoding. The example targets the ZCU106 development board, which includes a 156.25 MHz reference clock for the GTH quads. The basic settings tab is set up as follows:
The QPLL calculator is used to perform an initial setup of the reference clock. However, once the calculator has been run, the required reference clock sometimes has to be selected manually from the ‘Actual Reference Clock’ dropdown. Note that the reference clock must be compatible with the desired serial data rate.
The interface width is set by the field ‘User data width’ (in this case 32-bits). The internal width will be wider due to 8b10b encoding (in this case it is 40 bits).
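The link between the line rate, the 8b10b overhead and the 50 MHz fabric clock in the key design parameters can be checked with a little arithmetic. This assumes one 32-bit user word is transferred per RX user clock cycle, as configured above:

$$
\begin{aligned}
W_{\text{line}} &= W_{\text{user}} \times \tfrac{10}{8} = 32 \times \tfrac{10}{8} = 40\ \text{bits} \\
f_{\text{user}} &= \frac{R_{\text{line}}}{W_{\text{line}}} = \frac{2\ \text{Gbps}}{40\ \text{bits}} = 50\ \text{MHz} \\
R_{\text{payload}} &= W_{\text{user}} \times f_{\text{user}} = 32 \times 50\ \text{MHz} = 1.6\ \text{Gbps}
\end{aligned}
$$

In other words, 20% of the 2 Gbps line rate is spent on 8b10b encoding, leaving 1.6 Gbps of user data.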
Physical Resources Tab
On the physical resources tab the user clock frequency (and free-running clock frequency) must be set. This is pre-populated based on the data width and clock frequencies selected on the first tab, and it is recommended that it is left as is. The physical location of the transceiver and its reference clock is also set on this tab. For the ZCU106 we are using the quad connected to the SMA connectors on the board, and the channel we have selected corresponds to the SMA RX receiver channel. The reference clock (USER_MGT_SI570_CLOCK2) is provided in a different quad, so we have also selected the appropriate clock input.
A view of the ZCU106 features is shown below. The SMA connectors labelled ‘43’ are used for this example. These are located in Bank 225 of the FPGA. The reference clock we are using (USER_MGT_SI570_CLOCK2) is in Bank 227.
Optional Features Tab
On the optional features tab we are going to set up the comma detection and alignment; no other settings need changing. It is important that these settings are correct, otherwise the receiver and transmitter will not synchronise properly.
To open the comma detection and alignment settings double click on the ‘Receiver comma detection and alignment’ bar. This will open the settings which should be configured as follows:
To get the receiver to synchronise with the transmitter, it is necessary to insert commas (K-characters) into the stream so that the receiver can align to it. The Wikipedia page on 8b10b encoding provides a good overview of K-characters. In outline, a K-character is a special control code that is distinct from the 256 data codes, and comma characters such as K28.5 contain a bit pattern that cannot occur in a stream of normal data characters. The receiver can detect these characters and use them to align itself to the transmitter. For this example we are going to use the comma K28.5. This corresponds to a raw data value of 0xBC (hex), flagged as a control character, which the 8b10b encoder converts into the 10-bit code for K28.5. There are two 10-bit codes associated with K28.5. These two codes (called 'plus' and 'minus') are used to maintain line balance: the transmitter selects which code to output based on the running disparity of the serial line.
To set up the receiver, tick the plus comma and minus comma boxes and provide the 10-bit binary values of the plus and minus codes for K28.5. The simplest way to do this is to tick the boxes and then select K28.5 from the preset dropdown.
Given that this is a 32-bit interface, we also want to ensure that the received data is aligned to the 32-bit word boundary. To do this, change the ‘Alignment boundary’ setting to ‘Four byte boundary’.
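For reference, the K28.5 values can be captured as VHDL constants. This is a sketch: the package name k28_5_pkg is hypothetical, and the 10-bit codes are written in the standard 8b10b abcdei fghj order with bit a leftmost. The wizard's K28.5 preset fills in the comma fields itself and may present the patterns in a different bit order, so treat these constants as documentation rather than values to paste into the wizard.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package documenting the K28.5 comma character.
package k28_5_pkg is
  -- Raw byte before encoding: HGF = "101" (5), EDCBA = "11100" (28) -> K.28.5
  constant K28_5_BYTE   : std_logic_vector(7 downto 0) := x"BC";
  -- 10-bit codes in abcdei fghj order (leftmost character is bit a)
  constant K28_5_RD_NEG : std_logic_vector(9 downto 0) := "0011111010"; -- sent when running disparity is negative
  constant K28_5_RD_POS : std_logic_vector(9 downto 0) := "1100000101"; -- sent when running disparity is positive
end package;
```

The two codes are bitwise complements: the RD- version carries six ones and the RD+ version four, so choosing between them according to the running disparity keeps the number of ones and zeros on the line balanced.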
As well as changing the comma detection and alignment settings we also need to set a number of pins on the GTH wizard block. The following pins of the GTH wizard block need to be tied high in the design’s VHDL to enable the comma detection and alignment:
rx8b10ben_in(0) => '1',
rxcommadeten_in(0) => '1',
rxmcommaalignen_in(0) => '1',
rxpcommaalignen_in(0) => '1',
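With 8b10b decoding enabled, the transceiver also reports which received bytes are K-characters, which makes it easy to check that commas are arriving. A minimal sketch of a K28.5 lane detector, assuming that bit i of rxctrl0_out flags byte i of gtwiz_userdata_rx_out as a K-character (check this against the GTH transceiver user guide for your configuration). rxctrl0_out is left open in this design, so it would need to be connected, and the entity name k28_5_detect is hypothetical:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper: flag which of the four byte lanes holds a K28.5 comma.
entity k28_5_detect is
  port (
    i_clk     : in  std_logic;                      -- RX user clock (usrclk2)
    i_data    : in  std_logic_vector(31 downto 0);  -- gtwiz_userdata_rx_out
    i_charisk : in  std_logic_vector(3 downto 0);   -- assumed: rxctrl0_out(3 downto 0)
    o_comma   : out std_logic_vector(3 downto 0)    -- '1' where the lane holds K28.5
  );
end entity;

architecture rtl of k28_5_detect is
begin
  process (i_clk)
  begin
    if rising_edge(i_clk) then
      for i in 0 to 3 loop
        -- A lane is a K28.5 comma if it is flagged as a K-character
        -- and its decoded byte value is 0xBC
        if i_charisk(i) = '1' and i_data(8*i + 7 downto 8*i) = x"BC" then
          o_comma(i) <= '1';
        else
          o_comma(i) <= '0';
        end if;
      end loop;
    end if;
  end process;
end architecture;
```

Once alignment to a four-byte boundary has taken place, commas should consistently appear in the same byte lane, so o_comma is a handy signal to probe in simulation.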
Structural Options Tab
Finally, on the ‘Structural options’ tab, all of the helper functions should be enabled except the IBERT core. This simplifies the remaining design tasks by creating a suitable reset and clocking structure for the GTH Wizard block.
Top-level Design
Once the GTH Wizard has been run, the VHDL (or Verilog) for the block can be integrated with the rest of the design. For our example design the top-level ports are as follows:
library ieee;
use ieee.std_logic_1164.all;
entity xcvr_test_top_zq is
port (
i_clk_ref_p : in std_logic; --GTH ref clock P pin
i_clk_ref_n : in std_logic; --GTH ref clock N pin
i_clk_sys_p : in std_logic; --Free running clock P pin
i_clk_sys_n : in std_logic; --Free running clock N pin
i_areset : in std_logic; --System reset
i_xcvr_rx_n : in std_logic; --GTH RX port N pin
i_xcvr_rx_p : in std_logic; --GTH RX port P pin
o_led_1 : out std_logic; --RX clock LED
o_led_2 : out std_logic; --Free running clk LED
o_rx_clock : out std_logic; --RX clock output
o_rx_data : out std_logic_vector(7 downto 0) --RX data output (lower 8 bits)
);
end xcvr_test_top_zq;
The VHDL from the GTH Wizard is connected up as follows. In this example the TX ports are not being used and have been tied to zero or left open.
u_xcvr : gtwizard_ultrascale_0
port map (
gtwiz_userclk_tx_reset_in(0) => s_reset,
gtwiz_userclk_tx_srcclk_out => open,
gtwiz_userclk_tx_usrclk_out => open,
gtwiz_userclk_tx_usrclk2_out => open,
gtwiz_userclk_tx_active_out => open,
gtwiz_userclk_rx_reset_in(0) => s_reset,
gtwiz_userclk_rx_srcclk_out => open,
gtwiz_userclk_rx_usrclk_out => open,
gtwiz_userclk_rx_usrclk2_out(0) => s_rx_clk,
gtwiz_userclk_rx_active_out => open,
gtwiz_reset_clk_freerun_in(0) => s_sys_clk,
gtwiz_reset_all_in(0) => s_reset,
gtwiz_reset_tx_pll_and_datapath_in(0) => s_reset,
gtwiz_reset_tx_datapath_in(0) => s_reset,
gtwiz_reset_rx_pll_and_datapath_in(0) => s_reset,
gtwiz_reset_rx_datapath_in(0) => s_reset,
gtwiz_reset_rx_cdr_stable_out => open,
gtwiz_reset_tx_done_out => open,
gtwiz_reset_rx_done_out => open,
gtwiz_userdata_tx_in => (others => '0'),
gtwiz_userdata_rx_out => s_rx_data,
gtrefclk00_in(0) => s_ref_clk,
qpll0outclk_out => open,
qpll0outrefclk_out => open,
gthrxn_in(0) => i_xcvr_rx_n,
gthrxp_in(0) => i_xcvr_rx_p,
rx8b10ben_in(0) => '1',
rxcommadeten_in(0) => '1',
rxmcommaalignen_in(0) => '1',
rxpcommaalignen_in(0) => '1',
tx8b10ben_in(0) => '0',
txctrl0_in => (others => '0'),
txctrl1_in => (others => '0'),
txctrl2_in => (others => '0'),
gthtxn_out => open,
gthtxp_out => open,
gtpowergood_out => open,
rxctrl0_out => open,
rxctrl1_out => open,
rxctrl2_out => open,
rxctrl3_out => open,
rxpmaresetdone_out => open,
txpmaresetdone_out => open
);
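The port map routes the received data to s_rx_data and the RX user clock to s_rx_clk, but the top-level outputs o_rx_clock and o_rx_data still need driving. A minimal sketch, assuming the lowest byte of the 32-bit word is sent to the prototype header and that the clock is forwarded through an ODDRE1 output register, a common UltraScale clock-forwarding technique. The entity name rx_header_out is hypothetical, it assumes the header pins sit in a bank that supports ODDRE1, and the ODDRE1 generics should be checked against the UltraScale Libraries Guide:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;

-- Hypothetical helper: drive the logic analyser header from the RX domain.
entity rx_header_out is
  port (
    i_rx_clk   : in  std_logic;                      -- s_rx_clk (gtwiz_userclk_rx_usrclk2_out)
    i_rx_data  : in  std_logic_vector(31 downto 0);  -- s_rx_data (gtwiz_userdata_rx_out)
    o_rx_clock : out std_logic;                      -- forwarded clock to the header
    o_rx_data  : out std_logic_vector(7 downto 0)    -- lowest byte lane to the header
  );
end entity;

architecture rtl of rx_header_out is
begin
  -- Register the lowest byte lane in the RX user clock domain
  process (i_rx_clk)
  begin
    if rising_edge(i_rx_clk) then
      o_rx_data <= i_rx_data(7 downto 0);
    end if;
  end process;

  -- Forward the clock via an output DDR register: driving D1 = '1' and
  -- D2 = '0' reproduces the clock at the pin without fabric clock routing
  u_clk_fwd : ODDRE1
    generic map (
      SIM_DEVICE => "ULTRASCALE_PLUS"
    )
    port map (
      Q  => o_rx_clock,
      C  => i_rx_clk,
      D1 => '1',
      D2 => '0',
      SR => '0'
    );
end architecture;
```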
The reference clock for the GTH Wizard block is provided from dedicated GTH reference clock pins. These pins carry a differential clock that must be converted to a single-ended clock using the IBUFDS_GTE4 primitive; this dedicated primitive must be used for the purpose. The GTH clock buffer primitive is set up as follows:
u_xcvr_clock_buf : ibufds_gte4
generic map (
refclk_en_tx_path => '0',
refclk_hrow_ck_sel => "00",
refclk_icntl_rx => "00"
)
port map (
o => s_ref_clk, -- to gth wizard block
odiv2 => open,
ceb => '0',
i => i_clk_ref_p, --from top-level ports
ib => i_clk_ref_n --from top-level ports
);
The MMCM wizard is set up to give a 50 MHz clock output, which is connected to the free-running clock input of the GTH wizard. The ‘locked’ output from the MMCM is also used to hold the other blocks in reset until the clock output is stable:
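A minimal sketch of that clock and reset wiring, assuming the Clocking Wizard IP is named clk_wiz_0 with its default port names (clk_in1_p/clk_in1_n, clk_out1, reset, locked), a differential input clock and an active-high i_areset. The wrapper entity clk_rst_gen is hypothetical; the same logic could equally sit directly in xcvr_test_top_zq.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper: 50 MHz free-running clock plus combined reset.
entity clk_rst_gen is
  port (
    i_clk_sys_p : in  std_logic;  -- board system clock, P pin
    i_clk_sys_n : in  std_logic;  -- board system clock, N pin
    i_areset    : in  std_logic;  -- assumed active-high system reset
    o_sys_clk   : out std_logic;  -- 50 MHz free-running clock (s_sys_clk)
    o_reset     : out std_logic   -- active-high reset for the GTH wizard (s_reset)
  );
end entity;

architecture rtl of clk_rst_gen is
  -- Component for the Clocking Wizard IP; port names assume the IP defaults
  component clk_wiz_0
    port (
      clk_out1  : out std_logic;
      reset     : in  std_logic;
      locked    : out std_logic;
      clk_in1_p : in  std_logic;
      clk_in1_n : in  std_logic
    );
  end component;

  signal s_locked : std_logic;
begin
  u_mmcm : clk_wiz_0
    port map (
      clk_out1  => o_sys_clk,
      reset     => i_areset,
      locked    => s_locked,
      clk_in1_p => i_clk_sys_p,
      clk_in1_n => i_clk_sys_n
    );

  -- Keep downstream logic in reset while the system reset is asserted
  -- or until the MMCM output is stable
  o_reset <= i_areset or not s_locked;
end architecture;
```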
Implementation Details
A single constraints file is created for the design, containing the pin mapping and some basic timing constraints. The GTH pins (GTH reference clock and RX channel pins) do not need constraining, as this has already been done by the transceiver wizard. The RX data and clock outputs are connected to the prototype header on the ZCU106 so that they can be captured with a logic analyser. The clocks and resets are connected to the clock and reset resources on the ZCU106 board.
Simulating the Design
To prove that the design works, it is necessary to simulate the receiver. A simple test-bench can be created that includes a transmitter block (the creation of the transmitter will be covered in a future blog). In our test-bench the transmitter sends a free-running count to the receiver, which then outputs the received count on the port 'o_rx_data'.
Before you start simulating, you will need to pre-compile the Xilinx simulation libraries. To do this in Vivado, go to the ‘Tools’ drop-down and select ‘Compile Simulation Libraries’. Select the simulator you are using, the device family you are targeting, and the location you want to compile to.
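One way to make such a test-bench self-checking is to monitor o_rx_data and flag any break in the count. A minimal sketch, assuming the transmitter increments the count once per RX clock cycle, that any commas used for alignment are sent before the count starts (or are filtered out), and that a valid signal, such as a synchronised gtwiz_reset_rx_done_out, marks when the data can be trusted. The entity count_checker is hypothetical:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical simulation-only monitor for an incrementing 8-bit count.
entity count_checker is
  port (
    i_clk   : in std_logic;                     -- receiver output clock
    i_valid : in std_logic;                     -- '1' when i_data can be trusted
    i_data  : in std_logic_vector(7 downto 0)   -- o_rx_data from the receiver
  );
end entity;

architecture sim of count_checker is
begin
  process (i_clk)
    variable v_prev  : unsigned(7 downto 0);
    variable v_first : boolean := true;  -- no previous sample to compare against yet
  begin
    if rising_edge(i_clk) then
      if i_valid = '1' then
        if not v_first then
          -- unsigned + natural keeps 8 bits, so 255 + 1 wraps to 0
          assert unsigned(i_data) = v_prev + 1
            report "count discontinuity: expected " &
                   integer'image(to_integer(v_prev + 1)) &
                   ", got " & integer'image(to_integer(unsigned(i_data)))
            severity error;
        end if;
        v_prev  := unsigned(i_data);
        v_first := false;
      else
        v_first := true;  -- restart checking after data becomes valid again
      end if;
    end if;
  end process;
end architecture;
```

Because the 8-bit addition wraps from 255 to 0, the checker tolerates the lower byte of the count rolling over.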
Once you have the simulation libraries, you will need a number of files to compile the Zynq design. The files and libraries that are needed can be found in the two auto-generated compile scripts for the MMCM and the transceiver. These can be found inside your Vivado project at the following locations:
ip_user_files\sim_scripts\gtwizard_ultrascale_0\modelsim\compile.do
ip_user_files\sim_scripts\clk_wiz_0\modelsim\compile.do
These two files contain all of the HDL and libraries needed to simulate the IP. They are intended to be run from the folder that contains them; to create a system-level simulation, you may need to write your own compile scripts. This can be done by modifying the auto-generated scripts so that they can be run from another location.
Once the design has been compiled, it can be simulated. There are three things that need to be done when generating a simulation command for this design:
The time resolution needs to be set to 1 ps or finer so that the PLLs simulate correctly (higher line rates may require a finer time resolution).
The various compiled libraries need to be passed to the simulation command so that the simulator can bind the entities correctly.
The ‘glbl’ module (from ‘glbl.v’) needs to be loaded alongside the test-bench top-level so that the Xilinx IP can find the global signals it requires.
The resulting simulation command should look something like this:
vsim -t 1ps work.xcvr_test_tb work.glbl -L gtwizard_ultrascale_v1_7_13 -L xil_defaultlib -L unisims_ver -L unimacro_ver -L secureip -L xpm
Below you can see a simulation output of the receiver showing an incrementing count. The count was sent from the transmitter.
Hardware Test
The design can also be tested on hardware. This can be done by connecting a logic analyser to the prototype header on the ZCU106. The prototype header is itself connected to the output 'o_rx_data' on the FPGA. The output on the logic analyser is shown below. This only shows the lower 8-bits of the RX data output.
Summary
Overall, high-speed serial transceivers are a really simple way of creating a high-bandwidth link between two FPGAs. Whilst you can use higher-level implementations (such as Aurora), custom, lower-level implementations can also be useful. Hopefully this blog has shown you enough to get you started with implementing your own high-speed serial receiver in a Xilinx FPGA. A future blog will cover how to create the transmitter.