MicroZed Chronicles: CORDIC algorithm in RTL.

Jan 89 min read

Recently I retired my old trusty TI85 calculator and replaced it with a more modern scientific calculator, which could even run python. Doing this reminded me of the CORDIC algorithm, which was developed by Jack Volder in 1958 and enabled the first scientific calculator the HP35.

We have looked at the CORDIC algorithm before however, when previously examined we used an IP block from the Vivado Library.

I did however, want to come back and look at how we could implement a CORDIC algorithm from scratch using VHDL and the fixed point math package which we have previously also examined.

Doing so enables us to understand how we can create RTL modules which implement algorithms using the fixed package. While allowing us to see the inner workings of the CORDIC algorithm.

As mentioned to develop this I am going to be using the fixed point IEEE packages for this development.

The algorithm for the CORDIC algorithm can be seen below and works in two mode, rotation and vectoring.

To ensure flexibility and precision the module has been developed to use generics which enable the number of integer and fractional bits to be adjusted.

generic (
        -- Fixed-point format
        INTEGER_BITS   : integer := 2;
        FRACTION_BITS  : integer := 16;
        ITERATIONS     : integer := 16

The design also includes constants for zero and the processing gain which is used in rotating mode to provide a result which is scaled correctly.

-- CORDIC gain compensation factor (approximately 0.607253)

constant K : sfixed(INTEGER_BITS-1 downto -FRACTION_BITS) :=
        to_sfixed(0.607253321089875, INTEGER_BITS-1, -FRACTION_BITS);

constant ZERO : sfixed(INTEGER_BITS-1 downto -FRACTION_BITS) := 
       to_sfixed(0.0, INTEGER_BITS-1, -FRACTION_BITS);

At its heart is a shift and add algorithm which uses a small look up table storing a values of arctan value for the iteration.

This implementation will be built around a 16 iteration CORDIC algorithm, which means the look up table only needs to store 16 entries. These entries will be store arctan 2**-I, these values will be pre calculated and quantised using the fixed point package.

When it comes to quantising the values of ArcTan we need to sure we provide sufficient resolution. The smallest value required to be stored is 2**-15 which is a vary small number (0.000030517578116) to store this value, we need at least 16 fractional bits.

constant atan_table : angle_array := (
	   to_sfixed(arctan(2.0 ** 0), INTEGER_BITS-1, -FRACTION_BITS),  
        to_sfixed(arctan(2.0 ** (-1)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-2)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-3)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-4)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-5)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-6)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-7)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-8)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-9)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-10)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-11)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-12)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-13)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-14)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-15)), INTEGER_BITS-1, -FRACTION_BITS)      );

I want the module to be pipelined that is to be able to process a new input on every clock cycle. To do this I will be declaring a record type which contains the X, Y, Z, mode and a valid signal which propagates through.

-- Type definitions for the pipeline stages
    type stage_data is record
        x     : sfixed(INTEGER_BITS-1  downto -FRACTION_BITS);
        y     : sfixed(INTEGER_BITS-1  downto -FRACTION_BITS);
        z     : sfixed(INTEGER_BITS-1 downto -FRACTION_BITS);
        valid : std_logic;
        mode  : std_logic;  -- Propagate mode through pipeline
    end record;

This record type can then be used to create an array for the processing pipeline. We can iterate through each of the 16 iterations using a for loop.

Within this loop, we need to perform the X and Y shift, using the shift_right function.

shift_x := shift_right(pipe_regs(i).x, i);
shift_y := shift_right(pipe_regs(i).y, i);

We also need to determine the direction of the rotation for both rotating and vectoring modes.

-- Determine rotation direction based on mode
if pipe_regs(i).mode = '0' then
   rotation_direction := '1'  when ( pipe_regs(i).z < 
				to_sfixed(0, INTEGER_BITS-1, -FRACTION_BITS)) else '0';
else
   rotation_direction := '1' when (pipe_regs(i).y >= 
						  to_sfixed(0, 1, -FRACTION_BITS)) else '0';
 end if;

With the direction determined the pipeline can be updated to by performing the additions or subtractions as determined.

-- Compute next values based on rotation direction
if rotation_direction then
     -- Rotate clockwise
     next_x := resize((pipe_regs(i).x + shift_y),
					INTEGER_BITS-1,-FRACTION_BITS) ;
    	 next_y := resize((pipe_regs(i).y - shift_x),
					INTEGER_BITS-1,-FRACTION_BITS);
     next_z := resize((pipe_regs(i).z + atan_table(i)),
					INTEGER_BITS-1,-FRACTION_BITS);
else
   	-- Rotate counter-clockwise
    next_x := resize((pipe_regs(i).x - shift_y),
					INTEGER_BITS-1,-FRACTION_BITS);
	next_y := resize((pipe_regs(i).y + shift_x),
					INTEGER_BITS-1,-FRACTION_BITS);
    next_z := resize((pipe_regs(i).z - atan_table(i)),
					INTEGER_BITS-1,-FRACTION_BITS);
end if;

This solution is not fully pipelined, as such does have a latency to calculate the values of Sin, Cos (rotating) and ArcTan (Vectoring).

The complete design can be seen below

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
use IEEE.FIXED_PKG.ALL;

entity cordic_pipelined is
    generic (
        -- Fixed-point format
        INTEGER_BITS : integer := 2;
        FRACTION_BITS : integer := 16;
        ITERATIONS : integer := 16
    );
    port (
        clk        : in  std_logic;
        rst        : in  std_logic;
        data_valid : in  std_logic;
        -- Mode selection: '0' for rotation, '1' for vectoring
        mode       : in  std_logic;
        -- Inputs depend on mode:
        -- Rotation mode: (angle, 1.0, 0.0) or (angle, x_in, y_in) for arbitrary vector rotation
        -- Vectoring mode: 
	    --(0.0, x_in, y_in) to compute magnitude and angle
        angle_or_zero : in  sfixed(INTEGER_BITS-1 downto -FRACTION_BITS);
        x_in       : in  sfixed(INTEGER_BITS-1 downto -FRACTION_BITS);
        y_in       : in  sfixed(INTEGER_BITS-1  downto -FRACTION_BITS);
        -- Outputs depend on mode:
        -- Rotation mode
	   --(z_out=~0, cos(angle), sin(angle)) or rotated vector
        -- Vectoring mode: (angle, magnitude, ~0)
        z_out      : out sfixed(INTEGER_BITS-1 downto -FRACTION_BITS);
        x_out      : out sfixed(INTEGER_BITS-1  downto -FRACTION_BITS);
        y_out      : out sfixed(INTEGER_BITS-1  downto -FRACTION_BITS);
        valid_out  : out std_logic
    );
end cordic_pipelined;

architecture Behavioral of cordic_pipelined is

    
    -- Type definitions for the pipeline stages
    type stage_data is record
        x     : sfixed(INTEGER_BITS-1  downto -FRACTION_BITS);
        y     : sfixed(INTEGER_BITS-1  downto -FRACTION_BITS);
        z     : sfixed(INTEGER_BITS-1 downto -FRACTION_BITS);
        valid : std_logic;
        mode  : std_logic;  -- Propagate mode through pipeline
    end record;

    type pipeline_array is array (0 to ITERATIONS) of stage_data
     -- Lookup table for arctangent values

    type angle_array is array (0 to ITERATIONS-1) of 
				   sfixed(INTEGER_BITS-1 downto -FRACTION_BITS);

constant atan_table : angle_array := (
	   to_sfixed(arctan(2.0 ** 0), INTEGER_BITS-1, -FRACTION_BITS),  
        to_sfixed(arctan(2.0 ** (-1)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-2)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-3)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-4)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-5)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-6)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-7)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-8)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-9)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-10)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-11)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-12)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-13)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-14)), INTEGER_BITS-1, -FRACTION_BITS), 
        to_sfixed(arctan(2.0 ** (-15)), INTEGER_BITS-1, -FRACTION_BITS)      );

    -- CORDIC gain compensation factor (approximately 0.607253)
    constant K : sfixed(INTEGER_BITS-1 downto -FRACTION_BITS) := 
        to_sfixed(0.607253321089875, INTEGER_BITS-1, -FRACTION_BITS);
    constant ZERO : sfixed(INTEGER_BITS-1 downto -FRACTION_BITS) := 
       to_sfixed(0.0, INTEGER_BITS-1, -FRACTION_BITS);

    -- Pipeline registers
    signal pipe_regs : pipeline_array;

begin
    -- Pipeline process
    process(clk, rst)
        variable shift_x, shift_y : sfixed
					(INTEGER_BITS-1  downto -FRACTION_BITS);
        variable next_x, next_y : sfixed
					(INTEGER_BITS-1  downto -FRACTION_BITS);
        variable next_z : sfixed(INTEGER_BITS-1 downto -FRACTION_BITS);
        variable rotation_direction : std_logic;
    begin
        if rst = '1' then
            -- Reset all pipeline stages
            for i in 0 to ITERATIONS loop
                pipe_regs(i).x <= to_sfixed
						(0, INTEGER_BITS-1, -FRACTION_BITS);
                pipe_regs(i).y <= to_sfixed
						(0, INTEGER_BITS-1, -FRACTION_BITS);
                pipe_regs(i).z <= to_sfixed
						(0, INTEGER_BITS-1, -FRACTION_BITS);
                pipe_regs(i).valid <= '0';
                pipe_regs(i).mode <= '0';
            end loop;
        elsif rising_edge(clk) then
            -- Input stage
            if data_valid = '1' then
                -- Initialize based on mode
                if mode = '0' then  -- Rotation mode
                    pipe_regs(0).x <= resize(x_in * K, 
							INTEGER_BITS-1, -FRACTION_BITS);  
                    pipe_regs(0).y <= resize(y_in * K, 
							INTEGER_BITS-1, -FRACTION_BITS);
                    pipe_regs(0).z <= angle_or_zero;
                else  -- Vectoring mode
                    pipe_regs(0).x <= x_in;
                    pipe_regs(0).y <= y_in;
                    pipe_regs(0).z <= angle_or_zero;
                end if;
                pipe_regs(0).valid <= '1';
                pipe_regs(0).mode <= mode;
            else
                pipe_regs(0).valid <= '0';
            end if;
            -- Pipeline stages
            for i in 0 to ITERATIONS-1 loop
                if pipe_regs(i).valid = '1' then
                    -- Calculate shifted values for this iteration
                    shift_x := shift_right(pipe_regs(i).x, i);
                    shift_y := shift_right(pipe_regs(i).y, i);         
				  -- Determine rotation direction based on mode
                    if pipe_regs(i).mode = '0' then
                        rotation_direction := '1'  when 
							( pipe_regs(i).z < to_sfixed
							(0, INTEGER_BITS-1, -FRACTION_BITS))
							 else '0';
                    else
                        rotation_direction := '1' when 
							(pipe_regs(i).y >= to_sfixed
							(0, 1, -FRACTION_BITS)) 
							else '0';
                    end if;
                    -- Compute next values based on rotation direction
                    if rotation_direction then
                        -- Rotate clockwise
                        next_x := resize((pipe_regs(i).x + shift_y)
								,INTEGER_BITS-1,-FRACTION_BITS) ;
                        next_y := resize((pipe_regs(i).y - shift_x)
								,INTEGER_BITS-1,-FRACTION_BITS);
                        next_z := resize((pipe_regs(i).z + atan_table(i))
								,INTEGER_BITS-1,-FRACTION_BITS);
                    else
                        -- Rotate counter-clockwise
                        next_x := resize((pipe_regs(i).x - shift_y),
								INTEGER_BITS-1,-FRACTION_BITS);
                        next_y := resize((pipe_regs(i).y + shift_x),
								INTEGER_BITS-1,-FRACTION_BITS);
                        next_z := resize((pipe_regs(i).z - atan_table(i)),
								INTEGER_BITS-1,-FRACTION_BITS);
                    end if;                  
				  -- Assign to next pipeline stage
                    pipe_regs(i+1).x <= next_x;
                    pipe_regs(i+1).y <= next_y;
                    pipe_regs(i+1).z <= next_z;
                    pipe_regs(i+1).valid <= pipe_regs(i).valid;
                    pipe_regs(i+1).mode <= pipe_regs(i).mode;
                else
                    pipe_regs(i+1).valid <= '0';
                end if;
            end loop;
        end if;
    end process;
   -- Output assignment
    z_out <= pipe_regs(ITERATIONS).z;
    x_out <= pipe_regs(ITERATIONS).x;
    y_out <= pipe_regs(ITERATIONS).y;
    valid_out <= pipe_regs(ITERATIONS).valid;
end Behavioral;

To test this module in both the rotating and vectoring mode I created a simple test bench. The test bench will test the CORDIC in both rotating and vectoring mode reporting an error should the expected result vary by more than the accepted margin of error.


library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
use IEEE.FIXED_PKG.ALL;
use IEEE.MATH_REAL.ALL;

entity cordic_pipelined_tb is
end cordic_pipelined_tb;
architecture Behavioral of cordic_pipelined_tb is
    -- Constants
    constant CLK_PERIOD : time := 10 ns;
    constant INTEGER_BITS : integer := 3;
    constant FRACTION_BITS : integer := 16;
    constant ITERATIONS : integer := 16;
    constant ERROR_THRESHOLD : real := 0.001; -- Maximum allowed error

    -- Component declaration
    component cordic_pipelined is
        generic (
            INTEGER_BITS : integer := 2;
            FRACTION_BITS : integer := 14;
            ITERATIONS : integer := 16
        );
        port (
            clk          : in  std_logic;
            rst          : in  std_logic;
            data_valid   : in  std_logic;
            mode         : in  std_logic;
            angle_or_zero: in  sfixed
						(INTEGER_BITS-1 downto -FRACTION_BITS);
            x_in         : in  sfixed
						(INTEGER_BITS-1 downto -FRACTION_BITS);
            y_in         : in  sfixed
						(INTEGER_BITS-1 downto -FRACTION_BITS);
            z_out        : out sfixed
						(INTEGER_BITS-1 downto -FRACTION_BITS);
            x_out        : out sfixed
						(INTEGER_BITS-1 downto -FRACTION_BITS);
            y_out        : out sfixed
						(INTEGER_BITS-1 downto -FRACTION_BITS);
            valid_out    : out std_logic        );
    end component;
    -- Signal declarations
    signal clk          : std_logic := '0';
    signal rst          : std_logic := '1';
    signal data_valid   : std_logic := '0';
    signal mode         : std_logic := '0';
    signal angle_or_zero: sfixed(INTEGER_BITS-1 downto -FRACTION_BITS);
    signal x_in         : sfixed(INTEGER_BITS-1 downto -FRACTION_BITS);
    signal y_in         : sfixed(INTEGER_BITS-1 downto -FRACTION_BITS);
    signal z_out        : sfixed(INTEGER_BITS-1 downto -FRACTION_BITS);
    signal x_out        : sfixed(INTEGER_BITS-1 downto -FRACTION_BITS);
    signal y_out        : sfixed(INTEGER_BITS-1 downto -FRACTION_BITS);
    signal valid_out    : std_logic;

    -- Test procedure for checking results

    procedure check_rotation_result (
        expected_cos : real;
        expected_sin : real;
        actual_x     : sfixed;
        actual_y     : sfixed;
        test_angle   : string) is
        variable actual_cos_real : real;
        variable actual_sin_real : real;
        variable cos_error, sin_error : real;
    begin
        actual_cos_real := to_real(actual_x);
        actual_sin_real := to_real(actual_y);
        cos_error := abs(actual_cos_real - expected_cos);
        sin_error := abs(actual_sin_real - expected_sin);
        assert cos_error < ERROR_THRESHOLD and sin_error < ERROR_THRESHOLD
            report "Rotation mode test failed for " & test_angle & LF &
                   "Expected cos = " & real'image(expected_cos) & 
                   ", got = " & real'image(actual_cos_real) & LF &
                   "Expected sin = " & real'image(expected_sin) & 
                   ", got = " & real'image(actual_sin_real)
            severity error;
    end procedure;    

    procedure check_vectoring_result (
        expected_angle : real;
        actual_angle   : sfixed;
        test_point     : string) is
        variable actual_mag_real   : real;
        variable actual_angle_real : real;
        variable mag_error, angle_error : real;
    begin 
		actual_angle_real := to_real(actual_angle);
         angle_error := abs(actual_angle_real - expected_angle);
         assert  angle_error < ERROR_THRESHOLD
            report "Vectoring mode test failed for " & test_point & LF &
                   "Expected angle = " & real'image(expected_angle) & 
                   ", got = " & real'image(actual_angle_real)
            severity error;
    end procedure;

begin
    -- Clock generation
    clk <= not clk after CLK_PERIOD/2;

    -- DUT instantiation
    DUT: cordic_pipelined
        generic map (
            INTEGER_BITS => INTEGER_BITS,
            FRACTION_BITS => FRACTION_BITS,
            ITERATIONS => ITERATIONS        )
        port map (
            clk => clk,
            rst => rst,
            data_valid => data_valid,
            mode => mode,
            angle_or_zero => angle_or_zero,
            x_in => x_in,
            y_in => y_in,
            z_out => z_out,
            x_out => x_out,
            y_out => y_out,
            valid_out => valid_out        );

    -- Stimulus process
    stim_proc: process
        variable pi : real := MATH_PI;
        variable test_angles : real_vector(0 to 4) := 
							(0.0, pi/6.0, pi/4.0, pi/3.0, pi/2.0);
        variable test_points : real_vector(0 to 4)  := 
							(1.0, 0.866, 0.707, 0.5, 0.0001);
    begin
        -- Reset sequence
        rst <= '1';
        wait for CLK_PERIOD*2;
        rst <= '0';
        wait for CLK_PERIOD*2;
        -- Test Rotation Mode
        report "Testing Rotation Mode...";
        mode <= '0';
        for i in test_angles'range loop
            data_valid <= '1';
            angle_or_zero <= to_sfixed(test_angles(i), 
							INTEGER_BITS-1, -FRACTION_BITS);
            x_in <= to_sfixed(1.0, INTEGER_BITS-1, -FRACTION_BITS);
            y_in <= to_sfixed(0.0, INTEGER_BITS-1, -FRACTION_BITS);
            wait for CLK_PERIOD;
            data_valid <= '0';
            -- Wait for result
            wait until valid_out = '1';
            wait for CLK_PERIOD/2; -- Align with clock edge

            -- Check results
            check_rotation_result(
                cos(test_angles(i)),
                sin(test_angles(i)),
                x_out,
                y_out,
                "angle = " & real'image(test_angles(i))
            );

            wait for CLK_PERIOD*2;
        end loop;

        -- Test Vectoring Mode
        report "Testing Vectoring Mode...";
        mode <= '1';
        for i in test_points'range loop
            data_valid <= '1';
            angle_or_zero <= to_sfixed(0.0, 
						INTEGER_BITS-1, -FRACTION_BITS);
            x_in <= to_sfixed(test_points(i), 
						INTEGER_BITS-1, -FRACTION_BITS);
            y_in <= to_sfixed(1.0, INTEGER_BITS-1, -FRACTION_BITS);

            wait for CLK_PERIOD;
            data_valid <= '0';           
		   -- Wait for result
            wait until valid_out = '1';
            wait for CLK_PERIOD/2;

           -- Check results
            check_vectoring_result(
                arctan(1.0/test_points(i)),
                z_out,
                "point = (" & real'image(test_points(i)) & ", 1.0)"
            );
            wait for CLK_PERIOD*2;
        end loop;

        report "CORDIC Testing Completed" severity failure; 
   
    end process;
end Behavioral;

Running this in simulation shows the results are calculated as expected.

Workshops and Webinars

If you enjoyed the blog why not take a look at the free webinars, workshops and training courses we have created over the years. Highlights include

Upcoming Webinars Timing, RTL Creation, FPGA Math and Mixed Signal
Professional PYNQ Learn how to use PYNQ in your developments
Introduction to Vivado learn how to use AMD Vivado
Ultra96, MiniZed & ZU1 three day course looking at HW, SW and PetaLinux
Arty Z7-20 Class looking at HW, SW and PetaLinux
Mastering MicroBlaze learn how to create MicroBlaze solutions
HLS Hero Workshop learn how to create High Level Synthesis based solutions
Perfecting Petalinux learn how to create and work with PetaLinux OS

Boards

Get an Adiuvo development board

Adiuvo Spartan 7 / RPi 2040 Embedded System Development Board
Adiuvo Spartan 7 Tile - Low Risk way to add a FPGA to your design.

Embedded System Book

Do you want to know more about designing embedded systems from scratch? Check out our book on creating embedded systems. This book will walk you through all the stages of requirements, architecture, component selection, schematics, layout, and FPGA / software design. We designed and manufactured the board at the heart of the book! The schematics and layout are available in Altium here Learn more about the board (see previous blogs on Bring up, DDR validation, USB, Sensors) and view the schematics here.

Order here

MicroZed Chronicles: CORDIC algorithm in RTL.

Recent Posts