Two slide decks presented at the embedded systems conference Boston
The Finite State Machine (FSM) are one of the basic building blocks every FPGA designer should know and deploy often. However, over the years when implementing state machines in many different solution spaces (defense, aerospace, automotive etc) I have learnt a few tips and recommendations which I thought I would share. Following these have provided me with a better quality of results and helped me deliver working systems to my customers faster.
Develop the state machine as a single clocked process.
Perhaps the most controversial point is always using a single process state machine. Of course, this is different to the state machine architecture we are taught when we first learn about them. Many, if not most universities teach state machine implementation using two processes, one combinatorial and another sequential.
With a two-process state machine implementation the main functionality will be contained in a combinatorial process. If we fail to fully define all the combinatorial conditions within this process we will implement latches during synthesis. The combinatorial process may also generate glitches on its outputs as the state and inputs change.
Debugging can also be harder as the sensitivity list needs to be complete, including all of the signals used in the combinatoral process. Failure to include a signal in the sensitivity list will result in different behaviour between RTL simulation and the implementation which can take a little time to find. This is alleviated somewhat if we are using VHDL 2008, where we can use the “all” statement in the process declaration. Of course to take advantage of this your tool chain needs to support VHDL 2008.
By contrast a one process state machine enables much easier use of conditional definition and removes the ability for latches to be created. While preventing glitches as all outputs signals are registered.
Personally I also find them a little easier to debug as all the functionality is within one process.
Decouple functionality, only allow single bit input and outputs to your state machine.
We are often tempted to take into our state machine large buses and decode these within the main body. This is especially true when counters are used for the timing of events within the state machine. For example, as shown in the code snippet below
when wait_cnt =>
if cnt >= std_logic_vector(to_unsigned(128,16)) then
current_state <= read_fifo;
cnt <= (others =>’0’);
cnt <= cnt + 1;
Implementing the state machine in this manner includes additional logic within the state machine for both the comparator and the counter. This will impact the performance of the state machine within the implemented FPGA as it requires more resources and consequently more routing.
A better method is to use an external process for the counter or other external functions which pass in a single control signal. This leaves the bulk of the logic decoupled from the state machine. Taking the counter example, we can use an external counter process to generate a single pulse once the terminal count is achieved as shown below.
when wait_cnt =>
if cnt_terminal = ‘1’ then
current_state <= read_fifo;
cnt_reset <= ‘1’;
In the above code the single bit input provided from the counter process to indicate the terminal count has been reached. While the state machine asserts a single bit output to reset the counter when this occurs. This uses less logic within the state machine, enabling better performance.
As shown in this example only a single bit should also only leave the state machine as well.
Address any unmapped states
Many times, when we develop a state machine, the implementation does not use a power of two states leaving several unmapped states. If the state machine enters an unmapped state the state machine will stall and become unresponsive as there is no recovery mechanism. Depending upon the application this can be merely inconvenient or lead to catastrophic consequences.
Transitions to unmapped states can occur for a variety of reasons, from a single event effect to an electrically noisy environment or even vibration effecting device bond wires (before you ask yes, I have seen this). It is therefore good practice to ensure there is a path back from an unmapped state back into the main flow of the design. One of the simplest mechanisms to do this is to cycle through the unused states following the reset or power up before the state machine enters its idle state. This prevents the unmapped states from being optimised out during synthesis, while providing a simple recovery mechanism.
Consider the unexpected, what happens if control signals arrive late or early
When considering the control flow of a state machine it is important to consider what happens if an expected signal does not arrive when expected. While in an ideal world we would have defined interface definitions for each module in our design sadly this is not always the case. As such we need to make sure the state machine does not hang in a state waiting for a signal which has already occurred and as such has been missed.
Considering this ensures the state machine can handle the worse case conditions in operation.
This is incredibly important if your state machine contains structures like below, where a late signal can easily lead to the system becoming unresponsive.
when <state> =>
if input_one = ‘1’ then
if input_two = ‘1’ then
current_state <= wait_fifo;
elsif input_three = ‘1’ then
current_state <= idle;
Should input one not be asserted when inputs two or three occur then the state machine will hang and become un-responsive. If such a structure is unavoidable and I am sure you can avoid it with a little thought, a timer or other recovery function should be added to prevent the state machine locking up, allowing a graceful recovery.
- For more on high reliability state machines read USING FPGA’S IN MISSION CRITICAL SYSTEMS
For more on the basics of state machines read How to implement a state machine in your FPGA
MicroZed Chronicles on GitHub
Want a Book
One thing that is always important for engineers, is the need for us to deliver our projects on quality, schedule and budget. When it comes to developing embedded systems there are a number of lessons, learnt by embedded system developers over the years which can be used to ensure your embedded system achieves these. Let us explore some of the most important lessons learned in developing these.
Completing the RTL design is one part of getting
your FPGA design production-ready.
The next challenge is to ensure the design
meets its timing and performance requirements
in the silicon. To do this, you will often need to
define both timing and placement constraints.
Let’s take a look at how to create and use both
of these types of constraints when designing systems
around Xilinx® FPGAs and SoCs
The Advanced Encryption Standard (AES) has become an increasingly popular cryptographic specification in many applications, including those within embedded systems. Since the National Institute of Standards and Technology (NIST) selected the speci- cation as a standard in 2002, developers of processor, microcontroller, FPGA and SoC applications have turned to AES to secure data entering, leaving and residing within their systems. The algorithm is described very efficiently at a higher abstraction level, as is used in traditional software development; but because of the operations involved, it is most efficiently implemented in an FPGA. Indeed, developers can even get some operations “for free” in the routing. For those reasons, AES is an excellent example of how developers can benefit from the Xilinx® SDSoC™ development environment by describing the algorithm in C and then accelerating the implementation in hardware. In this article we will do just that, first gaining familiarity with the AES algorithm and then implementing AES256 (256-bit key length) on the processing system (PS) side of a Xilinx Zynq®-7000 All Programmable SoC to establish a baseline of software performance before accelerating it in the onchip programmable logic (PL). To gain a thorough understanding of the benefits to be gained, we will perform the steps in all three operating systems the
SDSoC environment supports: Linux, FreeRTOS and BareMetal
Pretty much every embedded system / FPGA design has to interface to the real world through sensors or external interfaces. Some systems require large volumes of data to be moved around very quickly, in which case high-speed communications interfaces like PCI-X, Giga Bit Ethernet, USB, Fire/ SpaceWire, or those featuring multi-gigabit transceivers may be employed.
However, many embedded systems also need to interface to slower interfaces for sensors, memories and other peripherals these systems can employ one or more of the simpler communications protocols. The four simplest, and therefore most commonly used, protocols are as follows.
- UART (Universal Asynchronous Receiver Transmitter): This comprises a number of standards defined by the Electronic Industry Association (EIA), the most popular being the RS-232, RS-422, and RS-485 interfaces. These standards are often used for inter-module communication (that is, the transfer of data and supervisory control between different modules forming the system) as opposed to between the FPGA and peripherals on the same board, although I am sure there are plenty of applications that do this also. These standards defined are a mixture of point-to-point and multi-drop buses.
- SPI (Serial Peripheral Interface): This is a full-duplex, serial, four-wire interface that was originally developed by Motorola, but which has developed into a de facto standard. This standard is commonly used for intra-module communication (that is, transferring data between peripherals and the FPGA within the same system module). Often used for memory devices, analog-to-digital converters (ADCs), CODECs, and MultiMediaCard (MMC) and Secure Digital (SD) memory cards, the system architecture of this interface consists of a single master device and multiple slave devices.
- I2C (Inter-Integrated Circuit): This is a multi-master, two-wire serial bus that was developed by Phillips in the early 1980s with a similar purpose as SPI. Due to the two-wire nature of this interface, communications are only possible in half-duplex mode.
- Parallel: Perhaps the simplest method of transferring data between an FPGA and an on-board peripheral, this supports half-duplex communications between the master and the slave. Depending upon the width of data to be transferred, coupled with the addressable range, a parallel interface may be small and simple or large and complex.
The arty board provides four PMOD outputs (JA through JD) along with the Arduino / ChipKit shield connector through which we can interface with peripherals using these interfaces. Over the next few weeks I am going to interface the PModDA4 (SPI) and the PModAD2 (I2C) to the MicroBlaze System that we have created.
The first step is to generate the hardware build within Vivado such that we can interface to the PMODs. We can add in a AXI SPI controller from the IP library and configure it to be a standard SPI driver and not a dual or quad (more on those in future blogs). We can also add in the AXI IIC (remember to search for iic not i2c) controller module and connect it up to the AXI bus, do not forget to add both interrupts to the interrupt controller.
Once both controllers are within the design the next step is to customise for application at hand, with these complete we can assign the i2c and SPI pins to the correct external ports for the PMOD desired.
All that remains then is to build the hardware and write the software, it sounds so easy when we say it quickly.
Sometimes despite the simulation we have performed upon our design we still have issues with the design on the hardware. One way we can debug in the hardware is to use a ILA within our hardware design, this gives us the ability to monitor either a number of AXI interfaces or discrete inputs.
Adding in an AXI monitor on the AXI interface to the XADC we are currently using in our example we enable us to monitor the transactions. In this case just such that we can examine them however in other instances it may be necessary due to design or integration issues.
We can add in the ILA into our design from the Integration library in our block diagram.
Insertion of the ILA into the Design
By default, the ILA is configured to monitor a AXI interface however, we can change its settings by double clicking on the ILA block and customising as required. We can add in trigger ports, capture control and comparators to trigger on specific patterns in the data stream. For this example, I am going to keep it fairly simple to show the flow and then we can look at more advanced aspects once this is understood.
Customising the ILA
With the ILA inserted into the design the next stage is to reset the generated outputs and then re generate these such that we can implement the design. Once the design is implemented we can then programme the device using the hardware manager, it is via the hardware manager that we can also examine the contents of the interfaces we wished to examine.
Using the hardware manager, we should connect to our Arty target, the next step is to programme the FPGA and initialise the logic analyser. To achieve this we need both the bit file and the definition of the debug nets we wanted to examine in the Vivado block diagram, you will see this as the ltx file within the projects runs / implementation directory beside the bit file.
Identifying the ltx file
Configuring the device and debug profile.
Once these files have been configured we will see the screen below open up in the hardware manager which provides the interface we can use to configure the triggering, capture mode and arm the ILA core.
Once we have configured the core as we wish, we can examine the waveform in the waveform viewer below.
ILA Configuration and control scheme in Hardware Manager
With the simple flow explained we can, in the next blog look at how we can use more advanced features of the ILA.
The Arty board is the next generation of the very useful LX9 MicroBoard however, it takes account of advances in devices and interfacing. I ordered mine just before I recently flew to Japan, and it was waiting for me when I returned.
The Arty is marketed as the perfect development platform for MicroBlaze applications as such when I opened my Arty the first thing I wanted to do was a new build from scratch of a MicroBlaze system. This way I could really understand the ease (or not) with which it could be developed, in the end it was very easy to create both the hardware application and create the simple software to say hello world.
To get MicroBlaze up and running on my Arty I did the following
- Download and install the most update version of Vivado (remember to install the Vivado Design Edition)
- Redeem the license for the Vivado SW which comes with the Arty
- Download the Arty board definitions
- Create the hardware definition within a new Vivado project
- Build the hardware definition and export it to SDK
- Create out SW application – This needs Hardware Definition, Board Support Package and Application SW.
While it may seem daunting these stages can be achieved quickly, also I am not going to talk you through how to do the first two points as they are very straight forward.
Which brings us to downloading the board definitions, these are available at the following link https://reference.digilentinc.com/vivado:boardfiles Once you have downloaded the board files you need to extract the Arty board definitions and store it within your Vivado installation <VIVADO DIRECTORY>/data/boards/ on my system the path is C:\Xilinx\Vivado\2015.3\data\boards\board_files
Within the arty directory you will then see the board definition XML and most helpfully a mig.prj this is the settings for the memory interface generator for if we wish to use the DDR on the arty (and we will of course).
This enables us to create a new project and select the Arty board, which is our next step
Once we have the project created the next step is to create the MicroBlaze system, to do this we need to create a block diagram within which we can create our system. You can create a block diagram by selecting Create Block Diagram option under the Flow Navigator on the left of Vivado
Once the block diagram is open we need to add the following things to it
- MicroBlaze System – See end of blog for customisation
- Memory Interface Generator – This uses the arty settings so it is just a case of scrolling through and generating the options
- AXI UART lite – for communication externally – double click on it to set your RS232 options the default is 9600 no parity one stop bit
- AXI Timer
- AXI Interrupt Controller – We need a concatenate block to drive the interrupt from the timer and the AXI UART lite
- AXI BRAM Controller and BRAM for the AXI Data and Instruction Cache
- Clocking wizard to output a 166.667 MHz Clock and a 200 MHz Clock
- MicroBlaze Debug Module
- AXI Peripheral Interconnect to connect to the timer and UART
- AXI Memory Interconnect to connect to the MIG (DDR) and the AXI BRAM controller
- Processor Reset System
The clocking of the MicroBlaze and all AXI peripherals should use the output clock from the MIG (ui_clk) while the MCM reset from the MIG block should be fed back to the processor reset system DCM input, the ui_clk_rst goes to the ext_reset_in on the reset block.
The clock wizard outputs connects as below
The rest of the connections are pretty straight forward AXI connections, the only difference is the need to add in a MicroBlaze Local DLMB and ILMB memory we can use the Run Block Automation Option (MicroBlaze) available on top of the block diagram editor.
When it is all complete your diagram will look like the below
The one thing we have not done by this point is to connect up the IO on the design to those on the board. We can do this by selecting on the board tab within the block diagram
We can then right click on a signal we wish to use and select Connect Board Component as below
It is then just a simple case of selecting the existing IP to connect it to
Do this for the following
- Sys_clk – 100 MHz clock on the Arty
- DDR3_SDRAM – the DDR on the arty
- USB_UART – How we will say hello world
- Reset – Reset input
This saves us from having to write a XDC file with the pin locations needed to ensure we use the correct IO with the board.
With this completed we can validate our design and we should not get any warnings – if you have any address issues see he Address Editor tab mine looked like below when it was validated OK
Once it has validated OK you are in a position that you can build the hardware, and export the hardware definition to SDK.
I will post another blog tomorrow on how to get the SDK side of things up and running but below is the result
Below are the MicroBlaze customisations
Slides presented at ESC SV 2015 on using lessons learned from space FPGA developments here on earth to get better quality of results
One the most exciting, fun and terrifying stages of engineering is when the first prototypes or engineering models arrive in the lab and are ready for testing. You will have been planning for this day for a long time (hopefully) as the cost of identifying and correcting errors later on in the production run only increases.
Planning for this day will have started right back at the concept of the design as you considered how you would test the functionality being designed into the hardware, FPGA and processors etc. Ensuring you have provided sufficient test points and protected them appropriately (you would hate to short out a rail as you tried to measure the voltage). One thing you also need to consider is accessibility of these test points and debug headers to ensure you can actually access them. Another aspect to consider is how you can test the board on its own will you need any special to type test equipment to enable you to test it, when will it be available.
You will also need to compile a test plan to detail everything you intend to test and the expected results otherwise how else can you be expected to know if performs as required or not.
Once you get the hardware in the lab the testing is generally split into two sections the section is checking the hardware integrity i.e. can it be powered on safely and is it suitable for further testing. During this stage you will check the board has been manufactured and populated correctly, that the voltage rails are safe to turn on and then will come the moment of truth when you have to apply power to the board for the first time. This is always a nerve wracking moment…
Once you have applied power you will be looking at the current drawn against your projections, are the clocks at the correct frequency, does the protection circuitry (over voltage / under voltage) resets and sequencing function as desired. This is the basic engineering tests that will be your first priority however; you will soon progress to wanting to test the more complex interfaces and then the performance.
Some of these may be able to be tested via JTAG / Boundary scan however it is only really testing at speed that you can relax a little (you can never truly relax even after all the qualification testing) It is therefore a great idea to have developed some simple test code for your FPGA or microprocessor to prevent you having to debug both the FPGA/Microprocessor design and the board design. I am sure we have all spent many hours looking into is the issue with the board, FPGA, processor or even worse the ASIC.
Once you have completed the integrity checks you can then proceed to testing the functionality and working out what changes you need to make to the next iteration if any.
Of course at some point I am sure you will encounter problems the most important thing to remember at that point is to not panic and attempt to determine the root cause of the issue even if there is nothing you can do about it on the prototype.