# The Basics of FPGA Mathematics Issue 80

One of the many benefits of an FPGA-based solution
is the ability to implement a mathematical
algorithm in the best possible manner for the
problem at hand. For example, if response time
is critical, then we can pipeline the stages of mathematics.
But if accuracy of the result is more important, we can use
more bits to ensure we achieve the desired precision. Of
course, many modern FPGAs also provide the benefit of
embedded multipliers and DSP slices, which can be used to
obtain the optimal implementation in the target device.
Let’s take a look at the rules and techniques that you can
use to develop mathematical functions within an FPGA or
other programmable device.
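As a rough illustration of the accuracy-versus-bits trade-off mentioned above, here is a minimal Python sketch of fixed-point multiplication. The function names and the test values are my own assumptions for illustration only; a real design would express this in HDL and map the multiply onto a DSP slice.

```python
def to_fixed(value, frac_bits):
    """Quantize a real value to an integer with frac_bits fractional bits."""
    return round(value * (1 << frac_bits))


def from_fixed(raw, frac_bits):
    """Convert a fixed-point integer back to a real value."""
    return raw / (1 << frac_bits)


def fixed_mul(a_raw, b_raw, frac_bits):
    """Multiply two fixed-point values.

    The raw product carries 2 * frac_bits fractional bits, so it is
    shifted right (truncated) to return to frac_bits fractional bits.
    """
    return (a_raw * b_raw) >> frac_bits


def mul_error(frac_bits, a=1.2345, b=0.5678):
    """Absolute error of the fixed-point product against the float product."""
    product = fixed_mul(to_fixed(a, frac_bits), to_fixed(b, frac_bits), frac_bits)
    return abs(from_fixed(product, frac_bits) - a * b)


for bits in (8, 16):
    print(bits, "fractional bits, error =", mul_error(bits))
```

Widening the fractional part shrinks the error, at the cost of wider registers and multipliers.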

# The Engineer’s Guide to Using ADCs and DACs Issue 80

Once it’s performed the task it
was designed to do, an FPGA-based
system next has to interface
with the real world, and as every
engineer knows, the real world tends
to function around analog as opposed
to digital signals. That means conversion
is going to be required between
the analog realm and the digital
domain. Just as you face a plethora of
choices in selecting the correct FPGA
for the job at hand, so too will you find
an abundance of riches when choosing
the correct ADC or DAC for a system.

# How to Use the CORDIC Algorithm in Your FPGA Design Issue 79

Invented by Jack Volder while designing a new navigation
computer at Convair for the B-58A Hustler program
in 1959, CORDIC—it stands for Coordinate Rotation
Digital Computer—is a simple algorithm designed to calculate
mathematical, trigonometric and hyperbolic
functions.
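To give a feel for how little arithmetic is involved, here is a minimal Python sketch of rotation-mode CORDIC computing sine and cosine with only shifts, adds, and a small arctangent table. The function name, the 16-bit fractional format, and the iteration count are assumptions of mine, not details from the article.

```python
import math


def cordic_sin_cos(angle, iterations=16, frac_bits=16):
    """Rotation-mode CORDIC for |angle| <= pi/2 radians, returning (sin, cos).

    All loop arithmetic is integer shift-and-add, as it would be in an FPGA.
    """
    scale = 1 << frac_bits
    # arctan(2^-i) lookup table, the kind of thing held in a small ROM.
    atan_table = [round(math.atan(2.0 ** -i) * scale) for i in range(iterations)]
    # The rotations stretch the vector by a fixed gain, so start x at 1/gain.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x = round(scale / gain)
    y = 0
    z = round(angle * scale)  # residual angle still to be rotated through
    for i in range(iterations):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - atan_table[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + atan_table[i]
    return y / scale, x / scale


s, c = cordic_sin_cos(0.5)
print(s, c)
```

Each iteration adds roughly one bit of accuracy, which is why the iteration count is usually matched to the word length.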

# Ins and Outs of Digital Filter Design and Implementation Issue 78

Filters are a key part of any signal-
processing system, and as
modern applications have
grown more complex, so has
filter design. FPGAs provide
the ability to design and implement filters
with performance characteristics that
would be very difficult to re-create with
analog methods. What’s more, these digital
filters are immune to certain issues that
plague analog implementations, notably
component drift and tolerances (over temperature,
for instance). These analog effects degrade filter performance,
especially in areas such as passband ripple.

# How to Build a Better DC/DC Regulator Using FPGAs Issue 77

DC/DC converters traditionally use analog
components (bespoke ICs, operational
amplifiers, resistors, capacitors and
the like) to control the feedback loop
and to generate the pulse-width modulation
required for switching. When
using analog components like these,
you must consider a number of factors,
taking tolerances, electrical
stresses, aging drift and temperature
drift into account to ensure the stability
of the design. Now, the availability
of affordable low-powered FPGAs
coupled with analog-to-digital converters
allows the FPGA to replace the traditional
analog approach.

# High-Performance FPGAs Take Flight in Micro Satellites Issue 75

The UKube1 mission is the pilot mission for the U.K. Space
Agency’s planned CubeSat program. CubeSats are a class of
nanosatellites that are scalable from the basic 1U satellite
(10 x 10 x 10 cm) up to 3U (30 x 10 x 10 cm) and beyond, and which
are flown in low-earth orbit. The typical development cost of a
CubeSat payload is less than \$100,000, and development time is
short. This combination makes CubeSats an ideal platform for verifying
new and exciting technologies in orbit without the associated
overhead or risks that would be present in flying these payloads on
a larger mission. Of course, this class of satellites can present its
own series of design challenges for the engineers involved.
The payload carries two experiments, both of which are FPGA-based. The first experiment
is the validation of a patent held by Astrium on random-number
generation. True random-number generation is an essential component
of secure communications systems. The second experiment
is the flight of a large, high-performance Xilinx® Virtex®-4 FPGA
with the aim of achieving additional in-flight experience with this
technology while gaining an understanding of the device’s radiation
performance and capabilities in low-earth orbit (LEO). Figure 1
shows the architecture of the payload.

# Using FPGAs in Mission-Critical Systems Issue 73

Over the last few years, dramatic
advances in FPGA technology, device size
and capabilities have increased the number of
potential applications that FPGAs can
implement. Increasingly, these applications
are in areas that demand high reliability,
such as aerospace, automotive or
medical. Such applications must function
within a harsh operating environment,
which can also affect the system
performance. This demand for high reliability
coupled with use in rugged environments
often means you as the
engineer must take additional care in the
design and implementation of the state
machines (as well as all accompanying
logic) inside your FPGA to ensure they
can function within the requirements.
One of the major causes of errors within
state machines is single-event upsets
caused by either a high-energy neutron or
an alpha particle striking sensitive sections
of the device silicon. SEUs can cause a bit
to flip its state (0 -> 1 or 1 -> 0), resulting
in an error in device functionality that
could potentially lead to the loss of the system
or even endanger life if incorrectly
handled. Because these SEUs do not result
in any permanent damage to the device
itself, they are called soft errors.

# UART, SPI, I2C and More

Pretty much every FPGA design has to interface to the real world through sensors or external interfaces. Some systems require large volumes of data to be moved around very quickly, in which case high-speed communications interfaces like PCI-X, Ethernet, USB, Fire/SpaceWire, and CAN, or those featuring multi-gigabit transceivers may be employed. However, these interfaces have a considerable overhead in terms of system design and complexity, and they would be overkill for many applications.

Systems with simple control and monitor interfaces — systems that do not have large volumes of data to transfer — can employ one or more of the simpler communications protocols. The four simplest, and therefore most commonly used, protocols are as follows.

• UART (Universal Asynchronous Receiver Transmitter): This comprises a number of standards defined by the Electronic Industries Alliance (EIA), the most popular being the RS-232, RS-422, and RS-485 interfaces. These standards are often used for inter-module communication (that is, the transfer of data and supervisory control between different modules forming the system) as opposed to between the FPGA and peripherals on the same board, although I am sure there are plenty of applications that do this also. Between them, these standards define a mixture of point-to-point and multi-drop buses.

• SPI (Serial Peripheral Interface): This is a full-duplex, serial, four-wire interface that was originally developed by Motorola, but which has developed into a de facto standard. This standard is commonly used for intra-module communication (that is, transferring data between peripherals and the FPGA within the same system module). Often used for memory devices, analog-to-digital converters (ADCs), CODECs, and MultiMediaCard (MMC) and Secure Digital (SD) memory cards, the system architecture of this interface consists of a single master device and multiple slave devices.

• I2C (Inter-Integrated Circuit): This is a multi-master, two-wire serial bus that was developed by Philips in the early 1980s for a purpose similar to that of SPI. Due to the two-wire nature of this interface, communications are only possible in half-duplex mode.

• Parallel: Perhaps the simplest method of transferring data between an FPGA and an on-board peripheral, this supports half-duplex communications between the master and the slave. Depending upon the width of data to be transferred, coupled with the addressable range, a parallel interface may be small and simple or large and complex.
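
To make the full-duplex nature of SPI a little more concrete, here is a minimal Python sketch of a single 8-bit exchange. It assumes MSB-first shifting and one shift register at each end, and the function name is mine. On every clock the master drives one bit on MOSI while the slave drives one bit on MISO, so after eight clocks the two registers have swapped contents.

```python
def spi_transfer(master_byte, slave_byte):
    """Model one 8-bit, MSB-first SPI exchange.

    Returns the (master, slave) shift register contents after eight clocks.
    """
    master_reg = master_byte & 0xFF
    slave_reg = slave_byte & 0xFF
    for _ in range(8):
        mosi = (master_reg >> 7) & 1  # bit the master puts on the wire
        miso = (slave_reg >> 7) & 1   # bit the slave puts on the wire
        master_reg = ((master_reg << 1) | miso) & 0xFF
        slave_reg = ((slave_reg << 1) | mosi) & 0xFF
    return master_reg, slave_reg


print([hex(v) for v in spi_transfer(0xA5, 0x3C)])
```

A half-duplex bus such as I2C cannot do this, because both directions share the same data wire.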

Over the next few weeks, I will be exploring each of these protocols in depth, explaining their histories, pros and cons, and typical uses. Also, I will be discussing how to implement these protocols inside an FPGA (there may even be some code floating about as well).

But before we examine these protocols in depth, it’s well worth our while to spend a little time recapping some of the terminology we use when describing protocols and their behavior, as follows.

• Point-to-point: The simplest of communication protocols that involves just two devices exchanging data (a common example is RS-232).
• Multi-drop: A more complicated structure that consists of a single master and a number of slaves, thereby facilitating more complicated data transfers and control architectures. Some protocols, such as I2C, can have multiple masters.
• Simplex: This refers to data communication that is unidirectional. A typical implementation of a simplex data link could be a sensor broadcasting data to an FPGA or a microcontroller.
• Duplex: This term is used when discussing point-to-point links. The ability of a protocol to support communication in one direction at a time is referred to as “half duplex.” If the protocol can support communication in both directions simultaneously it is referred to as “full duplex.”

As we discuss the different protocols in future columns, we will also be referring to the Open Systems Interconnection (OSI) seven-layer model. This is an abstract model used to describe how protocols function.

• Layer One: Physical Layer. Describes the physical interconnection.
• Layer Two: Data Link Layer. Describes the means for actually transferring data over the physical layer (Physical Addressing).
• Layer Three: Network Layer. Describes the means to transfer data between different networks (Logical Addressing).
• Layer Four: Transport Layer. Provides transfer of data between end users.
• Layer Five: Session Layer. Controls the connections between end users.
• Layer Six: Presentation Layer. Transforms data presentation between higher-level and low-level layers.
• Layer Seven: Application Layer. Interfaces to the final application, where received data is used or data is gathered for transfer.

It is worth noting at this point that not all layers of the OSI model are required to be present within a particular protocol. We will see examples where only a subset of the layers is employed.

As an aside, while I was writing this, I thought about some of the more obscure EIA protocols out there, such as RS-423, RS-449, and EIA-530.

# RS232 & UART

This column looks at using a UART (Universal Asynchronous Receiver/Transmitter) to implement the RS-232 standard. This has to be one of the first communication protocols many of us were introduced to at university or college.

Simply put, RS-232 is a single-ended serial data communication protocol between a DTE (Data Terminal Equipment) and a DCE (Data Circuit-terminating Equipment). So what do these terms actually mean? Well, the DTE is the “master” piece of equipment; for example, your laptop or an ASIC or an FPGA or a microcontroller — whatever is initiating the communications. By comparison, the DCE is the “slave” piece of equipment; for instance, a MODEM back in the days of “dial-up” communications or any other device that is subservient to the main piece of equipment.

RS-232 is one of the oldest communication protocols still in common use. It was originally developed in 1962, which makes it 50 years old this year. The “RS” came from the fact that this was first introduced by the Radio Sector of the Electronic Industries Alliance (EIA), but RS-232 is now generally understood to stand for “Recommended Standard 232.”

The standard itself defines only the physical layer of the protocol — addressing the electrical signal requirements, interface connectors, and so on — but nothing at the data link layer such as framing or error correction.

The complete RS-232 standard defines the signals shown in the table below. However, the simplest RS-232 link can be achieved with just three lines (Transmitted Data, Received Data, and Common Ground), with flow control between the DTE and DCE being implemented using commands transmitted over the communications lines (this is often realized using ASCII control characters such as XON and XOFF).

Due to its age, RS-232 uses voltage levels that are considerably higher than those used commonly within designs today. For both transmitted (Tx) and received (Rx) data, the standard defines a logic 1 as a voltage between −3 volts and −15 volts, while a logic 0 is defined as a voltage between +3 volts and +15 volts. For this reason, many designs will contain a transceiver that translates between internal logic levels and the correct RS-232 voltages.
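
The inverted sense of these levels catches people out, so here is a tiny Python sketch of the mapping just described. The function name is hypothetical, and returning `None` for anything outside the two defined bands is my own choice.

```python
def rs232_logic_level(volts):
    """Map an RS-232 data line voltage to a logic level.

    -15 V to -3 V is logic 1, +3 V to +15 V is logic 0,
    and anything else is undefined (returned as None).
    """
    if -15.0 <= volts <= -3.0:
        return 1
    if 3.0 <= volts <= 15.0:
        return 0
    return None


for v in (-12.0, 12.0, 0.0):
    print(v, "->", rs232_logic_level(v))
```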

A UART (Universal Asynchronous Receiver/Transmitter) can be used to implement many protocols, with RS-232 being perhaps one of the most common, closely followed by RS-485. Implementing a UART transmitter within an FPGA is very simple, requiring only a baud rate generator and a shift register (loaded with the correct start, parity, and stop bits). Following the start bit, the transmitter transmits the data, LSB (least-significant bit) through to MSB (most-significant bit), followed by the parity bit and finally the stop bit(s).
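
The bit ordering described above can be sketched in a few lines of Python. This is only a behavioral model of what the transmit shift register would be loaded with; the function name and the parity options are assumptions of mine. It uses the usual UART convention that the line idles at logic 1, so the start bit is a 0 and the stop bits are 1s.

```python
def uart_frame(byte, parity="even", stop_bits=1):
    """Return the bits of one UART frame in transmission order.

    Order: start bit (0), eight data bits LSB first,
    optional parity bit, then stop bit(s) (1).
    """
    data = [(byte >> i) & 1 for i in range(8)]  # LSB first
    bits = [0] + data
    if parity in ("even", "odd"):
        parity_bit = sum(data) % 2  # makes the total number of 1s even
        if parity == "odd":
            parity_bit ^= 1
        bits.append(parity_bit)
    bits += [1] * stop_bits
    return bits


print(uart_frame(0x41))
```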

As a small aside, the “stop bit” is not a real bit, but is instead the minimum period of time the line must be idle at the end of each word. On PCs this period can have three lengths: the time equal to 1, 1.5 or 2 bits.

Implementing a receiver is a little more complicated. Because most FPGAs operate at clock frequencies well in excess of the RS-232 bit rate, the receiving FPGA can oversample the incoming receive line. In turn, this allows the leading edge of the start bit to be detected, thereby providing the timing reference for the recovery of the remaining bits. The simplest method is to sample in the middle of the nominal bit period; however, an alternative method of sampling at one third and two thirds of the bit period and confirming that the two sampled values agree is also often used.
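
Here is a minimal Python model of the mid-bit sampling approach, offered as a sketch rather than as the design used on my board. It assumes an 8N1 frame (no parity, one stop bit), one line sample per FPGA clock, and a 50 MHz clock in the usage lines; all of the names are hypothetical.

```python
def clocks_per_bit(clk_hz, baud):
    """Number of FPGA clock cycles in one bit period, rounded to nearest."""
    return round(clk_hz / baud)


def stretch(bits, cpb):
    """Test helper: hold each bit on the line for cpb clock cycles."""
    return [b for b in bits for _ in range(cpb)]


def uart_receive(samples, cpb):
    """Recover 8N1 bytes from line values sampled once per FPGA clock."""
    received = []
    i = 1
    while i < len(samples):
        falling_edge = samples[i - 1] == 1 and samples[i] == 0
        if not falling_edge:
            i += 1
            continue
        mid = i + cpb // 2          # middle of the start bit
        stop_index = mid + 9 * cpb  # middle of the stop bit
        if stop_index >= len(samples):
            break                   # incomplete frame at the end of the capture
        if samples[mid] != 0:
            i += 1                  # start bit not confirmed, treat as a glitch
            continue
        byte = 0
        for bit in range(8):        # data arrives LSB first
            byte |= samples[mid + (bit + 1) * cpb] << bit
        if samples[stop_index] == 1:  # a valid stop bit is logic 1
            received.append(byte)
        i = stop_index
    return received


cpb = clocks_per_bit(50_000_000, 115_200)  # assumed 50 MHz system clock
frame = [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]     # 0x41: start, data LSB first, stop
line = [1] * cpb + stretch(frame, cpb) + [1] * cpb
print(uart_receive(line, cpb))
```

In hardware the same idea becomes a counter that is reset on the start-bit edge and a shift register that captures the line each time the counter reaches the mid-bit value.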

The example here is an RS-232 receiver that receives a signal at 115,200 baud. In the example provided, the received data is wired out to eight LEDs on my development board, allowing them to be turned on or off over the serial link.

Testing this RS-232 Receiver also gave me the opportunity to try out the Oscium LogiScope, which turns your iPad into a very good logic analyzer. In the image below, we see my FPGA development board at the bottom of the image. Behind this board is an iPad. The Oscium LogiScope hardware is the small black box sticking out of the right-hand side of the iPad, while the software (which can be downloaded from the iTunes Store) is running on the iPad.

This enabled me to break out the Rx signal going into the FPGA along with the internal capture register through which the Rx signal is shifted as it is captured. I must say I was very impressed with how easy it was to use the LogiScope logic analyzer and set up the triggering etc. since, being a typical engineer, I did not read the user manual. The LogiScope provides the option to perform advanced triggering on patterns or multi-level events, and it also decodes I2C, which will come in useful when we come to discuss that protocol in a future column.

Extracting the screen shots and logs from the iPad was also very simple. Using the email option, it was easy to send these to my email account, thereby allowing me to include them in this blog as shown below:

The beauty of an FPGA-based UART is that it can be easily adapted to interface to other protocols like RS-485 and RS-422. This allows you as the FPGA designer to develop a soft UART core that can be reused across a number of projects. Have you developed such a core — or used someone else’s? If so, please share your experiences with the rest of us.

# Metastability

In a previous column, we discussed the difference between registers and latches, so I decided to dedicate this column to explaining what metastability is, what causes it, and how we can learn to live with it, since its occurrence cannot be totally prevented.

As illustrated below, metastability can happen to registers when their setup or hold times are violated; that is, if the data input changes within the capture window. As a result, the output of the register may enter a metastable state, in which it hovers at an intermediate level or oscillates between logic 0 and 1 values. If not treated, this metastable condition may propagate through the system, causing issues and errors. The register will eventually recover from its metastable state and “settle” on a logic 0 or 1 value; the time it takes for this to occur is called the recovery time.

Metastability within an FPGA design will typically occur in one of two ways:

1. When an incoming signal is asynchronous with regard to the clock domain. This may be an external input signal or a signal crossing between clock domains. In this case, the design engineer is expected to resynchronise the signal to address metastability, which is certain to occur eventually. This is where a multi-stage synchroniser is typically employed as discussed below.
2. When multiple register elements in a synchronous design are using the same clock, but phase alignments or clock skew issues mean that the output from one register violates another register’s setup and hold time. This may be addressed by modifying the place-and-route constraints or by changing the logic design itself.

Let’s consider the case of an incoming signal that is asynchronous with respect to the system clock. It is the engineer’s responsibility to create the design in such a way as to mitigate against any resultant metastability issues. Many engineers will be familiar with the concept of a two-stage synchronizer, but I wonder how many really understand just how it performs its magic?

A two-stage synchronizer.

In fact, the two-stage synchronizer works by permitting the first register to go metastable. The idea is that the system clock is running — and therefore “sampling” the external signal — significantly faster than the external signal is changing from one state to another. If it should happen that a transition on the asynchronous signal causes the first register to become metastable, then ideally this register will have recovered by the time the next clock edge arrives and loads this value into the second register.

Now, this is where some people become confused. Let’s assume that the original value on the asynchronous signal was a logic 0, and that this value has already been loaded into both of the synchronizer registers. At some stage the asynchronous signal transitions to a logic 1. Let’s explore the possibilities as follows:

The first possibility is that the transition on the asynchronous signal doesn’t violate the first register’s setup or hold times. In this case, the first active clock edge (shown as “Edge #1” in the illustration below) following the transition on the asynchronous signal loads the new value into the first register, and the second active clock edge will copy this new value from the first register into the second as shown below:

Transition on input doesn’t cause any problems.

The second possibility is that the transition on the asynchronous signal does violate the first register’s setup or hold times, which means the first active clock edge causes the first register to enter a metastable state. At some stage — hopefully before the next clock edge — the first register will recover, by which we mean it will settle into either a logic 0 or a 1 value. Let’s assume that the first register ends up settling into a logic 1 as shown below:

Metastable state settles on logic 1 value.

This is, of course, what we wanted in the first place. In this case, the second active clock edge will load this 1 into the second register (which originally contained a logic 0). Thus, the end result — as seen at the output from the second register — is exactly the same as if the first register had not gone metastable at all.

The final possibility (at least, the last one we will consider in this column) is that, following a period of metastability, the first register settles into a logic 0 as shown below:

Metastable state settles on logic 0 value.

In this case, the second active clock edge will load this 0 into the second register (which already contained a 0). At the same time, this second active clock edge will load the logic 1 on the asynchronous signal into the first register. Thus, it is the third active clock edge that eventually causes the second register to be loaded with a logic 1.

The end result of using our two-stage synchronizer is that — in a worst-case scenario — the desired output from the synchronizer is delayed by a single clock cycle. Having said this, there is a slight chance that the first register will not recover in time, which might cause the second stage of the synchronizer to enter its own metastable condition.
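
The three cases above can be replayed with a small behavioral model in Python. This is a sketch under one big assumption: the first register always recovers before the next clock edge, settling on whatever value we tell it to. The function and parameter names are hypothetical.

```python
def two_stage_sync(async_samples, metastable_edge=None, resolves_to=None):
    """Model a two-stage synchronizer, returning the second register per edge.

    async_samples[k] is the asynchronous input as seen at clock edge k.
    If metastable_edge is set, the first register goes metastable at that
    edge and is assumed to settle on resolves_to before the next edge.
    """
    reg1 = reg2 = 0  # both registers start out holding logic 0
    output = []
    for k, d in enumerate(async_samples):
        reg2 = reg1  # second stage copies the first stage's old value
        reg1 = resolves_to if k == metastable_edge else d
        output.append(reg2)
    return output


signal = [0, 1, 1, 1, 1]  # input changes from 0 to 1 just before edge 1
print("clean capture:   ", two_stage_sync(signal))
print("settles on 1:    ", two_stage_sync(signal, metastable_edge=1, resolves_to=1))
print("settles on 0:    ", two_stage_sync(signal, metastable_edge=1, resolves_to=0))
```

The first two cases produce identical outputs, while the third shows the new value arriving one clock cycle later.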

The alternative would be to have three or even more stages, so how do we determine if two stages are acceptable… or not?

Well, as engineers we can actually calculate this. Sadly, this does involve some math, but I will try and keep the painful parts to a minimum. The mean time between failure (MTBF) for a flip-flop (register) depends upon the manufacturing process. Let’s start with the equation for a single flip-flop as follows:

Based on this, we can calculate the MTBF for a multi-stage synchroniser using the following equation:

For both equations:
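
The equations and symbol list are not reproduced in this text. As a stand-in, here is the commonly quoted exponential model, with symbols that are my own labels rather than necessarily those of the original figures. For a single flip-flop:

$$
\mathrm{MTBF} = \frac{e^{\,t_r/\tau}}{T_0 \, f_{clk} \, f_{data}}
$$

For a synchronizer chain in which several registers each get time to resolve, the resolution times add in the exponent:

$$
\mathrm{MTBF} = \frac{e^{\,(t_{r,1} + t_{r,2} + \cdots + t_{r,k})/\tau}}{T_0 \, f_{clk} \, f_{data}}
$$

Here $t_r$ (or $t_{r,i}$ for the $i$-th register) is the time available for a register output to resolve before it is sampled by the next stage, $\tau$ is the resolution time constant of the flip-flop, $T_0$ is a constant describing its metastability window, $f_{clk}$ is the sampling clock frequency, and $f_{data}$ is the rate at which the asynchronous input changes. Both $\tau$ and $T_0$ depend on the manufacturing process and would come from the device vendor.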

I really am sorry about this, and I will do my best to keep math out of future columns. Having said this, by means of this equation, it is possible to determine the mean time between metastability events for your chosen synchronizer structure (two or more flip-flops). If the resulting MTBF for a two-stage synchronizer shows that the time between metastable events is not acceptable (that is, they will occur too often), then you can introduce a third flip-flop.
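
To show the effect of adding a stage, here is a minimal Python sketch using the widely used exponential model, MTBF = exp(total resolution time / tau) / (T0 × clock frequency × data rate). The function name is hypothetical, and every number below is invented purely for illustration; real values of tau and T0 must come from the device vendor.

```python
import math


def synchronizer_mtbf(resolve_times_s, tau_s, t0_s, f_clk_hz, f_data_hz):
    """MTBF in seconds for a synchronizer chain.

    resolve_times_s lists the resolution time available to each register
    that is followed by another stage; the times add in the exponent.
    """
    exponent = sum(resolve_times_s) / tau_s
    return math.exp(exponent) / (t0_s * f_clk_hz * f_data_hz)


# Illustrative values only, not data for any real device.
TAU = 0.5e-9     # resolution time constant
T0 = 0.1e-9      # metastability window constant
F_CLK = 250e6    # sampling clock
F_DATA = 10e6    # rate of asynchronous transitions
T_R = 4e-9       # resolution time available per stage

two_stage = synchronizer_mtbf([T_R], TAU, T0, F_CLK, F_DATA)
three_stage = synchronizer_mtbf([T_R, T_R], TAU, T0, F_CLK, F_DATA)
print("two-stage MTBF (s):  ", two_stage)
print("three-stage MTBF (s):", three_stage)
```

Because the resolution time sits in the exponent, each extra stage multiplies the MTBF rather than simply adding to it.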

The use of a three-stage synchronizer is often required in the case of high-speed or high-reliability systems. If you are designing for a high-reliability application, then you will need to demonstrate that a metastability event is sufficiently unlikely to occur during the operating life of the equipment (at a minimum). This MTBF (or, more correctly, its reciprocal, which is the failure rate) can also be fed into the system-level reliability calculations to determine the overall reliability of the entire system.

When it comes time to simulate these synchronizers, it quickly becomes obvious that the tools are limited in regard to the way in which they can model metastable events. For example, consider the following results generated by simulating an RTL version of a two-stage synchronizer:

The RTL simulation appears to indicate that there are no problems.

Even though there is, in fact, a problem with this design, no errors are detected or displayed, due to the fact that the RTL does not — in this case — contain any timing information.

For a simulation to exhibit metastability, you have to simulate at the gate level using a Standard Delay Format (SDF) file that contains the appropriate timing information. The synthesis tool extracts this timing information from the library associated with the target FPGA component. For example, consider the following gate-level simulation results for the same two-stage synchronizer:

The gate-level simulation reveals a timing error (where the traces go red).

Also, the following warning messages were generated as part of this gate-level simulation:

If you wish, you can replicate these results for yourself by downloading this ZIP file, which contains the following files:

• meta_testbench.vhd — The VHDL testbench
• meta_rtl.vhd — The RTL version of the design
• meta_gate.vhd — The synthesized gate-level version of the design
• meta_gate.sdf — The delays associated with the gate-level version of the design

You can replicate the RTL simulation using the “meta_testbench.vhd” and “meta_rtl.vhd” files. Similarly, you can replicate the gate-level simulation using the “meta_testbench.vhd” and “meta_gate.vhd” files with the delays in the “meta_gate.sdf” file being applied to the “/uut/” region.

As we saw, the RTL simulation of our two-stage synchronizer indicated that there weren’t any problems. However, the gate-level simulation did reveal a timing error (where the traces go red). It’s also interesting to note that even the gate-level simulation will not fully behave like the real-world synchroniser, because the flip-flop models do not contain the required information on recovery time. This means that the unknown “X” state will be propagated through the design and will affect any downstream elements in the design. It’s up to us as the engineers to handle this correctly.

Of course, saying things like “It’s up to us to handle this correctly” sounds good if you say it quickly and wave your arms around a lot, but how do we actually do this? Well, since the flip-flops in an FPGA are already fabricated, this means that — as FPGA designers — we have only two options (ASIC/SoC designers have a third option, because they can implement a special synchroniser flip-flop which has an acceptably high MTBF with regard to these metastable events).

The first approach is to disable all timing checks within the simulator, but this will hide other timing issues that really need to be investigated. The alternate — and often preferred — technique is to find the first register (“synch_reg(0)” in this case) in the SDF file and set the setup and hold time information to 0. This is shown in the example below where the red highlighted text is changed from the original settings to the updated values required for simulation.

Original settings for “synch_reg_0” in the SDF file (shown in red).
Modified settings for “synch_reg_0” in the SDF file (shown in red).

Doing this will prevent the register from being able to experience the metastable event. This is only acceptable, however, if you have already analysed your design and you are confident that your synchroniser has the required MTBF.
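
If you would rather not edit the SDF file by hand, the change can be scripted. The following Python sketch zeroes the setup and hold values for one named instance. Please note the assumptions: the sample SDF text is invented for illustration (it is not the contents of “meta_gate.sdf”), the cell type and port names are hypothetical, and the script assumes each timing check sits on its own line.

```python
import re

# Matches a parenthesized SDF value such as (300), (1.5:2:2.5) or (-100:-100:-100).
_VALUE = re.compile(r"\(\s*-?\d+(?:\.\d+)?(?:\s*:\s*-?\d+(?:\.\d+)?){0,2}\s*\)")
_CHECK = re.compile(r"\((?:SETUPHOLD|SETUP|HOLD)\b")
_CELL_START = re.compile(r"\s*\(CELL\b")


def zero_timing_checks(sdf_text, instance):
    """Set setup and hold values to 0 for one instance, leaving delays alone."""
    instance_re = re.compile(r"\(INSTANCE\s+" + re.escape(instance) + r"\s*\)")
    in_target = False
    out = []
    for line in sdf_text.splitlines():
        if _CELL_START.match(line):
            in_target = False  # a new cell begins
        if instance_re.search(line):
            in_target = True   # we are now inside the target cell
        if in_target and _CHECK.search(line):
            line = _VALUE.sub("(0:0:0)", line)
        out.append(line)
    return "\n".join(out)


# Invented sample for illustration only.
SAMPLE_SDF = """(CELL
  (CELLTYPE "X_FF")
  (INSTANCE synch_reg_0)
  (DELAY (ABSOLUTE (IOPATH CLK O (500:500:500)(500:500:500))))
  (TIMINGCHECK
    (SETUPHOLD (posedge I) (posedge CLK) (300:300:300)(-100:-100:-100))
  )
)
(CELL
  (CELLTYPE "X_FF")
  (INSTANCE synch_reg_1)
  (DELAY (ABSOLUTE (IOPATH CLK O (500:500:500)(500:500:500))))
  (TIMINGCHECK
    (SETUPHOLD (posedge I) (posedge CLK) (300:300:300)(-100:-100:-100))
  )
)"""

print(zero_timing_checks(SAMPLE_SDF, "synch_reg_0"))
```

Only the first synchronizer register is touched, so timing checks on the rest of the design (including the second register) remain active.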