Category Archives: FPGA & VHDL

Articles, tips and advice on FPGA & VHDL development

Using ChipScope ILA


If you are new to FPGAs, one aspect of the development flow you may not have considered is how you will go about debugging your design once it has been loaded into the FPGA.

In order to set the scene, let’s first cast our minds back to the days before FPGAs and consider how we would debug a digital circuit board or system in the lab. One of the tools we would have employed would be a logic analyzer. (See: Turn Your iPad Into a Logic Analyzer!) First we would connect the analyzer’s probe leads to the signals of interest on the board. We might also specify certain trigger conditions upon which we desired the tool to commence storing data for subsequent display and analysis. Then we would run the system and try to work out what the heck was happening.

Logic analyzers are, of course, still employed today. When it comes to using one to debug an FPGA design, we typically start by creating a dedicated test header that connects to some of the FPGA’s input/output (I/O) pins. One problem with this scheme is that there can be hundreds of thousands of signals inside the FPGA, far more than there are I/Os on the device or signals you can break out to the test header. This means that you may have to keep on rebuilding your design to access the signals of interest and route them out to the test header.

In some cases, the physical construction of the unit in question means that test headers are of use only at the board level and not during system integration. Indeed, I am working on one such project at the time of this writing. Another problem is that many FPGA designs are I/O limited from the start, so dedicating a bunch of pins to observe what’s happening on internal signals may simply not be a feasible option.

And one further problem is that, inevitably, the logic analyzer you are using will also be required by one or more other project teams, which means you all have to agree on how you will allocate the analyzer resources. I cannot tell you how frustrating it is to be homing in on a problem when… suddenly… it’s time to disconnect one’s intricate probe setup and allow the analyzer to be wheeled away to someone else’s project.

One solution to this problem — a solution that has seen great advances over the last few years — has been the development of in-chip logic analyzers for use with FPGAs. The idea is to employ any unused programmable resources and on-chip memory blocks to implement one or more “virtual” logic analyzers.

As with their physical counterparts, these virtual logic analyzers — like ChipScope from Xilinx, Identify RTL Debugger from Synopsys, Reveal from Lattice Semiconductor, and SignalTap from Altera — can be set up so that they will only start collecting data after certain trigger conditions have been met. Engineers can use these analyzers to “peer” into the design as it operates, storing the resulting data in on-chip RAM, extracting the results over the JTAG port, and then displaying the results — more or less in real-time — on their screens.

Using virtual logic analyzers may remove the need for test headers. Sadly, however, in many cases they do not remove the need to rebuild the code. One big advantage of these in-chip logic analyzers is that they offer the ability to capture the values on wide internal busses and store these values in internal RAM. The big downside with this approach comes in designs that are already utilizing most of the device’s programmable resources, because this will limit the size of any logic analyzer implementation.

ChipScope can be implemented very quickly within the ISE design flow. The simplest method is to first implement your design, but not to generate the *.bit file. Instead, open up Core Inserter under your Xilinx installation (in Windows, use Start > Xilinx > ChipScope [pro] > Core Inserter). Select the target technology and identify the output file of the synthesis (either *.ngc or *.edf depending upon the tool you used), then add an ICON controller followed by the ILA block.


This is where you will connect the signals you wish to analyze. It is possible to have several ILA blocks per ICON if you wish to use different triggers or monitor different signals, etc. Once you’re happy with the connections you can insert the core, although — depending on the speed of your machine — this may take a little time. After the core has been inserted, you need to rerun the implementation stages and generate a *.bit file (ISE should show the stages needing to be re-run). Having configured the target device, you can then connect to the target over JTAG using the ChipScope Analyzer tool and trigger on the waveform of interest as illustrated in the screenshot below.



If you are interested in playing with this yourself, an example of the project referenced in this column — along with all the files needed to run it on the Avnet LX9 development board — can be found here


Generating a VGA Test Pattern


In my original article, we discussed how we could use two counters — the pixel counter and the line counter — to generate the “H_Sync” (horizontal sync) and “V_Sync” (vertical sync) signals that are used to synchronize the VGA display. Now, in this article, we will consider how to also generate some RGB (red, green, and blue) signals to create an image on the display.


My Spartan 3A development board.

The first step was for me to retrieve my trusty Spartan 3A development board, which I had loaned to a friend at work. Once I had this board back in my hands, I started to ponder my implementation. Sadly my development board does not contain proper digital-to-analog converters (DACs) that can be driven by 8-bit wide red, green, and blue signals generated by the FPGA. Instead, it uses only four bits to represent each color, and it employs a simple resistor network to convert these digital outputs into corresponding analog voltages.

This means the color palette of my Spartan board is limited to four bits for the red channel, four bits for the blue channel, and four bits for the green channel, which equates to 2^4 x 2^4 x 2^4 = 4,096 colors. Although this 12-bit color scheme is admittedly somewhat limited, as we shall see it can still provide excellent results.

The next problem is the amount of memory required to hold the image. Once again, I had originally planned on storing an 800 x 600 pixel image in a frame buffer on the FPGA as described in Max’s article. Even with my limited color palette, however, just one frame would require 800 x 600 x 12-bits, which equals 5.76 megabits of RAM. This is more memory than is available in the FPGA on my development board.
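
The arithmetic behind the palette size and the frame buffer requirement can be checked in a few lines of Python. This is only a quick sanity check of the figures quoted above; the variable names are my own and are not taken from the project files.

```python
# Palette size and frame buffer requirement for a 12-bit, 800 x 600 display.
BITS_PER_CHANNEL = 4      # red, green and blue each use 4 bits on this board
CHANNELS = 3

bits_per_pixel = BITS_PER_CHANNEL * CHANNELS     # 12-bit color
palette_size = 2 ** bits_per_pixel               # number of distinct colors

WIDTH, HEIGHT = 800, 600
frame_bits = WIDTH * HEIGHT * bits_per_pixel     # bits needed to hold one frame

print(palette_size, "colors")                          # 4096 colors
print(f"{frame_bits / 1e6:.2f} megabits per frame")    # 5.76 megabits per frame
```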

As a “cheap-and-cheerful” alternative, I decided to generate a series of simple test patterns algorithmically. A high-level block diagram of my VGA test pattern generator is illustrated below:


High-level block diagram of my VGA test pattern generator.

First we have a “System Clock,” which is used to synchronize all of the activities inside the FPGA. The “VGA Timing” module comprises the pixel and line counters we discussed in my original article. In addition to generating the “H_Sync” and “V_Sync” signals that are used to synchronize the VGA display itself, this module also generates a number of other signals that are used to control the “VGA Video” module.

The “Algorithmic Test Pattern Generator” module is used to generate a series of simple test patterns. The “VGA Video” module takes these test patterns and presents them to the outside world in the form of the three 4-bit RGB signals that are presented to the DACs (or resistor networks, in the case of my development board).

Actually, I should note that in my real-world implementation, the “Algorithmic Test Pattern Generator” and “VGA Video” modules are one and the same thing, but it’s easier to think of them as being separate entities for the purposes of these discussions.

My implementation of this test pattern generator consumes only a small portion of the resources available on my Spartan FPGA. In fact, it requires just 96 slices out of the 5,888 slices that are available, which means it utilizes less than 2 percent of the chip’s total resources.

To be honest, I’m glad that the limitations of my development board forced me to take this intermediate step — that is, to create a test pattern generator. This is because a test pattern provides the simplest way to output images to prove that the backend display drivers are working correctly. Generating a test pattern (or a series of test patterns, in this case) is a good idea for a variety of reasons:

  • It allows the RGB color outputs to be verified to prove that they are functioning correctly. This can be achieved by displaying incremental bars where the color is gradually increased from 0 to its maximum value.
  • It allows the timing to be checked. Is the frame updating correctly? Are the borders correct? And so forth.
  • More advanced test patterns can be used to align the image with a camera viewfinder on systems that are used to capture real-world images.
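
To make the first of these points concrete, here is a minimal Python model of an incremental bar pattern for one 4-bit color channel. It is a sketch of the idea only, not the VHDL in my project file: the function names (bar_level, red_ramp) are hypothetical, and I am assuming a visible line of 800 pixels divided into 16 equal bars.

```python
H_ACTIVE = 800    # visible pixels per line (assumed 800 x 600 mode)
LEVELS = 16       # a 4-bit channel has 16 intensity steps

def bar_level(pixel):
    """Return the 4-bit intensity (0..15) for a visible pixel position."""
    if not 0 <= pixel < H_ACTIVE:
        raise ValueError("pixel is outside the visible region")
    # Integer scaling splits the line into 16 bars of 50 pixels each.
    return pixel * LEVELS // H_ACTIVE

def red_ramp(pixel):
    """RGB triple for a red-only ramp; green and blue are held at zero."""
    return (bar_level(pixel), 0, 0)

# First pixel of the line, last pixel of the first bar, first pixel of the second bar.
print(bar_level(0), bar_level(49), bar_level(50))   # 0 0 1
```

Stepping each channel through all 16 levels in turn makes a stuck or swapped output bit easy to spot on the monitor.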

As an aside, a famous television test pattern many people will recognize is the Indian Head Test Card. This was common in America until the early 1970s, at which time it was replaced by the SMPTE Color Bars.

If you wish to probe deeper into my design, click here to download a ZIP (compressed) version of my project file. As you will see, this design consists of one structural unit tying together two modules: the “VGA Timing” module and the “VGA Video” module (which includes the algorithmic test pattern generation code as noted above).

The “VGA Video” module outputs the RGB video signals during the active periods of the video display period, as can be seen in the results of the simulation shown in the following screenshot:


The results from my initial simulations.

Again, the values in the line and pixel counters in the “VGA Timing” module are used by the “VGA Video” module to determine positions on the screen and to decide when the RGB outputs need to be manipulated to achieve the desired result.


Generating VGA from an FPGA


Thanks to their nature, FPGAs are well suited to the intense levels of signal processing required by many imaging systems. Of course, one of the most rewarding aspects of image processing is seeing the resultant image on a display, and a very common form of display uses the VGA (video graphics array) standard.

The first VGA display was introduced with the IBM PS/2 line of computers in 1987. One thing most people associate with this form of display is the 15-pin D-subminiature VGA connector you tend to find on the back of a tower computer or the side of your notebook computer.

The original VGA standard supported a resolution of only 640×480 (which means 640 pixels in the horizontal plane and 480 lines in the vertical plane). Over the years, however, the standard has evolved to support a wide variety of resolutions, all the way up to widescreen resolutions as high as 1920×1080.

The act of driving a VGA is surprisingly simple, being based on the use of two counters as follows:

  • Pixel counter: Counts the number of pixels in a line at the required clock frequency (40MHz in this example). This counter is used to generate the horizontal timing.
  • Line counter: Also known as the Frame Counter, this repeats at the refresh rate of the desired VESA specification for 60Hz, 75Hz, 85Hz, and so on. This also identifies when the counter is within a valid region for outputting display data. The line counter is incremented each time the pixel counter reaches its terminal count.

These counters are used to generate two synchronization (sync) markers: the “V_Sync” (vertical sync) and “H_Sync” (horizontal sync) signals. In conjunction with the RGB (red, green, and blue) analog signals, “V_Sync” and “H_Sync” form the basic signals required to display video on a monitor.

Actually, this may be a good time to take a step back to remind ourselves as to the origin of terms like “V_Sync” and “H_Sync.” The main thing to remember is that, at the time the original VGA standard was introduced, the predominant form of computer display was based on the cathode ray tube (CRT), in which an electron beam is used to “write” on a phosphorescent screen.

There are several ways in which an electron beam can be manipulated to create images on a CRT screen, but by far the most common technique is the raster scan. Using this approach, the electron beam commences in the upper-left corner of the screen and is guided across the screen to the right. The path the beam follows as it crosses the screen is referred to as a line. When the beam reaches the right-hand side of the screen it undergoes a process known as horizontal flyback, in which its intensity is reduced and it is caused to “fly back” across the screen. While the beam is flying back it is also pulled a little way down the screen as shown in the following illustration:

The beam is now used to form a second line, then a third, and so on until it reaches the bottom of the screen. The number of lines affects the resolution of the resulting picture (that is, the amount of detail that can be displayed). When the beam reaches the bottom right-hand corner of the screen it undergoes vertical flyback, in which its intensity is reduced, it “flies back” up the screen to return to its original position in the upper left-hand corner, and the whole process starts again.

The “V_Sync” and “H_Sync” signals are used to synchronize all of these activities. Thus, returning to our pixel and line counters, the values on these counters can be decoded so as to generate the required waveforms on the “V_Sync” and “H_Sync” outputs from an FPGA (that is, on the FPGA’s pins that are being used to drive the display’s “V_Sync” and “H_Sync” signals). Meanwhile, generating the RGB signals will require the FPGA to drive three digital-to-analog converters (DACs), one for each signal. As the design engineer, you must account for the latency through the DACs so that their outputs are correctly aligned with respect to the “V_Sync” and “H_Sync” signals.

The line and pixel counters both have portions of their count sequences when no data is being output to the display. In the case of an 800×600 resolution display refreshing at 60Hz, for example, the vertical (line) counter will actually count 628 lines while the horizontal (pixel) counter will count 1,056 pixels.

Why should this be so? Well, returning to our raster scan, it takes a certain amount of time for the electron beam to undergo its horizontal and vertical flyback activities. One way to think about these times is that we have an actual display area that we see, and that this actual display area “lives” in a larger (virtual) display space that contains a border zone that we don’t see:

Of course, in the case of today’s flat-screen, liquid crystal displays (LCDs) and similar technologies, we don’t actually need to worry about things like horizontal and vertical flyback times. At least, we wouldn’t have to worry if it were not for the fact that we don’t actually know what type of screen our FPGA is driving. Thus, anything driving a VGA output generates the timing signals required to drive a CRT display, and other forms of display simply make allowances for any of the historical peculiarities associated with these VGA signals.

But we digress… Each of our counters has a collection of associated timing parameters. Vertical timings are referenced in terms of lines, while horizontal timings are referenced in terms of pixels. The following values are those associated with a display resolution of 800×600:
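
As a cross-check, the commonly published VESA timing for 800×600 at 60Hz can be written down and exercised in a short Python model. Please treat this as a sketch under stated assumptions rather than the VHDL from my ZIP file: the porch and sync widths are the standard VESA figures as I know them (they are consistent with the 1,056 and 628 totals and the 40MHz clock mentioned earlier), the Timing class and next_counts function are hypothetical names, and I am assuming each counter starts at zero on the first visible pixel or line.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Timing:
    active: int        # visible pixels (or lines)
    front_porch: int   # blanking between the visible area and the sync pulse
    sync: int          # width of the sync pulse
    back_porch: int    # blanking between the sync pulse and the next visible area

    @property
    def total(self):
        return self.active + self.front_porch + self.sync + self.back_porch

    def in_active(self, count):
        return 0 <= count < self.active

    def in_sync(self, count):
        start = self.active + self.front_porch
        return start <= count < start + self.sync

PIXEL_CLOCK_HZ = 40_000_000
H = Timing(active=800, front_porch=40, sync=128, back_porch=88)   # in pixels
V = Timing(active=600, front_porch=1, sync=4, back_porch=23)      # in lines

def next_counts(pixel, line):
    """Advance one pixel clock; the line counter steps when the pixel counter wraps."""
    pixel += 1
    if pixel == H.total:
        pixel = 0
        line = (line + 1) % V.total
    return pixel, line

refresh_hz = PIXEL_CLOCK_HZ / (H.total * V.total)
print(H.total, V.total, round(refresh_hz, 2))   # 1056 628 60.32
```

In hardware, in_sync corresponds to the decode that drives “H_Sync” or “V_Sync,” and RGB data is only output while both counters report in_active.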

Using this approach, it is very easy to generate a simple VGA interface and see the results of our image processing algorithms on a monitor. If you are interested, you can download a ZIP file containing the VHDL code for these counters along with a VHDL testbench by clicking here.


Mean Time Between Failure


Every engineer should be familiar with the concept of Mean Time Between Failure (MTBF), which is one of the most commonly used terms in reliability engineering. Having said this, different people have varying interpretations of what MTBF actually means, and these interpretations are often incorrect.

For example, suppose I were to tell you that a particular system had an MTBF of 8,766 hours, which equates to one year (MTBF is normally quoted in hours). Does this mean that if I use a single instantiation of this system in my application, that system is guaranteed to be operational for a year before its first failure?

In fact, you might be surprised to discover that with an MTBF of a year, a single system has a probability of only 0.37 of still being operational after the 8,766 hours. If you’re a manufacturer of systems, then only 37 percent of the units you produce will still be operational after this period, which might cause problems for your warranty and post-design services departments. Using the equation P(s) = e^(-t/MTBF), where t is the operating time, and charting this equation in Excel will produce the following plot, which shows probability of success at 0.5 MTBF and 1.0 MTBF:

As engineers, we want (or, in some cases, need) to ensure the probability of success for a particular number of years is very high, as high as 0.95 in many cases (some applications require 0.99 or higher, but this will require you to employ redundancy). This means that, for the simple one year product/mission life presented in our example, the MTBF would have to be roughly 19.5 years (from MTBF = -t / ln(0.95)) to obtain the required probability of success. This is a big difference from what you may originally have thought.
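
If you would rather not reach for Excel, the same equation is easy to evaluate in Python. The sketch below assumes the constant failure rate model behind P(s) = e^(-t/MTBF); the function names are my own.

```python
import math

HOURS_PER_YEAR = 8766

def probability_of_success(t_hours, mtbf_hours):
    """P(s) = e^(-t/MTBF) for a constant failure rate."""
    return math.exp(-t_hours / mtbf_hours)

def required_mtbf(t_hours, target):
    """Rearranged form: the MTBF needed to reach a target P(s) at time t."""
    return -t_hours / math.log(target)

p_full = probability_of_success(HOURS_PER_YEAR, HOURS_PER_YEAR)        # t = 1.0 MTBF
p_half = probability_of_success(0.5 * HOURS_PER_YEAR, HOURS_PER_YEAR)  # t = 0.5 MTBF
mtbf_years = required_mtbf(HOURS_PER_YEAR, 0.95) / HOURS_PER_YEAR

print(f"{p_full:.2f} {p_half:.2f} {mtbf_years:.1f}")   # 0.37 0.61 19.5
```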

The reliability of a module or system follows the well-known “bathtub curve” as shown below:

The discussions in the remainder of this blog will relate to determining the MTBF and failure rate during the constant failure rate duration. It is the responsibility of the manufacturer to ensure that infant mortality is screened out, which is one reason for performing post-production burn-in.

One method of determining MTBF is based around the failure rate of the component interconnections (solder joints), the technology (hybrid, IC, PCB etc.), and the operating environment (ground, aircraft, space vehicle, etc.). In fact, two methods for determining the failure rate are commonly used:

  • Parts count: As this technique is based around reference stresses on components, the resulting analysis tends to give a more conservative (pessimistic) failure rate.
  • Stressed reliability: This approach utilizes actual electrical and thermal stress applied to each component to determine a more accurate failure rate for the system.

In many cases, circuit/systems designers may use both of these techniques to determine the MTBF of their system. For example, they may initially perform a parts count analysis on the bill of materials (BOM) to provide an initial (“ball park”) estimate of the reliability.

Later, once the circuit design has been completed, they may follow up with a stressed reliability analysis that takes into account the electrical stresses on the components and the equipment temperatures. This second, stressed analysis tends to lower the predicted failure rate and increase the MTBF, which is generally what engineers and companies want, while also being more accurate.

One common standard for performing both of these analyses is Mil Handbook 217F Notice 2. This is freely available over the Internet and provides very detailed information on reliability rate calculations for different devices and environments. The only downside with this standard is that it was last updated in 1995, so its predictions can be a little pessimistic for modern components. The other commonly used standards are the Bellcore/Telcordia and SAE Reliability Prediction Methods.

Another method for determining the failure rate of a device is via life testing; i.e., measuring the total number of hours the device operates without a failure. This is often achieved using accelerated life testing, which stresses the devices beyond their normal operating conditions to “age” the devices and determine failure rates. This approach is generally performed by device manufacturers to obtain each component’s FIT (failure in time) rate. Typical examples of the test standards used are Mil Std 883 and the JEDEC Solid State Technology Association’s “JESD22 reliability test methods for packaged devices.”

The FIT rate is the number of failures in a billion (10^9) hours of operation. For example, a device with a FIT rate of 20 has a failure rate of 20e-9 failures per hour. The relationship between MTBF and FIT rate is very simple: the MTBF in hours is 10^9 divided by the FIT rate, and vice versa. Hence, in our earlier example, in order to have a probability of success of 0.95 for one year, the total design needs a FIT rate no greater than roughly 5,850 FITs, which is still pretty high.
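
The conversion between FIT rate and MTBF, and the FIT budget for a given probability of success, can be sketched in Python as follows. As before, this assumes a constant failure rate, and the helper names are hypothetical.

```python
import math

HOURS_PER_YEAR = 8766
BILLION_HOURS = 1e9

def fit_to_mtbf_hours(fit):
    """MTBF in hours for a given FIT rate (failures per 10^9 hours)."""
    return BILLION_HOURS / fit

def mtbf_hours_to_fit(mtbf_hours):
    """FIT rate for a given MTBF in hours."""
    return BILLION_HOURS / mtbf_hours

def max_fit_for(target, mission_hours):
    """Largest FIT rate that still meets the target probability of success."""
    failures_per_hour = -math.log(target) / mission_hours
    return failures_per_hour * BILLION_HOURS

print(fit_to_mtbf_hours(20))                       # 50000000.0 hours
print(round(max_fit_for(0.95, HOURS_PER_YEAR)))    # 5851
```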

A typical FIT rate for Xilinx FPGAs (this site’s sponsor) is 24 FITs for the current 7 series devices. Power supplies tend to dominate failure rates, especially when these are calculated using Mil Handbook 217F Notice 2, which can also be used to calculate the reliability of a hybrid device.


Increasing FPGA System Reliability


In this column, I will look at what we can do within the FPGA and at the hardware/system level to increase reliability.

Focusing on the FPGA implementation first, there are numerous ways the design can be corrupted, depending on the end environment. This corruption could be the result of a single-event upset (SEU), a single-event functional interrupt (SEFI), or even data corruption from a number of sources.

An SEU occurs when a data bit (register or memory) is hit by radiation and flips from a 0 to a 1, or vice versa. A SEFI is where a control register or other critical register suffers a bit flip that locks up the system. In the world of SRAM-based FPGAs, we tend to treat it as a SEFI when one of the SRAM cells holding the device’s configuration flips and changes the design’s implementation. Data corruption can occur for a number of reasons, including EMI (electromagnetic interference) affecting the design in an industrial application.

How can we protect these systems and increase a unit’s MTBF? Depending on the end application, it may be acceptable simply to duplicate the logic — create two instantiations of the design within the same device — and to indicate an error if the results do not match. The higher-level system would be in charge of deciding what to do in the event of such an error.

The next thing we can do is to implement triple modular redundancy (TMR) within the device. At the simplest level, this instantiates the same design three times within the FPGA. A majority vote — two out of three — decides the result. (Even though this might sound simple, implementing it can become very complex very quickly.) If one instantiation of the design becomes corrupted, the error will be masked. Depending on the kind of error, the device may clear itself on the next calculation, or it may require reconfiguration.
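
The voting itself is easy to model. The Python sketch below shows a bitwise two-out-of-three majority vote on three copies of a result word; it illustrates the principle only and says nothing about how tools such as TMRTool structure their voters. The function names are my own.

```python
def majority_vote(a, b, c):
    """Bitwise two-out-of-three vote across three copies of a word."""
    return (a & b) | (a & c) | (b & c)

def copies_disagree(a, b, c):
    """True if any copy differs, which can be used to flag a masked error."""
    return not (a == b == c)

good = 0b1011_0010
upset = good ^ 0b0000_1000          # one copy suffers a single bit flip

voted = majority_vote(good, upset, good)
print(voted == good, copies_disagree(good, upset, good))   # True True
```

Note that the vote masks the upset but does not repair the corrupted copy, which is why the disagreement flag is worth keeping.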

Implementing TMR can be performed by hand, which can be time-consuming, or using tools such as the TMRTool from Xilinx (this site’s sponsor) or the BL-TMR from Brigham Young University. If TMR is implemented correctly (and you have to be careful about synthesis optimizations), the design should mask all SEUs, as long as only one is present at any particular time.

Memory blocks inside the FPGA may also use error-correcting code technology to detect and correct SEUs. However, to ensure you really have good data, you need to perform memory scrubbing. This involves accessing the memory when it is not being used for other purposes, reading out the data, checking the error detection and correction code, and (if necessary) writing back the corrected data. Common tools here include Hamming codes that allow double-error detection and single-error correction.
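
To show how single-error correction, double-error detection, and scrubbing fit together, here is a small Python model. I have used a Hamming(7,4) code plus an overall parity bit purely because it is small enough to follow by hand; real memories protect much wider words, and the encode, decode, and scrub functions are illustrative names rather than any vendor’s interface.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit word: index 0 is overall parity, 1..7 is Hamming(7,4)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code

def decode(code):
    """Return (nibble, status); nibble is None if a double error is detected."""
    word = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos          # XOR of set positions is 0 for a valid word
    parity_error = sum(word) % 2     # 1 when an odd number of bits have flipped
    if syndrome == 0 and not parity_error:
        status = "ok"
    elif parity_error:
        word[syndrome] ^= 1          # single error; syndrome 0 means the parity bit itself
        status = "corrected"
    else:
        return None, "double error detected"
    nibble = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return nibble, status

def scrub(memory):
    """One scrubbing pass: read each word, correct single-bit upsets, write back."""
    corrected = 0
    for addr, word in enumerate(memory):
        nibble, status = decode(word)
        if status == "corrected":
            memory[addr] = encode(nibble)
            corrected += 1
    return corrected

memory = [encode(n) for n in range(16)]
memory[5][6] ^= 1                    # inject a single-bit upset
print(scrub(memory), decode(memory[5]))   # 1 (5, 'ok')
```

Scrubbing regularly matters because a second upset in the same word, arriving before the first has been corrected, can only be detected and not repaired.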

This nicely leads us to the concept of scrubbing the entire FPGA. Depending on your end application, you might be able to simply reconfigure the FPGA each time before it is used. For example, a radar imaging system taking an image could reconfigure the FPGA between images to prevent corruption. If the FPGA’s performance is more mission-critical or uptime-critical, you can monitor the FPGA’s configuration by reading back the configuration data over the configuration interface. If any errors are detected, the entire device may be reconfigured, or partial reconfiguration may be used to target a specific portion of the design. Of course, all this requires a supervising device or system.

Of course, it will take significant analysis to determine how any of the methods that have been mentioned thus far affects MTBF. The complexity of this analysis will depend on the environment in which the system is intended to operate.

Working at the module level, we as engineers can take a number of steps to increase reliability. The first is to introduce redundancy, either within the module itself (e.g., extra processing chains) or by duplicating the module in its entirety.

If you are implementing redundancy, you have two options: hot and cold. Each has advantages and disadvantages, and implementing either option will be a system-level decision.

In the case of hot redundancy, both the prime and redundant devices (to keep things simple, I am assuming one-for-two redundancy) are powered up, with the redundant module configured ready to replace the prime should it fail. This has the advantage of a more or less seamless transition. However, since the redundant unit is operating alongside the prime, it is also aging and might fail.

In the case of cold redundancy, the prime unit is powered and operating while the redundant unit is powered down. This means the redundant module is not subject to as many aging stresses and, to a large extent, is essentially new when it is turned on. However, this comes at the expense of having some amount of down time if the prime module fails and the redundant module must be switched in.
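
The aging argument can be put into numbers using the textbook reliability formulas for a prime plus one redundant unit. The Python sketch below makes some generous assumptions that I should state loudly: both units have the same constant failure rate, the switch-over mechanism never fails, and a cold spare does not age at all while it is powered down. It also says nothing about the down time during switch-over.

```python
import math

def single_unit(rate, t):
    """Probability that one unit survives to time t."""
    return math.exp(-rate * t)

def hot_redundant(rate, t):
    """Both units powered from the start; the pair survives if either unit does."""
    r = single_unit(rate, t)
    return 1 - (1 - r) ** 2

def cold_redundant(rate, t):
    """Spare is unpowered until the prime fails (ideal switch, no dormant aging)."""
    return math.exp(-rate * t) * (1 + rate * t)

rate, t = 1 / 8766, 8766             # mission time equal to one unit's MTBF
print(f"{single_unit(rate, t):.3f} {hot_redundant(rate, t):.3f} {cold_redundant(rate, t):.3f}")
# 0.368 0.600 0.736
```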

With careful analysis of your system, you can identify the key drivers; i.e., which components have a high failure rate and are hurting system reliability. Power supply is often a key driver. Therefore, it is often advisable to implement a redundant power supply architecture that can power the same electronics, often in a one-out-of-two setup.

If you are implementing redundancy at the data path, module, or system level, the number of data paths, modules, or systems you employ will impact the new failure rate. For example, 12-for-8 systems will give you a lower failure rate than 10-for-8 systems. Of course, redundancy comes at the expense of cost, size, weight, power consumption, and so forth. A very good interactive Website for this analysis can be found by clicking here.
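
For a feel of how the numbers move, the probability that at least 8 of the fitted units are still working can be computed from the binomial distribution. This Python sketch assumes identical, independent units that are all powered (hot redundancy) with ideal switching; the per-unit reliability of 0.9 is an arbitrary illustrative value, not a figure from any real system.

```python
import math

def k_out_of_n(k, n, r):
    """Probability that at least k of n identical, independent units still work."""
    return sum(math.comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))

unit_reliability = 0.9               # assumed per-unit probability of success

ten_for_eight = k_out_of_n(8, 10, unit_reliability)
twelve_for_eight = k_out_of_n(8, 12, unit_reliability)
print(ten_for_eight < twelve_for_eight)   # True
```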

When implementing redundancy at either the system or module level, it is crucial that a fault in the prime module cannot prevent the redundant module from working, and vice versa. Fault propagation has to be considered, and prime and redundant modules must be isolated from each other.