Tag Archives: fpga

Securing your FPGA Design


Let’s start by considering the high-level issues we face as engineers attempting to secure our designs. These include the following:

  1. Competitors reverse engineering our design
  2. Unauthorized production runs
  3. Unauthorized modification of the design
  4. Unauthorized access to the data within the design
  5. Unauthorized control of the end system

The severity and impact of each of these will vary depending upon the end function of the design. In the case of an industrial control system, for example, someone being able to take unauthorized control could be critical and cause untold damage and loss of life. A secure data processing system will place emphasis on integrity of the data being critical. By comparison, in the case of a commercial product, preventing reverse engineering, unauthorized production runs, or even modification might be the driving factors.

Luckily, as engineers, we can use a number of approaches to prevent this sort of thing from happening.

The first, and most critical, is taking control of your design data — source code, schematics, mechanical assemblies, etc. — and ensuring it’s secure. This information is the lifeblood of your company and must be protected all the way through the project life cycle, and beyond, to keep your competitive edge. Sadly, in this age of cyberattacks by anything from individuals to organized groups to nation states, this means having very good firewalls — maybe even an “air gap” — between your design network and one connected to the external world.

There are also efforts that can be undertaken to secure your design within the design process itself. These efforts can be split into the following approaches, which are in no way mutually exclusive:

No. 1: Restrict physical access to the FPGA
One of the first methods that can be undertaken is to limit physical access to the unit — especially the circuit card and the FPGA(s). This involves using methods to detect someone tampering with the unit and taking action suitable for the system upon detection of any threat.

Examples of suitable action would be to safely power down the unit or to erase functional parameters preventing further use of the unit. This is often the case in many industrial control systems or military systems to prevent unauthorized access attempts. Depending upon the end application, other physical methods can be undertaken, such as conformal coating or potting to prevent identification of key components. The use of soldered — as opposed to socketed — components also goes without saying.

No. 2: Encryption of configuration streams
Many applications use SRAM-based FPGAs due to the ability to update the design in the field. Typically, these designs require a configuration device that loads the FPGA configuration at power-up and other times. This configuration data stream may be accessed by a third party (depending upon what physical precautions you have taken).

Many devices these days allow for encryption (normally AES) of the data stream, or even the need to know an encryption key before the device can be programmed further or data read back. Physically, the designer of the PCB can also limit people’s abillity to probe these points by using a multi-layer PCB and by not routing tracks on the top of the board, but instead using internal layers. This is especially efficacious if external termination resistors are not required or can be embedded in the PCB itself (this does add cost)

No. 3: Disable read back or even reconfiguration
Many devices provide the option to prevent the reading back of data over the JTAG interface. Some devices even provide the option to prevent upgrading the device if a certain flag is set, thereby turning a re-programmable device into a one-time programmable (OTP) component. Of course, if you take this course of action, you need to be certain that you will not need to change the design and that you are programming the correct file. (I am sure we have all, at one point, programmed the wrong file into a device. Or is that just me?)

No. 4: Protect that JTAG port
Most access attempts to reverse engineer, modify, or change the functionality of your design are going to be made initially via your JTAG chain. There is a very interesting paper on this topic that you can access by clicking here. It is therefore imperative that you protect your JTAG interface, which should never appear on an external connector, but instead require that the unit be disassembled in order to access the connector.

Ensuring your physical security measures in the field should provide protection over this interface. It’s also a good idea to provide several small chains that can be joined together via numerous tap controllers or external cabling, instead of creating large JTAG chains. Obviously, your design should not indicate on the silk screen where or what the JTAG connectors are. Some more secure designs do not include physical JTAG connectors, but rather just pads on the PCB to which a “bed of nails” type approach can be used to programme the devices.

If the device TAP controller contains the optional TRST pin, then it is possible to fit a zero ohm link to ground programming to hold the TAP in reset, thereby preventing the TAP controller from working. You can do the same with the TCLK pin if the TRST pin is not available. This means your attacker has to find and remove this resistor before the port will work.

No. 5: Differential power analysis
This is a technique that hackers can use to determine when the unit is processing data or when it is idling. As the power profile changes, it is possible to determine a significant amount about the design and the data passing through the system. One solution to this is to ensure the module / system draws the same power regardless of whether it is processing data full-out or while sitting idling, thereby preventing this information from being collected. This requires a more complicated power management and thermal management systems, but can be achieved by means of a shunt regulator, which becomes a constant current load on the main power supply.

No. 6: Design in the ability to detect counterfeits
There is always the possibility that — no matter how many precautions you have taken — your design, or portions thereof, can be copied and reused. However, there are systems you can implement within your code that will enable you to detect if your design has been copied. One potential method is the DesignTag approach from Algotronix, which uses a very unique and innovative method of identifying your design.

The discussions above present just some of the possible threats that are out there, along with a selection of techniques that can be undertaken to secure your design


Using ChipScope ILA


If you are new to FPGAs, one aspect of the development flow you may not have considered is how you will go about debugging your design once it has been loaded into the FPGA.

In order to set the scene, let’s first cast our minds back to the days before FPGAs and consider how we would debug a digital circuit board or system in the lab. One of the tools we would have employed would be a logic analyzer. (See: Turn Your iPad Into a Logic Analyzer!) First we would connect the analyzer’s probe leads to the signals of interest on the board. We might also specify certain trigger conditions upon which we desired the tool to commence storing data for subsequent display and analysis. Then we would run the system and try to work out what the heck was happening.

Logic analyzers are, of course, still employed today. When it comes to using one to debug an FPGA design, we typically start by creating a dedicated test header that will connect to the FPGA’s input-and-outputs (I/Os). One problem with this scheme is that there can be hundreds of thousands of signals inside the FPGA — a much greater number than there are I/Os on the device and signals you can break out to the test header. This means that you may have to keep on rebuilding your design to access the signals of interest and route them out to the test header.

In some cases, the physical construction of the unit in question means that test headers are of use only at the board level and not during system integration. Indeed, I am working on one such project at the time of this writing. Another problem is that many FPGA designs are I/O limited from the start, so dedicating a bunch of pins to observe what’s happening on internal signals may simply not be a feasible option.

And one further problem is that, inevitability, the logic analyzer you are using will also be required by one or more other project teams, which means you all have to agree on how you will allocate the analyzer resources. I cannot tell you how frustrating it is to be homing in on a problem when… suddenly… it’s time to disconnect one’s intricate probe setup and allow the analyzer to be wheeled away to someone else’s project.

One solution to this problem — a solution that has seen great advances over the last few years — has been the development of in-chip logic analyzers for use with FPGAs. The idea is to employ any unused programmable resources and on-chip memory blocks to implement one or more “virtual” logic analyzers.

As with their physical counterparts, these virtual logic analyzers — like ChipScope from Xilinx, Identify RTL Debugger from Synopsys, Reveal from Lattice Semiconductor, and SignalTap from Altera — can be set up so that they will only start collecting data after certain trigger conditions have been met. Engineers can use these analyzers to “peer” into the design as it operates, storing the resulting data in on-chip RAM, extracting the results over the JTAG port, and then displaying the results — more or less in real-time — on their screens.

Using virtual logic analyzers may remove the need for test headers. Sadly, however, in many cases they do not remove the need to rebuild the code. One big advantage of these in-chip logic analyzers is that they offer the ability to capture the values on wide internal busses and store these values in internal RAM. The big downside with this approach comes in designs that are already utilizing most of the devices programmable resources, because this will limit any logic analyzer implementations.

Implementing ChipScope can be very quickly achieved within the ISE design flow. The simplest method is to first implement your design, but not to generate the *.bit file. Instead, open up Core Inserter under your Xilinx installation (in Windows, use Start > Xilinx > ChipScope [pro] > Core Inserter). Select the target technology and identify the output file of the synthesis (either *.ngc or *.edf depending upon the tool you used) and add an ICON controller and then the ILA block.


This is where you will connect the signals you wish to analyze. It is possible to have several ILA blocks per ICON if you wish to use different triggers or monitor different signals, etc. Once you’re happy with the connections you can insert the core, although — depending on the speed of your machine — this may take a little time. After the core has been inserted, you need to rerun the implementation stages and generate a *.bit file (ISE should show the stages needing to be re-run). Having configured the target device, you can then connect to the target over JTAG using the ChipScope Analyzer tool and trigger on the waveform of interest as illustrated in the screenshot below.



If you are interested in playing with this yourself, an example of the project referenced in this column — along with all the files needed to run it on the Avnet LX9 development board — can be found here


Generating a VGA Test Pattern


In my original article, we discussed how we could use two counters — the pixel counter and the line counter — to generate the “H_Sync” (horizontal sync) and “V_Sync” (vertical sync) signals that are used to synchronize the VGA display. Now, in this article, we will consider how to also generate some RGB (red, green, and blue) signals to create an image on the display.


My Spartan 3A development board.

The first step was for me to retrieve my trusty Spartan 3A development board, which I had loaned to a friend at work. Once I had this board back in my hands, I started to ponder my implementation. Sadly my development board does not contain proper digital-to-analog converters (DACs) that can be driven by 8-bit wide red, green, and blue signals generated by the FPGA. Instead, it uses only four bits to represent each color, and it employs a simple resistor network to convert these digital outputs into corresponding analog voltages.

This means the color palette of my Spartan board is limited to four bits for the red channel, four bits for the blue channel, and four bits for the green channel, which equates to 2^4 x 2^4 x 2^4 = 4,069 colors. Although this 12-bit color scheme is admittedly somewhat limited, as we shall see it can still provide excellent results.

The next problem is the amount of memory required to hold the image. Once again, I had originally planned on storing an 800 x 600 pixel image in a frame buffer on the FPGA as described in Max’s article. Even with my limited color palette, however, just one frame would require 800 x 600 x 12-bits, which equals 5.76 megabits of RAM. This is more memory than is available in the FPGA on my development board.

As a “cheap-and-cheerful” alternative, I decided to generate a series of simple test patterns algorithmically. A high-level block diagram of my VGA test pattern generator is illustrated below:


High-level block diagram of my VGA test pattern generator.

First we have a “System Clock,” which is used to synchronize all of the activities inside the FPGA. The “VGA Timing” module comprises the pixel and line counters we discussed in my original article. In addition to generating the “H_Sync” and “V_Sync” signals that are used to synchronize the VGA display itself, this module also generates a number of other signals that are used to control the “VGA Video” module.

The “Algorithmic Test Pattern Generator” module is used to generate a series of simple test patterns. The “VGA Video” module takes these test patterns and presents them to the outside world in the form of the three 4-bit RGB signals that are presented to the DACs (or resistor networks, in the case of my development board).

Actually, I should note that in my real-world implementation, the “Algorithmic Test Pattern Generator” and “VGA Video” modules are one and the same thing, but it’s easier to think of them as being separate entities for the purposes of these discussions.

My implementation of this test pattern generator consumes only a small portion of the resources available on my Spartan FPGA. In fact, it requires just 96 slices out of the 5,888 slices that are available, which means it utilizes less than 2 percent of the chip’s total resources.

To be honest, I’m glad that the limitations of my development board forced me to take this intermediate step — that is, to create a test pattern generator. This is because a test pattern provides the simplest way to output images to prove that the backend display drivers are working correctly. Generating a test pattern (or a series of test patterns, in this case) is a good idea for a variety of reasons:

  • It allows the RGB color outputs to be verified to prove that they are functioning correctly. This can be achieved by displaying incremental bars where the color is gradually increased from 0 to its maximum value.
  • It allows the timing to be checked. Is the frame updating correctly? Are the borders correct? And so forth.
  • More advanced test patterns can be used to align the image with a camera viewfinder on systems that are used to capture real-world images.

As an aside, a famous television test pattern many people will recognize is the Indian Head Test Card. This was common in America until the early 1970s, at which time it was replaced by the SMTPE Color Bars.

If you wish to probe deeper into my design, click here to download a ZIP (compressed) version of my project file. As you will see, this design consists of one structural unit tying together two modules: the “VGA Timing” module and the “VGA Video” module (which includes the algorithmic test pattern generation code as noted above).

The “VGA Video” module outputs the RGB video signals during the active periods of the video display period, as can be seen in the results of the simulation shown in the following screenshot:


The results from my initial simulations.

Again, the values in the line and pixel counters in the “VGA Timing” module are used by the “VGA Video” module to determine positions on the screen and to decide when the RGB outputs need to be manipulated to achieve the desired result.


Generating VGA from an FPGA


Thanks to their nature, FPGAs are well suited to the intense levels of signal processing required by many imaging systems. Of course, one of the most rewarding aspects of image processing is seeing the resultant image on a display, and a very common form of display uses the VGA (video graphics array) standard.

The first VGA display was introduced with the IBM PS/2 line of computers in 1987. One thing most people associate with this form of display is the 15-pin D-subminiature VGA connector you tend to find on the back of a tower computer or the side of your notebook computer.

The original VGA standard supported a resolution of only 640×480 (which means 640 pixels in the horizontal plane and 480 lines in the vertical plane). Over the years, however, the standard has evolved to support a wide variety of resolutions, all the way up to widescreen resolutions as high as 1920×1080.

The act of driving a VGA is surprisingly simple, being based on the use of two counters as follows:

  • Pixel counter: Counts at the required clock frequency (40MHz in this example) the number of pixels in a line, this is used to generate the horizontal timing.
  • Line counter: Also known as the Frame Counter, this repeats at the refresh rate of the desired VESA specification for 60Hz, 75Hz, 85Hz, and so on. This also identifies when the counter is within a valid region for outputting display data. The line counter is incremented each time the pixel counter reaches its terminal count.
    These counters are used to generate two synchronization (sync) markers — the “V_Sync” (vertical sync) and “H_Sync” (horizontal sync) signals. In conjunction with the RGB (red, green, and blue) analog signals , “V_Sync” and “H_Sync” form the basic signals required to display video on a monitor.

Actually, this may be a good time to take a step back to remind ourselves as to the origin of terms like “V_Sync” and “H_Sync.” The main thing to remember is that, at the time the original VGA standard was introduced, the predominant form of computer display was based on the cathode ray tube (CRT), in which an electron beam is used to “write” on a phosphorescent screen.154412_242291

There are several ways in which an electron beam can be manipulated to create images on a CRT screen, but by far the most common technique is the raster scan. Using this approach, the electron beam commences in the upper-left corner of the screen and is guided across the screen to the right. The path the beam follows as it crosses the screen is referred to as a line. When the beam reaches the right-hand side of the screen it undergoes a process known as horizontal flyback, in which its intensity is reduced and it is caused to “fly back” across the screen. While the beam is flying back it is also pulled a little way down the screen as shown in the following illustration:

The beam is now used to form a second line, then a third, and so on until it reaches the bottom of the screen. The number of lines affects the resolution of the resulting picture (that is, the amount of detail that can be displayed). When the beam reaches the bottom right-hand corner of the screen it undergoes vertical flyback, in which its intensity is reduced, it “flies back” up the screen to return to its original position in the upper left-hand corner, and the whole process starts again.

The “V_Sync” and “H_Sync” signals are used to synchronize all of these activities. Thus, returning to our pixel and line counters, the values on these counters can be decoded so as to generate the required waveforms on the “V_Sync” and “H_Sync” outputs from an FPGA (that is, on the FPGA’s pins that are being used to drive the display’s “V_Sync” and “H_Sync” signals). Meanwhile, generating the RGB signals will require the FPGA to drive three digital-to-analog convertors (DACs), one for each signal. As the design engineer, you must ensure that the latency through the DACs is accounted for to ensure that their outputs are correctly aligned with respect to the “V_Sync” and “H_Sync” signals.

The line and pixel counters both have portions of their count sequences when no data is being output to the display. In the case of an 800×600 resolution display refreshing at 60Hz, for example, the vertical (line) counter will actually count 628 lines while the horizontal (pixel) counter will count 1,056 pixels.

Why should this be so? Well, returning to our raster scan, it takes a certain amount of time for the electron beam to undergo its horizontal and vertical flyback activities. One way to think about these times is that we have an actual display area that we see, and that this actual display area “lives” in a larger (virtual) display space that contains a border zone that we don’t see:

Of course, in the case of today’s flat-screen, liquid crystal displays (LCDs) and similar technologies, we don’t actually need to worry about things like horizontal and vertical flyback times. At least, we wouldn’t have to worry if it were not for the fact that we don’t actually know what type of screen our FPGA is driving. Thus, anything driving a VGA output generates the timing signals required to drive CRT display, and other forms of display simply make allowances for any of the historical peculiarities associated with these VGA signals.

But we digress… Each of our counters has a collection of associated timing parameters. Vertical timings are referenced in terms of lines, while horizontal timings are referenced in terms of pixels. The following values are those associated with a display resolution of 800×600:

154451_091813 (1)
Using this approach, it is very easy to generate a simple VGA interface and see the results of our image processing algovgarithms on a monitor. If you are interested, you can download a ZIP file containing the VHDL code for these counters along with a VHDL testbench by clicking here vga


Increasing FPGA System Reliability


In this column, I will look at what we can do within the FPGA and at the hardware/system level to increase reliability.

Focusing on the FPGA implementation first, there are numerous ways the design can be corrupted, depending on the end environment. This corruption could be the result of a single-event upset (SEU), a single-event functional interrupt (SEFI), or even data corruption from a number of sources.

An SEU occurs when a data bit (register or memory) is hit by radiation and flips from a 0 to a 1, or vice versa. A SEFI is where a control register or other critical register suffers a bit flip that locks up the system. In the world of SRAM-based FPGAs, we tend to consider an SEFI when one of the SRAM cells holding the device’s configuration flips and changes the design’s implementation. Data corruption can occur for a number of reasons, including EMI (electromagnetic interference) affecting the design in an industrial application.

How can we protect these systems and increase a unit’s MTBF? Depending on the end application, it may be acceptable simply to duplicate the logic — create two instantiations of the design within the same device — and to indicate an error if the results do not match. The higher-level system would be in charge of deciding what to do in the event of such an error.

The next thing we can do is to implement triple modular redundancy (TMR) within the device. At the simplest level, this instantiates the same design three times within the FPGA. A majority vote — two out of three — decides the result. (Even though this might sound simple, implementing it can become very complex very quickly.) If one instantiation of the design becomes corrupted, the error will be masked. Depending on the kind of error, the device may clear itself on the next calculation, or it may require reconfiguration.

Implementing TMR can be performed by hand, which can be time-consuming, or using tools such as the TMRTool from Xilinx (this site’s sponsor) or the BL-TMR from Brigham Young University. If TMR is implemented correctly (and you have to be careful about synthesis optimizations), the design should mask all SEUs, as long as only one is present at any particular time.

Memory blocks inside the FPGA may also use error-correcting code technology to detect and correct SEUs. However, to ensure you really have good data, you need to perform memory scrubbing. This involves accessing the memory when it is not being used for other purposes, reading out the data, checking the error detection and correction code, and (if necessary) writing back the corrected data. Common tools here include Hamming codes that allow double-error detection and single-error correction.

This nicely leads us to the concept of scrubbing the entire FPGA. Depending on your end application, you might be able to simply reconfigure the FPGA each time before it is used. For example, a radar imaging system taking an image could reconfigure the FPGA between images to prevent corruption. If the FPGA’s performance is more mission-critical or uptime-critical, you can monitor the FPGA’s configuration by reading back the configuration data over the configuration interface. If any errors are detected, the entire device may be reconfigured, or partial reconfiguration may be used to target a specific portion of the design. Of course, all this requires a supervising device or system.

Of course, it will take significant analysis to determine how any of the methods that have been mentioned thus far affects MTBF. The complexity of this analysis will depend on the environment in which the system is intended to operate.

Working at the module level, we as engineers can take a number of steps to increase reliability. The first is to introduce redundancy, either within the module itself (e.g., extra processing chains) or by duplicating the module in its entirity.

If you are implementing redundancy, you have two options: hot and cold. Each has advantages and disadvantages, and implementing either option will be a system-level decision.

In the case of hot redundancy, both the prime and redundant devices (to keep things simple, I am assuming one-for-two redundancy) are powered up, with the redundant module configured ready to replace the prime should it fail. This has the advantage of a more or less seamless transition. However, since the redundant unit is operating alongside the prime, it is also aging and might fail.

In the case of cold redundancy, the prime unit is powered and operating while the redundant unit is powered down. This means the redundant module is not subject to as many aging stresses and, to a large extent, is essentially new when it is turned on. However, this comes at the expense of having some amount of down time if the prime module fails and the redundant module must be switched in.

With careful analysis of your system, you can identify the key drivers; i.e., which components have a high failure rate and are hurting system reliability. Power supply is often a key driver. Therefore, it is often advisable to implement a redundant power supply architecture that can power the same electronics, often in a one-out-of-two setup.

If you are implementing redundancy at the data path, module, or system level, the number of data paths, modules, or systems you employ will impact the new failure rate. For example, 12-for-8 systems will give you a lower failure rate than 10-for-8 systems. Of course, redundancy comes the expense of cost, size, weight, power consumption, and so forth. A very good interactive Website for this analysis can be found by clicking here.

When implementing redundancy at either the system or module level, it is crucial that both prime and redundant modules cannot have faults that can keep each other from working. Fault propagation has to be considered, and prime and redundant modules must be isolated from each other.