Tag Archives: fpga

Coding Styles


The major differences among coding styles relate to how the design engineer decides to handle VHDL keywords and any user-defined items (the names of signals, variables, functions, procedures, etc.). Although there are many different possibilities, in practice there are only three commonly used approaches. The first technique is to use a standard text editor and simply enter everything in lowercase (and black-and-white) as shown in the following example of a simple synchronizer:


B&W: Keywords and user-defined items in lowercase.
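By way of illustration (the entity and signal names here are my own inventions), a simple two-stage synchronizer written in this all-lowercase style might look as follows:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity synchronizer is
  port (
    clock : in  std_logic;
    ip    : in  std_logic;
    sync  : out std_logic);
end entity synchronizer;

architecture rtl of synchronizer is
  signal meta : std_logic;  -- intermediate (first) synchronizer stage
begin
  process (clock)
  begin
    if rising_edge(clock) then
      meta <= ip;    -- capture the asynchronous input
      sync <= meta;  -- second stage, safe to use in the clock domain
    end if;
  end process;
end architecture rtl;
```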

The second method is to follow the VHDL Language Reference Manual (LRM), which says that keywords (if, case, when, select, etc.) should be presented in lowercase. Strangely, the LRM doesn’t have anything to say about how user-defined items should be presented, but the common practice (when the keywords are in lowercase) is to present user-defined items in uppercase as shown below:


B&W: Keywords lowercase; user-defined items uppercase.
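By way of illustration (the names are my own), the architecture of a simple two-stage synchronizer would render in this style as follows:

```vhdl
architecture RTL of SYNCHRONIZER is
  signal META : std_logic;  -- intermediate (first) synchronizer stage
begin
  process (CLOCK)
  begin
    if rising_edge(CLOCK) then
      META <= IP;    -- capture the asynchronous input
      SYNC <= META;  -- second stage, safe to use in the clock domain
    end if;
  end process;
end architecture RTL;
```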

The third approach — and my personal preference — is to turn the LRM on its head; to capitalize keywords (IF, CASE, WHEN, SELECT, etc.) and to present any user-defined items in lowercase as shown below:


B&W: Keywords uppercase; user-defined items lowercase.
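Again using an illustrative two-stage synchronizer (the names are my own), this style looks like this — note that "rising_edge" stays lowercase because it is a library function, not a keyword:

```vhdl
ARCHITECTURE rtl OF synchronizer IS
  SIGNAL meta : std_logic;  -- intermediate (first) synchronizer stage
BEGIN
  PROCESS (clock)
  BEGIN
    IF rising_edge(clock) THEN
      meta <= ip;    -- capture the asynchronous input
      sync <= meta;  -- second stage, safe to use in the clock domain
    END IF;
  END PROCESS;
END ARCHITECTURE rtl;
```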

The main argument for using uppercase for keywords and lowercase for user-defined items, or vice versa, is that this helps design engineers and reviewers to quickly locate and identify the various syntactical elements in the design. However, this line of reasoning has been somewhat negated by the use of today’s context-sensitive editors, which are language-aware and therefore able to automatically assign different colors to different items in the design.

Let’s look at our original examples through the “eyes” of a context-sensitive editor as illustrated in the following three images:


Color: Keywords and user-defined items in lowercase.

Color: Keywords lowercase; user-defined items uppercase.

Color: Keywords uppercase; user-defined items lowercase.

Although context-sensitive editors are extremely efficacious, it’s important to remember that some users may be color-blind. Also, even if the code is captured using a context-sensitive editor, it may be that some members of the team end up viewing it using a non-context-aware (black-and-white) editor. Furthermore, the code may well be printed out using a black-and-white printer. For all these reasons, my personal preference is to capitalize keywords and for everything else to be in lowercase.

Aside from how we handle the keywords, most companies will have their own sets of coding guidelines, which will also address other aspects of coding, such as:

    • Naming conventions for clock signals; possibly requiring each clock name to include the frequency, e.g., clk_40MHz, clk_100MHz.
    • Naming and handling of reset signals to the device (active-high, active-low), along with the synchronization of the signal and its assertion and de-assertion within the FPGA. The de-assertion of a reset is key, as removing this signal at the wrong time (too close to an active clock edge) could lead to metastability issues in flip-flops.
    • Naming conventions for signals that are active-low (this is common with external enables). Such signals often have a “_z” or “_n” attached to the end, thereby indicating their status as active-low (“ram_cs_n” or “ram_cs_z,” for example).
    • The permissible and/or preferred libraries, which determine the standard types that can be used. Commonly used libraries for any implementable code are “std_logic_1164” (this is the base library that everyone uses) and “numeric_std” for mathematical operations (as opposed to the non-standard “std_logic_arith”). Meanwhile, other libraries such as “textio” and the “math_real” and “math_complex” libraries will be used when creating test benches.
    • The use of the “std_logic” or “std_ulogic” types on entities and signals (“std_ulogic” is the unresolved version of “std_logic”; because it is unresolved, multiple concurrent assignments to a signal of this type will be flagged as an error).
    • Permissible port styles. Many companies prefer to use only “in,” “out,” and “inout” (for buses only) as opposed to “buffer,” as the need to read back an output can be handled internally with a signal. Also, many companies may limit the types permissible in entities to just “std_logic” or “std_logic_vector” to ease interfacing between large design teams.
    • It is also common for companies working on specialist safety-critical designs to have rules regarding the use of variables.
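As a sketch of how some of these guidelines might come together in practice (the entity, ports, and names below are invented for illustration, not taken from any particular rule set):

```vhdl
library ieee;
use ieee.std_logic_1164.all;  -- the base library that everyone uses
use ieee.numeric_std.all;     -- preferred over std_logic_arith

entity memory_interface is
  port (
    clk_100MHz : in  std_logic;                      -- clock name includes its frequency
    rst_n      : in  std_logic;                      -- active-low reset, "_n" suffix
    ram_cs_n   : out std_logic;                      -- active-low external enable
    data_in    : in  std_logic_vector(7 downto 0);   -- entity ports restricted to
    data_out   : out std_logic_vector(7 downto 0));  -- std_logic/std_logic_vector
end entity memory_interface;
```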

These coding styles can be very detailed, depending upon the end application the FPGA is being developed for.


Handling the Don’t Care Value


In my previous blog, I talked about the nine-value VHDL logic system. As part of those discussions, I mentioned the “Don’t Care” value ‘-’. In particular, we noted how this value is not really meant to be used in comparison operations such as “IF” and “CASE” statements. Instead, the “Don’t Care” value is traditionally used as an assignment when specifying output values.

Having said this, people often ask “Is it possible to use the ‘Don’t Care’ value with comparison operators as it could be very useful?” Like many things, the answer is “yes” with the help of a very useful function — the “std_match()” function provided in the “numeric_std” package (“USE ieee.numeric_std.ALL”).

The “std_match()” function allows the use of “Don’t Care” values during comparison operations. This comes in very useful when we are trying to implement functions like priority encoders, because it allows the “CASE” or “IF” statement to be specified in many fewer lines of code as demonstrated below where a simple priority encoder is implemented.

If we were to perform a comparison just using the “=” operator, we would receive a synthesis warning similar to the following:

Possible simulation mismatch due to ‘-’ in condition expression

By comparison (no pun intended), if we were to employ the “std_match()” function as demonstrated in the following snippet of code, we would not receive this warning:

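A sketch of what such a snippet might look like (the four-bit request/grant encoding and the signal names are my own illustration):

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;  -- provides the std_match() function

entity priority_encoder is
  port (req   : in  std_logic_vector(3 downto 0);
        grant : out std_logic_vector(3 downto 0));
end entity priority_encoder;

architecture rtl of priority_encoder is
begin
  process (req)
  begin
    -- '-' characters in the string literals are treated as "Don't Care"
    if    std_match(req, "1---") then grant <= "1000";  -- highest priority
    elsif std_match(req, "01--") then grant <= "0100";
    elsif std_match(req, "001-") then grant <= "0010";
    elsif std_match(req, "0001") then grant <= "0001";
    else                              grant <= "0000";
    end if;
  end process;
end architecture rtl;
```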

In an attempt to reduce the need to use the “std_match()” function, VHDL 2008 — which was published by Accellera in 2008, and which is now the official IEEE 1076-2008 standard — introduced many useful upgrades to the language. In addition to including support for new fixed- and floating-point number systems, and the incorporation of the Property Specification Language (PSL) into VHDL, VHDL 2008 also introduced the “CASE?” keyword. This new keyword allows the ‘-’ to be used as a “Don’t Care” value in comparison operations, provided the choices do not overlap. Thus, using the new “CASE?” construct, the code above would be rewritten as follows:

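A hedged sketch of the same priority encoder using the VHDL-2008 “CASE?” construct (again, the names and encoding are illustrative):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity priority_encoder is
  port (req   : in  std_logic_vector(3 downto 0);
        grant : out std_logic_vector(3 downto 0));
end entity priority_encoder;

architecture rtl of priority_encoder is
begin
  process (req)
  begin
    case? req is  -- VHDL-2008 matching case; '-' acts as "Don't Care"
      when "1---" => grant <= "1000";  -- highest priority
      when "01--" => grant <= "0100";
      when "001-" => grant <= "0010";
      when "0001" => grant <= "0001";
      when others => grant <= "0000";
    end case?;
  end process;
end architecture rtl;
```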

The only remaining question is why I have included both the “std_match()” approach and the newer “CASE?” method of performing this operation in this column. Why didn’t I just talk about the “CASE?” approach and have done with it?

Well, the answer is both simple and mundane. Although VHDL 2008 was published by Accellera in 2008, and was accepted by the IEEE in January 2009, it takes time for the various tool manufacturers to support new language features. Often it is pressure from the users that is responsible for the manufacturers introducing new features. As is often the case in life, if you do not ask you will not get. (“The squeaky wheel is the one that gets the oil,” as the old saying goes.)


Logic Values


As I touched upon in my earlier column on metastability, a VHDL signal that uses the “std_logic_1164” package can take one of nine different values.

This may come as a shock to newer FPGA designers, who might wonder why a signal that spends the majority of its time as a 0 or 1 needs seven other values to accurately represent its behavior. As with a lot of these things, there is an interesting history as to why we ended up with this nine-value system.

These nine values are defined by the IEEE standard 1164, which was introduced in 1993. This replaced the earlier IEEE 1076-1987 standard, whose logical type — the “bit” — could assume one of only two values: 0 or 1. Sadly the “bit” type’s limited value system caused issues with models of buses, in which multiple tri-state drive gates may be connected to the same signal.

In order to address this problem, the multi-valued logic system defined by the IEEE 1164 standard not only introduced a set of new values, but it also defined an order of precedence. This order of precedence is very important to ensure that the simulator can correctly resolve the value on a signal.

The nine values defined by the standard are as follows:

    'U'  Uninitialized
    'X'  Forcing Unknown
    '0'  Forcing 0
    '1'  Forcing 1
    'Z'  High Impedance
    'W'  Weak Unknown
    'L'  Weak 0
    'H'  Weak 1
    '-'  Don't Care

Another way of thinking about this is in terms of pairwise conflicts between drivers. A conflict between a signal carrying a “U” (Uninitialized) value — for example, a register that has not been loaded with a 0 or 1 — and any other value will result in a “U.” A conflict between a “-” (Don’t Care) value and any other value (apart from a “U”) will result in an “X.” A conflict between a “0” and a “1” will result in an “X.” A conflict between a “0” and an “L,” “W,” or “H” will result in a “0.” And so on and so forth…


It’s important to note that not all of these values can be assigned in synthesizable code, but they may be used in modeling and — in the case of the “U” (Uninitialized) value, for example — they may be seen as the initial values on signals and registers during simulation.

While the meanings of the “0,” “1,” and “Z” values — which are typically used to model the behavior of both signals and buses — are reasonably straightforward (doing what the “bit” type could not), the uses of the other values might not be immediately apparent. The “H” (Weak High) and “L” (Weak Low) states are used to model pull-up and pull-down resistors on a signal, respectively. These can subsequently be used to accurately represent the actions of wired-AND and wired-OR circuits during simulation. In turn, this led to the “W” (Weak Unknown) value, which represents the case where a signal is being driven by both “L” and “H” values from two different drivers.

The “-” (Don’t Care) value is not intended to be used for branch or loop selection such as “IF signal_a = ‘-’ THEN…” because this comparison will not be true unless “signal_a” actually is ‘-’; instead, the “-” value is intended to be used for defining outputs; for example:

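For instance (a sketch; the selector and output names are illustrative), a decoder whose output is irrelevant for unused selector values might be written as:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity dc_example is
  port (sel : in  std_logic_vector(1 downto 0);
        op  : out std_logic_vector(3 downto 0));
end entity dc_example;

architecture rtl of dc_example is
begin
  process (sel)
  begin
    case sel is
      when "00"   => op <= "1100";
      when "01"   => op <= "0011";
      when others => op <= "----";  -- value is irrelevant here, so let the
                                    -- synthesis tool choose whatever minimizes logic
    end case;
  end process;
end architecture rtl;
```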

You may have noticed my mentioning that the simulator resolves the signal to the correct value. In fact, two base types are defined within the 1164 package: “std_ulogic” and “std_logic.” Both of these types use the same nine-value logic system; however, “std_logic” is resolved while “std_ulogic” is unresolved. What this means is that — when using “std_logic” — if multiple drivers are present the simulator is capable of resolving the situation and determining the resultant state of the signal using the resolution table below:

         U    X    0    1    Z    W    L    H    -
    U    U    U    U    U    U    U    U    U    U
    X    U    X    X    X    X    X    X    X    X
    0    U    X    0    X    0    0    0    0    X
    1    U    X    X    1    1    1    1    1    X
    Z    U    X    0    1    Z    W    L    H    X
    W    U    X    0    1    W    W    W    W    X
    L    U    X    0    1    L    W    L    W    X
    H    U    X    0    1    H    W    W    H    X
    -    U    X    X    X    X    X    X    X    X

Of course, this does not mean that the resolved signal will be the one the engineer expects and/or one that is usable in the final design. Frequently, if the engineer has inadvertently connected multiple drivers together, the result will be an “X” (Forcing Unknown).

This problem is somewhat solved by using the “std_ulogic” type, because this will cause the simulator to report an error if multiple drivers are present (generally speaking, most signals in logic designs should have only one driver).

The “std_ulogic” type is not commonly used with FPGA designs (although this depends on each company’s coding rules). However, this type is very popular with ASIC designs, where inadvertent multiple drivers can have a serious effect on the ensuing silicon realization.


Registers and Latches


Along with other simple logic functions like multiplexers, the programmable blocks in an FPGA primarily consist of lookup tables (LUTs) and registers (flip-flops). Registers are the storage elements within the FPGA that we use to form the cores of things like counters, shift registers, state machines, and DSP functions — basically anything that requires us to store a value between clock edges.

A register’s contents are updated on the active edge of the clock signal (“clk” in the diagram below), at which time whatever value is present on the register’s data input (“a” in the diagram below) is loaded into the register and stored until the next clock edge. (These discussions assume that any required setup and hold times are met. If not, a metastable state may ensue. This will be the topic of a future column.) Some registers are triggered by a rising (or positive) edge on the clock, others are triggered by a falling (or negative) clock edge. The following illustration reflects a positive-edge-triggered register:

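In VHDL, such a register might be described as follows (a sketch; the "q" output name and the entity wrapper are my own, while "clk" and "a" follow the text above):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity register_example is
  port (clk : in  std_logic;
        a   : in  std_logic;
        q   : out std_logic);
end entity register_example;

architecture rtl of register_example is
begin
  process (clk)
  begin
    if rising_edge(clk) then  -- positive-edge-triggered
      q <= a;                 -- "a" is sampled only on the rising clock edge
    end if;
  end process;
end architecture rtl;
```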
Excluding any blocks of on-chip memory, there is another type of storage element that may be used in a design — the transparent latch. Transparent latches differ from registers in that registers are edge-triggered while latches are level-sensitive. When the enable signal (“en” in the diagram below) is in its active state, whatever value is presented to the latch’s data input (“a” in the diagram below) is transparently passed through the latch to its output.

If the value on the data input changes while the enable is in its active state, then this new value will be passed through to the output. When the enable returns to its inactive state, the current data value is stored in the latch. If the enable signal is in its inactive state, any changes on the data input will have no effect. Some latches have active-high enables, others have active-low enables. The following illustration reflects a transparent latch with an active-high enable:

In some FPGA architectures, a register element in a programmable block may be explicitly configured to act as a latch. In other cases, the latch may be implemented using combinatorial logic (in the form of LUTs) with feedback. Designers may explicitly decide to create their VHDL code in such a way that one or more latches will result. More commonly, however, a designer doesn’t actually mean to use a latch, but the synthesis engine infers a latch based on the way the code is captured.

Generally speaking, the use of latches within an FPGA is frowned upon. As we previously noted, the fundamental programmable fabric in an FPGA is based around a LUT and a register. These devices are designed to have local and global clock networks with low skew. FPGA architectures are not intended to have enable signals replacing clock signals.

This means that if the VHDL code implies the use of latches, the synthesis tool will be forced to implement those latches in a non-preferred way, which will vary depending on the targeted device architecture. In addition to unwanted race conditions, the use of latches can cause downstream issues in the verification phase with regard to things like static timing analysis and formal equivalence checking. For this reason, most FPGA designers agree that latches are best avoided and registers should be used wherever local storage elements are required.

Latches are generally created when the engineer accidentally forgets to cover all assignment conditions within a combinatorial process, therefore implying storage as illustrated in the following VHDL code example:

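The sort of process that triggers this behavior might look as follows (a sketch; the entity wrapper is mine, with "en," "a," and "op" as per the text):

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity latch_example is
  port (en : in  std_logic;
        a  : in  std_logic;
        op : out std_logic);
end entity latch_example;

architecture rtl of latch_example is
begin
  process (en, a)
  begin
    if en = '1' then
      op <= a;
    end if;  -- no "else" branch: "op" must hold its value when en = '0',
             -- so storage is implied and a latch is inferred
  end process;
end architecture rtl;
```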
The IF statement does not check all conditions on the “en” signal, which means that storage is implied and a latch will be inferred for the “op” output signal. In fact, when this code was synthesised, the tool issued the following warning:

WARNING:Xst:737 — Found 1-bit latch for signal <op>. Latches may be generated from incomplete case or if statements. We do not recommend the use of latches in FPGA/CPLD designs, as they may lead to timing problems.
Unit synthesized.

So, how do we correct for this error? Well, in this case we can make the process synchronous as illustrated in the following VHDL code example:

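A sketch of the corrected, synchronous version (assuming a clock input named "clk" has been added to the entity):

```vhdl
process (clk)
begin
  if rising_edge(clk) then
    if en = '1' then
      op <= a;  -- "en" now acts as a clock enable on a register
    end if;     -- when en = '0' the register simply holds its value
  end if;
end process;
```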
This code will not result in latch generation. It is important to note, however, that only the design engineers know exactly what it is they are trying to achieve, so it is up to them to decide whether they do require storage (in which case they can use a synchronous process and an enable signal) or whether a purely combinatorial process is desired.

Have you had any experiences with unwanted latches being inferred by the synthesis engine? Alternatively, have you created any FPGA designs in which you decided latches were a necessary evil? If so, please explain.

Click here to download a ZIP file containing the code examples presented in this article.


Sensitivity Lists and Simulation


If you are unfamiliar with the VHDL hardware description language (HDL) and are interested in knowing things like the difference between a signal and a variable, or the difference between a function and a procedure, please stay tuned.

As a starter, I thought I would explain the importance of the sensitivity list, which is employed in a VHDL process. At some point, everyone developing VHDL will end up writing something similar to the following lines of code to implement a simple two-stage synchronizer:

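The process in question might look something like this (a sketch; the intermediate "meta" signal and the active-high reset polarity are my own assumptions):

```vhdl
process (reset, clock)
begin
  if reset = '1' then          -- asynchronous, active-high reset (assumed)
    meta <= '0';
    sync <= '0';
  elsif rising_edge(clock) then
    meta <= ip;                -- first synchronizer stage samples the input
    sync <= meta;              -- second stage drives the output
  end if;
end process;
```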
In this case, the “sync” signal acts as the output from this process. The “ip” signal is an input. That signal would be declared in the entity associated with the process, but that’s a topic for another column.

The “(reset,clock)” portion of the process is referred to as the sensitivity list. It contains the signals that will cause a simulation tool (e.g., ModelSim) to execute the process and update the signals. This is required because all processes are executed concurrently, so the simulation tool needs to know which processes need updating as its simulation cycle progresses.

In the example above, events occurring on the “reset” or “clock” signals will result in the process being executed. The process will execute sequentially, updating the “sync” signal on the rising edge of the clock.

For clocked processes such as the one presented here, the sensitivity list requires only the “reset” and “clock” signals. Any other signals that appear on the righthand side of the assignment operator (<=) in a clocked process do not need to be included in the sensitivity list. Their states are effectively being sampled on the edge of the “clock.” You could include both the “sync” and “ip” signals in the sensitivity list of the above process, but neither would be used unless any changes on them occurred at the same time as they were being sampled by the rising edge of the “clock.”

By comparison, in the case of combinatorial processes, it is necessary for the sensitivity list to include all the input signals used within the process if you wish to avoid potential issues. Consider a simple 4:1 multiplexer, for example.

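Such a multiplexer process might look as follows (a sketch using the signal names from the text):

```vhdl
process (select_ip, a, b, c, d)  -- every input appears in the sensitivity list
begin
  case select_ip is              -- select_ip is a two-bit signal
    when "00"   => op <= a;
    when "01"   => op <= b;
    when "10"   => op <= c;
    when others => op <= d;
  end case;
end process;
```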
Note that, in this case, the “op” signal is the output from the process. Once again, this signal would be declared in the entity associated with the process.

As you can see, this process will be executed whenever a change occurs on the “select_ip” signal (which is a two-bit signal, by the way) or when the values on the “a,” “b,” “c,” or “d” data inputs are updated.

This is the really important point. If we were to include only the “select_ip” signal in our sensitivity list, then we would have a problem. The result would still be legal VHDL. However, the simulator would not respond to any of the changes on the multiplexer’s data inputs. Even worse, the resulting gate-level representation generated by the synthesis tool would be different, because the synthesis engine looks not only at the sensitivity list, but also at the code to extract the behavior of the process when implementing the logic. (This is often referred to as behavioral extraction.)

The result would be a mismatch between the simulation of the original RTL (register transfer level) representation and the simulation of the gate-level representation generated by the synthesis engine. For this reason, the synthesis engine normally reports any signals missing from the sensitivity list that may result in a simulation mismatch. These warnings, which will appear in the synthesis log file, should be investigated and corrected, but it’s a lot easier to create your code in a way that ensures the problem never arises in the first place.


Securing your FPGA Design


Let’s start by considering the high-level issues we face as engineers attempting to secure our designs. These include the following:

  1. Competitors reverse engineering our design
  2. Unauthorized production runs
  3. Unauthorized modification of the design
  4. Unauthorized access to the data within the design
  5. Unauthorized control of the end system

The severity and impact of each of these will vary depending upon the end function of the design. In the case of an industrial control system, for example, someone being able to take unauthorized control could be critical and cause untold damage and loss of life. In the case of a secure data processing system, the emphasis will be placed on the integrity of the data. By comparison, in the case of a commercial product, preventing reverse engineering, unauthorized production runs, or even modification might be the driving factors.

Luckily, as engineers, we can use a number of approaches to prevent this sort of thing from happening.

The first, and most critical, is taking control of your design data — source code, schematics, mechanical assemblies, etc. — and ensuring it’s secure. This information is the lifeblood of your company and must be protected all the way through the project life cycle, and beyond, to keep your competitive edge. Sadly, in this age of cyberattacks by anything from individuals to organized groups to nation states, this means having very good firewalls — maybe even an “air gap” — between your design network and one connected to the external world.

There are also efforts that can be undertaken to secure your design within the design process itself. These efforts can be split into the following approaches, which are in no way mutually exclusive:

No. 1: Restrict physical access to the FPGA
One of the first methods that can be undertaken is to limit physical access to the unit — especially the circuit card and the FPGA(s). This involves using methods to detect someone tampering with the unit and taking action suitable for the system upon detection of any threat.

Examples of suitable action would be to safely power down the unit or to erase functional parameters preventing further use of the unit. This is often the case in many industrial control systems or military systems to prevent unauthorized access attempts. Depending upon the end application, other physical methods can be undertaken, such as conformal coating or potting to prevent identification of key components. The use of soldered — as opposed to socketed — components also goes without saying.

No. 2: Encryption of configuration streams
Many applications use SRAM-based FPGAs due to the ability to update the design in the field. Typically, these designs require a configuration device that loads the FPGA configuration at power-up and other times. This configuration data stream may be accessed by a third party (depending upon what physical precautions you have taken).

Many devices these days allow for encryption (normally AES) of the data stream, or even require knowledge of an encryption key before the device can be programmed further or data read back. Physically, the designer of the PCB can also limit people’s ability to probe these points by using a multi-layer PCB and by not routing tracks on the top of the board, but instead using internal layers. This is especially efficacious if external termination resistors are not required or can be embedded in the PCB itself (although this does add cost).

No. 3: Disable read back or even reconfiguration
Many devices provide the option to prevent the reading back of data over the JTAG interface. Some devices even provide the option to prevent upgrading the device if a certain flag is set, thereby turning a re-programmable device into a one-time programmable (OTP) component. Of course, if you take this course of action, you need to be certain that you will not need to change the design and that you are programming the correct file. (I am sure we have all, at one point, programmed the wrong file into a device. Or is that just me?)

No. 4: Protect that JTAG port
Most access attempts to reverse engineer, modify, or change the functionality of your design are going to be made initially via your JTAG chain. There is a very interesting paper on this topic that you can access by clicking here. It is therefore imperative that you protect your JTAG interface, which should never appear on an external connector, but instead require that the unit be disassembled in order to access the connector.

Your physical security measures in the field should provide protection for this interface. It’s also a good idea to provide several small chains that can be joined together via numerous TAP controllers or external cabling, instead of creating large JTAG chains. Obviously, your design should not indicate on the silk screen where or what the JTAG connectors are. Some more secure designs do not include physical JTAG connectors, but rather just pads on the PCB to which a “bed of nails” type approach can be used to programme the devices.

If the device’s TAP controller includes the optional TRST pin, then it is possible to fit a zero-ohm link tying this pin to ground, thereby holding the TAP controller in reset and preventing it from working. You can do the same with the TCLK pin if the TRST pin is not available. This means your attacker has to find and remove this resistor before the port will work.

No. 5: Differential power analysis
This is a technique that hackers can use to determine when the unit is processing data and when it is idling. As the power profile changes, it is possible to determine a significant amount about the design and the data passing through the system. One solution is to ensure the module or system draws the same power regardless of whether it is processing data flat-out or sitting idle, thereby preventing this information from being collected. This requires more complicated power and thermal management systems, but it can be achieved by means of a shunt regulator, which presents a constant current load to the main power supply.

No. 6: Design in the ability to detect counterfeits
There is always the possibility that — no matter how many precautions you have taken — your design, or portions thereof, can be copied and reused. However, there are systems you can implement within your code that will enable you to detect if your design has been copied. One potential method is the DesignTag approach from Algotronix, which uses a unique and innovative method of identifying your design.

The discussions above present just some of the possible threats that are out there, along with a selection of techniques that can be employed to secure your design.


Using ChipScope ILA


If you are new to FPGAs, one aspect of the development flow you may not have considered is how you will go about debugging your design once it has been loaded into the FPGA.

In order to set the scene, let’s first cast our minds back to the days before FPGAs and consider how we would debug a digital circuit board or system in the lab. One of the tools we would have employed would be a logic analyzer. (See: Turn Your iPad Into a Logic Analyzer!) First we would connect the analyzer’s probe leads to the signals of interest on the board. We might also specify certain trigger conditions upon which we desired the tool to commence storing data for subsequent display and analysis. Then we would run the system and try to work out what the heck was happening.

Logic analyzers are, of course, still employed today. When it comes to using one to debug an FPGA design, we typically start by creating a dedicated test header that will connect to the FPGA’s inputs and outputs (I/Os). One problem with this scheme is that there can be hundreds of thousands of signals inside the FPGA — a much greater number than there are I/Os on the device and signals you can break out to the test header. This means that you may have to keep on rebuilding your design to access the signals of interest and route them out to the test header.

In some cases, the physical construction of the unit in question means that test headers are of use only at the board level and not during system integration. Indeed, I am working on one such project at the time of this writing. Another problem is that many FPGA designs are I/O limited from the start, so dedicating a bunch of pins to observe what’s happening on internal signals may simply not be a feasible option.

And one further problem is that, inevitably, the logic analyzer you are using will also be required by one or more other project teams, which means you all have to agree on how you will allocate the analyzer resources. I cannot tell you how frustrating it is to be homing in on a problem when… suddenly… it’s time to disconnect one’s intricate probe setup and allow the analyzer to be wheeled away to someone else’s project.

One solution to this problem — a solution that has seen great advances over the last few years — has been the development of in-chip logic analyzers for use with FPGAs. The idea is to employ any unused programmable resources and on-chip memory blocks to implement one or more “virtual” logic analyzers.

As with their physical counterparts, these virtual logic analyzers — like ChipScope from Xilinx, Identify RTL Debugger from Synopsys, Reveal from Lattice Semiconductor, and SignalTap from Altera — can be set up so that they will only start collecting data after certain trigger conditions have been met. Engineers can use these analyzers to “peer” into the design as it operates, storing the resulting data in on-chip RAM, extracting the results over the JTAG port, and then displaying the results — more or less in real-time — on their screens.

Using virtual logic analyzers may remove the need for test headers. Sadly, however, in many cases they do not remove the need to rebuild the code. One big advantage of these in-chip logic analyzers is that they offer the ability to capture the values on wide internal buses and store these values in internal RAM. The big downside with this approach comes in designs that are already utilizing most of the device’s programmable resources, because this will limit any logic analyzer implementations.

Implementing ChipScope can be very quickly achieved within the ISE design flow. The simplest method is to first implement your design, but not to generate the *.bit file. Instead, open up Core Inserter under your Xilinx installation (in Windows, use Start > Xilinx > ChipScope [pro] > Core Inserter). Select the target technology and identify the output file of the synthesis (either *.ngc or *.edf depending upon the tool you used) and add an ICON controller and then the ILA block.


This is where you will connect the signals you wish to analyze. It is possible to have several ILA blocks per ICON if you wish to use different triggers or monitor different signals, etc. Once you’re happy with the connections you can insert the core, although — depending on the speed of your machine — this may take a little time. After the core has been inserted, you need to rerun the implementation stages and generate a *.bit file (ISE should show the stages needing to be re-run). Having configured the target device, you can then connect to the target over JTAG using the ChipScope Analyzer tool and trigger on the waveform of interest as illustrated in the screenshot below.

094817_343698
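
As an alternative to the Core Inserter flow described above, the ICON and ILA cores can also be generated with CORE Generator and instantiated directly in the HDL. The following is a minimal sketch of what such an instantiation looks like; the component names (“chipscope_icon”, “chipscope_ila”) and the 8-bit trigger port are assumptions for illustration, since they depend on how you configured the cores when you generated them, so treat this as a shape rather than drop-in code:

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Hypothetical wrapper showing direct ICON/ILA instantiation; the
-- component names and port widths depend on how the cores were generated
ENTITY debug_wrapper IS
  PORT (sys_clk         : IN std_logic;
        bus_of_interest : IN std_logic_vector(7 DOWNTO 0));
END ENTITY debug_wrapper;

ARCHITECTURE rtl OF debug_wrapper IS

  COMPONENT chipscope_icon
    PORT (control0 : INOUT std_logic_vector(35 DOWNTO 0));
  END COMPONENT;

  COMPONENT chipscope_ila
    PORT (control : INOUT std_logic_vector(35 DOWNTO 0);
          clk     : IN    std_logic;
          trig0   : IN    std_logic_vector(7 DOWNTO 0));
  END COMPONENT;

  -- 36-bit control bus connecting the ICON controller to the ILA
  SIGNAL ctrl : std_logic_vector(35 DOWNTO 0);

BEGIN

  icon_i : chipscope_icon
    PORT MAP (control0 => ctrl);

  ila_i : chipscope_ila
    PORT MAP (control => ctrl,
              clk     => sys_clk,
              trig0   => bus_of_interest);

END ARCHITECTURE rtl;
```

The HDL route makes the debug logic explicit in source control, but either way the downside is the same: changing the probed signals means rerunning synthesis and/or implementation.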

 

If you are interested in playing with this yourself, an example of the project referenced in this column — along with all the files needed to run it on the Avnet LX9 development board — can be found here.


Generating a VGA Test Pattern


In my original article, we discussed how we could use two counters — the pixel counter and the line counter — to generate the “H_Sync” (horizontal sync) and “V_Sync” (vertical sync) signals that are used to synchronize the VGA display. Now, in this article, we will consider how to also generate some RGB (red, green, and blue) signals to create an image on the display.

135537_551784

My Spartan 3A development board.

The first step was for me to retrieve my trusty Spartan 3A development board, which I had loaned to a friend at work. Once I had this board back in my hands, I started to ponder my implementation. Sadly my development board does not contain proper digital-to-analog converters (DACs) that can be driven by 8-bit wide red, green, and blue signals generated by the FPGA. Instead, it uses only four bits to represent each color, and it employs a simple resistor network to convert these digital outputs into corresponding analog voltages.

This means the color palette of my Spartan board is limited to four bits for the red channel, four bits for the blue channel, and four bits for the green channel, which equates to 2^4 x 2^4 x 2^4 = 4,096 colors. Although this 12-bit color scheme is admittedly somewhat limited, as we shall see it can still provide excellent results.

The next problem is the amount of memory required to hold the image. Once again, I had originally planned on storing an 800 x 600 pixel image in a frame buffer on the FPGA as described in Max’s article. Even with my limited color palette, however, just one frame would require 800 x 600 x 12-bits, which equals 5.76 megabits of RAM. This is more memory than is available in the FPGA on my development board.

As a “cheap-and-cheerful” alternative, I decided to generate a series of simple test patterns algorithmically. A high-level block diagram of my VGA test pattern generator is illustrated below:

135609_571640

High-level block diagram of my VGA test pattern generator.

First we have a “System Clock,” which is used to synchronize all of the activities inside the FPGA. The “VGA Timing” module comprises the pixel and line counters we discussed in my original article. In addition to generating the “H_Sync” and “V_Sync” signals that are used to synchronize the VGA display itself, this module also generates a number of other signals that are used to control the “VGA Video” module.

The “Algorithmic Test Pattern Generator” module is used to generate a series of simple test patterns. The “VGA Video” module takes these test patterns and presents them to the outside world in the form of the three 4-bit RGB signals that are presented to the DACs (or resistor networks, in the case of my development board).

Actually, I should note that in my real-world implementation, the “Algorithmic Test Pattern Generator” and “VGA Video” modules are one and the same thing, but it’s easier to think of them as being separate entities for the purposes of these discussions.

My implementation of this test pattern generator consumes only a small portion of the resources available on my Spartan FPGA. In fact, it requires just 96 slices out of the 5,888 slices that are available, which means it utilizes less than 2 percent of the chip’s total resources.

To be honest, I’m glad that the limitations of my development board forced me to take this intermediate step — that is, to create a test pattern generator. This is because a test pattern provides the simplest way to output images to prove that the backend display drivers are working correctly. Generating a test pattern (or a series of test patterns, in this case) is a good idea for a variety of reasons:

  • It allows the RGB color outputs to be verified to prove that they are functioning correctly. This can be achieved by displaying incremental bars where the color is gradually increased from 0 to its maximum value.
  • It allows the timing to be checked. Is the frame updating correctly? Are the borders correct? And so forth.
  • More advanced test patterns can be used to align the image with a camera viewfinder on systems that are used to capture real-world images.
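
To make the first bullet concrete, here is a minimal sketch (not my actual project code; the port names and the bit slice chosen are assumptions for illustration) of how incremental color bars can be derived directly from the pixel counter:

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

-- Hypothetical color-bar generator: bits 9:6 of the pixel counter divide
-- the line into 64-pixel-wide vertical bars; using the bar index as the
-- 4-bit red value gives a ramp from black toward full red across the
-- screen (assuming the visible region occupies counts 0 to 799)
ENTITY bar_pattern IS
  PORT (clk          : IN  std_logic;
        active_video : IN  std_logic;                 -- from the timing module
        pixel_count  : IN  unsigned(10 DOWNTO 0);     -- 0 to 1,055
        red          : OUT std_logic_vector(3 DOWNTO 0);
        green        : OUT std_logic_vector(3 DOWNTO 0);
        blue         : OUT std_logic_vector(3 DOWNTO 0));
END ENTITY bar_pattern;

ARCHITECTURE rtl OF bar_pattern IS
BEGIN
  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      IF active_video = '1' THEN
        red   <= std_logic_vector(pixel_count(9 DOWNTO 6));
        green <= (OTHERS => '0');
        blue  <= (OTHERS => '0');
      ELSE
        red   <= (OTHERS => '0');  -- outputs must be black (zero)
        green <= (OTHERS => '0');  -- outside the active display area
        blue  <= (OTHERS => '0');
      END IF;
    END IF;
  END PROCESS;
END ARCHITECTURE rtl;
```

Equivalent ramps for the green and blue channels can be generated the same way, allowing each DAC channel to be verified in isolation.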

As an aside, a famous television test pattern many people will recognize is the Indian-head test pattern. This was common in America until the early 1970s, at which time it was replaced by the SMPTE color bars.

If you wish to probe deeper into my design, click here to download a ZIP (compressed) version of my project file. As you will see, this design consists of one structural unit tying together two modules: the “VGA Timing” module and the “VGA Video” module (which includes the algorithmic test pattern generation code as noted above).

The “VGA Video” module outputs the RGB video signals during the active periods of the video display period, as can be seen in the results of the simulation shown in the following screenshot:

135624_999667

The results from my initial simulations.

Again, the values in the line and pixel counters in the “VGA Timing” module are used by the “VGA Video” module to determine positions on the screen and to decide when the RGB outputs need to be manipulated to achieve the desired result.


Generating VGA from an FPGA


Thanks to their nature, FPGAs are well suited to the intense levels of signal processing required by many imaging systems. Of course, one of the most rewarding aspects of image processing is seeing the resultant image on a display, and a very common form of display uses the VGA (video graphics array) standard.

The first VGA display was introduced with the IBM PS/2 line of computers in 1987. One thing most people associate with this form of display is the 15-pin D-subminiature VGA connector you tend to find on the back of a tower computer or the side of your notebook computer.

The original VGA standard supported a resolution of only 640×480 (which means 640 pixels in the horizontal plane and 480 lines in the vertical plane). Over the years, however, the standard has evolved to support a wide variety of resolutions, all the way up to widescreen resolutions as high as 1920×1080.

The act of driving a VGA display is surprisingly simple, being based on the use of two counters as follows:

  • Pixel counter: This counts the number of pixels in a line at the required pixel clock frequency (40MHz in this example) and is used to generate the horizontal timing.
  • Line counter: Also known as the frame counter, this repeats at the refresh rate of the desired VESA specification (60Hz, 75Hz, 85Hz, and so on) and also identifies when the count is within a valid region for outputting display data. The line counter is incremented each time the pixel counter reaches its terminal count.

These counters are used to generate two synchronization (sync) markers — the “V_Sync” (vertical sync) and “H_Sync” (horizontal sync) signals. In conjunction with the RGB (red, green, and blue) analog signals, “V_Sync” and “H_Sync” form the basic signals required to display video on a monitor.
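
A minimal sketch of these two counters, assuming a 40MHz pixel clock and the 800×600 at 60Hz totals of 1,056 pixels per line and 628 lines per frame (the entity and signal names are mine, for illustration only), might look like this:

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;

ENTITY vga_counters IS
  PORT (clk       : IN  std_logic;             -- 40MHz pixel clock
        pixel_cnt : OUT unsigned(10 DOWNTO 0); -- 0 to 1,055
        line_cnt  : OUT unsigned(9 DOWNTO 0)); -- 0 to 627
END ENTITY vga_counters;

ARCHITECTURE rtl OF vga_counters IS
  CONSTANT pixels_per_line : integer := 1056;  -- 800 visible + blanking
  CONSTANT lines_per_frame : integer := 628;   -- 600 visible + blanking
  SIGNAL pcnt : unsigned(10 DOWNTO 0) := (OTHERS => '0');
  SIGNAL lcnt : unsigned(9 DOWNTO 0)  := (OTHERS => '0');
BEGIN
  PROCESS (clk)
  BEGIN
    IF rising_edge(clk) THEN
      IF pcnt = pixels_per_line - 1 THEN   -- pixel counter's terminal count...
        pcnt <= (OTHERS => '0');
        IF lcnt = lines_per_frame - 1 THEN -- ...increments the line counter
          lcnt <= (OTHERS => '0');
        ELSE
          lcnt <= lcnt + 1;
        END IF;
      ELSE
        pcnt <= pcnt + 1;
      END IF;
    END IF;
  END PROCESS;
  pixel_cnt <= pcnt;
  line_cnt  <= lcnt;
END ARCHITECTURE rtl;
```

Decoding particular count values from these two counters yields the sync pulses and the active-video window discussed below.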

Actually, this may be a good time to take a step back to remind ourselves as to the origin of terms like “V_Sync” and “H_Sync.” The main thing to remember is that, at the time the original VGA standard was introduced, the predominant form of computer display was based on the cathode ray tube (CRT), in which an electron beam is used to “write” on a phosphorescent screen.

154412_242291

There are several ways in which an electron beam can be manipulated to create images on a CRT screen, but by far the most common technique is the raster scan. Using this approach, the electron beam commences in the upper-left corner of the screen and is guided across the screen to the right. The path the beam follows as it crosses the screen is referred to as a line. When the beam reaches the right-hand side of the screen it undergoes a process known as horizontal flyback, in which its intensity is reduced and it is caused to “fly back” across the screen. While the beam is flying back it is also pulled a little way down the screen as shown in the following illustration:

154420_504574
The beam is now used to form a second line, then a third, and so on until it reaches the bottom of the screen. The number of lines affects the resolution of the resulting picture (that is, the amount of detail that can be displayed). When the beam reaches the bottom right-hand corner of the screen it undergoes vertical flyback, in which its intensity is reduced, it “flies back” up the screen to return to its original position in the upper left-hand corner, and the whole process starts again.

The “V_Sync” and “H_Sync” signals are used to synchronize all of these activities. Thus, returning to our pixel and line counters, the values on these counters can be decoded so as to generate the required waveforms on the “V_Sync” and “H_Sync” outputs from an FPGA (that is, on the FPGA’s pins that are being used to drive the display’s “V_Sync” and “H_Sync” signals). Meanwhile, generating the RGB signals will require the FPGA to drive three digital-to-analog converters (DACs), one for each signal. As the design engineer, you must account for the latency through the DACs to ensure that their outputs are correctly aligned with respect to the “V_Sync” and “H_Sync” signals.

The line and pixel counters both have portions of their count sequences when no data is being output to the display. In the case of an 800×600 resolution display refreshing at 60Hz, for example, the vertical (line) counter will actually count 628 lines while the horizontal (pixel) counter will count 1,056 pixels.

Why should this be so? Well, returning to our raster scan, it takes a certain amount of time for the electron beam to undergo its horizontal and vertical flyback activities. One way to think about these times is that we have an actual display area that we see, and that this actual display area “lives” in a larger (virtual) display space that contains a border zone that we don’t see:

154433_655629
Of course, in the case of today’s flat-screen, liquid crystal displays (LCDs) and similar technologies, we don’t actually need to worry about things like horizontal and vertical flyback times. At least, we wouldn’t have to worry if it were not for the fact that we don’t actually know what type of screen our FPGA is driving. Thus, anything driving a VGA output generates the timing signals required to drive a CRT display, and other forms of display simply make allowances for the historical peculiarities associated with these VGA signals.

But we digress… Each of our counters has a collection of associated timing parameters. Vertical timings are referenced in terms of lines, while horizontal timings are referenced in terms of pixels. The following values are those associated with a display resolution of 800×600:

154451_091813
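
Expressed as VHDL constants for a declarative region, the standard VESA values for 800×600 at 60Hz are as follows (quoted here as a sketch; double-check them against the specification for your target monitor):

```vhdl
-- Horizontal timings, in pixels (standard VESA 800x600@60Hz values)
CONSTANT h_visible     : integer := 800;
CONSTANT h_front_porch : integer := 40;
CONSTANT h_sync_pulse  : integer := 128;
CONSTANT h_back_porch  : integer := 88;
CONSTANT h_total       : integer := 1056;  -- sum of the four values above

-- Vertical timings, in lines
CONSTANT v_visible     : integer := 600;
CONSTANT v_front_porch : integer := 1;
CONSTANT v_sync_pulse  : integer := 4;
CONSTANT v_back_porch  : integer := 23;
CONSTANT v_total       : integer := 628;   -- sum of the four values above
```

Note that the horizontal values sum to the 1,056-pixel line and the vertical values to the 628-line frame mentioned earlier.
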
Using this approach, it is very easy to generate a simple VGA interface and see the results of our image processing algorithms on a monitor. If you are interested, you can download a ZIP file containing the VHDL code for these counters along with a VHDL testbench by clicking here.


Increasing FPGA System Reliability


In this column, I will look at what we can do within the FPGA and at the hardware/system level to increase reliability.

Focusing on the FPGA implementation first, there are numerous ways the design can be corrupted, depending on the end environment. This corruption could be the result of a single-event upset (SEU), a single-event functional interrupt (SEFI), or even data corruption from a number of sources.

An SEU occurs when a data bit (register or memory) is hit by radiation and flips from a 0 to a 1, or vice versa. A SEFI occurs when a control register or other critical register suffers a bit flip that locks up the system. In the world of SRAM-based FPGAs, we tend to consider a SEFI to have occurred when one of the SRAM cells holding the device’s configuration flips and changes the design’s implementation. Data corruption can occur for a number of reasons, including EMI (electromagnetic interference) affecting the design in an industrial application.

How can we protect these systems and increase a unit’s MTBF? Depending on the end application, it may be acceptable simply to duplicate the logic — create two instantiations of the design within the same device — and to indicate an error if the results do not match. The higher-level system would be in charge of deciding what to do in the event of such an error.

The next thing we can do is to implement triple modular redundancy (TMR) within the device. At the simplest level, this instantiates the same design three times within the FPGA. A majority vote — two out of three — decides the result. (Even though this might sound simple, implementing it can become very complex very quickly.) If one instantiation of the design becomes corrupted, the error will be masked. Depending on the kind of error, the device may clear itself on the next calculation, or it may require reconfiguration.
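
At the heart of any TMR scheme is the two-out-of-three majority vote. For a single-bit result it can be sketched as follows (the names are mine; a production TMR flow also triplicates the voters themselves and protects clock, reset, and feedback paths):

```vhdl
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

ENTITY tmr_voter IS
  PORT (a, b, c : IN  std_logic;   -- the same output from three copies
        q       : OUT std_logic);  -- the majority-vote result
END ENTITY tmr_voter;

ARCHITECTURE rtl OF tmr_voter IS
BEGIN
  -- Any two agreeing copies win the vote, so a single corrupted
  -- copy of the design is masked from the downstream logic
  q <= (a AND b) OR (a AND c) OR (b AND c);
END ARCHITECTURE rtl;
```

One caveat, alluded to below: a synthesis tool will happily recognize that the three instantiations are identical and optimize two of them away, so the redundant logic must be preserved explicitly (for example, with KEEP or equivalent attributes).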

Implementing TMR can be performed by hand, which can be time-consuming, or using tools such as the TMRTool from Xilinx (this site’s sponsor) or the BL-TMR from Brigham Young University. If TMR is implemented correctly (and you have to be careful about synthesis optimizations), the design should mask all SEUs, as long as only one is present at any particular time.

Memory blocks inside the FPGA may also use error-correcting code (ECC) technology to detect and correct SEUs. However, to ensure you really have good data, you need to perform memory scrubbing. This involves accessing the memory when it is not being used for other purposes, reading out the data, checking the error detection and correction code, and (if necessary) writing back the corrected data. Common techniques here include Hamming codes that allow single-error correction and double-error detection.

This nicely leads us to the concept of scrubbing the entire FPGA. Depending on your end application, you might be able to simply reconfigure the FPGA each time before it is used. For example, a radar imaging system taking an image could reconfigure the FPGA between images to prevent corruption. If the FPGA’s performance is more mission-critical or uptime-critical, you can monitor the FPGA’s configuration by reading back the configuration data over the configuration interface. If any errors are detected, the entire device may be reconfigured, or partial reconfiguration may be used to target a specific portion of the design. Of course, all this requires a supervising device or system.

Of course, it will take significant analysis to determine how any of the methods mentioned thus far affects the MTBF. The complexity of this analysis will depend on the environment in which the system is intended to operate.

Working at the module level, we as engineers can take a number of steps to increase reliability. The first is to introduce redundancy, either within the module itself (e.g., extra processing chains) or by duplicating the module in its entirety.

If you are implementing redundancy, you have two options: hot and cold. Each has advantages and disadvantages, and implementing either option will be a system-level decision.

In the case of hot redundancy, both the prime and redundant devices (to keep things simple, I am assuming one-for-two redundancy) are powered up, with the redundant module configured ready to replace the prime should it fail. This has the advantage of a more or less seamless transition. However, since the redundant unit is operating alongside the prime, it is also aging and might fail.

In the case of cold redundancy, the prime unit is powered and operating while the redundant unit is powered down. This means the redundant module is not subject to as many aging stresses and, to a large extent, is essentially new when it is turned on. However, this comes at the expense of having some amount of down time if the prime module fails and the redundant module must be switched in.

With careful analysis of your system, you can identify the key drivers; i.e., the components that have a high failure rate and are hurting system reliability. The power supply is often such a driver, so it is frequently advisable to implement a redundant power supply architecture in which either supply can power the same electronics, often in a one-out-of-two setup.

If you are implementing redundancy at the data path, module, or system level, the number of data paths, modules, or systems you employ will impact the new failure rate. For example, a 12-for-8 system will give you a lower failure rate than a 10-for-8 system. Of course, redundancy comes at the expense of cost, size, weight, power consumption, and so forth. A very good interactive website for this analysis can be found by clicking here.
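
As a sketch of the underlying math, if we assume independent modules that each survive the mission with probability R, then a system fitted with n modules of which any k must work survives with probability:

```latex
R_{\text{sys}} = \sum_{i=k}^{n} \binom{n}{i} \, R^{\,i} \, (1-R)^{\,n-i}
```

Evaluating this sum for n = 12 versus n = 10 with k = 8 shows why fitting more spare modules yields the lower failure rate, and also quantifies exactly what each additional spare buys you.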

When implementing redundancy at either the system or module level, it is crucial that a fault in the prime module cannot prevent the redundant module from working, or vice versa. Fault propagation has to be considered, and the prime and redundant modules must be isolated from each other.
