One of the great things about the BRAM in Xilinx FPGAs is its ability to implement error correcting codes (ECC) on the data stored within. If you remember, we’ve looked at ECC in BRAM in a previous blog. The key point about the ECC is that only the output data word is corrected, not the corrupted word stored at the memory address. Additionally, while a single-bit error can be corrected, a double-bit error can only be detected, so the word is uncorrectable.
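To make the "output corrected, stored word untouched" behavior concrete, here is a minimal Python sketch of a single-error-correct, double-error-detect code over 64 data bits and 8 check bits. It uses a textbook extended Hamming layout, which is an assumption for illustration and not necessarily the bit layout the BRAM hardware uses. The names `encode`, `decode`, and `DATA_POS` are hypothetical.

```python
# Textbook extended Hamming (72,64) SECDED model. Illustrative only:
# the real BRAM ECC bit layout may differ.

# Codeword positions 1..71 that are not powers of two carry the 64 data bits.
DATA_POS = [p for p in range(1, 72) if p & (p - 1)]


def encode(data):
    """Return a 72-bit codeword for a 64-bit data word."""
    code = 0
    for i, p in enumerate(DATA_POS):
        if (data >> i) & 1:
            code |= 1 << p
    # XOR of the positions of all set data bits.
    syn = 0
    for p in DATA_POS:
        if (code >> p) & 1:
            syn ^= p
    # Set the Hamming check bits (positions 1, 2, 4, ... 64) so the
    # XOR of all set positions becomes zero.
    for k in range(7):
        if (syn >> k) & 1:
            code |= 1 << (1 << k)
    # Bit 0 is the overall parity bit: make the number of ones even.
    if bin(code).count("1") & 1:
        code |= 1
    return code


def decode(code):
    """Return (data, sbiterr, dbiterr) for a 72-bit codeword."""
    syn = 0
    for p in range(1, 72):
        if (code >> p) & 1:
            syn ^= p
    parity_odd = bin(code).count("1") & 1
    sbiterr = dbiterr = False
    if parity_odd:
        # Single-bit error: the syndrome is the failing position
        # (0 means the overall parity bit itself flipped).
        sbiterr = True
        code ^= 1 << syn
    elif syn:
        # Even parity but non-zero syndrome: two bits flipped, uncorrectable.
        dbiterr = True
    data = 0
    for i, p in enumerate(DATA_POS):
        if (code >> p) & 1:
            data |= 1 << i
    return data, sbiterr, dbiterr


word = 0xDEADBEEFCAFEF00D
stored = encode(word) ^ (1 << 17)          # one upset in the stored codeword
data, sbiterr, dbiterr = decode(stored)    # the output is corrected...
still_corrupt = stored != encode(word)     # ...but the stored word is not
```

Note that `decode` never modifies `stored`. That is exactly the gap a scrubber closes by writing the corrected data back.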
To address this, we can run what is called a scrubbing algorithm when the memory is not being accessed. This algorithm sequentially reads through all addresses in the memory, checks whether the output data has been corrected, and if so, writes the corrected data back. This clears any single-bit errors that have accumulated within the memory before a second upset in the same word can make it uncorrectable. Depending on the size of the memory and the access patterns, we can determine the time required for the whole memory to be scanned and any errors corrected. When it comes to high-reliability systems, this calculation can be a critical parameter used as part of the certification process.
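As a rough illustration of that calculation, the sketch below estimates the time for one full scrub pass. The function name `scrub_pass_time` and its parameters are hypothetical. In particular, `cycles_per_address` depends on the read latency and on how the scrubbing state machine is written, and `scrub_fraction` is an assumed share of time in which the user is not accessing the memory.

```python
def scrub_pass_time(depth, clk_hz, cycles_per_address, scrub_fraction=1.0):
    """Estimate the seconds needed to scrub every address once.

    depth              -- number of words in the memory
    clk_hz             -- memory clock frequency in Hz
    cycles_per_address -- assumed clock cycles spent per address
                          (read, check, and write-back if needed)
    scrub_fraction     -- assumed fraction of time the scrubber owns the memory
    """
    if not 0.0 < scrub_fraction <= 1.0:
        raise ValueError("scrub_fraction must be in (0, 1]")
    return depth * cycles_per_address / clk_hz / scrub_fraction


# Example numbers (assumptions): 1024 words, 100 MHz clock,
# 3 cycles per address, scrubber enabled 25% of the time.
t_full = scrub_pass_time(1024, 100e6, 3)          # about 30.72 microseconds
t_shared = scrub_pass_time(1024, 100e6, 3, 0.25)  # about 122.88 microseconds
```

Using the worst-case cycle count per address (a write-back on every word) gives an upper bound on the pass time.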
In this blog I thought it would be a good idea to create a simple scrubber for Xilinx devices and the XPM memory macro. Using this macro, we can configure the memory to be implemented in either BRAM or URAM. We can also change the memory depth, the read latency, and the memory initialization. Since we are using ECC, the data width must be set to 64 bits. If we want to use ECC on BRAM, we’ll need to use a simple dual-port configuration as seen in the diagram below from the UltraScale Memory Resources Guide. There is one ECC encoder on port A and one ECC decoder on port B.
If we want to target URAM resources, we can configure the memory as either a simple dual-port memory or true dual-port memory because full ECC decoders and encoders are available on both ports. Since I want the ability to convert between BRAM and URAM in the scrubber for commonality, the design will be based around a simple dual-port approach.
The scrubber should ideally support a range of memory configurations via generics while still giving the user access to the memory. The user controls the scrubbing, enabling it whenever the memory is not required. Should the user need access, the scrubber will pause and later resume from the address at which it was paused. Upon reaching the maximum address, the scrubber will start again from the base address.
To control the scrubbing action, an enable_scrubbing signal will be provided along with an output signal indicating whether the scrubbing state machine is currently active. To start the scrubbing, the enable_scrubbing signal is asserted; to pause it, the signal is de-asserted. User access, however, should not take place until the output status signal shows that the scrubbing state machine is no longer active.
Since statistics can be useful for understanding the behavior of the system, the module will output the cumulative number of errors corrected. It will also be possible to inject single and double-bit errors into the memory. If a double-bit error is detected, the user will be informed of the error.
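The behavior described above can be sketched as a small Python model. This is not the module's VHDL, only an illustration under simplifying assumptions: `EccMemoryModel` is a hypothetical stand-in that tracks how many bits are flipped at each address rather than implementing a real code, and `Scrubber.step` treats one call as one complete read, check, and write-back. Only `enable_scrubbing` is a name taken from the design; everything else is invented for the example.

```python
class EccMemoryModel:
    """Abstract ECC memory: stores data plus the number of flipped bits per word."""

    def __init__(self, depth):
        self.data = [0] * depth
        self.flips = [0] * depth  # injected bit errors held at each address

    def write(self, addr, value, inject_sbiterr=False, inject_dbiterr=False):
        self.data[addr] = value
        self.flips[addr] = 2 if inject_dbiterr else 1 if inject_sbiterr else 0

    def read(self, addr):
        """Return (data, sbiterr, dbiterr); the stored word is never repaired here.

        Simplification: real hardware would return corrupted data on a
        double-bit error; this model only raises the flag.
        """
        n = self.flips[addr]
        return self.data[addr], n == 1, n >= 2


class Scrubber:
    """Sequential scrubber that pauses and resumes under enable_scrubbing."""

    def __init__(self, mem):
        self.mem = mem
        self.addr = 0              # next address to scrub; kept while paused
        self.corrected = 0         # cumulative single-bit errors corrected
        self.dbiterr_seen = False  # sticky flag for uncorrectable errors

    def step(self, enable_scrubbing):
        """Scrub one address if enabled; return True when the scrubber was active."""
        if not enable_scrubbing:
            return False
        data, sbiterr, dbiterr = self.mem.read(self.addr)
        if sbiterr:
            self.mem.write(self.addr, data)  # write the corrected word back
            self.corrected += 1
        elif dbiterr:
            self.dbiterr_seen = True         # cannot repair, only report
        # Wrap to the base address after the maximum address.
        self.addr = (self.addr + 1) % len(self.mem.data)
        return True


mem = EccMemoryModel(depth=8)
for a in range(8):
    mem.write(a, a * 0x1111, inject_sbiterr=(a in (2, 5)))
scrubber = Scrubber(mem)
for _ in range(8):                 # one full pass with scrubbing enabled
    scrubber.step(enable_scrubbing=True)
```

After the first pass, `scrubber.corrected` counts the two injected errors, and a second pass finds nothing further to correct because the write-backs cleared them.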
As shown below, the architecture for the module is simple.
The application code and a simple test bench can be found here on my GitHub.
When running through the test bench, you can see that the initial user accesses, writing and then reading data, are uncorrupted. Next, a series of writes are corrupted using error injection, and reading out the data shows that the corrupted addresses have a single-bit error. To ensure that high-performance accesses can be maintained, there is no attempt at correcting the BRAM contents during these user reads.
Initial User Memory Access Write and Read
Insertion of Errors and Reporting of Errors on Read Out
Scrubbing Errors Detected and Corrected
With the user accesses completed, the scrubbing is started and you can see the errors being corrected on the first pass. On the second scrubbing pass, the errors no longer occur because the corrected data has been written back.
In a future blog, I will put the code in the ZCU106 and demonstrate it working with the BRAM and UltraRAM instantiations.
I have tested the basic configuration in simulation with a variety of memory sizes and latencies, targeting both BRAM and URAM.
For now, we have a simple BRAM/URAM scrubber which we can use with the ECC protection provided by the AMD-Xilinx memory resources to ensure high-quality data integrity in our memories.