One thing that is always important for engineers, is the need for us to deliver our projects on quality, schedule and budget. When it comes to developing embedded systems there are a number of lessons, learnt by embedded system developers over the years which can be used to ensure your embedded system achieves these. Let us explore some of the most important lessons learned in developing these.
Slides presented at ESC Minn 2015 on aspects to consider when creating a embedded system.
Completing the RTL design is one part of getting
your FPGA design production-ready.
The next challenge is to ensure the design
meets its timing and performance requirements
in the silicon. To do this, you will often need to
define both timing and placement constraints.
Let’s take a look at how to create and use both
of these types of constraints when designing systems
around Xilinx® FPGAs and SoCs
The Advanced Encryption Standard (AES) has become an increasingly popular cryptographic specification in many applications, including those within embedded systems. Since the National Institute of Standards and Technology (NIST) selected the speci- cation as a standard in 2002, developers of processor, microcontroller, FPGA and SoC applications have turned to AES to secure data entering, leaving and residing within their systems. The algorithm is described very efficiently at a higher abstraction level, as is used in traditional software development; but because of the operations involved, it is most efficiently implemented in an FPGA. Indeed, developers can even get some operations “for free” in the routing. For those reasons, AES is an excellent example of how developers can benefit from the Xilinx® SDSoC™ development environment by describing the algorithm in C and then accelerating the implementation in hardware. In this article we will do just that, first gaining familiarity with the AES algorithm and then implementing AES256 (256-bit key length) on the processing system (PS) side of a Xilinx Zynq®-7000 All Programmable SoC to establish a baseline of software performance before accelerating it in the onchip programmable logic (PL). To gain a thorough understanding of the benefits to be gained, we will perform the steps in all three operating systems the
SDSoC environment supports: Linux, FreeRTOS and BareMetal
Pretty much every embedded system / FPGA design has to interface to the real world through sensors or external interfaces. Some systems require large volumes of data to be moved around very quickly, in which case high-speed communications interfaces like PCI-X, Giga Bit Ethernet, USB, Fire/ SpaceWire, or those featuring multi-gigabit transceivers may be employed.
However, many embedded systems also need to interface to slower interfaces for sensors, memories and other peripherals these systems can employ one or more of the simpler communications protocols. The four simplest, and therefore most commonly used, protocols are as follows.
- UART (Universal Asynchronous Receiver Transmitter): This comprises a number of standards defined by the Electronic Industry Association (EIA), the most popular being the RS-232, RS-422, and RS-485 interfaces. These standards are often used for inter-module communication (that is, the transfer of data and supervisory control between different modules forming the system) as opposed to between the FPGA and peripherals on the same board, although I am sure there are plenty of applications that do this also. These standards defined are a mixture of point-to-point and multi-drop buses.
- SPI (Serial Peripheral Interface): This is a full-duplex, serial, four-wire interface that was originally developed by Motorola, but which has developed into a de facto standard. This standard is commonly used for intra-module communication (that is, transferring data between peripherals and the FPGA within the same system module). Often used for memory devices, analog-to-digital converters (ADCs), CODECs, and MultiMediaCard (MMC) and Secure Digital (SD) memory cards, the system architecture of this interface consists of a single master device and multiple slave devices.
- I2C (Inter-Integrated Circuit): This is a multi-master, two-wire serial bus that was developed by Phillips in the early 1980s with a similar purpose as SPI. Due to the two-wire nature of this interface, communications are only possible in half-duplex mode.
- Parallel: Perhaps the simplest method of transferring data between an FPGA and an on-board peripheral, this supports half-duplex communications between the master and the slave. Depending upon the width of data to be transferred, coupled with the addressable range, a parallel interface may be small and simple or large and complex.
The arty board provides four PMOD outputs (JA through JD) along with the Arduino / ChipKit shield connector through which we can interface with peripherals using these interfaces. Over the next few weeks I am going to interface the PModDA4 (SPI) and the PModAD2 (I2C) to the MicroBlaze System that we have created.
The first step is to generate the hardware build within Vivado such that we can interface to the PMODs. We can add in a AXI SPI controller from the IP library and configure it to be a standard SPI driver and not a dual or quad (more on those in future blogs). We can also add in the AXI IIC (remember to search for iic not i2c) controller module and connect it up to the AXI bus, do not forget to add both interrupts to the interrupt controller.
Once both controllers are within the design the next step is to customise for application at hand, with these complete we can assign the i2c and SPI pins to the correct external ports for the PMOD desired.
All that remains then is to build the hardware and write the software, it sounds so easy when we say it quickly.
Sometimes despite the simulation we have performed upon our design we still have issues with the design on the hardware. One way we can debug in the hardware is to use a ILA within our hardware design, this gives us the ability to monitor either a number of AXI interfaces or discrete inputs.
Adding in an AXI monitor on the AXI interface to the XADC we are currently using in our example we enable us to monitor the transactions. In this case just such that we can examine them however in other instances it may be necessary due to design or integration issues.
We can add in the ILA into our design from the Integration library in our block diagram.
Insertion of the ILA into the Design
By default, the ILA is configured to monitor a AXI interface however, we can change its settings by double clicking on the ILA block and customising as required. We can add in trigger ports, capture control and comparators to trigger on specific patterns in the data stream. For this example, I am going to keep it fairly simple to show the flow and then we can look at more advanced aspects once this is understood.
Customising the ILA
With the ILA inserted into the design the next stage is to reset the generated outputs and then re generate these such that we can implement the design. Once the design is implemented we can then programme the device using the hardware manager, it is via the hardware manager that we can also examine the contents of the interfaces we wished to examine.
Using the hardware manager, we should connect to our Arty target, the next step is to programme the FPGA and initialise the logic analyser. To achieve this we need both the bit file and the definition of the debug nets we wanted to examine in the Vivado block diagram, you will see this as the ltx file within the projects runs / implementation directory beside the bit file.
Identifying the ltx file
Configuring the device and debug profile.
Once these files have been configured we will see the screen below open up in the hardware manager which provides the interface we can use to configure the triggering, capture mode and arm the ILA core.
Once we have configured the core as we wish, we can examine the waveform in the waveform viewer below.
ILA Configuration and control scheme in Hardware Manager
With the simple flow explained we can, in the next blog look at how we can use more advanced features of the ILA.
Happy New Year! For the first blog of the year I thought we would combine the FreeRTOS and the XADC examples we had bbeen looking at previously.
This will enable me to show how we can do the following
- Configure the XADC
- Create a task to read from the XADC
- Create a second task to take action on the results from the first task
The intended functionality is the one task, once a second reads the XADC internal parameters and stores them within a array. This array is then communicated via a queue to the recieivng task which processes the results, for this example it just outputs them over the UART. If we wanted to this task could perform more detailed analysis or calcualtion on the results being provided to it.
We can easily modify the hello world example to perform this. If we wish to make it more complex and introduce more tasks which share resources, we must ensure they are properly managed and do not become deadlocked.
The first thing we need to do is include the proper header files such that we can use the API for the XADC we do this by including the “XSYSMON.H”.
Within the main function we are going to configure and initialise the XADC before we start the two tasks.
As we are going to be using the printf function and we are going to transfering data we need to ensure the stack size is correctly allocated. To prevent problems I decided to increase this to ensure there was sufficent for that requred for both tasks, when we create the tasks in the main() we are required to define the stack allocated to each task. For both tasks I decided to allocate 1000 bytes, I left the task pirorites as the receiving task being a higher priortiy than the transmitting task ensuring the data is transmitted as soon as it is received.
The next step was to create the queue, as I mentioned above due to the priorities of the RX and TX tasks there will only be one element in the queue, which will be the size of the size element XADC_Buf.
The final element is to start the scheduler and let the tasks run for ever well once we have written them.
We write each task as a we would any function in C, being careful to ensure we include the 1 second dealy within the transmitt task.
When I ran the code (which is available here) I got the following results.
Over the last few blogs we have written software which runs on the MicroBlaze and reads the XADC, to do this we have not used an operating system instead using a bare metal approach. However as we wish to develop more complex systems, we need to introduce an operating system, so over the next few blogs we will be looking at how we can do just that.
However first I think it is a good idea to talk a little about operating systems and real time operating systems in particular. What differentiates an RTOS from a generic operating system? An RTOS is deterministic, that means the response of the system will meet a defined deadline.
But does the system have to always meet these deadlines to be classed a real-time system?
Actually, no it does not.
There are three RTOS categories that address deadlines differently:
Hard RTOS – Missing a deadline is classified as a system failure.
Firm RTOS – Occasionally missing a deadline is acceptable and is not classified as a failure.
Soft RTOS – Missing a deadline simply reduces the usefulness of the results.
An RTOS operates around the concept of running tasks (sometimes called processes). Each of these tasks performs a required system function. For example, a task might read data from an interface or perform a calculation. A very simple real-time system may use just one task, but it is more likely that multiple tasks will be running on the processor at any one time. Switching between these tasks is referred to as “context switching” and requires that the RTOS saves the processor state for each task before the context switch starts the next task. The RTOS saves this processor state on a task stack.
Determining which task to run next is controlled by the RTOS kernel and this decision can be complicated—especially if we want to avoid deadlock where tasks lock each other out—but the two basic decision methods are:
Time sharing – Each task gets a dedicated time slot on the processor. Tasks with higher priority can have multiple time slots. Time slicing is controlled via a regular interrupt or timer. This method is often called Round Robin scheduling.
Event Driven – Tasks are only switched when a task finishes or when a higher priority task must be run. This method is often called pre-emptive scheduling
When two or more tasks want to share a resource— the XADC for example—it is possible that the tasks might request the resource at the same time. Resource access needs to be controlled to prevent contention and this is one of the operating system’s most important duties. Without the correct resource management, deadlock or starvation might occur.
Here are the definitions we’ll use for deadlock and starvation:
Deadlock – Occurs when a task holds a resource, cannot release it until the task completes, and is currently unable to complete because it requires another resource currently held by another task. If that second task requires a resource held by the first task, the system will never exit this deadlocked state. Deadlock is a bad situation for an RTOS to find itself in.
Starvation – Occurs when a task cannot run because the resources it needs are always allocated to another task. The task starves because of a lack of resources.
As you can imagine, much has been written on the subjects of deadlock and starvation over the years and there are many proposed solutions. For example, there’s Dekker’s algorithm, which was the first known correct solution to mutual exclusion. It is a shared-memory mechanism that does not require a special “test and set” instruction (but is therefore limited to managing two competing tasks) and is attributed to the Dutch mathematician Theodorus Dekker. The most commonly used method to handle deadlock is the use of semaphores, which commonly come into two types: binary semaphores and counting semaphores. A binary semaphore controls access to one resource—for example a hardware resource. Counting semaphores control access to a pool of identical, interchangeable resources such as memory buffers.
Typically each resource has a binary semaphore allocated to it. A requesting task will wait for the resource to become available before executing and once the task completes, it releases the resource. Binary semaphores commonly use WAIT and SIGNAL operations. A task will WAIT on a semaphore. When the resource is free, which could be immediately (or not), the operating system will give control of the resource to the task. When the task completes, it will SIGNAL completion and free the resource. However, if the resource is occupied when the task WAITs on the semaphore, the operating system suspends that task until the resource is free. The WAITing task might have to wait until the the currently executing task is finished with the resource or the WAIT might take longer if it is pre-empted by a higher priority task.
Introducing the concept of task priority also brings up the problem of priority inversion. There is a more flexible class of binary semaphores called mutex’s (the word “mutex” is an abbreviation for “mutual exclusion”) and these are often used by modern operating systems to prevent priority inversion.
Counting semaphores work in the same way as binary semaphores however they are used when more than one resource is available—data stores for instance. As each of the resources is allocated to requesting tasks, the count is reduced to show the number of free resources remaining. When the semaphore count reaches zero, there are no more resources available and any processes requesting one or more of these resources after the count reaches zero will be suspended until the requisite number of resources is released.
Tasks often need to communicate with each other and there are a number of methods to accomplish this. The simplest method is to use a data store managed with semaphores as described above. More complex communication methods include message queues.
When using message queues, a task that wishes to send information to another task POSTs a message to the queue. When a task wishes to receive a message from a queue, it PENDs on the queue. Message queues therefore work like FIFOs
Over the next few blogs we will look at using FreeRTOS and Micrium uc/OSiii
This is the last Arty blog of 2015 , so have a Merry Christmas and and a Happy New Year.