Fast USB connection on the Nexys Video using FT2232H

Published on Sat 25 June 2016 in VHDL

Over the past few years, I have developed a strong interest in logic circuit design and VHDL coding, in addition to computer programming and Artificial Intelligence. In this blog post, I present a small project that exchanges data between a Digilent Nexys Video FPGA board and a computer over a USB 2.0 link. Instead of implementing a whole USB stack or relying on a slow RS-232 connection, this project uses the FT2232H chip available on the Nexys Video to exchange data at more than 14 MiB/s.

The board

Close-up of the Artix 7 200T FPGA on the Nexys Video

First, a bit of introduction. The Nexys Video is a development board featuring a Xilinx Artix 7 200T FPGA, the largest Artix FPGA (and therefore the largest not-too-expensive Xilinx FPGA). The FT2232H chip is available on many boards (especially from Digilent, but possibly also from other brands), which makes this project quite general. Moreover, I never use any Digilent-specific tool or peripheral in this blog post.

Plenty of other FPGA development boards exist, some of them much less expensive, but I needed the large 200T FPGA for a personal project. The Nexys Video also has very interesting features, like an HDMI sink (for video acquisition), Gigabit Ethernet (so you can acquire video and stream it over the network if you want), DDR3 memory and a good number of controllers (Ethernet MAC, audio chip, and a microcontroller that allows the FPGA to be programmed from a USB thumb drive or an SD card, in addition to JTAG-over-USB and QSPI).

The FT2232H chip

The FT2232H chip allows a device to send and receive data over a USB connection in many different ways. The chip is highly configurable, either by writing values to a small EEPROM or by configuring it directly over the USB connection. It is most often used as a UART-to-USB bridge that allows microcontrollers and FPGAs to communicate very easily with a computer. However, serial communication with the FT2232H is quite slow, with a top speed of around 1 MiB/s when special care is taken.

Fortunately, the FT2232H also has a very high-speed mode that happens to be quite simple to use. The FPGA or microcontroller communicates with the chip using an 8-bit bus, and does so at 60 MHz using a clock produced by the FTDI chip (Synchronous FIFO mode). It is also possible to have the device provide the timing to the FTDI chip (Asynchronous FIFO mode, in which the FTDI chip samples data on the falling edges of specific signals), but this is a bit slower as timing is more conservative in this mode.

Enabling Synchronous FIFO mode is a bit difficult as it is a two-step process. First, the FTDI chip must be configured in Asynchronous FIFO mode, using either a special USB command or the EEPROM. Fortunately, the Digilent Nexys Video board configures its FTDI chip in Asynchronous FIFO mode by default. Synchronous mode must be enabled afterwards, and this can only be done with a USB command.

Libftdi

FTDI chips present themselves on the USB bus as serial (modem-like) devices. They use a vendor-specific protocol rather than the USB CDC class, and Linux handles them with the ftdi_sio kernel driver. On Linux, the FT2232H chip exposes two devices, /dev/ttyUSB0 and /dev/ttyUSB1, one for each channel (the chip allows two devices to be connected to it: on the Nexys Video, a JTAG programmer and a general-purpose connection). On Windows, two COM ports (COM1 and COM2, for instance) are available once a driver is installed.

Opening /dev/ttyUSB0 and writing to it lets you communicate with the device connected to the first port of the FT2232H chip using its default mode, Asynchronous FIFO in the case of the Digilent Nexys Video. However, there is no way to enable Synchronous FIFO mode using only this ttyUSB0 device: we must use libftdi.

libftdi is a pure user-space library, built on libusb, that can detect FTDI chips, configure their ports, and send and receive data. This library is very easy to use but has one major drawback: it detaches the kernel serial driver from the chip. This means that once libftdi takes over the device, the /dev/ttyUSB devices disappear and cannot be used anymore. All communication with the chip then needs to go through libftdi.

Configuring the FT2232H chip

In this section, I present a very small program that opens an FT2232H USB device, configures it in Synchronous FIFO mode, receives a whole bunch of data (for benchmarking purposes), then allows the user to type numbers that will be sent to the device. This is not very useful but is a nice experiment, and I used it to turn LEDs on or off on the Nexys Video.

First, some includes, that require that libftdi1-devel or libftdi-devel is installed. This library is available in all major Linux distributions and Mac OS ports. It is also available on Windows but I don't have any Windows machine on which to test that.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <ftdi.h>

Opening an FTDI device is quite easy and requires only a couple of function calls. The ftdi variable points to a context in which all these functions store information about the device being used:

int ret;
struct ftdi_context *ftdi;

// Initialize FTDI library
ftdi = ftdi_new();

if (!ftdi) {
    fprintf(stderr, "Cannot initialize libftdi\n");
    return EXIT_FAILURE;
}

// Open device. You will have to change 0x6010 !
ret = ftdi_usb_open(ftdi, 0x0403, 0x6010);

if (ret < 0)
{
    fprintf(
        stderr,
        "Cannot open FTDI device 0x0403:0x6010: %d (%s)\n",
        ret,
        ftdi_get_error_string(ftdi));

    ftdi_free(ftdi);
    return EXIT_FAILURE;
}

The USB vendor and product IDs can be found using lsusb. Find the line that mentions your FTDI device, and look for a pair of numbers that looks like 0403:6010 or 0403:6001: the first is the vendor ID, the second the product ID. Each FTDI chip model has a different product ID, and some board manufacturers replace these IDs with their own (if they have their own driver, for instance).
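
If you prefer not to hard-code the IDs, the pair printed by lsusb can be split into two numbers at run time. The sketch below is only an illustration: parse_usb_id is a hypothetical helper of mine, not a libftdi function, and it assumes the identifier is written exactly as two hexadecimal fields separated by a colon.

```c
#include <assert.h>
#include <stdio.h>

/* Parse an identifier such as "0403:6010", as printed by lsusb, into its
 * vendor and product IDs. Returns 0 on success, -1 if the text is malformed.
 * parse_usb_id is a hypothetical helper, not part of libftdi. */
int parse_usb_id(const char *text, unsigned int *vendor, unsigned int *product)
{
    unsigned int v, p;
    char trailing;

    assert(text != NULL && vendor != NULL && product != NULL);

    /* Two hexadecimal fields of at most 4 digits, separated by ':'.
     * The final %c only matches if unexpected characters follow. */
    if (sscanf(text, "%4x:%4x%c", &v, &p, &trailing) != 2) {
        return -1;
    }

    *vendor = v;
    *product = p;
    return 0;
}
```

The two values can then be passed to ftdi_usb_open in place of the 0x0403 and 0x6010 constants used above.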

Now that the device is open, we can configure it in Synchronous FIFO mode. Because the Digilent board is already in Asynchronous FIFO mode, this is very simple: a single call that sets the new mode is enough. On other boards, more libftdi calls may be required (one more, I think, the one that puts the chip in Asynchronous FIFO mode). I invite you to consult the very nice libftdi documentation for that.

ret = ftdi_set_bitmode(ftdi, 0xFF, BITMODE_SYNCFF);

if (ret != 0) {
    // Report why the mode change failed, with a trailing newline
    fprintf(stderr, "Failed to set mode to Synchronous FIFO: %s\n",
            ftdi_get_error_string(ftdi));
}

Sending and receiving data

When libftdi takes over the device, the kernel serial driver is detached and the /dev/ttyUSB devices disappear. This means that they cannot be used anymore to easily exchange data with the device, by using cat or dd for instance. From now on, all communication with the device must go through libftdi. Fortunately, the library exposes file-like functions that are quite simple to use. Here is how I benchmark the read speed of the device:

// Read 32 MiB of data in 1024-byte chunks (assumes every read fills buf)
unsigned char buf[1024];

for (int i=0; i<32*1024; ++i) {
    ftdi_read_data(ftdi, buf, sizeof(buf));
}

for (int i=0; i<1024; ++i) {
    printf("%i ", (int)buf[i]);
}

The first for loop reads 32 MiB of data. The second one displays the last 1024 bytes as integers, so that I can check that the data is as expected (see the next section for how to generate this data on the FPGA). In my experiments, data is read at around 14 MiB/s, which means that the FTDI chip uses roughly 112 Mbps of the 480 Mbps USB connection. This is not very fast, but still much faster than RS-232. The only way to go faster is to use the Gigabit Ethernet port of the board, but this is much more complex to set up (on the FPGA side). By the way, I highly recommend visiting hamsterworks.co.nz if you don't know this website yet: it is very interesting, especially if you own a Nexys Video or Arty FPGA board.
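
To make the throughput arithmetic explicit, here is a small sketch of the conversions involved. The function names are mine and purely illustrative. Note that counting a MiB as 2^20 bytes and a megabit as 10^6 bits gives a slightly higher figure than the quick 14 × 8 estimate, but the conclusion is the same: about a quarter of the USB 2.0 signalling rate is used.

```c
#include <assert.h>

/* Convert a byte count and a duration into MiB/s (1 MiB = 2^20 bytes). */
double mib_per_second(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / (1024.0 * 1024.0) / seconds;
}

/* Convert MiB/s into megabits per second (1 Mbit = 10^6 bits). */
double mib_to_mbit_per_second(double mib_s)
{
    return mib_s * 1024.0 * 1024.0 * 8.0 / 1e6;
}

/* Fraction of the 480 Mbit/s USB 2.0 signalling rate that is used. */
double usb2_utilisation(double mib_s)
{
    return mib_to_mbit_per_second(mib_s) / 480.0;
}
```

In practice, the byte count would be the sum of the values returned by the read calls, and the duration would come from a timer started just before the read loop.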

In my project, I allow the user to send 10 bytes of data to the FPGA, which uses these bytes to turn LEDs on or off:

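unsigned int v;

for (int i=0; i<10; ++i) {
    printf("8-bit value to send to the LEDs (hexadecimal): ");

    if (scanf("%x", &v) != 1) {
        break;      // Invalid input or end of file
    }

    // Send only the low 8 bits, whatever the endianness of the host
    unsigned char byte = (unsigned char)(v & 0xFF);
    ftdi_write_data(ftdi, &byte, 1);
}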

That's it for the computer part. Now, the FPGA must be able to communicate with the FTDI chip.

The FPGA

The FPGA side of the project is simple and complex at the same time. Simple as the FTDI chip is very easy to use (parallel port, the clock is provided, not too many signals to handle, easy timing). Complex because getting accurate timing information is quite challenging: the FTDI FT2232H datasheet lists timing information on page 41, but the way it is presented is quite misleading.

On page 41, we see (when writing) that the clock period of the FTDI chip is 16.67 ns, for a clock frequency of 60 MHz. We also see that the data that we want to write must be on the parallel bus two clock cycles after the chip told us that it accepts data. However, the timing table tells us that the setup time for the data is 16.67 ns, a full clock cycle, and that no hold time is needed.

This is quite an uncommon way of expressing timing. However, we can understand this table and diagram in another way: on clock cycle N, the FPGA can sample TXE, the signal that tells us that data can be sent to the chip. On clock cycle N+1, the FPGA must place the data on the parallel bus and assert WR (drive it low, as it is active-low), the signal that tells the FTDI chip that data has to be written. On clock cycle N+2, the data can be removed from the bus. Seen from clock cycle N+2, we have a one-clock-cycle setup time and no hold time.

From what I can see in the datasheet, it appears that the FTDI chip samples and produces signals on falling clock edges, while the FPGA does that on rising edges. This seems very natural but requires that we are careful: the data that the FPGA produces will be sampled 8.335 ns after the rising clock edge, not 16.67 ns. So, don't forget to add an output delay constraint in order to ensure that the FPGA produces data on time.

Implementation

The datasheet coupled with the above paragraphs should allow you to understand a simple example. In this example, the FPGA sends data whenever possible, producing an ever-repeating sequence of numbers from 0 to 255. This makes it easy to check on a computer that no byte is missing or duplicated. When data arrives, the FPGA copies it to the 8 LEDs available on the Nexys Video.
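
On the computer side, checking the repeating 0 to 255 sequence can be automated instead of reading the printed numbers by eye. The function below is a minimal sketch of such a check. Its name is hypothetical, and it takes the starting value from the first byte of the buffer because a capture may begin anywhere in the sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Check that buf holds consecutive byte values that wrap from 255 to 0.
 * Returns the index of the first byte that breaks the sequence, or len
 * when the whole buffer is consistent. */
size_t first_sequence_error(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    for (size_t i = 1; i < len; ++i) {
        /* The cast makes 255 + 1 wrap around to 0 */
        unsigned char expected = (unsigned char)(buf[i - 1] + 1);

        if (buf[i] != expected) {
            return i;
        }
    }

    return len;
}
```

A missing byte and a duplicated byte both show up as a break in the sequence. A loss of exactly 256 consecutive bytes would go unnoticed, which is a limitation of the 8-bit counter rather than of the check.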

The interface of the VHDL module is quite simple. The FTDI chip provides the 60 MHz clock, an input/output parallel bus, and a number of signals. These signals indicate when data is available or when data can be written. The FPGA can say whether it wants to read or write, and what the direction of the parallel bus is. There is also a very nice signal that allows the FPGA to send a wake-up USB packet, so that your FPGA can wake up your computer if need be.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ftdi_serial is
    port (
        -- Clock from the FTDI chip
        prog_clko : in std_logic;

        -- Wired to a switch in my case. Resets
        -- are usually not used in FPGAs.
        reset : in std_logic;

        -- Data sent to the LEDs and received from the
        -- FTDI chip
        led : out std_logic_vector(7 downto 0);
        prog_d : inout std_logic_vector(7 downto 0);

        -- Signals, OEN gives the direction of the bus,
        -- SIWU allows to wake-up the computer.
        prog_rxen : in std_logic;
        prog_txen : in std_logic;
        prog_rdn : out std_logic;
        prog_wrn : out std_logic;
        prog_siwun : out std_logic;
        prog_oen : out std_logic -- '0' when the FTDI chip can write
    );
end ftdi_serial;

The implementation is a simple state-machine. Reading needs two states, one that actually reads the data (and tells the FTDI chip that data has been read), and one that gives one clock cycle to the FTDI chip to update prog_rxen. Writing is a bit more complex as three clock cycles are needed: the first one changes the direction of the data bus, the second one places data on the bus now that the FPGA can drive it, and the third cycle simply waits so that the FTDI chip can process the data on the bus and update prog_txen.

architecture Behavioral of ftdi_serial is
    type state_t is (idle, read_wait, do_write, write_wait);

    -- Counter allows to produce the sequence of bytes
    -- sent to the computer
    signal counter : unsigned(7 downto 0);
    signal state : state_t;
begin
    process(prog_clko)
    begin
        if rising_edge(prog_clko) then
            if reset = '1' then
                counter <= (others => '0');
                prog_d <= (others => 'Z');
                led <= "10101010";

                state <= idle;
                prog_rdn <= '1';
                prog_wrn <= '1';
                prog_siwun <= '1';
                prog_oen <= '0';
            else
                prog_rdn <= '1';
                prog_wrn <= '1';
                prog_oen <= '0';            -- By default, let the FTDI chip
                                            -- drive the bus
                prog_d <= (others => 'Z');

                case state is
                    when idle =>
                        -- Give priority to reads over writes, so that
                        -- the FPGA immediately responds to user queries.
                        if prog_rxen = '0' then
                            -- Data available on the bus (OEN is '0' by default)
                            led <= prog_d;

                            prog_rdn <= '0';
                            state <= read_wait;
                        elsif prog_txen = '0' then
                            -- Write possible, tell the FTDI chip that we
                            -- will drive the bus (OEN high releases it)
                            prog_oen <= '1';
                            state <= do_write;
                        end if;
                    when read_wait =>
                        -- One cycle delay to let the FTDI chip process
                        -- the read and update prog_rxen
                        state <= idle;
                    when do_write =>
                        -- Place data on the bus, which the FPGA still owns
                        prog_d <= std_logic_vector(counter);
                        prog_wrn <= '0';
                        prog_oen <= '1';

                        counter <= counter + 1;
                        state <= write_wait;
                    when write_wait =>
                        -- Delay so that the FTDI chip can process prog_wrn
                        -- and update prog_txen
                        state <= idle;
                end case;
            end if;
        end if;
    end process;
end Behavioral;

When dealing with an inout port in VHDL, we must not forget to assign 'Z' to it before trying to read from it. From an electrical perspective, Z puts the port in high-impedance (input) mode. On the language side, the resolution function of std_logic treats Z as the weakest value and gives priority to other drivers. If one end of a signal sets it to Z and the other end sets it to 1, for instance, then the signal takes a value of 1.
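
The way 'Z' yields to another driver can be illustrated with a simplified model of the resolution function. The sketch below only covers four of the nine std_logic values ('0', '1', 'Z' and 'X'), and the function name resolve is mine. It is meant to show the idea, not to reproduce the full table of the IEEE package.

```c
#include <assert.h>

/* Simplified model of the std_logic resolution function, restricted to
 * '0', '1', 'Z' (high impedance) and 'X' (conflict between drivers). */
char resolve(char a, char b)
{
    assert(a == '0' || a == '1' || a == 'Z' || a == 'X');
    assert(b == '0' || b == '1' || b == 'Z' || b == 'X');

    if (a == 'Z') return b;     /* a released the wire: b wins */
    if (b == 'Z') return a;     /* b released the wire: a wins */
    if (a == b)   return a;     /* both drivers agree */
    return 'X';                 /* two drivers disagree */
}
```

With this model, an FPGA that assigns 'Z' to prog_d simply sees whatever the FTDI chip drives, while two sides driving different values at the same time produce 'X', which is the bus conflict that the OEN signal is there to avoid.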

This implementation is very simple and cannot really be used for real-world applications. In the next post, An USB to AXI master interface, I present a small IP (and accompanying program) that can read and write arbitrary addresses over an AXI bus. I use it to store files in the 512 MB DDR3 RAM available on the Nexys Video.
