Unpacking Xilinx 7-Series Bitstreams: Part 3

In Part 2, I detailed the configuration packet format and how the programming operation is conveyed as a sequence of register writes. As we move up to the configuration memory frame layer (see layer diagram in Part 1), the construction of a Xilinx 7-series device becomes important. A major clue about how frame addresses relate to that construction comes from the Frame Address Register, a key register in any programming operation.

Frame Address Register (FAR)

Just before the first write of a configuration frame to the Frame Data Register, Input (FDRI), a 32-bit value is written to the Frame Address Register (FAR) to indicate where the new frame data should be placed in the configuration memory. What do these addresses tell us about the device construction? UG470 tells us these 32-bit addresses are composed of the following fields:

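Xilinx 7-Series Frame Address Register (figure). Fields, from most to least significant: block type (3 bits), top/bottom half (1 bit), row (5 bits), column (10 bits), minor frame (7 bits).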

Blocks are further described as only using a few specific values:

  • 000 CLB, I/O, CLK
  • 001 Block RAM content
  • 010 CFG_CLB

This looks very much like a geographical addressing scheme similar to the Bus/Device/Function scheme used for PCI Configuration Space. That is, the device is constructed of a hierarchy of component groupings. In the case of PCI, a system may contain 256 buses, each of which may contain up to 32 devices. Further, each device may contain up to 8 functions. Identifying a specific function requires identifying the bus and device that contain it as well. Thus, function addresses in PCI are a tuple of (bus, device, function). Balancing the complexity of address decoding logic against address compactness leads to representing each component of the tuple as a binary number with the minimum number of bits needed to represent the maximum allowed value, and then concatenating those numbers into a single binary number padded to a common alignment size (8, 16, 32, or 64 bits).
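To make that packing concrete, here is a minimal Python sketch of the PCI case. The field widths (8, 5, and 3 bits) follow directly from the limits of 256 buses, 32 devices, and 8 functions; the function names are my own and not part of any PCI library.

```python
# Widths follow from the maximum counts: 256 buses, 32 devices, 8 functions.
BUS_BITS, DEV_BITS, FUNC_BITS = 8, 5, 3

def pack_bdf(bus, device, function):
    """Concatenate (bus, device, function) into one 16-bit address."""
    assert 0 <= bus < (1 << BUS_BITS)
    assert 0 <= device < (1 << DEV_BITS)
    assert 0 <= function < (1 << FUNC_BITS)
    return (bus << (DEV_BITS + FUNC_BITS)) | (device << FUNC_BITS) | function

def unpack_bdf(addr):
    """Split a packed 16-bit address back into (bus, device, function)."""
    function = addr & ((1 << FUNC_BITS) - 1)
    device = (addr >> FUNC_BITS) & ((1 << DEV_BITS) - 1)
    bus = (addr >> (DEV_BITS + FUNC_BITS)) & ((1 << BUS_BITS) - 1)
    return bus, device, function
```

The three fields add up to exactly 16 bits, so no padding is needed here. FAR uses the same idea but leaves its upper bits unused.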

Inferring Device Architecture

What does this tell us about 7-series devices then? A device is constructed of some hierarchy of block types, device halves, rows, columns, and minor frames. The FAR field descriptions in UG470 give us a few more details:

  • Rows are numbered outward from the device center-line in the direction specified by the top/bottom bit
  • Columns are numbered with zero on the left and increasing to the right
  • Minor frames are contained within a column

Looking back at the FAR description, it seems that the fields are ordered such that each component contains all the components to the right of its field in FAR. That matches with traditional meanings of the terms used except the relationship between block types and halves. If rows are numbered growing outward from the center-line of the device, that implies there are only two halves in a device, not two per block type. Recall that only three block type values are used. What if instead of being part of the hierarchy, the block type selects one of multiple data buses going into the hierarchy? That would match the terms used better.

Combining those conclusions with the field bit-widths in FAR, we end up with the following addressing limits:

  • Block Types: 3
  • Halves: 2
  • Rows: up to 32 per half
  • Columns: up to 1024 per row
  • Minor frames: up to 128 per column
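As a rough sketch of how those widths turn into a decoder, the Python below splits a FAR value into its fields and derives the per-field limits. The field order and widths are my reading of the FAR layout in UG470 (block type in bits 25:23 down to minor frame in bits 6:0); the names FrameAddress, decode_far, and field_limits are made up for illustration.

```python
from typing import NamedTuple

# (name, width) from most- to least-significant field. The exact bit
# positions are an assumption based on my reading of UG470.
FAR_FIELDS = [("block_type", 3), ("half", 1), ("row", 5),
              ("column", 10), ("minor", 7)]

class FrameAddress(NamedTuple):
    block_type: int
    half: int
    row: int
    column: int
    minor: int

def decode_far(value):
    """Split a 32-bit FAR value into its fields."""
    fields = {}
    shift = sum(width for _, width in FAR_FIELDS)  # 26 bits are used
    for name, width in FAR_FIELDS:
        shift -= width
        fields[name] = (value >> shift) & ((1 << width) - 1)
    return FrameAddress(**fields)

def field_limits():
    """Number of values each field can encode, from its bit width."""
    return {name: 1 << width for name, width in FAR_FIELDS}
```

Note that the 3-bit block type field could encode 8 values even though only 3 are described as used, which is why the list above says 3 rather than 8.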

Putting all of that together, the device looks something like this:

Next: Verifying Against a Bitstream

In Part 4, I’ll look at the FAR and FDRI writes done by a Vivado-generated bitstream and see how well it matches my inferred device architecture. I’ll use a bitstream debugging feature to figure out the valid frame addresses in a device which results in a few surprises.

Unpacking Xilinx 7-Series Bitstreams: Part 2

In Part 1, I walked through the various file formats generated by Xilinx tools, the BIT file format header, and the physical interface layer of the bitstream protocol stack. In this part, I’ll dive into the gory details of the configuration packet format and how those packets control the overall programming operation.

As I briefly mentioned in Part 1, the physical interface layer transports a stream of packetized register read/write operations that constitute the configuration packet layer. The sync word that begins the packet stream also serves to establish 32-bit alignment within the overall byte stream carried by the physical interface layer. From that point on, all data formats are described in 32-bit, big-endian words.
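A minimal sketch of that alignment step in Python, assuming the raw bitstream is already in memory as bytes and that the sync word (0xAA995566, from Part 1) sits on a byte boundary, as it does in a file. The function name is hypothetical.

```python
import struct

SYNC_WORD = bytes.fromhex("aa995566")

def words_after_sync(raw):
    """Return the 32-bit big-endian words that follow the first sync word."""
    start = raw.find(SYNC_WORD)
    if start < 0:
        raise ValueError("sync word not found")
    body = raw[start + len(SYNC_WORD):]
    usable = len(body) - (len(body) % 4)  # ignore a trailing partial word
    return [word for (word,) in struct.iter_unpack(">I", body[:usable])]
```

Everything before the sync word is skipped, which is what lets the sync word sit at any byte offset in the stream.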

Note that the physical interface used may impose limitations on the features available at the configuration packet layer. I’ll call out these limitations when describing features that are impacted.

Configuration Packet Format

Xilinx 7-Series Configuration Packet Header

Each configuration packet begins with a one-word header. The contents of the header change according to the header type which is contained in the top 3 bits. Only types 1 and 2 are officially documented though type 0 exists in practice as we’ll see later.

Xilinx 7-Series Configuration Packet Type 1 Header

Type 1 packets specify a complete operation to be performed with opcodes being defined as follows:

  • 00 – NOP
  • 01 – Read
  • 10 – Write

For a NOP, the remaining header fields are unused, though the address field still matters for any type 2 packet that follows. Reads and writes are directed at the register specified in the address field. While 14 bits of address space are defined in the header, 7-series devices seem to only use the lower 5 bits. Payload length is the number of data words to be read or written. These data words immediately follow the header with writes being sent to the device and reads being sent from the device. Note: reads are only available over SelectMAP and JTAG physical interfaces.
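Here is a small Python sketch of decoding a type 1 header. The header type in the top 3 bits, the 14-bit address, and the 11-bit payload length come from the description above; the exact bit positions I use for the opcode (28:27) and address (26:13) are my reading of UG470, and the names are my own.

```python
from typing import NamedTuple

OPCODES = {0b00: "NOP", 0b01: "READ", 0b10: "WRITE"}

class Type1Header(NamedTuple):
    opcode: str
    address: int
    word_count: int

def decode_type1(word):
    """Decode a 32-bit type 1 packet header."""
    header_type = (word >> 29) & 0x7
    if header_type != 1:
        raise ValueError(f"not a type 1 header: type={header_type}")
    opcode = (word >> 27) & 0x3      # assumed position: bits 28:27
    address = (word >> 13) & 0x3FFF  # 14-bit field, assumed bits 26:13
    word_count = word & 0x7FF        # 11-bit payload length
    return Type1Header(OPCODES.get(opcode, "RESERVED"), address, word_count)
```

For example, decoding 0x30008001 under these assumptions gives a one-word write to register address 4.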

Xilinx 7-Series Configuration Packet Type 2 Header

Type 2 packets are used when the payload length exceeds the 11 bits available in a type 1 packet. Note the lack of an address field. Remember how I mentioned the address field being important for a NOP? The address field of the last type 1 packet is reused as the target of a type 2 packet. Only the address is reused so, in theory, a type 1 read could be followed by a type 2 write. In practice, I’ve only seen type 2 used immediately after a zero-length type 1 write.
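The address reuse is easier to see in code. This Python sketch walks a list of words that follow the sync word and remembers the address of the last type 1 packet. It assumes the type 2 length occupies bits 26:0, and that only writes carry payload words in the stream (read data flows from the device). Type 0 headers are not handled here. The function name is hypothetical.

```python
WRITE = 0b10

def parse_packets(words):
    """Yield (opcode, address, payload) for each packet in a list of words."""
    last_address = None
    i = 0
    while i < len(words):
        header = words[i]
        i += 1
        header_type = (header >> 29) & 0x7
        opcode = (header >> 27) & 0x3
        if header_type == 1:
            last_address = (header >> 13) & 0x3FFF
            count = header & 0x7FF
        elif header_type == 2:
            if last_address is None:
                raise ValueError("type 2 packet before any type 1 packet")
            count = header & 0x7FFFFFF  # assumed 27-bit length, bits 26:0
        else:
            raise ValueError(f"unhandled header type {header_type}")
        # Only writes are followed by payload words in the input stream.
        n = count if opcode == WRITE else 0
        yield opcode, last_address, words[i:i + n]
        i += n
```

The usual pattern of a zero-length type 1 write followed by a type 2 packet comes out as two packets with the same address, the second carrying the data.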

Configuration Registers

Addresses specified in configuration packets are mapped 1-to-1 to a set of variable-width registers. Most of the registers are a single word wide but FDRI and FDRO are notable exceptions. I have not experimented with what happens if a packet attempts a short write or a write past the end of the register.

These registers provide low-level control over the chip including boot configuration and programming. Many of the available knobs are related to tuning physical interface behavior and which status/debug signals are available on pins. A few of the key registers used during programming are:

  • IDCODE
    Before writing to the configuration memory, a 32-bit device ID code must be written to this register. Reads from the register return the attached device’s ID code.
  • CRC
    When a packet is received by the device, it automatically updates an internal CRC calculation to include the contents of that packet. A write to the CRC register checks that the calculated CRC matches the expected value written to the register. This CRC check only provides integrity checking of the packet stream, not the configuration memory contents, and is not required for programming. If you are modifying a bitstream, CRC writes can simply be removed instead of being recalculated.
  • Command
    Most of the programming sequence is implemented as a state machine that is controlled via one-shot actions. Writes to this register arm an action that, depending on the action requested, may be triggered immediately or delayed until some other condition is met.
    Important Note: During autoincremented frame writes (described later), the current command is rewritten during every autoincrement. This has the effect of rearming the action on every frame written.
  • Frame Address Register (FAR)
    Writes to this register set the starting address for the next frame read or write.
  • FDRI
    When a frame is written to FDRI, the frame data is written to the configuration memory address specified by FAR. If the write to FDRI contains more than one frame, FAR is autoincremented at the end of each frame.

For more details on these registers and others I didn’t mention, refer to Table 5-23 in UG470.
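As a small illustration of the multi-frame FDRI writes described above, this Python sketch chops an FDRI payload into frames. It assumes a frame length of 101 words, the figure that shows up in the programming sequence below, and it does not try to compute the FAR value for each frame. The function name is my own.

```python
WORDS_PER_FRAME = 101  # frame length assumed from the programming sequence

def split_frames(payload):
    """Split an FDRI payload (a list of 32-bit words) into whole frames."""
    if len(payload) % WORDS_PER_FRAME:
        raise ValueError("payload is not a whole number of frames")
    return [payload[i:i + WORDS_PER_FRAME]
            for i in range(0, len(payload), WORDS_PER_FRAME)]
```

Each element of the result corresponds to one autoincrement of FAR by the device.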

Programming Sequence

I’ll only be providing a high-level overview of a programming sequence for a complete write of the configuration memory. Partial reconfiguration uses a slightly different sequence that I’ll document in a separate post. I highly suggest looking at a bitstream as there are details such as NOPs that I am omitting that may be important when actually programming a device.

  1. Write TIMER: 0x00000000
    Disable the watchdog timer
  2. Write WBSTAR: 0x00000000
    On the next device boot, start with the bitstream at address zero.  This may be different if the bitstream contains a multi-boot configuration.
  3. Write COMMAND: 0x00000000
    Switch to the NULL command.
  4. Write COMMAND: 0x00000007
    Reset the calculated CRC to zero.
  5. Write register 0x13: 0x00000000
    Undocumented register. No idea what this does yet.
  6. Write Configuration Option Register 0: 0x02003fe5
    Setup timing of various device startup operations such as which startup cycle to wait in until MMCMs have locked and which clock settings to use during startup.
  7. Write Configuration Option Register 1: 0x00000000
    Writing defaults to various device options such as the page size used to read from BPI and whether continuous configuration memory CRC calculation is enabled.
  8. Write IDCODE: 0x0362c093
    Tell the device that this is a bitstream for a XC7A50T. If the device is an XC7A50T, configuration memory writes will be enabled.
  9. Write COMMAND: 0x00000009
    Activate the clock configuration specified in Configuration Option Register 0. Up to this point, the device was using whatever clock configuration the last loaded bitstream used.
  10. Write MASK: 0x00000401
    Set a bit-wise mask that is applied to subsequent writes to Control 0 and Control 1. This seems unnecessary for programming but is used to toggle certain bits in those registers instead of using precomputed values. It might make more sense in a use case where the exact value of Control 0 or Control 1 is unknown but a bit needs to be flipped.
  11. Write Control 0: 0x00000501
    Due to the previous write to MASK, 0x401 is actually written to this register, which is the default value. This mostly disables fallback boot mode and masks out memory bits in the configuration memory during readback.
  12. Write MASK: 0x00000000
    Clear the write mask for Control 0 and Control 1
  13. Write Control 1: 0x00000000
    Control 1 is officially undocumented. See Part 3 for at least one bit I’ve figured out.
  14. Write FAR: 0x00000000
    Set starting address for frame writes to zero.
  15. Write COMMAND: 0x00000001
    Arm a frame write. The write will occur on the next write to FDRI.
  16. Write FDRI: <547420 words>
    Write desired configuration to configuration memory. Since more than one frame (101 words) is written, FAR autoincrementing is being used. 547420 words is 5420 frames of 101 words each. Between each frame, COMMAND will be rewritten with 0x1 which re-arms the next write. Note that the configuration memory space is fragmented and autoincrement moves to the next valid address. As we’ll see in Part 3, this is a rather annoying feature that makes reading bitstream configuration data a bit more challenging.
  17. Write COMMAND: 0x0000000A
    Update the routing and configuration flip-flops with the new values in the configuration memory. At this point, the device configuration has been updated but the device is still in programming mode.
  18. Write COMMAND: 0x00000003
    Tell the device that the last configuration frame has been received. The device re-enables its interconnect.
  19. Write COMMAND: 0x00000005
    Arm the device startup sequence. Documentation claims both a valid CRC check and a DESYNC command are required to trigger the startup. In practice, a bitstream with no CRC checks works just fine.
  20. Write COMMAND: 0x0000000D
    Exit programming mode. After this, the device will ignore data on the configuration interfaces until the sync word is seen again. This also triggers the previously armed device startup sequence.
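To tie the sequence back to the packet format, here is a rough Python sketch that encodes a few of the steps above as type 1 write packets and models the MASK behaviour from steps 10 and 11. The register addresses are my reading of Table 5-23 in UG470, and the masking formula is my assumption about how the hardware applies MASK. Treat both as unverified. All names are hypothetical.

```python
# Register addresses as I read them from Table 5-23 of UG470 (assumption).
REG = {"CRC": 0x00, "FAR": 0x01, "FDRI": 0x02, "CMD": 0x04, "CTL0": 0x05,
       "MASK": 0x06, "COR0": 0x09, "IDCODE": 0x0C, "COR1": 0x0E,
       "WBSTAR": 0x10, "TIMER": 0x11, "CTL1": 0x18}

def type1_write(register, *values):
    """Encode a type 1 write: one header word followed by the payload."""
    if len(values) > 0x7FF:
        raise ValueError("payload too long for a type 1 packet")
    header = (0b001 << 29) | (0b10 << 27) | (REG[register] << 13) | len(values)
    return [header, *values]

def masked_write(current, mask, value):
    """Assumed model of a Control 0/1 write: only bits set in MASK change."""
    return (current & ~mask) | (value & mask)

# Steps 1 through 4 of the sequence above, as packet words.
prologue = (type1_write("TIMER", 0x00000000)
            + type1_write("WBSTAR", 0x00000000)
            + type1_write("CMD", 0x00000000)
            + type1_write("CMD", 0x00000007))
```

With a MASK of 0x401, writing 0x501 to Control 0 leaves 0x401 in the register under this model, which matches the behaviour described in step 11.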

Next: Configuration Memory

In Part 3, I’ll cover how configuration memory is addressed and how that gives us some clues about the physical chip structure. I’ll also look at a very curious detail that violates the protocol stack encapsulation.

Unpacking Xilinx 7-Series Bitstreams: Part 1

For the past few months, I’ve been writing Xilinx 7-series bitstream manipulation tools for SymbiFlow. After building a mostly-working implementation in C++, I started to wonder what a generic framework for FPGA development tools would look like. Inspired by LLVM and partly as an excuse to learn Rust, I started a new project, Gaffe, to prototype ideas. With Xilinx 7-series fresh in my mind, I chose to reimplement the bitstream parsing as a first step. While most of the bitstream format is documented in UG470 7 Series FPGA Configuration User Guide, subtle details are omitted that I hope to clarify here.

File Formats Galore

Xilinx 7-series devices can be programmed through multiple interfaces (JTAG, SPI, BPI, SelectMAP) and multiple tools (iMPACT, SPI programmer, SVF player). This has led to multiple file formats being devised for different scenarios:

  • BIT – Binary file containing BIT header followed by raw bitstream
  • RBT – ASCII file with text header followed by raw bitstream written as literal ‘0’ and ‘1’ characters for each bit
  • BIN – Raw bitstream
  • MCS – PROM file format (includes address and checksum info)

Even though a BIN contains all the necessary data for programming a part, BIT is the default format generated by Vivado’s write_bitstream command and is what I’ll focus on.
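Since the formats differ mainly in their wrapping, converting between them is mostly mechanical. As one example, this Python sketch turns the body of an RBT file into the raw bytes a BIN would contain. It assumes each data line holds exactly 32 '0'/'1' characters and treats every other line as text header; I have not checked this against every tool version. The function name is hypothetical.

```python
def rbt_to_bytes(text):
    """Convert the '0'/'1' lines of an RBT file into raw bitstream bytes."""
    out = bytearray()
    for line in text.splitlines():
        line = line.strip()
        # Assumption: data lines are exactly 32 bit characters long;
        # anything else is treated as header text and skipped.
        if len(line) == 32 and set(line) <= {"0", "1"}:
            out += int(line, 2).to_bytes(4, "big")
    return bytes(out)
```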

BIT Header

Thankfully, this header format was documented on FPGA FAQ back in 2001.  It’s mostly a Tag-Length-Value (TLV) format but with a few quirks.  The information provided (design name, build date/time, target part name) is purely informational (ignored by the chip).  The main reason I mention this format is that most other tools (Vivado, openocd) require this header to be present.
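For the curious, here is a minimal Python sketch of a BIT header parser. The layout it assumes is my recollection of the FPGA FAQ description: an untagged length-prefixed magic field, a bare 16-bit value, then single-letter tags 'a' through 'd' with 16-bit lengths, and finally tag 'e' with a 32-bit length covering the raw bitstream. Treat the details as assumptions; the function name is my own.

```python
import struct

def parse_bit_header(data):
    """Return (fields, bitstream) from the contents of a .bit file."""
    pos = 0

    def take(n):
        nonlocal pos
        chunk = data[pos:pos + n]
        if len(chunk) != n:
            raise ValueError("truncated BIT header")
        pos += n
        return chunk

    # Quirk: the file opens with an untagged, length-prefixed magic field.
    (magic_len,) = struct.unpack(">H", take(2))
    take(magic_len)
    # Quirk: a bare 16-bit value precedes the first tagged field.
    take(2)
    fields = {}
    while True:
        tag = take(1)
        if tag == b"e":
            # Quirk: the bitstream field uses a 32-bit length.
            (length,) = struct.unpack(">I", take(4))
            return fields, take(length)
        (length,) = struct.unpack(">H", take(2))
        value = take(length).rstrip(b"\x00")
        fields[tag.decode("ascii")] = value.decode("ascii")
```

The returned bitstream is the same byte stream a BIN file would contain, so the rest of this series applies to it unchanged.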

Layers of encapsulation

Past the BIT header, the raw bitstream is literally a stream of bytes that is interpreted by a 7-series part’s programming logic. Similar to networking protocols, part programming is built out of a protocol stack.

Stack of three layers from bottom to top: physical interface, packets, and frames.
Xilinx 7-Series Bitstream Layers

Starting at the base is the physical interface (JTAG, SPI, etc) used to connect to the part.  The physical interface carries a packetized format that controls the overall programming operation through a series of register reads/writes.  Part of the register set provides indexed access to the top layer of the stack: configuration memory frames.

Physical Interface Layer

As multiple physical interfaces are available, the electrical details depend on the specific interface you choose to use.  The only common piece of the physical interface layer is the detection of a sync word (0xAA995566) that begins the parsing of packets.

Any data received prior to the sync word will not be parsed as a packet but may have other effects.  For example, a few of the physical interfaces allow for multiple parallel bus widths.  The interface hardware looks for a magic sequence, called the bit width detection pattern, to determine the width of the parallel interface.  For details on how this works, see Chapter 5 of UG470.

Moving up the stack

In Part 2, I’ll be describing the packet format and the overall programming sequence.  Part 3 will focus on configuration memory frame addressing and a few places where this careful encapsulation gets violated.