Unpacking Xilinx 7-Series Bitstreams: Part 1

For the past few months, I’ve been writing Xilinx 7-series bitstream manipulation tools for SymbiFlow. After building a mostly-working implementation in C++, I started to wonder what a generic framework for FPGA development tools would look like. Inspired by LLVM and partly as an excuse to learn Rust, I started a new project, Gaffe, to prototype ideas. With Xilinx 7-series fresh in my mind, I chose to reimplement the bitstream parsing as a first step. While most of the bitstream format is documented in UG470 7 Series FPGA Configuration User Guide, subtle details are omitted that I hope to clarify here.

File Formats Galore

Xilinx 7-series devices can be programmed through multiple interfaces (JTAG, SPI, BPI, SelectMAP) and multiple tools (iMPACT, SPI programmer, SVF player). This has led to multiple file formats being devised for different scenarios:

  • BIT – Binary file containing BIT header followed by raw bitstream
  • RBT – ASCII file with text header followed by raw bitstream written as literal ‘0’ and ‘1’ characters for each bit
  • BIN – Raw bitstream
  • MCS – PROM file format (includes address and checksum info)

Even though a BIN contains all the necessary data for programming a part, BIT is the default format generated by Vivado’s write_bitstream command and is what I’ll focus on.

BIT Header

Thankfully, this header format was documented on FPGA FAQ back in 2001.  It’s mostly a Tag-Length-Value (TLV) format but with a few quirks.  The information provided (design name, build date/time, target part name) are purely informational (ignored by the chip).  The main reason I mention this format is that most other tools (Vivado, openocdrequire this header to be present.

Layers of encapsulation

Past the BIT header, the raw bitstream is literally a stream of bytes that is interpreted by a 7-series part’s programming logic. Similar to networking protocols, part programming is built out of a protocol stack.

Stack of three layers from bottom to top: physical interface, packets, and frames.
Xilinx 7-Series Bitstream Layers

Starting at the base is the physical interface (JTAG, SPI, etc) used to connect to the part.  The physical interface carries a packetized format that controls the overall programming operation through a series of register reads/writes.  Part of the register set provides indexed access to the top layer of the stack: configuration memory frames.

Physical Interface Layer

As multiple physical interfaces are available, the electrical details depend on the specific interface you choose to use.  The only common piece of the physical interface layer is the detection of a sync word (0xAA995566) that begins the parsing of packets.

Any data received prior to to the sync word will not parsed as a packet but may have other effects.  For example, a few of the physical interfaces allow for multiple parallel bus widths.  The interface hardware looks for a magic sequence, called the bit width detection pattern, to determine the width of the parallel interface.  For details on how this works, see Chapter 5 of UG470.

Moving up the stack

In Part 2, I’ll be describing the packet format and the overall programming sequence.  Part 3 will focus on configuration memory frame addressing and a few places where this careful encapsulation gets violated.

Ubiquiti EdgeRouter Lite USB surgery

Keeping up with updates on all the devices in my home lab usually isn’t a big deal. Windows, macOS, Android, and iOS mostly take care of themselves. Ubuntu needs the occasional touch of ‘apt upgrade’ but is otherwise painless. My printer hasn’t had new firmware in over a year (there’s definitely a vuln or two that won’t ever be patched). Given the typical monotony, I was caught off guard when the firmware update on my router failed.

A/B firmware updates

Instead of a typical WiFi router, I use a Ubiquity EdgeRouter Lite to get some fancy features like BGP routing. Being geared toward WISPs and other industry applications, firmware upgrades are made safer by maintaining two OS images. When a new update is loaded onto the device, it overwrites the inactive OS image (the one not currently running). Once the update is written, checksummed, and ready to go, the device is rebooted and the bootloader loads the new OS image. If the new OS image fails for some reason, the bootloader will fallback to the old OS image that was running previously.

When failures repeat themselves

I SSH’d to the router and ran the update command:

kc8apf@router0:~$ add system image https://dl.ubnt.com/firmwares/edgemax/v1.9.7/ER-e100.v1.9.7+hotfix.4.5024004.tar
Trying to get upgrade file from https://dl.ubnt.com/firmwares/edgemax/v1.9.7/ER-e100.v1.9.7+hotfix.4.5024004.tar
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 80.8M 100 80.8M 0 0 4740k 0 0:00:17 0:00:17 --:--:-- 5082k
Download suceeded
Checking upgrade image...Done
Preparing to upgrade...Failed to mount partition

Uh, oh. This is about the time that I remember that this particular router failed to boot after a power outage a few months ago. The internal storage had been corrupted. I ran a recovery procedure, got it back online, and promptly forgot about it. Seems like the internal storage is failing. What can I do about it?

Surprisingly serviceable

From the Ubiquity Community forums, I learned that the internal storage is actually a USB thumb drive and can be replaced. Perfect! I have a pile of extra USB drives that I keep around for OS installers. I grabbed an 8GB drive and promptly realized I had no idea how to load the OS onto the new drive. Sure, I could pull the old drive and copy it but that would copy any corruption already present. A little more hunting in the forums turned up mkeosimg, a script that prepares a USB drive from an EdgeRouter firmware update.

The README is pretty clear that I need a Linux machine to use it. A quick look at the source confirms that it relies on a handful of tools such as parted and mkfs.ext3. The only Linux machines I have at home at rack-mount servers running containers. Getting access to a USB port on those is a bit of a chore so I took the “easier” approach of firing up an Ubuntu VM on Virtualbox.

Why so slow?!?

Almost immediately, mkeosimg tells me it is “Creating ER-e100.v1.9.7+hotfix.4.5024004-configured.img” and then sits there for a few minutes. What the heck is it doing? Creating a zero-filled file of course! While waiting for dd to copy 8GB worth of zeros to a file inside a VM is an exciting way to spend time, I like to live a fast and dangerous life. I replaced dd with a call to fallocate. fallocate simply asks the filesystem to create a file entry but don’t actually allocate sectors for data. That is, it creates an 8GB file almost instantly because it only updates the filesystem metadata. The data sectors will be allocated as data is written to the file. Now the script runs in seconds. I have an image! I write it to a USB drive. I’m nearing the end!

Opening the case

Yes, there is a USB stick inside this router. I’m not sure I’d call it a “standard” USB stick though.

EdgeRouter-Lite internal USB drive next to a 2.5" hard drive.
2.5″ hard drive for scale.

By removing all the normal casing, the router enclosure can be slightly smaller. The problem: my new USB drive has all the regular casing. A trip to the bench grinder was in order.

Now it just barely fits in the enclosure.

Well, that was fun.

Snakes in a Sysroot

Dear Brother Yocto,
Here’s a weird bug I found on my setup: pythonnative seems to be using
python3.4.3 whereas the recipe specified 2.7.

*What I tested:*
I found the code in phosphor-settings-manager, which inherits pythonnative,
seems to be working per python 3, so added the following line into a native
python function:

python do_check_version () {
  import sys
addtask check_version after do_configure before do_compile

WARNING: phosphor-settings-manager-1.0-r1 do_check_version: 3.4.3 (default,
Nov 17 2016, 01:08:31)

This is with openbmc git tip, building MACHINE=zaius

*What the recipe should be using is seemingly 2.7:*
$ cat ../import-layers/yocto-poky/meta/classes/pythonnative.bbclass

inherit python-dir

$ cat ../import-layers/yocto-poky/meta/classes/python-dir.bbclass

PYTHON_PN = "python"
PYTHON_SITEPACKAGES_DIR = "${libdir}/${PYTHON_DIR}/site-packages"

*Why this is bad:*
well, python2.7 and python3 have a ton of differences. A bug I discovered
is due to usage of filter(), which returns different objects between

Before diving into the layers of bitbake recipes to root cause what caused
this, I want to know if you have ideas why this went wrong.

-Snakes in a Sysroot

Dear Snakes in a Sysroot,
As you’ve already discovered, the version of Python used varies depending on context within a Yocto build. Recipes, by default, are built to run on the target machine and distro. For most recipes, that is what we want: we’re building a recipe so it can be included in the rootfs. In some cases, a tool needs to be run on the machine running bitbake. An obvious example is a compiler that generates binaries for the target machine. Of course, that compiler likely has dependencies on libraries that must be generated for the machine that runs that compiler. This leads to three categories of recipes: native, cross, and target. Each of these categories place their output into different directories (sysroots) both to avoid file collisions and to allow BitBake to select which tools are available to a recipe by altering the PATH environment variable.

Binaries and libraries generated by these recipes are executed on the machine running bitbake (host machine). Any binary that generates other binaries (compiler, linker, etc) will also generate binaries for the host machine.

Similar to native, the binaries and libraries generated by these recipes are executed on the host machine. The difference is that any binary that generates other binaries will generate binaries for the target machine.

Binaries and libraries generated by these recipes are executed on the target machine only.

For a recipe like Python, it is common to have a native and a target recipe (python and python-native) that use a common include file to specify the version, most of the build process, etc. That’s what you see in yocto-poky/meta/classes/python-dir.bbclass.

So, why isn’t python-native being used to run the python function in your recipe? A key insight is that you are using ‘addtask’ to tell BitBake to run this python function as a task. That means it is being run in the context of BitBake itself, not as a command executed by a recipe’s task. The sysroots described above don’t apply since a new python instance is not being executed. Instead, the recipe is run inside the existing BitBake python process. This is also how in-recipe python methods can alter BitBake’s variables and tasks (see BitBake User Manual Section 3.5.2 specifically that the ‘bb’ package is already imported and ‘d’ is available as a global variable). Since bitbake is executed directly from the command line and is a script, its shebang line tells us it will be run under python3.

If you want to run a python script as part of the recipe’s build process (perhaps a tool written in python), you’d execute that script from within a task:

do_check_version() {
  python -c '
import sys
addtask check_version after do_configure before do_compile

Since python is being executed within a task, the version of python in the selected sysroot will be used. The output will not be displayed on the console in BitBake’s output. Instead, it will be written to the task’s log file within the current build directory. If you need to do this, I highly suggest writing the script as a separate file that is included in the package rather than trying to contain it all inside the recipe.

All about those caches

Dear Brother Yocto,
When building an image, I want to include information about the metadata repository that was used by the build.  The os-release recipe seems to be made for this purpose.  I added a .bbappend to my layer with the following:

def run_git(d, cmd):
    oeroot = d.getVar('COREBASE', True)
    return bb.process.run("git --work-tree %s --git-dir %s/.git %s"
        % (oeroot, oeroot, cmd))[0].strip('\n')

python() {
  version_id = run_git(d, 'describe --dirty --long')
  if version_id:
    d.setVar('VERSION_ID', version_id)
    versionList = version_id.split('-')
    version = versionList[0] + "-" + versionList[1]
    d.setVar('VERSION', version)

  build_id = run_git(d, 'describe --abbrev=0')
  if build_id:
    d.setVar('BUILD_ID', build_id)


The first build using this .bbappend works fine: /etc/os-release is generated with VERSION_ID, VERSION, and BUILD_ID set to the ‘git describe’ output as intended. Subsequent builds always skip this recipe’s tasks even after a commit is made to the metadata repo. What’s going on?
-Flummoxed by the Cache

Dear Flummoxed by the Cache,

As you’ve already surmised, you’ve run afoul of BitBake/Yocto’s caching systems. Note the plural. I’ve discovered three and I’m far from convinced that I’ve found them all. Understanding how they interact can be maddening and subtle as your example shows.

Parse Cache

Broadly speaking, BitBake operates in two phases: recipe parsing and task execution. In a clean Yocto Poky repository, there are 1000s of recipes. Reading these recipes from disk, parsing them, and performing post-processing takes a modest amount of time so naturally the results (a per-recipe set of tasks, their dependency information, and a bunch of other data) are cached. I haven’t researched the exact invalidation criteria used for this cache but suffice it to say that modifying the file or any included files is sufficient. If the cache is valid on the next BitBake invocation, the already parsed data is loaded from the cache and handed off to the execution phase.


In the execution phase, each task is run in dependency order, often with much parallelism. Two separate but related mechanisms are used to speed up this phase: stamps and sstate (or shared state) cache. Keep in mind that a single recipe will generate multiple tasks (fetch, unpack, patch, configure, compile, install, package, etc). Each of these tasks assume that they operate on a common work directory (created under tmp/work/….) in dependency order. Stamps are files that record that a specific task has been completed and under what conditions (environment, variables, etc). This allows subsequent BitBake invocations to pick up where previous invocations left off (for example running ‘bitbake -c fetch’ and then ‘bitbake -c unpack’ won’t repeat the fetch).

Shared State cache (sstate)

You’re probably wondering what the sstate cache is for then. When I attempt to build core-image-minimal, one of the tasks is generating the root filesystem. To do so, that task needs the packages produced by all the included recipes. Worst case, everything is built from scratch as happens with a fresh build directory. Obviously that is slow so caching comes into play again. If the stamps discussed previously are available, the tasks can be restarted from the latest, valid stamp.

That helps with repeated, local builds but often Yocto is used in teams where changes submitted upstream will invalidate a bunch of tasks on the next merge. Since Yocto is almost hermetic, the packages generated by submitter’s builds will usually match the packages I generate locally as long as the recipe and environment are the same. The sstate cache maps the task hash to the output of that task. When a recipe includes a _setscene() suffixed version of a task, the sstate cache is used to retrieve the output instead of rerunning the task. This combined with sharing of a sstate cache allows for sharing of build results between users in a team. Read the Shared State Cache section in the Yocto Manual for details on how this works and how to setup a shared cache.

Back to the original problem

So, what is going wrong in the writer’s os-release recipe? If you read the Anonymous Python Function section in the BitBake Manual carefully, they are run at parsing time. That means the result of the function is saved in the parsing cache. Unless the cache is invalidated (usually by modifying the recipe or an imported class), the cached value of BUILD_ID will be used even if the stamps and sstate cache are invalidated. To get BUILD_ID to be re-evaluated on each run, the parse cache needs to be disabled for this recipe. That’s accomplished by setting BB_DONT_CACHE = “1” in the recipe.

Note that the stamps and sstate cache are still enabled. There are some subtle details about making sure BitBake is aware that BUILD_ID is indirectly referenced by the do_compile task so that it gets included in the task hash (see how OS_RELEASE_FIELDS is used in os-release.bb). That ensures that the task hash changes whenever the SHA1 of the OEROOT git repo HEAD changes which means the caches will be invalidated then as well.

Confused by all the caches yet?

-Brother Yocto

On wiring diagrams

After I finished rewiring the Cobra drag car’s trunk, I found myself needing to pull a fuse to do a test and not remembering which fuse was for what. Having all the fuses and relays in one panel was a huge improvement over the original wiring but clearly it was time for labels and a wiring diagram.

Shopping for one-off labels

These labels are going to be affixed near the fuel cell and will likely see some abuse. I want a label that has a strong adhesive, is resistant to scratching, is UV resistant, and tolerates contact with fuel. This is no ordinary sticker. This is a high quality label.

I know Sticker Mule makes custom stickers. Maybe they have something that would work. Bad sign #1: minimum order quantity. I will need precisely one of each label. Having 49 extras is a bit much. A few quick searches show any custom label order is going to have a minimum order. It makes sense: setup takes time. Time to rethink my approach.

From somewhere in the back of my mind, the name Avery jumped forth. Avery makes printable labels in all sorts of shapes and sizes formatted on standard paper sizes so any inkjet or laser printer can be used. I fondly remember churning out hundreds of their mailing labels in the 90’s by running a Word mail merge and using Avery’s provided templates. Maybe I can put a coating over their labels.

Avery UltraDuty GHS Chemical Labels box

No need! Avery makes UltraDuty GHS Chemical Labels designed for labeling chemical containers. They are promoted as great for custom NFPA and HMIS labels. That should work perfectly in a trunk. Oh, look! They even have a template! Wait… it’s in Word? I don’t even own a copy of Word anymore. Hmm, what’s this? They have an online tool instead? Ok. I’m generally against these kinds of tools but I’m getting desperate.  Five minutes later, I close the tab in frustration. Well, I have a label but I’m on my own to layout my designs.

Troublesome Tools

I am a programmer, not an artist.  I know how to draft with pencil, paper, and ruler but I much rather write code.  SVG seemed like a natural fit for the lineart and text I planned to use on these labels.  Not having used SVG before, I read a few how-to articles before diving into the specification.  Laying out the basic shape of the label was pretty easy but filling it with content became baffling. There were so many options of how to proceed with different ramifications.  Two evenings of trying to figure it out was all I could muster.  I deleted my work in progress and took a day off.

As I took some time working on other projects, I realized that wiring diagrams are a form of undirected graph rendered orthographically. GraphViz is the common tool used by programmers far and wide for this.  I quickly wrote up a few lines of DOT that represented some of the major components in my diagram (battery, emergency disconnect, fuse panel) and the +12V and ground wiring between them.  GraphViz, specifically neato, generated a diagram that was technically correct but looked horrible.  Even with ortho layout selected, lines were going underneath boxes, boxes were laid out nonsensically.  As I researched how to pin boxes to specific locations in the diagram and use shapes other than boxes, I discovered these were mostly kludges.  Yet another tool was requiring huge amounts of fiddling to get anything close to reasonable output.  Ugh!

Fine.  I’ll draw it if I have to.

I’m a Mac user.  Part of being a Mac user is discovering The Omni Group‘s fabulous collection of tools.  If I’m going to draw a wiring diagram, I’m going to do with OmniGraffle, a tool similar in concept to Visio.  In an evening, I drew both a label for the fuse panel and a wiring diagram for the trunk.  By grouping primitive shapes, I could create complex shapes that help with identifying components.  Adding magnet points to those shapes anchors the ends of lines (wires) that track as I moved the shapes and line path around.  For all that I wanted to avoid drawing, the process ended up being straightforward and effective.

Cobra Commander trunk labels made with OmniGraffle

I’ll post pictures of the installed labels when I get a chance.


HDDG22 Talk: ECUs and their sensors

Chris Gammell asked me to give a presentation at his Hardware Developers Didactic Galactic meetup in San Francisco.  I enjoy talking about things I work on so I didn’t hesitate to say yes.  I’m pretty sure he was expecting me to talk about Google’s Zaius server, an open-hardware POWER9 server design,  that I brought to a previous meetup.  Giving presentations about my day job takes some review and approvals, even for open designs.   Instead, I offered to talk about engine control, a subject I’m spending many nights on recently.

I’ve had to learn about engine tuning and EFI systems in particular to get Cobra Commander back on the track.  Since that knowledge seems to be spread in bits and pieces, I put together what I’ve learned into a presentation focusing on the sensors used and how they fit into the engine control systems.

Both slides and video are available.  Chris did a writeup for SupplyFrame’s blog as well.

MoTeC M48 Teardown

After working out the serial port pinout on my MoTeC M48 (see MoTeC PCI Cable for $20), I was left with one unidentified pin.  As the same DB9 is used with both a PCI cable and a SUU, I suspect this extra pin is how the ECU distinguishes between the two.  My M48 is already running the latest firmware so figuring out how to emulate an SUU isn’t a critical but not knowing how it works bothers me.

After trying a few things (tie pin to +5V, tie to GND), I decided I needed to open up the case and figure out where pin 12 on the ECU connector goes.

Based on the sharp, 90° bends in the traces, the whole board was definitely autorouted.  Unfortunately, the autorouter decided to route traces underneath chips which makes tracing them out much harder.  Also, the whole board is covered with conformal coating.  On the upside, a large number of components are used to protect the inputs and output from out-of-spec voltages which makes sense as incorrect wiring is bound to be a common problem in the field.

Zooming in on the main processor, I find a Motorola MC68332ACFC16 (datasheet) which is a nifty 68k variant with peripherals geared toward generating time-synchronized I/O.

Next to the main processor are a pair of Altera MAX 7000 PLDs (EPM7032LC44 datasheet).  I didn’t research enough to determine if these PLDs are used with the main processor’s time processor functions or with its expansion bus.

After a few false starts, I determined that pin 12, our extra serial pin, passes through protection circuitry before ending up at a GPIO on port E of the main processor.  There seems to be no alternate functions on that pin that make sense so next up will be reverse engineering the firmware.  Thankfully, MoTeC provides a full, unencrypted firmware as part of the M48 software on their website.  Time to buy a license for IDA.

MoTeC PCI Cable for $20

After getting the MoTeC M48 EMP software installed in Dosbox as described previously, I naively assumed the M48 just connected with normal RS232 serial.  I mean, there’s a DB9 hanging under the dash and the EMP software is looking for a serial port.  Of course it couldn’t be that simple.

Coiled up grey cable with DB9 connectors.
Motec PCI Cable: Pure Unobtainium

It seems that MoTeC decided to save a dollar in the M48 by not including RS232 level shifters.  Instead, it has 5V TTL serial on non-standard pins.  MoTeC helpfully offers cables, called PCI and CIM, to bridge the gap from an RS232 serial port to the ECU.  The downside? These cables are practically unobtainium (or at least cost $220 for one).  Knowing that it’s just TTL serial, why is this cable so expensive?  Can’t I just make one?

Official wiring diagram for M48 serial port

MoTeC offers a wiring diagram for the M48 but it doesn’t tell you what signals are on the DB9 pins.  At least they were clear that pin 6 is ground.  After some time with a voltmeter and oscilloscope, I figured out the following pinout:

DB9  Signal  ECU
1     +5V     24
2     N/C
3     N/C
4     N/C
5     TXD     11
6     GND     27
7     ???
8     N/C
9     RXD      9
Black cable with USB Type-A on one end and 0.1" header with 5V TTL serial on the other
Sparkfun FTDI Cable 5V: Functional equivalent to MoTeC PCI cable

So, what’s inside that $220 cable?  Probably a MAX232.  A simpler, modern alternative is a FTDI 5V cable from Sparkfun for $20.  Either make a connector to adapt to DB9 or wire directly into the ECU connector pins.

Now the M48 EMP software running in Dosbox is talking to the ECU over the FTDI cable.  I’ve successfully downloaded the current tune and some log data.  I even discovered that the air temperature sensor was showing a fault and was able to clear it by reseating the connector on the sensor.

Next up is inventorying the sensors and actuators installed and finding their datasheets.  The goal is to get a good understanding of the tune that is currently loaded and get ready for logging data during dyno runs.  That probably means building a board that can capture the telemetry the ECU spews out the serial port during normal operation.  Then I’ll start digging into the protocol used by EMP to configure the ECU.

Drag Race Cobra and MoTeC M48 ECU

1996 Cobra modified for drag racing parking in a garage

A few month ago, a coworker decided to buy a built drag racing car and was looking for someone to help crew.  I emphatically offered my services and we started working out what exactly he had bought.

The car started as a stock 1996 Ford Mustang Cobra with a 4.6L V8 engine (in Mystic, no less).  The engine has been rebuilt with a supercharger and nitrous.  The drivetrain has been completely replaced.  Yet, it still has power windows, doors, and steering.  The story from the seller was that the original owner had some outrageous tuning that needed the nitrous for cooling, not power.  The seller, the 2nd owner, had had a professional tuner redo the tuning from scratch to get something that would work as a daily driver, an 800HP daily driver.  It’s an interesting beast to say the least.

I started looking into the ECU to figure out what we had to work with.  It’s a MoTeC M48 which is a well-respected ECU from the 90s.  MoTeC still has the manual and software available for download but that’s where this story starts to get annoying.  The M48 software is a DOS-only application that talks to the ECU over serial.  Most of the applications will work under Windows 7 but not all.  I couldn’t get it to work with Windows 10 at all.  After a bit of research, others had had luck with Dosbox.

Screenshot of MoTeC M48 EMP software
MoTeC M48 Software

I use a MacBook Air so I tried Boxer which is just a nice UI over Dosbox.  Installing the MoTeC software is…odd.  The download from MoTeC is a Windows installer that dumps a DOS-based installer into C:\Motec.  The DOS-based installer is built around floppy images and is generally quirky.  I was able to successfully get everything installed in a Windows Vista VM I had handy and then copy C:\Motec into a Dosbox disk image.  If you are looking to use MoTeC M48 EMP on a Mac and want to save a bunch of time, email me.

One last note about using Boxer: the Boxer UI doesn’t expose any settings for serial ports.  Do not despair; Boxer’s Dosbox core has all the serial port support included.  You just need to add the following to the DOSBox Preference.conf stored inside the Boxer disk image bundle (right-click on the disk image in Finder and click Show Package Contents):

serial1=directserial realport:cu.usbserial-A100P1XB

Change cu.usbserial-A100P1XB to the serial port of your choice (one of /dev/cu.*) and you’ll be good to go.

A Hash Mismatch Made in BitBake

Dear Brother Yocto,

These ‘vardeps’ variables in Yocto are entirely magic and undocumented from my best reading of the Mega Manual. There is something like one paragraph that says “you might need to change / add these to make your recipe work in some cases”.  It would be nice to have an explanation of when to use vardeps and when to vardepsexclude.  I get that they are something to do with the sstate caching hash.

–Tired of ‘Taskhash Mismatch’

Dear Tired of ‘Taskhash Mismatch’,

Task hashes and their interactions with various caching mechanisms are indeed a confusing part of Yocto.  I’ve always found “ERROR: os-release-1.0-r0 do_compile: Taskhash mismatch 2399d53cb73a44279bfdeece180c4689 verses 9298efe49a8bf60fc1862ebc05568d32 for openbmc/meta/recipes-core/os-release/os-release.bb.do_compile” to be surprisingly unhelpful in resolving the problem.  Worse is when the problem manifests as a failure to run a task when something has changed instead of an error.

As you noted, the Yocto Developer Manual and Yocto Reference Manual are sparse on this topic.  A very careful reader will catch a reference in Section 2.3.5 of the Reference Manual to the Variable Flags section of the BitBake User Manual.  While that section does describe what these magic [vardeps] and [vardepsexclude] incantations do, I suspect you, the reader, are left wondering why they are in an entirely different manual.  Even though the signs are there for the astute observer (what command do you run to build an image?), many Yocto users don’t realize Yocto is an infrastructure built upon an independent tool called BitBake.

Yocto is just BitBake metadata

Per the BitBake User Manual, “BitBake is a generic task execution engine that allows shell and Python tasks to be run efficiently and in parallel while working within complex inter-task dependency constraints.”  Yocto is a set of tasks executed by BitBake to assemble Linux distributions.  This leads to a few non-trivial insights that I’ll spare you from discovering on your own:

  • Yocto recipes are really BitBake recipes (hence why they use a .bb extension) and follow the syntax described in Chapter 3: Syntax and Operators.
  • Many of the “magic” variables used in recipes (PN, bindir, etc) are defined in meta/conf/bitbake.conf.
  • Recipe lookup is actually controlled by BBPATH, not the order of layers in bblayers.conf.  BBPATH just happens to be modified in each layer’s layer.conf (see Locating and Parsing Recipes).

Tasks and Caching

As a Yocto developer, I tend to think in terms of recipes (because they usually correspond to a package) but BitBake’s fundamental unit of work is a task.  A task is a shell or Python script that gets executed to accomplish something, usually generating some form of output.  Often a task depends on the output from some other task and BitBake offers mechanisms for specifying these dependencies (see Dependencies for details).  BitBake will execute the tasks with as much parallelism as possible while maintaining dependency order.

For smaller projects, a straightforward execution of the tasks, where each task is always run on every invocation, might be OK.  When building a full Linux distribution like Yocto does, every invocation could take hours.  Caching to the rescue!

BitBake has multiple caching mechanisms but for this discussion only the setscene (or sstate) cache is relevant.  Setscene is a BitBake mechanism for caching build artifacts and reusing them in subsequent invocations.  Conceptually, recipes can provide a setscene version of a task by creating a new task of the same name with a _setscene suffix.  This version of the task attempts to provide the build artifacts and, if successful, the main version of the task is skipped.  The above link contains a more in-depth discussion including a lot implementation details necessary for anyone writing a setscene task.

Cache Invalidation and the Task Hash

As with any caching system, the setscene cache needs a way to decide when to invalid an entry.  In the case of setscene, it needs to know when the task would generate meaningfully different output.  That’s exactly what a task hash is supposed to answer.

Note: The Yocto and BitBake manuals tend to use the terms task hash, checksum, and signature interchangeably.  I find this especially confusing when thinking about do_fetch because each source file also has a checksum associated with it.  I’ll be using task hash throughout here but keep in mind that you might see the other terms when reading the documentation.

Since a task is just a shell or Python script, taking the hash of the script should tell us when it changes, right?  Mostly, but not quite.  The task’s dependencies can easily affect the output of this task.  OK, so what if we combine the hash of the script with the hashes of all the dependencies and use that to identify the task?  Excellent! That’s more or less what a task hash is: a combined hash of the task basehash (represents the task itself) and the task hash of each dependency.

Why did I use the term “basehash” above instead of script hash?  Any variable used in the script has the potential for changing the script.  Consider a case where the current date is used as part of a build stamp.  Should the task be considered meaningfully changed whenever the date changes?   The basehash is a more flexible hash that allows specific variables to be excluded (or included) in the hash calculation.  By excluding variables such as the current date and time, the task becomes more stable and the artifacts can be reused more often.

Is that what [vardeps] and [vardepsexclude] do then?

Yup.  [vardeps] and [vardepsexclude] are Variable Flags that add or remove variables from the task’s hash.  When you get a “Taskhash mismatch” error like the one from os-release at the start of this post, BitBake is telling you that the task hash changed unexpectedly; the script didn’t change but the task hash did.  In the os-release case, the original os-release.bb from Yocto declared that the do_compile task was dependent on the VERSION, VERSION_ID, and BUILD_ID variables so those were included in its task hash.  OpenBMC’s  os-release.bbappend uses an anonymous Python function to construct the VERSION, VERSION_ID, and BUILD_ID variables by querying the metadata’s git repo.  Subsequently, the task hash will change even though the script didn’t actually change.  The fix was simple: remove those variable as task hash dependencies.

–Brother Yocto