All about those caches

Dear Brother Yocto,
When building an image, I want to include information about the metadata repository that was used by the build.  The os-release recipe seems to be made for this purpose.  I added a .bbappend to my layer with the following:

def run_git(d, cmd):
  try:
    oeroot = d.getVar('COREBASE', True)
    return bb.process.run("git --work-tree %s --git-dir %s/.git %s"
        % (oeroot, oeroot, cmd))[0].strip('\n')
  except:
    pass

python() {
  version_id = run_git(d, 'describe --dirty --long')
  if version_id:
    d.setVar('VERSION_ID', version_id)
    versionList = version_id.split('-')
    version = versionList[0] + "-" + versionList[1]
    d.setVar('VERSION', version)

  build_id = run_git(d, 'describe --abbrev=0')
  if build_id:
    d.setVar('BUILD_ID', build_id)
}

OS_RELEASE_FIELDS_append = " BUILD_ID"

The first build using this .bbappend works fine: /etc/os-release is generated with VERSION_ID, VERSION, and BUILD_ID set to the ‘git describe’ output as intended. Subsequent builds always skip this recipe’s tasks even after a commit is made to the metadata repo. What’s going on?
-Flummoxed by the Cache

Dear Flummoxed by the Cache,

As you’ve already surmised, you’ve run afoul of BitBake/Yocto’s caching systems. Note the plural. I’ve discovered three and I’m far from convinced that I’ve found them all. Understanding how they interact can be maddening and subtle as your example shows.

Parse Cache

Broadly speaking, BitBake operates in two phases: recipe parsing and task execution. In a clean Yocto Poky repository, there are 1000s of recipes. Reading these recipes from disk, parsing them, and performing post-processing takes a modest amount of time so naturally the results (a per-recipe set of tasks, their dependency information, and a bunch of other data) are cached. I haven’t researched the exact invalidation criteria used for this cache but suffice it to say that modifying the file or any included files is sufficient. If the cache is valid on the next BitBake invocation, the already parsed data is loaded from the cache and handed off to the execution phase.

Stamps

In the execution phase, each task is run in dependency order, often with much parallelism. Two separate but related mechanisms are used to speed up this phase: stamps and sstate (or shared state) cache. Keep in mind that a single recipe will generate multiple tasks (fetch, unpack, patch, configure, compile, install, package, etc). Each of these tasks assume that they operate on a common work directory (created under tmp/work/….) in dependency order. Stamps are files that record that a specific task has been completed and under what conditions (environment, variables, etc). This allows subsequent BitBake invocations to pick up where previous invocations left off (for example running ‘bitbake -c fetch’ and then ‘bitbake -c unpack’ won’t repeat the fetch).

Shared State cache (sstate)

You’re probably wondering what the sstate cache is for then. When I attempt to build core-image-minimal, one of the tasks is generating the root filesystem. To do so, that task needs the packages produced by all the included recipes. Worst case, everything is built from scratch as happens with a fresh build directory. Obviously that is slow so caching comes into play again. If the stamps discussed previously are available, the tasks can be restarted from the latest, valid stamp.

That helps with repeated, local builds but often Yocto is used in teams where changes submitted upstream will invalidate a bunch of tasks on the next merge. Since Yocto is almost hermetic, the packages generated by submitter’s builds will usually match the packages I generate locally as long as the recipe and environment are the same. The sstate cache maps the task hash to the output of that task. When a recipe includes a _setscene() suffixed version of a task, the sstate cache is used to retrieve the output instead of rerunning the task. This combined with sharing of a sstate cache allows for sharing of build results between users in a team. Read the Shared State Cache section in the Yocto Manual for details on how this works and how to setup a shared cache.

Back to the original problem

So, what is going wrong in the writer’s os-release recipe? If you read the Anonymous Python Function section in the BitBake Manual carefully, they are run at parsing time. That means the result of the function is saved in the parsing cache. Unless the cache is invalidated (usually by modifying the recipe or an imported class), the cached value of BUILD_ID will be used even if the stamps and sstate cache are invalidated. To get BUILD_ID to be re-evaluated on each run, the parse cache needs to be disabled for this recipe. That’s accomplished by setting BB_DONT_CACHE = “1” in the recipe.

Note that the stamps and sstate cache are still enabled. There are some subtle details about making sure BitBake is aware that BUILD_ID is indirectly referenced by the do_compile task so that it gets included in the task hash (see how OS_RELEASE_FIELDS is used in os-release.bb). That ensures that the task hash changes whenever the SHA1 of the OEROOT git repo HEAD changes which means the caches will be invalidated then as well.

Confused by all the caches yet?

-Brother Yocto