Conflicted about sysroots

Dear Brother Yocto,
I have two images: production-image.bb and manufacturing-image.bb. The production-image includes a recipe, production-systemd-scripts, and the manufacturing-image includes a different recipe, manufacturing-systemd-scripts. The two recipes have different versions of a systemd service, startup-ui.service. One starts the production app and the other does not. If I start with a clean tmp directory they both build fine, but if I build one first and then build the other, I get an error.


kc8apf@blog$ bitbake production-image

Summary: There was 1 WARNING message shown.
(it works)
kc8apf@blog$ bitbake manufacturing-image

ERROR: manufacturing-systemd-scripts-0.0.1+gitAUTOINC+60423a2c5f-r0 do_populate_sysroot: The recipe manufacturing-systemd-scripts is trying to install files into a shared area when those files already exist. Those files and their manifest location are:
/home/user/fsl-release-bsp/x11_ull_build/tmp/sysroots/imx6ull14x14evk/lib/systemd/system/startup-ui.service
Matched in b’manifest-imx6ull14x14evk-pika.populate_sysroot’
Please verify which recipe should provide the above files.
The build has stopped as continuing in this scenario WILL break things, if not now, possibly in the future (we’ve seen builds fail several months later). If the system knew how to recover from this automatically it would however there are several different scenarios which can result in this and we don’t know which one this is. It may be you have switched providers of something like virtual/kernel (e.g. from linux-yocto to linux-yocto-dev), in that case you need to execute the clean task for both recipes and it will resolve this error. It may be you changed DISTRO_FEATURES from systemd to udev or vice versa. Cleaning those recipes should again resolve this error however switching DISTRO_FEATURES on an existing build directory is not supported, you should really clean out tmp and rebuild (reusing sstate should be safe). It could be the overlapping files detected are harmless in which case adding them to SSTATE_DUPWHITELIST may be the correct solution. It could also be your build is including two different conflicting versions of things (e.g. bluez 4 and bluez 5 and the correct solution for that would be to resolve the conflict. If in doubt, please ask on the mailing list, sharing the error and filelist above.
ERROR: manufacturing-systemd-scripts-0.0.1+gitAUTOINC+60423a2c5f-r0 do_populate_sysroot: If the above message is too much, the simpler version is you’re advised to wipe out tmp and rebuild (reusing sstate is fine). That will likely fix things in most (but not all) cases.
ERROR: manufacturing-systemd-scripts-0.0.1+gitAUTOINC+60423a2c5f-r0 do_populate_sysroot: Function failed: sstate_task_postfunc

The recipe production-systemd-scripts is not in the image I asked to build. Why doesn’t bitbake remove the output of unused recipes from tmp/sysroots before I get this error?

-Conflicted about sysroots

 

Dear Conflicted,
When I first started with Yocto, I assumed that a recipe is built in isolation and should have no side-effects on other packages.  Of course, that isn’t strictly true as a recipe has access to all of the files installed by its dependencies (both build-time and run-time).  If I were designing a new build system, I’d probably implement the build steps as:
  1. Create an empty directory
  2. Install all dependent packages into that directory
  3. Enter this directory as a “chroot”
  4. Build the recipe
  5. Package the results

The Problem

On the surface, this seems great: each package only has access to the files provided by its dependency chain (note that transitive dependencies are still a problem). There is a serious problem lurking under the surface though.  Assume I have two recipes: A and B.  A and B both depend on a common library C but they require different, incompatible major versions.  Let’s say A depends on C v1.0.0 while B requires C v2.0.0.  The builds of A and B will succeed because each build environment includes only its own dependency chain.  When the resulting packages are installed, however, one of the packages will be broken.  Why?  Because the two major versions of C use the same library and header names.  When C v2.0.0 is installed after C v1.0.0, B will work just fine but A will either fail to find some symbols or, worse, start up and then crash due to calling functions with the wrong arguments.

“Wait!” I hear you screaming, “this is already handled by library major versions using different library names.”  That’s true: most major libraries have run into this exact problem and worked around it by effectively creating a whole new project for the new major version.  That’s only a solution if you know about the hazard, though.  What about smaller libraries whose developers haven’t encountered this situation before?  They can wreak havoc on an OS build.  This type of conflict happens often enough that every OS build system I’ve encountered emits some kind of warning or error when it happens or could happen.

Enter the sysroot

How does Yocto deal with this situation? As the error messages you encountered indicate, it has something to do with sysroots.  Before I dive into details, please note that Yocto’s sysroot behavior changed in 2.5 (Sumo).  I’ll be referencing documentation from 2.5 because it has much better descriptions of the build process.  When 2.5 behavior differs from prior releases, I’ll indicate this and explain the differences.

Section 4.3 of the Yocto Project Overview and Concepts Manual provides a thorough introduction to Yocto’s various build stages, their inputs, and their outputs. While it does discuss sysroots, it doesn’t provide a succinct description of what they are used for.  For that, we need to look at the Yocto Reference Manual’s staging.bbclass section. The rough concept is that sysroots are the mechanism used to share files between recipes.

As described in Section 4.3.5.3 of the Yocto Project Overview and Concepts Manual, a recipe’s do_install task is responsible for installing the recipe’s build products into the recipe’s own destination directory (${D}).  Why doesn’t it install directly into the sysroot? Two reasons.

First, the sysroot already contains the files installed by every dependency.  While this might work to build a single image, isolating the recipe’s files allows the recipe’s output to be cached, bundled into a package (deb, rpm, opkg, etc), and used in multiple images that may have different collections of recipes installed.

Second, instead of simply creating a single package containing every file installed by the recipe, Yocto splits the installed files into multiple packages such as -dev and -dbg to allow finer-grained control over image contents. That way a production image only includes executables while the matching debug image can include symbol files and the developer image has all the headers and static libraries. In addition to the packages used to construct images, the do_populate_sysroot task copies the files and directories specified by the SYSROOT_DIRS* variables to a staging area that is made available at build time to other recipes that depend on this recipe.  Note that this means a recipe that depends on another recipe, A, may have a different set of files available than if you installed an SDK that includes the A-dev package.  Put another way, sysroots are how files are shared between recipes during a build of an image, while -dev packages are used to construct the final contents of an SDK.
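To make that concrete, here is a minimal sketch (recipe and file names are hypothetical, and it assumes the unit file is listed in SRC_URI) of the pattern your two recipes are presumably using: do_install puts the unit file into ${D}, and do_populate_sysroot then stages it automatically because /lib (${nonarch_base_libdir}) is in the default SYSROOT_DIRS on recent releases:

do_install() {
  install -d ${D}${systemd_unitdir}/system
  install -m 0644 ${WORKDIR}/startup-ui.service ${D}${systemd_unitdir}/system/
}

If a recipe needs to adjust what gets shared, it can append to SYSROOT_DIRS or keep a subtree out of the shared area entirely; the exclusion variable was called SYSROOT_DIRS_BLACKLIST around this timeframe (renamed in later releases), so check staging.bbclass for your release:

# stage an extra directory that is not shared between recipes by default (path hypothetical)
SYSROOT_DIRS += "/opt/mytool/share"
# keep systemd units out of the shared area even though /lib is staged by default
SYSROOT_DIRS_BLACKLIST += "${systemd_unitdir}/system"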

But that doesn’t seem to cause a problem….

As mentioned in The Problem, sysroot problems are rarely encountered during the build or install phases of a recipe.  There are two places after building where sysroots can run into problems: during staging (do_populate_sysroot) and during setup as a dependency of another recipe (do_prepare_recipe_sysroot).

During staging, the files matching SYSROOT_DIRS* are copied into a sysroot. If a dependency has already staged a file with the same path, it would be overwritten.  Because this is likely to be a Very Bad Thing™, Yocto raises the error you saw.  I would expect this to be uncommon except for one big caveat: before Yocto 2.5, a single, global sysroot was used for all recipes.  Instead of requiring file paths to be unique within a dependency chain, globally unique file paths were necessary.  As illustrated in the figure below, Yocto 2.5 generates a sysroot for each recipe based solely on its dependencies, which avoids path collisions between recipes with independent dependency chains.

Diagram showing a recipe receives its own working directory under $TMPDIR/work/$PN/$PV-$PR. Inside is a build directory ($BPN-$PV), a destination (image), and two sysroots (recipe-sysroot, recipe-sysroot-native)
Recipe work directory layout from the Yocto Project Overview and Concepts Manual, Section 4.3.5.3
The second place where collisions can happen is when constructing a recipe’s initial sysroot based on its dependency chain. As described in The Problem, the individual dependencies might build without issue but still result in a conflict when they are installed together.  Going back to that example, an image D that depends on both A and B will result in two different versions of C being installed that overwrite each other.  Note that this only happens in Yocto 2.5 and later. In earlier versions of Yocto, these collisions would be encountered during staging into the global sysroot.

Wrapping up

So, what’s going on in your build?  You described two recipes installing a file to the same path but those recipes are never used together in a single image.  This should work in general and it will work in Yocto 2.5 and later.  For earlier versions, the global sysroot will cause a conflict.

A common workaround is to use a single image recipe with custom IMAGE_FEATURES to select which recipe is included. Any modification to IMAGE_FEATURES or DISTRO_FEATURES is a signal that a new build directory should be used.  The downside is that the production and manufacturing images will be built independently and may not be representative of each other.
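A hedged sketch of what that might look like in a single image recipe (recipe and feature names are hypothetical; custom features are usually declared via FEATURE_PACKAGES):

# my-image.bb (hypothetical): one image recipe, two behaviors
FEATURE_PACKAGES_manufacturing = "manufacturing-systemd-scripts"
# install the production scripts only when the manufacturing feature is not selected
IMAGE_INSTALL_append = " ${@bb.utils.contains('IMAGE_FEATURES', 'manufacturing', '', 'production-systemd-scripts', d)}"

The manufacturing build then sets IMAGE_FEATURES += "manufacturing" in its own build directory’s local.conf.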

Another approach is to use separate systemd unit names for production-systemd-scripts and manufacturing-systemd-scripts. Assuming your main concern is controlling which services are started during boot, the names of the units shouldn’t matter.  In fact, a recipe named production-systemd-scripts points to a bigger, conceptual issue: systemd unit files should be coupled with the service that the unit controls.  That way, the systemd units started reflect the services available rather than a desired behavior.  Behaviors should be separate recipes that install a single systemd unit whose presence triggers that behavior.  If you have a systemd unit that automatically starts manufacturing tests, put that in an appropriately-named recipe separate from the tests themselves.  That way, the tests can be installed on top of a production image without triggering the manufacturing workflow.  Even better, create your own pseudo-targets for these behaviors and then use WantedBy and Conflicts to allow switching between modes by activating or deactivating the pseudo-targets.  That gets you a single image that can be used for both manufacturing test and production.
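As a hedged illustration of that last idea (every name below is hypothetical, and the pseudo-targets themselves would be small unit files installed by the behavior recipes), the unit installed by a manufacturing-behavior recipe might look something like:

[Unit]
Description=Manufacturing test workflow
Conflicts=production.target

[Service]
ExecStart=/usr/bin/run-manufacturing-tests

[Install]
WantedBy=manufacturing.target

Enabling the unit ties it to the manufacturing.target pseudo-target, and the Conflicts line means activating one mode deactivates the other, so switching modes at runtime is roughly a matter of starting manufacturing.target or production.target on the same image.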

— Brother Yocto

 

Snakes in a Sysroot

Dear Brother Yocto,
Here’s a weird bug I found on my setup: pythonnative seems to be using
python3.4.3 whereas the recipe specified 2.7.

*What I tested:*
I found the code in phosphor-settings-manager, which inherits pythonnative,
seems to be working per python 3, so added the following line into a native
python function:

python do_check_version () {
  import sys
  bb.warn(sys.version)
}
addtask check_version after do_configure before do_compile

WARNING: phosphor-settings-manager-1.0-r1 do_check_version: 3.4.3 (default,
Nov 17 2016, 01:08:31)

This is with openbmc git tip, building MACHINE=zaius

*What the recipe should be using is seemingly 2.7:*
$ cat ../import-layers/yocto-poky/meta/classes/pythonnative.bbclass

inherit python-dir
[...]

$ cat ../import-layers/yocto-poky/meta/classes/python-dir.bbclass

PYTHON_BASEVERSION = "2.7"
PYTHON_ABI = ""
PYTHON_DIR = "python${PYTHON_BASEVERSION}"
PYTHON_PN = "python"
PYTHON_SITEPACKAGES_DIR = "${libdir}/${PYTHON_DIR}/site-packages"

*Why this is bad:*
well, python2.7 and python3 have a ton of differences. A bug I discovered
is due to usage of filter(), which returns different objects between
Python2/3.

Before diving into the layers of bitbake recipes to root cause what caused
this, I want to know if you have ideas why this went wrong.

-Snakes in a Sysroot

Dear Snakes in a Sysroot,
As you’ve already discovered, the version of Python used varies depending on context within a Yocto build. Recipes, by default, are built to run on the target machine and distro. For most recipes, that is what we want: we’re building a recipe so it can be included in the rootfs. In some cases, though, a tool needs to be run on the machine running bitbake. An obvious example is a compiler that generates binaries for the target machine. Of course, that compiler likely has dependencies on libraries that must be built for the machine that runs the compiler. This leads to three categories of recipes: native, cross, and target. Each of these categories places its output into different directories (sysroots), both to avoid file collisions and to allow BitBake to select which tools are available to a recipe by altering the PATH environment variable.

native
Binaries and libraries generated by these recipes are executed on the machine running bitbake (host machine). Any binary that generates other binaries (compiler, linker, etc) will also generate binaries for the host machine.

cross
Similar to native, the binaries and libraries generated by these recipes are executed on the host machine. The difference is that any binary that generates other binaries will generate binaries for the target machine.

target
Binaries and libraries generated by these recipes are executed on the target machine only.

For a recipe like Python, it is common to have a native and a target recipe (python and python-native) that use a common include file to specify the version, most of the build process, etc. That’s what you see in yocto-poky/meta/classes/python-dir.bbclass.
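As a hedged illustration of how one codebase ends up in more than one category, a recipe can often offer a native variant through BBCLASSEXTEND, and a consumer depends on that variant explicitly (recipe names hypothetical):

# in example-tool.bb: also build these sources for the machine running bitbake
BBCLASSEXTEND = "native"

# in a consumer recipe: use the host-built copy while building
DEPENDS += "example-tool-native"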

So, why isn’t python-native being used to run the python function in your recipe? A key insight is that you are using ‘addtask’ to tell BitBake to run this python function as a task. That means it is run in the context of BitBake itself, not as a command executed from a recipe’s task. The sysroots described above don’t apply since a new python instance is never launched. Instead, the function runs inside the existing BitBake python process. This is also how in-recipe python methods can alter BitBake’s variables and tasks (see BitBake User Manual Section 3.5.2, specifically the note that the ‘bb’ module is already imported and ‘d’ is available as a global variable). Since bitbake is a script executed directly from the command line, its shebang line tells us it will be run under python3.

If you want to run a python script as part of the recipe’s build process (perhaps a tool written in python), you’d execute that script from within a task:

do_check_version() {
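  # 'python' here is whatever the recipe's build environment provides, not BitBake's interpreter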
  python -c '
import sys
print(sys.version)
'
}
addtask check_version after do_configure before do_compile

Since python is being executed within a task, the version of python in the recipe’s sysroot will be used. Note that the output will not be displayed in BitBake’s console output. Instead, it will be written to the task’s log file within the recipe’s work directory. If you need to do this, I highly suggest writing the script as a separate file that is shipped alongside the recipe rather than trying to cram it all inside the recipe.
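A sketch of that approach, with a hypothetical script name shipped next to the recipe:

SRC_URI += "file://check_version.py"

do_check_version() {
  # stdout/stderr end up in ${WORKDIR}/temp/log.do_check_version
  python ${WORKDIR}/check_version.py
}
addtask check_version after do_configure before do_compile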

All about those caches

Dear Brother Yocto,
When building an image, I want to include information about the metadata repository that was used by the build.  The os-release recipe seems to be made for this purpose.  I added a .bbappend to my layer with the following:

def run_git(d, cmd):
  try:
    oeroot = d.getVar('COREBASE', True)
    return bb.process.run("git --work-tree %s --git-dir %s/.git %s"
        % (oeroot, oeroot, cmd))[0].strip('\n')
  except:
    pass

python() {
  version_id = run_git(d, 'describe --dirty --long')
  if version_id:
    d.setVar('VERSION_ID', version_id)
    versionList = version_id.split('-')
    version = versionList[0] + "-" + versionList[1]
    d.setVar('VERSION', version)

  build_id = run_git(d, 'describe --abbrev=0')
  if build_id:
    d.setVar('BUILD_ID', build_id)
}

OS_RELEASE_FIELDS_append = " BUILD_ID"

The first build using this .bbappend works fine: /etc/os-release is generated with VERSION_ID, VERSION, and BUILD_ID set to the ‘git describe’ output as intended. Subsequent builds always skip this recipe’s tasks even after a commit is made to the metadata repo. What’s going on?
-Flummoxed by the Cache

Dear Flummoxed by the Cache,

As you’ve already surmised, you’ve run afoul of BitBake/Yocto’s caching systems. Note the plural. I’ve discovered three and I’m far from convinced that I’ve found them all. Understanding how they interact can be maddening and subtle as your example shows.

Parse Cache

Broadly speaking, BitBake operates in two phases: recipe parsing and task execution. In a clean Yocto Poky repository, there are thousands of recipes. Reading these recipes from disk, parsing them, and performing post-processing takes a non-trivial amount of time, so naturally the results (a per-recipe set of tasks, their dependency information, and a bunch of other data) are cached. I haven’t researched the exact invalidation criteria used for this cache, but suffice it to say that modifying the recipe or any file it includes is sufficient. If the cache is still valid on the next BitBake invocation, the already-parsed data is loaded from the cache and handed off to the execution phase.

Stamps

In the execution phase, each task is run in dependency order, often with a great deal of parallelism. Two separate but related mechanisms are used to speed up this phase: stamps and the sstate (or shared state) cache. Keep in mind that a single recipe generates multiple tasks (fetch, unpack, patch, configure, compile, install, package, etc). Each of these tasks assumes that it operates on a common work directory (created under tmp/work/….) in dependency order. Stamps are files that record that a specific task has been completed and under what conditions (environment, variables, etc). This allows subsequent BitBake invocations to pick up where previous invocations left off (for example, running ‘bitbake -c fetch’ and then ‘bitbake -c unpack’ won’t repeat the fetch).

Shared State cache (sstate)

You’re probably wondering what the sstate cache is for then. When I attempt to build core-image-minimal, one of the tasks is generating the root filesystem. To do so, that task needs the packages produced by all the included recipes. Worst case, everything is built from scratch as happens with a fresh build directory. Obviously that is slow so caching comes into play again. If the stamps discussed previously are available, the tasks can be restarted from the latest, valid stamp.

That helps with repeated, local builds, but Yocto is often used in teams where changes submitted upstream will invalidate a bunch of tasks on the next merge. Since Yocto is almost hermetic, the packages generated by a submitter’s build will usually match the packages I generate locally as long as the recipe and environment are the same. The sstate cache maps a task hash to the output of that task. When a recipe includes a _setscene-suffixed version of a task, the sstate cache is used to retrieve the output instead of rerunning the task. This, combined with a shared sstate cache, allows build results to be reused across a team. Read the Shared State Cache section in the Yocto Manual for details on how this works and how to set up a shared cache.
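Setting up the shared side of that is usually just a matter of pointing SSTATE_MIRRORS at the shared location in local.conf or a distro config; something like this (server URL hypothetical):

SSTATE_MIRRORS ?= "file://.* http://sstate.example.com/PATH;downloadfilename=PATH"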

Back to the original problem

So, what is going wrong in the writer’s os-release recipe? If you read the Anonymous Python Functions section in the BitBake Manual carefully, you’ll see that they are run at parse time. That means the result of the function is saved in the parse cache. Unless that cache is invalidated (usually by modifying the recipe or an inherited class), the cached value of BUILD_ID will be used even if the stamps and sstate cache are invalidated. To get BUILD_ID re-evaluated on each run, the parse cache needs to be disabled for this recipe. That’s accomplished by setting BB_DONT_CACHE = "1" in the recipe.
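In .bbappend terms, that is a one-liner:

# re-run parsing (including the anonymous python function) on every invocation
BB_DONT_CACHE = "1"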

Note that the stamps and sstate cache are still enabled. There are some subtle details about making sure BitBake knows that BUILD_ID is indirectly referenced by the do_compile task so that it gets included in the task hash (see how OS_RELEASE_FIELDS is used in os-release.bb). That ensures the task hash changes whenever the SHA1 of the OEROOT git repo’s HEAD changes, which means those caches are invalidated then as well.
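For reference, the wiring in os-release.bb is roughly this (paraphrased; check the copy in your release):

# every variable named in OS_RELEASE_FIELDS feeds the do_compile task hash
do_compile[vardeps] += "${OS_RELEASE_FIELDS}"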

Confused by all the caches yet?

-Brother Yocto

A Hash Mismatch Made in BitBake

Dear Brother Yocto,

These ‘vardeps’ variables in Yocto are entirely magic and undocumented from my best reading of the Mega Manual. There is something like one paragraph that says “you might need to change / add these to make your recipe work in some cases”.  It would be nice to have an explanation of when to use vardeps and when to vardepsexclude.  I get that they are something to do with the sstate caching hash.

–Tired of ‘Taskhash Mismatch’

Dear Tired of ‘Taskhash Mismatch’,

Task hashes and their interactions with various caching mechanisms are indeed a confusing part of Yocto.  I’ve always found “ERROR: os-release-1.0-r0 do_compile: Taskhash mismatch 2399d53cb73a44279bfdeece180c4689 verses 9298efe49a8bf60fc1862ebc05568d32 for openbmc/meta/recipes-core/os-release/os-release.bb.do_compile” to be surprisingly unhelpful in resolving the problem.  Worse is when the problem manifests as a failure to run a task when something has changed instead of an error.

As you noted, the Yocto Developer Manual and Yocto Reference Manual are sparse on this topic.  A very careful reader will catch a reference in Section 2.3.5 of the Reference Manual to the Variable Flags section of the BitBake User Manual.  While that section does describe what these magic [vardeps] and [vardepsexclude] incantations do, I suspect you, the reader, are left wondering why they are in an entirely different manual.  Even though the signs are there for the astute observer (what command do you run to build an image?), many Yocto users don’t realize Yocto is an infrastructure built upon an independent tool called BitBake.

Yocto is just BitBake metadata

Per the BitBake User Manual, “BitBake is a generic task execution engine that allows shell and Python tasks to be run efficiently and in parallel while working within complex inter-task dependency constraints.”  Yocto is a set of tasks executed by BitBake to assemble Linux distributions.  This leads to a few non-trivial insights that I’ll spare you from discovering on your own:

  • Yocto recipes are really BitBake recipes (hence why they use a .bb extension) and follow the syntax described in Chapter 3: Syntax and Operators.
  • Many of the “magic” variables used in recipes (PN, bindir, etc) are defined in meta/conf/bitbake.conf.
  • Recipe lookup is actually controlled by BBPATH, not the order of layers in bblayers.conf.  BBPATH just happens to be modified in each layer’s layer.conf (see Locating and Parsing Recipes).

Tasks and Caching

As a Yocto developer, I tend to think in terms of recipes (because they usually correspond to a package) but BitBake’s fundamental unit of work is a task.  A task is a shell or Python script that gets executed to accomplish something, usually generating some form of output.  Often a task depends on the output from some other task and BitBake offers mechanisms for specifying these dependencies (see Dependencies for details).  BitBake will execute the tasks with as much parallelism as possible while maintaining dependency order.
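A few hedged examples of those mechanisms, using the syntax from the BitBake manual (recipe and task names hypothetical):

# build-time dependency on another recipe; by default this becomes a dependency
# on that recipe's do_populate_sysroot task
DEPENDS += "libfoo"

# an explicit inter-task dependency expressed with a variable flag
do_configure[depends] += "libfoo:do_populate_sysroot"

# ordering between tasks within the same recipe
addtask check_version after do_configure before do_compile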

For smaller projects, a straightforward execution of the tasks, where each task is always run on every invocation, might be OK.  When building a full Linux distribution like Yocto does, every invocation could take hours.  Caching to the rescue!

BitBake has multiple caching mechanisms, but for this discussion only the setscene (or sstate) cache is relevant.  Setscene is a BitBake mechanism for caching build artifacts and reusing them in subsequent invocations.  Conceptually, recipes can provide a setscene version of a task by creating a new task of the same name with a _setscene suffix.  This version of the task attempts to provide the build artifacts and, if successful, the main version of the task is skipped.  The link above contains a more in-depth discussion, including a lot of implementation details necessary for anyone writing a setscene task.
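For a concrete picture, this is roughly the pattern a class uses to provide the setscene variant of a task (paraphrased from deploy.bbclass):

python do_deploy_setscene () {
  # try to satisfy do_deploy's output from the shared state cache
  sstate_setscene(d)
}
addtask do_deploy_setscene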

Cache Invalidation and the Task Hash

As with any caching system, the setscene cache needs a way to decide when to invalidate an entry.  In the case of setscene, it needs to know when the task would generate meaningfully different output.  That’s exactly the question a task hash is supposed to answer.

Note: The Yocto and BitBake manuals tend to use the terms task hash, checksum, and signature interchangeably.  I find this especially confusing when thinking about do_fetch because each source file also has a checksum associated with it.  I’ll be using task hash throughout here but keep in mind that you might see the other terms when reading the documentation.

Since a task is just a shell or Python script, taking the hash of the script should tell us when it changes, right?  Mostly, but not quite.  The task’s dependencies can easily affect the output of this task.  OK, so what if we combine the hash of the script with the hashes of all the dependencies and use that to identify the task?  Excellent! That’s more or less what a task hash is: a combined hash of the task basehash (represents the task itself) and the task hash of each dependency.
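In rough pseudo-Python (a conceptual sketch only; the real computation lives in bitbake/lib/bb/siggen.py and includes more inputs), the idea is:

import hashlib

def task_hash(basehash, dep_taskhashes):
  # combine the task's own basehash with the task hashes of its dependencies
  data = basehash + "".join(sorted(dep_taskhashes))
  return hashlib.md5(data.encode("utf-8")).hexdigest()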

Why did I use the term “basehash” above instead of script hash?  Any variable used in the script has the potential for changing the script.  Consider a case where the current date is used as part of a build stamp.  Should the task be considered meaningfully changed whenever the date changes?   The basehash is a more flexible hash that allows specific variables to be excluded (or included) in the hash calculation.  By excluding variables such as the current date and time, the task becomes more stable and the artifacts can be reused more often.

Is that what [vardeps] and [vardepsexclude] do then?

Yup.  [vardeps] and [vardepsexclude] are Variable Flags that add or remove variables from a task’s hash.  When you get a “Taskhash mismatch” error like the one from os-release at the start of this answer, BitBake is telling you that the task hash changed unexpectedly: the script didn’t change but the task hash did.  In the os-release case, the original os-release.bb from Yocto declared that the do_compile task depends on the VERSION, VERSION_ID, and BUILD_ID variables, so those were included in its task hash.  OpenBMC’s os-release.bbappend uses an anonymous Python function to construct the VERSION, VERSION_ID, and BUILD_ID variables by querying the metadata’s git repo.  As a result, the task hash changes even though the script didn’t.  The fix was simple: remove those variables as task hash dependencies.
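In recipe terms, the fix amounts to something like this sketch (the exact change in OpenBMC may differ):

# keep the parse-time-generated variables out of the do_compile task hash
do_compile[vardepsexclude] += "VERSION VERSION_ID BUILD_ID"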

–Brother Yocto

Where do I get advice on Yocto?

Over the past three years, I’ve worked on a variety of projects that use embedded Linux distros built with the Yocto Project. While I quickly came to appreciate Yocto’s versatility during the first project, learning the ins and outs of recipes, distros, and machines became a regular frustration that followed me through each project. Not only is Yocto huge, but the design patterns and tools change almost daily.

The Yocto Project Reference Manual and BitBake User Manual contain a wealth of vital information about the core concepts and well-established tools, but traditional documentation only gets you so far. As my understanding of Yocto grew, mostly by reading vast amounts of code while debugging problems, I became the local go-to expert on my teams. Instead of hacking away at the next task, I was answering questions whose answers required understanding how and why things were designed the way they were, information available only in the code itself. Yet there was nowhere to persist these conversations.

That’s how I came to start this series I’m calling “Dear Brother Yocto.” Originally I just wanted a place to share tips and tricks as I learn them. By the time I got the site up and running, I had seen enough requests for advice on the Yocto mailing list that having a write-in advice column style just made sense. Some posts will be from readers, some will be from my own questions. Regardless, I’ll dig into the code, find a solution, and explain what the heck is going on.

If you have a question about Yocto, please share it with me via Twitter (@kc8apf) or email (kc8apf at kc8apf dot net).