How I broke Gatsby JS conditional page build and learned to debug the tool chain

I built my AWS CodeBuild pipeline around a new Gatsby feature called “Conditional Page Builds”, and it did not work as expected in the build environment.

Starting the build process again with no change in the source code and with identical copies of the .cache and public folder generated this output:

A second call to GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES=true gatsby build --log-pages immediately following the first one shows the expected output:

This proves that the conditional page build works and that no page contains data which changes on every build.

Comparing the two outputs, I found these problems:

  1. Cache invalidation by changed plugins
    info One or more of your plugins have changed since the last time you ran Gatsby. As a precaution, we're deleting your site's cache to ensure there's no stale data.
  2. Rewriting compilation hashes
  3. Regenerating Images
    success Generating image thumbnails

To open an issue on GitHub I had to create a simple setup which reproduces the problems of the build environment.

Issues 1 and 2 (posted on GitHub) were my fault: I created a temporary file with a name starting with gatsby-*, which conflicted with a part of Gatsby that creates a hash of the content of all files whose names start with this prefix.
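To make the effect more tangible, here is a minimal sketch of the idea. It is not Gatsby's actual code: the helper hashFilesWithPrefix, the file name gatsby-build.tmp and the use of MD5 are assumptions chosen only for illustration.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper (not Gatsby's code): hash the contents of every
// regular file in `dir` whose name starts with `prefix`.
function hashFilesWithPrefix(dir, prefix) {
  const hash = crypto.createHash("md5");
  fs.readdirSync(dir)
    .filter((name) => name.startsWith(prefix))
    .filter((name) => fs.statSync(path.join(dir, name)).isFile())
    .sort()
    .forEach((name) => hash.update(fs.readFileSync(path.join(dir, name))));
  return hash.digest("hex");
}

// Demo in a throwaway directory that stands in for the site root.
const siteDir = fs.mkdtempSync(path.join(os.tmpdir(), "site-"));
fs.writeFileSync(path.join(siteDir, "gatsby-config.js"), "module.exports = {};\n");
const before = hashFilesWithPrefix(siteDir, "gatsby-");

// A temporary file that happens to match the prefix and whose content
// differs on every build (here: a timestamp).
fs.writeFileSync(path.join(siteDir, "gatsby-build.tmp"), String(Date.now()));
const after = hashFilesWithPrefix(siteDir, "gatsby-");

console.log(before === after); // false
```

Any file matching the prefix ends up in the hash, so a temporary file with changing content makes every build look like a plugin change.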

Creating a simpler setup

  1. create a gatsby site based on the default starter template
  2. upgrade gatsby and the tool chain to the latest stable versions

On the first run of GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES=true yarn build --log-pages the result was as expected, since there was no .cache and no public folder yet.

The build process creates the compilation hashes [24], the static HTML [26] and the image thumbnails [28].

Running the same build command a second time gives this output:

As expected, the following steps were skipped:

  • Rewriting compilation hashes
  • Building static HTML 0/0
  • Generating Images

Simulating the build process of the build pipeline

  1. create a temporary directory
  2. clone the git repository into the temporary directory
  3. restore the build cache
  4. run the build command
  5. save the build cache

A simple bash script, codepipeline.sh, would look like this (do not use it in production, as many edge cases are missing!):

Run this script twice and you will see that on the second run the image generation still happens:

I documented my environment and posted an issue on GitHub after not finding any other issue with a similar problem.

Since I am not very patient, I decided to look at the problem myself.

How to debug Gatsby JS in VS Code

The first thing I do when testing out a new framework is to set up the JavaScript debugger in VS Code. Gatsby provides very good documentation, which makes this a breeze.

Add this to your existing VS Code launch.json to debug conditional page building:
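A configuration along the following lines should work. Treat it as a sketch: the program path assumes a local Gatsby install on macOS or Linux (on Windows the path to the Gatsby binary differs), and only the entry name Gatsby build Conditional, the environment variable and the --log-pages flag are taken from this article.

```jsonc
{
  "version": "0.2.0",
  "configurations": [
    {
      // The name shown in the debugger pulldown.
      "name": "Gatsby build Conditional",
      "type": "node",
      "request": "launch",
      // Assumption: Gatsby is installed locally in node_modules.
      "program": "${workspaceFolder}/node_modules/.bin/gatsby",
      "args": ["build", "--log-pages"],
      "env": {
        "GATSBY_EXPERIMENTAL_PAGE_BUILD_ON_DATA_CHANGES": "true"
      },
      "console": "integratedTerminal"
    }
  ]
}
```

If launch.json already exists, only the object inside configurations needs to be appended to the existing array.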

Activate searching in the node_modules folder

Open the VS Code configuration .vscode/settings.json and add this:
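One way to do this, assuming node_modules is hidden from search either by the default search.exclude rule or by a .gitignore entry, is to override both in the workspace settings:

```jsonc
{
  // Stop excluding node_modules from the search results.
  "search.exclude": {
    "**/node_modules": false
  },
  // Do not apply .gitignore rules when searching.
  "search.useIgnoreFiles": false
}
```

Searches get noticeably slower with these settings, so it may be worth reverting them once the debugging session is over.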

Dig into the problem

The goal is to find the code which compares values from the cache with the current environment and decides to rerun the image generation.

  1. search for Generating image thumbnails in the node_modules folder. The result lists CHANGELOG.md and node_modules/gatsby-plugin-sharp/utils.js:38; the second file contains the code which prints the success message
  2. set a breakpoint on this line [38]
  3. remove the .cache and public folder: rm -fr .cache/ ./public/
  4. start the VS Code Debugger and choose Gatsby build Conditional from the pulldown
  5. the debugger stops at line 38
  6. look at the CALL STACK in the left column and click on the second line, which is the script that called this function

We see that this part of gatsby-plugin-sharp is triggered by the event CREATE_JOB_V2.

If this were a normal function call, simply walking up the CALL STACK would be enough, but this is an event which could have been raised anywhere in the code base.

Cancel the debugger and search for CREATE_JOB_V2 inside the node_modules folder. The search result contains 6 files, of which node_modules/gatsby/dist/redux/actions/public.js looks most promising.

Looking at the 8 lines of this function:

  • line 2 gets the state from the redux store which was populated from the .cache folder
  • line 3 creates a new Job object
  • line 4 gets the content digest hash from the new job
  • line 8 looks up in the redux store whether the job has been done before
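The flow described in these bullet points can be sketched roughly as follows. This is a simplified stand-in, not the code from public.js: the names state, completedJobs, digestOf and createJob as well as the job contents are hypothetical.

```javascript
const crypto = require("crypto");

// Stand-in for the redux state that would be restored from the .cache folder.
const state = { completedJobs: new Map() };

// Hypothetical digest helper; Gatsby's own helper may work differently.
function digestOf(value) {
  return crypto.createHash("md5").update(JSON.stringify(value)).digest("hex");
}

let executed = 0; // counts how often the expensive work really runs

function createJob(job, plugin) {
  const currentState = state;                    // get the state
  const internalJob = { ...job, plugin };        // create the job object
  const contentDigest = digestOf(internalJob);   // get its content digest
  if (currentState.completedJobs.has(contentDigest)) {
    // The job was done before: reuse the stored result.
    return currentState.completedJobs.get(contentDigest);
  }
  executed += 1; // image processing would happen here
  const result = { contentDigest };
  currentState.completedJobs.set(contentDigest, result);
  return result;
}

const job = { name: "IMAGE_PROCESSING", args: { width: 200 } };
const plugin = { name: "gatsby-plugin-sharp" };
createJob(job, plugin);
createJob(job, plugin);
console.log(executed); // 1
```

The second call is skipped only because its digest is identical to the first one, which is why the inputs of the digest matter so much.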
  1. set a breakpoint on const internalJob = createInternalJob(job, plugin);
  2. remove the .cache and public folder: rm -fr .cache/ ./public/
  3. run the debugger and choose Gatsby build Conditional from the pulldown
  4. when the debugger pauses we can inspect the job attributes

The left column shows us the following attributes with absolute paths /User/xxxxxxxx/...:
job.inputPaths, job.outputDir, plugin.pluginFilepath, plugin.resolve

Step into the function createInternalJob to check how the digest is created.

Scroll down until you see the place where the contentDigest is created:

We found the Problem

As long as the Job and Plugin objects contain absolute paths, the hash created by the Gatsby Job Queue Manager will never match the values in the cache if those were created in a source folder with a different absolute path.
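A small experiment shows the effect. The job shape and the digestOf helper are assumptions for illustration (a plain MD5 over the JSON string, not Gatsby's own digest function), and the /tmp/build-... folders stand for the changing directories of a build pipeline.

```javascript
const crypto = require("crypto");
const path = require("path");

// Hypothetical digest helper: MD5 over the JSON representation.
function digestOf(value) {
  return crypto.createHash("md5").update(JSON.stringify(value)).digest("hex");
}

// Hypothetical job, loosely modelled on the attributes seen in the debugger.
function makeJob(projectRoot) {
  return {
    name: "IMAGE_PROCESSING",
    inputPaths: [path.join(projectRoot, "src/images/logo.png")],
    outputDir: path.join(projectRoot, "public/static"),
  };
}

const firstRun = digestOf(makeJob("/tmp/build-1111"));
const secondRun = digestOf(makeJob("/tmp/build-2222"));
const sameFolder = digestOf(makeJob("/tmp/build-1111"));

console.log(firstRun === secondRun);  // false: different checkout folder
console.log(firstRun === sameFolder); // true: identical absolute paths
```

The files are identical in both runs, yet the digest changes because the folder name is part of the hashed data.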

I also created an issue on GitHub, where you can follow the next steps.

Quick Fix

Always copy the source folder to the same path which has been used when the .cache folder was created.

Long term Solutions to the Problem

  • convert the object to a JSON string and replace the path to the root folder with a stable string or hash (still only a quick fix)
  • do not allow absolute paths to be stored in any Gatsby data structure (Job objects, plugin objects)
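The first option could look something like this. It is only a sketch of the idea, not a patch for Gatsby: stableDigest and the placeholder <PROJECT_ROOT> are made-up names, and the simple string replacement assumes POSIX-style paths (backslashes in Windows paths are escaped by JSON.stringify and would need extra care).

```javascript
const crypto = require("crypto");

// Made-up placeholder that replaces the project root before hashing.
const ROOT_TOKEN = "<PROJECT_ROOT>";

// Hypothetical helper: serialise the object, swap every occurrence of the
// project root for a stable token, then hash the result.
function stableDigest(value, projectRoot) {
  const json = JSON.stringify(value).split(projectRoot).join(ROOT_TOKEN);
  return crypto.createHash("md5").update(json).digest("hex");
}

const jobA = {
  inputPaths: ["/tmp/build-1111/src/images/logo.png"],
  outputDir: "/tmp/build-1111/public/static",
};
const jobB = {
  inputPaths: ["/tmp/build-2222/src/images/logo.png"],
  outputDir: "/tmp/build-2222/public/static",
};

const digestA = stableDigest(jobA, "/tmp/build-1111");
const digestB = stableDigest(jobB, "/tmp/build-2222");
console.log(digestA === digestB); // true
```

With the root folder normalised, the same job produces the same digest no matter where the repository was checked out.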

I hope this gives some people a first impression of how to track down a problem themselves, and gives the maintainers of a project better documentation for fixing architectural problems.

Originally published at https://www.heissenberger.at.
