Improve docs on Gradle User Home caching

- Describe the limitations/properties of the GitHub Actions cache
- Document the algorithm for generating a cache key, and the way that cache entries are matched
- Describe in more detail how entries are de-duplicated
- Explain how cache entries can be optimized in Job pipelines

Fixes #831
Fixes #608
This commit is contained in:
daz 2023-08-17 14:29:33 -06:00
parent f9b4995b32
commit 193108951e
No known key found for this signature in database
2 changed files with 143 additions and 63 deletions

204
README.md
View file

@ -188,9 +188,9 @@ Use the `gradle-executable` input to execute using a specific Gradle installatio
This mechanism can also be used to target a Gradle wrapper script that is located in a non-default location.
## Caching
## Caching build state between Jobs
By default, this action aims to cache any and all reusable state that may be speed up a subsequent build invocation.
The `gradle-build-action` will use the GitHub Actions cache to save and restore reusable state that may be speed up a subsequent build invocation. This includes most content that is downloaded from the internet as part of a build, as well as expensive to create content like compiled build scripts, transformed Jar files, etc.
The state that is cached includes:
- Any distributions downloaded to satisfy a `gradle-version` parameter ;
@ -198,11 +198,35 @@ The state that is cached includes:
To reduce the space required for caching, this action makes a best effort to reduce duplication in cache entries.
### Disabling caching
Caching is enabled by default. You can disable caching for the action as follows:
```yaml
cache-disabled: true
```
### Using the cache read-only
By default, the `gradle-build-action` will only write to the cache from Jobs on the default (`main`/`master`) branch.
Jobs on other branches will read entries from the cache but will not write updated entries.
See [Optimizing cache effectiveness](#optimizing-cache-effectiveness) for a more detailed explanation.
In some circumstances it makes sense to change this default, and to configure a workflow Job to read existing cache entries but not to write changes back.
You can configure read-only caching for the `gradle-build-action` as follows:
```yaml
cache-read-only: true
```
You can also configure read-only caching only for certain branches:
```yaml
# Only write to the cache for builds on the 'main' and 'release' branches. (Default is 'main' only.)
# Builds on other branches will only read existing entries from the cache.
cache-read-only: ${{ github.ref != 'refs/heads/main' && github.ref != 'refs/heads/release' }}
```
### Incompatibility with other caching mechanisms
When using `gradle-build-action` we recommend that you avoid using other mechanisms to save and restore the Gradle User Home.
@ -213,62 +237,6 @@ Specifically:
Using either of these mechanisms may interfere with the caching provided by this action. If you choose to use a different mechanism to save and restore the Gradle User Home, you should disable the caching provided by this action, as described above.
### Cache keys
Distributions downloaded to satisfy a `gradle-version` parameter are stored outside of Gradle User Home and cached separately. The cache key is unique to the downloaded distribution and will not change over time.
The state of the Gradle User Home is highly dependent on the Gradle execution, so the cache key is composed of the current commit hash and the GitHub actions job id.
As such, the cache key is likely to change on each subsequent run of GitHub actions.
This allows the most recent state to always be available in the GitHub actions cache.
To reduce duplication between cache entries, certain artifacts are cached independently based on their identity.
Artifacts that are cached independently include downloaded dependencies, downloaded wrapper distributions and generated Gradle API jars.
For example, this means that all jobs executing a particular version of the Gradle wrapper will share common entries for wrapper distributions and for generated Gradle API jars.
### Using the caches read-only
By default, the `gradle-build-action` will only write to the cache from Jobs on the default (`main`/`master`) branch.
Jobs on other branches will read entries from the cache but will not write updated entries.
See [Optimizing cache effectiveness](#optimizing-cache-effectiveness) for a more detailed explanation.
In some circumstances it makes sense to change this default, and to configure a workflow Job to read existing cache entries but not to write changes back.
You can configure read-only caching for the `gradle-build-action` as follows:
```yaml
# Only write to the cache for builds on the 'main' and 'release' branches. (Default is 'main' only.)
# Builds on other branches will only read existing entries from the cache.
cache-read-only: ${{ github.ref != 'refs/heads/main' && github.ref != 'refs/heads/release' }}
```
### Stopping the Gradle daemon
By default, the action will stop all running Gradle daemons in the post-action step, prior to saving the Gradle User Home state.
This allows for any Gradle User Home cleanup to occur, and avoid file-locking issues on Windows.
If caching is unavailable or the cache is in read-only mode, the daemon will not be stopped and will continue running after the job is completed.
### Gradle User Home cache tuning
As well as any wrapper distributions, the action will attempt to save and restore the `caches` and `notifications` directories from Gradle User Home.
The contents to be cached can be fine tuned by including and excluding certain paths with Gradle User Home.
```yaml
# Cache downloaded JDKs in addition to the default directories.
gradle-home-cache-includes: |
caches
notifications
jdks
# Exclude the local build-cache and keyrings from the directories cached.
gradle-home-cache-excludes: |
caches/build-cache-1
caches/keyrings
```
You can specify any number of fixed paths or patterns to include or exclude.
File pattern support is documented at https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#patterns-to-match-file-paths.
### Cache debugging and analysis
Gradle User Home state will be restored from the cache during the first `gradle-build-action` step for any workflow job.
@ -285,6 +253,88 @@ env:
Note that this setting will also prevent certain cache operations from running in parallel, further assisting with debugging.
#### Stopping the Gradle daemon
By default, the action will stop all running Gradle daemons in the post-action step, prior to saving the Gradle User Home state.
This allows for any Gradle User Home cleanup to occur, and avoid file-locking issues on Windows.
If caching is unavailable or the cache is in read-only mode, the daemon will not be stopped and will continue running after the job is completed.
### How Gradle User Home caching works
#### Properties of the GitHub Actions cache
The GitHub Actions cache has some properties that present problems for efficient caching of the Gradle User Home.
- Immutable entries: once a cache entry is written for a key, it cannot be overwritten or changed.
- Branch scope: cache entries written for a Git branch are not visible from actions running against different branches. Entries written for the default branch are visible to all. https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#restrictions-for-accessing-a-cache
- Restore keys: if no exact match is found, a set of partial keys can be provided that will match by cache key prefix. https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows#matching-a-cache-key
Each of these properties has influenced the design and implementation of the caching in `gradle-build-action`, as described below.
#### Which content is cached
Using experiments and observations, we have attempted to identify which Gradle User Home content is worth saving and restoring between build invocations. We considered both the respective size of the content and the impact this content has on build times. As well as the obvious candidates like downloaded dependencies, we saw that compiled build scripts, transformed Jar files and other content can also have a significant impact.
In the end, we opted to save and restore as much content as is practical, including:
- `caches/<version>/generated-gradle-jars`: These files are generated on first use of a particular Gradle version, and are expensive to recreate
- `caches/<version>/kotlin-dsl` and `caches/<version>/scripts`: These are the compiled build scripts. The Kotlin ones in particular can benefit from caching.
- `caches/modules-2`: The downloaded dependencies
- `caches/transforms-3`: The results of artifact transforms
- `caches/jars-9`: Jar files that have been processed/instrumented by Gradle
- `caches/build-cache-1`: The local build cache
In certain cases a particular section of Gradle User Home will be too large to make caching effective. In these cases, particular subdirectories can be excluded from caching. See [Exclude content from Gradle User Home cache](#exclude-content-from-gradle-user-home-cache).
#### Cache keys
The actual content of the Gradle User Home after a build is the result of many factors, including:
- Core Gradle build files (`settngs.gradle[.kts]`, `build.gradle[.kts]`, `gradle.properties`)
- Associated Gradle configuration files (`gradle-wrapper.properties`, `dependencies.toml`, etc)
- The entire content of `buildSrc` or any included builds that provide plugins.
- The entire content of the repository, in the case of the local build cache.
- The actual build command that was invoked, including system properties and environment variables.
For this reason, it's very difficult to create a cache key that will deterministically map to a saved Gradle User Home state. So instead of trying to reliably hash all of these inputs to generate a cache key, the Gradle User Home cache key is based on the currently executing Job and the current commit hash for the repository.
The Gradle User Home cache key is composed of:
- The current operating system (`RUNNER_OS`)
- The workflow name and Job ID
- A hash of the Job matrix parameters
- The git SHA for the latest commit
Specifically, the cache key is: `${cache-protocol}-gradle|${runner-os}|${workflow-name}-${job-id}[${hash-of-job-matrix}]-${git-sha}`
As such, the cache key is likely to change on each subsequent run of GitHub actions.
This allows the most recent state to always be available in the GitHub actions cache.
#### Finding a matching cache entry
In most cases, no exact match will exist for the cache key. Instead, the Gradle User Home will be restored for the closest matching cache entry, using a set of "restore keys". The entries will be matched with the following precedence:
- An exact match on OS, workflow, job, matrix and Git SHA
- The most recent entry saved for the same OS, workflow, job and matrix values
- The most recent entry saved for the same OS, workflow and job
- The most recent entry saved for the same OS
Due to branch scoping of cache entries, the above match will be first performed for entries from the same branch, and then for the default ('main') branch.
After the Job is complete, the current Gradle User Home state will be collected and written as a new cache entry with the complete cache key. Old entries will be expunged from the GitHub Actions cache on a least-recently-used basis.
Note that while effective, this mechanism is not inherently efficient. It requires the entire Gradle User Home directory to be stored separately for each branch, for every OS+Job+Matrix combination. In addition, a new cache entry to be written on every GitHub Actions run.
This inefficiency is effectively mitigated by [Deduplication of Gradle User Home cache entries](#deduplication-of-gradle-user-home-cache-entries), and can be further optimized for a workflow using the techniques described in [Optimizing cache effectiveness](#optimizing-cache-effectiveness).
#### Deduplication of Gradle User Home cache entries
To reduce duplication between cache entries, certain artifacts in Gradle User Home are extracted and cached independently based on their identity. This allows each Gradle User Home cache entry to be relatively small, sharing common elements between them without duplication.
Artifacts that are cached independently include:
- Downloaded dependencies
- Downloaded wrapper distributions
- Generated Gradle API jars
- Downloaded Java Toolchains
For example, this means that all jobs executing a particular version of the Gradle wrapper will share a single common entry for this wrapper distribution and one for each of the generated Gradle API jars.
### Optimizing cache effectiveness
Cache storage space for GitHub actions is limited, and writing new cache entries can trigger the deletion of existing entries.
@ -292,7 +342,20 @@ Eviction of shared cache entries can reduce cache effectiveness, slowing down yo
There are a number of actions you can take if your cache use is less effective due to entry eviction.
#### Select branches that should write to the cache
At the end of a Job, the `gradle-build-action` will write a summary of the Gradle builds executed, together with a detailed report of the cache entries that were read and written during the Job. This report can provide valuable insights that may help to determine the right way to optimize the cache usage for your workflow.
#### Select which jobs should write to the cache
Consider a workflow that first runs a Job "compile-and-unit-test" to compile the code and run some basic unit tests, which is followed by a matrix of parallel "integration-test" jobs that each run a set of integration tests for the repository. Each "integration test" Job requires all of the dependencies required by "compile-and-unit-test", and possibly one or 2 additional dependencies.
By default, a new cache entry will be written on completion of each integration test job. If no additional dependencies were downloaded then this cache entry will share the "dependencies" entry with the "compile-and-unit-test" job, but if a single dependency was downloaded then an entire new "dependencies" entry would be written. (The `gradle-build-action` does not _yet_ support a layered cache that could do this more efficiently). If each of these "integration-test" entries with their different "dependencies" entries is too large, then it could result in other important entries being evicted from the GitHub Actions cache.
There are some techniques that can be used to avoid/mitigate this issue:
- Configure the "integration-test" jobs with `cache-read-only: true`, meaning that the Job will use the entry written by the "compile-and-unit-test" job. This will avoid the overhead of cache entries for each of these jobs, at the expense of re-downloading any additional dependencies required by "integration-test".
- Add an additional step to the "compile-and-unit-test" job which downloads all dependencies required by the integration-test jobs but does not execute the tests. This will allow the "dependencies" entry for "compile-and-unit-test" to be shared among all cache entries for "integration-test". The resulting "integration-test" entries should be much smaller, reducing the potential for eviction.
- Combine the above 2 techniques, so that no cache entry is written by "integration-test" jobs, but all required dependencies are already present from the restored "compile-and-unit-test" entry.
#### Select which branches should write to the cache
GitHub cache entries are not shared between builds on different branches.
This means that each PR branch will have it's own Gradle User Home cache, and will not benefit from cache entries written by other PR branches.
@ -311,11 +374,28 @@ Similarly, you could use `cache-read-only` for certain jobs in the workflow, and
#### Exclude content from Gradle User Home cache
As well as any wrapper distributions, the action will attempt to save and restore the `caches` and `notifications` directories from Gradle User Home.
Each build is different, and some builds produce more Gradle User Home content than others.
[Cache debugging ](#cache-debugging-and-analysis) can provide insight into which cache entries are the largest,
and you can selectively [exclude content using `gradle-home-cache-exclude`](#gradle-user-home-cache-tuning).
and the contents to be cached can be fine tuned by including and excluding certain paths within Gradle User Home.
#### Removing unused files from Gradle User Home before saving to cache
```yaml
# Cache downloaded JDKs in addition to the default directories.
gradle-home-cache-includes: |
caches
notifications
jdks
# Exclude the local build-cache and keyrings from the directories cached.
gradle-home-cache-excludes: |
caches/build-cache-1
caches/keyrings
```
You can specify any number of fixed paths or patterns to include or exclude.
File pattern support is documented at https://docs.github.com/en/actions/learn-github-actions/workflow-syntax-for-github-actions#patterns-to-match-file-paths.
#### Remove unused files from Gradle User Home before saving to cache
The Gradle User Home directory has a tendency to grow over time. When you switch to a new Gradle wrapper version or upgrade a dependency version
the old files are not automatically and immediately removed. While this can make sense in a local environment, in a GitHub Actions environment
@ -325,7 +405,7 @@ In order to avoid this situation, the `gradle-build-action` supports the `gradle
When enabled, this feature will attempt to delete any files in the Gradle User Home that were not used by Gradle during the GitHub Actions workflow,
prior to saving the Gradle User Home to the GitHub Actions cache.
Gradle Home cache cleanup is disabled by default. You can enable this feature for the action as follows:
Gradle Home cache cleanup is considered experimental and is disabled by default. You can enable this feature for the action as follows:
```yaml
gradle-home-cache-cleanup: true
```

View file

@ -66,7 +66,7 @@ export class CacheKey {
* - The cache protocol version
* - The name of the cache
* - The runner operating system
* - The name of the Job being executed
* - The name of the workflow and Job being executed
* - The matrix values for the Job being executed (job context)
* - The SHA of the commit being executed
*