Containerization has become the industry standard for application packaging and deployment, and it is easy to understand why: isolation, reproducibility, ease of deployment, to name only a few of its advantages.
When authoring the `Dockerfile` of your application, you have two choices:
- Building the application outside the `Dockerfile` and then copying the artifacts as an image layer
- Building the application inside the `Dockerfile` by running `dotnet publish` for .NET applications
Building inside the `Dockerfile` has some advantages (deterministic builds, reproducibility on local dev machines, ...), but build speed isn't one of them. Fortunately, Docker offers some techniques to help us make the build faster, such as the build cache mechanism.
This feature is commonly used to optimize the dependencies' download/install step, so that it is only re-executed in subsequent builds if the dependencies have changed, which happens far less often than source code changes.
Leveraging the cache properly for dependencies is straightforward in some languages/frameworks, but it can be tricky to get right for medium to large .NET applications. The goal of this article is to explain why, and to describe a solution to this problem using the new open source .NET tool from Nimbleways, created specifically for this use case: dotnet-subset.
Let's step back and understand why we need Docker build caching in the first place. This is what a simple `Dockerfile` for a small .NET Web API looks like:
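Something along these lines (a minimal sketch; the image tags and the `webapp` project name are illustrative):

```dockerfile
# publish stage: full SDK image, builds and publishes the application
FROM mcr.microsoft.com/dotnet/sdk:6.0 AS publish
WORKDIR /source
COPY . .
RUN dotnet publish webapp.csproj --configuration Release --output /app

# final stage: smaller ASP.NET Core runtime image, imports the published artifacts
FROM mcr.microsoft.com/dotnet/aspnet:6.0 AS final
WORKDIR /app
COPY --from=publish /app .
ENTRYPOINT ["dotnet", "webapp.dll"]
```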
This `Dockerfile` contains two build stages:
- a `publish` stage based on the full SDK image that builds the application
- a `final` stage based on the smaller ASP.NET Core Runtime image that imports the artifacts from the previous stage and defines the entry point
The first `docker build` executes all the instructions as there is no cache yet. On the second run, `COPY . .` will compare the checksums of the copied files with the ones from the previous build. If they match, the subsequent instructions in the same stage may benefit from the cache, provided the instructions themselves didn't change. If the checksums don't match, the cache is invalidated and all subsequent instructions are executed. You can learn more about the caching behavior from the Docker documentation.
In our case, that means that if no file in the project has changed, `dotnet publish`'s output from the previous run will be reused and our `docker build` will be extremely fast. But what happens if we change a C# source file? Yes, you guessed right: the checksum changed, therefore `dotnet publish` will be re-executed, and that's fine because we do want our code changes to be included in the new image. However, `dotnet publish` also performs an implicit restore. Do all the dependencies need to be redownloaded/reinstalled when only C# source files were changed? Probably not.
That is why the official documentation provides a better `Dockerfile` for .NET applications.
The official recommended solution
Below is the `Dockerfile` of a simple ASP.NET Core project, taken from the official documentation:
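It looks like this (image tags and project name follow the aspnetapp sample; check the linked documentation for the exact version):

```dockerfile
FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build
WORKDIR /source

# copy csproj and restore as distinct layers
COPY *.sln .
COPY aspnetapp/*.csproj ./aspnetapp/
RUN dotnet restore

# copy everything else and build app
COPY aspnetapp/. ./aspnetapp/
WORKDIR /source/aspnetapp
RUN dotnet publish -c release -o /app --no-restore

# final stage/image
FROM mcr.microsoft.com/dotnet/aspnet:6.0
WORKDIR /app
COPY --from=build /app ./
ENTRYPOINT ["dotnet", "aspnetapp.dll"]
```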
Also from the same documentation:
> In the preceding Dockerfile, the `*.csproj` files are copied and restored as distinct layers. When the `docker build` command builds an image, it uses a built-in cache. If the `*.csproj` files haven't changed since the `docker build` command last ran, the `dotnet restore` command doesn't need to run again. Instead, the built-in cache for the corresponding `dotnet restore` layer is reused.
Let's ignore the `sln` file copy step, because it is not required and was done mainly for convenience.
This `Dockerfile` solves our previous problem by:
- copying the project descriptor (an MSBuild file with a `.csproj` extension) first, then running `dotnet restore`
- copying the remaining files, then building and publishing the application
Why copy only the `csproj`? It is where the NuGet dependencies are defined, and that is all `dotnet restore` needs to do its job.
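For reference, a stripped-down `csproj` with a single NuGet dependency looks roughly like this (the package name and versions are just examples):

```xml
<Project Sdk="Microsoft.NET.Sdk.Web">
  <PropertyGroup>
    <TargetFramework>net6.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <!-- This is what dotnet restore reads to resolve NuGet packages -->
    <PackageReference Include="Newtonsoft.Json" Version="13.0.1" />
  </ItemGroup>
</Project>
```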
However, this solution suffers from some shortcomings:
Real-life .NET applications are often composed of multiple projects (i.e. multiple `csproj` files). The .NET team provides a `Dockerfile` example for this scenario:
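The relevant part looks like this (adapted from the complexapp sample in the dotnet/dotnet-docker repository; project names follow that sample, and its test stages are omitted for brevity):

```dockerfile
FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build
WORKDIR /source

# copy the csproj of every project, preserving the folder structure
COPY complexapp/*.csproj complexapp/
COPY libfoo/*.csproj libfoo/
COPY libbar/*.csproj libbar/
RUN dotnet restore complexapp/complexapp.csproj

# copy everything else and publish
COPY complexapp/ complexapp/
COPY libfoo/ libfoo/
COPY libbar/ libbar/
RUN dotnet publish complexapp/complexapp.csproj -c release -o /app --no-restore

FROM mcr.microsoft.com/dotnet/aspnet:6.0
WORKDIR /app
COPY --from=build /app .
ENTRYPOINT ["dotnet", "complexapp.dll"]
```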
The suggested solution is to copy all the `csproj` files manually while preserving the original folder structure. (In case you are wondering why globbing wasn't used to copy all the project files in one line while preserving the folder structure: Docker doesn't support it, as illustrated below.)
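To make the limitation concrete: a glob like the one below does match all the project files, but Docker flattens the matches into the destination directory instead of recreating their folders, so MSBuild would not find the projects where it expects them:

```dockerfile
# All matched csproj files land directly in /source,
# losing their complexapp/, libfoo/ and libbar/ subfolders
COPY */*.csproj ./
```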
It works in most cases, but it requires that, for every project dependency change in `complexapp` or any of its transitive project dependencies, the `Dockerfile` be updated. For sizeable applications, you may end up with a huge `Dockerfile` like this one.
Note that if you are restoring a project that has a missing project dependency, for example `libfoo` from our `complexapp`, the restore will just skip it and won't fail:
```
1>_GetAllRestoreProjectPathItems: Skipping project "/root/project/libfoo/libfoo.csproj" because it was not found.
```
There are a couple of files that can alter the `dotnet restore` behavior and thus should be copied along with the `csproj` files. The `nuget.config` file contains parameters such as the HTTP proxy, trusted package signers and remote package repositories (you can find the full list here). These parameters can be mandatory for a successful restore.
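A minimal example (the feed name and URL are made up):

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <!-- If restore must go through a private feed, losing this file
         during the image build means a failed or incorrect restore -->
    <add key="company-feed" value="https://nuget.example.com/v3/index.json" />
  </packageSources>
</configuration>
```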
In our case, there are two caveats to be aware of:
- On case-sensitive file systems, like in most Linux distributions, `dotnet` will check for these three casings in this order and use the first match: `nuget.config`, `NuGet.config`, then `NuGet.Config`
- NuGet reads its configuration from multiple `nuget.config` files. It will look for the computer and user configs, and also for config files present in all the folders between the project base and its drive root. Values in all these files are combined following a specific order to define the final settings to be applied.
You know the drill now: all these files should be copied too for `dotnet restore` to behave as expected.
This is a lesser-known feature of .NET: you can create lock files for your NuGet dependencies. As to why and when they can be useful, check the official documentation.
NuGet looks for the first file in the project base folder that matches in this order (as defined in NuGet's source code):
- The value of the property `NuGetLockFilePath` defined in the `csproj` file, if it is not empty
- The file `packages.<project_name>.lock.json` if it exists, where `<project_name>` is the `csproj` file name without its extension and with spaces replaced by underscores
- The file `packages.lock.json` if it exists
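As a reminder of how these files come to exist in the first place, lock files are opt-in; one way to generate one is the `--use-lock-file` option of `dotnet restore`:

```bash
# Generates packages.lock.json next to the csproj;
# subsequent restores verify packages against it
dotnet restore complexapp/complexapp.csproj --use-lock-file
```

Since the generated lock file lives next to its `csproj`, the manual-copy approach must remember to ship it to the restore layer as well.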
Custom logic for defining dependencies
Last but not least, NuGet dependencies can technically be defined in any file, not only the `csproj`. MSBuild, the build engine and also the project file format, provides a way to include an MSBuild file into another. This is heavily used by the platform to abstract away all the build logic (that is the reason why `csproj` files are so minimalistic), and it can also be useful for developers to centralize project settings and/or define common NuGet dependencies like Roslyn Analyzers.
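A common instance of this is `Directory.Build.props`, which MSBuild imports automatically from the nearest ancestor directory of each project; a file like the following (the analyzer package is chosen purely as an illustration) adds a NuGet dependency to every project under it without touching any `csproj`:

```xml
<Project>
  <ItemGroup>
    <!-- Applied to all projects below this directory -->
    <PackageReference Include="StyleCop.Analyzers" Version="1.1.118" PrivateAssets="all" />
  </ItemGroup>
</Project>
```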
So if you don't want to miss a dependency during `dotnet restore`, you may need to copy all the files directly or transitively imported by any of the application's projects.
To sum it up, optimizing the image build for `dotnet restore` is not as simple as copying the `.csproj` files first. There are a few edge cases that should be addressed and, most importantly, maintained over the project's lifetime …
… or we can use `dotnet-subset` to handle it all for us 😃
The better solution
At Nimbleways, we don't settle for the "good enough" solution; we challenge ourselves to do things the right way. We weren't satisfied with the existing solutions for doing `dotnet restore` caching in Docker properly, so we created `dotnet-subset` to achieve that.
What is dotnet-subset?
dotnet-subset is an open source .NET tool whose goal is to extract a subset of files from a root directory and copy it to a target directory. This subset is defined by the tool's arguments.
Let's see it in action on the complexapp example, by running:

```
dotnet subset restore /source/complexapp/complexapp.csproj --root-directory /source/ --output /tmp/restore_subset/
```
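With the sample's layout, the output directory would contain something like this (illustrative; the exact content depends on which lock and config files exist in the repository):

```
/tmp/restore_subset/
├── complexapp/complexapp.csproj
├── libfoo/libfoo.csproj
└── libbar/libbar.csproj
```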
Breaking down the command line:
- `dotnet subset`: dotnet-subset is invoked as a sub-command of the dotnet CLI
- `restore`: the subset algorithm to use; `restore` is currently the only supported algorithm
- `/source/complexapp/complexapp.csproj`: the project or solution that needs to be restored
- `--root-directory`: the directory from which the files will be copied
- `--output`: the directory where the files needed for the restore will be copied, preserving the original structure
The output directory `/tmp/restore_subset` contains only the files that can impact the restore of `complexapp` and all the projects it depends on directly and transitively:
- the `csproj` files of `complexapp` and of its direct and transitive project dependencies
- MSBuild files located under the root directory that are imported by the copied `csproj` files
- package lock files associated with the copied `csproj` files
- `nuget.config` files in the copied `csproj` files' directories and in all parent directories up to the root directory
Now that we have this superpower, how can we use it efficiently inside our `Dockerfile`?
Docker cache + dotnet-subset = 🚀
As explained before, `dotnet-subset` needs the whole application source code as input, which means we will need a `COPY . .` before running it. This would invalidate the cache for the subsequent instructions in the same stage whenever any file changes; that is why we call `dotnet-subset` in its own stage, then import its output into the `build` stage before running `dotnet restore`. The `Dockerfile` now becomes:
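A sketch of what that looks like (the image tags and project paths are illustrative; see the dotnet-subset README for a complete example):

```dockerfile
# stage 1: compute the restore subset from the full source tree
FROM mcr.microsoft.com/dotnet/sdk:6.0 AS prepare-restore-files
ENV PATH="${PATH}:/root/.dotnet/tools"
RUN dotnet tool install --global dotnet-subset
WORKDIR /source
COPY . .
RUN dotnet subset restore complexapp/complexapp.csproj --root-directory /source --output restore_subset/

# stage 2: restore from the subset (cache-friendly), then build from the full sources
FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build
WORKDIR /source
COPY --from=prepare-restore-files /source/restore_subset .
RUN dotnet restore complexapp/complexapp.csproj
COPY . .
RUN dotnet publish complexapp/complexapp.csproj -c release -o /app --no-restore

FROM mcr.microsoft.com/dotnet/aspnet:6.0 AS final
WORKDIR /app
COPY --from=build /app .
ENTRYPOINT ["dotnet", "complexapp.dll"]
```

The trick is that `COPY --from=prepare-restore-files` is cached based on the checksum of the subset itself, so `dotnet restore` is only re-executed when a file that actually affects the restore has changed.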
Tada! Fewer `COPY` instructions and more confidence in the reliability of our `Dockerfile`.
Do you remember the huge `Dockerfile` I mentioned earlier? Let's appreciate how much neater it became thanks to `dotnet-subset` (PR link):
Identifying all the files impacting `dotnet restore` is hard; maintaining that list is even harder. Miss one, and if you are lucky you will merely degrade the Docker cache quality or break the Docker build; if you are not, the application may crash at runtime with an obscure error.
dotnet-subset helps you write optimized *and* reliable Dockerfiles without the maintenance cost.
The tool is still at an early stage, awaiting your feedback to steer it in the right direction and make it better!