October 24, 2016

The Copy-on-write technique and how Docker storage works

Two important concepts lie at the heart of understanding the various storage options supported by the Docker engine:

The Copy-on-write (CoW) technique
Stackable filesystem layers

Copy-on-write (CoW) is a simple pointer-based technique for efficient storage when you have one or more copies of the same data. Let's say you need to copy directory X into directory Y. Here is what a simple copy operation would do:

Employing a copy-on-write technique, the same operation would do something like this:

So this time, instead of copying all the bytes over to directory Y (which costs both time and disk space), we only get a bunch of pointers in directory Y - one pointer for every file. So there is just one copy of the file contents on disk and two pointers to it - one from directory X and one from directory Y. This way the copy completes very fast and occupies negligible space in directory Y.

All read operations on the files in directory Y open the same bytes that directory X points to. If a user now wishes to modify one of the files in directory Y, say file A, an actual copy of its bytes is made at that point, before the user is allowed to make the edits.

If one of the files is now deleted from directory X, its actual contents are not deleted. The file is merely de-referenced from within directory X and remains referenced from directory Y.
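The behaviour described above can be sketched as a toy model in Python. This is only an illustration of the idea, not how any real filesystem or Docker storage driver is implemented. The names `BlobStore`, `Directory`, `cow_copy` and `append` are hypothetical, and the reference counting is an assumption made here to decide when contents can really be freed.

```python
class BlobStore:
    """Holds each file's contents once, with a reference count (toy model)."""

    def __init__(self):
        self.blobs = {}    # blob id -> bytes
        self.refs = {}     # blob id -> number of directory entries pointing at it
        self._next_id = 0

    def put(self, data):
        blob_id = self._next_id
        self._next_id += 1
        self.blobs[blob_id] = data
        self.refs[blob_id] = 1
        return blob_id

    def decref(self, blob_id):
        self.refs[blob_id] -= 1
        if self.refs[blob_id] == 0:
            # Nobody points at these bytes any more, so they can go.
            del self.blobs[blob_id]
            del self.refs[blob_id]


class Directory:
    """A directory is just a set of pointers: file name -> blob id."""

    def __init__(self, store):
        self.store = store
        self.entries = {}

    def create(self, name, data):
        self.entries[name] = self.store.put(data)

    def cow_copy(self):
        # Copy the pointers only; no file contents are duplicated.
        copy = Directory(self.store)
        for name, blob_id in self.entries.items():
            self.store.refs[blob_id] += 1
            copy.entries[name] = blob_id
        return copy

    def read(self, name):
        return self.store.blobs[self.entries[name]]

    def append(self, name, extra):
        blob_id = self.entries[name]
        if self.store.refs[blob_id] > 1:
            # Shared contents: make a private copy before editing.
            private_id = self.store.put(self.store.blobs[blob_id])
            self.store.decref(blob_id)
            self.entries[name] = blob_id = private_id
        self.store.blobs[blob_id] += extra

    def delete(self, name):
        # De-reference only; contents survive while others point at them.
        self.store.decref(self.entries.pop(name))


store = BlobStore()
x = Directory(store)
x.create("A", b"hello")
x.create("B", b"world")
y = x.cow_copy()      # two directories, still only two blobs on "disk"
y.append("A", b"!")   # file A is copied for Y at this point
x.delete("B")         # Y still sees file B
```

Note that the copy of file A happens inside `append`, at modification time, which is exactly where the latency discussed next comes from.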

NOTE: The diagrams above are all just illustrative of the concept. Actual storage on a filesystem may involve separate meta-data storage and pointers to the file’s contents from directory X as well. All of this is ignored here for brevity.

One obvious disadvantage of the Copy-on-write technique is the copy latency if there are large files to be copied into the new directory at modification time. However, once the file is copied there, subsequent edits of the same file are fast since all of its bytes are already available in directory Y.

An optimization that some filesystems employ is to do copy-on-write at a block level rather than at a file level. When making a change to a file in the target directory Y, this involves copying only the blocks within the file that need to be modified rather than copying the entire file. This has the obvious advantage of faster copies at modification time. However, this also means that post modification, constructing the complete file for viewing will involve stitching together blocks of the file that are distributed across the disk (fragmentation!).
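To make the block-level variant concrete, here is a minimal sketch under some simplifying assumptions: a file is modelled as a list of fixed-size blocks, the block size is a tiny 4 bytes so the effect is visible, and `BlockFile` and its methods are hypothetical names. Real filesystems track blocks on disk very differently.

```python
BLOCK_SIZE = 4  # unrealistically small, chosen only for illustration


class BlockFile:
    """A file modelled as a list of pointers to fixed-size blocks."""

    def __init__(self, blocks):
        self.blocks = blocks      # list of bytes objects
        self.blocks_copied = 0    # how many blocks this file had to copy

    @classmethod
    def from_bytes(cls, data):
        blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
        return cls(blocks)

    def cow_copy(self):
        # Copy the list of block pointers, not the blocks themselves.
        return BlockFile(list(self.blocks))

    def write_byte(self, offset, value):
        index, pos = divmod(offset, BLOCK_SIZE)
        block = bytearray(self.blocks[index])  # copy only the affected block
        block[pos] = value
        self.blocks[index] = bytes(block)
        self.blocks_copied += 1

    def read(self):
        # Reading stitches the blocks back together in order.
        return b"".join(self.blocks)


original = BlockFile.from_bytes(b"aaaabbbbcccc")
copy = original.cow_copy()
copy.write_byte(5, ord("X"))  # touches only the second block
```

After the write, `copy` has copied one block out of three, while the other two are still shared with `original`. The `read` method is where the stitching cost mentioned above shows up.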

The CoW technique optimizes disk space usage and copy performance. These are very important in the Docker world to allow for a single Docker image to be efficiently used by multiple containers and also for starting new containers quickly. In the language of the example above, starting a Docker container would involve something like creating directory Y (a CoW container) from directory X (an image).

To understand this better, let's imagine the two directories stacked one on top of the other. We then mark directory X read-only, thus disabling further writes. We could say directory X represents an image of the directory frozen in time when it was made read-only. Let's call this image I1. Now, we can make multiple copies (using copy-on-write) on top of image I1. After making a few modifications and new additions in the copies, it might look something like this:

We could now conceive of freezing directory Y as well and creating an image out of it. Let's call this image I2. Recall that directory Y retains links to directory X. We can now create CoW copies on top of image I2 (such as directory N in the example below).
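The freeze-then-copy cycle can be sketched roughly as follows. This is a toy model, not Docker's implementation: `Layer`, `freeze` and `cow_child` are hypothetical names, and the rule that only a frozen layer may serve as a base is an assumption made to mirror the description above.

```python
class Layer:
    """A directory of pointers that can be frozen into an image (toy model)."""

    def __init__(self, entries=None):
        # name -> contents; copying the dict copies pointers, not contents
        self.entries = dict(entries or {})
        self.frozen = False

    def freeze(self):
        # Mark the layer read-only; from here on it acts as an image.
        self.frozen = True
        return self

    def cow_child(self):
        if not self.frozen:
            raise ValueError("only a frozen layer (an image) can be used as a base")
        return Layer(self.entries)

    def write(self, name, data):
        if self.frozen:
            raise PermissionError("layer is read-only")
        self.entries[name] = data


x = Layer({"A": b"a", "B": b"b"})
i1 = x.freeze()         # directory X becomes image I1
y = i1.cow_child()      # first CoW copy on top of I1
z = i1.cow_child()      # a second, independent copy
y.write("C", b"new")    # additions land only in Y
i2 = y.freeze()         # directory Y becomes image I2
n = i2.cow_child()      # directory N sees A, B and C
```

Here `z` still sees only files A and B, while `n` inherits everything that was in `y` when it was frozen.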

Stackable filesystem layers are very helpful for creating images out of existing read-write layers. This concept is exploited quite heavily in the creation and use of Docker images.

There is one final trick that some (albeit older) filesystems use in stacking layers. Instead of maintaining file links from one layer to the other, they just dynamically construct the combination view of all files across the layers. Somewhat like this:

One important consequence of this approach is that editing a file may involve searching for the file within multiple layers and then copying the file up to the topmost read-write layer before editing. This performance penalty could be quite heavy if the use-case involves editing multiple large files.
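A rough sketch of this lookup-and-copy-up behaviour, with each layer modelled as a plain dictionary, might look like the following. `UnionView` and its methods are hypothetical names used only for illustration, and deletion is left out to keep the sketch short.

```python
class UnionView:
    """Combined view over stacked layers, built at lookup time (toy model)."""

    def __init__(self, read_only_layers):
        # layers[0] is the topmost read-write layer; the rest are
        # read-only layers ordered from top to bottom.
        self.layers = [{}] + list(read_only_layers)

    def _find(self, name):
        # Search from the top layer downwards; the first hit wins.
        for depth, layer in enumerate(self.layers):
            if name in layer:
                return depth
        raise FileNotFoundError(name)

    def listing(self):
        names = set()
        for layer in self.layers:
            names.update(layer)
        return sorted(names)

    def read(self, name):
        return self.layers[self._find(name)][name]

    def append(self, name, extra):
        depth = self._find(name)
        if depth != 0:
            # Copy the whole file up to the read-write layer before editing.
            self.layers[0][name] = self.layers[depth][name]
        self.layers[0][name] += extra


i1 = {"A": b"a1", "B": b"b1"}   # bottom image
i2 = {"B": b"b2", "C": b"c2"}   # image stacked on top of I1
view = UnionView([i2, i1])
view.append("A", b"!")          # found in I1, copied up, then edited
```

The search in `_find` and the full-file copy in `append` are the two costs described above: both grow with the number of layers and the size of the files being edited.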

With all of this understanding, we’re now ready to dive into understanding the various storage driver options supported by Docker.
