Containerization for everyone: a perspective on technology adoption and philosophical costs
Container technology is old. Older than it seems.
In 2008, one of the central concepts behind containers, OS-level virtualization (implemented in the Linux world through control groups, or cgroups), became a mainstream feature of the Linux kernel. OS-level virtualization is not to be confused with whole-system virtualization provided by the likes of VMware ESXi or Hyper-V. The concept was originally conceived by FreeBSD developers and made available in the year 2000 as a mechanism called ‘FreeBSD jail’ in the FreeBSD OS. Building on this concept, Google engineers contributed heavily to cgroups in Linux as a way to isolate groups of processes from each other and allocate system resources to these process groups - all without the need for virtual machines or hypervisors. In simplified terms, one group of processes gets its own namespace and a limited share of memory, CPU, network & disk, all of which are isolated from those of other process groups so that they cannot interfere with each other.
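To make that concrete, here is a minimal sketch of the cgroup mechanism itself, written against the cgroup v1 memory controller. It assumes a Linux host with that controller mounted at /sys/fs/cgroup/memory and root privileges; the group name "demo" and the 256 MB limit are purely illustrative.

```python
# Minimal sketch: confine the current process to a memory-limited cgroup.
# Assumes cgroup v1 with the memory controller mounted at /sys/fs/cgroup/memory
# and sufficient (root) privileges; "demo" and the 256 MB cap are illustrative.
import os

CGROUP_ROOT = "/sys/fs/cgroup/memory"        # cgroup v1 memory controller
GROUP = os.path.join(CGROUP_ROOT, "demo")

os.makedirs(GROUP, exist_ok=True)            # creating a directory creates the cgroup

# Cap the group at 256 MB of memory.
with open(os.path.join(GROUP, "memory.limit_in_bytes"), "w") as f:
    f.write(str(256 * 1024 * 1024))

# Move the current process into the group; it and its children can now use
# at most 256 MB, isolated from the allowances of other process groups.
with open(os.path.join(GROUP, "tasks"), "w") as f:
    f.write(str(os.getpid()))
```

Container runtimes do essentially this (plus namespaces for PID, network, mounts and so on) on the application's behalf.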
Another central theme behind today’s containers is the union-mount filesystem, a concept that is more than three decades old! It was first introduced in the Plan 9 OS from Bell Labs. A union-mount filesystem is internally represented by a set of layers, with each new write operation typically creating a new layer on top of existing layers. To illustrate: say we have three files in a filesystem layer at a given point in time; creating a new file adds a new layer containing that file atop the existing one. When the user views the contents of the filesystem, what the user sees is the combination or union of all layers (for a more technical understanding, see my post on Docker storage). A union-mount filesystem is what makes containers and container images extremely efficient in disk-space usage (by de-duplicating common layers) and in the speedy creation of new images atop existing layers. To this day, the default filesystem for Docker images in Ubuntu is aufs - a filesystem that was first introduced in 2006!
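A toy model of the idea in Python may help. This is not how aufs or overlay filesystems are actually implemented - just the layering principle: each layer is a mapping from path to contents, a write creates a new topmost layer, and the visible filesystem is the union of all layers with the topmost entry winning.

```python
from collections import ChainMap

# Toy model of union-mount layering: each layer maps file paths to contents.
base_layer = {"/etc/hostname": "plan9", "/bin/sh": "<binary>", "/app/config": "v1"}

# A write does not modify the base layer; it creates a new layer on top of it.
new_layer = {"/app/report.txt": "hello"}

# What the user sees is the union of all layers, topmost first.
union_view = ChainMap(new_layer, base_layer)
print(sorted(union_view))         # all four paths are visible
print(union_view["/app/config"])  # "v1" - read through to the lower layer

# A later layer can shadow a file in a lower layer without touching it, which is
# why unchanged base layers can be shared (de-duplicated) across many images.
override_layer = {"/app/config": "v2"}
print(ChainMap(override_layer, new_layer, base_layer)["/app/config"])  # "v2"
```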
So why are containers making headlines more than a decade later? This brings us to Docker. Docker’s claim to fame is in democratizing container technology - making it accessible to the general developer community by bundling the various underlying technologies and tools required to easily assemble and run containers. In that sense, Docker is to container technology what Dropbox is to cloud storage (web-based storage was first introduced in the 90s!).
With the increased accessibility to containers enabled by Docker, folks quickly realized that they now had a way to 1) package software, dependencies and run-time environments into bundles (container images) smaller than before (VM images), 2) quickly transport them from one machine to another, making software distribution & installation more efficient, and 3) quickly start up the software in the bundle, since doing so involves neither booting an entire virtual machine OS nor taking up as many system resources. These are the fundamental benefits offered by containers.
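As a small illustration of points 2 and 3, here is a hedged sketch using the Docker SDK for Python (docker-py). It assumes Docker and the SDK are installed and the daemon is running; the public alpine image is used purely as an example of a small, portable bundle.

```python
# Sketch: transport a small image to this machine and start a container from it.
# Assumes Docker and docker-py are installed and the Docker daemon is running;
# "alpine" is just an example of a compact, self-contained bundle.
import time
import docker

client = docker.from_env()

client.images.pull("alpine", tag="latest")   # fetch the bundle from a registry

start = time.time()
output = client.containers.run("alpine:latest", ["echo", "hello from a container"])
print(output.decode().strip())
print(f"started, ran and exited in {time.time() - start:.2f}s")  # no OS boot involved
```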
However, as things evolved, containers (and Docker in particular) quickly metamorphosed into much more than a technology - into a specific philosophy around software packaging & distribution. There are two main tenets put forth: 1) all software packaged & distributed as container images, including the application binaries, is made immutable and should not be modified at run-time (except for the data it produces or consumes), and 2) each executable process that is part of the software being distributed must be packaged so that it runs in a separate container (separate from other processes), and the software as a whole must be designed to choreograph actions between these individual processes.
So what do these mean for the average container user? To understand the implications, let’s first look at what constitutes a container image. As per tenet one, the typical container image of an app is built up of the system libraries, the application’s stack (e.g. JDK/JRE, Tomcat, etc.), the application’s configuration property files and the application’s binary artifacts (e.g. WARs, JARs, etc.). Of these, the stack and libraries change the least frequently, configurations change occasionally and the app binaries change very often (sometimes several times a day!). Infrequently changing artifacts are best made immutable and repeatable. But for dynamic, oft-changing artifacts, one must ask whether they are better served by idempotent setup mechanisms than by immutability.
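To see why that ordering by change frequency matters, here is a small sketch that mimics layer-caching behaviour: a layer is rebuilt only if it, or any layer beneath it, changes, so placing the oft-changing app binaries last keeps most of the image cached. The layer names and hash-based cache check are illustrative, not Docker’s actual build implementation.

```python
import hashlib

# Illustrative image: layers ordered from least to most frequently changing.
LAYERS = ["system libraries", "stack (JDK + Tomcat)", "configuration", "app binaries (WAR)"]

def layer_ids(contents):
    """Mimic layer caching: each layer's id depends on its content and its parent."""
    ids, parent = [], ""
    for content in contents:
        parent = hashlib.sha256((parent + content).encode()).hexdigest()[:12]
        ids.append(parent)
    return ids

old_ids = layer_ids(["libs-v1", "jdk8-tomcat9", "conf-v1", "app-build-1041"])
new_ids = layer_ids(["libs-v1", "jdk8-tomcat9", "conf-v1", "app-build-1042"])  # only the WAR changed

for name, old, new in zip(LAYERS, old_ids, new_ids):
    print(f"{name:25s} {'cached' if old == new else 'rebuilt'}")
# Only the final "app binaries" layer is rebuilt; everything beneath it is reused.
```

Even with such caching, though, every change to the app binaries still means producing, shipping and versioning a fresh image - which is where the costs discussed next come in.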
First, there is the cost (time & effort) of bundling the artifacts into immutable images every so often. Even if such costs were minimized with automation & optimization, there is still the matter of what this means for the developer. If a development team’s deliverable is now container images instead of binary artifacts, this necessitates changes in the developer’s tools and workflow. Developers must now take into account the idiosyncrasies of container volumes and port mappings, and of software components that run not just as independent processes but as ones that collaborate with each other across the network. They must also take responsibility for building container images, testing them & managing image versions - all of which involve skills and knowledge beyond what the testing of app binaries requires. And tenet two above necessarily means that service discovery, inter-process network communication (synchronous or asynchronous), atomic consistency mechanisms, de-normalization of data & network failures must all be dealt with in the design of every app.
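For a flavour of those idiosyncrasies, consider the kind of detail a developer now has to reason about even for a single container. The sketch below uses the Docker SDK for Python; the image tag, host path and port numbers are placeholders, and it assumes Docker plus docker-py are available with the daemon running.

```python
# Sketch of container-specific concerns: port mappings, volume mounts and image
# versions. The image tag, host path and ports are placeholders; assumes Docker
# and docker-py are installed and the daemon is running.
import docker

client = docker.from_env()

container = client.containers.run(
    "nginx:1.25",                                   # the image version must now be managed too
    detach=True,
    ports={"80/tcp": 8080},                         # container port 80 -> host port 8080
    volumes={"/srv/myapp/static": {"bind": "/usr/share/nginx/html", "mode": "ro"}},
)
print(container.short_id, container.status)

container.stop()
container.remove()
```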
Certain apps - particularly greenfield apps - will most definitely benefit from such a design, as well as from a micro-services based architecture. However, one could argue that in many other cases these tenets are more likely to place enormous constraints and burdens upon the developer who simply wishes to take advantage of the technological benefits of containers without the costs associated with adopting the software packaging philosophy they engender.
Is the imposition of a philosophical cost, over and above the technology adoption cost, resulting in a de-democratization of the technology? Is it unreasonable, then, to treat containers as a form of mini VMs - lighter and more portable - regardless of how the application is architected or distributed? Or is the enterprise ready to bear the cost of re-architecting the hundreds of thousands of apps already deployed? No doubt the adoption of the philosophy brings additional benefits, but at a significant cost too, and the choice of taking on these costs is best left to the developer. But is this even feasible?
Some projects, such as Phusion & the Canonical-led LXD, have purposefully been designed to provide approaches contrary to the prevalent popular philosophy. Some container platforms, such as WaveMaker HyScale, also provide developers with the option to choose their deployment methodology and to use multi-process containers as appropriate to their use-case. At the end of the day, there exists a belief that technology & its benefits must not be held ransom to a specific philosophy unless the technology cannot scale up to meet the requirements. For all the talk about containers, such discourse on how best to leverage container technology and how best to exploit it for software packaging & distribution will determine the future of software delivery.
I currently work for WaveMaker HyScale. However, the opinions expressed here are my own and not those of my employer.