Docker has hit the systems scene with great fanfare. It's a very exciting advancement for systems, but there are some key misunderstandings around it.
Narrowly focused advice!
My discussion of Docker is nearly entirely limited to multi-host setups of mission-critical systems (web services primarily). Please keep that in mind since my coverage and advice will probably not apply to the many other scenarios you can use Docker for.
Background on Docker
This post assumes a basic understanding of what Docker is and how it works generally.
It's beyond the scope of this article to give a full coverage of Docker, so if you're totally new to Docker, first go through these resources before continuing:
Docker is an amazing tool for many scenarios, but there are several misconceptions I see come up regularly about using Docker.
Misconception: If I learn Docker then I don't have to learn the other systems stuff!
Someday this may be true. However, currently, it's not the case. It's best to think of Docker as an advanced optimization. Yes, it is extremely cool and powerful, but it adds significantly to the complexity of your systems and should only be used in mission-critical systems if you are an expert system administrator who understands all the essential points of how to use it safely in production.
At the moment, you need more systems expertise to use Docker, not less. Nearly every article you'll read on Docker will show you the extremely simple use-cases and will ignore the complexities of using Docker on multi-host production systems. This gives a false impression of what it takes to actually use Docker in production.
Running Docker in a safe, robust way in a typical multi-host production environment requires very careful management of many variables:
- secured private image repository (index)
- orchestrating container deploys with zero downtime
- orchestrating container deploy roll-backs
- networking between containers on multiple hosts
- managing container logs
- managing container data (db, etc)
- creating images that properly handle init, logs, etc
- much much more...
This is not impossible and can all be done - several large companies are already using Docker in production, but it's definitely non-trivial. This will change as the ecosystem around Docker matures (via Flynn, Docker container hosting, etc), but currently if you're going to attempt using Docker seriously in production, you need to be pretty skilled at systems management and orchestration.
For a sense of what I mean, see these articles that get the closest to production reality that I've found so far (but still miss many critical elements you'd need):
If you don't want to have to learn how to manage servers, you should use a Platform-as-a-Service (PaaS) like Heroku. Docker isn't the solution.
Misconception: You should have only one process per Docker container!
It's important to understand that it is far simpler to manage Docker if you view it as a role-based virtual machine rather than as deployable single-purpose processes. For example, you'd build an 'app' container that is very similar to an 'app' VM you'd create, along with the init, cron, ssh, etc processes within it. Don't try to capture every process in its own container with a separate container for ssh, cron, app, web server, etc.
There are great theoretical arguments for having a process per container, but in practice, it's a bit of a nightmare to actually manage. Perhaps at extremely large scales that approach makes more sense, but for most systems, you'll want role-based containers (app, db, redis, etc).
If you're still not convinced on that point, read this post on microservices which points out many of the similar management problems: Microservices - Not A Free Lunch!
Misconception: If I use Docker then I don't need a configuration management (CM) tool!
This is partially true. You may not need configuration management as much for your servers with Docker, but you absolutely need an orchestration tool in order to provision, deploy, and manage your servers with Docker running on them.
This is where a tool like Ansible really shines. Ansible is primarily an orchestration tool that also happens to be able to do configuration management. That means you can use Ansible for all the necessary steps to provision your host servers, deploy and manage Docker containers, and manage the networking, etc.
So, if you decide you want to use Docker in production, the prerequisite is to learn a tool like Ansible. There are many other orchestration tools (some even specifically for Docker), but none of them come close to Ansible's simplicity, low learning curve, and power. It's better to just learn one orchestration tool well than to pick a less powerful tool that won't do everything you need it to (then you'd end up having to learn more tools to cover the shortfalls).
Misconception: I should use Docker right now!
I see too many folks trying to use Docker prematurely. Your systems need to already be in fine working order before you even consider using Docker in production.
Your current systems should have:
- secured least-privilege access (key based logins, firewalls, fail2ban, etc)
- restorable secure off-site database backups
- automated system setup (using Ansible, Puppet, etc)
- automated deploys
- automated provisioning
- monitoring of all critical services
- and more (documentation, etc)
If you have critical holes in your infrastructure, you should not be considering Docker. It'd be like parking a Ferrari on the edge of an unstable cliff.
Docker is a great optimization - but it needs a firm foundation to live on.
Misconception: I have to use Docker in order to get these speed and consistency advantages!
Below I list some optimizations that you can use instead of Docker to get close to the same level of performance and consistency. In fact, most high-scale companies optimize their systems in at least some of these ways.
Configuration Management Tools (Ansible/Puppet/etc)
If your systems are scripted with a CM tool, it allows you to easily create and manage them. Particularly in the cloud, it's cheap and easy to create and destroy server instances.
Many cloud server providers have some capability to save a server configuration as an image. Creating a new server instance from an image is usually far faster than using a CM tool to configure it from scratch.
One approach is to use your CM tool to create base images for your server roles (app, db, cache, etc). Then when you bring up new servers from those images, you can verify and manage them with your CM tool.
When small changes are needed to your servers, you can just use your CM tool to manage those changes. Over time the images will diverge from your current server configurations, so periodically you would create new server images to keep them more closely aligned.
This is a variant of the Golden Image pattern that allows you to have the speed of using images, but helps you avoid the tedious image re-creation problem for small changes.
Most of the breakages that occur from environment to environment are due to software version differences. So, to gain close-to-the-same consistency advantages of Docker, explicitly define (pin) all the versions of all your key software. For example, in your CM tool, don't just install 'nginx' - install 'nginx version 1.4.6-1ubuntu3'.
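As a minimal sketch of the pinning idea, the snippet below builds an install command that refuses any package without an explicit version. The `PINNED_PACKAGES` table and the `apt_install_args` helper are hypothetical names made up for illustration; in practice your CM tool would hold this information. It relies only on apt's `name=version` syntax for requesting an exact version.

```python
# Hypothetical role definition: every package carries an explicit version.
PINNED_PACKAGES = {
    "nginx": "1.4.6-1ubuntu3",  # version string taken from the example above
}


def apt_install_args(packages):
    """Build `apt-get install` arguments, refusing unpinned packages."""
    args = ["apt-get", "install", "-y"]
    for name, version in sorted(packages.items()):
        if not version:
            raise ValueError(f"package {name!r} has no pinned version")
        # apt accepts an exact version in the form name=version
        args.append(f"{name}={version}")
    return args


print(" ".join(apt_install_args(PINNED_PACKAGES)))
# prints: apt-get install -y nginx=1.4.6-1ubuntu3
```

Failing loudly on an unpinned package is a design choice: it keeps a forgotten version from silently drifting between environments.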
If you're using Ansible, it's trivially easy to install your development environment in Vagrant using the same scripts that you use to install production. If you make sure you're also using the same OS version (like Ubuntu 12.04 x64, etc) across all your environments, then you will have highly consistent systems and breakages between environments will be very rare.
Version Control Deploys
If you use git (or a similar version control system), then you can use that to cache your application software on your servers and update it with very minimal downloads. This is similar to Docker's image layer caching. For example, if your codebase is 50MB and you want to deploy an update to your code which only involves a few changed lines in a couple of files, then if you just update the code on the server via git (or similar) it will only download those small changes in order to update the codebase. This can make for very fast deploys.
Note: You don't even have to use a version control system necessarily for these speed advantages. Tools like rsync would also allow you to essentially have most of your code cached on your servers and deploy code changes via delta updates which are very light and fast.
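To make the delta idea concrete, here is a rough sketch that compares a release against what a server already has and reports only the files that need to move. This is not how git or rsync work internally (both are far more sophisticated); the `manifest` and `delta` helpers and the file names are assumptions used purely for illustration.

```python
import hashlib


def manifest(files):
    """Map each path to a SHA-256 digest of its content (bytes)."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def delta(local, remote):
    """Return (paths to upload, paths to delete) so remote matches local."""
    upload = sorted(p for p, digest in local.items() if remote.get(p) != digest)
    delete = sorted(p for p in remote if p not in local)
    return upload, delete


# Hypothetical release and server contents.
release = {"app.py": b"v2", "lib.py": b"same"}
server = {"app.py": b"v1", "lib.py": b"same", "old.py": b"stale"}

print(delta(manifest(release), manifest(server)))
# prints: (['app.py'], ['old.py'])
```

Only `app.py` would be transferred; the unchanged `lib.py` stays cached on the server, which is where the speed comes from.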
Packaged Deploys (of application code primarily)
rpm to manage the deploys.
If you're using git (or another version control system), then you could even have a repository (the same or separate from the code) just for the compiled assets and use that.
For greater speed, make sure that the package (in whatever form) is on the same network local to your servers. Being on the same network is sometimes only a minor speed-up, so only consider it if you have a bottleneck downloading resources outside the servers' network.
When to use Docker
Well, you can start using Docker right away if you use Vagrant for your development environment. Vagrant (version 1.6 and higher) has added Docker as a VM provider (like VirtualBox and VMware) which abstracts away much of the complexity of Docker so that you can have the speed and low-resource-needs of Docker without the learning curve. For more, see: Feature Preview: Docker-Based Development Environments
For multi-host production use, I would recommend using the alternative optimization methods I mentioned above for as long as you can. If you reach a point in the scale of your servers where those methods aren't enough, then consider using Docker for the advanced optimizations it provides. Currently, you'll need to be at very large scale before the benefits of using Docker outweigh the extra complexity it adds to your systems. This may change in the coming months/years as Docker and the tools and patterns around it mature.
Of course, this recommendation assumes that your systems are already robust and fully covered as far as being scripted, automated, secured, backed up, monitored, etc.
Docker is a truly amazing project and represents a huge leap forward for advanced systems administration. It's very powerful and has many use cases beyond what I've discussed here. My focus for evaluating Docker has been on server setups delivering web applications; however, there are other setups where my advice above won't be as relevant.
It's a new set of optimizations with great promise. But, remember that using and managing it becomes complex very quickly beyond the tiny examples shown in most articles promoting Docker.
Docker is progressing quickly, so some of my advice will be out of date (hopefully) sooner rather than later. I'd very much like to see the complexity go down and the usability go up for multi-host production Docker use. Until then, be sure to adequately weigh the cost of using Docker against the perceived benefits.
A few tips for simplifying Docker use in production
If you're an expert system administrator and your systems are already at the scale where Docker's cost/benefit trade-off makes sense, then consider these suggestions to help simplify getting started:
You don't need to Dockerize everything
Use Docker only for the server roles that will benefit from it. For example, perhaps you have thousands of app servers and you need Docker to optimize app deploys. In that case, only Dockerize the app servers and continue to manage other servers as they are.
Use role based Docker images
I mentioned this earlier in the post, but just to reiterate, it will be far easier to manage Docker if you use it for roles like app, db, cache, etc rather than individual processes (sshd, nginx, etc).
You will generally already have your servers scripted by roles, so it will make the Dockerization process much simpler.
Also, if you are at scale, you will nearly always only have one role per server (an app server is only an app server, not also a database server) and that means only one Docker container per server. One container per server simplifies networking greatly (no worry of port conflicts, etc).
Be explicit (avoid magic) as long as possible
Docker will assign random ports to access services on your containers unless you specify them explicitly. There are certain scenarios where this is useful (avoiding port-conflicts with multiple containers on the same host), but it's far simpler and easier to manage if you stick with one role container (app, db, cache, etc) per host server. If you do that, then you can assign explicit port numbers and not have to mess with the complexity of trying to communicate random port numbers to other servers that need to access them.
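A small sketch of what "explicit" can look like: a helper that assembles a `docker run` command with fixed host-to-container port mappings using the `-p host:container` form. The helper name, the container name `app`, and the image `example/app:1.0` are hypothetical; how you actually invoke Docker (Ansible module, shell script, etc) is up to you.

```python
def docker_run_command(image, name, ports):
    """Build a `docker run` command with explicit host:container port mappings."""
    cmd = ["docker", "run", "-d", "--name", name]
    for host_port, container_port in sorted(ports.items()):
        # -p HOST:CONTAINER pins the host port instead of letting Docker pick one
        cmd += ["-p", f"{host_port}:{container_port}"]
    cmd.append(image)
    return cmd


# Hypothetical image and port choice for an 'app' role container.
print(" ".join(docker_run_command("example/app:1.0", "app", {8080: 80})))
# prints: docker run -d --name app -p 8080:80 example/app:1.0
```

With one role container per host, the same mapping can be reused on every app server, so other servers never need to look up a port.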
There are great tools like etcd, zookeeper, serf, etc that provide service discovery for your systems. Rather than hard-coding the location of your servers (ex: the database is at database.example.org), your application can query a service discovery app like these for the location of your various servers. Service discovery is very useful when you get to very large scales and are using auto-scaling. In those cases it becomes too costly and problematic to manage hard-coded service locations. However, service discovery apps introduce more complexity, magic, and points of failure, so don't use them unless you absolutely need to. Instead, explicitly define your servers in your configurations for as long as you can. This is trivial to do using something like the inventory variables in Ansible templates.
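To illustrate explicit definitions without any particular tool, here is a sketch that renders an application config from a static inventory. It only mimics the idea behind inventory variables in templates and is not Ansible itself; the `INVENTORY` keys and the port number are assumptions.

```python
from string import Template

# Hypothetical static inventory: service locations are written down, not discovered.
INVENTORY = {
    "db_host": "database.example.org",  # hostname from the example above
    "db_port": 5432,                    # assumed port, for illustration only
}

CONFIG_TEMPLATE = Template("DB_HOST=$db_host\nDB_PORT=$db_port\n")


def render_config(inventory):
    """Fill the template; a missing key raises KeyError instead of guessing."""
    return CONFIG_TEMPLATE.substitute(inventory)


print(render_config(INVENTORY), end="")
# prints:
# DB_HOST=database.example.org
# DB_PORT=5432
```

Changing where the database lives is then a one-line edit followed by a redeploy, with no extra running service to fail.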
Don't store data in containers
Unless you really know what you're doing, don't store data in Docker containers. Stopping a container keeps its filesystem, but if you're not careful and remove the container, or replace it with a fresh one from the image, that data may be lost forever. It's safer and easier to manage your data if you store it directly on the host with a shared directory.
For user uploads, use dedicated storage servers or a file storage service like Amazon's S3 or Google's Cloud Storage.
Yes, there are ways to store data in data-only containers that may not even be running, but unless you have a very high level of confidence, just store the data on the host server with a shared directory or somewhere off-server.
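As a sketch of the shared-directory approach, the helper below produces `-v host_dir:container_dir` arguments for `docker run` and rejects host paths that aren't absolute. The helper name and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def volume_args(mounts):
    """Build `-v host_dir:container_dir` arguments for `docker run`."""
    args = []
    for host_dir, container_dir in sorted(mounts.items()):
        # Docker treats the left side as a host directory only when it is an
        # absolute path, so anything else is rejected here.
        if not PurePosixPath(host_dir).is_absolute():
            raise ValueError(f"host directory must be absolute: {host_dir!r}")
        args += ["-v", f"{host_dir}:{container_dir}"]
    return args


# Hypothetical paths: database files live on the host, not in the container.
print(" ".join(volume_args({"/srv/db/data": "/var/lib/db"})))
# prints: -v /srv/db/data:/var/lib/db
```

Because the data sits in a host directory, the container can be removed and recreated without touching it, and your existing host-level backups keep working.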
Use a private index provider
It's a chore to correctly set up a self-hosted secure private Docker index yourself. You can get going much quicker by using a hosted private Docker index provider instead.
Docker does provide an image for hosting your own repositories, but it's yet another piece to manage and there are quite a few decisions that you'd need to make when setting up. You're probably better off starting with a hosted repository index unless your images contain very sensitive baked-in configurations (like database passwords, etc). Of course, you shouldn't have sensitive data baked into your app or your Docker images in the first place - instead use a more sane approach like having Ansible set those sensitive details as environment variables when you run the Docker containers.
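On the application side, reading secrets from the environment might look like the sketch below. The variable name `DB_PASSWORD` and the `required_env` helper are assumptions; the value would be supplied when the container is started, for example with `docker run -e DB_PASSWORD=...`.

```python
import os


def required_env(name, environ=os.environ):
    """Return a required setting from the environment, or fail loudly."""
    value = environ.get(name)
    if not value:
        raise RuntimeError(f"missing required environment variable: {name}")
    return value


# Demonstration with an explicit mapping instead of the real environment.
fake_environ = {"DB_PASSWORD": "s3cret"}  # hypothetical value
print(required_env("DB_PASSWORD", fake_environ))
# prints: s3cret
```

Since nothing sensitive is baked in, the same image can be pushed to a hosted index and run in any environment with different credentials.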
Build on the expertise of others
Phusion (the company that makes the excellent Passenger web server) has built advanced Docker images that you can use as a base for your services. They have spent a lot of time solving many of the common problems that you'd experience when attempting to create role-based Docker images. Their documentation is excellent and can serve as a great starting point for your own images.
Note: This post is based on an excerpt from my book "Taste Test: Puppet, Chef, Salt, Ansible"