In Docker 1.12 you can 'tag' a node and tell a 'service' to run only on the tagged nodes.
Not saying that's a good idea, but it is getting closer.
You could, for example, have a node that's only for DBs and that has volumes on it. You could then use DRBD on the host to replicate that data to a secondary node. Then, in the event that node 1 dies, Swarm would bring the DB up on node 2.
With the mesh networking they've added, the endpoint would remain the same, so all your apps would need to do is reconnect.
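A rough sketch of that setup with the Docker 1.12 CLI (the node name "node1", the label "db", and the "pgdata" volume are all placeholders):

```shell
# Label the node that holds the volumes (hypothetical node "node1")
docker node update --label-add db=true node1

# Constrain the service to labeled nodes, with its data on a volume
docker service create --name postgres \
  --constraint 'node.labels.db == true' \
  --mount type=volume,source=pgdata,target=/var/lib/postgresql/data \
  postgres:9.5
```

The DRBD replication between the hosts would have to be set up separately; Swarm only handles the rescheduling and the stable endpoint.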
> Not saying that's a good idea, but it is getting closer.
No, that isn't getting closer; it's getting farther away. The whole point of containers is that they make the host machine completely fungible. If I can only schedule my DB containers on a specific machine, then I might as well just run my DB on that machine and be done with it.
My personal opinion is, Docker's good when you need to be sure you deploy exactly the same stuff on multiple hosts. This includes ensuring that the code you try on your local machine is the same as what runs on the production hosts. Docker is completely worthless for deploying singleton services that you already have packaging and deployment recipes for; it just doesn't add any value for such use cases.
That is, Docker is just a packaging system plus a tool to spawn the deployed stuff in an isolated environment. Now, three years on, they've improved dependency handling (previously done with docker-compose plus hacks for using it in production).
That is, I use Docker only because its images are somewhat easier to build and deploy than .deb packages or plain tarballs (and it's more OS-agnostic, since Docker Machine is available on Windows and OS X, so I can just build an image with the dev environment and not care what OS is used). I doubt it's anything more than that.
> My personal opinion is, Docker's good when you need to be sure you deploy exactly the same stuff on multiple hosts.
There is a technology for that already: it's called OS packaging. Docker does not maintain enough metadata to allow for in-place upgrades of individual components in an image (software and configuration lifecycle management). The best you can do with Docker is install lumps, you can forget upgrades. Docker is not a replacement for configuration management, and specifically, Docker is not a replacement for the operating system's software management subsystem.
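To make the contrast concrete, here's a hypothetical side-by-side (the package and image names are just examples):

```shell
# In-place upgrade of one component, tracked by the OS package manager:
apt-get install --only-upgrade openssl

# The Docker equivalent: rebuild the whole image and redeploy the lump
docker build -t myapp:1.0.1 .
docker push myapp:1.0.1
```

The package manager knows what changed and can roll it back; the image rebuild replaces everything at once.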
Yes, you're right. I've mentioned that technology in the comment above. ;)
Docker is just a quick-and-dirty way to get a full system image. It's somewhat simpler than building a proper .deb (.rpm, .tar.xz, or whatever one's distro eats) package, especially when there's a ton of dependencies from other packaging systems (PyPI, Ruby gems, npm packages, etc.).
Oh, and unlike with many popular OS packaging systems, with Docker you can actually have multiple versions of the same "package" "installed" at the same time (that's my biggest issue with dpkg), but IIRC there is no built-in provision for migration scripts (prerm, postinst, that sort of stuff).
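For instance (the "myapp" image name is just a placeholder):

```shell
# Two versions of the same "package" coexisting, which dpkg won't
# allow for a single package name:
docker build -t myapp:1.0 .
# ...later, after changes...
docker build -t myapp:1.1 .

docker images myapp        # both 1.0 and 1.1 are "installed" side by side
docker run --rm myapp:1.0  # and either version can still be run
```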
Containers are actually resource constraints; they were first introduced in Solaris 10 along with zones, which is what you appear to understand as containers. Even on GNU/Linux, containers are implemented via cgroups, which are resource controls and not a virtualization solution.
On SmartOS, when you provision a zone, you get a fully virtualized UNIX server, and you can apply a container to it by defining resource constraints, but that is both pointless and unnecessary there. Once you have a fully virtualized server provisioned from an image with imgadm(1M) and vmadm(1M), it is only logical that you will want to service individual components via pkg_delete and pkg_add, rather than baking an entire image all over again and redeploying it, all over again. It's the rule of modularity: "write simple parts connected by clean interfaces" [Kernighan-Plauger], and it applies particularly well to lightweight virtualization.
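Roughly, the flow looks like this (the image UUID, zone UUID, and the nginx package are placeholders; the manifest file name is made up):

```shell
# Provision a zone from an image...
imgadm import <image-uuid>
vmadm create -f web-zone.json     # JSON manifest describing the zone

# ...then service one component in place, instead of rebaking
# and redeploying an entire image:
zlogin <zone-uuid> pkg_add -u nginx
```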
I see how that was confusing. I meant that the fact that Swarm doesn't offer such a guarantee is the reason that storing the data on the host isn't a solution. My real complaint is that none of these give you a way to do persistence that doesn't destroy some other really nice properties of containers.
There are lots of people trying to fix this problem the right way: Kubernetes has made a lot of progress on having volumes follow containers around from host to host. Torus is a slightly different approach; Flocker is another one.
> I don't think it's docker's responsibility to solve database clustering.
I don't know if I'd call it a responsibility. But Docker is obviously trying to expand their platform into more and more aspects of containerization. If they figured out persistence, that would really set them apart, and that's something this product doesn't do. It mostly just keeps them even with Kubernetes at best. And an imitation of it at worst.
Disagree...
Also, disclaimer: I work at Docker, and in particular I spend a large amount of time working on storage for the Docker engine.
Volumes are exactly what you should be using to have data that lives beyond the life of the container.
Essentially, if you are writing to disk in your container, you should probably be writing it to a volume.
Volumes can be backed by any number of storage solutions, including offerings from every cloud provider, various distributed block stores, Gluster, Ceph, etc.
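A minimal illustration (the "pgdata" volume and container names are placeholders):

```shell
# Keep the data in a named volume so it outlives any one container:
docker volume create pgdata
docker run -d --name db -v pgdata:/var/lib/postgresql/data postgres:9.5

# Replace the container entirely; the volume (and the data) stays put:
docker rm -f db
docker run -d --name db -v pgdata:/var/lib/postgresql/data postgres:9.5
```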
Docker should _not_ be changing the way you run a database. If you need a highly available database, you don't use one database process and hope for the best; you use the built-in replication features available in just about every database product in existence today. Database goes down? Oh well, you have a hot standby ready.
> My real complaint is that none of these give you a way to do persistence that doesn't destroy some other really nice properties of containers.
100% agree with you there.
> It mostly just keeps them even with Kubernetes at best. And an imitation of it at worst.
Personally, when I looked at Kubernetes, Mesos, etc. and saw the configuration hell I'd be living in, I said "no way".
If I have to run something as complex as those to have Docker, then it's not worth it to me.
When I look at the new Docker Swarm stuff, for the first time I think Docker is a viable thing (assuming it all works), because I'm not adding additional complexity to get my Docker simplicity.
Not sure why you are being downvoted. My sentiment is similar... Docker's historical advantage over other solutions is its easy-to-use interface (including the Dockerfile). Can't wait to try Swarm out.
This is very much the wrong way to approach storage (and sorry for being so blunt; not sure how else to say it!).
Docker has support for a wide array of storage solutions, everything from the various cloud providers' offerings (EBS, S3, GCE storage, Azure storage) to vSphere, NetApp, ScaleIO, various distributed block solutions, NFS, etc.
You should _not_ be changing the way you handle storage just because you are in a container. Use the tools you already have.
If you need HA on a database, use the built-in replication service available on just about every database product in existence.
If you really want distributed/shared storage, see the list above.
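As a sketch of what "use the built-in replication" means in practice, here's a hypothetical MySQL replica setup (GTID-based; the host names and credentials are all placeholders):

```shell
# Point a fresh replica at the source and start replicating:
mysql -h db2 -u root -p <<'SQL'
CHANGE MASTER TO
  MASTER_HOST='db1',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',
  MASTER_AUTO_POSITION=1;
START SLAVE;
SQL
```

None of this cares whether the database processes happen to be running in containers.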
> If you need HA on a database, use the built-in replication service available on just about every database product in existence.
There are often valid reasons not to do this. For example, MySQL does not guarantee consistency between a source and its replica. mysqlrplsync can check for this as a band-aid, but it's something extra you need to set up and configure.
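For reference, that check looks something like this with MySQL Utilities (hosts and credentials are placeholders):

```shell
# Compare the source against a replica for data drift:
mysqlrplsync --master=root:pass@db1:3306 --slaves=root:pass@db2:3306
```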
I do not think this is a reason not to use replication. In that sense, no distributed system guarantees consistency: if a node (particularly a leader!) goes down while trying to distribute a new update, you are very likely to run into issues.
> You should _not_ be changing the way you handle storage just because you are in a container.
You mean exactly like zones handle it inside of SmartOS (;-))
The ZFS storage pool is abstracted away from the zone the virtual server runs in, and since it is persistent, it's completely out of the way. JustWorks(SM).