I chose Docker for my project hoping it would help me create an easily reproducible development environment... And I'm not really sure it was a good idea. There are so many moving parts, and I can't name one that actually works well. Configuring storage is hell (running out of disk space can lead to errors that are really difficult to debug), and the dance with bridges makes configuring a firewall a terrible experience. I definitely wouldn't recommend it as a stable container solution.
I totally agree. Docker feels really tidy conceptually. I somehow ended up convincing myself that adding this extra layer of technology to my (relatively tiny) infrastructure would decrease its complexity. Unsurprisingly, the effect was the exact opposite.
I've been enjoying NixOS[0] and hope to try NixOps[1] soon. The difference between NixOS and Docker is that NixOS makes it possible to have a truly reproducible environment. See the surprisingly readable paper[2] for more details.
I feel the exact opposite! The Dockerfile is a simple way to define my environment and it's trivial to run the app in either development, QA, or production. Throw in some of the other functionality such as being able to deploy from an image rather than having to rebuild each time, and it's been a productive addition to my toolset. I'm guessing it depends on how complex your application is, but I've found it fantastic for running Python (Flask), PostgreSQL/PostGIS, MapServer, and some other smaller applications.
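To give a sense of what I mean, here's a minimal sketch of that kind of Dockerfile for a Flask app (the file names, port, and entry point are just illustrative, not from any real project):

```dockerfile
FROM python:2.7

WORKDIR /app

# Install dependencies first so this layer is cached between code changes
COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

EXPOSE 5000
CMD ["python", "app.py"]
```

The same image then runs unchanged in development, QA, and production; only the configuration passed at `docker run` time differs.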
It's not an image in the sense that you can run updates/installs and strip out the crap associated with that, but it's so trivial to run "images" that the benefits far outweigh the additional disk space for our use cases.
You might want to try out SmartOS: one .json file, vmadm(1M) and imgadm(1M) commands later, you'll have a fully isolated UNIX server, running at the speed of bare metal and residing on ZFS, protecting your data. If you need a more complex network setup, dladm(1M) will enable you to configure virtual switches and routers. The manual pages are really good, too. And there is nothing to install, as SmartOS runs from memory, so that all storage in the system can be used for virtual servers. Upgrades are a breeze by just booting into a new image, either from USB or PXE booting from the network.
I also didn't get the point of Docker until a few months ago when I got acquainted with Kubernetes (K8s) and Rancher.
It doesn't really make sense to use Docker without some sort of distributed scheduling/orchestration system like Kubernetes, Docker Swarm or Mesos.
I think the first few years of Docker were just pointless but I genuinely feel that it is all starting to come together now.
Docker 1.12 looks promising on the orchestration front; hopefully they will back all these new features with thorough documentation (which I found lacking in Swarm compared to K8s).
It looks like Swarm is just following in K8s' footsteps and maybe even improving things along the way.
Based on what I just read, it looks like Docker Swarm might finally be a serious contender to K8s.
From my experience with Docker one should just avoid docker-supplied volumes and manage the storage oneself using host bind mounts for containers and consistently using --read-only for containers themselves.
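Concretely, that pattern looks something like this (the host path, mount point, and image name are just examples):

```shell
# State lives in a host directory you manage yourself, not a
# docker-supplied volume; the container filesystem itself is read-only.
docker run --read-only \
  -v /srv/myapp/data:/var/lib/myapp \
  --tmpfs /tmp \
  myapp:latest
```

With `--read-only` plus a `--tmpfs` for scratch space, anything the container writes has to go through the bind mount you chose, so there's no hidden state in docker-managed storage.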
With that, it is really weird to read all the complaints and issues about volume management. It seems Docker just picked the wrong model for their storage, and has been trying ever since to fix the impedance mismatch between that model and how people want to use Docker.
Care to clarify the issues you have with volume management?
I think the biggest issue is around the `-v` syntax (and the corresponding API).
In the UX we call something a volume when it is actually a mount (which happens to use a volume).
I like the way we have implemented this in the new services API: `docker run --mount type=volume,source=important_data,volume-driver=foo,volume-options=...`
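To illustrate the difference (the volume and image names here are made up):

```shell
# Old syntax: is important_data a named volume or a host path?
# You have to parse the left-hand side to know.
docker run -v important_data:/data myapp

# New syntax: explicit about what kind of mount this is
docker run --mount type=volume,source=important_data,target=/data myapp
```

The `--mount` form spells out the mount type instead of overloading one flag for bind mounts, named volumes, and anonymous volumes.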
For me the biggest issue with Docker volumes is the lack of any notion of subvolumes or volume grouping. Consider an application consisting of a few containers communicating over unix sockets, each with its own state. Sure, I can create a bunch of volumes for the containers, but those would not have any notion of grouping. That complicates administration and development.
For example, it is very useful to have a shell view of all volumes for the application, to poke inside or debug issues. Sure, I can do `docker run ubuntu` and mount all relevant volumes under /mnt there, but that is not user friendly.
Or consider that during development or debugging I want to transfer the state to another machine. Why is it not possible to tar all application volumes with a single command and restore them in a new place, again with a single command? I.e., something similar to docker save/load, but for volumes?
All those issues are trivially solvable if I use some directory on a file system for the whole application state and use subdirectories there for individual volumes for the containers and then pass those with -v to relevant containers.
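For instance, with a plain directory tree per application, backup and restore really are one command each (the paths and names here are made up for illustration):

```shell
# All state for "myapp" lives under one host directory,
# one subdirectory per container
STATE=/tmp/myapp/state
mkdir -p "$STATE/db" "$STATE/cache"
echo "hello" > "$STATE/db/data.txt"

# Containers would get these via e.g.: -v $STATE/db:/var/lib/db

# Snapshot the whole application state with a single command...
tar czf /tmp/myapp-state.tar.gz -C /tmp/myapp state

# ...and restore it elsewhere with a single command
mkdir -p /tmp/myapp-restored
tar xzf /tmp/myapp-state.tar.gz -C /tmp/myapp-restored
```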
Check out Weave for managing networking; IMO it's much easier, and I'm not even talking about the Docker plugin. Just use Weave without the plugin and it works amazingly well.
For the configuration aspect, just use Dockerfiles and a system like Mesos and Marathon for deploying to production.
You can then write simple RESTful API calls to launch different environments with Marathon.
Build all your Docker containers to act as single-server instances, i.e., Spark in one container, SQL in another.
Tie in a default install of cAdvisor on all Mesos clients and you can have easy monitoring as well.
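For example, launching an environment through Marathon's REST API is a single POST (the host, app id, and image below are placeholders, not a real deployment):

```shell
curl -X POST http://marathon.example.com:8080/v2/apps \
  -H 'Content-Type: application/json' \
  -d '{
    "id": "/staging/spark",
    "cpus": 2,
    "mem": 4096,
    "instances": 1,
    "container": {
      "type": "DOCKER",
      "docker": { "image": "myorg/spark:latest", "network": "BRIDGE" }
    }
  }'
```

Swap the `id` and `instances` fields and you have another environment; that's the whole "simple RESTful API" workflow.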
I've been experimenting with Docker for reproducible dev environments as well for a while now, and my biggest issue has been with the app build step, which often requires a specific non-Linux platform (OS X for iOS and Windows for UWP).
Back in my previous project using Cordova, I made the Docker workflow work adequately by offloading builds to web-based build services like PhoneGap Build and Ionic Package (building only a thin live-reloading wrapper app that points to a configurable IP:port for development). But my new project uses React Native, and I haven't found any similar build services for React Native apps (or even just regular native apps) yet. Anyone have any suggestions?
Bitrise and Circle CI both support OS X environments for building projects. Combined with fastlane, you'd probably have a good setup for continuous delivery. I've only got experience with running fastlane locally though.
My experience with Docker is kind of ambivalent. We are using Docker with Docker Compose on a legacy VMware stack in a corporate environment. No Swarm and no Docker overlay network, as it undermines firewalls. It is perfect for applying gitflow and semver to each build of each Docker image, so each resulting application stack is fully deterministic and auditable. Docker storage and networking stay really simple here, and Docker is the perfect fit.
On the other side, the devs were asking for Jenkins for that process. We also run Jenkins on Docker, but it's cumbersome as it doesn't allow declarative configuration. There are dozens of plugins out there, but that doesn't change the fact that the outcome is resource hungry and slow. For now we are optimizing using the Job-DSL plugin. We reached our target, but it was extremely time-consuming and painful. For a new project I'm on, we may replace Jenkins with ConcourseCI or Drone.
IMHO a PaaS doesn't solve process-level versioning and testing requirements. For that you still need further tools, with or without Docker.
Why is storage an issue? With databases, for example, you store data upstream (on the host) and keep the dockerised app/thing mostly stateless, or at least capped with regards to storage. More details about your environment would be appreciated.
Keeping data on the host is exactly the reason it's an issue. Swarm offers no guarantees that a container will be rescheduled on the same host it was initially scheduled on, and it assumes it can reschedule containers as failures arise... this doesn't work when your database gets rescheduled, and emptied along the way.
In docker 1.12 you can 'tag' a node and tell a 'service' to only run on the tagged nodes.
Not saying that's a good idea, but it is getting closer.
You could, for example, have a node that's only for DBs and has the volumes on it. You could then use DRBD on the host to clone that data to a secondary node. Then, in the event that node 1 dies, Swarm would bring the DB up on node 2.
With the mesh network stuff they've added the endpoint would remain the same, so all your apps would need to do would be re-connect.
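The 1.12 tagging mechanism being described looks roughly like this (the label, node, and service names are made up):

```shell
# Label the node that holds the DB volumes
docker node update --label-add storage=db node1

# Constrain the service to run only on nodes with that label
docker service create --name pgsql \
  --constraint 'node.labels.storage == db' \
  --mount type=bind,source=/srv/pgdata,target=/var/lib/postgresql/data \
  postgres:9.5
```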
> Not saying that's a good idea, but it is getting closer.
No, that isn't getting closer; it's getting farther away. The whole point of containers is that they make the host machine completely fungible. If I can only schedule my DB containers on a specific machine, then I might as well just run my DB on that machine and be done with it.
My personal opinion is, Docker's good when you need to be sure you deploy exactly the same stuff on multiple hosts. This includes ensuring that the code you try on local machine would be the same on the production hosts. Docker is completely worthless for deploying singleton services that you already have packaging and deployment recipes for - it just doesn't add any value for such use cases.
That is, Docker is just a packaging system plus a tool to spawn the deployed stuff in an isolated environment. Now, 3 years later, they've improved dependency handling (previously done with docker-compose plus hacks to use it in production).
That is, I use Docker only because its images are somewhat easier to build and deploy compared to .deb packages or plain tarballs (and it's more OS-agnostic, since Docker Machine is available on Windows and OS X, so I can just build an image with the dev environment and not care what host OS is used). I doubt it's anything more than that.
> My personal opinion is, Docker's good when you need to be sure you deploy exactly the same stuff on multiple hosts.
There is a technology for that already: it's called OS packaging. Docker does not maintain enough metadata to allow for in-place upgrades of individual components in an image (software and configuration lifecycle management). The best you can do with Docker is install lumps, you can forget upgrades. Docker is not a replacement for configuration management, and specifically, Docker is not a replacement for the operating system's software management subsystem.
Yes, you're right. I've mentioned that technology in the comment above. ;)
Docker is just a quick-and-dirty way to get a full system image. It's somewhat simpler than building a proper .deb (.rpm, .tar.xz, whatever one's distro eats) package, especially when there's a ton of dependencies from other packaging systems (PyPI, Ruby gems, npm packages, etc.).
Oh, and unlike with many popular OS packaging systems, with Docker you can actually have multiple versions of the same "package" "installed" at the same time (that's my biggest issue with dpkg), but IIRC there is no built-in provision for migration scripts (prerm, postinst - this sort of stuff).
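What I mean by multiple versions side by side (the image and tag names are made up):

```shell
# dpkg won't let you install myapp 1.2 and 1.3 at the same time;
# with images that's the normal case
docker run -d --name myapp-old -p 8001:8000 myorg/myapp:1.2
docker run -d --name myapp-new -p 8002:8000 myorg/myapp:1.3
```

Both versions run concurrently on different ports, which makes gradual cutovers trivial; the part dpkg handles that Docker doesn't is the prerm/postinst-style migration hooks.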
Containers are actually resource constraints, and were first introduced in Solaris 10, along with zones, which are what you appear to mean by containers. Even on GNU/Linux, containers are implemented via cgroups, which are resource controls and not a virtualization solution.
On SmartOS, when you provision a zone, you get a fully virtualized UNIX server, and you can apply a container to it by defining resource constraints, but that is both pointless and unnecessary there. Once you have a fully virtualized server provisioned from an image with imgadm(1M) and vmadm(1M), it is only logical that you will want to service individual components via pkg_delete and pkg_add, rather than baking an entire image all over again, and redeploying it, all over again. It's the rule of modularity: "write simple parts connected by clean interfaces" [Kernighan-Plauger], and it applies particularly well to lightweight virtualization.
I see how that was confusing. I meant that the fact that Swarm doesn't offer such a guarantee is the reason that storing the data on the host isn't a solution. My real complaint is that none of these give you a way to do persistence that doesn't destroy some other really nice properties of containers.
There are lots of people trying to fix this problem the right way. Kubernetes has made a lot of progress on having volumes follow containers around from host to host; Torus is a slightly different approach, and Flocker is another one.
> I don't think it's docker's responsibility to solve database clustering.
I don't know if I'd call it a responsibility. But Docker is obviously trying to expand their platform into more and more aspects of containerization. If they figured out persistence that would really set them apart, something that this product doesn't really do. It mostly just keeps them even with Kubernetes at best. And an imitation of it at worst.
Disagree...
Also, disclaimer: I work at Docker, and in particular spend a large amount of time working on storage for the Docker engine.
Volumes are exactly what you should be using to have data that lives beyond the life of the container.
Essentially, if you are writing to disk in your container, you should probably be writing it to a volume.
Volumes can be backed by any number of storage solutions, including every cloud provider's offering, various distributed block stores, Gluster, Ceph, etc.
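For example, creating a volume on an external driver and handing it to a container (the volume and image names are illustrative; Flocker is one of the drivers mentioned elsewhere in this thread):

```shell
# Create a volume backed by an external storage driver
docker volume create --driver flocker --name important_data

# Any container that mounts it gets data that outlives the container
docker run -v important_data:/var/lib/app myapp
```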
Docker should _not_ be changing the way you run a database. If you need a highly available database, you don't use one database process and hope for the best; you use the built-in replication features available in just about every database product in existence today... database goes down, oh well, you have a hot standby ready.
> My real complaint is that none of these give you a way to do persistence that doesn't destroy some other really nice properties of containers.
100% agree with you there.
> It mostly just keeps them even with Kubernetes at best. And an imitation of it at worst.
Personally, when I looked at Kubernetes, Mesos, etc. and saw the YAML/JSON configuration hell I'd be living in, I said "no way".
If I have to run something as complex as those to use Docker, then it's not worth it to me.
When I look at the new Docker Swarm stuff, for the first time I think that Docker is a viable thing (assuming it all works), because I'm not adding additional complexity to get my Docker simplicity.
Not sure why you are being downvoted. My sentiment is similar... Docker's historical advantage over other solutions is its easy-to-use interface (including the Dockerfile). Can't wait to try Swarm out.
This is very much the wrong way to approach storage (and sorry, for being so blunt, not sure how else to say it!).
Docker has support for a wide array of storage solutions: everything from the cloud providers' offerings (EBS, S3, GCE storage, Azure storage) to vSphere, NetApp, ScaleIO, various distributed block solutions, NFS, etc.
You should _not_ be changing the way you handle storage just because you are in a container. Use the tools you already have.
If you need HA on a database, use the built-in replication service available on just about every database product in existence.
If you really want distributed/shared storage, see the list above.
> If you need HA on a database, use the built-in replication service available on just about every database product in existence.
There are often valid reasons not to do this. For example, MySQL does not guarantee consistency between a source and its replicas. mysqlrplsync can do this as a band-aid, but it's something extra you need to set up and configure.
I do not think this is a reason to not use replication. In this sense, no distributed system guarantees consistency. If a node goes down (particularly a leader!) while trying to distribute a new update, you are very likely to run into issues.
> You should _not_ be changing the way you handle storage just because you are in a container.
You mean exactly like zones handle it inside of SmartOS (;-))
The ZFS storage pool is abstracted away from the zone the virtual server runs on, and since it is persistent, it's completely out of the way, JustWorks(SM).
> your database gets rescheduled and emptied along the way
That's bundling a CP application into seemingly AP infrastructure (from what I gleaned from their docs). Completely the wrong tool for the job. If the application had been designed for that, it would have recovered. In your defense, there is nothing warning about that in their documentation.
This isn't really a CAP theorem problem and it's certainly not right to describe Docker as "AP infrastructure." What would that mean exactly? Would you expect this problem to go away if I used a database that sacrificed consistency for availability such as Riak?
Swarm (not the whole of Docker) seems like something designed for databases like Riak, yes. A SQL failover cluster might work, but certainly not a single database container on infrastructure designed for highly available apps.
Why not run the database directly in the container as well? We're doing that with https://crate.io - just expose your local instance store as a volume, and Crate will take care of the rest.