Docker is a trend and everyone wants to be part of it, but not always in a good way. The bad side of this trend is that everything goes into a container: web apps, databases, even GitLab, to the detriment of data persistence, security, and in certain cases performance.
One solution, IMHO, is to use an orchestrator like Swarm or Kubernetes and to study carefully which kinds of volumes your containers need. You can also improve security with a good reverse proxy and, on Kubernetes, with RBAC.
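For the RBAC part, here is a minimal sketch of the idea (the namespace, role, and service account names are invented for the example): a Role that only lets an app's service account read pods in its namespace, and nothing more.

```yaml
# Hypothetical names, only to illustrate the RBAC idea:
# the app's service account may read pods in its namespace and nothing else.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: web
  name: pod-reader
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: web
  name: pod-reader-binding
subjects:
  - kind: ServiceAccount
    name: webapp
    namespace: web
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```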
What are your methods for keeping data safe, securing containers, and getting good performance from your containerized websites?
Top comments (11)
I think that data persistence is the real key here. Web and service applications that store no data themselves are really good candidates for containerising. But when it comes to permanent or critical data storage, I think containerising is a bad idea.
In a world where the cloud has become the de facto way to build modern, scalable solutions, many cloud providers (AWS, Azure, Google) offer cloud-native database solutions that take care of all the scalability and security concerns out of the box, with guaranteed uptime and data safety.
And with enterprise-grade cloud solutions for source control and CI/CD available at virtually no cost, running GitLab/Jenkins and other DevOps tooling yourself seems like a problem we should not have to worry about any more.
As a developer, maintaining a Kubernetes cluster is not something I want to concern myself with. Although Kubernetes and competing solutions are undoubtedly powerful and very flexible, they are complex and expensive pieces of infrastructure to run compared with a cloud-native solution.
Of course, implementing all of the above means you have to belong to an organisation that is not afraid to utilise the cloud to its fullest extent. In my opinion, if it is good enough for massive corporations and governments, it's good enough for anyone.
Those are really good points about data persistence. But couldn't you run a DB in another container and connect to that from your containerised app? I'm an undergrad so I don't know much about it, but I'd love to learn more.
We're in the process of moving over to docker swarm for our prod stuff - we've been using it for QA for about a year.
We mostly started using it as we have quite a few apps (say 30) and some of them are in PHP 5.3, PHP 7.2, Python 2.x, 3.x, golang, node vA-Z, and it was becoming a real hassle to try and keep them updated and working with various OS releases. Also, we wanted to get away from our creaky VM infrastructure, which had been sprouting VMs like crazy to avoid dealing with version conflicts etc.
To answer your specific questions: on the data side we are fairly traditional. RDBMS with decent passwords, reachable over our internal LAN only; storage goes to a local Minio cluster for newer apps, while some are still using NFS (though we're looking at rexray to see if that works out).
The DB setup we ended up with is an overlay Swarm network with mysql-router in it; all of the apps just talk to that as if it were 'the real' DB. mysql-router takes care of passing the traffic out to an 'actually real' traditional MySQL cluster.
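A rough sketch of that layout as a Swarm stack file, not our actual config: the image tag, environment variables, and backend address are placeholders from memory, but the shape is an overlay network shared by the apps and mysql-router, with the router forwarding to the external MySQL cluster.

```yaml
version: "3.8"
networks:
  dbnet:
    driver: overlay                        # apps and the router share this network
services:
  mysql-router:
    image: mysql/mysql-router:8.0          # placeholder tag
    networks: [dbnet]
    environment:
      MYSQL_HOST: mysql-cluster.internal   # the 'actually real' cluster, outside the swarm
      MYSQL_PORT: "3306"
      MYSQL_USER: routeruser               # placeholder credentials
      MYSQL_PASSWORD: change-me
  app:
    image: example/php-app:latest          # stand-in for one of the ~30 apps
    networks: [dbnet]
    environment:
      DB_HOST: mysql-router                # apps see the router as 'the real' DB
```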
We've only just started looking at automatic container scanning, but compared to our old 'just install it and then leave it running for five years' approach, almost everything we do is an improvement!
This is pretty much all behind the corporate firewall - with just a few web services exposed via Traefik to the outside world, and we monitor those like we would any other internet-facing app - nothing special changed for us with it being in a container.
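For the Traefik part, a minimal sketch of how a single service gets exposed (assuming Traefik v2-style labels; the service name, hostname, port, and network are invented for the example). Anything without these labels stays internal to the swarm.

```yaml
services:
  public-api:
    image: example/public-api:latest       # one of the few externally exposed services
    networks:
      - traefik-public                      # hypothetical overlay network shared with Traefik
    deploy:
      labels:
        - "traefik.enable=true"
        - "traefik.http.routers.public-api.rule=Host(`api.example.com`)"
        - "traefik.http.routers.public-api.entrypoints=websecure"
        - "traefik.http.services.public-api.loadbalancer.server.port=8080"
networks:
  traefik-public:
    external: true
```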
The performance seems fine without really doing any work so far; if anything it's better without the overhead of full VMs.
I find people very easily conflate containers with stateless. On a single-host system, containers are a great way of separating your application dependencies from your OS capabilities. Say you are a small business and you have a variety of web apps going on: differing PHP versions, differing Python versions, different module versions, etc. Docker allows you to very simply run independent environments.

You can maintain state in exactly the same way you did before; you just mount the appropriate filesystem location as a volume into the container. Security is immediately improved because you've sandboxed the applications from each other, and upgrades and updates are much simpler to handle. I think this is a very simple and sadly overlooked use of containers. Just because you can, doesn't mean you need to be running many instances of everything. And when you start to expand in the future, you're set in good stead for the new wave of born-in-the-cloud applications, which come container first.
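As a sketch of that single-host pattern (app names, PHP versions, and host paths here are invented for illustration): two apps on different PHP versions, each with its code and state bind-mounted from the host, isolated from each other and from the host OS.

```yaml
version: "3.8"
services:
  legacy-shop:
    image: php:5.6-apache                  # old app keeps its old runtime
    volumes:
      - /srv/legacy-shop/code:/var/www/html
      - /srv/legacy-shop/uploads:/var/www/uploads   # state stays on the host filesystem
  new-portal:
    image: php:8.2-apache                  # new app gets a modern runtime on the same box
    volumes:
      - /srv/new-portal/code:/var/www/html
```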
On the data persistence front, I have a project I'm working on dockerizing that's all about data persistence. I'm thinking of making the app look at a default path when in Docker, auto-mounted from an outside directory passed in as a CLI argument. Probably not the only way, but "containerizing all the things" is, in general, a really good thing, even if current practices, or Docker itself, aren't the final solution.
Use volumes!
So much this
Thanks! I'll make sure to check those out :D
Ah containers and storage.
Docker is great for stateless services, but what's the first thing people do with it? Containerize stateful systems.
Working with all the orchestration platforms (Mesos, k8s, Swarm), this is one of the first problems that has to be solved, since most apps get dockerized pretty quickly without any redesign for working well in Docker and container orchestration systems.
Pick your poison, but most places I've worked with go with some sort of NFS or S3-style system, or with a distributed storage solution like Gluster, Ceph, or Portworx.
Each comes with its own limitations but can work. The biggest problem is to now alter dev design to account for what containers and orchestrators can really do; designing proper stateless services backed by stateful data stores is not an easy task for large enterprise systems.
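For the NFS option mentioned above, a minimal compose-style sketch of a named volume backed by an NFS export (server address and export path are placeholders); Gluster, Ceph, and Portworx each ship their own volume drivers with the same general shape.

```yaml
version: "3.8"
services:
  app:
    image: example/app:latest
    volumes:
      - appdata:/data                      # state outlives any single container or host
volumes:
  appdata:
    driver: local
    driver_opts:
      type: nfs
      o: "addr=10.0.0.5,nfsvers=4,rw"      # placeholder NFS server
      device: ":/exports/appdata"          # placeholder export path
```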
Security has to be approached a lot sooner with containers in distributed systems. Not relying on a known machine with a known key or cert gives IA the shivers...
Speaking of GitLab, having Docker support on their CI platform is a very nice addition, since we can run our pipeline tests against fresh database, cache, or message bus services just by declaring them in the YAML file. Definitely something that should be in a container.
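A minimal .gitlab-ci.yml sketch of that pattern (the runner image, service tags, and variable values are just examples): each pipeline run gets a throwaway Postgres and Redis sitting next to the test job.

```yaml
test:
  image: python:3.11            # example test runner image
  services:
    - postgres:15               # fresh database per pipeline run
    - redis:7                   # fresh cache per pipeline run
  variables:
    POSTGRES_PASSWORD: example
    DATABASE_URL: "postgresql://postgres:example@postgres:5432/postgres"
  script:
    - pip install -r requirements.txt
    - pytest
```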
As for services that don't absolutely need a container, I posted a short remark at the end of my earlier post here.
Serving Gatsby Site With Docker Multi-Stage Build (Niko Heikkilä)
You should be using volumes for data when using Docker containers. The data does not belong inside the actual container.
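A minimal compose sketch of what that means in practice (image name and mount path are just examples): the database files live on a named volume, so you can rebuild or replace the container without touching the data.

```yaml
version: "3.8"
services:
  db:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: example
    volumes:
      - dbdata:/var/lib/postgresql/data    # data lives in the volume, not in the container
volumes:
  dbdata:                                   # survives container rebuilds and image upgrades
```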
Docker, when used correctly, IS a good thing :-)