RUN ...
RUN ...
COPY ...
RUN ...
FROM ...
RUN ...
COPY ...
RUN ...
CMD, EXPOSE ...
```

* The build fails as soon as an instruction fails

* If `RUN` fails, the build doesn't produce an image

* If it succeeds, it produces a clean image (without test libraries and data)

.debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)]

---

class: pic

.interstitial[![Image separating from the next part](https://gallant-turing-d0d520.netlify.com/containers/container-housing.jpg)]

---

name: toc-dockerfile-examples
class: title

Dockerfile examples

.nav[
[Previous part](#toc-tips-for-efficient-dockerfiles)
|
[Back to table of contents](#toc-part-5)
|
[Next part](#toc-advanced-dockerfile-syntax)
]

.debug[(automatically generated title slide)]

---

# Dockerfile examples

There are a number of tips, tricks, and techniques that we can use in Dockerfiles.

But sometimes, we have to use different (and even opposed) practices depending on:

- the complexity of our project,

- the programming language or framework that we are using,

- the stage of our project (early MVP vs. super-stable production),

- whether we're building a final image or a base for further images,

- etc.

We are going to show a few examples using very different techniques.

.debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)]

---

## When to optimize an image

When authoring official images, it is a good idea to reduce as much as possible:

- the number of layers,

- the size of the final image.

This is often done at the expense of build time and convenience for the image maintainer; but when an image is downloaded millions of times, saving even a few seconds of pull time can be worth it.
.small[

```dockerfile
RUN apt-get update && apt-get install -y libpng12-dev libjpeg-dev && rm -rf /var/lib/apt/lists/* \
 && docker-php-ext-configure gd --with-png-dir=/usr --with-jpeg-dir=/usr \
 && docker-php-ext-install gd
...
RUN curl -o wordpress.tar.gz -SL https://wordpress.org/wordpress-${WORDPRESS_UPSTREAM_VERSION}.tar.gz \
 && echo "$WORDPRESS_SHA1 *wordpress.tar.gz" | sha1sum -c - \
 && tar -xzf wordpress.tar.gz -C /usr/src/ \
 && rm wordpress.tar.gz \
 && chown -R www-data:www-data /usr/src/wordpress
```

]

(Source: [Wordpress official image](https://github.com/docker-library/wordpress/blob/618490d4bdff6c5774b84b717979bfe3d6ba8ad1/apache/Dockerfile))

.debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)]

---

## When to *not* optimize an image

Sometimes, it is better to prioritize *maintainer convenience*.

In particular, if:

- the image changes a lot,

- the image has very few users (e.g. only 1, the maintainer!),

- the image is built and run on the same machine,

- the image is built and run on machines with a very fast link ...

In these cases, just keep things simple!

(Next slide: a Dockerfile that can be used to preview a Jekyll / github pages site.)

.debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)]

---

```dockerfile
FROM debian:sid

RUN apt-get update -q
RUN apt-get install -yq build-essential make
RUN apt-get install -yq zlib1g-dev
RUN apt-get install -yq ruby ruby-dev
RUN apt-get install -yq python-pygments
RUN apt-get install -yq nodejs
RUN apt-get install -yq cmake
RUN gem install --no-rdoc --no-ri github-pages

COPY . /blog
WORKDIR /blog

VOLUME /blog/_site

EXPOSE 4000

CMD ["jekyll", "serve", "--host", "0.0.0.0", "--incremental"]
```

.debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)]

---

## Multi-dimensional versioning systems

Images can have a tag, indicating the version of the image.

But sometimes, there are multiple important components, and we need to indicate the versions for all of them.

This can be done with environment variables:

```dockerfile
ENV PIP=9.0.3 \
    ZC_BUILDOUT=2.11.2 \
    SETUPTOOLS=38.7.0 \
    PLONE_MAJOR=5.1 \
    PLONE_VERSION=5.1.0 \
    PLONE_MD5=76dc6cfc1c749d763c32fff3a9870d8d
```

(Source: [Plone official image](https://github.com/plone/plone.docker/blob/master/5.1/5.1.0/alpine/Dockerfile))

.debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)]

---

## Entrypoints and wrappers

It is very common to define a custom entrypoint.

That entrypoint will generally be a script, performing any combination of:

- pre-flight checks (if a required dependency is not available, display a nice error message early instead of an obscure one in a deep log file),

- generation or validation of configuration files,

- dropping privileges (with e.g. `su` or `gosu`, sometimes combined with `chown`),

- and more.

.debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)]

---

## A typical entrypoint script

```sh
#!/bin/sh
set -e

# first arg is '-f' or '--some-option'
# or first arg is 'something.conf'
if [ "${1#-}" != "$1" ] || [ "${1%.conf}" != "$1" ]; then
  set -- redis-server "$@"
fi

# allow the container to be started with '--user'
if [ "$1" = 'redis-server' -a "$(id -u)" = '0' ]; then
  chown -R redis .
  exec su-exec redis "$0" "$@"
fi

exec "$@"
```

(Source: [Redis official image](https://github.com/docker-library/redis/blob/d24f2be82673ccef6957210cc985e392ebdc65e4/4.0/alpine/docker-entrypoint.sh))

.debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)]

---

## Factoring information

To facilitate maintenance (and avoid human errors), avoid repeating information like:

- version numbers,

- remote asset URLs (e.g. source tarballs) ...

Instead, use environment variables.

.small[

```dockerfile
ENV NODE_VERSION 10.2.1
...
RUN ... \
 && curl -fsSLO --compressed "https://nodejs.org/dist/v$NODE_VERSION/node-v$NODE_VERSION.tar.xz" \
 && curl -fsSLO --compressed "https://nodejs.org/dist/v$NODE_VERSION/SHASUMS256.txt.asc" \
 && gpg --batch --decrypt --output SHASUMS256.txt SHASUMS256.txt.asc \
 && grep " node-v$NODE_VERSION.tar.xz\$" SHASUMS256.txt | sha256sum -c - \
 && tar -xf "node-v$NODE_VERSION.tar.xz" \
 && cd "node-v$NODE_VERSION" \
 ...
```

]

(Source: [Nodejs official image](https://github.com/nodejs/docker-node/blob/master/10/alpine/Dockerfile))

.debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)]

---

## Overrides

In theory, development and production images should be the same.

In practice, we often need to enable specific behaviors in development (e.g. debug statements).

One way to reconcile both needs is to use Compose to enable these behaviors.

Let's look at the [trainingwheels](https://github.com/jpetazzo/trainingwheels) demo app for an example.

.debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)]

---

## Production image

This Dockerfile builds an image leveraging gunicorn:

```dockerfile
FROM python
RUN pip install flask
RUN pip install gunicorn
RUN pip install redis
COPY . /src
WORKDIR /src
CMD gunicorn --bind 0.0.0.0:5000 --workers 10 counter:app
EXPOSE 5000
```

(Source: [trainingwheels Dockerfile](https://github.com/jpetazzo/trainingwheels/blob/master/www/Dockerfile))

.debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)]

---

## Development Compose file

This Compose file uses the same image, but with a few overrides for development:

- the Flask development server is used (overriding `CMD`),

- the `DEBUG` environment variable is set,

- a volume is used to provide a faster local development workflow.

.small[

```yaml
services:
  www:
    build: www
    ports:
      - 8000:5000
    user: nobody
    environment:
      DEBUG: 1
    command: python counter.py
    volumes:
      - ./www:/src
```

]

(Source: [trainingwheels Compose file](https://github.com/jpetazzo/trainingwheels/blob/master/docker-compose.yml))

.debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)]

---

## How to know which best practices are better?

- The main goal of containers is to make our lives easier.

- In this chapter, we showed many ways to write Dockerfiles.

- These Dockerfiles sometimes use diametrically opposed techniques.

- Yet, they were the "right" ones *for a specific situation.*

- It's OK (and even encouraged) to start simple and evolve as needed.

- Feel free to review this chapter later (after writing a few Dockerfiles) for inspiration!

???
:EN:Optimizing images
:EN:- Dockerfile tips, tricks, and best practices
:EN:- Reducing build time
:EN:- Reducing image size

:FR:Optimiser ses images
:FR:- Bonnes pratiques, trucs et astuces
:FR:- Réduire le temps de build
:FR:- Réduire la taille des images

.debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)]

---

class: pic

.interstitial[![Image separating from the next part](https://gallant-turing-d0d520.netlify.com/containers/containers-by-the-water.jpg)]

---

name: toc-advanced-dockerfile-syntax
class: title

Advanced Dockerfile Syntax

.nav[
[Previous part](#toc-dockerfile-examples)
|
[Back to table of contents](#toc-part-5)
|
[Next part](#toc-orchestration-an-overview)
]

.debug[(automatically generated title slide)]

---

class: title

# Advanced Dockerfile Syntax

![construction](images/title-advanced-dockerfiles.jpg)

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

## Objectives

We have seen simple Dockerfiles to illustrate how Docker builds container images.

In this section, we will give a recap of the Dockerfile syntax, and introduce advanced Dockerfile commands that we might come across sometimes, or that we might want to use in some specific scenarios.

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

## `Dockerfile` usage summary

* `Dockerfile` instructions are executed in order.

* Each instruction creates a new layer in the image.

* Docker maintains a cache with the layers of previous builds.

* When there are no changes in the instructions and files making a layer, the builder re-uses the cached layer, without executing the instruction for that layer.

* The `FROM` instruction MUST be the first non-comment instruction.
* Lines starting with `#` are treated as comments.

* Some instructions (like `CMD` or `ENTRYPOINT`) update a piece of metadata.

  (As a result, each call to these instructions overrides the previous one.)

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

## The `RUN` instruction

The `RUN` instruction can be specified in two ways.

With shell wrapping, which runs the specified command inside a shell, with `/bin/sh -c`:

```dockerfile
RUN apt-get update
```

Or using the `exec` method, which avoids shell string expansion, and allows execution in images that don't have `/bin/sh`:

```dockerfile
RUN [ "apt-get", "update" ]
```

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

## More about the `RUN` instruction

`RUN` will do the following:

* Execute a command.

* Record changes made to the filesystem.

* Work great to install libraries, packages, and various files.

`RUN` will NOT do the following:

* Record state of *processes*.

* Automatically start daemons.

If you want to start something automatically when the container runs, you should use `CMD` and/or `ENTRYPOINT`.
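To make the build-time vs. run-time distinction concrete, here is a minimal sketch (the package and command are just placeholders):

```dockerfile
# RUN executes during `docker build`; the resulting filesystem
# changes (the installed package) are committed to the image
RUN apt-get update && apt-get install -y nginx

# CMD merely records metadata; nothing runs until `docker run`
CMD ["nginx", "-g", "daemon off;"]
```

A `RUN nginx` line would start the server during the build, but the process would not survive into the image: images store filesystems and metadata, not running processes.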
.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

## Collapsing layers

It is possible to execute multiple commands in a single step:

```dockerfile
RUN apt-get update && apt-get install -y wget && apt-get clean
```

It is also possible to break a command onto multiple lines:

```dockerfile
RUN apt-get update \
 && apt-get install -y wget \
 && apt-get clean
```

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

## The `EXPOSE` instruction

The `EXPOSE` instruction tells Docker what ports are to be published in this image.

```dockerfile
EXPOSE 8080
EXPOSE 80 443
EXPOSE 53/tcp 53/udp
```

* All ports are private by default.

* Declaring a port with `EXPOSE` is not enough to make it public.

* The `Dockerfile` doesn't control on which port a service gets exposed.

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

## Exposing ports

* When you `docker run -p ...`, that port becomes public.

  (Even if it was not declared with `EXPOSE`.)

* When you `docker run -P ...` (without port number), all ports declared with `EXPOSE` become public.

A *public port* is reachable from other containers and from outside the host.

A *private port* is not reachable from outside.

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

## The `COPY` instruction

The `COPY` instruction adds files and content from your host into the image.

```dockerfile
COPY . /src
```

This will add the contents of the *build context* (the directory passed as an argument to `docker build`) to the directory `/src` in the container.
.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

## Build context isolation

Note: you can only reference files and directories *inside* the build context.

Absolute paths are taken as being anchored to the build context, so the two following lines are equivalent:

```dockerfile
COPY . /src
COPY / /src
```

Attempts to use `..` to get out of the build context will be detected and blocked by Docker, and the build will fail.

Otherwise, a `Dockerfile` could succeed on host A, but fail on host B.

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

## `ADD`

`ADD` works almost like `COPY`, but has a few extra features.

`ADD` can get remote files:

```dockerfile
ADD http://www.example.com/webapp.jar /opt/
```

This would download the `webapp.jar` file and place it in the `/opt` directory.

`ADD` will automatically unpack zip files and tar archives:

```dockerfile
ADD ./assets.zip /var/www/htdocs/assets/
```

This would unpack `assets.zip` into `/var/www/htdocs/assets`.

*However,* `ADD` will not automatically unpack remote archives.

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

## `ADD`, `COPY`, and the build cache

* Before creating a new layer, Docker checks its build cache.

* For most Dockerfile instructions, Docker only looks at the `Dockerfile` content to do the cache lookup.

* For `ADD` and `COPY` instructions, Docker also checks if the files to be added to the container have been changed.

* `ADD` always needs to download the remote file before it can check if it has been changed.

  (It cannot use, e.g., ETags or If-Modified-Since headers.)
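One common way to play nice with this cache is to copy the files that change rarely before the files that change often. A sketch, assuming a hypothetical Python project with a `requirements.txt` manifest:

```dockerfile
# this layer is rebuilt only when requirements.txt changes
COPY requirements.txt /src/
RUN pip install -r /src/requirements.txt

# editing the source code only invalidates the cache from here on
COPY . /src/
```

If the two `COPY` lines were merged, every source change would also re-run the dependency installation.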
.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

## `VOLUME`

The `VOLUME` instruction tells Docker that a specific directory should be a *volume*.

```dockerfile
VOLUME /var/lib/mysql
```

Filesystem access in volumes bypasses the copy-on-write layer, offering native performance to I/O done in those directories.

Volumes can be attached to multiple containers, allowing data to be "ported" from one container to another, e.g. to upgrade a database to a newer version.

It is possible to start a container in "read-only" mode. The container filesystem will be made read-only, but volumes can still have read/write access if necessary.

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

## The `WORKDIR` instruction

The `WORKDIR` instruction sets the working directory for subsequent instructions.

It also affects `CMD` and `ENTRYPOINT`, since it sets the working directory used when starting the container.

```dockerfile
WORKDIR /src
```

You can specify `WORKDIR` again to change the working directory for further operations.

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

## The `ENV` instruction

The `ENV` instruction specifies environment variables that should be set in any container launched from the image.

```dockerfile
ENV WEBAPP_PORT 8080
```

This will result in the following environment variable being created in any container started from this image:

```bash
WEBAPP_PORT=8080
```

You can also specify environment variables when you use `docker run`:

```bash
$ docker run -e WEBAPP_PORT=8000 -e WEBAPP_HOST=www.example.com ...
```

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

## The `USER` instruction

The `USER` instruction sets the user name or UID to use when running the image.

It can be used multiple times to change back to root or to another user.

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

## The `CMD` instruction

The `CMD` instruction specifies the default command to run when a container is launched from the image.

```dockerfile
CMD [ "nginx", "-g", "daemon off;" ]
```

This means we don't need to specify `nginx -g "daemon off;"` when running the container.

Instead of:

```bash
$ docker run /web_image nginx -g "daemon off;"
```

We can just do:

```bash
$ docker run /web_image
```

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

## More about the `CMD` instruction

Just like `RUN`, the `CMD` instruction comes in two forms.

The first executes in a shell:

```dockerfile
CMD nginx -g "daemon off;"
```

The second executes directly, without shell processing:

```dockerfile
CMD [ "nginx", "-g", "daemon off;" ]
```

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

class: extra-details

## Overriding the `CMD` instruction

The `CMD` can be overridden when you run a container.

```bash
$ docker run -it /web_image bash
```

Will run `bash` instead of `nginx -g "daemon off;"`.
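Whichever command ends up running, note a practical difference between the two `CMD` forms shown above: the shell form is processed by `/bin/sh -c`, so environment variables are expanded at run time, while the exec form passes its arguments verbatim. A sketch (reusing the `WEBAPP_PORT` variable from the `ENV` example):

```dockerfile
# shell form: /bin/sh expands $WEBAPP_PORT when the container starts
CMD echo "listening on $WEBAPP_PORT"

# exec form: no shell is involved, so the literal string
# "$WEBAPP_PORT" would be passed to echo
# CMD ["echo", "listening on $WEBAPP_PORT"]
```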
.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

## The `ENTRYPOINT` instruction

The `ENTRYPOINT` instruction is like the `CMD` instruction, but arguments given on the command line are *appended* to the entry point.

Note: you have to use the "exec" syntax (`[ "..." ]`).

```dockerfile
ENTRYPOINT [ "/bin/ls" ]
```

If we were to run:

```bash
$ docker run training/ls -l
```

Instead of trying to run `-l`, the container will run `/bin/ls -l`.

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

class: extra-details

## Overriding the `ENTRYPOINT` instruction

The entry point can be overridden as well.

```bash
$ docker run -it training/ls
bin   dev  home  lib64  mnt  proc  run   srv  tmp  var
boot  etc  lib   media  opt  root  sbin  sys  usr
$ docker run -it --entrypoint bash training/ls
root@d902fb7b1fc7:/#
```

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

## How `CMD` and `ENTRYPOINT` interact

The `CMD` and `ENTRYPOINT` instructions work best when used together.

```dockerfile
ENTRYPOINT [ "nginx" ]
CMD [ "-g", "daemon off;" ]
```

The `ENTRYPOINT` specifies the command to be run and the `CMD` specifies its options. On the command line we can then potentially override the options when needed.

```bash
$ docker run -d /web_image -t
```

This will override the options `CMD` provided with new flags.

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

## Advanced Dockerfile instructions

* `ONBUILD` lets you stash instructions that will be executed when this image is used as a base for another one.

* `LABEL` adds arbitrary metadata to the image.
* `ARG` defines build-time variables (optional or mandatory).

* `STOPSIGNAL` sets the signal for `docker stop` (`TERM` by default).

* `HEALTHCHECK` defines a command assessing the status of the container.

* `SHELL` sets the default program to use for string-syntax `RUN`, `CMD`, etc.

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

class: extra-details

## The `ONBUILD` instruction

The `ONBUILD` instruction is a trigger. It sets instructions that will be executed when another image is built from the image being built.

This is useful for building images which will be used as a base to build other images.

```dockerfile
ONBUILD COPY . /src
```

* You can't chain `ONBUILD` instructions with `ONBUILD`.

* `ONBUILD` can't be used to trigger `FROM` instructions.

???

:EN:- Advanced Dockerfile syntax
:FR:- Dockerfile niveau expert

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

class: pic

.interstitial[![Image separating from the next part](https://gallant-turing-d0d520.netlify.com/containers/distillery-containers.jpg)]

---

name: toc-orchestration-an-overview
class: title

Orchestration, an overview

.nav[
[Previous part](#toc-advanced-dockerfile-syntax)
|
[Back to table of contents](#toc-part-6)
|
[Next part](#toc-reducing-image-size)
]

.debug[(automatically generated title slide)]

---

# Orchestration, an overview

In this chapter, we will:

* Explain what orchestration is and why we would need it.

* Present (from a high-level perspective) some orchestrators.

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

class: pic

## What's orchestration?
![Joana Carneiro (orchestra conductor)](images/conductor.jpg)

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

## What's orchestration?

According to Wikipedia:

*Orchestration describes the __automated__ arrangement, coordination, and management of complex computer systems, middleware, and services.*

--

*[...] orchestration is often discussed in the context of __service-oriented architecture__, __virtualization__, provisioning, Converged Infrastructure and __dynamic datacenter__ topics.*

--

What does that really mean?

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

## Example 1: dynamic cloud instances

--

- Q: do we always use 100% of our servers?

--

- A: obviously not!

.center[![Daily variations of traffic](images/traffic-graph.png)]

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

## Example 1: dynamic cloud instances

- Every night, scale down (by shutting down extraneous replicated instances)

- Every morning, scale up (by deploying new copies)

- "Pay for what you use" (i.e. save big $$$ here)

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

## Example 1: dynamic cloud instances

How do we implement this?

- Crontab

- Autoscaling (save even bigger $$$)

That's *relatively* easy.

Now, how are things for our IAAS provider?

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

## Example 2: dynamic datacenter

- Q: what's the #1 cost in a datacenter?

--

- A: electricity!

--

- Q: what uses electricity?
--

- A: servers, obviously

- A: ... and associated cooling

--

- Q: do we always use 100% of our servers?

--

- A: obviously not!

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

## Example 2: dynamic datacenter

- If only we could turn off unused servers during the night...

- Problem: we can only turn off a server if it's totally empty!

  (i.e. all VMs on it are stopped/moved)

- Solution: *migrate* VMs and shutdown empty servers

  (e.g. combine two hypervisors with 40% load into 80%+0%, and shut down the one at 0%)

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

## Example 2: dynamic datacenter

How do we implement this?

- Shut down empty hosts (but keep some spare capacity)

- Start hosts again when capacity gets low

- Ability to "live migrate" VMs (Xen already did this 10+ years ago)

- Rebalance VMs on a regular basis

  - what if a VM is stopped while we move it?

  - should we allow provisioning on hosts involved in a migration?

*Scheduling* becomes more complex.

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

## What is scheduling?

According to Wikipedia (again):

*In computing, scheduling is the method by which threads, processes or data flows are given access to system resources.*

The scheduler is concerned mainly with:

- throughput (total amount of work done per time unit);

- turnaround time (between submission and completion);

- response time (between submission and start);

- waiting time (between job readiness and execution);

- fairness (appropriate times according to priorities).

In practice, these goals often conflict.
**"Scheduling" = decide which resources to use.**

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

## Exercise 1

- You have:

  - 5 hypervisors (physical machines)

- Each server has:

  - 16 GB RAM, 8 cores, 1 TB disk

- Each week, your team requests:

  - one VM with X RAM, Y CPU, Z disk

Scheduling = deciding which hypervisor to use for each VM.

Difficulty: easy!

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

## Exercise 2

- You have:

  - 1000+ hypervisors (and counting!)

- Each server has different resources:

  - 8-500 GB of RAM, 4-64 cores, 1-100 TB disk

- Multiple times a day, a different team asks for:

  - up to 50 VMs with different characteristics

Scheduling = deciding which hypervisor to use for each VM.

Difficulty: ???

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

## Exercise 2

- You have:

  - 1000+ hypervisors (and counting!)

- Each server has different resources:

  - 8-500 GB of RAM, 4-64 cores, 1-100 TB disk

- Multiple times a day, a different team asks for:

  - up to 50 VMs with different characteristics

Scheduling = deciding which hypervisor to use for each VM.

![Troll face](images/trollface.png)

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

## Exercise 3

- You have machines (physical and/or virtual)

- You have containers

- You are trying to put the containers on the machines

- Sounds familiar?
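These exercises are instances of *bin packing*. A toy illustration (Python, purely illustrative, not how any real orchestrator works) of the "first fit" strategy, and of why the order in which jobs arrive matters:

```python
def first_fit(jobs, capacity):
    """Place each job (resource demand) in the first bin (hypervisor) with room."""
    bins = []
    for job in jobs:
        for b in bins:
            if sum(b) + job <= capacity:
                b.append(job)
                break
        else:
            bins.append([job])  # no existing bin fits: provision a new one
    return bins

jobs = [4, 4, 4, 6, 6, 6]
print(len(first_fit(jobs, 10)))                        # arrival order: 4 bins
print(len(first_fit(sorted(jobs, reverse=True), 10)))  # largest first: 3 bins
```

Real schedulers juggle several dimensions at once (RAM, CPU, disk), and even this one-dimensional version is NP-hard.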
.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

class: pic

## Scheduling with one resource

.center[![Not-so-good bin packing](images/binpacking-1d-1.gif)]

## We can't fit a job of size 6 :(

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

class: pic

## Scheduling with one resource

.center[![Better bin packing](images/binpacking-1d-2.gif)]

## ... Now we can!

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

class: pic

## Scheduling with two resources

.center[![2D bin packing](images/binpacking-2d.gif)]

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

class: pic

## Scheduling with three resources

.center[![3D bin packing](images/binpacking-3d.gif)]

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

class: pic

## You need to be good at this

.center[![Tangram](images/tangram.gif)]

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

class: pic

## But also, you must be quick!

.center[![Tetris](images/tetris-1.png)]

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

class: pic

## And be web scale!
.center[![Big tetris](images/tetris-2.gif)]

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

class: pic

## And think outside (?) of the box!

.center[![3D tetris](images/tetris-3.png)]

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

class: pic

## Good luck!

.center[![FUUUUUU face](images/fu-face.jpg)]

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

## TL,DR

* Scheduling with multiple resources (dimensions) is hard.

* Don't expect to solve the problem with a Tiny Shell Script.

* There are literally tons of research papers written on this.

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

## But our orchestrator also needs to manage ...

* Network connectivity (or filtering) between containers.

* Load balancing (external and internal).

* Failure recovery (if a node or a whole datacenter fails).

* Rolling out new versions of our applications.

  (Canary deployments, blue/green deployments...)

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

## Some orchestrators

We are going to briefly present a few orchestrators.

There is no "absolute best" orchestrator.

It depends on:

- your applications,

- your requirements,

- your pre-existing skills...

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

## Nomad

- Open Source project by Hashicorp.

- Arbitrary scheduler (not just for containers).
- Great if you want to schedule mixed workloads. (VMs, containers, processes...) - Less integration with the rest of the container ecosystem. .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- ## Mesos - Open Source project in the Apache Foundation. - Arbitrary scheduler (not just for containers). - Two-level scheduler. - Top-level scheduler acts as a resource broker. - Second-level schedulers (aka "frameworks") obtain resources from top-level. - Frameworks implement various strategies. (Marathon = long-running processes; Chronos = run at intervals; ...) - Commercial offering through DC/OS by Mesosphere. .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- ## Rancher - Rancher 1 offered a simple interface for Docker hosts. - Rancher 2 is a complete management platform for Docker and Kubernetes. - Technically not an orchestrator, but it's a popular option. .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- ## Swarm - Tightly integrated with the Docker Engine. - Extremely simple to deploy and set up, even in multi-manager (HA) mode. - Secure by default. - Strongly opinionated: - smaller set of features, - easier to operate. .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- ## Kubernetes - Open Source project initiated by Google. - Contributions from many other actors. - *De facto* standard for container orchestration. - Many deployment options; some of them very complex. - Reputation: steep learning curve. - Reality: - true, if we try to understand *everything*; - false, if we focus on what matters. 
.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- class: pic .interstitial[![Image separating from the next part](https://gallant-turing-d0d520.netlify.com/containers/lots-of-containers.jpg)] --- name: toc-reducing-image-size class: title Reducing image size .nav[ [Previous part](#toc-orchestration-an-overview) | [Back to table of contents](#toc-part-6) | [Next part](#toc-multi-stage-builds) ] .debug[(automatically generated title slide)] --- # Reducing image size * In the previous example, our final image contained: * our `hello` program * its source code * the compiler * Only the first one is strictly necessary. * We are going to see how to obtain an image without the superfluous components. .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Can't we remove superfluous files with `RUN`? What happens if we do one of the following commands? - `RUN rm -rf ...` - `RUN apt-get remove ...` - `RUN make clean ...` -- This adds a layer which removes a bunch of files. But the previous layers (which added the files) still exist. .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Removing files with an extra layer When downloading an image, all the layers must be downloaded. | Dockerfile instruction | Layer size | Image size | | ---------------------- | ---------- | ---------- | | `FROM ubuntu` | Size of base image | Size of base image | | `...` | ... | Sum of this layer + all previous ones | | `RUN apt-get install somepackage` | Size of files added (e.g. a few MB) | Sum of this layer + all previous ones | | `...` | ... 
| Sum of this layer + all previous ones | | `RUN apt-get remove somepackage` | Almost zero (just metadata) | Same as previous one | Therefore, `RUN rm` does not reduce the size of the image or free up disk space. .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Removing unnecessary files Various techniques are available to obtain smaller images: - collapsing layers, - adding binaries that are built outside of the Dockerfile, - squashing the final image, - multi-stage builds. Let's review them quickly. .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Collapsing layers You will frequently see Dockerfiles like this: ```dockerfile FROM ubuntu RUN apt-get update && apt-get install xxx && ... && apt-get remove xxx && ... ``` Or the (more readable) variant: ```dockerfile FROM ubuntu RUN apt-get update \ && apt-get install xxx \ && ... \ && apt-get remove xxx \ && ... ``` This `RUN` command gives us a single layer. The files that are added, then removed in the same layer, do not grow the layer size. .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Collapsing layers: pros and cons Pros: - works on all versions of Docker - doesn't require extra tools Cons: - not very readable - some unnecessary files might still remain if the cleanup is not thorough - that layer is expensive (slow to build) .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Building binaries outside of the Dockerfile This results in a Dockerfile looking like this: ```dockerfile FROM ubuntu COPY xxx /usr/local/bin ``` Of course, this implies that the file `xxx` exists in the build context. 
That file has to exist before you can run `docker build`. For instance, it can: - exist in the code repository, - be created by another tool (script, Makefile...), - be created by another container image and extracted from the image. See for instance the [busybox official image](https://github.com/docker-library/busybox/blob/fe634680e32659aaf0ee0594805f74f332619a90/musl/Dockerfile) or this [older busybox image](https://github.com/jpetazzo/docker-busybox). .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Building binaries outside: pros and cons Pros: - final image can be very small Cons: - requires an extra build tool - we're back in dependency hell and "works on my machine" Cons, if binary is added to code repository: - breaks portability across different platforms - grows repository size a lot if the binary is updated frequently .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Squashing the final image The idea is to transform the final image into a single-layer image. This can be done in (at least) two ways. - Activate experimental features and squash the final image: ```bash docker image build --squash ... ``` - Export/import the final image. ```bash docker build -t temp-image . 
docker run --entrypoint true --name temp-container temp-image docker export temp-container | docker import - final-image docker rm temp-container docker rmi temp-image ``` .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Squashing the image: pros and cons Pros: - single-layer images are smaller and faster to download - removed files no longer take up storage and network resources Cons: - we still need to actively remove unnecessary files - squash operation can take a lot of time (on big images) - squash operation does not benefit from cache (even if we change just a tiny file, the whole image needs to be re-squashed) .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Multi-stage builds Multi-stage builds allow us to have multiple *stages*. Each stage is a separate image, and can copy files from previous stages. We're going to see how they work in more detail. .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- class: pic .interstitial[![Image separating from the next part](https://gallant-turing-d0d520.netlify.com/containers/plastic-containers.JPG)] --- name: toc-multi-stage-builds class: title Multi-stage builds .nav[ [Previous part](#toc-reducing-image-size) | [Back to table of contents](#toc-part-6) | [Next part](#toc-exercise--writing-better-dockerfiles) ] .debug[(automatically generated title slide)] --- # Multi-stage builds * At any point in our `Dockerfile`, we can add a new `FROM` line. * This line starts a new stage of our build. * Each stage can access the files of the previous stages with `COPY --from=...`. * When a build is tagged (with `docker build -t ...`), the last stage is tagged. 
* Previous stages are not discarded: they will be used for caching, and can be referenced. .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Multi-stage builds in practice * Each stage is numbered, starting at `0` * We can copy a file from a previous stage by indicating its number, e.g.: ```dockerfile COPY --from=0 /file/from/first/stage /location/in/current/stage ``` * We can also name stages, and reference these names: ```dockerfile FROM golang AS builder RUN ... FROM alpine COPY --from=builder /go/bin/mylittlebinary /usr/local/bin/ ``` .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Multi-stage builds for our C program We will change our Dockerfile to: * give a nickname to the first stage: `compiler` * add a second stage using the same `ubuntu` base image * add the `hello` binary to the second stage * make sure that `CMD` is in the second stage The resulting Dockerfile is on the next slide. .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Multi-stage build `Dockerfile` Here is the final Dockerfile: ```dockerfile FROM ubuntu AS compiler RUN apt-get update RUN apt-get install -y build-essential COPY hello.c / RUN make hello FROM ubuntu COPY --from=compiler /hello /hello CMD /hello ``` Let's build it, and check that it works correctly: ```bash docker build -t hellomultistage . 
docker run hellomultistage ``` .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Comparing single/multi-stage build image sizes List our images with `docker images`, and check the size of: - the `ubuntu` base image, - the single-stage `hello` image, - the multi-stage `hellomultistage` image. We can achieve even smaller images if we use smaller base images. However, if we use common base images (e.g. if we standardize on `ubuntu`), these common images will be pulled only once per node, so they are virtually "free." .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Build targets * We can also tag an intermediary stage with the following command: ```bash docker build --target STAGE --tag NAME ``` * This will create an image (named `NAME`) corresponding to stage `STAGE` * This can be used to easily access an intermediary stage for inspection (instead of parsing the output of `docker build` to find out the image ID) * This can also be used to describe multiple images from a single Dockerfile (instead of using multiple Dockerfiles, which could go out of sync) ??? 
:EN:Optimizing our images and their build process :EN:- Leveraging multi-stage builds :FR:Optimiser les images et leur construction :FR:- Utilisation d'un *multi-stage build* .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- class: pic .interstitial[![Image separating from the next part](https://gallant-turing-d0d520.netlify.com/containers/train-of-containers-1.jpg)] --- name: toc-exercise--writing-better-dockerfiles class: title Exercise — writing better Dockerfiles .nav[ [Previous part](#toc-multi-stage-builds) | [Back to table of contents](#toc-part-6) | [Next part](#toc-) ] .debug[(automatically generated title slide)] --- # Exercise — writing better Dockerfiles Let's update our Dockerfiles to leverage multi-stage builds! The code is at: https://github.com/jpetazzo/wordsmith Use a different tag for these images, so that we can compare their sizes. What's the size difference between single-stage and multi-stage builds? .debug[[containers/Exercise_Dockerfile_Advanced.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Exercise_Dockerfile_Advanced.md)] --- class: title, self-paced Thank you! .debug[[shared/thankyou.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/shared/thankyou.md)] --- class: title, in-person That's all, folks! Questions? ![end](images/end.jpg) .debug[[shared/thankyou.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/shared/thankyou.md)]
RUN CMD, EXPOSE ... ``` * The build fails as soon as an instruction fails * If `RUN` fails, the build doesn't produce an image * If it succeeds, it produces a clean image (without test libraries and data) .debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)] --- class: pic .interstitial[![Image separating from the next part](https://gallant-turing-d0d520.netlify.com/containers/container-housing.jpg)] --- name: toc-dockerfile-examples class: title Dockerfile examples .nav[ [Previous part](#toc-tips-for-efficient-dockerfiles) | [Back to table of contents](#toc-part-5) | [Next part](#toc-advanced-dockerfile-syntax) ] .debug[(automatically generated title slide)] --- # Dockerfile examples There are a number of tips, tricks, and techniques that we can use in Dockerfiles. But sometimes, we have to use different (and even opposed) practices depending on: - the complexity of our project, - the programming language or framework that we are using, - the stage of our project (early MVP vs. super-stable production), - whether we're building a final image or a base for further images, - etc. We are going to show a few examples using very different techniques. .debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)] --- ## When to optimize an image When authoring official images, it is a good idea to reduce as much as possible: - the number of layers, - the size of the final image. This is often done at the expense of build time and convenience for the image maintainer; but when an image is downloaded millions of times, saving even a few seconds of pull time can be worth it. .small[ ```dockerfile RUN apt-get update && apt-get install -y libpng12-dev libjpeg-dev && rm -rf /var/lib/apt/lists/* \ && docker-php-ext-configure gd --with-png-dir=/usr --with-jpeg-dir=/usr \ && docker-php-ext-install gd ... 
RUN curl -o wordpress.tar.gz -SL https://wordpress.org/wordpress-${WORDPRESS_UPSTREAM_VERSION}.tar.gz \ && echo "$WORDPRESS_SHA1 *wordpress.tar.gz" | sha1sum -c - \ && tar -xzf wordpress.tar.gz -C /usr/src/ \ && rm wordpress.tar.gz \ && chown -R www-data:www-data /usr/src/wordpress ``` ] (Source: [Wordpress official image](https://github.com/docker-library/wordpress/blob/618490d4bdff6c5774b84b717979bfe3d6ba8ad1/apache/Dockerfile)) .debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)] --- ## When to *not* optimize an image Sometimes, it is better to prioritize *maintainer convenience*. In particular, if: - the image changes a lot, - the image has very few users (e.g. only 1, the maintainer!), - the image is built and run on the same machine, - the image is built and run on machines with a very fast link ... In these cases, just keep things simple! (Next slide: a Dockerfile that can be used to preview a Jekyll / github pages site.) .debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)] --- ```dockerfile FROM debian:sid RUN apt-get update -q RUN apt-get install -yq build-essential make RUN apt-get install -yq zlib1g-dev RUN apt-get install -yq ruby ruby-dev RUN apt-get install -yq python-pygments RUN apt-get install -yq nodejs RUN apt-get install -yq cmake RUN gem install --no-rdoc --no-ri github-pages COPY . /blog WORKDIR /blog VOLUME /blog/_site EXPOSE 4000 CMD ["jekyll", "serve", "--host", "0.0.0.0", "--incremental"] ``` .debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)] --- ## Multi-dimensional versioning systems Images can have a tag, indicating the version of the image. But sometimes, there are multiple important components, and we need to indicate the versions for all of them. 
This can be done with environment variables: ```dockerfile ENV PIP=9.0.3 \ ZC_BUILDOUT=2.11.2 \ SETUPTOOLS=38.7.0 \ PLONE_MAJOR=5.1 \ PLONE_VERSION=5.1.0 \ PLONE_MD5=76dc6cfc1c749d763c32fff3a9870d8d ``` (Source: [Plone official image](https://github.com/plone/plone.docker/blob/master/5.1/5.1.0/alpine/Dockerfile)) .debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)] --- ## Entrypoints and wrappers It is very common to define a custom entrypoint. That entrypoint will generally be a script, performing any combination of: - pre-flight checks (if a required dependency is not available, display a nice error message early instead of an obscure one in a deep log file), - generation or validation of configuration files, - dropping privileges (with e.g. `su` or `gosu`, sometimes combined with `chown`), - and more. .debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)] --- ## A typical entrypoint script ```bash #!/bin/sh set -e # first arg is '-f' or '--some-option' # or first arg is 'something.conf' if [ "${1#-}" != "$1" ] || [ "${1%.conf}" != "$1" ]; then set -- redis-server "$@" fi # allow the container to be started with '--user' if [ "$1" = 'redis-server' -a "$(id -u)" = '0' ]; then chown -R redis . exec su-exec redis "$0" "$@" fi exec "$@" ``` (Source: [Redis official image](https://github.com/docker-library/redis/blob/d24f2be82673ccef6957210cc985e392ebdc65e4/4.0/alpine/docker-entrypoint.sh)) .debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)] --- ## Factoring information To facilitate maintenance (and avoid human errors), avoid repeating information like: - version numbers, - remote asset URLs (e.g. source tarballs) ... Instead, use environment variables. 
.small[ ```dockerfile ENV NODE_VERSION 10.2.1 ... RUN ... && curl -fsSLO --compressed "https://nodejs.org/dist/v$NODE_VERSION/node-v$NODE_VERSION.tar.xz" \ && curl -fsSLO --compressed "https://nodejs.org/dist/v$NODE_VERSION/SHASUMS256.txt.asc" \ && gpg --batch --decrypt --output SHASUMS256.txt SHASUMS256.txt.asc \ && grep " node-v$NODE_VERSION.tar.xz\$" SHASUMS256.txt | sha256sum -c - \ && tar -xf "node-v$NODE_VERSION.tar.xz" \ && cd "node-v$NODE_VERSION" \ ... ``` ] (Source: [Nodejs official image](https://github.com/nodejs/docker-node/blob/master/10/alpine/Dockerfile)) .debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)] --- ## Overrides In theory, development and production images should be the same. In practice, we often need to enable specific behaviors in development (e.g. debug statements). One way to reconcile both needs is to use Compose to enable these behaviors. Let's look at the [trainingwheels](https://github.com/jpetazzo/trainingwheels) demo app for an example. .debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)] --- ## Production image This Dockerfile builds an image leveraging gunicorn: ```dockerfile FROM python RUN pip install flask RUN pip install gunicorn RUN pip install redis COPY . 
/src WORKDIR /src CMD gunicorn --bind 0.0.0.0:5000 --workers 10 counter:app EXPOSE 5000 ``` (Source: [trainingwheels Dockerfile](https://github.com/jpetazzo/trainingwheels/blob/master/www/Dockerfile)) .debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)] --- ## Development Compose file This Compose file uses the same image, but with a few overrides for development: - the Flask development server is used (overriding `CMD`), - the `DEBUG` environment variable is set, - a volume is used to provide a faster local development workflow. .small[ ```yaml services: www: build: www ports: - 8000:5000 user: nobody environment: DEBUG: 1 command: python counter.py volumes: - ./www:/src ``` ] (Source: [trainingwheels Compose file](https://github.com/jpetazzo/trainingwheels/blob/master/docker-compose.yml)) .debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)] --- ## How do we know which practices are best? - The main goal of containers is to make our lives easier. - In this chapter, we showed many ways to write Dockerfiles. - These Dockerfiles sometimes use diametrically opposed techniques. - Yet, they were the "right" ones *for a specific situation.* - It's OK (and even encouraged) to start simple and evolve as needed. - Feel free to review this chapter later (after writing a few Dockerfiles) for inspiration! ??? 
:EN:Optimizing images :EN:- Dockerfile tips, tricks, and best practices :EN:- Reducing build time :EN:- Reducing image size :FR:Optimiser ses images :FR:- Bonnes pratiques, trucs et astuces :FR:- Réduire le temps de build :FR:- Réduire la taille des images .debug[[containers/Dockerfile_Tips.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Dockerfile_Tips.md)] --- class: pic .interstitial[![Image separating from the next part](https://gallant-turing-d0d520.netlify.com/containers/containers-by-the-water.jpg)] --- name: toc-advanced-dockerfile-syntax class: title Advanced Dockerfile Syntax .nav[ [Previous part](#toc-dockerfile-examples) | [Back to table of contents](#toc-part-5) | [Next part](#toc-orchestration-an-overview) ] .debug[(automatically generated title slide)] --- class: title # Advanced Dockerfile Syntax ![construction](images/title-advanced-dockerfiles.jpg) .debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- ## Objectives We have seen simple Dockerfiles to illustrate how Docker builds container images. In this section, we will give a recap of the Dockerfile syntax, and introduce advanced Dockerfile commands that we might sometimes come across, or want to use in specific scenarios. .debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- ## `Dockerfile` usage summary * `Dockerfile` instructions are executed in order. * Each instruction creates a new layer in the image. * Docker maintains a cache with the layers of previous builds. * When there are no changes in the instructions and files making a layer, the builder re-uses the cached layer, without executing the instruction for that layer. * The `FROM` instruction MUST be the first non-comment instruction. 
* Lines starting with `#` are treated as comments. * Some instructions (like `CMD` or `ENTRYPOINT`) update a piece of metadata. (As a result, each call to these instructions makes the previous one useless.) .debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- ## The `RUN` instruction The `RUN` instruction can be specified in two ways. With shell wrapping, which runs the specified command inside a shell, with `/bin/sh -c`: ```dockerfile RUN apt-get update ``` Or using the `exec` method, which avoids shell string expansion, and allows execution in images that don't have `/bin/sh`: ```dockerfile RUN [ "apt-get", "update" ] ``` .debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- ## More about the `RUN` instruction `RUN` will do the following: * Execute a command. * Record changes made to the filesystem. * Work great to install libraries, packages, and various files. `RUN` will NOT do the following: * Record state of *processes*. * Automatically start daemons. If you want to start something automatically when the container runs, you should use `CMD` and/or `ENTRYPOINT`. 
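For instance, here is a minimal sketch (using a hypothetical nginx-based image) showing the division of labor between build time and run time:

```dockerfile
FROM ubuntu
# RUN executes at build time; the files installed here are
# recorded in the resulting layer
RUN apt-get update && apt-get install -y nginx
# CMD executes nothing at build time; it only records the default
# command to start when a container is launched from this image
CMD ["nginx", "-g", "daemon off;"]
```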
.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- ## Collapsing layers It is possible to execute multiple commands in a single step: ```dockerfile RUN apt-get update && apt-get install -y wget && apt-get clean ``` It is also possible to break a command onto multiple lines: ```dockerfile RUN apt-get update \ && apt-get install -y wget \ && apt-get clean ``` .debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- ## The `EXPOSE` instruction The `EXPOSE` instruction tells Docker what ports are to be published in this image. ```dockerfile EXPOSE 8080 EXPOSE 80 443 EXPOSE 53/tcp 53/udp ``` * All ports are private by default. * Declaring a port with `EXPOSE` is not enough to make it public. * The `Dockerfile` doesn't control on which port a service gets exposed. .debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- ## Exposing ports * When you `docker run -p ...`, that port becomes public. (Even if it was not declared with `EXPOSE`.) * When you `docker run -P ...` (without port number), all ports declared with `EXPOSE` become public. A *public port* is reachable from other containers and from outside the host. A *private port* is not reachable from outside. .debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- ## The `COPY` instruction The `COPY` instruction adds files and content from your host into the image. ```dockerfile COPY . /src ``` This will add the contents of the *build context* (the directory passed as an argument to `docker build`) to the directory `/src` in the container. 
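Putting `COPY` and `EXPOSE` together, here is a sketch of an image serving static files (the `./site` directory is a hypothetical example; it would have to exist in the build context):

```dockerfile
FROM nginx
# Declare the port that this image is designed to serve
EXPOSE 80
# Copy static content from the build context into the image
COPY ./site /usr/share/nginx/html
```

Running this image with `docker run -P` would publish port 80 on a random host port, while `docker run -p 8080:80` would pick the host port explicitly.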
.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- ## Build context isolation Note: you can only reference files and directories *inside* the build context. Absolute paths are taken as being anchored to the build context, so the two following lines are equivalent: ```dockerfile COPY . /src COPY / /src ``` Attempts to use `..` to get out of the build context will be detected and blocked by Docker, and the build will fail. Otherwise, a `Dockerfile` could succeed on host A, but fail on host B. .debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- ## `ADD` `ADD` works almost like `COPY`, but has a few extra features. `ADD` can get remote files: ```dockerfile ADD http://www.example.com/webapp.jar /opt/ ``` This would download the `webapp.jar` file and place it in the `/opt` directory. `ADD` will automatically unpack zip files and tar archives: ```dockerfile ADD ./assets.zip /var/www/htdocs/assets/ ``` This would unpack `assets.zip` into `/var/www/htdocs/assets`. *However,* `ADD` will not automatically unpack remote archives. .debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- ## `ADD`, `COPY`, and the build cache * Before creating a new layer, Docker checks its build cache. * For most Dockerfile instructions, Docker only looks at the `Dockerfile` content to do the cache lookup. * For `ADD` and `COPY` instructions, Docker also checks if the files to be added to the container have been changed. * `ADD` always needs to download the remote file before it can check if it has been changed. (It cannot use, e.g., ETags or If-Modified-Since headers.) 
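This cache behavior suggests a common trick (sketched here for a hypothetical Python application): `COPY` the dependency list and install dependencies *before* copying the rest of the code, so that editing a source file doesn't invalidate the expensive install layer:

```dockerfile
FROM python
WORKDIR /src
# This layer is only invalidated when requirements.txt itself changes
COPY requirements.txt .
RUN pip install -r requirements.txt
# Editing any source file only invalidates layers from this point on
COPY . .
```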
.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- ## `VOLUME` The `VOLUME` instruction tells Docker that a specific directory should be a *volume*. ```dockerfile VOLUME /var/lib/mysql ``` Filesystem access in volumes bypasses the copy-on-write layer, offering native performance to I/O done in those directories. Volumes can be attached to multiple containers, making it possible to "port" data from one container to another, e.g. to upgrade a database to a newer version. It is possible to start a container in "read-only" mode. The container filesystem will be made read-only, but volumes can still have read/write access if necessary. .debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- ## The `WORKDIR` instruction The `WORKDIR` instruction sets the working directory for subsequent instructions. It also affects `CMD` and `ENTRYPOINT`, since it sets the working directory used when starting the container. ```dockerfile WORKDIR /src ``` You can specify `WORKDIR` again to change the working directory for further operations. .debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- ## The `ENV` instruction The `ENV` instruction specifies environment variables that should be set in any container launched from the image. ```dockerfile ENV WEBAPP_PORT 8080 ``` This will result in an environment variable being created in any container launched from this image: ```bash WEBAPP_PORT=8080 ``` You can also specify environment variables when you use `docker run`. ```bash $ docker run -e WEBAPP_PORT=8000 -e WEBAPP_HOST=www.example.com ... 
``` .debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- ## The `USER` instruction The `USER` instruction sets the user name or UID to use when running the image. It can be used multiple times to change back to root or to another user. .debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- ## The `CMD` instruction The `CMD` instruction is a default command run when a container is launched from the image. ```dockerfile CMD [ "nginx", "-g", "daemon off;" ] ``` Means we don't need to specify `nginx -g "daemon off;"` when running the container. Instead of: ```bash $ docker run /web_image nginx -g "daemon off;" ``` We can just do: ```bash $ docker run /web_image ``` .debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- ## More about the `CMD` instruction Just like `RUN`, the `CMD` instruction comes in two forms. The first executes in a shell: ```dockerfile CMD nginx -g "daemon off;" ``` The second executes directly, without shell processing: ```dockerfile CMD [ "nginx", "-g", "daemon off;" ] ``` .debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- class: extra-details ## Overriding the `CMD` instruction The `CMD` can be overridden when you run a container. ```bash $ docker run -it /web_image bash ``` Will run `bash` instead of `nginx -g "daemon off;"`. 
.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- ## The `ENTRYPOINT` instruction The `ENTRYPOINT` instruction is like the `CMD` instruction, but arguments given on the command line are *appended* to the entry point. Note: you have to use the "exec" syntax (`[ "..." ]`). ```dockerfile ENTRYPOINT [ "/bin/ls" ] ``` If we were to run: ```bash $ docker run training/ls -l ``` Instead of trying to run `-l`, the container will run `/bin/ls -l`. .debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- class: extra-details ## Overriding the `ENTRYPOINT` instruction The entry point can be overridden as well. ```bash $ docker run -it training/ls bin dev home lib64 mnt proc run srv tmp var boot etc lib media opt root sbin sys usr $ docker run -it --entrypoint bash training/ls root@d902fb7b1fc7:/# ``` .debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- ## How `CMD` and `ENTRYPOINT` interact The `CMD` and `ENTRYPOINT` instructions work best when used together. ```dockerfile ENTRYPOINT [ "nginx" ] CMD [ "-g", "daemon off;" ] ``` The `ENTRYPOINT` specifies the command to be run and the `CMD` specifies its options. On the command line we can then potentially override the options when needed. ```bash $ docker run -d /web_image -t ``` This will override the options `CMD` provided with new flags. .debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)] --- ## Advanced Dockerfile instructions * `ONBUILD` lets you stash instructions that will be executed when this image is used as a base for another one. * `LABEL` adds arbitrary metadata to the image. 
* `ARG` defines build-time variables (optional or mandatory).

* `STOPSIGNAL` sets the signal for `docker stop` (`TERM` by default).

* `HEALTHCHECK` defines a command assessing the status of the container.

* `SHELL` sets the default program to use for string-syntax `RUN`, `CMD`, etc.

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

class: extra-details

## The `ONBUILD` instruction

The `ONBUILD` instruction is a trigger. It sets instructions that will be executed when another image is built from the image being built.

This is useful for building images which will be used as a base to build other images.

```dockerfile
ONBUILD COPY . /src
```

* You can't chain `ONBUILD` instructions with `ONBUILD`.

* `ONBUILD` can't be used to trigger `FROM` instructions.

???

:EN:- Advanced Dockerfile syntax
:FR:- Dockerfile niveau expert

.debug[[containers/Advanced_Dockerfiles.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Advanced_Dockerfiles.md)]

---

class: pic

.interstitial[![Image separating from the next part](https://gallant-turing-d0d520.netlify.com/containers/distillery-containers.jpg)]

---

name: toc-orchestration-an-overview
class: title

Orchestration, an overview

.nav[
[Previous part](#toc-advanced-dockerfile-syntax)
|
[Back to table of contents](#toc-part-6)
|
[Next part](#toc-reducing-image-size)
]

.debug[(automatically generated title slide)]

---

# Orchestration, an overview

In this chapter, we will:

* Explain what orchestration is and why we need it.

* Present (from a high-level perspective) some orchestrators.

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

class: pic

## What's orchestration?
![Joana Carneiro (orchestra conductor)](images/conductor.jpg) .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- ## What's orchestration? According to Wikipedia: *Orchestration describes the __automated__ arrangement, coordination, and management of complex computer systems, middleware, and services.* -- *[...] orchestration is often discussed in the context of __service-oriented architecture__, __virtualization__, provisioning, Converged Infrastructure and __dynamic datacenter__ topics.* -- What does that really mean? .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- ## Example 1: dynamic cloud instances -- - Q: do we always use 100% of our servers? -- - A: obviously not! .center[![Daily variations of traffic](images/traffic-graph.png)] .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- ## Example 1: dynamic cloud instances - Every night, scale down (by shutting down extraneous replicated instances) - Every morning, scale up (by deploying new copies) - "Pay for what you use" (i.e. save big $$$ here) .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- ## Example 1: dynamic cloud instances How do we implement this? - Crontab - Autoscaling (save even bigger $$$) That's *relatively* easy. Now, how are things for our IAAS provider? .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- ## Example 2: dynamic datacenter - Q: what's the #1 cost in a datacenter? -- - A: electricity! -- - Q: what uses electricity? 
-- - A: servers, obviously - A: ... and associated cooling -- - Q: do we always use 100% of our servers? -- - A: obviously not! .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- ## Example 2: dynamic datacenter - If only we could turn off unused servers during the night... - Problem: we can only turn off a server if it's totally empty! (i.e. all VMs on it are stopped/moved) - Solution: *migrate* VMs and shutdown empty servers (e.g. combine two hypervisors with 40% load into 80%+0%, and shut down the one at 0%) .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- ## Example 2: dynamic datacenter How do we implement this? - Shut down empty hosts (but keep some spare capacity) - Start hosts again when capacity gets low - Ability to "live migrate" VMs (Xen already did this 10+ years ago) - Rebalance VMs on a regular basis - what if a VM is stopped while we move it? - should we allow provisioning on hosts involved in a migration? *Scheduling* becomes more complex. .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- ## What is scheduling? According to Wikipedia (again): *In computing, scheduling is the method by which threads, processes or data flows are given access to system resources.* The scheduler is concerned mainly with: - throughput (total amount of work done per time unit); - turnaround time (between submission and completion); - response time (between submission and start); - waiting time (between job readiness and execution); - fairness (appropriate times according to priorities). In practice, these goals often conflict. 
**"Scheduling" = decide which resources to use.** .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- ## Exercise 1 - You have: - 5 hypervisors (physical machines) - Each server has: - 16 GB RAM, 8 cores, 1 TB disk - Each week, your team requests: - one VM with X RAM, Y CPU, Z disk Scheduling = deciding which hypervisor to use for each VM. Difficulty: easy! .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- ## Exercise 2 - You have: - 1000+ hypervisors (and counting!) - Each server has different resources: - 8-500 GB of RAM, 4-64 cores, 1-100 TB disk - Multiple times a day, a different team asks for: - up to 50 VMs with different characteristics Scheduling = deciding which hypervisor to use for each VM. Difficulty: ??? .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- ## Exercise 2 - You have: - 1000+ hypervisors (and counting!) - Each server has different resources: - 8-500 GB of RAM, 4-64 cores, 1-100 TB disk - Multiple times a day, a different team asks for: - up to 50 VMs with different characteristics Scheduling = deciding which hypervisor to use for each VM. ![Troll face](images/trollface.png) .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- ## Exercise 3 - You have machines (physical and/or virtual) - You have containers - You are trying to put the containers on the machines - Sounds familiar? 
.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- class: pic ## Scheduling with one resource .center[![Not-so-good bin packing](images/binpacking-1d-1.gif)] ## We can't fit a job of size 6 :( .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- class: pic ## Scheduling with one resource .center[![Better bin packing](images/binpacking-1d-2.gif)] ## ... Now we can! .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- class: pic ## Scheduling with two resources .center[![2D bin packing](images/binpacking-2d.gif)] .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- class: pic ## Scheduling with three resources .center[![3D bin packing](images/binpacking-3d.gif)] .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- class: pic ## You need to be good at this .center[![Tangram](images/tangram.gif)] .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- class: pic ## But also, you must be quick! .center[![Tetris](images/tetris-1.png)] .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- class: pic ## And be web scale! 
.center[![Big tetris](images/tetris-2.gif)] .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- class: pic ## And think outside (?) of the box! .center[![3D tetris](images/tetris-3.png)] .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- class: pic ## Good luck! .center[![FUUUUUU face](images/fu-face.jpg)] .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- ## TL,DR * Scheduling with multiple resources (dimensions) is hard. * Don't expect to solve the problem with a Tiny Shell Script. * There are literally tons of research papers written on this. .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- ## But our orchestrator also needs to manage ... * Network connectivity (or filtering) between containers. * Load balancing (external and internal). * Failure recovery (if a node or a whole datacenter fails). * Rolling out new versions of our applications. (Canary deployments, blue/green deployments...) .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- ## Some orchestrators We are going to present briefly a few orchestrators. There is no "absolute best" orchestrator. It depends on: - your applications, - your requirements, - your pre-existing skills... .debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- ## Nomad - Open Source project by Hashicorp. - Arbitrary scheduler (not just for containers). 
- Great if you want to schedule mixed workloads.

  (VMs, containers, processes...)

- Less integration with the rest of the container ecosystem.

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

## Mesos

- Open Source project in the Apache Foundation.

- Arbitrary scheduler (not just for containers).

- Two-level scheduler.

- Top-level scheduler acts as a resource broker.

- Second-level schedulers (aka "frameworks") obtain resources from the top level.

- Frameworks implement various strategies.

  (Marathon = long-running processes; Chronos = run at intervals; ...)

- Commercial offering through DC/OS by Mesosphere.

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

## Rancher

- Rancher 1 offered a simple interface for Docker hosts.

- Rancher 2 is a complete management platform for Docker and Kubernetes.

- Technically not an orchestrator, but it's a popular option.

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

## Swarm

- Tightly integrated with the Docker Engine.

- Extremely simple to deploy and set up, even in multi-manager (HA) mode.

- Secure by default.

- Strongly opinionated:

  - smaller set of features,

  - easier to operate.

.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)]

---

## Kubernetes

- Open Source project initiated by Google.

- Contributions from many other actors.

- *De facto* standard for container orchestration.

- Many deployment options; some of them very complex.

- Reputation: steep learning curve.

- Reality:

  - true, if we try to understand *everything*;

  - false, if we focus on what matters.
.debug[[containers/Orchestration_Overview.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Orchestration_Overview.md)] --- class: pic .interstitial[![Image separating from the next part](https://gallant-turing-d0d520.netlify.com/containers/lots-of-containers.jpg)] --- name: toc-reducing-image-size class: title Reducing image size .nav[ [Previous part](#toc-orchestration-an-overview) | [Back to table of contents](#toc-part-6) | [Next part](#toc-multi-stage-builds) ] .debug[(automatically generated title slide)] --- # Reducing image size * In the previous example, our final image contained: * our `hello` program * its source code * the compiler * Only the first one is strictly necessary. * We are going to see how to obtain an image without the superfluous components. .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Can't we remove superfluous files with `RUN`? What happens if we do one of the following commands? - `RUN rm -rf ...` - `RUN apt-get remove ...` - `RUN make clean ...` -- This adds a layer which removes a bunch of files. But the previous layers (which added the files) still exist. .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Removing files with an extra layer When downloading an image, all the layers must be downloaded. | Dockerfile instruction | Layer size | Image size | | ---------------------- | ---------- | ---------- | | `FROM ubuntu` | Size of base image | Size of base image | | `...` | ... | Sum of this layer + all previous ones | | `RUN apt-get install somepackage` | Size of files added (e.g. a few MB) | Sum of this layer + all previous ones | | `...` | ... 
| Sum of this layer + all previous ones | | `RUN apt-get remove somepackage` | Almost zero (just metadata) | Same as previous one | Therefore, `RUN rm` does not reduce the size of the image or free up disk space. .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Removing unnecessary files Various techniques are available to obtain smaller images: - collapsing layers, - adding binaries that are built outside of the Dockerfile, - squashing the final image, - multi-stage builds. Let's review them quickly. .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Collapsing layers You will frequently see Dockerfiles like this: ```dockerfile FROM ubuntu RUN apt-get update && apt-get install xxx && ... && apt-get remove xxx && ... ``` Or the (more readable) variant: ```dockerfile FROM ubuntu RUN apt-get update \ && apt-get install xxx \ && ... \ && apt-get remove xxx \ && ... ``` This `RUN` command gives us a single layer. The files that are added, then removed in the same layer, do not grow the layer size. .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Collapsing layers: pros and cons Pros: - works on all versions of Docker - doesn't require extra tools Cons: - not very readable - some unnecessary files might still remain if the cleanup is not thorough - that layer is expensive (slow to build) .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Building binaries outside of the Dockerfile This results in a Dockerfile looking like this: ```dockerfile FROM ubuntu COPY xxx /usr/local/bin ``` Of course, this implies that the file `xxx` exists in the build context. 
That file has to exist before you can run `docker build`. For instance, it can: - exist in the code repository, - be created by another tool (script, Makefile...), - be created by another container image and extracted from the image. See for instance the [busybox official image](https://github.com/docker-library/busybox/blob/fe634680e32659aaf0ee0594805f74f332619a90/musl/Dockerfile) or this [older busybox image](https://github.com/jpetazzo/docker-busybox). .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Building binaries outside: pros and cons Pros: - final image can be very small Cons: - requires an extra build tool - we're back in dependency hell and "works on my machine" Cons, if binary is added to code repository: - breaks portability across different platforms - grows repository size a lot if the binary is updated frequently .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Squashing the final image The idea is to transform the final image into a single-layer image. This can be done in (at least) two ways. - Activate experimental features and squash the final image: ```bash docker image build --squash ... ``` - Export/import the final image. ```bash docker build -t temp-image . 
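# Run a no-op container from the image (its only purpose is to give us
# a container filesystem that we can export and re-import as one layer):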
docker run --entrypoint true --name temp-container temp-image docker export temp-container | docker import - final-image docker rm temp-container docker rmi temp-image ``` .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Squashing the image: pros and cons Pros: - single-layer images are smaller and faster to download - removed files no longer take up storage and network resources Cons: - we still need to actively remove unnecessary files - squash operation can take a lot of time (on big images) - squash operation does not benefit from cache (even if we change just a tiny file, the whole image needs to be re-squashed) .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Multi-stage builds Multi-stage builds allow us to have multiple *stages*. Each stage is a separate image, and can copy files from previous stages. We're going to see how they work in more detail. .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- class: pic .interstitial[![Image separating from the next part](https://gallant-turing-d0d520.netlify.com/containers/plastic-containers.JPG)] --- name: toc-multi-stage-builds class: title Multi-stage builds .nav[ [Previous part](#toc-reducing-image-size) | [Back to table of contents](#toc-part-6) | [Next part](#toc-exercise--writing-better-dockerfiles) ] .debug[(automatically generated title slide)] --- # Multi-stage builds * At any point in our `Dockerfile`, we can add a new `FROM` line. * This line starts a new stage of our build. * Each stage can access the files of the previous stages with `COPY --from=...`. * When a build is tagged (with `docker build -t ...`), the last stage is tagged. 
* Previous stages are not discarded: they will be used for caching, and can be referenced. .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Multi-stage builds in practice * Each stage is numbered, starting at `0` * We can copy a file from a previous stage by indicating its number, e.g.: ```dockerfile COPY --from=0 /file/from/first/stage /location/in/current/stage ``` * We can also name stages, and reference these names: ```dockerfile FROM golang AS builder RUN ... FROM alpine COPY --from=builder /go/bin/mylittlebinary /usr/local/bin/ ``` .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Multi-stage builds for our C program We will change our Dockerfile to: * give a nickname to the first stage: `compiler` * add a second stage using the same `ubuntu` base image * add the `hello` binary to the second stage * make sure that `CMD` is in the second stage The resulting Dockerfile is on the next slide. .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- ## Multi-stage build `Dockerfile` Here is the final Dockerfile: ```dockerfile FROM ubuntu AS compiler RUN apt-get update RUN apt-get install -y build-essential COPY hello.c / RUN make hello FROM ubuntu COPY --from=compiler /hello /hello CMD /hello ``` Let's build it, and check that it works correctly: ```bash docker build -t hellomultistage . 
docker run hellomultistage
```

.debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)]

---

## Comparing single/multi-stage build image sizes

List our images with `docker images`, and check the size of:

- the `ubuntu` base image,

- the single-stage `hello` image,

- the multi-stage `hellomultistage` image.

We can achieve even smaller images if we use smaller base images.

However, if we use common base images (e.g. if we standardize on `ubuntu`), these common images will be pulled only once per node, so they are virtually "free."

.debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)]

---

## Build targets

* We can also tag an intermediate stage with the following command:

  ```bash
  docker build --target STAGE --tag NAME .
  ```

* This will create an image (named `NAME`) corresponding to stage `STAGE`

* This can be used to easily access an intermediate stage for inspection

  (instead of parsing the output of `docker build` to find out the image ID)

* This can also be used to describe multiple images from a single Dockerfile

  (instead of using multiple Dockerfiles, which could go out of sync)

???
:EN:Optimizing our images and their build process :EN:- Leveraging multi-stage builds :FR:Optimiser les images et leur construction :FR:- Utilisation d'un *multi-stage build* .debug[[containers/Multi_Stage_Builds.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Multi_Stage_Builds.md)] --- class: pic .interstitial[![Image separating from the next part](https://gallant-turing-d0d520.netlify.com/containers/train-of-containers-1.jpg)] --- name: toc-exercise--writing-better-dockerfiles class: title Exercise — writing better Dockerfiles .nav[ [Previous part](#toc-multi-stage-builds) | [Back to table of contents](#toc-part-6) | [Next part](#toc-) ] .debug[(automatically generated title slide)] --- # Exercise — writing better Dockerfiles Let's update our Dockerfiles to leverage multi-stage builds! The code is at: https://github.com/jpetazzo/wordsmith Use a different tag for these images, so that we can compare their sizes. What's the size difference between single-stage and multi-stage builds? .debug[[containers/Exercise_Dockerfile_Advanced.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/containers/Exercise_Dockerfile_Advanced.md)] --- class: title, self-paced Thank you! .debug[[shared/thankyou.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/shared/thankyou.md)] --- class: title, in-person That's all, folks! Questions? ![end](images/end.jpg) .debug[[shared/thankyou.md](https://github.com/jpetazzo/container.training/tree/2021-05-enix/slides/shared/thankyou.md)]