Streamlined Service Delivery Design
We will streamline and integrate the delivery of services by building a new production platform for integrated development, testing, deployment and hosting of applications.
In 2017, the Technology department started the Streamlined Service Delivery program as part of the Foundation's annual plan. The program covers the department's effort to create tools and processes that allow developers to easily create services that can run on production infrastructure with minimal (if any) modifications.
Goal
We will build a new production platform for integrated development, testing, deployment, and hosting of applications. This will greatly reduce the complexity, and increase the speed, of delivering a service and maintaining it throughout its lifecycle, with fewer dependencies between teams and greater automation and integration. The platform will offer more flexibility through support for automatic high availability and scaling, abstraction from hardware, and a streamlined path from development through testing to deployment. Services will be isolated from each other for increased reliability and security.
Wikimedia developers, as well as third-party users, will benefit from the ability to easily replicate the stack for development or for their own use cases.
This work also represents an investment in the future; although this will not yet significantly materialize within FY17-18, this project will eventually result in significant cost savings on both capital expenditure (through consolidation of hardware capacity) and staff time (by streamlining development, testing, deployment and maintenance).
Design
At a very high level the program is about creating a number of systems that will allow developers to create, modify and test old and new applications in a streamlined and unified way, eliminating many of the current roadblocks to application development. The resulting new applications would even be good candidates for running on the main production infrastructure maintained by WMF.
The key systems, listed here as items and explained in more detail one by one below, are:
- A development environment
- Testing/building pipeline
- Artifact Registry
- Deployment tooling
- Configuration registry
- Staging environment
- Production environment
Development environment
Testing/building pipeline
Image building is triggered upon submission of any changeset to Gerrit for a supported MediaWiki services project. The job takes a build configuration as input and produces a functional production image through a multi-stage build/test/build/test/register process:
- Build 'test' image
- Functionally test the application image at a low level by running its entry point
- Build 'production' image
- Functionally test the application image at a high level through an isolated deploy and e2e tests
- Push 'production' image (for further integration and testing in staging)
Build configuration
The Wikimedia Release Engineering Team has developed Blubber, a build configuration schema and namesake command-line tool: the schema defines in YAML how a services application image should be built and run, and the tool transforms that configuration into intermediate Dockerfiles. The primary purpose of this tooling is to give teams control over how their application images are built while still maintaining a level of operational sanity/security not afforded by raw Dockerfiles.
Blubber allows different image "variants" to be defined in a project's configuration: related but somewhat functionally different image configurations that are meant to strike a balance between dev/test/staging/prod parity and the variations required between these runtime environments.
Each Blubber configuration in a project's repo MUST define at least a 'test' and a 'production' variant to be supported by the pipeline. A canonical example of Blubber configuration can currently be found in the Mathoid repo.
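For orientation, here is a rough sketch of what such a configuration might look like. The field names only approximate the Blubber schema and the base images and commands are placeholders; the Mathoid repo remains the authoritative example.

```yaml
# Illustrative sketch only; consult the Mathoid repo and the Blubber
# documentation for the exact, current schema.
version: v3
base: docker-registry.wikimedia.org/nodejs-slim
lives:
  in: /srv/service

variants:
  build:
    base: docker-registry.wikimedia.org/nodejs-devel
    node:
      requirements: [package.json]   # install dependencies during the build

  test:
    includes: [build]
    entrypoint: [npm, test]          # low-level functional tests (Step 2)

  production:
    copies: build                    # multi-stage build: copy built artifacts only
    node:
      env: production
    entrypoint: [node, server.js]
```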
Step 1: Build 'test' image
Generate a Dockerfile using the "test" variant defined in the repo's Blubber configuration, and pipe the Dockerfile to `docker build`. The repo's root directory is used as the context for the build.
Step 2: Run image entry point
Run the image, delegating to the defined entry point for functional testing of the application at a low level. This should typically mean running the repo's unit tests and integration tests but it's ultimately left to the repo owner to decide what's appropriate.
The 'test' entry point MUST execute successfully for the build pipeline to continue.
Step 3: Build 'production' image
Generate a Dockerfile using the "production" variant defined in the repo's Blubber configuration, and build using `docker build` in the repo's root directory.
The repo's Blubber configuration SHOULD utilize the `copies` instruction to invoke a multi-stage build. Multi-stage image builds use a preparatory image with build dependencies but a minimal base image for the final production image, greatly reducing the latter's size. Again, see Mathoid's configuration for a canonical example.
Step 4: Isolated deploy and e2e tests
Verifies the basic functionality of the 'production' image in the context of an isolated k8s deployment. The image is deployed using Helm, and a very basic set of e2e smoke tests is run against the application's endpoint from outside the pod, exercising the full application stack as well as any provisioned k8s service and load balancer.
Helm provides a facility for running such functional tests against a deployment. We should experiment with it to see if it's a viable option.
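As a rough illustration of that facility, a chart can ship a test pod that `helm test` runs against a live release. Everything below (image, service name, port, endpoint) is a placeholder, not an agreed-upon convention.

```yaml
# Sketch of a Helm chart test pod, e.g. templates/tests/smoke-test.yaml.
# The helm.sh/hook annotation marks the pod to be run by `helm test`.
apiVersion: v1
kind: Pod
metadata:
  name: "{{ .Release.Name }}-smoke-test"
  annotations:
    "helm.sh/hook": test-success
spec:
  restartPolicy: Never
  containers:
    - name: smoke-test
      image: example.org/curl:latest   # placeholder test image
      command: ["curl", "--fail", "http://{{ .Release.Name }}-mathoid:10042/_info"]
```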
Step 5: Push 'production' image for staging deploy
At this point, there should be enough confidence in the 'production' image that it's ready for deployment and further integration/testing within the context of the staging environment.
Artifact Registry
The artifact (or artefact) registry will be a container registry that is publicly available to everyone in the world. The ability to upload artifacts to it will be restricted to the pipeline and to Technical Operations members who have to perform security upgrades and need to upload new container images. Every image that has been successfully built by the pipeline will be uploaded to the artifact registry and be ready for deployment.
It is already implemented using the Docker-provided registry with a Swift backend. This allows the storage powering the registry to scale horizontally.
The ability for anyone in the world to download our images means they will be reusable by the development environment, which in turn means every developer will be able to reuse the code we run in production as is.
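For reference, the Docker registry's Swift storage driver is configured along these lines; the endpoint, credentials and container name below are placeholders, not our actual values.

```yaml
# Sketch of a Docker registry configuration backed by Swift; all values
# are placeholders.
version: 0.1
storage:
  swift:
    authurl: https://swift.svc.example.wmnet/auth/v1.0   # placeholder endpoint
    username: docker_registry
    password: "REDACTED"
    container: docker_registry_images
http:
  addr: :5000
```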
Deployment tooling
Configuration registry
Services running in production WILL have different configuration from services running in other environments like the dev environment. This is expected and welcome, since it allows for greater flexibility. That configuration should be kept in a (probably public) place allowing interested parties to review and propose changes. Let's call that place the configuration registry. The deployment tooling will use the configuration registry to bootstrap and configure services in production, whereas other environments (e.g. development environments) should either not use it or use a different version of it. The staging environment, since it is hosted in WMF infrastructure, will probably use the same configuration registry, perhaps with some differences. For now, the logical choice for this configuration registry is git: configuration is text, which git excels at tracking, and a git repo is flexible enough to support more than one environment via branches, or different git repos can be used per environment. This section will probably need way more work to be defined in the future.
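Purely as a sketch of what this could look like, a git-based configuration registry might hold one values file per environment for each service. The layout and keys below are hypothetical.

```yaml
# Hypothetical layout within the configuration registry repo:
#   services/mathoid/values-staging.yaml
#   services/mathoid/values-production.yaml
#
# Example services/mathoid/values-production.yaml:
replicas: 8
resources:
  limits:
    cpu: "2"
    memory: 1Gi
config:
  logging:
    level: warn   # staging might use `debug` here instead
```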
Staging environment
What it will be
The staging environment is meant to be a functional copy of the production environment, albeit with smaller capacity and availability guarantees. It will not be available in both data centers and the nodes powering it will always be fewer in number. In every other way, it is expected to be an exact copy of production at every point in time.
Mode of operation
This is where the artifacts from the artifact registry will be deployed after being successfully built by the testing/building pipeline, preferably directly from the pipeline instrumentation (automatically or after human vetting). This gives developers a canary deployment environment, allowing errors to be caught early, before they make it to production.
What it will NOT be
The staging environment SHOULD NOT be considered a testing environment. It SHOULD be considered the last step and chance for catching bugs before they impact end-users.
Production environment
What it will be
The production environment will be the one where live traffic from end-users is directed.
Highly Available in all manners possible
This goal will be met by running the containers produced in the stages above in a number of kubernetes clusters (currently 1 per primary DC). Kubernetes implements a lot of this out of the box: it schedules different workloads and workload types across a fleet of kubernetes nodes, monitors those workloads, and stops and starts them as necessary according to its algorithms. In case of a sudden failure of an application it will automatically depool the application, quickly restart it and repool it. Given the nature of these workloads it is possible to run multiple incarnations of an application in parallel (called pods), providing load balancing and high availability.
Present on all primary DCs (2 currently)
We already have 2 clusters, 1 per primary DC, and hardware for the future has already been planned for.
Adequately filtering traffic between applications as well as the rest of the world
Using the Calico framework and the network policies that it allows, it is possible (and has already been implemented) to define very specifically the outbound network connections an application is allowed to make, as well as the inbound connections allowed to reach it from other applications. This is expected to minimize the exposure of services to connections from unwarranted services, while also protecting the rest of the fleet from said applications in case of a compromise.
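A sketch of the kind of policy this enables is below; the namespace, labels, ports and CIDR are purely illustrative and not our actual policies.

```yaml
# Illustrative NetworkPolicy of the kind Calico enforces: only one named
# caller may reach the pods, and egress is limited to one placeholder range.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: mathoid-network-policy
  namespace: mathoid
spec:
  podSelector:
    matchLabels:
      app: mathoid
  policyTypes: [Ingress, Egress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: restbase        # only a specific caller may reach the pods
      ports:
        - protocol: TCP
          port: 10042
  egress:
    - to:
        - ipBlock:
            cidr: 10.2.2.0/24       # e.g. an internal service VIP range only
```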
Equipped with Access Control mechanisms for configuration and deployment of applications
Industry-standard Role-Based Access Control (RBAC) is used to grant only the rights required for deploying an application, following the principle of least privilege (https://en.wikipedia.org/wiki/Principle_of_least_privilege).
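As a sketch, deploy rights can be scoped to a namespace with a Role and RoleBinding such as the following; the names, resources and verbs are illustrative rather than our actual grants.

```yaml
# Namespace-scoped Role granting only what a deployer needs, plus a binding
# tying it to a hypothetical deploy user.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployer
  namespace: mathoid
rules:
  - apiGroups: ["", "apps", "extensions"]
    resources: [deployments, replicasets, pods, services, configmaps]
    verbs: [get, list, watch, create, update, patch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployer-binding
  namespace: mathoid
subjects:
  - kind: User
    name: mathoid-deploy          # hypothetical deploy user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: deployer
  apiGroup: rbac.authorization.k8s.io
```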
Capable of high levels of inbound/outbound traffic
Inbound traffic to the kubernetes clusters will be handled by LVS-DR alongside pybal, both well tested in WMF infrastructure, allowing high levels of traffic to reach the clusters at specific node ports (TCP and UDP). Following the standard LVS practice across our data centers, the nodes will return traffic directly to the caller, allowing for high volumes of asymmetrical traffic; this favors the way the Web currently works, with small requests and large responses.
Capable of load balancing traffic between various application endpoints
Load balancing happens natively in kubernetes using a stochastic system where traffic is probabilistically routed to an application instance (called a pod), regardless of the point of origin. There's not really much more to add to that.
Capable of responding both automatically and manually to increases/decreases of traffic
Kubernetes also allows employing autoscalers, providing the ability to respond automatically to sudden increases of inbound traffic. It is of course also possible to scale manually, by increasing or decreasing the number of replicas available for each application.
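For example, automatic scaling can be expressed with a HorizontalPodAutoscaler along these lines; the replica bounds and target utilisation below are illustrative only.

```yaml
# Sketch of a HorizontalPodAutoscaler scaling a Deployment on CPU usage.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: mathoid
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mathoid
  minReplicas: 2                       # manual floor
  maxReplicas: 8                       # manual ceiling
  targetCPUUtilizationPercentage: 80   # scale out above this average load
```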
Supporting monitoring, health checks
Pods and services in the kubernetes environment should be monitored. While pods are ephemeral and alerting on them is not really useful, there are a number of meta-level alerts useful for the pod level. Services on the other hand should be monitored and alerted on. Some of the things described below are already implemented, some are guidelines.
Pods
Metrics for pods already exist. A sample dashboard exists at https://grafana.wikimedia.org/d/000000473/kubernetes-pods. It covers:
- CPU
- Memory
- Number of containers/pod
- IOPS
- Pod execution latency
- Container lifetime
Specific alerts haven't yet been created on these, as we have no experience yet running services under kubernetes. We will wait some time to spot the abnormalities and create alerts then.
Probes
Probes are arbitrary, short-duration checks that happen in the context of a container and are connected with an action taken by kubernetes. Two kinds of probes currently exist: liveness and readiness.
Liveness
Containers will be auto-restarted by kubernetes if they fail a basic liveness probe. In https://gerrit.wikimedia.org/r/#/c/392619/4/_scaffold/values.yaml the basic liveness probe is an `HTTP GET` request to `/`.
The URL part can be overridden on a per-service basis and it is expected that services using `service-runner` will define `/_info` or `/?spec`. An endpoint that can be used as a liveness probe MUST exist.
Readiness
If a pod fails a readiness probe, no traffic will be directed to it until it stops failing that probe. This allows a pod to inform kubernetes that it is overwhelmed and that traffic should be directed elsewhere.
In https://gerrit.wikimedia.org/r/#/c/392619/4/_scaffold/values.yaml the basic readiness probe is an `HTTP GET` request to `/`.
The URL part can be overridden on a per-service basis and it is expected that services using `service-runner` will define `/_info` or `/?spec`. An endpoint that can be used as a readiness probe MUST exist. It's fine if that endpoint is the same as the liveness endpoint. Services that are capable of knowing when they are overloaded, however, SHOULD create and specify a dedicated readiness endpoint.
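Put together, the two probes end up in the container spec roughly as follows; the path and port are illustrative, and the scaffold chart referenced above templates these values rather than hard-coding them.

```yaml
# Illustrative liveness/readiness probe configuration for a container spec.
livenessProbe:
  httpGet:
    path: /_info        # service-runner services may expose /_info or /?spec
    port: 10042
readinessProbe:
  httpGet:
    path: /_info        # may be the same endpoint as the liveness probe
    port: 10042
  initialDelaySeconds: 5
  periodSeconds: 10
```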
Services
Kubernetes services can be exposed via a variety of ways. In our environment after some discussions we decided that for now we will standardize on `NodePort` type services.
Below is a quick recap of kubernetes service types, how they function, their intended usage at WMF, and how we intend to monitor them.
ClusterIP
These are services that are meant to exist only intra-cluster. They are useful if practically all callers of a service are also going to be in the kubernetes cluster. At least for now we won't be having these types of services, as they are a) not easily monitored from outside the cluster and b) require a critical mass of services in the kubernetes cluster. Service owners should NOT be asking for this type of service.
NodePort
This is the type of service we will be going with. Effectively, for every service of this type a port is chosen on every node and every node uses it to publish the service. We will be using an LVS IP for every service (every node will have all LVS IPs, same as elsewhere in our infrastructure) in order to both decouple services from node IPs and avoid port conflicts.
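A minimal sketch of such a service follows; the name, selector and port numbers are illustrative, not reserved values.

```yaml
# Sketch of a NodePort Service publishing a service on every node.
apiVersion: v1
kind: Service
metadata:
  name: mathoid
spec:
  type: NodePort
  selector:
    app: mathoid
  ports:
    - port: 10042        # port exposed inside the cluster
      targetPort: 10042  # container port
      nodePort: 30042    # port published on every node (LVS points here)
```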
We expect to leverage the standard icinga infrastructure we currently have, using `service-checker` to monitor every such service in the standard way we monitor all non-kubernetes-based services. That will maintain the status quo and allow us to move forward without extra disruption. All services SHOULD partially or fully conform to the `service-checker` contract.
The aforementioned contract is already implemented, but it would be nice to fully document it.
Headless services
These are services that have no serviceIP assigned. We won't alert on failures of such services, and their usage will be actively discouraged. Service owners SHOULD NOT ask for this type of service for now.
LoadBalancer
A very specific type of service, tightly bound to a cloud provider's load balancer (e.g. ELB). We will NOT be having this kind of service at all, ever, due to technical limitations (it was not designed for bare-metal use).
ExternalName
Practically a service that is a CNAME. Useful for internal service discovery. We will NOT be having this kind of service at all, at least for now.
Supporting telemetry and transparent encryption for application communication
See Services_Proxy
Rolling deploys
Kubernetes natively allows rolling deploys. These can be done behind the scenes by kubectl, and they are also implemented by helm. A rolling deploy works by creating a new Replication Controller in kubernetes and gradually increasing its number of replicas while the number of replicas of the old one is gradually decreased until it reaches 0, at which point the old replication controller is deleted and the entire application has been upgraded. A rolling update can also be rolled back quite quickly. Keeping in mind that kubernetes performs health checks of applications, a rolling deploy can be monitored, paused and rolled back easily.
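Expressed in the newer Deployment API (which manages the underlying replica sets in the same gradual increase/decrease fashion described above), the rollout behaviour looks roughly like this; all names, counts and the image reference are illustrative.

```yaml
# Sketch of rolling-update behaviour via a Deployment strategy.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mathoid
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # at most one old pod taken down at a time
      maxSurge: 1         # at most one extra new pod started at a time
  selector:
    matchLabels:
      app: mathoid
  template:
    metadata:
      labels:
        app: mathoid
    spec:
      containers:
        - name: mathoid
          image: docker-registry.wikimedia.org/wikimedia/mathoid:latest  # placeholder tag
          ports:
            - containerPort: 10042
```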
Relation to the pipeline
The pipeline instrumentation WILL NOT be automatically updating application code in production. This WILL have to be vetted and done by the developers, so that surprises are kept to a minimum and the current status quo is maintained, possibly easing adoption of the new infrastructure.