Data Platform/Systems/Airflow/Upgrading

We have the following procedures in place to upgrade Airflow versions.

Configure feature branch

Check out a working copy of the https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags repository

Create a feature branch, for example: git checkout -b airflow_version_2_6

Determine the new version number, which is based on

  • The airflow major, minor, and patch versions
  • The python environment major and minor versions
  • The date of creation

For example: 2.6.0-py3.10-20230510
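As a rough illustration, a version string like this could be assembled on the command line (a sketch only; the AIRFLOW_VERSION value and the assumption that python3 on your PATH matches the conda environment's interpreter are hypothetical):

# Hypothetical helper for composing the version string from its three parts.
AIRFLOW_VERSION=2.6.0
PY_VERSION=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')
echo "${AIRFLOW_VERSION}-py${PY_VERSION}-$(date +%Y%m%d)"   # e.g. 2.6.0-py3.10-20230510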

Update conda environment

Modify the conda-environment.yml file with the new version of airflow that is to be used.
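For example, you could locate the current pin and bump it in place (a sketch only; the dependency name apache-airflow and the == pin format are assumptions, so adapt this to however the version is actually expressed in conda-environment.yml):

# Hypothetical: find the current Airflow pin, then bump it to the new release.
grep -n 'airflow' conda-environment.yml
sed -i 's/apache-airflow==2\.5\.[0-9]*/apache-airflow==2.6.0/' conda-environment.yml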

Execute ./generate_conda_environment_lock_yml.sh, which will regenerate the conda-environment.lock.yml file with the specific package versions in the environment.

This file may require modification before proceeding. One known issue is that the regeneration removes the GitLab URLs from the workflow_utils and conda-pack packages. These changes should be reverted.
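One way to review what the regeneration changed, and to confirm your manual fixes, is to diff the lock file before committing:

# Review the regenerated lock file; re-add the gitlab.wikimedia.org URLs for
# workflow_utils and conda-pack by hand if they were dropped.
git diff conda-environment.lock.yml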

Execute ./check_conda_environment_lock_yml.sh and verify that a conda environment can in fact be created from the specification. Check for the output:

An environment could be created from conda-environment.lock.yml

Execute unit tests

Execute the following:

export PYTHONPATH=.:./wmf_airflow_common/plugins
tox
tox -e lint

Ensure that all tests pass.

Update debian packaging parameters

Update the .gitlab-ci.yml and Dockerfile files with the new package version.

Add an entry to debian/changelog. This can be made a little simpler with the command: dch -v 2.6.0-py3.10-20230510 -D buster-wikimedia --force-distribution
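A quick sanity check that the new entry landed at the top of the changelog:

head -n 8 debian/changelog   # the first stanza should mention 2.6.0-py3.10-20230510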

Stage your changes for commit: git add .gitlab-ci.yml Dockerfile debian/changelog

Commit your changes with git commit

Build the package with GitLab CI

Push your branch back to GitLab with git push --set-upstream origin airflow_version_2_6

Browse to the feature branch in GitLab and locate the CI pipeline for the branch, as shown in the following image.

Location of the link to the airflow feature branch pipeline

Click the triangle (play button) on the publish_airflow_package_with_docker stage that has the PACKAGE_VERSION=x.y.z variable.

Location of the publish airflow package with docker stage

When the stage has built successfully, navigate to the Package Registry section of the repository.

Location of the Airflow package built by GitLab-CI

Locate the package that has been built.

Copy the download link and download this file to your test host with a command like the following:

curl -o airflow-2.6.0-py3.10-20230510_amd64.deb https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/package_files/1257/download

[SRE Only] Install the package with sudo apt install ./airflow-2.6.0-py3.10-20230510_amd64.deb
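Optionally, confirm that the package installed cleanly (the package is named airflow, as shown in the reprepro listing further down):

dpkg -l airflow   # should list the newly installed 2.6.0-py3.10-20230510 version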

Run the database check command: sudo -u analytics airflow-analytics_test db check

Run the database upgrade command: sudo -u analytics airflow-analytics_test db upgrade
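As a further sanity check, and assuming the airflow-analytics_test wrapper forwards subcommands to the Airflow CLI (as it does for db check and db upgrade above), you can confirm the running version:

sudo -u analytics airflow-analytics_test version   # should print the new Airflow version, e.g. 2.6.0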

Once the upgrade on the test instance has proven to work, publish the new deb to our deb repos:

root@apt1002:~# wget https://gitlab.wikimedia.org/api/v4/projects/93/packages/generic/airflow/2.9.3/airflow-2.9.3_amd64.deb
--2024-09-16 10:47:06--  https://gitlab.wikimedia.org/api/v4/projects/93/packages/generic/airflow/2.9.3/airflow-2.9.3_amd64.deb
Resolving gitlab.wikimedia.org (gitlab.wikimedia.org)... 2620:0:860:1:208:80:153:8, 208.80.153.8
Connecting to gitlab.wikimedia.org (gitlab.wikimedia.org)|2620:0:860:1:208:80:153:8|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 579378076 (553M) [application/octet-stream]
Saving to: ‘airflow-2.9.3_amd64.deb’

airflow-2.9.3_amd64.deb                                     100%[=========================================================================================================================================>] 552.54M   110MB/s    in 5.5s

2024-09-16 10:47:12 (101 MB/s) - ‘airflow-2.9.3_amd64.deb’ saved [579378076/579378076]


root@apt1002:~# mv airflow-2.9.3_amd64.deb airflow-2.9.3-py3.10-20240916_amd64.deb
root@apt1002:~# reprepro includedeb bullseye-wikimedia `pwd`/airflow-2.9.3-py3.10-20240916_amd64.deb
Exporting indices...
root@apt1002:~# reprepro includedeb buster-wikimedia `pwd`/airflow-2.9.3-py3.10-20240916_amd64.deb
Exporting indices...
Deleting files no longer referenced...
root@apt1002:~# reprepro ls airflow
airflow | 2.9.3-py3.10-20240916 |   buster-wikimedia | amd64
airflow | 2.9.3-py3.10-20240916 | bullseye-wikimedia | amd64

You then need to send a stack of Puppet patches to upgrade each instance individually, and a final patch to make the new version the default one (example). For each instance, after Puppet has installed the new deb version, run the following commands:

# Restart all Airflow services on the instance
sudo systemctl restart airflow-{webserver,scheduler,kerberos}@*.service
# Run the database check and migration as the instance's service user, using the
# instance-specific airflow wrapper taken from the scheduler unit's ExecStart line
sudo -u $(sudo systemctl cat airflow-scheduler@*.service | grep User | cut -d= -f2) $(sudo systemctl cat airflow-scheduler@*.service | grep ExecStart | cut -d= -f2 | awk '{ print $1 }') db check
sudo -u $(sudo systemctl cat airflow-scheduler@*.service | grep User | cut -d= -f2) $(sudo systemctl cat airflow-scheduler@*.service | grep ExecStart | cut -d= -f2 | awk '{ print $1 }') db migrate