Server Admin Log/Archive 91

2025-03-31

23:13 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1173.eqiad.wmnet
23:06 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1173.eqiad.wmnet
23:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1172.eqiad.wmnet
22:57 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1172.eqiad.wmnet
22:49 btullis@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host an-worker1174.eqiad.wmnet
22:38 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1174.eqiad.wmnet
22:35 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1173-1174].eqiad.wmnet
22:33 btullis@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1173-1174].eqiad.wmnet
22:27 btullis@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1172.eqiad.wmnet
22:25 btullis@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1172.eqiad.wmnet
22:13 reedy@deploy1003: Synchronized php-1.44.0-wmf.22/includes/libs/filebackend/FileBackend.php: T384851 (duration: 02m 14s)
22:02 tgr_: UTC late deploys done
22:01 tgr@deploy1003: Finished scap sync-world: Backport for Add EmailAuth provider to local domain exclusion list (T390437) (duration: 15m 37s)
21:54 tgr@deploy1003: tgr: Continuing with sync
21:51 tgr@deploy1003: tgr: Backport for Add EmailAuth provider to local domain exclusion list (T390437) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:45 tgr@deploy1003: Started scap sync-world: Backport for Add EmailAuth provider to local domain exclusion list (T390437)
21:29 tgr@deploy1003: Finished scap sync-world: Backport for EmailAuth: Prepare config for enabling in log-only mode (T390437), EmailAuth: Enable info level logging (T390437) (duration: 26m 08s)
21:22 tgr@deploy1003: kharlan, tgr: Continuing with sync
21:08 tgr@deploy1003: kharlan, tgr: Backport for EmailAuth: Prepare config for enabling in log-only mode (T390437), EmailAuth: Enable info level logging (T390437) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:03 tgr@deploy1003: Started scap sync-world: Backport for EmailAuth: Prepare config for enabling in log-only mode (T390437), EmailAuth: Enable info level logging (T390437)
21:01 tgr@deploy1003: Sync cancelled.
20:53 tgr@deploy1003: tgr, kharlan: Backport for EmailAuth: Prepare config for enabling in log-only mode (T390437) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:48 tgr@deploy1003: Started scap sync-world: Backport for EmailAuth: Prepare config for enabling in log-only mode (T390437)
20:46 tgr@deploy1003: Finished scap sync-world: Backport for Deploy dark mode and Vector 2022 to German Wikipedia (T387155), Enable Vector 2022 for Russian Wikimedia and arbcom_ruwiki (T390112) (duration: 17m 56s)
20:39 tgr@deploy1003: jdlrobson, tgr: Continuing with sync
20:33 tgr@deploy1003: jdlrobson, tgr: Backport for Deploy dark mode and Vector 2022 to German Wikipedia (T387155), Enable Vector 2022 for Russian Wikimedia and arbcom_ruwiki (T390112) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:28 tgr@deploy1003: Started scap sync-world: Backport for Deploy dark mode and Vector 2022 to German Wikipedia (T387155), Enable Vector 2022 for Russian Wikimedia and arbcom_ruwiki (T390112)
20:26 tgr@deploy1003: Finished scap sync-world: Backport for REST: enable Specs module on certain wikis, adjust Sandbox modules (T389407), Throttle exemption for Editathon at Universidad Nacional de La Plata - 9 April 2025 (T390290), EmailAuth: Add EmailAuthRequireToken hook implementation (T390437) (duration: 12m 59s)
20:26 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp1114.eqiad.wmnet} and A:cp
20:26 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp1115.eqiad.wmnet} and A:cp
20:22 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp1115.eqiad.wmnet} and A:cp
20:20 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp1114.eqiad.wmnet} and A:cp
20:19 tgr@deploy1003: superpes, bpirkle, tgr, kharlan: Continuing with sync
20:18 tgr@deploy1003: superpes, bpirkle, tgr, kharlan: Backport for REST: enable Specs module on certain wikis, adjust Sandbox modules (T389407), Throttle exemption for Editathon at Universidad Nacional de La Plata - 9 April 2025 (T390290), EmailAuth: Add EmailAuthRequireToken hook implementation (T390437) synced to the testservers (https://wikitech.wikimedia.org/wiki
20:13 tgr@deploy1003: Started scap sync-world: Backport for REST: enable Specs module on certain wikis, adjust Sandbox modules (T389407), Throttle exemption for Editathon at Universidad Nacional de La Plata - 9 April 2025 (T390290), EmailAuth: Add EmailAuthRequireToken hook implementation (T390437)
20:06 dancy@deploy1003: Finished scap sync-world: Backport for .gitmodules: Add extensions/EmailAuth (T390437), extension-list: Add EmailAuth (T390437) (duration: 20m 53s)
19:55 dancy@deploy1003: dancy: Continuing with sync
19:53 dancy@deploy1003: dancy: Backport for .gitmodules: Add extensions/EmailAuth (T390437), extension-list: Add EmailAuth (T390437) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:51 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp1112.eqiad.wmnet} and A:cp
19:51 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp1113.eqiad.wmnet} and A:cp
19:46 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp1113.eqiad.wmnet} and A:cp
19:46 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp1112.eqiad.wmnet} and A:cp
19:45 dancy@deploy1003: Started scap sync-world: Backport for .gitmodules: Add extensions/EmailAuth (T390437), extension-list: Add EmailAuth (T390437)
19:22 kamila@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
19:22 kamila@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
19:20 kamila@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
19:19 kamila@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
19:17 dzahn@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
19:16 dzahn@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
19:14 dzahn@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
19:13 dzahn@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
19:13 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp1110.eqiad.wmnet} and A:cp
19:13 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp1111.eqiad.wmnet} and A:cp
19:08 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp1111.eqiad.wmnet} and A:cp
19:08 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp1110.eqiad.wmnet} and A:cp
19:06 dancy@deploy1003: Started scap sync-world: Backport for .gitmodules: Add extensions/EmailAuth (T390437), extension-list: Add EmailAuth (T390437)
19:01 dancy: Deploying EmailAuth extension to wmf.22 for T390437
18:29 dzahn@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
18:29 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp1108.eqiad.wmnet} and A:cp
18:28 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp1109.eqiad.wmnet} and A:cp
18:28 dzahn@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
18:23 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp1109.eqiad.wmnet} and A:cp
18:23 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp1108.eqiad.wmnet} and A:cp
17:46 dzahn@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
17:46 dzahn@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
17:45 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp1106.eqiad.wmnet} and A:cp
17:43 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp1107.eqiad.wmnet} and A:cp
17:43 dzahn@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
17:41 dzahn@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
17:39 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp1106.eqiad.wmnet} and A:cp
17:39 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp1105.eqiad.wmnet
17:38 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=97) rolling upgrade of Varnish on P{cp1105.eqiad.wmnet} and A:cp
17:38 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp1107.eqiad.wmnet} and A:cp
17:38 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp1105.eqiad.wmnet} and A:cp
17:21 dzahn@dns1004: END - running authdns-update
17:18 dzahn@dns1004: START - running authdns-update
17:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1008.eqiad.wmnet with OS bullseye
17:14 swfrench-wmf: attempting to reproduce T389734 with enhanced logging on mwdebug1001
17:12 dancy@deploy1003: Installation of scap version "4.148.0" completed for 193 hosts
17:09 dzahn@dns1004: START - running authdns-update
17:08 dancy@deploy1003: Installing scap version "4.148.0" for 193 host(s)
17:04 maryum: deploy fix for T389727
17:03 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp1104.eqiad.wmnet} and A:cp
17:03 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp1105.eqiad.wmnet} and A:cp
16:58 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp1105.eqiad.wmnet} and A:cp
16:58 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp1104.eqiad.wmnet} and A:cp
16:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1008.eqiad.wmnet with reason: host reimage
16:50 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1008.eqiad.wmnet with reason: host reimage
16:46 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
16:46 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
16:32 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
16:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudelastic1008.eqiad.wmnet with OS bullseye
16:31 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
16:29 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
16:29 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-main: apply
16:28 ottomata: beginning eventgate-main upgrade to NodeJS 20 - T383814
16:21 maryum: deploy fix for T389727
16:12 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp1102.eqiad.wmnet} and A:cp
16:11 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp1103.eqiad.wmnet} and A:cp
16:07 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp1102.eqiad.wmnet} and A:cp
16:07 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp1103.eqiad.wmnet} and A:cp
16:00 jdrewniak@deploy1003: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 02m 35s)
15:58 jdrewniak@deploy1003: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 11m 48s)
15:55 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti4007.ulsfo.wmnet
15:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
15:47 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cloudsw2-d5-eqiad
15:47 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device cloudsw2-d5-eqiad
15:46 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c8a-codfw
15:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
15:44 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c8a-codfw
15:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c8b-codfw
15:44 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c8b-codfw
15:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device fasw2-c8b-codfw
15:37 ayounsi@cumin1002: START - Cookbook sre.network.tls for network device fasw2-c8b-codfw
15:33 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp1100.eqiad.wmnet} and A:cp
15:32 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp1101.eqiad.wmnet} and A:cp
15:30 volans: uploaded spicerack_10.0.0 to apt.wikimedia.org bullseye-wikimedia
15:27 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp1101.eqiad.wmnet} and A:cp
15:27 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp1100.eqiad.wmnet} and A:cp
15:03 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti4007.ulsfo.wmnet
15:02 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti4007.ulsfo.wmnet with reason: remove from cluster for reimage
15:02 elukey: `elukey@cumin1002:~$ sudo cumin 'registry*' 'rm -rf /var/cache/nginx-docker-registry'` - T390251
14:53 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:53 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
14:52 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:52 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
14:45 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new plugins - bking@cumin2002 - T390100
14:41 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:41 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
14:40 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:40 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
14:32 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:31 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
14:31 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:30 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
14:30 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:30 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
14:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4007.ulsfo.wmnet
14:23 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:22 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
14:08 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply new plugins - bking@cumin2002 - T390100
14:02 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: relforge1004* for ban relforge1004 prior to service restart and decom T390565 - bking@cumin2002 - T390100
14:01 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: relforge1004* for ban relforge1004 prior to service restart and decom T390565 - bking@cumin2002 - T390100
13:58 godog: move k8s instances from prometheus1005 to prometheus1007 - T383232
13:57 cgoubert@deploy1003: Finished scap sync-world: Deploy mediawiki chart 0.8.8 (duration: 02m 19s)
13:57 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
13:56 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
13:56 cgoubert@deploy1003: cgoubert: Continuing with sync
13:55 cgoubert@deploy1003: cgoubert: Deploy mediawiki chart 0.8.8 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:55 cgoubert@deploy1003: Started scap sync-world: Deploy mediawiki chart 0.8.8
13:52 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:52 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:51 tgr_: UTC afternoon deploys done
13:51 tgr@deploy1003: Finished scap sync-world: Backport for Enable SUL3 everywhere (T384220), OATHAuth: Mark centralnoticeadmin as requiring 2FA (T208113), SpecialTranslationTargetLanguages: Use cxserver-supported language codes (T390300) (duration: 20m 25s)
13:50 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:49 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:45 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:45 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:44 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:44 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:43 tgr@deploy1003: ngkountas, tgr: Continuing with sync
13:35 tgr@deploy1003: ngkountas, tgr: Backport for Enable SUL3 everywhere (T384220), OATHAuth: Mark centralnoticeadmin as requiring 2FA (T208113), SpecialTranslationTargetLanguages: Use cxserver-supported language codes (T390300) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:33 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:33 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:32 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:31 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:31 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:30 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:30 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:30 tgr@deploy1003: Started scap sync-world: Backport for Enable SUL3 everywhere (T384220), OATHAuth: Mark centralnoticeadmin as requiring 2FA (T208113), SpecialTranslationTargetLanguages: Use cxserver-supported language codes (T390300)
13:30 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:30 volans@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:05:00 on sretest1002.eqiad.wmnet with reason: Test
13:27 tgr@deploy1003: Finished scap sync-world: Backport for REST: Enable REST Sandbox on an initial set of production wikis (T389407) (duration: 18m 21s)
13:27 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:27 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:26 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:26 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:25 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:25 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:22 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: relforge1003* for ban relforge1003 prior to service restart - bking@cumin2002 - T390100
13:22 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: relforge1003* for ban relforge1003 prior to service restart - bking@cumin2002 - T390100
13:20 tgr@deploy1003: bpirkle, tgr: Continuing with sync
13:19 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: apply updated master config - bking@cumin2002 - T390100
13:19 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: apply updated master config - bking@cumin2002 - T390100
13:17 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:16 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:14 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:14 tgr@deploy1003: bpirkle, tgr: Backport for REST: Enable REST Sandbox on an initial set of production wikis (T389407) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:14 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:09 tgr@deploy1003: Started scap sync-world: Backport for REST: Enable REST Sandbox on an initial set of production wikis (T389407)
13:06 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:06 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:02 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:02 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:00 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:00 klausman@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
13:00 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4007.ulsfo.wmnet
13:00 klausman@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
12:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4007.ulsfo.wmnet
12:58 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4007.ulsfo.wmnet
12:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4006.ulsfo.wmnet to cluster ulsfo and group 1
12:56 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:55 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:52 topranks: reboot cr2-drmrs to upgrade JunOS T364092
12:48 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:48 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:47 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4006.ulsfo.wmnet to cluster ulsfo and group 1
12:45 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ms-be1066.eqiad.wmnet
12:45 mvernon@cumin1002: START - Cookbook sre.hosts.remove-downtime for ms-be1066.eqiad.wmnet
12:45 klausman@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
12:44 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
12:44 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
12:44 klausman@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
12:44 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
12:44 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
12:43 cgoubert@deploy1003: Finished scap sync-world: Deploy mediawiki chart 0.8.5 (duration: 02m 19s)
12:42 cgoubert@deploy1003: cgoubert: Continuing with sync
12:42 cgoubert@deploy1003: cgoubert: Deploy mediawiki chart 0.8.5 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:42 claime: cgoubert@deploy1003 Started scap sync-world: Deploy mediawiki chart 0.8.6
12:41 cgoubert@deploy1003: Started scap sync-world: Deploy mediawiki chart 0.8.5
12:41 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@040c3ab]: Update artifacts for analytics_test (duration: 00m 16s)
12:41 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@040c3ab]: Update artifacts for analytics_test
12:40 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti4006
12:39 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti4006
12:39 jmm@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host ganeti4005
12:39 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti4005
12:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
12:29 cmooney@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: Upgrade cr1-drmrs JunOS
12:28 dcausse@deploy1003: Finished scap sync-world: Backport for Translate: fix elasticsearch cluster setup (T390244) (duration: 15m 57s)
12:27 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:25 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
12:25 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
12:23 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
12:23 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
12:23 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
12:23 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/mw-cron: apply
12:20 klausman@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
12:20 klausman@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
12:18 dcausse@deploy1003: dcausse: Continuing with sync
12:16 dcausse@deploy1003: dcausse: Backport for Translate: fix elasticsearch cluster setup (T390244) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4006.ulsfo.wmnet with OS bookworm
12:12 dcausse@deploy1003: Started scap sync-world: Backport for Translate: fix elasticsearch cluster setup (T390244)
12:11 topranks: set "graceful sender" option on cr2-drmrs to drain for JunOS upgrade T364092
12:08 claime: Deploying 1131037 mw::periodic_job: Migrate blameStartupRegistry.php - T388540
12:02 marostegui@cumin1002: END (ERROR) - Cookbook sre.mysql.clone (exit_code=97) of db1211.eqiad.wmnet onto db1256.eqiad.wmnet
11:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage
11:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4006.ulsfo.wmnet with reason: host reimage
11:51 Emperor: VACUUM large container dbs on ms-be1066 T377827
11:49 mvernon@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on ms-be1066.eqiad.wmnet with reason: vacuum overlarge container dbs
11:35 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4006.ulsfo.wmnet with OS bookworm
11:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.pool-depool-node (exit_code=0) check for host wikikube-worker1039.eqiad.wmnet
11:32 cgoubert@cumin1002: START - Cookbook sre.k8s.pool-depool-node check for host wikikube-worker1039.eqiad.wmnet
11:26 claime: wikikube-worker1039.eqiad.wmnet - powercycling from idrac
11:22 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti4006.ulsfo.wmnet
11:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
11:15 jmm@dns1004: END - running authdns-update
11:13 jmm@dns1004: START - running authdns-update
11:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
11:07 elukey@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on registry1004.eqiad.wmnet with reason: maintenance
11:05 elukey: remove docker registry nginx cache settings from registry* - T390251
10:26 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti4006.ulsfo.wmnet
10:15 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti4006.ulsfo.wmnet with reason: remove from cluster for reimage
10:11 elukey: restart nginx on registry2005 - T390251
10:05 dcausse@deploy1003: Started scap sync-world: Backport for Translate: fix elasticsearch cluster setup (T390244)
09:58 fabfur@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cp4047.ulsfo.wmnet with reason: HW errors
09:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
09:42 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
09:38 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
09:31 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
09:22 slyngshede@dns1004: END - running authdns-update
09:20 slyngshede@dns1004: START - running authdns-update
09:19 elukey: restart of nginx on registry2004 (by akosiaris) - only instance returning inconsistent responses for a given layer request - T390251
09:15 dcausse@deploy1003: Started scap sync-world: Backport for Translate: fix elasticsearch cluster setup (T390244)
09:13 elukey: `apt-get clean` on registry200[4,5] to free some space
08:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
08:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
08:48 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1211 - Depool db1211.eqiad.wmnet to then clone it to db1256.eqiad.wmnet - marostegui@cumin1002
08:48 marostegui@cumin1002: START - Cookbook sre.mysql.depool db1211 - Depool db1211.eqiad.wmnet to then clone it to db1256.eqiad.wmnet - marostegui@cumin1002
08:48 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1211.eqiad.wmnet onto db1256.eqiad.wmnet
08:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host cumin1003.eqiad.wmnet
08:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cumin1003.eqiad.wmnet with OS bookworm
08:38 dcausse@deploy1003: Started scap sync-world: Backport for Translate: fix elasticsearch cluster setup (T390244)
08:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cumin1003.eqiad.wmnet with reason: host reimage
08:27 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cumin1003.eqiad.wmnet with reason: host reimage
08:16 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host cumin1003.eqiad.wmnet with OS bookworm
08:12 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM cumin1003.eqiad.wmnet - jmm@cumin2002"
08:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM cumin1003.eqiad.wmnet - jmm@cumin2002"
08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) cumin1003.eqiad.wmnet on all recursors
08:11 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache cumin1003.eqiad.wmnet on all recursors
08:11 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:11 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cumin1003.eqiad.wmnet - jmm@cumin2002"
08:11 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM cumin1003.eqiad.wmnet - jmm@cumin2002"
08:11 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7008.magru.wmnet
08:10 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7016.magru.wmnet
08:05 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7016.magru.wmnet
08:05 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7008.magru.wmnet
08:04 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7007.magru.wmnet
08:04 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7015.magru.wmnet
08:03 jmm@cumin2002: START - Cookbook sre.dns.netbox
08:03 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host cumin1003.eqiad.wmnet
08:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
08:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
07:59 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7015.magru.wmnet
07:59 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7007.magru.wmnet
07:54 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7006.magru.wmnet
07:54 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7014.magru.wmnet
07:50 dcausse@deploy1003: Started scap sync-world: Backport for Translate: fix elasticsearch cluster setup (T390244)
07:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
07:48 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7014.magru.wmnet
07:48 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7006.magru.wmnet
07:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
07:46 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7005.magru.wmnet
07:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
07:45 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7013.magru.wmnet
07:41 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7013.magru.wmnet
07:41 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7005.magru.wmnet
07:40 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7004.magru.wmnet
07:38 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7012.magru.wmnet
07:36 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
07:36 isaranto@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
07:35 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
07:34 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7012.magru.wmnet
07:34 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7004.magru.wmnet
07:33 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7003.magru.wmnet
07:33 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7011.magru.wmnet
07:32 dcausse@deploy1003: Started scap sync-world: Backport for Translate: fix elasticsearch cluster setup (T390244)
07:32 brouberol@deploy1003: Finished scap build-images: T390059 - Prevent stub provider from going in an infinite loop (duration: 10m 57s)
07:28 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
07:27 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7011.magru.wmnet
07:27 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7003.magru.wmnet
07:24 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7010.magru.wmnet
07:24 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7002.magru.wmnet
07:21 brouberol@deploy1003: Started scap build-images: T390059 - Prevent stub provider from going in an infinite loop
07:19 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7010.magru.wmnet
07:19 volans: upgrade python3-wmflib to v1.3.1 fleetwide
07:18 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
07:17 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7002.magru.wmnet
07:12 volans: upgraded python3-wmflib to v1.3.1 on cumin[12]002
07:08 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
07:06 fabfur: deploying https://gerrit.wikimedia.org/r/c/operations/puppet/+/1131705 on A:cp-magru (T384227)

2025-03-29

17:00 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
15:51 oblivian@cumin1002: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99) pool swift in eqiad: maintenance
15:47 mvernon@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
15:46 oblivian@cumin1002: START - Cookbook sre.discovery.service-route pool swift in eqiad: maintenance
15:44 mvernon@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
15:42 oblivian@cumin1002: END (FAIL) - Cookbook sre.discovery.service-route (exit_code=99) depool swift in eqiad: maintenance
15:37 oblivian@cumin1002: START - Cookbook sre.discovery.service-route depool swift in eqiad: maintenance

2025-03-28

23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2330.codfw.wmnet with OS bookworm
23:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2328.codfw.wmnet with OS bookworm
23:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2327.codfw.wmnet with OS bookworm
23:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2329.codfw.wmnet with OS bookworm
23:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2326.codfw.wmnet with OS bookworm
23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2326.codfw.wmnet with reason: host reimage
22:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2328.codfw.wmnet with reason: host reimage
22:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2329.codfw.wmnet with reason: host reimage
22:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2330.codfw.wmnet with reason: host reimage
22:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2327.codfw.wmnet with reason: host reimage
22:47 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2330.codfw.wmnet with reason: host reimage
22:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2329.codfw.wmnet with reason: host reimage
22:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2328.codfw.wmnet with reason: host reimage
22:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2326.codfw.wmnet with reason: host reimage
22:46 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2327.codfw.wmnet with reason: host reimage
22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2330.codfw.wmnet with OS bookworm
22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2329.codfw.wmnet with OS bookworm
22:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2328.codfw.wmnet with OS bookworm
22:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2327.codfw.wmnet with OS bookworm
22:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2326.codfw.wmnet with OS bookworm
22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2323.codfw.wmnet with OS bookworm
22:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2325.codfw.wmnet with OS bookworm
22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2322.codfw.wmnet with OS bookworm
22:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2324.codfw.wmnet with OS bookworm
22:27 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2321.codfw.wmnet with OS bookworm
22:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:23 sbassett@deploy1003: Finished scap sync-world: Backport for Add LoginNotify to disallowed local providers, Introduce configuration to deny logins from unknown systems (T390315), Configure LoginNotify deny functionality (T390315) (duration: 27m 32s)
22:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
22:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2323.codfw.wmnet with reason: host reimage
22:12 sbassett@deploy1003: sbassett, tgr, mszabo, kharlan: Continuing with sync
22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2322.codfw.wmnet with reason: host reimage
22:10 sbassett@deploy1003: sbassett, tgr, mszabo, kharlan: Backport for Add LoginNotify to disallowed local providers, Introduce configuration to deny logins from unknown systems (T390315), Configure LoginNotify deny functionality (T390315) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2325.codfw.wmnet with reason: host reimage
22:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2324.codfw.wmnet with reason: host reimage
22:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2321.codfw.wmnet with reason: host reimage
22:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2324.codfw.wmnet with reason: host reimage
21:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2325.codfw.wmnet with reason: host reimage
21:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2323.codfw.wmnet with reason: host reimage
21:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2322.codfw.wmnet with reason: host reimage
21:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2321.codfw.wmnet with reason: host reimage
21:55 sbassett@deploy1003: Started scap sync-world: Backport for Add LoginNotify to disallowed local providers, Introduce configuration to deny logins from unknown systems (T390315), Configure LoginNotify deny functionality (T390315)
21:46 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2325.codfw.wmnet with OS bookworm
21:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2324.codfw.wmnet with OS bookworm
21:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2323.codfw.wmnet with OS bookworm
21:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2322.codfw.wmnet with OS bookworm
21:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2318.codfw.wmnet with OS bookworm
21:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2319.codfw.wmnet with OS bookworm
21:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2320.codfw.wmnet with OS bookworm
21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2317.codfw.wmnet with OS bookworm
21:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2316.codfw.wmnet with OS bookworm
21:20 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
21:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2318.codfw.wmnet with reason: host reimage
21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2319.codfw.wmnet with reason: host reimage
21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2320.codfw.wmnet with reason: host reimage
21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2317.codfw.wmnet with reason: host reimage
21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2316.codfw.wmnet with reason: host reimage
21:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2320.codfw.wmnet with reason: host reimage
21:01 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2319.codfw.wmnet with reason: host reimage
21:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2318.codfw.wmnet with reason: host reimage
21:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2317.codfw.wmnet with reason: host reimage
21:00 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2316.codfw.wmnet with reason: host reimage
20:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2320.codfw.wmnet with OS bookworm
20:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2319.codfw.wmnet with OS bookworm
20:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2318.codfw.wmnet with OS bookworm
20:49 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2317.codfw.wmnet with OS bookworm
20:48 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2316.codfw.wmnet with OS bookworm
20:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2313.codfw.wmnet with OS bookworm
20:47 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2312.codfw.wmnet with OS bookworm
20:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2315.codfw.wmnet with OS bookworm
20:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2314.codfw.wmnet with OS bookworm
20:06 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2311.codfw.wmnet with OS bookworm
20:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2313.codfw.wmnet with reason: host reimage
19:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2312.codfw.wmnet with reason: host reimage
19:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2315.codfw.wmnet with reason: host reimage
19:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2314.codfw.wmnet with reason: host reimage
19:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2311.codfw.wmnet with reason: host reimage
19:40 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2315.codfw.wmnet with reason: host reimage
19:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2314.codfw.wmnet with reason: host reimage
19:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2313.codfw.wmnet with reason: host reimage
19:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2312.codfw.wmnet with reason: host reimage
19:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2311.codfw.wmnet with reason: host reimage
19:31 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: apply updated master config - bking@cumin2002 - T390100
19:31 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: apply updated master config - bking@cumin2002 - T390100
19:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2315.codfw.wmnet with OS bookworm
19:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2314.codfw.wmnet with OS bookworm
19:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2313.codfw.wmnet with OS bookworm
19:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2312.codfw.wmnet with OS bookworm
19:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2311.codfw.wmnet with OS bookworm
19:15 fab@deploy1003: Finished deploy [airflow-dags/research@b5ce354]: (no justification provided) (duration: 00m 43s)
19:15 fab@deploy1003: Started deploy [airflow-dags/research@b5ce354]: (no justification provided)
17:50 ladsgroup@deploy1003: Finished scap sync-world: Backport for OATHAuth: Mark interface-admin as requiring 2FA (duration: 13m 43s)
17:43 ladsgroup@deploy1003: ladsgroup, reedy: Continuing with sync
17:41 ladsgroup@deploy1003: ladsgroup, reedy: Backport for OATHAuth: Mark interface-admin as requiring 2FA synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:37 ladsgroup@deploy1003: Started scap sync-world: Backport for OATHAuth: Mark interface-admin as requiring 2FA
16:44 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: apply updated master config - bking@cumin2002 - T390100
16:44 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster relforge: apply updated master config - bking@cumin2002 - T390100
15:40 oblivian@deploy1003: Finished scap sync-world: Backport for HIBP verification code (T389727) (duration: 12m 14s)
15:34 oblivian@deploy1003: oblivian, fabfur: Continuing with sync
15:33 oblivian@deploy1003: oblivian, fabfur: Backport for HIBP verification code (T389727) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:29 inflatador: bking@apt1002 publish wmf-opensearch-search-plugins-1.3.20-3 to component/opensearch13 bullseye-wikimedia T390100
15:28 oblivian@deploy1003: Started scap sync-world: Backport for HIBP verification code (T389727)
15:23 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
15:23 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
15:22 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
15:22 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
15:19 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
15:19 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
15:18 moritzm: uploaded wmf-laptop 1.0.1 to apt.wikimedia.org
15:17 fabfur@dns1004: END - running authdns-update
15:14 fabfur@dns1004: START - running authdns-update
15:09 fabfur@dns1004: END - running authdns-update
15:07 fabfur@dns1004: START - running authdns-update
15:06 fabfur@dns1004: END - running authdns-update
15:04 fabfur@dns1004: START - running authdns-update
14:58 fab@deploy1003: Finished deploy [airflow-dags/research@b5ce354]: (no justification provided) (duration: 00m 39s)
14:58 fab@deploy1003: Started deploy [airflow-dags/research@b5ce354]: (no justification provided)
13:48 reedy@deploy1003: Synchronized wmf-config/CommonSettings.php: Disable CAPTCHA on Special:UserLogin (duration: 11m 52s)
13:23 hashar: Restarted CI Jenkins
12:04 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@a233b0d]: bump SEAL to v0.5.0 (duration: 01m 14s)
12:02 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@a233b0d]: bump SEAL to v0.5.0
11:20 jelto@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
11:19 jelto@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
11:19 jelto@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
11:19 jelto@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
11:12 jnuche@deploy1003: Installation of scap version "4.147.0" completed for 2 hosts
11:11 jnuche@deploy1003: Installing scap version "4.147.0" for 2 host(s)
10:43 jelto@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
10:43 jelto@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
10:41 jelto@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
10:41 jelto@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
10:41 jelto: apply admin_ng external-services to add puppetdb endpoints - T350794
10:39 jelto@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
10:38 jelto@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
10:36 jelto@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
10:35 jelto@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:55 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
09:55 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
09:55 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
09:54 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
09:54 cgoubert@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
09:54 cgoubert@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
09:53 cgoubert@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:53 elukey: benthos-webrequest_live back working with two instances
09:52 cgoubert@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:37 dcausse: repooling wdqs1020
09:21 dcausse: restarting blazegraph on wdqs1020 (deadlocked)
09:20 fabfur@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cp4047.ulsfo.wmnet with reason: HW errors
08:39 godog: bounce mtail on centrallog1002 - stuck on cpu
08:38 elukey: stop benthos-webrequest_live on centrallog2002.codfw.wmnet to test handling load/traffic with one instance - T390029
08:37 elukey@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on centrallog2002.codfw.wmnet with reason: Test stopping benthos webrequest-live
07:22 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2243.codfw.wmnet with reason: Maintenance
03:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2310.codfw.wmnet with OS bookworm
03:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
03:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2309.codfw.wmnet with OS bookworm
03:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:52 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2308.codfw.wmnet with OS bookworm
02:52 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:48 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2310.codfw.wmnet with reason: host reimage
02:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2309.codfw.wmnet with reason: host reimage
02:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2308.codfw.wmnet with reason: host reimage
02:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2310.codfw.wmnet with reason: host reimage
02:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2309.codfw.wmnet with reason: host reimage
02:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2308.codfw.wmnet with reason: host reimage
02:19 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2310.codfw.wmnet with OS bookworm
02:19 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2309.codfw.wmnet with OS bookworm
02:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2308.codfw.wmnet with OS bookworm
02:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2306.codfw.wmnet with OS bookworm
02:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2307.codfw.wmnet with OS bookworm
02:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2305.codfw.wmnet with OS bookworm
02:10 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2306.codfw.wmnet with reason: host reimage
01:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2307.codfw.wmnet with reason: host reimage
01:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2305.codfw.wmnet with reason: host reimage
01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2307.codfw.wmnet with reason: host reimage
01:50 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2306.codfw.wmnet with reason: host reimage
01:49 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2305.codfw.wmnet with reason: host reimage
01:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2307.codfw.wmnet with OS bookworm
01:38 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2306.codfw.wmnet with OS bookworm
01:37 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2305.codfw.wmnet with OS bookworm
01:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2304.codfw.wmnet with OS bookworm
01:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2303.codfw.wmnet with OS bookworm
01:29 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2291.codfw.wmnet with OS bookworm
01:25 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:25 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2304.codfw.wmnet with reason: host reimage
01:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2303.codfw.wmnet with reason: host reimage
01:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2291.codfw.wmnet with reason: host reimage
01:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2304.codfw.wmnet with reason: host reimage
01:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2303.codfw.wmnet with reason: host reimage
01:06 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2291.codfw.wmnet with reason: host reimage
00:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2304.codfw.wmnet with OS bookworm
00:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2303.codfw.wmnet with OS bookworm
00:55 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2291.codfw.wmnet with OS bookworm
00:53 ladsgroup@deploy1003: Finished scap sync-world: Backport for maintenance: Add support for unlocking accounts in LockUser.php (duration: 54m 51s)
00:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2320.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2319.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:52 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2320.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2318.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:51 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2319.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:51 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2318.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2313.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2312.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2313.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2311.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2312.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2311.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2310.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2309.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2310.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2306.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2309.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2306.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2305.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2304.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2305.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2303.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2304.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2303.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:45 ladsgroup@deploy1003: ladsgroup: Continuing with sync
00:36 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
00:35 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
00:35 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
00:35 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
00:35 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
00:35 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
00:02 ladsgroup@deploy1003: ladsgroup: Backport for maintenance: Add support for unlocking accounts in LockUser.php synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)

2025-03-27

23:58 ladsgroup@deploy1003: Started scap sync-world: Backport for maintenance: Add support for unlocking accounts in LockUser.php
23:43 zabe: zabe@mwmaint1002:~$ cat group1.dblist | xargs -I{} bash -c "echo {}; mwscript extensions/WikimediaMaintenance/migrateESRefToContentTableStage2.php {} --delete /home/zabe/afl_text_table_deletedump/{} --sleep 0.3" # T381599
23:43 Krinkle: Doing some load testing on mwdebug1001
23:43 cstone: payments-wiki upgraded from 86dea9fc to 9a51f51d
23:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1123.eqiad.wmnet with OS bullseye
23:40 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:40 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:30 tstarling@deploy1003: Finished scap sync-world: Backport for Enable Codex and Multiblocks in Polish wiki (T377121), CaptchaPreAuthenticationProvider: Improve log messages (T379178) (duration: 16m 42s)
23:24 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1123.eqiad.wmnet with reason: host reimage
23:23 tstarling@deploy1003: tstarling, hmonroy, reedy: Continuing with sync
23:22 cstone: payments-wiki upgraded from 8bcc8ff2 to 86dea9fc
23:20 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1123.eqiad.wmnet with reason: host reimage
23:19 tstarling@deploy1003: tstarling, hmonroy, reedy: Backport for Enable Codex and Multiblocks in Polish wiki (T377121), CaptchaPreAuthenticationProvider: Improve log messages (T379178) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:14 tstarling@deploy1003: Started scap sync-world: Backport for Enable Codex and Multiblocks in Polish wiki (T377121), CaptchaPreAuthenticationProvider: Improve log messages (T379178)
23:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1123.eqiad.wmnet with OS bullseye
22:57 thcipriani@deploy1003: Finished scap sync-world: Backport for REST: fix extra routes module localization strings (T385855) (duration: 14m 06s)
22:50 thcipriani@deploy1003: bpirkle, thcipriani: Continuing with sync
22:48 thcipriani@deploy1003: bpirkle, thcipriani: Backport for REST: fix extra routes module localization strings (T385855) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:43 thcipriani@deploy1003: Started scap sync-world: Backport for REST: fix extra routes module localization strings (T385855)
22:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1125.eqiad.wmnet with OS bullseye
22:42 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
22:42 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
22:40 thcipriani@deploy1003: Finished scap sync-world: Backport for Disable new WebAuthn credentials creation on local domains (T378402 T354701) (duration: 17m 03s)
22:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1124.eqiad.wmnet with OS bullseye
22:40 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
22:38 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
22:33 thcipriani@deploy1003: tgr, thcipriani: Continuing with sync
22:28 thcipriani@deploy1003: tgr, thcipriani: Backport for Disable new WebAuthn credentials creation on local domains (T378402 T354701) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1125.eqiad.wmnet with reason: host reimage
22:24 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1124.eqiad.wmnet with reason: host reimage
22:23 thcipriani@deploy1003: Started scap sync-world: Backport for Disable new WebAuthn credentials creation on local domains (T378402 T354701)
22:22 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1125.eqiad.wmnet with reason: host reimage
22:21 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1124.eqiad.wmnet with reason: host reimage
22:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1125.eqiad.wmnet with OS bullseye
22:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1124.eqiad.wmnet with OS bullseye
22:09 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1123.eqiad.wmnet with OS bullseye
22:09 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1125.eqiad.wmnet with OS bullseye
22:09 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1125.eqiad.wmnet with OS bullseye
22:05 thcipriani@deploy1003: Finished scap sync-world: Backport for Add "PRE" (for NS_TEMPLATE) and "CAT" (for NS_CATEGORY) as namespace aliases in ptwiki. (T389609) (duration: 13m 23s)
21:57 thcipriani@deploy1003: thcipriani, albertoleoncio: Continuing with sync
21:57 thcipriani@deploy1003: thcipriani, albertoleoncio: Backport for Add "PRE" (for NS_TEMPLATE) and "CAT" (for NS_CATEGORY) as namespace aliases in ptwiki. (T389609) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:52 thcipriani@deploy1003: Started scap sync-world: Backport for Add "PRE" (for NS_TEMPLATE) and "CAT" (for NS_CATEGORY) as namespace aliases in ptwiki. (T389609)
21:43 dancy@deploy1003: Finished scap sync-world: Testing deployments (duration: 24m 24s)
21:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2328.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2329.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2330.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2328.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2329.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2330.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2326.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2325.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2324.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2323.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2322.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:21 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp2041.codfw.wmnet} and A:cp
21:21 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp2042.codfw.wmnet} and A:cp
21:19 dancy@deploy1003: Started scap sync-world: Testing deployments
21:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2327.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2327.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2326.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2325.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:16 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp2042.codfw.wmnet} and A:cp
21:16 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp2041.codfw.wmnet} and A:cp
21:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2324.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2323.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2322.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2320.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2319.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2318.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2321.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2317.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2316.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2321.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2320.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2319.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2318.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2317.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2316.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2313.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2312.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2311.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:04 dancy@deploy1003: Started scap sync-world: Testing deployments
21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2315.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2314.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2310.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2315.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2314.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2313.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2312.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2311.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2310.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2305.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2306.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2309.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2304.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2308.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2307.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:50 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2309.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2308.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2307.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2306.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2305.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2304.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2330.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2329.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2328.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2326.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2303.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2325.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2324.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:39 sbassett@deploy1003: Started scap sync-world: Backport for LoginNotify#sendNotice: Add IP and UA to log message (T390141), GlobalContributions: Add API query module (T390156)
20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2323.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2322.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2321.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2320.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2319.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2303.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2291.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2291.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2318.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2317.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2330.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:35 thcipriani: scap backport failed, investigating
20:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2316.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2329.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2328.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2315.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2327.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2327.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2314.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2326.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:31 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp2039.codfw.wmnet} and A:cp
20:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2313.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:31 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp2040.codfw.wmnet} and A:cp
20:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2325.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2324.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2312.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2311.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2323.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:28 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2322.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2321.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2320.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:26 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp2039.codfw.wmnet} and A:cp
20:26 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp2040.codfw.wmnet} and A:cp
20:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2319.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:26 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2309.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2308.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2318.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2307.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2305.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2306.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2317.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2304.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2303.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2316.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2315.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2314.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:21 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2313.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2312.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2311.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2310.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:17 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2310.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2309.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2308.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2307.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2306.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2305.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2304.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2291.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2303.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2291.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2330
20:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2330
20:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2329
20:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2329
20:12 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2328
20:12 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2328
20:11 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2326
20:11 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2326
20:11 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2325
20:11 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2325
20:11 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2324
20:11 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2324
20:11 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2323
20:11 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2323
20:11 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2322
20:11 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2322
20:10 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2321
20:10 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2321
20:10 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:10 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2323 to codfw - jhancock@cumin2002"
20:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2323 to codfw - jhancock@cumin2002"
20:07 sbassett@deploy1003: Started scap sync-world: Backport for GlobalContributions: Add API query module (T390156), LoginNotify#sendNotice: Add IP and UA to log message (T390141)
20:06 jhancock@cumin2002: START - Cookbook sre.dns.netbox
20:01 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2320
20:01 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2320
20:01 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2319
20:00 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2319
20:00 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2318
20:00 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2318
19:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2317
19:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2317
19:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2316
19:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2316
19:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2315
19:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2315
19:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2314
19:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2314
19:56 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2322 to codfw - jhancock@cumin2002"
19:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2322 to codfw - jhancock@cumin2002"
19:52 jhancock@cumin2002: START - Cookbook sre.dns.netbox
19:51 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp2037.codfw.wmnet} and A:cp
19:51 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp2038.codfw.wmnet} and A:cp
19:50 dancy@deploy1003: Finished scap sync-world: Backport for CaptchaPreAuthenticationProvider: Check if a login attempt would trigger a captcha in testForAuthentication (T379178) (duration: 24m 58s)
19:47 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2313
19:47 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2313
19:47 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2312
19:47 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2312
19:47 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2311
19:47 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2311
19:47 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2309
19:46 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2309
19:46 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp2038.codfw.wmnet} and A:cp
19:46 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp2037.codfw.wmnet} and A:cp
19:45 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2309 to codfw - jhancock@cumin2002"
19:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2309 to codfw - jhancock@cumin2002"
19:43 dancy@deploy1003: sbassett, dancy: Continuing with sync
19:41 jhancock@cumin2002: START - Cookbook sre.dns.netbox
19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2308
19:38 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2308
19:37 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2307
19:37 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2307
19:37 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2306
19:37 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2306
19:37 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2305
19:36 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2305
19:36 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2304
19:36 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2304
19:36 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2303
19:36 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2303
19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2303 to codfw - jhancock@cumin2002"
19:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2303 to codfw - jhancock@cumin2002"
19:32 dancy@deploy1003: sbassett, dancy: Backport for CaptchaPreAuthenticationProvider: Check if a login attempt would trigger a captcha in testForAuthentication (T379178) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox
19:25 dancy@deploy1003: Started scap sync-world: Backport for CaptchaPreAuthenticationProvider: Check if a login attempt would trigger a captcha in testForAuthentication (T379178)
19:20 dancy@deploy1003: Installation of scap version "4.147.0" completed for 2 hosts
19:19 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp2036.codfw.wmnet} and A:cp
19:18 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp2035.codfw.wmnet} and A:cp
19:13 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp2036.codfw.wmnet} and A:cp
19:13 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp2035.codfw.wmnet} and A:cp
19:11 dancy@deploy1003: Installing scap version "4.147.0" for 2 host(s)
19:07 cmooney@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 202053
19:06 cmooney@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 202053
18:47 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1125.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
18:45 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1125.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
18:43 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp2033.codfw.wmnet} and A:cp
18:42 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp2034.codfw.wmnet} and A:cp
18:37 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp2034.codfw.wmnet} and A:cp
18:37 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp2033.codfw.wmnet} and A:cp
18:29 dancy@deploy1003: Installation of scap version "4.146.0" completed for 2 hosts
18:22 dancy@deploy1003: Installing scap version "4.146.0" for 2 host(s)
18:00 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp2031.codfw.wmnet} and A:cp
18:00 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp2032.codfw.wmnet} and A:cp
17:55 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp2032.codfw.wmnet} and A:cp
17:55 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp2031.codfw.wmnet} and A:cp
17:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74485 and previous config saved to /var/cache/conftool/dbconfig/20250327-175008-root.json
17:49 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1123.eqiad.wmnet with OS bullseye
17:48 cdobbins@dns1004: END - running authdns-update
17:46 cdobbins@dns1004: START - running authdns-update
17:43 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
17:43 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
17:42 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
17:42 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
17:40 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
17:40 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/rest-gateway: apply
17:39 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
17:38 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
17:37 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
17:36 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
17:35 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
17:35 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74482 and previous config saved to /var/cache/conftool/dbconfig/20250327-173502-root.json
17:34 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
17:34 ottomata: upgrading eventgate-analytics-external to node20 - T383814
17:28 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:25 btullis@cumin1002: START - Cookbook sre.dns.netbox
17:24 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts an-worker1202.eqiad.wmnet
17:24 btullis@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
17:24 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp2029.codfw.wmnet} and A:cp
17:23 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp2030.codfw.wmnet} and A:cp
17:20 ladsgroup@deploy1003: Finished scap sync-world: Backport for CaptchaPreAuthenticationProvider: Run triggerCaptcha for login attempts (T379178), ConfirmEditTriggersCaptcha: Support showing a CAPTCHA on Special:UserLogin (T390197) (duration: 18m 03s)
17:20 bd808@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
17:20 bd808@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
17:20 bd808@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
17:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74481 and previous config saved to /var/cache/conftool/dbconfig/20250327-171956-root.json
17:19 bd808@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: apply
17:18 bd808@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
17:18 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp2030.codfw.wmnet} and A:cp
17:18 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp2029.codfw.wmnet} and A:cp
17:17 bd808@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: apply
17:13 ladsgroup@deploy1003: dreamyjazz, ladsgroup: Continuing with sync
17:13 otto@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
17:12 otto@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
17:11 otto@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
17:11 otto@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
17:07 ladsgroup@deploy1003: dreamyjazz, ladsgroup: Backport for CaptchaPreAuthenticationProvider: Run triggerCaptcha for login attempts (T379178), ConfirmEditTriggersCaptcha: Support showing a CAPTCHA on Special:UserLogin (T390197) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74480 and previous config saved to /var/cache/conftool/dbconfig/20250327-170451-root.json
17:02 ladsgroup@deploy1003: Started scap sync-world: Backport for CaptchaPreAuthenticationProvider: Run triggerCaptcha for login attempts (T379178), ConfirmEditTriggersCaptcha: Support showing a CAPTCHA on Special:UserLogin (T390197)
16:51 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
16:51 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
16:51 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
16:51 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
16:50 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp2027.codfw.wmnet} and A:cp
16:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74479 and previous config saved to /var/cache/conftool/dbconfig/20250327-164945-root.json
16:49 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp2028.codfw.wmnet} and A:cp
16:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1124.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:44 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp2028.codfw.wmnet} and A:cp
16:44 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp2027.codfw.wmnet} and A:cp
16:44 ladsgroup@deploy1003: Finished scap sync-world: Backport for LoginAttemptCounter: Add extra hardening for long period too, LoginAttemptCounter: Add extra hardening for long period too (duration: 16m 33s)
16:43 dcaro@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20 days, 0:00:00 on cloudcephosd1029.eqiad.wmnet with reason: Installing a disk for testing
16:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1124.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:39 marostegui@cumin1002: END (ERROR) - Cookbook sre.mysql.clone (exit_code=97) of db1211.eqiad.wmnet onto db1255.eqiad.wmnet
16:37 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1123.eqiad.wmnet with OS bullseye
16:37 ladsgroup@deploy1003: ladsgroup: Continuing with sync
16:34 ladsgroup@deploy1003: ladsgroup: Backport for LoginAttemptCounter: Add extra hardening for long period too, LoginAttemptCounter: Add extra hardening for long period too synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:27 ladsgroup@deploy1003: Started scap sync-world: Backport for LoginAttemptCounter: Add extra hardening for long period too, LoginAttemptCounter: Add extra hardening for long period too
16:25 jgiannelos@deploy1003: Finished deploy [restbase/deploy@3349f02]: Deprecate unused RB codebase (duration: 19m 23s)
16:24 dancy@deploy1003: Sync cancelled.
16:24 dancy@deploy1003: dancy: Testing T389830 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:23 dancy@deploy1003: Started scap sync-world: Testing T389830
16:22 sergi0: Run `foreachwikiindblist growthexperiments CommunityConfiguration:migrateConfig CommunityUpdates 2.0.3` # T387737
16:17 dancy@deploy1003: sync-world aborted: Testing T389830 (duration: 01m 48s)
16:16 dancy@deploy1003: Started scap sync-world: Testing T389830
16:14 sergi0: Run `foreachwikiindblist growthexperiments CommunityConfiguration:setVersionData CommunityUpdates 2.0.2` # T387737
16:09 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2331
16:09 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2331
16:09 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:09 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2331 to codfw - jhancock@cumin2002"
16:09 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2331 to codfw - jhancock@cumin2002"
16:06 jgiannelos@deploy1003: Started deploy [restbase/deploy@3349f02]: Deprecate unused RB codebase
16:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2181 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74476 and previous config saved to /var/cache/conftool/dbconfig/20250327-160623-root.json
16:06 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
16:05 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
16:05 jhancock@cumin2002: START - Cookbook sre.dns.netbox
16:00 elukey: `sudo systemctl restart burrow-jumbo-eqiad.service prometheus-burrow-exporter@jumbo-eqiad.service` on kafkamon1003 - attempt to check if the new kafka lag for benthos-webrequest_live is due to burrow - T390029
15:59 ebernhardson@deploy1003: Finished scap sync-world: Backport for Move cirrus traffic to eqiad for platform upgrade (T388610) (duration: 12m 49s)
15:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2181 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74475 and previous config saved to /var/cache/conftool/dbconfig/20250327-155117-root.json
15:50 ebernhardson@deploy1003: ebernhardson: Continuing with sync
15:49 ebernhardson@deploy1003: ebernhardson: Backport for Move cirrus traffic to eqiad for platform upgrade (T388610) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:44 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_codfw
15:44 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_codfw
15:44 ebernhardson@deploy1003: Started scap sync-world: Backport for Move cirrus traffic to eqiad for platform upgrade (T388610)
15:44 otto@deploy1003: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
15:44 otto@deploy1003: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
15:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2181 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74474 and previous config saved to /var/cache/conftool/dbconfig/20250327-153612-root.json
15:29 hashar: Restarting Gerrit to raise heap from 32G to 64G (T387223) and to enable pushing notifications to browsers (T389327)
15:28 ottomata: upgrading eventgate-logging-external to node20 (using new per stream header enrich setting), first testing in staging. - T383814, T387908
15:26 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:26 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
15:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2181 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74473 and previous config saved to /var/cache/conftool/dbconfig/20250327-152106-root.json
15:19 hashar@deploy1003: Finished scap sync-world: Sync patch to PrivateSettings.php and removal of unused configs (Gerrit: 1127930 1127889 1127890 1127886 1125095 1127900 1127898 1127887 1127897 1127888 1127929) (duration: 11m 52s)
15:15 elukey: update benthos@webrequest-live's config on centrallog nodes to new Kafka topics (haproxy vs varnishkafka) - T390029
15:07 hashar@deploy1003: Started scap sync-world: Sync patch to PrivateSettings.php and removal of unused configs (Gerrit: 1127930 1127889 1127890 1127886 1125095 1127900 1127898 1127887 1127897 1127888 1127929)
15:06 hashar@deploy1003: sync-world aborted: Sync patch to PrivateSettings.php and removal of unused configs (Gerrit: 1127930 1127889 1127890 1127886 1125095 1127900 1127898 1127887 1127897 1127888 1127929) (duration: 00m 16s)
15:06 btullis@cumin1002: START - Cookbook sre.dns.netbox
15:06 hashar@deploy1003: Started scap sync-world: Sync patch to PrivateSettings.php and removal of unused configs (Gerrit: 1127930 1127889 1127890 1127886 1125095 1127900 1127898 1127887 1127897 1127888 1127929)
15:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2181 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74472 and previous config saved to /var/cache/conftool/dbconfig/20250327-150601-root.json
15:03 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:00 moritzm: installing setuptools security updates
14:54 tgr_: UTC afternoon deploys done
14:52 tgr@deploy1003: Finished scap sync-world: Backport for Enable SUL3 for temp users on group 0/1 (T384220) (duration: 22m 27s)
14:45 tgr@deploy1003: tgr: Continuing with sync
14:40 moritzm: uploaded Boost 1.83.0-4.1~wmf12u1 (backport of Boost 1.83 to Bookworm, needed by Mapnik 4.0.6) T389776
14:39 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:39 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
14:35 tgr@deploy1003: tgr: Backport for Enable SUL3 for temp users on group 0/1 (T384220) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:30 tgr@deploy1003: Started scap sync-world: Backport for Enable SUL3 for temp users on group 0/1 (T384220)
14:26 jmm@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
14:24 tgr@deploy1003: Finished scap sync-world: Backport for Fix badpass logging for locally nonexistent users (duration: 19m 42s)
14:21 jmm@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
14:19 jmm@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
14:16 tgr@deploy1003: tgr: Continuing with sync
14:16 jmm@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
13:06 topranks: adding IBGP peerings between loopbacks in cloud-vrf on cloudsw devices in eqiad T389958
13:01 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
13:01 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
13:00 akosiaris: bump mw-misc to pick up https://gerrit.wikimedia.org/r/c/operations/deployment-charts/+/1131636. It removes various hostnames from the SANs of mediawiki, but should be a noop
12:59 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
12:59 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/mw-misc: apply
12:59 isaranto@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
12:58 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
12:56 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
12:36 topranks: enabling IPv6 on cloudsw devices in eqiad T389958
12:35 aqu@deploy1003: Finished deploy [airflow-dags/analytics_test@bbac659]: Keep airflow analytics_test up-to-date (duration: 00m 14s)
12:34 aqu@deploy1003: Started deploy [airflow-dags/analytics_test@bbac659]: Keep airflow analytics_test up-to-date
12:32 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
12:30 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
12:27 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
12:25 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
12:20 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
12:20 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
12:04 Dreamy_Jazz: Updated security patches for T389235
11:53 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumb steps ratio to 50% (T360589) (duration: 18m 20s)
11:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
11:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
11:46 ladsgroup@deploy1003: ladsgroup: Continuing with sync
11:40 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumb steps ratio to 50% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:34 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumb steps ratio to 50% (T360589)
11:32 brouberol@deploy1003: Finished scap build-images: T390059 - add signal handlers in dumps code to display a stacktrace (duration: 00m 39s)
11:31 brouberol@deploy1003: Started scap build-images: T390059 - add signal handlers in dumps code to display a stacktrace
11:29 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on puppetserver2004.codfw.wmnet with reason: being setup
11:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.addnode (exit_code=0) for new host ganeti4005.ulsfo.wmnet to cluster ulsfo and group 1
11:22 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti4005.ulsfo.wmnet to cluster ulsfo and group 1
11:19 jmm@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti4005
11:18 jmm@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti4005
11:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
11:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4005.ulsfo.wmnet
11:05 jiji@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
11:05 jiji@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
11:05 _joe_: manually installing python3-opensearch on mwlog1002, temporarily
11:01 jiji@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
10:55 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
10:54 zoe@deploy1003: manually-logged testing manual log helper script
10:53 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
10:45 jiji@deploy1003: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
10:44 isaranto@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
10:37 isaranto@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
10:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti4005.ulsfo.wmnet with OS bookworm
10:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
10:23 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
10:22 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
10:21 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
10:20 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
10:19 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve-ctrl2002.codfw.wmnet with OS bookworm
10:19 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
10:17 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
10:16 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
10:16 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
10:15 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
10:15 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:14 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
10:14 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
10:14 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:14 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
10:14 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
10:13 joal@deploy1003: Finished deploy [analytics/refinery@bc1b576] (hadoop-test): Analytics webrequest_frontend update TEST [analytics/refinery@bc1b5761] (duration: 01m 27s)
10:12 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
10:12 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
10:11 joal@deploy1003: Started deploy [analytics/refinery@bc1b576] (hadoop-test): Analytics webrequest_frontend update TEST [analytics/refinery@bc1b5761]
10:10 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
10:10 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
10:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti4005.ulsfo.wmnet with reason: host reimage
10:08 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
10:08 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
10:06 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti4005.ulsfo.wmnet with reason: host reimage
10:01 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:01 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:01 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve-ctrl2002.codfw.wmnet with reason: host reimage
09:58 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve-ctrl2002.codfw.wmnet with reason: host reimage
09:48 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti4005.ulsfo.wmnet with OS bookworm
09:45 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
09:45 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
09:45 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti4005.ulsfo.wmnet
09:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
09:41 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.22 refs T386217
09:41 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve-ctrl2002.codfw.wmnet with OS bookworm
09:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4005.ulsfo.wmnet
09:32 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:32 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
09:30 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
09:29 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
09:29 godog: silence LogstashKafkaConsumerLag and LogstashIndexingFailures for today for 1d - T390140
09:29 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:29 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
09:28 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:28 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
09:27 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:27 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
09:14 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.22 refs T386217
09:01 aklapper@deploy1003: Finished scap sync-world: Backport for Instead of calling deprecated parserOptions(), parse content ourselves (T390032) (duration: 12m 24s)
08:54 aklapper@deploy1003: aklapper, jforrester: Continuing with sync
08:53 aklapper@deploy1003: aklapper, jforrester: Backport for Instead of calling deprecated parserOptions(), parse content ourselves (T390032) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:50 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti4005.ulsfo.wmnet
08:48 aklapper@deploy1003: Started scap sync-world: Backport for Instead of calling deprecated parserOptions(), parse content ourselves (T390032)
08:44 aklapper@deploy1003: Finished scap sync-world: Backport for Allow arwikisource bureaucrat to manage "import" (T389952) (duration: 13m 28s)
08:42 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti4005.ulsfo.wmnet with reason: remove from cluster for reimage
08:41 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
08:41 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp7009.magru.wmnet
08:41 fabfur: repooling cp7001 and cp7009 with new TLS certificate path (T384227)
08:40 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4005.ulsfo.wmnet
08:37 aklapper@deploy1003: hubaishan, aklapper: Continuing with sync
08:37 aklapper@deploy1003: hubaishan, aklapper: Backport for Allow arwikisource bureaucrat to manage "import" (T389952) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:30 aklapper@deploy1003: Started scap sync-world: Backport for Allow arwikisource bureaucrat to manage "import" (T389952)
08:29 aklapper@deploy1003: Finished scap sync-world: Backport for Make officewiki readonly after moving flow pages (T380909) (duration: 14m 14s)
08:28 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7009.magru.wmnet
08:28 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp7001.magru.wmnet
08:28 fabfur: depooling cp7001 and cp7009 to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1131052 (T384227)
08:21 aklapper@deploy1003: zoe, aklapper: Continuing with sync
08:21 aklapper@deploy1003: zoe, aklapper: Backport for Make officewiki readonly after moving flow pages (T380909) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:14 aklapper@deploy1003: Started scap sync-world: Backport for Make officewiki readonly after moving flow pages (T380909)
08:06 marostegui@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1211.eqiad.wmnet onto db1255.eqiad.wmnet
07:54 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1211.eqiad.wmnet onto db1255.eqiad.wmnet
07:37 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4005.ulsfo.wmnet
07:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4005.ulsfo.wmnet
07:30 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4005.ulsfo.wmnet
06:01 marostegui@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2181.codfw.wmnet onto db2242.codfw.wmnet
04:06 cwhite: restart grafana-server on grafana1002 - appears hung

2025-03-26

23:06 toyofuku@deploy1003: Finished scap sync-world: Backport for Restore simplified watchlist for logged in users (T388445) (duration: 12m 29s)
22:59 toyofuku@deploy1003: jdlrobson, toyofuku: Continuing with sync
22:59 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1123.eqiad.wmnet with OS bullseye
22:58 toyofuku@deploy1003: jdlrobson, toyofuku: Backport for Restore simplified watchlist for logged in users (T388445) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:57 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp6009.drmrs.wmnet} and A:cp
22:56 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp6008.drmrs.wmnet} and A:cp
22:54 toyofuku@deploy1003: Started scap sync-world: Backport for Restore simplified watchlist for logged in users (T388445)
22:51 toyofuku@deploy1003: Finished scap sync-world: Backport for Set wgMinervaDonateBanner to default base true (T388438) (duration: 15m 15s)
22:50 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp6008.drmrs.wmnet} and A:cp
22:50 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp6009.drmrs.wmnet} and A:cp
22:44 toyofuku@deploy1003: ksarabia, toyofuku: Continuing with sync
22:42 toyofuku@deploy1003: ksarabia, toyofuku: Backport for Set wgMinervaDonateBanner to default base true (T388438) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2310.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:36 toyofuku@deploy1003: Started scap sync-world: Backport for Set wgMinervaDonateBanner to default base true (T388438)
22:32 toyofuku@deploy1003: Finished scap sync-world: Backport for Web features should not be ambiguously configured (T388445) (duration: 23m 41s)
22:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1124.eqiad.wmnet with OS bullseye
22:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2310.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1124.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2310.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2301.codfw.wmnet with OS bookworm
22:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2302.codfw.wmnet with OS bookworm
22:25 toyofuku@deploy1003: toyofuku, jdlrobson: Continuing with sync
22:20 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1124.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:18 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1124.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2310.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:18 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1125.eqiad.wmnet with OS bullseye
22:18 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1125.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:17 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2310
22:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1124.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:17 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2310
22:17 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1124.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:15 toyofuku@deploy1003: toyofuku, jdlrobson: Backport for Web features should not be ambiguously configured (T388445) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:15 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1124.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:13 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp6010.drmrs.wmnet} and A:cp
22:13 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1124.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:12 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1125.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2310 to codfw - jhancock@cumin2002"
22:12 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp6007.drmrs.wmnet} and A:cp
22:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2310 to codfw - jhancock@cumin2002"
22:10 topranks: reset configuration on cr1-drmrs to enable external connections T389071
22:09 toyofuku@deploy1003: Started scap sync-world: Backport for Web features should not be ambiguously configured (T388445)
22:08 jhancock@cumin2002: START - Cookbook sre.dns.netbox
22:06 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp6007.drmrs.wmnet} and A:cp
22:06 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp6010.drmrs.wmnet} and A:cp
22:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1125.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1124.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:01 topranks: resetting PIC0 on cr1-drmrs to enable et-0/0/1 T389071
21:55 topranks: disabling external Internet peers in BGP on cr1-drmrs T389071
21:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1125.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:47 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1124.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:47 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1123.eqiad.wmnet with OS bullseye
21:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1124.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1123.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:43 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1124
21:41 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host elastic1124
21:40 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1125
21:40 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host elastic1125
21:39 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1124.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1124.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:38 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp6011.drmrs.wmnet} and A:cp
21:38 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp6006.drmrs.wmnet} and A:cp
21:34 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1123.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:34 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:34 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for elastic - jclark@cumin1002"
21:34 topranks: enable 'graceful shutdown' mode for bgp on cr1-drmrs T389071
21:34 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for elastic - jclark@cumin1002"
21:32 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp6006.drmrs.wmnet} and A:cp
21:32 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp6011.drmrs.wmnet} and A:cp
21:30 topranks: drain transport circuit CRT-008647 T389071
21:30 jclark@cumin1002: START - Cookbook sre.dns.netbox
21:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2302.codfw.wmnet with OS bookworm
21:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2301.codfw.wmnet with OS bookworm
21:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2300.codfw.wmnet with OS bookworm
21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2301.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2302.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2301.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2302.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2302.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2301.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:45 spiderpig@deploy1003: Finished scap sync-world: Backport for Edit check: in single action mode the fixed sidebar isn't allowed null offset (T389906) (duration: 11m 55s)
20:38 spiderpig@deploy1003: kemayo, spiderpig: Continuing with sync
20:38 spiderpig@deploy1003: kemayo, spiderpig: Backport for Edit check: in single action mode the fixed sidebar isn't allowed null offset (T389906) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2302.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2301.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2298.codfw.wmnet with OS bookworm
20:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2299.codfw.wmnet with OS bookworm
20:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:33 spiderpig@deploy1003: Started scap sync-world: Backport for Edit check: in single action mode the fixed sidebar isn't allowed null offset (T389906)
20:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:31 dancy@deploy1003: Installation of scap version "4.144.5" completed for 2 hosts
20:29 dancy@deploy1003: Installing scap version "4.144.5" for 2 host(s)
20:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:27 thcipriani: first successful spiderpig backport window
20:26 brennen: end of UTC late backport & config window
20:26 spiderpig@deploy1003: Finished scap sync-world: Backport for Edit check: in single action mode the fixed sidebar isn't allowed null offset (T389906) (duration: 15m 03s)
20:20 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp6012.drmrs.wmnet} and A:cp
20:19 spiderpig@deploy1003: spiderpig, kemayo: Continuing with sync
20:19 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp6005.drmrs.wmnet} and A:cp
20:16 spiderpig@deploy1003: spiderpig, kemayo: Backport for Edit check: in single action mode the fixed sidebar isn't allowed null offset (T389906) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2298.codfw.wmnet with reason: host reimage
20:13 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp6012.drmrs.wmnet} and A:cp
20:13 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp6005.drmrs.wmnet} and A:cp
20:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2299.codfw.wmnet with reason: host reimage
20:11 spiderpig@deploy1003: Started scap sync-world: Backport for Edit check: in single action mode the fixed sidebar isn't allowed null offset (T389906)
20:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2298.codfw.wmnet with reason: host reimage
20:09 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2299.codfw.wmnet with reason: host reimage
19:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2298.codfw.wmnet with OS bookworm
19:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2300.codfw.wmnet with OS bookworm
19:58 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2299.codfw.wmnet with OS bookworm
19:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2300.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2299.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2298.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2298.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2300.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2299.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:30 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp6013.drmrs.wmnet} and A:cp
19:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2298.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:29 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp6004.drmrs.wmnet} and A:cp
19:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2298.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:24 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp6013.drmrs.wmnet} and A:cp
19:24 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp6004.drmrs.wmnet} and A:cp
18:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2300.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2299.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:43 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp6014.drmrs.wmnet} and A:cp
18:43 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp6003.drmrs.wmnet} and A:cp
18:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2300.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2299.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:37 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp6003.drmrs.wmnet} and A:cp
18:37 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp6014.drmrs.wmnet} and A:cp
18:35 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2298.codfw.wmnet with OS bookworm
18:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2297.codfw.wmnet with OS bookworm
18:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2296.codfw.wmnet with OS bookworm
18:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:21 dani@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
18:21 dani@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
18:21 dani@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
18:21 dani@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
18:21 dani@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
18:20 dani@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
17:51 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp6015.drmrs.wmnet} and A:cp
17:50 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp6002.drmrs.wmnet} and A:cp
17:44 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp6002.drmrs.wmnet} and A:cp
17:44 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp6015.drmrs.wmnet} and A:cp
17:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2298.codfw.wmnet with OS bookworm
17:29 dancy@deploy1003: Installation of scap version "4.144.4" completed for 2 hosts
17:27 dancy@deploy1003: Installing scap version "4.144.4" for 2 host(s)
17:12 akosiaris: restart keyholder-proxy.service on deploy1003, deploy2002 to pick up the spiderpig deployment group change
17:08 dancy@deploy1003: Installation of scap version "4.144.3" completed for 2 hosts
17:06 dancy@deploy1003: Installing scap version "4.144.3" for 2 host(s)
17:03 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp6016.drmrs.wmnet} and A:cp
17:03 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp6001.drmrs.wmnet} and A:cp
17:00 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to 17.8
16:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:57 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp6001.drmrs.wmnet} and A:cp
16:57 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp6016.drmrs.wmnet} and A:cp
16:52 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2297.codfw.wmnet with reason: host reimage
16:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2296.codfw.wmnet with reason: host reimage
16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2297.codfw.wmnet with reason: host reimage
16:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2296.codfw.wmnet with reason: host reimage
16:34 kart_: Updated recommendation-api-ng to 2025-03-25-091801-production (T306508)
16:33 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=ml-serve-ctrl2001.codfw.wmnet
16:28 kartik@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
16:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2296.codfw.wmnet with OS bookworm
16:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2297.codfw.wmnet with OS bookworm
16:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2298.codfw.wmnet with OS bookworm
16:19 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=ml-serve-ctrl2001.codfw.wmnet,dc=codfw,service=ml-ctrl
16:15 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp5032.eqsin.wmnet} and A:cp
16:13 kartik@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
16:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve-ctrl2001.codfw.wmnet with OS bookworm
16:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2298.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:12 brett: Rolling out varnishkafka 1.2.0-1 to esams, ulsfo, eqsin, and magru (T389978)
16:12 brett: Rolling out varnishkafka 1.2.0-1 to esams, ulsfo, eqsin, and magru
16:12 kartik@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
16:11 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
16:11 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
16:11 elukey@puppetserver1001: conftool action : set/pooled=no; selector: name=ml-serve-ctrl2001.codfw.wmnet
16:10 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:10 elukey@puppetserver1001: conftool action : set/pooled=no; selector: name=ml-serve-ctrl2001.codfw.wmnet,dc=codfw,service=ml-ctrl
16:10 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:08 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp5032.eqsin.wmnet} and A:cp
16:08 brett: Importing varnishkafka 1.2.0-1 into bullseye-wikimedia component/varnish-staging (T389978)
16:06 brouberol@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:06 brouberol@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
16:05 kartik@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
16:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2297.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2296.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2298.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2297.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:02 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2296.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2295.codfw.wmnet with OS bookworm
16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:01 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2293.codfw.wmnet with OS bookworm
16:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2294.codfw.wmnet with OS bookworm
16:00 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:58 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti-test2003.codfw.wmnet to cluster codfw_test and group A-test
15:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:57 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti-test2003.codfw.wmnet to cluster codfw_test and group A-test
15:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve-ctrl2001.codfw.wmnet with reason: host reimage
15:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
15:53 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve-ctrl2001.codfw.wmnet with reason: host reimage
15:52 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:49 kartik@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
15:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
15:44 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti-test2003.codfw.wmnet to cluster codfw_test and group A-test
15:44 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti-test2003.codfw.wmnet to cluster codfw_test and group A-test
15:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2295.codfw.wmnet with reason: host reimage
15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2294.codfw.wmnet with reason: host reimage
15:38 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve-ctrl2001.codfw.wmnet with OS bookworm
15:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2293.codfw.wmnet with reason: host reimage
15:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2295.codfw.wmnet with reason: host reimage
15:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2294.codfw.wmnet with reason: host reimage
15:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2293.codfw.wmnet with reason: host reimage
15:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test2003.codfw.wmnet with OS bookworm
15:24 moritzm: installing Exim security updates
15:23 dancy@deploy1003: Installation of scap version "4.144.2" completed for 2 hosts
15:22 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2295.codfw.wmnet with OS bookworm
15:22 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2294.codfw.wmnet with OS bookworm
15:22 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2293.codfw.wmnet with OS bookworm
15:21 dancy@deploy1003: Installing scap version "4.144.2" for 2 host(s)
15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2293.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2294.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2293.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2295.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2294.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2295.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2290.codfw.wmnet with OS bookworm
15:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2278.codfw.wmnet with OS bookworm
15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2281.codfw.wmnet with OS bookworm
15:16 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: host reimage
15:11 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2003.codfw.wmnet with reason: host reimage
14:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:47 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2003.codfw.wmnet with OS bookworm
14:46 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:46 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti-test2003.codfw.wmnet
14:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
14:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2281.codfw.wmnet with reason: host reimage
14:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2290.codfw.wmnet with reason: host reimage
14:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
14:34 btullis@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:33 btullis@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2278.codfw.wmnet with reason: host reimage
14:30 hnowlan@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) pool all active/active services in codfw: Datacentre switchover repool - T385155
14:27 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2281.codfw.wmnet with reason: host reimage
14:27 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2290.codfw.wmnet with reason: host reimage
14:27 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2278.codfw.wmnet with reason: host reimage
14:15 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2290.codfw.wmnet with OS bookworm
14:15 daphnesmit@deploy1003: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:15 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2281.codfw.wmnet with OS bookworm
14:15 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2278.codfw.wmnet with OS bookworm
14:14 daphnesmit@deploy1003: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:14 daphnesmit@deploy1003: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:13 daphnesmit@deploy1003: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2278.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2290.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:13 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2278.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2281.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:12 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2290.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:12 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2281.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:10 daphnesmit@deploy1003: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:09 daphnesmit@deploy1003: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:08 hnowlan@cumin1002: START - Cookbook sre.discovery.datacenter pool all active/active services in codfw: Datacentre switchover repool - T385155
14:08 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti-test2003.codfw.wmnet
14:06 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site codfw [reason: Datacentre switchover repool, T385155]
14:06 hnowlan@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site codfw [reason: Datacentre switchover repool, T385155]
14:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
14:02 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
13:47 btullis@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM dse-k8s-ctrl1001.eqiad.wmnet
13:42 btullis@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM dse-k8s-ctrl1001.eqiad.wmnet
13:39 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to 17.8
13:38 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
13:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2003.codfw.wmnet
13:36 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 17.8
13:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2003.codfw.wmnet
13:34 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2002.codfw.wmnet
13:34 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
13:30 tgr_: UTC afternoon deploys done
13:27 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade Replica to GitLab 17.8
13:26 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.8
13:25 tgr@deploy1003: Finished scap sync-world: Backport for ext-EventStreamConfig: Reduce product_metrics.web_base data collection, Enable SUL3 login for 50% of group 2 users (T384219) (duration: 14m 33s)
13:18 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti-test2002.codfw.wmnet to cluster codfw_test and group A-test
13:18 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti-test2002.codfw.wmnet to cluster codfw_test and group A-test
13:18 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade Replica to GitLab 17.8
13:18 tgr@deploy1003: phuedx, tgr: Continuing with sync
13:18 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti-test2002.codfw.wmnet to cluster codfw_test and group A-test
13:18 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti-test2002.codfw.wmnet to cluster codfw_test and group A-test
13:16 tgr@deploy1003: phuedx, tgr: Backport for ext-EventStreamConfig: Reduce product_metrics.web_base data collection, Enable SUL3 login for 50% of group 2 users (T384219) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
13:11 tgr@deploy1003: Started scap sync-world: Backport for ext-EventStreamConfig: Reduce product_metrics.web_base data collection, Enable SUL3 login for 50% of group 2 users (T384219)
13:07 tgr_: running sendBulkEmail.php as per T389064#10676651
13:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
13:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test2002.codfw.wmnet with OS bookworm
12:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: host reimage
12:41 btullis@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM dse-k8s-ctrl1002.eqiad.wmnet
12:39 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2002.codfw.wmnet with reason: host reimage
12:37 kartik@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
12:36 btullis@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM dse-k8s-ctrl1002.eqiad.wmnet
12:28 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
12:25 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
12:24 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
12:21 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2002.codfw.wmnet with OS bookworm
12:19 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti-test2002.codfw.wmnet']
12:19 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
12:10 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
12:07 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
11:56 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
11:50 claime: Deployment done - T318285
11:50 claime: Cache purged for https://www.wikipedia.org and https://www.wikipedia.org/?search - T318285
11:45 claime: New httpbb tests on cumin1002 green - T318285
11:44 filippo@cumin1002: conftool action : set/weight=10; selector: name=prometheus1008.eqiad.wmnet
11:44 filippo@cumin1002: conftool action : set/weight=10; selector: name=prometheus1007.eqiad.wmnet
11:42 claime: Running new httpbb tests on cumin1002 - T318285
11:41 cgoubert@deploy1003: Finished scap sync-world: T318285 - 1123622 - www.wikipedia.org: fix search URL parameter (duration: 20m 34s)
11:39 mvolz@deploy1003: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
11:39 claime: Running puppet on cumin1002 to deploy new httpbb tests - T318285
11:38 mvolz@deploy1003: helmfile [eqiad] START helmfile.d/services/citoid: apply
11:38 mvolz@deploy1003: helmfile [codfw] DONE helmfile.d/services/citoid: apply
11:36 mvolz@deploy1003: helmfile [codfw] START helmfile.d/services/citoid: apply
11:35 mvolz@deploy1003: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:35 cgoubert@deploy1003: cgoubert: Continuing with sync
11:35 mvolz@deploy1003: helmfile [staging] START helmfile.d/services/citoid: apply
11:26 claime: Running httpbb tests on deploy1003 before deploying apache change, should pass - T318285
11:26 cgoubert@deploy1003: cgoubert: T318285 - 1123622 - www.wikipedia.org: fix search URL parameter synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:25 cgoubert@deploy1003: helmfile [staging-codfw] START helmfile.d/services/mw-debug: apply
11:24 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
11:23 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti-test2002.codfw.wmnet']
11:23 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti-test2002.codfw.wmnet with OS bookworm
11:22 cgoubert@deploy1003: Started scap sync-world: T318285 - 1123622 - www.wikipedia.org: fix search URL parameter
11:19 claime: Running httpbb tests on deploy1003 before deploying apache change, should fail - T318285
11:16 joal@deploy1003: Finished deploy [analytics/refinery@2364d83] (hadoop-test): Analytics webrequest_frontend update TEST [analytics/refinery@2364d83c] (duration: 00m 47s)
11:16 joal@deploy1003: Started deploy [analytics/refinery@2364d83] (hadoop-test): Analytics webrequest_frontend update TEST [analytics/refinery@2364d83c]
11:14 claime: Running puppet on deploy1003 - T318285
11:14 claime: Enabling puppet on deploy1003 - T318285
11:11 claime: Disabling puppet on deploy1003 - T318285
11:09 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
11:09 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2002.codfw.wmnet with OS bookworm
11:07 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
11:05 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 45% (T360589) (duration: 14m 23s)
10:58 ladsgroup@deploy1003: ladsgroup: Continuing with sync
10:56 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 45% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:55 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
10:51 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 45% (T360589)
10:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
10:48 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest2001.codfw.wmnet with OS bookworm
10:48 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.22 refs T386217
10:33 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
10:29 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest2001.codfw.wmnet with reason: host reimage
10:25 joal@deploy1003: Finished deploy [analytics/refinery@2364d83] (thin): Analytics webrequest_frontend update THIN [analytics/refinery@2364d83c] (duration: 00m 59s)
10:24 joal@deploy1003: Started deploy [analytics/refinery@2364d83] (thin): Analytics webrequest_frontend update THIN [analytics/refinery@2364d83c]
10:23 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
10:22 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2002.codfw.wmnet
10:22 joal@deploy1003: Finished deploy [analytics/refinery@2364d83]: Analytics webrequest_frontend update [analytics/refinery@2364d83c] (duration: 02m 01s)
10:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
10:22 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2002.codfw.wmnet
10:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2002.codfw.wmnet
10:21 klausman@deploy1003: helmfile [staging] DONE helmfile.d/services/changeprop: apply
10:21 klausman@deploy1003: helmfile [staging] START helmfile.d/services/changeprop: apply
10:20 joal@deploy1003: Started deploy [analytics/refinery@2364d83]: Analytics webrequest_frontend update [analytics/refinery@2364d83c]
10:18 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2001.codfw.wmnet with OS bookworm
10:14 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti-test2001.codfw.wmnet to cluster codfw_test and group A-test
10:13 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti-test2001.codfw.wmnet to cluster codfw_test and group A-test
10:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
10:05 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
09:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti-test2001.codfw.wmnet with OS bookworm
09:44 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db2181.codfw.wmnet onto db2242.codfw.wmnet
09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti-test2001.codfw.wmnet with reason: host reimage
09:32 marostegui@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2181.codfw.wmnet onto db2241.codfw.wmnet
09:30 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti-test2001.codfw.wmnet with reason: host reimage
09:22 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.22 refs T386217
09:07 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2001.codfw.wmnet with OS bookworm
08:55 kart_: Deployed: Add all language codes to SectionTranslationTargetLanguages (T389920)
08:55 kartik@deploy1003: Finished scap sync-world: Backport for Add all language codes to SectionTranslationTargetLanguages (T387821) (duration: 16m 28s)
08:47 moritzm: installing dnsmasq security updates
08:47 volans: uploaded python3-wmflib_1.3.1 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia
08:47 kartik@deploy1003: kartik, ngkountas: Continuing with sync
08:45 kartik@deploy1003: kartik, ngkountas: Backport for Add all language codes to SectionTranslationTargetLanguages (T387821) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:38 kartik@deploy1003: Started scap sync-world: Backport for Add all language codes to SectionTranslationTargetLanguages (T387821)
08:30 zoe@deploy1003: Finished scap sync-world: Backport for Archive user talk pages even if the userpage doesn't exist (T380911) (duration: 14m 10s)
08:23 zoe@deploy1003: zoe: Continuing with sync
08:23 zoe@deploy1003: zoe: Backport for Archive user talk pages even if the userpage doesn't exist (T380911) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:16 zoe@deploy1003: Started scap sync-world: Backport for Archive user talk pages even if the userpage doesn't exist (T380911)
07:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74427 and previous config saved to /var/cache/conftool/dbconfig/20250326-072033-root.json
07:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74426 and previous config saved to /var/cache/conftool/dbconfig/20250326-070527-root.json
07:02 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db2181.codfw.wmnet onto db2241.codfw.wmnet
06:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2181 T381475', diff saved to https://phabricator.wikimedia.org/P74423 and previous config saved to /var/cache/conftool/dbconfig/20250326-065037-marostegui.json
06:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74422 and previous config saved to /var/cache/conftool/dbconfig/20250326-065022-root.json
06:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74421 and previous config saved to /var/cache/conftool/dbconfig/20250326-063517-root.json
06:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74420 and previous config saved to /var/cache/conftool/dbconfig/20250326-062011-root.json
06:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2220.codfw.wmnet with reason: Lagging
06:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2220', diff saved to https://phabricator.wikimedia.org/P74419 and previous config saved to /var/cache/conftool/dbconfig/20250326-061320-marostegui.json
05:07 kartik@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
02:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2286.codfw.wmnet with OS bookworm
02:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2289.codfw.wmnet with OS bookworm
02:11 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2288.codfw.wmnet with OS bookworm
02:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2287.codfw.wmnet with OS bookworm
02:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:52 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2289.codfw.wmnet with reason: host reimage
01:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2286.codfw.wmnet with reason: host reimage
01:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2288.codfw.wmnet with reason: host reimage
01:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2287.codfw.wmnet with reason: host reimage
01:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2289.codfw.wmnet with reason: host reimage
01:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2286.codfw.wmnet with reason: host reimage
01:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2288.codfw.wmnet with reason: host reimage
01:35 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2287.codfw.wmnet with reason: host reimage
01:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2289.codfw.wmnet with OS bookworm
01:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2288.codfw.wmnet with OS bookworm
01:24 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2287.codfw.wmnet with OS bookworm
01:23 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2286.codfw.wmnet with OS bookworm
01:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2289.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2288.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2287.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2289.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2286.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2288.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2287.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2286.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2279.codfw.wmnet with OS bookworm
01:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2282.codfw.wmnet with OS bookworm
01:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2283.codfw.wmnet with OS bookworm
01:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:11 zabe: zabe@mwmaint1002:~$ cat group1.dblist | xargs -I{} bash -c "echo {}; mwscript extensions/AbuseFilter/maintenance/MigrateESRefToAflTable.php {} --deletedump /home/zabe/afl_text_table_deletedump/{} --dump /home/zabe/afl_text_table_dump/{} --sleep 0.3" # T381599
01:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2280.codfw.wmnet with OS bookworm
01:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2279.codfw.wmnet with reason: host reimage
01:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2282.codfw.wmnet with reason: host reimage
00:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2283.codfw.wmnet with reason: host reimage
00:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2280.codfw.wmnet with reason: host reimage
00:52 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2283.codfw.wmnet with reason: host reimage
00:52 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2279.codfw.wmnet with reason: host reimage
00:52 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2282.codfw.wmnet with reason: host reimage
00:52 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2280.codfw.wmnet with reason: host reimage
00:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2283.codfw.wmnet with OS bookworm
00:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2282.codfw.wmnet with OS bookworm
00:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2280.codfw.wmnet with OS bookworm
00:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2279.codfw.wmnet with OS bookworm
00:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2283.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2282.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2283.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2280.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2282.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2280.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2279.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2279.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2285.codfw.wmnet with OS bookworm
00:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
00:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
00:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2284.codfw.wmnet with OS bookworm
00:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
00:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
00:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2292.codfw.wmnet with OS bookworm
00:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
00:24 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
00:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2277.codfw.wmnet with OS bookworm
00:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
00:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
00:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2285.codfw.wmnet with reason: host reimage
00:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2284.codfw.wmnet with reason: host reimage
00:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2292.codfw.wmnet with reason: host reimage
00:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2277.codfw.wmnet with reason: host reimage
00:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2292.codfw.wmnet with reason: host reimage
00:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2285.codfw.wmnet with reason: host reimage
00:03 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2284.codfw.wmnet with reason: host reimage
00:02 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2277.codfw.wmnet with reason: host reimage

2025-03-25

23:52 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2292.codfw.wmnet with OS bookworm
23:52 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2285.codfw.wmnet with OS bookworm
23:51 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2284.codfw.wmnet with OS bookworm
23:51 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2278.codfw.wmnet with OS bookworm
23:51 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2277.codfw.wmnet with OS bookworm
23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2276.codfw.wmnet with OS bookworm
23:47 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:46 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2272.codfw.wmnet with OS bookworm
23:41 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2275.codfw.wmnet with OS bookworm
23:37 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2274.codfw.wmnet with OS bookworm
23:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2276.codfw.wmnet with reason: host reimage
23:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2273.codfw.wmnet with OS bookworm
23:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2272.codfw.wmnet with reason: host reimage
23:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2275.codfw.wmnet with reason: host reimage
23:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2274.codfw.wmnet with reason: host reimage
23:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2273.codfw.wmnet with reason: host reimage
23:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2276.codfw.wmnet with reason: host reimage
23:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2275.codfw.wmnet with reason: host reimage
23:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2274.codfw.wmnet with reason: host reimage
23:14 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2272.codfw.wmnet with reason: host reimage
23:13 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2273.codfw.wmnet with reason: host reimage
23:04 eileen: civicrm upgraded from fba4c3d6 to 73533b73
23:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2276.codfw.wmnet with OS bookworm
23:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2275.codfw.wmnet with OS bookworm
23:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2274.codfw.wmnet with OS bookworm
23:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2273.codfw.wmnet with OS bookworm
23:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2272.codfw.wmnet with OS bookworm
22:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2302.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2301.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2300.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2299.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2302.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2301.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:44 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2300.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:44 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2299.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2302.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2301.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2300.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2299.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2298.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2302.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2301.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2300.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2299.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2296.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2295.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2297.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2294.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:27 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp5024.eqsin.wmnet} and A:cp
22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2298.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2297.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2296.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2295.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2294.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:20 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp5024.eqsin.wmnet} and A:cp
22:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2298.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:18 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp3066.esams.wmnet
22:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2297.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2296.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2294.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2295.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2298.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2297.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2296.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2295.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:07 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2294.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:06 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2302
22:06 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2302
22:06 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2301
22:06 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2301
22:06 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2300
22:06 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2300
22:06 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2300 to codfw - jhancock@cumin2002"
22:05 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2300 to codfw - jhancock@cumin2002"
22:01 jhancock@cumin2002: START - Cookbook sre.dns.netbox
21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2299
21:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2299
21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2298
21:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2298
21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2297
21:57 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2297
21:57 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2296
21:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2296
21:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2295
21:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2295
21:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2294
21:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2294
21:54 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2294 to codfw - jhancock@cumin2002"
21:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2294 to codfw - jhancock@cumin2002"
21:51 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp5023.eqsin.wmnet} and A:cp
21:51 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp5025.eqsin.wmnet} and A:cp
21:44 jhancock@cumin2002: START - Cookbook sre.dns.netbox
21:44 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp5025.eqsin.wmnet} and A:cp
21:44 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp5023.eqsin.wmnet} and A:cp
21:40 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp5026.eqsin.wmnet} and A:cp
21:38 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp5022.eqsin.wmnet} and A:cp
21:32 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp5026.eqsin.wmnet} and A:cp
21:32 tgr_: UTC late deploys done
21:31 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp5022.eqsin.wmnet} and A:cp
21:29 tgr@deploy1003: Finished scap sync-world: Backport for Drop unused $wgCampaignEventsSeparateOngoingEvents (T386428), Enable SUL3 login for 10% of group 2 users (T384219), knwikisource, tcywikisource: add translate namespace (T388955) (duration: 17m 04s)
21:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2290.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2293.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2289.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2292.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2293.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2292.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2290.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2289.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:21 tgr@deploy1003: daimona, anzx, tgr: Continuing with sync
21:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2293.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2292.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2290.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2289.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:18 tgr@deploy1003: daimona, anzx, tgr: Backport for Drop unused $wgCampaignEventsSeparateOngoingEvents (T386428), Enable SUL3 login for 10% of group 2 users (T384219), knwikisource, tcywikisource: add translate namespace (T388955) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:12 tgr@deploy1003: Started scap sync-world: Backport for Drop unused $wgCampaignEventsSeparateOngoingEvents (T386428), Enable SUL3 login for 10% of group 2 users (T384219), knwikisource, tcywikisource: add translate namespace (T388955)
21:10 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2293.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:10 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2292.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:10 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2290.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:10 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2289.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:09 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2293
21:09 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2293
21:09 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2289
21:08 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2289
21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2292
21:08 kemayo@deploy1003: Finished scap sync-world: Backport for Enable VisualEditor EditCheck multi-check a/b test on remaining wikis (T384372) (duration: 15m 23s)
21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2290
21:08 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2292
21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2289
21:08 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2290
21:08 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2289
21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:08 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2289 to codfw - jhancock@cumin2002"
21:08 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2289 to codfw - jhancock@cumin2002"
21:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2288.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2287.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2286.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:03 jhancock@cumin2002: START - Cookbook sre.dns.netbox
21:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2285.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:01 kemayo@deploy1003: kemayo: Continuing with sync
20:59 kemayo@deploy1003: kemayo: Backport for Enable VisualEditor EditCheck multi-check a/b test on remaining wikis (T384372) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2288.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2287.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2286.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2285.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2288.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2287.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2286.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2285.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:53 kemayo@deploy1003: Started scap sync-world: Backport for Enable VisualEditor EditCheck multi-check a/b test on remaining wikis (T384372)
20:48 ryankemper@deploy1003: Finished scap sync-world: Backport for wdqs categories: switch to internal-main (T375520 T385896 T337013) (duration: 21m 40s)
20:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2288.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2287.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:44 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2286.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:44 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2285.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:43 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2288
20:43 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2287
20:43 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2286
20:43 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2285
20:43 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2288
20:43 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2287
20:43 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2286
20:43 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2285
20:42 gmodena@deploy1003: Finished deploy [airflow-dags/search@af7e28f]: Deploying mjolnir 2.6.0 (duration: 01m 00s)
20:41 gmodena@deploy1003: Started deploy [airflow-dags/search@af7e28f]: Deploying mjolnir 2.6.0
20:40 ryankemper@deploy1003: ryankemper: Continuing with sync
20:40 ryankemper: T385896 Got successful deepcat search of `deepcat:"musicals"` on `en.wikipedia.org` with `X-Wikimedia-Debug:backend=k8s-mwdebug`; rolling out change fully now
20:40 tgr_: running sendBulkEmail.php as per T389064#10676087
20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2285-88 to codfw - jhancock@cumin2002"
20:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2285-88 to codfw - jhancock@cumin2002"
20:35 jhancock@cumin2002: START - Cookbook sre.dns.netbox
20:33 ryankemper@deploy1003: ryankemper: Backport for wdqs categories: switch to internal-main (T375520 T385896 T337013) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2282.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2283.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2281.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2284.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:26 ryankemper@deploy1003: Started scap sync-world: Backport for wdqs categories: switch to internal-main (T375520 T385896 T337013)
20:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2284.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2283.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2282.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2281.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2284.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2283.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2282.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2281.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:10 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2284.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:10 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2283.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:10 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2282.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:10 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2281.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:09 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2284
20:09 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2282
20:09 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2281
20:09 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2283
20:09 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2282
20:09 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2281
20:09 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2284
20:09 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2283
20:07 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:07 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2281 to codfw - jhancock@cumin2002"
20:07 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2281 to codfw - jhancock@cumin2002"
19:55 jhancock@cumin2002: START - Cookbook sre.dns.netbox
19:53 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp5021.eqsin.wmnet} and A:cp
19:53 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp5027.eqsin.wmnet} and A:cp
19:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2276.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2276.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2273.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:46 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp5027.eqsin.wmnet} and A:cp
19:46 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp5021.eqsin.wmnet} and A:cp
19:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2272.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2274.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2273.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2275.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2274.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2274.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2273.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2274.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2275.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2273.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2272.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2278.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2280.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2279.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2277.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2280.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2279.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2278.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2277.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2278.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2279.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2280.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:29 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2277.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2280.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2279.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2278.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2277.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:17 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2280
19:17 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2279
19:17 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2278
19:17 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2277
19:17 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2280
19:17 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2279
19:16 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2278
19:16 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2277
19:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2278-80 to codfw - jhancock@cumin2002"
19:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2278-80 to codfw - jhancock@cumin2002"
19:13 dancy@deploy1003: Installation of scap version "4.144.1" completed for 2 hosts
19:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox
19:11 dancy@deploy1003: Installing scap version "4.144.1" for 2 host(s)
19:11 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:11 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2277 to codfw - jhancock@cumin2002"
19:10 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2277 to codfw - jhancock@cumin2002"
19:05 jhancock@cumin2002: START - Cookbook sre.dns.netbox
19:00 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp5020.eqsin.wmnet} and A:cp
18:59 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp5028.eqsin.wmnet} and A:cp
18:53 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp5028.eqsin.wmnet} and A:cp
18:53 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp5020.eqsin.wmnet} and A:cp
18:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2274.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2273.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2272.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2273.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:47 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2273.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2274.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2273.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2272.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:42 sukhe@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp1100.eqiad.wmnet [reason: testing varnish stuff]
18:39 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=cp1100.eqiad.wmnet [reason: testing varnish stuff]
18:31 dancy@deploy1003: Installation of scap version "4.144.0" completed for 2 hosts
18:29 dancy@deploy1003: Installing scap version "4.144.0" for 2 host(s)
18:25 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp3066.esams.wmnet
18:25 brett: Depooling cp3066 for varnishkafka testing (T389978)
18:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2269.codfw.wmnet with OS bookworm
18:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:04 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2271.codfw.wmnet with OS bookworm
18:04 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
18:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2270.codfw.wmnet with OS bookworm
17:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:54 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp5029.eqsin.wmnet} and A:cp
17:54 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp5019.eqsin.wmnet} and A:cp
17:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2269.codfw.wmnet with reason: host reimage
17:47 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp5029.eqsin.wmnet} and A:cp
17:47 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp5019.eqsin.wmnet} and A:cp
17:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2271.codfw.wmnet with reason: host reimage
17:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2270.codfw.wmnet with reason: host reimage
17:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2269.codfw.wmnet with reason: host reimage
17:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2271.codfw.wmnet with reason: host reimage
17:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2270.codfw.wmnet with reason: host reimage
17:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2271.codfw.wmnet with OS bookworm
17:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2270.codfw.wmnet with OS bookworm
17:28 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2269.codfw.wmnet with OS bookworm
17:27 klausman@cumin2002: conftool action : set/weight=1; selector: name=ml-serve2009.codfw.wmnet
17:27 klausman@cumin2002: conftool action : set/weight=1; selector: name=ml-serve2010.codfw.wmnet
17:27 klausman@cumin2002: conftool action : set/weight=1; selector: name=ml-serve2011.codfw.wmnet
17:26 klausman@cumin2002: conftool action : set/pooled=yes; selector: name=ml-serve2011.codfw.wmnet
17:26 klausman@cumin2002: conftool action : set/pooled=yes; selector: name=ml-serve2010.codfw.wmnet
17:26 klausman@cumin2002: conftool action : set/pooled=yes; selector: name=ml-serve2009.codfw.wmnet
17:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2271.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2270.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:20 klausman@cumin2002: conftool action : set/pooled=yes; selector: name=ml-serve2007.codfw.wmnet
17:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2271.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:18 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2271.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:16 klausman@cumin2002: conftool action : set/pooled=yes; selector: name=ml-serve2007.codfw.wmnet
17:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2270.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2271.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2271.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:14 sukhe: restart pybal on lvs2013
17:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2270.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:13 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp5018.eqsin.wmnet} and A:cp
17:12 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp5030.eqsin.wmnet} and A:cp
17:11 sukhe: sudo systemctl restart pybal on lvs2014
17:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2269.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:08 swfrench@deploy1003: Finished scap sync-world: Helmfile-only deployment for next and migration release cleanups - T383845 (duration: 02m 45s)
17:07 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2008.codfw.wmnet with OS bookworm
17:06 swfrench@deploy1003: Started scap sync-world: Helmfile-only deployment for next and migration release cleanups - T383845
17:05 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp5018.eqsin.wmnet} and A:cp
17:05 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp5030.eqsin.wmnet} and A:cp
17:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2271.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2271.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2271.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2270.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:04 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2269.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:03 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2271
17:03 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2271
17:03 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2270
17:03 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2271
17:03 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2269
17:03 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2271
17:03 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2270
17:03 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2269
17:02 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:02 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2270 to codfw - jhancock@cumin2002"
17:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2270 to codfw - jhancock@cumin2002"
16:54 jhancock@cumin2002: START - Cookbook sre.dns.netbox
16:51 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2008.codfw.wmnet with reason: host reimage
16:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2268.codfw.wmnet with OS bookworm
16:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:48 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2008.codfw.wmnet with reason: host reimage
16:45 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2266.codfw.wmnet with OS bookworm
16:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:39 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2007.codfw.wmnet with OS bookworm
16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2267.codfw.wmnet with OS bookworm
16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:36 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp5017.eqsin.wmnet} and A:cp
16:35 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp5031.eqsin.wmnet} and A:cp
16:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
16:34 brennen@deploy1003: Finished deploy [phabricator/deployment@f01e475]: deploy phab1004 for T389953 (duration: 00m 36s)
16:34 brennen@deploy1003: Started deploy [phabricator/deployment@f01e475]: deploy phab1004 for T389953
16:33 brennen@deploy1003: Finished deploy [phabricator/deployment@f01e475]: deploy phab2002 for T389953 (duration: 00m 39s)
16:33 brennen@deploy1003: Started deploy [phabricator/deployment@f01e475]: deploy phab2002 for T389953
16:33 jelto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator deploy
16:32 jelto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator deploy
16:30 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ml-serve2008
16:30 klausman@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve2008
16:30 klausman@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve2008
16:30 klausman@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ml-serve2008.codfw.wmnet 175.48.192.10.in-addr.arpa 5.7.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:30 klausman@cumin2002: START - Cookbook sre.dns.wipe-cache ml-serve2008.codfw.wmnet 175.48.192.10.in-addr.arpa 5.7.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:30 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:30 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-serve2008 - klausman@cumin2002"
16:30 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-serve2008 - klausman@cumin2002"
16:28 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp5031.eqsin.wmnet} and A:cp
16:28 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp5017.eqsin.wmnet} and A:cp
16:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2268.codfw.wmnet with reason: host reimage
16:27 cmooney@dns2005: END - running authdns-update
16:25 kamila@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:25 cmooney@dns2005: START - running authdns-update
16:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2266.codfw.wmnet with reason: host reimage
16:22 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2007.codfw.wmnet with reason: host reimage
16:20 kamila@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
16:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2267.codfw.wmnet with reason: host reimage
16:18 kamila@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/benthos-mw-accesslog-metrics: apply
16:18 kamila@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/benthos-mw-accesslog-metrics: apply
16:18 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2007.codfw.wmnet with reason: host reimage
16:18 robh: updating ssd firmware on cp4047 via T387238
16:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2268.codfw.wmnet with reason: host reimage
16:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2266.codfw.wmnet with reason: host reimage
16:17 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2267.codfw.wmnet with reason: host reimage
16:16 kamila@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/services/benthos-mw-accesslog-metrics: apply
16:15 kamila@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/services/benthos-mw-accesslog-metrics: apply
16:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2268.codfw.wmnet with OS bookworm
16:06 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2267.codfw.wmnet with OS bookworm
16:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2266.codfw.wmnet with OS bookworm
16:03 klausman@cumin2002: START - Cookbook sre.dns.netbox
16:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2267.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2268.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:02 klausman@cumin2002: START - Cookbook sre.hosts.move-vlan for host ml-serve2008
16:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2266.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:02 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2008.codfw.wmnet with OS bookworm
16:00 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve2007.codfw.wmnet with OS bookworm
15:59 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve2007.codfw.wmnet with OS bookworm
15:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2267.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2267.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2268.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2267.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2266.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2264.codfw.wmnet with OS bookworm
15:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2265.codfw.wmnet with OS bookworm
15:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:49 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:48 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=1; selector: name=ml-serve2003.codfw.wmnet,dc=codfw,cluster=maps,service=inference
15:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2263.codfw.wmnet with OS bookworm
15:48 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:46 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:40 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1008.eqiad.wmnet with OS bullseye
15:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2264.codfw.wmnet with reason: host reimage
15:36 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) frbast2002.mgmt.frack.codfw.wmnet on all recursors
15:36 pt1979@cumin2002: START - Cookbook sre.dns.wipe-cache frbast2002.mgmt.frack.codfw.wmnet on all recursors
15:35 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti-test2001.codfw.wmnet with OS bookworm
15:34 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2265.codfw.wmnet with reason: host reimage
15:32 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:32 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update frack node to use new mgmt subnet 10.195.1.1/25 - pt1979@cumin2002"
15:32 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update frack node to use new mgmt subnet 10.195.1.1/25 - pt1979@cumin2002"
15:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2263.codfw.wmnet with reason: host reimage
15:30 klausman@cumin2002: conftool action : set/pooled=yes; selector: name=ml-serve2002.codfw.wmnet
15:29 cmooney@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 396032
15:29 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2265.codfw.wmnet with reason: host reimage
15:29 cmooney@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 396032
15:29 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2264.codfw.wmnet with reason: host reimage
15:28 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2263.codfw.wmnet with reason: host reimage
15:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:28 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2001.codfw.wmnet with OS bookworm
15:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti-test2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
15:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2265.codfw.wmnet with OS bookworm
15:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2264.codfw.wmnet with OS bookworm
15:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2263.codfw.wmnet with OS bookworm
15:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ml-serve2007
15:16 elukey@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve2007
15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2265.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:16 elukey@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve2007
15:16 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ml-serve2007.codfw.wmnet 78.32.192.10.in-addr.arpa 8.7.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:16 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache ml-serve2007.codfw.wmnet 78.32.192.10.in-addr.arpa 8.7.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:16 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-serve2007 - elukey@cumin1002"
15:16 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-serve2007 - elukey@cumin1002"
15:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2264.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:11 elukey@cumin1002: START - Cookbook sre.dns.netbox
15:09 elukey@cumin1002: START - Cookbook sre.hosts.move-vlan for host ml-serve2007
15:09 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve2007.codfw.wmnet with OS bookworm
15:04 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
15:04 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host atlas5001.wikimedia.org
15:04 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas5001.wikimedia.org - ayounsi@cumin1002"
15:04 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas5001.wikimedia.org - ayounsi@cumin1002"
15:04 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) atlas5001.wikimedia.org on all recursors
15:04 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache atlas5001.wikimedia.org on all recursors
15:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas5001.wikimedia.org - ayounsi@cumin1002"
15:03 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas5001.wikimedia.org - ayounsi@cumin1002"
15:02 Kemayo: Impromptu Editing backport window finished
15:01 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
15:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2263.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:01 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
15:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2264.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:01 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2264.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2265.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2264.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2263.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:01 kemayo@deploy1003: Finished scap sync-world: Backport for Edit check: add editcheck-references-shown to the allowed tags list (T373949), Edit check: don't close the sidebar on context change on desktop (T389906) (duration: 17m 30s)
15:00 godog: finished moving k8s instances to prometheus2008 - T383232
14:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2262.codfw.wmnet with OS bookworm
14:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:58 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
14:58 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host atlas5001.wikimedia.org
14:56 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['ganeti-test2001.codfw.wmnet']
14:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2259.codfw.wmnet with OS bookworm
14:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2272.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:53 kemayo@deploy1003: kemayo: Continuing with sync
14:52 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2272.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:50 kemayo@deploy1003: kemayo: Backport for Edit check: add editcheck-references-shown to the allowed tags list (T373949), Edit check: don't close the sidebar on context change on desktop (T389906) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2255.codfw.wmnet with OS bookworm
14:49 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:48 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2181 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74414 and previous config saved to /var/cache/conftool/dbconfig/20250325-144447-root.json
14:43 kemayo@deploy1003: Started scap sync-world: Backport for Edit check: add editcheck-references-shown to the allowed tags list (T373949), Edit check: don't close the sidebar on context change on desktop (T389906)
14:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2262.codfw.wmnet with reason: host reimage
14:40 fabfur: enable puppet on A:cp (T384227)
14:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2259.codfw.wmnet with reason: host reimage
14:39 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2262.codfw.wmnet with reason: host reimage
14:39 godog: move k8s instances from prometheus2006 to prometheus2008 - T383232
14:37 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2259.codfw.wmnet with reason: host reimage
14:35 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2220 gradually with 4 steps - Pooling in after OS upgrade
14:35 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['ganeti-test2001.codfw.wmnet']
14:33 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ganeti-test2001.codfw.wmnet with OS bookworm
14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2255.codfw.wmnet with reason: host reimage
14:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2255.codfw.wmnet with reason: host reimage
14:29 marostegui@cumin1002: dbctl commit (dc=all): 'db2181 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74412 and previous config saved to /var/cache/conftool/dbconfig/20250325-142942-root.json
14:29 Kemayo: Impromptu Editing backport window started
14:27 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2262.codfw.wmnet with OS bookworm
14:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2262.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2262.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:25 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2259.codfw.wmnet with OS bookworm
14:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2259.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2259.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:23 phuedx: UTC afternoon backport window finished
14:22 phuedx@deploy1003: Finished scap sync-world: Backport for Fully silence TRX profiler after autocreation (T388165) (duration: 16m 22s)
14:20 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1008.eqiad.wmnet with OS bullseye
14:20 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008.eqiad.wmnet']
14:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2255.codfw.wmnet with OS bookworm
14:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2255.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:15 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2255.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2181 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74410 and previous config saved to /var/cache/conftool/dbconfig/20250325-141437-root.json
14:13 phuedx@deploy1003: phuedx, matmarex: Continuing with sync
14:12 phuedx@deploy1003: phuedx, matmarex: Backport for Fully silence TRX profiler after autocreation (T388165) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:12 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008.eqiad.wmnet']
14:12 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1008.eqiad.wmnet']
14:09 fabfur: rebooting cp4047 (T384227)
14:06 phuedx@deploy1003: Started scap sync-world: Backport for Fully silence TRX profiler after autocreation (T388165)
14:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2003.wikimedia.org
14:03 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2009.codfw.wmnet with OS bookworm
14:02 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008.eqiad.wmnet']
13:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2181 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74408 and previous config saved to /var/cache/conftool/dbconfig/20250325-135931-root.json
13:59 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host gitlab2003.wikimedia.org
13:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host phab1005.eqiad.wmnet
13:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti-test2001.codfw.wmnet with OS bookworm
13:55 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudelastic1008.eqiad.wmnet']
13:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host phab1005.eqiad.wmnet
13:49 phuedx@deploy1003: Finished scap sync-world: Backport for Restore deprecated aliases for CommentStoreComment and RawMessage (T388725) (duration: 15m 39s)
13:49 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2220 gradually with 4 steps - Pooling in after OS upgrade
13:48 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db2220 gradually with 4 steps - Pooling in after OS upgrade
13:48 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2220 gradually with 4 steps - Pooling in after OS upgrade
13:48 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1008.eqiad.wmnet']
13:47 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply updated master config - bking@cumin2002 - T388150
13:47 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2009.codfw.wmnet with reason: host reimage
13:46 fabfur: disable puppet on A:cp to test https://gerrit.wikimedia.org/r/c/operations/puppet/+/1129223 (T384227)
13:44 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2009.codfw.wmnet with reason: host reimage
13:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2181 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74406 and previous config saved to /var/cache/conftool/dbconfig/20250325-134426-root.json
13:41 phuedx@deploy1003: phuedx, matmarex: Continuing with sync
13:40 phuedx@deploy1003: phuedx, matmarex: Backport for Restore deprecated aliases for CommentStoreComment and RawMessage (T388725) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2002.codfw.wmnet with OS bookworm
13:35 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1008.eqiad.wmnet with OS bullseye
13:33 phuedx@deploy1003: Started scap sync-world: Backport for Restore deprecated aliases for CommentStoreComment and RawMessage (T388725)
13:31 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ml-serve2009
13:31 klausman@cumin2002: START - Cookbook sre.hosts.move-vlan for host ml-serve2009
13:31 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2009.codfw.wmnet with OS bookworm
13:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
13:19 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
13:19 marostegui@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2181.codfw.wmnet onto db2243.codfw.wmnet
13:19 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply updated master config - bking@cumin2002 - T388150
13:18 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply updated master config - bking@cumin2002 - T388150
13:18 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply updated master config - bking@cumin2002 - T388150
13:17 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply updated master config - bking@cumin2002 - T388150
13:17 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: apply updated master config - bking@cumin2002 - T388150
13:15 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2010.codfw.wmnet with OS bookworm
13:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2006.codfw.wmnet with OS bookworm
13:00 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2010.codfw.wmnet with reason: host reimage
13:00 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2002.codfw.wmnet with OS bookworm
12:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2006.codfw.wmnet with reason: host reimage
12:55 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2010.codfw.wmnet with reason: host reimage
12:55 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2006.codfw.wmnet with reason: host reimage
12:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2001.codfw.wmnet with OS bookworm
12:42 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ml-serve2010
12:42 klausman@cumin2002: START - Cookbook sre.hosts.move-vlan for host ml-serve2010
12:42 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2010.codfw.wmnet with OS bookworm
12:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ml-serve2006
12:35 elukey@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve2006
12:34 elukey@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve2006
12:34 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ml-serve2006.codfw.wmnet 115.16.192.10.in-addr.arpa 5.1.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
12:34 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache ml-serve2006.codfw.wmnet 115.16.192.10.in-addr.arpa 5.1.1.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
12:34 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:34 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-serve2006 - elukey@cumin1002"
12:32 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-serve2006 - elukey@cumin1002"
12:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage
12:29 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage
12:28 elukey@cumin1002: START - Cookbook sre.dns.netbox
12:28 elukey@cumin1002: START - Cookbook sre.hosts.move-vlan for host ml-serve2006
12:27 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve2006.codfw.wmnet with OS bookworm
12:26 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 40% (T360589) (duration: 16m 00s)
12:23 cgoubert@deploy1003: helmfile [staging-codfw] DONE helmfile.d/services/mw-debug: apply
12:23 cgoubert@deploy1003: helmfile [staging-codfw] START helmfile.d/services/mw-debug: apply
12:23 cgoubert@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/services/mw-debug: apply
12:21 cgoubert@deploy1003: helmfile [staging-eqiad] START helmfile.d/services/mw-debug: apply
12:19 ladsgroup@deploy1003: ladsgroup: Continuing with sync
12:18 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 40% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:17 marostegui@cumin1002: dbctl commit (dc=all): 'db2196 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74405 and previous config saved to /var/cache/conftool/dbconfig/20250325-121749-root.json
12:15 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1008.eqiad.wmnet with OS bullseye
12:10 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 40% (T360589)
12:10 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2001.codfw.wmnet with OS bookworm
12:02 marostegui@cumin1002: dbctl commit (dc=all): 'db2196 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74404 and previous config saved to /var/cache/conftool/dbconfig/20250325-120244-root.json
11:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74403 and previous config saved to /var/cache/conftool/dbconfig/20250325-115109-root.json
11:48 cgoubert@deploy1003: Finished scap sync-world: 1127882: mediawiki: Change kafka topic for rsyslog - T384335 (duration: 15m 00s)
11:47 marostegui@cumin1002: dbctl commit (dc=all): 'db2196 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74402 and previous config saved to /var/cache/conftool/dbconfig/20250325-114738-root.json
11:41 cgoubert@deploy1003: cgoubert: Continuing with sync
11:38 cgoubert@deploy1003: cgoubert: 1127882: mediawiki: Change kafka topic for rsyslog - T384335 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74401 and previous config saved to /var/cache/conftool/dbconfig/20250325-113604-root.json
11:33 cgoubert@deploy1003: Started scap sync-world: 1127882: mediawiki: Change kafka topic for rsyslog - T384335
11:33 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
11:33 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
11:32 marostegui@cumin1002: dbctl commit (dc=all): 'db2196 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74400 and previous config saved to /var/cache/conftool/dbconfig/20250325-113233-root.json
11:22 cgoubert@deploy1003: Sync cancelled.
11:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74398 and previous config saved to /var/cache/conftool/dbconfig/20250325-112059-root.json
11:17 marostegui@cumin1002: dbctl commit (dc=all): 'db2196 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74397 and previous config saved to /var/cache/conftool/dbconfig/20250325-111727-root.json
11:16 moritzm: installing Python 3.11 security updates
11:11 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2220.codfw.wmnet
11:10 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db2192.codfw.wmnet
11:06 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2220 - Upgrading db2220.codfw.wmnet - fceratto@cumin1002
11:05 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2220 - Upgrading db2220.codfw.wmnet - fceratto@cumin1002
11:05 marostegui@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74393 and previous config saved to /var/cache/conftool/dbconfig/20250325-110554-root.json
11:05 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2220.codfw.wmnet
11:05 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db2220.codfw.wmnet
11:05 fceratto@cumin1002: dbctl commit (dc=all): 'Depool db2220 T389383', diff saved to https://phabricator.wikimedia.org/P74392 and previous config saved to /var/cache/conftool/dbconfig/20250325-110505-fceratto.json
11:03 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.depool (exit_code=97) db2220 - Upgrading db2220.codfw.wmnet - fceratto@cumin1002
11:03 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db2192 slowly with 10 steps - Upgrade of db2192.codfw.wmnet completed - fceratto@cumin1002
11:03 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2220 - Upgrading db2220.codfw.wmnet - fceratto@cumin1002
11:03 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.depool (exit_code=99) db2220 - Upgrading db2220.codfw.wmnet - fceratto@cumin1002
11:02 marostegui@cumin1002: dbctl commit (dc=all): 'db2196 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P74390 and previous config saved to /var/cache/conftool/dbconfig/20250325-110222-root.json
11:02 fceratto@cumin1002: dbctl commit (dc=all): 'Configure db2220 T389383', diff saved to https://phabricator.wikimedia.org/P74389 and previous config saved to /var/cache/conftool/dbconfig/20250325-110217-fceratto.json
10:58 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2220 - Upgrading db2220.codfw.wmnet - fceratto@cumin1002
10:58 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2220.codfw.wmnet
10:52 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db2181.codfw.wmnet onto db2243.codfw.wmnet
10:51 fceratto@cumin1002: dbctl commit (dc=all): 'Promote db2218 to s7 primary T389383', diff saved to https://phabricator.wikimedia.org/P74385 and previous config saved to /var/cache/conftool/dbconfig/20250325-105108-fceratto.json
10:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74384 and previous config saved to /var/cache/conftool/dbconfig/20250325-105049-root.json
10:50 federico3: Starting s7 codfw failover from db2220 to db2218 - T389383
10:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2181.codfw.wmnet with reason: Cloning db2243
10:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2196 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P74382 and previous config saved to /var/cache/conftool/dbconfig/20250325-104630-root.json
10:45 fceratto@cumin1002: dbctl commit (dc=all): 'Remove db2218 from API/vslow/dump T389383', diff saved to https://phabricator.wikimedia.org/P74381 and previous config saved to /var/cache/conftool/dbconfig/20250325-104526-fceratto.json
10:44 fceratto@cumin1002: dbctl commit (dc=all): 'Set db2218 with weight 0 T389383', diff saved to https://phabricator.wikimedia.org/P74380 and previous config saved to /var/cache/conftool/dbconfig/20250325-104452-fceratto.json
10:44 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 T389383
10:39 godog: bounce ircecho on alert1002 - exceptions in journal
10:38 cgoubert@deploy1003: cgoubert: 1127882: mediawiki: Change kafka topic for rsyslog - T384335 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:37 cgoubert@deploy1003: Started scap sync-world: 1127882: mediawiki: Change kafka topic for rsyslog - T384335
10:30 aklapper@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.22 refs T386217
10:30 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2192 slowly with 10 steps - Upgrade of db2192.codfw.wmnet completed - fceratto@cumin1002
10:25 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2192 - Upgrading db2192.codfw.wmnet - fceratto@cumin1002
10:24 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2192 - Upgrading db2192.codfw.wmnet - fceratto@cumin1002
10:24 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2192.codfw.wmnet
10:24 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db2192.codfw.wmnet
10:22 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2192.codfw.wmnet
10:18 marostegui@cumin1002: dbctl commit (dc=all): 'Remove hosts from x2 T387332', diff saved to https://phabricator.wikimedia.org/P74377 and previous config saved to /var/cache/conftool/dbconfig/20250325-101805-marostegui.json
10:17 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db2192.codfw.wmnet
10:17 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2192.codfw.wmnet
10:16 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db2192.codfw.wmnet
10:16 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2192.codfw.wmnet
10:12 fceratto@cumin1002: dbctl commit (dc=all): 'Depool db2192 T389381', diff saved to https://phabricator.wikimedia.org/P74376 and previous config saved to /var/cache/conftool/dbconfig/20250325-101222-fceratto.json
10:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repool ms1 T387332', diff saved to https://phabricator.wikimedia.org/P74375 and previous config saved to /var/cache/conftool/dbconfig/20250325-101101-marostegui.json
10:08 fceratto@cumin1002: dbctl commit (dc=all): 'Promote db2213 to s5 primary T389381', diff saved to https://phabricator.wikimedia.org/P74374 and previous config saved to /var/cache/conftool/dbconfig/20250325-100825-fceratto.json
10:08 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1152.eqiad.wmnet
10:07 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2142.codfw.wmnet
10:07 federico3: Starting s5 codfw failover from db2192 to db2213 - T389381
10:06 aklapper@deploy1003: Finished scap sync-world: testwikis to 1.44.0-wmf.22 refs T386217 (duration: 42m 25s)
10:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2142.codfw.wmnet,db1152.eqiad.wmnet with reason: Maintenance in ms1
10:01 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1152.eqiad.wmnet
10:00 fceratto@cumin1002: dbctl commit (dc=all): 'Remove db2213 from API/vslow/dump T389381', diff saved to https://phabricator.wikimedia.org/P74373 and previous config saved to /var/cache/conftool/dbconfig/20250325-100055-fceratto.json
10:00 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2142.codfw.wmnet
09:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depool ms1 T387332', diff saved to https://phabricator.wikimedia.org/P74372 and previous config saved to /var/cache/conftool/dbconfig/20250325-095817-marostegui.json
09:57 fceratto@cumin1002: dbctl commit (dc=all): 'Set db2213 with weight 0 T389381', diff saved to https://phabricator.wikimedia.org/P74371 and previous config saved to /var/cache/conftool/dbconfig/20250325-095741-fceratto.json
09:57 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s5 T389381
09:45 dcausse: repooling wdqs1013
09:23 aklapper@deploy1003: Started scap sync-world: testwikis to 1.44.0-wmf.22 refs T386217
09:16 fabfur@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cp4047.ulsfo.wmnet with reason: HW errors
09:16 volans: uploaded python3-wmflib_1.3.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia
09:16 hashar@deploy1003: Finished scap sync-world: Backport for [kowikiquote] Change the logo and wordmark (T389631) (duration: 17m 22s)
09:08 hashar@deploy1003: superpes, hashar: Continuing with sync
09:03 hashar@deploy1003: superpes, hashar: Backport for [kowikiquote] Change the logo and wordmark (T389631) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:59 dcausse: repooling wdqs1018
08:58 hashar@deploy1003: Started scap sync-world: Backport for [kowikiquote] Change the logo and wordmark (T389631)
08:56 dcausse: depooling & restarting blazegraph on wdqs1013 (deadlocked)
08:54 kartik@deploy1003: Finished scap sync-world: Backport for AX: Add quick survey for MinT for Wikireaders (T381886) (duration: 20m 59s)
08:47 joal@deploy1003: Finished deploy [airflow-dags/analytics@324a662]: Regular analytics weekly train [airflow-dags/analytics@324a6629] (duration: 00m 30s)
08:47 kartik@deploy1003: abi, kartik: Continuing with sync
08:47 joal@deploy1003: Started deploy [airflow-dags/analytics@324a662]: Regular analytics weekly train [airflow-dags/analytics@324a6629]
08:38 kartik@deploy1003: abi, kartik: Backport for AX: Add quick survey for MinT for Wikireaders (T381886) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:37 joal@deploy1003: Finished deploy [airflow-dags/analytics@001332b]: Regular analytics weekly train [airflow-dags/analytics@001332b5] (duration: 00m 33s)
08:37 joal@deploy1003: Started deploy [airflow-dags/analytics@001332b]: Regular analytics weekly train [airflow-dags/analytics@001332b5]
08:33 kartik@deploy1003: Started scap sync-world: Backport for AX: Add quick survey for MinT for Wikireaders (T381886)
08:30 kartik@deploy1003: Finished scap sync-world: Backport for AX: Disable automatic translation entrypoints before release (T389176) (duration: 25m 13s)
08:23 kartik@deploy1003: kartik, abi: Continuing with sync
08:19 joal@deploy1003: Finished deploy [analytics/refinery@2f09783] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2f097836] (duration: 00m 39s)
08:18 joal@deploy1003: Started deploy [analytics/refinery@2f09783] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@2f097836]
08:18 joal@deploy1003: Finished deploy [analytics/refinery@2f09783] (thin): Regular analytics weekly train THIN [analytics/refinery@2f097836] (duration: 00m 52s)
08:17 joal@deploy1003: Started deploy [analytics/refinery@2f09783] (thin): Regular analytics weekly train THIN [analytics/refinery@2f097836]
08:14 joal@deploy1003: Finished deploy [analytics/refinery@2f09783]: Regular analytics weekly train [analytics/refinery@2f097836] (duration: 02m 35s)
08:11 joal@deploy1003: Started deploy [analytics/refinery@2f09783]: Regular analytics weekly train [analytics/refinery@2f097836]
08:11 kartik@deploy1003: kartik, abi: Backport for AX: Disable automatic translation entrypoints before release (T389176) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:08 moritzm: installing jinja2 security updates
08:05 kartik@deploy1003: Started scap sync-world: Backport for AX: Disable automatic translation entrypoints before release (T389176)
08:05 hashar: Shifted MediaWiki train UTC-0 version window by one hour to avoid conflict with backport window.
07:00 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
07:00 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
06:32 kartik@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
06:32 kartik@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: apply
04:51 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
04:04 mwpresync@deploy1003: Pruned MediaWiki: 1.43.0-wmf.22, 1.43.0-wmf.23, 1.43.0-wmf.24, 1.43.0-wmf.25, 1.43.0-wmf.26, 1.43.0-wmf.27, 1.43.0-wmf.28, 1.44.0-wmf.1, 1.44.0-wmf.2, 1.44.0-wmf.3, 1.44.0-wmf.4, 1.44.0-wmf.5, 1.44.0-wmf.6, 1.44.0-wmf.8, 1.44.0-wmf.11, 1.44.0-wmf.12, 1.44.0-wmf.13, 1.44.0-wmf.14, 1.44.0-wmf.15, 1.44.0-wmf.16, 1.44.0-wmf.17, 1.44.0-wmf.18, 1.44.0-wmf.19 (duration: 04m 48s)
01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2269.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:52 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2269.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2269.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2266.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2268.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2265.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2267.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2269.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2268.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:39 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2266.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:39 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2268.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2266.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2264.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2268.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2267.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2266.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2265.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2263.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2264.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2263.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:36 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2264.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2264.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:36 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2263.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2263.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2269
01:34 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2269
01:34 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2268
01:34 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2268
01:33 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2267
01:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2267
01:33 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2266
01:33 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2265
01:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2266
01:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2265
01:33 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2264
01:33 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2264
01:33 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2263
01:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2263
01:31 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:31 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2263 to codfw - jhancock@cumin2002"
01:31 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2263 to codfw - jhancock@cumin2002"
01:24 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:18 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
01:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2263 to codfw - jhancock@cumin2002"
01:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2263 to codfw - jhancock@cumin2002"
01:14 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2276.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:06 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2275.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2274.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2273.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:03 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2272.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2276.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2275.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2274.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2273.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2273.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2272.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2272.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2275.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2274.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2276.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2262.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2273.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2262.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2262.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2273.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2276.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2275.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2274.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2273.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2272.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:45 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2262.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1119.eqiad.wmnet with OS bullseye
00:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
00:42 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2273
00:42 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
00:42 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2275
00:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2273
00:42 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2276
00:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2275
00:42 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2274
00:42 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2275
00:42 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2272
00:42 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2273
00:42 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2262
00:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2276
00:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2275
00:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2274
00:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2273
00:41 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2272
00:41 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2262
00:41 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:41 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2262 to codfw - jhancock@cumin2002"
00:40 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2262 to codfw - jhancock@cumin2002"
00:35 jhancock@cumin2002: START - Cookbook sre.dns.netbox
00:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1119.eqiad.wmnet with reason: host reimage
00:25 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1119.eqiad.wmnet with reason: host reimage
00:16 sukhe: [correction] restarting varnishkafka-webrequest on cp7009
00:15 sukhe: restarting varnishkafka on cp7009
00:15 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host elastic1119.eqiad.wmnet with OS bullseye
00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2262.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2262.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:01 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2262
00:00 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2262

2025-03-24

23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2260.codfw.wmnet with OS bookworm
23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2261.codfw.wmnet with OS bookworm
23:38 jhancock@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:30 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
23:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2261.codfw.wmnet with reason: host reimage
23:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2260.codfw.wmnet with reason: host reimage
23:14 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1119.eqiad.wmnet with OS bullseye
23:12 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2261.codfw.wmnet with reason: host reimage
23:12 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2260.codfw.wmnet with reason: host reimage
23:11 jhancock@cumin2002: START - Cookbook sre.dns.netbox
23:04 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:01 jhancock@cumin2002: START - Cookbook sre.dns.netbox
23:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2261.codfw.wmnet with OS bookworm
23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2260.codfw.wmnet with OS bookworm
23:00 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2259.codfw.wmnet with OS bookworm
22:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2264.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2263.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2261.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2260.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2262.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2259.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2263.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:47 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2261.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:47 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2263.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2261.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2264.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2263.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2262.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2261.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2260.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2259.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2263.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2261.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2264.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2260.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2262.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2259.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:36 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7009.magru.wmnet} and A:cp
22:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:30 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7009.magru.wmnet} and A:cp
22:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:27 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2263.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2261.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2263.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2261.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2264.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2263.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2262.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2261.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2260.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2259.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
22:24 maryum: Deployed security fix for T358689
22:23 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2264
22:23 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2263
22:23 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2264
22:23 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2263
22:23 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:23 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2263 to codfw - jhancock@cumin2002"
22:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2263 to codfw - jhancock@cumin2002"
22:20 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
22:18 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7008.magru.wmnet} and A:cp
22:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox
22:18 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7010.magru.wmnet} and A:cp
22:15 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2261
22:15 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2261
22:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2260
22:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2261
22:14 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2262
22:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2261
22:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2260
22:14 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2262
22:13 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2260 to codfw - jhancock@cumin2002"
22:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2260 to codfw - jhancock@cumin2002"
22:13 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7010.magru.wmnet} and A:cp
22:13 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7008.magru.wmnet} and A:cp
22:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1119.eqiad.wmnet with OS bullseye
22:06 jhancock@cumin2002: START - Cookbook sre.dns.netbox
22:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host relforge1008.eqiad.wmnet with OS bullseye
22:05 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
22:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host relforge1009.eqiad.wmnet with OS bullseye
22:05 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
22:04 reedy@deploy1003: Synchronized php-1.44.0-wmf.21/extensions/CentralAuth/maintenance/populateLocalAndGlobalIds.php: T303590 (duration: 11m 56s)
21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2259
21:53 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2259
21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:53 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2259 to codfw - jhancock@cumin2002"
21:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2259 to codfw - jhancock@cumin2002"
21:42 jhancock@cumin2002: START - Cookbook sre.dns.netbox
21:32 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7007.magru.wmnet} and A:cp
21:31 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7011.magru.wmnet} and A:cp
21:28 brennen: end of UTC late backport & config window
21:28 brennen@deploy1003: Finished scap sync-world: Backport for Deploy donate banner for all wikis except English Wikipedia (T388438) (duration: 13m 31s)
21:26 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7011.magru.wmnet} and A:cp
21:26 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7007.magru.wmnet} and A:cp
21:21 brennen@deploy1003: ksarabia, brennen: Continuing with sync
21:19 brennen@deploy1003: ksarabia, brennen: Backport for Deploy donate banner for all wikis except English Wikipedia (T388438) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:14 brennen@deploy1003: Started scap sync-world: Backport for Deploy donate banner for all wikis except English Wikipedia (T388438)
21:12 brennen@deploy1003: Finished scap sync-world: Backport for Don't clobber error information for failed Flow creates (T380911) (duration: 17m 04s)
21:05 brennen@deploy1003: zoe, brennen: Continuing with sync
21:01 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
21:00 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7006.magru.wmnet} and A:cp
20:59 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7014.magru.wmnet} and A:cp
20:59 brennen@deploy1003: zoe, brennen: Backport for Don't clobber error information for failed Flow creates (T380911) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:56 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
20:55 brennen@deploy1003: Started scap sync-world: Backport for Don't clobber error information for failed Flow creates (T380911)
20:54 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7014.magru.wmnet} and A:cp
20:54 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7006.magru.wmnet} and A:cp
20:54 brennen@deploy1003: Finished scap sync-world: Backport for Disable Search AB Test (T389399) (duration: 24m 12s)
20:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on relforge1008.eqiad.wmnet with reason: host reimage
20:47 brennen@deploy1003: bwang, brennen: Continuing with sync
20:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on relforge1008.eqiad.wmnet with reason: host reimage
20:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on relforge1009.eqiad.wmnet with reason: host reimage
20:40 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on relforge1009.eqiad.wmnet with reason: host reimage
20:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2258.codfw.wmnet with OS bookworm
20:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:35 brennen@deploy1003: bwang, brennen: Backport for Disable Search AB Test (T389399) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:30 brennen@deploy1003: Started scap sync-world: Backport for Disable Search AB Test (T389399)
20:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host relforge1009.eqiad.wmnet with OS bullseye
20:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host relforge1008.eqiad.wmnet with OS bullseye
20:27 brennen@deploy1003: Finished scap sync-world: sync world after keyholder errors during spiderpig testing (duration: 02m 47s)
20:24 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host relforge1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
20:24 brennen@deploy1003: Started scap sync-world: sync world after keyholder errors during spiderpig testing
20:22 spiderpig@deploy1003: Finished scap sync-world: Backport for Enable SUL3 login for all group 1 users (T384153), Enable SUL3 login for 1% of group 2 users (T384219) (duration: 16m 21s)
20:21 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2257.codfw.wmnet with OS bookworm
20:21 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2258.codfw.wmnet with reason: host reimage
20:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2256.codfw.wmnet with OS bookworm
20:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2258.codfw.wmnet with reason: host reimage
20:14 spiderpig@deploy1003: tgr, spiderpig: Continuing with sync
20:11 spiderpig@deploy1003: tgr, spiderpig: Backport for Enable SUL3 login for all group 1 users (T384153), Enable SUL3 login for 1% of group 2 users (T384219) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:06 spiderpig@deploy1003: Started scap sync-world: Backport for Enable SUL3 login for all group 1 users (T384153), Enable SUL3 login for 1% of group 2 users (T384219)
20:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2257.codfw.wmnet with reason: host reimage
20:03 jclark@cumin1002: START - Cookbook sre.hosts.provision for host relforge1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
20:02 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2257.codfw.wmnet with reason: host reimage
20:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2256.codfw.wmnet with reason: host reimage
19:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2258.codfw.wmnet with OS bookworm
19:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2254.codfw.wmnet with OS bookworm
19:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:57 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2256.codfw.wmnet with reason: host reimage
19:56 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
19:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2258.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:56 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7012.magru.wmnet} and A:cp
19:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2258.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:55 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7005.magru.wmnet} and A:cp
19:55 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2258
19:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2258
19:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host relforge1009.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
19:50 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2257.codfw.wmnet with OS bookworm
19:50 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7012.magru.wmnet} and A:cp
19:50 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7005.magru.wmnet} and A:cp
19:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2257.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2257.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:49 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2257
19:49 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2257
19:48 jclark@cumin1002: START - Cookbook sre.hosts.provision for host relforge1009.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
19:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2254.codfw.wmnet with reason: host reimage
19:39 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2256.codfw.wmnet with OS bookworm
19:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74364 and previous config saved to /var/cache/conftool/dbconfig/20250324-193911-root.json
19:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2254.codfw.wmnet with reason: host reimage
19:37 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2256.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2256.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:36 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2256
19:36 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2256
19:36 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:33 jhancock@cumin2002: START - Cookbook sre.dns.netbox
19:31 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2255.codfw.wmnet with OS bookworm
19:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2255.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:30 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2255.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:30 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2255
19:29 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2255
19:27 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7004.magru.wmnet} and A:cp
19:26 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7013.magru.wmnet} and A:cp
19:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2254.codfw.wmnet with OS bookworm
19:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2254.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:24 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2254.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74363 and previous config saved to /var/cache/conftool/dbconfig/20250324-192406-root.json
19:23 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2254
19:23 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2254
19:21 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7013.magru.wmnet} and A:cp
19:21 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7004.magru.wmnet} and A:cp
19:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
19:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74362 and previous config saved to /var/cache/conftool/dbconfig/20250324-190900-root.json
19:01 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
19:01 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
19:00 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
18:59 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
18:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74361 and previous config saved to /var/cache/conftool/dbconfig/20250324-185355-root.json
18:47 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7003.magru.wmnet} and A:cp
18:47 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7015.magru.wmnet} and A:cp
18:41 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7015.magru.wmnet} and A:cp
18:41 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7003.magru.wmnet} and A:cp
18:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74360 and previous config saved to /var/cache/conftool/dbconfig/20250324-183850-root.json
18:14 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7016.magru.wmnet} and A:cp
18:08 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7016.magru.wmnet} and A:cp
18:08 brett@cumin2002: END (ERROR) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=97) rolling upgrade of Varnish on P{cp70[02,16].magru.wmnet} and A:cp
18:02 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp70[02,16].magru.wmnet} and A:cp
18:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74359 and previous config saved to /var/cache/conftool/dbconfig/20250324-180010-root.json
17:51 ebernhardson: T379002 Start reindex of cebwiki search indices in cloudelastic
17:45 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74358 and previous config saved to /var/cache/conftool/dbconfig/20250324-174505-root.json
17:33 swfrench@deploy1003: Finished scap sync-world: Deployment to pick up new php8.1 production image - T389243 (duration: 23m 54s)
17:32 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp7001.magru.wmnet} and A:cp
17:30 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:30 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update frack node to use new mgmt subnet 10.195.1.1/25 - pt1979@cumin2002"
17:30 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update frack node to use new mgmt subnet 10.195.1.1/25 - pt1979@cumin2002"
17:30 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74357 and previous config saved to /var/cache/conftool/dbconfig/20250324-173000-root.json
17:26 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp7001.magru.wmnet} and A:cp
17:26 pt1979@cumin2002: START - Cookbook sre.dns.netbox
17:23 swfrench@deploy1003: swfrench: Continuing with sync
17:18 swfrench@deploy1003: swfrench: Deployment to pick up new php8.1 production image - T389243 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74356 and previous config saved to /var/cache/conftool/dbconfig/20250324-171454-root.json
17:09 swfrench@deploy1003: Started scap sync-world: Deployment to pick up new php8.1 production image - T389243
17:08 swfrench-wmf: rebuilt php8.1 production image suite (8.1.32-1-s1) - T389243
17:08 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2291.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:02 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2291.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:00 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2291.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74355 and previous config saved to /var/cache/conftool/dbconfig/20250324-165949-root.json
16:57 btullis@cumin1002: END (PASS) - Cookbook sre.apifeatureusage.roll-restart-reboot-logstash (exit_code=0) rolling restart_daemons on A:apifeatureusage
16:55 btullis@cumin1002: START - Cookbook sre.apifeatureusage.roll-restart-reboot-logstash rolling restart_daemons on A:apifeatureusage
16:52 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-flink-codfw cluster: Roll restart of jvm daemons.
16:49 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2291.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:45 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-flink-codfw cluster: Roll restart of jvm daemons.
16:38 reedy@deploy1003: Synchronized php-1.44.0-wmf.21/extensions/CentralAuth/maintenance/populateLocalAndGlobalIds.php: T303590 (duration: 11m 51s)
16:32 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-flink-eqiad cluster: Roll restart of jvm daemons.
16:29 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host relforge1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:29 jclark@cumin1002: START - Cookbook sre.hosts.provision for host relforge1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:28 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2291
16:28 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2291
16:28 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2291 to codfw - jhancock@cumin2002"
16:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2291 to codfw - jhancock@cumin2002"
16:26 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-flink-eqiad cluster: Roll restart of jvm daemons.
16:23 jhancock@cumin2002: START - Cookbook sre.dns.netbox
16:22 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host relforge1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:22 jclark@cumin1002: START - Cookbook sre.hosts.provision for host relforge1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:21 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host relforge1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:21 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host relforge1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:21 jclark@cumin1002: START - Cookbook sre.hosts.provision for host relforge1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:12 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2327.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:11 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2327.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:10 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
16:09 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
16:08 jclark@cumin1002: START - Cookbook sre.hosts.provision for host relforge1008.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:03 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
16:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2256.codfw.wmnet with OS bookworm
16:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2255.codfw.wmnet with OS bookworm
16:03 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2254.codfw.wmnet with OS bookworm
16:02 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
16:02 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
16:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:02 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:02 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
16:01 cgoubert@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
16:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2256.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:01 cgoubert@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
16:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2256.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:00 cgoubert@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
16:00 cgoubert@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
15:59 cgoubert@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
15:58 cgoubert@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
15:58 cgoubert@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
15:58 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
15:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2005.codfw.wmnet with OS bookworm
15:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:57 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
15:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:57 cgoubert@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
15:56 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2255.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:56 cgoubert@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
15:56 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2255.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:56 marostegui@cumin1002: dbctl commit (dc=all): 'Add ms2 to live traffic T387332', diff saved to https://phabricator.wikimedia.org/P74354 and previous config saved to /var/cache/conftool/dbconfig/20250324-155616-marostegui.json
15:52 swfrench-wmf: reprepro include php8.1 8.1.32-1+wmf11u1 in component/php81
15:51 cgoubert@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
15:51 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:51 cgoubert@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
15:51 cgoubert@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
15:49 cgoubert@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
15:49 cgoubert@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
15:47 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp7001.magru.wmnet
15:46 cgoubert@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
15:46 cgoubert@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
15:45 swfrench-wmf: reprepro include php-excimer 1.2.3-1+wmf11u1 in component/php81 - T389243
15:45 cgoubert@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
15:44 claime: Deploying pending admin_ng changes to all clusters
15:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2255.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2255.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2005.codfw.wmnet with reason: host reimage
15:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2254.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2255.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2255.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2254.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2255
15:40 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2255
15:40 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2254
15:40 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2254
15:39 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2254 to codfw - jhancock@cumin2002"
15:39 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2254 to codfw - jhancock@cumin2002"
15:38 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2005.codfw.wmnet with reason: host reimage
15:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74353 and previous config saved to /var/cache/conftool/dbconfig/20250324-153706-root.json
15:34 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:32 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Maintenance in x2
15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2254 to codfw - jhancock@cumin2002"
15:28 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2254 to codfw - jhancock@cumin2002"
15:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2005.codfw.wmnet with OS bookworm
15:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:24 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:24 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74352 and previous config saved to /var/cache/conftool/dbconfig/20250324-152201-root.json
15:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74351 and previous config saved to /var/cache/conftool/dbconfig/20250324-150655-root.json
15:03 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2179.codfw.wmnet with reason: Maintenance
15:03 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
15:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host relforge1009.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:00 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2179.codfw.wmnet with reason: Index rebuild
14:59 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2179.codfw.wmnet
14:58 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:57 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-analytics cluster: Roll restart of jvm daemons.
14:56 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
14:53 elukey@cumin2002: START - Cookbook sre.hosts.provision for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:52 tgr_: UTC afternoon deploys done
14:52 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2179.codfw.wmnet
14:52 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:52 elukey@cumin2002: START - Cookbook sre.hosts.provision for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74350 and previous config saved to /var/cache/conftool/dbconfig/20250324-145150-root.json
14:51 jclark@cumin1002: START - Cookbook sre.hosts.provision for host relforge1009.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:51 tgr@deploy1003: Finished scap sync-world: Backport for Redirect credentials change pages to central domain (T362715), Fix clearing stuck cookies: $wgCookiePrefix defaults to false, not null (T389796) (duration: 19m 38s)
14:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2179 T389378', diff saved to https://phabricator.wikimedia.org/P74349 and previous config saved to /var/cache/conftool/dbconfig/20250324-145100-marostegui.json
14:50 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-druid-public cluster: Roll restart of jvm daemons.
14:50 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2240 to s4 primary T389378', diff saved to https://phabricator.wikimedia.org/P74348 and previous config saved to /var/cache/conftool/dbconfig/20250324-145018-marostegui.json
14:49 marostegui: Starting s4 codfw failover from db2179 to db2240 - T389378
14:49 joal@deploy1003: Finished deploy [analytics/refinery@e0320e1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@e0320e14] (duration: 00m 36s)
14:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:48 joal@deploy1003: Started deploy [analytics/refinery@e0320e1] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@e0320e14]
14:47 joal@deploy1003: Finished deploy [analytics/refinery@e0320e1] (thin): Regular analytics weekly train THIN [analytics/refinery@e0320e14] (duration: 00m 50s)
14:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host relforge1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:46 joal@deploy1003: Started deploy [analytics/refinery@e0320e1] (thin): Regular analytics weekly train THIN [analytics/refinery@e0320e14]
14:45 joal@deploy1003: Finished deploy [analytics/refinery@e0320e1]: Regular analytics weekly train [analytics/refinery@e0320e14] (duration: 02m 42s)
14:44 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2240 from API/vslow/dump T389378', diff saved to https://phabricator.wikimedia.org/P74347 and previous config saved to /var/cache/conftool/dbconfig/20250324-144440-marostegui.json
14:44 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s4 T389378
14:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2255.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:44 tgr@deploy1003: matmarex, tgr: Continuing with sync
14:44 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2240 with weight 0 T389378', diff saved to https://phabricator.wikimedia.org/P74346 and previous config saved to /var/cache/conftool/dbconfig/20250324-144410-marostegui.json
14:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2255.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2254.codfw.wmnet with OS bookworm
14:43 joal@deploy1003: Started deploy [analytics/refinery@e0320e1]: Regular analytics weekly train [analytics/refinery@e0320e14]
14:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host relforge1010.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:39 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host relforge1010
14:39 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host relforge1010
14:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2254.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2254.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:38 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host relforge1009
14:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2005.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:37 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2005
14:37 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host relforge1009
14:37 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2005
14:36 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-ctrl2005 to codfw - jhancock@cumin2002"
14:36 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-ctrl2005 to codfw - jhancock@cumin2002"
14:36 tgr@deploy1003: matmarex, tgr: Backport for Redirect credentials change pages to central domain (T362715), Fix clearing stuck cookies: $wgCookiePrefix defaults to false, not null (T389796) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74344 and previous config saved to /var/cache/conftool/dbconfig/20250324-143644-root.json
14:35 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Aitolkyn out of all services on: 953 hosts
14:35 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host relforge1008
14:34 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host relforge1008
14:34 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Aitolkyn out of all services on: 1310 hosts
14:32 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:31 tgr@deploy1003: Started scap sync-world: Backport for Redirect credentials change pages to central domain (T362715), Fix clearing stuck cookies: $wgCookiePrefix defaults to false, not null (T389796)
14:31 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2011.codfw.wmnet with OS bookworm
14:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P74342 and previous config saved to /var/cache/conftool/dbconfig/20250324-142139-root.json
14:15 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2011.codfw.wmnet with reason: host reimage
14:14 tgr@deploy1003: Finished scap sync-world: Backport for bugfix: add back missing pipe char to conform to dogstatsd spec (T359385), Fix clearing stuck 'UserID' and 'UserName' cookies on Wikitech (T389796) (duration: 17m 16s)
14:13 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:13 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for elastic - jclark@cumin1002"
14:13 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for elastic - jclark@cumin1002"
14:12 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2011.codfw.wmnet with reason: host reimage
14:08 jclark@cumin1002: START - Cookbook sre.dns.netbox
14:07 tgr@deploy1003: tgr, matmarex, cwhite: Continuing with sync
14:06 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2227 slowly with 10 steps - Pooling in after cloning
14:06 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2227 slowly with 10 steps - Pooling in after cloning
14:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P74341 and previous config saved to /var/cache/conftool/dbconfig/20250324-140633-root.json
14:06 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2188 gradually with 4 steps - Pooling in after cloning
14:06 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2188 gradually with 4 steps - Pooling in after cloning
14:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2212.codfw.wmnet with reason: Index rebuild
14:02 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2227.codfw.wmnet onto db2205.codfw.wmnet
14:02 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2188.codfw.wmnet onto db2212.codfw.wmnet
14:02 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db2188 slowly with 10 steps - Pool db2188.codfw.wmnet in after cloning
14:02 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2188 slowly with 10 steps - Pool db2188.codfw.wmnet in after cloning
14:01 tgr@deploy1003: tgr, matmarex, cwhite: Backport for bugfix: add back missing pipe char to conform to dogstatsd spec (T359385), Fix clearing stuck 'UserID' and 'UserName' cookies on Wikitech (T389796) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:01 dcausse: depooling wdqs1012 (catching up lag)
13:57 tgr@deploy1003: Started scap sync-world: Backport for bugfix: add back missing pipe char to conform to dogstatsd spec (T359385), Fix clearing stuck 'UserID' and 'UserName' cookies on Wikitech (T389796)
13:56 volans: updated cumin to v5.1.1 on cumin1002
13:56 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ml-serve2011
13:56 klausman@cumin2002: START - Cookbook sre.hosts.move-vlan for host ml-serve2011
13:56 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-serve2011.codfw.wmnet with OS bookworm
13:54 tgr@deploy1003: Finished scap sync-world: Backport for Turn on Parsoid fragment support everywhere (take 2) (T374661 T380758 T389545 T387608), Do not throw an exception after shared-domain login with no token (T362715), Do not start central login from the shared domain (T362715) (duration: 20m 42s)
13:49 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2196.codfw.wmnet
13:46 tgr@deploy1003: tgr, cscott: Continuing with sync
13:46 btullis@cumin1002: END (PASS) - Cookbook sre.zookeeper.roll-restart-zookeeper (exit_code=0) for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
13:45 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2196.codfw.wmnet
13:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2196 T389795', diff saved to https://phabricator.wikimedia.org/P74340 and previous config saved to /var/cache/conftool/dbconfig/20250324-134500-marostegui.json
13:43 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2215 to x1 primary T389795', diff saved to https://phabricator.wikimedia.org/P74339 and previous config saved to /var/cache/conftool/dbconfig/20250324-134356-marostegui.json
13:43 marostegui: Starting x1 codfw failover from db2196 to db2215 - T389795
13:40 btullis@cumin1002: START - Cookbook sre.zookeeper.roll-restart-zookeeper for Zookeeper A:zookeeper-analytics cluster: Roll restart of jvm daemons.
13:38 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1015.eqiad.wmnet
13:37 tgr@deploy1003: tgr, cscott: Backport for Turn on Parsoid fragment support everywhere (take 2) (T374661 T380758 T389545 T387608), Do not throw an exception after shared-domain login with no token (T362715), Do not start central login from the shared domain (T362715) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:33 tgr@deploy1003: Started scap sync-world: Backport for Turn on Parsoid fragment support everywhere (take 2) (T374661 T380758 T389545 T387608), Do not throw an exception after shared-domain login with no token (T362715), Do not start central login from the shared domain (T362715)
13:33 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2215 with weight 0 T389795', diff saved to https://phabricator.wikimedia.org/P74338 and previous config saved to /var/cache/conftool/dbconfig/20250324-133320-marostegui.json
13:33 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Primary switchover x1 T389795
13:32 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1015.eqiad.wmnet
13:23 tgr@deploy1003: Finished scap sync-world: Backport for CommonSettings: Migrate CentralNotice to Virtual Domains (T389348), [Growth] enwiki: Release Add Link to 20% of newcomers (T388289), [Experiment Platform] Disable test experiment (T383801), Enable Section Translation and Unified Dashboard on all wikipedias (T387821) (duration: 16m 45s)
13:16 tgr@deploy1003: reedy, sfaci, tgr, cyndywikime, sbisson: Continuing with sync
13:12 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps-test2001.codfw.wmnet
13:11 tgr@deploy1003: reedy, sfaci, tgr, cyndywikime, sbisson: Backport for CommonSettings: Migrate CentralNotice to Virtual Domains (T389348), [Growth] enwiki: Release Add Link to 20% of newcomers (T388289), [Experiment Platform] Disable test experiment (T383801), Enable Section Translation and Unified Dashboard on all wikipedias (T387821) synced t
13:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps-test2001.codfw.wmnet
13:06 tgr@deploy1003: Started scap sync-world: Backport for CommonSettings: Migrate CentralNotice to Virtual Domains (T389348), [Growth] enwiki: Release Add Link to 20% of newcomers (T388289), [Experiment Platform] Disable test experiment (T383801), Enable Section Translation and Unified Dashboard on all wikipedias (T387821)
13:04 ladsgroup@deploy1003: Finished scap sync-world: Backport for Remove x2 (T383327) (duration: 13m 07s)
12:57 ladsgroup@deploy1003: ladsgroup: Continuing with sync
12:55 ladsgroup@deploy1003: ladsgroup: Backport for Remove x2 (T383327) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:51 ladsgroup@deploy1003: Started scap sync-world: Backport for Remove x2 (T383327)
12:41 ladsgroup@deploy1003: Finished scap sync-world: Backport for Switch the footer link to wikimedia.org (T387573 T373204) (duration: 12m 25s)
12:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74337 and previous config saved to /var/cache/conftool/dbconfig/20250324-124041-root.json
12:34 ladsgroup@deploy1003: ladsgroup: Continuing with sync
12:33 ladsgroup@deploy1003: ladsgroup: Backport for Switch the footer link to wikimedia.org (T387573 T373204) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:31 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2205 - Depool db2227.codfw.wmnet to then clone it to db2205.codfw.wmnet - fceratto@cumin1002
12:31 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2205 - Depool db2227.codfw.wmnet to then clone it to db2205.codfw.wmnet - fceratto@cumin1002
12:31 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db2227.codfw.wmnet onto db2205.codfw.wmnet
12:29 ladsgroup@deploy1003: Started scap sync-world: Backport for Switch the footer link to wikimedia.org (T387573 T373204)
12:25 marostegui@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74336 and previous config saved to /var/cache/conftool/dbconfig/20250324-122535-root.json
12:24 ladsgroup@deploy1003: Synchronized portals: Minor wikimedia.org mobile fixes (T373204) (duration: 02m 48s)
12:23 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2205.codfw.wmnet
12:22 ladsgroup@deploy1003: Synchronized portals/wikipedia.org/assets: Minor wikimedia.org mobile fixes (T373204) (duration: 11m 37s)
12:19 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2205.codfw.wmnet
12:12 fceratto@cumin1002: dbctl commit (dc=all): 'Depool db2205 T389377', diff saved to https://phabricator.wikimedia.org/P74335 and previous config saved to /var/cache/conftool/dbconfig/20250324-121227-fceratto.json
12:10 marostegui@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74334 and previous config saved to /var/cache/conftool/dbconfig/20250324-121030-root.json
12:09 fceratto@cumin1002: dbctl commit (dc=all): 'Promote db2209 to s3 primary T389377', diff saved to https://phabricator.wikimedia.org/P74333 and previous config saved to /var/cache/conftool/dbconfig/20250324-120947-fceratto.json
12:09 elukey: revert rate-limit replicas from 6 -> 3 on Wikikube eqiad
12:08 federico3: Starting s3 codfw failover from db2205 to db2209 - T389377
12:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2002.codfw.wmnet with OS bookworm
12:02 ladsgroup@deploy1003: Finished scap sync-world: Backport for Enable dataRedundancy for mainstash (T383327) (duration: 15m 43s)
11:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74332 and previous config saved to /var/cache/conftool/dbconfig/20250324-115843-root.json
11:56 elukey: temporarily bump rate-limit replicas from 3 -> 6 on Wikikube eqiad
11:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74331 and previous config saved to /var/cache/conftool/dbconfig/20250324-115524-root.json
11:55 ladsgroup@deploy1003: ladsgroup: Continuing with sync
11:54 fceratto@cumin1002: dbctl commit (dc=all): 'Set db2209 with weight 0 T389377', diff saved to https://phabricator.wikimedia.org/P74330 and previous config saved to /var/cache/conftool/dbconfig/20250324-115457-fceratto.json
11:54 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s3 T389377
11:51 ladsgroup@deploy1003: ladsgroup: Backport for Enable dataRedundancy for mainstash (T383327) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:46 ladsgroup@deploy1003: Started scap sync-world: Backport for Enable dataRedundancy for mainstash (T383327)
11:46 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2212 - Depool db2188.codfw.wmnet to then clone it to db2212.codfw.wmnet - fceratto@cumin1002
11:46 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2212 - Depool db2188.codfw.wmnet to then clone it to db2212.codfw.wmnet - fceratto@cumin1002
11:46 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db2188.codfw.wmnet onto db2212.codfw.wmnet
11:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
11:44 moritzm: installing subversion security updates
11:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74328 and previous config saved to /var/cache/conftool/dbconfig/20250324-114338-root.json
11:43 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2002.codfw.wmnet with reason: host reimage
11:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps-test2006.codfw.wmnet
11:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74327 and previous config saved to /var/cache/conftool/dbconfig/20250324-114019-root.json
11:38 btullis@cumin1002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling restart_daemons on A:datahubsearch
11:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps-test2006.codfw.wmnet
11:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps-test2005.codfw.wmnet
11:31 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2212.codfw.wmnet
11:28 btullis@cumin1002: START - Cookbook sre.opensearch.roll-restart-reboot rolling restart_daemons on A:datahubsearch
11:28 moritzm: installing busybox security updates
11:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74326 and previous config saved to /var/cache/conftool/dbconfig/20250324-112833-root.json
11:26 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2212.codfw.wmnet
11:24 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2002.codfw.wmnet with OS bookworm
11:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps-test2005.codfw.wmnet
11:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host maps-test2001.codfw.wmnet with OS bookworm
11:19 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2005.codfw.wmnet with OS bookworm
11:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74325 and previous config saved to /var/cache/conftool/dbconfig/20250324-111327-root.json
11:11 fceratto@cumin1002: dbctl commit (dc=all): 'Depool db2212 T389373', diff saved to https://phabricator.wikimedia.org/P74324 and previous config saved to /var/cache/conftool/dbconfig/20250324-111157-fceratto.json
11:11 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1001.eqiad.wmnet
11:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps-test2004.codfw.wmnet
11:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage
11:04 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-mariadb1001.eqiad.wmnet
11:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps-test2004.codfw.wmnet
11:03 fceratto@cumin1002: dbctl commit (dc=all): 'Promote db2203 to s1 primary T389373', diff saved to https://phabricator.wikimedia.org/P74323 and previous config saved to /var/cache/conftool/dbconfig/20250324-110321-fceratto.json
11:03 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2005.codfw.wmnet with reason: host reimage
11:01 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on maps-test2001.codfw.wmnet with reason: host reimage
10:59 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2005.codfw.wmnet with reason: host reimage
10:59 federico3: Starting s1 codfw failover from db2212 to db2203 - T389373
10:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74322 and previous config saved to /var/cache/conftool/dbconfig/20250324-105822-root.json
10:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host maps-test2003.codfw.wmnet
10:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host maps-test2003.codfw.wmnet
10:43 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 34 hosts with reason: Primary switchover s1 T389373
10:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P74321 and previous config saved to /var/cache/conftool/dbconfig/20250324-104316-root.json
10:42 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host maps-test2001.codfw.wmnet with OS bookworm
10:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ml-serve2005
10:42 elukey@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve2005
10:41 elukey@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve2005
10:41 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ml-serve2005.codfw.wmnet 202.0.192.10.in-addr.arpa 2.0.2.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
10:41 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache ml-serve2005.codfw.wmnet 202.0.192.10.in-addr.arpa 2.0.2.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
10:41 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:41 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-serve2005 - elukey@cumin1002"
10:41 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-serve2005 - elukey@cumin1002"
10:38 ladsgroup@deploy1003: Finished scap sync-world: Backport for Migrate x2 off LB config (T383327 T387654), etcd: Make Mainstash config global variable (T383327 T387654) (duration: 17m 39s)
10:35 moritzm: installing docker.io security updates
10:31 elukey@cumin1002: START - Cookbook sre.dns.netbox
10:31 ladsgroup@deploy1003: ladsgroup: Continuing with sync
10:31 elukey@cumin1002: START - Cookbook sre.hosts.move-vlan for host ml-serve2005
10:30 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve2005.codfw.wmnet with OS bookworm
10:29 fceratto@cumin1002: dbctl commit (dc=all): 'Set db2203 with weight 0 T389373', diff saved to https://phabricator.wikimedia.org/P74319 and previous config saved to /var/cache/conftool/dbconfig/20250324-102944-fceratto.json
10:28 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2214.codfw.wmnet with reason: Index rebuild
10:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P74318 and previous config saved to /var/cache/conftool/dbconfig/20250324-102811-root.json
10:27 fceratto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s1 T389373
10:26 ladsgroup@deploy1003: ladsgroup: Backport for Migrate x2 off LB config (T383327 T387654), etcd: Make Mainstash config global variable (T383327 T387654) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:24 marostegui@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2169.codfw.wmnet onto db2214.codfw.wmnet
10:21 ladsgroup@deploy1003: Started scap sync-world: Backport for Migrate x2 off LB config (T383327 T387654), etcd: Make Mainstash config global variable (T383327 T387654)
10:17 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
10:15 ladsgroup@deploy1003: Started scap sync-world: Backport for Migrate x2 off LB config (T383327 T387654)
10:13 XioNoX: shutdown all SG.IX peers - T386987
10:12 elukey@cumin2002: START - Cookbook sre.hosts.provision for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
10:10 ladsgroup@deploy1003: Finished scap sync-world: Backport for Bump thumbnail steps to 35% (T360589) (duration: 15m 56s)
10:03 ladsgroup@deploy1003: ladsgroup: Continuing with sync
10:00 ladsgroup@deploy1003: ladsgroup: Backport for Bump thumbnail steps to 35% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:54 ladsgroup@deploy1003: Started scap sync-world: Backport for Bump thumbnail steps to 35% (T360589)
09:29 dcausse@deploy1003: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:29 dcausse@deploy1003: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
09:25 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
09:25 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
09:15 fabfur: restarting purged on A:cp due to T389707
09:13 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
09:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74315 and previous config saved to /var/cache/conftool/dbconfig/20250324-091320-root.json
09:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2004.codfw.wmnet with OS bookworm
09:05 tgr_: morning UTC deploys done
09:04 tgr@deploy1003: Finished scap sync-world: Backport for authmanager: Use an URL parameter to keep track of returns (T388250) (duration: 16m 18s)
09:03 elukey@cumin2002: START - Cookbook sre.hosts.provision for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
09:01 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74314 and previous config saved to /var/cache/conftool/dbconfig/20250324-085815-root.json
08:57 tgr@deploy1003: tgr: Continuing with sync
08:56 elukey@cumin2002: START - Cookbook sre.hosts.provision for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:55 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:55 elukey@cumin2002: START - Cookbook sre.hosts.provision for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:54 tgr@deploy1003: tgr: Backport for authmanager: Use an URL parameter to keep track of returns (T388250) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:49 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:48 tgr@deploy1003: Started scap sync-world: Backport for authmanager: Use an URL parameter to keep track of returns (T388250)
08:48 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:48 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
08:47 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2004.codfw.wmnet with reason: host reimage
08:45 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2004.codfw.wmnet with reason: host reimage
08:45 tgr@deploy1003: Finished scap sync-world: Backport for Preserve 'useformat' param when accessing Special:ChangePassword (duration: 33m 43s)
08:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74313 and previous config saved to /var/cache/conftool/dbconfig/20250324-084309-root.json
08:42 dcausse@deploy1003: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:42 elukey@cumin2002: START - Cookbook sre.hosts.provision for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:41 dcausse@deploy1003: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
08:39 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:38 elukey@cumin2002: START - Cookbook sre.hosts.provision for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:36 dcausse@deploy1003: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:36 dcausse@deploy1003: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
08:35 tgr@deploy1003: tgr: Continuing with sync
08:30 tgr@deploy1003: tgr: Backport for Preserve 'useformat' param when accessing Special:ChangePassword synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74312 and previous config saved to /var/cache/conftool/dbconfig/20250324-082804-root.json
08:27 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve2004.codfw.wmnet with OS bookworm
08:12 marostegui@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74311 and previous config saved to /var/cache/conftool/dbconfig/20250324-081258-root.json
08:11 tgr@deploy1003: Started scap sync-world: Backport for Preserve 'useformat' param when accessing Special:ChangePassword
07:28 moritzm: rebalance ganeti eqiad/D following reimages T382507
07:16 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2214 - Depool db2169.codfw.wmnet to then clone it to db2214.codfw.wmnet - marostegui@cumin1002
07:16 marostegui@cumin1002: START - Cookbook sre.mysql.depool db2214 - Depool db2169.codfw.wmnet to then clone it to db2214.codfw.wmnet - marostegui@cumin1002
07:16 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db2169.codfw.wmnet onto db2214.codfw.wmnet
07:08 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2214.codfw.wmnet
07:04 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2214.codfw.wmnet
07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2214 T389382', diff saved to https://phabricator.wikimedia.org/P74309 and previous config saved to /var/cache/conftool/dbconfig/20250324-070245-marostegui.json
07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2229 to s6 primary T389382', diff saved to https://phabricator.wikimedia.org/P74308 and previous config saved to /var/cache/conftool/dbconfig/20250324-070147-marostegui.json
07:01 marostegui: Starting s6 codfw failover from db2214 to db2229 - T389382
06:53 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s6 T389382
06:52 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2229 with weight 0 T389382', diff saved to https://phabricator.wikimedia.org/P74307 and previous config saved to /var/cache/conftool/dbconfig/20250324-065236-marostegui.json
06:32 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2204.codfw.wmnet with reason: Index rebuild
06:30 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2204.codfw.wmnet
06:26 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2204.codfw.wmnet
06:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2204 T389376', diff saved to https://phabricator.wikimedia.org/P74306 and previous config saved to /var/cache/conftool/dbconfig/20250324-062338-marostegui.json
06:22 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2207 to s2 primary T389376', diff saved to https://phabricator.wikimedia.org/P74305 and previous config saved to /var/cache/conftool/dbconfig/20250324-062223-marostegui.json
06:22 marostegui: Starting s2 codfw failover from db2204 to db2207 - T389376
06:18 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2207 with weight 0 T389376', diff saved to https://phabricator.wikimedia.org/P74304 and previous config saved to /var/cache/conftool/dbconfig/20250324-061812-marostegui.json
06:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T389376

2025-03-22

07:20 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on cp4047.ulsfo.wmnet with reason: HW errors
07:15 vgutierrez[off]: restart purged on cp7001 - T389707

2025-03-21

22:58 krinkle@deploy1003: Finished scap sync-world: Backport for docroot: Enable Chrome credential sharing on foundation.wikimedia.org (T385520) (duration: 26m 14s)
22:51 krinkle@deploy1003: krinkle: Continuing with sync
22:36 krinkle@deploy1003: krinkle: Backport for docroot: Enable Chrome credential sharing on foundation.wikimedia.org (T385520) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:31 krinkle@deploy1003: Started scap sync-world: Backport for docroot: Enable Chrome credential sharing on foundation.wikimedia.org (T385520)
21:45 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: try rolling operation without allow-yellow flag - ryankemper@cumin2002 - T389119
21:43 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp3081.esams.wmnet} and A:cp
21:36 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp3081.esams.wmnet} and A:cp
21:27 sukhe: sukhe@deploy1003:~$ echo 'https://spiderpig.wikimedia.org/api/whoami' | mwscript-k8s --attach -- purgeList.php
21:26 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: try rolling operation without allow-yellow flag - ryankemper@cumin2002 - T389119
21:00 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp3080.esams.wmnet} and A:cp
20:55 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp3080.esams.wmnet} and A:cp
19:09 dancy@deploy1003: Installation of scap version "4.143.2" completed for 2 hosts
19:07 dancy@deploy1003: Installing scap version "4.143.2" for 2 host(s)
18:46 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:46 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for restbase1045 - eevans@cumin1002"
18:46 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for restbase1045 - eevans@cumin1002"
18:42 eevans@cumin1002: START - Cookbook sre.dns.netbox
18:40 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:40 eevans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for restbase1044 - eevans@cumin1002"
18:40 eevans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Additional IPs for restbase1044 - eevans@cumin1002"
18:32 eevans@cumin1002: START - Cookbook sre.dns.netbox
18:28 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for restbase1043.eqiad.wmnet
18:27 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for restbase1043.eqiad.wmnet
18:25 topranks: enabling ospf cloudsw1-c8-eqiad
18:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2255.codfw.wmnet with OS bookworm
18:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2258.codfw.wmnet with OS bookworm
18:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2254.codfw.wmnet with OS bookworm
18:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2257.codfw.wmnet with OS bookworm
18:07 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker2256.codfw.wmnet with OS bookworm
17:53 dancy@deploy1003: Installation of scap version "4.143.1" completed for 2 hosts
17:51 dancy@deploy1003: Installing scap version "4.143.1" for 2 host(s)
17:48 mforns@deploy1003: Finished deploy [airflow-dags/analytics@317134a]: finalize airflow migration (duration: 00m 44s)
17:47 mforns@deploy1003: Started deploy [airflow-dags/analytics@317134a]: finalize airflow migration
17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2253.codfw.wmnet with OS bookworm
17:43 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:33 cmooney@dns2005: END - running authdns-update
17:31 cmooney@dns2005: START - running authdns-update
17:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
17:09 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:07 cmooney@cumin1002: START - Cookbook sre.dns.netbox
17:06 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:06 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: set cloudsw cloud vrf xlink dns to wikimediacloud.org domain - cmooney@cumin1002"
17:06 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: set cloudsw cloud vrf xlink dns to wikimediacloud.org domain - cmooney@cumin1002"
17:02 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve2004.codfw.wmnet with OS bookworm
17:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2253.codfw.wmnet with reason: host reimage
17:01 cmooney@cumin1002: START - Cookbook sre.dns.netbox
17:00 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1122.eqiad.wmnet with OS bullseye
17:00 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:59 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:58 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2253.codfw.wmnet with reason: host reimage
16:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:49 elukey@cumin2002: START - Cookbook sre.hosts.provision for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
16:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2257.codfw.wmnet with OS bookworm
16:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2258.codfw.wmnet with OS bookworm
16:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2256.codfw.wmnet with OS bookworm
16:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2255.codfw.wmnet with OS bookworm
16:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2254.codfw.wmnet with OS bookworm
16:47 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2253.codfw.wmnet with OS bookworm
16:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1122.eqiad.wmnet with reason: host reimage
16:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2257.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2256.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2258.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2255.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:43 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1122.eqiad.wmnet with reason: host reimage
16:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2254.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2253.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2257.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2257.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2258.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2257.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2256.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2255.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2254.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2255.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2257.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2253.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2258.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2254.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2256.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2253.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1122.eqiad.wmnet with OS bullseye
16:32 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2327.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:26 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2327.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2255.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:25 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2327.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:25 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2327.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2257.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2257.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2255.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2258.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2257.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:25 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2327.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2256.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2255.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:25 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp3079.esams.wmnet} and A:cp
16:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2254.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2253.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:24 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2255
16:23 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2255
16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2257
16:23 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2257
16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2258
16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2256
16:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2257
16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2254
16:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2255
16:23 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2253
16:23 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2258
16:23 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2257
16:23 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2256
16:23 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2255
16:23 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2254
16:23 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2253
16:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2253-8 to codfw - jhancock@cumin2002"
16:22 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2253-8 to codfw - jhancock@cumin2002"
16:19 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp3079.esams.wmnet} and A:cp
16:18 jhancock@cumin2002: START - Cookbook sre.dns.netbox
16:14 elukey@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2327.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:12 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:12 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2253-8 to codfw - jhancock@cumin2002"
16:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2253-8 to codfw - jhancock@cumin2002"
16:08 jhancock@cumin2002: START - Cookbook sre.dns.netbox
16:07 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1121.eqiad.wmnet with OS bullseye
16:07 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:06 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:05 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve2004.codfw.wmnet with OS bookworm
16:04 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve2004.codfw.wmnet with OS bookworm
16:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1122.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1122.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1121.eqiad.wmnet with reason: host reimage
15:50 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1121.eqiad.wmnet with reason: host reimage
15:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1120.eqiad.wmnet with OS bullseye
15:47 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:43 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:42 klausman@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
15:42 klausman@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
15:42 klausman@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
15:42 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:42 klausman@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
15:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1117.eqiad.wmnet with OS bullseye
15:41 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:41 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1121.eqiad.wmnet with OS bullseye
15:40 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1118.eqiad.wmnet with OS bullseye
15:40 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:39 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:39 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:39 klausman@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
15:39 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2249.codfw.wmnet with OS bookworm
15:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:39 klausman@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
15:37 cmooney@dns2005: END - running authdns-update
15:37 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1121.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:35 cmooney@dns2005: START - running authdns-update
15:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:34 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1121.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:32 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp4047.ulsfo.wmnet
15:32 robh@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cp4047.ulsfo.wmnet
15:32 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:31 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:30 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1120.eqiad.wmnet with reason: host reimage
15:30 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:29 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:29 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns for cloudsw1-c8-eqiad cloud-private vrf loopback - cmooney@cumin1002"
15:29 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add dns for cloudsw1-c8-eqiad cloud-private vrf loopback - cmooney@cumin1002"
15:29 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:28 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2251.codfw.wmnet with OS bookworm
15:28 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:27 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1117.eqiad.wmnet with reason: host reimage
15:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:27 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:27 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:27 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:26 cmooney@cumin1002: START - Cookbook sre.dns.netbox
15:25 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1118.eqiad.wmnet with reason: host reimage
15:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:24 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:22 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1120.eqiad.wmnet with reason: host reimage
15:22 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1117.eqiad.wmnet with reason: host reimage
15:22 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1118.eqiad.wmnet with reason: host reimage
15:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1116.eqiad.wmnet with OS bullseye
15:22 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1121.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:21 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:20 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1121.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:19 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1122.eqiad.wmnet with OS bullseye
15:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2004.codfw.wmnet with OS bookworm
15:19 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2249.codfw.wmnet with reason: host reimage
15:19 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1122.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:18 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2327 to codfw - jhancock@cumin2002"
15:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2327 to codfw - jhancock@cumin2002"
15:18 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:17 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2252.codfw.wmnet with OS bookworm
15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:16 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2249.codfw.wmnet with reason: host reimage
15:14 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:14 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2250.codfw.wmnet with OS bookworm
15:14 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:13 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:12 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ml-serve2004
15:12 elukey@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve2004
15:12 robh@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4047.ulsfo.wmnet
15:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2251.codfw.wmnet with reason: host reimage
15:11 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1120.eqiad.wmnet with OS bullseye
15:11 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1118.eqiad.wmnet with OS bullseye
15:11 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1117.eqiad.wmnet with OS bullseye
15:11 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp4047.ulsfo.wmnet
15:10 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp4047.ulsfo.wmnet
15:10 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cp4047.ulsfo.wmnet
15:09 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:09 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:08 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:08 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1116.eqiad.wmnet with reason: host reimage
15:07 elukey@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve2004
15:07 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ml-serve2004.codfw.wmnet 11.48.192.10.in-addr.arpa 1.1.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:07 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache ml-serve2004.codfw.wmnet 11.48.192.10.in-addr.arpa 1.1.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:07 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:07 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-serve2004 - elukey@cumin1002"
15:07 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1122.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:07 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-serve2004 - elukey@cumin1002"
15:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2004.codfw.wmnet with reason: host reimage
15:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1121.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
15:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1115.eqiad.wmnet with OS bullseye
15:02 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:01 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2252.codfw.wmnet with reason: host reimage
14:59 elukey@cumin1002: START - Cookbook sre.dns.netbox
14:59 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:59 elukey@cumin1002: START - Cookbook sre.hosts.move-vlan for host ml-serve2004
14:59 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve2004.codfw.wmnet with OS bookworm
14:59 robh@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4047.ulsfo.wmnet
14:58 klausman@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
14:58 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1116.eqiad.wmnet with reason: host reimage
14:58 klausman@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
14:58 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2250.codfw.wmnet with reason: host reimage
14:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1114.eqiad.wmnet with OS bullseye
14:57 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1121.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:57 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:56 sukhe@dns1005: END - running authdns-update
14:56 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp4047.ulsfo.wmnet
14:56 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp4047.ulsfo.wmnet
14:55 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp4047.ulsfo.wmnet
14:55 sukhe: testing dummy authdns-update run
14:55 robh: resuming firmware updates on cp4047 via T387238
14:55 sukhe@dns1005: START - running authdns-update
14:55 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2251.codfw.wmnet with reason: host reimage
14:54 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2252.codfw.wmnet with reason: host reimage
14:54 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2004.codfw.wmnet with reason: host reimage
14:54 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2250.codfw.wmnet with reason: host reimage
14:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1121.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:52 sukhe: sudo cumin 'A:dnsbox' 'run-puppet-agent'
14:47 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1116.eqiad.wmnet with OS bullseye
14:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1115.eqiad.wmnet with reason: host reimage
14:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1114.eqiad.wmnet with reason: host reimage
14:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2252.codfw.wmnet with OS bookworm
14:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2251.codfw.wmnet with OS bookworm
14:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2250.codfw.wmnet with OS bookworm
14:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2249.codfw.wmnet with OS bookworm
14:43 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2004.codfw.wmnet with OS bookworm
14:42 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1115.eqiad.wmnet with reason: host reimage
14:42 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1121.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2252.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2251.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:39 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1114.eqiad.wmnet with reason: host reimage
14:39 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1122.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:38 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1121.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1122.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1121.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2252.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:36 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2252.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:36 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2252.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2251.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2252.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2251.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:31 isaranto@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
14:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1115.eqiad.wmnet with OS bullseye
14:31 isaranto@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
14:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1113.eqiad.wmnet with OS bullseye
14:31 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:31 isaranto@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
14:31 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1114.eqiad.wmnet with OS bullseye
14:29 isaranto@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
14:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1112.eqiad.wmnet with OS bullseye
14:29 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:27 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:25 isaranto@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
14:24 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1120.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:23 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
14:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2252.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:22 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2252.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2252.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2251.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2004.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:21 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2252
14:21 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2251
14:21 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2004
14:21 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2252
14:21 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2251
14:21 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2004
14:20 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:20 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2251 to codfw - jhancock@cumin2002"
14:20 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2251 to codfw - jhancock@cumin2002"
14:18 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1120.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:18 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:18 elukey@cumin1002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
14:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1113.eqiad.wmnet with reason: host reimage
14:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1120.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:16 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
14:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1112.eqiad.wmnet with reason: host reimage
14:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:12 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1113.eqiad.wmnet with reason: host reimage
14:10 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1112.eqiad.wmnet with reason: host reimage
14:07 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:04 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1120.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1118.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1113.eqiad.wmnet with OS bullseye
14:01 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1119.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
14:00 Ammar: T389589 Ran mwscript-k8s --comment="T389589" -f -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=bewiki --logwiki=metawiki 'Daanschr' 'Daan Schrama'
13:59 Ammar: T389589 Ran mwscript-k8s --comment="T389589" -f -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=zhwiki --logwiki=metawiki 'Pinnasalvatore80' 'Diana 79'
13:59 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1112.eqiad.wmnet with OS bullseye
13:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1111.eqiad.wmnet with OS bullseye
13:58 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
13:58 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
13:56 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1118.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
13:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1118.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
13:54 klausman@deploy1003: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
13:53 sukhe@cumin1002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
13:53 sukhe@cumin1002: START - Cookbook sre.network.cf
13:53 klausman@deploy1003: helmfile [codfw] START helmfile.d/services/api-gateway: apply
13:50 klausman@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
13:49 klausman@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
13:49 urandom: bootstrapping restbase1043-c/cassandra — T389423
13:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1111.eqiad.wmnet with reason: host reimage
13:44 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1118.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
13:43 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1117.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
13:42 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1111.eqiad.wmnet with reason: host reimage
13:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1117.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
13:35 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1117.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
13:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1111.eqiad.wmnet with OS bullseye
13:23 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1117.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
13:23 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1111.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
13:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1111.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
13:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1112.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
13:09 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1112.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
13:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1113.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
12:59 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1113.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
12:59 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1114.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
12:52 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1114.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
12:52 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1115.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
12:50 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1115.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
12:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1116.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
12:43 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1116.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
12:42 claime: vacuum systemd journal logs down to 500M on registry200[4-5].codfw.wmnet
12:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1116.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
12:30 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1116.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
12:20 mlitn@deploy1003: Finished deploy [airflow-dags/platform_eng@317134a]: (no justification provided) (duration: 00m 30s)
12:20 mlitn@deploy1003: Started deploy [airflow-dags/platform_eng@317134a]: (no justification provided)
12:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2003.codfw.wmnet with OS bookworm
11:45 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2003.codfw.wmnet with reason: host reimage
11:41 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2003.codfw.wmnet with reason: host reimage
11:17 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ml-serve2003
11:17 elukey@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve2003
11:16 elukey@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve2003
11:16 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ml-serve2003.codfw.wmnet 29.32.192.10.in-addr.arpa 9.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
11:16 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache ml-serve2003.codfw.wmnet 29.32.192.10.in-addr.arpa 9.2.0.0.2.3.0.0.2.9.1.0.0.1.0.0.3.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
11:16 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:16 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-serve2003 - elukey@cumin1002"
11:16 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-serve2003 - elukey@cumin1002"
11:11 elukey@cumin1002: START - Cookbook sre.dns.netbox
11:11 elukey@cumin1002: START - Cookbook sre.hosts.move-vlan for host ml-serve2003
11:10 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve2003.codfw.wmnet with OS bookworm
10:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
10:49 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
10:30 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
10:30 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
10:30 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
10:30 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
10:23 elukey: alter sequence wikidata_relation_members_id_seq as bigint; on maps2009's gis database - T389462
10:16 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-test: apply
10:16 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-test: apply
10:15 elukey: ALTER TABLE public.wikidata_relation_members ALTER COLUMN id TYPE bigint; on maps2009's postgres - T389462
10:14 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
10:14 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
10:06 elukey: `alter sequence wikidata_relation_members_id_seq as bigint;` on maps1009's gis database - T389462
09:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
09:51 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
09:50 brouberol@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
09:50 brouberol@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
09:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2002.codfw.wmnet with OS bookworm
09:27 moritzm: imported python3-flask-sqlalchemy 2.1-4 to main component of wikimedia-bullseye (imported from bullseye-backports which will be archived soon) T383557
09:03 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2002.codfw.wmnet with reason: host reimage
09:00 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2002.codfw.wmnet with reason: host reimage
08:27 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve2002.codfw.wmnet with OS bookworm
07:54 Krinkle: krinkle@mwmaint: Fix actor_name encoding on cawiki for 1 row: actor_id=342864, per T389559
07:10 moritzm: installing Linux 6.1.129 on Bookworm hosts
07:09 slyngshede@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Sharvaniharan out of all services on: 2292 hosts
07:02 moritzm: installing vim security updates
02:37 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on cp4047.ulsfo.wmnet with reason: BIOS upgrades
02:14 TimStarling: fixing corrupted blocks by directly updating the database for T389452
01:27 sukhe: running homer on cr*-esams
01:24 sukhe@cumin1002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
01:24 sukhe@cumin1002: START - Cookbook sre.network.cf
01:24 sukhe@cumin1002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
01:24 sukhe@cumin1002: START - Cookbook sre.network.cf
01:09 sukhe: running homer
01:07 sukhe@cumin1002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
01:07 sukhe@cumin1002: START - Cookbook sre.network.cf
00:23 brett@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4047.ulsfo.wmnet with reason: BIOS upgrades
00:17 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp3078.esams.wmnet} and A:cp
00:12 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp3078.esams.wmnet} and A:cp
00:08 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp4047.ulsfo.wmnet
00:08 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cp4047.ulsfo.wmnet
00:08 brett: restart varnishkafka-all on A:cp-ulsfo
00:04 tstarling@deploy1003: Finished scap sync-world: Backport for block: Don't modify an autoblock when the user specifies an IP (T389452) (duration: 26m 05s)

2025-03-20

23:59 robh@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4047.ulsfo.wmnet
23:58 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp4047.ulsfo.wmnet
23:58 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp4047.ulsfo.wmnet
23:58 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cp4047.ulsfo.wmnet
23:53 tstarling@deploy1003: tstarling: Continuing with sync
23:53 tstarling@deploy1003: tstarling: Backport for block: Don't modify an autoblock when the user specifies an IP (T389452) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:47 robh@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4047.ulsfo.wmnet
23:46 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp4047.ulsfo.wmnet
23:45 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp4047.ulsfo.wmnet
23:45 robh@cumin2002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cp4047.ulsfo.wmnet
23:43 brett@dns1005: END - running authdns-update
23:42 brett@dns1005: START - running authdns-update
23:38 tstarling@deploy1003: Started scap sync-world: Backport for block: Don't modify an autoblock when the user specifies an IP (T389452)
23:37 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp3077.esams.wmnet} and A:cp
23:32 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp3077.esams.wmnet} and A:cp
23:31 robh@cumin2002: START - Cookbook sre.hosts.reboot-single for host cp4047.ulsfo.wmnet
23:30 kamila@deploy1003: Finished scap sync-world: Test deployment to validate deployment server switchover - T385155 (duration: 19m 42s)
23:30 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp4047.ulsfo.wmnet
23:29 robh: updating cp4047 bios via T387238, server will flap but is not pooled
23:28 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp4047.ulsfo.wmnet
23:18 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp4047.ulsfo.wmnet
23:15 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp4047.ulsfo.wmnet
23:13 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=dns1005.wikimedia.org
23:10 kamila@deploy1003: Started scap sync-world: Test deployment to validate deployment server switchover - T385155
23:06 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp4047.ulsfo.wmnet
23:05 robh@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cp4047.ulsfo.wmnet
23:01 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp3076.esams.wmnet} and A:cp
22:58 kamila@deploy2002: Unlocked for deployment [MediaWiki]: deployment server switch -- T385155 (duration: 68m 30s)
22:56 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp3076.esams.wmnet} and A:cp
22:49 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp4047.ulsfo.wmnet
22:48 robh@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts cp4047.ulsfo.wmnet
22:47 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp4047.ulsfo.wmnet
22:41 kamila_: switch deployment.w.o DNS to eqiad
22:41 kamila@dns1004: END - running authdns-update
22:39 kamila@dns1004: START - running authdns-update
22:38 sukhe: depool dns1005 to debug zone files not in sync with dns.git
22:38 sukhe@puppetserver1001: conftool action : set/pooled=no; selector: name=dns1005.wikimedia.org
22:22 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp4046.ulsfo.wmnet
22:18 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp3075.esams.wmnet} and A:cp
22:13 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp3075.esams.wmnet} and A:cp
22:12 kamila@dns1004: START - running authdns-update
22:09 robh@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cp4047.ulsfo.wmnet
22:09 dzahn@dns1004: START - running authdns-update
22:09 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp4047.ulsfo.wmnet
22:08 robh@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts cp4047.ulsfo.wmnet
22:07 robh@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cp4047.ulsfo.wmnet
22:00 kamila@dns1004: START - running authdns-update
21:49 kamila@deploy2002: Locking from deployment [MediaWiki]: deployment server switch -- T385155
21:39 tgr_: late UTC deploys done
21:38 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp3073.esams.wmnet} and A:cp
21:37 tgr@deploy2002: Finished scap sync-world: Backport for Clear stuck session cookies on Wikitech (T389433) (duration: 20m 31s)
21:34 urandom: bootstrapping restbase1043-b/cassandra — T389423 (previous msg(s) typo-ed)
21:33 urandom: bootstrapping restbase1034-b/cassandra — T389423
21:31 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp3073.esams.wmnet} and A:cp
21:29 tgr@deploy2002: tgr: Continuing with sync
21:25 eileen: civicrm upgraded from 7b532ad7 to fba4c3d6
21:19 tgr@deploy2002: tgr: Backport for Clear stuck session cookies on Wikitech (T389433) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:16 tgr@deploy2002: Started scap sync-world: Backport for Clear stuck session cookies on Wikitech (T389433)
21:14 tgr@deploy2002: Finished scap sync-world: Backport for Throttle exemption for Editathon in Ciudad de Buenos Aires - 29 March 2025 (T389400), Enable SUL3 logins for 50% of group 1 users (T384153), AbstractIterator: Make PHP 8.1 compatible (T389515) (duration: 14m 26s)
21:06 tgr@deploy2002: tgr, jforrester, superpes: Continuing with sync
21:04 tgr@deploy2002: tgr, jforrester, superpes: Backport for Throttle exemption for Editathon in Ciudad de Buenos Aires - 29 March 2025 (T389400), Enable SUL3 logins for 50% of group 1 users (T384153), AbstractIterator: Make PHP 8.1 compatible (T389515) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:59 tgr@deploy2002: Started scap sync-world: Backport for Throttle exemption for Editathon in Ciudad de Buenos Aires - 29 March 2025 (T389400), Enable SUL3 logins for 50% of group 1 users (T384153), AbstractIterator: Make PHP 8.1 compatible (T389515)
20:57 tgr@deploy2002: Finished scap sync-world: Backport for Profiler: emit both statsd and dogstatsd (T359385) (duration: 16m 11s)
20:50 tgr@deploy2002: cwhite, tgr: Continuing with sync
20:45 tgr@deploy2002: cwhite, tgr: Backport for Profiler: emit both statsd and dogstatsd (T359385) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:41 tgr@deploy2002: Started scap sync-world: Backport for Profiler: emit both statsd and dogstatsd (T359385)
20:34 tgr@deploy2002: Finished scap sync-world: Backport for cirrus: explicitly route search traffic to eqiad (T388610) (duration: 21m 07s)
20:27 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp3072.esams.wmnet} and A:cp
20:27 tgr@deploy2002: dcausse, tgr: Continuing with sync
20:21 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp3072.esams.wmnet} and A:cp
20:18 tgr@deploy2002: dcausse, tgr: Backport for cirrus: explicitly route search traffic to eqiad (T388610) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:13 tgr@deploy2002: Started scap sync-world: Backport for cirrus: explicitly route search traffic to eqiad (T388610)
19:59 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1122
19:58 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host elastic1122
19:58 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1121
19:56 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host elastic1121
19:56 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1120
19:55 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host elastic1120
19:55 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1119
19:54 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host elastic1119
19:54 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1118
19:52 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host elastic1118
19:52 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1117
19:51 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host elastic1117
19:51 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1116
19:51 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host elastic1117
19:51 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host elastic1117
19:51 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:51 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for elastic - jclark@cumin1002"
19:50 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for elastic - jclark@cumin1002"
19:50 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp3071.esams.wmnet} and A:cp
19:50 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host elastic1116
19:46 jclark@cumin1002: START - Cookbook sre.dns.netbox
19:44 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp3071.esams.wmnet} and A:cp
19:33 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in relforge
19:33 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in relforge
19:27 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: relforge1003* for ban host to test puppet code - bking@cumin2002 - T380752
19:27 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: relforge1003* for ban host to test puppet code - bking@cumin2002 - T380752
19:26 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in relforge
19:26 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in relforge
19:24 dancy@deploy2002: Finished scap sync-world: T388761 (duration: 11m 15s)
19:21 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: relforge1004* for ban host to test reimage - bking@cumin2002 - T380752
19:21 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: relforge1004* for ban host to test reimage - bking@cumin2002 - T380752
19:16 denisse: restarting prometheus@ops.service in prometheus1005
19:13 dancy@deploy2002: Started scap sync-world: T388761
19:11 dancy@deploy2002: Installation of scap version "4.143.0" completed for 193 hosts
19:10 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp3070.esams.wmnet} and A:cp
19:07 dancy@deploy2002: Installing scap version "4.143.0" for 193 host(s)
19:03 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp3070.esams.wmnet} and A:cp
18:57 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in relforge
18:57 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in relforge
18:51 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: relforge1003* for ban host to test reimage - bking@cumin2002 - T380752
18:51 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: relforge1003* for ban host to test reimage - bking@cumin2002 - T380752
18:51 swfrench@deploy2002: Finished scap sync-world: Switch mw-misc to PHP 8.1 - T383845 (duration: 03m 22s)
18:48 swfrench@deploy2002: Started scap sync-world: Switch mw-misc to PHP 8.1 - T383845
18:19 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp3069.esams.wmnet} and A:cp
18:12 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in relforge
18:12 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in relforge
18:10 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp3069.esams.wmnet} and A:cp
18:02 sukhe: sudo cumin -b11 'A:cp-text' 'run-puppet-agent "rolling out CR 1129349"': T350094
18:02 sukhe: sudo cumin -b11 'A:cp-text' 'enable-puppet-agent "rolling out CR 1129349"': T350094
17:57 sukhe: enable puppet and run agent on cp3071 to test CR 1129349
17:53 reedy@deploy2002: Synchronized php-1.44.0-wmf.21/includes/parser/Sanitizer.php: T388733 (duration: 11m 36s)
17:51 sukhe: sudo cumin 'A:cp-text' 'disable-puppet "rolling out CR 1129349"': T350094
17:26 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp3068.esams.wmnet} and A:cp
17:23 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: relforge1003* for ban host prior to reimage - bking@cumin2002 - T380752
17:23 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: relforge1003* for ban host prior to reimage - bking@cumin2002 - T380752
17:00 brouberol@deploy2002: Finished scap build-images: Rebuild mediawiki-cli with recent dumps abspath fix (w/o cache) - T388378 (duration: 00m 56s)
16:59 brouberol@deploy2002: Started scap build-images: Rebuild mediawiki-cli with recent dumps abspath fix (w/o cache) - T388378
16:58 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp3068.esams.wmnet} and A:cp
16:53 brouberol@deploy2002: Finished scap build-images: Rebuild mediawiki-cli with recent dumps abspath fix - T388378 (duration: 00m 24s)
16:52 brouberol@deploy2002: Started scap build-images: Rebuild mediawiki-cli with recent dumps abspath fix - T388378
16:49 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-varnish (exit_code=0) rolling upgrade of Varnish on P{cp3067.esams.wmnet} and A:cp
16:48 fabfur: upgrade haproxykafka to 0.3.6 on A:cp (gradual rollout)
16:46 fabfur: imported haproxykafka 0.3.6 into apt repository (added TimestampType) (T388397)
16:43 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
16:42 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
16:41 brett: Upgrading varnish to 7.1 on cp3067 (T378737)
16:41 brouberol@deploy2002: Finished scap build-images: (no justification provided) (duration: 00m 30s)
16:41 brouberol@deploy2002: Started scap build-images: (no justification provided)
16:41 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-varnish rolling upgrade of Varnish on P{cp3067.esams.wmnet} and A:cp
16:27 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2250.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:21 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2250.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:20 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2250.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:18 elukey: `ALTER TABLE public.wikidata_relation_members ALTER COLUMN id TYPE bigint;` on maps1009's postgres - T389462
16:14 dancy@deploy2002: Installation of scap version "4.142.0" completed for 193 hosts
16:13 urandom: bootstrapping restbase1034-a/cassandra — T389423
16:10 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2250.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
16:09 dancy@deploy2002: Installing scap version "4.142.0" for 193 host(s)
16:09 eevans@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 30 days, 0:00:00 on restbase1043.eqiad.wmnet with reason: Bootstrapping — T389423
16:07 elukey: stop imposm on maps1009 to allow fixing the postgres db - T389462
16:02 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2250
16:02 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2250
16:00 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2250 to codfw - jhancock@cumin2002"
15:58 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2250 to codfw - jhancock@cumin2002"
15:54 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:48 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2248.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:48 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2248.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:47 moritzm: installing node-postcss security updates
15:42 cgoubert@deploy2002: Finished scap sync-world: Build mediawiki-cli image - T389484 (duration: 06m 18s)
15:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti-test2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
15:36 cgoubert@deploy2002: Started scap sync-world: Build mediawiki-cli image - T389484
15:29 jmm@cumin2002: START - Cookbook sre.hosts.provision for host ganeti-test2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
15:20 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudelastic[1007,1009-1012].eqiad.wmnet with reason: troubleshooting red status
15:14 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: try rolling operation without allow-yellow flag - bking@cumin2002 - T389119
15:11 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:11 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:11 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:11 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:06 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:06 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:03 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@c274545] (releasing): (no justification provided) (duration: 00m 51s)
15:02 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@c274545] (releasing): (no justification provided)
15:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
15:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
14:59 jnuche@deploy2002: Finished deploy [releng/jenkins-deploy@c274545] (releasing): (no justification provided) (duration: 00m 33s)
14:58 jnuche@deploy2002: Started deploy [releng/jenkins-deploy@c274545] (releasing): (no justification provided)
14:58 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:57 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
14:57 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: try rolling operation without allow-yellow flag - bking@cumin2002 - T389119
14:52 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:52 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
14:52 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:52 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
14:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
14:51 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
14:50 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:49 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
14:49 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:49 dreamyjazz@deploy2002: Finished scap sync-world: Backport for GlobalContributions: Do not look up permissions for registered target (T389187), GlobalContributionsPagerTest: De-duplicate getting new pager, GlobalContributions: Do not look up permissions for registered target (T389187) (duration: 15m 04s)
14:42 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:41 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
14:38 dreamyjazz@deploy2002: dreamyjazz: Backport for GlobalContributions: Do not look up permissions for registered target (T389187), GlobalContributionsPagerTest: De-duplicate getting new pager, GlobalContributions: Do not look up permissions for registered target (T389187) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:36 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:35 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119
14:34 dreamyjazz@deploy2002: Started scap sync-world: Backport for GlobalContributions: Do not look up permissions for registered target (T389187), GlobalContributionsPagerTest: De-duplicate getting new pager, GlobalContributions: Do not look up permissions for registered target (T389187)
14:12 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:12 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:12 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119
14:11 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: upgrade search plugins - bking@cumin2002 - T389119
14:08 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2249.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
14:08 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (3 nodes at a time) for ElasticSearch cluster search_codfw: upgrade search plugins - bking@cumin2002 - T389119
13:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2249.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
13:57 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2249.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
13:57 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2249.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
13:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker2249.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
13:56 sgimeno@deploy2002: Finished scap sync-world: Backport for Enable SUL3 login for 10% of group 1 users (T384153) (duration: 11m 18s)
13:53 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2249.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
13:48 sgimeno@deploy2002: tgr, sgimeno: Continuing with sync
13:47 sgimeno@deploy2002: tgr, sgimeno: Backport for Enable SUL3 login for 10% of group 1 users (T384153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:44 sgimeno@deploy2002: Started scap sync-world: Backport for Enable SUL3 login for 10% of group 1 users (T384153)
13:39 sgimeno@deploy2002: Finished scap sync-world: Backport for analytics(GrowthExperimentsInteractionLogger): add edit_count to the event data (T388622), feat(SurfacingStructuredTasks): increase max edit cap to 100 (T388622) (duration: 11m 00s)
13:32 sgimeno@deploy2002: sgimeno: Continuing with sync
13:31 sgimeno@deploy2002: sgimeno: Backport for analytics(GrowthExperimentsInteractionLogger): add edit_count to the event data (T388622), feat(SurfacingStructuredTasks): increase max edit cap to 100 (T388622) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:30 moritzm: remove ganeti-test2001 for reimage T382515
13:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
13:28 sgimeno@deploy2002: Started scap sync-world: Backport for analytics(GrowthExperimentsInteractionLogger): add edit_count to the event data (T388622), feat(SurfacingStructuredTasks): increase max edit cap to 100 (T388622)
13:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
13:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti-test2001.codfw.wmnet
13:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
13:03 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of testvm2002.codfw.wmnet to drbd
13:02 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.21 refs T386216
12:56 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of testvm2002.codfw.wmnet to drbd
12:55 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti-test2001.codfw.wmnet
12:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti-test2001.codfw.wmnet
12:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74281 and previous config saved to /var/cache/conftool/dbconfig/20250320-123939-root.json
12:36 moritzm: installing openjdk 17 security updates on puppet servers (the necessary restarts may cause a few interrupted puppet runs and will be splayed out)
12:28 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.21 refs T386216
12:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74280 and previous config saved to /var/cache/conftool/dbconfig/20250320-122433-root.json
12:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host db1300.eqiad.wmnet
12:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1300.eqiad.wmnet with OS bookworm
12:17 jmm@dns1004: END - running authdns-update
12:15 jmm@dns1004: START - running authdns-update
12:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetserver2004.codfw.wmnet with OS bookworm
12:10 tgr@deploy2002: Finished scap sync-world: Backport for Use MediaWikiServices for early config changes (T288819 T389430) (duration: 29m 34s)
12:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74279 and previous config saved to /var/cache/conftool/dbconfig/20250320-120928-root.json
12:05 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1300.eqiad.wmnet with reason: host reimage
12:03 tgr@deploy2002: tgr: Continuing with sync
12:02 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1300.eqiad.wmnet with reason: host reimage
11:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver2004.codfw.wmnet with reason: host reimage
11:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74278 and previous config saved to /var/cache/conftool/dbconfig/20250320-115423-root.json
11:53 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver2004.codfw.wmnet with reason: host reimage
11:51 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host db1300.eqiad.wmnet with OS bookworm
11:48 tgr@deploy2002: tgr: Backport for Use MediaWikiServices for early config changes (T288819 T389430) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:42 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host puppetserver2004.codfw.wmnet with OS bookworm
11:41 tgr@deploy2002: Started scap sync-world: Backport for Use MediaWikiServices for early config changes (T288819 T389430)
11:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74277 and previous config saved to /var/cache/conftool/dbconfig/20250320-113918-root.json
11:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db1300.eqiad.wmnet - jmm@cumin2002"
11:38 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM db1300.eqiad.wmnet - jmm@cumin2002"
11:37 moritzm: installing debootstrap bugfix updates
11:37 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) db1300.eqiad.wmnet on all recursors
11:37 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache db1300.eqiad.wmnet on all recursors
11:37 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db1300.eqiad.wmnet - jmm@cumin2002"
11:37 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM db1300.eqiad.wmnet - jmm@cumin2002"
11:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
11:31 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host db1300.eqiad.wmnet
11:17 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetserver2004.codfw.wmnet with OS bookworm
11:00 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting P{lvs[7001-7002].magru.wmnet} and A:liberica
10:59 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting P{lvs[7001-7002].magru.wmnet} and A:liberica
10:59 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting P{lvs7003.magru.wmnet,lvs1013.eqiad.wmnet} and A:liberica
10:58 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting P{lvs7003.magru.wmnet,lvs1013.eqiad.wmnet} and A:liberica
10:56 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'clear' for AS: 52999
10:55 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'clear' for AS: 52999
10:44 moritzm: installing Java security updates on idp hosts
10:43 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host puppetserver2004.codfw.wmnet with OS bookworm
10:42 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetserver2004.codfw.wmnet with OS bookworm
10:38 elukey: restart imposm.service on maps1009 - T389462
10:31 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host puppetserver2004.codfw.wmnet with OS bookworm
10:21 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve2002.codfw.wmnet with OS bookworm
10:09 moritzm: installing gunicorn security updates
09:57 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ganeti5004.eqsin.wmnet
09:50 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve2002.codfw.wmnet with OS bookworm
09:50 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-serve2002.codfw.wmnet with OS bookworm
09:46 ladsgroup@deploy2002: Finished scap sync-world: Backport for Bump thumbnail steps to 30% (T360589) (duration: 13m 55s)
09:44 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
09:38 ladsgroup@deploy2002: ladsgroup: Continuing with sync
09:35 ladsgroup@deploy2002: ladsgroup: Backport for Bump thumbnail steps to 30% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5004.eqsin.wmnet
09:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5004.eqsin.wmnet
09:32 ladsgroup@deploy2002: Started scap sync-world: Backport for Bump thumbnail steps to 30% (T360589)
09:28 tgr@deploy2002: Finished scap sync-world: Backport for Enable SUL3 logins for 1% of group 1 users (T384153) (duration: 25m 47s)
09:21 tgr@deploy2002: tgr: Continuing with sync
09:19 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetserver2004.codfw.wmnet with OS bookworm
09:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15:00:00 on db2165.codfw.wmnet with reason: Maintenance
09:12 tgr@deploy2002: tgr: Backport for Enable SUL3 logins for 1% of group 1 users (T384153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:07 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
09:03 tgr@deploy2002: Started scap sync-world: Backport for Enable SUL3 logins for 1% of group 1 users (T384153)
09:02 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host puppetserver2004.codfw.wmnet with OS bookworm
09:00 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
08:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ml-serve2002
08:58 elukey@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve2002
08:58 elukey@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve2002
08:58 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ml-serve2002.codfw.wmnet 43.16.192.10.in-addr.arpa 3.4.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
08:58 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache ml-serve2002.codfw.wmnet 43.16.192.10.in-addr.arpa 3.4.0.0.6.1.0.0.2.9.1.0.0.1.0.0.2.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
08:58 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:58 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-serve2002 - elukey@cumin1002"
08:58 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-serve2002 - elukey@cumin1002"
08:55 Emperor: restart swift-proxy on ms-fe1010 T360913
08:53 elukey@cumin1002: START - Cookbook sre.dns.netbox
08:53 elukey@cumin1002: START - Cookbook sre.hosts.move-vlan for host ml-serve2002
08:52 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve2002.codfw.wmnet with OS bookworm
08:50 tgr@deploy2002: Finished scap sync-world: Backport for Add logging to help figure unserialization issues (T388725), Add logging to help figure unserialization issues (T388725) (duration: 16m 05s)
08:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
08:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
08:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
08:43 tgr@deploy2002: matmarex, tgr: Continuing with sync
08:43 XioNoX: deploy pfw policy - T389456
08:41 tgr@deploy2002: matmarex, tgr: Backport for Add logging to help figure unserialization issues (T388725), Add logging to help figure unserialization issues (T388725) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:40 XioNoX: merge/deploy network/data.yaml: add sandbox1-b3-magru
08:36 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
08:34 tgr@deploy2002: Started scap sync-world: Backport for Add logging to help figure unserialization issues (T388725), Add logging to help figure unserialization issues (T388725)
08:32 Emperor: restart swift-proxy on ms-fe2010 T360913
08:20 moritzm: installing python-cryptography security updates
08:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
08:08 taavi@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
08:07 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
08:05 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
08:03 taavi@cumin1002: START - Cookbook sre.wikireplicas.update-views
07:59 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
07:47 elukey: remove kartotherian from maps* bare metal nodes
07:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74275 and previous config saved to /var/cache/conftool/dbconfig/20250320-074459-root.json
07:41 btullis@deploy2002: Finished deploy [dumps/dumps@2fe1059]: Fixing index out of range error (duration: 00m 26s)
07:41 btullis@deploy2002: Started deploy [dumps/dumps@2fe1059]: Fixing index out of range error
07:35 btullis@deploy2002: Finished deploy [dumps/dumps@2fe1059]: Fixing index out of range error (duration: 00m 09s)
07:35 btullis@deploy2002: Started deploy [dumps/dumps@2fe1059]: Fixing index out of range error
07:35 btullis@deploy2002: Finished deploy [dumps/dumps@2fe1059]: Fixing index out of range error (duration: 00m 09s)
07:35 btullis@deploy2002: Started deploy [dumps/dumps@2fe1059]: Fixing index out of range error
07:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5005.eqsin.wmnet
07:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5005.eqsin.wmnet
07:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74274 and previous config saved to /var/cache/conftool/dbconfig/20250320-072953-root.json
07:24 moritzm: rebalance ganeti eqiad/C following reimages T382507
07:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74273 and previous config saved to /var/cache/conftool/dbconfig/20250320-071448-root.json
06:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74272 and previous config saved to /var/cache/conftool/dbconfig/20250320-065942-root.json
06:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 15:00:00 on db2165.codfw.wmnet with reason: Maintenance
06:50 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2165.codfw.wmnet
06:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74271 and previous config saved to /var/cache/conftool/dbconfig/20250320-064437-root.json
06:43 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2165.codfw.wmnet
06:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2165 T389367', diff saved to https://phabricator.wikimedia.org/P74270 and previous config saved to /var/cache/conftool/dbconfig/20250320-064131-marostegui.json
06:40 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2161 to s8 primary T389367', diff saved to https://phabricator.wikimedia.org/P74269 and previous config saved to /var/cache/conftool/dbconfig/20250320-064012-marostegui.json
06:39 marostegui: Starting s8 codfw failover from db2165 to db2161 - T389367
06:37 marostegui@deploy2002: Finished scap sync-world: Backport for Revert "db-production.php: Disable writes on es7" (duration: 12m 48s)
06:35 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2161 with weight 0 T389367', diff saved to https://phabricator.wikimedia.org/P74268 and previous config saved to /var/cache/conftool/dbconfig/20250320-063509-marostegui.json
06:34 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s8 T389367
06:31 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section s1
06:31 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section s1
06:30 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section s2
06:30 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section s2
06:30 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section s3
06:30 marostegui@deploy2002: marostegui: Continuing with sync
06:29 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section s3
06:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P74267 and previous config saved to /var/cache/conftool/dbconfig/20250320-062931-root.json
06:29 marostegui@deploy2002: marostegui: Backport for Revert "db-production.php: Disable writes on es7" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
06:29 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section s4
06:28 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section s4
06:28 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section s5
06:26 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section s5
06:26 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section s6
06:26 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section s6
06:25 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section s7
06:25 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section s7
06:24 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section s8
06:24 marostegui@deploy2002: Started scap sync-world: Backport for Revert "db-production.php: Disable writes on es7"
06:24 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section s8
06:23 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section x1
06:23 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section x1
06:23 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section es6
06:23 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section es6
06:21 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.finalize (exit_code=0) for the switch from codfw to eqiad for section es7
06:21 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.finalize for the switch from codfw to eqiad for section es7
06:20 marostegui@deploy2002: Finished scap sync-world: Backport for db-production.php: Disable writes on es7 (T388627) (duration: 11m 07s)
06:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P74266 and previous config saved to /var/cache/conftool/dbconfig/20250320-061426-root.json
06:13 marostegui@deploy2002: marostegui: Continuing with sync
06:13 marostegui@deploy2002: marostegui: Backport for db-production.php: Disable writes on es7 (T388627) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
06:09 marostegui@deploy2002: Started scap sync-world: Backport for db-production.php: Disable writes on es7 (T388627)
02:09 ejegg: disabled dlocal at payments-wiki

2025-03-19

22:14 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.21 refs T386216
21:26 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2249
21:21 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2249
21:19 eevans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2249 to codfw - jhancock@cumin2002"
21:18 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2249 to codfw - jhancock@cumin2002"
21:17 eevans@cumin1002: START - Cookbook sre.dns.netbox
21:11 jhancock@cumin2002: START - Cookbook sre.dns.netbox
21:11 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
21:10 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
21:10 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
21:09 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
21:09 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
21:08 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
21:07 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
21:06 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
21:06 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
21:05 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
21:04 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
21:04 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
21:00 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1112.eqiad.wmnet with OS bullseye
20:59 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:eqiad and A:cp for 9.2.9-1wm1
20:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1112.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
20:56 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119
20:55 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119
20:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1113.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
20:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host elastic1111.eqiad.wmnet with OS bullseye
20:48 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1114.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
20:47 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1112.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
20:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1115.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
20:44 tgr_: UTC late deploys done
20:42 tgr@deploy2002: Finished scap sync-world: Backport for LabsServices: use appservers service name for parsoid (T389252), wikitech: Remove $wgCookieDomain override (T389318), Revert^2 "Allowlist Special:WikimediaDebug on the shared domain", Revert^2 "Allowlist Special:WikimediaDebug on the shared domain", [[gerrit:1129353|Revert^2 "Fix SUL3 logi
20:38 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1113.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
20:37 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1113.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
20:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1113.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
20:36 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1114.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
20:36 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1113.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
20:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1113.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
20:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2248.codfw.wmnet with OS bookworm
20:35 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:35 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1115.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
20:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
20:34 tgr@deploy2002: tgr, bd808: Continuing with sync
20:23 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:22 tgr@deploy2002: tgr, bd808: Backport for LabsServices: use appservers service name for parsoid (T389252), wikitech: Remove $wgCookieDomain override (T389318), Revert^2 "Allowlist Special:WikimediaDebug on the shared domain", Revert^2 "Allowlist Special:WikimediaDebug on the shared domain", [[gerrit:1129353|Revert^2 "Fix SUL3 login cohort logic
20:20 jclark@cumin1002: START - Cookbook sre.dns.netbox
20:19 jclark@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
20:18 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2248.codfw.wmnet with reason: host reimage
20:17 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119
20:17 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119
20:16 tgr@deploy2002: Started scap sync-world: Backport for LabsServices: use appservers service name for parsoid (T389252), wikitech: Remove $wgCookieDomain override (T389318), Revert^2 "Allowlist Special:WikimediaDebug on the shared domain", Revert^2 "Allowlist Special:WikimediaDebug on the shared domain", [[gerrit:1129353|Revert^2 "Fix SUL3 login
20:15 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2248.codfw.wmnet with reason: host reimage
19:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host wikikube-worker2248.codfw.wmnet with OS bookworm
19:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2248.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:49 brett: upgrading varnishkafka to 1.1.0-5 on A:cp-ulsfo and cp30[66,74] (T389322)
19:46 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-worker2248.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:45 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2248
19:45 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2248
19:45 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:45 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2248 to codfw - jhancock@cumin2002"
19:44 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding wikikube-worker2248 to codfw - jhancock@cumin2002"
19:42 ebernhardson@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:42 ebernhardson@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
19:39 jhancock@cumin2002: START - Cookbook sre.dns.netbox
19:27 jnuche@deploy2002: Finished scap sync-world: Backport for Edit check: return early in debounced methods if surface is gone (T389394) (duration: 12m 34s)
19:23 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119
19:22 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119
19:19 jnuche@deploy2002: jnuche: Continuing with sync
19:19 jnuche@deploy2002: jnuche: Backport for Edit check: return early in debounced methods if surface is gone (T389394) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:14 jnuche@deploy2002: Started scap sync-world: Backport for Edit check: return early in debounced methods if surface is gone (T389394)
19:05 cstone: payments-wiki upgraded from 3eb57d3f to 8bcc8ff2
19:02 dzahn@dns1004: START - running authdns-update
19:02 cstone: donorwiki upgraded from 0f6d18f0 to 8bcc8ff2
19:02 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:eqiad and A:cp for 9.2.9-1wm1
19:02 bblack@dns1005: START - running authdns-update
18:59 dzahn@dns1004: END - running authdns-update
18:57 dzahn@dns1004: START - running authdns-update
18:40 dzahn@dns1004: END - running authdns-update
18:38 dzahn@dns1004: START - running authdns-update
18:33 bking@cumin2002: conftool action : set/pooled=no; selector: name=cloudelastic1008.eqiad.wmnet
18:30 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in cloudelastic
18:30 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in cloudelastic
18:29 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119
18:28 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119
18:27 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:codfw and A:cp for 9.2.9-1wm1
17:53 brett: Import varnishkafka 1.1.0-5 into wikimedia-bullseye component/varnish-staging (T389322)
17:53 brett: Import varnishkafka 1.1.0-5 into wikimedia-bullseye component/varnish-staging
17:52 sukhe: sudo cumin 'A:lvs-codfw' 'run-puppet-agent --enable "rolling out CR 1128937"'
17:49 sukhe: run agent on lvs2013 and restart pybal [CR 1128937]
17:46 herron@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker2004.codfw.wmnet
17:45 herron@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker2002.codfw.wmnet
17:43 sukhe: run agent on lvs2014 and restart pybal [CR 1128937]
17:43 sukhe: run agent on lvs2014 and restart pybal
17:33 sukhe: disable puppet on A:lvs-codfw to roll out CR 1128937
17:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2089.codfw.wmnet with OS bullseye
17:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
17:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
17:04 elukey: remove spurious kartotherian err files under config-master2001:/var/run/confd-template
16:55 jnuche@deploy2002: Finished scap sync-world: Backport for Fix error evaluating function `unit` (T389384) (duration: 12m 52s)
16:54 sukhe: restart pybal on lvs-low-traffic and secondary in eqiad/codfw
16:47 jnuche@deploy2002: hokwelum, jnuche: Continuing with sync
16:46 jnuche@deploy2002: hokwelum, jnuche: Backport for Fix error evaluating function `unit` (T389384) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:46 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2089.codfw.wmnet with reason: host reimage
16:43 elukey: restart pybal on lvs10[19,20] and run ipvsadm --delete-service --tcp-service 10.2.2.13:{443,6533}
16:43 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2089.codfw.wmnet with reason: host reimage
16:42 jnuche@deploy2002: Started scap sync-world: Backport for Fix error evaluating function `unit` (T389384)
16:40 elukey: restart pybal on lvs201[3,4] and run ipvsadm --delete-service --tcp-service 10.2.1.13:{443,6533}
16:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-serve2001.codfw.wmnet with OS bookworm
16:31 elukey: restart pybal on low-traffic eqiad/codfw to remove two old/unused kartotherian ports
16:27 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:codfw and A:cp for 9.2.9-1wm1
16:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2089
16:24 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2089
16:18 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2089
16:18 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2089.codfw.wmnet 15.48.192.10.in-addr.arpa 5.1.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:18 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2089.codfw.wmnet 15.48.192.10.in-addr.arpa 5.1.0.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:18 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:18 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2089 - mvernon@cumin2002"
16:18 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2089 - mvernon@cumin2002"
16:16 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-serve2001.codfw.wmnet with reason: host reimage
16:14 mvernon@cumin2002: START - Cookbook sre.dns.netbox
16:13 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2089
16:13 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-serve2001.codfw.wmnet with reason: host reimage
16:12 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2089.codfw.wmnet with OS bullseye
15:53 hnowlan@deploy2002: Finished scap sync-world: Backport for debug: fix config syntax (T385155) (duration: 17m 11s)
15:52 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2075.codfw.wmnet with OS bullseye
15:46 hnowlan@deploy2002: hnowlan: Continuing with sync
15:41 hnowlan@deploy2002: hnowlan: Backport for debug: fix config syntax (T385155) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:41 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ml-serve2001
15:41 elukey@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-serve2001
15:40 elukey@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ml-serve2001
15:40 elukey@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ml-serve2001.codfw.wmnet 21.0.192.10.in-addr.arpa 1.2.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:40 elukey@cumin1002: START - Cookbook sre.dns.wipe-cache ml-serve2001.codfw.wmnet 21.0.192.10.in-addr.arpa 1.2.0.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:40 elukey@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:40 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-serve2001 - elukey@cumin1002"
15:40 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-serve2001 - elukey@cumin1002"
15:36 elukey@cumin1002: START - Cookbook sre.dns.netbox
15:36 hnowlan@deploy2002: Started scap sync-world: Backport for debug: fix config syntax (T385155)
15:35 elukey@cumin1002: START - Cookbook sre.hosts.move-vlan for host ml-serve2001
15:35 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-serve2001.codfw.wmnet with OS bookworm
15:34 hnowlan@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: Switchover followup (duration: 02m 47s)
15:33 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2075.codfw.wmnet with reason: host reimage
15:31 hnowlan@deploy2002: Locking from deployment [ALL REPOSITORIES]: Switchover followup
15:29 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2075.codfw.wmnet with reason: host reimage
15:25 hnowlan@deploy2002: Sync cancelled.
15:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
15:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
15:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
15:24 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
15:23 hnowlan@deploy2002: hnowlan: Backport for debug: reorder debug backends for eqiad switchover (T385155) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:18 hnowlan@deploy2002: Started scap sync-world: Backport for debug: reorder debug backends for eqiad switchover (T385155)
15:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ms-be2075
15:11 mvernon@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2075
15:07 mvernon@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2075
15:07 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ms-be2075.codfw.wmnet 147.0.192.10.in-addr.arpa 7.4.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:07 mvernon@cumin2002: START - Cookbook sre.dns.wipe-cache ms-be2075.codfw.wmnet 147.0.192.10.in-addr.arpa 7.4.1.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
15:06 mvernon@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:06 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2075 - mvernon@cumin2002"
15:06 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ms-be2075 - mvernon@cumin2002"
14:59 XioNoX: shutdown sessions to SGIX RS - T386987
14:57 mvernon@cumin2002: START - Cookbook sre.dns.netbox
14:56 mvernon@cumin2002: START - Cookbook sre.hosts.move-vlan for host ms-be2075
14:56 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2075.codfw.wmnet with OS bullseye
14:55 hnowlan@dns1004: END - running authdns-update
14:51 hnowlan@dns1004: START - running authdns-update
14:47 hnowlan@dns1004: END - running authdns-update
14:44 hnowlan@dns1004: START - running authdns-update
14:41 hnowlan@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: Datacenter Switchover - T385155 (duration: 52m 40s)
14:40 hnowlan@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters (exit_code=0) for datacenter switchover from codfw to eqiad
14:29 hnowlan@cumin2002: START - Cookbook sre.switchdc.mediawiki.09-run-puppet-on-db-masters for datacenter switchover from codfw to eqiad
14:28 hnowlan@dns1004: END - running authdns-update
14:26 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2089.codfw.wmnet with OS bullseye
14:25 hnowlan@dns1004: START - running authdns-update
14:23 hnowlan@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.09-restore-ttl (exit_code=0) for datacenter switchover from codfw to eqiad
14:22 hnowlan@cumin2002: START - Cookbook sre.switchdc.mediawiki.09-restore-ttl for datacenter switchover from codfw to eqiad
14:21 hnowlan@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0) for datacenter switchover from codfw to eqiad
14:21 root@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
14:21 root@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
14:21 root@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
14:21 root@deploy2002: helmfile [codfw] START helmfile.d/services/mw-cron: apply
14:19 hnowlan@cumin2002: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance for datacenter switchover from codfw to eqiad
14:18 hnowlan@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.08-restart-mw-jobrunner (exit_code=0) for datacenter switchover from codfw to eqiad
14:18 root@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: sync
14:18 root@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: sync
14:18 hnowlan@cumin2002: START - Cookbook sre.switchdc.mediawiki.08-restart-mw-jobrunner for datacenter switchover from codfw to eqiad
14:17 hnowlan@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.07-set-readwrite (exit_code=0) for datacenter switchover from codfw to eqiad
14:17 hnowlan@cumin2002: MediaWiki read-only period ends at: 2025-03-19 14:17:55.451583
14:15 hnowlan@cumin2002: MediaWiki read-only period starts at: 2025-03-19 14:15:30.955779
14:15 hnowlan@cumin2002: START - Cookbook sre.switchdc.mediawiki.02-set-readonly for datacenter switchover from codfw to eqiad
14:14 hnowlan@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0) for datacenter switchover from codfw to eqiad
14:14 hnowlan@cumin2002: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance for datacenter switchover from codfw to eqiad
14:13 hnowlan@cumin2002: END (FAIL) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=99) for datacenter switchover from codfw to eqiad
14:12 hnowlan@cumin2002: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance for datacenter switchover from codfw to eqiad
14:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
14:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
14:11 hnowlan@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.00-reduce-ttl (exit_code=0) for datacenter switchover from codfw to eqiad
14:07 jclark@cumin1002: START - Cookbook sre.dns.netbox
14:06 hnowlan@cumin2002: START - Cookbook sre.switchdc.mediawiki.00-reduce-ttl for datacenter switchover from codfw to eqiad
14:03 hnowlan@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks (exit_code=0) for datacenter switchover from codfw to eqiad
14:03 hnowlan@cumin2002: START - Cookbook sre.switchdc.mediawiki.00-downtime-db-readonly-checks for datacenter switchover from codfw to eqiad
14:03 hnowlan@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.00-disable-puppet (exit_code=0) for datacenter switchover from codfw to eqiad
14:03 hnowlan@cumin2002: START - Cookbook sre.switchdc.mediawiki.00-disable-puppet for datacenter switchover from codfw to eqiad
13:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
13:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
13:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
13:58 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
13:55 Lucas_WMDE: lucaswerkmeister-wmde@deploy2002 /srv/mediawiki-staging (master $ u=) $ mwscript-k8s --follow --comment=T388158 -- cleanupTitles kaawiki
13:48 hnowlan@deploy2002: Locking from deployment [ALL REPOSITORIES]: Datacenter Switchover - T385155
13:45 Lucas_WMDE: UTC afternoon backport+config window done
13:41 Lucas_WMDE: lucaswerkmeister-wmde@deploy2002 /srv/mediawiki-staging (master $ u=) $ mwscript-k8s --comment=T388158 --follow -- namespaceDupes kaawiki --fix | tee ~/T388158
13:40 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Revert^4 "Add Portal namespace to kaawiki" (duration: 14m 01s)
13:33 lucaswerkmeister-wmde@deploy2002: jhsoby, lucaswerkmeister-wmde: Continuing with sync
13:33 lucaswerkmeister-wmde@deploy2002: jhsoby, lucaswerkmeister-wmde: Backport for Revert^4 "Add Portal namespace to kaawiki" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:26 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Revert^4 "Add Portal namespace to kaawiki"
13:18 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.21 refs T386216
13:12 moritzm: installing sqlite3 security updates
12:47 jnuche@deploy2002: Finished scap sync-world: Backport for JobExecutor: Activate wrapping span (T389331) (duration: 15m 06s)
12:44 XioNoX: trunk the sandbox vlan to ganeti500X - T385560
12:40 jnuche@deploy2002: kharlan, jnuche: Continuing with sync
12:39 jnuche@deploy2002: kharlan, jnuche: Backport for JobExecutor: Activate wrapping span (T389331) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:32 jnuche@deploy2002: Started scap sync-world: Backport for JobExecutor: Activate wrapping span (T389331)
12:26 ladsgroup@deploy2002: Finished scap sync-world: Backport for Bump thumbnail steps ratio to 25% (T360589) (duration: 16m 49s)
12:23 moritzm: installing openjdk 17 security updates on puppet servers (the necessary restarts may cause a few interrupted puppet runs and will be splayed out)
12:19 ladsgroup@deploy2002: ladsgroup: Continuing with sync
12:14 ladsgroup@deploy2002: ladsgroup: Backport for Bump thumbnail steps ratio to 25% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:10 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
12:10 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
12:09 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
12:09 ladsgroup@deploy2002: Started scap sync-world: Backport for Bump thumbnail steps ratio to 25% (T360589)
12:09 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
12:09 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
12:09 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
12:08 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
12:08 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
12:08 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
12:08 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
12:08 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
12:08 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7003.magru.wmnet
12:07 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
12:07 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
12:07 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
12:01 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
12:00 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
12:00 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti7003.magru.wmnet
11:54 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
11:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7003.magru.wmnet
11:52 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7003.magru.wmnet
11:52 moritzm: installing gtk+3.0 bugfix updates from Bookworm point release
11:51 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
11:50 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host atlas7001.wikimedia.org
11:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas7001.wikimedia.org - ayounsi@cumin1002"
11:48 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM atlas7001.wikimedia.org - ayounsi@cumin1002"
11:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) atlas7001.wikimedia.org on all recursors
11:48 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
11:48 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache atlas7001.wikimedia.org on all recursors
11:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas7001.wikimedia.org - ayounsi@cumin1002"
11:48 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM atlas7001.wikimedia.org - ayounsi@cumin1002"
11:47 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
11:40 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
11:40 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host atlas7001.wikimedia.org
11:37 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
11:37 moritzm: switch ganeti master for magru01 to ganeti7001
11:36 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
11:35 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
11:34 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
11:33 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet
11:07 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti7001.magru.wmnet
11:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti7001.magru.wmnet
10:59 ayounsi@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti7001.magru.wmnet
10:55 vgutierrez: Upgrading cp4050 to Varnish 7 (T378737)
10:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti7001.magru.wmnet
10:48 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti7001.magru.wmnet
10:38 ayounsi@dns1004: END - running authdns-update
10:35 ayounsi@dns1004: START - running authdns-update
10:18 XioNoX: trunk sandbox vlan to ganeti7001/3 - T385560
10:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pickup new magru sandbox includes files - ayounsi@cumin1002"
10:13 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: pickup new magru sandbox includes files - ayounsi@cumin1002"
10:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host elastic1111.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
10:11 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging-ctrl2002.codfw.wmnet with OS bookworm
10:11 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
10:10 vgutierrez: Upgrading cp4049 to Varnish 7 (T378737)
10:09 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
10:09 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
10:06 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
10:04 ayounsi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
10:04 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
10:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host elastic1111.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
10:00 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
09:57 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1111.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
09:55 elukey@cumin1002: START - Cookbook sre.hosts.provision for host elastic1111.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
09:52 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-ctrl2002.codfw.wmnet with reason: host reimage
09:48 XioNoX: add sandbox vlan on asw1-b3-magru - T385560
09:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-ctrl2002.codfw.wmnet with reason: host reimage
09:40 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.21 refs T386216
09:32 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-staging-ctrl2002.codfw.wmnet with OS bookworm
09:31 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
09:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging-ctrl2001.codfw.wmnet with OS bookworm
09:30 vgutierrez: Upgrading cp4048 to Varnish 7 (T378737)
09:30 marostegui: Stop MariaDB on db1248 T388837
09:19 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.21 refs T386216
09:13 dcausse@deploy2002: Finished deploy [airflow-dags/search@e55954c]: publish search artifacts (duration: 00m 38s)
09:13 dcausse@deploy2002: Started deploy [airflow-dags/search@e55954c]: publish search artifacts
09:12 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging-ctrl2001.codfw.wmnet with reason: host reimage
09:08 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging-ctrl2001.codfw.wmnet with reason: host reimage
09:06 moritzm: installing pdns-recursor security updates on DoH hosts
09:00 elukey: remove kartotherian.discovery.wmnet:{80,443} ports from LVS config (some extra noise may be registered)
08:51 logmsgbot: kharlan Deployed security patch for T389235
08:49 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host ml-staging-ctrl2001.codfw.wmnet with OS bookworm
08:37 logmsgbot: kharlan Deployed security patch for T389235
08:20 vgutierrez: Upgrading cp4046 to Varnish 7 (T378737)
07:54 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jennifer Ebe out of all services on: 1294 hosts
07:53 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jennifer Ebe out of all services on: 949 hosts
07:51 moritzm: rebalance ganeti eqiad/B following reimages T382507
06:52 vgutierrez: Upgrading cp4045 to Varnish 7 (T378737)
06:37 vgutierrez: Upgrading cp4042 to Varnish 7 (T378737)
05:16 kart_: Updated cxserver to 2025-03-14-045617-production (T382294)
05:15 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
05:14 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
05:13 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
05:13 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
05:09 eileen: civicrm upgraded from 8dadc9ac to 7b532ad7
05:04 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
05:04 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
01:12 eileen: config revision changed from 19e7ed53 to d9d9d0a5
01:08 eileen: civicrm upgraded from 5783340a to 8dadc9ac
00:58 eileen: civicrm upgraded from 94636e03 to 5783340a
00:48 eileen: config revision changed from de33fff3 to 19e7ed53
00:17 sukhe: restart logrotate on cp3080

2025-03-18

23:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2089.codfw.wmnet with reason: host reimage
23:18 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2089.codfw.wmnet with reason: host reimage
23:17 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1112.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
23:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1112.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
23:14 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1112.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
23:14 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1111.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
23:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1111.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
23:13 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host elastic1111.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
23:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1112.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
23:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host elastic1111.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
23:04 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4041.ulsfo.wmnet
23:01 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2089.codfw.wmnet with OS bullseye
22:59 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4041.ulsfo.wmnet
22:59 brett: Upgrading cp4041 to Varnish 7 (T378737)
22:58 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1111
22:57 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host elastic1111
22:57 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1112
22:55 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host elastic1112
22:54 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:52 jclark@cumin1002: START - Cookbook sre.dns.netbox
22:50 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:50 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for elastic - jclark@cumin1002"
22:50 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for elastic - jclark@cumin1002"
22:45 jclark@cumin1002: START - Cookbook sre.dns.netbox
22:38 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4040.ulsfo.wmnet
22:31 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4040.ulsfo.wmnet
22:30 brett: Upgrading cp4040 to Varnish 7 (T378737)
22:01 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4039.ulsfo.wmnet
21:56 dancy@deploy2002: Installation of scap version "4.141.2" completed for 2 hosts
21:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:54 dancy@deploy2002: Installing scap version "4.141.2" for 2 host(s)
21:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:48 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:48 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4039.ulsfo.wmnet
21:48 brett: Upgrading cp4039 to Varnish 7 (T378737)
21:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:34 toyofuku: web deploy window done
21:28 toyofuku@deploy2002: Finished scap sync-world: Backport for Disable donation LINK on Catalan Wikipedia (T387768) (duration: 12m 35s)
21:21 toyofuku@deploy2002: toyofuku, jdlrobson: Continuing with sync
21:21 toyofuku@deploy2002: toyofuku, jdlrobson: Backport for Disable donation LINK on Catalan Wikipedia (T387768) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:16 toyofuku@deploy2002: Started scap sync-world: Backport for Disable donation LINK on Catalan Wikipedia (T387768)
21:10 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet
21:02 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:drmrs and A:cp for 9.2.9-1wm1
20:55 tgr_: late UTC deploys done
20:52 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4038.ulsfo.wmnet
20:52 brett: Upgrading cp4038 to Varnish 7 (T378737)
20:52 tgr@deploy2002: Finished scap sync-world: Backport for Enable SUL3 logins on group 0 (T384153) (duration: 24m 51s)
20:45 tgr@deploy2002: tgr: Continuing with sync
20:34 tgr@deploy2002: tgr: Backport for Enable SUL3 logins on group 0 (T384153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:27 tgr@deploy2002: Started scap sync-world: Backport for Enable SUL3 logins on group 0 (T384153)
20:25 tgr@deploy2002: Finished scap sync-world: Backport for Edit check: set up the multi-check a/b test (T384372), Enable VisualEditor EditCheck multi-check a/b test on test2wiki (T384372), Growth: enable new way of refreshing LinkRecommendations for pilots (T386250) (duration: 16m 36s)
20:18 tgr@deploy2002: migr, kemayo, tgr: Continuing with sync
20:16 tgr@deploy2002: migr, kemayo, tgr: Backport for Edit check: set up the multi-check a/b test (T384372), Enable VisualEditor EditCheck multi-check a/b test on test2wiki (T384372), Growth: enable new way of refreshing LinkRecommendations for pilots (T386250) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:15 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
20:09 tgr@deploy2002: Started scap sync-world: Backport for Edit check: set up the multi-check a/b test (T384372), Enable VisualEditor EditCheck multi-check a/b test on test2wiki (T384372), Growth: enable new way of refreshing LinkRecommendations for pilots (T386250)
20:06 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
19:58 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
19:54 brett: Upgrading remaining ulsfo cache nodes to Varnish 7 (T378737)
19:53 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:53 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:51 dr0ptp4kt: Deployed refinery using scap, then deployed onto hdfs (concludes Deploying Refinery at 37a2ddf for c1126977 / T388654 tlwikisource to allowlist)
19:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:44 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:44 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:42 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
19:22 root@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/jaeger: apply
19:21 root@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/jaeger: apply
19:08 dr0ptp4kt@deploy2002: Finished deploy [analytics/refinery@37a2ddf] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@37a2ddfc] (duration: 00m 38s)
19:08 dr0ptp4kt@deploy2002: Started deploy [analytics/refinery@37a2ddf] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@37a2ddfc]
19:07 dr0ptp4kt@deploy2002: Finished deploy [analytics/refinery@37a2ddf] (thin): Regular analytics weekly train THIN [analytics/refinery@37a2ddfc] (duration: 00m 50s)
19:07 dr0ptp4kt@deploy2002: Started deploy [analytics/refinery@37a2ddf] (thin): Regular analytics weekly train THIN [analytics/refinery@37a2ddfc]
19:06 dr0ptp4kt@deploy2002: Finished deploy [analytics/refinery@37a2ddf]: Regular analytics weekly train [analytics/refinery@37a2ddfc] (duration: 02m 20s)
19:04 dr0ptp4kt@deploy2002: Started deploy [analytics/refinery@37a2ddf]: Regular analytics weekly train [analytics/refinery@37a2ddfc]
19:03 dr0ptp4kt: Deploying Refinery at 37a2ddf for c1126977 / T388654 tlwikisource to allowlist
18:50 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:drmrs and A:cp for 9.2.9-1wm1
18:42 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp70[02-16].magru.wmnet} and A:cp for 9.2.9-1wm1
18:35 topranks: re-enabling cr1-drmrs external circuits after upgrade
18:13 topranks: reboot cr1-drmrs to update JunOS (router is drained of traffic) T364092
18:13 cmooney@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 7 hosts with reason: Upgrade cr1-drmrs JunOS
17:54 root@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/jaeger: apply
17:53 root@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/jaeger: apply
17:40 root@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
17:40 root@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
17:31 root@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'apply'.
17:27 root@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'apply'.
17:26 topranks: resetting PIC 0 on cr1-drmrs (QSFP ports) to move link from port 1 to port 3 T389071
17:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
17:25 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
17:22 root@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/jaeger: apply
17:20 swfrench@deploy2002: Finished scap sync-world: Switch mw-wikifunctions to PHP 8.1 - T383845 (duration: 11m 46s)
17:15 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:12 swfrench@deploy2002: Started scap sync-world: Switch mw-wikifunctions to PHP 8.1 - T383845
17:12 root@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/jaeger: apply
17:11 root@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/jaeger: apply
17:09 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
17:05 topranks: move traffic off cr1-drmrs to allow for PIC reset / port reconfiguration T389071
17:04 root@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/jaeger: apply
17:00 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host puppetserver2004.codfw.wmnet with OS bookworm
16:51 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet
16:39 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp70[02-16].magru.wmnet} and A:cp for 9.2.9-1wm1
16:38 elukey: restart pybal on lvs1020 and lvs1019 to pick up kartotherian svc changes
16:38 fabfur: repooling cp4038 (T388147)
16:38 Ammar: T389226 Ran mwscript-k8s --comment="T389226" -f -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=mediawikiwiki --logwiki=metawiki 'Schwarze Feder' 'AndreasKemper'
16:37 Ammar: T389226 Ran mwscript-k8s --comment="T389226" -f -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=bnwikivoyage --logwiki=metawiki 'Arafatuniofdhaka' 'আরাফাত হোসেন ভূঁইয়া'
16:37 fabfur: enabled puppet on A:cp (T388147)
16:35 brett@cumin2002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=1) Rolling upgrade/restart of Apache Traffic Server on A:magru and A:cp for 9.2.9-1wm1
16:34 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:magru and A:cp for 9.2.9-1wm1
16:33 elukey: restart pybal on lvs2013 (kartotherian's svc change)
16:33 root@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/jaeger: apply
16:33 root@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/jaeger: apply
16:29 elukey: restart pybal on lvs2014
16:29 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4038.ulsfo.wmnet
16:28 fabfur: enabled puppet and depooled cp4038
16:27 elukey: disable puppet on lvs low traffic hosts in eqiad/codfw to restart pybal (kartotherian svc change)
16:25 fabfur: disabled puppet on A:cp for T388147
16:24 root@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/jaeger: apply
16:21 root@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/jaeger: apply
16:17 elukey: removed kartotherian-related confd error files from config-master2001 - related to a maintenance issue
16:13 brennen@deploy2002: Finished deploy [phabricator/deployment@8884125]: deploy phab1004 for T389220 (duration: 00m 52s)
16:12 brennen@deploy2002: Started deploy [phabricator/deployment@8884125]: deploy phab1004 for T389220
16:12 brennen@deploy2002: Finished deploy [phabricator/deployment@8884125]: deploy phab2002 for T389220 (duration: 00m 29s)
16:11 brennen@deploy2002: Started deploy [phabricator/deployment@8884125]: deploy phab2002 for T389220
16:10 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=wikikube-worker2.*,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
16:10 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1004.eqiad.wmnet with reason: bugfix
16:10 elukey@puppetserver1001: conftool action : set/pooled=yes:weight=10; selector: name=wikikube-worker1.*,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
16:10 arnaudb@cumin1002: DONE (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 1:00:00 on phab1004.eqiad.wmnet with reason: debugging T389079
16:09 arnaudb@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on phabricator.wikimedia.org with reason: bug fix
16:09 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on phab1004.eqiad.wmnet with reason: bug fix
16:06 root@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/jaeger: apply
16:02 root@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/jaeger: apply
16:01 root@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/aux-k8s-services/jaeger: apply
15:56 claime: Silenced PHPFPMTooBusy for release=canary for 6d - T389224
15:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:51 root@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/aux-k8s-services/jaeger: apply
15:49 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host puppetserver2004.codfw.wmnet with OS bookworm
15:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
15:34 hnowlan@cumin2002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0) depool all services in codfw: Datacenter Switchover - T385155
15:07 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on vrts1003.eqiad.wmnet with reason: debugging T389079
15:05 hnowlan@cumin2002: START - Cookbook sre.discovery.datacenter depool all services in codfw: Datacenter Switchover - T385155
15:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
15:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
14:56 inflatador: bking@logstash1033 running puppet agent to confirm that CR 1128880 didn't cause problems T386868
14:42 hnowlan@cumin2002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site codfw [reason: Datacentre switchover, T387444]
14:42 hnowlan@cumin2002: START - Cookbook sre.dns.admin DNS admin: depool site codfw [reason: Datacentre switchover, T387444]
14:42 hnowlan@cumin2002: END (FAIL) - Cookbook sre.dns.admin (exit_code=99) DNS admin: depool site codfw [reason: no reason specified, no task ID specified]
14:41 hnowlan@cumin2002: START - Cookbook sre.dns.admin DNS admin: depool site codfw [reason: no reason specified, no task ID specified]
14:37 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2047
14:37 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2047
14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2047
14:32 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2047
14:28 tgr@deploy2002: Sync cancelled.
14:28 tgr@deploy2002: trainbranchbot, tgr: Backport for Revert "Revert^2 "Add Portal namespace to kaawiki"" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:26 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2047
14:26 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2047
14:24 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:23 tgr@deploy2002: Started scap sync-world: Backport for Revert "Revert^2 "Add Portal namespace to kaawiki""
14:22 jhancock@cumin2002: START - Cookbook sre.dns.netbox
13:58 tgr@deploy2002: tgr: Continuing with sync
13:57 tgr@deploy2002: tgr: Backport for Revert^2 "Add Portal namespace to kaawiki" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:51 tgr@deploy2002: Started scap sync-world: Backport for Revert^2 "Add Portal namespace to kaawiki"
13:50 godog: bounce mtail on centrallog2002
13:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
13:46 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
13:43 Lucas_WMDE: pulled d2fa9d6821 / Ieab79b7eb1 to /src/mediawiki-staging on deploy2002 to bring config back in sync with deployed state (due to failed deployment, T389203)
13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
13:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
13:15 lucaswerkmeister-wmde@deploy2002: hashar, lucaswerkmeister-wmde: Continuing with sync
13:12 lucaswerkmeister-wmde@deploy2002: hashar, lucaswerkmeister-wmde: Backport for Add Portal namespace to kaawiki (T388158) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:05 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Add Portal namespace to kaawiki (T388158)
12:40 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
12:39 moritzm: installing freetype2 security updates
12:39 elukey@deploy2002: helmfile [eqiad] START helmfile.d/admin 'sync'.
12:38 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
12:38 elukey@deploy2002: helmfile [codfw] START helmfile.d/admin 'sync'.
12:35 moritzm: rebalance ganeti eqiad/A following reimages T382507
12:34 ladsgroup@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
12:33 ladsgroup@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
12:33 ladsgroup@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
12:32 ladsgroup@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
12:31 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
12:31 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
12:30 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
12:30 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
12:30 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
12:30 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
12:29 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
12:28 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
12:27 dbrant@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
12:26 dbrant@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
12:26 dbrant@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
12:26 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
12:25 dbrant@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
12:24 dbrant@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
12:24 dbrant@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
12:16 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/thumbor: apply
11:47 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1029.eqiad.wmnet to cluster eqiad and group A
11:46 ladsgroup@deploy2002: Finished scap sync-world: Backport for Bump thumbnail steps to 20% (T360589) (duration: 17m 41s)
11:43 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1029.eqiad.wmnet to cluster eqiad and group A
11:35 ladsgroup@deploy2002: ladsgroup: Continuing with sync
11:35 ladsgroup@deploy2002: ladsgroup: Backport for Bump thumbnail steps to 20% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
11:28 ladsgroup@deploy2002: Started scap sync-world: Backport for Bump thumbnail steps to 20% (T360589)
11:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
11:04 ladsgroup@deploy2002: ladsgroup: Continuing with sync
11:01 ladsgroup@deploy2002: ladsgroup: Backport for Bump thumbnail steps to 20% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:54 ladsgroup@deploy2002: Started scap sync-world: Backport for Bump thumbnail steps to 20% (T360589)
10:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1029.eqiad.wmnet with OS bookworm
10:34 hnowlan@cumin1002: END (PASS) - Cookbook sre.discovery.datacenter (exit_code=0)
10:34 hnowlan@cumin1002: START - Cookbook sre.discovery.datacenter
10:34 jnuche@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.21 refs T386216
10:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1029.eqiad.wmnet with reason: host reimage
10:15 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1029.eqiad.wmnet with reason: host reimage
10:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1029.eqiad.wmnet with OS bookworm
09:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:52 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti1029.eqiad.wmnet
09:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
09:45 jnuche@deploy2002: jnuche: Continuing with sync
09:44 jnuche@deploy2002: jnuche: Backport for objectcache: Re-number array keys in SqlBagOStuff (T389169) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:39 jnuche@deploy2002: Started scap sync-world: Backport for objectcache: Re-number array keys in SqlBagOStuff (T389169)
09:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
09:07 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1029.eqiad.wmnet
08:45 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1029.eqiad.wmnet with reason: remove from cluster for reimage
08:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
08:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
08:36 mlitn@deploy2002: Finished deploy [airflow-dags/platform_eng@e7be149]: (no justification provided) (duration: 00m 46s)
08:35 mlitn@deploy2002: Started deploy [airflow-dags/platform_eng@e7be149]: (no justification provided)
08:33 brouberol@dns1004: END - running authdns-update
08:30 brouberol@dns1004: START - running authdns-update
08:02 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Swagoel out of all services on: 949 hosts
07:59 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Swagoel out of all services on: 1293 hosts
07:56 mforns@deploy2002: Finished deploy [airflow-dags/analytics@e7be149]: hotfix for webrequest DAGs end_dates for k8s migration (duration: 02m 09s)
07:54 mforns@deploy2002: Started deploy [airflow-dags/analytics@e7be149]: hotfix for webrequest DAGs end_dates for k8s migration
07:16 hashar@deploy2002: Started scap sync-world: Backport for Add Portal namespace to kaawiki (T388158)
07:05 hashar@deploy2002: Started scap sync-world: Backport for Add Portal namespace to kaawiki (T388158)
06:40 hashar: Shifted UTC morning backport windows by an hour to take into account daylight saving time difference between USA and Europe
04:06 mwpresync@deploy2002: Pruned MediaWiki: 1.44.0-wmf.18 (duration: 06m 13s)
03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.21 refs T386216
02:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ganeti2045.codfw.wmnet with OS bookworm
02:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti2045.codfw.wmnet with OS bookworm
02:07 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2050
02:07 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2050
02:06 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
02:04 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2047
01:58 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2047
01:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:32 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
01:29 eileen: civicrm upgraded from 6f23b50b to 94636e03
01:13 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1257.eqiad.wmnet with OS bookworm
01:13 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:13 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1257.eqiad.wmnet with reason: host reimage
00:50 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1257.eqiad.wmnet with reason: host reimage
00:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host db1257.eqiad.wmnet with OS bookworm
00:16 eileen: civicrm upgraded from aa582fe1 to 6f23b50b
00:14 zabe: zabe@mwmaint2002:~$ cat group0.dblist | xargs -I{} bash -c "echo {}; mwscript extensions/WikimediaMaintenance/migrateESRefToContentTableStage2.php {} --delete /home/zabe/afl_text_table_deletedump/{} --sleep 0.3" # T381599
00:11 tgr_: very late UTC deploys done

2025-03-17

23:59 tgr@deploy2002: Finished scap sync-world: Backport for Do not schedule edge login recursively (T389132) (duration: 15m 35s)
23:52 tgr@deploy2002: tgr: Continuing with sync
23:47 tgr@deploy2002: tgr: Backport for Do not schedule edge login recursively (T389132) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:43 tgr@deploy2002: Started scap sync-world: Backport for Do not schedule edge login recursively (T389132)
23:39 tgr@deploy2002: Finished scap sync-world: Backport for Re-apply "Try both SUL2 and SUL3 central domain for autologin" (T375796) (duration: 46m 45s)
23:37 zabe: zabe@mwmaint2002:~$ cat group0.dblist | xargs -I{} bash -c "echo {}; mwscript extensions/AbuseFilter/maintenance/MigrateESRefToAflTable.php {} --deletedump /home/zabe/afl_text_table_deletedump/{} --dump /home/zabe/afl_text_table_dump/{}" # T381599
23:33 tgr@deploy2002: tgr: Continuing with sync
23:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host db1257.eqiad.wmnet with OS bookworm
22:57 tgr@deploy2002: tgr: Backport for Re-apply "Try both SUL2 and SUL3 central domain for autologin" (T375796) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:53 tgr@deploy2002: Started scap sync-world: Backport for Re-apply "Try both SUL2 and SUL3 central domain for autologin" (T375796)
22:40 tgr@deploy2002: Finished scap sync-world: Backport for Re-apply "Fix some SUL3 shared domain settings" (T388218) (duration: 64m 35s)
22:34 tgr@deploy2002: tgr: Continuing with sync
21:40 tgr@deploy2002: tgr: Backport for Re-apply "Fix some SUL3 shared domain settings" (T388218) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:36 tgr@deploy2002: Started scap sync-world: Backport for Re-apply "Fix some SUL3 shared domain settings" (T388218)
21:35 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119
21:34 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119
21:32 tgr@deploy2002: Finished scap sync-world: Backport for Do not trigger edge login on the shared domain, Do not initiate central login on the passive central domain (T388218) (duration: 25m 53s)
21:30 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119
21:29 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.UPGRADE (1 nodes at a time) for ElasticSearch cluster cloudelastic: upgrade search plugins - bking@cumin2002 - T389119
21:29 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2089.codfw.wmnet with OS bullseye
21:26 tgr@deploy2002: tgr: Continuing with sync
21:10 swfrench-wmf: ran `reprepro --delete clearvanished` to complete removal of unused component/pcre2 - T386006
21:10 tgr@deploy2002: tgr: Backport for Do not trigger edge login on the shared domain, Do not initiate central login on the passive central domain (T388218) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:08 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on A:eqsin and A:cp for 9.2.9-1wm1
21:06 tgr@deploy2002: Started scap sync-world: Backport for Do not trigger edge login on the shared domain, Do not initiate central login on the passive central domain (T388218)
20:57 tgr@deploy2002: Finished scap sync-world: Backport for GlobalContributions: Use unique CentralAuth tokens per request (T384717) (duration: 18m 00s)
20:50 tgr@deploy2002: tgr, mszabo: Continuing with sync
20:48 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4043.ulsfo.wmnet
20:45 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2089.codfw.wmnet with OS bullseye
20:43 tgr@deploy2002: tgr, mszabo: Backport for GlobalContributions: Use unique CentralAuth tokens per request (T384717) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:39 tgr@deploy2002: Started scap sync-world: Backport for GlobalContributions: Use unique CentralAuth tokens per request (T384717)
20:33 tgr@deploy2002: Finished scap sync-world: Backport for Lua: Prevent PHP errors in production from displayNumber lookup (T383924) (duration: 11m 54s)
20:32 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4043.ulsfo.wmnet
20:27 tgr@deploy2002: soda, tgr: Continuing with sync
20:26 tgr@deploy2002: soda, tgr: Backport for Lua: Prevent PHP errors in production from displayNumber lookup (T383924) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:21 tgr@deploy2002: Started scap sync-world: Backport for Lua: Prevent PHP errors in production from displayNumber lookup (T383924)
20:20 tgr@deploy2002: Finished scap sync-world: Backport for Enable Vector 2022 on Wikidata (T387154), Enable Donation banner on Catalan Wikipedia (T387768), Re-enable wgTrackGlobalJsonLinksNamespaces for JsonConf (T385917) (duration: 13m 28s)
20:13 tgr@deploy2002: bvibber, jdlrobson, tgr: Continuing with sync
20:10 tgr@deploy2002: bvibber, jdlrobson, tgr: Backport for Enable Vector 2022 on Wikidata (T387154), Enable Donation banner on Catalan Wikipedia (T387768), Re-enable wgTrackGlobalJsonLinksNamespaces for JsonConf (T385917) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:06 tgr@deploy2002: Started scap sync-world: Backport for Enable Vector 2022 on Wikidata (T387154), Enable Donation banner on Catalan Wikipedia (T387768), Re-enable wgTrackGlobalJsonLinksNamespaces for JsonConf (T385917)
20:06 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4051.ulsfo.wmnet
19:55 amastilovic@deploy2002: Finished deploy [airflow-dags/analytics@f0d67b6]: Keeping up with the Kubernetes migration (duration: 00m 46s)
19:54 amastilovic@deploy2002: Started deploy [airflow-dags/analytics@f0d67b6]: Keeping up with the Kubernetes migration
19:45 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4051.ulsfo.wmnet
19:36 cstone: donorwiki upgraded from ab9df085 to 0f6d18f0
19:32 ebernhardson@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:32 ebernhardson@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
19:28 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:28 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:27 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:27 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:20 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:20 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:08 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T385896, xfer categories jnl) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1025.eqiad.wmnet, repooling both afterwards
19:04 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-transfer (T385896, xfer categories jnl) xfer categories from wdqs1021.eqiad.wmnet -> wdqs1025.eqiad.wmnet, repooling both afterwards
18:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
18:53 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
18:53 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on A:eqsin and A:cp for 9.2.9-1wm1
18:51 dancy@deploy2002: Installation of scap version "4.141.1" completed for 1 hosts
18:50 dancy@deploy2002: Installing scap version "4.141.1" for 1 host(s)
18:45 dancy@deploy2002: Installing scap version "4.141.1" for 204 host(s)
18:22 ladsgroup@deploy2002: Synchronized portals: wikimedia.org updates (T373204) (duration: 02m 38s)
18:20 ladsgroup@deploy2002: Synchronized portals/wikipedia.org/assets: wikimedia.org updates (T373204) (duration: 12m 38s)
18:00 swfrench@deploy2002: Finished scap sync-world: Backport for Disable cookie-based enrollment in 8.1 (cleanup) (T383845) (duration: 12m 35s)
17:54 swfrench@deploy2002: swfrench: Continuing with sync
17:52 swfrench@deploy2002: swfrench: Backport for Disable cookie-based enrollment in 8.1 (cleanup) (T383845) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:51 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2089.codfw.wmnet with OS bullseye
17:48 swfrench@deploy2002: Started scap sync-world: Backport for Disable cookie-based enrollment in 8.1 (cleanup) (T383845)
17:44 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:44 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns entry for msw2-codfw - pt1979@cumin2002"
17:44 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns entry for msw2-codfw - pt1979@cumin2002"
17:43 swfrench-wmf: applied https://gerrit.wikimedia.org/r/1117638 to mediawiki statsd exporters
17:43 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
17:43 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
17:43 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
17:43 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
17:42 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
17:42 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
17:42 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
17:42 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
17:42 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
17:42 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
17:42 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
17:41 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
17:41 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp30[66-72,74-80].esams.wmnet} and A:cp for 9.2.9-1wm1
17:41 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
17:41 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
17:41 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
17:41 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
17:40 pt1979@cumin2002: START - Cookbook sre.dns.netbox
17:38 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
17:38 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
17:38 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
17:37 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
17:37 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
17:37 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
17:37 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
17:37 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
17:36 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
17:36 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
17:36 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
17:36 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
17:36 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
17:36 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
17:36 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
17:35 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
17:29 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:29 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns entry for msw2-codfw - pt1979@cumin2002"
17:29 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
17:29 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add dns entry for msw2-codfw - pt1979@cumin2002"
17:28 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
17:28 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
17:28 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
17:25 pt1979@cumin2002: START - Cookbook sre.dns.netbox
17:16 ebernhardson@deploy2002: Finished deploy [airflow-dags/search@4eb42a4]: search: drop export_queries_to_relforge (duration: 00m 29s)
17:15 ebernhardson@deploy2002: Started deploy [airflow-dags/search@4eb42a4]: search: drop export_queries_to_relforge
17:07 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
17:05 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
17:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
17:01 akosiaris: silence GatewayBackendErrorsHigh lw_inference_reference_need_cluster in eqiad for 1 week
16:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:47 Dreamy_Jazz: dreamyjazz@deploy2002 Finished scap sync-world: Backport for Unset the old 'checkuser-temporary-account-viewer' group (T387205) (duration: 11m 41s)
16:47 Dreamy_Jazz: dreamyjazz@deploy2002 dreamyjazz: Continuing with sync
16:47 Dreamy_Jazz: dreamyjazz@deploy2002 dreamyjazz: Backport for Unset the old 'checkuser-temporary-account-viewer' group (T387205) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:47 Dreamy_Jazz: dreamyjazz@deploy2002 Started scap sync-world: Backport for Unset the old 'checkuser-temporary-account-viewer' group (T387205)
16:46 vgutierrez: downgrading HAProxy to version 2.8 in cp5032 (upload) - T386796
16:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2327
16:45 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2327
16:45 vgutierrez: downgrading HAProxy to version 2.8 in cp5024 (text) - T386796
16:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host wikikube-worker2327
16:44 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2327
16:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) restarting P{lvs[5004-5006].eqsin.wmnet,lvs[4008-4009].ulsfo.wmnet} and A:liberica
16:05 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
16:02 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade restarting P{lvs[5004-5006].eqsin.wmnet,lvs[4008-4009].ulsfo.wmnet} and A:liberica
15:50 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp30[66-72,74-80].esams.wmnet} and A:cp for 9.2.9-1wm1
15:47 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
15:44 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
15:44 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2089
15:44 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2089
15:37 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
15:34 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ms-be2075.codfw.wmnet with OS bullseye
15:32 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
15:31 ladsgroup@deploy2002: Finished scap sync-world: Backport for Bump thumbnail steps ratio to 15% (T360589) (duration: 13m 20s)
15:31 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs3008.esams.wmnet} and A:liberica (T384477)
15:31 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs3008.esams.wmnet} and A:liberica (T384477)
15:31 vgutierrez: repool lvs3008 running liberica - T384477
15:30 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
15:25 ladsgroup@deploy2002: ladsgroup: Continuing with sync
15:23 ladsgroup@deploy2002: ladsgroup: Backport for Bump thumbnail steps ratio to 15% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:18 ladsgroup@deploy2002: Started scap sync-world: Backport for Bump thumbnail steps ratio to 15% (T360589)
15:16 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host puppetserver2004.codfw.wmnet with OS bookworm
15:14 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ms-be2075.codfw.wmnet with reason: host reimage
15:11 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs3008.esams.wmnet with OS bookworm
15:10 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ms-be2075.codfw.wmnet with reason: host reimage
15:06 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
15:05 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
14:58 tgr@deploy2002: Finished scap sync-world: Backport for Revert "Try both SUL2 and SUL3 central domain for autologin" (duration: 12m 35s)
14:53 herron@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
14:53 herron@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
14:52 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs3008.esams.wmnet with reason: host reimage
14:51 herron@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
14:51 tgr@deploy2002: trainbranchbot, tgr: Continuing with sync
14:51 herron@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
14:51 herron@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
14:51 herron@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
14:50 tgr@deploy2002: trainbranchbot, tgr: Backport for Revert "Try both SUL2 and SUL3 central domain for autologin" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:48 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs3008.esams.wmnet with reason: host reimage
14:45 tgr@deploy2002: Started scap sync-world: Backport for Revert "Try both SUL2 and SUL3 central domain for autologin"
14:45 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host puppetserver2004.codfw.wmnet with OS bookworm
14:44 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/kartotherian: sync
14:43 elukey@deploy2002: helmfile [staging] START helmfile.d/services/kartotherian: sync
14:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
14:40 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2075.codfw.wmnet with OS bullseye
14:40 mvernon@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2075.codfw.wmnet with OS bullseye
14:36 tgr@deploy2002: Sync cancelled.
14:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
14:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetserver2004.codfw.wmnet with OS bookworm
14:29 vgutierrez: upgrading HAProxy to version 3.1 in cp5024 (text) - T386796
14:29 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs3008.esams.wmnet with OS bookworm
14:27 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2075.codfw.wmnet with OS bullseye
14:26 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs3008.esams.wmnet with reason: depooled before reimage
14:25 mvernon@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2075.codfw.wmnet with OS bullseye
14:23 tgr@deploy2002: tgr: Backport for Try both SUL2 and SUL3 central domain for autologin (T375796) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:22 vgutierrez: depooling lvs3008 before being reimaged - T384477
14:22 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/kartotherian: sync
14:19 tgr@deploy2002: Started scap sync-world: Backport for Try both SUL2 and SUL3 central domain for autologin (T375796)
14:18 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:18 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db1257.eqiad.wmnet with OS bookworm
14:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:17 tgr@deploy2002: Finished scap sync-world: Backport for Revert "Fix some SUL3 shared domain settings" (duration: 11m 52s)
14:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver2004.codfw.wmnet with reason: host reimage
14:13 elukey@deploy2002: helmfile [staging] START helmfile.d/services/kartotherian: sync
14:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:12 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:11 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver2004.codfw.wmnet with reason: host reimage
14:10 tgr@deploy2002: trainbranchbot, tgr: Continuing with sync
14:10 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs3009.esams.wmnet} and A:liberica (T384477)
14:10 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs3009.esams.wmnet} and A:liberica (T384477)
14:09 vgutierrez: repool lvs3009 running liberica - T384477
14:09 tgr@deploy2002: trainbranchbot, tgr: Backport for Revert "Fix some SUL3 shared domain settings" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:05 tgr@deploy2002: Started scap sync-world: Backport for Revert "Fix some SUL3 shared domain settings"
14:01 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs3009.esams.wmnet with OS bookworm
13:58 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host puppetserver2004.codfw.wmnet with OS bookworm
13:57 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
13:56 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
13:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
13:56 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:55 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
13:52 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
13:52 tgr@deploy2002: Started scap sync-world: Backport for Fix some SUL3 shared domain settings (T388218)
13:48 tgr@deploy2002: Finished scap sync-world: Backport for Enable credentials change special pages on SUL3 shared domain (T362715) (duration: 25m 42s)
13:47 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
13:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
13:46 elukey@cumin1002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
13:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
13:41 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs3009.esams.wmnet with reason: host reimage
13:40 tgr@deploy2002: tgr: Continuing with sync
13:39 godog: begin moving k8s prometheus instances from prometheus2005 to prometheus2007 - T383232
13:38 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host db1257.eqiad.wmnet with OS bookworm
13:38 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs3009.esams.wmnet with reason: host reimage
13:31 vgutierrez: uploaded HAProxy 3.1.5 to apt.wm.o (bullseye-wikimedia) component thirdparty/haproxy31 - T386796
13:27 tgr@deploy2002: tgr: Backport for Enable credentials change special pages on SUL3 shared domain (T362715) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:25 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host ms-be2075.codfw.wmnet with OS bullseye
13:25 vgutierrez: upgrading HAProxy to version 3.1 in cp5032 (upload) - T386796
13:23 tgr@deploy2002: Started scap sync-world: Backport for Enable credentials change special pages on SUL3 shared domain (T362715)
13:19 tgr@deploy2002: Finished scap sync-world: Backport for Revert "Disable new WebAuthn credentials creation" (T378402 T389064), sqwiktionary: update logo, wordmark, tagline and icon (T342172), Growth: eswiki+cswiki - enable new way of refreshing LinkRecommendations (T386250) (duration: 13m 27s)
13:18 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs3009.esams.wmnet with OS bookworm
13:14 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.debug (exit_code=99) for Netbox interface ID 20595
13:14 ayounsi@cumin1002: START - Cookbook sre.network.debug for Netbox interface ID 20595
13:11 tgr@deploy2002: tgr, migr, anzx: Continuing with sync
13:10 tgr@deploy2002: tgr, migr, anzx: Backport for Revert "Disable new WebAuthn credentials creation" (T378402 T389064), sqwiktionary: update logo, wordmark, tagline and icon (T342172), Growth: eswiki+cswiki - enable new way of refreshing LinkRecommendations (T386250) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:08 vgutierrez: depooling lvs3009 before being reimaged - T384477
13:08 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs3009.esams.wmnet with reason: depooled before reimage
13:06 tgr@deploy2002: Started scap sync-world: Backport for Revert "Disable new WebAuthn credentials creation" (T378402 T389064), sqwiktionary: update logo, wordmark, tagline and icon (T342172), Growth: eswiki+cswiki - enable new way of refreshing LinkRecommendations (T386250)
12:59 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
12:54 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
12:53 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
12:53 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
12:52 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
12:51 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
12:50 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
12:38 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
12:38 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
12:27 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
12:27 kamila@deploy2002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
12:25 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/zotero: apply
12:25 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/wikifunctions: apply
12:24 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/wikifeeds: apply
12:24 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/wikidata-query-gui: apply
12:23 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/toolhub: apply
12:23 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/thumbor: apply
12:23 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/termbox: apply
12:22 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/shellbox-video: apply
12:22 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/shellbox-timeline: apply
12:21 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/shellbox-syntaxhighlight: apply
12:21 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/shellbox-media: apply
12:20 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/shellbox-constraints: apply
12:20 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/shellbox: apply
12:20 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/sessionstore: apply
12:19 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/rest-gateway: apply
12:19 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/recommendation-api: apply
12:19 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/rdf-streaming-updater: apply
12:18 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/ratelimit: apply
12:18 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/push-notifications: apply
12:18 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/proton: apply
12:17 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/page-analytics: apply
12:17 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/mobileapps: apply
12:17 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/miscweb: apply
12:16 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/media-analytics: apply
12:16 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
12:16 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/mathoid: apply
12:16 kamila@deploy2002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
12:15 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/kartotherian: apply
12:15 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/tegola-vector-tiles: apply
12:14 kamila@deploy2002: helmfile [staging] START helmfile.d/services/tegola-vector-tiles: apply
12:09 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/linkrecommendation: apply
11:59 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/image-suggestion: apply
11:59 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/geo-analytics: apply
11:59 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/eventstreams-internal: apply
11:58 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/eventstreams: apply
11:58 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/eventgate-main: apply
11:57 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/eventgate-logging-external: apply
11:57 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/eventgate-analytics-external: apply
11:56 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/eventgate-analytics: apply
11:56 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/editor-analytics: apply
11:56 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/edit-analytics: apply
11:55 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/echostore: apply
11:55 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/device-analytics: apply
11:55 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/developer-portal: apply
11:54 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/data-gateway: apply
11:54 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/cxserver: apply
11:53 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/commons-impact-analytics: apply
11:53 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/citoid: apply
11:53 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/cirrus-streaming-updater: apply
11:53 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/chart-renderer: apply
11:52 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/changeprop-jobqueue: apply
11:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
11:52 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/changeprop: apply
11:49 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/api-gateway: apply
11:49 MichaelG_WMF: `time mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=shwiki --search-index --verbose 2>&1 | tee ~/shwiki-searchindex.txt`
11:49 kamila@deploy2002: helmfile [staging] OK helmfile.d/services/apertium: apply
11:47 MichaelG_WMF: `time mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=arzwiki --search-index --verbose 2>&1 | tee ~/arzwiki-searchindex.txt`
11:45 MichaelG_WMF: running `time mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=arzwiki --db-table --verbose --force 2>&1 | tee ~/arzwiki-dbtable.txt`
11:44 kamila@cumin1002: END (PASS) - Cookbook sre.k8s.wipe-cluster (exit_code=0) Wipe the K8s cluster staging-eqiad: k8s upgrade
11:44 kamila@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
11:44 kamila@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
11:43 ladsgroup@cumin1002: START - Cookbook sre.wikireplicas.update-views
11:43 MichaelG_WMF: running `time mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=idwiki --db-table --verbose --force 2>&1 | tee ~/idwiki-dbtable.txt`
11:43 ladsgroup@cumin1002: END (PASS) - Cookbook sre.wikireplicas.update-views (exit_code=0)
11:41 MichaelG_WMF: running `time mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=ptwiki --db-table --verbose --force 2>&1 | tee ~/ptwiki-dbtable.txt`
11:40 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
11:39 MichaelG_WMF: running `time mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=eswiki --db-table --verbose --force 2>&1 | tee ~/eswiki-dbtable.txt`
11:38 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
11:38 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
11:38 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
11:38 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
11:38 ladsgroup@cumin1002: START - Cookbook sre.wikireplicas.update-views
11:37 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
11:37 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
11:36 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
11:36 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
11:36 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
11:36 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
11:36 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
11:36 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
11:35 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
11:35 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
11:33 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
11:32 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
11:32 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
11:30 Dreamy_Jazz: Running `mwscript migrateUserGroup.php --wiki=X checkuser-temporary-account-viewer temporary-account-viewer` for all wikis with temporary accounts enabled or known (testwiki, loginwiki, test2wiki, metawiki, cswikiversity, igwiki, itwikiquote, swwiki, shwiki, fawiktionary, jawikibooks, zh_yuewiki, dawiki, srwiki, rowiki, nowiki, metawiki)
11:23 Dreamy_Jazz: Ran `mwscript migrateUserGroup.php --wiki=testwiki checkuser-temporary-account-viewer temporary-account-viewer`
11:22 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
11:22 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs3010.esams.wmnet with OS bookworm
11:21 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
11:21 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
11:20 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
11:20 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
11:20 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
11:19 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
11:19 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
11:19 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
11:19 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
11:19 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing P{lvs[4008-4009].ulsfo.wmnet,lvs5004.eqsin.wmnet} and A:liberica
11:18 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Re-enable the 'temporary-account-viewer' group for migration (T387205), Remove obsolete $wgParserCacheNewKeySchemaRatio (T373037) (duration: 12m 32s)
11:16 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgradeing P{lvs[4008-4009].ulsfo.wmnet,lvs5004.eqsin.wmnet} and A:liberica
11:14 kamila@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
11:14 kamila@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
11:13 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing A:liberica-magru
11:11 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgradeing A:liberica-magru
11:10 dreamyjazz@deploy2002: dreamyjazz, hashar: Continuing with sync
11:10 dreamyjazz@deploy2002: dreamyjazz, hashar: Backport for Re-enable the 'temporary-account-viewer' group for migration (T387205), Remove obsolete $wgParserCacheNewKeySchemaRatio (T373037) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:07 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing P{lvs[5005-5006].eqsin.wmnet} and A:liberica
11:06 dreamyjazz@deploy2002: Started scap sync-world: Backport for Re-enable the 'temporary-account-viewer' group for migration (T387205), Remove obsolete $wgParserCacheNewKeySchemaRatio (T373037)
11:04 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgradeing P{lvs[5005-5006].eqsin.wmnet} and A:liberica
11:04 ladsgroup@deploy2002: Finished scap sync-world: Backport for findBadBlobs: Allow for timestamp based search via --scan-to (T351953), media: Make SvgHandler respect physicalWidth when building URL for thumb (T360589) (duration: 14m 09s)
11:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs3010.esams.wmnet with reason: host reimage
10:59 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.upgrade (exit_code=0) upgradeing P{lvs5006.eqsin.wmnet} and A:liberica
10:58 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgradeing P{lvs5006.eqsin.wmnet} and A:liberica
10:57 ladsgroup@deploy2002: ladsgroup: Continuing with sync
10:57 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs3010.esams.wmnet with reason: host reimage
10:55 kamila@cumin1002: START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster staging-eqiad: k8s upgrade
10:55 ladsgroup@deploy2002: ladsgroup: Backport for findBadBlobs: Allow for timestamp based search via --scan-to (T351953), media: Make SvgHandler respect physicalWidth when building URL for thumb (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:50 ladsgroup@deploy2002: Started scap sync-world: Backport for findBadBlobs: Allow for timestamp based search via --scan-to (T351953), media: Make SvgHandler respect physicalWidth when building URL for thumb (T360589)
10:43 jynus: restarting dbprov2005 T389052
10:38 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs3010.esams.wmnet with OS bookworm
10:37 vgutierrez@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host lvs3010.esams.wmnet with OS bookworm
10:21 marostegui@deploy2002: Finished scap sync-world: Backport for Revert "db-production.php: Disable writes on es6" (duration: 14m 41s)
10:21 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section s1
10:19 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section s1
10:18 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section s2
10:17 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section s2
10:17 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section s3
10:16 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section s3
10:15 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section s4
10:14 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section s4
10:14 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section s5
10:13 marostegui@deploy2002: marostegui: Continuing with sync
10:12 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section s5
10:12 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section s8
10:11 marostegui@deploy2002: marostegui: Backport for Revert "db-production.php: Disable writes on es6" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:10 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section s8
10:10 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section s7
10:09 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section s7
10:09 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section s6
10:07 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section s6
10:07 marostegui@deploy2002: Started scap sync-world: Backport for Revert "db-production.php: Disable writes on es6"
10:07 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section x1
10:06 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section x1
10:06 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section es7
10:05 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section es7
10:02 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section es6
10:01 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section es6
10:00 marostegui@deploy2002: Finished scap sync-world: Backport for db-production.php: Disable writes on es6 (T388626) (duration: 23m 25s)
09:56 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.upgrade (exit_code=1) upgradeing P{lvs4010.ulsfo.wmnet} and A:liberica
09:55 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.upgrade upgradeing P{lvs4010.ulsfo.wmnet} and A:liberica
09:50 marostegui@deploy2002: marostegui: Continuing with sync
09:50 marostegui@deploy2002: marostegui: Backport for db-production.php: Disable writes on es6 (T388626) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:45 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs3010.esams.wmnet with OS bookworm
09:37 marostegui@deploy2002: Started scap sync-world: Backport for db-production.php: Disable writes on es6 (T388626)
09:25 moritzm: installing intel-microcode security updates
09:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1002.eqiad.wmnet
09:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1002.eqiad.wmnet
08:46 moritzm: freed 28G of disk space on maps1009
08:30 moritzm: updated bookworm installer image to Bookworm 12.10 T389034
08:23 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
08:23 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
08:23 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
08:23 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
07:29 marostegui@cumin2002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section x1
07:28 marostegui@cumin2002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section x1
07:17 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section x1
07:17 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section x1
07:16 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section test-s4
07:15 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section test-s4
07:14 marostegui@cumin1002: END (PASS) - Cookbook sre.switchdc.databases.prepare (exit_code=0) for the switch from codfw to eqiad for section x1
07:13 marostegui@cumin1002: START - Cookbook sre.switchdc.databases.prepare for the switch from codfw to eqiad for section x1

2025-03-16

14:08 claime: sudo postqueue -j | jq -r 'select(.sender == "vrts-bounce@wikimedia.org") | .queue_id' | sudo postsuper -d - # mx-out1001
13:59 Emperor: sudo postqueue -j | jq -r ' select(.recipients[0].address == "vrts-bounce@wikimedia.org") | select(.recipients[1].address == null) | .queue_id' | sudo postsuper -d - # mx-in2001
13:39 Emperor: restart postfix on mx-in2001

2025-03-15

18:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
18:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
00:51 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2046
00:51 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2046

2025-03-14

21:25 zabe: zabe@mwmaint2002:~$ mwscript extensions/WikimediaMaintenance/migrateESRefToContentTableStage2.php testwiki --delete /home/zabe/afl_text_table_deletedump/testwiki --sleep 0.3 # T381599
21:06 zabe: zabe@mwmaint2002:~$ mwscript extensions/AbuseFilter/maintenance/MigrateESRefToAflTable.php testwiki --dump /home/zabe/afl_text_table_dump/testwiki --deletedump /home/zabe/afl_text_table_deletedump/testwiki --sleep 0.3 # T381599
16:51 root@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
16:51 root@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
16:15 slyngshede@cumin1002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Vivian Rook out of all services on: 2288 hosts
16:14 slyngshede@cumin1002: DONE (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Vivian Rook out of all services on: 2288 hosts
16:05 sukhe@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.admin (exit_code=1) pooling A:liberica-canary
16:05 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin pooling A:liberica-canary
16:04 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling A:liberica-canary
16:04 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin depooling A:liberica-canary
16:04 sukhe@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.admin (exit_code=1) pooling A:liberica-canary
16:03 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin pooling A:liberica-canary
16:01 sukhe@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.admin (exit_code=1) pooling A:liberica-canary
16:00 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin pooling A:liberica-canary
16:00 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) depooling A:liberica-canary
16:00 sukhe@cumin1002: START - Cookbook sre.loadbalancer.admin depooling A:liberica-canary
15:50 slyngshede@cumin1002: DONE (FAIL) - Cookbook sre.idm.logout (exit_code=99) Logging Vivian Rook out of all services on: 2288 hosts
15:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2075.codfw.wmnet with OS bullseye
15:41 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2075.codfw.wmnet with OS bullseye
15:37 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ms-be2075.codfw.wmnet with OS bullseye
15:20 root@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
15:20 root@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
15:20 root@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
15:20 root@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
15:19 root@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
15:19 root@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
15:19 root@deploy2002: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
15:18 root@deploy2002: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
14:55 herron: kafka-logging reduce mediawiki.httpd.accesslog topic retention from 172800000ms (2d) to 129600000ms (1.5d)
13:33 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
13:14 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudgw1003.eqiad.wmnet
13:13 volans: installed cumin v5.1.1 on cloudcumin* and cuminunpriv* hosts
12:03 hnowlan@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.08-start-maintenance (exit_code=0) for datacenter switchover from eqiad to codfw
12:02 root@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-cron: apply
12:02 root@deploy2002: helmfile [codfw] START helmfile.d/services/mw-cron: apply
12:02 root@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-cron: apply
12:02 root@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-cron: apply
12:00 hnowlan@cumin2002: START - Cookbook sre.switchdc.mediawiki.08-start-maintenance for datacenter switchover from eqiad to codfw
11:52 hnowlan@cumin2002: END (PASS) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=0) for datacenter switchover from eqiad to codfw
11:52 hnowlan@cumin2002: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance for datacenter switchover from eqiad to codfw
11:40 hnowlan@cumin2002: END (FAIL) - Cookbook sre.switchdc.mediawiki.01-stop-maintenance (exit_code=99) for datacenter switchover from eqiad to codfw
11:40 hnowlan@cumin2002: START - Cookbook sre.switchdc.mediawiki.01-stop-maintenance for datacenter switchover from eqiad to codfw
11:36 volans: uploaded cumin_5.1.1 to apt.wikimedia.org bullseye-wikimedia
11:13 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on an-worker1199.eqiad.wmnet with reason: Adding the hosts to the analytics hadoop cluster in batches. this is part of the next batch
11:13 stevemunene@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on 9 hosts with reason: Adding the hosts to the analytics hadoop cluster in batches. this is part of the next batch
10:58 godog: set 80GB (per 6x partition ~500GB) retention for udp_localhost-err topic in kafka-logging eqiad
10:57 godog: set 150GB (per 6x partition = ~1TB) retention for udp_localhost-warning topic in kafka-logging eqiad
10:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
10:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
10:19 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
10:17 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
10:10 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
10:09 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
10:08 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db1257.eqiad.wmnet with OS bookworm
09:38 godog: set 1TB retention for udp_localhost-warning topic in kafka-logging eqiad
09:36 godog: set 400G retention for udp_localhost-err topic in kafka-logging eqiad
09:31 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
09:31 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
09:30 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
09:30 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
09:19 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host db1257.eqiad.wmnet with OS bookworm
09:18 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
09:05 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
01:49 cstone: civicrm upgraded from 52226531 to aa582fe1

2025-03-13

23:11 ladsgroup@deploy2002: Finished scap sync-world: Backport for Temporarily enable mobile sitenotice for fawiki (duration: 10m 07s)
23:05 ladsgroup@deploy2002: ladsgroup: Continuing with sync
23:04 ladsgroup@deploy2002: ladsgroup: Backport for Temporarily enable mobile sitenotice for fawiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:01 ladsgroup@deploy2002: Started scap sync-world: Backport for Temporarily enable mobile sitenotice for fawiki
22:58 reedy@deploy2002: Finished scap sync-world: Backport for FilterEvaluator::rmspecials: Disable PCRE JIT for this call too (T385452) (duration: 68m 05s)
22:52 reedy@deploy2002: reedy: Continuing with sync
21:53 reedy@deploy2002: reedy: Backport for FilterEvaluator::rmspecials: Disable PCRE JIT for this call too (T385452) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:50 reedy@deploy2002: Started scap sync-world: Backport for FilterEvaluator::rmspecials: Disable PCRE JIT for this call too (T385452)
21:35 mutante: lists1004 - systemctl start wmf_auto_restart_exim4, which had failed for some reason
21:34 jhuneidi@deploy2002: Finished scap sync-world: Backport for PreferenceHelper: Handle another case of getGlobalPreferencesValues returning false (T388073), FilterEvaluator::rmdoubles: Disable PCRE JIT for this call (T385452), Score: Handle parser passing $code of null and bail out (T388821), [[gerrit:1127570|SidebarBeforeOutputHookHandler::getItemId: Bail early i
21:28 jhuneidi@deploy2002: reedy, jhuneidi: Continuing with sync
21:28 jhuneidi@deploy2002: reedy, jhuneidi: Backport for PreferenceHelper: Handle another case of getGlobalPreferencesValues returning false (T388073), FilterEvaluator::rmdoubles: Disable PCRE JIT for this call (T385452), Score: Handle parser passing $code of null and bail out (T388821), [[gerrit:1127570|SidebarBeforeOutputHookHandler::getItemId: Bail early if Title i
21:25 jhuneidi@deploy2002: Started scap sync-world: Backport for PreferenceHelper: Handle another case of getGlobalPreferencesValues returning false (T388073), FilterEvaluator::rmdoubles: Disable PCRE JIT for this call (T385452), Score: Handle parser passing $code of null and bail out (T388821), [[gerrit:1127570|SidebarBeforeOutputHookHandler::getItemId: Bail early if
21:16 eevans@deploy2002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
21:16 eevans@deploy2002: helmfile [staging] START helmfile.d/services/data-gateway: apply
20:59 ladsgroup@deploy2002: Finished scap sync-world: Backport for Bump the thumbnail steps ratio to 10% (T360589) (duration: 12m 56s)
20:57 ladsgroup@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
20:56 ladsgroup@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
20:56 ladsgroup@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
20:54 ladsgroup@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
20:54 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
20:54 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
20:53 ladsgroup@deploy2002: ladsgroup: Continuing with sync
20:50 ladsgroup@deploy2002: ladsgroup: Backport for Bump the thumbnail steps ratio to 10% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:49 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
20:49 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
20:47 ladsgroup@deploy2002: Started scap sync-world: Backport for Bump the thumbnail steps ratio to 10% (T360589)
20:44 jhuneidi@deploy2002: Finished scap sync-world: Backport for Rebuild logo files (T387448), Logos: Fix order of guwwikinews in yaml file (T387448), logos: have CI fail on uncommited logos.php changes (T341412) (duration: 19m 18s)
20:38 jhuneidi@deploy2002: hashar, pppery, jhuneidi: Continuing with sync
20:28 jhuneidi@deploy2002: hashar, pppery, jhuneidi: Backport for Rebuild logo files (T387448), Logos: Fix order of guwwikinews in yaml file (T387448), logos: have CI fail on uncommited logos.php changes (T341412) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:25 jhuneidi@deploy2002: Started scap sync-world: Backport for Rebuild logo files (T387448), Logos: Fix order of guwwikinews in yaml file (T387448), logos: have CI fail on uncommited logos.php changes (T341412)
20:03 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in cloudelastic
20:03 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in cloudelastic
19:56 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS bullseye
19:54 swfrench@deploy2002: Finished scap sync-world: apply rsyslog config changes - T388799 (duration: 08m 09s)
19:47 jynus: forcing a reboot of db1248 from console T388837
19:47 swfrench@deploy2002: Started scap sync-world: apply rsyslog config changes - T388799
19:44 cwhite: depooled db1248, unchanged db1245
19:42 cwhite@cumin2002: dbctl commit (dc=all): 'depool db1245', diff saved to https://phabricator.wikimedia.org/P74224 and previous config saved to /var/cache/conftool/dbconfig/20250313-194204-cwhite.json
19:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage
19:34 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
19:33 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
19:32 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage
19:24 swfrench-wmf: mw-(api-int|jobrunner|parsoid): reverted all traffic back to 'main' release - T383845
19:04 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:04 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:04 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:04 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:58 ebernhardson@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:58 ebernhardson@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
18:39 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.20 refs T386215
18:17 jiji@deploy2002: Finished scap sync-world: scap run to deploy switch to PHP 8.1 images - T383845 (duration: 10m 28s)
18:11 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp3066.esams.wmnet
18:10 root@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
18:10 root@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
18:09 root@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
18:09 root@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
18:09 jiji@deploy2002: Started scap sync-world: scap run to deploy switch to PHP 8.1 images - T383845
18:08 root@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
18:08 root@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
18:02 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
18:01 root@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
18:01 root@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
18:00 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
17:56 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
17:56 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
17:55 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
17:54 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
17:50 root@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
17:50 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp3066.esams.wmnet
17:49 brett: Upgrading cp3066 to Varnish 7 (T378737)
17:49 root@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
17:48 jiji@deploy2002: Stopping before sync operations
17:47 jiji@deploy2002: Started scap sync-world: No-sync scap run to switch image flavours to PHP 8.1 - T383845
17:47 root@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
17:46 root@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
17:44 swfrench@deploy2002: Unlocked for deployment [ALL REPOSITORIES]: Taking scap lock while awaiting coordinated puppet change (duration: 34m 27s)
17:37 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic1012* for ban host prior to reimage - bking@cumin2002 - T387904
17:37 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1012* for ban host prior to reimage - bking@cumin2002 - T387904
17:10 root@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
17:10 root@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
17:09 swfrench@deploy2002: Locking from deployment [ALL REPOSITORIES]: Taking scap lock while awaiting coordinated puppet change
17:05 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp3074.esams.wmnet
16:44 hashar: deployment server: rebased /srv/mediawiki-staging for 3 noop changes (d4e1c561e..a66406939)
16:43 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp3074.esams.wmnet
16:42 brett: Upgrading cp3074 to Varnish 7 (T378737)
16:41 Emperor: restart swift-proxy on ms-fe2009
16:41 root@deploy1003: helmfile [aux-k8s-codfw] DONE helmfile.d/admin 'sync'.
16:41 root@deploy1003: helmfile [aux-k8s-codfw] START helmfile.d/admin 'sync'.
16:36 moritzm: installing gunicorn security updates
16:30 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp[3073,3081].esams.wmnet} and A:cp for 9.2.9-1wm1
16:25 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
16:24 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
16:24 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
16:23 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
16:22 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
16:21 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
16:18 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp[3073,3081].esams.wmnet} and A:cp for 9.2.9-1wm1
15:57 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
15:57 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
15:56 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
15:54 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
15:48 klausman@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
15:48 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
15:48 klausman@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
15:44 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
15:24 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
15:23 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
15:22 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
15:21 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
15:21 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
15:21 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
15:17 klausman@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
15:17 klausman@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
15:15 klausman@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
15:15 klausman@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
15:13 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
15:12 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
15:09 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1012.eqiad.wmnet with OS bullseye
15:03 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
15:03 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
15:01 cmooney@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
15:01 cmooney@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
14:57 Lucas_WMDE: UTC afternoon backport+config window done
14:57 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Enable SUL3 signup for everyone (T384218), Set $wgSul3RolloutUserPercentage on some testwikis (T384153), Reapply "Make WikibaseQualityConstraints use split-graph query service" (T374021) (duration: 10m 24s)
14:55 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
14:54 moritzm: restarting FPM on Phabricator to pick up gnutls security updates
14:54 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
14:50 lucaswerkmeister-wmde@deploy2002: tgr, lucaswerkmeister-wmde: Continuing with sync
14:50 moritzm: restarting slapd on serpens/seaborgium to pick up gnutls updates
14:50 lucaswerkmeister-wmde@deploy2002: tgr, lucaswerkmeister-wmde: Backport for Enable SUL3 signup for everyone (T384218), Set $wgSul3RolloutUserPercentage on some testwikis (T384153), Reapply "Make WikibaseQualityConstraints use split-graph query service" (T374021) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:48 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage
14:47 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Enable SUL3 signup for everyone (T384218), Set $wgSul3RolloutUserPercentage on some testwikis (T384153), Reapply "Make WikibaseQualityConstraints use split-graph query service" (T374021)
14:45 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1012.eqiad.wmnet with reason: host reimage
14:44 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Follow-up Ia4b9f65b6: Fix argument order passed to EditCheckFactory#create (T388722) (duration: 11m 31s)
14:37 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, kemayo: Continuing with sync
14:35 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, kemayo: Backport for Follow-up Ia4b9f65b6: Fix argument order passed to EditCheckFactory#create (T388722) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:35 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS bullseye
14:35 jmm@cumin2002: END (PASS) - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors (exit_code=0) rolling restart_daemons on A:logstash-collector
14:33 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
14:33 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
14:32 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Follow-up Ia4b9f65b6: Fix argument order passed to EditCheckFactory#create (T388722)
14:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
14:32 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
14:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:31 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS bullseye
14:31 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
14:30 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
14:30 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
14:27 jmm@cumin2002: START - Cookbook sre.o11y.roll-restart-reboot-logstash-collectors rolling restart_daemons on A:logstash-collector
14:26 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ms-be2075.codfw.wmnet with OS bullseye
14:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be2075.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
14:19 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2075.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
14:12 moritzm: installing gnutls security updates
14:06 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
14:05 effie: restarting parsoid on codfw
14:04 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
14:01 kcvelaga@deploy2002: Finished deploy [airflow-dags/analytics_product@554407c]: T362615 (duration: 01m 39s)
14:00 kcvelaga@deploy2002: Started deploy [airflow-dags/analytics_product@554407c]: T362615
13:50 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS bullseye
13:46 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1012.eqiad.wmnet with OS bullseye
13:45 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1012.eqiad.wmnet with OS bullseye
13:44 bking@cumin2002: conftool action : set/pooled=no; selector: service=cloudelastic,name=cloudelastic1012.eqiad.wmnet
13:22 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
13:22 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
13:21 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
12:51 ladsgroup@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
12:50 ladsgroup@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
12:50 ladsgroup@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
12:49 ladsgroup@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
12:49 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
12:49 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
12:28 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
12:28 moritzm: installing tiff security updates
12:27 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
12:09 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
12:08 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
12:07 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
12:07 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
11:58 cmooney@dns2005: END - running authdns-update
11:58 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
11:56 cmooney@dns2005: START - running authdns-update
11:56 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
11:56 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
11:56 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
11:56 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:56 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old dns entries for lvs6xxx vlan sub-int IPs - cmooney@cumin1002"
11:56 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old dns entries for lvs6xxx vlan sub-int IPs - cmooney@cumin1002"
11:55 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: sync
11:50 cmooney@cumin1002: START - Cookbook sre.dns.netbox
11:48 ladsgroup@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
11:47 ladsgroup@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
11:47 ladsgroup@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
11:46 ladsgroup@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
11:45 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
11:45 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
11:44 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
11:43 effie: rolling restarting mw-api-int
11:43 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
11:43 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: sync
11:36 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: sync
11:35 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: sync
11:35 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
11:34 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
11:28 jiji@deploy2002: scap failed: <KeyError> 'production' (scap version: 4.140.0) (duration: 13m 16s)
11:26 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1045.eqiad.wmnet with OS bullseye
11:26 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
11:20 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.roll-restart-masters (exit_code=99) restart masters for Hadoop analytics cluster: Restart of jvm daemons.
11:15 jiji@deploy2002: Started scap sync-world: (T383845) mw-(api-int|parsoid|jobrunner): switch all releases to PHP 8.1
11:08 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
11:08 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
11:06 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
11:05 stevemunene@cumin1002: START - Cookbook sre.hadoop.roll-restart-masters restart masters for Hadoop analytics cluster: Restart of jvm daemons.
10:51 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1045.eqiad.wmnet with reason: host reimage
10:50 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
10:49 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
10:48 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
10:48 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
10:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1045.eqiad.wmnet with reason: host reimage
10:39 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::internal_scholarly@eqiad
10:39 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
10:38 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
10:36 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host restbase1045.eqiad.wmnet with OS bullseye
10:36 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1044.eqiad.wmnet with OS bullseye
10:36 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1034.eqiad.wmnet to cluster eqiad and group D
10:36 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
10:34 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::internal_scholarly@eqiad
10:34 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=99) for role: wdqs::internal_scholarly@eqiad
10:34 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
10:34 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
10:31 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1034.eqiad.wmnet to cluster eqiad and group D
10:31 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::internal_scholarly@eqiad
10:28 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::internal_scholarly@codfw
10:28 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
10:27 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
10:25 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
10:21 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::internal_scholarly@codfw
10:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
10:11 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1044.eqiad.wmnet with reason: host reimage
10:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
10:08 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1044.eqiad.wmnet with reason: host reimage
09:56 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host restbase1044.eqiad.wmnet with OS bullseye
09:56 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host restbase1043.eqiad.wmnet with OS bullseye
09:56 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
09:53 elukey@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - elukey@cumin1002"
09:42 volans: uploaded cumin_5.1.0 to apt.wikimedia.org bullseye-wikimedia
09:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on restbase1043.eqiad.wmnet with reason: host reimage
09:37 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs6001.drmrs.wmnet} and A:liberica (T384477)
09:37 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs6001.drmrs.wmnet} and A:liberica (T384477)
09:36 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on restbase1043.eqiad.wmnet with reason: host reimage
09:24 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host restbase1043.eqiad.wmnet with OS bullseye
09:22 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1204.eqiad.wmnet
09:20 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1204.eqiad.wmnet
09:15 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
09:12 gkyziridis@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
09:10 elukey@cumin2002: START - Cookbook sre.hosts.provision for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
09:06 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1204.eqiad.wmnet
09:04 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1204.eqiad.wmnet
09:02 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6001.drmrs.wmnet with OS bookworm
08:51 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs6001.drmrs.wmnet with reason: host reimage
08:48 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs6001.drmrs.wmnet with reason: host reimage
08:46 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs6001.drmrs.wmnet with OS bookworm
08:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1034.eqiad.wmnet with OS bookworm
08:30 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on gerrit2003.wikimedia.org with reason: testing
08:28 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
08:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1034.eqiad.wmnet with reason: host reimage
08:25 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:24 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1034.eqiad.wmnet with reason: host reimage
08:20 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host lvs6001.drmrs.wmnet with OS bookworm
08:15 stevemunene@cumin1002: END (FAIL) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=99) for hosts an-worker1204.eqiad.wmnet
08:14 elukey@cumin2002: START - Cookbook sre.hosts.provision for host restbase1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:14 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1204.eqiad.wmnet
08:12 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker1204.eqiad.wmnet
08:10 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs6001.drmrs.wmnet with reason: host reimage
08:09 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker1204.eqiad.wmnet
08:06 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs6001.drmrs.wmnet with reason: host reimage
08:03 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
08:02 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1034.eqiad.wmnet with OS bookworm
07:58 elukey@cumin2002: START - Cookbook sre.hosts.provision for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
07:57 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
07:57 elukey@cumin1002: START - Cookbook sre.hosts.provision for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
07:50 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs6001.drmrs.wmnet with OS bookworm
07:42 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6001.drmrs.wmnet with reason: depooled before reimage
07:42 krinkle@deploy2002: Finished scap sync-world: Backport for fatal-error: Ensure action=cache max-age is higher than response time (duration: 11m 28s)
07:41 vgutierrez: depool lvs6001 before being reimaged - T384477
07:35 krinkle@deploy2002: krinkle: Continuing with sync
07:33 krinkle@deploy2002: krinkle: Backport for fatal-error: Ensure action=cache max-age is higher than response time synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:30 krinkle@deploy2002: Started scap sync-world: Backport for fatal-error: Ensure action=cache max-age is higher than response time
07:24 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74215 and previous config saved to /var/cache/conftool/dbconfig/20250313-072403-root.json
07:21 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74214 and previous config saved to /var/cache/conftool/dbconfig/20250313-072141-root.json
07:19 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1200-1208].eqiad.wmnet
07:08 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74213 and previous config saved to /var/cache/conftool/dbconfig/20250313-070857-root.json
07:06 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74212 and previous config saved to /var/cache/conftool/dbconfig/20250313-070636-root.json
06:53 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74211 and previous config saved to /var/cache/conftool/dbconfig/20250313-065351-root.json
06:51 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74210 and previous config saved to /var/cache/conftool/dbconfig/20250313-065129-root.json
06:38 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74209 and previous config saved to /var/cache/conftool/dbconfig/20250313-063846-root.json
06:36 marostegui@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74208 and previous config saved to /var/cache/conftool/dbconfig/20250313-063624-root.json
06:23 marostegui@cumin1002: dbctl commit (dc=all): 'es2039 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74207 and previous config saved to /var/cache/conftool/dbconfig/20250313-062341-root.json
06:13 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: security release
05:58 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main'.
05:40 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main'.
05:21 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main'.
03:10 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: security release
03:06 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: security release
03:00 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: security release
02:58 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release
02:56 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: security release
02:55 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release
02:53 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
02:11 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
02:05 pt1979@cumin1002: START - Cookbook sre.hosts.provision for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
01:13 Daimona: Manually fixing 5 bad abuse_filter_log rows in mediawikiwiki for T388732

2025-03-12

22:34 mforns@deploy2002: Finished deploy [airflow-dags/analytics@868fdba]: deploy CIM allow list update and DEPRECATED tags for Kubernetes migration (duration: 01m 17s)
22:33 mforns@deploy2002: Started deploy [airflow-dags/analytics@868fdba]: deploy CIM allow list update and DEPRECATED tags for Kubernetes migration
22:24 krinkle@deploy2002: Synchronized w/fatal-error.php: I1c677ca1cf7d (duration: 08m 41s)
21:19 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:17 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:15 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:15 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:14 Reedy: create translate tables on officewiki T380414
21:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:08 reedy@deploy2002: Synchronized wmf-config/: Various config changes (duration: 08m 42s)
20:57 Reedy: created wikilove tables on foundationwiki T381065
20:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
20:33 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
20:23 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
20:23 jdrewniak@deploy2002: Finished scap sync-world: Backport for Fixes event logging for main menu button (T387768), Add donation banner images (T388446) (duration: 14m 42s)
20:16 jdrewniak@deploy2002: jdrewniak, jdlrobson: Continuing with sync
20:12 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:12 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:11 jdrewniak@deploy2002: jdrewniak, jdlrobson: Backport for Fixes event logging for main menu button (T387768), Add donation banner images (T388446) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:08 jdrewniak@deploy2002: Started scap sync-world: Backport for Fixes event logging for main menu button (T387768), Add donation banner images (T388446)
20:06 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet
20:06 brett: Upgrading cp4052 (upload) to Varnish 7 (T378737)
20:06 gmodena@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:05 gmodena@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
20:00 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
20:00 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
20:00 gmodena@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:59 gmodena@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
19:38 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp404[5-7].ulsfo.wmnet} and A:cp for 9.2.9-1wm1
19:07 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp403[7,9].ulsfo.wmnet} and A:cp for 9.2.9-1wm1
19:07 ebysans@deploy2002: Finished deploy [analytics/refinery@fe214cf]: Regular analytics weekly train [analytics/refinery@fe214cfb] (duration: 02m 47s)
19:05 ebysans@deploy2002: Started deploy [analytics/refinery@fe214cf]: Regular analytics weekly train [analytics/refinery@fe214cfb]
19:04 sandraebele: deploying refinery
19:02 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp404[0-3].ulsfo.wmnet} and A:cp for 9.2.9-1wm1
18:48 swfrench-wmf: mw-(api-ext|web): scaled latent 'next' deployments down to 1 pod - T383845
18:47 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
18:46 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
18:46 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
18:46 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
18:43 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
18:43 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
18:43 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
18:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
18:36 Amir1: marking ~3K revisions with bad blobs (T351953)
18:35 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp404[0-3].ulsfo.wmnet} and A:cp for 9.2.9-1wm1
18:32 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4048.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
18:29 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4048.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
18:20 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.20 refs T386215
18:19 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4049.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
18:16 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4049.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
18:07 swfrench-wmf: ran cumin -b8 -s90 'A:cp-text' 'run-puppet-agent -e "merging ATS Lua config change - T383845"'
17:44 sandraebele: deploying refinery source as part of weekly deployment train
17:37 swfrench-wmf: ran cumin 'A:cp-text' 'disable-puppet "merging ATS Lua config change - T383845"'
17:35 swfrench-wmf: mw-(api-ext|web): scaled 'main' releases back to normal size - T383845
17:34 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
17:34 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
17:34 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
17:33 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
17:33 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
17:33 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
17:32 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
17:32 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
17:28 swfrench-wmf: mw-(api-ext|web): reverted all non-cookie-migrated traffic back to 'main' release - T383845
17:27 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4050.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
17:26 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
17:26 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
17:25 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
17:25 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
17:24 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
17:24 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
17:24 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4050.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
17:23 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
17:23 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
17:21 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
17:21 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
17:20 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
17:20 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
17:19 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
17:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
17:19 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
17:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
17:06 swfrench-wmf: mw-(api-ext|web): migrated 100% of residual PHP 7.4 traffic to 8.1 - T383845
17:06 swfrench@deploy2002: Finished scap sync-world: helmfile-only deployment to apply remaining 8.1 diffs on mw-(api-ext|web) - T383845 (duration: 05m 03s)
17:02 swfrench@deploy2002: Started scap sync-world: helmfile-only deployment to apply remaining 8.1 diffs on mw-(api-ext|web) - T383845
16:57 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
16:56 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
16:54 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
16:53 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
16:52 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
16:51 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
16:47 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
16:47 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
16:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-wmde: apply
16:44 swfrench@deploy2002: Stopping before sync operations
16:43 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-wmde: apply
16:43 swfrench@deploy2002: Started scap sync-world: No-sync scap run to update helmfile release values for mw-(api-ext|web) - T383845
16:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-research: apply
16:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-research: apply
16:42 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-search: apply
16:41 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-search: apply
16:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-ml: apply
16:39 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti1034.eqiad.wmnet
16:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
16:39 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-ml: apply
16:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
16:36 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
16:35 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
16:34 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
16:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
16:24 moritzm: installing Redis security updates
16:07 godog: bounce mtail on centrallog1002 - hogging the cpu
16:06 moritzm: installing qemu security updates
16:00 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.admin (exit_code=0) config_reloading P{lvs6002.drmrs.wmnet} and A:liberica (T384477)
16:00 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.admin config_reloading P{lvs6002.drmrs.wmnet} and A:liberica (T384477)
15:55 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1034.eqiad.wmnet
15:48 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1034.eqiad.wmnet with reason: remove from cluster for reimage
15:44 ladsgroup@deploy2002: Finished scap sync-world: Backport for Bump the thumbnail steps ratio to 5% (T360589) (duration: 11m 30s)
15:38 ladsgroup@deploy2002: ladsgroup: Continuing with sync
15:36 ladsgroup@deploy2002: ladsgroup: Backport for Bump the thumbnail steps ratio to 5% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:33 ladsgroup@deploy2002: Started scap sync-world: Backport for Bump the thumbnail steps ratio to 5% (T360589)
15:30 mszabo@deploy2002: Finished scap sync-world: Backport for GlobalUserSelectQueryBuilder: Ignore unattached local users (T388125), http: Promote MultiHttpClient warnings to errors (T384717) (duration: 12m 01s)
15:24 mszabo@deploy2002: mszabo: Continuing with sync
15:22 mszabo@deploy2002: mszabo: Backport for GlobalUserSelectQueryBuilder: Ignore unattached local users (T388125), http: Promote MultiHttpClient warnings to errors (T384717) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1034.eqiad.wmnet
15:18 mszabo@deploy2002: Started scap sync-world: Backport for GlobalUserSelectQueryBuilder: Ignore unattached local users (T388125), http: Promote MultiHttpClient warnings to errors (T384717)
15:17 Emperor: storcli64 /c0 restart on ms-be1090 T384003
15:14 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6002.drmrs.wmnet with OS bookworm
15:12 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
15:11 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::internal_main@eqiad
15:11 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
15:10 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
15:10 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
15:06 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::internal_main@eqiad
15:00 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::internal_main@codfw
15:00 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
14:59 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
14:55 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1200-1208].eqiad.wmnet
14:55 stevemunene@cumin1002: END (PASS) - Cookbook sre.hadoop.init-hadoop-workers (exit_code=0) for hosts an-worker[1187-1199].eqiad.wmnet
14:55 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Improve SPARQL query construction in SparqlHelper, Replace distinct-values SPARQL queries (T369079), Improve SPARQL query construction in SparqlHelper, Replace distinct-values SPARQL queries (T369079) (duration: 12m 58s)
14:53 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::internal_main@codfw
14:53 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs6002.drmrs.wmnet with reason: host reimage
14:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
14:50 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
14:49 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs6002.drmrs.wmnet with reason: host reimage
14:48 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
14:45 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Improve SPARQL query construction in SparqlHelper, Replace distinct-values SPARQL queries (T369079), Improve SPARQL query construction in SparqlHelper, Replace distinct-values SPARQL queries (T369079) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:42 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Improve SPARQL query construction in SparqlHelper, Replace distinct-values SPARQL queries (T369079), Improve SPARQL query construction in SparqlHelper, Replace distinct-values SPARQL queries (T369079)
14:40 tgr@deploy2002: Finished scap sync-world: Backport for Remove Flow as the default talk system (T383569) (duration: 11m 32s)
14:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:36 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:33 tgr@deploy2002: zoe, tgr: Continuing with sync
14:32 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs6002.drmrs.wmnet with OS bookworm
14:31 tgr@deploy2002: zoe, tgr: Backport for Remove Flow as the default talk system (T383569) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:30 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/kartotherian: sync
14:29 elukey@deploy2002: helmfile [staging] START helmfile.d/services/kartotherian: sync
14:28 tgr@deploy2002: Started scap sync-world: Backport for Remove Flow as the default talk system (T383569)
14:26 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs6002.drmrs.wmnet with reason: depooled before reimage
14:26 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:26 tgr@deploy2002: Finished scap sync-world: Backport for Add MP event stream for MassDelete workflows (T382147), Enable SUL3 signup for 50% of group 2 users (T384218), [enwiki] Throttle exemption for event (T388637) (duration: 11m 04s)
14:26 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:26 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:26 vgutierrez: depooling lvs6002 before getting reimaged - T384477
14:24 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:24 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:23 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
14:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
14:22 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
14:21 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
14:20 jforrester@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:20 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:19 tgr@deploy2002: jsn, tgr, superpes: Continuing with sync
14:19 jforrester@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:19 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:18 stevemunene@cumin1002: START - Cookbook sre.hadoop.init-hadoop-workers for hosts an-worker[1187-1199].eqiad.wmnet
14:18 jforrester@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:18 tgr@deploy2002: jsn, tgr, superpes: Backport for Add MP event stream for MassDelete workflows (T382147), Enable SUL3 signup for 50% of group 2 users (T384218), [enwiki] Throttle exemption for event (T388637) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:17 jforrester@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:17 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
14:17 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:16 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:16 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
14:15 tgr@deploy2002: Started scap sync-world: Backport for Add MP event stream for MassDelete workflows (T382147), Enable SUL3 signup for 50% of group 2 users (T384218), [enwiki] Throttle exemption for event (T388637)
14:13 jforrester@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:13 jforrester@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
13:44 jiji@deploy2002: Finished scap sync-world: Reverted 1126607 and 1126650 (duration: 04m 57s)
13:40 jiji@deploy2002: Started scap sync-world: Reverted 1126607 and 1126650
13:37 ladsgroup@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
13:36 ladsgroup@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
13:36 ladsgroup@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
13:34 sukhe: upgrade doh2002 to dnsdist 1.9.8
13:34 ladsgroup@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
13:34 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
13:34 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
13:34 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
13:34 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
13:33 ladsgroup@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
13:32 sukhe: upgrade doh1001 to dnsdist 1.9.8
13:32 ladsgroup@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
13:20 brouberol@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:20 brouberol@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
13:19 brouberol@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:19 brouberol@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
13:01 Emperor: fio testing on ms-be2088 24 disks at once whilst resetting the controller T384003
12:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
12:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1034.eqiad.wmnet
12:23 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
12:23 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1034.eqiad.wmnet
12:22 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
12:10 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs6003.drmrs.wmnet with OS bookworm
11:55 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::internal@eqiad
11:55 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
11:54 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
11:50 Emperor: fio testing on ms-be2088 24 disks at once T384003
11:44 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::internal@eqiad
11:42 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs6003.drmrs.wmnet with reason: host reimage
11:39 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs6003.drmrs.wmnet with reason: host reimage
11:39 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::internal@codfw
11:39 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
11:38 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
11:31 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::internal@codfw
11:21 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs6003.drmrs.wmnet with OS bookworm
11:18 vgutierrez: reimage lvs6003 as a liberica instance - T384477
11:17 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
11:16 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
11:16 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
11:15 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
11:13 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
11:13 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
11:11 Emperor: fio testing on ms-be2088 while resetting controller T384003
11:05 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ms-be1091.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
11:05 elukey@cumin2002: START - Cookbook sre.hosts.provision for host ms-be1091.mgmt.eqiad.wmnet with chassis set policy GRACEFUL_RESTART
10:57 jiji@deploy2002: scap failed: <KeyError> 'production' (scap version: 4.140.0) (duration: 13m 54s)
10:53 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:48 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:44 jiji@deploy2002: Started scap sync-world: (T383845) mw-(api-int|parsoid|jobrunner): switch all releases to PHP 8.1
10:43 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:42 jynus: removing backup1002, backup2002 dbbackups user @ m1 T387892
10:38 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:36 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:36 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:19 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1037.eqiad.wmnet to cluster eqiad and group C
10:18 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1037.eqiad.wmnet to cluster eqiad and group C
10:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1037.eqiad.wmnet
10:14 jynus: removing backup1002, backup2002 dump user on es6,es7 T387892
10:14 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:13 moritzm: installing systemd bugfix updates from Bookworm point release
10:08 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
10:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1037.eqiad.wmnet
09:53 Emperor: fio testing on ms-be2088 T384003
09:45 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-main: apply
09:44 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-main: apply
09:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db1125.eqiad.wmnet
09:33 marostegui@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:33 marostegui@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1125.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
09:32 marostegui@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db1125.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - marostegui@cumin1002"
09:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1037.eqiad.wmnet with OS bookworm
09:16 marostegui@cumin1002: START - Cookbook sre.dns.netbox
09:10 marostegui@cumin1002: START - Cookbook sre.hosts.decommission for hosts db1125.eqiad.wmnet
09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1037.eqiad.wmnet with reason: host reimage
09:03 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1037.eqiad.wmnet with reason: host reimage
08:55 oblivian@deploy2002: Finished scap sync-world: Updating k8s chart (duration: 03m 42s)
08:52 oblivian@deploy2002: Started scap sync-world: Updating k8s chart
08:50 slyngshede@dns1004: END - running authdns-update
08:48 slyngshede@dns1004: START - running authdns-update
08:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1176.eqiad.wmnet with reason: Maintenance
08:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2230.codfw.wmnet,db1125.eqiad.wmnet with reason: Maintenance
08:40 oblivian@deploy2002: Finished scap sync-world: Backport for noc/wiki.php: allow showing a single variable in json format (duration: 09m 34s)
08:37 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1037.eqiad.wmnet with OS bookworm
08:33 oblivian@deploy2002: oblivian: Continuing with sync
08:33 oblivian@deploy2002: oblivian: Backport for noc/wiki.php: allow showing a single variable in json format synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:32 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1176.eqiad.wmnet
08:30 oblivian@deploy2002: Started scap sync-world: Backport for noc/wiki.php: allow showing a single variable in json format
08:28 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1176.eqiad.wmnet
08:25 hashar@deploy2002: Finished scap sync-world: Backport for Remove obsolete $wgAllowMicrodataAttributes, Remove wgArticlePlaceholderSearchIntegrationBackend (T207407), Remove obsolete CirrusSearch config, Fix wgCirrusSearchSimilarityProfiles, Remove Cognate legacy settings (T348526), [[gerrit:1125124|Remove obsolete $wgFlowMai
08:24 marostegui: Failover m5 from db1176 to db1228 - T388500
08:19 hashar@deploy2002: reedy, hashar: Continuing with sync
08:16 hashar@deploy2002: reedy, hashar: Backport for Remove obsolete $wgAllowMicrodataAttributes, Remove wgArticlePlaceholderSearchIntegrationBackend (T207407), Remove obsolete CirrusSearch config, Fix wgCirrusSearchSimilarityProfiles, Remove Cognate legacy settings (T348526), [[gerrit:1125124|Remove obsolete $wgFlowMaintenanceMod
08:16 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti1037.eqiad.wmnet
08:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1037.eqiad.wmnet
08:12 hashar@deploy2002: Started scap sync-world: Backport for Remove obsolete $wgAllowMicrodataAttributes, Remove wgArticlePlaceholderSearchIntegrationBackend (T207407), Remove obsolete CirrusSearch config, Fix wgCirrusSearchSimilarityProfiles, Remove Cognate legacy settings (T348526), [[gerrit:1125124|Remove obsolete $wgFlowMain
08:03 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1037.eqiad.wmnet
08:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2235].codfw.wmnet,db[1176,1217,1228].eqiad.wmnet with reason: m5 master switch T388500
07:26 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1037.eqiad.wmnet
03:12 eileen: civicrm upgraded from ec20a105 to 14afd1b8

2025-03-11

22:28 ejegg: payments-wiki upgraded from 6409fffa to 3d4dfab3
21:59 reedy@deploy2002: Synchronized private/: various cleanup (duration: 08m 45s)
20:52 dzahn@dns1004: END - running authdns-update
20:50 dzahn@dns1004: START - running authdns-update
20:49 jhuneidi@deploy2002: Finished scap sync-world: Backport for Silence TRX profiler in deferreds after autocreation (T388165), Silence TRX profiler in deferreds after autocreation (T388165) (duration: 13m 05s)
20:42 jhuneidi@deploy2002: jhuneidi, tgr: Continuing with sync
20:39 jhuneidi@deploy2002: jhuneidi, tgr: Backport for Silence TRX profiler in deferreds after autocreation (T388165), Silence TRX profiler in deferreds after autocreation (T388165) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:35 jhuneidi@deploy2002: Started scap sync-world: Backport for Silence TRX profiler in deferreds after autocreation (T388165), Silence TRX profiler in deferreds after autocreation (T388165)
20:18 jhuneidi@deploy2002: Finished scap sync-world: Backport for Deploy donate banner to test wiki for event logging testing (T387768) (duration: 12m 33s)
20:12 jhuneidi@deploy2002: ksarabia, jhuneidi: Continuing with sync
20:10 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic1012* for ban host prior to reimage - bking@cumin2002 - T387904
20:10 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1012* for ban host prior to reimage - bking@cumin2002 - T387904
20:09 jhuneidi@deploy2002: ksarabia, jhuneidi: Backport for Deploy donate banner to test wiki for event logging testing (T387768) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:06 jhuneidi@deploy2002: Started scap sync-world: Backport for Deploy donate banner to test wiki for event logging testing (T387768)
19:51 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in cloudelastic
19:51 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in cloudelastic
19:37 jhuneidi@deploy2002: Finished scap sync-world: Backport for api: guard against undefined prop relations (T384627), api: guard against undefined prop relations (T384627) (duration: 09m 53s)
19:30 jhuneidi@deploy2002: reedy, jhuneidi: Continuing with sync
19:30 jhuneidi@deploy2002: reedy, jhuneidi: Backport for api: guard against undefined prop relations (T384627), api: guard against undefined prop relations (T384627) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:27 jhuneidi@deploy2002: Started scap sync-world: Backport for api: guard against undefined prop relations (T384627), api: guard against undefined prop relations (T384627)
19:04 bking@cumin2002: DONE (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for cloudelastic1011.eqiad.wmnet: Renew puppet certificate - bking@cumin2002
19:00 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1011.eqiad.wmnet with OS bullseye
18:34 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1011.eqiad.wmnet with reason: host reimage
18:30 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1011.eqiad.wmnet with reason: host reimage
18:25 jhuneidi@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.20 refs T386215
18:19 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1011.eqiad.wmnet with OS bullseye
17:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1011.eqiad.wmnet with OS bullseye
17:48 swfrench-wmf: mw-(api-ext|web): migrated 50% of residual PHP 7.4 traffic to 8.1 - T383845
17:46 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
17:46 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
17:46 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
17:46 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
17:43 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
17:43 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
17:43 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
17:43 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
17:39 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1011.eqiad.wmnet with reason: host reimage
17:39 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
17:38 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
17:38 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
17:38 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
17:35 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1011.eqiad.wmnet with reason: host reimage
17:35 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
17:35 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
17:34 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
17:34 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
17:24 swfrench@deploy2002: Finished scap sync-world: Deployment to pick up new php8.1 production image - T386006 (duration: 26m 26s)
17:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1011.eqiad.wmnet with OS bullseye
17:14 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4051.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
17:11 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4051.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
17:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T371742)', diff saved to https://phabricator.wikimedia.org/P74200 and previous config saved to /var/cache/conftool/dbconfig/20250311-171052-ladsgroup.json
16:58 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic1011* for ban host prior to reimage - bking@cumin2002 - T387904
16:58 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1011* for ban host prior to reimage - bking@cumin2002 - T387904
16:58 swfrench@deploy2002: Started scap sync-world: Deployment to pick up new php8.1 production image - T386006
16:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P74199 and previous config saved to /var/cache/conftool/dbconfig/20250311-165545-ladsgroup.json
16:54 swfrench-wmf: rebuilt php8.1 production images to pick up PCRE2 backport from component/php81 - T386006
16:53 vgutierrez: test liberica 0.11 in lvs1013
16:53 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:52 vgutierrez: upload liberica 0.11 to bookworm-wikimedia (apt.wm.o)
16:51 herron@cumin1002: START - Cookbook sre.dns.netbox
16:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P74198 and previous config saved to /var/cache/conftool/dbconfig/20250311-164038-ladsgroup.json
16:36 brennen@deploy2002: Finished deploy [phabricator/deployment@714f3c7]: redeploy phab1004 for T309222 (duration: 01m 40s)
16:34 brennen@deploy2002: Started deploy [phabricator/deployment@714f3c7]: redeploy phab1004 for T309222
16:33 brennen@deploy2002: Finished deploy [phabricator/deployment@714f3c7]: redeploy phab2002 for T309222 (duration: 01m 03s)
16:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:33 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
16:32 brennen@deploy2002: Started deploy [phabricator/deployment@714f3c7]: redeploy phab2002 for T309222
16:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T371742)', diff saved to https://phabricator.wikimedia.org/P74197 and previous config saved to /var/cache/conftool/dbconfig/20250311-162530-ladsgroup.json
16:19 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
16:18 swfrench@deploy2002: Finished scap sync-world: No-op deploy to pick up mediawiki-deployments.yaml changes - T387917 (duration: 02m 42s)
16:16 swfrench@deploy2002: Started scap sync-world: No-op deploy to pick up mediawiki-deployments.yaml changes - T387917
16:03 brennen@deploy2002: Finished deploy [phabricator/deployment@714f3c7]: deploy phab1004 for T388551 (duration: 01m 02s)
16:01 brennen@deploy2002: Started deploy [phabricator/deployment@714f3c7]: deploy phab1004 for T388551
16:01 brennen@deploy2002: Finished deploy [phabricator/deployment@714f3c7]: deploy phab2002 for T388551 (duration: 00m 29s)
16:01 brennen@deploy2002: Started deploy [phabricator/deployment@714f3c7]: deploy phab2002 for T388551
15:59 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: phabricator deploy
15:59 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubestagemaster[2003-2005].codfw.wmnet
15:59 jelto@cumin1002: START - Cookbook sre.hosts.remove-downtime for kubestagemaster[2003-2005].codfw.wmnet
15:59 dzahn@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: phabricator deploy
15:58 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubestage[2001-2004].codfw.wmnet
15:58 jelto@cumin1002: START - Cookbook sre.hosts.remove-downtime for kubestage[2001-2004].codfw.wmnet
15:58 dzahn@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phab.wmfusercontent.org with reason: phabricator deploy
15:58 dzahn@cumin1002: DONE (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:30:00 on phabricator.wikimedia.org with reason: phabricator deploy
15:52 vgutierrez: upload liberica 0.10 to bookworm-wikimedia (apt.wm.o)
15:49 vgutierrez: test liberica 0.10 in lvs1013
15:45 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/ipoid: apply
15:45 jelto@deploy2002: helmfile [staging] START helmfile.d/services/ipoid: apply
15:37 klausman@deploy2002: helmfile [codfw] DONE helmfile.d/services/api-gateway: apply
15:36 klausman@deploy2002: helmfile [codfw] START helmfile.d/services/api-gateway: apply
15:33 brouberol@deploy2002: Finished scap sync-world: mediawiki: render configmaps when dumps are enabled - T388378 (duration: 02m 18s)
15:32 brouberol@deploy2002: Started scap sync-world: mediawiki: render configmaps when dumps are enabled - T388378
15:26 klausman@deploy2002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: apply
15:25 klausman@deploy2002: helmfile [eqiad] START helmfile.d/services/api-gateway: apply
15:25 Lucas_WMDE: UTC afternoon backport+config window done
15:24 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Revert "ResourceLoader: Enable Less.php math=parens-division" (T388475 T388526), Enable SUL3 signup for 10% of group 2 users (T384218), Disable CX unified dashboard on idwiki (T387820) (duration: 17m 22s)
15:21 klausman@deploy2002: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
15:20 klausman@deploy2002: helmfile [staging] START helmfile.d/services/api-gateway: apply
15:18 lucaswerkmeister-wmde@deploy2002: sbisson, tgr, lucaswerkmeister-wmde: Continuing with sync
15:10 lucaswerkmeister-wmde@deploy2002: sbisson, tgr, lucaswerkmeister-wmde: Backport for Revert "ResourceLoader: Enable Less.php math=parens-division" (T388475 T388526), Enable SUL3 signup for 10% of group 2 users (T384218), Disable CX unified dashboard on idwiki (T387820) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
15:10 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
15:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
15:08 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
15:07 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Revert "ResourceLoader: Enable Less.php math=parens-division" (T388475 T388526), Enable SUL3 signup for 10% of group 2 users (T384218), Disable CX unified dashboard on idwiki (T387820)
14:58 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Revert "Set `$wgCentralAuthLoginWiki` to correct default as documented" (duration: 11m 28s)
14:52 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, d3r1ck01: Continuing with sync
14:50 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, d3r1ck01: Backport for Revert "Set `$wgCentralAuthLoginWiki` to correct default as documented" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:49 godog: moving k8s-mlstaging off prometheus200[56] completed - T383232
14:47 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Revert "Set `$wgCentralAuthLoginWiki` to correct default as documented"
14:37 Lucas_WMDE: accidentally Ctrl+C’ed ongoing scap, was last seen at 80% sync-prod-k8s progress
14:31 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, abi: Continuing with sync
14:31 filippo@cumin1002: conftool action : set/pooled=yes; selector: name=prometheus2006.codfw.wmnet
14:24 filippo@cumin1002: conftool action : set/pooled=yes; selector: name=prometheus2008.codfw.wmnet
14:19 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, abi: Backport for EventLogging: Improve handling when suggestions are not present (T388467) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:16 filippo@cumin1002: conftool action : set/weight=10; selector: name=prometheus2008.codfw.wmnet
14:16 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for EventLogging: Improve handling when suggestions are not present (T388467)
14:15 filippo@cumin1002: conftool action : set/pooled=no; selector: name=prometheus2008.codfw.wmnet
14:15 filippo@cumin1002: conftool action : set/pooled=no; selector: name=prometheus2006.codfw.wmnet
14:15 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Set `$wgCentralAuthLoginWiki` to correct default as documented (T388218) (duration: 11m 35s)
14:09 lucaswerkmeister-wmde@deploy2002: d3r1ck01, lucaswerkmeister-wmde: Continuing with sync
14:06 lucaswerkmeister-wmde@deploy2002: d3r1ck01, lucaswerkmeister-wmde: Backport for Set `$wgCentralAuthLoginWiki` to correct default as documented (T388218) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:03 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Set `$wgCentralAuthLoginWiki` to correct default as documented (T388218)
13:53 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Remove Wikibase fixed RDF feature flag again (T384344) (duration: 09m 31s)
13:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T371742)', diff saved to https://phabricator.wikimedia.org/P74194 and previous config saved to /var/cache/conftool/dbconfig/20250311-135019-ladsgroup.json
13:48 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
13:47 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
13:47 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
13:47 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Remove Wikibase fixed RDF feature flag again (T384344) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:44 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Remove Wikibase fixed RDF feature flag again (T384344)
13:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P74193 and previous config saved to /var/cache/conftool/dbconfig/20250311-133512-ladsgroup.json
13:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P74192 and previous config saved to /var/cache/conftool/dbconfig/20250311-132005-ladsgroup.json
13:16 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
13:10 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
13:10 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
13:09 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
13:09 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
13:08 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
13:07 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
13:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T371742)', diff saved to https://phabricator.wikimedia.org/P74191 and previous config saved to /var/cache/conftool/dbconfig/20250311-130458-ladsgroup.json
12:57 marostegui: Poweroff db1246 T387673
12:57 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
12:56 marostegui: Stop MariaDB on db1246 T387673
12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T371742)', diff saved to https://phabricator.wikimedia.org/P74190 and previous config saved to /var/cache/conftool/dbconfig/20250311-125458-ladsgroup.json
12:54 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
12:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1160 (T371742)', diff saved to https://phabricator.wikimedia.org/P74189 and previous config saved to /var/cache/conftool/dbconfig/20250311-125007-ladsgroup.json
12:50 ladsgroup@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1160.eqiad.wmnet with reason: Maintenance
12:42 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kafka-main[1001-1005].eqiad.wmnet
12:42 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:42 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-main[1001-1005].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
12:41 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kafka-main[1001-1005].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
12:40 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps1010.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
12:40 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps1009.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
12:39 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
12:39 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
12:38 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
12:38 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
12:38 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1037.eqiad.wmnet with reason: remove from cluster for reimage
12:37 jiji@cumin1002: START - Cookbook sre.dns.netbox
12:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1037.eqiad.wmnet
12:35 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1254 gradually with 4 steps - Pool in for T385141
12:35 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
12:34 elukey@deploy2002: helmfile [codfw] START helmfile.d/admin 'sync'.
12:34 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
12:33 elukey@deploy2002: helmfile [eqiad] START helmfile.d/admin 'sync'.
12:31 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
12:30 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: sync
12:23 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
12:23 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: sync
12:18 ladsgroup@deploy2002: Finished scap sync-world: Backport for Bump thumbnail steps to 2% (T360589) (duration: 10m 18s)
12:16 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps2010.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
12:16 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps2009.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
12:11 jiji@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kafka-main1005.eqiad.wmnet with reason: decom
12:11 ladsgroup@deploy2002: ladsgroup: Continuing with sync
12:11 jiji@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kafka-main1004.eqiad.wmnet with reason: decom
12:11 jiji@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kafka-main1003.eqiad.wmnet with reason: decom
12:11 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kafka-main[1001-1005].eqiad.wmnet
12:11 jiji@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kafka-main1002.eqiad.wmnet with reason: decom
12:10 ladsgroup@deploy2002: ladsgroup: Backport for Bump thumbnail steps to 2% (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:09 jiji@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kafka-main1001.eqiad.wmnet with reason: decom
12:08 ladsgroup@deploy2002: Started scap sync-world: Backport for Bump thumbnail steps to 2% (T360589)
12:04 ladsgroup@deploy2002: Finished scap sync-world: Backport for FileModule: Normalize file paths for deps tracked from CSSMin (T388323) (duration: 13m 41s)
11:55 ladsgroup@deploy2002: ladsgroup: Continuing with sync
11:55 ladsgroup@deploy2002: ladsgroup: Backport for FileModule: Normalize file paths for deps tracked from CSSMin (T388323) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:50 ladsgroup@deploy2002: Started scap sync-world: Backport for FileModule: Normalize file paths for deps tracked from CSSMin (T388323)
11:50 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1254 gradually with 4 steps - Pool in for T385141
11:48 fceratto@cumin1002: dbctl commit (dc=all): 'Preparing db1254 for T385141', diff saved to https://phabricator.wikimedia.org/P74183 and previous config saved to /var/cache/conftool/dbconfig/20250311-114835-fceratto.json
11:43 ladsgroup@deploy2002: Finished scap sync-world: Backport for Stop loading the ActiveAbstract extension for dumps (T382069) (duration: 13m 36s)
11:41 Amir1: dropping transcache table everywhere (T376627)
11:34 ladsgroup@deploy2002: ladsgroup, jforrester: Continuing with sync
11:34 ladsgroup@deploy2002: ladsgroup, jforrester: Backport for Stop loading the ActiveAbstract extension for dumps (T382069) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:31 topranks: enable connections from ssw1-e1 and ssw1-f1 to new top-of-rack switches lsw1-e8 and lsw1-f8 in eqiad T382017
11:30 ladsgroup@deploy2002: Started scap sync-world: Backport for Stop loading the ActiveAbstract extension for dumps (T382069)
11:28 jelto@cumin1002: END (FAIL) - Cookbook sre.k8s.wipe-cluster (exit_code=99) Wipe the K8s cluster staging-codfw: Kubernetes upgrade
11:24 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps1008.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
11:24 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps2008.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
11:24 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "add new network switches - cmooney@cumin1002 - T382017"
11:23 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "add new network switches - cmooney@cumin1002 - T382017"
11:21 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-f8-eqiad
11:19 MichaelG_WMF: migr@mwmaint2002: ran "time mwscript extensions/GrowthExperiments/maintenance/fixLinkRecommendationData.php --wiki=frwiki --db-table --verbose --force 2>&1 | tee ~/frwiki-dbtable.txt"
11:19 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-f8-eqiad
11:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-e8-eqiad
11:16 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-e8-eqiad
11:16 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1253 gradually with 4 steps - Pool in for T385141
11:05 ladsgroup@deploy2002: ladsgroup, jforrester: Continuing with sync
11:05 ladsgroup@deploy2002: ladsgroup, jforrester: Backport for Stop loading the ActiveAbstract extension for dumps (T382069) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:59 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:56 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps2007.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
10:56 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps1007.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
10:55 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:54 jelto@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
10:54 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:53 jelto@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
10:41 moritzm: installing openjdk 17 security updates on puppet servers (the necessary restarts may cause a few interrupted puppet runs and will be splayed out)
10:37 ladsgroup@deploy2002: Started scap sync-world: Backport for Stop loading the ActiveAbstract extension for dumps (T382069)
10:36 jelto@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
10:30 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1253 gradually with 4 steps - Pool in for T385141
10:22 marostegui: Deploy schema change on x1 commonswiki codfw master with replication dbmaint T385917
10:21 marostegui: Deploy schema change on s4 testcommonswiki codfw master with replication dbmaint T385917
10:18 jelto@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
10:15 dcausse@deploy2002: Finished deploy [airflow-dags/search@c27621d]: publish search artifacts (duration: 00m 29s)
10:14 dcausse@deploy2002: Started deploy [airflow-dags/search@c27621d]: publish search artifacts
10:06 jelto@cumin1002: START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster staging-codfw: Kubernetes upgrade
10:05 jelto@cumin1002: END (FAIL) - Cookbook sre.k8s.wipe-cluster (exit_code=99) Wipe the K8s cluster staging-codfw: Kubernetes upgrade
10:01 jelto@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
09:58 jelto@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
09:57 jelto@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
09:57 jelto@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
09:55 jelto@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
09:55 jelto@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
09:53 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.liberica-admin (exit_code=1) depooling P{lvs4010.ulsfo.wmnet} and A:liberica
09:52 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.liberica-admin depooling P{lvs4010.ulsfo.wmnet} and A:liberica
09:48 jelto@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
09:48 jelto@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
09:41 kart_: Script run: `mwscript updateCollation.php --wiki=kkwiki --previous-collation=uppercase` (T384395)
09:32 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps1006.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
09:32 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps2006.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
09:28 jelto@cumin1002: START - Cookbook sre.k8s.wipe-cluster Wipe the K8s cluster staging-codfw: Kubernetes upgrade
08:59 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1037.eqiad.wmnet
08:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1037.eqiad.wmnet
08:51 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2144.codfw.wmnet,db1151.eqiad.wmnet with reason: Maintenance
08:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1037.eqiad.wmnet
08:47 kartik@deploy2002: Finished scap sync-world: Backport for Add uca collation for Kazakh (T384395) (duration: 12m 13s)
08:41 kartik@deploy2002: kartik, jhsoby: Continuing with sync
08:38 kartik@deploy2002: kartik, jhsoby: Backport for Add uca collation for Kazakh (T384395) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
08:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
08:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
08:37 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
08:35 kartik@deploy2002: Started scap sync-world: Backport for Add uca collation for Kazakh (T384395)
08:32 kartik@deploy2002: Finished scap sync-world: Backport for EventLogging: Improve handling when suggestions are not present (T388467) (duration: 26m 56s)
08:23 kartik@deploy2002: abi, kartik: Continuing with sync
08:12 kartik@deploy2002: abi, kartik: Backport for EventLogging: Improve handling when suggestions are not present (T388467) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:08 moritzm: installing systemd bugfix updates from Bookworm point release
08:05 kartik@deploy2002: Started scap sync-world: Backport for EventLogging: Improve handling when suggestions are not present (T388467)
08:02 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: Cloning
08:00 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1217.eqiad.wmnet with reason: Maintenance
07:59 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: Maintenance
07:23 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1228.eqiad.wmnet
07:19 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1228.eqiad.wmnet
07:19 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: Maintenance
07:13 marostegui: Failover m2 from db1228 to db1164 - T388396
07:00 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2233].codfw.wmnet,db[1164,1217,1228].eqiad.wmnet with reason: Primary switchover m2 T388396
06:45 marostegui: Drop rt database from m1 T388437
06:45 marostegui: Remove rt grants from m1 T388437
04:03 mwpresync@deploy2002: Pruned MediaWiki: 1.44.0-wmf.17 (duration: 03m 02s)
03:54 eileen: civicrm upgraded from f2222fcd to ec20a105
03:52 mwpresync@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.20 refs T386215 (duration: 49m 13s)
03:03 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.20 refs T386215
00:22 aaron@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
00:21 aaron@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
00:18 aaron@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
00:13 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
00:08 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
00:07 pt1979@cumin1002: START - Cookbook sre.hosts.provision for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
00:07 aaron@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
00:03 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
00:01 pt1979@cumin1002: START - Cookbook sre.hosts.provision for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
00:00 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED

2025-03-10

23:47 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ms-be2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
23:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ms-be2089.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2089
23:38 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2089
23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:38 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2089 to codfw - jhancock@cumin2002"
23:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ms-be2089 to codfw - jhancock@cumin2002"
23:34 jhancock@cumin2002: START - Cookbook sre.dns.netbox
23:31 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ms-be2089
23:31 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ms-be2089
21:48 tgr_: UTC late deploys done
21:48 tgr@deploy2002: Finished scap sync-world: Backport for Enable SUL3 signup for all of group 1 and 1% of group 2 users (T384007 T384218) (duration: 15m 21s)
21:42 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:41 tgr@deploy2002: tgr: Continuing with sync
21:41 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:35 tgr@deploy2002: tgr: Backport for Enable SUL3 signup for all of group 1 and 1% of group 2 users (T384007 T384218) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:32 tgr@deploy2002: Started scap sync-world: Backport for Enable SUL3 signup for all of group 1 and 1% of group 2 users (T384007 T384218)
21:28 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:23 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:22 jclark@cumin1002: START - Cookbook sre.hosts.provision for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:21 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:21 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:17 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:17 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host restbase1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:15 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1045.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:14 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1044.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:14 fabfur: installed new benthos version (4.27.0-2 over 4.27.0-1) on cp4037 for testing
21:14 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:14 jclark@cumin1002: START - Cookbook sre.hosts.provision for host restbase1043.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host db1257.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART
21:11 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:11 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for restbase - jclark@cumin1002"
21:11 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added mgmt for restbase - jclark@cumin1002"
21:07 jclark@cumin1002: START - Cookbook sre.dns.netbox
20:31 dancy@deploy2002: Finished scap sync-world: Backport for CX3 Build 1.0.0+20250310 (T284422 T387036) (duration: 10m 46s)
20:25 dancy@deploy2002: sbisson, dancy: Continuing with sync
20:23 dancy@deploy2002: sbisson, dancy: Backport for CX3 Build 1.0.0+20250310 (T284422 T387036) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:20 dancy@deploy2002: Started scap sync-world: Backport for CX3 Build 1.0.0+20250310 (T284422 T387036)
20:19 dancy@deploy2002: Finished scap sync-world: Backport for Remove $wgAllowAuthenticatedCrossOrigin again (T322944) (duration: 11m 18s)
20:13 dancy@deploy2002: lucaswerkmeister, dancy: Continuing with sync
20:11 dancy@deploy2002: lucaswerkmeister, dancy: Backport for Remove $wgAllowAuthenticatedCrossOrigin again (T322944) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:08 dancy@deploy2002: Started scap sync-world: Backport for Remove $wgAllowAuthenticatedCrossOrigin again (T322944)
20:06 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4038.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
20:03 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4038.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
19:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1010.eqiad.wmnet with OS bullseye
19:52 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in cloudelastic
19:52 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in cloudelastic
19:25 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage
19:22 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1010.eqiad.wmnet with reason: host reimage
19:14 ladsgroup@deploy2002: Finished scap sync-world: Backport for FileModule: Normalize file paths for deps tracked from CSSMin (T388323) (duration: 10m 53s)
19:11 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-f8-eqiad.mgmt.eqiad.wmnet
19:08 ladsgroup@deploy2002: ladsgroup: Continuing with sync
19:06 ladsgroup@deploy2002: ladsgroup: Backport for FileModule: Normalize file paths for deps tracked from CSSMin (T388323) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:03 ladsgroup@deploy2002: Started scap sync-world: Backport for FileModule: Normalize file paths for deps tracked from CSSMin (T388323)
19:03 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye
19:02 cmooney@cumin1002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-e8-eqiad.mgmt.eqiad.wmnet
18:44 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:44 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-f8-eqiad - cmooney@cumin1002"
18:44 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-f8-eqiad - cmooney@cumin1002"
18:39 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudelastic1010.eqiad.wmnet with OS bullseye
18:34 cmooney@cumin1002: START - Cookbook sre.dns.netbox
18:34 cmooney@cumin1002: START - Cookbook sre.network.provision for device lsw1-f8-eqiad.mgmt.eqiad.wmnet
18:32 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:32 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-e8-eqiad - cmooney@cumin1002"
18:27 cmooney@dns2005: END - running authdns-update
18:26 cmooney@dns2005: START - running authdns-update
18:25 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-e8-eqiad - cmooney@cumin1002"
18:21 cmooney@cumin1002: START - Cookbook sre.dns.netbox
18:21 cmooney@cumin1002: START - Cookbook sre.network.provision for device lsw1-e8-eqiad.mgmt.eqiad.wmnet
18:17 sukhe: restart pybal on lvs2013: not required but to clear up possible no restart alerts
18:16 sukhe: sudo cumin 'A:lvs-codfw' 'run-puppet-agent --enable "adding k8s-ingress-aux codfw"'
18:14 sukhe: restart pybal on lvs2014 for reverted aux-k8s change
18:12 brett@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4044.ulsfo.wmnet
18:03 herron@puppetserver1001: conftool action : set/pooled=no; selector: name=aux-k8s-worker2004.codfw.wmnet
18:03 herron@puppetserver1001: conftool action : set/pooled=no; selector: name=aux-k8s-worker2002.codfw.wmnet
17:58 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye
17:58 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
17:56 herron@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker2005.codfw.wmnet
17:56 herron@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker2003.codfw.wmnet
17:55 sukhe: restart pybal on lvs2014
17:54 cgoubert@deploy2002: Finished scap sync-world: mw-cron to php 8.1 - T387916 (duration: 02m 49s)
17:52 cgoubert@deploy2002: Started scap sync-world: mw-cron to php 8.1 - T387916
17:49 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
17:48 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
17:48 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
17:47 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
17:47 swfrench-wmf: mw-(api-ext|web): migrated 25% of residual PHP 7.4 traffic to 8.1 - T383845
17:46 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
17:45 sukhe: sudo cumin 'A:lvs-codfw' 'disable-puppet "adding k8s-ingress-aux codfw"' T381417
17:45 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
17:45 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
17:45 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
17:44 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
17:44 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
17:44 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
17:43 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
17:40 brett: Upgrading cp4044 to Varnish 7 (T378737)
17:40 brett@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4044.ulsfo.wmnet
17:38 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
17:38 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
17:37 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
17:37 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
17:36 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
17:35 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
17:35 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
17:35 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
17:14 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Revert^2 "CommonSettings.php: Add $wgCentralAuthAutomaticVanishWiki" (duration: 10m 20s)
17:12 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudelastic1010.eqiad.wmnet']
17:10 sukhe: sudo cumin 'A:lvs and A:eqiad' 'run-puppet-agent --enable "adding aux-k8s-ctrl codfw"'
17:08 sukhe: sudo cumin 'A:lvs and A:codfw' 'run-puppet-agent --enable "adding aux-k8s-ctrl codfw"'
17:08 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
17:08 dreamyjazz@deploy2002: dreamyjazz: Backport for Revert^2 "CommonSettings.php: Add $wgCentralAuthAutomaticVanishWiki" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:06 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1010.eqiad.wmnet']
17:06 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1010.eqiad.wmnet with OS bullseye
17:06 sukhe: lvs2013: restart pybal
17:04 dreamyjazz@deploy2002: Started scap sync-world: Backport for Revert^2 "CommonSettings.php: Add $wgCentralAuthAutomaticVanishWiki"
17:03 herron@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker2005.codfw.wmnet
17:02 herron@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker2004.codfw.wmnet
17:02 herron@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker2003.codfw.wmnet
17:02 herron@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-worker2002.codfw.wmnet
17:00 dancy@deploy2002: Installation of scap version "4.140.0" completed for 204 hosts
17:00 sukhe: restart pybal on lvs2014
16:59 sukhe: enable puppet on lvs2014
16:58 sukhe: restart pybal on lvs1020
16:55 dancy@deploy2002: Installing scap version "4.140.0" for 204 host(s)
16:51 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
16:50 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
16:50 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
16:47 herron@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-ctrl2003.codfw.wmnet
16:47 herron@puppetserver1001: conftool action : set/pooled=yes; selector: name=aux-k8s-ctrl2002.codfw.wmnet
16:47 sukhe: sudo cumin 'A:lvs and (A:eqiad or A:codfw)' 'disable-puppet "adding aux-k8s-ctrl codfw"'
16:44 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye
16:43 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
16:43 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
16:43 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
16:43 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
16:42 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
16:42 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
16:42 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
16:39 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudelastic1010.eqiad.wmnet']
16:33 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1010.eqiad.wmnet']
16:32 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1010.eqiad.wmnet']
16:31 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) k8s-ingress-aux.svc.codfw.wmnet on all recursors
16:31 sukhe@cumin1002: START - Cookbook sre.dns.wipe-cache k8s-ingress-aux.svc.codfw.wmnet on all recursors
16:30 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-ctrl.svc.codfw.wmnet on all recursors
16:30 sukhe@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-ctrl.svc.codfw.wmnet on all recursors
16:30 herron@dns1004: END - running authdns-update
16:29 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
16:29 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
16:28 herron@dns1004: START - running authdns-update
16:17 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1010.eqiad.wmnet']
16:12 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) aux-k8s-ctrl.svc.codfw.wmnet on all recursors
16:12 sukhe@cumin1002: START - Cookbook sre.dns.wipe-cache aux-k8s-ctrl.svc.codfw.wmnet on all recursors
16:10 herron@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:10 herron@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: enabling aux-k8s codfw vips - herron@cumin1002"
16:10 herron@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: enabling aux-k8s codfw vips - herron@cumin1002"
16:09 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1010.eqiad.wmnet']
16:06 herron@cumin1002: START - Cookbook sre.dns.netbox
16:04 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps2005.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
16:04 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps1005.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
16:01 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=wikikube-worker1.*,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
16:01 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=wikikube-worker2.*,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
16:00 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1010.eqiad.wmnet']
16:00 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:00 moritzm: imported keepalived 1:2.2.7-1~bpo11+1 to main component of bullseye-wikimedia T383557
15:59 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:58 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1010.eqiad.wmnet with OS bullseye
15:56 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:56 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:56 jdrewniak@deploy2002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 02m 25s)
15:54 swfrench-wmf: reprepro update pcre2_10.42-1~wmf11+1 in component/pcre2 from apt-staging - T386006
15:53 jdrewniak@deploy2002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 08m 38s)
15:53 fceratto@cumin1002: dbctl commit (dc=all): 'Preparing db1253 T385141', diff saved to https://phabricator.wikimedia.org/P74174 and previous config saved to /var/cache/conftool/dbconfig/20250310-155332-fceratto.json
15:36 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1010.eqiad.wmnet with OS bullseye
15:30 moritzm: installing systemd bugfix updates from Bookworm point release
15:17 godog: repool prometheus200[56] - T383232
15:16 filippo@puppetserver1001: conftool action : set/pooled=yes; selector: name=prometheus2005.codfw.wmnet
15:16 filippo@puppetserver1001: conftool action : set/pooled=yes; selector: name=prometheus2006.codfw.wmnet
15:09 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
15:08 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
15:08 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
15:07 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
15:06 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
15:05 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
15:01 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2046
15:01 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:01 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
15:01 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2046
15:01 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:00 tgr_: UTC afternoon deploys done
14:58 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:58 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2045
14:58 tgr@deploy2002: Finished scap sync-world: Backport for SpecialCentralAutoLogin: Handle nullable wiki ID (T388252), SUL3: Attach SUL mode to the return URL of local wiki (T388067), Log and add user IDs that mismatch in the runtime exception (T388177) (duration: 15m 48s)
14:55 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2045
14:55 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2045 to codfw - jhancock@cumin2002"
14:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2045 to codfw - jhancock@cumin2002"
14:51 tgr@deploy2002: tgr: Continuing with sync
14:50 moritzm: installing pymysql security updates
14:50 tgr@deploy2002: tgr: Backport for SpecialCentralAutoLogin: Handle nullable wiki ID (T388252), SUL3: Attach SUL mode to the return URL of local wiki (T388067), Log and add user IDs that mismatch in the runtime exception (T388177) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:49 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:48 sukhe: sudo cumin 'P:durum' 'run-puppet-agent'
14:48 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
14:48 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
14:44 Emperor: restart swift on ms-fe2011
14:22 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Continuing with sync
14:22 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde: Backport for Clean up RDF feature flags again (T384344) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:22 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1028.eqiad.wmnet to cluster eqiad and group C
14:21 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1028.eqiad.wmnet to cluster eqiad and group C
14:19 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Clean up RDF feature flags again (T384344)
14:17 Lucas_WMDE: lucaswerkmeister-wmde@deploy2002 $ mwscript-k8s --comment=T356620 --follow -- namespaceDupes mnwwiktionary --fix | tee T356620
14:17 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Enable CX unified dashboard on phase 2 wikis (T387820), Disallow editing modules for non-autoconfirmed users on the English Wikivoyage (T388301), mnwwiktionary: add thesaurus namespace (T356620) (duration: 11m 21s)
14:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
14:10 lucaswerkmeister-wmde@deploy2002: dreamrimmer, sbisson, anzx, lucaswerkmeister-wmde: Continuing with sync
14:08 lucaswerkmeister-wmde@deploy2002: dreamrimmer, sbisson, anzx, lucaswerkmeister-wmde: Backport for Enable CX unified dashboard on phase 2 wikis (T387820), Disallow editing modules for non-autoconfirmed users on the English Wikivoyage (T388301), mnwwiktionary: add thesaurus namespace (T356620) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:06 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
14:05 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Enable CX unified dashboard on phase 2 wikis (T387820), Disallow editing modules for non-autoconfirmed users on the English Wikivoyage (T388301), mnwwiktionary: add thesaurus namespace (T356620)
14:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
14:01 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
14:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/services/mediawiki-dumps-legacy: apply
14:00 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/services/mediawiki-dumps-legacy: apply
13:58 moritzm: installing libpgjava security updates
13:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1028.eqiad.wmnet with OS bookworm
13:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1028.eqiad.wmnet with reason: host reimage
13:26 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1028.eqiad.wmnet with reason: host reimage
13:07 godog: test prometheus2007 as the sole host pooled in pybal - T383232
13:03 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1028.eqiad.wmnet with OS bookworm
12:59 filippo@puppetserver1001: conftool action : set/pooled=no; selector: name=prometheus2006.codfw.wmnet
12:58 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti1028.eqiad.wmnet
12:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
12:56 filippo@puppetserver1001: conftool action : set/pooled=yes; selector: name=prometheus2007.codfw.wmnet
12:55 moritzm: imported wmf-laptop 1.0.1 to apt.wikimedia.org
12:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
12:43 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1028.eqiad.wmnet
12:43 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ganeti1028.eqiad.wmnet
12:34 arnaudb@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on gerrit2003.wikimedia.org with reason: testing
12:33 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1028.eqiad.wmnet
12:27 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
12:27 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
12:26 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
12:26 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
12:25 cgoubert@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
12:25 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:24 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
12:24 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:24 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:24 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
12:23 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
12:23 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
12:23 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
12:23 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
12:23 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
12:23 cgoubert@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
12:22 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
12:22 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
12:22 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
12:21 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
12:21 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:21 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:18 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
12:18 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
12:18 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
12:17 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
12:16 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1028.eqiad.wmnet with reason: remove from cluster for reimage
12:14 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
12:14 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
12:13 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
12:13 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
12:01 jayme@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:01 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
12:01 jayme@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:00 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:00 jayme@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
11:59 moritzm: installing iputils bugfix updates
11:59 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
11:59 jayme@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
11:58 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
11:58 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
11:57 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
11:56 jayme@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
11:56 jayme@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
11:56 jayme@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
11:55 jayme@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
11:55 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
11:55 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
11:55 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
11:53 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:52 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:51 jayme@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
11:51 jayme@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
11:50 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
11:50 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
11:49 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
11:49 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:48 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
11:47 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
11:47 jayme@deploy1003: helmfile [codfw] DONE helmfile.d/admin 'apply'.
11:47 jayme@deploy1003: helmfile [codfw] START helmfile.d/admin 'apply'.
11:46 jayme@deploy1003: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
11:43 jayme@deploy1003: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:42 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
11:42 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
11:41 ladsgroup@deploy2002: Finished scap sync-world: Backport for Set thumbnail steps to 1% of production (T360589) (duration: 10m 27s)
11:35 ladsgroup@deploy2002: ladsgroup: Continuing with sync
11:34 ladsgroup@deploy2002: ladsgroup: Backport for Set thumbnail steps to 1% of production (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:31 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
11:31 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
11:31 ladsgroup@deploy2002: Started scap sync-world: Backport for Set thumbnail steps to 1% of production (T360589)
11:31 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
11:30 elukey@deploy2002: helmfile [eqiad] START helmfile.d/admin 'sync'.
11:29 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
11:29 elukey@deploy2002: helmfile [codfw] START helmfile.d/admin 'sync'.
11:27 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
11:25 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2142.codfw.wmnet,db1152.eqiad.wmnet with reason: Setting up
11:24 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2144.codfw.wmnet,db[1151-1152].eqiad.wmnet with reason: Setting up
11:21 marostegui@cumin1002: dbctl commit (dc=all): 'Set ms3 weights to 1 instead of 100', diff saved to https://phabricator.wikimedia.org/P74171 and previous config saved to /var/cache/conftool/dbconfig/20250310-112140-marostegui.json
11:20 marostegui@cumin1002: dbctl commit (dc=all): 'Add weight to ms1 hosts T387332', diff saved to https://phabricator.wikimedia.org/P74170 and previous config saved to /var/cache/conftool/dbconfig/20250310-112046-marostegui.json
11:17 marostegui@cumin1002: dbctl commit (dc=all): 'Push ms1 config T387332', diff saved to https://phabricator.wikimedia.org/P74169 and previous config saved to /var/cache/conftool/dbconfig/20250310-111742-marostegui.json
11:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 8 hosts with reason: Cloning
11:07 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
11:06 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/kartotherian: sync
11:06 elukey@deploy2002: helmfile [staging] START helmfile.d/services/kartotherian: sync
11:03 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::scholarly@eqiad
11:03 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
11:02 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
10:57 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-etcd1002.eqiad.wmnet with OS bookworm
10:55 moritzm: installing qemu security updates
10:45 ladsgroup@deploy2002: Finished deploy [dumps/dumps@afcb740]: Removing Yahoo! abstract dumps code (T382069) (duration: 00m 07s)
10:45 ladsgroup@deploy2002: Started deploy [dumps/dumps@afcb740]: Removing Yahoo! abstract dumps code (T382069)
10:45 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::scholarly@eqiad
10:43 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::scholarly@codfw
10:43 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
10:42 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
10:37 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::scholarly@codfw
10:37 filippo@puppetserver1001: conftool action : set/pooled=no; selector: name=prometheus2005.codfw.wmnet
10:36 filippo@puppetserver1001: conftool action : set/pooled=no; selector: name=prometheus2007.codfw.wmnet
10:36 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-etcd1002.eqiad.wmnet with reason: host reimage
10:33 filippo@puppetserver1001: conftool action : set/weight=10; selector: name=prometheus2007.codfw.wmnet
10:32 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-etcd1002.eqiad.wmnet with reason: host reimage
10:22 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-etcd1002.eqiad.wmnet with OS bookworm
10:16 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::main@eqiad
10:16 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
10:15 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
10:15 godog: test moving k8s-mlstaging from prometheus2005 to prometheus2007 - T383232
10:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 8 hosts with reason: Cloning
10:07 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::main@eqiad
10:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1217.eqiad.wmnet with reason: Reboot
10:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1164.eqiad.wmnet with reason: Reboot
10:04 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::main@codfw
10:04 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
10:03 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
09:57 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::main@codfw
09:39 elukey: run puppetserver.delete() for relforge100[567] and elastic110[456] - certificate requests pending for weeks; DSE confirmed those hosts are not in prod/used.
09:33 moritzm: installing exim4 bugfix updates from Bookworm point release
09:28 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1164.eqiad.wmnet
09:23 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1164.eqiad.wmnet
09:23 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1164.eqiad.wmnet with reason: Reboot
09:20 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
09:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repool pc1 T387953', diff saved to https://phabricator.wikimedia.org/P74166 and previous config saved to /var/cache/conftool/dbconfig/20250310-090600-marostegui.json
08:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on pc2011.codfw.wmnet,pc1011.eqiad.wmnet with reason: Migration to 10.11
08:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depool pc1 T387953', diff saved to https://phabricator.wikimedia.org/P74164 and previous config saved to /var/cache/conftool/dbconfig/20250310-083746-marostegui.json
08:31 awight: UTC morning backports are done
08:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-analytics-product: apply
08:30 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
08:30 brouberol@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-analytics-product: apply
08:29 awight@deploy2002: Finished scap sync-world: Backport for Disallow editing modules for non-confirmed/non-autoconfirmed users on the English Wikivoyage (T388301) (duration: 24m 08s)
08:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
08:27 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
08:19 awight@deploy2002: awight, dreamrimmer: Continuing with sync
08:18 awight@deploy2002: awight, dreamrimmer: Backport for Disallow editing modules for non-confirmed/non-autoconfirmed users on the English Wikivoyage (T388301) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:09 marostegui: Failover m1 from db1164 to db1250 - T388024
08:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2160,2232].codfw.wmnet,db[1164,1217,1250].eqiad.wmnet with reason: Primary switchover m1 T388024
08:05 awight@deploy2002: Started scap sync-world: Backport for Disallow editing modules for non-confirmed/non-autoconfirmed users on the English Wikivoyage (T388301)
07:38 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jdcc-berkman out of all services on: 1284 hosts
07:37 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Jdcc-berkman out of all services on: 961 hosts

2025-03-09

10:13 elukey@puppetserver1001: conftool action : set/weight=5; selector: name=wikikube-worker2.*,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
10:12 elukey@puppetserver1001: conftool action : set/weight=5; selector: name=wikikube-worker1.*,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
10:12 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=maps2005.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
10:12 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=maps1005.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl

2025-03-08

11:33 moritzm: truncated /var/log/syslog on seaborgium and bounced slapd
00:34 tzatziki: removing 3 files for legal compliance

2025-03-07

22:42 inflatador: bking@cloudelastic1009 exclude `cloudelastic1010` from master voting T387904
22:17 ryankemper: [Cloudelastic] Doing a `/_cluster/reroute?retry_failed=true` of all 3 elastic/opensearch clusters
22:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1009.eqiad.wmnet with OS bullseye
22:02 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in cloudelastic
22:01 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in cloudelastic
21:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage
21:32 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1009.eqiad.wmnet with reason: host reimage
21:13 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye
21:12 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1009.eqiad.wmnet with OS bullseye
20:58 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye
20:49 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
20:44 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
20:42 bking@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
20:32 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
20:32 bking@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
20:32 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
20:17 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
20:17 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
20:17 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
20:16 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
20:16 bking@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
20:08 bking@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudelastic1009.eqiad.wmnet']
20:06 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1009.eqiad.wmnet with OS bullseye
19:53 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye
19:53 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1009.eqiad.wmnet with OS bullseye
19:20 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1009.eqiad.wmnet with OS bullseye
17:38 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:38 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns additions for eqiad E8/F8 links to new switches - cmooney@cumin1002"
17:38 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns additions for eqiad E8/F8 links to new switches - cmooney@cumin1002"
17:32 cmooney@cumin1002: START - Cookbook sre.dns.netbox
17:26 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1202.eqiad.wmnet with OS bullseye
17:26 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns additions for eqiad E8/F8 links to new switches - cmooney@cumin1002"
17:14 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: dns additions for eqiad E8/F8 links to new switches - cmooney@cumin1002"
17:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:06 cmooney@cumin1002: START - Cookbook sre.dns.netbox
16:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1246', diff saved to https://phabricator.wikimedia.org/P74156 and previous config saved to /var/cache/conftool/dbconfig/20250307-164605-root.json
16:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1202.eqiad.wmnet with reason: host reimage
16:41 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1202.eqiad.wmnet with reason: host reimage
16:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1208.eqiad.wmnet with OS bullseye
16:33 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:32 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1207.eqiad.wmnet with OS bullseye
16:29 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:28 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1202.eqiad.wmnet with OS bullseye
16:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1206.eqiad.wmnet with OS bullseye
16:25 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:25 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:21 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1202.eqiad.wmnet with OS bullseye
16:20 sbassett: Deployed security patch for T387691
16:17 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1205.eqiad.wmnet with OS bullseye
16:17 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:16 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1208.eqiad.wmnet with reason: host reimage
16:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1207.eqiad.wmnet with reason: host reimage
16:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1203.eqiad.wmnet with OS bullseye
16:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1204.eqiad.wmnet with OS bullseye
16:04 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1206.eqiad.wmnet with reason: host reimage
15:59 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1207.eqiad.wmnet with reason: host reimage
15:58 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
15:58 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1208.eqiad.wmnet with reason: host reimage
15:58 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1206.eqiad.wmnet with reason: host reimage
15:58 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
15:58 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
15:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:50 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1205.eqiad.wmnet with reason: host reimage
15:49 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "updating for renamed dell switches in eqiad - cmooney@cumin1002"
15:49 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "updating for renamed dell switches in eqiad - cmooney@cumin1002"
15:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1201.eqiad.wmnet with OS bullseye
15:46 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:46 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1205.eqiad.wmnet with reason: host reimage
15:46 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1207.eqiad.wmnet with OS bullseye
15:44 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1208.eqiad.wmnet with OS bullseye
15:43 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1206.eqiad.wmnet with OS bullseye
15:35 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:35 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: change dns names for eqiad rack e8 endpoints - cmooney@cumin1002"
15:35 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: change dns names for eqiad rack e8 endpoints - cmooney@cumin1002"
15:33 swfrench@deploy2002: Finished scap sync-world: helmfile-only deploy to reduce likelihood of deployment timeouts - T383845 (duration: 04m 33s)
15:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1204.eqiad.wmnet with reason: host reimage
15:31 swfrench@deploy2002: Started scap sync-world: helmfile-only deploy to reduce likelihood of deployment timeouts - T383845
15:31 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1205.eqiad.wmnet with OS bullseye
15:30 cmooney@cumin1002: START - Cookbook sre.dns.netbox
15:29 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
15:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1200.eqiad.wmnet with OS bullseye
15:27 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:27 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1203.eqiad.wmnet with reason: host reimage
15:24 cmooney@cumin1002: START - Cookbook sre.dns.netbox
15:24 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
15:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1201.eqiad.wmnet with reason: host reimage
15:20 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1203.eqiad.wmnet with reason: host reimage
15:20 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1204.eqiad.wmnet with reason: host reimage
15:18 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1201.eqiad.wmnet with reason: host reimage
15:15 cmooney@cumin1002: START - Cookbook sre.dns.netbox
15:13 klausman@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
15:12 klausman@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
15:08 klausman@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
15:06 klausman@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
15:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1203.eqiad.wmnet with OS bullseye
15:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1204.eqiad.wmnet with OS bullseye
15:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1200.eqiad.wmnet with reason: host reimage
15:04 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1202.eqiad.wmnet with OS bullseye
15:04 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1201.eqiad.wmnet with OS bullseye
15:01 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1200.eqiad.wmnet with reason: host reimage
14:46 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1200.eqiad.wmnet with OS bullseye
14:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1198.eqiad.wmnet with OS bullseye
14:44 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:12 elukey@puppetserver1001: conftool action : set/weight=10:pooled=yes; selector: name=wikikube-worker1.*,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
14:12 elukey@puppetserver1001: conftool action : set/weight=10:pooled=yes; selector: name=wikikube-worker2.*,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
14:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1198.eqiad.wmnet with reason: host reimage
13:58 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1198.eqiad.wmnet with reason: host reimage
13:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1198.eqiad.wmnet with OS bullseye
13:25 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
13:21 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
13:19 isaranto@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
11:58 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2088.codfw.wmnet
11:46 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be2088.codfw.wmnet
11:27 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host ms-be2088.codfw.wmnet
11:16 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be2088.codfw.wmnet
10:50 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1035.eqiad.wmnet to cluster eqiad and group A
10:48 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1035.eqiad.wmnet to cluster eqiad and group A
10:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1035.eqiad.wmnet
10:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1035.eqiad.wmnet
10:30 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps1005.eqiad.wmnet,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
10:30 elukey@puppetserver1001: conftool action : set/pooled=inactive; selector: name=maps2005.codfw.wmnet,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
10:22 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Halfak out of all services on: 951 hosts
10:21 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Halfak out of all services on: 1284 hosts
10:13 moritzm: updated pwstore key for btullis
09:38 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=wikikube-worker2.*,dc=codfw,cluster=maps,service=kartotherian-k8s-ssl
09:37 elukey@puppetserver1001: conftool action : set/weight=10; selector: name=wikikube-worker1.*,dc=eqiad,cluster=maps,service=kartotherian-k8s-ssl
09:27 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
09:21 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
09:20 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'sync'.
09:20 elukey@deploy2002: helmfile [codfw] START helmfile.d/admin 'sync'.
09:18 elukey@deploy2002: helmfile [codfw] DONE helmfile.d/services/kartotherian: sync
09:12 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
09:09 jelto@deploy2002: helmfile [codfw] START helmfile.d/services/miscweb: apply
09:08 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
09:07 elukey@deploy2002: helmfile [codfw] START helmfile.d/services/kartotherian: sync
09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1035.eqiad.wmnet with OS bookworm
09:05 jelto@deploy2002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
09:03 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
09:02 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
08:55 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
08:52 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
08:50 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/services/kartotherian: sync
08:48 jayme: updated helmfile to 0.171.0-5 on deploy* - T387837
08:48 jayme: imported helmfile 0.171.0-5 to bullseye-wikimedia and bookworm-wikimedia - T387837
08:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1035.eqiad.wmnet with reason: host reimage
08:43 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1035.eqiad.wmnet with reason: host reimage
08:43 elukey@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
08:42 elukey@deploy2002: helmfile [eqiad] START helmfile.d/admin 'sync'.
08:40 elukey@deploy2002: helmfile [eqiad] START helmfile.d/services/kartotherian: sync
08:39 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/kartotherian: sync
08:39 elukey@deploy2002: helmfile [staging] START helmfile.d/services/kartotherian: sync
08:21 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1035.eqiad.wmnet with OS bookworm
08:15 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: sync
08:15 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: sync
08:15 elukey@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
08:15 elukey@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: sync
08:12 moritzm: installing Linux 5.10.234 on Bullseye hosts (just the rollout of the new kernels, no immediate reboots involved)
08:07 jmm@cumin2002: DONE (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging JJMC89 out of all services on: 2 hosts
07:51 moritzm: installing emacs security updates
07:36 hashar@deploy2002: Finished deploy [releng/jenkins-deploy@34b35a5] (releasing): Upgrade to Jenkins LTS 2.492.2 (duration: 01m 23s)
07:35 hashar@deploy2002: Started deploy [releng/jenkins-deploy@34b35a5] (releasing): Upgrade to Jenkins LTS 2.492.2
07:31 hashar: Upgrading Jenkins on contint1002
01:06 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1201.eqiad.wmnet with OS bullseye
00:41 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1208.eqiad.wmnet with OS bullseye
00:41 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1207.eqiad.wmnet with OS bullseye
00:41 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1206.eqiad.wmnet with OS bullseye
00:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1205.eqiad.wmnet with OS bullseye
00:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1204.eqiad.wmnet with OS bullseye
00:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1203.eqiad.wmnet with OS bullseye
00:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1202.eqiad.wmnet with OS bullseye
00:01 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1201.eqiad.wmnet with OS bullseye

2025-03-06

23:19 joal@deploy2002: Finished deploy [analytics/refinery@64b629d]: emergency deploy for gobblin event_default recentchange memory issue - 2 (duration: 01m 13s)
23:18 joal@deploy2002: Started deploy [analytics/refinery@64b629d]: emergency deploy for gobblin event_default recentchange memory issue - 2
23:03 tgr@deploy2002: Finished scap sync-world: Backport for Enable SUL3 signup for 50% of group 1 users (T384007) (duration: 20m 55s)
22:56 tgr@deploy2002: tgr: Continuing with sync
22:45 tgr@deploy2002: tgr: Backport for Enable SUL3 signup for 50% of group 1 users (T384007) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:42 tgr@deploy2002: Started scap sync-world: Backport for Enable SUL3 signup for 50% of group 1 users (T384007)
22:39 toyofuku@deploy2002: Finished scap sync-world: Backport for Enable Search AB test for en wiki (duration: 18m 27s)
22:33 toyofuku@deploy2002: toyofuku, bwang: Continuing with sync
22:26 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
22:23 toyofuku@deploy2002: toyofuku, bwang: Backport for Enable Search AB test for en wiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:21 toyofuku@deploy2002: Started scap sync-world: Backport for Enable Search AB test for en wiki
22:13 tgr@deploy2002: Finished scap sync-world: Backport for Revert^2 "Fix nested refs with the same name but a different group" (duration: 12m 44s)
22:06 tgr@deploy2002: tgr, ssastry: Continuing with sync
22:03 tgr@deploy2002: tgr, ssastry: Backport for Revert^2 "Fix nested refs with the same name but a different group" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:00 tgr@deploy2002: Started scap sync-world: Backport for Revert^2 "Fix nested refs with the same name but a different group"
21:55 tgr@deploy2002: Finished scap sync-world: Backport for Remove unused $wgDiscussionToolsABTest, Remove unused $wgOATHAuthMultipleDevicesMigrationStage, Deduplicate JsonConfig config (duration: 15m 00s)
21:54 otto@deploy2002: Finished deploy [analytics/refinery@ec4c468]: 'emergency deploy for gobblin event_default recentchange memory issue' (duration: 01m 55s)
21:53 otto@deploy2002: Started deploy [analytics/refinery@ec4c468]: 'emergency deploy for gobblin event_default recentchange memory issue'
21:49 tgr@deploy2002: matmarex, tgr: Continuing with sync
21:43 tgr@deploy2002: matmarex, tgr: Backport for Remove unused $wgDiscussionToolsABTest, Remove unused $wgOATHAuthMultipleDevicesMigrationStage, Deduplicate JsonConfig config synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:40 tgr@deploy2002: Started scap sync-world: Backport for Remove unused $wgDiscussionToolsABTest, Remove unused $wgOATHAuthMultipleDevicesMigrationStage, Deduplicate JsonConfig config
21:32 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic1009* for ban host prior to reimage - bking@cumin2002 - T387904
21:32 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1009* for ban host prior to reimage - bking@cumin2002 - T387904
19:49 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
19:48 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
19:11 ebernhardson: T379002 start reindex of cirrus cebwiki_content index in codfw
19:10 btullis@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host an-presto1014.eqiad.wmnet
19:09 ebernhardson: T379002 start reindex of cirrus cebwiki_content index in eqiad
19:06 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1180.eqiad.wmnet with OS bullseye
19:06 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
19:05 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
19:04 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:04 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:58 ebernhardson@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:58 ebernhardson@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
18:45 swfrench-wmf: mw-web: migrated 5% of residual PHP 7.4 traffic to 8.1 - T383845
18:45 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
18:45 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
18:43 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1180.eqiad.wmnet with reason: host reimage
18:40 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1180.eqiad.wmnet with reason: host reimage
18:39 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
18:38 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
18:37 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
18:37 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
18:30 andrew@dns1004: END - running authdns-update
18:28 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
18:28 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
18:28 andrew@dns1004: START - running authdns-update
18:27 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1178.eqiad.wmnet with OS bullseye
18:27 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
18:26 swfrench-wmf: mw-api-ext: migrated 5% of residual PHP 7.4 traffic to 8.1 - T383845
18:26 ebernhardson@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:26 ebernhardson@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
18:25 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1180.eqiad.wmnet with OS bullseye
18:25 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - vriley@cumin1002"
18:24 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1198.eqiad.wmnet with OS bullseye
18:23 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:23 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:23 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:23 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:23 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
18:23 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
18:17 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
18:17 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
18:16 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
18:16 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
18:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1199.eqiad.wmnet with OS bullseye
18:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
18:13 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-worker1200.eqiad.wmnet with OS bullseye
18:08 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:08 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:06 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
18:06 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
18:02 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1178.eqiad.wmnet with reason: host reimage
17:56 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:55 vriley@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1178.eqiad.wmnet with reason: host reimage
17:51 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
17:50 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
17:50 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-presto1014.eqiad.wmnet
17:44 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
17:43 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
17:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1197.eqiad.wmnet with OS bullseye
17:42 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:42 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:40 vriley@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1178.eqiad.wmnet with OS bullseye
17:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1196.eqiad.wmnet with OS bullseye
17:38 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:38 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:36 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti1035.eqiad.wmnet
17:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1035.eqiad.wmnet
17:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1199.eqiad.wmnet with reason: host reimage
17:30 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1199.eqiad.wmnet with reason: host reimage
17:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1035.eqiad.wmnet
17:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1189.eqiad.wmnet with OS bullseye
17:22 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:22 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1035.eqiad.wmnet
17:21 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:21 jmm@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ganeti1035.eqiad.wmnet
17:18 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1197.eqiad.wmnet with reason: host reimage
17:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1200.eqiad.wmnet with OS bullseye
17:16 moritzm: installing avahi security updates
17:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1192.eqiad.wmnet with OS bullseye
17:16 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:16 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1199.eqiad.wmnet with OS bullseye
17:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1196.eqiad.wmnet with reason: host reimage
17:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1191.eqiad.wmnet with OS bullseye
17:15 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:14 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1197.eqiad.wmnet with reason: host reimage
17:12 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1196.eqiad.wmnet with reason: host reimage
17:10 isaranto@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revision-models' for release 'main' .
17:10 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1193.eqiad.wmnet with OS bullseye
17:10 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:09 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:07 isaranto@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revision-models' for release 'main' .
17:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1194.eqiad.wmnet with OS bullseye
17:06 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:05 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:03 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1198.eqiad.wmnet with OS bullseye
17:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1195.eqiad.wmnet with OS bullseye
17:02 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:02 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:58 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1197.eqiad.wmnet with OS bullseye
16:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1190.eqiad.wmnet with OS bullseye
16:58 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1189.eqiad.wmnet with reason: host reimage
16:58 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1196.eqiad.wmnet with OS bullseye
16:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1188.eqiad.wmnet with OS bullseye
16:55 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:55 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1189.eqiad.wmnet with reason: host reimage
16:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1192.eqiad.wmnet with reason: host reimage
16:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1191.eqiad.wmnet with reason: host reimage
16:48 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1035.eqiad.wmnet
16:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1193.eqiad.wmnet with reason: host reimage
16:42 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1194.eqiad.wmnet with reason: host reimage
16:41 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1035.eqiad.wmnet with reason: remove from cluster for reimage
16:39 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1195.eqiad.wmnet with reason: host reimage
16:38 reedy@deploy2002: Synchronized wmf-config/: Various config cleanup (duration: 08m 31s)
16:35 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1190.eqiad.wmnet with reason: host reimage
16:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1188.eqiad.wmnet with reason: host reimage
16:29 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1193.eqiad.wmnet with reason: host reimage
16:28 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1194.eqiad.wmnet with reason: host reimage
16:28 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1195.eqiad.wmnet with reason: host reimage
16:27 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1192.eqiad.wmnet with reason: host reimage
16:27 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1191.eqiad.wmnet with reason: host reimage
16:27 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1190.eqiad.wmnet with reason: host reimage
16:27 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1188.eqiad.wmnet with reason: host reimage
16:19 tgr_: UTC afternoon deploys done
16:17 tgr@deploy2002: Finished scap sync-world: Backport for Enable SUL3 signup for 10% of group 1 users (T384007) (duration: 14m 10s)
16:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1035.eqiad.wmnet
16:14 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1193.eqiad.wmnet with OS bullseye
16:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1195.eqiad.wmnet with OS bullseye
16:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1194.eqiad.wmnet with OS bullseye
16:13 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1192.eqiad.wmnet with OS bullseye
16:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1191.eqiad.wmnet with OS bullseye
16:12 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
16:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1190.eqiad.wmnet with OS bullseye
16:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1189.eqiad.wmnet with OS bullseye
16:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1188.eqiad.wmnet with OS bullseye
16:11 tgr@deploy2002: tgr: Continuing with sync
16:10 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
16:09 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
16:08 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
16:08 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
16:06 tgr@deploy2002: tgr: Backport for Enable SUL3 signup for 10% of group 1 users (T384007) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:03 tgr@deploy2002: Started scap sync-world: Backport for Enable SUL3 signup for 10% of group 1 users (T384007)
15:56 hashar@deploy2002: Finished scap sync-world: Backport for Use namespaced Title class (T388085) (duration: 22m 00s)
15:50 hashar@deploy2002: hashar, daimona: Continuing with sync
15:39 hashar@deploy2002: hashar, daimona: Backport for Use namespaced Title class (T388085) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:34 hashar@deploy2002: Started scap sync-world: Backport for Use namespaced Title class (T388085)
15:31 hashar@deploy2002: Finished scap sync-world: Backport for [Growth] Set default api lookahead size to 10 (T325990), Revert "Let sysops add/remove the event-organizer group by default" (T386738), Remove unused route file from Wikibase REST API configuration (T383774) (duration: 10m 23s)
15:29 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1187.eqiad.wmnet with OS bullseye
15:29 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:27 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:24 hashar@deploy2002: itamar, sgimeno, daimona, hashar: Continuing with sync
15:24 hashar@deploy2002: itamar, sgimeno, daimona, hashar: Backport for [Growth] Set default api lookahead size to 10 (T325990), Revert "Let sysops add/remove the event-organizer group by default" (T386738), Remove unused route file from Wikibase REST API configuration (T383774) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
15:22 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::public@eqiad
15:22 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
15:21 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
15:20 hashar@deploy2002: Started scap sync-world: Backport for [Growth] Set default api lookahead size to 10 (T325990), Revert "Let sysops add/remove the event-organizer group by default" (T386738), Remove unused route file from Wikibase REST API configuration (T383774)
15:09 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::public@eqiad
15:03 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wdqs::public@codfw
15:03 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
15:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1187.eqiad.wmnet with reason: host reimage
15:02 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
14:58 hashar@deploy2002: hashar, sgimeno, itamar, daimona: Backport for [Growth] Set default api lookahead size to 10 (T325990), Revert "Let sysops add/remove the event-organizer group by default" (T386738), Remove unused route file from Wikibase REST API configuration (T383774) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1187.eqiad.wmnet with reason: host reimage
14:57 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wdqs::public@codfw
14:56 hashar@deploy2002: Started scap sync-world: Backport for [Growth] Set default api lookahead size to 10 (T325990), Revert "Let sysops add/remove the event-organizer group by default" (T386738), Remove unused route file from Wikibase REST API configuration (T383774)
14:53 hashar@deploy2002: Finished scap sync-world: Backport for Revert "Fix nested refs with the same name but a different group", Test new term store config in beta (T385592), Growth: remove unused config wgGENewcomerTasksOresTopicConfigTitle, Drop $wmgCampaignEventsProgramsAndEventsDashboardEnabled (T387025) (duration: 12m 10s)
14:47 hashar@deploy2002: ollieshotton, migr, daimona, hashar: Continuing with sync
14:47 hashar@deploy2002: ollieshotton, migr, daimona, hashar: Backport for Revert "Fix nested refs with the same name but a different group", Test new term store config in beta (T385592), Growth: remove unused config wgGENewcomerTasksOresTopicConfigTitle, Drop $wmgCampaignEventsProgramsAndEventsDashboardEnabled (T387025) synced to the testservers (h
14:17 hashar@deploy2002: Sync cancelled.
14:11 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1233.eqiad.wmnet onto db1254.eqiad.wmnet
14:02 hashar@deploy2002: hashar, ihurbain: Backport for Fix nested refs with the same name but a different group (T387800) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:59 hashar@deploy2002: Started scap sync-world: Backport for Fix nested refs with the same name but a different group (T387800)
13:13 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1197.eqiad.wmnet
13:13 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1197 gradually with 4 steps - Upgrading db1197
13:11 moritzm: installing gst-plugins-base1.0 security updates
13:02 btullis@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:01 btullis@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:57 mfossati@deploy2002: Finished deploy [airflow-dags/platform_eng@fa4513d]: say hello to image suggestions v1.0.0 (duration: 01m 09s)
12:56 mfossati@deploy2002: Started deploy [airflow-dags/platform_eng@fa4513d]: say hello to image suggestions v1.0.0
12:30 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74134 and previous config saved to /var/cache/conftool/dbconfig/20250306-123017-root.json
12:28 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1197 gradually with 4 steps - Upgrading db1197
12:24 moritzm: installing krb5 security updates
12:21 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db1197 - Upgrading db1197
12:21 fceratto@cumin1002: START - Cookbook sre.mysql.depool db1197 - Upgrading db1197
12:20 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db1197.eqiad.wmnet
12:15 moritzm: imported lshw 02.19.git.2021.06.19.996aaad9c7-2~bpo11+1 to component/lshw T383557
12:15 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74131 and previous config saved to /var/cache/conftool/dbconfig/20250306-121512-root.json
12:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74130 and previous config saved to /var/cache/conftool/dbconfig/20250306-120007-root.json
11:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74129 and previous config saved to /var/cache/conftool/dbconfig/20250306-115357-root.json
11:45 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74127 and previous config saved to /var/cache/conftool/dbconfig/20250306-114501-root.json
11:44 topranks: applying interface-specific arp policer on cr2-magru to IX.BR sub-interface ae0.3347 (T384774)
11:39 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: druid::public::worker@eqiad
11:39 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
11:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74125 and previous config saved to /var/cache/conftool/dbconfig/20250306-113852-root.json
11:37 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
11:36 hnowlan: Migrating 12 wikis to use mobileapps/pcs without restbase
11:34 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2230.codfw.wmnet
11:34 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: druid::public::worker@eqiad
11:29 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74124 and previous config saved to /var/cache/conftool/dbconfig/20250306-112955-root.json
11:29 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2230.codfw.wmnet
11:24 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=99) for role: druid::public::worker@eqiad
11:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74121 and previous config saved to /var/cache/conftool/dbconfig/20250306-112346-root.json
11:19 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.upgrade (exit_code=99) for db2230.codfw.wmnet
11:18 fceratto@cumin1002: START - Cookbook sre.mysql.upgrade for db2230.codfw.wmnet
11:18 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: druid::public::worker@eqiad
11:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74118 and previous config saved to /var/cache/conftool/dbconfig/20250306-110841-root.json
10:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74116 and previous config saved to /var/cache/conftool/dbconfig/20250306-105335-root.json
10:16 marostegui: Drop phabricator_search.search_documentfield_BKUP T387174
10:14 volans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:14 volans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Unblock others adds an-worker1186 - volans@cumin1002"
10:14 volans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Unblock others adds an-worker1186 - volans@cumin1002"
10:10 volans@cumin1002: START - Cookbook sre.dns.netbox
10:10 volans@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
09:51 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1035.eqiad.wmnet
09:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1035.eqiad.wmnet
09:47 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1035.eqiad.wmnet
09:46 volans: disabling iDrac's WebServer.HostHeaderCheck on the remaining hosts that have it - T382416
09:35 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-etcd1003.eqiad.wmnet with OS bookworm
09:28 jynus: deploy additional grants to m1 T387892
09:22 moritzm: installing openssh security updates
09:13 hashar@deploy2002: rebuilt and synchronized wikiversions files: group2 to 1.44.0-wmf.19 refs T386214
09:10 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1032.eqiad.wmnet to cluster eqiad and group A
09:09 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1032.eqiad.wmnet to cluster eqiad and group A
08:56 volans@cumin1002: START - Cookbook sre.dns.netbox
08:56 dcausse@deploy2002: Finished scap sync-world: Backport for cirrus: configure wgCirrusSearchLanguageKeywordExtraFields (T271776) (duration: 11m 53s)
08:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
08:50 dcausse@deploy2002: dcausse: Continuing with sync
08:47 dcausse@deploy2002: dcausse: Backport for cirrus: configure wgCirrusSearchLanguageKeywordExtraFields (T271776) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:44 dcausse@deploy2002: Started scap sync-world: Backport for cirrus: configure wgCirrusSearchLanguageKeywordExtraFields (T271776)
08:41 dcausse: adding https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1121666 to the "UTC morning backport window"
08:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
08:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1032.eqiad.wmnet with OS bookworm
07:56 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-etcd1003.eqiad.wmnet with reason: host reimage
07:52 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-etcd1003.eqiad.wmnet with reason: host reimage
07:51 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1233.eqiad.wmnet onto db1254.eqiad.wmnet
07:46 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1032.eqiad.wmnet with reason: host reimage
07:42 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1032.eqiad.wmnet with reason: host reimage
07:38 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-etcd1003.eqiad.wmnet with OS bookworm
07:25 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1032.eqiad.wmnet with OS bookworm
07:24 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-etcd1001.eqiad.wmnet with OS bookworm
06:25 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-etcd1001.eqiad.wmnet with reason: host reimage
06:25 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1209.eqiad.wmnet with reason: Index rebuild
06:25 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1209.eqiad.wmnet
06:22 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-etcd1001.eqiad.wmnet with reason: host reimage
06:19 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1209.eqiad.wmnet
06:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1209 T388093', diff saved to https://phabricator.wikimedia.org/P74112 and previous config saved to /var/cache/conftool/dbconfig/20250306-061736-marostegui.json
06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1193 to s8 primary T388093', diff saved to https://phabricator.wikimedia.org/P74111 and previous config saved to /var/cache/conftool/dbconfig/20250306-061650-marostegui.json
06:16 marostegui: Starting s8 eqiad failover from db1209 to db1193 - T388093
06:16 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2152.codfw.wmnet with reason: Index rebuild
06:15 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2152.codfw.wmnet
06:12 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-etcd1001.eqiad.wmnet with OS bookworm
06:11 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1193 from API/vslow/dump T388093', diff saved to https://phabricator.wikimedia.org/P74110 and previous config saved to /var/cache/conftool/dbconfig/20250306-061133-marostegui.json
06:11 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s8 T388093
06:10 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1193 with weight 0 T388093', diff saved to https://phabricator.wikimedia.org/P74109 and previous config saved to /var/cache/conftool/dbconfig/20250306-061052-marostegui.json
06:08 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2152.codfw.wmnet
06:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2152', diff saved to https://phabricator.wikimedia.org/P74108 and previous config saved to /var/cache/conftool/dbconfig/20250306-060842-marostegui.json
05:42 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
05:28 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1180.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
05:25 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
05:20 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
05:19 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
05:14 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1185.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
05:13 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
05:13 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [an-worker1185] - vriley@cumin1002"
05:13 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [an-worker1185] - vriley@cumin1002"
05:09 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1184.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
05:08 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1180.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
05:08 vriley@cumin1002: START - Cookbook sre.dns.netbox
05:07 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1184
05:07 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1184
05:06 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
05:06 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [an-worker1184] - vriley@cumin1002"
05:06 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [an-worker1184] - vriley@cumin1002"
05:04 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1180
05:02 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1180
05:01 vriley@cumin1002: START - Cookbook sre.dns.netbox
05:01 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
05:01 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [an-worker1180] - vriley@cumin1002"
05:01 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [an-worker1180] - vriley@cumin1002"
05:00 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1182.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
04:56 vriley@cumin1002: START - Cookbook sre.dns.netbox
04:52 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
04:47 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
04:45 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
04:45 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1178.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
04:44 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1182.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
04:42 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
04:42 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [an-worker1182] - vriley@cumin1002"
04:42 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [an-worker1182] - vriley@cumin1002"
04:39 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1179.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
04:38 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1179
04:38 vriley@cumin1002: START - Cookbook sre.dns.netbox
04:37 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1179
04:36 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
04:36 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [an-worker1179] - vriley@cumin1002"
04:36 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt [an-worker1179] - vriley@cumin1002"
04:31 vriley@cumin1002: START - Cookbook sre.dns.netbox
04:28 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1178.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
04:26 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1178
04:25 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1178
04:24 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
04:24 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1178 - vriley@cumin1002"
04:24 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1178 - vriley@cumin1002"
04:20 vriley@cumin1002: START - Cookbook sre.dns.netbox
04:04 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:58 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:49 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:25 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:14 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:10 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:09 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:08 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:08 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:06 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:06 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:05 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:05 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:04 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2050.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2049.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2048.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2047.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2046.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
03:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
02:43 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2050
02:43 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2049
02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2048
02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2047
02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2046
02:42 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2045
02:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2050
02:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2049
02:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2048
02:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2047
02:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2046
02:42 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2045
02:41 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
02:41 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2049 to codfw - jhancock@cumin2002"
02:41 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2049 to codfw - jhancock@cumin2002"
02:39 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1008.eqiad.wmnet with OS bullseye
02:37 jhancock@cumin2002: START - Cookbook sre.dns.netbox
02:32 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
02:32 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2045 to codfw - jhancock@cumin2002"
02:32 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2045 to codfw - jhancock@cumin2002"
02:27 jhancock@cumin2002: START - Cookbook sre.dns.netbox
02:26 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
02:23 jhancock@cumin2002: START - Cookbook sre.dns.netbox
01:19 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-worker1181.eqiad.wmnet with OS bullseye
01:19 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:19 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
01:18 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1008.eqiad.wmnet with OS bullseye
00:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-worker1181.eqiad.wmnet with reason: host reimage
00:55 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-worker1181.eqiad.wmnet with reason: host reimage
00:40 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1181.eqiad.wmnet with OS bullseye
00:33 zabe: zabe@mwmaint2002:~$ mwscript extensions/WikimediaMaintenance/migrateESRefToContentTableStage2.php commonswiki --delete /home/zabe/text_table_cleanup/commonswiki --sleep 0.5 # T183490
00:09 tgr_: UTC late deploys done
00:08 tgr@deploy2002: Finished scap sync-world: Backport for Roll out SUL3 signup to 1% of users on most group 1 wikis (T384007) (duration: 29m 13s)
00:02 tgr@deploy2002: tgr: Continuing with sync

2025-03-05

23:42 tgr@deploy2002: tgr: Backport for Roll out SUL3 signup to 1% of users on most group 1 wikis (T384007) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:39 tgr@deploy2002: Started scap sync-world: Backport for Roll out SUL3 signup to 1% of users on most group 1 wikis (T384007)
23:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1181.eqiad.wmnet with OS bullseye
23:36 tgr@deploy2002: Finished scap sync-world: Backport for Preserve usesul3 flag during autologin (T375788), Preserve usesul3 flag during autologin (T375788), Clean up SUL3 config (T384007) (duration: 18m 53s)
23:29 tgr@deploy2002: tgr: Continuing with sync
23:20 tgr@deploy2002: tgr: Backport for Preserve usesul3 flag during autologin (T375788), Preserve usesul3 flag during autologin (T375788), Clean up SUL3 config (T384007) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:17 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudelastic1008.eqiad.wmnet with OS bullseye
23:17 tgr@deploy2002: Started scap sync-world: Backport for Preserve usesul3 flag during autologin (T375788), Preserve usesul3 flag during autologin (T375788), Clean up SUL3 config (T384007)
23:04 tgr@deploy2002: Finished scap sync-world: Backport for Revert^2 "Invert Parsoid read view wiktionary configs" (duration: 12m 13s)
23:03 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:02 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:57 tgr@deploy2002: tgr: Continuing with sync
22:54 tgr@deploy2002: tgr: Backport for Revert^2 "Invert Parsoid read view wiktionary configs" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:53 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:53 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:51 tgr@deploy2002: Started scap sync-world: Backport for Revert^2 "Invert Parsoid read view wiktionary configs"
22:29 tgr@deploy2002: Finished scap sync-world: Backport for Revert^2 "Turn on Parsoid Read Views for 44 wiktionaries" (T387505) (duration: 12m 06s)
22:24 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:24 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:23 tgr@deploy2002: tgr: Continuing with sync
22:20 tgr@deploy2002: tgr: Backport for Revert^2 "Turn on Parsoid Read Views for 44 wiktionaries" (T387505) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:17 tgr@deploy2002: Started scap sync-world: Backport for Revert^2 "Turn on Parsoid Read Views for 44 wiktionaries" (T387505)
21:57 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1008.eqiad.wmnet with OS bullseye
21:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-worker1181.eqiad.wmnet with OS bullseye
21:54 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudelastic1008.eqiad.wmnet with OS bullseye
21:53 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1008.eqiad.wmnet with OS bullseye
21:50 tgr@deploy2002: Finished scap sync-world: Backport for Revert "Invert Parsoid read view wiktionary configs", Revert "Turn on Parsoid Read Views for 44 wiktionaries" (duration: 09m 30s)
21:44 tgr@deploy2002: tgr: Continuing with sync
21:44 tgr@deploy2002: tgr: Backport for Revert "Invert Parsoid read view wiktionary configs", Revert "Turn on Parsoid Read Views for 44 wiktionaries" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:41 tgr@deploy2002: Started scap sync-world: Backport for Revert "Invert Parsoid read view wiktionary configs", Revert "Turn on Parsoid Read Views for 44 wiktionaries"
21:29 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:28 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:26 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
21:26 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
21:25 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
21:25 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
21:24 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:24 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:23 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:22 tgr@deploy2002: Finished scap sync-world: Backport for Turn on Parsoid Read Views for 44 wiktionaries (T387505), Invert Parsoid read view wiktionary configs (duration: 12m 23s)
21:16 tgr@deploy2002: tgr, arlolra: Continuing with sync
21:13 tgr@deploy2002: tgr, arlolra: Backport for Turn on Parsoid Read Views for 44 wiktionaries (T387505), Invert Parsoid read view wiktionary configs synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:10 tgr@deploy2002: Started scap sync-world: Backport for Turn on Parsoid Read Views for 44 wiktionaries (T387505), Invert Parsoid read view wiktionary configs
20:39 swfrench-wmf: right-sized capacity distribution between mw-(api-ext|web) main and next releases - T383845
20:38 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
20:38 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
20:38 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
20:38 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
20:20 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
20:20 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
20:20 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
20:19 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
20:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2154 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74107 and previous config saved to /var/cache/conftool/dbconfig/20250305-201612-root.json
20:11 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4052.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
20:09 brett@cumin2002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4052.ulsfo.wmnet} and A:cp for 9.2.9-1wm1
20:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1167 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74106 and previous config saved to /var/cache/conftool/dbconfig/20250305-200426-root.json
20:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2154 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74105 and previous config saved to /var/cache/conftool/dbconfig/20250305-200106-root.json
19:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1167 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74104 and previous config saved to /var/cache/conftool/dbconfig/20250305-194920-root.json
19:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2154 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74103 and previous config saved to /var/cache/conftool/dbconfig/20250305-194601-root.json
19:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1167 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74102 and previous config saved to /var/cache/conftool/dbconfig/20250305-193414-root.json
19:31 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic1008* for ban host prior to reimage - bking@cumin2002 - T387904
19:31 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1008* for ban host prior to reimage - bking@cumin2002 - T387904
19:30 marostegui@cumin1002: dbctl commit (dc=all): 'db2154 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74101 and previous config saved to /var/cache/conftool/dbconfig/20250305-193056-root.json
19:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1167 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74100 and previous config saved to /var/cache/conftool/dbconfig/20250305-191909-root.json
19:15 marostegui@cumin1002: dbctl commit (dc=all): 'db2154 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74099 and previous config saved to /var/cache/conftool/dbconfig/20250305-191550-root.json
19:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1167 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74098 and previous config saved to /var/cache/conftool/dbconfig/20250305-190403-root.json
18:45 brett: import trafficserver 9.2.9-1wm1 into bullseye-wikimedia (T388035)
18:45 brett: import trafficserver 9.2.9-1wm1 into bullseye-wikimedia
18:21 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in cloudelastic
18:21 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in cloudelastic
18:20 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudelastic1007.eqiad.wmnet with OS bullseye
17:50 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
17:50 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
17:50 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
17:50 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
17:50 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
17:50 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
17:41 ladsgroup@deploy2002: Finished scap sync-world: Backport for Enable thumbnail steps in testwiki (T360589) (duration: 13m 04s)
17:34 ladsgroup@deploy2002: ladsgroup: Continuing with sync
17:31 ladsgroup@deploy2002: ladsgroup: Backport for Enable thumbnail steps in testwiki (T360589) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:28 ladsgroup@deploy2002: Started scap sync-world: Backport for Enable thumbnail steps in testwiki (T360589)
17:20 tgr@deploy2002: Finished scap sync-world: Backport for CentralAuth: Enable SUL3 signup on group 0 (attempt 4) (T384007) (duration: 24m 13s)
17:14 tgr@deploy2002: tgr: Continuing with sync
16:59 tgr@deploy2002: tgr: Backport for CentralAuth: Enable SUL3 signup on group 0 (attempt 4) (T384007) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:56 tgr@deploy2002: Started scap sync-world: Backport for CentralAuth: Enable SUL3 signup on group 0 (attempt 4) (T384007)
16:54 tgr@deploy2002: Finished scap sync-world: Backport for CentralAuthIdLookup: Reuse cached object on single-value lookup (T379909 T380500 T387106), CentralAuthIdLookup: Use primary DB after writes (T379909 T380500), Use UserOptionsManager for SUL3 rollout flag (T384549), Make SUL3 global preference optional and simplify logic, [[gerrit:1124785|A
16:53 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:53 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:53 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:50 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:50 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:50 cmooney@cumin1002: START - Cookbook sre.dns.netbox
16:48 tgr@deploy2002: tgr: Continuing with sync
16:45 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wcqs::public@eqiad
16:45 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
16:44 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
16:39 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wcqs::public@eqiad
16:39 vgutierrez@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=99) for role: wcqs::public@eqiad
16:34 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wcqs::public@eqiad
16:33 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wcqs::public@codfw
16:33 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
16:32 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
16:28 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudelastic1007.eqiad.wmnet with reason: host reimage
16:26 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wcqs::public@codfw
16:24 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudelastic1007.eqiad.wmnet with reason: host reimage
16:22 tgr@deploy2002: tgr: Backport for CentralAuthIdLookup: Reuse cached object on single-value lookup (T379909 T380500 T387106), CentralAuthIdLookup: Use primary DB after writes (T379909 T380500), Use UserOptionsManager for SUL3 rollout flag (T384549), Make SUL3 global preference optional and simplify logic, [[gerrit:1124785|Add passive central do
16:20 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2048
16:20 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2048
16:19 tgr@deploy2002: Started scap sync-world: Backport for CentralAuthIdLookup: Reuse cached object on single-value lookup (T379909 T380500 T387106), CentralAuthIdLookup: Use primary DB after writes (T379909 T380500), Use UserOptionsManager for SUL3 rollout flag (T384549), Make SUL3 global preference optional and simplify logic, [[gerrit:1124785|Ad
16:19 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2047
16:19 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2046
16:19 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2045
16:19 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2047
16:19 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2046
16:19 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2045
16:19 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:19 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2046-48 to codfw - jhancock@cumin2002"
16:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2046-48 to codfw - jhancock@cumin2002"
16:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox
16:05 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
16:00 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-ctrl1002.eqiad.wmnet with OS bookworm
15:48 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: cloudelastic1007* for ban host prior to reimage - bking@cumin2002 - T387904
15:48 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1007* for ban host prior to reimage - bking@cumin2002 - T387904
15:43 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
15:42 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
15:42 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
15:42 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
15:41 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-ctrl1002.eqiad.wmnet with reason: host reimage
15:40 jynus: starting es backups on new hosts backup1013, backup2013 T387892
15:39 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:38 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:37 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-ctrl1002.eqiad.wmnet with reason: host reimage
15:35 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in cloudelastic
15:34 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in cloudelastic
15:32 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
15:32 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
15:31 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
15:30 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
15:30 sukhe: upload dnsdist 1.9.8-1~wmf12u1 to apt.wm.org for bookworm
15:28 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:27 bking@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host cloudelastic1007.eqiad.wmnet with OS bullseye
15:26 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:26 stevemunene@cumin1002: START - Cookbook sre.hosts.reimage for host dse-k8s-ctrl1002.eqiad.wmnet with OS bookworm
15:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host cloudelastic1007.eqiad.wmnet with OS bullseye
15:23 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/services/mw-debug: apply
15:21 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/services/mw-debug: apply
15:19 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/services/mw-debug: apply
15:18 ecarg@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
15:17 ecarg@deploy2002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
15:17 ecarg@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
15:16 ecarg@deploy2002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
15:11 moritzm: installing openssh security updates
15:10 ecarg@deploy2002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:09 ecarg@deploy2002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:09 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/services/mw-debug: apply
15:08 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dse-k8s-ctrl1001.eqiad.wmnet with OS bookworm
15:07 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/services/mw-debug: apply
15:07 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/services/mw-debug: apply
15:04 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Use MediaWikiServices hook for push-subscription-manager changes (T275336), Unset unused IP reveal groups in properly (T387205) (duration: 11m 05s)
14:57 dreamyjazz@deploy2002: dreamyjazz, pppery: Continuing with sync
14:55 dreamyjazz@deploy2002: dreamyjazz, pppery: Backport for Use MediaWikiServices hook for push-subscription-manager changes (T275336), Unset unused IP reveal groups in properly (T387205) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:53 dreamyjazz@deploy2002: Started scap sync-world: Backport for Use MediaWikiServices hook for push-subscription-manager changes (T275336), Unset unused IP reveal groups in properly (T387205)
14:52 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.dbctl (exit_code=99)
14:52 fceratto@cumin1002: START - Cookbook sre.mysql.dbctl
14:52 dreamyjazz@deploy2002: Finished scap sync-world: Backport for metawiki: Enable Chinese variant translation for message bundles (T387230) (duration: 18m 29s)
14:51 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: cloudelastic1007* for ban host prior to reimage - bking@cumin2002 - T387904
14:51 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: cloudelastic1007* for ban host prior to reimage - bking@cumin2002 - T387904
14:51 stevemunene@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dse-k8s-ctrl1001.eqiad.wmnet with reason: host reimage
14:48 stevemunene@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dse-k8s-ctrl1001.eqiad.wmnet with reason: host reimage
14:45 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.dbctl (exit_code=1)
14:45 fceratto@cumin1002: START - Cookbook sre.mysql.dbctl
14:45 cmooney@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2046
14:45 dreamyjazz@deploy2002: abi, dreamyjazz: Continuing with sync
14:44 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: analytics_cluster::datahub::opensearch@eqiad
14:44 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
14:44 cmooney@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2046
14:43 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
14:43 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.dbctl (exit_code=2)
14:43 fceratto@cumin1002: START - Cookbook sre.mysql.dbctl
14:42 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.dbctl (exit_code=0)
14:42 fceratto@cumin1002: START - Cookbook sre.mysql.dbctl
14:23 dreamyjazz@deploy2002: Finished scap sync-world: Backport for Set Flow to read-only on remaining phase 2a wikis (T378834), Remove unused config parameters from ReadingLists extension., Use namespaced Title and Html classes (T166010 T387938), officewiki: Disable the event-organizer user group (T387943), [[gerrit:1124768|Temporarily unset tempor
14:16 dreamyjazz@deploy2002: daimona, zoe, dreamyjazz, dbrant: Continuing with sync
14:14 sukhe: restart pybal on lvs2014
14:14 sukhe: restart pybal on lvs2013
14:12 dreamyjazz@deploy2002: daimona, zoe, dreamyjazz, dbrant: Backport for Set Flow to read-only on remaining phase 2a wikis (T378834), Remove unused config parameters from ReadingLists extension., Use namespaced Title and Html classes (T166010 T387938), officewiki: Disable the event-organizer user group (T387943), [[gerrit:1124768|Temporarily unse
14:09 dreamyjazz@deploy2002: Started scap sync-world: Backport for Set Flow to read-only on remaining phase 2a wikis (T378834), Remove unused config parameters from ReadingLists extension., Use namespaced Title and Html classes (T166010 T387938), officewiki: Disable the event-organizer user group (T387943), [[gerrit:1124768|Temporarily unset tempora
13:58 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1167.eqiad.wmnet with reason: Index rebuild
13:58 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2154.codfw.wmnet with reason: Index rebuild
13:58 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2154.codfw.wmnet
13:57 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1167.eqiad.wmnet
13:53 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti1032.eqiad.wmnet
13:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
13:51 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2154.codfw.wmnet
13:51 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1167.eqiad.wmnet
13:50 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020].eqiad.wmnet,db1154.eqiad.wmnet with reason: Index rebuild
13:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2154 db1167', diff saved to https://phabricator.wikimedia.org/P74096 and previous config saved to /var/cache/conftool/dbconfig/20250305-134936-marostegui.json
13:42 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
13:28 klausman@deploy2002: conftool action : set/pooled=yes; selector: name=inference-staging
13:27 klausman@deploy2002: conftool action : set/pooled=yes; selector: name=inference
13:26 jiji@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
13:26 jiji@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
13:26 jiji@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
13:25 jiji@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
13:23 ladsgroup@deploy2002: Finished scap sync-world: Backport for maintenance: Also check for utf-8 encoding in findBadBlobs (T351953), maintenance: Also check for utf-8 encoding in findBadBlobs (T351953) (duration: 11m 31s)
13:22 elukey@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: pool site eqiad [reason: Repool eqiad after maintenance, no task ID specified]
13:22 elukey@cumin1002: START - Cookbook sre.dns.admin DNS admin: pool site eqiad [reason: Repool eqiad after maintenance, no task ID specified]
13:18 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1032.eqiad.wmnet
13:16 ladsgroup@deploy2002: ladsgroup: Continuing with sync
13:15 ladsgroup@deploy2002: ladsgroup: Backport for maintenance: Also check for utf-8 encoding in findBadBlobs (T351953), maintenance: Also check for utf-8 encoding in findBadBlobs (T351953) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:12 jmm@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on ganeti1032.eqiad.wmnet with reason: remove from cluster for reimage
13:11 ladsgroup@deploy2002: Started scap sync-world: Backport for maintenance: Also check for utf-8 encoding in findBadBlobs (T351953), maintenance: Also check for utf-8 encoding in findBadBlobs (T351953)
13:10 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
12:50 ladsgroup@deploy2002: ladsgroup: Continuing with sync
12:42 ladsgroup@deploy2002: ladsgroup: Backport for maintenance: Also check for utf-8 encoding in findBadBlobs (T351953), maintenance: Also check for utf-8 encoding in findBadBlobs (T351953) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:39 ladsgroup@deploy2002: Started scap sync-world: Backport for maintenance: Also check for utf-8 encoding in findBadBlobs (T351953), maintenance: Also check for utf-8 encoding in findBadBlobs (T351953)
12:35 Emperor: restart envoy/swift on ms-fe2010
12:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2166 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74095 and previous config saved to /var/cache/conftool/dbconfig/20250305-123149-root.json
12:25 mvolz@deploy2002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
12:24 mvolz@deploy2002: helmfile [eqiad] START helmfile.d/services/citoid: apply
12:23 mvolz@deploy2002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
12:23 mvolz@deploy2002: helmfile [codfw] START helmfile.d/services/citoid: apply
12:22 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
12:21 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
12:20 mvolz@deploy2002: helmfile [staging] DONE helmfile.d/services/citoid: apply
12:19 mvolz@deploy2002: helmfile [staging] START helmfile.d/services/citoid: apply
12:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2166 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74094 and previous config saved to /var/cache/conftool/dbconfig/20250305-121643-root.json
12:14 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
12:13 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
12:13 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
12:13 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
12:13 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
12:12 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/changeprop: apply
12:10 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
12:10 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/changeprop: apply
12:05 slyngshede@dns1004: END - running authdns-update
12:03 slyngshede@dns1004: START - running authdns-update
12:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2166 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74093 and previous config saved to /var/cache/conftool/dbconfig/20250305-120138-root.json
11:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1226 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74092 and previous config saved to /var/cache/conftool/dbconfig/20250305-115557-root.json
11:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2166 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74091 and previous config saved to /var/cache/conftool/dbconfig/20250305-114632-root.json
11:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1226 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74090 and previous config saved to /var/cache/conftool/dbconfig/20250305-114051-root.json
11:38 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2001.codfw.wmnet
11:35 tappof@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "network_devices: adding device model - tappof@cumin1002 - T387231"
11:34 tappof@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "network_devices: adding device model - tappof@cumin1002 - T387231"
11:32 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-staging2001.codfw.wmnet
11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2166 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74089 and previous config saved to /var/cache/conftool/dbconfig/20250305-113126-root.json
11:29 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
11:29 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
11:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1226 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74088 and previous config saved to /var/cache/conftool/dbconfig/20250305-112545-root.json
11:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1226 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74087 and previous config saved to /var/cache/conftool/dbconfig/20250305-111040-root.json
11:07 elukey@cumin1002: END (PASS) - Cookbook sre.dns.admin (exit_code=0) DNS admin: depool site eqiad [reason: no reason specified, no task ID specified]
11:07 elukey@cumin1002: START - Cookbook sre.dns.admin DNS admin: depool site eqiad [reason: no reason specified, no task ID specified]
10:57 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
10:57 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
10:57 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
10:57 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
10:56 hnowlan@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
10:56 hnowlan@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
10:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1226 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74086 and previous config saved to /var/cache/conftool/dbconfig/20250305-105534-root.json
10:38 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2007.codfw.wmnet
10:38 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2008.codfw.wmnet
10:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74085 and previous config saved to /var/cache/conftool/dbconfig/20250305-103316-root.json
10:32 elukey: restart kube-apiserver on ml-staging-ctrl200[12] after the move to containerd (some issues registered)
10:31 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2008.codfw.wmnet
10:30 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2007.codfw.wmnet
10:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74084 and previous config saved to /var/cache/conftool/dbconfig/20250305-101810-root.json
10:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74083 and previous config saved to /var/cache/conftool/dbconfig/20250305-100304-root.json
09:58 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
09:58 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
09:58 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
09:57 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
09:55 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db1202 gradually with 4 steps - Cloned db1202 to db1253
09:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74081 and previous config saved to /var/cache/conftool/dbconfig/20250305-094759-root.json
09:39 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
09:38 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
09:38 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
09:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1001.eqiad.wmnet to plain
09:36 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1001.eqiad.wmnet to plain
09:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
09:35 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
09:33 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1202 gradually with 4 steps - Cloned db1202 to db1253
09:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74078 and previous config saved to /var/cache/conftool/dbconfig/20250305-093254-root.json
09:32 fceratto@cumin1002: dbctl commit (dc=all): 'Cloned db1202 to db1253', diff saved to https://phabricator.wikimedia.org/P74077 and previous config saved to /var/cache/conftool/dbconfig/20250305-093249-fceratto.json
09:31 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.pool (exit_code=97) db1202 gradually with 4 steps - Cloned db1202 to db1253
09:30 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1202 gradually with 4 steps - Cloned db1202 to db1253
09:27 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of dse-k8s-etcd1001.eqiad.wmnet to drbd
09:23 jynus: deploy new backup grants for es2036,es2040 T387892
09:18 jynus: deploy new backup grants for es1036,es1040 T387892
09:17 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of dse-k8s-etcd1001.eqiad.wmnet to drbd
09:16 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of aux-k8s-etcd1003.eqiad.wmnet to plain
09:15 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of aux-k8s-etcd1003.eqiad.wmnet to plain
09:15 hashar@deploy2002: rebuilt and synchronized wikiversions files: group1 to 1.44.0-wmf.19 refs T386214
09:15 godog: upgrade to karma 0.120 - T353457
09:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
09:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
09:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of aux-k8s-etcd1003.eqiad.wmnet to drbd
09:09 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1202 gradually with 4 steps - Cloned db1202 to db1253
09:08 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1202 gradually with 4 steps - Cloned db1202 to db1253
09:07 marostegui: Stop db1217:3321 to clone db1250 T385141
09:05 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1217.eqiad.wmnet with reason: cloning
09:04 klausman@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
09:04 klausman@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
08:55 tappof@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "network_devices: adding device model - tappof@cumin1002 - T387231"
08:54 tappof@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "network_devices: adding device model - tappof@cumin1002 - T387231"
08:53 jelto@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
08:52 jelto@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
08:51 jelto@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
08:50 jelto@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
08:50 jelto@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
08:50 jelto@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
08:49 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of aux-k8s-etcd1003.eqiad.wmnet to drbd
08:46 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
08:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
08:33 klausman@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
08:20 hashar@deploy2002: Finished scap sync-world: Backport for Lift IP cap for edit-a-thon (Illinois Tech) on March 12, 2025 (T387568), sewikimedia: update wordmark and tagline (T377921) (duration: 12m 02s)
08:14 hashar@deploy2002: hashar, anzx: Continuing with sync
08:13 hashar@deploy2002: hashar, anzx: Backport for Lift IP cap for edit-a-thon (Illinois Tech) on March 12, 2025 (T387568), sewikimedia: update wordmark and tagline (T377921) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:08 hashar@deploy2002: Started scap sync-world: Backport for Lift IP cap for edit-a-thon (Illinois Tech) on March 12, 2025 (T387568), sewikimedia: update wordmark and tagline (T377921)
08:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P74075 and previous config saved to /var/cache/conftool/dbconfig/20250305-080343-root.json
07:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P74074 and previous config saved to /var/cache/conftool/dbconfig/20250305-074838-root.json
07:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P74073 and previous config saved to /var/cache/conftool/dbconfig/20250305-073333-root.json
07:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P74072 and previous config saved to /var/cache/conftool/dbconfig/20250305-071827-root.json
07:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P74071 and previous config saved to /var/cache/conftool/dbconfig/20250305-070321-root.json
06:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1160.eqiad.wmnet with reason: Rebuilding index
06:42 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1160.eqiad.wmnet
06:35 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1160.eqiad.wmnet
06:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1160 T387816', diff saved to https://phabricator.wikimedia.org/P74070 and previous config saved to /var/cache/conftool/dbconfig/20250305-063216-marostegui.json
06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1244 to s4 primary T387816', diff saved to https://phabricator.wikimedia.org/P74069 and previous config saved to /var/cache/conftool/dbconfig/20250305-063124-marostegui.json
06:30 marostegui: Starting s4 eqiad failover from db1160 to db1244 - T387816
06:30 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2166.codfw.wmnet with reason: Index rebuild
06:30 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1226.eqiad.wmnet with reason: Index rebuild
06:30 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2166.codfw.wmnet
06:29 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1226.eqiad.wmnet
06:26 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1244 from API/vslow/dump T387816', diff saved to https://phabricator.wikimedia.org/P74068 and previous config saved to /var/cache/conftool/dbconfig/20250305-062629-marostegui.json
06:26 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s4 T387816
06:25 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1244 with weight 0 T387816', diff saved to https://phabricator.wikimedia.org/P74067 and previous config saved to /var/cache/conftool/dbconfig/20250305-062554-marostegui.json
06:24 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1226.eqiad.wmnet
06:24 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2166.codfw.wmnet
06:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2166 db1226', diff saved to https://phabricator.wikimedia.org/P74066 and previous config saved to /var/cache/conftool/dbconfig/20250305-062402-marostegui.json
03:42 ejegg: donorwiki upgraded from 05f7d8cc to 1b6c275a
02:05 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2014.codfw.wmnet with OS bookworm
02:05 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
02:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
01:40 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2014.codfw.wmnet with reason: host reimage
01:36 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2014.codfw.wmnet with reason: host reimage
01:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2014.codfw.wmnet with OS bookworm
01:12 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
01:01 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2014.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
01:00 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host backup2013.codfw.wmnet with OS bookworm
01:00 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
00:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
00:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup2013.codfw.wmnet with reason: host reimage
00:32 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup2013.codfw.wmnet with reason: host reimage
00:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2013.codfw.wmnet with OS bookworm
00:11 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['backup2013']
00:11 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['backup2013']
00:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host backup2013.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
00:03 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host backup2013.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
00:02 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup2014.codfw.wmnet with OS bookworm

2025-03-04

23:53 swfrench-wmf: started shellbox-media PHP 8.1 pilot with increased logging and display_startup_errors fix - T377038
23:51 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
23:51 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
23:49 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
23:49 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
23:11 tgr_: UTC very late deploys done
23:08 tgr@deploy2002: Finished scap sync-world: Backport for Revert "CentralAuth: Enable SUL3 signup on group 0 (attempt 3)" (duration: 11m 36s)
23:01 tgr@deploy2002: trainbranchbot, tgr: Continuing with sync
22:59 tgr@deploy2002: trainbranchbot, tgr: Backport for Revert "CentralAuth: Enable SUL3 signup on group 0 (attempt 3)" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:56 tgr@deploy2002: Started scap sync-world: Backport for Revert "CentralAuth: Enable SUL3 signup on group 0 (attempt 3)"
22:50 tgr@deploy2002: Sync cancelled.
22:35 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
22:34 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
22:29 tgr@deploy2002: tgr: Backport for CentralAuth: Enable SUL3 signup on group 0 (attempt 3) (T384007) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:26 tgr@deploy2002: Started scap sync-world: Backport for CentralAuth: Enable SUL3 signup on group 0 (attempt 3) (T384007)
22:21 jdrewniak@deploy2002: Finished scap sync-world: Backport for Deploy Search AB test to everywhere but English wiki (T386849) (duration: 13m 34s)
22:19 Amir1: clearing user_real_name in group0 wikis (T387212)
22:15 jdrewniak@deploy2002: jdrewniak, bwang: Continuing with sync
22:11 jdrewniak@deploy2002: jdrewniak, bwang: Backport for Deploy Search AB test to everywhere but English wiki (T386849) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:08 jdrewniak@deploy2002: Started scap sync-world: Backport for Deploy Search AB test to everywhere but English wiki (T386849)
21:53 jforrester@deploy2002: Finished scap sync-world: Backport for IS: Stop setting wgParserConf, unused since MW 1.36, CS: Stop setting wgTmhWebPlayer, unused since TMH REL1_39, CS: Stop setting wgBabelUseDatabase, unused since Babel REL1_39, CS-labs: Stop setting wgUrlShortenerDB*, unused since UrlShortener REL1_41, [[gerrit:1120505|[Growth] Enab
21:47 jforrester@deploy2002: jforrester, sgimeno: Continuing with sync
21:45 jforrester@deploy2002: jforrester, sgimeno: Backport for IS: Stop setting wgParserConf, unused since MW 1.36, CS: Stop setting wgTmhWebPlayer, unused since TMH REL1_39, CS: Stop setting wgBabelUseDatabase, unused since Babel REL1_39, CS-labs: Stop setting wgUrlShortenerDB*, unused since UrlShortener REL1_41, [[gerrit:1120505|[Growth] Enable su
21:42 jforrester@deploy2002: Started scap sync-world: Backport for IS: Stop setting wgParserConf, unused since MW 1.36, CS: Stop setting wgTmhWebPlayer, unused since TMH REL1_39, CS: Stop setting wgBabelUseDatabase, unused since Babel REL1_39, CS-labs: Stop setting wgUrlShortenerDB*, unused since UrlShortener REL1_41, [[gerrit:1120505|[Growth] Enabl
21:41 jforrester@deploy2002: Finished scap sync-world: Backport for fix(surfacing): don't show highlights on protected pages, fix(surfacing): don't show highlights on protected pages, analytics(GrowthExperimentsInteractionLogger): add mediawiki.database to event data (T387286), [[gerrit:1124494|analytics(GrowthExperimentsInteractionLogger): add mediawiki.database to e
21:34 jforrester@deploy2002: sgimeno, jforrester, migr: Continuing with sync
21:32 jforrester@deploy2002: sgimeno, jforrester, migr: Backport for fix(surfacing): don't show highlights on protected pages, fix(surfacing): don't show highlights on protected pages, analytics(GrowthExperimentsInteractionLogger): add mediawiki.database to event data (T387286), [[gerrit:1124494|analytics(GrowthExperimentsInteractionLogger): add mediawiki.database to
21:30 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:28 jforrester@deploy2002: Started scap sync-world: Backport for fix(surfacing): don't show highlights on protected pages, fix(surfacing): don't show highlights on protected pages, analytics(GrowthExperimentsInteractionLogger): add mediawiki.database to event data (T387286), [[gerrit:1124494|analytics(GrowthExperimentsInteractionLogger): add mediawiki.database to ev
21:25 jforrester@deploy2002: Finished scap sync-world: Backport for Revert "styles: Remove transparent PNG fallback for `.vector-icon`" (T358910 T387351) (duration: 10m 13s)
21:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host ganeti2045.mgmt.codfw.wmnet with chassis set policy FORCE_RESTART
21:24 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ganeti2045
21:23 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup2013.codfw.wmnet with OS bookworm
21:23 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:23 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ganeti2045
21:18 jforrester@deploy2002: jforrester, jdlrobson: Continuing with sync
21:18 jforrester@deploy2002: jforrester, jdlrobson: Backport for Revert "styles: Remove transparent PNG fallback for `.vector-icon`" (T358910 T387351) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:15 jforrester@deploy2002: Started scap sync-world: Backport for Revert "styles: Remove transparent PNG fallback for `.vector-icon`" (T358910 T387351)
21:14 jforrester@deploy2002: Finished scap sync-world: Backport for docroot: Enable Chrome credential sharing on all open SUL wikis (T385520) (duration: 10m 33s)
21:08 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:08 jforrester@deploy2002: jforrester, krinkle: Continuing with sync
21:07 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:07 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:07 jforrester@deploy2002: jforrester, krinkle: Backport for docroot: Enable Chrome credential sharing on all open SUL wikis (T385520) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:06 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:06 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:05 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:04 jforrester@deploy2002: Started scap sync-world: Backport for docroot: Enable Chrome credential sharing on all open SUL wikis (T385520)
20:40 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2013.codfw.wmnet with OS bookworm
20:05 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:05 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns names for test servers nokia lab - cmooney@cumin1002"
20:04 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns names for test servers nokia lab - cmooney@cumin1002"
20:01 cmooney@cumin1002: START - Cookbook sre.dns.netbox
19:33 xcollazo@deploy2002: Finished deploy [airflow-dags/analytics@10615c9]: Deploy latest DAGs for analytics Airflow instance. T387906. (duration: 00m 34s)
19:32 xcollazo@deploy2002: Started deploy [airflow-dags/analytics@10615c9]: Deploy latest DAGs for analytics Airflow instance. T387906.
19:20 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.19 refs T386214
18:51 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
18:43 hashar@deploy2002: Finished scap sync-world: Backport for Fix typo in wgTrackGlobalJsonLinksNamespaces (T387843 T385917) (duration: 14m 51s)
18:41 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2003.codfw.wmnet with OS bookworm
18:36 hashar@deploy2002: hashar, bvibber: Continuing with sync
18:36 hashar@deploy2002: hashar, bvibber: Backport for Fix typo in wgTrackGlobalJsonLinksNamespaces (T387843 T385917) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:28 hashar@deploy2002: Started scap sync-world: Backport for Fix typo in wgTrackGlobalJsonLinksNamespaces (T387843 T385917)
18:26 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging2003.codfw.wmnet with reason: host reimage
18:23 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging2003.codfw.wmnet with reason: host reimage
18:20 swfrench-wmf: serving 25% of mw-api-int traffic on PHP 8.1 - T383845
18:19 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
18:19 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
18:18 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
18:18 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
18:16 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
18:16 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
18:15 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
18:15 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
18:10 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ml-staging2003
18:10 klausman@cumin2002: START - Cookbook sre.hosts.move-vlan for host ml-staging2003
18:09 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
18:05 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2001.codfw.wmnet with OS bookworm
17:59 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:58 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:58 swfrench@deploy2002: Finished scap sync-world: Use latest php8.1 images - T377038 (duration: 24m 53s)
17:56 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:55 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:53 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:53 ejegg: donorwiki upgraded from 98027151 to 05f7d8cc
17:52 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
17:50 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:49 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
17:48 dcausse@deploy2002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
17:47 dcausse@deploy2002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
17:46 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging2001.codfw.wmnet with reason: host reimage
17:42 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging2001.codfw.wmnet with reason: host reimage
17:33 swfrench@deploy2002: Started scap sync-world: Use latest php8.1 images - T377038
17:32 marostegui@cumin1002: dbctl commit (dc=all): 'db2161 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74064 and previous config saved to /var/cache/conftool/dbconfig/20250304-173228-root.json
17:31 swfrench-wmf: built php8.1 production images with 'php8.1: Set display_startup_errors consistent with display_errors' - T377038
17:25 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2001.codfw.wmnet with OS bookworm
17:24 klausman@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-staging2001.codfw.wmnet with OS bookworm
17:17 marostegui@cumin1002: dbctl commit (dc=all): 'db2161 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74062 and previous config saved to /var/cache/conftool/dbconfig/20250304-171722-root.json
17:12 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
17:11 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
17:09 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
17:08 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
17:03 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
17:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1214 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74061 and previous config saved to /var/cache/conftool/dbconfig/20250304-170223-root.json
17:02 marostegui@cumin1002: dbctl commit (dc=all): 'db2161 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74060 and previous config saved to /var/cache/conftool/dbconfig/20250304-170217-root.json
17:02 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
17:02 ryankemper@dns1004: END - running authdns-update
17:00 claime: Closing UTC afternoon backport window
16:59 ryankemper@dns1004: START - running authdns-update
16:58 cgoubert@deploy2002: Finished scap sync-world: Backport for Revert^2 "When executing cli scripts, wait for the service mesh" (T387208) (duration: 10m 42s)
16:57 jgiannelos@deploy2002: Finished deploy [restbase/deploy@3eb0316]: Add new wikis. Enable prometheus metrics. (duration: 21m 25s)
16:55 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ml-staging2001
16:55 klausman@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-staging2001
16:55 klausman@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ml-staging2001
16:55 klausman@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ml-staging2001.codfw.wmnet 201.0.192.10.in-addr.arpa 1.0.2.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:55 klausman@cumin2002: START - Cookbook sre.dns.wipe-cache ml-staging2001.codfw.wmnet 201.0.192.10.in-addr.arpa 1.0.2.0.0.0.0.0.2.9.1.0.0.1.0.0.1.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
16:55 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:55 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-staging2001 - klausman@cumin2002"
16:55 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-staging2001 - klausman@cumin2002"
16:51 cgoubert@deploy2002: cgoubert, oblivian: Continuing with sync
16:51 klausman@cumin2002: START - Cookbook sre.dns.netbox
16:50 klausman@cumin2002: START - Cookbook sre.hosts.move-vlan for host ml-staging2001
16:50 cgoubert@deploy2002: cgoubert, oblivian: Backport for Revert^2 "When executing cli scripts, wait for the service mesh" (T387208) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:49 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2001.codfw.wmnet with OS bookworm
16:48 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2002.codfw.wmnet with OS bookworm
16:47 cgoubert@deploy2002: Started scap sync-world: Backport for Revert^2 "When executing cli scripts, wait for the service mesh" (T387208)
16:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1214 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74059 and previous config saved to /var/cache/conftool/dbconfig/20250304-164718-root.json
16:47 marostegui@cumin1002: dbctl commit (dc=all): 'db2161 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74058 and previous config saved to /var/cache/conftool/dbconfig/20250304-164712-root.json
16:46 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
16:45 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
16:43 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
16:43 cgoubert@deploy2002: Finished scap sync-world: Backport for Enable $wgCampaignEventsSeparateOngoingEvents by default (T386427) (duration: 21m 28s)
16:42 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
16:41 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
16:39 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1031.eqiad.wmnet to cluster eqiad and group A
16:38 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1031.eqiad.wmnet to cluster eqiad and group A
16:36 jgiannelos@deploy2002: Started deploy [restbase/deploy@3eb0316]: Add new wikis. Enable prometheus metrics.
16:34 cgoubert@deploy2002: daimona, cgoubert: Continuing with sync
16:32 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
16:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1214 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74057 and previous config saved to /var/cache/conftool/dbconfig/20250304-163212-root.json
16:32 marostegui@cumin1002: dbctl commit (dc=all): 'db2161 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74056 and previous config saved to /var/cache/conftool/dbconfig/20250304-163207-root.json
16:31 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
16:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1031.eqiad.wmnet
16:29 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging2002.codfw.wmnet with reason: host reimage
16:29 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
16:28 cgoubert@deploy2002: daimona, cgoubert: Backport for Enable $wgCampaignEventsSeparateOngoingEvents by default (T386427) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:27 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
16:27 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
16:25 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging2002.codfw.wmnet with reason: host reimage
16:24 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
16:24 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
16:22 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1031.eqiad.wmnet
16:22 cgoubert@deploy2002: Started scap sync-world: Backport for Enable $wgCampaignEventsSeparateOngoingEvents by default (T386427)
16:20 jmm@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ganeti1031.eqiad.wmnet
16:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1031.eqiad.wmnet
16:17 cgoubert@deploy2002: Finished scap sync-world: Move image forward (duration: 09m 16s)
16:15 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/services/mw-debug: apply
16:15 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/services/mw-debug: apply
16:11 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/services/mw-debug: apply
16:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1031.eqiad.wmnet
16:08 cgoubert@deploy2002: Started scap sync-world: Move image forward
16:07 cgoubert@deploy2002: Finished scap sync-world: Shrink -next releases (duration: 02m 35s)
16:07 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2002.codfw.wmnet with OS bookworm
16:05 cgoubert@deploy2002: Started scap sync-world: Shrink -next releases
16:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host ml-staging2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
16:04 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
16:04 brennen@deploy2002: Finished deploy [phabricator/deployment@5d2302b]: deploy phab1004 for T387873 (duration: 00m 51s)
16:03 brennen@deploy2002: Started deploy [phabricator/deployment@5d2302b]: deploy phab1004 for T387873
16:03 elukey@cumin1002: START - Cookbook sre.hosts.provision for host ml-staging2002.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART and with Dell SCP reboot policy GRACEFUL
16:03 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
16:03 brennen@deploy2002: Finished deploy [phabricator/deployment@5d2302b]: test deploy phab2002 for T387873 (duration: 00m 29s)
16:02 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
16:02 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/services/mw-debug: apply
16:02 brennen@deploy2002: Started deploy [phabricator/deployment@5d2302b]: test deploy phab2002 for T387873
16:02 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
16:02 jelto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator deploy
16:01 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
16:01 jelto@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator deploy
16:01 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
16:01 ottomata: eventgate-logging-external: rolling back to pre node 20 due to bug likely caused by T382173. -- T387850, T383814
15:51 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/services/mw-debug: apply
15:51 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
15:51 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
15:50 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/services/mw-debug: apply
15:50 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/services/mw-debug: apply
15:49 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/services/mw-debug: apply
15:46 jmm@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ganeti1031.eqiad.wmnet
15:46 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1202.eqiad.wmnet onto db1253.eqiad.wmnet
15:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1214 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74054 and previous config saved to /var/cache/conftool/dbconfig/20250304-154537-root.json
15:42 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/services/mw-debug: apply
15:41 vgutierrez: repooling lvs5004 running liberica - T384477
15:35 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
15:35 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
15:34 jelto@deploy2002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
15:33 jelto@deploy2002: helmfile [staging] START helmfile.d/services/miscweb: apply
15:33 klausman@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host ml-staging2002.codfw.wmnet with OS bookworm
15:32 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/services/mw-debug: apply
15:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1214 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74052 and previous config saved to /var/cache/conftool/dbconfig/20250304-153031-root.json
15:19 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs5004.eqsin.wmnet with OS bookworm
15:16 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ganeti1031.eqiad.wmnet with OS bookworm
14:59 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.move-vlan (exit_code=0) for host ml-staging2002
14:59 klausman@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-staging2002
14:58 klausman@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ml-staging2002
14:58 klausman@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ml-staging2002.codfw.wmnet 174.48.192.10.in-addr.arpa 4.7.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
14:58 klausman@cumin2002: START - Cookbook sre.dns.wipe-cache ml-staging2002.codfw.wmnet 174.48.192.10.in-addr.arpa 4.7.1.0.8.4.0.0.2.9.1.0.0.1.0.0.4.0.1.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa on all recursors
14:58 klausman@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:58 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-staging2002 - klausman@cumin2002"
14:58 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update records for host ml-staging2002 - klausman@cumin2002"
14:57 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp7005.*} or P{cp7009.*} or P{cp[7011-7014]*} or P{cp7016.*} and A:cp for 9.2.6-1wm2
14:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ganeti1031.eqiad.wmnet with reason: host reimage
14:56 cgoubert@deploy2002: Started scap sync-world: Deploying Revert "php8.1: Set display_startup_errors consistent with display_errors" (gerrit:1124444)
14:54 klausman@cumin2002: START - Cookbook sre.dns.netbox
14:53 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ganeti1031.eqiad.wmnet with reason: host reimage
14:53 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
14:51 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: eventschemas::service@eqiad
14:51 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
14:51 klausman@cumin2002: START - Cookbook sre.hosts.move-vlan for host ml-staging2002
14:50 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
14:50 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2002.codfw.wmnet with OS bookworm
14:50 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5004.eqsin.wmnet with reason: host reimage
14:46 moritzm: restarting r/w slapds to pick up libtasn updates
14:23 dreamyjazz@deploy2002: Started scap sync-world: Backport for Create temporary-account-viewer group (T387205)
14:20 moritzm: installing libtasn1-6 security updates
14:17 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs5004.eqsin.wmnet with reason: depooled before reimage
14:16 vgutierrez: depooling lvs5004 before reimaging - T384477
14:03 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp7005.*} or P{cp7009.*} or P{cp[7011-7014]*} or P{cp7016.*} and A:cp for 9.2.6-1wm2
14:02 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica1003.wikimedia.org
13:56 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica1003.wikimedia.org
13:54 dreamyjazz@deploy2002: dreamyjazz: Continuing with sync
13:54 dreamyjazz@deploy2002: dreamyjazz: Backport for Create temporary-account-viewer group (T387205) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:48 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
13:46 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
13:46 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
13:46 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
13:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica1004.wikimedia.org
13:43 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet with reason: Rebuilding index
13:38 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica1004.wikimedia.org
13:30 dreamyjazz@deploy2002: Started scap sync-world: Backport for Create temporary-account-viewer group (T387205)
13:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica2005.wikimedia.org
12:51 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica2005.wikimedia.org
12:48 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ldap-replica2006.wikimedia.org
12:44 jmm@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ldap-replica2006.wikimedia.org
12:32 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1154.eqiad.wmnet with reason: Rebuilding index
12:27 jiji@deploy2002: Finished scap sync-world: Deploy php 8.1.34-1-s3 image (duration: 04m 59s)
12:23 jiji@deploy2002: Started scap sync-world: Deploy php 8.1.34-1-s3 image
12:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74048 and previous config saved to /var/cache/conftool/dbconfig/20250304-122057-root.json
12:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74047 and previous config saved to /var/cache/conftool/dbconfig/20250304-120552-root.json
11:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74046 and previous config saved to /var/cache/conftool/dbconfig/20250304-115047-root.json
11:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74045 and previous config saved to /var/cache/conftool/dbconfig/20250304-114358-root.json
11:43 claime: Deleting obsolete puppet certs for eventstreams.discovery.wmnet and eventgate-analytics-external.discovery.wmnet
11:43 jiji@deploy2002: Started scap sync-world: Deploy php 8.1.34-1-s3 image
11:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74044 and previous config saved to /var/cache/conftool/dbconfig/20250304-113541-root.json
11:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74043 and previous config saved to /var/cache/conftool/dbconfig/20250304-112852-root.json
11:28 vgutierrez: repooling lvs5005 running liberica - T384477
11:23 joal@deploy2002: Finished deploy [airflow-dags/analytics@9a0b051]: Regular analytics weekly train [airflow-dags/analytics@9a0b0519] (duration: 00m 35s)
11:22 joal@deploy2002: Started deploy [airflow-dags/analytics@9a0b051]: Regular analytics weekly train [airflow-dags/analytics@9a0b0519]
11:22 cgoubert@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/services/mw-debug: apply
11:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2167 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74039 and previous config saved to /var/cache/conftool/dbconfig/20250304-112035-root.json
11:20 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
11:16 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
11:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74038 and previous config saved to /var/cache/conftool/dbconfig/20250304-111347-root.json
11:12 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs5005.eqsin.wmnet with OS bookworm
11:11 cgoubert@deploy2002: helmfile [staging-eqiad] START helmfile.d/services/mw-debug: apply
11:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74037 and previous config saved to /var/cache/conftool/dbconfig/20250304-111146-root.json
11:10 hashar@deploy2002: Finished scap sync-world: testwikis to 1.44.0-wmf.19 refs T386214 (duration: 11m 09s)
11:09 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1016.eqiad.wmnet with reason: Rebuilding index
11:08 joal@deploy2002: Finished deploy [analytics/refinery@dbcd265] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@dbcd2652] (duration: 00m 35s)
11:07 joal@deploy2002: Started deploy [analytics/refinery@dbcd265] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@dbcd2652]
11:07 joal@deploy2002: Finished deploy [analytics/refinery@dbcd265] (thin): Regular analytics weekly train THIN [analytics/refinery@dbcd2652] (duration: 00m 55s)
11:06 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2161.codfw.wmnet with reason: Index rebuild
11:06 joal@deploy2002: Started deploy [analytics/refinery@dbcd265] (thin): Regular analytics weekly train THIN [analytics/refinery@dbcd2652]
11:05 joal@deploy2002: Finished deploy [analytics/refinery@dbcd265]: Regular analytics weekly train [analytics/refinery@dbcd2652] (duration: 02m 58s)
11:05 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2161.codfw.wmnet
11:05 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1214.eqiad.wmnet with reason: Index rebuild
11:04 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1214.eqiad.wmnet
11:02 joal@deploy2002: Started deploy [analytics/refinery@dbcd265]: Regular analytics weekly train [analytics/refinery@dbcd2652]
10:59 hashar@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.19 refs T386214
10:58 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1214.eqiad.wmnet
10:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74036 and previous config saved to /var/cache/conftool/dbconfig/20250304-105842-root.json
10:58 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2161.codfw.wmnet
10:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2161 db1214', diff saved to https://phabricator.wikimedia.org/P74035 and previous config saved to /var/cache/conftool/dbconfig/20250304-105814-marostegui.json
10:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74034 and previous config saved to /var/cache/conftool/dbconfig/20250304-105640-root.json
10:52 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5005.eqsin.wmnet with reason: host reimage
10:49 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1019.eqiad.wmnet with reason: Rebuilding index
10:48 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5005.eqsin.wmnet with reason: host reimage
10:43 gkyziridis@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
10:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1211 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74033 and previous config saved to /var/cache/conftool/dbconfig/20250304-104336-root.json
10:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74032 and previous config saved to /var/cache/conftool/dbconfig/20250304-104135-root.json
10:35 xSavitar: T387789 Ran mwscript-k8s --comment="T387789" -f -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiki --logwiki=metawiki 'JamesVilla44' 'DartsF4' --ignorestatus
10:27 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs5005.eqsin.wmnet with OS bookworm
10:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74031 and previous config saved to /var/cache/conftool/dbconfig/20250304-102630-root.json
10:21 hashar@deploy2002: rebuilt and synchronized wikiversions files: group0 to 1.44.0-wmf.19 refs T386214
10:20 vgutierrez@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs5005.eqsin.wmnet with reason: depooled before reimage
10:19 vgutierrez: depooling lvs5005 before reimaging - T384477
10:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74030 and previous config saved to /var/cache/conftool/dbconfig/20250304-101124-root.json
10:00 dcausse: wdqs: reconciled Q27151108 on both eqiad & codfw wdqs endpoints (T386998)
09:52 aborrero@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cloudcontrol1005.eqiad.wmnet
09:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74029 and previous config saved to /var/cache/conftool/dbconfig/20250304-095228-root.json
09:41 elukey@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
09:39 elukey@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
09:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74028 and previous config saved to /var/cache/conftool/dbconfig/20250304-093723-root.json
09:34 elukey@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
09:33 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
09:33 elukey@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
09:32 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host lvs5006.eqsin.wmnet with OS bookworm
09:32 sgimeno@deploy2002: Finished scap sync-world: Backport for analytics(HomepageHooks,BeforePageDisplayHandler): log experiment_enrollment interaction on new accounts (T387286) (duration: 12m 01s)
09:28 elukey@cumin1002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
09:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1221 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74027 and previous config saved to /var/cache/conftool/dbconfig/20250304-092839-root.json
09:27 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1155.eqiad.wmnet with reason: Rebuilding index
09:25 sgimeno@deploy2002: sgimeno: Continuing with sync
09:23 sgimeno@deploy2002: sgimeno: Backport for analytics(HomepageHooks,BeforePageDisplayHandler): log experiment_enrollment interaction on new accounts (T387286) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:23 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ipoid: apply
09:23 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ipoid: apply
09:22 jayme@deploy1003: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
09:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74026 and previous config saved to /var/cache/conftool/dbconfig/20250304-092217-root.json
09:21 jayme@deploy1003: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
09:21 jayme@deploy1003: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
09:20 sgimeno@deploy2002: Started scap sync-world: Backport for analytics(HomepageHooks,BeforePageDisplayHandler): log experiment_enrollment interaction on new accounts (T387286)
09:19 jayme@deploy1003: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
09:16 sgimeno@deploy2002: Finished scap sync-world: Backport for [Growth] Add mediawiki.product_metrics.growth_product_interaction stream config (T387286) (duration: 16m 01s)
09:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1221 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74025 and previous config saved to /var/cache/conftool/dbconfig/20250304-091334-root.json
09:08 sgimeno@deploy2002: sgimeno: Continuing with sync
09:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74024 and previous config saved to /var/cache/conftool/dbconfig/20250304-090712-root.json
09:05 sgimeno@deploy2002: sgimeno: Backport for [Growth] Add mediawiki.product_metrics.growth_product_interaction stream config (T387286) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:00 sgimeno@deploy2002: Started scap sync-world: Backport for [Growth] Add mediawiki.product_metrics.growth_product_interaction stream config (T387286)
09:00 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
08:59 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
08:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1221 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74023 and previous config saved to /var/cache/conftool/dbconfig/20250304-085829-root.json
08:58 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol1005.eqiad.wmnet
08:57 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
08:56 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
08:55 dcausse: restarting eventgate-main to pick up the new streams (T375821)
08:54 dcausse@deploy2002: Finished scap sync-world: Backport for cirrus: add v1 stream for the search update pipeline (T375821) (duration: 41m 17s)
08:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2147 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74022 and previous config saved to /var/cache/conftool/dbconfig/20250304-085207-root.json
08:45 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
08:45 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
08:44 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
08:44 vgutierrez@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lvs5006.eqsin.wmnet with reason: host reimage
08:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1221 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74021 and previous config saved to /var/cache/conftool/dbconfig/20250304-084325-root.json
08:40 vgutierrez@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lvs5006.eqsin.wmnet with reason: host reimage
08:40 dcausse@deploy2002: dcausse: Continuing with sync
08:29 dcausse@deploy2002: dcausse: Backport for cirrus: add v1 stream for the search update pipeline (T375821) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1221 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74020 and previous config saved to /var/cache/conftool/dbconfig/20250304-082819-root.json
08:24 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1015.eqiad.wmnet with reason: Rebuilding index
08:17 vgutierrez@cumin1002: START - Cookbook sre.hosts.reimage for host lvs5006.eqsin.wmnet with OS bookworm
08:13 dcausse@deploy2002: Started scap sync-world: Backport for cirrus: add v1 stream for the search update pipeline (T375821)
08:08 hashar@deploy2002: sync-world aborted: testwikis to 1.44.0-wmf.19 refs T386214 (duration: 05m 10s)
08:03 hashar@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.19 refs T386214
08:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
07:59 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster1005.eqiad.wmnet to plain
07:57 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster1005.eqiad.wmnet to plain
07:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
07:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
07:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.changedisk (exit_code=0) for changing disk type of kubestagemaster1005.eqiad.wmnet to drbd
07:35 jmm@cumin2002: START - Cookbook sre.ganeti.changedisk for changing disk type of kubestagemaster1005.eqiad.wmnet to drbd
07:32 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
07:29 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1031.eqiad.wmnet
06:42 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1184.eqiad.wmnet with reason: Index rebuild
06:41 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1184.eqiad.wmnet
06:35 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1184.eqiad.wmnet
06:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1184 T387552', diff saved to https://phabricator.wikimedia.org/P74019 and previous config saved to /var/cache/conftool/dbconfig/20250304-063320-marostegui.json
06:32 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1163 to s1 primary T387552', diff saved to https://phabricator.wikimedia.org/P74018 and previous config saved to /var/cache/conftool/dbconfig/20250304-063222-marostegui.json
06:27 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 34 hosts with reason: Primary switchover s1 T387552
06:27 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1163 from API/vslow/dump T387552', diff saved to https://phabricator.wikimedia.org/P74017 and previous config saved to /var/cache/conftool/dbconfig/20250304-062717-marostegui.json
06:27 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1163 with weight 0 T387552', diff saved to https://phabricator.wikimedia.org/P74016 and previous config saved to /var/cache/conftool/dbconfig/20250304-062702-marostegui.json
06:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2167.codfw.wmnet with reason: Index rebuild
06:18 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1211.eqiad.wmnet with reason: Index rebuild
06:17 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2167.codfw.wmnet
06:17 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1211.eqiad.wmnet
06:17 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2147.codfw.wmnet with reason: Index rebuild
06:17 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1221.eqiad.wmnet with reason: Index rebuild
06:17 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2147.codfw.wmnet
06:16 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1221.eqiad.wmnet
06:13 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb1020.eqiad.wmnet with reason: Rebuilding index
06:12 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1211.eqiad.wmnet
06:12 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2167.codfw.wmnet
06:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2167 db1211', diff saved to https://phabricator.wikimedia.org/P74015 and previous config saved to /var/cache/conftool/dbconfig/20250304-061152-marostegui.json
06:10 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1221.eqiad.wmnet
06:10 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2147.codfw.wmnet
06:10 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Rebuilding index
06:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1221 db2147', diff saved to https://phabricator.wikimedia.org/P74014 and previous config saved to /var/cache/conftool/dbconfig/20250304-060927-marostegui.json
05:53 kart_: Updated cxserver to 2025-03-03-041049-production (T369815, T387037)
05:52 kartik@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
05:51 kartik@deploy2002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
05:51 kartik@deploy2002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
05:50 kartik@deploy2002: helmfile [codfw] START helmfile.d/services/cxserver: apply
05:43 kartik@deploy2002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
05:43 kartik@deploy2002: helmfile [staging] START helmfile.d/services/cxserver: apply
05:05 mwpresync@deploy2002: Pruned MediaWiki: 1.44.0-wmf.16 (duration: 05m 56s)
04:02 mwpresync@deploy2002: Started scap sync-world: testwikis to 1.44.0-wmf.19 refs T386214
00:59 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:59 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2046-8 to codfw - jhancock@cumin2002"
00:59 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2046-8 to codfw - jhancock@cumin2002"
00:53 jhancock@cumin2002: START - Cookbook sre.dns.netbox
00:50 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
00:50 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2045 to codfw - jhancock@cumin2002"
00:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ganeti2045 to codfw - jhancock@cumin2002"
00:39 jhancock@cumin2002: START - Cookbook sre.dns.netbox
00:34 dduvall: deleting older mw-multiversion images on deploy2002 to free space (T387796)
00:01 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1110.eqiad.wmnet with OS bullseye

2025-03-03

23:56 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1109.eqiad.wmnet with OS bullseye
23:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host elastic1108.eqiad.wmnet with OS bullseye
23:50 Amir1: deleted local user_password from labswiki database (T104500 and T161859)
23:44 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1110.eqiad.wmnet with reason: host reimage
23:40 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1109.eqiad.wmnet with reason: host reimage
23:38 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1110.eqiad.wmnet with reason: host reimage
23:36 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on elastic1108.eqiad.wmnet with reason: host reimage
23:32 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1109.eqiad.wmnet with reason: host reimage
23:32 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on elastic1108.eqiad.wmnet with reason: host reimage
23:25 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
23:25 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup2014.codfw.wmnet with OS bookworm
23:23 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic1110.eqiad.wmnet with OS bullseye
23:22 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1110.eqiad.wmnet with OS bullseye
23:17 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic1109.eqiad.wmnet with OS bullseye
23:16 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic1108.eqiad.wmnet with OS bullseye
23:02 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
23:02 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
23:01 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1109.eqiad.wmnet with OS bullseye
22:56 ryankemper: T384422 Deploying backend.yaml routing patch; after it's deployed we should theoretically be able to see a UI at https://query-legacy-full.wikidata.org/
22:52 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host elastic1108.eqiad.wmnet with OS bullseye
22:52 tgr_: late UTC deploys done
22:51 vriley@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:51 tgr@deploy2002: Finished scap sync-world: Backport for feat(Surfacing): Add Change Tag for surfaced Add a Link (T387160) (duration: 31m 28s)
22:49 ryankemper@dns1004: END - running authdns-update
22:47 ryankemper@dns1004: START - running authdns-update
22:47 ryankemper: T384422 Merging DNS patch now https://gerrit.wikimedia.org/r/c/operations/dns/+/1122676
22:46 ryankemper: T384422 k8s deployment of `wikidata-query-legacy-full-gui` release in codfw looks fine, proceeding to eqiad
22:46 ryankemper@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
22:45 ryankemper@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
22:41 tgr@deploy2002: migr, tgr: Continuing with sync
22:39 ryankemper@deploy2002: helmfile [codfw] DONE helmfile.d/services/wikidata-query-gui: apply
22:39 ryankemper@deploy2002: helmfile [codfw] START helmfile.d/services/wikidata-query-gui: apply
22:35 tgr@deploy2002: migr, tgr: Backport for feat(Surfacing): Add Change Tag for surfaced Add a Link (T387160) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:32 ryankemper@deploy2002: helmfile [eqiad] DONE helmfile.d/services/wikidata-query-gui: apply
22:32 ryankemper@deploy2002: helmfile [eqiad] START helmfile.d/services/wikidata-query-gui: apply
22:24 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic1110.eqiad.wmnet with OS bullseye
22:21 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from relforge1007 to elastic1110
22:20 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1110
22:20 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host elastic1110
22:20 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:20 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming relforge1007 to elastic1110 - bking@cumin2002"
22:20 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming relforge1007 to elastic1110 - bking@cumin2002"
22:19 tgr@deploy2002: Started scap sync-world: Backport for feat(Surfacing): Add Change Tag for surfaced Add a Link (T387160)
22:15 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
22:13 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1181
22:13 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1181
22:12 bking@cumin2002: START - Cookbook sre.dns.netbox
22:12 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:12 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1181 - vriley@cumin1002"
22:11 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1181 - vriley@cumin1002"
22:08 bking@cumin2002: START - Cookbook sre.hosts.rename from relforge1007 to elastic1110
22:07 vriley@cumin1002: START - Cookbook sre.dns.netbox
22:07 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-worker1181
22:07 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1181
22:06 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:05 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2014.codfw.wmnet with OS bookworm
22:04 vriley@cumin1002: START - Cookbook sre.dns.netbox
22:02 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic1109.eqiad.wmnet with OS bullseye
22:01 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from relforge1006 to elastic1109
22:01 tgr@deploy2002: Finished scap sync-world: Backport for Use session storage for session tick events (T387400), Update experiment name for Search AB test french wiki (T387400) (duration: 26m 04s)
22:00 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1109
22:00 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host elastic1109
22:00 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:00 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming relforge1006 to elastic1109 - bking@cumin2002"
22:00 bking@cumin2002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 16:00:00 on relforge[1003-1004,1006-1007].eqiad.wmnet with reason: T387782
21:59 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming relforge1006 to elastic1109 - bking@cumin2002"
21:56 bking@cumin2002: START - Cookbook sre.dns.netbox
21:55 bking@cumin2002: START - Cookbook sre.hosts.rename from relforge1006 to elastic1109
21:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host backup2013.codfw.wmnet with OS bookworm
21:54 bking@cumin2002: START - Cookbook sre.hosts.reimage for host elastic1108.eqiad.wmnet with OS bullseye
21:53 tgr@deploy2002: bwang, tgr: Continuing with sync
21:52 bking@cumin2002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from relforge1005 to elastic1108
21:51 bking@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host elastic1108
21:51 bking@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host elastic1108
21:51 bking@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:51 bking@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming relforge1005 to elastic1108 - bking@cumin2002"
21:50 bking@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming relforge1005 to elastic1108 - bking@cumin2002"
21:45 bking@cumin2002: START - Cookbook sre.dns.netbox
21:44 bking@cumin2002: START - Cookbook sre.hosts.rename from relforge1005 to elastic1108
21:44 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from relforge1006 to elastic1109
21:44 bking@cumin2002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
21:43 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from relforge1005 to elastic1108
21:43 bking@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
21:42 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:42 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:42 bking@cumin2002: START - Cookbook sre.dns.netbox
21:41 vriley@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:41 vriley@cumin1002: START - Cookbook sre.hosts.provision for host an-worker1181.mgmt.eqiad.wmnet with chassis set policy FORCE_RESTART and with Dell SCP reboot policy FORCED
21:39 ebernhardson@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
21:39 ebernhardson@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
21:38 bking@cumin2002: START - Cookbook sre.hosts.rename from relforge1006 to elastic1109
21:37 tgr@deploy2002: bwang, tgr: Backport for Use session storage for session tick events (T387400), Update experiment name for Search AB test french wiki (T387400) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:37 bking@cumin2002: START - Cookbook sre.dns.netbox
21:36 bking@cumin2002: START - Cookbook sre.hosts.rename from relforge1005 to elastic1108
21:35 vriley@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1181
21:35 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1181
21:35 vriley@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host an-worker1181
21:35 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-worker1181
21:34 tgr@deploy2002: Started scap sync-world: Backport for Use session storage for session tick events (T387400), Update experiment name for Search AB test french wiki (T387400)
21:34 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:34 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1181 - vriley@cumin1002"
21:34 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-worker1181 - vriley@cumin1002"
21:30 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
21:30 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
21:30 vriley@cumin1002: START - Cookbook sre.dns.netbox
21:27 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
21:26 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
21:26 tgr@deploy2002: Finished scap sync-world: Backport for Remove unused config variable $wgJsonConfigInterwikiPrefix, Fix inconsistent definitions for $wmgLocalServices['chart-renderer'], Set $wgCentralAuthSharedDomainCallback (T387357) (duration: 10m 06s)
21:21 swfrench@deploy2002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
21:21 swfrench@deploy2002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
21:19 tgr@deploy2002: matmarex, tgr: Continuing with sync
21:19 tgr@deploy2002: matmarex, tgr: Backport for Remove unused config variable $wgJsonConfigInterwikiPrefix, Fix inconsistent definitions for $wmgLocalServices['chart-renderer'], Set $wgCentralAuthSharedDomainCallback (T387357) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:16 tgr@deploy2002: Started scap sync-world: Backport for Remove unused config variable $wgJsonConfigInterwikiPrefix, Fix inconsistent definitions for $wmgLocalServices['chart-renderer'], Set $wgCentralAuthSharedDomainCallback (T387357)
21:10 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host backup2013.codfw.wmnet with OS bookworm
20:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74012 and previous config saved to /var/cache/conftool/dbconfig/20250303-203158-root.json
20:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74011 and previous config saved to /var/cache/conftool/dbconfig/20250303-203100-root.json
20:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74010 and previous config saved to /var/cache/conftool/dbconfig/20250303-201652-root.json
20:15 marostegui@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74009 and previous config saved to /var/cache/conftool/dbconfig/20250303-201554-root.json
20:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74008 and previous config saved to /var/cache/conftool/dbconfig/20250303-200146-root.json
20:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74007 and previous config saved to /var/cache/conftool/dbconfig/20250303-200048-root.json
19:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74006 and previous config saved to /var/cache/conftool/dbconfig/20250303-194641-root.json
19:45 marostegui@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74005 and previous config saved to /var/cache/conftool/dbconfig/20250303-194543-root.json
19:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74004 and previous config saved to /var/cache/conftool/dbconfig/20250303-193742-root.json
19:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74003 and previous config saved to /var/cache/conftool/dbconfig/20250303-193136-root.json
19:30 marostegui@cumin1002: dbctl commit (dc=all): 'db2162 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74002 and previous config saved to /var/cache/conftool/dbconfig/20250303-193038-root.json
19:26 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: reforge1005*,relforge1006*,relforge1007* for ban hosts prior to revert - bking@cumin2002 - T387176
19:26 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: reforge1005*,relforge1006*,relforge1007* for ban hosts prior to revert - bking@cumin2002 - T387176
19:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74001 and previous config saved to /var/cache/conftool/dbconfig/20250303-192237-root.json
19:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1247 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P74000 and previous config saved to /var/cache/conftool/dbconfig/20250303-191513-root.json
19:08 swfrench-wmf: serving 10% of mw-api-int traffic on PHP 8.1 - T383845
19:07 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
19:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73999 and previous config saved to /var/cache/conftool/dbconfig/20250303-190732-root.json
19:07 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
19:07 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
19:07 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
19:06 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
19:06 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
19:05 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
19:05 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
19:05 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
19:05 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
19:03 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
19:02 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
19:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1247 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73998 and previous config saved to /var/cache/conftool/dbconfig/20250303-190007-root.json
18:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73997 and previous config saved to /var/cache/conftool/dbconfig/20250303-185227-root.json
18:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1247 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73996 and previous config saved to /var/cache/conftool/dbconfig/20250303-184501-root.json
18:44 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:44 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
18:44 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:43 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2155 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73995 and previous config saved to /var/cache/conftool/dbconfig/20250303-183721-root.json
18:33 swfrench@deploy2002: Finished scap sync-world: Backport for Enroll 100% of client sessions in PHP 8.1 (T383845) (duration: 11m 03s)
18:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1247 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73994 and previous config saved to /var/cache/conftool/dbconfig/20250303-182956-root.json
18:26 swfrench@deploy2002: swfrench: Continuing with sync
18:24 swfrench@deploy2002: swfrench: Backport for Enroll 100% of client sessions in PHP 8.1 (T383845) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:22 swfrench@deploy2002: Started scap sync-world: Backport for Enroll 100% of client sessions in PHP 8.1 (T383845)
18:17 swfrench-wmf: scaled mw-(api-ext|web) next deployments to 40% of main size - T383845
18:16 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
18:16 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-web: apply
18:15 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
18:15 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
18:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1247 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73993 and previous config saved to /var/cache/conftool/dbconfig/20250303-181451-root.json
18:13 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
18:13 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
18:12 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
18:12 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
18:03 elukey@cumin2002: START - Cookbook sre.hosts.provision for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
18:02 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
18:02 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
18:01 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
18:01 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
18:00 moritzm: repool maps2009 T387431
17:54 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
17:54 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
17:42 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
17:40 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
17:16 dancy@deploy2002: Installation of scap version "4.139.0" completed for 204 hosts
17:11 dancy@deploy2002: Installing scap version "4.139.0" for 204 host(s)
16:48 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
16:47 elukey@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:46 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest2001.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
16:43 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
16:42 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
16:42 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
16:38 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
16:38 ottomata: deploying eventgate-logging-external to ACTUALLY bump to node20 - T383814
16:37 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
16:34 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1248.eqiad.wmnet onto db1252.eqiad.wmnet
16:26 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2167 gradually with 4 steps - Cloned db2166 to db2167
16:20 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db2166 gradually with 4 steps - Cloned db2166 to db2167
16:18 moritzm: depool maps2009 T387431
16:10 swfrench-wmf: finished shellbox-media PHP 8.1 pilot - T377038
16:10 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
16:10 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
16:10 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
16:10 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
16:01 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2155.codfw.wmnet with reason: Index rebuild
16:01 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1247.eqiad.wmnet with reason: Index rebuild
16:01 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2155.codfw.wmnet
16:00 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1247.eqiad.wmnet
15:58 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1252 gradually with 4 steps - Cloned db124 to db1252
15:58 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1252 gradually with 4 steps - Cloned db124 to db1252
15:58 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1252 gradually with 4 steps - Cloned db124 to db1252
15:58 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1252 gradually with 4 steps - Cloned db124 to db1252
15:55 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2155.codfw.wmnet
15:54 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1247.eqiad.wmnet
15:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1247 db2155', diff saved to https://phabricator.wikimedia.org/P73985 and previous config saved to /var/cache/conftool/dbconfig/20250303-155447-marostegui.json
15:53 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2155,2187].codfw.wmnet with reason: Rebuilding indexes
15:51 swfrench-wmf: started shellbox-media PHP 8.1 pilot with increased logging - T377038
15:50 swfrench@deploy2002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
15:50 swfrench@deploy2002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
15:47 swfrench@deploy2002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
15:47 swfrench@deploy2002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
15:42 otto@deploy2002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
15:41 otto@deploy2002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
15:41 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.pool (exit_code=0) db1248 gradually with 4 steps - Cloning db1252.eqiad.wmnet completed
15:40 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2167 gradually with 4 steps - Cloned db2166 to db2167
15:38 otto@deploy2002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
15:37 otto@deploy2002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
15:36 ottomata: deploying eventgate-logging-external to bump to node20 - T383814
15:36 otto@deploy2002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
15:36 otto@deploy2002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
15:35 fceratto@cumin1002: START - Cookbook sre.mysql.pool db2166 gradually with 4 steps - Cloned db2166 to db2167
15:24 ihurbain: UTC afternoon deploys done
15:18 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host puppetserver2004.codfw.wmnet with OS bookworm
15:11 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1248 gradually with 4 steps - Cloning db1252.eqiad.wmnet completed
15:11 marostegui@cumin1002: dbctl commit (dc=all): 'db2206 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73978 and previous config saved to /var/cache/conftool/dbconfig/20250303-151113-root.json
15:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1249 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73977 and previous config saved to /var/cache/conftool/dbconfig/20250303-151107-root.json
15:11 fceratto@cumin1002: dbctl commit (dc=all): 'Pooling in after cloning to db1252 T385141', diff saved to https://phabricator.wikimedia.org/P73976 and previous config saved to /var/cache/conftool/dbconfig/20250303-151103-fceratto.json
15:09 ihurbain@deploy2002: Finished scap sync-world: Backport for Remove $wmgUseGraphWithJsonNamespace (T124748) (duration: 11m 55s)
15:03 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on puppetserver2004.codfw.wmnet with reason: host reimage
15:02 ihurbain@deploy2002: matmarex, ihurbain: Continuing with sync
15:00 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.pool (exit_code=99) db1248 gradually with 4 steps - Cloning db1252.eqiad.wmnet completed
15:00 fceratto@cumin1002: START - Cookbook sre.mysql.pool db1248 gradually with 4 steps - Cloning db1252.eqiad.wmnet completed
14:59 ihurbain@deploy2002: matmarex, ihurbain: Backport for Remove $wmgUseGraphWithJsonNamespace (T124748) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:57 ihurbain@deploy2002: Started scap sync-world: Backport for Remove $wmgUseGraphWithJsonNamespace (T124748)
14:56 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on puppetserver2004.codfw.wmnet with reason: host reimage
14:54 ihurbain@deploy2002: Finished scap sync-world: Backport for Change license for Russian Wikinews to CC-BY-4.0 (T387279), Revert "Turn on Parsoid fragment support everywhere" (T387608) (duration: 11m 39s)
14:52 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
14:52 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
14:49 marostegui@cumin1002: dbctl commit (dc=all): 'db2206 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73974 and previous config saved to /var/cache/conftool/dbconfig/20250303-144930-root.json
14:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1249 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73973 and previous config saved to /var/cache/conftool/dbconfig/20250303-144805-root.json
14:48 ihurbain@deploy2002: matmarex, ssastry, ihurbain: Continuing with sync
14:46 ihurbain@deploy2002: matmarex, ssastry, ihurbain: Backport for Change license for Russian Wikinews to CC-BY-4.0 (T387279), Revert "Turn on Parsoid fragment support everywhere" (T387608) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:44 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host elastic1066.eqiad.wmnet
14:44 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host puppetserver2004.codfw.wmnet with OS bookworm
14:42 ihurbain@deploy2002: Started scap sync-world: Backport for Change license for Russian Wikinews to CC-BY-4.0 (T387279), Revert "Turn on Parsoid fragment support everywhere" (T387608)
14:42 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host puppetserver2004.mgmt.codfw.wmnet with chassis set policy GRACEFUL_RESTART
14:22 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Enable fixed Wikibase RDF everywhere (T384344)
14:21 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1066* for ban elastic1066 to hopefully stop rejections - bking@cumin2002 - T387176
14:21 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1066* for ban elastic1066 to hopefully stop rejections - bking@cumin2002 - T387176
14:21 lucaswerkmeister-wmde@deploy2002: Finished scap sync-world: Backport for Set Transwiki namespace on zhwikivoyage and zhwikiversity (T387055) (duration: 14m 02s)
14:19 marostegui@cumin1002: dbctl commit (dc=all): 'db2206 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73970 and previous config saved to /var/cache/conftool/dbconfig/20250303-141919-root.json
14:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1249 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73969 and previous config saved to /var/cache/conftool/dbconfig/20250303-141754-root.json
14:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2162.codfw.wmnet with reason: Index rebuild
14:14 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1203.eqiad.wmnet with reason: Index rebuild
14:14 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2162.codfw.wmnet
14:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2164 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73968 and previous config saved to /var/cache/conftool/dbconfig/20250303-141350-root.json
14:13 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1203.eqiad.wmnet
14:13 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, sdhehua: Continuing with sync
14:12 lucaswerkmeister-wmde@deploy2002: lucaswerkmeister-wmde, sdhehua: Backport for Set Transwiki namespace on zhwikivoyage and zhwikiversity (T387055) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:07 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Rebuilding indexes
14:07 lucaswerkmeister-wmde@deploy2002: Started scap sync-world: Backport for Set Transwiki namespace on zhwikivoyage and zhwikiversity (T387055)
14:07 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2162.codfw.wmnet
14:06 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1203.eqiad.wmnet
14:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2162 db1203', diff saved to https://phabricator.wikimedia.org/P73966 and previous config saved to /var/cache/conftool/dbconfig/20250303-140638-marostegui.json
14:04 marostegui@cumin1002: dbctl commit (dc=all): 'db2206 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73965 and previous config saved to /var/cache/conftool/dbconfig/20250303-140414-root.json
14:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1172 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73964 and previous config saved to /var/cache/conftool/dbconfig/20250303-140309-root.json
14:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1249 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73963 and previous config saved to /var/cache/conftool/dbconfig/20250303-140249-root.json
13:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2164 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73962 and previous config saved to /var/cache/conftool/dbconfig/20250303-135845-root.json
13:56 kevinbazira@deploy2002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
13:55 kevinbazira@deploy2002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-models' for release 'main' .
13:54 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2166.codfw.wmnet onto db2167.codfw.wmnet
13:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1172 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73961 and previous config saved to /var/cache/conftool/dbconfig/20250303-134804-root.json
13:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2164 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73960 and previous config saved to /var/cache/conftool/dbconfig/20250303-134340-root.json
13:37 cgoubert@deploy2002: Finished scap sync-world: Deploying 1116800 1122563 (duration: 02m 15s)
13:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1220 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73959 and previous config saved to /var/cache/conftool/dbconfig/20250303-133713-root.json
13:35 cgoubert@deploy2002: Started scap sync-world: Deploying 1116800 1122563
13:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1172 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73958 and previous config saved to /var/cache/conftool/dbconfig/20250303-133258-root.json
13:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2164 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73957 and previous config saved to /var/cache/conftool/dbconfig/20250303-132834-root.json
13:24 moritzm: failover Ganeti master in eqiad to ganeti1048 T382507
13:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1220 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73956 and previous config saved to /var/cache/conftool/dbconfig/20250303-132207-root.json
13:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1172 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73955 and previous config saved to /var/cache/conftool/dbconfig/20250303-131752-root.json
13:17 tgr_: undid arbcom_ruwiki block of CirrusSearch_Streaming_Updater via blockUser.php
13:15 cgoubert@deploy2002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
13:14 cgoubert@deploy2002: helmfile [codfw] START helmfile.d/admin 'apply'.
13:14 cgoubert@deploy2002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
13:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2164 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73954 and previous config saved to /var/cache/conftool/dbconfig/20250303-131329-root.json
13:12 cgoubert@deploy2002: helmfile [eqiad] START helmfile.d/admin 'apply'.
13:12 cgoubert@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
13:10 cgoubert@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
13:10 cgoubert@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
13:07 cgoubert@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
13:07 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
13:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1220 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73953 and previous config saved to /var/cache/conftool/dbconfig/20250303-130702-root.json
13:06 cgoubert@deploy2002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
13:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1172 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73952 and previous config saved to /var/cache/conftool/dbconfig/20250303-130247-root.json
13:01 cgoubert@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
13:01 cgoubert@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
12:58 cgoubert@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
12:58 cgoubert@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
12:56 cgoubert@deploy2002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
12:55 cgoubert@deploy2002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
12:55 cgoubert@deploy2002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
12:53 cgoubert@deploy2002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
12:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1220 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73951 and previous config saved to /var/cache/conftool/dbconfig/20250303-125156-root.json
12:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1220 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73950 and previous config saved to /var/cache/conftool/dbconfig/20250303-123651-root.json
12:33 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1220.eqiad.wmnet
12:29 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1220.eqiad.wmnet
12:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P73949 and previous config saved to /var/cache/conftool/dbconfig/20250303-122609-root.json
12:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1220 T387557', diff saved to https://phabricator.wikimedia.org/P73948 and previous config saved to /var/cache/conftool/dbconfig/20250303-122437-marostegui.json
12:23 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1237 to x1 primary T387557', diff saved to https://phabricator.wikimedia.org/P73947 and previous config saved to /var/cache/conftool/dbconfig/20250303-122304-root.json
12:22 marostegui: Starting x1 eqiad failover from db1220 to db1237 - T387557
12:17 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: restbase::production@eqiad
12:17 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
12:16 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 13 hosts with reason: Primary switchover x1 T387557
12:16 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1237 with weight 0 T387557', diff saved to https://phabricator.wikimedia.org/P73946 and previous config saved to /var/cache/conftool/dbconfig/20250303-121623-root.json
12:16 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
12:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P73945 and previous config saved to /var/cache/conftool/dbconfig/20250303-121104-root.json
12:09 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: restbase::production@eqiad
12:08 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: wmcs::openstack::eqiad1::cloudweb@eqiad
12:08 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
12:07 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
12:03 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: wmcs::openstack::eqiad1::cloudweb@eqiad
12:02 jgiannelos@deploy2002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
12:01 jgiannelos@deploy2002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
12:01 jgiannelos@deploy2002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
12:00 jgiannelos@deploy2002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
12:00 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: restbase::production@codfw
12:00 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
12:00 jgiannelos@deploy2002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
11:59 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
11:59 jayme: Imported helmfile 0.171.0-2 and helm-diff 3.10.0-1 to bullseye-wikimedia and bookworm-wikimedia - T341984 T387376
11:58 jgiannelos@deploy2002: helmfile [staging] START helmfile.d/services/mobileapps: apply
11:57 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
11:56 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
11:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P73944 and previous config saved to /var/cache/conftool/dbconfig/20250303-115559-root.json
11:52 kevinbazira@deploy2002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-models' for release 'main' .
11:52 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: restbase::production@codfw
11:49 jayme@deploy1003: helmfile [staging] DONE helmfile.d/services/ipoid: apply
11:49 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2206.codfw.wmnet with reason: Index rebuild
11:48 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1249.eqiad.wmnet with reason: Index rebuild
11:48 jayme@deploy1003: helmfile [staging] START helmfile.d/services/ipoid: apply
11:48 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db2166.codfw.wmnet onto db2167.codfw.wmnet
11:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73943 and previous config saved to /var/cache/conftool/dbconfig/20250303-114500-root.json
11:43 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2166.codfw.wmnet onto db2167.codfw.wmnet
11:42 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db2166.codfw.wmnet onto db2167.codfw.wmnet
11:42 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: mediawiki::jobrunner@eqiad
11:42 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
11:41 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
11:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P73942 and previous config saved to /var/cache/conftool/dbconfig/20250303-114054-root.json
11:38 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1249.eqiad.wmnet
11:38 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2206.codfw.wmnet
11:37 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: mediawiki::jobrunner@eqiad
11:32 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1249.eqiad.wmnet
11:32 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2206.codfw.wmnet
11:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2206 db1249', diff saved to https://phabricator.wikimedia.org/P73941 and previous config saved to /var/cache/conftool/dbconfig/20250303-113225-root.json
11:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73940 and previous config saved to /var/cache/conftool/dbconfig/20250303-112954-root.json
11:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P73939 and previous config saved to /var/cache/conftool/dbconfig/20250303-112548-root.json
11:18 fceratto@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db2166.codfw.wmnet onto db2167.codfw.wmnet
11:17 elukey@deploy2002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'sync'.
11:17 elukey@deploy2002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'sync'.
11:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73938 and previous config saved to /var/cache/conftool/dbconfig/20250303-111448-root.json
11:12 elukey@deploy2002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'sync'.
11:11 elukey@deploy2002: helmfile [ml-serve-codfw] START helmfile.d/admin 'sync'.
11:11 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: mediawiki::jobrunner@codfw
11:11 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
11:10 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
11:09 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
11:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2210 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73937 and previous config saved to /var/cache/conftool/dbconfig/20250303-110830-root.json
11:05 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
11:01 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: mediawiki::jobrunner@codfw
10:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73936 and previous config saved to /var/cache/conftool/dbconfig/20250303-105943-root.json
10:58 fceratto@cumin1002: END (PASS) - Cookbook sre.mysql.depool (exit_code=0) db2166 - catching up replication
10:58 fceratto@cumin1002: START - Cookbook sre.mysql.depool db2166 - catching up replication
10:54 elukey@deploy2002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
10:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2210 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73935 and previous config saved to /var/cache/conftool/dbconfig/20250303-105325-root.json
10:52 elukey@deploy2002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
10:51 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
10:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73934 and previous config saved to /var/cache/conftool/dbconfig/20250303-104438-root.json
10:40 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.cf (exit_code=0)
10:40 ayounsi@cumin1002: START - Cookbook sre.network.cf
10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2210 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73933 and previous config saved to /var/cache/conftool/dbconfig/20250303-103820-root.json
10:34 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db1248.eqiad.wmnet onto db1252.eqiad.wmnet
10:28 fceratto@cumin1002: END (ERROR) - Cookbook sre.mysql.clone (exit_code=97) of db1248.eqiad.wmnet onto db1252.eqiad.wmnet
10:26 hashar: Upgraded scap to 4.139.0 # T303828
10:26 hashar@deploy2002: Installation of scap version "4.139.0" completed for 204 hosts
10:21 hashar@deploy2002: Installing scap version "4.139.0" for 204 host(s)
10:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2210 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73931 and previous config saved to /var/cache/conftool/dbconfig/20250303-102109-root.json
10:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2210 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73930 and previous config saved to /var/cache/conftool/dbconfig/20250303-100603-root.json
09:46 mvernon@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on ms-be1080.eqiad.wmnet with reason: disk failed
09:45 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: docker_registry_ha::registry@codfw
09:45 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
09:44 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1030.eqiad.wmnet to cluster eqiad and group A
09:44 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-codfw or A:lvs-secondary-codfw) and A:bullseye and A:lvs
09:43 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1030.eqiad.wmnet to cluster eqiad and group A
09:43 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1027.eqiad.wmnet to cluster eqiad and group A
09:43 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1027.eqiad.wmnet to cluster eqiad and group A
09:38 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: docker_registry_ha::registry@codfw
09:28 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
09:11 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.migrate-service-ipip (exit_code=0) for role: docker_registry_ha::registry@eqiad
09:11 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
09:10 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on (A:lvs-low-traffic-eqiad or A:lvs-secondary-eqiad) and A:bullseye and A:lvs
09:07 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.migrate-service-ipip for role: docker_registry_ha::registry@eqiad
08:55 marostegui@cumin1002: END (FAIL) - Cookbook sre.mysql.clone (exit_code=99) of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
08:55 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1233.eqiad.wmnet onto db1246.eqiad.wmnet
08:30 fceratto@cumin1002: START - Cookbook sre.mysql.clone of db2166.codfw.wmnet onto db2167.codfw.wmnet
08:29 kartik@deploy2002: Finished scap sync-world: Backport for Enable CX unified dashboard on sqwiki (T386719) (duration: 25m 32s)
08:20 kartik@deploy2002: sbisson, kartik: Continuing with sync
08:16 kartik@deploy2002: sbisson, kartik: Backport for Enable CX unified dashboard on sqwiki (T386719) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
08:08 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1190.eqiad.wmnet with reason: Index rebuild
08:04 kartik@deploy2002: Started scap sync-world: Backport for Enable CX unified dashboard on sqwiki (T386719)
07:55 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1190.eqiad.wmnet
07:53 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2210.codfw.wmnet with reason: Index rebuild
07:53 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2210.codfw.wmnet
07:52 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2164.codfw.wmnet with reason: Index rebuild
07:52 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1172.eqiad.wmnet with reason: Index rebuild
07:52 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2164.codfw.wmnet
07:52 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1172.eqiad.wmnet
07:51 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.addnode (exit_code=99) for new host ganeti1027.eqiad.wmnet to cluster eqiad and group C
07:50 jmm@cumin2002: START - Cookbook sre.ganeti.addnode for new host ganeti1027.eqiad.wmnet to cluster eqiad and group C
07:48 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2210.codfw.wmnet
07:48 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1190.eqiad.wmnet
07:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2210 db1190', diff saved to https://phabricator.wikimedia.org/P73926 and previous config saved to /var/cache/conftool/dbconfig/20250303-074804-marostegui.json
07:46 Ammar: T387658 Ran mwscript-k8s --comment="T387658" -f -- extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=bawiki --logwiki=metawiki 'Əkrəm Cəfər' 'Əkrəm'
07:45 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2164.codfw.wmnet
07:45 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1172.eqiad.wmnet
07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1172 db2164', diff saved to https://phabricator.wikimedia.org/P73925 and previous config saved to /var/cache/conftool/dbconfig/20250303-074525-marostegui.json
07:45 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db[2164,2186].codfw.wmnet,db1172.eqiad.wmnet with reason: Rebuilding indexes
07:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1233', diff saved to https://phabricator.wikimedia.org/P73923 and previous config saved to /var/cache/conftool/dbconfig/20250303-073358-root.json
07:18 moritzm: installing Linux 6.1.128 on Bookworm hosts

2025-03-02

22:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73921 and previous config saved to /var/cache/conftool/dbconfig/20250302-220727-root.json
21:52 marostegui@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73920 and previous config saved to /var/cache/conftool/dbconfig/20250302-215221-root.json
21:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73919 and previous config saved to /var/cache/conftool/dbconfig/20250302-213716-root.json
21:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73918 and previous config saved to /var/cache/conftool/dbconfig/20250302-212211-root.json
21:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1248 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73917 and previous config saved to /var/cache/conftool/dbconfig/20250302-210705-root.json
20:52 mvernon@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1246.eqiad.wmnet with reason: crashed
20:51 mvernon@cumin1002: dbctl commit (dc=all): 'Depool db1246', diff saved to https://phabricator.wikimedia.org/P73916 and previous config saved to /var/cache/conftool/dbconfig/20250302-205123-mvernon.json
16:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2163 (re)pooling @ 100%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73915 and previous config saved to /var/cache/conftool/dbconfig/20250302-162421-root.json
16:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2163 (re)pooling @ 75%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73914 and previous config saved to /var/cache/conftool/dbconfig/20250302-160915-root.json
15:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2163 (re)pooling @ 50%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73913 and previous config saved to /var/cache/conftool/dbconfig/20250302-155410-root.json
15:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2163 (re)pooling @ 25%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73912 and previous config saved to /var/cache/conftool/dbconfig/20250302-153904-root.json
15:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2163 (re)pooling @ 10%: Repooling after rebuild index', diff saved to https://phabricator.wikimedia.org/P73911 and previous config saved to /var/cache/conftool/dbconfig/20250302-152359-root.json
10:17 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1248.eqiad.wmnet with reason: Index rebuild
10:17 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db1248.eqiad.wmnet
10:11 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db1248.eqiad.wmnet
10:11 root@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Index rebuild
10:08 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2163.codfw.wmnet
10:04 marostegui@cumin1002: DONE (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2163.codfw.wmnet with reason: Setup
10:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2167', diff saved to https://phabricator.wikimedia.org/P73910 and previous config saved to /var/cache/conftool/dbconfig/20250302-100324-marostegui.json
10:00 marostegui@cumin1002: START - Cookbook sre.mysql.upgrade for db2163.codfw.wmnet
09:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2163', diff saved to https://phabricator.wikimedia.org/P73909 and previous config saved to /var/cache/conftool/dbconfig/20250302-095839-root.json
06:04 _joe_: started replication on db2167
05:44 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
05:44 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
00:32 reedy@deploy2002: Finished scap sync-world: Backport for UserGroupsHookHandler: Return early if performer is false (T387523) (duration: 10m 33s)
00:25 reedy@deploy2002: reedy, dreamyjazz: Continuing with sync
00:25 reedy@deploy2002: reedy, dreamyjazz: Backport for UserGroupsHookHandler: Return early if performer is false (T387523) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
00:21 reedy@deploy2002: Started scap sync-world: Backport for UserGroupsHookHandler: Return early if performer is false (T387523)

2025-03-01

23:59 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
23:59 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
18:37 dcausse: disabling the saneitizer on the cirrus streaming updater for consumer-search@eqiad & consumer-cloudelastic (pre-emptive hotfix for T387625)
18:37 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:37 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:36 dcausse@deploy2002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:35 dcausse@deploy2002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
18:30 dcausse: disabling the saneitizer on the cirrus streaming updater in codfw (hotfix for T387625)
18:29 dcausse@deploy2002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
18:29 dcausse@deploy2002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
17:47 godog: bounce mtail on centrallog2002
17:22 tchin@deploy2002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
17:22 tchin@deploy2002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mw-content-history-reconcile-enrich: apply
14:00 andrewbogott: rebooting wikitech-static; the entire server was intermittently locking up

Other archives

2000s

Archive 1: 2004 Jun - 2004 Sep
Archive 2: 2004 Oct - 2004 Nov
Archive 3: 2004 Dec - 2005 Mar
Archive 4: 2005 Apr - 2005 Jul
Archive 5: 2005 Aug - 2005 Oct, with revision history 2004-06-23 to 2005-11-25
Archive 6: 2005 Nov - 2006 Feb
Archive 7: 2006 Mar - 2006 Jun
Archive 8: 2006 Jul - 2006 Sep
Archive 9: 2006 Oct - 2007 Jan, with revision history 2005-11-25 to 2007-02-21
Archive 10: 2007 Feb - 2007 Jun
Archive 11: 2007 Jul - 2007 Dec
Archive 12: 2008 Jan - 2008 Jul
Archive 12a: 2008 Aug
Archive 12b: 2008 Sep
Archive 13: 2008 Oct - 2009 Jun
Archive 14: 2009 Jun - 2009 Dec
