Server Admin Log/Archive 83

2024-07-31

22:23 pt1979@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
22:19 pt1979@cumin1002: START - Cookbook sre.dns.netbox
22:17 pt1979@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
22:15 pt1979@cumin1002: START - Cookbook sre.dns.netbox
22:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host alert2002.mgmt.codfw.wmnet with reboot policy FORCED
22:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host alert2002.mgmt.codfw.wmnet with reboot policy FORCED
21:52 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:50 cmooney@cumin1002: START - Cookbook sre.dns.netbox
21:28 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1256.mgmt.eqiad.wmnet with reboot policy FORCED
21:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1256.mgmt.eqiad.wmnet with reboot policy FORCED
21:27 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1256.mgmt.eqiad.wmnet with reboot policy FORCED
21:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1256.mgmt.eqiad.wmnet with reboot policy FORCED
21:17 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@82674dc]: deploy airflow analytics DAG hotfix T368756 (duration: 01m 05s)
21:16 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@82674dc]: deploy airflow analytics DAG hotfix T368756
21:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp7015.magru.wmnet with reason: T371554
21:10 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp7015.magru.wmnet with reason: T371554
21:09 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:06 jclark@cumin1002: START - Cookbook sre.dns.netbox
21:04 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:02 jclark@cumin1002: START - Cookbook sre.dns.netbox
20:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1257.mgmt.eqiad.wmnet with reboot policy FORCED
20:55 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1256.mgmt.eqiad.wmnet with reboot policy FORCED
20:55 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1255.mgmt.eqiad.wmnet with reboot policy FORCED
20:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1254.mgmt.eqiad.wmnet with reboot policy FORCED
20:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1253.mgmt.eqiad.wmnet with reboot policy FORCED
20:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1259.mgmt.eqiad.wmnet with reboot policy FORCED
20:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1251.mgmt.eqiad.wmnet with reboot policy FORCED
20:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1252.mgmt.eqiad.wmnet with reboot policy FORCED
20:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1250.mgmt.eqiad.wmnet with reboot policy FORCED
20:52 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1258.mgmt.eqiad.wmnet with reboot policy FORCED
20:49 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1258.mgmt.eqiad.wmnet with reboot policy FORCED
20:49 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1258.mgmt.eqiad.wmnet with reboot policy FORCED
20:47 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp7015.magru.wmnet
20:45 cjming: end of UTC late backport window
20:44 cjming@deploy1003: Finished scap: Backport for beta: Enable NetworkSession extension (T355267) (duration: 07m 47s)
20:40 cjming@deploy1003: ebernhardson, cjming: Continuing with sync
20:39 cjming@deploy1003: ebernhardson, cjming: Backport for beta: Enable NetworkSession extension (T355267) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:37 cjming@deploy1003: Started scap sync-world: Backport for beta: Enable NetworkSession extension (T355267)
20:34 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1257.mgmt.eqiad.wmnet with reboot policy FORCED
20:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1257.mgmt.eqiad.wmnet with reboot policy FORCED
20:31 cjming@deploy1003: Finished scap: Backport for [arwiki] Set noindex for namespace user (T371470) (duration: 17m 28s)
20:27 cjming@deploy1003: cjming, gergesshamon: Continuing with sync
20:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1258.mgmt.eqiad.wmnet with reboot policy FORCED
20:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1257.mgmt.eqiad.wmnet with reboot policy FORCED
20:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1256.mgmt.eqiad.wmnet with reboot policy FORCED
20:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1259.mgmt.eqiad.wmnet with reboot policy FORCED
20:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1255.mgmt.eqiad.wmnet with reboot policy FORCED
20:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1254.mgmt.eqiad.wmnet with reboot policy FORCED
20:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1253.mgmt.eqiad.wmnet with reboot policy FORCED
20:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1252.mgmt.eqiad.wmnet with reboot policy FORCED
20:23 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1251.mgmt.eqiad.wmnet with reboot policy FORCED
20:23 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1250.mgmt.eqiad.wmnet with reboot policy FORCED
20:19 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:17 jclark@cumin1002: START - Cookbook sre.dns.netbox
20:16 cjming@deploy1003: cjming, gergesshamon: Backport for [arwiki] Set noindex for namespace user (T371470) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:14 jclark@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
20:14 cjming@deploy1003: Started scap sync-world: Backport for [arwiki] Set noindex for namespace user (T371470)
20:12 cjming@deploy1003: Finished scap: Backport for [wmf-config] Remove trailing slash in SSO domain (duration: 08m 04s)
20:09 jclark@cumin1002: START - Cookbook sre.dns.netbox
20:07 cjming@deploy1003: cjming, d3r1ck01: Continuing with sync
20:06 cjming@deploy1003: cjming, d3r1ck01: Backport for [wmf-config] Remove trailing slash in SSO domain synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:06 cstone: payments-wiki upgraded from c4c43c74 to e8d1c5ad
20:04 cjming@deploy1003: Started scap sync-world: Backport for [wmf-config] Remove trailing slash in SSO domain
20:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: old netbox
20:02 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: old netbox
19:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host alert2002.mgmt.codfw.wmnet with reboot policy FORCED
19:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host alert2002.mgmt.codfw.wmnet with reboot policy FORCED
19:23 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:20 jhancock@cumin2002: START - Cookbook sre.dns.netbox
19:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
19:17 jhancock@cumin2002: START - Cookbook sre.dns.netbox
19:13 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@ea93090]: deploy latest DAGs to analytics Airflow instance. (duration: 01m 30s)
19:11 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@ea93090]: deploy latest DAGs to analytics Airflow instance.
18:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
18:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
18:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
18:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
18:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:24 jhancock@cumin2002: START - Cookbook sre.dns.netbox
18:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.16 refs T366961
18:17 jhancock@cumin2002: START - Cookbook sre.dns.netbox
18:09 brennen: 1.43.0-wmf.16 train (T366961): no current blockers, logs clean, rolling to group1.
17:52 ejegg: payments-wiki upgraded from 91624a2e to c4c43c74
17:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T367856)', diff saved to https://phabricator.wikimedia.org/P67171 and previous config saved to /var/cache/conftool/dbconfig/20240731-171255-marostegui.json
17:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
17:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
17:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T367856)', diff saved to https://phabricator.wikimedia.org/P67170 and previous config saved to /var/cache/conftool/dbconfig/20240731-171233-marostegui.json
16:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P67169 and previous config saved to /var/cache/conftool/dbconfig/20240731-165726-marostegui.json
16:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P67168 and previous config saved to /var/cache/conftool/dbconfig/20240731-164219-marostegui.json
16:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T367856)', diff saved to https://phabricator.wikimedia.org/P67167 and previous config saved to /var/cache/conftool/dbconfig/20240731-162712-marostegui.json
16:17 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2228.mgmt.codfw.wmnet with reboot policy GRACEFUL
16:08 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db2228.mgmt.codfw.wmnet with reboot policy GRACEFUL
16:08 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2227.mgmt.codfw.wmnet with reboot policy GRACEFUL
16:07 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:04 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
15:58 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db2227.mgmt.codfw.wmnet with reboot policy GRACEFUL
15:56 ayounsi@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
15:55 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
15:49 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67166 and previous config saved to /var/cache/conftool/dbconfig/20240731-154912-root.json
15:40 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2226.mgmt.codfw.wmnet with reboot policy GRACEFUL
15:34 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67165 and previous config saved to /var/cache/conftool/dbconfig/20240731-153407-root.json
15:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: CR1058609 - ayounsi@cumin1002
15:30 jgiannelos@deploy1003: Finished deploy [restbase/deploy@59a40a0]: (no justification provided) (duration: 19m 22s)
15:28 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: CR1058609 - ayounsi@cumin1002
15:28 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db2226.mgmt.codfw.wmnet with reboot policy GRACEFUL
15:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2225.mgmt.codfw.wmnet with reboot policy GRACEFUL
15:19 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db2225.mgmt.codfw.wmnet with reboot policy GRACEFUL
15:19 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67164 and previous config saved to /var/cache/conftool/dbconfig/20240731-151901-root.json
15:17 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2224.mgmt.codfw.wmnet with reboot policy GRACEFUL
15:11 jgiannelos@deploy1003: Started deploy [restbase/deploy@59a40a0]: (no justification provided)
15:04 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db2224.mgmt.codfw.wmnet with reboot policy GRACEFUL
15:03 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67163 and previous config saved to /var/cache/conftool/dbconfig/20240731-150356-root.json
14:48 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67162 and previous config saved to /var/cache/conftool/dbconfig/20240731-144850-root.json
14:45 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2223.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:33 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 5%: Repooling', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240731-143340-root.json
14:33 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db2223.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:21 sukhe: [done] upgrade cp4044 to ATS 9.2.5: T339134
14:21 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4044*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-
14:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2148', diff saved to https://phabricator.wikimedia.org/P67160 and previous config saved to /var/cache/conftool/dbconfig/20240731-141959-marostegui.json
14:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2148.codfw.wmnet with reason: Maintenance
14:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2148.codfw.wmnet with reason: Maintenance
14:17 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4044*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-drmrs or A:cp-text_
13:54 Lucas_WMDE: UTC afternoon backport+config window done
13:53 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap: Backport for EventStreamConfig - fix for private wiki streams (T346046 T371433) (duration: 11m 31s)
13:49 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, otto: Continuing with sync
13:49 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s6
13:46 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:45 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
13:44 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, otto: Backport for EventStreamConfig - fix for private wiki streams (T346046 T371433) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:42 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for EventStreamConfig - fix for private wiki streams (T346046 T371433)
13:40 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org,service=recdns [reason: [done] pdns-rec upgrade]
13:39 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org,service=recdns [reason: pdns-rec upgrade]
13:39 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap: Backport for TranslatablePage: Store source page ids as string in WAN cache (T366455), TranslatablePage: Store source page ids as string in WAN cache (T366455) (duration: 12m 34s)
13:39 sukhe: upgrade pdns-recursor to 4.8.8 from 4.8.7 on dns6001
13:34 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, abi: Continuing with sync
13:28 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, abi: Backport for TranslatablePage: Store source page ids as string in WAN cache (T366455), TranslatablePage: Store source page ids as string in WAN cache (T366455) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:27 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
13:26 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for TranslatablePage: Store source page ids as string in WAN cache (T366455), TranslatablePage: Store source page ids as string in WAN cache (T366455)
13:25 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap: Backport for Fix tracking parameter casing (T370045) (duration: 12m 30s)
13:25 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.7.0 - ayounsi@cumin1002
13:24 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
13:21 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
13:20 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, joelyrookewmde: Continuing with sync
13:19 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.7.0 - ayounsi@cumin1002
13:18 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
13:16 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, joelyrookewmde: Backport for Fix tracking parameter casing (T370045) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:13 fabfur: running `sudo cumin -b 1 -s300 A:cp-ulsfo 'depool-cdn && sleep 30 && enable-puppet "T370741" && run-puppet-agent && pool-cdn'` (T370741)
13:12 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for Fix tracking parameter casing (T370045)
12:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4044.ulsfo.wmnet [reason: pooling after cookbook depooled as puppet was disabled]
12:57 elukey: update debmonitor-server and python3-debmonitor to bookworm-wikimedia - T368744
12:54 sukhe@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=1) Rolling upgrade/restart of Apache Traffic Server on P{cp4044*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-
12:53 sukhe: upgrade cp4044 to ATS 9.2.5: T339134
12:53 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4044*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-drmrs or A:cp-text_
12:50 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
12:50 fabfur: repool cp4037, haproxy configuration modified to exclude benthos logging (T370741)
12:46 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
12:44 klausman@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
12:39 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
12:39 fabfur: temporarily depooling cp4037 to test removing all Benthos resources (T370741)
12:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.8 to future netbox prod - ayounsi@cumin1002 - T336275
12:33 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
12:30 fabfur: temporarily disabling puppet on cp-ulsfo to test removing benthos from cp4037 (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1057823) (T370741)
12:25 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.8 to future netbox prod - ayounsi@cumin1002 - T336275
12:22 klausman@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
12:12 dreamyjazz@deploy1003: Finished scap: Backport for Grant checkuser-temporary-account-no-preference to suppress group (T371364) (duration: 08m 57s)
12:11 Dreamy_Jazz: Running `mwscript extensions/MediaModeration/maintenance/updateMetrics.php --wiki=commonswiki --verbose`
12:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67159 and previous config saved to /var/cache/conftool/dbconfig/20240731-120844-root.json
12:07 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
12:07 dreamyjazz@deploy1003: dreamyjazz: Backport for Grant checkuser-temporary-account-no-preference to suppress group (T371364) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:06 akosiaris@cumin1002: conftool action : set/pooled=yes; selector: name=parse2001.codfw.wmnet
12:06 akosiaris@cumin1002: conftool action : set/weight=10; selector: name=parse2001.codfw.wmnet
12:03 dreamyjazz@deploy1003: Started scap sync-world: Backport for Grant checkuser-temporary-account-no-preference to suppress group (T371364)
11:55 klausman@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
11:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67158 and previous config saved to /var/cache/conftool/dbconfig/20240731-115338-root.json
11:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67156 and previous config saved to /var/cache/conftool/dbconfig/20240731-113833-root.json
11:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on dbstore1007.eqiad.wmnet with reason: Long schema change
11:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on dbstore1007.eqiad.wmnet with reason: Long schema change
11:25 akosiaris@cumin1002: conftool action : set/pooled=yes; selector: name=parse1001.eqiad.wmnet
11:25 akosiaris@cumin1002: conftool action : set/weight=10; selector: name=parse1001.eqiad.wmnet
11:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67155 and previous config saved to /var/cache/conftool/dbconfig/20240731-112327-root.json
11:11 urbanecm@deploy1003: Finished scap: Backport for EventStreamConfig: Re-enable mediawiki_eventbus on private wikis (T371433) (duration: 08m 02s)
11:11 claime: Removing /var/lib/puppet/server/ssl/ca/signed/docker-registry.discovery.wmnet.pem on puppetmaster1001
11:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67154 and previous config saved to /var/cache/conftool/dbconfig/20240731-110822-root.json
11:07 klausman@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
11:07 urbanecm@deploy1003: urbanecm: Continuing with sync
11:05 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse2001.codfw.wmnet with OS bullseye
11:05 urbanecm@deploy1003: urbanecm: Backport for EventStreamConfig: Re-enable mediawiki_eventbus on private wikis (T371433) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:03 urbanecm@deploy1003: Started scap sync-world: Backport for EventStreamConfig: Re-enable mediawiki_eventbus on private wikis (T371433)
11:01 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1001.eqiad.wmnet with OS bullseye
10:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67153 and previous config saved to /var/cache/conftool/dbconfig/20240731-105317-root.json
10:46 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2001.codfw.wmnet with reason: host reimage
10:46 dreamyjazz@deploy1003: Finished scap: Backport for Unblock CI (T371324), Unblock CI (T371324) (duration: 07m 29s)
10:43 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2001.codfw.wmnet with reason: host reimage
10:42 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1001.eqiad.wmnet with reason: host reimage
10:41 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
10:41 dreamyjazz@deploy1003: dreamyjazz: Backport for Unblock CI (T371324), Unblock CI (T371324) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:39 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1001.eqiad.wmnet with reason: host reimage
10:39 dreamyjazz@deploy1003: Started scap sync-world: Backport for Unblock CI (T371324), Unblock CI (T371324)
10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67152 and previous config saved to /var/cache/conftool/dbconfig/20240731-103811-root.json
10:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2218 T371462', diff saved to https://phabricator.wikimedia.org/P67151 and previous config saved to /var/cache/conftool/dbconfig/20240731-103704-marostegui.json
10:35 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2220 to s7 primary T371462', diff saved to https://phabricator.wikimedia.org/P67150 and previous config saved to /var/cache/conftool/dbconfig/20240731-103513-root.json
10:33 marostegui: Starting s7 codfw failover from db2218 to db2220 - T371462
10:26 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host parse2001.codfw.wmnet with OS bullseye
10:25 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host parse1001.eqiad.wmnet with OS bullseye
10:18 akosiaris: revoke docker-registry.discovery.wmnet old certificate from Puppet CA that would expire in a few days. It hasn't been in use since https://gerrit.wikimedia.org/r/c/operations/puppet/+/1018251
10:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T371462
10:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 T371462
10:14 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@6ef5a7a]: (no justification provided) (duration: 00m 30s)
10:13 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@6ef5a7a]: (no justification provided)
09:56 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2220 from API/vslow/dump T371462', diff saved to https://phabricator.wikimedia.org/P67149 and previous config saved to /var/cache/conftool/dbconfig/20240731-095640-root.json
09:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T371462
09:56 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2220 with weight 0 T371462', diff saved to https://phabricator.wikimedia.org/P67148 and previous config saved to /var/cache/conftool/dbconfig/20240731-095609-root.json
09:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 T371462
09:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repool db2220', diff saved to https://phabricator.wikimedia.org/P67147 and previous config saved to /var/cache/conftool/dbconfig/20240731-095545-marostegui.json
09:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67146 and previous config saved to /var/cache/conftool/dbconfig/20240731-095200-root.json
09:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67145 and previous config saved to /var/cache/conftool/dbconfig/20240731-095050-root.json
09:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67144 and previous config saved to /var/cache/conftool/dbconfig/20240731-093654-root.json
09:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67143 and previous config saved to /var/cache/conftool/dbconfig/20240731-093545-root.json
09:25 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s4
09:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67142 and previous config saved to /var/cache/conftool/dbconfig/20240731-092149-root.json
09:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67141 and previous config saved to /var/cache/conftool/dbconfig/20240731-092039-root.json
09:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2220.codfw.wmnet with reason: Maintenance
09:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2220.codfw.wmnet with reason: Maintenance
09:17 marostegui@cumin1002: dbctl commit (dc=all): 'Move db2121 to vslow T371361', diff saved to https://phabricator.wikimedia.org/P67140 and previous config saved to /var/cache/conftool/dbconfig/20240731-091706-root.json
09:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2220 T371361', diff saved to https://phabricator.wikimedia.org/P67139 and previous config saved to /var/cache/conftool/dbconfig/20240731-091450-root.json
09:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67138 and previous config saved to /var/cache/conftool/dbconfig/20240731-090643-root.json
08:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67137 and previous config saved to /var/cache/conftool/dbconfig/20240731-085138-root.json
08:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67136 and previous config saved to /var/cache/conftool/dbconfig/20240731-084705-root.json
08:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67135 and previous config saved to /var/cache/conftool/dbconfig/20240731-083633-root.json
08:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67134 and previous config saved to /var/cache/conftool/dbconfig/20240731-083159-root.json
08:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67133 and previous config saved to /var/cache/conftool/dbconfig/20240731-082127-root.json
08:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2205 T371455', diff saved to https://phabricator.wikimedia.org/P67132 and previous config saved to /var/cache/conftool/dbconfig/20240731-081801-root.json
08:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67131 and previous config saved to /var/cache/conftool/dbconfig/20240731-081654-root.json
08:16 marostegui: Starting s3 codfw failover from db2205 to db2209 - T371455
08:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Switchover s3
08:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Switchover s3
08:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67130 and previous config saved to /var/cache/conftool/dbconfig/20240731-080148-root.json
08:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67129 and previous config saved to /var/cache/conftool/dbconfig/20240731-080017-root.json
07:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2222.mgmt.codfw.wmnet with reboot policy GRACEFUL
07:49 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2222.mgmt.codfw.wmnet with reboot policy GRACEFUL
07:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67128 and previous config saved to /var/cache/conftool/dbconfig/20240731-074643-root.json
07:45 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67127 and previous config saved to /var/cache/conftool/dbconfig/20240731-074512-root.json
07:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'clear' for AS: 64049
07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'clear' for AS: 64049
07:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2221.mgmt.codfw.wmnet with reboot policy GRACEFUL
07:30 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67126 and previous config saved to /var/cache/conftool/dbconfig/20240731-073006-root.json
07:21 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2221.mgmt.codfw.wmnet with reboot policy GRACEFUL
07:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s3 T371455
07:16 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2209 with weight 0 T371455', diff saved to https://phabricator.wikimedia.org/P67125 and previous config saved to /var/cache/conftool/dbconfig/20240731-071645-root.json
07:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s3 T371455
07:15 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67124 and previous config saved to /var/cache/conftool/dbconfig/20240731-071500-root.json
07:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1179.mgmt.eqiad.wmnet with reboot policy GRACEFUL
07:01 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db1179.mgmt.eqiad.wmnet with reboot policy GRACEFUL
06:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67123 and previous config saved to /var/cache/conftool/dbconfig/20240731-065955-root.json
06:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67122 and previous config saved to /var/cache/conftool/dbconfig/20240731-065341-root.json
06:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67121 and previous config saved to /var/cache/conftool/dbconfig/20240731-065320-root.json
06:50 slyngs: Upgrading CAS to version 7.0
06:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1179.eqiad.wmnet with reason: Maintenance
06:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1179.eqiad.wmnet with reason: Maintenance
06:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1179 T371132', diff saved to https://phabricator.wikimedia.org/P67120 and previous config saved to /var/cache/conftool/dbconfig/20240731-064752-root.json
06:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67119 and previous config saved to /var/cache/conftool/dbconfig/20240731-064449-root.json
06:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67118 and previous config saved to /var/cache/conftool/dbconfig/20240731-063835-root.json
06:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67117 and previous config saved to /var/cache/conftool/dbconfig/20240731-063814-root.json
06:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67116 and previous config saved to /var/cache/conftool/dbconfig/20240731-062330-root.json
06:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67115 and previous config saved to /var/cache/conftool/dbconfig/20240731-062308-root.json
05:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67112 and previous config saved to /var/cache/conftool/dbconfig/20240731-055645-root.json
05:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67111 and previous config saved to /var/cache/conftool/dbconfig/20240731-055319-root.json
05:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67110 and previous config saved to /var/cache/conftool/dbconfig/20240731-055256-root.json
05:50 marostegui@cumin1002: dbctl commit (dc=all): 'Make db2127 vslow and remove it as candidate master T371361', diff saved to https://phabricator.wikimedia.org/P67109 and previous config saved to /var/cache/conftool/dbconfig/20240731-055004-marostegui.json
05:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2209.codfw.wmnet with reason: Change binlog format
05:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2209.codfw.wmnet with reason: Change binlog format
05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2209 T371361', diff saved to https://phabricator.wikimedia.org/P67108 and previous config saved to /var/cache/conftool/dbconfig/20240731-054653-root.json
05:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T367856)', diff saved to https://phabricator.wikimedia.org/P67107 and previous config saved to /var/cache/conftool/dbconfig/20240731-054414-marostegui.json
05:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
05:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
05:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T367856)', diff saved to https://phabricator.wikimedia.org/P67106 and previous config saved to /var/cache/conftool/dbconfig/20240731-054352-marostegui.json
05:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67105 and previous config saved to /var/cache/conftool/dbconfig/20240731-054140-root.json
05:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67104 and previous config saved to /var/cache/conftool/dbconfig/20240731-053813-root.json
05:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P67103 and previous config saved to /var/cache/conftool/dbconfig/20240731-052845-marostegui.json
05:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67102 and previous config saved to /var/cache/conftool/dbconfig/20240731-052634-root.json
05:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67101 and previous config saved to /var/cache/conftool/dbconfig/20240731-052308-root.json
05:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1209 T371368', diff saved to https://phabricator.wikimedia.org/P67100 and previous config saved to /var/cache/conftool/dbconfig/20240731-052216-marostegui.json
05:21 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1193 to s8 primary and set section read-write T371368', diff saved to https://phabricator.wikimedia.org/P67099 and previous config saved to /var/cache/conftool/dbconfig/20240731-052114-root.json
05:20 marostegui@cumin1002: dbctl commit (dc=all): 'Set s8 eqiad as read-only for maintenance - T371368', diff saved to https://phabricator.wikimedia.org/P67098 and previous config saved to /var/cache/conftool/dbconfig/20240731-052036-root.json
05:20 marostegui: Starting s8 eqiad failover from db1209 to db1193 - T371368
05:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P67097 and previous config saved to /var/cache/conftool/dbconfig/20240731-051339-marostegui.json
05:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67096 and previous config saved to /var/cache/conftool/dbconfig/20240731-051129-root.json
04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T367856)', diff saved to https://phabricator.wikimedia.org/P67095 and previous config saved to /var/cache/conftool/dbconfig/20240731-045832-marostegui.json
04:56 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1193 from API/vslow/dump T371368', diff saved to https://phabricator.wikimedia.org/P67094 and previous config saved to /var/cache/conftool/dbconfig/20240731-045649-root.json
04:56 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1193 with weight 0 T371368', diff saved to https://phabricator.wikimedia.org/P67093 and previous config saved to /var/cache/conftool/dbconfig/20240731-045631-root.json
04:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1173 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67092 and previous config saved to /var/cache/conftool/dbconfig/20240731-045623-root.json
04:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s8 T371368
04:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s8 T371368
04:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1173 T371365', diff saved to https://phabricator.wikimedia.org/P67091 and previous config saved to /var/cache/conftool/dbconfig/20240731-045158-marostegui.json
04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T371365', diff saved to https://phabricator.wikimedia.org/P67089 and previous config saved to /var/cache/conftool/dbconfig/20240731-044954-root.json
04:49 marostegui: Starting s6 eqiad failover from db1173 to db1201 - T371365
04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1201 from API/vslow/dump T371365', diff saved to https://phabricator.wikimedia.org/P67088 and previous config saved to /var/cache/conftool/dbconfig/20240731-043528-marostegui.json
04:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s6 T371365
04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1201 with weight 0 T371365', diff saved to https://phabricator.wikimedia.org/P67087 and previous config saved to /var/cache/conftool/dbconfig/20240731-043459-marostegui.json
04:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s6 T371365
02:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T367856)', diff saved to https://phabricator.wikimedia.org/P67086 and previous config saved to /var/cache/conftool/dbconfig/20240731-022920-marostegui.json
02:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
02:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
00:55 eileen: civicrm upgraded from 4d3d2720 to d1f1d7bd
00:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1248.eqiad.wmnet with OS bullseye
00:03 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:02 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"

2024-07-30

23:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1249.eqiad.wmnet with OS bullseye
23:53 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:52 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:50 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=93) for host wikikube-worker1248.mgmt.eqiad.wmnet with reboot policy FORCED
23:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1247.eqiad.wmnet with OS bullseye
23:49 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:48 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1246.eqiad.wmnet with OS bullseye
23:46 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1248.eqiad.wmnet with reason: host reimage
23:45 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1244.eqiad.wmnet with OS bullseye
23:44 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:43 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:42 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1248.eqiad.wmnet with reason: host reimage
23:35 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1249.eqiad.wmnet with reason: host reimage
23:34 tzatziki: removing 1 file for legal compliance
23:32 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1249.eqiad.wmnet with reason: host reimage
23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1247.eqiad.wmnet with reason: host reimage
23:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1246.eqiad.wmnet with reason: host reimage
23:26 tzatziki: removing 1 file for legal compliance
23:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1248.eqiad.wmnet with OS bullseye
23:25 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1247.eqiad.wmnet with reason: host reimage
23:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1244.eqiad.wmnet with reason: host reimage
23:23 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1246.eqiad.wmnet with reason: host reimage
23:22 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1244.eqiad.wmnet with reason: host reimage
23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1248.mgmt.eqiad.wmnet with reboot policy FORCED
23:17 eileen: civicrm upgraded from 3db16342 to 4d3d2720
23:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1241.eqiad.wmnet with OS bullseye
23:15 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1249.eqiad.wmnet with OS bullseye
23:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1249.mgmt.eqiad.wmnet with reboot policy FORCED
23:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1248.mgmt.eqiad.wmnet with reboot policy FORCED
23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1245.eqiad.wmnet with OS bullseye
23:12 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:11 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:09 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1247.eqiad.wmnet with OS bullseye
23:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1243.eqiad.wmnet with OS bullseye
23:09 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:08 tzatziki: removing 2 files for legal compliance
23:07 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:06 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1246.eqiad.wmnet with OS bullseye
23:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1242.eqiad.wmnet with OS bullseye
23:06 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:06 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1244.eqiad.wmnet with OS bullseye
23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
23:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
22:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1241.eqiad.wmnet with reason: host reimage
22:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1245.eqiad.wmnet with reason: host reimage
22:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1243.eqiad.wmnet with reason: host reimage
22:49 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1245.eqiad.wmnet with reason: host reimage
22:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1242.eqiad.wmnet with reason: host reimage
22:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1241.eqiad.wmnet with reason: host reimage
22:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1243.eqiad.wmnet with reason: host reimage
22:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1242.eqiad.wmnet with reason: host reimage
22:41 eileen: config revision changed from d2484ce6 to e8cc0ed6
22:35 eileen: config revision changed from 10ead940 to d2484ce6
22:34 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
22:32 eileen: civicrm upgraded from 5ac353bd to 3db16342
22:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1245.eqiad.wmnet with OS bullseye
22:28 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1242.eqiad.wmnet with OS bullseye
22:28 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1243.eqiad.wmnet with OS bullseye
22:28 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1241.eqiad.wmnet with OS bullseye
21:53 urbanecm@deploy1003: Finished scap: Backport for Fix resource response to use JSON content type header (T263870), Fix resource response to use JSON content type header (T263870) (duration: 08m 09s)
21:45 urbanecm@deploy1003: Started scap sync-world: Backport for Fix resource response to use JSON content type header (T263870), Fix resource response to use JSON content type header (T263870)
21:23 cjming@deploy1003: Finished scap: Backport for Deploy MetricsPlatform to beta cluster (T366234) (duration: 11m 41s)
21:18 cjming@deploy1003: cjming: Continuing with sync
21:14 cjming@deploy1003: cjming: Backport for Deploy MetricsPlatform to beta cluster (T366234) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:11 cjming@deploy1003: Started scap sync-world: Backport for Deploy MetricsPlatform to beta cluster (T366234)
21:06 cjming@deploy1003: Finished scap: Backport for Enable Parsoid Read Views on {en,he}wikivoyage (T365367) (duration: 13m 18s)
21:01 cjming@deploy1003: cjming, cscott: Continuing with sync
20:58 cjming@deploy1003: cjming, cscott: Backport for Enable Parsoid Read Views on {en,he}wikivoyage (T365367) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:52 cjming@deploy1003: Started scap sync-world: Backport for Enable Parsoid Read Views on {en,he}wikivoyage (T365367)
20:48 cjming@deploy1003: Finished scap: Backport for Add NetworkSession extension (T355267) (duration: 45m 08s)
20:40 cjming@deploy1003: ebernhardson, cjming: Continuing with sync
20:38 cjming@deploy1003: ebernhardson, cjming: Backport for Add NetworkSession extension (T355267) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:16 godog: bounce benthos@webrequest_live.service on centrallog for excessive lag
20:06 topranks: re-enable BGP to lvs2011 on lsw1-a2-codfw (restores as primary for traffic) T370891
20:03 cjming@deploy1003: Started scap sync-world: Backport for Add NetworkSession extension (T355267)
19:58 topranks: rebooting lvs2011 to force new network config T370891
19:37 eileen: civicrm upgraded from 5e72c64f to 5ac353bd
19:29 topranks: disable BGP to lvs2011 on lsw1-a2-codfw (moves traffic to lvs2014) in advance of vlan change T370891
19:29 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2011.codfw.wmnet with reason: reconfigure vlans on lvs2011
19:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2011.codfw.wmnet with reason: reconfigure vlans on lvs2011
19:28 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lsw1-a2-codfw.mgmt with reason: reconfigure vlans on lvs2011
19:28 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on lsw1-a2-codfw.mgmt with reason: reconfigure vlans on lvs2011
19:21 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: 0.3.145 (duration: 07m 59s)
19:13 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: 0.3.145
18:53 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.16 refs T366961
18:39 topranks: re-enabling BGP to lvs2012 from lsw1-b2-codfw T370862
18:33 brennen: 1.43.0-wmf.16 train (T366961): blockers resolved, rolling to group0
18:31 brennen@deploy1003: Finished scap: Backport for Bump wikimedia/parsoid to 0.20.0-a16 (T371376 T371126) (duration: 08m 54s)
18:29 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
18:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
18:29 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-b2-codfw.mgmt with reason: reconfigure vlans on lvs2012
18:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-b2-codfw.mgmt with reason: reconfigure vlans on lvs2012
18:27 topranks: rebooting lvs2012 (again) to force new network config T370862
18:26 brennen@deploy1003: brennen, cscott: Continuing with sync
18:25 brennen@deploy1003: brennen, cscott: Backport for Bump wikimedia/parsoid to 0.20.0-a16 (T371376 T371126) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
18:23 brennen@deploy1003: Started scap sync-world: Backport for Bump wikimedia/parsoid to 0.20.0-a16 (T371376 T371126)
18:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repool db1174', diff saved to https://phabricator.wikimedia.org/P67083 and previous config saved to /var/cache/conftool/dbconfig/20240730-181331-ladsgroup.json
18:13 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
18:13 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
18:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P67082 and previous config saved to /var/cache/conftool/dbconfig/20240730-181242-ladsgroup.json
18:05 Dreamy_Jazz: Stopped MediaModeration scanning script on ruwiki
17:56 topranks: rebooting lvs2012 to force new network config T370862
17:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
17:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
17:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-b2-codfw.mgmt with reason: reconfigure vlans on lvs2012
17:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-b2-codfw.mgmt with reason: reconfigure vlans on lvs2012
17:51 hashar@deploy1003: Finished deploy [gerrit/gerrit@40e4e0f]: wm-pcc: separate v5 and v7 in two runs - T371407 (duration: 00m 09s)
17:50 hashar@deploy1003: Started deploy [gerrit/gerrit@40e4e0f]: wm-pcc: separate v5 and v7 in two runs - T371407
17:20 topranks: disable BGP to PyBal on lvs2012 from lsw1-b2-codfw (moving traffic to lvs2014)
17:18 otto@deploy1003: Finished scap: mediawiki.org - Apache Rewrite /beacon/event -> EventLogging rest handler - T353817 (duration: 05m 56s)
17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-b2-codfw.mgmt with reason: reconfigure vlans on lvs2012
17:18 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-b2-codfw.mgmt with reason: reconfigure vlans on lvs2012
17:17 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
17:17 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
17:13 otto@deploy1003: Started scap sync-world: mediawiki.org - Apache Rewrite /beacon/event -> EventLogging rest handler - T353817
17:12 topranks: adding row C/D vlans to lsw1-b2-codfw and adding on trunk to lvs2012 T370862
16:09 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
16:08 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
16:07 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
16:07 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
16:06 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
16:06 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
15:56 akosiaris: restart pybal for parsoid-php removal on lvs1019, lvs2013 T359387
15:50 jnuche@deploy1003: Installation of scap version "latest" completed for 213 hosts
15:49 jnuche@deploy1003: Installing scap version "latest" for 213 hosts
15:48 jnuche@deploy1003: Installing scap version "latest" for 214 hosts
15:47 jnuche@deploy1003: Installation of scap version "latest" completed for 2 hosts
15:47 jnuche@deploy1003: Installing scap version "latest" for 2 hosts
15:20 akosiaris: restart pybal for parsoid-php removal on lvs1020, lvs2014 T359387
15:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
15:04 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
15:03 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
15:00 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: finished upgrading anycast-hc: T370068]
14:59 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc2017.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:58 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
14:51 sukhe: [dns7001] upgrade anycast-healthchecker to 0.9.8-1+wmf12u2: T370068
14:51 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: upgrading anycast-hc: T370068]
14:48 mforns@deploy1003: Finished deploy [airflow-dags/analytics@e1fdaac]: (no justification provided) (duration: 00m 26s)
14:47 mforns@deploy1003: Started deploy [airflow-dags/analytics@e1fdaac]: (no justification provided)
14:47 mforns@deploy1003: Finished deploy [airflow-dags/analytics@e1fdaac]: (no justification provided) (duration: 00m 15s)
14:47 mforns@deploy1003: Started deploy [airflow-dags/analytics@e1fdaac]: (no justification provided)
14:45 urbanecm: mwmaint1002: mwscript extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=enwiki --all --verbose (T370802; log kept at mwmaint1002:/home/urbanecm/revalidateLinkRecommendations-T370802-july-2024.log)
14:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host pc2017.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1017.mgmt.eqiad.wmnet with reboot policy GRACEFUL
14:36 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
14:36 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
14:35 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
14:35 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
14:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host pc1017.mgmt.eqiad.wmnet with reboot policy GRACEFUL
14:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1247.mgmt.eqiad.wmnet with reboot policy FORCED
14:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1246.mgmt.eqiad.wmnet with reboot policy FORCED
14:26 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
14:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1243.mgmt.eqiad.wmnet with reboot policy FORCED
14:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1248.mgmt.eqiad.wmnet with reboot policy FORCED
14:25 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
14:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
14:24 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1248.mgmt.eqiad.wmnet with reboot policy FORCED
14:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1248.mgmt.eqiad.wmnet with reboot policy FORCED
14:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1249.mgmt.eqiad.wmnet with reboot policy FORCED
14:22 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
14:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1241.mgmt.eqiad.wmnet with reboot policy FORCED
14:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1242.mgmt.eqiad.wmnet with reboot policy FORCED
14:21 jnuche@deploy1003: Installation of scap version "latest" completed for 2 hosts
14:21 jnuche@deploy1003: Installing scap version "latest" for 2 hosts
14:20 jnuche@deploy1003: Installing scap version "latest" for 3 hosts
14:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1245.mgmt.eqiad.wmnet with reboot policy FORCED
14:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1247.mgmt.eqiad.wmnet with reboot policy FORCED
14:09 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1246.mgmt.eqiad.wmnet with reboot policy FORCED
14:07 jclark@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=97)
14:07 jclark@cumin1002: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1241-9 - jclark@cumin1002"
14:06 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1243.mgmt.eqiad.wmnet with reboot policy FORCED
14:06 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1241-9 - jclark@cumin1002"
14:02 jclark@cumin1002: START - Cookbook sre.dns.netbox
13:58 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1241.mgmt.eqiad.wmnet with reboot policy FORCED
13:58 marostegui: Remove clouddb1021 from zarcillo database T368518
13:57 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
13:57 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1241.mgmt.eqiad.wmnet with reboot policy FORCED
13:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1245.mgmt.eqiad.wmnet with reboot policy FORCED
13:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
13:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1241.mgmt.eqiad.wmnet with reboot policy FORCED
13:56 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1241-9 - jclark@cumin1002"
13:55 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1245.mgmt.eqiad.wmnet with reboot policy FORCED
13:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1241-9 - jclark@cumin1002"
13:55 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1245.mgmt.eqiad.wmnet with reboot policy FORCED
13:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1241.mgmt.eqiad.wmnet with reboot policy FORCED
13:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1245.mgmt.eqiad.wmnet with reboot policy FORCED
13:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
13:54 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1245.mgmt.eqiad.wmnet with reboot policy FORCED
13:54 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
13:54 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1242.mgmt.eqiad.wmnet with reboot policy FORCED
13:54 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1241.mgmt.eqiad.wmnet with reboot policy FORCED
13:49 jclark@cumin1002: START - Cookbook sre.dns.netbox
13:48 urbanecm@deploy1003: Finished scap: Backport for [eswiki] Enable Visual Editor in namespace Project (T370158), [euwiki] Enable Visual Editor in namespaces Project and Wikiproiektu (T368632), Enable VisualEditor at Spanish Wikiquote (T355336) (duration: 16m 12s)
13:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67079 and previous config saved to /var/cache/conftool/dbconfig/20240730-134352-root.json
13:43 urbanecm@deploy1003: urbanecm, gergesshamon: Continuing with sync
13:39 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1240.eqiad.wmnet with OS bullseye
13:39 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
13:34 urbanecm@deploy1003: urbanecm, gergesshamon: Backport for [eswiki] Enable Visual Editor in namespace Project (T370158), [euwiki] Enable Visual Editor in namespaces Project and Wikiproiektu (T368632), Enable VisualEditor at Spanish Wikiquote (T355336) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
13:32 urbanecm@deploy1003: Started scap sync-world: Backport for [eswiki] Enable Visual Editor in namespace Project (T370158), [euwiki] Enable Visual Editor in namespaces Project and Wikiproiektu (T368632), Enable VisualEditor at Spanish Wikiquote (T355336)
13:31 urbanecm@deploy1003: Finished scap: Backport for Update nlwiki AbuseFilter config per consensus (T370605) (duration: 09m 35s)
13:30 elukey: deprecate the sre-admins posix group fleetwide (replaced by ops-limited) - T360356
13:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67078 and previous config saved to /var/cache/conftool/dbconfig/20240730-132846-root.json
13:26 urbanecm@deploy1003: xxblackburnxx, urbanecm: Continuing with sync
13:25 urbanecm@deploy1003: xxblackburnxx, urbanecm: Backport for Update nlwiki AbuseFilter config per consensus (T370605) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:22 urbanecm@deploy1003: Started scap sync-world: Backport for Update nlwiki AbuseFilter config per consensus (T370605)
13:21 urbanecm@deploy1003: Finished scap: Backport for [Growth] hywwiki: Disable Add link backend (T370558), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316) (duration: 22m 31s)
13:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1240.eqiad.wmnet with reason: host reimage
13:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67077 and previous config saved to /var/cache/conftool/dbconfig/20240730-131341-root.json
13:13 Dreamy_Jazz: ruwiki scan is set to time out after 5 hours
13:13 Dreamy_Jazz: Started MediaModeration scan on ruwiki to catch up on monthly limit
13:12 Dreamy_Jazz: Started MediaModeration script after it crashed - https://wikitech.wikimedia.org/wiki/MediaModeration
13:12 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1240.eqiad.wmnet with reason: host reimage
13:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67076 and previous config saved to /var/cache/conftool/dbconfig/20240730-131223-root.json
12:58 urbanecm@deploy1003: Started scap sync-world: Backport for [Growth] hywwiki: Disable Add link backend (T370558), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316)
12:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67074 and previous config saved to /var/cache/conftool/dbconfig/20240730-125836-root.json
12:57 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67073 and previous config saved to /var/cache/conftool/dbconfig/20240730-125717-root.json
12:56 jnuche@deploy1003: Installation of scap version "latest" completed for 2 hosts
12:56 jnuche@deploy1003: Installing scap version "latest" for 2 hosts
12:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1240.eqiad.wmnet with OS bullseye
12:55 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1240.eqiad.wmnet with OS bullseye
12:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67072 and previous config saved to /var/cache/conftool/dbconfig/20240730-124330-root.json
12:42 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67071 and previous config saved to /var/cache/conftool/dbconfig/20240730-124212-root.json
12:41 urbanecm: mwdebug1001: scap pull to overcome scap issues
12:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67070 and previous config saved to /var/cache/conftool/dbconfig/20240730-122825-root.json
12:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67069 and previous config saved to /var/cache/conftool/dbconfig/20240730-122706-root.json
12:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1193.eqiad.wmnet with reason: Change binlog format
12:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1193.eqiad.wmnet with reason: Change binlog format
12:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1193 T371361', diff saved to https://phabricator.wikimedia.org/P67068 and previous config saved to /var/cache/conftool/dbconfig/20240730-122243-root.json
12:21 JustHannah: T371253 Ran mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=dewiktionary --logwiki=metawiki 'Gregorjohannes' 'Klegul'
12:17 urbanecm@deploy1003: Started scap sync-world: Backport for [Growth] hywwiki: Disable Add link backend (T370558), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316)
12:16 urbanecm@deploy1003: sync-world aborted: Backport for [Growth] hywwiki: Disable Add link backend (T370558), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316) (duration: 14m 10s)
12:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1231 T371361', diff saved to https://phabricator.wikimedia.org/P67066 and previous config saved to /var/cache/conftool/dbconfig/20240730-121500-root.json
12:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67065 and previous config saved to /var/cache/conftool/dbconfig/20240730-121201-root.json
12:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1201.eqiad.wmnet with reason: Change binlog format
12:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1201.eqiad.wmnet with reason: Change binlog format
12:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1201 T371361', diff saved to https://phabricator.wikimedia.org/P67064 and previous config saved to /var/cache/conftool/dbconfig/20240730-120805-root.json
12:02 urbanecm@deploy1003: Started scap sync-world: Backport for [Growth] hywwiki: Disable Add link backend (T370558), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316)
11:54 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
11:52 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
11:47 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
11:47 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67062 and previous config saved to /var/cache/conftool/dbconfig/20240730-111622-root.json
11:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67061 and previous config saved to /var/cache/conftool/dbconfig/20240730-111331-root.json
11:10 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
11:03 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
11:03 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
11:02 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
11:02 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
11:02 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
11:02 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
11:02 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
11:01 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
11:01 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
11:01 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
11:01 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
11:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67060 and previous config saved to /var/cache/conftool/dbconfig/20240730-110117-root.json
11:00 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
11:00 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
11:00 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
10:59 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
10:58 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
10:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67059 and previous config saved to /var/cache/conftool/dbconfig/20240730-105825-root.json
10:56 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
10:55 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
10:55 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
10:55 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
10:55 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
10:55 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
10:54 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
10:54 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
10:54 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
10:53 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
10:51 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
10:50 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2227.codfw.wmnet with OS bookworm
10:50 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - volans@cumin2002"
10:49 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - volans@cumin2002"
10:47 marostegui@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67058 and previous config saved to /var/cache/conftool/dbconfig/20240730-104705-root.json
10:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67057 and previous config saved to /var/cache/conftool/dbconfig/20240730-104612-root.json
10:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67056 and previous config saved to /var/cache/conftool/dbconfig/20240730-104318-root.json
10:33 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
10:32 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2227.codfw.wmnet with reason: host reimage
10:32 marostegui@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67054 and previous config saved to /var/cache/conftool/dbconfig/20240730-103200-root.json
10:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67053 and previous config saved to /var/cache/conftool/dbconfig/20240730-103106-root.json
10:29 volans@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2227.codfw.wmnet with reason: host reimage
10:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67052 and previous config saved to /var/cache/conftool/dbconfig/20240730-102813-root.json
10:21 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
10:20 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
10:20 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
10:20 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
10:20 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
10:20 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
10:20 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
10:20 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
10:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67051 and previous config saved to /var/cache/conftool/dbconfig/20240730-101654-root.json
10:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67050 and previous config saved to /var/cache/conftool/dbconfig/20240730-101600-root.json
10:14 volans@cumin2002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
10:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67049 and previous config saved to /var/cache/conftool/dbconfig/20240730-101307-root.json
10:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
10:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
10:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67048 and previous config saved to /var/cache/conftool/dbconfig/20240730-100148-root.json
10:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67047 and previous config saved to /var/cache/conftool/dbconfig/20240730-100055-root.json
09:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67046 and previous config saved to /var/cache/conftool/dbconfig/20240730-095802-root.json
09:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67045 and previous config saved to /var/cache/conftool/dbconfig/20240730-094643-root.json
09:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67044 and previous config saved to /var/cache/conftool/dbconfig/20240730-094549-root.json
09:42 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67043 and previous config saved to /var/cache/conftool/dbconfig/20240730-094256-root.json
09:42 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1179.eqiad.wmnet onto db1224.eqiad.wmnet
09:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67042 and previous config saved to /var/cache/conftool/dbconfig/20240730-093138-root.json
09:29 marostegui: Deploy schema change on db2203 s1 codfw dbmaint T367856
09:26 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
09:26 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
09:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2203.codfw.wmnet with reason: Long schema change
09:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2203.codfw.wmnet with reason: Long schema change
09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2203 T371345', diff saved to https://phabricator.wikimedia.org/P67041 and previous config saved to /var/cache/conftool/dbconfig/20240730-091925-marostegui.json
09:17 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2212 to s1 primary T371345', diff saved to https://phabricator.wikimedia.org/P67040 and previous config saved to /var/cache/conftool/dbconfig/20240730-091742-root.json
09:10 marostegui: Starting s1 codfw failover from db2203 to db2212 - T371345
08:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
08:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
08:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67039 and previous config saved to /var/cache/conftool/dbconfig/20240730-084525-root.json
08:32 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2216.codfw.wmnet onto db2212.codfw.wmnet
08:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67038 and previous config saved to /var/cache/conftool/dbconfig/20240730-083020-root.json
08:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T371345
08:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T371345
08:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67037 and previous config saved to /var/cache/conftool/dbconfig/20240730-081515-root.json
08:11 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host deploy2002.codfw.wmnet with OS bullseye
08:11 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
08:06 marostegui: Update db1224 on zarcillo T371276
08:06 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1179.eqiad.wmnet onto db1224.eqiad.wmnet
08:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1179.eqiad.wmnet with reason: Move db1224 to x1
08:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on db1179.eqiad.wmnet with reason: Move db1224 to x1
08:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1179 T371276', diff saved to https://phabricator.wikimedia.org/P67035 and previous config saved to /var/cache/conftool/dbconfig/20240730-080538-root.json
08:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1224.eqiad.wmnet with reason: Move db1224 to x1
08:05 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
08:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on db1224.eqiad.wmnet with reason: Move db1224 to x1
08:03 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
08:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
08:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67034 and previous config saved to /var/cache/conftool/dbconfig/20240730-080135-root.json
08:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67033 and previous config saved to /var/cache/conftool/dbconfig/20240730-080010-root.json
07:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67032 and previous config saved to /var/cache/conftool/dbconfig/20240730-074629-root.json
07:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67031 and previous config saved to /var/cache/conftool/dbconfig/20240730-074505-root.json
07:33 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy2002.codfw.wmnet with reason: host reimage
07:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67030 and previous config saved to /var/cache/conftool/dbconfig/20240730-073124-root.json
07:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67029 and previous config saved to /var/cache/conftool/dbconfig/20240730-072959-root.json
07:28 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on deploy2002.codfw.wmnet with reason: host reimage
07:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67028 and previous config saved to /var/cache/conftool/dbconfig/20240730-071619-root.json
07:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67027 and previous config saved to /var/cache/conftool/dbconfig/20240730-071454-root.json
07:14 godog: finish rolling out benthos 4.27.0-1
07:10 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host deploy2002.codfw.wmnet with OS bullseye
07:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67026 and previous config saved to /var/cache/conftool/dbconfig/20240730-070114-root.json
06:58 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1244.eqiad.wmnet onto db1238.eqiad.wmnet
06:56 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db2216.codfw.wmnet onto db2212.codfw.wmnet
06:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2216', diff saved to https://phabricator.wikimedia.org/P67025 and previous config saved to /var/cache/conftool/dbconfig/20240730-064853-root.json
06:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2212', diff saved to https://phabricator.wikimedia.org/P67024 and previous config saved to /var/cache/conftool/dbconfig/20240730-064835-root.json
06:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T371345
06:41 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2212 with weight 0 T371345', diff saved to https://phabricator.wikimedia.org/P67023 and previous config saved to /var/cache/conftool/dbconfig/20240730-064128-marostegui.json
06:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T371345
05:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67022 and previous config saved to /var/cache/conftool/dbconfig/20240730-052420-root.json
05:20 marostegui: Change candidate master in s4 eqiad (this is a NOOP) T371343
05:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67021 and previous config saved to /var/cache/conftool/dbconfig/20240730-050914-root.json
05:04 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1244.eqiad.wmnet onto db1238.eqiad.wmnet
04:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Recloning db1238
04:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Recloning db1238
04:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Long schema change
04:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Long schema change
04:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67020 and previous config saved to /var/cache/conftool/dbconfig/20240730-045409-root.json
04:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1238 T371251', diff saved to https://phabricator.wikimedia.org/P67019 and previous config saved to /var/cache/conftool/dbconfig/20240730-045336-marostegui.json
04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1160 to s4 primary and set section read-write T371251', diff saved to https://phabricator.wikimedia.org/P67018 and previous config saved to /var/cache/conftool/dbconfig/20240730-045104-marostegui.json
04:50 marostegui@cumin1002: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T371251', diff saved to https://phabricator.wikimedia.org/P67017 and previous config saved to /var/cache/conftool/dbconfig/20240730-045032-root.json
04:50 marostegui: Starting s4 eqiad failover from db1238 to db1160 - T371251
04:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67016 and previous config saved to /var/cache/conftool/dbconfig/20240730-043904-root.json
04:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1163 (T367856)', diff saved to https://phabricator.wikimedia.org/P67015 and previous config saved to /var/cache/conftool/dbconfig/20240730-042755-marostegui.json
04:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
04:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
04:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s4 T371251
04:25 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1160 with weight 0 T371251', diff saved to https://phabricator.wikimedia.org/P67014 and previous config saved to /var/cache/conftool/dbconfig/20240730-042528-root.json
04:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s4 T371251
04:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67013 and previous config saved to /var/cache/conftool/dbconfig/20240730-042358-root.json
04:07 mwpresync@deploy1003: Pruned MediaWiki: 1.43.0-wmf.13 (duration: 06m 51s)
03:02 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.43.0-wmf.16 refs T366961
02:52 eileen: disabled audit modules (Adyen audit etc)
02:09 eileen: civicrm upgraded from 2837c4e9 to 5e72c64f
02:05 eileen: config revision changed from 8e2f7c03 to 10ead940
01:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T367856)', diff saved to https://phabricator.wikimedia.org/P67011 and previous config saved to /var/cache/conftool/dbconfig/20240730-010232-marostegui.json
00:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P67010 and previous config saved to /var/cache/conftool/dbconfig/20240730-004725-marostegui.json
00:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P67009 and previous config saved to /var/cache/conftool/dbconfig/20240730-003218-marostegui.json
00:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T367856)', diff saved to https://phabricator.wikimedia.org/P67008 and previous config saved to /var/cache/conftool/dbconfig/20240730-001710-marostegui.json

2024-07-29

23:19 eileen: civicrm upgraded from efbb874e to 2837c4e9
22:19 eileen: civicrm upgraded from 1dc4f944 to efbb874e
21:42 dwisehaupt@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:42 dwisehaupt@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1002"
21:41 dwisehaupt@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1002"
21:38 dwisehaupt@cumin1002: START - Cookbook sre.dns.netbox
21:09 cjming: end of UTC late backport window
21:06 cjming@deploy1003: Finished scap: Backport for Produce a limited set of event streams on private wikis (pt 2) (T346046) (duration: 10m 40s)
21:00 cjming@deploy1003: ebernhardson, cjming: Continuing with sync
21:00 cjming@deploy1003: ebernhardson, cjming: Backport for Produce a limited set of event streams on private wikis (pt 2) (T346046) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:55 cjming@deploy1003: Started scap sync-world: Backport for Produce a limited set of event streams on private wikis (pt 2) (T346046)
20:52 cjming@deploy1003: Finished scap: Backport for Clean up night mode exclude namespaces and allow font size on submit (T370092 T370505) (duration: 08m 18s)
20:48 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
20:48 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
20:46 cjming@deploy1003: cjming, jdlrobson: Continuing with sync
20:45 cjming@deploy1003: cjming, jdlrobson: Backport for Clean up night mode exclude namespaces and allow font size on submit (T370092 T370505) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:45 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
20:45 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
20:43 cjming@deploy1003: Started scap sync-world: Backport for Clean up night mode exclude namespaces and allow font size on submit (T370092 T370505)
20:42 cjming@deploy1003: Finished scap: Backport for Produce a limited set of event streams on private wikis (pt 1) (T346046) (duration: 07m 30s)
20:37 cjming@deploy1003: ebernhardson, cjming: Continuing with sync
20:36 cjming@deploy1003: ebernhardson, cjming: Backport for Produce a limited set of event streams on private wikis (pt 1) (T346046) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:34 cjming@deploy1003: Started scap sync-world: Backport for Produce a limited set of event streams on private wikis (pt 1) (T346046)
20:33 cjming@deploy1003: Finished scap: Backport for enwiki, commonswiki: lift IP cap for edit-a-thon (T371026) (duration: 07m 59s)
20:27 cjming@deploy1003: superzerocool, cjming: Continuing with sync
20:27 cjming@deploy1003: superzerocool, cjming: Backport for enwiki, commonswiki: lift IP cap for edit-a-thon (T371026) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:25 cjming@deploy1003: Started scap sync-world: Backport for enwiki, commonswiki: lift IP cap for edit-a-thon (T371026)
20:19 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2227.codfw.wmnet with OS bookworm
20:15 cjming@deploy1003: Finished scap: Backport for Increase edit count requirement for autoconfirmed on English Wikivoyage (T371186) (duration: 08m 52s)
20:10 cjming@deploy1003: nmw03, cjming: Continuing with sync
20:08 cjming@deploy1003: nmw03, cjming: Backport for Increase edit count requirement for autoconfirmed on English Wikivoyage (T371186) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:06 cjming@deploy1003: Started scap sync-world: Backport for Increase edit count requirement for autoconfirmed on English Wikivoyage (T371186)
18:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
18:58 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2227.codfw.wmnet with OS bookworm
17:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
17:51 urbanecm: mwmaint1002: kill extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php for enwiki (T370802)
17:50 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2227.codfw.wmnet with OS bookworm
17:26 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet [reason: testing ATS 9.2.5 upgrade]
17:25 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
17:24 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2227.codfw.wmnet with OS bookworm
17:17 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4052*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-
17:14 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4052*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-drmrs or A:cp-text_
17:14 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.5-1wm2_amd64.changes T339134
16:47 urbanecm@deploy1003: Finished scap: Backport for Display a GlobalBlock link to stewards in Special:CheckUser (T370463 T178571), Ignore help-links with no title configured (T370941) (duration: 10m 56s)
16:45 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
16:44 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2227.codfw.wmnet with OS bookworm
16:42 urbanecm@deploy1003: dreamyjazz, migr, urbanecm: Continuing with sync
16:38 urbanecm@deploy1003: dreamyjazz, migr, urbanecm: Backport for Display a GlobalBlock link to stewards in Special:CheckUser (T370463 T178571), Ignore help-links with no title configured (T370941) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2003.wikimedia.org with OS bookworm
16:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
16:36 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
16:36 urbanecm@deploy1003: Started scap sync-world: Backport for Display a GlobalBlock link to stewards in Special:CheckUser (T370463 T178571), Ignore help-links with no title configured (T370941)
16:30 Emperor: restart swift-proxy on ms-fe2011 T360913
16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2003.wikimedia.org with reason: host reimage
16:17 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet [reason: testing ATS 9.2.5 upgrade]
16:15 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2003.wikimedia.org with reason: host reimage
16:04 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4052*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-
16:01 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4052*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-drmrs or A:cp-text_
15:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit2003.wikimedia.org with OS bookworm
15:56 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:56 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add public vlan for gerrit2003 - pt1979@cumin2002"
15:56 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.5-1wm1_amd64.changes T339134
15:55 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add public vlan for gerrit2003 - pt1979@cumin2002"
15:55 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
15:54 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
15:53 pt1979@cumin2002: START - Cookbook sre.dns.netbox
15:49 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
15:48 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host gerrit2003.codfw.wmnet with OS bookworm
15:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit2003.codfw.wmnet with OS bookworm
15:40 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host gerrit2003.codfw.wmnet with OS bookworm
15:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2233.codfw.wmnet with OS bookworm
15:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
15:23 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
15:23 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
15:18 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1240.eqiad.wmnet with OS bullseye
15:18 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
15:17 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
15:16 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
15:16 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
15:14 sukhe: running authdns-update after dns2006 depool
15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2233.codfw.wmnet with reason: host reimage
15:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns2006.wikimedia.org [reason: finished upgrading anycast-hc: T370068]
15:10 sukhe: [dns2006] upgrade anycast-healthchecker to 0.9.8-1+wmf12u2: T370068
15:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2233.codfw.wmnet with reason: host reimage
15:09 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns2006.wikimedia.org [reason: upgrading anycast-hc: T370068]
15:02 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
14:59 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
14:58 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit2003.codfw.wmnet with OS bookworm
14:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2227.codfw.wmnet with OS bookworm
14:57 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
14:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2233.codfw.wmnet with OS bookworm
14:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2233.codfw.wmnet with OS bookworm
14:45 Lucas_WMDE: UTC afternoon backport+config window done
14:41 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap: Backport for Revert "TranslatablePage: Split translatable page id cache into multiple shards" (T366455) (duration: 07m 58s)
14:39 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
14:37 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
14:35 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde: Continuing with sync
14:35 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde: Backport for Revert "TranslatablePage: Split translatable page id cache into multiple shards" (T366455) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:34 sukhe: sudo cumin -b1 -s120 'O:wikidough' 'run-puppet-agent'
14:33 sukhe: A:wikidough: debdeploy upgrade anycast-hc to 0.9.8: T370068
14:33 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for Revert "TranslatablePage: Split translatable page id cache into multiple shards" (T366455)
14:33 sukhe: A:wikidough: debdeploy upgrade anycast-hc to 0.9.8
14:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2229.codfw.wmnet with OS bookworm
14:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:24 herron: the grafana default datasource has been changed from graphite to thanos T269333
14:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:23 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2231.codfw.wmnet with OS bookworm
14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:21 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455) (duration: 19m 24s)
14:21 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
14:20 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
14:20 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
14:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2230.codfw.wmnet with OS bookworm
14:19 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2232.codfw.wmnet with OS bookworm
14:15 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
14:13 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, abi: Continuing with sync
14:13 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, abi: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2228.codfw.wmnet with OS bookworm
14:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:09 SandraEbele_: rerunning airflow mediawiki_history_check_denormalize dag as down stream task after rerunning mediawiki_history_denormalize dag
14:07 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2039.codfw.wmnet),cluster=kubernetes,service=kubesvc [reason: Pooling and uncordoning - T351074]
14:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2231.codfw.wmnet with reason: host reimage
14:02 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455)
14:01 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1240.eqiad.wmnet with OS bullseye
14:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
14:01 jnuche@deploy1003: Installation of scap version "4.94.0" completed for 210 hosts
14:00 jnuche@deploy1003: Installing scap version "4.94.0" for 210 hosts
13:59 jnuche@deploy1003: Installing scap version "4.94.0" for 211 hosts
13:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2229.codfw.wmnet with reason: host reimage
13:56 claime: homer 'cr*codfw*' commit 'T351074'
13:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2230.codfw.wmnet with reason: host reimage
13:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2232.codfw.wmnet with reason: host reimage
13:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2228.codfw.wmnet with reason: host reimage
13:49 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
13:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2231.codfw.wmnet with reason: host reimage
13:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2229.codfw.wmnet with reason: host reimage
13:48 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:48 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1240 - jclark@cumin1002"
13:47 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2232.codfw.wmnet with reason: host reimage
13:47 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2230.codfw.wmnet with reason: host reimage
13:47 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1240 - jclark@cumin1002"
13:47 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2228.codfw.wmnet with reason: host reimage
13:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2227.codfw.wmnet with OS bookworm
13:45 logmsgbot: lucaswerkmeister-wmde@deploy1003 Synchronized php-1.43.0-wmf.15/extensions/ContentTranslation/extension.json: Backport for AX: Unregister "axArticleFooterEntrypointRegistrar" hook handler (T363338) (duration: 06m 36s)
13:44 jclark@cumin1002: START - Cookbook sre.dns.netbox
13:41 XioNoX: push new pfw policies - T371137
13:36 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon2003.codfw.wmnet
13:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2233.codfw.wmnet with OS bookworm
13:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2232.codfw.wmnet with OS bookworm
13:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2231.codfw.wmnet with OS bookworm
13:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2230.codfw.wmnet with OS bookworm
13:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2229.codfw.wmnet with OS bookworm
13:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2228.codfw.wmnet with OS bookworm
13:33 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafkamon2003.codfw.wmnet
13:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS bookworm
13:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
13:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
13:24 logmsgbot: lucaswerkmeister-wmde@deploy1003 Synchronized wmf-config/: Backport for Enable mul language code on Wikidata (limited mode) (T330281) (duration: 06m 47s)
13:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2225.codfw.wmnet with OS bookworm
13:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage
13:11 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage
13:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2226.codfw.wmnet with OS bookworm
13:10 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
13:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2225.codfw.wmnet with reason: host reimage
13:05 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2225.codfw.wmnet with reason: host reimage
13:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
13:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2225.codfw.wmnet with OS bookworm
13:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2223.codfw.wmnet with OS bookworm
13:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
13:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
12:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host db2225.codfw.wmnet with OS bookworm
12:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
12:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
12:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2224.codfw.wmnet with OS bookworm
12:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
12:57 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS bookworm
12:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
12:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS bookworm
12:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
12:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2039.codfw.wmnet with OS bullseye
12:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
12:48 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
12:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2226.codfw.wmnet with reason: host reimage
12:47 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
12:46 godog: upgrade and roll-restart benthos@mw_accesslog_sampler on logstash hosts
12:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2222.codfw.wmnet with OS bookworm
12:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2223.codfw.wmnet with reason: host reimage
12:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2225.codfw.wmnet with reason: host reimage
12:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2224.codfw.wmnet with reason: host reimage
12:35 godog: test benthos 4.27 on logstash1023
12:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2221.codfw.wmnet with reason: host reimage
12:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2225.codfw.wmnet with reason: host reimage
12:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2224.codfw.wmnet with reason: host reimage
12:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2223.codfw.wmnet with reason: host reimage
12:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2226.codfw.wmnet with reason: host reimage
12:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2039.codfw.wmnet with reason: host reimage
12:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2221.codfw.wmnet with reason: host reimage
12:27 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2039.codfw.wmnet with reason: host reimage
12:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
12:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2226.codfw.wmnet with OS bookworm
12:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2225.codfw.wmnet with OS bookworm
12:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2224.codfw.wmnet with OS bookworm
12:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2223.codfw.wmnet with OS bookworm
12:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS bookworm
12:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS bookworm
12:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2039.codfw.wmnet with OS bullseye
12:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2441 to wikikube-worker2039
12:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2039
12:06 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
12:02 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
12:02 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
12:01 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
11:59 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
11:51 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2039
11:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2441 to wikikube-worker2039 - cgoubert@cumin1002"
11:49 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2441 to wikikube-worker2039 - cgoubert@cumin1002"
11:46 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
11:46 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2441 to wikikube-worker2039
11:26 akosiaris@deploy1003: Finished scap: check the deployment server after switchover (duration: 32m 28s)
11:14 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67004 and previous config saved to /var/cache/conftool/dbconfig/20240729-111410-root.json
10:59 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67003 and previous config saved to /var/cache/conftool/dbconfig/20240729-105904-root.json
10:54 akosiaris@deploy1003: Started scap sync-world: check the deployment server after switchover
10:43 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67002 and previous config saved to /var/cache/conftool/dbconfig/20240729-104358-root.json
10:28 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67001 and previous config saved to /var/cache/conftool/dbconfig/20240729-102853-root.json
10:20 marostegui: Deploy schema change on s7 eqiad master with replication dbmaint T370394
10:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2441.mgmt.codfw.wmnet with reboot policy GRACEFUL
10:13 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67000 and previous config saved to /var/cache/conftool/dbconfig/20240729-101348-root.json
10:12 godog: bounce benthos@mw_accesslog_sampler on logstash collectors
10:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1032.eqiad.wmnet with reason: Long schema change
10:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1032.eqiad.wmnet with reason: Long schema change
10:07 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2441.mgmt.codfw.wmnet with reboot policy GRACEFUL
09:31 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
09:27 dcausse@deploy1002: Finished deploy [airflow-dags/search@7da1ef0]: search: process_sparql_query workaround oom issues (duration: 00m 20s)
09:27 dcausse@deploy1002: Started deploy [airflow-dags/search@7da1ef0]: search: process_sparql_query workaround oom issues
09:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1032 investigate access denied errors', diff saved to https://phabricator.wikimedia.org/P66999 and previous config saved to /var/cache/conftool/dbconfig/20240729-092239-root.json
09:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T367856)', diff saved to https://phabricator.wikimedia.org/P66998 and previous config saved to /var/cache/conftool/dbconfig/20240729-091658-marostegui.json
09:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
09:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
09:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T367856)', diff saved to https://phabricator.wikimedia.org/P66997 and previous config saved to /var/cache/conftool/dbconfig/20240729-091637-marostegui.json
09:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repool 25% of es1032', diff saved to https://phabricator.wikimedia.org/P66996 and previous config saved to /var/cache/conftool/dbconfig/20240729-090953-marostegui.json
09:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1032.eqiad.wmnet with reason: Long schema change
09:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1032.eqiad.wmnet with reason: Long schema change
09:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1032 investigate access denied errors', diff saved to https://phabricator.wikimedia.org/P66995 and previous config saved to /var/cache/conftool/dbconfig/20240729-090730-root.json
09:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P66994 and previous config saved to /var/cache/conftool/dbconfig/20240729-090129-marostegui.json
08:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P66992 and previous config saved to /var/cache/conftool/dbconfig/20240729-084622-marostegui.json
08:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T367856)', diff saved to https://phabricator.wikimedia.org/P66991 and previous config saved to /var/cache/conftool/dbconfig/20240729-083115-marostegui.json
07:54 dcausse: closing the backport window
07:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 24482
07:51 dcausse@deploy1002: Finished scap: Backport for GeoData: add pool counter settings (T370621) (duration: 11m 36s)
07:47 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts karapace1001.eqiad.wmnet
07:47 brouberol@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:47 brouberol@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: karapace1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1002"
07:46 brouberol@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: karapace1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1002"
07:46 dcausse@deploy1002: dcausse: Continuing with sync
07:42 dcausse@deploy1002: dcausse: Backport for GeoData: add pool counter settings (T370621) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:41 brouberol@cumin1002: START - Cookbook sre.dns.netbox
07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 24482
07:39 dcausse@deploy1002: Started scap sync-world: Backport for GeoData: add pool counter settings (T370621)
07:39 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 24482
07:37 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 24482
07:34 kartik@deploy1002: Finished scap: Backport for Temporary disable MinT for Wikireaders for bn, fa, hi, and ko (duration: 14m 42s)
07:34 brouberol@cumin1002: START - Cookbook sre.hosts.decommission for hosts karapace1001.eqiad.wmnet
07:34 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts karapace1002.eqiad.wmnet
07:34 brouberol@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:34 brouberol@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: karapace1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1002"
07:32 brouberol@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: karapace1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1002"
07:29 brouberol@cumin1002: START - Cookbook sre.dns.netbox
07:25 kartik@deploy1002: kartik: Continuing with sync
07:25 kartik@deploy1002: kartik: Backport for Temporary disable MinT for Wikireaders for bn, fa, hi, and ko synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:25 brouberol@cumin1002: START - Cookbook sre.hosts.decommission for hosts karapace1002.eqiad.wmnet
07:19 kartik@deploy1002: Started scap sync-world: Backport for Temporary disable MinT for Wikireaders for bn, fa, hi, and ko
07:19 kartik@deploy1002: Sync cancelled.
07:19 kartik@deploy1002: kartik: Backport for Temporary disable MinT for Wikireaders for bn, fa, hi, and ko synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:03 kartik@deploy1002: Started scap sync-world: Backport for Temporary disable MinT for Wikireaders for bn, fa, hi, and ko
06:48 marostegui: Deploy schema change on s4 codfw db2179 dbmaint T367856
06:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Long schema change
06:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Long schema change
06:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2179 T371205', diff saved to https://phabricator.wikimedia.org/P66990 and previous config saved to /var/cache/conftool/dbconfig/20240729-064405-marostegui.json
06:42 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2140 to s4 primary T371205', diff saved to https://phabricator.wikimedia.org/P66989 and previous config saved to /var/cache/conftool/dbconfig/20240729-064250-marostegui.json
06:42 marostegui: Starting s4 codfw failover from db2179 to db2140 - T371205
03:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T367856)', diff saved to https://phabricator.wikimedia.org/P66984 and previous config saved to /var/cache/conftool/dbconfig/20240729-030804-marostegui.json
03:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2216.codfw.wmnet with reason: Maintenance
03:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2216.codfw.wmnet with reason: Maintenance
03:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T367856)', diff saved to https://phabricator.wikimedia.org/P66983 and previous config saved to /var/cache/conftool/dbconfig/20240729-030742-marostegui.json
02:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P66982 and previous config saved to /var/cache/conftool/dbconfig/20240729-025235-marostegui.json
02:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P66981 and previous config saved to /var/cache/conftool/dbconfig/20240729-023728-marostegui.json
02:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T367856)', diff saved to https://phabricator.wikimedia.org/P66980 and previous config saved to /var/cache/conftool/dbconfig/20240729-022221-marostegui.json

2024-07-28

19:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T367856)', diff saved to https://phabricator.wikimedia.org/P66979 and previous config saved to /var/cache/conftool/dbconfig/20240728-190050-marostegui.json
19:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
19:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
19:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T367856)', diff saved to https://phabricator.wikimedia.org/P66978 and previous config saved to /var/cache/conftool/dbconfig/20240728-190028-marostegui.json
18:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P66977 and previous config saved to /var/cache/conftool/dbconfig/20240728-184521-marostegui.json
18:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P66976 and previous config saved to /var/cache/conftool/dbconfig/20240728-183013-marostegui.json
18:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T367856)', diff saved to https://phabricator.wikimedia.org/P66975 and previous config saved to /var/cache/conftool/dbconfig/20240728-181506-marostegui.json
04:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T367856)', diff saved to https://phabricator.wikimedia.org/P66974 and previous config saved to /var/cache/conftool/dbconfig/20240728-044200-marostegui.json
04:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2212.codfw.wmnet with reason: Maintenance
04:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2212.codfw.wmnet with reason: Maintenance
04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T367856)', diff saved to https://phabricator.wikimedia.org/P66973 and previous config saved to /var/cache/conftool/dbconfig/20240728-042021-marostegui.json
04:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
04:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T367856)', diff saved to https://phabricator.wikimedia.org/P66972 and previous config saved to /var/cache/conftool/dbconfig/20240728-042000-marostegui.json
04:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P66971 and previous config saved to /var/cache/conftool/dbconfig/20240728-040453-marostegui.json
03:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P66970 and previous config saved to /var/cache/conftool/dbconfig/20240728-034946-marostegui.json
03:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T367856)', diff saved to https://phabricator.wikimedia.org/P66969 and previous config saved to /var/cache/conftool/dbconfig/20240728-033440-marostegui.json

2024-07-27

13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T367856)', diff saved to https://phabricator.wikimedia.org/P66968 and previous config saved to /var/cache/conftool/dbconfig/20240727-135859-marostegui.json
13:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
13:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
13:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T367856)', diff saved to https://phabricator.wikimedia.org/P66967 and previous config saved to /var/cache/conftool/dbconfig/20240727-135838-marostegui.json
13:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P66966 and previous config saved to /var/cache/conftool/dbconfig/20240727-134331-marostegui.json
13:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P66965 and previous config saved to /var/cache/conftool/dbconfig/20240727-132824-marostegui.json
13:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T367856)', diff saved to https://phabricator.wikimedia.org/P66964 and previous config saved to /var/cache/conftool/dbconfig/20240727-131316-marostegui.json
11:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P66963 and previous config saved to /var/cache/conftool/dbconfig/20240727-113018-ladsgroup.json
11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P66962 and previous config saved to /var/cache/conftool/dbconfig/20240727-111512-ladsgroup.json
11:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P66961 and previous config saved to /var/cache/conftool/dbconfig/20240727-110007-ladsgroup.json
10:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P66960 and previous config saved to /var/cache/conftool/dbconfig/20240727-104502-ladsgroup.json
10:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1246.eqiad.wmnet with reason: Sad
10:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1246.eqiad.wmnet with reason: Sad
10:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db1246, paged', diff saved to https://phabricator.wikimedia.org/P66959 and previous config saved to /var/cache/conftool/dbconfig/20240727-100533-ladsgroup.json
07:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2202.codfw.wmnet with reason: Maintenance
07:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2202.codfw.wmnet with reason: Maintenance
07:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T367856)', diff saved to https://phabricator.wikimedia.org/P66958 and previous config saved to /var/cache/conftool/dbconfig/20240727-070839-marostegui.json
06:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P66957 and previous config saved to /var/cache/conftool/dbconfig/20240727-065332-marostegui.json
06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P66956 and previous config saved to /var/cache/conftool/dbconfig/20240727-063824-marostegui.json
06:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T367856)', diff saved to https://phabricator.wikimedia.org/P66955 and previous config saved to /var/cache/conftool/dbconfig/20240727-062317-marostegui.json
01:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2234.codfw.wmnet with OS bookworm
01:54 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:26 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
01:13 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2233.codfw.wmnet with OS bookworm
01:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2234.codfw.wmnet with reason: host reimage
01:07 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2234.codfw.wmnet with reason: host reimage
01:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2233.codfw.wmnet with OS bookworm
00:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2234.codfw.wmnet with OS bookworm
00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2235.codfw.wmnet with OS bookworm
00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2236.codfw.wmnet with OS bookworm
00:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:42 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2235.codfw.wmnet with reason: host reimage
00:24 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2235.codfw.wmnet with reason: host reimage
00:20 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T352010)', diff saved to https://phabricator.wikimedia.org/P66954 and previous config saved to /var/cache/conftool/dbconfig/20240727-002016-ladsgroup.json
00:10 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2235.codfw.wmnet with OS bookworm
00:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2237.codfw.wmnet with OS bookworm
00:08 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P66953 and previous config saved to /var/cache/conftool/dbconfig/20240727-000509-ladsgroup.json
00:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2236.codfw.wmnet with reason: host reimage
00:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2236.codfw.wmnet with reason: host reimage
00:01 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"

2024-07-26

23:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P66952 and previous config saved to /var/cache/conftool/dbconfig/20240726-235001-ladsgroup.json
23:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2236.codfw.wmnet with OS bookworm
23:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2237.codfw.wmnet with reason: host reimage
23:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2237.codfw.wmnet with reason: host reimage
23:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2238.codfw.wmnet with OS bookworm
23:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
23:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T367856)', diff saved to https://phabricator.wikimedia.org/P66951 and previous config saved to /var/cache/conftool/dbconfig/20240726-233648-marostegui.json
23:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
23:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
23:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
23:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
23:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T367856)', diff saved to https://phabricator.wikimedia.org/P66950 and previous config saved to /var/cache/conftool/dbconfig/20240726-233619-marostegui.json
23:36 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
23:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T352010)', diff saved to https://phabricator.wikimedia.org/P66949 and previous config saved to /var/cache/conftool/dbconfig/20240726-233454-ladsgroup.json
23:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2237.codfw.wmnet with OS bookworm
23:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P66948 and previous config saved to /var/cache/conftool/dbconfig/20240726-232112-marostegui.json
23:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2238.codfw.wmnet with reason: host reimage
23:15 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2238.codfw.wmnet with reason: host reimage
23:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2239.codfw.wmnet with OS bookworm
23:10 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
23:09 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
23:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P66947 and previous config saved to /var/cache/conftool/dbconfig/20240726-230605-marostegui.json
23:02 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2238.codfw.wmnet with OS bookworm
22:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2239.codfw.wmnet with reason: host reimage
22:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T367856)', diff saved to https://phabricator.wikimedia.org/P66946 and previous config saved to /var/cache/conftool/dbconfig/20240726-225058-marostegui.json
22:50 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2239.codfw.wmnet with reason: host reimage
22:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2239.codfw.wmnet with OS bookworm
22:35 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2239.codfw.wmnet with OS bookworm
20:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2239.codfw.wmnet with OS bookworm
18:52 mutante: [deploy1002:~] $ echo 'https://sep11.wikipedia.org' | mwscript purgeList.php --wiki=aawiki - T367014
18:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon1005.eqiad.wmnet with OS bullseye
18:02 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:57 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon1006.eqiad.wmnet with OS bullseye
17:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:53 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon1005.eqiad.wmnet with reason: host reimage
17:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon1006.eqiad.wmnet with reason: host reimage
17:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon1005.eqiad.wmnet with reason: host reimage
17:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon1006.eqiad.wmnet with reason: host reimage
17:33 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1005.eqiad.wmnet with OS bullseye
17:33 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1006.eqiad.wmnet with OS bullseye
17:16 cjming@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
17:16 cjming@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
16:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2239.mgmt.codfw.wmnet with reboot policy FORCED
16:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2238.mgmt.codfw.wmnet with reboot policy FORCED
16:52 cjming@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2237.mgmt.codfw.wmnet with reboot policy FORCED
16:52 cjming@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
16:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2236.mgmt.codfw.wmnet with reboot policy FORCED
16:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2235.mgmt.codfw.wmnet with reboot policy FORCED
16:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2234.mgmt.codfw.wmnet with reboot policy FORCED
16:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2233.mgmt.codfw.wmnet with reboot policy FORCED
16:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2232.mgmt.codfw.wmnet with reboot policy FORCED
16:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2231.mgmt.codfw.wmnet with reboot policy FORCED
16:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2239.mgmt.codfw.wmnet with reboot policy FORCED
16:42 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2238.mgmt.codfw.wmnet with reboot policy FORCED
16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2230.mgmt.codfw.wmnet with reboot policy FORCED
16:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2237.mgmt.codfw.wmnet with reboot policy FORCED
16:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2236.mgmt.codfw.wmnet with reboot policy FORCED
16:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2235.mgmt.codfw.wmnet with reboot policy FORCED
16:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2229.mgmt.codfw.wmnet with reboot policy FORCED
16:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2234.mgmt.codfw.wmnet with reboot policy FORCED
16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2228.mgmt.codfw.wmnet with reboot policy FORCED
16:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2233.mgmt.codfw.wmnet with reboot policy FORCED
16:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2232.mgmt.codfw.wmnet with reboot policy FORCED
16:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2231.mgmt.codfw.wmnet with reboot policy FORCED
16:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2230.mgmt.codfw.wmnet with reboot policy FORCED
16:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2229.mgmt.codfw.wmnet with reboot policy FORCED
16:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2228.mgmt.codfw.wmnet with reboot policy FORCED
16:24 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2229 to codfw - jhancock@cumin2002"
16:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2229 to codfw - jhancock@cumin2002"
16:20 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:55 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@845502d]: (no justification provided) (duration: 00m 37s)
15:55 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@845502d]: (no justification provided)
15:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1163 (T352010)', diff saved to https://phabricator.wikimedia.org/P66945 and previous config saved to /var/cache/conftool/dbconfig/20240726-153145-ladsgroup.json
15:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
15:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
15:12 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2227.mgmt.codfw.wmnet with reboot policy FORCED
14:53 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2227.mgmt.codfw.wmnet with reboot policy FORCED
14:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2227 to codfw - jhancock@cumin2002"
14:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2227 to codfw - jhancock@cumin2002"
14:48 jhancock@cumin2002: START - Cookbook sre.dns.netbox
14:42 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
14:42 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2226']
14:41 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2226']
14:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2226']
14:41 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2226']
14:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2240.codfw.wmnet with OS bookworm
14:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
14:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
14:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2226.mgmt.codfw.wmnet with reboot policy FORCED
14:07 dcausse@deploy1002: Finished deploy [airflow-dags/search@fb00e94]: search: process_sparql_query_hourly tune the number of partitions to prevent OOM (duration: 00m 21s)
14:07 dcausse@deploy1002: Started deploy [airflow-dags/search@fb00e94]: search: process_sparql_query_hourly tune the number of partitions to prevent OOM
14:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2240.codfw.wmnet with reason: host reimage
14:03 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2240.codfw.wmnet with reason: host reimage
13:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2226.mgmt.codfw.wmnet with reboot policy FORCED
13:56 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2226 to codfw - jhancock@cumin2002"
13:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2226 to codfw - jhancock@cumin2002"
13:52 jhancock@cumin2002: START - Cookbook sre.dns.netbox
13:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2240.codfw.wmnet with OS bookworm
13:42 elukey: move dump_cloud_ip_ranges's write to /srv/private capabilities back to puppetmaster1001 - T368023
13:23 dcausse@deploy1002: Finished deploy [airflow-dags/search@d09039f]: search: fix drop dailies and bump discolytics to fix numpy & pyarrow version conflict (duration: 00m 45s)
13:23 dcausse@deploy1002: Started deploy [airflow-dags/search@d09039f]: search: fix drop dailies and bump discolytics to fix numpy & pyarrow version conflict
13:19 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
13:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
12:58 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
12:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1006.eqiad.wmnet with OS bullseye
12:42 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1005.eqiad.wmnet with OS bullseye
12:42 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
12:00 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm
11:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1005.eqiad.wmnet with OS bullseye
11:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1006.eqiad.wmnet with OS bullseye
11:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
11:48 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
11:45 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
11:05 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
10:40 akosiaris@deploy1003: Synchronized .mailmap: Testing a noop deploy from deploy1003 (duration: 20m 28s)
10:03 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye
10:00 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
10:00 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
09:38 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1073.eqiad.wmnet
09:35 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host analytics1073.eqiad.wmnet
09:33 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1072.eqiad.wmnet
09:27 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host analytics1072.eqiad.wmnet
09:21 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: sync
09:21 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: sync
09:21 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/machinetranslation: sync
09:21 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/machinetranslation: sync
09:21 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: sync
09:16 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
09:10 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
09:09 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/recommendation-api: sync
09:09 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
09:09 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
09:09 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
09:09 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
09:06 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: sync
09:06 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
09:06 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: sync
09:06 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: sync
09:06 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
09:06 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/echostore: sync
09:06 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: sync
09:06 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
09:06 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: sync
09:06 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: sync
09:06 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: sync
09:05 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/machinetranslation: sync
09:02 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync
09:02 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/linkrecommendation: sync
09:02 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: sync
09:01 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/linkrecommendation: sync
09:01 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: sync
09:01 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/linkrecommendation: sync
08:56 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: sync
08:55 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: sync
08:55 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: sync
08:55 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: sync
08:55 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: sync
08:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T367856)', diff saved to https://phabricator.wikimedia.org/P66942 and previous config saved to /var/cache/conftool/dbconfig/20240726-085529-marostegui.json
08:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2188.codfw.wmnet with reason: Maintenance
08:55 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: sync
08:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2188.codfw.wmnet with reason: Maintenance
08:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T367856)', diff saved to https://phabricator.wikimedia.org/P66941 and previous config saved to /var/cache/conftool/dbconfig/20240726-085507-marostegui.json
08:52 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
08:52 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
08:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P66940 and previous config saved to /var/cache/conftool/dbconfig/20240726-083959-marostegui.json
08:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
08:32 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
08:25 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
08:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P66939 and previous config saved to /var/cache/conftool/dbconfig/20240726-082452-marostegui.json
08:18 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
08:18 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
08:16 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
08:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T367856)', diff saved to https://phabricator.wikimedia.org/P66938 and previous config saved to /var/cache/conftool/dbconfig/20240726-080945-marostegui.json
07:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T367856)', diff saved to https://phabricator.wikimedia.org/P66937 and previous config saved to /var/cache/conftool/dbconfig/20240726-074330-marostegui.json
07:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
07:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
07:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T367856)', diff saved to https://phabricator.wikimedia.org/P66936 and previous config saved to /var/cache/conftool/dbconfig/20240726-074308-marostegui.json
07:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P66935 and previous config saved to /var/cache/conftool/dbconfig/20240726-072801-marostegui.json
07:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P66934 and previous config saved to /var/cache/conftool/dbconfig/20240726-071254-marostegui.json
06:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T367856)', diff saved to https://phabricator.wikimedia.org/P66933 and previous config saved to /var/cache/conftool/dbconfig/20240726-065747-marostegui.json
06:56 XioNoX: continue rolling out "LVS-and-NS-service-ips" prefix-list rename to network device
00:47 ladsgroup@deploy1002: Finished scap: Backport for Update UI classes and CSS for review notices (T191156), Add CSS class to watchlist pending notice (T191156) (duration: 09m 49s)
00:42 ladsgroup@deploy1002: ladsgroup: Continuing with sync
00:40 ladsgroup@deploy1002: ladsgroup: Backport for Update UI classes and CSS for review notices (T191156), Add CSS class to watchlist pending notice (T191156) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
00:37 ladsgroup@deploy1002: Started scap sync-world: Backport for Update UI classes and CSS for review notices (T191156), Add CSS class to watchlist pending notice (T191156)

2024-07-25

23:09 ladsgroup@deploy1002: ladsgroup: Continuing with sync
23:05 ladsgroup@deploy1002: ladsgroup: Backport for Add CSS class to watchlist pending notice (T191156) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:03 ladsgroup@deploy1002: Started scap sync-world: Backport for Add CSS class to watchlist pending notice (T191156)
22:56 ladsgroup@deploy1002: Finished scap: Backport for Revert "Use expression builder to avoid IDatabase::makeList" (T371052) (duration: 10m 08s)
22:50 ladsgroup@deploy1002: ladsgroup, umherirrender: Continuing with sync
22:48 ladsgroup@deploy1002: ladsgroup, umherirrender: Backport for Revert "Use expression builder to avoid IDatabase::makeList" (T371052) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:46 ladsgroup@deploy1002: Started scap sync-world: Backport for Revert "Use expression builder to avoid IDatabase::makeList" (T371052)
22:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2240.mgmt.codfw.wmnet with reboot policy FORCED
22:10 eoghan@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade for T370973
22:04 eoghan@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade for T370973
22:04 eoghan@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade for T370973
22:03 eoghan@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade for T370973
22:00 eoghan@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade for T370973
21:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2240.mgmt.codfw.wmnet with reboot policy FORCED
21:55 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2240 to codfw - jhancock@cumin2002"
21:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2240 to codfw - jhancock@cumin2002"
21:54 eoghan@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade for T370973
21:52 jhancock@cumin2002: START - Cookbook sre.dns.netbox
21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2224']
21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2225']
21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2223']
21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2222']
21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2221']
21:36 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2225']
21:36 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2224']
21:36 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2223']
21:36 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2222']
21:35 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2221']
21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2225']
21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2224']
21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2223']
21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2222']
21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2221']
21:35 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2225']
21:35 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2224']
21:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2223']
21:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2222']
21:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2221']
21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2225.mgmt.codfw.wmnet with reboot policy FORCED
21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2222.mgmt.codfw.wmnet with reboot policy FORCED
21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2223.mgmt.codfw.wmnet with reboot policy FORCED
21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2221.mgmt.codfw.wmnet with reboot policy FORCED
21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2224.mgmt.codfw.wmnet with reboot policy FORCED
21:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2225.mgmt.codfw.wmnet with reboot policy FORCED
21:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2222.mgmt.codfw.wmnet with reboot policy FORCED
21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2225.mgmt.codfw.wmnet with reboot policy FORCED
21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2222.mgmt.codfw.wmnet with reboot policy FORCED
21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2225.mgmt.codfw.wmnet with reboot policy FORCED
21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2224.mgmt.codfw.wmnet with reboot policy FORCED
21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2223.mgmt.codfw.wmnet with reboot policy FORCED
21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2222.mgmt.codfw.wmnet with reboot policy FORCED
21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2221.mgmt.codfw.wmnet with reboot policy FORCED
21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2221 to codfw - jhancock@cumin2002"
21:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2221 to codfw - jhancock@cumin2002"
21:14 jhancock@cumin2002: START - Cookbook sre.dns.netbox
19:16 cstone: payments-wiki upgraded from a37746fe to 91624a2e
19:12 pt1979@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
19:12 pt1979@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1002"
18:59 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
18:26 pt1979@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1002"
18:12 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.15 refs T366960
18:10 pt1979@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage
18:07 pt1979@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage
18:05 pt1979@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
17:56 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
17:56 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
17:32 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
17:20 swfrench@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1032.eqiad.wmnet),cluster=kubernetes,service=kubesvc [reason: T351074 - pooling after reimage]
17:08 swfrench@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1032.eqiad.wmnet with OS bullseye
17:06 swfrench-wmf: running homer 'cr*eqiad*' commit 'T351074' for k8s worker reimage
17:03 ebernhardson@deploy1002: Finished deploy [airflow-dags/search@b1a04fc]: bump discolytics to 0.25 (duration: 00m 25s)
17:03 ebernhardson@deploy1002: Started deploy [airflow-dags/search@b1a04fc]: bump discolytics to 0.25
16:48 swfrench@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1032.eqiad.wmnet with reason: host reimage
16:46 ebernhardson@deploy1002: Finished deploy [airflow-dags/search@8c8f4c2]: Add new fields to search_satisfaction metrics (duration: 00m 19s)
16:46 ebernhardson@deploy1002: Started deploy [airflow-dags/search@8c8f4c2]: Add new fields to search_satisfaction metrics
16:45 swfrench@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1032.eqiad.wmnet with reason: host reimage
16:45 pt1979@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
16:30 swfrench@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1032.eqiad.wmnet with OS bullseye
16:29 swfrench@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1032.eqiad.wmnet on all recursors
16:29 swfrench@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1032.eqiad.wmnet on all recursors
16:27 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
16:27 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
16:25 swfrench@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1364 to wikikube-worker1032
16:24 swfrench@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1032
16:24 swfrench@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1032
16:23 swfrench@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:23 swfrench@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1364 to wikikube-worker1032 - swfrench@cumin1002"
16:21 swfrench@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1364 to wikikube-worker1032 - swfrench@cumin1002"
16:18 swfrench@cumin1002: START - Cookbook sre.dns.netbox
16:18 swfrench@cumin1002: START - Cookbook sre.hosts.rename from mw1364 to wikikube-worker1032
16:17 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
16:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:07 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
15:15 elukey: upgrade spicerack to 8.9.0 on cumin nodes
15:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T367856)', diff saved to https://phabricator.wikimedia.org/P66930 and previous config saved to /var/cache/conftool/dbconfig/20240725-150739-marostegui.json
15:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
15:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
15:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T367856)', diff saved to https://phabricator.wikimedia.org/P66929 and previous config saved to /var/cache/conftool/dbconfig/20240725-150717-marostegui.json
14:53 elukey: uploaded spicerack_8.9.0 to apt.wikimedia.org bullseye-wikimedia
14:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P66928 and previous config saved to /var/cache/conftool/dbconfig/20240725-145210-marostegui.json
14:51 sukhe: running authdns-update after dns4003 depool
14:48 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns4003.wikimedia.org [reason: finished upgrading anycast-hc: T370068]
14:46 sukhe: [dns4003] upgrade anycast-healthchecker to 0.9.8-1+wmf12u2: T370068
14:44 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns4003.wikimedia.org [reason: upgrading anycast-hc: T370068]
14:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P66926 and previous config saved to /var/cache/conftool/dbconfig/20240725-143703-marostegui.json
14:36 dcausse@deploy1002: Finished deploy [airflow-dags/search@87b91b6]: search: drop hourly weighted_tags support (duration: 00m 20s)
14:36 dcausse@deploy1002: Started deploy [airflow-dags/search@87b91b6]: search: drop hourly weighted_tags support
14:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T367856)', diff saved to https://phabricator.wikimedia.org/P66925 and previous config saved to /var/cache/conftool/dbconfig/20240725-142155-marostegui.json
14:19 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: sync
14:12 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: sync
14:12 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: sync
14:04 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
14:04 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/recommendation-api: sync
14:04 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
14:03 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
14:03 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
14:03 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
13:57 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: sync
13:57 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: sync
13:53 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: sync
13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
13:52 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
13:52 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
13:48 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
13:48 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/echostore: apply
13:48 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
13:48 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/echostore: apply
13:48 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/echostore: apply
13:48 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/echostore: apply
13:47 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
13:45 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
13:45 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
13:45 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
13:45 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
13:45 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
13:43 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
13:43 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
13:43 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
13:43 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
13:42 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
13:42 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
13:41 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=kubernetes1051.eqiad.wmnet,cluster=kubernetes,service=kubesvc [reason: Uncordoning kubernetes1051 for missed upgrades - T369011]
13:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1051.eqiad.wmnet
13:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host pc1017.eqiad.wmnet with OS bookworm
13:32 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubernetes1051.eqiad.wmnet
13:30 Lucas_WMDE: UTC afternoon backport+config window done
13:30 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=kubernetes1051.eqiad.wmnet,cluster=kubernetes,service=kubesvc [reason: Cordoning kubernetes1051 for missed upgrades - T369011]
13:30 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Add wikibase client interaction stream (T370045) (duration: 07m 56s)
13:25 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, joelyrookewmde: Continuing with sync
13:24 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, joelyrookewmde: Backport for Add wikibase client interaction stream (T370045) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Add wikibase client interaction stream (T370045)
13:18 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Enable optional MathJax rendering in everywhere (T370507) (duration: 09m 57s)
13:17 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
13:15 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
13:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, physikerwelt: Continuing with sync
13:12 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, physikerwelt: Backport for Enable optional MathJax rendering in everywhere (T370507) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Enable optional MathJax rendering in everywhere (T370507)
13:01 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
12:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host pc1017.eqiad.wmnet with OS bookworm
12:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
12:42 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
12:42 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
12:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
12:29 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
12:28 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
12:28 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
12:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
12:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
12:26 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
12:26 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
12:25 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
12:25 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
12:24 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
12:24 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
12:23 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
12:23 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
12:22 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
12:22 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
12:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
12:20 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
12:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
12:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
12:18 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
12:18 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
12:17 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
12:17 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
12:16 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
12:16 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
12:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
12:15 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
12:15 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
12:14 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
12:13 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
12:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
12:12 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
12:12 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
12:12 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
12:08 cgoubert@deploy1002: sync-world aborted: Deploying mpic envoy listener - 1056163 - T366234 (duration: 17m 59s)
11:59 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
11:53 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
11:51 cgoubert@deploy1002: Started scap sync-world: Deploying mpic envoy listener - 1056163 - T366234
11:45 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
11:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
11:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
10:42 elukey: upload docker-report 0.0.15 to bullseye-wikimedia and upgrade build2001
10:00 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=kubernetes1051.eqiad.wmnet,cluster=kubernetes,service=kubesvc [reason: Uncordoning kubernetes1051 - T369011]
09:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:54 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
09:27 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
09:26 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
09:19 elukey: move dump_cloud_ip_ranges from puppetmaster1001 to puppetserver1001 - T368023
07:38 kart_: Updated cxserver to 2024-07-22-050142-production (T363968)
07:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T367856)', diff saved to https://phabricator.wikimedia.org/P66924 and previous config saved to /var/cache/conftool/dbconfig/20240725-073742-marostegui.json
07:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
07:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
07:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T367856)', diff saved to https://phabricator.wikimedia.org/P66923 and previous config saved to /var/cache/conftool/dbconfig/20240725-073720-marostegui.json
07:37 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
07:36 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
07:36 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
07:35 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
07:35 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
07:35 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
07:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P66922 and previous config saved to /var/cache/conftool/dbconfig/20240725-072213-marostegui.json
07:14 XioNoX: add transit BGP session to KPN in esams
07:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P66921 and previous config saved to /var/cache/conftool/dbconfig/20240725-070706-marostegui.json
06:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T367856)', diff saved to https://phabricator.wikimedia.org/P66920 and previous config saved to /var/cache/conftool/dbconfig/20240725-065159-marostegui.json
00:43 zabe@deploy1002: Finished scap: Backport for Further configs for cswikivoyage (T370913) (duration: 08m 22s)
00:39 zabe@deploy1002: zabe: Continuing with sync
00:37 zabe@deploy1002: zabe: Backport for Further configs for cswikivoyage (T370913) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
00:35 zabe@deploy1002: Started scap sync-world: Backport for Further configs for cswikivoyage (T370913)
00:11 eileen: civicrm upgraded from c656ab2f to 1dc4f944
00:00 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
00:00 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
00:00 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply

2024-07-24

23:59 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
23:59 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
23:59 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
23:20 zabe@deploy1002: Finished scap: update interwiki cache (duration: 08m 25s)
23:11 zabe@deploy1002: Started scap sync-world: update interwiki cache
23:09 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=cswikivoyage --cluster=all 2>&1 | tee /tmp/cswikivoyage.UpdateSearchIndexConfig.log # T370905
23:08 zabe@deploy1002: Finished scap: T370905 (duration: 09m 14s)
23:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1160 (T367856)', diff saved to https://phabricator.wikimedia.org/P66919 and previous config saved to /var/cache/conftool/dbconfig/20240724-230209-marostegui.json
23:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
23:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
22:59 zabe@deploy1002: Started scap sync-world: T370905
22:59 zabe: Create Wikivoyage Czech # T370905
22:42 ejegg: re-enabled Adyen job runner
22:41 ejegg: SmashPig upgraded from f2aca230 to 1b2d9a6e across all frack servers
22:34 ejegg: SmashPig upgraded from f2aca230 to 1b2d9a6e on frpig1002 only
22:34 ejegg: SmashPig upgraded from f2aca230 to 1b2d9a6e on frpig2001 only
22:33 ejegg: disabled Adyen job runner
21:59 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1021.eqiad.wmnet
21:59 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1020.eqiad.wmnet
21:58 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1019.eqiad.wmnet
21:58 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1018.eqiad.wmnet
21:55 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1021.eqiad.wmnet
21:55 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1020.eqiad.wmnet
21:55 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1019.eqiad.wmnet
21:55 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1018.eqiad.wmnet
21:54 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on wdqs[1018-1021].eqiad.wmnet with reason: T366555 security
21:54 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on wdqs[1018-1021].eqiad.wmnet with reason: T366555 security
21:51 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1014.eqiad.wmnet
21:50 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1015.eqiad.wmnet
21:47 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1014.eqiad.wmnet
21:47 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1015.eqiad.wmnet
21:47 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on wdqs[1014-1015].eqiad.wmnet with reason: T366555 security
21:47 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on wdqs[1014-1015].eqiad.wmnet with reason: T366555 security
21:46 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2007.codfw.wmnet
21:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2010.codfw.wmnet
21:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2011.codfw.wmnet
21:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1012.eqiad.wmnet
21:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2012.codfw.wmnet
21:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2009.codfw.wmnet
21:44 ryankemper@cumin2002: END (PASS) - Cookbook sre.apifeatureusage.roll-restart-reboot-logstash (exit_code=0) rolling reboot on A:apifeatureusage
21:42 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2012.codfw.wmnet
21:42 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2011.codfw.wmnet
21:42 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2010.codfw.wmnet
21:42 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2009.codfw.wmnet
21:42 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2007.codfw.wmnet
21:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1013.eqiad.wmnet
21:41 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on wdqs[2007,2009-2012].codfw.wmnet with reason: T366555 security
21:40 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on wdqs[2007,2009-2012].codfw.wmnet with reason: T366555 security
21:38 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1013.eqiad.wmnet
21:38 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1012.eqiad.wmnet
21:38 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on wdqs[1012-1013].eqiad.wmnet with reason: T366555 security
21:38 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on wdqs[1012-1013].eqiad.wmnet with reason: T366555 security
21:35 ryankemper@cumin2002: START - Cookbook sre.apifeatureusage.roll-restart-reboot-logstash rolling reboot on A:apifeatureusage
21:32 ebernhardson@deploy1002: Finished scap: Backport for Check the output of RevisionStore::getRevisionById (T370770) (duration: 12m 07s)
21:28 ebernhardson@deploy1002: ebernhardson: Continuing with sync
21:26 ebernhardson@deploy1002: ebernhardson: Backport for Check the output of RevisionStore::getRevisionById (T370770) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:20 ebernhardson@deploy1002: Started scap sync-world: Backport for Check the output of RevisionStore::getRevisionById (T370770)
21:17 zabe@deploy1002: Finished scap: Backport for Create dark mode launch banner for Vector 2022 (T370303) (duration: 41m 44s)
21:11 zabe@deploy1002: jdrewniak, zabe: Continuing with sync
21:07 zabe@deploy1002: jdrewniak, zabe: Backport for Create dark mode launch banner for Vector 2022 (T370303) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
20:36 zabe@deploy1002: Started scap sync-world: Backport for Create dark mode launch banner for Vector 2022 (T370303)
20:24 sergi0: mwscript extensions/GrowthExperiments/maintenance/migrateCommunityConfig.php --wiki=frwiktionary #T369711
20:23 sergi0: sgimeno@mwmaint1002:~$ mwscript extensions/GrowthExperiments/maintenance/migrateCommunityConfig.php --wiki=dewiki --force
20:18 zabe@deploy1002: Finished scap: Backport for frwiktionary, dewiki: enable CommunityConfiguration (T370261 T369711) (duration: 09m 43s)
20:13 zabe@deploy1002: zabe, sgimeno: Continuing with sync
20:11 zabe@deploy1002: zabe, sgimeno: Backport for frwiktionary, dewiki: enable CommunityConfiguration (T370261 T369711) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:08 zabe@deploy1002: Started scap sync-world: Backport for frwiktionary, dewiki: enable CommunityConfiguration (T370261 T369711)
19:31 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
19:31 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:27 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
19:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
18:10 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.15 refs T366960
17:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
17:21 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
17:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
17:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
17:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
16:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox
16:54 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
16:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
16:50 jhancock@cumin2002: START - Cookbook sre.dns.netbox
16:44 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
16:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
16:38 jhancock@cumin2002: START - Cookbook sre.dns.netbox
16:33 sukhe: sudo cumin -b1 -s120 'O:wikidough' 'systemctl restart anycast-healthchecker.service'
15:43 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
15:42 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
15:30 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1020.eqiad.wmnet
15:24 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs1020.eqiad.wmnet
15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2017.codfw.wmnet with OS bookworm
15:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2017.codfw.wmnet with reason: host reimage
15:07 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3010.esams.wmnet
15:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2017.codfw.wmnet with reason: host reimage
15:04 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pc2017.codfw.wmnet with OS bookworm
15:01 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs3010.esams.wmnet
14:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host pc2017.codfw.wmnet with OS bookworm
14:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:52 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
14:52 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Revert "TranslatablePage: Split translatable page id cache into multiple shards", Revert "TranslatablePage: Split translatable page id cache into multiple shards" (duration: 09m 37s)
14:47 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, trainbranchbot: Continuing with sync
14:44 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, trainbranchbot: Backport for Revert "TranslatablePage: Split translatable page id cache into multiple shards", Revert "TranslatablePage: Split translatable page id cache into multiple shards" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:42 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Revert "TranslatablePage: Split translatable page id cache into multiple shards", Revert "TranslatablePage: Split translatable page id cache into multiple shards"
14:36 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet
14:35 ecarg@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2017.codfw.wmnet with reason: host reimage
14:33 ecarg@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:32 ecarg@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org [reason: finished upgrading anycast-hc: T370068]
14:31 ecarg@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:30 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet
14:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2017.codfw.wmnet with reason: host reimage
14:29 ecarg@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:28 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5006.eqsin.wmnet
14:27 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org [reason: upgrading anycast-hc: T370068]
14:27 ecarg@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:26 ecarg@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:26 ecarg@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:26 kamila@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
14:25 kamila@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
14:25 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
14:24 kamila@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
14:24 kamila@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
14:24 sukhe: upgrade O:durum to anycast-hc 0.9.8-1+wmf12u2
14:22 kamila@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
14:22 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs5006.eqsin.wmnet
14:20 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6003.drmrs.wmnet
14:20 ecarg@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:19 ecarg@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:19 ecarg@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:18 sukhe: disable puppet on O:durum
14:18 ecarg@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:16 ecarg@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:15 ecarg@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:14 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs6003.drmrs.wmnet
14:10 Lucas_WMDE: UTC afternoon backport+config window done
14:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455) (duration: 11m 21s)
14:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 abi, lucaswerkmeister-wmde: Continuing with sync
14:00 logmsgbot: lucaswerkmeister-wmde@deploy1002 abi, lucaswerkmeister-wmde: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pc2017.codfw.wmnet with OS bookworm
13:58 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455)
13:57 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455) (duration: 10m 21s)
13:52 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, abi: Continuing with sync
13:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
13:49 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, abi: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:48 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
13:46 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455)
13:37 godog: silence OtelCollectorRefusedSpans in codfw for 7d - T370043
13:35 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
13:28 sukhe: reprepro -C main include bookworm-wikimedia anycast-healthchecker_0.9.8-1+wmf12u2_amd64.changes: T370068
13:25 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for knwikisource: Enable local uploads (T370765) (duration: 10m 14s)
13:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, anzx: Continuing with sync
13:18 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, anzx: Backport for knwikisource: Enable local uploads (T370765) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:15 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for knwikisource: Enable local uploads (T370765)
13:14 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1017.eqiad.wmnet with OS bookworm
12:39 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host deploy1003.eqiad.wmnet with OS bullseye
12:31 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@24f95a8]: (no justification provided) (duration: 00m 30s)
12:31 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@24f95a8]: (no justification provided)
11:11 dreamyjazz@deploy1002: Finished scap: Backport for Remove now unused $wgGlobalBlockingDatabase definition (T370856) (duration: 07m 27s)
11:06 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
11:06 dreamyjazz@deploy1002: dreamyjazz: Backport for Remove now unused $wgGlobalBlockingDatabase definition (T370856) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:03 dreamyjazz@deploy1002: Started scap sync-world: Backport for Remove now unused $wgGlobalBlockingDatabase definition (T370856)
11:00 jiji@deploy1002: Finished scap: Noop, bumping mediawiki chart version (duration: 02m 32s)
10:57 jiji@deploy1002: Started scap sync-world: Noop, bumping mediawiki chart version
10:54 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
10:54 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
10:54 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
10:53 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
10:33 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
10:28 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
10:16 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bullseye
10:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on 16 hosts with reason: Legacy appserver spindown
10:15 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on 16 hosts with reason: Legacy appserver spindown
06:54 XioNoX: deploy CR1056198 Rename LVS-service-IPs prefix-list
06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P66908 and previous config saved to /var/cache/conftool/dbconfig/20240724-060142-marostegui.json
05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P66907 and previous config saved to /var/cache/conftool/dbconfig/20240724-054635-marostegui.json
05:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
05:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T367856)', diff saved to https://phabricator.wikimedia.org/P66906 and previous config saved to /var/cache/conftool/dbconfig/20240724-053128-marostegui.json
05:12 akosiaris@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host deploy1003.eqiad.wmnet with OS bullseye
01:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc2017.codfw.wmnet with OS bookworm
00:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm

2024-07-23

23:58 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1017.eqiad.wmnet with OS bookworm
23:54 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pc2017.codfw.wmnet with OS bookworm
23:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pc2017']
23:43 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc2017']
23:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['pc2017']
23:42 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc2017']
23:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc2017.mgmt.codfw.wmnet with reboot policy FORCED
23:23 eileen: civicrm upgraded from 4247715d to c656ab2f
23:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host pc2017.mgmt.codfw.wmnet with reboot policy FORCED
23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding pc2017 to codfw - jhancock@cumin2002"
23:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding pc2017 to codfw - jhancock@cumin2002"
23:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox
23:11 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
23:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1017.mgmt.eqiad.wmnet with reboot policy FORCED
22:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host pc1017.mgmt.eqiad.wmnet with reboot policy FORCED
22:56 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:54 jclark@cumin1002: START - Cookbook sre.dns.netbox
22:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P66905 and previous config saved to /var/cache/conftool/dbconfig/20240723-223855-ladsgroup.json
22:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P66904 and previous config saved to /var/cache/conftool/dbconfig/20240723-223826-ladsgroup.json
22:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P66903 and previous config saved to /var/cache/conftool/dbconfig/20240723-223742-ladsgroup.json
22:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P66902 and previous config saved to /var/cache/conftool/dbconfig/20240723-222349-ladsgroup.json
22:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P66901 and previous config saved to /var/cache/conftool/dbconfig/20240723-222320-ladsgroup.json
22:22 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
22:22 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
22:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P66900 and previous config saved to /var/cache/conftool/dbconfig/20240723-222236-ladsgroup.json
22:08 jclark@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
22:08 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt pc1017 - jclark@cumin1002"
22:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P66899 and previous config saved to /var/cache/conftool/dbconfig/20240723-220844-ladsgroup.json
22:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P66898 and previous config saved to /var/cache/conftool/dbconfig/20240723-220815-ladsgroup.json
22:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P66897 and previous config saved to /var/cache/conftool/dbconfig/20240723-220731-ladsgroup.json
22:07 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt pc1017 - jclark@cumin1002"
22:03 jclark@cumin1002: START - Cookbook sre.dns.netbox
21:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P66896 and previous config saved to /var/cache/conftool/dbconfig/20240723-215338-ladsgroup.json
21:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P66895 and previous config saved to /var/cache/conftool/dbconfig/20240723-215309-ladsgroup.json
21:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P66894 and previous config saved to /var/cache/conftool/dbconfig/20240723-215225-ladsgroup.json
away: UTC late deploys done
20:53 tgr@deploy1002: Finished scap: Backport for Respect wgTranslateNumerals in Cite footnote markers (T370585), Respect wgTranslateNumerals in Cite footnote markers (T370585) (duration: 09m 34s)
20:48 tgr@deploy1002: wmde-fisch, tgr: Continuing with sync
20:46 tgr@deploy1002: wmde-fisch, tgr: Backport for Respect wgTranslateNumerals in Cite footnote markers (T370585), Respect wgTranslateNumerals in Cite footnote markers (T370585) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:44 tgr@deploy1002: Started scap sync-world: Backport for Respect wgTranslateNumerals in Cite footnote markers (T370585), Respect wgTranslateNumerals in Cite footnote markers (T370585)
20:38 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
20:38 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
20:22 tgr@deploy1002: Finished scap: Backport for debug: Enable Special:WikimediaDebug (T350094) (duration: 09m 28s)
20:16 tgr@deploy1002: tgr: Continuing with sync
20:14 tgr@deploy1002: tgr: Backport for debug: Enable Special:WikimediaDebug (T350094) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:12 tgr@deploy1002: Started scap sync-world: Backport for debug: Enable Special:WikimediaDebug (T350094)
18:59 milimetric@deploy1002: Finished deploy [airflow-dags/analytics@01e1952]: (no justification provided) (duration: 00m 30s)
18:58 milimetric@deploy1002: Started deploy [airflow-dags/analytics@01e1952]: (no justification provided)
18:45 mutante: puppetmaster1001/puppetmaster2001 - rm /var/run/confd-template/*.err to clear pybal icinga alerts after T367949
18:42 mutante: puppetmaster1001/puppetmaster2001 - rm /var/run/confd-template/_srv_config-master_pybal_codfw_api-https.err to clear pybal icinga alerts after T367949
18:40 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
18:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.15 refs T366960
18:13 swfrench-wmf: sudo cumin 'A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad' 'ipvsadm --delete-service --tcp-service 10.2.2.1:443' (appservers-https eqiad) - T367949
18:12 aokoth@cumin1002: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1001.eqiad.wmnet
18:11 swfrench-wmf: sudo cumin 'A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad' 'ipvsadm --delete-service --tcp-service 10.2.2.22:443' (api-https eqiad) - T367949
18:11 swfrench-wmf: sudo cumin 'A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad' 'ipvsa
18:10 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet
18:10 swfrench-wmf: sudo cumin 'A:lvs-secondary-codfw or A:lvs-low-traffic-codfw' 'ipvsa
18:08 swfrench-wmf: sudo cumin 'A:lvs-secondary-codfw or A:lvs-low-traffic-codfw' 'ipvsa
18:01 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=99) on VRTS host vrts1001.eqiad.wmnet
18:01 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet
17:58 swfrench-wmf: sudo cumin 'A:lvs-low-traffic-eqiad' 'systemctl restart pybal.service' - T367949
17:51 swfrench-wmf: sudo cumin 'A:lvs-secondary-eqiad' 'systemctl restart pybal.service' - T367949
17:46 logmsgbot: nshahquinn-wmf@deploy1002 Finished deploy [airflow-dags/analytics_product@ebd9e13]: (no justification provided) (duration: 00m 07s)
17:46 logmsgbot: nshahquinn-wmf@deploy1002 Started deploy [airflow-dags/analytics_product@ebd9e13]: (no justification provided)
17:44 swfrench-wmf: sudo cumin 'A:lvs-low-traffic-codfw' 'systemctl restart pybal.service' - T367949
17:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2014.codfw.wmnet
17:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2014.codfw.wmnet
17:40 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T367949)
17:37 pt1979@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
17:33 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T367949)
17:28 swfrench-wmf: run-puppet-agent on O:lvs::balancer to pick up switch to service_setup, removal of profile::lvs::realserver::pools - T367949
17:17 swfrench-wmf: run-puppet-agent on A:dnsbox to pick up switch to lvs_setup - T367949
17:06 swfrench-wmf: ran authdns-update on dns1004 to pick up removal of appservers / api records - T367949
17:04 dancy@deploy1002: sync-world aborted: testing (duration: 00m 51s)
17:03 dancy@deploy1002: Started scap sync-world: testing
17:02 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
16:59 jhathaway: applying varnish change on cp4037, 1030591
16:58 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
16:57 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
16:16 pt1979@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
16:14 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephmon1004.eqiad.wmnet
16:07 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
16:07 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
15:52 pt1979@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudcephmon1004.eqiad.wmnet
15:48 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
15:47 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:47 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:24 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=(kubernetes1025|kubernetes1026|kubernetes1052|kubernetes1053|kubernetes1054|kubernetes1055|kubernetes1056|mw1496).eqiad.wmnet,cluster=kubernetes,service=kubesvc [reason: Uncordoning following T365998]
15:24 Emperor: moss-be1003 out of maintenance mode after network downtime T365998
15:22 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=dse-k8s-worker1008.eqiad.wmnet,cluster=dse-k8s,service=kubesvc
15:22 claime: Uncordoning dse-k8s-worker1008.eqiad.wmnet after T365998
15:20 andrewbogott: find /srv/mediawiki/images/wikitech/archive -type f | xargs delete on wikitech-static, drive is full of nonsense
15:07 brennen@deploy1002: Finished deploy [phabricator/deployment@3902e30]: deploy phab1004 for T370776 (duration: 00m 33s)
15:06 brennen@deploy1002: Started deploy [phabricator/deployment@3902e30]: deploy phab1004 for T370776
15:06 brennen@deploy1002: Finished deploy [phabricator/deployment@3902e30]: deploy phab2002 for T370776 (redux, first deploy a mistaken no-op) (duration: 00m 34s)
15:05 brennen@deploy1002: Started deploy [phabricator/deployment@3902e30]: deploy phab2002 for T370776 (redux, first deploy a mistaken no-op)
15:05 brennen@deploy1002: Finished deploy [phabricator/deployment@7335128]: deploy phab2002 for T370776 (duration: 01m 17s)
15:03 brennen@deploy1002: Started deploy [phabricator/deployment@7335128]: deploy phab2002 for T370776
15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
15:02 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
15:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 25 hosts with reason: JunOS upgrade lsw1-f3-eqiad
15:01 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 25 hosts with reason: JunOS upgrade lsw1-f3-eqiad
15:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-f3-eqiad,lsw1-f3-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f3-eqiad
15:00 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-f3-eqiad,lsw1-f3-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f3-eqiad
15:00 topranks: rebooting lsw1-f3-eqiad to complete JunOS upgrade (T365998)
14:59 XioNoX: deploy CR1055546 border-in: remove authdns filter
14:59 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
14:58 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
14:54 Emperor: moss-be1003 into maintenance mode for network downtime T365998
14:48 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-f3-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f3-eqiad
14:48 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-f3-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f3-eqiad
14:10 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
14:10 ChrisDobbins901_: cdobbins@cumin1002:~$ sudo cumin 'A:cp' 'run-puppet-agent "merging CR #1041705"'
14:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
14:03 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
14:03 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
13:58 Lucas_WMDE: UTC afternoon backport+config window done
13:57 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for MoveLogFormatter::getPreloadTitles: Handle bad titles (T370396) (duration: 09m 24s)
13:52 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
13:51 XioNoX: deploy CR1055544 border-in: remove squid and nrpe filters, expand LVS filter
13:51 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for MoveLogFormatter::getPreloadTitles: Handle bad titles (T370396) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:50 sukhe: running authdns-update after dns6001 depool
13:50 XioNoX: deploy CR1055543: border-in: remove git-ssh term
13:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:49 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bullseye
13:48 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
13:47 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for MoveLogFormatter::getPreloadTitles: Handle bad titles (T370396)
13:44 ChrisDobbins901_: cdobbins@cumin1002:~$ sudo cumin 'A:cp' 'disable-puppet "merging CR #1041705"'
13:43 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
13:40 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org [reason: finished upgrading anycast-hc: T370068]
13:38 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow7001.magru.wmnet
13:37 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org [reason: upgrading anycast-hc: T370068]
13:34 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow7001.magru.wmnet
13:34 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow6001.drmrs.wmnet
13:31 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow5002.eqsin.wmnet
13:30 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow6001.drmrs.wmnet
13:29 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow4002.ulsfo.wmnet
13:24 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [arwiki] Enable the CampaignEvents extension (T370066) (duration: 19m 17s)
13:24 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow5002.eqsin.wmnet
13:23 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow4002.ulsfo.wmnet
13:22 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow3003.esams.wmnet
13:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, daimona: Continuing with sync
13:16 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow3003.esams.wmnet
13:15 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow1002.eqiad.wmnet
13:11 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow1002.eqiad.wmnet
13:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, daimona: Backport for [arwiki] Enable the CampaignEvents extension (T370066) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:05 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=dse-k8s-worker1008.eqiad.wmnet,cluster=dse-k8s,service=kubesvc
13:05 claime: Cordoning dse-k8s-worker1008.eqiad.wmnet for T365998
13:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for [arwiki] Enable the CampaignEvents extension (T370066)
11:28 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=(kubernetes1025|kubernetes1026|kubernetes1052|kubernetes1053|kubernetes1054|kubernetes1055|kubernetes1056|mw1496).eqiad.wmnet,cluster=kubernetes,service=kubesvc
11:19 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
11:19 claime: Lowered concurrency of RecordLint job to 50 - T370304
11:18 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
11:18 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
11:17 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
11:16 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
11:15 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
10:51 Amir1: running "delete from linter where linter_cat = 23 limit 1000;" in a loop in mwmaint (T370304)
10:39 claime: Cordoning kubernetes1025.eqiad.wmnet kubernetes1026.eqiad.wmnet kubernetes1052.eqiad.wmnet kubernetes1053.eqiad.wmnet kubernetes1054.eqiad.wmnet kubernetes1055.eqiad.wmnet kubernetes1056.eqiad.wmnet mw1496.eqiad.wmnet for T365998
10:03 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
10:02 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
09:41 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
09:41 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
09:35 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
09:35 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
09:14 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
09:12 dreamyjazz@deploy1002: Finished scap: Backport for Define wgGlobalBlockingCentralWiki as 'metawiki' (T370457) (duration: 11m 29s)
09:07 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
09:07 dreamyjazz@deploy1002: dreamyjazz: Backport for Define wgGlobalBlockingCentralWiki as 'metawiki' (T370457) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:05 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
09:01 dreamyjazz@deploy1002: Started scap sync-world: Backport for Define wgGlobalBlockingCentralWiki as 'metawiki' (T370457)
08:27 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
08:17 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
07:22 kartik@deploy1002: Finished scap: Backport for uzwiki: Limit publishing in CX to 'patroller' and 'sysop' groups (T370387) (duration: 13m 37s)
07:17 kartik@deploy1002: kartik: Continuing with sync
07:15 kartik@deploy1002: kartik: Backport for uzwiki: Limit publishing in CX to 'patroller' and 'sysop' groups (T370387) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:08 kartik@deploy1002: Started scap sync-world: Backport for uzwiki: Limit publishing in CX to 'patroller' and 'sysop' groups (T370387)
06:58 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
06:58 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
05:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T367856)', diff saved to https://phabricator.wikimedia.org/P66892 and previous config saved to /var/cache/conftool/dbconfig/20240723-050042-marostegui.json
05:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
05:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
05:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
05:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
05:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T367856)', diff saved to https://phabricator.wikimedia.org/P66891 and previous config saved to /var/cache/conftool/dbconfig/20240723-050004-marostegui.json
04:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P66890 and previous config saved to /var/cache/conftool/dbconfig/20240723-044457-marostegui.json
04:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P66889 and previous config saved to /var/cache/conftool/dbconfig/20240723-042950-marostegui.json
04:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T367856)', diff saved to https://phabricator.wikimedia.org/P66888 and previous config saved to /var/cache/conftool/dbconfig/20240723-041442-marostegui.json
04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.12 (duration: 01m 00s)
03:54 mwpresync@deploy1002: Finished scap: testwikis to 1.43.0-wmf.15 refs T366960 (duration: 51m 50s)
03:03 mwpresync@deploy1002: Started scap sync-world: testwikis to 1.43.0-wmf.15 refs T366960
01:28 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
01:27 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
01:27 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
01:27 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
01:27 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
01:27 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
01:24 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
01:24 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
01:24 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
01:24 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
01:24 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
01:24 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
01:24 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
01:24 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
01:24 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
01:24 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
01:24 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
01:24 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
00:22 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
00:22 eevans@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
00:05 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow2003.codfw.wmnet
00:02 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow2003.codfw.wmnet
00:00 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on netflow2003.codfw.wmnet with reason: reboot netflow2003
00:00 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on netflow2003.codfw.wmnet with reason: reboot netflow2003

2024-07-22

23:08 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "set lsw in codfw to active - cmooney@cumin1002"
23:07 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "set lsw in codfw to active - cmooney@cumin1002"
23:05 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:03 cmooney@cumin1002: START - Cookbook sre.dns.netbox
22:47 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1005.eqiad.wmnet with OS bullseye
22:38 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
22:37 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic110[0-2]* for T348977 - bking@cumin2002
22:36 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic110[0-2]* for T348977 - bking@cumin2002
22:35 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1100-1102].eqiad.wmnet with reason: T348977
22:34 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1100-1102].eqiad.wmnet with reason: T348977
22:01 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1005.eqiad.wmnet with OS bullseye
21:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
21:30 catrope@deploy1002: Finished scap: Backport for Do not unreview pages when they are moved (T370593) (duration: 20m 27s)
21:25 catrope@deploy1002: catrope, soda: Continuing with sync
21:24 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
21:24 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for new pfw3-codfw mgmt IP - cmooney@cumin1002"
21:12 catrope@deploy1002: catrope, soda: Backport for Do not unreview pages when they are moved (T370593) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:10 catrope@deploy1002: Started scap sync-world: Backport for Do not unreview pages when they are moved (T370593)
21:09 catrope@deploy1002: Finished scap: Backport for SpecialMovePage: fix logic to check `delete-redirect` (T370669) (duration: 19m 12s)
21:04 catrope@deploy1002: catrope, matmarex: Continuing with sync
20:52 catrope@deploy1002: catrope, matmarex: Backport for SpecialMovePage: fix logic to check `delete-redirect` (T370669) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:50 catrope@deploy1002: Started scap sync-world: Backport for SpecialMovePage: fix logic to check `delete-redirect` (T370669)
20:49 catrope@deploy1002: Finished scap: Backport for HACK: add option to checked-disable checkboxes (T370611), HACK: show structured link task as disabled if frontend flag is true (T370611) (duration: 08m 27s)
20:47 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for new pfw3-codfw mgmt IP - cmooney@cumin1002"
20:46 topranks: applying additional address to pfw3-codfw reth0.2140 to provide space for new hosts (T370164)
20:44 catrope@deploy1002: catrope, migr: Continuing with sync
20:43 catrope@deploy1002: catrope, migr: Backport for HACK: add option to checked-disable checkboxes (T370611), HACK: show structured link task as disabled if frontend flag is true (T370611) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:40 catrope@deploy1002: Started scap sync-world: Backport for HACK: add option to checked-disable checkboxes (T370611), HACK: show structured link task as disabled if frontend flag is true (T370611)
20:40 cmooney@cumin1002: START - Cookbook sre.dns.netbox
20:12 catrope@deploy1002: Finished scap: Backport for Work around T370517 by remapping the affected i18n message (T370517) (duration: 08m 24s)
20:07 catrope@deploy1002: catrope: Continuing with sync
20:06 catrope@deploy1002: catrope: Backport for Work around T370517 by remapping the affected i18n message (T370517) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:04 catrope@deploy1002: Started scap sync-world: Backport for Work around T370517 by remapping the affected i18n message (T370517)
19:54 dancy@deploy1002: Finished scap: Backport for MWMultiVersion.php: Use FORCE_MW_VERSION instead of MW_FORCE_VERSION (T369115) (duration: 20m 22s)
19:47 dancy@deploy1002: dancy: Continuing with sync
19:47 dancy@deploy1002: dancy: Backport for MWMultiVersion.php: Use FORCE_MW_VERSION instead of MW_FORCE_VERSION (T369115) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:34 dancy@deploy1002: Started scap sync-world: Backport for MWMultiVersion.php: Use FORCE_MW_VERSION instead of MW_FORCE_VERSION (T369115)
18:36 ejegg: civicrm upgraded from a9ef8ab9 to 4247715d
18:27 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=99) on VRTS host vrts2001.codfw.wmnet
18:27 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts2001.codfw.wmnet
18:13 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=99) on VRTS host vrts2001.codfw.wmnet
18:12 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts2001.codfw.wmnet
18:12 aokoth@cumin1002: END (ERROR) - Cookbook sre.vrts.upgrade (exit_code=97) on VRTS host vrts2001.codfw.wmnet
18:12 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts2001.codfw.wmnet
17:42 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:42 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for new cloudceph nodes - cmooney@cumin1002"
17:41 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for new cloudceph nodes - cmooney@cumin1002"
17:33 cmooney@cumin1002: START - Cookbook sre.dns.netbox
17:32 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
17:11 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
17:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on netbox2003.codfw.wmnet with reason: netbox upgrade prep work
17:09 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on netbox2003.codfw.wmnet with reason: netbox upgrade prep work
17:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
17:09 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
17:09 ayounsi@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
17:08 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
16:37 sukhe: [doh1001] upgrade anycast-healthchecker to 0.9.8-1+wmf12u1: T370068
16:32 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2035.codfw.wmnet|wikikube-worker2036.codfw.wmnet|wikikube-worker2037.codfw.wmnet|wikikube-worker2038.codfw.wmnet),cluster=kubernetes,service=kubesvc
16:31 claime: Pooling and uncordoning wikikube-worker2035.codfw.wmnet wikikube-worker2036.codfw.wmnet wikikube-worker2037.codfw.wmnet wikikube-worker2038.codfw.wmnet - T351074
16:31 sukhe: restart anycast-hc on durum1001
16:13 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephmon1004.eqiad.wmnet
16:08 pt1979@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudcephmon1004.eqiad.wmnet
16:02 elukey: remove /srv/kafka/data/eqiad.resource-purge-3 on kafka-main2001 to force a refetch of data from good replicas and circumvent data corruption - T370574
15:58 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on kafka-main2001.codfw.wmnet with reason: attempt to remove a data dir on disk
15:57 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on kafka-main2001.codfw.wmnet with reason: attempt to remove a data dir on disk
15:49 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on kafka-test1006.eqiad.wmnet with reason: attempt to remove a data dir on disk
15:49 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on kafka-test1006.eqiad.wmnet with reason: attempt to remove a data dir on disk
15:08 dancy@deploy1002: Finished scap: Backport for MWMultiVersion.php: Allow MW_FORCE_VERSION to pin the mw version (T369115) (duration: 09m 10s)
15:03 dancy@deploy1002: dancy: Continuing with sync
15:01 dancy@deploy1002: dancy: Backport for MWMultiVersion.php: Allow MW_FORCE_VERSION to pin the mw version (T369115) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:59 dancy@deploy1002: Started scap sync-world: Backport for MWMultiVersion.php: Allow MW_FORCE_VERSION to pin the mw version (T369115)
14:26 zabe@deploy1002: Finished scap: Backport for Revert^2 "Set some site names for new-ish wikis" (T363270 T360303 T360310 T363263) (duration: 10m 54s)
14:21 zabe@deploy1002: zabe: Continuing with sync
14:17 zabe@deploy1002: zabe: Backport for Revert^2 "Set some site names for new-ish wikis" (T363270 T360303 T360310 T363263) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:15 zabe@deploy1002: Started scap sync-world: Backport for Revert^2 "Set some site names for new-ish wikis" (T363270 T360303 T360310 T363263)
14:08 tchanders@deploy1002: Finished scap: Backport for Set Flow to read only on testwiki (T370322), Enable temporary accounts on testwiki and loginwiki (T348895), Fix logic for handling enabling temporary accounts (T348895) (duration: 07m 11s)
14:03 tchanders@deploy1002: tchanders: Continuing with sync
14:03 tchanders@deploy1002: tchanders: Backport for Set Flow to read only on testwiki (T370322), Enable temporary accounts on testwiki and loginwiki (T348895), Fix logic for handling enabling temporary accounts (T348895) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:01 tchanders@deploy1002: Started scap sync-world: Backport for Set Flow to read only on testwiki (T370322), Enable temporary accounts on testwiki and loginwiki (T348895), Fix logic for handling enabling temporary accounts (T348895)
13:45 tchanders@deploy1002: tchanders: Continuing with sync
13:42 tchanders@deploy1002: tchanders: Backport for Set Flow to read only on testwiki (T370322), Enable temporary accounts on testwiki and loginwiki (T348895), Fix logic for handling enabling temporary accounts (T348895) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:39 tchanders@deploy1002: Started scap sync-world: Backport for Set Flow to read only on testwiki (T370322), Enable temporary accounts on testwiki and loginwiki (T348895), Fix logic for handling enabling temporary accounts (T348895)
13:29 tchanders@deploy1002: Sync cancelled.
13:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on rdb1014.eqiad.wmnet with reason: Hardware issue
13:25 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on rdb1014.eqiad.wmnet with reason: Hardware issue
13:21 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on netbox1002.eqiad.wmnet with reason: Netbox 3 silencing
13:20 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on netbox1002.eqiad.wmnet with reason: Netbox 3 silencing
13:20 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on netbox2002.codfw.wmnet with reason: Netbox 3 silencing
13:20 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on netbox2002.codfw.wmnet with reason: Netbox 3 silencing
13:13 tchanders@deploy1002: tchanders: Backport for Set Flow to read only on testwiki (T370322), Enable temporary accounts on testwiki and loginwiki (T348895) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:11 tchanders@deploy1002: Started scap sync-world: Backport for Set Flow to read only on testwiki (T370322), Enable temporary accounts on testwiki and loginwiki (T348895)
13:07 claime: power cycling rdb1014.eqiad.wmnet
12:22 godog: restore retention.ms=172800000 for mediawiki.httpd.accesslog
11:54 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
11:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
11:17 ladsgroup@deploy1002: Finished scap: Backport for Enable ICU provided alphabetical order in the Kurdish wikis categories (T48235) (duration: 08m 02s)
11:12 ladsgroup@deploy1002: ebrahim, ladsgroup: Continuing with sync
11:11 ladsgroup@deploy1002: ebrahim, ladsgroup: Backport for Enable ICU provided alphabetical order in the Kurdish wikis categories (T48235) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:09 ladsgroup@deploy1002: Started scap sync-world: Backport for Enable ICU provided alphabetical order in the Kurdish wikis categories (T48235)
10:33 volans: manually upgraded prometheus-ipmi-exporter to v1.8.0-1~wmf12+1 on db1179 (left over because it was down) T368088
10:32 Dreamy_Jazz: Running `mwscript extensions/MediaModeration/maintenance/updateMetrics.php --wiki=commonswiki --verbose`
10:28 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
10:24 elukey: kafka preferred-replica-election on kafka-main - T370574
09:51 godog: set mediawiki.httpd.accesslog topic retention to 26h temporarily
09:50 mlitn@deploy1002: Finished scap: Backport for Reduce weight of 'main subject' as it's used inconsistently (T367774) (duration: 08m 19s)
09:45 mlitn@deploy1002: cparle, mlitn: Continuing with sync
09:44 mlitn@deploy1002: cparle, mlitn: Backport for Reduce weight of 'main subject' as it's used inconsistently (T367774) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:42 mlitn@deploy1002: Started scap sync-world: Backport for Reduce weight of 'main subject' as it's used inconsistently (T367774)
09:40 claime: homer 'cr*codfw*' commit 'T351074'
09:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.7 to future netbox prod - ayounsi@cumin1002 - T336275
09:21 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.7 to future netbox prod - ayounsi@cumin1002 - T336275
09:03 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.7 to future netbox prod - ayounsi@cumin1002 - T336275
09:00 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.7 to future netbox prod - ayounsi@cumin1002 - T336275
08:56 godog: rebalance mediawiki.httpd.accesslog partitions across brokers - T370129
08:55 ayounsi@cumin1002: END (PASS) - Cookbook sre.postgresql.postgres-init (exit_code=0)
08:50 ayounsi@cumin1002: START - Cookbook sre.postgresql.postgres-init
08:32 elukey: restart kafka on kafka-main2005 - T370574
08:31 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on kafka-main2005.codfw.wmnet with reason: restart attempt
08:30 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on kafka-main2005.codfw.wmnet with reason: restart attempt
08:24 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
08:23 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
08:07 elukey: restart kafka on kafka-main2001 - T370574
08:06 elukey: restart kafka on kafka-main2001 - sre.hosts.downtime
08:06 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on kafka-main2001.codfw.wmnet with reason: restart attempt
08:05 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on kafka-main2001.codfw.wmnet with reason: restart attempt
08:03 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=99) for hosts karapace1002.eqiad.wmnet
08:00 brouberol@cumin1002: START - Cookbook sre.hosts.decommission for hosts karapace1002.eqiad.wmnet
07:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netbox2003.codfw.wmnet with reason: netbox upgrade prep work
07:39 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on netbox2003.codfw.wmnet with reason: netbox upgrade prep work
07:35 stran@deploy1002: Finished scap: Backport for IPInfoHandler: Move token param definition to getBodyParamSettings (T370500) (duration: 12m 18s)
07:30 stran@deploy1002: stran: Continuing with sync
07:25 stran@deploy1002: stran: Backport for IPInfoHandler: Move token param definition to getBodyParamSettings (T370500) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:23 stran@deploy1002: Started scap sync-world: Backport for IPInfoHandler: Move token param definition to getBodyParamSettings (T370500)
07:12 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
07:12 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
02:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T367856)', diff saved to https://phabricator.wikimedia.org/P66880 and previous config saved to /var/cache/conftool/dbconfig/20240722-025552-marostegui.json
02:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
02:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
02:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T367856)', diff saved to https://phabricator.wikimedia.org/P66879 and previous config saved to /var/cache/conftool/dbconfig/20240722-025530-marostegui.json
02:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P66878 and previous config saved to /var/cache/conftool/dbconfig/20240722-024023-marostegui.json
02:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P66877 and previous config saved to /var/cache/conftool/dbconfig/20240722-022516-marostegui.json
02:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T367856)', diff saved to https://phabricator.wikimedia.org/P66876 and previous config saved to /var/cache/conftool/dbconfig/20240722-021009-marostegui.json
01:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Maint over (T369855 T370304)', diff saved to https://phabricator.wikimedia.org/P66875 and previous config saved to /var/cache/conftool/dbconfig/20240722-015302-ladsgroup.json
01:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Maint over (T369855 T370304)', diff saved to https://phabricator.wikimedia.org/P66874 and previous config saved to /var/cache/conftool/dbconfig/20240722-013756-ladsgroup.json
01:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Maint over (T369855 T370304)', diff saved to https://phabricator.wikimedia.org/P66873 and previous config saved to /var/cache/conftool/dbconfig/20240722-012251-ladsgroup.json
01:19 ladsgroup@deploy1002: Finished scap: Backport for Stop storing missing-image-alt-text lints (T370304) (duration: 08m 48s)
01:13 ladsgroup@deploy1002: ladsgroup: Continuing with sync
01:13 ladsgroup@deploy1002: ladsgroup: Backport for Stop storing missing-image-alt-text lints (T370304) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
01:10 ladsgroup@deploy1002: Started scap sync-world: Backport for Stop storing missing-image-alt-text lints (T370304)
01:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: Maint over (T369855 T370304)', diff saved to https://phabricator.wikimedia.org/P66872 and previous config saved to /var/cache/conftool/dbconfig/20240722-010745-ladsgroup.json

2024-07-21

23:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367856)', diff saved to https://phabricator.wikimedia.org/P66871 and previous config saved to /var/cache/conftool/dbconfig/20240721-232234-marostegui.json
23:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P66870 and previous config saved to /var/cache/conftool/dbconfig/20240721-230727-marostegui.json
22:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P66869 and previous config saved to /var/cache/conftool/dbconfig/20240721-225219-marostegui.json
22:44 ladsgroup@deploy1002: Finished scap: Backport for Disable missing-image-alt-text lint (T370304) (duration: 26m 27s)
22:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367856)', diff saved to https://phabricator.wikimedia.org/P66868 and previous config saved to /var/cache/conftool/dbconfig/20240721-223712-marostegui.json
22:36 ladsgroup@deploy1002: ladsgroup: Continuing with sync
22:35 ladsgroup@deploy1002: ladsgroup: Backport for Disable missing-image-alt-text lint (T370304) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
22:18 ladsgroup@deploy1002: Started scap sync-world: Backport for Disable missing-image-alt-text lint (T370304)
08:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367856)', diff saved to https://phabricator.wikimedia.org/P66867 and previous config saved to /var/cache/conftool/dbconfig/20240721-085853-marostegui.json
08:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
08:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
08:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T367856)', diff saved to https://phabricator.wikimedia.org/P66866 and previous config saved to /var/cache/conftool/dbconfig/20240721-085832-marostegui.json
08:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P66865 and previous config saved to /var/cache/conftool/dbconfig/20240721-084325-marostegui.json
08:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P66864 and previous config saved to /var/cache/conftool/dbconfig/20240721-082818-marostegui.json
08:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T367856)', diff saved to https://phabricator.wikimedia.org/P66863 and previous config saved to /var/cache/conftool/dbconfig/20240721-081310-marostegui.json
02:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T367856)', diff saved to https://phabricator.wikimedia.org/P66862 and previous config saved to /var/cache/conftool/dbconfig/20240721-020121-marostegui.json
02:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
02:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
02:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T367856)', diff saved to https://phabricator.wikimedia.org/P66861 and previous config saved to /var/cache/conftool/dbconfig/20240721-020059-marostegui.json
01:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P66860 and previous config saved to /var/cache/conftool/dbconfig/20240721-014552-marostegui.json
01:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P66859 and previous config saved to /var/cache/conftool/dbconfig/20240721-013044-marostegui.json
01:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T367856)', diff saved to https://phabricator.wikimedia.org/P66858 and previous config saved to /var/cache/conftool/dbconfig/20240721-011537-marostegui.json

2024-07-20

19:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T367856)', diff saved to https://phabricator.wikimedia.org/P66857 and previous config saved to /var/cache/conftool/dbconfig/20240720-190046-marostegui.json
19:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
19:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
19:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T367856)', diff saved to https://phabricator.wikimedia.org/P66856 and previous config saved to /var/cache/conftool/dbconfig/20240720-190024-marostegui.json
18:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P66855 and previous config saved to /var/cache/conftool/dbconfig/20240720-184516-marostegui.json
18:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P66854 and previous config saved to /var/cache/conftool/dbconfig/20240720-183009-marostegui.json
18:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T367856)', diff saved to https://phabricator.wikimedia.org/P66853 and previous config saved to /var/cache/conftool/dbconfig/20240720-181502-marostegui.json
14:30 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1005.eqiad.wmnet with OS bullseye
14:22 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
14:16 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
14:16 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
14:15 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudcephmon1006
14:15 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1006
14:15 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1005
14:15 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1005
14:15 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
14:14 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
14:10 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
14:10 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
14:09 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:09 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephmon1004-6 - jclark@cumin1002"
14:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephmon1004-6 - jclark@cumin1002"
14:06 jclark@cumin1002: START - Cookbook sre.dns.netbox
14:06 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
14:05 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
14:05 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
14:05 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
13:59 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudcephmon1006
13:59 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1006
13:54 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
13:54 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
13:47 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
13:47 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
13:47 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1005
13:47 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1005
13:45 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudcephmon1005
13:45 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1005
13:45 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
13:44 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
13:34 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1005
13:34 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1005
13:33 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
13:33 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
13:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1005.eqiad.wmnet with OS bullseye
13:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
08:15 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
08:15 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
08:15 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
08:15 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
08:15 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
08:15 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
06:21 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
03:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T367856)', diff saved to https://phabricator.wikimedia.org/P66852 and previous config saved to /var/cache/conftool/dbconfig/20240720-033501-marostegui.json
03:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
03:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
01:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T367856)', diff saved to https://phabricator.wikimedia.org/P66851 and previous config saved to /var/cache/conftool/dbconfig/20240720-011705-marostegui.json
01:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
01:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
01:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T367856)', diff saved to https://phabricator.wikimedia.org/P66850 and previous config saved to /var/cache/conftool/dbconfig/20240720-011643-marostegui.json
01:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P66849 and previous config saved to /var/cache/conftool/dbconfig/20240720-010136-marostegui.json
00:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P66848 and previous config saved to /var/cache/conftool/dbconfig/20240720-004629-marostegui.json
00:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T367856)', diff saved to https://phabricator.wikimedia.org/P66847 and previous config saved to /var/cache/conftool/dbconfig/20240720-003122-marostegui.json
00:26 jclark@cumin1002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host db1179.mgmt.eqiad.wmnet with reboot policy GRACEFUL
00:14 jclark@cumin1002: START - Cookbook sre.hosts.provision for host db1179.mgmt.eqiad.wmnet with reboot policy GRACEFUL

2024-07-19

21:14 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1061.eqiad.wmnet with OS bookworm
20:52 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
20:49 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
20:34 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1061.eqiad.wmnet with OS bookworm
17:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for new irb ints codfw row c and d - cmooney@cumin1002"
17:20 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for new irb ints codfw row c and d - cmooney@cumin1002"
17:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox
17:13 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
17:12 topranks: adding irb ints for row c/d vlans to codfw leaf switches in those rows T364095
17:05 cmooney@cumin1002: START - Cookbook sre.dns.netbox
16:48 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:46 cmooney@cumin1002: START - Cookbook sre.dns.netbox
16:20 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
16:20 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
16:13 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
16:11 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
15:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2038.codfw.wmnet with OS bullseye
15:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['gerrit2003']
15:44 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit2003']
15:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['gerrit2003']
15:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2037.codfw.wmnet with OS bullseye
15:35 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit2003']
15:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['gerrit2003']
15:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit2003']
15:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['gerrit2003']
15:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit2003']
15:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2038.codfw.wmnet with reason: host reimage
15:28 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2038.codfw.wmnet with reason: host reimage
15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host gerrit2003.mgmt.codfw.wmnet with reboot policy FORCED
15:25 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest2002.codfw.wmnet
15:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2037.codfw.wmnet with reason: host reimage
15:17 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2037.codfw.wmnet with reason: host reimage
15:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host gerrit2003.mgmt.codfw.wmnet with reboot policy FORCED
15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding gerrit2003 to codfw - jhancock@cumin2002"
15:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding gerrit2003 to codfw - jhancock@cumin2002"
15:11 jhancock@cumin2002: START - Cookbook sre.dns.netbox
15:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2038.codfw.wmnet with OS bullseye
15:09 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2038.codfw.wmnet with OS bullseye
14:59 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2037.codfw.wmnet with OS bullseye
14:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2035.codfw.wmnet with OS bullseye
14:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2036.codfw.wmnet with OS bullseye
14:49 cmooney@cumin1002: START - Cookbook sre.hosts.dhcp for host sretest2002.codfw.wmnet
14:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2038.codfw.wmnet with OS bullseye
14:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2037.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:43 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:43 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for sretest2002 - cmooney@cumin1002"
14:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for sretest2002 - cmooney@cumin1002"
14:40 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2037.mgmt.codfw.wmnet with reboot policy GRACEFUL
14:39 godog: power off centrallog1002 for network upgrade - T369825
14:38 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on centrallog1002.eqiad.wmnet with reason: network upgrade
14:38 filippo@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on centrallog1002.eqiad.wmnet with reason: network upgrade
14:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox
14:36 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2038.codfw.wmnet with OS bullseye
14:36 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2037.codfw.wmnet with OS bullseye
14:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2035.codfw.wmnet with reason: host reimage
14:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2036.codfw.wmnet with reason: host reimage
14:28 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2035.codfw.wmnet with reason: host reimage
14:27 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2036.codfw.wmnet with reason: host reimage
14:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2038.codfw.wmnet with OS bullseye
14:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2037.codfw.wmnet with OS bullseye
14:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2036.codfw.wmnet with OS bullseye
14:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2035.codfw.wmnet with OS bullseye
14:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2439 to wikikube-worker2038
14:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2038
14:06 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2038
14:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2439 to wikikube-worker2038 - cgoubert@cumin1002"
14:05 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2439 to wikikube-worker2038 - cgoubert@cumin1002"
14:03 herron@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thanos-web,name=titan1001.eqiad.wmnet
14:02 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:02 herron@puppetmaster1001: conftool action : set/pooled=no; selector: service=thanos-web,name=titan1001.eqiad.wmnet
14:02 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2439 to wikikube-worker2038
14:02 herron@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thanos-web,name=titan1002.eqiad.wmnet
14:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2438 to wikikube-worker2037
14:01 herron@puppetmaster1001: conftool action : set/pooled=no; selector: service=thanos-web,name=titan1002.eqiad.wmnet
14:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2037
13:59 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2037
13:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2438 to wikikube-worker2037 - cgoubert@cumin1002"
13:57 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2438 to wikikube-worker2037 - cgoubert@cumin1002"
13:55 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
13:55 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2438 to wikikube-worker2037
13:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2433 to wikikube-worker2036
13:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2036
13:52 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2036
13:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2433 to wikikube-worker2036 - cgoubert@cumin1002"
13:51 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2433 to wikikube-worker2036 - cgoubert@cumin1002"
13:48 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
13:48 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2433 to wikikube-worker2036
13:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2432 to wikikube-worker2035
13:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2035
13:45 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2035
13:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2432 to wikikube-worker2035 - cgoubert@cumin1002"
13:42 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2432 to wikikube-worker2035 - cgoubert@cumin1002"
13:39 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
13:39 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2432 to wikikube-worker2035
13:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
13:21 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
12:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
12:49 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
12:47 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
12:47 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
12:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.convert-disks (exit_code=0) for host mw2439
12:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'T365998 - depooling db1195 - s1 db1202 - s7 db1203 - s8', diff saved to https://phabricator.wikimedia.org/P66843 and previous config saved to /var/cache/conftool/dbconfig/20240719-122320-arnaudb.json
12:20 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
12:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance
12:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance
12:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367856)', diff saved to https://phabricator.wikimedia.org/P66842 and previous config saved to /var/cache/conftool/dbconfig/20240719-121933-marostegui.json
12:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
12:18 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
12:13 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
12:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
12:12 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
12:12 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
12:10 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
12:09 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
12:09 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
12:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
12:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P66841 and previous config saved to /var/cache/conftool/dbconfig/20240719-120426-marostegui.json
12:01 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
11:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P66840 and previous config saved to /var/cache/conftool/dbconfig/20240719-114919-marostegui.json
11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367856)', diff saved to https://phabricator.wikimedia.org/P66839 and previous config saved to /var/cache/conftool/dbconfig/20240719-113412-marostegui.json
11:10 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
11:07 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
11:05 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
11:05 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
10:54 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
10:54 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
10:49 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.convert-disks (exit_code=97) for host mw2439
10:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
10:41 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
10:41 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
10:38 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
10:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
10:37 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
10:37 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
10:28 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
10:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
10:13 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
10:13 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
10:06 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:05 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:00 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:00 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
09:58 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.convert-disks (exit_code=97) for host mw2439
09:54 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest2001.codfw.wmnet
09:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
09:41 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
09:41 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
09:35 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
09:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
09:35 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
09:35 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
09:32 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
09:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
09:21 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
09:21 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
08:16 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:16 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
08:15 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:15 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
08:15 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
08:15 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
08:08 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2438
08:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2438.mgmt.codfw.wmnet with reboot policy GRACEFUL
08:05 elukey@cumin1002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
02:50 eileen: civicrm upgraded from 384fe444 to a9ef8ab9
00:28 zabe@deploy1002: sync-world aborted: Backport for Set some site names for new-ish wikis (T363270 T360303 T360310 T363263) (duration: 01m 33s)
00:26 zabe@deploy1002: Started scap sync-world: Backport for Set some site names for new-ish wikis (T363270 T360303 T360310 T363263)

2024-07-18

23:57 topranks: re-enable ssw<->ssw bgp in codfw to move east-west traffic away from CRs T369274
23:46 topranks: move IP GW for vlan private1-d-codfw to ssw1-d1-codfw and ssw1-d8-codfw T369274
23:44 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:44 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for migrated codfw gw IPs - cmooney@cumin1002"
23:44 topranks: remove VRRP group for private1-d-codfw vlan on cr1-codfw and cr2-codfw
23:43 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for migrated codfw gw IPs - cmooney@cumin1002"
23:40 cmooney@cumin1002: START - Cookbook sre.dns.netbox
23:36 topranks: move outbound gateway for private1-d-codfw vlan from cr1-codfw to ssw1-d1-codfw
23:31 topranks: disable IPv6 RA generation for private1-d-codfw vlan on cr1-codfw and cr2-codfw T369274
23:17 topranks: enable IPv6 RA generation for private1-d-codfw vlan from ssw1-d1-codfw and ssw1-d8-codfw T369274
23:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T367856)', diff saved to https://phabricator.wikimedia.org/P66838 and previous config saved to /var/cache/conftool/dbconfig/20240718-231639-marostegui.json
23:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
23:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
23:05 topranks: Remove VRRP group for vlan private1-c-codfw on cr1-codfw and cr2-codfw
22:49 topranks: Re-route outbound traffic for private1-c-codfw vlan onto ssw1-d1-codfw
22:33 topranks: Disable IPv6 RA generation for private1-c-codfw vlan on cr1-codfw and cr2-codfw T369274
22:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic1100.eqiad.wmnet with reason: catch up on indexing
22:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on elastic1100.eqiad.wmnet with reason: catch up on indexing
22:15 topranks: add IP interfaces for private1-c-codfw vlan to ssw1-d1-codfw and ssw1-d8-codfw
22:03 topranks: move GW IPs for public1-d-codfw vlan to ssw1-d1-codfw and ssw1-d8-codfw T369274
21:58 topranks: remove VRRP group on cr1-codfw and cr2-codfw for public1-d-codfw vlan T369274
21:57 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
21:57 bking@cumin2002: START - Cookbook sre.elasticsearch.force-shard-allocation
21:39 topranks: disable IPv6 RA generation on cr1-codfw and cr2-codfw for public1-d-codfw vlan T369274
21:21 topranks: enable IPv6 RA generation on ssw1-d1-codfw and ssw1-d8-codfw for public1-d-codfw vlan T369274
21:14 dancy@deploy1002: Finished scap: Backport for Fix guard clause in Revision Hook Handler and Precheck (T370161) (duration: 12m 02s)
21:09 dancy@deploy1002: suecarmol, dancy: Continuing with sync
21:08 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
21:08 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
21:04 dancy@deploy1002: suecarmol, dancy: Backport for Fix guard clause in Revision Hook Handler and Precheck (T370161) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:02 dancy@deploy1002: Started scap sync-world: Backport for Fix guard clause in Revision Hook Handler and Precheck (T370161)
21:01 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.14 refs T366959
20:52 dancy@deploy1002: Finished scap: Backport for Fixes client preferences error (T370441) (duration: 11m 22s)
20:49 topranks: remove VRRP for public1-c-codfw vlan from cr1-codfw and cr2-codfw T369274
20:47 dancy@deploy1002: dancy, jdlrobson: Continuing with sync
20:43 dancy@deploy1002: dancy, jdlrobson: Backport for Fixes client preferences error (T370441) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:41 dancy@deploy1002: Started scap sync-world: Backport for Fixes client preferences error (T370441)
20:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T367856)', diff saved to https://phabricator.wikimedia.org/P66836 and previous config saved to /var/cache/conftool/dbconfig/20240718-202511-marostegui.json
20:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
20:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
20:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367856)', diff saved to https://phabricator.wikimedia.org/P66835 and previous config saved to /var/cache/conftool/dbconfig/20240718-202449-marostegui.json
20:04 topranks: enabling IPv6 RA generation for public1-c-codfw on ssw1-d1-codfw and ssw1-d8-codfw T369274
19:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P66832 and previous config saved to /var/cache/conftool/dbconfig/20240718-195434-marostegui.json
19:54 dancy@deploy1002: Finished scap: Backport for [i18n] Change the names of the Arabic months (T370456) (duration: 10m 23s)
19:47 dancy@deploy1002: dancy: Continuing with sync
19:46 dancy@deploy1002: dancy: Backport for [i18n] Change the names of the Arabic months (T370456) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
19:43 dancy@deploy1002: Started scap sync-world: Backport for [i18n] Change the names of the Arabic months (T370456)
19:43 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:43 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new IRB interfaces codfw - cmooney@cumin1002"
19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new IRB interfaces codfw - cmooney@cumin1002"
19:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367856)', diff saved to https://phabricator.wikimedia.org/P66831 and previous config saved to /var/cache/conftool/dbconfig/20240718-193927-marostegui.json
19:38 cmooney@cumin1002: START - Cookbook sre.dns.netbox
19:37 topranks: add IRB int on public1-c-codfw vlan to ssw1-d1-codfw and ssw1-d8-codfw T369274
19:37 denisse: Send SIGQUIT signal to the benthos service after a goroutine was waiting forever in webrequest_live.yaml - T369256
19:34 topranks: disable BGP between spine switches in rows A and D prior to enabling IP GW (T369274)
19:32 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ssw1-a[1,8]-codfw.mgmt,ssw1-d[1,8]-codfw.mgmt with reason: Migrate codfw row c and d IP GWs from CRs to Spines
19:31 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on ssw1-a[1,8]-codfw.mgmt,ssw1-d[1,8]-codfw.mgmt with reason: Migrate codfw row c and d IP GWs from CRs to Spines
19:12 topranks: enabling BGP session from cr1-codfw to ssw1-d1-codfw
19:07 dancy@deploy1002: Installing scap version "4.93.0" for 232 hosts
18:30 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=99) on VRTS host vrts1001.eqiad.wmnet
18:27 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet
18:17 swfrench-wmf: api-ro.discovery.wmnet now resolves to failoid - T367949
18:03 swfrench-wmf: appservers-ro.discovery.wmnet now resolves to failoid - T367949
18:01 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=99) on VRTS host vrts1001.eqiad.wmnet
18:01 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet
17:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136', diff saved to https://phabricator.wikimedia.org/P66829 and previous config saved to /var/cache/conftool/dbconfig/20240718-174547-root.json
17:43 topranks: disabling cr2-codfw port et-1/1/0 connecting to asw-c-codfw T366941
17:38 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2438.mgmt.codfw.wmnet with reboot policy GRACEFUL
17:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2438.codfw.wmnet
17:29 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2438.codfw.wmnet
17:29 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2438
17:28 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.convert-disks (exit_code=97) for host mw2438
17:24 topranks: making cr1-codfw interfaces connecting ssw1-d1-codfw VRRP master for row c & d vlans T366941
17:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2438.codfw.wmnet
17:20 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2438.codfw.wmnet
17:20 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2438
17:15 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2438
17:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2438.codfw.wmnet
17:15 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2438.codfw.wmnet
17:15 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2438
17:10 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2438
17:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2438.codfw.wmnet
17:10 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2438.codfw.wmnet
17:09 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2438
16:52 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2438
16:39 topranks: resetting line card 1/1 on cr1-codfw (T366941)
16:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2438.codfw.wmnet
16:35 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host mw2438.codfw.wmnet
16:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2438.codfw.wmnet
16:34 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on ssw1-a1-codfw.mgmt with reason: bouncing line card on cr1-codfw
16:34 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on ssw1-a1-codfw.mgmt with reason: bouncing line card on cr1-codfw
16:32 papaul: re-enable option 82 on lsw1-b7-codfw
16:26 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2438.codfw.wmnet
16:25 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2438
16:24 papaul: disable option 82 on lsw1-b7-codfw to test pxe boot issue
16:23 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2433
16:21 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on cloudsw1-b1-codfw.mgmt,pfw3-codfw with reason: bouncing line card on cr1-codfw
16:21 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on cloudsw1-b1-codfw.mgmt,pfw3-codfw with reason: bouncing line card on cr1-codfw
16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2433.codfw.wmnet
16:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host mw2433.codfw.wmnet
16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2433.codfw.wmnet
16:10 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2433.codfw.wmnet
16:10 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2433
16:07 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on cloudsw1-b1-codfw.mgmt,pfw3-codfw with reason: bouncing line card on cr1-codfw
16:07 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on cloudsw1-b1-codfw.mgmt,pfw3-codfw with reason: bouncing line card on cr1-codfw
15:52 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
15:48 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
15:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 100%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66827 and previous config saved to /var/cache/conftool/dbconfig/20240718-153748-arnaudb.json
15:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 100%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66826 and previous config saved to /var/cache/conftool/dbconfig/20240718-153731-arnaudb.json
15:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 100%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66825 and previous config saved to /var/cache/conftool/dbconfig/20240718-153718-arnaudb.json
15:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest2001.codfw.wmnet
15:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2433.mgmt.codfw.wmnet with reboot policy GRACEFUL
15:23 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2433.mgmt.codfw.wmnet with reboot policy GRACEFUL
15:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 75%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66824 and previous config saved to /var/cache/conftool/dbconfig/20240718-152243-arnaudb.json
15:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 75%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66823 and previous config saved to /var/cache/conftool/dbconfig/20240718-152225-arnaudb.json
15:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 75%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66822 and previous config saved to /var/cache/conftool/dbconfig/20240718-152213-arnaudb.json
15:19 topranks: disabling interface et-1/1/3 on cr1-codfw (facing asw-d-codfw) T366941
15:17 topranks: disabling interface et-1/1/0 on cr1-codfw (facing asw-c-codfw) T366941
15:13 elukey@cumin1002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
15:12 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cr[1-2]-codfw,ssw1-d[1,8]-codfw with reason: Move asw-c-codfw and asw-d-codfw CR uplinks
15:12 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on cr[1-2]-codfw,ssw1-d[1,8]-codfw with reason: Move asw-c-codfw and asw-d-codfw CR uplinks
15:12 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2433
15:09 mforns@deploy1002: Finished deploy [airflow-dags/analytics@cde3c31]: (no justification provided) (duration: 00m 30s)
15:08 mforns@deploy1002: Started deploy [airflow-dags/analytics@cde3c31]: (no justification provided)
15:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 50%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66821 and previous config saved to /var/cache/conftool/dbconfig/20240718-150737-arnaudb.json
15:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 50%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66820 and previous config saved to /var/cache/conftool/dbconfig/20240718-150720-arnaudb.json
15:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 50%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66819 and previous config saved to /var/cache/conftool/dbconfig/20240718-150708-arnaudb.json
15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2433
14:58 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2433
14:58 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host mw2433.codfw.wmnet
14:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 25%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66818 and previous config saved to /var/cache/conftool/dbconfig/20240718-145232-arnaudb.json
14:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 25%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66817 and previous config saved to /var/cache/conftool/dbconfig/20240718-145214-arnaudb.json
14:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 25%: maintenance rescheduled', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240718-145157-arnaudb.json
14:47 arnaudb@cumin1002: dbctl commit (dc=all): 'T365998 - depooling db1195 - s1 db1202 - s7 db1203 - s8', diff saved to https://phabricator.wikimedia.org/P66816 and previous config saved to /var/cache/conftool/dbconfig/20240718-144754-arnaudb.json
14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host mw2433.codfw.wmnet
14:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2433.codfw.wmnet
14:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1004.eqiad.wmnet with OS bookworm
14:38 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2433.codfw.wmnet
14:38 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2433
14:17 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
14:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] DONE helmfile.d/services/termbox: apply
14:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] START helmfile.d/services/termbox: apply
14:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] DONE helmfile.d/services/termbox: apply
14:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] START helmfile.d/services/termbox: apply
14:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] DONE helmfile.d/services/termbox: apply
14:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] START helmfile.d/services/termbox: apply
14:01 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] DONE helmfile.d/services/termbox: apply
14:01 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] START helmfile.d/services/termbox: apply
14:01 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] DONE helmfile.d/services/termbox: apply
14:01 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] START helmfile.d/services/termbox: apply
13:55 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] DONE helmfile.d/services/termbox: apply
13:53 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] START helmfile.d/services/termbox: apply
13:50 brett: Release ncmonitor 1.1.0-1 to bookworm-wikimedia
13:46 Dreamy_Jazz: Afternoon UTC backport window done
13:44 dreamyjazz@deploy1002: Finished scap: Backport for Allow Bureaucrats on Foundation Wiki to be able to remove Sysop rights (T370097), fix(editor): make PageTitleControl reliably blankable (T370326) (duration: 09m 59s)
13:39 dreamyjazz@deploy1002: migr, dreamyjazz, dreamrimmer: Continuing with sync
13:36 dreamyjazz@deploy1002: migr, dreamyjazz, dreamrimmer: Backport for Allow Bureaucrats on Foundation Wiki to be able to remove Sysop rights (T370097), fix(editor): make PageTitleControl reliably blankable (T370326) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:34 dreamyjazz@deploy1002: Started scap sync-world: Backport for Allow Bureaucrats on Foundation Wiki to be able to remove Sysop rights (T370097), fix(editor): make PageTitleControl reliably blankable (T370326)
13:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1004.eqiad.wmnet with OS bookworm
13:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2432.codfw.wmnet with OS buster
12:55 topranks: re-enabling interface et-1/0/2 on cr2-codfw which connects to ssw1-d8-codfw (problematic IP interfaces have been deleted) T366941
12:52 topranks: re-enabling BGP between spine-layer switches in codfw (problematic IP interfaces have been deleted) T366941
12:51 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:51 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove entries for IRB ints on row D spines - cmooney@cumin1002"
12:50 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove entries for IRB ints on row D spines - cmooney@cumin1002"
12:46 cmooney@cumin1002: START - Cookbook sre.dns.netbox
12:40 dreamyjazz@deploy1002: Finished scap: Backport for [GlobalBlocking] Enable global account blocks on all wikis (T356924) (duration: 09m 10s)
12:35 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
12:35 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
12:34 dreamyjazz@deploy1002: dreamyjazz: Backport for [GlobalBlocking] Enable global account blocks on all wikis (T356924) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2432.codfw.wmnet with reason: host reimage
12:32 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:32 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:32 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
12:30 dreamyjazz@deploy1002: Started scap sync-world: Backport for [GlobalBlocking] Enable global account blocks on all wikis (T356924)
12:27 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2432.codfw.wmnet with reason: host reimage
12:25 elukey: update spicerack to 8.8.0 on cumin1002
12:14 claime: restarting sync-puppet-volatile on puppetserver2001
12:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw2432.codfw.wmnet with OS buster
12:09 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
12:09 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
12:08 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
11:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2432.mgmt.codfw.wmnet with reboot policy GRACEFUL
11:39 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2432.mgmt.codfw.wmnet with reboot policy GRACEFUL
11:15 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
11:14 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
11:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new IRB interfaces codfw - cmooney@cumin1002"
11:13 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new IRB interfaces codfw - cmooney@cumin1002"
11:12 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
11:12 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
11:10 cmooney@cumin1002: START - Cookbook sre.dns.netbox
11:10 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
11:09 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
11:07 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
11:07 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:05 cmooney@cumin1002: START - Cookbook sre.dns.netbox
11:05 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
11:04 cmooney@cumin1002: START - Cookbook sre.dns.netbox
11:04 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
11:03 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
10:54 cmooney@cumin1002: START - Cookbook sre.dns.netbox
10:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw2432.codfw.wmnet with OS buster
10:28 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.convert-disks (exit_code=97) for host mw2432
10:17 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
10:08 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
10:04 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
09:56 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
09:52 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
09:46 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
09:46 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
09:44 elukey: upgrade spicerack to 8.8.0 on cumin2002 - testing the new release
09:43 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
09:26 elukey: uploaded spicerack_8.8.0 to apt.wikimedia.org bullseye-wikimedia
09:26 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
09:08 btullis: disabled check-private-data.timer on clouddb1021, pending decom.
09:06 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
09:06 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
09:02 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
09:02 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
08:56 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
08:55 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
08:51 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
08:51 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
08:47 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
08:47 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
08:13 aklapper@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.14 refs T366959
04:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367856)', diff saved to https://phabricator.wikimedia.org/P66806 and previous config saved to /var/cache/conftool/dbconfig/20240718-043817-marostegui.json
04:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
04:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
04:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
04:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
04:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T367856)', diff saved to https://phabricator.wikimedia.org/P66805 and previous config saved to /var/cache/conftool/dbconfig/20240718-043739-marostegui.json
04:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P66804 and previous config saved to /var/cache/conftool/dbconfig/20240718-042232-marostegui.json
04:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P66803 and previous config saved to /var/cache/conftool/dbconfig/20240718-040725-marostegui.json
03:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T367856)', diff saved to https://phabricator.wikimedia.org/P66802 and previous config saved to /var/cache/conftool/dbconfig/20240718-035218-marostegui.json
00:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic110[0-2]* for row maint - ryankemper@cumin2002 - T348977
00:35 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic110[0-2]* for row maint - ryankemper@cumin2002 - T348977
00:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T367781)', diff saved to https://phabricator.wikimedia.org/P66801 and previous config saved to /var/cache/conftool/dbconfig/20240718-000500-arnaudb.json

2024-07-17

23:50 mutante: phabricator (phab1004) - deployed gerrit:1054907 ; restarted apache
23:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P66800 and previous config saved to /var/cache/conftool/dbconfig/20240717-234953-arnaudb.json
23:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P66799 and previous config saved to /var/cache/conftool/dbconfig/20240717-233446-arnaudb.json
23:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T367781)', diff saved to https://phabricator.wikimedia.org/P66798 and previous config saved to /var/cache/conftool/dbconfig/20240717-231939-arnaudb.json
23:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2220 (T367781)', diff saved to https://phabricator.wikimedia.org/P66797 and previous config saved to /var/cache/conftool/dbconfig/20240717-231612-arnaudb.json
23:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2220.codfw.wmnet with reason: Maintenance
23:16 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
23:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2220.codfw.wmnet with reason: Maintenance
23:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T367781)', diff saved to https://phabricator.wikimedia.org/P66796 and previous config saved to /var/cache/conftool/dbconfig/20240717-231550-arnaudb.json
23:14 jclark@cumin1002: START - Cookbook sre.dns.netbox
23:13 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1006
23:13 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1006
23:13 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
23:13 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
23:12 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1005
23:11 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1005
23:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P66795 and previous config saved to /var/cache/conftool/dbconfig/20240717-230043-arnaudb.json
22:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P66794 and previous config saved to /var/cache/conftool/dbconfig/20240717-224536-arnaudb.json
22:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1006.eqiad.wmnet with OS bullseye
22:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1005.eqiad.wmnet with OS bullseye
22:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
22:37 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php aewikimedia "Reda Kerbouche" REDACTED --bureaucrat --sysop # T362529
22:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T367781)', diff saved to https://phabricator.wikimedia.org/P66793 and previous config saved to /var/cache/conftool/dbconfig/20240717-223028-arnaudb.json
22:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephmon1004.mgmt.eqiad.wmnet with reboot policy FORCED
22:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephmon1006.mgmt.eqiad.wmnet with reboot policy FORCED
22:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephmon1005.mgmt.eqiad.wmnet with reboot policy FORCED
22:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T367781)', diff saved to https://phabricator.wikimedia.org/P66792 and previous config saved to /var/cache/conftool/dbconfig/20240717-222701-arnaudb.json
22:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2208.codfw.wmnet with reason: Maintenance
22:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2208.codfw.wmnet with reason: Maintenance
22:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2200.codfw.wmnet with reason: Maintenance
22:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2200.codfw.wmnet with reason: Maintenance
22:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2198.codfw.wmnet with reason: Maintenance
22:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2198.codfw.wmnet with reason: Maintenance
22:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T367781)', diff saved to https://phabricator.wikimedia.org/P66791 and previous config saved to /var/cache/conftool/dbconfig/20240717-222530-arnaudb.json
22:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephmon1005.mgmt.eqiad.wmnet with reboot policy FORCED
22:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephmon1006.mgmt.eqiad.wmnet with reboot policy FORCED
22:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephmon1004.mgmt.eqiad.wmnet with reboot policy FORCED
22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephmon1004-6 - jclark@cumin1002"
22:13 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephmon1004-6 - jclark@cumin1002"
22:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P66790 and previous config saved to /var/cache/conftool/dbconfig/20240717-221023-arnaudb.json
22:07 jclark@cumin1002: START - Cookbook sre.dns.netbox
21:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P66789 and previous config saved to /var/cache/conftool/dbconfig/20240717-215516-arnaudb.json
21:51 eileen: civicrm upgraded from 1ac3e7be to 384fe444
21:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T367781)', diff saved to https://phabricator.wikimedia.org/P66788 and previous config saved to /var/cache/conftool/dbconfig/20240717-214008-arnaudb.json
21:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T367781)', diff saved to https://phabricator.wikimedia.org/P66787 and previous config saved to /var/cache/conftool/dbconfig/20240717-213641-arnaudb.json
21:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2182.codfw.wmnet with reason: Maintenance
21:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2182.codfw.wmnet with reason: Maintenance
21:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T367781)', diff saved to https://phabricator.wikimedia.org/P66786 and previous config saved to /var/cache/conftool/dbconfig/20240717-213619-arnaudb.json
away: UTC late deploys done
21:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P66785 and previous config saved to /var/cache/conftool/dbconfig/20240717-212112-arnaudb.json
21:19 tgr@deploy1002: Finished scap: Backport for skin-themes dblist is expanded to include tier 2 wikis as well as tier 1. (T367150) (duration: 16m 59s)
21:14 tgr@deploy1002: tgr, ksarabia: Continuing with sync
21:08 tgr@deploy1002: tgr, ksarabia: Backport for skin-themes dblist is expanded to include tier 2 wikis as well as tier 1. (T367150) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P66784 and previous config saved to /var/cache/conftool/dbconfig/20240717-210605-arnaudb.json
21:02 tgr@deploy1002: Started scap sync-world: Backport for skin-themes dblist is expanded to include tier 2 wikis as well as tier 1. (T367150)
21:01 tgr@deploy1002: Finished scap: Backport for SUL3: Fix URL handling for the SSO domain (T365162) (duration: 42m 33s)
20:54 tgr@deploy1002: tgr: Continuing with sync
20:53 tgr@deploy1002: tgr: Backport for SUL3: Fix URL handling for the SSO domain (T365162) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T367781)', diff saved to https://phabricator.wikimedia.org/P66783 and previous config saved to /var/cache/conftool/dbconfig/20240717-205058-arnaudb.json
20:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T367781)', diff saved to https://phabricator.wikimedia.org/P66782 and previous config saved to /var/cache/conftool/dbconfig/20240717-204731-arnaudb.json
20:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2168.codfw.wmnet with reason: Maintenance
20:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2168.codfw.wmnet with reason: Maintenance
20:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T367781)', diff saved to https://phabricator.wikimedia.org/P66781 and previous config saved to /var/cache/conftool/dbconfig/20240717-204709-arnaudb.json
20:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P66780 and previous config saved to /var/cache/conftool/dbconfig/20240717-203202-arnaudb.json
20:18 tgr@deploy1002: Started scap sync-world: Backport for SUL3: Fix URL handling for the SSO domain (T365162)
20:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P66779 and previous config saved to /var/cache/conftool/dbconfig/20240717-201655-arnaudb.json
20:14 tgr@deploy1002: Finished scap: Backport for SUL3: Fix cookie names on the SSO domain (T365162) (duration: 09m 23s)
20:12 topranks: rebooting unused switch ssw1-d8-codfw in an effort to troubleshoot gnmic errors
20:12 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-codfw,ssw1-a[1,8]-codfw.mgmt with reason: Rebooting ssw1-d8-codfw to try and fix gnmi telemetry
20:12 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on cr2-codfw,ssw1-a[1,8]-codfw.mgmt with reason: Rebooting ssw1-d8-codfw to try and fix gnmi telemetry
20:09 tgr@deploy1002: tgr: Continuing with sync
20:07 tgr@deploy1002: tgr: Backport for SUL3: Fix cookie names on the SSO domain (T365162) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:04 tgr@deploy1002: Started scap sync-world: Backport for SUL3: Fix cookie names on the SSO domain (T365162)
20:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T367781)', diff saved to https://phabricator.wikimedia.org/P66778 and previous config saved to /var/cache/conftool/dbconfig/20240717-200147-arnaudb.json
19:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T367781)', diff saved to https://phabricator.wikimedia.org/P66777 and previous config saved to /var/cache/conftool/dbconfig/20240717-195921-arnaudb.json
19:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
19:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
19:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2159.codfw.wmnet with reason: Maintenance
19:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2159.codfw.wmnet with reason: Maintenance
19:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T367781)', diff saved to https://phabricator.wikimedia.org/P66776 and previous config saved to /var/cache/conftool/dbconfig/20240717-195844-arnaudb.json
19:45 eileen: config revision changed from 85336766 to 4ea1c745
19:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P66775 and previous config saved to /var/cache/conftool/dbconfig/20240717-194337-arnaudb.json
19:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P66774 and previous config saved to /var/cache/conftool/dbconfig/20240717-192830-arnaudb.json
19:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T367781)', diff saved to https://phabricator.wikimedia.org/P66773 and previous config saved to /var/cache/conftool/dbconfig/20240717-191324-arnaudb.json
19:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T367781)', diff saved to https://phabricator.wikimedia.org/P66772 and previous config saved to /var/cache/conftool/dbconfig/20240717-191057-arnaudb.json
19:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2150.codfw.wmnet with reason: Maintenance
19:10 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2150.codfw.wmnet with reason: Maintenance
19:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T367781)', diff saved to https://phabricator.wikimedia.org/P66771 and previous config saved to /var/cache/conftool/dbconfig/20240717-191035-arnaudb.json
18:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P66770 and previous config saved to /var/cache/conftool/dbconfig/20240717-185528-arnaudb.json
18:46 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.14 refs T366959
18:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P66769 and previous config saved to /var/cache/conftool/dbconfig/20240717-184021-arnaudb.json
18:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T367781)', diff saved to https://phabricator.wikimedia.org/P66768 and previous config saved to /var/cache/conftool/dbconfig/20240717-182514-arnaudb.json
18:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T367781)', diff saved to https://phabricator.wikimedia.org/P66767 and previous config saved to /var/cache/conftool/dbconfig/20240717-182147-arnaudb.json
18:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2122.codfw.wmnet with reason: Maintenance
18:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2122.codfw.wmnet with reason: Maintenance
18:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T367781)', diff saved to https://phabricator.wikimedia.org/P66766 and previous config saved to /var/cache/conftool/dbconfig/20240717-182125-arnaudb.json
18:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.14 refs T366959
18:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P66765 and previous config saved to /var/cache/conftool/dbconfig/20240717-180617-arnaudb.json
18:01 topranks: adjust route preference for traffic to AWS on Eqiad core routers T370297
17:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P66764 and previous config saved to /var/cache/conftool/dbconfig/20240717-175110-arnaudb.json
17:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T367781)', diff saved to https://phabricator.wikimedia.org/P66763 and previous config saved to /var/cache/conftool/dbconfig/20240717-173603-arnaudb.json
17:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2121 (T367781)', diff saved to https://phabricator.wikimedia.org/P66762 and previous config saved to /var/cache/conftool/dbconfig/20240717-173336-arnaudb.json
17:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2121.codfw.wmnet with reason: Maintenance
17:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2121.codfw.wmnet with reason: Maintenance
17:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
17:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
17:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T367781)', diff saved to https://phabricator.wikimedia.org/P66761 and previous config saved to /var/cache/conftool/dbconfig/20240717-173257-arnaudb.json
17:27 mutante: removing integration.mediawiki.org from DNS - T361250
17:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P66760 and previous config saved to /var/cache/conftool/dbconfig/20240717-171750-arnaudb.json
17:13 inflatador: bking@kafka-main2005 `kafka topics --create --topic ${TOPIC} --partitions 1 --replication-factor 3; kafka configs --entity-type topics --entity-name ${TOPIC} --alter --add-config retention.ms=2592000000` T367510
17:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P66759 and previous config saved to /var/cache/conftool/dbconfig/20240717-170243-arnaudb.json
16:59 btullis@deploy1002: Finished deploy [airflow-dags/analytics@ca21d05]: (no justification provided) (duration: 00m 51s)
16:58 btullis@deploy1002: Started deploy [airflow-dags/analytics@ca21d05]: (no justification provided)
16:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T367781)', diff saved to https://phabricator.wikimedia.org/P66758 and previous config saved to /var/cache/conftool/dbconfig/20240717-164736-arnaudb.json
16:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T367781)', diff saved to https://phabricator.wikimedia.org/P66757 and previous config saved to /var/cache/conftool/dbconfig/20240717-164521-arnaudb.json
16:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1227.eqiad.wmnet with reason: Maintenance
16:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1227.eqiad.wmnet with reason: Maintenance
16:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T367781)', diff saved to https://phabricator.wikimedia.org/P66756 and previous config saved to /var/cache/conftool/dbconfig/20240717-164459-arnaudb.json
16:34 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
16:34 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
16:32 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
16:31 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
16:31 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
16:31 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
16:30 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
16:30 otto@deploy1002: Finished deploy [analytics/refinery@8f00c85] (thin): THIN [analytics/refinery@8f00c859] (duration: 04m 08s)
16:29 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
16:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P66755 and previous config saved to /var/cache/conftool/dbconfig/20240717-162952-arnaudb.json
16:26 otto@deploy1002: Started deploy [analytics/refinery@8f00c85] (thin): THIN [analytics/refinery@8f00c859]
16:21 otto@deploy1002: Finished deploy [analytics/refinery@8f00c85]: [analytics/refinery@8f00c859] (duration: 07m 59s)
16:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P66754 and previous config saved to /var/cache/conftool/dbconfig/20240717-161445-arnaudb.json
16:13 otto@deploy1002: Started deploy [analytics/refinery@8f00c85]: [analytics/refinery@8f00c859]
16:08 inflatador: bking@kafka-main1005 `kafka topics --create --topic ${TOPIC} --partitions 1 --replication-factor 3; kafka configs --entity-type topics --entity-name ${TOPIC} --alter --add-config retention.ms=2592000000` T367510
15:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T367781)', diff saved to https://phabricator.wikimedia.org/P66752 and previous config saved to /var/cache/conftool/dbconfig/20240717-155937-arnaudb.json
15:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T367781)', diff saved to https://phabricator.wikimedia.org/P66751 and previous config saved to /var/cache/conftool/dbconfig/20240717-155628-arnaudb.json
15:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1202.eqiad.wmnet with reason: Maintenance
15:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1202.eqiad.wmnet with reason: Maintenance
15:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66750 and previous config saved to /var/cache/conftool/dbconfig/20240717-155606-arnaudb.json
15:53 otto@deploy1002: Finished deploy [analytics/refinery@8f00c85] (hadoop-test): - take 2 - TEST [analytics/refinery@8f00c859] (duration: 03m 33s)
15:50 otto@deploy1002: Started deploy [analytics/refinery@8f00c85] (hadoop-test): - take 2 - TEST [analytics/refinery@8f00c859]
15:46 otto@deploy1002: Finished deploy [analytics/refinery@0b53772] (hadoop-test): TEST [analytics/refinery@0b53772e] (duration: 03m 27s)
15:42 otto@deploy1002: Started deploy [analytics/refinery@0b53772] (hadoop-test): TEST [analytics/refinery@0b53772e]
15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P66748 and previous config saved to /var/cache/conftool/dbconfig/20240717-154059-arnaudb.json
15:38 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs
15:37 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs
15:35 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs
15:35 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs
15:33 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
15:32 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
15:32 topranks: Adjust anycast route policy at Chicago Network POP cr2-eqord to announce anycast ranges T367439
15:30 sukhe: sudo cumin "A:lvs" "run-puppet-agent" to pick up apus change
15:29 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs
15:28 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs
15:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P66747 and previous config saved to /var/cache/conftool/dbconfig/20240717-152552-arnaudb.json
15:24 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
15:23 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
15:23 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
15:22 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
15:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2007.codfw.wmnet with OS bookworm
15:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
15:21 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
15:21 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) apus.discovery.wmnet on all recursors
15:20 sukhe@cumin1002: START - Cookbook sre.dns.wipe-cache apus.discovery.wmnet on all recursors
15:20 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
15:19 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
15:18 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
15:18 sukhe: running authdns-update for CR 1054346
15:16 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
15:16 sukhe: cumin 'A:dnsbox' 'run-puppet-agent': T279621
15:13 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
15:12 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
15:11 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
15:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66745 and previous config saved to /var/cache/conftool/dbconfig/20240717-151045-arnaudb.json
15:09 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
15:08 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
15:08 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
15:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66744 and previous config saved to /var/cache/conftool/dbconfig/20240717-150833-arnaudb.json
15:08 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1194.eqiad.wmnet with reason: Maintenance
15:08 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1194.eqiad.wmnet with reason: Maintenance
15:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T367781)', diff saved to https://phabricator.wikimedia.org/P66743 and previous config saved to /var/cache/conftool/dbconfig/20240717-150811-arnaudb.json
15:08 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
15:08 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
15:07 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
15:07 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
15:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2007.codfw.wmnet with reason: host reimage
14:59 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2007.codfw.wmnet with reason: host reimage
14:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P66742 and previous config saved to /var/cache/conftool/dbconfig/20240717-145303-arnaudb.json
14:46 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
14:46 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
14:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2007.codfw.wmnet with OS bookworm
14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
14:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
14:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T367856)', diff saved to https://phabricator.wikimedia.org/P66741 and previous config saved to /var/cache/conftool/dbconfig/20240717-144415-marostegui.json
14:40 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
14:40 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
14:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P66740 and previous config saved to /var/cache/conftool/dbconfig/20240717-143756-arnaudb.json
14:37 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
14:36 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
14:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P66739 and previous config saved to /var/cache/conftool/dbconfig/20240717-142908-marostegui.json
14:27 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
14:27 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
14:27 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
14:27 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
14:26 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for durum3003.esams.wmnet
14:26 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for durum3003.esams.wmnet
14:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T367781)', diff saved to https://phabricator.wikimedia.org/P66738 and previous config saved to /var/cache/conftool/dbconfig/20240717-142249-arnaudb.json
14:22 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on durum3003.esams.wmnet with reason: testing anycast-healthchecker 0.9.8
14:22 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on durum3003.esams.wmnet with reason: testing anycast-healthchecker 0.9.8
14:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2008.codfw.wmnet with OS bookworm
14:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
14:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T367781)', diff saved to https://phabricator.wikimedia.org/P66737 and previous config saved to /var/cache/conftool/dbconfig/20240717-141939-arnaudb.json
14:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1191.eqiad.wmnet with reason: Maintenance
14:19 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1191.eqiad.wmnet with reason: Maintenance
14:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T367781)', diff saved to https://phabricator.wikimedia.org/P66736 and previous config saved to /var/cache/conftool/dbconfig/20240717-141929-arnaudb.json
14:19 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:18 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
14:17 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:17 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:16 sukhe: [durum3003] upgrade anycast-healthchecker to 0.9.8-1+wmf12u1: T370068
14:16 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:14 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
14:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P66735 and previous config saved to /var/cache/conftool/dbconfig/20240717-141401-marostegui.json
14:11 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:11 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
14:11 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:07 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
14:06 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
14:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P66734 and previous config saved to /var/cache/conftool/dbconfig/20240717-140423-arnaudb.json
14:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2008.codfw.wmnet with reason: host reimage
13:59 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
13:59 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2008.codfw.wmnet with reason: host reimage
13:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T367856)', diff saved to https://phabricator.wikimedia.org/P66733 and previous config saved to /var/cache/conftool/dbconfig/20240717-135854-marostegui.json
13:56 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
13:54 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
13:54 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
13:53 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
13:53 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
13:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P66732 and previous config saved to /var/cache/conftool/dbconfig/20240717-134916-arnaudb.json
13:43 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
13:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2008.codfw.wmnet with OS bookworm
13:40 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
13:37 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
13:36 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dbproxy2008.codfw.wmnet with OS bookworm
13:34 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
13:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T367781)', diff saved to https://phabricator.wikimedia.org/P66730 and previous config saved to /var/cache/conftool/dbconfig/20240717-133408-arnaudb.json
13:33 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
13:33 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
13:29 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
13:26 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
13:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2008.codfw.wmnet with OS bookworm
13:19 urbanecm: Stop revalidateLinkRecommendations for azwiki; restart as `[urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=azwiki --olderThan=20240104000000 --verbose` instead (T370262)
13:13 urbanecm@deploy1002: Finished scap: Backport for Add Portal namespace for Ingush Wikipedia (T326089), eventbus: enable instrumentation on group 0 (T363587) (duration: 10m 06s)
13:12 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=azwiki --all --verbose # T370262
13:08 urbanecm@deploy1002: nmw03, gmodena, urbanecm: Continuing with sync
13:07 sukhe: [intentional] stop nginx.service on durum1001
13:05 urbanecm@deploy1002: nmw03, gmodena, urbanecm: Backport for Add Portal namespace for Ingush Wikipedia (T326089), eventbus: enable instrumentation on group 0 (T363587) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:03 urbanecm@deploy1002: Started scap sync-world: Backport for Add Portal namespace for Ingush Wikipedia (T326089), eventbus: enable instrumentation on group 0 (T363587)
12:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T367781)', diff saved to https://phabricator.wikimedia.org/P66729 and previous config saved to /var/cache/conftool/dbconfig/20240717-123352-arnaudb.json
12:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1181.eqiad.wmnet with reason: Maintenance
12:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1181.eqiad.wmnet with reason: Maintenance
12:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T367781)', diff saved to https://phabricator.wikimedia.org/P66728 and previous config saved to /var/cache/conftool/dbconfig/20240717-123341-arnaudb.json
12:31 urbanecm: Community configuration deployment finished
12:29 urbanecm@deploy1002: Finished scap: Backport for CommunityConfiguration: Release to all Growth wikis, except frwiktionary (T366458), dewiki: Disable CommunityConfiguration (T366458) (duration: 08m 30s)
12:24 urbanecm@deploy1002: urbanecm: Continuing with sync
12:23 urbanecm@deploy1002: urbanecm: Backport for CommunityConfiguration: Release to all Growth wikis, except frwiktionary (T366458), dewiki: Disable CommunityConfiguration (T366458) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:21 urbanecm@deploy1002: Started scap sync-world: Backport for CommunityConfiguration: Release to all Growth wikis, except frwiktionary (T366458), dewiki: Disable CommunityConfiguration (T366458)
12:19 urbanecm@deploy1002: Sync cancelled.
12:19 urbanecm: (relogging to attach to the task) migrateCommunityConfig.php finished, logs are available at https://phabricator.wikimedia.org/P66724 (T366458)
12:18 urbanecm: migrateCommunityConfig.php finished, logs are available at https://phabricator.wikimedia.org/P66724
12:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P66725 and previous config saved to /var/cache/conftool/dbconfig/20240717-121834-arnaudb.json
12:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P66723 and previous config saved to /var/cache/conftool/dbconfig/20240717-120327-arnaudb.json
11:57 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
11:54 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
11:52 urbanecm: [urbanecm@mwdebug1001 ~]$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/migrateCommunityConfig.php # T366458; output logged to migrateCommunityConfig.log in my home
11:51 urbanecm@deploy1002: urbanecm: Backport for CommunityConfiguration: Release to all Growth wikis, except frwiktionary (T366458) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:49 urbanecm@deploy1002: Started scap sync-world: Backport for CommunityConfiguration: Release to all Growth wikis, except frwiktionary (T366458)
11:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T367781)', diff saved to https://phabricator.wikimedia.org/P66722 and previous config saved to /var/cache/conftool/dbconfig/20240717-114820-arnaudb.json
11:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T367781)', diff saved to https://phabricator.wikimedia.org/P66721 and previous config saved to /var/cache/conftool/dbconfig/20240717-114510-arnaudb.json
11:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1174.eqiad.wmnet with reason: Maintenance
11:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1174.eqiad.wmnet with reason: Maintenance
11:44 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance
11:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance
11:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T367781)', diff saved to https://phabricator.wikimedia.org/P66720 and previous config saved to /var/cache/conftool/dbconfig/20240717-114426-arnaudb.json
11:40 marostegui@cumin1002: dbctl commit (dc=all): 'Increase db2136's weight - testing 10.11 T365805', diff saved to https://phabricator.wikimedia.org/P66719 and previous config saved to /var/cache/conftool/dbconfig/20240717-114032-marostegui.json
11:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T367856)', diff saved to https://phabricator.wikimedia.org/P66718 and previous config saved to /var/cache/conftool/dbconfig/20240717-113954-marostegui.json
11:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
11:39 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
11:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T367856)', diff saved to https://phabricator.wikimedia.org/P66717 and previous config saved to /var/cache/conftool/dbconfig/20240717-113932-marostegui.json
11:38 _joe_: deleted pod that was reportedly returning 5xx to the CDN for mw-api-ext
11:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P66716 and previous config saved to /var/cache/conftool/dbconfig/20240717-112919-arnaudb.json
11:27 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
11:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P66715 and previous config saved to /var/cache/conftool/dbconfig/20240717-112425-marostegui.json
11:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw2432.codfw.wmnet with reason: RAID conversion testing
11:22 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2432.codfw.wmnet with reason: RAID conversion testing
11:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P66714 and previous config saved to /var/cache/conftool/dbconfig/20240717-111412-arnaudb.json
11:12 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d8-codfw
11:10 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d8-codfw
11:10 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d7-codfw
11:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P66713 and previous config saved to /var/cache/conftool/dbconfig/20240717-110918-marostegui.json
11:08 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d7-codfw
11:08 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d6-codfw
11:05 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d6-codfw
11:05 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d5-codfw
11:03 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d5-codfw
11:03 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d4-codfw
11:01 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d4-codfw
11:01 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d3-codfw
10:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T367781)', diff saved to https://phabricator.wikimedia.org/P66712 and previous config saved to /var/cache/conftool/dbconfig/20240717-105904-arnaudb.json
10:58 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d3-codfw
10:58 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d2-codfw
10:56 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d2-codfw
10:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c7-codfw
10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T367856)', diff saved to https://phabricator.wikimedia.org/P66711 and previous config saved to /var/cache/conftool/dbconfig/20240717-105411-marostegui.json
10:53 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c7-codfw
10:53 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c6-codfw
10:51 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c6-codfw
10:51 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c5-codfw
10:49 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c5-codfw
10:49 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c4-codfw
10:46 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c4-codfw
10:46 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c3-codfw
10:44 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c3-codfw
10:44 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c2-codfw
10:41 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c2-codfw
10:41 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c1-codfw
10:39 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c1-codfw
10:39 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-d8-codfw
10:37 cmooney@cumin1002: START - Cookbook sre.network.tls for network device ssw1-d8-codfw
10:37 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-d1-codfw
10:34 cmooney@cumin1002: START - Cookbook sre.network.tls for network device ssw1-d1-codfw
10:34 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b4-magru
10:32 cmooney@cumin1002: START - Cookbook sre.network.tls for network device asw1-b4-magru
10:32 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b3-magru
10:29 cmooney@cumin1002: START - Cookbook sre.network.tls for network device asw1-b3-magru
09:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T367781)', diff saved to https://phabricator.wikimedia.org/P66710 and previous config saved to /var/cache/conftool/dbconfig/20240717-095845-arnaudb.json
09:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1170.eqiad.wmnet with reason: Maintenance
09:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1170.eqiad.wmnet with reason: Maintenance
09:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66709 and previous config saved to /var/cache/conftool/dbconfig/20240717-094412-root.json
09:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66708 and previous config saved to /var/cache/conftool/dbconfig/20240717-092907-root.json
09:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-magru
09:14 cmooney@cumin1002: START - Cookbook sre.network.tls for network device cr2-magru
09:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66706 and previous config saved to /var/cache/conftool/dbconfig/20240717-091402-root.json
09:13 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr1-magru
09:08 cmooney@cumin1002: START - Cookbook sre.network.tls for network device cr1-magru
09:02 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
08:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66705 and previous config saved to /var/cache/conftool/dbconfig/20240717-085857-root.json
08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4037.ulsfo.wmnet
08:48 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4037.ulsfo.wmnet
08:47 elukey@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
08:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66704 and previous config saved to /var/cache/conftool/dbconfig/20240717-084351-root.json
08:06 kartik@deploy1002: Finished scap: Backport for TranslatablePageState: Check if banner namespaces are configured (T370219) (duration: 14m 26s)
08:00 kartik@deploy1002: abi, kartik: Continuing with sync
07:54 kartik@deploy1002: abi, kartik: Backport for TranslatablePageState: Check if banner namespaces are configured (T370219) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:51 kartik@deploy1002: Started scap sync-world: Backport for TranslatablePageState: Check if banner namespaces are configured (T370219)
07:50 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
07:50 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
07:50 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
07:49 elukey: restart hadoop-mapreduce-historyserver.service on an-master1003 - failed due to Java OOM
07:49 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
07:38 elukey@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d1-codfw
07:37 jayme: imported helm3 3.11.3 to bullseye-wikimedia and buster-wikimedia
07:36 elukey@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d1-codfw
06:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'clear' for AS: 17072
06:48 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'clear' for AS: 17072
05:36 marostegui: Deploy schema change on s7 eqiad db1181 dbmaint T367856
05:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Long schema change
05:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1181.eqiad.wmnet with reason: Long schema change
05:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1181 T370121', diff saved to https://phabricator.wikimedia.org/P66703 and previous config saved to /var/cache/conftool/dbconfig/20240717-053359-marostegui.json
05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1236 to s7 primary and set section read-write T370121', diff saved to https://phabricator.wikimedia.org/P66702 and previous config saved to /var/cache/conftool/dbconfig/20240717-053302-root.json
05:32 marostegui@cumin1002: dbctl commit (dc=all): 'Set s7 eqiad as read-only for maintenance - T370121', diff saved to https://phabricator.wikimedia.org/P66701 and previous config saved to /var/cache/conftool/dbconfig/20240717-053230-root.json
05:32 marostegui: Starting s7 eqiad failover from db1181 to db1236 - T370121
05:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T370121
05:14 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1236 with weight 0 T370121', diff saved to https://phabricator.wikimedia.org/P66700 and previous config saved to /var/cache/conftool/dbconfig/20240717-051419-root.json
05:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 T370121
02:56 eileen: civicrm upgraded from 4f919c1e to 1ac3e7be
00:42 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
00:42 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad

2024-07-16

23:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367781)', diff saved to https://phabricator.wikimedia.org/P66699 and previous config saved to /var/cache/conftool/dbconfig/20240716-233336-arnaudb.json
23:25 cstone: civicrm upgraded from 8dbcdfb7 to 4f919c1e
23:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P66698 and previous config saved to /var/cache/conftool/dbconfig/20240716-231829-arnaudb.json
23:04 eileen: config revision changed from a1ed167f to 85336766
23:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P66697 and previous config saved to /var/cache/conftool/dbconfig/20240716-230322-arnaudb.json
22:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367781)', diff saved to https://phabricator.wikimedia.org/P66696 and previous config saved to /var/cache/conftool/dbconfig/20240716-224815-arnaudb.json
22:40 tzatziki: removing 9 files for legal compliance
22:37 eileen: civicrm upgraded from 3287ced0 to 8dbcdfb7
22:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2209 (T367781)', diff saved to https://phabricator.wikimedia.org/P66695 and previous config saved to /var/cache/conftool/dbconfig/20240716-222638-arnaudb.json
22:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2209.codfw.wmnet with reason: Maintenance
22:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2209.codfw.wmnet with reason: Maintenance
22:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66694 and previous config saved to /var/cache/conftool/dbconfig/20240716-222616-arnaudb.json
22:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P66693 and previous config saved to /var/cache/conftool/dbconfig/20240716-221109-arnaudb.json
21:59 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy2008.codfw.wmnet with OS bookworm
21:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P66692 and previous config saved to /var/cache/conftool/dbconfig/20240716-215601-arnaudb.json
21:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66691 and previous config saved to /var/cache/conftool/dbconfig/20240716-214054-arnaudb.json
21:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66690 and previous config saved to /var/cache/conftool/dbconfig/20240716-211914-arnaudb.json
21:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2194.codfw.wmnet with reason: Maintenance
21:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2194.codfw.wmnet with reason: Maintenance
21:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T367781)', diff saved to https://phabricator.wikimedia.org/P66689 and previous config saved to /var/cache/conftool/dbconfig/20240716-211852-arnaudb.json
21:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P66688 and previous config saved to /var/cache/conftool/dbconfig/20240716-210345-arnaudb.json
20:54 urbanecm@deploy1002: Finished scap: Backport for [July 16th] Enable dark mode for logged out users (tier 1) (T367150) (duration: 08m 43s)
20:49 urbanecm@deploy1002: urbanecm, jdlrobson: Continuing with sync
20:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P66687 and previous config saved to /var/cache/conftool/dbconfig/20240716-204838-arnaudb.json
20:48 urbanecm@deploy1002: urbanecm, jdlrobson: Backport for [July 16th] Enable dark mode for logged out users (tier 1) (T367150) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:45 urbanecm@deploy1002: Started scap sync-world: Backport for [July 16th] Enable dark mode for logged out users (tier 1) (T367150)
20:39 urbanecm@deploy1002: Finished scap: Backport for Ensure every test-config has valid defaults, Merge partial config with defaults (T368606), Merge partial config with defaults (T368606) (duration: 09m 55s)
20:38 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2008.codfw.wmnet with OS bookworm
20:34 urbanecm@deploy1002: urbanecm, migr: Continuing with sync
20:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T367781)', diff saved to https://phabricator.wikimedia.org/P66686 and previous config saved to /var/cache/conftool/dbconfig/20240716-203331-arnaudb.json
20:33 urbanecm@deploy1002: urbanecm, migr: Backport for Ensure every test-config has valid defaults, Merge partial config with defaults (T368606), Merge partial config with defaults (T368606) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:30 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy2008.codfw.wmnet with OS bookworm
20:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2008.codfw.wmnet with OS bookworm
20:29 urbanecm@deploy1002: Started scap sync-world: Backport for Ensure every test-config has valid defaults, Merge partial config with defaults (T368606), Merge partial config with defaults (T368606)
20:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy2008.codfw.wmnet with OS bookworm
20:14 urbanecm@deploy1002: Finished scap: Backport for foundationwiki: Restrict `unfuzzy` right to autoconfirmed users (T369979) (duration: 09m 31s)
20:12 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=appservers-ro,name=eqiad [reason: Repooling to concentrate clients in eqiad - T367949]
20:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T367781)', diff saved to https://phabricator.wikimedia.org/P66685 and previous config saved to /var/cache/conftool/dbconfig/20240716-201153-arnaudb.json
20:11 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2190.codfw.wmnet with reason: Maintenance
20:11 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2190.codfw.wmnet with reason: Maintenance
20:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367781)', diff saved to https://phabricator.wikimedia.org/P66684 and previous config saved to /var/cache/conftool/dbconfig/20240716-201131-arnaudb.json
20:09 urbanecm@deploy1002: seawolf35gerrit, urbanecm: Continuing with sync
20:09 urbanecm@deploy1002: seawolf35gerrit, urbanecm: Backport for foundationwiki: Restrict `unfuzzy` right to autoconfirmed users (T369979) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:05 urbanecm@deploy1002: Started scap sync-world: Backport for foundationwiki: Restrict `unfuzzy` right to autoconfirmed users (T369979)
19:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P66683 and previous config saved to /var/cache/conftool/dbconfig/20240716-195624-arnaudb.json
19:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P66682 and previous config saved to /var/cache/conftool/dbconfig/20240716-194117-arnaudb.json
19:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367781)', diff saved to https://phabricator.wikimedia.org/P66681 and previous config saved to /var/cache/conftool/dbconfig/20240716-192610-arnaudb.json
19:25 swfrench@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=appservers-ro,name=eqiad [reason: Depooling ahead of turndown - T367949]
19:24 swfrench-wmf: depooling appservers-ro in eqiad, which is not used by remaining analytics workloads - T367949
19:18 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
19:18 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
19:17 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
19:15 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
19:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2008.codfw.wmnet with OS bookworm
19:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T367781)', diff saved to https://phabricator.wikimedia.org/P66680 and previous config saved to /var/cache/conftool/dbconfig/20240716-190526-arnaudb.json
19:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
19:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
19:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66679 and previous config saved to /var/cache/conftool/dbconfig/20240716-190504-arnaudb.json
18:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2140 (T367856)', diff saved to https://phabricator.wikimedia.org/P66678 and previous config saved to /var/cache/conftool/dbconfig/20240716-185657-marostegui.json
18:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
18:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
18:51 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
18:50 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
18:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P66677 and previous config saved to /var/cache/conftool/dbconfig/20240716-184956-arnaudb.json
18:49 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
18:49 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
18:45 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dbproxy2007.codfw.wmnet with OS bookworm
18:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P66675 and previous config saved to /var/cache/conftool/dbconfig/20240716-183449-arnaudb.json
18:27 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2007.codfw.wmnet with OS bookworm
18:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66674 and previous config saved to /var/cache/conftool/dbconfig/20240716-181942-arnaudb.json
18:14 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.14 refs T366959
18:00 dancy@deploy1002: Installing scap version "4.92.0" for 232 hosts
17:59 otto@deploy1002: Finished deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s - take 3 [analytics/refinery@f97900c9] (duration: 00m 47s)
17:58 otto@deploy1002: Started deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s - take 3 [analytics/refinery@f97900c9]
17:58 otto@deploy1002: Finished deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s - take 2 [analytics/refinery@f97900c9] (duration: 02m 44s)
17:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66672 and previous config saved to /var/cache/conftool/dbconfig/20240716-175820-arnaudb.json
17:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
17:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
17:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
17:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
17:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367781)', diff saved to https://phabricator.wikimedia.org/P66671 and previous config saved to /var/cache/conftool/dbconfig/20240716-175742-arnaudb.json
17:55 otto@deploy1002: Started deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s - take 2 [analytics/refinery@f97900c9]
17:55 otto@deploy1002: Finished deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s [analytics/refinery@f97900c9] (duration: 08m 33s)
17:55 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
17:53 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
17:47 otto@deploy1002: Started deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s [analytics/refinery@f97900c9]
17:47 otto@deploy1002: Finished deploy [analytics/refinery@f97900c] (hadoop-test): Deploy refinery with refinery-source version 0.2.44 for mw on k8s - TEST [analytics/refinery@f97900c9] (duration: 03m 23s)
17:46 swfrench-wmf: appservers-rw and api-rw now resolve to failoid - T367949
17:44 otto@deploy1002: Started deploy [analytics/refinery@f97900c] (hadoop-test): Deploy refinery with refinery-source version 0.2.44 for mw on k8s - TEST [analytics/refinery@f97900c9]
17:44 swfrench@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=api-rw,name=eqiad [reason: Depooling ahead of turndown - T367949]
17:43 swfrench@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=appservers-rw,name=eqiad [reason: Depooling ahead of turndown - T367949]
17:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P66670 and previous config saved to /var/cache/conftool/dbconfig/20240716-174235-arnaudb.json
17:40 swfrench@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=api-ro,name=codfw [reason: Depooling ahead of turndown - T367949]
17:39 swfrench@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=appservers-ro,name=codfw [reason: Depooling ahead of turndown - T367949]
17:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P66669 and previous config saved to /var/cache/conftool/dbconfig/20240716-172727-arnaudb.json
17:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2006.codfw.wmnet with OS bookworm
17:14 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
17:12 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
17:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367781)', diff saved to https://phabricator.wikimedia.org/P66668 and previous config saved to /var/cache/conftool/dbconfig/20240716-171220-arnaudb.json
17:00 mutante: lists2001 - systemctl reset-failed after gerrit:1054610 to fix T370098
16:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2006.codfw.wmnet with reason: host reimage
16:53 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2006.codfw.wmnet with reason: host reimage
16:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T367781)', diff saved to https://phabricator.wikimedia.org/P66667 and previous config saved to /var/cache/conftool/dbconfig/20240716-165135-arnaudb.json
16:51 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
16:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
16:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 100%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66666 and previous config saved to /var/cache/conftool/dbconfig/20240716-164446-arnaudb.json
16:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 100%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66665 and previous config saved to /var/cache/conftool/dbconfig/20240716-164437-arnaudb.json
16:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 100%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66664 and previous config saved to /var/cache/conftool/dbconfig/20240716-164422-arnaudb.json
16:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2006.codfw.wmnet with OS bookworm
16:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
16:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
16:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T367781)', diff saved to https://phabricator.wikimedia.org/P66663 and previous config saved to /var/cache/conftool/dbconfig/20240716-163059-arnaudb.json
16:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 75%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66662 and previous config saved to /var/cache/conftool/dbconfig/20240716-162940-arnaudb.json
16:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 75%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66661 and previous config saved to /var/cache/conftool/dbconfig/20240716-162931-arnaudb.json
16:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 75%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66660 and previous config saved to /var/cache/conftool/dbconfig/20240716-162916-arnaudb.json
16:21 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
16:21 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge DNS franio changes (add mgmt IPs) - sukhe@cumin1002"
16:20 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge DNS franio changes (add mgmt IPs) - sukhe@cumin1002"
16:18 sukhe@cumin1002: START - Cookbook sre.dns.netbox
16:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P66659 and previous config saved to /var/cache/conftool/dbconfig/20240716-161552-arnaudb.json
16:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 50%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66658 and previous config saved to /var/cache/conftool/dbconfig/20240716-161435-arnaudb.json
16:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 50%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66657 and previous config saved to /var/cache/conftool/dbconfig/20240716-161426-arnaudb.json
16:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 50%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66656 and previous config saved to /var/cache/conftool/dbconfig/20240716-161411-arnaudb.json
16:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P66655 and previous config saved to /var/cache/conftool/dbconfig/20240716-160044-arnaudb.json
15:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 25%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66654 and previous config saved to /var/cache/conftool/dbconfig/20240716-155930-arnaudb.json
15:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 25%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66653 and previous config saved to /var/cache/conftool/dbconfig/20240716-155920-arnaudb.json
15:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 25%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66652 and previous config saved to /var/cache/conftool/dbconfig/20240716-155905-arnaudb.json
15:58 elukey: uploaded spicerack_8.7.0 to apt.wikimedia.org bullseye-wikimedia
15:52 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66651 and previous config saved to /var/cache/conftool/dbconfig/20240716-155221-root.json
15:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T367781)', diff saved to https://phabricator.wikimedia.org/P66650 and previous config saved to /var/cache/conftool/dbconfig/20240716-154537-arnaudb.json
15:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 10%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66649 and previous config saved to /var/cache/conftool/dbconfig/20240716-154424-arnaudb.json
15:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 10%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66648 and previous config saved to /var/cache/conftool/dbconfig/20240716-154415-arnaudb.json
15:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 10%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66647 and previous config saved to /var/cache/conftool/dbconfig/20240716-154401-arnaudb.json
15:39 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:39 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:37 papaul: reboot fpc0 on fasw-c-codfw.mgmt.codfw.wmnet
15:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66646 and previous config saved to /var/cache/conftool/dbconfig/20240716-153715-root.json
15:36 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:35 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
15:32 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:32 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
15:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 5%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66645 and previous config saved to /var/cache/conftool/dbconfig/20240716-152918-arnaudb.json
15:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 5%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66644 and previous config saved to /var/cache/conftool/dbconfig/20240716-152910-arnaudb.json
15:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 5%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66643 and previous config saved to /var/cache/conftool/dbconfig/20240716-152855-arnaudb.json
15:27 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=(kubernetes1062.eqiad.wmnet|mw1494.eqiad.wmnet|mw1495.eqiad.wmnet),cluster=kubernetes,service=kubesvc
15:27 claime: Uncordoning kubernetes1062.eqiad.wmnet mw1494.eqiad.wmnet mw1495.eqiad.wmnet - T365997
15:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2127 (T367781)', diff saved to https://phabricator.wikimedia.org/P66642 and previous config saved to /var/cache/conftool/dbconfig/20240716-152349-arnaudb.json
15:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2127.codfw.wmnet with reason: Maintenance
15:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2127.codfw.wmnet with reason: Maintenance
15:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66641 and previous config saved to /var/cache/conftool/dbconfig/20240716-152209-root.json
15:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
15:19 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance
15:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance
15:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367781)', diff saved to https://phabricator.wikimedia.org/P66640 and previous config saved to /var/cache/conftool/dbconfig/20240716-151516-arnaudb.json
15:08 topranks: Rebooting lsw1-f2-eqiad to complete JunOS upgrade T365997
15:08 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 21 hosts with reason: JunOS upgrade lsw1-f2-eqiad
15:07 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 21 hosts with reason: JunOS upgrade lsw1-f2-eqiad
15:07 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-f2-eqiad,lsw1-f2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f2-eqiad
15:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66638 and previous config saved to /var/cache/conftool/dbconfig/20240716-150704-root.json
15:06 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-f2-eqiad,lsw1-f2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f2-eqiad
15:06 brennen@deploy1002: Finished deploy [phabricator/deployment@7335128]: deploy phab1004 for T370109 (duration: 00m 52s)
15:05 godog: silence OtelCollectorRefusedSpans in codfw for 7d - T370043
15:05 godog: silence OtelCollectorRefusedSpans in codfw for 7d
15:05 brennen@deploy1002: Started deploy [phabricator/deployment@7335128]: deploy phab1004 for T370109
15:04 brennen@deploy1002: Finished deploy [phabricator/deployment@7335128]: test deploy phab2002 for T370109 (duration: 00m 34s)
15:04 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
15:04 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
15:04 brennen@deploy1002: Started deploy [phabricator/deployment@7335128]: test deploy phab2002 for T370109
15:02 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
15:02 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
15:01 urbanecm@deploy1002: Finished scap: Backport for Introduce Vanish Request Flow (T367329 T367726 T367728 T367729 T367744 T368177 T368285 T368368 T368372 T368611 T369489), Pass wiki id to actor store for cross-db hasPublicLogs query (T370059), Properly set automatic vanish performer on GlobalRenameUser (T368177), gerrit:1053373: Enable account vanishing in Centra
15:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P66637 and previous config saved to /var/cache/conftool/dbconfig/20240716-150007-arnaudb.json
14:53 urbanecm@deploy1002: dbrant, urbanecm: Continuing with sync
14:53 urbanecm@deploy1002: dbrant, urbanecm: Backport for Introduce Vanish Request Flow (T367329 T367726 T367728 T367729 T367744 T368177 T368285 T368368 T368372 T368611 T369489), Pass wiki id to actor store for cross-db hasPublicLogs query (T370059), Properly set automatic vanish performer on GlobalRenameUser (T368177), gerrit:1053373: Enable account vanishing in Cen
14:53 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on centrallog2002.codfw.wmnet with reason: network upgrade
14:53 filippo@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on centrallog2002.codfw.wmnet with reason: network upgrade
14:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66636 and previous config saved to /var/cache/conftool/dbconfig/20240716-145159-root.json
14:49 sukhe: [durum1001] upgrade anycast-healthchecker to 0.9.8-1+wmf12u1: T370068
14:46 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-f2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f2-eqiad
14:46 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-f2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f2-eqiad
14:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P66635 and previous config saved to /var/cache/conftool/dbconfig/20240716-144500-arnaudb.json
14:44 sukhe: reprepro -C main include bookworm-wikimedia anycast-healthchecker_0.9.8-1+wmf12u1_amd64.changes: T370068
14:36 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=(kubernetes1062.eqiad.wmnet|mw1494.eqiad.wmnet|mw1495.eqiad.wmnet),cluster=kubernetes,service=kubesvc
14:34 claime: Cordoning kubernetes1062.eqiad.wmnet mw1494.eqiad.wmnet mw1495.eqiad.wmnet - T365997
14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[1194,1200-1201].eqiad.wmnet,dbstore1009.eqiad.wmnet with reason: T365997
14:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db[1194,1200-1201].eqiad.wmnet,dbstore1009.eqiad.wmnet with reason: T365997
14:33 arnaudb@cumin1002: dbctl commit (dc=all): 'T365997 - depool db1194-s7,db1200-s5,db1201-s6', diff saved to https://phabricator.wikimedia.org/P66634 and previous config saved to /var/cache/conftool/dbconfig/20240716-143306-arnaudb.json
14:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367781)', diff saved to https://phabricator.wikimedia.org/P66633 and previous config saved to /var/cache/conftool/dbconfig/20240716-142953-arnaudb.json
14:26 urbanecm@deploy1002: Started scap sync-world: Backport for Introduce Vanish Request Flow (T367329 T367726 T367728 T367729 T367744 T368177 T368285 T368368 T368372 T368611 T369489), Pass wiki id to actor store for cross-db hasPublicLogs query (T370059), Properly set automatic vanish performer on GlobalRenameUser (T368177), gerrit:1053373: Enable account vanishing
14:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T367781)', diff saved to https://phabricator.wikimedia.org/P66632 and previous config saved to /var/cache/conftool/dbconfig/20240716-142321-arnaudb.json
14:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
14:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
14:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1212.eqiad.wmnet with reason: Maintenance
14:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1212.eqiad.wmnet with reason: Maintenance
14:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367781)', diff saved to https://phabricator.wikimedia.org/P66631 and previous config saved to /var/cache/conftool/dbconfig/20240716-142029-arnaudb.json
14:12 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
14:11 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
14:10 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
14:08 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
14:07 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
14:07 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
14:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P66630 and previous config saved to /var/cache/conftool/dbconfig/20240716-140522-arnaudb.json
14:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2432.codfw.wmnet
13:53 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2432.codfw.wmnet
13:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P66629 and previous config saved to /var/cache/conftool/dbconfig/20240716-135015-arnaudb.json
away: UTC afternoon deploys done
13:39 tgr@deploy1002: Finished scap: Backport for Handle sso.wikimedia.org domain (T365162) (duration: 19m 07s)
13:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367781)', diff saved to https://phabricator.wikimedia.org/P66628 and previous config saved to /var/cache/conftool/dbconfig/20240716-133508-arnaudb.json
13:34 tgr@deploy1002: tgr: Continuing with sync
13:29 mforns@deploy1002: Finished deploy [airflow-dags/analytics@1ee55b8]: (no justification provided) (duration: 00m 30s)
13:29 mforns@deploy1002: Started deploy [airflow-dags/analytics@1ee55b8]: (no justification provided)
13:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T367781)', diff saved to https://phabricator.wikimedia.org/P66627 and previous config saved to /var/cache/conftool/dbconfig/20240716-132915-arnaudb.json
13:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
13:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
13:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66626 and previous config saved to /var/cache/conftool/dbconfig/20240716-132853-arnaudb.json
13:22 tgr@deploy1002: tgr: Backport for Handle sso.wikimedia.org domain (T365162) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:20 tgr@deploy1002: Started scap sync-world: Backport for Handle sso.wikimedia.org domain (T365162)
13:15 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for EventStreamConfig: Enable hive ingestion for mediawiki.page-delete (T367134) (duration: 10m 15s)
13:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P66625 and previous config saved to /var/cache/conftool/dbconfig/20240716-131346-arnaudb.json
13:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 tchin, lucaswerkmeister-wmde: Continuing with sync
13:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 tchin, lucaswerkmeister-wmde: Backport for EventStreamConfig: Enable hive ingestion for mediawiki.page-delete (T367134) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for EventStreamConfig: Enable hive ingestion for mediawiki.page-delete (T367134)
12:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P66624 and previous config saved to /var/cache/conftool/dbconfig/20240716-125839-arnaudb.json
12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T367856)', diff saved to https://phabricator.wikimedia.org/P66623 and previous config saved to /var/cache/conftool/dbconfig/20240716-124604-marostegui.json
12:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
12:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367856)', diff saved to https://phabricator.wikimedia.org/P66622 and previous config saved to /var/cache/conftool/dbconfig/20240716-124543-marostegui.json
12:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66621 and previous config saved to /var/cache/conftool/dbconfig/20240716-124332-arnaudb.json
12:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P66620 and previous config saved to /var/cache/conftool/dbconfig/20240716-123035-marostegui.json
12:20 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66619 and previous config saved to /var/cache/conftool/dbconfig/20240716-122039-root.json
12:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P66618 and previous config saved to /var/cache/conftool/dbconfig/20240716-121528-marostegui.json
12:10 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.7 to netbox-next - ayounsi@cumin1002 - T336275
12:09 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.7 to netbox-next - ayounsi@cumin1002 - T336275
12:05 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66617 and previous config saved to /var/cache/conftool/dbconfig/20240716-120534-root.json
12:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367856)', diff saved to https://phabricator.wikimedia.org/P66616 and previous config saved to /var/cache/conftool/dbconfig/20240716-120021-marostegui.json
12:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T367856)', diff saved to https://phabricator.wikimedia.org/P66615 and previous config saved to /var/cache/conftool/dbconfig/20240716-120012-marostegui.json
12:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
12:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
11:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367856)', diff saved to https://phabricator.wikimedia.org/P66614 and previous config saved to /var/cache/conftool/dbconfig/20240716-115920-marostegui.json
11:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66613 and previous config saved to /var/cache/conftool/dbconfig/20240716-115028-root.json
11:49 effie: drain mw1496.eqiad.wmnet
11:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66611 and previous config saved to /var/cache/conftool/dbconfig/20240716-114315-arnaudb.json
11:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
11:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
11:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66610 and previous config saved to /var/cache/conftool/dbconfig/20240716-114254-arnaudb.json
11:35 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66608 and previous config saved to /var/cache/conftool/dbconfig/20240716-113523-root.json
11:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P66607 and previous config saved to /var/cache/conftool/dbconfig/20240716-112746-arnaudb.json
11:20 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
11:20 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
11:20 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66606 and previous config saved to /var/cache/conftool/dbconfig/20240716-112017-root.json
11:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P66605 and previous config saved to /var/cache/conftool/dbconfig/20240716-111239-arnaudb.json
11:08 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
11:07 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
11:05 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66604 and previous config saved to /var/cache/conftool/dbconfig/20240716-110512-root.json
10:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66603 and previous config saved to /var/cache/conftool/dbconfig/20240716-105732-arnaudb.json
10:53 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
10:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66602 and previous config saved to /var/cache/conftool/dbconfig/20240716-105139-arnaudb.json
10:51 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
10:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
10:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66601 and previous config saved to /var/cache/conftool/dbconfig/20240716-105117-arnaudb.json
10:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66600 and previous config saved to /var/cache/conftool/dbconfig/20240716-105006-root.json
10:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P66599 and previous config saved to /var/cache/conftool/dbconfig/20240716-103610-arnaudb.json
10:35 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
10:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P66598 and previous config saved to /var/cache/conftool/dbconfig/20240716-102103-arnaudb.json
10:10 dcausse: T362529: creating aewikimedia CirrusSearch indices with 'mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=aewikimedia --cluster=all'
10:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66597 and previous config saved to /var/cache/conftool/dbconfig/20240716-100556-arnaudb.json
10:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66595 and previous config saved to /var/cache/conftool/dbconfig/20240716-100002-arnaudb.json
09:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
09:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
09:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66594 and previous config saved to /var/cache/conftool/dbconfig/20240716-095939-arnaudb.json
09:54 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
09:53 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
09:52 isaranto@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
09:52 isaranto@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
09:50 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
09:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P66593 and previous config saved to /var/cache/conftool/dbconfig/20240716-094432-arnaudb.json
09:44 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
09:42 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
09:39 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
09:37 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
09:37 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
09:32 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database aewikimedia (T362529)
09:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P66592 and previous config saved to /var/cache/conftool/dbconfig/20240716-092924-arnaudb.json
09:23 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
09:20 godog: bounce benthos@mw_accesslog_sampler - T369256
09:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66591 and previous config saved to /var/cache/conftool/dbconfig/20240716-091418-arnaudb.json
09:12 elukey: update docker-registry to 0.0.14-1 on build2001
09:12 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
09:12 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
09:12 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
09:11 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
09:11 elukey: update docker-report to 0.0.14-1 on bullseye-wikimedia
09:06 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database aewikimedia (T362529)
09:04 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
09:03 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
09:03 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
09:03 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
09:03 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
09:02 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
08:50 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
08:32 godog: root@kafka-logging1001:~# kafka topics --alter --topic mediawiki.httpd.accesslog --partitions 12 - T369256
08:31 marostegui: Clone dbstore1008:3317 from db1174 T370122
08:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Long schema change
08:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Long schema change
08:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P66589 and previous config saved to /var/cache/conftool/dbconfig/20240716-082727-root.json
08:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66588 and previous config saved to /var/cache/conftool/dbconfig/20240716-082213-root.json
08:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66587 and previous config saved to /var/cache/conftool/dbconfig/20240716-081401-arnaudb.json
08:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance
08:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance
08:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66586 and previous config saved to /var/cache/conftool/dbconfig/20240716-081129-root.json
08:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
08:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
08:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66585 and previous config saved to /var/cache/conftool/dbconfig/20240716-080720-root.json
08:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66584 and previous config saved to /var/cache/conftool/dbconfig/20240716-080707-root.json
07:46 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1006.eqiad.wmnet
07:40 Dreamy_Jazz: Morning UTC backport window done
07:38 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-serve1006.eqiad.wmnet
07:38 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
07:29 Dreamy_Jazz: Restarted MediaModeration scanning script
07:28 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
07:19 dreamyjazz@deploy1002: Finished scap: Backport for [CheckUser] Remove wgCheckUserEventTablesMigrationStage config (T366546) (duration: 12m 09s)
07:14 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
07:14 dreamyjazz@deploy1002: dreamyjazz: Backport for [CheckUser] Remove wgCheckUserEventTablesMigrationStage config (T366546) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:13 volans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:13 volans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Merging pending changes for frack hosts as per IRC discussion - volans@cumin1002"
07:10 volans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Merging pending changes for frack hosts as per IRC discussion - volans@cumin1002"
07:07 dreamyjazz@deploy1002: Started scap sync-world: Backport for [CheckUser] Remove wgCheckUserEventTablesMigrationStage config (T366546)
07:07 volans@cumin1002: START - Cookbook sre.dns.netbox
06:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 52999
06:59 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 52999
06:18 kart_: Updated cxserver to 2024-07-15-100650-production (T354666)
06:16 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
06:16 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
06:12 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
06:12 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
06:11 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
06:11 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
06:06 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
06:05 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
05:43 marostegui: Deploy schema change on s7 eqiad db1174 dbmaint T367856
05:43 marostegui: Deploy schema change on s3 eqiad db1157 dbmaint T367856
05:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Long schema change
05:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Long schema change
05:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Long schema change
05:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Long schema change
05:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1157 T370019', diff saved to https://phabricator.wikimedia.org/P66581 and previous config saved to /var/cache/conftool/dbconfig/20240716-051718-root.json
05:15 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write T370019', diff saved to https://phabricator.wikimedia.org/P66580 and previous config saved to /var/cache/conftool/dbconfig/20240716-051538-root.json
05:15 marostegui@cumin1002: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T370019', diff saved to https://phabricator.wikimedia.org/P66579 and previous config saved to /var/cache/conftool/dbconfig/20240716-051516-root.json
05:15 marostegui: Starting s3 eqiad failover from db1157 to db1223 - T370019
04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1223 with weight 0 T370019', diff saved to https://phabricator.wikimedia.org/P66578 and previous config saved to /var/cache/conftool/dbconfig/20240716-045839-root.json
04:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Long schema change
04:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Long schema change
04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P66577 and previous config saved to /var/cache/conftool/dbconfig/20240716-045807-marostegui.json
04:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 T370019
04:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s3 T370019
04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.11 (duration: 00m 58s)
03:53 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.14 refs T366959 (duration: 50m 56s)
03:03 mwpresync@deploy1002: Started scap sync-world: testwikis wikis to 1.43.0-wmf.14 refs T366959
02:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T367781)', diff saved to https://phabricator.wikimedia.org/P66576 and previous config saved to /var/cache/conftool/dbconfig/20240716-025545-arnaudb.json
02:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P66575 and previous config saved to /var/cache/conftool/dbconfig/20240716-024038-arnaudb.json
02:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P66574 and previous config saved to /var/cache/conftool/dbconfig/20240716-022531-arnaudb.json
02:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T367781)', diff saved to https://phabricator.wikimedia.org/P66573 and previous config saved to /var/cache/conftool/dbconfig/20240716-021023-arnaudb.json
02:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T367781)', diff saved to https://phabricator.wikimedia.org/P66572 and previous config saved to /var/cache/conftool/dbconfig/20240716-020751-arnaudb.json
02:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2207.codfw.wmnet with reason: Maintenance
02:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2207.codfw.wmnet with reason: Maintenance
01:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
01:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
01:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66570 and previous config saved to /var/cache/conftool/dbconfig/20240716-012125-arnaudb.json
01:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P66569 and previous config saved to /var/cache/conftool/dbconfig/20240716-010618-arnaudb.json
00:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P66568 and previous config saved to /var/cache/conftool/dbconfig/20240716-005111-arnaudb.json
00:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66567 and previous config saved to /var/cache/conftool/dbconfig/20240716-003604-arnaudb.json
00:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66566 and previous config saved to /var/cache/conftool/dbconfig/20240716-003331-arnaudb.json
00:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2189.codfw.wmnet with reason: Maintenance
00:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2189.codfw.wmnet with reason: Maintenance
00:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66565 and previous config saved to /var/cache/conftool/dbconfig/20240716-003310-arnaudb.json
00:26 zabe: zabe@mwmaint1002:/tmp/upload$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Trade . # T369998
00:22 zabe: zabe@mwmaint1002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiktionary --logwiki=metawiki 'Dodo cham' 'Le GlitcheurHD' # T369777
00:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P66564 and previous config saved to /var/cache/conftool/dbconfig/20240716-001802-arnaudb.json
00:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P66563 and previous config saved to /var/cache/conftool/dbconfig/20240716-000255-arnaudb.json

2024-07-15

23:54 zabe@deploy1002: Finished scap: Backport for Further configurations for aewikimedia (T362529) (duration: 12m 26s)
23:49 zabe@deploy1002: zabe: Continuing with sync
23:48 zabe: zabe@mwmaint1002:~$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php aewikimedia translate # T362529
23:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66562 and previous config saved to /var/cache/conftool/dbconfig/20240715-234748-arnaudb.json
23:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66561 and previous config saved to /var/cache/conftool/dbconfig/20240715-234516-arnaudb.json
23:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2175.codfw.wmnet with reason: Maintenance
23:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2175.codfw.wmnet with reason: Maintenance
23:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367781)', diff saved to https://phabricator.wikimedia.org/P66560 and previous config saved to /var/cache/conftool/dbconfig/20240715-234454-arnaudb.json
23:44 zabe@deploy1002: zabe: Backport for Further configurations for aewikimedia (T362529) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:42 zabe@deploy1002: Started scap sync-world: Backport for Further configurations for aewikimedia (T362529)
23:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P66559 and previous config saved to /var/cache/conftool/dbconfig/20240715-232947-arnaudb.json
23:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P66558 and previous config saved to /var/cache/conftool/dbconfig/20240715-231440-arnaudb.json
23:11 logmsgbot: nshahquinn-wmf@deploy1002 Finished deploy [airflow-dags/analytics_product@767d7ad]: (no justification provided) (duration: 00m 08s)
23:11 logmsgbot: nshahquinn-wmf@deploy1002 Started deploy [airflow-dags/analytics_product@767d7ad]: (no justification provided)
22:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367781)', diff saved to https://phabricator.wikimedia.org/P66557 and previous config saved to /var/cache/conftool/dbconfig/20240715-225933-arnaudb.json
22:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T367781)', diff saved to https://phabricator.wikimedia.org/P66556 and previous config saved to /var/cache/conftool/dbconfig/20240715-225701-arnaudb.json
22:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2148.codfw.wmnet with reason: Maintenance
22:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2148.codfw.wmnet with reason: Maintenance
22:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367781)', diff saved to https://phabricator.wikimedia.org/P66555 and previous config saved to /var/cache/conftool/dbconfig/20240715-225639-arnaudb.json
22:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P66554 and previous config saved to /var/cache/conftool/dbconfig/20240715-224131-arnaudb.json
22:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P66553 and previous config saved to /var/cache/conftool/dbconfig/20240715-222624-arnaudb.json
22:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367781)', diff saved to https://phabricator.wikimedia.org/P66552 and previous config saved to /var/cache/conftool/dbconfig/20240715-221117-arnaudb.json
22:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T367781)', diff saved to https://phabricator.wikimedia.org/P66551 and previous config saved to /var/cache/conftool/dbconfig/20240715-220845-arnaudb.json
22:08 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2138.codfw.wmnet with reason: Maintenance
22:08 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2138.codfw.wmnet with reason: Maintenance
22:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367781)', diff saved to https://phabricator.wikimedia.org/P66550 and previous config saved to /var/cache/conftool/dbconfig/20240715-220823-arnaudb.json
21:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P66549 and previous config saved to /var/cache/conftool/dbconfig/20240715-215316-arnaudb.json
21:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P66548 and previous config saved to /var/cache/conftool/dbconfig/20240715-213809-arnaudb.json
21:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367781)', diff saved to https://phabricator.wikimedia.org/P66547 and previous config saved to /var/cache/conftool/dbconfig/20240715-212302-arnaudb.json
21:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T367781)', diff saved to https://phabricator.wikimedia.org/P66546 and previous config saved to /var/cache/conftool/dbconfig/20240715-212034-arnaudb.json
21:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
21:20 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
21:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2126.codfw.wmnet with reason: Maintenance
21:20 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2126.codfw.wmnet with reason: Maintenance
21:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367781)', diff saved to https://phabricator.wikimedia.org/P66545 and previous config saved to /var/cache/conftool/dbconfig/20240715-211957-arnaudb.json
21:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P66544 and previous config saved to /var/cache/conftool/dbconfig/20240715-210451-arnaudb.json
20:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P66543 and previous config saved to /var/cache/conftool/dbconfig/20240715-204944-arnaudb.json
20:39 catrope@deploy1002: Finished scap: Backport for Revert changes in log levels, Revert "Change Linter log level to info" (duration: 07m 41s)
20:35 catrope@deploy1002: arlolra, catrope: Continuing with sync
20:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367781)', diff saved to https://phabricator.wikimedia.org/P66542 and previous config saved to /var/cache/conftool/dbconfig/20240715-203435-arnaudb.json
20:34 catrope@deploy1002: arlolra, catrope: Backport for Revert changes in log levels, Revert "Change Linter log level to info" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
20:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
20:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T367856)', diff saved to https://phabricator.wikimedia.org/P66541 and previous config saved to /var/cache/conftool/dbconfig/20240715-203233-marostegui.json
20:32 catrope@deploy1002: Started scap sync-world: Backport for Revert changes in log levels, Revert "Change Linter log level to info"
20:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T367781)', diff saved to https://phabricator.wikimedia.org/P66540 and previous config saved to /var/cache/conftool/dbconfig/20240715-203203-arnaudb.json
20:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2125.codfw.wmnet with reason: Maintenance
20:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2125.codfw.wmnet with reason: Maintenance
20:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
20:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
20:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T367781)', diff saved to https://phabricator.wikimedia.org/P66539 and previous config saved to /var/cache/conftool/dbconfig/20240715-203120-arnaudb.json
20:29 catrope@deploy1002: Finished scap: Backport for [July 15th] Deploy dark mode to all logged-in users (T368795) (duration: 10m 26s)
20:24 catrope@deploy1002: jdlrobson, catrope: Continuing with sync
20:22 catrope@deploy1002: jdlrobson, catrope: Backport for [July 15th] Deploy dark mode to all logged-in users (T368795) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:19 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there are no relevant criticals in Icinga, and Grafana looks good
20:18 catrope@deploy1002: Started scap sync-world: Backport for [July 15th] Deploy dark mode to all logged-in users (T368795)
20:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P66538 and previous config saved to /var/cache/conftool/dbconfig/20240715-201726-marostegui.json
20:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P66537 and previous config saved to /var/cache/conftool/dbconfig/20240715-201613-arnaudb.json
20:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P66536 and previous config saved to /var/cache/conftool/dbconfig/20240715-200218-marostegui.json
20:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P66535 and previous config saved to /var/cache/conftool/dbconfig/20240715-200106-arnaudb.json
19:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66534 and previous config saved to /var/cache/conftool/dbconfig/20240715-195510-root.json
19:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66533 and previous config saved to /var/cache/conftool/dbconfig/20240715-195459-root.json
19:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T367856)', diff saved to https://phabricator.wikimedia.org/P66532 and previous config saved to /var/cache/conftool/dbconfig/20240715-194711-marostegui.json
19:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T367781)', diff saved to https://phabricator.wikimedia.org/P66531 and previous config saved to /var/cache/conftool/dbconfig/20240715-194559-arnaudb.json
19:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T367781)', diff saved to https://phabricator.wikimedia.org/P66530 and previous config saved to /var/cache/conftool/dbconfig/20240715-194344-arnaudb.json
19:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1246.eqiad.wmnet with reason: Maintenance
19:43 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1246.eqiad.wmnet with reason: Maintenance
19:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance
19:43 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance
19:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T367781)', diff saved to https://phabricator.wikimedia.org/P66529 and previous config saved to /var/cache/conftool/dbconfig/20240715-194257-arnaudb.json
19:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66528 and previous config saved to /var/cache/conftool/dbconfig/20240715-194004-root.json
19:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66527 and previous config saved to /var/cache/conftool/dbconfig/20240715-193953-root.json
19:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P66526 and previous config saved to /var/cache/conftool/dbconfig/20240715-192750-arnaudb.json
19:25 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@9ad2bec]: 0.3.144 (duration: 08m 31s)
19:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66525 and previous config saved to /var/cache/conftool/dbconfig/20240715-192458-root.json
19:24 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic109[8-9]* for T348977 - bking@cumin2002
19:24 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic109[8-9]* for T348977 - bking@cumin2002
19:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66524 and previous config saved to /var/cache/conftool/dbconfig/20240715-192448-root.json
19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1098-1099].eqiad.wmnet with reason: T348977
19:23 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1098-1099].eqiad.wmnet with reason: T348977
19:17 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.144` on canary `wdqs1016`; proceeding to rest of fleet
19:16 ryankemper@deploy1002: Started deploy [wdqs/wdqs@9ad2bec]: 0.3.144
19:16 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.144`. Pre-deploy tests passing on canary `wdqs1016`
19:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P66523 and previous config saved to /var/cache/conftool/dbconfig/20240715-191243-arnaudb.json
19:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66522 and previous config saved to /var/cache/conftool/dbconfig/20240715-190953-root.json
19:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66521 and previous config saved to /var/cache/conftool/dbconfig/20240715-190942-root.json
18:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T367781)', diff saved to https://phabricator.wikimedia.org/P66520 and previous config saved to /var/cache/conftool/dbconfig/20240715-185736-arnaudb.json
18:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T367781)', diff saved to https://phabricator.wikimedia.org/P66519 and previous config saved to /var/cache/conftool/dbconfig/20240715-185521-arnaudb.json
18:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1233.eqiad.wmnet with reason: Maintenance
18:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1233.eqiad.wmnet with reason: Maintenance
18:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T367781)', diff saved to https://phabricator.wikimedia.org/P66518 and previous config saved to /var/cache/conftool/dbconfig/20240715-185459-arnaudb.json
18:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66517 and previous config saved to /var/cache/conftool/dbconfig/20240715-185447-root.json
18:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66516 and previous config saved to /var/cache/conftool/dbconfig/20240715-185437-root.json
18:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P66515 and previous config saved to /var/cache/conftool/dbconfig/20240715-183952-arnaudb.json
18:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66514 and previous config saved to /var/cache/conftool/dbconfig/20240715-183942-root.json
18:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66513 and previous config saved to /var/cache/conftool/dbconfig/20240715-183931-root.json
18:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P66512 and previous config saved to /var/cache/conftool/dbconfig/20240715-182444-arnaudb.json
18:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66511 and previous config saved to /var/cache/conftool/dbconfig/20240715-182436-root.json
18:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66510 and previous config saved to /var/cache/conftool/dbconfig/20240715-182426-root.json
18:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T367781)', diff saved to https://phabricator.wikimedia.org/P66509 and previous config saved to /var/cache/conftool/dbconfig/20240715-180937-arnaudb.json
18:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T367781)', diff saved to https://phabricator.wikimedia.org/P66508 and previous config saved to /var/cache/conftool/dbconfig/20240715-180726-arnaudb.json
18:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1229.eqiad.wmnet with reason: Maintenance
18:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1229.eqiad.wmnet with reason: Maintenance
18:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
18:06 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
18:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T367781)', diff saved to https://phabricator.wikimedia.org/P66507 and previous config saved to /var/cache/conftool/dbconfig/20240715-180640-arnaudb.json
18:04 herron: upgraded prometheus-ipmi-exporter to 1.8.0 T368088
17:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P66506 and previous config saved to /var/cache/conftool/dbconfig/20240715-175133-arnaudb.json
17:41 mnz@deploy1002: Finished deploy [airflow-dags/research@5121748]: (no justification provided) (duration: 00m 10s)
17:40 mnz@deploy1002: Started deploy [airflow-dags/research@5121748]: (no justification provided)
17:38 ejegg: Fundraising python tools upgraded from 94bac5c6 to 490a7b3f
17:37 ejegg: SmashPig upgraded from 565c61e4 to f2aca230
17:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P66505 and previous config saved to /var/cache/conftool/dbconfig/20240715-173625-arnaudb.json
17:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T367781)', diff saved to https://phabricator.wikimedia.org/P66504 and previous config saved to /var/cache/conftool/dbconfig/20240715-172118-arnaudb.json
17:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T367781)', diff saved to https://phabricator.wikimedia.org/P66503 and previous config saved to /var/cache/conftool/dbconfig/20240715-171908-arnaudb.json
17:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1197.eqiad.wmnet with reason: Maintenance
17:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1197.eqiad.wmnet with reason: Maintenance
17:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T367781)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240715-171841-arnaudb.json
17:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P66501 and previous config saved to /var/cache/conftool/dbconfig/20240715-170334-arnaudb.json
16:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P66500 and previous config saved to /var/cache/conftool/dbconfig/20240715-164827-arnaudb.json
16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T367781)', diff saved to https://phabricator.wikimedia.org/P66499 and previous config saved to /var/cache/conftool/dbconfig/20240715-163320-arnaudb.json
16:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T367781)', diff saved to https://phabricator.wikimedia.org/P66498 and previous config saved to /var/cache/conftool/dbconfig/20240715-163110-arnaudb.json
16:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1188.eqiad.wmnet with reason: Maintenance
16:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1188.eqiad.wmnet with reason: Maintenance
16:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T367781)', diff saved to https://phabricator.wikimedia.org/P66497 and previous config saved to /var/cache/conftool/dbconfig/20240715-163048-arnaudb.json
16:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P66496 and previous config saved to /var/cache/conftool/dbconfig/20240715-161541-arnaudb.json
16:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P66495 and previous config saved to /var/cache/conftool/dbconfig/20240715-160033-arnaudb.json
15:47 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
15:47 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
15:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T367781)', diff saved to https://phabricator.wikimedia.org/P66494 and previous config saved to /var/cache/conftool/dbconfig/20240715-154526-arnaudb.json
15:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T367781)', diff saved to https://phabricator.wikimedia.org/P66493 and previous config saved to /var/cache/conftool/dbconfig/20240715-154312-arnaudb.json
15:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1182.eqiad.wmnet with reason: Maintenance
15:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1182.eqiad.wmnet with reason: Maintenance
15:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T367781)', diff saved to https://phabricator.wikimedia.org/P66492 and previous config saved to /var/cache/conftool/dbconfig/20240715-154250-arnaudb.json
15:32 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netbox2003.codfw.wmnet with reason: netbox upgrade prep work
15:31 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netbox2003.codfw.wmnet with reason: netbox upgrade prep work
15:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P66491 and previous config saved to /var/cache/conftool/dbconfig/20240715-152742-arnaudb.json
15:17 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
15:16 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
15:16 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
15:14 mnz@deploy1002: Finished deploy [airflow-dags/research@5121748]: (no justification provided) (duration: 00m 31s)
15:13 mnz@deploy1002: Started deploy [airflow-dags/research@5121748]: (no justification provided)
15:13 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
15:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P66490 and previous config saved to /var/cache/conftool/dbconfig/20240715-151235-arnaudb.json
15:12 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
15:12 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
15:09 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
15:07 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
14:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T367781)', diff saved to https://phabricator.wikimedia.org/P66489 and previous config saved to /var/cache/conftool/dbconfig/20240715-145728-arnaudb.json
14:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T367781)', diff saved to https://phabricator.wikimedia.org/P66488 and previous config saved to /var/cache/conftool/dbconfig/20240715-145517-arnaudb.json
14:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1162.eqiad.wmnet with reason: Maintenance
14:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1162.eqiad.wmnet with reason: Maintenance
14:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66487 and previous config saved to /var/cache/conftool/dbconfig/20240715-145455-arnaudb.json
14:50 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
14:50 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
14:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P66486 and previous config saved to /var/cache/conftool/dbconfig/20240715-143948-arnaudb.json
14:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P66485 and previous config saved to /var/cache/conftool/dbconfig/20240715-142441-arnaudb.json
14:16 _joe_: updating conftool to 3.1.0 fleet wide
14:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2005.codfw.wmnet with OS bookworm
14:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66484 and previous config saved to /var/cache/conftool/dbconfig/20240715-140934-arnaudb.json
14:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66483 and previous config saved to /var/cache/conftool/dbconfig/20240715-140720-arnaudb.json
14:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
14:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
14:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1156.eqiad.wmnet with reason: Maintenance
14:06 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1156.eqiad.wmnet with reason: Maintenance
13:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2005.codfw.wmnet with reason: host reimage
13:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2005.codfw.wmnet with reason: host reimage
13:53 oblivian@puppetmaster2001: conftool action : set/pooled=yes; selector: name=mw1386.*,cluster=kubernetes,dc=eqiad [reason: Test conftool sal logging]
13:51 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
13:51 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
13:50 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netboxdb2003.codfw.wmnet with reason: netbox upgrade prep work
13:50 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netboxdb2003.codfw.wmnet with reason: netbox upgrade prep work
13:45 _joe_: uploading conftool 3.1.0 to bookworm,bullseye,buster
13:41 Lucas_WMDE: UTC afternoon backport+config window done
13:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2005.codfw.wmnet with OS bookworm
13:33 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Add entity-schema to $wgWBRepoSettings['searchIndexTypes'] (T369495) (duration: 30m 51s)
13:25 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
13:15 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Add entity-schema to $wgWBRepoSettings['searchIndexTypes'] (T369495) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:02 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Add entity-schema to $wgWBRepoSettings['searchIndexTypes'] (T369495)
12:41 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
12:41 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
12:41 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
12:40 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
12:30 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
12:30 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
12:30 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
12:30 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
12:16 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
12:15 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
11:32 marostegui: test
11:31 marostegui: Reboot stashbot
11:25 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
11:24 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
11:11 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
11:11 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
11:11 claime: Increasing webVideoTranscodePrioritized concurrency in changeprop-jobqueue
11:09 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
11:08 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
11:08 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
10:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T367856)', diff saved to https://phabricator.wikimedia.org/P66480 and previous config saved to /var/cache/conftool/dbconfig/20240715-102117-marostegui.json
10:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
10:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
09:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52999
09:59 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 52999
09:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 270361
09:58 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 270361
09:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262293
09:58 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 262293
09:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61941
09:57 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61941
09:56 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 49544
09:54 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 49544
09:29 claime: manually removing mw1349.eqiad.wmnet mw1350.eqiad.wmnet mw1351.eqiad.wmnet from k8s following reimage to videoscalers - T351074
09:25 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
09:22 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
09:19 marostegui: Deploy schema change on s7 eqiad db1170 dbmaint T367856
09:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Long schema change
09:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Long schema change
09:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T367856)', diff saved to https://phabricator.wikimedia.org/P66479 and previous config saved to /var/cache/conftool/dbconfig/20240715-091800-marostegui.json
09:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
09:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
09:16 elukey@cumin1002: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device lsw1-d3-codfw
09:15 marostegui: Deploy schema change on s7 codfw db2121 dbmaint T367856
09:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Long schema change
09:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Long schema change
09:14 elukey@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d3-codfw
09:05 volans@cumin1002: dbctl commit (dc=all): 'Depool db2121 T369882', diff saved to https://phabricator.wikimedia.org/P66478 and previous config saved to /var/cache/conftool/dbconfig/20240715-090532-volans.json
08:56 volans@cumin1002: dbctl commit (dc=all): 'Promote db2218 to s7 primary T369882', diff saved to https://phabricator.wikimedia.org/P66477 and previous config saved to /var/cache/conftool/dbconfig/20240715-085654-volans.json
08:51 volans: Starting s7 codfw failover from db2121 to db2218 - T369882
08:33 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp2004.wikimedia.org
08:33 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp2004.wikimedia.org with OS bookworm
08:22 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 52468
08:21 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 52468
08:16 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp2004.wikimedia.org with reason: host reimage
08:13 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp2004.wikimedia.org with reason: host reimage
08:12 volans@cumin2002: dbctl commit (dc=all): 'Remove db2218 from API T369882', diff saved to https://phabricator.wikimedia.org/P66475 and previous config saved to /var/cache/conftool/dbconfig/20240715-081252-volans.json
08:09 volans@cumin2002: dbctl commit (dc=all): 'Set db2218 with weight 0 T369882', diff saved to https://phabricator.wikimedia.org/P66474 and previous config saved to /var/cache/conftool/dbconfig/20240715-080948-volans.json
08:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 T369882
08:04 volans@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 28 hosts with reason: Primary switchover s7 T369882
07:58 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp2004.wikimedia.org - slyngshede@cumin1002"
07:57 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp2004.wikimedia.org - slyngshede@cumin1002"
07:57 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp2004.wikimedia.org on all recursors
07:57 slyngshede@cumin1002: START - Cookbook sre.dns.wipe-cache idp2004.wikimedia.org on all recursors
07:57 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:57 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp2004.wikimedia.org - slyngshede@cumin1002"
07:55 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp2004.wikimedia.org - slyngshede@cumin1002"
07:53 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
07:53 slyngshede@cumin1002: START - Cookbook sre.ganeti.makevm for new host idp2004.wikimedia.org
07:36 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp1004.wikimedia.org
07:36 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp1004.wikimedia.org with OS bookworm
07:21 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp1004.wikimedia.org with reason: host reimage
07:17 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp1004.wikimedia.org with reason: host reimage
07:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1179.eqiad.wmnet with reason: T369855
07:17 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1179.eqiad.wmnet with reason: T369855
07:06 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp1004.wikimedia.org with OS bookworm
07:05 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp1004.wikimedia.org - slyngshede@cumin1002"
07:04 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp1004.wikimedia.org - slyngshede@cumin1002"
07:04 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp1004.wikimedia.org on all recursors
07:04 slyngshede@cumin1002: START - Cookbook sre.dns.wipe-cache idp1004.wikimedia.org on all recursors
07:04 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:04 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp1004.wikimedia.org - slyngshede@cumin1002"
07:03 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp1004.wikimedia.org - slyngshede@cumin1002"
07:01 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
07:00 slyngshede@cumin1002: START - Cookbook sre.ganeti.makevm for new host idp1004.wikimedia.org
06:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repool db2136', diff saved to https://phabricator.wikimedia.org/P66473 and previous config saved to /var/cache/conftool/dbconfig/20240715-062216-root.json
06:07 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
06:07 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
06:07 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
06:06 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
06:06 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
06:06 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
05:12 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host dbproxy2005.codfw.wmnet with OS bookworm
04:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2137 (T367856)', diff saved to https://phabricator.wikimedia.org/P66472 and previous config saved to /var/cache/conftool/dbconfig/20240715-044723-marostegui.json
04:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
04:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
04:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
04:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove IPV6 for dbproxy200[5-8] - pt1979@cumin2002"
04:12 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove IPV6 for dbproxy200[5-8] - pt1979@cumin2002"
04:09 pt1979@cumin2002: START - Cookbook sre.dns.netbox
02:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
02:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
02:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T367856)', diff saved to https://phabricator.wikimedia.org/P66471 and previous config saved to /var/cache/conftool/dbconfig/20240715-021121-marostegui.json
01:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P66470 and previous config saved to /var/cache/conftool/dbconfig/20240715-015613-marostegui.json
01:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P66469 and previous config saved to /var/cache/conftool/dbconfig/20240715-014106-marostegui.json
01:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T367856)', diff saved to https://phabricator.wikimedia.org/P66467 and previous config saved to /var/cache/conftool/dbconfig/20240715-012559-marostegui.json

2024-07-14

22:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T367856)', diff saved to https://phabricator.wikimedia.org/P66466 and previous config saved to /var/cache/conftool/dbconfig/20240714-223146-marostegui.json
22:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
22:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
22:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T367856)', diff saved to https://phabricator.wikimedia.org/P66465 and previous config saved to /var/cache/conftool/dbconfig/20240714-223124-marostegui.json
22:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P66464 and previous config saved to /var/cache/conftool/dbconfig/20240714-221617-marostegui.json
22:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P66463 and previous config saved to /var/cache/conftool/dbconfig/20240714-220110-marostegui.json
21:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T367856)', diff saved to https://phabricator.wikimedia.org/P66462 and previous config saved to /var/cache/conftool/dbconfig/20240714-214603-marostegui.json
17:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66461 and previous config saved to /var/cache/conftool/dbconfig/20240714-175827-root.json
17:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66460 and previous config saved to /var/cache/conftool/dbconfig/20240714-174322-root.json
17:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66459 and previous config saved to /var/cache/conftool/dbconfig/20240714-172816-root.json
17:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66458 and previous config saved to /var/cache/conftool/dbconfig/20240714-171311-root.json
16:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66457 and previous config saved to /var/cache/conftool/dbconfig/20240714-165805-root.json
16:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66456 and previous config saved to /var/cache/conftool/dbconfig/20240714-164300-root.json
16:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
16:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
16:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66455 and previous config saved to /var/cache/conftool/dbconfig/20240714-162755-root.json
14:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T367856)', diff saved to https://phabricator.wikimedia.org/P66454 and previous config saved to /var/cache/conftool/dbconfig/20240714-140046-marostegui.json
14:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
14:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
14:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T367856)', diff saved to https://phabricator.wikimedia.org/P66453 and previous config saved to /var/cache/conftool/dbconfig/20240714-140024-marostegui.json
13:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P66452 and previous config saved to /var/cache/conftool/dbconfig/20240714-134517-marostegui.json
13:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P66451 and previous config saved to /var/cache/conftool/dbconfig/20240714-133010-marostegui.json
13:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T367856)', diff saved to https://phabricator.wikimedia.org/P66450 and previous config saved to /var/cache/conftool/dbconfig/20240714-131502-marostegui.json
09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T367856)', diff saved to https://phabricator.wikimedia.org/P66449 and previous config saved to /var/cache/conftool/dbconfig/20240714-093540-marostegui.json
09:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
09:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T367856)', diff saved to https://phabricator.wikimedia.org/P66448 and previous config saved to /var/cache/conftool/dbconfig/20240714-093518-marostegui.json
09:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P66447 and previous config saved to /var/cache/conftool/dbconfig/20240714-092011-marostegui.json
09:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P66446 and previous config saved to /var/cache/conftool/dbconfig/20240714-090504-marostegui.json
08:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T367856)', diff saved to https://phabricator.wikimedia.org/P66445 and previous config saved to /var/cache/conftool/dbconfig/20240714-084956-marostegui.json
08:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T367856)', diff saved to https://phabricator.wikimedia.org/P66444 and previous config saved to /var/cache/conftool/dbconfig/20240714-084903-marostegui.json
08:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
08:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T367856)', diff saved to https://phabricator.wikimedia.org/P66443 and previous config saved to /var/cache/conftool/dbconfig/20240714-054611-marostegui.json
05:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
05:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
05:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T367856)', diff saved to https://phabricator.wikimedia.org/P66442 and previous config saved to /var/cache/conftool/dbconfig/20240714-054549-marostegui.json
05:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P66441 and previous config saved to /var/cache/conftool/dbconfig/20240714-053042-marostegui.json
05:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P66440 and previous config saved to /var/cache/conftool/dbconfig/20240714-051535-marostegui.json
05:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T367856)', diff saved to https://phabricator.wikimedia.org/P66439 and previous config saved to /var/cache/conftool/dbconfig/20240714-050027-marostegui.json
01:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T367856)', diff saved to https://phabricator.wikimedia.org/P66438 and previous config saved to /var/cache/conftool/dbconfig/20240714-015901-marostegui.json
01:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
01:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
01:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T367856)', diff saved to https://phabricator.wikimedia.org/P66437 and previous config saved to /var/cache/conftool/dbconfig/20240714-015838-marostegui.json
01:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P66436 and previous config saved to /var/cache/conftool/dbconfig/20240714-014331-marostegui.json
01:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P66435 and previous config saved to /var/cache/conftool/dbconfig/20240714-012824-marostegui.json
01:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T367856)', diff saved to https://phabricator.wikimedia.org/P66434 and previous config saved to /var/cache/conftool/dbconfig/20240714-011317-marostegui.json
00:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T367856)', diff saved to https://phabricator.wikimedia.org/P66433 and previous config saved to /var/cache/conftool/dbconfig/20240714-001301-marostegui.json
00:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
00:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance

2024-07-13

15:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
15:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
15:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T367856)', diff saved to https://phabricator.wikimedia.org/P66432 and previous config saved to /var/cache/conftool/dbconfig/20240713-155158-marostegui.json
15:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P66431 and previous config saved to /var/cache/conftool/dbconfig/20240713-153650-marostegui.json
15:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P66430 and previous config saved to /var/cache/conftool/dbconfig/20240713-152143-marostegui.json
15:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T367856)', diff saved to https://phabricator.wikimedia.org/P66429 and previous config saved to /var/cache/conftool/dbconfig/20240713-150636-marostegui.json
14:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T367856)', diff saved to https://phabricator.wikimedia.org/P66428 and previous config saved to /var/cache/conftool/dbconfig/20240713-140620-marostegui.json
14:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
14:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
13:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
13:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
10:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
10:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
10:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
10:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
06:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
06:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T367856)', diff saved to https://phabricator.wikimedia.org/P66427 and previous config saved to /var/cache/conftool/dbconfig/20240713-061928-marostegui.json
06:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P66426 and previous config saved to /var/cache/conftool/dbconfig/20240713-060421-marostegui.json
05:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P66425 and previous config saved to /var/cache/conftool/dbconfig/20240713-054913-marostegui.json
05:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T367856)', diff saved to https://phabricator.wikimedia.org/P66424 and previous config saved to /var/cache/conftool/dbconfig/20240713-053406-marostegui.json
01:33 tzatziki: removing 2 files for legal compliance
01:22 tzatziki: removing 16 files for legal compliance
00:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T367856)', diff saved to https://phabricator.wikimedia.org/P66423 and previous config saved to /var/cache/conftool/dbconfig/20240713-000433-marostegui.json

2024-07-12

23:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P66422 and previous config saved to /var/cache/conftool/dbconfig/20240712-234926-marostegui.json
23:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P66421 and previous config saved to /var/cache/conftool/dbconfig/20240712-233419-marostegui.json
23:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T367856)', diff saved to https://phabricator.wikimedia.org/P66420 and previous config saved to /var/cache/conftool/dbconfig/20240712-231912-marostegui.json
22:34 tzatziki: removing 1 file for legal compliance
22:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1223 (T367856)', diff saved to https://phabricator.wikimedia.org/P66419 and previous config saved to /var/cache/conftool/dbconfig/20240712-223226-marostegui.json
22:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
22:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
22:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367856)', diff saved to https://phabricator.wikimedia.org/P66418 and previous config saved to /var/cache/conftool/dbconfig/20240712-223204-marostegui.json
22:21 tzatziki: removing 1 file for legal compliance
22:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P66417 and previous config saved to /var/cache/conftool/dbconfig/20240712-221656-marostegui.json
22:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P66416 and previous config saved to /var/cache/conftool/dbconfig/20240712-220149-marostegui.json
21:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367856)', diff saved to https://phabricator.wikimedia.org/P66415 and previous config saved to /var/cache/conftool/dbconfig/20240712-214642-marostegui.json
19:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T367856)', diff saved to https://phabricator.wikimedia.org/P66414 and previous config saved to /var/cache/conftool/dbconfig/20240712-190224-marostegui.json
19:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
19:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
19:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
19:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
19:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367856)', diff saved to https://phabricator.wikimedia.org/P66413 and previous config saved to /var/cache/conftool/dbconfig/20240712-190154-marostegui.json
18:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P66412 and previous config saved to /var/cache/conftool/dbconfig/20240712-184647-marostegui.json
18:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P66411 and previous config saved to /var/cache/conftool/dbconfig/20240712-183140-marostegui.json
18:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367856)', diff saved to https://phabricator.wikimedia.org/P66410 and previous config saved to /var/cache/conftool/dbconfig/20240712-181632-marostegui.json
17:10 hnowlan@cumin1002: conftool action : set/pooled=yes:weight=10; selector: name=(mw1349.eqiad.wmnet|mw1350.eqiad.wmnet|mw1351.eqiad.wmnet)
17:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1349.eqiad.wmnet
17:07 hnowlan@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1349.eqiad.wmnet
17:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw[1350-1351].eqiad.wmnet
17:07 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw[1350-1351].eqiad.wmnet
17:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1351.eqiad.wmnet with OS buster
17:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1350.eqiad.wmnet with OS buster
17:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1349.eqiad.wmnet with OS buster
16:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1351.eqiad.wmnet with reason: host reimage
16:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1350.eqiad.wmnet with reason: host reimage
16:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1349.eqiad.wmnet with reason: host reimage
16:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1351.eqiad.wmnet with reason: host reimage
16:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1350.eqiad.wmnet with reason: host reimage
16:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1349.eqiad.wmnet with reason: host reimage
16:17 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
16:16 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
16:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1351.eqiad.wmnet with OS buster
16:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1350.eqiad.wmnet with OS buster
16:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1349.eqiad.wmnet with OS buster
16:05 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=(mw1349|mw1350|mw1351).eqiad.wmnet,cluster=(jobrunner|videoscaler)
16:05 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=(mw1349|mw1350|mw1351).eqiad.wmnet,cluster=(jobrunner|videoscaler)
16:04 claime: pooling mw1349, mw1350, mw1351 as jobrunners
16:03 cgoubert@cumin1002: conftool action : set/pooled=no:weight=10; selector: name=(mw1349|mw1350|mw1351).eqiad.wmnet,cluster=(jobrunner|videoscaler)
16:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1351.eqiad.wmnet with OS buster
16:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1350.eqiad.wmnet
16:01 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1350.eqiad.wmnet
16:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1349.eqiad.wmnet
16:00 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1349.eqiad.wmnet
15:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1350.eqiad.wmnet with OS buster
15:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1349.eqiad.wmnet with OS buster
15:57 cgoubert@cumin1002: conftool action : set/pooled=no:weight=10; selector: name=(mw1349|mw1350|mw1351).eqiad.wmnet,cluster=jobrunner
15:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest2001.codfw.wmnet
15:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2220 (T367856)', diff saved to https://phabricator.wikimedia.org/P66408 and previous config saved to /var/cache/conftool/dbconfig/20240712-154954-marostegui.json
15:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
15:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
15:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T367856)', diff saved to https://phabricator.wikimedia.org/P66407 and previous config saved to /var/cache/conftool/dbconfig/20240712-154921-marostegui.json
15:47 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
15:47 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
15:46 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
15:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest2001.codfw.wmnet
15:46 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
15:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P66406 and previous config saved to /var/cache/conftool/dbconfig/20240712-153414-marostegui.json
15:33 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
15:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1351.eqiad.wmnet with reason: host reimage
15:32 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
15:26 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
15:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1350.eqiad.wmnet with reason: host reimage
15:25 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
15:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1349.eqiad.wmnet with reason: host reimage
15:21 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1351.eqiad.wmnet with reason: host reimage
15:21 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1350.eqiad.wmnet with reason: host reimage
15:21 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1349.eqiad.wmnet with reason: host reimage
15:20 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
15:20 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
15:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P66405 and previous config saved to /var/cache/conftool/dbconfig/20240712-151907-marostegui.json
15:17 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
15:17 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
15:17 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
15:17 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
15:15 hnowlan: homer 'cr*eqiad*' commit 'videoscaler reimages mw1349/mw135[01]'
15:08 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
15:07 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
15:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1351.eqiad.wmnet with OS buster
15:06 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1350.eqiad.wmnet with OS buster
15:06 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1349.eqiad.wmnet with OS buster
15:04 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
15:04 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
15:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T367856)', diff saved to https://phabricator.wikimedia.org/P66404 and previous config saved to /var/cache/conftool/dbconfig/20240712-150400-marostegui.json
15:03 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
15:02 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
14:58 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=(mw1349.eqiad.wmnet|mw1350.eqiad.wmnet|mw1351.eqiad.wmnet),cluster=kubernetes,service=kubesvc
14:55 claime: Draining and depooling mw1349, mw1350, mw1351 for reimage as jobrunners
14:36 elukey@cumin1002: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device lsw1-d3-codfw
14:34 elukey@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d3-codfw
14:20 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
14:19 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
14:19 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
14:18 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
13:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:43 pt1979@cumin2002: START - Cookbook sre.dns.netbox
13:22 cdanis@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
13:21 cdanis@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
13:21 cdanis@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
13:21 cdanis@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
13:19 cdanis@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
13:18 cdanis@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
13:18 cdanis@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
13:12 cdanis@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
13:10 topranks: pushing updated BGP policy to cr2-eqord and cr2-eqdfw to announce Anycast ranges from network pops (T367439)
10:24 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: stopping T367781', diff saved to https://phabricator.wikimedia.org/P66396 and previous config saved to /var/cache/conftool/dbconfig/20240712-102416-arnaudb.json
10:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T367856)', diff saved to https://phabricator.wikimedia.org/P66395 and previous config saved to /var/cache/conftool/dbconfig/20240712-102243-marostegui.json
10:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
10:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
10:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367856)', diff saved to https://phabricator.wikimedia.org/P66394 and previous config saved to /var/cache/conftool/dbconfig/20240712-102221-marostegui.json
10:18 godog: stop benthos@webrequest_live on centrallog2002 and start it on centrallog1002 - T369737
10:09 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: stopping T367781', diff saved to https://phabricator.wikimedia.org/P66393 and previous config saved to /var/cache/conftool/dbconfig/20240712-100910-arnaudb.json
10:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P66392 and previous config saved to /var/cache/conftool/dbconfig/20240712-100714-marostegui.json
09:54 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: stopping T367781', diff saved to https://phabricator.wikimedia.org/P66391 and previous config saved to /var/cache/conftool/dbconfig/20240712-095405-arnaudb.json
09:53 godog: temp stop benthos@webrequest_live on centrallog1002 - T369737
09:52 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
09:52 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
09:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P66389 and previous config saved to /var/cache/conftool/dbconfig/20240712-095207-marostegui.json
09:39 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: stopping T367781', diff saved to https://phabricator.wikimedia.org/P66388 and previous config saved to /var/cache/conftool/dbconfig/20240712-093900-arnaudb.json
09:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367856)', diff saved to https://phabricator.wikimedia.org/P66387 and previous config saved to /var/cache/conftool/dbconfig/20240712-093700-marostegui.json
09:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: stopping T367781', diff saved to https://phabricator.wikimedia.org/P66386 and previous config saved to /var/cache/conftool/dbconfig/20240712-092354-arnaudb.json
09:20 dcausse@deploy1002: Finished scap: Backport for Re-add CirrusSearch prefix to statsd metrics (T359033) (duration: 09m 44s)
09:15 dcausse@deploy1002: dcausse: Continuing with sync
09:13 dcausse@deploy1002: dcausse: Backport for Re-add CirrusSearch prefix to statsd metrics (T359033) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:10 dcausse@deploy1002: Started scap sync-world: Backport for Re-add CirrusSearch prefix to statsd metrics (T359033)
09:10 elukey: upgrade httpd version in production (bullseye/bookworm) for T369885
09:08 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: stopping T367781', diff saved to https://phabricator.wikimedia.org/P66385 and previous config saved to /var/cache/conftool/dbconfig/20240712-090849-arnaudb.json
09:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T367781)', diff saved to https://phabricator.wikimedia.org/P66384 and previous config saved to /var/cache/conftool/dbconfig/20240712-090527-arnaudb.json
09:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
09:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
09:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1158.eqiad.wmnet with reason: Maintenance
09:04 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1158.eqiad.wmnet with reason: Maintenance
08:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db1179.eqiad.wmnet with reason: T369855
08:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db1179.eqiad.wmnet with reason: T369855
08:42 godog: tweak benthos@webrequest_live output batching on centrallog2001 - T369737
08:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T367856)', diff saved to https://phabricator.wikimedia.org/P66383 and previous config saved to /var/cache/conftool/dbconfig/20240712-083644-marostegui.json
08:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
08:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
08:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T367856)', diff saved to https://phabricator.wikimedia.org/P66382 and previous config saved to /var/cache/conftool/dbconfig/20240712-083621-marostegui.json
08:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P66381 and previous config saved to /var/cache/conftool/dbconfig/20240712-082114-marostegui.json
08:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P66380 and previous config saved to /var/cache/conftool/dbconfig/20240712-080607-marostegui.json
07:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T367856)', diff saved to https://phabricator.wikimedia.org/P66379 and previous config saved to /var/cache/conftool/dbconfig/20240712-075100-marostegui.json
07:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2218 (T367856)', diff saved to https://phabricator.wikimedia.org/P66377 and previous config saved to /var/cache/conftool/dbconfig/20240712-073102-marostegui.json
07:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
07:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T367856)', diff saved to https://phabricator.wikimedia.org/P66376 and previous config saved to /var/cache/conftool/dbconfig/20240712-073040-marostegui.json
07:30 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
07:24 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
07:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P66375 and previous config saved to /var/cache/conftool/dbconfig/20240712-071533-marostegui.json
07:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P66374 and previous config saved to /var/cache/conftool/dbconfig/20240712-070026-marostegui.json
06:37 Dreamy_Jazz: Starting MediaModeration scan on commons after it crashed last night due to database issues - https://wikitech.wikimedia.org/wiki/MediaModeration
06:18 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66372 and previous config saved to /var/cache/conftool/dbconfig/20240712-061835-root.json
06:03 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66371 and previous config saved to /var/cache/conftool/dbconfig/20240712-060329-root.json
05:48 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66370 and previous config saved to /var/cache/conftool/dbconfig/20240712-054824-root.json
05:33 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66369 and previous config saved to /var/cache/conftool/dbconfig/20240712-053318-root.json
05:18 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66368 and previous config saved to /var/cache/conftool/dbconfig/20240712-051813-root.json
05:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136', diff saved to https://phabricator.wikimedia.org/P66367 and previous config saved to /var/cache/conftool/dbconfig/20240712-050800-root.json
05:03 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66366 and previous config saved to /var/cache/conftool/dbconfig/20240712-050307-root.json
04:48 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66365 and previous config saved to /var/cache/conftool/dbconfig/20240712-044802-root.json
03:52 ayounsi@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host netboxdb2003.codfw.wmnet
03:52 ayounsi@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host netboxdb2003.codfw.wmnet with OS bookworm
00:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T367856)', diff saved to https://phabricator.wikimedia.org/P66364 and previous config saved to /var/cache/conftool/dbconfig/20240712-000131-marostegui.json
00:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
00:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
00:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367856)', diff saved to https://phabricator.wikimedia.org/P66363 and previous config saved to /var/cache/conftool/dbconfig/20240712-000109-marostegui.json

2024-07-11

23:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P66362 and previous config saved to /var/cache/conftool/dbconfig/20240711-234602-marostegui.json
23:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T367781)', diff saved to https://phabricator.wikimedia.org/P66361 and previous config saved to /var/cache/conftool/dbconfig/20240711-233712-arnaudb.json
23:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P66360 and previous config saved to /var/cache/conftool/dbconfig/20240711-233054-marostegui.json
23:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T367856)', diff saved to https://phabricator.wikimedia.org/P66359 and previous config saved to /var/cache/conftool/dbconfig/20240711-232218-marostegui.json
23:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
23:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P66358 and previous config saved to /var/cache/conftool/dbconfig/20240711-232205-arnaudb.json
23:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
23:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367856)', diff saved to https://phabricator.wikimedia.org/P66357 and previous config saved to /var/cache/conftool/dbconfig/20240711-231547-marostegui.json
23:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P66356 and previous config saved to /var/cache/conftool/dbconfig/20240711-230657-arnaudb.json
23:06 zabe@deploy1002: Finished scap: update interwiki cache (duration: 07m 37s)
22:59 zabe@deploy1002: Started scap sync-world: update interwiki cache
22:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T367781)', diff saved to https://phabricator.wikimedia.org/P66355 and previous config saved to /var/cache/conftool/dbconfig/20240711-225150-arnaudb.json
22:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2213 (T367781)', diff saved to https://phabricator.wikimedia.org/P66354 and previous config saved to /var/cache/conftool/dbconfig/20240711-224858-arnaudb.json
22:48 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2213.codfw.wmnet with reason: Maintenance
22:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2213.codfw.wmnet with reason: Maintenance
22:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367781)', diff saved to https://phabricator.wikimedia.org/P66353 and previous config saved to /var/cache/conftool/dbconfig/20240711-224836-arnaudb.json
22:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P66352 and previous config saved to /var/cache/conftool/dbconfig/20240711-223329-arnaudb.json
22:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
22:27 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove IPV6 for dbproxy200[5-8] - pt1979@cumin2002"
22:26 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove IPV6 for dbproxy200[5-8] - pt1979@cumin2002"
22:23 pt1979@cumin2002: START - Cookbook sre.dns.netbox
22:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P66351 and previous config saved to /var/cache/conftool/dbconfig/20240711-221822-arnaudb.json
22:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367781)', diff saved to https://phabricator.wikimedia.org/P66350 and previous config saved to /var/cache/conftool/dbconfig/20240711-220315-arnaudb.json
21:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T367781)', diff saved to https://phabricator.wikimedia.org/P66349 and previous config saved to /var/cache/conftool/dbconfig/20240711-215921-arnaudb.json
21:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2211.codfw.wmnet with reason: Maintenance
21:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2211.codfw.wmnet with reason: Maintenance
21:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2201.codfw.wmnet with reason: Maintenance
21:57 rzl: systemctl restart apache2 on mwdebug1002, mwdebug2001, mwdebug2002 for https://gerrit.wikimedia.org/r/1052128
21:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2201.codfw.wmnet with reason: Maintenance
21:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367781)', diff saved to https://phabricator.wikimedia.org/P66348 and previous config saved to /var/cache/conftool/dbconfig/20240711-215700-arnaudb.json
21:44 rzl: rzl@mwdebug1002:~$ sudo apache2ctl restart
21:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P66347 and previous config saved to /var/cache/conftool/dbconfig/20240711-214153-arnaudb.json
21:38 jhathaway: upgrading exim4 to 4.94.2-7+deb11u3
21:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P66346 and previous config saved to /var/cache/conftool/dbconfig/20240711-212646-arnaudb.json
21:13 catrope@deploy1002: Finished scap: Backport for Change Linter log level to info (duration: 14m 40s)
21:11 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
21:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367781)', diff saved to https://phabricator.wikimedia.org/P66345 and previous config saved to /var/cache/conftool/dbconfig/20240711-211138-arnaudb.json
21:11 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
21:08 catrope@deploy1002: arlolra, catrope: Continuing with sync
21:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T367781)', diff saved to https://phabricator.wikimedia.org/P66344 and previous config saved to /var/cache/conftool/dbconfig/20240711-210747-arnaudb.json
21:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2192.codfw.wmnet with reason: Maintenance
21:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2192.codfw.wmnet with reason: Maintenance
21:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367781)', diff saved to https://phabricator.wikimedia.org/P66343 and previous config saved to /var/cache/conftool/dbconfig/20240711-210725-arnaudb.json
21:05 catrope@deploy1002: arlolra, catrope: Backport for Change Linter log level to info synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:59 catrope@deploy1002: Started scap sync-world: Backport for Change Linter log level to info
20:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P66342 and previous config saved to /var/cache/conftool/dbconfig/20240711-205218-arnaudb.json
20:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P66341 and previous config saved to /var/cache/conftool/dbconfig/20240711-203711-arnaudb.json
20:37 catrope@deploy1002: Finished scap: Backport for Vector theme should default to day (T369833) (duration: 17m 09s)
20:32 catrope@deploy1002: jdlrobson, catrope: Continuing with sync
20:30 catrope@deploy1002: jdlrobson, catrope: Backport for Vector theme should default to day (T369833) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2005.codfw.wmnet with OS bookworm
20:28 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
20:26 eileen: config revision changed from 540f27e6 to c25da839 re-enable silverpop_daily
20:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367781)', diff saved to https://phabricator.wikimedia.org/P66340 and previous config saved to /var/cache/conftool/dbconfig/20240711-202204-arnaudb.json
20:19 catrope@deploy1002: Started scap sync-world: Backport for Vector theme should default to day (T369833)
20:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T367781)', diff saved to https://phabricator.wikimedia.org/P66339 and previous config saved to /var/cache/conftool/dbconfig/20240711-201815-arnaudb.json
20:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2178.codfw.wmnet with reason: Maintenance
20:17 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2178.codfw.wmnet with reason: Maintenance
20:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367781)', diff saved to https://phabricator.wikimedia.org/P66338 and previous config saved to /var/cache/conftool/dbconfig/20240711-201753-arnaudb.json
20:15 catrope@deploy1002: Finished scap: Backport for Graph: Fix JSON parse errors in Graph data source tracking (duration: 13m 32s)
20:10 catrope@deploy1002: catrope: Continuing with sync
20:08 catrope@deploy1002: catrope: Backport for Graph: Fix JSON parse errors in Graph data source tracking synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P66337 and previous config saved to /var/cache/conftool/dbconfig/20240711-200246-arnaudb.json
20:01 catrope@deploy1002: Started scap sync-world: Backport for Graph: Fix JSON parse errors in Graph data source tracking
19:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P66336 and previous config saved to /var/cache/conftool/dbconfig/20240711-194739-arnaudb.json
19:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367781)', diff saved to https://phabricator.wikimedia.org/P66335 and previous config saved to /var/cache/conftool/dbconfig/20240711-193231-arnaudb.json
19:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T367781)', diff saved to https://phabricator.wikimedia.org/P66334 and previous config saved to /var/cache/conftool/dbconfig/20240711-192842-arnaudb.json
19:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2171.codfw.wmnet with reason: Maintenance
19:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2171.codfw.wmnet with reason: Maintenance
19:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66333 and previous config saved to /var/cache/conftool/dbconfig/20240711-192820-arnaudb.json
19:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
19:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P66332 and previous config saved to /var/cache/conftool/dbconfig/20240711-191313-arnaudb.json
19:12 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
19:11 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
19:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2005.codfw.wmnet with reason: host reimage
19:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2005.codfw.wmnet with reason: host reimage
18:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P66331 and previous config saved to /var/cache/conftool/dbconfig/20240711-185805-arnaudb.json
18:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2005.codfw.wmnet with OS bookworm
18:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66330 and previous config saved to /var/cache/conftool/dbconfig/20240711-184258-arnaudb.json
18:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66329 and previous config saved to /var/cache/conftool/dbconfig/20240711-184009-arnaudb.json
18:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
18:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
18:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367781)', diff saved to https://phabricator.wikimedia.org/P66328 and previous config saved to /var/cache/conftool/dbconfig/20240711-183946-arnaudb.json
18:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P66327 and previous config saved to /var/cache/conftool/dbconfig/20240711-182438-arnaudb.json
18:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=99) on VRTS host vrts1001.eqiad.wmnet
18:15 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet
18:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P66326 and previous config saved to /var/cache/conftool/dbconfig/20240711-180931-arnaudb.json
18:00 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=93) on VRTS host vrts1001.eqiad.wmnet
18:00 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet
17:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367781)', diff saved to https://phabricator.wikimedia.org/P66325 and previous config saved to /var/cache/conftool/dbconfig/20240711-175424-arnaudb.json
17:52 daniel@deploy1002: Finished scap: Backport for Enable Special:RestSandbox on testwiki (T362006) (duration: 11m 01s)
17:52 rzl@cumin2002: dbctl commit (dc=all): 'db1179 depooled', diff saved to https://phabricator.wikimedia.org/P66324 and previous config saved to /var/cache/conftool/dbconfig/20240711-175212-rzl.json
17:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T367781)', diff saved to https://phabricator.wikimedia.org/P66322 and previous config saved to /var/cache/conftool/dbconfig/20240711-175038-arnaudb.json
17:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
17:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
17:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2128.codfw.wmnet with reason: Maintenance
17:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2128.codfw.wmnet with reason: Maintenance
17:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
17:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
17:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance
17:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance
17:48 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1216.eqiad.wmnet with reason: Maintenance
17:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1216.eqiad.wmnet with reason: Maintenance
17:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367781)', diff saved to https://phabricator.wikimedia.org/P66321 and previous config saved to /var/cache/conftool/dbconfig/20240711-174820-arnaudb.json
17:47 daniel@deploy1002: daniel: Continuing with sync
17:46 daniel@deploy1002: daniel: Backport for Enable Special:RestSandbox on testwiki (T362006) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:41 daniel@deploy1002: Started scap sync-world: Backport for Enable Special:RestSandbox on testwiki (T362006)
17:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P66319 and previous config saved to /var/cache/conftool/dbconfig/20240711-173313-arnaudb.json
17:28 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host deploy1003.eqiad.wmnet with OS bullseye
17:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P66318 and previous config saved to /var/cache/conftool/dbconfig/20240711-171806-arnaudb.json
17:10 daniel@deploy1002: Started scap sync-world: Backport for Enable Special:RestSandbox on testwiki (T362006)
17:10 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
17:09 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
17:09 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
17:08 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
17:07 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
17:07 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
17:06 mutante: puppetmaster1001 - puppet cert clean aphlict..discovery.wmnet T369796 T360413
17:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367781)', diff saved to https://phabricator.wikimedia.org/P66317 and previous config saved to /var/cache/conftool/dbconfig/20240711-170258-arnaudb.json
17:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T367781)', diff saved to https://phabricator.wikimedia.org/P66316 and previous config saved to /var/cache/conftool/dbconfig/20240711-170030-arnaudb.json
17:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1213.eqiad.wmnet with reason: Maintenance
17:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1213.eqiad.wmnet with reason: Maintenance
17:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367781)', diff saved to https://phabricator.wikimedia.org/P66315 and previous config saved to /var/cache/conftool/dbconfig/20240711-170007-arnaudb.json
16:58 mutante: puppetmaster1001 - puppet cert clean phabricator.discovery.wmnet T369796 T360413
16:58 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
16:58 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
16:46 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
16:46 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
16:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P66314 and previous config saved to /var/cache/conftool/dbconfig/20240711-164500-arnaudb.json
16:40 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
16:40 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
16:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P66313 and previous config saved to /var/cache/conftool/dbconfig/20240711-162953-arnaudb.json
16:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367781)', diff saved to https://phabricator.wikimedia.org/P66312 and previous config saved to /var/cache/conftool/dbconfig/20240711-161446-arnaudb.json
16:13 ejegg: payments-wiki upgraded from 4e48059a to c8edeb8e
16:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T367781)', diff saved to https://phabricator.wikimedia.org/P66311 and previous config saved to /var/cache/conftool/dbconfig/20240711-161219-arnaudb.json
16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1210.eqiad.wmnet with reason: Maintenance
16:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1210.eqiad.wmnet with reason: Maintenance
16:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367781)', diff saved to https://phabricator.wikimedia.org/P66310 and previous config saved to /var/cache/conftool/dbconfig/20240711-161157-arnaudb.json
16:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
16:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
15:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P66309 and previous config saved to /var/cache/conftool/dbconfig/20240711-155649-arnaudb.json
15:53 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
15:52 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
15:51 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
15:51 arnaudb@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 100%: post T365996 repool', diff saved to https://phabricator.wikimedia.org/P66308 and previous config saved to /var/cache/conftool/dbconfig/20240711-155109-arnaudb.json
15:48 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P66307 and previous config saved to /var/cache/conftool/dbconfig/20240711-154142-arnaudb.json
15:41 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
15:40 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
15:36 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bullseye
15:36 arnaudb@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 75%: post T365996 repool', diff saved to https://phabricator.wikimedia.org/P66306 and previous config saved to /var/cache/conftool/dbconfig/20240711-153604-arnaudb.json
15:31 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
15:30 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
15:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T367856)', diff saved to https://phabricator.wikimedia.org/P66305 and previous config saved to /var/cache/conftool/dbconfig/20240711-152946-marostegui.json
15:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
15:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367781)', diff saved to https://phabricator.wikimedia.org/P66304 and previous config saved to /var/cache/conftool/dbconfig/20240711-152635-arnaudb.json
15:26 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
15:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T367781)', diff saved to https://phabricator.wikimedia.org/P66303 and previous config saved to /var/cache/conftool/dbconfig/20240711-152412-arnaudb.json
15:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
15:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
15:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367781)', diff saved to https://phabricator.wikimedia.org/P66302 and previous config saved to /var/cache/conftool/dbconfig/20240711-152350-arnaudb.json
15:22 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
15:22 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
15:22 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
15:21 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
15:21 arnaudb@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 50%: post T365996 repool', diff saved to https://phabricator.wikimedia.org/P66301 and previous config saved to /var/cache/conftool/dbconfig/20240711-152058-arnaudb.json
15:20 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
15:20 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
15:17 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
15:13 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
15:13 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
15:12 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
15:12 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
15:12 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
15:11 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
15:11 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
15:11 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
15:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P66300 and previous config saved to /var/cache/conftool/dbconfig/20240711-150843-arnaudb.json
15:05 arnaudb@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 25%: post T365996 repool', diff saved to https://phabricator.wikimedia.org/P66299 and previous config saved to /var/cache/conftool/dbconfig/20240711-150553-arnaudb.json
15:03 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
15:01 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
15:00 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
14:59 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
14:55 Emperor: repool ms-fe1014 and thanos-fe1004 before switch work T365996
14:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P66298 and previous config saved to /var/cache/conftool/dbconfig/20240711-145336-arnaudb.json
14:50 arnaudb@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 10%: post T365996 repool', diff saved to https://phabricator.wikimedia.org/P66297 and previous config saved to /var/cache/conftool/dbconfig/20240711-145047-arnaudb.json
14:43 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
14:42 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
14:42 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
14:42 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
14:41 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
14:40 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
14:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367781)', diff saved to https://phabricator.wikimedia.org/P66296 and previous config saved to /var/cache/conftool/dbconfig/20240711-143829-arnaudb.json
14:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T367781)', diff saved to https://phabricator.wikimedia.org/P66295 and previous config saved to /var/cache/conftool/dbconfig/20240711-143606-arnaudb.json
14:35 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
14:35 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
14:35 arnaudb@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 5%: post T365996 repool', diff saved to https://phabricator.wikimedia.org/P66294 and previous config saved to /var/cache/conftool/dbconfig/20240711-143541-arnaudb.json
14:35 godog: pool titan1001 for switch work T365996
14:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on backup1011.eqiad.wmnet,db1193.eqiad.wmnet,dbproxy1027.eqiad.wmnet with reason: T365996
14:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on backup1011.eqiad.wmnet,db1193.eqiad.wmnet,dbproxy1027.eqiad.wmnet with reason: T365996
14:25 arnaudb@cumin1002: dbctl commit (dc=all): 'T365996 - depool db1193 - s8', diff saved to https://phabricator.wikimedia.org/P66293 and previous config saved to /var/cache/conftool/dbconfig/20240711-142544-arnaudb.json
14:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P66292 and previous config saved to /var/cache/conftool/dbconfig/20240711-142037-arnaudb.json
14:19 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 23 hosts with reason: JunOS upgrade lsw1-f1-eqiad
14:19 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 23 hosts with reason: JunOS upgrade lsw1-f1-eqiad
14:15 topranks: rebooting lsw1-f1-eqiad to install updated JunOS version T365996
14:12 godog: depool titan1001 for switch work T365996
14:12 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 23 hosts with reason: JunOS upgrade lsw1-f1-eqiad
14:12 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 23 hosts with reason: JunOS upgrade lsw1-f1-eqiad
14:09 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-f1-eqiad,lsw1-f1-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f1-eqiad
14:08 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-f1-eqiad,lsw1-f1-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f1-eqiad
14:08 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-f1-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f1-eqiad
14:08 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-f1-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f1-eqiad
14:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P66291 and previous config saved to /var/cache/conftool/dbconfig/20240711-140530-arnaudb.json
13:56 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
13:52 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
13:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T367781)', diff saved to https://phabricator.wikimedia.org/P66290 and previous config saved to /var/cache/conftool/dbconfig/20240711-135023-arnaudb.json
13:50 Emperor: depool ms-fe1014 and thanos-fe1004 before switch work T365996
13:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1183 (T367781)', diff saved to https://phabricator.wikimedia.org/P66289 and previous config saved to /var/cache/conftool/dbconfig/20240711-134759-arnaudb.json
13:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1183.eqiad.wmnet with reason: Maintenance
13:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1183.eqiad.wmnet with reason: Maintenance
13:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367781)', diff saved to https://phabricator.wikimedia.org/P66288 and previous config saved to /var/cache/conftool/dbconfig/20240711-134737-arnaudb.json
13:44 btullis@cumin1002: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto an-presto cluster: Roll restart of all Presto's jvm daemons.
13:32 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
13:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P66287 and previous config saved to /var/cache/conftool/dbconfig/20240711-133229-arnaudb.json
13:29 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
13:28 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1090.eqiad.wmnet
13:26 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
13:22 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
13:20 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1090.eqiad.wmnet
13:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P66286 and previous config saved to /var/cache/conftool/dbconfig/20240711-131721-arnaudb.json
13:14 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=(kubernetes1062.eqiad.wmnet|mw1494.eqiad.wmnet|mw1495.eqiad.wmnet),cluster=kubernetes,service=kubesvc
13:14 claime: Uncordoning and repooling kubernetes1062.eqiad.wmnet mw1494.eqiad.wmnet mw1495.eqiad.wmnet that were actually not concerned by T365996
13:13 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
13:12 btullis@cumin1002: START - Cookbook sre.presto.roll-restart-workers for Presto an-presto cluster: Roll restart of all Presto's jvm daemons.
13:10 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
13:09 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
13:08 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=(kubernetes1062.eqiad.wmnet|mw1494.eqiad.wmnet|mw1495.eqiad.wmnet),cluster=kubernetes,service=kubesvc
13:05 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
13:04 claime: Cordoning and depooling kubernetes1062.eqiad.wmnet mw1494.eqiad.wmnet mw1495.eqiad.wmnet for T365996
13:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: T368950
13:04 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
13:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: T368950
13:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367781)', diff saved to https://phabricator.wikimedia.org/P66285 and previous config saved to /var/cache/conftool/dbconfig/20240711-130214-arnaudb.json
13:00 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
12:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T367781)', diff saved to https://phabricator.wikimedia.org/P66284 and previous config saved to /var/cache/conftool/dbconfig/20240711-125949-arnaudb.json
12:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
12:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
12:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
12:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
12:55 godog: reenable benthos@webrequest_live on centrallog2002 - T369737
12:51 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netboxdb2003.codfw.wmnet with reason: netbox upgrade prep work
12:51 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netboxdb2003.codfw.wmnet with reason: netbox upgrade prep work
12:51 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netboxdb1003.eqiad.wmnet with reason: netbox upgrade prep work
12:51 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
12:51 ayounsi@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netboxdb2003.codfw.wmnet with reason: host reimage
12:51 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netboxdb1003.eqiad.wmnet with reason: netbox upgrade prep work
12:50 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
12:50 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
12:50 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
12:50 claime: running puppet on O:analytics_cluster::turnilo,O:analytics_cluster::turnilo::staging
12:48 godog: temp stop benthos@webrequest_live on centrallog2002 - T369737
12:47 ayounsi@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netboxdb2003.codfw.wmnet with reason: host reimage
12:43 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
12:42 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
12:39 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4 days, 0:00:00 on netboxdb1003.eqiad.wmnet with reason: netbox upgrade prep work
12:39 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netboxdb1003.eqiad.wmnet with reason: netbox upgrade prep work
12:30 ayounsi@cumin2002: START - Cookbook sre.hosts.reimage for host netboxdb2003.codfw.wmnet with OS bookworm
12:30 ayounsi@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netboxdb2003.codfw.wmnet - ayounsi@cumin2002"
12:29 ayounsi@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netboxdb2003.codfw.wmnet - ayounsi@cumin2002"
12:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netboxdb2003.codfw.wmnet on all recursors
12:28 ayounsi@cumin2002: START - Cookbook sre.dns.wipe-cache netboxdb2003.codfw.wmnet on all recursors
12:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netboxdb2003.codfw.wmnet - ayounsi@cumin2002"
12:28 dcausse@deploy1002: Finished deploy [airflow-dags/search@7bb895a]: search: stop using api-ro.discovery.wmnet (duration: 00m 21s)
12:27 dcausse@deploy1002: Started deploy [airflow-dags/search@7bb895a]: search: stop using api-ro.discovery.wmnet
12:27 ayounsi@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netboxdb2003.codfw.wmnet - ayounsi@cumin2002"
12:24 ayounsi@cumin2002: START - Cookbook sre.dns.netbox
12:24 ayounsi@cumin2002: START - Cookbook sre.ganeti.makevm for new host netboxdb2003.codfw.wmnet
11:50 ayounsi@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host netboxdb1003.eqiad.wmnet
11:50 ayounsi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host netboxdb1003.eqiad.wmnet with OS bookworm
11:49 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
11:48 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
11:36 ayounsi@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netbox2003.codfw.wmnet
11:36 ayounsi@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netbox2003.codfw.wmnet with OS bookworm
11:29 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host netboxdb1003.eqiad.wmnet with OS bookworm
11:29 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
11:29 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netboxdb1003.eqiad.wmnet - ayounsi@cumin1002"
11:29 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
11:29 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netbox2003.codfw.wmnet with reason: netbox upgrade prep work
11:29 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netbox2003.codfw.wmnet with reason: netbox upgrade prep work
11:28 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
11:28 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
11:28 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netboxdb1003.eqiad.wmnet - ayounsi@cumin1002"
11:28 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netboxdb1003.eqiad.wmnet on all recursors
11:28 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache netboxdb1003.eqiad.wmnet on all recursors
11:28 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:28 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netboxdb1003.eqiad.wmnet - ayounsi@cumin1002"
11:26 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netboxdb1003.eqiad.wmnet - ayounsi@cumin1002"
11:24 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
11:24 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host netboxdb1003.eqiad.wmnet
11:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
11:14 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
11:13 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
11:12 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
11:02 ayounsi@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host netboxdb1003.eqiad.wmnet
11:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netboxdb1003.eqiad.wmnet on all recursors
11:02 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache netboxdb1003.eqiad.wmnet on all recursors
11:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:00 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
11:00 ayounsi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
10:58 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
10:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netboxdb1003.eqiad.wmnet on all recursors
10:58 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache netboxdb1003.eqiad.wmnet on all recursors
10:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
10:57 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
10:56 ayounsi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
10:53 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
10:53 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host netboxdb1003.eqiad.wmnet
10:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netbox1003.eqiad.wmnet
10:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netbox1003.eqiad.wmnet with OS bookworm
10:51 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
10:51 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
10:47 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
10:41 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
10:40 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
10:40 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
10:39 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
10:39 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
10:37 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
10:36 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
10:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:27 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
10:12 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
10:12 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
10:01 sukhe: [end] authdns-update for sending BR to magru: T359054
10:00 sukhe: [start] authdns-update for sending BR to magru: T359054
09:54 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
09:54 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
09:53 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
09:53 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
09:45 ayounsi@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netbox2003.codfw.wmnet with reason: host reimage
09:42 ayounsi@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netbox2003.codfw.wmnet with reason: host reimage
09:36 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
09:33 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
09:31 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
09:28 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
09:25 ayounsi@cumin2002: START - Cookbook sre.hosts.reimage for host netbox2003.codfw.wmnet with OS bookworm
09:25 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netbox1003.eqiad.wmnet with reason: host reimage
09:23 jiji@deploy1002: Finished scap: Remove mcrouter container and exporter from mediawiki pods (duration: 04m 33s)
09:23 ayounsi@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox2003.codfw.wmnet - ayounsi@cumin2002"
09:22 ayounsi@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox2003.codfw.wmnet - ayounsi@cumin2002"
09:22 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on netbox1003.eqiad.wmnet with reason: host reimage
09:22 ayounsi@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox2003.codfw.wmnet on all recursors
09:22 ayounsi@cumin2002: START - Cookbook sre.dns.wipe-cache netbox2003.codfw.wmnet on all recursors
09:22 ayounsi@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:22 ayounsi@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox2003.codfw.wmnet - ayounsi@cumin2002"
09:20 ayounsi@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox2003.codfw.wmnet - ayounsi@cumin2002"
09:19 jiji@deploy1002: Started scap sync-world: Remove mcrouter container and exporter from mediawiki pods
09:18 ayounsi@cumin2002: START - Cookbook sre.dns.netbox
09:18 ayounsi@cumin2002: START - Cookbook sre.ganeti.makevm for new host netbox2003.codfw.wmnet
09:13 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
09:12 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
09:11 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host netbox1003.eqiad.wmnet with OS bookworm
09:10 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox1003.eqiad.wmnet - ayounsi@cumin1002"
09:09 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox1003.eqiad.wmnet - ayounsi@cumin1002"
09:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox1003.eqiad.wmnet on all recursors
09:09 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache netbox1003.eqiad.wmnet on all recursors
09:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox1003.eqiad.wmnet - ayounsi@cumin1002"
09:08 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox1003.eqiad.wmnet - ayounsi@cumin1002"
09:05 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
09:05 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host netbox1003.eqiad.wmnet
09:05 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
09:04 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
09:02 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
09:00 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
08:57 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
08:57 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
08:55 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
08:55 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
08:46 elukey: cd /srv/git/private; git reset --hard HEAD^ on puppetserver1001 to remove my last local commit (test before migration of the private repo to puppetserver1001) - T368023
08:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
08:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
08:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T367856)', diff saved to https://phabricator.wikimedia.org/P66280 and previous config saved to /var/cache/conftool/dbconfig/20240711-084151-marostegui.json
08:30 hashar: Switched CI Quibble and Phan jobs based on PHP 8.1, 8.2 and 8.3 from Buster to Bullseye - T335766 T366799 T369146
08:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P66279 and previous config saved to /var/cache/conftool/dbconfig/20240711-082644-marostegui.json
08:15 aklapper@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.13 refs T366958
08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P66278 and previous config saved to /var/cache/conftool/dbconfig/20240711-081137-marostegui.json
08:05 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
07:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T367856)', diff saved to https://phabricator.wikimedia.org/P66277 and previous config saved to /var/cache/conftool/dbconfig/20240711-075630-marostegui.json
07:50 marostegui: Deploy schema change on s3 codfw db2127 dbmaint T367856
07:48 dcausse: closing the backport window
07:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Long schema change
07:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Long schema change
07:47 dcausse@deploy1002: Finished scap: Backport for Fix pool counter metric (duration: 09m 56s)
07:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2127 T369691', diff saved to https://phabricator.wikimedia.org/P66276 and previous config saved to /var/cache/conftool/dbconfig/20240711-074629-marostegui.json
07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2205 to s3 primary T369691', diff saved to https://phabricator.wikimedia.org/P66275 and previous config saved to /var/cache/conftool/dbconfig/20240711-074534-marostegui.json
07:45 marostegui: Starting s3 codfw failover from db2127 to db2205 - T369691
07:42 dcausse@deploy1002: dcausse: Continuing with sync
07:41 dcausse@deploy1002: dcausse: Backport for Fix pool counter metric synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:37 dcausse@deploy1002: Started scap sync-world: Backport for Fix pool counter metric
07:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 T369691
07:31 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2205 with weight 0 T369691', diff saved to https://phabricator.wikimedia.org/P66274 and previous config saved to /var/cache/conftool/dbconfig/20240711-073101-root.json
07:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
07:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s3 T369691
07:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
07:28 jgiannelos@deploy1002: Finished scap: Backport for Linter: trigger parsoid parses on template changes (T361013) (duration: 14m 25s)
07:23 jgiannelos@deploy1002: daniel, jgiannelos: Continuing with sync
07:17 jgiannelos@deploy1002: daniel, jgiannelos: Backport for Linter: trigger parsoid parses on template changes (T361013) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:14 jgiannelos@deploy1002: Started scap sync-world: Backport for Linter: trigger parsoid parses on template changes (T361013)
07:12 kartik@deploy1002: Finished scap: Backport for Enable MinT for Wikipedia readers MVP on a second group of pilot wikis (T367067) (duration: 09m 32s)
07:07 kartik@deploy1002: kartik: Continuing with sync
07:05 kartik@deploy1002: kartik: Backport for Enable MinT for Wikipedia readers MVP on a second group of pilot wikis (T367067) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:02 kartik@deploy1002: Started scap sync-world: Backport for Enable MinT for Wikipedia readers MVP on a second group of pilot wikis (T367067)
07:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66273 and previous config saved to /var/cache/conftool/dbconfig/20240711-070004-root.json
06:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2200.codfw.wmnet with reason: Maintenance
06:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2200.codfw.wmnet with reason: Maintenance
06:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2198.codfw.wmnet with reason: Maintenance
06:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2198.codfw.wmnet with reason: Maintenance
06:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T367781)', diff saved to https://phabricator.wikimedia.org/P66272 and previous config saved to /var/cache/conftool/dbconfig/20240711-065508-arnaudb.json
06:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367856)', diff saved to https://phabricator.wikimedia.org/P66271 and previous config saved to /var/cache/conftool/dbconfig/20240711-065432-marostegui.json
06:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66267 and previous config saved to /var/cache/conftool/dbconfig/20240711-062953-root.json
06:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P66266 and previous config saved to /var/cache/conftool/dbconfig/20240711-062454-arnaudb.json
06:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P66265 and previous config saved to /var/cache/conftool/dbconfig/20240711-062417-marostegui.json
06:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66264 and previous config saved to /var/cache/conftool/dbconfig/20240711-061447-root.json
06:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T367781)', diff saved to https://phabricator.wikimedia.org/P66263 and previous config saved to /var/cache/conftool/dbconfig/20240711-060947-arnaudb.json
06:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367856)', diff saved to https://phabricator.wikimedia.org/P66262 and previous config saved to /var/cache/conftool/dbconfig/20240711-060910-marostegui.json
06:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T367781)', diff saved to https://phabricator.wikimedia.org/P66261 and previous config saved to /var/cache/conftool/dbconfig/20240711-060736-arnaudb.json
06:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2195.codfw.wmnet with reason: Maintenance
06:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2195.codfw.wmnet with reason: Maintenance
06:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T367781)', diff saved to https://phabricator.wikimedia.org/P66260 and previous config saved to /var/cache/conftool/dbconfig/20240711-060714-arnaudb.json
05:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66259 and previous config saved to /var/cache/conftool/dbconfig/20240711-055942-root.json
05:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P66258 and previous config saved to /var/cache/conftool/dbconfig/20240711-055206-arnaudb.json
05:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1163 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66257 and previous config saved to /var/cache/conftool/dbconfig/20240711-054436-root.json
05:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P66256 and previous config saved to /var/cache/conftool/dbconfig/20240711-053659-arnaudb.json
05:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1163 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66255 and previous config saved to /var/cache/conftool/dbconfig/20240711-052931-root.json
05:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1163 T369514', diff saved to https://phabricator.wikimedia.org/P66254 and previous config saved to /var/cache/conftool/dbconfig/20240711-052702-root.json
05:25 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1184 to s1 primary and set section read-write T369514', diff saved to https://phabricator.wikimedia.org/P66253 and previous config saved to /var/cache/conftool/dbconfig/20240711-052540-root.json
05:25 marostegui@cumin1002: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - T369514', diff saved to https://phabricator.wikimedia.org/P66252 and previous config saved to /var/cache/conftool/dbconfig/20240711-052507-root.json
05:24 marostegui: Starting s1 eqiad failover from db1163 to db1184 - T369514
05:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T367781)', diff saved to https://phabricator.wikimedia.org/P66251 and previous config saved to /var/cache/conftool/dbconfig/20240711-052151-arnaudb.json
05:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T367781)', diff saved to https://phabricator.wikimedia.org/P66250 and previous config saved to /var/cache/conftool/dbconfig/20240711-051941-arnaudb.json
05:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Maintenance
05:19 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Maintenance
05:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T367781)', diff saved to https://phabricator.wikimedia.org/P66249 and previous config saved to /var/cache/conftool/dbconfig/20240711-051920-arnaudb.json
05:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P66248 and previous config saved to /var/cache/conftool/dbconfig/20240711-050413-arnaudb.json
04:59 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1184 from API/vslow/dump T369514', diff saved to https://phabricator.wikimedia.org/P66247 and previous config saved to /var/cache/conftool/dbconfig/20240711-045905-marostegui.json
04:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 36 hosts with reason: Primary switchover s1 T369514
04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1184 with weight 0 T369514', diff saved to https://phabricator.wikimedia.org/P66246 and previous config saved to /var/cache/conftool/dbconfig/20240711-045829-marostegui.json
04:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 36 hosts with reason: Primary switchover s1 T369514
04:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P66245 and previous config saved to /var/cache/conftool/dbconfig/20240711-044905-arnaudb.json
04:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T367781)', diff saved to https://phabricator.wikimedia.org/P66244 and previous config saved to /var/cache/conftool/dbconfig/20240711-043358-arnaudb.json
04:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T367781)', diff saved to https://phabricator.wikimedia.org/P66243 and previous config saved to /var/cache/conftool/dbconfig/20240711-043147-arnaudb.json
04:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2167.codfw.wmnet with reason: Maintenance
04:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2167.codfw.wmnet with reason: Maintenance
04:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66242 and previous config saved to /var/cache/conftool/dbconfig/20240711-043124-arnaudb.json
04:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P66241 and previous config saved to /var/cache/conftool/dbconfig/20240711-041617-arnaudb.json
04:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P66240 and previous config saved to /var/cache/conftool/dbconfig/20240711-040110-arnaudb.json
03:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66239 and previous config saved to /var/cache/conftool/dbconfig/20240711-034603-arnaudb.json
03:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66238 and previous config saved to /var/cache/conftool/dbconfig/20240711-034352-arnaudb.json
03:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2166.codfw.wmnet with reason: Maintenance
03:43 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2166.codfw.wmnet with reason: Maintenance
03:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T367781)', diff saved to https://phabricator.wikimedia.org/P66237 and previous config saved to /var/cache/conftool/dbconfig/20240711-034330-arnaudb.json
03:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P66236 and previous config saved to /var/cache/conftool/dbconfig/20240711-032823-arnaudb.json
03:20 eileen: civicrm upgraded from 04cb9083 to 3287ced0
03:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P66235 and previous config saved to /var/cache/conftool/dbconfig/20240711-031316-arnaudb.json
03:08 eileen: civicrm upgraded from 2d1a0aad to 04cb9083
02:58 eileen: config revision changed from e02c3a85 to 540f27e6
02:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T367781)', diff saved to https://phabricator.wikimedia.org/P66234 and previous config saved to /var/cache/conftool/dbconfig/20240711-025809-arnaudb.json
02:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T367781)', diff saved to https://phabricator.wikimedia.org/P66233 and previous config saved to /var/cache/conftool/dbconfig/20240711-025558-arnaudb.json
02:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2165.codfw.wmnet with reason: Maintenance
02:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2165.codfw.wmnet with reason: Maintenance
02:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T367781)', diff saved to https://phabricator.wikimedia.org/P66232 and previous config saved to /var/cache/conftool/dbconfig/20240711-025537-arnaudb.json
02:48 eileen: civicrm upgraded from a17496a2 to 2d1a0aad
02:45 mutante: stewards2001 - sudo mv /srv/repos/users-db /root/ - run puppet and let it recreate the usersdb repo - this time pulling from gitlab - T369780 T369430
02:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P66231 and previous config saved to /var/cache/conftool/dbconfig/20240711-024030-arnaudb.json
02:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P66230 and previous config saved to /var/cache/conftool/dbconfig/20240711-022522-arnaudb.json
02:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1060.eqiad.wmnet with OS bookworm
02:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T367781)', diff saved to https://phabricator.wikimedia.org/P66229 and previous config saved to /var/cache/conftool/dbconfig/20240711-021015-arnaudb.json
02:08 eileen: civicrm upgraded from a03085ff to 1e2fcba3
02:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T367781)', diff saved to https://phabricator.wikimedia.org/P66228 and previous config saved to /var/cache/conftool/dbconfig/20240711-020805-arnaudb.json
02:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
02:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
02:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2164.codfw.wmnet with reason: Maintenance
02:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2164.codfw.wmnet with reason: Maintenance
02:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T367781)', diff saved to https://phabricator.wikimedia.org/P66227 and previous config saved to /var/cache/conftool/dbconfig/20240711-020738-arnaudb.json
01:54 eileen: config revision changed from 840e6b90 to e02c3a85
01:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P66226 and previous config saved to /var/cache/conftool/dbconfig/20240711-015231-arnaudb.json
01:46 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
01:43 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
01:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P66225 and previous config saved to /var/cache/conftool/dbconfig/20240711-013723-arnaudb.json
01:27 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1060.eqiad.wmnet with OS bookworm
01:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T367781)', diff saved to https://phabricator.wikimedia.org/P66224 and previous config saved to /var/cache/conftool/dbconfig/20240711-012216-arnaudb.json
01:21 mutante: gerrit-replica.wikimedia.org (gerrit2002) - switched firewall provider from iptables to nftables - all seems fine to me but just in case: gerrit:1053068 can be reverted to go back
01:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T367781)', diff saved to https://phabricator.wikimedia.org/P66223 and previous config saved to /var/cache/conftool/dbconfig/20240711-012006-arnaudb.json
01:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2163.codfw.wmnet with reason: Maintenance
01:19 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2163.codfw.wmnet with reason: Maintenance
01:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T367781)', diff saved to https://phabricator.wikimedia.org/P66222 and previous config saved to /var/cache/conftool/dbconfig/20240711-011944-arnaudb.json
01:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P66221 and previous config saved to /var/cache/conftool/dbconfig/20240711-010437-arnaudb.json
00:55 mutante: gerrit-replica.wikimedia.org (gerrit2002) - maintenance
00:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P66220 and previous config saved to /var/cache/conftool/dbconfig/20240711-004930-arnaudb.json
00:49 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on gerrit-replica.wikimedia.org with reason: switch firewall provider
00:49 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit-replica.wikimedia.org with reason: switch firewall provider
00:49 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit2002.wikimedia.org with reason: switch firewall provider
00:48 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit2002.wikimedia.org with reason: switch firewall provider
00:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T367781)', diff saved to https://phabricator.wikimedia.org/P66219 and previous config saved to /var/cache/conftool/dbconfig/20240711-003423-arnaudb.json
00:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T367781)', diff saved to https://phabricator.wikimedia.org/P66218 and previous config saved to /var/cache/conftool/dbconfig/20240711-003212-arnaudb.json
00:32 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2162.codfw.wmnet with reason: Maintenance
00:32 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2162.codfw.wmnet with reason: Maintenance
00:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T367781)', diff saved to https://phabricator.wikimedia.org/P66217 and previous config saved to /var/cache/conftool/dbconfig/20240711-003150-arnaudb.json
00:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P66216 and previous config saved to /var/cache/conftool/dbconfig/20240711-001643-arnaudb.json
00:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P66215 and previous config saved to /var/cache/conftool/dbconfig/20240711-000136-arnaudb.json

2024-07-10

23:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T367781)', diff saved to https://phabricator.wikimedia.org/P66214 and previous config saved to /var/cache/conftool/dbconfig/20240710-234629-arnaudb.json
23:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T367781)', diff saved to https://phabricator.wikimedia.org/P66213 and previous config saved to /var/cache/conftool/dbconfig/20240710-234418-arnaudb.json
23:44 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2154.codfw.wmnet with reason: Maintenance
23:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2154.codfw.wmnet with reason: Maintenance
23:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T367781)', diff saved to https://phabricator.wikimedia.org/P66212 and previous config saved to /var/cache/conftool/dbconfig/20240710-234356-arnaudb.json
23:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T367856)', diff saved to https://phabricator.wikimedia.org/P66211 and previous config saved to /var/cache/conftool/dbconfig/20240710-233558-marostegui.json
23:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
23:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
23:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T367856)', diff saved to https://phabricator.wikimedia.org/P66210 and previous config saved to /var/cache/conftool/dbconfig/20240710-233535-marostegui.json
23:35 rzl: $ sudo cumin A:all-mw enable-puppet T367012
23:34 rzl@deploy1002: Finished scap: T367012 (duration: 07m 45s)
23:30 rzl@deploy1002: rzl: Continuing with sync
23:29 rzl@deploy1002: rzl: T367012 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
23:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P66209 and previous config saved to /var/cache/conftool/dbconfig/20240710-232849-arnaudb.json
23:27 rzl@deploy1002: Started scap sync-world: T367012
23:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P66208 and previous config saved to /var/cache/conftool/dbconfig/20240710-232028-marostegui.json
23:20 rzl: $ sudo cumin A:all-mw disable-puppet # T367012 - really just for the old mwdebug hosts
23:16 zabe@deploy1002: Finished scap: update interwiki cache (duration: 07m 32s)
23:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P66207 and previous config saved to /var/cache/conftool/dbconfig/20240710-231342-arnaudb.json
23:09 zabe@deploy1002: Started scap sync-world: update interwiki cache
23:08 zabe@deploy1002: Finished scap: T362529 (duration: 07m 44s)
23:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P66206 and previous config saved to /var/cache/conftool/dbconfig/20240710-230522-marostegui.json
23:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2209 (T367856)', diff saved to https://phabricator.wikimedia.org/P66205 and previous config saved to /var/cache/conftool/dbconfig/20240710-230130-marostegui.json
23:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2209.codfw.wmnet with reason: Maintenance
23:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2209.codfw.wmnet with reason: Maintenance
23:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T367856)', diff saved to https://phabricator.wikimedia.org/P66204 and previous config saved to /var/cache/conftool/dbconfig/20240710-230107-marostegui.json
23:00 zabe@deploy1002: Started scap sync-world: T362529
23:00 zabe: Create Wikimedians of United Arab Emirates User Group Wiki # T362529
23:00 mutante: puppetserver1001 - fixing failed unit geoip_update_ipinfo.service
22:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T367781)', diff saved to https://phabricator.wikimedia.org/P66203 and previous config saved to /var/cache/conftool/dbconfig/20240710-225835-arnaudb.json
22:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T367781)', diff saved to https://phabricator.wikimedia.org/P66202 and previous config saved to /var/cache/conftool/dbconfig/20240710-225725-arnaudb.json
22:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2152.codfw.wmnet with reason: Maintenance
22:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2152.codfw.wmnet with reason: Maintenance
22:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
22:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
22:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T367781)', diff saved to https://phabricator.wikimedia.org/P66201 and previous config saved to /var/cache/conftool/dbconfig/20240710-225647-arnaudb.json
22:53 mutante: puppetmaster1001 - remove Enterprise product ID from MaxMind downloads. sudo systemctl start geoip_update_ipinfo - T366272
22:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T367856)', diff saved to https://phabricator.wikimedia.org/P66200 and previous config saved to /var/cache/conftool/dbconfig/20240710-225015-marostegui.json
22:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P66199 and previous config saved to /var/cache/conftool/dbconfig/20240710-224559-marostegui.json
22:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P66198 and previous config saved to /var/cache/conftool/dbconfig/20240710-224140-arnaudb.json
22:35 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: security release
22:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P66197 and previous config saved to /var/cache/conftool/dbconfig/20240710-223052-marostegui.json
22:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P66196 and previous config saved to /var/cache/conftool/dbconfig/20240710-222633-arnaudb.json
22:25 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release
22:19 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: security release
22:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T367856)', diff saved to https://phabricator.wikimedia.org/P66195 and previous config saved to /var/cache/conftool/dbconfig/20240710-221545-marostegui.json
22:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T367781)', diff saved to https://phabricator.wikimedia.org/P66194 and previous config saved to /var/cache/conftool/dbconfig/20240710-221126-arnaudb.json
22:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T367781)', diff saved to https://phabricator.wikimedia.org/P66193 and previous config saved to /var/cache/conftool/dbconfig/20240710-221018-arnaudb.json
22:10 mutante: gitlab-replica-b.wikimedia.org - version upgrade in progress
22:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1226.eqiad.wmnet with reason: Maintenance
22:10 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1226.eqiad.wmnet with reason: Maintenance
22:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1216.eqiad.wmnet with reason: Maintenance
22:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1216.eqiad.wmnet with reason: Maintenance
22:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T367781)', diff saved to https://phabricator.wikimedia.org/P66192 and previous config saved to /var/cache/conftool/dbconfig/20240710-220951-arnaudb.json
22:09 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
21:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P66191 and previous config saved to /var/cache/conftool/dbconfig/20240710-215444-arnaudb.json
21:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P66190 and previous config saved to /var/cache/conftool/dbconfig/20240710-213935-arnaudb.json
21:30 jdrewniak@deploy1002: Finished scap: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871) (duration: 11m 35s)
21:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T367781)', diff saved to https://phabricator.wikimedia.org/P66188 and previous config saved to /var/cache/conftool/dbconfig/20240710-212427-arnaudb.json
21:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T367781)', diff saved to https://phabricator.wikimedia.org/P66187 and previous config saved to /var/cache/conftool/dbconfig/20240710-212319-arnaudb.json
21:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1214.eqiad.wmnet with reason: Maintenance
21:23 jdrewniak@deploy1002: jdlrobson, jdrewniak: Continuing with sync
21:23 jdrewniak@deploy1002: jdlrobson, jdrewniak: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1214.eqiad.wmnet with reason: Maintenance
21:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T367781)', diff saved to https://phabricator.wikimedia.org/P66186 and previous config saved to /var/cache/conftool/dbconfig/20240710-212257-arnaudb.json
21:18 jdrewniak@deploy1002: Started scap sync-world: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871)
21:17 jdrewniak@deploy1002: Sync cancelled.
21:17 jdrewniak@deploy1002: jdlrobson, jdrewniak: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:10 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1096*,elastic1097*,elastic1106* for T348977 - bking@cumin2002
21:10 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1096*,elastic1097*,elastic1106* for T348977 - bking@cumin2002
21:09 jdrewniak@deploy1002: Started scap sync-world: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871)
21:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P66185 and previous config saved to /var/cache/conftool/dbconfig/20240710-210750-arnaudb.json
21:06 jdrewniak@deploy1002: Sync cancelled.
21:06 jdrewniak@deploy1002: jdrewniak, jdlrobson: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1096-1097,1106].eqiad.wmnet with reason: T348977
21:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1096-1097,1106].eqiad.wmnet with reason: T348977
20:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P66184 and previous config saved to /var/cache/conftool/dbconfig/20240710-205242-arnaudb.json
20:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T367781)', diff saved to https://phabricator.wikimedia.org/P66183 and previous config saved to /var/cache/conftool/dbconfig/20240710-203735-arnaudb.json
20:37 jdrewniak@deploy1002: Started scap sync-world: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871)
20:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T367781)', diff saved to https://phabricator.wikimedia.org/P66182 and previous config saved to /var/cache/conftool/dbconfig/20240710-203627-arnaudb.json
20:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1211.eqiad.wmnet with reason: Maintenance
20:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1211.eqiad.wmnet with reason: Maintenance
20:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T367781)', diff saved to https://phabricator.wikimedia.org/P66181 and previous config saved to /var/cache/conftool/dbconfig/20240710-203605-arnaudb.json
20:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P66180 and previous config saved to /var/cache/conftool/dbconfig/20240710-202057-arnaudb.json
20:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P66179 and previous config saved to /var/cache/conftool/dbconfig/20240710-200550-arnaudb.json
19:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T367781)', diff saved to https://phabricator.wikimedia.org/P66178 and previous config saved to /var/cache/conftool/dbconfig/20240710-195043-arnaudb.json
19:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T367781)', diff saved to https://phabricator.wikimedia.org/P66177 and previous config saved to /var/cache/conftool/dbconfig/20240710-194935-arnaudb.json
19:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1203.eqiad.wmnet with reason: Maintenance
19:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1203.eqiad.wmnet with reason: Maintenance
19:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66176 and previous config saved to /var/cache/conftool/dbconfig/20240710-194913-arnaudb.json
19:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P66174 and previous config saved to /var/cache/conftool/dbconfig/20240710-193406-arnaudb.json
19:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P66173 and previous config saved to /var/cache/conftool/dbconfig/20240710-191859-arnaudb.json
19:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66172 and previous config saved to /var/cache/conftool/dbconfig/20240710-190352-arnaudb.json
19:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66171 and previous config saved to /var/cache/conftool/dbconfig/20240710-190244-arnaudb.json
19:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1193.eqiad.wmnet with reason: Maintenance
19:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1193.eqiad.wmnet with reason: Maintenance
19:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T367781)', diff saved to https://phabricator.wikimedia.org/P66170 and previous config saved to /var/cache/conftool/dbconfig/20240710-190222-arnaudb.json
18:56 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
18:56 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
18:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P66169 and previous config saved to /var/cache/conftool/dbconfig/20240710-184714-arnaudb.json
18:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
18:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add 4 new IPs (2 eqiad, 2 codfw) for wdqs graph split - ryankemper@cumin2002"
18:43 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add 4 new IPs (2 eqiad, 2 codfw) for wdqs graph split - ryankemper@cumin2002"
18:35 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
18:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P66168 and previous config saved to /var/cache/conftool/dbconfig/20240710-183207-arnaudb.json
18:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T367781)', diff saved to https://phabricator.wikimedia.org/P66166 and previous config saved to /var/cache/conftool/dbconfig/20240710-181700-arnaudb.json
17:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T367781)', diff saved to https://phabricator.wikimedia.org/P66164 and previous config saved to /var/cache/conftool/dbconfig/20240710-171644-arnaudb.json
17:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1192.eqiad.wmnet with reason: Maintenance
17:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1192.eqiad.wmnet with reason: Maintenance
17:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T367781)', diff saved to https://phabricator.wikimedia.org/P66163 and previous config saved to /var/cache/conftool/dbconfig/20240710-171622-arnaudb.json
17:01 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 100%: post T365993 repool', diff saved to https://phabricator.wikimedia.org/P66162 and previous config saved to /var/cache/conftool/dbconfig/20240710-170143-arnaudb.json
17:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P66161 and previous config saved to /var/cache/conftool/dbconfig/20240710-170115-arnaudb.json
16:46 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 75%: post T365993 repool', diff saved to https://phabricator.wikimedia.org/P66160 and previous config saved to /var/cache/conftool/dbconfig/20240710-164637-arnaudb.json
16:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P66159 and previous config saved to /var/cache/conftool/dbconfig/20240710-164608-arnaudb.json
16:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P66158 and previous config saved to /var/cache/conftool/dbconfig/20240710-164225-ladsgroup.json
16:31 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 50%: post T365993 repool', diff saved to https://phabricator.wikimedia.org/P66157 and previous config saved to /var/cache/conftool/dbconfig/20240710-163131-arnaudb.json
16:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T367781)', diff saved to https://phabricator.wikimedia.org/P66156 and previous config saved to /var/cache/conftool/dbconfig/20240710-163100-arnaudb.json
16:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T367781)', diff saved to https://phabricator.wikimedia.org/P66155 and previous config saved to /var/cache/conftool/dbconfig/20240710-162952-arnaudb.json
16:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1178.eqiad.wmnet with reason: Maintenance
16:29 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1178.eqiad.wmnet with reason: Maintenance
16:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T367781)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240710-162926-arnaudb.json
16:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P66153 and previous config saved to /var/cache/conftool/dbconfig/20240710-162718-ladsgroup.json
16:17 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
16:17 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
16:16 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 25%: post T365993 repool', diff saved to https://phabricator.wikimedia.org/P66152 and previous config saved to /var/cache/conftool/dbconfig/20240710-161626-arnaudb.json
16:14 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary (T368083)
16:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P66151 and previous config saved to /var/cache/conftool/dbconfig/20240710-161419-arnaudb.json
16:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P66150 and previous config saved to /var/cache/conftool/dbconfig/20240710-161211-ladsgroup.json
16:11 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary (T368083)
16:08 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-high-traffic2 (T368083)
16:05 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-high-traffic2 (T368083)
16:03 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-high-traffic1 (T368083)
16:01 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 10%: post T365993 repool', diff saved to https://phabricator.wikimedia.org/P66149 and previous config saved to /var/cache/conftool/dbconfig/20240710-160120-arnaudb.json
16:01 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
16:00 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
16:00 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-high-traffic1 (T368083)
15:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P66148 and previous config saved to /var/cache/conftool/dbconfig/20240710-155911-arnaudb.json
15:59 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
15:58 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
15:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P66147 and previous config saved to /var/cache/conftool/dbconfig/20240710-155703-ladsgroup.json
15:55 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-high-traffic2-eqsin (T368083)
15:54 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-high-traffic2-eqsin (T368083)
15:53 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
15:53 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
15:49 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-high-traffic1-eqsin (T368083)
15:48 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-high-traffic1-eqsin (T368083)
15:48 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
15:48 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
15:46 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 5%: post T365993 repool', diff saved to https://phabricator.wikimedia.org/P66146 and previous config saved to /var/cache/conftool/dbconfig/20240710-154615-arnaudb.json
15:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T367781)', diff saved to https://phabricator.wikimedia.org/P66145 and previous config saved to /var/cache/conftool/dbconfig/20240710-154404-arnaudb.json
15:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T367781)', diff saved to https://phabricator.wikimedia.org/P66144 and previous config saved to /var/cache/conftool/dbconfig/20240710-154256-arnaudb.json
15:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance
15:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance
15:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T367781)', diff saved to https://phabricator.wikimedia.org/P66143 and previous config saved to /var/cache/conftool/dbconfig/20240710-154234-arnaudb.json
15:36 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-presto1004.eqiad.wmnet with reason: Shutting down to investigate RAM issue
15:36 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-presto1004.eqiad.wmnet with reason: Shutting down to investigate RAM issue
15:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P66142 and previous config saved to /var/cache/conftool/dbconfig/20240710-152727-arnaudb.json
15:27 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-high-traffic1 (T368083)
15:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db1233 from groups', diff saved to https://phabricator.wikimedia.org/P66141 and previous config saved to /var/cache/conftool/dbconfig/20240710-152616-ladsgroup.json
15:24 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-high-traffic1 (T368083)
15:24 vgutierrez: rolling restart of high-traffic1 LVSs to switch ncredir to maglev - T368083
15:24 topranks: rebooting lsw1-e1-eqiad to install updated JunOS version T365993
15:24 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 26 hosts with reason: JunOS upgrade lsw1-e1-eqiad
15:23 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 26 hosts with reason: JunOS upgrade lsw1-e1-eqiad
15:23 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e1-eqiad,lsw1-e1-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e1-eqiad
15:23 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e1-eqiad,lsw1-e1-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e1-eqiad
15:19 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary (T368083)
15:16 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary (T368083)
15:14 vgutierrez: rolling restart of secondary LVSs to switch ncredir to maglev - T368083
15:13 elukey: restart turnilo on an-tool1007
15:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P66140 and previous config saved to /var/cache/conftool/dbconfig/20240710-151219-arnaudb.json
14:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T367856)', diff saved to https://phabricator.wikimedia.org/P66139 and previous config saved to /var/cache/conftool/dbconfig/20240710-145807-marostegui.json
14:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2205.codfw.wmnet with reason: Maintenance
14:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2205.codfw.wmnet with reason: Maintenance
14:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367856)', diff saved to https://phabricator.wikimedia.org/P66138 and previous config saved to /var/cache/conftool/dbconfig/20240710-145744-marostegui.json
14:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T367781)', diff saved to https://phabricator.wikimedia.org/P66137 and previous config saved to /var/cache/conftool/dbconfig/20240710-145712-arnaudb.json
14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1104*,elastic1089*,elastic1090* for T365993 - cmooney@cumin1002
14:55 cmooney@cumin1002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1104*,elastic1089*,elastic1090* for T365993 - cmooney@cumin1002
14:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P66136 and previous config saved to /var/cache/conftool/dbconfig/20240710-144237-marostegui.json
14:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T367856)', diff saved to https://phabricator.wikimedia.org/P66135 and previous config saved to /var/cache/conftool/dbconfig/20240710-143713-marostegui.json
14:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
14:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
14:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T367856)', diff saved to https://phabricator.wikimedia.org/P66134 and previous config saved to /var/cache/conftool/dbconfig/20240710-143651-marostegui.json
14:34 cmooney@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic1104,elastic1089,elastic1090 for ban elastic nodes before switch upgrade rack E1 - cmooney@cumin1002 - T365993
14:34 cmooney@cumin1002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1104,elastic1089,elastic1090 for ban elastic nodes before switch upgrade rack E1 - cmooney@cumin1002 - T365993
14:30 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] DONE helmfile.d/services/termbox: apply
14:30 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] START helmfile.d/services/termbox: apply
14:30 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] DONE helmfile.d/services/termbox: apply
14:30 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] START helmfile.d/services/termbox: apply
14:28 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] DONE helmfile.d/services/termbox: apply
14:28 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] START helmfile.d/services/termbox: apply
14:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P66133 and previous config saved to /var/cache/conftool/dbconfig/20240710-142730-marostegui.json
14:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P66132 and previous config saved to /var/cache/conftool/dbconfig/20240710-142144-marostegui.json
14:21 kamila@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
14:20 kamila@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
14:20 kamila@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
14:19 kamila@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
14:19 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
14:19 kamila@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
14:16 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:15 effie: disable puppet on mw memcached hosts - T352885
14:13 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:13 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367856)', diff saved to https://phabricator.wikimedia.org/P66131 and previous config saved to /var/cache/conftool/dbconfig/20240710-141222-marostegui.json
14:11 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:11 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:10 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:08 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on lsw1-e1-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e1-eqiad
14:08 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on lsw1-e1-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e1-eqiad
14:07 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P66130 and previous config saved to /var/cache/conftool/dbconfig/20240710-140637-marostegui.json
14:06 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:06 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:05 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:05 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:04 XioNoX: add ipxe_1.21.1+git-20240627.b66e27d to bookworm-wikimedia reprepro
14:04 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
14:04 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1010.eqiad.wmnet,db1190.eqiad.wmnet,dbproxy1026.eqiad.wmnet with reason: T365993
14:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1010.eqiad.wmnet,db1190.eqiad.wmnet,dbproxy1026.eqiad.wmnet with reason: T365993
14:02 arnaudb@cumin1002: dbctl commit (dc=all): 'T365993 - depool db1190 - s4', diff saved to https://phabricator.wikimedia.org/P66129 and previous config saved to /var/cache/conftool/dbconfig/20240710-140224-arnaudb.json
13:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T367781)', diff saved to https://phabricator.wikimedia.org/P66128 and previous config saved to /var/cache/conftool/dbconfig/20240710-135656-arnaudb.json
13:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1172.eqiad.wmnet with reason: Maintenance
13:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
13:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1172.eqiad.wmnet with reason: Maintenance
13:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance
13:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
13:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance
13:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T367781)', diff saved to https://phabricator.wikimedia.org/P66127 and previous config saved to /var/cache/conftool/dbconfig/20240710-135619-arnaudb.json
13:54 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
13:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
13:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T367856)', diff saved to https://phabricator.wikimedia.org/P66126 and previous config saved to /var/cache/conftool/dbconfig/20240710-135130-marostegui.json
13:49 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
13:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
13:46 akosiaris@cumin1002: conftool action : set/pooled=inactive; selector: name=kubernetes1059.*
13:44 btullis: re-enabling the misc dumps jobs on snapshot1017 with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1053315
13:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P66125 and previous config saved to /var/cache/conftool/dbconfig/20240710-134112-arnaudb.json
13:34 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
13:34 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1001.eqiad.wmnet with OS bookworm
13:33 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
13:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P66124 and previous config saved to /var/cache/conftool/dbconfig/20240710-132604-arnaudb.json
13:18 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1001.eqiad.wmnet with reason: host reimage
13:15 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1001.eqiad.wmnet with reason: host reimage
13:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T367781)', diff saved to https://phabricator.wikimedia.org/P66123 and previous config saved to /var/cache/conftool/dbconfig/20240710-131057-arnaudb.json
13:01 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-mariadb1001.eqiad.wmnet with OS bookworm
12:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66122 and previous config saved to /var/cache/conftool/dbconfig/20240710-125928-root.json
12:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66121 and previous config saved to /var/cache/conftool/dbconfig/20240710-124422-root.json
12:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T367781)', diff saved to https://phabricator.wikimedia.org/P66120 and previous config saved to /var/cache/conftool/dbconfig/20240710-123844-arnaudb.json
12:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
12:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
12:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1167.eqiad.wmnet with reason: Maintenance
12:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1167.eqiad.wmnet with reason: Maintenance
12:30 topranks: removing unused wmcs vlans from asw2-b-eqiad
12:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66119 and previous config saved to /var/cache/conftool/dbconfig/20240710-122917-root.json
12:23 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] DONE helmfile.d/services/termbox: apply
12:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] START helmfile.d/services/termbox: apply
12:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] DONE helmfile.d/services/termbox: apply
12:21 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] START helmfile.d/services/termbox: apply
12:21 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] DONE helmfile.d/services/termbox: apply
12:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] START helmfile.d/services/termbox: apply
12:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66118 and previous config saved to /var/cache/conftool/dbconfig/20240710-121411-root.json
11:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66117 and previous config saved to /var/cache/conftool/dbconfig/20240710-115906-root.json
11:53 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
11:50 marostegui@cumin1002: dbctl commit (dc=all): 'Pool db2136 into api with small weight T365805', diff saved to https://phabricator.wikimedia.org/P66116 and previous config saved to /var/cache/conftool/dbconfig/20240710-115046-marostegui.json
11:50 claime: cleaned up leftover media files on videoscalers
11:50 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
11:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66115 and previous config saved to /var/cache/conftool/dbconfig/20240710-114401-root.json
11:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P66114 and previous config saved to /var/cache/conftool/dbconfig/20240710-113010-ladsgroup.json
11:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
11:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
11:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66113 and previous config saved to /var/cache/conftool/dbconfig/20240710-112856-root.json
11:22 mnz@deploy1002: Finished deploy [airflow-dags/research@5121748]: (no justification provided) (duration: 00m 41s)
11:21 mnz@deploy1002: Started deploy [airflow-dags/research@5121748]: (no justification provided)
10:43 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
10:43 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
10:43 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
10:43 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
10:42 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
10:39 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
10:38 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
10:38 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
10:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
10:29 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
10:26 mnz@deploy1002: Finished deploy [airflow-dags/research@5121748]: (no justification provided) (duration: 00m 04s)
10:26 mnz@deploy1002: Started deploy [airflow-dags/research@5121748]: (no justification provided)
10:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1208.eqiad.wmnet with reason: corruption issue
10:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1208.eqiad.wmnet with reason: corruption issue
10:21 jiji@deploy1002: Finished scap: Switch mediawiki everywhere to use node-local mcrouter ds - T346690 (duration: 05m 15s)
10:15 jiji@deploy1002: Started scap sync-world: Switch mediawiki everywhere to use node-local mcrouter ds - T346690
09:29 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
08:51 aklapper@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.13 refs T366958
08:41 hashar: On deployment server, unblocked train by manually editing /var/lib/scap/scap/lib/python3.7/site-packages/scap/train.py to allow train blocker task with "progress" status instead of just "open" # T369689
08:08 kostajh: UTC morning deploys done
08:06 kharlan@deploy1002: Finished scap: Backport for ConfirmEdit: Enable showcaptcha action on testwiki and beta wikis (T20110) (duration: 09m 41s)
08:00 kharlan@deploy1002: kharlan: Continuing with sync
07:59 kharlan@deploy1002: kharlan: Backport for ConfirmEdit: Enable showcaptcha action on testwiki and beta wikis (T20110) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:57 kharlan@deploy1002: Started scap sync-world: Backport for ConfirmEdit: Enable showcaptcha action on testwiki and beta wikis (T20110)
07:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2025.codfw.wmnet
07:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2024.codfw.wmnet
07:36 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2025.codfw.wmnet
07:36 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2024.codfw.wmnet
07:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2023.codfw.wmnet
07:33 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2020.codfw.wmnet
07:33 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2021.codfw.wmnet
07:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2022.codfw.wmnet
07:28 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2023.codfw.wmnet
07:27 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2020.codfw.wmnet
07:26 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2021.codfw.wmnet
07:26 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2022.codfw.wmnet
07:22 kostajh: UTC morning deploys done
07:20 kharlan@deploy1002: Finished scap: Backport for IPReputation: Enable extension on testwiki (T360067) (duration: 14m 05s)
07:18 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2019.codfw.wmnet
07:18 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2018.codfw.wmnet
07:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2017.codfw.wmnet
07:15 kharlan@deploy1002: kharlan: Continuing with sync
07:11 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2018.codfw.wmnet
07:11 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2019.codfw.wmnet
07:09 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2017.codfw.wmnet
07:09 kharlan@deploy1002: kharlan: Backport for IPReputation: Enable extension on testwiki (T360067) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:08 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2016.codfw.wmnet
07:06 kharlan@deploy1002: Started scap sync-world: Backport for IPReputation: Enable extension on testwiki (T360067)
07:02 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2016.codfw.wmnet
07:01 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2015.codfw.wmnet
06:58 XioNoX: push policy-statement BGP_agg_net_pops to all CRs (noop as it's not applied there) - T367439
06:54 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2015.codfw.wmnet
06:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 17072
06:52 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 17072
06:52 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2014.codfw.wmnet
06:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2013.codfw.wmnet
06:29 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2013.codfw.wmnet
06:28 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2012.codfw.wmnet
06:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T367856)', diff saved to https://phabricator.wikimedia.org/P66110 and previous config saved to /var/cache/conftool/dbconfig/20240710-062424-marostegui.json
06:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance
06:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance
06:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T367856)', diff saved to https://phabricator.wikimedia.org/P66109 and previous config saved to /var/cache/conftool/dbconfig/20240710-062401-marostegui.json
06:22 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2012.codfw.wmnet
06:16 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host wdqs2012.codfw.wmnet
06:15 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2012.codfw.wmnet
06:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P66108 and previous config saved to /var/cache/conftool/dbconfig/20240710-060854-marostegui.json
05:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P66107 and previous config saved to /var/cache/conftool/dbconfig/20240710-055347-marostegui.json
05:49 marostegui: Deploy schema change on s5 eqiad db1183 dbmaint T367856
05:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Long schema change
05:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Long schema change
05:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1183 T369616', diff saved to https://phabricator.wikimedia.org/P66106 and previous config saved to /var/cache/conftool/dbconfig/20240710-054710-root.json
05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1230 to s5 primary and set section read-write T369616', diff saved to https://phabricator.wikimedia.org/P66105 and previous config saved to /var/cache/conftool/dbconfig/20240710-054621-marostegui.json
05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T369616', diff saved to https://phabricator.wikimedia.org/P66104 and previous config saved to /var/cache/conftool/dbconfig/20240710-054559-marostegui.json
05:45 marostegui: Starting s5 eqiad failover from db1183 to db1230 - T369616
05:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T367856)', diff saved to https://phabricator.wikimedia.org/P66103 and previous config saved to /var/cache/conftool/dbconfig/20240710-053840-marostegui.json
05:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T369616
05:30 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1230 with weight 0 T369616', diff saved to https://phabricator.wikimedia.org/P66102 and previous config saved to /var/cache/conftool/dbconfig/20240710-053009-root.json
05:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T369616
05:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T367856)', diff saved to https://phabricator.wikimedia.org/P66101 and previous config saved to /var/cache/conftool/dbconfig/20240710-052520-marostegui.json
05:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
05:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
05:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
05:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T367856)', diff saved to https://phabricator.wikimedia.org/P66100 and previous config saved to /var/cache/conftool/dbconfig/20240710-052443-marostegui.json
05:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P66099 and previous config saved to /var/cache/conftool/dbconfig/20240710-050935-marostegui.json
04:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P66098 and previous config saved to /var/cache/conftool/dbconfig/20240710-045428-marostegui.json
04:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T367856)', diff saved to https://phabricator.wikimedia.org/P66097 and previous config saved to /var/cache/conftool/dbconfig/20240710-043921-marostegui.json
03:22 eileen: tools upgraded from 95f10b20 to 94bac5c6

2024-07-09

22:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T367856)', diff saved to https://phabricator.wikimedia.org/P66096 and previous config saved to /var/cache/conftool/dbconfig/20240709-223336-marostegui.json
22:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
22:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
22:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367856)', diff saved to https://phabricator.wikimedia.org/P66095 and previous config saved to /var/cache/conftool/dbconfig/20240709-223314-marostegui.json
22:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P66094 and previous config saved to /var/cache/conftool/dbconfig/20240709-221807-marostegui.json
22:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P66093 and previous config saved to /var/cache/conftool/dbconfig/20240709-220300-marostegui.json
21:50 ejegg: payments-wiki upgraded from dc0c14d4 to 4e48059a (and ingenico config removed)
21:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367856)', diff saved to https://phabricator.wikimedia.org/P66092 and previous config saved to /var/cache/conftool/dbconfig/20240709-214752-marostegui.json
21:24 ejegg: fundraising civicrm upgraded from 84d6f5d1 to a03085ff
21:18 urbanecm@deploy1002: Finished scap: Backport for use text() instead of escaped() for msg recentchanges (T352626) (duration: 21m 50s)
21:13 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
21:13 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
21:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T352010)', diff saved to https://phabricator.wikimedia.org/P66091 and previous config saved to /var/cache/conftool/dbconfig/20240709-211231-ladsgroup.json
21:12 urbanecm@deploy1002: gergesshamon, urbanecm: Continuing with sync
21:00 urbanecm@deploy1002: gergesshamon, urbanecm: Backport for use text() instead of escaped() for msg recentchanges (T352626) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P66090 and previous config saved to /var/cache/conftool/dbconfig/20240709-205724-ladsgroup.json
20:56 urbanecm@deploy1002: Started scap sync-world: Backport for use text() instead of escaped() for msg recentchanges (T352626)
20:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P66089 and previous config saved to /var/cache/conftool/dbconfig/20240709-204217-ladsgroup.json
20:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T352010)', diff saved to https://phabricator.wikimedia.org/P66088 and previous config saved to /var/cache/conftool/dbconfig/20240709-202709-ladsgroup.json
20:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T367856)', diff saved to https://phabricator.wikimedia.org/P66087 and previous config saved to /var/cache/conftool/dbconfig/20240709-201928-marostegui.json
20:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
20:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
20:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T367856)', diff saved to https://phabricator.wikimedia.org/P66086 and previous config saved to /var/cache/conftool/dbconfig/20240709-201906-marostegui.json
20:16 urbanecm@deploy1002: Finished scap: Backport for Missing.php: check REQUEST_URI in addition to PATH_INFO (T9496 T355018) (duration: 13m 01s)
20:10 urbanecm@deploy1002: urbanecm, pppery: Continuing with sync
20:07 urbanecm@deploy1002: urbanecm, pppery: Backport for Missing.php: check REQUEST_URI in addition to PATH_INFO (T9496 T355018) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P66084 and previous config saved to /var/cache/conftool/dbconfig/20240709-200359-marostegui.json
20:03 urbanecm@deploy1002: Started scap sync-world: Backport for Missing.php: check REQUEST_URI in addition to PATH_INFO (T9496 T355018)
19:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P66083 and previous config saved to /var/cache/conftool/dbconfig/20240709-194851-marostegui.json
19:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T367856)', diff saved to https://phabricator.wikimedia.org/P66082 and previous config saved to /var/cache/conftool/dbconfig/20240709-193344-marostegui.json
17:14 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:13 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:11 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
17:11 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:11 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
17:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: T368950
17:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: T368950
16:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66080 and previous config saved to /var/cache/conftool/dbconfig/20240709-165921-root.json
16:57 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66079 and previous config saved to /var/cache/conftool/dbconfig/20240709-165746-root.json
16:57 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66078 and previous config saved to /var/cache/conftool/dbconfig/20240709-165738-root.json
16:57 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
16:57 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
16:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66077 and previous config saved to /var/cache/conftool/dbconfig/20240709-164415-root.json
16:42 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66076 and previous config saved to /var/cache/conftool/dbconfig/20240709-164241-root.json
16:42 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66075 and previous config saved to /var/cache/conftool/dbconfig/20240709-164233-root.json
16:40 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
16:40 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
16:30 btullis@deploy1002: Finished deploy [analytics/refinery@a203f30] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@a203f30c] (duration: 03m 41s)
16:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66074 and previous config saved to /var/cache/conftool/dbconfig/20240709-162909-root.json
16:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66073 and previous config saved to /var/cache/conftool/dbconfig/20240709-162735-root.json
16:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66072 and previous config saved to /var/cache/conftool/dbconfig/20240709-162727-root.json
16:26 btullis@deploy1002: Started deploy [analytics/refinery@a203f30] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@a203f30c]
16:25 btullis@deploy1002: Finished deploy [analytics/refinery@a203f30] (thin): Regular analytics weekly train THIN [analytics/refinery@a203f30c] (duration: 04m 05s)
16:21 btullis@deploy1002: Started deploy [analytics/refinery@a203f30] (thin): Regular analytics weekly train THIN [analytics/refinery@a203f30c]
16:20 btullis@deploy1002: Finished deploy [analytics/refinery@a203f30]: Regular analytics weekly train [analytics/refinery@a203f30c] (duration: 01m 18s)
16:19 btullis@deploy1002: Started deploy [analytics/refinery@a203f30]: Regular analytics weekly train [analytics/refinery@a203f30c]
16:19 btullis@deploy1002: Finished deploy [analytics/refinery@a203f30]: Regular analytics weekly train [analytics/refinery@a203f30c] (duration: 04m 51s)
16:14 btullis@deploy1002: Started deploy [analytics/refinery@a203f30]: Regular analytics weekly train [analytics/refinery@a203f30c]
16:14 btullis@deploy1002: Finished deploy [analytics/refinery@a203f30]: Regular analytics weekly train [analytics/refinery@a203f30c] (duration: 09m 23s)
16:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66071 and previous config saved to /var/cache/conftool/dbconfig/20240709-161404-root.json
16:14 btullis: pooled druid1010
16:13 btullis: unset noout mode on the cephosd cluster
16:13 btullis: uncordoned dse-k8s-worker1006
16:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66070 and previous config saved to /var/cache/conftool/dbconfig/20240709-161230-root.json
16:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66069 and previous config saved to /var/cache/conftool/dbconfig/20240709-161222-root.json
16:07 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
16:04 btullis@deploy1002: Started deploy [analytics/refinery@a203f30]: Regular analytics weekly train [analytics/refinery@a203f30c]
15:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66068 and previous config saved to /var/cache/conftool/dbconfig/20240709-155858-root.json
15:57 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66067 and previous config saved to /var/cache/conftool/dbconfig/20240709-155724-root.json
15:57 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
15:57 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66066 and previous config saved to /var/cache/conftool/dbconfig/20240709-155717-root.json
15:56 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
15:46 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
15:44 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
15:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
15:44 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
15:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66065 and previous config saved to /var/cache/conftool/dbconfig/20240709-154353-root.json
15:42 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
15:42 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66064 and previous config saved to /var/cache/conftool/dbconfig/20240709-154219-root.json
15:42 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66063 and previous config saved to /var/cache/conftool/dbconfig/20240709-154211-root.json
15:41 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
15:41 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
15:39 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
15:38 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
15:35 sukhe: remove traffic-dnsbox VM on cloud-vps: T360710
15:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66062 and previous config saved to /var/cache/conftool/dbconfig/20240709-152847-root.json
15:27 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 9 hosts
15:27 hnowlan@cumin1002: START - Cookbook sre.hosts.remove-downtime for 9 hosts
15:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66061 and previous config saved to /var/cache/conftool/dbconfig/20240709-152713-root.json
15:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66060 and previous config saved to /var/cache/conftool/dbconfig/20240709-152706-root.json
15:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
15:12 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
15:11 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
15:08 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
15:04 topranks: rebooting lsw1-e3-eqiad to install updated JunOS version T365998
15:03 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on 27 hosts with reason: JunOS upgrade lsw1-e3-eqiad
15:02 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on 27 hosts with reason: JunOS upgrade lsw1-e3-eqiad
15:01 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 9 hosts with reason: network maintenance
15:01 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on 9 hosts with reason: network maintenance
15:00 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e3-eqiad,lsw1-e3-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e3-eqiad
14:59 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e3-eqiad,lsw1-e3-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e3-eqiad
14:54 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
14:53 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
14:53 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e3-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e3-eqiad
14:53 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e3-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e3-eqiad
14:50 hashar: Restart Gerrit primary on gerrit1003 to apply a configuration change | T367505
14:46 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
14:46 hashar@deploy1002: Finished deploy [integration/docroot@c8b0266]: (no justification provided) (duration: 00m 07s)
14:46 hashar@deploy1002: Started deploy [integration/docroot@c8b0266]: (no justification provided)
14:45 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1003.eqiad.wmnet
14:43 Lucas_WMDE: UTC afternoon backport+config window done
14:40 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
14:40 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
14:38 sukhe: dummy authdns-update
14:38 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-coord1003.eqiad.wmnet
14:37 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2003.codfw.wmnet
14:37 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for wmfRenderEmptyGraphTag: Fix count() warning (T369600) (duration: 14m 35s)
14:35 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
14:32 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
14:32 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
14:29 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for wmfRenderEmptyGraphTag: Fix count() warning (T369600) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:28 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd2003.codfw.wmnet
14:28 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2002.codfw.wmnet
14:27 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
14:26 hnowlan@cumin1002: conftool action : set/pooled=inactive; selector: name=(kubernetes1061.eqiad.wmnet|kubernetes1048.eqiad.wmnet|kubernetes1047.eqiad.wmnet|kubernetes1049.eqiad.wmnet|kubernetes1050.eqiad.wmnet|kubernetes1051.eqiad.wmnet|mw1491.eqiad.wmnet|mw1492.eqiad.wmnet|mw1493.eqiad.wmnet),cluster=kubernetes,service=kubesvc
14:26 hnowlan: kubectl drain kubernetes1061.eqiad.wmnet kubernetes1061.eqiad.wmnet kubernetes1061.eqiad.wmnet kubernetes1061.eqiad.wmnet kubernetes1061.eqiad.wmnet kubernetes1061.eqiad.wmnet kubernetes1061.eqiad.wmnet mw1492.eqiad.wmnet mw1492.eqiad.wmnet (T365995)
14:25 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
14:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for wmfRenderEmptyGraphTag: Fix count() warning (T369600)
14:21 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd2002.codfw.wmnet
14:21 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2001.codfw.wmnet
14:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Re-introduce notices (T369053) (duration: 39m 17s)
14:15 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
14:13 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
14:12 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
14:12 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd2001.codfw.wmnet
14:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, mlitn: Continuing with sync
14:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, mlitn: Backport for Re-introduce notices (T369053) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
14:03 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1080.eqiad.wmnet
14:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T352010)', diff saved to https://phabricator.wikimedia.org/P66059 and previous config saved to /var/cache/conftool/dbconfig/20240709-140033-ladsgroup.json
14:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2212.codfw.wmnet with reason: Maintenance
14:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2212.codfw.wmnet with reason: Maintenance
13:59 XioNoX: netbox-deploy - rebase the dev branch into main
13:41 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1080.eqiad.wmnet
13:38 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Re-introduce notices (T369053)
13:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T367856)', diff saved to https://phabricator.wikimedia.org/P66058 and previous config saved to /var/cache/conftool/dbconfig/20240709-133450-marostegui.json
13:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
13:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
13:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367856)', diff saved to https://phabricator.wikimedia.org/P66057 and previous config saved to /var/cache/conftool/dbconfig/20240709-133428-marostegui.json
13:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P66056 and previous config saved to /var/cache/conftool/dbconfig/20240709-131921-marostegui.json
13:16 sukhe: dummy authdns-update run
13:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Add $wgMaxShellWallClockTime setting for shellbox (T356241) (duration: 08m 28s)
13:08 logmsgbot: lucaswerkmeister-wmde@deploy1002 kamila, lucaswerkmeister-wmde: Continuing with sync
13:08 logmsgbot: lucaswerkmeister-wmde@deploy1002 kamila, lucaswerkmeister-wmde: Backport for Add $wgMaxShellWallClockTime setting for shellbox (T356241) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Add $wgMaxShellWallClockTime setting for shellbox (T356241)
13:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P66055 and previous config saved to /var/cache/conftool/dbconfig/20240709-130414-marostegui.json
12:59 hashar: Restart Gerrit replica on gerrit2002 to apply a configuration change | T367505
12:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367856)', diff saved to https://phabricator.wikimedia.org/P66054 and previous config saved to /var/cache/conftool/dbconfig/20240709-124907-marostegui.json
12:04 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66053 and previous config saved to /var/cache/conftool/dbconfig/20240709-120440-root.json
12:01 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lists1001.wikimedia.org
12:01 eoghan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:01 eoghan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lists1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - eoghan@cumin1002"
11:59 eoghan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lists1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - eoghan@cumin1002"
11:54 eoghan@cumin1002: START - Cookbook sre.dns.netbox
11:49 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66052 and previous config saved to /var/cache/conftool/dbconfig/20240709-114935-root.json
11:45 eoghan@cumin1002: START - Cookbook sre.hosts.decommission for hosts lists1001.wikimedia.org
11:34 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66051 and previous config saved to /var/cache/conftool/dbconfig/20240709-113430-root.json
11:28 eoghan: Decommissioning lists1001 T331706
11:26 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
11:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66050 and previous config saved to /var/cache/conftool/dbconfig/20240709-112611-root.json
11:19 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66049 and previous config saved to /var/cache/conftool/dbconfig/20240709-111925-root.json
11:18 btullis: depooled druid1010 for T365995
11:17 btullis: set cephosd cluster into noout mode to prevent rebalancing for T365995
11:16 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
11:15 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
11:15 btullis: drained dse-k8s-worker1006.eqiad.wmnet ready for T365995
11:15 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
11:13 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
11:12 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
11:11 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
11:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66048 and previous config saved to /var/cache/conftool/dbconfig/20240709-111105-root.json
11:10 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
11:10 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
11:04 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66047 and previous config saved to /var/cache/conftool/dbconfig/20240709-110420-root.json
10:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66046 and previous config saved to /var/cache/conftool/dbconfig/20240709-105600-root.json
10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T367856)', diff saved to https://phabricator.wikimedia.org/P66045 and previous config saved to /var/cache/conftool/dbconfig/20240709-105454-marostegui.json
10:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
10:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
10:49 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66044 and previous config saved to /var/cache/conftool/dbconfig/20240709-104914-root.json
10:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66043 and previous config saved to /var/cache/conftool/dbconfig/20240709-104054-root.json
10:37 Dreamy_Jazz: Finished running maintenance scripts for T366781
10:34 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66042 and previous config saved to /var/cache/conftool/dbconfig/20240709-103409-root.json
10:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2212 T369515', diff saved to https://phabricator.wikimedia.org/P66041 and previous config saved to /var/cache/conftool/dbconfig/20240709-103331-root.json
10:32 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2203 to s1 primary T369515', diff saved to https://phabricator.wikimedia.org/P66040 and previous config saved to /var/cache/conftool/dbconfig/20240709-103238-root.json
10:32 marostegui: Starting s1 codfw failover from db2212 to db2203 - T369515
10:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1192 db1198 db1199 T365995', diff saved to https://phabricator.wikimedia.org/P66039 and previous config saved to /var/cache/conftool/dbconfig/20240709-102947-root.json
10:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66038 and previous config saved to /var/cache/conftool/dbconfig/20240709-102549-root.json
10:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66037 and previous config saved to /var/cache/conftool/dbconfig/20240709-101043-root.json
10:04 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
10:03 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
09:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 36 hosts with reason: Primary switchover s1 T369515
09:57 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2203 with weight 0 T369515', diff saved to https://phabricator.wikimedia.org/P66036 and previous config saved to /var/cache/conftool/dbconfig/20240709-095659-root.json
09:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 36 hosts with reason: Primary switchover s1 T369515
09:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66035 and previous config saved to /var/cache/conftool/dbconfig/20240709-095538-root.json
09:26 cparle@deploy1002: Finished deploy [airflow-dags/platform_eng@0e9b3ac]: (no justification provided) (duration: 00m 32s)
09:26 cparle@deploy1002: Started deploy [airflow-dags/platform_eng@0e9b3ac]: (no justification provided)
09:06 vgutierrez: restart purged @ cp3073
08:28 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
08:28 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
08:28 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
08:27 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
08:17 aklapper@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.13 refs T366958
08:03 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
08:01 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
08:01 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
07:59 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
07:58 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
07:57 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
07:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netbox-dev2002.codfw.wmnet
07:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netbox-dev2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
07:40 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netbox-dev2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
07:40 Dreamy_Jazz: Morning UTC backport window done
07:38 vgutierrez: repool cp3073
07:35 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
07:32 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp3073.*} and A:cp
07:32 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3073.esams.wmnet
07:30 dreamyjazz@deploy1002: Synchronized wmf-config/throttle.php: Deploying throttle change for T369522 (duration: 09m 50s)
07:26 ayounsi@cumin1002: START - Cookbook sre.hosts.decommission for hosts netbox-dev2002.codfw.wmnet
07:25 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3073.*} and A:cp
07:12 fabfur@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-reboot (exit_code=1) rolling reboot on P{cp3073.*} and A:cp
07:10 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3073.*} and A:cp
07:08 fabfur@cumin1002: END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on P{cp3073.*} and A:cp
07:08 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3073.*} and A:cp
06:54 Dreamy_Jazz: Start `foreachwikiindblist group2.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200` in a tmux session
05:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
05:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
05:20 marostegui: Deploy schema change on s2 eqiad db1162 dbmaint T367856
05:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Long schema change
05:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Long schema change
05:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1162 T369339', diff saved to https://phabricator.wikimedia.org/P66034 and previous config saved to /var/cache/conftool/dbconfig/20240709-051911-marostegui.json
05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1222 to s2 primary and set section read-write T369339', diff saved to https://phabricator.wikimedia.org/P66033 and previous config saved to /var/cache/conftool/dbconfig/20240709-051814-marostegui.json
05:17 marostegui@cumin1002: dbctl commit (dc=all): 'Set s2 eqiad as read-only for maintenance - T369339', diff saved to https://phabricator.wikimedia.org/P66032 and previous config saved to /var/cache/conftool/dbconfig/20240709-051749-marostegui.json
05:17 marostegui: Starting s2 eqiad failover from db1162 to db1222 - T369339
04:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369339
04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1222 with weight 0 T369339', diff saved to https://phabricator.wikimedia.org/P66031 and previous config saved to /var/cache/conftool/dbconfig/20240709-045814-marostegui.json
04:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369339
04:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T367856)', diff saved to https://phabricator.wikimedia.org/P66030 and previous config saved to /var/cache/conftool/dbconfig/20240709-044128-marostegui.json
04:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
04:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
04:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
04:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
04:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367856)', diff saved to https://phabricator.wikimedia.org/P66029 and previous config saved to /var/cache/conftool/dbconfig/20240709-044051-marostegui.json
04:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P66028 and previous config saved to /var/cache/conftool/dbconfig/20240709-042544-marostegui.json
04:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P66027 and previous config saved to /var/cache/conftool/dbconfig/20240709-041036-marostegui.json
04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.10 (duration: 00m 57s)
03:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367856)', diff saved to https://phabricator.wikimedia.org/P66026 and previous config saved to /var/cache/conftool/dbconfig/20240709-035529-marostegui.json
03:53 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.13 refs T366958 (duration: 50m 52s)
03:03 mwpresync@deploy1002: Started scap sync-world: testwikis wikis to 1.43.0-wmf.13 refs T366958
01:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367781)', diff saved to https://phabricator.wikimedia.org/P66025 and previous config saved to /var/cache/conftool/dbconfig/20240709-014242-arnaudb.json
01:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P66024 and previous config saved to /var/cache/conftool/dbconfig/20240709-012735-arnaudb.json
01:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P66023 and previous config saved to /var/cache/conftool/dbconfig/20240709-011227-arnaudb.json
00:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367781)', diff saved to https://phabricator.wikimedia.org/P66022 and previous config saved to /var/cache/conftool/dbconfig/20240709-005720-arnaudb.json
00:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T367781)', diff saved to https://phabricator.wikimedia.org/P66021 and previous config saved to /var/cache/conftool/dbconfig/20240709-005456-arnaudb.json
00:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2217.codfw.wmnet with reason: Maintenance
00:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2217.codfw.wmnet with reason: Maintenance
00:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest2001.codfw.wmnet
00:14 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
00:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
00:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
00:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66020 and previous config saved to /var/cache/conftool/dbconfig/20240709-001324-arnaudb.json
00:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
00:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
00:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T367856)', diff saved to https://phabricator.wikimedia.org/P66019 and previous config saved to /var/cache/conftool/dbconfig/20240709-001250-marostegui.json
00:05 ejegg: payments-wiki upgraded from 82a5e588 to dc0c14d4

2024-07-08

23:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P66018 and previous config saved to /var/cache/conftool/dbconfig/20240708-235817-arnaudb.json
23:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P66017 and previous config saved to /var/cache/conftool/dbconfig/20240708-235742-marostegui.json
23:52 fabfur@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-reboot (exit_code=1) rolling reboot on A:cp-text_esams
23:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P66016 and previous config saved to /var/cache/conftool/dbconfig/20240708-234310-arnaudb.json
23:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P66015 and previous config saved to /var/cache/conftool/dbconfig/20240708-234235-marostegui.json
23:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66014 and previous config saved to /var/cache/conftool/dbconfig/20240708-232803-arnaudb.json
23:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T367856)', diff saved to https://phabricator.wikimedia.org/P66013 and previous config saved to /var/cache/conftool/dbconfig/20240708-232728-marostegui.json
23:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66012 and previous config saved to /var/cache/conftool/dbconfig/20240708-232549-arnaudb.json
23:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2193.codfw.wmnet with reason: Maintenance
23:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2193.codfw.wmnet with reason: Maintenance
23:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367781)', diff saved to https://phabricator.wikimedia.org/P66011 and previous config saved to /var/cache/conftool/dbconfig/20240708-232527-arnaudb.json
23:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P66010 and previous config saved to /var/cache/conftool/dbconfig/20240708-231020-arnaudb.json
22:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P66009 and previous config saved to /var/cache/conftool/dbconfig/20240708-225513-arnaudb.json
22:46 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
22:42 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_esams
22:42 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3081.esams.wmnet
22:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367781)', diff saved to https://phabricator.wikimedia.org/P66008 and previous config saved to /var/cache/conftool/dbconfig/20240708-224006-arnaudb.json
22:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T367781)', diff saved to https://phabricator.wikimedia.org/P66007 and previous config saved to /var/cache/conftool/dbconfig/20240708-223752-arnaudb.json
22:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2180.codfw.wmnet with reason: Maintenance
22:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2180.codfw.wmnet with reason: Maintenance
22:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367781)', diff saved to https://phabricator.wikimedia.org/P66006 and previous config saved to /var/cache/conftool/dbconfig/20240708-223741-arnaudb.json
22:26 bking@cumin2002: START - Cookbook sre.wdqs.reboot
22:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P66005 and previous config saved to /var/cache/conftool/dbconfig/20240708-222234-arnaudb.json
22:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P66004 and previous config saved to /var/cache/conftool/dbconfig/20240708-220727-arnaudb.json
21:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367781)', diff saved to https://phabricator.wikimedia.org/P66003 and previous config saved to /var/cache/conftool/dbconfig/20240708-215220-arnaudb.json
21:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T367781)', diff saved to https://phabricator.wikimedia.org/P66002 and previous config saved to /var/cache/conftool/dbconfig/20240708-214954-arnaudb.json
21:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2169.codfw.wmnet with reason: Maintenance
21:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2169.codfw.wmnet with reason: Maintenance
21:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T367781)', diff saved to https://phabricator.wikimedia.org/P66001 and previous config saved to /var/cache/conftool/dbconfig/20240708-214932-arnaudb.json
21:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P66000 and previous config saved to /var/cache/conftool/dbconfig/20240708-213425-arnaudb.json
21:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
21:23 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
21:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P65999 and previous config saved to /var/cache/conftool/dbconfig/20240708-211918-arnaudb.json
21:16 catrope@deploy1002: Finished scap: Backport for Enable VisualEditor by default on Italian Wikibooks (T369342) (duration: 09m 23s)
21:10 catrope@deploy1002: catrope, nmw03: Continuing with sync
21:09 catrope@deploy1002: catrope, nmw03: Backport for Enable VisualEditor by default on Italian Wikibooks (T369342) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:06 catrope@deploy1002: Started scap sync-world: Backport for Enable VisualEditor by default on Italian Wikibooks (T369342)
21:05 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic109[3-5]* for T348977 - bking@cumin2002
21:05 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic109[3-5]* for T348977 - bking@cumin2002
21:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1093-1095].eqiad.wmnet with reason: T348977
21:05 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1093-1095].eqiad.wmnet with reason: T348977
21:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T367781)', diff saved to https://phabricator.wikimedia.org/P65998 and previous config saved to /var/cache/conftool/dbconfig/20240708-210410-arnaudb.json
21:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1023.eqiad.wmnet
21:02 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3080.esams.wmnet
21:01 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3072.esams.wmnet
21:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T367781)', diff saved to https://phabricator.wikimedia.org/P65997 and previous config saved to /var/cache/conftool/dbconfig/20240708-210144-arnaudb.json
21:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
21:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
21:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2158.codfw.wmnet with reason: Maintenance
21:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2158.codfw.wmnet with reason: Maintenance
21:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367781)', diff saved to https://phabricator.wikimedia.org/P65996 and previous config saved to /var/cache/conftool/dbconfig/20240708-210106-arnaudb.json
20:55 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1023.eqiad.wmnet
20:52 catrope@deploy1002: Finished scap: Backport for Graph extension: Add tracking for data sources used in <graph> tags (duration: 13m 00s)
20:50 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1022.eqiad.wmnet
20:47 catrope@deploy1002: catrope: Continuing with sync
20:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65995 and previous config saved to /var/cache/conftool/dbconfig/20240708-204559-arnaudb.json
20:43 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1022.eqiad.wmnet
20:42 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
20:42 catrope@deploy1002: catrope: Backport for Graph extension: Add tracking for data sources used in <graph> tags synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T367856)', diff saved to https://phabricator.wikimedia.org/P65994 and previous config saved to /var/cache/conftool/dbconfig/20240708-204042-marostegui.json
20:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
20:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
20:39 catrope@deploy1002: Started scap sync-world: Backport for Graph extension: Add tracking for data sources used in <graph> tags
20:38 bking@cumin2002: START - Cookbook sre.wdqs.reboot
20:35 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
20:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65993 and previous config saved to /var/cache/conftool/dbconfig/20240708-203052-arnaudb.json
20:28 bking@cumin2002: START - Cookbook sre.wdqs.reboot
20:27 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
20:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367781)', diff saved to https://phabricator.wikimedia.org/P65992 and previous config saved to /var/cache/conftool/dbconfig/20240708-201545-arnaudb.json
20:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367781)', diff saved to https://phabricator.wikimedia.org/P65991 and previous config saved to /var/cache/conftool/dbconfig/20240708-201318-arnaudb.json
20:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2151.codfw.wmnet with reason: Maintenance
20:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2151.codfw.wmnet with reason: Maintenance
20:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T367781)', diff saved to https://phabricator.wikimedia.org/P65990 and previous config saved to /var/cache/conftool/dbconfig/20240708-201256-arnaudb.json
20:08 bking@cumin2002: START - Cookbook sre.wdqs.reboot
19:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P65989 and previous config saved to /var/cache/conftool/dbconfig/20240708-195749-arnaudb.json
19:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T367856)', diff saved to https://phabricator.wikimedia.org/P65988 and previous config saved to /var/cache/conftool/dbconfig/20240708-194435-marostegui.json
19:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
19:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
19:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P65987 and previous config saved to /var/cache/conftool/dbconfig/20240708-194242-arnaudb.json
19:39 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
19:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T367781)', diff saved to https://phabricator.wikimedia.org/P65986 and previous config saved to /var/cache/conftool/dbconfig/20240708-192735-arnaudb.json
19:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2129 (T367781)', diff saved to https://phabricator.wikimedia.org/P65985 and previous config saved to /var/cache/conftool/dbconfig/20240708-192508-arnaudb.json
19:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2129.codfw.wmnet with reason: Maintenance
19:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2129.codfw.wmnet with reason: Maintenance
19:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367781)', diff saved to https://phabricator.wikimedia.org/P65984 and previous config saved to /var/cache/conftool/dbconfig/20240708-192444-arnaudb.json
19:21 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3079.esams.wmnet
19:21 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3071.esams.wmnet
19:21 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
19:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
19:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65983 and previous config saved to /var/cache/conftool/dbconfig/20240708-190937-arnaudb.json
19:02 bking@cumin2002: START - Cookbook sre.wdqs.reboot
18:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65982 and previous config saved to /var/cache/conftool/dbconfig/20240708-185430-arnaudb.json
18:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367781)', diff saved to https://phabricator.wikimedia.org/P65981 and previous config saved to /var/cache/conftool/dbconfig/20240708-183923-arnaudb.json
18:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367781)', diff saved to https://phabricator.wikimedia.org/P65980 and previous config saved to /var/cache/conftool/dbconfig/20240708-183658-arnaudb.json
18:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2124.codfw.wmnet with reason: Maintenance
18:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2124.codfw.wmnet with reason: Maintenance
18:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2114.codfw.wmnet with reason: Maintenance
18:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2114.codfw.wmnet with reason: Maintenance
18:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
18:35 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
18:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T367781)', diff saved to https://phabricator.wikimedia.org/P65979 and previous config saved to /var/cache/conftool/dbconfig/20240708-183548-arnaudb.json
18:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P65978 and previous config saved to /var/cache/conftool/dbconfig/20240708-182041-arnaudb.json
18:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader2002.codfw.wmnet
18:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P65977 and previous config saved to /var/cache/conftool/dbconfig/20240708-180533-arnaudb.json
18:02 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host search-loader2002.codfw.wmnet
17:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T367781)', diff saved to https://phabricator.wikimedia.org/P65976 and previous config saved to /var/cache/conftool/dbconfig/20240708-175026-arnaudb.json
17:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T367781)', diff saved to https://phabricator.wikimedia.org/P65975 and previous config saved to /var/cache/conftool/dbconfig/20240708-174918-arnaudb.json
17:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1231.eqiad.wmnet with reason: Maintenance
17:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1231.eqiad.wmnet with reason: Maintenance
17:48 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
17:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
17:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367781)', diff saved to https://phabricator.wikimedia.org/P65974 and previous config saved to /var/cache/conftool/dbconfig/20240708-174823-arnaudb.json
17:40 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3078.esams.wmnet
17:38 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3070.esams.wmnet
17:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P65973 and previous config saved to /var/cache/conftool/dbconfig/20240708-173316-arnaudb.json
17:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P65972 and previous config saved to /var/cache/conftool/dbconfig/20240708-171810-arnaudb.json
17:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367781)', diff saved to https://phabricator.wikimedia.org/P65971 and previous config saved to /var/cache/conftool/dbconfig/20240708-170302-arnaudb.json
17:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T367781)', diff saved to https://phabricator.wikimedia.org/P65970 and previous config saved to /var/cache/conftool/dbconfig/20240708-170053-arnaudb.json
17:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1224.eqiad.wmnet with reason: Maintenance
17:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1224.eqiad.wmnet with reason: Maintenance
17:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367781)', diff saved to https://phabricator.wikimedia.org/P65969 and previous config saved to /var/cache/conftool/dbconfig/20240708-170031-arnaudb.json
16:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P65968 and previous config saved to /var/cache/conftool/dbconfig/20240708-164524-arnaudb.json
16:39 ladsgroup@deploy1002: Finished scap: Backport for Reduce frequency of two query pages in commonswiki (T369024) (duration: 07m 50s)
16:34 ladsgroup@deploy1002: ladsgroup: Continuing with sync
16:33 ladsgroup@deploy1002: ladsgroup: Backport for Reduce frequency of two query pages in commonswiki (T369024) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:31 ladsgroup@deploy1002: Started scap sync-world: Backport for Reduce frequency of two query pages in commonswiki (T369024)
16:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P65967 and previous config saved to /var/cache/conftool/dbconfig/20240708-163017-arnaudb.json
16:15 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1011.eqiad.wmnet
16:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367781)', diff saved to https://phabricator.wikimedia.org/P65966 and previous config saved to /var/cache/conftool/dbconfig/20240708-161510-arnaudb.json
16:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T367781)', diff saved to https://phabricator.wikimedia.org/P65965 and previous config saved to /var/cache/conftool/dbconfig/20240708-161302-arnaudb.json
16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1201.eqiad.wmnet with reason: Maintenance
16:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1201.eqiad.wmnet with reason: Maintenance
16:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367781)', diff saved to https://phabricator.wikimedia.org/P65964 and previous config saved to /var/cache/conftool/dbconfig/20240708-161238-arnaudb.json
16:09 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1011.eqiad.wmnet
16:08 root@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1011.eqiad.wmnet with OS bullseye
15:57 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3077.esams.wmnet
15:57 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3069.esams.wmnet
15:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P65963 and previous config saved to /var/cache/conftool/dbconfig/20240708-155731-arnaudb.json
15:51 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 28s)
15:47 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:46 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
15:45 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:45 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:45 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:45 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
15:44 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 54s)
15:44 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
15:44 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P65962 and previous config saved to /var/cache/conftool/dbconfig/20240708-154224-arnaudb.json
15:38 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
15:38 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
15:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367781)', diff saved to https://phabricator.wikimedia.org/P65961 and previous config saved to /var/cache/conftool/dbconfig/20240708-152717-arnaudb.json
15:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T367781)', diff saved to https://phabricator.wikimedia.org/P65960 and previous config saved to /var/cache/conftool/dbconfig/20240708-152508-arnaudb.json
15:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance
15:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance
15:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367781)', diff saved to https://phabricator.wikimedia.org/P65959 and previous config saved to /var/cache/conftool/dbconfig/20240708-152446-arnaudb.json
15:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Bumping db1227 weight (T366852)', diff saved to https://phabricator.wikimedia.org/P65958 and previous config saved to /var/cache/conftool/dbconfig/20240708-152222-ladsgroup.json
15:16 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1011.eqiad.wmnet with reason: host reimage
15:13 root@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1011.eqiad.wmnet with reason: host reimage
15:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P65957 and previous config saved to /var/cache/conftool/dbconfig/20240708-150939-arnaudb.json
14:59 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1011.eqiad.wmnet with OS bullseye
14:57 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader1002.eqiad.wmnet
14:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P65956 and previous config saved to /var/cache/conftool/dbconfig/20240708-145432-arnaudb.json
14:53 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host search-loader1002.eqiad.wmnet
14:53 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host search-loader1002.eqiad.wmnet
14:53 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host search-loader1002.eqiad.wmnet
14:52 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host search-loader1002.eqiad.wmnet
14:51 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host search-loader1002.eqiad.wmnet
14:51 claime: cleaning up old shellbox files on mw1438
14:43 root@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cloudcephosd1011.eqiad.wmnet
14:43 root@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1011.eqiad.wmnet
14:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367781)', diff saved to https://phabricator.wikimedia.org/P65955 and previous config saved to /var/cache/conftool/dbconfig/20240708-143925-arnaudb.json
14:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T367781)', diff saved to https://phabricator.wikimedia.org/P65954 and previous config saved to /var/cache/conftool/dbconfig/20240708-143716-arnaudb.json
14:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1180.eqiad.wmnet with reason: Maintenance
14:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1180.eqiad.wmnet with reason: Maintenance
14:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367781)', diff saved to https://phabricator.wikimedia.org/P65953 and previous config saved to /var/cache/conftool/dbconfig/20240708-143654-arnaudb.json
14:34 root@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1011.eqiad.wmnet
14:31 root@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1011.eqiad.wmnet
14:27 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
14:27 bking@cumin2002: START - Cookbook sre.wdqs.reboot
14:23 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
14:22 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
14:22 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
14:21 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
14:21 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
14:21 bking@cumin2002: START - Cookbook sre.wdqs.reboot
14:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P65952 and previous config saved to /var/cache/conftool/dbconfig/20240708-142147-arnaudb.json
14:21 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
14:21 bking@cumin2002: START - Cookbook sre.wdqs.reboot
14:20 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
14:20 bking@cumin2002: START - Cookbook sre.wdqs.reboot
14:20 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
14:20 bking@cumin2002: START - Cookbook sre.wdqs.reboot
14:20 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
14:20 bking@cumin2002: START - Cookbook sre.wdqs.reboot
14:18 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
14:17 bking@cumin2002: START - Cookbook sre.wdqs.reboot
14:17 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
14:17 bking@cumin2002: START - Cookbook sre.wdqs.reboot
14:17 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3068.esams.wmnet
14:16 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3076.esams.wmnet
14:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
14:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
14:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367856)', diff saved to https://phabricator.wikimedia.org/P65951 and previous config saved to /var/cache/conftool/dbconfig/20240708-141432-marostegui.json
14:13 claime: cleaning up old shellbox files on mw1446
14:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P65950 and previous config saved to /var/cache/conftool/dbconfig/20240708-140640-arnaudb.json
13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P65949 and previous config saved to /var/cache/conftool/dbconfig/20240708-135925-marostegui.json
13:58 urbanecm@deploy1002: Finished scap: Backport for lib: Update metrics-platform to 84ed8dcbe7c9 (duration: 10m 36s)
13:53 urbanecm@deploy1002: phuedx, urbanecm: Continuing with sync
13:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367781)', diff saved to https://phabricator.wikimedia.org/P65948 and previous config saved to /var/cache/conftool/dbconfig/20240708-135132-arnaudb.json
13:50 urbanecm@deploy1002: phuedx, urbanecm: Backport for lib: Update metrics-platform to 84ed8dcbe7c9 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T367781)', diff saved to https://phabricator.wikimedia.org/P65947 and previous config saved to /var/cache/conftool/dbconfig/20240708-135024-arnaudb.json
13:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1168.eqiad.wmnet with reason: Maintenance
13:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1168.eqiad.wmnet with reason: Maintenance
13:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367781)', diff saved to https://phabricator.wikimedia.org/P65946 and previous config saved to /var/cache/conftool/dbconfig/20240708-135002-arnaudb.json
13:48 urbanecm@deploy1002: Started scap sync-world: Backport for lib: Update metrics-platform to 84ed8dcbe7c9
13:47 urbanecm@deploy1002: Finished scap: Backport for EventStreamConfig: Add hive ingestion defaults (T367134), [wikifunctionswiki] Disable MobileFrontend in production (T349408) (duration: 30m 38s)
13:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P65945 and previous config saved to /var/cache/conftool/dbconfig/20240708-134418-marostegui.json
13:42 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T366555
13:39 urbanecm@deploy1002: tchin, jforrester, urbanecm: Continuing with sync
13:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P65944 and previous config saved to /var/cache/conftool/dbconfig/20240708-133456-arnaudb.json
13:32 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T366555
13:32 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T366555
13:32 urbanecm@deploy1002: tchin, jforrester, urbanecm: Backport for EventStreamConfig: Add hive ingestion defaults (T367134), [wikifunctionswiki] Disable MobileFrontend in production (T349408) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:31 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T366555
13:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367856)', diff saved to https://phabricator.wikimedia.org/P65943 and previous config saved to /var/cache/conftool/dbconfig/20240708-132911-marostegui.json
13:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P65942 and previous config saved to /var/cache/conftool/dbconfig/20240708-131948-arnaudb.json
13:17 urbanecm@deploy1002: Started scap sync-world: Backport for EventStreamConfig: Add hive ingestion defaults (T367134), [wikifunctionswiki] Disable MobileFrontend in production (T349408)
13:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367781)', diff saved to https://phabricator.wikimedia.org/P65941 and previous config saved to /var/cache/conftool/dbconfig/20240708-130441-arnaudb.json
13:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T367781)', diff saved to https://phabricator.wikimedia.org/P65940 and previous config saved to /var/cache/conftool/dbconfig/20240708-130333-arnaudb.json
13:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
13:03 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
13:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1165.eqiad.wmnet with reason: Maintenance
13:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1165.eqiad.wmnet with reason: Maintenance
12:51 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:51 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1002.eqiad.wmnet with OS bookworm
12:51 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:48 vgutierrez: test bwlimit per url on cp4051 - T317799
12:43 marostegui@cumin1002: dbctl commit (dc=all): 'Pool with small weight T365805', diff saved to https://phabricator.wikimedia.org/P65939 and previous config saved to /var/cache/conftool/dbconfig/20240708-124310-marostegui.json
12:36 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3067.esams.wmnet
12:36 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3075.esams.wmnet
12:35 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
12:32 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
12:27 btullis@deploy1002: Finished deploy [airflow-dags/analytics@a2faba7]: (no justification provided) (duration: 00m 27s)
12:27 btullis@deploy1002: Started deploy [airflow-dags/analytics@a2faba7]: (no justification provided)
12:19 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-mariadb1002.eqiad.wmnet with OS bookworm
11:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65938 and previous config saved to /var/cache/conftool/dbconfig/20240708-115422-root.json
11:47 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 262476
11:47 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 262476
11:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65937 and previous config saved to /var/cache/conftool/dbconfig/20240708-113917-root.json
11:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
11:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
11:27 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
11:26 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
11:26 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
11:25 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
11:25 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
11:25 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
11:24 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
11:24 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
11:24 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
11:24 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
11:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65936 and previous config saved to /var/cache/conftool/dbconfig/20240708-112411-root.json
11:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65935 and previous config saved to /var/cache/conftool/dbconfig/20240708-110905-root.json
10:55 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3066.esams.wmnet
10:55 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3074.esams.wmnet
10:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65934 and previous config saved to /var/cache/conftool/dbconfig/20240708-105400-root.json
10:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T367856)', diff saved to https://phabricator.wikimedia.org/P65933 and previous config saved to /var/cache/conftool/dbconfig/20240708-105348-marostegui.json
10:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
10:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
10:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367856)', diff saved to https://phabricator.wikimedia.org/P65932 and previous config saved to /var/cache/conftool/dbconfig/20240708-105325-marostegui.json
10:45 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_esams
10:45 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_esams
10:45 fabfur: rebooting A:cp-esams (T366555)
10:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 270359
10:43 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 270359
10:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268248
10:43 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 268248
10:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262476
10:42 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 262476
10:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 272432
10:41 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 272432
10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65931 and previous config saved to /var/cache/conftool/dbconfig/20240708-103854-root.json
10:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P65930 and previous config saved to /var/cache/conftool/dbconfig/20240708-103818-marostegui.json
10:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65929 and previous config saved to /var/cache/conftool/dbconfig/20240708-102347-root.json
10:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P65928 and previous config saved to /var/cache/conftool/dbconfig/20240708-102311-marostegui.json
10:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367856)', diff saved to https://phabricator.wikimedia.org/P65927 and previous config saved to /var/cache/conftool/dbconfig/20240708-100804-marostegui.json
10:06 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
10:02 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
10:00 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:00 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:58 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
09:55 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
09:50 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: sync
09:50 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: sync
09:49 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
09:49 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
09:44 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: sync
09:44 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: sync
09:41 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
09:41 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
09:38 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: sync
09:38 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: sync
09:32 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: sync
09:32 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: sync
09:31 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: sync
09:31 elukey@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: sync
09:17 arturo: aborrero@apt1002:~$ sudo -i reprepro --component thirdparty/k9s includedeb bookworm-wikimedia /home/aborrero/k9s_linux_amd64.deb (T366061)
08:59 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
08:56 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
08:51 Dreamy_Jazz: Running `foreachwikiindblist group1.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200` in a tmux session
08:50 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
08:42 arturo: update packages for thirdparty/kubeadm-k8s-1-25 bookworm-wikimedia in apt1002 (T369163)
08:26 godog: re-enable business hours americas oncall - T369122
07:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 270052
07:01 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 270052
06:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52455
06:16 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 52455
06:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 137409
06:14 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 137409
06:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 27768
06:13 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 27768
06:11 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61512
06:09 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61512
06:08 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 269783
06:08 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 269783
06:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52320
06:05 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 52320
06:05 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7738
06:04 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 7738
06:04 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52468
06:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 52468
06:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 270052
06:01 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 270052
05:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28008
05:59 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 28008
05:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 17072
05:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 17072
05:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 263522
05:38 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 263522
05:38 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61942
05:38 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61942
05:38 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 18013
05:37 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 18013
05:37 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268248
05:37 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 268248
05:36 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61672
05:36 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61672
05:36 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28352
05:36 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 28352
05:36 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 999
05:36 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 999
05:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4788
05:34 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 4788
05:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 132167
05:33 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 132167
05:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6447
05:32 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 6447
05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T367856)', diff saved to https://phabricator.wikimedia.org/P65926 and previous config saved to /var/cache/conftool/dbconfig/20240708-053133-marostegui.json
05:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
05:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367856)', diff saved to https://phabricator.wikimedia.org/P65925 and previous config saved to /var/cache/conftool/dbconfig/20240708-053122-marostegui.json
05:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28306
05:29 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 28306
05:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: Long schema change
05:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: Long schema change
05:24 marostegui: Deploy schema change on s5 codfw db2213 dbmaint T367856
05:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2213 T369478', diff saved to https://phabricator.wikimedia.org/P65923 and previous config saved to /var/cache/conftool/dbconfig/20240708-051935-root.json
05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2123 to s5 primary T369478', diff saved to https://phabricator.wikimedia.org/P65922 and previous config saved to /var/cache/conftool/dbconfig/20240708-051840-root.json
05:18 marostegui: Starting s5 codfw failover from db2213 to db2123 - T369478
05:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P65921 and previous config saved to /var/cache/conftool/dbconfig/20240708-051615-marostegui.json
05:16 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2123 from dump/slow', diff saved to https://phabricator.wikimedia.org/P65920 and previous config saved to /var/cache/conftool/dbconfig/20240708-051605-marostegui.json
05:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T369478
05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2123 with weight 0 T369478', diff saved to https://phabricator.wikimedia.org/P65919 and previous config saved to /var/cache/conftool/dbconfig/20240708-050301-root.json
05:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T369478
04:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P65918 and previous config saved to /var/cache/conftool/dbconfig/20240708-045246-marostegui.json
04:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367856)', diff saved to https://phabricator.wikimedia.org/P65917 and previous config saved to /var/cache/conftool/dbconfig/20240708-043738-marostegui.json
01:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T367856)', diff saved to https://phabricator.wikimedia.org/P65916 and previous config saved to /var/cache/conftool/dbconfig/20240708-014044-marostegui.json
01:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
01:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
01:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367856)', diff saved to https://phabricator.wikimedia.org/P65915 and previous config saved to /var/cache/conftool/dbconfig/20240708-014022-marostegui.json
01:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P65914 and previous config saved to /var/cache/conftool/dbconfig/20240708-012515-marostegui.json
01:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P65913 and previous config saved to /var/cache/conftool/dbconfig/20240708-011008-marostegui.json
00:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367856)', diff saved to https://phabricator.wikimedia.org/P65912 and previous config saved to /var/cache/conftool/dbconfig/20240708-005501-marostegui.json

2024-07-07

21:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T367856)', diff saved to https://phabricator.wikimedia.org/P65911 and previous config saved to /var/cache/conftool/dbconfig/20240707-215014-marostegui.json
21:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
21:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
21:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367856)', diff saved to https://phabricator.wikimedia.org/P65910 and previous config saved to /var/cache/conftool/dbconfig/20240707-214952-marostegui.json
21:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P65909 and previous config saved to /var/cache/conftool/dbconfig/20240707-213445-marostegui.json
21:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P65908 and previous config saved to /var/cache/conftool/dbconfig/20240707-211938-marostegui.json
21:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367856)', diff saved to https://phabricator.wikimedia.org/P65907 and previous config saved to /var/cache/conftool/dbconfig/20240707-210430-marostegui.json
15:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T367856)', diff saved to https://phabricator.wikimedia.org/P65906 and previous config saved to /var/cache/conftool/dbconfig/20240707-154059-marostegui.json
15:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
15:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
15:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
15:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance

2024-07-06

18:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367856)', diff saved to https://phabricator.wikimedia.org/P65905 and previous config saved to /var/cache/conftool/dbconfig/20240706-182625-marostegui.json
18:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P65904 and previous config saved to /var/cache/conftool/dbconfig/20240706-181117-marostegui.json
17:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P65903 and previous config saved to /var/cache/conftool/dbconfig/20240706-175610-marostegui.json
17:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367856)', diff saved to https://phabricator.wikimedia.org/P65902 and previous config saved to /var/cache/conftool/dbconfig/20240706-174103-marostegui.json
17:21 hnowlan@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
17:18 hnowlan@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T367856)', diff saved to https://phabricator.wikimedia.org/P65901 and previous config saved to /var/cache/conftool/dbconfig/20240706-124535-marostegui.json
12:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: Maintenance
12:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: Maintenance
07:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2201.codfw.wmnet with reason: Maintenance
07:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2201.codfw.wmnet with reason: Maintenance
07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367856)', diff saved to https://phabricator.wikimedia.org/P65900 and previous config saved to /var/cache/conftool/dbconfig/20240706-075448-marostegui.json
07:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P65899 and previous config saved to /var/cache/conftool/dbconfig/20240706-073941-marostegui.json
07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P65898 and previous config saved to /var/cache/conftool/dbconfig/20240706-072434-marostegui.json
07:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367856)', diff saved to https://phabricator.wikimedia.org/P65897 and previous config saved to /var/cache/conftool/dbconfig/20240706-070927-marostegui.json
04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T367856)', diff saved to https://phabricator.wikimedia.org/P65896 and previous config saved to /var/cache/conftool/dbconfig/20240706-043535-marostegui.json
04:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
04:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367856)', diff saved to https://phabricator.wikimedia.org/P65895 and previous config saved to /var/cache/conftool/dbconfig/20240706-043513-marostegui.json
04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P65894 and previous config saved to /var/cache/conftool/dbconfig/20240706-042006-marostegui.json
04:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P65893 and previous config saved to /var/cache/conftool/dbconfig/20240706-040459-marostegui.json
03:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367856)', diff saved to https://phabricator.wikimedia.org/P65892 and previous config saved to /var/cache/conftool/dbconfig/20240706-034952-marostegui.json
00:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T367856)', diff saved to https://phabricator.wikimedia.org/P65891 and previous config saved to /var/cache/conftool/dbconfig/20240706-005648-marostegui.json
00:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
00:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
00:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367856)', diff saved to https://phabricator.wikimedia.org/P65890 and previous config saved to /var/cache/conftool/dbconfig/20240706-005626-marostegui.json
00:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P65889 and previous config saved to /var/cache/conftool/dbconfig/20240706-004119-marostegui.json
00:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P65888 and previous config saved to /var/cache/conftool/dbconfig/20240706-002612-marostegui.json
00:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367856)', diff saved to https://phabricator.wikimedia.org/P65887 and previous config saved to /var/cache/conftool/dbconfig/20240706-001105-marostegui.json

2024-07-05

20:05 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
20:04 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
18:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T367856)', diff saved to https://phabricator.wikimedia.org/P65886 and previous config saved to /var/cache/conftool/dbconfig/20240705-185604-marostegui.json
18:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
18:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
18:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367856)', diff saved to https://phabricator.wikimedia.org/P65885 and previous config saved to /var/cache/conftool/dbconfig/20240705-185542-marostegui.json
18:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P65884 and previous config saved to /var/cache/conftool/dbconfig/20240705-184034-marostegui.json
18:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65883 and previous config saved to /var/cache/conftool/dbconfig/20240705-183428-root.json
18:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P65882 and previous config saved to /var/cache/conftool/dbconfig/20240705-182527-marostegui.json
18:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65881 and previous config saved to /var/cache/conftool/dbconfig/20240705-181923-root.json
18:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367856)', diff saved to https://phabricator.wikimedia.org/P65880 and previous config saved to /var/cache/conftool/dbconfig/20240705-181020-marostegui.json
18:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65879 and previous config saved to /var/cache/conftool/dbconfig/20240705-180417-root.json
17:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P65878 and previous config saved to /var/cache/conftool/dbconfig/20240705-175653-ladsgroup.json
17:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65877 and previous config saved to /var/cache/conftool/dbconfig/20240705-174912-root.json
17:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P65876 and previous config saved to /var/cache/conftool/dbconfig/20240705-174146-ladsgroup.json
17:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65875 and previous config saved to /var/cache/conftool/dbconfig/20240705-173406-root.json
17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P65874 and previous config saved to /var/cache/conftool/dbconfig/20240705-172639-ladsgroup.json
17:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65873 and previous config saved to /var/cache/conftool/dbconfig/20240705-171901-root.json
17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P65872 and previous config saved to /var/cache/conftool/dbconfig/20240705-171131-ladsgroup.json
17:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65871 and previous config saved to /var/cache/conftool/dbconfig/20240705-170356-root.json
17:00 logmsgbot: andrewtavis-wmde@deploy1002 Finished deploy [airflow-dags/wmde@73c6618]: (no justification provided) (duration: 00m 06s)
17:00 logmsgbot: andrewtavis-wmde@deploy1002 Started deploy [airflow-dags/wmde@73c6618]: (no justification provided)
13:40 hashar@deploy1002: Finished deploy [integration/docroot@18c8279]: Add AQS documentation to landing page - T368484 (duration: 00m 06s)
13:40 hashar@deploy1002: Started deploy [integration/docroot@18c8279]: Add AQS documentation to landing page - T368484
12:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1246.eqiad.wmnet with reason: Long schema change
12:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1246.eqiad.wmnet with reason: Long schema change
12:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T367856)', diff saved to https://phabricator.wikimedia.org/P65869 and previous config saved to /var/cache/conftool/dbconfig/20240705-125152-marostegui.json
12:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
12:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
12:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367856)', diff saved to https://phabricator.wikimedia.org/P65868 and previous config saved to /var/cache/conftool/dbconfig/20240705-125130-marostegui.json
12:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P65867 and previous config saved to /var/cache/conftool/dbconfig/20240705-123623-marostegui.json
12:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P65866 and previous config saved to /var/cache/conftool/dbconfig/20240705-122115-marostegui.json
12:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367856)', diff saved to https://phabricator.wikimedia.org/P65865 and previous config saved to /var/cache/conftool/dbconfig/20240705-120608-marostegui.json
11:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P65864 and previous config saved to /var/cache/conftool/dbconfig/20240705-115703-ladsgroup.json
11:53 dcausse: T369149: re-indexed wikidata P12861 (cirrus_rerender.rerender --wiki wikidatawiki allpages --namespace 120 --from-title P12861 --to-title P12861)
11:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P65863 and previous config saved to /var/cache/conftool/dbconfig/20240705-114157-ladsgroup.json
11:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
11:29 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
11:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P65862 and previous config saved to /var/cache/conftool/dbconfig/20240705-112652-ladsgroup.json
11:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P65861 and previous config saved to /var/cache/conftool/dbconfig/20240705-111322-ladsgroup.json
11:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
11:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P65860 and previous config saved to /var/cache/conftool/dbconfig/20240705-111146-ladsgroup.json
10:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
10:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
10:41 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Define custom search-index-data-formatter-callback (T369149), Try looking up search index data formatters by data type (T369149) (duration: 21m 22s)
10:36 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
10:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Define custom search-index-data-formatter-callback (T369149), Try looking up search index data formatters by data type (T369149) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Define custom search-index-data-formatter-callback (T369149), Try looking up search index data formatters by data type (T369149)
10:11 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:10 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:35 fabfur: running puppet on A:cp to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1052271 (T369345)
09:26 XioNoX: netbox-dev2003: move from netbox-dev to netbox-next - T336275
08:55 godog: silence NELNotReported NELByCountryNotReported until Tues - T369345
08:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T367856)', diff saved to https://phabricator.wikimedia.org/P65858 and previous config saved to /var/cache/conftool/dbconfig/20240705-085406-marostegui.json
08:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
08:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
08:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
08:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
08:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T367856)', diff saved to https://phabricator.wikimedia.org/P65857 and previous config saved to /var/cache/conftool/dbconfig/20240705-085329-marostegui.json
08:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P65856 and previous config saved to /var/cache/conftool/dbconfig/20240705-083821-marostegui.json
08:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P65855 and previous config saved to /var/cache/conftool/dbconfig/20240705-082314-marostegui.json
08:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T367856)', diff saved to https://phabricator.wikimedia.org/P65854 and previous config saved to /var/cache/conftool/dbconfig/20240705-080807-marostegui.json
08:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
08:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
07:50 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
07:50 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
07:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
07:47 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
07:44 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
07:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
05:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
05:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T364069)', diff saved to https://phabricator.wikimedia.org/P65852 and previous config saved to /var/cache/conftool/dbconfig/20240705-051202-marostegui.json
05:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136', diff saved to https://phabricator.wikimedia.org/P65851 and previous config saved to /var/cache/conftool/dbconfig/20240705-050028-root.json
04:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P65850 and previous config saved to /var/cache/conftool/dbconfig/20240705-045655-marostegui.json
04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2123 (T367856)', diff saved to https://phabricator.wikimedia.org/P65849 and previous config saved to /var/cache/conftool/dbconfig/20240705-045145-marostegui.json
04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T367856)', diff saved to https://phabricator.wikimedia.org/P65848 and previous config saved to /var/cache/conftool/dbconfig/20240705-044912-marostegui.json
04:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
04:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
04:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P65847 and previous config saved to /var/cache/conftool/dbconfig/20240705-044148-marostegui.json
04:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T364069)', diff saved to https://phabricator.wikimedia.org/P65846 and previous config saved to /var/cache/conftool/dbconfig/20240705-042641-marostegui.json
01:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T364069)', diff saved to https://phabricator.wikimedia.org/P65845 and previous config saved to /var/cache/conftool/dbconfig/20240705-013250-marostegui.json
01:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
01:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
01:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T364069)', diff saved to https://phabricator.wikimedia.org/P65844 and previous config saved to /var/cache/conftool/dbconfig/20240705-013229-marostegui.json
01:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P65843 and previous config saved to /var/cache/conftool/dbconfig/20240705-011721-marostegui.json
01:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P65842 and previous config saved to /var/cache/conftool/dbconfig/20240705-010214-marostegui.json
00:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T364069)', diff saved to https://phabricator.wikimedia.org/P65841 and previous config saved to /var/cache/conftool/dbconfig/20240705-004707-marostegui.json

2024-07-04

22:04 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
22:03 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
22:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T364069)', diff saved to https://phabricator.wikimedia.org/P65840 and previous config saved to /var/cache/conftool/dbconfig/20240704-220227-marostegui.json
22:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
22:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
22:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T364069)', diff saved to https://phabricator.wikimedia.org/P65839 and previous config saved to /var/cache/conftool/dbconfig/20240704-220205-marostegui.json
22:01 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
22:00 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
21:59 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
21:59 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
21:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P65838 and previous config saved to /var/cache/conftool/dbconfig/20240704-214658-marostegui.json
21:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P65837 and previous config saved to /var/cache/conftool/dbconfig/20240704-213151-marostegui.json
21:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T364069)', diff saved to https://phabricator.wikimedia.org/P65836 and previous config saved to /var/cache/conftool/dbconfig/20240704-211644-marostegui.json
20:17 jdrewniak@deploy1002: Finished scap: Backport for [July 4th] Reduce list of exclusions for dark mode (1.43.0-wmf.12), Remove modifications of wgCheckUserLogAdditionalRights (T346022), Add editcontentmodel to interface-admin for French Wikipedia (T369113) (duration: 12m 14s)
20:12 jdrewniak@deploy1002: jdlrobson, nmw03, jdrewniak, dreamyjazz: Continuing with sync
20:08 jdrewniak@deploy1002: jdlrobson, nmw03, jdrewniak, dreamyjazz: Backport for [July 4th] Reduce list of exclusions for dark mode (1.43.0-wmf.12), Remove modifications of wgCheckUserLogAdditionalRights (T346022), Add editcontentmodel to interface-admin for French Wikipedia (T369113) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:05 jdrewniak@deploy1002: Started scap sync-world: Backport for [July 4th] Reduce list of exclusions for dark mode (1.43.0-wmf.12), Remove modifications of wgCheckUserLogAdditionalRights (T346022), Add editcontentmodel to interface-admin for French Wikipedia (T369113)
19:57 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_eqiad
19:55 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_eqiad
18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T364069)', diff saved to https://phabricator.wikimedia.org/P65835 and previous config saved to /var/cache/conftool/dbconfig/20240704-182308-marostegui.json
18:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
18:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
18:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T364069)', diff saved to https://phabricator.wikimedia.org/P65834 and previous config saved to /var/cache/conftool/dbconfig/20240704-182257-marostegui.json
18:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P65833 and previous config saved to /var/cache/conftool/dbconfig/20240704-180749-marostegui.json
17:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P65832 and previous config saved to /var/cache/conftool/dbconfig/20240704-175242-marostegui.json
17:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T364069)', diff saved to https://phabricator.wikimedia.org/P65831 and previous config saved to /var/cache/conftool/dbconfig/20240704-173735-marostegui.json
17:10 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1078.eqiad.wmnet
16:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:15 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1078.eqiad.wmnet
16:14 btullis@cumin1002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
16:14 btullis@cumin1002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
16:06 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
15:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
15:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
15:02 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
15:02 elukey@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
14:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T364069)', diff saved to https://phabricator.wikimedia.org/P65830 and previous config saved to /var/cache/conftool/dbconfig/20240704-143350-marostegui.json
14:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
14:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
14:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T364069)', diff saved to https://phabricator.wikimedia.org/P65829 and previous config saved to /var/cache/conftool/dbconfig/20240704-143327-marostegui.json
14:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P65827 and previous config saved to /var/cache/conftool/dbconfig/20240704-141820-marostegui.json
14:03 Lucas_WMDE: UTC afternoon backport+config window done
14:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P65826 and previous config saved to /var/cache/conftool/dbconfig/20240704-140313-marostegui.json
14:01 claime: Enabling and running puppet on P:trafficserver::backend to merge 1050293 - T367949
14:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65825 and previous config saved to /var/cache/conftool/dbconfig/20240704-140145-root.json
13:57 claime: Enabling puppet on cp4037.ulsfo.wmnet to test 1050293 - T367949
13:53 claime: disabling puppet on P:trafficserver::backend to merge 1049507 - T367949
13:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T364069)', diff saved to https://phabricator.wikimedia.org/P65824 and previous config saved to /var/cache/conftool/dbconfig/20240704-134806-marostegui.json
13:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65823 and previous config saved to /var/cache/conftool/dbconfig/20240704-134656-root.json
13:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65822 and previous config saved to /var/cache/conftool/dbconfig/20240704-134639-root.json
13:44 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Remove "Create a book" link from sidebar on German Wikipedia (T368900) (duration: 08m 35s)
13:41 claime: Enabling and running puppet on P:trafficserver::backend to merge 1050293 - T367949
13:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
13:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
13:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T367856)', diff saved to https://phabricator.wikimedia.org/P65821 and previous config saved to /var/cache/conftool/dbconfig/20240704-134105-marostegui.json
13:39 logmsgbot: lucaswerkmeister-wmde@deploy1002 dreamrimmer, lucaswerkmeister-wmde: Continuing with sync
13:38 logmsgbot: lucaswerkmeister-wmde@deploy1002 dreamrimmer, lucaswerkmeister-wmde: Backport for Remove "Create a book" link from sidebar on German Wikipedia (T368900) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:36 claime: Enabling puppet on cp6016.drmrs.wmnet to test 1050293 - T367949
13:36 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Remove "Create a book" link from sidebar on German Wikipedia (T368900)
13:32 claime: disabling puppet on P:trafficserver::backend to merge 1050293 - T367949
13:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65820 and previous config saved to /var/cache/conftool/dbconfig/20240704-133150-root.json
13:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65819 and previous config saved to /var/cache/conftool/dbconfig/20240704-133133-root.json
13:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P65818 and previous config saved to /var/cache/conftool/dbconfig/20240704-132558-marostegui.json
13:20 logmsgbot: andrewtavis-wmde@deploy1002 Finished deploy [airflow-dags/wmde@d773cac]: (no justification provided) (duration: 00m 03s)
13:20 logmsgbot: andrewtavis-wmde@deploy1002 Started deploy [airflow-dags/wmde@d773cac]: (no justification provided)
13:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65817 and previous config saved to /var/cache/conftool/dbconfig/20240704-131643-root.json
13:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65816 and previous config saved to /var/cache/conftool/dbconfig/20240704-131628-root.json
13:11 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
13:11 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P65815 and previous config saved to /var/cache/conftool/dbconfig/20240704-131050-marostegui.json
13:09 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
13:09 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
13:08 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
13:07 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
13:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65814 and previous config saved to /var/cache/conftool/dbconfig/20240704-130137-root.json
13:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65813 and previous config saved to /var/cache/conftool/dbconfig/20240704-130122-root.json
12:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T367856)', diff saved to https://phabricator.wikimedia.org/P65812 and previous config saved to /var/cache/conftool/dbconfig/20240704-125543-marostegui.json
12:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65811 and previous config saved to /var/cache/conftool/dbconfig/20240704-124632-root.json
12:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65810 and previous config saved to /var/cache/conftool/dbconfig/20240704-124617-root.json
12:36 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.12 refs T366957
12:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65808 and previous config saved to /var/cache/conftool/dbconfig/20240704-123127-root.json
12:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65807 and previous config saved to /var/cache/conftool/dbconfig/20240704-123111-root.json
12:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1213', diff saved to https://phabricator.wikimedia.org/P65806 and previous config saved to /var/cache/conftool/dbconfig/20240704-122752-root.json
12:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65805 and previous config saved to /var/cache/conftool/dbconfig/20240704-121631-root.json
12:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65804 and previous config saved to /var/cache/conftool/dbconfig/20240704-121621-root.json
12:11 hashar@deploy1002: Finished scap: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260) (duration: 07m 45s)
12:06 hashar@deploy1002: hashar, d3r1ck01: Continuing with sync
12:06 hashar@deploy1002: hashar, d3r1ck01: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:03 hashar@deploy1002: Started scap sync-world: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260)
12:02 hashar@deploy1002: Sync cancelled.
12:02 hashar@deploy1002: hashar, d3r1ck01: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:56 hashar@deploy1002: Started scap sync-world: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260)
11:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T367856)', diff saved to https://phabricator.wikimedia.org/P65803 and previous config saved to /var/cache/conftool/dbconfig/20240704-115522-marostegui.json
11:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
11:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
11:54 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1185.eqiad.wmnet onto db1213.eqiad.wmnet
11:46 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
11:45 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
11:40 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
11:39 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
11:14 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1185.eqiad.wmnet onto db1213.eqiad.wmnet
11:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1213 db1185 T369250', diff saved to https://phabricator.wikimedia.org/P65802 and previous config saved to /var/cache/conftool/dbconfig/20240704-111324-root.json
10:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T364069)', diff saved to https://phabricator.wikimedia.org/P65801 and previous config saved to /var/cache/conftool/dbconfig/20240704-105205-marostegui.json
10:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
10:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
10:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T364069)', diff saved to https://phabricator.wikimedia.org/P65800 and previous config saved to /var/cache/conftool/dbconfig/20240704-105143-marostegui.json
10:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P65799 and previous config saved to /var/cache/conftool/dbconfig/20240704-103636-marostegui.json
10:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P65798 and previous config saved to /var/cache/conftool/dbconfig/20240704-102129-marostegui.json
10:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T364069)', diff saved to https://phabricator.wikimedia.org/P65797 and previous config saved to /var/cache/conftool/dbconfig/20240704-100622-marostegui.json
09:53 topranks: Pushing updated BGP policy to cr2-eqord in Chicago to re-announce codfw IP ranges there T367439
09:29 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1009.eqiad.wmnet
09:24 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1009.eqiad.wmnet
09:23 root@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1009.eqiad.wmnet with OS bullseye
09:23 claime: Manual cleanup of puppet certs for renamed servers mw1417.eqiad.wmnet mw1418.eqiad.wmnet mw2300.codfw.wmnet
09:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
09:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
09:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old sretest2005 IP - ayounsi@cumin1002"
09:16 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old sretest2005 IP - ayounsi@cumin1002"
09:13 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
09:05 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.43.0-wmf.12" - T366957
09:03 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1009.eqiad.wmnet with reason: host reimage
09:00 root@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1009.eqiad.wmnet with reason: host reimage
08:59 elukey: restart mcrouter on mwmaint1002
08:56 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
08:45 fabfur: enable puppet on A:cp-ulsfo (T365718)
08:45 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1009.eqiad.wmnet with OS bullseye
08:44 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
08:43 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
08:28 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
08:28 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.12 refs T366957
08:24 fabfur: temporarily disable puppet on A:cp-ulsfo to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1051198 (T365718)
08:10 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
08:02 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_eqiad
08:02 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_eqiad
08:01 fabfur: start rebooting A:cp-eqiad (upload|text in parallel) for T366555
07:52 root@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cloudcephosd1009.eqiad.wmnet
07:52 root@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1009.eqiad.wmnet
07:41 root@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1009.eqiad.wmnet
07:35 root@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1009.eqiad.wmnet
07:18 dcausse: closing the backport window
07:15 dcausse: refreshing the wikitech search indices
07:11 dcausse@deploy1002: Finished scap: Backport for cirrus: re-enable search updates on wikitech (duration: 08m 28s)
07:06 dcausse@deploy1002: dcausse: Continuing with sync
07:05 dcausse@deploy1002: dcausse: Backport for cirrus: re-enable search updates on wikitech synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:02 dcausse@deploy1002: Started scap sync-world: Backport for cirrus: re-enable search updates on wikitech
07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T364069)', diff saved to https://phabricator.wikimedia.org/P65794 and previous config saved to /var/cache/conftool/dbconfig/20240704-070100-marostegui.json
07:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
07:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
07:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T364069)', diff saved to https://phabricator.wikimedia.org/P65793 and previous config saved to /var/cache/conftool/dbconfig/20240704-070038-marostegui.json
06:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P65791 and previous config saved to /var/cache/conftool/dbconfig/20240704-063024-marostegui.json
06:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T364069)', diff saved to https://phabricator.wikimedia.org/P65790 and previous config saved to /var/cache/conftool/dbconfig/20240704-061517-marostegui.json
05:11 marostegui: Deploy schema change on db1231 s6 eqiad dbmaint T367856
05:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Long schema change
05:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Long schema change
05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1231 T369020', diff saved to https://phabricator.wikimedia.org/P65789 and previous config saved to /var/cache/conftool/dbconfig/20240704-050334-marostegui.json
05:02 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1173 to s6 primary and set section read-write T369020', diff saved to https://phabricator.wikimedia.org/P65788 and previous config saved to /var/cache/conftool/dbconfig/20240704-050237-marostegui.json
05:02 marostegui@cumin1002: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T369020', diff saved to https://phabricator.wikimedia.org/P65787 and previous config saved to /var/cache/conftool/dbconfig/20240704-050216-marostegui.json
05:01 marostegui: Starting s6 eqiad failover from db1231 to db1173 - T369020
04:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369020
04:44 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1173 with weight 0 T369020', diff saved to https://phabricator.wikimedia.org/P65786 and previous config saved to /var/cache/conftool/dbconfig/20240704-044429-marostegui.json
04:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369020
03:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T364069)', diff saved to https://phabricator.wikimedia.org/P65785 and previous config saved to /var/cache/conftool/dbconfig/20240704-031151-marostegui.json
03:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
03:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
03:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T364069)', diff saved to https://phabricator.wikimedia.org/P65784 and previous config saved to /var/cache/conftool/dbconfig/20240704-031129-marostegui.json
02:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P65783 and previous config saved to /var/cache/conftool/dbconfig/20240704-025622-marostegui.json
02:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop analytics cluster
02:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P65782 and previous config saved to /var/cache/conftool/dbconfig/20240704-024115-marostegui.json
02:33 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_drmrs
02:31 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_drmrs
02:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T364069)', diff saved to https://phabricator.wikimedia.org/P65781 and previous config saved to /var/cache/conftool/dbconfig/20240704-022608-marostegui.json
01:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
01:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
01:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T367856)', diff saved to https://phabricator.wikimedia.org/P65780 and previous config saved to /var/cache/conftool/dbconfig/20240704-014313-marostegui.json
01:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P65779 and previous config saved to /var/cache/conftool/dbconfig/20240704-012806-marostegui.json
01:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P65778 and previous config saved to /var/cache/conftool/dbconfig/20240704-011258-marostegui.json
00:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T367856)', diff saved to https://phabricator.wikimedia.org/P65777 and previous config saved to /var/cache/conftool/dbconfig/20240704-005750-marostegui.json
00:43 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parsoidtest1001.eqiad.wmnet with OS bullseye
00:43 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dzahn@cumin1002"
00:42 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dzahn@cumin1002"
00:29 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parsoidtest1001.eqiad.wmnet with reason: host reimage
00:25 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on parsoidtest1001.eqiad.wmnet with reason: host reimage
00:15 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye

2024-07-03

23:47 tzatziki: removing 11 files for legal compliance
23:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T364069)', diff saved to https://phabricator.wikimedia.org/P65776 and previous config saved to /var/cache/conftool/dbconfig/20240703-232302-marostegui.json
23:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
23:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
23:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
23:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
23:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65775 and previous config saved to /var/cache/conftool/dbconfig/20240703-232221-marostegui.json
23:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T352010)', diff saved to https://phabricator.wikimedia.org/P65774 and previous config saved to /var/cache/conftool/dbconfig/20240703-232154-ladsgroup.json
23:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P65773 and previous config saved to /var/cache/conftool/dbconfig/20240703-230713-marostegui.json
23:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P65772 and previous config saved to /var/cache/conftool/dbconfig/20240703-230646-ladsgroup.json
22:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P65771 and previous config saved to /var/cache/conftool/dbconfig/20240703-225206-marostegui.json
22:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P65770 and previous config saved to /var/cache/conftool/dbconfig/20240703-225139-ladsgroup.json
22:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65769 and previous config saved to /var/cache/conftool/dbconfig/20240703-223659-marostegui.json
22:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T352010)', diff saved to https://phabricator.wikimedia.org/P65768 and previous config saved to /var/cache/conftool/dbconfig/20240703-223632-ladsgroup.json
22:36 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye
21:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye
21:40 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
21:40 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
21:35 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
20:13 cjming: end of UTC late backport window
20:11 cjming@deploy1002: Finished scap: Backport for Remove QuickSurvey for Automoderator patroller workstream survey (T362969) (duration: 08m 22s)
20:10 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2002.codfw.wmnet with OS bookworm
20:06 cjming@deploy1002: kgraessle, cjming: Continuing with sync
20:05 cjming@deploy1002: kgraessle, cjming: Backport for Remove QuickSurvey for Automoderator patroller workstream survey (T362969) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:05 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
20:04 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
20:04 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
20:03 cjming@deploy1002: Started scap sync-world: Backport for Remove QuickSurvey for Automoderator patroller workstream survey (T362969)
19:56 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:55 cmooney@cumin1002: START - Cookbook sre.dns.netbox
19:54 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
19:49 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2002.codfw.wmnet with OS bookworm
19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65766 and previous config saved to /var/cache/conftool/dbconfig/20240703-194055-marostegui.json
19:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
19:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T364069)', diff saved to https://phabricator.wikimedia.org/P65765 and previous config saved to /var/cache/conftool/dbconfig/20240703-194033-marostegui.json
19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P65761 and previous config saved to /var/cache/conftool/dbconfig/20240703-192526-marostegui.json
19:25 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
19:24 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bookworm
19:19 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
19:16 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2002.codfw.wmnet with OS bookworm
19:12 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@d773cac]: (no justification provided) (duration: 00m 33s)
19:11 ebysans@deploy1002: Started deploy [airflow-dags/analytics@d773cac]: (no justification provided)
19:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P65760 and previous config saved to /var/cache/conftool/dbconfig/20240703-191019-marostegui.json
19:08 SandraEbele_: deploying airflow dags
18:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T364069)', diff saved to https://phabricator.wikimedia.org/P65759 and previous config saved to /var/cache/conftool/dbconfig/20240703-185511-marostegui.json
18:54 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
18:36 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
18:36 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
18:35 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
18:34 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
17:50 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
17:49 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
17:49 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
17:48 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
17:46 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
17:45 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
17:45 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
17:44 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
17:44 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
17:43 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
17:43 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
17:41 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
17:41 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
17:40 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
17:40 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
17:37 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
17:37 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
17:36 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
17:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65758 and previous config saved to /var/cache/conftool/dbconfig/20240703-173601-root.json
17:35 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
17:35 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
17:35 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
17:35 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
17:35 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
17:34 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
17:34 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
17:34 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
17:34 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
17:33 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
17:33 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
17:31 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
17:30 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
17:29 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
17:28 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
17:28 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
17:22 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
17:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65756 and previous config saved to /var/cache/conftool/dbconfig/20240703-172055-root.json
17:19 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
17:19 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
17:17 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
17:17 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
17:15 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
17:11 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
17:10 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
17:10 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
17:09 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
17:08 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
17:07 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
17:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65755 and previous config saved to /var/cache/conftool/dbconfig/20240703-170549-root.json
16:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65754 and previous config saved to /var/cache/conftool/dbconfig/20240703-165044-root.json
16:47 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-presto1004.eqiad.wmnet with reason: Cold booting to investigate RAM issue
16:46 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-presto1004.eqiad.wmnet with reason: Cold booting to investigate RAM issue
16:44 jhathaway: adding inbound email servers mx-in{1001,2001} to our MX record
16:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65752 and previous config saved to /var/cache/conftool/dbconfig/20240703-163538-root.json
16:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65751 and previous config saved to /var/cache/conftool/dbconfig/20240703-162032-root.json
16:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 1%: Repooling', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240703-160521-root.json
16:04 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
15:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T364069)', diff saved to https://phabricator.wikimedia.org/P65750 and previous config saved to /var/cache/conftool/dbconfig/20240703-154716-marostegui.json
15:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
15:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
15:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T364069)', diff saved to https://phabricator.wikimedia.org/P65749 and previous config saved to /var/cache/conftool/dbconfig/20240703-154643-marostegui.json
15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 100%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65748 and previous config saved to /var/cache/conftool/dbconfig/20240703-154142-arnaudb.json
15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65747 and previous config saved to /var/cache/conftool/dbconfig/20240703-154121-arnaudb.json
15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 100%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65746 and previous config saved to /var/cache/conftool/dbconfig/20240703-154109-arnaudb.json
15:32 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
15:31 sukhe: restart haproxy on dns1005
15:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P65744 and previous config saved to /var/cache/conftool/dbconfig/20240703-153136-marostegui.json
15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 75%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65743 and previous config saved to /var/cache/conftool/dbconfig/20240703-152636-arnaudb.json
15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65742 and previous config saved to /var/cache/conftool/dbconfig/20240703-152616-arnaudb.json
15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 75%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65741 and previous config saved to /var/cache/conftool/dbconfig/20240703-152603-arnaudb.json
15:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P65740 and previous config saved to /var/cache/conftool/dbconfig/20240703-151628-marostegui.json
15:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 208.80.152.129 v6 - ayounsi@cumin1002"
15:13 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 208.80.152.129 v6 - ayounsi@cumin1002"
15:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 50%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65739 and previous config saved to /var/cache/conftool/dbconfig/20240703-151131-arnaudb.json
15:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65738 and previous config saved to /var/cache/conftool/dbconfig/20240703-151110-arnaudb.json
15:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 50%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65737 and previous config saved to /var/cache/conftool/dbconfig/20240703-151057-arnaudb.json
15:10 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
15:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T367856)', diff saved to https://phabricator.wikimedia.org/P65736 and previous config saved to /var/cache/conftool/dbconfig/20240703-150411-marostegui.json
15:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
15:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
15:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T367856)', diff saved to https://phabricator.wikimedia.org/P65735 and previous config saved to /var/cache/conftool/dbconfig/20240703-150348-marostegui.json
15:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T364069)', diff saved to https://phabricator.wikimedia.org/P65734 and previous config saved to /var/cache/conftool/dbconfig/20240703-150121-marostegui.json
14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 25%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65733 and previous config saved to /var/cache/conftool/dbconfig/20240703-145625-arnaudb.json
14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65732 and previous config saved to /var/cache/conftool/dbconfig/20240703-145604-arnaudb.json
14:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 25%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65731 and previous config saved to /var/cache/conftool/dbconfig/20240703-145552-arnaudb.json
14:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye
14:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_drmrs
14:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_drmrs
14:51 fabfur: start rebooting A:cp-drmrs (upload|text in parallel) for T366555
14:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P65730 and previous config saved to /var/cache/conftool/dbconfig/20240703-144841-marostegui.json
14:46 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
14:45 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
14:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65729 and previous config saved to /var/cache/conftool/dbconfig/20240703-144119-arnaudb.json
14:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65728 and previous config saved to /var/cache/conftool/dbconfig/20240703-144059-arnaudb.json
14:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65727 and previous config saved to /var/cache/conftool/dbconfig/20240703-144046-arnaudb.json
14:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1006.eqiad.wmnet with OS bookworm
14:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1005.eqiad.wmnet with OS bookworm
14:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1004.eqiad.wmnet with OS bookworm
14:39 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
14:39 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
14:38 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
14:38 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
14:35 sukhe: [correction of previous A:dnsbox run] sudo cumin -b1 -s60 "A:dnsbox" "run-puppet-agent"
14:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P65726 and previous config saved to /var/cache/conftool/dbconfig/20240703-143334-marostegui.json
14:33 sukhe: sudo cumin "A:dnsbox" "run-puppet-agent"
14:32 sukhe: sudo cumin "A:wikidough" "run-puppet-agent"
14:32 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet
14:32 jayme@cumin1002: START - Cookbook sre.hosts.remove-downtime for kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet
14:30 jayme@cumin1002: conftool action : set/pooled=yes; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet)
14:27 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
14:27 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
14:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65725 and previous config saved to /var/cache/conftool/dbconfig/20240703-142614-arnaudb.json
14:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65724 and previous config saved to /var/cache/conftool/dbconfig/20240703-142553-arnaudb.json
14:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65723 and previous config saved to /var/cache/conftool/dbconfig/20240703-142541-arnaudb.json
14:25 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
14:21 jayme@cumin1002: conftool action : set/pooled=inactive; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet)
14:18 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
14:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T367856)', diff saved to https://phabricator.wikimedia.org/P65722 and previous config saved to /var/cache/conftool/dbconfig/20240703-141826-marostegui.json
14:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet with reason: T365994
14:17 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:45:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet with reason: T365994
14:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on db1154.eqiad.wmnet with reason: T365994
14:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:45:00 on db1154.eqiad.wmnet with reason: T365994
14:11 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
14:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye
14:09 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
14:09 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
14:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:08 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
14:07 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye
14:04 topranks: rebooting lsw1-e2-eqiad to install updated JunOS version T365994
14:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on 22 hosts with reason: JunOS upgrade lsw1-e2-eqiad
14:00 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on 22 hosts with reason: JunOS upgrade lsw1-e2-eqiad
13:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic[1091-1092].eqiad.wmnet,wdqs[1018,1020].eqiad.wmnet with reason: T348977
13:59 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on elastic[1091-1092].eqiad.wmnet,wdqs[1018,1020].eqiad.wmnet with reason: T348977
13:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e2-eqiad,lsw1-e2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e2-eqiad
13:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e2-eqiad,lsw1-e2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e2-eqiad
13:57 jayme@cumin1002: conftool action : set/pooled=no; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet)
13:56 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1091*,elastic1092* for T348977 - bking@cumin2002
13:56 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1091*,elastic1092* for T348977 - bking@cumin2002
13:56 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:20:00 on kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet with reason: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2
13:55 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 1:20:00 on kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet with reason: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2
13:53 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e2-eqiad
13:52 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e2-eqiad
13:48 Lucas_WMDE: UTC afternoon backport+config window done
13:48 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for noc: fail with a 404 when the selected wiki is nonexistent, CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup (duration: 08m 38s)
13:44 jayme: draining wikikube-worker1007.eqiad.wmnet wikikube-worker1021.eqiad.wmnet kubernetes1060.eqiad.wmnet for T365994
13:43 logmsgbot: lucaswerkmeister-wmde@deploy1002 dcausse, lucaswerkmeister-wmde: Continuing with sync
13:42 logmsgbot: lucaswerkmeister-wmde@deploy1002 dcausse, lucaswerkmeister-wmde: Backport for noc: fail with a 404 when the selected wiki is nonexistent, CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:39 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for noc: fail with a 404 when the selected wiki is nonexistent, CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup
13:38 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for GlobalRenameQueue: Fix issues with wiki ID and row query (T369147) (duration: 09m 28s)
13:33 logmsgbot: lucaswerkmeister-wmde@deploy1002 kharlan, lucaswerkmeister-wmde: Continuing with sync
13:31 logmsgbot: lucaswerkmeister-wmde@deploy1002 kharlan, lucaswerkmeister-wmde: Backport for GlobalRenameQueue: Fix issues with wiki ID and row query (T369147) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1006.eqiad.wmnet with OS bookworm
13:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1005.eqiad.wmnet with OS bookworm
13:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1004.eqiad.wmnet with OS bookworm
13:28 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for GlobalRenameQueue: Fix issues with wiki ID and row query (T369147)
13:25 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155) (duration: 08m 20s)
13:22 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye
13:20 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host parsoidtest1001
13:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
13:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:19 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host parsoidtest1001
13:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[1191,1196-1197].eqiad.wmnet with reason: T365994
13:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db[1191,1196-1197].eqiad.wmnet with reason: T365994
13:17 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 49.3.193.10.in-addr.arpa. on all recursors
13:17 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache 49.3.193.10.in-addr.arpa. on all recursors
13:17 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest2002.mgmt.codfw.wmnet on all recursors
13:17 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache sretest2002.mgmt.codfw.wmnet on all recursors
13:17 arnaudb@cumin1002: dbctl commit (dc=all): 'T365994 - depool db1191,db1196,db1197', diff saved to https://phabricator.wikimedia.org/P65721 and previous config saved to /var/cache/conftool/dbconfig/20240703-131715-arnaudb.json
13:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155)
13:16 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:16 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
13:15 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes kawikisource --fix # T363243; 34 pages to fix, 34 were resolvable; 774 links to fix, 774 were resolvable, 0 were deleted
13:15 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
13:14 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes mswikisource --fix # T369047; 6 pages to fix, 6 were resolvable; 76 links to fix, 73 were resolvable, 3 were deleted
13:13 cmooney@cumin1002: START - Cookbook sre.dns.netbox
13:12 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for mswikisource: create author and translation namespaces and add namespace aliases (T369047), kawikisource: create author namespace, add namespace aliases and sitename (T363243) (duration: 10m 39s)
13:07 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, anzx: Continuing with sync
13:04 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, anzx: Backport for mswikisource: create author and translation namespaces and add namespace aliases (T369047), kawikisource: create author namespace, add namespace aliases and sitename (T363243) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:01 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for mswikisource: create author and translation namespaces and add namespace aliases (T369047), kawikisource: create author namespace, add namespace aliases and sitename (T363243)
12:51 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
12:47 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
12:39 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
12:39 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
12:37 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
12:34 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
12:30 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
12:17 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
12:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P65720 and previous config saved to /var/cache/conftool/dbconfig/20240703-121009-ladsgroup.json
11:55 ladsgroup@deploy1002: Finished scap: Backport for rpc: Update function call in RunSingleJob (T363839) (duration: 08m 08s)
11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P65719 and previous config saved to /var/cache/conftool/dbconfig/20240703-115504-ladsgroup.json
11:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T364069)', diff saved to https://phabricator.wikimedia.org/P65718 and previous config saved to /var/cache/conftool/dbconfig/20240703-115211-marostegui.json
11:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
11:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
11:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T364069)', diff saved to https://phabricator.wikimedia.org/P65717 and previous config saved to /var/cache/conftool/dbconfig/20240703-115149-marostegui.json
11:50 ladsgroup@deploy1002: ladsgroup: Continuing with sync
11:49 ladsgroup@deploy1002: ladsgroup: Backport for rpc: Update function call in RunSingleJob (T363839) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:47 ladsgroup@deploy1002: Started scap sync-world: Backport for rpc: Update function call in RunSingleJob (T363839)
11:45 ladsgroup@deploy1002: Finished scap: Backport for Optimize static footer 'a Wikimedia project' icon further (T256190) (duration: 09m 28s)
11:40 ladsgroup@deploy1002: volker-e, ladsgroup: Continuing with sync
11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P65716 and previous config saved to /var/cache/conftool/dbconfig/20240703-113958-ladsgroup.json
11:39 ladsgroup@deploy1002: volker-e, ladsgroup: Backport for Optimize static footer 'a Wikimedia project' icon further (T256190) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P65715 and previous config saved to /var/cache/conftool/dbconfig/20240703-113642-marostegui.json
11:35 ladsgroup@deploy1002: Started scap sync-world: Backport for Optimize static footer 'a Wikimedia project' icon further (T256190)
11:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T352010)', diff saved to https://phabricator.wikimedia.org/P65714 and previous config saved to /var/cache/conftool/dbconfig/20240703-112728-ladsgroup.json
11:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
11:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
11:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P65713 and previous config saved to /var/cache/conftool/dbconfig/20240703-112452-ladsgroup.json
11:21 cgoubert@deploy1002: Finished scap: mw-on-k8s: Move php.envvars to mediawiki-common - T365265 (duration: 05m 22s)
11:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P65712 and previous config saved to /var/cache/conftool/dbconfig/20240703-112135-marostegui.json
11:16 cgoubert@deploy1002: Started scap sync-world: mw-on-k8s: Move php.envvars to mediawiki-common - T365265
11:15 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:15 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
11:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T364069)', diff saved to https://phabricator.wikimedia.org/P65711 and previous config saved to /var/cache/conftool/dbconfig/20240703-110627-marostegui.json
10:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T367856)', diff saved to https://phabricator.wikimedia.org/P65710 and previous config saved to /var/cache/conftool/dbconfig/20240703-103839-marostegui.json
10:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
10:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
10:33 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
10:32 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
10:32 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
09:49 logmsgbot: andrewtavis-wmde@deploy1002 Finished deploy [airflow-dags/wmde@d773cac]: (no justification provided) (duration: 00m 07s)
09:49 logmsgbot: andrewtavis-wmde@deploy1002 Started deploy [airflow-dags/wmde@d773cac]: (no justification provided)
09:31 mlitn@deploy1002: Finished scap: Backport for Handle campaigns where wikibase is not enabled (T369085) (duration: 12m 59s)
09:27 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "testvm2008 - ayounsi@cumin1002"
09:26 mlitn@deploy1002: mlitn: Continuing with sync
09:26 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "testvm2008 - ayounsi@cumin1002"
09:21 mlitn@deploy1002: mlitn: Backport for Handle campaigns where wikibase is not enabled (T369085) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:20 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
09:20 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
09:20 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
09:20 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2008.wikimedia.org
09:20 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2008.wikimedia.org with OS bookworm
09:20 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Give more weight to db2136 - running 10.11 T365805', diff saved to https://phabricator.wikimedia.org/P65709 and previous config saved to /var/cache/conftool/dbconfig/20240703-091956-marostegui.json
09:18 mlitn@deploy1002: Started scap sync-world: Backport for Handle campaigns where wikibase is not enabled (T369085)
09:09 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch2002.codfw.wmnet
09:06 topranks: merge host firewall changes to set default DSCP marking (T339850)
09:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2008.wikimedia.org with reason: host reimage
09:02 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2008.wikimedia.org with reason: host reimage
09:02 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-stretch2002.codfw.wmnet
09:01 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
09:01 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
09:00 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
09:00 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
09:00 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
08:59 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
08:58 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
08:58 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch2001.codfw.wmnet
08:57 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
08:53 jayme: deployed istio (adding securityContext) to wikikube clusters - T362978
08:51 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-stretch2001.codfw.wmnet
08:51 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1002.eqiad.wmnet
08:49 Lucas_WMDE: RELEASE_NAME=r72z2aop helmfile --file /srv/deployment-charts/helmfile.d/services/mw-script/helmfile.yaml --environment eqiad --selector name=r72z2aop destroy # clean up broken mwscript-k8s run I did just to test something
08:46 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2008.wikimedia.org with OS bookworm
08:45 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2008.wikimedia.org - ayounsi@cumin1002"
08:45 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2008.wikimedia.org - ayounsi@cumin1002"
08:44 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1002.eqiad.wmnet
08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2008.wikimedia.org on all recursors
08:44 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache testvm2008.wikimedia.org on all recursors
08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2008.wikimedia.org - ayounsi@cumin1002"
08:43 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2008.wikimedia.org - ayounsi@cumin1002"
08:43 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
08:42 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
08:42 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
08:42 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1001.eqiad.wmnet
08:41 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
08:41 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
08:41 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
08:41 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
08:41 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
08:41 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
08:40 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
08:40 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
08:40 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
08:40 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host testvm2008.wikimedia.org
08:40 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
08:40 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
08:40 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
08:40 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
08:40 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
08:40 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
08:39 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
08:39 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
08:39 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
08:39 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
08:38 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
08:35 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1001.eqiad.wmnet
08:31 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host karapace1002.eqiad.wmnet
08:22 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host karapace1002.eqiad.wmnet
08:18 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host karapace1001.eqiad.wmnet
08:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.12 refs T366957
08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Give more weight to db2136 - running 10.11 T365805', diff saved to https://phabricator.wikimedia.org/P65707 and previous config saved to /var/cache/conftool/dbconfig/20240703-081059-marostegui.json
08:09 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
08:09 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host karapace1001.eqiad.wmnet
08:09 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
07:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T364069)', diff saved to https://phabricator.wikimedia.org/P65706 and previous config saved to /var/cache/conftool/dbconfig/20240703-075245-marostegui.json
07:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
07:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
07:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T364069)', diff saved to https://phabricator.wikimedia.org/P65705 and previous config saved to /var/cache/conftool/dbconfig/20240703-074321-marostegui.json
07:36 kart_: Updated MinT to 2024-07-02-060114-production (T364525)
07:33 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
07:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P65704 and previous config saved to /var/cache/conftool/dbconfig/20240703-072814-marostegui.json
07:23 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
07:21 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
07:14 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
07:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P65702 and previous config saved to /var/cache/conftool/dbconfig/20240703-071306-marostegui.json
07:12 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
07:07 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
06:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T364069)', diff saved to https://phabricator.wikimedia.org/P65701 and previous config saved to /var/cache/conftool/dbconfig/20240703-065759-marostegui.json
06:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65700 and previous config saved to /var/cache/conftool/dbconfig/20240703-062057-root.json
06:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65699 and previous config saved to /var/cache/conftool/dbconfig/20240703-060552-root.json
05:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65698 and previous config saved to /var/cache/conftool/dbconfig/20240703-055046-root.json
05:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65697 and previous config saved to /var/cache/conftool/dbconfig/20240703-053541-root.json
05:23 marostegui: Deploy schema change on db2207 s2 codfw dbmaint T367856
05:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Long schema change
05:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Long schema change
05:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2207 T369130', diff saved to https://phabricator.wikimedia.org/P65696 and previous config saved to /var/cache/conftool/dbconfig/20240703-052118-root.json
05:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65695 and previous config saved to /var/cache/conftool/dbconfig/20240703-052035-root.json
05:20 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2204 to s2 primary T369130', diff saved to https://phabricator.wikimedia.org/P65694 and previous config saved to /var/cache/conftool/dbconfig/20240703-052029-root.json
05:20 marostegui: Starting s2 codfw failover from db2207 to db2204 - T369130
05:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369130
05:06 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2204 with weight 0 T369130', diff saved to https://phabricator.wikimedia.org/P65693 and previous config saved to /var/cache/conftool/dbconfig/20240703-050647-root.json
05:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369130
05:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65692 and previous config saved to /var/cache/conftool/dbconfig/20240703-050523-root.json
04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Pool with small weight T365805', diff saved to https://phabricator.wikimedia.org/P65691 and previous config saved to /var/cache/conftool/dbconfig/20240703-045109-marostegui.json
04:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65690 and previous config saved to /var/cache/conftool/dbconfig/20240703-045018-root.json
04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T364069)', diff saved to https://phabricator.wikimedia.org/P65689 and previous config saved to /var/cache/conftool/dbconfig/20240703-043335-marostegui.json
04:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
04:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65688 and previous config saved to /var/cache/conftool/dbconfig/20240703-043312-marostegui.json
04:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P65687 and previous config saved to /var/cache/conftool/dbconfig/20240703-041805-marostegui.json
04:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P65686 and previous config saved to /var/cache/conftool/dbconfig/20240703-040258-marostegui.json
03:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65685 and previous config saved to /var/cache/conftool/dbconfig/20240703-034751-marostegui.json
01:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65684 and previous config saved to /var/cache/conftool/dbconfig/20240703-011701-marostegui.json
01:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
01:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
00:48 eileen: civicrm upgraded from 6e03cff2 to 84d6f5d1
00:27 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_drmrs
00:16 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_drmrs
00:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
00:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
00:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T367856)', diff saved to https://phabricator.wikimedia.org/P65683 and previous config saved to /var/cache/conftool/dbconfig/20240703-000506-marostegui.json

2024-07-02

23:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P65682 and previous config saved to /var/cache/conftool/dbconfig/20240702-234959-marostegui.json
23:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P65681 and previous config saved to /var/cache/conftool/dbconfig/20240702-233452-marostegui.json
23:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T367856)', diff saved to https://phabricator.wikimedia.org/P65680 and previous config saved to /var/cache/conftool/dbconfig/20240702-231945-marostegui.json
22:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
22:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
22:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T364069)', diff saved to https://phabricator.wikimedia.org/P65679 and previous config saved to /var/cache/conftool/dbconfig/20240702-225835-marostegui.json
22:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P65678 and previous config saved to /var/cache/conftool/dbconfig/20240702-224328-marostegui.json
22:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P65677 and previous config saved to /var/cache/conftool/dbconfig/20240702-222820-marostegui.json
22:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T364069)', diff saved to https://phabricator.wikimedia.org/P65676 and previous config saved to /var/cache/conftool/dbconfig/20240702-221312-marostegui.json
22:05 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
22:05 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
22:05 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
22:04 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
22:04 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
22:04 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
22:04 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
22:04 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
22:04 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
22:04 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
22:04 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
22:03 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
22:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
22:03 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
22:03 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
22:03 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
22:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
22:03 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
22:03 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
22:03 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
22:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
22:03 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
22:03 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
22:03 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
22:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
22:02 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
22:02 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
22:02 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
22:02 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
22:02 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
22:02 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
22:02 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
22:02 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
22:01 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
22:01 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
22:01 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
21:58 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
21:58 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
21:58 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
21:57 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
21:54 rzl@deploy1002: Finished scap: T369080 (duration: 04m 13s)
21:54 rzl@deploy1002: rzl: Continuing with sync
21:52 rzl@deploy1002: rzl: T369080 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:51 rzl@deploy1002: Started scap sync-world: T369080
21:26 eileen: civicrm upgraded from 08e568e4 to 6e03cff2
21:21 eileen: civicrm upgraded from 67bcfd72 to 08e568e4
20:50 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:50 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
20:49 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
20:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox
20:45 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2002.mgmt.codfw.wmnet with reboot policy FORCED
20:39 cmooney@cumin1002: START - Cookbook sre.hosts.provision for host sretest2002.mgmt.codfw.wmnet with reboot policy FORCED
20:35 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:35 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
20:34 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
20:33 urbanecm@deploy1002: Finished scap: Backport for Follow the defaults for Parsoid on MFE on officewiki (T363720) (duration: 11m 44s)
20:31 cmooney@cumin1002: START - Cookbook sre.dns.netbox
20:28 urbanecm@deploy1002: arlolra, urbanecm: Continuing with sync
20:25 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet
20:24 urbanecm@deploy1002: arlolra, urbanecm: Backport for Follow the defaults for Parsoid on MFE on officewiki (T363720) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:21 urbanecm@deploy1002: Started scap sync-world: Backport for Follow the defaults for Parsoid on MFE on officewiki (T363720)
20:21 urbanecm@deploy1002: Finished scap: Backport for [July 2nd] Mobile: Enable dark mode for all users for tier 1 wikis (T367151), Remove unused Linter configs (T343292) (duration: 16m 31s)
20:16 urbanecm@deploy1002: jdlrobson, arlolra, urbanecm: Continuing with sync
20:15 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
20:15 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
20:07 urbanecm@deploy1002: jdlrobson, arlolra, urbanecm: Backport for [July 2nd] Mobile: Enable dark mode for all users for tier 1 wikis (T367151), Remove unused Linter configs (T343292) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:04 urbanecm@deploy1002: Started scap sync-world: Backport for [July 2nd] Mobile: Enable dark mode for all users for tier 1 wikis (T367151), Remove unused Linter configs (T343292)
19:45 jhathaway: running another email inbound mx test on mx-in1001
19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T364069)', diff saved to https://phabricator.wikimedia.org/P65675 and previous config saved to /var/cache/conftool/dbconfig/20240702-194027-marostegui.json
19:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
19:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T364069)', diff saved to https://phabricator.wikimedia.org/P65674 and previous config saved to /var/cache/conftool/dbconfig/20240702-194005-marostegui.json
19:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P65673 and previous config saved to /var/cache/conftool/dbconfig/20240702-192457-marostegui.json
19:21 eileen: civicrm upgraded from 64f23ed0 to 67bcfd72
19:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P65672 and previous config saved to /var/cache/conftool/dbconfig/20240702-190950-marostegui.json
18:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T364069)', diff saved to https://phabricator.wikimedia.org/P65671 and previous config saved to /var/cache/conftool/dbconfig/20240702-185443-marostegui.json
17:40 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:40 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:39 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:39 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:36 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:36 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:34 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:34 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:20 jforrester@deploy1002: Finished scap: Backport for Update OOUI to v0.50.3, Update OOUI to v0.50.3 (T369010) (duration: 10m 06s)
17:15 jforrester@deploy1002: jforrester: Continuing with sync
17:14 jforrester@deploy1002: jforrester: Backport for Update OOUI to v0.50.3, Update OOUI to v0.50.3 (T369010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:10 jforrester@deploy1002: Started scap sync-world: Backport for Update OOUI to v0.50.3, Update OOUI to v0.50.3 (T369010)
17:07 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
17:07 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
17:07 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
17:06 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
17:06 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
17:06 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
17:06 mutante: lists1004 - sudo systemctl start wmf_auto_restart_exim4 (T369017)
16:54 ejegg: fundraising civicrm upgraded from 41c1bd78 to 64f23ed0
16:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2007.codfw.wmnet with OS bookworm
16:13 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_drmrs
16:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
16:01 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_drmrs
15:58 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-master1004.eqiad.wmnet
15:57 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
15:51 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-master1004.eqiad.wmnet
15:50 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
15:50 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
15:49 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams
15:46 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams
15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
15:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 20:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
15:43 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2007.codfw.wmnet with OS bookworm
15:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T364069)', diff saved to https://phabricator.wikimedia.org/P65670 and previous config saved to /var/cache/conftool/dbconfig/20240702-154127-marostegui.json
15:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
15:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
15:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65669 and previous config saved to /var/cache/conftool/dbconfig/20240702-154105-marostegui.json
15:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P65668 and previous config saved to /var/cache/conftool/dbconfig/20240702-152558-marostegui.json
15:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2007.codfw.wmnet with OS bookworm
15:12 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
15:12 elukey@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
15:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P65667 and previous config saved to /var/cache/conftool/dbconfig/20240702-151050-marostegui.json
15:05 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubetcd[2004-2006].codfw.wmnet
15:05 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:05 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[2004-2006].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
15:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
15:02 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[2004-2006].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
14:58 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
14:58 jiji@cumin1002: START - Cookbook sre.dns.netbox
14:58 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
14:55 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams
14:55 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams
14:55 fabfur: upgrading A:cp-esams to haproxy 2.8.10 (T367756)
14:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65666 and previous config saved to /var/cache/conftool/dbconfig/20240702-145542-marostegui.json
14:53 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
14:53 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
14:53 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
14:52 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
14:52 brouberol@cumin1002: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes
14:52 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
14:52 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
14:51 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubetcd[1004-1006].eqiad.wmnet
14:51 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:51 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[1004-1006].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
14:50 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
14:48 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[1004-1006].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
14:47 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubetcd[2004-2006].codfw.wmnet
14:45 jiji@cumin1002: START - Cookbook sre.dns.netbox
14:38 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2007.codfw.wmnet with OS bookworm
14:37 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubetcd[1004-1006].eqiad.wmnet
14:28 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1008.eqiad.wmnet
14:19 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1008.eqiad.wmnet
14:15 root@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1008.eqiad.wmnet with OS bullseye
14:12 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: decom
14:12 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: decom
14:11 jiji@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2 days, 0:00:00 on 6 hosts with reason: decom
14:11 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: decom
14:07 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:06 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org,service=recdns
14:06 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
14:05 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
14:05 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:05 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:05 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
14:05 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
14:04 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
14:04 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
14:04 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org,service=recdns
14:04 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:03 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:03 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:03 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org
14:02 sukhe: restart anycast-hc on dns6001
14:01 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org
13:58 effie: decom old eqiad and codfw kubetcd hosts
13:46 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
13:44 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
13:44 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
13:43 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
13:42 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
13:42 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
13:41 brouberol@cumin1002: START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes
13:39 brouberol@cumin1002: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes
13:35 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2030.codfw.wmnet|wikikube-worker2031.codfw.wmnet|wikikube-worker2032.codfw.wmnet|wikikube-worker2033.codfw.wmnet|wikikube-worker2034.codfw.wmnet),cluster=kubernetes,service=kubesvc
13:35 claime: Pooling and uncordoning wikikube-worker2030.codfw.wmnet wikikube-worker2031.codfw.wmnet wikikube-worker2032.codfw.wmnet wikikube-worker2033.codfw.wmnet wikikube-worker2034.codfw.wmnet - T351074
13:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T367856)', diff saved to https://phabricator.wikimedia.org/P65665 and previous config saved to /var/cache/conftool/dbconfig/20240702-133100-marostegui.json
13:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
13:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
13:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T367856)', diff saved to https://phabricator.wikimedia.org/P65664 and previous config saved to /var/cache/conftool/dbconfig/20240702-133038-marostegui.json
13:30 Lucas_WMDE: UTC afternoon backport+config window done
13:27 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [wikifunctions] Grant wikifunctions-staff enum and converter rights (T366610 T367270), GrowthExperiments: add community updates module flag (T365877) (duration: 10m 22s)
13:22 claime: homer 'cr*codfw*' commit 'T351074'
13:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 sgimeno, jforrester, lucaswerkmeister-wmde: Continuing with sync
13:21 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubemaster[1001-1002].eqiad.wmnet
13:21 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:21 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[1001-1002].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
13:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 sgimeno, jforrester, lucaswerkmeister-wmde: Backport for [wikifunctions] Grant wikifunctions-staff enum and converter rights (T366610 T367270), GrowthExperiments: add community updates module flag (T365877) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:18 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[1001-1002].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
13:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for [wikifunctions] Grant wikifunctions-staff enum and converter rights (T366610 T367270), GrowthExperiments: add community updates module flag (T365877)
13:16 jiji@cumin1002: START - Cookbook sre.dns.netbox
13:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P65663 and previous config saved to /var/cache/conftool/dbconfig/20240702-131531-marostegui.json
13:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Enable EntitySchema data type on Wikidata (T332157) (duration: 10m 54s)
13:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2032.codfw.wmnet with OS bullseye
13:09 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubemaster[1001-1002].eqiad.wmnet
13:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
13:08 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
13:08 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
13:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Enable EntitySchema data type on Wikidata (T332157) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2033.codfw.wmnet with OS bullseye
13:03 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Enable EntitySchema data type on Wikidata (T332157)
13:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P65662 and previous config saved to /var/cache/conftool/dbconfig/20240702-130024-marostegui.json
12:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2034.codfw.wmnet with OS bullseye
12:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2031.codfw.wmnet with OS bullseye
12:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2030.codfw.wmnet with OS bullseye
12:55 jiji@cumin1002: conftool action : set/pooled=inactive; selector: name=kubemaster100[1-2].eqiad.wmnet
12:49 jiji@cumin1002: conftool action : set/pooled=no; selector: name=kubemaster100[1-2].eqiad.wmnet
12:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2032.codfw.wmnet with reason: host reimage
12:46 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kubemaster[1001-1002].eqiad.wmnet with reason: decom
12:46 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on kubemaster[1001-1002].eqiad.wmnet with reason: decom
12:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2033.codfw.wmnet with reason: host reimage
12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T367856)', diff saved to https://phabricator.wikimedia.org/P65661 and previous config saved to /var/cache/conftool/dbconfig/20240702-124517-marostegui.json
12:44 effie: decom eqiad old kubemasters - T353464
12:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2034.codfw.wmnet with reason: host reimage
12:41 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kubernetes1051.eqiad.wmnet
12:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2031.codfw.wmnet with reason: host reimage
12:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2030.codfw.wmnet with reason: host reimage
12:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2032.codfw.wmnet with reason: host reimage
12:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2033.codfw.wmnet with reason: host reimage
12:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2034.codfw.wmnet with reason: host reimage
12:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2031.codfw.wmnet with reason: host reimage
12:33 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2030.codfw.wmnet with reason: host reimage
12:25 brouberol@cumin1002: START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes
12:25 marostegui: Deploy schema change on db2129 s6 codfw dbmaint T367856
12:25 brouberol@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
12:24 jforrester@deploy1002: Finished scap: Backport for Reference widget: check for undefined config (T368736) (duration: 09m 59s)
12:19 jforrester@deploy1002: jforrester: Continuing with sync
12:19 jforrester@deploy1002: jforrester: Backport for Reference widget: check for undefined config (T368736) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2034.codfw.wmnet with OS bullseye
12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2033.codfw.wmnet with OS bullseye
12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2032.codfw.wmnet with OS bullseye
12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2031.codfw.wmnet with OS bullseye
12:17 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2030.codfw.wmnet with OS bullseye
12:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2393 to wikikube-worker2034
12:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2034
12:17 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2034
12:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2393 to wikikube-worker2034 - cgoubert@cumin1002"
12:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65660 and previous config saved to /var/cache/conftool/dbconfig/20240702-121638-root.json
12:16 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on lists1001.wikimedia.org with reason: Pre-decommissioning lists1001
12:16 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on lists1001.wikimedia.org with reason: Pre-decommissioning lists1001
12:16 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:15 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:15 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2393 to wikikube-worker2034 - cgoubert@cumin1002"
12:14 jforrester@deploy1002: Started scap sync-world: Backport for Reference widget: check for undefined config (T368736)
12:11 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
12:11 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2393 to wikikube-worker2034
12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2392 to wikikube-worker2033
12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2033
12:09 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2033
12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2392 to wikikube-worker2033 - cgoubert@cumin1002"
12:09 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
12:08 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2392 to wikikube-worker2033 - cgoubert@cumin1002"
12:07 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1008.eqiad.wmnet with reason: host reimage
12:07 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
12:05 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
12:05 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2392 to wikikube-worker2033
12:05 root@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1008.eqiad.wmnet with reason: host reimage
12:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2365 to wikikube-worker2032
12:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2032
12:03 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2032
12:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2365 to wikikube-worker2032 - cgoubert@cumin1002"
12:01 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2365 to wikikube-worker2032 - cgoubert@cumin1002"
12:01 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
12:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65659 and previous config saved to /var/cache/conftool/dbconfig/20240702-120133-root.json
12:01 jayme@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
12:00 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
12:00 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
11:59 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
11:59 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2365 to wikikube-worker2032
11:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2309 to wikikube-worker2031
11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2031
11:58 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2031
11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2309 to wikikube-worker2031 - cgoubert@cumin1002"
11:58 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
11:58 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
11:57 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2309 to wikikube-worker2031 - cgoubert@cumin1002"
11:55 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
11:55 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2309 to wikikube-worker2031
11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2307 to wikikube-worker2030
11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2030
11:52 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2030
11:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2307 to wikikube-worker2030 - cgoubert@cumin1002"
11:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65658 and previous config saved to /var/cache/conftool/dbconfig/20240702-115026-marostegui.json
11:50 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2307 to wikikube-worker2030 - cgoubert@cumin1002"
11:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
11:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
11:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T364069)', diff saved to https://phabricator.wikimedia.org/P65657 and previous config saved to /var/cache/conftool/dbconfig/20240702-115003-marostegui.json
11:48 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1008.eqiad.wmnet with OS bullseye
11:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65656 and previous config saved to /var/cache/conftool/dbconfig/20240702-114627-root.json
11:44 root@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1008.eqiad.wmnet with OS bullseye
11:43 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
11:43 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2307 to wikikube-worker2030
11:37 brouberol@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
11:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Long schema change
11:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Long schema change
11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P65655 and previous config saved to /var/cache/conftool/dbconfig/20240702-113457-marostegui.json
11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65654 and previous config saved to /var/cache/conftool/dbconfig/20240702-113122-root.json
11:27 brouberol@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
11:26 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host eventlog1003.eqiad.wmnet
11:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2129 T369021', diff saved to https://phabricator.wikimedia.org/P65653 and previous config saved to /var/cache/conftool/dbconfig/20240702-112616-root.json
11:25 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2214 to s6 primary T369021', diff saved to https://phabricator.wikimedia.org/P65652 and previous config saved to /var/cache/conftool/dbconfig/20240702-112518-marostegui.json
11:24 marostegui: Starting s6 codfw failover from db2129 to db2214 - T369021
11:24 jayme: switched wikikube production clusters from PSP to PSS for restricted namespaces - T273507
11:23 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
11:22 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host eventlog1003.eqiad.wmnet
11:22 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:22 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
11:22 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
11:21 jayme@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubernetes1051.eqiad.wmnet
11:21 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
11:21 claime: Uncordoning wikikube-ctrl2001.codfw.wmnet and wikikube-ctrl2002.codfw.wmnet
11:20 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
11:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P65651 and previous config saved to /var/cache/conftool/dbconfig/20240702-111949-marostegui.json
11:17 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1008.eqiad.wmnet with OS bullseye
11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65650 and previous config saved to /var/cache/conftool/dbconfig/20240702-111616-root.json
11:14 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
11:12 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2025.codfw.wmnet|wikikube-worker2026.codfw.wmnet|wikikube-worker2027.codfw.wmnet|wikikube-worker2028.codfw.wmnet|wikikube-worker2029.codfw.wmnet),cluster=kubernetes,service=kubesvc
11:12 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
11:12 claime: pooling and uncordoning wikikube-worker2025.codfw.wmnet|wikikube-worker2026.codfw.wmnet|wikikube-worker2027.codfw.wmnet|wikikube-worker2028.codfw.wmnet|wikikube-worker2029.codfw.wmnet - T351074
11:11 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubemaster[2001-2002].codfw.wmnet
11:11 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:11 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[2001-2002].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
11:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369021
11:07 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2214 with weight 0 T369021', diff saved to https://phabricator.wikimedia.org/P65649 and previous config saved to /var/cache/conftool/dbconfig/20240702-110750-root.json
11:07 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[2001-2002].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
11:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369021
11:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T364069)', diff saved to https://phabricator.wikimedia.org/P65648 and previous config saved to /var/cache/conftool/dbconfig/20240702-110442-marostegui.json
11:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65647 and previous config saved to /var/cache/conftool/dbconfig/20240702-110111-root.json
10:56 jiji@cumin1002: START - Cookbook sre.dns.netbox
10:50 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubemaster[2001-2002].codfw.wmnet
10:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65646 and previous config saved to /var/cache/conftool/dbconfig/20240702-104605-root.json
10:42 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:42 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:42 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:41 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:35 brouberol@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
10:34 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-master1003.eqiad.wmnet
10:32 brouberol@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker
10:28 fabfur: upgrading A:cp-eqiad to haproxy 2.8.10 (T367756)
10:27 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
10:27 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
10:25 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-master1003.eqiad.wmnet
10:06 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 100% weight T363812', diff saved to https://phabricator.wikimedia.org/P65645 and previous config saved to /var/cache/conftool/dbconfig/20240702-100636-jynus.json
10:02 claime: homer 'cr*codfw*' commit 'T351074'
09:53 jiji@cumin1002: conftool action : set/pooled=no; selector: name=kubemaster200[1-2].codfw.wmnet
09:52 elukey: volatile dir on puppetserver1001 with the new point release (12.6) for Bookworm
09:48 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kubemaster[2001-2002].codfw.wmnet with reason: decom
09:47 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on kubemaster[2001-2002].codfw.wmnet with reason: decom
09:20 brouberol@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
09:15 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 50% weight T363812', diff saved to https://phabricator.wikimedia.org/P65644 and previous config saved to /var/cache/conftool/dbconfig/20240702-091508-jynus.json
08:57 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 10% weight T363812', diff saved to https://phabricator.wikimedia.org/P65643 and previous config saved to /var/cache/conftool/dbconfig/20240702-085733-jynus.json
08:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T367856)', diff saved to https://phabricator.wikimedia.org/P65642 and previous config saved to /var/cache/conftool/dbconfig/20240702-084447-marostegui.json
08:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
08:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
08:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T367856)', diff saved to https://phabricator.wikimedia.org/P65641 and previous config saved to /var/cache/conftool/dbconfig/20240702-084425-marostegui.json
08:40 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp6009.*} and A:cp
08:38 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp6009.*} and A:cp
08:36 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru
08:34 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.12 refs T366957
08:34 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru
08:30 jayme@cumin1002: conftool action : set/pooled=inactive; selector: name=kubernetes1051.eqiad.wmnet
08:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P65640 and previous config saved to /var/cache/conftool/dbconfig/20240702-082918-marostegui.json
08:22 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2031.*} and A:cp
08:20 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2031.*} and A:cp
08:17 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2030.*} and A:cp
08:16 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
08:15 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2030.*} and A:cp
08:15 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
08:14 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2028.*} and A:cp
08:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P65639 and previous config saved to /var/cache/conftool/dbconfig/20240702-081411-marostegui.json
08:13 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2028.*} and A:cp
08:12 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2027.*} and A:cp
08:11 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2027.*} and A:cp
08:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T364069)', diff saved to https://phabricator.wikimedia.org/P65638 and previous config saved to /var/cache/conftool/dbconfig/20240702-081025-marostegui.json
08:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
08:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
08:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
08:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T364069)', diff saved to https://phabricator.wikimedia.org/P65637 and previous config saved to /var/cache/conftool/dbconfig/20240702-080948-marostegui.json
08:07 jayme: draining kubernetes1051.eqiad.wmnet
08:07 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru
08:06 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru
08:01 jayme: cordon kubernetes1051.eqiad.wmnet because of several failed image pulls
07:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T367856)', diff saved to https://phabricator.wikimedia.org/P65635 and previous config saved to /var/cache/conftool/dbconfig/20240702-075904-marostegui.json
07:58 kharlan@deploy1002: Finished scap: Backport for Revert "QuickSurveys: Add testing survey configuration" (T368459) (duration: 41m 45s)
07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P65634 and previous config saved to /var/cache/conftool/dbconfig/20240702-075440-marostegui.json
07:52 kharlan@deploy1002: kharlan: Continuing with sync
07:51 kharlan@deploy1002: kharlan: Backport for Revert "QuickSurveys: Add testing survey configuration" (T368459) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P65633 and previous config saved to /var/cache/conftool/dbconfig/20240702-073933-marostegui.json
07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T364069)', diff saved to https://phabricator.wikimedia.org/P65632 and previous config saved to /var/cache/conftool/dbconfig/20240702-072426-marostegui.json
07:16 kharlan@deploy1002: Started scap sync-world: Backport for Revert "QuickSurveys: Add testing survey configuration" (T368459)
07:06 kharlan@deploy1002: Started scap sync-world: Backport for Revert "QuickSurveys: Add testing survey configuration" (T368459)
07:01 oblivian@deploy1002: Finished scap: Rebuilding images for change to the base image for httpd (duration: 26m 52s)
06:59 XioNoX: update netboot bookworm image to pick up new point release
06:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65631 and previous config saved to /var/cache/conftool/dbconfig/20240702-065831-root.json
06:35 oblivian@deploy1002: Started scap sync-world: Rebuilding images for change to the base image for httpd
06:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65629 and previous config saved to /var/cache/conftool/dbconfig/20240702-062820-root.json
06:21 _joe_: rebuilding httpd-fcgi, mediawiki-httpd images T363342 T368640
06:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65628 and previous config saved to /var/cache/conftool/dbconfig/20240702-061315-root.json
05:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65627 and previous config saved to /var/cache/conftool/dbconfig/20240702-055809-root.json
05:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65626 and previous config saved to /var/cache/conftool/dbconfig/20240702-054304-root.json
05:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65625 and previous config saved to /var/cache/conftool/dbconfig/20240702-052759-root.json
05:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1192 T368371', diff saved to https://phabricator.wikimedia.org/P65624 and previous config saved to /var/cache/conftool/dbconfig/20240702-052543-root.json
05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1209 to s8 primary and set section read-write T368371', diff saved to https://phabricator.wikimedia.org/P65623 and previous config saved to /var/cache/conftool/dbconfig/20240702-052447-marostegui.json
05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Set s8 eqiad as read-only for maintenance - T368371', diff saved to https://phabricator.wikimedia.org/P65622 and previous config saved to /var/cache/conftool/dbconfig/20240702-052408-marostegui.json
05:23 marostegui: Starting s8 eqiad failover from db1192 to db1209 - T368371
04:59 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1209 remove from API T368371', diff saved to https://phabricator.wikimedia.org/P65621 and previous config saved to /var/cache/conftool/dbconfig/20240702-045929-marostegui.json
04:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s8 T368371
04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1209 with weight 0 T368371', diff saved to https://phabricator.wikimedia.org/P65620 and previous config saved to /var/cache/conftool/dbconfig/20240702-045856-marostegui.json
04:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s8 T368371
04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T364069)', diff saved to https://phabricator.wikimedia.org/P65619 and previous config saved to /var/cache/conftool/dbconfig/20240702-043349-marostegui.json
04:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
04:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T364069)', diff saved to https://phabricator.wikimedia.org/P65618 and previous config saved to /var/cache/conftool/dbconfig/20240702-043326-marostegui.json
04:22 eileen: civicrm upgraded from f6af6380 to 41c1bd78
04:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P65617 and previous config saved to /var/cache/conftool/dbconfig/20240702-041819-marostegui.json
04:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T367856)', diff saved to https://phabricator.wikimedia.org/P65616 and previous config saved to /var/cache/conftool/dbconfig/20240702-040705-marostegui.json
04:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
04:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
04:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T367856)', diff saved to https://phabricator.wikimedia.org/P65615 and previous config saved to /var/cache/conftool/dbconfig/20240702-040643-marostegui.json
04:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P65614 and previous config saved to /var/cache/conftool/dbconfig/20240702-040312-marostegui.json
04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.9 (duration: 01m 02s)
03:54 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.12 refs T366957 (duration: 51m 33s)
03:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P65613 and previous config saved to /var/cache/conftool/dbconfig/20240702-035135-marostegui.json
03:51 eileen: civicrm upgraded from 52dc4f1d to f6af6380
03:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T364069)', diff saved to https://phabricator.wikimedia.org/P65612 and previous config saved to /var/cache/conftool/dbconfig/20240702-034805-marostegui.json
03:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P65611 and previous config saved to /var/cache/conftool/dbconfig/20240702-033628-marostegui.json
03:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T367856)', diff saved to https://phabricator.wikimedia.org/P65610 and previous config saved to /var/cache/conftool/dbconfig/20240702-032121-marostegui.json
03:03 mwpresync@deploy1002: Started scap sync-world: testwikis wikis to 1.43.0-wmf.12 refs T366957
00:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T364069)', diff saved to https://phabricator.wikimedia.org/P65609 and previous config saved to /var/cache/conftool/dbconfig/20240702-004524-marostegui.json
00:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
00:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
00:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T364069)', diff saved to https://phabricator.wikimedia.org/P65608 and previous config saved to /var/cache/conftool/dbconfig/20240702-004502-marostegui.json
00:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P65607 and previous config saved to /var/cache/conftool/dbconfig/20240702-002955-marostegui.json
00:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1038.eqiad.wmnet with OS bullseye
00:16 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:15 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P65606 and previous config saved to /var/cache/conftool/dbconfig/20240702-001448-marostegui.json
00:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1037.eqiad.wmnet with OS bullseye
00:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:13 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"

2024-07-01

23:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T364069)', diff saved to https://phabricator.wikimedia.org/P65605 and previous config saved to /var/cache/conftool/dbconfig/20240701-235941-marostegui.json
23:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage
23:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1036.eqiad.wmnet with OS bullseye
23:55 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage
23:54 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:51 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage
23:51 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage
23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1035.eqiad.wmnet with OS bullseye
23:40 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:39 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
23:34 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
23:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1037.eqiad.wmnet with OS bullseye
23:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1038.eqiad.wmnet with OS bullseye
23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
23:19 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
23:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1036.eqiad.wmnet with OS bullseye
23:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bullseye
22:54 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1038
22:54 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1038
22:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1041.eqiad.wmnet with OS bullseye
22:47 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
22:10 sbassett@deploy1002: Synchronized private/PrivateSettings.php: Un-deployed a PS.php mitigation for T341908 (duration: 07m 24s)
21:59 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1089*,elastic1090*,elastic1104* for T348977 - bking@cumin2002
21:59 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1089*,elastic1090*,elastic1104* for T348977 - bking@cumin2002
21:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1089-1090,1104].eqiad.wmnet with reason: T348977
21:58 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1089-1090,1104].eqiad.wmnet with reason: T348977
21:55 maryum: deployed patch for T366991
21:39 eileen: civicrm upgraded from f8b1f5c4 to 52dc4f1d
21:39 eileen: tools upgraded from c51f6e62 to 95f10b20
21:32 zabe: zabe@mwmaint1002:/tmp/upload$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --sleep=3600 --user=Yann . # T368703
21:24 cjming: end of UTC late backport window
21:23 cjming@deploy1002: Finished scap: Backport for extension-list: Add Metrics Platform (T366234) (duration: 28m 16s)
21:16 cjming@deploy1002: cjming: Continuing with sync
21:16 cjming@deploy1002: cjming: Backport for extension-list: Add Metrics Platform (T366234) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T364069)', diff saved to https://phabricator.wikimedia.org/P65604 and previous config saved to /var/cache/conftool/dbconfig/20240701-210534-marostegui.json
21:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
21:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
21:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T364069)', diff saved to https://phabricator.wikimedia.org/P65603 and previous config saved to /var/cache/conftool/dbconfig/20240701-210512-marostegui.json
21:04 ejegg: fundraising civicrm upgraded from f9782670 to f8b1f5c4
20:55 cjming@deploy1002: Started scap sync-world: Backport for extension-list: Add Metrics Platform (T366234)
20:53 cjming@deploy1002: Finished scap: Backport for Missing.php: don't redirect to unprefixed nan incubator (T86915) (duration: 09m 03s)
20:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P65602 and previous config saved to /var/cache/conftool/dbconfig/20240701-205003-marostegui.json
20:47 cjming@deploy1002: cjming, pppery: Continuing with sync
20:47 cjming@deploy1002: cjming, pppery: Backport for Missing.php: don't redirect to unprefixed nan incubator (T86915) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:44 cjming@deploy1002: Started scap sync-world: Backport for Missing.php: don't redirect to unprefixed nan incubator (T86915)
20:42 cjming@deploy1002: Finished scap: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151), Change color of notification icon in dark-mode (T368120), Do not invert images that have been tagged with no invert classes (T368483) (duration: 10m 39s)
20:36 cjming@deploy1002: cjming, jdlrobson: Continuing with sync
20:35 ejegg: standalone SmashPig upgraded from c8993ec6 to 565c61e4
20:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P65601 and previous config saved to /var/cache/conftool/dbconfig/20240701-203456-marostegui.json
20:34 cjming@deploy1002: cjming, jdlrobson: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151), Change color of notification icon in dark-mode (T368120), Do not invert images that have been tagged with no invert classes (T368483) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:31 cjming@deploy1002: Started scap sync-world: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151), Change color of notification icon in dark-mode (T368120), Do not invert images that have been tagged with no invert classes (T368483)
20:30 cjming@deploy1002: Sync cancelled.
20:28 cjming@deploy1002: jdlrobson, cjming: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:26 cjming@deploy1002: Started scap sync-world: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151)
20:23 cjming@deploy1002: Sync cancelled.
20:23 cjming@deploy1002: jdlrobson, cjming: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T364069)', diff saved to https://phabricator.wikimedia.org/P65600 and previous config saved to /var/cache/conftool/dbconfig/20240701-201949-marostegui.json
20:03 cjming@deploy1002: Started scap sync-world: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151)
19:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
19:19 dancy@deploy1002: Installation of scap version "4.91.0" completed for 233 hosts
19:19 dancy@deploy1002: Installing scap version "4.91.0" for 233 hosts
19:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1041.eqiad.wmnet with reason: host reimage
19:15 dancy@deploy1002: Installing scap version "4.91.0" for 234 hosts
19:14 dancy@deploy1002: Installing scap version "4.91.0" for 234 hosts
19:13 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1041.eqiad.wmnet with reason: host reimage
18:57 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1041.eqiad.wmnet with OS bullseye
18:56 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
18:56 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
17:49 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
17:49 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
17:49 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:49 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
17:48 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
17:45 jclark@cumin1002: START - Cookbook sre.dns.netbox
17:44 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
17:44 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
17:42 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
17:42 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
17:41 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
17:41 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
17:37 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1041.eqiad.wmnet with OS bullseye
17:36 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-codfw,ssw1-a[1,8]-codfw.mgmt with reason: reboot ssw1-d8-codfw
17:35 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on cr2-codfw,ssw1-a[1,8]-codfw.mgmt with reason: reboot ssw1-d8-codfw
17:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1039.eqiad.wmnet with OS bullseye
17:27 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T364069)', diff saved to https://phabricator.wikimedia.org/P65599 and previous config saved to /var/cache/conftool/dbconfig/20240701-171609-marostegui.json
17:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
17:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
17:08 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
17:08 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
17:05 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
17:04 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
16:51 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:51 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
16:51 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
16:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1039.eqiad.wmnet with reason: host reimage
16:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1039.eqiad.wmnet with reason: host reimage
16:34 dancy@deploy1002: Installing scap version "4.90.0" for 234 hosts
16:34 dancy@deploy1002: Installing scap version "4.90.0" for 234 hosts
16:33 dancy@deploy1002: Installing scap version "4.90.0" for 234 hosts
16:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T367856)', diff saved to https://phabricator.wikimedia.org/P65598 and previous config saved to /var/cache/conftool/dbconfig/20240701-163010-marostegui.json
16:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
16:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
16:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T367856)', diff saved to https://phabricator.wikimedia.org/P65597 and previous config saved to /var/cache/conftool/dbconfig/20240701-162948-marostegui.json
16:22 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bullseye
16:21 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1039
16:20 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1039
16:18 urandom: restarting Cassandra (restbase2023-{a,b,c}) while troubleshooting storage utilization
16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1041.eqiad.wmnet with OS bullseye
16:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P65596 and previous config saved to /var/cache/conftool/dbconfig/20240701-161441-marostegui.json
16:11 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
16:11 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
15:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P65595 and previous config saved to /var/cache/conftool/dbconfig/20240701-155934-marostegui.json
15:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T367856)', diff saved to https://phabricator.wikimedia.org/P65594 and previous config saved to /var/cache/conftool/dbconfig/20240701-154427-marostegui.json
15:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65593 and previous config saved to /var/cache/conftool/dbconfig/20240701-153758-root.json
15:37 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:32 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:25 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw
15:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65592 and previous config saved to /var/cache/conftool/dbconfig/20240701-152253-root.json
15:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:22 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw
15:21 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:21 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:16 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1040.eqiad.wmnet with OS bullseye
15:15 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:14 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
15:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2026.codfw.wmnet with OS bullseye
15:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65591 and previous config saved to /var/cache/conftool/dbconfig/20240701-150747-root.json
15:07 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
15:07 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
15:05 akosiaris: reboot deploy1003 T364416
15:04 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
15:03 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
14:57 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
14:56 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
14:56 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
14:55 claime: deploying statsd-exporter for mw-web - T365265
14:54 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
14:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
14:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65590 and previous config saved to /var/cache/conftool/dbconfig/20240701-145242-root.json
14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
14:49 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:48 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
14:48 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
14:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
14:44 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
14:44 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
14:43 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
14:40 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:40 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65589 and previous config saved to /var/cache/conftool/dbconfig/20240701-143736-root.json
14:36 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
14:36 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
14:35 fabfur: upgrading A:cp-codfw to haproxy 2.8.10 (T367756)
14:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1040.eqiad.wmnet with reason: host reimage
14:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
14:27 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1040.eqiad.wmnet with reason: host reimage
14:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65587 and previous config saved to /var/cache/conftool/dbconfig/20240701-142231-root.json
14:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
14:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
14:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T364069)', diff saved to https://phabricator.wikimedia.org/P65586 and previous config saved to /var/cache/conftool/dbconfig/20240701-141640-marostegui.json
14:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
14:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65585 and previous config saved to /var/cache/conftool/dbconfig/20240701-140725-root.json
14:03 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1040.eqiad.wmnet with OS bullseye
14:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P65584 and previous config saved to /var/cache/conftool/dbconfig/20240701-140133-marostegui.json
13:57 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
13:56 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
13:48 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:48 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P65583 and previous config saved to /var/cache/conftool/dbconfig/20240701-134626-marostegui.json
13:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
13:41 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1040
13:41 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1040
13:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2026.codfw.wmnet with OS bullseye
13:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T364069)', diff saved to https://phabricator.wikimedia.org/P65581 and previous config saved to /var/cache/conftool/dbconfig/20240701-133118-marostegui.json
13:30 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
13:30 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
13:30 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: sync
13:29 elukey@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: sync
13:29 urbanecm: mwmaint1002: [urbanecm@mwmaint1002 ~]$ foreachwiki DiscussionTools:FixTrailingWhitespaceIds (T356196)
13:27 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
13:27 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
13:26 elukey@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
13:26 elukey@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
13:26 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
13:26 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
13:25 urbanecm@deploy1002: Finished scap: Backport for FixTrailingWhitespaceIds: Don't crash on complex conflicts (T356196) (duration: 08m 46s)
13:21 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
13:21 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
13:19 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru
13:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
13:17 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru
13:16 urbanecm@deploy1002: Started scap: Backport for FixTrailingWhitespaceIds: Don't crash on complex conflicts (T356196)
13:16 urbanecm@deploy1002: Finished scap: Backport for Update interwiki map (T368862) (duration: 09m 01s)
13:14 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
13:10 urbanecm@deploy1002: urbanecm: Continuing with sync
13:10 urbanecm@deploy1002: urbanecm: Backport for Update interwiki map (T368862) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:07 urbanecm@deploy1002: Started scap: Backport for Update interwiki map (T368862)
12:56 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
12:56 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
12:56 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:55 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:55 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
12:54 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2026.codfw.wmnet with OS bullseye
12:51 claime: Running update-netboot-image bullseye for 11.10 release on puppetserver1001
12:49 fabfur: upgrading A:cp-magru to haproxy 2.8.10 (T367756)
12:49 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru
12:49 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru
12:39 claime: Running update-netboot-image bullseye for 11.10 release
12:35 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
12:35 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
12:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
12:35 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
12:35 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
12:35 jayme@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
12:35 jayme@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
12:34 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:33 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:33 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:33 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:32 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
12:32 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
12:32 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:32 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:32 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:31 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
12:31 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
12:30 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
12:29 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
12:28 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
12:27 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
12:23 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
12:22 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
12:21 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
12:21 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:20 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:19 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
12:18 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
12:17 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
12:16 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
12:14 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
12:12 daniel@deploy1002: Finished scap: Backport for REST: detect mismatching value types in json request (T305973) (duration: 32m 48s)
12:09 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
12:08 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
12:06 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
12:04 daniel@deploy1002: daniel: Continuing with sync
12:03 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
12:01 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
12:01 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2026.codfw.wmnet with OS bullseye
12:00 daniel@deploy1002: daniel: Backport for REST: detect mismatching value types in json request (T305973) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:58 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
11:51 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
11:49 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
11:46 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
11:45 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
11:45 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
11:43 slyngshede@cumin1002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging FebinBellamy out of all services on: 2188 hosts
11:43 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
11:43 slyngshede@cumin1002: START - Cookbook sre.idm.logout Logging FebinBellamy out of all services on: 2188 hosts
11:41 slyngshede@cumin1002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 2188 hosts
11:41 slyngshede@cumin1002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 2188 hosts
11:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
11:39 daniel@deploy1002: Started scap: Backport for REST: detect mismatching value types in json request (T305973)
11:37 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
11:37 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
11:33 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs
11:30 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs
11:29 btullis@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
11:27 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
11:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
10:57 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
10:49 claime: running /usr/local/bin/apply-config-kartotherian on maps-master
10:47 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
10:47 claime: running /usr/local/bin/apply-config-kartotherian on maps-replica
10:46 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
10:46 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
10:43 claime: running puppet on maps servers
10:39 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs
10:39 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs
10:38 elukey@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
10:37 elukey@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
10:37 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
10:37 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
10:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T364069)', diff saved to https://phabricator.wikimedia.org/P65580 and previous config saved to /var/cache/conftool/dbconfig/20240701-102633-marostegui.json
10:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
10:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
10:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T364069)', diff saved to https://phabricator.wikimedia.org/P65579 and previous config saved to /var/cache/conftool/dbconfig/20240701-102611-marostegui.json
10:23 fabfur: upgrading A:cp-drmrs to haproxy 2.8.10 (T367756)
10:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P65578 and previous config saved to /var/cache/conftool/dbconfig/20240701-101104-marostegui.json
09:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P65577 and previous config saved to /var/cache/conftool/dbconfig/20240701-095557-marostegui.json
09:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65576 and previous config saved to /var/cache/conftool/dbconfig/20240701-094547-root.json
09:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65575 and previous config saved to /var/cache/conftool/dbconfig/20240701-094341-root.json
09:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T364069)', diff saved to https://phabricator.wikimedia.org/P65574 and previous config saved to /var/cache/conftool/dbconfig/20240701-094050-marostegui.json
09:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65573 and previous config saved to /var/cache/conftool/dbconfig/20240701-093042-root.json
09:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65572 and previous config saved to /var/cache/conftool/dbconfig/20240701-092835-root.json
09:26 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:15 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65570 and previous config saved to /var/cache/conftool/dbconfig/20240701-091536-root.json
09:14 urbanecm@deploy1002: Finished scap: Backport for JsonSchemaValidator: Measure duration (T365245) (duration: 22m 15s)
09:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65569 and previous config saved to /var/cache/conftool/dbconfig/20240701-091329-root.json
09:06 urbanecm@deploy1002: urbanecm: Continuing with sync
09:06 urbanecm@deploy1002: urbanecm: Backport for JsonSchemaValidator: Measure duration (T365245) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65568 and previous config saved to /var/cache/conftool/dbconfig/20240701-090031-root.json
08:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65567 and previous config saved to /var/cache/conftool/dbconfig/20240701-085824-root.json
08:51 urbanecm@deploy1002: Started scap: Backport for JsonSchemaValidator: Measure duration (T365245)
08:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65566 and previous config saved to /var/cache/conftool/dbconfig/20240701-084525-root.json
08:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65565 and previous config saved to /var/cache/conftool/dbconfig/20240701-084318-root.json
08:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65564 and previous config saved to /var/cache/conftool/dbconfig/20240701-083020-root.json
08:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65563 and previous config saved to /var/cache/conftool/dbconfig/20240701-082813-root.json
08:18 jynus@cumin1002: dbctl commit (dc=all): 'Depool es1025 for backups T363812', diff saved to https://phabricator.wikimedia.org/P65562 and previous config saved to /var/cache/conftool/dbconfig/20240701-081811-jynus.json
08:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65561 and previous config saved to /var/cache/conftool/dbconfig/20240701-081514-root.json
08:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65560 and previous config saved to /var/cache/conftool/dbconfig/20240701-081307-root.json
08:07 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1169.eqiad.wmnet onto db1195.eqiad.wmnet
07:44 elukey: `apt-get clean` on build2001 to free some space in the root partition
07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Place db1195 in s1 T368871', diff saved to https://phabricator.wikimedia.org/P65559 and previous config saved to /var/cache/conftool/dbconfig/20240701-070243-marostegui.json
06:36 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1169.eqiad.wmnet onto db1195.eqiad.wmnet
06:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1169 T368871', diff saved to https://phabricator.wikimedia.org/P65558 and previous config saved to /var/cache/conftool/dbconfig/20240701-063601-root.json
06:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T364069)', diff saved to https://phabricator.wikimedia.org/P65557 and previous config saved to /var/cache/conftool/dbconfig/20240701-063344-marostegui.json
06:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
06:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
05:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1195.eqiad.wmnet with reason: Reboot
05:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1195.eqiad.wmnet with reason: Reboot
04:56 marostegui: Failover m2 from db1195 to db1228 - T368494
04:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2133,2160].codfw.wmnet,db[1195,1217,1228].eqiad.wmnet with reason: m2 switchover T368494
04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2133,2160].codfw.wmnet,db[1195,1217,1228].eqiad.wmnet with reason: m2 switchover T368494
04:50 marostegui: dbmaint eqiad Rebuild pagelinks table on s8 master T364069
04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T367856)', diff saved to https://phabricator.wikimedia.org/P65556 and previous config saved to /var/cache/conftool/dbconfig/20240701-044945-marostegui.json
04:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
04:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
04:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
04:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance