Server Admin Log/Archive 83

From Wikitech

2024-07-31

  • 22:23 pt1979@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 22:19 pt1979@cumin1002: START - Cookbook sre.dns.netbox
  • 22:17 pt1979@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 22:15 pt1979@cumin1002: START - Cookbook sre.dns.netbox
  • 22:10 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host alert2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:09 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host alert2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:52 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:50 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 21:28 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1256.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:27 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1256.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:27 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1256.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1256.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:17 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@82674dc]: deploy airflow analytics DAG hotfix T368756 (duration: 01m 05s)
  • 21:16 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@82674dc]: deploy airflow analytics DAG hotfix T368756
  • 21:10 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on cp7015.magru.wmnet with reason: T371554
  • 21:10 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on cp7015.magru.wmnet with reason: T371554
  • 21:09 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:06 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:04 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:02 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 20:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1257.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:55 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1256.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:55 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1255.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1254.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1253.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1259.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1251.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1252.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1250.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:52 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1258.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:49 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1258.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:49 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1258.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:47 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp7015.magru.wmnet
  • 20:45 cjming: end of UTC late backport window
  • 20:44 cjming@deploy1003: Finished scap: Backport for beta: Enable NetworkSession extension (T355267) (duration: 07m 47s)
  • 20:40 cjming@deploy1003: ebernhardson, cjming: Continuing with sync
  • 20:39 cjming@deploy1003: ebernhardson, cjming: Backport for beta: Enable NetworkSession extension (T355267) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:37 cjming@deploy1003: Started scap sync-world: Backport for beta: Enable NetworkSession extension (T355267)
  • 20:34 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1257.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1257.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:31 cjming@deploy1003: Finished scap: Backport for [arwiki] Set noindex for namespace user (T371470) (duration: 17m 28s)
  • 20:27 cjming@deploy1003: cjming, gergesshamon: Continuing with sync
  • 20:26 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1258.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1257.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1256.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1259.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1255.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1254.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1253.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1252.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:23 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1251.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:23 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1250.mgmt.eqiad.wmnet with reboot policy FORCED
  • 20:19 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:17 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 20:16 cjming@deploy1003: cjming, gergesshamon: Backport for [arwiki] Set noindex for namespace user (T371470) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:14 jclark@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 20:14 cjming@deploy1003: Started scap sync-world: Backport for [arwiki] Set noindex for namespace user (T371470)
  • 20:12 cjming@deploy1003: Finished scap: Backport for [wmf-config] Remove trailing slash in SSO domain (duration: 08m 04s)
  • 20:09 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 20:07 cjming@deploy1003: cjming, d3r1ck01: Continuing with sync
  • 20:06 cjming@deploy1003: cjming, d3r1ck01: Backport for [wmf-config] Remove trailing slash in SSO domain synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:06 cstone: payments-wiki upgraded from c4c43c74 to e8d1c5ad
  • 20:04 cjming@deploy1003: Started scap sync-world: Backport for [wmf-config] Remove trailing slash in SSO domain
  • 20:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: old netbox
  • 20:02 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on netbox2002.codfw.wmnet,netbox1002.eqiad.wmnet with reason: old netbox
  • 19:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host alert2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:23 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host alert2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 19:23 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:20 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 19:20 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 19:17 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 19:13 xcollazo@deploy1003: Finished deploy [airflow-dags/analytics@ea93090]: deploy latest DAGs to analytics Airflow instance. (duration: 01m 30s)
  • 19:11 xcollazo@deploy1003: Started deploy [airflow-dags/analytics@ea93090]: deploy latest DAGs to analytics Airflow instance.
  • 18:56 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:55 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:48 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:48 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:29 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host vrts2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 18:27 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:24 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 18:24 jhancock@cumin2002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 18:17 brennen@deploy1003: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.16 refs T366961
  • 18:17 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 18:09 brennen: 1.43.0-wmf.16 train (T366961): no current blockers, logs clean, rolling to group1.
  • 17:52 ejegg: payments-wiki upgraded from 91624a2e to c4c43c74
  • 17:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T367856)', diff saved to https://phabricator.wikimedia.org/P67171 and previous config saved to /var/cache/conftool/dbconfig/20240731-171255-marostegui.json
  • 17:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 17:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 17:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T367856)', diff saved to https://phabricator.wikimedia.org/P67170 and previous config saved to /var/cache/conftool/dbconfig/20240731-171233-marostegui.json
  • 16:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P67169 and previous config saved to /var/cache/conftool/dbconfig/20240731-165726-marostegui.json
  • 16:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P67168 and previous config saved to /var/cache/conftool/dbconfig/20240731-164219-marostegui.json
  • 16:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T367856)', diff saved to https://phabricator.wikimedia.org/P67167 and previous config saved to /var/cache/conftool/dbconfig/20240731-162712-marostegui.json
  • 16:17 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2228.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 16:08 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db2228.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 16:08 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2227.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 16:07 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:04 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 15:58 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db2227.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:56 ayounsi@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 15:55 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 15:49 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67166 and previous config saved to /var/cache/conftool/dbconfig/20240731-154912-root.json
  • 15:40 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2226.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:34 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67165 and previous config saved to /var/cache/conftool/dbconfig/20240731-153407-root.json
  • 15:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: CR1058609 - ayounsi@cumin1002
  • 15:30 jgiannelos@deploy1003: Finished deploy [restbase/deploy@59a40a0]: (no justification provided) (duration: 19m 22s)
  • 15:28 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: CR1058609 - ayounsi@cumin1002
  • 15:28 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db2226.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:27 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2225.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:19 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db2225.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:19 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67164 and previous config saved to /var/cache/conftool/dbconfig/20240731-151901-root.json
  • 15:17 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2224.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:11 jgiannelos@deploy1003: Started deploy [restbase/deploy@59a40a0]: (no justification provided)
  • 15:04 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db2224.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:03 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67163 and previous config saved to /var/cache/conftool/dbconfig/20240731-150356-root.json
  • 14:48 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67162 and previous config saved to /var/cache/conftool/dbconfig/20240731-144850-root.json
  • 14:45 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2223.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:33 marostegui@cumin1002: dbctl commit (dc=all): 'db2148 (re)pooling @ 5%: Repooling', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240731-143340-root.json
  • 14:33 elukey@cumin2002: START - Cookbook sre.hosts.provision for host db2223.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:21 sukhe: [done] upgrade cp4044 to ATS 9.2.5: T339134
  • 14:21 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4044*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-
  • 14:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2148', diff saved to https://phabricator.wikimedia.org/P67160 and previous config saved to /var/cache/conftool/dbconfig/20240731-141959-marostegui.json
  • 14:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 14:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 14:17 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4044*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-drmrs or A:cp-text_
  • 13:54 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:53 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap: Backport for EventStreamConfig - fix for private wiki streams (T346046 T371433) (duration: 11m 31s)
  • 13:49 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, otto: Continuing with sync
  • 13:49 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s6
  • 13:46 bking@deploy1003: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:45 bking@deploy1003: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow-test-k8s: apply
  • 13:44 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, otto: Backport for EventStreamConfig - fix for private wiki streams (T346046 T371433) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:42 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for EventStreamConfig - fix for private wiki streams (T346046 T371433)
  • 13:40 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org,service=recdns [reason: [done] pdns-rec upgrade]
  • 13:39 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org,service=recdns [reason: pdns-rec upgrade]
  • 13:39 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap: Backport for TranslatablePage: Store source page ids as string in WAN cache (T366455), TranslatablePage: Store source page ids as string in WAN cache (T366455) (duration: 12m 34s)
  • 13:39 sukhe: upgrade pdns-recursor to 4.8.8 from 4.8.7 on dns6001
  • 13:34 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, abi: Continuing with sync
  • 13:28 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, abi: Backport for TranslatablePage: Store source page ids as string in WAN cache (T366455), TranslatablePage: Store source page ids as string in WAN cache (T366455) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:27 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 13:26 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for TranslatablePage: Store source page ids as string in WAN cache (T366455), TranslatablePage: Store source page ids as string in WAN cache (T366455)
  • 13:25 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap: Backport for Fix tracking parameter casing (T370045) (duration: 12m 30s)
  • 13:25 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.7.0 - ayounsi@cumin1002
  • 13:24 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 13:21 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 13:20 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, joelyrookewmde: Continuing with sync
  • 13:19 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.7.0 - ayounsi@cumin1002
  • 13:18 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 13:16 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, joelyrookewmde: Backport for Fix tracking parameter casing (T370045) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:13 fabfur: running `sudo cumin -b 1 -s300 A:cp-ulsfo 'depool-cdn && sleep 30 && enable-puppet "T370741" && run-puppet-agent && pool-cdn'` (T370741)
  • 13:12 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for Fix tracking parameter casing (T370045)
  • 12:57 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4044.ulsfo.wmnet [reason: pooling after cookbook depooled as puppet was disabled]
  • 12:57 elukey: update debmonitor-server and python3-debmonitor to bookworm-wikimedia - T368744
  • 12:54 sukhe@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=1) Rolling upgrade/restart of Apache Traffic Server on P{cp4044*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-
  • 12:53 sukhe: upgrade cp4044 to ATS 9.2.5: T339134
  • 12:53 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4044*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-drmrs or A:cp-text_
  • 12:50 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 12:50 fabfur: repool cp4037, haproxy configuration modified to exclude benthos logging (T370741)
  • 12:46 cdanis@deploy1003: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 12:44 klausman@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 12:39 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 12:39 fabfur: temporarily depooling cp4037 to test removing all Benthos resources (T370741)
  • 12:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.8 to future netbox prod - ayounsi@cumin1002 - T336275
  • 12:33 cdanis@deploy1003: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 12:30 fabfur: temporarily disabling puppet on cp-ulsfo to test removing benthos from cp4037 (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1057823) (T370741)
  • 12:25 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: Release v4.0.8 to future netbox prod - ayounsi@cumin1002 - T336275
  • 12:22 klausman@deploy1003: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 12:12 dreamyjazz@deploy1003: Finished scap: Backport for Grant checkuser-temporary-account-no-preference to suppress group (T371364) (duration: 08m 57s)
  • 12:11 Dreamy_Jazz: Running `mwscript extensions/MediaModeration/maintenance/updateMetrics.php --wiki=commonswiki --verbose`
  • 12:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67159 and previous config saved to /var/cache/conftool/dbconfig/20240731-120844-root.json
  • 12:07 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 12:07 dreamyjazz@deploy1003: dreamyjazz: Backport for Grant checkuser-temporary-account-no-preference to suppress group (T371364) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:06 akosiaris@cumin1002: conftool action : set/pooled=yes; selector: name=parse2001.codfw.wmnet
  • 12:06 akosiaris@cumin1002: conftool action : set/weight=10; selector: name=parse2001.codfw.wmnet
  • 12:03 dreamyjazz@deploy1003: Started scap sync-world: Backport for Grant checkuser-temporary-account-no-preference to suppress group (T371364)
  • 11:55 klausman@deploy1003: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 11:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67158 and previous config saved to /var/cache/conftool/dbconfig/20240731-115338-root.json
  • 11:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67156 and previous config saved to /var/cache/conftool/dbconfig/20240731-113833-root.json
  • 11:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on dbstore1007.eqiad.wmnet with reason: Long schema change
  • 11:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on dbstore1007.eqiad.wmnet with reason: Long schema change
  • 11:25 akosiaris@cumin1002: conftool action : set/pooled=yes; selector: name=parse1001.eqiad.wmnet
  • 11:25 akosiaris@cumin1002: conftool action : set/weight=10; selector: name=parse1001.eqiad.wmnet
  • 11:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67155 and previous config saved to /var/cache/conftool/dbconfig/20240731-112327-root.json
  • 11:11 urbanecm@deploy1003: Finished scap: Backport for EventStreamConfig: Re-enable mediawiki_eventbus on private wikis (T371433) (duration: 08m 02s)
  • 11:11 claime: Removing /var/lib/puppet/server/ssl/ca/signed/docker-registry.discovery.wmnet.pem on puppetmaster1001
  • 11:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67154 and previous config saved to /var/cache/conftool/dbconfig/20240731-110822-root.json
  • 11:07 klausman@deploy1003: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 11:07 urbanecm@deploy1003: urbanecm: Continuing with sync
  • 11:05 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse2001.codfw.wmnet with OS bullseye
  • 11:05 urbanecm@deploy1003: urbanecm: Backport for EventStreamConfig: Re-enable mediawiki_eventbus on private wikis (T371433) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:03 urbanecm@deploy1003: Started scap sync-world: Backport for EventStreamConfig: Re-enable mediawiki_eventbus on private wikis (T371433)
  • 11:01 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parse1001.eqiad.wmnet with OS bullseye
  • 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67153 and previous config saved to /var/cache/conftool/dbconfig/20240731-105317-root.json
  • 10:46 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse2001.codfw.wmnet with reason: host reimage
  • 10:46 dreamyjazz@deploy1003: Finished scap: Backport for Unblock CI (T371324), Unblock CI (T371324) (duration: 07m 29s)
  • 10:43 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on parse2001.codfw.wmnet with reason: host reimage
  • 10:42 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parse1001.eqiad.wmnet with reason: host reimage
  • 10:41 dreamyjazz@deploy1003: dreamyjazz: Continuing with sync
  • 10:41 dreamyjazz@deploy1003: dreamyjazz: Backport for Unblock CI (T371324), Unblock CI (T371324) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:39 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on parse1001.eqiad.wmnet with reason: host reimage
  • 10:39 dreamyjazz@deploy1003: Started scap sync-world: Backport for Unblock CI (T371324), Unblock CI (T371324)
  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67152 and previous config saved to /var/cache/conftool/dbconfig/20240731-103811-root.json
  • 10:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2218 T371462', diff saved to https://phabricator.wikimedia.org/P67151 and previous config saved to /var/cache/conftool/dbconfig/20240731-103704-marostegui.json
  • 10:35 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2220 to s7 primary T371462', diff saved to https://phabricator.wikimedia.org/P67150 and previous config saved to /var/cache/conftool/dbconfig/20240731-103513-root.json
  • 10:33 marostegui: Starting s7 codfw failover from db2218 to db2220 - T371462
  • 10:26 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host parse2001.codfw.wmnet with OS bullseye
  • 10:25 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host parse1001.eqiad.wmnet with OS bullseye
  • 10:18 akosiaris: revoke docker-registry.discovery.wmnet old certificate from Puppet CA that would expire in a few days. It hasn't been in use since https://gerrit.wikimedia.org/r/c/operations/puppet/+/1018251
  • 10:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T371462
  • 10:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 T371462
  • 10:14 mfossati@deploy1003: Finished deploy [airflow-dags/platform_eng@6ef5a7a]: (no justification provided) (duration: 00m 30s)
  • 10:13 mfossati@deploy1003: Started deploy [airflow-dags/platform_eng@6ef5a7a]: (no justification provided)
  • 09:56 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2220 from API/vslow/dump T371462', diff saved to https://phabricator.wikimedia.org/P67149 and previous config saved to /var/cache/conftool/dbconfig/20240731-095640-root.json
  • 09:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T371462
  • 09:56 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2220 with weight 0 T371462', diff saved to https://phabricator.wikimedia.org/P67148 and previous config saved to /var/cache/conftool/dbconfig/20240731-095609-root.json
  • 09:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 T371462
  • 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repool db2220', diff saved to https://phabricator.wikimedia.org/P67147 and previous config saved to /var/cache/conftool/dbconfig/20240731-095545-marostegui.json
  • 09:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67146 and previous config saved to /var/cache/conftool/dbconfig/20240731-095200-root.json
  • 09:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67145 and previous config saved to /var/cache/conftool/dbconfig/20240731-095050-root.json
  • 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67144 and previous config saved to /var/cache/conftool/dbconfig/20240731-093654-root.json
  • 09:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67143 and previous config saved to /var/cache/conftool/dbconfig/20240731-093545-root.json
  • 09:25 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s4
  • 09:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67142 and previous config saved to /var/cache/conftool/dbconfig/20240731-092149-root.json
  • 09:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2220 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67141 and previous config saved to /var/cache/conftool/dbconfig/20240731-092039-root.json
  • 09:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 09:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'Move db2121 to vslow T371361', diff saved to https://phabricator.wikimedia.org/P67140 and previous config saved to /var/cache/conftool/dbconfig/20240731-091706-root.json
  • 09:14 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2220 T371361', diff saved to https://phabricator.wikimedia.org/P67139 and previous config saved to /var/cache/conftool/dbconfig/20240731-091450-root.json
  • 09:06 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67138 and previous config saved to /var/cache/conftool/dbconfig/20240731-090643-root.json
  • 08:51 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67137 and previous config saved to /var/cache/conftool/dbconfig/20240731-085138-root.json
  • 08:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67136 and previous config saved to /var/cache/conftool/dbconfig/20240731-084705-root.json
  • 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67135 and previous config saved to /var/cache/conftool/dbconfig/20240731-083633-root.json
  • 08:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67134 and previous config saved to /var/cache/conftool/dbconfig/20240731-083159-root.json
  • 08:21 marostegui@cumin1002: dbctl commit (dc=all): 'db2205 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67133 and previous config saved to /var/cache/conftool/dbconfig/20240731-082127-root.json
  • 08:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2205 T371455', diff saved to https://phabricator.wikimedia.org/P67132 and previous config saved to /var/cache/conftool/dbconfig/20240731-081801-root.json
  • 08:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67131 and previous config saved to /var/cache/conftool/dbconfig/20240731-081654-root.json
  • 08:16 marostegui: Starting s3 codfw failover from db2205 to db2209 - T371455
  • 08:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Switchover s3
  • 08:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Switchover s3
  • 08:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67130 and previous config saved to /var/cache/conftool/dbconfig/20240731-080148-root.json
  • 08:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67129 and previous config saved to /var/cache/conftool/dbconfig/20240731-080017-root.json
  • 07:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2222.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 07:49 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2222.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 07:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67128 and previous config saved to /var/cache/conftool/dbconfig/20240731-074643-root.json
  • 07:45 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67127 and previous config saved to /var/cache/conftool/dbconfig/20240731-074512-root.json
  • 07:41 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'clear' for AS: 64049
  • 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'clear' for AS: 64049
  • 07:30 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2221.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67126 and previous config saved to /var/cache/conftool/dbconfig/20240731-073006-root.json
  • 07:21 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db2221.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 07:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s3 T371455
  • 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2209 with weight 0 T371455', diff saved to https://phabricator.wikimedia.org/P67125 and previous config saved to /var/cache/conftool/dbconfig/20240731-071645-root.json
  • 07:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s3 T371455
  • 07:15 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67124 and previous config saved to /var/cache/conftool/dbconfig/20240731-071500-root.json
  • 07:14 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db1179.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 07:01 elukey@cumin1002: START - Cookbook sre.hosts.provision for host db1179.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 06:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67123 and previous config saved to /var/cache/conftool/dbconfig/20240731-065955-root.json
  • 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67122 and previous config saved to /var/cache/conftool/dbconfig/20240731-065341-root.json
  • 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67121 and previous config saved to /var/cache/conftool/dbconfig/20240731-065320-root.json
  • 06:50 slyngs: Upgrading CAS to version 7.0
  • 06:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 06:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1179.eqiad.wmnet with reason: Maintenance
  • 06:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1179 T371132', diff saved to https://phabricator.wikimedia.org/P67120 and previous config saved to /var/cache/conftool/dbconfig/20240731-064752-root.json
  • 06:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67119 and previous config saved to /var/cache/conftool/dbconfig/20240731-064449-root.json
  • 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67118 and previous config saved to /var/cache/conftool/dbconfig/20240731-063835-root.json
  • 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67117 and previous config saved to /var/cache/conftool/dbconfig/20240731-063814-root.json
  • 06:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67116 and previous config saved to /var/cache/conftool/dbconfig/20240731-062330-root.json
  • 06:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67115 and previous config saved to /var/cache/conftool/dbconfig/20240731-062308-root.json
  • 05:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1173 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67112 and previous config saved to /var/cache/conftool/dbconfig/20240731-055645-root.json
  • 05:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67111 and previous config saved to /var/cache/conftool/dbconfig/20240731-055319-root.json
  • 05:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2209 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67110 and previous config saved to /var/cache/conftool/dbconfig/20240731-055256-root.json
  • 05:50 marostegui@cumin1002: dbctl commit (dc=all): 'Make db2127 vslow and remove it as candidate master T371361', diff saved to https://phabricator.wikimedia.org/P67109 and previous config saved to /var/cache/conftool/dbconfig/20240731-055004-marostegui.json
  • 05:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2209.codfw.wmnet with reason: Change binlog format
  • 05:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2209.codfw.wmnet with reason: Change binlog format
  • 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2209 T371361', diff saved to https://phabricator.wikimedia.org/P67108 and previous config saved to /var/cache/conftool/dbconfig/20240731-054653-root.json
  • 05:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T367856)', diff saved to https://phabricator.wikimedia.org/P67107 and previous config saved to /var/cache/conftool/dbconfig/20240731-054414-marostegui.json
  • 05:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 05:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 05:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T367856)', diff saved to https://phabricator.wikimedia.org/P67106 and previous config saved to /var/cache/conftool/dbconfig/20240731-054352-marostegui.json
  • 05:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1173 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67105 and previous config saved to /var/cache/conftool/dbconfig/20240731-054140-root.json
  • 05:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67104 and previous config saved to /var/cache/conftool/dbconfig/20240731-053813-root.json
  • 05:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P67103 and previous config saved to /var/cache/conftool/dbconfig/20240731-052845-marostegui.json
  • 05:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1173 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67102 and previous config saved to /var/cache/conftool/dbconfig/20240731-052634-root.json
  • 05:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1209 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67101 and previous config saved to /var/cache/conftool/dbconfig/20240731-052308-root.json
  • 05:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1209 T371368', diff saved to https://phabricator.wikimedia.org/P67100 and previous config saved to /var/cache/conftool/dbconfig/20240731-052216-marostegui.json
  • 05:21 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1193 to s8 primary and set section read-write T371368', diff saved to https://phabricator.wikimedia.org/P67099 and previous config saved to /var/cache/conftool/dbconfig/20240731-052114-root.json
  • 05:20 marostegui@cumin1002: dbctl commit (dc=all): 'Set s8 eqiad as read-only for maintenance - T371368', diff saved to https://phabricator.wikimedia.org/P67098 and previous config saved to /var/cache/conftool/dbconfig/20240731-052036-root.json
  • 05:20 marostegui: Starting s8 eqiad failover from db1209 to db1193 - T371368
  • 05:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P67097 and previous config saved to /var/cache/conftool/dbconfig/20240731-051339-marostegui.json
  • 05:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1173 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67096 and previous config saved to /var/cache/conftool/dbconfig/20240731-051129-root.json
  • 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T367856)', diff saved to https://phabricator.wikimedia.org/P67095 and previous config saved to /var/cache/conftool/dbconfig/20240731-045832-marostegui.json
  • 04:56 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1193 from API/vslow/dump T371368', diff saved to https://phabricator.wikimedia.org/P67094 and previous config saved to /var/cache/conftool/dbconfig/20240731-045649-root.json
  • 04:56 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1193 with weight 0 T371368', diff saved to https://phabricator.wikimedia.org/P67093 and previous config saved to /var/cache/conftool/dbconfig/20240731-045631-root.json
  • 04:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1173 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67092 and previous config saved to /var/cache/conftool/dbconfig/20240731-045623-root.json
  • 04:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s8 T371368
  • 04:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s8 T371368
  • 04:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1173 T371365', diff saved to https://phabricator.wikimedia.org/P67091 and previous config saved to /var/cache/conftool/dbconfig/20240731-045158-marostegui.json
  • 04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T371365', diff saved to https://phabricator.wikimedia.org/P67089 and previous config saved to /var/cache/conftool/dbconfig/20240731-044954-root.json
  • 04:49 marostegui: Starting s6 eqiad failover from db1173 to db1201 - T371365
  • 04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1201 from API/vslow/dump T371365', diff saved to https://phabricator.wikimedia.org/P67088 and previous config saved to /var/cache/conftool/dbconfig/20240731-043528-marostegui.json
  • 04:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s6 T371365
  • 04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1201 with weight 0 T371365', diff saved to https://phabricator.wikimedia.org/P67087 and previous config saved to /var/cache/conftool/dbconfig/20240731-043459-marostegui.json
  • 04:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s6 T371365
  • 02:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T367856)', diff saved to https://phabricator.wikimedia.org/P67086 and previous config saved to /var/cache/conftool/dbconfig/20240731-022920-marostegui.json
  • 02:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 02:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 00:55 eileen: civicrm upgraded from 4d3d2720 to d1f1d7bd
  • 00:03 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1248.eqiad.wmnet with OS bullseye
  • 00:03 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:02 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"

2024-07-30

  • 23:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1249.eqiad.wmnet with OS bullseye
  • 23:53 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:52 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:50 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=93) for host wikikube-worker1248.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1247.eqiad.wmnet with OS bullseye
  • 23:49 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:48 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:46 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1246.eqiad.wmnet with OS bullseye
  • 23:46 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:45 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1248.eqiad.wmnet with reason: host reimage
  • 23:45 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:44 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1244.eqiad.wmnet with OS bullseye
  • 23:44 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:43 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:42 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1248.eqiad.wmnet with reason: host reimage
  • 23:35 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1249.eqiad.wmnet with reason: host reimage
  • 23:34 tzatziki: removing 1 file for legal compliance
  • 23:32 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1249.eqiad.wmnet with reason: host reimage
  • 23:31 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1247.eqiad.wmnet with reason: host reimage
  • 23:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1246.eqiad.wmnet with reason: host reimage
  • 23:26 tzatziki: removing 1 file for legal compliance
  • 23:26 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1248.eqiad.wmnet with OS bullseye
  • 23:25 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1247.eqiad.wmnet with reason: host reimage
  • 23:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1244.eqiad.wmnet with reason: host reimage
  • 23:23 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1246.eqiad.wmnet with reason: host reimage
  • 23:22 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1244.eqiad.wmnet with reason: host reimage
  • 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1248.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:17 eileen: civicrm upgraded from 3db16342 to 4d3d2720
  • 23:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1241.eqiad.wmnet with OS bullseye
  • 23:15 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:15 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1249.eqiad.wmnet with OS bullseye
  • 23:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1249.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:14 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:13 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1248.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1245.eqiad.wmnet with OS bullseye
  • 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:11 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:09 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1247.eqiad.wmnet with OS bullseye
  • 23:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1243.eqiad.wmnet with OS bullseye
  • 23:09 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:08 tzatziki: removing 2 files for legal compliance
  • 23:07 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:06 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1246.eqiad.wmnet with OS bullseye
  • 23:06 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1242.eqiad.wmnet with OS bullseye
  • 23:06 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:06 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1244.eqiad.wmnet with OS bullseye
  • 23:05 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
  • 23:04 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 22:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1241.eqiad.wmnet with reason: host reimage
  • 22:53 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1245.eqiad.wmnet with reason: host reimage
  • 22:50 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1243.eqiad.wmnet with reason: host reimage
  • 22:49 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1245.eqiad.wmnet with reason: host reimage
  • 22:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1242.eqiad.wmnet with reason: host reimage
  • 22:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1241.eqiad.wmnet with reason: host reimage
  • 22:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1243.eqiad.wmnet with reason: host reimage
  • 22:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1242.eqiad.wmnet with reason: host reimage
  • 22:41 eileen: config revision changed from d2484ce6 to e8cc0ed6
  • 22:35 eileen: config revision changed from 10ead940 to d2484ce6
  • 22:34 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:32 eileen: civicrm upgraded from 5ac353bd to 3db16342
  • 22:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1245.eqiad.wmnet with OS bullseye
  • 22:28 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1242.eqiad.wmnet with OS bullseye
  • 22:28 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1243.eqiad.wmnet with OS bullseye
  • 22:28 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1241.eqiad.wmnet with OS bullseye
  • 21:53 urbanecm@deploy1003: Finished scap: Backport for Fix resource response to use JSON content type header (T263870), Fix resource response to use JSON content type header (T263870) (duration: 08m 09s)
  • 21:45 urbanecm@deploy1003: Started scap sync-world: Backport for Fix resource response to use JSON content type header (T263870), Fix resource response to use JSON content type header (T263870)
  • 21:23 cjming@deploy1003: Finished scap: Backport for Deploy MetricsPlatform to beta cluster (T366234) (duration: 11m 41s)
  • 21:18 cjming@deploy1003: cjming: Continuing with sync
  • 21:14 cjming@deploy1003: cjming: Backport for Deploy MetricsPlatform to beta cluster (T366234) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:11 cjming@deploy1003: Started scap sync-world: Backport for Deploy MetricsPlatform to beta cluster (T366234)
  • 21:06 cjming@deploy1003: Finished scap: Backport for Enable Parsoid Read Views on {en,he}wikivoyage (T365367) (duration: 13m 18s)
  • 21:01 cjming@deploy1003: cjming, cscott: Continuing with sync
  • 20:58 cjming@deploy1003: cjming, cscott: Backport for Enable Parsoid Read Views on {en,he}wikivoyage (T365367) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:52 cjming@deploy1003: Started scap sync-world: Backport for Enable Parsoid Read Views on {en,he}wikivoyage (T365367)
  • 20:48 cjming@deploy1003: Finished scap: Backport for Add NetworkSession extension (T355267) (duration: 45m 08s)
  • 20:40 cjming@deploy1003: ebernhardson, cjming: Continuing with sync
  • 20:38 cjming@deploy1003: ebernhardson, cjming: Backport for Add NetworkSession extension (T355267) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:16 godog: bounce benthos@webrequest_live.service on centrallog for excessive lag
  • 20:06 topranks: re-enable BGP to lvs2011 on lsw1-a2-codfw (restores as primary for traffic) T370891
  • 20:03 cjming@deploy1003: Started scap sync-world: Backport for Add NetworkSession extension (T355267)
  • 19:58 topranks: rebooting lvs2011 to force new network config T370891
  • 19:37 eileen: civicrm upgraded from 5e72c64f to 5ac353bd
  • 19:29 topranks: disable BGP to lvs2011 on lsw1-a2-codfw (moves traffic to lvs2014) in advance of vlan change T370891
  • 19:29 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lvs2011.codfw.wmnet with reason: reconfigure vlans on lvs2011
  • 19:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on lvs2011.codfw.wmnet with reason: reconfigure vlans on lvs2011
  • 19:28 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lsw1-a2-codfw.mgmt with reason: reconfigure vlans on lvs2011
  • 19:28 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on lsw1-a2-codfw.mgmt with reason: reconfigure vlans on lvs2011
  • 19:21 ryankemper@deploy1003: Finished deploy [wdqs/wdqs@316bf7f]: 0.3.145 (duration: 07m 59s)
  • 19:13 ryankemper@deploy1003: Started deploy [wdqs/wdqs@316bf7f]: 0.3.145
  • 18:53 brennen@deploy1003: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.16 refs T366961
  • 18:39 topranks: re-enabling BGP to lvs2012 from lsw1-b2-codfw T370862
  • 18:33 brennen: 1.43.0-wmf.16 train (T366961): blockers resolved, rolling to group0
  • 18:31 brennen@deploy1003: Finished scap: Backport for Bump wikimedia/parsoid to 0.20.0-a16 (T371376 T371126) (duration: 08m 54s)
  • 18:29 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
  • 18:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
  • 18:29 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-b2-codfw.mgmt with reason: reconfigure vlans on lvs2012
  • 18:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-b2-codfw.mgmt with reason: reconfigure vlans on lvs2012
  • 18:27 topranks: rebooting lvs2012 (again) to force new network config T370862
  • 18:26 brennen@deploy1003: brennen, cscott: Continuing with sync
  • 18:25 brennen@deploy1003: brennen, cscott: Backport for Bump wikimedia/parsoid to 0.20.0-a16 (T371376 T371126) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:23 brennen@deploy1003: Started scap sync-world: Backport for Bump wikimedia/parsoid to 0.20.0-a16 (T371376 T371126)
  • 18:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repool db1174', diff saved to https://phabricator.wikimedia.org/P67083 and previous config saved to /var/cache/conftool/dbconfig/20240730-181331-ladsgroup.json
  • 18:13 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
  • 18:13 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
  • 18:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P67082 and previous config saved to /var/cache/conftool/dbconfig/20240730-181242-ladsgroup.json
  • 18:05 Dreamy_Jazz: Stopped MediaModeration scanning script on ruwiki
  • 17:56 topranks: rebooting lvs2012 to force new network config T370862
  • 17:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
  • 17:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
  • 17:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-b2-codfw.mgmt with reason: reconfigure vlans on lvs2012
  • 17:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-b2-codfw.mgmt with reason: reconfigure vlans on lvs2012
  • 17:51 hashar@deploy1003: Finished deploy [gerrit/gerrit@40e4e0f]: wm-pcc: separate v5 and v7 in two runs - T371407 (duration: 00m 09s)
  • 17:50 hashar@deploy1003: Started deploy [gerrit/gerrit@40e4e0f]: wm-pcc: separate v5 and v7 in two runs - T371407
  • 17:20 topranks: disable BGP to PyBal on lvs2012 from lsw1-b2-codfw (moving traffic to lvs2014)
  • 17:18 otto@deploy1003: Finished scap: mediawiki.org - Apache Rewrite /beacon/event -> EventLogging rest handler - T353817 (duration: 05m 56s)
  • 17:18 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-b2-codfw.mgmt with reason: reconfigure vlans on lvs2012
  • 17:18 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-b2-codfw.mgmt with reason: reconfigure vlans on lvs2012
  • 17:17 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
  • 17:17 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs2012.codfw.wmnet with reason: reconfigure vlans on lvs2012
  • 17:13 otto@deploy1003: Started scap sync-world: mediawiki.org - Apache Rewrite /beacon/event -> EventLogging rest handler - T353817
  • 17:12 topranks: adding row C/D vlans to lsw1-b2-codfw and adding on trunk to lvs2012 T370862
  • 16:09 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 16:08 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 16:07 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 16:07 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 16:06 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 16:06 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 15:56 akosiaris: restart pybal for parsoid-php removal on lvs1019, lvs2013 T359387
  • 15:50 jnuche@deploy1003: Installation of scap version "latest" completed for 213 hosts
  • 15:49 jnuche@deploy1003: Installing scap version "latest" for 213 hosts
  • 15:48 jnuche@deploy1003: Installing scap version "latest" for 214 hosts
  • 15:47 jnuche@deploy1003: Installation of scap version "latest" completed for 2 hosts
  • 15:47 jnuche@deploy1003: Installing scap version "latest" for 2 hosts
  • 15:20 akosiaris: restart pybal for parsoid-php removal on lvs1020, lvs2014 T359387
  • 15:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
  • 15:04 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
  • 15:03 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
  • 15:00 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns7001.wikimedia.org [reason: finished upgrading anycast-hc: T370068]
  • 14:59 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc2017.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:58 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
  • 14:51 sukhe: [dns7001] upgrade anycast-healthchecker to 0.9.8-1+wmf12u2: T370068
  • 14:51 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns7001.wikimedia.org [reason: upgrading anycast-hc: T370068]
  • 14:48 mforns@deploy1003: Finished deploy [airflow-dags/analytics@e1fdaac]: (no justification provided) (duration: 00m 26s)
  • 14:47 mforns@deploy1003: Started deploy [airflow-dags/analytics@e1fdaac]: (no justification provided)
  • 14:47 mforns@deploy1003: Finished deploy [airflow-dags/analytics@e1fdaac]: (no justification provided) (duration: 00m 15s)
  • 14:47 mforns@deploy1003: Started deploy [airflow-dags/analytics@e1fdaac]: (no justification provided)
  • 14:45 urbanecm: mwmaint1002: mwscript extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=enwiki --all --verbose (T370802; log kept at mwmaint1002:/home/urbanecm/revalidateLinkRecommendations-T370802-july-2024.log)
  • 14:42 elukey@cumin1002: START - Cookbook sre.hosts.provision for host pc2017.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:42 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1017.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 14:36 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
  • 14:36 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
  • 14:35 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
  • 14:35 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.8 to netbox-next - ayounsi@cumin1002 - T336275
  • 14:33 elukey@cumin1002: START - Cookbook sre.hosts.provision for host pc1017.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 14:33 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1247.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1246.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:26 hnowlan@deploy1003: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 14:25 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1243.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1248.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:25 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:25 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:24 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1248.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1248.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:24 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1249.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:22 hnowlan@deploy1003: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 14:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1241.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1242.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:21 jnuche@deploy1003: Installation of scap version "latest" completed for 2 hosts
  • 14:21 jnuche@deploy1003: Installing scap version "latest" for 2 hosts
  • 14:20 jnuche@deploy1003: Installing scap version "latest" for 3 hosts
  • 14:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1245.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:11 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1247.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:09 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1246.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:07 jclark@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=97)
  • 14:07 jclark@cumin1002: END (ERROR) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=97) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1241-9 - jclark@cumin1002"
  • 14:06 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1243.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:06 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1241-9 - jclark@cumin1002"
  • 14:02 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 13:58 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1241.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:58 marostegui: Remove clouddb1021 from zarcillo database T368518
  • 13:57 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:57 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1241.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1245.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1241.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:56 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1241-9 - jclark@cumin1002"
  • 13:55 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1245.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:55 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1241-9 - jclark@cumin1002"
  • 13:55 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1245.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1241.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1245.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:54 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1245.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:54 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1244.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:54 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1242.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:54 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1241.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:49 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 13:48 urbanecm@deploy1003: Finished scap: Backport for [eswiki] Enable Visual Editor in namespace Project (T370158), [euwiki] Enable Visual Editor in namespaces Project and Wikiproiektu (T368632), Enable VisualEditor at Spanish Wikiquote (T355336) (duration: 16m 12s)
  • 13:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67079 and previous config saved to /var/cache/conftool/dbconfig/20240730-134352-root.json
  • 13:43 urbanecm@deploy1003: urbanecm, gergesshamon: Continuing with sync
  • 13:39 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1240.eqiad.wmnet with OS bullseye
  • 13:39 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 13:34 urbanecm@deploy1003: urbanecm, gergesshamon: Backport for [eswiki] Enable Visual Editor in namespace Project (T370158), [euwiki] Enable Visual Editor in namespaces Project and Wikiproiektu (T368632), Enable VisualEditor at Spanish Wikiquote (T355336) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 13:32 urbanecm@deploy1003: Started scap sync-world: Backport for [eswiki] Enable Visual Editor in namespace Project (T370158), [euwiki] Enable Visual Editor in namespaces Project and Wikiproiektu (T368632), Enable VisualEditor at Spanish Wikiquote (T355336)
  • 13:31 urbanecm@deploy1003: Finished scap: Backport for Update nlwiki AbuseFilter config per consensus (T370605) (duration: 09m 35s)
  • 13:30 elukey: deprecate the sre-admins posix group fleetwide (replaced by ops-limited) - T360356
  • 13:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67078 and previous config saved to /var/cache/conftool/dbconfig/20240730-132846-root.json
  • 13:26 urbanecm@deploy1003: xxblackburnxx, urbanecm: Continuing with sync
  • 13:25 urbanecm@deploy1003: xxblackburnxx, urbanecm: Backport for Update nlwiki AbuseFilter config per consensus (T370605) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:22 urbanecm@deploy1003: Started scap sync-world: Backport for Update nlwiki AbuseFilter config per consensus (T370605)
  • 13:21 urbanecm@deploy1003: Finished scap: Backport for [Growth] hywwiki: Disable Add link backend (T370558), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316) (duration: 22m 31s)
  • 13:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1240.eqiad.wmnet with reason: host reimage
  • 13:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67077 and previous config saved to /var/cache/conftool/dbconfig/20240730-131341-root.json
  • 13:13 Dreamy_Jazz: ruwiki scan is set to time out after 5 hours
  • 13:13 Dreamy_Jazz: Started MediaModeration scan on ruwiki to catch up on the monthly limit
  • 13:12 Dreamy_Jazz: Started MediaModeration script after it crashed - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 13:12 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1240.eqiad.wmnet with reason: host reimage
  • 13:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67076 and previous config saved to /var/cache/conftool/dbconfig/20240730-131223-root.json
  • 12:58 urbanecm@deploy1003: Started scap sync-world: Backport for [Growth] hywwiki: Disable Add link backend (T370558), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316)
  • 12:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67074 and previous config saved to /var/cache/conftool/dbconfig/20240730-125836-root.json
  • 12:57 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67073 and previous config saved to /var/cache/conftool/dbconfig/20240730-125717-root.json
  • 12:56 jnuche@deploy1003: Installation of scap version "latest" completed for 2 hosts
  • 12:56 jnuche@deploy1003: Installing scap version "latest" for 2 hosts
  • 12:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1240.eqiad.wmnet with OS bullseye
  • 12:55 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1240.eqiad.wmnet with OS bullseye
  • 12:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67072 and previous config saved to /var/cache/conftool/dbconfig/20240730-124330-root.json
  • 12:42 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67071 and previous config saved to /var/cache/conftool/dbconfig/20240730-124212-root.json
  • 12:41 urbanecm: mwdebug1001: scap pull to overcome scap issues
  • 12:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67070 and previous config saved to /var/cache/conftool/dbconfig/20240730-122825-root.json
  • 12:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67069 and previous config saved to /var/cache/conftool/dbconfig/20240730-122706-root.json
  • 12:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1193.eqiad.wmnet with reason: Change binlog format
  • 12:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1193.eqiad.wmnet with reason: Change binlog format
  • 12:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1193 T371361', diff saved to https://phabricator.wikimedia.org/P67068 and previous config saved to /var/cache/conftool/dbconfig/20240730-122243-root.json
  • 12:21 JustHannah: T371253 Ran mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=dewiktionary --logwiki=metawiki 'Gregorjohannes' 'Klegul'
  • 12:17 urbanecm@deploy1003: Started scap sync-world: Backport for [Growth] hywwiki: Disable Add link backend (T370558), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316)
  • 12:16 urbanecm@deploy1003: sync-world aborted: Backport for [Growth] hywwiki: Disable Add link backend (T370558), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316) (duration: 14m 10s)
  • 12:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1231 T371361', diff saved to https://phabricator.wikimedia.org/P67066 and previous config saved to /var/cache/conftool/dbconfig/20240730-121500-root.json
  • 12:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67065 and previous config saved to /var/cache/conftool/dbconfig/20240730-121201-root.json
  • 12:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1201.eqiad.wmnet with reason: Change binlog format
  • 12:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1201.eqiad.wmnet with reason: Change binlog format
  • 12:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1201 T371361', diff saved to https://phabricator.wikimedia.org/P67064 and previous config saved to /var/cache/conftool/dbconfig/20240730-120805-root.json
  • 12:02 urbanecm@deploy1003: Started scap sync-world: Backport for [Growth] hywwiki: Disable Add link backend (T370558), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316), refreshLinkRecommendations: Work even when link-recommendation is disabled (T371316)
  • 11:54 hnowlan@deploy1003: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 11:52 hnowlan@deploy1003: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 11:47 hnowlan@deploy1003: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 11:47 hnowlan@deploy1003: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67062 and previous config saved to /var/cache/conftool/dbconfig/20240730-111622-root.json
  • 11:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67061 and previous config saved to /var/cache/conftool/dbconfig/20240730-111331-root.json
  • 11:10 kevinbazira@deploy1003: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
  • 11:03 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 11:03 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 11:02 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:02 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 11:02 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:02 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 11:02 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 11:01 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 11:01 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:01 kevinbazira@deploy1003: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
  • 11:01 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 11:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67060 and previous config saved to /var/cache/conftool/dbconfig/20240730-110117-root.json
  • 11:00 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:00 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 11:00 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:59 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 10:58 klausman@deploy1003: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67059 and previous config saved to /var/cache/conftool/dbconfig/20240730-105825-root.json
  • 10:56 klausman@deploy1003: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 10:55 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:55 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:55 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:55 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:55 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:54 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:54 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:54 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:53 klausman@deploy1003: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:51 klausman@deploy1003: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:50 volans@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2227.codfw.wmnet with OS bookworm
  • 10:50 volans@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - volans@cumin2002"
  • 10:49 volans@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - volans@cumin2002"
  • 10:47 marostegui@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67058 and previous config saved to /var/cache/conftool/dbconfig/20240730-104705-root.json
  • 10:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67057 and previous config saved to /var/cache/conftool/dbconfig/20240730-104612-root.json
  • 10:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67056 and previous config saved to /var/cache/conftool/dbconfig/20240730-104318-root.json
  • 10:33 kevinbazira@deploy1003: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'logo-detection' for release 'main' .
  • 10:32 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2227.codfw.wmnet with reason: host reimage
  • 10:32 marostegui@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67054 and previous config saved to /var/cache/conftool/dbconfig/20240730-103200-root.json
  • 10:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67053 and previous config saved to /var/cache/conftool/dbconfig/20240730-103106-root.json
  • 10:29 volans@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2227.codfw.wmnet with reason: host reimage
  • 10:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67052 and previous config saved to /var/cache/conftool/dbconfig/20240730-102813-root.json
  • 10:21 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:20 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 10:20 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:20 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 10:20 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:20 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 10:20 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:20 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 10:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67051 and previous config saved to /var/cache/conftool/dbconfig/20240730-101654-root.json
  • 10:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67050 and previous config saved to /var/cache/conftool/dbconfig/20240730-101600-root.json
  • 10:14 volans@cumin2002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
  • 10:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67049 and previous config saved to /var/cache/conftool/dbconfig/20240730-101307-root.json
  • 10:08 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
  • 10:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67048 and previous config saved to /var/cache/conftool/dbconfig/20240730-100148-root.json
  • 10:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67047 and previous config saved to /var/cache/conftool/dbconfig/20240730-100055-root.json
  • 09:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67046 and previous config saved to /var/cache/conftool/dbconfig/20240730-095802-root.json
  • 09:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67045 and previous config saved to /var/cache/conftool/dbconfig/20240730-094643-root.json
  • 09:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1179 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67044 and previous config saved to /var/cache/conftool/dbconfig/20240730-094549-root.json
  • 09:42 marostegui@cumin1002: dbctl commit (dc=all): 'db1224 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67043 and previous config saved to /var/cache/conftool/dbconfig/20240730-094256-root.json
  • 09:42 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1179.eqiad.wmnet onto db1224.eqiad.wmnet
  • 09:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2216 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67042 and previous config saved to /var/cache/conftool/dbconfig/20240730-093138-root.json
  • 09:29 marostegui: Deploy schema change on db2203 s1 codfw dbmaint T367856
  • 09:26 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
  • 09:26 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
  • 09:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2203.codfw.wmnet with reason: Long schema change
  • 09:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2203.codfw.wmnet with reason: Long schema change
  • 09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2203 T371345', diff saved to https://phabricator.wikimedia.org/P67041 and previous config saved to /var/cache/conftool/dbconfig/20240730-091925-marostegui.json
  • 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2212 to s1 primary T371345', diff saved to https://phabricator.wikimedia.org/P67040 and previous config saved to /var/cache/conftool/dbconfig/20240730-091742-root.json
  • 09:10 marostegui: Starting s1 codfw failover from db2203 to db2212 - T371345
  • 08:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 08:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 08:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67039 and previous config saved to /var/cache/conftool/dbconfig/20240730-084525-root.json
  • 08:32 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db2216.codfw.wmnet onto db2212.codfw.wmnet
  • 08:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67038 and previous config saved to /var/cache/conftool/dbconfig/20240730-083020-root.json
  • 08:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T371345
  • 08:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T371345
  • 08:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67037 and previous config saved to /var/cache/conftool/dbconfig/20240730-081515-root.json
  • 08:11 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host deploy2002.codfw.wmnet with OS bullseye
  • 08:11 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
  • 08:06 marostegui: Update db1224 on zarcillo T371276
  • 08:06 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1179.eqiad.wmnet onto db1224.eqiad.wmnet
  • 08:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1179.eqiad.wmnet with reason: Move db1224 to x1
  • 08:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on db1179.eqiad.wmnet with reason: Move db1224 to x1
  • 08:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1179 T371276', diff saved to https://phabricator.wikimedia.org/P67035 and previous config saved to /var/cache/conftool/dbconfig/20240730-080538-root.json
  • 08:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1224.eqiad.wmnet with reason: Move db1224 to x1
  • 08:05 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
  • 08:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on db1224.eqiad.wmnet with reason: Move db1224 to x1
  • 08:03 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
  • 08:02 elukey@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
  • 08:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67034 and previous config saved to /var/cache/conftool/dbconfig/20240730-080135-root.json
  • 08:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67033 and previous config saved to /var/cache/conftool/dbconfig/20240730-080010-root.json
  • 07:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67032 and previous config saved to /var/cache/conftool/dbconfig/20240730-074629-root.json
  • 07:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67031 and previous config saved to /var/cache/conftool/dbconfig/20240730-074505-root.json
  • 07:33 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy2002.codfw.wmnet with reason: host reimage
  • 07:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67030 and previous config saved to /var/cache/conftool/dbconfig/20240730-073124-root.json
  • 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P67029 and previous config saved to /var/cache/conftool/dbconfig/20240730-072959-root.json
  • 07:28 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on deploy2002.codfw.wmnet with reason: host reimage
  • 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67028 and previous config saved to /var/cache/conftool/dbconfig/20240730-071619-root.json
  • 07:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1244 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P67027 and previous config saved to /var/cache/conftool/dbconfig/20240730-071454-root.json
  • 07:14 godog: finish rolling out benthos 4.27.0-1
  • 07:10 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host deploy2002.codfw.wmnet with OS bullseye
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1238 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67026 and previous config saved to /var/cache/conftool/dbconfig/20240730-070114-root.json
  • 06:58 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1244.eqiad.wmnet onto db1238.eqiad.wmnet
  • 06:56 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db2216.codfw.wmnet onto db2212.codfw.wmnet
  • 06:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2216', diff saved to https://phabricator.wikimedia.org/P67025 and previous config saved to /var/cache/conftool/dbconfig/20240730-064853-root.json
  • 06:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2212', diff saved to https://phabricator.wikimedia.org/P67024 and previous config saved to /var/cache/conftool/dbconfig/20240730-064835-root.json
  • 06:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T371345
  • 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2212 with weight 0 T371345', diff saved to https://phabricator.wikimedia.org/P67023 and previous config saved to /var/cache/conftool/dbconfig/20240730-064128-marostegui.json
  • 06:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T371345
  • 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67022 and previous config saved to /var/cache/conftool/dbconfig/20240730-052420-root.json
  • 05:20 marostegui: Change candidate master in s4 eqiad (this is a NOOP) T371343
  • 05:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67021 and previous config saved to /var/cache/conftool/dbconfig/20240730-050914-root.json
  • 05:04 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1244.eqiad.wmnet onto db1238.eqiad.wmnet
  • 04:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Recloning db1238
  • 04:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Recloning db1238
  • 04:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Long schema change
  • 04:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Long schema change
  • 04:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67020 and previous config saved to /var/cache/conftool/dbconfig/20240730-045409-root.json
  • 04:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1238 T371251', diff saved to https://phabricator.wikimedia.org/P67019 and previous config saved to /var/cache/conftool/dbconfig/20240730-045336-marostegui.json
  • 04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1160 to s4 primary and set section read-write T371251', diff saved to https://phabricator.wikimedia.org/P67018 and previous config saved to /var/cache/conftool/dbconfig/20240730-045104-marostegui.json
  • 04:50 marostegui@cumin1002: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T371251', diff saved to https://phabricator.wikimedia.org/P67017 and previous config saved to /var/cache/conftool/dbconfig/20240730-045032-root.json
  • 04:50 marostegui: Starting s4 eqiad failover from db1238 to db1160 - T371251
  • 04:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67016 and previous config saved to /var/cache/conftool/dbconfig/20240730-043904-root.json
  • 04:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1163 (T367856)', diff saved to https://phabricator.wikimedia.org/P67015 and previous config saved to /var/cache/conftool/dbconfig/20240730-042755-marostegui.json
  • 04:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 04:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 04:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s4 T371251
  • 04:25 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1160 with weight 0 T371251', diff saved to https://phabricator.wikimedia.org/P67014 and previous config saved to /var/cache/conftool/dbconfig/20240730-042528-root.json
  • 04:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s4 T371251
  • 04:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2179 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67013 and previous config saved to /var/cache/conftool/dbconfig/20240730-042358-root.json
  • 04:07 mwpresync@deploy1003: Pruned MediaWiki: 1.43.0-wmf.13 (duration: 06m 51s)
  • 03:02 mwpresync@deploy1003: Started scap sync-world: testwikis to 1.43.0-wmf.16 refs T366961
  • 02:52 eileen: disabled audit modules (Adyen audit etc)
  • 02:09 eileen: civicrm upgraded from 2837c4e9 to 5e72c64f
  • 02:05 eileen: config revision changed from 8e2f7c03 to 10ead940
  • 01:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T367856)', diff saved to https://phabricator.wikimedia.org/P67011 and previous config saved to /var/cache/conftool/dbconfig/20240730-010232-marostegui.json
  • 00:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P67010 and previous config saved to /var/cache/conftool/dbconfig/20240730-004725-marostegui.json
  • 00:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P67009 and previous config saved to /var/cache/conftool/dbconfig/20240730-003218-marostegui.json
  • 00:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T367856)', diff saved to https://phabricator.wikimedia.org/P67008 and previous config saved to /var/cache/conftool/dbconfig/20240730-001710-marostegui.json

2024-07-29

  • 23:19 eileen: civicrm upgraded from efbb874e to 2837c4e9
  • 22:19 eileen: civicrm upgraded from 1dc4f944 to efbb874e
  • 21:42 dwisehaupt@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:42 dwisehaupt@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1002"
  • 21:41 dwisehaupt@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: * - dwisehaupt@cumin1002"
  • 21:38 dwisehaupt@cumin1002: START - Cookbook sre.dns.netbox
  • 21:09 cjming: end of UTC late backport window
  • 21:06 cjming@deploy1003: Finished scap: Backport for Produce a limited set of event streams on private wikis (pt 2) (T346046) (duration: 10m 40s)
  • 21:00 cjming@deploy1003: ebernhardson, cjming: Continuing with sync
  • 21:00 cjming@deploy1003: ebernhardson, cjming: Backport for Produce a limited set of event streams on private wikis (pt 2) (T346046) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:55 cjming@deploy1003: Started scap sync-world: Backport for Produce a limited set of event streams on private wikis (pt 2) (T346046)
  • 20:52 cjming@deploy1003: Finished scap: Backport for Clean up night mode exclude namespaces and allow font size on submit (T370092 T370505) (duration: 08m 18s)
  • 20:48 ebernhardson@deploy1003: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 20:48 ebernhardson@deploy1003: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 20:46 cjming@deploy1003: cjming, jdlrobson: Continuing with sync
  • 20:45 cjming@deploy1003: cjming, jdlrobson: Backport for Clean up night mode exclude namespaces and allow font size on submit (T370092 T370505) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:45 ebernhardson@deploy1003: helmfile [codfw] DONE helmfile.d/services/eventgate-main: sync
  • 20:45 ebernhardson@deploy1003: helmfile [codfw] START helmfile.d/services/eventgate-main: sync
  • 20:43 cjming@deploy1003: Started scap sync-world: Backport for Clean up night mode exclude namespaces and allow font size on submit (T370092 T370505)
  • 20:42 cjming@deploy1003: Finished scap: Backport for Produce a limited set of event streams on private wikis (pt 1) (T346046) (duration: 07m 30s)
  • 20:37 cjming@deploy1003: ebernhardson, cjming: Continuing with sync
  • 20:36 cjming@deploy1003: ebernhardson, cjming: Backport for Produce a limited set of event streams on private wikis (pt 1) (T346046) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:34 cjming@deploy1003: Started scap sync-world: Backport for Produce a limited set of event streams on private wikis (pt 1) (T346046)
  • 20:33 cjming@deploy1003: Finished scap: Backport for enwiki, commonswiki: lift IP cap for edit-a-thon (T371026) (duration: 07m 59s)
  • 20:27 cjming@deploy1003: superzerocool, cjming: Continuing with sync
  • 20:27 cjming@deploy1003: superzerocool, cjming: Backport for enwiki, commonswiki: lift IP cap for edit-a-thon (T371026) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:25 cjming@deploy1003: Started scap sync-world: Backport for enwiki, commonswiki: lift IP cap for edit-a-thon (T371026)
  • 20:19 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2227.codfw.wmnet with OS bookworm
  • 20:15 cjming@deploy1003: Finished scap: Backport for Increase edit count requirement for autoconfirmed on English Wikivoyage (T371186) (duration: 08m 52s)
  • 20:10 cjming@deploy1003: nmw03, cjming: Continuing with sync
  • 20:08 cjming@deploy1003: nmw03, cjming: Backport for Increase edit count requirement for autoconfirmed on English Wikivoyage (T371186) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:06 cjming@deploy1003: Started scap sync-world: Backport for Increase edit count requirement for autoconfirmed on English Wikivoyage (T371186)
  • 18:59 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
  • 18:58 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2227.codfw.wmnet with OS bookworm
  • 17:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
  • 17:51 urbanecm: mwmaint1002: kill extensions/GrowthExperiments/maintenance/refreshLinkRecommendations.php for enwiki (T370802)
  • 17:50 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2227.codfw.wmnet with OS bookworm
  • 17:26 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet [reason: testing ATS 9.2.5 upgrade]
  • 17:25 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
  • 17:24 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2227.codfw.wmnet with OS bookworm
  • 17:17 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4052*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-
  • 17:14 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4052*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-drmrs or A:cp-text_
  • 17:14 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.5-1wm2_amd64.changes T339134
  • 16:47 urbanecm@deploy1003: Finished scap: Backport for Display a GlobalBlock link to stewards in Special:CheckUser (T370463 T178571), Ignore help-links with no title configured (T370941) (duration: 10m 56s)
  • 16:45 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
  • 16:44 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2227.codfw.wmnet with OS bookworm
  • 16:42 urbanecm@deploy1003: dreamyjazz, migr, urbanecm: Continuing with sync
  • 16:38 urbanecm@deploy1003: dreamyjazz, migr, urbanecm: Backport for Display a GlobalBlock link to stewards in Special:CheckUser (T370463 T178571), Ignore help-links with no title configured (T370941) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host gerrit2003.wikimedia.org with OS bookworm
  • 16:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:36 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 16:36 urbanecm@deploy1003: Started scap sync-world: Backport for Display a GlobalBlock link to stewards in Special:CheckUser (T370463 T178571), Ignore help-links with no title configured (T370941)
  • 16:30 Emperor: restart swift-proxy on ms-fe2011 T360913
  • 16:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on gerrit2003.wikimedia.org with reason: host reimage
  • 16:17 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet [reason: testing ATS 9.2.5 upgrade]
  • 16:15 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on gerrit2003.wikimedia.org with reason: host reimage
  • 16:04 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-ats (exit_code=0) Rolling upgrade/restart of Apache Traffic Server on P{cp4052*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-
  • 16:01 sukhe@cumin1002: START - Cookbook sre.cdn.roll-upgrade-ats Rolling upgrade/restart of Apache Traffic Server on P{cp4052*} and (A:cp-eqiad or A:cp-text_eqiad or A:cp-upload_eqiad or A:cp-codfw or A:cp-text_codfw or A:cp-upload_codfw or A:cp-esams or A:cp-text_esams or A:cp-upload_esams or A:cp-ulsfo or A:cp-text_ulsfo or A:cp-upload_ulsfo or A:cp-eqsin or A:cp-text_eqsin or A:cp-upload_eqsin or A:cp-drmrs or A:cp-text_
  • 15:57 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit2003.wikimedia.org with OS bookworm
  • 15:56 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:56 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add public vlan for gerrit2003 - pt1979@cumin2002"
  • 15:56 sukhe: reprepro -C main include bullseye-wikimedia trafficserver_9.2.5-1wm1_amd64.changes T339134
  • 15:55 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add public vlan for gerrit2003 - pt1979@cumin2002"
  • 15:55 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:54 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 15:53 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:49 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:48 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 15:47 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host gerrit2003.codfw.wmnet with OS bookworm
  • 15:41 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit2003.codfw.wmnet with OS bookworm
  • 15:40 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host gerrit2003.codfw.wmnet with OS bookworm
  • 15:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2233.codfw.wmnet with OS bookworm
  • 15:33 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 15:23 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:23 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 15:18 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1240.eqiad.wmnet with OS bullseye
  • 15:18 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:17 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 15:16 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:16 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 15:14 sukhe: running authdns-update after dns2006 depool
  • 15:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2233.codfw.wmnet with reason: host reimage
  • 15:13 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns2006.wikimedia.org [reason: finished upgrading anycast-hc: T370068]
  • 15:10 sukhe: [dns2006] upgrade anycast-healthchecker to 0.9.8-1+wmf12u2: T370068
  • 15:10 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2233.codfw.wmnet with reason: host reimage
  • 15:09 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns2006.wikimedia.org [reason: upgrading anycast-hc: T370068]
  • 15:02 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
  • 14:59 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:58 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host gerrit2003.codfw.wmnet with OS bookworm
  • 14:58 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2227.codfw.wmnet with OS bookworm
  • 14:57 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 14:56 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2233.codfw.wmnet with OS bookworm
  • 14:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2233.codfw.wmnet with OS bookworm
  • 14:45 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:41 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap: Backport for Revert "TranslatablePage: Split translatable page id cache into multiple shards" (T366455) (duration: 07m 58s)
  • 14:39 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:37 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 14:35 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde: Continuing with sync
  • 14:35 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde: Backport for Revert "TranslatablePage: Split translatable page id cache into multiple shards" (T366455) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:34 sukhe: sudo cumin -b1 -s120 'O:wikidough' 'run-puppet-agent'
  • 14:33 sukhe: A:wikidough: debdeploy upgrade anycast-hc to 0.9.8: T370068
  • 14:33 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for Revert "TranslatablePage: Split translatable page id cache into multiple shards" (T366455)
  • 14:33 sukhe: A:wikidough: debdeploy upgrade anycast-hc to 0.9.8
  • 14:24 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2229.codfw.wmnet with OS bookworm
  • 14:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 14:24 herron: the grafana default datasource has been changed from graphite to thanos T269333
  • 14:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 14:23 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2231.codfw.wmnet with OS bookworm
  • 14:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 14:21 logmsgbot: lucaswerkmeister-wmde@deploy1003 Finished scap: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455) (duration: 19m 24s)
  • 14:21 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 14:20 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:20 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 14:19 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 14:19 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2230.codfw.wmnet with OS bookworm
  • 14:19 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 14:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 14:15 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2232.codfw.wmnet with OS bookworm
  • 14:15 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 14:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
  • 14:13 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, abi: Continuing with sync
  • 14:13 logmsgbot: lucaswerkmeister-wmde@deploy1003 lucaswerkmeister-wmde, abi: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2228.codfw.wmnet with OS bookworm
  • 14:13 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 14:12 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 14:09 SandraEbele_: rerunning airflow mediawiki_history_check_denormalize dag as down stream task after rerunning mediawiki_history_denormalize dag
  • 14:07 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2039.codfw.wmnet),cluster=kubernetes,service=kubesvc [reason: Pooling and uncordoning - T351074]
  • 14:06 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 14:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2231.codfw.wmnet with reason: host reimage
  • 14:02 logmsgbot: lucaswerkmeister-wmde@deploy1003 Started scap sync-world: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455)
  • 14:01 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1240.eqiad.wmnet with OS bullseye
  • 14:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:01 jnuche@deploy1003: Installation of scap version "4.94.0" completed for 210 hosts
  • 14:00 jnuche@deploy1003: Installing scap version "4.94.0" for 210 hosts
  • 13:59 jnuche@deploy1003: Installing scap version "4.94.0" for 211 hosts
  • 13:59 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2229.codfw.wmnet with reason: host reimage
  • 13:56 claime: homer 'cr*codfw*' commit 'T351074'
  • 13:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2230.codfw.wmnet with reason: host reimage
  • 13:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2232.codfw.wmnet with reason: host reimage
  • 13:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2228.codfw.wmnet with reason: host reimage
  • 13:49 jclark@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker1240.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2231.codfw.wmnet with reason: host reimage
  • 13:48 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2229.codfw.wmnet with reason: host reimage
  • 13:48 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:48 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1240 - jclark@cumin1002"
  • 13:47 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2232.codfw.wmnet with reason: host reimage
  • 13:47 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2230.codfw.wmnet with reason: host reimage
  • 13:47 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt wikikube-worker1240 - jclark@cumin1002"
  • 13:47 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2228.codfw.wmnet with reason: host reimage
  • 13:46 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2227.codfw.wmnet with OS bookworm
  • 13:45 logmsgbot: lucaswerkmeister-wmde@deploy1003 Synchronized php-1.43.0-wmf.15/extensions/ContentTranslation/extension.json: Backport for AX: Unregister "axArticleFooterEntrypointRegistrar" hook handler (T363338) (duration: 06m 36s)
  • 13:44 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 13:41 XioNoX: push new pfw policies - T371137
  • 13:36 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafkamon2003.codfw.wmnet
  • 13:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2233.codfw.wmnet with OS bookworm
  • 13:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2232.codfw.wmnet with OS bookworm
  • 13:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2231.codfw.wmnet with OS bookworm
  • 13:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2230.codfw.wmnet with OS bookworm
  • 13:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2229.codfw.wmnet with OS bookworm
  • 13:33 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2228.codfw.wmnet with OS bookworm
  • 13:33 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafkamon2003.codfw.wmnet
  • 13:30 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2222.codfw.wmnet with OS bookworm
  • 13:30 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 13:29 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 13:24 logmsgbot: lucaswerkmeister-wmde@deploy1003 Synchronized wmf-config/: Backport for Enable mul language code on Wikidata (limited mode) (T330281) (duration: 06m 47s)
  • 13:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2225.codfw.wmnet with OS bookworm
  • 13:13 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2222.codfw.wmnet with reason: host reimage
  • 13:11 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2222.codfw.wmnet with reason: host reimage
  • 13:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2226.codfw.wmnet with OS bookworm
  • 13:10 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 13:07 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2225.codfw.wmnet with reason: host reimage
  • 13:05 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2225.codfw.wmnet with reason: host reimage
  • 13:03 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 13:02 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2225.codfw.wmnet with OS bookworm
  • 13:02 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2223.codfw.wmnet with OS bookworm
  • 13:01 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 13:00 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 12:59 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host db2225.codfw.wmnet with OS bookworm
  • 12:58 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 12:57 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 12:57 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2224.codfw.wmnet with OS bookworm
  • 12:57 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 12:57 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS bookworm
  • 12:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 12:55 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2221.codfw.wmnet with OS bookworm
  • 12:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 12:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2039.codfw.wmnet with OS bullseye
  • 12:51 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 12:48 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2226.codfw.wmnet with reason: host reimage
  • 12:47 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:46 godog: upgrade and roll-restart benthos@mw_accesslog_sampler on logstash hosts
  • 12:45 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host db2222.codfw.wmnet with OS bookworm
  • 12:43 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2223.codfw.wmnet with reason: host reimage
  • 12:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2225.codfw.wmnet with reason: host reimage
  • 12:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2224.codfw.wmnet with reason: host reimage
  • 12:35 godog: test benthos 4.27 on logstash1023
  • 12:35 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2221.codfw.wmnet with reason: host reimage
  • 12:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2225.codfw.wmnet with reason: host reimage
  • 12:34 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2224.codfw.wmnet with reason: host reimage
  • 12:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2223.codfw.wmnet with reason: host reimage
  • 12:33 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2226.codfw.wmnet with reason: host reimage
  • 12:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2039.codfw.wmnet with reason: host reimage
  • 12:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2221.codfw.wmnet with reason: host reimage
  • 12:27 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2039.codfw.wmnet with reason: host reimage
  • 12:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
  • 12:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2226.codfw.wmnet with OS bookworm
  • 12:18 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2225.codfw.wmnet with OS bookworm
  • 12:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2224.codfw.wmnet with OS bookworm
  • 12:17 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2223.codfw.wmnet with OS bookworm
  • 12:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2222.codfw.wmnet with OS bookworm
  • 12:14 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2221.codfw.wmnet with OS bookworm
  • 12:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2039.codfw.wmnet with OS bullseye
  • 12:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2441 to wikikube-worker2039
  • 12:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2039
  • 12:06 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:02 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
  • 12:02 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
  • 12:01 klausman@deploy1003: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:59 klausman@deploy1003: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 11:51 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2039
  • 11:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2441 to wikikube-worker2039 - cgoubert@cumin1002"
  • 11:49 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2441 to wikikube-worker2039 - cgoubert@cumin1002"
  • 11:46 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 11:46 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2441 to wikikube-worker2039
  • 11:26 akosiaris@deploy1003: Finished scap: check the deployment server after switchover (duration: 32m 28s)
  • 11:14 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P67004 and previous config saved to /var/cache/conftool/dbconfig/20240729-111410-root.json
  • 10:59 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P67003 and previous config saved to /var/cache/conftool/dbconfig/20240729-105904-root.json
  • 10:54 akosiaris@deploy1003: Started scap sync-world: check the deployment server after switchover
  • 10:43 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P67002 and previous config saved to /var/cache/conftool/dbconfig/20240729-104358-root.json
  • 10:28 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P67001 and previous config saved to /var/cache/conftool/dbconfig/20240729-102853-root.json
  • 10:20 marostegui: Deploy schema change on s7 eqiad master with replication dbmaint T370394
  • 10:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2441.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 10:13 marostegui@cumin1002: dbctl commit (dc=all): 'es1032 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P67000 and previous config saved to /var/cache/conftool/dbconfig/20240729-101348-root.json
  • 10:12 godog: bounce benthos@mw_accesslog_sampler on logstash collectors
  • 10:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1032.eqiad.wmnet with reason: Long schema change
  • 10:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1032.eqiad.wmnet with reason: Long schema change
  • 10:07 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2441.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 09:31 Dreamy_Jazz: Restarted MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 09:27 dcausse@deploy1002: Finished deploy [airflow-dags/search@7da1ef0]: search: process_sparql_query workaround oom issues (duration: 00m 20s)
  • 09:27 dcausse@deploy1002: Started deploy [airflow-dags/search@7da1ef0]: search: process_sparql_query workaround oom issues
  • 09:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1032 investigate access denied errors', diff saved to https://phabricator.wikimedia.org/P66999 and previous config saved to /var/cache/conftool/dbconfig/20240729-092239-root.json
  • 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T367856)', diff saved to https://phabricator.wikimedia.org/P66998 and previous config saved to /var/cache/conftool/dbconfig/20240729-091658-marostegui.json
  • 09:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 09:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 09:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T367856)', diff saved to https://phabricator.wikimedia.org/P66997 and previous config saved to /var/cache/conftool/dbconfig/20240729-091637-marostegui.json
  • 09:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repool 25% of es1032', diff saved to https://phabricator.wikimedia.org/P66996 and previous config saved to /var/cache/conftool/dbconfig/20240729-090953-marostegui.json
  • 09:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1032.eqiad.wmnet with reason: Long schema change
  • 09:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1032.eqiad.wmnet with reason: Long schema change
  • 09:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depool es1032 investigate access denied errors', diff saved to https://phabricator.wikimedia.org/P66995 and previous config saved to /var/cache/conftool/dbconfig/20240729-090730-root.json
  • 09:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P66994 and previous config saved to /var/cache/conftool/dbconfig/20240729-090129-marostegui.json
  • 08:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P66992 and previous config saved to /var/cache/conftool/dbconfig/20240729-084622-marostegui.json
  • 08:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T367856)', diff saved to https://phabricator.wikimedia.org/P66991 and previous config saved to /var/cache/conftool/dbconfig/20240729-083115-marostegui.json
  • 07:54 dcausse: closing the backport window
  • 07:53 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 24482
  • 07:51 dcausse@deploy1002: Finished scap: Backport for GeoData: add pool counter settings (T370621) (duration: 11m 36s)
  • 07:47 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts karapace1001.eqiad.wmnet
  • 07:47 brouberol@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:47 brouberol@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: karapace1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1002"
  • 07:46 brouberol@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: karapace1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1002"
  • 07:46 dcausse@deploy1002: dcausse: Continuing with sync
  • 07:42 dcausse@deploy1002: dcausse: Backport for GeoData: add pool counter settings (T370621) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:41 brouberol@cumin1002: START - Cookbook sre.dns.netbox
  • 07:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 24482
  • 07:39 dcausse@deploy1002: Started scap sync-world: Backport for GeoData: add pool counter settings (T370621)
  • 07:39 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 24482
  • 07:37 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 24482
  • 07:34 kartik@deploy1002: Finished scap: Backport for Temporary disable MinT for Wikireaders for bn, fa, hi, and ko (duration: 14m 42s)
  • 07:34 brouberol@cumin1002: START - Cookbook sre.hosts.decommission for hosts karapace1001.eqiad.wmnet
  • 07:34 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts karapace1002.eqiad.wmnet
  • 07:34 brouberol@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:34 brouberol@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: karapace1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1002"
  • 07:32 brouberol@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: karapace1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - brouberol@cumin1002"
  • 07:29 brouberol@cumin1002: START - Cookbook sre.dns.netbox
  • 07:25 kartik@deploy1002: kartik: Continuing with sync
  • 07:25 kartik@deploy1002: kartik: Backport for Temporary disable MinT for Wikireaders for bn, fa, hi, and ko synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:25 brouberol@cumin1002: START - Cookbook sre.hosts.decommission for hosts karapace1002.eqiad.wmnet
  • 07:19 kartik@deploy1002: Started scap sync-world: Backport for Temporary disable MinT for Wikireaders for bn, fa, hi, and ko
  • 07:19 kartik@deploy1002: Sync cancelled.
  • 07:19 kartik@deploy1002: kartik: Backport for Temporary disable MinT for Wikireaders for bn, fa, hi, and ko synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:03 kartik@deploy1002: Started scap sync-world: Backport for Temporary disable MinT for Wikireaders for bn, fa, hi, and ko
  • 06:48 marostegui: Deploy schema change on s4 codfw db2179 dbmaint T367856
  • 06:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Long schema change
  • 06:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2179.codfw.wmnet with reason: Long schema change
  • 06:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2179 T371205', diff saved to https://phabricator.wikimedia.org/P66990 and previous config saved to /var/cache/conftool/dbconfig/20240729-064405-marostegui.json
  • 06:42 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2140 to s4 primary T371205', diff saved to https://phabricator.wikimedia.org/P66989 and previous config saved to /var/cache/conftool/dbconfig/20240729-064250-marostegui.json
  • 06:42 marostegui: Starting s4 codfw failover from db2179 to db2140 - T371205
  • 03:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T367856)', diff saved to https://phabricator.wikimedia.org/P66984 and previous config saved to /var/cache/conftool/dbconfig/20240729-030804-marostegui.json
  • 03:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 03:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 03:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T367856)', diff saved to https://phabricator.wikimedia.org/P66983 and previous config saved to /var/cache/conftool/dbconfig/20240729-030742-marostegui.json
  • 02:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P66982 and previous config saved to /var/cache/conftool/dbconfig/20240729-025235-marostegui.json
  • 02:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P66981 and previous config saved to /var/cache/conftool/dbconfig/20240729-023728-marostegui.json
  • 02:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T367856)', diff saved to https://phabricator.wikimedia.org/P66980 and previous config saved to /var/cache/conftool/dbconfig/20240729-022221-marostegui.json

2024-07-28

  • 19:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T367856)', diff saved to https://phabricator.wikimedia.org/P66979 and previous config saved to /var/cache/conftool/dbconfig/20240728-190050-marostegui.json
  • 19:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 19:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 19:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T367856)', diff saved to https://phabricator.wikimedia.org/P66978 and previous config saved to /var/cache/conftool/dbconfig/20240728-190028-marostegui.json
  • 18:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P66977 and previous config saved to /var/cache/conftool/dbconfig/20240728-184521-marostegui.json
  • 18:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P66976 and previous config saved to /var/cache/conftool/dbconfig/20240728-183013-marostegui.json
  • 18:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T367856)', diff saved to https://phabricator.wikimedia.org/P66975 and previous config saved to /var/cache/conftool/dbconfig/20240728-181506-marostegui.json
  • 04:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T367856)', diff saved to https://phabricator.wikimedia.org/P66974 and previous config saved to /var/cache/conftool/dbconfig/20240728-044200-marostegui.json
  • 04:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 04:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T367856)', diff saved to https://phabricator.wikimedia.org/P66973 and previous config saved to /var/cache/conftool/dbconfig/20240728-042021-marostegui.json
  • 04:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 04:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T367856)', diff saved to https://phabricator.wikimedia.org/P66972 and previous config saved to /var/cache/conftool/dbconfig/20240728-042000-marostegui.json
  • 04:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P66971 and previous config saved to /var/cache/conftool/dbconfig/20240728-040453-marostegui.json
  • 03:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P66970 and previous config saved to /var/cache/conftool/dbconfig/20240728-034946-marostegui.json
  • 03:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T367856)', diff saved to https://phabricator.wikimedia.org/P66969 and previous config saved to /var/cache/conftool/dbconfig/20240728-033440-marostegui.json

2024-07-27

  • 13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T367856)', diff saved to https://phabricator.wikimedia.org/P66968 and previous config saved to /var/cache/conftool/dbconfig/20240727-135859-marostegui.json
  • 13:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 13:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 13:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T367856)', diff saved to https://phabricator.wikimedia.org/P66967 and previous config saved to /var/cache/conftool/dbconfig/20240727-135838-marostegui.json
  • 13:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P66966 and previous config saved to /var/cache/conftool/dbconfig/20240727-134331-marostegui.json
  • 13:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P66965 and previous config saved to /var/cache/conftool/dbconfig/20240727-132824-marostegui.json
  • 13:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T367856)', diff saved to https://phabricator.wikimedia.org/P66964 and previous config saved to /var/cache/conftool/dbconfig/20240727-131316-marostegui.json
  • 11:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P66963 and previous config saved to /var/cache/conftool/dbconfig/20240727-113018-ladsgroup.json
  • 11:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P66962 and previous config saved to /var/cache/conftool/dbconfig/20240727-111512-ladsgroup.json
  • 11:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P66961 and previous config saved to /var/cache/conftool/dbconfig/20240727-110007-ladsgroup.json
  • 10:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P66960 and previous config saved to /var/cache/conftool/dbconfig/20240727-104502-ladsgroup.json
  • 10:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1246.eqiad.wmnet with reason: Sad
  • 10:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1246.eqiad.wmnet with reason: Sad
  • 10:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depool db1246, paged', diff saved to https://phabricator.wikimedia.org/P66959 and previous config saved to /var/cache/conftool/dbconfig/20240727-100533-ladsgroup.json
  • 07:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 07:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 07:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T367856)', diff saved to https://phabricator.wikimedia.org/P66958 and previous config saved to /var/cache/conftool/dbconfig/20240727-070839-marostegui.json
  • 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P66957 and previous config saved to /var/cache/conftool/dbconfig/20240727-065332-marostegui.json
  • 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P66956 and previous config saved to /var/cache/conftool/dbconfig/20240727-063824-marostegui.json
  • 06:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T367856)', diff saved to https://phabricator.wikimedia.org/P66955 and previous config saved to /var/cache/conftool/dbconfig/20240727-062317-marostegui.json
  • 01:54 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2234.codfw.wmnet with OS bookworm
  • 01:54 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:26 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 01:13 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2233.codfw.wmnet with OS bookworm
  • 01:10 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2234.codfw.wmnet with reason: host reimage
  • 01:07 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2234.codfw.wmnet with reason: host reimage
  • 01:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2233.codfw.wmnet with OS bookworm
  • 00:53 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2234.codfw.wmnet with OS bookworm
  • 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2235.codfw.wmnet with OS bookworm
  • 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:51 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2236.codfw.wmnet with OS bookworm
  • 00:51 pt1979@cumin2002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:42 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:26 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2235.codfw.wmnet with reason: host reimage
  • 00:24 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2235.codfw.wmnet with reason: host reimage
  • 00:20 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T352010)', diff saved to https://phabricator.wikimedia.org/P66954 and previous config saved to /var/cache/conftool/dbconfig/20240727-002016-ladsgroup.json
  • 00:10 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2235.codfw.wmnet with OS bookworm
  • 00:08 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2237.codfw.wmnet with OS bookworm
  • 00:08 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 00:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P66953 and previous config saved to /var/cache/conftool/dbconfig/20240727-000509-ladsgroup.json
  • 00:04 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2236.codfw.wmnet with reason: host reimage
  • 00:01 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2236.codfw.wmnet with reason: host reimage
  • 00:01 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"

2024-07-26

  • 23:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163', diff saved to https://phabricator.wikimedia.org/P66952 and previous config saved to /var/cache/conftool/dbconfig/20240726-235001-ladsgroup.json
  • 23:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2236.codfw.wmnet with OS bookworm
  • 23:44 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2237.codfw.wmnet with reason: host reimage
  • 23:42 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2237.codfw.wmnet with reason: host reimage
  • 23:38 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2238.codfw.wmnet with OS bookworm
  • 23:38 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T367856)', diff saved to https://phabricator.wikimedia.org/P66951 and previous config saved to /var/cache/conftool/dbconfig/20240726-233648-marostegui.json
  • 23:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 23:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 23:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 23:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 23:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T367856)', diff saved to https://phabricator.wikimedia.org/P66950 and previous config saved to /var/cache/conftool/dbconfig/20240726-233619-marostegui.json
  • 23:36 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1163 (T352010)', diff saved to https://phabricator.wikimedia.org/P66949 and previous config saved to /var/cache/conftool/dbconfig/20240726-233454-ladsgroup.json
  • 23:28 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2237.codfw.wmnet with OS bookworm
  • 23:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P66948 and previous config saved to /var/cache/conftool/dbconfig/20240726-232112-marostegui.json
  • 23:18 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2238.codfw.wmnet with reason: host reimage
  • 23:15 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2238.codfw.wmnet with reason: host reimage
  • 23:11 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2239.codfw.wmnet with OS bookworm
  • 23:10 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:09 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 23:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P66947 and previous config saved to /var/cache/conftool/dbconfig/20240726-230605-marostegui.json
  • 23:02 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2238.codfw.wmnet with OS bookworm
  • 22:52 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2239.codfw.wmnet with reason: host reimage
  • 22:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T367856)', diff saved to https://phabricator.wikimedia.org/P66946 and previous config saved to /var/cache/conftool/dbconfig/20240726-225058-marostegui.json
  • 22:50 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2239.codfw.wmnet with reason: host reimage
  • 22:35 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2239.codfw.wmnet with OS bookworm
  • 22:35 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host db2239.codfw.wmnet with OS bookworm
  • 20:44 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2239.codfw.wmnet with OS bookworm
  • 18:52 mutante: [deploy1002:~] $ echo 'https://sep11.wikipedia.org' | mwscript purgeList.php --wiki=aawiki - T367014
  • 18:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon1005.eqiad.wmnet with OS bullseye
  • 18:02 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 17:57 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 17:56 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon1006.eqiad.wmnet with OS bullseye
  • 17:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 17:53 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 17:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon1005.eqiad.wmnet with reason: host reimage
  • 17:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon1006.eqiad.wmnet with reason: host reimage
  • 17:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon1005.eqiad.wmnet with reason: host reimage
  • 17:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon1006.eqiad.wmnet with reason: host reimage
  • 17:33 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1005.eqiad.wmnet with OS bullseye
  • 17:33 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1006.eqiad.wmnet with OS bullseye
  • 17:16 cjming@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 17:16 cjming@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 16:54 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2239.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:53 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2238.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:52 cjming@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 16:52 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2237.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:52 cjming@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 16:51 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2236.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:50 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2235.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:47 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2234.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:46 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2233.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:45 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2232.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2231.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:43 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2239.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:42 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2238.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2230.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:41 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2237.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:40 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2236.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:38 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2235.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:38 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2229.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:37 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2234.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:36 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2228.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:35 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2233.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:34 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2232.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:33 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2231.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:31 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2230.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:26 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2229.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:25 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2228.mgmt.codfw.wmnet with reboot policy FORCED
  • 16:24 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:24 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2229 to codfw - jhancock@cumin2002"
  • 16:23 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2229 to codfw - jhancock@cumin2002"
  • 16:20 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:55 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@845502d]: (no justification provided) (duration: 00m 37s)
  • 15:55 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@845502d]: (no justification provided)
  • 15:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1163 (T352010)', diff saved to https://phabricator.wikimedia.org/P66945 and previous config saved to /var/cache/conftool/dbconfig/20240726-153145-ladsgroup.json
  • 15:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 15:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1163.eqiad.wmnet with reason: Maintenance
  • 15:12 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host db2227.codfw.wmnet with OS bookworm
  • 15:09 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2227.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:53 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2227.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:52 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:51 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2227 to codfw - jhancock@cumin2002"
  • 14:50 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2227 to codfw - jhancock@cumin2002"
  • 14:48 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:42 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 14:42 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2226']
  • 14:41 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2226']
  • 14:41 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['db2226']
  • 14:41 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2226']
  • 14:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db2240.codfw.wmnet with OS bookworm
  • 14:34 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:17 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2226.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:07 dcausse@deploy1002: Finished deploy [airflow-dags/search@fb00e94]: search: process_sparql_query_hourly tune the number of partitions to prevent OOM (duration: 00m 21s)
  • 14:07 dcausse@deploy1002: Started deploy [airflow-dags/search@fb00e94]: search: process_sparql_query_hourly tune the number of partitions to prevent OOM
  • 14:06 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2240.codfw.wmnet with reason: host reimage
  • 14:03 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2240.codfw.wmnet with reason: host reimage
  • 13:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2226.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:56 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:56 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2226 to codfw - jhancock@cumin2002"
  • 13:55 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2226 to codfw - jhancock@cumin2002"
  • 13:52 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 13:49 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host db2240.codfw.wmnet with OS bookworm
  • 13:42 elukey: move dump_cloud_ip_ranges's write to /srv/private capabilities back to puppetmaster1001 - T368023
  • 13:23 dcausse@deploy1002: Finished deploy [airflow-dags/search@d09039f]: search: fix drop dailies and bump discolitycs to fix numpy & pyarrow version conflict (duration: 00m 45s)
  • 13:23 dcausse@deploy1002: Started deploy [airflow-dags/search@d09039f]: search: fix drop dailies and bump discolitycs to fix numpy & pyarrow version conflict
  • 13:19 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host sretest1001.eqiad.wmnet with OS bullseye
  • 13:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 12:58 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 12:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1006.eqiad.wmnet with OS bullseye
  • 12:42 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1005.eqiad.wmnet with OS bullseye
  • 12:42 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 12:00 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc1017.eqiad.wmnet with OS bookworm
  • 11:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1005.eqiad.wmnet with OS bullseye
  • 11:52 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1006.eqiad.wmnet with OS bullseye
  • 11:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
  • 11:48 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
  • 11:45 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
  • 11:05 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 10:40 akosiaris@deploy1003: Synchronized .mailmap: Testing a noop deploy from deploy1003 (duration: 20m 28s)
  • 10:03 elukey@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host sretest1001.eqiad.wmnet with OS bullseye
  • 10:00 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
  • 10:00 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 09:38 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1073.eqiad.wmnet
  • 09:35 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host analytics1073.eqiad.wmnet
  • 09:33 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host analytics1072.eqiad.wmnet
  • 09:27 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host analytics1072.eqiad.wmnet
  • 09:21 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/machinetranslation: sync
  • 09:21 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/machinetranslation: sync
  • 09:21 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/machinetranslation: sync
  • 09:21 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/machinetranslation: sync
  • 09:21 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: sync
  • 09:16 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 09:10 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
  • 09:09 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/recommendation-api: sync
  • 09:09 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
  • 09:09 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
  • 09:09 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
  • 09:09 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
  • 09:06 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 09:06 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
  • 09:06 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/echostore: sync
  • 09:06 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/sessionstore: sync
  • 09:06 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
  • 09:06 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/echostore: sync
  • 09:06 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/echostore: sync
  • 09:06 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
  • 09:06 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/echostore: sync
  • 09:06 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/echostore: sync
  • 09:06 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/echostore: sync
  • 09:05 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/machinetranslation: sync
  • 09:02 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/linkrecommendation: sync
  • 09:02 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/linkrecommendation: sync
  • 09:02 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: sync
  • 09:01 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/linkrecommendation: sync
  • 09:01 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: sync
  • 09:01 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/linkrecommendation: sync
  • 08:56 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/developer-portal: sync
  • 08:55 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/developer-portal: sync
  • 08:55 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/developer-portal: sync
  • 08:55 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/developer-portal: sync
  • 08:55 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/developer-portal: sync
  • 08:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T367856)', diff saved to https://phabricator.wikimedia.org/P66942 and previous config saved to /var/cache/conftool/dbconfig/20240726-085529-marostegui.json
  • 08:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 08:55 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/developer-portal: sync
  • 08:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 08:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T367856)', diff saved to https://phabricator.wikimedia.org/P66941 and previous config saved to /var/cache/conftool/dbconfig/20240726-085507-marostegui.json
  • 08:52 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
  • 08:52 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 08:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P66940 and previous config saved to /var/cache/conftool/dbconfig/20240726-083959-marostegui.json
  • 08:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 08:32 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on sretest1001.eqiad.wmnet with reason: host reimage
  • 08:25 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 08:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P66939 and previous config saved to /var/cache/conftool/dbconfig/20240726-082452-marostegui.json
  • 08:18 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
  • 08:18 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 08:16 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 08:16 elukey@cumin1002: START - Cookbook sre.hosts.reimage for host sretest1001.eqiad.wmnet with OS bullseye
  • 08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T367856)', diff saved to https://phabricator.wikimedia.org/P66938 and previous config saved to /var/cache/conftool/dbconfig/20240726-080945-marostegui.json
  • 07:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T367856)', diff saved to https://phabricator.wikimedia.org/P66937 and previous config saved to /var/cache/conftool/dbconfig/20240726-074330-marostegui.json
  • 07:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 07:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 07:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T367856)', diff saved to https://phabricator.wikimedia.org/P66936 and previous config saved to /var/cache/conftool/dbconfig/20240726-074308-marostegui.json
  • 07:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P66935 and previous config saved to /var/cache/conftool/dbconfig/20240726-072801-marostegui.json
  • 07:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P66934 and previous config saved to /var/cache/conftool/dbconfig/20240726-071254-marostegui.json
  • 06:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T367856)', diff saved to https://phabricator.wikimedia.org/P66933 and previous config saved to /var/cache/conftool/dbconfig/20240726-065747-marostegui.json
  • 06:56 XioNoX: continue rolling out "LVS-and-NS-service-ips" prefix-list rename to network device
  • 00:47 ladsgroup@deploy1002: Finished scap: Backport for Update UI classes and CSS for review notices (T191156), Add CSS class to watchlist pending notice (T191156) (duration: 09m 49s)
  • 00:42 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 00:40 ladsgroup@deploy1002: ladsgroup: Backport for Update UI classes and CSS for review notices (T191156), Add CSS class to watchlist pending notice (T191156) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 00:37 ladsgroup@deploy1002: Started scap sync-world: Backport for Update UI classes and CSS for review notices (T191156), Add CSS class to watchlist pending notice (T191156)

2024-07-25

  • 23:09 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 23:05 ladsgroup@deploy1002: ladsgroup: Backport for Add CSS class to watchlist pending notice (T191156) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:03 ladsgroup@deploy1002: Started scap sync-world: Backport for Add CSS class to watchlist pending notice (T191156)
  • 22:56 ladsgroup@deploy1002: Finished scap: Backport for Revert "Use expression builder to avoid IDatabase::makeList" (T371052) (duration: 10m 08s)
  • 22:50 ladsgroup@deploy1002: ladsgroup, umherirrender: Continuing with sync
  • 22:48 ladsgroup@deploy1002: ladsgroup, umherirrender: Backport for Revert "Use expression builder to avoid IDatabase::makeList" (T371052) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:46 ladsgroup@deploy1002: Started scap sync-world: Backport for Revert "Use expression builder to avoid IDatabase::makeList" (T371052)
  • 22:11 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2240.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:10 eoghan@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade for T370973
  • 22:04 eoghan@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade for T370973
  • 22:04 eoghan@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade for T370973
  • 22:03 eoghan@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade for T370973
  • 22:00 eoghan@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade for T370973
  • 21:59 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2240.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:55 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:55 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2240 to codfw - jhancock@cumin2002"
  • 21:54 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2240 to codfw - jhancock@cumin2002"
  • 21:54 eoghan@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade for T370973
  • 21:52 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2224']
  • 21:42 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2225']
  • 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2223']
  • 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2222']
  • 21:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['db2221']
  • 21:36 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2225']
  • 21:36 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2224']
  • 21:36 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2223']
  • 21:36 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2222']
  • 21:35 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2221']
  • 21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2225']
  • 21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2224']
  • 21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2223']
  • 21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2222']
  • 21:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['db2221']
  • 21:35 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2225']
  • 21:35 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2224']
  • 21:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2223']
  • 21:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2222']
  • 21:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['db2221']
  • 21:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2225.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2222.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2223.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:32 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2221.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:31 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host db2224.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2225.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:22 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2222.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2225.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:19 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host db2222.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2225.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2224.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2223.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2222.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:18 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host db2221.mgmt.codfw.wmnet with reboot policy FORCED
  • 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:18 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2221 to codfw - jhancock@cumin2002"
  • 21:17 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding db2221 to codfw - jhancock@cumin2002"
  • 21:14 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 19:16 cstone: payments-wiki upgraded from a37746fe to 91624a2e
  • 19:12 pt1979@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 19:12 pt1979@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1002"
  • 18:59 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
  • 18:26 pt1979@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin1002"
  • 18:12 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.15 refs T366960
  • 18:10 pt1979@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage
  • 18:07 pt1979@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephmon1004.eqiad.wmnet with reason: host reimage
  • 18:05 pt1979@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 17:56 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
  • 17:56 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netbox2003.codfw.wmnet,netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
  • 17:32 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 17:20 swfrench@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1032.eqiad.wmnet),cluster=kubernetes,service=kubesvc [reason: T351074 - pooling after reimage]
  • 17:08 swfrench@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1032.eqiad.wmnet with OS bullseye
  • 17:06 swfrench-wmf: running homer 'cr*eqiad*' commit 'T351074' for k8s worker reimage
  • 17:03 ebernhardson@deploy1002: Finished deploy [airflow-dags/search@b1a04fc]: bump discolytics to 0.25 (duration: 00m 25s)
  • 17:03 ebernhardson@deploy1002: Started deploy [airflow-dags/search@b1a04fc]: bump discolytics to 0.25
  • 16:48 swfrench@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1032.eqiad.wmnet with reason: host reimage
  • 16:46 ebernhardson@deploy1002: Finished deploy [airflow-dags/search@8c8f4c2]: Add new fields to search_satisfaction metrics (duration: 00m 19s)
  • 16:46 ebernhardson@deploy1002: Started deploy [airflow-dags/search@8c8f4c2]: Add new fields to search_satisfaction metrics
  • 16:45 swfrench@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1032.eqiad.wmnet with reason: host reimage
  • 16:45 pt1979@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 16:30 swfrench@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1032.eqiad.wmnet with OS bullseye
  • 16:29 swfrench@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1032.eqiad.wmnet on all recursors
  • 16:29 swfrench@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1032.eqiad.wmnet on all recursors
  • 16:27 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
  • 16:27 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 16:25 swfrench@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1364 to wikikube-worker1032
  • 16:24 swfrench@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1032
  • 16:24 swfrench@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1032
  • 16:23 swfrench@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:23 swfrench@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1364 to wikikube-worker1032 - swfrench@cumin1002"
  • 16:21 swfrench@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1364 to wikikube-worker1032 - swfrench@cumin1002"
  • 16:18 swfrench@cumin1002: START - Cookbook sre.dns.netbox
  • 16:18 swfrench@cumin1002: START - Cookbook sre.hosts.rename from mw1364 to wikikube-worker1032
  • 16:17 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 16:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:07 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 15:15 elukey: upgrade spicerack to 8.9.0 on cumin nodes
  • 15:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T367856)', diff saved to https://phabricator.wikimedia.org/P66930 and previous config saved to /var/cache/conftool/dbconfig/20240725-150739-marostegui.json
  • 15:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 15:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 15:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T367856)', diff saved to https://phabricator.wikimedia.org/P66929 and previous config saved to /var/cache/conftool/dbconfig/20240725-150717-marostegui.json
  • 14:53 elukey: uploaded spicerack_8.9.0 to apt.wikimedia.org bullseye-wikimedia
  • 14:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P66928 and previous config saved to /var/cache/conftool/dbconfig/20240725-145210-marostegui.json
  • 14:51 sukhe: running authdns-update after dns4003 depool
  • 14:48 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns4003.wikimedia.org [reason: finished upgrading anycast-hc: T370068]
  • 14:46 sukhe: [dns4003] upgrade anycast-healthchecker to 0.9.8-1+wmf12u2: T370068
  • 14:44 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns4003.wikimedia.org [reason: upgrading anycast-hc: T370068]
  • 14:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160', diff saved to https://phabricator.wikimedia.org/P66926 and previous config saved to /var/cache/conftool/dbconfig/20240725-143703-marostegui.json
  • 14:36 dcausse@deploy1002: Finished deploy [airflow-dags/search@87b91b6]: search: drop hourly weighted_tags support (duration: 00m 20s)
  • 14:36 dcausse@deploy1002: Started deploy [airflow-dags/search@87b91b6]: search: drop hourly weighted_tags support
  • 14:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1160 (T367856)', diff saved to https://phabricator.wikimedia.org/P66925 and previous config saved to /var/cache/conftool/dbconfig/20240725-142155-marostegui.json
  • 14:19 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: sync
  • 14:12 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: sync
  • 14:12 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: sync
  • 14:04 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
  • 14:04 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/recommendation-api: sync
  • 14:04 akosiaris@deploy1003: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
  • 14:03 akosiaris@deploy1003: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
  • 14:03 akosiaris@deploy1003: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
  • 14:03 akosiaris@deploy1003: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
  • 13:57 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: sync
  • 13:57 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: sync
  • 13:53 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: sync
  • 13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: sync
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: sync
  • 13:52 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: sync
  • 13:52 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: sync
  • 13:48 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 13:48 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 13:48 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
  • 13:48 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/echostore: apply
  • 13:48 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 13:48 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 13:47 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 13:45 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 13:45 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 13:45 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 13:45 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 13:45 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 13:43 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 13:43 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 13:43 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 13:43 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 13:42 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 13:42 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 13:41 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=kubernetes1051.eqiad.wmnet,cluster=kubernetes,service=kubesvc [reason: Uncordoning kubernetes1051 for missed upgrades - T369011]
  • 13:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubernetes1051.eqiad.wmnet
  • 13:34 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host pc1017.eqiad.wmnet with OS bookworm
  • 13:32 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubernetes1051.eqiad.wmnet
  • 13:30 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:30 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=kubernetes1051.eqiad.wmnet,cluster=kubernetes,service=kubesvc [reason: Cordoning kubernetes1051 for missed upgrades - T369011]
  • 13:30 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Add wikibase client interaction stream (T370045) (duration: 07m 56s)
  • 13:25 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, joelyrookewmde: Continuing with sync
  • 13:24 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, joelyrookewmde: Backport for Add wikibase client interaction stream (T370045) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Add wikibase client interaction stream (T370045)
  • 13:18 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Enable optional MathJax rendering in everywhere (T370507) (duration: 09m 57s)
  • 13:17 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
  • 13:15 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
  • 13:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, physikerwelt: Continuing with sync
  • 13:12 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, physikerwelt: Backport for Enable optional MathJax rendering in everywhere (T370507) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Enable optional MathJax rendering in everywhere (T370507)
  • 13:01 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
  • 12:56 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host pc1017.eqiad.wmnet with OS bookworm
  • 12:56 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 12:42 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
  • 12:42 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 12:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 12:29 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 12:28 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 12:28 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 12:27 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 12:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 12:26 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 12:26 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 12:25 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 12:25 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 12:24 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 12:24 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 12:23 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 12:23 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 12:22 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 12:22 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 12:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 12:20 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 12:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 12:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 12:18 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 12:18 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 12:17 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 12:17 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 12:16 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 12:16 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 12:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
  • 12:15 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 12:15 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 12:14 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 12:13 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
  • 12:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 12:12 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 12:12 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 12:12 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 12:08 cgoubert@deploy1002: sync-world aborted: Deploying mpic envoy listener - 1056163 - T366234 (duration: 17m 59s)
  • 11:59 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
  • 11:53 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
  • 11:51 cgoubert@deploy1002: Started scap sync-world: Deploying mpic envoy listener - 1056163 - T366234
  • 11:45 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 11:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
  • 11:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 10:42 elukey: upload docker-report 0.0.15 to bullseye-wikimedia and upgrade build2001
  • 10:00 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=kubernetes1051.eqiad.wmnet,cluster=kubernetes,service=kubesvc [reason: Uncordoning kubernetes1051 - T369011]
  • 09:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:54 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 09:27 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
  • 09:26 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 09:19 elukey: move dump_cloud_ip_ranges from puppetmaster1001 to puppetserver1001 - T368023
  • 07:38 kart_: Updated cxserver to 2024-07-22-050142-production (T363968)
  • 07:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T367856)', diff saved to https://phabricator.wikimedia.org/P66924 and previous config saved to /var/cache/conftool/dbconfig/20240725-073742-marostegui.json
  • 07:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 07:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 07:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T367856)', diff saved to https://phabricator.wikimedia.org/P66923 and previous config saved to /var/cache/conftool/dbconfig/20240725-073720-marostegui.json
  • 07:37 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 07:36 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
  • 07:36 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 07:35 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 07:35 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 07:35 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 07:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P66922 and previous config saved to /var/cache/conftool/dbconfig/20240725-072213-marostegui.json
  • 07:14 XioNoX: add transit BGP session to KPN in esams
  • 07:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P66921 and previous config saved to /var/cache/conftool/dbconfig/20240725-070706-marostegui.json
  • 06:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T367856)', diff saved to https://phabricator.wikimedia.org/P66920 and previous config saved to /var/cache/conftool/dbconfig/20240725-065159-marostegui.json
  • 00:43 zabe@deploy1002: Finished scap: Backport for Further configs for cswikivoyage (T370913) (duration: 08m 22s)
  • 00:39 zabe@deploy1002: zabe: Continuing with sync
  • 00:37 zabe@deploy1002: zabe: Backport for Further configs for cswikivoyage (T370913) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 00:35 zabe@deploy1002: Started scap sync-world: Backport for Further configs for cswikivoyage (T370913)
  • 00:11 eileen: civicrm upgraded from c656ab2f to 1dc4f944
  • 00:00 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 00:00 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 00:00 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply

2024-07-24

  • 23:59 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 23:59 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 23:59 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 23:20 zabe@deploy1002: Finished scap: update interwiki cache (duration: 08m 25s)
  • 23:11 zabe@deploy1002: Started scap sync-world: update interwiki cache
  • 23:09 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=cswikivoyage --cluster=all 2>&1 | tee /tmp/cswikivoyage.UpdateSearchIndexConfig.log # T370905
  • 23:08 zabe@deploy1002: Finished scap: T370905 (duration: 09m 14s)
  • 23:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1160 (T367856)', diff saved to https://phabricator.wikimedia.org/P66919 and previous config saved to /var/cache/conftool/dbconfig/20240724-230209-marostegui.json
  • 23:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 23:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Maintenance
  • 22:59 zabe@deploy1002: Started scap sync-world: T370905
  • 22:59 zabe: Create Wikivoyage Czech # T370905
  • 22:42 ejegg: re-enabled Adyen job runner
  • 22:41 ejegg: SmashPig upgraded from f2aca230 to 1b2d9a6e across all frack servers
  • 22:34 ejegg: SmashPig upgraded from f2aca230 to 1b2d9a6e on frpig1002 only
  • 22:34 ejegg: SmashPig upgraded from f2aca230 to 1b2d9a6e on frpig2001 only
  • 22:33 ejegg: disabled Adyen job runner
  • 21:59 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1021.eqiad.wmnet
  • 21:59 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1020.eqiad.wmnet
  • 21:58 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1019.eqiad.wmnet
  • 21:58 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1018.eqiad.wmnet
  • 21:55 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1021.eqiad.wmnet
  • 21:55 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1020.eqiad.wmnet
  • 21:55 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1019.eqiad.wmnet
  • 21:55 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1018.eqiad.wmnet
  • 21:54 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on wdqs[1018-1021].eqiad.wmnet with reason: T366555 security
  • 21:54 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on wdqs[1018-1021].eqiad.wmnet with reason: T366555 security
  • 21:51 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1014.eqiad.wmnet
  • 21:50 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1015.eqiad.wmnet
  • 21:47 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1014.eqiad.wmnet
  • 21:47 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1015.eqiad.wmnet
  • 21:47 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on wdqs[1014-1015].eqiad.wmnet with reason: T366555 security
  • 21:47 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on wdqs[1014-1015].eqiad.wmnet with reason: T366555 security
  • 21:46 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2007.codfw.wmnet
  • 21:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2010.codfw.wmnet
  • 21:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2011.codfw.wmnet
  • 21:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1012.eqiad.wmnet
  • 21:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2012.codfw.wmnet
  • 21:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2009.codfw.wmnet
  • 21:44 ryankemper@cumin2002: END (PASS) - Cookbook sre.apifeatureusage.roll-restart-reboot-logstash (exit_code=0) rolling reboot on A:apifeatureusage
  • 21:42 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2012.codfw.wmnet
  • 21:42 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2011.codfw.wmnet
  • 21:42 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2010.codfw.wmnet
  • 21:42 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2009.codfw.wmnet
  • 21:42 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2007.codfw.wmnet
  • 21:42 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1013.eqiad.wmnet
  • 21:41 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on wdqs[2007,2009-2012].codfw.wmnet with reason: T366555 security
  • 21:40 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on wdqs[2007,2009-2012].codfw.wmnet with reason: T366555 security
  • 21:38 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1013.eqiad.wmnet
  • 21:38 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1012.eqiad.wmnet
  • 21:38 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on wdqs[1012-1013].eqiad.wmnet with reason: T366555 security
  • 21:38 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 0:30:00 on wdqs[1012-1013].eqiad.wmnet with reason: T366555 security
  • 21:35 ryankemper@cumin2002: START - Cookbook sre.apifeatureusage.roll-restart-reboot-logstash rolling reboot on A:apifeatureusage
  • 21:32 ebernhardson@deploy1002: Finished scap: Backport for Check the output of RevisionStore::getRevisionById (T370770) (duration: 12m 07s)
  • 21:28 ebernhardson@deploy1002: ebernhardson: Continuing with sync
  • 21:26 ebernhardson@deploy1002: ebernhardson: Backport for Check the output of RevisionStore::getRevisionById (T370770) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:20 ebernhardson@deploy1002: Started scap sync-world: Backport for Check the output of RevisionStore::getRevisionById (T370770)
  • 21:17 zabe@deploy1002: Finished scap: Backport for Create dark mode launch banner for Vector 2022 (T370303) (duration: 41m 44s)
  • 21:11 zabe@deploy1002: jdrewniak, zabe: Continuing with sync
  • 21:07 zabe@deploy1002: jdrewniak, zabe: Backport for Create dark mode launch banner for Vector 2022 (T370303) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:49 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
  • 20:36 zabe@deploy1002: Started scap sync-world: Backport for Create dark mode launch banner for Vector 2022 (T370303)
  • 20:24 sergi0: mwscript extensions/GrowthExperiments/maintenance/migrateCommunityConfig.php --wiki=frwiktionary #T369711
  • 20:23 sergi0: sgimeno@mwmaint1002:~$ mwscript extensions/GrowthExperiments/maintenance/migrateCommunityConfig.php --wiki=dewiki --force
  • 20:18 zabe@deploy1002: Finished scap: Backport for frwiktionary, dewiki: enable CommunityConfiguration (T370261 T369711) (duration: 09m 43s)
  • 20:13 zabe@deploy1002: zabe, sgimeno: Continuing with sync
  • 20:11 zabe@deploy1002: zabe, sgimeno: Backport for frwiktionary, dewiki: enable CommunityConfiguration (T370261 T369711) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:08 zabe@deploy1002: Started scap sync-world: Backport for frwiktionary, dewiki: enable CommunityConfiguration (T370261 T369711)
  • 19:31 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 19:31 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:27 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 19:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 18:10 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 to 1.43.0-wmf.15 refs T366960
  • 17:22 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:22 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
  • 17:21 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
  • 17:13 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 17:03 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:03 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
  • 17:02 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
  • 16:59 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:54 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
  • 16:53 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
  • 16:50 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:44 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:44 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
  • 16:43 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding frack servers to codfw - jhancock@cumin2002"
  • 16:38 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 16:33 sukhe: sudo cumin -b1 -s120 'O:wikidough' 'systemctl restart anycast-healthchecker.service'
  • 15:43 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 15:42 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 15:30 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs1020.eqiad.wmnet
  • 15:24 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs1020.eqiad.wmnet
  • 15:20 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host pc2017.codfw.wmnet with OS bookworm
  • 15:10 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2017.codfw.wmnet with reason: host reimage
  • 15:07 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs3010.esams.wmnet
  • 15:07 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2017.codfw.wmnet with reason: host reimage
  • 15:04 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pc2017.codfw.wmnet with OS bookworm
  • 15:01 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs3010.esams.wmnet
  • 14:54 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=1) for host pc2017.codfw.wmnet with OS bookworm
  • 14:54 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 14:52 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jhancock@cumin2002"
  • 14:52 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Revert "TranslatablePage: Split translatable page id cache into multiple shards", Revert "TranslatablePage: Split translatable page id cache into multiple shards" (duration: 09m 37s)
  • 14:47 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, trainbranchbot: Continuing with sync
  • 14:44 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, trainbranchbot: Backport for Revert "TranslatablePage: Split translatable page id cache into multiple shards", Revert "TranslatablePage: Split translatable page id cache into multiple shards" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:42 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Revert "TranslatablePage: Split translatable page id cache into multiple shards", Revert "TranslatablePage: Split translatable page id cache into multiple shards"
  • 14:36 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs4010.ulsfo.wmnet
  • 14:35 ecarg@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:33 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc2017.codfw.wmnet with reason: host reimage
  • 14:33 ecarg@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:32 ecarg@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:31 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org [reason: finished upgrading anycast-hc: T370068]
  • 14:31 ecarg@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:30 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs4010.ulsfo.wmnet
  • 14:30 jhancock@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc2017.codfw.wmnet with reason: host reimage
  • 14:29 ecarg@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:28 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs5006.eqsin.wmnet
  • 14:27 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org [reason: upgrading anycast-hc: T370068]
  • 14:27 ecarg@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:26 ecarg@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:26 ecarg@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:26 kamila@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 14:25 kamila@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 14:25 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 14:24 kamila@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 14:24 kamila@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 14:24 sukhe: upgrade O:durum to anycast-hc 0.9.8-1+wmf12u2
  • 14:22 kamila@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 14:22 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs5006.eqsin.wmnet
  • 14:20 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs6003.drmrs.wmnet
  • 14:20 ecarg@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:19 ecarg@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:19 ecarg@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:18 sukhe: disable puppet on O:durum
  • 14:18 ecarg@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:16 ecarg@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:15 ecarg@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:14 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs6003.drmrs.wmnet
  • 14:10 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455) (duration: 11m 21s)
  • 14:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 abi, lucaswerkmeister-wmde: Continuing with sync
  • 14:00 logmsgbot: lucaswerkmeister-wmde@deploy1002 abi, lucaswerkmeister-wmde: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:59 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pc2017.codfw.wmnet with OS bookworm
  • 13:58 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455)
  • 13:57 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455) (duration: 10m 21s)
  • 13:52 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, abi: Continuing with sync
  • 13:51 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
  • 13:49 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, abi: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:48 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on pc1017.eqiad.wmnet with reason: host reimage
  • 13:46 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for TranslatablePage: Split translatable page id cache into multiple shards (T366455)
  • 13:37 godog: silence OtelCollectorRefusedSpans in codfw for 7d - T370043
  • 13:35 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
  • 13:28 sukhe: reprepro -C main include bookworm-wikimedia anycast-healthchecker_0.9.8-1+wmf12u2_amd64.changes: T370068
  • 13:25 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for knwikisource: Enable local uploads (T370765) (duration: 10m 14s)
  • 13:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, anzx: Continuing with sync
  • 13:18 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, anzx: Backport for knwikisource: Enable local uploads (T370765) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:15 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for knwikisource: Enable local uploads (T370765)
  • 13:14 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1017.eqiad.wmnet with OS bookworm
  • 12:39 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host deploy1003.eqiad.wmnet with OS bullseye
  • 12:31 mfossati@deploy1002: Finished deploy [airflow-dags/platform_eng@24f95a8]: (no justification provided) (duration: 00m 30s)
  • 12:31 mfossati@deploy1002: Started deploy [airflow-dags/platform_eng@24f95a8]: (no justification provided)
  • 11:11 dreamyjazz@deploy1002: Finished scap: Backport for Remove now unused $wgGlobalBlockingDatabase definition (T370856) (duration: 07m 27s)
  • 11:06 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
  • 11:06 dreamyjazz@deploy1002: dreamyjazz: Backport for Remove now unused $wgGlobalBlockingDatabase definition (T370856) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:03 dreamyjazz@deploy1002: Started scap sync-world: Backport for Remove now unused $wgGlobalBlockingDatabase definition (T370856)
  • 11:00 jiji@deploy1002: Finished scap: Noop, bumping mediawiki chart version (duration: 02m 32s)
  • 10:57 jiji@deploy1002: Started scap sync-world: Noop, bumping mediawiki chart version
  • 10:54 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:54 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:54 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 10:53 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:33 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
  • 10:28 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
  • 10:16 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bullseye
  • 10:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on 16 hosts with reason: Legacy appserver spindown
  • 10:15 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on 16 hosts with reason: Legacy appserver spindown
  • 06:54 XioNoX: deploy CR1056198 Rename LVS-service-IPs prefix-list
  • 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P66908 and previous config saved to /var/cache/conftool/dbconfig/20240724-060142-marostegui.json
  • 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P66907 and previous config saved to /var/cache/conftool/dbconfig/20240724-054635-marostegui.json
  • 05:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 05:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T367856)', diff saved to https://phabricator.wikimedia.org/P66906 and previous config saved to /var/cache/conftool/dbconfig/20240724-053128-marostegui.json
  • 05:12 akosiaris@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host deploy1003.eqiad.wmnet with OS bullseye
  • 01:14 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc2017.codfw.wmnet with OS bookworm
  • 00:12 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm

2024-07-23

  • 23:58 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host pc1017.eqiad.wmnet with OS bookworm
  • 23:54 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host pc2017.codfw.wmnet with OS bookworm
  • 23:49 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['pc2017']
  • 23:43 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc2017']
  • 23:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['pc2017']
  • 23:42 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['pc2017']
  • 23:41 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc2017.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:23 eileen: civicrm upgraded from 4247715d to c656ab2f
  • 23:20 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host pc2017.mgmt.codfw.wmnet with reboot policy FORCED
  • 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:17 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding pc2017 to codfw - jhancock@cumin2002"
  • 23:16 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding pc2017 to codfw - jhancock@cumin2002"
  • 23:12 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 23:11 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host pc1017.eqiad.wmnet with OS bookworm
  • 23:09 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host pc1017.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host pc1017.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:56 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:54 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 22:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P66905 and previous config saved to /var/cache/conftool/dbconfig/20240723-223855-ladsgroup.json
  • 22:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P66904 and previous config saved to /var/cache/conftool/dbconfig/20240723-223826-ladsgroup.json
  • 22:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P66903 and previous config saved to /var/cache/conftool/dbconfig/20240723-223742-ladsgroup.json
  • 22:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P66902 and previous config saved to /var/cache/conftool/dbconfig/20240723-222349-ladsgroup.json
  • 22:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P66901 and previous config saved to /var/cache/conftool/dbconfig/20240723-222320-ladsgroup.json
  • 22:22 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
  • 22:22 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
  • 22:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P66900 and previous config saved to /var/cache/conftool/dbconfig/20240723-222236-ladsgroup.json
  • 22:08 jclark@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 22:08 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt pc1017 - jclark@cumin1002"
  • 22:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P66899 and previous config saved to /var/cache/conftool/dbconfig/20240723-220844-ladsgroup.json
  • 22:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P66898 and previous config saved to /var/cache/conftool/dbconfig/20240723-220815-ladsgroup.json
  • 22:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P66897 and previous config saved to /var/cache/conftool/dbconfig/20240723-220731-ladsgroup.json
  • 22:07 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt pc1017 - jclark@cumin1002"
  • 22:03 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P66896 and previous config saved to /var/cache/conftool/dbconfig/20240723-215338-ladsgroup.json
  • 21:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P66895 and previous config saved to /var/cache/conftool/dbconfig/20240723-215309-ladsgroup.json
  • 21:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P66894 and previous config saved to /var/cache/conftool/dbconfig/20240723-215225-ladsgroup.json
  • away: UTC late deploys done
  • 20:53 tgr@deploy1002: Finished scap: Backport for Respect wgTranslateNumerals in Cite footnote markers (T370585), Respect wgTranslateNumerals in Cite footnote markers (T370585) (duration: 09m 34s)
  • 20:48 tgr@deploy1002: wmde-fisch, tgr: Continuing with sync
  • 20:46 tgr@deploy1002: wmde-fisch, tgr: Backport for Respect wgTranslateNumerals in Cite footnote markers (T370585), Respect wgTranslateNumerals in Cite footnote markers (T370585) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:44 tgr@deploy1002: Started scap sync-world: Backport for Respect wgTranslateNumerals in Cite footnote markers (T370585), Respect wgTranslateNumerals in Cite footnote markers (T370585)
  • 20:38 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
  • 20:38 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
  • 20:22 tgr@deploy1002: Finished scap: Backport for debug: Enable Special:WikimediaDebug (T350094) (duration: 09m 28s)
  • 20:16 tgr@deploy1002: tgr: Continuing with sync
  • 20:14 tgr@deploy1002: tgr: Backport for debug: Enable Special:WikimediaDebug (T350094) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:12 tgr@deploy1002: Started scap sync-world: Backport for debug: Enable Special:WikimediaDebug (T350094)
  • 18:59 milimetric@deploy1002: Finished deploy [airflow-dags/analytics@01e1952]: (no justification provided) (duration: 00m 30s)
  • 18:58 milimetric@deploy1002: Started deploy [airflow-dags/analytics@01e1952]: (no justification provided)
  • 18:45 mutante: puppetmaster1001/puppetmaster2001 - rm /var/run/confd-template/*.err to clear pybal icinga alerts after T367949
  • 18:42 mutante: puppetmaster1001/puppetmaster2001 - rm /var/run/confd-template/_srv_config-master_pybal_codfw_api-https.err to clear pybal icinga alerts after T367949
  • 18:40 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 18:14 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 to 1.43.0-wmf.15 refs T366960
  • 18:13 swfrench-wmf: sudo cumin 'A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad' 'ipvsadm --delete-service --tcp-service 10.2.2.1:443' (appservers-https eqiad) - T367949
  • 18:12 aokoth@cumin1002: END (PASS) - Cookbook sre.vrts.upgrade (exit_code=0) on VRTS host vrts1001.eqiad.wmnet
  • 18:11 swfrench-wmf: sudo cumin 'A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad' 'ipvsadm --delete-service --tcp-service 10.2.2.22:443' (api-https eqiad) - T367949
  • 18:11 swfrench-wmf: sudo cumin 'A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad' 'ipvsa
  • 18:10 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet
  • 18:10 swfrench-wmf: sudo cumin 'A:lvs-secondary-codfw or A:lvs-low-traffic-codfw' 'ipvsa
  • 18:08 swfrench-wmf: sudo cumin 'A:lvs-secondary-codfw or A:lvs-low-traffic-codfw' 'ipvsa
  • 18:01 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=99) on VRTS host vrts1001.eqiad.wmnet
  • 18:01 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet
  • 17:58 swfrench-wmf: sudo cumin 'A:lvs-low-traffic-eqiad' 'systemctl restart pybal.service' - T367949
  • 17:51 swfrench-wmf: sudo cumin 'A:lvs-secondary-eqiad' 'systemctl restart pybal.service' - T367949
  • 17:46 logmsgbot: nshahquinn-wmf@deploy1002 Finished deploy [airflow-dags/analytics_product@ebd9e13]: (no justification provided) (duration: 00m 07s)
  • 17:46 logmsgbot: nshahquinn-wmf@deploy1002 Started deploy [airflow-dags/analytics_product@ebd9e13]: (no justification provided)
  • 17:44 swfrench-wmf: sudo cumin 'A:lvs-low-traffic-codfw' 'systemctl restart pybal.service' - T367949
  • 17:41 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs2014.codfw.wmnet
  • 17:41 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs2014.codfw.wmnet
  • 17:40 swfrench@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw (T367949)
  • 17:37 pt1979@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 17:33 swfrench@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw (T367949)
  • 17:28 swfrench-wmf: run-puppet-agent on O:lvs::balancer to pick up switch to service_setup, removal of profile::lvs::realserver::pools - T367949
  • 17:17 swfrench-wmf: run-puppet-agent on A:dnsbox to pick up switch to lvs_setup - T367949
  • 17:06 swfrench-wmf: ran authdns-update on dns1004 to pick up removal of appservers / api records - T367949
  • 17:04 dancy@deploy1002: sync-world aborted: testing (duration: 00m 51s)
  • 17:03 dancy@deploy1002: Started scap sync-world: testing
  • 17:02 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 16:59 jhathaway: applying varnish change on cp4037, 1030591
  • 16:58 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 16:57 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 16:16 pt1979@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 16:14 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephmon1004.eqiad.wmnet
  • 16:07 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
  • 16:07 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 15:52 pt1979@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudcephmon1004.eqiad.wmnet
  • 15:48 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 15:47 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:47 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:24 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=(kubernetes1025|kubernetes1026|kubernetes1052|kubernetes1053|kubernetes1054|kubernetes1055|kubernetes1056|mw1496).eqiad.wmnet,cluster=kubernetes,service=kubesvc [reason: Uncordoning following T365998]
  • 15:24 Emperor: moss-be1003 out of maintenance mode after network downtime T365998
  • 15:22 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=dse-k8s-worker1008.eqiad.wmnet,cluster=dse-k8s,service=kubesvc
  • 15:22 claime: Uncordoning dse-k8s-worker1008.eqiad.wmnet after T365998
  • 15:20 andrewbogott: find /srv/mediawiki/images/wikitech/archive -type f | xargs delete on wikitech-static, drive is full of nonsense
  • 15:07 brennen@deploy1002: Finished deploy [phabricator/deployment@3902e30]: deploy phab1004 for T370776 (duration: 00m 33s)
  • 15:06 brennen@deploy1002: Started deploy [phabricator/deployment@3902e30]: deploy phab1004 for T370776
  • 15:06 brennen@deploy1002: Finished deploy [phabricator/deployment@3902e30]: deploy phab2002 for T370776 (redux, first deploy a mistaken no-op) (duration: 00m 34s)
  • 15:05 brennen@deploy1002: Started deploy [phabricator/deployment@3902e30]: deploy phab2002 for T370776 (redux, first deploy a mistaken no-op)
  • 15:05 brennen@deploy1002: Finished deploy [phabricator/deployment@7335128]: deploy phab2002 for T370776 (duration: 01m 17s)
  • 15:03 brennen@deploy1002: Started deploy [phabricator/deployment@7335128]: deploy phab2002 for T370776
  • 15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
  • 15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
  • 15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:02 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
  • 15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
  • 15:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 25 hosts with reason: JunOS upgrade lsw1-f3-eqiad
  • 15:01 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 25 hosts with reason: JunOS upgrade lsw1-f3-eqiad
  • 15:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-f3-eqiad,lsw1-f3-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f3-eqiad
  • 15:00 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-f3-eqiad,lsw1-f3-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f3-eqiad
  • 15:00 topranks: rebooting lsw1-f3-eqiad to complete JunOS upgrade (T365998)
  • 14:59 XioNoX: deploy CR1055546 border-in: remove authdns filter
  • 14:59 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 14:58 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 14:54 Emperor: moss-be1003 into maintenance mode for network downtime T365998
  • 14:48 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-f3-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f3-eqiad
  • 14:48 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-f3-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f3-eqiad
  • 14:10 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1001.eqiad.wmnet
  • 14:10 ChrisDobbins901_: cdobbins@cumin1002:~$ sudo cumin 'A:cp' 'run-puppet-agent "merging CR #1041705"'
  • 14:06 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
  • 14:03 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-serve1001.eqiad.wmnet
  • 14:03 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
  • 13:58 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:57 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for MoveLogFormatter::getPreloadTitles: Handle bad titles (T370396) (duration: 09m 24s)
  • 13:52 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 13:51 XioNoX: deploy CR1055544 border-in: remove squid and nrpe filters, expand LVS filter
  • 13:51 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for MoveLogFormatter::getPreloadTitles: Handle bad titles (T370396) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:50 sukhe: running authdns-update after dns6001 depool
  • 13:50 XioNoX: deploy CR1055543: border-in: remove git-ssh term
  • 13:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:49 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bullseye
  • 13:48 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 13:47 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for MoveLogFormatter::getPreloadTitles: Handle bad titles (T370396)
  • 13:44 ChrisDobbins901_: cdobbins@cumin1002:~$ sudo cumin 'A:cp' 'disable-puppet "merging CR #1041705"'
  • 13:43 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 13:40 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org [reason: finished upgrading anycast-hc: T370068]
  • 13:38 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow7001.magru.wmnet
  • 13:37 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org [reason: upgrading anycast-hc: T370068]
  • 13:34 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow7001.magru.wmnet
  • 13:34 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow6001.drmrs.wmnet
  • 13:31 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow5002.eqsin.wmnet
  • 13:30 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow6001.drmrs.wmnet
  • 13:29 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow4002.ulsfo.wmnet
  • 13:24 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [arwiki] Enable the CampaignEvents extension (T370066) (duration: 19m 17s)
  • 13:24 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow5002.eqsin.wmnet
  • 13:23 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow4002.ulsfo.wmnet
  • 13:22 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow3003.esams.wmnet
  • 13:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, daimona: Continuing with sync
  • 13:16 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow3003.esams.wmnet
  • 13:15 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow1002.eqiad.wmnet
  • 13:11 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow1002.eqiad.wmnet
  • 13:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, daimona: Backport for [arwiki] Enable the CampaignEvents extension (T370066) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:05 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=dse-k8s-worker1008.eqiad.wmnet,cluster=dse-k8s,service=kubesvc
  • 13:05 claime: Cordoning dse-k8s-worker1008.eqiad.wmnet for T365998
  • 13:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for [arwiki] Enable the CampaignEvents extension (T370066)
  • 11:28 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=(kubernetes1025|kubernetes1026|kubernetes1052|kubernetes1053|kubernetes1054|kubernetes1055|kubernetes1056|mw1496).eqiad.wmnet,cluster=kubernetes,service=kubesvc
  • 11:19 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:19 claime: Lowered concurrency of RecordLint job to 50 - T370304
  • 11:18 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:18 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:17 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:16 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:15 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:51 Amir1: running "delete from linter where linter_cat = 23 limit 1000;" in a loop in mwmaint (T370304)
  • 10:39 claime: Cordoning kubernetes1025.eqiad.wmnet kubernetes1026.eqiad.wmnet kubernetes1052.eqiad.wmnet kubernetes1053.eqiad.wmnet kubernetes1054.eqiad.wmnet kubernetes1055.eqiad.wmnet kubernetes1056.eqiad.wmnet mw1496.eqiad.wmnet for T365998
  • 10:03 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
  • 10:02 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 09:41 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
  • 09:41 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 09:35 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
  • 09:35 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 09:14 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 09:12 dreamyjazz@deploy1002: Finished scap: Backport for Define wgGlobalBlockingCentralWiki as 'metawiki' (T370457) (duration: 11m 29s)
  • 09:07 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
  • 09:07 dreamyjazz@deploy1002: dreamyjazz: Backport for Define wgGlobalBlockingCentralWiki as 'metawiki' (T370457) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:05 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 09:01 dreamyjazz@deploy1002: Started scap sync-world: Backport for Define wgGlobalBlockingCentralWiki as 'metawiki' (T370457)
  • 08:27 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/growthbook: apply
  • 08:17 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/growthbook: apply
  • 07:22 kartik@deploy1002: Finished scap: Backport for uzwiki: Limit publishing in CX to 'patroller' and 'sysop' groups (T370387) (duration: 13m 37s)
  • 07:17 kartik@deploy1002: kartik: Continuing with sync
  • 07:15 kartik@deploy1002: kartik: Backport for uzwiki: Limit publishing in CX to 'patroller' and 'sysop' groups (T370387) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:08 kartik@deploy1002: Started scap sync-world: Backport for uzwiki: Limit publishing in CX to 'patroller' and 'sysop' groups (T370387)
  • 06:58 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:58 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T367856)', diff saved to https://phabricator.wikimedia.org/P66892 and previous config saved to /var/cache/conftool/dbconfig/20240723-050042-marostegui.json
  • 05:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 05:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 05:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 05:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 05:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T367856)', diff saved to https://phabricator.wikimedia.org/P66891 and previous config saved to /var/cache/conftool/dbconfig/20240723-050004-marostegui.json
  • 04:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P66890 and previous config saved to /var/cache/conftool/dbconfig/20240723-044457-marostegui.json
  • 04:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P66889 and previous config saved to /var/cache/conftool/dbconfig/20240723-042950-marostegui.json
  • 04:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T367856)', diff saved to https://phabricator.wikimedia.org/P66888 and previous config saved to /var/cache/conftool/dbconfig/20240723-041442-marostegui.json
  • 04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.12 (duration: 01m 00s)
  • 03:54 mwpresync@deploy1002: Finished scap: testwikis to 1.43.0-wmf.15 refs T366960 (duration: 51m 50s)
  • 03:03 mwpresync@deploy1002: Started scap sync-world: testwikis to 1.43.0-wmf.15 refs T366960
  • 01:28 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 01:27 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 01:27 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 01:27 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 01:27 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 01:27 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 01:24 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 01:24 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 01:24 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 01:24 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 01:24 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 01:24 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 01:24 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 01:24 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 01:24 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 01:24 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 01:24 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 01:24 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 00:22 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 00:22 eevans@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 00:05 cmooney@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM netflow2003.codfw.wmnet
  • 00:02 cmooney@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM netflow2003.codfw.wmnet
  • 00:00 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on netflow2003.codfw.wmnet with reason: reboot netflow2003
  • 00:00 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on netflow2003.codfw.wmnet with reason: reboot netflow2003

2024-07-22

2024-07-21

  • 23:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367856)', diff saved to https://phabricator.wikimedia.org/P66871 and previous config saved to /var/cache/conftool/dbconfig/20240721-232234-marostegui.json
  • 23:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P66870 and previous config saved to /var/cache/conftool/dbconfig/20240721-230727-marostegui.json
  • 22:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P66869 and previous config saved to /var/cache/conftool/dbconfig/20240721-225219-marostegui.json
  • 22:44 ladsgroup@deploy1002: Finished scap: Backport for Disable missing-image-alt-text lint (T370304) (duration: 26m 27s)
  • 22:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T367856)', diff saved to https://phabricator.wikimedia.org/P66868 and previous config saved to /var/cache/conftool/dbconfig/20240721-223712-marostegui.json
  • 22:36 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 22:35 ladsgroup@deploy1002: ladsgroup: Backport for Disable missing-image-alt-text lint (T370304) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:18 ladsgroup@deploy1002: Started scap sync-world: Backport for Disable missing-image-alt-text lint (T370304)
  • 08:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T367856)', diff saved to https://phabricator.wikimedia.org/P66867 and previous config saved to /var/cache/conftool/dbconfig/20240721-085853-marostegui.json
  • 08:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 08:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 08:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T367856)', diff saved to https://phabricator.wikimedia.org/P66866 and previous config saved to /var/cache/conftool/dbconfig/20240721-085832-marostegui.json
  • 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P66865 and previous config saved to /var/cache/conftool/dbconfig/20240721-084325-marostegui.json
  • 08:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P66864 and previous config saved to /var/cache/conftool/dbconfig/20240721-082818-marostegui.json
  • 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T367856)', diff saved to https://phabricator.wikimedia.org/P66863 and previous config saved to /var/cache/conftool/dbconfig/20240721-081310-marostegui.json
  • 02:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T367856)', diff saved to https://phabricator.wikimedia.org/P66862 and previous config saved to /var/cache/conftool/dbconfig/20240721-020121-marostegui.json
  • 02:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 02:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 02:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T367856)', diff saved to https://phabricator.wikimedia.org/P66861 and previous config saved to /var/cache/conftool/dbconfig/20240721-020059-marostegui.json
  • 01:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P66860 and previous config saved to /var/cache/conftool/dbconfig/20240721-014552-marostegui.json
  • 01:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P66859 and previous config saved to /var/cache/conftool/dbconfig/20240721-013044-marostegui.json
  • 01:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T367856)', diff saved to https://phabricator.wikimedia.org/P66858 and previous config saved to /var/cache/conftool/dbconfig/20240721-011537-marostegui.json

2024-07-20

  • 19:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T367856)', diff saved to https://phabricator.wikimedia.org/P66857 and previous config saved to /var/cache/conftool/dbconfig/20240720-190046-marostegui.json
  • 19:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 19:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 19:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T367856)', diff saved to https://phabricator.wikimedia.org/P66856 and previous config saved to /var/cache/conftool/dbconfig/20240720-190024-marostegui.json
  • 18:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P66855 and previous config saved to /var/cache/conftool/dbconfig/20240720-184516-marostegui.json
  • 18:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P66854 and previous config saved to /var/cache/conftool/dbconfig/20240720-183009-marostegui.json
  • 18:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T367856)', diff saved to https://phabricator.wikimedia.org/P66853 and previous config saved to /var/cache/conftool/dbconfig/20240720-181502-marostegui.json
  • 14:30 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1005.eqiad.wmnet with OS bullseye
  • 14:22 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 14:16 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
  • 14:16 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
  • 14:15 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudcephmon1006
  • 14:15 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1006
  • 14:15 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1005
  • 14:15 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1005
  • 14:15 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
  • 14:14 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
  • 14:10 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
  • 14:10 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
  • 14:09 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:09 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephmon1004-6 - jclark@cumin1002"
  • 14:08 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephmon1004-6 - jclark@cumin1002"
  • 14:06 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 14:06 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
  • 14:05 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
  • 14:05 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
  • 14:05 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
  • 13:59 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudcephmon1006
  • 13:59 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1006
  • 13:54 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
  • 13:54 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
  • 13:47 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
  • 13:47 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
  • 13:47 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1005
  • 13:47 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1005
  • 13:45 jclark@cumin1002: END (FAIL) - Cookbook sre.network.configure-switch-interfaces (exit_code=99) for host cloudcephmon1005
  • 13:45 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1005
  • 13:45 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
  • 13:44 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
  • 13:34 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1005
  • 13:34 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1005
  • 13:33 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
  • 13:33 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
  • 13:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1005.eqiad.wmnet with OS bullseye
  • 13:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 08:15 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 08:15 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 08:15 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 08:15 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 08:15 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 08:15 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 06:21 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 03:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T367856)', diff saved to https://phabricator.wikimedia.org/P66852 and previous config saved to /var/cache/conftool/dbconfig/20240720-033501-marostegui.json
  • 03:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 03:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 01:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T367856)', diff saved to https://phabricator.wikimedia.org/P66851 and previous config saved to /var/cache/conftool/dbconfig/20240720-011705-marostegui.json
  • 01:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 01:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 01:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T367856)', diff saved to https://phabricator.wikimedia.org/P66850 and previous config saved to /var/cache/conftool/dbconfig/20240720-011643-marostegui.json
  • 01:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P66849 and previous config saved to /var/cache/conftool/dbconfig/20240720-010136-marostegui.json
  • 00:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P66848 and previous config saved to /var/cache/conftool/dbconfig/20240720-004629-marostegui.json
  • 00:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T367856)', diff saved to https://phabricator.wikimedia.org/P66847 and previous config saved to /var/cache/conftool/dbconfig/20240720-003122-marostegui.json
  • 00:26 jclark@cumin1002: END (ERROR) - Cookbook sre.hosts.provision (exit_code=97) for host db1179.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 00:14 jclark@cumin1002: START - Cookbook sre.hosts.provision for host db1179.mgmt.eqiad.wmnet with reboot policy GRACEFUL

2024-07-19

  • 21:14 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1061.eqiad.wmnet with OS bookworm
  • 20:52 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
  • 20:49 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1061.eqiad.wmnet with reason: host reimage
  • 20:34 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1061.eqiad.wmnet with OS bookworm
  • 17:21 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:21 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for new irb ints codfw row c and d - cmooney@cumin1002"
  • 17:20 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for new irb ints codfw row c and d - cmooney@cumin1002"
  • 17:17 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 17:13 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 17:12 topranks: adding irb ints for row c/d vlans to codfw leaf switches in those rows T364095
  • 17:05 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 16:48 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:46 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 16:20 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 16:20 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 16:13 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 16:11 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 15:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2038.codfw.wmnet with OS bullseye
  • 15:44 jhancock@cumin2002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['gerrit2003']
  • 15:44 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit2003']
  • 15:43 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['gerrit2003']
  • 15:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2037.codfw.wmnet with OS bullseye
  • 15:35 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit2003']
  • 15:35 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['gerrit2003']
  • 15:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit2003']
  • 15:34 jhancock@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['gerrit2003']
  • 15:34 jhancock@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['gerrit2003']
  • 15:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2038.codfw.wmnet with reason: host reimage
  • 15:28 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2038.codfw.wmnet with reason: host reimage
  • 15:27 jhancock@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host gerrit2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:25 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest2002.codfw.wmnet
  • 15:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2037.codfw.wmnet with reason: host reimage
  • 15:17 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2037.codfw.wmnet with reason: host reimage
  • 15:16 jhancock@cumin2002: START - Cookbook sre.hosts.provision for host gerrit2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:16 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding gerrit2003 to codfw - jhancock@cumin2002"
  • 15:15 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding gerrit2003 to codfw - jhancock@cumin2002"
  • 15:11 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2038.codfw.wmnet with OS bullseye
  • 15:09 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2038.codfw.wmnet with OS bullseye
  • 14:59 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2037.codfw.wmnet with OS bullseye
  • 14:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2035.codfw.wmnet with OS bullseye
  • 14:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2036.codfw.wmnet with OS bullseye
  • 14:49 cmooney@cumin1002: START - Cookbook sre.hosts.dhcp for host sretest2002.codfw.wmnet
  • 14:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2038.codfw.wmnet with OS bullseye
  • 14:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-worker2037.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:43 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:43 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for sretest2002 - cmooney@cumin1002"
  • 14:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for sretest2002 - cmooney@cumin1002"
  • 14:40 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host wikikube-worker2037.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 14:39 godog: power off centrallog1002 for network upgrade - T369825
  • 14:38 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on centrallog1002.eqiad.wmnet with reason: network upgrade
  • 14:38 filippo@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on centrallog1002.eqiad.wmnet with reason: network upgrade
  • 14:37 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 14:36 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2038.codfw.wmnet with OS bullseye
  • 14:36 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2037.codfw.wmnet with OS bullseye
  • 14:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2035.codfw.wmnet with reason: host reimage
  • 14:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2036.codfw.wmnet with reason: host reimage
  • 14:28 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2035.codfw.wmnet with reason: host reimage
  • 14:27 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2036.codfw.wmnet with reason: host reimage
  • 14:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2038.codfw.wmnet with OS bullseye
  • 14:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2037.codfw.wmnet with OS bullseye
  • 14:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2036.codfw.wmnet with OS bullseye
  • 14:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2035.codfw.wmnet with OS bullseye
  • 14:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2439 to wikikube-worker2038
  • 14:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2038
  • 14:06 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2038
  • 14:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2439 to wikikube-worker2038 - cgoubert@cumin1002"
  • 14:05 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2439 to wikikube-worker2038 - cgoubert@cumin1002"
  • 14:03 herron@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thanos-web,name=titan1001.eqiad.wmnet
  • 14:02 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:02 herron@puppetmaster1001: conftool action : set/pooled=no; selector: service=thanos-web,name=titan1001.eqiad.wmnet
  • 14:02 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2439 to wikikube-worker2038
  • 14:02 herron@puppetmaster1001: conftool action : set/pooled=yes; selector: service=thanos-web,name=titan1002.eqiad.wmnet
  • 14:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2438 to wikikube-worker2037
  • 14:01 herron@puppetmaster1001: conftool action : set/pooled=no; selector: service=thanos-web,name=titan1002.eqiad.wmnet
  • 14:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2037
  • 13:59 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2037
  • 13:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2438 to wikikube-worker2037 - cgoubert@cumin1002"
  • 13:57 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2438 to wikikube-worker2037 - cgoubert@cumin1002"
  • 13:55 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 13:55 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2438 to wikikube-worker2037
  • 13:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2433 to wikikube-worker2036
  • 13:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2036
  • 13:52 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2036
  • 13:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2433 to wikikube-worker2036 - cgoubert@cumin1002"
  • 13:51 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2433 to wikikube-worker2036 - cgoubert@cumin1002"
  • 13:48 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 13:48 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2433 to wikikube-worker2036
  • 13:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2432 to wikikube-worker2035
  • 13:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2035
  • 13:45 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2035
  • 13:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2432 to wikikube-worker2035 - cgoubert@cumin1002"
  • 13:42 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2432 to wikikube-worker2035 - cgoubert@cumin1002"
  • 13:39 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 13:39 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2432 to wikikube-worker2035
  • 13:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 13:21 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 12:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 12:49 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 12:47 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 12:47 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 12:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.convert-disks (exit_code=0) for host mw2439
  • 12:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 12:23 arnaudb@cumin1002: dbctl commit (dc=all): 'T365998 - depooling db1195 - s1 db1202 - s7 db1203 - s8', diff saved to https://phabricator.wikimedia.org/P66843 and previous config saved to /var/cache/conftool/dbconfig/20240719-122320-arnaudb.json
  • 12:20 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 12:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 12:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 12:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367856)', diff saved to https://phabricator.wikimedia.org/P66842 and previous config saved to /var/cache/conftool/dbconfig/20240719-121933-marostegui.json
  • 12:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
  • 12:18 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
  • 12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
  • 12:13 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
  • 12:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
  • 12:12 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
  • 12:12 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
  • 12:10 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
  • 12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
  • 12:09 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
  • 12:09 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
  • 12:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 12:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P66841 and previous config saved to /var/cache/conftool/dbconfig/20240719-120426-marostegui.json
  • 12:01 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2439.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 11:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P66840 and previous config saved to /var/cache/conftool/dbconfig/20240719-114919-marostegui.json
  • 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T367856)', diff saved to https://phabricator.wikimedia.org/P66839 and previous config saved to /var/cache/conftool/dbconfig/20240719-113412-marostegui.json
  • 11:10 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
  • 11:07 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 11:05 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:05 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
  • 10:54 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
  • 10:54 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
  • 10:49 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.convert-disks (exit_code=97) for host mw2439
  • 10:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
  • 10:41 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
  • 10:41 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
  • 10:38 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
  • 10:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
  • 10:37 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
  • 10:37 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
  • 10:28 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
  • 10:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
  • 10:13 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
  • 10:13 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
  • 10:06 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:05 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:00 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:00 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:58 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.convert-disks (exit_code=97) for host mw2439
  • 09:54 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest2001.codfw.wmnet
  • 09:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
  • 09:41 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
  • 09:41 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
  • 09:35 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
  • 09:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
  • 09:35 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
  • 09:35 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
  • 09:32 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2439
  • 09:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2439.codfw.wmnet
  • 09:21 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2439.codfw.wmnet
  • 09:21 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2439
  • 08:16 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:16 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:15 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:15 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:15 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:15 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:08 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2438
  • 08:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2438.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 08:05 elukey@cumin1002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
  • 02:50 eileen: civicrm upgraded from 384fe444 to a9ef8ab9
  • 00:28 zabe@deploy1002: sync-world aborted: Backport for Set some site names for new-ish wikis (T363270 T360303 T360310 T363263) (duration: 01m 33s)
  • 00:26 zabe@deploy1002: Started scap sync-world: Backport for Set some site names for new-ish wikis (T363270 T360303 T360310 T363263)

2024-07-18

  • 23:57 topranks: re-enable ssw<->ssw bgp in codfw to move east-west traffic away from CRs T369274
  • 23:46 topranks: move IP GW for vlan private1-d-codfw to ssw1-d1-codfw and ssw1-d8-codfw T369274
  • 23:44 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:44 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for migrated codfw gw IPs - cmooney@cumin1002"
  • 23:44 topranks: remove VRRP group for private1-d-codfw vlan on cr1-codfw and cr2-codfw
  • 23:43 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for migrated codfw gw IPs - cmooney@cumin1002"
  • 23:40 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 23:36 topranks: move outbound gateway for private1-d-codfw vlan from cr1-codfw to ssw1-d1-codfw
  • 23:31 topranks: disable IPv6 RA generation for private1-d-codfw vlan on cr1-codfw and cr2-codfw T369274
  • 23:17 topranks: enable IPv6 RA generation for private1-d-codfw vlan from ssw1-d1-codfw and ssw1-d8-codfw T369274
  • 23:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T367856)', diff saved to https://phabricator.wikimedia.org/P66838 and previous config saved to /var/cache/conftool/dbconfig/20240718-231639-marostegui.json
  • 23:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 23:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 23:05 topranks: Remove VRRP group for vlan private1-c-codfw on cr1-codfw and cr2-codfw
  • 22:49 topranks: Re-route outbound traffic for private1-c-codfw vlan on to ssw1-d1-codfw
  • 22:33 topranks: Disable IPv6 RA generation for private1-c-codfw vlan on cr1-codfw and cr2-codfw T369274
  • 22:19 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic1100.eqiad.wmnet with reason: catch up on indexing
  • 22:19 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on elastic1100.eqiad.wmnet with reason: catch up on indexing
  • 22:15 topranks: add IP interfaces for private1-c-codfw vlan to ssw1-d1-codfw and ssw1-d8-codfw
  • 22:03 topranks: move GW IPs for public1-d-codfw vlan to ssw1-d1-codfw and ssw1-d8-codfw T369274
  • 21:58 topranks: remove VRRP group on cr1-codfw and cr2-codfw for public1-d-codfw vlan T369274
  • 21:57 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.force-shard-allocation (exit_code=0)
  • 21:57 bking@cumin2002: START - Cookbook sre.elasticsearch.force-shard-allocation
  • 21:39 topranks: disable IPv6 RA generation on cr1-codfw and cr2-codfw for public1-d-codfw vlan T369274
  • 21:21 topranks: enable IPv6 RA generation on ssw1-d1-codfw and ssw1-d8-codfw for public1-d-codfw vlan T369274
  • 21:14 dancy@deploy1002: Finished scap: Backport for Fix guard clause in Revision Hook Handler and Precheck (T370161) (duration: 12m 02s)
  • 21:09 dancy@deploy1002: suecarmol, dancy: Continuing with sync
  • 21:08 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
  • 21:08 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
  • 21:04 dancy@deploy1002: suecarmol, dancy: Backport for Fix guard clause in Revision Hook Handler and Precheck (T370161) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:02 dancy@deploy1002: Started scap sync-world: Backport for Fix guard clause in Revision Hook Handler and Precheck (T370161)
  • 21:01 dancy@deploy1002: rebuilt and synchronized wikiversions files: group2 to 1.43.0-wmf.14 refs T366959
  • 20:52 dancy@deploy1002: Finished scap: Backport for Fixes client preferences error (T370441) (duration: 11m 22s)
  • 20:49 topranks: remove VRRP for public1-c-codfw vlan from cr1-codfw and cr2-codfw T369274
  • 20:47 dancy@deploy1002: dancy, jdlrobson: Continuing with sync
  • 20:43 dancy@deploy1002: dancy, jdlrobson: Backport for Fixes client preferences error (T370441) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:41 dancy@deploy1002: Started scap sync-world: Backport for Fixes client preferences error (T370441)
  • 20:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2172 (T367856)', diff saved to https://phabricator.wikimedia.org/P66836 and previous config saved to /var/cache/conftool/dbconfig/20240718-202511-marostegui.json
  • 20:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 20:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2172.codfw.wmnet with reason: Maintenance
  • 20:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367856)', diff saved to https://phabricator.wikimedia.org/P66835 and previous config saved to /var/cache/conftool/dbconfig/20240718-202449-marostegui.json
  • 20:04 topranks: enabling IPv6 RA generation for public1-c-codfw on ssw1-d1-codfw and ssw1-d8-codfw T369274
  • 19:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P66832 and previous config saved to /var/cache/conftool/dbconfig/20240718-195434-marostegui.json
  • 19:54 dancy@deploy1002: Finished scap: Backport for [i18n] Change the names of the Arabic months (T370456) (duration: 10m 23s)
  • 19:47 dancy@deploy1002: dancy: Continuing with sync
  • 19:46 dancy@deploy1002: dancy: Backport for [i18n] Change the names of the Arabic months (T370456) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:43 dancy@deploy1002: Started scap sync-world: Backport for [i18n] Change the names of the Arabic months (T370456)
  • 19:43 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:43 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new IRB interfaces codfw - cmooney@cumin1002"
  • 19:42 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new IRB interfaces codfw - cmooney@cumin1002"
  • 19:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T367856)', diff saved to https://phabricator.wikimedia.org/P66831 and previous config saved to /var/cache/conftool/dbconfig/20240718-193927-marostegui.json
  • 19:38 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 19:37 topranks: add IRB int on public1-c-codfw vlan to ssw1-d1-codfw and ssw1-d8-codfw T369274
  • 19:37 denisse: Send SIGQUIT signal to the benthos service after a goroutine was waiting forever in webrequest_live.yaml - T369256
  • 19:34 topranks: disable BGP between spine switches in rows A and D prior to enabling IP GW (T369274)
  • 19:32 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on ssw1-a[1,8]-codfw.mgmt,ssw1-d[1,8]-codfw.mgmt with reason: Migrate codfw row c and d IP GWs from CRs to Spines
  • 19:31 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on ssw1-a[1,8]-codfw.mgmt,ssw1-d[1,8]-codfw.mgmt with reason: Migrate codfw row c and d IP GWs from CRs to Spines
  • 19:12 topranks: enabling BGP session from cr1-codfw to ssw1-d1-codfw
  • 19:07 dancy@deploy1002: Installing scap version "4.93.0" for 232 hosts
  • 18:30 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=99) on VRTS host vrts1001.eqiad.wmnet
  • 18:27 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet
  • 18:17 swfrench-wmf: api-ro.discovery.wmnet now resolves to failoid - T367949
  • 18:03 swfrench-wmf: appservers-ro.discovery.wmnet now resolves to failoid - T367949
  • 18:01 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=99) on VRTS host vrts1001.eqiad.wmnet
  • 18:01 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet
  • 17:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136', diff saved to https://phabricator.wikimedia.org/P66829 and previous config saved to /var/cache/conftool/dbconfig/20240718-174547-root.json
  • 17:43 topranks: disabling cr2-codfw port et-1/1/0 connecting to asw-c-codfw T366941
  • 17:38 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2438.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 17:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2438.codfw.wmnet
  • 17:29 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2438.codfw.wmnet
  • 17:29 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2438
  • 17:28 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.convert-disks (exit_code=97) for host mw2438
  • 17:24 topranks: making cr1-codfw interfaces connecting ssw1-d1-codfw VRRP master for row c & d vlans T366941
  • 17:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2438.codfw.wmnet
  • 17:20 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2438.codfw.wmnet
  • 17:20 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2438
  • 17:15 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2438
  • 17:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2438.codfw.wmnet
  • 17:15 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2438.codfw.wmnet
  • 17:15 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2438
  • 17:10 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2438
  • 17:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2438.codfw.wmnet
  • 17:10 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2438.codfw.wmnet
  • 17:09 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2438
  • 16:52 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2438
  • 16:39 topranks: resetting line card 1/1 on cr1-codfw (T366941)
  • 16:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2438.codfw.wmnet
  • 16:35 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host mw2438.codfw.wmnet
  • 16:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2438.codfw.wmnet
  • 16:34 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on ssw1-a1-codfw.mgmt with reason: bouncing line card on cr1-codfw
  • 16:34 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on ssw1-a1-codfw.mgmt with reason: bouncing line card on cr1-codfw
  • 16:32 papaul: re-enable option 82 on lsw1-b7-codfw
  • 16:26 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2438.codfw.wmnet
  • 16:25 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2438
  • 16:24 papaul: disable option 82 on lsw1-b7-codfw to test pxe boot issue
  • 16:23 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2433
  • 16:21 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on cloudsw1-b1-codfw.mgmt,pfw3-codfw with reason: bouncing line card on cr1-codfw
  • 16:21 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on cloudsw1-b1-codfw.mgmt,pfw3-codfw with reason: bouncing line card on cr1-codfw
  • 16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mw2433.codfw.wmnet
  • 16:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host mw2433.codfw.wmnet
  • 16:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2433.codfw.wmnet
  • 16:10 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2433.codfw.wmnet
  • 16:10 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2433
  • 16:07 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on cloudsw1-b1-codfw.mgmt,pfw3-codfw with reason: bouncing line card on cr1-codfw
  • 16:07 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on cloudsw1-b1-codfw.mgmt,pfw3-codfw with reason: bouncing line card on cr1-codfw
  • 15:52 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 15:48 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 15:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 100%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66827 and previous config saved to /var/cache/conftool/dbconfig/20240718-153748-arnaudb.json
  • 15:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 100%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66826 and previous config saved to /var/cache/conftool/dbconfig/20240718-153731-arnaudb.json
  • 15:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 100%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66825 and previous config saved to /var/cache/conftool/dbconfig/20240718-153718-arnaudb.json
  • 15:35 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest2001.codfw.wmnet
  • 15:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2433.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:23 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2433.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 15:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 75%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66824 and previous config saved to /var/cache/conftool/dbconfig/20240718-152243-arnaudb.json
  • 15:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 75%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66823 and previous config saved to /var/cache/conftool/dbconfig/20240718-152225-arnaudb.json
  • 15:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 75%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66822 and previous config saved to /var/cache/conftool/dbconfig/20240718-152213-arnaudb.json
  • 15:19 topranks: disabling interface et-1/1/3 on cr1-codfw (facing asw-d-codfw) T366941
  • 15:17 topranks: disabling interface et-1/1/0 on cr1-codfw (facing asw-c-codfw) T366941
  • 15:13 elukey@cumin1002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
  • 15:12 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on cr[1-2]-codfw,ssw1-d[1,8]-codfw with reason: Move asw-c-codfw and asw-d-codfw CR uplinks
  • 15:12 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on cr[1-2]-codfw,ssw1-d[1,8]-codfw with reason: Move asw-c-codfw and asw-d-codfw CR uplinks
  • 15:12 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2433
  • 15:09 mforns@deploy1002: Finished deploy [airflow-dags/analytics@cde3c31]: (no justification provided) (duration: 00m 30s)
  • 15:08 mforns@deploy1002: Started deploy [airflow-dags/analytics@cde3c31]: (no justification provided)
  • 15:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 50%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66821 and previous config saved to /var/cache/conftool/dbconfig/20240718-150737-arnaudb.json
  • 15:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 50%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66820 and previous config saved to /var/cache/conftool/dbconfig/20240718-150720-arnaudb.json
  • 15:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 50%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66819 and previous config saved to /var/cache/conftool/dbconfig/20240718-150708-arnaudb.json
  • 15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2433
  • 14:58 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2433
  • 14:58 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host mw2433.codfw.wmnet
  • 14:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1203 (re)pooling @ 25%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66818 and previous config saved to /var/cache/conftool/dbconfig/20240718-145232-arnaudb.json
  • 14:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 25%: maintenance rescheduled', diff saved to https://phabricator.wikimedia.org/P66817 and previous config saved to /var/cache/conftool/dbconfig/20240718-145214-arnaudb.json
  • 14:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 25%: maintenance rescheduled', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240718-145157-arnaudb.json
  • 14:47 arnaudb@cumin1002: dbctl commit (dc=all): 'T365998 - depooling db1195 - s1 db1202 - s7 db1203 - s8', diff saved to https://phabricator.wikimedia.org/P66816 and previous config saved to /var/cache/conftool/dbconfig/20240718-144754-arnaudb.json
  • 14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host mw2433.codfw.wmnet
  • 14:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2433.codfw.wmnet
  • 14:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1004.eqiad.wmnet with OS bookworm
  • 14:38 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2433.codfw.wmnet
  • 14:38 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2433
  • 14:17 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 14:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] DONE helmfile.d/services/termbox: apply
  • 14:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] START helmfile.d/services/termbox: apply
  • 14:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 14:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 14:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 14:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] START helmfile.d/services/termbox: apply
  • 14:01 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] DONE helmfile.d/services/termbox: apply
  • 14:01 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] START helmfile.d/services/termbox: apply
  • 14:01 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 14:01 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 13:55 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 13:53 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] START helmfile.d/services/termbox: apply
  • 13:50 brett: Release ncmonitor 1.1.0-1 to bookworm-wikimedia
  • 13:46 Dreamy_Jazz: Afternoon UTC backport window done
  • 13:44 dreamyjazz@deploy1002: Finished scap: Backport for Allow Bureaucrats on Foundation Wiki to be able to remove Sysop rights (T370097), fix(editor): make PageTitleControl reliably blankable (T370326) (duration: 09m 59s)
  • 13:39 dreamyjazz@deploy1002: migr, dreamyjazz, dreamrimmer: Continuing with sync
  • 13:36 dreamyjazz@deploy1002: migr, dreamyjazz, dreamrimmer: Backport for Allow Bureaucrats on Foundation Wiki to be able to remove Sysop rights (T370097), fix(editor): make PageTitleControl reliably blankable (T370326) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:34 dreamyjazz@deploy1002: Started scap sync-world: Backport for Allow Bureaucrats on Foundation Wiki to be able to remove Sysop rights (T370097), fix(editor): make PageTitleControl reliably blankable (T370326)
  • 13:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1004.eqiad.wmnet with OS bookworm
  • 13:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw2432.codfw.wmnet with OS buster
  • 12:55 topranks: re-enabling interface et-1/0/2 on cr2-codfw which connects to ssw1-d8-codfw (problematic IP interfaces have been deleted) T366941
  • 12:52 topranks: re-enabling BGP between spine-layer switches in codfw (problematic IP interfaces have been deleted) T366941
  • 12:51 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:51 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove entries for IRB ints on row D spines - cmooney@cumin1002"
  • 12:50 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove entries for IRB ints on row D spines - cmooney@cumin1002"
  • 12:46 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 12:40 dreamyjazz@deploy1002: Finished scap: Backport for [GlobalBlocking] Enable global account blocks on all wikis (T356924) (duration: 09m 10s)
  • 12:35 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 12:35 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
  • 12:34 dreamyjazz@deploy1002: dreamyjazz: Backport for [GlobalBlocking] Enable global account blocks on all wikis (T356924) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw2432.codfw.wmnet with reason: host reimage
  • 12:32 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:32 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:32 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 12:30 dreamyjazz@deploy1002: Started scap sync-world: Backport for [GlobalBlocking] Enable global account blocks on all wikis (T356924)
  • 12:27 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw2432.codfw.wmnet with reason: host reimage
  • 12:25 elukey: update spicerack to 8.8.0 on cumin1002
  • 12:14 claime: restarting sync-puppet-volatile on puppetserver2001
  • 12:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw2432.codfw.wmnet with OS buster
  • 12:09 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 12:09 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 12:08 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 11:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host mw2432.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 11:39 cgoubert@cumin1002: START - Cookbook sre.hosts.provision for host mw2432.mgmt.codfw.wmnet with reboot policy GRACEFUL
  • 11:15 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 11:14 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 11:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:14 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new IRB interfaces codfw - cmooney@cumin1002"
  • 11:13 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add entries for new IRB interfaces codfw - cmooney@cumin1002"
  • 11:12 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 11:12 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 11:10 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 11:10 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 11:09 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 11:07 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 11:07 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:05 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 11:05 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 11:04 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 11:04 cmooney@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 11:03 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 10:54 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 10:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw2432.codfw.wmnet with OS buster
  • 10:28 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.convert-disks (exit_code=97) for host mw2432
  • 10:17 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
  • 10:08 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
  • 10:04 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
  • 09:56 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
  • 09:52 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
  • 09:46 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:46 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:44 elukey: upgrade spicerack to 8.8.0 on cumin2002 - testing the new release
  • 09:43 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 09:26 elukey: uploaded spicerack_8.8.0 to apt.wikimedia.org bullseye-wikimedia
  • 09:26 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 09:08 btullis: disabled check-private-data.timer on clouddb1021, pending decom.
  • 09:06 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:06 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 09:02 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:02 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 08:56 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 08:55 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 08:51 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 08:51 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 08:47 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 08:47 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 08:13 aklapper@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.14 refs T366959
  • 04:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T367856)', diff saved to https://phabricator.wikimedia.org/P66806 and previous config saved to /var/cache/conftool/dbconfig/20240718-043817-marostegui.json
  • 04:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 04:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 04:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 04:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 04:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T367856)', diff saved to https://phabricator.wikimedia.org/P66805 and previous config saved to /var/cache/conftool/dbconfig/20240718-043739-marostegui.json
  • 04:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P66804 and previous config saved to /var/cache/conftool/dbconfig/20240718-042232-marostegui.json
  • 04:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P66803 and previous config saved to /var/cache/conftool/dbconfig/20240718-040725-marostegui.json
  • 03:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T367856)', diff saved to https://phabricator.wikimedia.org/P66802 and previous config saved to /var/cache/conftool/dbconfig/20240718-035218-marostegui.json
  • 00:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic110[0-2]* for row maint - ryankemper@cumin2002 - T348977
  • 00:35 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic110[0-2]* for row maint - ryankemper@cumin2002 - T348977
  • 00:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T367781)', diff saved to https://phabricator.wikimedia.org/P66801 and previous config saved to /var/cache/conftool/dbconfig/20240718-000500-arnaudb.json

2024-07-17

  • 23:50 mutante: phabricator (phab1004) - deployed gerrit:1054907 ; restarted apache
  • 23:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P66800 and previous config saved to /var/cache/conftool/dbconfig/20240717-234953-arnaudb.json
  • 23:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P66799 and previous config saved to /var/cache/conftool/dbconfig/20240717-233446-arnaudb.json
  • 23:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T367781)', diff saved to https://phabricator.wikimedia.org/P66798 and previous config saved to /var/cache/conftool/dbconfig/20240717-231939-arnaudb.json
  • 23:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2220 (T367781)', diff saved to https://phabricator.wikimedia.org/P66797 and previous config saved to /var/cache/conftool/dbconfig/20240717-231612-arnaudb.json
  • 23:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 23:16 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 23:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T367781)', diff saved to https://phabricator.wikimedia.org/P66796 and previous config saved to /var/cache/conftool/dbconfig/20240717-231550-arnaudb.json
  • 23:14 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 23:13 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1006
  • 23:13 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1006
  • 23:13 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1004
  • 23:13 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1004
  • 23:12 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephmon1005
  • 23:11 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephmon1005
  • 23:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P66795 and previous config saved to /var/cache/conftool/dbconfig/20240717-230043-arnaudb.json
  • 22:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P66794 and previous config saved to /var/cache/conftool/dbconfig/20240717-224536-arnaudb.json
  • 22:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1006.eqiad.wmnet with OS bullseye
  • 22:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1005.eqiad.wmnet with OS bullseye
  • 22:39 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephmon1004.eqiad.wmnet with OS bullseye
  • 22:37 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php aewikimedia "Reda Kerbouche" REDACTED --bureaucrat --sysop # T362529
  • 22:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T367781)', diff saved to https://phabricator.wikimedia.org/P66793 and previous config saved to /var/cache/conftool/dbconfig/20240717-223028-arnaudb.json
  • 22:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephmon1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephmon1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:28 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephmon1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T367781)', diff saved to https://phabricator.wikimedia.org/P66792 and previous config saved to /var/cache/conftool/dbconfig/20240717-222701-arnaudb.json
  • 22:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 22:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 22:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 22:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 22:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 22:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 22:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T367781)', diff saved to https://phabricator.wikimedia.org/P66791 and previous config saved to /var/cache/conftool/dbconfig/20240717-222530-arnaudb.json
  • 22:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephmon1005.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephmon1006.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephmon1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephmon1004-6 - jclark@cumin1002"
  • 22:13 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephmon1004-6 - jclark@cumin1002"
  • 22:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P66790 and previous config saved to /var/cache/conftool/dbconfig/20240717-221023-arnaudb.json
  • 22:07 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P66789 and previous config saved to /var/cache/conftool/dbconfig/20240717-215516-arnaudb.json
  • 21:51 eileen: civicrm upgraded from 1ac3e7be to 384fe444
  • 21:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T367781)', diff saved to https://phabricator.wikimedia.org/P66788 and previous config saved to /var/cache/conftool/dbconfig/20240717-214008-arnaudb.json
  • 21:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T367781)', diff saved to https://phabricator.wikimedia.org/P66787 and previous config saved to /var/cache/conftool/dbconfig/20240717-213641-arnaudb.json
  • 21:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 21:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 21:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T367781)', diff saved to https://phabricator.wikimedia.org/P66786 and previous config saved to /var/cache/conftool/dbconfig/20240717-213619-arnaudb.json
  • away: UTC late deploys done
  • 21:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P66785 and previous config saved to /var/cache/conftool/dbconfig/20240717-212112-arnaudb.json
  • 21:19 tgr@deploy1002: Finished scap: Backport for skin-themes dblist is expanded to include tier 2 wikis as well as tier 1. (T367150) (duration: 16m 59s)
  • 21:14 tgr@deploy1002: tgr, ksarabia: Continuing with sync
  • 21:08 tgr@deploy1002: tgr, ksarabia: Backport for skin-themes dblist is expanded to include tier 2 wikis as well as tier 1. (T367150) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P66784 and previous config saved to /var/cache/conftool/dbconfig/20240717-210605-arnaudb.json
  • 21:02 tgr@deploy1002: Started scap sync-world: Backport for skin-themes dblist is expanded to include tier 2 wikis as well as tier 1. (T367150)
  • 21:01 tgr@deploy1002: Finished scap: Backport for SUL3: Fix URL handling for the SSO domain (T365162) (duration: 42m 33s)
  • 20:54 tgr@deploy1002: tgr: Continuing with sync
  • 20:53 tgr@deploy1002: tgr: Backport for SUL3: Fix URL handling for the SSO domain (T365162) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T367781)', diff saved to https://phabricator.wikimedia.org/P66783 and previous config saved to /var/cache/conftool/dbconfig/20240717-205058-arnaudb.json
  • 20:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T367781)', diff saved to https://phabricator.wikimedia.org/P66782 and previous config saved to /var/cache/conftool/dbconfig/20240717-204731-arnaudb.json
  • 20:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 20:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 20:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T367781)', diff saved to https://phabricator.wikimedia.org/P66781 and previous config saved to /var/cache/conftool/dbconfig/20240717-204709-arnaudb.json
  • 20:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P66780 and previous config saved to /var/cache/conftool/dbconfig/20240717-203202-arnaudb.json
  • 20:18 tgr@deploy1002: Started scap sync-world: Backport for SUL3: Fix URL handling for the SSO domain (T365162)
  • 20:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P66779 and previous config saved to /var/cache/conftool/dbconfig/20240717-201655-arnaudb.json
  • 20:14 tgr@deploy1002: Finished scap: Backport for SUL3: Fix cookie names on the SSO domain (T365162) (duration: 09m 23s)
  • 20:12 topranks: rebooting unused switch ssw1-d8-codfw in an effort to troubleshoot gnmic errors
  • 20:12 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-codfw,ssw1-a[1,8]-codfw.mgmt with reason: Rebooting ssw1-d8-codfw to try and fix gnmi telemetry
  • 20:12 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on cr2-codfw,ssw1-a[1,8]-codfw.mgmt with reason: Rebooting ssw1-d8-codfw to try and fix gnmi telemetry
  • 20:09 tgr@deploy1002: tgr: Continuing with sync
  • 20:07 tgr@deploy1002: tgr: Backport for SUL3: Fix cookie names on the SSO domain (T365162) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:04 tgr@deploy1002: Started scap sync-world: Backport for SUL3: Fix cookie names on the SSO domain (T365162)
  • 20:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T367781)', diff saved to https://phabricator.wikimedia.org/P66778 and previous config saved to /var/cache/conftool/dbconfig/20240717-200147-arnaudb.json
  • 19:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T367781)', diff saved to https://phabricator.wikimedia.org/P66777 and previous config saved to /var/cache/conftool/dbconfig/20240717-195921-arnaudb.json
  • 19:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 19:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 19:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 19:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 19:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T367781)', diff saved to https://phabricator.wikimedia.org/P66776 and previous config saved to /var/cache/conftool/dbconfig/20240717-195844-arnaudb.json
  • 19:45 eileen: config revision changed from 85336766 to 4ea1c745
  • 19:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P66775 and previous config saved to /var/cache/conftool/dbconfig/20240717-194337-arnaudb.json
  • 19:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P66774 and previous config saved to /var/cache/conftool/dbconfig/20240717-192830-arnaudb.json
  • 19:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T367781)', diff saved to https://phabricator.wikimedia.org/P66773 and previous config saved to /var/cache/conftool/dbconfig/20240717-191324-arnaudb.json
  • 19:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T367781)', diff saved to https://phabricator.wikimedia.org/P66772 and previous config saved to /var/cache/conftool/dbconfig/20240717-191057-arnaudb.json
  • 19:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 19:10 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 19:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T367781)', diff saved to https://phabricator.wikimedia.org/P66771 and previous config saved to /var/cache/conftool/dbconfig/20240717-191035-arnaudb.json
  • 18:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P66770 and previous config saved to /var/cache/conftool/dbconfig/20240717-185528-arnaudb.json
  • 18:46 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.14 refs T366959
  • 18:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P66769 and previous config saved to /var/cache/conftool/dbconfig/20240717-184021-arnaudb.json
  • 18:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T367781)', diff saved to https://phabricator.wikimedia.org/P66768 and previous config saved to /var/cache/conftool/dbconfig/20240717-182514-arnaudb.json
  • 18:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T367781)', diff saved to https://phabricator.wikimedia.org/P66767 and previous config saved to /var/cache/conftool/dbconfig/20240717-182147-arnaudb.json
  • 18:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 18:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 18:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T367781)', diff saved to https://phabricator.wikimedia.org/P66766 and previous config saved to /var/cache/conftool/dbconfig/20240717-182125-arnaudb.json
  • 18:10 dancy@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.14 refs T366959
  • 18:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P66765 and previous config saved to /var/cache/conftool/dbconfig/20240717-180617-arnaudb.json
  • 18:01 topranks: adjust route preference for traffic to AWS on Eqiad core routers T370297
  • 17:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121', diff saved to https://phabricator.wikimedia.org/P66764 and previous config saved to /var/cache/conftool/dbconfig/20240717-175110-arnaudb.json
  • 17:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2121 (T367781)', diff saved to https://phabricator.wikimedia.org/P66763 and previous config saved to /var/cache/conftool/dbconfig/20240717-173603-arnaudb.json
  • 17:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2121 (T367781)', diff saved to https://phabricator.wikimedia.org/P66762 and previous config saved to /var/cache/conftool/dbconfig/20240717-173336-arnaudb.json
  • 17:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 17:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2121.codfw.wmnet with reason: Maintenance
  • 17:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 17:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 17:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T367781)', diff saved to https://phabricator.wikimedia.org/P66761 and previous config saved to /var/cache/conftool/dbconfig/20240717-173257-arnaudb.json
  • 17:27 mutante: removing integration.mediawiki.org from DNS - T361250
  • 17:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P66760 and previous config saved to /var/cache/conftool/dbconfig/20240717-171750-arnaudb.json
  • 17:13 inflatador: bking@kafka-main2005 `kafka topics --create --topic ${TOPIC} --partitions 1 --replication-factor 3; kafka configs --entity-type topics --entity-name ${TOPIC} --alter --add-config retention.ms=2592000000` T367510
  • 17:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P66759 and previous config saved to /var/cache/conftool/dbconfig/20240717-170243-arnaudb.json
  • 16:59 btullis@deploy1002: Finished deploy [airflow-dags/analytics@ca21d05]: (no justification provided) (duration: 00m 51s)
  • 16:58 btullis@deploy1002: Started deploy [airflow-dags/analytics@ca21d05]: (no justification provided)
  • 16:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T367781)', diff saved to https://phabricator.wikimedia.org/P66758 and previous config saved to /var/cache/conftool/dbconfig/20240717-164736-arnaudb.json
  • 16:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T367781)', diff saved to https://phabricator.wikimedia.org/P66757 and previous config saved to /var/cache/conftool/dbconfig/20240717-164521-arnaudb.json
  • 16:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 16:45 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 16:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T367781)', diff saved to https://phabricator.wikimedia.org/P66756 and previous config saved to /var/cache/conftool/dbconfig/20240717-164459-arnaudb.json
  • 16:34 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:34 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 16:32 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 16:31 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 16:31 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:31 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 16:30 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 16:30 otto@deploy1002: Finished deploy [analytics/refinery@8f00c85] (thin): THIN [analytics/refinery@8f00c859] (duration: 04m 08s)
  • 16:29 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 16:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P66755 and previous config saved to /var/cache/conftool/dbconfig/20240717-162952-arnaudb.json
  • 16:26 otto@deploy1002: Started deploy [analytics/refinery@8f00c85] (thin): THIN [analytics/refinery@8f00c859]
  • 16:21 otto@deploy1002: Finished deploy [analytics/refinery@8f00c85]: [analytics/refinery@8f00c859] (duration: 07m 59s)
  • 16:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P66754 and previous config saved to /var/cache/conftool/dbconfig/20240717-161445-arnaudb.json
  • 16:13 otto@deploy1002: Started deploy [analytics/refinery@8f00c85]: [analytics/refinery@8f00c859]
  • 16:08 inflatador: bking@kafka-main1005 `kafka topics --create --topic ${TOPIC} --partitions 1 --replication-factor 3; kafka configs --entity-type topics --entity-name ${TOPIC} --alter --add-config retention.ms=2592000000` T367510
  • 15:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T367781)', diff saved to https://phabricator.wikimedia.org/P66752 and previous config saved to /var/cache/conftool/dbconfig/20240717-155937-arnaudb.json
  • 15:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T367781)', diff saved to https://phabricator.wikimedia.org/P66751 and previous config saved to /var/cache/conftool/dbconfig/20240717-155628-arnaudb.json
  • 15:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 15:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 15:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66750 and previous config saved to /var/cache/conftool/dbconfig/20240717-155606-arnaudb.json
  • 15:53 otto@deploy1002: Finished deploy [analytics/refinery@8f00c85] (hadoop-test): - take 2 - TEST [analytics/refinery@8f00c859] (duration: 03m 33s)
  • 15:50 otto@deploy1002: Started deploy [analytics/refinery@8f00c85] (hadoop-test): - take 2 - TEST [analytics/refinery@8f00c859]
  • 15:46 otto@deploy1002: Finished deploy [analytics/refinery@0b53772] (hadoop-test): TEST [analytics/refinery@0b53772e] (duration: 03m 27s)
  • 15:42 otto@deploy1002: Started deploy [analytics/refinery@0b53772] (hadoop-test): TEST [analytics/refinery@0b53772e]
  • 15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P66748 and previous config saved to /var/cache/conftool/dbconfig/20240717-154059-arnaudb.json
  • 15:38 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs
  • 15:37 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs
  • 15:35 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs
  • 15:35 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs
  • 15:33 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
  • 15:32 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
  • 15:32 topranks: Adjust anycast route policy at Chicago Network POP cr2-eqord to announce anycast ranges T367439
  • 15:30 sukhe: sudo cumin "A:lvs" "run-puppet-agent" to pick up apus change
  • 15:29 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs
  • 15:28 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs
  • 15:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P66747 and previous config saved to /var/cache/conftool/dbconfig/20240717-152552-arnaudb.json
  • 15:24 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 15:23 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 15:23 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 15:22 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 15:21 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2007.codfw.wmnet with OS bookworm
  • 15:21 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:21 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 15:21 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) apus.discovery.wmnet on all recursors
  • 15:20 sukhe@cumin1002: START - Cookbook sre.dns.wipe-cache apus.discovery.wmnet on all recursors
  • 15:20 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 15:19 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 15:18 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 15:18 sukhe: running authdns-update for CR 1054346
  • 15:16 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 15:16 sukhe: cumin 'A:dnsbox' 'run-puppet-agent': T279621
  • 15:13 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 15:12 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 15:11 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 15:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66745 and previous config saved to /var/cache/conftool/dbconfig/20240717-151045-arnaudb.json
  • 15:09 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 15:08 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 15:08 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 15:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66744 and previous config saved to /var/cache/conftool/dbconfig/20240717-150833-arnaudb.json
  • 15:08 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 15:08 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 15:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T367781)', diff saved to https://phabricator.wikimedia.org/P66743 and previous config saved to /var/cache/conftool/dbconfig/20240717-150811-arnaudb.json
  • 15:08 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 15:08 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 15:07 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 15:07 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 15:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2007.codfw.wmnet with reason: host reimage
  • 14:59 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2007.codfw.wmnet with reason: host reimage
  • 14:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P66742 and previous config saved to /var/cache/conftool/dbconfig/20240717-145303-arnaudb.json
  • 14:46 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
  • 14:46 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
  • 14:46 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2007.codfw.wmnet with OS bookworm
  • 14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 14:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 14:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T367856)', diff saved to https://phabricator.wikimedia.org/P66741 and previous config saved to /var/cache/conftool/dbconfig/20240717-144415-marostegui.json
  • 14:40 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
  • 14:40 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
  • 14:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P66740 and previous config saved to /var/cache/conftool/dbconfig/20240717-143756-arnaudb.json
  • 14:37 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 14:36 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 14:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P66739 and previous config saved to /var/cache/conftool/dbconfig/20240717-142908-marostegui.json
  • 14:27 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 14:27 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 14:27 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 14:27 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 14:26 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for durum3003.esams.wmnet
  • 14:26 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for durum3003.esams.wmnet
  • 14:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T367781)', diff saved to https://phabricator.wikimedia.org/P66738 and previous config saved to /var/cache/conftool/dbconfig/20240717-142249-arnaudb.json
  • 14:22 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on durum3003.esams.wmnet with reason: testing anycast-healthchecker 0.9.8
  • 14:22 sukhe@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on durum3003.esams.wmnet with reason: testing anycast-healthchecker 0.9.8
  • 14:20 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2008.codfw.wmnet with OS bookworm
  • 14:20 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T367781)', diff saved to https://phabricator.wikimedia.org/P66737 and previous config saved to /var/cache/conftool/dbconfig/20240717-141939-arnaudb.json
  • 14:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 14:19 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 14:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T367781)', diff saved to https://phabricator.wikimedia.org/P66736 and previous config saved to /var/cache/conftool/dbconfig/20240717-141929-arnaudb.json
  • 14:19 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:18 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 14:17 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:17 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:16 sukhe: [durum3003] upgrade anycast-healthchecker to 0.9.8-1+wmf12u1: T370068
  • 14:16 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:14 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
  • 14:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P66735 and previous config saved to /var/cache/conftool/dbconfig/20240717-141401-marostegui.json
  • 14:11 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:11 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
  • 14:11 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:07 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 14:06 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 14:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P66734 and previous config saved to /var/cache/conftool/dbconfig/20240717-140423-arnaudb.json
  • 14:03 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2008.codfw.wmnet with reason: host reimage
  • 13:59 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
  • 13:59 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2008.codfw.wmnet with reason: host reimage
  • 13:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T367856)', diff saved to https://phabricator.wikimedia.org/P66733 and previous config saved to /var/cache/conftool/dbconfig/20240717-135854-marostegui.json
  • 13:56 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
  • 13:54 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
  • 13:54 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
  • 13:53 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
  • 13:53 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
  • 13:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P66732 and previous config saved to /var/cache/conftool/dbconfig/20240717-134916-arnaudb.json
  • 13:43 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
  • 13:43 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2008.codfw.wmnet with OS bookworm
  • 13:40 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
  • 13:37 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 13:36 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dbproxy2008.codfw.wmnet with OS bookworm
  • 13:34 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 13:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T367781)', diff saved to https://phabricator.wikimedia.org/P66730 and previous config saved to /var/cache/conftool/dbconfig/20240717-133408-arnaudb.json
  • 13:33 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 13:33 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 13:29 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
  • 13:26 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
  • 13:26 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2008.codfw.wmnet with OS bookworm
  • 13:19 urbanecm: Stop revalidateLinkRecommendations for azwiki; restart as `[urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=azwiki --olderThan=20240104000000 --verbose` instead (T370262)
  • 13:13 urbanecm@deploy1002: Finished scap: Backport for Add Portal namespace for Ingush Wikipedia (T326089), eventbus: enable instrumentation on group 0 (T363587) (duration: 10m 06s)
  • 13:12 urbanecm: [urbanecm@mwmaint1002 ~]$ mwscript extensions/GrowthExperiments/maintenance/revalidateLinkRecommendations.php --wiki=azwiki --all --verbose # T370262
  • 13:08 urbanecm@deploy1002: nmw03, gmodena, urbanecm: Continuing with sync
  • 13:07 sukhe: [intentional] stop nginx.service on durum1001
  • 13:05 urbanecm@deploy1002: nmw03, gmodena, urbanecm: Backport for Add Portal namespace for Ingush Wikipedia (T326089), eventbus: enable instrumentation on group 0 (T363587) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:03 urbanecm@deploy1002: Started scap sync-world: Backport for Add Portal namespace for Ingush Wikipedia (T326089), eventbus: enable instrumentation on group 0 (T363587)
  • 12:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T367781)', diff saved to https://phabricator.wikimedia.org/P66729 and previous config saved to /var/cache/conftool/dbconfig/20240717-123352-arnaudb.json
  • 12:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 12:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 12:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T367781)', diff saved to https://phabricator.wikimedia.org/P66728 and previous config saved to /var/cache/conftool/dbconfig/20240717-123341-arnaudb.json
  • 12:31 urbanecm: Community configuration deployment finished
  • 12:29 urbanecm@deploy1002: Finished scap: Backport for CommunityConfiguration: Release to all Growth wikis, except frwiktionary (T366458), dewiki: Disable CommunityConfiguration (T366458) (duration: 08m 30s)
  • 12:24 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 12:23 urbanecm@deploy1002: urbanecm: Backport for CommunityConfiguration: Release to all Growth wikis, except frwiktionary (T366458), dewiki: Disable CommunityConfiguration (T366458) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:21 urbanecm@deploy1002: Started scap sync-world: Backport for CommunityConfiguration: Release to all Growth wikis, except frwiktionary (T366458), dewiki: Disable CommunityConfiguration (T366458)
  • 12:19 urbanecm@deploy1002: Sync cancelled.
  • 12:19 urbanecm: (relogging to attach to the task) migrateCommunityConfig.php finished, logs are available at https://phabricator.wikimedia.org/P66724 (T366458)
  • 12:18 urbanecm: migrateCommunityConfig.php finished, logs are available at https://phabricator.wikimedia.org/P66724
  • 12:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P66725 and previous config saved to /var/cache/conftool/dbconfig/20240717-121834-arnaudb.json
  • 12:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P66723 and previous config saved to /var/cache/conftool/dbconfig/20240717-120327-arnaudb.json
  • 11:57 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
  • 11:54 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
  • 11:52 urbanecm: [urbanecm@mwdebug1001 ~]$ foreachwikiindblist growthexperiments extensions/GrowthExperiments/maintenance/migrateCommunityConfig.php # T366458; output logged to migrateCommunityConfig.log in my home
  • 11:51 urbanecm@deploy1002: urbanecm: Backport for CommunityConfiguration: Release to all Growth wikis, except frwiktionary (T366458) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:49 urbanecm@deploy1002: Started scap sync-world: Backport for CommunityConfiguration: Release to all Growth wikis, except frwiktionary (T366458)
  • 11:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T367781)', diff saved to https://phabricator.wikimedia.org/P66722 and previous config saved to /var/cache/conftool/dbconfig/20240717-114820-arnaudb.json
  • 11:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T367781)', diff saved to https://phabricator.wikimedia.org/P66721 and previous config saved to /var/cache/conftool/dbconfig/20240717-114510-arnaudb.json
  • 11:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 11:44 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 11:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T367781)', diff saved to https://phabricator.wikimedia.org/P66720 and previous config saved to /var/cache/conftool/dbconfig/20240717-114426-arnaudb.json
  • 11:40 marostegui@cumin1002: dbctl commit (dc=all): 'Increase db2136's weight - testing 10.11 T365805', diff saved to https://phabricator.wikimedia.org/P66719 and previous config saved to /var/cache/conftool/dbconfig/20240717-114032-marostegui.json
  • 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T367856)', diff saved to https://phabricator.wikimedia.org/P66718 and previous config saved to /var/cache/conftool/dbconfig/20240717-113954-marostegui.json
  • 11:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 11:39 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T367856)', diff saved to https://phabricator.wikimedia.org/P66717 and previous config saved to /var/cache/conftool/dbconfig/20240717-113932-marostegui.json
  • 11:38 _joe_: deleted pod that was reportedly returning 5xx to the cdn for mw-api-ext
  • 11:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P66716 and previous config saved to /var/cache/conftool/dbconfig/20240717-112919-arnaudb.json
  • 11:27 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.convert-disks (exit_code=99) for host mw2432
  • 11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.convert-disks for host mw2432
  • 11:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P66715 and previous config saved to /var/cache/conftool/dbconfig/20240717-112425-marostegui.json
  • 11:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw2432.codfw.wmnet with reason: RAID conversion testing
  • 11:22 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2432.codfw.wmnet with reason: RAID conversion testing
  • 11:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P66714 and previous config saved to /var/cache/conftool/dbconfig/20240717-111412-arnaudb.json
  • 11:12 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d8-codfw
  • 11:10 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d8-codfw
  • 11:10 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d7-codfw
  • 11:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140', diff saved to https://phabricator.wikimedia.org/P66713 and previous config saved to /var/cache/conftool/dbconfig/20240717-110918-marostegui.json
  • 11:08 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d7-codfw
  • 11:08 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d6-codfw
  • 11:05 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d6-codfw
  • 11:05 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d5-codfw
  • 11:03 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d5-codfw
  • 11:03 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d4-codfw
  • 11:01 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d4-codfw
  • 11:01 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d3-codfw
  • 10:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T367781)', diff saved to https://phabricator.wikimedia.org/P66712 and previous config saved to /var/cache/conftool/dbconfig/20240717-105904-arnaudb.json
  • 10:58 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d3-codfw
  • 10:58 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d2-codfw
  • 10:56 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d2-codfw
  • 10:56 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c7-codfw
  • 10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2140 (T367856)', diff saved to https://phabricator.wikimedia.org/P66711 and previous config saved to /var/cache/conftool/dbconfig/20240717-105411-marostegui.json
  • 10:53 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c7-codfw
  • 10:53 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c6-codfw
  • 10:51 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c6-codfw
  • 10:51 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c5-codfw
  • 10:49 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c5-codfw
  • 10:49 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c4-codfw
  • 10:46 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c4-codfw
  • 10:46 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c3-codfw
  • 10:44 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c3-codfw
  • 10:44 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c2-codfw
  • 10:41 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c2-codfw
  • 10:41 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-c1-codfw
  • 10:39 cmooney@cumin1002: START - Cookbook sre.network.tls for network device lsw1-c1-codfw
  • 10:39 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-d8-codfw
  • 10:37 cmooney@cumin1002: START - Cookbook sre.network.tls for network device ssw1-d8-codfw
  • 10:37 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device ssw1-d1-codfw
  • 10:34 cmooney@cumin1002: START - Cookbook sre.network.tls for network device ssw1-d1-codfw
  • 10:34 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b4-magru
  • 10:32 cmooney@cumin1002: START - Cookbook sre.network.tls for network device asw1-b4-magru
  • 10:32 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device asw1-b3-magru
  • 10:29 cmooney@cumin1002: START - Cookbook sre.network.tls for network device asw1-b3-magru
  • 09:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T367781)', diff saved to https://phabricator.wikimedia.org/P66710 and previous config saved to /var/cache/conftool/dbconfig/20240717-095845-arnaudb.json
  • 09:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1181 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66709 and previous config saved to /var/cache/conftool/dbconfig/20240717-094412-root.json
  • 09:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1181 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66708 and previous config saved to /var/cache/conftool/dbconfig/20240717-092907-root.json
  • 09:18 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr2-magru
  • 09:14 cmooney@cumin1002: START - Cookbook sre.network.tls for network device cr2-magru
  • 09:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1181 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66706 and previous config saved to /var/cache/conftool/dbconfig/20240717-091402-root.json
  • 09:13 cmooney@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device cr1-magru
  • 09:08 cmooney@cumin1002: START - Cookbook sre.network.tls for network device cr1-magru
  • 09:02 elukey@puppetserver1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 08:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1181 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66705 and previous config saved to /var/cache/conftool/dbconfig/20240717-085857-root.json
  • 08:57 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4037.ulsfo.wmnet
  • 08:48 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4037.ulsfo.wmnet
  • 08:47 elukey@puppetserver1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1181 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66704 and previous config saved to /var/cache/conftool/dbconfig/20240717-084351-root.json
  • 08:06 kartik@deploy1002: Finished scap: Backport for TranslatablePageState: Check if banner namespaces are configured (T370219) (duration: 14m 26s)
  • 08:00 kartik@deploy1002: abi, kartik: Continuing with sync
  • 07:54 kartik@deploy1002: abi, kartik: Backport for TranslatablePageState: Check if banner namespaces are configured (T370219) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:51 kartik@deploy1002: Started scap sync-world: Backport for TranslatablePageState: Check if banner namespaces are configured (T370219)
  • 07:50 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:50 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 07:50 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 07:49 elukey: restart hadoop-mapreduce-historyserver.service on an-master1003 - failed for Java OOM
  • 07:49 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 07:38 elukey@cumin1002: END (PASS) - Cookbook sre.network.tls (exit_code=0) for network device lsw1-d1-codfw
  • 07:37 jayme: imported helm3 3.11.3 to bullseye-wikimedia and buster-wikimedia
  • 07:36 elukey@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d1-codfw
  • 06:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'clear' for AS: 17072
  • 06:48 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'clear' for AS: 17072
  • 05:36 marostegui: Deploy schema change on s7 eqiad db1181 dbmaint T367856
  • 05:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1181.eqiad.wmnet with reason: Long schema change
  • 05:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1181.eqiad.wmnet with reason: Long schema change
  • 05:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1181 T370121', diff saved to https://phabricator.wikimedia.org/P66703 and previous config saved to /var/cache/conftool/dbconfig/20240717-053359-marostegui.json
  • 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1236 to s7 primary and set section read-write T370121', diff saved to https://phabricator.wikimedia.org/P66702 and previous config saved to /var/cache/conftool/dbconfig/20240717-053302-root.json
  • 05:32 marostegui@cumin1002: dbctl commit (dc=all): 'Set s7 eqiad as read-only for maintenance - T370121', diff saved to https://phabricator.wikimedia.org/P66701 and previous config saved to /var/cache/conftool/dbconfig/20240717-053230-root.json
  • 05:32 marostegui: Starting s7 eqiad failover from db1181 to db1236 - T370121
  • 05:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T370121
  • 05:14 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1236 with weight 0 T370121', diff saved to https://phabricator.wikimedia.org/P66700 and previous config saved to /var/cache/conftool/dbconfig/20240717-051419-root.json
  • 05:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 T370121
  • 02:56 eileen: civicrm upgraded from 4f919c1e to 1ac3e7be
  • 00:42 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
  • 00:42 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad

2024-07-16

  • 23:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367781)', diff saved to https://phabricator.wikimedia.org/P66699 and previous config saved to /var/cache/conftool/dbconfig/20240716-233336-arnaudb.json
  • 23:25 cstone: civicrm upgraded from 8dbcdfb7 to 4f919c1e
  • 23:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P66698 and previous config saved to /var/cache/conftool/dbconfig/20240716-231829-arnaudb.json
  • 23:04 eileen: config revision changed from a1ed167f to 85336766
  • 23:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P66697 and previous config saved to /var/cache/conftool/dbconfig/20240716-230322-arnaudb.json
  • 22:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367781)', diff saved to https://phabricator.wikimedia.org/P66696 and previous config saved to /var/cache/conftool/dbconfig/20240716-224815-arnaudb.json
  • 22:40 tzatziki: removing 9 files for legal compliance
  • 22:37 eileen: civicrm upgraded from 3287ced0 to 8dbcdfb7
  • 22:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2209 (T367781)', diff saved to https://phabricator.wikimedia.org/P66695 and previous config saved to /var/cache/conftool/dbconfig/20240716-222638-arnaudb.json
  • 22:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 22:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 22:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66694 and previous config saved to /var/cache/conftool/dbconfig/20240716-222616-arnaudb.json
  • 22:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P66693 and previous config saved to /var/cache/conftool/dbconfig/20240716-221109-arnaudb.json
  • 21:59 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy2008.codfw.wmnet with OS bookworm
  • 21:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P66692 and previous config saved to /var/cache/conftool/dbconfig/20240716-215601-arnaudb.json
  • 21:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66691 and previous config saved to /var/cache/conftool/dbconfig/20240716-214054-arnaudb.json
  • 21:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T367781)', diff saved to https://phabricator.wikimedia.org/P66690 and previous config saved to /var/cache/conftool/dbconfig/20240716-211914-arnaudb.json
  • 21:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 21:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 21:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T367781)', diff saved to https://phabricator.wikimedia.org/P66689 and previous config saved to /var/cache/conftool/dbconfig/20240716-211852-arnaudb.json
  • 21:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P66688 and previous config saved to /var/cache/conftool/dbconfig/20240716-210345-arnaudb.json
  • 20:54 urbanecm@deploy1002: Finished scap: Backport for [July 16th] Enable dark mode for logged out users (tier 1) (T367150) (duration: 08m 43s)
  • 20:49 urbanecm@deploy1002: urbanecm, jdlrobson: Continuing with sync
  • 20:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P66687 and previous config saved to /var/cache/conftool/dbconfig/20240716-204838-arnaudb.json
  • 20:48 urbanecm@deploy1002: urbanecm, jdlrobson: Backport for [July 16th] Enable dark mode for logged out users (tier 1) (T367150) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:45 urbanecm@deploy1002: Started scap sync-world: Backport for [July 16th] Enable dark mode for logged out users (tier 1) (T367150)
  • 20:39 urbanecm@deploy1002: Finished scap: Backport for Ensure every test-config has valid defaults, Merge partial config with defaults (T368606), Merge partial config with defaults (T368606) (duration: 09m 55s)
  • 20:38 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2008.codfw.wmnet with OS bookworm
  • 20:34 urbanecm@deploy1002: urbanecm, migr: Continuing with sync
  • 20:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T367781)', diff saved to https://phabricator.wikimedia.org/P66686 and previous config saved to /var/cache/conftool/dbconfig/20240716-203331-arnaudb.json
  • 20:33 urbanecm@deploy1002: urbanecm, migr: Backport for Ensure every test-config has valid defaults, Merge partial config with defaults (T368606), Merge partial config with defaults (T368606) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:30 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host dbproxy2008.codfw.wmnet with OS bookworm
  • 20:30 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2008.codfw.wmnet with OS bookworm
  • 20:29 urbanecm@deploy1002: Started scap sync-world: Backport for Ensure every test-config has valid defaults, Merge partial config with defaults (T368606), Merge partial config with defaults (T368606)
  • 20:27 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host dbproxy2008.codfw.wmnet with OS bookworm
  • 20:14 urbanecm@deploy1002: Finished scap: Backport for foundationwiki: Restrict `unfuzzy` right to autoconfirmed users (T369979) (duration: 09m 31s)
  • 20:12 swfrench@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=appservers-ro,name=eqiad [reason: Repooling to concentrate clients in eqiad - T367949]
  • 20:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T367781)', diff saved to https://phabricator.wikimedia.org/P66685 and previous config saved to /var/cache/conftool/dbconfig/20240716-201153-arnaudb.json
  • 20:11 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 20:11 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 20:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367781)', diff saved to https://phabricator.wikimedia.org/P66684 and previous config saved to /var/cache/conftool/dbconfig/20240716-201131-arnaudb.json
  • 20:09 urbanecm@deploy1002: seawolf35gerrit, urbanecm: Continuing with sync
  • 20:09 urbanecm@deploy1002: seawolf35gerrit, urbanecm: Backport for foundationwiki: Restrict `unfuzzy` right to autoconfirmed users (T369979) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:05 urbanecm@deploy1002: Started scap sync-world: Backport for foundationwiki: Restrict `unfuzzy` right to autoconfirmed users (T369979)
  • 19:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P66683 and previous config saved to /var/cache/conftool/dbconfig/20240716-195624-arnaudb.json
  • 19:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P66682 and previous config saved to /var/cache/conftool/dbconfig/20240716-194117-arnaudb.json
  • 19:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367781)', diff saved to https://phabricator.wikimedia.org/P66681 and previous config saved to /var/cache/conftool/dbconfig/20240716-192610-arnaudb.json
  • 19:25 swfrench@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=appservers-ro,name=eqiad [reason: Depooling ahead of turndown - T367949]
  • 19:24 swfrench-wmf: depooling appservers-ro in eqiad, which is not used by remaining analytics workloads - T367949
  • 19:18 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 19:18 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 19:17 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 19:15 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 19:07 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2008.codfw.wmnet with OS bookworm
  • 19:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T367781)', diff saved to https://phabricator.wikimedia.org/P66680 and previous config saved to /var/cache/conftool/dbconfig/20240716-190526-arnaudb.json
  • 19:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 19:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 19:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66679 and previous config saved to /var/cache/conftool/dbconfig/20240716-190504-arnaudb.json
  • 18:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2140 (T367856)', diff saved to https://phabricator.wikimedia.org/P66678 and previous config saved to /var/cache/conftool/dbconfig/20240716-185657-marostegui.json
  • 18:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 18:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Maintenance
  • 18:51 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 18:50 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 18:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P66677 and previous config saved to /var/cache/conftool/dbconfig/20240716-184956-arnaudb.json
  • 18:49 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 18:49 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 18:45 pt1979@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host dbproxy2007.codfw.wmnet with OS bookworm
  • 18:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P66675 and previous config saved to /var/cache/conftool/dbconfig/20240716-183449-arnaudb.json
  • 18:27 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2007.codfw.wmnet with OS bookworm
  • 18:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66674 and previous config saved to /var/cache/conftool/dbconfig/20240716-181942-arnaudb.json
  • 18:14 dancy@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.14 refs T366959
  • 18:00 dancy@deploy1002: Installing scap version "4.92.0" for 232 hosts
  • 17:59 otto@deploy1002: Finished deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s - take 3 [analytics/refinery@f97900c9] (duration: 00m 47s)
  • 17:58 otto@deploy1002: Started deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s - take 3 [analytics/refinery@f97900c9]
  • 17:58 otto@deploy1002: Finished deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s - take 2 [analytics/refinery@f97900c9] (duration: 02m 44s)
  • 17:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66672 and previous config saved to /var/cache/conftool/dbconfig/20240716-175820-arnaudb.json
  • 17:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 17:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 17:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367781)', diff saved to https://phabricator.wikimedia.org/P66671 and previous config saved to /var/cache/conftool/dbconfig/20240716-175742-arnaudb.json
  • 17:55 otto@deploy1002: Started deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s - take 2 [analytics/refinery@f97900c9]
  • 17:55 otto@deploy1002: Finished deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s [analytics/refinery@f97900c9] (duration: 08m 33s)
  • 17:55 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 17:53 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 17:47 otto@deploy1002: Started deploy [analytics/refinery@f97900c]: Deploy refinery with refinery-source version 0.2.44 for mw on k8s [analytics/refinery@f97900c9]
  • 17:47 otto@deploy1002: Finished deploy [analytics/refinery@f97900c] (hadoop-test): Deploy refinery with refinery-source version 0.2.44 for mw on k8s - TEST [analytics/refinery@f97900c9] (duration: 03m 23s)
  • 17:46 swfrench-wmf: appservers-rw and api-rw now resolve to failoid - T367949
  • 17:44 otto@deploy1002: Started deploy [analytics/refinery@f97900c] (hadoop-test): Deploy refinery with refinery-source version 0.2.44 for mw on k8s - TEST [analytics/refinery@f97900c9]
  • 17:44 swfrench@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=api-rw,name=eqiad [reason: Depooling ahead of turndown - T367949]
  • 17:43 swfrench@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=appservers-rw,name=eqiad [reason: Depooling ahead of turndown - T367949]
  • 17:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P66670 and previous config saved to /var/cache/conftool/dbconfig/20240716-174235-arnaudb.json
  • 17:40 swfrench@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=api-ro,name=codfw [reason: Depooling ahead of turndown - T367949]
  • 17:39 swfrench@cumin2002: conftool action : set/pooled=false; selector: dnsdisc=appservers-ro,name=codfw [reason: Depooling ahead of turndown - T367949]
  • 17:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P66669 and previous config saved to /var/cache/conftool/dbconfig/20240716-172727-arnaudb.json
  • 17:14 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2006.codfw.wmnet with OS bookworm
  • 17:14 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:12 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 17:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367781)', diff saved to https://phabricator.wikimedia.org/P66668 and previous config saved to /var/cache/conftool/dbconfig/20240716-171220-arnaudb.json
  • 17:00 mutante: lists2001 - systemctl reset-failed after gerrit:1054610 to fix T370098
  • 16:56 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2006.codfw.wmnet with reason: host reimage
  • 16:53 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2006.codfw.wmnet with reason: host reimage
  • 16:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T367781)', diff saved to https://phabricator.wikimedia.org/P66667 and previous config saved to /var/cache/conftool/dbconfig/20240716-165135-arnaudb.json
  • 16:51 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 16:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 16:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 100%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66666 and previous config saved to /var/cache/conftool/dbconfig/20240716-164446-arnaudb.json
  • 16:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 100%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66665 and previous config saved to /var/cache/conftool/dbconfig/20240716-164437-arnaudb.json
  • 16:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 100%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66664 and previous config saved to /var/cache/conftool/dbconfig/20240716-164422-arnaudb.json
  • 16:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2006.codfw.wmnet with OS bookworm
  • 16:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 16:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 16:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T367781)', diff saved to https://phabricator.wikimedia.org/P66663 and previous config saved to /var/cache/conftool/dbconfig/20240716-163059-arnaudb.json
  • 16:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 75%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66662 and previous config saved to /var/cache/conftool/dbconfig/20240716-162940-arnaudb.json
  • 16:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 75%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66661 and previous config saved to /var/cache/conftool/dbconfig/20240716-162931-arnaudb.json
  • 16:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 75%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66660 and previous config saved to /var/cache/conftool/dbconfig/20240716-162916-arnaudb.json
  • 16:21 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:21 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge DNS franio changes (add mgmt IPs) - sukhe@cumin1002"
  • 16:20 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merge DNS franio changes (add mgmt IPs) - sukhe@cumin1002"
  • 16:18 sukhe@cumin1002: START - Cookbook sre.dns.netbox
  • 16:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P66659 and previous config saved to /var/cache/conftool/dbconfig/20240716-161552-arnaudb.json
  • 16:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 50%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66658 and previous config saved to /var/cache/conftool/dbconfig/20240716-161435-arnaudb.json
  • 16:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 50%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66657 and previous config saved to /var/cache/conftool/dbconfig/20240716-161426-arnaudb.json
  • 16:14 arnaudb@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 50%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66656 and previous config saved to /var/cache/conftool/dbconfig/20240716-161411-arnaudb.json
  • 16:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127', diff saved to https://phabricator.wikimedia.org/P66655 and previous config saved to /var/cache/conftool/dbconfig/20240716-160044-arnaudb.json
  • 15:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 25%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66654 and previous config saved to /var/cache/conftool/dbconfig/20240716-155930-arnaudb.json
  • 15:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 25%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66653 and previous config saved to /var/cache/conftool/dbconfig/20240716-155920-arnaudb.json
  • 15:59 arnaudb@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 25%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66652 and previous config saved to /var/cache/conftool/dbconfig/20240716-155905-arnaudb.json
  • 15:58 elukey: uploaded spicerack_8.7.0 to apt.wikimedia.org bullseye-wikimedia
  • 15:52 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66651 and previous config saved to /var/cache/conftool/dbconfig/20240716-155221-root.json
  • 15:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2127 (T367781)', diff saved to https://phabricator.wikimedia.org/P66650 and previous config saved to /var/cache/conftool/dbconfig/20240716-154537-arnaudb.json
  • 15:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 10%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66649 and previous config saved to /var/cache/conftool/dbconfig/20240716-154424-arnaudb.json
  • 15:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 10%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66648 and previous config saved to /var/cache/conftool/dbconfig/20240716-154415-arnaudb.json
  • 15:44 arnaudb@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 10%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66647 and previous config saved to /var/cache/conftool/dbconfig/20240716-154401-arnaudb.json
  • 15:39 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:39 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:37 papaul: reboot fpc0 on fasw-c-codfw.mgmt.codfw.wmnet
  • 15:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66646 and previous config saved to /var/cache/conftool/dbconfig/20240716-153715-root.json
  • 15:36 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:35 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:32 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:32 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db1201 (re)pooling @ 5%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66645 and previous config saved to /var/cache/conftool/dbconfig/20240716-152918-arnaudb.json
  • 15:29 arnaudb@cumin1002: dbctl commit (dc=all): 'db1200 (re)pooling @ 5%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66644 and previous config saved to /var/cache/conftool/dbconfig/20240716-152910-arnaudb.json
  • 15:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db1194 (re)pooling @ 5%: post T365997 repool', diff saved to https://phabricator.wikimedia.org/P66643 and previous config saved to /var/cache/conftool/dbconfig/20240716-152855-arnaudb.json
  • 15:27 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=(kubernetes1062.eqiad.wmnet|mw1494.eqiad.wmnet|mw1495.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 15:27 claime: Uncordoning kubernetes1062.eqiad.wmnet mw1494.eqiad.wmnet mw1495.eqiad.wmnet - T365997
  • 15:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2127 (T367781)', diff saved to https://phabricator.wikimedia.org/P66642 and previous config saved to /var/cache/conftool/dbconfig/20240716-152349-arnaudb.json
  • 15:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 15:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2127.codfw.wmnet with reason: Maintenance
  • 15:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66641 and previous config saved to /var/cache/conftool/dbconfig/20240716-152209-root.json
  • 15:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:19 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 15:15 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 15:15 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 15:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367781)', diff saved to https://phabricator.wikimedia.org/P66640 and previous config saved to /var/cache/conftool/dbconfig/20240716-151516-arnaudb.json
  • 15:08 topranks: Rebooting lsw1-f2-eqiad to complete JunOS upgrade T365997
  • 15:08 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 21 hosts with reason: JunOS upgrade lsw1-f2-eqiad
  • 15:07 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 21 hosts with reason: JunOS upgrade lsw1-f2-eqiad
  • 15:07 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-f2-eqiad,lsw1-f2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f2-eqiad
  • 15:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66638 and previous config saved to /var/cache/conftool/dbconfig/20240716-150704-root.json
  • 15:06 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-f2-eqiad,lsw1-f2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f2-eqiad
  • 15:06 brennen@deploy1002: Finished deploy [phabricator/deployment@7335128]: deploy phab1004 for T370109 (duration: 00m 52s)
  • 15:05 godog: silence OtelCollectorRefusedSpans in codfw for 7d - T370043
  • 15:05 godog: silence OtelCollectorRefusedSpans in codfw for 7d
  • 15:05 brennen@deploy1002: Started deploy [phabricator/deployment@7335128]: deploy phab1004 for T370109
  • 15:04 brennen@deploy1002: Finished deploy [phabricator/deployment@7335128]: test deploy phab2002 for T370109 (duration: 00m 34s)
  • 15:04 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
  • 15:04 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab.wmfusercontent.org with reason: Phabricator/Phorge update
  • 15:04 brennen@deploy1002: Started deploy [phabricator/deployment@7335128]: test deploy phab2002 for T370109
  • 15:02 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:02 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
  • 15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
  • 15:01 urbanecm@deploy1002: Finished scap: Backport for Introduce Vanish Request Flow (T367329 T367726 T367728 T367729 T367744 T368177 T368285 T368368 T368372 T368611 T369489), Pass wiki id to actor store for cross-db hasPublicLogs query (T370059), Properly set automatic vanish performer on GlobalRenameUser (T368177), Enable account vanishing in Centra… (gerrit:1053373)
  • 15:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P66637 and previous config saved to /var/cache/conftool/dbconfig/20240716-150007-arnaudb.json
  • 14:53 urbanecm@deploy1002: dbrant, urbanecm: Continuing with sync
  • 14:53 urbanecm@deploy1002: dbrant, urbanecm: Backport for Introduce Vanish Request Flow (T367329 T367726 T367728 T367729 T367744 T368177 T368285 T368368 T368372 T368611 T369489), Pass wiki id to actor store for cross-db hasPublicLogs query (T370059), Properly set automatic vanish performer on GlobalRenameUser (T368177), Enable account vanishing in Cen… (gerrit:1053373)
  • 14:53 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on centrallog2002.codfw.wmnet with reason: network upgrade
  • 14:53 filippo@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on centrallog2002.codfw.wmnet with reason: network upgrade
  • 14:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1158 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66636 and previous config saved to /var/cache/conftool/dbconfig/20240716-145159-root.json
  • 14:49 sukhe: [durum1001] upgrade anycast-healthchecker to 0.9.8-1+wmf12u1: T370068
  • 14:46 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-f2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f2-eqiad
  • 14:46 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-f2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f2-eqiad
  • 14:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P66635 and previous config saved to /var/cache/conftool/dbconfig/20240716-144500-arnaudb.json
  • 14:44 sukhe: reprepro -C main include bookworm-wikimedia anycast-healthchecker_0.9.8-1+wmf12u1_amd64.changes: T370068
  • 14:36 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=(kubernetes1062.eqiad.wmnet|mw1494.eqiad.wmnet|mw1495.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 14:34 claime: Cordoning kubernetes1062.eqiad.wmnet mw1494.eqiad.wmnet mw1495.eqiad.wmnet - T365997
  • 14:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[1194,1200-1201].eqiad.wmnet,dbstore1009.eqiad.wmnet with reason: T365997
  • 14:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db[1194,1200-1201].eqiad.wmnet,dbstore1009.eqiad.wmnet with reason: T365997
  • 14:33 arnaudb@cumin1002: dbctl commit (dc=all): 'T365997 - depool db1194-s7,db1200-s5,db1201-s6', diff saved to https://phabricator.wikimedia.org/P66634 and previous config saved to /var/cache/conftool/dbconfig/20240716-143306-arnaudb.json
  • 14:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367781)', diff saved to https://phabricator.wikimedia.org/P66633 and previous config saved to /var/cache/conftool/dbconfig/20240716-142953-arnaudb.json
  • 14:26 urbanecm@deploy1002: Started scap sync-world: Backport for Introduce Vanish Request Flow (T367329 T367726 T367728 T367729 T367744 T368177 T368285 T368368 T368372 T368611 T369489), Pass wiki id to actor store for cross-db hasPublicLogs query (T370059), Properly set automatic vanish performer on GlobalRenameUser (T368177), Enable account vanishing… (gerrit:1053373)
  • 14:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T367781)', diff saved to https://phabricator.wikimedia.org/P66632 and previous config saved to /var/cache/conftool/dbconfig/20240716-142321-arnaudb.json
  • 14:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 14:22 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 14:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 14:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367781)', diff saved to https://phabricator.wikimedia.org/P66631 and previous config saved to /var/cache/conftool/dbconfig/20240716-142029-arnaudb.json
  • 14:12 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:11 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:10 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:08 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:07 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:07 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P66630 and previous config saved to /var/cache/conftool/dbconfig/20240716-140522-arnaudb.json
  • 14:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts mw2432.codfw.wmnet
  • 13:53 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw2432.codfw.wmnet
  • 13:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P66629 and previous config saved to /var/cache/conftool/dbconfig/20240716-135015-arnaudb.json
  • away: UTC afternoon deploys done
  • 13:39 tgr@deploy1002: Finished scap: Backport for Handle sso.wikimedia.org domain (T365162) (duration: 19m 07s)
  • 13:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367781)', diff saved to https://phabricator.wikimedia.org/P66628 and previous config saved to /var/cache/conftool/dbconfig/20240716-133508-arnaudb.json
  • 13:34 tgr@deploy1002: tgr: Continuing with sync
  • 13:29 mforns@deploy1002: Finished deploy [airflow-dags/analytics@1ee55b8]: (no justification provided) (duration: 00m 30s)
  • 13:29 mforns@deploy1002: Started deploy [airflow-dags/analytics@1ee55b8]: (no justification provided)
  • 13:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T367781)', diff saved to https://phabricator.wikimedia.org/P66627 and previous config saved to /var/cache/conftool/dbconfig/20240716-132915-arnaudb.json
  • 13:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 13:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 13:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66626 and previous config saved to /var/cache/conftool/dbconfig/20240716-132853-arnaudb.json
  • 13:22 tgr@deploy1002: tgr: Backport for Handle sso.wikimedia.org domain (T365162) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:20 tgr@deploy1002: Started scap sync-world: Backport for Handle sso.wikimedia.org domain (T365162)
  • 13:15 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for EventStreamConfig: Enable hive ingestion for mediawiki.page-delete (T367134) (duration: 10m 15s)
  • 13:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P66625 and previous config saved to /var/cache/conftool/dbconfig/20240716-131346-arnaudb.json
  • 13:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 tchin, lucaswerkmeister-wmde: Continuing with sync
  • 13:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 tchin, lucaswerkmeister-wmde: Backport for EventStreamConfig: Enable hive ingestion for mediawiki.page-delete (T367134) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for EventStreamConfig: Enable hive ingestion for mediawiki.page-delete (T367134)
  • 12:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P66624 and previous config saved to /var/cache/conftool/dbconfig/20240716-125839-arnaudb.json
  • 12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T367856)', diff saved to https://phabricator.wikimedia.org/P66623 and previous config saved to /var/cache/conftool/dbconfig/20240716-124604-marostegui.json
  • 12:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 12:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367856)', diff saved to https://phabricator.wikimedia.org/P66622 and previous config saved to /var/cache/conftool/dbconfig/20240716-124543-marostegui.json
  • 12:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66621 and previous config saved to /var/cache/conftool/dbconfig/20240716-124332-arnaudb.json
  • 12:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P66620 and previous config saved to /var/cache/conftool/dbconfig/20240716-123035-marostegui.json
  • 12:20 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66619 and previous config saved to /var/cache/conftool/dbconfig/20240716-122039-root.json
  • 12:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P66618 and previous config saved to /var/cache/conftool/dbconfig/20240716-121528-marostegui.json
  • 12:10 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.7 to netbox-next - ayounsi@cumin1002 - T336275
  • 12:09 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.7 to netbox-next - ayounsi@cumin1002 - T336275
  • 12:05 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66617 and previous config saved to /var/cache/conftool/dbconfig/20240716-120534-root.json
  • 12:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367856)', diff saved to https://phabricator.wikimedia.org/P66616 and previous config saved to /var/cache/conftool/dbconfig/20240716-120021-marostegui.json
  • 12:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T367856)', diff saved to https://phabricator.wikimedia.org/P66615 and previous config saved to /var/cache/conftool/dbconfig/20240716-120012-marostegui.json
  • 12:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 12:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 11:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T367856)', diff saved to https://phabricator.wikimedia.org/P66614 and previous config saved to /var/cache/conftool/dbconfig/20240716-115920-marostegui.json
  • 11:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66613 and previous config saved to /var/cache/conftool/dbconfig/20240716-115028-root.json
  • 11:49 effie: drain mw1496.eqiad.wmnet
  • 11:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66611 and previous config saved to /var/cache/conftool/dbconfig/20240716-114315-arnaudb.json
  • 11:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 11:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66610 and previous config saved to /var/cache/conftool/dbconfig/20240716-114254-arnaudb.json
  • 11:35 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66608 and previous config saved to /var/cache/conftool/dbconfig/20240716-113523-root.json
  • 11:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P66607 and previous config saved to /var/cache/conftool/dbconfig/20240716-112746-arnaudb.json
  • 11:20 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 11:20 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 11:20 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66606 and previous config saved to /var/cache/conftool/dbconfig/20240716-112017-root.json
  • 11:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P66605 and previous config saved to /var/cache/conftool/dbconfig/20240716-111239-arnaudb.json
  • 11:08 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 11:07 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 11:05 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66604 and previous config saved to /var/cache/conftool/dbconfig/20240716-110512-root.json
  • 10:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66603 and previous config saved to /var/cache/conftool/dbconfig/20240716-105732-arnaudb.json
  • 10:53 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 10:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66602 and previous config saved to /var/cache/conftool/dbconfig/20240716-105139-arnaudb.json
  • 10:51 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 10:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66601 and previous config saved to /var/cache/conftool/dbconfig/20240716-105117-arnaudb.json
  • 10:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66600 and previous config saved to /var/cache/conftool/dbconfig/20240716-105006-root.json
  • 10:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P66599 and previous config saved to /var/cache/conftool/dbconfig/20240716-103610-arnaudb.json
  • 10:35 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 10:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P66598 and previous config saved to /var/cache/conftool/dbconfig/20240716-102103-arnaudb.json
  • 10:10 dcausse: T362529: creating aewikimedia CirrusSearch indices with 'mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=aewikimedia --cluster=all'
  • 10:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66597 and previous config saved to /var/cache/conftool/dbconfig/20240716-100556-arnaudb.json
  • 10:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66595 and previous config saved to /var/cache/conftool/dbconfig/20240716-100002-arnaudb.json
  • 09:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 09:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66594 and previous config saved to /var/cache/conftool/dbconfig/20240716-095939-arnaudb.json
  • 09:54 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:53 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 09:52 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 09:52 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 09:50 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 09:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P66593 and previous config saved to /var/cache/conftool/dbconfig/20240716-094432-arnaudb.json
  • 09:44 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 09:42 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 09:39 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 09:37 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 09:37 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 09:32 btullis@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database aewikimedia (T362529)
  • 09:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157', diff saved to https://phabricator.wikimedia.org/P66592 and previous config saved to /var/cache/conftool/dbconfig/20240716-092924-arnaudb.json
  • 09:23 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 09:20 godog: bounce benthos@mw_accesslog_sampler - T369256
  • 09:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66591 and previous config saved to /var/cache/conftool/dbconfig/20240716-091418-arnaudb.json
  • 09:12 elukey: update docker-registry to 0.0.14-1 on build2001
  • 09:12 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 09:12 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 09:12 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 09:11 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 09:11 elukey: update docker-report to 0.0.14-1 on bullseye-wikimedia
  • 09:06 btullis@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database aewikimedia (T362529)
  • 09:04 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 09:03 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 09:03 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 09:03 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 09:03 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 09:02 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 08:50 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 08:32 godog: root@kafka-logging1001:~# kafka topics --alter --topic mediawiki.httpd.accesslog --partitions 12 - T369256
  • 08:31 marostegui: Clone dbstore1008:3317 from db1174 T370122
  • 08:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Long schema change
  • 08:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Long schema change
  • 08:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P66589 and previous config saved to /var/cache/conftool/dbconfig/20240716-082727-root.json
  • 08:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66588 and previous config saved to /var/cache/conftool/dbconfig/20240716-082213-root.json
  • 08:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66587 and previous config saved to /var/cache/conftool/dbconfig/20240716-081401-arnaudb.json
  • 08:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 08:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1157.eqiad.wmnet with reason: Maintenance
  • 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66586 and previous config saved to /var/cache/conftool/dbconfig/20240716-081129-root.json
  • 08:09 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 08:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 08:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1157 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66585 and previous config saved to /var/cache/conftool/dbconfig/20240716-080720-root.json
  • 08:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1174 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66584 and previous config saved to /var/cache/conftool/dbconfig/20240716-080707-root.json
  • 07:46 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1006.eqiad.wmnet
  • 07:40 Dreamy_Jazz: Morning UTC backport window done
  • 07:38 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-serve1006.eqiad.wmnet
  • 07:38 klausman@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-serve1002.eqiad.wmnet
  • 07:29 Dreamy_Jazz: Restarted MediaModeration scanning script
  • 07:28 klausman@cumin1002: START - Cookbook sre.hosts.reboot-single for host ml-serve1002.eqiad.wmnet
  • 07:19 dreamyjazz@deploy1002: Finished scap: Backport for [CheckUser] Remove wgCheckUserEventTablesMigrationStage config (T366546) (duration: 12m 09s)
  • 07:14 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
  • 07:14 dreamyjazz@deploy1002: dreamyjazz: Backport for [CheckUser] Remove wgCheckUserEventTablesMigrationStage config (T366546) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:13 volans@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:13 volans@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Merging pending changes for frack hosts as per IRC discussion - volans@cumin1002"
  • 07:10 volans@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Merging pending changes for frack hosts as per IRC discussion - volans@cumin1002"
  • 07:07 dreamyjazz@deploy1002: Started scap sync-world: Backport for [CheckUser] Remove wgCheckUserEventTablesMigrationStage config (T366546)
  • 07:07 volans@cumin1002: START - Cookbook sre.dns.netbox
  • 06:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 52999
  • 06:59 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 52999
  • 06:18 kart_: Updated cxserver to 2024-07-15-100650-production (T354666)
  • 06:16 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 06:16 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 06:12 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 06:12 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 06:11 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 06:11 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 06:06 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 06:05 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 05:43 marostegui: Deploy schema change on s7 eqiad db1174 dbmaint T367856
  • 05:43 marostegui: Deploy schema change on s3 eqiad db1157 dbmaint T367856
  • 05:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Long schema change
  • 05:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Long schema change
  • 05:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Long schema change
  • 05:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1157.eqiad.wmnet with reason: Long schema change
  • 05:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1157 T370019', diff saved to https://phabricator.wikimedia.org/P66581 and previous config saved to /var/cache/conftool/dbconfig/20240716-051718-root.json
  • 05:15 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1223 to s3 primary and set section read-write T370019', diff saved to https://phabricator.wikimedia.org/P66580 and previous config saved to /var/cache/conftool/dbconfig/20240716-051538-root.json
  • 05:15 marostegui@cumin1002: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T370019', diff saved to https://phabricator.wikimedia.org/P66579 and previous config saved to /var/cache/conftool/dbconfig/20240716-051516-root.json
  • 05:15 marostegui: Starting s3 eqiad failover from db1157 to db1223 - T370019
  • 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1223 with weight 0 T370019', diff saved to https://phabricator.wikimedia.org/P66578 and previous config saved to /var/cache/conftool/dbconfig/20240716-045839-root.json
  • 04:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Long schema change
  • 04:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Long schema change
  • 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1174', diff saved to https://phabricator.wikimedia.org/P66577 and previous config saved to /var/cache/conftool/dbconfig/20240716-045807-marostegui.json
  • 04:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 T370019
  • 04:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s3 T370019
  • 04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.11 (duration: 00m 58s)
  • 03:53 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.14 refs T366959 (duration: 50m 56s)
  • 03:03 mwpresync@deploy1002: Started scap sync-world: testwikis wikis to 1.43.0-wmf.14 refs T366959
  • 02:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T367781)', diff saved to https://phabricator.wikimedia.org/P66576 and previous config saved to /var/cache/conftool/dbconfig/20240716-025545-arnaudb.json
  • 02:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P66575 and previous config saved to /var/cache/conftool/dbconfig/20240716-024038-arnaudb.json
  • 02:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P66574 and previous config saved to /var/cache/conftool/dbconfig/20240716-022531-arnaudb.json
  • 02:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T367781)', diff saved to https://phabricator.wikimedia.org/P66573 and previous config saved to /var/cache/conftool/dbconfig/20240716-021023-arnaudb.json
  • 02:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T367781)', diff saved to https://phabricator.wikimedia.org/P66572 and previous config saved to /var/cache/conftool/dbconfig/20240716-020751-arnaudb.json
  • 02:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 02:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 01:21 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 01:21 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 01:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66570 and previous config saved to /var/cache/conftool/dbconfig/20240716-012125-arnaudb.json
  • 01:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P66569 and previous config saved to /var/cache/conftool/dbconfig/20240716-010618-arnaudb.json
  • 00:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P66568 and previous config saved to /var/cache/conftool/dbconfig/20240716-005111-arnaudb.json
  • 00:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66567 and previous config saved to /var/cache/conftool/dbconfig/20240716-003604-arnaudb.json
  • 00:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T367781)', diff saved to https://phabricator.wikimedia.org/P66566 and previous config saved to /var/cache/conftool/dbconfig/20240716-003331-arnaudb.json
  • 00:33 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 00:33 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 00:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66565 and previous config saved to /var/cache/conftool/dbconfig/20240716-003310-arnaudb.json
  • 00:26 zabe: zabe@mwmaint1002:/tmp/upload$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --user=Trade . # T369998
  • 00:22 zabe: zabe@mwmaint1002:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=enwiktionary --logwiki=metawiki 'Dodo cham' 'Le GlitcheurHD' # T369777
  • 00:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P66564 and previous config saved to /var/cache/conftool/dbconfig/20240716-001802-arnaudb.json
  • 00:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P66563 and previous config saved to /var/cache/conftool/dbconfig/20240716-000255-arnaudb.json

2024-07-15

  • 23:54 zabe@deploy1002: Finished scap: Backport for Further configurations for aewikimedia (T362529) (duration: 12m 26s)
  • 23:49 zabe@deploy1002: zabe: Continuing with sync
  • 23:48 zabe: zabe@mwmaint1002:~$ mwscript extensions/WikimediaMaintenance/createExtensionTables.php aewikimedia translate # T362529
  • 23:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66562 and previous config saved to /var/cache/conftool/dbconfig/20240715-234748-arnaudb.json
  • 23:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T367781)', diff saved to https://phabricator.wikimedia.org/P66561 and previous config saved to /var/cache/conftool/dbconfig/20240715-234516-arnaudb.json
  • 23:45 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 23:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 23:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367781)', diff saved to https://phabricator.wikimedia.org/P66560 and previous config saved to /var/cache/conftool/dbconfig/20240715-234454-arnaudb.json
  • 23:44 zabe@deploy1002: zabe: Backport for Further configurations for aewikimedia (T362529) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:42 zabe@deploy1002: Started scap sync-world: Backport for Further configurations for aewikimedia (T362529)
  • 23:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P66559 and previous config saved to /var/cache/conftool/dbconfig/20240715-232947-arnaudb.json
  • 23:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P66558 and previous config saved to /var/cache/conftool/dbconfig/20240715-231440-arnaudb.json
  • 23:11 logmsgbot: nshahquinn-wmf@deploy1002 Finished deploy [airflow-dags/analytics_product@767d7ad]: (no justification provided) (duration: 00m 08s)
  • 23:11 logmsgbot: nshahquinn-wmf@deploy1002 Started deploy [airflow-dags/analytics_product@767d7ad]: (no justification provided)
  • 22:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367781)', diff saved to https://phabricator.wikimedia.org/P66557 and previous config saved to /var/cache/conftool/dbconfig/20240715-225933-arnaudb.json
  • 22:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T367781)', diff saved to https://phabricator.wikimedia.org/P66556 and previous config saved to /var/cache/conftool/dbconfig/20240715-225701-arnaudb.json
  • 22:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 22:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 22:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367781)', diff saved to https://phabricator.wikimedia.org/P66555 and previous config saved to /var/cache/conftool/dbconfig/20240715-225639-arnaudb.json
  • 22:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P66554 and previous config saved to /var/cache/conftool/dbconfig/20240715-224131-arnaudb.json
  • 22:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P66553 and previous config saved to /var/cache/conftool/dbconfig/20240715-222624-arnaudb.json
  • 22:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367781)', diff saved to https://phabricator.wikimedia.org/P66552 and previous config saved to /var/cache/conftool/dbconfig/20240715-221117-arnaudb.json
  • 22:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T367781)', diff saved to https://phabricator.wikimedia.org/P66551 and previous config saved to /var/cache/conftool/dbconfig/20240715-220845-arnaudb.json
  • 22:08 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 22:08 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 22:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367781)', diff saved to https://phabricator.wikimedia.org/P66550 and previous config saved to /var/cache/conftool/dbconfig/20240715-220823-arnaudb.json
  • 21:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P66549 and previous config saved to /var/cache/conftool/dbconfig/20240715-215316-arnaudb.json
  • 21:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P66548 and previous config saved to /var/cache/conftool/dbconfig/20240715-213809-arnaudb.json
  • 21:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367781)', diff saved to https://phabricator.wikimedia.org/P66547 and previous config saved to /var/cache/conftool/dbconfig/20240715-212302-arnaudb.json
  • 21:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T367781)', diff saved to https://phabricator.wikimedia.org/P66546 and previous config saved to /var/cache/conftool/dbconfig/20240715-212034-arnaudb.json
  • 21:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 21:20 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 21:20 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 21:20 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 21:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367781)', diff saved to https://phabricator.wikimedia.org/P66545 and previous config saved to /var/cache/conftool/dbconfig/20240715-211957-arnaudb.json
  • 21:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P66544 and previous config saved to /var/cache/conftool/dbconfig/20240715-210451-arnaudb.json
  • 20:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P66543 and previous config saved to /var/cache/conftool/dbconfig/20240715-204944-arnaudb.json
  • 20:39 catrope@deploy1002: Finished scap: Backport for Revert changes in log levels, Revert "Change Linter log level to info" (duration: 07m 41s)
  • 20:35 catrope@deploy1002: arlolra, catrope: Continuing with sync
  • 20:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367781)', diff saved to https://phabricator.wikimedia.org/P66542 and previous config saved to /var/cache/conftool/dbconfig/20240715-203435-arnaudb.json
  • 20:34 catrope@deploy1002: arlolra, catrope: Backport for Revert changes in log levels, Revert "Change Linter log level to info" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 20:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 20:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T367856)', diff saved to https://phabricator.wikimedia.org/P66541 and previous config saved to /var/cache/conftool/dbconfig/20240715-203233-marostegui.json
  • 20:32 catrope@deploy1002: Started scap sync-world: Backport for Revert changes in log levels, Revert "Change Linter log level to info"
  • 20:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T367781)', diff saved to https://phabricator.wikimedia.org/P66540 and previous config saved to /var/cache/conftool/dbconfig/20240715-203203-arnaudb.json
  • 20:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 20:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 20:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 20:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 20:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T367781)', diff saved to https://phabricator.wikimedia.org/P66539 and previous config saved to /var/cache/conftool/dbconfig/20240715-203120-arnaudb.json
  • 20:29 catrope@deploy1002: Finished scap: Backport for [July 15th] Deploy dark mode to all logged-in users (T368795) (duration: 10m 26s)
  • 20:24 catrope@deploy1002: jdlrobson, catrope: Continuing with sync
  • 20:22 catrope@deploy1002: jdlrobson, catrope: Backport for [July 15th] Deploy dark mode to all logged-in users (T368795) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:19 ryankemper: [WDQS Deploy] Deploy complete. Successful test query placed on query.wikidata.org, there's no relevant criticals in Icinga, and Grafana looks good
  • 20:18 catrope@deploy1002: Started scap sync-world: Backport for [July 15th] Deploy dark mode to all logged-in users (T368795)
  • 20:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P66538 and previous config saved to /var/cache/conftool/dbconfig/20240715-201726-marostegui.json
  • 20:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P66537 and previous config saved to /var/cache/conftool/dbconfig/20240715-201613-arnaudb.json
  • 20:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P66536 and previous config saved to /var/cache/conftool/dbconfig/20240715-200218-marostegui.json
  • 20:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P66535 and previous config saved to /var/cache/conftool/dbconfig/20240715-200106-arnaudb.json
  • 19:55 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66534 and previous config saved to /var/cache/conftool/dbconfig/20240715-195510-root.json
  • 19:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66533 and previous config saved to /var/cache/conftool/dbconfig/20240715-195459-root.json
  • 19:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T367856)', diff saved to https://phabricator.wikimedia.org/P66532 and previous config saved to /var/cache/conftool/dbconfig/20240715-194711-marostegui.json
  • 19:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T367781)', diff saved to https://phabricator.wikimedia.org/P66531 and previous config saved to /var/cache/conftool/dbconfig/20240715-194559-arnaudb.json
  • 19:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T367781)', diff saved to https://phabricator.wikimedia.org/P66530 and previous config saved to /var/cache/conftool/dbconfig/20240715-194344-arnaudb.json
  • 19:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 19:43 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 19:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 19:43 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 19:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T367781)', diff saved to https://phabricator.wikimedia.org/P66529 and previous config saved to /var/cache/conftool/dbconfig/20240715-194257-arnaudb.json
  • 19:40 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66528 and previous config saved to /var/cache/conftool/dbconfig/20240715-194004-root.json
  • 19:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66527 and previous config saved to /var/cache/conftool/dbconfig/20240715-193953-root.json
  • 19:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P66526 and previous config saved to /var/cache/conftool/dbconfig/20240715-192750-arnaudb.json
  • 19:25 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@9ad2bec]: 0.3.144 (duration: 08m 31s)
  • 19:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66525 and previous config saved to /var/cache/conftool/dbconfig/20240715-192458-root.json
  • 19:24 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic109[8-9]* for T348977 - bking@cumin2002
  • 19:24 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic109[8-9]* for T348977 - bking@cumin2002
  • 19:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66524 and previous config saved to /var/cache/conftool/dbconfig/20240715-192448-root.json
  • 19:24 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1098-1099].eqiad.wmnet with reason: T348977
  • 19:23 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1098-1099].eqiad.wmnet with reason: T348977
  • 19:17 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.144` on canary `wdqs1016`; proceeding to rest of fleet
  • 19:16 ryankemper@deploy1002: Started deploy [wdqs/wdqs@9ad2bec]: 0.3.144
  • 19:16 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.144`. Pre-deploy tests passing on canary `wdqs1016`
  • 19:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P66523 and previous config saved to /var/cache/conftool/dbconfig/20240715-191243-arnaudb.json
  • 19:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66522 and previous config saved to /var/cache/conftool/dbconfig/20240715-190953-root.json
  • 19:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66521 and previous config saved to /var/cache/conftool/dbconfig/20240715-190942-root.json
  • 18:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T367781)', diff saved to https://phabricator.wikimedia.org/P66520 and previous config saved to /var/cache/conftool/dbconfig/20240715-185736-arnaudb.json
  • 18:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T367781)', diff saved to https://phabricator.wikimedia.org/P66519 and previous config saved to /var/cache/conftool/dbconfig/20240715-185521-arnaudb.json
  • 18:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 18:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 18:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T367781)', diff saved to https://phabricator.wikimedia.org/P66518 and previous config saved to /var/cache/conftool/dbconfig/20240715-185459-arnaudb.json
  • 18:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66517 and previous config saved to /var/cache/conftool/dbconfig/20240715-185447-root.json
  • 18:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66516 and previous config saved to /var/cache/conftool/dbconfig/20240715-185437-root.json
  • 18:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P66515 and previous config saved to /var/cache/conftool/dbconfig/20240715-183952-arnaudb.json
  • 18:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66514 and previous config saved to /var/cache/conftool/dbconfig/20240715-183942-root.json
  • 18:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66513 and previous config saved to /var/cache/conftool/dbconfig/20240715-183931-root.json
  • 18:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P66512 and previous config saved to /var/cache/conftool/dbconfig/20240715-182444-arnaudb.json
  • 18:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2121 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66511 and previous config saved to /var/cache/conftool/dbconfig/20240715-182436-root.json
  • 18:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66510 and previous config saved to /var/cache/conftool/dbconfig/20240715-182426-root.json
  • 18:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T367781)', diff saved to https://phabricator.wikimedia.org/P66509 and previous config saved to /var/cache/conftool/dbconfig/20240715-180937-arnaudb.json
  • 18:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T367781)', diff saved to https://phabricator.wikimedia.org/P66508 and previous config saved to /var/cache/conftool/dbconfig/20240715-180726-arnaudb.json
  • 18:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 18:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 18:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 18:06 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 18:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T367781)', diff saved to https://phabricator.wikimedia.org/P66507 and previous config saved to /var/cache/conftool/dbconfig/20240715-180640-arnaudb.json
  • 18:04 herron: upgraded prometheus-ipmi-exporter to 1.8.0 T368088
  • 17:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P66506 and previous config saved to /var/cache/conftool/dbconfig/20240715-175133-arnaudb.json
  • 17:41 mnz@deploy1002: Finished deploy [airflow-dags/research@5121748]: (no justification provided) (duration: 00m 10s)
  • 17:40 mnz@deploy1002: Started deploy [airflow-dags/research@5121748]: (no justification provided)
  • 17:38 ejegg: Fundraising python tools upgraded from 94bac5c6 to 490a7b3f
  • 17:37 ejegg: SmashPig upgraded from 565c61e4 to f2aca230
  • 17:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P66505 and previous config saved to /var/cache/conftool/dbconfig/20240715-173625-arnaudb.json
  • 17:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T367781)', diff saved to https://phabricator.wikimedia.org/P66504 and previous config saved to /var/cache/conftool/dbconfig/20240715-172118-arnaudb.json
  • 17:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T367781)', diff saved to https://phabricator.wikimedia.org/P66503 and previous config saved to /var/cache/conftool/dbconfig/20240715-171908-arnaudb.json
  • 17:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 17:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 17:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T367781)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240715-171841-arnaudb.json
  • 17:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P66501 and previous config saved to /var/cache/conftool/dbconfig/20240715-170334-arnaudb.json
  • 16:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P66500 and previous config saved to /var/cache/conftool/dbconfig/20240715-164827-arnaudb.json
  • 16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T367781)', diff saved to https://phabricator.wikimedia.org/P66499 and previous config saved to /var/cache/conftool/dbconfig/20240715-163320-arnaudb.json
  • 16:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T367781)', diff saved to https://phabricator.wikimedia.org/P66498 and previous config saved to /var/cache/conftool/dbconfig/20240715-163110-arnaudb.json
  • 16:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 16:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 16:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T367781)', diff saved to https://phabricator.wikimedia.org/P66497 and previous config saved to /var/cache/conftool/dbconfig/20240715-163048-arnaudb.json
  • 16:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P66496 and previous config saved to /var/cache/conftool/dbconfig/20240715-161541-arnaudb.json
  • 16:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P66495 and previous config saved to /var/cache/conftool/dbconfig/20240715-160033-arnaudb.json
  • 15:47 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:47 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 15:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T367781)', diff saved to https://phabricator.wikimedia.org/P66494 and previous config saved to /var/cache/conftool/dbconfig/20240715-154526-arnaudb.json
  • 15:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T367781)', diff saved to https://phabricator.wikimedia.org/P66493 and previous config saved to /var/cache/conftool/dbconfig/20240715-154312-arnaudb.json
  • 15:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 15:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 15:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T367781)', diff saved to https://phabricator.wikimedia.org/P66492 and previous config saved to /var/cache/conftool/dbconfig/20240715-154250-arnaudb.json
  • 15:32 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netbox2003.codfw.wmnet with reason: netbox upgrade prep work
  • 15:31 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netbox2003.codfw.wmnet with reason: netbox upgrade prep work
  • 15:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P66491 and previous config saved to /var/cache/conftool/dbconfig/20240715-152742-arnaudb.json
  • 15:17 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:16 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 15:16 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:14 mnz@deploy1002: Finished deploy [airflow-dags/research@5121748]: (no justification provided) (duration: 00m 31s)
  • 15:13 mnz@deploy1002: Started deploy [airflow-dags/research@5121748]: (no justification provided)
  • 15:13 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 15:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P66490 and previous config saved to /var/cache/conftool/dbconfig/20240715-151235-arnaudb.json
  • 15:12 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:12 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 15:09 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:07 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 14:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T367781)', diff saved to https://phabricator.wikimedia.org/P66489 and previous config saved to /var/cache/conftool/dbconfig/20240715-145728-arnaudb.json
  • 14:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T367781)', diff saved to https://phabricator.wikimedia.org/P66488 and previous config saved to /var/cache/conftool/dbconfig/20240715-145517-arnaudb.json
  • 14:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 14:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 14:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66487 and previous config saved to /var/cache/conftool/dbconfig/20240715-145455-arnaudb.json
  • 14:50 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
  • 14:50 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
  • 14:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P66486 and previous config saved to /var/cache/conftool/dbconfig/20240715-143948-arnaudb.json
  • 14:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P66485 and previous config saved to /var/cache/conftool/dbconfig/20240715-142441-arnaudb.json
  • 14:16 _joe_: updating conftool to 3.1.0 fleet wide
  • 14:13 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2005.codfw.wmnet with OS bookworm
  • 14:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66484 and previous config saved to /var/cache/conftool/dbconfig/20240715-140934-arnaudb.json
  • 14:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T367781)', diff saved to https://phabricator.wikimedia.org/P66483 and previous config saved to /var/cache/conftool/dbconfig/20240715-140720-arnaudb.json
  • 14:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 14:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 14:06 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 13:58 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2005.codfw.wmnet with reason: host reimage
  • 13:54 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2005.codfw.wmnet with reason: host reimage
  • 13:53 oblivian@puppetmaster2001: conftool action : set/pooled=yes; selector: name=mw1386.*,cluster=kubernetes,dc=eqiad [reason: Test conftool sal logging]
  • 13:51 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
  • 13:51 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
  • 13:50 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netboxdb2003.codfw.wmnet with reason: netbox upgrade prep work
  • 13:50 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netboxdb2003.codfw.wmnet with reason: netbox upgrade prep work
  • 13:45 _joe_: uploading conftool 3.1.0 to bookworm,bullseye,buster
  • 13:41 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:39 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2005.codfw.wmnet with OS bookworm
  • 13:33 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Add entity-schema to $wgWBRepoSettings['searchIndexTypes'] (T369495) (duration: 30m 51s)
  • 13:25 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 13:15 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Add entity-schema to $wgWBRepoSettings['searchIndexTypes'] (T369495) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:02 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Add entity-schema to $wgWBRepoSettings['searchIndexTypes'] (T369495)
  • 12:41 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:41 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:41 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:40 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:30 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:30 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:30 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:30 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:16 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:15 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 11:32 marostegui: test
  • 11:31 marostegui: Reboot stashbot
  • 11:25 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:24 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:11 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:11 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:11 claime: Increasing webVideoTranscodePrioritized concurrency in changeprop-jobqueue
  • 11:09 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 11:08 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 11:08 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 10:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T367856)', diff saved to https://phabricator.wikimedia.org/P66480 and previous config saved to /var/cache/conftool/dbconfig/20240715-102117-marostegui.json
  • 10:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 10:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 09:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52999
  • 09:59 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 52999
  • 09:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 270361
  • 09:58 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 270361
  • 09:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262293
  • 09:58 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 262293
  • 09:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61941
  • 09:57 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61941
  • 09:56 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 49544
  • 09:54 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 49544
  • 09:29 claime: manually removing mw1349.eqiad.wmnet mw1350.eqiad.wmnet mw1351.eqiad.wmnet from k8s following reimage to videoscalers - T351074
  • 09:25 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:22 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 09:19 marostegui: Deploy schema change on s7 eqiad db1170 dbmaint T367856
  • 09:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Long schema change
  • 09:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Long schema change
  • 09:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T367856)', diff saved to https://phabricator.wikimedia.org/P66479 and previous config saved to /var/cache/conftool/dbconfig/20240715-091800-marostegui.json
  • 09:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:16 elukey@cumin1002: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device lsw1-d3-codfw
  • 09:15 marostegui: Deploy schema change on s7 codfw db2121 dbmaint T367856
  • 09:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Long schema change
  • 09:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2121.codfw.wmnet with reason: Long schema change
  • 09:14 elukey@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d3-codfw
  • 09:05 volans@cumin1002: dbctl commit (dc=all): 'Depool db2121 T369882', diff saved to https://phabricator.wikimedia.org/P66478 and previous config saved to /var/cache/conftool/dbconfig/20240715-090532-volans.json
  • 08:56 volans@cumin1002: dbctl commit (dc=all): 'Promote db2218 to s7 primary T369882', diff saved to https://phabricator.wikimedia.org/P66477 and previous config saved to /var/cache/conftool/dbconfig/20240715-085654-volans.json
  • 08:51 volans: Starting s7 codfw failover from db2121 to db2218 - T369882
  • 08:33 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp2004.wikimedia.org
  • 08:33 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp2004.wikimedia.org with OS bookworm
  • 08:22 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 52468
  • 08:21 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 52468
  • 08:16 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp2004.wikimedia.org with reason: host reimage
  • 08:13 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp2004.wikimedia.org with reason: host reimage
  • 08:12 volans@cumin2002: dbctl commit (dc=all): 'Remove db2218 from API T369882', diff saved to https://phabricator.wikimedia.org/P66475 and previous config saved to /var/cache/conftool/dbconfig/20240715-081252-volans.json
  • 08:09 volans@cumin2002: dbctl commit (dc=all): 'Set db2218 with weight 0 T369882', diff saved to https://phabricator.wikimedia.org/P66474 and previous config saved to /var/cache/conftool/dbconfig/20240715-080948-volans.json
  • 08:05 volans@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 T369882
  • 08:04 volans@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on 28 hosts with reason: Primary switchover s7 T369882
  • 07:58 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp2004.wikimedia.org - slyngshede@cumin1002"
  • 07:57 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp2004.wikimedia.org - slyngshede@cumin1002"
  • 07:57 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp2004.wikimedia.org on all recursors
  • 07:57 slyngshede@cumin1002: START - Cookbook sre.dns.wipe-cache idp2004.wikimedia.org on all recursors
  • 07:57 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:57 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp2004.wikimedia.org - slyngshede@cumin1002"
  • 07:55 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp2004.wikimedia.org - slyngshede@cumin1002"
  • 07:53 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
  • 07:53 slyngshede@cumin1002: START - Cookbook sre.ganeti.makevm for new host idp2004.wikimedia.org
  • 07:36 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp1004.wikimedia.org
  • 07:36 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp1004.wikimedia.org with OS bookworm
  • 07:21 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp1004.wikimedia.org with reason: host reimage
  • 07:17 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp1004.wikimedia.org with reason: host reimage
  • 07:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1179.eqiad.wmnet with reason: T369855
  • 07:17 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1179.eqiad.wmnet with reason: T369855
  • 07:06 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp1004.wikimedia.org with OS bookworm
  • 07:05 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp1004.wikimedia.org - slyngshede@cumin1002"
  • 07:04 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp1004.wikimedia.org - slyngshede@cumin1002"
  • 07:04 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp1004.wikimedia.org on all recursors
  • 07:04 slyngshede@cumin1002: START - Cookbook sre.dns.wipe-cache idp1004.wikimedia.org on all recursors
  • 07:04 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:04 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp1004.wikimedia.org - slyngshede@cumin1002"
  • 07:03 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp1004.wikimedia.org - slyngshede@cumin1002"
  • 07:01 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
  • 07:00 slyngshede@cumin1002: START - Cookbook sre.ganeti.makevm for new host idp1004.wikimedia.org
  • 06:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repool db2136', diff saved to https://phabricator.wikimedia.org/P66473 and previous config saved to /var/cache/conftool/dbconfig/20240715-062216-root.json
  • 06:07 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 06:07 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 06:07 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 06:06 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 06:06 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 06:06 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 05:12 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host dbproxy2005.codfw.wmnet with OS bookworm
  • 04:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2137 (T367856)', diff saved to https://phabricator.wikimedia.org/P66472 and previous config saved to /var/cache/conftool/dbconfig/20240715-044723-marostegui.json
  • 04:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 04:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 04:13 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 04:13 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove IPV6 for dbproxy200[5-8] - pt1979@cumin2002"
  • 04:12 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove IPV6 for dbproxy200[5-8] - pt1979@cumin2002"
  • 04:09 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 02:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 02:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 02:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T367856)', diff saved to https://phabricator.wikimedia.org/P66471 and previous config saved to /var/cache/conftool/dbconfig/20240715-021121-marostegui.json
  • 01:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P66470 and previous config saved to /var/cache/conftool/dbconfig/20240715-015613-marostegui.json
  • 01:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P66469 and previous config saved to /var/cache/conftool/dbconfig/20240715-014106-marostegui.json
  • 01:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T367856)', diff saved to https://phabricator.wikimedia.org/P66467 and previous config saved to /var/cache/conftool/dbconfig/20240715-012559-marostegui.json

2024-07-14

  • 22:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T367856)', diff saved to https://phabricator.wikimedia.org/P66466 and previous config saved to /var/cache/conftool/dbconfig/20240714-223146-marostegui.json
  • 22:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 22:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 22:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T367856)', diff saved to https://phabricator.wikimedia.org/P66465 and previous config saved to /var/cache/conftool/dbconfig/20240714-223124-marostegui.json
  • 22:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P66464 and previous config saved to /var/cache/conftool/dbconfig/20240714-221617-marostegui.json
  • 22:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P66463 and previous config saved to /var/cache/conftool/dbconfig/20240714-220110-marostegui.json
  • 21:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T367856)', diff saved to https://phabricator.wikimedia.org/P66462 and previous config saved to /var/cache/conftool/dbconfig/20240714-214603-marostegui.json
  • 17:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66461 and previous config saved to /var/cache/conftool/dbconfig/20240714-175827-root.json
  • 17:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66460 and previous config saved to /var/cache/conftool/dbconfig/20240714-174322-root.json
  • 17:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66459 and previous config saved to /var/cache/conftool/dbconfig/20240714-172816-root.json
  • 17:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66458 and previous config saved to /var/cache/conftool/dbconfig/20240714-171311-root.json
  • 16:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66457 and previous config saved to /var/cache/conftool/dbconfig/20240714-165805-root.json
  • 16:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66456 and previous config saved to /var/cache/conftool/dbconfig/20240714-164300-root.json
  • 16:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 16:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 16:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1189 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66455 and previous config saved to /var/cache/conftool/dbconfig/20240714-162755-root.json
  • 14:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T367856)', diff saved to https://phabricator.wikimedia.org/P66454 and previous config saved to /var/cache/conftool/dbconfig/20240714-140046-marostegui.json
  • 14:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 14:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 14:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T367856)', diff saved to https://phabricator.wikimedia.org/P66453 and previous config saved to /var/cache/conftool/dbconfig/20240714-140024-marostegui.json
  • 13:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P66452 and previous config saved to /var/cache/conftool/dbconfig/20240714-134517-marostegui.json
  • 13:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P66451 and previous config saved to /var/cache/conftool/dbconfig/20240714-133010-marostegui.json
  • 13:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T367856)', diff saved to https://phabricator.wikimedia.org/P66450 and previous config saved to /var/cache/conftool/dbconfig/20240714-131502-marostegui.json
  • 09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T367856)', diff saved to https://phabricator.wikimedia.org/P66449 and previous config saved to /var/cache/conftool/dbconfig/20240714-093540-marostegui.json
  • 09:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 09:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T367856)', diff saved to https://phabricator.wikimedia.org/P66448 and previous config saved to /var/cache/conftool/dbconfig/20240714-093518-marostegui.json
  • 09:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P66447 and previous config saved to /var/cache/conftool/dbconfig/20240714-092011-marostegui.json
  • 09:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P66446 and previous config saved to /var/cache/conftool/dbconfig/20240714-090504-marostegui.json
  • 08:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T367856)', diff saved to https://phabricator.wikimedia.org/P66445 and previous config saved to /var/cache/conftool/dbconfig/20240714-084956-marostegui.json
  • 08:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T367856)', diff saved to https://phabricator.wikimedia.org/P66444 and previous config saved to /var/cache/conftool/dbconfig/20240714-084903-marostegui.json
  • 08:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 08:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T367856)', diff saved to https://phabricator.wikimedia.org/P66443 and previous config saved to /var/cache/conftool/dbconfig/20240714-054611-marostegui.json
  • 05:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 05:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 05:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T367856)', diff saved to https://phabricator.wikimedia.org/P66442 and previous config saved to /var/cache/conftool/dbconfig/20240714-054549-marostegui.json
  • 05:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P66441 and previous config saved to /var/cache/conftool/dbconfig/20240714-053042-marostegui.json
  • 05:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P66440 and previous config saved to /var/cache/conftool/dbconfig/20240714-051535-marostegui.json
  • 05:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T367856)', diff saved to https://phabricator.wikimedia.org/P66439 and previous config saved to /var/cache/conftool/dbconfig/20240714-050027-marostegui.json
  • 01:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T367856)', diff saved to https://phabricator.wikimedia.org/P66438 and previous config saved to /var/cache/conftool/dbconfig/20240714-015901-marostegui.json
  • 01:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 01:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 01:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T367856)', diff saved to https://phabricator.wikimedia.org/P66437 and previous config saved to /var/cache/conftool/dbconfig/20240714-015838-marostegui.json
  • 01:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P66436 and previous config saved to /var/cache/conftool/dbconfig/20240714-014331-marostegui.json
  • 01:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P66435 and previous config saved to /var/cache/conftool/dbconfig/20240714-012824-marostegui.json
  • 01:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T367856)', diff saved to https://phabricator.wikimedia.org/P66434 and previous config saved to /var/cache/conftool/dbconfig/20240714-011317-marostegui.json
  • 00:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T367856)', diff saved to https://phabricator.wikimedia.org/P66433 and previous config saved to /var/cache/conftool/dbconfig/20240714-001301-marostegui.json
  • 00:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 00:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance

2024-07-13

  • 15:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T367856)', diff saved to https://phabricator.wikimedia.org/P66432 and previous config saved to /var/cache/conftool/dbconfig/20240713-155158-marostegui.json
  • 15:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P66431 and previous config saved to /var/cache/conftool/dbconfig/20240713-153650-marostegui.json
  • 15:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P66430 and previous config saved to /var/cache/conftool/dbconfig/20240713-152143-marostegui.json
  • 15:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T367856)', diff saved to https://phabricator.wikimedia.org/P66429 and previous config saved to /var/cache/conftool/dbconfig/20240713-150636-marostegui.json
  • 14:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T367856)', diff saved to https://phabricator.wikimedia.org/P66428 and previous config saved to /var/cache/conftool/dbconfig/20240713-140620-marostegui.json
  • 14:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 14:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 13:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 10:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 10:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 10:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 06:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 06:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T367856)', diff saved to https://phabricator.wikimedia.org/P66427 and previous config saved to /var/cache/conftool/dbconfig/20240713-061928-marostegui.json
  • 06:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P66426 and previous config saved to /var/cache/conftool/dbconfig/20240713-060421-marostegui.json
  • 05:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223', diff saved to https://phabricator.wikimedia.org/P66425 and previous config saved to /var/cache/conftool/dbconfig/20240713-054913-marostegui.json
  • 05:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1223 (T367856)', diff saved to https://phabricator.wikimedia.org/P66424 and previous config saved to /var/cache/conftool/dbconfig/20240713-053406-marostegui.json
  • 01:33 tzatziki: removing 2 files for legal compliance
  • 01:22 tzatziki: removing 16 files for legal compliance
  • 00:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T367856)', diff saved to https://phabricator.wikimedia.org/P66423 and previous config saved to /var/cache/conftool/dbconfig/20240713-000433-marostegui.json

2024-07-12

  • 23:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P66422 and previous config saved to /var/cache/conftool/dbconfig/20240712-234926-marostegui.json
  • 23:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P66421 and previous config saved to /var/cache/conftool/dbconfig/20240712-233419-marostegui.json
  • 23:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T367856)', diff saved to https://phabricator.wikimedia.org/P66420 and previous config saved to /var/cache/conftool/dbconfig/20240712-231912-marostegui.json
  • 22:34 tzatziki: removing 1 file for legal compliance
  • 22:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1223 (T367856)', diff saved to https://phabricator.wikimedia.org/P66419 and previous config saved to /var/cache/conftool/dbconfig/20240712-223226-marostegui.json
  • 22:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 22:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 22:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367856)', diff saved to https://phabricator.wikimedia.org/P66418 and previous config saved to /var/cache/conftool/dbconfig/20240712-223204-marostegui.json
  • 22:21 tzatziki: removing 1 file for legal compliance
  • 22:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P66417 and previous config saved to /var/cache/conftool/dbconfig/20240712-221656-marostegui.json
  • 22:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P66416 and previous config saved to /var/cache/conftool/dbconfig/20240712-220149-marostegui.json
  • 21:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367856)', diff saved to https://phabricator.wikimedia.org/P66415 and previous config saved to /var/cache/conftool/dbconfig/20240712-214642-marostegui.json
  • 19:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T367856)', diff saved to https://phabricator.wikimedia.org/P66414 and previous config saved to /var/cache/conftool/dbconfig/20240712-190224-marostegui.json
  • 19:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 19:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 19:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 19:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367856)', diff saved to https://phabricator.wikimedia.org/P66413 and previous config saved to /var/cache/conftool/dbconfig/20240712-190154-marostegui.json
  • 18:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P66412 and previous config saved to /var/cache/conftool/dbconfig/20240712-184647-marostegui.json
  • 18:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P66411 and previous config saved to /var/cache/conftool/dbconfig/20240712-183140-marostegui.json
  • 18:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367856)', diff saved to https://phabricator.wikimedia.org/P66410 and previous config saved to /var/cache/conftool/dbconfig/20240712-181632-marostegui.json
  • 17:10 hnowlan@cumin1002: conftool action : set/pooled=yes:weight=10; selector: name=(mw1349.eqiad.wmnet|mw1350.eqiad.wmnet|mw1351.eqiad.wmnet)
  • 17:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1349.eqiad.wmnet
  • 17:07 hnowlan@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1349.eqiad.wmnet
  • 17:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw[1350-1351].eqiad.wmnet
  • 17:07 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw[1350-1351].eqiad.wmnet
  • 17:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1351.eqiad.wmnet with OS buster
  • 17:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1350.eqiad.wmnet with OS buster
  • 17:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1349.eqiad.wmnet with OS buster
  • 16:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1351.eqiad.wmnet with reason: host reimage
  • 16:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1350.eqiad.wmnet with reason: host reimage
  • 16:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1349.eqiad.wmnet with reason: host reimage
  • 16:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1351.eqiad.wmnet with reason: host reimage
  • 16:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1350.eqiad.wmnet with reason: host reimage
  • 16:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1349.eqiad.wmnet with reason: host reimage
  • 16:17 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:16 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:10 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1351.eqiad.wmnet with OS buster
  • 16:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1350.eqiad.wmnet with OS buster
  • 16:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1349.eqiad.wmnet with OS buster
  • 16:05 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=(mw1349|mw1350|mw1351).eqiad.wmnet,cluster=(jobrunner|videoscaler)
  • 16:05 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=(mw1349|mw1350|mw1351).eqiad.wmnet,cluster=(jobrunner|videoscaler)
  • 16:04 claime: pooling mw1349, mw1350, mw1351 as jobrunners
  • 16:03 cgoubert@cumin1002: conftool action : set/pooled=no:weight=10; selector: name=(mw1349|mw1350|mw1351).eqiad.wmnet,cluster=(jobrunner|videoscaler)
  • 16:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1351.eqiad.wmnet with OS buster
  • 16:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1350.eqiad.wmnet
  • 16:01 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1350.eqiad.wmnet
  • 16:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1349.eqiad.wmnet
  • 16:00 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1349.eqiad.wmnet
  • 15:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1350.eqiad.wmnet with OS buster
  • 15:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1349.eqiad.wmnet with OS buster
  • 15:57 cgoubert@cumin1002: conftool action : set/pooled=no:weight=10; selector: name=(mw1349|mw1350|mw1351).eqiad.wmnet,cluster=jobrunner
  • 15:57 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest2001.codfw.wmnet
  • 15:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2220 (T367856)', diff saved to https://phabricator.wikimedia.org/P66408 and previous config saved to /var/cache/conftool/dbconfig/20240712-154954-marostegui.json
  • 15:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 15:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 15:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T367856)', diff saved to https://phabricator.wikimedia.org/P66407 and previous config saved to /var/cache/conftool/dbconfig/20240712-154921-marostegui.json
  • 15:47 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:47 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:46 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
  • 15:46 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host sretest2001.codfw.wmnet
  • 15:46 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
  • 15:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P66406 and previous config saved to /var/cache/conftool/dbconfig/20240712-153414-marostegui.json
  • 15:33 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
  • 15:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1351.eqiad.wmnet with reason: host reimage
  • 15:32 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
  • 15:26 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
  • 15:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1350.eqiad.wmnet with reason: host reimage
  • 15:25 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
  • 15:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1349.eqiad.wmnet with reason: host reimage
  • 15:21 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1351.eqiad.wmnet with reason: host reimage
  • 15:21 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1350.eqiad.wmnet with reason: host reimage
  • 15:21 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1349.eqiad.wmnet with reason: host reimage
  • 15:20 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 15:20 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 15:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P66405 and previous config saved to /var/cache/conftool/dbconfig/20240712-151907-marostegui.json
  • 15:17 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:17 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:17 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:17 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:15 hnowlan: homer 'cr*eqiad*' commit 'videoscaler reimages mw1349/mw135[01]'
  • 15:08 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:07 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1351.eqiad.wmnet with OS buster
  • 15:06 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1350.eqiad.wmnet with OS buster
  • 15:06 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1349.eqiad.wmnet with OS buster
  • 15:04 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:04 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T367856)', diff saved to https://phabricator.wikimedia.org/P66404 and previous config saved to /var/cache/conftool/dbconfig/20240712-150400-marostegui.json
  • 15:03 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:02 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:58 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=(mw1349.eqiad.wmnet|mw1350.eqiad.wmnet|mw1351.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 14:55 claime: Draining and depooling mw1349, mw1350, mw1351 for reimage as jobrunners
  • 14:36 elukey@cumin1002: END (FAIL) - Cookbook sre.network.tls (exit_code=99) for network device lsw1-d3-codfw
  • 14:34 elukey@cumin1002: START - Cookbook sre.network.tls for network device lsw1-d3-codfw
  • 14:20 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:19 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:19 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:18 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:45 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:43 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 13:22 cdanis@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 13:21 cdanis@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 13:21 cdanis@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 13:21 cdanis@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 13:19 cdanis@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 13:18 cdanis@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 13:18 cdanis@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 13:12 cdanis@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 13:10 topranks: pushing updated BGP policy to cr2-eqord and cr2-eqdfw to announce Anycast ranges from network pops (T367439)
  • 10:24 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: stopping T367781', diff saved to https://phabricator.wikimedia.org/P66396 and previous config saved to /var/cache/conftool/dbconfig/20240712-102416-arnaudb.json
  • 10:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T367856)', diff saved to https://phabricator.wikimedia.org/P66395 and previous config saved to /var/cache/conftool/dbconfig/20240712-102243-marostegui.json
  • 10:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 10:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 10:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367856)', diff saved to https://phabricator.wikimedia.org/P66394 and previous config saved to /var/cache/conftool/dbconfig/20240712-102221-marostegui.json
  • 10:18 godog: stop benthos@webrequest_live on centrallog2002 and start it on centrallog1002 - T369737
  • 10:09 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: stopping T367781', diff saved to https://phabricator.wikimedia.org/P66393 and previous config saved to /var/cache/conftool/dbconfig/20240712-100910-arnaudb.json
  • 10:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P66392 and previous config saved to /var/cache/conftool/dbconfig/20240712-100714-marostegui.json
  • 09:54 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: stopping T367781', diff saved to https://phabricator.wikimedia.org/P66391 and previous config saved to /var/cache/conftool/dbconfig/20240712-095405-arnaudb.json
  • 09:53 godog: temp stop benthos@webrequest_live on centrallog1002 - T369737
  • 09:52 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:52 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 09:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P66389 and previous config saved to /var/cache/conftool/dbconfig/20240712-095207-marostegui.json
  • 09:39 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: stopping T367781', diff saved to https://phabricator.wikimedia.org/P66388 and previous config saved to /var/cache/conftool/dbconfig/20240712-093900-arnaudb.json
  • 09:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367856)', diff saved to https://phabricator.wikimedia.org/P66387 and previous config saved to /var/cache/conftool/dbconfig/20240712-093700-marostegui.json
  • 09:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: stopping T367781', diff saved to https://phabricator.wikimedia.org/P66386 and previous config saved to /var/cache/conftool/dbconfig/20240712-092354-arnaudb.json
  • 09:20 dcausse@deploy1002: Finished scap: Backport for Re-add CirrusSearch prefix to statsd metrics (T359033) (duration: 09m 44s)
  • 09:15 dcausse@deploy1002: dcausse: Continuing with sync
  • 09:13 dcausse@deploy1002: dcausse: Backport for Re-add CirrusSearch prefix to statsd metrics (T359033) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:10 dcausse@deploy1002: Started scap sync-world: Backport for Re-add CirrusSearch prefix to statsd metrics (T359033)
  • 09:10 elukey: upgrade httpd version in production (bullseye/bookworm) for T369885
  • 09:08 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: stopping T367781', diff saved to https://phabricator.wikimedia.org/P66385 and previous config saved to /var/cache/conftool/dbconfig/20240712-090849-arnaudb.json
  • 09:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T367781)', diff saved to https://phabricator.wikimedia.org/P66384 and previous config saved to /var/cache/conftool/dbconfig/20240712-090527-arnaudb.json
  • 09:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:05 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 09:04 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 08:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on db1179.eqiad.wmnet with reason: T369855
  • 08:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on db1179.eqiad.wmnet with reason: T369855
  • 08:42 godog: tweak benthos@webrequest_live output batching on centrallog2001 - T369737
  • 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1189 (T367856)', diff saved to https://phabricator.wikimedia.org/P66383 and previous config saved to /var/cache/conftool/dbconfig/20240712-083644-marostegui.json
  • 08:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 08:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1189.eqiad.wmnet with reason: Maintenance
  • 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T367856)', diff saved to https://phabricator.wikimedia.org/P66382 and previous config saved to /var/cache/conftool/dbconfig/20240712-083621-marostegui.json
  • 08:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P66381 and previous config saved to /var/cache/conftool/dbconfig/20240712-082114-marostegui.json
  • 08:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P66380 and previous config saved to /var/cache/conftool/dbconfig/20240712-080607-marostegui.json
  • 07:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T367856)', diff saved to https://phabricator.wikimedia.org/P66379 and previous config saved to /var/cache/conftool/dbconfig/20240712-075100-marostegui.json
  • 07:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2218 (T367856)', diff saved to https://phabricator.wikimedia.org/P66377 and previous config saved to /var/cache/conftool/dbconfig/20240712-073102-marostegui.json
  • 07:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 07:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T367856)', diff saved to https://phabricator.wikimedia.org/P66376 and previous config saved to /var/cache/conftool/dbconfig/20240712-073040-marostegui.json
  • 07:30 kevinbazira@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 07:24 kevinbazira@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 07:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P66375 and previous config saved to /var/cache/conftool/dbconfig/20240712-071533-marostegui.json
  • 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P66374 and previous config saved to /var/cache/conftool/dbconfig/20240712-070026-marostegui.json
  • 06:37 Dreamy_Jazz: Starting MediaModeration scan on commons after it crashed last night due to database issues - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 06:18 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66372 and previous config saved to /var/cache/conftool/dbconfig/20240712-061835-root.json
  • 06:03 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66371 and previous config saved to /var/cache/conftool/dbconfig/20240712-060329-root.json
  • 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66370 and previous config saved to /var/cache/conftool/dbconfig/20240712-054824-root.json
  • 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66369 and previous config saved to /var/cache/conftool/dbconfig/20240712-053318-root.json
  • 05:18 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66368 and previous config saved to /var/cache/conftool/dbconfig/20240712-051813-root.json
  • 05:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136', diff saved to https://phabricator.wikimedia.org/P66367 and previous config saved to /var/cache/conftool/dbconfig/20240712-050800-root.json
  • 05:03 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66366 and previous config saved to /var/cache/conftool/dbconfig/20240712-050307-root.json
  • 04:48 marostegui@cumin1002: dbctl commit (dc=all): 'db2127 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66365 and previous config saved to /var/cache/conftool/dbconfig/20240712-044802-root.json
  • 03:52 ayounsi@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host netboxdb2003.codfw.wmnet
  • 03:52 ayounsi@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host netboxdb2003.codfw.wmnet with OS bookworm
  • 00:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T367856)', diff saved to https://phabricator.wikimedia.org/P66364 and previous config saved to /var/cache/conftool/dbconfig/20240712-000131-marostegui.json
  • 00:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 00:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 00:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367856)', diff saved to https://phabricator.wikimedia.org/P66363 and previous config saved to /var/cache/conftool/dbconfig/20240712-000109-marostegui.json

2024-07-11

  • 23:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P66362 and previous config saved to /var/cache/conftool/dbconfig/20240711-234602-marostegui.json
  • 23:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T367781)', diff saved to https://phabricator.wikimedia.org/P66361 and previous config saved to /var/cache/conftool/dbconfig/20240711-233712-arnaudb.json
  • 23:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P66360 and previous config saved to /var/cache/conftool/dbconfig/20240711-233054-marostegui.json
  • 23:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T367856)', diff saved to https://phabricator.wikimedia.org/P66359 and previous config saved to /var/cache/conftool/dbconfig/20240711-232218-marostegui.json
  • 23:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 23:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P66358 and previous config saved to /var/cache/conftool/dbconfig/20240711-232205-arnaudb.json
  • 23:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 23:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367856)', diff saved to https://phabricator.wikimedia.org/P66357 and previous config saved to /var/cache/conftool/dbconfig/20240711-231547-marostegui.json
  • 23:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213', diff saved to https://phabricator.wikimedia.org/P66356 and previous config saved to /var/cache/conftool/dbconfig/20240711-230657-arnaudb.json
  • 23:06 zabe@deploy1002: Finished scap: update interwiki cache (duration: 07m 37s)
  • 22:59 zabe@deploy1002: Started scap sync-world: update interwiki cache
  • 22:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2213 (T367781)', diff saved to https://phabricator.wikimedia.org/P66355 and previous config saved to /var/cache/conftool/dbconfig/20240711-225150-arnaudb.json
  • 22:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2213 (T367781)', diff saved to https://phabricator.wikimedia.org/P66354 and previous config saved to /var/cache/conftool/dbconfig/20240711-224858-arnaudb.json
  • 22:48 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 22:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2213.codfw.wmnet with reason: Maintenance
  • 22:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367781)', diff saved to https://phabricator.wikimedia.org/P66353 and previous config saved to /var/cache/conftool/dbconfig/20240711-224836-arnaudb.json
  • 22:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P66352 and previous config saved to /var/cache/conftool/dbconfig/20240711-223329-arnaudb.json
  • 22:27 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:27 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove IPV6 for dbproxy200[5-8] - pt1979@cumin2002"
  • 22:26 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove IPV6 for dbproxy200[5-8] - pt1979@cumin2002"
  • 22:23 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 22:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P66351 and previous config saved to /var/cache/conftool/dbconfig/20240711-221822-arnaudb.json
  • 22:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367781)', diff saved to https://phabricator.wikimedia.org/P66350 and previous config saved to /var/cache/conftool/dbconfig/20240711-220315-arnaudb.json
  • 21:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T367781)', diff saved to https://phabricator.wikimedia.org/P66349 and previous config saved to /var/cache/conftool/dbconfig/20240711-215921-arnaudb.json
  • 21:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 21:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 21:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 21:57 rzl: systemctl restart apache2 on mwdebug1002, mwdebug2001, mwdebug2002 for https://gerrit.wikimedia.org/r/1052128
  • 21:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 21:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367781)', diff saved to https://phabricator.wikimedia.org/P66348 and previous config saved to /var/cache/conftool/dbconfig/20240711-215700-arnaudb.json
  • 21:44 rzl: rzl@mwdebug1002:~$ sudo apache2ctl restart
  • 21:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P66347 and previous config saved to /var/cache/conftool/dbconfig/20240711-214153-arnaudb.json
  • 21:38 jhathaway: upgrading exim4 to 4.94.2-7+deb11u3
  • 21:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P66346 and previous config saved to /var/cache/conftool/dbconfig/20240711-212646-arnaudb.json
  • 21:13 catrope@deploy1002: Finished scap: Backport for Change Linter log level to info (duration: 14m 40s)
  • 21:11 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
  • 21:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367781)', diff saved to https://phabricator.wikimedia.org/P66345 and previous config saved to /var/cache/conftool/dbconfig/20240711-211138-arnaudb.json
  • 21:11 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
  • 21:08 catrope@deploy1002: arlolra, catrope: Continuing with sync
  • 21:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T367781)', diff saved to https://phabricator.wikimedia.org/P66344 and previous config saved to /var/cache/conftool/dbconfig/20240711-210747-arnaudb.json
  • 21:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 21:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 21:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367781)', diff saved to https://phabricator.wikimedia.org/P66343 and previous config saved to /var/cache/conftool/dbconfig/20240711-210725-arnaudb.json
  • 21:05 catrope@deploy1002: arlolra, catrope: Backport for Change Linter log level to info synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:59 catrope@deploy1002: Started scap sync-world: Backport for Change Linter log level to info
  • 20:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P66342 and previous config saved to /var/cache/conftool/dbconfig/20240711-205218-arnaudb.json
  • 20:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P66341 and previous config saved to /var/cache/conftool/dbconfig/20240711-203711-arnaudb.json
  • 20:37 catrope@deploy1002: Finished scap: Backport for Vector theme should default to day (T369833) (duration: 17m 09s)
  • 20:32 catrope@deploy1002: jdlrobson, catrope: Continuing with sync
  • 20:30 catrope@deploy1002: jdlrobson, catrope: Backport for Vector theme should default to day (T369833) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:29 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy2005.codfw.wmnet with OS bookworm
  • 20:28 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 20:26 eileen: config revision changed from 540f27e6 to c25da839 re-enable silverpop_daily
  • 20:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367781)', diff saved to https://phabricator.wikimedia.org/P66340 and previous config saved to /var/cache/conftool/dbconfig/20240711-202204-arnaudb.json
  • 20:19 catrope@deploy1002: Started scap sync-world: Backport for Vector theme should default to day (T369833)
  • 20:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T367781)', diff saved to https://phabricator.wikimedia.org/P66339 and previous config saved to /var/cache/conftool/dbconfig/20240711-201815-arnaudb.json
  • 20:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 20:17 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 20:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367781)', diff saved to https://phabricator.wikimedia.org/P66338 and previous config saved to /var/cache/conftool/dbconfig/20240711-201753-arnaudb.json
  • 20:15 catrope@deploy1002: Finished scap: Backport for Graph: Fix JSON parse errors in Graph data source tracking (duration: 13m 32s)
  • 20:10 catrope@deploy1002: catrope: Continuing with sync
  • 20:08 catrope@deploy1002: catrope: Backport for Graph: Fix JSON parse errors in Graph data source tracking synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P66337 and previous config saved to /var/cache/conftool/dbconfig/20240711-200246-arnaudb.json
  • 20:01 catrope@deploy1002: Started scap sync-world: Backport for Graph: Fix JSON parse errors in Graph data source tracking
  • 19:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P66336 and previous config saved to /var/cache/conftool/dbconfig/20240711-194739-arnaudb.json
  • 19:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367781)', diff saved to https://phabricator.wikimedia.org/P66335 and previous config saved to /var/cache/conftool/dbconfig/20240711-193231-arnaudb.json
  • 19:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T367781)', diff saved to https://phabricator.wikimedia.org/P66334 and previous config saved to /var/cache/conftool/dbconfig/20240711-192842-arnaudb.json
  • 19:28 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 19:28 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 19:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66333 and previous config saved to /var/cache/conftool/dbconfig/20240711-192820-arnaudb.json
  • 19:23 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - pt1979@cumin2002"
  • 19:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P66332 and previous config saved to /var/cache/conftool/dbconfig/20240711-191313-arnaudb.json
  • 19:12 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 19:11 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 19:09 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy2005.codfw.wmnet with reason: host reimage
  • 19:06 pt1979@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy2005.codfw.wmnet with reason: host reimage
  • 18:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P66331 and previous config saved to /var/cache/conftool/dbconfig/20240711-185805-arnaudb.json
  • 18:52 pt1979@cumin2002: START - Cookbook sre.hosts.reimage for host dbproxy2005.codfw.wmnet with OS bookworm
  • 18:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66330 and previous config saved to /var/cache/conftool/dbconfig/20240711-184258-arnaudb.json
  • 18:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T367781)', diff saved to https://phabricator.wikimedia.org/P66329 and previous config saved to /var/cache/conftool/dbconfig/20240711-184009-arnaudb.json
  • 18:40 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 18:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 18:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367781)', diff saved to https://phabricator.wikimedia.org/P66328 and previous config saved to /var/cache/conftool/dbconfig/20240711-183946-arnaudb.json
  • 18:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P66327 and previous config saved to /var/cache/conftool/dbconfig/20240711-182438-arnaudb.json
  • 18:18 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=99) on VRTS host vrts1001.eqiad.wmnet
  • 18:15 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet
  • 18:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P66326 and previous config saved to /var/cache/conftool/dbconfig/20240711-180931-arnaudb.json
  • 18:00 aokoth@cumin1002: END (FAIL) - Cookbook sre.vrts.upgrade (exit_code=93) on VRTS host vrts1001.eqiad.wmnet
  • 18:00 aokoth@cumin1002: START - Cookbook sre.vrts.upgrade on VRTS host vrts1001.eqiad.wmnet
  • 17:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367781)', diff saved to https://phabricator.wikimedia.org/P66325 and previous config saved to /var/cache/conftool/dbconfig/20240711-175424-arnaudb.json
  • 17:52 daniel@deploy1002: Finished scap: Backport for Enable Special:RestSandbox on testwiki (T362006) (duration: 11m 01s)
  • 17:52 rzl@cumin2002: dbctl commit (dc=all): 'db1179 depooled', diff saved to https://phabricator.wikimedia.org/P66324 and previous config saved to /var/cache/conftool/dbconfig/20240711-175212-rzl.json
  • 17:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T367781)', diff saved to https://phabricator.wikimedia.org/P66322 and previous config saved to /var/cache/conftool/dbconfig/20240711-175038-arnaudb.json
  • 17:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 17:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 17:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 17:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 17:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 17:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 17:48 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 17:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 17:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367781)', diff saved to https://phabricator.wikimedia.org/P66321 and previous config saved to /var/cache/conftool/dbconfig/20240711-174820-arnaudb.json
  • 17:47 daniel@deploy1002: daniel: Continuing with sync
  • 17:46 daniel@deploy1002: daniel: Backport for Enable Special:RestSandbox on testwiki (T362006) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:41 daniel@deploy1002: Started scap sync-world: Backport for Enable Special:RestSandbox on testwiki (T362006)
  • 17:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P66319 and previous config saved to /var/cache/conftool/dbconfig/20240711-173313-arnaudb.json
  • 17:28 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host deploy1003.eqiad.wmnet with OS bullseye
  • 17:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P66318 and previous config saved to /var/cache/conftool/dbconfig/20240711-171806-arnaudb.json
  • 17:10 daniel@deploy1002: Started scap sync-world: Backport for Enable Special:RestSandbox on testwiki (T362006)
  • 17:10 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 17:09 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 17:08 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 17:07 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 17:07 bd808@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 17:06 mutante: puppetmaster1001 - puppet cert clean aphlict..discovery.wmnet T369796 T360413
  • 17:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367781)', diff saved to https://phabricator.wikimedia.org/P66317 and previous config saved to /var/cache/conftool/dbconfig/20240711-170258-arnaudb.json
  • 17:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T367781)', diff saved to https://phabricator.wikimedia.org/P66316 and previous config saved to /var/cache/conftool/dbconfig/20240711-170030-arnaudb.json
  • 17:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 17:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 17:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367781)', diff saved to https://phabricator.wikimedia.org/P66315 and previous config saved to /var/cache/conftool/dbconfig/20240711-170007-arnaudb.json
  • 16:58 mutante: puppetmaster1001 - puppet cert clean phabricator.discovery.wmnet T369796 T360413
  • 16:58 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:58 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:46 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:46 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P66314 and previous config saved to /var/cache/conftool/dbconfig/20240711-164500-arnaudb.json
  • 16:40 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 16:40 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 16:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P66313 and previous config saved to /var/cache/conftool/dbconfig/20240711-162953-arnaudb.json
  • 16:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367781)', diff saved to https://phabricator.wikimedia.org/P66312 and previous config saved to /var/cache/conftool/dbconfig/20240711-161446-arnaudb.json
  • 16:13 ejegg: payments-wiki upgraded from 4e48059a to c8edeb8e
  • 16:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T367781)', diff saved to https://phabricator.wikimedia.org/P66311 and previous config saved to /var/cache/conftool/dbconfig/20240711-161219-arnaudb.json
  • 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 16:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 16:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367781)', diff saved to https://phabricator.wikimedia.org/P66310 and previous config saved to /var/cache/conftool/dbconfig/20240711-161157-arnaudb.json
  • 16:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 16:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 15:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P66309 and previous config saved to /var/cache/conftool/dbconfig/20240711-155649-arnaudb.json
  • 15:53 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:52 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:51 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
  • 15:51 arnaudb@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 100%: post T365996 repool', diff saved to https://phabricator.wikimedia.org/P66308 and previous config saved to /var/cache/conftool/dbconfig/20240711-155109-arnaudb.json
  • 15:48 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
  • 15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P66307 and previous config saved to /var/cache/conftool/dbconfig/20240711-154142-arnaudb.json
  • 15:41 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:40 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:36 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bullseye
  • 15:36 arnaudb@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 75%: post T365996 repool', diff saved to https://phabricator.wikimedia.org/P66306 and previous config saved to /var/cache/conftool/dbconfig/20240711-153604-arnaudb.json
  • 15:31 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 15:30 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 15:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T367856)', diff saved to https://phabricator.wikimedia.org/P66305 and previous config saved to /var/cache/conftool/dbconfig/20240711-152946-marostegui.json
  • 15:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 15:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367781)', diff saved to https://phabricator.wikimedia.org/P66304 and previous config saved to /var/cache/conftool/dbconfig/20240711-152635-arnaudb.json
  • 15:26 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T367781)', diff saved to https://phabricator.wikimedia.org/P66303 and previous config saved to /var/cache/conftool/dbconfig/20240711-152412-arnaudb.json
  • 15:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 15:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 15:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367781)', diff saved to https://phabricator.wikimedia.org/P66302 and previous config saved to /var/cache/conftool/dbconfig/20240711-152350-arnaudb.json
  • 15:22 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:22 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:22 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:21 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 15:21 arnaudb@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 50%: post T365996 repool', diff saved to https://phabricator.wikimedia.org/P66301 and previous config saved to /var/cache/conftool/dbconfig/20240711-152058-arnaudb.json
  • 15:20 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 15:20 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 15:17 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 15:13 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 15:13 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 15:12 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 15:12 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 15:12 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 15:11 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 15:11 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 15:11 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 15:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P66300 and previous config saved to /var/cache/conftool/dbconfig/20240711-150843-arnaudb.json
  • 15:05 arnaudb@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 25%: post T365996 repool', diff saved to https://phabricator.wikimedia.org/P66299 and previous config saved to /var/cache/conftool/dbconfig/20240711-150553-arnaudb.json
  • 15:03 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 15:01 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 15:00 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 14:59 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 14:55 Emperor: repool ms-fe1014 and thanos-fe1004 before switch work T365996
  • 14:53 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P66298 and previous config saved to /var/cache/conftool/dbconfig/20240711-145336-arnaudb.json
  • 14:50 arnaudb@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 10%: post T365996 repool', diff saved to https://phabricator.wikimedia.org/P66297 and previous config saved to /var/cache/conftool/dbconfig/20240711-145047-arnaudb.json
  • 14:43 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 14:42 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 14:42 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 14:42 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 14:41 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 14:40 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 14:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367781)', diff saved to https://phabricator.wikimedia.org/P66296 and previous config saved to /var/cache/conftool/dbconfig/20240711-143829-arnaudb.json
  • 14:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T367781)', diff saved to https://phabricator.wikimedia.org/P66295 and previous config saved to /var/cache/conftool/dbconfig/20240711-143606-arnaudb.json
  • 14:35 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 14:35 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 14:35 arnaudb@cumin1002: dbctl commit (dc=all): 'db1193 (re)pooling @ 5%: post T365996 repool', diff saved to https://phabricator.wikimedia.org/P66294 and previous config saved to /var/cache/conftool/dbconfig/20240711-143541-arnaudb.json
  • 14:35 godog: pool titan1001 for switch work T365996
  • 14:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on backup1011.eqiad.wmnet,db1193.eqiad.wmnet,dbproxy1027.eqiad.wmnet with reason: T365996
  • 14:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on backup1011.eqiad.wmnet,db1193.eqiad.wmnet,dbproxy1027.eqiad.wmnet with reason: T365996
  • 14:25 arnaudb@cumin1002: dbctl commit (dc=all): 'T365996 - depool db1193 - s8', diff saved to https://phabricator.wikimedia.org/P66293 and previous config saved to /var/cache/conftool/dbconfig/20240711-142544-arnaudb.json
  • 14:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P66292 and previous config saved to /var/cache/conftool/dbconfig/20240711-142037-arnaudb.json
  • 14:19 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 23 hosts with reason: JunOS upgrade lsw1-f1-eqiad
  • 14:19 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 23 hosts with reason: JunOS upgrade lsw1-f1-eqiad
  • 14:15 topranks: rebooting lsw1-f1-eqiad to install updated JunOS version T365996
  • 14:12 godog: depool titan1001 for switch work T365996
  • 14:12 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 23 hosts with reason: JunOS upgrade lsw1-f1-eqiad
  • 14:12 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 23 hosts with reason: JunOS upgrade lsw1-f1-eqiad
  • 14:09 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-f1-eqiad,lsw1-f1-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f1-eqiad
  • 14:08 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-f1-eqiad,lsw1-f1-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f1-eqiad
  • 14:08 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-f1-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f1-eqiad
  • 14:08 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-f1-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f1-eqiad
  • 14:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183', diff saved to https://phabricator.wikimedia.org/P66291 and previous config saved to /var/cache/conftool/dbconfig/20240711-140530-arnaudb.json
  • 13:56 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 13:52 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 13:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1183 (T367781)', diff saved to https://phabricator.wikimedia.org/P66290 and previous config saved to /var/cache/conftool/dbconfig/20240711-135023-arnaudb.json
  • 13:50 Emperor: depool ms-fe1014 and thanos-fe1004 before switch work T365996
  • 13:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1183 (T367781)', diff saved to https://phabricator.wikimedia.org/P66289 and previous config saved to /var/cache/conftool/dbconfig/20240711-134759-arnaudb.json
  • 13:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 13:47 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1183.eqiad.wmnet with reason: Maintenance
  • 13:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367781)', diff saved to https://phabricator.wikimedia.org/P66288 and previous config saved to /var/cache/conftool/dbconfig/20240711-134737-arnaudb.json
  • 13:44 btullis@cumin1002: END (PASS) - Cookbook sre.presto.roll-restart-workers (exit_code=0) for Presto an-presto cluster: Roll restart of all Presto's jvm daemons.
  • 13:32 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 13:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P66287 and previous config saved to /var/cache/conftool/dbconfig/20240711-133229-arnaudb.json
  • 13:29 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 13:28 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1090.eqiad.wmnet
  • 13:26 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 13:22 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 13:20 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1090.eqiad.wmnet
  • 13:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P66286 and previous config saved to /var/cache/conftool/dbconfig/20240711-131721-arnaudb.json
  • 13:14 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=(kubernetes1062.eqiad.wmnet|mw1494.eqiad.wmnet|mw1495.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 13:14 claime: Uncordoning and repooling kubernetes1062.eqiad.wmnet mw1494.eqiad.wmnet mw1495.eqiad.wmnet that were actually not affected by T365996
  • 13:13 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 13:12 btullis@cumin1002: START - Cookbook sre.presto.roll-restart-workers for Presto an-presto cluster: Roll restart of all Presto's jvm daemons.
  • 13:10 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 13:09 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 13:08 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=(kubernetes1062.eqiad.wmnet|mw1494.eqiad.wmnet|mw1495.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 13:05 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 13:04 claime: Cordoning and depooling kubernetes1062.eqiad.wmnet mw1494.eqiad.wmnet mw1495.eqiad.wmnet for T365996
  • 13:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: T368950
  • 13:04 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 13:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: T368950
  • 13:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367781)', diff saved to https://phabricator.wikimedia.org/P66285 and previous config saved to /var/cache/conftool/dbconfig/20240711-130214-arnaudb.json
  • 13:00 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 12:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T367781)', diff saved to https://phabricator.wikimedia.org/P66284 and previous config saved to /var/cache/conftool/dbconfig/20240711-125949-arnaudb.json
  • 12:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:59 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 12:59 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 12:55 godog: reenable benthos@webrequest_live on centrallog2002 - T369737
  • 12:51 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netboxdb2003.codfw.wmnet with reason: netbox upgrade prep work
  • 12:51 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netboxdb2003.codfw.wmnet with reason: netbox upgrade prep work
  • 12:51 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netboxdb1003.eqiad.wmnet with reason: netbox upgrade prep work
  • 12:51 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 12:51 ayounsi@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netboxdb2003.codfw.wmnet with reason: host reimage
  • 12:51 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netboxdb1003.eqiad.wmnet with reason: netbox upgrade prep work
  • 12:50 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 12:50 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:50 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:50 claime: running puppet on O:analytics_cluster::turnilo,O:analytics_cluster::turnilo::staging
  • 12:48 godog: temp stop benthos@webrequest_live on centrallog2002 - T369737
  • 12:47 ayounsi@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netboxdb2003.codfw.wmnet with reason: host reimage
  • 12:43 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:42 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:39 ayounsi@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 4 days, 0:00:00 on netboxdb1003.eqiad.wmnet with reason: netbox upgrade prep work
  • 12:39 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netboxdb1003.eqiad.wmnet with reason: netbox upgrade prep work
  • 12:30 ayounsi@cumin2002: START - Cookbook sre.hosts.reimage for host netboxdb2003.codfw.wmnet with OS bookworm
  • 12:30 ayounsi@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netboxdb2003.codfw.wmnet - ayounsi@cumin2002"
  • 12:29 ayounsi@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netboxdb2003.codfw.wmnet - ayounsi@cumin2002"
  • 12:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netboxdb2003.codfw.wmnet on all recursors
  • 12:28 ayounsi@cumin2002: START - Cookbook sre.dns.wipe-cache netboxdb2003.codfw.wmnet on all recursors
  • 12:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:28 ayounsi@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netboxdb2003.codfw.wmnet - ayounsi@cumin2002"
  • 12:28 dcausse@deploy1002: Finished deploy [airflow-dags/search@7bb895a]: search: stop using api-ro.discovery.wmnet (duration: 00m 21s)
  • 12:27 dcausse@deploy1002: Started deploy [airflow-dags/search@7bb895a]: search: stop using api-ro.discovery.wmnet
  • 12:27 ayounsi@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netboxdb2003.codfw.wmnet - ayounsi@cumin2002"
  • 12:24 ayounsi@cumin2002: START - Cookbook sre.dns.netbox
  • 12:24 ayounsi@cumin2002: START - Cookbook sre.ganeti.makevm for new host netboxdb2003.codfw.wmnet
  • 11:50 ayounsi@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host netboxdb1003.eqiad.wmnet
  • 11:50 ayounsi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host netboxdb1003.eqiad.wmnet with OS bookworm
  • 11:49 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 11:48 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 11:36 ayounsi@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netbox2003.codfw.wmnet
  • 11:36 ayounsi@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netbox2003.codfw.wmnet with OS bookworm
  • 11:29 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host netboxdb1003.eqiad.wmnet with OS bookworm
  • 11:29 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 11:29 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netboxdb1003.eqiad.wmnet - ayounsi@cumin1002"
  • 11:29 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 11:29 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netbox2003.codfw.wmnet with reason: netbox upgrade prep work
  • 11:29 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netbox2003.codfw.wmnet with reason: netbox upgrade prep work
  • 11:28 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
  • 11:28 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on netbox1003.eqiad.wmnet with reason: netbox upgrade prep work
  • 11:28 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netboxdb1003.eqiad.wmnet - ayounsi@cumin1002"
  • 11:28 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netboxdb1003.eqiad.wmnet on all recursors
  • 11:28 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache netboxdb1003.eqiad.wmnet on all recursors
  • 11:28 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:28 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netboxdb1003.eqiad.wmnet - ayounsi@cumin1002"
  • 11:26 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netboxdb1003.eqiad.wmnet - ayounsi@cumin1002"
  • 11:24 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 11:24 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host netboxdb1003.eqiad.wmnet
  • 11:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 11:14 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 11:13 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 11:12 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 11:02 ayounsi@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host netboxdb1003.eqiad.wmnet
  • 11:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netboxdb1003.eqiad.wmnet on all recursors
  • 11:02 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache netboxdb1003.eqiad.wmnet on all recursors
  • 11:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:00 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 11:00 ayounsi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:58 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 10:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netboxdb1003.eqiad.wmnet on all recursors
  • 10:58 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache netboxdb1003.eqiad.wmnet on all recursors
  • 10:58 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:57 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 10:56 ayounsi@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 10:53 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 10:53 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host netboxdb1003.eqiad.wmnet
  • 10:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host netbox1003.eqiad.wmnet
  • 10:52 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host netbox1003.eqiad.wmnet with OS bookworm
  • 10:51 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 10:51 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 10:47 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 10:41 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 10:40 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
  • 10:40 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 10:39 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 10:39 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 10:37 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 10:36 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 10:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:27 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 10:12 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:12 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 10:01 sukhe: [end] authdns-update for sending BR to magru: T359054
  • 10:00 sukhe: [start] authdns-update for sending BR to magru: T359054
  • 09:54 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 09:54 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 09:53 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 09:53 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 09:45 ayounsi@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netbox2003.codfw.wmnet with reason: host reimage
  • 09:42 ayounsi@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on netbox2003.codfw.wmnet with reason: host reimage
  • 09:36 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 09:33 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 09:31 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 09:28 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 09:25 ayounsi@cumin2002: START - Cookbook sre.hosts.reimage for host netbox2003.codfw.wmnet with OS bookworm
  • 09:25 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netbox1003.eqiad.wmnet with reason: host reimage
  • 09:23 jiji@deploy1002: Finished scap: Remove mcrouter container and exporter from mediawiki pods (duration: 04m 33s)
  • 09:23 ayounsi@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox2003.codfw.wmnet - ayounsi@cumin2002"
  • 09:22 ayounsi@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox2003.codfw.wmnet - ayounsi@cumin2002"
  • 09:22 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on netbox1003.eqiad.wmnet with reason: host reimage
  • 09:22 ayounsi@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox2003.codfw.wmnet on all recursors
  • 09:22 ayounsi@cumin2002: START - Cookbook sre.dns.wipe-cache netbox2003.codfw.wmnet on all recursors
  • 09:22 ayounsi@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:22 ayounsi@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox2003.codfw.wmnet - ayounsi@cumin2002"
  • 09:20 ayounsi@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox2003.codfw.wmnet - ayounsi@cumin2002"
  • 09:19 jiji@deploy1002: Started scap sync-world: Remove mcrouter container and exporter from mediawiki pods
  • 09:18 ayounsi@cumin2002: START - Cookbook sre.dns.netbox
  • 09:18 ayounsi@cumin2002: START - Cookbook sre.ganeti.makevm for new host netbox2003.codfw.wmnet
  • 09:13 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 09:12 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 09:11 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host netbox1003.eqiad.wmnet with OS bookworm
  • 09:10 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox1003.eqiad.wmnet - ayounsi@cumin1002"
  • 09:09 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox1003.eqiad.wmnet - ayounsi@cumin1002"
  • 09:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox1003.eqiad.wmnet on all recursors
  • 09:09 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache netbox1003.eqiad.wmnet on all recursors
  • 09:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:09 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox1003.eqiad.wmnet - ayounsi@cumin1002"
  • 09:08 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox1003.eqiad.wmnet - ayounsi@cumin1002"
  • 09:05 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 09:05 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host netbox1003.eqiad.wmnet
  • 09:05 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 09:04 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 09:02 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 09:00 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 08:57 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 08:57 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 08:55 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 08:55 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 08:46 elukey: cd /srv/git/private; git reset --hard HEAD^ on puppetserver1001 to remove my last local commit (test before migration of the private repo to puppetserver1001) - T368023
  • 08:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 08:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T367856)', diff saved to https://phabricator.wikimedia.org/P66280 and previous config saved to /var/cache/conftool/dbconfig/20240711-084151-marostegui.json
  • 08:30 hashar: Switched CI Quibble and Phan jobs based on PHP 8.1, 8.2 and 8.3 from Buster to Bullseye - T335766 T366799 T369146
  • 08:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P66279 and previous config saved to /var/cache/conftool/dbconfig/20240711-082644-marostegui.json
  • 08:15 aklapper@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.13 refs T366958
  • 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P66278 and previous config saved to /var/cache/conftool/dbconfig/20240711-081137-marostegui.json
  • 08:05 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 07:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T367856)', diff saved to https://phabricator.wikimedia.org/P66277 and previous config saved to /var/cache/conftool/dbconfig/20240711-075630-marostegui.json
  • 07:50 marostegui: Deploy schema change on s3 codfw db2127 dbmaint T367856
  • 07:48 dcausse: closing the backport window
  • 07:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Long schema change
  • 07:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2127.codfw.wmnet with reason: Long schema change
  • 07:47 dcausse@deploy1002: Finished scap: Backport for Fix pool counter metric (duration: 09m 56s)
  • 07:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2127 T369691', diff saved to https://phabricator.wikimedia.org/P66276 and previous config saved to /var/cache/conftool/dbconfig/20240711-074629-marostegui.json
  • 07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2205 to s3 primary T369691', diff saved to https://phabricator.wikimedia.org/P66275 and previous config saved to /var/cache/conftool/dbconfig/20240711-074534-marostegui.json
  • 07:45 marostegui: Starting s3 codfw failover from db2127 to db2205 - T369691
  • 07:42 dcausse@deploy1002: dcausse: Continuing with sync
  • 07:41 dcausse@deploy1002: dcausse: Backport for Fix pool counter metric synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:37 dcausse@deploy1002: Started scap sync-world: Backport for Fix pool counter metric
  • 07:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s3 T369691
  • 07:31 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2205 with weight 0 T369691', diff saved to https://phabricator.wikimedia.org/P66274 and previous config saved to /var/cache/conftool/dbconfig/20240711-073101-root.json
  • 07:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 07:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s3 T369691
  • 07:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 07:28 jgiannelos@deploy1002: Finished scap: Backport for Linter: trigger parsoid parses on template changes (T361013) (duration: 14m 25s)
  • 07:23 jgiannelos@deploy1002: daniel, jgiannelos: Continuing with sync
  • 07:17 jgiannelos@deploy1002: daniel, jgiannelos: Backport for Linter: trigger parsoid parses on template changes (T361013) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:14 jgiannelos@deploy1002: Started scap sync-world: Backport for Linter: trigger parsoid parses on template changes (T361013)
  • 07:12 kartik@deploy1002: Finished scap: Backport for Enable MinT for Wikipedia readers MVP on a second group of pilot wikis (T367067) (duration: 09m 32s)
  • 07:07 kartik@deploy1002: kartik: Continuing with sync
  • 07:05 kartik@deploy1002: kartik: Backport for Enable MinT for Wikipedia readers MVP on a second group of pilot wikis (T367067) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:02 kartik@deploy1002: Started scap sync-world: Backport for Enable MinT for Wikipedia readers MVP on a second group of pilot wikis (T367067)
  • 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1163 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66273 and previous config saved to /var/cache/conftool/dbconfig/20240711-070004-root.json
  • 06:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 06:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 06:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 06:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 06:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T367781)', diff saved to https://phabricator.wikimedia.org/P66272 and previous config saved to /var/cache/conftool/dbconfig/20240711-065508-arnaudb.json
  • 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367856)', diff saved to https://phabricator.wikimedia.org/P66271 and previous config saved to /var/cache/conftool/dbconfig/20240711-065432-marostegui.json
  • 06:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1163 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66267 and previous config saved to /var/cache/conftool/dbconfig/20240711-062953-root.json
  • 06:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P66266 and previous config saved to /var/cache/conftool/dbconfig/20240711-062454-arnaudb.json
  • 06:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P66265 and previous config saved to /var/cache/conftool/dbconfig/20240711-062417-marostegui.json
  • 06:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1163 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66264 and previous config saved to /var/cache/conftool/dbconfig/20240711-061447-root.json
  • 06:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T367781)', diff saved to https://phabricator.wikimedia.org/P66263 and previous config saved to /var/cache/conftool/dbconfig/20240711-060947-arnaudb.json
  • 06:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367856)', diff saved to https://phabricator.wikimedia.org/P66262 and previous config saved to /var/cache/conftool/dbconfig/20240711-060910-marostegui.json
  • 06:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T367781)', diff saved to https://phabricator.wikimedia.org/P66261 and previous config saved to /var/cache/conftool/dbconfig/20240711-060736-arnaudb.json
  • 06:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 06:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 06:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T367781)', diff saved to https://phabricator.wikimedia.org/P66260 and previous config saved to /var/cache/conftool/dbconfig/20240711-060714-arnaudb.json
  • 05:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1163 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66259 and previous config saved to /var/cache/conftool/dbconfig/20240711-055942-root.json
  • 05:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P66258 and previous config saved to /var/cache/conftool/dbconfig/20240711-055206-arnaudb.json
  • 05:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1163 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66257 and previous config saved to /var/cache/conftool/dbconfig/20240711-054436-root.json
  • 05:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P66256 and previous config saved to /var/cache/conftool/dbconfig/20240711-053659-arnaudb.json
  • 05:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1163 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66255 and previous config saved to /var/cache/conftool/dbconfig/20240711-052931-root.json
  • 05:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1163 T369514', diff saved to https://phabricator.wikimedia.org/P66254 and previous config saved to /var/cache/conftool/dbconfig/20240711-052702-root.json
  • 05:25 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1184 to s1 primary and set section read-write T369514', diff saved to https://phabricator.wikimedia.org/P66253 and previous config saved to /var/cache/conftool/dbconfig/20240711-052540-root.json
  • 05:25 marostegui@cumin1002: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - T369514', diff saved to https://phabricator.wikimedia.org/P66252 and previous config saved to /var/cache/conftool/dbconfig/20240711-052507-root.json
  • 05:24 marostegui: Starting s1 eqiad failover from db1163 to db1184 - T369514
  • 05:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T367781)', diff saved to https://phabricator.wikimedia.org/P66251 and previous config saved to /var/cache/conftool/dbconfig/20240711-052151-arnaudb.json
  • 05:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T367781)', diff saved to https://phabricator.wikimedia.org/P66250 and previous config saved to /var/cache/conftool/dbconfig/20240711-051941-arnaudb.json
  • 05:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 05:19 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 05:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T367781)', diff saved to https://phabricator.wikimedia.org/P66249 and previous config saved to /var/cache/conftool/dbconfig/20240711-051920-arnaudb.json
  • 05:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P66248 and previous config saved to /var/cache/conftool/dbconfig/20240711-050413-arnaudb.json
  • 04:59 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1184 from API/vslow/dump T369514', diff saved to https://phabricator.wikimedia.org/P66247 and previous config saved to /var/cache/conftool/dbconfig/20240711-045905-marostegui.json
  • 04:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 36 hosts with reason: Primary switchover s1 T369514
  • 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1184 with weight 0 T369514', diff saved to https://phabricator.wikimedia.org/P66246 and previous config saved to /var/cache/conftool/dbconfig/20240711-045829-marostegui.json
  • 04:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 36 hosts with reason: Primary switchover s1 T369514
  • 04:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P66245 and previous config saved to /var/cache/conftool/dbconfig/20240711-044905-arnaudb.json
  • 04:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T367781)', diff saved to https://phabricator.wikimedia.org/P66244 and previous config saved to /var/cache/conftool/dbconfig/20240711-043358-arnaudb.json
  • 04:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T367781)', diff saved to https://phabricator.wikimedia.org/P66243 and previous config saved to /var/cache/conftool/dbconfig/20240711-043147-arnaudb.json
  • 04:31 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 04:31 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 04:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66242 and previous config saved to /var/cache/conftool/dbconfig/20240711-043124-arnaudb.json
  • 04:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P66241 and previous config saved to /var/cache/conftool/dbconfig/20240711-041617-arnaudb.json
  • 04:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P66240 and previous config saved to /var/cache/conftool/dbconfig/20240711-040110-arnaudb.json
  • 03:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66239 and previous config saved to /var/cache/conftool/dbconfig/20240711-034603-arnaudb.json
  • 03:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T367781)', diff saved to https://phabricator.wikimedia.org/P66238 and previous config saved to /var/cache/conftool/dbconfig/20240711-034352-arnaudb.json
  • 03:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 03:43 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 03:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T367781)', diff saved to https://phabricator.wikimedia.org/P66237 and previous config saved to /var/cache/conftool/dbconfig/20240711-034330-arnaudb.json
  • 03:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P66236 and previous config saved to /var/cache/conftool/dbconfig/20240711-032823-arnaudb.json
  • 03:20 eileen: civicrm upgraded from 04cb9083 to 3287ced0
  • 03:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165', diff saved to https://phabricator.wikimedia.org/P66235 and previous config saved to /var/cache/conftool/dbconfig/20240711-031316-arnaudb.json
  • 03:08 eileen: civicrm upgraded from 2d1a0aad to 04cb9083
  • 02:58 eileen: config revision changed from e02c3a85 to 540f27e6
  • 02:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2165 (T367781)', diff saved to https://phabricator.wikimedia.org/P66234 and previous config saved to /var/cache/conftool/dbconfig/20240711-025809-arnaudb.json
  • 02:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2165 (T367781)', diff saved to https://phabricator.wikimedia.org/P66233 and previous config saved to /var/cache/conftool/dbconfig/20240711-025558-arnaudb.json
  • 02:55 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 02:55 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 02:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T367781)', diff saved to https://phabricator.wikimedia.org/P66232 and previous config saved to /var/cache/conftool/dbconfig/20240711-025537-arnaudb.json
  • 02:48 eileen: civicrm upgraded from a17496a2 to 2d1a0aad
  • 02:45 mutante: stewards2001 - sudo mv /srv/repos/users-db /root/ - run puppet and let it recreate the usersdb repo - this time pulling from gitlab - T369780 T369430
  • 02:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P66231 and previous config saved to /var/cache/conftool/dbconfig/20240711-024030-arnaudb.json
  • 02:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P66230 and previous config saved to /var/cache/conftool/dbconfig/20240711-022522-arnaudb.json
  • 02:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1060.eqiad.wmnet with OS bookworm
  • 02:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T367781)', diff saved to https://phabricator.wikimedia.org/P66229 and previous config saved to /var/cache/conftool/dbconfig/20240711-021015-arnaudb.json
  • 02:08 eileen: civicrm upgraded from a03085ff to 1e2fcba3
  • 02:08 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T367781)', diff saved to https://phabricator.wikimedia.org/P66228 and previous config saved to /var/cache/conftool/dbconfig/20240711-020805-arnaudb.json
  • 02:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 02:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 02:07 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 02:07 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 02:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T367781)', diff saved to https://phabricator.wikimedia.org/P66227 and previous config saved to /var/cache/conftool/dbconfig/20240711-020738-arnaudb.json
  • 01:54 eileen: config revision changed from 840e6b90 to e02c3a85
  • 01:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P66226 and previous config saved to /var/cache/conftool/dbconfig/20240711-015231-arnaudb.json
  • 01:46 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
  • 01:43 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1060.eqiad.wmnet with reason: host reimage
  • 01:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P66225 and previous config saved to /var/cache/conftool/dbconfig/20240711-013723-arnaudb.json
  • 01:27 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1060.eqiad.wmnet with OS bookworm
  • 01:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T367781)', diff saved to https://phabricator.wikimedia.org/P66224 and previous config saved to /var/cache/conftool/dbconfig/20240711-012216-arnaudb.json
  • 01:21 mutante: gerrit-replica.wikimedia.org (gerrit2002) - switched firewall provider from iptables to nftables - all seems fine to me but just in case: gerrit:1053068 can be reverted to go back
  • 01:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T367781)', diff saved to https://phabricator.wikimedia.org/P66223 and previous config saved to /var/cache/conftool/dbconfig/20240711-012006-arnaudb.json
  • 01:19 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 01:19 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 01:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T367781)', diff saved to https://phabricator.wikimedia.org/P66222 and previous config saved to /var/cache/conftool/dbconfig/20240711-011944-arnaudb.json
  • 01:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P66221 and previous config saved to /var/cache/conftool/dbconfig/20240711-010437-arnaudb.json
  • 00:55 mutante: gerrit-replica.wikimedia.org (gerrit2002) - maintenance
  • 00:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P66220 and previous config saved to /var/cache/conftool/dbconfig/20240711-004930-arnaudb.json
  • 00:49 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on gerrit-replica.wikimedia.org with reason: switch firewall provider
  • 00:49 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit-replica.wikimedia.org with reason: switch firewall provider
  • 00:49 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit2002.wikimedia.org with reason: switch firewall provider
  • 00:48 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit2002.wikimedia.org with reason: switch firewall provider
  • 00:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T367781)', diff saved to https://phabricator.wikimedia.org/P66219 and previous config saved to /var/cache/conftool/dbconfig/20240711-003423-arnaudb.json
  • 00:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T367781)', diff saved to https://phabricator.wikimedia.org/P66218 and previous config saved to /var/cache/conftool/dbconfig/20240711-003212-arnaudb.json
  • 00:32 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 00:32 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 00:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T367781)', diff saved to https://phabricator.wikimedia.org/P66217 and previous config saved to /var/cache/conftool/dbconfig/20240711-003150-arnaudb.json
  • 00:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P66216 and previous config saved to /var/cache/conftool/dbconfig/20240711-001643-arnaudb.json
  • 00:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P66215 and previous config saved to /var/cache/conftool/dbconfig/20240711-000136-arnaudb.json

2024-07-10

  • 23:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T367781)', diff saved to https://phabricator.wikimedia.org/P66214 and previous config saved to /var/cache/conftool/dbconfig/20240710-234629-arnaudb.json
  • 23:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T367781)', diff saved to https://phabricator.wikimedia.org/P66213 and previous config saved to /var/cache/conftool/dbconfig/20240710-234418-arnaudb.json
  • 23:44 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 23:44 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 23:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T367781)', diff saved to https://phabricator.wikimedia.org/P66212 and previous config saved to /var/cache/conftool/dbconfig/20240710-234356-arnaudb.json
  • 23:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T367856)', diff saved to https://phabricator.wikimedia.org/P66211 and previous config saved to /var/cache/conftool/dbconfig/20240710-233558-marostegui.json
  • 23:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 23:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 23:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T367856)', diff saved to https://phabricator.wikimedia.org/P66210 and previous config saved to /var/cache/conftool/dbconfig/20240710-233535-marostegui.json
  • 23:35 rzl: $ sudo cumin A:all-mw enable-puppet T367012
  • 23:34 rzl@deploy1002: Finished scap: T367012 (duration: 07m 45s)
  • 23:30 rzl@deploy1002: rzl: Continuing with sync
  • 23:29 rzl@deploy1002: rzl: T367012 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P66209 and previous config saved to /var/cache/conftool/dbconfig/20240710-232849-arnaudb.json
  • 23:27 rzl@deploy1002: Started scap sync-world: T367012
  • 23:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P66208 and previous config saved to /var/cache/conftool/dbconfig/20240710-232028-marostegui.json
  • 23:20 rzl: $ sudo cumin A:all-mw disable-puppet # T367012 - really just for the old mwdebug hosts
  • 23:16 zabe@deploy1002: Finished scap: update interwiki cache (duration: 07m 32s)
  • 23:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P66207 and previous config saved to /var/cache/conftool/dbconfig/20240710-231342-arnaudb.json
  • 23:09 zabe@deploy1002: Started scap sync-world: update interwiki cache
  • 23:08 zabe@deploy1002: Finished scap: T362529 (duration: 07m 44s)
  • 23:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P66206 and previous config saved to /var/cache/conftool/dbconfig/20240710-230522-marostegui.json
  • 23:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2209 (T367856)', diff saved to https://phabricator.wikimedia.org/P66205 and previous config saved to /var/cache/conftool/dbconfig/20240710-230130-marostegui.json
  • 23:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 23:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 23:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T367856)', diff saved to https://phabricator.wikimedia.org/P66204 and previous config saved to /var/cache/conftool/dbconfig/20240710-230107-marostegui.json
  • 23:00 zabe@deploy1002: Started scap sync-world: T362529
  • 23:00 zabe: Create Wikimedians of United Arab Emirates User Group Wiki # T362529
  • 23:00 mutante: puppetserver1001 - fixing failed unit geoip_update_ipinfo.service
  • 22:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T367781)', diff saved to https://phabricator.wikimedia.org/P66203 and previous config saved to /var/cache/conftool/dbconfig/20240710-225835-arnaudb.json
  • 22:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T367781)', diff saved to https://phabricator.wikimedia.org/P66202 and previous config saved to /var/cache/conftool/dbconfig/20240710-225725-arnaudb.json
  • 22:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 22:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 22:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 22:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 22:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T367781)', diff saved to https://phabricator.wikimedia.org/P66201 and previous config saved to /var/cache/conftool/dbconfig/20240710-225647-arnaudb.json
  • 22:53 mutante: puppetmaster1001 - remove Enterprise product ID from MaxMind downloads. sudo systemctl start geoip_update_ipinfo - T366272
  • 22:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T367856)', diff saved to https://phabricator.wikimedia.org/P66200 and previous config saved to /var/cache/conftool/dbconfig/20240710-225015-marostegui.json
  • 22:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P66199 and previous config saved to /var/cache/conftool/dbconfig/20240710-224559-marostegui.json
  • 22:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P66198 and previous config saved to /var/cache/conftool/dbconfig/20240710-224140-arnaudb.json
  • 22:35 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: security release
  • 22:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P66197 and previous config saved to /var/cache/conftool/dbconfig/20240710-223052-marostegui.json
  • 22:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P66196 and previous config saved to /var/cache/conftool/dbconfig/20240710-222633-arnaudb.json
  • 22:25 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: security release
  • 22:19 dzahn@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: security release
  • 22:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T367856)', diff saved to https://phabricator.wikimedia.org/P66195 and previous config saved to /var/cache/conftool/dbconfig/20240710-221545-marostegui.json
  • 22:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T367781)', diff saved to https://phabricator.wikimedia.org/P66194 and previous config saved to /var/cache/conftool/dbconfig/20240710-221126-arnaudb.json
  • 22:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T367781)', diff saved to https://phabricator.wikimedia.org/P66193 and previous config saved to /var/cache/conftool/dbconfig/20240710-221018-arnaudb.json
  • 22:10 mutante: gitlab-replica-b.wikimedia.org - version upgrade in progress
  • 22:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 22:10 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 22:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 22:09 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 22:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T367781)', diff saved to https://phabricator.wikimedia.org/P66192 and previous config saved to /var/cache/conftool/dbconfig/20240710-220951-arnaudb.json
  • 22:09 dzahn@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: security release
  • 21:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P66191 and previous config saved to /var/cache/conftool/dbconfig/20240710-215444-arnaudb.json
  • 21:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P66190 and previous config saved to /var/cache/conftool/dbconfig/20240710-213935-arnaudb.json
  • 21:30 jdrewniak@deploy1002: Finished scap: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871) (duration: 11m 35s)
  • 21:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T367781)', diff saved to https://phabricator.wikimedia.org/P66188 and previous config saved to /var/cache/conftool/dbconfig/20240710-212427-arnaudb.json
  • 21:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T367781)', diff saved to https://phabricator.wikimedia.org/P66187 and previous config saved to /var/cache/conftool/dbconfig/20240710-212319-arnaudb.json
  • 21:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 21:23 jdrewniak@deploy1002: jdlrobson, jdrewniak: Continuing with sync
  • 21:23 jdrewniak@deploy1002: jdlrobson, jdrewniak: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 21:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T367781)', diff saved to https://phabricator.wikimedia.org/P66186 and previous config saved to /var/cache/conftool/dbconfig/20240710-212257-arnaudb.json
  • 21:18 jdrewniak@deploy1002: Started scap sync-world: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871)
  • 21:17 jdrewniak@deploy1002: Sync cancelled.
  • 21:17 jdrewniak@deploy1002: jdlrobson, jdrewniak: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:10 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1096*,elastic1097*,elastic1106* for T348977 - bking@cumin2002
  • 21:10 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1096*,elastic1097*,elastic1106* for T348977 - bking@cumin2002
  • 21:09 jdrewniak@deploy1002: Started scap sync-world: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871)
  • 21:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P66185 and previous config saved to /var/cache/conftool/dbconfig/20240710-210750-arnaudb.json
  • 21:06 jdrewniak@deploy1002: Sync cancelled.
  • 21:06 jdrewniak@deploy1002: jdrewniak, jdlrobson: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:04 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1096-1097,1106].eqiad.wmnet with reason: T348977
  • 21:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1096-1097,1106].eqiad.wmnet with reason: T348977
  • 20:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P66184 and previous config saved to /var/cache/conftool/dbconfig/20240710-205242-arnaudb.json
  • 20:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T367781)', diff saved to https://phabricator.wikimedia.org/P66183 and previous config saved to /var/cache/conftool/dbconfig/20240710-203735-arnaudb.json
  • 20:37 jdrewniak@deploy1002: Started scap sync-world: Backport for [July 10th] Vector: enable dark mode for tier 1 wikis (logged in only) (T368795), Add beta tag & feedback link to Appearance menu (T367871)
  • 20:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T367781)', diff saved to https://phabricator.wikimedia.org/P66182 and previous config saved to /var/cache/conftool/dbconfig/20240710-203627-arnaudb.json
  • 20:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 20:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 20:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T367781)', diff saved to https://phabricator.wikimedia.org/P66181 and previous config saved to /var/cache/conftool/dbconfig/20240710-203605-arnaudb.json
  • 20:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P66180 and previous config saved to /var/cache/conftool/dbconfig/20240710-202057-arnaudb.json
  • 20:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P66179 and previous config saved to /var/cache/conftool/dbconfig/20240710-200550-arnaudb.json
  • 19:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T367781)', diff saved to https://phabricator.wikimedia.org/P66178 and previous config saved to /var/cache/conftool/dbconfig/20240710-195043-arnaudb.json
  • 19:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T367781)', diff saved to https://phabricator.wikimedia.org/P66177 and previous config saved to /var/cache/conftool/dbconfig/20240710-194935-arnaudb.json
  • 19:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 19:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 19:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66176 and previous config saved to /var/cache/conftool/dbconfig/20240710-194913-arnaudb.json
  • 19:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P66174 and previous config saved to /var/cache/conftool/dbconfig/20240710-193406-arnaudb.json
  • 19:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P66173 and previous config saved to /var/cache/conftool/dbconfig/20240710-191859-arnaudb.json
  • 19:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66172 and previous config saved to /var/cache/conftool/dbconfig/20240710-190352-arnaudb.json
  • 19:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66171 and previous config saved to /var/cache/conftool/dbconfig/20240710-190244-arnaudb.json
  • 19:02 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 19:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 19:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T367781)', diff saved to https://phabricator.wikimedia.org/P66170 and previous config saved to /var/cache/conftool/dbconfig/20240710-190222-arnaudb.json
  • 18:56 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 18:56 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 18:47 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P66169 and previous config saved to /var/cache/conftool/dbconfig/20240710-184714-arnaudb.json
  • 18:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:45 ryankemper@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add 4 new IPs (2 eqiad, 2 codfw) for wdqs graph split - ryankemper@cumin2002"
  • 18:43 ryankemper@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add 4 new IPs (2 eqiad, 2 codfw) for wdqs graph split - ryankemper@cumin2002"
  • 18:35 ryankemper@cumin2002: START - Cookbook sre.dns.netbox
  • 18:32 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P66168 and previous config saved to /var/cache/conftool/dbconfig/20240710-183207-arnaudb.json
  • 18:17 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T367781)', diff saved to https://phabricator.wikimedia.org/P66166 and previous config saved to /var/cache/conftool/dbconfig/20240710-181700-arnaudb.json
  • 17:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T367781)', diff saved to https://phabricator.wikimedia.org/P66164 and previous config saved to /var/cache/conftool/dbconfig/20240710-171644-arnaudb.json
  • 17:16 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 17:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 17:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T367781)', diff saved to https://phabricator.wikimedia.org/P66163 and previous config saved to /var/cache/conftool/dbconfig/20240710-171622-arnaudb.json
  • 17:01 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 100%: post T365993 repool', diff saved to https://phabricator.wikimedia.org/P66162 and previous config saved to /var/cache/conftool/dbconfig/20240710-170143-arnaudb.json
  • 17:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P66161 and previous config saved to /var/cache/conftool/dbconfig/20240710-170115-arnaudb.json
  • 16:46 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 75%: post T365993 repool', diff saved to https://phabricator.wikimedia.org/P66160 and previous config saved to /var/cache/conftool/dbconfig/20240710-164637-arnaudb.json
  • 16:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P66159 and previous config saved to /var/cache/conftool/dbconfig/20240710-164608-arnaudb.json
  • 16:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P66158 and previous config saved to /var/cache/conftool/dbconfig/20240710-164225-ladsgroup.json
  • 16:31 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 50%: post T365993 repool', diff saved to https://phabricator.wikimedia.org/P66157 and previous config saved to /var/cache/conftool/dbconfig/20240710-163131-arnaudb.json
  • 16:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T367781)', diff saved to https://phabricator.wikimedia.org/P66156 and previous config saved to /var/cache/conftool/dbconfig/20240710-163100-arnaudb.json
  • 16:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T367781)', diff saved to https://phabricator.wikimedia.org/P66155 and previous config saved to /var/cache/conftool/dbconfig/20240710-162952-arnaudb.json
  • 16:29 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 16:29 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 16:29 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T367781)', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240710-162926-arnaudb.json
  • 16:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P66153 and previous config saved to /var/cache/conftool/dbconfig/20240710-162718-ladsgroup.json
  • 16:17 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 16:17 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 16:16 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 25%: post T365993 repool', diff saved to https://phabricator.wikimedia.org/P66152 and previous config saved to /var/cache/conftool/dbconfig/20240710-161626-arnaudb.json
  • 16:14 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary (T368083)
  • 16:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P66151 and previous config saved to /var/cache/conftool/dbconfig/20240710-161419-arnaudb.json
  • 16:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P66150 and previous config saved to /var/cache/conftool/dbconfig/20240710-161211-ladsgroup.json
  • 16:11 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary (T368083)
  • 16:08 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-high-traffic2 (T368083)
  • 16:05 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-high-traffic2 (T368083)
  • 16:03 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-high-traffic1 (T368083)
  • 16:01 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 10%: post T365993 repool', diff saved to https://phabricator.wikimedia.org/P66149 and previous config saved to /var/cache/conftool/dbconfig/20240710-160120-arnaudb.json
  • 16:01 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 16:00 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 16:00 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-high-traffic1 (T368083)
  • 15:59 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P66148 and previous config saved to /var/cache/conftool/dbconfig/20240710-155911-arnaudb.json
  • 15:59 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 15:58 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 15:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P66147 and previous config saved to /var/cache/conftool/dbconfig/20240710-155703-ladsgroup.json
  • 15:55 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-high-traffic2-eqsin (T368083)
  • 15:54 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-high-traffic2-eqsin (T368083)
  • 15:53 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
  • 15:53 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
  • 15:49 brett@cumin2002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-high-traffic1-eqsin (T368083)
  • 15:48 brett@cumin2002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-high-traffic1-eqsin (T368083)
  • 15:48 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:48 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:46 arnaudb@cumin1002: dbctl commit (dc=all): 'db1190 (re)pooling @ 5%: post T365993 repool', diff saved to https://phabricator.wikimedia.org/P66146 and previous config saved to /var/cache/conftool/dbconfig/20240710-154615-arnaudb.json
  • 15:44 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T367781)', diff saved to https://phabricator.wikimedia.org/P66145 and previous config saved to /var/cache/conftool/dbconfig/20240710-154404-arnaudb.json
  • 15:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T367781)', diff saved to https://phabricator.wikimedia.org/P66144 and previous config saved to /var/cache/conftool/dbconfig/20240710-154256-arnaudb.json
  • 15:42 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 15:42 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 15:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T367781)', diff saved to https://phabricator.wikimedia.org/P66143 and previous config saved to /var/cache/conftool/dbconfig/20240710-154234-arnaudb.json
  • 15:36 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-presto1004.eqiad.wmnet with reason: Shutting down to investigate RAM issue
  • 15:36 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-presto1004.eqiad.wmnet with reason: Shutting down to investigate RAM issue
  • 15:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P66142 and previous config saved to /var/cache/conftool/dbconfig/20240710-152727-arnaudb.json
  • 15:27 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-high-traffic1 (T368083)
  • 15:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Remove db1233 from groups', diff saved to https://phabricator.wikimedia.org/P66141 and previous config saved to /var/cache/conftool/dbconfig/20240710-152616-ladsgroup.json
  • 15:24 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-high-traffic1 (T368083)
  • 15:24 vgutierrez: rolling restart of high-traffic1 LVSs to switch ncredir to maglev - T368083
  • 15:24 topranks: rebooting lsw1-e1-eqiad to install updated JunOS version T365993
  • 15:24 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on 26 hosts with reason: JunOS upgrade lsw1-e1-eqiad
  • 15:23 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on 26 hosts with reason: JunOS upgrade lsw1-e1-eqiad
  • 15:23 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e1-eqiad,lsw1-e1-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e1-eqiad
  • 15:23 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e1-eqiad,lsw1-e1-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e1-eqiad
  • 15:19 vgutierrez@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary (T368083)
  • 15:16 vgutierrez@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary (T368083)
  • 15:14 vgutierrez: rolling restart of secondary LVSs to switch ncredir to maglev - T368083
  • 15:13 elukey: restart turnilo on an-tool1007
  • 15:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P66140 and previous config saved to /var/cache/conftool/dbconfig/20240710-151219-arnaudb.json
  • 14:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T367856)', diff saved to https://phabricator.wikimedia.org/P66139 and previous config saved to /var/cache/conftool/dbconfig/20240710-145807-marostegui.json
  • 14:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 14:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 14:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367856)', diff saved to https://phabricator.wikimedia.org/P66138 and previous config saved to /var/cache/conftool/dbconfig/20240710-145744-marostegui.json
  • 14:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T367781)', diff saved to https://phabricator.wikimedia.org/P66137 and previous config saved to /var/cache/conftool/dbconfig/20240710-145712-arnaudb.json
  • 14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1104*,elastic1089*,elastic1090* for T365993 - cmooney@cumin1002
  • 14:55 cmooney@cumin1002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1104*,elastic1089*,elastic1090* for T365993 - cmooney@cumin1002
  • 14:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P66136 and previous config saved to /var/cache/conftool/dbconfig/20240710-144237-marostegui.json
  • 14:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T367856)', diff saved to https://phabricator.wikimedia.org/P66135 and previous config saved to /var/cache/conftool/dbconfig/20240710-143713-marostegui.json
  • 14:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 14:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 14:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T367856)', diff saved to https://phabricator.wikimedia.org/P66134 and previous config saved to /var/cache/conftool/dbconfig/20240710-143651-marostegui.json
  • 14:34 cmooney@cumin1002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic1104,elastic1089,elastic1090 for ban elastic nodes before switch upgrade rack E1 - cmooney@cumin1002 - T365993
  • 14:34 cmooney@cumin1002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1104,elastic1089,elastic1090 for ban elastic nodes before switch upgrade rack E1 - cmooney@cumin1002 - T365993
  • 14:30 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] DONE helmfile.d/services/termbox: apply
  • 14:30 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] START helmfile.d/services/termbox: apply
  • 14:30 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 14:30 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 14:28 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 14:28 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] START helmfile.d/services/termbox: apply
  • 14:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P66133 and previous config saved to /var/cache/conftool/dbconfig/20240710-142730-marostegui.json
  • 14:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P66132 and previous config saved to /var/cache/conftool/dbconfig/20240710-142144-marostegui.json
  • 14:21 kamila@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 14:20 kamila@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 14:20 kamila@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 14:19 kamila@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 14:19 kamila@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 14:19 kamila@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 14:16 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:15 effie: disable puppet on mw memcached hosts - T352885
  • 14:13 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:13 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367856)', diff saved to https://phabricator.wikimedia.org/P66131 and previous config saved to /var/cache/conftool/dbconfig/20240710-141222-marostegui.json
  • 14:11 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:08 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on lsw1-e1-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e1-eqiad
  • 14:08 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on lsw1-e1-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e1-eqiad
  • 14:07 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P66130 and previous config saved to /var/cache/conftool/dbconfig/20240710-140637-marostegui.json
  • 14:06 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:06 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:05 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:05 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:04 XioNoX: add ipxe_1.21.1+git-20240627.b66e27d to bookworm-wikimedia reprepro
  • 14:04 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:04 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on backup1010.eqiad.wmnet,db1190.eqiad.wmnet,dbproxy1026.eqiad.wmnet with reason: T365993
  • 14:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on backup1010.eqiad.wmnet,db1190.eqiad.wmnet,dbproxy1026.eqiad.wmnet with reason: T365993
  • 14:02 arnaudb@cumin1002: dbctl commit (dc=all): 'T365993 - depool db1190 - s4', diff saved to https://phabricator.wikimedia.org/P66129 and previous config saved to /var/cache/conftool/dbconfig/20240710-140224-arnaudb.json
  • 13:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T367781)', diff saved to https://phabricator.wikimedia.org/P66128 and previous config saved to /var/cache/conftool/dbconfig/20240710-135656-arnaudb.json
  • 13:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 13:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
  • 13:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 13:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 13:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
  • 13:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 13:56 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T367781)', diff saved to https://phabricator.wikimedia.org/P66127 and previous config saved to /var/cache/conftool/dbconfig/20240710-135619-arnaudb.json
  • 13:54 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:53 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T367856)', diff saved to https://phabricator.wikimedia.org/P66126 and previous config saved to /var/cache/conftool/dbconfig/20240710-135130-marostegui.json
  • 13:49 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 13:48 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 13:46 akosiaris@cumin1002: conftool action : set/pooled=inactive; selector: name=kubernetes1059.*
  • 13:44 btullis: re-enabling the misc dumps jobs on snapshot1017 with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1053315
  • 13:41 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P66125 and previous config saved to /var/cache/conftool/dbconfig/20240710-134112-arnaudb.json
  • 13:34 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:34 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1001.eqiad.wmnet with OS bookworm
  • 13:33 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 13:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P66124 and previous config saved to /var/cache/conftool/dbconfig/20240710-132604-arnaudb.json
  • 13:18 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1001.eqiad.wmnet with reason: host reimage
  • 13:15 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1001.eqiad.wmnet with reason: host reimage
  • 13:11 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T367781)', diff saved to https://phabricator.wikimedia.org/P66123 and previous config saved to /var/cache/conftool/dbconfig/20240710-131057-arnaudb.json
  • 13:01 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-mariadb1001.eqiad.wmnet with OS bookworm
  • 12:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66122 and previous config saved to /var/cache/conftool/dbconfig/20240710-125928-root.json
  • 12:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66121 and previous config saved to /var/cache/conftool/dbconfig/20240710-124422-root.json
  • 12:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T367781)', diff saved to https://phabricator.wikimedia.org/P66120 and previous config saved to /var/cache/conftool/dbconfig/20240710-123844-arnaudb.json
  • 12:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 12:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 12:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 12:30 topranks: removing unused wmcs vlans from asw2-b-eqiad
  • 12:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66119 and previous config saved to /var/cache/conftool/dbconfig/20240710-122917-root.json
  • 12:23 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] DONE helmfile.d/services/termbox: apply
  • 12:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [codfw] START helmfile.d/services/termbox: apply
  • 12:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 12:21 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 12:21 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] DONE helmfile.d/services/termbox: apply
  • 12:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 helmfile [staging] START helmfile.d/services/termbox: apply
  • 12:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66118 and previous config saved to /var/cache/conftool/dbconfig/20240710-121411-root.json
  • 11:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66117 and previous config saved to /var/cache/conftool/dbconfig/20240710-115906-root.json
  • 11:53 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 11:50 marostegui@cumin1002: dbctl commit (dc=all): 'Pool db2136 into api with small weight T365805', diff saved to https://phabricator.wikimedia.org/P66116 and previous config saved to /var/cache/conftool/dbconfig/20240710-115046-marostegui.json
  • 11:50 claime: cleaned up leftover media files on videoscalers
  • 11:50 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 11:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66115 and previous config saved to /var/cache/conftool/dbconfig/20240710-114401-root.json
  • 11:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P66114 and previous config saved to /var/cache/conftool/dbconfig/20240710-113010-ladsgroup.json
  • 11:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 11:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 11:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1183 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66113 and previous config saved to /var/cache/conftool/dbconfig/20240710-112856-root.json
  • 11:22 mnz@deploy1002: Finished deploy [airflow-dags/research@5121748]: (no justification provided) (duration: 00m 41s)
  • 11:21 mnz@deploy1002: Started deploy [airflow-dags/research@5121748]: (no justification provided)
  • 10:43 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 10:43 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 10:43 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 10:43 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 10:42 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 10:39 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 10:38 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
  • 10:38 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 10:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:29 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 10:26 mnz@deploy1002: Finished deploy [airflow-dags/research@5121748]: (no justification provided) (duration: 00m 04s)
  • 10:26 mnz@deploy1002: Started deploy [airflow-dags/research@5121748]: (no justification provided)
  • 10:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1208.eqiad.wmnet with reason: corruption issue
  • 10:22 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1208.eqiad.wmnet with reason: corruption issue
  • 10:21 jiji@deploy1002: Finished scap: Switch mediawiki everywhere to use node-local mcrouter ds - T346690 (duration: 05m 15s)
  • 10:15 jiji@deploy1002: Started scap sync-world: Switch mediawiki everywhere to use node-local mcrouter ds - T346690
  • 09:29 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 08:51 aklapper@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.13 refs T366958
  • 08:41 hashar: On deployment server, unblocked train by manually editing /var/lib/scap/scap/lib/python3.7/site-packages/scap/train.py to allow train blocker task with "progress" status instead of just "open" # T369689
  • 08:08 kostajh: UTC morning deploys done
  • 08:06 kharlan@deploy1002: Finished scap: Backport for ConfirmEdit: Enable showcaptcha action on testwiki and beta wikis (T20110) (duration: 09m 41s)
  • 08:00 kharlan@deploy1002: kharlan: Continuing with sync
  • 07:59 kharlan@deploy1002: kharlan: Backport for ConfirmEdit: Enable showcaptcha action on testwiki and beta wikis (T20110) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:57 kharlan@deploy1002: Started scap sync-world: Backport for ConfirmEdit: Enable showcaptcha action on testwiki and beta wikis (T20110)
  • 07:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2025.codfw.wmnet
  • 07:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2024.codfw.wmnet
  • 07:36 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2025.codfw.wmnet
  • 07:36 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2024.codfw.wmnet
  • 07:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2023.codfw.wmnet
  • 07:33 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2020.codfw.wmnet
  • 07:33 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2021.codfw.wmnet
  • 07:32 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2022.codfw.wmnet
  • 07:28 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2023.codfw.wmnet
  • 07:27 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2020.codfw.wmnet
  • 07:26 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2021.codfw.wmnet
  • 07:26 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2022.codfw.wmnet
  • 07:22 kostajh: UTC morning deploys done
  • 07:20 kharlan@deploy1002: Finished scap: Backport for IPReputation: Enable extension on testwiki (T360067) (duration: 14m 05s)
  • 07:18 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2019.codfw.wmnet
  • 07:18 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2018.codfw.wmnet
  • 07:15 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2017.codfw.wmnet
  • 07:15 kharlan@deploy1002: kharlan: Continuing with sync
  • 07:11 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2018.codfw.wmnet
  • 07:11 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2019.codfw.wmnet
  • 07:09 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2017.codfw.wmnet
  • 07:09 kharlan@deploy1002: kharlan: Backport for IPReputation: Enable extension on testwiki (T360067) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:08 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2016.codfw.wmnet
  • 07:06 kharlan@deploy1002: Started scap sync-world: Backport for IPReputation: Enable extension on testwiki (T360067)
  • 07:02 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2016.codfw.wmnet
  • 07:01 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2015.codfw.wmnet
  • 06:58 XioNoX: push policy-statement BGP_agg_net_pops to all CRs (noop as it's not applied there) - T367439
  • 06:54 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2015.codfw.wmnet
  • 06:54 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 17072
  • 06:52 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 17072
  • 06:52 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2014.codfw.wmnet
  • 06:35 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2013.codfw.wmnet
  • 06:29 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2013.codfw.wmnet
  • 06:28 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs2012.codfw.wmnet
  • 06:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T367856)', diff saved to https://phabricator.wikimedia.org/P66110 and previous config saved to /var/cache/conftool/dbconfig/20240710-062424-marostegui.json
  • 06:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 06:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 06:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T367856)', diff saved to https://phabricator.wikimedia.org/P66109 and previous config saved to /var/cache/conftool/dbconfig/20240710-062401-marostegui.json
  • 06:22 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2012.codfw.wmnet
  • 06:16 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host wdqs2012.codfw.wmnet
  • 06:15 ryankemper@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs2012.codfw.wmnet
  • 06:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P66108 and previous config saved to /var/cache/conftool/dbconfig/20240710-060854-marostegui.json
  • 05:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P66107 and previous config saved to /var/cache/conftool/dbconfig/20240710-055347-marostegui.json
  • 05:49 marostegui: Deploy schema change on s5 eqiad db1183 dbmaint T367856
  • 05:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Long schema change
  • 05:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1183.eqiad.wmnet with reason: Long schema change
  • 05:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1183 T369616', diff saved to https://phabricator.wikimedia.org/P66106 and previous config saved to /var/cache/conftool/dbconfig/20240710-054710-root.json
  • 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1230 to s5 primary and set section read-write T369616', diff saved to https://phabricator.wikimedia.org/P66105 and previous config saved to /var/cache/conftool/dbconfig/20240710-054621-marostegui.json
  • 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T369616', diff saved to https://phabricator.wikimedia.org/P66104 and previous config saved to /var/cache/conftool/dbconfig/20240710-054559-marostegui.json
  • 05:45 marostegui: Starting s5 eqiad failover from db1183 to db1230 - T369616
  • 05:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T367856)', diff saved to https://phabricator.wikimedia.org/P66103 and previous config saved to /var/cache/conftool/dbconfig/20240710-053840-marostegui.json
  • 05:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T369616
  • 05:30 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1230 with weight 0 T369616', diff saved to https://phabricator.wikimedia.org/P66102 and previous config saved to /var/cache/conftool/dbconfig/20240710-053009-root.json
  • 05:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T369616
  • 05:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T367856)', diff saved to https://phabricator.wikimedia.org/P66101 and previous config saved to /var/cache/conftool/dbconfig/20240710-052520-marostegui.json
  • 05:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 05:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 05:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 05:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T367856)', diff saved to https://phabricator.wikimedia.org/P66100 and previous config saved to /var/cache/conftool/dbconfig/20240710-052443-marostegui.json
  • 05:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P66099 and previous config saved to /var/cache/conftool/dbconfig/20240710-050935-marostegui.json
  • 04:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P66098 and previous config saved to /var/cache/conftool/dbconfig/20240710-045428-marostegui.json
  • 04:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T367856)', diff saved to https://phabricator.wikimedia.org/P66097 and previous config saved to /var/cache/conftool/dbconfig/20240710-043921-marostegui.json
  • 03:22 eileen: tools upgraded from 95f10b20 to 94bac5c6

2024-07-09

  • 22:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T367856)', diff saved to https://phabricator.wikimedia.org/P66096 and previous config saved to /var/cache/conftool/dbconfig/20240709-223336-marostegui.json
  • 22:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 22:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 22:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367856)', diff saved to https://phabricator.wikimedia.org/P66095 and previous config saved to /var/cache/conftool/dbconfig/20240709-223314-marostegui.json
  • 22:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P66094 and previous config saved to /var/cache/conftool/dbconfig/20240709-221807-marostegui.json
  • 22:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P66093 and previous config saved to /var/cache/conftool/dbconfig/20240709-220300-marostegui.json
  • 21:50 ejegg: payments-wiki upgraded from dc0c14d4 to 4e48059a (and ingenico config removed)
  • 21:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367856)', diff saved to https://phabricator.wikimedia.org/P66092 and previous config saved to /var/cache/conftool/dbconfig/20240709-214752-marostegui.json
  • 21:24 ejegg: fundraising civicrm upgraded from 84d6f5d1 to a03085ff
  • 21:18 urbanecm@deploy1002: Finished scap: Backport for use text() instead of escaped() for msg recentchanges (T352626) (duration: 21m 50s)
  • 21:13 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
  • 21:13 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
  • 21:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T352010)', diff saved to https://phabricator.wikimedia.org/P66091 and previous config saved to /var/cache/conftool/dbconfig/20240709-211231-ladsgroup.json
  • 21:12 urbanecm@deploy1002: gergesshamon, urbanecm: Continuing with sync
  • 21:00 urbanecm@deploy1002: gergesshamon, urbanecm: Backport for use text() instead of escaped() for msg recentchanges (T352626) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P66090 and previous config saved to /var/cache/conftool/dbconfig/20240709-205724-ladsgroup.json
  • 20:56 urbanecm@deploy1002: Started scap sync-world: Backport for use text() instead of escaped() for msg recentchanges (T352626)
  • 20:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P66089 and previous config saved to /var/cache/conftool/dbconfig/20240709-204217-ladsgroup.json
  • 20:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T352010)', diff saved to https://phabricator.wikimedia.org/P66088 and previous config saved to /var/cache/conftool/dbconfig/20240709-202709-ladsgroup.json
  • 20:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T367856)', diff saved to https://phabricator.wikimedia.org/P66087 and previous config saved to /var/cache/conftool/dbconfig/20240709-201928-marostegui.json
  • 20:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 20:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 20:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T367856)', diff saved to https://phabricator.wikimedia.org/P66086 and previous config saved to /var/cache/conftool/dbconfig/20240709-201906-marostegui.json
  • 20:16 urbanecm@deploy1002: Finished scap: Backport for Missing.php: check REQUEST_URI in addition to PATH_INFO (T9496 T355018) (duration: 13m 01s)
  • 20:10 urbanecm@deploy1002: urbanecm, pppery: Continuing with sync
  • 20:07 urbanecm@deploy1002: urbanecm, pppery: Backport for Missing.php: check REQUEST_URI in addition to PATH_INFO (T9496 T355018) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P66084 and previous config saved to /var/cache/conftool/dbconfig/20240709-200359-marostegui.json
  • 20:03 urbanecm@deploy1002: Started scap sync-world: Backport for Missing.php: check REQUEST_URI in addition to PATH_INFO (T9496 T355018)
  • 19:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P66083 and previous config saved to /var/cache/conftool/dbconfig/20240709-194851-marostegui.json
  • 19:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T367856)', diff saved to https://phabricator.wikimedia.org/P66082 and previous config saved to /var/cache/conftool/dbconfig/20240709-193344-marostegui.json
  • 17:14 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:13 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:11 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: apply
  • 17:11 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:11 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: apply
  • 17:03 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: T368950
  • 17:03 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on relforge[1003-1004].eqiad.wmnet with reason: T368950
  • 16:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66080 and previous config saved to /var/cache/conftool/dbconfig/20240709-165921-root.json
  • 16:57 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66079 and previous config saved to /var/cache/conftool/dbconfig/20240709-165746-root.json
  • 16:57 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66078 and previous config saved to /var/cache/conftool/dbconfig/20240709-165738-root.json
  • 16:57 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: apply
  • 16:57 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: apply
  • 16:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66077 and previous config saved to /var/cache/conftool/dbconfig/20240709-164415-root.json
  • 16:42 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66076 and previous config saved to /var/cache/conftool/dbconfig/20240709-164241-root.json
  • 16:42 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66075 and previous config saved to /var/cache/conftool/dbconfig/20240709-164233-root.json
  • 16:40 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: apply
  • 16:40 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: apply
  • 16:30 btullis@deploy1002: Finished deploy [analytics/refinery@a203f30] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@a203f30c] (duration: 03m 41s)
  • 16:29 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66074 and previous config saved to /var/cache/conftool/dbconfig/20240709-162909-root.json
  • 16:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66073 and previous config saved to /var/cache/conftool/dbconfig/20240709-162735-root.json
  • 16:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66072 and previous config saved to /var/cache/conftool/dbconfig/20240709-162727-root.json
  • 16:26 btullis@deploy1002: Started deploy [analytics/refinery@a203f30] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@a203f30c]
  • 16:25 btullis@deploy1002: Finished deploy [analytics/refinery@a203f30] (thin): Regular analytics weekly train THIN [analytics/refinery@a203f30c] (duration: 04m 05s)
  • 16:21 btullis@deploy1002: Started deploy [analytics/refinery@a203f30] (thin): Regular analytics weekly train THIN [analytics/refinery@a203f30c]
  • 16:20 btullis@deploy1002: Finished deploy [analytics/refinery@a203f30]: Regular analytics weekly train [analytics/refinery@a203f30c] (duration: 01m 18s)
  • 16:19 btullis@deploy1002: Started deploy [analytics/refinery@a203f30]: Regular analytics weekly train [analytics/refinery@a203f30c]
  • 16:19 btullis@deploy1002: Finished deploy [analytics/refinery@a203f30]: Regular analytics weekly train [analytics/refinery@a203f30c] (duration: 04m 51s)
  • 16:14 btullis@deploy1002: Started deploy [analytics/refinery@a203f30]: Regular analytics weekly train [analytics/refinery@a203f30c]
  • 16:14 btullis@deploy1002: Finished deploy [analytics/refinery@a203f30]: Regular analytics weekly train [analytics/refinery@a203f30c] (duration: 09m 23s)
  • 16:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66071 and previous config saved to /var/cache/conftool/dbconfig/20240709-161404-root.json
  • 16:14 btullis: pooled druid1010
  • 16:13 btullis: unset noout mode on the cephosd cluster
  • 16:13 btullis: uncordoned dse-k8s-worker1006
  • 16:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66070 and previous config saved to /var/cache/conftool/dbconfig/20240709-161230-root.json
  • 16:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66069 and previous config saved to /var/cache/conftool/dbconfig/20240709-161222-root.json
  • 16:07 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 16:04 btullis@deploy1002: Started deploy [analytics/refinery@a203f30]: Regular analytics weekly train [analytics/refinery@a203f30c]
  • 15:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66068 and previous config saved to /var/cache/conftool/dbconfig/20240709-155858-root.json
  • 15:57 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66067 and previous config saved to /var/cache/conftool/dbconfig/20240709-155724-root.json
  • 15:57 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 15:57 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66066 and previous config saved to /var/cache/conftool/dbconfig/20240709-155717-root.json
  • 15:56 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 15:46 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 15:44 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 15:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
  • 15:44 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 15:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66065 and previous config saved to /var/cache/conftool/dbconfig/20240709-154353-root.json
  • 15:42 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
  • 15:42 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66064 and previous config saved to /var/cache/conftool/dbconfig/20240709-154219-root.json
  • 15:42 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66063 and previous config saved to /var/cache/conftool/dbconfig/20240709-154211-root.json
  • 15:41 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/commons-impact-analytics: apply
  • 15:41 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/commons-impact-analytics: apply
  • 15:39 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/commons-impact-analytics: apply
  • 15:38 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/commons-impact-analytics: apply
  • 15:35 sukhe: remove traffic-dnsbox VM on cloud-vps: T360710
  • 15:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1199 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66062 and previous config saved to /var/cache/conftool/dbconfig/20240709-152847-root.json
  • 15:27 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 9 hosts
  • 15:27 hnowlan@cumin1002: START - Cookbook sre.hosts.remove-downtime for 9 hosts
  • 15:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1198 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66061 and previous config saved to /var/cache/conftool/dbconfig/20240709-152713-root.json
  • 15:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66060 and previous config saved to /var/cache/conftool/dbconfig/20240709-152706-root.json
  • 15:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
  • 15:12 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
  • 15:11 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
  • 15:08 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
  • 15:04 topranks: rebooting lsw1-e3-eqiad to install updated JunOS version T365998
  • 15:03 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on 27 hosts with reason: JunOS upgrade lsw1-e3-eqiad
  • 15:02 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on 27 hosts with reason: JunOS upgrade lsw1-e3-eqiad
  • 15:01 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 9 hosts with reason: network maintenance
  • 15:01 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on 9 hosts with reason: network maintenance
  • 15:00 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on lsw1-e3-eqiad,lsw1-e3-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e3-eqiad
  • 14:59 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on lsw1-e3-eqiad,lsw1-e3-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e3-eqiad
  • 14:54 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 14:53 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 14:53 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e3-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e3-eqiad
  • 14:53 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e3-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e3-eqiad
  • 14:50 hashar: Restart Gerrit primary on gerrit1003 to apply a configuration change | T367505
  • 14:46 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1003.eqiad.wmnet
  • 14:46 hashar@deploy1002: Finished deploy [integration/docroot@c8b0266]: (no justification provided) (duration: 00m 07s)
  • 14:46 hashar@deploy1002: Started deploy [integration/docroot@c8b0266]: (no justification provided)
  • 14:45 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1003.eqiad.wmnet
  • 14:43 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:40 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-conf1003.eqiad.wmnet
  • 14:40 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1002.eqiad.wmnet
  • 14:38 sukhe: dummy authdns-update
  • 14:38 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-coord1003.eqiad.wmnet
  • 14:37 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2003.codfw.wmnet
  • 14:37 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for wmfRenderEmptyGraphTag: Fix count() warning (T369600) (duration: 14m 35s)
  • 14:35 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-conf1002.eqiad.wmnet
  • 14:32 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-conf1001.eqiad.wmnet
  • 14:32 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 14:29 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for wmfRenderEmptyGraphTag: Fix count() warning (T369600) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:28 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd2003.codfw.wmnet
  • 14:28 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2002.codfw.wmnet
  • 14:27 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-conf1001.eqiad.wmnet
  • 14:26 hnowlan@cumin1002: conftool action : set/pooled=inactive; selector: name=(kubernetes1061.eqiad.wmnet|kubernetes1048.eqiad.wmnet|kubernetes1047.eqiad.wmnet|kubernetes1049.eqiad.wmnet|kubernetes1050.eqiad.wmnet|kubernetes1051.eqiad.wmnet|mw1491.eqiad.wmnet|mw1492.eqiad.wmnet|mw1493.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 14:26 hnowlan: kubectl drain kubernetes1061.eqiad.wmnet kubernetes1061.eqiad.wmnet kubernetes1061.eqiad.wmnet kubernetes1061.eqiad.wmnet kubernetes1061.eqiad.wmnet kubernetes1061.eqiad.wmnet kubernetes1061.eqiad.wmnet mw1492.eqiad.wmnet mw1492.eqiad.wmnet (T365995)
  • 14:25 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
  • 14:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for wmfRenderEmptyGraphTag: Fix count() warning (T369600)
  • 14:21 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd2002.codfw.wmnet
  • 14:21 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cephosd2001.codfw.wmnet
  • 14:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Re-introduce notices (T369053) (duration: 39m 17s)
  • 14:15 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
  • 14:13 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
  • 14:12 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox to netbox-dev2003.codfw.wmnet with reason: Release v4.0.6 to netbox-next - ayounsi@cumin1002 - T336275
  • 14:12 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host cephosd2001.codfw.wmnet
  • 14:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, mlitn: Continuing with sync
  • 14:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, mlitn: Backport for Re-introduce notices (T369053) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:03 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1080.eqiad.wmnet
  • 14:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T352010)', diff saved to https://phabricator.wikimedia.org/P66059 and previous config saved to /var/cache/conftool/dbconfig/20240709-140033-ladsgroup.json
  • 14:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 14:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 13:59 XioNoX: netbox-deploy - rebase the dev branch into main
  • 13:41 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1080.eqiad.wmnet
  • 13:38 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Re-introduce notices (T369053)
  • 13:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T367856)', diff saved to https://phabricator.wikimedia.org/P66058 and previous config saved to /var/cache/conftool/dbconfig/20240709-133450-marostegui.json
  • 13:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 13:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 13:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367856)', diff saved to https://phabricator.wikimedia.org/P66057 and previous config saved to /var/cache/conftool/dbconfig/20240709-133428-marostegui.json
  • 13:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P66056 and previous config saved to /var/cache/conftool/dbconfig/20240709-131921-marostegui.json
  • 13:16 sukhe: dummy authdns-update run
  • 13:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Add $wgMaxShellWallClockTime setting for shellbox (T356241) (duration: 08m 28s)
  • 13:08 logmsgbot: lucaswerkmeister-wmde@deploy1002 kamila, lucaswerkmeister-wmde: Continuing with sync
  • 13:08 logmsgbot: lucaswerkmeister-wmde@deploy1002 kamila, lucaswerkmeister-wmde: Backport for Add $wgMaxShellWallClockTime setting for shellbox (T356241) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Add $wgMaxShellWallClockTime setting for shellbox (T356241)
  • 13:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P66055 and previous config saved to /var/cache/conftool/dbconfig/20240709-130414-marostegui.json
  • 12:59 hashar: Restart Gerrit replica on gerrit2002 to apply a configuration change | T367505
  • 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367856)', diff saved to https://phabricator.wikimedia.org/P66054 and previous config saved to /var/cache/conftool/dbconfig/20240709-124907-marostegui.json
  • 12:04 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66053 and previous config saved to /var/cache/conftool/dbconfig/20240709-120440-root.json
  • 12:01 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts lists1001.wikimedia.org
  • 12:01 eoghan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:01 eoghan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lists1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - eoghan@cumin1002"
  • 11:59 eoghan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: lists1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - eoghan@cumin1002"
  • 11:54 eoghan@cumin1002: START - Cookbook sre.dns.netbox
  • 11:49 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66052 and previous config saved to /var/cache/conftool/dbconfig/20240709-114935-root.json
  • 11:45 eoghan@cumin1002: START - Cookbook sre.hosts.decommission for hosts lists1001.wikimedia.org
  • 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66051 and previous config saved to /var/cache/conftool/dbconfig/20240709-113430-root.json
  • 11:28 eoghan: Decommissioning lists1001 T331706
  • 11:26 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 11:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P66050 and previous config saved to /var/cache/conftool/dbconfig/20240709-112611-root.json
  • 11:19 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66049 and previous config saved to /var/cache/conftool/dbconfig/20240709-111925-root.json
  • 11:18 btullis: depooled druid1010 for T365995
  • 11:17 btullis: set cephosd cluster into noout mode to prevent rebalancing for T365995
  • 11:16 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 11:15 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:15 btullis: drained dse-k8s-worker1006.eqiad.wmnet ready for T365995
  • 11:15 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:13 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:12 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:11 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P66048 and previous config saved to /var/cache/conftool/dbconfig/20240709-111105-root.json
  • 11:10 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:10 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:04 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66047 and previous config saved to /var/cache/conftool/dbconfig/20240709-110420-root.json
  • 10:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P66046 and previous config saved to /var/cache/conftool/dbconfig/20240709-105600-root.json
  • 10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T367856)', diff saved to https://phabricator.wikimedia.org/P66045 and previous config saved to /var/cache/conftool/dbconfig/20240709-105454-marostegui.json
  • 10:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 10:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 10:49 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66044 and previous config saved to /var/cache/conftool/dbconfig/20240709-104914-root.json
  • 10:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66043 and previous config saved to /var/cache/conftool/dbconfig/20240709-104054-root.json
  • 10:37 Dreamy_Jazz: Finished running maintenance scripts for T366781
  • 10:34 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66042 and previous config saved to /var/cache/conftool/dbconfig/20240709-103409-root.json
  • 10:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2212 T369515', diff saved to https://phabricator.wikimedia.org/P66041 and previous config saved to /var/cache/conftool/dbconfig/20240709-103331-root.json
  • 10:32 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2203 to s1 primary T369515', diff saved to https://phabricator.wikimedia.org/P66040 and previous config saved to /var/cache/conftool/dbconfig/20240709-103238-root.json
  • 10:32 marostegui: Starting s1 codfw failover from db2212 to db2203 - T369515
  • 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1192 db1198 db1199 T365995', diff saved to https://phabricator.wikimedia.org/P66039 and previous config saved to /var/cache/conftool/dbconfig/20240709-102947-root.json
  • 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66038 and previous config saved to /var/cache/conftool/dbconfig/20240709-102549-root.json
  • 10:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66037 and previous config saved to /var/cache/conftool/dbconfig/20240709-101043-root.json
  • 10:04 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 10:03 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 09:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 36 hosts with reason: Primary switchover s1 T369515
  • 09:57 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2203 with weight 0 T369515', diff saved to https://phabricator.wikimedia.org/P66036 and previous config saved to /var/cache/conftool/dbconfig/20240709-095659-root.json
  • 09:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 36 hosts with reason: Primary switchover s1 T369515
  • 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66035 and previous config saved to /var/cache/conftool/dbconfig/20240709-095538-root.json
  • 09:26 cparle@deploy1002: Finished deploy [airflow-dags/platform_eng@0e9b3ac]: (no justification provided) (duration: 00m 32s)
  • 09:26 cparle@deploy1002: Started deploy [airflow-dags/platform_eng@0e9b3ac]: (no justification provided)
  • 09:06 vgutierrez: restart purged @ cp3073
  • 08:28 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 08:28 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 08:28 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 08:27 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 08:17 aklapper@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.13 refs T366958
  • 08:03 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 08:01 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 08:01 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 07:59 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 07:58 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 07:57 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 07:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netbox-dev2002.codfw.wmnet
  • 07:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netbox-dev2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
  • 07:40 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netbox-dev2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
  • 07:40 Dreamy_Jazz: Morning UTC backport window done
  • 07:38 vgutierrez: repool cp3073
  • 07:35 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 07:32 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp3073.*} and A:cp
  • 07:32 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3073.esams.wmnet
  • 07:30 dreamyjazz@deploy1002: Synchronized wmf-config/throttle.php: Deploying throttle change for T369522 (duration: 09m 50s)
  • 07:26 ayounsi@cumin1002: START - Cookbook sre.hosts.decommission for hosts netbox-dev2002.codfw.wmnet
  • 07:25 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3073.*} and A:cp
  • 07:12 fabfur@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-reboot (exit_code=1) rolling reboot on P{cp3073.*} and A:cp
  • 07:10 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3073.*} and A:cp
  • 07:08 fabfur@cumin1002: END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on P{cp3073.*} and A:cp
  • 07:08 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3073.*} and A:cp
  • 06:54 Dreamy_Jazz: Start `foreachwikiindblist group2.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200` in a tmux session
  • 05:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 05:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 05:20 marostegui: Deploy schema change on s2 eqiad db1162 dbmaint T367856
  • 05:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Long schema change
  • 05:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Long schema change
  • 05:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1162 T369339', diff saved to https://phabricator.wikimedia.org/P66034 and previous config saved to /var/cache/conftool/dbconfig/20240709-051911-marostegui.json
  • 05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1222 to s2 primary and set section read-write T369339', diff saved to https://phabricator.wikimedia.org/P66033 and previous config saved to /var/cache/conftool/dbconfig/20240709-051814-marostegui.json
  • 05:17 marostegui@cumin1002: dbctl commit (dc=all): 'Set s2 eqiad as read-only for maintenance - T369339', diff saved to https://phabricator.wikimedia.org/P66032 and previous config saved to /var/cache/conftool/dbconfig/20240709-051749-marostegui.json
  • 05:17 marostegui: Starting s2 eqiad failover from db1162 to db1222 - T369339
  • 04:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369339
  • 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1222 with weight 0 T369339', diff saved to https://phabricator.wikimedia.org/P66031 and previous config saved to /var/cache/conftool/dbconfig/20240709-045814-marostegui.json
  • 04:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369339
  • 04:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T367856)', diff saved to https://phabricator.wikimedia.org/P66030 and previous config saved to /var/cache/conftool/dbconfig/20240709-044128-marostegui.json
  • 04:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 04:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 04:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 04:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 04:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367856)', diff saved to https://phabricator.wikimedia.org/P66029 and previous config saved to /var/cache/conftool/dbconfig/20240709-044051-marostegui.json
  • 04:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P66028 and previous config saved to /var/cache/conftool/dbconfig/20240709-042544-marostegui.json
  • 04:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P66027 and previous config saved to /var/cache/conftool/dbconfig/20240709-041036-marostegui.json
  • 04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.10 (duration: 00m 57s)
  • 03:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367856)', diff saved to https://phabricator.wikimedia.org/P66026 and previous config saved to /var/cache/conftool/dbconfig/20240709-035529-marostegui.json
  • 03:53 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.13 refs T366958 (duration: 50m 52s)
  • 03:03 mwpresync@deploy1002: Started scap sync-world: testwikis wikis to 1.43.0-wmf.13 refs T366958
  • 01:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367781)', diff saved to https://phabricator.wikimedia.org/P66025 and previous config saved to /var/cache/conftool/dbconfig/20240709-014242-arnaudb.json
  • 01:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P66024 and previous config saved to /var/cache/conftool/dbconfig/20240709-012735-arnaudb.json
  • 01:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P66023 and previous config saved to /var/cache/conftool/dbconfig/20240709-011227-arnaudb.json
  • 00:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367781)', diff saved to https://phabricator.wikimedia.org/P66022 and previous config saved to /var/cache/conftool/dbconfig/20240709-005720-arnaudb.json
  • 00:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T367781)', diff saved to https://phabricator.wikimedia.org/P66021 and previous config saved to /var/cache/conftool/dbconfig/20240709-005456-arnaudb.json
  • 00:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 00:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 00:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest2001.codfw.wmnet
  • 00:14 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
  • 00:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 00:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 00:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66020 and previous config saved to /var/cache/conftool/dbconfig/20240709-001324-arnaudb.json
  • 00:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 00:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 00:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T367856)', diff saved to https://phabricator.wikimedia.org/P66019 and previous config saved to /var/cache/conftool/dbconfig/20240709-001250-marostegui.json
  • 00:05 ejegg: payments-wiki upgraded from 82a5e588 to dc0c14d4

2024-07-08

  • 23:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P66018 and previous config saved to /var/cache/conftool/dbconfig/20240708-235817-arnaudb.json
  • 23:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P66017 and previous config saved to /var/cache/conftool/dbconfig/20240708-235742-marostegui.json
  • 23:52 fabfur@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-reboot (exit_code=1) rolling reboot on A:cp-text_esams
  • 23:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P66016 and previous config saved to /var/cache/conftool/dbconfig/20240708-234310-arnaudb.json
  • 23:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P66015 and previous config saved to /var/cache/conftool/dbconfig/20240708-234235-marostegui.json
  • 23:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66014 and previous config saved to /var/cache/conftool/dbconfig/20240708-232803-arnaudb.json
  • 23:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T367856)', diff saved to https://phabricator.wikimedia.org/P66013 and previous config saved to /var/cache/conftool/dbconfig/20240708-232728-marostegui.json
  • 23:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66012 and previous config saved to /var/cache/conftool/dbconfig/20240708-232549-arnaudb.json
  • 23:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 23:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 23:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367781)', diff saved to https://phabricator.wikimedia.org/P66011 and previous config saved to /var/cache/conftool/dbconfig/20240708-232527-arnaudb.json
  • 23:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P66010 and previous config saved to /var/cache/conftool/dbconfig/20240708-231020-arnaudb.json
  • 22:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P66009 and previous config saved to /var/cache/conftool/dbconfig/20240708-225513-arnaudb.json
  • 22:46 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 22:42 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_esams
  • 22:42 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3081.esams.wmnet
  • 22:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367781)', diff saved to https://phabricator.wikimedia.org/P66008 and previous config saved to /var/cache/conftool/dbconfig/20240708-224006-arnaudb.json
  • 22:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T367781)', diff saved to https://phabricator.wikimedia.org/P66007 and previous config saved to /var/cache/conftool/dbconfig/20240708-223752-arnaudb.json
  • 22:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 22:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 22:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367781)', diff saved to https://phabricator.wikimedia.org/P66006 and previous config saved to /var/cache/conftool/dbconfig/20240708-223741-arnaudb.json
  • 22:26 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 22:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P66005 and previous config saved to /var/cache/conftool/dbconfig/20240708-222234-arnaudb.json
  • 22:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P66004 and previous config saved to /var/cache/conftool/dbconfig/20240708-220727-arnaudb.json
  • 21:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367781)', diff saved to https://phabricator.wikimedia.org/P66003 and previous config saved to /var/cache/conftool/dbconfig/20240708-215220-arnaudb.json
  • 21:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T367781)', diff saved to https://phabricator.wikimedia.org/P66002 and previous config saved to /var/cache/conftool/dbconfig/20240708-214954-arnaudb.json
  • 21:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 21:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 21:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T367781)', diff saved to https://phabricator.wikimedia.org/P66001 and previous config saved to /var/cache/conftool/dbconfig/20240708-214932-arnaudb.json
  • 21:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P66000 and previous config saved to /var/cache/conftool/dbconfig/20240708-213425-arnaudb.json
  • 21:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 21:23 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 21:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P65999 and previous config saved to /var/cache/conftool/dbconfig/20240708-211918-arnaudb.json
  • 21:16 catrope@deploy1002: Finished scap: Backport for Enable VisualEditor by default on Italian Wikibooks (T369342) (duration: 09m 23s)
  • 21:10 catrope@deploy1002: catrope, nmw03: Continuing with sync
  • 21:09 catrope@deploy1002: catrope, nmw03: Backport for Enable VisualEditor by default on Italian Wikibooks (T369342) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:06 catrope@deploy1002: Started scap sync-world: Backport for Enable VisualEditor by default on Italian Wikibooks (T369342)
  • 21:05 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic109[3-5]* for T348977 - bking@cumin2002
  • 21:05 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic109[3-5]* for T348977 - bking@cumin2002
  • 21:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1093-1095].eqiad.wmnet with reason: T348977
  • 21:05 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1093-1095].eqiad.wmnet with reason: T348977
  • 21:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T367781)', diff saved to https://phabricator.wikimedia.org/P65998 and previous config saved to /var/cache/conftool/dbconfig/20240708-210410-arnaudb.json
  • 21:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1023.eqiad.wmnet
  • 21:02 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3080.esams.wmnet
  • 21:01 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3072.esams.wmnet
  • 21:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T367781)', diff saved to https://phabricator.wikimedia.org/P65997 and previous config saved to /var/cache/conftool/dbconfig/20240708-210144-arnaudb.json
  • 21:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 21:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 21:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 21:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 21:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367781)', diff saved to https://phabricator.wikimedia.org/P65996 and previous config saved to /var/cache/conftool/dbconfig/20240708-210106-arnaudb.json
  • 20:55 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1023.eqiad.wmnet
  • 20:52 catrope@deploy1002: Finished scap: Backport for Graph extension: Add tracking for data sources used in <graph> tags (duration: 13m 00s)
  • 20:50 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1022.eqiad.wmnet
  • 20:47 catrope@deploy1002: catrope: Continuing with sync
  • 20:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65995 and previous config saved to /var/cache/conftool/dbconfig/20240708-204559-arnaudb.json
  • 20:43 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1022.eqiad.wmnet
  • 20:42 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 20:42 catrope@deploy1002: catrope: Backport for Graph extension: Add tracking for data sources used in <graph> tags synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T367856)', diff saved to https://phabricator.wikimedia.org/P65994 and previous config saved to /var/cache/conftool/dbconfig/20240708-204042-marostegui.json
  • 20:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 20:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 20:39 catrope@deploy1002: Started scap sync-world: Backport for Graph extension: Add tracking for data sources used in <graph> tags
  • 20:38 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 20:35 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 20:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65993 and previous config saved to /var/cache/conftool/dbconfig/20240708-203052-arnaudb.json
  • 20:28 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 20:27 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 20:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367781)', diff saved to https://phabricator.wikimedia.org/P65992 and previous config saved to /var/cache/conftool/dbconfig/20240708-201545-arnaudb.json
  • 20:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367781)', diff saved to https://phabricator.wikimedia.org/P65991 and previous config saved to /var/cache/conftool/dbconfig/20240708-201318-arnaudb.json
  • 20:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 20:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 20:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T367781)', diff saved to https://phabricator.wikimedia.org/P65990 and previous config saved to /var/cache/conftool/dbconfig/20240708-201256-arnaudb.json
  • 20:08 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 19:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P65989 and previous config saved to /var/cache/conftool/dbconfig/20240708-195749-arnaudb.json
  • 19:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T367856)', diff saved to https://phabricator.wikimedia.org/P65988 and previous config saved to /var/cache/conftool/dbconfig/20240708-194435-marostegui.json
  • 19:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 19:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 19:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P65987 and previous config saved to /var/cache/conftool/dbconfig/20240708-194242-arnaudb.json
  • 19:39 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 19:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T367781)', diff saved to https://phabricator.wikimedia.org/P65986 and previous config saved to /var/cache/conftool/dbconfig/20240708-192735-arnaudb.json
  • 19:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2129 (T367781)', diff saved to https://phabricator.wikimedia.org/P65985 and previous config saved to /var/cache/conftool/dbconfig/20240708-192508-arnaudb.json
  • 19:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 19:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 19:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367781)', diff saved to https://phabricator.wikimedia.org/P65984 and previous config saved to /var/cache/conftool/dbconfig/20240708-192444-arnaudb.json
  • 19:21 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3079.esams.wmnet
  • 19:21 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3071.esams.wmnet
  • 19:21 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 19:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 19:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65983 and previous config saved to /var/cache/conftool/dbconfig/20240708-190937-arnaudb.json
  • 19:02 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 18:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65982 and previous config saved to /var/cache/conftool/dbconfig/20240708-185430-arnaudb.json
  • 18:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367781)', diff saved to https://phabricator.wikimedia.org/P65981 and previous config saved to /var/cache/conftool/dbconfig/20240708-183923-arnaudb.json
  • 18:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367781)', diff saved to https://phabricator.wikimedia.org/P65980 and previous config saved to /var/cache/conftool/dbconfig/20240708-183658-arnaudb.json
  • 18:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 18:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 18:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 18:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 18:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 18:35 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 18:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T367781)', diff saved to https://phabricator.wikimedia.org/P65979 and previous config saved to /var/cache/conftool/dbconfig/20240708-183548-arnaudb.json
  • 18:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P65978 and previous config saved to /var/cache/conftool/dbconfig/20240708-182041-arnaudb.json
  • 18:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader2002.codfw.wmnet
  • 18:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P65977 and previous config saved to /var/cache/conftool/dbconfig/20240708-180533-arnaudb.json
  • 18:02 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host search-loader2002.codfw.wmnet
  • 17:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T367781)', diff saved to https://phabricator.wikimedia.org/P65976 and previous config saved to /var/cache/conftool/dbconfig/20240708-175026-arnaudb.json
  • 17:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T367781)', diff saved to https://phabricator.wikimedia.org/P65975 and previous config saved to /var/cache/conftool/dbconfig/20240708-174918-arnaudb.json
  • 17:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 17:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 17:48 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 17:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 17:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367781)', diff saved to https://phabricator.wikimedia.org/P65974 and previous config saved to /var/cache/conftool/dbconfig/20240708-174823-arnaudb.json
  • 17:40 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3078.esams.wmnet
  • 17:38 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3070.esams.wmnet
  • 17:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P65973 and previous config saved to /var/cache/conftool/dbconfig/20240708-173316-arnaudb.json
  • 17:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P65972 and previous config saved to /var/cache/conftool/dbconfig/20240708-171810-arnaudb.json
  • 17:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367781)', diff saved to https://phabricator.wikimedia.org/P65971 and previous config saved to /var/cache/conftool/dbconfig/20240708-170302-arnaudb.json
  • 17:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T367781)', diff saved to https://phabricator.wikimedia.org/P65970 and previous config saved to /var/cache/conftool/dbconfig/20240708-170053-arnaudb.json
  • 17:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 17:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 17:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367781)', diff saved to https://phabricator.wikimedia.org/P65969 and previous config saved to /var/cache/conftool/dbconfig/20240708-170031-arnaudb.json
  • 16:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P65968 and previous config saved to /var/cache/conftool/dbconfig/20240708-164524-arnaudb.json
  • 16:39 ladsgroup@deploy1002: Finished scap: Backport for Reduce frequency of two query pages in commonswiki (T369024) (duration: 07m 50s)
  • 16:34 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 16:33 ladsgroup@deploy1002: ladsgroup: Backport for Reduce frequency of two query pages in commonswiki (T369024) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:31 ladsgroup@deploy1002: Started scap sync-world: Backport for Reduce frequency of two query pages in commonswiki (T369024)
  • 16:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P65967 and previous config saved to /var/cache/conftool/dbconfig/20240708-163017-arnaudb.json
  • 16:15 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1011.eqiad.wmnet
  • 16:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367781)', diff saved to https://phabricator.wikimedia.org/P65966 and previous config saved to /var/cache/conftool/dbconfig/20240708-161510-arnaudb.json
  • 16:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T367781)', diff saved to https://phabricator.wikimedia.org/P65965 and previous config saved to /var/cache/conftool/dbconfig/20240708-161302-arnaudb.json
  • 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 16:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 16:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367781)', diff saved to https://phabricator.wikimedia.org/P65964 and previous config saved to /var/cache/conftool/dbconfig/20240708-161238-arnaudb.json
  • 16:09 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1011.eqiad.wmnet
  • 16:08 root@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1011.eqiad.wmnet with OS bullseye
  • 15:57 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3077.esams.wmnet
  • 15:57 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3069.esams.wmnet
  • 15:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P65963 and previous config saved to /var/cache/conftool/dbconfig/20240708-155731-arnaudb.json
  • 15:51 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 28s)
  • 15:47 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:46 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:45 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:45 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:45 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:45 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 15:44 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 54s)
  • 15:44 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 15:44 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P65962 and previous config saved to /var/cache/conftool/dbconfig/20240708-154224-arnaudb.json
  • 15:38 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:38 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 15:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367781)', diff saved to https://phabricator.wikimedia.org/P65961 and previous config saved to /var/cache/conftool/dbconfig/20240708-152717-arnaudb.json
  • 15:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T367781)', diff saved to https://phabricator.wikimedia.org/P65960 and previous config saved to /var/cache/conftool/dbconfig/20240708-152508-arnaudb.json
  • 15:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 15:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 15:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367781)', diff saved to https://phabricator.wikimedia.org/P65959 and previous config saved to /var/cache/conftool/dbconfig/20240708-152446-arnaudb.json
  • 15:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Bumping db1227 weight (T366852)', diff saved to https://phabricator.wikimedia.org/P65958 and previous config saved to /var/cache/conftool/dbconfig/20240708-152222-ladsgroup.json
  • 15:16 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1011.eqiad.wmnet with reason: host reimage
  • 15:13 root@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1011.eqiad.wmnet with reason: host reimage
  • 15:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P65957 and previous config saved to /var/cache/conftool/dbconfig/20240708-150939-arnaudb.json
  • 14:59 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1011.eqiad.wmnet with OS bullseye
  • 14:57 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader1002.eqiad.wmnet
  • 14:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P65956 and previous config saved to /var/cache/conftool/dbconfig/20240708-145432-arnaudb.json
  • 14:53 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host search-loader1002.eqiad.wmnet
  • 14:53 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host search-loader1002.eqiad.wmnet
  • 14:53 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host search-loader1002.eqiad.wmnet
  • 14:52 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host search-loader1002.eqiad.wmnet
  • 14:51 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host search-loader1002.eqiad.wmnet
  • 14:51 claime: cleaning up old shellbox files on mw1438
  • 14:43 root@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cloudcephosd1011.eqiad.wmnet
  • 14:43 root@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1011.eqiad.wmnet
  • 14:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367781)', diff saved to https://phabricator.wikimedia.org/P65955 and previous config saved to /var/cache/conftool/dbconfig/20240708-143925-arnaudb.json
  • 14:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T367781)', diff saved to https://phabricator.wikimedia.org/P65954 and previous config saved to /var/cache/conftool/dbconfig/20240708-143716-arnaudb.json
  • 14:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 14:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 14:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367781)', diff saved to https://phabricator.wikimedia.org/P65953 and previous config saved to /var/cache/conftool/dbconfig/20240708-143654-arnaudb.json
  • 14:34 root@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1011.eqiad.wmnet
  • 14:31 root@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1011.eqiad.wmnet
  • 14:27 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:27 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 14:23 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:22 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:22 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:21 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:21 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:21 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 14:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P65952 and previous config saved to /var/cache/conftool/dbconfig/20240708-142147-arnaudb.json
  • 14:21 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:21 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 14:20 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:20 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 14:20 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:20 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 14:20 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:20 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 14:18 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:17 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 14:17 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:17 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 14:17 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3068.esams.wmnet
  • 14:16 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3076.esams.wmnet
  • 14:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 14:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 14:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367856)', diff saved to https://phabricator.wikimedia.org/P65951 and previous config saved to /var/cache/conftool/dbconfig/20240708-141432-marostegui.json
  • 14:13 claime: cleaning up old shellbox files on mw1446
  • 14:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P65950 and previous config saved to /var/cache/conftool/dbconfig/20240708-140640-arnaudb.json
  • 13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P65949 and previous config saved to /var/cache/conftool/dbconfig/20240708-135925-marostegui.json
  • 13:58 urbanecm@deploy1002: Finished scap: Backport for lib: Update metrics-platform to 84ed8dcbe7c9 (duration: 10m 36s)
  • 13:53 urbanecm@deploy1002: phuedx, urbanecm: Continuing with sync
  • 13:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367781)', diff saved to https://phabricator.wikimedia.org/P65948 and previous config saved to /var/cache/conftool/dbconfig/20240708-135132-arnaudb.json
  • 13:50 urbanecm@deploy1002: phuedx, urbanecm: Backport for lib: Update metrics-platform to 84ed8dcbe7c9 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T367781)', diff saved to https://phabricator.wikimedia.org/P65947 and previous config saved to /var/cache/conftool/dbconfig/20240708-135024-arnaudb.json
  • 13:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367781)', diff saved to https://phabricator.wikimedia.org/P65946 and previous config saved to /var/cache/conftool/dbconfig/20240708-135002-arnaudb.json
  • 13:48 urbanecm@deploy1002: Started scap sync-world: Backport for lib: Update metrics-platform to 84ed8dcbe7c9
  • 13:47 urbanecm@deploy1002: Finished scap: Backport for EventStreamConfig: Add hive ingestion defaults (T367134), [wikifunctionswiki] Disable MobileFrontend in production (T349408) (duration: 30m 38s)
  • 13:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P65945 and previous config saved to /var/cache/conftool/dbconfig/20240708-134418-marostegui.json
  • 13:42 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T366555
  • 13:39 urbanecm@deploy1002: tchin, jforrester, urbanecm: Continuing with sync
  • 13:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P65944 and previous config saved to /var/cache/conftool/dbconfig/20240708-133456-arnaudb.json
  • 13:32 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T366555
  • 13:32 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T366555
  • 13:32 urbanecm@deploy1002: tchin, jforrester, urbanecm: Backport for EventStreamConfig: Add hive ingestion defaults (T367134), [wikifunctionswiki] Disable MobileFrontend in production (T349408) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:31 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T366555
  • 13:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367856)', diff saved to https://phabricator.wikimedia.org/P65943 and previous config saved to /var/cache/conftool/dbconfig/20240708-132911-marostegui.json
  • 13:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P65942 and previous config saved to /var/cache/conftool/dbconfig/20240708-131948-arnaudb.json
  • 13:17 urbanecm@deploy1002: Started scap sync-world: Backport for EventStreamConfig: Add hive ingestion defaults (T367134), [wikifunctionswiki] Disable MobileFrontend in production (T349408)
  • 13:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367781)', diff saved to https://phabricator.wikimedia.org/P65941 and previous config saved to /var/cache/conftool/dbconfig/20240708-130441-arnaudb.json
  • 13:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T367781)', diff saved to https://phabricator.wikimedia.org/P65940 and previous config saved to /var/cache/conftool/dbconfig/20240708-130333-arnaudb.json
  • 13:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:03 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 13:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 12:51 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:51 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1002.eqiad.wmnet with OS bookworm
  • 12:51 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:48 vgutierrez: test bwlimit per url on cp4051 - T317799
  • 12:43 marostegui@cumin1002: dbctl commit (dc=all): 'Pool with small weight T365805', diff saved to https://phabricator.wikimedia.org/P65939 and previous config saved to /var/cache/conftool/dbconfig/20240708-124310-marostegui.json
  • 12:36 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3067.esams.wmnet
  • 12:36 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3075.esams.wmnet
  • 12:35 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
  • 12:32 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
  • 12:27 btullis@deploy1002: Finished deploy [airflow-dags/analytics@a2faba7]: (no justification provided) (duration: 00m 27s)
  • 12:27 btullis@deploy1002: Started deploy [airflow-dags/analytics@a2faba7]: (no justification provided)
  • 12:19 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-mariadb1002.eqiad.wmnet with OS bookworm
  • 11:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65938 and previous config saved to /var/cache/conftool/dbconfig/20240708-115422-root.json
  • 11:47 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 262476
  • 11:47 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 262476
  • 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65937 and previous config saved to /var/cache/conftool/dbconfig/20240708-113917-root.json
  • 11:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 11:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 11:27 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 11:26 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 11:26 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 11:25 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 11:25 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 11:25 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 11:24 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 11:24 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 11:24 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 11:24 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 11:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65936 and previous config saved to /var/cache/conftool/dbconfig/20240708-112411-root.json
  • 11:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65935 and previous config saved to /var/cache/conftool/dbconfig/20240708-110905-root.json
  • 10:55 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3066.esams.wmnet
  • 10:55 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3074.esams.wmnet
  • 10:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65934 and previous config saved to /var/cache/conftool/dbconfig/20240708-105400-root.json
  • 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T367856)', diff saved to https://phabricator.wikimedia.org/P65933 and previous config saved to /var/cache/conftool/dbconfig/20240708-105348-marostegui.json
  • 10:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 10:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367856)', diff saved to https://phabricator.wikimedia.org/P65932 and previous config saved to /var/cache/conftool/dbconfig/20240708-105325-marostegui.json
  • 10:45 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_esams
  • 10:45 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_esams
  • 10:45 fabfur: rebooting A:cp-esams (T366555)
  • 10:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 270359
  • 10:43 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 270359
  • 10:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268248
  • 10:43 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 268248
  • 10:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262476
  • 10:42 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 262476
  • 10:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 272432
  • 10:41 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 272432
  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65931 and previous config saved to /var/cache/conftool/dbconfig/20240708-103854-root.json
  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P65930 and previous config saved to /var/cache/conftool/dbconfig/20240708-103818-marostegui.json
  • 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65929 and previous config saved to /var/cache/conftool/dbconfig/20240708-102347-root.json
  • 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P65928 and previous config saved to /var/cache/conftool/dbconfig/20240708-102311-marostegui.json
  • 10:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367856)', diff saved to https://phabricator.wikimedia.org/P65927 and previous config saved to /var/cache/conftool/dbconfig/20240708-100804-marostegui.json
  • 10:06 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 10:02 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 10:00 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:00 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:58 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 09:55 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 09:50 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: sync
  • 09:50 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: sync
  • 09:49 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 09:49 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 09:44 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: sync
  • 09:44 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: sync
  • 09:41 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 09:41 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 09:38 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: sync
  • 09:38 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: sync
  • 09:32 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: sync
  • 09:32 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: sync
  • 09:31 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: sync
  • 09:31 elukey@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: sync
  • 09:17 arturo: aborrero@apt1002:~$ sudo -i reprepro --component thirdparty/k9s includedeb bookworm-wikimedia /home/aborrero/k9s_linux_amd64.deb (T366061)
  • 08:59 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 08:56 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 08:51 Dreamy_Jazz: Running `foreachwikiindblist group1.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200` in a tmux session
  • 08:50 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 08:42 arturo: update packages for thirdparty/kubeadm-k8s-1-25 bookworm-wikimedia in apt1002 (T369163)
  • 08:26 godog: re-enable business hours americas oncall - T369122
  • 07:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 270052
  • 07:01 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 270052
  • 06:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52455
  • 06:16 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 52455
  • 06:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 137409
  • 06:14 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 137409
  • 06:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 27768
  • 06:13 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 27768
  • 06:11 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61512
  • 06:09 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61512
  • 06:08 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 269783
  • 06:08 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 269783
  • 06:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52320
  • 06:05 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 52320
  • 06:05 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7738
  • 06:04 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 7738
  • 06:04 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52468
  • 06:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 52468
  • 06:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 270052
  • 06:01 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 270052
  • 05:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28008
  • 05:59 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 28008
  • 05:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 17072
  • 05:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 17072
  • 05:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 263522
  • 05:38 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 263522
  • 05:38 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61942
  • 05:38 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61942
  • 05:38 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 18013
  • 05:37 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 18013
  • 05:37 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268248
  • 05:37 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 268248
  • 05:36 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61672
  • 05:36 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61672
  • 05:36 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28352
  • 05:36 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 28352
  • 05:36 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 999
  • 05:36 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 999
  • 05:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4788
  • 05:34 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 4788
  • 05:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 132167
  • 05:33 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 132167
  • 05:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6447
  • 05:32 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 6447
  • 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T367856)', diff saved to https://phabricator.wikimedia.org/P65926 and previous config saved to /var/cache/conftool/dbconfig/20240708-053133-marostegui.json
  • 05:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 05:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367856)', diff saved to https://phabricator.wikimedia.org/P65925 and previous config saved to /var/cache/conftool/dbconfig/20240708-053122-marostegui.json
  • 05:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28306
  • 05:29 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 28306
  • 05:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: Long schema change
  • 05:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: Long schema change
  • 05:24 marostegui: Deploy schema change on s5 codfw db2213 dbmaint T367856
  • 05:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2213 T369478', diff saved to https://phabricator.wikimedia.org/P65923 and previous config saved to /var/cache/conftool/dbconfig/20240708-051935-root.json
  • 05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2123 to s5 primary T369478', diff saved to https://phabricator.wikimedia.org/P65922 and previous config saved to /var/cache/conftool/dbconfig/20240708-051840-root.json
  • 05:18 marostegui: Starting s5 codfw failover from db2213 to db2123 - T369478
  • 05:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P65921 and previous config saved to /var/cache/conftool/dbconfig/20240708-051615-marostegui.json
  • 05:16 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2123 from dump/slow', diff saved to https://phabricator.wikimedia.org/P65920 and previous config saved to /var/cache/conftool/dbconfig/20240708-051605-marostegui.json
  • 05:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T369478
  • 05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2123 with weight 0 T369478', diff saved to https://phabricator.wikimedia.org/P65919 and previous config saved to /var/cache/conftool/dbconfig/20240708-050301-root.json
  • 05:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T369478
  • 04:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P65918 and previous config saved to /var/cache/conftool/dbconfig/20240708-045246-marostegui.json
  • 04:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367856)', diff saved to https://phabricator.wikimedia.org/P65917 and previous config saved to /var/cache/conftool/dbconfig/20240708-043738-marostegui.json
  • 01:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T367856)', diff saved to https://phabricator.wikimedia.org/P65916 and previous config saved to /var/cache/conftool/dbconfig/20240708-014044-marostegui.json
  • 01:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 01:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 01:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367856)', diff saved to https://phabricator.wikimedia.org/P65915 and previous config saved to /var/cache/conftool/dbconfig/20240708-014022-marostegui.json
  • 01:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P65914 and previous config saved to /var/cache/conftool/dbconfig/20240708-012515-marostegui.json
  • 01:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P65913 and previous config saved to /var/cache/conftool/dbconfig/20240708-011008-marostegui.json
  • 00:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367856)', diff saved to https://phabricator.wikimedia.org/P65912 and previous config saved to /var/cache/conftool/dbconfig/20240708-005501-marostegui.json

2024-07-07

  • 21:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T367856)', diff saved to https://phabricator.wikimedia.org/P65911 and previous config saved to /var/cache/conftool/dbconfig/20240707-215014-marostegui.json
  • 21:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 21:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 21:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367856)', diff saved to https://phabricator.wikimedia.org/P65910 and previous config saved to /var/cache/conftool/dbconfig/20240707-214952-marostegui.json
  • 21:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P65909 and previous config saved to /var/cache/conftool/dbconfig/20240707-213445-marostegui.json
  • 21:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P65908 and previous config saved to /var/cache/conftool/dbconfig/20240707-211938-marostegui.json
  • 21:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367856)', diff saved to https://phabricator.wikimedia.org/P65907 and previous config saved to /var/cache/conftool/dbconfig/20240707-210430-marostegui.json
  • 15:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T367856)', diff saved to https://phabricator.wikimedia.org/P65906 and previous config saved to /var/cache/conftool/dbconfig/20240707-154059-marostegui.json
  • 15:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 15:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance

2024-07-06

  • 18:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367856)', diff saved to https://phabricator.wikimedia.org/P65905 and previous config saved to /var/cache/conftool/dbconfig/20240706-182625-marostegui.json
  • 18:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P65904 and previous config saved to /var/cache/conftool/dbconfig/20240706-181117-marostegui.json
  • 17:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P65903 and previous config saved to /var/cache/conftool/dbconfig/20240706-175610-marostegui.json
  • 17:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367856)', diff saved to https://phabricator.wikimedia.org/P65902 and previous config saved to /var/cache/conftool/dbconfig/20240706-174103-marostegui.json
  • 17:21 hnowlan@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
  • 17:18 hnowlan@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
  • 12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T367856)', diff saved to https://phabricator.wikimedia.org/P65901 and previous config saved to /var/cache/conftool/dbconfig/20240706-124535-marostegui.json
  • 12:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 12:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 07:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367856)', diff saved to https://phabricator.wikimedia.org/P65900 and previous config saved to /var/cache/conftool/dbconfig/20240706-075448-marostegui.json
  • 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P65899 and previous config saved to /var/cache/conftool/dbconfig/20240706-073941-marostegui.json
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P65898 and previous config saved to /var/cache/conftool/dbconfig/20240706-072434-marostegui.json
  • 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367856)', diff saved to https://phabricator.wikimedia.org/P65897 and previous config saved to /var/cache/conftool/dbconfig/20240706-070927-marostegui.json
  • 04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T367856)', diff saved to https://phabricator.wikimedia.org/P65896 and previous config saved to /var/cache/conftool/dbconfig/20240706-043535-marostegui.json
  • 04:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 04:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367856)', diff saved to https://phabricator.wikimedia.org/P65895 and previous config saved to /var/cache/conftool/dbconfig/20240706-043513-marostegui.json
  • 04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P65894 and previous config saved to /var/cache/conftool/dbconfig/20240706-042006-marostegui.json
  • 04:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P65893 and previous config saved to /var/cache/conftool/dbconfig/20240706-040459-marostegui.json
  • 03:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367856)', diff saved to https://phabricator.wikimedia.org/P65892 and previous config saved to /var/cache/conftool/dbconfig/20240706-034952-marostegui.json
  • 00:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T367856)', diff saved to https://phabricator.wikimedia.org/P65891 and previous config saved to /var/cache/conftool/dbconfig/20240706-005648-marostegui.json
  • 00:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 00:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 00:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367856)', diff saved to https://phabricator.wikimedia.org/P65890 and previous config saved to /var/cache/conftool/dbconfig/20240706-005626-marostegui.json
  • 00:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P65889 and previous config saved to /var/cache/conftool/dbconfig/20240706-004119-marostegui.json
  • 00:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P65888 and previous config saved to /var/cache/conftool/dbconfig/20240706-002612-marostegui.json
  • 00:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367856)', diff saved to https://phabricator.wikimedia.org/P65887 and previous config saved to /var/cache/conftool/dbconfig/20240706-001105-marostegui.json

2024-07-05

  • 20:05 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 20:04 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 18:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T367856)', diff saved to https://phabricator.wikimedia.org/P65886 and previous config saved to /var/cache/conftool/dbconfig/20240705-185604-marostegui.json
  • 18:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 18:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 18:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367856)', diff saved to https://phabricator.wikimedia.org/P65885 and previous config saved to /var/cache/conftool/dbconfig/20240705-185542-marostegui.json
  • 18:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P65884 and previous config saved to /var/cache/conftool/dbconfig/20240705-184034-marostegui.json
  • 18:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65883 and previous config saved to /var/cache/conftool/dbconfig/20240705-183428-root.json
  • 18:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P65882 and previous config saved to /var/cache/conftool/dbconfig/20240705-182527-marostegui.json
  • 18:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65881 and previous config saved to /var/cache/conftool/dbconfig/20240705-181923-root.json
  • 18:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367856)', diff saved to https://phabricator.wikimedia.org/P65880 and previous config saved to /var/cache/conftool/dbconfig/20240705-181020-marostegui.json
  • 18:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65879 and previous config saved to /var/cache/conftool/dbconfig/20240705-180417-root.json
  • 17:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P65878 and previous config saved to /var/cache/conftool/dbconfig/20240705-175653-ladsgroup.json
  • 17:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65877 and previous config saved to /var/cache/conftool/dbconfig/20240705-174912-root.json
  • 17:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P65876 and previous config saved to /var/cache/conftool/dbconfig/20240705-174146-ladsgroup.json
  • 17:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65875 and previous config saved to /var/cache/conftool/dbconfig/20240705-173406-root.json
  • 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P65874 and previous config saved to /var/cache/conftool/dbconfig/20240705-172639-ladsgroup.json
  • 17:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65873 and previous config saved to /var/cache/conftool/dbconfig/20240705-171901-root.json
  • 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P65872 and previous config saved to /var/cache/conftool/dbconfig/20240705-171131-ladsgroup.json
  • 17:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65871 and previous config saved to /var/cache/conftool/dbconfig/20240705-170356-root.json
  • 17:00 logmsgbot: andrewtavis-wmde@deploy1002 Finished deploy [airflow-dags/wmde@73c6618]: (no justification provided) (duration: 00m 06s)
  • 17:00 logmsgbot: andrewtavis-wmde@deploy1002 Started deploy [airflow-dags/wmde@73c6618]: (no justification provided)
  • 13:40 hashar@deploy1002: Finished deploy [integration/docroot@18c8279]: Add AQS documentation to landing page - T368484 (duration: 00m 06s)
  • 13:40 hashar@deploy1002: Started deploy [integration/docroot@18c8279]: Add AQS documentation to landing page - T368484
  • 12:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1246.eqiad.wmnet with reason: Long schema change
  • 12:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1246.eqiad.wmnet with reason: Long schema change
  • 12:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T367856)', diff saved to https://phabricator.wikimedia.org/P65869 and previous config saved to /var/cache/conftool/dbconfig/20240705-125152-marostegui.json
  • 12:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 12:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 12:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367856)', diff saved to https://phabricator.wikimedia.org/P65868 and previous config saved to /var/cache/conftool/dbconfig/20240705-125130-marostegui.json
  • 12:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P65867 and previous config saved to /var/cache/conftool/dbconfig/20240705-123623-marostegui.json
  • 12:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P65866 and previous config saved to /var/cache/conftool/dbconfig/20240705-122115-marostegui.json
  • 12:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367856)', diff saved to https://phabricator.wikimedia.org/P65865 and previous config saved to /var/cache/conftool/dbconfig/20240705-120608-marostegui.json
  • 11:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P65864 and previous config saved to /var/cache/conftool/dbconfig/20240705-115703-ladsgroup.json
  • 11:53 dcausse: T369149: re-indexed wikidata P12861 (cirrus_rerender.rerender --wiki wikidatawiki allpages --namespace 120 --from-title P12861 --to-title P12861)
  • 11:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P65863 and previous config saved to /var/cache/conftool/dbconfig/20240705-114157-ladsgroup.json
  • 11:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
  • 11:29 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
  • 11:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P65862 and previous config saved to /var/cache/conftool/dbconfig/20240705-112652-ladsgroup.json
  • 11:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P65861 and previous config saved to /var/cache/conftool/dbconfig/20240705-111322-ladsgroup.json
  • 11:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 11:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P65860 and previous config saved to /var/cache/conftool/dbconfig/20240705-111146-ladsgroup.json
  • 10:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 10:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 10:41 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Define custom search-index-data-formatter-callback (T369149), Try looking up search index data formatters by data type (T369149) (duration: 21m 22s)
  • 10:36 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 10:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Define custom search-index-data-formatter-callback (T369149), Try looking up search index data formatters by data type (T369149) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Define custom search-index-data-formatter-callback (T369149), Try looking up search index data formatters by data type (T369149)
  • 10:11 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:10 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:35 fabfur: running puppet on A:cp to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1052271 (T369345)
  • 09:26 XioNoX: netbox-dev2003: move from netbox-dev to netbox-next - T336275
  • 08:55 godog: silence NELNotReported NELByCountryNotReported until Tues - T369345
  • 08:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T367856)', diff saved to https://phabricator.wikimedia.org/P65858 and previous config saved to /var/cache/conftool/dbconfig/20240705-085406-marostegui.json
  • 08:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 08:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 08:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 08:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 08:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T367856)', diff saved to https://phabricator.wikimedia.org/P65857 and previous config saved to /var/cache/conftool/dbconfig/20240705-085329-marostegui.json
  • 08:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P65856 and previous config saved to /var/cache/conftool/dbconfig/20240705-083821-marostegui.json
  • 08:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P65855 and previous config saved to /var/cache/conftool/dbconfig/20240705-082314-marostegui.json
  • 08:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T367856)', diff saved to https://phabricator.wikimedia.org/P65854 and previous config saved to /var/cache/conftool/dbconfig/20240705-080807-marostegui.json
  • 08:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 08:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 07:50 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:50 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:47 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:44 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 05:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 05:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T364069)', diff saved to https://phabricator.wikimedia.org/P65852 and previous config saved to /var/cache/conftool/dbconfig/20240705-051202-marostegui.json
  • 05:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136', diff saved to https://phabricator.wikimedia.org/P65851 and previous config saved to /var/cache/conftool/dbconfig/20240705-050028-root.json
  • 04:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P65850 and previous config saved to /var/cache/conftool/dbconfig/20240705-045655-marostegui.json
  • 04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2123 (T367856)', diff saved to https://phabricator.wikimedia.org/P65849 and previous config saved to /var/cache/conftool/dbconfig/20240705-045145-marostegui.json
  • 04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T367856)', diff saved to https://phabricator.wikimedia.org/P65848 and previous config saved to /var/cache/conftool/dbconfig/20240705-044912-marostegui.json
  • 04:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 04:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 04:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P65847 and previous config saved to /var/cache/conftool/dbconfig/20240705-044148-marostegui.json
  • 04:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T364069)', diff saved to https://phabricator.wikimedia.org/P65846 and previous config saved to /var/cache/conftool/dbconfig/20240705-042641-marostegui.json
  • 01:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T364069)', diff saved to https://phabricator.wikimedia.org/P65845 and previous config saved to /var/cache/conftool/dbconfig/20240705-013250-marostegui.json
  • 01:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 01:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 01:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T364069)', diff saved to https://phabricator.wikimedia.org/P65844 and previous config saved to /var/cache/conftool/dbconfig/20240705-013229-marostegui.json
  • 01:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P65843 and previous config saved to /var/cache/conftool/dbconfig/20240705-011721-marostegui.json
  • 01:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P65842 and previous config saved to /var/cache/conftool/dbconfig/20240705-010214-marostegui.json
  • 00:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T364069)', diff saved to https://phabricator.wikimedia.org/P65841 and previous config saved to /var/cache/conftool/dbconfig/20240705-004707-marostegui.json

2024-07-04

  • 22:04 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 22:03 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 22:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T364069)', diff saved to https://phabricator.wikimedia.org/P65840 and previous config saved to /var/cache/conftool/dbconfig/20240704-220227-marostegui.json
  • 22:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 22:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 22:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T364069)', diff saved to https://phabricator.wikimedia.org/P65839 and previous config saved to /var/cache/conftool/dbconfig/20240704-220205-marostegui.json
  • 22:01 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 22:00 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 21:59 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 21:59 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 21:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P65838 and previous config saved to /var/cache/conftool/dbconfig/20240704-214658-marostegui.json
  • 21:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P65837 and previous config saved to /var/cache/conftool/dbconfig/20240704-213151-marostegui.json
  • 21:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T364069)', diff saved to https://phabricator.wikimedia.org/P65836 and previous config saved to /var/cache/conftool/dbconfig/20240704-211644-marostegui.json
  • 20:17 jdrewniak@deploy1002: Finished scap: Backport for [July 4th] Reduce list of exclusions for dark mode (1.43.0-wmf.12), Remove modifications of wgCheckUserLogAdditionalRights (T346022), Add editcontentmodel to interface-admin for French Wikipedia (T369113) (duration: 12m 14s)
  • 20:12 jdrewniak@deploy1002: jdlrobson, nmw03, jdrewniak, dreamyjazz: Continuing with sync
  • 20:08 jdrewniak@deploy1002: jdlrobson, nmw03, jdrewniak, dreamyjazz: Backport for [July 4th] Reduce list of exclusions for dark mode (1.43.0-wmf.12), Remove modifications of wgCheckUserLogAdditionalRights (T346022), Add editcontentmodel to interface-admin for French Wikipedia (T369113) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:05 jdrewniak@deploy1002: Started scap sync-world: Backport for [July 4th] Reduce list of exclusions for dark mode (1.43.0-wmf.12), Remove modifications of wgCheckUserLogAdditionalRights (T346022), Add editcontentmodel to interface-admin for French Wikipedia (T369113)
  • 19:57 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_eqiad
  • 19:55 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_eqiad
  • 18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T364069)', diff saved to https://phabricator.wikimedia.org/P65835 and previous config saved to /var/cache/conftool/dbconfig/20240704-182308-marostegui.json
  • 18:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 18:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T364069)', diff saved to https://phabricator.wikimedia.org/P65834 and previous config saved to /var/cache/conftool/dbconfig/20240704-182257-marostegui.json
  • 18:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P65833 and previous config saved to /var/cache/conftool/dbconfig/20240704-180749-marostegui.json
  • 17:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P65832 and previous config saved to /var/cache/conftool/dbconfig/20240704-175242-marostegui.json
  • 17:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T364069)', diff saved to https://phabricator.wikimedia.org/P65831 and previous config saved to /var/cache/conftool/dbconfig/20240704-173735-marostegui.json
  • 17:10 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1078.eqiad.wmnet
  • 16:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:15 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1078.eqiad.wmnet
  • 16:14 btullis@cumin1002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
  • 16:14 btullis@cumin1002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
  • 16:06 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:02 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 15:02 elukey@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 14:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T364069)', diff saved to https://phabricator.wikimedia.org/P65830 and previous config saved to /var/cache/conftool/dbconfig/20240704-143350-marostegui.json
  • 14:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 14:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 14:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T364069)', diff saved to https://phabricator.wikimedia.org/P65829 and previous config saved to /var/cache/conftool/dbconfig/20240704-143327-marostegui.json
  • 14:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P65827 and previous config saved to /var/cache/conftool/dbconfig/20240704-141820-marostegui.json
  • 14:03 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P65826 and previous config saved to /var/cache/conftool/dbconfig/20240704-140313-marostegui.json
  • 14:01 claime: Enabling and running puppet on P:trafficserver::backend to merge 1050293 - T367949
  • 14:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65825 and previous config saved to /var/cache/conftool/dbconfig/20240704-140145-root.json
  • 13:57 claime: Enabling puppet on cp4037.ulsfo.wmnet to test 1050293 - T367949
  • 13:53 claime: disabling puppet on P:trafficserver::backend to merge 1049507 - T367949
  • 13:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T364069)', diff saved to https://phabricator.wikimedia.org/P65824 and previous config saved to /var/cache/conftool/dbconfig/20240704-134806-marostegui.json
  • 13:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65823 and previous config saved to /var/cache/conftool/dbconfig/20240704-134656-root.json
  • 13:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65822 and previous config saved to /var/cache/conftool/dbconfig/20240704-134639-root.json
  • 13:44 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Remove "Create a book" link from sidebar on German Wikipedia (T368900) (duration: 08m 35s)
  • 13:41 claime: Enabling and running puppet on P:trafficserver::backend to merge 1050293 - T367949
  • 13:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T367856)', diff saved to https://phabricator.wikimedia.org/P65821 and previous config saved to /var/cache/conftool/dbconfig/20240704-134105-marostegui.json
  • 13:39 logmsgbot: lucaswerkmeister-wmde@deploy1002 dreamrimmer, lucaswerkmeister-wmde: Continuing with sync
  • 13:38 logmsgbot: lucaswerkmeister-wmde@deploy1002 dreamrimmer, lucaswerkmeister-wmde: Backport for Remove "Create a book" link from sidebar on German Wikipedia (T368900) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:36 claime: Enabling puppet on cp6016.drmrs.wmnet to test 1050293 - T367949
  • 13:36 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Remove "Create a book" link from sidebar on German Wikipedia (T368900)
  • 13:32 claime: disabling puppet on P:trafficserver::backend to merge 1050293 - T367949
  • 13:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65820 and previous config saved to /var/cache/conftool/dbconfig/20240704-133150-root.json
  • 13:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65819 and previous config saved to /var/cache/conftool/dbconfig/20240704-133133-root.json
  • 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P65818 and previous config saved to /var/cache/conftool/dbconfig/20240704-132558-marostegui.json
  • 13:20 logmsgbot: andrewtavis-wmde@deploy1002 Finished deploy [airflow-dags/wmde@d773cac]: (no justification provided) (duration: 00m 03s)
  • 13:20 logmsgbot: andrewtavis-wmde@deploy1002 Started deploy [airflow-dags/wmde@d773cac]: (no justification provided)
  • 13:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65817 and previous config saved to /var/cache/conftool/dbconfig/20240704-131643-root.json
  • 13:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65816 and previous config saved to /var/cache/conftool/dbconfig/20240704-131628-root.json
  • 13:11 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:11 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P65815 and previous config saved to /var/cache/conftool/dbconfig/20240704-131050-marostegui.json
  • 13:09 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:09 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:08 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:07 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65814 and previous config saved to /var/cache/conftool/dbconfig/20240704-130137-root.json
  • 13:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65813 and previous config saved to /var/cache/conftool/dbconfig/20240704-130122-root.json
  • 12:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T367856)', diff saved to https://phabricator.wikimedia.org/P65812 and previous config saved to /var/cache/conftool/dbconfig/20240704-125543-marostegui.json
  • 12:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65811 and previous config saved to /var/cache/conftool/dbconfig/20240704-124632-root.json
  • 12:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65810 and previous config saved to /var/cache/conftool/dbconfig/20240704-124617-root.json
  • 12:36 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.12 refs T366957
  • 12:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65808 and previous config saved to /var/cache/conftool/dbconfig/20240704-123127-root.json
  • 12:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65807 and previous config saved to /var/cache/conftool/dbconfig/20240704-123111-root.json
  • 12:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1213', diff saved to https://phabricator.wikimedia.org/P65806 and previous config saved to /var/cache/conftool/dbconfig/20240704-122752-root.json
  • 12:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65805 and previous config saved to /var/cache/conftool/dbconfig/20240704-121631-root.json
  • 12:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65804 and previous config saved to /var/cache/conftool/dbconfig/20240704-121621-root.json
  • 12:11 hashar@deploy1002: Finished scap: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260) (duration: 07m 45s)
  • 12:06 hashar@deploy1002: hashar, d3r1ck01: Continuing with sync
  • 12:06 hashar@deploy1002: hashar, d3r1ck01: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:03 hashar@deploy1002: Started scap sync-world: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260)
  • 12:02 hashar@deploy1002: Sync cancelled.
  • 12:02 hashar@deploy1002: hashar, d3r1ck01: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:56 hashar@deploy1002: Started scap sync-world: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260)
  • 11:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T367856)', diff saved to https://phabricator.wikimedia.org/P65803 and previous config saved to /var/cache/conftool/dbconfig/20240704-115522-marostegui.json
  • 11:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 11:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 11:54 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1185.eqiad.wmnet onto db1213.eqiad.wmnet
  • 11:46 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:45 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:40 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:39 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:14 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1185.eqiad.wmnet onto db1213.eqiad.wmnet
  • 11:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1213 db1185 T369250', diff saved to https://phabricator.wikimedia.org/P65802 and previous config saved to /var/cache/conftool/dbconfig/20240704-111324-root.json
  • 10:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T364069)', diff saved to https://phabricator.wikimedia.org/P65801 and previous config saved to /var/cache/conftool/dbconfig/20240704-105205-marostegui.json
  • 10:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 10:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 10:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T364069)', diff saved to https://phabricator.wikimedia.org/P65800 and previous config saved to /var/cache/conftool/dbconfig/20240704-105143-marostegui.json
  • 10:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P65799 and previous config saved to /var/cache/conftool/dbconfig/20240704-103636-marostegui.json
  • 10:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P65798 and previous config saved to /var/cache/conftool/dbconfig/20240704-102129-marostegui.json
  • 10:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T364069)', diff saved to https://phabricator.wikimedia.org/P65797 and previous config saved to /var/cache/conftool/dbconfig/20240704-100622-marostegui.json
  • 09:53 topranks: Pushing updated BGP policy to cr2-eqord in Chicago to re-announce codfw IP ranges there T367439
  • 09:29 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1009.eqiad.wmnet
  • 09:24 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1009.eqiad.wmnet
  • 09:23 root@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1009.eqiad.wmnet with OS bullseye
  • 09:23 claime: Manual cleanup of puppet certs for renamed servers mw1417.eqiad.wmnet mw1418.eqiad.wmnet mw2300.codfw.wmnet
  • 09:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 09:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 09:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old sretest2005 IP - ayounsi@cumin1002"
  • 09:16 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old sretest2005 IP - ayounsi@cumin1002"
  • 09:13 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 09:05 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.43.0-wmf.12" - T366957
  • 09:03 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1009.eqiad.wmnet with reason: host reimage
  • 09:00 root@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1009.eqiad.wmnet with reason: host reimage
  • 08:59 elukey: restart mcrouter on mwmaint1002
  • 08:56 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 08:45 fabfur: enable puppet on A:cp-ulsfo (T365718)
  • 08:45 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1009.eqiad.wmnet with OS bullseye
  • 08:44 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 08:43 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 08:28 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 08:28 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.12 refs T366957
  • 08:24 fabfur: temporarily disable puppet on A:cp-ulsfo to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1051198 (T365718)
  • 08:10 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 08:02 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_eqiad
  • 08:02 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_eqiad
  • 08:01 fabfur: start rebooting A:cp-eqiad (upload|text in parallel) for T366555
  • 07:52 root@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cloudcephosd1009.eqiad.wmnet
  • 07:52 root@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1009.eqiad.wmnet
  • 07:41 root@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1009.eqiad.wmnet
  • 07:35 root@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1009.eqiad.wmnet
  • 07:18 dcausse: closing the backport window
  • 07:15 dcausse: refreshing the wikitech search indices
  • 07:11 dcausse@deploy1002: Finished scap: Backport for cirrus: re-enable search updates on wikitech (duration: 08m 28s)
  • 07:06 dcausse@deploy1002: dcausse: Continuing with sync
  • 07:05 dcausse@deploy1002: dcausse: Backport for cirrus: re-enable search updates on wikitech synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:02 dcausse@deploy1002: Started scap sync-world: Backport for cirrus: re-enable search updates on wikitech
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T364069)', diff saved to https://phabricator.wikimedia.org/P65794 and previous config saved to /var/cache/conftool/dbconfig/20240704-070100-marostegui.json
  • 07:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 07:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T364069)', diff saved to https://phabricator.wikimedia.org/P65793 and previous config saved to /var/cache/conftool/dbconfig/20240704-070038-marostegui.json
  • 06:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P65791 and previous config saved to /var/cache/conftool/dbconfig/20240704-063024-marostegui.json
  • 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T364069)', diff saved to https://phabricator.wikimedia.org/P65790 and previous config saved to /var/cache/conftool/dbconfig/20240704-061517-marostegui.json
  • 05:11 marostegui: Deploy schema change on db1231 s6 eqiad dbmaint T367856
  • 05:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Long schema change
  • 05:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Long schema change
  • 05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1231 T369020', diff saved to https://phabricator.wikimedia.org/P65789 and previous config saved to /var/cache/conftool/dbconfig/20240704-050334-marostegui.json
  • 05:02 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1173 to s6 primary and set section read-write T369020', diff saved to https://phabricator.wikimedia.org/P65788 and previous config saved to /var/cache/conftool/dbconfig/20240704-050237-marostegui.json
  • 05:02 marostegui@cumin1002: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T369020', diff saved to https://phabricator.wikimedia.org/P65787 and previous config saved to /var/cache/conftool/dbconfig/20240704-050216-marostegui.json
  • 05:01 marostegui: Starting s6 eqiad failover from db1231 to db1173 - T369020
  • 04:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369020
  • 04:44 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1173 with weight 0 T369020', diff saved to https://phabricator.wikimedia.org/P65786 and previous config saved to /var/cache/conftool/dbconfig/20240704-044429-marostegui.json
  • 04:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369020
  • 03:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T364069)', diff saved to https://phabricator.wikimedia.org/P65785 and previous config saved to /var/cache/conftool/dbconfig/20240704-031151-marostegui.json
  • 03:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 03:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 03:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T364069)', diff saved to https://phabricator.wikimedia.org/P65784 and previous config saved to /var/cache/conftool/dbconfig/20240704-031129-marostegui.json
  • 02:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P65783 and previous config saved to /var/cache/conftool/dbconfig/20240704-025622-marostegui.json
  • 02:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop analytics cluster
  • 02:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P65782 and previous config saved to /var/cache/conftool/dbconfig/20240704-024115-marostegui.json
  • 02:33 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_drmrs
  • 02:31 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_drmrs
  • 02:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T364069)', diff saved to https://phabricator.wikimedia.org/P65781 and previous config saved to /var/cache/conftool/dbconfig/20240704-022608-marostegui.json
  • 01:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 01:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 01:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T367856)', diff saved to https://phabricator.wikimedia.org/P65780 and previous config saved to /var/cache/conftool/dbconfig/20240704-014313-marostegui.json
  • 01:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P65779 and previous config saved to /var/cache/conftool/dbconfig/20240704-012806-marostegui.json
  • 01:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P65778 and previous config saved to /var/cache/conftool/dbconfig/20240704-011258-marostegui.json
  • 00:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T367856)', diff saved to https://phabricator.wikimedia.org/P65777 and previous config saved to /var/cache/conftool/dbconfig/20240704-005750-marostegui.json
  • 00:43 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parsoidtest1001.eqiad.wmnet with OS bullseye
  • 00:43 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dzahn@cumin1002"
  • 00:42 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dzahn@cumin1002"
  • 00:29 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parsoidtest1001.eqiad.wmnet with reason: host reimage
  • 00:25 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on parsoidtest1001.eqiad.wmnet with reason: host reimage
  • 00:15 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye

2024-07-03

  • 23:47 tzatziki: removing 11 files for legal compliance
  • 23:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T364069)', diff saved to https://phabricator.wikimedia.org/P65776 and previous config saved to /var/cache/conftool/dbconfig/20240703-232302-marostegui.json
  • 23:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 23:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 23:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 23:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 23:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65775 and previous config saved to /var/cache/conftool/dbconfig/20240703-232221-marostegui.json
  • 23:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T352010)', diff saved to https://phabricator.wikimedia.org/P65774 and previous config saved to /var/cache/conftool/dbconfig/20240703-232154-ladsgroup.json
  • 23:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P65773 and previous config saved to /var/cache/conftool/dbconfig/20240703-230713-marostegui.json
  • 23:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P65772 and previous config saved to /var/cache/conftool/dbconfig/20240703-230646-ladsgroup.json
  • 22:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P65771 and previous config saved to /var/cache/conftool/dbconfig/20240703-225206-marostegui.json
  • 22:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P65770 and previous config saved to /var/cache/conftool/dbconfig/20240703-225139-ladsgroup.json
  • 22:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65769 and previous config saved to /var/cache/conftool/dbconfig/20240703-223659-marostegui.json
  • 22:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T352010)', diff saved to https://phabricator.wikimedia.org/P65768 and previous config saved to /var/cache/conftool/dbconfig/20240703-223632-ladsgroup.json
  • 22:36 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye
  • 21:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye
  • 21:40 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
  • 21:40 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
  • 21:35 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
  • 20:13 cjming: end of UTC late backport window
  • 20:11 cjming@deploy1002: Finished scap: Backport for Remove QuickSurvey for Automoderator patroller workstream survey (T362969) (duration: 08m 22s)
  • 20:10 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2002.codfw.wmnet with OS bookworm
  • 20:06 cjming@deploy1002: kgraessle, cjming: Continuing with sync
  • 20:05 cjming@deploy1002: kgraessle, cjming: Backport for Remove QuickSurvey for Automoderator patroller workstream survey (T362969) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:05 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 20:04 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 20:04 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 20:03 cjming@deploy1002: Started scap sync-world: Backport for Remove QuickSurvey for Automoderator patroller workstream survey (T362969)
  • 19:56 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:55 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 19:54 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
  • 19:49 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2002.codfw.wmnet with OS bookworm
  • 19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65766 and previous config saved to /var/cache/conftool/dbconfig/20240703-194055-marostegui.json
  • 19:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 19:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T364069)', diff saved to https://phabricator.wikimedia.org/P65765 and previous config saved to /var/cache/conftool/dbconfig/20240703-194033-marostegui.json
  • 19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P65761 and previous config saved to /var/cache/conftool/dbconfig/20240703-192526-marostegui.json
  • 19:25 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
  • 19:24 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bookworm
  • 19:19 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
  • 19:16 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2002.codfw.wmnet with OS bookworm
  • 19:12 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@d773cac]: (no justification provided) (duration: 00m 33s)
  • 19:11 ebysans@deploy1002: Started deploy [airflow-dags/analytics@d773cac]: (no justification provided)
  • 19:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P65760 and previous config saved to /var/cache/conftool/dbconfig/20240703-191019-marostegui.json
  • 19:08 SandraEbele_: deploying airflow dags
  • 18:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T364069)', diff saved to https://phabricator.wikimedia.org/P65759 and previous config saved to /var/cache/conftool/dbconfig/20240703-185511-marostegui.json
  • 18:54 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
  • 18:36 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 18:36 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 18:35 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 18:34 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 17:50 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 17:49 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 17:49 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 17:48 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 17:46 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 17:45 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
  • 17:45 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
  • 17:44 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 17:44 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 17:43 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 17:43 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:41 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:41 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:40 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:40 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 17:37 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 17:37 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 17:36 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 17:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65758 and previous config saved to /var/cache/conftool/dbconfig/20240703-173601-root.json
  • 17:35 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 17:35 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 17:35 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 17:35 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 17:35 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 17:34 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 17:34 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 17:34 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 17:34 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:33 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 17:33 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:31 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 17:30 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 17:29 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 17:28 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:28 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 17:22 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 17:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65756 and previous config saved to /var/cache/conftool/dbconfig/20240703-172055-root.json
  • 17:19 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 17:19 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 17:17 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 17:17 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:15 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:11 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 17:10 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 17:10 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 17:09 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 17:08 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:07 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65755 and previous config saved to /var/cache/conftool/dbconfig/20240703-170549-root.json
  • 16:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65754 and previous config saved to /var/cache/conftool/dbconfig/20240703-165044-root.json
  • 16:47 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-presto1004.eqiad.wmnet with reason: Cold booting to investigate RAM issue
  • 16:46 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-presto1004.eqiad.wmnet with reason: Cold booting to investigate RAM issue
  • 16:44 jhathaway: adding inbound email servers mx-in{1001,2001} to our MX record
  • 16:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65752 and previous config saved to /var/cache/conftool/dbconfig/20240703-163538-root.json
  • 16:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65751 and previous config saved to /var/cache/conftool/dbconfig/20240703-162032-root.json
  • 16:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 1%: Repooling', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240703-160521-root.json
  • 16:04 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T364069)', diff saved to https://phabricator.wikimedia.org/P65750 and previous config saved to /var/cache/conftool/dbconfig/20240703-154716-marostegui.json
  • 15:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 15:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 15:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T364069)', diff saved to https://phabricator.wikimedia.org/P65749 and previous config saved to /var/cache/conftool/dbconfig/20240703-154643-marostegui.json
  • 15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 100%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65748 and previous config saved to /var/cache/conftool/dbconfig/20240703-154142-arnaudb.json
  • 15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65747 and previous config saved to /var/cache/conftool/dbconfig/20240703-154121-arnaudb.json
  • 15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 100%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65746 and previous config saved to /var/cache/conftool/dbconfig/20240703-154109-arnaudb.json
  • 15:32 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:31 sukhe: restart haproxy on dns1005
  • 15:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P65744 and previous config saved to /var/cache/conftool/dbconfig/20240703-153136-marostegui.json
  • 15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 75%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65743 and previous config saved to /var/cache/conftool/dbconfig/20240703-152636-arnaudb.json
  • 15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65742 and previous config saved to /var/cache/conftool/dbconfig/20240703-152616-arnaudb.json
  • 15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 75%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65741 and previous config saved to /var/cache/conftool/dbconfig/20240703-152603-arnaudb.json
  • 15:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P65740 and previous config saved to /var/cache/conftool/dbconfig/20240703-151628-marostegui.json
  • 15:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 208.80.152.129 v6 - ayounsi@cumin1002"
  • 15:13 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 208.80.152.129 v6 - ayounsi@cumin1002"
  • 15:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 50%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65739 and previous config saved to /var/cache/conftool/dbconfig/20240703-151131-arnaudb.json
  • 15:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65738 and previous config saved to /var/cache/conftool/dbconfig/20240703-151110-arnaudb.json
  • 15:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 50%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65737 and previous config saved to /var/cache/conftool/dbconfig/20240703-151057-arnaudb.json
  • 15:10 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 15:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T367856)', diff saved to https://phabricator.wikimedia.org/P65736 and previous config saved to /var/cache/conftool/dbconfig/20240703-150411-marostegui.json
  • 15:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 15:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 15:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T367856)', diff saved to https://phabricator.wikimedia.org/P65735 and previous config saved to /var/cache/conftool/dbconfig/20240703-150348-marostegui.json
  • 15:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T364069)', diff saved to https://phabricator.wikimedia.org/P65734 and previous config saved to /var/cache/conftool/dbconfig/20240703-150121-marostegui.json
  • 14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 25%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65733 and previous config saved to /var/cache/conftool/dbconfig/20240703-145625-arnaudb.json
  • 14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65732 and previous config saved to /var/cache/conftool/dbconfig/20240703-145604-arnaudb.json
  • 14:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 25%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65731 and previous config saved to /var/cache/conftool/dbconfig/20240703-145552-arnaudb.json
  • 14:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye
  • 14:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_drmrs
  • 14:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_drmrs
  • 14:51 fabfur: start rebooting A:cp-drmrs (upload|text in parallel) for T366555
  • 14:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P65730 and previous config saved to /var/cache/conftool/dbconfig/20240703-144841-marostegui.json
  • 14:46 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 14:45 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 14:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65729 and previous config saved to /var/cache/conftool/dbconfig/20240703-144119-arnaudb.json
  • 14:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65728 and previous config saved to /var/cache/conftool/dbconfig/20240703-144059-arnaudb.json
  • 14:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65727 and previous config saved to /var/cache/conftool/dbconfig/20240703-144046-arnaudb.json
  • 14:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1006.eqiad.wmnet with OS bookworm
  • 14:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1005.eqiad.wmnet with OS bookworm
  • 14:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1004.eqiad.wmnet with OS bookworm
  • 14:39 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:39 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:38 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:38 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:35 sukhe: [correction of previous A:dnsbox run] sudo cumin -b1 -s60 "A:dnsbox" "run-puppet-agent"
  • 14:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P65726 and previous config saved to /var/cache/conftool/dbconfig/20240703-143334-marostegui.json
  • 14:33 sukhe: sudo cumin "A:dnsbox" "run-puppet-agent"
  • 14:32 sukhe: sudo cumin "A:wikidough" "run-puppet-agent"
  • 14:32 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet
  • 14:32 jayme@cumin1002: START - Cookbook sre.hosts.remove-downtime for kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet
  • 14:30 jayme@cumin1002: conftool action : set/pooled=yes; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet)
  • 14:27 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 14:27 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 14:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65725 and previous config saved to /var/cache/conftool/dbconfig/20240703-142614-arnaudb.json
  • 14:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65724 and previous config saved to /var/cache/conftool/dbconfig/20240703-142553-arnaudb.json
  • 14:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65723 and previous config saved to /var/cache/conftool/dbconfig/20240703-142541-arnaudb.json
  • 14:25 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 14:21 jayme@cumin1002: conftool action : set/pooled=inactive; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet)
  • 14:18 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 14:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T367856)', diff saved to https://phabricator.wikimedia.org/P65722 and previous config saved to /var/cache/conftool/dbconfig/20240703-141826-marostegui.json
  • 14:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet with reason: T365994
  • 14:17 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:45:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet with reason: T365994
  • 14:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on db1154.eqiad.wmnet with reason: T365994
  • 14:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:45:00 on db1154.eqiad.wmnet with reason: T365994
  • 14:11 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 14:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye
  • 14:09 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 14:09 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 14:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:08 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 14:07 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye
  • 14:04 topranks: rebooting lsw1-e2-eqiad to install updated JunOS version T365994
  • 14:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on 22 hosts with reason: JunOS upgrade lsw1-e2-eqiad
  • 14:00 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on 22 hosts with reason: JunOS upgrade lsw1-e2-eqiad
  • 13:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic[1091-1092].eqiad.wmnet,wdqs[1018,1020].eqiad.wmnet with reason: T348977
  • 13:59 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on elastic[1091-1092].eqiad.wmnet,wdqs[1018,1020].eqiad.wmnet with reason: T348977
  • 13:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e2-eqiad,lsw1-e2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e2-eqiad
  • 13:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e2-eqiad,lsw1-e2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e2-eqiad
  • 13:57 jayme@cumin1002: conftool action : set/pooled=no; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet)
  • 13:56 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1091*,elastic1092* for T348977 - bking@cumin2002
  • 13:56 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1091*,elastic1092* for T348977 - bking@cumin2002
  • 13:56 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:20:00 on kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet with reason: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2
  • 13:55 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 1:20:00 on kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet with reason: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2
  • 13:53 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e2-eqiad
  • 13:52 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e2-eqiad
  • 13:48 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:48 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for noc: fail with a 404 when the selected wiki is nonexistent, CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup (duration: 08m 38s)
  • 13:44 jayme: draining wikikube-worker1007.eqiad.wmnet wikikube-worker1021.eqiad.wmnet kubernetes1060.eqiad.wmnet for T365994
  • 13:43 logmsgbot: lucaswerkmeister-wmde@deploy1002 dcausse, lucaswerkmeister-wmde: Continuing with sync
  • 13:42 logmsgbot: lucaswerkmeister-wmde@deploy1002 dcausse, lucaswerkmeister-wmde: Backport for noc: fail with a 404 when the selected wiki is nonexistent, CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:39 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for noc: fail with a 404 when the selected wiki is nonexistent, CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup
  • 13:38 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for GlobalRenameQueue: Fix issues with wiki ID and row query (T369147) (duration: 09m 28s)
  • 13:33 logmsgbot: lucaswerkmeister-wmde@deploy1002 kharlan, lucaswerkmeister-wmde: Continuing with sync
  • 13:31 logmsgbot: lucaswerkmeister-wmde@deploy1002 kharlan, lucaswerkmeister-wmde: Backport for GlobalRenameQueue: Fix issues with wiki ID and row query (T369147) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1006.eqiad.wmnet with OS bookworm
  • 13:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1005.eqiad.wmnet with OS bookworm
  • 13:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1004.eqiad.wmnet with OS bookworm
  • 13:28 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for GlobalRenameQueue: Fix issues with wiki ID and row query (T369147)
  • 13:25 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155) (duration: 08m 20s)
  • 13:22 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye
  • 13:20 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host parsoidtest1001
  • 13:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 13:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:19 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host parsoidtest1001
  • 13:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[1191,1196-1197].eqiad.wmnet with reason: T365994
  • 13:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db[1191,1196-1197].eqiad.wmnet with reason: T365994
  • 13:17 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 49.3.193.10.in-addr.arpa. on all recursors
  • 13:17 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache 49.3.193.10.in-addr.arpa. on all recursors
  • 13:17 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest2002.mgmt.codfw.wmnet on all recursors
  • 13:17 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache sretest2002.mgmt.codfw.wmnet on all recursors
  • 13:17 arnaudb@cumin1002: dbctl commit (dc=all): 'T365994 - depool db1191,db1196,db1197', diff saved to https://phabricator.wikimedia.org/P65721 and previous config saved to /var/cache/conftool/dbconfig/20240703-131715-arnaudb.json
  • 13:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155)
  • 13:16 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:16 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
  • 13:15 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes kawikisource --fix # T363243; 34 pages to fix, 34 were resolvable; 774 links to fix, 774 were resolvable, 0 were deleted
  • 13:15 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
  • 13:14 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes mswikisource --fix # T369047; 6 pages to fix, 6 were resolvable; 76 links to fix, 73 were resolvable, 3 were deleted
  • 13:13 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 13:12 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for mswikisource: create author and translation namespaces and add namespace aliases (T369047), kawikisource: create author namespace, add namespace aliases and sitename (T363243) (duration: 10m 39s)
  • 13:07 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, anzx: Continuing with sync
  • 13:04 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, anzx: Backport for mswikisource: create author and translation namespaces and add namespace aliases (T369047), kawikisource: create author namespace, add namespace aliases and sitename (T363243) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:01 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for mswikisource: create author and translation namespaces and add namespace aliases (T369047), kawikisource: create author namespace, add namespace aliases and sitename (T363243)
  • 12:51 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 12:47 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 12:39 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 12:39 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 12:37 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 12:34 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 12:30 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 12:17 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 12:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P65720 and previous config saved to /var/cache/conftool/dbconfig/20240703-121009-ladsgroup.json
  • 11:55 ladsgroup@deploy1002: Finished scap: Backport for rpc: Update function call in RunSingleJob (T363839) (duration: 08m 08s)
  • 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P65719 and previous config saved to /var/cache/conftool/dbconfig/20240703-115504-ladsgroup.json
  • 11:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T364069)', diff saved to https://phabricator.wikimedia.org/P65718 and previous config saved to /var/cache/conftool/dbconfig/20240703-115211-marostegui.json
  • 11:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 11:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 11:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T364069)', diff saved to https://phabricator.wikimedia.org/P65717 and previous config saved to /var/cache/conftool/dbconfig/20240703-115149-marostegui.json
  • 11:50 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 11:49 ladsgroup@deploy1002: ladsgroup: Backport for rpc: Update function call in RunSingleJob (T363839) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:47 ladsgroup@deploy1002: Started scap sync-world: Backport for rpc: Update function call in RunSingleJob (T363839)
  • 11:45 ladsgroup@deploy1002: Finished scap: Backport for Optimize static footer 'a Wikimedia project' icon further (T256190) (duration: 09m 28s)
  • 11:40 ladsgroup@deploy1002: volker-e, ladsgroup: Continuing with sync
  • 11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P65716 and previous config saved to /var/cache/conftool/dbconfig/20240703-113958-ladsgroup.json
  • 11:39 ladsgroup@deploy1002: volker-e, ladsgroup: Backport for Optimize static footer 'a Wikimedia project' icon further (T256190) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P65715 and previous config saved to /var/cache/conftool/dbconfig/20240703-113642-marostegui.json
  • 11:35 ladsgroup@deploy1002: Started scap sync-world: Backport for Optimize static footer 'a Wikimedia project' icon further (T256190)
  • 11:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T352010)', diff saved to https://phabricator.wikimedia.org/P65714 and previous config saved to /var/cache/conftool/dbconfig/20240703-112728-ladsgroup.json
  • 11:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 11:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 11:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P65713 and previous config saved to /var/cache/conftool/dbconfig/20240703-112452-ladsgroup.json
  • 11:21 cgoubert@deploy1002: Finished scap: mw-on-k8s: Move php.envvars to mediawiki-common - T365265 (duration: 05m 22s)
  • 11:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P65712 and previous config saved to /var/cache/conftool/dbconfig/20240703-112135-marostegui.json
  • 11:16 cgoubert@deploy1002: Started scap sync-world: mw-on-k8s: Move php.envvars to mediawiki-common - T365265
  • 11:15 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:15 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T364069)', diff saved to https://phabricator.wikimedia.org/P65711 and previous config saved to /var/cache/conftool/dbconfig/20240703-110627-marostegui.json
  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T367856)', diff saved to https://phabricator.wikimedia.org/P65710 and previous config saved to /var/cache/conftool/dbconfig/20240703-103839-marostegui.json
  • 10:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 10:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 10:33 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 10:32 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 09:49 logmsgbot: andrewtavis-wmde@deploy1002 Finished deploy [airflow-dags/wmde@d773cac]: (no justification provided) (duration: 00m 07s)
  • 09:49 logmsgbot: andrewtavis-wmde@deploy1002 Started deploy [airflow-dags/wmde@d773cac]: (no justification provided)
  • 09:31 mlitn@deploy1002: Finished scap: Backport for Handle campaigns where wikibase is not enabled (T369085) (duration: 12m 59s)
  • 09:27 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "testvm2008 - ayounsi@cumin1002"
  • 09:26 mlitn@deploy1002: mlitn: Continuing with sync
  • 09:26 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "testvm2008 - ayounsi@cumin1002"
  • 09:21 mlitn@deploy1002: mlitn: Backport for Handle campaigns where wikibase is not enabled (T369085) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:20 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 09:20 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 09:20 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 09:20 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2008.wikimedia.org
  • 09:20 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2008.wikimedia.org with OS bookworm
  • 09:20 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Give more weight to db2136 - running 10.11 T365805', diff saved to https://phabricator.wikimedia.org/P65709 and previous config saved to /var/cache/conftool/dbconfig/20240703-091956-marostegui.json
  • 09:18 mlitn@deploy1002: Started scap sync-world: Backport for Handle campaigns where wikibase is not enabled (T369085)
  • 09:09 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch2002.codfw.wmnet
  • 09:06 topranks: merge host firewall changes to set default DSCP marking (T339850)
  • 09:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2008.wikimedia.org with reason: host reimage
  • 09:02 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2008.wikimedia.org with reason: host reimage
  • 09:02 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-stretch2002.codfw.wmnet
  • 09:01 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:01 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:00 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 09:00 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 09:00 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 08:59 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 08:58 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:58 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch2001.codfw.wmnet
  • 08:57 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:53 jayme: deployed istio (adding securityContext) to wikikube clusters - T362978
  • 08:51 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-stretch2001.codfw.wmnet
  • 08:51 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1002.eqiad.wmnet
  • 08:49 Lucas_WMDE: RELEASE_NAME=r72z2aop helmfile --file /srv/deployment-charts/helmfile.d/services/mw-script/helmfile.yaml --environment eqiad --selector name=r72z2aop destroy # clean up broken mwscript-k8s run I did just to test something
  • 08:46 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2008.wikimedia.org with OS bookworm
  • 08:45 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2008.wikimedia.org - ayounsi@cumin1002"
  • 08:45 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2008.wikimedia.org - ayounsi@cumin1002"
  • 08:44 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1002.eqiad.wmnet
  • 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2008.wikimedia.org on all recursors
  • 08:44 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache testvm2008.wikimedia.org on all recursors
  • 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2008.wikimedia.org - ayounsi@cumin1002"
  • 08:43 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2008.wikimedia.org - ayounsi@cumin1002"
  • 08:43 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 08:42 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 08:42 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 08:42 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1001.eqiad.wmnet
  • 08:41 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 08:41 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 08:41 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 08:41 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 08:41 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:41 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:40 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 08:40 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:40 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 08:40 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host testvm2008.wikimedia.org
  • 08:40 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 08:40 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:40 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:40 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 08:40 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:40 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 08:39 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 08:39 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 08:39 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 08:39 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 08:38 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 08:35 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1001.eqiad.wmnet
  • 08:31 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host karapace1002.eqiad.wmnet
  • 08:22 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host karapace1002.eqiad.wmnet
  • 08:18 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host karapace1001.eqiad.wmnet
  • 08:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.12 refs T366957
  • 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Give more weight to db2136 - running 10.11 T365805', diff saved to https://phabricator.wikimedia.org/P65707 and previous config saved to /var/cache/conftool/dbconfig/20240703-081059-marostegui.json
  • 08:09 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
  • 08:09 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host karapace1001.eqiad.wmnet
  • 08:09 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
  • 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T364069)', diff saved to https://phabricator.wikimedia.org/P65706 and previous config saved to /var/cache/conftool/dbconfig/20240703-075245-marostegui.json
  • 07:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 07:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 07:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T364069)', diff saved to https://phabricator.wikimedia.org/P65705 and previous config saved to /var/cache/conftool/dbconfig/20240703-074321-marostegui.json
  • 07:36 kart_: Updated MinT to 2024-07-02-060114-production (T364525)
  • 07:33 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 07:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P65704 and previous config saved to /var/cache/conftool/dbconfig/20240703-072814-marostegui.json
  • 07:23 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 07:21 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 07:14 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 07:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P65702 and previous config saved to /var/cache/conftool/dbconfig/20240703-071306-marostegui.json
  • 07:12 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 07:07 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 06:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T364069)', diff saved to https://phabricator.wikimedia.org/P65701 and previous config saved to /var/cache/conftool/dbconfig/20240703-065759-marostegui.json
  • 06:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65700 and previous config saved to /var/cache/conftool/dbconfig/20240703-062057-root.json
  • 06:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65699 and previous config saved to /var/cache/conftool/dbconfig/20240703-060552-root.json
  • 05:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65698 and previous config saved to /var/cache/conftool/dbconfig/20240703-055046-root.json
  • 05:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65697 and previous config saved to /var/cache/conftool/dbconfig/20240703-053541-root.json
  • 05:23 marostegui: Deploy schema change on db2207 s2 codfw dbmaint T367856
  • 05:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Long schema change
  • 05:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Long schema change
  • 05:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2207 T369130', diff saved to https://phabricator.wikimedia.org/P65696 and previous config saved to /var/cache/conftool/dbconfig/20240703-052118-root.json
  • 05:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65695 and previous config saved to /var/cache/conftool/dbconfig/20240703-052035-root.json
  • 05:20 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2204 to s2 primary T369130', diff saved to https://phabricator.wikimedia.org/P65694 and previous config saved to /var/cache/conftool/dbconfig/20240703-052029-root.json
  • 05:20 marostegui: Starting s2 codfw failover from db2207 to db2204 - T369130
  • 05:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369130
  • 05:06 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2204 with weight 0 T369130', diff saved to https://phabricator.wikimedia.org/P65693 and previous config saved to /var/cache/conftool/dbconfig/20240703-050647-root.json
  • 05:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369130
  • 05:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65692 and previous config saved to /var/cache/conftool/dbconfig/20240703-050523-root.json
  • 04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Pool with small weight T365805', diff saved to https://phabricator.wikimedia.org/P65691 and previous config saved to /var/cache/conftool/dbconfig/20240703-045109-marostegui.json
  • 04:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65690 and previous config saved to /var/cache/conftool/dbconfig/20240703-045018-root.json
  • 04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T364069)', diff saved to https://phabricator.wikimedia.org/P65689 and previous config saved to /var/cache/conftool/dbconfig/20240703-043335-marostegui.json
  • 04:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 04:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65688 and previous config saved to /var/cache/conftool/dbconfig/20240703-043312-marostegui.json
  • 04:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P65687 and previous config saved to /var/cache/conftool/dbconfig/20240703-041805-marostegui.json
  • 04:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P65686 and previous config saved to /var/cache/conftool/dbconfig/20240703-040258-marostegui.json
  • 03:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65685 and previous config saved to /var/cache/conftool/dbconfig/20240703-034751-marostegui.json
  • 01:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65684 and previous config saved to /var/cache/conftool/dbconfig/20240703-011701-marostegui.json
  • 01:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 01:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 00:48 eileen: civicrm upgraded from 6e03cff2 to 84d6f5d1
  • 00:27 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_drmrs
  • 00:16 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_drmrs
  • 00:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 00:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 00:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T367856)', diff saved to https://phabricator.wikimedia.org/P65683 and previous config saved to /var/cache/conftool/dbconfig/20240703-000506-marostegui.json

2024-07-02

  • 23:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P65682 and previous config saved to /var/cache/conftool/dbconfig/20240702-234959-marostegui.json
  • 23:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P65681 and previous config saved to /var/cache/conftool/dbconfig/20240702-233452-marostegui.json
  • 23:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T367856)', diff saved to https://phabricator.wikimedia.org/P65680 and previous config saved to /var/cache/conftool/dbconfig/20240702-231945-marostegui.json
  • 22:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 22:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 22:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T364069)', diff saved to https://phabricator.wikimedia.org/P65679 and previous config saved to /var/cache/conftool/dbconfig/20240702-225835-marostegui.json
  • 22:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P65678 and previous config saved to /var/cache/conftool/dbconfig/20240702-224328-marostegui.json
  • 22:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P65677 and previous config saved to /var/cache/conftool/dbconfig/20240702-222820-marostegui.json
  • 22:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T364069)', diff saved to https://phabricator.wikimedia.org/P65676 and previous config saved to /var/cache/conftool/dbconfig/20240702-221312-marostegui.json
  • 22:05 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 22:05 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 22:05 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 22:04 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 22:04 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 22:04 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 22:04 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 22:04 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 22:04 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 22:04 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 22:04 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 22:03 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 22:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 22:03 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 22:03 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 22:03 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 22:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 22:03 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 22:03 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 22:03 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 22:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 22:03 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 22:03 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 22:03 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 22:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 22:02 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 22:02 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 22:02 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 22:02 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 22:02 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 22:02 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 22:02 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 22:02 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 22:01 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 22:01 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 22:01 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 21:58 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 21:58 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 21:58 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 21:57 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 21:54 rzl@deploy1002: Finished scap: T369080 (duration: 04m 13s)
  • 21:54 rzl@deploy1002: rzl: Continuing with sync
  • 21:52 rzl@deploy1002: rzl: T369080 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:51 rzl@deploy1002: Started scap sync-world: T369080
  • 21:26 eileen: civicrm upgraded from 08e568e4 to 6e03cff2
  • 21:21 eileen: civicrm upgraded from 67bcfd72 to 08e568e4
  • 20:50 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:50 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
  • 20:49 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
  • 20:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 20:45 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:39 cmooney@cumin1002: START - Cookbook sre.hosts.provision for host sretest2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:35 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:35 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
  • 20:34 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
  • 20:33 urbanecm@deploy1002: Finished scap: Backport for Follow the defaults for Parsoid on MFE on officewiki (T363720) (duration: 11m 44s)
  • 20:31 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 20:28 urbanecm@deploy1002: arlolra, urbanecm: Continuing with sync
  • 20:25 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet
  • 20:24 urbanecm@deploy1002: arlolra, urbanecm: Backport for Follow the defaults for Parsoid on MFE on officewiki (T363720) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:21 urbanecm@deploy1002: Started scap sync-world: Backport for Follow the defaults for Parsoid on MFE on officewiki (T363720)
  • 20:21 urbanecm@deploy1002: Finished scap: Backport for [July 2nd] Mobile: Enable dark mode for all users for tier 1 wikis (T367151), Remove unused Linter configs (T343292) (duration: 16m 31s)
  • 20:16 urbanecm@deploy1002: jdlrobson, arlolra, urbanecm: Continuing with sync
  • 20:15 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 20:15 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 20:07 urbanecm@deploy1002: jdlrobson, arlolra, urbanecm: Backport for [July 2nd] Mobile: Enable dark mode for all users for tier 1 wikis (T367151), Remove unused Linter configs (T343292) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:04 urbanecm@deploy1002: Started scap sync-world: Backport for [July 2nd] Mobile: Enable dark mode for all users for tier 1 wikis (T367151), Remove unused Linter configs (T343292)
  • 19:45 jhathaway: running another email inbound mx test on mx-in1001
  • 19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T364069)', diff saved to https://phabricator.wikimedia.org/P65675 and previous config saved to /var/cache/conftool/dbconfig/20240702-194027-marostegui.json
  • 19:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 19:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T364069)', diff saved to https://phabricator.wikimedia.org/P65674 and previous config saved to /var/cache/conftool/dbconfig/20240702-194005-marostegui.json
  • 19:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P65673 and previous config saved to /var/cache/conftool/dbconfig/20240702-192457-marostegui.json
  • 19:21 eileen: civicrm upgraded from 64f23ed0 to 67bcfd72
  • 19:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P65672 and previous config saved to /var/cache/conftool/dbconfig/20240702-190950-marostegui.json
  • 18:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T364069)', diff saved to https://phabricator.wikimedia.org/P65671 and previous config saved to /var/cache/conftool/dbconfig/20240702-185443-marostegui.json
  • 17:40 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:40 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:39 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:39 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:36 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:36 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:34 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:34 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:20 jforrester@deploy1002: Finished scap: Backport for Update OOUI to v0.50.3, Update OOUI to v0.50.3 (T369010) (duration: 10m 06s)
  • 17:15 jforrester@deploy1002: jforrester: Continuing with sync
  • 17:14 jforrester@deploy1002: jforrester: Backport for Update OOUI to v0.50.3, Update OOUI to v0.50.3 (T369010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:10 jforrester@deploy1002: Started scap sync-world: Backport for Update OOUI to v0.50.3, Update OOUI to v0.50.3 (T369010)
  • 17:07 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 17:07 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 17:07 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 17:06 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 17:06 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 17:06 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 17:06 mutante: lists1004 - sudo systemctl start wmf_auto_restart_exim4 (T369017)
  • 16:54 ejegg: fundraising civicrm upgraded from 41c1bd78 to 64f23ed0
  • 16:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2007.codfw.wmnet with OS bookworm
  • 16:13 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_drmrs
  • 16:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
  • 16:01 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_drmrs
  • 15:58 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-master1004.eqiad.wmnet
  • 15:57 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
  • 15:51 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-master1004.eqiad.wmnet
  • 15:50 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
  • 15:50 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 15:49 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams
  • 15:46 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams
  • 15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
  • 15:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 20:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
  • 15:43 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2007.codfw.wmnet with OS bookworm
  • 15:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T364069)', diff saved to https://phabricator.wikimedia.org/P65670 and previous config saved to /var/cache/conftool/dbconfig/20240702-154127-marostegui.json
  • 15:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 15:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 15:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65669 and previous config saved to /var/cache/conftool/dbconfig/20240702-154105-marostegui.json
  • 15:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P65668 and previous config saved to /var/cache/conftool/dbconfig/20240702-152558-marostegui.json
  • 15:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2007.codfw.wmnet with OS bookworm
  • 15:12 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 15:12 elukey@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 15:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P65667 and previous config saved to /var/cache/conftool/dbconfig/20240702-151050-marostegui.json
  • 15:05 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubetcd[2004-2006].codfw.wmnet
  • 15:05 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:05 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[2004-2006].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
  • 15:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
  • 15:02 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[2004-2006].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
  • 14:58 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 14:58 jiji@cumin1002: START - Cookbook sre.dns.netbox
  • 14:58 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
  • 14:55 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams
  • 14:55 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams
  • 14:55 fabfur: upgrading A:cp-esams to haproxy 2.8.10 (T367756)
  • 14:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65666 and previous config saved to /var/cache/conftool/dbconfig/20240702-145542-marostegui.json
  • 14:53 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 14:53 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 14:53 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 14:52 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 14:52 brouberol@cumin1002: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes
  • 14:52 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 14:52 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 14:51 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubetcd[1004-1006].eqiad.wmnet
  • 14:51 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:51 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[1004-1006].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
  • 14:50 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 14:48 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[1004-1006].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
  • 14:47 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubetcd[2004-2006].codfw.wmnet
  • 14:45 jiji@cumin1002: START - Cookbook sre.dns.netbox
  • 14:38 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2007.codfw.wmnet with OS bookworm
  • 14:37 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubetcd[1004-1006].eqiad.wmnet
  • 14:28 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1008.eqiad.wmnet
  • 14:19 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1008.eqiad.wmnet
  • 14:15 root@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1008.eqiad.wmnet with OS bullseye
  • 14:12 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: decom
  • 14:12 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: decom
  • 14:11 jiji@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2 days, 0:00:00 on 6 hosts with reason: decom
  • 14:11 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: decom
  • 14:07 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:06 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org,service=recdns
  • 14:06 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 14:05 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 14:05 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:05 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:05 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 14:05 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 14:04 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
  • 14:04 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
  • 14:04 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org,service=recdns
  • 14:04 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:03 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:03 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:03 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org
  • 14:02 sukhe: restart anycast-hc on dns6001
  • 14:01 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org
  • 13:58 effie: decom old eqiad and codfw kubetcd hosts
  • 13:46 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 13:44 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 13:44 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 13:43 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 13:42 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 13:42 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:41 brouberol@cumin1002: START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes
  • 13:39 brouberol@cumin1002: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes
  • 13:35 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2030.codfw.wmnet|wikikube-worker2031.codfw.wmnet|wikikube-worker2032.codfw.wmnet|wikikube-worker2033.codfw.wmnet|wikikube-worker2034.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 13:35 claime: Pooling and uncordoning wikikube-worker2030.codfw.wmnet wikikube-worker2031.codfw.wmnet wikikube-worker2032.codfw.wmnet wikikube-worker2033.codfw.wmnet wikikube-worker2034.codfw.wmnet - T351074
  • 13:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T367856)', diff saved to https://phabricator.wikimedia.org/P65665 and previous config saved to /var/cache/conftool/dbconfig/20240702-133100-marostegui.json
  • 13:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 13:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 13:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T367856)', diff saved to https://phabricator.wikimedia.org/P65664 and previous config saved to /var/cache/conftool/dbconfig/20240702-133038-marostegui.json
  • 13:30 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:27 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [wikifunctions] Grant wikifunctions-staff enum and converter rights (T366610 T367270), GrowthExperiments: add community updates module flag (T365877) (duration: 10m 22s)
  • 13:22 claime: homer 'cr*codfw*' commit 'T351074'
  • 13:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 sgimeno, jforrester, lucaswerkmeister-wmde: Continuing with sync
  • 13:21 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubemaster[1001-1002].eqiad.wmnet
  • 13:21 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:21 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[1001-1002].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
  • 13:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 sgimeno, jforrester, lucaswerkmeister-wmde: Backport for [wikifunctions] Grant wikifunctions-staff enum and converter rights (T366610 T367270), GrowthExperiments: add community updates module flag (T365877) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:18 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[1001-1002].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
  • 13:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for [wikifunctions] Grant wikifunctions-staff enum and converter rights (T366610 T367270), GrowthExperiments: add community updates module flag (T365877)
  • 13:16 jiji@cumin1002: START - Cookbook sre.dns.netbox
  • 13:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P65663 and previous config saved to /var/cache/conftool/dbconfig/20240702-131531-marostegui.json
  • 13:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Enable EntitySchema data type on Wikidata (T332157) (duration: 10m 54s)
  • 13:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2032.codfw.wmnet with OS bullseye
  • 13:09 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubemaster[1001-1002].eqiad.wmnet
  • 13:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 13:08 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:08 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 13:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Enable EntitySchema data type on Wikidata (T332157) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2033.codfw.wmnet with OS bullseye
  • 13:03 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Enable EntitySchema data type on Wikidata (T332157)
  • 13:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P65662 and previous config saved to /var/cache/conftool/dbconfig/20240702-130024-marostegui.json
  • 12:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2034.codfw.wmnet with OS bullseye
  • 12:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2031.codfw.wmnet with OS bullseye
  • 12:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2030.codfw.wmnet with OS bullseye
  • 12:55 jiji@cumin1002: conftool action : set/pooled=inactive; selector: name=kubemaster100[1-2].eqiad.wmnet
  • 12:49 jiji@cumin1002: conftool action : set/pooled=no; selector: name=kubemaster100[1-2].eqiad.wmnet
  • 12:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2032.codfw.wmnet with reason: host reimage
  • 12:46 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kubemaster[1001-1002].eqiad.wmnet with reason: decom
  • 12:46 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on kubemaster[1001-1002].eqiad.wmnet with reason: decom
  • 12:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2033.codfw.wmnet with reason: host reimage
  • 12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T367856)', diff saved to https://phabricator.wikimedia.org/P65661 and previous config saved to /var/cache/conftool/dbconfig/20240702-124517-marostegui.json
  • 12:44 effie: decom eqiad old kubemasters - T353464
  • 12:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2034.codfw.wmnet with reason: host reimage
  • 12:41 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kubernetes1051.eqiad.wmnet
  • 12:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2031.codfw.wmnet with reason: host reimage
  • 12:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2030.codfw.wmnet with reason: host reimage
  • 12:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2032.codfw.wmnet with reason: host reimage
  • 12:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2033.codfw.wmnet with reason: host reimage
  • 12:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2034.codfw.wmnet with reason: host reimage
  • 12:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2031.codfw.wmnet with reason: host reimage
  • 12:33 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2030.codfw.wmnet with reason: host reimage
  • 12:25 brouberol@cumin1002: START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes
  • 12:25 marostegui: Deploy schema change on db2129 s6 codfw dbmaint T367856
  • 12:25 brouberol@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 12:24 jforrester@deploy1002: Finished scap: Backport for Reference widget: check for undefined config (T368736) (duration: 09m 59s)
  • 12:19 jforrester@deploy1002: jforrester: Continuing with sync
  • 12:19 jforrester@deploy1002: jforrester: Backport for Reference widget: check for undefined config (T368736) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2034.codfw.wmnet with OS bullseye
  • 12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2033.codfw.wmnet with OS bullseye
  • 12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2032.codfw.wmnet with OS bullseye
  • 12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2031.codfw.wmnet with OS bullseye
  • 12:17 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2030.codfw.wmnet with OS bullseye
  • 12:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2393 to wikikube-worker2034
  • 12:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2034
  • 12:17 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2034
  • 12:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2393 to wikikube-worker2034 - cgoubert@cumin1002"
  • 12:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65660 and previous config saved to /var/cache/conftool/dbconfig/20240702-121638-root.json
  • 12:16 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on lists1001.wikimedia.org with reason: Pre-decommissioning lists1001
  • 12:16 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on lists1001.wikimedia.org with reason: Pre-decommissioning lists1001
  • 12:16 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:15 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:15 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2393 to wikikube-worker2034 - cgoubert@cumin1002"
  • 12:14 jforrester@deploy1002: Started scap sync-world: Backport for Reference widget: check for undefined config (T368736)
  • 12:11 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 12:11 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2393 to wikikube-worker2034
  • 12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2392 to wikikube-worker2033
  • 12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2033
  • 12:09 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2033
  • 12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2392 to wikikube-worker2033 - cgoubert@cumin1002"
  • 12:09 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
  • 12:08 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2392 to wikikube-worker2033 - cgoubert@cumin1002"
  • 12:07 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1008.eqiad.wmnet with reason: host reimage
  • 12:07 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
  • 12:05 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 12:05 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2392 to wikikube-worker2033
  • 12:05 root@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1008.eqiad.wmnet with reason: host reimage
  • 12:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2365 to wikikube-worker2032
  • 12:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2032
  • 12:03 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2032
  • 12:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2365 to wikikube-worker2032 - cgoubert@cumin1002"
  • 12:01 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2365 to wikikube-worker2032 - cgoubert@cumin1002"
  • 12:01 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65659 and previous config saved to /var/cache/conftool/dbconfig/20240702-120133-root.json
  • 12:01 jayme@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:00 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:00 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:59 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 11:59 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2365 to wikikube-worker2032
  • 11:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2309 to wikikube-worker2031
  • 11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2031
  • 11:58 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2031
  • 11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2309 to wikikube-worker2031 - cgoubert@cumin1002"
  • 11:58 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:58 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:57 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2309 to wikikube-worker2031 - cgoubert@cumin1002"
  • 11:55 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 11:55 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2309 to wikikube-worker2031
  • 11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2307 to wikikube-worker2030
  • 11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2030
  • 11:52 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2030
  • 11:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2307 to wikikube-worker2030 - cgoubert@cumin1002"
  • 11:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65658 and previous config saved to /var/cache/conftool/dbconfig/20240702-115026-marostegui.json
  • 11:50 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2307 to wikikube-worker2030 - cgoubert@cumin1002"
  • 11:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 11:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 11:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T364069)', diff saved to https://phabricator.wikimedia.org/P65657 and previous config saved to /var/cache/conftool/dbconfig/20240702-115003-marostegui.json
  • 11:48 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1008.eqiad.wmnet with OS bullseye
  • 11:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65656 and previous config saved to /var/cache/conftool/dbconfig/20240702-114627-root.json
  • 11:44 root@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1008.eqiad.wmnet with OS bullseye
  • 11:43 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 11:43 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2307 to wikikube-worker2030
  • 11:37 brouberol@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 11:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Long schema change
  • 11:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Long schema change
  • 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P65655 and previous config saved to /var/cache/conftool/dbconfig/20240702-113457-marostegui.json
  • 11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65654 and previous config saved to /var/cache/conftool/dbconfig/20240702-113122-root.json
  • 11:27 brouberol@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 11:26 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host eventlog1003.eqiad.wmnet
  • 11:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2129 T369021', diff saved to https://phabricator.wikimedia.org/P65653 and previous config saved to /var/cache/conftool/dbconfig/20240702-112616-root.json
  • 11:25 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2214 to s6 primary T369021', diff saved to https://phabricator.wikimedia.org/P65652 and previous config saved to /var/cache/conftool/dbconfig/20240702-112518-marostegui.json
  • 11:24 marostegui: Starting s6 codfw failover from db2129 to db2214 - T369021
  • 11:24 jayme: switched wikikube production clusters from PSP to PSS for restricted namespaces - T273507
  • 11:23 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:22 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host eventlog1003.eqiad.wmnet
  • 11:22 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:22 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
  • 11:22 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
  • 11:21 jayme@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubernetes1051.eqiad.wmnet
  • 11:21 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:21 claime: Uncordoning wikikube-ctrl2001.codfw.wmnet and wikikube-ctrl2002.codfw.wmnet
  • 11:20 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P65651 and previous config saved to /var/cache/conftool/dbconfig/20240702-111949-marostegui.json
  • 11:17 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1008.eqiad.wmnet with OS bullseye
  • 11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65650 and previous config saved to /var/cache/conftool/dbconfig/20240702-111616-root.json
  • 11:14 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
  • 11:12 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2025.codfw.wmnet|wikikube-worker2026.codfw.wmnet|wikikube-worker2027.codfw.wmnet|wikikube-worker2028.codfw.wmnet|wikikube-worker2029.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 11:12 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
  • 11:12 claime: pooling and uncordoning wikikube-worker2025.codfw.wmnet|wikikube-worker2026.codfw.wmnet|wikikube-worker2027.codfw.wmnet|wikikube-worker2028.codfw.wmnet|wikikube-worker2029.codfw.wmnet - T351074
  • 11:11 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubemaster[2001-2002].codfw.wmnet
  • 11:11 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:11 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[2001-2002].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
  • 11:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369021
  • 11:07 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2214 with weight 0 T369021', diff saved to https://phabricator.wikimedia.org/P65649 and previous config saved to /var/cache/conftool/dbconfig/20240702-110750-root.json
  • 11:07 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[2001-2002].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
  • 11:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369021
  • 11:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T364069)', diff saved to https://phabricator.wikimedia.org/P65648 and previous config saved to /var/cache/conftool/dbconfig/20240702-110442-marostegui.json
  • 11:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65647 and previous config saved to /var/cache/conftool/dbconfig/20240702-110111-root.json
  • 10:56 jiji@cumin1002: START - Cookbook sre.dns.netbox
  • 10:50 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubemaster[2001-2002].codfw.wmnet
  • 10:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65646 and previous config saved to /var/cache/conftool/dbconfig/20240702-104605-root.json
  • 10:42 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:42 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:42 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:41 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:35 brouberol@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 10:34 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-master1003.eqiad.wmnet
  • 10:32 brouberol@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker
  • 10:28 fabfur: upgrading A:cp-eqiad to haproxy 2.8.10 (T367756)
  • 10:27 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
  • 10:27 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
  • 10:25 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-master1003.eqiad.wmnet
  • 10:06 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 100% weight T363812', diff saved to https://phabricator.wikimedia.org/P65645 and previous config saved to /var/cache/conftool/dbconfig/20240702-100636-jynus.json
  • 10:02 claime: homer 'cr*codfw*' commit 'T351074'
  • 09:53 jiji@cumin1002: conftool action : set/pooled=no; selector: name=kubemaster200[1-2].codfw.wmnet
  • 09:52 elukey: updated the volatile dir on puppetserver1001 with the new point release (12.6) for Bookworm
  • 09:48 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kubemaster[2001-2002].codfw.wmnet with reason: decom
  • 09:47 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on kubemaster[2001-2002].codfw.wmnet with reason: decom
  • 09:20 brouberol@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
  • 09:15 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 50% weight T363812', diff saved to https://phabricator.wikimedia.org/P65644 and previous config saved to /var/cache/conftool/dbconfig/20240702-091508-jynus.json
  • 08:57 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 10% weight T363812', diff saved to https://phabricator.wikimedia.org/P65643 and previous config saved to /var/cache/conftool/dbconfig/20240702-085733-jynus.json
  • 08:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T367856)', diff saved to https://phabricator.wikimedia.org/P65642 and previous config saved to /var/cache/conftool/dbconfig/20240702-084447-marostegui.json
  • 08:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 08:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 08:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T367856)', diff saved to https://phabricator.wikimedia.org/P65641 and previous config saved to /var/cache/conftool/dbconfig/20240702-084425-marostegui.json
  • 08:40 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp6009.*} and A:cp
  • 08:38 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp6009.*} and A:cp
  • 08:36 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru
  • 08:34 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.12 refs T366957
  • 08:34 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru
  • 08:30 jayme@cumin1002: conftool action : set/pooled=inactive; selector: name=kubernetes1051.eqiad.wmnet
  • 08:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P65640 and previous config saved to /var/cache/conftool/dbconfig/20240702-082918-marostegui.json
  • 08:22 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2031.*} and A:cp
  • 08:20 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2031.*} and A:cp
  • 08:17 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2030.*} and A:cp
  • 08:16 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:15 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2030.*} and A:cp
  • 08:15 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:14 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2028.*} and A:cp
  • 08:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P65639 and previous config saved to /var/cache/conftool/dbconfig/20240702-081411-marostegui.json
  • 08:13 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2028.*} and A:cp
  • 08:12 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2027.*} and A:cp
  • 08:11 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2027.*} and A:cp
  • 08:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T364069)', diff saved to https://phabricator.wikimedia.org/P65638 and previous config saved to /var/cache/conftool/dbconfig/20240702-081025-marostegui.json
  • 08:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 08:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 08:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 08:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T364069)', diff saved to https://phabricator.wikimedia.org/P65637 and previous config saved to /var/cache/conftool/dbconfig/20240702-080948-marostegui.json
  • 08:07 jayme: draining kubernetes1051.eqiad.wmnet
  • 08:07 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru
  • 08:06 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru
  • 08:01 jayme: cordon kubernetes1051.eqiad.wmnet because of several failed image pulls
  • 07:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T367856)', diff saved to https://phabricator.wikimedia.org/P65635 and previous config saved to /var/cache/conftool/dbconfig/20240702-075904-marostegui.json
  • 07:58 kharlan@deploy1002: Finished scap: Backport for Revert "QuickSurveys: Add testing survey configuration" (T368459) (duration: 41m 45s)
  • 07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P65634 and previous config saved to /var/cache/conftool/dbconfig/20240702-075440-marostegui.json
  • 07:52 kharlan@deploy1002: kharlan: Continuing with sync
  • 07:51 kharlan@deploy1002: kharlan: Backport for Revert "QuickSurveys: Add testing survey configuration" (T368459) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P65633 and previous config saved to /var/cache/conftool/dbconfig/20240702-073933-marostegui.json
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T364069)', diff saved to https://phabricator.wikimedia.org/P65632 and previous config saved to /var/cache/conftool/dbconfig/20240702-072426-marostegui.json
  • 07:16 kharlan@deploy1002: Started scap sync-world: Backport for Revert "QuickSurveys: Add testing survey configuration" (T368459)
  • 07:06 kharlan@deploy1002: Started scap sync-world: Backport for Revert "QuickSurveys: Add testing survey configuration" (T368459)
  • 07:01 oblivian@deploy1002: Finished scap: Rebuilding images for change to the base image for httpd (duration: 26m 52s)
  • 06:59 XioNoX: update netboot bookworm image to pick up new point release
  • 06:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65631 and previous config saved to /var/cache/conftool/dbconfig/20240702-065831-root.json
  • 06:35 oblivian@deploy1002: Started scap sync-world: Rebuilding images for change to the base image for httpd
  • 06:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65629 and previous config saved to /var/cache/conftool/dbconfig/20240702-062820-root.json
  • 06:21 _joe_: rebuilding httpd-fcgi, mediawiki-httpd images T363342 T368640
  • 06:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65628 and previous config saved to /var/cache/conftool/dbconfig/20240702-061315-root.json
  • 05:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65627 and previous config saved to /var/cache/conftool/dbconfig/20240702-055809-root.json
  • 05:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65626 and previous config saved to /var/cache/conftool/dbconfig/20240702-054304-root.json
  • 05:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65625 and previous config saved to /var/cache/conftool/dbconfig/20240702-052759-root.json
  • 05:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1192 T368371', diff saved to https://phabricator.wikimedia.org/P65624 and previous config saved to /var/cache/conftool/dbconfig/20240702-052543-root.json
  • 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1209 to s8 primary and set section read-write T368371', diff saved to https://phabricator.wikimedia.org/P65623 and previous config saved to /var/cache/conftool/dbconfig/20240702-052447-marostegui.json
  • 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Set s8 eqiad as read-only for maintenance - T368371', diff saved to https://phabricator.wikimedia.org/P65622 and previous config saved to /var/cache/conftool/dbconfig/20240702-052408-marostegui.json
  • 05:23 marostegui: Starting s8 eqiad failover from db1192 to db1209 - T368371
  • 04:59 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1209 remove from API T368371', diff saved to https://phabricator.wikimedia.org/P65621 and previous config saved to /var/cache/conftool/dbconfig/20240702-045929-marostegui.json
  • 04:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s8 T368371
  • 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1209 with weight 0 T368371', diff saved to https://phabricator.wikimedia.org/P65620 and previous config saved to /var/cache/conftool/dbconfig/20240702-045856-marostegui.json
  • 04:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s8 T368371
  • 04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T364069)', diff saved to https://phabricator.wikimedia.org/P65619 and previous config saved to /var/cache/conftool/dbconfig/20240702-043349-marostegui.json
  • 04:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 04:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T364069)', diff saved to https://phabricator.wikimedia.org/P65618 and previous config saved to /var/cache/conftool/dbconfig/20240702-043326-marostegui.json
  • 04:22 eileen: civicrm upgraded from f6af6380 to 41c1bd78
  • 04:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P65617 and previous config saved to /var/cache/conftool/dbconfig/20240702-041819-marostegui.json
  • 04:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T367856)', diff saved to https://phabricator.wikimedia.org/P65616 and previous config saved to /var/cache/conftool/dbconfig/20240702-040705-marostegui.json
  • 04:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 04:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 04:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T367856)', diff saved to https://phabricator.wikimedia.org/P65615 and previous config saved to /var/cache/conftool/dbconfig/20240702-040643-marostegui.json
  • 04:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P65614 and previous config saved to /var/cache/conftool/dbconfig/20240702-040312-marostegui.json
  • 04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.9 (duration: 01m 02s)
  • 03:54 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.12 refs T366957 (duration: 51m 33s)
  • 03:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P65613 and previous config saved to /var/cache/conftool/dbconfig/20240702-035135-marostegui.json
  • 03:51 eileen: civicrm upgraded from 52dc4f1d to f6af6380
  • 03:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T364069)', diff saved to https://phabricator.wikimedia.org/P65612 and previous config saved to /var/cache/conftool/dbconfig/20240702-034805-marostegui.json
  • 03:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P65611 and previous config saved to /var/cache/conftool/dbconfig/20240702-033628-marostegui.json
  • 03:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T367856)', diff saved to https://phabricator.wikimedia.org/P65610 and previous config saved to /var/cache/conftool/dbconfig/20240702-032121-marostegui.json
  • 03:03 mwpresync@deploy1002: Started scap sync-world: testwikis wikis to 1.43.0-wmf.12 refs T366957
  • 00:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T364069)', diff saved to https://phabricator.wikimedia.org/P65609 and previous config saved to /var/cache/conftool/dbconfig/20240702-004524-marostegui.json
  • 00:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 00:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 00:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T364069)', diff saved to https://phabricator.wikimedia.org/P65608 and previous config saved to /var/cache/conftool/dbconfig/20240702-004502-marostegui.json
  • 00:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P65607 and previous config saved to /var/cache/conftool/dbconfig/20240702-002955-marostegui.json
  • 00:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1038.eqiad.wmnet with OS bullseye
  • 00:16 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:15 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P65606 and previous config saved to /var/cache/conftool/dbconfig/20240702-001448-marostegui.json
  • 00:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 00:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:13 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"

2024-07-01

  • 23:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T364069)', diff saved to https://phabricator.wikimedia.org/P65605 and previous config saved to /var/cache/conftool/dbconfig/20240701-235941-marostegui.json
  • 23:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage
  • 23:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1036.eqiad.wmnet with OS bullseye
  • 23:55 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage
  • 23:54 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:51 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage
  • 23:51 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage
  • 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1035.eqiad.wmnet with OS bullseye
  • 23:40 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:39 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
  • 23:34 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
  • 23:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 23:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1038.eqiad.wmnet with OS bullseye
  • 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
  • 23:19 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
  • 23:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1036.eqiad.wmnet with OS bullseye
  • 23:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bullseye
  • 22:54 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1038
  • 22:54 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1038
  • 22:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1041.eqiad.wmnet with OS bullseye
  • 22:47 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 22:10 sbassett@deploy1002: Synchronized private/PrivateSettings.php: Un-deployed a PS.php mitigation for T341908 (duration: 07m 24s)
  • 21:59 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1089*,elastic1090*,elastic1104* for T348977 - bking@cumin2002
  • 21:59 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1089*,elastic1090*,elastic1104* for T348977 - bking@cumin2002
  • 21:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1089-1090,1104].eqiad.wmnet with reason: T348977
  • 21:58 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1089-1090,1104].eqiad.wmnet with reason: T348977
  • 21:55 maryum: deployed patch for T366991
  • 21:39 eileen: civicrm upgraded from f8b1f5c4 to 52dc4f1d
  • 21:39 eileen: tools upgraded from c51f6e62 to 95f10b20
  • 21:32 zabe: zabe@mwmaint1002:/tmp/upload$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --sleep=3600 --user=Yann . # T368703
  • 21:24 cjming: end of UTC late backport window
  • 21:23 cjming@deploy1002: Finished scap: Backport for extension-list: Add Metrics Platform (T366234) (duration: 28m 16s)
  • 21:16 cjming@deploy1002: cjming: Continuing with sync
  • 21:16 cjming@deploy1002: cjming: Backport for extension-list: Add Metrics Platform (T366234) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T364069)', diff saved to https://phabricator.wikimedia.org/P65604 and previous config saved to /var/cache/conftool/dbconfig/20240701-210534-marostegui.json
  • 21:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 21:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 21:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T364069)', diff saved to https://phabricator.wikimedia.org/P65603 and previous config saved to /var/cache/conftool/dbconfig/20240701-210512-marostegui.json
  • 21:04 ejegg: fundraising civicrm upgraded from f9782670 to f8b1f5c4
  • 20:55 cjming@deploy1002: Started scap sync-world: Backport for extension-list: Add Metrics Platform (T366234)
  • 20:53 cjming@deploy1002: Finished scap: Backport for Missing.php: don't redirect to unprefixed nan incubator (T86915) (duration: 09m 03s)
  • 20:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P65602 and previous config saved to /var/cache/conftool/dbconfig/20240701-205003-marostegui.json
  • 20:47 cjming@deploy1002: cjming, pppery: Continuing with sync
  • 20:47 cjming@deploy1002: cjming, pppery: Backport for Missing.php: don't redirect to unprefixed nan incubator (T86915) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:44 cjming@deploy1002: Started scap sync-world: Backport for Missing.php: don't redirect to unprefixed nan incubator (T86915)
  • 20:42 cjming@deploy1002: Finished scap: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151), Change color of notification icon in dark-mode (T368120), Do not invert images that have been tagged with no invert classes (T368483) (duration: 10m 39s)
  • 20:36 cjming@deploy1002: cjming, jdlrobson: Continuing with sync
  • 20:35 ejegg: standalone SmashPig upgraded from c8993ec6 to 565c61e4
  • 20:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P65601 and previous config saved to /var/cache/conftool/dbconfig/20240701-203456-marostegui.json
  • 20:34 cjming@deploy1002: cjming, jdlrobson: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151), Change color of notification icon in dark-mode (T368120), Do not invert images that have been tagged with no invert classes (T368483) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:31 cjming@deploy1002: Started scap sync-world: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151), Change color of notification icon in dark-mode (T368120), Do not invert images that have been tagged with no invert classes (T368483)
  • 20:30 cjming@deploy1002: Sync cancelled.
  • 20:28 cjming@deploy1002: jdlrobson, cjming: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:26 cjming@deploy1002: Started scap sync-world: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151)
  • 20:23 cjming@deploy1002: Sync cancelled.
  • 20:23 cjming@deploy1002: jdlrobson, cjming: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T364069)', diff saved to https://phabricator.wikimedia.org/P65600 and previous config saved to /var/cache/conftool/dbconfig/20240701-201949-marostegui.json
  • 20:03 cjming@deploy1002: Started scap sync-world: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151)
  • 19:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:19 dancy@deploy1002: Installation of scap version "4.91.0" completed for 233 hosts
  • 19:19 dancy@deploy1002: Installing scap version "4.91.0" for 233 hosts
  • 19:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1041.eqiad.wmnet with reason: host reimage
  • 19:15 dancy@deploy1002: Installing scap version "4.91.0" for 234 hosts
  • 19:14 dancy@deploy1002: Installing scap version "4.91.0" for 234 hosts
  • 19:13 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1041.eqiad.wmnet with reason: host reimage
  • 18:57 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1041.eqiad.wmnet with OS bullseye
  • 18:56 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
  • 18:56 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
  • 17:49 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
  • 17:49 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
  • 17:49 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:49 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
  • 17:48 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
  • 17:45 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 17:44 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
  • 17:44 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
  • 17:42 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
  • 17:42 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
  • 17:41 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
  • 17:41 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
  • 17:37 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1041.eqiad.wmnet with OS bullseye
  • 17:36 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-codfw,ssw1-a[1,8]-codfw.mgmt with reason: reboot ssw1-d8-codfw
  • 17:35 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on cr2-codfw,ssw1-a[1,8]-codfw.mgmt with reason: reboot ssw1-d8-codfw
  • 17:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1039.eqiad.wmnet with OS bullseye
  • 17:27 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 17:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T364069)', diff saved to https://phabricator.wikimedia.org/P65599 and previous config saved to /var/cache/conftool/dbconfig/20240701-171609-marostegui.json
  • 17:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 17:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 17:08 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:08 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 17:05 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 17:04 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 16:51 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 16:51 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 16:51 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 16:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1039.eqiad.wmnet with reason: host reimage
  • 16:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1039.eqiad.wmnet with reason: host reimage
  • 16:34 dancy@deploy1002: Installing scap version "4.90.0" for 234 hosts
  • 16:34 dancy@deploy1002: Installing scap version "4.90.0" for 234 hosts
  • 16:33 dancy@deploy1002: Installing scap version "4.90.0" for 234 hosts
  • 16:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T367856)', diff saved to https://phabricator.wikimedia.org/P65598 and previous config saved to /var/cache/conftool/dbconfig/20240701-163010-marostegui.json
  • 16:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 16:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 16:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T367856)', diff saved to https://phabricator.wikimedia.org/P65597 and previous config saved to /var/cache/conftool/dbconfig/20240701-162948-marostegui.json
  • 16:22 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bullseye
  • 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1039
  • 16:20 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1039
  • 16:18 urandom: restarting Cassandra (restbase2023-{a,b,c}), troubleshooting storage utilization
  • 16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1041.eqiad.wmnet with OS bullseye
  • 16:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P65596 and previous config saved to /var/cache/conftool/dbconfig/20240701-161441-marostegui.json
  • 16:11 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
  • 16:11 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
  • 15:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P65595 and previous config saved to /var/cache/conftool/dbconfig/20240701-155934-marostegui.json
  • 15:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T367856)', diff saved to https://phabricator.wikimedia.org/P65594 and previous config saved to /var/cache/conftool/dbconfig/20240701-154427-marostegui.json
  • 15:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65593 and previous config saved to /var/cache/conftool/dbconfig/20240701-153758-root.json
  • 15:37 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:32 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:25 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw
  • 15:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65592 and previous config saved to /var/cache/conftool/dbconfig/20240701-152253-root.json
  • 15:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:22 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw
  • 15:21 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:21 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:16 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1040.eqiad.wmnet with OS bullseye
  • 15:15 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:14 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 15:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65591 and previous config saved to /var/cache/conftool/dbconfig/20240701-150747-root.json
  • 15:07 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:07 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:05 akosiaris: reboot deploy1003 T364416
  • 15:04 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:03 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:57 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:56 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:56 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:55 claime: deploying statsd-exporter for mw-web - T365265
  • 14:54 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 14:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65590 and previous config saved to /var/cache/conftool/dbconfig/20240701-145242-root.json
  • 14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
  • 14:49 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:48 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:48 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
  • 14:44 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 14:44 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:43 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:40 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:40 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65589 and previous config saved to /var/cache/conftool/dbconfig/20240701-143736-root.json
  • 14:36 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
  • 14:36 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
  • 14:35 fabfur: upgrading A:cp-codfw to haproxy 2.8.10 (T367756)
  • 14:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1040.eqiad.wmnet with reason: host reimage
  • 14:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 14:27 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1040.eqiad.wmnet with reason: host reimage
  • 14:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65587 and previous config saved to /var/cache/conftool/dbconfig/20240701-142231-root.json
  • 14:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 14:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 14:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T364069)', diff saved to https://phabricator.wikimedia.org/P65586 and previous config saved to /var/cache/conftool/dbconfig/20240701-141640-marostegui.json
  • 14:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
  • 14:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65585 and previous config saved to /var/cache/conftool/dbconfig/20240701-140725-root.json
  • 14:03 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1040.eqiad.wmnet with OS bullseye
  • 14:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P65584 and previous config saved to /var/cache/conftool/dbconfig/20240701-140133-marostegui.json
  • 13:57 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:56 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:48 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:48 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P65583 and previous config saved to /var/cache/conftool/dbconfig/20240701-134626-marostegui.json
  • 13:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
  • 13:41 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1040
  • 13:41 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1040
  • 13:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 13:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T364069)', diff saved to https://phabricator.wikimedia.org/P65581 and previous config saved to /var/cache/conftool/dbconfig/20240701-133118-marostegui.json
  • 13:30 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 13:30 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 13:30 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: sync
  • 13:29 elukey@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: sync
  • 13:29 urbanecm: mwmaint1002: [urbanecm@mwmaint1002 ~]$ foreachwiki DiscussionTools:FixTrailingWhitespaceIds (T356196)
  • 13:27 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:27 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 13:26 elukey@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:26 elukey@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 13:26 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:26 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:25 urbanecm@deploy1002: Finished scap: Backport for FixTrailingWhitespaceIds: Don't crash on complex conflicts (T356196) (duration: 08m 46s)
  • 13:21 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:21 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:19 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru
  • 13:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
  • 13:17 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru
  • 13:16 urbanecm@deploy1002: Started scap: Backport for FixTrailingWhitespaceIds: Don't crash on complex conflicts (T356196)
  • 13:16 urbanecm@deploy1002: Finished scap: Backport for Update interwiki map (T368862) (duration: 09m 01s)
  • 13:14 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
  • 13:10 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 13:10 urbanecm@deploy1002: urbanecm: Backport for Update interwiki map (T368862) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:07 urbanecm@deploy1002: Started scap: Backport for Update interwiki map (T368862)
  • 12:56 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:56 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:56 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:55 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:55 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 12:54 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 12:51 claime: Running update-netboot-image bullseye for 11.10 release on puppetserver1001
  • 12:49 fabfur: upgrading A:cp-magru to haproxy 2.8.10 (T367756)
  • 12:49 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru
  • 12:49 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru
  • 12:39 claime: Running update-netboot-image bullseye for 11.10 release
  • 12:35 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:35 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 12:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
  • 12:35 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 12:35 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 12:35 jayme@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:35 jayme@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:34 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:33 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:33 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:33 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:32 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:32 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
  • 12:32 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:32 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:32 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:31 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:31 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 12:30 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:29 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 12:28 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:27 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 12:23 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 12:22 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:21 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:21 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:20 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:19 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:18 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 12:17 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:16 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:14 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 12:12 daniel@deploy1002: Finished scap: Backport for REST: detect mismatching value types in json request (T305973) (duration: 32m 48s)
  • 12:09 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:08 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 12:06 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 12:04 daniel@deploy1002: daniel: Continuing with sync
  • 12:03 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 12:01 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 12:01 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 12:00 daniel@deploy1002: daniel: Backport for REST: detect mismatching value types in json request (T305973) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:58 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:51 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 11:49 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 11:46 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 11:45 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 11:45 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 11:43 slyngshede@cumin1002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging FebinBellamy out of all services on: 2188 hosts
  • 11:43 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 11:43 slyngshede@cumin1002: START - Cookbook sre.idm.logout Logging FebinBellamy out of all services on: 2188 hosts
  • 11:41 slyngshede@cumin1002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 2188 hosts
  • 11:41 slyngshede@cumin1002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 2188 hosts
  • 11:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
  • 11:39 daniel@deploy1002: Started scap: Backport for REST: detect mismatching value types in json request (T305973)
  • 11:37 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 11:37 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
  • 11:33 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs
  • 11:30 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs
  • 11:29 btullis@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
  • 11:27 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
  • 11:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 10:57 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:49 claime: running /usr/local/bin/apply-config-kartotherian on maps-master
  • 10:47 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 10:47 claime: running /usr/local/bin/apply-config-kartotherian on maps-replica
  • 10:46 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:46 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:43 claime: running puppet on maps servers
  • 10:39 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs
  • 10:39 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs
  • 10:38 elukey@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:37 elukey@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 10:37 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:37 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 10:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T364069)', diff saved to https://phabricator.wikimedia.org/P65580 and previous config saved to /var/cache/conftool/dbconfig/20240701-102633-marostegui.json
  • 10:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 10:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 10:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T364069)', diff saved to https://phabricator.wikimedia.org/P65579 and previous config saved to /var/cache/conftool/dbconfig/20240701-102611-marostegui.json
  • 10:23 fabfur: upgrading A:cp-drmrs to haproxy 2.8.10 (T367756)
  • 10:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P65578 and previous config saved to /var/cache/conftool/dbconfig/20240701-101104-marostegui.json
  • 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P65577 and previous config saved to /var/cache/conftool/dbconfig/20240701-095557-marostegui.json
  • 09:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65576 and previous config saved to /var/cache/conftool/dbconfig/20240701-094547-root.json
  • 09:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65575 and previous config saved to /var/cache/conftool/dbconfig/20240701-094341-root.json
  • 09:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T364069)', diff saved to https://phabricator.wikimedia.org/P65574 and previous config saved to /var/cache/conftool/dbconfig/20240701-094050-marostegui.json
  • 09:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65573 and previous config saved to /var/cache/conftool/dbconfig/20240701-093042-root.json
  • 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65572 and previous config saved to /var/cache/conftool/dbconfig/20240701-092835-root.json
  • 09:26 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:15 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65570 and previous config saved to /var/cache/conftool/dbconfig/20240701-091536-root.json
  • 09:14 urbanecm@deploy1002: Finished scap: Backport for JsonSchemaValidator: Measure duration (T365245) (duration: 22m 15s)
  • 09:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65569 and previous config saved to /var/cache/conftool/dbconfig/20240701-091329-root.json
  • 09:06 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 09:06 urbanecm@deploy1002: urbanecm: Backport for JsonSchemaValidator: Measure duration (T365245) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65568 and previous config saved to /var/cache/conftool/dbconfig/20240701-090031-root.json
  • 08:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65567 and previous config saved to /var/cache/conftool/dbconfig/20240701-085824-root.json
  • 08:51 urbanecm@deploy1002: Started scap: Backport for JsonSchemaValidator: Measure duration (T365245)
  • 08:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65566 and previous config saved to /var/cache/conftool/dbconfig/20240701-084525-root.json
  • 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65565 and previous config saved to /var/cache/conftool/dbconfig/20240701-084318-root.json
  • 08:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65564 and previous config saved to /var/cache/conftool/dbconfig/20240701-083020-root.json
  • 08:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65563 and previous config saved to /var/cache/conftool/dbconfig/20240701-082813-root.json
  • 08:18 jynus@cumin1002: dbctl commit (dc=all): 'Depool es1025 for backups T363812', diff saved to https://phabricator.wikimedia.org/P65562 and previous config saved to /var/cache/conftool/dbconfig/20240701-081811-jynus.json
  • 08:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65561 and previous config saved to /var/cache/conftool/dbconfig/20240701-081514-root.json
  • 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65560 and previous config saved to /var/cache/conftool/dbconfig/20240701-081307-root.json
  • 08:07 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1169.eqiad.wmnet onto db1195.eqiad.wmnet
  • 07:44 elukey: `apt-get clean` on build2001 to free some space in the root partition
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Place db1195 in s1 T368871', diff saved to https://phabricator.wikimedia.org/P65559 and previous config saved to /var/cache/conftool/dbconfig/20240701-070243-marostegui.json
  • 06:36 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1169.eqiad.wmnet onto db1195.eqiad.wmnet
  • 06:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1169 T368871', diff saved to https://phabricator.wikimedia.org/P65558 and previous config saved to /var/cache/conftool/dbconfig/20240701-063601-root.json
  • 06:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T364069)', diff saved to https://phabricator.wikimedia.org/P65557 and previous config saved to /var/cache/conftool/dbconfig/20240701-063344-marostegui.json
  • 06:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 06:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 05:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1195.eqiad.wmnet with reason: Reboot
  • 05:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1195.eqiad.wmnet with reason: Reboot
  • 04:56 marostegui: Failover m2 from db1195 to db1228 - T368494
  • 04:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2133,2160].codfw.wmnet,db[1195,1217,1228].eqiad.wmnet with reason: m2 switchover T368494
  • 04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2133,2160].codfw.wmnet,db[1195,1217,1228].eqiad.wmnet with reason: m2 switchover T368494
  • 04:50 marostegui: dbmaint eqiad Rebuild pagelinks table on s8 master T364069
  • 04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T367856)', diff saved to https://phabricator.wikimedia.org/P65556 and previous config saved to /var/cache/conftool/dbconfig/20240701-044945-marostegui.json
  • 04:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 04:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 04:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 04:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance