Server Admin Log/Archive 81

From Wikitech

2024-06-15

  • 23:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65031 and previous config saved to /var/cache/conftool/dbconfig/20240615-234836-ladsgroup.json
  • 23:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65030 and previous config saved to /var/cache/conftool/dbconfig/20240615-233329-ladsgroup.json
  • 23:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T352010)', diff saved to https://phabricator.wikimedia.org/P65029 and previous config saved to /var/cache/conftool/dbconfig/20240615-231822-ladsgroup.json
  • 21:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1227 (T364069)', diff saved to https://phabricator.wikimedia.org/P65028 and previous config saved to /var/cache/conftool/dbconfig/20240615-211811-marostegui.json
  • 21:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 21:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1227.eqiad.wmnet with reason: Maintenance
  • 21:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T364069)', diff saved to https://phabricator.wikimedia.org/P65027 and previous config saved to /var/cache/conftool/dbconfig/20240615-211750-marostegui.json
  • 21:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P65026 and previous config saved to /var/cache/conftool/dbconfig/20240615-210243-marostegui.json
  • 20:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202', diff saved to https://phabricator.wikimedia.org/P65025 and previous config saved to /var/cache/conftool/dbconfig/20240615-204735-marostegui.json
  • 20:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1202 (T364069)', diff saved to https://phabricator.wikimedia.org/P65024 and previous config saved to /var/cache/conftool/dbconfig/20240615-203229-marostegui.json
  • 16:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P65021 and previous config saved to /var/cache/conftool/dbconfig/20240615-163203-marostegui.json
  • 16:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P65020 and previous config saved to /var/cache/conftool/dbconfig/20240615-161656-marostegui.json
  • 16:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65019 and previous config saved to /var/cache/conftool/dbconfig/20240615-160149-marostegui.json
  • 11:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65018 and previous config saved to /var/cache/conftool/dbconfig/20240615-115812-marostegui.json
  • 11:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 11:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 11:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65017 and previous config saved to /var/cache/conftool/dbconfig/20240615-115750-marostegui.json
  • 11:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P65016 and previous config saved to /var/cache/conftool/dbconfig/20240615-114243-marostegui.json
  • 11:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P65015 and previous config saved to /var/cache/conftool/dbconfig/20240615-112736-marostegui.json
  • 11:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65014 and previous config saved to /var/cache/conftool/dbconfig/20240615-111229-marostegui.json
  • 09:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T352010)', diff saved to https://phabricator.wikimedia.org/P65013 and previous config saved to /var/cache/conftool/dbconfig/20240615-092730-ladsgroup.json
  • 09:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 09:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 07:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65012 and previous config saved to /var/cache/conftool/dbconfig/20240615-071215-marostegui.json
  • 07:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 07:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 07:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65011 and previous config saved to /var/cache/conftool/dbconfig/20240615-071152-marostegui.json
  • 06:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P65010 and previous config saved to /var/cache/conftool/dbconfig/20240615-065645-marostegui.json
  • 06:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P65009 and previous config saved to /var/cache/conftool/dbconfig/20240615-064138-marostegui.json
  • 06:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65008 and previous config saved to /var/cache/conftool/dbconfig/20240615-062631-marostegui.json
  • 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T367261)', diff saved to https://phabricator.wikimedia.org/P65007 and previous config saved to /var/cache/conftool/dbconfig/20240615-061919-marostegui.json
  • 06:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 06:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T367261)', diff saved to https://phabricator.wikimedia.org/P65006 and previous config saved to /var/cache/conftool/dbconfig/20240615-061908-marostegui.json
  • 06:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P65005 and previous config saved to /var/cache/conftool/dbconfig/20240615-060401-marostegui.json
  • 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P65004 and previous config saved to /var/cache/conftool/dbconfig/20240615-054854-marostegui.json
  • 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T367261)', diff saved to https://phabricator.wikimedia.org/P65003 and previous config saved to /var/cache/conftool/dbconfig/20240615-053346-marostegui.json
  • 05:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T367261)', diff saved to https://phabricator.wikimedia.org/P65002 and previous config saved to /var/cache/conftool/dbconfig/20240615-050236-marostegui.json
  • 05:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 05:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 02:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 02:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 02:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P65001 and previous config saved to /var/cache/conftool/dbconfig/20240615-024019-ladsgroup.json
  • 02:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65000 and previous config saved to /var/cache/conftool/dbconfig/20240615-023904-marostegui.json
  • 02:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 02:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 02:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P64999 and previous config saved to /var/cache/conftool/dbconfig/20240615-023842-marostegui.json
  • 02:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P64998 and previous config saved to /var/cache/conftool/dbconfig/20240615-022512-ladsgroup.json
  • 02:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P64997 and previous config saved to /var/cache/conftool/dbconfig/20240615-022335-marostegui.json
  • 02:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P64996 and previous config saved to /var/cache/conftool/dbconfig/20240615-021005-ladsgroup.json
  • 02:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P64995 and previous config saved to /var/cache/conftool/dbconfig/20240615-020827-marostegui.json
  • 01:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P64994 and previous config saved to /var/cache/conftool/dbconfig/20240615-015458-ladsgroup.json
  • 01:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P64993 and previous config saved to /var/cache/conftool/dbconfig/20240615-015320-marostegui.json

2024-06-14

  • 23:09 mnz@deploy1002: Finished deploy [airflow-dags/research@ee5a291]: (no justification provided) (duration: 00m 30s)
  • 23:09 mnz@deploy1002: Started deploy [airflow-dags/research@ee5a291]: (no justification provided)
  • 22:55 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4041.ulsfo.wmnet
  • 22:50 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4041.ulsfo.wmnet with OS bullseye
  • 22:33 mnz@deploy1002: Finished deploy [airflow-dags/research@5e1cd80]: (no justification provided) (duration: 00m 31s)
  • 22:33 mnz@deploy1002: Started deploy [airflow-dags/research@5e1cd80]: (no justification provided)
  • 22:27 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
  • 22:24 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4041.ulsfo.wmnet with reason: host reimage
  • 22:03 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS bullseye
  • 22:02 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4041.ulsfo.wmnet with OS bullseye
  • 21:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P64992 and previous config saved to /var/cache/conftool/dbconfig/20240614-214910-marostegui.json
  • 21:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 21:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 21:46 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4041.ulsfo.wmnet with OS bullseye
  • 21:33 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4040.ulsfo.wmnet
  • 21:33 Emperor: restart swift-proxy on ms-fe1010 T360913
  • 21:31 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4041.ulsfo.wmnet
  • 21:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64991 and previous config saved to /var/cache/conftool/dbconfig/20240614-211239-ladsgroup.json
  • 20:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P64990 and previous config saved to /var/cache/conftool/dbconfig/20240614-205731-ladsgroup.json
  • 20:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P64989 and previous config saved to /var/cache/conftool/dbconfig/20240614-204224-ladsgroup.json
  • 20:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64988 and previous config saved to /var/cache/conftool/dbconfig/20240614-202717-ladsgroup.json
  • 20:22 cdobbins@cumin1002: conftool action : set/pooled=yes; selector: name=4040.ulsfo.wmnet
  • 20:14 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4040.ulsfo.wmnet with OS bullseye
  • 19:52 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
  • 19:49 cdobbins@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4040.ulsfo.wmnet with reason: host reimage
  • 19:27 cdobbins@cumin1002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS bullseye
  • 19:27 cdobbins@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4040.ulsfo.wmnet with OS bullseye
  • 19:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P64987 and previous config saved to /var/cache/conftool/dbconfig/20240614-192643-ladsgroup.json
  • 19:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 19:26 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 19:00 cdobbins@cumin1002: START - Cookbook sre.hosts.reimage for host cp4040.ulsfo.wmnet with OS bullseye
  • 18:54 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=4040.ulsfo.wmnet
  • 17:34 pt1979@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:23 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 17:11 jdrewniak@deploy1002: Finished scap: Backport for For now scope hatnote and infobox styles (T367462) (duration: 16m 06s)
  • 17:01 jdrewniak@deploy1002: jdlrobson, jdrewniak: Continuing with sync
  • 16:31 jan_drewniak: starting friday backport for T367462 https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaMessages/+/1043827
  • 16:25 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
  • 16:22 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4039.ulsfo.wmnet with reason: host reimage
  • 16:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1002.eqiad.wmnet with OS bookworm
  • 16:01 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS bullseye
  • 16:00 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4039.ulsfo.wmnet with OS bullseye
  • 15:58 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1002.eqiad.wmnet with reason: host reimage
  • 15:55 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1002.eqiad.wmnet with reason: host reimage
  • 15:48 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4039.ulsfo.wmnet with OS bullseye
  • 15:44 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 15:37 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be1002.eqiad.wmnet with OS bookworm
  • 15:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 15:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T364069)', diff saved to https://phabricator.wikimedia.org/P64984 and previous config saved to /var/cache/conftool/dbconfig/20240614-153727-marostegui.json
  • 15:37 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be1002.eqiad.wmnet with OS bookworm
  • 15:32 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:32 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:31 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:31 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:27 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be1002.eqiad.wmnet with OS bookworm
  • 15:27 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:25 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4039.ulsfo.wmnet
  • 15:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P64982 and previous config saved to /var/cache/conftool/dbconfig/20240614-152220-marostegui.json
  • 15:21 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:21 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170', diff saved to https://phabricator.wikimedia.org/P64981 and previous config saved to /var/cache/conftool/dbconfig/20240614-150713-marostegui.json
  • 14:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be2003.codfw.wmnet with OS bookworm
  • 14:54 jynus: upgrade db1245 to mariadb 10.6 T360751
  • 14:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1170 (T364069)', diff saved to https://phabricator.wikimedia.org/P64980 and previous config saved to /var/cache/conftool/dbconfig/20240614-145206-marostegui.json
  • 14:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367261)', diff saved to https://phabricator.wikimedia.org/P64979 and previous config saved to /var/cache/conftool/dbconfig/20240614-144925-marostegui.json
  • 14:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P64978 and previous config saved to /var/cache/conftool/dbconfig/20240614-143418-marostegui.json
  • 14:34 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be2003.codfw.wmnet with reason: host reimage
  • 14:31 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2003.codfw.wmnet with reason: host reimage
  • 14:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P64976 and previous config saved to /var/cache/conftool/dbconfig/20240614-141911-marostegui.json
  • 14:16 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:15 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 14:12 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2003.codfw.wmnet with OS bookworm
  • 14:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be2002.codfw.wmnet with OS bookworm
  • 14:11 mvernon@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
  • 14:11 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1034.eqiad.wmnet with OS bookworm
  • 14:10 mvernon@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - mvernon@cumin2002"
  • 14:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "new ldap-maint hosts - jmm@cumin2002 - T367490"
  • 14:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367261)', diff saved to https://phabricator.wikimedia.org/P64975 and previous config saved to /var/cache/conftool/dbconfig/20240614-140404-marostegui.json
  • 14:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T367261)', diff saved to https://phabricator.wikimedia.org/P64974 and previous config saved to /var/cache/conftool/dbconfig/20240614-140125-marostegui.json
  • 14:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 14:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 13:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 13:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367261)', diff saved to https://phabricator.wikimedia.org/P64973 and previous config saved to /var/cache/conftool/dbconfig/20240614-135900-marostegui.json
  • 13:57 pt1979@cumin2002: START - Cookbook sre.hosts.provision for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 13:52 jynus: restart db2139, db2141
  • 13:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be2002.codfw.wmnet with reason: host reimage
  • 13:49 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "new ldap-maint hosts - jmm@cumin2002 - T367490"
  • 13:47 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2002.codfw.wmnet with reason: host reimage
  • 13:44 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
  • 13:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P64972 and previous config saved to /var/cache/conftool/dbconfig/20240614-134354-marostegui.json
  • 13:41 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1034.eqiad.wmnet with reason: host reimage
  • 13:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P64971 and previous config saved to /var/cache/conftool/dbconfig/20240614-132847-marostegui.json
  • 13:28 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2002.codfw.wmnet with OS bookworm
  • 13:24 jynus: restart db1216, db1225, db1240, db1245
  • 13:23 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1034.eqiad.wmnet with OS bookworm
  • 13:22 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1034.eqiad.wmnet with reason: reimage and move to OVS
  • 13:22 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1034.eqiad.wmnet with reason: reimage and move to OVS
  • 13:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be2001.codfw.wmnet with OS bookworm
  • 13:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367261)', diff saved to https://phabricator.wikimedia.org/P64970 and previous config saved to /var/cache/conftool/dbconfig/20240614-131339-marostegui.json
  • 13:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T367261)', diff saved to https://phabricator.wikimedia.org/P64969 and previous config saved to /var/cache/conftool/dbconfig/20240614-131113-marostegui.json
  • 13:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 13:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367261)', diff saved to https://phabricator.wikimedia.org/P64968 and previous config saved to /var/cache/conftool/dbconfig/20240614-131051-marostegui.json
  • 13:05 jynus: restart db1150, db1171
  • 12:58 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:58 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:58 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be2001.codfw.wmnet with reason: host reimage
  • 12:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P64967 and previous config saved to /var/cache/conftool/dbconfig/20240614-125543-marostegui.json
  • 12:54 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be2001.codfw.wmnet with reason: host reimage
  • 12:51 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2001.codfw.wmnet with OS bookworm
  • 12:45 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on gitlab2002.wikimedia.org with reason: GitLab upgrade
  • 12:45 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on gitlab2002.wikimedia.org with reason: GitLab upgrade
  • 12:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P64966 and previous config saved to /var/cache/conftool/dbconfig/20240614-124036-marostegui.json
  • 12:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367261)', diff saved to https://phabricator.wikimedia.org/P64964 and previous config saved to /var/cache/conftool/dbconfig/20240614-122530-marostegui.json
  • 12:23 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be2001.codfw.wmnet with OS bookworm
  • 12:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T367261)', diff saved to https://phabricator.wikimedia.org/P64963 and previous config saved to /var/cache/conftool/dbconfig/20240614-122255-marostegui.json
  • 12:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 12:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 12:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367261)', diff saved to https://phabricator.wikimedia.org/P64962 and previous config saved to /var/cache/conftool/dbconfig/20240614-122233-marostegui.json
  • 12:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64961 and previous config saved to /var/cache/conftool/dbconfig/20240614-122210-ladsgroup.json
  • 12:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 12:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 12:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P64960 and previous config saved to /var/cache/conftool/dbconfig/20240614-120918-ladsgroup.json
  • 12:09 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on clouddb1018.eqiad.wmnet with reason: hardware issues T367499
  • 12:08 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on clouddb1018.eqiad.wmnet with reason: hardware issues T367499
  • 12:08 fnegri@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host clouddb1018.eqiad.wmnet
  • 12:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P64959 and previous config saved to /var/cache/conftool/dbconfig/20240614-120727-marostegui.json
  • 12:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64958 and previous config saved to /var/cache/conftool/dbconfig/20240614-120704-ladsgroup.json
  • 12:01 jelto@cumin1002: END (FAIL) - Cookbook sre.gitlab.upgrade (exit_code=99) on GitLab host gitlab2002.wikimedia.org with reason: GitLab to new version
  • 11:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P64957 and previous config saved to /var/cache/conftool/dbconfig/20240614-115411-ladsgroup.json
  • 11:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P64956 and previous config saved to /var/cache/conftool/dbconfig/20240614-115220-marostegui.json
  • 11:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64955 and previous config saved to /var/cache/conftool/dbconfig/20240614-115159-ladsgroup.json
  • 11:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64954 and previous config saved to /var/cache/conftool/dbconfig/20240614-114002-ladsgroup.json
  • 11:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P64953 and previous config saved to /var/cache/conftool/dbconfig/20240614-113904-ladsgroup.json
  • 11:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367261)', diff saved to https://phabricator.wikimedia.org/P64952 and previous config saved to /var/cache/conftool/dbconfig/20240614-113712-marostegui.json
  • 11:37 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-maint1001.eqiad.wmnet
  • 11:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-maint1001.eqiad.wmnet with OS bookworm
  • 11:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P64951 and previous config saved to /var/cache/conftool/dbconfig/20240614-113654-ladsgroup.json
  • 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T367261)', diff saved to https://phabricator.wikimedia.org/P64950 and previous config saved to /var/cache/conftool/dbconfig/20240614-113325-marostegui.json
  • 11:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 11:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367261)', diff saved to https://phabricator.wikimedia.org/P64949 and previous config saved to /var/cache/conftool/dbconfig/20240614-113303-marostegui.json
  • 11:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 11:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 11:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P64948 and previous config saved to /var/cache/conftool/dbconfig/20240614-112357-ladsgroup.json
  • 11:21 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1018.eqiad.wmnet
  • 11:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-maint1001.eqiad.wmnet with reason: host reimage
  • 11:18 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on clouddb1018.eqiad.wmnet with reason: T366555
  • 11:18 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on clouddb1018.eqiad.wmnet with reason: T366555
  • 11:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P64947 and previous config saved to /var/cache/conftool/dbconfig/20240614-111756-marostegui.json
  • 11:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-maint1001.eqiad.wmnet with reason: host reimage
  • 11:06 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:06 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:02 jynus: restart backup* hosts
  • 11:02 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2001.codfw.wmnet with OS bookworm
  • 11:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P64946 and previous config saved to /var/cache/conftool/dbconfig/20240614-110249-marostegui.json
  • 11:00 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be2001.codfw.wmnet with OS bookworm
  • 10:59 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2001.codfw.wmnet with OS bookworm
  • 10:56 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: sync
  • 10:55 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on clouddb1018.eqiad.wmnet with reason: T366555
  • 10:55 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on clouddb1018.eqiad.wmnet with reason: T366555
  • 10:55 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: sync
  • 10:55 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: sync
  • 10:54 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: sync
  • 10:54 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-be2001.codfw.wmnet with OS bookworm
  • 10:54 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s7
  • 10:54 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s2
  • 10:54 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: sync
  • 10:53 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: sync
  • 10:53 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: sync
  • 10:52 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: sync
  • 10:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367261)', diff saved to https://phabricator.wikimedia.org/P64945 and previous config saved to /var/cache/conftool/dbconfig/20240614-104742-marostegui.json
  • 10:45 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2001.codfw.wmnet with OS bookworm
  • 10:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2002.codfw.wmnet with OS bookworm
  • 10:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T367261)', diff saved to https://phabricator.wikimedia.org/P64943 and previous config saved to /var/cache/conftool/dbconfig/20240614-104352-marostegui.json
  • 10:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 10:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367261)', diff saved to https://phabricator.wikimedia.org/P64942 and previous config saved to /var/cache/conftool/dbconfig/20240614-104330-marostegui.json
  • 10:39 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 10:37 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 10:33 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
  • 10:30 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
  • 10:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P64941 and previous config saved to /var/cache/conftool/dbconfig/20240614-102823-marostegui.json
  • 10:28 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
  • 10:25 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host moss-be2001.codfw.wmnet with OS bookworm
  • 10:17 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-maint1001.eqiad.wmnet with OS bookworm
  • 10:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P64940 and previous config saved to /var/cache/conftool/dbconfig/20240614-101316-marostegui.json
  • 10:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-maint1001.eqiad.wmnet - jmm@cumin2002"
  • 09:59 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-maint1001.eqiad.wmnet - jmm@cumin2002"
  • 09:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367261)', diff saved to https://phabricator.wikimedia.org/P64939 and previous config saved to /var/cache/conftool/dbconfig/20240614-095809-marostegui.json
  • 09:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T367261)', diff saved to https://phabricator.wikimedia.org/P64938 and previous config saved to /var/cache/conftool/dbconfig/20240614-095434-marostegui.json
  • 09:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 09:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 09:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 09:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 09:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T367261)', diff saved to https://phabricator.wikimedia.org/P64937 and previous config saved to /var/cache/conftool/dbconfig/20240614-095356-marostegui.json
  • 09:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-maint1001.eqiad.wmnet on all recursors
  • 09:45 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 09:45 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-maint1001.eqiad.wmnet on all recursors
  • 09:45 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:45 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-maint1001.eqiad.wmnet - jmm@cumin2002"
  • 09:44 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 09:44 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 09:43 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 09:43 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: GitLab to new version
  • 09:43 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-maint1001.eqiad.wmnet - jmm@cumin2002"
  • 09:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P64936 and previous config saved to /var/cache/conftool/dbconfig/20240614-093849-marostegui.json
  • 09:37 jynus: upgrade and restart dbprov[12]00[3456]
  • 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1170 (T364069)', diff saved to https://phabricator.wikimedia.org/P64935 and previous config saved to /var/cache/conftool/dbconfig/20240614-093657-marostegui.json
  • 09:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64934 and previous config saved to /var/cache/conftool/dbconfig/20240614-093634-marostegui.json
  • 09:31 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 09:31 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 09:31 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 09:31 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-maint1001.eqiad.wmnet
  • 09:30 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 09:30 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 09:29 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/zotero: apply
  • 09:29 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/zotero: apply
  • 09:25 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 09:25 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply
  • 09:23 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 09:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P64933 and previous config saved to /var/cache/conftool/dbconfig/20240614-092342-marostegui.json
  • 09:23 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 09:22 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 09:22 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 09:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P64932 and previous config saved to /var/cache/conftool/dbconfig/20240614-092127-marostegui.json
  • 09:14 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 09:13 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 09:10 ryankemper@cumin2002: END (ERROR) - Cookbook sre.hadoop.reboot-workers (exit_code=97) for Hadoop analytics cluster
  • 09:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T367261)', diff saved to https://phabricator.wikimedia.org/P64931 and previous config saved to /var/cache/conftool/dbconfig/20240614-090835-marostegui.json
  • 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ldap-maint2001.codfw.wmnet
  • 09:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ldap-maint2001.codfw.wmnet with OS bookworm
  • 09:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P64930 and previous config saved to /var/cache/conftool/dbconfig/20240614-090620-marostegui.json
  • 09:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2123 (T367261)', diff saved to https://phabricator.wikimedia.org/P64929 and previous config saved to /var/cache/conftool/dbconfig/20240614-090457-marostegui.json
  • 09:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 09:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 09:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 09:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 09:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 09:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 09:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 09:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 08:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 08:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 08:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367261)', diff saved to https://phabricator.wikimedia.org/P64928 and previous config saved to /var/cache/conftool/dbconfig/20240614-085817-marostegui.json
  • 08:55 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be2001.codfw.wmnet with OS bookworm
  • 08:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64927 and previous config saved to /var/cache/conftool/dbconfig/20240614-085113-marostegui.json
  • 08:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ldap-maint2001.codfw.wmnet with reason: host reimage
  • 08:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ldap-maint2001.codfw.wmnet with reason: host reimage
  • 08:44 marostegui: dbmaint eqiad s8 deploy schema change T367261
  • 08:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Long schema change
  • 08:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Long schema change
  • 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P64926 and previous config saved to /var/cache/conftool/dbconfig/20240614-084310-marostegui.json
  • 08:35 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2002.codfw.wmnet with OS bookworm
  • 08:30 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ldap-maint2001.codfw.wmnet with OS bookworm
  • 08:28 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-maint2001.codfw.wmnet - jmm@cumin2002"
  • 08:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P64925 and previous config saved to /var/cache/conftool/dbconfig/20240614-082803-marostegui.json
  • 08:27 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ldap-maint2001.codfw.wmnet - jmm@cumin2002"
  • 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ldap-maint2001.codfw.wmnet on all recursors
  • 08:27 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ldap-maint2001.codfw.wmnet on all recursors
  • 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:27 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-maint2001.codfw.wmnet - jmm@cumin2002"
  • 08:26 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ldap-maint2001.codfw.wmnet - jmm@cumin2002"
  • 08:24 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:24 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ldap-maint2001.codfw.wmnet
  • 08:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 08:21 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 08:17 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
  • 08:14 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:14 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
  • 08:14 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367261)', diff saved to https://phabricator.wikimedia.org/P64924 and previous config saved to /var/cache/conftool/dbconfig/20240614-081255-marostegui.json
  • 08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T367261)', diff saved to https://phabricator.wikimedia.org/P64923 and previous config saved to /var/cache/conftool/dbconfig/20240614-080938-marostegui.json
  • 08:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 08:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367261)', diff saved to https://phabricator.wikimedia.org/P64922 and previous config saved to /var/cache/conftool/dbconfig/20240614-080915-marostegui.json
  • 08:03 marostegui: dbmaint codfw s8 deploy schema change T367261
  • 07:56 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
  • 07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P64921 and previous config saved to /var/cache/conftool/dbconfig/20240614-075408-marostegui.json
  • 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P64920 and previous config saved to /var/cache/conftool/dbconfig/20240614-073902-marostegui.json
  • 07:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping1003.eqiad.wmnet
  • 07:31 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:31 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367261)', diff saved to https://phabricator.wikimedia.org/P64919 and previous config saved to /var/cache/conftool/dbconfig/20240614-072354-marostegui.json
  • 07:23 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T367261)', diff saved to https://phabricator.wikimedia.org/P64918 and previous config saved to /var/cache/conftool/dbconfig/20240614-072034-marostegui.json
  • 07:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 07:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 07:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367261)', diff saved to https://phabricator.wikimedia.org/P64917 and previous config saved to /var/cache/conftool/dbconfig/20240614-072012-marostegui.json
  • 07:19 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:17 marostegui: dbmaint eqiad s1 deploy schema change T367261
  • 07:14 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping1003.eqiad.wmnet
  • 07:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts ping2003.codfw.wmnet
  • 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Long schema change
  • 07:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb[1013,1017].eqiad.wmnet,db1154.eqiad.wmnet with reason: Long schema change
  • 07:07 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ping2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 07:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P64916 and previous config saved to /var/cache/conftool/dbconfig/20240614-070505-marostegui.json
  • 06:58 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 06:53 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ping2003.codfw.wmnet
  • 06:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P64915 and previous config saved to /var/cache/conftool/dbconfig/20240614-064958-marostegui.json
  • 06:41 marostegui: dbmaint codfw s1 deploy schema change T367261
  • 06:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367261)', diff saved to https://phabricator.wikimedia.org/P64914 and previous config saved to /var/cache/conftool/dbconfig/20240614-063451-marostegui.json
  • 06:34 moritzm: rebalance ganeti/C in eqiad following reboots
  • 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T367261)', diff saved to https://phabricator.wikimedia.org/P64913 and previous config saved to /var/cache/conftool/dbconfig/20240614-063138-marostegui.json
  • 06:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 06:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367261)', diff saved to https://phabricator.wikimedia.org/P64912 and previous config saved to /var/cache/conftool/dbconfig/20240614-063116-marostegui.json
  • 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P64911 and previous config saved to /var/cache/conftool/dbconfig/20240614-061609-marostegui.json
  • 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P64910 and previous config saved to /var/cache/conftool/dbconfig/20240614-060102-marostegui.json
  • 05:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367261)', diff saved to https://phabricator.wikimedia.org/P64909 and previous config saved to /var/cache/conftool/dbconfig/20240614-054555-marostegui.json
  • 05:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T367261)', diff saved to https://phabricator.wikimedia.org/P64908 and previous config saved to /var/cache/conftool/dbconfig/20240614-054041-marostegui.json
  • 05:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 05:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 05:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367261)', diff saved to https://phabricator.wikimedia.org/P64907 and previous config saved to /var/cache/conftool/dbconfig/20240614-054019-marostegui.json
  • 05:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P64906 and previous config saved to /var/cache/conftool/dbconfig/20240614-053023-ladsgroup.json
  • 05:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 05:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 05:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T352010)', diff saved to https://phabricator.wikimedia.org/P64905 and previous config saved to /var/cache/conftool/dbconfig/20240614-053001-ladsgroup.json
  • 05:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P64904 and previous config saved to /var/cache/conftool/dbconfig/20240614-052512-marostegui.json
  • 05:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P64903 and previous config saved to /var/cache/conftool/dbconfig/20240614-051454-ladsgroup.json
  • 05:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P64902 and previous config saved to /var/cache/conftool/dbconfig/20240614-051005-marostegui.json
  • 04:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P64901 and previous config saved to /var/cache/conftool/dbconfig/20240614-045947-ladsgroup.json
  • 04:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367261)', diff saved to https://phabricator.wikimedia.org/P64900 and previous config saved to /var/cache/conftool/dbconfig/20240614-045458-marostegui.json
  • 04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T367261)', diff saved to https://phabricator.wikimedia.org/P64899 and previous config saved to /var/cache/conftool/dbconfig/20240614-045129-marostegui.json
  • 04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 04:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64898 and previous config saved to /var/cache/conftool/dbconfig/20240614-044840-marostegui.json
  • 04:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 04:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 04:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 04:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 04:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T352010)', diff saved to https://phabricator.wikimedia.org/P64897 and previous config saved to /var/cache/conftool/dbconfig/20240614-044440-ladsgroup.json
  • 03:39 cdobbins@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_eqsin
  • 03:39 cdobbins@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_eqsin
  • 01:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T352010)', diff saved to https://phabricator.wikimedia.org/P64896 and previous config saved to /var/cache/conftool/dbconfig/20240614-010717-ladsgroup.json
  • 01:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 01:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance

2024-06-13

  • 23:56 zabe@deploy1002: Finished scap: T361041, Update interwiki cache (duration: 11m 07s)
  • 23:48 foks: removing 7 files for legal compliance
  • 23:45 zabe@deploy1002: Started scap: T361041, Update interwiki cache
  • 23:23 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=sysop_plwiki --cluster=all 2>&1 | tee /tmp/sysop_plwiki.UpdateSearchIndexConfig.log # T361041
  • 23:20 zabe@deploy1002: Finished scap: T361041 (duration: 11m 36s)
  • 23:17 foks: removing 9 files for legal compliance
  • 23:08 zabe@deploy1002: Started scap: T361041
  • 23:06 zabe@deploy1002: Sync cancelled.
  • 23:02 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-eqiad: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 23:01 zabe@deploy1002: zabe: T361041 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 22:59 zabe@deploy1002: Started scap: T361041
  • 22:49 zabe: create plwiki sysop wiki # T361041
  • 22:37 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host wikikube-ctrl2003.mgmt.codfw.wmnet with reboot policy FORCED
  • 22:05 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
  • 21:33 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-eqiad: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 21:32 jsn@deploy1002: Finished scap: Backport for Deploy QuickSurvey for Automoderator patroller workstream survey (T362969) (duration: 14m 18s)
  • 21:23 jsn@deploy1002: jsn, kgraessle: Continuing with sync
  • 21:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T364069)', diff saved to https://phabricator.wikimedia.org/P64894 and previous config saved to /var/cache/conftool/dbconfig/20240613-212230-marostegui.json
  • 21:20 jsn@deploy1002: jsn, kgraessle: Backport for Deploy QuickSurvey for Automoderator patroller workstream survey (T362969) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:17 jsn@deploy1002: Started scap: Backport for Deploy QuickSurvey for Automoderator patroller workstream survey (T362969)
  • 21:15 jsn@deploy1002: Finished scap: Backport for Look for iPadOS in user-agent, in addition to iOS. (T362723) (duration: 14m 11s)
  • 21:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P64893 and previous config saved to /var/cache/conftool/dbconfig/20240613-210723-marostegui.json
  • 21:07 jsn@deploy1002: dbrant, jsn: Continuing with sync
  • 21:04 jsn@deploy1002: dbrant, jsn: Backport for Look for iPadOS in user-agent, in addition to iOS. (T362723) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:04 topranks: changing BGP aggregate contribution policy / external route announcement cr2-eqdfw (T367439)
  • 21:03 topranks: changing BGP aggregate contribution policy / external route announcement cr2-eqord (T367439)
  • 21:01 jsn@deploy1002: Started scap: Backport for Look for iPadOS in user-agent, in addition to iOS. (T362723)
  • 20:55 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 20:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220', diff saved to https://phabricator.wikimedia.org/P64892 and previous config saved to /var/cache/conftool/dbconfig/20240613-205215-marostegui.json
  • 20:50 cdobbins@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.eqsin.wmnet
  • 20:44 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4038.ulsfo.wmnet with OS bullseye
  • 20:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2220 (T364069)', diff saved to https://phabricator.wikimedia.org/P64891 and previous config saved to /var/cache/conftool/dbconfig/20240613-203708-marostegui.json
  • 20:17 cdobbins@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
  • 20:14 cdobbins@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4038.ulsfo.wmnet with reason: host reimage
  • 20:13 foks: removing 1 file for legal compliance
  • 20:00 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl1003.eqiad.wmnet
  • 19:59 foks: removing 2 files for legal compliance
  • 19:58 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-ctrl1003.eqiad.wmnet
  • 19:58 kamila@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-ctrl1003.eqiad.wmnet
  • 19:53 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
  • 19:51 foks: removing 2 files for legal compliance
  • 19:51 cdobbins@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye
  • 19:41 foks: removing 2 files for legal compliance
  • 19:28 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 19:27 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1013.eqiad.wmnet
  • 19:27 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1013.eqiad.wmnet
  • 19:27 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 19:10 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: reimage failing
  • 19:10 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: reimage failing
  • 18:49 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 18:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64890 and previous config saved to /var/cache/conftool/dbconfig/20240613-184924-ladsgroup.json
  • 18:36 cdobbins@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
  • 18:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64889 and previous config saved to /var/cache/conftool/dbconfig/20240613-183417-ladsgroup.json
  • 18:29 brennen@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.9 refs T361403
  • 18:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 18:28 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 18:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 18:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 18:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64888 and previous config saved to /var/cache/conftool/dbconfig/20240613-181911-ladsgroup.json
  • 18:17 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye
  • 18:16 brennen: 1.43.0-wmf.9 train (T361403): no current blockers, rolling to group2
  • 18:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64887 and previous config saved to /var/cache/conftool/dbconfig/20240613-180404-ladsgroup.json
  • 17:57 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
  • 17:57 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4038.ulsfo.wmnet with OS bullseye
  • 17:39 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4038.ulsfo.wmnet with OS bullseye
  • 17:33 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4038.ulsfo.wmnet
  • 17:19 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240603/ using stat1009.eqiad.wmnet)
  • 17:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367261)', diff saved to https://phabricator.wikimedia.org/P64886 and previous config saved to /var/cache/conftool/dbconfig/20240613-170602-marostegui.json
  • 16:57 brennen@deploy1002: Finished scap: Backport for Convert local function to arrow function to fix context (T367366) (duration: 16m 51s)
  • 16:43 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:43 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS info - pt1979@cumin2002"
  • 16:43 brennen@deploy1002: jforrester, brennen: Backport for Convert local function to arrow function to fix context (T367366) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:41 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS info - pt1979@cumin2002"
  • 16:40 brennen@deploy1002: Started scap: Backport for Convert local function to arrow function to fix context (T367366)
  • 16:39 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P64884 and previous config saved to /var/cache/conftool/dbconfig/20240613-163547-marostegui.json
  • 16:30 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240603 using stat1009.eqiad.wmnet)
  • 16:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2002.codfw.wmnet with OS bookworm
  • 16:27 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240603 using stat1009.eqiad.wmnet)
  • 16:24 mutante: gitlab-replica.wikimedia.org - short downtime - renaming to gitlab-replica-a
  • 16:23 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:23 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 100%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64883 and previous config saved to /var/cache/conftool/dbconfig/20240613-162321-arnaudb.json
  • 16:21 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 16:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367261)', diff saved to https://phabricator.wikimedia.org/P64882 and previous config saved to /var/cache/conftool/dbconfig/20240613-162040-marostegui.json
  • 16:18 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 16:18 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Main board swap — T362033
  • 16:18 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Main board swap — T362033
  • 16:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T367261)', diff saved to https://phabricator.wikimedia.org/P64881 and previous config saved to /var/cache/conftool/dbconfig/20240613-161641-marostegui.json
  • 16:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 16:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 16:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T367261)', diff saved to https://phabricator.wikimedia.org/P64880 and previous config saved to /var/cache/conftool/dbconfig/20240613-161617-marostegui.json
  • 16:11 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 16:11 cdanis: gnt-node failover -f ganeti2028.codfw.wmnet
  • 16:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
  • 16:09 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:08 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:08 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2002.codfw.wmnet with reason: host reimage
  • 16:08 cdanis: forcibly rebooted ganeti2028, drbd hung
  • 16:08 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 75%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64878 and previous config saved to /var/cache/conftool/dbconfig/20240613-160816-arnaudb.json
  • 16:07 ebernhardson@deploy1002: Finished deploy [airflow-dags/search@ee5a291]: make public data from wdqs subgraph analysis readable by others (duration: 00m 22s)
  • 16:06 ebernhardson@deploy1002: Started deploy [airflow-dags/search@ee5a291]: make public data from wdqs subgraph analysis readable by others
  • 16:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2220 (T364069)', diff saved to https://phabricator.wikimedia.org/P64877 and previous config saved to /var/cache/conftool/dbconfig/20240613-160453-marostegui.json
  • 16:04 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 16:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2220.codfw.wmnet with reason: Maintenance
  • 16:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T364069)', diff saved to https://phabricator.wikimedia.org/P64876 and previous config saved to /var/cache/conftool/dbconfig/20240613-160431-marostegui.json
  • 16:04 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P64875 and previous config saved to /var/cache/conftool/dbconfig/20240613-160110-marostegui.json
  • 15:54 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 15:53 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 50%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64874 and previous config saved to /var/cache/conftool/dbconfig/20240613-155310-arnaudb.json
  • 15:52 elukey: drop mediawiki-services-restbase docker images from the Docker Registry - T367427
  • 15:51 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 15:50 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
  • 15:50 mvernon@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host moss-fe2002.codfw.wmnet with OS bookworm
  • 15:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P64873 and previous config saved to /var/cache/conftool/dbconfig/20240613-154924-marostegui.json
  • 15:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P64872 and previous config saved to /var/cache/conftool/dbconfig/20240613-154603-marostegui.json
  • 15:44 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: host reimage
  • 15:42 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1003.eqiad.wmnet with reason: host reimage
  • 15:41 cdobbins@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_eqsin
  • 15:38 cdobbins@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_eqsin
  • 15:38 ChrisDobbins901_: cdobbins@cumin1002 sudo -i cookbook sre.cdn.roll-reboot --alias 'cp-upload_eqsin' --batchsize 1 --reason T366555 --task-id T366555 --grace-sleep 5400
  • 15:38 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply
  • 15:38 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 25%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64871 and previous config saved to /var/cache/conftool/dbconfig/20240613-153805-arnaudb.json
  • 15:37 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/ratelimit: apply
  • 15:37 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply
  • 15:37 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
  • 15:36 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe2002.codfw.wmnet with OS bookworm
  • 15:35 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/ratelimit: apply
  • 15:34 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 15:34 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 15:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218', diff saved to https://phabricator.wikimedia.org/P64870 and previous config saved to /var/cache/conftool/dbconfig/20240613-153417-marostegui.json
  • 15:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T367261)', diff saved to https://phabricator.wikimedia.org/P64869 and previous config saved to /var/cache/conftool/dbconfig/20240613-153056-marostegui.json
  • 15:28 Lucas_WMDE: STOPPED lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwiki --current --all --touched-after=20240524120000 --start '["55386869"]' 2>&1 | tee -a ~/T315510-enwiki-9; date # Ctrl+C – had slowed down, unnecessary work by this point; was at --start '["55914913"]'
  • 15:28 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 15:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T367261)', diff saved to https://phabricator.wikimedia.org/P64868 and previous config saved to /var/cache/conftool/dbconfig/20240613-152748-marostegui.json
  • 15:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 15:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 15:27 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 15:26 elukey: drop mediawiki-services-parsoid docker images from the Docker Registry - T367427
  • 15:25 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2002.codfw.wmnet with OS bookworm
  • 15:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 15:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 15:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367261)', diff saved to https://phabricator.wikimedia.org/P64867 and previous config saved to /var/cache/conftool/dbconfig/20240613-152420-marostegui.json
  • 15:23 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 (re)pooling @ 10%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P64866 and previous config saved to /var/cache/conftool/dbconfig/20240613-152300-arnaudb.json
  • 15:22 elukey: drop eventgate-ci docker images from the Docker Registry
  • 15:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2218 (T364069)', diff saved to https://phabricator.wikimedia.org/P64865 and previous config saved to /var/cache/conftool/dbconfig/20240613-151910-marostegui.json
  • 15:15 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 15:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P64864 and previous config saved to /var/cache/conftool/dbconfig/20240613-150913-marostegui.json
  • 15:08 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:07 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:07 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:07 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:07 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:07 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:05 volans: upgrading spicerack on cumin1002 to v8.6.0
  • 15:04 topranks: rebooting lsw1-f6-codfw to upgrade JunOS on switch T365983
  • 15:04 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:35:00 on an-worker[1169-1171].eqiad.wmnet,es1039.eqiad.wmnet,ms-be1080.eqiad.wmnet with reason: JunOS upgrade lsw1-f6-eqiad
  • 15:04 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:35:00 on an-worker[1169-1171].eqiad.wmnet,es1039.eqiad.wmnet,ms-be1080.eqiad.wmnet with reason: JunOS upgrade lsw1-f6-eqiad
  • 15:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64863 and previous config saved to /var/cache/conftool/dbconfig/20240613-150332-ladsgroup.json
  • 15:03 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-f6-eqiad,lsw1-f6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f6-eqiad
  • 15:03 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-f6-eqiad,lsw1-f6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f6-eqiad
  • 15:01 cdanis@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:01 cdanis@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:00 cdanis@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:59 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1003
  • 14:59 cdanis@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:59 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1003
  • 14:59 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:57 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 14:57 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:57 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1003
  • 14:57 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1003
  • 14:55 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:55 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P64862 and previous config saved to /var/cache/conftool/dbconfig/20240613-145406-marostegui.json
  • 14:53 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:51 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1039.eqiad.wmnet with reason: T365983
  • 14:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1039.eqiad.wmnet with reason: T365983
  • 14:50 arnaudb@cumin1002: dbctl commit (dc=all): 'es1039 depool ahead of T365983', diff saved to https://phabricator.wikimedia.org/P64861 and previous config saved to /var/cache/conftool/dbconfig/20240613-145035-arnaudb.json
  • 14:49 elukey@cumin2002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:49 moritzm: rebalance ganeti/B in eqiad following reboots
  • 14:49 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 14:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64860 and previous config saved to /var/cache/conftool/dbconfig/20240613-144825-ladsgroup.json
  • 14:47 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1003
  • 14:46 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:46 elukey@cumin2002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:45 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1003
  • 14:44 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 14:44 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 14:44 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 14:44 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 14:44 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 14:41 hashar@deploy1002: Finished deploy [gerrit/gerrit@89042ad]: Gerrit to snapshot version 3.9.5-22-g7380128525 on gerrit1003 # T358762 (duration: 00m 05s)
  • 14:41 hashar@deploy1002: Started deploy [gerrit/gerrit@89042ad]: Gerrit to snapshot version 3.9.5-22-g7380128525 on gerrit1003 # T358762
  • 14:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367261)', diff saved to https://phabricator.wikimedia.org/P64859 and previous config saved to /var/cache/conftool/dbconfig/20240613-143859-marostegui.json
  • 14:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T367261)', diff saved to https://phabricator.wikimedia.org/P64858 and previous config saved to /var/cache/conftool/dbconfig/20240613-143554-marostegui.json
  • 14:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 14:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 14:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64857 and previous config saved to /var/cache/conftool/dbconfig/20240613-143531-marostegui.json
  • 14:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64856 and previous config saved to /var/cache/conftool/dbconfig/20240613-143318-ladsgroup.json
  • 14:32 hashar@deploy1002: Finished deploy [gerrit/gerrit@89042ad]: Gerrit to snapshot version 3.9.5-22-g7380128525 on gerrit2002 # T358762 (duration: 00m 07s)
  • 14:32 hashar@deploy1002: Started deploy [gerrit/gerrit@89042ad]: Gerrit to snapshot version 3.9.5-22-g7380128525 on gerrit2002 # T358762
  • 14:27 bblack: authdns-update for https://gerrit.wikimedia.org/r/1042490 (remaps some Facebook ranges to codfw+eqiad)
  • 14:24 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
  • 14:21 cgoubert@deploy1002: Finished scap: Change mwapi listener to mw-api-int - T333120 (duration: 06m 47s)
  • 14:21 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1033.eqiad.wmnet with reason: host reimage
  • 14:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P64855 and previous config saved to /var/cache/conftool/dbconfig/20240613-142024-marostegui.json
  • 14:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64854 and previous config saved to /var/cache/conftool/dbconfig/20240613-141810-ladsgroup.json
  • 14:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 14:16 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 14:15 cgoubert@deploy1002: Started scap: Change mwapi listener to mw-api-int - T333120
  • 14:05 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Load EntitySchema on Test Wikidata clients (T363153) (duration: 14m 14s)
  • 14:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P64853 and previous config saved to /var/cache/conftool/dbconfig/20240613-140517-marostegui.json
  • 14:03 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1033.eqiad.wmnet with OS bookworm
  • 14:00 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1033.eqiad.wmnet with reason: reimage and move to OVS
  • 14:00 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: sync
  • 13:59 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1033.eqiad.wmnet with reason: reimage and move to OVS
  • 13:59 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: sync
  • 13:56 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 13:55 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: sync
  • 13:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64852 and previous config saved to /var/cache/conftool/dbconfig/20240613-135523-ladsgroup.json
  • 13:55 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: sync
  • 13:55 claime: roll-restarting shellbox-constraints
  • 13:53 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Load EntitySchema on Test Wikidata clients (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:51 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Load EntitySchema on Test Wikidata clients (T363153)
  • 13:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64851 and previous config saved to /var/cache/conftool/dbconfig/20240613-135010-marostegui.json
  • 13:48 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 13:47 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 13:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64850 and previous config saved to /var/cache/conftool/dbconfig/20240613-134701-marostegui.json
  • 13:47 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:40:00 on lsw1-f6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
  • 13:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 13:46 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:40:00 on lsw1-f6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
  • 13:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 13:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367261)', diff saved to https://phabricator.wikimedia.org/P64849 and previous config saved to /var/cache/conftool/dbconfig/20240613-134639-marostegui.json
  • 13:45 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [svwikt] Add a temporary logo for the 100.000 pages (T364247) (duration: 13m 24s)
  • 13:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T352010)', diff saved to https://phabricator.wikimedia.org/P64848 and previous config saved to /var/cache/conftool/dbconfig/20240613-134456-ladsgroup.json
  • 13:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 13:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 13:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64847 and previous config saved to /var/cache/conftool/dbconfig/20240613-134017-ladsgroup.json
  • 13:36 logmsgbot: lucaswerkmeister-wmde@deploy1002 superpes, lucaswerkmeister-wmde: Continuing with sync
  • 13:34 logmsgbot: lucaswerkmeister-wmde@deploy1002 superpes, lucaswerkmeister-wmde: Backport for [svwikt] Add a temporary logo for the 100.000 pages (T364247) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:33 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:33 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:32 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [svwikt] Add a temporary logo for the 100.000 pages (T364247)
  • 13:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P64846 and previous config saved to /var/cache/conftool/dbconfig/20240613-133132-marostegui.json
  • 13:30 volans: upgrading spicerack on cumin2002 to v8.6.0
  • 13:26 moritzm: installing pillow security updates
  • 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64845 and previous config saved to /var/cache/conftool/dbconfig/20240613-132512-ladsgroup.json
  • 13:18 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 13:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64844 and previous config saved to /var/cache/conftool/dbconfig/20240613-131746-ladsgroup.json
  • 13:17 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 13:17 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 13:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P64843 and previous config saved to /var/cache/conftool/dbconfig/20240613-131625-marostegui.json
  • 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P64842 and previous config saved to /var/cache/conftool/dbconfig/20240613-131006-ladsgroup.json
  • 13:07 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 13:07 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 13:06 moritzm: installing pillow security updates
  • 13:03 jmm@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host cumin2002.codfw.wmnet
  • 13:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367261)', diff saved to https://phabricator.wikimedia.org/P64841 and previous config saved to /var/cache/conftool/dbconfig/20240613-130117-marostegui.json
  • 12:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T367261)', diff saved to https://phabricator.wikimedia.org/P64840 and previous config saved to /var/cache/conftool/dbconfig/20240613-125700-marostegui.json
  • 12:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 12:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 12:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T367261)', diff saved to https://phabricator.wikimedia.org/P64839 and previous config saved to /var/cache/conftool/dbconfig/20240613-125648-marostegui.json
  • 12:52 jmm@cumin1002: START - Cookbook sre.hosts.reboot-single for host cumin2002.codfw.wmnet
  • 12:51 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
  • 12:48 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1032.eqiad.wmnet with reason: host reimage
  • 12:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P64838 and previous config saved to /var/cache/conftool/dbconfig/20240613-124141-marostegui.json
  • 12:39 elukey: reset BIOS/BMC to factory default on sretest1001 - T365372
  • 12:30 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1032.eqiad.wmnet with OS bookworm
  • 12:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P64837 and previous config saved to /var/cache/conftool/dbconfig/20240613-122634-marostegui.json
  • 12:26 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1032.eqiad.wmnet with reason: reimage and move to OVS
  • 12:26 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1032.eqiad.wmnet with reason: reimage and move to OVS
  • 12:21 ladsgroup@deploy1002: Finished scap: Backport for Temporarily bump circuit breaking threshold to 350 (duration: 12m 13s)
  • 12:20 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:19 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:17 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:16 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:15 pfischer@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:12 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 12:12 ladsgroup@deploy1002: ladsgroup: Backport for Temporarily bump circuit breaking threshold to 350 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T367261)', diff saved to https://phabricator.wikimedia.org/P64836 and previous config saved to /var/cache/conftool/dbconfig/20240613-121127-marostegui.json
  • 12:09 ladsgroup@deploy1002: Started scap: Backport for Temporarily bump circuit breaking threshold to 350
  • 12:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T367261)', diff saved to https://phabricator.wikimedia.org/P64835 and previous config saved to /var/cache/conftool/dbconfig/20240613-120711-marostegui.json
  • 12:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 12:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 12:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 12:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367261)', diff saved to https://phabricator.wikimedia.org/P64834 and previous config saved to /var/cache/conftool/dbconfig/20240613-120644-marostegui.json
  • 11:58 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 11:57 fabfur: enabling puppet && repool cp4037 (T360454)
  • 11:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P64832 and previous config saved to /var/cache/conftool/dbconfig/20240613-115137-marostegui.json
  • 11:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P64831 and previous config saved to /var/cache/conftool/dbconfig/20240613-113630-marostegui.json
  • 11:35 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 11:29 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 11:28 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 11:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2001.codfw.wmnet
  • 11:22 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 11:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367261)', diff saved to https://phabricator.wikimedia.org/P64830 and previous config saved to /var/cache/conftool/dbconfig/20240613-112122-marostegui.json
  • 11:20 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubemaster2001.codfw.wmnet
  • 11:19 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2003.codfw.wmnet
  • 11:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367261)', diff saved to https://phabricator.wikimedia.org/P64829 and previous config saved to /var/cache/conftool/dbconfig/20240613-111706-marostegui.json
  • 11:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 11:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64828 and previous config saved to /var/cache/conftool/dbconfig/20240613-111655-ladsgroup.json
  • 11:16 moritzm: installing pillow security updates
  • 11:16 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 11:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 11:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367261)', diff saved to https://phabricator.wikimedia.org/P64827 and previous config saved to /var/cache/conftool/dbconfig/20240613-111642-marostegui.json
  • 11:16 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 11:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P64826 and previous config saved to /var/cache/conftool/dbconfig/20240613-111633-ladsgroup.json
  • 11:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster2002.codfw.wmnet
  • 11:09 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 11:08 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:08 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubemaster2002.codfw.wmnet
  • 11:01 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P64825 and previous config saved to /var/cache/conftool/dbconfig/20240613-110135-marostegui.json
  • 11:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P64824 and previous config saved to /var/cache/conftool/dbconfig/20240613-110126-ladsgroup.json
  • 10:59 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:58 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1001.eqiad.wmnet
  • 10:55 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 10:52 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 10:49 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 10:49 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubemaster1001.eqiad.wmnet
  • 10:48 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:48 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:48 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 10:48 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:47 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubemaster1002.eqiad.wmnet
  • 10:47 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:46 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:46 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P64823 and previous config saved to /var/cache/conftool/dbconfig/20240613-104628-marostegui.json
  • 10:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P64822 and previous config saved to /var/cache/conftool/dbconfig/20240613-104619-ladsgroup.json
  • 10:43 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 10:42 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 10:41 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 10:41 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2010.codfw.wmnet
  • 10:41 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubemaster1002.eqiad.wmnet
  • 10:39 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 10:34 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main2010.codfw.wmnet
  • 10:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2009.codfw.wmnet
  • 10:33 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 10:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367261)', diff saved to https://phabricator.wikimedia.org/P64821 and previous config saved to /var/cache/conftool/dbconfig/20240613-103120-marostegui.json
  • 10:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P64820 and previous config saved to /var/cache/conftool/dbconfig/20240613-103111-ladsgroup.json
  • 10:31 cmooney@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1003']
  • 10:30 cmooney@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1003']
  • 10:29 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 10:29 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 10:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main2009.codfw.wmnet
  • 10:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2008.codfw.wmnet
  • 10:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367261)', diff saved to https://phabricator.wikimedia.org/P64819 and previous config saved to /var/cache/conftool/dbconfig/20240613-102659-marostegui.json
  • 10:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 10:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[2287-2290].codfw.wmnet
  • 10:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[2287-2290].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
  • 10:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 10:26 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 10:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 10:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 10:23 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[2287-2290].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
  • 10:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 10:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 10:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main2008.codfw.wmnet
  • 10:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2007.codfw.wmnet
  • 10:21 hashar: Gerrit upgrade completed
  • 10:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 10:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 10:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367261)', diff saved to https://phabricator.wikimedia.org/P64818 and previous config saved to /var/cache/conftool/dbconfig/20240613-102016-marostegui.json
  • 10:20 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:15 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main2007.codfw.wmnet
  • 10:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main2006.codfw.wmnet
  • 10:10 fabfur: cp4037 depooled && puppet disable to profile benthos configuration (T360454)
  • 10:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main2006.codfw.wmnet
  • 10:09 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 10:08 hashar@deploy1002: Finished deploy [gerrit/gerrit@ee8252a]: Gerrit to snapshot version 3.9.5-21-g553ea468a1 on gerrit1003 # T367029 T367135 (duration: 00m 06s)
  • 10:08 hashar@deploy1002: Started deploy [gerrit/gerrit@ee8252a]: Gerrit to snapshot version 3.9.5-21-g553ea468a1 on gerrit1003 # T367029 T367135
  • 10:06 cgoubert@cumin1002: START - Cookbook sre.hosts.decommission for hosts mw[2287-2290].codfw.wmnet
  • 10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts mw[2281,2283-2286].codfw.wmnet
  • 10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[2281,2283-2286].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
  • 10:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P64816 and previous config saved to /var/cache/conftool/dbconfig/20240613-100509-marostegui.json
  • 10:04 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw[2281,2283-2286].codfw.wmnet decommissioned, removing all IPs except the asset tag one - cgoubert@cumin1002"
  • 10:04 hashar@deploy1002: Finished deploy [gerrit/gerrit@ee8252a]: Gerrit to snapshot version 3.9.5-21-g553ea468a1 (duration: 00m 08s)
  • 10:04 hashar@deploy1002: Started deploy [gerrit/gerrit@ee8252a]: Gerrit to snapshot version 3.9.5-21-g553ea468a1
  • 10:03 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl2003.codfw.wmnet
  • 10:03 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:03 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 10:02 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 10:02 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:01 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 09:59 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 09:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main1010.eqiad.wmnet
  • 09:53 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 09:52 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main1010.eqiad.wmnet
  • 09:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main1009.eqiad.wmnet
  • 09:50 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2001.eqiad.wmnet
  • 09:50 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2003.eqiad.wmnet
  • 09:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P64815 and previous config saved to /var/cache/conftool/dbconfig/20240613-095002-marostegui.json
  • 09:47 cgoubert@cumin1002: START - Cookbook sre.hosts.decommission for hosts mw[2281,2283-2286].codfw.wmnet
  • 09:46 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2003.codfw.wmnet
  • 09:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main1009.eqiad.wmnet
  • 09:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main1008.eqiad.wmnet
  • 09:39 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main1008.eqiad.wmnet
  • 09:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main1007.eqiad.wmnet
  • 09:39 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2001.codfw.wmnet
  • 09:38 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 09:37 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:37 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367261)', diff saved to https://phabricator.wikimedia.org/P64814 and previous config saved to /var/cache/conftool/dbconfig/20240613-093455-marostegui.json
  • 09:33 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main1007.eqiad.wmnet
  • 09:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-main1006.eqiad.wmnet
  • 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T367261)', diff saved to https://phabricator.wikimedia.org/P64813 and previous config saved to /var/cache/conftool/dbconfig/20240613-093158-marostegui.json
  • 09:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 09:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 09:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367261)', diff saved to https://phabricator.wikimedia.org/P64812 and previous config saved to /var/cache/conftool/dbconfig/20240613-093136-marostegui.json
  • 09:26 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-main1006.eqiad.wmnet
  • 09:22 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 09:17 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 09:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P64811 and previous config saved to /var/cache/conftool/dbconfig/20240613-091629-marostegui.json
  • 09:12 arnaudb@cumin1002: dbctl commit (dc=all): 'db2125 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64810 and previous config saved to /var/cache/conftool/dbconfig/20240613-091200-arnaudb.json
  • 09:07 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
  • 09:07 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 09:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P64809 and previous config saved to /var/cache/conftool/dbconfig/20240613-090122-marostegui.json
  • 08:59 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 08:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db2125 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64808 and previous config saved to /var/cache/conftool/dbconfig/20240613-085654-arnaudb.json
  • 08:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367261)', diff saved to https://phabricator.wikimedia.org/P64807 and previous config saved to /var/cache/conftool/dbconfig/20240613-084615-marostegui.json
  • 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T367261)', diff saved to https://phabricator.wikimedia.org/P64806 and previous config saved to /var/cache/conftool/dbconfig/20240613-084310-marostegui.json
  • 08:43 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 08:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 08:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 08:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367261)', diff saved to https://phabricator.wikimedia.org/P64805 and previous config saved to /var/cache/conftool/dbconfig/20240613-084248-marostegui.json
  • 08:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db2125 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64804 and previous config saved to /var/cache/conftool/dbconfig/20240613-084149-arnaudb.json
  • 08:37 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 08:36 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 08:30 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 08:29 kart_: Updated MinT to 2024-06-12-111204-production (T363563)
  • 08:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P64803 and previous config saved to /var/cache/conftool/dbconfig/20240613-082741-marostegui.json
  • 08:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db2125 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64802 and previous config saved to /var/cache/conftool/dbconfig/20240613-082643-arnaudb.json
  • 08:25 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 08:15 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 08:13 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 08:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P64801 and previous config saved to /var/cache/conftool/dbconfig/20240613-081234-marostegui.json
  • 08:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db2125 (re)pooling @ 10%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64800 and previous config saved to /var/cache/conftool/dbconfig/20240613-081138-arnaudb.json
  • 08:11 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 08:08 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db2125.codfw.wmnet with reason: index issue
  • 08:08 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db2125.codfw.wmnet with reason: index issue
  • 08:06 arnaudb@cumin1002: dbctl commit (dc=all): 'index error depool db2125', diff saved to https://phabricator.wikimedia.org/P64799 and previous config saved to /var/cache/conftool/dbconfig/20240613-080624-arnaudb.json
  • 08:06 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 07:59 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 07:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367261)', diff saved to https://phabricator.wikimedia.org/P64798 and previous config saved to /var/cache/conftool/dbconfig/20240613-075727-marostegui.json
  • 07:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64797 and previous config saved to /var/cache/conftool/dbconfig/20240613-075500-root.json
  • 07:54 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T367261)', diff saved to https://phabricator.wikimedia.org/P64796 and previous config saved to /var/cache/conftool/dbconfig/20240613-075420-marostegui.json
  • 07:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64795 and previous config saved to /var/cache/conftool/dbconfig/20240613-075358-marostegui.json
  • 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64794 and previous config saved to /var/cache/conftool/dbconfig/20240613-073955-root.json
  • 07:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P64793 and previous config saved to /var/cache/conftool/dbconfig/20240613-073851-marostegui.json
  • 07:28 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64792 and previous config saved to /var/cache/conftool/dbconfig/20240613-072450-root.json
  • 07:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P64791 and previous config saved to /var/cache/conftool/dbconfig/20240613-072344-marostegui.json
  • 07:21 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
  • 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64790 and previous config saved to /var/cache/conftool/dbconfig/20240613-070944-root.json
  • 07:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64789 and previous config saved to /var/cache/conftool/dbconfig/20240613-070837-marostegui.json
  • 07:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T367261)', diff saved to https://phabricator.wikimedia.org/P64788 and previous config saved to /var/cache/conftool/dbconfig/20240613-070531-marostegui.json
  • 07:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 07:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 07:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T367261)', diff saved to https://phabricator.wikimedia.org/P64787 and previous config saved to /var/cache/conftool/dbconfig/20240613-070509-marostegui.json
  • 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64786 and previous config saved to /var/cache/conftool/dbconfig/20240613-065439-root.json
  • 06:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P64785 and previous config saved to /var/cache/conftool/dbconfig/20240613-065002-marostegui.json
  • 06:42 moritzm: rebalance ganeti clusters in eqiad following reboots
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1230 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64784 and previous config saved to /var/cache/conftool/dbconfig/20240613-063934-root.json
  • 06:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P64783 and previous config saved to /var/cache/conftool/dbconfig/20240613-063455-marostegui.json
  • 06:27 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T367261)', diff saved to https://phabricator.wikimedia.org/P64782 and previous config saved to /var/cache/conftool/dbconfig/20240613-061948-marostegui.json
  • 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T367261)', diff saved to https://phabricator.wikimedia.org/P64781 and previous config saved to /var/cache/conftool/dbconfig/20240613-061636-marostegui.json
  • 06:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 06:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 06:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367261)', diff saved to https://phabricator.wikimedia.org/P64780 and previous config saved to /var/cache/conftool/dbconfig/20240613-061613-marostegui.json
  • 06:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P64779 and previous config saved to /var/cache/conftool/dbconfig/20240613-060927-ladsgroup.json
  • 06:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 06:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 06:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64778 and previous config saved to /var/cache/conftool/dbconfig/20240613-060905-ladsgroup.json
  • 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P64777 and previous config saved to /var/cache/conftool/dbconfig/20240613-060107-marostegui.json
  • 05:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2218 (T364069)', diff saved to https://phabricator.wikimedia.org/P64776 and previous config saved to /var/cache/conftool/dbconfig/20240613-055747-marostegui.json
  • 05:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 05:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Maintenance
  • 05:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T364069)', diff saved to https://phabricator.wikimedia.org/P64775 and previous config saved to /var/cache/conftool/dbconfig/20240613-055725-marostegui.json
  • 05:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P64774 and previous config saved to /var/cache/conftool/dbconfig/20240613-055358-ladsgroup.json
  • 05:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1238.eqiad.wmnet with reason: Long schema change
  • 05:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1238.eqiad.wmnet with reason: Long schema change
  • 05:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P64773 and previous config saved to /var/cache/conftool/dbconfig/20240613-054600-marostegui.json
  • 05:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P64772 and previous config saved to /var/cache/conftool/dbconfig/20240613-054218-marostegui.json
  • 05:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P64771 and previous config saved to /var/cache/conftool/dbconfig/20240613-053851-ladsgroup.json
  • 05:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367261)', diff saved to https://phabricator.wikimedia.org/P64770 and previous config saved to /var/cache/conftool/dbconfig/20240613-053052-marostegui.json
  • 05:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T367261)', diff saved to https://phabricator.wikimedia.org/P64769 and previous config saved to /var/cache/conftool/dbconfig/20240613-052746-marostegui.json
  • 05:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 05:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 05:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367261)', diff saved to https://phabricator.wikimedia.org/P64768 and previous config saved to /var/cache/conftool/dbconfig/20240613-052723-marostegui.json
  • 05:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208', diff saved to https://phabricator.wikimedia.org/P64767 and previous config saved to /var/cache/conftool/dbconfig/20240613-052711-marostegui.json
  • 05:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64766 and previous config saved to /var/cache/conftool/dbconfig/20240613-052344-ladsgroup.json
  • 05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P64765 and previous config saved to /var/cache/conftool/dbconfig/20240613-051216-marostegui.json
  • 05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2208 (T364069)', diff saved to https://phabricator.wikimedia.org/P64764 and previous config saved to /var/cache/conftool/dbconfig/20240613-051204-marostegui.json
  • 04:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P64763 and previous config saved to /var/cache/conftool/dbconfig/20240613-045709-marostegui.json
  • 04:55 marostegui: dbmaint eqiad s5 deploy schema change on db1230 T364299
  • 04:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Long schema change
  • 04:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Long schema change
  • 04:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1230 T367146', diff saved to https://phabricator.wikimedia.org/P64762 and previous config saved to /var/cache/conftool/dbconfig/20240613-045254-root.json
  • 04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1183 to s5 primary and set section read-write T367146', diff saved to https://phabricator.wikimedia.org/P64761 and previous config saved to /var/cache/conftool/dbconfig/20240613-045141-root.json
  • 04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Set s5 eqiad as read-only for maintenance - T367146', diff saved to https://phabricator.wikimedia.org/P64760 and previous config saved to /var/cache/conftool/dbconfig/20240613-045121-root.json
  • 04:51 marostegui: Starting s5 eqiad failover from db1230 to db1183 - T367146
  • 04:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367261)', diff saved to https://phabricator.wikimedia.org/P64759 and previous config saved to /var/cache/conftool/dbconfig/20240613-044201-marostegui.json
  • 04:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T367261)', diff saved to https://phabricator.wikimedia.org/P64758 and previous config saved to /var/cache/conftool/dbconfig/20240613-043848-marostegui.json
  • 04:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 04:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 04:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 04:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 04:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s5 T367146
  • 04:32 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1183 with weight 0 T367146', diff saved to https://phabricator.wikimedia.org/P64757 and previous config saved to /var/cache/conftool/dbconfig/20240613-043239-root.json
  • 04:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s5 T367146
  • 00:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2208 (T364069)', diff saved to https://phabricator.wikimedia.org/P64756 and previous config saved to /var/cache/conftool/dbconfig/20240613-004247-marostegui.json
  • 00:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 00:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2208.codfw.wmnet with reason: Maintenance
  • 00:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64755 and previous config saved to /var/cache/conftool/dbconfig/20240613-003507-ladsgroup.json
  • 00:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 00:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 00:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P64754 and previous config saved to /var/cache/conftool/dbconfig/20240613-003444-ladsgroup.json
  • 00:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P64753 and previous config saved to /var/cache/conftool/dbconfig/20240613-001937-ladsgroup.json
  • 00:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P64752 and previous config saved to /var/cache/conftool/dbconfig/20240613-000430-ladsgroup.json

2024-06-12

  • 23:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P64751 and previous config saved to /var/cache/conftool/dbconfig/20240612-234923-ladsgroup.json
  • 22:17 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 22:13 krinkle@deploy1002: Finished scap: Backport for Move etcd.php from wmf-config/ to src/ (T308932) (duration: 13m 42s)
  • 22:10 eevans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 22:08 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 22:07 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4037.ulsfo.wmnet with OS bullseye
  • 22:06 eevans@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 22:04 krinkle@deploy1002: krinkle: Continuing with sync
  • 22:04 eevans@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 22:03 krinkle@deploy1002: krinkle: Backport for Move etcd.php from wmf-config/ to src/ (T308932) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:59 krinkle@deploy1002: Started scap: Backport for Move etcd.php from wmf-config/ to src/ (T308932)
  • 21:44 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
  • 21:42 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Apply remote logging fix (r1042273) - eevans@cumin1002
  • 21:41 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4037.ulsfo.wmnet with reason: host reimage
  • 21:36 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: sync
  • 21:36 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: sync
  • 21:36 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 21:35 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 21:34 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 21:33 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 21:33 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 21:32 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 21:31 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: sync
  • 21:31 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: sync
  • 21:30 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 21:30 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 21:28 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: sync
  • 21:28 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 21:28 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 21:27 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 21:26 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 21:25 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 21:24 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 21:22 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: sync
  • 21:22 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: sync
  • 21:21 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Apply remote logging fix (r1042273) - eevans@cumin1002
  • 21:20 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs1010.eqiad.wmnet: Apply remote logging fix (r1042273) - eevans@cumin1002
  • 21:19 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
  • 21:18 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4037.ulsfo.wmnet with OS bullseye
  • 21:17 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 21:17 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 21:13 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs1010.eqiad.wmnet: Apply remote logging fix (r1042273) - eevans@cumin1002
  • 21:11 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 21:05 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
  • 21:05 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 21:04 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4037.ulsfo.wmnet with OS bullseye
  • 20:53 cjming: end of UTC late backport window
  • 20:52 cjming@deploy1002: Finished scap: Backport for Don't squish images in non-responsive skins e.g. Vector 2010 (T113101) (duration: 12m 52s)
  • 20:47 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 20:44 cjming@deploy1002: cjming, jdlrobson: Continuing with sync
  • 20:42 cjming@deploy1002: cjming, jdlrobson: Backport for Don't squish images in non-responsive skins e.g. Vector 2010 (T113101) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:39 cjming@deploy1002: Started scap: Backport for Don't squish images in non-responsive skins e.g. Vector 2010 (T113101)
  • 20:29 cjming@deploy1002: Finished scap: Backport for Disable quick surveys using deprecated configuration (T367128) (duration: 11m 59s)
  • 20:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367261)', diff saved to https://phabricator.wikimedia.org/P64750 and previous config saved to /var/cache/conftool/dbconfig/20240612-202233-marostegui.json
  • 20:21 cjming@deploy1002: jdlrobson, cjming: Continuing with sync
  • 20:19 cjming@deploy1002: jdlrobson, cjming: Backport for Disable quick surveys using deprecated configuration (T367128) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:17 cjming@deploy1002: Started scap: Backport for Disable quick surveys using deprecated configuration (T367128)
  • 20:10 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_codfw
  • 20:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P64749 and previous config saved to /var/cache/conftool/dbconfig/20240612-200726-marostegui.json
  • 20:00 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 19:59 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 19:58 brennen@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.9 refs T361403
  • 19:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209', diff saved to https://phabricator.wikimedia.org/P64748 and previous config saved to /var/cache/conftool/dbconfig/20240612-195219-marostegui.json
  • 19:49 hashar@deploy1002: Finished deploy [gerrit/gerrit@e4c49f9]: wm-patch-demo: silently ignore errors - T367155 (duration: 00m 07s)
  • 19:49 hashar@deploy1002: Started deploy [gerrit/gerrit@e4c49f9]: wm-patch-demo: silently ignore errors - T367155
  • 19:48 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: apply
  • 19:48 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: apply
  • 19:48 brennen: 1.43.0-wmf.9 train (T361403): blockers (hopefully) resolved, rolling to group1
  • 19:46 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-main: apply
  • 19:45 brennen@deploy1002: Finished scap: Backport for Call NamespaceRegistrationHandler::setConstants() earlier (T367334 T363153) (duration: 13m 06s)
  • 19:45 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-main: apply
  • 19:43 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-main: apply
  • 19:43 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-main: apply
  • 19:41 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 19:40 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 19:40 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 19:39 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 19:39 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 19:38 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 19:37 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 19:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2209 (T367261)', diff saved to https://phabricator.wikimedia.org/P64747 and previous config saved to /var/cache/conftool/dbconfig/20240612-193712-marostegui.json
  • 19:36 brennen@deploy1002: brennen: Continuing with sync
  • 19:36 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 19:36 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 19:36 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 19:35 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 19:35 brennen@deploy1002: brennen: Backport for Call NamespaceRegistrationHandler::setConstants() earlier (T367334 T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:35 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 19:34 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 19:34 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics-external: apply
  • 19:34 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 19:33 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 19:32 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 19:32 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics: apply
  • 19:32 brennen@deploy1002: Started scap: Backport for Call NamespaceRegistrationHandler::setConstants() earlier (T367334 T363153)
  • 19:32 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 19:31 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics: apply
  • 19:31 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 19:30 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 19:30 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics: apply
  • 19:30 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 19:29 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 19:29 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics: apply
  • 19:28 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-analytics: apply
  • 19:27 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-analytics: apply
  • 19:26 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-logging-external: apply
  • 19:25 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-logging-external: apply
  • 19:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2209 (T367261)', diff saved to https://phabricator.wikimedia.org/P64746 and previous config saved to /var/cache/conftool/dbconfig/20240612-192327-marostegui.json
  • 19:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 19:23 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-logging-external: apply
  • 19:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2209.codfw.wmnet with reason: Maintenance
  • 19:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T367261)', diff saved to https://phabricator.wikimedia.org/P64745 and previous config saved to /var/cache/conftool/dbconfig/20240612-192303-marostegui.json
  • 19:22 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-logging-external: apply
  • 19:22 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 19:22 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 19:19 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventgate-logging-external: apply
  • 19:19 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventgate-logging-external: apply
  • 19:18 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 19:17 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 19:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 19:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 19:11 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 19:10 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 19:09 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 19:08 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 19:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P64744 and previous config saved to /var/cache/conftool/dbconfig/20240612-190755-marostegui.json
  • 19:06 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 19:06 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 19:03 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 19:02 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 19:02 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 19:02 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 18:59 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 18:59 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 18:59 ebysans@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:58 ebysans@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 18:58 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 18:57 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 18:55 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 18:52 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 18:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P64742 and previous config saved to /var/cache/conftool/dbconfig/20240612-185248-marostegui.json
  • 18:51 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 18:49 ebysans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:48 ebysans@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 18:42 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:41 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 18:40 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:40 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 18:39 ebysans@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 18:39 ebysans@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 18:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T367261)', diff saved to https://phabricator.wikimedia.org/P64741 and previous config saved to /var/cache/conftool/dbconfig/20240612-183741-marostegui.json
  • 18:24 ejegg: fundraising civicrm upgraded from 955166d1 to 76857844
  • 18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T367261)', diff saved to https://phabricator.wikimedia.org/P64740 and previous config saved to /var/cache/conftool/dbconfig/20240612-182343-marostegui.json
  • 18:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367261)', diff saved to https://phabricator.wikimedia.org/P64739 and previous config saved to /var/cache/conftool/dbconfig/20240612-182321-marostegui.json
  • 18:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P64738 and previous config saved to /var/cache/conftool/dbconfig/20240612-180814-marostegui.json
  • 18:04 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 18:01 brennen: 1.43.0-wmf.9 train (T361403): currently blocked on T367334, holding at group0 until resolved.
  • 17:59 mutante: gitlab-replica-old - downtime, renaming to gitlab-replica-b
  • 17:58 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on gitlab-replica-old.wikimedia.org with reason: renaming gitlab-replica
  • 17:58 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab-replica-old.wikimedia.org with reason: renaming gitlab-replica
  • 17:58 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 17:57 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gitlab1003.wikimedia.org with reason: renaming gitlab-replica
  • 17:57 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gitlab1003.wikimedia.org with reason: renaming gitlab-replica
  • 17:56 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 17:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194', diff saved to https://phabricator.wikimedia.org/P64737 and previous config saved to /var/cache/conftool/dbconfig/20240612-175306-marostegui.json
  • 17:52 brett: authdns-update run on dns1004 (T364891)
  • 17:51 brett: Repool ulsfo as A:cp-text nvme upgrades are complete (T364891)
  • 17:49 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 17:39 brett: Remove downtime of cache_text/cp text servers in ulsfo - T364891
  • 17:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2194 (T367261)', diff saved to https://phabricator.wikimedia.org/P64736 and previous config saved to /var/cache/conftool/dbconfig/20240612-173759-marostegui.json
  • 17:30 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=cache_text,dc=ulsfo
  • 17:26 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:25 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 17:25 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 17:25 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 17:24 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 17:24 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 17:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2194 (T367261)', diff saved to https://phabricator.wikimedia.org/P64735 and previous config saved to /var/cache/conftool/dbconfig/20240612-172406-marostegui.json
  • 17:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 17:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2194.codfw.wmnet with reason: Maintenance
  • 17:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190 (T367261)', diff saved to https://phabricator.wikimedia.org/P64734 and previous config saved to /var/cache/conftool/dbconfig/20240612-172344-marostegui.json
  • 17:13 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 17:13 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 17:10 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 17:09 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 17:09 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:sessionstore
  • 17:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P64733 and previous config saved to /var/cache/conftool/dbconfig/20240612-170837-marostegui.json
  • 16:56 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 16:55 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 16:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2190', diff saved to https://phabricator.wikimedia.org/P64732 and previous config saved to /var/cache/conftool/dbconfig/20240612-165329-marostegui.json
  • 16:38 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 16:31 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 16:28 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 16:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2190 (T367261)', diff saved to https://phabricator.wikimedia.org/P64730 and previous config saved to /var/cache/conftool/dbconfig/20240612-162426-marostegui.json
  • 16:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 16:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2190.codfw.wmnet with reason: Maintenance
  • 16:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367261)', diff saved to https://phabricator.wikimedia.org/P64729 and previous config saved to /var/cache/conftool/dbconfig/20240612-162403-marostegui.json
  • 16:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P64728 and previous config saved to /var/cache/conftool/dbconfig/20240612-162134-ladsgroup.json
  • 16:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 16:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 16:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P64727 and previous config saved to /var/cache/conftool/dbconfig/20240612-162110-ladsgroup.json
  • 16:20 brett: cumin 'A:cp-text and A:ulsfo' 'systemctl poweroff' - T364891
  • 16:19 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 16:19 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on 8 hosts with reason: T364891
  • 16:18 brett@cumin2002: START - Cookbook sre.hosts.downtime for 3:00:00 on 8 hosts with reason: T364891
  • 16:18 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 16:18 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 16:17 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 16:17 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 16:17 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 16:13 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 16:11 jhathaway@deploy1002: Finished scap: (no justification provided) (duration: 03m 19s)
  • 16:10 jhathaway@deploy1002: Started scap: (no justification provided)
  • 16:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P64726 and previous config saved to /var/cache/conftool/dbconfig/20240612-160856-marostegui.json
  • 16:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P64725 and previous config saved to /var/cache/conftool/dbconfig/20240612-160603-ladsgroup.json
  • 16:05 eevans@cumin1002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:sessionstore
  • 16:00 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 15:55 otto@deploy1002: Finished scap: Backport for Remove EventLoggingLegacyConverter code - it has been moved to EventLogging (T353817) (duration: 12m 19s)
  • 15:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177', diff saved to https://phabricator.wikimedia.org/P64724 and previous config saved to /var/cache/conftool/dbconfig/20240612-155349-marostegui.json
  • 15:53 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 15:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P64723 and previous config saved to /var/cache/conftool/dbconfig/20240612-155056-ladsgroup.json
  • 15:47 otto@deploy1002: otto: Continuing with sync
  • 15:46 otto@deploy1002: otto: Backport for Remove EventLoggingLegacyConverter code - it has been moved to EventLogging (T353817) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:43 otto@deploy1002: Started scap: Backport for Remove EventLoggingLegacyConverter code - it has been moved to EventLogging (T353817)
  • 15:42 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 15:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2177 (T367261)', diff saved to https://phabricator.wikimedia.org/P64722 and previous config saved to /var/cache/conftool/dbconfig/20240612-153842-marostegui.json
  • 15:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P64721 and previous config saved to /var/cache/conftool/dbconfig/20240612-153549-ladsgroup.json
  • 15:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1027.eqiad.wmnet
  • 15:34 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:34 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding sretest2001 to codfw - jhancock@cumin2002"
  • 15:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1027.eqiad.wmnet
  • 15:33 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding sretest2001 to codfw - jhancock@cumin2002"
  • 15:31 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 15:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1027.eqiad.wmnet
  • 15:28 denisse@cumin2002: conftool action : set/pooled=true; selector: dnsdisc=logstash,name=eqiad
  • 15:27 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 15:27 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 15:25 volans: uploaded spicerack_8.6.0 to apt.wikimedia.org bullseye-wikimedia
  • 15:25 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1003']
  • 15:24 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1003']
  • 15:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2177 (T367261)', diff saved to https://phabricator.wikimedia.org/P64720 and previous config saved to /var/cache/conftool/dbconfig/20240612-152403-marostegui.json
  • 15:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 15:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2177.codfw.wmnet with reason: Maintenance
  • 15:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367261)', diff saved to https://phabricator.wikimedia.org/P64719 and previous config saved to /var/cache/conftool/dbconfig/20240612-152351-marostegui.json
  • 15:23 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1003']
  • 15:12 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1027.eqiad.wmnet
  • 15:12 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1003']
  • 15:12 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 15:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P64718 and previous config saved to /var/cache/conftool/dbconfig/20240612-150844-marostegui.json
  • 15:02 cdanis: T364907 💙cdanis@apt1002.wikimedia.org ~ 🕚☕ sudo -i reprepro --keepunreferencedfiles includedeb bullseye-wikimedia ~/otelcol-contrib_0.102.0_linux_amd64.deb
  • 15:02 brett: authdns-update run on dns1004
  • 15:01 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1003.eqiad.wmnet with OS bullseye
  • 15:00 brett: Depooling ulsfo in preparation for A:cp-text downtime/poweroff for nvme upgrades (T364891)
  • 15:00 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Revert "Only register EntitySchema namespace when feature is enabled", Revert "Allow loading EntitySchema on client (only) wikis" (duration: 12m 36s)
  • 14:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156', diff saved to https://phabricator.wikimedia.org/P64717 and previous config saved to /var/cache/conftool/dbconfig/20240612-145337-marostegui.json
  • 14:53 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:53 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:51 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 14:50 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Revert "Only register EntitySchema namespace when feature is enabled", Revert "Allow loading EntitySchema on client (only) wikis" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-main-eqiad
  • 14:49 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:49 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:47 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Revert "Only register EntitySchema namespace when feature is enabled", Revert "Allow loading EntitySchema on client (only) wikis"
  • 14:46 oblivian@deploy1002: Finished scap: Backport for Use the statsd-exporter service where available (T365265) (duration: 12m 05s)
  • 14:44 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1003.eqiad.wmnet with OS bookworm
  • 14:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2156 (T367261)', diff saved to https://phabricator.wikimedia.org/P64716 and previous config saved to /var/cache/conftool/dbconfig/20240612-143830-marostegui.json
  • 14:38 oblivian@deploy1002: oblivian: Continuing with sync
  • 14:37 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1003
  • 14:37 oblivian@deploy1002: oblivian: Backport for Use the statsd-exporter service where available (T365265) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:36 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1003
  • 14:35 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:35 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1003 to a new rack - kamila@cumin1002"
  • 14:34 moritzm: failover ganeti master in eqiad to ganeti1028
  • 14:34 oblivian@deploy1002: Started scap: Backport for Use the statsd-exporter service where available (T365265)
  • 14:34 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1003 to a new rack - kamila@cumin1002"
  • 14:31 moritzm: installing gst-plugins-base1.0 security updates
  • 14:31 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 14:31 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:29 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1038.eqiad.wmnet
  • 14:29 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1038.eqiad.wmnet
  • 14:28 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:27 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:27 claime: trafficserver: move 95% of traffic to mw-on-k8s
  • 14:27 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Allow loading EntitySchema on client (only) wikis (T363153), Only register EntitySchema namespace when feature is enabled (T363153) (duration: 12m 32s)
  • 14:27 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:24 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
  • 14:24 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T367261)', diff saved to https://phabricator.wikimedia.org/P64715 and previous config saved to /var/cache/conftool/dbconfig/20240612-142412-marostegui.json
  • 14:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 14:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 14:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 14:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 14:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367261)', diff saved to https://phabricator.wikimedia.org/P64714 and previous config saved to /var/cache/conftool/dbconfig/20240612-142335-marostegui.json
  • 14:22 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:22 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:22 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 14:21 mvernon@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
  • 14:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1038.eqiad.wmnet
  • 14:20 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s5
  • 14:20 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1020.eqiad.wmnet,service=s8
  • 14:20 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 14:20 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 14:19 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:19 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 14:19 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 14:19 jayme@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 14:18 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 14:17 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Allow loading EntitySchema on client (only) wikis (T363153), Only register EntitySchema namespace when feature is enabled (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:15 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:15 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1020.eqiad.wmnet
  • 14:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Allow loading EntitySchema on client (only) wikis (T363153), Only register EntitySchema namespace when feature is enabled (T363153)
  • 14:10 moritzm: installing libarchive security updates
  • 14:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P64713 and previous config saved to /var/cache/conftool/dbconfig/20240612-140827-marostegui.json
  • 14:07 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1020.eqiad.wmnet
  • 14:02 vgutierrez: repool text@esams with IPIP encapsulation enabled - T366466
  • 14:02 mvernon@cumin1002: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bookworm
  • 14:00 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 13:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1038.eqiad.wmnet
  • 13:55 dcausse@deploy1002: Finished deploy [wdqs/wdqs@1cf4017]: deploy to test server wdqs2023 (fix loadData.sh) (duration: 00m 13s)
  • 13:54 dcausse@deploy1002: Started deploy [wdqs/wdqs@1cf4017]: deploy to test server wdqs2023 (fix loadData.sh)
  • 13:53 vgutierrez: rolling restart of pybal on lvs3010 and lvs3008 - T366466
  • 13:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P64712 and previous config saved to /var/cache/conftool/dbconfig/20240612-135319-marostegui.json
  • 13:49 fabfur: depooled cp4037 to test benthos/haproxy configuration (T365718)
  • 13:48 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on clouddb1020.eqiad.wmnet with reason: T366555
  • 13:48 fnegri@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on clouddb1020.eqiad.wmnet with reason: T366555
  • 13:48 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 13:46 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s8
  • 13:46 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1020.eqiad.wmnet,service=s5
  • 13:46 cgoubert@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-main-eqiad
  • 13:45 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s4
  • 13:45 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1019.eqiad.wmnet,service=s6
  • 13:45 claime: Starting kafka-main reboots in eqiad
  • 13:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 13:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 13:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T364069)', diff saved to https://phabricator.wikimedia.org/P64710 and previous config saved to /var/cache/conftool/dbconfig/20240612-134414-marostegui.json
  • 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1022.eqiad.wmnet
  • 13:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1022.eqiad.wmnet
  • 13:39 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for poolcounter2004.codfw.wmnet: Renew puppet certificate - elukey@cumin1002
  • 13:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367261)', diff saved to https://phabricator.wikimedia.org/P64709 and previous config saved to /var/cache/conftool/dbconfig/20240612-133812-marostegui.json
  • 13:38 elukey@cumin1002: START - Cookbook sre.puppet.renew-cert for poolcounter2004.codfw.wmnet: Renew puppet certificate - elukey@cumin1002
  • 13:37 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for poolcounter2003.codfw.wmnet: Renew puppet certificate - elukey@cumin1002
  • 13:36 elukey@cumin1002: START - Cookbook sre.puppet.renew-cert for poolcounter2003.codfw.wmnet: Renew puppet certificate - elukey@cumin1002
  • 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1022.eqiad.wmnet
  • 13:36 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for poolcounter1004.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
  • 13:35 elukey@cumin1002: START - Cookbook sre.puppet.renew-cert for poolcounter1004.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
  • 13:35 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for poolcounter1005.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
  • 13:34 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for aqs1010.eqiad.wmnet
  • 13:34 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for aqs1010.eqiad.wmnet
  • 13:34 elukey@cumin1002: START - Cookbook sre.puppet.renew-cert for poolcounter1005.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
  • 13:34 brouberol@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 13:31 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:31 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add ntp-[abc].anycast.wmnet addresses - sukhe@cumin1002"
  • 13:30 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add ntp-[abc].anycast.wmnet addresses - sukhe@cumin1002"
  • 13:30 brouberol@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 13:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P64708 and previous config saved to /var/cache/conftool/dbconfig/20240612-132907-marostegui.json
  • 13:28 sukhe: add ntp-[abc].anycast.wmnet: 10.3.0.[5-7]/32: T366360
  • 13:28 sukhe@cumin1002: START - Cookbook sre.dns.netbox
  • 13:26 vgutierrez: depool text@esams before enabling IPIP encapsulation - T366466
  • 13:26 dcausse@deploy1002: Finished deploy [wdqs/wdqs@43b966f]: deploy to test server wdqs2023 (duration: 00m 14s)
  • 13:25 dcausse@deploy1002: Started deploy [wdqs/wdqs@43b966f]: deploy to test server wdqs2023
  • 13:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T367261)', diff saved to https://phabricator.wikimedia.org/P64707 and previous config saved to /var/cache/conftool/dbconfig/20240612-132351-marostegui.json
  • 13:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 13:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 13:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1022.eqiad.wmnet
  • 13:21 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Only register EntitySchema namespace when feature is enabled (T363153) (duration: 12m 15s)
  • 13:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1021.eqiad.wmnet
  • 13:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1021.eqiad.wmnet
  • 13:18 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1010.eqiad.wmnet with reason: Troubleshooting remote logging — T350567
  • 13:18 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1010.eqiad.wmnet with reason: Troubleshooting remote logging — T350567
  • 13:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182', diff saved to https://phabricator.wikimedia.org/P64706 and previous config saved to /var/cache/conftool/dbconfig/20240612-131400-marostegui.json
  • 13:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1021.eqiad.wmnet
  • 13:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on logstash1031.eqiad.wmnet with reason: reboot/ganeti
  • 13:13 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 13:13 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on logstash1031.eqiad.wmnet with reason: reboot/ganeti
  • 13:12 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Only register EntitySchema namespace when feature is enabled (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 13:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 13:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Only register EntitySchema namespace when feature is enabled (T363153)
  • 13:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64705 and previous config saved to /var/cache/conftool/dbconfig/20240612-130232-root.json
  • 13:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2182 (T364069)', diff saved to https://phabricator.wikimedia.org/P64704 and previous config saved to /var/cache/conftool/dbconfig/20240612-125853-marostegui.json
  • 12:58 ladsgroup@deploy1002: Finished scap: Backport for override circuit breaking threshold for ES hosts (duration: 16m 34s)
  • 12:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1021.eqiad.wmnet
  • 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1020.eqiad.wmnet
  • 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1020.eqiad.wmnet
  • 12:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 12:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 12:50 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 12:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1020.eqiad.wmnet
  • 12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on logstash1030.eqiad.wmnet with reason: reboot/ganeti
  • 12:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64703 and previous config saved to /var/cache/conftool/dbconfig/20240612-124727-root.json
  • 12:47 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 1:00:00 on logstash1030.eqiad.wmnet with reason: reboot/ganeti
  • 12:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 12:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 12:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367261)', diff saved to https://phabricator.wikimedia.org/P64702 and previous config saved to /var/cache/conftool/dbconfig/20240612-124456-marostegui.json
  • 12:44 ladsgroup@deploy1002: ladsgroup: Backport for override circuit breaking threshold for ES hosts synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:42 ladsgroup@deploy1002: Started scap: Backport for override circuit breaking threshold for ES hosts
  • 12:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ping1003.eqiad.wmnet
  • 12:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping1003.eqiad.wmnet
  • 12:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64701 and previous config saved to /var/cache/conftool/dbconfig/20240612-123222-root.json
  • 12:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P64700 and previous config saved to /var/cache/conftool/dbconfig/20240612-122948-marostegui.json
  • 12:29 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 12:29 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 12:28 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 12:25 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/sessionstore: apply
  • 12:25 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/sessionstore: apply
  • 12:25 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/echostore: apply
  • 12:24 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/echostore: apply
  • 12:18 Emperor: restart swift-proxy on ms-fe1013 T360913
  • 12:17 Emperor: restart swift-proxy on ms-fe2011 ms-fe2014 T360913
  • 12:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64699 and previous config saved to /var/cache/conftool/dbconfig/20240612-121716-root.json
  • 12:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212', diff saved to https://phabricator.wikimedia.org/P64698 and previous config saved to /var/cache/conftool/dbconfig/20240612-121441-marostegui.json
  • 12:14 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/sessionstore: apply
  • 12:14 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 12:13 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/sessionstore: apply
  • 12:13 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 12:13 jayme@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 12:12 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/echostore: apply
  • 12:12 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/echostore: apply
  • 12:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1020.eqiad.wmnet
  • 12:11 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1033.eqiad.wmnet
  • 12:11 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1033.eqiad.wmnet
  • 12:10 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/echostore: apply
  • 12:10 jayme@deploy1002: helmfile [staging] START helmfile.d/services/echostore: apply
  • 12:05 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 12:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1033.eqiad.wmnet
  • 12:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64697 and previous config saved to /var/cache/conftool/dbconfig/20240612-120211-root.json
  • 12:00 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams-internal: apply
  • 11:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1212 (T367261)', diff saved to https://phabricator.wikimedia.org/P64696 and previous config saved to /var/cache/conftool/dbconfig/20240612-115934-marostegui.json
  • 11:59 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams-internal: apply
  • 11:59 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventstreams: apply
  • 11:58 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/eventstreams: apply
  • 11:57 claime: Manual restart of dump_cloud_ip_ranges.service on A:puppetserver and A:puppetmaster
  • 11:55 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 11:55 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 11:54 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:54 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:54 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1033.eqiad.wmnet
  • 11:53 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams-internal: apply
  • 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1034.eqiad.wmnet
  • 11:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1034.eqiad.wmnet
  • 11:53 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 11:53 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 11:52 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams-internal: apply
  • 11:52 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventstreams: apply
  • 11:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1212 (T367261)', diff saved to https://phabricator.wikimedia.org/P64695 and previous config saved to /var/cache/conftool/dbconfig/20240612-115143-marostegui.json
  • 11:51 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/eventstreams: apply
  • 11:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 11:51 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams-internal: apply
  • 11:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1212.eqiad.wmnet with reason: Maintenance
  • 11:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367261)', diff saved to https://phabricator.wikimedia.org/P64693 and previous config saved to /var/cache/conftool/dbconfig/20240612-115103-marostegui.json
  • 11:50 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams-internal: apply
  • 11:50 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/eventstreams: apply
  • 11:50 jayme@deploy1002: helmfile [staging] START helmfile.d/services/eventstreams: apply
  • 11:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64692 and previous config saved to /var/cache/conftool/dbconfig/20240612-114705-root.json
  • 11:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1034.eqiad.wmnet
  • 11:46 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
  • 11:45 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 11:45 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
  • 11:45 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-mcrouter: apply
  • 11:45 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mw-mcrouter: apply
  • 11:44 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
  • 11:44 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
  • 11:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1191', diff saved to https://phabricator.wikimedia.org/P64691 and previous config saved to /var/cache/conftool/dbconfig/20240612-114410-root.json
  • 11:42 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:42 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:39 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 11:38 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 11:37 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 11:37 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:37 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 11:37 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:37 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:37 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 11:36 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 11:36 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 11:36 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1034.eqiad.wmnet
  • 11:36 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 11:36 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 11:35 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P64690 and previous config saved to /var/cache/conftool/dbconfig/20240612-113556-marostegui.json
  • 11:35 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 11:31 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 11:31 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:30 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:22 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1031.eqiad.wmnet with OS bookworm
  • 11:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198', diff saved to https://phabricator.wikimedia.org/P64689 and previous config saved to /var/cache/conftool/dbconfig/20240612-112048-marostegui.json
  • 11:14 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:14 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:13 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:12 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:12 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:10 moritzm: rebalance ganeti cluster in eqsin following reboots
  • 11:08 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 11:08 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for EntitySchemaSlotViewRenderer: Fix Phan failure (duration: 12m 10s)
  • 11:08 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 11:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1198 (T367261)', diff saved to https://phabricator.wikimedia.org/P64688 and previous config saved to /var/cache/conftool/dbconfig/20240612-110541-marostegui.json
  • 11:04 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety" "Zabe" --reason "per request T367217"
  • 11:03 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl1003.eqiad.wmnet
  • 11:03 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:03 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 11:01 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department" "Wikimedia Foundation/Legal" "Zabe" --reason "per request T367216"
  • 11:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5007.eqsin.wmnet
  • 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5007.eqsin.wmnet
  • 10:58 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 10:58 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 10:58 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for EntitySchemaSlotViewRenderer: Fix Phan failure synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:57 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Global Advocacy/Conversation hours and Events" "Wikimedia Foundation/Legal/Global Advocacy/Conversation hours and Events" "Zabe" --reason "per request T367219"
  • 10:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1198 (T367261)', diff saved to https://phabricator.wikimedia.org/P64687 and previous config saved to /var/cache/conftool/dbconfig/20240612-105615-marostegui.json
  • 10:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 10:56 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for EntitySchemaSlotViewRenderer: Fix Phan failure
  • 10:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1198.eqiad.wmnet with reason: Maintenance
  • 10:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367261)', diff saved to https://phabricator.wikimedia.org/P64686 and previous config saved to /var/cache/conftool/dbconfig/20240612-105554-marostegui.json
  • 10:54 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
  • 10:54 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 10:53 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Global Advocacy/About" "Wikimedia Foundation/Legal/Global Advocacy/About" "Zabe" --reason "per request T367219"
  • 10:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5007.eqsin.wmnet
  • 10:52 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1031.eqiad.wmnet with reason: host reimage
  • 10:48 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl1003.eqiad.wmnet
  • 10:46 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl1003.eqiad.wmnet
  • 10:41 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Global Advocacy" "Wikimedia Foundation/Legal/Global Advocacy" "Zabe" --reason "per request T367219"
  • 10:41 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1019.eqiad.wmnet
  • 10:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P64685 and previous config saved to /var/cache/conftool/dbconfig/20240612-104047-marostegui.json
  • 10:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1031.eqiad.wmnet with OS bookworm
  • 10:27 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1019.eqiad.wmnet
  • 10:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5007.eqsin.wmnet
  • 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189', diff saved to https://phabricator.wikimedia.org/P64684 and previous config saved to /var/cache/conftool/dbconfig/20240612-102540-marostegui.json
  • 10:25 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 10:25 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 10:25 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 10:24 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 10:24 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 10:23 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 10:23 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 10:23 godog: remove MediaWiki.jawiki.GrowthExperiments.NewcomerTask.update_.* from graphite hosts - T362633
  • 10:23 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 10:23 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 10:22 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 10:19 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s6
  • 10:19 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1019.eqiad.wmnet,service=s4
  • 10:19 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Grants:Community Resources" "Wikimedia Foundation/Advancement/Community Growth/Community Resources" "Zabe" --reason "per request T365837"
  • 10:17 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 10:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 10:16 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 10:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 10:16 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 10:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 10:16 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 10:16 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 10:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on 9 hosts with reason: decommissioning
  • 10:15 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 10:15 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 10:15 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 10:15 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on 9 hosts with reason: decommissioning
  • 10:14 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 10:14 jayme@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 10:10 claime: Depooling mw2281.codfw.wmnet,mw22[83-90].codfw.wmnet for decommission - T367275
  • 10:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1189 (T367261)', diff saved to https://phabricator.wikimedia.org/P64683 and previous config saved to /var/cache/conftool/dbconfig/20240612-101032-marostegui.json
  • 10:08 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 10:07 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 10:07 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 10:07 zabe: zabe@mwmaint1002:~$ foreachwikiindblist 'all - s4' refreshImageMetadata.php --mime image/webp # T364680
  • 09:48 fabfur: disabling puppet on cp4037 to test benthos configuration (T360454)
  • 09:47 fabfur: disabling puppet on cp4037 to test benthos configuration
  • 09:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P64680 and previous config saved to /var/cache/conftool/dbconfig/20240612-094738-marostegui.json
  • 09:47 _joe_: running dump_cloud_ip_ranges on puppetmaster1001 to test fixed script
  • 09:43 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s7
  • 09:43 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1018.eqiad.wmnet,service=s2
  • 09:33 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175', diff saved to https://phabricator.wikimedia.org/P64679 and previous config saved to /var/cache/conftool/dbconfig/20240612-093231-marostegui.json
  • 09:32 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1175 (T367261)', diff saved to https://phabricator.wikimedia.org/P64678 and previous config saved to /var/cache/conftool/dbconfig/20240612-091724-marostegui.json
  • 09:11 moritzm: failover ganeti cluster for eqsin to ganeti5004
  • 09:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1175 (T367261)', diff saved to https://phabricator.wikimedia.org/P64677 and previous config saved to /var/cache/conftool/dbconfig/20240612-090959-marostegui.json
  • 09:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 09:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1175.eqiad.wmnet with reason: Maintenance
  • 09:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367261)', diff saved to https://phabricator.wikimedia.org/P64676 and previous config saved to /var/cache/conftool/dbconfig/20240612-090937-marostegui.json
  • 09:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64675 and previous config saved to /var/cache/conftool/dbconfig/20240612-090834-ladsgroup.json
  • 09:06 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 09:04 Lucas_WMDE: START lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwiki --current --all --touched-after=20240524120000 --start '["55386869"]' 2>&1 | tee -a ~/T315510-enwiki-9; date
  • 09:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64674 and previous config saved to /var/cache/conftool/dbconfig/20240612-090435-ladsgroup.json
  • 09:04 Lucas_WMDE: STOPPED lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwiki --current --all --touched-after=20240524120000 --start '["55019880"]' 2>&1 | tee -a ~/T315510-enwiki-8; date # Ctrl+C, had become very slow, trying restart
  • 08:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P64673 and previous config saved to /var/cache/conftool/dbconfig/20240612-085430-marostegui.json
  • 08:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64672 and previous config saved to /var/cache/conftool/dbconfig/20240612-085329-ladsgroup.json
  • 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5006.eqsin.wmnet
  • 08:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5006.eqsin.wmnet
  • 08:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64671 and previous config saved to /var/cache/conftool/dbconfig/20240612-084929-ladsgroup.json
  • 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5006.eqsin.wmnet
  • 08:42 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1002.eqiad.wmnet with reason: host reimage
  • 08:42 zabe: zabe@mwmaint1002:~$ mwscript refreshImageMetadata.php commonswiki --mime image/webp # T364680
  • 08:39 slyngshede@cumin1002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging Mike Pham out of all services on: 2200 hosts
  • 08:39 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1002.eqiad.wmnet with reason: host reimage
  • 08:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166', diff saved to https://phabricator.wikimedia.org/P64670 and previous config saved to /var/cache/conftool/dbconfig/20240612-083923-marostegui.json
  • 08:38 slyngshede@cumin1002: START - Cookbook sre.idm.logout Logging Mike Pham out of all services on: 2200 hosts
  • 08:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 50%: Maint over', diff saved to https://phabricator.wikimedia.org/P64669 and previous config saved to /var/cache/conftool/dbconfig/20240612-083824-ladsgroup.json
  • 08:36 Lucas_WMDE: lucaswerkmeister-wmde@deploy1002 ~ $ mwscript-k8s --comment 'T367174, P12703' extensions/Wikibase/repo/maintenance/changePropertyDataType.php wikidatawiki -- --property-id P12703 --new-data-type external-id --summary 'T367174' # succeeded
  • 08:35 Lucas_WMDE: lucaswerkmeister-wmde@deploy1002 ~ $ mwscript-k8s --comment 'T367174, P12583' extensions/Wikibase/repo/maintenance/changePropertyDataType.php wikidatawiki -- --property-id P12583 --new-data-type external-id --summary 'T367174' # succeeded
  • 08:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 50%: Maint over', diff saved to https://phabricator.wikimedia.org/P64668 and previous config saved to /var/cache/conftool/dbconfig/20240612-083424-ladsgroup.json
  • 08:28 brouberol@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 08:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 08:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 08:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2123', diff saved to https://phabricator.wikimedia.org/P64667 and previous config saved to /var/cache/conftool/dbconfig/20240612-082702-marostegui.json
  • 08:26 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_codfw
  • 08:26 fabfur: start rebooting all cp-upload_codfw hosts for T366555 (spaced 1.5 hrs)
  • 08:25 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 08:25 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1002
  • 08:25 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1002
  • 08:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1166 (T367261)', diff saved to https://phabricator.wikimedia.org/P64666 and previous config saved to /var/cache/conftool/dbconfig/20240612-082415-marostegui.json
  • 08:24 brouberol@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 08:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2214 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64665 and previous config saved to /var/cache/conftool/dbconfig/20240612-082318-ladsgroup.json
  • 08:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 08:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 08:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64664 and previous config saved to /var/cache/conftool/dbconfig/20240612-081918-ladsgroup.json
  • 08:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5006.eqsin.wmnet
  • 08:17 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host ganeti1019.eqiad.wmnet with OS bullseye
  • 08:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2123 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64663 and previous config saved to /var/cache/conftool/dbconfig/20240612-081643-root.json
  • 08:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1166 (T367261)', diff saved to https://phabricator.wikimedia.org/P64662 and previous config saved to /var/cache/conftool/dbconfig/20240612-081551-marostegui.json
  • 08:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1166.eqiad.wmnet with reason: Maintenance
  • 08:15 brouberol@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 08:15 brouberol@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 08:12 brouberol@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 08:12 brouberol@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 08:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P64661 and previous config saved to /var/cache/conftool/dbconfig/20240612-081158-ladsgroup.json
  • 08:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 08:11 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 08:09 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 08:09 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 08:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 08:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 07:42 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5005.eqsin.wmnet
  • 07:42 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5005.eqsin.wmnet
  • 07:36 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1019.eqiad.wmnet with OS bullseye
  • 07:36 jmm@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host ganeti1019.eqiad.wmnet with OS bullseye
  • 07:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5005.eqsin.wmnet
  • 07:23 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5005.eqsin.wmnet
  • 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti5004.eqsin.wmnet
  • 07:21 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti5004.eqsin.wmnet
  • 07:20 marostegui: dbmaint optimize pagelinks on old s6 codfw master db2214 T364069
  • 07:16 kartik@deploy1002: Finished scap: Backport for Content Translation: Set MT threshold 85% in the Portuguese Wikipedia (T356356) (duration: 13m 11s)
  • 07:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Long schema change
  • 07:14 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti5004.eqsin.wmnet
  • 07:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Long schema change
  • 07:14 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 07:14 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 07:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2214.codfw.wmnet with reason: Long schema change
  • 07:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2214.codfw.wmnet with reason: Long schema change
  • 07:14 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti5004.eqsin.wmnet
  • 07:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2214 T367262', diff saved to https://phabricator.wikimedia.org/P64660 and previous config saved to /var/cache/conftool/dbconfig/20240612-071340-root.json
  • 07:12 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2129 to s6 primary T367262', diff saved to https://phabricator.wikimedia.org/P64659 and previous config saved to /var/cache/conftool/dbconfig/20240612-071158-root.json
  • 07:06 kartik@deploy1002: kartik: Continuing with sync
  • 07:05 kartik@deploy1002: kartik: Backport for Content Translation: Set MT threshold 85% in the Portuguese Wikipedia (T356356) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:04 marostegui: Starting s6 codfw failover from db2214 to db2129 - T367262
  • 07:03 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 07:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2182 (T364069)', diff saved to https://phabricator.wikimedia.org/P64658 and previous config saved to /var/cache/conftool/dbconfig/20240612-070302-marostegui.json
  • 07:02 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 07:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 07:02 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 07:02 kartik@deploy1002: Started scap: Backport for Content Translation: Set MT threshold 85% in the Portuguese Wikipedia (T356356)
  • 07:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2182.codfw.wmnet with reason: Maintenance
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T364069)', diff saved to https://phabricator.wikimedia.org/P64657 and previous config saved to /var/cache/conftool/dbconfig/20240612-070240-marostegui.json
  • 07:02 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 06:58 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ganeti1019.eqiad.wmnet with OS bullseye
  • 06:55 moritzm: remove ganeti1019 from eqiad cluster T367071
  • 06:54 moritzm: rebalance ganeti clusters in codfw following reboots
  • 06:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P64656 and previous config saved to /var/cache/conftool/dbconfig/20240612-064733-marostegui.json
  • 06:44 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 06:43 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 06:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s6 T367262
  • 06:42 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2129 with weight 0 T367262', diff saved to https://phabricator.wikimedia.org/P64655 and previous config saved to /var/cache/conftool/dbconfig/20240612-064200-root.json
  • 06:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s6 T367262
  • 06:40 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 06:40 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 06:38 hashar@deploy1002: Finished deploy [gerrit/gerrit@69984f7]: wm-zuul-status: fix reload button - T360550 (duration: 00m 07s)
  • 06:38 hashar@deploy1002: Started deploy [gerrit/gerrit@69984f7]: wm-zuul-status: fix reload button - T360550
  • 06:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168', diff saved to https://phabricator.wikimedia.org/P64654 and previous config saved to /var/cache/conftool/dbconfig/20240612-063225-marostegui.json
  • 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2168 (T364069)', diff saved to https://phabricator.wikimedia.org/P64653 and previous config saved to /var/cache/conftool/dbconfig/20240612-061718-marostegui.json
  • 05:59 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 05:59 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 05:58 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 05:58 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 05:51 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 05:51 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 05:17 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 05:17 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 05:17 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 05:16 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 05:16 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 05:16 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 00:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2168 (T364069)', diff saved to https://phabricator.wikimedia.org/P64652 and previous config saved to /var/cache/conftool/dbconfig/20240612-005420-marostegui.json
  • 00:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 00:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2168.codfw.wmnet with reason: Maintenance
  • 00:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T364069)', diff saved to https://phabricator.wikimedia.org/P64651 and previous config saved to /var/cache/conftool/dbconfig/20240612-005347-marostegui.json
  • 00:53 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_codfw
  • 00:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P64650 and previous config saved to /var/cache/conftool/dbconfig/20240612-003840-marostegui.json
  • 00:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159', diff saved to https://phabricator.wikimedia.org/P64649 and previous config saved to /var/cache/conftool/dbconfig/20240612-002332-marostegui.json
  • 00:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2159 (T364069)', diff saved to https://phabricator.wikimedia.org/P64648 and previous config saved to /var/cache/conftool/dbconfig/20240612-000825-marostegui.json

2024-06-11

  • 23:45 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 23:45 eevans@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 22:56 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 22:29 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:aqs-codfw
  • 21:56 ladsgroup@deploy1002: Finished scap: Backport for Fix Linker::makeExternalLink build failures (T367127) (duration: 12m 33s)
  • 21:51 ejegg: fundraising civicrm upgraded from 7252b1b9 to f7855d25
  • 21:47 ladsgroup@deploy1002: matmarex, ladsgroup: Continuing with sync
  • 21:47 ladsgroup@deploy1002: matmarex, ladsgroup: Backport for Fix Linker::makeExternalLink build failures (T367127) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:44 ladsgroup@deploy1002: Started scap: Backport for Fix Linker::makeExternalLink build failures (T367127)
  • 21:42 ladsgroup@deploy1002: Finished scap: Backport for Reduce the threshold for section wide circuit breaking to 300 (duration: 12m 08s)
  • 21:33 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 21:32 ladsgroup@deploy1002: ladsgroup: Backport for Reduce the threshold for section wide circuit breaking to 300 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:30 ladsgroup@deploy1002: Started scap: Backport for Reduce the threshold for section wide circuit breaking to 300
  • 21:27 ladsgroup@deploy1002: Finished scap: Backport for [zghwiki] Add patroller and autopatrolled groups (T357411) (duration: 11m 53s)
  • 21:18 ladsgroup@deploy1002: pppery, ladsgroup: Continuing with sync
  • 21:18 ladsgroup@deploy1002: pppery, ladsgroup: Backport for [zghwiki] Add patroller and autopatrolled groups (T357411) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:16 ladsgroup@deploy1002: Started scap: Backport for [zghwiki] Add patroller and autopatrolled groups (T357411)
  • 21:15 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to the old pagelinks columns of s2 (T352010) (duration: 12m 02s)
  • 21:06 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 21:05 ladsgroup@deploy1002: ladsgroup: Backport for Stop writing to the old pagelinks columns of s2 (T352010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:03 ladsgroup@deploy1002: Started scap: Backport for Stop writing to the old pagelinks columns of s2 (T352010)
  • 21:01 ladsgroup@deploy1002: Finished scap: Backport for Avoid wrapping floated tables using computed styles (T366314) (duration: 14m 28s)
  • 20:52 ejegg: re-enabled fundraising scheduled jobs
  • 20:52 ladsgroup@deploy1002: jdlrobson, ladsgroup: Continuing with sync
  • 20:49 ladsgroup@deploy1002: jdlrobson, ladsgroup: Backport for Avoid wrapping floated tables using computed styles (T366314) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:46 ladsgroup@deploy1002: Started scap: Backport for Avoid wrapping floated tables using computed styles (T366314)
  • 20:46 ladsgroup@deploy1002: Finished scap: Backport for Drop unused config, enable responsive tables on group 0 (T301212 T366314) (duration: 14m 18s)
  • 20:36 ladsgroup@deploy1002: ladsgroup, jdlrobson: Continuing with sync
  • 20:34 ladsgroup@deploy1002: ladsgroup, jdlrobson: Backport for Drop unused config, enable responsive tables on group 0 (T301212 T366314) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:31 ladsgroup@deploy1002: Started scap: Backport for Drop unused config, enable responsive tables on group 0 (T301212 T366314)
  • 20:30 ladsgroup@deploy1002: Finished scap: Backport for [ptwikinews] Set atom feed link (T356003), [jawikinews] Set $wgArticleCountMethod to any (T364189) (duration: 12m 52s)
  • 20:21 ladsgroup@deploy1002: pppery, ladsgroup: Continuing with sync
  • 20:20 ladsgroup@deploy1002: pppery, ladsgroup: Backport for [ptwikinews] Set atom feed link (T356003), [jawikinews] Set $wgArticleCountMethod to any (T364189) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:17 ladsgroup@deploy1002: Started scap: Backport for [ptwikinews] Set atom feed link (T356003), [jawikinews] Set $wgArticleCountMethod to any (T364189)
  • 20:16 ladsgroup@deploy1002: Finished scap: Backport for MediaWiki.org: restrict unfuzzy rights to autoconfirmed (T366994) (duration: 12m 54s)
  • 20:13 eevans@cumin1002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:aqs-codfw
  • 20:07 ladsgroup@deploy1002: ladsgroup, pppery: Continuing with sync
  • 20:06 ladsgroup@deploy1002: ladsgroup, pppery: Backport for MediaWiki.org: restrict unfuzzy rights to autoconfirmed (T366994) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:03 ladsgroup@deploy1002: Started scap: Backport for MediaWiki.org: restrict unfuzzy rights to autoconfirmed (T366994)
  • 19:38 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1002
  • 19:38 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1002
  • 19:33 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 19:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64646 and previous config saved to /var/cache/conftool/dbconfig/20240611-192403-ladsgroup.json
  • 19:23 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 19:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64645 and previous config saved to /var/cache/conftool/dbconfig/20240611-190855-ladsgroup.json
  • 18:59 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:aqs-eqiad
  • 18:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64644 and previous config saved to /var/cache/conftool/dbconfig/20240611-185348-ladsgroup.json
  • 18:46 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:44 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:41 ebernhardson@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64643 and previous config saved to /var/cache/conftool/dbconfig/20240611-183841-ladsgroup.json
  • 18:37 ebernhardson@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:22 brennen@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.9 refs T361403
  • 18:19 ebernhardson@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:19 ebernhardson@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2159 (T364069)', diff saved to https://phabricator.wikimedia.org/P64642 and previous config saved to /var/cache/conftool/dbconfig/20240611-181526-marostegui.json
  • 18:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 18:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 18:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 18:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2159.codfw.wmnet with reason: Maintenance
  • 18:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T364069)', diff saved to https://phabricator.wikimedia.org/P64641 and previous config saved to /var/cache/conftool/dbconfig/20240611-181448-marostegui.json
  • 18:10 brennen: 1.43.0-wmf.9 train (T361403): no blockers, rolling to group0
  • 18:08 ejegg: stopped fundraising scheduled jobs
  • 17:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P64640 and previous config saved to /var/cache/conftool/dbconfig/20240611-175941-marostegui.json
  • 17:59 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:58 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:56 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:56 taavi@deploy1002: Finished scap: Backport for wikitech: Stop loading OpenStackManager (T161553 T338477 T359544) (duration: 12m 00s)
  • 17:56 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:47 taavi@deploy1002: taavi: Continuing with sync
  • 17:47 taavi@deploy1002: taavi: Backport for wikitech: Stop loading OpenStackManager (T161553 T338477 T359544) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:45 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:45 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 17:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150', diff saved to https://phabricator.wikimedia.org/P64639 and previous config saved to /var/cache/conftool/dbconfig/20240611-174434-marostegui.json
  • 17:44 taavi@deploy1002: Started scap: Backport for wikitech: Stop loading OpenStackManager (T161553 T338477 T359544)
  • 17:37 rzl@deploy1002: Finished scap: (no justification provided) (duration: 11m 40s)
  • 17:33 rzl: rzl@cumin2002:~$ sudo cumin 'C:profile::mediawiki::webserver' 'enable-puppet T366649'
  • 17:33 rzl@deploy1002: rzl: Continuing with sync
  • 17:30 rzl@deploy1002: rzl: (no justification provided) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2150 (T364069)', diff saved to https://phabricator.wikimedia.org/P64638 and previous config saved to /var/cache/conftool/dbconfig/20240611-172928-marostegui.json
  • 17:26 rzl@deploy1002: Started scap: (no justification provided)
  • 17:14 rzl: rzl@cumin2002:~$ sudo cumin 'C:profile::mediawiki::webserver' 'disable-puppet T366649'
  • 17:11 ejegg: fundraising civicrm upgraded from ebfbad86 to 7252b1b9
  • 17:09 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:09 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:09 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 17:08 ebernhardson@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:08 ebernhardson@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:04 ebernhardson@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:04 ebernhardson@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:04 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
  • 17:04 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
  • 16:59 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 16:56 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 16:56 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
  • 16:56 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 16:53 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 16:53 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 16:51 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 16:47 ryankemper@cumin2002: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop test cluster
  • 16:40 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:restbase-codfw
  • 16:37 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 16:36 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 16:35 ebernhardson@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:35 ebernhardson@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:33 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "updated wikikube-ctrl1002 status - kamila@cumin1002 - T366204"
  • 16:31 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1013.eqiad.wmnet|wikikube-worker1014.eqiad.wmnet|wikikube-worker1017.eqiad.wmnet|wikikube-worker1018.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 16:31 claime: pool and uncordon wikikube-worker1013.eqiad.wmnet,wikikube-worker1014.eqiad.wmnet,wikikube-worker1017.eqiad.wmnet,wikikube-worker1018.eqiad.wmnet - T351074
  • 16:31 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "updated wikikube-ctrl1002 status - kamila@cumin1002 - T366204"
  • 16:29 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1002.eqiad.wmnet with OS bullseye
  • 16:28 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:27 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl1001.eqiad.wmnet
  • 16:26 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 16:21 arnaudb@cumin1002: dbctl commit (dc=all): 'es1038 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64637 and previous config saved to /var/cache/conftool/dbconfig/20240611-162154-arnaudb.json
  • 16:21 claime: homer 'cr*eqiad*' commit 'T351074'
  • 16:16 elukey: manual run of docker-report-k8s on build2001 (some failed results)
  • 16:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1017.eqiad.wmnet with OS bullseye
  • 16:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1018.eqiad.wmnet with OS bullseye
  • 16:07 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1002
  • 16:06 arnaudb@cumin1002: dbctl commit (dc=all): 'es1038 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64636 and previous config saved to /var/cache/conftool/dbconfig/20240611-160649-arnaudb.json
  • 16:06 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop test cluster
  • 16:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1014.eqiad.wmnet with OS bullseye
  • 16:05 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1002
  • 16:05 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:05 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update moved wikikube-ctrl1002 host in eqiad - kamila@cumin1002"
  • 16:04 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-main: sync
  • 16:04 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Update moved wikikube-ctrl1002 host in eqiad - kamila@cumin1002"
  • 16:04 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-main: sync
  • 16:03 claime: roll restarting eventgate-main eqiad
  • 16:00 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 15:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1017.eqiad.wmnet with reason: host reimage
  • 15:51 arnaudb@cumin1002: dbctl commit (dc=all): 'es1038 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64635 and previous config saved to /var/cache/conftool/dbconfig/20240611-155143-arnaudb.json
  • 15:51 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/termbox: apply
  • 15:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1018.eqiad.wmnet with reason: host reimage
  • 15:50 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/termbox: apply
  • 15:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1014.eqiad.wmnet with reason: host reimage
  • 15:45 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1018.eqiad.wmnet with reason: host reimage
  • 15:45 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1017.eqiad.wmnet with reason: host reimage
  • 15:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1014.eqiad.wmnet with reason: host reimage
  • 14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:35:00 on 6 hosts with reason: upgrade lsw1-f5-eqiad
  • 14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:35:00 on 6 hosts with reason: upgrade lsw1-f5-eqiad
  • 14:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ping2003.codfw.wmnet
  • 14:53 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1013.eqiad.wmnet with OS bullseye
  • 14:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1013.eqiad.wmnet on all recursors
  • 14:52 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1013.eqiad.wmnet on all recursors
  • 14:52 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-f5-eqiad,lsw1-f5-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: prep upgrade of device
  • 14:52 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:51 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1403 to wikikube-worker1014
  • 14:51 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-f5-eqiad,lsw1-f5-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: prep upgrade of device
  • 14:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1037.eqiad.wmnet
  • 14:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1037.eqiad.wmnet
  • 14:51 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from mw1403 to wikikube-worker1014.eqiad.wmnet
  • 14:51 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1403 to wikikube-worker1014.eqiad.wmnet
  • 14:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3007.esams.wmnet
  • 14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1402 to wikikube-worker1013
  • 14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1013
  • 14:46 arnaudb@cumin1002: dbctl commit (dc=all): 'es1038 depool T365982', diff saved to https://phabricator.wikimedia.org/P64631 and previous config saved to /var/cache/conftool/dbconfig/20240611-144624-arnaudb.json
  • 14:45 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1013
  • 14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1402 to wikikube-worker1013 - cgoubert@cumin1002"
  • 14:45 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 14:44 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 14:44 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1402 to wikikube-worker1013 - cgoubert@cumin1002"
  • 14:44 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl1002.eqiad.wmnet
  • 14:44 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:44 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 14:44 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3007.esams.wmnet
  • 14:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1037.eqiad.wmnet
  • 14:42 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 14:41 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:39 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: apply
  • 14:38 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 14:38 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: apply
  • 14:38 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1402 to wikikube-worker1013
  • 14:36 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 14:35 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3008.esams.wmnet
  • 14:35 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 14:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3008.esams.wmnet
  • 14:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1038.eqiad.wmnet with reason: T365982
  • 14:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on es1038.eqiad.wmnet with reason: T365982
  • 14:29 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl1002.eqiad.wmnet
  • 14:29 claime: depooling mw1402 mw1403 mw1406 mw1411 for reimage to k8s - T351074
  • 14:29 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:28 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Enable Vector appearance menu & larger font-size on wikipedias (T362148) (duration: 19m 08s)
  • 14:28 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:20:00 on lsw1-f5-eqiad.mgmt with reason: prep upgrade of device
  • 14:28 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:20:00 on lsw1-f5-eqiad.mgmt with reason: prep upgrade of device
  • 14:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3008.esams.wmnet
  • 14:26 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1037.eqiad.wmnet
  • 14:20 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 14:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 jdrewniak, lucaswerkmeister-wmde: Continuing with sync
  • 14:18 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl1002.eqiad.wmnet
  • 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1036.eqiad.wmnet
  • 14:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1036.eqiad.wmnet
  • 14:12 logmsgbot: lucaswerkmeister-wmde@deploy1002 jdrewniak, lucaswerkmeister-wmde: Backport for Enable Vector appearance menu & larger font-size on wikipedias (T362148) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3008.esams.wmnet
  • 14:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1036.eqiad.wmnet
  • 14:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Enable Vector appearance menu & larger font-size on wikipedias (T362148)
  • 14:08 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:07 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Enable CampaignEvents on swahili wikipedia (T366502) (duration: 14m 40s)
  • 14:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bullseye
  • 14:04 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s3
  • 14:04 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
  • 14:02 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:01 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1036.eqiad.wmnet
  • 14:01 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1017.eqiad.wmnet
  • 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1035.eqiad.wmnet
  • 13:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1035.eqiad.wmnet
  • 13:58 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, cmelo: Continuing with sync
  • 13:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:58 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:57 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:55 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:55 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, cmelo: Backport for Enable CampaignEvents on swahili wikipedia (T366502) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1035.eqiad.wmnet
  • 13:52 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Enable CampaignEvents on swahili wikipedia (T366502)
  • 13:52 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:51 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Configures the necessary user rights for CampaignEvents on swahili (T366502) (duration: 44m 51s)
  • 13:50 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts stat1007.eqiad.wmnet
  • 13:50 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:50 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1017.eqiad.wmnet
  • 13:49 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:49 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:48 btullis@cumin1002: START - Cookbook sre.dns.netbox
  • 13:47 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s3
  • 13:47 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
  • 13:46 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1038.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1037.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1036.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:46 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1035.mgmt.eqiad.wmnet with reboot policy FORCED
  • 13:45 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:45 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1035-38 - jclark@cumin1002"
  • 13:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1035.eqiad.wmnet
  • 13:45 vgutierrez: rolling switch from tcp-mss-clamper to ferm based MSS clamping on A:ncredir - T365689
  • 13:44 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1035-38 - jclark@cumin1002"
  • 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1032.eqiad.wmnet
  • 13:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1032.eqiad.wmnet
  • 13:42 jiji@cumin1002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:wikikube-worker-eqiad
  • 13:40 btullis@cumin1002: START - Cookbook sre.hosts.decommission for hosts stat1007.eqiad.wmnet
  • 13:40 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 13:40 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts stat1006.eqiad.wmnet
  • 13:40 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:40 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: stat1006.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
  • 13:36 vgutierrez: repool ncredir6001 - T365689
  • 13:36 eevans@cumin1002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:restbase-codfw
  • 13:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1032.eqiad.wmnet
  • 13:33 moritzm: failover ganeti cluster for esams01 to ganeti3005
  • 13:32 moritzm: failover ganeti cluster for esams02 to ganeti3006
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3006.esams.wmnet
  • 13:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3006.esams.wmnet
  • 13:22 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=s5
  • 13:22 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1016.eqiad.wmnet,service=s8
  • 13:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1032.eqiad.wmnet
  • 13:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T352010)', diff saved to https://phabricator.wikimedia.org/P64630 and previous config saved to /var/cache/conftool/dbconfig/20240611-132043-ladsgroup.json
  • 13:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 cmelo, lucaswerkmeister-wmde: Continuing with sync
  • 13:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3006.esams.wmnet
  • 13:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1031.eqiad.wmnet
  • 13:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1031.eqiad.wmnet
  • 13:15 vgutierrez: depool ncredir6001 - T365689
  • 13:11 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1031.eqiad.wmnet
  • 13:11 logmsgbot: lucaswerkmeister-wmde@deploy1002 cmelo, lucaswerkmeister-wmde: Backport for Configures the necessary user rights for CampaignEvents on swahili (T366502) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:10 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: stat1006.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
  • 13:09 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 13:09 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 13:09 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:07 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:07 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:06 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_codfw
  • 13:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3006.esams.wmnet
  • 13:06 vgutierrez: disable puppet on A:ncredir before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1035724 - T365689
  • 13:06 fabfur: start rebooting all cp-text_codfw hosts for T366555 (spaced 1.5 hrs)
  • 13:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Configures the necessary user rights for CampaignEvents on swahili (T366502)
  • 13:06 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 13:06 btullis@cumin1002: START - Cookbook sre.dns.netbox
  • 13:06 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P64629 and previous config saved to /var/cache/conftool/dbconfig/20240611-130535-ladsgroup.json
  • 13:04 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1016.eqiad.wmnet
  • 13:03 vgutierrez: repool text@eqiad with IPIP encapsulation enabled - T366466
  • 13:02 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 13:01 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 12:59 btullis@cumin1002: START - Cookbook sre.hosts.decommission for hosts stat1006.eqiad.wmnet
  • 12:53 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1016.eqiad.wmnet
  • 12:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205', diff saved to https://phabricator.wikimedia.org/P64628 and previous config saved to /var/cache/conftool/dbconfig/20240611-125028-ladsgroup.json
  • 12:50 vgutierrez: rolling restart of pybal on lvs1020 and lvs1017 - T366466
  • 12:49 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet,service=s8
  • 12:49 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1016.eqiad.wmnet,service=s5
  • 12:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2205 (T352010)', diff saved to https://phabricator.wikimedia.org/P64627 and previous config saved to /var/cache/conftool/dbconfig/20240611-123521-ladsgroup.json
  • 12:32 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 12:32 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1223.eqiad.wmnet with reason: Maintenance
  • 12:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2205 (T352010)', diff saved to https://phabricator.wikimedia.org/P64626 and previous config saved to /var/cache/conftool/dbconfig/20240611-123046-ladsgroup.json
  • 12:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 12:30 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2205.codfw.wmnet with reason: Maintenance
  • 12:26 fabfur: cancelled previous command (text@eqiad is going to be depooled at the same time)
  • 12:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti3005.esams.wmnet
  • 12:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti3005.esams.wmnet
  • 12:23 fabfur: start rebooting all cp-text_codfw hosts for T366555 (spaced 1.5 hrs)
  • 12:19 vgutierrez: depool text@eqiad before enabling IPIP encapsulation - T366466
  • 12:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti3005.esams.wmnet
  • 12:14 sfaci@deploy1002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 12:13 sfaci@deploy1002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 12:13 sfaci@deploy1002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 12:11 sfaci@deploy1002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 12:10 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti3005.esams.wmnet
  • 12:10 sfaci@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 12:09 sfaci@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 12:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64625 and previous config saved to /var/cache/conftool/dbconfig/20240611-120710-ladsgroup.json
  • 12:07 sfaci@deploy1002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 12:06 claime: Finished kafka-main reboots in codfw
  • 12:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-main-codfw
  • 12:05 sfaci@deploy1002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 12:05 sfaci@deploy1002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 12:04 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts stat1005.eqiad.wmnet
  • 12:04 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:04 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: stat1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
  • 12:04 moritzm: rebalance ganeti cluster in ulsfo following reboots
  • 12:04 sfaci@deploy1002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 12:03 sfaci@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 12:02 sfaci@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4005.ulsfo.wmnet
  • 12:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4005.ulsfo.wmnet
  • 11:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: repl issues
  • 11:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: repl issues
  • 11:57 sfaci@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 11:55 sfaci@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 11:55 sfaci@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 11:55 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: stat1005.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
  • 11:54 sfaci@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 11:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64624 and previous config saved to /var/cache/conftool/dbconfig/20240611-115203-ladsgroup.json
  • 11:51 jayme: removed similar-users deployments from all k8s clusters - T345274
  • 11:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64621 and previous config saved to /var/cache/conftool/dbconfig/20240611-113656-ladsgroup.json
  • 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2150 (T364069)', diff saved to https://phabricator.wikimedia.org/P64620 and previous config saved to /var/cache/conftool/dbconfig/20240611-113452-marostegui.json
  • 11:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 11:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2150.codfw.wmnet with reason: Maintenance
  • 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T364069)', diff saved to https://phabricator.wikimedia.org/P64619 and previous config saved to /var/cache/conftool/dbconfig/20240611-113430-marostegui.json
  • 11:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1030.eqiad.wmnet
  • 11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64618 and previous config saved to /var/cache/conftool/dbconfig/20240611-113121-root.json
  • 11:29 moritzm: failover ganeti master in ulsfo to ganeti4008
  • 11:27 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: stat1004.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - btullis@cumin1002"
  • 11:26 klausman@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 11:24 klausman@deploy1002: helmfile [ml-serve-eqiad] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 11:23 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4008.ulsfo.wmnet
  • 11:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4008.ulsfo.wmnet
  • 11:23 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:22 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64617 and previous config saved to /var/cache/conftool/dbconfig/20240611-112149-ladsgroup.json
  • 11:21 klausman@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 11:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P64616 and previous config saved to /var/cache/conftool/dbconfig/20240611-111922-marostegui.json
  • 11:16 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4008.ulsfo.wmnet
  • 11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1223 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64615 and previous config saved to /var/cache/conftool/dbconfig/20240611-111616-root.json
  • 11:15 klausman@deploy1002: helmfile [ml-serve-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 11:13 jayme: removing similar-users service - T345274
  • 11:12 btullis@cumin1002: START - Cookbook sre.dns.netbox
  • 11:09 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet,service=s4
  • 11:09 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet,service=s6
  • 11:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4008.ulsfo.wmnet
  • 11:07 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1015.eqiad.wmnet
  • 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4007.ulsfo.wmnet
  • 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4007.ulsfo.wmnet
  • 11:06 cgoubert@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-main-codfw
  • 11:05 klausman@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 11:05 claime: Starting kafka-main reboots in codfw
  • 11:04 btullis@cumin1002: START - Cookbook sre.hosts.decommission for hosts stat1004.eqiad.wmnet
  • 11:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122', diff saved to https://phabricator.wikimedia.org/P64614 and previous config saved to /var/cache/conftool/dbconfig/20240611-110414-marostegui.json
  • 11:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4007.ulsfo.wmnet
  • 10:57 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 10:57 klausman@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'recommendation-api-ng' for release 'main' .
  • 10:50 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4007.ulsfo.wmnet
  • 10:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2122 (T364069)', diff saved to https://phabricator.wikimedia.org/P64613 and previous config saved to /var/cache/conftool/dbconfig/20240611-104908-marostegui.json
  • 10:48 marostegui: dbmaint codfw s5 deploy schema change on db2123 T364069
  • 10:48 marostegui: dbmaint codfw s5 deploy schema change on db2123 T364299
  • 10:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db2123.codfw.wmnet with reason: Long schema change
  • 10:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db2123.codfw.wmnet with reason: Long schema change
  • 10:45 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1015.eqiad.wmnet
  • 10:45 claime: move 90% of traffic to mw-on-k8s - T362323
  • 10:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2123 T367145', diff saved to https://phabricator.wikimedia.org/P64612 and previous config saved to /var/cache/conftool/dbconfig/20240611-104336-root.json
  • 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti4006.ulsfo.wmnet
  • 10:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti4006.ulsfo.wmnet
  • 10:42 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2213 to s5 primary T367145', diff saved to https://phabricator.wikimedia.org/P64611 and previous config saved to /var/cache/conftool/dbconfig/20240611-104232-root.json
  • 10:42 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 10:42 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 10:42 marostegui: Starting s5 codfw failover from db2123 to db2213 - T367145
  • 10:41 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 10:40 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet,service=s6
  • 10:40 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet,service=s4
  • 10:39 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 10:39 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 10:38 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 10:38 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 10:38 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 10:37 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 10:37 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 10:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti4006.ulsfo.wmnet
  • 10:34 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 10:32 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2213 from API/vslow/dump T367145', diff saved to https://phabricator.wikimedia.org/P64610 and previous config saved to /var/cache/conftool/dbconfig/20240611-102900-root.json
  • 10:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s5 T367145
  • 10:28 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2213 with weight 0 T367145', diff saved to https://phabricator.wikimedia.org/P64609 and previous config saved to /var/cache/conftool/dbconfig/20240611-102820-root.json
  • 10:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s5 T367145
  • 10:27 jayme@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 10:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T352010)', diff saved to https://phabricator.wikimedia.org/P64608 and previous config saved to /var/cache/conftool/dbconfig/20240611-102444-ladsgroup.json
  • 10:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 10:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 10:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64607 and previous config saved to /var/cache/conftool/dbconfig/20240611-102125-ladsgroup.json
  • 10:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 10:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 10:20 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti4006.ulsfo.wmnet
  • 10:18 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1030.eqiad.wmnet
  • 10:16 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 10:16 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 10:16 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 10:16 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1014.eqiad.wmnet,service=s7
  • 10:16 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1014.eqiad.wmnet,service=s2
  • 10:16 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 10:15 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
  • 10:15 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
  • 10:15 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1014.eqiad.wmnet
  • 10:15 jayme@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 10:14 filippo@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-logging-eqiad
  • 10:14 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T360332)', diff saved to https://phabricator.wikimedia.org/P64606 and previous config saved to /var/cache/conftool/dbconfig/20240611-101400-arnaudb.json
  • 10:11 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/linkrecommendation: apply
  • 10:10 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/linkrecommendation: apply
  • 10:10 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/linkrecommendation: apply
  • 10:09 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
  • 10:09 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/linkrecommendation: apply
  • 10:08 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx1001.wikimedia.org
  • 10:08 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/linkrecommendation: apply
  • 10:08 jayme@deploy1002: helmfile [staging] START helmfile.d/services/linkrecommendation: apply
  • 10:07 brouberol@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 10:07 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 10:06 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:06 brouberol@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 10:06 brouberol@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:06 brouberol@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 10:04 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx1001.wikimedia.org
  • 10:04 brouberol@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:04 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1014.eqiad.wmnet
  • 10:03 brouberol@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 10:02 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:02 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:02 brouberol@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:01 jmm@cumin2002: END (PASS) - Cookbook sre.pki.restart-reboot (exit_code=0) rolling reboot on A:pki
  • 10:01 brouberol@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:01 brouberol@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:00 brouberol@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:00 sukhe: [end] running authdns-update to send Bolivia (BO) and Paraguay (PY) to magru: T346722
  • 09:59 brouberol@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:59 brouberol@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:59 sukhe: [start] running authdns-update to send Bolivia (BO) and Paraguay (PY) to magru
  • 09:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64605 and previous config saved to /var/cache/conftool/dbconfig/20240611-095853-arnaudb.json
  • 09:58 brouberol@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:58 brouberol@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:57 brouberol@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:57 brouberol@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:56 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
  • 09:56 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1014.eqiad.wmnet,service=s2
  • 09:56 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1014.eqiad.wmnet,service=s7
  • 09:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1028.eqiad.wmnet
  • 09:55 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1028.eqiad.wmnet
  • 09:49 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1028.eqiad.wmnet
  • 09:49 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet. on all recursors
  • 09:49 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet. on all recursors
  • 09:45 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 09:44 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 09:44 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 09:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P64604 and previous config saved to /var/cache/conftool/dbconfig/20240611-094347-arnaudb.json
  • 09:43 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 09:42 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) pki.discovery.wmnet. on all recursors
  • 09:42 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache pki.discovery.wmnet. on all recursors
  • 09:42 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 09:42 jmm@cumin2002: START - Cookbook sre.pki.restart-reboot rolling reboot on A:pki
  • 09:41 jayme@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 09:37 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp2027.codfw.wmnet
  • 09:36 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1028.eqiad.wmnet
  • 09:35 moritzm: rebalance ganeti clusters in codfw following reboots
  • 09:34 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 09:34 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 09:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T360332)', diff saved to https://phabricator.wikimedia.org/P64603 and previous config saved to /var/cache/conftool/dbconfig/20240611-092839-arnaudb.json
  • 09:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mx2001.wikimedia.org
  • 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1026.eqiad.wmnet
  • 09:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1026.eqiad.wmnet
  • 09:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T360332)', diff saved to https://phabricator.wikimedia.org/P64602 and previous config saved to /var/cache/conftool/dbconfig/20240611-092504-arnaudb.json
  • 09:24 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 09:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 09:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mx2001.wikimedia.org
  • 09:20 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1026.eqiad.wmnet
  • 09:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2022.codfw.wmnet
  • 09:19 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2022.codfw.wmnet
  • 09:16 filippo@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-logging-eqiad
  • 09:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2022.codfw.wmnet
  • 09:08 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1026.eqiad.wmnet
  • 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1025.eqiad.wmnet
  • 09:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1025.eqiad.wmnet
  • 09:04 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2022.codfw.wmnet
  • 09:01 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1025.eqiad.wmnet
  • 08:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2032.codfw.wmnet
  • 08:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2032.codfw.wmnet
  • 08:53 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:53 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2032.codfw.wmnet
  • 08:47 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:46 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:46 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 08:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2032.codfw.wmnet
  • 08:46 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 08:45 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 08:45 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 08:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2031.codfw.wmnet
  • 08:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2031.codfw.wmnet
  • 08:41 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1025.eqiad.wmnet
  • 08:38 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ time mwscript extensions/DiscussionTools/maintenance/persistRevisionThreadItems.php --wiki enwiki --current --all --touched-after=20240524120000 --start '["55019880"]' 2>&1 | tee -a ~/T315510-enwiki-8; date
  • 08:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2031.codfw.wmnet
  • 08:33 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp2027.ulsfo.wmnet
  • 08:32 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp2027.codfw.wmnet
  • 08:31 marostegui: Install 10.11 on db1153 (non used x2 replica) T365805
  • 08:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db1153.eqiad.wmnet with reason: Long schema change
  • 08:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db1153.eqiad.wmnet with reason: Long schema change
  • 08:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2031.codfw.wmnet
  • 08:31 filippo@cumin1002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-logging-codfw
  • 08:30 marostegui: Install 10.11 on db1153 (non used x2 replica)
  • 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1222 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64600 and previous config saved to /var/cache/conftool/dbconfig/20240611-081314-root.json
  • 08:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1024.eqiad.wmnet
  • 08:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1024.eqiad.wmnet
  • 08:02 gmodena@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 08:02 gmodena@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 07:58 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1024.eqiad.wmnet
  • 07:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1222 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64599 and previous config saved to /var/cache/conftool/dbconfig/20240611-075809-root.json
  • 07:55 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2029.codfw.wmnet
  • 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2030.codfw.wmnet
  • 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2030.codfw.wmnet
  • 07:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2030.codfw.wmnet
  • 07:47 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1024.eqiad.wmnet
  • 07:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1023.eqiad.wmnet
  • 07:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2030.codfw.wmnet
  • 07:44 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1023.eqiad.wmnet
  • 07:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1222 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64598 and previous config saved to /var/cache/conftool/dbconfig/20240611-074304-root.json
  • 07:40 kart_: Updated MinT to 2024-06-11-052620-production (T364122, T346226, T357548)
  • 07:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64597 and previous config saved to /var/cache/conftool/dbconfig/20240611-074009-root.json
  • 07:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1023.eqiad.wmnet
  • 07:37 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 07:36 filippo@cumin1002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-logging-codfw
  • 07:28 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 07:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1222 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64596 and previous config saved to /var/cache/conftool/dbconfig/20240611-072758-root.json
  • 07:26 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 07:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64595 and previous config saved to /var/cache/conftool/dbconfig/20240611-072504-root.json
  • 07:18 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 07:17 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 07:13 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 07:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1222 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64594 and previous config saved to /var/cache/conftool/dbconfig/20240611-071253-root.json
  • 07:11 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1023.eqiad.wmnet
  • 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64593 and previous config saved to /var/cache/conftool/dbconfig/20240611-070958-root.json
  • 07:05 arnaudb@deploy1002: Finished scap: Backport for Revert "dbconfig: temporary disable writes on es6" (duration: 11m 36s)
  • 07:02 moritzm: failover ganeti master in codfw to ganeti2020
  • 06:57 arnaudb@deploy1002: arnaudb: Continuing with sync
  • 06:56 arnaudb@deploy1002: arnaudb: Backport for Revert "dbconfig: temporary disable writes on es6" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64592 and previous config saved to /var/cache/conftool/dbconfig/20240611-065453-root.json
  • 06:54 arnaudb@deploy1002: Started scap: Backport for Revert "dbconfig: temporary disable writes on es6"
  • 06:40 arnaudb@cumin1002: dbctl commit (dc=all): 'mimic weight', diff saved to https://phabricator.wikimedia.org/P64591 and previous config saved to /var/cache/conftool/dbconfig/20240611-064041-arnaudb.json
  • 06:40 oblivian@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: incident in progress, blocking deploys --joe (duration: 15m 33s)
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64590 and previous config saved to /var/cache/conftool/dbconfig/20240611-063947-root.json
  • 06:39 arnaudb@cumin1002: dbctl commit (dc=all): 'mimic weight', diff saved to https://phabricator.wikimedia.org/P64589 and previous config saved to /var/cache/conftool/dbconfig/20240611-063903-arnaudb.json
  • 06:31 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote es1037 to es6 primary T367055', diff saved to https://phabricator.wikimedia.org/P64588 and previous config saved to /var/cache/conftool/dbconfig/20240611-063109-arnaudb.json
  • 06:30 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 06:30 arnaudb: Starting es6 eqiad failover from es1038 to es1037 - T367055
  • 06:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64587 and previous config saved to /var/cache/conftool/dbconfig/20240611-062441-root.json
  • 06:24 oblivian@deploy1002: Locking from deployment [ALL REPOSITORIES]: incident in progress, blocking deploys --joe
  • 06:23 arnaudb@cumin1002: dbctl commit (dc=all): 'Set es1037 with weight 0 T367055', diff saved to https://phabricator.wikimedia.org/P64586 and previous config saved to /var/cache/conftool/dbconfig/20240611-062353-arnaudb.json
  • 06:23 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es6 T367055
  • 06:23 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es6 T367055
  • 06:19 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 06:14 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64585 and previous config saved to /var/cache/conftool/dbconfig/20240611-061413-root.json
  • 06:12 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 06:11 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 06:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1233 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64584 and previous config saved to /var/cache/conftool/dbconfig/20240611-060935-root.json
  • 06:09 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 06:07 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 06:07 arnaudb@deploy1002: Finished scap: Backport for dbconfig: temporary disable writes on es6 (T367055) (duration: 15m 42s)
  • 05:59 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64583 and previous config saved to /var/cache/conftool/dbconfig/20240611-055907-root.json
  • 05:58 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: maintenance
  • 05:58 arnaudb@deploy1002: arnaudb: Continuing with sync
  • 05:58 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: maintenance
  • 05:58 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db1233', diff saved to https://phabricator.wikimedia.org/P64582 and previous config saved to /var/cache/conftool/dbconfig/20240611-055816-arnaudb.json
  • 05:56 arnaudb@deploy1002: arnaudb: Backport for dbconfig: temporary disable writes on es6 (T367055) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 05:51 arnaudb@deploy1002: Started scap: Backport for dbconfig: temporary disable writes on es6 (T367055)
  • 05:44 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64581 and previous config saved to /var/cache/conftool/dbconfig/20240611-054401-root.json
  • 05:28 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64580 and previous config saved to /var/cache/conftool/dbconfig/20240611-052856-root.json
  • 05:24 marostegui: dbmaint eqiad s3 deploy schema change on db1223 T364069
  • 05:22 marostegui: dbmaint eqiad s3 deploy schema change on db1223 T364299
  • 05:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1223.eqiad.wmnet with reason: Long schema change
  • 05:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1223.eqiad.wmnet with reason: Long schema change
  • 05:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1223 T367140', diff saved to https://phabricator.wikimedia.org/P64579 and previous config saved to /var/cache/conftool/dbconfig/20240611-052101-root.json
  • 05:20 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1157 to s3 primary and set section read-write T367140', diff saved to https://phabricator.wikimedia.org/P64578 and previous config saved to /var/cache/conftool/dbconfig/20240611-052000-root.json
  • 05:19 marostegui@cumin1002: dbctl commit (dc=all): 'Set s3 eqiad as read-only for maintenance - T367140', diff saved to https://phabricator.wikimedia.org/P64577 and previous config saved to /var/cache/conftool/dbconfig/20240611-051941-root.json
  • 05:19 marostegui: Starting s3 eqiad failover from db1223 to db1157 - T367140
  • 05:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64576 and previous config saved to /var/cache/conftool/dbconfig/20240611-051351-root.json
  • 05:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 24 hosts with reason: Primary switchover s3 T367140
  • 05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1157 with weight 0 T367140', diff saved to https://phabricator.wikimedia.org/P64575 and previous config saved to /var/cache/conftool/dbconfig/20240611-050351-root.json
  • 05:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 24 hosts with reason: Primary switchover s3 T367140
  • 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64574 and previous config saved to /var/cache/conftool/dbconfig/20240611-045845-root.json
  • 04:57 marostegui: dbmaint eqiad s2 deploy schema change on db1222 T364299
  • 04:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1222.eqiad.wmnet with reason: Long schema change
  • 04:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1222.eqiad.wmnet with reason: Long schema change
  • 04:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1222 T366687', diff saved to https://phabricator.wikimedia.org/P64573 and previous config saved to /var/cache/conftool/dbconfig/20240611-045447-root.json
  • 04:54 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1162 to s2 primary and set section read-write T366687', diff saved to https://phabricator.wikimedia.org/P64572 and previous config saved to /var/cache/conftool/dbconfig/20240611-045359-root.json
  • 04:53 marostegui@cumin1002: dbctl commit (dc=all): 'Set s2 eqiad as read-only for maintenance - T366687', diff saved to https://phabricator.wikimedia.org/P64571 and previous config saved to /var/cache/conftool/dbconfig/20240611-045341-root.json
  • 04:53 marostegui: Starting s2 eqiad failover from db1222 to db1162 - T366687
  • 04:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T364069)', diff saved to https://phabricator.wikimedia.org/P64570 and previous config saved to /var/cache/conftool/dbconfig/20240611-044616-marostegui.json
  • 04:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 04:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 04:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2140 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64569 and previous config saved to /var/cache/conftool/dbconfig/20240611-044339-root.json
  • 04:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T366687
  • 04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1162 with weight 0 T366687', diff saved to https://phabricator.wikimedia.org/P64568 and previous config saved to /var/cache/conftool/dbconfig/20240611-043333-marostegui.json
  • 04:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s2 T366687
  • 04:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T352010)', diff saved to https://phabricator.wikimedia.org/P64567 and previous config saved to /var/cache/conftool/dbconfig/20240611-041938-ladsgroup.json
  • 04:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P64566 and previous config saved to /var/cache/conftool/dbconfig/20240611-040432-ladsgroup.json
  • 04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.6 (duration: 01m 05s)
  • 04:00 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.9 refs T361403 (duration: 57m 19s)
  • 03:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P64565 and previous config saved to /var/cache/conftool/dbconfig/20240611-034925-ladsgroup.json
  • 03:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T352010)', diff saved to https://phabricator.wikimedia.org/P64564 and previous config saved to /var/cache/conftool/dbconfig/20240611-033418-ladsgroup.json
  • 03:02 mwpresync@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.9 refs T361403
  • 00:40 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:restbase-eqiad

2024-06-10

  • 23:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 23:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 22:36 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:36 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:30 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:30 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:28 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:27 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:25 reedy@deploy1002: Synchronized wmf-config/: sync interwiki lists (duration: 10m 07s)
  • 22:14 reedy@deploy1002: Synchronized langlist-labs: Add fr and bn (duration: 14m 29s)
  • 21:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 21:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 21:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T364069)', diff saved to https://phabricator.wikimedia.org/P64563 and previous config saved to /var/cache/conftool/dbconfig/20240610-215622-marostegui.json
  • 21:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P64562 and previous config saved to /var/cache/conftool/dbconfig/20240610-214115-marostegui.json
  • 21:27 eevans@cumin1002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:restbase-eqiad
  • 21:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P64561 and previous config saved to /var/cache/conftool/dbconfig/20240610-212608-marostegui.json
  • 21:19 ejegg: fundraising python tools upgraded from 8c98b674 to c51f6e62
  • 21:19 ejegg: Standalone SmashPig upgraded from edf573bb to 1d1b770c
  • 21:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T364069)', diff saved to https://phabricator.wikimedia.org/P64560 and previous config saved to /var/cache/conftool/dbconfig/20240610-211101-marostegui.json
  • 20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T352010)', diff saved to https://phabricator.wikimedia.org/P64559 and previous config saved to /var/cache/conftool/dbconfig/20240610-204622-ladsgroup.json
  • 20:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 20:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 20:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64558 and previous config saved to /var/cache/conftool/dbconfig/20240610-204600-ladsgroup.json
  • 20:36 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 20:36 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 20:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P64557 and previous config saved to /var/cache/conftool/dbconfig/20240610-203053-ladsgroup.json
  • 20:30 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 20:30 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 20:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P64556 and previous config saved to /var/cache/conftool/dbconfig/20240610-201546-ladsgroup.json
  • 20:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64555 and previous config saved to /var/cache/conftool/dbconfig/20240610-200039-ladsgroup.json
  • 19:58 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T364069)', diff saved to https://phabricator.wikimedia.org/P64554 and previous config saved to /var/cache/conftool/dbconfig/20240610-195826-marostegui.json
  • 19:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 19:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 19:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T364069)', diff saved to https://phabricator.wikimedia.org/P64553 and previous config saved to /var/cache/conftool/dbconfig/20240610-195804-marostegui.json
  • 19:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P64552 and previous config saved to /var/cache/conftool/dbconfig/20240610-194256-marostegui.json
  • 19:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P64551 and previous config saved to /var/cache/conftool/dbconfig/20240610-192749-marostegui.json
  • 19:22 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 19:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T364069)', diff saved to https://phabricator.wikimedia.org/P64550 and previous config saved to /var/cache/conftool/dbconfig/20240610-191242-marostegui.json
  • 19:02 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 19:02 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 18:11 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/discovery/wdqs-reload-cookbook-test-T349069/ using stat1009.eqiad.wmnet)
  • 17:50 amastilovic@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:50 amastilovic@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:47 amastilovic@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:46 amastilovic@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T364069)', diff saved to https://phabricator.wikimedia.org/P64547 and previous config saved to /var/cache/conftool/dbconfig/20240610-174349-marostegui.json
  • 17:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 17:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 17:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T364069)', diff saved to https://phabricator.wikimedia.org/P64546 and previous config saved to /var/cache/conftool/dbconfig/20240610-174327-marostegui.json
  • 17:37 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:36 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:30 dancy@deploy1002: Installation of scap version "4.87.0" completed for 285 hosts
  • 17:29 amastilovic@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:29 amastilovic@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 17:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P64545 and previous config saved to /var/cache/conftool/dbconfig/20240610-172820-marostegui.json
  • 17:25 dancy@deploy1002: Installing scap version "4.87.0" for 285 hosts
  • 17:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P64544 and previous config saved to /var/cache/conftool/dbconfig/20240610-171313-marostegui.json
  • 17:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 17:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 16:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T364069)', diff saved to https://phabricator.wikimedia.org/P64543 and previous config saved to /var/cache/conftool/dbconfig/20240610-165806-marostegui.json
  • 16:26 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 16:21 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 16:20 marostegui: Drop flaggedpage_pending from s1 T365568
  • 16:05 cdanis: 💙cdanis@cumin1002.eqiad.wmnet ~ 🕛☕ sudo cumin -b 8 '*.codfw.wmnet and C:geoip::data::puppet%fetch_ipinfo_dbs=true' 'sha512sum /usr/share/GeoIPInfo/GeoLite2-ASN.mmdb || run-puppet-agent'
  • 16:01 cdanis: 💙cdanis@puppetserver2001.codfw.wmnet ~ 🕛☕ sudo systemctl restart sync-puppet-volatile
  • 16:00 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 16:00 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:cassandra-dev
  • 15:54 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 15:47 marostegui: Drop flaggedpage_pending from s3 T365568
  • 15:46 marostegui: Drop flaggedpage_pending from s5 T365568
  • 15:43 marostegui: Drop flaggedpage_pending from s2 T365568
  • 15:42 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 15:42 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 15:41 godog: bounce benthos@mw_accesslog_metrics.service on centrallog hosts
  • 15:41 marostegui: Drop flaggedpage_pending from s7 T365568
  • 15:40 marostegui: Drop flaggedpage_pending from s6 T365568
  • 15:34 ladsgroup@deploy1002: Synchronized portals: (no justification provided) (duration: 11m 20s)
  • 15:31 eevans@cumin1002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:cassandra-dev
  • 15:31 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/proton: apply
  • 15:29 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/proton: apply
  • 15:22 ladsgroup@deploy1002: Synchronized portals/wikipedia.org/assets: (no justification provided) (duration: 10m 28s)
  • 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2024.codfw.wmnet
  • 15:07 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2024.codfw.wmnet
  • 15:05 cdobbins@cumin1002: conftool action : set/pooled=yes; selector: name=cp4046.ulsfo.wmnet
  • 15:04 ladsgroup@deploy1002: Finished scap: Backport for errorpages: Add dark mode support (duration: 17m 15s)
  • 15:03 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 15:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 15:02 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 15:02 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 15:01 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 15:01 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 15:01 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:01 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4046.ulsfo.wmnet
  • 15:01 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 15:01 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 15:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 15:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2024.codfw.wmnet
  • 14:59 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:59 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 14:58 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 14:58 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 14:57 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 14:57 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 14:56 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 14:56 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 14:56 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 14:56 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs1010.eqiad.wmnet: Apply update to Java 11 - eevans@cumin1002
  • 14:56 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 14:55 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 14:55 ladsgroup@deploy1002: ladsgroup and ebrahim: Continuing with sync
  • 14:54 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:54 ladsgroup@deploy1002: ladsgroup and ebrahim: Backport for errorpages: Add dark mode support synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:53 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2024.codfw.wmnet
  • 14:53 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:53 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 14:52 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 14:52 moritzm: powercycling ganeti1019, stuck on reboot
  • 14:52 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 14:52 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 14:52 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 14:52 ChrisDobbins901_: sudo -i cookbook sre.hosts.reboot-single -r 'Kernel upgrade' 'P{cp4046.*}'
  • 14:51 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 14:51 cdobbins@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4046.ulsfo.wmnet
  • 14:51 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 14:51 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 14:51 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 14:50 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 14:50 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 14:50 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 14:49 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 14:48 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 14:48 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs1010.eqiad.wmnet: Apply update to Java 11 - eevans@cumin1002
  • 14:47 urandom: aqs1010: restarting cassandra to apply upgrade to Java 11 — T350567
  • 14:47 ladsgroup@deploy1002: Started scap: Backport for errorpages: Add dark mode support
  • 14:46 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=cp4046.ulsfo.wmnet
  • 14:46 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 14:45 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 14:45 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2023.codfw.wmnet
  • 14:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2023.codfw.wmnet
  • 14:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T364069)', diff saved to https://phabricator.wikimedia.org/P64539 and previous config saved to /var/cache/conftool/dbconfig/20240610-144501-marostegui.json
  • 14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 14:44 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 14:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 14:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64538 and previous config saved to /var/cache/conftool/dbconfig/20240610-144439-marostegui.json
  • 14:44 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: sync
  • 14:43 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic1107.eqiad.wmnet with reason: T365982
  • 14:43 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: sync
  • 14:43 swfrench@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 14:43 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic1107.eqiad.wmnet with reason: T365982
  • 14:42 swfrench@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:41 swfrench@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 14:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1019.eqiad.wmnet
  • 14:41 swfrench@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 14:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2023.codfw.wmnet
  • 14:39 swfrench@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 14:38 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: sync
  • 14:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 14:36 elukey@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: sync
  • 14:36 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 14:33 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1019.eqiad.wmnet
  • 14:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 14:31 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti2023.codfw.wmnet
  • 14:31 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2023.codfw.wmnet
  • 14:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P64537 and previous config saved to /var/cache/conftool/dbconfig/20240610-142931-marostegui.json
  • 14:23 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 14:23 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/datasets-config: apply
  • 14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/datasets-config: apply
  • 14:18 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/datasets-config: apply
  • 14:18 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/datasets-config-next: apply
  • 14:18 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/datasets-config-next: apply
  • 14:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P64536 and previous config saved to /var/cache/conftool/dbconfig/20240610-141422-marostegui.json
  • 14:11 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:10 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64535 and previous config saved to /var/cache/conftool/dbconfig/20240610-135914-marostegui.json
  • 13:57 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1107.eqiad.wmnet for T348977 - bking@cumin2002
  • 13:57 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1107.eqiad.wmnet for T348977 - bking@cumin2002
  • 13:57 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic1107 for T348977 - bking@cumin2002
  • 13:57 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1107 for T348977 - bking@cumin2002
  • 13:50 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4047.ulsfo.wmnet
  • 13:49 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
  • 13:48 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 13:47 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 13:47 taavi@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad and A:lvs
  • 13:47 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 13:46 taavi@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad and A:lvs
  • 13:43 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/echoserver: apply
  • 13:43 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/echoserver: apply
  • 13:42 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 13:42 elukey@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 13:37 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
  • 13:36 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
  • 13:36 elukey: move recommendation-api on wikikube to prometheus metrics (offboarded from statsd) - T205870
  • 13:36 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/recommendation-api: sync
  • 13:35 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/recommendation-api: sync
  • 13:34 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/spark-history: apply
  • 13:34 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/spark-history: apply
  • 13:30 marostegui: dbmaint codfw s4 deploy schema change on db2140 T364069
  • 13:29 taavi: taavi@mw1447 ~ $ sudo /usr/local/sbin/restart-php-fpm-all php7.4-fpm 9223372 # leftover from me restarting LVS during deployment
  • 13:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Long schema change
  • 13:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Long schema change
  • 13:27 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/recommendation-api: sync
  • 13:26 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/recommendation-api: sync
  • 13:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64534 and previous config saved to /var/cache/conftool/dbconfig/20240610-132619-ladsgroup.json
  • 13:26 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 13:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 13:25 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/recommendation-api: sync
  • 13:25 elukey@deploy1002: helmfile [staging] START helmfile.d/services/recommendation-api: sync
  • 13:20 ladsgroup@deploy1002: Finished scap: Backport for [huwiki] Add "suppressredirect" user right to editor user group (T366438) (duration: 15m 05s)
  • 13:19 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4047.ulsfo.wmnet
  • 13:18 taavi@cumin1002: END (FAIL) - Cookbook sre.loadbalancer.restart-pybal (exit_code=99) rolling-restart of pybal on A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad and A:lvs
  • 13:18 taavi@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad or A:lvs-low-traffic-eqiad and A:lvs
  • 13:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2021.codfw.wmnet
  • 13:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2021.codfw.wmnet
  • 13:13 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1018.eqiad.wmnet
  • 13:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1018.eqiad.wmnet
  • 13:11 taavi: restarting eqiad low-traffic LVS for https://gerrit.wikimedia.org/r/c/operations/puppet/+/941459
  • 13:11 ladsgroup@deploy1002: ladsgroup and gergesshamon: Continuing with sync
  • 13:10 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4047.ulsfo.wmnet
  • 13:10 elukey@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:09 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4047.ulsfo.wmnet
  • 13:09 fabfur: rebooting cp4047 (T366555)
  • 13:09 elukey@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:08 ladsgroup@deploy1002: ladsgroup and gergesshamon: Backport for [huwiki] Add "suppressredirect" user right to editor user group (T366438) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:08 elukey@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:07 elukey@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 13:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2021.codfw.wmnet
  • 13:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1018.eqiad.wmnet
  • 13:05 ladsgroup@deploy1002: Started scap: Backport for [huwiki] Add "suppressredirect" user right to editor user group (T366438)
  • 13:04 elukey@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 13:04 elukey@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 13:03 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 13:03 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 13:01 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 13:01 elukey@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:58 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 12:58 elukey@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:55 fabfur: repooling text@drmrs (IPIP encapsulation enabled) (T366466)
  • 12:53 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 12:50 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'sync'.
  • 12:50 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 12:49 elukey@deploy1002: helmfile [eqiad] START helmfile.d/admin 'sync'.
  • 12:48 elukey@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 12:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1018.eqiad.wmnet
  • 12:46 elukey@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 12:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2021.codfw.wmnet
  • 12:44 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2020.codfw.wmnet
  • 12:44 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2020.codfw.wmnet
  • 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1017.eqiad.wmnet
  • 12:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1017.eqiad.wmnet
  • 12:43 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 12:41 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 12:40 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 12:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2020.codfw.wmnet
  • 12:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1017.eqiad.wmnet
  • 12:30 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2020.codfw.wmnet
  • 12:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64532 and previous config saved to /var/cache/conftool/dbconfig/20240610-122847-arnaudb.json
  • 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2019.codfw.wmnet
  • 12:28 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2019.codfw.wmnet
  • 12:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2019.codfw.wmnet
  • 12:21 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1017.eqiad.wmnet
  • 12:20 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2019.codfw.wmnet
  • 12:15 oblivian@deploy1002: Finished scap: Deploying change to base mediawiki image (take 2) (duration: 22m 39s)
  • 12:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64531 and previous config saved to /var/cache/conftool/dbconfig/20240610-121341-arnaudb.json
  • 12:05 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2018.codfw.wmnet
  • 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2018.codfw.wmnet
  • 11:58 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64530 and previous config saved to /var/cache/conftool/dbconfig/20240610-115834-arnaudb.json
  • 11:56 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2018.codfw.wmnet
  • 11:53 oblivian@deploy1002: Started scap: Deploying change to base mediawiki image (take 2)
  • 11:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64528 and previous config saved to /var/cache/conftool/dbconfig/20240610-114957-marostegui.json
  • 11:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 11:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 11:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T364069)', diff saved to https://phabricator.wikimedia.org/P64527 and previous config saved to /var/cache/conftool/dbconfig/20240610-114934-marostegui.json
  • 11:49 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1016.eqiad.wmnet
  • 11:48 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1016.eqiad.wmnet
  • 11:44 oblivian@deploy1002: sync-world aborted: Deploying change to base mediawiki image (duration: 10m 21s)
  • 11:43 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2018.codfw.wmnet
  • 11:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64526 and previous config saved to /var/cache/conftool/dbconfig/20240610-114329-arnaudb.json
  • 11:43 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1016.eqiad.wmnet
  • 11:39 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 11:36 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2017.codfw.wmnet
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2017.codfw.wmnet
  • 11:35 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 11:34 oblivian@deploy1002: Started scap: Deploying change to base mediawiki image
  • 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P64525 and previous config saved to /var/cache/conftool/dbconfig/20240610-113426-marostegui.json
  • 11:34 oblivian@deploy1002: Unlocked for deployment [ALL REPOSITORIES]: setting global lock while working on mw-on-k8s --joe. Ping me if you need urgent deployments (duration: 10m 22s)
  • 11:32 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 11:29 fabfur: restarting pybal on lvs6003,lvs6001 to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1039947 (T366466)
  • 11:28 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1016.eqiad.wmnet
  • 11:28 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 10%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64524 and previous config saved to /var/cache/conftool/dbconfig/20240610-112821-arnaudb.json
  • 11:28 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2017.codfw.wmnet
  • 11:26 fabfur: enabling && running puppet on A:lvs-drmrs to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1039947 (T366466)
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1015.eqiad.wmnet
  • 11:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1015.eqiad.wmnet
  • 11:23 oblivian@deploy1002: Locking from deployment [ALL REPOSITORIES]: setting global lock while working on mw-on-k8s --joe. Ping me if you need urgent deployments
  • 11:19 oblivian@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:19 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1015.eqiad.wmnet
  • 11:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P64523 and previous config saved to /var/cache/conftool/dbconfig/20240610-111917-marostegui.json
  • 11:19 oblivian@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:19 oblivian@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:18 oblivian@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:13 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 5%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64522 and previous config saved to /var/cache/conftool/dbconfig/20240610-111315-arnaudb.json
  • 10:47 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw1002.eqiad.wmnet
  • 10:43 arnaudb@cumin1002: dbctl commit (dc=all): 'db2204 (re)pooling @ 1%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64519 and previous config saved to /var/cache/conftool/dbconfig/20240610-104303-arnaudb.json
  • 10:41 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw1002.eqiad.wmnet
  • 10:41 fabfur: depooling text@drmrs to apply IPIP encapsulation patches (T366466)
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2016.codfw.wmnet
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2016.codfw.wmnet
  • 10:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 10:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 10:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2016.codfw.wmnet
  • 10:25 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 10:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db2204 T367019', diff saved to https://phabricator.wikimedia.org/P64518 and previous config saved to /var/cache/conftool/dbconfig/20240610-102511-arnaudb.json
  • 10:24 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1014.eqiad.wmnet
  • 10:23 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1014.eqiad.wmnet
  • 10:21 claime: repooled all active/active mediawiki services from codfw
  • 10:21 cgoubert@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=api-ro,name=codfw
  • 10:21 cgoubert@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=appservers-ro,name=codfw
  • 10:21 cgoubert@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=mw-api-int-ro,name=codfw
  • 10:21 cgoubert@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=mw-api-ext-ro,name=codfw
  • 10:21 cgoubert@cumin1002: conftool action : set/pooled=true; selector: dnsdisc=mw-web-ro,name=codfw
  • 10:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1014.eqiad.wmnet
  • 10:08 claime: depooled all active/active mediawiki services from codfw
  • 10:08 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=api-ro,name=codfw
  • 10:07 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=appservers-ro,name=codfw
  • 10:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2016.codfw.wmnet
  • 10:05 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1014.eqiad.wmnet
  • 10:05 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 10:02 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=mw-api-int-ro,name=codfw
  • 10:02 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 10:01 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=mw-api-ext-ro,name=codfw
  • 10:01 cgoubert@cumin1002: conftool action : set/pooled=false; selector: dnsdisc=mw-web-ro,name=codfw
  • 10:01 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 09:57 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 09:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on 26 hosts with reason: Issue from T367019
  • 09:57 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on 26 hosts with reason: Issue from T367019
  • 09:54 arnaudb@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 5:00:00 on 870 hosts with reason: Issue from T367019
  • 09:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on 870 hosts with reason: Issue from T367019
  • 09:53 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 09:53 jayme@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 09:47 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4048.ulsfo.wmnet
  • 09:37 godog: roll upgrade prometheus-statsd-exporter to baremetal - T302373
  • 09:34 taavi@deploy1002: Finished scap: Backport for Reapply "wikitech: Replace OSM class in Gerrit blocking hook" (duration: 11m 17s)
  • 09:25 taavi@deploy1002: taavi: Continuing with sync
  • 09:25 taavi@deploy1002: taavi: Backport for Reapply "wikitech: Replace OSM class in Gerrit blocking hook" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:24 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 09:24 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 09:22 taavi@deploy1002: Started scap: Backport for Reapply "wikitech: Replace OSM class in Gerrit blocking hook"
  • 09:22 volans@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 09:22 volans@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 09:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T364069)', diff saved to https://phabricator.wikimedia.org/P64517 and previous config saved to /var/cache/conftool/dbconfig/20240610-091631-marostegui.json
  • 09:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 09:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 09:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T364069)', diff saved to https://phabricator.wikimedia.org/P64516 and previous config saved to /var/cache/conftool/dbconfig/20240610-091606-marostegui.json
  • 09:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db2207 to s2 primary T367019', diff saved to https://phabricator.wikimedia.org/P64515 and previous config saved to /var/cache/conftool/dbconfig/20240610-091506-arnaudb.json
  • 09:14 arnaudb: Starting s2 codfw failover from db2204 to db2207 - T367019
  • 09:01 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2015.codfw.wmnet
  • 09:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2015.codfw.wmnet
  • 09:01 godog: upload prometheus-statsd-exporter 0.26.1-1 to apt - T302373
  • 09:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P64514 and previous config saved to /var/cache/conftool/dbconfig/20240610-090058-marostegui.json
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1013.eqiad.wmnet
  • 09:00 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1013.eqiad.wmnet
  • 08:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db2207 with weight 0 T367019', diff saved to https://phabricator.wikimedia.org/P64513 and previous config saved to /var/cache/conftool/dbconfig/20240610-085721-arnaudb.json
  • 08:57 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T367019
  • 08:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s2 T367019
  • 08:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64512 and previous config saved to /var/cache/conftool/dbconfig/20240610-085548-arnaudb.json
  • 08:54 godog: upgrade prometheus-statsd-exporter on webperf - T302373
  • 08:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1013.eqiad.wmnet
  • 08:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2015.codfw.wmnet
  • 08:51 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:50 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new entries for cr2-codfw peering to ssw1-d8-codfw - cmooney@cumin1002"
  • 08:50 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new entries for cr2-codfw peering to ssw1-d8-codfw - cmooney@cumin1002"
  • 08:48 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4048.ulsfo.wmnet
  • 08:47 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 08:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1013.eqiad.wmnet
  • 08:46 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2015.codfw.wmnet
  • 08:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P64511 and previous config saved to /var/cache/conftool/dbconfig/20240610-084550-marostegui.json
  • 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2014.codfw.wmnet
  • 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2014.codfw.wmnet
  • 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1012.eqiad.wmnet
  • 08:41 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1012.eqiad.wmnet
  • 08:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64510 and previous config saved to /var/cache/conftool/dbconfig/20240610-084042-arnaudb.json
  • 08:39 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4048.ulsfo.wmnet
  • 08:39 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4048.ulsfo.wmnet
  • 08:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ping1004.eqiad.wmnet with OS bookworm
  • 08:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2014.codfw.wmnet
  • 08:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1012.eqiad.wmnet
  • 08:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2013.codfw.wmnet
  • 08:17 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1011.eqiad.wmnet
  • 08:17 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ping1004.eqiad.wmnet with reason: host reimage
  • 08:14 kostajh: UTC morning deploys done
  • 08:13 kharlan@deploy1002: Finished scap: Backport for IPInfo: Switch to using GeoLite2 data (T361884) (duration: 14m 07s)
  • 08:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64507 and previous config saved to /var/cache/conftool/dbconfig/20240610-081030-arnaudb.json
  • 08:04 kharlan@deploy1002: kharlan: Continuing with sync
  • 08:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit1003.wikimedia.org with reason: Gerrit upgrade
  • 08:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit1003.wikimedia.org with reason: Gerrit upgrade
  • 08:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on gerrit2002.wikimedia.org with reason: Gerrit upgrade
  • 08:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on gerrit2002.wikimedia.org with reason: Gerrit upgrade
  • 08:02 kharlan@deploy1002: kharlan: Backport for IPInfo: Switch to using GeoLite2 data (T361884) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:59 kharlan@deploy1002: Started scap: Backport for IPInfo: Switch to using GeoLite2 data (T361884)
  • 07:58 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2013.codfw.wmnet
  • 07:58 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1011.eqiad.wmnet
  • 07:57 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ping1004.eqiad.wmnet with OS bookworm
  • 07:57 kharlan@deploy1002: kharlan: Backport for IPInfo: Switch to using GeoLite2 data (T361884) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:56 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 07:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 10%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64506 and previous config saved to /var/cache/conftool/dbconfig/20240610-075524-arnaudb.json
  • 07:55 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1010.eqiad.wmnet
  • 07:54 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1010.eqiad.wmnet
  • 07:53 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 07:53 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 07:51 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2012.codfw.wmnet
  • 07:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2012.codfw.wmnet
  • 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64505 and previous config saved to /var/cache/conftool/dbconfig/20240610-075056-root.json
  • 07:50 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 07:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1010.eqiad.wmnet
  • 07:47 arnaudb@cumin1002: END (PASS) - Cookbook sre.mysql.upgrade (exit_code=0) for db2207.codfw.wmnet
  • 07:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2012.codfw.wmnet
  • 07:43 arnaudb@cumin1002: START - Cookbook sre.mysql.upgrade for db2207.codfw.wmnet
  • 07:41 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db2207 maintenance', diff saved to https://phabricator.wikimedia.org/P64504 and previous config saved to /var/cache/conftool/dbconfig/20240610-074157-arnaudb.json
  • 07:41 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db2207.codfw.wmnet with reason: maintenance
  • 07:41 kharlan@deploy1002: Started scap: Backport for IPInfo: Switch to using GeoLite2 data (T361884)
  • 07:41 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db2207.codfw.wmnet with reason: maintenance
  • 07:38 arnaudb@cumin1002: dbctl commit (dc=all): 'Revert db2207 with weight 500 T367019', diff saved to https://phabricator.wikimedia.org/P64503 and previous config saved to /var/cache/conftool/dbconfig/20240610-073838-arnaudb.json
  • 07:37 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 07:37 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
  • 07:37 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.drain-node (exit_code=99) for draining ganeti node ganeti1010.eqiad.wmnet
  • 07:37 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1010.eqiad.wmnet
  • 07:36 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1009.eqiad.wmnet
  • 07:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1009.eqiad.wmnet
  • 07:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64502 and previous config saved to /var/cache/conftool/dbconfig/20240610-073549-root.json
  • 07:35 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 07:34 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 07:33 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 07:32 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2012.codfw.wmnet
  • 07:31 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2011.codfw.wmnet
  • 07:31 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2011.codfw.wmnet
  • 07:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti1009.eqiad.wmnet
  • 07:26 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
  • 07:25 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
  • 07:24 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
  • 07:23 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
  • 07:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2011.codfw.wmnet
  • 07:22 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
  • 07:22 jayme@deploy1002: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 07:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64501 and previous config saved to /var/cache/conftool/dbconfig/20240610-072043-root.json
  • 07:17 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1009.eqiad.wmnet
  • 07:15 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2011.codfw.wmnet
  • 07:14 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2010.codfw.wmnet
  • 07:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2010.codfw.wmnet
  • 07:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2010.codfw.wmnet
  • 07:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64500 and previous config saved to /var/cache/conftool/dbconfig/20240610-070537-root.json
  • 07:03 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2010.codfw.wmnet
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T364069)', diff saved to https://phabricator.wikimedia.org/P64499 and previous config saved to /var/cache/conftool/dbconfig/20240610-070249-marostegui.json
  • 07:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 07:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T364069)', diff saved to https://phabricator.wikimedia.org/P64498 and previous config saved to /var/cache/conftool/dbconfig/20240610-070224-marostegui.json
  • 07:00 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2009.codfw.wmnet
  • 06:59 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2009.codfw.wmnet
  • 06:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 06:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 06:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 06:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 06:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64497 and previous config saved to /var/cache/conftool/dbconfig/20240610-065640-ladsgroup.json
  • 06:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2009.codfw.wmnet
  • 06:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64496 and previous config saved to /var/cache/conftool/dbconfig/20240610-065031-root.json
  • 06:47 marostegui: dbmaint codfw s4 deploy schema change on db2140 T364299
  • 06:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P64495 and previous config saved to /var/cache/conftool/dbconfig/20240610-064716-marostegui.json
  • 06:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Long schema change
  • 06:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2140.codfw.wmnet with reason: Long schema change
  • 06:45 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2009.codfw.wmnet
  • 06:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P64494 and previous config saved to /var/cache/conftool/dbconfig/20240610-064132-ladsgroup.json
  • 06:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db2207 with weight 0 T367019', diff saved to https://phabricator.wikimedia.org/P64493 and previous config saved to /var/cache/conftool/dbconfig/20240610-063912-arnaudb.json
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2140 T367017', diff saved to https://phabricator.wikimedia.org/P64492 and previous config saved to /var/cache/conftool/dbconfig/20240610-063904-root.json
  • 06:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 26 hosts with reason: Primary switchover s2 T367019
  • 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2179 to s4 primary T367017', diff saved to https://phabricator.wikimedia.org/P64491 and previous config saved to /var/cache/conftool/dbconfig/20240610-063830-root.json
  • 06:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 26 hosts with reason: Primary switchover s2 T367019
  • 06:38 marostegui: Starting s4 codfw failover from db2140 to db2179 - T367017
  • 06:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64490 and previous config saved to /var/cache/conftool/dbconfig/20240610-063524-root.json
  • 06:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P64489 and previous config saved to /var/cache/conftool/dbconfig/20240610-063208-marostegui.json
  • 06:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P64488 and previous config saved to /var/cache/conftool/dbconfig/20240610-062624-ladsgroup.json
  • 06:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2218 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64487 and previous config saved to /var/cache/conftool/dbconfig/20240610-062017-root.json
  • 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2179 from API/vslow/dump T367017', diff saved to https://phabricator.wikimedia.org/P64486 and previous config saved to /var/cache/conftool/dbconfig/20240610-061939-root.json
  • 06:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 32 hosts with reason: Primary switchover s4 T367017
  • 06:18 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2179 with weight 0 T367017', diff saved to https://phabricator.wikimedia.org/P64485 and previous config saved to /var/cache/conftool/dbconfig/20240610-061849-root.json
  • 06:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 32 hosts with reason: Primary switchover s4 T367017
  • 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T364069)', diff saved to https://phabricator.wikimedia.org/P64484 and previous config saved to /var/cache/conftool/dbconfig/20240610-061658-marostegui.json
  • 06:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64483 and previous config saved to /var/cache/conftool/dbconfig/20240610-061116-ladsgroup.json
  • 05:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 05:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 05:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T352010)', diff saved to https://phabricator.wikimedia.org/P64482 and previous config saved to /var/cache/conftool/dbconfig/20240610-052941-ladsgroup.json
  • 05:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P64481 and previous config saved to /var/cache/conftool/dbconfig/20240610-051432-ladsgroup.json
  • 05:13 marostegui: dbmaint codfw s7 deploy schema change on db2218 T364299
  • 05:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Long schema change
  • 05:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2218.codfw.wmnet with reason: Long schema change
  • 05:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2218 T366875', diff saved to https://phabricator.wikimedia.org/P64480 and previous config saved to /var/cache/conftool/dbconfig/20240610-050738-root.json
  • 05:06 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2121 to s7 primary T366875', diff saved to https://phabricator.wikimedia.org/P64479 and previous config saved to /var/cache/conftool/dbconfig/20240610-050637-marostegui.json
  • 05:06 marostegui: Starting s7 codfw failover from db2218 to db2121 - T366875
  • 04:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P64478 and previous config saved to /var/cache/conftool/dbconfig/20240610-045922-ladsgroup.json
  • 04:52 kart_: Updated Apertium to 2024-06-07-143238-production (T356252)
  • 04:49 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/apertium: apply
  • 04:49 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/apertium: apply
  • 04:44 marostegui: Rename flaggedpage_pending in s5 T365568
  • 04:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T352010)', diff saved to https://phabricator.wikimedia.org/P64477 and previous config saved to /var/cache/conftool/dbconfig/20240610-044414-ladsgroup.json
  • 04:42 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/apertium: apply
  • 04:41 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/apertium: apply
  • 04:37 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/apertium: apply
  • 04:37 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2121 from API/vslow/dump T366875', diff saved to https://phabricator.wikimedia.org/P64476 and previous config saved to /var/cache/conftool/dbconfig/20240610-043741-root.json
  • 04:37 kartik@deploy1002: helmfile [staging] START helmfile.d/services/apertium: apply
  • 04:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s7 T366875
  • 04:36 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2121 with weight 0 T366875', diff saved to https://phabricator.wikimedia.org/P64475 and previous config saved to /var/cache/conftool/dbconfig/20240610-043649-root.json
  • 04:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s7 T366875
  • 04:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T364069)', diff saved to https://phabricator.wikimedia.org/P64474 and previous config saved to /var/cache/conftool/dbconfig/20240610-043615-marostegui.json
  • 04:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 04:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 04:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 04:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance

2024-06-09

  • 23:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T352010)', diff saved to https://phabricator.wikimedia.org/P64473 and previous config saved to /var/cache/conftool/dbconfig/20240609-234110-ladsgroup.json
  • 23:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 23:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 23:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T352010)', diff saved to https://phabricator.wikimedia.org/P64472 and previous config saved to /var/cache/conftool/dbconfig/20240609-234047-ladsgroup.json
  • 23:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 23:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 23:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T352010)', diff saved to https://phabricator.wikimedia.org/P64471 and previous config saved to /var/cache/conftool/dbconfig/20240609-232921-ladsgroup.json
  • 23:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P64470 and previous config saved to /var/cache/conftool/dbconfig/20240609-232539-ladsgroup.json
  • 23:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P64469 and previous config saved to /var/cache/conftool/dbconfig/20240609-231413-ladsgroup.json
  • 23:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P64468 and previous config saved to /var/cache/conftool/dbconfig/20240609-231031-ladsgroup.json
  • 22:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P64467 and previous config saved to /var/cache/conftool/dbconfig/20240609-225905-ladsgroup.json
  • 22:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T352010)', diff saved to https://phabricator.wikimedia.org/P64466 and previous config saved to /var/cache/conftool/dbconfig/20240609-225523-ladsgroup.json
  • 22:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T352010)', diff saved to https://phabricator.wikimedia.org/P64465 and previous config saved to /var/cache/conftool/dbconfig/20240609-224357-ladsgroup.json
  • 19:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T352010)', diff saved to https://phabricator.wikimedia.org/P64464 and previous config saved to /var/cache/conftool/dbconfig/20240609-192428-ladsgroup.json
  • 19:24 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 19:24 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 19:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T352010)', diff saved to https://phabricator.wikimedia.org/P64463 and previous config saved to /var/cache/conftool/dbconfig/20240609-192404-ladsgroup.json
  • 19:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P64462 and previous config saved to /var/cache/conftool/dbconfig/20240609-190856-ladsgroup.json
  • 18:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P64461 and previous config saved to /var/cache/conftool/dbconfig/20240609-185347-ladsgroup.json
  • 18:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T352010)', diff saved to https://phabricator.wikimedia.org/P64460 and previous config saved to /var/cache/conftool/dbconfig/20240609-183839-ladsgroup.json
  • 16:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T364299)', diff saved to https://phabricator.wikimedia.org/P64459 and previous config saved to /var/cache/conftool/dbconfig/20240609-160621-marostegui.json
  • 15:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P64458 and previous config saved to /var/cache/conftool/dbconfig/20240609-155113-marostegui.json
  • 15:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P64457 and previous config saved to /var/cache/conftool/dbconfig/20240609-153605-marostegui.json
  • 15:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T364299)', diff saved to https://phabricator.wikimedia.org/P64456 and previous config saved to /var/cache/conftool/dbconfig/20240609-152057-marostegui.json
  • 15:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T352010)', diff saved to https://phabricator.wikimedia.org/P64455 and previous config saved to /var/cache/conftool/dbconfig/20240609-152020-ladsgroup.json
  • 15:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 15:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 15:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T352010)', diff saved to https://phabricator.wikimedia.org/P64454 and previous config saved to /var/cache/conftool/dbconfig/20240609-151956-ladsgroup.json
  • 15:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P64453 and previous config saved to /var/cache/conftool/dbconfig/20240609-150448-ladsgroup.json
  • 14:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P64452 and previous config saved to /var/cache/conftool/dbconfig/20240609-144940-ladsgroup.json
  • 14:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T352010)', diff saved to https://phabricator.wikimedia.org/P64451 and previous config saved to /var/cache/conftool/dbconfig/20240609-143432-ladsgroup.json
  • 14:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T352010)', diff saved to https://phabricator.wikimedia.org/P64450 and previous config saved to /var/cache/conftool/dbconfig/20240609-143128-ladsgroup.json
  • 14:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 14:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 14:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T352010)', diff saved to https://phabricator.wikimedia.org/P64449 and previous config saved to /var/cache/conftool/dbconfig/20240609-143105-ladsgroup.json
  • 14:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T364069)', diff saved to https://phabricator.wikimedia.org/P64448 and previous config saved to /var/cache/conftool/dbconfig/20240609-143032-marostegui.json
  • 14:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P64447 and previous config saved to /var/cache/conftool/dbconfig/20240609-141557-ladsgroup.json
  • 14:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P64446 and previous config saved to /var/cache/conftool/dbconfig/20240609-141524-marostegui.json
  • 14:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P64445 and previous config saved to /var/cache/conftool/dbconfig/20240609-140049-ladsgroup.json
  • 14:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P64444 and previous config saved to /var/cache/conftool/dbconfig/20240609-140016-marostegui.json
  • 13:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T352010)', diff saved to https://phabricator.wikimedia.org/P64443 and previous config saved to /var/cache/conftool/dbconfig/20240609-134541-ladsgroup.json
  • 13:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T364069)', diff saved to https://phabricator.wikimedia.org/P64442 and previous config saved to /var/cache/conftool/dbconfig/20240609-134508-marostegui.json
  • 12:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T364299)', diff saved to https://phabricator.wikimedia.org/P64441 and previous config saved to /var/cache/conftool/dbconfig/20240609-120817-marostegui.json
  • 12:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 12:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 12:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64440 and previous config saved to /var/cache/conftool/dbconfig/20240609-120753-marostegui.json
  • 12:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T364069)', diff saved to https://phabricator.wikimedia.org/P64439 and previous config saved to /var/cache/conftool/dbconfig/20240609-120400-marostegui.json
  • 12:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 12:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 11:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P64438 and previous config saved to /var/cache/conftool/dbconfig/20240609-115245-marostegui.json
  • 11:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P64437 and previous config saved to /var/cache/conftool/dbconfig/20240609-113737-marostegui.json
  • 11:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64436 and previous config saved to /var/cache/conftool/dbconfig/20240609-112229-marostegui.json
  • 11:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 11:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T352010)', diff saved to https://phabricator.wikimedia.org/P64435 and previous config saved to /var/cache/conftool/dbconfig/20240609-111945-ladsgroup.json
  • 11:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P64434 and previous config saved to /var/cache/conftool/dbconfig/20240609-110437-ladsgroup.json
  • 10:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P64433 and previous config saved to /var/cache/conftool/dbconfig/20240609-104929-ladsgroup.json
  • 10:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T352010)', diff saved to https://phabricator.wikimedia.org/P64432 and previous config saved to /var/cache/conftool/dbconfig/20240609-103421-ladsgroup.json
  • 09:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 09:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 09:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T364069)', diff saved to https://phabricator.wikimedia.org/P64431 and previous config saved to /var/cache/conftool/dbconfig/20240609-095854-marostegui.json
  • 09:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P64430 and previous config saved to /var/cache/conftool/dbconfig/20240609-094346-marostegui.json
  • 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P64429 and previous config saved to /var/cache/conftool/dbconfig/20240609-092837-marostegui.json
  • 09:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T364069)', diff saved to https://phabricator.wikimedia.org/P64428 and previous config saved to /var/cache/conftool/dbconfig/20240609-091329-marostegui.json
  • 08:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T364069)', diff saved to https://phabricator.wikimedia.org/P64427 and previous config saved to /var/cache/conftool/dbconfig/20240609-080149-marostegui.json
  • 08:01 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 08:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 08:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64426 and previous config saved to /var/cache/conftool/dbconfig/20240609-080125-marostegui.json
  • 07:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64425 and previous config saved to /var/cache/conftool/dbconfig/20240609-075533-marostegui.json
  • 07:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 07:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 07:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P64424 and previous config saved to /var/cache/conftool/dbconfig/20240609-074617-marostegui.json
  • 07:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P64423 and previous config saved to /var/cache/conftool/dbconfig/20240609-073109-marostegui.json
  • 07:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64422 and previous config saved to /var/cache/conftool/dbconfig/20240609-071601-marostegui.json
  • 06:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T352010)', diff saved to https://phabricator.wikimedia.org/P64421 and previous config saved to /var/cache/conftool/dbconfig/20240609-064733-ladsgroup.json
  • 06:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 06:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 06:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T352010)', diff saved to https://phabricator.wikimedia.org/P64420 and previous config saved to /var/cache/conftool/dbconfig/20240609-064709-ladsgroup.json
  • 06:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T352010)', diff saved to https://phabricator.wikimedia.org/P64419 and previous config saved to /var/cache/conftool/dbconfig/20240609-063607-ladsgroup.json
  • 06:35 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 06:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T352010)', diff saved to https://phabricator.wikimedia.org/P64418 and previous config saved to /var/cache/conftool/dbconfig/20240609-063543-ladsgroup.json
  • 06:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P64417 and previous config saved to /var/cache/conftool/dbconfig/20240609-063201-ladsgroup.json
  • 06:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P64416 and previous config saved to /var/cache/conftool/dbconfig/20240609-062033-ladsgroup.json
  • 06:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P64415 and previous config saved to /var/cache/conftool/dbconfig/20240609-061653-ladsgroup.json
  • 06:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P64414 and previous config saved to /var/cache/conftool/dbconfig/20240609-060525-ladsgroup.json
  • 06:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T352010)', diff saved to https://phabricator.wikimedia.org/P64413 and previous config saved to /var/cache/conftool/dbconfig/20240609-060146-ladsgroup.json
  • 05:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T352010)', diff saved to https://phabricator.wikimedia.org/P64412 and previous config saved to /var/cache/conftool/dbconfig/20240609-055017-ladsgroup.json
  • 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T364069)', diff saved to https://phabricator.wikimedia.org/P64411 and previous config saved to /var/cache/conftool/dbconfig/20240609-054833-marostegui.json
  • 05:48 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 05:48 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T364069)', diff saved to https://phabricator.wikimedia.org/P64410 and previous config saved to /var/cache/conftool/dbconfig/20240609-054809-marostegui.json
  • 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P64409 and previous config saved to /var/cache/conftool/dbconfig/20240609-053301-marostegui.json
  • 05:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T352010)', diff saved to https://phabricator.wikimedia.org/P64408 and previous config saved to /var/cache/conftool/dbconfig/20240609-052358-ladsgroup.json
  • 05:23 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 05:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T352010)', diff saved to https://phabricator.wikimedia.org/P64407 and previous config saved to /var/cache/conftool/dbconfig/20240609-052334-ladsgroup.json
  • 05:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P64406 and previous config saved to /var/cache/conftool/dbconfig/20240609-051753-marostegui.json
  • 05:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P64405 and previous config saved to /var/cache/conftool/dbconfig/20240609-050826-ladsgroup.json
  • 05:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T364069)', diff saved to https://phabricator.wikimedia.org/P64404 and previous config saved to /var/cache/conftool/dbconfig/20240609-050245-marostegui.json
  • 04:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P64403 and previous config saved to /var/cache/conftool/dbconfig/20240609-045319-ladsgroup.json
  • 04:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T352010)', diff saved to https://phabricator.wikimedia.org/P64402 and previous config saved to /var/cache/conftool/dbconfig/20240609-043811-ladsgroup.json
  • 02:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T364069)', diff saved to https://phabricator.wikimedia.org/P64401 and previous config saved to /var/cache/conftool/dbconfig/20240609-025921-marostegui.json
  • 02:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 02:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 02:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64400 and previous config saved to /var/cache/conftool/dbconfig/20240609-025856-marostegui.json
  • 02:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P64399 and previous config saved to /var/cache/conftool/dbconfig/20240609-024349-marostegui.json
  • 02:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P64398 and previous config saved to /var/cache/conftool/dbconfig/20240609-022840-marostegui.json
  • 02:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64397 and previous config saved to /var/cache/conftool/dbconfig/20240609-021333-marostegui.json
  • 02:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T352010)', diff saved to https://phabricator.wikimedia.org/P64396 and previous config saved to /var/cache/conftool/dbconfig/20240609-020120-ladsgroup.json
  • 02:01 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 02:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 01:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 01:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T364299)', diff saved to https://phabricator.wikimedia.org/P64395 and previous config saved to /var/cache/conftool/dbconfig/20240609-012432-marostegui.json
  • 01:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P64394 and previous config saved to /var/cache/conftool/dbconfig/20240609-010922-marostegui.json
  • 00:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P64393 and previous config saved to /var/cache/conftool/dbconfig/20240609-005414-marostegui.json
  • 00:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T364299)', diff saved to https://phabricator.wikimedia.org/P64392 and previous config saved to /var/cache/conftool/dbconfig/20240609-003906-marostegui.json
  • 00:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T364069)', diff saved to https://phabricator.wikimedia.org/P64391 and previous config saved to /var/cache/conftool/dbconfig/20240609-000718-marostegui.json
  • 00:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 00:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 00:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 00:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 00:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T364069)', diff saved to https://phabricator.wikimedia.org/P64390 and previous config saved to /var/cache/conftool/dbconfig/20240609-000640-marostegui.json

2024-06-08

  • 23:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P64389 and previous config saved to /var/cache/conftool/dbconfig/20240608-235132-marostegui.json
  • 23:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P64388 and previous config saved to /var/cache/conftool/dbconfig/20240608-233623-marostegui.json
  • 23:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T364069)', diff saved to https://phabricator.wikimedia.org/P64387 and previous config saved to /var/cache/conftool/dbconfig/20240608-232115-marostegui.json
  • 22:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T352010)', diff saved to https://phabricator.wikimedia.org/P64386 and previous config saved to /var/cache/conftool/dbconfig/20240608-222832-ladsgroup.json
  • 22:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 22:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 22:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T352010)', diff saved to https://phabricator.wikimedia.org/P64385 and previous config saved to /var/cache/conftool/dbconfig/20240608-222808-ladsgroup.json
  • 22:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P64384 and previous config saved to /var/cache/conftool/dbconfig/20240608-221259-ladsgroup.json
  • 21:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P64383 and previous config saved to /var/cache/conftool/dbconfig/20240608-215751-ladsgroup.json
  • 21:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T352010)', diff saved to https://phabricator.wikimedia.org/P64382 and previous config saved to /var/cache/conftool/dbconfig/20240608-214243-ladsgroup.json
  • 21:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T364299)', diff saved to https://phabricator.wikimedia.org/P64381 and previous config saved to /var/cache/conftool/dbconfig/20240608-212701-marostegui.json
  • 21:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 21:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 21:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T364299)', diff saved to https://phabricator.wikimedia.org/P64380 and previous config saved to /var/cache/conftool/dbconfig/20240608-212637-marostegui.json
  • 21:15 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T364069)', diff saved to https://phabricator.wikimedia.org/P64379 and previous config saved to /var/cache/conftool/dbconfig/20240608-211527-marostegui.json
  • 21:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 21:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 21:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T364069)', diff saved to https://phabricator.wikimedia.org/P64378 and previous config saved to /var/cache/conftool/dbconfig/20240608-211503-marostegui.json
  • 21:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P64377 and previous config saved to /var/cache/conftool/dbconfig/20240608-211128-marostegui.json
  • 20:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P64376 and previous config saved to /var/cache/conftool/dbconfig/20240608-205955-marostegui.json
  • 20:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P64375 and previous config saved to /var/cache/conftool/dbconfig/20240608-205618-marostegui.json
  • 20:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P64374 and previous config saved to /var/cache/conftool/dbconfig/20240608-204447-marostegui.json
  • 20:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T364299)', diff saved to https://phabricator.wikimedia.org/P64373 and previous config saved to /var/cache/conftool/dbconfig/20240608-204106-marostegui.json
  • 20:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T364069)', diff saved to https://phabricator.wikimedia.org/P64372 and previous config saved to /var/cache/conftool/dbconfig/20240608-202939-marostegui.json
  • 20:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T352010)', diff saved to https://phabricator.wikimedia.org/P64371 and previous config saved to /var/cache/conftool/dbconfig/20240608-202016-ladsgroup.json
  • 20:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 20:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 20:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 20:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 20:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T352010)', diff saved to https://phabricator.wikimedia.org/P64370 and previous config saved to /var/cache/conftool/dbconfig/20240608-201948-ladsgroup.json
  • 20:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P64369 and previous config saved to /var/cache/conftool/dbconfig/20240608-200440-ladsgroup.json
  • 19:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P64368 and previous config saved to /var/cache/conftool/dbconfig/20240608-194932-ladsgroup.json
  • 19:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T352010)', diff saved to https://phabricator.wikimedia.org/P64367 and previous config saved to /var/cache/conftool/dbconfig/20240608-193424-ladsgroup.json
  • 18:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T352010)', diff saved to https://phabricator.wikimedia.org/P64366 and previous config saved to /var/cache/conftool/dbconfig/20240608-182811-ladsgroup.json
  • 18:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 18:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 18:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T352010)', diff saved to https://phabricator.wikimedia.org/P64365 and previous config saved to /var/cache/conftool/dbconfig/20240608-182747-ladsgroup.json
  • 18:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2129 (T364069)', diff saved to https://phabricator.wikimedia.org/P64364 and previous config saved to /var/cache/conftool/dbconfig/20240608-181559-marostegui.json
  • 18:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 18:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 18:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T364069)', diff saved to https://phabricator.wikimedia.org/P64363 and previous config saved to /var/cache/conftool/dbconfig/20240608-181536-marostegui.json
  • 18:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P64362 and previous config saved to /var/cache/conftool/dbconfig/20240608-181238-ladsgroup.json
  • 18:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P64361 and previous config saved to /var/cache/conftool/dbconfig/20240608-180027-marostegui.json
  • 17:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P64360 and previous config saved to /var/cache/conftool/dbconfig/20240608-175730-ladsgroup.json
  • 17:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P64359 and previous config saved to /var/cache/conftool/dbconfig/20240608-174519-marostegui.json
  • 17:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T352010)', diff saved to https://phabricator.wikimedia.org/P64358 and previous config saved to /var/cache/conftool/dbconfig/20240608-174222-ladsgroup.json
  • 17:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T364069)', diff saved to https://phabricator.wikimedia.org/P64357 and previous config saved to /var/cache/conftool/dbconfig/20240608-173011-marostegui.json
  • 17:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T364299)', diff saved to https://phabricator.wikimedia.org/P64356 and previous config saved to /var/cache/conftool/dbconfig/20240608-171628-marostegui.json
  • 17:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 17:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 15:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T364069)', diff saved to https://phabricator.wikimedia.org/P64355 and previous config saved to /var/cache/conftool/dbconfig/20240608-152142-marostegui.json
  • 15:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 15:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 14:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 14:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 14:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T364299)', diff saved to https://phabricator.wikimedia.org/P64354 and previous config saved to /var/cache/conftool/dbconfig/20240608-144229-marostegui.json
  • 14:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P64353 and previous config saved to /var/cache/conftool/dbconfig/20240608-142721-marostegui.json
  • 14:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1228 (T352010)', diff saved to https://phabricator.wikimedia.org/P64352 and previous config saved to /var/cache/conftool/dbconfig/20240608-141514-ladsgroup.json
  • 14:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: Maintenance
  • 14:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1228.eqiad.wmnet with reason: Maintenance
  • 14:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T352010)', diff saved to https://phabricator.wikimedia.org/P64351 and previous config saved to /var/cache/conftool/dbconfig/20240608-141450-ladsgroup.json
  • 14:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P64350 and previous config saved to /var/cache/conftool/dbconfig/20240608-141212-marostegui.json
  • 13:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P64349 and previous config saved to /var/cache/conftool/dbconfig/20240608-135942-ladsgroup.json
  • 13:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 13:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 13:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T364299)', diff saved to https://phabricator.wikimedia.org/P64348 and previous config saved to /var/cache/conftool/dbconfig/20240608-135704-marostegui.json
  • 13:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P64347 and previous config saved to /var/cache/conftool/dbconfig/20240608-134434-ladsgroup.json
  • 13:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 13:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 13:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T352010)', diff saved to https://phabricator.wikimedia.org/P64346 and previous config saved to /var/cache/conftool/dbconfig/20240608-134110-ladsgroup.json
  • 13:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T352010)', diff saved to https://phabricator.wikimedia.org/P64345 and previous config saved to /var/cache/conftool/dbconfig/20240608-132926-ladsgroup.json
  • 13:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P64344 and previous config saved to /var/cache/conftool/dbconfig/20240608-132602-ladsgroup.json
  • 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P64343 and previous config saved to /var/cache/conftool/dbconfig/20240608-131054-ladsgroup.json
  • 12:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T352010)', diff saved to https://phabricator.wikimedia.org/P64342 and previous config saved to /var/cache/conftool/dbconfig/20240608-125546-ladsgroup.json
  • 11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T352010)', diff saved to https://phabricator.wikimedia.org/P64341 and previous config saved to /var/cache/conftool/dbconfig/20240608-113928-ladsgroup.json
  • 11:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T352010)', diff saved to https://phabricator.wikimedia.org/P64340 and previous config saved to /var/cache/conftool/dbconfig/20240608-113905-ladsgroup.json
  • 11:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P64339 and previous config saved to /var/cache/conftool/dbconfig/20240608-112357-ladsgroup.json
  • 11:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P64338 and previous config saved to /var/cache/conftool/dbconfig/20240608-110849-ladsgroup.json
  • 10:53 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T352010)', diff saved to https://phabricator.wikimedia.org/P64337 and previous config saved to /var/cache/conftool/dbconfig/20240608-105341-ladsgroup.json
  • 10:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T364299)', diff saved to https://phabricator.wikimedia.org/P64336 and previous config saved to /var/cache/conftool/dbconfig/20240608-105032-marostegui.json
  • 10:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 10:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 10:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T364299)', diff saved to https://phabricator.wikimedia.org/P64335 and previous config saved to /var/cache/conftool/dbconfig/20240608-105008-marostegui.json
  • 10:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P64334 and previous config saved to /var/cache/conftool/dbconfig/20240608-103501-marostegui.json
  • 10:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P64333 and previous config saved to /var/cache/conftool/dbconfig/20240608-101953-marostegui.json
  • 10:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T364299)', diff saved to https://phabricator.wikimedia.org/P64332 and previous config saved to /var/cache/conftool/dbconfig/20240608-100443-marostegui.json
  • 06:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T364299)', diff saved to https://phabricator.wikimedia.org/P64331 and previous config saved to /var/cache/conftool/dbconfig/20240608-064353-marostegui.json
  • 06:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 06:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 06:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T364299)', diff saved to https://phabricator.wikimedia.org/P64330 and previous config saved to /var/cache/conftool/dbconfig/20240608-064328-marostegui.json
  • 06:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P64329 and previous config saved to /var/cache/conftool/dbconfig/20240608-062820-marostegui.json
  • 06:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P64328 and previous config saved to /var/cache/conftool/dbconfig/20240608-061313-marostegui.json
  • 05:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T364299)', diff saved to https://phabricator.wikimedia.org/P64327 and previous config saved to /var/cache/conftool/dbconfig/20240608-055804-marostegui.json
  • 05:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T352010)', diff saved to https://phabricator.wikimedia.org/P64326 and previous config saved to /var/cache/conftool/dbconfig/20240608-054609-ladsgroup.json
  • 05:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 05:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 05:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T352010)', diff saved to https://phabricator.wikimedia.org/P64325 and previous config saved to /var/cache/conftool/dbconfig/20240608-054545-ladsgroup.json
  • 05:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P64324 and previous config saved to /var/cache/conftool/dbconfig/20240608-053037-ladsgroup.json
  • 05:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T352010)', diff saved to https://phabricator.wikimedia.org/P64323 and previous config saved to /var/cache/conftool/dbconfig/20240608-052817-ladsgroup.json
  • 05:28 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 05:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 05:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64322 and previous config saved to /var/cache/conftool/dbconfig/20240608-052753-ladsgroup.json
  • 05:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P64321 and previous config saved to /var/cache/conftool/dbconfig/20240608-051529-ladsgroup.json
  • 05:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P64320 and previous config saved to /var/cache/conftool/dbconfig/20240608-051244-ladsgroup.json
  • 05:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T352010)', diff saved to https://phabricator.wikimedia.org/P64319 and previous config saved to /var/cache/conftool/dbconfig/20240608-050021-ladsgroup.json
  • 04:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P64318 and previous config saved to /var/cache/conftool/dbconfig/20240608-045736-ladsgroup.json
  • 04:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64317 and previous config saved to /var/cache/conftool/dbconfig/20240608-044228-ladsgroup.json
  • 02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T352010)', diff saved to https://phabricator.wikimedia.org/P64316 and previous config saved to /var/cache/conftool/dbconfig/20240608-024534-ladsgroup.json
  • 02:45 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 02:45 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T352010)', diff saved to https://phabricator.wikimedia.org/P64315 and previous config saved to /var/cache/conftool/dbconfig/20240608-024511-ladsgroup.json
  • 02:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T364299)', diff saved to https://phabricator.wikimedia.org/P64314 and previous config saved to /var/cache/conftool/dbconfig/20240608-024455-marostegui.json
  • 02:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 02:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 02:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T364299)', diff saved to https://phabricator.wikimedia.org/P64313 and previous config saved to /var/cache/conftool/dbconfig/20240608-024431-marostegui.json
  • 02:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T352010)', diff saved to https://phabricator.wikimedia.org/P64312 and previous config saved to /var/cache/conftool/dbconfig/20240608-023735-ladsgroup.json
  • 02:37 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 02:37 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 02:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T352010)', diff saved to https://phabricator.wikimedia.org/P64311 and previous config saved to /var/cache/conftool/dbconfig/20240608-023711-ladsgroup.json
  • 02:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P64310 and previous config saved to /var/cache/conftool/dbconfig/20240608-023003-ladsgroup.json
  • 02:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P64309 and previous config saved to /var/cache/conftool/dbconfig/20240608-022923-marostegui.json
  • 02:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P64308 and previous config saved to /var/cache/conftool/dbconfig/20240608-022203-ladsgroup.json
  • 02:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P64307 and previous config saved to /var/cache/conftool/dbconfig/20240608-021455-ladsgroup.json
  • 02:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P64306 and previous config saved to /var/cache/conftool/dbconfig/20240608-021415-marostegui.json
  • 02:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P64305 and previous config saved to /var/cache/conftool/dbconfig/20240608-020655-ladsgroup.json
  • 01:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T352010)', diff saved to https://phabricator.wikimedia.org/P64304 and previous config saved to /var/cache/conftool/dbconfig/20240608-015947-ladsgroup.json
  • 01:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T364299)', diff saved to https://phabricator.wikimedia.org/P64303 and previous config saved to /var/cache/conftool/dbconfig/20240608-015906-marostegui.json
  • 01:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T352010)', diff saved to https://phabricator.wikimedia.org/P64302 and previous config saved to /var/cache/conftool/dbconfig/20240608-015147-ladsgroup.json

2024-06-07

  • 22:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T364299)', diff saved to https://phabricator.wikimedia.org/P64301 and previous config saved to /var/cache/conftool/dbconfig/20240607-224306-marostegui.json
  • 22:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 22:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 22:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T364299)', diff saved to https://phabricator.wikimedia.org/P64300 and previous config saved to /var/cache/conftool/dbconfig/20240607-224242-marostegui.json
  • 22:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 22:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T364069)', diff saved to https://phabricator.wikimedia.org/P64299 and previous config saved to /var/cache/conftool/dbconfig/20240607-223300-marostegui.json
  • 22:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P64298 and previous config saved to /var/cache/conftool/dbconfig/20240607-222734-marostegui.json
  • 22:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P64297 and previous config saved to /var/cache/conftool/dbconfig/20240607-221752-marostegui.json
  • 22:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P64296 and previous config saved to /var/cache/conftool/dbconfig/20240607-221224-marostegui.json
  • 22:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249', diff saved to https://phabricator.wikimedia.org/P64295 and previous config saved to /var/cache/conftool/dbconfig/20240607-220244-marostegui.json
  • 21:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T364299)', diff saved to https://phabricator.wikimedia.org/P64294 and previous config saved to /var/cache/conftool/dbconfig/20240607-215716-marostegui.json
  • 21:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1249 (T364069)', diff saved to https://phabricator.wikimedia.org/P64293 and previous config saved to /var/cache/conftool/dbconfig/20240607-214736-marostegui.json
  • 21:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T352010)', diff saved to https://phabricator.wikimedia.org/P64292 and previous config saved to /var/cache/conftool/dbconfig/20240607-211842-ladsgroup.json
  • 21:18 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 21:18 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 21:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64291 and previous config saved to /var/cache/conftool/dbconfig/20240607-211818-ladsgroup.json
  • 21:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P64290 and previous config saved to /var/cache/conftool/dbconfig/20240607-210310-ladsgroup.json
  • 20:48 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P64289 and previous config saved to /var/cache/conftool/dbconfig/20240607-204801-ladsgroup.json
  • 20:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64288 and previous config saved to /var/cache/conftool/dbconfig/20240607-203253-ladsgroup.json
  • 19:42 dduvall@deploy1002: Finished scap: Backport for mediawiki.diff: Fix color regression and also use one more token (T366845) (duration: 16m 10s)
  • 19:33 dduvall@deploy1002: dduvall: Continuing with sync
  • 19:28 dduvall@deploy1002: dduvall: Backport for mediawiki.diff: Fix color regression and also use one more token (T366845) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 19:26 dduvall@deploy1002: Started scap: Backport for mediawiki.diff: Fix color regression and also use one more token (T366845)
  • 19:25 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 19:25 eevans@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 19:07 cdanis@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 19:06 cdanis@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/aus-k8s-eqiad-services/jaeger: apply
  • 18:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T364299)', diff saved to https://phabricator.wikimedia.org/P64287 and previous config saved to /var/cache/conftool/dbconfig/20240607-184232-marostegui.json
  • 18:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 18:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 18:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64286 and previous config saved to /var/cache/conftool/dbconfig/20240607-184208-marostegui.json
  • 18:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P64285 and previous config saved to /var/cache/conftool/dbconfig/20240607-182700-marostegui.json
  • 18:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P64284 and previous config saved to /var/cache/conftool/dbconfig/20240607-181151-marostegui.json
  • 18:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T352010)', diff saved to https://phabricator.wikimedia.org/P64283 and previous config saved to /var/cache/conftool/dbconfig/20240607-181021-ladsgroup.json
  • 18:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 18:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 18:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T352010)', diff saved to https://phabricator.wikimedia.org/P64282 and previous config saved to /var/cache/conftool/dbconfig/20240607-180958-ladsgroup.json
  • 17:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64281 and previous config saved to /var/cache/conftool/dbconfig/20240607-175643-marostegui.json
  • 17:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P64280 and previous config saved to /var/cache/conftool/dbconfig/20240607-175450-ladsgroup.json
  • 17:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P64279 and previous config saved to /var/cache/conftool/dbconfig/20240607-173942-ladsgroup.json
  • 17:31 topranks: resetting line card 1/0 on cr2-codfw to enable new 100G link to ssw1-d8-codfw T364095
  • 17:28 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on cloudsw1-b1-codfw.mgmt,cr2-eqord,pfw3-codfw with reason: bouncing fpc 1 pic 0 on cr2-codfw
  • 17:28 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on cloudsw1-b1-codfw.mgmt,cr2-eqord,pfw3-codfw with reason: bouncing fpc 1 pic 0 on cr2-codfw
  • 17:24 topranks: re-route traffic from cr2-eqord away from circuit to cr2-codfw to allow for line card reset T364095
  • 17:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T352010)', diff saved to https://phabricator.wikimedia.org/P64278 and previous config saved to /var/cache/conftool/dbconfig/20240607-172432-ladsgroup.json
  • 17:23 topranks: disable IP transit to Lumen AS3356 from cr2-eqiad to allow line card reset T364095
  • 17:12 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt with reason: bouncing fpc 1 pic 0 on cr2-codfw
  • 17:12 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cr2-codfw,cr2-codfw IPv6,re0.cr2-codfw.mgmt with reason: bouncing fpc 1 pic 0 on cr2-codfw
  • 17:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T352010)', diff saved to https://phabricator.wikimedia.org/P64277 and previous config saved to /var/cache/conftool/dbconfig/20240607-170634-ladsgroup.json
  • 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:06 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 17:06 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 17:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T352010)', diff saved to https://phabricator.wikimedia.org/P64276 and previous config saved to /var/cache/conftool/dbconfig/20240607-170555-ladsgroup.json
  • 16:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T364299)', diff saved to https://phabricator.wikimedia.org/P64275 and previous config saved to /var/cache/conftool/dbconfig/20240607-165616-marostegui.json
  • 16:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 16:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 16:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 16:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T364299)', diff saved to https://phabricator.wikimedia.org/P64274 and previous config saved to /var/cache/conftool/dbconfig/20240607-165533-marostegui.json
  • 16:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P64273 and previous config saved to /var/cache/conftool/dbconfig/20240607-165047-ladsgroup.json
  • 16:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P64272 and previous config saved to /var/cache/conftool/dbconfig/20240607-164025-marostegui.json
  • 16:38 cdobbins@cumin1002: conftool action : set/pooled=yes; selector: name=cp4048.ulsfo.wmnet
  • 16:36 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=cp4048.ulsfo.wmnet
  • 16:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P64271 and previous config saved to /var/cache/conftool/dbconfig/20240607-163539-ladsgroup.json
  • 16:32 topranks: enabling new transport circuit from cr1-drmrs to cr2-eqiad T343385
  • 16:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P64270 and previous config saved to /var/cache/conftool/dbconfig/20240607-162516-marostegui.json
  • 16:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T352010)', diff saved to https://phabricator.wikimedia.org/P64269 and previous config saved to /var/cache/conftool/dbconfig/20240607-162031-ladsgroup.json
  • 16:19 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/proton: apply
  • 16:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T364299)', diff saved to https://phabricator.wikimedia.org/P64268 and previous config saved to /var/cache/conftool/dbconfig/20240607-161007-marostegui.json
  • 16:08 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/proton: apply
  • 16:07 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:07 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for moved telxius transpoort eqiad drmrs - cmooney@cumin1002"
  • 16:06 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 16:06 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update dns entries for moved telxius transpoort eqiad drmrs - cmooney@cumin1002"
  • 16:05 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 16:03 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 15:59 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/proton: apply
  • 15:59 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/proton: apply
  • 15:53 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:53 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merging pending cr2-codfw changes - sukhe@cumin1002"
  • 15:52 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: merging pending cr2-codfw changes - sukhe@cumin1002"
  • 15:45 sukhe@cumin1002: START - Cookbook sre.dns.netbox
  • 15:37 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 15:35 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 15:34 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 15:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 15:30 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 15:30 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 15:25 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 15:24 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 15:24 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 15:24 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 15:24 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 15:23 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 15:14 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Apply update to Java 11 - eevans@cumin1002
  • 15:10 topranks: disabling netbox service on primary netbox server netbox1001 to restore db from backup
  • 15:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on netbox1002.eqiad.wmnet with reason: Restoring DB from backup on netboxdb1002
  • 15:01 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on netbox1002.eqiad.wmnet with reason: Restoring DB from backup on netboxdb1002
  • 14:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T352010)', diff saved to https://phabricator.wikimedia.org/P64267 and previous config saved to /var/cache/conftool/dbconfig/20240607-145937-ladsgroup.json
  • 14:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 14:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 14:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T352010)', diff saved to https://phabricator.wikimedia.org/P64266 and previous config saved to /var/cache/conftool/dbconfig/20240607-145913-ladsgroup.json
  • 14:55 topranks: enabling port et-1/0/2 for 100G mode on cr2-codfw T364095
  • 14:53 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Apply update to Java 11 - eevans@cumin1002
  • 14:46 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:46 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new entries for cr2-codfw peering to ssw1-d8-codfw - cmooney@cumin1002"
  • 14:45 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add new entries for cr2-codfw peering to ssw1-d8-codfw - cmooney@cumin1002"
  • 14:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P64265 and previous config saved to /var/cache/conftool/dbconfig/20240607-144404-ladsgroup.json
  • 14:43 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 14:39 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:39 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:38 jhathaway@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:38 jhathaway@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:37 jhathaway@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 14:37 jhathaway@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 14:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P64264 and previous config saved to /var/cache/conftool/dbconfig/20240607-142856-ladsgroup.json
  • 14:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T352010)', diff saved to https://phabricator.wikimedia.org/P64263 and previous config saved to /var/cache/conftool/dbconfig/20240607-141349-ladsgroup.json
  • 14:02 Emperor: restart swift-proxy on ms-fe1009 ms-fe1011 ms-fe1012 ms-fe1014 T360913
  • 13:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1249 (T364069)', diff saved to https://phabricator.wikimedia.org/P64262 and previous config saved to /var/cache/conftool/dbconfig/20240607-132342-marostegui.json
  • 13:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 13:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1249.eqiad.wmnet with reason: Maintenance
  • 13:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T364069)', diff saved to https://phabricator.wikimedia.org/P64261 and previous config saved to /var/cache/conftool/dbconfig/20240607-132319-marostegui.json
  • 13:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P64260 and previous config saved to /var/cache/conftool/dbconfig/20240607-130811-marostegui.json
  • 13:05 moritzm: uploaded wmf-laptop 1.0.0 to component/wmf-laptop for bookworm-wikimedia
  • 13:04 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:04 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:02 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:01 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:01 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248', diff saved to https://phabricator.wikimedia.org/P64259 and previous config saved to /var/cache/conftool/dbconfig/20240607-125303-marostegui.json
  • 12:49 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64258 and previous config saved to /var/cache/conftool/dbconfig/20240607-124641-ladsgroup.json
  • 12:46 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 12:46 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 12:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64257 and previous config saved to /var/cache/conftool/dbconfig/20240607-124616-ladsgroup.json
  • 12:44 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:44 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:41 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:40 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:38 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:38 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1248 (T364069)', diff saved to https://phabricator.wikimedia.org/P64256 and previous config saved to /var/cache/conftool/dbconfig/20240607-123754-marostegui.json
  • 12:33 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 12:31 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 12:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64255 and previous config saved to /var/cache/conftool/dbconfig/20240607-123108-ladsgroup.json
  • 12:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T364299)', diff saved to https://phabricator.wikimedia.org/P64254 and previous config saved to /var/cache/conftool/dbconfig/20240607-122413-marostegui.json
  • 12:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 12:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 12:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T364299)', diff saved to https://phabricator.wikimedia.org/P64253 and previous config saved to /var/cache/conftool/dbconfig/20240607-122349-marostegui.json
  • 12:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P64252 and previous config saved to /var/cache/conftool/dbconfig/20240607-121559-ladsgroup.json
  • 12:08 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/services/ratelimit: apply
  • 12:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P64251 and previous config saved to /var/cache/conftool/dbconfig/20240607-120841-marostegui.json
  • 12:08 jayme@deploy1002: helmfile [eqiad] START helmfile.d/services/ratelimit: apply
  • 12:07 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/services/ratelimit: apply
  • 12:07 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host phab1004.eqiad.wmnet
  • 12:07 jayme@deploy1002: helmfile [codfw] START helmfile.d/services/ratelimit: apply
  • 12:07 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 12:07 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 12:01 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host phab1004.eqiad.wmnet
  • 12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64250 and previous config saved to /var/cache/conftool/dbconfig/20240607-120051-ladsgroup.json
  • 11:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P64249 and previous config saved to /var/cache/conftool/dbconfig/20240607-115333-marostegui.json
  • 11:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1011.eqiad.wmnet
  • 11:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb1011.eqiad.wmnet
  • 11:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1013.eqiad.wmnet
  • 11:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T364299)', diff saved to https://phabricator.wikimedia.org/P64248 and previous config saved to /var/cache/conftool/dbconfig/20240607-113824-marostegui.json
  • 11:36 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb1013.eqiad.wmnet
  • 11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1012.eqiad.wmnet
  • 11:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb1012.eqiad.wmnet
  • 11:28 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 11:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb1014.eqiad.wmnet
  • 11:28 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 11:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb1014.eqiad.wmnet
  • 11:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2007.codfw.wmnet
  • 11:12 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb2007.codfw.wmnet
  • 11:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2008.codfw.wmnet
  • 11:05 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb2008.codfw.wmnet
  • 11:05 jelto@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2002.wikimedia.org
  • 11:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2009.codfw.wmnet
  • 11:01 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 11:01 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 11:00 jelto@cumin2002: START - Cookbook sre.hosts.reboot-single for host gitlab2002.wikimedia.org
  • 11:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P64246 and previous config saved to /var/cache/conftool/dbconfig/20240607-110025-ladsgroup.json
  • 11:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 11:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 11:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T352010)', diff saved to https://phabricator.wikimedia.org/P64245 and previous config saved to /var/cache/conftool/dbconfig/20240607-110000-ladsgroup.json
  • 10:57 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb2009.codfw.wmnet
  • 10:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rdb2010.codfw.wmnet
  • 10:50 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
  • 10:50 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host rdb2010.codfw.wmnet
  • 10:50 cgoubert@cumin1002: START - Cookbook sre.hosts.reboot-single for host rdb2010.codfw.wmnet
  • 10:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P64244 and previous config saved to /var/cache/conftool/dbconfig/20240607-104452-ladsgroup.json
  • 10:33 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 10:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P64243 and previous config saved to /var/cache/conftool/dbconfig/20240607-102944-ladsgroup.json
  • 10:23 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 10:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T352010)', diff saved to https://phabricator.wikimedia.org/P64242 and previous config saved to /var/cache/conftool/dbconfig/20240607-101436-ladsgroup.json
  • 10:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-eqiad
  • 09:58 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki1002.eqiad.wmnet
  • 09:56 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/ratelimit: apply
  • 09:56 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:56 jayme@deploy1002: helmfile [staging] START helmfile.d/services/ratelimit: apply
  • 09:54 cgoubert@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-eqiad
  • 09:54 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-staging-worker-codfw
  • 09:54 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:53 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:53 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:52 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:52 moritzm: powercycle pki1002
  • 09:43 jynus: upgrading and restarting db1239 T360751
  • 09:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki1002.eqiad.wmnet
  • 09:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org
  • 09:38 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:36 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org
  • 09:36 cgoubert@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-staging-worker-codfw
  • 09:35 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:35 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
  • 09:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
  • 09:30 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:28 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:26 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:25 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 09:24 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:22 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 09:20 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2028.codfw.wmnet
  • 09:20 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2028.codfw.wmnet
  • 09:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T352010)', diff saved to https://phabricator.wikimedia.org/P64241 and previous config saved to /var/cache/conftool/dbconfig/20240607-091849-ladsgroup.json
  • 09:18 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:18 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2028.codfw.wmnet
  • 09:11 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4049.ulsfo.wmnet
  • 09:06 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2028.codfw.wmnet
  • 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2027.codfw.wmnet
  • 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2027.codfw.wmnet
  • 09:03 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:03 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 08:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2027.codfw.wmnet
  • 08:51 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2099.codfw.wmnet
  • 08:51 jynus@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:51 jynus@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2099.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
  • 08:50 jynus@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2099.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
  • 08:49 taavi: import opentofu 1.7.2 to apt.wikimedia.org T365696
  • 08:49 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2027.codfw.wmnet
  • 08:48 jynus: reboot dbprov1001,1002,2001,2002
  • 08:46 jynus@cumin1002: START - Cookbook sre.dns.netbox
  • 08:41 jynus@cumin1002: START - Cookbook sre.hosts.decommission for hosts db2099.codfw.wmnet
  • 08:40 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2098.codfw.wmnet
  • 08:40 jynus@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:39 jynus@cumin1002: START - Cookbook sre.dns.netbox
  • 08:39 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2097.codfw.wmnet
  • 08:39 jynus@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:39 jynus@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2097.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
  • 08:37 jynus@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2097.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin1002"
  • 08:35 jynus@cumin1002: START - Cookbook sre.dns.netbox
  • 08:19 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4049.ulsfo.wmnet
  • 08:19 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4049.ulsfo.wmnet
  • 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti2025.codfw.wmnet
  • 08:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti2025.codfw.wmnet
  • 08:15 jynus: deleted from zarcillo db2097, db2098, db2099 T362802 T366877 T362883
  • 08:12 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti2025.codfw.wmnet
  • 08:09 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti2025.codfw.wmnet
  • 08:03 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host pki-root1002.eqiad.wmnet
  • 07:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T364299)', diff saved to https://phabricator.wikimedia.org/P64239 and previous config saved to /var/cache/conftool/dbconfig/20240607-075742-marostegui.json
  • 07:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 07:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 07:57 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host pki-root1002.eqiad.wmnet
  • 07:56 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 07:51 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host seaborgium.wikimedia.org
  • 07:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host seaborgium.wikimedia.org
  • 07:45 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2097.codfw.wmnet with reason: about to decommission
  • 07:45 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2097.codfw.wmnet with reason: about to decommission
  • 07:45 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2099.codfw.wmnet with reason: about to decommission
  • 07:44 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2099.codfw.wmnet with reason: about to decommission
  • 07:30 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host bast1003.wikimedia.org with OS bookworm
  • 07:19 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2098.codfw.wmnet with reason: about to decommission
  • 07:19 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2098.codfw.wmnet with reason: about to decommission
  • 07:12 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 07:09 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on bast1003.wikimedia.org with reason: host reimage
  • 07:07 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on bast1003.wikimedia.org with reason: host reimage
  • 06:52 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host bast1003.wikimedia.org with OS bookworm
  • 06:51 moritzm: reimaging bast1003 to bookworm
  • 06:36 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 06:34 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 06:31 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 05:15 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 04:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 04:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 04:35 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 04:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T352010)', diff saved to https://phabricator.wikimedia.org/P64238 and previous config saved to /var/cache/conftool/dbconfig/20240607-043343-ladsgroup.json
  • 04:33 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 04:33 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 04:33 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T352010)', diff saved to https://phabricator.wikimedia.org/P64237 and previous config saved to /var/cache/conftool/dbconfig/20240607-043320-ladsgroup.json
  • 04:23 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 04:18 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P64236 and previous config saved to /var/cache/conftool/dbconfig/20240607-041812-ladsgroup.json
  • 04:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P64235 and previous config saved to /var/cache/conftool/dbconfig/20240607-040302-ladsgroup.json
  • 04:02 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 04:01 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 03:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T352010)', diff saved to https://phabricator.wikimedia.org/P64234 and previous config saved to /var/cache/conftool/dbconfig/20240607-034755-ladsgroup.json
  • 03:35 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 03:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T352010)', diff saved to https://phabricator.wikimedia.org/P64233 and previous config saved to /var/cache/conftool/dbconfig/20240607-033141-ladsgroup.json
  • 03:31 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 03:31 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 03:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64232 and previous config saved to /var/cache/conftool/dbconfig/20240607-033118-ladsgroup.json
  • 03:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1248 (T364069)', diff saved to https://phabricator.wikimedia.org/P64231 and previous config saved to /var/cache/conftool/dbconfig/20240607-032809-marostegui.json
  • 03:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 03:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1248.eqiad.wmnet with reason: Maintenance
  • 03:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T364069)', diff saved to https://phabricator.wikimedia.org/P64230 and previous config saved to /var/cache/conftool/dbconfig/20240607-032746-marostegui.json
  • 03:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P64229 and previous config saved to /var/cache/conftool/dbconfig/20240607-031610-ladsgroup.json
  • 03:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P64228 and previous config saved to /var/cache/conftool/dbconfig/20240607-031238-marostegui.json
  • 03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P64227 and previous config saved to /var/cache/conftool/dbconfig/20240607-030102-ladsgroup.json
  • 02:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247', diff saved to https://phabricator.wikimedia.org/P64226 and previous config saved to /var/cache/conftool/dbconfig/20240607-025729-marostegui.json
  • 02:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64225 and previous config saved to /var/cache/conftool/dbconfig/20240607-024554-ladsgroup.json
  • 02:44 ejegg: fundraising civicrm upgraded from 757f8528 to ebfbad86
  • 02:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1247 (T364069)', diff saved to https://phabricator.wikimedia.org/P64224 and previous config saved to /var/cache/conftool/dbconfig/20240607-024221-marostegui.json
  • 02:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T352010)', diff saved to https://phabricator.wikimedia.org/P64223 and previous config saved to /var/cache/conftool/dbconfig/20240607-021501-ladsgroup.json
  • 02:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 02:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T352010)', diff saved to https://phabricator.wikimedia.org/P64222 and previous config saved to /var/cache/conftool/dbconfig/20240607-021418-ladsgroup.json
  • 01:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P64221 and previous config saved to /var/cache/conftool/dbconfig/20240607-015910-ladsgroup.json
  • 01:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P64220 and previous config saved to /var/cache/conftool/dbconfig/20240607-014403-ladsgroup.json
  • afk: fundraising civicrm upgraded from 286bd2b8 to 757f8528
  • 01:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T352010)', diff saved to https://phabricator.wikimedia.org/P64219 and previous config saved to /var/cache/conftool/dbconfig/20240607-012855-ladsgroup.json
  • 01:14 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 01:14 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 01:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T352010)', diff saved to https://phabricator.wikimedia.org/P64218 and previous config saved to /var/cache/conftool/dbconfig/20240607-011438-ladsgroup.json
  • 00:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P64217 and previous config saved to /var/cache/conftool/dbconfig/20240607-005930-ladsgroup.json
  • 00:55 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_eqiad: eqiad cluster reboot - ryankemper@cumin2002 - T366555
  • 00:55 ryankemper@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
  • 00:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P64216 and previous config saved to /var/cache/conftool/dbconfig/20240607-004423-ladsgroup.json
  • 00:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T352010)', diff saved to https://phabricator.wikimedia.org/P64215 and previous config saved to /var/cache/conftool/dbconfig/20240607-002915-ladsgroup.json
  • 00:23 bd808@deploy1002: Finished scap: Backport for Revert "wikitech: Replace OSM class in Gerrit blocking hook" (duration: 11m 24s)
  • 00:15 bd808@deploy1002: bd808 and trainbranchbot: Continuing with sync
  • 00:14 bd808@deploy1002: bd808 and trainbranchbot: Backport for Revert "wikitech: Replace OSM class in Gerrit blocking hook" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 00:12 bd808@deploy1002: Started scap: Backport for Revert "wikitech: Replace OSM class in Gerrit blocking hook"

2024-06-06

  • 23:32 bd808@deploy1002: Finished scap: Backport for wikitech: Replace OSM class in Gerrit blocking hook (T161553) (duration: 11m 24s)
  • 23:23 bd808@deploy1002: taavi and bd808: Continuing with sync
  • 23:23 bd808@deploy1002: taavi and bd808: Backport for wikitech: Replace OSM class in Gerrit blocking hook (T161553) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:20 bd808@deploy1002: Started scap: Backport for wikitech: Replace OSM class in Gerrit blocking hook (T161553)
  • 23:16 bd808@deploy1002: Finished scap: Backport for wikitech: Update Phabricator Conduit calls to disable/enable users (T366587) (duration: 12m 01s)
  • 23:07 bd808@deploy1002: bd808: Continuing with sync
  • 23:06 bd808@deploy1002: bd808: Backport for wikitech: Update Phabricator Conduit calls to disable/enable users (T366587) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:04 bd808@deploy1002: Started scap: Backport for wikitech: Update Phabricator Conduit calls to disable/enable users (T366587)
  • 21:46 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
  • 21:27 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
  • 21:10 jdrewniak@deploy1002: Finished scap: Backport for Disable font size options on specified pages for all wikis (T366625) (duration: 12m 50s)
  • 21:01 jdrewniak@deploy1002: jdrewniak and toyofuku: Continuing with sync
  • 21:00 jdrewniak@deploy1002: jdrewniak and toyofuku: Backport for Disable font size options on specified pages for all wikis (T366625) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:57 jdrewniak@deploy1002: Started scap: Backport for Disable font size options on specified pages for all wikis (T366625)
  • 20:54 urbanecm@deploy1002: Finished scap: Backport for testwiki: Enable CommunityConfiguration (T360954) (duration: 12m 09s)
  • 20:50 urbanecm: mwscript extensions/GrowthExperiments/maintenance/migrateCommunityConfig.php --wiki=testwiki # T360954
  • 20:46 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 20:44 urbanecm@deploy1002: urbanecm: Backport for testwiki: Enable CommunityConfiguration (T360954) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:42 urbanecm@deploy1002: Started scap: Backport for testwiki: Enable CommunityConfiguration (T360954)
  • 20:41 urbanecm@deploy1002: Finished scap: Backport for [mswiktionary] Rename namespace "Wiktionary" to "Wikikamus" (T366549), Improve navigation link handling in CommunityConfiguration (T364938 T365504 T360954), Drop logging level for unsupported providers to DEBUG (T366519 T360954) (duration: 19m 42s)
  • 20:33 urbanecm@deploy1002: urbanecm and sgimeno and gergesshamon: Continuing with sync
  • 20:32 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 20:31 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 20:30 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 20:29 ejegg: fundraising civicrm upgraded from 71ed6bed to 286bd2b8
  • 20:28 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 20:26 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 20:26 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 20:24 urbanecm@deploy1002: urbanecm and sgimeno and gergesshamon: Backport for [mswiktionary] Rename namespace "Wiktionary" to "Wikikamus" (T366549), Improve navigation link handling in CommunityConfiguration (T364938 T365504 T360954), Drop logging level for unsupported providers to DEBUG (T366519 T360954) synced to the testservers (https://wikitech.wikimedia.org/wiki
  • 20:22 urbanecm@deploy1002: Started scap: Backport for [mswiktionary] Rename namespace "Wiktionary" to "Wikikamus" (T366549), Improve navigation link handling in CommunityConfiguration (T364938 T365504 T360954), Drop logging level for unsupported providers to DEBUG (T366519 T360954)
  • 20:21 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 20:20 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 20:20 urbanecm@deploy1002: Finished scap: Backport for Assign applychangetags right to group "all" on plwiktionary (T363638), InitialiseSettings: Enable AutoModerator on trwiki (T362622), InitaliseSettings-labs: Deploy Automoderator patroller workstream survey to cawiki (T362969) (duration: 14m 10s)
  • 20:19 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 20:18 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 20:13 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 20:13 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 20:11 urbanecm@deploy1002: wargo and urbanecm and jsn and kgraessle: Continuing with sync
  • 20:08 urbanecm@deploy1002: wargo and urbanecm and jsn and kgraessle: Backport for Assign applychangetags right to group "all" on plwiktionary (T363638), InitialiseSettings: Enable AutoModerator on trwiki (T362622), InitaliseSettings-labs: Deploy Automoderator patroller workstream survey to cawiki (T362969) synced to the testservers (https://wikitech.wikimedia.org/wiki
  • 20:06 urbanecm@deploy1002: Started scap: Backport for Assign applychangetags right to group "all" on plwiktionary (T363638), InitialiseSettings: Enable AutoModerator on trwiki (T362622), InitaliseSettings-labs: Deploy Automoderator patroller workstream survey to cawiki (T362969)
  • 20:02 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
  • 19:31 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@a8843e6]: Deploying latest DAGs to the analytics Airflow instance. T358707. (duration: 00m 26s)
  • 19:30 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@a8843e6]: Deploying latest DAGs to the analytics Airflow instance. T358707.
  • 18:29 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.8 refs T361402
  • 18:17 thcipriani@deploy1002: Finished deploy [releng/jenkins-deploy@3be9893] (releasing): (no justification provided) (duration: 00m 43s)
  • 18:17 thcipriani@deploy1002: Started deploy [releng/jenkins-deploy@3be9893] (releasing): (no justification provided)
  • 17:57 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 17:57 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - kamila@cumin1002"
  • 17:56 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - kamila@cumin1002"
  • 17:48 topranks: re-enabling pybal on lvs1017 after cable move T366361
  • 17:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1247 (T364069)', diff saved to https://phabricator.wikimedia.org/P64211 and previous config saved to /var/cache/conftool/dbconfig/20240606-173121-marostegui.json
  • 17:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 17:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1247.eqiad.wmnet with reason: Maintenance
  • 17:26 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on lvs1017.eqiad.wmnet with reason: moving lvs1017 link back to ssw1-e1-codfw
  • 17:26 topranks: disabling pybal on lvs1017 to move traffic to lvs1020 in advance of cable move T366361
  • 17:26 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on lvs1017.eqiad.wmnet with reason: moving lvs1017 link back to ssw1-e1-codfw
  • 17:23 topranks: re-enabling pybal on lvs1018 after cable move T366361
  • 17:15 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on lvs1018.eqiad.wmnet with reason: moving lvs1018 link back to ssw1-e1-codfw
  • 17:15 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on lvs1018.eqiad.wmnet with reason: moving lvs1018 link back to ssw1-e1-codfw
  • 17:15 cmooney@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 0:20:00 on lvs1019.eqiad.wmnet with reason: moving lvs1018 link back to ssw1-e1-codfw
  • 17:14 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on lvs1019.eqiad.wmnet with reason: moving lvs1018 link back to ssw1-e1-codfw
  • 17:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T352010)', diff saved to https://phabricator.wikimedia.org/P64210 and previous config saved to /var/cache/conftool/dbconfig/20240606-171359-ladsgroup.json
  • 17:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 17:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 17:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T352010)', diff saved to https://phabricator.wikimedia.org/P64209 and previous config saved to /var/cache/conftool/dbconfig/20240606-171336-ladsgroup.json
  • 17:11 topranks: disabling pybal on lvs1018 to move traffic to lvs1020 in advance of cable move T366361
  • 17:11 topranks: re-enabling pybal on lvs1019 after cable move T366361
  • 16:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P64208 and previous config saved to /var/cache/conftool/dbconfig/20240606-165828-ladsgroup.json
  • 16:52 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:20:00 on lvs1019.eqiad.wmnet with reason: moving lvs1019 link back to ssw1-f1-codfw
  • 16:51 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:20:00 on lvs1019.eqiad.wmnet with reason: moving lvs1019 link back to ssw1-f1-codfw
  • 16:50 topranks: disabling pybal on lvs1019 to move traffic to lvs1020 in advance of cable move T366361
  • 16:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P64207 and previous config saved to /var/cache/conftool/dbconfig/20240606-164320-ladsgroup.json
  • 16:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T352010)', diff saved to https://phabricator.wikimedia.org/P64206 and previous config saved to /var/cache/conftool/dbconfig/20240606-162812-ladsgroup.json
  • 16:28 hashar@deploy1002: Finished deploy [integration/docroot@eee90e6]: (no justification provided) (duration: 00m 05s)
  • 16:28 hashar@deploy1002: Started deploy [integration/docroot@eee90e6]: (no justification provided)
  • 16:25 dancy@deploy1002: Installation of scap version "4.86.1" completed for 285 hosts
  • 16:25 dancy@deploy1002: Installing scap version "4.86.1" for 285 hosts
  • 16:24 dancy@deploy1002: Installing scap version "4.86.1" for 286 hosts
  • 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T352010)', diff saved to https://phabricator.wikimedia.org/P64205 and previous config saved to /var/cache/conftool/dbconfig/20240606-161338-ladsgroup.json
  • 16:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 16:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 16:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T352010)', diff saved to https://phabricator.wikimedia.org/P64204 and previous config saved to /var/cache/conftool/dbconfig/20240606-161312-ladsgroup.json
  • 16:10 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: reimage still running
  • 16:10 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: reimage still running
  • 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T352010)', diff saved to https://phabricator.wikimedia.org/P64203 and previous config saved to /var/cache/conftool/dbconfig/20240606-160028-ladsgroup.json
  • 16:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 16:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 16:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T352010)', diff saved to https://phabricator.wikimedia.org/P64202 and previous config saved to /var/cache/conftool/dbconfig/20240606-160004-ladsgroup.json
  • 15:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P64201 and previous config saved to /var/cache/conftool/dbconfig/20240606-155804-ladsgroup.json
  • 15:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P64199 and previous config saved to /var/cache/conftool/dbconfig/20240606-154457-ladsgroup.json
  • 15:44 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 15:42 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 15:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P64198 and previous config saved to /var/cache/conftool/dbconfig/20240606-154255-ladsgroup.json
  • 15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64197 and previous config saved to /var/cache/conftool/dbconfig/20240606-154028-ladsgroup.json
  • 15:40 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 15:40 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 15:40 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 15:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T352010)', diff saved to https://phabricator.wikimedia.org/P64196 and previous config saved to /var/cache/conftool/dbconfig/20240606-154004-ladsgroup.json
  • 15:38 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 15:38 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 15:37 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 15:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T360332)', diff saved to https://phabricator.wikimedia.org/P64195 and previous config saved to /var/cache/conftool/dbconfig/20240606-153730-arnaudb.json
  • 15:29 topranks: rebooting ssw1-f1-eqiad to install new JunOS release T366361
  • 15:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P64194 and previous config saved to /var/cache/conftool/dbconfig/20240606-152949-ladsgroup.json
  • 15:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T352010)', diff saved to https://phabricator.wikimedia.org/P64193 and previous config saved to /var/cache/conftool/dbconfig/20240606-152747-ladsgroup.json
  • 15:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P64192 and previous config saved to /var/cache/conftool/dbconfig/20240606-152456-ladsgroup.json
  • 15:23 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "moved wikikube-ctrl1001 to a new rack - kamila@cumin1002 - T366204"
  • 15:23 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 15:23 jforrester@deploy1002: Finished scap: Backport for Revert "commonswiki: Enable numeric wgCategoryCollation" (T366809) (duration: 13m 58s)
  • 15:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P64191 and previous config saved to /var/cache/conftool/dbconfig/20240606-152222-arnaudb.json
  • 15:19 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 15:18 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "moved wikikube-ctrl1001 to a new rack - kamila@cumin1002 - T366204"
  • 15:16 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:16 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T352010)', diff saved to https://phabricator.wikimedia.org/P64190 and previous config saved to /var/cache/conftool/dbconfig/20240606-151440-ladsgroup.json
  • 15:14 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 15:12 jforrester@deploy1002: jforrester: Continuing with sync
  • 15:11 jforrester@deploy1002: jforrester: Backport for Revert "commonswiki: Enable numeric wgCategoryCollation" (T366809) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 15:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 15:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P64189 and previous config saved to /var/cache/conftool/dbconfig/20240606-150948-ladsgroup.json
  • 15:09 jforrester@deploy1002: Started scap: Backport for Revert "commonswiki: Enable numeric wgCategoryCollation" (T366809)
  • 15:08 jforrester@deploy1002: Finished scap: Backport for Add wikilambda-edit-monolingual-text-placeholder message to extension.json (T359782) (duration: 12m 05s)
  • 15:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P64188 and previous config saved to /var/cache/conftool/dbconfig/20240606-150714-arnaudb.json
  • 15:04 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on ssw1-e1-eqiad.mgmt with reason: upgrading spine switches eqiad rows e and f
  • 15:04 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on ssw1-e1-eqiad.mgmt with reason: upgrading spine switches eqiad rows e and f
  • 14:59 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on 15 hosts with reason: upgrading spine switches eqiad rows e and f
  • 14:59 jforrester@deploy1002: jforrester: Continuing with sync
  • 14:59 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on 15 hosts with reason: upgrading spine switches eqiad rows e and f
  • 14:58 jforrester@deploy1002: jforrester: Backport for Add wikilambda-edit-monolingual-text-placeholder message to extension.json (T359782) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:58 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 14:58 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 14:56 topranks: disable ssw1-f1-eqiad leaf-facing ports in advance of upgrade T366361
  • 14:56 jforrester@deploy1002: Started scap: Backport for Add wikilambda-edit-monolingual-text-placeholder message to extension.json (T359782)
  • 14:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T352010)', diff saved to https://phabricator.wikimedia.org/P64187 and previous config saved to /var/cache/conftool/dbconfig/20240606-145440-ladsgroup.json
  • 14:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T360332)', diff saved to https://phabricator.wikimedia.org/P64186 and previous config saved to /var/cache/conftool/dbconfig/20240606-145205-arnaudb.json
  • 14:51 elukey: kill sessionstore pod running on mw1390.eqiad.wmnet (no dedicated='kask' taint)
  • 14:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T360332)', diff saved to https://phabricator.wikimedia.org/P64185 and previous config saved to /var/cache/conftool/dbconfig/20240606-144943-arnaudb.json
  • 14:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 14:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 14:43 sukhe: sudo cumin -b1 -s60 'A:cp and A:eqsin' 'run-puppet-agent --enable "merging CR 1038881"'
  • 14:25 TheresNoTime: close UTC afternoon backport window
  • 14:18 hashar@deploy1002: Finished deploy [integration/docroot@eee90e6]: Build dependencies updates (duration: 00m 10s)
  • 14:18 hashar@deploy1002: Started deploy [integration/docroot@eee90e6]: Build dependencies updates
  • 14:17 hashar@deploy1002: Finished deploy [integration/docroot@eee90e6]: Build dependencies updates (duration: 00m 09s)
  • 14:17 hashar@deploy1002: Started deploy [integration/docroot@eee90e6]: Build dependencies updates
  • 14:17 samtar@deploy1002: Finished scap: Backport for commonswiki: Enable numeric wgCategoryCollation (T362494), Add project namespace alias for Azerbaijani Wikisource (T365966) (duration: 12m 58s)
  • 14:15 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ssw1-f1-eqiad,ssw1-f1-eqiad IPv6,ssw1-f1-eqiad.mgmt with reason: upgrading spine switches eqiad rows e and f
  • 14:15 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on ssw1-f1-eqiad,ssw1-f1-eqiad IPv6,ssw1-f1-eqiad.mgmt with reason: upgrading spine switches eqiad rows e and f
  • 14:14 topranks: disabling BGP on cr2-eqiad towards ssw1-f1-eqiad prior to upgrade of ssw later T366361
  • 14:14 ChrisDobbins901_: sudo cumin 'A:cp and A:eqsin' 'disable-puppet "merging CR 1038881"'
  • 14:08 samtar@deploy1002: samtar and anzx and nmw03: Continuing with sync
  • 14:07 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4050.ulsfo.wmnet
  • 14:06 samtar@deploy1002: samtar and anzx and nmw03: Backport for commonswiki: Enable numeric wgCategoryCollation (T362494), Add project namespace alias for Azerbaijani Wikisource (T365966) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:06 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4050.ulsfo.wmnet
  • 14:05 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: host reimage
  • 14:04 samtar@deploy1002: Started scap: Backport for commonswiki: Enable numeric wgCategoryCollation (T362494), Add project namespace alias for Azerbaijani Wikisource (T365966)
  • 14:02 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl1001.eqiad.wmnet with reason: host reimage
  • 14:00 kartik@deploy1002: Finished scap: Backport for CX: Fix translation container max width for large screens (T366374) (duration: 13m 11s)
  • 13:57 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4050.ulsfo.wmnet
  • 13:56 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4050.ulsfo.wmnet
  • 13:52 kartik@deploy1002: kartik: Continuing with sync
  • 13:50 kartik@deploy1002: kartik: Backport for CX: Fix translation container max width for large screens (T366374) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:47 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 13:47 kartik@deploy1002: Started scap: Backport for CX: Fix translation container max width for large screens (T366374)
  • 13:46 samtar@deploy1002: Finished scap: Backport for [mswiktionary] Change the default Sitename value to Wikikamus (T366549) (duration: 16m 05s)
  • 13:45 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host wikikube-ctrl1001.eqiad.wmnet
  • 13:44 kamila@cumin1002: START - Cookbook sre.hosts.dhcp for host wikikube-ctrl1001.eqiad.wmnet
  • 13:44 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host wikikube-ctrl1001.eqiad.wmnet
  • 13:37 samtar@deploy1002: samtar and gergesshamon: Continuing with sync
  • 13:32 samtar@deploy1002: samtar and gergesshamon: Backport for [mswiktionary] Change the default Sitename value to Wikikamus (T366549) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:30 samtar@deploy1002: Started scap: Backport for [mswiktionary] Change the default Sitename value to Wikikamus (T366549)
  • 13:28 samtar@deploy1002: Finished scap: Backport for Activate campaignEvents extension on Igbo wiki. (T363199) (duration: 14m 07s)
  • 13:19 samtar@deploy1002: mhorsey and samtar: Continuing with sync
  • 13:16 samtar@deploy1002: mhorsey and samtar: Backport for Activate campaignEvents extension on Igbo wiki. (T363199) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:15 samtar@deploy1002: Started scap: Backport for Activate campaignEvents extension on Igbo wiki. (T363199)
  • 13:11 taavi: taavi@deploy1002 ~ $ sudo kill 32174 # kill forgotten scap sync-world process
  • 13:08 klausman@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-eqiad
  • 12:57 vgutierrez: repool text@codfw with IPIP encapsulation enabled - T366466
  • 12:56 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
  • 12:56 isaranto@deploy1002: helmfile [ml-staging-codfw] 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 12:50 vgutierrez: rolling restart of pybal on lvs2014 and lvs2011 - T366466
  • 12:44 topranks: disabling PyBal on lvs1019 to allow for cable move T366361
  • 12:40 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4051.ulsfo.wmnet
  • 12:39 topranks: rebooting ssw1-e1-eqiad to upgrade JunOS
  • 12:39 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4051.ulsfo.wmnet
  • 12:33 topranks: disabling BGP to ssw1-e1-eqiad from cr1-eqiad in advance of upgrade T366361
  • 12:33 vgutierrez: depool text@codfw before enabling IPIP encapsulation - T366466
  • 12:29 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4051.ulsfo.wmnet
  • 12:28 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4051.ulsfo.wmnet
  • 12:25 topranks: disabling PyBal on lvs1018 to allow for cable move T366361
  • 12:25 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1018.eqiad.wmnet with reason: moving lvs1018 link to row E from spine to leaf
  • 12:25 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1018.eqiad.wmnet with reason: moving lvs1018 link to row E from spine to leaf
  • 12:24 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1017.eqiad.wmnet
  • 12:24 cmooney@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs1017.eqiad.wmnet
  • 12:21 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 12:21 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 12:14 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on 18 hosts with reason: upgrading spine switches eqiad rows e and f
  • 12:14 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on 18 hosts with reason: upgrading spine switches eqiad rows e and f
  • 11:56 topranks: disabling PyBal on lvs1017 to allow for cable move T366361
  • 11:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lvs1017.eqiad.wmnet with reason: moving lvs1017 link to row E from spine to leaf
  • 11:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lvs1017.eqiad.wmnet with reason: moving lvs1017 link to row E from spine to leaf
  • 11:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-codfw
  • 11:27 effie: kicking off k8s eqiad restarts - T366555
  • 11:25 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 11:15 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 11:09 klausman@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-eqiad
  • 11:05 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 10:58 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 10:47 sfaci@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 10:45 sfaci@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 10:45 sfaci@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 10:43 sfaci@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 10:41 pfischer@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:41 sfaci@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 10:40 sfaci@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 10:40 sfaci@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 10:38 sfaci@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 10:37 sfaci@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 10:35 sfaci@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 10:27 sfaci@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 10:26 sfaci@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 10:11 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64181 and previous config saved to /var/cache/conftool/dbconfig/20240606-100747-arnaudb.json
  • 09:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64180 and previous config saved to /var/cache/conftool/dbconfig/20240606-095240-arnaudb.json
  • 09:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 09:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 09:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T364069)', diff saved to https://phabricator.wikimedia.org/P64179 and previous config saved to /var/cache/conftool/dbconfig/20240606-095053-marostegui.json
  • 09:47 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2004.codfw.wmnet
  • 09:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64178 and previous config saved to /var/cache/conftool/dbconfig/20240606-093734-arnaudb.json
  • 09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P64177 and previous config saved to /var/cache/conftool/dbconfig/20240606-093545-marostegui.json
  • 09:33 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2004.codfw.wmnet
  • 09:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2003.codfw.wmnet
  • 09:22 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64176 and previous config saved to /var/cache/conftool/dbconfig/20240606-092228-arnaudb.json
  • 09:22 stevemunene@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 09:20 stevemunene@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 09:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244', diff saved to https://phabricator.wikimedia.org/P64175 and previous config saved to /var/cache/conftool/dbconfig/20240606-092037-marostegui.json
  • 09:20 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:18 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2003.codfw.wmnet
  • 09:17 cgoubert@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
  • 09:17 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:15 stevemunene@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:13 stevemunene@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:12 stevemunene@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:11 stevemunene@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 09:08 mvernon@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be1004.eqiad.wmnet
  • 09:07 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64174 and previous config saved to /var/cache/conftool/dbconfig/20240606-090722-arnaudb.json
  • 09:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1244 (T364069)', diff saved to https://phabricator.wikimedia.org/P64173 and previous config saved to /var/cache/conftool/dbconfig/20240606-090529-marostegui.json
  • 09:01 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2002.codfw.wmnet
  • 09:01 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus1006.eqiad.wmnet
  • 09:01 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2006.codfw.wmnet
  • 08:57 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 08:56 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host thanos-be1004.eqiad.wmnet
  • 08:56 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 08:52 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1006.eqiad.wmnet
  • 08:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64172 and previous config saved to /var/cache/conftool/dbconfig/20240606-085216-arnaudb.json
  • 08:52 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus2006.codfw.wmnet
  • 08:50 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1031.eqiad.wmnet
  • 08:47 mvernon@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be1003.eqiad.wmnet
  • 08:44 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1031.eqiad.wmnet
  • 08:44 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:43 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host thanos-be2002.codfw.wmnet
  • 08:40 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be2001.codfw.wmnet
  • 08:39 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 08:39 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 08:38 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus2005.codfw.wmnet
  • 08:37 arnaudb@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 2%: post maintenance repool', diff saved to https://phabricator.wikimedia.org/P64171 and previous config saved to /var/cache/conftool/dbconfig/20240606-083710-arnaudb.json
  • 08:36 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host thanos-be1003.eqiad.wmnet
  • 08:35 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:35 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:19 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus1005.eqiad.wmnet
  • 08:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T364299)', diff saved to https://phabricator.wikimedia.org/P64167 and previous config saved to /var/cache/conftool/dbconfig/20240606-081753-marostegui.json
  • 08:14 stevemunene@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 08:14 stevemunene@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 08:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P64166 and previous config saved to /var/cache/conftool/dbconfig/20240606-081412-ladsgroup.json
  • 08:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P64165 and previous config saved to /var/cache/conftool/dbconfig/20240606-080245-marostegui.json
  • 08:02 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host thanos-be1002.eqiad.wmnet
  • 08:01 mvernon@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host thanos-be1001.eqiad.wmnet
  • 08:00 urbanecm@deploy1002: Started scap: Backport for Add throttle exception for an upcoming workshop (T366748)
  • 07:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P64164 and previous config saved to /var/cache/conftool/dbconfig/20240606-075904-ladsgroup.json
  • 07:50 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host thanos-be1001.eqiad.wmnet
  • 07:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P64163 and previous config saved to /var/cache/conftool/dbconfig/20240606-074737-marostegui.json
  • 07:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T352010)', diff saved to https://phabricator.wikimedia.org/P64162 and previous config saved to /var/cache/conftool/dbconfig/20240606-074356-ladsgroup.json
  • 07:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T364299)', diff saved to https://phabricator.wikimedia.org/P64161 and previous config saved to /var/cache/conftool/dbconfig/20240606-073229-marostegui.json
  • 07:30 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
  • 07:06 hashar: Restarting Gerrit
  • 07:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T352010)', diff saved to https://phabricator.wikimedia.org/P64160 and previous config saved to /var/cache/conftool/dbconfig/20240606-070558-ladsgroup.json
  • 07:05 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 07:05 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 06:56 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1034.eqiad.wmnet
  • 06:49 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1034.eqiad.wmnet
  • 05:40 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 05:21 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
  • 05:20 ryankemper@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
  • 05:04 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 05:02 ryankemper@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (3 nodes at a time) for ElasticSearch cluster search_codfw: codfw cluster restart - ryankemper@cumin2002 - T366555
  • 04:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T364299)', diff saved to https://phabricator.wikimedia.org/P64159 and previous config saved to /var/cache/conftool/dbconfig/20240606-041714-marostegui.json
  • 04:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 04:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 04:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T364299)', diff saved to https://phabricator.wikimedia.org/P64158 and previous config saved to /var/cache/conftool/dbconfig/20240606-041650-marostegui.json
  • 04:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P64157 and previous config saved to /var/cache/conftool/dbconfig/20240606-040142-marostegui.json
  • 03:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1193 (T352010)', diff saved to https://phabricator.wikimedia.org/P64156 and previous config saved to /var/cache/conftool/dbconfig/20240606-034732-ladsgroup.json
  • 03:47 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 03:47 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 03:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T352010)', diff saved to https://phabricator.wikimedia.org/P64155 and previous config saved to /var/cache/conftool/dbconfig/20240606-034709-ladsgroup.json
  • 03:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P64154 and previous config saved to /var/cache/conftool/dbconfig/20240606-034635-marostegui.json
  • 03:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P64153 and previous config saved to /var/cache/conftool/dbconfig/20240606-033201-ladsgroup.json
  • 03:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T364299)', diff saved to https://phabricator.wikimedia.org/P64152 and previous config saved to /var/cache/conftool/dbconfig/20240606-033125-marostegui.json
  • 03:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T352010)', diff saved to https://phabricator.wikimedia.org/P64151 and previous config saved to /var/cache/conftool/dbconfig/20240606-032907-ladsgroup.json
  • 03:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 03:28 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 03:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T352010)', diff saved to https://phabricator.wikimedia.org/P64150 and previous config saved to /var/cache/conftool/dbconfig/20240606-032844-ladsgroup.json
  • 03:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P64149 and previous config saved to /var/cache/conftool/dbconfig/20240606-031653-ladsgroup.json
  • 03:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P64148 and previous config saved to /var/cache/conftool/dbconfig/20240606-031336-ladsgroup.json
  • 03:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T352010)', diff saved to https://phabricator.wikimedia.org/P64147 and previous config saved to /var/cache/conftool/dbconfig/20240606-030145-ladsgroup.json
  • 02:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P64146 and previous config saved to /var/cache/conftool/dbconfig/20240606-025828-ladsgroup.json
  • 02:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T352010)', diff saved to https://phabricator.wikimedia.org/P64145 and previous config saved to /var/cache/conftool/dbconfig/20240606-024321-ladsgroup.json
  • 01:22 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1244 (T364069)', diff saved to https://phabricator.wikimedia.org/P64144 and previous config saved to /var/cache/conftool/dbconfig/20240606-012208-marostegui.json
  • 01:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 01:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1244.eqiad.wmnet with reason: Maintenance
  • 01:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T364069)', diff saved to https://phabricator.wikimedia.org/P64143 and previous config saved to /var/cache/conftool/dbconfig/20240606-012144-marostegui.json
  • 01:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P64142 and previous config saved to /var/cache/conftool/dbconfig/20240606-010636-marostegui.json
  • 00:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243', diff saved to https://phabricator.wikimedia.org/P64141 and previous config saved to /var/cache/conftool/dbconfig/20240606-005128-marostegui.json
  • 00:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1243 (T364069)', diff saved to https://phabricator.wikimedia.org/P64140 and previous config saved to /var/cache/conftool/dbconfig/20240606-003620-marostegui.json
  • 00:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T364299)', diff saved to https://phabricator.wikimedia.org/P64139 and previous config saved to /var/cache/conftool/dbconfig/20240606-003232-marostegui.json
  • 00:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 00:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 00:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T364299)', diff saved to https://phabricator.wikimedia.org/P64138 and previous config saved to /var/cache/conftool/dbconfig/20240606-003208-marostegui.json
  • 00:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P64137 and previous config saved to /var/cache/conftool/dbconfig/20240606-001700-marostegui.json
  • 00:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P64136 and previous config saved to /var/cache/conftool/dbconfig/20240606-000151-marostegui.json

2024-06-05

  • 23:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T364299)', diff saved to https://phabricator.wikimedia.org/P64135 and previous config saved to /var/cache/conftool/dbconfig/20240605-234643-marostegui.json
  • 23:30 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 23:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 23:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T352010)', diff saved to https://phabricator.wikimedia.org/P64134 and previous config saved to /var/cache/conftool/dbconfig/20240605-232926-ladsgroup.json
  • 23:29 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 23:29 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 22:54 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 22:50 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin2002 - T366555
  • 22:44 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 22:03 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:cassandra-dev: Hail mary - eevans@cumin1002
  • 21:43 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:cassandra-dev: Hail mary - eevans@cumin1002
  • 21:42 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin2002 - T366555
  • 21:42 bking@cumin2002: END (ERROR) - Cookbook sre.elasticsearch.rolling-operation (exit_code=97) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin2002 - T366555
  • 21:36 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster cloudelastic: cloudelastic cluster restart - bking@cumin2002 - T366555
  • 21:18 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 21:08 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 21:02 jhathaway@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mx-in2001.wikimedia.org
  • 21:02 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mx-in2001.wikimedia.org with OS bookworm
  • 20:45 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mx-in2001.wikimedia.org with reason: host reimage
  • 20:42 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mx-in2001.wikimedia.org with reason: host reimage
  • 20:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T364299)', diff saved to https://phabricator.wikimedia.org/P64133 and previous config saved to /var/cache/conftool/dbconfig/20240605-202949-marostegui.json
  • 20:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 20:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 20:26 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host mx-in2001.wikimedia.org with OS bookworm
  • 20:26 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM mx-in2001.wikimedia.org - jhathaway@cumin1002"
  • 20:25 urbanecm@deploy1002: Finished scap: Backport for [CheckUser] Stop writing old for event tables migration on group0 (T360685), Growth: Use `growthexperiments` DB list for enabling GrowthExperiments (T364892), [Beta] Enable CommunityConfiguration extension in all wikis (T364892) (duration: 22m 04s)
  • 20:25 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM mx-in2001.wikimedia.org - jhathaway@cumin1002"
  • 20:25 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mx-in2001.wikimedia.org on all recursors
  • 20:25 jhathaway@cumin1002: START - Cookbook sre.dns.wipe-cache mx-in2001.wikimedia.org on all recursors
  • 20:25 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:25 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM mx-in2001.wikimedia.org - jhathaway@cumin1002"
  • 20:24 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM mx-in2001.wikimedia.org - jhathaway@cumin1002"
  • 20:22 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 20:21 jhathaway@cumin1002: START - Cookbook sre.dns.netbox
  • 20:21 jhathaway@cumin1002: START - Cookbook sre.ganeti.makevm for new host mx-in2001.wikimedia.org
  • 20:18 jhathaway@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host mx-in1001.wikimedia.org
  • 20:18 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mx-in1001.wikimedia.org with OS bookworm
  • 20:16 urbanecm@deploy1002: urbanecm and sgimeno and dreamyjazz: Continuing with sync
  • 20:12 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 20:06 ejegg: payments-wiki upgraded from c255fda8 to 82a5e588
  • 20:06 urbanecm@deploy1002: urbanecm and sgimeno and dreamyjazz: Backport for [CheckUser] Stop writing old for event tables migration on group0 (T360685), Growth: Use `growthexperiments` DB list for enabling GrowthExperiments (T364892), [Beta] Enable CommunityConfiguration extension in all wikis (T364892) synced to the testservers (https://wikitech.wikimedia.org/wiki/M
  • 20:03 urbanecm@deploy1002: Started scap: Backport for [CheckUser] Stop writing old for event tables migration on group0 (T360685), Growth: Use `growthexperiments` DB list for enabling GrowthExperiments (T364892), [Beta] Enable CommunityConfiguration extension in all wikis (T364892)
  • 20:02 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mx-in1001.wikimedia.org with reason: host reimage
  • 19:57 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mx-in1001.wikimedia.org with reason: host reimage
  • 19:47 jhathaway@cumin1002: START - Cookbook sre.hosts.reimage for host mx-in1001.wikimedia.org with OS bookworm
  • 19:45 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM mx-in1001.wikimedia.org - jhathaway@cumin1002"
  • 19:44 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM mx-in1001.wikimedia.org - jhathaway@cumin1002"
  • 19:43 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) mx-in1001.wikimedia.org on all recursors
  • 19:43 jhathaway@cumin1002: START - Cookbook sre.dns.wipe-cache mx-in1001.wikimedia.org on all recursors
  • 19:43 jhathaway@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:43 jhathaway@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM mx-in1001.wikimedia.org - jhathaway@cumin1002"
  • 19:38 jhathaway@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM mx-in1001.wikimedia.org - jhathaway@cumin1002"
  • 19:36 jhathaway@cumin1002: START - Cookbook sre.dns.netbox
  • 19:36 jhathaway@cumin1002: START - Cookbook sre.ganeti.makevm for new host mx-in1001.wikimedia.org
  • 19:27 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 19:09 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 18:58 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 18:53 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.8 refs T361402
  • 18:53 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 18:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64132 and previous config saved to /var/cache/conftool/dbconfig/20240605-184250-ladsgroup.json
  • 18:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P64131 and previous config saved to /var/cache/conftool/dbconfig/20240605-182742-ladsgroup.json
  • 18:13 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 18:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P64130 and previous config saved to /var/cache/conftool/dbconfig/20240605-181234-ladsgroup.json
  • 18:12 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 18:11 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts1001.eqiad.wmnet
  • 18:07 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts1001.eqiad.wmnet
  • 18:06 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 17:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64129 and previous config saved to /var/cache/conftool/dbconfig/20240605-175725-ladsgroup.json
  • 17:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64128 and previous config saved to /var/cache/conftool/dbconfig/20240605-175503-ladsgroup.json
  • 17:50 kamila@cumin1002: START - Cookbook sre.hosts.dhcp for host wikikube-ctrl1001.eqiad.wmnet
  • 17:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 17:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2199.codfw.wmnet with reason: Maintenance
  • 17:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T364299)', diff saved to https://phabricator.wikimedia.org/P64127 and previous config saved to /var/cache/conftool/dbconfig/20240605-174724-marostegui.json
  • 17:42 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to pagelinks old columns in enwiki (T352010) (duration: 12m 19s)
  • 17:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P64126 and previous config saved to /var/cache/conftool/dbconfig/20240605-173954-ladsgroup.json
  • 17:33 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 17:32 ladsgroup@deploy1002: ladsgroup: Backport for Stop writing to pagelinks old columns in enwiki (T352010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P64125 and previous config saved to /var/cache/conftool/dbconfig/20240605-173216-marostegui.json
  • 17:31 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 17:29 ladsgroup@deploy1002: Started scap: Backport for Stop writing to pagelinks old columns in enwiki (T352010)
  • 17:27 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']
  • 17:24 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 17:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P64124 and previous config saved to /var/cache/conftool/dbconfig/20240605-172446-ladsgroup.json
  • 17:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179', diff saved to https://phabricator.wikimedia.org/P64123 and previous config saved to /var/cache/conftool/dbconfig/20240605-171708-marostegui.json
  • 17:13 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
  • 17:12 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 17:10 jhathaway: phabricator email now egressing via mx-out{1001,2001}.wikimedia.org, which should solve the SPF warnings in your inbox
  • 17:10 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1033.eqiad.wmnet
  • 17:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64122 and previous config saved to /var/cache/conftool/dbconfig/20240605-170938-ladsgroup.json
  • 17:06 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1007.eqiad.wmnet with reason: decom T353785
  • 17:06 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1033.eqiad.wmnet
  • 17:06 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1007.eqiad.wmnet with reason: decom T353785
  • 17:05 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1006.eqiad.wmnet with reason: decom T353785
  • 17:05 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1006.eqiad.wmnet with reason: decom T353785
  • 17:04 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 17:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2179 (T364299)', diff saved to https://phabricator.wikimedia.org/P64121 and previous config saved to /var/cache/conftool/dbconfig/20240605-170200-marostegui.json
  • 16:56 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']
  • 16:56 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1005.eqiad.wmnet with reason: decom T353785
  • 16:56 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1005.eqiad.wmnet with reason: decom T353785
  • 16:54 mutante: downtimed stat1004 for 10 days to avoid alerting spam during decom process - T353785
  • 16:53 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10 days, 0:00:00 on stat1004.eqiad.wmnet with reason: decom T353785
  • 16:53 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 10 days, 0:00:00 on stat1004.eqiad.wmnet with reason: decom T353785
  • 16:52 ladsgroup@deploy1002: Finished scap: Backport for Bump XML dump schema to version 0.11 (T365155) (duration: 18m 23s)
  • 16:48 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 16:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64120 and previous config saved to /var/cache/conftool/dbconfig/20240605-164635-ladsgroup.json
  • 16:46 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
  • 16:45 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 16:43 ladsgroup@deploy1002: ladsgroup and dr0ptp4kt: Continuing with sync
  • 16:40 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kubestage1003.eqiad.wmnet
  • 16:38 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 16:36 ladsgroup@deploy1002: ladsgroup and dr0ptp4kt: Backport for Bump XML dump schema to version 0.11 (T365155) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:34 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 16:34 ladsgroup@deploy1002: Started scap: Backport for Bump XML dump schema to version 0.11 (T365155)
  • 16:32 jayme@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubestage1003.eqiad.wmnet
  • 16:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64119 and previous config saved to /var/cache/conftool/dbconfig/20240605-163129-ladsgroup.json
  • 16:20 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:18 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:18 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1032.eqiad.wmnet
  • 16:18 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 50%: Maint over', diff saved to https://phabricator.wikimedia.org/P64118 and previous config saved to /var/cache/conftool/dbconfig/20240605-161622-ladsgroup.json
  • 16:16 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:15 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:14 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:12 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1032.eqiad.wmnet
  • 16:11 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 16:10 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 16:10 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 16:10 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 16:08 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 16:05 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 16:05 jayme@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 16:01 aokoth@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 16:01 aokoth@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 16:01 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 16:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1177 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64117 and previous config saved to /var/cache/conftool/dbconfig/20240605-160116-ladsgroup.json
  • 15:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T352010)', diff saved to https://phabricator.wikimedia.org/P64116 and previous config saved to /var/cache/conftool/dbconfig/20240605-155955-ladsgroup.json
  • 15:59 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 15:59 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 15:59 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1082.eqiad.wmnet
  • 15:58 aokoth@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 15:58 aokoth@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 15:57 aokoth@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 15:56 aokoth@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 15:51 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1082.eqiad.wmnet
  • 15:51 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1081.eqiad.wmnet
  • 15:51 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 15:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T352010)', diff saved to https://phabricator.wikimedia.org/P64115 and previous config saved to /var/cache/conftool/dbconfig/20240605-155023-ladsgroup.json
  • 15:46 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 15:44 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1081.eqiad.wmnet
  • 15:43 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1080.eqiad.wmnet
  • 15:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb1002.eqiad.wmnet
  • 15:43 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2080.codfw.wmnet
  • 15:39 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb1002.eqiad.wmnet
  • 15:37 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1080.eqiad.wmnet
  • 15:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netboxdb2002.codfw.wmnet
  • 15:37 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1079.eqiad.wmnet
  • 15:36 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2080.codfw.wmnet
  • 15:36 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2079.codfw.wmnet
  • 15:34 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 15:33 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netboxdb2002.codfw.wmnet
  • 15:32 moritzm: rebalancing drmrs Ganeti clusters
  • 15:30 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']
  • 15:29 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1079.eqiad.wmnet
  • 15:28 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1078.eqiad.wmnet
  • 15:28 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2079.codfw.wmnet
  • 15:27 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2078.codfw.wmnet
  • 15:26 sukhe@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM pybal-test2003.codfw.wmnet
  • 15:25 jmm@cumin2002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host ping1004.eqiad.wmnet
  • 15:25 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ping1004.eqiad.wmnet with OS bookworm
  • 15:24 sukhe@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM pybal-test2003.codfw.wmnet
  • 15:21 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1078.eqiad.wmnet
  • 15:20 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1077.eqiad.wmnet
  • 15:19 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2078.codfw.wmnet
  • 15:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2077.codfw.wmnet
  • 15:18 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb1001.eqiad.wmnet
  • 15:13 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb1001.eqiad.wmnet
  • 15:13 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1077.eqiad.wmnet
  • 15:12 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2077.codfw.wmnet
  • 15:10 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
  • 15:10 kamila@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['wikikube-ctrl1001']
  • 15:09 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
  • 15:09 jnuche@deploy1002: Installation of scap version "4.86.0" completed for 285 hosts
  • 15:08 jnuche@deploy1002: Installing scap version "4.86.0" for 285 hosts
  • 15:07 jnuche@deploy1002: Installing scap version "4.86.0" for 286 hosts
  • 15:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1243 (T364069)', diff saved to https://phabricator.wikimedia.org/P64114 and previous config saved to /var/cache/conftool/dbconfig/20240605-150605-marostegui.json
  • 15:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 15:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1243.eqiad.wmnet with reason: Maintenance
  • 15:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T364069)', diff saved to https://phabricator.wikimedia.org/P64113 and previous config saved to /var/cache/conftool/dbconfig/20240605-150542-marostegui.json
  • 15:05 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 15:04 vgutierrez: repool text@eqsin with IPIP encapsulation enabled - T366466
  • 15:02 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:01 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host krb2002.codfw.wmnet
  • 15:01 aikochou@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main'.
  • 14:59 cwhite@deploy1002: Finished scap: Backport for MWMultiVersion: Fix "Undefined index: PATH_INFO" warnings (T366657) (duration: 12m 32s)
  • 14:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T352010)', diff saved to https://phabricator.wikimedia.org/P64112 and previous config saved to /var/cache/conftool/dbconfig/20240605-145757-ladsgroup.json
  • 14:57 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 14:57 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 14:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T352010)', diff saved to https://phabricator.wikimedia.org/P64111 and previous config saved to /var/cache/conftool/dbconfig/20240605-145735-ladsgroup.json
  • 14:55 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 14:55 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 14:55 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host krb2002.codfw.wmnet
  • 14:55 vgutierrez: rolling restart of pybal on lvs5006 and lvs5004 - T366466
  • 14:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P64110 and previous config saved to /var/cache/conftool/dbconfig/20240605-145034-marostegui.json
  • 14:50 cwhite@deploy1002: matmarex and cwhite: Continuing with sync
  • 14:49 cwhite@deploy1002: matmarex and cwhite: Backport for MWMultiVersion: Fix "Undefined index: PATH_INFO" warnings (T366657) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host serpens.wikimedia.org
  • 14:46 cwhite@deploy1002: Started scap: Backport for MWMultiVersion: Fix "Undefined index: PATH_INFO" warnings (T366657)
  • 14:44 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host serpens.wikimedia.org
  • 14:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P64109 and previous config saved to /var/cache/conftool/dbconfig/20240605-144227-ladsgroup.json
  • 14:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242', diff saved to https://phabricator.wikimedia.org/P64108 and previous config saved to /var/cache/conftool/dbconfig/20240605-143526-marostegui.json
  • 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6002.drmrs.wmnet
  • 14:29 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6002.drmrs.wmnet
  • 14:29 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 14:28 vgutierrez: depool text@eqsin before enabling IPIP encapsulation - T366466
  • 14:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P64107 and previous config saved to /var/cache/conftool/dbconfig/20240605-142718-ladsgroup.json
  • 14:23 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1076.eqiad.wmnet
  • 14:23 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2076.codfw.wmnet
  • 14:23 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6002.drmrs.wmnet
  • 14:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1242 (T364069)', diff saved to https://phabricator.wikimedia.org/P64106 and previous config saved to /var/cache/conftool/dbconfig/20240605-142018-marostegui.json
  • 14:15 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2076.codfw.wmnet
  • 14:15 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1076.eqiad.wmnet
  • 14:13 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1075.eqiad.wmnet
  • 14:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2075.codfw.wmnet
  • 14:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T352010)', diff saved to https://phabricator.wikimedia.org/P64105 and previous config saved to /var/cache/conftool/dbconfig/20240605-141210-ladsgroup.json
  • 14:10 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:10 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:07 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 14:05 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2075.codfw.wmnet
  • 14:05 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1075.eqiad.wmnet
  • 14:04 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ping1004.eqiad.wmnet with OS bookworm
  • 14:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ping1004.eqiad.wmnet - jmm@cumin2002"
  • 14:02 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:02 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:00 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ping1004.eqiad.wmnet - jmm@cumin2002"
  • 14:00 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6002.drmrs.wmnet
  • 14:00 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2074.codfw.wmnet
  • 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping1004.eqiad.wmnet on all recursors
  • 14:00 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping1004.eqiad.wmnet on all recursors
  • 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:00 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1004.eqiad.wmnet - jmm@cumin2002"
  • 13:57 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6001.drmrs.wmnet
  • 13:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6001.drmrs.wmnet
  • 13:55 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping1004.eqiad.wmnet - jmm@cumin2002"
  • 13:54 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1074.eqiad.wmnet
  • 13:52 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus3003.esams.wmnet
  • 13:52 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2074.codfw.wmnet
  • 13:52 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2073.codfw.wmnet
  • 13:52 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus5002.eqsin.wmnet
  • 13:52 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus4002.ulsfo.wmnet
  • 13:51 aikochou@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 13:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6001.drmrs.wmnet
  • 13:48 inflatador: bking@an-db1001 install python3-psycopg2 pkg T363001
  • 13:48 daniel@deploy1002: Finished scap: Backport for Set LinterParseOnDerivedDataUpdate to false (T361013) (duration: 17m 50s)
  • 13:48 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 13:48 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping1004.eqiad.wmnet
  • 13:47 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 13:47 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 13:46 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus3003.esams.wmnet
  • 13:46 elukey: factory reset for sretest1001 to test the new provision cookbook - T365372
  • 13:46 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus4002.ulsfo.wmnet
  • 13:46 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2073.codfw.wmnet
  • 13:46 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus5002.eqsin.wmnet
  • 13:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1074.eqiad.wmnet
  • 13:45 inflatador: bking@an-db1001 install acl pkg T363001
  • 13:43 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1073.eqiad.wmnet
  • 13:43 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus6002.drmrs.wmnet
  • 13:43 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host prometheus7001.magru.wmnet
  • 13:40 daniel@deploy1002: daniel: Continuing with sync
  • 13:39 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus6002.drmrs.wmnet
  • 13:37 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6001.drmrs.wmnet
  • 13:37 filippo@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host graphite1005.eqiad.wmnet
  • 13:37 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host prometheus7001.magru.wmnet
  • 13:37 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2072.codfw.wmnet
  • 13:36 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1073.eqiad.wmnet
  • 13:35 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1072.eqiad.wmnet
  • 13:34 daniel@deploy1002: daniel: Backport for Set LinterParseOnDerivedDataUpdate to false (T361013) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:34 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 13:30 daniel@deploy1002: Started scap: Backport for Set LinterParseOnDerivedDataUpdate to false (T361013)
  • 13:29 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2072.codfw.wmnet
  • 13:28 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2071.codfw.wmnet
  • 13:27 elukey: systemctl reset-failed prometheus-redis-exporter@6380.service redis-instance-tcp_6380.service on netbox[12]002 + apt-get purge of redis-server and prometheus-redis-exporter packages to clean up stale configs (no local redis is used)
  • 13:27 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1072.eqiad.wmnet
  • 13:26 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1071.eqiad.wmnet
  • 13:26 dreamyjazz@deploy1002: Finished scap: Backport for Follow-up: Don't run interact with block buttons if they don't exist (T329493) (duration: 11m 39s)
  • 13:25 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host graphite1005.eqiad.wmnet
  • 13:21 fabfur: enable magru DC after applying IPIP encapsulation patches (T366466)
  • 13:20 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2071.codfw.wmnet
  • 13:19 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2070.codfw.wmnet
  • 13:17 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
  • 13:17 dreamyjazz@deploy1002: dreamyjazz: Backport for Follow-up: Don't run interact with block buttons if they don't exist (T329493) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2179 (T364299)', diff saved to https://phabricator.wikimedia.org/P64104 and previous config saved to /var/cache/conftool/dbconfig/20240605-131647-marostegui.json
  • 13:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 13:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2179.codfw.wmnet with reason: Maintenance
  • 13:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T364299)', diff saved to https://phabricator.wikimedia.org/P64103 and previous config saved to /var/cache/conftool/dbconfig/20240605-131623-marostegui.json
  • 13:14 dreamyjazz@deploy1002: Started scap: Backport for Follow-up: Don't run interact with block buttons if they don't exist (T329493)
  • 13:13 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2070.codfw.wmnet
  • 13:13 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1071.eqiad.wmnet
  • 13:13 dreamyjazz@deploy1002: Finished scap: Backport for [CheckUser] Stop writing old for event table migration on testwiki (T360686) (duration: 19m 13s)
  • 13:10 elukey@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:aux-worker
  • 13:06 fabfur: restarting pybal on lvs7001/lvs7003 to apply IPIP conf (T366466)
  • 13:04 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
  • 13:03 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1070.eqiad.wmnet
  • 13:02 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2069.codfw.wmnet
  • 13:02 dreamyjazz@deploy1002: dreamyjazz: Backport for [CheckUser] Stop writing old for event table migration on testwiki (T360686) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P64102 and previous config saved to /var/cache/conftool/dbconfig/20240605-130115-marostegui.json
  • 12:56 elukey@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:aux-worker
  • 12:55 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1070.eqiad.wmnet
  • 12:55 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2069.codfw.wmnet
  • 12:53 dreamyjazz@deploy1002: Started scap: Backport for [CheckUser] Stop writing old for event table migration on testwiki (T360686)
  • 12:53 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1069.eqiad.wmnet
  • 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host ping2004.codfw.wmnet
  • 12:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ping2004.codfw.wmnet with OS bookworm
  • 12:51 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2068.codfw.wmnet
  • 12:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1246.eqiad.wmnet with reason: maintenance
  • 12:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1246.eqiad.wmnet with reason: maintenance
  • 12:49 arnaudb@cumin1002: dbctl commit (dc=all): 'depool db1246 T363119', diff saved to https://phabricator.wikimedia.org/P64101 and previous config saved to /var/cache/conftool/dbconfig/20240605-124918-arnaudb.json
  • 12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172', diff saved to https://phabricator.wikimedia.org/P64100 and previous config saved to /var/cache/conftool/dbconfig/20240605-124607-marostegui.json
  • 12:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1069.eqiad.wmnet
  • 12:45 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1068.eqiad.wmnet
  • 12:45 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2068.codfw.wmnet
  • 12:45 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2067.codfw.wmnet
  • 12:43 moritzm: failover ganeti masters in drmrs
  • 12:40 cgoubert@cumin1002: END (ERROR) - Cookbook sre.k8s.reboot-nodes (exit_code=97) rolling reboot on A:wikikube-worker-codfw
  • 12:39 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1068.eqiad.wmnet
  • 12:39 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1067.eqiad.wmnet
  • 12:38 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2067.codfw.wmnet
  • 12:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2066.codfw.wmnet
  • 12:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ping2004.codfw.wmnet with reason: host reimage
  • 12:35 fabfur: disabling puppet on A:cp-text to test IPIP encapsulation on magru (T366466)
  • 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6004.drmrs.wmnet
  • 12:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6004.drmrs.wmnet
  • 12:32 jmm@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ping2004.codfw.wmnet with reason: host reimage
  • 12:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2066.codfw.wmnet
  • 12:31 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1067.eqiad.wmnet
  • 12:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2172 (T364299)', diff saved to https://phabricator.wikimedia.org/P64099 and previous config saved to /var/cache/conftool/dbconfig/20240605-123059-marostegui.json
  • 12:29 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1066.eqiad.wmnet
  • 12:29 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2065.codfw.wmnet
  • 12:27 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6004.drmrs.wmnet
  • 12:26 fabfur: disabling magru DC to apply IPIP encapsulation patches (T366466)
  • 12:21 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2065.codfw.wmnet
  • 12:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Long schema change
  • 12:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on dbstore1007.eqiad.wmnet with reason: Long schema change
  • 12:20 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2064.codfw.wmnet
  • 12:19 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6004.drmrs.wmnet
  • 12:17 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti6003.drmrs.wmnet
  • 12:17 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1066.eqiad.wmnet
  • 12:17 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti6003.drmrs.wmnet
  • 12:16 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1065.eqiad.wmnet
  • 12:15 jmm@cumin2002: START - Cookbook sre.hosts.reimage for host ping2004.codfw.wmnet with OS bookworm
  • 12:15 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ping2004.codfw.wmnet - jmm@cumin2002"
  • 12:14 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2064.codfw.wmnet
  • 12:14 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM ping2004.codfw.wmnet - jmm@cumin2002"
  • 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) ping2004.codfw.wmnet on all recursors
  • 12:13 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache ping2004.codfw.wmnet on all recursors
  • 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:13 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2004.codfw.wmnet - jmm@cumin2002"
  • 12:13 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2063.codfw.wmnet
  • 12:12 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM ping2004.codfw.wmnet - jmm@cumin2002"
  • 12:10 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti6003.drmrs.wmnet
  • 12:09 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1065.eqiad.wmnet
  • 12:08 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1064.eqiad.wmnet
  • 12:05 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 12:05 jmm@cumin2002: START - Cookbook sre.ganeti.makevm for new host ping2004.codfw.wmnet
  • 12:04 jmm@cumin2002: END (PASS) - Cookbook sre.ganeti.resource-report (exit_code=0)
  • 12:04 jmm@cumin2002: START - Cookbook sre.ganeti.resource-report
  • 12:03 jmm@cumin2002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti6003.drmrs.wmnet
  • 12:00 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1064.eqiad.wmnet
  • 12:00 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1063.eqiad.wmnet
  • 11:58 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2063.codfw.wmnet
  • 11:57 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2062.codfw.wmnet
  • 11:52 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1063.eqiad.wmnet
  • 11:50 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1062.eqiad.wmnet
  • 11:49 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2062.codfw.wmnet
  • 11:48 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2061.codfw.wmnet
  • 11:44 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-eqiad
  • 11:41 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1062.eqiad.wmnet
  • 11:41 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2061.codfw.wmnet
  • 11:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host netbox-dev2002.codfw.wmnet
  • 11:39 hnowlan@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1008.eqiad.wmnet|wikikube-worker1009.eqiad.wmnet|wikikube-worker1010.eqiad.wmnet|wikikube-worker1011.eqiad.wmnet|wikikube-worker1012.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 11:38 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host netbox-dev2002.codfw.wmnet
  • 11:38 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1061.eqiad.wmnet
  • 11:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2060.codfw.wmnet
  • 11:37 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1031.eqiad.wmnet with OS bullseye
  • 11:36 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-eqiad
  • 11:31 hnowlan: running homer to configure bgp on 5 new k8s workers
  • 11:31 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1011.eqiad.wmnet with OS bullseye
  • 11:30 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2060.codfw.wmnet
  • 11:30 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1061.eqiad.wmnet
  • 11:27 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1009.eqiad.wmnet with OS bullseye
  • 11:21 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1031.eqiad.wmnet with reason: host reimage
  • 11:17 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1031.eqiad.wmnet with reason: host reimage
  • 11:12 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1011.eqiad.wmnet with reason: host reimage
  • 11:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1009.eqiad.wmnet with reason: host reimage
  • 11:06 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1011.eqiad.wmnet with reason: host reimage
  • 11:06 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1009.eqiad.wmnet with reason: host reimage
  • 11:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2059.codfw.wmnet
  • 11:03 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1031.eqiad.wmnet with OS bullseye
  • 11:03 claime: restarted send_tile_invalidations.service on maps1009
  • 11:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P64098 and previous config saved to /var/cache/conftool/dbconfig/20240605-110303-ladsgroup.json
  • 10:59 jmm@cumin2002: END (PASS) - Cookbook sre.ldap.roll-restart-reboot-replica (exit_code=0) rolling reboot on A:ldap-replicas-codfw
  • 10:54 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1060.eqiad.wmnet
  • 10:54 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64097 and previous config saved to /var/cache/conftool/dbconfig/20240605-105400-root.json
  • 10:53 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1011.eqiad.wmnet with OS bullseye
  • 10:53 hnowlan@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1011.eqiad.wmnet with OS bullseye
  • 10:53 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1009.eqiad.wmnet with OS bullseye
  • 10:52 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1009.eqiad.wmnet with OS bullseye
  • 10:52 jmm@cumin2002: START - Cookbook sre.ldap.roll-restart-reboot-replica rolling reboot on A:ldap-replicas-codfw
  • 10:50 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2059.codfw.wmnet
  • 10:50 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2058.codfw.wmnet
  • 10:47 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P64096 and previous config saved to /var/cache/conftool/dbconfig/20240605-104757-ladsgroup.json
  • 10:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1060.eqiad.wmnet
  • 10:46 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1059.eqiad.wmnet
  • 10:42 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2058.codfw.wmnet
  • 10:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2057.codfw.wmnet
  • 10:39 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1003.eqiad.wmnet
  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64094 and previous config saved to /var/cache/conftool/dbconfig/20240605-103854-root.json
  • 10:37 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1003.eqiad.wmnet
  • 10:37 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1012.eqiad.wmnet with OS bullseye
  • 10:35 klausman@cumin2002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:ml-serve-worker-codfw
  • 10:34 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1010.eqiad.wmnet with OS bullseye
  • 10:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P64093 and previous config saved to /var/cache/conftool/dbconfig/20240605-103251-ladsgroup.json
  • 10:32 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1059.eqiad.wmnet
  • 10:32 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2057.codfw.wmnet
  • 10:31 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2056.codfw.wmnet
  • 10:30 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1008.eqiad.wmnet with OS bullseye
  • 10:30 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1058.eqiad.wmnet
  • 10:27 jmm@cumin2002: END (PASS) - Cookbook sre.netbox.restart-reboot (exit_code=0) rolling reboot on A:netbox
  • 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64091 and previous config saved to /var/cache/conftool/dbconfig/20240605-102348-root.json
  • 10:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P64090 and previous config saved to /var/cache/conftool/dbconfig/20240605-102252-ladsgroup.json
  • 10:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 10:22 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 10:22 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1058.eqiad.wmnet
  • 10:22 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2056.codfw.wmnet
  • 10:21 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2055.codfw.wmnet
  • 10:21 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1057.eqiad.wmnet
  • 10:18 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1012.eqiad.wmnet with reason: host reimage
  • 10:17 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1184 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P64088 and previous config saved to /var/cache/conftool/dbconfig/20240605-101744-ladsgroup.json
  • 10:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1152.eqiad.wmnet with OS bookworm
  • 10:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T352010)', diff saved to https://phabricator.wikimedia.org/P64087 and previous config saved to /var/cache/conftool/dbconfig/20240605-101521-ladsgroup.json
  • 10:15 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 10:15 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1010.eqiad.wmnet with reason: host reimage
  • 10:15 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 10:13 dcaro@cumin1002: END (ERROR) - Cookbook sre.hosts.reboot-single (exit_code=97) for host cloudcephosd1031.eqiad.wmnet
  • 10:13 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1012.eqiad.wmnet with reason: host reimage
  • 10:11 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1008.eqiad.wmnet with reason: host reimage
  • 10:10 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1152 back to x2 eqiad master T366677', diff saved to https://phabricator.wikimedia.org/P64086 and previous config saved to /var/cache/conftool/dbconfig/20240605-101019-root.json
  • 10:09 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1010.eqiad.wmnet with reason: host reimage
  • 10:09 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1008.eqiad.wmnet with reason: host reimage
  • 10:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64085 and previous config saved to /var/cache/conftool/dbconfig/20240605-100842-root.json
  • 10:08 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64084 and previous config saved to /var/cache/conftool/dbconfig/20240605-100810-root.json
  • 10:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64083 and previous config saved to /var/cache/conftool/dbconfig/20240605-100117-root.json
  • 10:00 fabfur: disabling puppet on cp4037 to test Benthos performance (T358109)
  • 10:00 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1012.eqiad.wmnet with OS bullseye
  • 10:00 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1057.eqiad.wmnet
  • 10:00 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1011.eqiad.wmnet with OS bullseye
  • 10:00 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2055.codfw.wmnet
  • 09:59 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
  • 09:59 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
  • 09:59 cgoubert@cumin1002: conftool action : set/pooled=yes:weight=10; selector: name=wikikube-worker1001.eqiad.wmnet,cluster=kubernetes,service=kubesvc
  • 09:58 claime: pooling and uncordoning wikikube-worker1001 - T351074
  • 09:57 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1456 to wikikube-worker1012
  • 09:57 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1012
  • 09:56 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 09:55 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1010.eqiad.wmnet with OS bullseye
  • 09:55 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1009.eqiad.wmnet with OS bullseye
  • 09:55 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1008.eqiad.wmnet with OS bullseye
  • 09:55 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1008.eqiad.wmnet wikikube-worker1009.eqiad.wmnet wikikube-worker1010.eqiad.wmnet wikikube-worker1011.eqiad.wmnet wikikube-worker1012.eqiad.wmnet on all recursors
  • 09:55 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1008.eqiad.wmnet wikikube-worker1009.eqiad.wmnet wikikube-worker1010.eqiad.wmnet wikikube-worker1011.eqiad.wmnet wikikube-worker1012.eqiad.wmnet on all recursors
  • 09:54 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1012
  • 09:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1456 to wikikube-worker1012 - hnowlan@cumin1002"
  • 09:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1152.eqiad.wmnet with reason: host reimage
  • 09:54 jmm@cumin2002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox.discovery.wmnet. on all recursors
  • 09:54 jmm@cumin2002: START - Cookbook sre.dns.wipe-cache netbox.discovery.wmnet. on all recursors
  • 09:54 jmm@cumin2002: START - Cookbook sre.netbox.restart-reboot rolling reboot on A:netbox
  • 09:53 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1456 to wikikube-worker1012 - hnowlan@cumin1002"
  • 09:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64082 and previous config saved to /var/cache/conftool/dbconfig/20240605-095336-root.json
  • 09:53 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64081 and previous config saved to /var/cache/conftool/dbconfig/20240605-095303-root.json
  • 09:52 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1428 to wikikube-worker1011
  • 09:52 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1011
  • 09:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1152.eqiad.wmnet with reason: host reimage
  • 09:51 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1031.eqiad.wmnet
  • 09:51 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 09:51 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1456 to wikikube-worker1012
  • 09:50 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1011
  • 09:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1428 to wikikube-worker1011 - hnowlan@cumin1002"
  • 09:49 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1428 to wikikube-worker1011 - hnowlan@cumin1002"
  • 09:46 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 09:46 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1428 to wikikube-worker1011
  • 09:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64080 and previous config saved to /var/cache/conftool/dbconfig/20240605-094611-root.json
  • 09:46 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from mw1428 to wikikube-worker1011
  • 09:45 hnowlan@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 09:45 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from mw1456 to wikikube-worker1012
  • 09:44 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1410 to wikikube-worker1010
  • 09:44 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1456 to wikikube-worker1012
  • 09:44 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1010
  • 09:44 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 09:44 claime: homer 'cr*eqiad*' commit 'T351074'
  • 09:44 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1428 to wikikube-worker1011
  • 09:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1001.eqiad.wmnet with OS bullseye
  • 09:43 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1401 to wikikube-worker1009
  • 09:43 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1009
  • 09:42 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1010
  • 09:42 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:41 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1009
  • 09:41 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:41 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1401 to wikikube-worker1009 - hnowlan@cumin1002"
  • 09:41 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 09:40 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1400 to wikikube-worker1008
  • 09:40 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1008
  • 09:39 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1401 to wikikube-worker1009 - hnowlan@cumin1002"
  • 09:38 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2054.codfw.wmnet
  • 09:38 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1008
  • 09:38 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:38 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1400 to wikikube-worker1008 - hnowlan@cumin1002"
  • 09:38 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2003.wikimedia.org
  • 09:38 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64079 and previous config saved to /var/cache/conftool/dbconfig/20240605-093830-root.json
  • 09:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64078 and previous config saved to /var/cache/conftool/dbconfig/20240605-093757-root.json
  • 09:37 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1152.eqiad.wmnet with OS bookworm
  • 09:35 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1151 to temp x2 eqiad master T366677', diff saved to https://phabricator.wikimedia.org/P64077 and previous config saved to /var/cache/conftool/dbconfig/20240605-093507-root.json
  • 09:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 6 hosts with reason: Reimage x2 eqiad master T366677
  • 09:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on 6 hosts with reason: Reimage x2 eqiad master T366677
  • 09:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2003.wikimedia.org
  • 09:33 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1410 to wikikube-worker1010
  • 09:33 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1400 to wikikube-worker1008 - hnowlan@cumin1002"
  • 09:31 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from mw1410 to wikikube-worker1010.eqiad.wmnet
  • 09:31 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1410 to wikikube-worker1010.eqiad.wmnet
  • 09:31 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1030.eqiad.wmnet
  • 09:31 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1401 to wikikube-worker1009
  • 09:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64076 and previous config saved to /var/cache/conftool/dbconfig/20240605-093105-root.json
  • 09:30 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 09:30 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1400 to wikikube-worker1008
  • 09:29 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from mw1400 to wikikube-worker1008.eqiad.wmnet
  • 09:29 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw1400 to wikikube-worker1008.eqiad.wmnet
  • 09:26 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1030.eqiad.wmnet
  • 09:26 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1056.eqiad.wmnet
  • 09:24 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2054.codfw.wmnet
  • 09:24 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2053.codfw.wmnet
  • 09:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1227.eqiad.wmnet with OS bookworm
  • 09:23 marostegui@cumin1002: dbctl commit (dc=all): 'db1227 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64075 and previous config saved to /var/cache/conftool/dbconfig/20240605-092324-root.json
  • 09:22 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64074 and previous config saved to /var/cache/conftool/dbconfig/20240605-092251-root.json
  • 09:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1001.eqiad.wmnet with reason: host reimage
  • 09:19 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1056.eqiad.wmnet
  • 09:18 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1055.eqiad.wmnet
  • 09:17 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1001.eqiad.wmnet with reason: host reimage
  • 09:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64073 and previous config saved to /var/cache/conftool/dbconfig/20240605-091559-root.json
  • 09:15 brouberol@cumin2002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid test cluster: Roll restart of Druid jvm daemons.
  • 09:11 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1055.eqiad.wmnet
  • 09:11 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1054.eqiad.wmnet
  • 09:07 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64072 and previous config saved to /var/cache/conftool/dbconfig/20240605-090745-root.json
  • 09:06 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4044.ulsfo.wmnet
  • 09:06 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
  • 09:06 brouberol@cumin2002: START - Cookbook sre.druid.roll-restart-workers for Druid test cluster: Roll restart of Druid jvm daemons.
  • 09:02 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1001.eqiad.wmnet with OS bullseye
  • 09:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage
  • 09:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1001.eqiad.wmnet on all recursors
  • 09:01 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1001.eqiad.wmnet on all recursors
  • 09:01 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4052.ulsfo.wmnet
  • 09:00 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64071 and previous config saved to /var/cache/conftool/dbconfig/20240605-090053-root.json
  • 09:00 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp4044.ulsfo.wmnet
  • 08:58 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/ipoid: apply
  • 08:58 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/ipoid: apply
  • 08:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1227.eqiad.wmnet with reason: host reimage
  • 08:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1004.wikimedia.org
  • 08:57 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2053.codfw.wmnet
  • 08:57 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1054.eqiad.wmnet
  • 08:54 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/ipoid: apply
  • 08:54 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1053.eqiad.wmnet
  • 08:54 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2052.codfw.wmnet
  • 08:53 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/ipoid: apply
  • 08:53 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1004.wikimedia.org
  • 08:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:52 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64070 and previous config saved to /var/cache/conftool/dbconfig/20240605-085239-root.json
  • 08:52 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1029.eqiad.wmnet
  • 08:51 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4052.ulsfo.wmnet
  • 08:51 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 08:51 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4044.ulsfo.wmnet
  • 08:50 fabfur@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host cp4044.ulsfo.wmnet
  • 08:50 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp4044.ulsfo.wmnet
  • 08:49 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testvm2002.codfw.wmnet
  • 08:47 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2052.codfw.wmnet
  • 08:47 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1053.eqiad.wmnet
  • 08:45 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host testvm2002.codfw.wmnet
  • 08:45 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64069 and previous config saved to /var/cache/conftool/dbconfig/20240605-084547-root.json
  • 08:45 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1029.eqiad.wmnet
  • 08:45 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet
  • 08:44 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4044.ulsfo.wmnet
  • 08:44 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1227.eqiad.wmnet with OS bookworm
  • 08:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1227', diff saved to https://phabricator.wikimedia.org/P64068 and previous config saved to /var/cache/conftool/dbconfig/20240605-084211-root.json
  • 08:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage
  • 08:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on db1227.eqiad.wmnet with reason: Reimage
  • 08:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki1001.eqiad.wmnet
  • 08:37 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1028.eqiad.wmnet
  • 08:37 marostegui@cumin1002: dbctl commit (dc=all): 'db1186 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64067 and previous config saved to /var/cache/conftool/dbconfig/20240605-083733-root.json
  • 08:35 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host rpki1001.eqiad.wmnet
  • 08:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1358 to wikikube-worker1001
  • 08:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1001
  • 08:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host rpki2002.codfw.wmnet
  • 08:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1186.eqiad.wmnet with OS bookworm
  • 08:18 klausman@cumin2002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:ml-serve-worker-codfw
  • 08:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P64063 and previous config saved to /var/cache/conftool/dbconfig/20240605-081755-marostegui.json
  • 08:14 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1027.eqiad.wmnet
  • 08:08 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1027.eqiad.wmnet
  • 08:07 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1026.eqiad.wmnet
  • 08:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155', diff saved to https://phabricator.wikimedia.org/P64062 and previous config saved to /var/cache/conftool/dbconfig/20240605-080247-marostegui.json
  • 08:01 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1026.eqiad.wmnet
  • 08:00 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1025.eqiad.wmnet
  • 08:00 cgoubert@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
  • 07:57 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mirror1001.wikimedia.org
  • 07:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1186.eqiad.wmnet with reason: host reimage
  • 07:54 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1025.eqiad.wmnet
  • 07:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1186.eqiad.wmnet with reason: host reimage
  • 07:50 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host mirror1001.wikimedia.org
  • 07:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2155 (T364299)', diff saved to https://phabricator.wikimedia.org/P64061 and previous config saved to /var/cache/conftool/dbconfig/20240605-074739-marostegui.json
  • 07:45 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1021.eqiad.wmnet
  • 07:38 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1186.eqiad.wmnet with OS bookworm
  • 07:38 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1021.eqiad.wmnet
  • 07:38 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host db1186.eqiad.wmnet with OS bookworm
  • 07:38 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1186.eqiad.wmnet with OS bookworm
  • 07:37 marostegui@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host db1186.eqiad.wmnet with OS bookworm
  • 07:37 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1186.eqiad.wmnet with OS bookworm
  • 07:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install2004.wikimedia.org
  • 07:35 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install1004.wikimedia.org
  • 07:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5:00:00 on db1186.eqiad.wmnet with reason: Reimage
  • 07:31 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install2004.wikimedia.org
  • 07:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 5:00:00 on db1186.eqiad.wmnet with reason: Reimage
  • 07:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install1004.wikimedia.org
  • 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1186', diff saved to https://phabricator.wikimedia.org/P64060 and previous config saved to /var/cache/conftool/dbconfig/20240605-073024-root.json
  • 07:28 marostegui: dbmaint codfw s2 deploy schema change on db2207 T364299
  • 07:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2207.codfw.wmnet with reason: Long schema change
  • 07:27 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on db2207.codfw.wmnet with reason: Long schema change
  • 07:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2207 T366038', diff saved to https://phabricator.wikimedia.org/P64059 and previous config saved to /var/cache/conftool/dbconfig/20240605-072509-root.json
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2204 to s2 primary T366038', diff saved to https://phabricator.wikimedia.org/P64058 and previous config saved to /var/cache/conftool/dbconfig/20240605-072427-marostegui.json
  • 07:24 marostegui: Starting s2 codfw failover from db2207 to db2204 - T366038
  • 07:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T366038
  • 07:08 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2204 with weight 0 T366038', diff saved to https://phabricator.wikimedia.org/P64057 and previous config saved to /var/cache/conftool/dbconfig/20240605-070758-root.json
  • 07:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T366038
  • 04:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1242 (T364069)', diff saved to https://phabricator.wikimedia.org/P64056 and previous config saved to /var/cache/conftool/dbconfig/20240605-044418-marostegui.json
  • 04:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 04:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1242.eqiad.wmnet with reason: Maintenance
  • 04:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T364069)', diff saved to https://phabricator.wikimedia.org/P64055 and previous config saved to /var/cache/conftool/dbconfig/20240605-044355-marostegui.json
  • 04:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P64054 and previous config saved to /var/cache/conftool/dbconfig/20240605-042847-marostegui.json
  • 04:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241', diff saved to https://phabricator.wikimedia.org/P64053 and previous config saved to /var/cache/conftool/dbconfig/20240605-041339-marostegui.json
  • 04:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2155 (T364299)', diff saved to https://phabricator.wikimedia.org/P64052 and previous config saved to /var/cache/conftool/dbconfig/20240605-041306-marostegui.json
  • 04:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 04:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 04:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 04:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2155.codfw.wmnet with reason: Maintenance
  • 04:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T364299)', diff saved to https://phabricator.wikimedia.org/P64051 and previous config saved to /var/cache/conftool/dbconfig/20240605-041227-marostegui.json
  • 03:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T352010)', diff saved to https://phabricator.wikimedia.org/P64050 and previous config saved to /var/cache/conftool/dbconfig/20240605-035855-ladsgroup.json
  • 03:58 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 03:58 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 03:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T352010)', diff saved to https://phabricator.wikimedia.org/P64049 and previous config saved to /var/cache/conftool/dbconfig/20240605-035832-ladsgroup.json
  • 03:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1241 (T364069)', diff saved to https://phabricator.wikimedia.org/P64048 and previous config saved to /var/cache/conftool/dbconfig/20240605-035831-marostegui.json
  • 03:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P64047 and previous config saved to /var/cache/conftool/dbconfig/20240605-035719-marostegui.json
  • 03:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P64046 and previous config saved to /var/cache/conftool/dbconfig/20240605-034326-ladsgroup.json
  • 03:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147', diff saved to https://phabricator.wikimedia.org/P64045 and previous config saved to /var/cache/conftool/dbconfig/20240605-034212-marostegui.json
  • 03:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P64044 and previous config saved to /var/cache/conftool/dbconfig/20240605-032817-ladsgroup.json
  • 03:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2147 (T364299)', diff saved to https://phabricator.wikimedia.org/P64043 and previous config saved to /var/cache/conftool/dbconfig/20240605-032704-marostegui.json
  • 03:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T352010)', diff saved to https://phabricator.wikimedia.org/P64042 and previous config saved to /var/cache/conftool/dbconfig/20240605-031310-ladsgroup.json
  • 02:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T352010)', diff saved to https://phabricator.wikimedia.org/P64041 and previous config saved to /var/cache/conftool/dbconfig/20240605-023423-ladsgroup.json
  • 02:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 02:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance

2024-06-04

  • 23:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2147 (T364299)', diff saved to https://phabricator.wikimedia.org/P64040 and previous config saved to /var/cache/conftool/dbconfig/20240604-234228-marostegui.json
  • 23:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 23:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2147.codfw.wmnet with reason: Maintenance
  • 23:15 tzatziki: removing one file for legal compliance
  • 23:09 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on miscweb1003.eqiad.wmnet with reason: reboot T366555
  • 23:09 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on miscweb1003.eqiad.wmnet with reason: reboot T366555
  • 22:50 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 22:47 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on contint.wikimedia.org with reason: reboot T366555
  • 22:47 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on contint.wikimedia.org with reason: reboot T366555
  • 22:47 tzatziki: removing one file for legal compliance
  • 22:46 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on contint1002.wikimedia.org with reason: reboot T366555
  • 22:46 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on contint1002.wikimedia.org with reason: reboot T366555
  • 22:36 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on contint1002.wikimedia.org with reason: reboot T366555
  • 22:36 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on contint1002.wikimedia.org with reason: reboot T366555
  • 22:36 mutante: CI - (integration.wikimedia.org) short downtime for maintenance
  • 22:35 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on contint.wikimedia.org with reason: reboot T366555
  • 22:35 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on contint.wikimedia.org with reason: reboot T366555
  • 22:29 tzatziki: removing two files for legal compliance
  • 22:16 tzatziki: removing three files for legal compliance
  • 22:08 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 22:02 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 22:02 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 22:00 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 21:59 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:59 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:41 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:41 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:35 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:34 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:33 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:33 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:33 urbanecm@deploy1002: Finished scap: Backport for Disable font size options on specified pages for most wikis (T366334) (duration: 15m 10s)
  • 21:32 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:32 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:28 ryankemper@cumin2002: END (FAIL) - Cookbook sre.wdqs.data-reload (exit_code=99) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:28 ryankemper@cumin2002: START - Cookbook sre.wdqs.data-reload reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240527/ using stat1009.eqiad.wmnet)
  • 21:24 urbanecm@deploy1002: toyofuku and urbanecm: Continuing with sync
  • 21:21 urbanecm@deploy1002: toyofuku and urbanecm: Backport for Disable font size options on specified pages for most wikis (T366334) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:18 urbanecm@deploy1002: Started scap: Backport for Disable font size options on specified pages for most wikis (T366334)
  • 21:10 tgr@deploy1002: Finished scap: Backport for multiversion: Support beta for upload hostname check, multiversion: Add tests for MWMultiVersion::getMediaWiki() (duration: 16m 33s)
  • 21:07 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 21:06 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']
  • 21:01 tgr@deploy1002: tgr: Continuing with sync
  • 20:58 tgr@deploy1002: tgr: Backport for multiversion: Support beta for upload hostname check, multiversion: Add tests for MWMultiVersion::getMediaWiki() synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:56 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
  • 20:53 tgr@deploy1002: Started scap: Backport for multiversion: Support beta for upload hostname check, multiversion: Add tests for MWMultiVersion::getMediaWiki()
  • 20:52 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 20:47 tgr@deploy1002: Finished scap: Backport for beta: Introduce new test2wiki on test2.wikipedia.beta.wmcloud.org (T355281) (duration: 13m 12s)
  • 20:39 tgr@deploy1002: tgr and pmiazga: Continuing with sync
  • 20:37 tgr@deploy1002: tgr and pmiazga: Backport for beta: Introduce new test2wiki on test2.wikipedia.beta.wmcloud.org (T355281) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:34 tgr@deploy1002: Started scap: Backport for beta: Introduce new test2wiki on test2.wikipedia.beta.wmcloud.org (T355281)
  • 20:28 ladsgroup@deploy1002: Finished scap: Backport for [pawiki] Enable wgMinervaEnableSiteNotice (T366434) (duration: 13m 24s)
  • 20:27 jhathaway: vacuuming pcc db
  • 20:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 20:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 20:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T364299)', diff saved to https://phabricator.wikimedia.org/P64039 and previous config saved to /var/cache/conftool/dbconfig/20240604-202554-marostegui.json
  • 20:22 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
  • 20:22 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
  • 20:21 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
  • 20:21 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
  • 20:19 ladsgroup@deploy1002: pppery and ladsgroup: Continuing with sync
  • 20:17 ladsgroup@deploy1002: pppery and ladsgroup: Backport for [pawiki] Enable wgMinervaEnableSiteNotice (T366434) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:15 ladsgroup@deploy1002: Started scap: Backport for [pawiki] Enable wgMinervaEnableSiteNotice (T366434)
  • 20:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P64038 and previous config saved to /var/cache/conftool/dbconfig/20240604-201047-marostegui.json
  • 20:00 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 19:59 kamila@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['wikikube-ctrl1001']
  • 19:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137', diff saved to https://phabricator.wikimedia.org/P64037 and previous config saved to /var/cache/conftool/dbconfig/20240604-195539-marostegui.json
  • 19:49 ecarg@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 19:49 ecarg@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 19:47 kamila@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['wikikube-ctrl1001']
  • 19:44 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2137 (T364299)', diff saved to https://phabricator.wikimedia.org/P64036 and previous config saved to /var/cache/conftool/dbconfig/20240604-194031-marostegui.json
  • 19:38 mutante: https://gerrit-replica.wikimedia.org - short downtime for maintenance
  • 19:38 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on gerrit-replica.wikimedia.org with reason: reboot T366555
  • 19:38 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on gerrit-replica.wikimedia.org with reason: reboot T366555
  • 19:37 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 19:37 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on gerrit2002.wikimedia.org with reason: reboot T366555
  • 19:37 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 19:37 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on gerrit2002.wikimedia.org with reason: reboot T366555
  • 19:36 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 19:33 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on contint2002.wikimedia.org with reason: reboot T366555
  • 19:32 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on contint2002.wikimedia.org with reason: reboot T366555
  • 19:16 mutante: releases.wikimedia.org - short downtime for maintenance
  • 19:14 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:10:00 on releases1003.eqiad.wmnet with reason: reboot T366555
  • 19:13 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on releases1003.eqiad.wmnet with reason: reboot T366555
  • 19:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 19:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 19:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1241 (T364069)', diff saved to https://phabricator.wikimedia.org/P64035 and previous config saved to /var/cache/conftool/dbconfig/20240604-190931-marostegui.json
  • 19:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 19:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1241.eqiad.wmnet with reason: Maintenance
  • 19:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T364069)', diff saved to https://phabricator.wikimedia.org/P64034 and previous config saved to /var/cache/conftool/dbconfig/20240604-190906-marostegui.json
  • 19:06 ryankemper: [WDQS Deploy] Restarting `wdqs-categories` across lvs-managed hosts, one node at a time: `sudo -E cumin -b 1 'A:wdqs-all and not A:wdqs-test' 'depool && sleep 45 && systemctl restart wdqs-categories && sleep 45 && pool'`
  • 19:06 ryankemper: [WDQS Deploy] Restarted `wdqs-categories` across all test hosts simultaneously: `sudo -E cumin 'A:wdqs-test' 'systemctl restart wdqs-categories'`
  • 19:06 ryankemper: [WDQS Deploy] Restarted `wdqs-updater` across all hosts, 4 hosts at a time: `sudo -E cumin -b 4 'A:wdqs-all' 'systemctl restart wdqs-updater'`
  • 19:00 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@43b966f]: 0.3.142 (duration: 12m 53s)
  • 18:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P64033 and previous config saved to /var/cache/conftool/dbconfig/20240604-185358-marostegui.json
  • 18:48 ryankemper: [WDQS Deploy] Forgot to run the command to set git hash to tip of origin/master so deploy was a partial no-op. Re-rolling...
  • 18:47 ryankemper@deploy1002: Started deploy [wdqs/wdqs@43b966f]: 0.3.142
  • 18:46 ryankemper@deploy1002: Finished deploy [wdqs/wdqs@143ca33]: 0.3.142 (duration: 02m 02s)
  • 18:45 ryankemper: [WDQS Deploy] Tests passing following deploy of `0.3.142` on canary `wdqs1016`; proceeding to rest of fleet
  • 18:44 ryankemper@deploy1002: Started deploy [wdqs/wdqs@143ca33]: 0.3.142
  • 18:41 ryankemper: [WDQS Deploy] Gearing up for deploy of wdqs `0.3.142`. Pre-deploy tests passing on canary `wdqs1016`
  • 18:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238', diff saved to https://phabricator.wikimedia.org/P64032 and previous config saved to /var/cache/conftool/dbconfig/20240604-183850-marostegui.json
  • 18:35 mutante: aphlict - (phab realtime notifications) - reboots
  • 18:30 mutante: doc.wikimedia.org - very short downtime for maintenance
  • 18:28 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on doc1003.eqiad.wmnet with reason: reboot T366555
  • 18:28 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on doc1003.eqiad.wmnet with reason: reboot T366555
  • 18:28 dzahn@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 0:10:00 on doc.wikimedia.org with reason: reboot T366555
  • 18:28 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 0:10:00 on doc.wikimedia.org with reason: reboot T366555
  • 18:26 dduvall@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.8 refs T361402
  • 18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1238 (T364069)', diff saved to https://phabricator.wikimedia.org/P64031 and previous config saved to /var/cache/conftool/dbconfig/20240604-182342-marostegui.json
  • 18:15 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 18:04 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp7014*} and A:cp
  • 17:54 sukhe@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp7014*} and A:cp
  • 17:53 sukhe: sudo cumin 'A:cp-upload and A:magru' "sed -i '/\sup ethtool -A eno12399np0/d' /etc/network/interfaces"
  • 17:51 sukhe: sudo cumin 'A:cp-text and A:magru' "sed -i '/\sup ethtool -A eno12399np0/d' /etc/network/interfaces"
  • 17:49 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp7002*} and A:cp
  • 17:39 sukhe@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp7002*} and A:cp
  • 17:23 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl1001.eqiad.wmnet with OS bullseye
  • 17:22 sukhe: sudo cumin 'A:cp and A:magru' 'run-puppet-agent'
  • 17:15 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:15 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1001 to a new rack - kamila@cumin1002"
  • 17:14 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1001 to a new rack - kamila@cumin1002"
  • 17:11 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 16:53 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp700[12].magru.wmnet,service=(cdn|ats-be)
  • 16:52 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 16:51 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: apply
  • 16:41 elukey: delete the other 2 pods in eventgate-main on wikikube-eqiad to test if envoy on them is in a weird state
  • 16:36 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1010.eqiad.wmnet
  • 16:31 elukey: delete 3 pods in eventgate-main on wikikube-eqiad to test if envoy on them is in a weird state
  • 16:29 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1010.eqiad.wmnet
  • 16:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64028 and previous config saved to /var/cache/conftool/dbconfig/20240604-162241-root.json
  • 16:22 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp7002.magru.wmnet
  • 16:15 fabfur@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cp7001.magru.wmnet
  • 16:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2137 (T364299)', diff saved to https://phabricator.wikimedia.org/P64025 and previous config saved to /var/cache/conftool/dbconfig/20240604-161233-marostegui.json
  • 16:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 16:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2137.codfw.wmnet with reason: Maintenance
  • 16:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T364299)', diff saved to https://phabricator.wikimedia.org/P64024 and previous config saved to /var/cache/conftool/dbconfig/20240604-161210-marostegui.json
  • 16:11 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp7002.magru.wmnet
  • 16:10 fabfur@cumin1002: START - Cookbook sre.hosts.reboot-single for host cp7001.magru.wmnet
  • 16:10 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1013.eqiad.wmnet,service=s1
  • 16:10 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1013.eqiad.wmnet,service=s3
  • 16:09 fnegri@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb1013.eqiad.wmnet
  • 16:09 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1005.eqiad.wmnet
  • 16:08 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be2051.codfw.wmnet
  • 16:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64023 and previous config saved to /var/cache/conftool/dbconfig/20240604-160735-root.json
  • 16:05 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
  • 16:05 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
  • 16:04 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
  • 16:04 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host phab2002.codfw.wmnet
  • 16:04 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
  • 16:02 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1005.eqiad.wmnet
  • 16:00 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1052.eqiad.wmnet
  • 16:00 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop: apply
  • 15:59 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop: apply
  • 15:58 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host phab2002.codfw.wmnet
  • 15:57 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1004.eqiad.wmnet
  • 15:57 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd1003.eqiad.wmnet
  • 15:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P64022 and previous config saved to /var/cache/conftool/dbconfig/20240604-155701-marostegui.json
  • 15:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Bumping db1194 weight', diff saved to https://phabricator.wikimedia.org/P64021 and previous config saved to /var/cache/conftool/dbconfig/20240604-155629-ladsgroup.json
  • 15:55 fnegri@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb1013.eqiad.wmnet
  • 15:53 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd1003.eqiad.wmnet
  • 15:53 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1052.eqiad.wmnet
  • 15:53 mvernon@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ms-be1051.eqiad.wmnet
  • 15:52 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1004.eqiad.wmnet
  • 15:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64020 and previous config saved to /var/cache/conftool/dbconfig/20240604-155228-root.json
  • 15:52 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1003.eqiad.wmnet
  • 15:52 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1013.eqiad.wmnet,service=s3
  • 15:51 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd1002.eqiad.wmnet
  • 15:51 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1013.eqiad.wmnet,service=s1
  • 15:48 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host miscweb2003.codfw.wmnet
  • 15:47 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd1002.eqiad.wmnet
  • 15:47 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1003.eqiad.wmnet
  • 15:47 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host ms-be2051.codfw.wmnet
  • 15:47 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-etcd1001.eqiad.wmnet
  • 15:46 mvernon@cumin1002: START - Cookbook sre.hosts.reboot-single for host ms-be1051.eqiad.wmnet
  • 15:44 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host miscweb2003.codfw.wmnet
  • 15:43 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd1001.eqiad.wmnet
  • 15:43 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:43 elukey@cumin1002: END (FAIL) - Cookbook sre.ganeti.reboot-vm (exit_code=99) for VM aux-k8s-etcd1001.eqiad.wmnet
  • 15:42 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-etcd1001.eqiad.wmnet
  • 15:42 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:42 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host vrts2001.codfw.wmnet
  • 15:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136', diff saved to https://phabricator.wikimedia.org/P64019 and previous config saved to /var/cache/conftool/dbconfig/20240604-154153-marostegui.json
  • 15:40 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_magru
  • 15:38 aokoth@cumin1002: START - Cookbook sre.hosts.reboot-single for host vrts2001.codfw.wmnet
  • 15:37 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-ctrl1002.eqiad.wmnet
  • 15:37 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=kubernetes203(0|3|5).codfw.wmnet,cluster=kubernetes,service=kubesvc
  • 15:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64018 and previous config saved to /var/cache/conftool/dbconfig/20240604-153722-root.json
  • 15:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes[2030,2033,2035].codfw.wmnet
  • 15:36 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1002.eqiad.wmnet
  • 15:36 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for kubernetes[2030,2033,2035].codfw.wmnet
  • 15:36 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop: apply
  • 15:34 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop: apply
  • 15:31 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1002.eqiad.wmnet
  • 15:31 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl1002.eqiad.wmnet
  • 15:29 tchin@deploy1002: Finished deploy [airflow-dags/analytics_test@a279784]: (no justification provided) (duration: 00m 10s)
  • 15:29 tchin@deploy1002: Started deploy [airflow-dags/analytics_test@a279784]: (no justification provided)
  • 15:29 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 15:28 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 15:28 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1001.eqiad.wmnet
  • 15:27 tchin@deploy1002: Finished deploy [airflow-dags/analytics@a279784]: (no justification provided) (duration: 00m 27s)
  • 15:27 dcausse@deploy1002: Finished deploy [airflow-dags/search@a279784]: search: bump to discolytics 0.24 and name n-triples dumps (duration: 00m 27s)
  • 15:27 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 15:27 tchin@deploy1002: Started deploy [airflow-dags/analytics@a279784]: (no justification provided)
  • 15:27 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 15:27 dcausse@deploy1002: Started deploy [airflow-dags/search@a279784]: search: bump to discolytics 0.24 and name n-triples dumps
  • 15:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2136 (T364299)', diff saved to https://phabricator.wikimedia.org/P64017 and previous config saved to /var/cache/conftool/dbconfig/20240604-152644-marostegui.json
  • 15:25 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM aux-k8s-ctrl1001.eqiad.wmnet
  • 15:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P64015 and previous config saved to /var/cache/conftool/dbconfig/20240604-152216-root.json
  • 15:22 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1001.eqiad.wmnet
  • 15:21 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1001
  • 15:21 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1001
  • 15:19 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:19 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl1001.eqiad.wmnet
  • 15:18 elukey@cumin1002: END (ERROR) - Cookbook sre.ganeti.reboot-vm (exit_code=97) for VM aux-k8s-ctrl1001.eqiad.wmnet
  • 15:18 elukey@cumin1002: START - Cookbook sre.ganeti.reboot-vm for VM aux-k8s-ctrl1001.eqiad.wmnet
  • 15:18 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 15:16 ejegg: fundraising civicrm upgraded from 44900b8c to 71ed6bed
  • 15:15 kamila@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 15:15 kamila@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1001 to a new rack - kamila@cumin1002"
  • 15:15 ejegg: payments-wiki upgraded from 0174d89c to c255fda8
  • 15:13 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 15:12 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 15:12 dancy@deploy1002: Installation of scap version "4.85.0" completed for 294 hosts
  • 15:11 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl1001 to a new rack - kamila@cumin1002"
  • 15:11 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:11 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_magru
  • 15:11 dancy@deploy1002: Installing scap version "4.85.0" for 294 hosts
  • 15:11 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:09 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 15:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T352010)', diff saved to https://phabricator.wikimedia.org/P64014 and previous config saved to /var/cache/conftool/dbconfig/20240604-150835-ladsgroup.json
  • 15:08 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 15:08 elukey@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:08 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1004.mgmt.eqiad.wmnet with reboot policy FORCED
  • 15:08 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 15:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P64013 and previous config saved to /var/cache/conftool/dbconfig/20240604-150710-root.json
  • 15:06 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp3066*} and A:cp
  • 15:05 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 15:04 brennen@deploy1002: Finished deploy [phabricator/deployment@ef680d8]: deploy phab1004 for T366605 (duration: 00m 32s)
  • 15:04 elukey@cumin1002: START - Cookbook sre.hosts.provision for host sretest1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 15:04 brennen@deploy1002: Started deploy [phabricator/deployment@ef680d8]: deploy phab1004 for T366605
  • 15:03 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phorge Update
  • 15:03 aokoth@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phorge Update
  • 15:03 brennen@deploy1002: Finished deploy [phabricator/deployment@ef680d8]: deploy phab2002 for T366605 (duration: 00m 33s)
  • 15:02 brennen@deploy1002: Started deploy [phabricator/deployment@ef680d8]: deploy phab2002 for T366605
  • 15:02 aokoth@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phorge Update
  • 15:02 aokoth@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phorge Update
  • 14:57 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl1001
  • 14:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 14:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 14:55 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl1001
  • 14:55 sukhe@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3066*} and A:cp
  • 14:53 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:52 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2203 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P64012 and previous config saved to /var/cache/conftool/dbconfig/20240604-145203-root.json
  • 14:49 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop-jobqueue: apply
  • 14:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on kubernetes[2030,2033,2035].codfw.wmnet with reason: Hardware issue
  • 14:48 sukhe@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp4045*} and A:cp
  • 14:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on kubernetes[2030,2033,2035].codfw.wmnet with reason: Hardware issue
  • 14:48 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/changeprop-jobqueue: apply
  • 14:46 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=kubernetes203(1|4).codfw.wmnet,cluster=kubernetes,service=kubesvc
  • 14:43 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/changeprop: apply
  • 14:43 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/changeprop: apply
  • 14:38 sukhe@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp4045*} and A:cp
  • 14:33 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs7003.magru.wmnet
  • 14:27 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs7003.magru.wmnet
  • 14:22 cgoubert@cumin1002: END (FAIL) - Cookbook sre.k8s.reboot-nodes (exit_code=1) rolling reboot on A:wikikube-worker-codfw
  • 14:14 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl1001.eqiad.wmnet
  • 14:14 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:14 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 14:10 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl1001.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 14:06 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 14:02 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs7002.magru.wmnet
  • 14:00 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl1001.eqiad.wmnet
  • 13:59 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs7002.magru.wmnet
  • 13:59 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl1001.eqiad.wmnet
  • 13:46 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1002.eqiad.wmnet
  • 13:42 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1002.eqiad.wmnet
  • 13:42 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl1001.eqiad.wmnet
  • 13:37 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl1001.eqiad.wmnet
  • 13:35 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2002.codfw.wmnet
  • 13:32 ladsgroup@cumin1002: dbctl commit (dc=all): 'Bumping db1194 weight', diff saved to https://phabricator.wikimedia.org/P64009 and previous config saved to /var/cache/conftool/dbconfig/20240604-133250-ladsgroup.json
  • 13:29 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2002.codfw.wmnet
  • 13:29 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-serve-ctrl2001.codfw.wmnet
  • 13:25 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 13:25 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2098.codfw.wmnet with reason: Maintenance
  • 13:24 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-serve-ctrl2001.codfw.wmnet
  • 13:23 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2002.codfw.wmnet
  • 13:22 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 13:21 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 13:20 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 13:20 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 13:19 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 13:18 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 13:17 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2002.codfw.wmnet
  • 13:17 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-ctrl2001.codfw.wmnet
  • 13:14 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host lvs7001.magru.wmnet
  • 13:12 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-ctrl2001.codfw.wmnet
  • 13:11 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_magru
  • 13:11 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_magru
  • 13:11 sukhe@cumin1002: START - Cookbook sre.hosts.reboot-single for host lvs7001.magru.wmnet
  • 13:10 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2003.codfw.wmnet
  • 13:08 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2003.codfw.wmnet
  • 13:08 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2002.codfw.wmnet
  • 13:05 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2002.codfw.wmnet
  • 13:05 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-staging-etcd2001.codfw.wmnet
  • 13:03 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-staging-etcd2001.codfw.wmnet
  • 13:02 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1002.eqiad.wmnet
  • 13:00 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1002.eqiad.wmnet
  • 12:59 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd1001.eqiad.wmnet
  • 12:57 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd1001.eqiad.wmnet
  • 12:56 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2003.codfw.wmnet
  • 12:53 brouberol@cumin2002: END (PASS) - Cookbook sre.wdqs.restart (exit_code=0)
  • 12:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader2004.wikimedia.org
  • 12:53 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2003.codfw.wmnet
  • 12:52 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2002.codfw.wmnet
  • 12:48 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2002.codfw.wmnet
  • 12:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader2004.wikimedia.org
  • 12:44 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:44 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 12:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P64008 and previous config saved to /var/cache/conftool/dbconfig/20240604-124432-ladsgroup.json
  • 12:43 klausman@cumin2002: END (PASS) - Cookbook sre.ganeti.reboot-vm (exit_code=0) for VM ml-etcd2001.codfw.wmnet
  • 12:39 klausman@cumin2002: START - Cookbook sre.ganeti.reboot-vm for VM ml-etcd2001.codfw.wmnet
  • 12:39 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host urldownloader1003.wikimedia.org
  • 12:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host urldownloader1003.wikimedia.org
  • 12:32 brouberol@cumin2002: START - Cookbook sre.wdqs.restart
  • 12:32 brouberol@cumin2002: END (ERROR) - Cookbook sre.wdqs.restart (exit_code=97)
  • 12:32 brouberol@cumin2002: START - Cookbook sre.wdqs.restart
  • 12:32 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
  • 12:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P64007 and previous config saved to /var/cache/conftool/dbconfig/20240604-122924-ladsgroup.json
  • 12:29 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1002.eqiad.wmnet
  • 12:28 taavi@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database dtpwiki (T365229)
  • 12:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
  • 12:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P64006 and previous config saved to /var/cache/conftool/dbconfig/20240604-122602-root.json
  • 12:22 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog1002.eqiad.wmnet
  • 12:17 klausman@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:ml-cache-eqiad
  • 12:15 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 12:15 btullis@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 12:14 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 12:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P64005 and previous config saved to /var/cache/conftool/dbconfig/20240604-121415-ladsgroup.json
  • 12:14 btullis@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 12:12 btullis@deploy1002: helmfile [staging] DONE helmfile.d/services/image-suggestion: apply
  • 12:12 btullis@deploy1002: helmfile [staging] START helmfile.d/services/image-suggestion: apply
  • 12:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P64004 and previous config saved to /var/cache/conftool/dbconfig/20240604-121056-root.json
  • 12:08 klausman@cumin2002: END (PASS) - Cookbook sre.cassandra.roll-reboot (exit_code=0) rolling reboot on A:ml-cache-codfw
  • 12:02 taavi@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database dtpwiki (T365229)
  • 11:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P64003 and previous config saved to /var/cache/conftool/dbconfig/20240604-115907-ladsgroup.json
  • 11:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P64002 and previous config saved to /var/cache/conftool/dbconfig/20240604-115549-root.json
  • 11:54 hnowlan: depooling 3 api appservers and 2 appservers in advance of reimaging
  • 11:50 klausman@cumin2002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:ml-cache-eqiad
  • 11:44 klausman@cumin2002: START - Cookbook sre.cassandra.roll-reboot rolling reboot on A:ml-cache-codfw
  • 11:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2136 (T364299)', diff saved to https://phabricator.wikimedia.org/P64001 and previous config saved to /var/cache/conftool/dbconfig/20240604-114157-marostegui.json
  • 11:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 11:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2136.codfw.wmnet with reason: Maintenance
  • 11:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P64000 and previous config saved to /var/cache/conftool/dbconfig/20240604-114043-root.json
  • 11:39 cgoubert@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-codfw
  • 11:39 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies (exit_code=0) rolling reboot on A:thanos-fe
  • 11:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2003.codfw.wmnet
  • 11:29 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2003.codfw.wmnet
  • 11:27 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2001.codfw.wmnet
  • 11:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P63999 and previous config saved to /var/cache/conftool/dbconfig/20240604-112537-root.json
  • 11:21 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2001.codfw.wmnet
  • 11:15 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2003-dev.codfw.wmnet
  • 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P63998 and previous config saved to /var/cache/conftool/dbconfig/20240604-111031-root.json
  • 11:06 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2003-dev.codfw.wmnet
  • 11:06 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2002-dev.codfw.wmnet
  • 11:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:04 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 11:00 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
  • 10:59 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
  • 10:59 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1002.eqiad.wmnet
  • 10:57 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2002-dev.codfw.wmnet
  • 10:57 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2001-dev.codfw.wmnet
  • 10:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1156 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P63996 and previous config saved to /var/cache/conftool/dbconfig/20240604-105525-root.json
  • 10:54 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog1002.eqiad.wmnet
  • 10:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw1358.eqiad.wmnet with reason: Waiting on iDrac update
  • 10:53 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw1358.eqiad.wmnet with reason: Waiting on iDrac update
  • 10:50 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb1002.eqiad.wmnet
  • 10:50 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1001.eqiad.wmnet
  • 10:49 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2001-dev.codfw.wmnet
  • 10:48 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-thanos-proxies rolling reboot on A:thanos-fe
  • 10:46 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling reboot on P{ms-fe2*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 10:45 marostegui: dbmaint codfw s1 deploy schema change on db2203 T364299
  • 10:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2203.codfw.wmnet with reason: Long schema change
  • 10:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on db2203.codfw.wmnet with reason: Long schema change
  • 10:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2141.codfw.wmnet with reason: Long schema change
  • 10:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on db2141.codfw.wmnet with reason: Long schema change
  • 10:43 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2203 T366552', diff saved to https://phabricator.wikimedia.org/P63995 and previous config saved to /var/cache/conftool/dbconfig/20240604-104337-root.json
  • 10:42 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2212 to s1 primary T366552', diff saved to https://phabricator.wikimedia.org/P63994 and previous config saved to /var/cache/conftool/dbconfig/20240604-104241-root.json
  • 10:42 marostegui: Starting s1 codfw failover from db2203 to db2212 - T366552
  • 10:42 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb1001.eqiad.wmnet
  • 10:40 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host build2001.codfw.wmnet
  • 10:38 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dumps::generation::worker::dumper
  • 10:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1156.eqiad.wmnet with OS bookworm
  • 10:34 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host build2001.codfw.wmnet
  • 10:34 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid1002.eqiad.wmnet
  • 10:30 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid1002.eqiad.wmnet
  • 10:28 hashar@deploy1002: Finished deploy [releng/jenkins-deploy@5d3a06d] (releasing): (no justification provided) (duration: 01m 12s)
  • 10:27 hashar: Upgrading releases Jenkins instances # T366008
  • 10:27 hashar@deploy1002: Started deploy [releng/jenkins-deploy@5d3a06d] (releasing): (no justification provided)
  • 10:24 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dumps::generation::worker::dumper
  • 10:23 claime: Migrating votewiki to mw-on-k8s - T362323
  • 10:22 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host failoid2002.codfw.wmnet
  • 10:20 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog1002.eqiad.wmnet
  • 10:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host failoid2002.codfw.wmnet
  • 10:16 hashar: Upgrading CI Jenkins # T366008
  • 10:15 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1002.eqiad.wmnet
  • 10:15 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage
  • 10:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1156.eqiad.wmnet with reason: host reimage
  • 10:10 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host clouddb2002-dev.codfw.wmnet
  • 10:09 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling reboot on P{ms-fe2*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 10:08 marostegui: dbmaint eqiad s1 deploy schema change on db1184 T364299
  • 10:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: dumps::generation::worker::dumper_monitor
  • 10:07 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb1002.eqiad.wmnet
  • 10:06 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb1001.eqiad.wmnet
  • 10:04 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host clouddb2002-dev.codfw.wmnet
  • 10:04 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling reboot on P{ms-fe1*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 10:00 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2212 with weight 0 T366552', diff saved to https://phabricator.wikimedia.org/P63993 and previous config saved to /var/cache/conftool/dbconfig/20240604-100024-root.json
  • 10:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T366552
  • 09:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T366552
  • 09:58 marostegui@cumin1002: START - Cookbook sre.hosts.reimage for host db1156.eqiad.wmnet with OS bookworm
  • 09:58 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb1001.eqiad.wmnet
  • 09:57 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2003-dev.codfw.wmnet
  • 09:56 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin1001.eqiad.wmnet
  • 09:54 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2003-dev.codfw.wmnet
  • 09:53 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cuminunpriv1001.eqiad.wmnet
  • 09:53 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcumin1001.eqiad.wmnet
  • 09:48 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host centrallog2002.codfw.wmnet
  • 09:48 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2003-dev.codfw.wmnet
  • 09:48 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw2003-dev.codfw.wmnet
  • 09:47 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cuminunpriv1001.eqiad.wmnet
  • 09:45 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcumin2001.codfw.wmnet
  • 09:45 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host graphite2004.codfw.wmnet
  • 09:45 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2002-dev.codfw.wmnet
  • 09:44 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudgw2002-dev.codfw.wmnet
  • 09:43 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install3003.wikimedia.org
  • 09:42 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host centrallog2002.codfw.wmnet
  • 09:41 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcumin2001.codfw.wmnet
  • 09:40 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2008-dev.codfw.wmnet
  • 09:40 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog2002.codfw.wmnet
  • 09:39 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: dumps::generation::worker::dumper_monitor
  • 09:38 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudgw2002-dev.codfw.wmnet
  • 09:37 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host graphite2004.codfw.wmnet
  • 09:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1004.wikimedia.org
  • 09:37 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install3003.wikimedia.org
  • 09:36 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2002-dev.codfw.wmnet
  • 09:36 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudlb2001-dev.codfw.wmnet
  • 09:34 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2008-dev.codfw.wmnet
  • 09:34 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcontrol2007-dev.codfw.wmnet
  • 09:34 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host mwlog2002.codfw.wmnet
  • 09:33 filippo@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host mwlog1002.eqiad.wmnet
  • 09:33 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install4002.wikimedia.org
  • 09:30 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudweb1004.wikimedia.org
  • 09:29 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab2003.wikimedia.org
  • 09:29 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb1003.wikimedia.org
  • 09:27 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcontrol2007-dev.codfw.wmnet
  • 09:27 filippo@cumin1002: START - Cookbook sre.hosts.reboot-single for host mwlog1002.eqiad.wmnet
  • 09:27 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudlb2001-dev.codfw.wmnet
  • 09:27 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling reboot on P{ms-fe1*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 09:27 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host testhost2001.codfw.wmnet
  • 09:26 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install4002.wikimedia.org
  • 09:25 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install5002.wikimedia.org
  • 09:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe2001.codfw.wmnet
  • 09:22 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudweb1003.wikimedia.org
  • 09:22 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host gitlab2003.wikimedia.org
  • 09:21 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudweb2002-dev.wikimedia.org
  • 09:21 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1003.wikimedia.org
  • 09:21 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host testhost2001.codfw.wmnet
  • 09:18 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install5002.wikimedia.org
  • 09:17 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe2001.codfw.wmnet
  • 09:15 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-fe1001.eqiad.wmnet
  • 09:15 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host gitlab1003.wikimedia.org
  • 09:15 taavi@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudweb2002-dev.wikimedia.org
  • 09:14 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host gitlab1004.wikimedia.org
  • 09:14 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install6002.wikimedia.org
  • 09:09 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-fe1001.eqiad.wmnet
  • 09:08 jelto@cumin1002: START - Cookbook sre.hosts.reboot-single for host gitlab1004.wikimedia.org
  • 09:08 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.reboot-runner (exit_code=0) rolling reboot on A:gitlab-runner
  • 09:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install6002.wikimedia.org
  • 09:01 moritzm: imported python3-xapian-haystack 2.1.1-1+deb12u1 to bookworm-wikimedia (already lined up for the next Bookworm point release to address https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1066136 and needed for the update of the Mailman servers) T331706
  • 08:56 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host install7001.wikimedia.org
  • 08:54 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 08:52 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 08:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1238 (T364069)', diff saved to https://phabricator.wikimedia.org/P63992 and previous config saved to /var/cache/conftool/dbconfig/20240604-085205-marostegui.json
  • 08:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 08:51 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host install7001.wikimedia.org
  • 08:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1238.eqiad.wmnet with reason: Maintenance
  • 08:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364069)', diff saved to https://phabricator.wikimedia.org/P63991 and previous config saved to /var/cache/conftool/dbconfig/20240604-085141-marostegui.json
  • 08:50 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host idp-test1003.wikimedia.org
  • 08:46 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host idp-test1003.wikimedia.org
  • 08:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1156', diff saved to https://phabricator.wikimedia.org/P63990 and previous config saved to /var/cache/conftool/dbconfig/20240604-084428-root.json
  • 08:40 kostajh: UTC morning deploys done
  • 08:38 kharlan@deploy1002: Finished scap: Backport for IPReputationHooks: Bump schema version (T354597) (duration: 15m 45s)
  • 08:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221', diff saved to https://phabricator.wikimedia.org/P63989 and previous config saved to /var/cache/conftool/dbconfig/20240604-083633-marostegui.json
  • 08:19 kharlan@deploy1002: Finished scap: Backport for IPReputationHooks: Bump schema version (T354597) (duration: 14m 08s)
  • 08:10 kharlan@deploy1002: kharlan: Continuing with sync
  • 08:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P63986 and previous config saved to /var/cache/conftool/dbconfig/20240604-080846-marostegui.json
  • 08:08 kharlan@deploy1002: kharlan: Backport for IPReputationHooks: Bump schema version (T354597) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 08:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1221 (T364069)', diff saved to https://phabricator.wikimedia.org/P63985 and previous config saved to /var/cache/conftool/dbconfig/20240604-080617-marostegui.json
  • 08:06 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf2002.codfw.wmnet with reason: host reimage
  • 08:05 kharlan@deploy1002: Started scap: Backport for IPReputationHooks: Bump schema version (T354597)
  • 08:02 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
  • 08:01 jiji@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf2002.codfw.wmnet with reason: host reimage
  • 07:57 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1002.eqiad.wmnet with reason: host reimage
  • 07:56 hashar: Restarting Gerrit for Java 17 upgrade # T364342
  • 07:56 hashar@deploy1002: Finished deploy [gerrit/gerrit@6ba3f2e]: gerrit1003: switch to Java 17 version of plugins after having switched Java to 17- T364342 (duration: 00m 03s)
  • 07:56 hashar@deploy1002: Started deploy [gerrit/gerrit@6ba3f2e]: gerrit1003: switch to Java 17 version of plugins after having switched Java to 17- T364342
  • 07:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P63984 and previous config saved to /var/cache/conftool/dbconfig/20240604-075338-marostegui.json
  • 07:47 hashar@deploy1002: Finished deploy [gerrit/gerrit@6ba3f2e]: gerrit2002: switch to Java 17 version of plugins after having switched Java to 17- T364342 (duration: 00m 05s)
  • 07:46 hashar@deploy1002: Started deploy [gerrit/gerrit@6ba3f2e]: gerrit2002: switch to Java 17 version of plugins after having switched Java to 17- T364342
  • 07:42 jiji@cumin2002: START - Cookbook sre.hosts.reimage for host mc-wf2002.codfw.wmnet with OS bookworm
  • 07:42 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc-wf1002.eqiad.wmnet with OS bookworm
  • 07:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T364299)', diff saved to https://phabricator.wikimedia.org/P63983 and previous config saved to /var/cache/conftool/dbconfig/20240604-073830-marostegui.json
  • 07:27 marostegui: dbmaint eqiad s1 deploy schema change on db1184 T356166
  • 07:15 moritzm: installing intel-microcode updates on bullseye
  • 07:10 marostegui: dbmaint eqiad s1 deploy schema change on db1184 T355609
  • 07:06 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 07:06 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 07:05 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1184.eqiad.wmnet with OS bookworm
  • 06:43 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1184.eqiad.wmnet with reason: host reimage
  • 06:40 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1184.eqiad.wmnet with reason: host reimage
  • 06:26 arnaudb@cumin1002: START - Cookbook sre.hosts.reimage for host db1184.eqiad.wmnet with OS bookworm
  • 06:26 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on db1184.eqiad.wmnet with reason: reimage
  • 06:26 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3:00:00 on db1184.eqiad.wmnet with reason: reimage
  • 06:14 marostegui: Rename table flaggedpage_pending on db1185 (s5 eqiad dbmaint) - T365568
  • 06:09 arnaudb@cumin1002: dbctl commit (dc=all): ' fix api db1163 vs db1184 T366259', diff saved to https://phabricator.wikimedia.org/P63982 and previous config saved to /var/cache/conftool/dbconfig/20240604-060925-arnaudb.json
  • 06:07 arnaudb@cumin1002: dbctl commit (dc=all): 'API db1163 T366259', diff saved to https://phabricator.wikimedia.org/P63981 and previous config saved to /var/cache/conftool/dbconfig/20240604-060747-arnaudb.json
  • 06:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Depool db1184 T366259', diff saved to https://phabricator.wikimedia.org/P63980 and previous config saved to /var/cache/conftool/dbconfig/20240604-060703-arnaudb.json
  • 06:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote db1163 to s1 primary and set section read-write T366259', diff saved to https://phabricator.wikimedia.org/P63979 and previous config saved to /var/cache/conftool/dbconfig/20240604-060324-arnaudb.json
  • 06:02 arnaudb@cumin1002: dbctl commit (dc=all): 'Set s1 eqiad as read-only for maintenance - T366259', diff saved to https://phabricator.wikimedia.org/P63978 and previous config saved to /var/cache/conftool/dbconfig/20240604-060208-arnaudb.json
  • 06:01 arnaudb: Starting s1 eqiad failover from db1184 to db1163 - T366259
  • 05:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Set db1163 with weight 0 T366259', diff saved to https://phabricator.wikimedia.org/P63977 and previous config saved to /var/cache/conftool/dbconfig/20240604-052803-arnaudb.json
  • 05:27 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 35 hosts with reason: Primary switchover s1 T366259
  • 05:27 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 35 hosts with reason: Primary switchover s1 T366259
  • 04:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T352010)', diff saved to https://phabricator.wikimedia.org/P63976 and previous config saved to /var/cache/conftool/dbconfig/20240604-042011-ladsgroup.json
  • 04:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 04:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.5 (duration: 00m 57s)
  • 03:57 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T364299)', diff saved to https://phabricator.wikimedia.org/P63975 and previous config saved to /var/cache/conftool/dbconfig/20240604-035703-marostegui.json
  • 03:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 03:56 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.8 refs T361402 (duration: 53m 47s)
  • 03:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 03:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T364299)', diff saved to https://phabricator.wikimedia.org/P63974 and previous config saved to /var/cache/conftool/dbconfig/20240604-035640-marostegui.json
  • 03:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P63973 and previous config saved to /var/cache/conftool/dbconfig/20240604-034132-marostegui.json
  • 03:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212', diff saved to https://phabricator.wikimedia.org/P63972 and previous config saved to /var/cache/conftool/dbconfig/20240604-032625-marostegui.json
  • 03:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2212 (T364299)', diff saved to https://phabricator.wikimedia.org/P63971 and previous config saved to /var/cache/conftool/dbconfig/20240604-031117-marostegui.json
  • 03:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2212 (T364299)', diff saved to https://phabricator.wikimedia.org/P63970 and previous config saved to /var/cache/conftool/dbconfig/20240604-030906-marostegui.json
  • 03:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 03:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2212.codfw.wmnet with reason: Maintenance
  • 03:03 mwpresync@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.8 refs T361402
  • 00:21 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 00:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 00:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T352010)', diff saved to https://phabricator.wikimedia.org/P63969 and previous config saved to /var/cache/conftool/dbconfig/20240604-002119-ladsgroup.json
  • 00:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P63968 and previous config saved to /var/cache/conftool/dbconfig/20240604-000612-ladsgroup.json

2024-06-03

  • 23:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P63967 and previous config saved to /var/cache/conftool/dbconfig/20240603-235104-ladsgroup.json
  • 23:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T352010)', diff saved to https://phabricator.wikimedia.org/P63966 and previous config saved to /var/cache/conftool/dbconfig/20240603-233555-ladsgroup.json
  • 23:14 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki mediawikiwiki "Extension:DynamicPageList (Wikimedia)" "Extension:DynamicPageList" "Zabe" --reason "per request T366488"
  • 23:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 23:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 23:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T364299)', diff saved to https://phabricator.wikimedia.org/P63965 and previous config saved to /var/cache/conftool/dbconfig/20240603-231424-marostegui.json
  • 22:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P63963 and previous config saved to /var/cache/conftool/dbconfig/20240603-225916-marostegui.json
  • 22:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P63962 and previous config saved to /var/cache/conftool/dbconfig/20240603-224408-marostegui.json
  • 22:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T364299)', diff saved to https://phabricator.wikimedia.org/P63961 and previous config saved to /var/cache/conftool/dbconfig/20240603-222900-marostegui.json
  • 22:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1221 (T364069)', diff saved to https://phabricator.wikimedia.org/P63960 and previous config saved to /var/cache/conftool/dbconfig/20240603-222607-marostegui.json
  • 22:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 22:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 22:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1221.eqiad.wmnet with reason: Maintenance
  • 22:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T364069)', diff saved to https://phabricator.wikimedia.org/P63959 and previous config saved to /var/cache/conftool/dbconfig/20240603-222524-marostegui.json
  • 22:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P63958 and previous config saved to /var/cache/conftool/dbconfig/20240603-221016-marostegui.json
  • 21:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199', diff saved to https://phabricator.wikimedia.org/P63957 and previous config saved to /var/cache/conftool/dbconfig/20240603-215508-marostegui.json
  • 21:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1199 (T364069)', diff saved to https://phabricator.wikimedia.org/P63956 and previous config saved to /var/cache/conftool/dbconfig/20240603-214000-marostegui.json
  • 21:20 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 21:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P63955 and previous config saved to /var/cache/conftool/dbconfig/20240603-212040-ladsgroup.json
  • 21:13 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P63954 and previous config saved to /var/cache/conftool/dbconfig/20240603-211312-root.json
  • 21:05 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P63953 and previous config saved to /var/cache/conftool/dbconfig/20240603-210532-ladsgroup.json
  • 20:58 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P63952 and previous config saved to /var/cache/conftool/dbconfig/20240603-205806-root.json
  • 20:51 urbanecm@deploy1002: Finished scap: Backport for Wrap tables in Vector 2022 for projects where legacy Vector is default (T366314), Enable night theme on pages which have no color contrast issues (T366370) (duration: 14m 57s)
  • 20:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P63951 and previous config saved to /var/cache/conftool/dbconfig/20240603-205024-ladsgroup.json
  • 20:43 urbanecm@deploy1002: jdlrobson and urbanecm: Continuing with sync
  • 20:43 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P63950 and previous config saved to /var/cache/conftool/dbconfig/20240603-204300-root.json
  • 20:39 urbanecm@deploy1002: jdlrobson and urbanecm: Backport for Wrap tables in Vector 2022 for projects where legacy Vector is default (T366314), Enable night theme on pages which have no color contrast issues (T366370) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:36 urbanecm@deploy1002: Started scap: Backport for Wrap tables in Vector 2022 for projects where legacy Vector is default (T366314), Enable night theme on pages which have no color contrast issues (T366370)
  • 20:36 urbanecm@deploy1002: Finished scap: Backport for EventLogging: Enable IP reputation logging (T354597), [trwiki] Allow translator group to publish translation only in Extension:ContentTranslation, [trwiki] Reducing count edits ip and newbie per minute (T330811) (duration: 30m 14s)
  • 20:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P63949 and previous config saved to /var/cache/conftool/dbconfig/20240603-203514-ladsgroup.json
  • 20:27 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P63948 and previous config saved to /var/cache/conftool/dbconfig/20240603-202754-root.json
  • 20:27 urbanecm@deploy1002: kharlan and urbanecm and gergesshamon: Continuing with sync
  • 20:12 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P63947 and previous config saved to /var/cache/conftool/dbconfig/20240603-201248-root.json
  • 20:10 urbanecm@deploy1002: kharlan and urbanecm and gergesshamon: Backport for EventLogging: Enable IP reputation logging (T354597), [trwiki] Allow translator group to publish translation only in Extension:ContentTranslation, [trwiki] Reducing count edits ip and newbie per minute (T330811) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:06 urbanecm@deploy1002: Started scap: Backport for EventLogging: Enable IP reputation logging (T354597), [trwiki] Allow translator group to publish translation only in Extension:ContentTranslation, [trwiki] Reducing count edits ip and newbie per minute (T330811)
  • 19:57 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P63946 and previous config saved to /var/cache/conftool/dbconfig/20240603-195742-root.json
  • 19:42 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P63945 and previous config saved to /var/cache/conftool/dbconfig/20240603-194236-root.json
  • 18:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T364299)', diff saved to https://phabricator.wikimedia.org/P63944 and previous config saved to /var/cache/conftool/dbconfig/20240603-183029-marostegui.json
  • 18:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 18:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 18:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T364299)', diff saved to https://phabricator.wikimedia.org/P63943 and previous config saved to /var/cache/conftool/dbconfig/20240603-183006-marostegui.json
  • 18:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P63942 and previous config saved to /var/cache/conftool/dbconfig/20240603-181459-marostegui.json
  • 17:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P63941 and previous config saved to /var/cache/conftool/dbconfig/20240603-175951-marostegui.json
  • 17:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T364299)', diff saved to https://phabricator.wikimedia.org/P63940 and previous config saved to /var/cache/conftool/dbconfig/20240603-174442-marostegui.json
  • 17:27 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1002.eqiad.wmnet|wikikube-worker1003.eqiad.wmnet|wikikube-worker1007.eqiad.wmnet|wikikube-worker1004.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 17:27 claime: Pooling and uncordoning wikikube-worker1002.eqiad.wmnet,wikikube-worker1003.eqiad.wmnet,wikikube-worker1007.eqiad.wmnet,wikikube-worker1004.eqiad.wmnet - T351074
  • 17:19 claime: homer 'cr*eqiad*' commit 'T351074'
  • 17:18 bd808@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 17:17 claime: homer 'lsw1-e2-eqiad*' commit 'T351074'
  • 17:17 claime: homer 'lsw1-e2-eqiad*' commit 'T35107
  • 17:17 bd808@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 17:17 bd808@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 17:16 bd808@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 17:15 bd808@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 17:14 bd808@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 16:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1007.eqiad.wmnet with OS bullseye
  • 16:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1007.eqiad.wmnet with reason: host reimage
  • 16:33 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1007.eqiad.wmnet with reason: host reimage
  • 16:20 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1007.eqiad.wmnet with OS bullseye
  • 16:18 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker1007.eqiad.wmnet with OS bullseye
  • 16:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1003.eqiad.wmnet with OS bullseye
  • 15:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1004.eqiad.wmnet with OS bullseye
  • 15:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1002.eqiad.wmnet with OS bullseye
  • 15:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2212', diff saved to https://phabricator.wikimedia.org/P63939 and previous config saved to /var/cache/conftool/dbconfig/20240603-155048-root.json
  • 15:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1003.eqiad.wmnet with reason: host reimage
  • 15:43 hashar@deploy1002: Finished deploy [gerrit/gerrit@c93e47d]: Revert "Rebuild plugins for Java 17" to stick to Java 11 based compiled plugins - T364342 (duration: 00m 05s)
  • 15:43 hashar@deploy1002: Started deploy [gerrit/gerrit@c93e47d]: Revert "Rebuild plugins for Java 17" to stick to Java 11 based compiled plugins - T364342
  • 15:42 jhathaway: deploying more restrictive SPF & DMARC settings for wikipedia.org
  • 15:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1004.eqiad.wmnet with reason: host reimage
  • 15:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1002.eqiad.wmnet with reason: host reimage
  • 15:36 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1004.eqiad.wmnet with reason: host reimage
  • 15:36 pt1979@cumin2002: END (PASS) - Cookbook sre.network.provision (exit_code=0) for device lsw1-c2-codfw.mgmt.codfw.wmnet
  • 15:35 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1003.eqiad.wmnet with reason: host reimage
  • 15:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1002.eqiad.wmnet with reason: host reimage
  • 15:30 dancy@deploy1002: sync-world aborted: testing (duration: 00m 00s)
  • 15:30 dancy@deploy1002: Started scap: testing
  • 15:27 dancy@mwmaint1002: scap failed: FileNotFoundError [Errno 2] No such file or directory: '/etc/helmfile-defaults/mediawiki-deployments.yaml' (duration: 00m 00s)
  • 15:23 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1007.eqiad.wmnet with OS bullseye
  • 15:23 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1004.eqiad.wmnet with OS bullseye
  • 15:22 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1003.eqiad.wmnet with OS bullseye
  • 15:21 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1002.eqiad.wmnet with OS bullseye
  • 15:04 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:04 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-c2-codfw - pt1979@cumin2002"
  • 15:03 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add management record for lsw1-c2-codfw - pt1979@cumin2002"
  • 15:03 dancy@deploy1002: Installing scap version "4.84.0" for 297 hosts
  • 15:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1490 to wikikube-worker1007
  • 15:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1007
  • 15:00 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:00 pt1979@cumin2002: START - Cookbook sre.network.provision for device lsw1-c2-codfw.mgmt.codfw.wmnet
  • 15:00 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1007
  • 15:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1490 to wikikube-worker1007 - cgoubert@cumin1002"
  • 14:57 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1490 to wikikube-worker1007 - cgoubert@cumin1002"
  • 14:57 hashar@deploy1002: Finished deploy [gerrit/gerrit@6ba3f2e]: Rebuild plugins for Java 17 - T364342 (duration: 00m 05s)
  • 14:57 hashar@deploy1002: Started deploy [gerrit/gerrit@6ba3f2e]: Rebuild plugins for Java 17 - T364342
  • 14:55 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:55 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1490 to wikikube-worker1007
  • 14:54 hashar@deploy1002: Finished deploy [gerrit/gerrit@6ba3f2e]: Rebuild plugins for Java 17 - T364342 (duration: 00m 08s)
  • 14:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1443 to wikikube-worker1004
  • 14:54 hashar@deploy1002: Started deploy [gerrit/gerrit@6ba3f2e]: Rebuild plugins for Java 17 - T364342
  • 14:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1004
  • 14:53 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1004
  • 14:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1443 to wikikube-worker1004 - cgoubert@cumin1002"
  • 14:53 hashar@deploy1002: Finished deploy [gerrit/gerrit@c93e47d]: Rebuild plugins for Java 17 - T364342 (duration: 00m 05s)
  • 14:53 hashar@deploy1002: Started deploy [gerrit/gerrit@c93e47d]: Rebuild plugins for Java 17 - T364342
  • 14:52 Dreamy_Jazz: Afternoon UTC backport window done
  • 14:52 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1443 to wikikube-worker1004 - cgoubert@cumin1002"
  • 14:51 dreamyjazz@deploy1002: Finished scap: Backport for Ensure excluded SHA-1s have numeric keys for scanFilesInScanTable.php (T366473) (duration: 12m 04s)
  • 14:45 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:45 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1443 to wikikube-worker1004
  • 14:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1427 to wikikube-worker1003
  • 14:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1003
  • 14:43 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
  • 14:42 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1003
  • 14:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1427 to wikikube-worker1003 - cgoubert@cumin1002"
  • 14:41 dreamyjazz@deploy1002: dreamyjazz: Backport for Ensure excluded SHA-1s have numeric keys for scanFilesInScanTable.php (T366473) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:41 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1427 to wikikube-worker1003 - cgoubert@cumin1002"
  • 14:39 dreamyjazz@deploy1002: Started scap: Backport for Ensure excluded SHA-1s have numeric keys for scanFilesInScanTable.php (T366473)
  • 14:39 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:38 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1427 to wikikube-worker1003
  • 14:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1426 to wikikube-worker1002
  • 14:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1002
  • 14:37 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1002
  • 14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1426 to wikikube-worker1002 - cgoubert@cumin1002"
  • 14:35 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1426 to wikikube-worker1002 - cgoubert@cumin1002"
  • 14:34 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
  • 14:33 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
  • 14:33 vgutierrez: repool text@ulsfo with IPIP encapsulation enabled - T366466
  • 14:31 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host snapshot1012.eqiad.wmnet with OS bullseye
  • 14:31 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
  • 14:31 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
  • 14:30 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1358.eqiad.wmnet
  • 14:30 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1358.eqiad.wmnet
  • 14:30 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf2001.codfw.wmnet with OS bookworm
  • 14:29 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:29 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1426 to wikikube-worker1002
  • 14:28 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host snapshot1010.eqiad.wmnet with OS bullseye
  • 14:25 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-wf1001.eqiad.wmnet with OS bookworm
  • 14:24 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=99) from mw1358 to wikikube-worker1001
  • 14:24 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1358 to wikikube-worker1001
  • 14:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:12 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf2001.codfw.wmnet with reason: host reimage
  • 14:09 jiji@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf2001.codfw.wmnet with reason: host reimage
  • 14:08 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
  • 14:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1012.eqiad.wmnet with reason: host reimage
  • 14:05 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-wf1001.eqiad.wmnet with reason: host reimage
  • 14:02 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1010.eqiad.wmnet with reason: host reimage
  • 14:01 tgr@deploy1002: Finished scap: Backport for [trwiki] Create translator group (T356440) (duration: 23m 15s)
  • 13:59 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1012.eqiad.wmnet with reason: host reimage
  • 13:59 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1010.eqiad.wmnet with reason: host reimage
  • 13:58 vgutierrez: rolling restart of pybal on lvs4010 and lvs4008 - T366466
  • 13:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T352010)', diff saved to https://phabricator.wikimedia.org/P63937 and previous config saved to /var/cache/conftool/dbconfig/20240603-135634-ladsgroup.json
  • 13:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 13:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 13:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T352010)', diff saved to https://phabricator.wikimedia.org/P63936 and previous config saved to /var/cache/conftool/dbconfig/20240603-135612-ladsgroup.json
  • 13:54 vgutierrez: re-enable puppet on "A:cp-text_ulsfo" - T366466
  • 13:50 jiji@cumin2002: START - Cookbook sre.hosts.reimage for host mc-wf2001.codfw.wmnet with OS bookworm
  • 13:50 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc-wf1001.eqiad.wmnet with OS bookworm
  • 13:49 vgutierrez: re-enable puppet on "A:cp-text and not A:cp-text_ulsfo" - T366466
  • 13:46 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host snapshot1012.eqiad.wmnet with OS bullseye
  • 13:46 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host snapshot1010.eqiad.wmnet with OS bullseye
  • 13:44 tgr@deploy1002: gergesshamon and tgr: Continuing with sync
  • 13:41 vgutierrez: disable puppet on A:cp-text before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1038294/ - T366466
  • 13:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P63935 and previous config saved to /var/cache/conftool/dbconfig/20240603-134104-ladsgroup.json
  • 13:40 tgr@deploy1002: gergesshamon and tgr: Backport for [trwiki] Create translator group (T356440) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:38 tgr@deploy1002: Started scap: Backport for [trwiki] Create translator group (T356440)
  • 13:36 vgutierrez: depool text@ulsfo before enabling IPIP encapsulation - T366466
  • 13:32 tgr@deploy1002: Finished scap: Backport for [Beta] cswiki: enable CommunityConfiguration for GrowthExperiments (T364892), [multiversion] Add 'manage-dblist init-labs' subcommand, [arwiki] add ipblock-exempt to bot group (T366404) (duration: 19m 07s)
  • 13:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P63934 and previous config saved to /var/cache/conftool/dbconfig/20240603-132556-ladsgroup.json
  • 13:23 tgr@deploy1002: sgimeno and gergesshamon and tgr: Continuing with sync
  • 13:20 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1001.eqiad.wmnet with OS bookworm
  • 13:16 tgr@deploy1002: sgimeno and gergesshamon and tgr: Backport for [Beta] cswiki: enable CommunityConfiguration for GrowthExperiments (T364892), [multiversion] Add 'manage-dblist init-labs' subcommand, [arwiki] add ipblock-exempt to bot group (T366404) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:13 tgr@deploy1002: Started scap: Backport for [Beta] cswiki: enable CommunityConfiguration for GrowthExperiments (T364892), [multiversion] Add 'manage-dblist init-labs' subcommand, [arwiki] add ipblock-exempt to bot group (T366404)
  • 13:13 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti-test2002.codfw.wmnet
  • 13:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T352010)', diff saved to https://phabricator.wikimedia.org/P63933 and previous config saved to /var/cache/conftool/dbconfig/20240603-131048-ladsgroup.json
  • 13:08 moritzm: uploaded intel-microcode 3.20240312.1~deb11u1 to apt.wikimedia.org (import from bullseye-proposed-updates, to be coupled with forthcoming reboots)
  • 13:07 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host ganeti-test2002.codfw.wmnet
  • 13:03 Emperor: depool moss-fe2001 with a view to returning it to apus T279621
  • 13:02 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1001.eqiad.wmnet with reason: host reimage
  • 13:02 Emperor: depool moss-fe1001 with a view to returning it to apus T279621
  • 13:00 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1001.eqiad.wmnet with reason: host reimage
  • 12:55 Emperor: depool/restart swift-proxy/repool ms-fe10{09,11,12,14} due to rising connection failures T360913
  • 12:47 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host sretest1001.eqiad.wmnet
  • 12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T364299)', diff saved to https://phabricator.wikimedia.org/P63932 and previous config saved to /var/cache/conftool/dbconfig/20240603-124628-marostegui.json
  • 12:46 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 12:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T364299)', diff saved to https://phabricator.wikimedia.org/P63931 and previous config saved to /var/cache/conftool/dbconfig/20240603-124605-marostegui.json
  • 12:45 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1001.eqiad.wmnet with OS bookworm
  • 12:41 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1002.eqiad.wmnet with OS bookworm
  • 12:40 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host sretest1001.eqiad.wmnet
  • 12:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P63930 and previous config saved to /var/cache/conftool/dbconfig/20240603-123057-marostegui.json
  • 12:24 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp1002.eqiad.wmnet with reason: host reimage
  • 12:20 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp1002.eqiad.wmnet with reason: host reimage
  • 12:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P63929 and previous config saved to /var/cache/conftool/dbconfig/20240603-121549-marostegui.json
  • 12:06 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1002.eqiad.wmnet with OS bookworm
  • 12:03 ladsgroup@deploy1002: Finished scap: Backport for Enable numeric sorting for Persian (T329440) (duration: 12m 07s)
  • 12:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T364299)', diff saved to https://phabricator.wikimedia.org/P63928 and previous config saved to /var/cache/conftool/dbconfig/20240603-120041-marostegui.json
  • 11:54 ladsgroup@deploy1002: ebrahim and ladsgroup: Continuing with sync
  • 11:53 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on backup2011.codfw.wmnet with reason: remount filesystem
  • 11:53 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on backup2011.codfw.wmnet with reason: remount filesystem
  • 11:53 ladsgroup@deploy1002: ebrahim and ladsgroup: Backport for Enable numeric sorting for Persian (T329440) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:51 ladsgroup@deploy1002: Started scap: Backport for Enable numeric sorting for Persian (T329440)
  • 11:35 effie: restart memcached on mc1050 and mc2050
  • 11:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T352010)', diff saved to https://phabricator.wikimedia.org/P63927 and previous config saved to /var/cache/conftool/dbconfig/20240603-113447-ladsgroup.json
  • 11:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 11:34 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 11:34 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 11:27 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on backup2011.codfw.wmnet with reason: remount filesystem
  • 11:26 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on backup2011.codfw.wmnet with reason: remount filesystem
  • 11:24 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1037.eqiad.wmnet with OS bookworm
  • 11:07 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-host (exit_code=0) for host snapshot1013.eqiad.wmnet
  • 11:07 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1037.eqiad.wmnet with reason: host reimage
  • 11:04 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1037.eqiad.wmnet with reason: host reimage
  • 10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1199 (T364069)', diff saved to https://phabricator.wikimedia.org/P63926 and previous config saved to /var/cache/conftool/dbconfig/20240603-105416-marostegui.json
  • 10:54 jmm@cumin2002: START - Cookbook sre.puppet.migrate-host for host snapshot1013.eqiad.wmnet
  • 10:54 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 10:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1199.eqiad.wmnet with reason: Maintenance
  • 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T364069)', diff saved to https://phabricator.wikimedia.org/P63925 and previous config saved to /var/cache/conftool/dbconfig/20240603-105352-marostegui.json
  • 10:50 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc1037.eqiad.wmnet with OS bookworm
  • 10:41 moritzm: installing linux 5.10.218 security updates
  • 10:40 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1038.eqiad.wmnet with OS bookworm
  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P63924 and previous config saved to /var/cache/conftool/dbconfig/20240603-103844-marostegui.json
  • 10:29 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host snapshot1013.eqiad.wmnet with OS bullseye
  • 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190', diff saved to https://phabricator.wikimedia.org/P63923 and previous config saved to /var/cache/conftool/dbconfig/20240603-102335-marostegui.json
  • 10:21 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
  • 10:18 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1038.eqiad.wmnet with reason: host reimage
  • 10:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1190 (T364069)', diff saved to https://phabricator.wikimedia.org/P63922 and previous config saved to /var/cache/conftool/dbconfig/20240603-100827-marostegui.json
  • 10:03 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc1038.eqiad.wmnet with OS bookworm
  • 10:02 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on snapshot1013.eqiad.wmnet with reason: host reimage
  • 09:58 ladsgroup@deploy1002: Finished scap: Backport for Stop writing to the old pagelinks columns in s8 (T352010) (duration: 18m 39s)
  • 09:57 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 09:56 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on snapshot1013.eqiad.wmnet with reason: host reimage
  • 09:49 jiji@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host mc-gp2001.codfw.wmnet with OS bookworm
  • 09:45 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 09:43 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host snapshot1013.eqiad.wmnet with OS bullseye
  • 09:42 ladsgroup@deploy1002: ladsgroup: Backport for Stop writing to the old pagelinks columns in s8 (T352010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:41 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc1039.eqiad.wmnet with OS bookworm
  • 09:40 ladsgroup@deploy1002: Started scap: Backport for Stop writing to the old pagelinks columns in s8 (T352010)
  • 09:31 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc-gp2001.codfw.wmnet with reason: host reimage
  • 09:29 jiji@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc-gp2001.codfw.wmnet with reason: host reimage
  • 09:25 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mc1039.eqiad.wmnet with reason: host reimage
  • 09:22 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mc1039.eqiad.wmnet with reason: host reimage
  • 09:10 jiji@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2001.codfw.wmnet with OS bookworm
  • 09:10 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc1039.eqiad.wmnet with OS bookworm
  • 09:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:08 jiji@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts ['mc1039.eqiad.wmnet']
  • 08:49 jiji@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp2002.codfw.wmnet with OS bookworm
  • 08:45 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mc-gp1003.eqiad.wmnet with OS bookworm
  • 08:15 hashar@deploy1002: Finished deploy [gerrit/gerrit@c93e47d]: Revert Gerrit back to 3.8.6 - T354887 (duration: 00m 05s)
  • 08:15 hashar@deploy1002: Started deploy [gerrit/gerrit@c93e47d]: Revert Gerrit back to 3.8.6 - T354887
  • 08:10 jiji@cumin1002: START - Cookbook sre.hosts.reimage for host mc-gp1003.eqiad.wmnet with OS bookworm
  • 08:09 jiji@cumin2002: START - Cookbook sre.hosts.reimage for host mc-gp2002.codfw.wmnet with OS bookworm
  • 08:08 hashar@deploy1002: Finished deploy [gerrit/gerrit@7838134]: Gerrit to v3.9.5 on gerrit1003 - T354887 (duration: 00m 05s)
  • 08:08 hashar@deploy1002: Started deploy [gerrit/gerrit@7838134]: Gerrit to v3.9.5 on gerrit1003 - T354887
  • 08:08 hashar@deploy1002: Finished deploy [gerrit/gerrit@7838134]: Gerrit to v3.9.5 on gerrit2002 - T354887 (duration: 00m 08s)
  • 08:08 hashar@deploy1002: Started deploy [gerrit/gerrit@7838134]: Gerrit to v3.9.5 on gerrit2002 - T354887
  • 08:04 jiji@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['mc1039.eqiad.wmnet']
  • 07:32 kartik@deploy1002: Finished scap: Backport for testwiki: Fix language for nan in Section Translation (duration: 28m 37s)
  • 07:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T364299)', diff saved to https://phabricator.wikimedia.org/P63920 and previous config saved to /var/cache/conftool/dbconfig/20240603-072513-marostegui.json
  • 07:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 07:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T364299)', diff saved to https://phabricator.wikimedia.org/P63919 and previous config saved to /var/cache/conftool/dbconfig/20240603-072450-marostegui.json
  • 07:22 kartik@deploy1002: kartik: Continuing with sync
  • 07:18 kartik@deploy1002: kartik: Backport for testwiki: Fix language for nan in Section Translation synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P63918 and previous config saved to /var/cache/conftool/dbconfig/20240603-070942-marostegui.json
  • 07:04 kartik@deploy1002: Started scap: Backport for testwiki: Fix language for nan in Section Translation
  • 06:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P63917 and previous config saved to /var/cache/conftool/dbconfig/20240603-065434-marostegui.json
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T364299)', diff saved to https://phabricator.wikimedia.org/P63916 and previous config saved to /var/cache/conftool/dbconfig/20240603-063925-marostegui.json
  • 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T364299)', diff saved to https://phabricator.wikimedia.org/P63915 and previous config saved to /var/cache/conftool/dbconfig/20240603-063814-marostegui.json
  • 06:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 06:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 06:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 06:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 06:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T364299)', diff saved to https://phabricator.wikimedia.org/P63914 and previous config saved to /var/cache/conftool/dbconfig/20240603-063735-marostegui.json
  • 06:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P63913 and previous config saved to /var/cache/conftool/dbconfig/20240603-062227-marostegui.json
  • 06:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 100%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63912 and previous config saved to /var/cache/conftool/dbconfig/20240603-061956-root.json
  • 06:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P63911 and previous config saved to /var/cache/conftool/dbconfig/20240603-060719-marostegui.json
  • 06:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 75%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63910 and previous config saved to /var/cache/conftool/dbconfig/20240603-060450-root.json
  • 05:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T364299)', diff saved to https://phabricator.wikimedia.org/P63909 and previous config saved to /var/cache/conftool/dbconfig/20240603-055210-marostegui.json
  • 05:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 50%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63908 and previous config saved to /var/cache/conftool/dbconfig/20240603-054944-root.json
  • 05:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 25%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63907 and previous config saved to /var/cache/conftool/dbconfig/20240603-053438-root.json
  • 05:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 10%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63906 and previous config saved to /var/cache/conftool/dbconfig/20240603-051932-root.json
  • 05:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 5%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63905 and previous config saved to /var/cache/conftool/dbconfig/20240603-050424-root.json
  • 04:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 1%: Repooling T366429', diff saved to https://phabricator.wikimedia.org/P63904 and previous config saved to /var/cache/conftool/dbconfig/20240603-044918-root.json
  • 04:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:18 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T364299)', diff saved to https://phabricator.wikimedia.org/P63903 and previous config saved to /var/cache/conftool/dbconfig/20240603-011839-marostegui.json
  • 01:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 01:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 01:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T364299)', diff saved to https://phabricator.wikimedia.org/P63902 and previous config saved to /var/cache/conftool/dbconfig/20240603-011813-marostegui.json
  • 01:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:09 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 01:09 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 01:09 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P63901 and previous config saved to /var/cache/conftool/dbconfig/20240603-010925-ladsgroup.json
  • 01:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P63900 and previous config saved to /var/cache/conftool/dbconfig/20240603-010305-marostegui.json
  • 01:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P63899 and previous config saved to /var/cache/conftool/dbconfig/20240603-005415-ladsgroup.json
  • 00:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P63898 and previous config saved to /var/cache/conftool/dbconfig/20240603-004757-marostegui.json
  • 00:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P63897 and previous config saved to /var/cache/conftool/dbconfig/20240603-003907-ladsgroup.json
  • 00:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T364299)', diff saved to https://phabricator.wikimedia.org/P63896 and previous config saved to /var/cache/conftool/dbconfig/20240603-003247-marostegui.json
  • 00:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P63895 and previous config saved to /var/cache/conftool/dbconfig/20240603-002359-ladsgroup.json
  • 00:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply

2024-06-02

  • 23:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1190 (T364069)', diff saved to https://phabricator.wikimedia.org/P63894 and previous config saved to /var/cache/conftool/dbconfig/20240602-232847-marostegui.json
  • 23:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 23:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1190.eqiad.wmnet with reason: Maintenance
  • 23:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:53 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3 days, 0:00:00 on db1213.eqiad.wmnet with reason: replication issues
  • 20:53 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 3 days, 0:00:00 on db1213.eqiad.wmnet with reason: replication issues
  • 20:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:47 taavi@cumin1002: dbctl commit (dc=all): 'depool db1213', diff saved to https://phabricator.wikimedia.org/P63893 and previous config saved to /var/cache/conftool/dbconfig/20240602-204719-taavi.json
  • 20:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T364299)', diff saved to https://phabricator.wikimedia.org/P63892 and previous config saved to /var/cache/conftool/dbconfig/20240602-200046-marostegui.json
  • 20:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 20:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 20:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T364299)', diff saved to https://phabricator.wikimedia.org/P63891 and previous config saved to /var/cache/conftool/dbconfig/20240602-200021-marostegui.json
  • 20:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P63890 and previous config saved to /var/cache/conftool/dbconfig/20240602-194514-marostegui.json
  • 19:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P63889 and previous config saved to /var/cache/conftool/dbconfig/20240602-193006-marostegui.json
  • 19:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T364299)', diff saved to https://phabricator.wikimedia.org/P63888 and previous config saved to /var/cache/conftool/dbconfig/20240602-191458-marostegui.json
  • 19:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T352010)', diff saved to https://phabricator.wikimedia.org/P63887 and previous config saved to /var/cache/conftool/dbconfig/20240602-185215-ladsgroup.json
  • 18:52 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 18:51 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 18:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P63886 and previous config saved to /var/cache/conftool/dbconfig/20240602-185151-ladsgroup.json
  • 18:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P63885 and previous config saved to /var/cache/conftool/dbconfig/20240602-183643-ladsgroup.json
  • 18:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P63884 and previous config saved to /var/cache/conftool/dbconfig/20240602-182135-ladsgroup.json
  • 18:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P63883 and previous config saved to /var/cache/conftool/dbconfig/20240602-180627-ladsgroup.json
  • 18:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 18:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 18:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:22 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T364299)', diff saved to https://phabricator.wikimedia.org/P63882 and previous config saved to /var/cache/conftool/dbconfig/20240602-144924-marostegui.json
  • 14:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 14:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 14:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T364299)', diff saved to https://phabricator.wikimedia.org/P63881 and previous config saved to /var/cache/conftool/dbconfig/20240602-144900-marostegui.json
  • 14:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P63880 and previous config saved to /var/cache/conftool/dbconfig/20240602-143352-marostegui.json
  • 14:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P63879 and previous config saved to /var/cache/conftool/dbconfig/20240602-141843-marostegui.json
  • 14:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P63878 and previous config saved to /var/cache/conftool/dbconfig/20240602-141139-root.json
  • 14:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T364299)', diff saved to https://phabricator.wikimedia.org/P63877 and previous config saved to /var/cache/conftool/dbconfig/20240602-140334-marostegui.json
  • 13:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P63876 and previous config saved to /var/cache/conftool/dbconfig/20240602-135632-root.json
  • 13:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P63875 and previous config saved to /var/cache/conftool/dbconfig/20240602-134126-root.json
  • 13:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:26 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P63874 and previous config saved to /var/cache/conftool/dbconfig/20240602-132620-root.json
  • 13:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:11 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P63873 and previous config saved to /var/cache/conftool/dbconfig/20240602-131114-root.json
  • 13:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:56 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P63872 and previous config saved to /var/cache/conftool/dbconfig/20240602-125608-root.json
  • 12:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1150.eqiad.wmnet with reason: Maintenance
  • 12:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:41 marostegui@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P63871 and previous config saved to /var/cache/conftool/dbconfig/20240602-124102-root.json
  • 12:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T352010)', diff saved to https://phabricator.wikimedia.org/P63870 and previous config saved to /var/cache/conftool/dbconfig/20240602-120033-ladsgroup.json
  • 12:00 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 12:00 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P63869 and previous config saved to /var/cache/conftool/dbconfig/20240602-120010-ladsgroup.json
  • 11:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:45 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P63868 and previous config saved to /var/cache/conftool/dbconfig/20240602-114503-ladsgroup.json
  • 11:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:29 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P63867 and previous config saved to /var/cache/conftool/dbconfig/20240602-112955-ladsgroup.json
  • 11:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T364069)', diff saved to https://phabricator.wikimedia.org/P63866 and previous config saved to /var/cache/conftool/dbconfig/20240602-112512-marostegui.json
  • 11:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:14 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P63865 and previous config saved to /var/cache/conftool/dbconfig/20240602-111447-ladsgroup.json
  • 11:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P63864 and previous config saved to /var/cache/conftool/dbconfig/20240602-111004-marostegui.json
  • 10:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219', diff saved to https://phabricator.wikimedia.org/P63863 and previous config saved to /var/cache/conftool/dbconfig/20240602-105456-marostegui.json
  • 10:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2219 (T364069)', diff saved to https://phabricator.wikimedia.org/P63862 and previous config saved to /var/cache/conftool/dbconfig/20240602-103948-marostegui.json
  • 10:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:22 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T364299)', diff saved to https://phabricator.wikimedia.org/P63861 and previous config saved to /var/cache/conftool/dbconfig/20240602-091021-marostegui.json
  • 09:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 09:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 09:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 09:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T364299)', diff saved to https://phabricator.wikimedia.org/P63860 and previous config saved to /var/cache/conftool/dbconfig/20240602-090941-marostegui.json
  • 09:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P63859 and previous config saved to /var/cache/conftool/dbconfig/20240602-085433-marostegui.json
  • 08:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P63858 and previous config saved to /var/cache/conftool/dbconfig/20240602-083925-marostegui.json
  • 08:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db1206.eqiad.wmnet with reason: Long schema change
  • 07:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on db1206.eqiad.wmnet with reason: Long schema change
  • 07:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1206', diff saved to https://phabricator.wikimedia.org/P63856 and previous config saved to /var/cache/conftool/dbconfig/20240602-072956-root.json
  • 07:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T364299)', diff saved to https://phabricator.wikimedia.org/P63855 and previous config saved to /var/cache/conftool/dbconfig/20240602-033618-marostegui.json
  • 03:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 03:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 03:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T364299)', diff saved to https://phabricator.wikimedia.org/P63854 and previous config saved to /var/cache/conftool/dbconfig/20240602-033555-marostegui.json
  • 03:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P63853 and previous config saved to /var/cache/conftool/dbconfig/20240602-032047-marostegui.json
  • 03:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P63852 and previous config saved to /var/cache/conftool/dbconfig/20240602-030539-marostegui.json
  • 03:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T352010)', diff saved to https://phabricator.wikimedia.org/P63851 and previous config saved to /var/cache/conftool/dbconfig/20240602-025039-ladsgroup.json
  • 02:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T364299)', diff saved to https://phabricator.wikimedia.org/P63850 and previous config saved to /var/cache/conftool/dbconfig/20240602-025031-marostegui.json
  • 02:50 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 02:50 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 02:50 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P63849 and previous config saved to /var/cache/conftool/dbconfig/20240602-025015-ladsgroup.json
  • 02:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:35 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P63848 and previous config saved to /var/cache/conftool/dbconfig/20240602-023507-ladsgroup.json
  • 02:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2219 (T364069)', diff saved to https://phabricator.wikimedia.org/P63847 and previous config saved to /var/cache/conftool/dbconfig/20240602-022710-marostegui.json
  • 02:27 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 02:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2219.codfw.wmnet with reason: Maintenance
  • 02:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T364069)', diff saved to https://phabricator.wikimedia.org/P63846 and previous config saved to /var/cache/conftool/dbconfig/20240602-022646-marostegui.json
  • 02:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:20 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162', diff saved to https://phabricator.wikimedia.org/P63845 and previous config saved to /var/cache/conftool/dbconfig/20240602-021959-ladsgroup.json
  • 02:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P63844 and previous config saved to /var/cache/conftool/dbconfig/20240602-021137-marostegui.json
  • 02:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P63843 and previous config saved to /var/cache/conftool/dbconfig/20240602-020451-ladsgroup.json
  • 02:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210', diff saved to https://phabricator.wikimedia.org/P63842 and previous config saved to /var/cache/conftool/dbconfig/20240602-015629-marostegui.json
  • 01:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2210 (T364069)', diff saved to https://phabricator.wikimedia.org/P63841 and previous config saved to /var/cache/conftool/dbconfig/20240602-014121-marostegui.json
  • 01:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:35 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:35 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply

2024-06-01

  • 23:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:37 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:37 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 23:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 23:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:32 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:32 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 22:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 22:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T364299)', diff saved to https://phabricator.wikimedia.org/P63839 and previous config saved to /var/cache/conftool/dbconfig/20240601-215534-marostegui.json
  • 21:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 21:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 21:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:43 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:43 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 10:00:00 on db2102.codfw.wmnet with reason: Long schema change
  • 21:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 10:00:00 on db2102.codfw.wmnet with reason: Long schema change
  • 21:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:55 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 20:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 20:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1162 (T352010)', diff saved to https://phabricator.wikimedia.org/P63838 and previous config saved to /var/cache/conftool/dbconfig/20240601-201053-ladsgroup.json
  • 20:10 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 20:10 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Maintenance
  • 20:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P63837 and previous config saved to /var/cache/conftool/dbconfig/20240601-201029-ladsgroup.json
  • 19:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P63836 and previous config saved to /var/cache/conftool/dbconfig/20240601-195521-ladsgroup.json
  • 19:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:40 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:40 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P63835 and previous config saved to /var/cache/conftool/dbconfig/20240601-194013-ladsgroup.json
  • 19:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 19:25 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P63834 and previous config saved to /var/cache/conftool/dbconfig/20240601-192505-ladsgroup.json
  • 19:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 19:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 17:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db2102.codfw.wmnet with reason: Maintenance
  • 17:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 17:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 17:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 17:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 17:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 17:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 17:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T364299)', diff saved to https://phabricator.wikimedia.org/P63833 and previous config saved to /var/cache/conftool/dbconfig/20240601-174133-marostegui.json
  • 17:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P63832 and previous config saved to /var/cache/conftool/dbconfig/20240601-172625-marostegui.json
  • 17:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2210 (T364069)', diff saved to https://phabricator.wikimedia.org/P63831 and previous config saved to /var/cache/conftool/dbconfig/20240601-172455-marostegui.json
  • 17:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 17:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2210.codfw.wmnet with reason: Maintenance
  • 17:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T364069)', diff saved to https://phabricator.wikimedia.org/P63830 and previous config saved to /var/cache/conftool/dbconfig/20240601-172432-marostegui.json
  • 17:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 17:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P63829 and previous config saved to /var/cache/conftool/dbconfig/20240601-171116-marostegui.json
  • 17:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P63828 and previous config saved to /var/cache/conftool/dbconfig/20240601-170924-marostegui.json
  • 17:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 17:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T364299)', diff saved to https://phabricator.wikimedia.org/P63827 and previous config saved to /var/cache/conftool/dbconfig/20240601-165609-marostegui.json
  • 16:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206', diff saved to https://phabricator.wikimedia.org/P63826 and previous config saved to /var/cache/conftool/dbconfig/20240601-165416-marostegui.json
  • 16:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2206 (T364069)', diff saved to https://phabricator.wikimedia.org/P63825 and previous config saved to /var/cache/conftool/dbconfig/20240601-163907-marostegui.json
  • 16:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:14 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:14 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 16:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 16:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:53 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:53 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:26 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:26 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:22 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:13 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:13 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:49 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:49 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:22 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:39 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host kafka-main1010.eqiad.wmnet with OS bullseye
  • 13:39 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - akosiaris@cumin1002"
  • 13:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T364299)', diff saved to https://phabricator.wikimedia.org/P63824 and previous config saved to /var/cache/conftool/dbconfig/20240601-125216-marostegui.json
  • 12:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 12:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 12:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T364299)', diff saved to https://phabricator.wikimedia.org/P63823 and previous config saved to /var/cache/conftool/dbconfig/20240601-125152-marostegui.json
  • 12:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P63822 and previous config saved to /var/cache/conftool/dbconfig/20240601-123644-marostegui.json
  • 12:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:25 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:25 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P63821 and previous config saved to /var/cache/conftool/dbconfig/20240601-122136-marostegui.json
  • 12:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:18 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:12 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:12 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:10 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:10 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:08 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T364299)', diff saved to https://phabricator.wikimedia.org/P63820 and previous config saved to /var/cache/conftool/dbconfig/20240601-120628-marostegui.json
  • 12:06 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:06 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:02 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:02 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:38 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:38 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:34 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:34 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:08 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:08 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:08 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 11:00 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 11:00 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:16 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:15 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:15 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:09 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:09 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T352010)', diff saved to https://phabricator.wikimedia.org/P63819 and previous config saved to /var/cache/conftool/dbconfig/20240601-095545-ladsgroup.json
  • 09:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 09:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 09:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 09:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:16 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:03 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:01 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:01 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:55 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 08:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 08:51 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:48 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:48 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:46 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:46 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:36 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - akosiaris@cumin1002"
  • 07:36 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:36 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:20 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on kafka-main1010.eqiad.wmnet with reason: host reimage
  • 07:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:18 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 07:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 07:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T364299)', diff saved to https://phabricator.wikimedia.org/P63818 and previous config saved to /var/cache/conftool/dbconfig/20240601-071723-marostegui.json
  • 07:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 07:17 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on kafka-main1010.eqiad.wmnet with reason: host reimage
  • 07:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 07:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T364299)', diff saved to https://phabricator.wikimedia.org/P63817 and previous config saved to /var/cache/conftool/dbconfig/20240601-071700-marostegui.json
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2206 (T364069)', diff saved to https://phabricator.wikimedia.org/P63816 and previous config saved to /var/cache/conftool/dbconfig/20240601-070211-marostegui.json
  • 07:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P63815 and previous config saved to /var/cache/conftool/dbconfig/20240601-070151-marostegui.json
  • 07:01 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2206.codfw.wmnet with reason: Maintenance
  • 06:59 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host kafka-main1010.eqiad.wmnet with OS bullseye
  • 06:58 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:58 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 06:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P63814 and previous config saved to /var/cache/conftool/dbconfig/20240601-064643-marostegui.json
  • 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T364299)', diff saved to https://phabricator.wikimedia.org/P63813 and previous config saved to /var/cache/conftool/dbconfig/20240601-063135-marostegui.json
  • 06:07 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 06:07 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:51 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:44 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:44 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:42 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:42 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:40 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:33 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:33 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:31 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:31 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:17 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:17 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 05:14 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 05:14 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:39 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:39 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:18 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:18 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:16 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:16 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 04:03 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 04:03 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:59 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:59 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:57 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:57 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:55 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:55 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:53 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:53 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:48 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:48 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:46 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:46 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:44 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:44 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:42 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:42 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:40 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:39 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:36 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:35 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:34 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:33 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:31 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:31 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:29 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:29 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:29 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:29 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:27 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:27 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:27 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:27 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:25 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:25 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:24 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:24 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:23 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:23 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:19 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:19 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:17 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:17 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:12 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:12 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:10 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:10 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:08 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:08 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:06 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:06 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:04 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:04 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:04 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 03:02 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 03:02 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:59 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:59 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:57 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:57 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:52 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:52 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:50 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:50 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:48 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:48 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:43 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:43 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:37 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:37 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:35 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:35 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:33 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:33 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:31 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:31 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:29 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:29 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:27 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:27 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:25 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:25 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:23 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:23 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:21 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:21 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:18 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:18 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:16 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:16 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:14 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:14 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T364299)', diff saved to https://phabricator.wikimedia.org/P63812 and previous config saved to /var/cache/conftool/dbconfig/20240601-021256-marostegui.json
  • 02:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 02:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 6:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 02:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T364299)', diff saved to https://phabricator.wikimedia.org/P63811 and previous config saved to /var/cache/conftool/dbconfig/20240601-021233-marostegui.json
  • 02:11 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:11 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:03 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:03 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 02:01 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 02:01 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:59 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:59 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P63810 and previous config saved to /var/cache/conftool/dbconfig/20240601-015725-marostegui.json
  • 01:57 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:57 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:55 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:55 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:53 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:52 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:51 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:51 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:49 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:49 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:47 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:47 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:45 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:45 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:43 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:43 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228', diff saved to https://phabricator.wikimedia.org/P63809 and previous config saved to /var/cache/conftool/dbconfig/20240601-014216-marostegui.json
  • 01:41 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:41 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:40 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:40 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:36 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:36 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:32 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:32 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:30 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:30 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:30 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:30 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:28 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:28 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:28 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:28 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1228 (T364299)', diff saved to https://phabricator.wikimedia.org/P63808 and previous config saved to /var/cache/conftool/dbconfig/20240601-012708-marostegui.json
  • 01:26 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:26 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:24 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:24 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:23 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:23 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:22 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:22 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:21 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:21 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:20 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:20 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:19 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:19 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:18 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:18 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:16 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:16 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:14 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:14 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:12 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:12 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:10 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:10 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P63807 and previous config saved to /var/cache/conftool/dbconfig/20240601-010959-ladsgroup.json
  • 01:08 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:08 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:06 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:06 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:04 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:04 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:02 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:02 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 01:00 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 01:00 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:58 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:58 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:56 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:56 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:56 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:55 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P63806 and previous config saved to /var/cache/conftool/dbconfig/20240601-005451-ladsgroup.json
  • 00:54 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:54 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:54 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:53 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:52 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:52 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:52 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:51 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:50 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:50 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:49 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:49 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:47 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:47 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:47 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:47 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:45 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:45 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:45 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:45 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P63805 and previous config saved to /var/cache/conftool/dbconfig/20240601-003943-ladsgroup.json
  • 00:38 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:38 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:30 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:30 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:27 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:27 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:25 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:25 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P63804 and previous config saved to /var/cache/conftool/dbconfig/20240601-002435-ladsgroup.json
  • 00:22 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:22 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:21 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:21 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:20 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:20 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:19 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:18 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:17 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:16 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:13 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:13 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:11 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:11 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:09 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:09 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:06 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:06 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:05 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:05 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:04 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:04 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:04 logmsgbot: @deploy1002 helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:03 logmsgbot: @deploy1002 helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 00:01 logmsgbot: @deploy1002 helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 00:01 logmsgbot: @deploy1002 helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
