Jump to content

Server Admin Log/Archive 82

From Wikitech

2024-06-30

  • 23:25 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 23:25 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:17 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:15 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:14 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:14 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 23:14 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:13 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 23:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:11 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 23:11 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:11 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 23:11 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:09 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 23:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:05 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:03 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:56 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:53 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:53 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:51 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:27 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:08 _joe_: delete failing pod in eqiad for mw-api-ext, caused the backend errors page

2024-06-29

  • 01:24 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1040.eqiad.wmnet with OS bullseye
  • 01:12 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 01:10 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 00:04 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
  • 00:04 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 00:04 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
  • 00:03 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
  • 00:00 jclark@cumin1002: START - Cookbook sre.dns.netbox

2024-06-28

  • 23:57 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:56 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 23:55 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:55 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
  • 23:54 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
  • 23:50 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 23:42 tzatziki: removing 1 image for legal compliance
  • 23:33 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephosd1039.eqiad.wmnet
  • 23:32 pt1979@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudcephosd1039.eqiad.wmnet
  • 23:32 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:32 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip address for cloudcephosd1039 - pt1979@cumin2002"
  • 23:31 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update ip address for cloudcephosd1039 - pt1979@cumin2002"
  • 23:29 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephosd1039.eqiad.wmnet
  • 23:28 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 23:23 pt1979@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudcephosd1039.eqiad.wmnet
  • 23:22 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephosd1039.eqiad.wmnet
  • 23:21 tzatziki: removing 1 image for legal compliance
  • 23:18 pt1979@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudcephosd1039.eqiad.wmnet
  • 23:18 pt1979@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host cloudcephosd1039.eqiad.wmnet
  • 23:16 tzatziki: removing 1 image for legal compliance
  • 22:50 pt1979@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudcephosd1039.eqiad.wmnet
  • 22:14 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1041']
  • 22:13 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1041']
  • 22:10 jclark@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudcephosd1040']
  • 22:09 jclark@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudcephosd1040']
  • 21:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
  • 21:42 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:42 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1040 - jclark@cumin1002"
  • 21:41 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1041.eqiad.wmnet with OS bullseye
  • 21:41 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bullseye
  • 21:41 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1040 - jclark@cumin1002"
  • 21:38 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:38 jclark@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:36 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:35 jclark@cumin1002: END (FAIL) - Cookbook sre.dns.netbox (exit_code=99)
  • 21:34 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1041.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:34 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:33 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host cloudcephosd1039.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:30 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudcephosd1040.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:22 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1041.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:21 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:21 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1039 - jclark@cumin1002"
  • 21:20 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1039 - jclark@cumin1002"
  • 21:18 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:17 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1040.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:17 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 21:17 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1039 - jclark@cumin1002"
  • 21:16 jclark@cumin1002: START - Cookbook sre.hosts.provision for host cloudcephosd1039.mgmt.eqiad.wmnet with reboot policy FORCED
  • 21:16 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for cloudcephosd1039 - jclark@cumin1002"
  • 21:11 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 21:05 sukhe: sudo cumin -b11 "A:cp-text" 'run-puppet-agent'
  • 20:29 sukhe: sudo cumin "A:cp-text" 'disable-puppet "CR 1050672"'
  • 20:20 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1028.eqiad.wmnet with OS bookworm
  • 20:20 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:19 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:18 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host dbproxy1029.eqiad.wmnet with OS bookworm
  • 20:18 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:15 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 20:04 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1028.eqiad.wmnet with reason: host reimage
  • 20:01 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on lists1001.wikimedia.org with reason: decomed
  • 20:01 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on lists1001.wikimedia.org with reason: decomed
  • 20:01 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on dbproxy1029.eqiad.wmnet with reason: host reimage
  • 19:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1028.eqiad.wmnet with reason: host reimage
  • 19:57 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on dbproxy1029.eqiad.wmnet with reason: host reimage
  • 19:46 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dbproxy1029.eqiad.wmnet with OS bookworm
  • 19:46 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host dbproxy1028.eqiad.wmnet with OS bookworm
  • 19:37 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:37 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
  • 19:36 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1029
  • 19:36 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
  • 19:35 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1029
  • 19:34 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy1028
  • 19:33 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy1028
  • 19:31 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 19:31 jclark@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 19:31 sukhe: sudo cumin -b10 "A:cp-text" "run-puppet-agent --enable 'dont enable'": T368645
  • 19:30 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 18:22 sukhe: disable puppet on A:cp-text
  • 18:16 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic: apply
  • 18:16 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic: apply
  • 18:05 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 18:05 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 16:43 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 16:36 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:36 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:33 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:23 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:19 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 16:03 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on mw2300.codfw.wmnet with reason: Reimaging issues
  • 16:03 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on mw2300.codfw.wmnet with reason: Reimaging issues
  • 15:45 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 15:43 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:43 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for wikikube-worker2026 - cmooney@cumin1002"
  • 15:35 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for wikikube-worker2026 - cmooney@cumin1002"
  • 15:32 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 15:25 hnowlan@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2025.codfw.wmnet|wikikube-worker2027.codfw.wmnet|wikikube-worker2028.codfw.wmnet|wikikube-worker2029.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 15:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:21 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 15:21 andrewbogott: upgraded wikitech-static to 1_42 and php 8.3
  • 15:14 hnowlan: homer 'cr*codfw*' commit 'T351074'
  • 15:14 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2027.codfw.wmnet with OS bullseye
  • 15:12 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:11 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 15:11 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:10 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2028.codfw.wmnet with OS bullseye
  • 15:10 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1027.eqiad.wmnet|wikikube-worker1028.eqiad.wmnet|wikikube-worker1029.eqiad.wmnet|wikikube-worker1030.eqiad.wmnet|wikikube-worker1031.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 15:10 claime: Pooling and uncordoning wikikube-worker1027.eqiad.wmnet,wikikube-worker1028.eqiad.wmnet,wikikube-worker1029.eqiad.wmnet,wikikube-worker1030.eqiad.wmnet,wikikube-worker1031.eqiad.wmnet - T351074
  • 15:09 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2029.codfw.wmnet with OS bullseye
  • 15:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2025.codfw.wmnet with OS bullseye
  • 15:06 jhathaway: mx-in1001 postfix mx testing complete
  • 15:04 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 15:00 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:00 claime: homer 'cr*eqiad*' commit 'T351074'
  • 14:56 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2027.codfw.wmnet with reason: host reimage
  • 14:54 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 14:52 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2028.codfw.wmnet with reason: host reimage
  • 14:50 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2029.codfw.wmnet with reason: host reimage
  • 14:47 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2025.codfw.wmnet with reason: host reimage
  • 14:46 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2029.codfw.wmnet with reason: host reimage
  • 14:46 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2028.codfw.wmnet with reason: host reimage
  • 14:45 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2027.codfw.wmnet with reason: host reimage
  • 14:45 hnowlan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2025.codfw.wmnet with reason: host reimage
  • 14:30 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2029.codfw.wmnet with OS bullseye
  • 14:30 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2028.codfw.wmnet with OS bullseye
  • 14:29 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2027.codfw.wmnet with OS bullseye
  • 14:28 hnowlan@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2025.codfw.wmnet with OS bullseye
  • 14:27 sukhe: sudo cumin "O:durum" "run-puppet-agent"
  • 14:27 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2330 to wikikube-worker2029
  • 14:27 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2029
  • 14:26 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2029
  • 14:26 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:26 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2330 to wikikube-worker2029 - hnowlan@cumin1002"
  • 14:25 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 14:25 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2330 to wikikube-worker2029 - hnowlan@cumin1002"
  • 14:24 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:23 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 14:23 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:22 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 14:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host krb1002.eqiad.wmnet with OS bookworm
  • 14:22 jclark@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:22 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2330 to wikikube-worker2029
  • 14:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2308 to wikikube-worker2028
  • 14:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2028
  • 14:22 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2028
  • 14:22 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:21 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2308 to wikikube-worker2028 - hnowlan@cumin1002"
  • 14:15 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 14:12 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2308 to wikikube-worker2028 - hnowlan@cumin1002"
  • 14:10 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host deploy1003.eqiad.wmnet with OS bullseye
  • 14:08 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 14:07 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2308 to wikikube-worker2028
  • 14:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2306 to wikikube-worker2027
  • 14:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2027
  • 14:07 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:07 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2027
  • 14:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:06 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2306 to wikikube-worker2027 - hnowlan@cumin1002"
  • 14:06 jhathaway@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mx-in1001.wikimedia.org with reason: email testing
  • 14:05 jhathaway@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on mx-in1001.wikimedia.org with reason: email testing
  • 14:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 14:04 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 14:03 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2306 to wikikube-worker2027 - hnowlan@cumin1002"
  • 14:01 jhathaway: ingressing email on mx-in1001, initial test 1hr
  • 14:00 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 14:00 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2306 to wikikube-worker2027
  • 13:59 hnowlan@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2298 to wikikube-worker2025
  • 13:59 hnowlan@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2025
  • 13:56 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
  • 13:54 hnowlan@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2025
  • 13:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2298 to wikikube-worker2025 - hnowlan@cumin1002"
  • 13:53 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
  • 13:49 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on krb1002.eqiad.wmnet with reason: host reimage
  • 13:49 hnowlan@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2298 to wikikube-worker2025 - hnowlan@cumin1002"
  • 13:46 hnowlan@cumin1002: END (FAIL) - Cookbook sre.hosts.rename (exit_code=93) from mw2300 to wikikube-worker2026
  • 13:46 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2300 to wikikube-worker2026
  • 13:46 hnowlan@cumin1002: START - Cookbook sre.dns.netbox
  • 13:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1027.eqiad.wmnet with reason: host reimage
  • 13:45 hnowlan@cumin1002: START - Cookbook sre.hosts.rename from mw2298 to wikikube-worker2025
  • 13:45 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on krb1002.eqiad.wmnet with reason: host reimage
  • 13:43 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1027.eqiad.wmnet with reason: host reimage
  • 13:42 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 13:42 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 13:42 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 13:42 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bullseye
  • 13:42 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 13:42 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 13:41 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 13:41 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host deploy1003.eqiad.wmnet with OS bookworm
  • 13:41 akosiaris@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - akosiaris@cumin1002"
  • 13:38 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 13:38 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 13:38 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 13:37 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 13:32 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host krb1002.eqiad.wmnet with OS bookworm
  • 13:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 13:29 akosiaris@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - akosiaris@cumin1002"
  • 13:28 hnowlan: running `decommission` on 5 codfw api appservers
  • 13:27 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1028.mgmt.eqiad.wmnet on all recursors
  • 13:27 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1028.mgmt.eqiad.wmnet on all recursors
  • 13:27 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1027.mgmt.eqiad.wmnet on all recursors
  • 13:27 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1027.mgmt.eqiad.wmnet on all recursors
  • 13:26 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 13:25 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:25 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for wikikube-worker102[7-8] - cmooney@cumin1002"
  • 13:24 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: fix entries for wikikube-worker102[7-8] - cmooney@cumin1002"
  • 13:21 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 13:18 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1028.mgmt.eqiad.wmnet on all recursors
  • 13:18 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1028.mgmt.eqiad.wmnet on all recursors
  • 13:18 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1027.mgmt.eqiad.wmnet on all recursors
  • 13:18 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1027.mgmt.eqiad.wmnet on all recursors
  • 13:15 akosiaris@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
  • 13:12 akosiaris@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on deploy1003.eqiad.wmnet with reason: host reimage
  • 13:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1029.eqiad.wmnet with OS bullseye
  • 13:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1028.eqiad.wmnet with reason: mgmt ip issue
  • 13:05 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1028.eqiad.wmnet with reason: mgmt ip issue
  • 13:01 akosiaris@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bookworm
  • 12:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T367856)', diff saved to https://phabricator.wikimedia.org/P65547 and previous config saved to /var/cache/conftool/dbconfig/20240628-125926-marostegui.json
  • 12:55 hashar@deploy1002: Finished deploy [gerrit/gerrit@0db053e]: Upgrade Gerrit 3.10.0-32-gf77960412e to 3.10.0-71-gf6e9431fff - T367029 T341291 (duration: 00m 09s)
  • 12:55 hashar@deploy1002: Started deploy [gerrit/gerrit@0db053e]: Upgrade Gerrit 3.10.0-32-gf77960412e to 3.10.0-71-gf6e9431fff - T367029 T341291
  • 12:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1029.eqiad.wmnet with reason: host reimage
  • 12:50 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1029.eqiad.wmnet with reason: host reimage
  • 12:48 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 12:48 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Maintenance
  • 12:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 12:44 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 12:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65546 and previous config saved to /var/cache/conftool/dbconfig/20240628-124419-marostegui.json
  • 12:44 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-conf1004
  • 12:44 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-conf1004
  • 12:21 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-conf1004
  • 12:18 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:18 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for an-conf1005,6 - jclark@cumin1002"
  • 12:17 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for an-conf1005,6 - jclark@cumin1002"
  • 12:15 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 12:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T367856)', diff saved to https://phabricator.wikimedia.org/P65544 and previous config saved to /var/cache/conftool/dbconfig/20240628-121404-marostegui.json
  • 12:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1030.eqiad.wmnet with OS bullseye
  • 12:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1028.eqiad.wmnet with OS bullseye
  • 12:05 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 12:05 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 11:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1030.eqiad.wmnet with reason: host reimage
  • 11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1031.eqiad.wmnet with OS bullseye
  • 11:51 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1030.eqiad.wmnet with reason: host reimage
  • 11:50 Dreamy_Jazz: Finished run on `medium.dblist`
  • 11:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1028.eqiad.wmnet with reason: host reimage
  • 11:45 btullis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 11:45 btullis@deploy1002: helmfile [eqiad] START helmfile.d/services/eventgate-analytics-external: apply
  • 11:45 btullis@deploy1002: helmfile [codfw] DONE helmfile.d/services/eventgate-analytics-external: apply
  • 11:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1028.eqiad.wmnet with reason: host reimage
  • 11:44 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1029.eqiad.wmnet with OS bullseye
  • 11:44 btullis@deploy1002: helmfile [codfw] START helmfile.d/services/eventgate-analytics-external: apply
  • 11:44 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1029.eqiad.wmnet with OS bullseye
  • 11:38 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1030.eqiad.wmnet with OS bullseye
  • 11:38 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1030.eqiad.wmnet with OS bullseye
  • 11:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1031.eqiad.wmnet with reason: host reimage
  • 11:31 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1031.eqiad.wmnet with reason: host reimage
  • 11:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 11:30 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 11:29 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@9b733de] (releasing): (no justification provided) (duration: 00m 44s)
  • 11:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1029.eqiad.wmnet with OS bullseye
  • 11:29 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1029.eqiad.wmnet with OS bullseye
  • 11:29 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@9b733de] (releasing): (no justification provided)
  • 11:26 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1028.eqiad.wmnet with OS bullseye
  • 11:23 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1028.eqiad.wmnet with OS bullseye
  • 11:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1031.eqiad.wmnet with OS bullseye
  • 11:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1031.eqiad.wmnet on all recursors
  • 11:18 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1031.eqiad.wmnet on all recursors
  • 11:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1450 to wikikube-worker1031
  • 11:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1031
  • 11:16 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1031
  • 11:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1450 to wikikube-worker1031 - cgoubert@cumin1002"
  • 11:16 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 11:15 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1028.eqiad.wmnet with OS bullseye
  • 11:15 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1028.eqiad.wmnet with OS bullseye
  • 11:14 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1450 to wikikube-worker1031 - cgoubert@cumin1002"
  • 11:13 jnuche@deploy1002: Finished deploy [releng/jenkins-deploy@9b733de] (releasing): (no justification provided) (duration: 00m 25s)
  • 11:13 Dreamy_Jazz: Running `foreachwikiindblist medium.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200` for T366781. `medium.dblist` does not include `loginwiki` or `metawiki` (which are to be done later).
  • 11:12 jnuche@deploy1002: Started deploy [releng/jenkins-deploy@9b733de] (releasing): (no justification provided)
  • 11:11 Dreamy_Jazz: `foreachwikiindblist group1-wikipedia.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200` finished running
  • 11:11 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1030.eqiad.wmnet with OS bullseye
  • 11:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1030.eqiad.wmnet on all recursors
  • 11:11 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1030.eqiad.wmnet on all recursors
  • 11:09 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 11:09 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1450 to wikikube-worker1031
  • 11:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1418 to wikikube-worker1030
  • 11:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1030
  • 11:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 11:07 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 11:06 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 11:04 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1030
  • 11:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1418 to wikikube-worker1030 - cgoubert@cumin1002"
  • 11:02 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1418 to wikikube-worker1030 - cgoubert@cumin1002"
  • 11:01 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 11:00 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 11:00 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1418 to wikikube-worker1030
  • 10:59 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1029.eqiad.wmnet with OS bullseye
  • 10:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1029.eqiad.wmnet on all recursors
  • 10:59 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1029.eqiad.wmnet on all recursors
  • 10:58 Dreamy_Jazz: Running `foreachwikiindblist group1-wikipedia.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200`
  • 10:58 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 10:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1417 to wikikube-worker1029
  • 10:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1029
  • 10:56 Dreamy_Jazz: Stopped running script at `cawiki`
  • 10:56 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1029
  • 10:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1417 to wikikube-worker1029 - cgoubert@cumin1002"
  • 10:54 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1417 to wikikube-worker1029 - cgoubert@cumin1002"
  • 10:51 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:51 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1417 to wikikube-worker1029
  • 10:51 Dreamy_Jazz: Running `foreachwikiindblist group1.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200`
  • 10:51 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1028.eqiad.wmnet with OS bullseye
  • 10:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1028.eqiad.wmnet on all recursors
  • 10:51 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1028.eqiad.wmnet on all recursors
  • 10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1413 to wikikube-worker1028
  • 10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1028
  • 10:49 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1028
  • 10:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1413 to wikikube-worker1028 - cgoubert@cumin1002"
  • 10:48 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1413 to wikikube-worker1028 - cgoubert@cumin1002"
  • 10:46 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:46 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1413 to wikikube-worker1028
  • 10:45 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1027.eqiad.wmnet with OS bullseye
  • 10:45 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1027.eqiad.wmnet on all recursors
  • 10:45 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1027.eqiad.wmnet on all recursors
  • 10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1412 to wikikube-worker1027
  • 10:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1027
  • 10:44 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:44 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:44 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1027
  • 10:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1412 to wikikube-worker1027 - cgoubert@cumin1002"
  • 10:42 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1412 to wikikube-worker1027 - cgoubert@cumin1002"
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1412 to wikikube-worker1027
  • 10:35 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 10:30 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 10:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:22 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 10:22 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 10:18 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 10:17 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: apply
  • 10:16 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: apply
  • 10:12 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:57 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for ml-serve2007.codfw.wmnet
  • 09:57 klausman@cumin2002: START - Cookbook sre.hosts.remove-downtime for ml-serve2007.codfw.wmnet
  • 09:37 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 09:37 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 09:33 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 08:34 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 08:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T367856)', diff saved to https://phabricator.wikimedia.org/P65543 and previous config saved to /var/cache/conftool/dbconfig/20240628-082946-marostegui.json
  • 08:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 08:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 07:54 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1015.eqiad.wmnet,service=s4
  • 02:30 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 02:30 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 02:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 02:23 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 02:23 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 02:16 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 02:14 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 02:04 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 02:00 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 02:00 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 02:00 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 02:00 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 01:44 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 01:44 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 01:39 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 01:39 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 01:37 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 01:37 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 00:52 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 00:51 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 00:45 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5024.eqsin.wmnet
  • 00:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5024.eqsin.wmnet with OS bullseye
  • 00:36 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 00:36 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 00:33 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 00:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 00:33 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 00:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 00:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 00:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 00:08 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5024.eqsin.wmnet with reason: host reimage
  • 00:06 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 00:06 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 00:06 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5024.eqsin.wmnet with reason: host reimage

2024-06-27

  • 23:55 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 23:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:33 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5024.eqsin.wmnet with OS bullseye
  • 23:33 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5024.eqsin.wmnet with OS bullseye
  • 23:32 eileen: civicrm upgraded from 76c6fed8 to f9782670
  • 23:24 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 23:24 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:19 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 23:18 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 23:17 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 23:17 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 23:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367856)', diff saved to https://phabricator.wikimedia.org/P65542 and previous config saved to /var/cache/conftool/dbconfig/20240627-231703-marostegui.json
  • 23:05 Dreamy_Jazz: Running `foreachwikiindblist group0.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php` for T366781
  • 23:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65541 and previous config saved to /var/cache/conftool/dbconfig/20240627-230156-marostegui.json
  • 22:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65540 and previous config saved to /var/cache/conftool/dbconfig/20240627-224649-marostegui.json
  • 22:44 eileen: civicrm upgraded from 7747a290 to 76c6fed8
  • 22:43 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5024.eqsin.wmnet with OS bullseye
  • 22:43 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:42 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:41 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:41 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:39 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:37 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:37 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:34 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:34 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:33 eileen: civicrm upgraded from 3af41401 to 7747a290
  • 22:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367856)', diff saved to https://phabricator.wikimedia.org/P65539 and previous config saved to /var/cache/conftool/dbconfig/20240627-223142-marostegui.json
  • 22:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:21 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:19 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5024.eqsin.wmnet
  • 22:13 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:13 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:09 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:05 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5023.eqsin.wmnet
  • 22:05 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:02 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5023.eqsin.wmnet with OS bullseye
  • 21:58 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:58 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:54 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:54 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:53 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:53 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:50 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:50 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:50 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:50 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:32 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:31 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5023.eqsin.wmnet with reason: host reimage
  • 21:25 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5023.eqsin.wmnet with reason: host reimage
  • 21:22 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:22 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 20:53 vriley@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host an-conf1005
  • 20:51 vriley@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host an-conf1005
  • 20:50 vriley@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:50 vriley@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-conf1005 - vriley@cumin1002"
  • 20:50 jhuneidi@deploy1002: Finished scap: Backport for testwiki: Enable QuickSurveys (T368459) (duration: 14m 33s)
  • 20:49 vriley@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: update mgmt an-conf1005 - vriley@cumin1002"
  • 20:47 vriley@cumin1002: START - Cookbook sre.dns.netbox
  • 20:44 jhuneidi@deploy1002: kharlan, jhuneidi: Continuing with sync
  • 20:40 otto@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:40 otto@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:39 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5023.eqsin.wmnet with OS bullseye
  • 20:38 jhuneidi@deploy1002: kharlan, jhuneidi: Backport for testwiki: Enable QuickSurveys (T368459) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:37 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:37 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:35 jhuneidi@deploy1002: Started scap: Backport for testwiki: Enable QuickSurveys (T368459)
  • 20:34 jhuneidi@deploy1002: Finished scap: Backport for QuickSurveys: Add testing survey configuration (T368459) (duration: 14m 45s)
  • 20:29 jhuneidi@deploy1002: kharlan, jhuneidi: Continuing with sync
  • 20:24 otto@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:24 otto@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:22 jhuneidi@deploy1002: kharlan, jhuneidi: Backport for QuickSurveys: Add testing survey configuration (T368459) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:21 otto@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:21 otto@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:20 jhuneidi@deploy1002: Started scap: Backport for QuickSurveys: Add testing survey configuration (T368459)
  • 20:17 jhuneidi@deploy1002: Finished scap: Backport for Enable DiscussionTools permalinks on enwiki (T365974) (duration: 11m 09s)
  • 20:16 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5023.eqsin.wmnet
  • 20:11 jhuneidi@deploy1002: jhuneidi, kemayo: Continuing with sync
  • 20:08 jhuneidi@deploy1002: jhuneidi, kemayo: Backport for Enable DiscussionTools permalinks on enwiki (T365974) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:06 jhuneidi@deploy1002: Started scap: Backport for Enable DiscussionTools permalinks on enwiki (T365974)
  • 20:03 bking@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 20:03 bking@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 19:55 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5022.eqsin.wmnet
  • 19:53 ottomata: deleted mw-page-content-change-enrich stuck jobmanager pod: kubectl -n mw-page-content-change-enrich delete pod flink-app-main-859d98c57b-zrgwk - T368667
  • 19:51 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1059.eqiad.wmnet with OS bookworm
  • 19:48 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5022.eqsin.wmnet with OS bullseye
  • 19:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1064.eqiad.wmnet with OS bookworm
  • 19:27 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
  • 19:23 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1059.eqiad.wmnet with reason: host reimage
  • 19:14 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5022.eqsin.wmnet with reason: host reimage
  • 19:10 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 19:09 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5022.eqsin.wmnet with reason: host reimage
  • 19:07 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1059.eqiad.wmnet with OS bookworm
  • 19:07 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1064.eqiad.wmnet with reason: host reimage
  • 19:00 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 19:00 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:00 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 18:52 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1064.eqiad.wmnet with OS bookworm
  • 18:36 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5022.eqsin.wmnet with OS bullseye
  • 18:36 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5022.eqsin.wmnet with OS bullseye
  • 18:19 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 18:19 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 18:19 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5022.eqsin.wmnet with OS bullseye
  • 18:19 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
  • 18:18 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
  • 18:15 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 18:14 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1067.eqiad.wmnet with OS bookworm
  • 18:12 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 18:12 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 18:12 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.11 refs T366956
  • 18:12 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1065.eqiad.wmnet with OS bookworm
  • 18:11 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 18:11 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 18:10 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5022.eqsin.wmnet
  • 18:08 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1066.eqiad.wmnet with OS bookworm
  • 18:08 ejegg: fundraising civicrm upgraded from 13a13f3a to 43fc2c89
  • 18:04 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1058.eqiad.wmnet with OS bookworm
  • 17:59 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1057.eqiad.wmnet with OS bookworm
  • 17:51 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5021.eqsin.wmnet
  • 17:50 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
  • 17:47 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
  • 17:45 swfrench@deploy1002: Finished scap: Deploying securityContext changes for T362978 to main release (duration: 04m 09s)
  • 17:43 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
  • 17:41 swfrench@deploy1002: Started scap: Deploying securityContext changes for T362978 to main release
  • 17:39 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
  • 17:37 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:37 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:36 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
  • 17:35 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:35 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:34 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1058.eqiad.wmnet with reason: host reimage
  • 17:34 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1003.eqiad.wmnet with OS bookworm
  • 17:34 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1066.eqiad.wmnet with reason: host reimage
  • 17:34 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1067.eqiad.wmnet with reason: host reimage
  • 17:33 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1065.eqiad.wmnet with reason: host reimage
  • 17:33 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:33 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1057.eqiad.wmnet with reason: host reimage
  • 17:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:33 swfrench-wmf: canary deployments are healthy, slow-logs still produced, continuing with main deployments for T362978
  • 17:33 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:32 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:32 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:26 hashar@deploy1002: Finished deploy [gerrit/gerrit@7659481]: Revert "Add image-diff JavaScript plugin (take 2)" (duration: 00m 07s)
  • 17:26 hashar@deploy1002: Started deploy [gerrit/gerrit@7659481]: Revert "Add image-diff JavaScript plugin (take 2)"
  • 17:26 hashar@deploy1002: deploy aborted: Revert Add image-diff JavaScript plugin (take 2) (duration: 00m 00s)
  • 17:26 hashar@deploy1002: Started deploy [gerrit/gerrit@7659481]: Revert Add image-diff JavaScript plugin (take 2)
  • 17:23 swfrench@deploy1002: Finished scap: (no justification provided) (duration: 08m 03s)
  • 17:19 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1067.eqiad.wmnet with OS bookworm
  • 17:19 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1066.eqiad.wmnet with OS bookworm
  • 17:19 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1065.eqiad.wmnet with OS bookworm
  • 17:18 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1058.eqiad.wmnet with OS bookworm
  • 17:17 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1057.eqiad.wmnet with OS bookworm
  • 17:14 swfrench@deploy1002: Started scap: (no justification provided)
  • 17:13 hashar@deploy1002: Finished deploy [gerrit/gerrit@8c6ae73]: Add image-diff JavaScript plugin (take 2) - T341291 (duration: 00m 07s)
  • 17:13 hashar@deploy1002: Started deploy [gerrit/gerrit@8c6ae73]: Add image-diff JavaScript plugin (take 2) - T341291
  • 17:09 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1003.eqiad.wmnet with reason: host reimage
  • 17:06 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1003.eqiad.wmnet with reason: host reimage
  • 17:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5021.eqsin.wmnet with OS bullseye
  • 16:55 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
  • 16:54 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
  • 16:50 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1003.eqiad.wmnet with OS bookworm
  • 16:36 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 100%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65537 and previous config saved to /var/cache/conftool/dbconfig/20240627-163635-arnaudb.json
  • 16:35 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1022.eqiad.wmnet|wikikube-worker1023.eqiad.wmnet|wikikube-worker1024.eqiad.wmnet|wikikube-worker1025.eqiad.wmnet|wikikube-worker1026.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 16:35 claime: Pooling and uncordoning wikikube-worker1022.eqiad.wmnet,wikikube-worker1023.eqiad.wmnet,wikikube-worker1024.eqiad.wmnet,wikikube-worker1025.eqiad.wmnet,wikikube-worker1026.eqiad.wmnet - T351074
  • 16:32 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
  • 16:29 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
  • 16:27 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1002.eqiad.wmnet with OS bookworm
  • 16:21 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 75%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65536 and previous config saved to /var/cache/conftool/dbconfig/20240627-162129-arnaudb.json
  • 16:18 claime: homer 'cr*eqiad*' commit 'T351074'
  • 16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1026.eqiad.wmnet with OS bullseye
  • 16:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1025.eqiad.wmnet with OS bullseye
  • 16:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1023.eqiad.wmnet with OS bullseye
  • 16:06 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 50%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65535 and previous config saved to /var/cache/conftool/dbconfig/20240627-160624-arnaudb.json
  • 16:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1024.eqiad.wmnet with OS bullseye
  • 16:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1022.eqiad.wmnet with OS bullseye
  • 16:01 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1002.eqiad.wmnet with reason: host reimage
  • 16:00 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: sync
  • 16:00 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: sync
  • 15:58 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1002.eqiad.wmnet with reason: host reimage
  • 15:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS bullseye
  • 15:56 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5021.eqsin.wmnet with OS bullseye
  • 15:56 jhancock@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host dbproxy2005
  • 15:56 jhancock@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host dbproxy2005
  • 15:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1026.eqiad.wmnet with reason: host reimage
  • 15:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1025.eqiad.wmnet with reason: host reimage
  • 15:51 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 25%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65534 and previous config saved to /var/cache/conftool/dbconfig/20240627-155118-arnaudb.json
  • 15:48 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1023.eqiad.wmnet with reason: host reimage
  • 15:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1024.eqiad.wmnet with reason: host reimage
  • 15:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1025.eqiad.wmnet with reason: host reimage
  • 15:43 hnowlan: restarted ferm on 8 failing k8s workers
  • 15:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1022.eqiad.wmnet with reason: host reimage
  • 15:42 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1026.eqiad.wmnet with reason: host reimage
  • 15:42 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1024.eqiad.wmnet with reason: host reimage
  • 15:41 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1002.eqiad.wmnet with OS bookworm
  • 15:41 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1023.eqiad.wmnet with reason: host reimage
  • 15:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1022.eqiad.wmnet with reason: host reimage
  • 15:36 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 10%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65533 and previous config saved to /var/cache/conftool/dbconfig/20240627-153613-arnaudb.json
  • 15:29 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1025.eqiad.wmnet with OS bullseye
  • 15:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1025.eqiad.wmnet on all recursors
  • 15:29 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1025.eqiad.wmnet on all recursors
  • 15:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1373 to wikikube-worker1025
  • 15:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1025
  • 15:28 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1026.eqiad.wmnet with OS bullseye
  • 15:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1026.eqiad.wmnet on all recursors
  • 15:28 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1026.eqiad.wmnet on all recursors
  • 15:27 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1025
  • 15:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1373 to wikikube-worker1025 - cgoubert@cumin1002"
  • 15:27 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1024.eqiad.wmnet with OS bullseye
  • 15:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1024.eqiad.wmnet on all recursors
  • 15:27 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1024.eqiad.wmnet on all recursors
  • 15:26 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1023.eqiad.wmnet with OS bullseye
  • 15:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1023.eqiad.wmnet on all recursors
  • 15:26 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1023.eqiad.wmnet on all recursors
  • 15:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1022.eqiad.wmnet with OS bullseye
  • 15:25 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1022.eqiad.wmnet on all recursors
  • 15:25 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1022.eqiad.wmnet on all recursors
  • 15:25 pmiazga: T367901 mwmaint1002: Ran `mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=rowiki --logwiki=metawiki 'Rui_Filipe_Fernandes' '44_Gabriel’`
  • 15:24 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1373 to wikikube-worker1025 - cgoubert@cumin1002"
  • 15:23 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1007.eqiad.wmnet
  • 15:21 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:21 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1373 to wikikube-worker1025
  • 15:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1366 to wikikube-worker1024
  • 15:21 arnaudb@cumin1002: dbctl commit (dc=all): 'es1037 (re)pooling @ 5%: post T365988 repool', diff saved to https://phabricator.wikimedia.org/P65532 and previous config saved to /var/cache/conftool/dbconfig/20240627-152107-arnaudb.json
  • 15:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1024
  • 15:20 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1024
  • 15:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1366 to wikikube-worker1024 - cgoubert@cumin1002"
  • 15:19 pmiazga: T368451 mwmaint1002: Ran `mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=commonswiki --logwiki=metawiki 'Agustín_Antonio_Cardozo' 'Agustín_Cardozo_Cabrera’
  • 15:18 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1366 to wikikube-worker1024 - cgoubert@cumin1002"
  • 15:17 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1007.eqiad.wmnet
  • 15:16 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:16 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1366 to wikikube-worker1024
  • 15:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1404 to wikikube-worker1026
  • 15:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1026
  • 15:14 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1026
  • 15:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:14 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1404 to wikikube-worker1026 - cgoubert@cumin1002"
  • 15:12 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1404 to wikikube-worker1026 - cgoubert@cumin1002"
  • 15:10 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:10 cgoubert@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 15:10 hashar@deploy1002: Finished deploy [gerrit/gerrit@8c94fee]: Revert "Add image-diff JavaScript plugin" (duration: 00m 07s)
  • 15:09 hashar@deploy1002: Started deploy [gerrit/gerrit@8c94fee]: Revert "Add image-diff JavaScript plugin"
  • 15:09 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1373.eqiad.wmnet
  • 15:09 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1373.eqiad.wmnet
  • 15:08 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:08 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1404 to wikikube-worker1026
  • 15:04 hashar@deploy1002: Finished deploy [gerrit/gerrit@9652bc3]: Add image-diff JavaScript plugin - T341291 (duration: 00m 07s)
  • 15:04 hashar@deploy1002: Started deploy [gerrit/gerrit@9652bc3]: Add image-diff JavaScript plugin - T341291
  • 15:03 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1007.eqiad.wmnet with OS bullseye
  • 15:02 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@483e8c3] (codfw): Bump kartotherian src to latest master (duration: 02m 49s)
  • 15:00 topranks: rebooting lsw1-e7-eqiad to upgrade JunOS on switch T365988
  • 15:00 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1366.eqiad.wmnet
  • 15:00 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1366.eqiad.wmnet
  • 15:00 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@483e8c3] (codfw): Bump kartotherian src to latest master
  • 14:59 jgiannelos@deploy1002: Finished deploy [kartotherian/deploy@483e8c3] (eqiad): Bump kartotherian src to latest master (duration: 03m 10s)
  • 14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on an-worker[1163-1165].eqiad.wmnet,es1037.eqiad.wmnet,ms-be1078.eqiad.wmnet with reason: JunOS upgrade lsw1-e7-eqiad
  • 14:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on an-worker[1163-1165].eqiad.wmnet,es1037.eqiad.wmnet,ms-be1078.eqiad.wmnet with reason: JunOS upgrade lsw1-e7-eqiad
  • 14:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1365 to wikikube-worker1023
  • 14:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1023
  • 14:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e7-eqiad,lsw1-e7-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e7-eqiad
  • 14:57 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS bullseye
  • 14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e7-eqiad,lsw1-e7-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e7-eqiad
  • 14:56 jgiannelos@deploy1002: Started deploy [kartotherian/deploy@483e8c3] (eqiad): Bump kartotherian src to latest master
  • 14:56 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1023
  • 14:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1365 to wikikube-worker1023 - cgoubert@cumin1002"
  • 14:54 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1365 to wikikube-worker1023 - cgoubert@cumin1002"
  • 14:53 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host deploy1003.eqiad.wmnet with OS bullseye
  • 14:52 brennen@deploy1002: Finished deploy [phabricator/deployment@0df351e]: deploy phab1004 for minor update (duration: 00m 32s)
  • 14:52 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:52 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1365 to wikikube-worker1023
  • 14:52 brennen@deploy1002: Started deploy [phabricator/deployment@0df351e]: deploy phab1004 for minor update
  • 14:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1359 to wikikube-worker1022
  • 14:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1022
  • 14:51 brennen@deploy1002: Finished deploy [phabricator/deployment@0df351e]: test deploy phab2002 (duration: 00m 34s)
  • 14:50 brennen@deploy1002: Started deploy [phabricator/deployment@0df351e]: test deploy phab2002
  • 14:50 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1022
  • 14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1359 to wikikube-worker1022 - cgoubert@cumin1002"
  • 14:48 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1359 to wikikube-worker1022 - cgoubert@cumin1002"
  • 14:46 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e7-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e7-eqiad
  • 14:46 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1007.eqiad.wmnet with reason: host reimage
  • 14:46 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e7-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e7-eqiad
  • 14:46 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:46 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1359 to wikikube-worker1022
  • 14:43 dcaro@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1007.eqiad.wmnet with reason: host reimage
  • 14:38 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on es1037.eqiad.wmnet with reason: T365988
  • 14:38 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on es1037.eqiad.wmnet with reason: T365988
  • 14:37 arnaudb@cumin1002: dbctl commit (dc=all): 'T365988 - depool es1037', diff saved to https://phabricator.wikimedia.org/P65531 and previous config saved to /var/cache/conftool/dbconfig/20240627-143741-arnaudb.json
  • 14:15 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1015.eqiad.wmnet,service=s4
  • 14:12 dcaro@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1007.eqiad.wmnet with OS bullseye
  • 13:54 urbanecm@deploy1002: Finished scap: Backport for CommonSettings: Mark REL1_42 as stable (T359850) (duration: 08m 10s)
  • 13:48 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 13:46 urbanecm@deploy1002: Started scap: Backport for CommonSettings: Mark REL1_42 as stable (T359850)
  • 13:46 urbanecm@deploy1002: Finished scap: Backport for ptwiki: Enable CommunityConfiguration (T368310) (duration: 08m 58s)
  • 13:42 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirtlocal1001.eqiad.wmnet with OS bookworm
  • 13:41 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 13:41 urbanecm: Run `mwscript extensions/GrowthExperiments/maintenance/migrateCommunityConfig.php --wiki=ptwiki --force` via mwdebug1001 (T368310)
  • 13:40 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host deploy1003
  • 13:39 urbanecm@deploy1002: urbanecm: Backport for ptwiki: Enable CommunityConfiguration (T368310) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:38 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host deploy1003
  • 13:37 urbanecm@deploy1002: Started scap: Backport for ptwiki: Enable CommunityConfiguration (T368310)
  • 13:36 urbanecm@deploy1002: Finished scap: Backport for Enable local uploads for Gilaki Wikipedia (T364673), [noop] Remove $wgRedirectScript, not used since MediaWiki 1.22, CommunityConfiguration: Log info and higher (duration: 10m 22s)
  • 13:35 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 13:31 urbanecm@deploy1002: urbanecm, tgr, nmw03: Continuing with sync
  • 13:28 urbanecm@deploy1002: urbanecm, tgr, nmw03: Backport for Enable local uploads for Gilaki Wikipedia (T364673), [noop] Remove $wgRedirectScript, not used since MediaWiki 1.22, CommunityConfiguration: Log info and higher synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:25 urbanecm@deploy1002: Started scap: Backport for Enable local uploads for Gilaki Wikipedia (T364673), [noop] Remove $wgRedirectScript, not used since MediaWiki 1.22, CommunityConfiguration: Log info and higher
  • 13:24 urbanecm@deploy1002: Finished scap: Backport for [CheckUser] Stop writing old for event tables migration on all wikis (T360685), testwiki: use shellbox-video for scaling video (T356241), Add VK namespace alias to Azerbaijani Wikibooks (T368237) (duration: 16m 48s)
  • 13:19 urbanecm@deploy1002: urbanecm, dreamrimmer, hnowlan, dreamyjazz: Continuing with sync
  • 13:12 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
  • 13:10 urbanecm@deploy1002: urbanecm, dreamrimmer, hnowlan, dreamyjazz: Backport for [CheckUser] Stop writing old for event tables migration on all wikis (T360685), testwiki: use shellbox-video for scaling video (T356241), Add VK namespace alias to Azerbaijani Wikibooks (T368237) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:08 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirtlocal1001.eqiad.wmnet with reason: host reimage
  • 13:08 urbanecm@deploy1002: Started scap: Backport for [CheckUser] Stop writing old for event tables migration on all wikis (T360685), testwiki: use shellbox-video for scaling video (T356241), Add VK namespace alias to Azerbaijani Wikibooks (T368237)
  • 13:02 sukhe: A:dnsbox: remove 10.3.0.2/32 from /e/n/i
  • 12:52 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirtlocal1001.eqiad.wmnet with OS bookworm
  • 12:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T367856)', diff saved to https://phabricator.wikimedia.org/P65529 and previous config saved to /var/cache/conftool/dbconfig/20240627-125019-marostegui.json
  • 12:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 12:50 sukhe: sudo cumin 'A:dnsbox' 'rm /var/lib/dnsbox/ntp.state': remove obsolete ntp.state file
  • 12:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T367856)', diff saved to https://phabricator.wikimedia.org/P65528 and previous config saved to /var/cache/conftool/dbconfig/20240627-124957-marostegui.json
  • 12:48 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ml-serve2007.codfw.wmnet with reason: Hardware maintenance for memory errors
  • 12:48 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ml-serve2007.codfw.wmnet with reason: Hardware maintenance for memory errors
  • 12:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host deploy1003.eqiad.wmnet with OS bullseye
  • 12:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 12:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 12:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T364069)', diff saved to https://phabricator.wikimedia.org/P65527 and previous config saved to /var/cache/conftool/dbconfig/20240627-123805-marostegui.json
  • 12:22 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 12:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P65524 and previous config saved to /var/cache/conftool/dbconfig/20240627-121942-marostegui.json
  • 12:17 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 12:15 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 12:14 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:12 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 12:12 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 12:12 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 12:12 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 12:11 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 12:10 cmooney@cumin1002: END (ERROR) - Cookbook sre.dns.netbox (exit_code=97)
  • 12:10 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 12:10 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 12:08 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 12:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226', diff saved to https://phabricator.wikimedia.org/P65523 and previous config saved to /var/cache/conftool/dbconfig/20240627-120751-marostegui.json
  • 12:07 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 12:07 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 12:07 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 12:07 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 12:07 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 12:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T367856)', diff saved to https://phabricator.wikimedia.org/P65522 and previous config saved to /var/cache/conftool/dbconfig/20240627-120435-marostegui.json
  • 12:00 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/toolhub: apply
  • 12:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/toolhub: apply
  • 11:59 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/toolhub: apply
  • 11:59 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/toolhub: apply
  • 11:59 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/toolhub: apply
  • 11:58 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/toolhub: apply
  • 11:53 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 11:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1226 (T364069)', diff saved to https://phabricator.wikimedia.org/P65521 and previous config saved to /var/cache/conftool/dbconfig/20240627-115244-marostegui.json
  • 11:51 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 11:51 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 11:49 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 11:49 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 11:48 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 11:47 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/developer-portal: apply
  • 11:47 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/developer-portal: apply
  • 11:46 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/developer-portal: apply
  • 11:46 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/developer-portal: apply
  • 11:46 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/developer-portal: apply
  • 11:46 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/developer-portal: apply
  • 11:42 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 11:42 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 11:41 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 11:41 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 11:41 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 11:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 11:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 11:40 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 11:40 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:39 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:39 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 11:39 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 11:39 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 11:38 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 11:38 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 11:38 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 11:36 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 11:36 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 11:35 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 11:35 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 11:35 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 11:35 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 11:35 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 11:34 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:34 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 11:34 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:34 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 11:33 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 11:33 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 11:32 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 11:32 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 11:30 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 11:29 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 11:29 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 11:28 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-timeline: apply
  • 11:28 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-timeline: apply
  • 11:28 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:28 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 11:28 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-media: apply
  • 11:27 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-media: apply
  • 11:27 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-constraints: apply
  • 11:27 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-constraints: apply
  • 11:27 cgoubert@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox: apply
  • 11:26 cgoubert@deploy1002: helmfile [staging] START helmfile.d/services/shellbox: apply
  • 11:25 cgoubert@deploy1002: Finished scap: Deploy new prometheus-php-fpm-exporter, prometheus-apache-exporter - T283861 (duration: 06m 17s)
  • 11:24 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test1004.wikimedia.org with OS bookworm
  • 11:19 cgoubert@deploy1002: Started scap: Deploy new prometheus-php-fpm-exporter, prometheus-apache-exporter - T283861
  • 11:17 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:17 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:14 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:14 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:13 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:13 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:12 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:07 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test1004.wikimedia.org with reason: host reimage
  • 11:03 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test1004.wikimedia.org with reason: host reimage
  • 11:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:00 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:00 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:51 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 10:51 claime: Deploying new prometheus-php-fpm-exporter, prometheus-apache-exporter to mw-on-k8s and shellbox - T283861
  • 10:49 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test1004.wikimedia.org with OS bookworm
  • 10:48 slyngshede@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host idp-test2004.wikimedia.org
  • 10:48 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test2004.wikimedia.org with OS bookworm
  • 10:43 fabfur: re-enabling puppet on A:cp-text_ulsfo (reverted https://gerrit.wikimedia.org/r/c/operations/puppet/+/1050297) (T365718)
  • 10:38 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 10:30 fabfur: correcting previous statement: puppet disabled just on A:cp-text_ulsfo
  • 10:28 fabfur: disable puppet on A:cp-ulsfo to apply selectively https://gerrit.wikimedia.org/r/c/operations/puppet/+/1050258 (T365718)
  • 10:24 isaranto@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 10:24 isaranto@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 10:20 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 10:04 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 10:04 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 09:42 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 09:41 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 09:40 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test2004.wikimedia.org with reason: host reimage
  • 09:38 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test2004.wikimedia.org with reason: host reimage
  • 09:21 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test2004.wikimedia.org with OS bookworm
  • 09:14 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test2004.wikimedia.org - slyngshede@cumin1002"
  • 09:13 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test2004.wikimedia.org - slyngshede@cumin1002"
  • 09:13 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp-test2004.wikimedia.org on all recursors
  • 09:13 slyngshede@cumin1002: START - Cookbook sre.dns.wipe-cache idp-test2004.wikimedia.org on all recursors
  • 09:13 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:13 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test2004.wikimedia.org - slyngshede@cumin1002"
  • 09:12 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test2004.wikimedia.org - slyngshede@cumin1002"
  • 09:09 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
  • 09:09 slyngshede@cumin1002: START - Cookbook sre.ganeti.makevm for new host idp-test2004.wikimedia.org
  • 09:04 jmm@cumin2002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts ganeti1019.eqiad.wmnet
  • 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:04 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1019.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 09:01 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: ganeti1019.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:59 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:54 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts ganeti1019.eqiad.wmnet
  • 08:48 slyngshede@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=99) for new host idp-test1004.wikimedia.org
  • 08:48 slyngshede@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host idp-test1004.wikimedia.org with OS bookworm
  • 08:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1226 (T364069)', diff saved to https://phabricator.wikimedia.org/P65518 and previous config saved to /var/cache/conftool/dbconfig/20240627-084043-marostegui.json
  • 08:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 08:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1226.eqiad.wmnet with reason: Maintenance
  • 08:37 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts irc1001.wikimedia.org
  • 08:37 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:37 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:36 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc1001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:33 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 08:27 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts irc1001.wikimedia.org
  • 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts irc2001.wikimedia.org
  • 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:26 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:20 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab2002.wikimedia.org with reason: Upgrade GitLab to new version
  • 08:10 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 100% weight T363812', diff saved to https://phabricator.wikimedia.org/P65517 and previous config saved to /var/cache/conftool/dbconfig/20240627-081044-jynus.json
  • 08:10 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1022 at 100% weight T363812', diff saved to https://phabricator.wikimedia.org/P65516 and previous config saved to /var/cache/conftool/dbconfig/20240627-081016-jynus.json
  • 08:08 jmm@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: irc2001.wikimedia.org decommissioned, removing all IPs except the asset tag one - jmm@cumin2002"
  • 08:04 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 07:59 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 50% weight T363812', diff saved to https://phabricator.wikimedia.org/P65515 and previous config saved to /var/cache/conftool/dbconfig/20240627-075944-jynus.json
  • 07:57 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 07:57 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 07:56 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1022 at 50% weight T363812', diff saved to https://phabricator.wikimedia.org/P65514 and previous config saved to /var/cache/conftool/dbconfig/20240627-075620-jynus.json
  • 07:54 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 10% weight T363812', diff saved to https://phabricator.wikimedia.org/P65513 and previous config saved to /var/cache/conftool/dbconfig/20240627-075447-jynus.json
  • 07:50 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 07:45 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1022 at 10% weight T363812', diff saved to https://phabricator.wikimedia.org/P65512 and previous config saved to /var/cache/conftool/dbconfig/20240627-074542-jynus.json
  • 07:37 jmm@cumin2002: START - Cookbook sre.dns.netbox
  • 07:36 kartik@deploy1002: Finished scap: Backport for Add Metrics Platform stream configuration and registration for MinT for Wikipedia Readers feature by Language and Product Localization team. (T368028) (duration: 08m 42s)
  • 07:36 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test1004.wikimedia.org with OS bookworm
  • 07:35 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test1004.wikimedia.org - slyngshede@cumin1002"
  • 07:34 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM idp-test1004.wikimedia.org - slyngshede@cumin1002"
  • 07:34 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) idp-test1004.wikimedia.org on all recursors
  • 07:34 slyngshede@cumin1002: START - Cookbook sre.dns.wipe-cache idp-test1004.wikimedia.org on all recursors
  • 07:33 slyngshede@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:33 slyngshede@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test1004.wikimedia.org - slyngshede@cumin1002"
  • 07:33 jmm@cumin2002: START - Cookbook sre.hosts.decommission for hosts irc2001.wikimedia.org
  • 07:32 slyngshede@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM idp-test1004.wikimedia.org - slyngshede@cumin1002"
  • 07:31 kartik@deploy1002: kcvelaga, kartik: Continuing with sync
  • 07:30 kartik@deploy1002: kcvelaga, kartik: Backport for Add Metrics Platform stream configuration and registration for MinT for Wikipedia Readers feature by Language and Product Localization team. (T368028) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:27 kartik@deploy1002: Started scap: Backport for Add Metrics Platform stream configuration and registration for MinT for Wikipedia Readers feature by Language and Product Localization team. (T368028)
  • 07:24 slyngshede@cumin1002: START - Cookbook sre.dns.netbox
  • 07:24 slyngshede@cumin1002: START - Cookbook sre.ganeti.makevm for new host idp-test1004.wikimedia.org
  • 07:18 kartik@deploy1002: Finished scap: Backport for Enable MinT for Wikipedia readers MVP on a set of pilot wikis (T363465) (duration: 14m 19s)
  • 07:13 kartik@deploy1002: kartik: Continuing with sync
  • 07:06 kartik@deploy1002: kartik: Backport for Enable MinT for Wikipedia readers MVP on a set of pilot wikis (T363465) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:04 kartik@deploy1002: Started scap: Backport for Enable MinT for Wikipedia readers MVP on a set of pilot wikis (T363465)
  • 06:45 arnaudb@cumin1002: dbctl commit (dc=all): 'weight es1038 T368401', diff saved to https://phabricator.wikimedia.org/P65510 and previous config saved to /var/cache/conftool/dbconfig/20240627-064506-arnaudb.json
  • 06:40 arnaudb@deploy1002: Finished scap: Backport for Revert "mariadb: disable writes on es6" (duration: 07m 43s)
  • 06:35 arnaudb@deploy1002: arnaudb: Continuing with sync
  • 06:35 arnaudb@deploy1002: arnaudb: Backport for Revert "mariadb: disable writes on es6" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:32 arnaudb@deploy1002: Started scap: Backport for Revert "mariadb: disable writes on es6"
  • 06:23 arnaudb@cumin1002: dbctl commit (dc=all): 'weight es1037 T368401', diff saved to https://phabricator.wikimedia.org/P65509 and previous config saved to /var/cache/conftool/dbconfig/20240627-062338-arnaudb.json
  • 06:16 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote es1038 to es6 primary T368401', diff saved to https://phabricator.wikimedia.org/P65508 and previous config saved to /var/cache/conftool/dbconfig/20240627-061639-arnaudb.json
  • 06:15 arnaudb: Starting es6 eqiad failover from es1037 to es1038 - T368401
  • 06:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Set es1038 with weight 0 T368401', diff saved to https://phabricator.wikimedia.org/P65507 and previous config saved to /var/cache/conftool/dbconfig/20240627-061055-arnaudb.json
  • 06:10 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es6 T368401
  • 06:10 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es6 T368401
  • 06:09 arnaudb@deploy1002: Finished scap: Backport for mariadb: disable writes on es6 (T368401) (duration: 08m 00s)
  • 06:04 arnaudb@deploy1002: arnaudb: Continuing with sync
  • 06:04 arnaudb@deploy1002: arnaudb: Backport for mariadb: disable writes on es6 (T368401) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:01 arnaudb@deploy1002: Started scap: Backport for mariadb: disable writes on es6 (T368401)
  • 03:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 03:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 03:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T364069)', diff saved to https://phabricator.wikimedia.org/P65506 and previous config saved to /var/cache/conftool/dbconfig/20240627-035544-marostegui.json
  • 03:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P65505 and previous config saved to /var/cache/conftool/dbconfig/20240627-034037-marostegui.json
  • 03:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214', diff saved to https://phabricator.wikimedia.org/P65504 and previous config saved to /var/cache/conftool/dbconfig/20240627-032530-marostegui.json
  • 03:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1214 (T364069)', diff saved to https://phabricator.wikimedia.org/P65503 and previous config saved to /var/cache/conftool/dbconfig/20240627-031023-marostegui.json
  • 00:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T367856)', diff saved to https://phabricator.wikimedia.org/P65502 and previous config saved to /var/cache/conftool/dbconfig/20240627-005613-marostegui.json
  • 00:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 00:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 00:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367856)', diff saved to https://phabricator.wikimedia.org/P65501 and previous config saved to /var/cache/conftool/dbconfig/20240627-005549-marostegui.json
  • 00:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65500 and previous config saved to /var/cache/conftool/dbconfig/20240627-004042-marostegui.json
  • 00:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65499 and previous config saved to /var/cache/conftool/dbconfig/20240627-002535-marostegui.json
  • 00:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367856)', diff saved to https://phabricator.wikimedia.org/P65498 and previous config saved to /var/cache/conftool/dbconfig/20240627-001028-marostegui.json

2024-06-26

  • 23:56 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5021.eqsin.wmnet with OS bullseye
  • 23:26 mutante: people1004 - stopped confd which logs every 3 seconds that it can't find any templates (T356296)
  • 23:23 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
  • 23:20 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5021.eqsin.wmnet with reason: host reimage
  • 23:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1214 (T364069)', diff saved to https://phabricator.wikimedia.org/P65497 and previous config saved to /var/cache/conftool/dbconfig/20240626-231020-marostegui.json
  • 23:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 23:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1214.eqiad.wmnet with reason: Maintenance
  • 23:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T364069)', diff saved to https://phabricator.wikimedia.org/P65496 and previous config saved to /var/cache/conftool/dbconfig/20240626-230958-marostegui.json
  • 22:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P65495 and previous config saved to /var/cache/conftool/dbconfig/20240626-225451-marostegui.json
  • 22:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5021.eqsin.wmnet with OS bullseye
  • 22:41 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5021.eqsin.wmnet
  • 22:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211', diff saved to https://phabricator.wikimedia.org/P65494 and previous config saved to /var/cache/conftool/dbconfig/20240626-223944-marostegui.json
  • 22:26 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5020.eqsin.wmnet
  • 22:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1211 (T364069)', diff saved to https://phabricator.wikimedia.org/P65493 and previous config saved to /var/cache/conftool/dbconfig/20240626-222434-marostegui.json
  • 22:22 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5020.eqsin.wmnet with OS bullseye
  • 21:50 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
  • 21:46 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5020.eqsin.wmnet with reason: host reimage
  • 21:40 cjming: end of UTC late backport window
  • 21:38 cjming@deploy1002: Finished scap: Backport for Homepage: don't load yesterdays edits on desktop (T368405) (duration: 08m 48s)
  • 21:33 cjming@deploy1002: cjming, migr: Continuing with sync
  • 21:32 cjming@deploy1002: cjming, migr: Backport for Homepage: don't load yesterdays edits on desktop (T368405) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:29 cjming@deploy1002: Started scap: Backport for Homepage: don't load yesterdays edits on desktop (T368405)
  • 21:29 hashar: restarting CI Jenkins
  • 21:13 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
  • 21:13 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5020.eqsin.wmnet with OS bullseye
  • 21:05 cjming@deploy1002: Finished scap: Backport for Homepage: log rendering time for each module and each wiki (T368405) (duration: 14m 01s)
  • 20:59 eileen: config revision changed from 0b822cd3 to 994e7b81
  • 20:57 cjming@deploy1002: cjming, migr: Continuing with sync
  • 20:55 cjming@deploy1002: cjming, migr: Backport for Homepage: log rendering time for each module and each wiki (T368405) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:51 cjming@deploy1002: Started scap: Backport for Homepage: log rendering time for each module and each wiki (T368405)
  • 20:50 jdrewniak@deploy1002: Finished scap: Backport for Enable user pages and select special pages in dark mode (1.43.0-wmf.11) (T366364 T366375 T367375 T367581 T367582 T367583) (duration: 08m 09s)
  • 20:47 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5020.eqsin.wmnet with OS bullseye
  • 20:45 jdrewniak@deploy1002: jdlrobson, jdrewniak: Continuing with sync
  • 20:45 jdrewniak@deploy1002: jdlrobson, jdrewniak: Backport for Enable user pages and select special pages in dark mode (1.43.0-wmf.11) (T366364 T366375 T367375 T367581 T367582 T367583) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:42 jdrewniak@deploy1002: Started scap: Backport for Enable user pages and select special pages in dark mode (1.43.0-wmf.11) (T366364 T366375 T367375 T367581 T367582 T367583)
  • 20:40 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 58s)
  • 20:33 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 27s)
  • 20:28 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5020.eqsin.wmnet
  • 20:21 cjming@deploy1002: Finished scap: Backport for Update QuickSurvey coverage rate for Automoderator patroller workstream survey (T362969) (duration: 08m 46s)
  • 20:15 cjming@deploy1002: cjming, kgraessle: Continuing with sync
  • 20:14 cjming@deploy1002: cjming, kgraessle: Backport for Update QuickSurvey coverage rate for Automoderator patroller workstream survey (T362969) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:12 cjming@deploy1002: Started scap: Backport for Update QuickSurvey coverage rate for Automoderator patroller workstream survey (T362969)
  • 20:08 mutante: lists1001:/lib/systemd/system# rm wmf_auto_restart_apache2.* ; systemctl reset-failed - reaction to monitoring alert "FIRING: SystemdUnitFailed: wmf_auto_restart_apache2.service on lists1001:9100"
  • 20:08 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5019.eqsin.wmnet
  • 20:05 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5019.eqsin.wmnet with OS bullseye
  • 19:48 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.11 refs T366956
  • 19:40 jhathaway@deploy1002: Finished scap: (no justification provided) (duration: 02m 38s)
  • 19:39 jhathaway@deploy1002: Started scap: (no justification provided)
  • 19:33 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5019.eqsin.wmnet with reason: host reimage
  • 19:28 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5019.eqsin.wmnet with reason: host reimage
  • 19:18 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:16 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 19:11 ottomata: re-enabling varnishkafka-eventlogging and varnish /beacon/event handling on cache text nodes. /beacon/event/ redirects which breaks the MediaWikiPingback usage - T238230
  • 19:02 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.11 refs T366956
  • 18:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5019.eqsin.wmnet with OS bullseye
  • 18:55 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5019.eqsin.wmnet with OS bullseye
  • 18:26 sukhe@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 18:26 sukhe@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove ntp.anycast.wmnet - sukhe@cumin1002"
  • 18:25 sukhe@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove ntp.anycast.wmnet - sukhe@cumin1002"
  • 18:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1211 (T364069)', diff saved to https://phabricator.wikimedia.org/P65490 and previous config saved to /var/cache/conftool/dbconfig/20240626-182355-marostegui.json
  • 18:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1211.eqiad.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T364069)', diff saved to https://phabricator.wikimedia.org/P65489 and previous config saved to /var/cache/conftool/dbconfig/20240626-182333-marostegui.json
  • 18:23 sukhe@cumin1002: START - Cookbook sre.dns.netbox
  • 18:19 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5019.eqsin.wmnet with OS bullseye
  • 18:17 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.11 refs T366956
  • 18:14 sukhe: # etcdctl --username root --endpoints https://conf1007.eqiad.wmnet:4001 rmdir /conftool/v1/pools/${site}/dnsbox/ntp: T366360
  • 18:12 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5019.eqsin.wmnet
  • 18:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P65488 and previous config saved to /var/cache/conftool/dbconfig/20240626-180824-marostegui.json
  • 18:07 xcollazo@deploy1002: Finished deploy [airflow-dags/analytics@5121748]: Deploying latest DAGs to analytics Airflow instance. (duration: 00m 39s)
  • 18:06 xcollazo@deploy1002: Started deploy [airflow-dags/analytics@5121748]: Deploying latest DAGs to analytics Airflow instance.
  • 17:59 sukhe: sudo cumin -b10 "A:cp-text" "run-puppet-agent"
  • 17:58 sukhe: sudo cumin -b1 -s30 "A:cp-text" "run-puppet-agent"
  • 17:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209', diff saved to https://phabricator.wikimedia.org/P65487 and previous config saved to /var/cache/conftool/dbconfig/20240626-175317-marostegui.json
  • 17:51 ottomata: disabling varnishkafka-eventlogging and varnish /beacon/event handling on ache text nodes. Puppet is disabled on all cache text, will test a few at a time first. - T238230
  • 17:46 sukhe: disable puppet in A:cp-text
  • 17:43 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5018.eqsin.wmnet
  • 17:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5018.eqsin.wmnet with OS bullseye
  • 17:39 sukhe: sudo cumin "A:dnsbox" "run-puppet-agent"
  • 17:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1209 (T364069)', diff saved to https://phabricator.wikimedia.org/P65486 and previous config saved to /var/cache/conftool/dbconfig/20240626-173810-marostegui.json
  • 17:37 mnz@deploy1002: Finished deploy [airflow-dags/research@5121748]: (no justification provided) (duration: 00m 11s)
  • 17:37 mnz@deploy1002: Started deploy [airflow-dags/research@5121748]: (no justification provided)
  • 17:29 xcollazo@deploy1002: Finished deploy [analytics/refinery@ca1acb3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ca1acb34] (duration: 02m 54s)
  • 17:26 xcollazo@deploy1002: Started deploy [analytics/refinery@ca1acb3] (hadoop-test): Regular analytics weekly train TEST [analytics/refinery@ca1acb34]
  • 17:26 xcollazo@deploy1002: Finished deploy [analytics/refinery@ca1acb3] (thin): Regular analytics weekly train THIN [analytics/refinery@ca1acb34] (duration: 04m 12s)
  • 17:22 xcollazo@deploy1002: Started deploy [analytics/refinery@ca1acb3] (thin): Regular analytics weekly train THIN [analytics/refinery@ca1acb34]
  • 17:17 mnz@deploy1002: Finished deploy [airflow-dags/research@1996a7a]: (no justification provided) (duration: 00m 03s)
  • 17:17 mnz@deploy1002: Started deploy [airflow-dags/research@1996a7a]: (no justification provided)
  • 17:16 sukhe: re-enable puppet on A:cp-text
  • 17:14 ladsgroup@deploy1002: Finished scap: Backport for Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098), Skip failing ForeignResourceStructureTest (T362425), Skip failing ForeignResourceStructureTest (T362425), Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098) (duration: 08m 52s)
  • 17:09 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 17:08 ladsgroup@deploy1002: ladsgroup: Backport for Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098), Skip failing ForeignResourceStructureTest (T362425), Skip failing ForeignResourceStructureTest (T362425), Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwd
  • 17:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage
  • 17:06 ladsgroup@deploy1002: Started scap: Backport for Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098), Skip failing ForeignResourceStructureTest (T362425), Skip failing ForeignResourceStructureTest (T362425), Modify WikiExporter's BATCH_SIZE from 50000 to 10000 (T368098)
  • 17:03 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5018.eqsin.wmnet with reason: host reimage
  • 17:01 xcollazo@deploy1002: Finished deploy [analytics/refinery@ca1acb3]: Regular analytics weekly train [analytics/refinery@ca1acb34] (duration: 09m 16s)
  • 16:52 xcollazo@deploy1002: Started deploy [analytics/refinery@ca1acb3]: Regular analytics weekly train [analytics/refinery@ca1acb34]
  • 16:52 sukhe: disable puppet on A:cp-text
  • 16:50 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
  • 16:50 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Server swap — T362033
  • 16:44 mnz@deploy1002: Finished deploy [airflow-dags/research@1996a7a]: (no justification provided) (duration: 00m 03s)
  • 16:44 mnz@deploy1002: Started deploy [airflow-dags/research@1996a7a]: (no justification provided)
  • 16:39 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/changeprop-jobqueue: sync
  • 16:38 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/changeprop-jobqueue: sync
  • 16:30 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5018.eqsin.wmnet with OS bullseye
  • 16:27 xcollazo@deploy1002: Finished deploy [analytics/refinery@ca1acb3]: Regular analytics weekly train [analytics/refinery@ca1acb34] (duration: 00m 29s)
  • 16:27 xcollazo@deploy1002: Started deploy [analytics/refinery@ca1acb3]: Regular analytics weekly train [analytics/refinery@ca1acb34]
  • 16:25 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5018.eqsin.wmnet
  • 16:15 mnz@deploy1002: Finished deploy [airflow-dags/research@1996a7a]: (no justification provided) (duration: 00m 33s)
  • 16:14 mnz@deploy1002: Started deploy [airflow-dags/research@1996a7a]: (no justification provided)
  • 15:58 sukhe: sudo cumin -b1 -s120 "A:dnsbox and not P{dns6001*}" "run-puppet-agent --enable 'rolling out CR 1048064'"
  • 15:58 sukhe: sudo cumin -b1 -s120 "A:dnsbox and not P{dns6001*}" "run-puppet-agent --enable 'rolling out CR 1049969'"
  • 15:43 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ganeti1029.eqiad.wmnet
  • 15:38 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt2003-dev.codfw.wmnet
  • 15:38 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:38 elukey@cumin1002: START - Cookbook sre.hosts.reboot-single for host ganeti1029.eqiad.wmnet
  • 15:36 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 15:35 elukey@cumin1002: END (PASS) - Cookbook sre.ganeti.drain-node (exit_code=0) for draining ganeti node ganeti1029.eqiad.wmnet
  • 15:32 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on logstash1024.eqiad.wmnet with reason: Temporary stop to migrate the VM away from the ganeti node
  • 15:32 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on logstash1024.eqiad.wmnet with reason: Temporary stop to migrate the VM away from the ganeti node
  • 15:32 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt2003-dev.codfw.wmnet
  • 15:27 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 15:27 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 15:27 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 15:27 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 15:25 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 15:25 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 15:20 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on logstash1023.eqiad.wmnet with reason: Temporary stop to migrate the VM away from the ganeti node
  • 15:20 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on logstash1023.eqiad.wmnet with reason: Temporary stop to migrate the VM away from the ganeti node
  • 15:16 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt2003-dev.codfw.wmnet
  • 15:16 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:16 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2003-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 15:15 root@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1006.eqiad.wmnet with OS bullseye
  • 15:14 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2003-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 15:14 sukhe: sudo cumin "A:dnsbox" 'disable-puppet "rolling out CR 1048064"'
  • 15:12 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 15:08 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt2003-dev.codfw.wmnet
  • 15:08 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt2002-dev.codfw.wmnet
  • 15:07 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:07 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2002-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 15:06 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2002-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 15:04 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 15:00 taavi@cumin1002: END (ERROR) - Cookbook sre.puppet.renew-cert (exit_code=97) for cloudcephosd1006.eqiad.wmnet: Renew puppet certificate - taavi@cumin1002
  • 14:59 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt2002-dev.codfw.wmnet
  • 14:58 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt2001-dev.codfw.wmnet
  • 14:58 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:58 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 14:58 taavi@cumin1002: START - Cookbook sre.puppet.renew-cert for cloudcephosd1006.eqiad.wmnet: Renew puppet certificate - taavi@cumin1002
  • 14:57 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt2001-dev.codfw.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 14:54 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 14:48 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt2001-dev.codfw.wmnet
  • 14:40 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:40 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dbproxy2007 to codfw - jhancock@cumin2002"
  • 14:38 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dbproxy2007 to codfw - jhancock@cumin2002"
  • 14:33 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:29 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Add shellbox-video vars/config, enable on beta (T356241) (duration: 08m 22s)
  • 14:24 logmsgbot: lucaswerkmeister-wmde@deploy1002 hnowlan, lucaswerkmeister-wmde: Continuing with sync
  • 14:24 logmsgbot: lucaswerkmeister-wmde@deploy1002 hnowlan, lucaswerkmeister-wmde: Backport for Add shellbox-video vars/config, enable on beta (T356241) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:21 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Add shellbox-video vars/config, enable on beta (T356241)
  • 14:21 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore[1004-1006].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 14:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for wikidatawiki: Add namespace 640 (EntitySchema) to $wgContentNamespaces (T368010) (duration: 07m 57s)
  • 14:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 14:14 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for wikidatawiki: Add namespace 640 (EntitySchema) to $wgContentNamespaces (T368010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:13 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:13 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:11 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:11 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for wikidatawiki: Add namespace 640 (EntitySchema) to $wgContentNamespaces (T368010)
  • 14:09 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:09 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
  • 14:08 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:07 jforrester@deploy1002: Finished scap: Backport for CodeEditor.vue: add watcher for disabled state (T368504) (duration: 08m 00s)
  • 14:07 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:06 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:06 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:06 root@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1006.eqiad.wmnet with reason: host reimage
  • 14:05 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:05 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:04 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:04 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:04 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:02 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:02 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore[1004-1006].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 14:02 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:02 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore[2005-2006].codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 14:01 jforrester@deploy1002: jforrester: Continuing with sync
  • 14:01 jforrester@deploy1002: jforrester: Backport for CodeEditor.vue: add watcher for disabled state (T368504) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:01 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:01 claime: Deploying statsd-exporter for mw-api-int - T365265
  • 14:01 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database btmwiki (T368066)
  • 14:01 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:59 jforrester@deploy1002: Started scap: Backport for CodeEditor.vue: add watcher for disabled state (T368504)
  • 13:56 Lucas_WMDE: UTC afternoon backport+config window done (I might deploy a few more patches later out-of-window)
  • 13:55 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [u4cwiki] Enable importing from dewiki/enwiki/metawiki (T368522), [arbcom_itwiki] Change the logo and a new wordmark and a favicon (T368532) (duration: 08m 49s)
  • 13:50 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, superpes: Continuing with sync
  • 13:49 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore[2005-2006].codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 13:49 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, superpes: Backport for [u4cwiki] Enable importing from dewiki/enwiki/metawiki (T368522), [arbcom_itwiki] Change the logo and a new wordmark and a favicon (T368532) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:46 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [u4cwiki] Enable importing from dewiki/enwiki/metawiki (T368522), [arbcom_itwiki] Change the logo and a new wordmark and a favicon (T368532)
  • 13:42 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [ltwiki] Add a new 'rollbacker' usergroup (T367993) (duration: 08m 48s)
  • 13:37 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, superpes: Continuing with sync
  • 13:36 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, superpes: Backport for [ltwiki] Add a new 'rollbacker' usergroup (T367993) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:35 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database btmwiki (T368066)
  • 13:33 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [ltwiki] Add a new 'rollbacker' usergroup (T367993)
  • 13:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1209 (T364069)', diff saved to https://phabricator.wikimedia.org/P65481 and previous config saved to /var/cache/conftool/dbconfig/20240626-133239-marostegui.json
  • 13:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 13:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1209.eqiad.wmnet with reason: Maintenance
  • 13:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65480 and previous config saved to /var/cache/conftool/dbconfig/20240626-133216-marostegui.json
  • 13:31 fnegri@cumin1002: END (PASS) - Cookbook sre.wikireplicas.add-wiki (exit_code=0) for database btmwiki (T368066)
  • 13:30 Lucas_WMDE: lucaswerkmeister-wmde@deploy1002 /srv/mediawiki-staging (master $ u=) $ mwscript-k8s namespaceDupes maiwiki -- --fix # T363667, 0 pages/links to fix, i.e. no-op
  • 13:28 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Meta-Wiki: restrict unfuzzy rights to autoconfirmed (T368416), maiwiki: Remove 'CA' namespace alias (T363667) (duration: 10m 50s)
  • 13:28 elukey@cumin1002: START - Cookbook sre.ganeti.drain-node for draining ganeti node ganeti1029.eqiad.wmnet
  • 13:23 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamrimmer: Continuing with sync
  • 13:23 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bullseye
  • 13:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamrimmer: Backport for Meta-Wiki: restrict unfuzzy rights to autoconfirmed (T368416), maiwiki: Remove 'CA' namespace alias (T363667) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:19 fnegri@cumin1002: START - Cookbook sre.wikireplicas.add-wiki for database btmwiki (T368066)
  • 13:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Meta-Wiki: restrict unfuzzy rights to autoconfirmed (T368416), maiwiki: Remove 'CA' namespace alias (T363667)
  • 13:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P65479 and previous config saved to /var/cache/conftool/dbconfig/20240626-131709-marostegui.json
  • 13:15 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [CheckUser] Stop writing old for event tables migration on group1 (T360685) (duration: 12m 09s)
  • 13:13 elukey: reload nginx on registry* nodes (Docker registry) to pick up new logging changes
  • 13:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamyjazz: Continuing with sync
  • 13:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamyjazz: Backport for [CheckUser] Stop writing old for event tables migration on group1 (T360685) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:03 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for [CheckUser] Stop writing old for event tables migration on group1 (T360685)
  • 13:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203', diff saved to https://phabricator.wikimedia.org/P65478 and previous config saved to /var/cache/conftool/dbconfig/20240626-130201-marostegui.json
  • 12:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65476 and previous config saved to /var/cache/conftool/dbconfig/20240626-124654-marostegui.json
  • 12:20 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 12:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T367856)', diff saved to https://phabricator.wikimedia.org/P65471 and previous config saved to /var/cache/conftool/dbconfig/20240626-121158-marostegui.json
  • 12:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 12:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 12:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367856)', diff saved to https://phabricator.wikimedia.org/P65470 and previous config saved to /var/cache/conftool/dbconfig/20240626-121136-marostegui.json
  • 11:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65469 and previous config saved to /var/cache/conftool/dbconfig/20240626-115628-marostegui.json
  • 11:55 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:55 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:54 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:54 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:54 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:54 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:52 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:52 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:51 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:51 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:44 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 11:43 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 11:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65468 and previous config saved to /var/cache/conftool/dbconfig/20240626-114121-marostegui.json
  • 11:41 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 11:40 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 11:39 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 11:39 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 11:35 moritzm: installing emacs security updates
  • 11:27 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:26 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:26 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:26 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367856)', diff saved to https://phabricator.wikimedia.org/P65467 and previous config saved to /var/cache/conftool/dbconfig/20240626-112614-marostegui.json
  • 11:24 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1006.eqiad.wmnet with OS bullseye
  • 11:24 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:23 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:19 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2022 fully T363812', diff saved to https://phabricator.wikimedia.org/P65466 and previous config saved to /var/cache/conftool/dbconfig/20240626-111934-jynus.json
  • 11:14 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:13 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:12 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:12 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 11:07 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/zotero: apply
  • 11:07 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/zotero: apply
  • 11:06 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/zotero: apply
  • 11:06 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/zotero: apply
  • 10:39 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2022 at 50% T363812', diff saved to https://phabricator.wikimedia.org/P65465 and previous config saved to /var/cache/conftool/dbconfig/20240626-103933-jynus.json
  • 10:25 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2022 after backup T363812', diff saved to https://phabricator.wikimedia.org/P65464 and previous config saved to /var/cache/conftool/dbconfig/20240626-102523-jynus.json
  • 10:20 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 10:13 claime: enabling puppet on cp-text - T367949
  • 10:04 claime: enabling puppet on cp4037 - T367949
  • 10:02 claime: disabling puppet on cp-text - T367949
  • 09:59 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 09:55 slyngs: Update idp.wikimedia.org to CAS 6.6.15.2 (T368503)
  • 09:50 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 09:48 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 09:46 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 09:44 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 09:38 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 09:01 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host idp-test1002.wikimedia.org with OS bookworm
  • 08:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136 T365805', diff saved to https://phabricator.wikimedia.org/P65463 and previous config saved to /var/cache/conftool/dbconfig/20240626-085511-root.json
  • 08:44 elukey@cumin1002: END (PASS) - Cookbook sre.puppet.renew-cert (exit_code=0) for puppetmaster1003.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
  • 08:42 elukey@cumin1002: START - Cookbook sre.puppet.renew-cert for puppetmaster1003.eqiad.wmnet: Renew puppet certificate - elukey@cumin1002
  • 08:40 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on idp-test1002.wikimedia.org with reason: host reimage
  • 08:39 hashar@deploy1002: Finished deploy [gerrit/gerrit@2fc2b03]: Gerrit to 3.10 on gerrit1003 # T367419 (duration: 00m 43s)
  • 08:39 hashar@deploy1002: Started deploy [gerrit/gerrit@2fc2b03]: Gerrit to 3.10 on gerrit1003 # T367419
  • 08:38 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on idp-test1002.wikimedia.org with reason: host reimage
  • 08:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65462 and previous config saved to /var/cache/conftool/dbconfig/20240626-083733-marostegui.json
  • 08:37 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 08:37 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1203.eqiad.wmnet with reason: Maintenance
  • 08:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T364069)', diff saved to https://phabricator.wikimedia.org/P65461 and previous config saved to /var/cache/conftool/dbconfig/20240626-083711-marostegui.json
  • 08:32 hashar@deploy1002: Finished deploy [gerrit/gerrit@2fc2b03]: Gerrit to 3.10 on gerrit2002 # T367419 (duration: 00m 48s)
  • 08:31 hashar@deploy1002: Started deploy [gerrit/gerrit@2fc2b03]: Gerrit to 3.10 on gerrit2002 # T367419
  • 08:25 slyngshede@cumin1002: START - Cookbook sre.hosts.reimage for host idp-test1002.wikimedia.org with OS bookworm
  • 08:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P65460 and previous config saved to /var/cache/conftool/dbconfig/20240626-082204-marostegui.json
  • 08:11 jynus@cumin1002: dbctl commit (dc=all): 'Depool es1025 for backups T363812', diff saved to https://phabricator.wikimedia.org/P65458 and previous config saved to /var/cache/conftool/dbconfig/20240626-081130-jynus.json
  • 08:10 marostegui@cumin1002: dbctl commit (dc=all): 'Set es1023 as es5 master - this is a NOOP', diff saved to https://phabricator.wikimedia.org/P65457 and previous config saved to /var/cache/conftool/dbconfig/20240626-081014-marostegui.json
  • 08:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193', diff saved to https://phabricator.wikimedia.org/P65456 and previous config saved to /var/cache/conftool/dbconfig/20240626-080657-marostegui.json
  • 08:06 marostegui@cumin1002: dbctl commit (dc=all): 'Fix weights for es2021 and es2024', diff saved to https://phabricator.wikimedia.org/P65455 and previous config saved to /var/cache/conftool/dbconfig/20240626-080649-marostegui.json
  • 07:59 jynus@cumin1002: dbctl commit (dc=all): 'Depool es1022 for backups T363812', diff saved to https://phabricator.wikimedia.org/P65454 and previous config saved to /var/cache/conftool/dbconfig/20240626-075946-jynus.json
  • 07:54 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2025 at 100% load', diff saved to https://phabricator.wikimedia.org/P65453 and previous config saved to /var/cache/conftool/dbconfig/20240626-075428-jynus.json
  • 07:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1193 (T364069)', diff saved to https://phabricator.wikimedia.org/P65451 and previous config saved to /var/cache/conftool/dbconfig/20240626-075043-marostegui.json
  • 07:44 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 07:33 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2025 at 50% load', diff saved to https://phabricator.wikimedia.org/P65449 and previous config saved to /var/cache/conftool/dbconfig/20240626-073304-jynus.json
  • 07:28 jynus@cumin1002: dbctl commit (dc=all): 'Repool es2025 with low load for warmup', diff saved to https://phabricator.wikimedia.org/P65448 and previous config saved to /var/cache/conftool/dbconfig/20240626-072810-jynus.json
  • 07:03 moritzm: installing emacs security updates
  • 06:56 marostegui@cumin1002: dbctl commit (dc=all): 'Pool db2136 - running 10.11 with minium weight T365805', diff saved to https://phabricator.wikimedia.org/P65447 and previous config saved to /var/cache/conftool/dbconfig/20240626-065636-marostegui.json
  • 06:52 marostegui: Enable slow query log on db2136 running 10.11 T365805
  • 06:39 marostegui: Install mariadb 10.11 on s4 db2136 (depooled for now) T365805
  • 06:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136 T365805', diff saved to https://phabricator.wikimedia.org/P65446 and previous config saved to /var/cache/conftool/dbconfig/20240626-063109-root.json
  • 06:01 marostegui: dbmaint eqiad Drop ipblocks in s1 T367632
  • 05:59 marostegui: dbmaint eqiad Drop ipblocks in s3 T367632
  • 05:57 marostegui: dbmaint eqiad Drop ipblocks in s4 T367632
  • 05:39 ryankemper: [Elastic] `curl -s -X POST https://search.svc.eqiad.wmnet:9243/_cluster/reroute?retry_failed=true` did the trick. Shard initializing, cluster should be back to green soon enough
  • 05:36 ryankemper: [Elastic] One unassigned shard; cluster status yellow. Not a big deal, looks like `shard has exceeded the maximum number of retries [5] on failed allocation attempts`, I'll try a manual `/_cluster/reroute?retry_failed=true`
  • 05:01 marostegui: dbmaint eqiad Drop ipblocks in s5 T367632
  • 04:53 marostegui: dbmaint eqiad Drop ipblocks in s2 T367632
  • 04:51 marostegui: dbmaint eqiad Drop ipblocks in s8 T367632
  • 03:39 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1193 (T364069)', diff saved to https://phabricator.wikimedia.org/P65445 and previous config saved to /var/cache/conftool/dbconfig/20240626-033955-marostegui.json
  • 03:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 03:39 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1193.eqiad.wmnet with reason: Maintenance
  • 03:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T364069)', diff saved to https://phabricator.wikimedia.org/P65444 and previous config saved to /var/cache/conftool/dbconfig/20240626-033933-marostegui.json
  • 03:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P65443 and previous config saved to /var/cache/conftool/dbconfig/20240626-032426-marostegui.json
  • 03:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178', diff saved to https://phabricator.wikimedia.org/P65442 and previous config saved to /var/cache/conftool/dbconfig/20240626-030919-marostegui.json
  • 02:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1178 (T364069)', diff saved to https://phabricator.wikimedia.org/P65441 and previous config saved to /var/cache/conftool/dbconfig/20240626-025412-marostegui.json
  • 00:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T367856)', diff saved to https://phabricator.wikimedia.org/P65440 and previous config saved to /var/cache/conftool/dbconfig/20240626-002103-marostegui.json
  • 00:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 00:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 00:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367856)', diff saved to https://phabricator.wikimedia.org/P65439 and previous config saved to /var/cache/conftool/dbconfig/20240626-002041-marostegui.json
  • 00:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65438 and previous config saved to /var/cache/conftool/dbconfig/20240626-000534-marostegui.json

2024-06-25

  • 23:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65437 and previous config saved to /var/cache/conftool/dbconfig/20240625-235027-marostegui.json
  • 23:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367856)', diff saved to https://phabricator.wikimedia.org/P65436 and previous config saved to /var/cache/conftool/dbconfig/20240625-233520-marostegui.json
  • 23:27 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2006-dev.codfw.wmnet with OS bookworm
  • 23:05 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
  • 23:02 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2006-dev.codfw.wmnet with reason: host reimage
  • 22:44 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt2006-dev.codfw.wmnet with OS bookworm
  • 22:43 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
  • 22:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1178 (T364069)', diff saved to https://phabricator.wikimedia.org/P65435 and previous config saved to /var/cache/conftool/dbconfig/20240625-224249-marostegui.json
  • 22:42 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 22:42 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1178.eqiad.wmnet with reason: Maintenance
  • 22:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T364069)', diff saved to https://phabricator.wikimedia.org/P65434 and previous config saved to /var/cache/conftool/dbconfig/20240625-224226-marostegui.json
  • 22:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2005-dev.codfw.wmnet with OS bookworm
  • 22:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P65433 and previous config saved to /var/cache/conftool/dbconfig/20240625-222719-marostegui.json
  • 22:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177', diff saved to https://phabricator.wikimedia.org/P65432 and previous config saved to /var/cache/conftool/dbconfig/20240625-221212-marostegui.json
  • 22:10 bvibber: a webVideoTranscode job reported 'No space left on device' from a failed ffmpeg run on mw1446 recently
  • 22:09 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
  • 22:05 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2005-dev.codfw.wmnet with reason: host reimage
  • 21:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1177 (T364069)', diff saved to https://phabricator.wikimedia.org/P65431 and previous config saved to /var/cache/conftool/dbconfig/20240625-215705-marostegui.json
  • 21:47 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt2005-dev.codfw.wmnet with OS bookworm
  • 20:44 cjming: end of UTC late backport window
  • 20:41 cjming@deploy1002: Finished scap: Backport for Cleanup: Remove wgNavigationTimingSurveyName (T367128) (duration: 08m 29s)
  • 20:36 cjming@deploy1002: jdlrobson, cjming: Continuing with sync
  • 20:35 cjming@deploy1002: jdlrobson, cjming: Backport for Cleanup: Remove wgNavigationTimingSurveyName (T367128) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:32 cjming@deploy1002: Started scap: Backport for Cleanup: Remove wgNavigationTimingSurveyName (T367128)
  • 20:31 cjming@deploy1002: Finished scap: Backport for Enable dark mode on more pages (T366378 T367374 T366373 T366520 T366373) (duration: 15m 04s)
  • 20:26 cjming@deploy1002: jdlrobson, cjming: Continuing with sync
  • 20:19 cjming@deploy1002: jdlrobson, cjming: Backport for Enable dark mode on more pages (T366378 T367374 T366373 T366520 T366373) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:16 cjming@deploy1002: Started scap: Backport for Enable dark mode on more pages (T366378 T367374 T366373 T366520 T366373)
  • 20:14 cjming@deploy1002: Finished scap: Backport for Temporarily disable '4K' 2160p and mid 1440p transcodes (T368433) (duration: 08m 36s)
  • 20:11 Emperor: restart swift-proxy on ms-fe2010 ms-fe1011 T360913
  • 20:09 cjming@deploy1002: cjming, bvibber: Continuing with sync
  • 20:08 cjming@deploy1002: cjming, bvibber: Backport for Temporarily disable '4K' 2160p and mid 1440p transcodes (T368433) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:05 cjming@deploy1002: Started scap: Backport for Temporarily disable '4K' 2160p and mid 1440p transcodes (T368433)
  • 20:03 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp5017.eqsin.wmnet with OS bullseye
  • 20:01 hashar@deploy1002: Finished deploy [integration/docroot@1eb5f4c]: remove CollaborationKit T368092 (duration: 00m 07s)
  • 20:01 hashar@deploy1002: Started deploy [integration/docroot@1eb5f4c]: remove CollaborationKit T368092
  • 19:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T367856)', diff saved to https://phabricator.wikimedia.org/P65430 and previous config saved to /var/cache/conftool/dbconfig/20240625-192947-marostegui.json
  • 19:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 19:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 19:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 19:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 19:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367856)', diff saved to https://phabricator.wikimedia.org/P65429 and previous config saved to /var/cache/conftool/dbconfig/20240625-192910-marostegui.json
  • 19:28 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
  • 19:25 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp5017.eqsin.wmnet with reason: host reimage
  • 19:23 sukhe: re-enable puppet on lvs2011
  • 19:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65428 and previous config saved to /var/cache/conftool/dbconfig/20240625-191403-marostegui.json
  • 18:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65426 and previous config saved to /var/cache/conftool/dbconfig/20240625-185856-marostegui.json
  • 18:49 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
  • 18:49 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp5017.eqsin.wmnet with OS bullseye
  • 18:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367856)', diff saved to https://phabricator.wikimedia.org/P65425 and previous config saved to /var/cache/conftool/dbconfig/20240625-184349-marostegui.json
  • 18:31 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp5017.eqsin.wmnet with OS bullseye
  • 18:28 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
  • 18:22 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
  • 18:14 jhuneidi@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.11 refs T366956
  • 18:06 topranks: bringing up link from ssw1-a1-codfw to ssw1-d1-codfw T364095
  • 17:57 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
  • 17:55 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
  • 17:51 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2004.codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 17:44 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2004.codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 17:43 brett: Re-re-pooling lvs2011 - T368165
  • 17:37 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
  • 17:36 brett: Depooling lvs2011 due to elevated socket/tcp errors - T368165
  • 17:28 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
  • 17:28 brett: Pooling lvs2011 - T368165
  • 17:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1177 (T364069)', diff saved to https://phabricator.wikimedia.org/P65424 and previous config saved to /var/cache/conftool/dbconfig/20240625-172502-marostegui.json
  • 17:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 17:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1177.eqiad.wmnet with reason: Maintenance
  • 17:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T364069)', diff saved to https://phabricator.wikimedia.org/P65423 and previous config saved to /var/cache/conftool/dbconfig/20240625-172440-marostegui.json
  • 17:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P65422 and previous config saved to /var/cache/conftool/dbconfig/20240625-170933-marostegui.json
  • 17:06 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 17:04 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
  • 17:02 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt2004-dev.codfw.wmnet with reason: host reimage
  • 17:01 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:aqs-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 16:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172', diff saved to https://phabricator.wikimedia.org/P65421 and previous config saved to /var/cache/conftool/dbconfig/20240625-165426-marostegui.json
  • 16:49 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 16:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt2004-dev.codfw.wmnet with OS bookworm
  • 16:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1172 (T364069)', diff saved to https://phabricator.wikimedia.org/P65420 and previous config saved to /var/cache/conftool/dbconfig/20240625-163919-marostegui.json
  • 16:37 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:ml-cache-eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 100%: post T365986 repool', diff saved to https://phabricator.wikimedia.org/P65419 and previous config saved to /var/cache/conftool/dbconfig/20240625-163330-arnaudb.json
  • 16:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1437.eqiad.wmnet
  • 16:31 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1437.eqiad.wmnet
  • 16:27 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw1437.eqiad.wmnet with reason: Resizing disk
  • 16:27 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on mw1437.eqiad.wmnet with reason: Resizing disk
  • 16:23 bvibber: running requeueTranscodes for missing audio files on commons (mwmaint1002) cf T368364
  • 16:23 claime: depooling mw1437
  • 16:19 claime: cleaning up shellbox leftover files on mw1437.eqiad.wmnet
  • 16:19 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:ml-cache-eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 16:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 75%: post T365986 repool', diff saved to https://phabricator.wikimedia.org/P65418 and previous config saved to /var/cache/conftool/dbconfig/20240625-161824-arnaudb.json
  • 16:15 claime: Extending vg-srv on mw1437
  • 16:10 brennen@deploy1002: Finished deploy [phabricator/deployment@72ad841]: deploy phab1004 for T368392 - followup T364728 (duration: 00m 39s)
  • 16:10 brennen@deploy1002: Started deploy [phabricator/deployment@72ad841]: deploy phab1004 for T368392 - followup T364728
  • 16:09 brennen@deploy1002: Finished deploy [phabricator/deployment@72ad841]: deploy phab2002 for T368392 - followup T364728 (duration: 00m 33s)
  • 16:08 brennen@deploy1002: Started deploy [phabricator/deployment@72ad841]: deploy phab2002 for T368392 - followup T364728
  • 16:05 brennen: silencing phabricator hosts prior to deploy
  • 16:03 arnaudb@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 50%: post T365986 repool', diff saved to https://phabricator.wikimedia.org/P65417 and previous config saved to /var/cache/conftool/dbconfig/20240625-160318-arnaudb.json
  • 15:33 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:aqs-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 15:33 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs[1011-1021].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 15:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 10%: post T365986 repool', diff saved to https://phabricator.wikimedia.org/P65415 and previous config saved to /var/cache/conftool/dbconfig/20240625-153307-arnaudb.json
  • 15:31 Dreamy_Jazz: Ran `mwscript extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --wiki=testwiki` for T366781
  • 15:22 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 15:21 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 15:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 15:20 claime: Deploying statsd to mw-api-ext - T365265
  • 15:19 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 15:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es1035 (re)pooling @ 5%: post T365986 repool', diff saved to https://phabricator.wikimedia.org/P65414 and previous config saved to /var/cache/conftool/dbconfig/20240625-151802-arnaudb.json
  • 15:06 brennen@deploy1002: Finished deploy [phabricator/deployment@f58dd50]: deploy phab1004 for T368392 (duration: 00m 50s)
  • 15:05 brennen@deploy1002: Started deploy [phabricator/deployment@f58dd50]: deploy phab1004 for T368392
  • 15:05 brennen@deploy1002: Finished deploy [phabricator/deployment@f58dd50]: deploy phab2002 for T368392 (duration: 00m 33s)
  • 15:04 brennen@deploy1002: Started deploy [phabricator/deployment@f58dd50]: deploy phab2002 for T368392
  • 15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
  • 15:02 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
  • 15:00 topranks: rebooting lsw1-e5-eqiad to upgrade JunOS on switch T365986
  • 14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on 7 hosts with reason: JunOS upgrade lsw1-e5-eqiad
  • 14:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on 7 hosts with reason: JunOS upgrade lsw1-e5-eqiad
  • 14:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e5-eqiad,lsw1-e5-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e5-eqiad
  • 14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e5-eqiad,lsw1-e5-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e5-eqiad
  • 14:56 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:56 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on es1035.eqiad.wmnet with reason: T365986
  • 14:56 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:45:00 on es1035.eqiad.wmnet with reason: T365986
  • 14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'T365986 - depool es1035', diff saved to https://phabricator.wikimedia.org/P65413 and previous config saved to /var/cache/conftool/dbconfig/20240625-145558-arnaudb.json
  • 14:55 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e5-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e5-eqiad
  • 14:55 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:55 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e5-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e5-eqiad
  • 14:50 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:49 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:45 urbanecm@deploy1002: Finished scap: Backport for WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275) (duration: 11m 45s)
  • 14:40 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 14:40 urbanecm@deploy1002: urbanecm: Backport for WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:36 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:36 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dbproxy2005 to codfw - jhancock@cumin2002"
  • 14:35 sukhe: sudo cumin -b1 -s900 "A:dnsbox" "run-puppet-agent --enable 'rolling out CR 1049165' && systemctl restart ntp.service"
  • 14:35 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding dbproxy2005 to codfw - jhancock@cumin2002"
  • 14:33 vgutierrez: rolling upgrade of fifo-log-demux on A:cp-eqiad - T364383
  • 14:33 urbanecm@deploy1002: Started scap: Backport for WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275)
  • 14:30 vgutierrez: disable puppet on A:cp-eqiad before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049570 - T364383
  • 14:24 dcausse: re-indexing all wikidata entity schemas (T368010)
  • 14:23 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:17 urbanecm@deploy1002: Finished scap: Backport for Add change tag "Community Configuration" (T366989), Add change tag "Community Configuration" (T366989), WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275) (duration: 58m 28s)
  • 14:15 sukhe: sudo cumin "A:dnsbox" 'disable-puppet "rolling out CR 1049165"'
  • 14:12 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs[1011-1021].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 14:11 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
  • 14:10 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
  • 14:09 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 14:05 sukhe: restart pybal on lvs2014
  • 14:02 eevans@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 14:02 eevans@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 14:01 eevans@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 14:01 eevans@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 14:00 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs
  • 14:00 urbanecm@deploy1002: urbanecm: Backport for Add change tag "Community Configuration" (T366989), Add change tag "Community Configuration" (T366989), WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:59 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 13:59 eevans@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 13:54 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-eqiad and A:lvs
  • 13:51 sukhe: restart pybal on lvs1020
  • 13:44 sukhe: disable puppet on A:lvs and A:codfw for CR 1049560
  • 13:43 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:43 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt forkrb1002 - jclark@cumin1002"
  • 13:42 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt forkrb1002 - jclark@cumin1002"
  • 13:39 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 13:37 mvernon@cumin2002: conftool action : set/pooled=yes; selector: cluster=apus
  • 13:36 eevans@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 13:36 eevans@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 13:29 vgutierrez: IPIP encapsulation enabled on ldap-ro.eqiad.wikimedia.org - T367861
  • 13:26 vgutierrez: rolling restart of pybal on lvs1020 and lvs1018 - T367861
  • 13:18 urbanecm@deploy1002: Started scap: Backport for Add change tag "Community Configuration" (T366989), Add change tag "Community Configuration" (T366989), WikiPageWriter: Do not run AbuseFilter when UltimateAuthority is used (T368275)
  • 13:07 fabfur: temporary disabled puppet on cp4037 to test benthos configuration (T367756)
  • 12:51 cgoubert@deploy1002: Finished scap: Deploy udp2log rate-limiting - T365655 - T368098 (duration: 05m 49s)
  • 12:46 cgoubert@deploy1002: Started scap: Deploy udp2log rate-limiting - T365655 - T368098
  • 12:44 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 12:44 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 12:42 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 12:42 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 12:12 XioNoX: push NTP changes on pfw3
  • 12:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1172 (T364069)', diff saved to https://phabricator.wikimedia.org/P65411 and previous config saved to /var/cache/conftool/dbconfig/20240625-120926-marostegui.json
  • 12:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 12:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1172.eqiad.wmnet with reason: Maintenance
  • 11:58 vgutierrez: rolling upgrade of fifo-log-demux on A:cp-esams - T364383
  • 11:56 vgutierrez: disable puppet on A:cp-esams before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049529 - T364383
  • 11:55 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:53 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:45 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:45 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 10:40 marostegui: m2 dbmaint eqiad Stop db1217:3322 to clone db1228 T368374
  • 10:12 jmm@deploy1002: Finished scap: (no justification provided) (duration: 03m 30s)
  • 10:11 jmm@deploy1002: Started scap: (no justification provided)
  • 09:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on 11 hosts with reason: Turning down appserver clusters
  • 09:53 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on 11 hosts with reason: Turning down appserver clusters
  • 09:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on 25 hosts with reason: Turning down appserver clusters
  • 09:49 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on 25 hosts with reason: Turning down appserver clusters
  • 09:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db[1217,1228].eqiad.wmnet with reason: Cloning
  • 09:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db[1217,1228].eqiad.wmnet with reason: Cloning
  • 09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db1228 from dbctl T368374', diff saved to https://phabricator.wikimedia.org/P65409 and previous config saved to /var/cache/conftool/dbconfig/20240625-093454-marostegui.json
  • 09:34 slyngs: Switching idp-test.wikimedia.org to CAS 7
  • 09:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1228 T368374', diff saved to https://phabricator.wikimedia.org/P65408 and previous config saved to /var/cache/conftool/dbconfig/20240625-093221-root.json
  • 08:45 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2025.codfw.wmnet with reason: full dump
  • 08:45 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2025.codfw.wmnet with reason: full dump
  • 08:32 jynus@cumin1002: dbctl commit (dc=all): 'Depool es2025', diff saved to https://phabricator.wikimedia.org/P65407 and previous config saved to /var/cache/conftool/dbconfig/20240625-083216-jynus.json
  • 08:31 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on es2022.codfw.wmnet with reason: full dump
  • 08:31 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on es2022.codfw.wmnet with reason: full dump
  • 08:26 jynus@cumin1002: dbctl commit (dc=all): 'Depool es2022', diff saved to https://phabricator.wikimedia.org/P65406 and previous config saved to /var/cache/conftool/dbconfig/20240625-082649-jynus.json
  • 07:36 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host dborch1001.wikimedia.org
  • 07:32 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host dborch1001.wikimedia.org
  • 07:19 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:19 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 07:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65405 and previous config saved to /var/cache/conftool/dbconfig/20240625-071855-marostegui.json
  • 07:14 marostegui: Optimize pagelinks on old s8 codfw master db2165 dbmaint T364069
  • 07:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Long schema change
  • 07:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2165.codfw.wmnet with reason: Long schema change
  • 07:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P65404 and previous config saved to /var/cache/conftool/dbconfig/20240625-070348-marostegui.json
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2165 T368355', diff saved to https://phabricator.wikimedia.org/P65403 and previous config saved to /var/cache/conftool/dbconfig/20240625-070252-marostegui.json
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2161 to s8 primary T368355', diff saved to https://phabricator.wikimedia.org/P65402 and previous config saved to /var/cache/conftool/dbconfig/20240625-070127-marostegui.json
  • 07:01 marostegui: Starting s8 codfw failover from db2165 to db2161 - T368355
  • 07:00 arnaudb@deploy1002: Finished scap: Backport for Revert "dbconfig: temporary disable writes on es7" (duration: 07m 47s)
  • 06:55 arnaudb@deploy1002: arnaudb: Continuing with sync
  • 06:55 arnaudb@deploy1002: arnaudb: Backport for Revert "dbconfig: temporary disable writes on es7" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:54 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 06:52 arnaudb@deploy1002: Started scap: Backport for Revert "dbconfig: temporary disable writes on es7"
  • 06:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167', diff saved to https://phabricator.wikimedia.org/P65401 and previous config saved to /var/cache/conftool/dbconfig/20240625-064841-marostegui.json
  • 06:45 arnaudb@deploy1002: Sync cancelled.
  • 06:45 arnaudb@deploy1002: arnaudb: Backport for Revert "dbconfig: temporary disable writes on es7" synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:42 arnaudb@deploy1002: Started scap: Backport for Revert "dbconfig: temporary disable writes on es7"
  • 06:40 arnaudb@cumin1002: dbctl commit (dc=all): 'T368020', diff saved to https://phabricator.wikimedia.org/P65400 and previous config saved to /var/cache/conftool/dbconfig/20240625-064000-arnaudb.json
  • 06:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s8 T368355
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2161 with weight 0 T368355', diff saved to https://phabricator.wikimedia.org/P65399 and previous config saved to /var/cache/conftool/dbconfig/20240625-063908-root.json
  • 06:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s8 T368355
  • 06:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Promote es1039 to es7 primary T368020', diff saved to https://phabricator.wikimedia.org/P65398 and previous config saved to /var/cache/conftool/dbconfig/20240625-063453-arnaudb.json
  • 06:33 arnaudb: Starting es7 eqiad failover from es1035 to es1039 - T368020
  • 06:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65397 and previous config saved to /var/cache/conftool/dbconfig/20240625-063334-marostegui.json
  • 06:26 arnaudb@cumin1002: dbctl commit (dc=all): 'Set es1039 with weight 0 T368020', diff saved to https://phabricator.wikimedia.org/P65396 and previous config saved to /var/cache/conftool/dbconfig/20240625-062640-arnaudb.json
  • 06:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 6 hosts with reason: Primary switchover es7 T368020
  • 06:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 6 hosts with reason: Primary switchover es7 T368020
  • 06:24 arnaudb@deploy1002: Finished scap: Backport for dbconfig: temporary disable writes on es7 (T368020) (duration: 18m 47s)
  • 06:19 arnaudb@deploy1002: arnaudb: Continuing with sync
  • 06:17 arnaudb@deploy1002: arnaudb: Backport for dbconfig: temporary disable writes on es7 (T368020) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 06:11 marostegui: Drop ipblocks from s7 T367632
  • 06:05 arnaudb@deploy1002: Started scap: Backport for dbconfig: temporary disable writes on es7 (T368020)
  • 06:02 marostegui: Drop ipblocks from s6 T367632
  • 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65395 and previous config saved to /var/cache/conftool/dbconfig/20240625-053312-marostegui.json
  • 05:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 05:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 05:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 05:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1167.eqiad.wmnet with reason: Maintenance
  • 05:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T367856)', diff saved to https://phabricator.wikimedia.org/P65394 and previous config saved to /var/cache/conftool/dbconfig/20240625-053239-marostegui.json
  • 05:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 05:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.8 (duration: 00m 55s)
  • 03:55 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.11 refs T366956 (duration: 52m 19s)
  • 03:03 mwpresync@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.11 refs T366956
  • 01:48 brett: Running authdns-update on dns1004 to pool eqsin - T365763
  • 01:43 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: cluster=cache_text,dc=eqsin
  • 01:40 brett: Removing downtime for cp[5017-5024] as nvme drives are installed and hosts back online - T365763
  • 00:43 sukhe: [correction of command] sudo pkill ffmpeg: mw1438, high CPU usage, ffmpeg processes
  • 00:43 sukhe: sudo pkill mpeg: mw1438, high CPU usage, ffmpeg processes
  • 00:01 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on 8 hosts with reason: T365763
  • 00:01 brett@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on 8 hosts with reason: T365763

2024-06-24

  • 23:02 brett: Running authdns-update on dns1004 to depool eqsin - T365763
  • 23:00 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2003.codfw.wmnet
  • 23:00 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 23:00 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
  • 22:57 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2003.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
  • 22:53 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 22:46 cwhite@cumin2002: START - Cookbook sre.hosts.decommission for hosts logstash2003.codfw.wmnet
  • 22:46 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2002.codfw.wmnet
  • 22:46 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:46 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
  • 22:41 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
  • 22:38 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 22:26 cwhite@cumin2002: START - Cookbook sre.hosts.decommission for hosts logstash2002.codfw.wmnet
  • 22:26 cwhite@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts logstash2001.codfw.wmnet
  • 22:26 cwhite@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 22:26 cwhite@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
  • 22:24 cwhite@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: logstash2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - cwhite@cumin2002"
  • 22:18 cwhite@cumin2002: START - Cookbook sre.dns.netbox
  • 22:11 cwhite@cumin2002: START - Cookbook sre.hosts.decommission for hosts logstash2001.codfw.wmnet
  • 21:34 inflatador: bking@alerts1001 uninstall deb pkg `ripgrep` T368107
  • 21:03 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-redacteddb1001.eqiad.wmnet
  • 21:00 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 20:53 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-redacteddb1001.eqiad.wmnet
  • 20:36 inflatador: bking@alert1001 install `ripgrep` deb pkg T368107
  • 20:22 ladsgroup@deploy1002: Synchronized php-1.43.0-wmf.10/includes/libs/rdbms/loadbalancer/LoadBalancer.php: (no justification provided) (duration: 11m 04s)
  • 20:21 mutante: snapsho1017 - systemctl mask commonsrdf-dump ; systemctl mask commonsjson-dump T368098
  • 20:18 taavi: taavi@snapshot1017 ~ $ sudo systemctl stop commons*.service
  • 20:01 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1056.eqiad.wmnet with OS bookworm
  • 19:35 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
  • 19:32 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1056.eqiad.wmnet with reason: host reimage
  • 19:27 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 19:27 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 19:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:08 mutante: LDAP - added daphnesmit to group 'wmf' - Phabricator: added dsmit-wmf to WMF-NDA group T368140
  • 19:02 sukhe: ms-fe1009: restart swift-proxy: T360913
  • 18:59 mutante: ms-fe1011 - restarted swift-proxy
  • 18:53 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 18:52 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 15 hosts
  • 18:52 eevans@cumin1002: START - Cookbook sre.hosts.remove-downtime for 15 hosts
  • 18:50 eevans@cumin1002: END (ERROR) - Cookbook sre.cassandra.roll-restart (exit_code=97) for nodes matching A:restbase-eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 18:50 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 18:50 sukhe: sudo cumin -s1 -b60 'ms-fe1010*,ms-fe1013*' 'systemctl restart swift-proxy'
  • 18:50 mutante: ms-fe1010,ms-fe1013 - restart swift-proxy - T360913
  • 18:48 ladsgroup@deploy1002: Synchronized private/PrivateSettings.php: Rotate ChronologyProtector secret (duration: 11m 33s)
  • 18:46 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "d"} and A:restbase and A:eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 18:43 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 18:41 ladsgroup@deploy1002: ladsgroup: Rotate ChronologyProtector secret synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:17 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1055.eqiad.wmnet with OS bookworm
  • 18:16 mutante: ms-fe1012:~] $ sudo systemctl restart swift-proxy T360913
  • 18:16 mutante: ms-fe1012:~] $ sudo systemctl restart swift-proxy T360931
  • 18:07 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 18:06 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 18:06 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 18:05 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 18:04 sukhe@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1020.eqiad.wmnet
  • 18:04 sukhe@cumin1002: START - Cookbook sre.hosts.remove-downtime for lvs1020.eqiad.wmnet
  • 18:03 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "d"} and A:restbase and A:eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 18:02 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching P{P:cassandra%rack = "b"} and A:restbase and A:eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 17:57 sukhe: restart on pybal lvs1019
  • 17:56 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs
  • 17:53 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=apus,dc=eqiad
  • 17:50 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs
  • 17:50 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
  • 17:49 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-low-traffic-codfw and A:lvs
  • 17:48 sbassett@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 17:48 sbassett@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 17:48 sbassett@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 17:48 sukhe@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs
  • 17:47 sbassett@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 17:47 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 17:47 sukhe@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw and A:lvs
  • 17:47 sbassett@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 17:47 sbassett@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 17:46 sbassett@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 17:46 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:46 sbassett@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 17:46 sbassett@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 17:46 sbassett@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 17:45 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:44 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:44 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1055.eqiad.wmnet with reason: host reimage
  • 17:43 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:34 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: cluster=apus,dc=codfw
  • 17:33 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:32 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:28 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1055.eqiad.wmnet with OS bookworm
  • 17:28 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:27 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:23 sukhe: restart pybal on lvs2013
  • 17:20 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 17:19 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 17:18 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching P{P:cassandra%rack = "b"} and A:restbase and A:eqiad: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 17:13 sukhe: restart pybal on lvs1020 and lvs1019
  • 17:09 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:08 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 16:55 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 16:51 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 16:49 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 16:48 sukhe: restart pybal on lvs1020
  • 16:47 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 16:44 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 16:41 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1054.eqiad.wmnet with reason: host reimage
  • 16:33 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase[1031,1034-1036].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 16:27 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1054.eqiad.wmnet with OS bookworm
  • 16:20 sukhe: restart pybal on lvs1020
  • 16:01 dancy@deploy1002: Installation of scap version "4.89.0" completed for 1 hosts
  • 16:00 dancy@deploy1002: Installing scap version "4.89.0" for 1 hosts
  • 15:59 sukhe: restart pybal on lvs1020
  • 15:59 dancy@deploy1002: Installing scap version "4.89.0" for 248 hosts
  • 15:57 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase[1031,1034-1036].eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 15:50 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching aqs1010.eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 15:49 sukhe: restart pybal on lvs1020
  • 15:43 vgutierrez: updated termination_state cache haproxy metrics, expect higher CD and CR rates - T367963
  • 15:42 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching aqs1010.eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 15:29 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: sync
  • 15:29 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: sync
  • 15:20 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: sync
  • 15:20 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: sync
  • 15:17 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 15:16 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: sync
  • 15:16 elukey@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: sync
  • 15:15 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 15:15 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 15:13 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 15:11 mvernon@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs (T279621)
  • 15:11 claime: Enabling statsd-exporter on mw-jobrunner - T365265
  • 15:11 mvernon@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-eqiad and A:lvs (T279621)
  • 15:09 vgutierrez: rolling upgrade of fifo-log-demux on A:cp-drmrs - T364383
  • 15:08 Emperor: enable/run puppet on eqiad lvs for apus LVS rollout T279621
  • 15:08 Dreamy_Jazz: Afternoon UTC backport window done
  • 15:08 dreamyjazz@deploy1002: Finished scap: Backport for extension-list: Add IPReputation (T360067) (duration: 30m 37s)
  • 15:07 vgutierrez: [fixed url] disable puppet on A:cp-drmrs before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049198 - T364383
  • 15:06 vgutierrez: disable puppet on A:cp-drmrs before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049104 - T364383
  • 15:02 sukhe: restart pybal on lvs2014
  • 15:01 mvernon@cumin1002: END (ERROR) - Cookbook sre.loadbalancer.restart-pybal (exit_code=97) rolling-restart of pybal on A:lvs-secondary-codfw or A:lvs-low-traffic-codfw and A:lvs (T279621)
  • 15:00 dreamyjazz@deploy1002: kharlan, dreamyjazz: Continuing with sync
  • 14:57 dreamyjazz@deploy1002: kharlan, dreamyjazz: Backport for extension-list: Add IPReputation (T360067) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:56 mvernon@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on A:lvs-secondary-codfw or A:lvs-low-traffic-codfw and A:lvs (T279621)
  • 14:53 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4052.ulsfo.wmnet
  • 14:52 Emperor: enable/run puppet on codfw lvs for apus LVS rollout T279621
  • 14:49 Emperor: stop puppet on eqiad/codfw lvs prior to apus LVS rollout T279621
  • 14:48 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on cp4052.ulsfo.wmnet with reason: Upgrade glibc
  • 14:48 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on cp4052.ulsfo.wmnet with reason: Upgrade glibc
  • 14:47 elukey: depool cp4052 to deploy a new version of glibc - T367978
  • 14:47 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4052.ulsfo.wmnet
  • 14:37 dreamyjazz@deploy1002: Started scap: Backport for extension-list: Add IPReputation (T360067)
  • 14:34 urbanecm@deploy1002: Finished scap: Backport for ptwiki: Undeploy CommunityConfiguration (T368121) (duration: 07m 31s)
  • 14:32 mnz@deploy1002: Finished deploy [airflow-dags/research@b682892]: (no justification provided) (duration: 00m 31s)
  • 14:31 mnz@deploy1002: Started deploy [airflow-dags/research@b682892]: (no justification provided)
  • 14:27 urbanecm@deploy1002: Started scap: Backport for ptwiki: Undeploy CommunityConfiguration (T368121)
  • 14:27 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 14:17 mvernon@cumin1002: conftool action : set/pooled=yes:weight=40; selector: cluster=apus
  • 14:13 urbanecm@deploy1002: Finished scap: Backport for Growth: Enable CommunityConfiguration at idwiki (T366629), Growth: Enable CommunityConfiguration on round 1 wikis (T368121), AX Language selector entrypoint: Fix AX URL (T363183) (duration: 25m 37s)
  • 14:07 urbanecm@deploy1002: kartik, urbanecm: Continuing with sync
  • 14:03 sukhe: running homer in cr*{eqiad*,codfw*} to remove ntp.anycast.wmnet from policies/cr-labs: T366360
  • 13:57 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 13:56 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 13:49 urbanecm@deploy1002: kartik, urbanecm: Backport for Growth: Enable CommunityConfiguration at idwiki (T366629), Growth: Enable CommunityConfiguration on round 1 wikis (T368121), AX Language selector entrypoint: Fix AX URL (T363183) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:47 urbanecm@deploy1002: Started scap: Backport for Growth: Enable CommunityConfiguration at idwiki (T366629), Growth: Enable CommunityConfiguration on round 1 wikis (T368121), AX Language selector entrypoint: Fix AX URL (T363183)
  • 13:47 elukey@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 13:44 urbanecm@deploy1002: Finished scap: Backport for mediawiki.org: Sync xml/export-*.xsd files with MW core (T343622), CommonSettings: Restore the original behaviour of Reference Previews (T366419), [MediaModeration] Update 'From' email address to wiki@wikimedia.org (T368258) (duration: 35m 30s)
  • 13:41 vgutierrez: disable puppet on A:cp-magru before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1049178 - T364383
  • 13:40 vgutierrez: rolling upgrade of fifo-log-demux on A:cp-magru - T364383
  • 13:40 elukey: [correction] depool cp4037 to deploy a new version of glibc - T367978
  • 13:40 elukey@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:15:00 on cp4037.ulsfo.wmnet with reason: Upgrade glibc
  • 13:39 elukey@cumin1002: START - Cookbook sre.hosts.downtime for 0:15:00 on cp4037.ulsfo.wmnet with reason: Upgrade glibc
  • 13:39 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 13:37 urbanecm@deploy1002: urbanecm, func, dreamyjazz: Continuing with sync
  • 13:35 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 13:35 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 13:32 elukey@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4034.ulsfo.wmnet
  • 13:31 elukey: depool cp4034 to deploy a new version of glibc - T367978
  • 13:29 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 13:29 elukey: uploaded debmonitor-client_0.4.0 to apt.wikimedia.org buster-wikimedia,bullseye-wikimedia,bookworm-wikimedia
  • 13:29 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 13:24 urbanecm@deploy1002: urbanecm, func, dreamyjazz: Backport for mediawiki.org: Sync xml/export-*.xsd files with MW core (T343622), CommonSettings: Restore the original behaviour of Reference Previews (T366419), [MediaModeration] Update 'From' email address to wiki@wikimedia.org (T368258) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:22 mnz@deploy1002: Finished deploy [airflow-dags/research@1996a7a]: (no justification provided) (duration: 00m 32s)
  • 13:22 mnz@deploy1002: Started deploy [airflow-dags/research@1996a7a]: (no justification provided)
  • 13:08 urbanecm@deploy1002: Started scap: Backport for mediawiki.org: Sync xml/export-*.xsd files with MW core (T343622), CommonSettings: Restore the original behaviour of Reference Previews (T366419), [MediaModeration] Update 'From' email address to wiki@wikimedia.org (T368258)
  • 12:53 vgutierrez: IPIP encapsulation enabled on ldap-ro.codfw.wikimedia.org. - T367861
  • 12:50 vgutierrez: rolling restart of pybal on lvs2014 and lvs2012 - T367861
  • 12:23 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 12:21 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 12:17 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:16 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=appserver,name=mw1364.eqiad.wmnet
  • 12:16 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: dc=codfw,cluster=appserver,name=mw2276.codfw.wmnet
  • 12:15 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: dc=eqiad,cluster=api_appserver,name=mw1398.eqiad.wmnet
  • 12:15 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: dc=codfw,cluster=api_appserver,name=mw2299.codfw.wmnet
  • 12:13 moritzm: installing pymysql security updates
  • 12:11 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 12:10 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mwdebug[2001-2002].codfw.wmnet,mwdebug[1001-1002].eqiad.wmnet
  • 12:10 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mwdebug[2001-2002].codfw.wmnet,mwdebug[1001-1002].eqiad.wmnet
  • 12:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 21 days, 0:00:00 on 31 hosts with reason: Waiting for reimage to kubernetes
  • 12:10 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 12:09 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 21 days, 0:00:00 on 31 hosts with reason: Waiting for reimage to kubernetes
  • 12:09 claime: Downtiming all legacy api_appserver and appserver - T368058
  • 12:07 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: cluster=api_appserver
  • 12:07 claime: Setting all legacy api_appservers to inactive - T368058
  • 12:07 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: cluster=appserver
  • 12:06 claime: Setting all legacy appservers to inactive - T368058
  • 12:05 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 12:04 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 12:01 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 11:59 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 11:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 11:55 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'revertrisk' for release 'main' .
  • 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2360.codfw.wmnet
  • 11:52 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2360.codfw.wmnet
  • 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2358.codfw.wmnet
  • 11:52 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2358.codfw.wmnet
  • 11:52 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 11:52 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2339.codfw.wmnet
  • 11:52 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2339.codfw.wmnet
  • 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw1406.eqiad.wmnet
  • 11:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw1406.eqiad.wmnet
  • 11:51 jgiannelos@deploy1002: helmfile [eqiad] DONE helmfile.d/services/push-notifications: apply
  • 11:51 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw1403.eqiad.wmnet
  • 11:51 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw1403.eqiad.wmnet
  • 11:51 jgiannelos@deploy1002: helmfile [eqiad] START helmfile.d/services/push-notifications: apply
  • 11:50 jgiannelos@deploy1002: helmfile [codfw] DONE helmfile.d/services/push-notifications: apply
  • 11:49 jgiannelos@deploy1002: helmfile [codfw] START helmfile.d/services/push-notifications: apply
  • 11:49 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 11:46 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 11:45 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/push-notifications: apply
  • 11:44 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 11:44 moritzm: installing php8.2 security updates
  • 11:42 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 11:26 klausman@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 11:13 btullis@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:13 btullis@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove AAAA records from an-redacteddb1001 - btullis@cumin1002"
  • 11:01 btullis@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Remove AAAA records from an-redacteddb1001 - btullis@cumin1002"
  • 10:58 btullis@cumin1002: START - Cookbook sre.dns.netbox
  • 10:51 cgoubert@cumin1002: conftool action : set/pooled=yes:weight=10; selector: name=(mw1420.eqiad.wmnet|mw1407.eqiad.wmnet),dc=eqiad,cluster=jobrunner
  • 10:50 cgoubert@cumin1002: conftool action : set/pooled=no:weight=10; selector: name=(mw1420.eqiad.wmnet|mw1407.eqiad.wmnet),dc=eqiad,cluster=jobrunner
  • 10:41 cgoubert@cumin1002: conftool action : set/pooled=no:weight=10; selector: name=(mw1420.eqiad.wmnet|mw1407.eqiad.wmnet),dc=eqiad,cluster=videoscaler
  • 10:41 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/push-notifications: apply
  • 10:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1420.eqiad.wmnet
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1420.eqiad.wmnet
  • 10:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw1407.eqiad.wmnet
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw1407.eqiad.wmnet
  • 10:39 claime: pooling mw1420.eqiad.wmnet,mw1407.eqiad.wmnet as videoscalers - T368058
  • 10:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1407.eqiad.wmnet with OS buster
  • 10:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host mw1420.eqiad.wmnet with OS buster
  • 10:14 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:10 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:04 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 10:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1407.eqiad.wmnet with reason: host reimage
  • 09:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw1420.eqiad.wmnet with reason: host reimage
  • 09:57 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1004.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 09:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1407.eqiad.wmnet with reason: host reimage
  • 09:56 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on mw1420.eqiad.wmnet with reason: host reimage
  • 09:46 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: apply
  • 09:45 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 09:44 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 09:44 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 09:44 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 09:44 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 09:43 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 09:42 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1420.eqiad.wmnet with OS buster
  • 09:42 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: apply
  • 09:42 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: apply
  • 09:41 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host mw1407.eqiad.wmnet with OS buster
  • 09:39 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: apply
  • 09:34 claime: Reimaging scap::proxies, mediawiki deployments may be unavailable - T368058
  • 09:33 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 09:33 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 09:22 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:20 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for checker.tools.wmflabs.org
  • 09:20 taavi@cumin1002: START - Cookbook sre.hosts.remove-downtime for checker.tools.wmflabs.org
  • 09:19 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:17 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on checker.tools.wmflabs.org with reason: rebooting the toolschecker VM
  • 09:17 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on checker.tools.wmflabs.org with reason: rebooting the toolschecker VM
  • 09:10 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:10 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 05:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 05:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367856)', diff saved to https://phabricator.wikimedia.org/P65388 and previous config saved to /var/cache/conftool/dbconfig/20240624-050309-marostegui.json
  • 04:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P65387 and previous config saved to /var/cache/conftool/dbconfig/20240624-044802-marostegui.json
  • 04:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P65386 and previous config saved to /var/cache/conftool/dbconfig/20240624-043254-marostegui.json
  • 04:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367856)', diff saved to https://phabricator.wikimedia.org/P65385 and previous config saved to /var/cache/conftool/dbconfig/20240624-041747-marostegui.json
  • 01:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T367856)', diff saved to https://phabricator.wikimedia.org/P65384 and previous config saved to /var/cache/conftool/dbconfig/20240624-015859-marostegui.json
  • 01:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 01:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 01:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367856)', diff saved to https://phabricator.wikimedia.org/P65383 and previous config saved to /var/cache/conftool/dbconfig/20240624-015836-marostegui.json
  • 01:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P65382 and previous config saved to /var/cache/conftool/dbconfig/20240624-014329-marostegui.json
  • 01:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P65381 and previous config saved to /var/cache/conftool/dbconfig/20240624-012822-marostegui.json
  • 01:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367856)', diff saved to https://phabricator.wikimedia.org/P65380 and previous config saved to /var/cache/conftool/dbconfig/20240624-011315-marostegui.json

2024-06-23

  • 22:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T367856)', diff saved to https://phabricator.wikimedia.org/P65379 and previous config saved to /var/cache/conftool/dbconfig/20240623-225008-marostegui.json
  • 22:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 22:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 22:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367856)', diff saved to https://phabricator.wikimedia.org/P65378 and previous config saved to /var/cache/conftool/dbconfig/20240623-224946-marostegui.json
  • 22:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P65377 and previous config saved to /var/cache/conftool/dbconfig/20240623-223439-marostegui.json
  • 22:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P65376 and previous config saved to /var/cache/conftool/dbconfig/20240623-221932-marostegui.json
  • 22:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367856)', diff saved to https://phabricator.wikimedia.org/P65375 and previous config saved to /var/cache/conftool/dbconfig/20240623-220426-marostegui.json
  • 19:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T367856)', diff saved to https://phabricator.wikimedia.org/P65374 and previous config saved to /var/cache/conftool/dbconfig/20240623-193306-marostegui.json
  • 19:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 19:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 19:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65373 and previous config saved to /var/cache/conftool/dbconfig/20240623-193244-marostegui.json
  • 19:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P65372 and previous config saved to /var/cache/conftool/dbconfig/20240623-191737-marostegui.json
  • 19:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P65371 and previous config saved to /var/cache/conftool/dbconfig/20240623-190230-marostegui.json
  • 18:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65370 and previous config saved to /var/cache/conftool/dbconfig/20240623-184722-marostegui.json
  • 16:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65369 and previous config saved to /var/cache/conftool/dbconfig/20240623-161243-marostegui.json
  • 16:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 16:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 16:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T367856)', diff saved to https://phabricator.wikimedia.org/P65368 and previous config saved to /var/cache/conftool/dbconfig/20240623-161221-marostegui.json
  • 15:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P65367 and previous config saved to /var/cache/conftool/dbconfig/20240623-155714-marostegui.json
  • 15:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173', diff saved to https://phabricator.wikimedia.org/P65366 and previous config saved to /var/cache/conftool/dbconfig/20240623-154207-marostegui.json
  • 15:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1173 (T367856)', diff saved to https://phabricator.wikimedia.org/P65365 and previous config saved to /var/cache/conftool/dbconfig/20240623-152700-marostegui.json
  • 12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1173 (T367856)', diff saved to https://phabricator.wikimedia.org/P65364 and previous config saved to /var/cache/conftool/dbconfig/20240623-124522-marostegui.json
  • 12:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 12:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1173.eqiad.wmnet with reason: Maintenance
  • 12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367856)', diff saved to https://phabricator.wikimedia.org/P65363 and previous config saved to /var/cache/conftool/dbconfig/20240623-124459-marostegui.json
  • 12:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P65362 and previous config saved to /var/cache/conftool/dbconfig/20240623-122952-marostegui.json
  • 12:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P65361 and previous config saved to /var/cache/conftool/dbconfig/20240623-121445-marostegui.json
  • 11:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367856)', diff saved to https://phabricator.wikimedia.org/P65360 and previous config saved to /var/cache/conftool/dbconfig/20240623-115938-marostegui.json
  • 11:06 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: sync
  • 11:06 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: sync
  • 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T367856)', diff saved to https://phabricator.wikimedia.org/P65359 and previous config saved to /var/cache/conftool/dbconfig/20240623-092833-marostegui.json
  • 09:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 09:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367856)', diff saved to https://phabricator.wikimedia.org/P65358 and previous config saved to /var/cache/conftool/dbconfig/20240623-092811-marostegui.json
  • 09:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P65357 and previous config saved to /var/cache/conftool/dbconfig/20240623-091304-marostegui.json
  • 08:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P65356 and previous config saved to /var/cache/conftool/dbconfig/20240623-085757-marostegui.json
  • 08:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367856)', diff saved to https://phabricator.wikimedia.org/P65355 and previous config saved to /var/cache/conftool/dbconfig/20240623-084250-marostegui.json
  • 06:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T367856)', diff saved to https://phabricator.wikimedia.org/P65354 and previous config saved to /var/cache/conftool/dbconfig/20240623-060520-marostegui.json
  • 06:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 06:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 06:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1165.eqiad.wmnet with reason: Maintenance

2024-06-22

  • 20:28 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 20:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2200.codfw.wmnet with reason: Maintenance
  • 16:18 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 16:18 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2198.codfw.wmnet with reason: Maintenance
  • 16:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65353 and previous config saved to /var/cache/conftool/dbconfig/20240622-161841-marostegui.json
  • 16:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P65352 and previous config saved to /var/cache/conftool/dbconfig/20240622-160333-marostegui.json
  • 15:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195', diff saved to https://phabricator.wikimedia.org/P65351 and previous config saved to /var/cache/conftool/dbconfig/20240622-154826-marostegui.json
  • 15:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65350 and previous config saved to /var/cache/conftool/dbconfig/20240622-153318-marostegui.json
  • 12:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65349 and previous config saved to /var/cache/conftool/dbconfig/20240622-120437-marostegui.json
  • 12:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 12:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2195.codfw.wmnet with reason: Maintenance
  • 12:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65348 and previous config saved to /var/cache/conftool/dbconfig/20240622-120404-marostegui.json
  • 11:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P65347 and previous config saved to /var/cache/conftool/dbconfig/20240622-114857-marostegui.json
  • 11:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181', diff saved to https://phabricator.wikimedia.org/P65346 and previous config saved to /var/cache/conftool/dbconfig/20240622-113350-marostegui.json
  • 11:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65345 and previous config saved to /var/cache/conftool/dbconfig/20240622-111842-marostegui.json
  • 06:48 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65344 and previous config saved to /var/cache/conftool/dbconfig/20240622-064802-marostegui.json
  • 06:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 06:47 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2181.codfw.wmnet with reason: Maintenance
  • 06:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65343 and previous config saved to /var/cache/conftool/dbconfig/20240622-064739-marostegui.json
  • 06:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P65342 and previous config saved to /var/cache/conftool/dbconfig/20240622-063232-marostegui.json
  • 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167', diff saved to https://phabricator.wikimedia.org/P65341 and previous config saved to /var/cache/conftool/dbconfig/20240622-061725-marostegui.json
  • 06:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65340 and previous config saved to /var/cache/conftool/dbconfig/20240622-060216-marostegui.json
  • 05:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db2197.codfw.wmnet with reason: Long schema change
  • 05:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db2197.codfw.wmnet with reason: Long schema change
  • 01:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2167 (T364069)', diff saved to https://phabricator.wikimedia.org/P65339 and previous config saved to /var/cache/conftool/dbconfig/20240622-015020-marostegui.json
  • 01:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 01:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2167.codfw.wmnet with reason: Maintenance
  • 01:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T364069)', diff saved to https://phabricator.wikimedia.org/P65338 and previous config saved to /var/cache/conftool/dbconfig/20240622-014958-marostegui.json
  • 01:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P65337 and previous config saved to /var/cache/conftool/dbconfig/20240622-013451-marostegui.json
  • 01:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166', diff saved to https://phabricator.wikimedia.org/P65336 and previous config saved to /var/cache/conftool/dbconfig/20240622-011943-marostegui.json
  • 01:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2166 (T364069)', diff saved to https://phabricator.wikimedia.org/P65335 and previous config saved to /var/cache/conftool/dbconfig/20240622-010436-marostegui.json

2024-06-21

  • 23:54 cwhite: delete remaining 2024.03 log indexes to make room on logstash eqiad and codfw T368180
  • 23:43 brett@puppetmaster1001: dbctl commit (dc=all): 'set db1206 s1 weight to 1 - T368098', diff saved to https://phabricator.wikimedia.org/P65334 and previous config saved to /var/cache/conftool/dbconfig/20240621-234328-brett.json
  • 23:28 brett: # dbctl instance db1206 set-weight 10 --section s1
  • 21:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367856)', diff saved to https://phabricator.wikimedia.org/P65333 and previous config saved to /var/cache/conftool/dbconfig/20240621-213503-marostegui.json
  • 21:31 cwhite: restart apache2 on gerrit1003
  • 21:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P65332 and previous config saved to /var/cache/conftool/dbconfig/20240621-211956-marostegui.json
  • 21:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P65331 and previous config saved to /var/cache/conftool/dbconfig/20240621-210448-marostegui.json
  • 20:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367856)', diff saved to https://phabricator.wikimedia.org/P65330 and previous config saved to /var/cache/conftool/dbconfig/20240621-204941-marostegui.json
  • 20:37 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2166 (T364069)', diff saved to https://phabricator.wikimedia.org/P65329 and previous config saved to /var/cache/conftool/dbconfig/20240621-203659-marostegui.json
  • 20:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 20:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2166.codfw.wmnet with reason: Maintenance
  • 20:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T364069)', diff saved to https://phabricator.wikimedia.org/P65328 and previous config saved to /var/cache/conftool/dbconfig/20240621-203636-marostegui.json
  • 20:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P65327 and previous config saved to /var/cache/conftool/dbconfig/20240621-202129-marostegui.json
  • 20:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164', diff saved to https://phabricator.wikimedia.org/P65326 and previous config saved to /var/cache/conftool/dbconfig/20240621-200622-marostegui.json
  • 19:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2164 (T364069)', diff saved to https://phabricator.wikimedia.org/P65325 and previous config saved to /var/cache/conftool/dbconfig/20240621-195115-marostegui.json
  • 19:43 aikochou@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 16:00 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:59 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 15:41 elukey@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 15:41 elukey@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 15:40 elukey@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox-canary
  • 15:39 elukey@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 15:37 elukey@cumin1002: END (FAIL) - Cookbook sre.netbox.update-extras (exit_code=1) rolling restart_daemons on A:netbox-canary
  • 15:37 elukey@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox-canary
  • 15:20 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2164 (T364069)', diff saved to https://phabricator.wikimedia.org/P65322 and previous config saved to /var/cache/conftool/dbconfig/20240621-152038-marostegui.json
  • 15:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2164.codfw.wmnet with reason: Maintenance
  • 15:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T364069)', diff saved to https://phabricator.wikimedia.org/P65321 and previous config saved to /var/cache/conftool/dbconfig/20240621-152011-marostegui.json
  • 15:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P65319 and previous config saved to /var/cache/conftool/dbconfig/20240621-150504-marostegui.json
  • 15:01 ejegg: fundraising civicrm upgraded from 8a0b5bea to 13a13f3a
  • 14:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163', diff saved to https://phabricator.wikimedia.org/P65318 and previous config saved to /var/cache/conftool/dbconfig/20240621-144957-marostegui.json
  • 14:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2163 (T364069)', diff saved to https://phabricator.wikimedia.org/P65317 and previous config saved to /var/cache/conftool/dbconfig/20240621-143450-marostegui.json
  • 14:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T367856)', diff saved to https://phabricator.wikimedia.org/P65314 and previous config saved to /var/cache/conftool/dbconfig/20240621-141050-marostegui.json
  • 14:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 14:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 14:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T367856)', diff saved to https://phabricator.wikimedia.org/P65313 and previous config saved to /var/cache/conftool/dbconfig/20240621-141028-marostegui.json
  • 13:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P65312 and previous config saved to /var/cache/conftool/dbconfig/20240621-135521-marostegui.json
  • 13:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214', diff saved to https://phabricator.wikimedia.org/P65309 and previous config saved to /var/cache/conftool/dbconfig/20240621-134013-marostegui.json
  • 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2214 (T367856)', diff saved to https://phabricator.wikimedia.org/P65306 and previous config saved to /var/cache/conftool/dbconfig/20240621-132506-marostegui.json
  • 13:21 btullis@deploy1002: Finished deploy [performance/asoranking@febfb9f]: (no justification provided) (duration: 00m 04s)
  • 13:21 btullis@deploy1002: Started deploy [performance/asoranking@febfb9f]: (no justification provided)
  • 13:08 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 13:07 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 13:07 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 13:07 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 13:07 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 13:06 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 11:37 hnowlan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) shellbox-video.discovery.wmnet on all recursors
  • 11:37 hnowlan@cumin1002: START - Cookbook sre.dns.wipe-cache shellbox-video.discovery.wmnet on all recursors
  • 11:06 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2214 (T367856)', diff saved to https://phabricator.wikimedia.org/P65303 and previous config saved to /var/cache/conftool/dbconfig/20240621-110638-marostegui.json
  • 11:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 11:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2214.codfw.wmnet with reason: Maintenance
  • 10:57 Emperor: restart swift-proxy on ms-fe2011 ms-fe2012 T360913
  • 10:56 Emperor: restart swift-proxy on ms-fe1010 T360913
  • 10:36 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2002.codfw.wmnet
  • 10:36 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2001.codfw.wmnet
  • 10:28 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 10:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2163 (T364069)', diff saved to https://phabricator.wikimedia.org/P65302 and previous config saved to /var/cache/conftool/dbconfig/20240621-100554-marostegui.json
  • 10:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 10:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2163.codfw.wmnet with reason: Maintenance
  • 10:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T364069)', diff saved to https://phabricator.wikimedia.org/P65301 and previous config saved to /var/cache/conftool/dbconfig/20240621-100531-marostegui.json
  • 09:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P65300 and previous config saved to /var/cache/conftool/dbconfig/20240621-095024-marostegui.json
  • 09:45 brouberol@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12 days, 0:00:00 on karapace[1001-1002].eqiad.wmnet with reason: The hosts are soon to be decommissioned
  • 09:45 brouberol@cumin2002: START - Cookbook sre.hosts.downtime for 12 days, 0:00:00 on karapace[1001-1002].eqiad.wmnet with reason: The hosts are soon to be decommissioned
  • 09:41 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bookworm
  • 09:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162', diff saved to https://phabricator.wikimedia.org/P65299 and previous config saved to /var/cache/conftool/dbconfig/20240621-093517-marostegui.json
  • 09:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.wdqs.data-reload (exit_code=0) reloading wikidata_full on wdqs2023.codfw.wmnet from DumpsSource.HDFS (hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/full/20240603/ using stat1009.eqiad.wmnet)
  • 09:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2162 (T364069)', diff saved to https://phabricator.wikimedia.org/P65298 and previous config saved to /var/cache/conftool/dbconfig/20240621-092009-marostegui.json
  • 09:16 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
  • 09:14 aborrero@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
  • 09:02 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
  • 08:57 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
  • 08:56 aborrero@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bookworm
  • 08:47 aborrero@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudvirt1053.eqiad.wmnet
  • 08:41 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 08:39 aborrero@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudvirt1053.eqiad.wmnet
  • 08:14 vgutierrez: restarting logrotate.service on cp[3068,3070-3071].esams.wmnet
  • 08:04 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 08:04 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 08:03 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 08:03 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 08:00 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 08:00 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 07:54 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 100%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65297 and previous config saved to /var/cache/conftool/dbconfig/20240621-075404-arnaudb.json
  • 07:38 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 75%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65296 and previous config saved to /var/cache/conftool/dbconfig/20240621-073858-arnaudb.json
  • 07:23 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 50%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65295 and previous config saved to /var/cache/conftool/dbconfig/20240621-072353-arnaudb.json
  • 07:08 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 (re)pooling @ 25%: repool to fill up vslow/dump', diff saved to https://phabricator.wikimedia.org/P65294 and previous config saved to /var/cache/conftool/dbconfig/20240621-070847-arnaudb.json
  • 07:04 arnaudb@cumin1002: dbctl commit (dc=all): 'db1206 depool for debugging T368098', diff saved to https://phabricator.wikimedia.org/P65293 and previous config saved to /var/cache/conftool/dbconfig/20240621-070358-arnaudb.json
  • 04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2162 (T364069)', diff saved to https://phabricator.wikimedia.org/P65292 and previous config saved to /var/cache/conftool/dbconfig/20240621-045107-marostegui.json
  • 04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 04:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2162.codfw.wmnet with reason: Maintenance
  • 04:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T364069)', diff saved to https://phabricator.wikimedia.org/P65291 and previous config saved to /var/cache/conftool/dbconfig/20240621-045044-marostegui.json
  • 04:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 04:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 04:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367856)', diff saved to https://phabricator.wikimedia.org/P65290 and previous config saved to /var/cache/conftool/dbconfig/20240621-044455-marostegui.json
  • 04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P65289 and previous config saved to /var/cache/conftool/dbconfig/20240621-043537-marostegui.json
  • 04:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P65288 and previous config saved to /var/cache/conftool/dbconfig/20240621-042948-marostegui.json
  • 04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161', diff saved to https://phabricator.wikimedia.org/P65287 and previous config saved to /var/cache/conftool/dbconfig/20240621-042030-marostegui.json
  • 04:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P65286 and previous config saved to /var/cache/conftool/dbconfig/20240621-041441-marostegui.json
  • 04:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2161 (T364069)', diff saved to https://phabricator.wikimedia.org/P65285 and previous config saved to /var/cache/conftool/dbconfig/20240621-040523-marostegui.json
  • 03:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367856)', diff saved to https://phabricator.wikimedia.org/P65284 and previous config saved to /var/cache/conftool/dbconfig/20240621-035934-marostegui.json
  • 03:04 ejegg: fundraising civicrm upgraded from 2e1db811 to 8a0b5bea
  • 01:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T367856)', diff saved to https://phabricator.wikimedia.org/P65283 and previous config saved to /var/cache/conftool/dbconfig/20240621-014545-marostegui.json
  • 01:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 01:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 01:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65282 and previous config saved to /var/cache/conftool/dbconfig/20240621-014523-marostegui.json
  • 01:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P65281 and previous config saved to /var/cache/conftool/dbconfig/20240621-013016-marostegui.json
  • 01:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P65280 and previous config saved to /var/cache/conftool/dbconfig/20240621-011509-marostegui.json
  • 01:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65279 and previous config saved to /var/cache/conftool/dbconfig/20240621-010002-marostegui.json
  • 00:52 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T352010)', diff saved to https://phabricator.wikimedia.org/P65278 and previous config saved to /var/cache/conftool/dbconfig/20240621-005237-ladsgroup.json
  • 00:37 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P65277 and previous config saved to /var/cache/conftool/dbconfig/20240621-003730-ladsgroup.json
  • 00:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236', diff saved to https://phabricator.wikimedia.org/P65276 and previous config saved to /var/cache/conftool/dbconfig/20240621-002223-ladsgroup.json
  • 00:08 mutante: [cp3072:~] $ sudo systemctl start varnishkafka-webrequest.service
  • 00:08 mutante: [cp3067:~] $ sudo systemctl start logrotate
  • 00:07 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1236 (T352010)', diff saved to https://phabricator.wikimedia.org/P65275 and previous config saved to /var/cache/conftool/dbconfig/20240621-000716-ladsgroup.json
  • 00:00 sukhe: restarting haproxy on cp3068 and cp3072

2024-06-20

  • 23:47 zabe@deploy1002: Finished scap: Update interwiki cache (duration: 10m 12s)
  • 23:36 zabe@deploy1002: Started scap: Update interwiki cache
  • 23:35 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=btmwiki --cluster=all 2>&1 | tee /tmp/btmwiki.UpdateSearchIndexConfig.log # T368038
  • 23:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2161 (T364069)', diff saved to https://phabricator.wikimedia.org/P65274 and previous config saved to /var/cache/conftool/dbconfig/20240620-233346-marostegui.json
  • 23:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 23:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2161.codfw.wmnet with reason: Maintenance
  • 23:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T364069)', diff saved to https://phabricator.wikimedia.org/P65273 and previous config saved to /var/cache/conftool/dbconfig/20240620-233324-marostegui.json
  • 23:33 zabe@deploy1002: Finished scap: Creating btmwiki (T368038) (duration: 12m 20s)
  • 23:20 zabe@deploy1002: Started scap: Creating btmwiki (T368038)
  • 23:20 zabe: create Wikipedia Mandailing # T368038
  • 23:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P65272 and previous config saved to /var/cache/conftool/dbconfig/20240620-231817-marostegui.json
  • 23:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154', diff saved to https://phabricator.wikimedia.org/P65271 and previous config saved to /var/cache/conftool/dbconfig/20240620-230310-marostegui.json
  • 22:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2154 (T364069)', diff saved to https://phabricator.wikimedia.org/P65270 and previous config saved to /var/cache/conftool/dbconfig/20240620-224803-marostegui.json
  • 22:39 mutante: aphlict1002/aphlict2001 - systemctl stop aphlict_lograte.timer (and .service); systemctl disable aphlict_logrotate.timer (and .service); systemctl daemon-reload; systemctl reset-failed T367960
  • 22:33 zabe@deploy1002: Synchronized wmf-config/InitialiseSettings.php: T361041 T363825 T366649 (duration: 09m 55s)
  • 22:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T367856)', diff saved to https://phabricator.wikimedia.org/P65269 and previous config saved to /var/cache/conftool/dbconfig/20240620-222909-marostegui.json
  • 22:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 22:28 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 22:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367856)', diff saved to https://phabricator.wikimedia.org/P65268 and previous config saved to /var/cache/conftool/dbconfig/20240620-222847-marostegui.json
  • 22:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P65267 and previous config saved to /var/cache/conftool/dbconfig/20240620-221340-marostegui.json
  • 21:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P65266 and previous config saved to /var/cache/conftool/dbconfig/20240620-215833-marostegui.json
  • 21:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367856)', diff saved to https://phabricator.wikimedia.org/P65265 and previous config saved to /var/cache/conftool/dbconfig/20240620-214326-marostegui.json
  • 21:12 ebernhardson@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:12 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 21:12 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 21:11 ebernhardson@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:10 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 21:09 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 21:09 brett: Include ncmonitor 1.0.0 in wikimedia-bookworm apt repo
  • 21:09 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:08 ebernhardson@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 21:08 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:08 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 21:07 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 21:07 ebernhardson@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 21:06 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 21:06 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 21:05 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 21:04 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 21:03 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 21:03 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 20:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 6 days, 0:00:00 on elastic1105.eqiad.wmnet with reason: T348977
  • 20:53 bking@cumin2002: START - Cookbook sre.hosts.downtime for 6 days, 0:00:00 on elastic1105.eqiad.wmnet with reason: T348977
  • 20:44 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 20:44 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 20:43 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 20:42 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 20:40 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 20:40 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 20:39 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 20:38 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 20:36 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 20:36 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 20:34 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 20:33 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 20:28 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/page-analytics: apply
  • 20:27 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/page-analytics: apply
  • 20:27 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/media-analytics: apply
  • 20:27 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/media-analytics: apply
  • 20:27 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/geo-analytics: apply
  • 20:26 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/geo-analytics: apply
  • 20:26 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/editor-analytics: apply
  • 20:26 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/editor-analytics: apply
  • 20:25 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/edit-analytics: apply
  • 20:25 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/edit-analytics: apply
  • 20:25 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 20:24 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 19:58 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 19:58 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 19:57 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 19:57 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 19:57 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 19:57 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 19:57 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 19:57 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 19:56 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 19:55 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 19:54 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 19:52 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 19:51 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 19:18 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1105* for T348977 - bking@cumin2002
  • 19:18 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1105* for T348977 - bking@cumin2002
  • 19:18 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.ban (exit_code=99) Banning hosts: elastic1105 for T348977 - bking@cumin2002
  • 19:18 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1105 for T348977 - bking@cumin2002
  • 19:04 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host elastic2088.codfw.wmnet
  • 19:01 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 18:58 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 18:21 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1053.eqiad.wmnet with OS bookworm
  • 18:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2154 (T364069)', diff saved to https://phabricator.wikimedia.org/P65263 and previous config saved to /var/cache/conftool/dbconfig/20240620-181635-marostegui.json
  • 18:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 18:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2154.codfw.wmnet with reason: Maintenance
  • 18:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65262 and previous config saved to /var/cache/conftool/dbconfig/20240620-181613-marostegui.json
  • 18:06 inflatador: bking@an-airflow1007 install `ripgrep` deb pkg
  • 18:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P65261 and previous config saved to /var/cache/conftool/dbconfig/20240620-180104-marostegui.json
  • 17:51 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
  • 17:48 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1053.eqiad.wmnet with reason: host reimage
  • 17:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P65260 and previous config saved to /var/cache/conftool/dbconfig/20240620-174557-marostegui.json
  • 17:44 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host elastic2088.codfw.wmnet
  • 17:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1236 (T352010)', diff saved to https://phabricator.wikimedia.org/P65259 and previous config saved to /var/cache/conftool/dbconfig/20240620-174125-ladsgroup.json
  • 17:41 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 17:41 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Maintenance
  • 17:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65258 and previous config saved to /var/cache/conftool/dbconfig/20240620-173050-marostegui.json
  • 17:30 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1053.eqiad.wmnet with OS bookworm
  • 17:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1063.eqiad.wmnet with OS bookworm
  • 17:15 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
  • 17:14 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - andrew@cumin1002"
  • 16:34 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1063.eqiad.wmnet with OS bookworm
  • 16:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 75%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65256 and previous config saved to /var/cache/conftool/dbconfig/20240620-163348-arnaudb.json
  • 16:30 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1052.eqiad.wmnet with OS bookworm
  • 16:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 50%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65254 and previous config saved to /var/cache/conftool/dbconfig/20240620-161842-arnaudb.json
  • 16:18 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Fix Special:Notifications (T368029) (duration: 12m 21s)
  • 16:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, urbanecm: Continuing with sync
  • 16:10 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, urbanecm: Backport for Fix Special:Notifications (T368029) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:07 hnowlan@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T357309)
  • 16:06 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test glibc updates - bking@cumin2002 - T367978
  • 16:06 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
  • 16:05 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Fix Special:Notifications (T368029)
  • 16:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1028.eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 16:03 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 25%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65253 and previous config saved to /var/cache/conftool/dbconfig/20240620-160337-arnaudb.json
  • 16:03 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1052.eqiad.wmnet with reason: host reimage
  • 16:02 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2282.codfw.wmnet
  • 16:01 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw2282.codfw.wmnet
  • 16:01 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=mw2282.codfw.wmnet,cluster=kubernetes,service=kubesvc
  • 16:00 claime: Repooling and uncordoning mw2282.codfw.wmnet following move - T361856
  • 15:59 hnowlan@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1019*,lvs2013*} and A:lvs (T357309)
  • 15:59 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:58 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:57 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2019.codfw.wmnet|wikikube-worker2020.codfw.wmnet|wikikube-worker2021.codfw.wmnet|wikikube-worker2022.codfw.wmnet|wikikube-worker2023.codfw.wmnet|wikikube-worker2024.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 15:57 claime: Pooling and uncordoning wikikube-worker2019.codfw.wmnet,wikikube-worker2020.codfw.wmnet,wikikube-worker2021.codfw.wmnet,wikikube-worker2022.codfw.wmnet,wikikube-worker2023.codfw.wmnet,wikikube-worker2024.codfw.wmnet - T351074
  • 15:55 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1028.eqiad.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 15:55 sfaci@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:55 sfaci@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/mpic-next: apply
  • 15:54 hnowlan@cumin1002: END (PASS) - Cookbook sre.loadbalancer.restart-pybal (exit_code=0) rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T357309)
  • 15:52 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.RESTART (1 nodes at a time) for ElasticSearch cluster relforge: test glibc updates - bking@cumin2002 - T367978
  • 15:48 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 10%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65252 and previous config saved to /var/cache/conftool/dbconfig/20240620-154831-arnaudb.json
  • 15:46 hnowlan@cumin1002: START - Cookbook sre.loadbalancer.restart-pybal rolling-restart of pybal on P{lvs1020*,lvs2014*} and A:lvs (T357309)
  • 15:46 claime: homer 'cr*codfw*' commit 'T351074'
  • 15:45 cmooney@cumin1002: START - Cookbook sre.hosts.dhcp for host wikikube-ctrl2002.codfw.wmnet
  • 15:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1052.eqiad.wmnet with OS bookworm
  • 15:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2019.codfw.wmnet with OS bullseye
  • 15:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2020.codfw.wmnet with OS bullseye
  • 15:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2022.codfw.wmnet with OS bullseye
  • 15:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:33 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:33 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 5%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65251 and previous config saved to /var/cache/conftool/dbconfig/20240620-153326-arnaudb.json
  • 15:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2024.codfw.wmnet with OS bullseye
  • 15:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2023.codfw.wmnet with OS bullseye
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2405.codfw.wmnet
  • 15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2405.codfw.wmnet
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2404.codfw.wmnet
  • 15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2404.codfw.wmnet
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2403.codfw.wmnet
  • 15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2403.codfw.wmnet
  • 15:27 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2400.codfw.wmnet
  • 15:27 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2400.codfw.wmnet
  • 15:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2021.codfw.wmnet with OS bullseye
  • 15:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2019.codfw.wmnet with reason: host reimage
  • 15:23 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:19 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2020.codfw.wmnet with reason: host reimage
  • 15:18 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 (re)pooling @ 2%: post T365987 repool', diff saved to https://phabricator.wikimedia.org/P65249 and previous config saved to /var/cache/conftool/dbconfig/20240620-151820-arnaudb.json
  • 15:18 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2022.codfw.wmnet with reason: host reimage
  • 15:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2024.codfw.wmnet with reason: host reimage
  • 15:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2023.codfw.wmnet with reason: host reimage
  • 15:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2021.codfw.wmnet with reason: host reimage
  • 15:06 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:05 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:04 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore100[4-6].eqiad.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2024.codfw.wmnet with reason: host reimage
  • 15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2023.codfw.wmnet with reason: host reimage
  • 15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2022.codfw.wmnet with reason: host reimage
  • 15:03 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2019.codfw.wmnet with reason: host reimage
  • 15:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2020.codfw.wmnet with reason: host reimage
  • 15:02 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2021.codfw.wmnet with reason: host reimage
  • 15:02 jhathaway@deploy1002: Finished scap: (no justification provided) (duration: 04m 15s)
  • 15:01 topranks: rebooting lsw1-e6-eqiad to upgrade JunOS on switch T365987
  • 15:01 jhathaway@deploy1002: Started scap: (no justification provided)
  • 14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on an-worker[1160-1162].eqiad.wmnet,es1036.eqiad.wmnet,ms-be1077.eqiad.wmnet with reason: JunOS upgrade lsw1-e6-eqiad
  • 14:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on an-worker[1160-1162].eqiad.wmnet,es1036.eqiad.wmnet,ms-be1077.eqiad.wmnet with reason: JunOS upgrade lsw1-e6-eqiad
  • 14:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e6-eqiad,lsw1-e6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e6-eqiad
  • 14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e6-eqiad,lsw1-e6-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e6-eqiad
  • 14:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lsw1-f6-eqiad.mgmt
  • 14:57 cmooney@cumin1002: START - Cookbook sre.hosts.remove-downtime for lsw1-f6-eqiad.mgmt
  • 14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
  • 14:56 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
  • 14:56 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 14:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1020.eqiad.wmnet
  • 14:54 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1020.eqiad.wmnet
  • 14:54 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for lvs1018.eqiad.wmnet
  • 14:54 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for lvs1018.eqiad.wmnet
  • 14:54 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:53 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-f6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
  • 14:53 bking@cumin2002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for 6 hosts
  • 14:53 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-f6-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f6-eqiad
  • 14:53 bking@cumin2002: START - Cookbook sre.hosts.remove-downtime for 6 hosts
  • 14:48 sukhe: homer "*" commit "rolling out NTP ACL change"
  • 14:48 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 14:48 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2024.codfw.wmnet with OS bullseye
  • 14:47 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 100%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65248 and previous config saved to /var/cache/conftool/dbconfig/20240620-144750-arnaudb.json
  • 14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2023.codfw.wmnet with OS bullseye
  • 14:47 vgutierrez: rolling restart of pybal on lvs1020 and lvs1018 - T367511
  • 14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2022.codfw.wmnet with OS bullseye
  • 14:47 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2021.codfw.wmnet with OS bullseye
  • 14:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2020.codfw.wmnet with OS bullseye
  • 14:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2364 to wikikube-worker2024
  • 14:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2024
  • 14:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2019.codfw.wmnet with OS bullseye
  • 14:46 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore100[4-6].eqiad.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 14:45 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2024
  • 14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2364 to wikikube-worker2024 - cgoubert@cumin1002"
  • 14:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T367856)', diff saved to https://phabricator.wikimedia.org/P65247 and previous config saved to /var/cache/conftool/dbconfig/20240620-144423-marostegui.json
  • 14:44 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2364 to wikikube-worker2024 - cgoubert@cumin1002"
  • 14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 14:43 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 14:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 14:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 14:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65246 and previous config saved to /var/cache/conftool/dbconfig/20240620-144341-marostegui.json
  • 14:42 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore200[5-6].codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 14:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2364 to wikikube-worker2024
  • 14:39 eevans@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Main board swap — T362033
  • 14:39 eevans@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on aqs1013.eqiad.wmnet with reason: Main board swap — T362033
  • 14:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2363 to wikikube-worker2023
  • 14:38 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2023
  • 14:38 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1051.eqiad.wmnet with OS bookworm
  • 14:38 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 14:37 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2023
  • 14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2363 to wikikube-worker2023 - cgoubert@cumin1002"
  • 14:37 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2324.codfw.wmnet
  • 14:37 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2324.codfw.wmnet
  • 14:36 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw2323.codfw.wmnet
  • 14:36 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw2323.codfw.wmnet
  • 14:36 jmm@cumin2002: END (PASS) - Cookbook sre.debmonitor.remove-hosts (exit_code=0) for 1 hosts: mw1489.eqiad.wmnet
  • 14:36 jmm@cumin2002: START - Cookbook sre.debmonitor.remove-hosts for 1 hosts: mw1489.eqiad.wmnet
  • 14:35 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:35 sukhe: running authdns-update for CR 1047074
  • 14:35 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2363 to wikikube-worker2023 - cgoubert@cumin1002"
  • 14:34 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 14:32 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 75%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65245 and previous config saved to /var/cache/conftool/dbconfig/20240620-143244-arnaudb.json
  • 14:32 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:32 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2363 to wikikube-worker2023
  • 14:31 moritzm: imported python-pymysql 1.0.2-2~wmf11u2 to apt.wikimedia.org (merge of the security fix from DSA 5700 on top of our internal backport)
  • 14:31 arnaudb@cumin1002: dbctl commit (dc=all): 'es1036 depool ahead of T365987', diff saved to https://phabricator.wikimedia.org/P65244 and previous config saved to /var/cache/conftool/dbconfig/20240620-143109-arnaudb.json
  • 14:30 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on es1036.eqiad.wmnet with reason: T365987
  • 14:30 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore200[5-6].codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 14:30 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on es1036.eqiad.wmnet with reason: T365987
  • 14:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2362 to wikikube-worker2022
  • 14:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2022
  • 14:29 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching sessionstore2004.codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 14:28 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2022
  • 14:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:28 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2362 to wikikube-worker2022 - cgoubert@cumin1002"
  • 14:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65243 and previous config saved to /var/cache/conftool/dbconfig/20240620-142834-marostegui.json
  • 14:27 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2362 to wikikube-worker2022 - cgoubert@cumin1002"
  • 14:27 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 14:26 sukhe: sudo cumin 'O:alerting_host' 'run-puppet-agent'
  • 14:25 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 14:25 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 14:25 elukey@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update wmf-plugin for K8s ml-staging - elukey@cumin1002
  • 14:25 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:24 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2362 to wikikube-worker2022
  • 14:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 14:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:24 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 14:22 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching sessionstore2004.codfw.wmnet: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 14:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2360 to wikikube-worker2021
  • 14:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2021
  • 14:21 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2021
  • 14:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:21 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2360 to wikikube-worker2021 - cgoubert@cumin1002"
  • 14:19 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2360 to wikikube-worker2021 - cgoubert@cumin1002"
  • 14:17 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 50%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65242 and previous config saved to /var/cache/conftool/dbconfig/20240620-141739-arnaudb.json
  • 14:17 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: IPIP migration
  • 14:17 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:17 bking@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: IPIP migration
  • 14:17 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2360 to wikikube-worker2021
  • 14:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2358 to wikikube-worker2020
  • 14:16 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2020
  • 14:15 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2020
  • 14:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2358 to wikikube-worker2020 - cgoubert@cumin1002"
  • 14:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 14:14 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65241 and previous config saved to /var/cache/conftool/dbconfig/20240620-141328-marostegui.json
  • 14:13 elukey@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Update wmf-plugin for K8s ml-staging - elukey@cumin1002
  • 14:13 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2358 to wikikube-worker2020 - cgoubert@cumin1002"
  • 14:10 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:10 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2358 to wikikube-worker2020
  • 14:10 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1051.eqiad.wmnet with reason: host reimage
  • 14:10 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 100%: After schema change', diff saved to https://phabricator.wikimedia.org/P65240 and previous config saved to /var/cache/conftool/dbconfig/20240620-141010-root.json
  • 14:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2339 to wikikube-worker2019
  • 14:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2019
  • 14:09 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2019
  • 14:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:08 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2339 to wikikube-worker2019 - cgoubert@cumin1002"
  • 14:07 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1051.eqiad.wmnet with reason: host reimage
  • 14:07 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2339 to wikikube-worker2019 - cgoubert@cumin1002"
  • 14:04 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:04 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2339 to wikikube-worker2019
  • 14:02 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 25%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65239 and previous config saved to /var/cache/conftool/dbconfig/20240620-140233-arnaudb.json
  • 14:01 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1049.eqiad.wmnet with OS bookworm
  • 14:01 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1050.eqiad.wmnet with OS bookworm
  • 13:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65238 and previous config saved to /var/cache/conftool/dbconfig/20240620-135820-marostegui.json
  • 13:57 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 13:56 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 13:56 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 13:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65237 and previous config saved to /var/cache/conftool/dbconfig/20240620-135610-marostegui.json
  • 13:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 13:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 13:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65236 and previous config saved to /var/cache/conftool/dbconfig/20240620-135559-marostegui.json
  • 13:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 13:55 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 13:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 13:55 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 13:55 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 13:55 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 13:54 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 13:54 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 13:54 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 75%: After schema change', diff saved to https://phabricator.wikimedia.org/P65235 and previous config saved to /var/cache/conftool/dbconfig/20240620-135438-root.json
  • 13:54 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 13:53 claime: Depooling mw2339.codfw.wmnet,mw2358.codfw.wmnet,mw2360.codfw.wmnet,mw2362.codfw.wmnet,mw2363.codfw.wmnet,mw2364.codfw.wmnet for reimage to k8s - T351074
  • 13:53 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 13:52 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 13:52 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 13:51 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 13:51 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 13:50 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 13:50 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 13:50 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1051.eqiad.wmnet with OS bookworm
  • 13:50 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 13:47 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 (re)pooling @ 10%: post T367854 repool', diff saved to https://phabricator.wikimedia.org/P65234 and previous config saved to /var/cache/conftool/dbconfig/20240620-134728-arnaudb.json
  • 13:46 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 13:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65233 and previous config saved to /var/cache/conftool/dbconfig/20240620-134052-marostegui.json
  • 13:39 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 50%: After schema change', diff saved to https://phabricator.wikimedia.org/P65232 and previous config saved to /var/cache/conftool/dbconfig/20240620-133907-root.json
  • 13:36 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1050.eqiad.wmnet with reason: host reimage
  • 13:32 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1049.eqiad.wmnet with reason: host reimage
  • 13:28 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1050.eqiad.wmnet with reason: host reimage
  • 13:28 hashar@deploy1002: Finished deploy [integration/docroot@7f59f49]: build: Updating eslint-config-wikimedia to 0.28.2 (duration: 00m 06s)
  • 13:28 hashar@deploy1002: Started deploy [integration/docroot@7f59f49]: build: Updating eslint-config-wikimedia to 0.28.2
  • 13:27 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1049.eqiad.wmnet with reason: host reimage
  • 13:27 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 13:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65231 and previous config saved to /var/cache/conftool/dbconfig/20240620-132545-marostegui.json
  • 13:24 reedy@deploy1002: Synchronized wmf-config/: T368003 (duration: 10m 39s)
  • 13:24 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 13:23 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 25%: After schema change', diff saved to https://phabricator.wikimedia.org/P65230 and previous config saved to /var/cache/conftool/dbconfig/20240620-132335-root.json
  • 13:23 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 13:22 elukey: upload dragonfly packages 1.0.6-2 to bookworm-wikimedia - T365253
  • 13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65228 and previous config saved to /var/cache/conftool/dbconfig/20240620-131038-marostegui.json
  • 13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65227 and previous config saved to /var/cache/conftool/dbconfig/20240620-131031-marostegui.json
  • 13:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 13:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65226 and previous config saved to /var/cache/conftool/dbconfig/20240620-130928-marostegui.json
  • 13:09 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1050.eqiad.wmnet with OS bookworm
  • 13:09 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1049.eqiad.wmnet with OS bookworm
  • 13:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 13:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 13:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 13:08 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 10%: After schema change', diff saved to https://phabricator.wikimedia.org/P65225 and previous config saved to /var/cache/conftool/dbconfig/20240620-130804-root.json
  • 13:07 sukhe: running homer on cr*{eqiad,codfw}* for CR 1046737: update policies/cr-labs.yaml for new NTP servers: T366360
  • 13:06 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cumin1002.eqiad.wmnet
  • 13:05 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host ml-staging2003.codfw.wmnet
  • 13:00 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host cumin1002.eqiad.wmnet
  • 13:00 klausman@cumin2002: START - Cookbook sre.hosts.reboot-single for host ml-staging2003.codfw.wmnet
  • 12:54 sukhe: sudo cumin -b1 -s30 "A:installserver" "run-puppet-agent": T366360
  • 12:51 marostegui@cumin2002: dbctl commit (dc=all): 'db1236 (re)pooling @ 5%: 1', diff saved to https://phabricator.wikimedia.org/P65223 and previous config saved to /var/cache/conftool/dbconfig/20240620-125139-root.json
  • 12:51 jelto@cumin1002: END (PASS) - Cookbook sre.gitlab.upgrade (exit_code=0) on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 12:44 jelto@cumin1002: START - Cookbook sre.gitlab.upgrade on GitLab host gitlab1003.wikimedia.org with reason: Upgrade GitLab Replica to new version
  • 12:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1048.eqiad.wmnet with OS bookworm
  • 12:33 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1047.eqiad.wmnet with OS bookworm
  • 12:11 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1048.eqiad.wmnet with reason: host reimage
  • 12:08 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
  • 12:06 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1048.eqiad.wmnet with reason: host reimage
  • 12:04 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1047.eqiad.wmnet with reason: host reimage
  • 11:52 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=mw2282.codfw.wmnet,cluster=kubernetes,service=kubesvc
  • 11:48 ayounsi@cumin1002: END (PASS) - Cookbook sre.netbox.update-extras (exit_code=0) rolling restart_daemons on A:netbox
  • 11:48 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1048.eqiad.wmnet with OS bookworm
  • 11:47 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1047.eqiad.wmnet with OS bookworm
  • 11:41 ayounsi@cumin1002: START - Cookbook sre.netbox.update-extras rolling restart_daemons on A:netbox
  • 11:38 XioNoX: merge netbox-extra CR1038869 - Fix lots of CI errors
  • 11:33 jgiannelos@deploy1002: Finished deploy [restbase/deploy@f867c66]: (no justification provided) (duration: 30m 12s)
  • 11:27 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mathoid: apply
  • 11:26 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/mathoid: apply
  • 11:25 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mathoid: apply
  • 11:25 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mathoid: apply
  • 11:21 akosiaris: upgrade mathoid to 2024-06-18-233457-production T349118
  • 11:20 akosiaris@deploy1002: helmfile [staging] DONE helmfile.d/services/mathoid: sync
  • 11:20 akosiaris@deploy1002: helmfile [staging] START helmfile.d/services/mathoid: sync
  • 11:03 jgiannelos@deploy1002: Started deploy [restbase/deploy@f867c66]: (no justification provided)
  • 10:57 dreamyjazz@deploy1002: Finished scap: Backport for [testwiki] Fix assignment of 'checkuser-temporary-account' right (T367170) (duration: 15m 03s)
  • 10:48 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
  • 10:44 dreamyjazz@deploy1002: dreamyjazz: Backport for [testwiki] Fix assignment of 'checkuser-temporary-account' right (T367170) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:42 dreamyjazz@deploy1002: Started scap: Backport for [testwiki] Fix assignment of 'checkuser-temporary-account' right (T367170)
  • 10:41 Amir1: running extensions/Echo/maintenance/removeOrphanedEvents.php --force on all wikis (T308084)
  • 10:37 dreamyjazz@deploy1002: Finished scap: Backport for [testwiki] Assign 'checkuser-temporary-account' to the sysop group (T367170) (duration: 13m 49s)
  • 10:33 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 10:33 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1045.eqiad.wmnet with OS bookworm
  • 10:31 cgoubert@cumin1002: conftool action : set/pooled=yes; selector: name=mw2321.codfw.wmnet,cluster=kubernetes,service=kubesvc
  • 10:31 claime: repooling and uncordoning mw2321.codfw.wmnet - T367862
  • 10:31 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 10:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2321.codfw.wmnet
  • 10:30 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw2321.codfw.wmnet
  • 10:28 dreamyjazz@deploy1002: dreamyjazz: Continuing with sync
  • 10:25 dreamyjazz@deploy1002: dreamyjazz: Backport for [testwiki] Assign 'checkuser-temporary-account' to the sysop group (T367170) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:24 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 10:23 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 10:23 dreamyjazz@deploy1002: Started scap: Backport for [testwiki] Assign 'checkuser-temporary-account' to the sysop group (T367170)
  • 10:20 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw2321.codfw.wmnet with reason: Test scap with host unavailable
  • 10:20 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 10:20 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on mw2321.codfw.wmnet with reason: Test scap with host unavailable
  • 10:19 jiji@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 10:18 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:18 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 10:17 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 10:16 jiji@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 10:16 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 10:15 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=mw2321.codfw.wmnet,cluster=kubernetes,service=kubesvc
  • 10:14 claime: Draining and depooling mw2321.codfw.wmnet to test 1047031 - T367862
  • 10:14 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 10:07 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
  • 10:04 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
  • 10:04 claime: Running puppet on A:wikikube-worker
  • 10:02 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1046.eqiad.wmnet with reason: host reimage
  • 10:01 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1045.eqiad.wmnet with reason: host reimage
  • 10:00 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 10:00 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 09:51 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 09:51 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 09:50 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 09:49 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 09:47 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 09:45 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1046.eqiad.wmnet with OS bookworm
  • 09:45 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1045.eqiad.wmnet with OS bookworm
  • 09:45 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 09:16 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php sysop_plwiki AramilFeraxa REDACTED --bureaucrat --sysop # T361041
  • 08:57 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 08:51 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 08:51 cmooney@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.6.6 - cmooney@cumin1002
  • 08:50 kamila@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 08:49 cmooney@cumin1002: START - Cookbook sre.deploy.python-code homer to cumin2002.codfw.wmnet,cumin1002.eqiad.wmnet with reason: Release v0.6.6 - cmooney@cumin1002
  • 08:36 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 08:33 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 08:23 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 08:16 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.10 refs T361404
  • 08:10 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1001.wikimedia.org
  • 08:08 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1001.wikimedia.org
  • 08:08 moritzm: reboot of irc1001 to nudge clients to re-connect to the new bullseye host T331702
  • 08:06 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
  • 08:03 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
  • 07:53 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 07:53 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 07:53 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 07:52 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 07:48 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 07:04 moritzm: failover irc.wikimedia.org to the new Bullseye servers T331702
  • 06:04 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 18:00:00 on an-worker1085.eqiad.wmnet with reason: T367825 hw maint 2024-06-20
  • 06:03 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 18:00:00 on an-worker1085.eqiad.wmnet with reason: T367825 hw maint 2024-06-20
  • 05:27 marostegui: Deploy schema change on old s7 eqiad master dbmaint (db1236) T364299
  • 05:25 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Long schema change
  • 05:25 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1236.eqiad.wmnet with reason: Long schema change
  • 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1236 T367857', diff saved to https://phabricator.wikimedia.org/P65220 and previous config saved to /var/cache/conftool/dbconfig/20240620-052359-root.json
  • 05:22 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1181 to s7 primary and set section read-write T367857', diff saved to https://phabricator.wikimedia.org/P65219 and previous config saved to /var/cache/conftool/dbconfig/20240620-052253-marostegui.json
  • 05:22 marostegui@cumin1002: dbctl commit (dc=all): 'Set s7 eqiad as read-only for maintenance - T367857', diff saved to https://phabricator.wikimedia.org/P65218 and previous config saved to /var/cache/conftool/dbconfig/20240620-052230-marostegui.json
  • 05:22 marostegui: Starting s7 eqiad failover from db1236 to db1181 - T367857
  • 05:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Long schema change
  • 05:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Long schema change
  • 05:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 28 hosts with reason: Primary switchover s7 T367857
  • 05:04 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1181 with weight 0 T367857', diff saved to https://phabricator.wikimedia.org/P65217 and previous config saved to /var/cache/conftool/dbconfig/20240620-050428-marostegui.json
  • 05:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 28 hosts with reason: Primary switchover s7 T367857
  • 02:24 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T367856)', diff saved to https://phabricator.wikimedia.org/P65216 and previous config saved to /var/cache/conftool/dbconfig/20240620-022416-marostegui.json
  • 02:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 02:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 02:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 02:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 02:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65215 and previous config saved to /var/cache/conftool/dbconfig/20240620-022349-marostegui.json
  • 02:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65214 and previous config saved to /var/cache/conftool/dbconfig/20240620-020842-marostegui.json
  • 01:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65213 and previous config saved to /var/cache/conftool/dbconfig/20240620-015335-marostegui.json
  • 01:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65212 and previous config saved to /var/cache/conftool/dbconfig/20240620-013827-marostegui.json

2024-06-19

  • 23:05 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php arbcom_itwiki Superpes15 REDACTED --bureaucrat --sysop
  • 23:05 zabe: zabe@mwmaint1002:~$ mwscript createAndPromote.php u4cwiki Superpes15 REDACTED --bureaucrat --sysop
  • 21:08 oblivian@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on wikikube-ctrl[2001-2002].codfw.wmnet with reason: Reimage --kamila
  • 21:08 oblivian@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on wikikube-ctrl[2001-2002].codfw.wmnet with reason: Reimage --kamila
  • 20:33 zabe@deploy1002: Finished scap: Backport for [tlywiki] Change the logo and wordmark/tagline (T366431) (duration: 14m 41s)
  • 20:24 zabe@deploy1002: superpes, zabe: Continuing with sync
  • 20:23 zabe@deploy1002: superpes, zabe: Backport for [tlywiki] Change the logo and wordmark/tagline (T366431) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:19 zabe@deploy1002: Started scap: Backport for [tlywiki] Change the logo and wordmark/tagline (T366431)
  • 19:08 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
  • 19:05 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: host reimage
  • 18:54 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2001.codfw.wmnet with reason: host reimage
  • 18:51 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2001.codfw.wmnet with reason: host reimage
  • 18:49 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 18:48 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 18:40 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2002.codfw.wmnet with OS bullseye
  • 18:35 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 18:34 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 18:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367856)', diff saved to https://phabricator.wikimedia.org/P65211 and previous config saved to /var/cache/conftool/dbconfig/20240619-182922-marostegui.json
  • 18:29 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 18:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 18:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65210 and previous config saved to /var/cache/conftool/dbconfig/20240619-182900-marostegui.json
  • 18:21 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2001.codfw.wmnet with OS bullseye
  • 18:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65209 and previous config saved to /var/cache/conftool/dbconfig/20240619-181353-marostegui.json
  • 17:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65208 and previous config saved to /var/cache/conftool/dbconfig/20240619-175846-marostegui.json
  • 17:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65207 and previous config saved to /var/cache/conftool/dbconfig/20240619-174338-marostegui.json
  • 17:21 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:21 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
  • 17:20 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
  • 17:13 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 17:05 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 17:01 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2002
  • 17:01 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2002
  • 17:01 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2001
  • 17:01 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2001
  • 17:00 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:00 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
  • 16:59 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Moved wikikube-ctrl200[12] to a new rack - kamila@cumin1002"
  • 16:42 sukhe: sudo cumin 'A:durum' 'run-puppet-agent' to switch timesyncd NTP pools to ntp-[abc].anycast.wmnet: T366360
  • 16:27 claime: pooling and uncordoning mw2321.codfw.wmnet - T367702
  • 16:27 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=mw2321.codfw.wmnet,cluster=kubernetes,service=kubesvc
  • 16:19 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: service=(ntp-a|ntp-b|ntp-c)
  • 16:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.deploy.python-code (exit_code=0) netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
  • 16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:13 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw2321.codfw.wmnet back to active - cgoubert@cumin1002"
  • 16:12 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: mw2321.codfw.wmnet back to active - cgoubert@cumin1002"
  • 16:09 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 16:03 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
  • 15:55 ayounsi@cumin1002: END (FAIL) - Cookbook sre.deploy.python-code (exit_code=99) netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
  • 15:55 ayounsi@cumin1002: START - Cookbook sre.deploy.python-code netbox-dev to netbox-dev2003.codfw.wmnet with reason: Netbox 4 on netbox-dev2003 - ayounsi@cumin1002 - T336275
  • 15:51 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:50 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:46 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:46 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset-next: apply
  • 15:45 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset-next: apply
  • 15:44 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:43 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 15:32 taavi@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudvirt1042
  • 15:32 taavi@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudvirt1042
  • 15:24 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:24 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:23 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:23 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:23 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:22 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:16 sukhe: sudo cumin -b1 -s120 'A:dnsbox' 'run-puppet-agent --enable "merging CR 1046685"': T366360
  • 15:08 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:08 pt1979@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS info - pt1979@cumin2002"
  • 15:07 pt1979@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add DNS info - pt1979@cumin2002"
  • 15:07 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 15:06 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:04 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns2006.wikimedia.org,service=ntp-c
  • 15:03 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 15:01 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on mw2282.codfw.wmnet with reason: Host move
  • 15:01 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:01 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on mw2282.codfw.wmnet with reason: Host move
  • 15:00 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for mw2282.codfw.wmnet
  • 15:00 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for mw2282.codfw.wmnet
  • 14:59 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.remove-downtime (exit_code=97) for wikikube-worker2003.codfw.wmnet
  • 14:59 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-worker2003.codfw.wmnet
  • 14:42 marostegui: Deploy schema change on s2 eqiad master dbmaint T364069
  • 14:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Long schema change
  • 14:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155-1156].eqiad.wmnet with reason: Long schema change
  • 14:39 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: Long schema change
  • 14:38 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 14:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on clouddb[1014,1018,1021].eqiad.wmnet,db[1155,1158].eqiad.wmnet with reason: Long schema change
  • 14:38 taavi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1002"
  • 14:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
  • 14:36 taavi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - taavi@cumin1002"
  • 14:35 moritzm: installing nano security updates
  • 14:34 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1043.eqiad.wmnet with reason: host reimage
  • 14:24 moritzm: installing libvpx security updates
  • 14:23 moritzm: installing pymysql security updates
  • 14:19 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 14:19 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 14:17 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: apply on production
  • 14:14 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub-next: sync on staging
  • 14:12 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub-next: apply on staging
  • 14:11 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
  • 14:11 taavi@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
  • 14:10 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host ml-staging2003.codfw.wmnet with OS bookworm
  • 14:10 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - klausman@cumin2002"
  • 14:09 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
  • 14:09 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - klausman@cumin2002"
  • 14:09 taavi@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=1) upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
  • 14:08 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1042.eqiad.wmnet with reason: host reimage
  • 14:08 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
  • 14:07 taavi@cumin2002: END (ERROR) - Cookbook sre.hardware.upgrade-firmware (exit_code=97) upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
  • 14:07 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1043.eqiad.wmnet']
  • 14:01 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 13:57 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on ml-staging2003.codfw.wmnet with reason: host reimage
  • 13:54 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on ml-staging2003.codfw.wmnet with reason: host reimage
  • 13:53 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 13:53 taavi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 13:51 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
  • 13:50 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
  • 13:49 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
  • 13:48 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
  • 13:43 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 13:42 klausman@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
  • 13:41 taavi@cumin2002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 13:41 klausman@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Trying to fix Puppet error on ml-staging2003 - klausman@cumin2002"
  • 13:35 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
  • 13:35 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
  • 13:35 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 13:35 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5017.*} and A:cp
  • 13:32 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5017.*} and A:cp
  • 13:32 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1044.eqiad.wmnet with reason: host reimage
  • 13:32 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org,service=ntp-a
  • 13:31 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
  • 13:31 taavi@cumin2002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 13:28 sukhe: enable puppet on dns6001 to test CR 1046685
  • 13:23 sukhe: sudo cumin 'A:dnsbox' 'disable-puppet "merging CR 1046685"': T366360
  • 13:22 pt1979@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:21 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on mw2282.codfw.wmnet with reason: host move
  • 13:21 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on mw2282.codfw.wmnet with reason: host move
  • 13:20 pt1979@cumin2002: START - Cookbook sre.dns.netbox
  • 13:17 kamila_: drained mw2282.codfw.wmnet for T361856
  • 13:16 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1044.eqiad.wmnet with OS bookworm
  • 13:06 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 13:04 sukhe@puppetmaster1001: conftool action : set/weight=100; selector: service=ntp-[abc]
  • 13:04 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s_services/services/datahub: sync on production
  • 12:52 taavi@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 12:51 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2011.codfw.wmnet|wikikube-worker2012.codfw.wmnet|wikikube-worker2013.codfw.wmnet|wikikube-worker2014.codfw.wmnet|wikikube-worker2017.codfw.wmnet|wikikube-worker2018.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 12:40 claime: homer 'cr*codfw*' commit 'T351074'
  • 12:38 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:38 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
  • 12:38 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:37 taavi@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 12:37 taavi@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 12:36 taavi@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 12:36 taavi@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 12:36 taavi@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts ['cloudvirt1042.eqiad.wmnet']
  • 12:35 taavi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 12:34 klausman: Puppet management of install2004 restored, lpxelinux.0 also restored.
  • 12:24 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
  • 12:22 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:21 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:20 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:19 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:17 klausman@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
  • 12:14 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 12:14 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 12:13 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1043.eqiad.wmnet with OS bookworm
  • 12:12 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-video: apply
  • 12:11 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-video: apply
  • 12:11 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 12:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 12:08 klausman: Will test-replace the PXE chainloader (/srv/tftpboot/lpxelinux.0) on install2003 with a newer version to see if it fixes the ldlinux.c32 error. Puppet will be disabled on that machine for the duration.
  • 12:07 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-video: apply
  • 12:07 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-video: apply
  • 12:06 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
  • 12:03 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
  • 12:03 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp5017.eqsin.wmnet
  • 12:02 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp5017.eqsin.wmnet
  • 12:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65204 and previous config saved to /var/cache/conftool/dbconfig/20240619-120142-root.json
  • 12:01 slyngshede@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 5 days, 0:00:00 on idp-test1002.wikimedia.org with reason: CAS 7 upgrade
  • 12:01 slyngshede@cumin1002: START - Cookbook sre.hosts.downtime for 5 days, 0:00:00 on idp-test1002.wikimedia.org with reason: CAS 7 upgrade
  • 12:00 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 12:00 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5017.*} and A:cp
  • 11:57 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1042.eqiad.wmnet with OS bookworm
  • 11:57 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5017.*} and A:cp
  • 11:50 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 11:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65203 and previous config saved to /var/cache/conftool/dbconfig/20240619-114636-root.json
  • 11:36 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-eqsin
  • 11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65201 and previous config saved to /var/cache/conftool/dbconfig/20240619-113131-root.json
  • 11:26 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 11:18 ayounsi@cumin1002: END (FAIL) - Cookbook sre.ganeti.makevm (exit_code=97) for new host netbox-dev2003.codfw.wmnet
  • 11:18 ayounsi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host netbox-dev2003.codfw.wmnet with OS bookworm
  • 11:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2012.codfw.wmnet with OS bullseye
  • 11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65200 and previous config saved to /var/cache/conftool/dbconfig/20240619-111625-root.json
  • 11:15 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 11:15 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2013.codfw.wmnet with OS bullseye
  • 11:14 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 11:14 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:13 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:12 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:11 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2014.codfw.wmnet with OS bullseye
  • 11:10 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:08 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2017.codfw.wmnet with OS bullseye
  • 11:07 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 11:07 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 11:06 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 11:04 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 11:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2018.codfw.wmnet with OS bullseye
  • 11:01 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2011.codfw.wmnet with OS bullseye
  • 11:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65199 and previous config saved to /var/cache/conftool/dbconfig/20240619-110120-root.json
  • 10:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2012.codfw.wmnet with reason: host reimage
  • 10:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2013.codfw.wmnet with reason: host reimage
  • 10:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2014.codfw.wmnet with reason: host reimage
  • 10:47 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2017.codfw.wmnet with reason: host reimage
  • 10:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65198 and previous config saved to /var/cache/conftool/dbconfig/20240619-104614-root.json
  • 10:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2018.codfw.wmnet with reason: host reimage
  • 10:41 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2011.codfw.wmnet with reason: host reimage
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2017.codfw.wmnet with reason: host reimage
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2018.codfw.wmnet with reason: host reimage
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2013.codfw.wmnet with reason: host reimage
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2014.codfw.wmnet with reason: host reimage
  • 10:40 jmm@deploy1002: Finished scap: (no justification provided) (duration: 04m 03s)
  • 10:39 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2012.codfw.wmnet with reason: host reimage
  • 10:39 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2011.codfw.wmnet with reason: host reimage
  • 10:36 jmm@deploy1002: Started scap: (no justification provided)
  • 10:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2152 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65197 and previous config saved to /var/cache/conftool/dbconfig/20240619-103109-root.json
  • 10:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2018.codfw.wmnet with OS bullseye
  • 10:25 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2017.codfw.wmnet with OS bullseye
  • 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367856)', diff saved to https://phabricator.wikimedia.org/P65196 and previous config saved to /var/cache/conftool/dbconfig/20240619-102504-marostegui.json
  • 10:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 10:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2014.codfw.wmnet with OS bullseye
  • 10:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 10:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2013.codfw.wmnet with OS bullseye
  • 10:24 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2012.codfw.wmnet with OS bullseye
  • 10:23 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2011.codfw.wmnet with OS bullseye
  • 10:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2409 to wikikube-worker2018
  • 10:23 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2018
  • 10:22 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2018
  • 10:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:22 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2409 to wikikube-worker2018 - cgoubert@cumin1002"
  • 10:21 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2409 to wikikube-worker2018 - cgoubert@cumin1002"
  • 10:18 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:18 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2409 to wikikube-worker2018
  • 10:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2408 to wikikube-worker2017
  • 10:18 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2017
  • 10:17 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2017
  • 10:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2408 to wikikube-worker2017 - cgoubert@cumin1002"
  • 10:16 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2408 to wikikube-worker2017 - cgoubert@cumin1002"
  • 10:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152', diff saved to https://phabricator.wikimedia.org/P65195 and previous config saved to /var/cache/conftool/dbconfig/20240619-101625-marostegui.json
  • 10:14 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:14 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2408 to wikikube-worker2017
  • 10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2405 to wikikube-worker2014
  • 10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2014
  • 10:12 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2014
  • 10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2405 to wikikube-worker2014 - cgoubert@cumin1002"
  • 10:09 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2405 to wikikube-worker2014 - cgoubert@cumin1002"
  • 10:06 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:06 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2405 to wikikube-worker2014
  • 10:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2404 to wikikube-worker2013
  • 10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2013
  • 10:05 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 10:05 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2013
  • 10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:05 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2404 to wikikube-worker2013 - cgoubert@cumin1002"
  • 10:03 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2404 to wikikube-worker2013 - cgoubert@cumin1002"
  • 10:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65194 and previous config saved to /var/cache/conftool/dbconfig/20240619-100118-marostegui.json
  • 10:00 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1017.eqiad.wmnet,service=s1
  • 10:00 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 09:59 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2404 to wikikube-worker2013
  • 09:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2403 to wikikube-worker2012
  • 09:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2012
  • 09:55 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 09:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 09:53 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2012
  • 09:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:53 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2403 to wikikube-worker2012 - cgoubert@cumin1002"
  • 09:51 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2403 to wikikube-worker2012 - cgoubert@cumin1002"
  • 09:51 ayounsi@cumin1002: END (FAIL) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=99) generate netbox hiera data: "netbox-dev2003 - ayounsi@cumin1002"
  • 09:47 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "netbox-dev2003 - ayounsi@cumin1002"
  • 09:47 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 09:47 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2403 to wikikube-worker2012
  • 09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2400 to wikikube-worker2011
  • 09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2011
  • 09:46 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2011
  • 09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2400 to wikikube-worker2011 - cgoubert@cumin1002"
  • 09:44 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 09:43 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2400 to wikikube-worker2011 - cgoubert@cumin1002"
  • 09:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 09:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2400 to wikikube-worker2011
  • 09:40 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-eqsin
  • 09:34 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 09:32 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 09:22 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 09:21 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 09:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on netbox-dev2003.codfw.wmnet with reason: host reimage
  • 09:15 claime: Depooling mw2400.codfw.wmnet,mw2403.codfw.wmnet,mw2404.codfw.wmnet,mw2405.codfw.wmnet,mw2408.codfw.wmnet,mw2409.codfw.wmnet for reimage - T351074
  • 09:13 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on netbox-dev2003.codfw.wmnet with reason: host reimage
  • 09:11 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 09:10 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe2001.codfw.wmnet with OS bookworm
  • 09:01 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5025.*} and A:cp
  • 08:59 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe1001.eqiad.wmnet with OS bookworm
  • 08:58 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5025.*} and A:cp
  • 08:57 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5017.*} and A:cp
  • 08:54 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5017.*} and A:cp
  • 08:52 fabfur: upgrading eqsin cp hosts to haproxy 2.8.10 (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1047436) (T367756)
  • 08:51 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 08:48 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe2001.codfw.wmnet with reason: host reimage
  • 08:40 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage
  • 08:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 15830
  • 08:38 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1001.eqiad.wmnet with reason: host reimage
  • 08:35 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 15830
  • 08:31 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bookworm
  • 08:30 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe2001.codfw.wmnet with OS bookworm
  • 08:30 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 08:24 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 08:23 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1001.eqiad.wmnet with OS bookworm
  • 08:23 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe1001.eqiad.wmnet with OS bookworm
  • 08:18 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.10 refs T361404
  • 08:12 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe2001.codfw.wmnet with OS bookworm
  • 08:11 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1001.eqiad.wmnet with OS bookworm
  • 08:09 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 08:03 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on P{ms-fe*} and (A:swift-fe or A:swift-fe-canary or A:swift-fe-codfw or A:swift-fe-eqiad)
  • 08:01 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host netbox-dev2003.codfw.wmnet with OS bookworm
  • 08:00 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
  • 07:59 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
  • 07:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) netbox-dev2003.codfw.wmnet on all recursors
  • 07:59 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache netbox-dev2003.codfw.wmnet on all recursors
  • 07:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
  • 07:57 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM netbox-dev2003.codfw.wmnet - ayounsi@cumin1002"
  • 07:54 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 07:54 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host netbox-dev2003.codfw.wmnet
  • 07:48 kartik@deploy1002: Finished scap: Backport for igwiki: Enable MinT for Wikipedia readers (T363464) (duration: 18m 55s)
  • 07:38 kartik@deploy1002: kartik: Continuing with sync
  • 07:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 07:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 07:33 kartik@deploy1002: kartik: Backport for igwiki: Enable MinT for Wikipedia readers (T363464) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:29 kartik@deploy1002: Started scap: Backport for igwiki: Enable MinT for Wikipedia readers (T363464)
  • 07:22 kartik@deploy1002: Finished scap: Backport for testwiki: Enable MinT for Wikipedia readers MVP on a Igbo Wikipedia (T367852) (duration: 20m 12s)
  • 07:20 marostegui: Deploy schema change on old s7 eqiad master db1160 dbmaint T364069
  • 07:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65192 and previous config saved to /var/cache/conftool/dbconfig/20240619-071516-root.json
  • 07:12 kartik@deploy1002: kartik: Continuing with sync
  • 07:07 kartik@deploy1002: kartik: Backport for testwiki: Enable MinT for Wikipedia readers MVP on a Igbo Wikipedia (T367852) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:02 kartik@deploy1002: Started scap: Backport for testwiki: Enable MinT for Wikipedia readers MVP on a Igbo Wikipedia (T367852)
  • 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65191 and previous config saved to /var/cache/conftool/dbconfig/20240619-070010-root.json
  • 06:52 jynus: stop db1240:s1, wipe and reimport db1240:s3 T367162
  • 06:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65190 and previous config saved to /var/cache/conftool/dbconfig/20240619-064505-root.json
  • 06:40 XioNoX: merge Puppet "Prepare for netbox-dev" CR1047081
  • 06:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65189 and previous config saved to /var/cache/conftool/dbconfig/20240619-063337-root.json
  • 06:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65188 and previous config saved to /var/cache/conftool/dbconfig/20240619-062959-root.json
  • 06:21 _joe_: upgrading conftool everywhere T367919
  • 06:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65187 and previous config saved to /var/cache/conftool/dbconfig/20240619-061831-root.json
  • 06:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: After reimage', diff saved to https://phabricator.wikimedia.org/P65186 and previous config saved to /var/cache/conftool/dbconfig/20240619-061721-root.json
  • 06:14 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65185 and previous config saved to /var/cache/conftool/dbconfig/20240619-061454-root.json
  • 06:08 _joe_: uploaded newer python-conftool packages T367919
  • 06:05 _joe_: deleting manually thirdparty/conda repositories from reprepro T364550
  • 06:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65184 and previous config saved to /var/cache/conftool/dbconfig/20240619-060326-root.json
  • 06:02 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: After reimage', diff saved to https://phabricator.wikimedia.org/P65183 and previous config saved to /var/cache/conftool/dbconfig/20240619-060216-root.json
  • 05:59 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65182 and previous config saved to /var/cache/conftool/dbconfig/20240619-055948-root.json
  • 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65181 and previous config saved to /var/cache/conftool/dbconfig/20240619-054820-root.json
  • 05:47 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: After reimage', diff saved to https://phabricator.wikimedia.org/P65180 and previous config saved to /var/cache/conftool/dbconfig/20240619-054710-root.json
  • 05:44 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65179 and previous config saved to /var/cache/conftool/dbconfig/20240619-054443-root.json
  • 05:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65178 and previous config saved to /var/cache/conftool/dbconfig/20240619-054259-root.json
  • 05:42 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2152 (T364069)', diff saved to https://phabricator.wikimedia.org/P65177 and previous config saved to /var/cache/conftool/dbconfig/20240619-054214-marostegui.json
  • 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65176 and previous config saved to /var/cache/conftool/dbconfig/20240619-053315-root.json
  • 05:32 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: After reimage', diff saved to https://phabricator.wikimedia.org/P65175 and previous config saved to /var/cache/conftool/dbconfig/20240619-053205-root.json
  • 05:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65174 and previous config saved to /var/cache/conftool/dbconfig/20240619-052754-root.json
  • 05:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 05:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2152.codfw.wmnet with reason: Maintenance
  • 05:18 marostegui@cumin1002: dbctl commit (dc=all): 'db1202 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65173 and previous config saved to /var/cache/conftool/dbconfig/20240619-051809-root.json
  • 05:17 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: After reimage', diff saved to https://phabricator.wikimedia.org/P65172 and previous config saved to /var/cache/conftool/dbconfig/20240619-051659-root.json
  • 05:12 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65171 and previous config saved to /var/cache/conftool/dbconfig/20240619-051248-root.json
  • 05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1169', diff saved to https://phabricator.wikimedia.org/P65170 and previous config saved to /var/cache/conftool/dbconfig/20240619-051233-root.json
  • 05:10 marostegui@cumin1002: dbctl commit (dc=all): 'repool db1169', diff saved to https://phabricator.wikimedia.org/P65169 and previous config saved to /var/cache/conftool/dbconfig/20240619-051014-marostegui.json
  • 05:09 marostegui@cumin1002: dbctl commit (dc=all): 'test depool db1169', diff saved to https://phabricator.wikimedia.org/P65168 and previous config saved to /var/cache/conftool/dbconfig/20240619-050951-marostegui.json

2024-06-18

  • 23:22 jforrester@deploy1002: Finished scap: Backport for Use isEnumType in selector and isCustomEnum for creating literals (T367159), findAddedContentNeedingReference was removed accidentally (T367920) (duration: 17m 16s)
  • 23:12 jforrester@deploy1002: jforrester, kemayo: Continuing with sync
  • 23:10 jforrester@deploy1002: jforrester, kemayo: Backport for Use isEnumType in selector and isCustomEnum for creating literals (T367159), findAddedContentNeedingReference was removed accidentally (T367920) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:05 jforrester@deploy1002: Started scap: Backport for Use isEnumType in selector and isCustomEnum for creating literals (T367159), findAddedContentNeedingReference was removed accidentally (T367920)
  • 22:49 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:49 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:31 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:20 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:19 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 22:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 22:07 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:56 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:55 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:54 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 21:44 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 21:26 jdrewniak@deploy1002: Finished scap: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844) (duration: 16m 33s)
  • 21:16 jdrewniak@deploy1002: jdlrobson, jdrewniak: Continuing with sync
  • 21:14 jdrewniak@deploy1002: jdlrobson, jdrewniak: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:09 jdrewniak@deploy1002: Started scap: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844)
  • 21:07 jdrewniak@deploy1002: Sync cancelled.
  • 21:07 jdrewniak@deploy1002: jdrewniak, jdlrobson: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:03 jdrewniak@deploy1002: Started scap: Backport for Improve responsive images and avoid for inline (T367463), Fix codex link styles overriding other link styles (T367844)
  • 20:59 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-eqiad: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 20:50 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 20:50 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 20:49 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 20:49 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 20:47 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/data-gateway: apply
  • 20:47 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/data-gateway: apply
  • 20:33 urbanecm@deploy1002: Finished scap: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki (duration: 18m 59s)
  • 20:24 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 20:22 urbanecm@deploy1002: kemayo, urbanecm, superzerocool: Continuing with sync
  • 20:18 urbanecm@deploy1002: kemayo, urbanecm, superzerocool: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:14 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 20:14 urbanecm@deploy1002: Started scap: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki
  • 20:10 urbanecm@deploy1002: Sync cancelled.
  • 20:10 urbanecm@deploy1002: urbanecm, superzerocool, kemayo: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:09 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 20:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 20:06 urbanecm@deploy1002: Started scap: Backport for cswiki: adding throttle rule, removing old throttle rule (T367858), Deploy references edit check to phase 1 wikis (T361843), Turn on Visual Editor collab beta feature on officewiki
  • 19:59 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:42 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 19:42 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:40 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-staging2003.codfw.wmnet with OS bookworm
  • 19:33 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:29 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 19:29 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:26 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 19:26 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:17 mutante: lists1001 - systemctl reset-failed - clean up systemd state due to units not found anymore after migration - disable puppet and then deploy gerrit:1047160 on lists to fix invalid unit name - T331706
  • 18:49 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-eqiad: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 18:44 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in esams for T365123
  • 18:39 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in eqsin for T365123
  • 18:33 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in drmrs for T365123
  • 18:29 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching A:restbase-codfw: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 18:27 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in magru for T365123
  • 18:19 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
  • 18:17 swfrench-wmf: updated conftool to 3.0.0 on hosts (cp,ncredir) in ulsfo for T365123
  • 18:16 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-staging2003.codfw.wmnet with OS bookworm
  • 18:16 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
  • 17:37 swfrench-wmf: updated conftool to 3.0.0 on bullseye hosts in eqiad for T365123
  • 17:35 swfrench-wmf: updated conftool to 3.0.0 on bookworm hosts in eqiad for T365123
  • 17:34 swfrench-wmf: updated conftool to 3.0.0 on buster hosts in eqiad for T365123
  • 17:21 cdanis: resetting Wiki response time metric on wikimedia.statuspage.io following complete switch to k8s - T362323 T367894
  • 17:16 swfrench-wmf: updated conftool to 3.0.0 on remaining bullseye hosts in codfw for T365123
  • 17:16 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
  • 17:14 swfrench-wmf: updated conftool to 3.0.0 on remaining bookworm hosts in codfw for T365123
  • 17:12 swfrench-wmf: updated conftool to 3.0.0 on remaining buster hosts in codfw for T365123
  • 16:42 swfrench-wmf: conftool on puppetmaster2001 updated to 3.0.0 for T365123
  • 16:39 swfrench-wmf: validated dbctl 3.0.0 on cumin2002 (noop edit to note: on parsercache spare pc2014) for T365123
  • 16:39 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 16:34 fnegri@cumin1002: conftool action : set/pooled=no; selector: name=clouddb1017.eqiad.wmnet,service=s1
  • 16:31 ryankemper@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-worker1093.eqiad.wmnet with reason: T367825 hw maint
  • 16:31 ryankemper@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-worker1093.eqiad.wmnet with reason: T367825 hw maint
  • 16:29 swfrench-wmf: conftool on cumin2002 updated to 3.0.0 for T365123
  • 16:23 claime: resetting Wiki response time metric on wikimedia.statuspage.io following complete switch to k8s - T362323
  • 16:23 swfrench-wmf: depooled / pooled mw2441.codfw.wmnet to smoke-test python3-conftool for T365123
  • 16:22 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching A:restbase-codfw: Upgrade to Java 11 — T350567 - eevans@cumin1002
  • 16:20 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 100%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65167 and previous config saved to /var/cache/conftool/dbconfig/20240618-162053-arnaudb.json
  • 16:19 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 16:11 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 16:05 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 75%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65166 and previous config saved to /var/cache/conftool/dbconfig/20240618-160548-arnaudb.json
  • 16:02 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/shellbox-video: apply
  • 15:55 jhancock@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host ml-staging2003.codfw.wmnet with OS bookworm
  • 15:53 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s2
  • 15:53 fnegri@cumin1002: conftool action : set/pooled=yes; selector: name=clouddb1018.eqiad.wmnet,service=s7
  • 15:52 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/shellbox-video: apply
  • 15:51 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 15:51 hnowlan@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 15:50 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 50%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65165 and previous config saved to /var/cache/conftool/dbconfig/20240618-155042-arnaudb.json
  • 15:50 hnowlan@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 15:50 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 15:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1202 (T364069)', diff saved to https://phabricator.wikimedia.org/P65164 and previous config saved to /var/cache/conftool/dbconfig/20240618-155000-marostegui.json
  • 15:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 15:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1202.eqiad.wmnet with reason: Maintenance
  • 15:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65163 and previous config saved to /var/cache/conftool/dbconfig/20240618-154938-marostegui.json
  • 15:49 robh@cumin2002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host ml-staging2003
  • 15:49 robh@cumin2002: START - Cookbook sre.network.configure-switch-interfaces for host ml-staging2003
  • 15:48 hnowlan@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 15:47 swfrench-wmf: included conftool 3.0.0 into buster/bullseye/bookworm-wikimedia on apt.w.o for T365123
  • 15:47 hnowlan@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 15:46 hnowlan@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:45 hnowlan@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 15:44 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5032.*} and A:cp
  • 15:43 hnowlan@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 15:42 hnowlan@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 15:42 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5032.*} and A:cp
  • 15:41 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp5030.*} and A:cp
  • 15:39 fabfur: upgrade haproxy to v2.8.10 on cp5030,cp5032 (T367756)
  • 15:39 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp5030.*} and A:cp
  • 15:38 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp3066.*} and A:cp
  • 15:36 fabfur: upgrade haproxy to v2.8.10 on cp3066 (T367756)
  • 15:35 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp3066.*} and A:cp
  • 15:35 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 25%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65162 and previous config saved to /var/cache/conftool/dbconfig/20240618-153537-arnaudb.json
  • 15:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P65161 and previous config saved to /var/cache/conftool/dbconfig/20240618-153430-marostegui.json
  • 15:30 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1003.eqiad.wmnet with OS bookworm
  • 15:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 15:30 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 15:30 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 15:23 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:20 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 (re)pooling @ 10%: post T365983 repool', diff saved to https://phabricator.wikimedia.org/P65159 and previous config saved to /var/cache/conftool/dbconfig/20240618-152031-arnaudb.json
  • 15:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194', diff saved to https://phabricator.wikimedia.org/P65158 and previous config saved to /var/cache/conftool/dbconfig/20240618-151923-marostegui.json
  • 15:11 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
  • 15:07 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1003.eqiad.wmnet with reason: host reimage
  • 15:07 brennen@deploy1002: Finished deploy [phabricator/deployment@ef680d8]: revert phab1004 after breakage for T367775 (duration: 00m 15s)
  • 15:07 brennen@deploy1002: Started deploy [phabricator/deployment@ef680d8]: revert phab1004 after breakage for T367775
  • 15:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-fe1002.eqiad.wmnet with OS bookworm
  • 15:06 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 15:06 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 15:06 brennen@deploy1002: Finished deploy [phabricator/deployment@ebe3a94]: deploy phab1004 for T367775 (duration: 00m 47s)
  • 15:05 brennen@deploy1002: Started deploy [phabricator/deployment@ebe3a94]: deploy phab1004 for T367775
  • 15:05 brennen@deploy1002: Finished deploy [phabricator/deployment@ebe3a94]: deploy phab2002 for T367775 (duration: 00m 36s)
  • 15:04 brennen@deploy1002: Started deploy [phabricator/deployment@ebe3a94]: deploy phab2002 for T367775
  • 15:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65157 and previous config saved to /var/cache/conftool/dbconfig/20240618-150416-marostegui.json
  • 15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
  • 15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab2002.codfw.wmnet with reason: Phabricator/Phorge update
  • 15:03 jelto@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:03 jelto@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on phab1004.eqiad.wmnet with reason: Phabricator/Phorge update
  • 15:00 mforns@deploy1002: Finished deploy [airflow-dags/analytics@4f7d29a]: (no justification provided) (duration: 00m 28s)
  • 15:00 topranks: rebooting lsw1-f7-eqiad to upgrade JunOS on switch T365984
  • 15:00 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host htmldumper1001.eqiad.wmnet
  • 15:00 mforns@deploy1002: Started deploy [airflow-dags/analytics@4f7d29a]: (no justification provided)
  • 14:57 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:35:00 on an-worker[1172-1174].eqiad.wmnet,es1040.eqiad.wmnet,ms-be1081.eqiad.wmnet with reason: JunOS upgrade lsw1-f7-eqiad
  • 14:57 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:35:00 on an-worker[1172-1174].eqiad.wmnet,es1040.eqiad.wmnet,ms-be1081.eqiad.wmnet with reason: JunOS upgrade lsw1-f7-eqiad
  • 14:56 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-f7-eqiad,lsw1-f7-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f7-eqiad
  • 14:56 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-f7-eqiad,lsw1-f7-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-f7-eqiad
  • 14:53 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host htmldumper1001.eqiad.wmnet
  • 14:49 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be1003.eqiad.wmnet with OS bookworm
  • 14:47 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:40:00 on lsw1-f7-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f7-eqiad
  • 14:47 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:40:00 on lsw1-f7-eqiad.mgmt with reason: prep JunOS upgrade lsw1-f7-eqiad
  • 14:44 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host moss-be1001.eqiad.wmnet with OS bookworm
  • 14:44 jynus: reenable puppet on backup2002
  • 14:40 klausman@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on ml-serve2001.codfw.wmnet with reason: Hardware maintenance for memory errors
  • 14:40 klausman@cumin2002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on ml-serve2001.codfw.wmnet with reason: Hardware maintenance for memory errors
  • 14:39 arnaudb@cumin1002: dbctl commit (dc=all): 'es1040 depool - T365984', diff saved to https://phabricator.wikimedia.org/P65156 and previous config saved to /var/cache/conftool/dbconfig/20240618-143951-arnaudb.json
  • 14:39 vgutierrez@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4046.ulsfo.wmnet
  • 14:39 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on es1040.eqiad.wmnet with reason: T365984
  • 14:39 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on es1040.eqiad.wmnet with reason: T365984
  • 14:36 sukhe: enabling puppet and running puppet agent on cp4037
  • 14:34 jhancock@cumin2002: START - Cookbook sre.hosts.reimage for host ml-staging2003.codfw.wmnet with OS bookworm
  • 14:24 claime: trafficserver: move 100% of traffic to mw-on-k8s - T362323
  • 14:23 btullis@cumin1002: START - Cookbook sre.presto.reboot-workers for Presto an-presto cluster: Reboot Presto nodes
  • 14:22 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-be1001.eqiad.wmnet with reason: host reimage
  • 14:21 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:21 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:21 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:21 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:20 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 14:20 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 14:20 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 14:20 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 14:20 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 14:20 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 14:19 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-be1001.eqiad.wmnet with reason: host reimage
  • 14:17 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 14:17 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 14:17 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 14:09 swfrench-wmf: included conftool 3.0.0 into buster-wikimedia on apt.w.o for T365123
  • 14:06 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: host reimage
  • 14:03 mvernon@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on moss-fe1002.eqiad.wmnet with reason: host reimage
  • 14:02 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-be1001.eqiad.wmnet with OS bookworm
  • 13:57 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 13:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 13:57 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 13:57 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 13:54 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 13:54 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 13:52 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 13:51 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 13:51 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 13:50 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 13:49 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1002.eqiad.wmnet with OS bookworm
  • 13:49 mvernon@cumin2002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host moss-fe1002.eqiad.wmnet with OS bookworm
  • 13:49 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 13:47 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: sync
  • 13:47 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: sync
  • 13:47 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 13:45 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 13:40 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 13:39 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 13:37 mvernon@cumin2002: START - Cookbook sre.hosts.reimage for host moss-fe1002.eqiad.wmnet with OS bookworm
  • 13:35 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-db1002.eqiad.wmnet
  • 13:34 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes azwiktionary --fix # T367264; 7 pages fixed, 10 links fixed
  • 13:33 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Add VL namespace alias to Azerbaijani Wiktionary (T367264) (duration: 16m 07s)
  • 13:29 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-db1002.eqiad.wmnet
  • 13:28 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 13:28 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 13:23 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamrimmer: Continuing with sync
  • 13:22 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-mariadb1002.eqiad.wmnet
  • 13:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, dreamrimmer: Backport for Add VL namespace alias to Azerbaijani Wiktionary (T367264) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:19 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for db1208.eqiad.wmnet
  • 13:19 btullis@cumin1002: START - Cookbook sre.hosts.remove-downtime for db1208.eqiad.wmnet
  • 13:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap: Backport for Add VL namespace alias to Azerbaijani Wiktionary (T367264)
  • 13:16 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 13:16 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 13:16 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-mariadb1002.eqiad.wmnet
  • 13:10 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: sync
  • 13:10 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-coord1004.eqiad.wmnet
  • 13:09 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: sync
  • 13:09 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: sync
  • 13:08 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: sync
  • 13:07 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: sync
  • 13:07 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: sync
  • 13:07 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: sync
  • 13:06 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: sync
  • 13:06 akosiaris@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: sync
  • 13:04 akosiaris@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: sync
  • 13:04 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-coord1004.eqiad.wmnet
  • 12:56 vgutierrez: rolling upgrade on A:cp-eqsin to fifo-log-demux 0.7.5 - T364383
  • 12:53 vgutierrez: disable puppet on A:cp-eqsin before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1047070 - T364383
  • 12:52 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_ulsfo
  • 12:51 marostegui: Deploy schema change on old s4 eqiad master db1160 dbmaint T364069
  • 12:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_ulsfo
  • 12:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1160', diff saved to https://phabricator.wikimedia.org/P65155 and previous config saved to /var/cache/conftool/dbconfig/20240618-124945-root.json
  • 12:48 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 12:47 fabfur: upgrade haproxy to v2.8.10 on all ulsfo cp hosts (T367756)
  • 12:47 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 12:43 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 12:42 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 12:42 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 12:42 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 12:40 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 12:36 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 12:35 mvernon@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host moss-be2003.codfw.wmnet
  • 12:29 mvernon@cumin2002: START - Cookbook sre.hosts.reboot-single for host moss-be2003.codfw.wmnet
  • 12:22 moritzm: rebalance ganeti eqiad/D following reboots
  • 12:15 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
  • 12:15 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
  • 12:06 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:06 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add IPv6 records for mw, parse and wikikube-worker hosts - cmooney@cumin1002"
  • 12:05 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 12:05 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add IPv6 records for mw, parse and wikikube-worker hosts - cmooney@cumin1002"
  • 12:04 topranks: adding Netbox-generated IPv6 DNS records for wikikube-worker, mw and parse hosts
  • 12:04 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 12:00 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 11:59 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:59 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 11:59 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 11:58 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:58 effie: Slowly pointing mediawiki in eqiad to mw-mcrouter daemonset - T346690
  • 11:54 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 11:54 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 11:53 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 11:53 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 11:50 eoghan@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) lists.wikimedia.org on all recursors
  • 11:50 eoghan@cumin1002: START - Cookbook sre.dns.wipe-cache lists.wikimedia.org on all recursors
  • 11:48 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host db1208.eqiad.wmnet with OS bookworm
  • 11:42 marostegui: Delete ipblocks table on clouddb2002-dev (labtestwiki) T367632
  • 11:40 marostegui: Rename ipblocks table on db1169 (enwiki) T367632
  • 11:29 hnowlan@deploy1002: helmfile [staging] DONE helmfile.d/services/sessionstore: apply
  • 11:29 hnowlan@deploy1002: helmfile [staging] START helmfile.d/services/sessionstore: apply
  • 11:28 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-master1002.eqiad.wmnet
  • 11:26 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1208.eqiad.wmnet with reason: host reimage
  • 11:24 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1208.eqiad.wmnet with reason: host reimage
  • 11:22 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-master1002.eqiad.wmnet
  • 11:18 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-druid1001.eqiad.wmnet
  • 11:14 akosiaris@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 11:14 akosiaris@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 11:13 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-druid1001.eqiad.wmnet
  • 11:13 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 11:13 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 11:12 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-ui1001.eqiad.wmnet
  • 11:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1194 (T364069)', diff saved to https://phabricator.wikimedia.org/P65152 and previous config saved to /var/cache/conftool/dbconfig/20240618-111001-marostegui.json
  • 11:09 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 11:09 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host db1208.eqiad.wmnet with OS bookworm
  • 11:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1194.eqiad.wmnet with reason: Maintenance
  • 11:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65151 and previous config saved to /var/cache/conftool/dbconfig/20240618-110939-marostegui.json
  • 11:08 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-ui1001.eqiad.wmnet
  • 11:08 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 11:08 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 11:07 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 11:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1208.eqiad.wmnet with reason: Upgrading to bookworm
  • 11:05 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-test-presto1001.eqiad.wmnet
  • 11:05 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1208.eqiad.wmnet with reason: Upgrading to bookworm
  • 11:01 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-test-presto1001.eqiad.wmnet
  • 10:58 fabfur: cp3066 repooled and puppet enabled (T367756)
  • 10:58 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp3066.esams.wmnet
  • 10:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P65150 and previous config saved to /var/cache/conftool/dbconfig/20240618-105432-marostegui.json
  • 10:48 marostegui: dbmaint codfw s2 deploy schema change T364069
  • 10:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191', diff saved to https://phabricator.wikimedia.org/P65149 and previous config saved to /var/cache/conftool/dbconfig/20240618-103925-marostegui.json
  • 10:33 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 10:33 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 10:33 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 10:33 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 10:33 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 10:33 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 10:33 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 10:32 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 10:32 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 10:32 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 10:32 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 10:32 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 10:32 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 10:32 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 10:32 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 10:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 10:31 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 10:31 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 10:30 moritzm: upload openjdk-21 21.0.3+9-2~deb12u2 for bookworm/wikimedia (secondary rebuild on build2001 following the initial bootstrap build) https://phabricator.wikimedia.org/T367487
  • 10:30 cgoubert@deploy1002: Finished scap: Deploy statsd exporter - T365265 (duration: 03m 39s)
  • 10:27 cgoubert@deploy1002: Started scap: Deploy statsd exporter - T365265
  • 10:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65148 and previous config saved to /var/cache/conftool/dbconfig/20240618-102418-marostegui.json
  • 10:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65147 and previous config saved to /var/cache/conftool/dbconfig/20240618-102130-root.json
  • 10:14 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
  • 10:14 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on lists[1001,1004,2001].wikimedia.org with reason: Mailman migration
  • 10:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65146 and previous config saved to /var/cache/conftool/dbconfig/20240618-100624-root.json
  • 10:05 fabfur: cp3066 currently depooled and puppet disabled for T367756
  • 10:04 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp3066.esams.wmnet
  • 09:53 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker1019.eqiad.wmnet|wikikube-worker1020.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet),cluster=kubernetes,service=kubesvc
  • 09:52 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc1002.wikimedia.org
  • 09:51 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65145 and previous config saved to /var/cache/conftool/dbconfig/20240618-095119-root.json
  • 09:48 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc1002.wikimedia.org
  • 09:45 jmm@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host irc2002.wikimedia.org
  • 09:41 jmm@cumin2002: START - Cookbook sre.hosts.reboot-single for host irc2002.wikimedia.org
  • 09:36 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65144 and previous config saved to /var/cache/conftool/dbconfig/20240618-093614-root.json
  • 09:27 moritzm: arm keyholder on acmechief2002
  • 09:21 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65143 and previous config saved to /var/cache/conftool/dbconfig/20240618-092108-root.json
  • 09:13 moritzm: rebooting ganeti2029
  • 09:10 marostegui: dbmaint eqiad s4 deploy schema change T367261
  • 09:06 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65142 and previous config saved to /var/cache/conftool/dbconfig/20240618-090603-root.json
  • 09:05 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 08:53 jnuche@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.10 refs T361404
  • 08:52 arnaudb@cumin1002: dbctl commit (dc=all): 'db1165 depool to troubleshoot hardware issues', diff saved to https://phabricator.wikimedia.org/P65141 and previous config saved to /var/cache/conftool/dbconfig/20240618-085254-arnaudb.json
  • 08:52 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: hardware issues
  • 08:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: hardware issues
  • 08:51 arnaudb@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: repl issues
  • 08:51 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on db1165.eqiad.wmnet with reason: repl issues
  • 08:50 marostegui@cumin1002: dbctl commit (dc=all): 'db1160 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65140 and previous config saved to /var/cache/conftool/dbconfig/20240618-085057-root.json
  • 08:45 hashar@deploy1002: Finished deploy [integration/docroot@7a92240]: doc: Add mwseaql Rust crate (duration: 00m 07s)
  • 08:45 hashar@deploy1002: Started deploy [integration/docroot@7a92240]: doc: Add mwseaql Rust crate
  • 08:43 fabfur: cp4037 currently depooled and puppet disabled for T367756
  • 08:41 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 08:40 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
  • 08:34 marostegui: dbmaint eqiad s6 deploy schema change on eqiad master T364069
  • 08:29 XioNoX: deploy pfw policy update 1718644831 - T367796
  • 07:56 moritzm: uploaded python-irc 8.5.3+dfsg-4+wmf1 to apt.wikimedia.org T331702
  • 07:40 marostegui: dbmaint codfw s7 deploy schema change on codfw master T364069
  • 07:33 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 07:31 kart_: Updated cxserver to 2024-06-13-045621-production (T364122, T138401)
  • 07:30 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cxserver: apply
  • 07:29 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/cxserver: apply
  • 07:28 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/cxserver: apply
  • 07:28 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/cxserver: apply
  • 07:26 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/cxserver: apply
  • 07:26 kartik@deploy1002: helmfile [staging] START helmfile.d/services/cxserver: apply
  • 07:20 kartik@deploy1002: Finished scap: Backport for Content Translation: Adjust the Machine translation limit for Telugu WP from 70% to 75% (T367838) (duration: 16m 36s)
  • 07:15 marostegui: dbmaint eqiad s5 deploy schema change on primary master T364069
  • 07:12 marostegui: dbmaint codfw s4 deploy schema change T367261
  • 07:12 marostegui: dbmaint codfw s4 deploy schema change
  • 07:11 kartik@deploy1002: kartik: Continuing with sync
  • 07:09 kartik@deploy1002: kartik: Backport for Content Translation: Adjust the Machine translation limit for Telugu WP from 70% to 75% (T367838) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:04 kartik@deploy1002: Started scap: Backport for Content Translation: Adjust the Machine translation limit for Telugu WP from 70% to 75% (T367838)
  • 06:52 jynus@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db1240.eqiad.wmnet with reason: data reload
  • 06:52 jynus@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db1240.eqiad.wmnet with reason: data reload
  • 06:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1191 (T364069)', diff saved to https://phabricator.wikimedia.org/P65139 and previous config saved to /var/cache/conftool/dbconfig/20240618-060100-marostegui.json
  • 06:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 06:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1191.eqiad.wmnet with reason: Maintenance
  • 06:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65138 and previous config saved to /var/cache/conftool/dbconfig/20240618-060038-marostegui.json
  • 05:55 jynus@cumin2002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts db2102.codfw.wmnet
  • 05:55 jynus@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 05:55 jynus@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2102.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin2002"
  • 05:53 jynus@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: db2102.codfw.wmnet decommissioned, removing all IPs except the asset tag one - jynus@cumin2002"
  • 05:50 jynus@cumin2002: START - Cookbook sre.dns.netbox
  • 05:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P65137 and previous config saved to /var/cache/conftool/dbconfig/20240618-054531-marostegui.json
  • 05:44 jynus@cumin2002: START - Cookbook sre.hosts.decommission for hosts db2102.codfw.wmnet
  • 05:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181', diff saved to https://phabricator.wikimedia.org/P65136 and previous config saved to /var/cache/conftool/dbconfig/20240618-053024-marostegui.json
  • 05:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65135 and previous config saved to /var/cache/conftool/dbconfig/20240618-051517-marostegui.json
  • 05:00 marostegui: dbmaint codfw s5 deploy schema change on db2213 T364299
  • 04:57 marostegui: dbmaint eqiad s2 deploy schema change on db2207 T364299
  • 04:54 marostegui: dbmaint eqiad s4 deploy schema change on db1160 T364299
  • 04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Long schema change
  • 04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1160.eqiad.wmnet with reason: Long schema change
  • 04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1160 T367378', diff saved to https://phabricator.wikimedia.org/P65134 and previous config saved to /var/cache/conftool/dbconfig/20240618-044908-root.json
  • 04:48 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1238 to s4 primary and set section read-write T367378', diff saved to https://phabricator.wikimedia.org/P65133 and previous config saved to /var/cache/conftool/dbconfig/20240618-044806-marostegui.json
  • 04:47 marostegui@cumin1002: dbctl commit (dc=all): 'Set s4 eqiad as read-only for maintenance - T367378', diff saved to https://phabricator.wikimedia.org/P65132 and previous config saved to /var/cache/conftool/dbconfig/20240618-044747-marostegui.json
  • 04:47 marostegui: Starting s4 eqiad failover from db1160 to db1238 - T367378
  • 04:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s4 T367378
  • 04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1238 with weight 0 T367378', diff saved to https://phabricator.wikimedia.org/P65131 and previous config saved to /var/cache/conftool/dbconfig/20240618-042054-marostegui.json
  • 04:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s4 T367378
  • 04:02 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.7 (duration: 02m 50s)
  • 04:01 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.10 refs T361404 (duration: 58m 57s)
  • 03:03 mwpresync@deploy1002: Started scap: testwikis wikis to 1.43.0-wmf.10 refs T361404
  • 01:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1181 (T364069)', diff saved to https://phabricator.wikimedia.org/P65130 and previous config saved to /var/cache/conftool/dbconfig/20240618-013639-marostegui.json
  • 01:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 01:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1181.eqiad.wmnet with reason: Maintenance
  • 01:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65129 and previous config saved to /var/cache/conftool/dbconfig/20240618-013616-marostegui.json
  • 01:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P65128 and previous config saved to /var/cache/conftool/dbconfig/20240618-012109-marostegui.json
  • 01:10 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4044.ulsfo.wmnet
  • 01:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174', diff saved to https://phabricator.wikimedia.org/P65127 and previous config saved to /var/cache/conftool/dbconfig/20240618-010601-marostegui.json
  • 00:57 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4044.ulsfo.wmnet with OS bullseye
  • 00:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65126 and previous config saved to /var/cache/conftool/dbconfig/20240618-005054-marostegui.json
  • 00:34 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
  • 00:31 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4044.ulsfo.wmnet with reason: host reimage
  • 00:28 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P65125 and previous config saved to /var/cache/conftool/dbconfig/20240618-002823-ladsgroup.json
  • 00:18 zabe@deploy1002: Finished scap: Update interwiki cache (duration: 14m 03s)
  • 00:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65124 and previous config saved to /var/cache/conftool/dbconfig/20240618-001316-ladsgroup.json
  • 00:10 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS bullseye
  • 00:10 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4044.ulsfo.wmnet with OS bullseye
  • 00:05 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=u4cwiki --cluster=all 2>&1 | tee /tmp/u4c.UpdateSearchIndexConfig.log # T366649
  • 00:04 zabe@deploy1002: Started scap: Update interwiki cache
  • 00:02 zabe@deploy1002: Finished scap: T366649 (duration: 15m 16s)
  • 00:00 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4044.ulsfo.wmnet with OS bullseye

2024-06-17

  • 23:58 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65123 and previous config saved to /var/cache/conftool/dbconfig/20240617-235809-ladsgroup.json
  • 23:52 zabe@deploy1002: zabe: Continuing with sync
  • 23:52 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4044.ulsfo.wmnet
  • 23:51 zabe@deploy1002: zabe: T366649 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:48 zabe: zabe@mwmaint1002:~$ mwscript extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --wiki=arbcom_itwiki --cluster=all 2>&1 | tee /tmp/arbcom_it.UpdateSearchIndexConfig.log # T363825
  • 23:47 zabe@deploy1002: Started scap: T366649
  • 23:46 zabe: Create an 'Universal Code of Conduct Coordinating Committee (U4C)' private wiki # T366649
  • 23:44 zabe@deploy1002: Finished scap: T363825 (duration: 15m 00s)
  • 23:43 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P65122 and previous config saved to /var/cache/conftool/dbconfig/20240617-234302-ladsgroup.json
  • 23:34 zabe@deploy1002: zabe: Continuing with sync
  • 23:34 zabe@deploy1002: zabe: T363825 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 23:29 zabe@deploy1002: Started scap: T363825
  • 23:29 zabe: create private wiki for itwiki arbcom # T363825
  • 23:23 cdobbins@cumin1002: conftool action : set/pooled=yes; selector: name=cp4043.ulsfo.wmnet
  • 23:14 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4043.ulsfo.wmnet with OS bullseye
  • 22:52 cdobbins@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
  • 22:49 cdobbins@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4043.ulsfo.wmnet with reason: host reimage
  • 22:42 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1041.eqiad.wmnet with OS bookworm
  • 22:30 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P65121 and previous config saved to /var/cache/conftool/dbconfig/20240617-223010-ladsgroup.json
  • 22:28 cdobbins@cumin1002: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
  • 22:26 cdobbins@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4043.ulsfo.wmnet with OS bullseye
  • 22:25 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev200[2-3].codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 22:15 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
  • 22:15 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P65120 and previous config saved to /var/cache/conftool/dbconfig/20240617-221503-ladsgroup.json
  • 22:12 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1041.eqiad.wmnet with reason: host reimage
  • 22:11 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev200[2-3].codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 22:05 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching cassandra-dev2001.codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 21:59 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P65119 and previous config saved to /var/cache/conftool/dbconfig/20240617-215956-ladsgroup.json
  • 21:59 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching cassandra-dev2001.codfw.wmnet: Apply Cassandra upgrade to 4.1.5 — T354970 - eevans@cumin1002
  • 21:55 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1041.eqiad.wmnet with OS bookworm
  • 21:44 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P65118 and previous config saved to /var/cache/conftool/dbconfig/20240617-214449-ladsgroup.json
  • 21:41 cdobbins@cumin1002: START - Cookbook sre.hosts.reimage for host cp4043.ulsfo.wmnet with OS bullseye
  • 21:20 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1040.eqiad.wmnet with OS bookworm
  • 21:09 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=cp4043.ulsfo.wmnet
  • 21:09 cdobbins@cumin1002: conftool action : set/pooled=no; selector: name=4043.ulsfo.wmnet
  • 21:05 jforrester@deploy1002: Finished scap: Backport for Fix styles for new heading HTML (T367468) (duration: 18m 57s)
  • 20:59 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65117 and previous config saved to /var/cache/conftool/dbconfig/20240617-205955-marostegui.json
  • 20:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 20:59 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1174.eqiad.wmnet with reason: Maintenance
  • 20:55 jforrester@deploy1002: jforrester: Continuing with sync
  • 20:52 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
  • 20:50 jforrester@deploy1002: jforrester: Backport for Fix styles for new heading HTML (T367468) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:50 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1040.eqiad.wmnet with reason: host reimage
  • 20:46 jforrester@deploy1002: Started scap: Backport for Fix styles for new heading HTML (T367468)
  • 20:34 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1040.eqiad.wmnet with OS bookworm
  • 20:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1039.eqiad.wmnet with OS bookworm
  • 20:08 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4042.ulsfo.wmnet
  • 20:07 jforrester@deploy1002: jforrester: Continuing with sync
  • 20:07 jforrester@deploy1002: jforrester: Backport for [wikifunctionswiki] Remove right to promote/demote sysops and bureaucrats from staff (T365627), Add a note that you cannot change wgCategoryCollation easily (T362494 T366809) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:06 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
  • 20:06 brett@cumin2002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cp4042.ulsfo.wmnet with OS bullseye
  • 20:02 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1039.eqiad.wmnet with reason: host reimage
  • 20:02 jforrester@deploy1002: Started scap: Backport for [wikifunctionswiki] Remove right to promote/demote sysops and bureaucrats from staff (T365627), Add a note that you cannot change wgCategoryCollation easily (T362494 T366809)
  • 19:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T352010)', diff saved to https://phabricator.wikimedia.org/P65116 and previous config saved to /var/cache/conftool/dbconfig/20240617-195520-ladsgroup.json
  • 19:55 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 19:55 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 19:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1039.eqiad.wmnet with OS bookworm
  • 19:40 brett@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
  • 19:38 brett@cumin2002: START - Cookbook sre.hosts.downtime for 2:00:00 on cp4042.ulsfo.wmnet with reason: host reimage
  • 19:22 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1038.eqiad.wmnet with OS bookworm
  • 19:15 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
  • 19:15 brett@cumin2002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cp4042.ulsfo.wmnet with OS bullseye
  • 18:57 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
  • 18:56 brett@cumin2002: START - Cookbook sre.hosts.reimage for host cp4042.ulsfo.wmnet with OS bullseye
  • 18:54 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1038.eqiad.wmnet with reason: host reimage
  • 18:42 ladsgroup@deploy1002: Finished scap: Backport for Change static footer icons to the new one (T256190), Remove footer override (duration: 17m 12s)
  • 18:36 ejegg: fundraising civicrm upgraded from 66acce1f to a25a359b
  • 18:36 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1038.eqiad.wmnet with OS bookworm
  • 18:33 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1037.eqiad.wmnet with OS bookworm
  • 18:30 ladsgroup@deploy1002: ladsgroup, jforrester: Continuing with sync
  • 18:29 ladsgroup@deploy1002: ladsgroup, jforrester: Backport for Change static footer icons to the new one (T256190), Remove footer override synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 18:24 ladsgroup@deploy1002: Started scap: Backport for Change static footer icons to the new one (T256190), Remove footer override
  • 18:19 ladsgroup@deploy1002: Started scap: Backport for Change static footer icons to the new one (T256190)
  • 18:17 ejegg: standalone SmashPig upgraded from 1d1b770c to c8993ec6
  • 18:12 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: sync
  • 18:12 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: sync
  • 18:11 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/data-gateway: apply
  • 18:10 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/data-gateway: apply
  • 18:09 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 18:09 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 18:08 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/media-analytics: apply
  • 18:07 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/media-analytics: apply
  • 18:07 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/image-suggestion: apply
  • 18:06 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/image-suggestion: apply
  • 18:05 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
  • 18:05 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/geo-analytics: apply
  • 18:04 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/geo-analytics: apply
  • 18:03 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/editor-analytics: apply
  • 18:02 andrew@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1037.eqiad.wmnet with reason: host reimage
  • 18:02 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/editor-analytics: apply
  • 18:01 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/edit-analytics: apply
  • 18:00 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/edit-analytics: apply
  • 17:58 ejegg: fundraising civicrm upgraded from aa127608 to 66acce1f
  • 17:53 swfrench@deploy1002: helmfile [eqiad] DONE helmfile.d/services/device-analytics: apply
  • 17:53 swfrench@deploy1002: helmfile [eqiad] START helmfile.d/services/device-analytics: apply
  • 17:43 andrew@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1037.eqiad.wmnet with OS bookworm
  • 17:37 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 17:36 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 17:35 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/media-analytics: apply
  • 17:34 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/media-analytics: apply
  • 17:34 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/image-suggestion: apply
  • 17:33 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/image-suggestion: apply
  • 17:32 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/geo-analytics: apply
  • 17:31 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/geo-analytics: apply
  • 17:30 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/editor-analytics: apply
  • 17:29 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/editor-analytics: apply
  • 17:18 brett@puppetmaster1001: conftool action : set/pooled=no; selector: name=cp4042.ulsfo.wmnet
  • 17:17 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/edit-analytics: apply
  • 17:16 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/edit-analytics: apply
  • 17:07 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/device-analytics: apply
  • 17:06 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/device-analytics: apply
  • 17:05 claime: Pooling and uncordoning wikikube-worker1019.eqiad.wmnet,wikikube-worker1020.eqiad.wmnet,wikikube-worker1021.eqiad.wmnet - T351074
  • 17:02 brett@puppetmaster1001: conftool action : set/pooled=yes; selector: name=cp4038.ulsfo.wmnet
  • 16:59 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: sync
  • 16:59 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: sync
  • 16:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker1021.eqiad.wmnet with OS bullseye
  • 16:58 swfrench@deploy1002: helmfile [codfw] DONE helmfile.d/services/data-gateway: apply
  • 16:58 claime: homer 'cr*eqiad*' commit 'T351074'
  • 16:58 swfrench@deploy1002: helmfile [codfw] START helmfile.d/services/data-gateway: apply
  • 16:43 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: sync
  • 16:43 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: sync
  • 16:42 mnz@deploy1002: Finished deploy [airflow-dags/research@5e1cd80]: (no justification provided) (duration: 00m 32s)
  • 16:42 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/device-analytics: apply
  • 16:42 mnz@deploy1002: Started deploy [airflow-dags/research@5e1cd80]: (no justification provided)
  • 16:42 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/device-analytics: apply
  • 16:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1021.eqiad.wmnet with reason: host reimage
  • 16:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1019.eqiad.wmnet with reason: host reimage
  • 16:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker1020.eqiad.wmnet with reason: host reimage
  • 16:32 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1021.eqiad.wmnet with reason: host reimage
  • 16:31 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1019.eqiad.wmnet with reason: host reimage
  • 16:30 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker1020.eqiad.wmnet with reason: host reimage
  • 16:30 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 16:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 16:29 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 16:29 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 16:29 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 16:28 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 16:27 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 16:27 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 16:26 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 16:25 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 16:25 andrew@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts cloudvirt-wdqs1003.eqiad.wmnet
  • 16:25 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:25 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 16:24 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1003.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 16:21 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 16:16 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1003.eqiad.wmnet
  • 16:16 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1002.eqiad.wmnet
  • 16:16 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:14 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 16:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1019.eqiad.wmnet with OS bullseye
  • 16:09 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 14m 13s)
  • 16:09 eevans@cumin1002: END (PASS) - Cookbook sre.cassandra.roll-restart (exit_code=0) for nodes matching restbase1028.eqiad.wmnet: Apply update to Java 11 - eevans@cumin1002
  • 16:09 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker1019.eqiad.wmnet with OS bullseye
  • 16:08 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1002.eqiad.wmnet
  • 16:05 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 16:05 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 16:03 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 16:00 eevans@cumin1002: START - Cookbook sre.cassandra.roll-restart for nodes matching restbase1028.eqiad.wmnet: Apply update to Java 11 - eevans@cumin1002
  • 15:59 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 15:57 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 15:57 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:56 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1002.eqiad.wmnet
  • 15:56 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:56 andrew@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 15:55 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 15:55 andrew@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: cloudvirt-wdqs1002.eqiad.wmnet decommissioned, removing all IPs except the asset tag one - andrew@cumin1002"
  • 15:52 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 14m 41s)
  • 15:50 topranks: rebooting cr2-eqdfw to upgrade JunOS T364092
  • 15:49 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 15:48 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 15:48 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr2-esams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1021.eqiad.wmnet with OS bullseye
  • 15:46 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.downtime (exit_code=99) for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr3-knams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 15:46 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on cr[1-2]-codfw,cr2-drmrs,cr3-knams,cr2-magru with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1020.eqiad.wmnet with OS bullseye
  • 15:46 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker1019.eqiad.wmnet with OS bullseye
  • 15:46 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) wikikube-worker1019.eqiad.wmnet wikikube-worker1020.eqiad.wmnet wikikube-worker1021.eqiad.wmnet on all recursors
  • 15:46 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1002.eqiad.wmnet
  • 15:46 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 15:46 cgoubert@cumin1002: START - Cookbook sre.dns.wipe-cache wikikube-worker1019.eqiad.wmnet wikikube-worker1020.eqiad.wmnet wikikube-worker1021.eqiad.wmnet on all recursors
  • 15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1489 to wikikube-worker1021
  • 15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1021
  • 15:44 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1021
  • 15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1489 to wikikube-worker1021 - cgoubert@cumin1002"
  • 15:43 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1489 to wikikube-worker1021 - cgoubert@cumin1002"
  • 15:41 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:41 andrew@cumin1002: END (FAIL) - Cookbook sre.hosts.decommission (exit_code=1) for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 15:41 andrew@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1489 to wikikube-worker1021
  • 15:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1447 to wikikube-worker1020
  • 15:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1020
  • 15:39 topranks: deactivate Tranist and peering sessions on cr2-eqdfw in advance of power-supply change T366864
  • 15:39 andrew@cumin1002: START - Cookbook sre.dns.netbox
  • 15:39 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1020
  • 15:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1447 to wikikube-worker1020 - cgoubert@cumin1002"
  • 15:37 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1447 to wikikube-worker1020 - cgoubert@cumin1002"
  • 15:37 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:30:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 15:37 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 1:30:00 on cr2-eqdfw,cr2-eqdfw IPv6 with reason: JunOS upgrade and PSU swap on cr2-eqdfw
  • 15:34 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:34 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1447 to wikikube-worker1020
  • 15:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw1444 to wikikube-worker1019
  • 15:33 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker1019
  • 15:32 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker1019
  • 15:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:32 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1444 to wikikube-worker1019 - cgoubert@cumin1002"
  • 15:32 andrew@cumin1002: START - Cookbook sre.hosts.decommission for hosts cloudvirt-wdqs1001.eqiad.wmnet
  • 15:31 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw1444 to wikikube-worker1019 - cgoubert@cumin1002"
  • 15:31 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 15:29 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl2002.codfw.wmnet
  • 15:29 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:29 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 15:28 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 15:28 fabfur: upgrading haproxy to 2.8.10 on cp4037 (T367756)
  • 15:28 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp4037.*} and A:cp
  • 15:26 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp4037.*} and A:cp
  • 15:26 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 15:24 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw1444 to wikikube-worker1019
  • 15:24 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1444.eqiad.wmnet
  • 15:24 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1444.eqiad.wmnet
  • 15:23 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 15:21 claime: Depooling mw1444.eqiad.wmnet,mw1447.eqiad.wmnet,mw1489.eqiad.wmnet for reimage - T351074
  • 15:20 topranks: draining transport circuits in/out of eqdfw in advance of router power-supply work/upgrade T366864
  • 15:17 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 15:17 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2002.codfw.wmnet
  • 15:16 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.decommission (exit_code=97) for hosts wikikube-ctrl2002.codfw.wmnet
  • 15:16 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2002.codfw.wmnet
  • 15:10 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 15:03 claime: Repooling mw1359.eqiad.wmnet,mw1364.eqiad.wmnet,mw1365.eqiad.wmnet,mw1412.eqiad.wmnet pending fw upgrade - T351074
  • 15:03 cgoubert@cumin1002: conftool action : set/weight=30:pooled=yes; selector: name=(mw1359.eqiad.wmnet|mw1364.eqiad.wmnet|mw1365.eqiad.wmnet|mw1412.eqiad.wmnet)
  • 14:59 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 14:58 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 14:58 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 14:56 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 14:56 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 14:55 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1444.eqiad.wmnet
  • 14:55 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/FAQ On Countering Terrorist and Violent Extremist Content on Wikimedia Projects" "Wikimedia Foundation/Legal/FAQ On Countering Terrorist and Violent Extremist Content on Wikimedia Projects" "Zabe" --reason "per request T367216"
  • 14:54 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1444.eqiad.wmnet
  • 14:53 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 14:52 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.dhcp (exit_code=0) for host cloudvirt-wdqs1001.eqiad.wmnet
  • 14:52 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.provision (exit_code=0) for host cloudvirt-wdqs1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 14:50 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Committee appointments/Announcement/Short" "Wikimedia Foundation/Legal/Committee appointments/Announcement/Short" "Zabe" --reason "per request T367216"
  • 14:48 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1412.eqiad.wmnet
  • 14:48 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1412.eqiad.wmnet
  • 14:48 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 14:47 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Committee appointments/Announcement" "Wikimedia Foundation/Legal/Committee appointments/Announcement" "Zabe" --reason "per request T367216"
  • 14:45 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1365.eqiad.wmnet
  • 14:45 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1365.eqiad.wmnet
  • 14:44 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1364.eqiad.wmnet
  • 14:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 14:44 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1364.eqiad.wmnet
  • 14:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1171.eqiad.wmnet with reason: Maintenance
  • 14:44 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts wikikube-ctrl2001.codfw.wmnet
  • 14:44 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:44 kamila@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 14:43 kamila@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: wikikube-ctrl2001.codfw.wmnet decommissioned, removing all IPs except the asset tag one - kamila@cumin1002"
  • 14:43 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 14:41 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Committee appointments" "Wikimedia Foundation/Legal/Committee appointments" "Zabe" --reason "per request T367216"
  • 14:39 jhancock@cumin2002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:39 jhancock@cumin2002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-staging2003 to codfw - jhancock@cumin2002"
  • 14:39 joal@deploy1002: Finished deploy [airflow-dags/analytics@b682892]: (no justification provided) (duration: 00m 33s)
  • 14:38 joal@deploy1002: Started deploy [airflow-dags/analytics@b682892]: (no justification provided)
  • 14:37 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Tools and processes" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Tools and processes" "Zabe" --reason "per request T367217"
  • 14:36 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 14:34 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Resources/What is a conduct warning" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Resources/What is a conduct warning" "Zabe" --reason "per request T367217"
  • 14:34 jhancock@cumin2002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: adding ml-staging2003 to codfw - jhancock@cumin2002"
  • 14:31 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Resources" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Resources" "Zabe" --reason "per request T367217"
  • 14:30 jhancock@cumin2002: START - Cookbook sre.dns.netbox
  • 14:30 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1364.eqiad.wmnet
  • 14:29 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1364.eqiad.wmnet
  • 14:28 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee/Legal agreement" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee/Legal agreement" "Zabe" --reason "per request T367217"
  • 14:27 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Brand Stewardship Report" "Wikimedia Foundation/Legal/Brand Stewardship Report" "Zabe" --reason "per request T367216"
  • 14:24 cgoubert@cumin1002: END (FAIL) - Cookbook sre.hardware.upgrade-firmware (exit_code=99) upgrade firmware for hosts mw1359.eqiad.wmnet
  • 14:23 cgoubert@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts mw1359.eqiad.wmnet
  • 14:23 taavi@cumin1002: START - Cookbook sre.hosts.provision for host cloudvirt-wdqs1001.mgmt.eqiad.wmnet with reboot policy GRACEFUL
  • 14:22 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2001.eqiad.wmnet
  • 14:21 kamila@cumin1002: START - Cookbook sre.hosts.decommission for hosts wikikube-ctrl2001.codfw.wmnet
  • 14:21 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/Announcement/2023 OC and CRC appointments process" "Wikimedia Foundation/Legal/Announcement/2023 OC and CRC appointments process" "Zabe" --reason "per request T367216"
  • 14:18 claime: Depooling mw1359.eqiad.wmnet,mw1364.eqiad.wmnet,mw1365.eqiad.wmnet,mw1412.eqiad.wmnet for reimage - T351074
  • 14:17 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt-wdqs1003.eqiad.wmnet with OS bookworm
  • 14:17 claime: Depooling mw2323.codfw.wmnet,mw2324.codfw.wmnet,mw2326.codfw.wmnet,mw2327.codfw.wmnet,mw2328.codfw.wmnet,mw2329.codfw.wmnet for reimage - T351074
  • 14:17 urbanecm@deploy1002: Finished scap: Backport for Growth: Enable CommunityConfiguration on arwiki, eswiki (T364895) (duration: 15m 34s)
  • 14:16 Amir1: killing updateMenteeData.php --wiki=enwiki --statsd --dbshard s1
  • 14:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:12 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for 4 mw servers - cgoubert@cumin1002"
  • 14:11 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for 4 mw servers - cgoubert@cumin1002"
  • 14:11 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/talkheader" "Wikimedia Foundation/Legal/2023 ToU updates/talkheader" "Zabe" --reason "per request T367216"
  • 14:08 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 14:07 taavi@cumin1002: START - Cookbook sre.hosts.dhcp for host cloudvirt-wdqs1001.eqiad.wmnet
  • 14:06 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Proposed update" "Wikimedia Foundation/Legal/2023 ToU updates/Proposed update" "Zabe" --reason "per request T367216"
  • 14:06 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 14:06 vgutierrez: rolling upgrade on A:cp-codfw to fifo-log-demux 0.7.5 - T364383
  • 14:05 urbanecm@deploy1002: urbanecm: Backport for Growth: Enable CommunityConfiguration on arwiki, eswiki (T364895) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 14:04 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee/Charter" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee/Charter" "Zabe" --reason "per request T367217"
  • 14:02 vgutierrez: disable puppet on A:cp-codfw before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1046681 - T364383
  • 14:01 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee/Call for applicants" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee/Call for applicants" "Zabe" --reason "per request T367217"
  • 14:01 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 14:01 urbanecm@deploy1002: Started scap: Backport for Growth: Enable CommunityConfiguration on arwiki, eswiki (T364895)
  • 14:01 brouberol@cumin2002: END (PASS) - Cookbook sre.opensearch.roll-restart-reboot (exit_code=0) rolling reboot on A:datahubsearch
  • 14:00 urbanecm@deploy1002: Finished scap: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153) (duration: 16m 47s)
  • 13:54 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 13:52 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1036.eqiad.wmnet with OS bookworm
  • 13:51 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
  • 13:50 urbanecm@deploy1002: urbanecm, lucaswerkmeister-wmde: Continuing with sync
  • 13:48 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt-wdqs1003.eqiad.wmnet with reason: host reimage
  • 13:48 urbanecm@deploy1002: urbanecm, lucaswerkmeister-wmde: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:48 taavi@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 13:45 brouberol@cumin2002: START - Cookbook sre.opensearch.roll-restart-reboot rolling reboot on A:datahubsearch
  • 13:44 urbanecm@deploy1002: Started scap: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153)
  • 13:43 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
  • 13:43 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 13:43 urbanecm@deploy1002: Sync cancelled.
  • 13:43 taavi@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 13:43 urbanecm@deploy1002: lucaswerkmeister-wmde, urbanecm: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T352010)', diff saved to https://phabricator.wikimedia.org/P65112 and previous config saved to /var/cache/conftool/dbconfig/20240617-133951-ladsgroup.json
  • 13:39 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 13:39 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 13:37 urbanecm@deploy1002: Started scap: Backport for Backport all commits from master (T364895), Check EntitySchemaIsRepo in more hook handlers (T363153)
  • 13:34 claime: Drained and cordoned wikikube-ctrl2001.codfw.wmnet wikikube-ctrl2002.codfw.wmnet
  • 13:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1003.eqiad.wmnet with OS bookworm
  • 13:33 claime: Uncordoned wikikube-ctrl2003.codfw.wmnet
  • 13:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1002.eqiad.wmnet with OS bookworm
  • 13:33 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt-wdqs1001.eqiad.wmnet with OS bookworm
  • 13:26 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
  • 13:25 urbanecm@deploy1002: Finished scap: Backport for Enable subpages for the main namespace in sourceswiki (T367674), CommunityConfiguration: set feedback url instead of bug tool (T363801) (duration: 23m 07s)
  • 13:24 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1036.eqiad.wmnet with reason: host reimage
  • 13:14 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2002.codfw.wmnet
  • 13:14 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2001.codfw.wmnet
  • 13:14 brouberol@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-jumbo-eqiad
  • 13:13 vgutierrez: rolling upgrade on A:cp-ulsfo to fifo-log-demux 0.7.5 - T364383
  • 13:12 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 13:12 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T352010)', diff saved to https://phabricator.wikimedia.org/P65111 and previous config saved to /var/cache/conftool/dbconfig/20240617-131222-ladsgroup.json
  • 13:10 urbanecm@deploy1002: urbanecm, jhsoby, sgimeno: Continuing with sync
  • 13:07 urbanecm@deploy1002: urbanecm, jhsoby, sgimeno: Backport for Enable subpages for the main namespace in sourceswiki (T367674), CommunityConfiguration: set feedback url instead of bug tool (T363801) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:05 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1036.eqiad.wmnet with OS bookworm
  • 13:03 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1036.eqiad.wmnet with reason: reimage and move to OVS
  • 13:03 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1036.eqiad.wmnet with reason: reimage and move to OVS
  • 13:03 vgutierrez: disable puppet on A:cp-ulsfo before merging https://gerrit.wikimedia.org/r/c/operations/puppet/+/1046665 - T364383
  • 13:02 urbanecm@deploy1002: Started scap: Backport for Enable subpages for the main namespace in sourceswiki (T367674), CommunityConfiguration: set feedback url instead of bug tool (T363801)
  • 12:59 joal@deploy1002: Finished deploy [airflow-dags/analytics@a8843e6]: (no justification provided) (duration: 00m 03s)
  • 12:59 joal@deploy1002: Started deploy [airflow-dags/analytics@a8843e6]: (no justification provided)
  • 12:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65110 and previous config saved to /var/cache/conftool/dbconfig/20240617-125715-ladsgroup.json
  • 12:53 vgutierrez: upload fifo-log-demux 0.7.5 to apt.wm.o (bullseye-wikimedia)
  • 12:47 jiji@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:wikikube-worker-eqiad
  • 12:42 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65109 and previous config saved to /var/cache/conftool/dbconfig/20240617-124207-ladsgroup.json
  • 12:36 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 12:34 vgutierrez: upgrading HAProxy to version 2.8.10 on cp4051
  • 12:34 vgutierrez: fetch HAProxy 2.8.10 into thirdparty/haproxy28 component for bullseye-wikimedia (apt.wm.o)
  • 12:28 jynus: restarting ms-backup100[12], backup1004-7,11
  • 12:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T352010)', diff saved to https://phabricator.wikimedia.org/P65108 and previous config saved to /var/cache/conftool/dbconfig/20240617-122700-ladsgroup.json
  • 12:14 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2003.codfw.wmnet|wikikube-worker2004.codfw.wmnet|wikikube-worker2007.codfw.wmnet|wikikube-worker2008.codfw.wmnet|wikikube-worker2009.codfw.wmnet|wikikube-worker2010.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 12:14 claime: pooling and uncordoning wikikube-worker2003.codfw.wmnet wikikube-worker2004.codfw.wmnet wikikube-worker2007.codfw.wmnet wikikube-worker2008.codfw.wmnet wikikube-worker2009.codfw.wmnet wikikube-worker2010.codfw.wmnet - T351074
  • 12:09 ayounsi@cumin1002: END (FAIL) - Cookbook sre.network.peering (exit_code=99) with action 'configure' for AS: 15830
  • 12:07 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 15830
  • 12:04 jynus: restart db1204, db1205
  • 12:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2008.codfw.wmnet with OS bullseye
  • 12:03 claime: homer 'cr*codfw*' commit 'T351074'
  • 12:02 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudvirt1035.eqiad.wmnet with OS bookworm
  • 12:02 jmm@cumin2002: END (PASS) - Cookbook sre.puppet.migrate-role (exit_code=0) for role: archiva
  • 12:01 kamila@cumin1002: conftool action : set/pooled=yes; selector: name=wikikube-ctrl2003.codfw.wmnet
  • 12:01 kamila@cumin1002: conftool action : set/pooled=inactive; selector: name=wikikube-ctrl2003.codfw.wmnet
  • 11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2010.codfw.wmnet with OS bullseye
  • 11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for wikikube-worker2003.codfw.wmnet
  • 11:54 cgoubert@cumin1002: START - Cookbook sre.hosts.remove-downtime for wikikube-worker2003.codfw.wmnet
  • 11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2009.codfw.wmnet with OS bullseye
  • 11:53 jmm@cumin2002: START - Cookbook sre.puppet.migrate-role for role: archiva
  • 11:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2003.codfw.wmnet with OS bullseye
  • 11:51 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2007.codfw.wmnet with OS bullseye
  • 11:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2004.codfw.wmnet with OS bullseye
  • 11:47 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
  • 11:43 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2008.codfw.wmnet with reason: host reimage
  • 11:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2010.codfw.wmnet with reason: host reimage
  • 11:37 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Trust and Safety/Case Review Committee" "Wikimedia Foundation/Legal/Community Resilience and Sustainability/Trust and Safety/Case Review Committee" "Zabe" --reason "per request T367217"
  • 11:37 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
  • 11:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2009.codfw.wmnet with reason: host reimage
  • 11:34 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudvirt1035.eqiad.wmnet with reason: host reimage
  • 11:31 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2007.codfw.wmnet with reason: host reimage
  • 11:30 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Office hours/Reminder" "Wikimedia Foundation/Legal/2023 ToU updates/Office hours/Reminder" "Zabe" --reason "per request T367216"
  • 11:29 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2004.codfw.wmnet with reason: host reimage
  • 11:26 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Office hours/Announcement" "Wikimedia Foundation/Legal/2023 ToU updates/Office hours/Announcement" "Zabe" --reason "per request T367216"
  • 11:26 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2003.codfw.wmnet with reason: host reimage
  • 11:25 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2010.codfw.wmnet with reason: host reimage
  • 11:25 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2009.codfw.wmnet with reason: host reimage
  • 11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2008.codfw.wmnet with reason: host reimage
  • 11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2007.codfw.wmnet with reason: host reimage
  • 11:24 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2004.codfw.wmnet with reason: host reimage
  • 11:23 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2003.codfw.wmnet with reason: host reimage
  • 11:23 kamila@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2003.codfw.wmnet with reason: host reimage
  • 11:22 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/Office hours" "Wikimedia Foundation/Legal/2023 ToU updates/Office hours" "Zabe" --reason "per request T367216"
  • 11:17 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/LandingCNTranslate" "Wikimedia Foundation/Legal/2023 ToU updates/LandingCNTranslate" "Zabe" --reason "per request T367216"
  • 11:17 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on archiva1002.wikimedia.org with reason: Upgrading to bullseye
  • 11:17 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on archiva1002.wikimedia.org with reason: Upgrading to bullseye
  • 11:16 kamila@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2003.codfw.wmnet with reason: host reimage
  • 11:16 taavi@cumin1002: START - Cookbook sre.hosts.reimage for host cloudvirt1035.eqiad.wmnet with OS bookworm
  • 11:13 taavi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on cloudvirt1035.eqiad.wmnet with reason: reimage and move to OVS
  • 11:13 taavi@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on cloudvirt1035.eqiad.wmnet with reason: reimage and move to OVS
  • 11:11 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates/About" "Wikimedia Foundation/Legal/2023 ToU updates/About" "Zabe" --reason "per request T367216"
  • 11:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2010.codfw.wmnet with OS bullseye
  • 11:09 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2009.codfw.wmnet with OS bullseye
  • 11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2008.codfw.wmnet with OS bullseye
  • 11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2007.codfw.wmnet with OS bullseye
  • 11:08 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2004.codfw.wmnet with OS bullseye
  • 11:07 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2003.codfw.wmnet with OS bullseye
  • 11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2329 to wikikube-worker2010
  • 11:07 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2010
  • 11:06 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2010
  • 11:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:06 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2329 to wikikube-worker2010 - cgoubert@cumin1002"
  • 11:03 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2329 to wikikube-worker2010 - cgoubert@cumin1002"
  • 11:03 zabe: zabe@mwmaint1002:~$ mwscript extensions/Translate/scripts/moveTranslatableBundle.php --wiki metawiki "Wikimedia Foundation Legal department/2023 ToU updates" "Wikimedia Foundation/Legal/2023 ToU updates" "Zabe" --reason "per request T367216"
  • 11:01 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
  • 10:59 kamila@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=93) for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
  • 10:58 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:57 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2329 to wikikube-worker2010
  • 10:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2328 to wikikube-worker2009
  • 10:57 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2009
  • 10:55 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2009
  • 10:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:55 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2328 to wikikube-worker2009 - cgoubert@cumin1002"
  • 10:54 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2328 to wikikube-worker2009 - cgoubert@cumin1002"
  • 10:52 kamila@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-ctrl2003.codfw.wmnet with OS bullseye
  • 10:51 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:51 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2328 to wikikube-worker2009
  • 10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2327 to wikikube-worker2008
  • 10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2008
  • 10:50 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2008
  • 10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2327 to wikikube-worker2008 - cgoubert@cumin1002"
  • 10:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 7 days, 0:00:00 on mw2321.codfw.wmnet with reason: hardware issue
  • 10:50 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 7 days, 0:00:00 on mw2321.codfw.wmnet with reason: hardware issue
  • 10:49 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2327 to wikikube-worker2008 - cgoubert@cumin1002"
  • 10:48 jiji@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:wikikube-worker-eqiad
  • 10:46 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:46 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2327 to wikikube-worker2008
  • 10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2326 to wikikube-worker2007
  • 10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2007
  • 10:45 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2007
  • 10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2326 to wikikube-worker2007 - cgoubert@cumin1002"
  • 10:43 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2326 to wikikube-worker2007 - cgoubert@cumin1002"
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:40 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2326 to wikikube-worker2007
  • 10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2324 to wikikube-worker2004
  • 10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2004
  • 10:39 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2004
  • 10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2324 to wikikube-worker2004 - cgoubert@cumin1002"
  • 10:38 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2324 to wikikube-worker2004 - cgoubert@cumin1002"
  • 10:37 jynus: restarting ms-backup200[12], backup2004-7,11
  • 10:35 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:35 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2324 to wikikube-worker2004
  • 10:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2323 to wikikube-worker2003
  • 10:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2003
  • 10:34 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2003
  • 10:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:34 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2323 to wikikube-worker2003 - cgoubert@cumin1002"
  • 10:34 kamila@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-ctrl2003
  • 10:34 kamila@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-ctrl2003
  • 10:33 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2323 to wikikube-worker2003 - cgoubert@cumin1002"
  • 10:31 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:31 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2323 to wikikube-worker2003
  • 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T367261)', diff saved to https://phabricator.wikimedia.org/P65107 and previous config saved to /var/cache/conftool/dbconfig/20240617-102938-marostegui.json
  • 10:26 jynus: restarting db2183, db2184
  • 10:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:24 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for mw232[3-9] - cgoubert@cumin1002"
  • 10:21 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Fix AAAA records for mw232[3-9] - cgoubert@cumin1002"
  • 10:17 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 10:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65106 and previous config saved to /var/cache/conftool/dbconfig/20240617-101431-marostegui.json
  • 10:11 kamila@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 10:10 kamila@cumin1002: START - Cookbook sre.dns.netbox
  • 10:09 claime: Depooling mw2323.codfw.wmnet,mw2324.codfw.wmnet,mw2326.codfw.wmnet,mw2327.codfw.wmnet,mw2328.codfw.wmnet,mw2329.codfw.wmnet for reimage - T351074
  • 10:08 claime: Depooling mw2323.codfw.wmnet,mw2324.codfw.wmnet,mw2326.codfw.wmnet,mw2327.codfw.wmnet,mw2328.codfw.wmnet,mw2329.codfw.wmnet for reimage
  • 10:01 claime: draining and cordoning mw2321 - T367702
  • 10:01 brouberol@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-jumbo-eqiad
  • 10:01 taavi@deploy1002: Finished scap: Backport for Stop loading OSM i18n (T161553) (duration: 34m 07s)
  • 09:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204', diff saved to https://phabricator.wikimedia.org/P65104 and previous config saved to /var/cache/conftool/dbconfig/20240617-095924-marostegui.json
  • 09:54 jayme@deploy1002: Finished deploy [docker-pkg/deploy@38eb04d]: Update docker-pkg to 4.0.1 (duration: 00m 24s)
  • 09:53 jayme@deploy1002: Started deploy [docker-pkg/deploy@38eb04d]: Update docker-pkg to 4.0.1
  • 09:52 jayme@deploy1002: Finished deploy [docker-pkg/deploy@4dbea81]: Update docker-pkg to 4.0.1 (duration: 00m 38s)
  • 09:51 jayme@deploy1002: Started deploy [docker-pkg/deploy@4dbea81]: Update docker-pkg to 4.0.1
  • 09:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1170.eqiad.wmnet with reason: Maintenance
  • 09:49 taavi@deploy1002: taavi: Continuing with sync
  • 09:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P65103 and previous config saved to /var/cache/conftool/dbconfig/20240617-094926-marostegui.json
  • 09:48 taavi@deploy1002: taavi: Backport for Stop loading OSM i18n (T161553) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2204 (T367261)', diff saved to https://phabricator.wikimedia.org/P65102 and previous config saved to /var/cache/conftool/dbconfig/20240617-094417-marostegui.json
  • 09:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2204 (T367261)', diff saved to https://phabricator.wikimedia.org/P65101 and previous config saved to /var/cache/conftool/dbconfig/20240617-094034-marostegui.json
  • 09:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 09:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2204.codfw.wmnet with reason: Maintenance
  • 09:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 09:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367261)', diff saved to https://phabricator.wikimedia.org/P65100 and previous config saved to /var/cache/conftool/dbconfig/20240617-093427-marostegui.json
  • 09:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P65099 and previous config saved to /var/cache/conftool/dbconfig/20240617-093419-marostegui.json
  • 09:26 taavi@deploy1002: Started scap: Backport for Stop loading OSM i18n (T161553)
  • 09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65098 and previous config saved to /var/cache/conftool/dbconfig/20240617-091920-marostegui.json
  • 09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158', diff saved to https://phabricator.wikimedia.org/P65097 and previous config saved to /var/cache/conftool/dbconfig/20240617-091912-marostegui.json
  • 09:05 brouberol@cumin2002: END (PASS) - Cookbook sre.kafka.roll-restart-reboot-brokers (exit_code=0) rolling reboot on A:kafka-test-eqiad
  • 09:04 _joe_: removed damaged AOF file for redis rdb1014-6379, resyncing with primary
  • 09:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189', diff saved to https://phabricator.wikimedia.org/P65096 and previous config saved to /var/cache/conftool/dbconfig/20240617-090413-marostegui.json
  • 09:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P65095 and previous config saved to /var/cache/conftool/dbconfig/20240617-090405-marostegui.json
  • 09:01 urbanecm@deploy1002: Finished scap: Backport for throttle: Fix exemption for ongoing course (duration: 25m 05s)
  • 08:53 claime: hardcycling rdb1014
  • 08:49 cgoubert@cumin1002: conftool action : set/pooled=inactive; selector: name=mw2321.codfw.wmnet
  • 08:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2189 (T367261)', diff saved to https://phabricator.wikimedia.org/P65094 and previous config saved to /var/cache/conftool/dbconfig/20240617-084906-marostegui.json
  • 08:40 claime: powercycling rdb1014
  • 08:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 08:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 08:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367261)', diff saved to https://phabricator.wikimedia.org/P65093 and previous config saved to /var/cache/conftool/dbconfig/20240617-083755-marostegui.json
  • 08:36 urbanecm@deploy1002: Started scap: Backport for throttle: Fix exemption for ongoing course
  • 08:25 brouberol@cumin2002: START - Cookbook sre.kafka.roll-restart-reboot-brokers rolling reboot on A:kafka-test-eqiad
  • 08:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65092 and previous config saved to /var/cache/conftool/dbconfig/20240617-082248-marostegui.json
  • 08:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65091 and previous config saved to /var/cache/conftool/dbconfig/20240617-080741-marostegui.json
  • 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T367261)', diff saved to https://phabricator.wikimedia.org/P65090 and previous config saved to /var/cache/conftool/dbconfig/20240617-075234-marostegui.json
  • 07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T367261)', diff saved to https://phabricator.wikimedia.org/P65089 and previous config saved to /var/cache/conftool/dbconfig/20240617-074542-marostegui.json
  • 07:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 07:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 07:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367261)', diff saved to https://phabricator.wikimedia.org/P65088 and previous config saved to /var/cache/conftool/dbconfig/20240617-074530-marostegui.json
  • 07:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65087 and previous config saved to /var/cache/conftool/dbconfig/20240617-073023-marostegui.json
  • 07:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65086 and previous config saved to /var/cache/conftool/dbconfig/20240617-071516-marostegui.json
  • 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T367261)', diff saved to https://phabricator.wikimedia.org/P65085 and previous config saved to /var/cache/conftool/dbconfig/20240617-070009-marostegui.json
  • 06:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2189 (T352010)', diff saved to https://phabricator.wikimedia.org/P65084 and previous config saved to /var/cache/conftool/dbconfig/20240617-065647-ladsgroup.json
  • 06:56 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 06:56 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2189.codfw.wmnet with reason: Maintenance
  • 06:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T352010)', diff saved to https://phabricator.wikimedia.org/P65083 and previous config saved to /var/cache/conftool/dbconfig/20240617-065625-ladsgroup.json
  • 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T367261)', diff saved to https://phabricator.wikimedia.org/P65082 and previous config saved to /var/cache/conftool/dbconfig/20240617-065357-marostegui.json
  • 06:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 06:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 06:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367261)', diff saved to https://phabricator.wikimedia.org/P65081 and previous config saved to /var/cache/conftool/dbconfig/20240617-065335-marostegui.json
  • 06:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P65080 and previous config saved to /var/cache/conftool/dbconfig/20240617-064118-ladsgroup.json
  • 06:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65079 and previous config saved to /var/cache/conftool/dbconfig/20240617-063923-root.json
  • 06:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65078 and previous config saved to /var/cache/conftool/dbconfig/20240617-063826-marostegui.json
  • 06:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175', diff saved to https://phabricator.wikimedia.org/P65077 and previous config saved to /var/cache/conftool/dbconfig/20240617-062612-ladsgroup.json
  • 06:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65076 and previous config saved to /var/cache/conftool/dbconfig/20240617-062511-root.json
  • 06:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65075 and previous config saved to /var/cache/conftool/dbconfig/20240617-062418-root.json
  • 06:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65074 and previous config saved to /var/cache/conftool/dbconfig/20240617-062319-marostegui.json
  • 06:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2175 (T352010)', diff saved to https://phabricator.wikimedia.org/P65073 and previous config saved to /var/cache/conftool/dbconfig/20240617-061105-ladsgroup.json
  • 06:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65072 and previous config saved to /var/cache/conftool/dbconfig/20240617-061006-root.json
  • 06:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65071 and previous config saved to /var/cache/conftool/dbconfig/20240617-060913-root.json
  • 06:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T367261)', diff saved to https://phabricator.wikimedia.org/P65070 and previous config saved to /var/cache/conftool/dbconfig/20240617-060812-marostegui.json
  • 06:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T367261)', diff saved to https://phabricator.wikimedia.org/P65069 and previous config saved to /var/cache/conftool/dbconfig/20240617-060352-marostegui.json
  • 06:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 06:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 06:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 06:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 06:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367261)', diff saved to https://phabricator.wikimedia.org/P65068 and previous config saved to /var/cache/conftool/dbconfig/20240617-060326-marostegui.json
  • 05:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65067 and previous config saved to /var/cache/conftool/dbconfig/20240617-055501-root.json
  • 05:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65066 and previous config saved to /var/cache/conftool/dbconfig/20240617-055407-root.json
  • 05:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65065 and previous config saved to /var/cache/conftool/dbconfig/20240617-054819-marostegui.json
  • 05:39 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65064 and previous config saved to /var/cache/conftool/dbconfig/20240617-053955-root.json
  • 05:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65063 and previous config saved to /var/cache/conftool/dbconfig/20240617-053902-root.json
  • 05:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125', diff saved to https://phabricator.wikimedia.org/P65062 and previous config saved to /var/cache/conftool/dbconfig/20240617-053312-marostegui.json
  • 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65061 and previous config saved to /var/cache/conftool/dbconfig/20240617-052450-root.json
  • 05:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65060 and previous config saved to /var/cache/conftool/dbconfig/20240617-052355-root.json
  • 05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T367261)', diff saved to https://phabricator.wikimedia.org/P65059 and previous config saved to /var/cache/conftool/dbconfig/20240617-051805-marostegui.json
  • 05:09 marostegui@cumin1002: dbctl commit (dc=all): 'db1170 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65058 and previous config saved to /var/cache/conftool/dbconfig/20240617-050944-root.json
  • 05:08 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2125 (T367261)', diff saved to https://phabricator.wikimedia.org/P65057 and previous config saved to /var/cache/conftool/dbconfig/20240617-050852-marostegui.json
  • 05:08 marostegui@cumin1002: dbctl commit (dc=all): 'db2122 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65056 and previous config saved to /var/cache/conftool/dbconfig/20240617-050849-root.json
  • 05:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 05:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2125.codfw.wmnet with reason: Maintenance
  • 05:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1158 (T364069)', diff saved to https://phabricator.wikimedia.org/P65055 and previous config saved to /var/cache/conftool/dbconfig/20240617-050756-marostegui.json
  • 05:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on clouddb[1014,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on clouddb[1014,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 05:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 05:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1158.eqiad.wmnet with reason: Maintenance
  • 05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2122 (T367261)', diff saved to https://phabricator.wikimedia.org/P65054 and previous config saved to /var/cache/conftool/dbconfig/20240617-050324-marostegui.json
  • 05:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance
  • 05:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 12:00:00 on db2122.codfw.wmnet with reason: Maintenance

2024-06-16

  • 22:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2175 (T352010)', diff saved to https://phabricator.wikimedia.org/P65053 and previous config saved to /var/cache/conftool/dbconfig/20240616-221944-ladsgroup.json
  • 22:19 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 22:19 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2175.codfw.wmnet with reason: Maintenance
  • 22:19 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T352010)', diff saved to https://phabricator.wikimedia.org/P65052 and previous config saved to /var/cache/conftool/dbconfig/20240616-221921-ladsgroup.json
  • 22:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65051 and previous config saved to /var/cache/conftool/dbconfig/20240616-220414-ladsgroup.json
  • 21:49 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148', diff saved to https://phabricator.wikimedia.org/P65050 and previous config saved to /var/cache/conftool/dbconfig/20240616-214907-ladsgroup.json
  • 21:34 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2148 (T352010)', diff saved to https://phabricator.wikimedia.org/P65049 and previous config saved to /var/cache/conftool/dbconfig/20240616-213400-ladsgroup.json
  • 14:02 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2148 (T352010)', diff saved to https://phabricator.wikimedia.org/P65047 and previous config saved to /var/cache/conftool/dbconfig/20240616-140214-ladsgroup.json
  • 14:02 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 14:01 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2148.codfw.wmnet with reason: Maintenance
  • 14:01 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T352010)', diff saved to https://phabricator.wikimedia.org/P65046 and previous config saved to /var/cache/conftool/dbconfig/20240616-140152-ladsgroup.json
  • 13:46 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65045 and previous config saved to /var/cache/conftool/dbconfig/20240616-134645-ladsgroup.json
  • 13:31 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138', diff saved to https://phabricator.wikimedia.org/P65044 and previous config saved to /var/cache/conftool/dbconfig/20240616-133137-ladsgroup.json
  • 13:16 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2138 (T352010)', diff saved to https://phabricator.wikimedia.org/P65043 and previous config saved to /var/cache/conftool/dbconfig/20240616-131630-ladsgroup.json
  • 05:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2138 (T352010)', diff saved to https://phabricator.wikimedia.org/P65042 and previous config saved to /var/cache/conftool/dbconfig/20240616-055411-ladsgroup.json
  • 05:54 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 05:54 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2138.codfw.wmnet with reason: Maintenance
  • 05:54 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T352010)', diff saved to https://phabricator.wikimedia.org/P65041 and previous config saved to /var/cache/conftool/dbconfig/20240616-055359-ladsgroup.json
  • 05:38 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65040 and previous config saved to /var/cache/conftool/dbconfig/20240616-053852-ladsgroup.json
  • 05:23 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126', diff saved to https://phabricator.wikimedia.org/P65039 and previous config saved to /var/cache/conftool/dbconfig/20240616-052345-ladsgroup.json
  • 05:08 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2126 (T352010)', diff saved to https://phabricator.wikimedia.org/P65038 and previous config saved to /var/cache/conftool/dbconfig/20240616-050838-ladsgroup.json
  • 03:21 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 03:21 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 03:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T364069)', diff saved to https://phabricator.wikimedia.org/P65037 and previous config saved to /var/cache/conftool/dbconfig/20240616-032102-marostegui.json
  • 03:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P65036 and previous config saved to /var/cache/conftool/dbconfig/20240616-030555-marostegui.json
  • 02:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227', diff saved to https://phabricator.wikimedia.org/P65035 and previous config saved to /var/cache/conftool/dbconfig/20240616-025048-marostegui.json
  • 02:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1227 (T364069)', diff saved to https://phabricator.wikimedia.org/P65034 and previous config saved to /var/cache/conftool/dbconfig/20240616-023541-marostegui.json
  • 00:04 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2126 (T352010)', diff saved to https://phabricator.wikimedia.org/P65033 and previous config saved to /var/cache/conftool/dbconfig/20240616-000421-ladsgroup.json
  • 00:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 00:04 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2126.codfw.wmnet with reason: Maintenance
  • 00:03 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2125 (T352010)', diff saved to https://phabricator.wikimedia.org/P65032 and previous config saved to /var/cache/conftool/dbconfig/20240616-000343-ladsgroup.json

Other archives

2000s

2010s

2020s